Title:
SCALABLE SCHEDULING OF PARALLEL ITERATIVE SEISMIC JOBS
Document Type and Number:
WIPO Patent Application WO/2016/099747
Kind Code:
A1
Abstract:
System and method for scalable and reliable scheduling of iterative seismic full wavefield inversion algorithms with alternating parallel (11) and serial (12) stages of computation on massively parallel computing systems. The workers are independent, initiating actions and unaware of each other, and given limited information. This enables application of optimal scheduling, load-balancing, and reliability techniques specific to seismic inversion problems. The central dispatcher specifies the structure of the inversion, including task dependency, and keeps track of progress of work. Management tools (23) enable the user to make performance and reliability improvements during the execution of the seismic inversion.

Inventors:
BOBREK ALEKSANDAR (US)
MULLUR ANOOP A (US)
BEARD CHRISTOPHER S (US)
BRANTLEY ARRIAN M (US)
TOWNSLEY MICHAEL B (US)
DIMITROV PAVEL (US)
Application Number:
PCT/US2015/061018
Publication Date:
June 23, 2016
Filing Date:
November 17, 2015
Assignee:
EXXONMOBIL UPSTREAM RES CO (US)
BOBREK ALEKSANDAR (US)
MULLUR ANOOP A (US)
BEARD CHRISTOPHER S (US)
BRANTLEY ARRIAN M (US)
TOWNSLEY MICHAEL B (US)
DIMITROV PAVEL (US)
International Classes:
G01V99/00
Foreign References:
US20120316850A12012-12-13
US20130030777A12013-01-31
US20130238249A12013-09-12
US20140257780A12014-09-11
US20130090906A12013-04-11
US8121823B22012-02-21
Other References:
EN-JUI LEE ET AL.: "An optimized parallel LSQR algorithm for seismic tomography", COMPUTERS & GEOSCIENCES, vol. 61, 12 September 2013 (2013-09-12), pages 184 - 197, XP002756253
TARANTOLA, A.: "Inversion of seismic reflection data in the acoustic approximation", GEOPHYSICS, vol. 49, 1984, pages 1259 - 1266, XP009173614, DOI: doi:10.1190/1.1441754
SIRGUE, L.; PRATT G.: "Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies", GEOPHYSICS, vol. 69, 2004, pages 231 - 248, XP002576516
FALLAT, M. R.; DOSSO, S. E.: "Geoacoustic inversion via local, global, and hybrid algorithms", JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, vol. 105, 1999, pages 3219 - 3230, XP012001005, DOI: doi:10.1121/1.424651
ARAYA-POLO, M.: "3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors", SCIENTIFIC PROGRAMMING, vol. 17, no. 1-2, 2009, pages 185 - 198
BROSSIER, R.: "Two-dimensional frequency-domain visco-elastic full waveform inversion: Parallel algorithms, optimization and performance", COMPUTERS AND GEOSCIENCES, vol. 37, no. 4, 2010, pages 444 - 455, XP028369279, DOI: doi:10.1016/j.cageo.2010.09.013
SUH, S.Y.: "Cluster programming for reverse time migration", THE LEADING EDGE, January 2010 (2010-01-01)
Attorney, Agent or Firm:
LOBATO, Ryan, L. et al. (CORP-URC-E2.4A.296, P.O. Box 218, Houston, TX, US)
Claims:
CLAIMS

1. A parallel computing system for full wavefield inversion of seismic data, comprising: a pool of computational units, called workers, each programmed to operate independently of each other and initiate and perform one or more computational tasks that arise in the course of the full wavefield inversion, and having an input stage adapted to receive information needed to perform a task, and an output stage; wherein the full wavefield inversion is organized into a series of parallel computational stages and serial computational stages, and each parallel stage is divided into a plurality of parallel computational tasks sized for a worker according to a selected job scale;

a central dispatcher processor programmed to:

(a) provide task queues where tasks are made available to the workers and where the computational units output results of a completed task, wherein dependencies between tasks are enforced;

(b) monitor and store information relating to a current state of the full wavefield inversion; and

a controller, interconnected with the central dispatcher and programmed with an optimization algorithm that decides whether to halt a computational task that has not converged after a predetermined number of iterations.

2. The system of claim 1, wherein parallel stages of the full wavefield inversion comprise a gradient computation stage and a line search stage.

3. The system of claim 2, wherein the computational stages further comprise at least one stacking stage, which may be parallel or serial, in which results of parallel computations from a previous parallel stage are summed.

4. The system of claim 3, wherein workers include logic and heuristics to enable them to determine an optimally sized parcel stack to sum, or to place partial stacking results back into an input queue, allowing results to be recursively accumulated.

5. The system of claim 2, wherein the gradient computation stage comprises forward model-simulation of predicted data followed by computation of a cost function measuring difference between predicted data and the seismic data to be inverted, followed by computation of a gradient of the cost function with respect to parameters of the model, wherein computation of the gradient comprises reverse model-simulation of predicted data.

6. The system of claim 1, wherein tasks are assembled in the central dispatcher before assignment in task queues such that any dependencies between tasks can be enforced.

7. The system of claim 1, wherein the central dispatcher, the workers, and the controller each comprise one or more CPUs with associated data storage.

8. The system of claim 7, wherein the controller comprises one or more management tools selected from a group consisting of a data visualization and analysis tool, an inversion control tool, and a workflow integration tool.

9. The system of claim 8, wherein the management tools are adapted to enable a human user to monitor the current state of the full wavefield inversion and make runtime decisions on matters including task definition and number of workers to be utilized.

10. The system of claim 9, wherein the management tools in combination with the central dispatcher are adapted to use said state information to schedule tasks and balance load among the workers, wherein task completion times are monitored and decisions made on when to stop or restart a task based on one or more heuristic or statistical analysis tools.

11. The system of claim 1, wherein the current state of the full wavefield inversion includes one or more of

number of seismic gathers to process, number completed, and number currently processing;

task handout and completion times; and

input parameters necessary to execute the tasks.

12. The system of claim 1, wherein tasks for parallel stages are defined by source shot.

13. The system of claim 1, wherein the central dispatcher processor is programmed to use run-time data accumulated on completed tasks to set a timeout value for a task that is still running.

14. The system of claim 1, wherein the controller is programmed to allow a user to monitor progress of the state of the inversion.

15. A seismic prospecting method for exploring for hydrocarbons, comprising:

obtaining seismic survey data;

processing the seismic survey data by full wavefield inversion to infer a subsurface model of velocity or other physical parameter, wherein the processing is performed on a system of parallel processors, called workers;

dividing the full wavefield inversion into a sequence of parallel stages and serial stages, and defining one or more computational tasks to be performed at each stage, wherein the computational tasks are parallel tasks for a parallel stage;

providing a pool of workers, each programmed to perform at least one of the computational tasks, and perform them independently and without knowledge of the other workers, and wherein a task is sized for a worker according to a selected job scale;

providing a central dispatcher unit, being a processor programmed to:

maintain task queues, where tasks are placed for pickup by the workers and where completed tasks are returned by the workers, and enforce dependencies between tasks;

monitor and store information relating to a current state of the full wavefield inversion; and

providing a controller unit, interconnected with the central dispatcher and programmed with an optimization algorithm that decides whether to halt a computational task that has not converged after a predetermined number of iterations.

16. The method of claim 15, wherein parallel stages of the full wavefield inversion comprise a gradient computation stage and a line search stage.

17. The method of claim 16, wherein the computational stages further comprise at least one stacking stage, which may be parallel or serial, in which results of parallel computations from a previous parallel stage are summed.

18. The method of claim 17, wherein workers include logic and heuristics to enable them to determine an optimally sized parcel stack to sum, or to place partial stacking results back into an input queue, allowing results to be recursively accumulated.

19. The method of claim 16, wherein the gradient computation stage comprises forward model-simulation of predicted data followed by computation of a cost function measuring difference between predicted data and the seismic data to be inverted, followed by computation of a gradient of the cost function with respect to parameters of the model, wherein computation of the gradient comprises reverse model-simulation of predicted data.

20. The method of claim 15, wherein tasks are assembled in the central dispatcher before assignment in task queues such that any dependencies between tasks can be enforced.

21. The method of claim 15, wherein the central dispatcher, the workers, and the controller each comprise one or more CPUs with associated data storage.

22. The method of claim 21, wherein the controller comprises one or more management tools selected from a group consisting of a data visualization and analysis tool, an inversion control tool, and a workflow integration tool.

23. The method of claim 22, wherein a human user uses the management tools to monitor the current state of the full wavefield inversion and make runtime decisions on matters including task definition and number of workers to be utilized.

24. The method of claim 23, wherein the management tools in combination with the central dispatcher are programmed to use said state information to schedule tasks and balance load among the workers, wherein task completion times are monitored and decisions made on when to stop or restart a task based on one or more heuristic or statistical analysis tools.

25. The method of claim 15, wherein the current state of the full wavefield inversion includes one or more of

number of seismic gathers to process, number completed, and number currently processing;

task handout and completion times; and

input parameters necessary to execute the tasks.

26. The method of claim 15, wherein tasks for parallel stages are defined by source shot.

27. The method of claim 15, wherein the central dispatcher unit is programmed to use run-time data accumulated on completed tasks to set a timeout value for a task that is still running.

28. The method of claim 15, wherein the controller unit is programmed to allow a user to monitor progress of the state of the inversion.

Description:
SCALABLE SCHEDULING OF PARALLEL ITERATIVE SEISMIC JOBS

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Patent Application 62/093,991 filed December 18, 2014 entitled SCALABLE SCHEDULING OF PARALLEL ITERATIVE SEISMIC JOBS, the entirety of which is incorporated by reference herein.

FIELD OF THE INVENTION

[0002] This disclosure relates generally to the field of seismic prospecting for hydrocarbons and, more particularly, to seismic data processing. Specifically the disclosure relates to a system or method for scalable and reliable scheduling for massive parallel computing of full wavefield inversion ("FWI") of seismic data to infer a physical property model, such as a seismic wave propagation velocity model, of the subsurface.

BACKGROUND

[0003] Geophysical inversion attempts to find a model of subsurface properties that optimally explains observed data and satisfies geological and geophysical constraints. See, e.g., Tarantola, A., "Inversion of seismic reflection data in the acoustic approximation," Geophysics 49, 1259-1266 (1984); and Sirgue, L., and Pratt G., "Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies," Geophysics 69, 231-248 (2004). Acoustic wave propagation velocity is determined by the propagating medium, and hence it is one such physical property for which a subsurface model is of great interest. There are a large number of well-known methods of geophysical inversion. These methods may be classified as falling into one of two categories, namely iterative inversion and non-iterative inversion. Non-iterative inversion may be used to mean inversion that is accomplished by assuming some simple background model and updating the model based on the input data; this method does not use the updated model as input to another step of inversion. For the case of seismic data, these methods are commonly referred to as imaging, migration (although typical migration does not update the model), diffraction tomography, or Born inversion. Iterative inversion may be used to mean inversion involving repeated improvement of the subsurface properties model such that a model is found that satisfactorily explains the observed data. If the inversion converges, the final model will better explain the observed data and will more closely approximate the actual subsurface properties. Iterative inversion may generally produce a more accurate model than non-iterative inversion, but may also be much more expensive to compute.

[0004] One iterative inversion method employed in geophysics is cost function optimization. Cost function optimization involves iterative minimization or maximization, with respect to the model M, of the value of a cost function S(M) selected as a measure of the misfit between calculated and observed data (the cost function is also sometimes referred to as the objective function), where the calculated data are simulated with a computer using a current geophysical properties model and the physics governing propagation of the source signal in a medium represented by that model. (A computational grid subdivides the subsurface region of interest into discrete cells, and the model consists of assigning a numerical value for at least one physical property, such as seismic wave propagation velocity, to each cell. These numerical values are called the model parameters.) The simulation computations may be done by any of several numerical methods, including but not limited to finite difference, finite element, or ray tracing.
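
A common illustrative choice of cost function (not mandated by this disclosure) is a least-squares misfit, S(M) = (1/2) * sum over sources and receivers of [d_calc(M) - d_obs]^2, which decreases as the simulated data d_calc(M) more closely match the observed data d_obs.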

[0005] Cost function optimization methods may be classified as either local or global. See, e.g., Fallat, M. R., Dosso, S. E., "Geoacoustic inversion via local, global, and hybrid algorithms," Journal of the Acoustical Society of America 105, 3219-3230 (1999). Global methods may involve computing the cost function S(M) for a population of models {M1, M2, M3, ...} and selecting a set of one or more models from that population that approximately minimizes S(M). If further improvement is desired, this new selected set of models may then be used as a basis to generate a new population of models that can again be tested relative to the cost function S(M). For global methods, each model in the test population can be considered an iteration, or, at a higher level, each set of populations tested can be considered an iteration. Well-known global inversion methods include Monte Carlo, simulated annealing, genetic, and evolution algorithms. Local cost function optimization may involve: (1) selecting a starting model M; (2) computer-simulating predicted data corresponding to the measured data and computing a cost function S(M); (3) computing the gradient of the cost function with respect to the parameters that describe the model; and (4) searching for an updated model that is a perturbation of the starting model in the negative gradient direction in multi-dimensional model parameter space (called a "line search") and that better explains the observed data. This procedure may be iterated by using the new updated model as the starting model and repeating steps (1)-(4). The process continues until an updated model is found that satisfactorily explains the observed data. Typically, this is taken to be when the updated model agrees with the previous model to within a preselected tolerance, or when another stopping condition is reached. Commonly used local cost function inversion methods include gradient search, conjugate gradients, and Newton's method.
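
Purely for illustration, the four steps above can be summarized in the following minimal sketch of a gradient-descent loop with a line search. The callables simulate(), cost(), and gradient() are hypothetical placeholders for the forward simulation, misfit evaluation, and gradient computation; nothing in this sketch is part of the patented system.

```python
# Minimal sketch of local cost-function optimization (steps (1)-(4) above).
# m0 is the starting model as a NumPy array; simulate(), cost(), and
# gradient() are hypothetical placeholders supplied by the caller.
import numpy as np

def local_inversion(m0, observed, simulate, cost, gradient,
                    step_sizes=(1e-3, 1e-2, 1e-1), tol=1e-6, max_iter=100):
    m = m0.copy()
    for _ in range(max_iter):                      # step (1): current model M
        predicted = simulate(m)                    # step (2): forward simulation
        s = cost(predicted, observed)              #           misfit S(M)
        g = gradient(m, predicted, observed)       # step (3): gradient of S w.r.t. M
        # step (4): line search along the negative gradient direction
        trials = [(cost(simulate(m - a * g), observed), a) for a in step_sizes]
        s_best, a_best = min(trials)
        if s_best >= s:                            # no improvement found
            break
        m_new = m - a_best * g
        if np.max(np.abs(m_new - m)) < tol:        # stopping condition
            return m_new
        m = m_new
    return m
```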

[0006] As discussed above, iterative inversion may be preferred over non-iterative inversion in some circumstances because it yields more accurate subsurface parameter models. Unfortunately, iterative inversion may be so computationally expensive that it would be impractical to apply it to some problems of interest. This high computational expense results from the fact that all inversion techniques require many compute-intensive forward, and sometimes also reverse, simulations. Forward simulation means computation of the data forward in time, e.g., the simulation of data in step (2) above. Reverse simulation means computation of the data in discrete time steps running backward in time. Back-propagation of the waveform may be used when a particularly efficient method, called the adjoint method, is used to compute the gradient of the cost function. See Tarantola, A., "Inversion of seismic reflection data in the acoustic approximation," Geophysics 49, 1259-1266 (1984). The compute time of any individual simulation is generally proportional to the number of sources to be inverted, and typically there are large numbers of sources in geophysical data. By encoding the sources, multiple sources may be simulated simultaneously in a single simulation. See U.S. Patent No. 8,121,823 to Krebs et al. The problem may be exacerbated for iterative inversion because the number of simulations that must be computed is proportional to the number of iterations in the inversion, and the number of iterations required is typically on the order of hundreds to thousands.

[0007] Fig. 1 illustrates how the data inversion work stream may be divided into parallel stages 11 and serial stages 12, where the parallel stages are inherently suitable to simultaneous computation on many parallel processors. The parallel stages are the stages in the process where many different sources (e.g., different source locations) are being simulated. For example, in what is termed the "gradient stage" in Fig. 1, each parallel computer may simulate predicted data for all receiver locations for one or more source shots, then compute the contribution to the cost function, and then the contribution to the gradient of the cost function, for its assigned source shots. Input quantities for each parallel computer include source and receiver locations, source parameters, and the measured data at the receivers for those sources. The serial stage shown following the gradient stage in Fig. 1 sums all of the gradient contributions. This gives a direction in parameter space, usually the negative of the gradient, in which to update the model. For the gradient stage, the simulations will be forward simulations to generate the predicted data to be used with the survey data to compute the cost function, but will also include reverse simulations if the adjoint method is used to compute the gradient of the cost function. In the line-search stage, predicted data are simulated in the direction given by the gradient for several different model update step sizes, and the step size generating the minimum cost function value (or maximum cost function value in the case of a cross-correlation cost function) is selected; the model is then incremented at the serial stage following the line-search stage. Thus, the line search involves several more forward simulations for each geophysical source. The parallel stages 11 in Fig. 1 graphically illustrate an additional challenge to iterative geophysical inversion, whereby delays in processing a single source will delay the final solution and result in low efficiency and utilization of computer hardware.
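
Purely as an illustration of this per-shot decomposition, the sketch below expresses one gradient stage followed by its serial stacking stage, with a local process pool standing in for the HPC workers. The per_shot_gradient() callable is a hypothetical placeholder that simulates one shot and returns its cost and gradient contributions; it is not part of the patented system.

```python
# Illustrative sketch of a parallel gradient stage followed by the serial
# summation stage of Fig. 1. A multiprocessing pool stands in for HPC workers.
from multiprocessing import Pool
import numpy as np

def gradient_stage(model, shots, per_shot_gradient, n_workers=8):
    with Pool(n_workers) as pool:
        # parallel stage: one task per source shot
        results = pool.starmap(per_shot_gradient,
                               [(model, shot) for shot in shots])
    # serial stage: sum the cost and gradient contributions over all shots
    total_cost = sum(c for c, _ in results)
    total_grad = np.sum([g for _, g in results], axis=0)
    return total_cost, total_grad
```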

[0008] Due to its high computational cost, iterative inversion typically requires a large high performance computing ("HPC") system in order to reach a solution in a practical amount of time. As described above, a single iterative inversion can be decomposed into many independent components and executed on hundreds of thousands of separate compute cores of an HPC system, thus reducing time to solution. Problem decomposition can be achieved at several levels: for a single forward or reverse simulation, the model domain can be spatially divided among computational cores, with each core communicating only the information at the edge of its subdomain to neighboring cores. This is a common parallel speedup strategy used in iterative and non-iterative inversion, as well as in other scientific computing domains. See, e.g., Araya-Polo, M., et al., "3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors," Scientific Programming 17(1-2), 185-198 (2009); Brossier, R., "Two-dimensional frequency-domain visco-elastic full waveform inversion: Parallel algorithms, optimization and performance," Computers and Geosciences 37(4), 444-455 (2010). However, the scalability of such methods is eventually limited by the overhead of sharing edge information, since the computational work done by each core decreases as the number of subdomains is increased to grow the parallel speedup.

[0009] Another parallel speedup method involves dividing an iterative inversion across separate forward or reverse simulations for each source to be inverted. As there are large numbers of sources in geophysical data that can be modeled separately, this approach enables many independent simulations to be mapped onto an HPC system without scalability concerns due to communication overhead. For traditional non-iterative seismic inversion methods, this parallelization scheme is trivial to implement, as the sources can simply be divided into separate simulations and run to completion. See, e.g., Suh, S.Y., et al., "Cluster programming for reverse time migration," The Leading Edge, January 2010. For iterative inversion, the separate simulations for each source must be combined to determine the gradient of the cost function S(M) for the entire model area, as well as the line search for the updated model, requiring efficient scheduling and load balancing among independent simulations to efficiently execute large inversions. Due to its low communication requirements, parallelization across geophysical sources exhibits excellent scalability and presents the main opportunity to implement iterative seismic inversions efficiently on massively parallel computing systems. (Scalability refers to the algorithm's ability to use more parallel processors on a single problem, with overall time to solution decreasing proportionally as more processors are added.)

[0010] The two parallelization strategies described above, parallelization of a single source simulation by domain decomposition and parallelization across separate seismic sources, can both be used concurrently to reduce time to solution. Attention is now turned to implementation methods for the second parallelization strategy, using separate seismic sources, for which several different methods achieving parallel speedup of iterative geophysical inversion have been published.

[0011] In typical practice, HPC systems use message passing (e.g., MPI) or shared memory (e.g., OpenMP, OpenACC, Cilk Plus, Intel Threading Building Blocks, Partitioned Global Address Space (PGAS) languages) techniques to decompose a scientific computing problem across many compute cores. This is a common method for parallelization of seismic simulations for a single geophysical source, and it can be used to parallelize across separate sources as well. However, this method has several disadvantages when applied to large-scale iterative seismic inversions, because parallelization via message passing or shared memory encapsulates the entire inversion as a single HPC job. On current HPC systems, failure of a single thread, MPI rank, or compute core may cause the entire job to fail, thereby losing progress to date. Regular checkpointing to disk may be required, as long geophysical inversion run times coupled with typical HPC failure rates make at least one failure likely during a single inversion. Another limitation of encapsulation into a single job is that it does not allow the system-level scheduler to interject other work onto nodes that are temporarily unused during inversion. Since compute demands can change drastically during the course of an iterative geophysical inversion (the serial stages of Fig. 1 require a very small amount of compute power compared to the large demand of the parallel stages), and HPC systems generally lack job-level preemptive schedulers (preemptive scheduling would allow currently running jobs to be paused and switched for a more urgent job), iterative geophysical inversion can, without novel approaches, result in very inefficient use of the system. If any additional capability needs to be added to the inversion workflow, it must be integrated as additional ranks or threads (ranks and threads are common ways to specify a single serial part of a parallel program) and coordinated with existing simulations, which can be time consuming. Finally, changing the number of compute nodes (a node is a grouping of one or more processing units in close proximity to each other within an HPC system) used for the inversion during runtime, or running the inversion across multiple heterogeneous sets of HPC nodes, is not possible using this approach.

[0012] An alternative to the above approaches is to submit each separate geophysical source simulation as a separate HPC job, and to allow the system batch job scheduling software (e.g., Moab Cluster Suite, SLURM, PBS Professional, Condor, etc.) to manage these jobs. Separate jobs would share information via files on a shared disk, and workflow support packages like the Condor Directed Acyclic Graph Manager (DAGMan) or Pegasus can coordinate complex workflows with many dependent tasks. This approach is simple to set up and implement, but it suffers from the slow scheduling times of batch scheduling software, which is typically not equipped to handle the tens of thousands to hundreds of thousands of separate jobs that a geophysical inversion may inject into the system each iteration. Additionally, batch schedulers tend to prioritize overall system utilization instead of the overall workflow runtime, as they are designed to extract maximum use of an HPC system that has small numbers (e.g., tens) of large workloads queued up to run.

[0013] What is needed is a method to efficiently schedule and execute many separate geophysical source simulations within the context of a single iterative seismic inversion. Such a method should balance the overall iteration runtime against system utilization, allow efficient scaling of the inversion to many cores in order to make processing of realistic seismic surveys possible, and provide reliability capabilities to enable inversion runs lasting multiple days or weeks. There are a variety of difficulties in delivering such a system. For example, at high levels of parallelism (100k+ cores, where a core is an independent processor within the HPC system capable of executing a single serial part of a parallel program), keeping a consistent global picture of the computation state is expensive; collecting information across all the tasks can affect the overall performance of the system. Additionally, at such high levels of parallelism the hardware may be inherently unreliable, meaning that one of the parallel tasks is likely to fail over long execution times. Recovery from failure is not trivial, since collection of the global state needed for recovery is expensive, as noted above. Further, load balancing and smart scheduling become increasingly important as the level of parallelism increases, because the "jitter" between task run times (e.g., the unevenness of tasks shown in Fig. 1) becomes a bigger percentage of the overall runtime.

SUMMARY

[0014] This disclosure concerns a system and method for scalable and reliable scheduling for massive parallel computing of iterative geophysical inversion of seismic data to infer a physical property model, such as a seismic wave propagation velocity model, of the subsurface. In one embodiment, the invention is a parallel computing system for full wavefield inversion ("FWI") of seismic data, comprising: a pool of computational units, called workers, each programmed to operate independently of each other and initiate and perform one or more computational tasks that arise in the course of the full wavefield inversion, and having an input stage adapted to receive information needed to perform a task, and an output stage, wherein the full wavefield inversion is organized into a series of parallel computational stages and serial computational stages, and each parallel stage is divided into a plurality of parallel computational tasks sized for a worker according to a selected job scale; a central dispatcher processor programmed to (a) provide task queues where tasks are made available to the workers and where the computational units output results of a completed task, wherein dependencies between tasks are enforced, and (b) monitor and store information relating to a current state of the full wavefield inversion; and a controller, interconnected with the central dispatcher and programmed with an optimization algorithm that decides whether to halt a computational task that has not converged after a predetermined number of iterations.

[0015] In another embodiment, the invention is a seismic prospecting method for exploring for hydrocarbons, comprising: obtaining seismic survey data; processing the seismic survey data by full wavefield inversion to infer a subsurface model of velocity or other physical parameter, wherein the processing is performed on a system of parallel processors, called workers; dividing the full wavefield inversion into a sequence of parallel stages and serial stages, and defining one or more computational tasks to be performed at each stage, wherein the computational tasks are parallel tasks for a parallel stage; providing a pool of workers, each programmed to perform at least one of the computational tasks, and to perform them independently and without knowledge of the other workers, and wherein a task is sized for a worker according to a selected job scale; providing a central dispatcher unit, being a processor programmed to maintain task queues, where tasks are placed for pickup by the workers and where completed tasks are returned by the workers, to enforce dependencies between tasks, and to monitor and store information relating to a current state of the full wavefield inversion; and providing a controller unit, interconnected with the central dispatcher and programmed with an optimization algorithm that decides whether to halt a computational task that has not converged after a predetermined number of iterations.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The advantages of the present invention are better understood by referring to the following detailed description and the attached drawings, in which:

Fig. 1 is a schematic diagram showing an example of parallel and serial stages of computation on an FWI application;

Fig. 2 is a schematic diagram of the seismic computation framework of one embodiment of the present invention;

Fig. 3 is a flow chart describing how a seismic problem may be divided and executed via the central dispatcher and workers in one embodiment of the present inventive method; and

Fig. 4 schematically illustrates mapping dependent tasks (workflows) of a computation with parallel and serial stages onto the parallel computation framework according to one embodiment of the present invention.

[0017] The invention will be described in connection with example embodiments. However, to the extent that the following detailed description is specific to a particular embodiment or a particular use of the invention, this is intended to be illustrative only, and is not to be construed as limiting the scope of the invention. On the contrary, it is intended to cover all alternatives, modifications and equivalents that may be included within the scope of the invention, as defined by the appended claims.

DETAILED DESCRIPTION

[0018] The present invention implements a distributed parallel framework for scalable and reliable scheduling of iterative geophysical inversions across large HPC systems. The invention decentralizes and separates computation, global state, and control mechanisms within the system, allowing scalability to hundreds of thousands of cores while still providing the ability to apply smart scheduling, load-balancing, and reliability techniques specific to seismic inversion problems.

[0019] Data parallelization means dividing the data processing by source shot among multiple processors. Data parallelization is inherently scalable. However, for full wavefield inversion ("FWI") at a large scale, serial sections (gradient stacking) and coordination of the inversion (how many shots to select, when finished, etc.) can complicate the pure data parallelization (across shots) and compromise scalability. The present invention avoids such problems.

[0020] Fig. 2 depicts a system composed of many independent "workers" 21 that retrieve tasks from a central dispatcher 22. Thinking of them as animate objects for ease of explanation, the workers themselves may only be aware of the dispatcher and the task they are performing. They may not be aware of how many other workers there are in the system or of the specific tasks those other workers are performing. It may be prohibitively costly to provide this information to the workers, and one of the objectives of the present invention is to avoid the need for doing so. This allows independent workers to be dynamically added to or removed from the system during execution, taking advantage of temporarily available resources. Stated more precisely, the individual workers may add or remove themselves. Individual workers can also fail without affecting the rest of the system, allowing computation to continue. Dynamic addition and/or removal of resources while maintaining the ability to continue on failure are advantages of the many-task computing approach over more traditional HPC parallelization methods like the message passing interface (MPI).

[0021] As examples of the foregoing characteristics of independent workers, a worker might remove itself from the system if, for example, it was unable to run any jobs due to hardware failure. Another reason might be to give up its HPC nodes back to the system, possibly for another user's job on a shared system. Nodes may be temporarily borrowed, or returned, when available and when advantageous. A policy might be set whereby if a worker has no work for a specified length of time, the worker is shut down and the nodes returned to the system. Alternatively, this decision can be made manually by the user, observing information compiled by the central dispatcher on the status of the inversion. Regarding task failures, the user may specify at the beginning of the inversion how many shot simulations will be allowed to fail before the inversion will be halted.

[0022] The central dispatcher can be thought of as a passive responder to the active initiating of the independent workers, but the central dispatcher provides the jobs, and the data and information to perform them, and monitors the status of all the jobs being performed by the workers, e.g., the state of the inversion process, for example how many shot simulation failures have occurred. Viewed another way, the central dispatcher may be thought of as a centralized information store that specifies the structure of the inversion (i.e., how tasks are connected to each other) and keeps track of the progress of work. The central dispatcher does not initiate action; it only responds to requests. Workers function in the opposite way: they initiate actions but have limited information. In some published descriptions of conventional parallel processing of seismic inversion problems, there is a central controller that schedules seismic inversion tasks and hands out work to the parallel processors. That central controller stores information about where the tasks are, but, unlike in the present inventive method, it initiates performance of those tasks by its version of the workers. Even though certain tasks are decentralized in the present invention compared to conventional parallel processing, this pushing of more responsibility to the workers does not apply to every aspect of what the workers do. MPI (probably the most popular parallelization strategy, and not just for seismic jobs) requires the capability for a message to be addressed from one parallel "worker" to another. That means each worker needs to know the address that connects it to every other worker and, at a minimum, how many workers exist. A feature of the present invention is that a worker may not even "know" that there are other workers, much less who they are and what they are doing. A single word that describes this aspect of a worker in the present invention, as well as the initiating aspect, is independent.

[0023] Additionally, workers can execute on heterogeneous hardware, allowing a single type of parallel task to be executed on several different architectures simultaneously. This feature allows the framework to operate on a single seismic problem at high parallelism in environments where several generations of different hardware exist. Similarly, a single inversion can be performed on a heterogeneous system where some components have traditional processors while others have accelerators (graphics processing units, or GPUs; field programmable gate arrays, or FPGAs; etc.).

[0024] The central dispatcher, shown as 22 in the middle of Fig. 2 and as 47 in Fig. 4, contains several types of task queues (shown as stacked white boxes in the center of the dispatcher), allowing dependencies between tasks to be enforced. For example, one task might filter or pre-process seismic shot data while a second task, dependent on the first finishing, would take the processed seismic data and simulate propagation through the subsurface. Each independent worker is programmed with a list of task types it can execute, allowing dedicated heterogeneous hardware to be used for a certain stage in the processing flow.
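
Purely to make the queue structure concrete, the following toy in-memory sketch keeps one named queue per task type, so that dependencies are expressed simply by which queue a worker reads from and which it writes to. A production dispatcher would persist this state and serve it over a network; this class and its methods are illustrative assumptions, not the patented design.

```python
# Toy in-memory dispatcher sketch: one named queue per task type.
from collections import deque

class Dispatcher:
    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}

    def put(self, queue_name, work_item):
        """Add a work item (a dict of parameters) to a named queue."""
        self.queues[queue_name].append(work_item)

    def get(self, queue_name):
        """Hand out the next work item, or None if the queue is empty."""
        q = self.queues[queue_name]
        return q.popleft() if q else None

    def get_batch(self, queue_name, max_items=1):
        """Hand out up to max_items work items at once (possibly empty)."""
        q = self.queues[queue_name]
        return [q.popleft() for _ in range(min(max_items, len(q)))]

    def items_remaining(self, queue_name):
        """Report how many items are still waiting in a queue."""
        return len(self.queues[queue_name])
```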

[0025] The flow chart in Fig. 3 shows how a seismic problem can be executed in one embodiment of the present invention, and details examples of the type of interactions that may happen between the central dispatcher and the workers. In the presence of many workers, this approach may allow seismic problems to be parallelized across the available work items.

[0026] Workers may not be aware of task dependencies; they may simply have an input pool of work from which they grab their assignments and an output pool in which they place the results. An example of a task dependency in FWI is that the line search cannot begin until the gradient is computed. The central dispatcher in Fig. 4's example shows four task queues. The one at the top functions as an input queue. As the workers complete tasks, they store the task results in the second queue, which functions as an output queue. But the output queue may become the input queue for a next task that is dependent on it, as shown in Fig. 4, and so on. Thus, the central dispatcher does not move work between different queues in this example; the workers do. The worker that has finished one task and posted it in an output queue may or may not be qualified to perform the next task in the FWI sequence. For example, one task type in FWI may have been written for a GPU, while the rest of the tasks run only on regular processors. Thus the GPU workers will look only at one queue in the system, while the regular processors may take work from whichever queue offers it first. If the central dispatcher were instead programmed to have the responsibility of assigning tasks to the workers, it would need to know all the workers present in the system, which workers have stopped working, which ones were killed by the user, and when additional workers have been added to the system. When failures occur, workers can disappear quietly, meaning that to maintain an accurate list of workers to assign work to, the central dispatcher would have to poll the workers to make sure they are still present. This type of operation may not scale well.
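
To illustrate this pull-based pattern, the sketch below (building on the toy Dispatcher above) shows a worker that knows only the dispatcher, the task types it can run, and each type's input and output queue names; it initiates every request itself and has no knowledge of other workers. The polling loop and all names are illustrative assumptions; a real worker would also apply the idle-shutdown policy mentioned in paragraph [0021].

```python
# Sketch of an independent worker that pulls work from named queues.
import time

def worker_loop(dispatcher, task_handlers, queue_map, idle_sleep_s=5.0):
    """task_handlers: {task_type: callable(work_item) -> result dict}
    queue_map: {task_type: (input_queue_name, output_queue_name)}"""
    while True:
        did_work = False
        for task_type, handler in task_handlers.items():
            in_q, out_q = queue_map[task_type]
            item = dispatcher.get(in_q)      # worker pulls; dispatcher only responds
            if item is None:
                continue
            result = handler(item)           # e.g., simulate one shot gradient
            dispatcher.put(out_q, result)    # output queue feeds the dependent task
            did_work = True
        if not did_work:
            time.sleep(idle_sleep_s)         # no work this worker is able to run
```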

[0027] It may be desirable to run several big inversion problems simultaneously, each one loaded into its own central dispatcher, but with a common pool of independent workers. The user can control which inversion problem will get priority on the workers, thus scaling up some problems and scaling down others.

[0028] By connecting work pools together inside the central dispatcher, complex workflows that involve many dependent computational tasks can be mapped onto the proposed parallel computation system (Fig. 4). This allows the system user to consider the serial task dependencies (work flow) separately from the parallelism with which it will be executed. In Fig. 4, 41 represents the work flow. The work flow is programmed, in parts, into the workers. Each worker may know a single "arrow" (edge) in the work flow graph for the tasks it knows how to run; it does not know the work flow for tasks it is not supposed to run. So, the work flow may be distributed among the workers, but each worker receives only the pieces it needs to know, e.g., the pieces it is capable of handling.

[0029] As a typical example from an FWI problem, 42 may represent a task that filters the input data and removes all frequencies above 30 Hz. The arrow going from task 42 to the right represents mapping of task 42 onto any number of workers 46. Other tasks are similarly mapped to workers 46. At task 43, the velocity model is prepared for simulation by chopping up the model into apertures just around the shot area. There are two paths from task 42 to task 43 because (in this illustrative example) the user decided to have larger apertures for shots at the edge of the survey, which sets up two types of tasks (shots in the center go to one box 43, shots on the edge get processed at the other box 43). Task 44 may be simulation and gradient computation, and task 45 may be stacking each shot gradient into the master gradient.

[0030] It may be noted that the workers performing task 44 in the work flow, the simulation and gradient computations, will need to know various parameters in order to execute, for example the source location, other source parameters, and receiver locations. Some of these parameters are static (meaning they are known at the beginning of the inversion), and some are dynamic (calculated by a preceding task). Each work item stored in the central dispatcher is essentially such a "bag" of parameters. The launcher of the inversion may populate all the static parameters, including source/receiver locations, source signature files, and parameters that specify the physics of the simulation, into an appropriate task such as, in this case, task 44. For dynamic parameters, when the worker returns this bag of data (work item) to the central dispatcher to be put in the output queue, the central dispatcher can delete, modify, or add parameters in the work item.
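
As an illustration only, such a "bag" of parameters might look like the following; every field name, file name, and value here is hypothetical and not taken from this disclosure.

```python
# Hypothetical work item for a shot-gradient task: a plain dict of parameters.
work_item = {
    # static parameters, populated by the launcher before the inversion starts
    "source_location": (451200.0, 6781300.0, 5.0),
    "receiver_locations_file": "receivers_shot_0423.h5",
    "source_signature_file": "wavelet_30hz.sgy",
    "physics": {"kernel": "acoustic", "max_frequency_hz": 30.0},
    # dynamic parameters, added or modified by the dispatcher as tasks complete
    "velocity_aperture_file": None,   # filled in by the aperture-preparation task
    "attempts": 0,                    # incremented if the task is re-run
}
```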

[0031] The type of work flow described above, with dependent tasks, is typical not only of FWI but of seismic processing in general, and Fig. 4 shows how a work flow of tasks dependent on each other may be mapped onto the dispatcher/worker model of the present invention. The top horizontal arrow shows that a parallel piece of task 42 in the work flow gets executed by a worker. The bottom horizontal arrow shows that the line connecting tasks 43 and 44 in the work flow is represented by a queue in the central dispatcher 47. In other words, that queue holds the output of task 43, which becomes the input for task 44. As explained previously, the mapping indicated in Fig. 4 is programmed into the software installed in the central dispatcher and the software installed in the workers, although the workers may not know all the connections, only the connections for the tasks they know how to execute. A work flow (or any directed acyclic graph, to use mathematical terminology) may be input into the dispatcher by specifying which queues are the input and output queues for every task. There may be a configuration file that specifies this for every task. For example, for the shot gradient simulation task 44, input may come from a queue called "sim in" and output may be put in a queue called "sim out". Dependent task 45 may be a stacking operation that needs to grab shot gradients from the simulation task, so its input queue will be "sim out" and its output queue may be named "grad out", which in the end may contain the stacked gradient. Workers may be programmed to know only the input and output queue names for the task types they know how (i.e., are programmed) to run. Information such as which worker will pick up the work after they put it in an output queue may not be given to them.
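
Such a configuration might be expressed, purely for illustration, as a mapping from task type to input and output queue names; the directed acyclic graph is then implied by queues shared between tasks. Only the queue names "sim in", "sim out", and "grad out" come from the example above; the task names and the earlier-stage queue names are hypothetical.

```python
# Illustrative per-task configuration: each task type names only its input and
# output queues; tasks are chained by sharing queue names.
workflow_config = {
    "filter_shot_data":  {"input": "filter in",   "output": "aperture in"},
    "prepare_aperture":  {"input": "aperture in", "output": "sim in"},
    "shot_gradient_sim": {"input": "sim in",      "output": "sim out"},
    "stack_gradients":   {"input": "sim out",     "output": "grad out"},
}

# A worker receives only the entries for task types it can run, e.g. a GPU
# worker that runs only the simulation task:
gpu_worker_config = {"shot_gradient_sim": workflow_config["shot_gradient_sim"]}
```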

[0032] The distributed nature of the system and the design of the central dispatcher allow for several performance and reliability improvements that are crucial to execution of seismic inversion at large scales. Using the present state of the ongoing inversion as monitored by the central dispatcher, global progress of the application can be tracked and displayed to the user, some of whose various possible interactions with the FWI system are represented by the management tools 48 in Fig. 4 (also shown as panel 23 on the right side of Fig. 2). Data analysis and visualization tools can access the central dispatcher state and display runtime statistics, enabling debugging and performance analysis of complex parallel applications. Autonomous tools can make runtime decisions on how many items of work to inject into the work pools in the central dispatcher, effectively starting the computation. The ability to affect the system dynamically as it is running is of great value to seismic inversion, as optimization-style algorithms are by definition data driven and the amount of computation needed cannot be defined prior to execution. An example of this might be dynamically changing timeouts for tasks (how long the system will wait for a worker to complete a task before giving up) based on the statistical distribution of previous runs. Management tools may be user controlled or automated, or a combination of both. Just one example: the system might send the user an e-mail when the inversion reaches a certain stage.
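
One possible way to derive such a dynamic timeout, sketched here only as an illustration, is to base it on the statistical distribution of completed runtimes for the same task type. The particular rule (median plus a multiple of the interquartile range, with a floor) and its parameters are assumptions, not values specified by this disclosure.

```python
# Illustrative timeout heuristic driven by the runtime history the central
# dispatcher already records for completed tasks of the same type.
import statistics

def task_timeout(completed_runtimes, k=3.0, floor_seconds=600.0):
    if len(completed_runtimes) < 10:      # too little history: fall back to the floor
        return floor_seconds
    q1, med, q3 = statistics.quantiles(completed_runtimes, n=4)
    # median plus k times the interquartile range, never below the floor
    return max(floor_seconds, med + k * (q3 - q1))
```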

[0033] Another application of the invention is to parallelize the "stacking" stage in seismic processing. Stacking combines results calculated for portions of the seismic survey into a full-survey result. Although computationally simple, at high levels of parallelism stacking can drag out the overall inversion time if done serially. Using the system described above, parallelization can be achieved simply by adding additional workers into the system to perform the stacking task. "Partial stacks" from each parallel stacker can then be further combined into the final result. To optimize "reduce" operations such as stackers, several other modifications to the worker-dispatcher communications can be made, including the following, which also give some idea of the capabilities that may be programmed into the independent workers: (A) workers may place partial results back on their input work pools, allowing results to be recursively accumulated; and/or (B) workers may request multiple items to stack before producing a partial result, reducing stacking time by reducing disk I/O; the number of items gathered before producing a partial result can be held static, can be based on a heat up/cool down heuristic, or can be based on the number of available items to be stacked in the central dispatcher state.
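
The following sketch combines both modifications (A) and (B) using the toy Dispatcher from earlier; the batching, the recursion, and the way the worker decides the stack is final are simplified assumptions, not the patented logic.

```python
# Batch/recursive stacking worker sketch: request up to `batch` gradients at
# once, sum them, and either re-queue the partial stack (recursion) or emit
# the result when the input queue appears drained.
import numpy as np

def stacking_worker(dispatcher, in_queue="sim out", out_queue="grad out", batch=8):
    while True:
        items = dispatcher.get_batch(in_queue, max_items=batch)
        if not items:
            break                                      # nothing left to stack
        partial = np.sum([item["gradient"] for item in items], axis=0)
        if dispatcher.items_remaining(in_queue) > 0:
            # modification (A): recurse by re-queueing the partial stack
            dispatcher.put(in_queue, {"gradient": partial})
        else:
            dispatcher.put(out_queue, {"gradient": partial})
```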

[0034] In some of its embodiments, the present invention includes one or more of the following features:

(1) Application of a distributed, loosely coupled, many-task computing paradigm to seismic inversion algorithms on HPC systems, composed of: (a) independent workers that are not aware of each other's existence and that either are assigned tasks by a master controller (represented by "Inversion Control" among the management tools in Fig. 4) or request tasks from the master controller; (b) a master state storage entity (the central dispatcher) that separates computation state (e.g., progress of inversion; tasks completed, to be done, etc.) from the province of the workers; because it serves only as a state storage mechanism, the central dispatcher may accept tens of thousands of workers and scale a single inversion to 100k+ cores of an HPC system; and/or (c) one or more controllers/monitors that can affect computation state at runtime and drive the seismic inversion.

(2) Recording the state of the iterative inversion process in the central dispatcher for purposes of applying scheduling, load balancing, and error recovery algorithms, where the state can include, but is not limited to: (a) the number of seismic gathers to process, the number completed, and the number currently processing; (b) task handout and completion times; and/or (c) all input parameters necessary to execute the tasks.

(3) Using that state to perform the following tasks to optimize performance and increase reliability:

(a) (may be performed by the central dispatcher) Restart stopped inversions at the exact point they stopped; the design of the central dispatcher and the way tasks are assigned allows this to happen without the need for additional logic.

(b) (may be performed by the central dispatcher) Use a history of task runtimes to set task timeout values, where tasks that do not return a result past a certain threshold (timeout) are either re-run or ignored. If task runtimes are not known or depend on dynamic information, setting timeouts is difficult (too short stops legitimate work, too long extends the wait time to proceed with computation), but since the central dispatcher has all the task runtimes, a variety of heuristic rules based on statistical analysis may be used to base timeout values on previous task runtimes.

(c) Decide whether to re-run timed-out tasks, or to proceed without waiting for all results, depending on a variety of factors, including: (i) skip unfinished work (typically performed by a master controller, among the management tools, that monitors the inversion, e.g., monitors inversion state information compiled by the central dispatcher): during the reduction stage (see Fig. 1), the central dispatcher state can be used to determine exactly how many items are still left to stack, and if the inversion is waiting on only a small percentage of tasks, the master controller may choose to proceed forward for performance reasons without waiting for all tasks to finish; in the case of seismic processing, if a few gathers out of a large survey are not completed, and they are appropriately separated spatially (the central dispatcher contains this information too), they can simply be ignored; and/or (ii) re-run unfinished work (the re-run logic typically is programmed in the central dispatcher, which keeps track of how many times a work item has been re-run and decides whether it should be made available to the workers again): failure mechanisms that cause many tasks to time out at once likely result from a transient hardware event, and because the central dispatcher contains information about when failures occurred, failures close in time can be flagged and re-run as soon as they are identified; similarly, if a heuristic installed in the central dispatcher determines that the number of failures would have a significant impact on result fidelity if the failures were skipped, the failed tasks can be re-run. (A sketch of the skip-or-proceed decision appears after this list.)

(d) (may be performed by management tools used to read, analyze, and/or visualize information compiled and stored in the central dispatcher) Allow users to monitor progress and intermediate values of their inversions from a variety of visualization tools using the state of the central dispatcher; this includes, but is not limited to, the number of tasks completed in various stages of the inversion, parameters calculated from each resulting task, and monitoring convergence of the optimization algorithm.

(4) Parallel reduction techniques (most notably stacking per-gather results into a final result) that utilize the above-described framework: (a) parallelization of stacking by introducing multiple stacker workers, or multiple levels of stacking; (b) recursive stacking by placing partially stacked results back on the input work pool of the stacking task, to be stacked again; and/or (c) batch stacking by requesting multiple inputs at once, reducing disk I/O and improving stacking time; the threshold for requesting multiple inputs may be adjusted either by a heat up/cool down heuristic or based on the state in the central dispatcher.
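
Purely as an illustration of the skip-unfinished-work decision in item (3)(c)(i), the following sketch proceeds without stragglers only when the missing fraction of gathers is small and the outstanding gathers are spatially well separated; the thresholds and the pairwise-distance test are assumptions, not values specified by this disclosure.

```python
# Illustrative skip-or-proceed decision for the reduction stage.
def proceed_without_stragglers(n_total, n_done, straggler_positions,
                               max_missing_fraction=0.01, min_separation_m=2000.0):
    missing_fraction = (n_total - n_done) / n_total
    if missing_fraction > max_missing_fraction:
        return False                    # too many gathers missing: re-run instead
    # require stragglers to be spatially separated so no survey area is starved
    for i, p in enumerate(straggler_positions):
        for q in straggler_positions[i + 1:]:
            dist = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
            if dist < min_separation_m:
                return False
    return True
```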

[0035] The foregoing description is directed to particular embodiments of the present invention for the purpose of illustrating it. It will be apparent, however, to one skilled in the art, that many modifications and variations to the embodiments described herein are possible. All such modifications and variations are intended to be within the scope of the present invention, as defined by the appended claims.