

Title:
METHOD AND APPARATUS FOR DETERMINING COMPILER PARAMETERS SETTINGS
Document Type and Number:
WIPO Patent Application WO/2020/083487
Kind Code:
A1
Abstract:
An apparatus and method for determining compiler parameters settings for program optimization is described. A representation of a program is generated from the program's sources, the representation including data regarding the program in relation to a performance metric. A system including an algorithm having at least some trainable weights is used to determine compiler parameters from the representation. The system is trained using an exploration strategy which avoids the need for a training set with optimized parameters.

Inventors:
AIT AOUDIA FAYÇAL (FR)
ENRICI ANDREA (FR)
Application Number:
PCT/EP2018/079222
Publication Date:
April 30, 2020
Filing Date:
October 24, 2018
Assignee:
NOKIA SOLUTIONS & NETWORKS OY (FI)
International Classes:
G06F8/41
Other References:
CUMMINS CHRIS ET AL: "End-to-End Deep Learning of Optimization Heuristics", 2017 26TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), IEEE, 9 September 2017 (2017-09-09), pages 219 - 232, XP033243793, DOI: 10.1109/PACT.2017.24
LI FENGQIAN ET AL: "Feature Mining for Machine Learning Based Compilation Optimization", 2014 EIGHTH INTERNATIONAL CONFERENCE ON INNOVATIVE MOBILE AND INTERNET SERVICES IN UBIQUITOUS COMPUTING, IEEE, 2 July 2014 (2014-07-02), pages 207 - 214, XP032697354, DOI: 10.1109/IMIS.2014.26
GRIGORI FURSIN ET AL: "Milepost GCC: Machine Learning Enabled Self-tuning Compiler", INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, vol. 39, no. 3, 1 June 2011 (2011-06-01), pages 296 - 327, XP055114871, ISSN: 0885-7458, DOI: 10.1007/s10766-010-0161-2
Attorney, Agent or Firm:
IANOS, Anca-Marina (FR)
Claims:
CLAIMS

1. An apparatus (100, 200) comprising means configured to perform:

generating (110) a representation (120, 220) of a program from the program’s sources (130, 230), wherein the representation includes data regarding the program in relation to a performance metric; and

determining (140), using a system including an algorithm having at least some trainable weights, compiler parameters (s) from the representation (120, 220), wherein the compiler parameters (s) are configured such that compiling the program’s sources (130, 230) using the compiler parameters (s) produces a program executable (160, 260) optimized with respect to the performance metric.

2. The apparatus of claim 1, wherein the means are further configured to perform:

compiling (150) the program’s sources using the compiler parameters (s).

3. The apparatus of claim 1 or 2, wherein the means are further configured to perform:

generating the representation (120, 220) of the program from the program’s sources (130, 230) from a compiler, wherein the representation (120, 220) is an intermediate representation.

4. The apparatus of claim 3, wherein:

the representation (120, 220) is at least one representation selected from the list of: memory exclusion graph, control-flow graph, data-flow graph, single static assignment form, and program dependence graph.

5. The apparatus of any preceding claim, wherein:

the performance metric is at least one metric selected from the list of: memory footprint, execution time, processing throughput, and power consumption.

6. The apparatus of any preceding claim, wherein the means are further configured to perform: training (510) of the system, comprising:

providing (512) a plurality of training program sources (230);

generating (514) a representation (220) of each training program from each training program’s sources (230), wherein the representation includes data regarding that training program in relation to a performance metric;

selecting (516) a set of training program sources at random from the plurality of training program sources (230);

determining (518), using the system, compiler parameters (s) for each training program from the representation of each training program in the set;

compiling (522) each training program in the set from that training program’s sources using the determined compiler parameters (s) modified by a perturbation (ε);

adjusting (526) the trainable weights of the system using a loss function pertaining to the performance metric.

7. The apparatus of claim 6, wherein the means are further configured to perform training (510) of the system, comprising:

determining (520) a set of perturbations (ε), according to a distribution, wherein each training program in the set is compiled using the determined compiler parameters (s) modified by a perturbation (ε) in the set of perturbations;

determining (524) a loss value (l) pertaining to the performance metric for each compiled program by benchmarking;

adjusting (526) the trainable weights of the system using a loss function determined from the loss values (l) of the compiled programs in the set.

8. The apparatus of claim 6 or 7, wherein:

the trainable weights of the system are adjusted using stochastic gradient descent or a variant thereof.

9. The apparatus of any preceding claim, wherein the means comprises:

at least one processor (410); and

at least one memory (420) including computer program code (430), the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.

10. A method (500) comprising:

generating (502) a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric; and

determining (504), using a system including an algorithm having at least some trainable weights, compiler parameters from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

11. The method of claim 10, further comprising:

generating (508) the representation of the program from the program’s sources using a compiler, wherein the representation is at least one intermediate representation selected from the list of: memory exclusion graph, control-flow graph, data-flow graph, single static assignment form, and program dependence graph.

12. The method of claim 10 or 11, wherein:

the performance metric is at least one metric selected from the list of: memory footprint, execution time, processing throughput, and power consumption.

13. The method of any of claims 10 to 12, further comprising:

training (510) the system, comprising the steps of:

providing (512) a plurality of training program sources;

generating (514) a representation of each training program from each training program’s sources, wherein the representation includes data regarding that training program in relation to a performance metric;

selecting (516) a set of training program sources at random from the plurality of training program sources;

determining (518), using the system, compiler parameters for each training program from the representation of each training program in the set;

compiling (522) each training program in the set from that training program’s sources using the determined compiler parameters modified by a perturbation;

adjusting (526) the trainable weights of the system using a loss function pertaining to the performance metric.

14. The method of claim 13, wherein training (510) of the system further comprises:

determining (520) a set of perturbations (ε), according to a distribution, wherein each training program in the set is compiled using the determined compiler parameters (s) modified by a perturbation (ε) in the set of perturbations;

determining (524) a loss value (l) pertaining to the performance metric for each compiled program by benchmarking;

adjusting (526) the trainable weights of the system using a loss function determined from the loss values (l) of the compiled programs in the set.

15. A computer readable storage medium storing instructions which, when executed by a computer, cause the computer to perform a method (500), the method comprising:

generating (502) a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric; and

determining (504), using a system including an algorithm having at least some trainable weights, compiler parameters from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

Description:
Title

METHOD AND APPARATUS FOR DETERMINING COMPILER PARAMETERS

SETTINGS

Technical Field

[001] Various example embodiments relate generally to methods and apparatus for determining compiler parameters settings for program optimization.

Background

[002] Program sources, also known as source code, describe, in a humanly understandable language, the operations that a program performs. However, these program sources cannot typically be executed by the targeted hardware or virtual machine. Therefore, an executable program must be generated from the program sources through a compilation process. This is typically done by a compiler, which is software that takes as input the sources as well as parameters for setting the compilation process, and produces an executable program. Some of these parameters relate to the performance optimization of the program, and are critical for achieving the desired performance requirements.

[003] The setting of these parameters is typically done through benchmarking and hand-tuning, which requires a significant amount of time and resources. The performance of the executable program is affected by the parameter settings input to the compiler.

[004] Often it is desirable to optimize the performance of the executable program with respect to a performance metric, such as memory footprint or energy usage. The biggest obstacle to practical optimization of compiler parameters is that it requires hand tuning using benchmarking, which may necessitate a high amount of memory, computational and/or input-output resources, especially if the program being optimized is greedy. In addition to the high amount of resources required, it takes a considerable amount of time to achieve good parameter settings.

Summary

[005] A method and apparatus are disclosed herein to avoid the requirement of having well-optimized settings for compilation parameters associated with the programs present in the training set, by enabling the learning system to explore the space of possible settings through small perturbations. The method and apparatus, once trained, may be able to map a given input program to compiler parameters which achieve high performance with respect to a predetermined performance metric.

[006] In one embodiment, an apparatus comprises means configured to perform: generating a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric; and determining, using a system including an algorithm having at least some trainable weights, compiler parameters from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

[007] In one embodiment, a method comprises generating a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric; and determining, using a system including an algorithm having at least some trainable weights, compiler parameters from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

[008] In one embodiment, a computer readable storage medium stores instructions which, when executed by a computer, cause the computer to perform a method. The method comprises generating a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric; and determining, using a system including an algorithm having at least some trainable weights, compiler parameters from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

Brief Description of the Drawings

[009] Example embodiments will now be described with reference to the accompanying drawings, in which:

[010] FIG. 1 shows an apparatus according to some embodiments described herein;

[011] FIG. 2 shows a system according to some embodiments described herein;

[012] FIG. 3 shows an apparatus according to some embodiments described herein;

[013] FIG. 4 depicts a high-level block diagram of an apparatus 400 suitable for use in performing functions described herein according to some embodiments; and

[014] FIGs. 5 and 6 show a method according to some embodiments described herein.

Detailed Description

[015] Example embodiments will now be described, including methods and apparatus that avoid the requirement of having well-optimized settings for compilation parameters associated with the programs present in the training set, by enabling the learning system to explore the space of possible settings through small perturbations. The method and apparatus, once trained, may be able to map a given input program to compiler parameters which achieve high performance with respect to a predetermined performance metric.

[016] Functional blocks denoted as "means configured to perform ..." (a certain function) shall be understood as functional blocks comprising circuitry that is adapted for performing or configured to perform a certain function. A means being configured to perform a certain function does, hence, not imply that such means necessarily is performing said function (at a given time instant). Moreover, any entity described herein as "means", may correspond to or be implemented as "one or more modules", "one or more devices", "one or more units", etc. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional or custom, may also be included. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

[017] FIG. 1 illustrates an example apparatus 100 according to embodiments described herein.

[018] The apparatus 100 comprises means 110 configured to perform generating a representation 120 of a program from the program’s sources 130. The representation 120 includes data regarding the program in relation to a performance metric.

[019] The apparatus 100 comprises means 140 configured to perform determining, using a system including an algorithm having at least some trainable weights, compiler parameters S from the representation 120.

[020] In some embodiments, the apparatus 100 further comprises means 150 configured to perform compiling the program’s sources 130 using the compiler parameters S. The means 150 is a compiler in some embodiments. The compiler parameters S are configured such that compiling the program’s sources 130 using the compiler parameters S produces a program executable 160 optimized with respect to the performance metric.

[021] In some embodiments, the means 110 are configured to perform generating the representation 120 of the program from the program’s sources 130 from the compiler 150. In such embodiments, the representation 120 may comprise an intermediate representation generated by the compiler 150. In some embodiments, the representation 120 may be a static program representation, namely a representation generated without the need to run the executable generated by the compiler 150. Such representations can be generated using the compiler’s intermediate representations (e.g., number and type of instructions, control graphs).

[022] In some embodiments, the representation 120 comprises at least one representation selected from the list of: memory exclusion graph, control-flow graph, data-flow graph, single static assignment form, and program dependence graph. Other representations known to those in the art may also be used. In some embodiments, the representation 120 may take the form of a vector representation of at least one of the listed representations.

[023] Referring now to FIG. 2, in some embodiments the system 140 is implemented as a neural network having a plurality of trainable weights. A vector Z is input to the system 140. The system 140 in some embodiments comprises a plurality of dense layers 142.
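
As a concrete illustration, the dense-layer system of FIG. 2 might be sketched as follows. The layer sizes, the ReLU activation, and the use of NumPy are illustrative assumptions; the patent does not fix these choices:

```python
import numpy as np

rng = np.random.default_rng(0)

class DenseNet:
    """Maps a program representation vector z to compiler parameters s
    via a stack of dense (fully connected) layers."""
    def __init__(self, sizes):
        # Trainable weights: one (W, b) pair per dense layer.
        self.weights = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
                        for m, n in zip(sizes[:-1], sizes[1:])]

    def __call__(self, z):
        for i, (W, b) in enumerate(self.weights):
            z = z @ W + b
            if i < len(self.weights) - 1:  # ReLU on hidden layers only
                z = np.maximum(z, 0.0)
        return z                            # compiler parameters s

net = DenseNet([16, 32, 4])   # 16-dim representation -> 4 parameters
s = net(rng.standard_normal(16))
```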

[024] Returning now to FIG. 1, in some embodiments the performance metric is at least one metric selected from the list of: memory footprint, execution time, processing throughput, and power consumption. Other performance metrics known to those skilled in the art may also be used.

[025] As described above, the representation 120 includes data regarding the program in relation to the performance metric. The representations 120 listed above essentially capture the flow of data (e.g., input samples) and control information (e.g., variables used in control statements such as loops, if-then-else), and in some cases the amount of data, in a program from the program sources 130.

[026] As will be understood by the skilled person, the same representation 120 can be used for different purposes: for instance, both control-flow graphs and data-flow graphs can be used to track the value of variables and optimize the memory footprint. Further, different representations 120 can be used for the same goal. For instance, representations such as call graphs, program dependence graphs and control-flow graphs can all be used to optimize power consumption, as they capture how often a code block is executed. Such representations 120 can be used to reduce the power consumption of a target machine by training the means 140 to determine compiler parameters that map code that is seldom executed to an energetically expensive processor and map code that is often executed onto a less power-hungry processor.

[027] Referring now to FIG. 3, in some embodiments, the apparatus further comprises means 200 further configured to perform training of the system 140’s trainable weights. The system 140 defines the mapping P → S, where P is a set of program representations 120, and S is the set of compiler parameter settings. The system 140 is parametrized by a vector, and the aim of the training phase is to find values for the trainable weights that maximize the performance, with respect to a given performance metric, of the program compiled with the parameters determined by the system 140.

[028] The means 200 is configured to perform training of the system 140 as follows. A plurality of training program sources 230, which will be used to train the system 140, are provided.

[029] Typically, the program sources 230 are not used as inputs to the system 140 since learning a mapping from the program sources to the compiler parameters S would be inefficient. Instead, the means 200 is configured to perform generating a representation 220 of each training program from each training program’s sources 230. The representation 220 is chosen so that it includes pertinent data about the program sources 230 with respect to the performance metric being optimized. The system 140 is then fed with the representations 220 as described below. An efficient way of generating the representation 220 is using static program representations, which can be generated without the need of running an executable generated by the compiler 150, and therefore can be generated once for all the training program sources 230. Such representations 220 can be computed using the compiler’s 150 intermediate representations as discussed above. In some embodiments, the program sources 230 may be used as the representation 220 input to the system 140.

[030] The means 200 is further configured to perform training of the system 140 by selecting a set of N training program sources at random from the plurality of training program sources 230.

[031] The means 200 is next configured to perform training of the system 140 by determining, using the system 140, compiler parameters S for each training program from the representation 220 of each training program in the set.

[032] In some embodiments, the means 200 is further configured to perform training of the system 140 by determining a set of N perturbations, ε, according to a distribution, P(ε). In some embodiments, the perturbations, ε, are determined independently. In some embodiments, the distribution, P(ε), is a Gaussian distribution, for example with a mean of zero and a small variance σ².
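
The perturbation-sampling step above can be sketched as follows; N, the parameter dimension, and σ are illustrative values, not taken from the patent:

```python
import numpy as np

# Draw N independent perturbations from a zero-mean Gaussian with small
# variance sigma^2; each row perturbs one training program's determined
# compiler parameters.
rng = np.random.default_rng(1)
N, dim, sigma = 8, 4, 0.05
E = rng.normal(loc=0.0, scale=sigma, size=(N, dim))
```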

[033] The means 200 is next configured to perform training of the system 140 by compiling each training program in the set from that training program’s sources using the determined compiler parameters modified by a perturbation. In some embodiments, the perturbation is a perturbation, ε, in the set of perturbations. Thus, for the ith training program, the compiler parameters used to compile the program are sᵢ + εᵢ, where sᵢ are the compiler parameters determined by the system 140 for the ith training program and εᵢ is the ith perturbation in the set of perturbations.

[034] In some embodiments, the means 200 is next configured to perform training of the system 140 by determining a loss value, l, pertaining to the performance metric for each compiled program 260 from the set of training programs by benchmarking. Program benchmarking aims at evaluating the effectiveness of the compiler’s optimizations by measuring the performance of a program in its compiled, executable form. This evaluation is conducted by providing each compiled program with different types of workloads (e.g., signal-processing operations involving different matrix and vector operations) to be run on a specific target platform. Data are collected by profiling the executable program (e.g., monitoring variables’ values, memory accesses, control statements, power usage), and from this data a scalar loss value l, relevant to a certain performance metric, is computed using a suitable loss function. The loss value l indicates how well the program is performing with respect to this metric.
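
The benchmarking step might look like the following sketch, using wall-clock execution time as the scalar loss; the executable path, the workload arguments, and the choice of metric are hypothetical placeholders:

```python
import subprocess
import sys
import time

def benchmark(exe_path, workload_args):
    """Run a compiled program on a workload and return a scalar loss l.

    Here the loss is wall-clock time; other metrics mentioned in the
    text (memory footprint, power consumption) would require different
    profiling tools.
    """
    start = time.perf_counter()
    subprocess.run([exe_path, *workload_args], check=True)
    return time.perf_counter() - start

# Stand-in "compiled program": the Python interpreter running a no-op.
loss = benchmark(sys.executable, ["-c", "pass"])
```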

[035] The means 200 is next configured to perform training of the system 140 by adjusting the trainable weights of the system 140 using a loss function pertaining to the performance metric. In some embodiments, the loss function is determined from the loss values l of the compiled programs in the set. In some embodiments, the system 140 is adjusted by applying one step of stochastic gradient descent (SGD), or a variant thereof, to the trainable parameters, or weights, of the neural network forming the system 140 using the loss function J.

[036] One suitable form of the loss function is

J = (1/N) Σᵢ lᵢ log Pr(sᵢ + εᵢ | pᵢ)

where N is the number of training programs in the set selected at random from the plurality of training program sources 230, lᵢ is the loss value determined for the ith training program, Pr() means ‘probability of’, sᵢ are the compiler parameters determined by the system 140 for the ith training program, εᵢ is the ith perturbation, and pᵢ is the representation 220 of the ith training program. The loss function could take any other suitable form and does not necessarily need to be differentiable. Any suitable SGD variant may be used, including but not limited to Adam, RMSProp, and Momentum.
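
For a Gaussian policy, the gradient of the loss function with respect to the weights has a closed form, and one SGD step can be sketched as follows. The "system" here is deliberately simplified to a single linear weight matrix rather than the multi-layer network of FIG. 2, and all sizes, the learning rate, and the stubbed benchmark losses are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, in_dim, out_dim, sigma, lr = 8, 16, 4, 0.05, 1e-2

W = rng.standard_normal((in_dim, out_dim)) * 0.1  # trainable weights
P = rng.standard_normal((N, in_dim))              # representations p_i
S = P @ W                                         # determined parameters s_i
E = rng.normal(0.0, sigma, size=(N, out_dim))     # perturbations e_i
losses = rng.random(N)                            # benchmarked l_i (stubs)

# For a Gaussian policy with mean s_i, the score function is
# grad_s log Pr(s_i + e_i | p_i) = e_i / sigma^2, so the gradient of
# J = (1/N) sum_i l_i * log Pr(s_i + e_i | p_i) with respect to W is:
grad_W = P.T @ (losses[:, None] * E / sigma**2) / N
W -= lr * grad_W                                  # one SGD step on J
```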

[037] The means 200 is next configured to perform training of the system 140 by either stopping or repeating the above steps for another iteration. The stop criterion can take multiple forms: stop after a fixed number of training iterations, stop when the loss function J has not decreased during a fixed number of iterations, or stop when the loss or another associated metric has reached a desired value.
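
The stop criteria above can be combined into a single predicate, as in the following sketch; the thresholds max_iters, patience, and target_loss are illustrative, not specified in the text:

```python
def should_stop(iteration, loss_history, max_iters=1000,
                patience=50, target_loss=0.01):
    """Return True when any of the three stop criteria is met."""
    if iteration >= max_iters:
        return True   # fixed number of training iterations reached
    if loss_history and min(loss_history) <= target_loss:
        return True   # loss has reached the desired value
    if (len(loss_history) > patience and
            min(loss_history[-patience:]) >= min(loss_history[:-patience])):
        return True   # no improvement during the last `patience` iterations
    return False
```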

[038] In some embodiments, the size N of the set, as well as the learning rate and possibly other parameters of the chosen SGD variant (e.g., Adam, RMSProp, Momentum), are parameters of the means 200.

[039] As the means 200 does not rely on examples of optimized compilation parameters for the programs forming a training set (as would be required by a supervised learning approach), any new program for which the apparatus 100 might be useful can be added to the training set and used to further optimize the system 140 after its initial training phase.

[040] In some embodiments, parameters of the distribution P() can be chosen to be trainable, e.g., if P() is a Gaussian distribution with mean μ and variance σ², then μ and σ can be made trainable parameters.

[041] The skilled person will note that the loss function J is simply a function of which the gradient with respect to the trainable parameters is computed. The function is also known as the policy gradient. Methods known to the skilled person to improve the convergence speed of the training algorithm may be applied.

[042] Training relies on exploring the space of possible representation-to-compiler-parameter mappings. Exploring the space is done by trying new mappings that are small variations of the current mapping, using the perturbations. Exploring can be done in numerous ways, two of the most popular approaches being Gaussian policy and ε-greedy.

[043] In Gaussian policy, the perturbation is determined from a multivariate zero-mean normal distribution and added to the current compiler parameters. This ensures exploration “in the neighborhood” of the current compiler parameters. This is useful where the compiler parameters take values in a continuous set.

[044] In ε-greedy, with probability 1−ε the compiler parameters are those determined by the system 140, and with probability ε a random compiler parameter is taken. This is useful in the case where the compilation parameters take values in a discrete set.
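
An ε-greedy selection over a discrete set of compiler settings might be sketched as follows; the candidate flags and the value of ε are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = ["-O0", "-O1", "-O2", "-O3"]  # discrete compiler settings

def epsilon_greedy(determined, epsilon=0.1):
    """With probability 1-epsilon keep the system's determined setting;
    with probability epsilon pick a random candidate (exploration)."""
    if rng.random() < epsilon:
        return candidates[rng.integers(len(candidates))]
    return determined

choice = epsilon_greedy("-O2", epsilon=0.0)  # epsilon=0 -> always greedy
```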

[045] The covariance matrix of the distribution from which the perturbations are determined, and the ε parameter of the ε-greedy approach, are usually fixed parameters, i.e. not learned during training. These parameters control the “amount of exploration”: making these parameters smaller reduces the amount of random exploration and favors actions from the current policy of the system 140.

[046] Referring now to FIG. 4, in some embodiments, the means 110, 140, and 150 may take the form of an apparatus 400 comprising at least one processor 410 (e.g., a central processing unit (CPU) and/or other suitable processor(s)) and at least one memory 420 (e.g., random access memory (RAM), read only memory (ROM), or the like). The apparatus 400 further comprises computer program code 430 and various input/output devices 440 (e.g., a user input device (such as a keyboard, a keypad, a mouse, a microphone, a touch-sensitive display or the like), a user output device 450 (such as a display, a speaker, or the like), and storage devices (e.g., a tape drive, a floppy drive, a hard disk drive, a compact disk drive, non-volatile memory or the like)). The computer program code 430 can be loaded into the memory 420 and executed by the processor 410 to implement functions as discussed herein and, thus, computer program code 430 (including associated data structures) can be stored on a computer readable storage medium, e.g., RAM memory, magnetic or optical drive or diskette, or the like.

[047] Some embodiments relate to a method 500 as shown in FIG. 5. The method 500 comprises, at 502, generating a representation of a program from the program’s sources, wherein the representation includes data regarding the program in relation to a performance metric.

[048] The method 500 further comprises, at 504, determining, using a system including an algorithm having at least some trainable weights, compiler parameters S from the representation, wherein the compiler parameters are configured such that compiling the program’s sources using the compiler parameters produces a program executable optimized with respect to the performance metric.

[049] In some embodiments, the method 500 further comprises, at 506, compiling the program’s sources using the compiler parameters S. The compiler parameters S are configured such that compiling the program’s sources using the compiler parameters S produces a program executable optimized with respect to the performance metric.

[050] In some embodiments, generating 502 a representation of the program comprises generating the representation of the program from the program’s sources from a compiler. In such embodiments, the representation may comprise an intermediate representation generated by the compiler. In some embodiments, the representation may be a static program representation, namely a representation generated without the need to run the executable generated by the compiler. Such representations can be generated using the compiler’s intermediate representations (e.g., number and type of instructions, control graphs).

[051] In some embodiments, the representation comprises at least one representation selected from the list of: memory exclusion graph, control-flow graph, data-flow graph, single static assignment form, and program dependence graph. Other representations known to those in the art may also be used. In some embodiments, the representation may take the form of a vector representation of at least one of the listed representations.

[052] In some embodiments the performance metric is at least one metric selected from the list of: memory footprint, execution time, processing throughput, and power consumption. Other performance metrics known to those skilled in the art may also be used.

[053] In some embodiments, the method 500 further comprises, at 510, training the system as follows. At 512, a plurality of training program sources, which will be used to train the system, are provided.

[054] Next, at 514, a representation of each training program is generated from each training program’s sources. The representation is chosen so that it includes pertinent data about the program sources with respect to the performance metric being optimized. The representations may be static program representations generated without the need of running an executable generated by the compiler. Such representations can be computed using the compiler’s intermediate representations as discussed above.

[055] At 516, training 510 further comprises selecting a set of N training program sources at random from the plurality of training program sources.

[056] Training 510 further comprises, at 518, determining, using the system, compiler parameters S for each training program from the representation of each training program in the set.

[057] In some embodiments, training 510 further comprises, at 520, determining a set of N perturbations, ε, according to a distribution, P(ε). In some embodiments, the perturbations, ε, are determined independently. In some embodiments, the distribution, P(ε), is a Gaussian distribution, for example with a mean of zero and a small variance σ².

[058] Training 510 further comprises, at 522, compiling each training program in the set from that training program’s sources using the determined compiler parameters modified by a perturbation. In some embodiments, the perturbation is a perturbation, e, in the set of perturbations. Thus, for the ith training program, the compiler parameters used to compile the program are sᵢ + eᵢ, where sᵢ is the compiler parameters determined by the system for the ith training program and eᵢ is the ith perturbation in the set of perturbations.

[059] In some embodiments, training 510 further comprises, at 524, determining a loss value, l, pertaining to the performance metric for each compiled program from the set of training programs by benchmarking.

[060] Training 510 further comprises, at 526, adjusting the trainable weights of the system using a loss function pertaining to the performance metric. In some embodiments, the loss function is determined from the loss values l of the compiled programs in the set. In some embodiments, the system is adjusted by applying one step of stochastic gradient descent (SGD), or a variant thereof, to the trainable parameters, or weights, of the neural network forming the system using the loss function J:

J = (1/N) Σᵢ₌₁ᴺ lᵢ log Pr(sᵢ + eᵢ | pᵢ)

[061] where N is the number of training programs in the set selected at random from the plurality of training program sources 230, lᵢ is the loss value determined for the ith training program, Pr() means ‘probability of’, sᵢ are the compiler parameters determined by the system for the ith training program, eᵢ is the ith perturbation, and pᵢ is the representation of the ith training program. The loss function could take any suitable form and does not necessarily need to be differentiable. Any suitable SGD variant may be used, including but not limited to Adam, RMSProp, and Momentum.
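When the exploration distribution is Gaussian with mean sᵢ and standard deviation σ, the gradient of log Pr(sᵢ + eᵢ | pᵢ) with respect to the mean is eᵢ/σ², which gives the estimator used by evolution-strategy-style training. The sketch below (function name and shapes are illustrative, not from the disclosure) treats the system output as a single shared mean vector for simplicity:

```python
def es_gradient(losses, perturbations, sigma):
    """Estimate dJ/ds for J = (1/N) * sum_i l_i * log Pr(s_i + e_i | p_i),
    assuming a Gaussian Pr with mean s_i and standard deviation sigma, so
    the per-sample gradient with respect to the mean is l_i * e_i / sigma**2."""
    N = len(losses)
    dim = len(perturbations[0])
    grad = [0.0] * dim
    for l_i, e_i in zip(losses, perturbations):
        for d in range(dim):
            grad[d] += l_i * e_i[d] / (sigma * sigma)
    return [g / N for g in grad]   # average over the N training programs
```

Note that only the log-probability term is differentiated; the benchmark loss lᵢ enters as a scalar weight, which is why the per-program loss itself need not be differentiable.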

[062] Training 510 further comprises, at 528, either stopping or repeating the above steps for another iteration. The stop criterion can take multiple forms: stop after a fixed number of training iterations, stop when the loss function J has not decreased during a fixed number of iterations, or stop when the loss or another associated metric has reached a desired value.
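The three stop criteria of step 528 can be combined in a single training driver; the sketch below is hypothetical, with `step_fn` standing in for one full iteration of steps 516 to 526:

```python
def train(step_fn, max_iters=100, patience=10, target=0.0):
    """Run training iterations until one of the three stop criteria fires:
    the loss reaches `target`, the loss has not decreased for `patience`
    iterations, or `max_iters` iterations have been performed."""
    best = float("inf")
    stale = 0
    for it in range(max_iters):
        loss = step_fn(it)           # one iteration: steps 516-526
        if loss <= target:
            return it, loss          # desired value reached
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:    # loss has stopped decreasing
                return it, best
    return max_iters - 1, best       # iteration budget exhausted
```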

[063] Referring now to FIG. 6, an example representation 600, which may be used in some embodiments described herein, is illustrated. The example representation 600 is a memory exclusion graph, however as described above other representations may also be used. The memory exclusion graph (MEG) 600 is an example of a program representation that can be statically extracted from a compiler’s intermediate representations. A MEG describes a program’s logical memory objects (e.g., First-In First-Out buffers) that cannot share the same physical memory area. That is, objects that cannot be allocated to the same memory addresses. MEGs may be used for optimizing the memory footprint. A Memory Exclusion Graph is an undirected, weighted graph that can be formally denoted as G = <V;E;w>, where V is the set of vertices that represent indivisible (atomic) memory objects (e.g., a First-In First-Out buffer), E is the set of edges representing the memory exclusion relation, i.e., the impossibility to share a physical memory area, and w : V → N is a function, with w(v) representing the weight of a vertex v. The weight of a vertex v corresponds to the size of the memory object that the vertex v denotes.
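The tuple G = <V;E;w> can be encoded directly as data; the vertex names and sizes below are invented for illustration and are not those of FIG. 6:

```python
# w(v): weights of the vertices, i.e., sizes of the atomic memory objects
# (e.g., FIFO buffers) that the vertices denote.
weights = {"AB": 4, "BC": 2, "CD": 3}

# E: undirected exclusion edges; each pair cannot share a physical memory area.
edges = {frozenset({"AB", "BC"}), frozenset({"BC", "CD"})}

def excludes(u, v):
    """True if memory objects u and v may not be allocated to the same addresses."""
    return frozenset({u, v}) in edges
```

In this invented graph, "AB" and "CD" are not linked by an edge, so the corresponding memory objects could be allocated to the same physical memory area.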

[064] The MEG 600 illustrated in FIG. 6 comprises a plurality of vertices 602, each of which represents a logical memory object of a program. Each vertex 602 is labelled with a first label 602a corresponding to the actor(s) in the program to which the memory object relates, and a second label 602b that indicates the size of the memory object. A plurality of edges 604 extend between pairs of vertices 602. An edge 604 extending between two vertices 602 represents a memory exclusion relation between those two vertices, that is the two memory objects represented by the vertices 602 cannot be allocated to the same physical memory addresses.

[065] In some embodiments, the MEG 600 can be used as a program representation in order to optimize the memory footprint of an input source program. Vertices 602 that do not have an edge 604 extending between them designate memory objects that can be allocated to the same physical memory area. This establishes how to reuse memory and thus to reduce a program’s memory footprint. A system, as described above, may determine compiler parameters from the MEG 600.

[066] In this example, during training of the system, the goal of benchmarking is to measure the “goodness” of compilation parameters with respect to the memory footprint for a program and its associated representation. More precisely, the benchmarking phase consists of measuring the physical memory area that is allocated by the compiled program. The amount of physical memory allocated is denoted by m. The loss l is defined as the difference between m and the weight of the Maximum Weight Clique (MWC) of the program’s MEG 600, denoted by c. Thus the loss value is calculated as l = m - c.

[067] A MWC is used in determining the loss value since the allocation of a MEG will never use less memory than the weight of its MWC. A clique is a subset of a MEG’s vertices within which each pair of vertices is linked by an edge. The MWC is the clique whose weight is the largest of all cliques in the graph. In the context of a MEG, as the memory objects of a clique cannot share memory space because they mutually exclude each other, the weight of a clique gives a lower bound to the amount of memory that must be allocated for all of the clique’s memory objects. This weight is equal to the sum of the sizes of all of the clique’s memory objects.
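For small MEGs the MWC, and with it the lower bound used in the loss value, can be found by brute-force enumeration. The sketch below is exponential in the number of vertices and therefore only illustrative; the three-vertex example graph and the measured footprint m are invented:

```python
from itertools import combinations

def is_clique(vertices, edges):
    """Every pair in `vertices` must be joined by an exclusion edge."""
    return all(frozenset(p) in edges for p in combinations(vertices, 2))

def max_weight_clique(weights, edges):
    """Brute-force MWC: enumerate all vertex subsets, keep the heaviest clique."""
    verts = list(weights)
    best = 0
    for r in range(1, len(verts) + 1):
        for subset in combinations(verts, r):
            if is_clique(subset, edges):
                best = max(best, sum(weights[v] for v in subset))
    return best

# Invented MEG: B excludes both A and C, but A and C may share memory.
weights = {"A": 1, "B": 2, "C": 3}
edges = {frozenset({"A", "B"}), frozenset({"B", "C"})}
c = max_weight_clique(weights, edges)   # MWC is {B, C} with weight 2 + 3 = 5
m = 6                                   # hypothetical measured footprint
loss = m - c                            # l = m - c = 1
```

In practice, production tools use dedicated maximum-weight-clique solvers rather than exhaustive enumeration, since the problem is NP-hard.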

[068] For instance, in the MEG 600 shown in Figure 6, the subsets of vertices 602 labelled AB1; AB2; B2C2 and C1D1; D2E; C2D2; D1E are examples of cliques. The MWC for the MEG 600 in Figure 6 is given by the vertices 602 labelled AB2; B1C1; B2C2; C1C2; C1D1.

[069] Embodiments described herein can also be applied to other compilation approaches such as those based on Model-Driven Engineering. Traditionally, software is developed from textual programming languages (e.g., C/C++, Java) that are used to write program sources 130. However, alternative software development paradigms exist. For instance, in Model-Driven Engineering, software is generated from domain-specific modeling languages. These graphical languages abstract a system’s structure, behavior and requirements, and are combined with code generation engines that produce source code (e.g., C/C++), simulation and verification software, or alternative model representations. Embodiments described herein in relation to FIGs. 1 and 2 may be applied to such program sources. An executable program can be compiled from model-based program sources. From the latter, a program representation can be extracted and executable programs can be benchmarked in order to learn the optimal compiler parameters.

[070] It will be appreciated that the functions depicted and described herein may be implemented in software (e.g., via implementation of software on one or more processors, for executing on a general purpose computer (e.g., via execution by one or more processors) so as to implement a special purpose computer, or the like) and/or may be implemented in hardware (e.g., using a general purpose computer, one or more application specific integrated circuits (ASIC), and/or any other hardware equivalents).

[071] A further embodiment is a computer program product comprising a computer readable storage medium having computer readable program code embodied therein, the computer readable program code being configured to implement one of the above methods when being loaded on a computer, a processor, or a programmable hardware component. In some embodiments, the computer readable storage medium is non-transitory.

[072] A person of skill in the art would readily recognize that steps of various above- described methods can be performed by programmed computers. Herein, some embodiments are also intended to cover program storage devices, e.g., digital data storage media, which are machine or computer readable and encode machine-executable or computer-executable programs of instructions where said instructions perform some or all of the steps of methods described herein. The program storage devices may be, e.g., digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media. The embodiments are also intended to cover computers programmed to perform said steps of methods described herein or (field) programmable logic arrays ((F)PLAs) or (field) programmable gate arrays ((F)PGAs), programmed to perform said steps of the above-described methods.

[073] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

[074] While aspects of the present disclosure have been particularly shown and described with reference to the embodiments above, it will be understood by those skilled in the art that various additional embodiments may be contemplated by the modification of the disclosed machines, systems and methods without departing from the scope of what is disclosed. Such embodiments should be understood to fall within the scope of the present disclosure as determined based upon the claims and any equivalents thereof.