Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
VECTORIZATION APPARATUS
Document Type and Number:
WIPO Patent Application WO/2016/089242
Kind Code:
A1
Abstract:
A method and apparatus for providing a vectorized version /P/ of a source program, P, wherein the apparatus (1) comprising a graph interpreter (2) is adapted to traverse nodes of a graph-based intermediate representation, IR, of a source program, P, to retrieve for each node a current operation and to maintain and update a current context, ctx, representing along with the current operation the state of the graph interpreter (2) during the graph traversal; and a staged evaluator (3) adapted to vectorize the current operation in the current context, ctx, to provide a vectorized fragment, VF, returned by the staged evaluator (3) to said graph interpreter (2) which updates the current context, ctx, with an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph-based intermediate representation, IR', of the vectorized version /P/ of the source program, P.

Inventors:
SLESARENKO ALEXANDER VLADIMIROVICH (CN)
ZHANG HONGBO (CN)
ORLOV ANTON YURYEVICH (CN)
Application Number:
PCT/RU2014/000910
Publication Date:
June 09, 2016
Filing Date:
December 04, 2014
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HUAWEI TECH CO LTD (CN)
International Classes:
G06F9/45
Domestic Patent References:
WO2014142972A12014-09-18
Other References:
ARVIND K. SUJEETH ET AL: "Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages", ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, vol. 13, no. 4s, 1 July 2014 (2014-07-01), pages 1 - 25, XP055187083, ISSN: 1539-9087, DOI: 10.1145/2584665
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD (25 bldg, Moscow 0, RU)
Download PDF:
Claims:
A method for constructing a vectorized version /P/ of a source program, P, represented by a graph-based intermediate representation, IR, comprising:

(a) supplying (SI) said graph-based intermediate representation, IR, to a graph interpreter (2) which traverses nodes of said graph-based intermediate representation, IR, to retrieve for each node a current operation and to maintain and update a current context, ctx, representing along with the current operation the state of the graph interpreter (2) during the graph traversal;

(b) calling (S2) a staged evaluator (3) by said graph interpreter (2) for each traversed node and supplying the staged evaluator (3) with the current operation by using a symbol of the current operation and the current context, ctx, and

(c) vectorizing (S3) by the staged evaluator (3) the current operation in the current context, ctx, to provide a vectorized fragment, VF, returned by the staged evaluator (3) to the graph interpreter (2) which updates the current context, ctx, of the graph interpreter (2) with an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph-based intermediate representation, IR', of the vectorized version /P/ of the source program, P.

The method according to claim 1, wherein the graph- based intermediate representation, IR, is formed by a data structure where graph nodes represent operations of the source program, P, and graph edges represent information dataflows between operations.

The method according to claim 1 or 2, wherein the graph-based information representation, IR, is stored in a memory as a look-up table comprising a list of entries for all nodes of said graph-based intermediate representation, IR.

The method according to claim 3, wherein each entry of the list does correspond to a node of the graph-based intermediate representation, IR, and comprises as a key a node -ID of the respective node formed by a symbol and an operation class instance derived from a node class.

The method according to claim 4, wherein the operation class instance comprises an operation identifier of the operation and operation arguments formed by symbols of nodes of the graph-based intermediate representation, IR. .

The method according to one of the preceding claims 2 to 5, wherein the graph nodes of the graph-based intermediate representation, IR, represent primitive operations of a programming language or user-defined functions processed by a corresponding subroutine associated with a corresponding node class.

The method according to one of the preceding claims 4 to 6, wherein the staged evaluator (3) retrieves the respective node traversed by said graph interpreter (2) by using the symbol of the respective node as a key for the stored look-up table to get the current operation and its operation arguments.

8. The method according to claim 7, wherein the staged evaluator (3) maps the symbols of the operation arguments of the current operation to corresponding symbols of vectorization arguments by using the current context, ctx, supplied by the graph interpreter (2).

9. The method according to claim 8, wherein the staged evaluator (3) selects for vectorizing the current operation a vectorization rule, VR, configured for the respective operation and applies the selected vectorization rule, VR, being a staged function to the vectorized arguments of the current operation. 10. The method according to claim 9, wherein by applying the selected vectorization rule, VR, to the vectorized arguments of the current operation, a symbol that represents the vectorized fragment, VF, is provided. 11. The method according to claim 10, wherein an association between the symbol of the respective node and the symbol that represents the vectorized fragment, VF, is added to the current context, ctx. 12. The method according to one of the preceding claims 1 to 11, wherein for each primitive operation of a graph- based intermediate representation, IR, a vectorization rule, VR, is specified using the vectorization rule template, VRT.

13. The method according to one of the preceding claims 9 to 12, wherein the staged evaluator (3) applies during the execution of the selected vectorization rule, VR, an optimization rule, OR, to transform the original graph-based intermediate representation, IR, -.of the source program, P, to the resulting graph-based intermediate representation, IR, of the 'vectorized version /P/ of the source program, P.

The method according to claim 13, wherein by applying the optimization rule, OR, a sub-graph of the graph- based intermediate representation, IR, of the source program, P, is replaced by another sub-graph.

The method according to one of the preceding claims 1 to 14, wherein the context, ctx, of the graph interpreter (2) maintains a mapping between symbols of the graph-based intermediate representation, IR, of the source program, P, input to said graph interpreter (2) and symbols of a resulting graph-based intermediate representation, IR', of the vectorized version /P/ of the source program, P, output by the graph interpreter (2) .

A compiler comprising a vectorizer adapted to . perform the method according to one of the preceding claims 1 to 15.

A vectorization apparatus (1) for providing a vectorized version /P/ Of a source program, P, said apparatus (1) comprising: a graph interpreter (2) adapted to traverse nodes of a graph-based intermediate representation, IR, of a source program, P, to retrieve for each node a current operation and. to maintain and update a current context, ctx, representing along with the current operation the state of the graph interpreter (2) during the graph traversal; and a staged evaluator (3) adapted to vectorize the current operation in the current context, ctx, to provide a vectorized. ..fragment, VF, returned by the staged evaluator (3) to said graph interpreter (2) which updates the current context, ctx, . with, an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph- based intermediate representation, IR', of the vectorized version /P/ of the source program, P.

18. The vectorization apparatus (1) according to claim 17, wherein the vectorization apparatus (1) forms a stage of a compiler which is adapted to produce an executable machine . code from a source code of said source; program,

P ..

Description:
TITLE

Vectorization apparatus The invention relates to a method and an apparatus for constructing a vectorized version of a source program having a graph-based intermediate representation.

TECHNICAL BACKGROUND

The method and apparatus according to the present invention relates to the field of computer software engineering and in particular relates to compiling a source code by use of an intermediate code representation. Compiling a computer program involves converting a high-level language code, i.e. a source code, into instructions of a processor of a computer system. A compiler is a computer program that takes a source code of a source program and produces, an executable machine code . The sequence of the produced machine code instructions is executed by a processor to perform a given task.

A compiler performs program transformations to optimize a source program for fast execution. One of these applied optimizations performed' by the compiler is vectorization.

Fig. 1 shows a diagram illustrating the vectorization during the process of compilation by a conventional compiler. An intermediate representation IR of the source code is processed by a vectorizer to provide a vectorized intermediate representation of the source code. During vectorization, a sequence of scalar operations is replaced by a single vector operation as illustrated in Eig. 2. As can be seen, many scalar operations are replaced -with a single vector operation. This vectorization is performed because computing hardware supports vector operations and vectorized programs are most suited for parallel execution. Accordingly, vectorization is a process of converting a computer program from a scalar implementation which processes a single pair, of operands at a time, to a vector implementation which processes one operation on multiple pairs of operands at once. The term "vectorization" originally derives from the convention of putting operands into vectors or arrays. Automatic vectorization allows the compiler to convert scalar programs automatically into vectorized programs. A compiler may automatically generate both scalar and vector versions of a function from a single source code description. This allows a compiler to make vector function calls from within vectorized loops rather than making multiple serialized scalar function calls from within a vectorized loop.

Parallel execution of a program allows to achieve a maximum performance, however, is often cumbersome for humans to write such parallel executable programs. Accordingly, compilers perform partial vectorization automatically. Vectorization can also be used to replace sequential loops with vector operation as illustrated in relation to Fig. 3. With respect to this task, vectorization can be used to construct a vectorized version /P/ of a source program; The transformation of a source program, P, into a vectorized version /P/ can also be called replication. Accordingly, there is a need for a simple but generic method for the task of constructing vectorized versions of programs .

SUMMARY OF THE INVENTION According to a first aspect of the present invention, a method for constructing a vectorized version /P/ of a source program, P, represented by a graph-based intermediate representation, IR, is provided, said method comprising :

supplying said graph-based intermediate representation, IR, to a graph interpreter which traverses nodes of said graph- based intermediate representation to retrieve for each node a current operation and to maintain and update a current context representing along with the current operation the state of the graph interpreter during the graph traversal; calling a staged evaluator by said graph interpreter for each traversed node and supplying the staged evaluator with the current operation by using a symbol of the current operation and the current context, and

vectorizing by the staged evaluator the current operation in the current context to provide a vectorized fragment returned by the staged evaluator to the graph interpreter which updates the current context of the graph interpreter with an association between a symbol of the current operation and a symbol of the vectorized fragment to generate a graph-based intermediate representation of the vectorized version of the source program. According to a first implementation of the method according to the first aspect of the present invention, the graph- based intermediate representation is formed by a data structure where graph nodes represent operations of the source program and graph edges represent information dataflows between operations.

According to a second implementation of the first implementation of the method according to the first aspect or according to the method according to the first aspect as such, the graph-based information representation is stored in a memory as a look-up table comprising a list of entries for all nodes in the graph-based intermediate representation.

According to a third implementation of the second implementation of the method according to the first aspect of the present invention, each entry of the list does correspond to a node of the graph-based intermediate representation and comprises as a key a node- ID of the respective node formed by a symbol and an operation class instance derived from a node class.

According to a fourth implementation of the third implementation of the method according to the first aspect of the present invention, the operation class instance comprises an operation identifier of the operation and operation arguments formed by symbols of nodes of the graph-based intermediate representation.

According to a fifth implementation of the first to fourth implementation of the method according to the first aspectof the present invention, the graph nodes of the graph- based intermediate representation represent primitive operations of a programming language.

In a further possible sixth implementation of the first to fourth implementation of the method according to the first aspect of the present invention, the graph nodes of the graph-based intermediate representation represent user- defined functions processed by a corresponding subroutine associated with a corresponding node class.

In a further possible seventh implementation of the third to sixth implementation of the method according to the first aspect of the present invention, the staged evaluator retrieves the respective node traversed by said graph interpreter by using the symbol of the respective node as a key for the stored look-up table to get the current operation and its operation arguments.

In a further possible eighth implementation of the seventh implementation of the method according to the first aspect of the present invention, the staged evaluator maps the symbols of the operation arguments of the current operation to corresponding symbols of vectorization arguments by using the current context supplied by the graph interpreter.

In a further possible ninth implementation of the eighth implementation of the method according to the first aspect of the present invention, the staged evaluator selects for vectorizing the current operation a vectorization rule configured for the respective operation and applies the selected vectorization rule being a staged function to the vectorized arguments of the current operation.

In a further tenth implementation of the ninth implementation of the method according to the first aspect of the present invention, by applying the selected vectorization rule to the vectorized arguments of the current operation, a symbol that represents the vectorized fragment is provided.

In an eleventh implementation of the tenth implementation of the method according to the first aspect of the present invention, an association between the symbol of the respective node and the symbol that represents the vectorized fragment is added to the current context.

In a twelfth implementation of the first to eleventh implementation of the method according to .. the first aspect of the present invention or according to the method of the first aspect as such, for each primitive operation of a graph-based intermediate representation, a vectorization rule is specified using a vectorization rule template. In a thirteenth implementation of the tenth to twelfth implementation of the method according to the first aspect of the present invention, the staged evaluator applies during the execution of the selected vectorization rule, an optimization rule to transform the original graph-based intermediate representation of the source program to the resulting graph-based intermediate representation of the vectorized version of the source program.

In a fourteenth implementation of the thirteenth implementation of the method according to the first aspect of the present invention, by applying the optimization rule a sub-graph of the graph-based intermediate representation of the source program is replaced by another sub-graph. In a further possible fifteenth implementation of the first to fourteenth implementation of the method according to the first aspect of the present invention or according to the method of the first aspect as such, the context of the graph interpreter maintains a mapping between symbols of the graph-based intermediate representation of the source program input to said graph interpreter and symbols of a resulting graph-based intermediate representation of the vectorized version of the source program output by the graph interpreter. According to a second aspect of the present invention, a compiler comprising a- vectorizer adapted to perform the method according to the first aspect of the present invention is provided.

According to a third aspect of the present invention, a vectorization apparatus for providing a vectorized version of a source program is provided, said apparatus comprising: a graph interpreter adapted to traverse nodes of a graph- based intermediate representation of a source program to retrieve for each node a current operation and to maintain and update a current context representing along with the current operation the state of the graph interpreter during the graph traversal and a staged evaluator adapted to vectorize the current operation in the current context to provide a vectorized fragment returned by the staged evaluator to said graph interpreter which updates the current context with an association between a symbol of the current operation and a symbol of the vectorized fragment to generate a graph-based intermediate representation of the vectorized version of the source program.

According to a first possible implementation of the vectorization apparatus according to the third aspect of the present invention, the vectorization apparatus forms a stage of a compiler which is adapted to produce an executable machine code from a source code of said source program . BRIEF DESCRIPTION OF FIGURES

In the following, different implementations of different aspects of the present invention are described in more detail with reference to the enclosed figures. Fig. 1 shows a diagram for illustrating vectorization in a conventional process of compilation;

Fig. 2 shows a diagram for illustrating the replacement of scalar operations with a vector operation according to the state of the art;

Fig. 3 illustrates the replacement of an original program with a loop by a vectorized program without loops;

Fig. 4' shows a diagram for illustrating a possible arrangement of a vectorization apparatus according to an aspect of the present invention;

Figs. 5A, show an exemplary source program and a correspond-

5B, 5C ing graph-based intermediate representation of this source program;

Figs. 6A, show a graph-based intermediate representation of 6B an exemplary program for illustrating the operation of an apparatus and method according to the present invention;

Fig. 7 shows a diagram for illustrating a possible interpreter design pattern;

Fig. 8 illustrates the vectorization of a single graph node received from the graph interpreter as performed by the method and apparatus according to the present invention;

Fig. 9 shows a vectorization rule template and corresponding exemplary specification as employed by the method and apparatus according to the present invention;

Fig. 10 shows an optimization rule template and corresponding exemplary specification as employed by the method and apparatus according to the present invention;

Fig. 11 shows an extension of a staged evaluation process with corresponding optimizations as employed by the method and apparatus according to the present invention;

Fig. 12 shows a flowchart for illustrating a method for constructing a vectorized version of a source program according to an aspect of the present invention .

DETAILED DESCRIPTION OF EMBODIMENTS

As illustrated in Fig. 4, a vectorization apparatus 1 according to an aspect of the present invention is adapted to provide a vectorized version of a source program, P. The vectorization apparatus 1 comprises a graph interpreter 2 and a staged evaluator 3. The graph interpreter 2 is adapted to traverse nodes of a graph-based intermediate representation, IR, of the source program, P, to retrieve for each node a current operation and to maintain and update a current context representing along with the current operation the state of the graph interpreter 2 during the graph traversal .

The staged evaluator 3 of the vectorization apparatus 1 is adapted to vectorize the current operation in the current context to provide a vectorized fragment, VF, which is returned by the staged evaluator 3 to the graph interpreter 2 as illustrated in Fig. 4. The graph interpreter 2 updates the current context, CTX, as an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph-based intermediate representation, IR', of the vectorized version of the source program, P. The vectorization apparatus 1 illustrated in Fig. 4 can form a stage of a compiler which is adapted to produce an executable machine code from a source code of the source program, P. The graph-based intermediate representation, IR, of the source program, P, is a data structure that can be used by the vectorization apparatus 1 and allows an analysis and transformation before outputting the executable machine code. Each intermediate representation is a graph or a tree data structure which comprises specific information data inside nodes. The intermediate representation, IR, contains all information needed to evaluate the source program, P. There are ' different ways to represent a source program as a data structure in a memory. The intermediate representation, IR, has as a first property that it is graph-based, i.e. it is a graph data structure with specific information data stored inside graph nodes, and has as a second property that it can be implemented in an object-oriented language. Most of the graph-based intermediate representation corresponds to operations of the source program, P. Edges of the graph-based intermediate representation, IR, correspond to a dataflow between operations. The order of execution of operations is not explicitly specified by the graph-based intermediate representation, IR, and is dictated by dependencies between the nodes. The graph-based information representation, IR, can, be stored in a memory as a look-up table comprising a list of entries for all nodes of the^ graph-based intermediate representation, IR. Each entry of the list, can correspond to a node of the graph-based intermediate representation,. IR, and does comprise as a key a node-ID of the respective node formed by a symbol and an operation class instance derived from a node class. The operation class instance can comprise an operation identifier of the operation and operation arguments formed by symbols of nodes of the graph-based intermediate representation, IR. Graph nodes of the graph-based intermediate representation, IR, represent primitive operations of a programming language or user-defined functions processed by a corresponding subroutine associated with a corresponding node class.

In a possible embodiment, each graph node of the graph data structure of the intermediate representation can be represented by a pair (S,N), wherein

S is a globally unique identifier and can also be called a symbol and is typically formed by a number. All symbols are represented by instances of a class SYM. Each symbol refers to a node it identifies and can be used to access the node. N is an instance of a class derived from the node class. The concrete class of the node is denoted as an operation. Different operations can be represented by different classes. Classes of operations can form a hierarchy of classes. Abstract classes in the hierarchy can represent a subset of operations.

Fig. 5A shows an exemplary source program, P, which can be represented by an intermediate representation, IR. The shown exemplary source program, P, after being compiled, does when executed calculate an average of all values in an array. arr is an argument of the function. When the function is called, the parameter arr provides a . reference to an instance of a class Array <Float> which represents an array or indexed collection of values. The array of floating point numbers can be supplied by a caller of the function and is processed by the function. The value of the function shown in Fig. 5A is executed sequentially step by step. In line 2 of the program an invocation of the method sum of class Array is performed using the instance arr. The method calculates the sum of all elements in the array. In line 3 of the exemplary source program, an invocation of the method length of class Array is performed using the instance arr. The method calculates the number of the elements in the array. Finally, an average value is calculated and returned as a result of the function.

A graph-based intermediate representation, IR, of the exemplary source program, P, illustrated in Fig. 5A is shown in Fig. 5B. An in-memory representation of the program graph of Fig. 5B is illustrated in Fig. 5C. Fig. 5C shows an ih-memory intermediate representation, IR, for the program graph shown in Fig. 5B .

The intermediate representation, IR, of the source program, P, can be implemented in an object-oriented language. An object-oriented language is a computer programming language that supports object-oriented programming, OOP. Object- oriented programming is a programming paradigm using objects, usually instances of a class, consisting of data fields and methods together with their interactions to design applications and computer programs. The : used programming techniques may include data abstraction, encapsulation, messaging, modularity, polymorphism and inheritance. In object-oriented programming, a method is a subroutine or procedure associated with a class. Methods define the behavior to be exhibited by instances of the associated class at program runtime. Methods used in object-oriented programming have the property that at runtime they have access to data stored in an instance of the class they are associated with and are thereby able to control the state of the instance. As can be seen in Fig. 4, the graph interpreter 2 traverses operations (nodes) of the graph to retrieve for each node a current operation and to maintain and update a current context representing along with the current operation the state of the graph interpreter 2 during the graph traversal. The current operation and the current context, CTX, provided by the graph interpreter 2 is supplied to the staged evaluator 3 which performs a vectorization of the operation in the given context, CTX, to provide a vectorized fragment, VF, which is returned by the staged evaluator 3 to the graph interpreter 2. The graph interpreter 2 updates the current context, CTX, with an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph- based intermediate representation, IR' - of the vectorized version /P/ of the source program, P.

As shown in Fig. 4, the graph-based intermediate representation, IR, is either created from some other representation such as a text file or from a previous compilation phase. The graph-based intermediate representation, IR, forming the program graph is supplied to the graph interpreter 2 of the vectorization apparatus 1. The graph interpreter 2 can create initially an empty context and start a graph traversal respecting dependencies between the nodes of the graph. For each node of the graph, the graph interpreter 2 calls the staged evaluator 3 and supplies it with the current operation and context. The staged evaluator 3 retrieves the respective node traversed by said graph interpreter 2 by using the symbol of the respective node as a key for the stored look-up table to get the current operation and its operation arguments. In a possible embodiment, the staged evaluator 3 can map the symbols of the operation arguments of the current operation (OpNode) and corresponding symbols of the vectorization arguments by using the current context supplied by the graph interpreter 2. The staged evaluator 3 can select for vectorizing the current operation (OpNode) a vectorization rule, VR, configured for the respective operation and can apply the selected vectorization rule, VR, being a staged function to the vectorized arguments of the current operation (OpNode) . By applying the selected vectorization rule, VR, to the vectorized arguments of the current operation (OpNode) , the symbol (sym') that represents the vectorized fragment, VF, is provided. An association between the symbol (sym) and the respective node and the symbol (sym') that represents the vectorized fragment, VF, is added to the current context, CTX. In a possible embodiment, for ach primitive operation of the graph-based intermediate representation, IR, a vectorization rule, VR, is specified using a vectorization rule template, VRT. In a possible embodiment, the staged evaluator 3 applies during the execution of the selected vectorization rule, VR, an optimization rule, OR, to transform the original graph-based intermediate representation, IR, of the source program, P, to the resulting graph-based intermediate representation, IR' , of the vectorized version /P/ of the original source program, P. By applying the operation rule, OR, a sub-graph of the graph-based intermediate representation, IR, of the source program, P, is replaced by another, usually simpler, sub-graph.

Accordingly, the staged evaluator 3 uses vectorization rules, VR, and optimization rules, OR, to perform a vectorization of the current operation in the current context. After the vectorization is performed by the staged evaluator 3 for a given operation, a symbol that represents vectorized fragment, VF, is returned back to the graph interpreter 2 which updates the context, CTX, with a new pair (Operation →/Operation/ ) and continues with the traversal through the graph.

The resulting graph for the vectorized version /P/ of the input program, P, is ready after the traversal has been finished by the graph interpreter 2.

In Fig. 4, a simple example of the input program, P, is illustrated, wherein the input program, P, performs a simple mathematical calculation for input operands a, b, c. In the given example, the input program, P, is:

P (a, b, c) = a + (b * c) .

A corresponding graph-based intermediate representation, IR, of the source program, P, is also shown in Fig. 6A. This graph-based intermediate representation, IR, forms the input data to the vectorization apparatus 1 and the result of the vectorization. is also formed by a graph-based intermediate representation, IR', as. shown in Fig. 6B.

The intermediate representation, IR, nodes of the graph do correspond to operations of the source program, P, and the edges · correspond to a dataflow between operations. Fig. 6A illustrates a graph-based intermediate representation, IR, of the exemplary source program, P. The order of the execution is dictated by the dependencies between the nodes and is not specified explicitly. In a possible embodiment, the graph-based intermediate representation, IR, is stored in a machine memory as a look-up table from symbols to nodes as also illustrated in Fig. 6B. Each pair (S,N) in the graph data structure does represent the definition of a symbol :

S - symbol (unique id)

N - instance of class derived from the node class.

Edges of the graph are represented by symbols that are stored in nodes. The " staged evaluator 3 retrieves the respective node traversed by . the graph interpreter 2 by using the symbol of the respective node as a key for the stored lbok-up table shown in Fig. 6B to get the current operation (OpNode) and its operation arguments.

In order to , traverse the graph-based intermediate representation of the exemplary source program, P, the graph interpreter 2 can for instance use an interpreter such as illustrated in Fig. 7. Fig. 7 illustrates an interpreter design pattern. This can adapt the behavior of a canonical interpreter software design pattern, for example Add, Mul are classes derived from node abstract class. Edges of the graph are represented by symbols that are stored in nodes. The abstract node class of a program graph can be treated as AbstractExpression. Instances of classes derived from the Node class can be treated as either TerminalExpression or NonTerminalExpression:

Program graph nodes that represent primitive operations of the language (such as Java) are treated as terminals and program graph nodes that represent user-defined functions are treated as non-terminals .

User-defined functions can be called "Lambdas" wherein each Lambda is characterized by:

Parameter symbols which do not have definition in the graph-based intermediate representation, IR, and

a lambda body, i.e. a sequence of the graph-based intermediate representation definitions that depend on the function parameter symbol and

a result symbol which represents the results of the execution of the function.

A method interpret ( ) of a terminal node N can call the staged evaluator 3 providing a symbol S of node N as the first argument of (Operation, Context) pair. It can also use the current context as second component of the pair. The context as a mapping between symbols of input and resulting graphs is illustrated in Fig. 6A. The pair (/Operation/ , ctx') is returned by the staged evaluator 3. The current context ctx is replaced with Context ctx' which means that it is used as the current context for the next definition. The resulting graph is updated by the staged evaluator 3.

The method interpret () of a lambda node L does the following: First, it creates a new lambda node /L/ for the vectorized version of this lambda.. For each parameter symbol of lambda L, it adds a new parameter symbol to /L/ and remembers this correspondence in the current context ctx. Further, it calculates the body of.. the lambda L by . building a list of definitions that depend on parameter symbols. The definitions from the body of the lambda are topologically sorted with respect to dependencies represented by edges of the graph. This means that each definition is interpreted after all the definitions it depends on. Further, for each definition body of the lambda, the graph interpreter 2 calls Interpret () method and updates the current context ctx. The resulting graph is updated by the staged evaluator 3. Further, after the traversal of the lambda body has been done, the graph interpreter 2 retrieves the result symbol of /L/ from the context . This is a symbol from the resulting graph which corresponds to the result symbol of lambda L . After all, the components of /L/ (parameters, body and result symbols) have been calculated /L/ is added to the resulting graph and the L -→ /L/ pair is added to the context ctx.

The staged evaluator 3 forms a component of the vectorization apparatus 1. The staged evaluator 3 is called by the graph interpreter 2 to process a single graph node of the original graph. It can be implemented by using a staged evaluation method as well as vectorization rules, VR, and optimization rules, OR, to transform the original graph of the source program, P, to the resulting graph IR 1 of the vectorized version of the source program, P.

Given a symbol from the original graph and the current context (sym, ctx) , the staged evaluator 3 can execute the vectorization rule, VR, and apply optimization rules, OR, and produce a new pair (sym', ctx') , wherein sym' is a symbol from the resulting graph that corresponds to the symbol sym. The behavior of the staged evaluator 3 of the vectorizatiori apparatus 1 is illustrated in Fig. 8.

Fig. 8 shows on the left a pseudocode with the following meaning :

Line 1: Each node of the original graph is given by its symbol (sym) and the current context ctx is given by a second argument.

Line 2 : Retrieve the node of the original graph by its symbol. The result is an operation and a list of argument symbols that represent incoming edges of the graph.

Line 3: Map arguments to their vectorizations by using the context.

Line 4: Select a vectorization rule, VR, which is configured for the current operation (OpNode) . The rule VR is a staged function which is implemented using the staged evaluation method. The rule VR can be a function from symbols to symbol.

Line 5: The selected rule VR is used as a function and applied to the vectorized arguments. The result (sym') is a symbol that represents a result of a vectorization of a give operation.

Line 6: Pair sym → sym' is added to the context.

Each vectorization rule, VR, transforms a single graph node (primitive operation) of a particular type in a given context. The rule set which is complete contains one rule specification for each primitive operation. These specifications can be used to configure the staged evaluator 3 to vectorize the nodes of the source program, P, as illustrated in Fig. 4. In a possible embodiment, for each primitive operation of the graph-based intermediate representation, IR, a vectorization rule, VR, is specified using a vectorization rule template, VRT. Fig. 9 shows a vectorization rule template, VRT, and an exemplary specification. .

The result of vectorization is specified using current context. This means that for each argument of the original operation the context introduces (declares) names for vectorizations of these arguments. These names (see vl, ... vN in Fig. 9) can be used in output term (or right-hand- side of the rule) as it is shown in the example.

The language that is used here to define <output-term> component of Vectorization Rule Template is the same language that is used for original program. In order to execute vectorization Rules this <output-term> parts should be staged by using staged evaluation method. This is a key point of implementation. When Staged Evaluator is configured with a set of rules by using rule specifications, each rule specification is staged and kept as some persistent store for later reuse. The ' result of staging is a staged function which is created for each rule specification ' . Staged functions are stored in a look-up table with an operation type as a key. Line 4 from the pseudocode in Fig. 9 retrieves a staged function from the look-up table.

In a possible embodiment, the staged evaluator 3 applies during the execution of the selected vectorization rule,

VR, an optimization rule, OR, to transform the original graph-based intermediate representation, IR, of the source program, P, to the resulting graph-based intermediate representation, IR', of the vectorized version /P/ of the source program, P.

Optimization rules, OR, are used to simplify the graph which is produced by the staged evaluator 3. Each optimization rule, OR, can recognize a particular combination of graph nodes (sub-graph) and can specify how this sub-graph can be replaced by another more simple subgraph . Fig. 10 shows an optimization rule template, ORT, and a corresponding exemplary specification.

The <pattern> component of the optimization rule, OR, specifies which sub-graph of this optimization rule should recognize. It also introduces or binds some variables that can be used in <output-term> component. The pattern component can be implemented using existing methods of pattern compilation. Such a sub-graph is recognized by the variables of the pattern bound to symbols of the sub-graph. Each variable can hold a symbol of a node.

The <output-term> component is analogous to the output term of the vectorization rule, VR. When this <output-term> is staged, it becomes a staged function. The variables that are bound in <pattern> are used as arguments when the function is called. When the function is executed, it produces new graph nodes which are added to the resulting graph .

To implement optimization rule, OR, in the staged evaluator 3, one can extend the method of staged evaluation. One can change one place in the implementation of the staged evaluation. It is possible to add a rewriting step in the method which is called to add each new node to the graph. Such an extension is illustrated in Fig. 11. A function rewrite is provided to apply optimization rules to a sub-graph denoted by newSym. If there is no matching rule, then rewrite returns newSym, otherwise the rule is executed (as a staged function) resulting in a new subgraph and a new symbol .

Fig. 12 shows a flowchart of a possible embodiment of a method for constructing a vectorized version /P/ of a source program, P, according to the first aspect of the present invention.

In a first step SI, the graph-based intermediate representation IR, is supplied to a graph interpreter 2 as illustrated in Fig. 4 which traverses nodes of said graph- based intermediate representation, IR, to retrieve for each node a current operation and to maintain and update a current context representing along with the current operation the state of the graph interpreter 2 during the graph traversal.

In a further step S2, a staged evaluator 3 is. called by the graph interpreter 2 for each traverse node and the staged evaluator 3 is supplied with the current operation by using a symbol of the current operation and the current context.

In a further step S3, the staged evaluator 3 vectorizes the current operation in the current context to provide a vectorized fragment, VF, returned by the staged evaluator 3 to the graph interpreter 2. The graph interpreter 2 updates the current context of the graph interpreter 2 as an association between a symbol of the current operation and a symbol of the vectorized fragment, VF, to generate a graph- based intermediate representation, IR' of the vectorized version /P/ of the source program, P.

To implement a vectorizer or vectorization apparatus 1, as illustrated in Fig. 4, or a method as illustrated in Fig. 12 one can proceed as follows:

First, a graph-based intermediate representation, IR, is provided. Then, the graph interpreter 3 is implemented using an interpreter design pattern. Further, a staged evaluator 3 of the vectorization apparatus 1 is implemented using a method of staged evaluation as described in "The method for constructing intermediate representation of a program using staged evaluation of the program in an object-oriented language" ( 83706387PCT01) .

Further, vectorization rules, VR, can be specified for the primitive operation of the graph-based IR using for instance a vectorization rule template, VRT. The vectorization rules, VR, can be implemented using a staged evaluation method and the staged evaluator 3 of the vectorization apparatus 1 is configured to use specific rules in the process of vectorization. Further, optimization rules, OR, are implemented using a staged evaluation method and by configuring the staged evaluator 3 of the vectorization apparatus 1 to use specified rules in the process of vectorization. Finally, the vectorization apparatus 1 is embedded as a separate stage of the compiler or as one of the transformations of an existing stage. The method and apparatus according to the present invention provides an automatic construction of vectorized versions of any source program, P, which can be represented by a graph-based intermediate representation, IR, thus allowing a direct usage of a graph-based intermediate representation, IR, of the source program, P, and the resulting vectorized version /P/. A coat bloating is avoided by fusing vectorization and optimization phases of the compiler while keeping vectorization rules, VR, and optimization rules, OR, independent of each other.

The vectorization method is independent or abstracted from a particular set of primitive operations that are used in the source program, P, and in the nodes of the program graph which represents the source program, P. This is achieved by performing the vectorization according to the vectorization rules, VR. The graph construction and transformation is simplified by using staged evaluation methods .

The method and apparatus can provide different effects ' .

One effect is the direct manipulation of graph-based intermediate representation, IR, for deeper optimizations. The vectorization method and apparatus can directly be applied to programs represented by a graph-based intermediate representation, IR. The choice of graphs can be dictated by the fact that in many cases graph-based intermediate representation, IR, is better suited for optimization and lead to a higher performance of the generated program.

A further effect of the method and apparatus according to the present invention lies in the independence from language including DSLs and primitive operations. The vectorization method and apparatus is abstracted, i.e. made independent, from concrete primitive operations used in the intermediate representation, IR, which makes it applicable " in many different contexts and different programming languages as well as DSLs . This is achieved by allowing configuration of the vectorization method and apparatus using vectorization rules, VR, and optimization rules, OR. A further effect of the method and apparatus according to the present invention is its simplicity of configuration and embedding it into a compiler. Graph transformations are typically harder to be implemented. The method and apparatus according to the present invention are based on the method of staged evaluation which greatly simplifies construction and transformation of graphs by supporting compact and declarative specification of vectorization rules, VR, and optimization rules, OR. The method and apparatus 1 have " as a further effect a faster compilation and resulting program code. The fusion of vectorization and optimization stages has the effect of increasing the speed of compilation. The resulting vectorized code is usually more compact and hence faster.

Further, vectorization can be performed without optimization rules, OR, i.e., a set of optimization rules, OR, can also be empty. The vectorization method according to the present invention can be used independently from a full function vectorization method. Also, the method can be used for an implementation of a replication procedure of a nested data parallelism and is not limited to this domain. The method can be applied in other domains by selecting different bases of primitive operations and vectorization rules, VR. The method for constructing a vectorized version /P/ of a source program, P, represented by a graph-based intermediate representation, IR, does include the use of a graph-based intermediate representation, IR, to represent source programs, P, wherein the graph nodes represent operations and graph edges represent information flows. The vectorization apparatus 1 performing the vectorization method can be configured' with vectorization rules, VR, and optimization rules, OR. The vectorization rules, VR, can be specified independently for each primitive operation in the graph-based intermediate representation using for instance a vectorization rule template, VRT. Further, optimization rules, OR, can be specified independently from the vectorization rules, VR.

A graph interpreter such as graph interpreter 2 illustrated in Fig. 4 can be used to traverse nodes of the graph to produce a pair comprising operation context for each graph node. During graph traversal, for each node a pair (Operation, Context) is produced and this pair is submitted to the staged evaluator 3. During processing of each pair (Operation, Context) in the staged . evaluator 3, one of the configured vectorization rules, VR, for the given operation can be selected and then the specified operation is vectorized in the given context to produce a pair: (/Operation/, Context') . During executing a selected vectorization rule, VR, using the method of staged evaluation for each new node of the resulting vectorized graph, it is possible to apply optimization rules, OR. During graph traversal by the graph interpreter 2, one can use a context data structure to maintain a mapping between symbols of input and a resulting graph to assemble the resulting graph. The vectorization task performed by the method and apparatus 1 according to the present invention can be formalized as follows:

For a given source program, P, in a graph-based intermediate representation, IR, this program, P, is transformed to a vectorized version /P/ in the same, graph- based intermediate representation so that the following holds :

If P takes values of type T as its input and produces values of type R as its output (i.e. P:T → R) , then the vectorized version of the program, P, is . a program /P/ :Array [T]→Array [R] such that if ys=/P/ (xs) , then ys [i] =P (xs [i] ) for all i=0,..., length (xs) -1 ,

wherein T and R are arbitrary types supported by the intermediate representation, IR, and wherein Array [T] and Array [R] are arrays of elements of type T and R, respectively.

Both programs, i.e. the original source program, P, and the vectorized version /P/ of the program, P, can be represented by the same graph-based intermediate representation, IR.