Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHOD AND SYSTEM FOR TRANSFORMING A SOURCE CODE INTO A TARGET CODE ON A COMPUTER
Document Type and Number:
WIPO Patent Application WO/2017/065631
Kind Code:
A1
Abstract:
The disclosure relates to a method (200) for transforming a source code into a target code on a computer, wherein properties of the source code are determined by a source code type system comprising a collection of rules that apply to constructs of the source code, the method comprising: defining (201) a type wrapper for importing a type of a target code type system to the source code type system; virtualizing (202) the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode; concretizing (203) the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode; extending (204) the intermediate representation of the source code with new objects associated with new node types of the intermediate representation; and generating (205) the target code based on the extended intermediate representation of the source code.

Inventors:
SLESARENKO ALEXANDER VLADIMIROVICH (CN)
GEKK MAXIM VIKTOROVICH (CN)
Application Number:
PCT/RU2015/000673
Publication Date:
April 20, 2017
Filing Date:
October 15, 2015
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HUAWEI TECH CO LTD (CN)
International Classes:
G06F9/45; G06F9/44
Foreign References:
US20050149914A12005-07-07
Other References:
ALEXANDER SLESARENKO: "Lightweight Polytypic Staging: a new approach to Nested Data Parallelism in Scala", PROCEEDINGS OF THE 3RD ANNUAL SCALA WORKSHOP, APRIL 17-18, 2012, LONDON, UK, 17 April 2012 (2012-04-17), pages 1 - 12, XP055189311, Retrieved from the Internet [retrieved on 20150513]
ALEXANDER SLESARENKO ET AL: "First-class isomorphic specialization by staged evaluation", GENERIC PROGRAMMING, ACM, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 26 August 2014 (2014-08-26), pages 35 - 46, XP058054974, ISBN: 978-1-4503-3042-8, DOI: 10.1145/2633628.2633632
ARVIND K SUJEETH ET AL: "Composition and Reuse with Compiled Domain-Specific Languages", 1 July 2013, ECOOP 2013 OBJECT-ORIENTED PROGRAMMING, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 52 - 78, ISBN: 978-3-642-39037-1, XP047033428
ARVIND K. SUJEETH ET AL: "Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages", ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, vol. 13, no. 4s, 1 July 2014 (2014-07-01), pages 1 - 25, XP055187083, ISSN: 1539-9087, DOI: 10.1145/2584665
KEVIN J. BROWN; ARVIND K. SUJEETH; HYOUK JOONG LEE; TIARK ROMPF; HASSAN CHAFI; MARTIN ODERSKY; KUNLE OLUKOTUN: "A Heterogeneous Parallel Framework for Domain-Specific Languages", PACT, 2011, pages 89 - 100, XP032083291, DOI: doi:10.1109/PACT.2011.15
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD. (RU)
Download PDF:
Claims:
CLAIMS:

1. A method (200) for transforming a source code into a target code on a computer, wherein properties of the source code are determined by a source code type system comprising a collection of rules that apply to constructs of the source code, the method comprising: defining (201 ) a type wrapper for importing a type of a target code type system to the source code type system; virtualizing (202) the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode; concretizing (203) the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode; extending (204) the intermediate representation of the source code with new objects associated with new node types of the intermediate representation; and generating (205) the target code based on the extended intermediate

representation of the source code.

2. The method (200) of claim 1 , wherein the source code type system is a Domain Specific Language type system or a general purpose language type system.

3. The method (200) of claim 1 or 2, wherein virtualizing (202) the type wrapper is based on polymorphic embedding techniques.

4. The method (200) of one of the preceding claims, wherein the at least one evaluation mode is one of a standard evaluation mode or a staged evaluation mode.

5. The method (200) of claim 4, comprising: concretizing (203) the virtualized representation of the type wrapper to one of the standard evaluation mode or the staged evaluation mode.

6. The method (200) of claim 4 or 5, wherein the evaluation of the virtualized representation in the staged evaluation mode produces an intermediate representation of the source code.

7. The method (200) of claim 6, wherein the intermediate representation of the source code is a graph-based intermediate representation.

8. The method (200) of one of the preceding claims, wherein the new objects associated with the staged evaluation mode represent new types of nodes of the intermediate representation of the source code.

9. The method (200) of claim 8, wherein the new objects represent new method calls for calling methods of the external types of the target code type system. 10. The method (200) of claim 9, wherein the new objects representing new method calls comprise the following parameters: a reference represented by a symbol to an element of the intermediate representation of the source code, a name of a called method in the types of the source code type system, a list of method arguments represented by some symbols.

1 1. The method (200) of one of claims 8 to 10, wherein the new objects represent new instances of the types of the source code type system. 12. The method (200) of claim 1 1 , wherein the new objects comprise meta information of the types and arguments required for construction of the instances of the types.

13. A system (300) for transforming a source code into a target code on a computer, wherein properties of the source code are determined by a source code type system comprising a collection of rules that apply to constructs of the source code, the compiler system comprising: a type wrapper subsystem (301 ) configured to define a type wrapper for importing a type of a target code type system to the source code type system; a virtualization subsystem (302) configured to virtualize the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode; a concretization subsystem (303) configured to concretize the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode; an extension subsystem (304) configured to extend the intermediate

representation of the source code with new objects associated with new node types of the intermediate representation; and a code generation subsystem (305) configured to generate the target code based on the extended intermediate representation of the source code.

14. The system (300) of claim 13,

wherein the at least one evaluation mode is one of a standard evaluation mode or a staged evaluation mode.

15. The system (300) of claim 13 or 14, wherein the concretization subsystem (303) is configured to concretize the virtualized representation of the type wrapper to the standard evaluation mode.

Description:
Method and system for transforming a source code into a target code on a computer

TECHNICAL FIELD

The present disclosure relates to a method and a system for transforming a source code into a target code on a computer, in particular by a compiler using staged type wrappers. The present disclosure further relates to the field of software engineering and the sub-field of domain specific languages (DSL).

BACKGROUND

Large scale data processing introduces new challenges for processing of unstructured data. Distributed systems like Hadoop® and Spark™ address the problem by providing common API and general algorithms for distributed data processing. Due to the nature of large scale data, time processing of data becomes very important. Existing approaches offer either micro optimization or manual tuning of internal general mechanisms.

Compiled Domain Specific Languages (DSLs) is a promising approach to raise a level of programming and achieve high performance at the same time, e.g. as illustrated by "Kevin J. Brown, Arvind K. Sujeeth, Hyouk Joong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. A Heterogeneous Parallel Framework for Domain-Specific Languages. In PACT, pages 89-100, 2011". Flexibility of DSLs is achieved by extending of Domain Type System. Usually, new types are imported to DSL by wrapping of external types. Type wrappers are standard practice of code importing. Most of the modern languages, like C# and Java use type wrappers for importing external types from other languages like C++ to evaluate the types in a standard mode.

Application developers want to write algorithms only once using high-level domain-specific languages. Program code written by the developer may look like the illustration of Fig. 1 (shown in a Scala language), where user defined types are applied for finding the maximum value of f(x) in this example program code. The "Bound" type of the program code of Fig. 1 that is written in a domain-specific language can be implemented by using one of the following DSL types, for example: Scala's Collection Types Seq[Double], Apache Spark's Types RDD[Double] or User-defined types Line. There is a need for program developers for reusing external types and libraries from one DSL where the program code is written to a second DSL in which the same types should be used.

SUMMARY

It is the object of the invention to provide a concept for efficiently reusing types originally defined in one program language in a second program language.

This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures. A basic idea of the invention is to apply wrapping of internal components of distributed data processing systems and to perform internal algorithm specialization based on actual processing data. Although staged type wrappers play an important role in this concept, standard type wrappers can be used as well. In order to describe the invention in detail, the following terms, abbreviations and notations will be used:

DSL: Domain Specific Language

IR: Intermediate Representation

API: Application Programming Interface

AST: Abstract Syntax Tree

In the following, systems, devices and methods for transforming a source code into a target code on a computer are described, which apply type wrappers for importing types of a target code type system to a type system of source code, also referred to as source code type system, e.g. a DSL type system.

A source code or source program is a computer program that is used as an input to a compiler which is translated into executable machine code. A compiler is a computer program that takes source code of source program and produces executable machine code. Executable machine code is a sequence of machine code instructions to be executed by a processor to perform a given task. Alternatively, a source code may be used as input to an interpreter. A domain-specific language (DSL) is a computer language specialized to a particular application domain.

Systems, devices and methods as described in the following may be used to transform a source code, e.g. written in Scala, Java, C++ into a target code. Scala is a programming language for general software applications fully supporting functional programming and a very strong static type system. This allows programs written in Scala to be very concise and therefore smaller in size than other general-purpose programming languages. Scala source code may be compiled to Java bytecode, so that the resulting executable code may run on a Java virtual machine. Java libraries may be directly used in Scala code and vice versa. Scala has an advanced type system supporting algebraic data types, covariance and contra-variance, higher-order types, and anonymous types as well as operator overloading, optional parameters, named parameters and raw strings.

Systems, devices and methods according to the disclosure may produce IR

representations of the source code. An intermediate representation (IR) is a data structure that is used inside a compiler for representing the source program and allows analysis and transformation before outputting the executable machine code. Typically this is a graph or a tree data structure with some specific information inside nodes. As the intermediate representation contains all the information needed to evaluate the program (if given the input data), we can say about the evaluation of the intermediate representation as an equivalent way to execute the program. The IR may use a tree data structure that is a data structure that simulates a hierarchical tree structure with a set of linked nodes.

Systems, devices and methods according to the disclosure may use staged code. Staged code for a source program P is a program P' such that evaluation of P' produces intermediate representation, which is semantically equivalent to the program P.

Systems, devices and methods according to the disclosure may apply staged evaluation. Staged evaluation is evaluation of a program that is separated into two stages: 1 ) IR generation, 2) IR execution. In staged evaluation, instead of resulting data value, the result of executing the program is an IR.

Systems, devices and methods as described in the following may be based on and evaluation mode, in particular staged evaluation mode or standard evaluation mode. In the context of this disclosure, an evaluation mode is either staged evaluation mode or standard evaluation mode. Staged evaluation mode is one of compiler modes when it performs initial steps of staged evaluation of source program. Standard evaluation mode is one of compiler modes when it handles source program according to operational semantic of programming language.

Systems, devices and methods as described in the following may apply polymorphic embedding techniques. In polymorphic embedding of DSLs, many different interpretations of a DSL can be provided as reusable components. Polymorphic embedding may be realized in the programming language Scala, for example. With polymorphic embedding, the static type-safety, modularity, composability and rapid prototyping of pure embedding are reconciled with the flexibility attainable by external toolchains.

Systems, devices and methods as described in the following may use type wrappers, type systems and virtualized representations. In programming languages, a type system is a collection of rules that assign a property called a type to the various construct, such as variables, expressions, functions or modules that a computer program is composed of. A type wrapper is a type of an internal type system which represents a type of another type system. A virtualized representation is a view on an instance of a type during program evaluation. Virtualization is the next level of abstraction when representation is hidden behind other elements of the level.

According to a first aspect, the invention relates to a method for transforming a source code into a target code on a computer, wherein properties of the source code are determined by a source code type system comprising a collection of rules that apply to constructs of the source code, the method comprising: defining a type wrapper for importing a type of a target code type system to the source code type system; virtualizing the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode; concretizing the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode; extending the intermediate representation of the source code with new objects associated with new node types of the intermediate representation; and generating the target code based on the extended intermediate representation of the source code.

Such a method provides the advantage that types originally defined in one program language, the source code, can be efficiently reused in a second program language when using the type wrapper. External types and libraries from the source code can be transformed to the source code type system where they may be reused by application developers.

In a first possible implementation form of the method according to the first aspect, the source code type system is a Domain Specific Language type system or a general purpose language type system.

This provides the advantage that the transformation method can be applied to a Domain Specific programming language as well as to a general purpose programming language. Thus, types have to be defined only once as they can be transformed to any other programming language by using such method.

In a second possible implementation form of the method according to the first aspect as such or according to the first implementation form of the first aspect, virtualizing the type wrapper is based on polymorphic embedding techniques.

This provides the advantage that many different interpretations of a DSL can be provided as reusable components. With such method based on polymorphic embedding, the static type-safety, modularity, composability and rapid prototyping of pure embedding are reconciled with the flexibility attainable by external tool chains.

In a third possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the at least one evaluation mode is one of a standard evaluation mode or a staged evaluation mode. Such a method provides the advantage that it may be applied in both evaluation modes, i.e. staged evaluation mode and standard evaluation mode.

In a fourth possible implementation form of the method according to the third

implementation form of the first aspect, the method comprises concretizing the virtualized representation of the type wrapper to one of the standard evaluation mode or the staged evaluation mode.

This provides the advantage that the method can be flexibly applied to both evaluation modes. When standard evaluation mode is applied the method may handle source program according to operational semantics of the programming language. When staged evaluation mode is used, the method may perform initial step of stage evaluation of the source program. In a fifth possible implementation form of the method according to the third implementation form or the fourth implementation form of the first aspect the evaluation of the virtualized representation in the staged evaluation mode produces an intermediate representation of the source code. This provides the advantage that the IR can be used for representing the source program allowing analysis and transformation before outputting the executable machine code. As the intermediate representation contains all the information needed to evaluate the program, evaluation of the intermediate representation is an equivalent way to execute the program.

In a sixth possible implementation form of the method according to the fifth

implementation form of the first aspect the intermediate representation of the source code is a graph-based intermediate representation. This provides the advantage that the graph-based IR can be used to store specific information inside the nodes. Thus, the graph-based IR contains all the information needed to evaluate the program. Hence, such a method can be efficiently performed, yet providing enough flexibility for the application developer. In a seventh possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the new objects associated with the staged evaluation mode represent new types of nodes of the intermediate representation of the source code.

By using these new objects new node types can be implemented for the IR providing an increased flexibility and adaptability.

In an eighth possible implementation form of the method according to the seventh implementation form of the first aspect the new objects represent new method calls for calling methods of the external types of the target code type system.

By using these new objects new method calls can be implemented providing an increased flexibility and adaptability.

In a ninth possible implementation form of the method according to the eighth

implementation form of the first aspect the new objects representing new method calls comprise the following parameters: a reference represented by a symbol to an element of the intermediate representation of the source code, a name of a called method in the types of the source code type system, a list of method arguments represented by some symbols.

By using such new method call parameters, an efficient transformation of programming constructs from a first programming language to a second programming language can be applied.

In an tenth possible implementation form of the method according to any one of the seventh to the ninth implementation forms of the first aspect the new objects represent new instances of the types of the source code type system.

By using such new objects, new instances of the types of the type source code type system can be created providing a higher flexibility to the application developer. In an eleventh possible implementation form of the method according to the tenth implementation form of the first aspect the new objects comprise Meta information of the types and arguments required for construction of the instances of the types. This provides the advantage that by describing the contents and context of the types and arguments required for the construction of the instances of the types, Meta information increases the usefulness of such source code transforming. Meta information facilitates in the discovery of relevant information and helps organizing resources and archiving and preservation of the resources. Meta information assists in resource discovery by allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information.

According to a second aspect, the invention relates to a system for transforming a source code into a target code on a computer, wherein properties of the source code are determined by a source code type system comprising a collection of rules that apply to constructs of the source code, the compiler system comprising: a type wrapper subsystem configured to define a type wrapper for importing a type of a target code type system to the source code type system; a virtualization subsystem configured to virtualize the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode; a concretization subsystem configured to concretize the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode; an extension subsystem configured to extend the intermediate representation of the source code with new objects associated with new node types of the intermediate representation; and a code generation subsystem configured to generate the target code based on the extended intermediate representation of the source code.

Such a system provides the advantage that types originally defined in one program language, the source code, can be efficiently reused in a second program language when using the type wrapper subsystem. External types and libraries from the source code can be transformed to the source code type system where they may be reused by application developers. In a first possible implementation form of the system according to the second aspect, the at least one evaluation mode is one of a standard evaluation mode or a staged evaluation mode. Such a system provides the advantage that it may operate in both evaluation modes, i.e. staged evaluation mode and standard evaluation mode.

In a second possible implementation form of the system according to the second aspect as such or according to the first implementation form of the second aspect, the concretization subsystem is configured to concretize the virtualized representation of the type wrapper in the staged evaluation mode.

This provides the advantage that the system can produce an intermediate representation of the source code, e.g. a graph based IR, and can flexibly create new objects in the IR.

According to a third aspect, the inventions relates to a stage type wrapper method, comprising the following steps: Define a type wrapper using an interface declaration with semantic annotations; Virtualize the type wrapper using a method of staged evaluation; Concretize the type wrapper using polymorphic embedding technique; Extend

Intermediate Representation with special MethodCall and NewObject nodes; and Use standard code generation techniques.

This provides the advantage that the type wrapper, that is, a type of an internal type system may be used to represent a type of another type system. Hence, the type wrapper method allows to write code in one programming language and using the same code in a second programming language without rewriting the code.

BRIEF DESCRIPTION OF THE DRAWINGS

Further embodiments of the invention will be described with respect to the following figures, in which:

Fig. 1 shows a schematic diagram illustrating a program code 100 in a domain specific language including a user defined type; Fig. 2 shows a schematic diagram illustrating a method 200 for transforming a source code into a target code on a computer according to an implementation form; Fig. 3 shows a schematic diagram illustrating a system 300 for transforming a source code into a target code on a computer according to an implementation form;

Fig. 4 shows a schematic diagram illustrating a method 400 for transforming a source code into a target code on a computer according to an implementation form;

Fig. 5 shows a schematic diagram illustrating an example of an external type 500 according to step 0 of the method 400 depicted in Fig. 4;

Fig. 6a shows a schematic diagram illustrating an example of a type wrapper definition 600a according to step 1 of the method 400 depicted in Fig. 4;

Fig. 6b shows a schematic diagram illustrating an example of a virtualization 600b of a type wrapper according to step 2 of the method 400 depicted in Fig. 4; Fig. 7 shows a schematic diagram illustrating two exemplary concretizations 700 of a type wrapper, i.e., a standard concretization 701 and a staged concretization 702 according to step 3 of the method 400 depicted in Fig. 4;

Fig. 8a and 8b show schematic diagrams illustrating an exemplary program code for a method call 800a and a new object 800b according to step 4 of the method 400 depicted in Fig. 4;

Fig. 9a and 9b show schematic diagrams illustrating an exemplary program code for implementing a new IR node 900a based on the external method call 800a according to Fig. 8a and a new IR node 900b based on external instance creation 800b according to Fig. 8b;

Fig. 10 shows a schematic diagram illustrating an example of staged evaluation 1000 according to step 4 of the method 400 depicted in Fig. 4; Fig. 1 1 shows a schematic diagram illustrating a code generation example 1 100 according to step 5 of the method 400 depicted in Fig. 4; and

Fig. 12 shows a schematic diagram illustrating an example of an optimization according to step 4a of the method 400 depicted in Fig. 4.

DETAILED DESCRIPTION OF EMBODIMENTS In the following detailed description, reference is made to the accompanying drawings, which form a part thereof, and in which is shown by way of illustration specific aspects in which the disclosure may be practiced. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

It is understood that comments made in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise. Fig. 2 shows a schematic diagram illustrating a method 200 for transforming a source code into a target code on a computer according to an implementation form. Properties of the source code are determined by a source code type system including a collection of rules that apply to constructs of the source code. The method 200 includes defining 201 a type wrapper for importing a type of a target code type system to the source code type system. The method 200 includes virtualizing 202 the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode, e.g. a standard evaluation mode or a staged evaluation mode. The method 200 includes concretizing 203 the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode. The method 200 includes extending 204 the intermediate representation of the source code with new objects associated with new node types of the intermediate representation. The method 200 includes generating 205 the target code based on the extended intermediate representation of the source code.

In one example, the source code type system may be a Domain Specific Language type system. In another example, the source code type system may be a general purpose language type system.

The virtualizing 202 of the type wrapper may be based on polymorphic embedding techniques. The at least one evaluation mode may be a standard evaluation mode or a staged evaluation mode. The method 200 may further include concretizing 203 the virtualized representation of the type wrapper to one of the standard evaluation mode or the staged evaluation mode.

The evaluation of the virtualized representation in the staged evaluation mode produces an intermediate representation of the source code. The intermediate representation of the source code is a graph-based intermediate representation.

The new objects associated with the staged evaluation mode may represent new types of nodes of the intermediate representation of the source code. The new objects may represent new method calls for calling methods of the external types of the target code type system. The new objects representing new method calls may include the following parameters: a reference (represented by some symbol) to an element of the intermediate representation of the source code, a name of a called method in the types of the source code type system, a list of method arguments represented by some symbols.

The new objects may represent new instances of the types of the source code type system. The new objects may include meta-information of the types and arguments required for construction of the instances of the types.

Fig. 3 shows a schematic diagram illustrating a system 300 for transforming a source code into a target code on a computer according to an implementation form. Properties of the source code are determined by a source code type system including a collection of rules that apply to constructs of the source code.

The system 300 includes a type wrapper subsystem 301 configured to define a type wrapper for importing a type of a target code type system to the source code type system. The system 300 includes a virtualization subsystem 302 configured to virtualize the type wrapper to a virtualized representation for an evaluation of the virtual representation in at least one evaluation mode. The system 300 includes a concretization subsystem 303 configured to concretize the virtualized representation of the type wrapper to a particular evaluation mode of the at least one evaluation mode to provide an intermediate representation of the source code based on the particular evaluation mode. The system 300 includes an extension subsystem 304 configured to extend the intermediate representation of the source code with new objects associated with new node types of the intermediate representation. The system 300 includes a code generation subsystem 305 configured to generate the target code based on the extended intermediate representation of the source code.

The at least one evaluation mode may be a standard evaluation mode or a staged evaluation mode. The concretization subsystem 303 may concretize the virtualized representation of the type wrapper in the staged evaluation mode or alternatively in the standard evaluation mode.

The type wrapper subsystem 301 may perform the first step, i.e. defining 201 a type wrapper as described above with respect to Fig. 2. The virtualization subsystem 302 may perform the second step, i.e. virtualizing 202 the type wrapper to a virtualized

representation as described above with respect to Fig. 2. The concretization subsystem 303 may perform the third step, i.e. concretizing 203 the virtualized representation of the type wrapper as described above with respect to Fig. 2. The extension subsystem 304 may perform the fourth step, i.e. extending 204 the intermediate representation as described above with respect to Fig. 2. The code generation subsystem 305 may perform the fifth step, i.e. generating 205 the target code as described above with respect to Fig. 2.

Fig. 4 shows a schematic diagram illustrating a method 400 for transforming a source code into a target code on a computer according to an implementation form. The method 400 includes the following steps: 1 ) Define a type wrapper 402 using an interface declaration with semantic annotations. 2) Virtualize the wrapper 403 using the method of staged evaluation. 3) Concretize the type wrapper 404 using polymorphic embedding technique. 4) Extend prior art Intermediate Representation 405 with two special nodes: MethodCall and NewObject. 4a) Optionally, IR optimization 406. 5) Use standard (prior art) code generation techniques 407.

The method can be represented as linear process which consists of five main steps 402, 403, 404, 405, 407 and initial step of preparation 401. Exemplary implementations of these steps are described below with respect to Figures 5 to 12.

The first step 402 may correspond to the first step, i.e. defining 201 a type wrapper as described above with respect to Fig. 2. The second step 403 may correspond to the second step, i.e. virtualizing 202 the type wrapper to a virtualized representation as described above with respect to Fig. 2. The third step 404 may correspond to the third step, i.e. concretizing 203 the virtualized representation of the type wrapper as described above with respect to Fig. 2. The fourth step 405 may correspond to the fourth step, i.e. extending 204 the intermediate representation as described above with respect to Fig. 2. The fifth step 407 may correspond to the fifth step, i.e. generating 205 the target code as described above with respect to Fig. 2.

Fig. 5 shows a schematic diagram illustrating an example of an external type 500 according to step 0 of the method 400 depicted in Fig. 4. The program code depicted in Fig. 5 shows an example of a Scala type 500 that may be reused in application code or in user-defined types. In this example, the external type is restricted to provide the semantic of a class.

Many of existing types can be reused in an application code or in a new DSL. To be reused in DSLs the types have to provide semantic of a class. It means that most Scala classes can be reused in the compiler, for instance. Besides of Scala's types, there are options of importing of C++ and Java classes. The example below demonstrates Scala class and its companion object that could be identified as a type for importing into DSL. Fig. 6a shows a schematic diagram illustrating an example of a type wrapper definition 600a according to step 1 of the method 400 depicted in Fig. 4. The construct Seq[A] denotes the external type from Scala's collections which shall be imported to DSL. The construct SSeq[A] denotes the new name of the imported type in the DSL.

For the importing type, a new non-conflict name should be chosen. Inside DSL, the external type cannot be used directly. It is available through a type wrapper which has a distinct name. Any definition of a type wrapper adds a new type to the DSL. The definition binds the external type to internal type (e.g. DSL type). Fig. 6a shows the binding of external type Seq[A] and DSL type SSeq[A]. The type wrapper provides access to wrapped value via special field or method. Fig. 6a shows the wrappedValue method which provides such access. Exposed methods from original type may be marked by language specific way. Fig. 6a demonstrates one of the ways via the ©External annotation. Other properties of the original type (container property, for example) should be defined in the type wrapper in any implementation specific way. On Scala and Java languages, annotations can be used for that. The example shows such annotations: @ContainerType and ©Semantics. Fig. 6b shows a schematic diagram illustrating an example of a virtualization 600b of a type wrapper according to step 2 of the method 400 depicted in Fig. 4. The construct "Rep" denotes the virtualizing of the imported type by wrapping of its value.

Signatures of all methods and fields of type wrappers should be transformed to virtualized representation. The virtualization process is wrapping of a type (T) by Rep[T] and wrapping of literal constants by using a polymorphic embedding technique.

Wrapping of types/constants by Rep[]/toRep() is needed to evaluate the wrapped type in staged mode. Evaluation in the staged mode means that instead of producing data values, the result of execution is intermediate representation of the source program. The result of staged evaluation can be input to compilation pipeline, which is usually has many steps. On the first step, evaluation of the program could produce some graph-based IR. On second step, the IR is optimized and transformed to another IR. And on third step, the IR is interpreted according to some operational semantic and produces final value. Fig. 7 shows a schematic diagram illustrating two exemplary concretizations 700 of a type wrapper, i.e., a standard concretization 701 and a staged concretization 702 according to step 3 of the method 400 depicted in Fig. 4. The virtualized type wrapper can be evaluated in multiple modes. There are at least two kinds of evaluations which are standard evaluation and staged evaluation. In standard evaluation, a program with virtualized types 705 runs as usual according to the operational semantic of the host language. Wrapping of types/literals by Rep[]/toRep() is just ignored. Semantically, in standard evaluation, Rep[T] equals to T and toRep(C) equals to C. All existing methods of type importing work only in this mode of program evaluation and any of them can be used in the concretization 700 example shown in Fig. 7. In staged evaluation 702, Rep[T] carries another meaning. It's value is a symbol of type Sym[T]. The symbol can be imagined as an identifier of a node of graph representation of a program.

Fig. 7 shows how the external method size is implemented in both modes. In standard evaluation mode 701 , the size method is just an invocation of a method (the same name - size) of the wrapped value of the external type. In the case of staged evaluation 702, the size method creates a new object MethodCall by passing method's name.

In the following, an example SSeq(3.14).size is described and how it evaluates in standard mode:

- Due to SSeq is wrapper of standard Scala collection Seq, it is substituted by

StdSSeqlmpl(Seq(3.14)).size;

- De-sugaring of Seq(3.14) to scala. collection. Seq. apply(3.14);

- Substitution Seq by more concrete collection List: StdSSeqlmpl(List(3.14)).size;

- The method size is invoked through a reflection API on an object instance:

List(3.14).size;

- This last invocation results to a real evaluation of the last expression to 1.

Fig. 8a and 8b show schematic diagrams illustrating an exemplary program code for a method call 800a and a new object 800b according to step 4 of the method 400 depicted in Fig. 4. Fig. 8a and Fig. 8b show examples of NewObj and MathodCall in the context of implementation of type wrapper for staged mode. The concrete implementation of type wrapper for staged evaluation may require new objects. The objects represent new type of IR nodes, i.e. a new type associated with a method call and a new type associated with creation of new instance of the external type. Regarding the method call, i.e. calling of a method of the original external type, to construct the IR node, the following parameters should be passed to the constructor: A reference to an owner symbol that creates new element of IR. The name of called method in the original type; and a list of method arguments represented by some symbols. Regarding the new object creating of new instance of the external type, to create the NewObj instance, the creator of the IR node has to pass to the constructor, i.e. Meta information about the original type, and arguments needed for type construction.

Fig. 9a and 9b show schematic diagrams illustrating an exemplary program code for implementing a new IR node 900a based on the external method call 800a according to Fig. 8a and a new IR node 900b based on external instance creation 800b according to Fig. 8b.

Fig. 10 shows a schematic diagram illustrating an example of staged evaluation 1000 according to step 4 of the method 400 depicted in Fig. 4.

The example of Fig. 10 corresponds to the example described above with respect to Fig. 7 evaluated in standard mode. In the following it is described how this example

SSeq(3.14).size evaluates in staged mode. On the left side, the program code 1001 is represented, while on the right side, the transformation to IR 1010 is represented.

The constant literal 3.14 (reference sign 1002) is transformed to a new IR node

Const(3.14) (reference sign 1011) which has assigned symbol s34 (reference sign 1003), the number may be selected by compiler. The 3.14 constant is substituted by assigned symbol s34: SSeq(s34).size (reference sign 1003). The SSeq(s34) expression means creating of a new instance. And the original expression is replaced by s35.size (reference sign 1004) where s35 is NewObj(s34).size (reference sign 1012). The s35.size expression (reference sign 1004) is substituted by s36 (reference sign 1005) where s36 is

MethodCall(s34) (reference sign 1013). The result of the last step is some graph-based IR. On the next stages, the IR is transformed (optimized) and/or is interpreted.

Fig. 1 1 shows a schematic diagram illustrating a code generation example 1 100 according to step 5 of the method 400 depicted in Fig. 4.

Generation of target code may be performed by walking in graph-based IR 1 101 and producing target code 1 110. A configuration file 1 105 may be used to produce the target code 1 1 10. The code generation example 1 100 is a continuation of the example described above with respect to Fig. 10 for step 5 of the method 400.

The process starts from leaf. In particular, the IR node s34 is translated to val s34: Double = 3.14 (reference sign 1 102). The node NewObj(s34) (reference sign 1 103) is translated to a statement 1 1 10 which creates new instance of the Seq type: val s35 = new Seq(s34). The MethodCall(s35, "size") node (reference sign 1 104) becomes: val s36 = s35.size. Final result of the IR translation: s36.

Fig. 12 shows a schematic diagram illustrating an example of an optimization according to step 4a of the method 400 depicted in Fig. 4.

In an application 1204, e.g. a Spark application, a compiler performs the following compiler phases 1203: a parser 1205 parses an external type 1201 providing an abstract syntax tree (AST) representation 1206 to a META block 1207. The META block 1207 transforms the AST representation 1206 in a virtualized AST representation 1208 by using a configuration file 1202. A textual representation 1209 of the virtualized AST

representation 1208 is output to an optimized Spark application 1210.

The present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein, in particular the methods 200, 400 or the system 300 described above with respect to Figs. 2 to 4. Such a computer program product may include a readable storage medium storing program code thereon for use by a computer. The program code may perform the method 200, 400 as described above with respect to Fig. 2 and Figures 4 to 13 or the system 300 described above with respect to Fig. 3. While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations, such feature or aspect may be combined with one or more other features or aspects of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example" and "e.g." are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected", along with derivatives may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless whether they are in direct physical or electrical contact, or they are not in direct contact with each other. Although specific aspects have been illustrated and described herein, it will be

appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.

Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.