

Title:
FIELD SPECIALIZATION SYSTEMS AND METHODS FOR IMPROVING PROGRAM PERFORMANCE
Document Type and Number:
WIPO Patent Application WO/2016/161130
Kind Code:
A1
Abstract:
Systems and methods for improving the performance of a computer program, such as a database management system (DBMS), are provided. Such methods involve identifying invariant intervals for variables in the DBMS code, based on a Program Representation (PR). Program interactions within the DBMS and domain assertions are deduced, based on the PR and an Ecosystem Specification for the DBMS. One or more candidate snippets are identified, based on the invariant intervals for variables in the DBMS code, the PR, one or more execution summaries associated with the DBMS, the deduced program interactions and the deduced domain assertions. Spiffs are then generated, based on the one or more candidate snippets. Such spiffs include predicate query spiffs, hash-join query spiffs, aggregate spiffs, page spiffs, and string-matching spiffs. The DBMS code is modified based on the speccode generated from these spiffs.

Inventors:
SNODGRASS RICHARD T (US)
DEBRAY SAUMYA K (US)
ZHANG RUI (US)
THOMAS STEPHEN (US)
MASON SEAN (US)
Application Number:
PCT/US2016/025295
Publication Date:
October 06, 2016
Filing Date:
March 31, 2016
Assignee:
DATAWARE VENTURES LLC (US)
International Classes:
G06F9/45; G06F9/445; G06F12/00; G06F17/30
Foreign References:
US20140365533A12014-12-11
US20030200537A12003-10-23
US20080189277A12008-08-07
US20130054649A12013-02-28
Other References:
See also references of EP 3278218A4
Attorney, Agent or Firm:
SOLOWAY, Norman, P. et al. (4640 E. Skyline Drive, Tucson, Arizona, US)
Claims:
CLAIMS

What is claimed is:

1. A computer-implemented method for improving the performance of a computer program code, comprising:

identifying, based on a Program Representation (PR), that is, an abstract syntax tree or other embodiment of the computer program code, invariant intervals for variables in the computer program code;

deducing, based on the PR and an Ecosystem Specification for the computer program, program interactions within the computer program;

deducing, based on the PR, the identified invariant intervals for variables in the computer program code and the deduced program interactions, the domain assertions;

identifying, based on the invariant intervals for variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the deduced program interactions and the deduced domain assertions, one or more candidate snippets;

generating specialized computer program code, based on the one or more candidate snippets;

modifying the computer program code based on the generated specialized computer program code; and

concealing the specialized computer program code.

2. The computer-implemented method of claim 1 characterized by one or both of the following features:

(a) wherein the identified invariant intervals span multiple executions; and (b) wherein the identified invariant intervals comprise at least one set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node.

3. The computer-implemented method of claim 1 or claim 2, wherein each of the one or more candidate snippets comprises (a) an interval of code identified by the PR, or (b) a set of invariants and a set of possible values for each invariant.

4. The computer-implemented method of any of claims 1-3, wherein each of the one or more candidate snippets comprises an appropriate lifetime of the Candidate Snippet, and wherein each of the one or more candidate snippets preferably comprises suggested optimizations to be employed within the appropriate lifetime of the Candidate Snippet.

5. The computer-implemented method of any of claims 1-4, wherein generating specialized computer program code comprises (a) inserting code at places within the computer program to create the specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, or (b) creating a specialized function for matching arbitrary strings to a given string pattern, or comprises using a meta-specializer to traverse an interpreted data structure and hot swapping to convert existing specialized computer program code, or comprises combining query invariants in effect during a query with schema and row invariants, or comprises eliminating branches in the computer program code and thus reducing the size of the computer program code, or comprises utilizing a numeric value to slab-allocate a specializer-in-the-field (Spiff) data section, wherein the Spiff data section is then reused by computing a corresponding aggregate function across all input rows within the computer program code to eliminate per-row memory allocation, wherein the numeric value is defined by the maximum number of digits that can be supported per row, or comprises utilizing invariants within a disk or memory page with which the computer program is stored, reorganizing the data layout once the page is read, and optimizing data locality.

6. The computer-implemented method of any one of claims 1-5, wherein the specialized computer program code is created at a run time and invoked at a later time, and optionally, further comprising determining whether any violations of the identified invariant intervals occur in a given execution.

7. A system configured for improving the performance of a computer program, comprising:

an Invariant Finder identifying invariant intervals for variables in the computer program, based on a Program Representation (PR) of the computer program;

an Interaction Deducer deducing program interactions within the computer program based on the PR and an Ecosystem Specification for the computer program;

a Domain Assertion Deducer deducing domain assertions based on the PR, the identified invariant intervals for variables in the computer program and the deduced program interactions;

a Snippet Finder identifying one or more candidate snippets based on the invariant intervals for variables in the computer program, the PR, one or more execution summaries associated with the computer program, the deduced program interactions and the deduced domain assertions;

a specializer-in-the-field (Spiff) Maker generating specialized computer program code based on the one or more candidate snippets; and

a specialized source receiving the generated specialized computer program code to modify the computer program code based on the generated specialized computer program code.

8. The system of claim 7, wherein the Invariant Finder performs static analysis on the PR to identify invariant intervals.

9. The system of claim 7 or claim 8, wherein the identified invariant intervals comprise at least one set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node.

10. The system of any one of claims 7-9, further comprising an invariant checker determining whether any violations of the identified invariant intervals occur in a given execution.

11. The system of any one of claims 7-10, wherein each of the one or more candidate snippets comprises a set of invariants and a set of possible values for each invariant.

12. The system of any one of claims 7-11, wherein each of the one or more candidate snippets comprises an appropriate lifetime of the Candidate Snippet, and wherein each of the one or more candidate snippets preferably comprises suggested optimizations to be employed within the appropriate lifetime of the Candidate Snippet.

13. The system of any one of claims 7-12, wherein the Spiff Maker is further configured to insert code at places within the computer program utilized to create the specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, or to create a specialized function for matching arbitrary strings to a given string pattern, or to use a meta-specializer to traverse an interpreted data structure and hot swap to convert existing specialized computer program code, or to combine query invariants in effect during a query with schema and row invariants, or to eliminate branches in the computer program code and thus reduce the size of the computer program code, or to utilize a numeric value to slab-allocate a specializer-in-the-field (Spiff) data section, wherein the Spiff data section is then reused by computing a corresponding aggregate function across all input rows within the computer program code to eliminate per-row memory allocation, wherein the numeric value is defined by the maximum number of digits that can be supported per row, or to utilize invariants within a disk or memory page with which the computer program is stored, reorganize the data layout once the page is read, and optimize data locality.

14. A non-transitory computer readable medium comprising computer executable instructions that, when executed by a processor of a computing device, cause the computing device to:

determine a variable in a computer program whose value is invariant within an identified invariant interval via static analysis on a Program Representation (PR) of the computer program;

produce code at places within the computer program to create a specialized computer program code, to invoke the specialized computer program code and to destroy the specialized computer program code, the specialized computer program code being created based at least on the determined variable; and

modify at least a part of the computer program with the specialized computer program code when the specialized computer program is invoked.

15. The non-transitory computer readable medium of claim 14, wherein producing the specialized computer program code is further based on at least one of domain-specific knowledge associated with the computer program and extra-source knowledge.

Description:
FIELD SPECIALIZATION SYSTEMS AND METHODS FOR IMPROVING

PROGRAM PERFORMANCE

The present disclosure is generally related to field specialization for improving performance of a computer program, and more particularly is related to systems and methods for improving the performance of database management systems by identifying invariant intervals for variables and modifying the DBMS code utilizing specialized code generated, at least in part, based on the identified invariant intervals.

A database management system (DBMS) is a collection of software programs that manage the storage and access of data. As ever larger volumes of data are generated and must be stored and efficiently accessed, DBMSes have been adopted across a wide range of application domains. Driven by such ubiquitous deployments over the last four decades, DBMSes have been designed and engineered based on a few data models that are generally applicable to those domains. The relational data model is the one most prevalently adopted by commercial and open-source DBMSes. A significant amount of effort has been devoted to efficiently supporting this data model.

Due to the generality of the relational data model, relational database management systems are themselves general, in that they can handle whatever schema the user specifies and whatever query or modification is presented to them. Relational operators work on essentially any relation and must contend with predicates specified on any attribute of the underlying relations. Through such innovations as effective indexing structures, innovative concurrency control mechanisms, and sophisticated query optimization strategies, the relational DBMSes available today are very efficient. Such generality and efficiency have enabled their proliferation and use in many domains.

Nevertheless, such generality is realized via multiple layers of indirections and sophisticated code logic. Efficiency can be further enhanced for DBMSes by exploiting invariant values present during the execution of such systems. The field specialization technology disclosed in the present application is developed to automatically identify invariants and effect code specialization based upon invariants.

Embodiments of the present disclosure provide systems and methods for improving the performance of a database management system (DBMS). Briefly described, one embodiment of the method, among others, can be implemented as follows. A computer-implemented method for improving the performance of a DBMS includes the steps of: (i) identifying, based on a compile-time analysis of the DBMS source code, invariant intervals for variables in the DBMS code; (ii) deducing, based on the source code and an Ecosystem Specification for the DBMS, program interactions within the DBMS; (iii) deducing domain assertions, based on the source code, the identified invariant intervals for variables in the DBMS code, and the deduced program interactions; (iv) identifying one or more candidate snippets, based on the invariant intervals for variables in the DBMS code, the source code, one or more execution summaries associated with the DBMS executed using various workloads, the deduced program interactions, and the deduced domain assertions; (v) generating specialized DBMS code, at various time points including compile time and runtime, based on the identified candidate snippets; and (vi) modifying the DBMS by inserting code that may be utilized to perform the specialized code generation, possibly at runtime, and to later invoke the specialized code.

Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.

Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure.

Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.

FIG. 1 is a block diagram illustrating the spiff tool architecture, in accordance with an exemplary embodiment provided by this disclosure.

FIG. 2 is a block illustration of a field specialization process in accordance with an exemplary embodiment provided by this disclosure.

FIG. 3 is an illustration of field specialization, elaborating a paradigm of computer science, in accordance with an exemplary embodiment provided by this disclosure.

Many embodiments of the disclosure may take the form of computer-executable instructions, including algorithms executed by a programmable computer. However, the disclosure can be practiced with other computer system configurations as well. Certain aspects of the disclosure can be embodied in a special-purpose computer or data processor that is specifically programmed, configured or constructed to perform one or more of the computer-executable algorithms described below.

The disclosure also can be practiced in distributed computing environments, where tasks or modules are performed by remote processing devices that are linked through a communications network. Moreover, the disclosure can be practiced in Internet-based or cloud computing environments, where shared resources, software and information may be provided to computers and other devices on demand. In a distributed computing environment, program modules or subroutines may be located in both local and remote memory storage devices. Aspects of the disclosure described below may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer disks, fixed magnetic disks, floppy disk drive, optical disk drive, magneto-optical disk drive, magnetic tape, hard-disk drive (HDD), solid state drive (SSD), compact flash or non- volatile memory, as well as distributed electronically over networks including the cloud. Data structures and transmissions of data particular to aspects of the disclosure are also encompassed within the scope of the disclosure.

While the present invention may be described herein primarily with respect to a relational DBMS, the present invention is not limited to such a DBMS type. It will be readily understood that the present invention may be applied to any DBMS type, including, but not limited to, hierarchical, network and object-oriented DBMS types. Moreover, while field specialization is disclosed herein primarily with respect to a DBMS, it should be understood that the concepts provided herein may be applied to any program that manipulates data and, in particular, performs complex analysis on that data. Specifically, it is understood that the system and method disclosed may also be applicable to computer programs that require high run-time performance and execute the application multiple times over the same data, but with different parameters or queries.

A "spiff," which stands for "specializer in the field," is code that dynamically creates specialized code at DBMS runtime. "Field specialization" is the process of inserting spiffs into DBMS code so that the DBMS can specialize itself by exploiting runtime invariants. The specialized code (which may be referred to herein as "speccode") is faster and generally smaller than the original unspecialized code. Field specialization gets its name from the fact that the speccode is generated and invoked "in the field," i.e., after the DBMS has been deployed and is running at the end user's site. A spiff uses the actual value of a runtime invariant, which is obtained at runtime, to dynamically produce code that is specialized to that particular value of the runtime invariant.

In Applicants' co-pending U.S. Patent Application serial number 14/368,265, to which the present application claims priority, the term "micro-specialization" is equivalent to the term "field specialization" as used herein; the term "bee" is equivalent to the term "spiff" as used herein; an instantiated bee is equivalent to "specialized code" as used herein, which is the result of a spiff; and the HRE (hive runtime environment) is equivalent to the "SRE" (spiff runtime environment) as used herein.

Architecture

FIG. 1 is a block diagram illustrating the spiff tool architecture, in accordance with an exemplary embodiment provided by this disclosure.

In one embodiment, the present disclosure provides a spiff tool architecture that automatically field specializes an arbitrary program, given three inputs, as shown in FIG. 1:

· the application's Source Code, the obvious input,

· one or more Workloads, and

· an Ecosystem Specification.

This architecture thus posits that field specialization will analyze the application source code and eventually be an entirely automatic process utilizing these three inputs to produce a specialized application having the identical semantics but running faster.

Goals

The goals of this architecture include the following.

1. Provide an end-to-end solution that takes source files for a program or suite of related programs and automatically provides field specialized versions of those source files, including the code for spiff generation, spiff compilation, spiff instantiation, spiff invocation, and spiff garbage collection.

2. Provide domain independence, in that the architecture works with virtually any program, compiled to any conventional architecture, including highly parallel Graphical Processing Units (GPUs).

3. Minimize the information needed to be provided by the tool user and maximize the information extracted from the program under analysis.

4. Partition the analysis into a sequence of tools, each of which performs only one conceptual task.

5. Enable scalability of the tools, by ensuring that each tool produces a small output that captures the result of that tool's analysis.

6. Enable incremental development, so that tools can be tested initially on small programs and then on realistic programs.

7. Enable successive refinement, in that each tool can initially do just a partial, best-effort analysis (e.g., finding just some of the invariants, or minimal candidate snippets) and then be refined to produce a more comprehensive output over time.

8. Enable performance benefit estimation, as the benefit of each individual code transformation introduced by a spiff can be evaluated dynamically and/or independently; the overall benefit of that spiff can be computed by taking into account the effected code transformations and the characteristics of a particular workload, without exhaustively evaluating all combinations of code transformations and measuring their execution time.

Tools

As shown in FIG. 1, the spiff tool architecture includes a number of tools. These tools include: Invariant Finder, Tracer, Invariant Checker, Program Interaction Deducer, Domain Assertion Deducer, Snippet Finder, and Spiff Maker, each of which will be described in further detail below. With the exemplary spiff tool architecture shown in FIG. 1, one skilled in the art would understand that many variations and other partitionings may be made without departing substantially from the spirit and principles of the disclosure.

The description here utilizes a particular Program Representation (PR), such as an Abstract Syntax Tree (AST), which is a common computer-readable structuring of the source code of the application. This invention also applies when the analysis is performed instead directly on the textual source code of the application, on a lower-level intermediate representation (IR) of the source code, or even on the equivalent assembly or machine code representation of the source code. We term each individual program construct utilized by the PR a Program Expression (PE). For an AST, a PE is an AST node; for IR, a PE is an IR instruction; for source code, a PE is a single statement.

Invariant Finder

Invariant Finder takes as input a PR of the DBMS to be specialized and Trace Events (optional), performs static analysis on the PR and outputs zero or more Invariant Intervals.

Some definitions:

Invariant Interval: A set of paths defined by a single starting PE (or equivalently, a single position within the source code) and a single ending PE that is reachable during one or more possible executions from the starting node, over which a particular property of a variable holds. An example of such a property is "not written." (An interval can consist of a set of paths, rather than a single path. For example, none of the branches of an if/else block modifies the variable in question, so the variable remains invariant on all the code paths associated with these branches.) Note that the Invariant Interval starts within the starting PE (that is, as soon as the variable has that assigned value: the starting PE is always a statement that sets the value of the variable) and ends within the ending PE, right before the value is set again. At each PE along this path, the value of that variable will be the same as it is at other points along that path, hence the term invariant.

Invariant Interval Set: A set of invariant intervals for a particular variable, where all invariant intervals in the set share the same starting node. An interval may not be maximal, in that it is terminated earlier than needed, if the analysis cannot ascertain that the indicated property still holds after the execution of that PE.

Value Flow Tree (VFT): A tree that captures the copying of one variable's value to another variable. The VFT connects an Invariant Interval Set of variable X to an Invariant Interval Set of variable Y, when Y is assigned its value from X.

As an example, an invariant interval may exist over an interval where a value (i.e., the property is a value) of the variable holds, e.g., "variable equals N" (for some constant N). This may omit certain kinds of optimizations, e.g.:

· Optimizations based on program state, e.g., the memory allocation optimization in aggregate computations.

· Optimizations based on properties not of the form "variable equals N". E.g., given a code fragment if (p != NULL) S, we know that the pointer p must be non-NULL within S, and should be able to optimize away redundant NULL-checks, e.g., in functions called from S.

· Optimizations based on derived values, e.g., string length, that may not be explicitly materialized in the code.

· Optimizations based on domain knowledge, e.g., the cardinality of the set of values that can appear in a column.

Example 1: if statements

Consider the following code, shown below as Example 1:
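(The Example 1 listing itself is not reproduced in this excerpt. The following C fragment is a hypothetical reconstruction, consistent with the line numbers and intervals discussed below; only the variables x, y, h, z, and a come from that discussion, and all other details are assumptions.)

    /*  1 */  int x = 5;
    /*  2 */  int y = x + 1;
    /*  3 */  if (unknown_condition()) {
    /*  4 */      do_then_work();
    /*  5 */      x = 6;
    /*  6 */  } else {
    /*  7 */      do_else_work();
    /*  8 */      do_more_else_work();
    /*  9 */  }
    /* 10 */  x = y;
    /* 11 */  int h = x;
    /* 12 */  int z = h;
    /* 13 */  h = z + y;
    /* 14 */  int a = h + z;
    /* 15 */  return a;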

Here, Invariant Finder won't know statically whether the "if" statement's condition will be true or false. Thus, Invariant Finder should output the following Invariant Interval Sets for the variable x:

For simplicity, we'll use line numbers instead of PE IDs to identify source locations. We use closed-open intervals.

· Invariant Interval Set #1: Starts on line 1, with 1 invariant interval:

o Invariant Interval #1.1: Ends at line 3

· Invariant Interval Set #2: Starts on line 10, with 1 invariant interval:

o Invariant Interval #2.1: Ends at line 15 (that is, the end of the program)

Invariant Finder may output the above in some structured format, such as XML; however, in the present disclosure, lists and sublists will be utilized for simplicity.

Invariant Finder may vary in its precision but must be accurate. Specifically, the invariants that it produces should be correct, but do not necessarily need to be exhaustive. For instance, x is actually invariant from line 1 to line 5, and from line 1 to line 9. However, it's also accurate (but less precise) to, for example, just stop the interval at the beginning of the "if" statement. Of course, with less precise intervals, Snippet Finder and Spiff Maker (tools which will be described below) will not have as many opportunities to field specialize the application.

Invariant Finder could output such an Invariant Interval Set for every variable in the program. Let's look at those for the variable h:

· Invariant Interval Set #3: Starts on line 11, with 1 invariant interval:

o Invariant Interval #3.1: Ends at line 13

· Invariant Interval Set #4: Starts on line 13, with 1 invariant interval:

o Invariant Interval #4.1: Ends at line 15

Variable y should be:

· Invariant Interval Set #5: Starts on line 2, with 1 invariant interval:

o Invariant Interval #5.1: Ends at line 15

z's should be:

· Invariant Interval Set #6: Starts on line 12, with 1 invariant interval:

o Invariant Interval #6.1: Ends at line 15

Finally, variable a's should be:

· Invariant Interval Set #7: Starts on line 14, with 1 invariant interval:

o Invariant Interval #7.1: Ends at line 15

Notice how variable h gets its value from variable x: its value "flows" from x. z's value, in turn, "flows" from h. So, tying it all together, the VFT for x would be as shown in Example 2, below, given in an exemplary canonical representation.
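(The Example 2 representation is likewise not reproduced in this excerpt. Judging from the "from" and "to" attributes described next, it presumably resembles the following hypothetical sketch; only the first edge is confirmed by the discussion, and the remaining edges, connecting h's and z's interval sets, are elided.)

    <vft variable="x">
      <edge from="1" to="4"/>
      ...
    </vft>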

The numbers in the "from" and "to" attributes refer to one of the Invariant Interval Sets (IIS) above. So the first line indicated is from Invariant Interval Set #1 to Invariant Interval Set #4.

Example 2: Pass-by-value functions

Suppose that a variable "a" is invariant in a function X() but temporarily changes its value within a called function Y(a). When Y(a) returns, "a" still has its (invariant) value. This situation is accommodated because a call to a function passing a variable's value is a copy of that value to another variable, which is associated with its own Invariant Interval Set.
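A minimal C sketch of this situation (hypothetical; the patent does not give the code for this example):

    void Y(int a)          /* receives a copy of the caller's "a" */
    {
        a = a * 2;         /* changes only the local copy */
    }

    void X(void)
    {
        int a = 5;         /* invariant throughout X */
        Y(a);              /* the copy has its own Invariant Interval Set */
        /* "a" still equals 5 here */
    }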

Example 3: Loops with no assignment

Consider the following code, from Example 3:
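(The Example 3 listing is not reproduced in this excerpt; the following hypothetical C fragment matches the interval described below.)

    /* 1 */  int x = 5;
    /* 2 */  int sum = 0;
    /* 3 */  for (int i = 0; i < 10; i++) {
    /* 4 */      sum += i * x;          /* x is read but never written */
    /* 5 */  }
    /* 6 */  printf("%d\n", sum);
    /* 7 */  printf("%d\n", x);
    /* 8 */  return x;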

To understand loops, Invariant Finder should not actually unroll loops. Rather, it should check to see if there is an assignment to the variable within the loop. If not, as in this example, then an invariant interval that reached that loop would extend across the loop:

· Invariant Interval Set #1: Starts on line 1, with 1 invariant interval:

o Invariant Interval #1.1: Ends on line 8

Example 4: Loops with assignment to existing variable

However, consider a conditional assignment to a variable in a loop, with reference to Example 4:
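(The Example 4 listing is not reproduced in this excerpt; the following hypothetical C fragment, with stub functions more_input(), next_value(), and some_other_func() assumed, matches the intervals and the line-8 function call discussed below.)

    /*  1 */  int x = 5;
    /*  2 */  while (more_input()) {
    /*  3 */      int v = next_value();
    /*  4 */      if (v < 0)
    /*  5 */          continue;
    /*  6 */      /* x is conditionally reassigned below */
    /*  7 */      x = v;
    /*  8 */      some_other_func(x);   /* pass-by-value: cannot change x */
    /*  9 */      x = x + 1;
    /* 10 */  }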

Here, Invariant Finder would create the following intervals:

· Invariant Interval Set #1: Starts on line 1, with 1 invariant interval:

o Invariant Interval #1.1: Ends on line 2

· Invariant Interval Set #2: Starts on line 7, with 1 invariant interval:

o Invariant Interval #2.1: Ends on line 9

· Invariant Interval Set #3: Starts on line 9, with 1 invariant interval:

o Invariant Interval #3.1: Ends on line 10

Note that again, we're making the less-precise-but-still-accurate simplification to exclude any loop that might potentially write to the variable. Further, it should be noted that an interval is not ended at line 8, because the function call cannot change the value of x; rather, this value is copied to a variable local to some_other_func.

Example 5: Loops with creation of new variable

Consider the creation of a variable in a loop, with reference to Example 5:

Example 5
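(The Example 5 listing is not reproduced in this excerpt; the following is a hypothetical C fragment consistent with the interval described below.)

    /* 1 */  char *items[8];
    /* 2 */  for (int i = 0; i < 8; i++) {
    /* 3 */      /* buf is created anew, with a fresh allocation, on each iteration */
    /* 4 */      char *buf = malloc(16);
    /* 5 */      snprintf(buf, 16, "item %d", i);
    /* 6 */      items[i] = buf;
    /* 7 */  }
    /* 8 */  /* the interval starting at line 4 ends here, after the last iteration */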

Here, Invariant Finder would create:

· Invariant Interval Set #1: Starts on line 4, with 1 invariant interval:

o Invariant Interval #1.1: Ends on line 8 (i.e., after the last iteration of the loop)

Example algorithm for Invariant Finder

Invariant Finder starts with the leaves of the call graph, that is, the functions that do not invoke any other functions. It could then compute the VFT for that function, adding edges when a variable is copied (e.g., h = x;). Then it could consider functions that only call leaf functions, and then add edges for local variables (such as when x is passed) and merge intervals. Then it could iterate, considering functions that only call functions that have VFTs computed for them.

Recursive functions and cycles within the call graph require additional care. The bottom-up traversal of the call graph is a static analysis of the program. As an invariant must be true on all paths, Invariant Finder uses the signature and makes the assumption that an indirect call can be to any function that matches the signature of the call site.

Note that a memory allocation within a loop can give rise to many different runtime locations. Once such an allocation occurs, the memory pointed to by that variable is invariant until the variable is assigned. An allocation within a loop will either be assigned to a new element (say, of an array) or will overwrite a variable.

For indirect function calls, Invariant Finder can alternate forward-analysis steps, which propagate values for function pointers to figure out the set of possible targets for each indirect call, with backward-analysis steps, which propagate value flows through the call graph bottom-up as described above. This alternation can be iterated until the set of function-pointer targets stabilizes.

The Invariant Interval may further identify the possible values that the variable may take on. As an example, for a variable join_type, there may be only a few different values ever assigned to that variable, and they may all be known statically. Sometimes this is specified in the variable type (an enumeration) and sometimes this can be discovered by static analysis, e.g., by examining all the values assigned to that variable. When the set of possible values is small, Invariant Finder may record the invariant interval for each value.
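For instance, a minimal sketch (hypothetical; only the variable name join_type comes from the text above):

    typedef enum { NESTED_LOOP_JOIN, HASH_JOIN, MERGE_JOIN } JoinType;

    JoinType join_type = HASH_JOIN;   /* every assignment in the program uses
                                         one of three statically known values */

Here the enumeration type alone tells the analysis that join_type can take on only three values, so Invariant Finder could record a separate invariant interval for each.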

Correctness

Each invariant interval returned by the tool should be correct; that is, the associated variable should be guaranteed to be unchanged over all paths between the start and the end of the interval, not including the end. If there are any indirect assignments within any of the paths, the Invariant Finder tool must ensure that all such assignments cannot change the value of the indicated variable.

The analysis may be conservative, in two ways. First, there may be false negatives: intervals that are correct but are not returned by the tool, either (a) as an interval set or (b) as an individual interval within an interval set. It is acceptable for the tool to indicate places where variables are assigned (that is, where an invariant interval starts) that were not analyzed by the tool (a missing interval set), as well as interval sets that are incomplete (missing individual intervals).

Second, there may be non-maximal intervals: intervals that do not end in a statement that definitely changes the value. This could be caused by (a) an assignment that does not actually change the value or (b) a non-assignment for which the analysis is not sufficiently precise to determine that the value is not changed, for example, a "for" statement that might change the value within that statement.

Correctness also requires that all value flow tree links be correct: that each represents the copying of a value. However, these links can be non-maximal, in that an interval set need not be linked to another one, even if its value does indeed come from that other one.

Tracer

Another tool disclosed herein is referred to as "Tracer." Tracer takes as input an Executable under a Workload and outputs a sequence of Trace Events. A Trace Event typically records the execution of an instruction that may affect data flow within the program, such as "loop entered," "variable read," or "function call."
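A minimal sketch of what a Trace Event record and its emission might look like (hypothetical; the text above does not fix a concrete format, and all names here are assumptions):

    #include <stdio.h>

    typedef enum { EV_LOOP_ENTERED, EV_VARIABLE_READ, EV_FUNCTION_CALL } TraceEventKind;

    typedef struct {
        TraceEventKind kind;
        const char    *name;   /* variable or function involved */
        long           pe_id;  /* Program Expression at which the event occurred */
    } TraceEvent;

    static FILE *trace_stream;

    static void emit_trace(TraceEventKind kind, const char *name, long pe_id)
    {
        /* one line per event, in occurrence order, for the Summarizer to consume */
        fprintf(trace_stream, "%d %s %ld\n", (int)kind, name, pe_id);
    }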

The Trace Events are processed by a further tool, "Summarizer," to produce Execution Summaries, which provide an output of functions, statements, and variables along with their execution statistics. Such information indicates "hot spots" within the application that could benefit from field specialization.

Correctness dictates that if certain activities of interest occur during the execution, the relevant Trace Event is output and/or recorded, and that every output and/or recorded Trace Event corresponds to an occurrence of an activity of interest, in the order indicated.

Invariant Checker

Another tool, "Invariant Checker," determines whether any violations of identified Invariants (e.g., as identified by Invariant Finder) occur in a given execution, using the Trace Events from that execution. (Alternatively, the developer can provide guidance to Invariant Checker by indicating important variables to watch.) Ideally, Invariant checker would find no violations for the many executions of the DBMS Executable over many Workloads (thereby confirming as correct the invariants found by Invariant Finder).

Invariant Checker may be run periodically to further validate the analysis done by the other tools (such as Invariant Finder and Tracer). Users of the application may run Invariant Checker, for example, and be provided with an indication that no violations were found. On the other hand, if violations are found, the user may be provided with an indication that violations were found, and may further be provided with a message to contact technical support for assistance.

Another use for Invariant Checker is as a debugging tool, for example, employed by the developers of the tools described herein to ensure the correctness of the static analysis (e.g., the invariants identified by Invariant Finder).

Program Interaction Deducer

The tool "Program Interaction Deducer" uses the PR (or an equivalent

representation, such as source code, IR code, or even assembly or machine code) and an Ecosystem Specification to deduce the Program Interactions, a list of data files along with associated information. Basically, the Program Interaction Deducer ascertains where in the program(s) values are stored in a file, where values are subsequently read from a file, and where those values are removed from the file (or the file itself is removed). Those values will then be determined to be long-term invariants within the Domain Assertion Inducer.

The Ecosystem Specification states (a) what data is involved, (b) which data files are fixed and which can vary, (c) which program(s) can create, access, and discard this data, and (d) any concurrency requirements. In this disclosure, the focus is on files; however, in general, this specification is concerned more generally with reading and writing data from the outside world, which includes files, but may also include user I/O, streams to/from other processes, and possibly other ways for a program to get data, as well as other interactions with the O/S, such as allocating memory and dealing with character encodings. Files may be the most common way, and are the focus of the discussion here, but it should be understood that the present invention may utilize any other such form of data.

Table Spiff Use Case

To aid in the description of tools provided herein, some examples will be described in relation to a prototypical DBMS ("minidb"). We give in Example 6 an excerpt from the minidb.h and minidb.c source files, which we will refer to repeatedly.
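(The Example 6 excerpt is not reproduced in this text. Based on the structures and functions referenced in the remainder of this description, the minidb.h portion presumably declares something like the following hypothetical reconstruction; only TableHeader, RowHeader, table_file, data_file_name, num_columns, and the function names come from the text, and all other details are assumptions.)

    /* minidb.h (hypothetical reconstruction) */
    #include <stdio.h>

    typedef struct {
        FILE *table_file;            /* the open table data file */
        char  data_file_name[128];   /* uniquely identifies the file */
        int   num_columns;           /* written once, at table creation */
    } TableHeader;

    typedef struct {
        int row_length;              /* length of the row string that follows */
    } RowHeader;

    /* functions referred to below: CreateTable(), WriteTableHeader(),
       OpenTable(), WriteRow(), SequentialScan(), ExecuteDelete(),
       GetNextCommand() */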

An Ecosystem Specification, which can be provided by the developer as a configuration file describing non-obvious traits of particular functions regarding data flow manipulation in the application (an example Ecosystem Specification is shown in Example 7 below), would state that (a) the data starts out empty and the workload is read from stdin or a file, (b) (workload) data can vary, (c) only minidb will access the data, and (d) at most one instance of minidb will be running on any specific directory. The Ecosystem Specification is essential to understand that the schema is invariant across executions of minidb.
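(The Example 7 specification is likewise not reproduced in this text. From the elements and attributes referenced below, such as inDirectory, creates="CreateTable():3", the file opened at GetNextCommand():13, and parallelAccess, it presumably resembles this hypothetical sketch:)

    <ecosystem>
      <program name="minidb">
        <directory name="database" reads="OpenTable()"/>
        <datafile type="table" inDirectory="database"
                  creates="CreateTable():3"/>
        <workload type="workload" opens="GetNextCommand():13"
                  alsoFrom="stdin" parallelAccess="yes"/>
      </program>
    </ecosystem>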

minidb uses two types of data: table, a file holding the rows of a table; and workload, a file containing SQL statements. The type names are only to differentiate these files in the rest of the description. Each table is in a directory (the database).

There is one program in this ecosystem: minidb. It creates table data files. We give the line of code that performs this action (e.g., line 3 in the CreateTable function), to tell the Domain Assertion Deducer which file is being manipulated (here, the specific file mentioned on that line of code). The Consolas font, also used in the Examples, denotes names of functions in the minidb source code. The verb "reads" indicates that the directory is neither created nor removed by the application. For table data files, the file is indicated by the file passed to CreateTable(). The verb "creates" also implies "opens," "reads," "writes," and "removes." (It may be possible for Domain Assertion Deducer to figure out that each table is in the database directory, so the inDirectory attribute and perhaps the entire directory element may not be necessary, shortening the Ecosystem Specification by one line.)

This program opens workload data files, which implies "reads." Here the file is that passed to GetNextCommand(). Or this file might be read from stdin in line 7 of GetNextCommand(). Multiple parallel instantiations of minidb might be reading the same workload file, but won't be accessing or changing the database directory or the table files within it. The Program Interactions extracted from the PR (see Example 8) state that minidb creates table files in this directory, reads and writes them, and then removes them, indicating exactly where in the source each of these file actions occurs. Furthermore, the table header within a file is never changed, and each file is uniquely identified by the variable "data_file_name."

The table data file is first created in the database directory. (Since a sole application, minidb, is used in this example, we can specify it in the datafile rather than at the add, remove, etc. operations on that data.) This file contains three data structures: the TableHeader and multiple RowHeaders, each with its row (a string). The analysis in subsequent tools doesn't need to know the inclusion structure; all that is needed is the data structures written and subsequently read. Of course, once data is written to a file, it can be read, possibly many times, before that data is deleted.

The lifetime of a file extends beyond an individual execution of an application. One execution might create the file, another might subsequently write data to that file, another might subsequently read that data, and another might subsequently remove the file. The critical semantics is that data written to a file will be the same data subsequently read from that file, until that data is deleted from the file or the file itself is removed. The other critical semantics is that we know from the PR the actual C structs written out to the file and subsequently read in.

Coming back to the particulars of the prototypical DBMS (i.e., minidb), interestingly, the delete is actually a file write. What happens is that the row is written over, effecting a delete of the original row.

The logic in ExecuteDelete() is particularly complex: a temporary file is created, the rows before the row to be deleted are copied over to the temporary file, the rows after the row to be deleted are copied over, and then the temporary file is renamed. Program Interaction Deducer may include such logic to handle these particulars.

Table Spiff Instance Use Case

Table spiff instances are associated with particular rows in the database, the handling of which is discussed above.

Row Spiffs

The notion of a "row" seems quite domain-specific. However, the general notion is of a part of a data file that is read, written, and processed as a whole. There is also the notion of a query evaluation loop over each row, but that can also be generalized as the portion of code used to process each ascribed part of the input file. So recognizing a row spiff requires recognizing when a portion of the input file is processed, with the same code being used repeatedly over different portions, each with the same structure.

The implementation of row spiffs requires (i) determining which invariant values to utilize in partitioning the data, (ii) placing a spiff id in the data, and (iii) possibly removing data values that can be determined from the spiff id. The first step uses the cost model, which relies on the workload. The second actually changes the structure of the input data and so must change each relevant program in the ecosystem, those that read or write that portion of the data. The third challenge would be handled similarly.

Hence, the unique aspects of the notion of a row spiff are (a) identifying portion(s) of the data that are processed as a unit and (b) changing the data so that it can be more efficiently manipulated in the program(s) that access this data.

Query Spiff Use Case

Query spiffs are a combination of query, table, and row invariants. The last two are dealt with above, while query invariants are found by Invariant Finder, as in this case they do not persist across minidb executions, because the workload where queries come from is only read and could be used by several minidb incarnations (e.g., as parallelAccess is allowed).

Example Algorithm for Program Interaction Deducer

As shown in FIG. 1, Program Interaction Deducer has two inputs: the Ecosystem Specification and the PR. While the Ecosystem Specification focuses on the programs that read and manipulate the data, the Program Interactions produced focus on what is done to files, in particular, where data structures within the programs are written to and read from files. To do so, the Program Interaction Deducer, or PID, analyzes file manipulation system calls, in particular, fopen(), fwrite(), and remove(). It uses as its starting point the <datafile> and <workload> specified by the Ecosystem Specification, in this case, table and workload (e.g., as shown in Example 6). (Note that PID also analyzes database, but figures out pretty quickly that this is a directory that is only read by OpenTable().)

Between these file manipulation calls, PID watches the flow of FILE * values. The workload file is particularly easy to analyze. The Ecosystem Specification specifies that this file is opened at GetNextCommand():13. (The file can also be read from stdin.) PID determines, by analyzing the source code referenced by the specification:

· that this file is named by query_file_name,

· that it is associated with the FILE query_file and stdin, and

· that the only reads of this file are of character strings read from this file at GetNextCommand():8 and GetNextCommand():18.

The Program Interaction Deducer thus outputs this determined information into the Program Interactions file, as shown in Example 7.
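(Hypothetically, in the same spirit as the sketches above, the Program Interactions entry for the workload file might read:)

    <workload namedBy="query_file_name" file="query_file" alsoFrom="stdin">
      <read at="GetNextCommand():8"  data="character string"/>
      <read at="GetNextCommand():18" data="character string"/>
    </workload>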

The table file has more complex behavior. The Ecosystem Specification states creates="CreateTable():3", indicating that we need to follow data_file_name, which originates in the data structure TableHeader.table_file, as deduced by analyzing the referenced source code. So PID can see the flow:

· from main(): case 'C' to CreateTable():2 (fopen()),

· then, a few lines later, a call to WriteTableHeader():3 (fwrite()),

· back to main() and then to many rows getting written (in WriteRow():3 and 6 via case 'I': fwrite()),

· and rows getting deleted (in ExecuteDelete():25 via case 'D': the absence of an fwrite(), though this will be challenging to detect),

· followed eventually by the file being deleted within main():57: remove(), again, by examining the referenced source code.

From this overall sequence of operations on a table datafile, associated with the C FILE table_header.table_file, PID can deduce

· that the TableHeader data structure is written to the table file at WriteTableHeader():3, then

· subsequently read at OpenTable():8. Interestingly, that is all that is done with a TableHeader: it is only written once and never deleted from the file.

PID can also determine that the RowHeader data structure is

· added to the table file at WriteRow():3,

· subsequently read at SequentialScan():7, and

· removed from the file at ExecuteDelete():25.

Finally, PID can determine that character strings are

· added to the table file at WriteRow():6,

· read at SequentialScan():15, and

· removed from the file at ExecuteDelete():15.

The analysis performed by PID is thus to analyze how each program manipulates each file identified in the Ecosystem Specification, by following the values of variables of type FILE and observing:

1. where the name of the file comes from (also a variable within the program),

2. where the file is opened,

3. from there, where the file value flows in the program,

4. and hence, what data structures are (i) written to and subsequently (ii) read from and subsequently (iii) deleted from the file,

5. and finally, where that file is deleted or closed.

It should be noted that this analysis is entirely within the context of a single execution of a single program. If there are multiple programs, each is analyzed separately. There may often be multiple executions of each program, but the analysis only considers a single execution.

The PID analysis first must:

· find the file variables,

· calculate the value flow of such values,

· and along the value flow, identify file open, read, write, and delete operations,

· for each, identifying specific information to be recorded in the Program Interactions.

The Domain Assertion Deducer, described below, takes the per-execution behavior extracted by PID and stitches it together into a holistic understanding of how data flows from programs into files and then back into (perhaps subsequent executions of) programs, thereby computing invariant flows across program executions, something traditional compiler analysis cannot do.

Domain Assertion Deducer

The Domain Assertion Deducer tool uses the PR, identified Invariants, and the Program Interactions to deduce the Domain Assertions. For table spiffs, the Program Interactions imply which schema information is invariant. For table spiff instances, the Program Interactions are essential to understand where in the program(s) rows are created, accessed, updated, and deleted. Query spiffs can exploit schema, row, and workload-generated invariants, in conjunction with invariants of smaller scope that are within the purview of conventional compiler optimization techniques. Specifying Domain Specific Knowledge describes some of these computations in some detail. Then the Domain Assertion Deducer stitches together the Invariant Intervals, following the values as they are read from and written to files, perhaps across multiple invocations of the program(s), to derive the complete lifetime of a value, which is then encoded in the Domain Assertions. If the Workloads are complete, that is, if they completely characterize the possible operations on the data, which can be the case in some batch applications, the Domain Assertion Deducer (DAD) can also deduce a restricted set of possible values for the invariant variable. It is precisely this information, Domain Assertions and perhaps a set of possible invariant values, that a conventional compiler does not have.

An important aspect of field specialization is that it makes use of information not generally available to a compiler. This information is of two forms: (i) domain-specific knowledge and (ii) extra-source knowledge. Both kinds of knowledge go beyond what a compiler can discover or conclude by just looking at the source code of a single program. Field specialization takes into account the much broader ecosystem of the program being specialized: what data will it read or manipulate, what programs will it invoke, what other programs will also read or manipulate this data, what operating system(s), network routers, and storage systems will be involved? This ecosystem provides a great deal of information that field specialization can exploit to increase the efficiency of the program (and its data) being specialized.

Compared to somewhat generic extra-source information, domain-specific knowledge applies only to programs in a particular domain. An example of domain-specific knowledge is "all changes to a table's schema will be serializable." The notion of serializability is a complex one that arose out of the database domain, though it is finding its way into other parallel and distributed information processing applications. Such knowledge allows creating table spiffs that speed up the DBMS, including indicating exactly where a table spiff should be created and where it should be destroyed.

A second form of domain-specific knowledge is that of the workload of a program. An example is "OLAP (on-line analytical processing) applications exhibit little data volatility: (often complex) queries dominate during the day, with updates occurring infrequently, typically overnight." Such information is of the form "this activity is more frequent than that other activity," thus providing guidance to field specialization, allowing it to make better-informed decisions trading off work now that will speed up something else later on.

An example of extra-source knowledge is that a particular portion of a file that is written to and read by only the one program will remain until either the code that modifies that portion or the code that deletes the file is executed. Such knowledge allows creating spiffs that speed up any program that processes an input file repeatedly.

Preferably, the domain-specific and extra-source knowledge is formalized, so that the Spiff Finder can read files containing domain assertions and extra-source assertions that state such knowledge in a formal manner. The Spiff Finder would then read the files comprising the DBMS source code (or more generally, the source code of any program in the domain described by the domain spec) and output the spiff invariants, for use by the Spiff Maker.

Table Spiff Use Case

Coupling the information from the Program Interactions with the Invariant Interval on main():table_header.num_columns provides the information needed by DAD to produce the following domain assertion, shown in Example 9.
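(The Example 9 assertion is not reproduced in this text. Informally, and in a purely illustrative notation, it would state something like:)

    segment start:  CreateTable():2     -- the table file is created
    segment end:    main():57           -- the table file is removed
    dependency:     database directory, data_file_name -> table_header.num_columns
                    written at WriteTableHeader():3, read at OpenTable():8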

Note that we don't include ExecuteDelete():38. That is because our analysis notices that that is a rename.

Table Spiff Instance Use Case

Database table rows differ from schemas in that there are multiple rows per table but only one schema. In the above, there is one value of table_header.num_columns stored in a file associated with a table; this value is initially written out to a file and subsequently read back in. The Program Interactions tell us that, as well as that various different values of row_data are written to the same file.

We tell rows apart by where they reside in the data file, that is, by the file position (the offset of the first byte of the row).

The domain assertions generated by DAD, as shown in Example 10, differentiate rows with a keyword OFFSET on the left-hand side of the functional dependency, which indicates that the functional dependency holds only when the current position in the execution of the program within the indicated file to be read or written is a particular value.
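(Again in a purely illustrative notation, the Example 10 assertions might read:)

    segment start:  WriteRow():3, OFFSET        -- a row is written at a file position
    segment end:    ExecuteDelete():25, OFFSET  -- that row is deleted
    segment end:    main():57                   -- the file is removed, ending all row invariants
    dependency:     database directory, data_file_name, OFFSET -> row_data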

Intuitively, while there is data at the position when the read occurs, that data will be the same as the data written to that position earlier, as stated by the segment start. DAD notices that multiple values of a variable in the program are written to the same file, and then uses OFFSET to differentiate those values. Note that one of the segment end statements does not include OFFSET, as it ends all row invariants for that entire file, because the file is deleted.

The DAD may even be able to tell that case 'D' (a minidb command, which is interpreted in minidb's implementation using a switch/case statement) moves data packets from one OFFSET to another within the file. It will be readily understood by those skilled in the relevant field that one need only extend the domain assertion formalism to accommodate such moves.

Other than the OFFSET, then, a row invariant has the same structure as a table schema invariant.

Going further, each file written to or read from an application is considered to be composed of data packets, each of which is an external form of values in local variables of the program that are written and read as a unit. So minidb.c places in files first a schema data packet (including the value of table_header.num_columns) and then a sequence of row data packets (including the value of row_values).

Query Spiff Use Case (Example 11)

There are two cases. The first is when the workload (that is, the queries) comes from stdin, in which case no domain assertion can be deduced, because the user could type anything in. (Of course, we can still use the VFT to determine the (many) invariants active during a query, but that was already done by Invariant Finder, in a previous step.) The second is when the workload comes from a file, say a file named in the invocation argument, in which case the domain assertion is essentially the same as that for a row invariant, in that it deals with the OFFSET. For this second case, we use the name of the file to denote the actual file as the source of the workload. These two cases are differentiated by the Ecosystem Specification, which in the second case states that a particular Workload file could be specified. That said, we may not know who created the workload file, and in such a case we cannot yet create query spiffs for that workload.

The second case may be used to specialize on a workload. Query spiff ids may then be put into the workload, or stored somewhere else as an association, and used when that workload is executed.

In the following, we will just consider the first case, where the query is not known until it is read in.

Note that a query spiff that only includes invariants in effect during a query should be discoverable by an aggressive optimizing compiler. But more importantly, a query spiff combines such query invariants with schema and row invariants, which cannot be discovered by a compiler, because such schema and row invariants require knowledge about the semantics of file reads and writes. It's that aspect that makes a query spiff a true spiff.

Example Algorithm for Domain Assertion Deducer

Trace partition elements come from program interactions: where data files or components of data files (here: table header and rows) are created, inserted, or deleted. The dependencies within domain assertions come from the directory and file name and optionally an OFFSET within the file. The Invariants tie these together, so that values of variables within the application can be seen to flow through the application code, out into a file, and later back into the application, thereby ascertaining long-lived invariants, for which candidate snippets and ultimately spiffs can be determined.

For the table spiff use case, in which the table_header.num_columns is characterized in a domain assertion, DAD can determine that:

· main(): case 'C' calls CreateTable(),

· which calls WriteTableHeader(),

· which writes the table header to the file.

The table header:

· is read by this and subsequent invocations of the program,

· until the file is deleted, at main():57.

This implies that the table header in a table <datafile> is only written once and never modified by the program. Knowing this, DAD can generate the appropriate trace partition elements, that of the table <datafile> being created and deleted. DAD can also create the dependency concerning the table_header.num_columns.

For the table spiff instance use case, the relevant data packets are the row_data and row_values being added to the table <datafile>, perhaps deleted from that file, and ultimately removed when the file itself is deleted. DAD determines that once the file is created, the program may store multiple values of row_data into the file; thus each such packet can be identified by the OFFSET at which it resides.

The above observations provide an algorithm for DAD. For each FILE that is created or opened by a program, DAD figures out where that file was initially created and where the name for that file comes from, via the VFT. Then, for each data structure that is stored in that file (these are the <data> elements in the Program Interactions), DAD ascertains the file operations performed on that data structure (adding that data structure to the file, possibly changing or removing that data structure, and finally deleting that file). These operations then imply the appropriate trace partition elements. Finally, from the program data structures used in these operations (the C or C++ program data structure written to the file), DAD can inspect the VFT to determine where the values in such program data structures originate, to imply the dependencies. DAD also determines whether a file contains only one data packet (as in the case of num_columns) or multiple packets (as in the case of row_data) by tracking what is done on each FILE variable as it flows through the program, also determinable via the VFT. Multiple packets require an OFFSET in domain trace partitions and dependencies.
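The following is a simplified, hypothetical rendering of this algorithm in C, with the file operations already recovered (via the VFT) into a flat list; the types and field names are illustrative assumptions, not the tool's actual data model.

#include <stdio.h>

enum op_kind { OP_CREATE_FILE, OP_WRITE_PACKET, OP_READ_PACKET, OP_DELETE_FILE };

struct file_op {
    enum op_kind kind;
    const char *file;    /* file name, traced back through the VFT */
    const char *data;    /* <data> element: the program data structure involved */
    long offset;         /* OFFSET of the packet; -1 for whole-file operations */
};

static void deduce(const struct file_op *ops, int n) {
    for (int i = 0; i < n; i++) {
        switch (ops[i].kind) {
        case OP_CREATE_FILE:
            printf("trace partition: <%s> created\n", ops[i].file);
            break;
        case OP_DELETE_FILE:
            /* Deletion ends all row invariants for the entire file, so
               no OFFSET appears in this segment end. */
            printf("trace partition: <%s> deleted\n", ops[i].file);
            break;
        case OP_WRITE_PACKET:
            /* Multiple packets in one file require an OFFSET to
               differentiate the values written. */
            printf("dependency: %s written to <%s> at OFFSET %ld\n",
                   ops[i].data, ops[i].file, ops[i].offset);
            break;
        case OP_READ_PACKET:
            printf("dependency: %s read from <%s> at OFFSET %ld\n",
                   ops[i].data, ops[i].file, ops[i].offset);
            break;
        }
    }
}

int main(void) {
    const struct file_op ops[] = {
        { OP_CREATE_FILE,  "table.dat", 0,                           -1 },
        { OP_WRITE_PACKET, "table.dat", "table_header.num_columns",   0 },
        { OP_WRITE_PACKET, "table.dat", "row_values",                16 },
        { OP_READ_PACKET,  "table.dat", "table_header.num_columns",   0 },
        { OP_DELETE_FILE,  "table.dat", 0,                           -1 },
    };
    deduce(ops, (int)(sizeof ops / sizeof ops[0]));
    return 0;
}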

Correctness

Correctness dictates that the Domain Assertions that are produced are complete and are consistent with the input PR, Invariants, and Program Interactions.

Snippet Finder

As shown in FIG. 1, another tool, Snippet Finder, takes as input:

· One or more Invariants

· A PR

· One or more Execution Summaries

· Domain Assertions, Program Interactions, and

· A Cost Model

Snippet Finder outputs one or more <spiff> elements, each containing one or more Candidate Snippets, each of which contains:

· an Interval of code identified by PEs

· a set of Invariants

· a set of possible values for each Invariant

· the source location(s) where the value of each invariant was first written to a file,

· the source location(s) where that value was removed from a file,

· the source location(s) where the value is read in each time,

· the appropriate lifetime of the Candidate Snippet, that is, when the associated spiff can be created (and whether at compile-time or run-time), and

· optionally, suggested optimizations to be employed within the Interval.

Each Domain Assertion implies an interval that is probably broader than just one program execution, in contrast to the interval recorded in Invariants, which has a scope within a single program execution. Snippet Finder uses Domain Assertions to expand the scope of the Invariants and to refine the set of possible values for each Invariant. The interval of each Invariant overlaps (either partially or entirely) the interval of the Candidate Snippet. Furthermore, the interval of each Candidate Snippet is tailored by the Cost Model, to minimize the size of the interval while maximizing the savings, calculated from the cost of executing an optimized version of the snippet times the number of times that snippet was evaluated (drawn from the Execution Summaries), plus the cost of invoking a spiff. To do so, the Snippet Finder must have an idea of the possible optimizations that the Spiff Maker performs and the benefit of each such optimization, the latter from the Cost Model.
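As a hedged sketch, one plausible reading of this savings calculation is the per-evaluation gain of the specialized snippet, scaled by the evaluation count from the Execution Summaries, net of the spiff invocation overhead; the names below are illustrative assumptions, not the tool's actual Cost Model interface.

#include <stdio.h>

struct snippet_costs {
    double original_cost;    /* cost of one evaluation of the original snippet */
    double specialized_cost; /* cost of one evaluation of the specialized body */
    double invocation_cost;  /* overhead of one spiff invocation               */
    long   eval_count;       /* evaluation count, from the Execution Summaries */
};

/* Net savings under the assumed reading; a snippet interval is worth
   keeping only when this quantity is positive. */
static double estimated_savings(const struct snippet_costs *c) {
    return (c->original_cost - c->specialized_cost - c->invocation_cost)
           * (double)c->eval_count;
}

int main(void) {
    struct snippet_costs c = { 120.0, 35.0, 5.0, 1000000L };
    printf("estimated savings: %.0f cost units\n", estimated_savings(&c));
    return 0;
}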

Query Spiff Use Case (Example 12)

There are two major differences between query spiffs and the table spiffs considered earlier. The first is encountered at this step: Snippet Finder uses the Invariants rather than the Domain Assertions, as queries usually do not persist (though see the discussion above about Workloads being given to stdin). The second will be encountered later: Spiff Maker needs explicit guidance from the Ecosystem Specification as to where the boundary is between compiling spiff code and just instantiating a spiff instance at run time.

Snippet Finder infers:

· an Interval of code from SequentialScan():40 (immediately after unpacking the data packet into row_values[]) to SequentialScan():59 (the end of the method),

· a set of Invariants: main().query, in particular, query.executor_routine, query.executor_command, query.num_predicates, query.predicate_list and predicates[] read from stdin, and query.schema from the table spiff use case,

· a set of possible values for each Invariant, in this case: query.executor_routine is always SequentialScan() and query.executor_command is always SCAN_FWD; for each predicate, column_id is an arbitrary int read from stdin (deduced from the assignment to that field in BuildPredicates()), constant_operand is an unsigned long read from stdin, and operator_function is either &EqualInt4 or &LessThanInt8,

· the source location(s) where the value of each invariant was first determined: the value of query is determined by main():32, that is, right after the call to BuildAndPlanQuery(),

· the value of query is never written to, removed from, or read from a file,

· the appropriate lifetime of the Candidate Snippet: the lifetime is just within the 'S' switch; the Ecosystem Specification tells us that we cannot call the compiler there, so we just pre-compile spiffs for executor_routine as SequentialScan(), for executor_command as SCAN_FWD, for num_predicates from 1 to 6, and for each such predicate, operator_function as either &EqualInt4 or &LessThanInt8, and

· a suggestion to unroll the for loop at SequentialScan():47, as it is terminated by the invariant num_predicates.

It is not clear how fromValue and toValue are determined, but it seems that they are used to bound the number of compile-time query spiffs that are generated.

Example Algorithm for Snippet Finder

Snippet Finder first expands invariants across program executions by tracking which variables are read from files and where those values are put into files. This results in invariant intervals that span multiple executions. The tool also needs to track when the value was first written to the file and when it is deleted.

The other challenge to Snippet Finder is in bounding the snippets, using the Cost Model. In doing so, the tool needs to know what optimizations Spiff Maker can effect, and under what conditions each optimization is feasible.

Correctness

Correctness of this tool dictates that each of the Candidate Snippets produced by this tool be consistent with the input Invariants, PR, Execution Summaries, Domain Assertions, Program Interactions, and Cost Model. The indicated invariants should indeed be invariant over the snippet, the suggested optimizations should be consistent with these invariants and their manipulation within the PR, and the possible values should indeed be possible.

It is desirable though not required that:

· the snippets that are returned are the most desirable, given the cost model,

· the snippets be maximal, in that making them larger would result in a larger cost, from the cost model,

· the snippets be minimal, in that making them smaller would result in a larger cost, from the cost model, and

· the suggested optimizations would be helpful, given the cost model.

Spiff Maker

As shown in FIG. 1, another tool, Spiff Maker, takes as input one or more Candidate Snippets and a PR, and produces as output Specialized Source Code.

Specifically, for each input Candidate Snippet, Spiff Maker should perform the following tasks:

1. Create a .h file for the spiff pattern, defining all pattern parameters and spiff pattern functions.

2. Create a .h file for the spiff implementation declarations.

3. Create a .c file for the spiff implementation definitions.

4. Insert code in the appropriate place(s) to create the spiff (for dynamic spiffs), to invoke the spiff, and to destroy the spiff (again, for dynamic spiffs).
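For concreteness, the following is a hypothetical sketch of what the spiff pattern header (task 1) might contain for a num_columns table spiff; every name here is an assumption, since the actual generated files are not reproduced in this example.

/* exec_query_spiff.h -- hypothetical spiff pattern header: declares the
   pattern parameter and the spiff pattern function type (task 1); the
   concrete implementations are declared in a second .h (task 2) and
   defined in a generated .c file (task 3). */
#ifndef EXEC_QUERY_SPIFF_H
#define EXEC_QUERY_SPIFF_H

struct query;   /* forward declarations; minidb's actual types assumed */
struct table;

/* Spiff pattern function: one implementation per specialized value of
   the pattern parameter num_columns. */
typedef void (*exec_query_spiff_fn)(struct query *q, struct table *t);

/* Look up the implementation for a given num_columns value. */
exec_query_spiff_fn exec_query_spiff_lookup(int num_columns);

#endif /* EXEC_QUERY_SPIFF_H */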

Each use case is associated with a specified branch of minidb, for concreteness. Each branch includes the Candidate Snippet that causes the generation of that configuration.

It may be convenient to use TXL for the actual transformation, as a PR-to-PR transformation, with the transformed PEs then converted back into textual source code to create the spiff. TXL includes a parser, but it may be possible to take PEs directly. TXL also includes a syntax-tree unparser, which may work with our PEs.

For Spiff Maker to function as described, it may require some guidance based on domain knowledge. Specifically, Spiff Maker may need to be given/told:

· Specifications for all static implementations to be produced (i.e., which variable(s) are specialized, and their values if so)

· Rules for disambiguation, for cases where more than one static implementation applies

· Rules for creation of dynamic implementations: Are they allowed at all? Are they cached? If so, how? In memory? On disk? What are the size(s) and management rules for the cache(s)? Should a generated dynamic implementation be fully specialized, or would it be better to only partially specialize, and leave some parameters generic? Is it acceptable to compile a dynamic implementation on-the-fly as needed, or is it only acceptable to use a dynamic implementation if it's already in the cache? Do the answers to these questions change?

· Should a fully-generic implementation be created (and used as a fallback inside)? Or, will some variables always be specialized, one way or another? (This will determine which variables will need to have representations in the data block.)

In general, Spiff Maker will be told all of the above in the input file. It is Snippet Finder's job to figure out how many static implementations to create, whether each should be static or dynamic, which variables are specialized and which are not, and so on. Here, there will only be a single static implementation to create, and that single static implementation is always the one that should be called.

Query Spiff Use Case

Referring to Example 13, the input is as follows, indicating a compile-time query spiff for executor_command as SCAN_FWD, for num_predicates from 1 to 6, and for each such predicate, operator_function as either &EqualInt4 or &LessThanInt8, as specified by Snippet Finder.

As noted above, Spiff Maker needs explicit guidance from the Ecosystem Specification as to where the boundary is between compiling spiff code and just instantiating a spiff instance at runtime. We assume that the Ecosystem Specification specifies that that boundary occurs at the 'S', 'I', and 'D' cases: that no spiffs can be compiled within anything called by these three cases. (That emphasizes knowledge about what delays users will find tolerable. Note that compiling a new spiff for a particular row might be particularly beneficial for speeding up a collection of Workloads viewed as a whole, but the user may still want to specify that that not be done, say because a particular workload must itself run faster with field specialization.) This is the reason why Spiff Maker includes the Ecosystem Specification as an input.

Spiff Maker thus makes a spiff for a portion of SequentialScan(), one for each value of num_columns at compile-time, for query.executor_command always being SCAN_FWD. For each predicate, column_id is an arbitrary int, constant_operand is an unsigned long read from stdin, and operator_function is either &EqualInt4 or &LessThanInt8, with spiff 0 a non-specialized version that can handle any number of num_columns. The relevant transformations are loop unrolling and constant folding. Spiff Maker will produce a spiff pattern for a spiffID of 23, for num_predicates = 2, with the first predicate having column_id = 2 and operator_function = &EqualInt4 and the second having column_id = 7 and operator_function = &LessThanInt8, as the following. Note that a query spiff's ID computation is normally associated with the particular values of the spiff pattern parameters. Spiff Maker should utilize an application-specific ID generation mechanism to produce the proper spiffID. In this example, however, we assume a computed spiffID of 23.
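A hedged C sketch of what the body of this spiff pattern instance (spiffID 23) might look like after the loop unrolling and constant folding just described; the identifiers are assumptions standing in for minidb's actual ones.

#include <stdbool.h>

extern unsigned long row_values[];        /* current row, already unpacked    */
extern unsigned long constant_operand[];  /* one query constant per predicate */

/* The predicate loop is unrolled to exactly num_predicates = 2 iterations,
   with each column_id and operator_function folded to its invariant value,
   eliminating the loop test and the indirect operator calls. */
static bool evaluate_predicates_spiff_23(void) {
    /* Predicate 1: column_id = 2, operator_function = &EqualInt4. */
    if (!((int)row_values[2] == (int)constant_operand[0]))
        return false;
    /* Predicate 2: column_id = 7, operator_function = &LessThanInt8. */
    if (!((long)row_values[7] < (long)constant_operand[1]))
        return false;
    return true;
}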

Example Algorithm for Spiff Maker

Spiff Maker decides only one thing: whether to allow the compiler to perform the optimization, after Spiff Maker has indicated what the invariant value(s) are, or to perform the optimization manually, by generating different code.

Spiff Maker then cobbles together the generated files by copying mostly verbatim from the original source to the specialized source, using the file names, line numbers, and column counts within the relevant PEs to determine the extent of what is copied and of what is replaced, say, with a spiff parameter (e.g., num_columns). Hence, Spiff Maker needs to do a very limited amount of parsing and unparsing, with most of its effort consisting of copying code from the appropriate place in the original source to the appropriate place in the specialized source.

Correctness

Correctness dictates that the code compiles and runs and is identical in semantics to the original code it replaces, while being consistent with the input information.

The following discussion provides additional examples demonstrating the creation of a MiniDB Table Spiff and a MiniDB Query Spiff.

MiniDB Table Spiff

The following example demonstrates the creation of a table spiff from the invariant schema->num_columns == CONSTANT in the SequentialScan() function, shown in Example 4.

Invariant Finder

In the above example, Invariant Finder should identify the following Invariant Interval Sets for the SequentialScan()::schema->num_columns variable:

· Invariant Interval Set #1: Starts on line 52, with 1 Invariant Interval:

o Invariant Interval #1.1: Ends on line 114

Invariant Finder should also produce a VFT to show where the variable's value comes from. Thus, the value of SequentialScan()::schema->num_columns eventually comes from a call to fread() in OpenTable().

Invariant Checker

Invariant Checker would verify that once main()::table_header->num_columns was assigned to on line 634, the value of this variable was never changed through the particular end node(s) taken by that execution of the given Workload.

If the Domain Assertion Deducer executed before Invariant Finder, the Invariant Checker could check to ensure that the actual value was included in the Possible Values. That might be able to reduce the scope of the Invariant Finder, by focusing that analysis on particular values or variables.

Snippet Finder

Snippet Finder should determine, through an analysis of the Execution Summaries in conjunction with the Cost Model, that the 'C', 'I', and 'D' cases are too expensive to justify creating spiffs, but that the computation time within the 'S' case is sufficient to suggest that that case be specialized.

Let's start with a simple Cost Model that just states that a PE (or other equivalent embodiment) that is executed fewer than a fixed number or percentage of times will not be specialized.

In such a case, Snippet Finder should then infer from the domain assertion that schema->num_columns is invariant across the body of SequentialScan(), with a scope of when the data file was created to when that file was removed, thus indicating that the value of that variable was first written to the file when the number of columns is stored, in WriteTableHeader():3, which is executed shortly after minidb.c:553. The value was never removed from the file, but the file itself was removed at main():57. This indicates that the spiff can be created at compile-time. The snippet should extend from ExecuteTable():20 through ExecuteTable():23. That is the scope of the snippet, specialized on num_columns, which is determined from seeing what other statements can be specialized on num_columns. (Essentially, num_columns is used infrequently, and thus far away from this specialization opportunity.) However, doing an extra indirect call is expensive, so Snippet Finder expands this snippet to the entire body of ExecuteQuery(), which has another use of this invariant on line 7. Finally, Snippet Finder should suggest loop unrolling for this Candidate Snippet.

In this case, the spiff will have but one spiff function, indicated by the <snippet>, as shown in Example 14.

<?xml version="1.0" encoding="UTF-8"?>
<candidateSnippets xmlns:xsi="..."
    xsi:noNamespaceSchemaLocation="candidateSnippets.xsd">
  <spiff createAt="compileTime" valueRead="OpenTable():8">
    <invariantIntervalSet variable="ExecuteQuery.query->schema->num_columns"
        start="ExecuteQuery():1" value=">0">
      <interval stop="ExecuteQuery():33"/>
    </invariantIntervalSet>
    <snippet existsFrom="WriteTableHeader():3" existsTo="main():57"
        replaceFunction="ExecuteTable()">
      <suggest opt="Constant fold" on="num_columns" at="ExecuteQuery():7"/>
      <suggest opt="Unroll loop" at="ExecuteQuery():21"/>
    </snippet>
  </spiff>
</candidateSnippets>

Snippet Finder could infer from the domain assertion that data packets are created in cases 'I' and 'D' of main() and removed in cases 'P' and 'D'. More specifically, Snippet Finder infers:

As shown in Example 15, note that the analysis combines the broader scope of the table invariant with the narrower scope of the row invariant, and employs different strategies for each: the former allows generating code when the table spiff is defined, whereas the latter involves instantiating the spiff at runtime by providing value(s) for the row_values array. In field specializing DBMSes, the schema invariants will play a large role in table spiff instances and query spiffs, which involve invariants of successively narrower scope.

Spiff Maker

This is the simplest use case, as there are no instances. We explore four variants of this case.

Variant 1: A single static implementation:

Consider the following input Candidate Snippet, corresponding to a single invariant in minidb that should result in a static spiff implementation, shown in Example 16.

Example 16

This input is telling Spiff Maker to make a spiff pattern, with one spiff pattern function based on ExecuteQuery(), and to make a static implementation that specializes the variable to the single literal value 3.

Variant 2: Specialization at a finer granularity:

The above example showed a spiff applied to replace an entire function (ExecuteQuery()). In fact, we can observe that only a small code segment in that function involves an invariant. Therefore, we can convert just that small code segment into a spiff, as shown in the following snippet.

The candidate snippet shown below (shown in Example 17) is very similar to before, with the interval much smaller than the entire ExecuteQuery() function: just the three lines of the for loop. The replaceFunction attribute thus disappears. Finally, the constant fold suggestion for line 21 is omitted, because that line is not in the interval to be specialized.

Variant 3: Employing a fixed array of implementations:

In our particular example, we decide to identify the spiff implementations with a one-byte integer. Therefore, we can have a total of 255 implementations from the same spiff pattern, with the num_columns variable acting as the spiff-pattern parameter, varying from 1 to 255 (0 is chosen to represent the invalid value). So the candidate snippet shown below in Example 18 doesn't have a value just of 3, but all values from 1-255 (that is, the fromValue and toValue attributes of the invariantIntervalSet element). We also revert back to a full function replacement.
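A minimal sketch of the fixed implementation array this variant implies, assuming hypothetical names: the one-byte num_columns indexes directly into the table of generated implementations, with 0 invalid.

struct query;                                      /* minidb's query type, assumed */
extern void ExecuteQuery_generic(struct query *q); /* assumed generic fallback */

typedef void (*exec_query_impl)(struct query *q);

/* Filled in by the generated spiff code: impl_table[n] is the
   implementation specialized for num_columns == n. */
static exec_query_impl impl_table[256];

static void dispatch_exec_query(struct query *q, unsigned char num_columns) {
    if (num_columns != 0 && impl_table[num_columns] != 0)
        impl_table[num_columns](q);   /* specialized implementation */
    else
        ExecuteQuery_generic(q);      /* generic fallback */
}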

Variant 4: Dynamic spiff:

In reality, each column in a table can be of a particular datatype. Assuming there are eight datatypes (int2, int4, char, varchar, etc.), a static table spiff for a three-column table requires 8³ = 512 possible implementations. Hence, a dynamic table spiff is more suitable in this scenario.

The candidate snippet given below (see Example 19) states this with the createAt attribute, which here specifies where in the application the spiff is created, that is, within the CreateTable() function (the createAt attribute, which in the previous example was compileTime), as well as where the spiff is to be instantiated, that is, within the OpenTable() function (the instantiateAt attribute). There are no fromValue or toValue attributes in the invariantIntervalSet element, as the num_columns value is supplied when the snippet is instantiated. The one other important difference from the previous example is the additional optimization suggestion to constant fold on column_definitions.

Unlike creating a static spiff, a dynamic spiff is created at runtime by inserting a call to compile the spiff into CreateTable(), for a table spiff.

The Design of Various Types of Spiffs

(Here we use examples from the open-source Postgres DBMS.)

Predicate Query Spiff

This spiff is utilized in evaluating both regular predicates within queries, such as o_orderdate >= date '19940801', and join predicates, such as o_orderkey = l_orderkey. These predicates are evaluated by the ExecQual() function (in Postgres).

Specifically, predicates are normally represented in a linked list. ExecQual() iterates through this list and invokes the particular evaluation function corresponding to each individual predicate. The code excerpt presented in Example 20 (from PG 9.3 stock, src/backend/executor/execQual.c:5125) shows such logic.

The per-predicate evaluation function is stored within the clause variable. For each predicate, in the form of a > b, there are three components: operand #1, an operator, and operand #2. In Postgres, the operator is evaluated by the function ExecEvalOper(). This function (see Example 21) essentially performs a lookup according to the type of operator and fetches the address of the actual type-specific comparison function. ExecEvalOper() also requires the operands to be stored in another linked list. In many cases, the length of this list is two. Below is an example of specializing this function for those cases.

Note that an optimization to ExecEvalOper() is that it is executed just once, to perform the comparison-function lookup. It then stores a different function into xprstate.evalfunc, and calls that function once to evaluate the predicate. Subsequent evaluations of the operator are done by ExecMakeFunctionResultNoSets() (for scalar predicates considered within our current specialization scope).

ExecMakeFunctionResultNoSets ( ) then iterates through the list of operands by calling the argument-extraction function for each operand.

ExecEvalExpr is a macro, defined in src/include/executor/executor.h:72 as:

So if the operand is a constant, ExecEvalConst() would be called, which finally invokes the comparison function.

The bottlenecks observed in predicate evaluation are, first, the loop that iterates through just two elements in the operand list, and second, the extraction of individual operands. Specifically, we observe that for a regular predicate, one operand is normally a table column and the other operand is a constant. In such a case, the constant's value (or address) can be directly "stored" in the code rather than having to invoke multiple functions to fetch it. In addition, the original implementation requires multiple function calls to extract the column ID for the table-originated operand. Similarly, this column ID can be stored directly into the specialized code.
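As a hedged illustration of storing the constant and the column ID directly in the code, the following sketch (using assumed, simplified types rather than Postgres' actual ones) shows what a specialized evaluation of a predicate such as column >= constant might reduce to.

#include <stdbool.h>

typedef struct { const long *columns; } ScanTuple;  /* simplified stand-in */

static const long kConstOperand = 19940801;  /* folded query constant */
enum { kColumnNo = 4 };                      /* folded column number  */

/* No walk over the two-element operand list, no per-operand extraction
   calls, no comparison-function lookup: a direct fetch and compare. */
static bool eval_pred_spiff(const ScanTuple *slot) {
    return slot->columns[kColumnNo] >= kConstOperand;
}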

For a join predicate, both operands are non-constant. The origin of an operand can be one of three types, that of INNER_VAR (I), OUTER_VAR (O), and scantuple (S). The origin of the operand is also an invariant given a query. By knowing this invariant, we can further simplify the routine that extracts the actual operand's value. Note that although, theoretically, there are 9 possible combinations for the origins of the two operands, in reality only the following combinations are allowed.

Hashjoin Query Spiff

In function ExecHashJoin(), defined in file src/backend/executor/nodeHashjoin.c, the variable jointype is invariant for a given query. Depending on the query, it will take on one value from the set {..., JOIN_INNER}.

In the same file and function, the variable List *joinqual is also invariant for a given query.

Hashjoin Query Spiff eliminates entire branches in the code, thereby reducing the number of if statements and, more importantly, the size of the code.
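A minimal sketch of that branch elimination, using simplified stand-ins for Postgres' types: when jointype is invariant for the query, the per-tuple tests on the other join types fall away.

typedef enum { JOIN_INNER, JOIN_SEMI, JOIN_ANTI } JoinType;  /* simplified */
typedef struct { JoinType jointype; int matched; } JoinState;

/* Generic per-tuple logic (simplified): the join-type branches are
   re-tested for every tuple. */
static int emit_tuple_generic(const JoinState *js) {
    if (js->jointype == JOIN_ANTI) return 0;              /* suppress matches */
    if (js->jointype == JOIN_SEMI && js->matched) return 0;
    return 1;
}

/* Specialized for the query invariant jointype == JOIN_INNER: both
   branches are statically false and are eliminated, leaving
   straight-line code and a smaller body. */
static int emit_tuple_inner_spiff(const JoinState *js) {
    (void)js;
    return 1;
}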

The analysis has to be able to handle complex data structures involving pointers and heap-allocated structs. For example, to eliminate if statements in the body of the for loop in ExecHashJoin() , we have to be able to reason about expressions like the following (Example 22).

Page Spiff

A page spiff utilizes invariant(s) within a disk/memory page with which the DBMS manages its storage of data. Often, such invariants could include the number of rows stored on the page, the free space remaining, and whether the page is empty or full. In Postgres' page-scan routines, there are additional invariants, such as the scan direction and scan mode (pageatatime).

More interestingly, page spiffs can enable more aggressive optimizations. For instance, a page spiff can reorganize the data layout once the page is read into memory to optimize data locality. In addition, once the data layout is changed, instead of following the existing function-call sequence in further processing, the page spiff can invoke these calls in a block-at-a-time manner, thereby improving instruction locality as well.

A page spiff makes it possible to specialize a long sequence of calls that eventually access data, passing the data up through the call chain, where a lot of code in the called functions can be specialized away.

The chief benefit of the page spiff is to inline the called functions to produce a single specialized function that can fit in the instruction cache. Once this transformation is done, three other mutually exclusive transformations become available.

1. Eager invocation of the GetColumnsToLongs() speccode routine: as soon as the packed tuple is extracted from the page, convert it to an unpacked tuple table slot, then store it in the array manipulated by the speccode routine.

2. Eager partial unpacking: have the code that calls the specialized code compute the maximum column that will be needed, and only unpack the columns up to there.

3. Lazy unpacking: do multiple unpacking operations, wherever the original code called GetColumnsToLong().

A variant of this uses the selectivity of the select to decide. If the selectivity is high, meaning that only a few rows will be referenced, use lazy unpacking before applying the predicate. In general, it is best to place the call to GetColumnsToLong() such that the execution can maximize instruction cache locality.

Aggregate Query Spiff

Aggregate spiffs are designed to improve the efficiency of the SUM and AVG aggregate functions. We have found that in evaluating aggregate functions with the numeric datatype, Postgres incurs a significant overhead in performing memory allocation and deallocation. Aggregate spiffs avoid such memory-management overheads.

In Postgres, a numeric value is represented by a byte string, with each digit stored in an array of NumericDigit. This representation allows very fine precision control but sacrifices performance by requiring essentially string-based arithmetic operations.

In the generic implementation, the reason for having to perform per-row memory allocation is that the number of digits present in the value can differ for each input row. Especially in evaluating a * b, the resulting value's range can go far beyond that of the input values. Nevertheless, there is a constant (NUMERIC_MAX_PRECISION) in Postgres that defines the maximum number of digits that can be supported for a numeric value. The aggregate spiffs utilize this value to slab-allocate a spiff data section, which is then reused in computing the corresponding aggregate function across all the input rows, thereby eliminating per-row memory allocation.

Note that the evaluation of an aggregate function consists of two steps. For instance, given an aggregate function SUM(a + b), the first step is to evaluate the result of the expression a + b. The second step is then to accumulate the values of a + b over all the input rows. In PostgreSQL, both a + b and the SUM() function are evaluated using the numeric_add() function. This function takes two inputs. In the case of a + b, the two inputs are a and b, respectively. In the case of computing SUM(x), the second input is x, which can essentially come from a scanned row. The first input is a transition value, which is the current sum of the rows that have been processed up to this point.

Evaluating SUM()

According to numeric_add(), the two inputs are added and the resulting value is copied into the returned res variable allocated by make_result(). res is then returned back to advance_transition_function() within nodeAgg.c, which copies this returned value into pergroupstate->transValue and then frees the returned value. The next time advance_transition_function() is executed to process the next row, the transValue is copied to the first input value of numeric_add() via the following snippet.

fcinfo->arg[0] = pergroupstate->transValue;
fcinfo->argnull[0] = pergroupstate->transValueIsNull;

This logic indicates that transValue can in fact be shared, without being freed, across all the rows. Therefore, for the data section of the EvaluateNumericAdd spiff, at the beginning of the aggregate evaluation the necessary variables are allocated, namely agg_temp_values->result_value and agg_temp_values->result_arg, by using AllocateAggTempValues(). (Note that these two variables represent the same value, but Postgres requires two such variables, serving as the return value and as a temporary computation argument, respectively.)
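A hedged sketch of that allocation strategy, with struct and function names mirroring the text but otherwise assumed: the transition value and its temporary are allocated once, sized by the precision bound, and reused for every row.

#include <stdlib.h>

#define NUMERIC_MAX_PRECISION 1000   /* stands in for Postgres' bound */

typedef struct {
    int   ndigits;
    short digits[NUMERIC_MAX_PRECISION];   /* digit array, maximal size */
} Numeric;

typedef struct {
    Numeric *result_value;   /* returned transition value, reused per row */
    Numeric *result_arg;     /* temporary computation argument            */
} AggTempValues;

/* Called once at the beginning of aggregate evaluation; no further
   allocation (or freeing) happens per row. */
static AggTempValues *AllocateAggTempValues(void) {
    AggTempValues *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->result_value = calloc(1, sizeof(Numeric));
    v->result_arg   = calloc(1, sizeof(Numeric));
    if (v->result_value == NULL || v->result_arg == NULL) {
        free(v->result_value);
        free(v->result_arg);
        free(v);
        return NULL;
    }
    return v;
}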

Evaluating Expression

As discussed earlier, another usage of numeric_add() is in computing arithmetic expressions, such as a + b. In this case, the variable that stores the evaluation result of the expression is reused; it was previously allocated by make_result(). This variable is added to the spiff's data section as agg_temp_values->expr_result_arg.

Unlike in the case of evaluating SUM(), where the first input comes directly from agg_temp_values->result_value within the spiff's data section, both inputs in evaluating a + b are regular variables that are required to be obtained using Postgres' existing implementation. In fact, in evaluating a + b, numeric_add() is invoked from ExecEvalOper() within execQual.c. So, similar to the predicate spiff, a spiff (EvaluateAggregateExpression) is created that specializes the ExecMakeFunctionResultNoSets() function. This spiff then invokes the expression-evaluation version of the EvaluateNumericAdd spiff.

In addition to +, an expression can include other operations such as -, *, and /. The functions that evaluate these operations are also specialized in the same fashion as numeric_add().

In summary, for the EvaluateNumericAdd spiff, the following invariants are considered:

1) The caller/execution path where numeric_add() is invoked. This can be either from evaluating an expression in execQual.c or from evaluating the SUM() function in nodeAgg.c.

2) In evaluating expressions, the result value's memory location can be invariant.

3) In evaluating SUM(), both the result value's memory location and the first input's memory location can be invariant. In addition, these two variables can even share the same memory location.

4) Sharing a common memory segment across all the rows is permitted by the constant that bounds the maximal precision of the numeric datatype.

String Matching Spiff

Say we have a C function, match, which matches a string x with another string pattern (containing wildcards and other special characters), y. If we know string y in advance of query execution (maybe it is a query constant), then we can create a specialized function for matching arbitrary strings to this particular string pattern.

One approach to specialization would be to first create the following query specialization code (speccodes):

· one each for constant strings of length 1-32

· one for the query wildcard character

We could then string various combinations of these speccodes together to make a specialized function for matching a string to y. For example, say that we have the pattern "%abc%defg%", and we want to create a specialized function for matching arbitrary strings to it. We would string together the following speccodes:

Each of these speccodes would assume that there are more characters in the string left to match after it has completed. Once one of the speccodes has finished matching, it would pass the rest of the string on to the next speccode in the sequence to continue the matching process. Matching the constant portions of the string would be accomplished using a combination of long long, long, short, and char comparisons.

Given an arbitrary query string, it would be easy to instantiate a sequence of query spiff function pointers, each one of which except the last invokes the next stage by a query spiff invocation using the spiff id stored as a local variable.

The following (Example 23) shows an example of how this would be implemented for the string "%abc%defg%" (using pseudocode).

Once we have these speccode routines created, we can construct a sequence of function calls as an array to match a string to this pattern. The array would look like that in Example 24.

These functions would then be called in sequence to match the string. Constant portions of length greater than 32 could be broken up into segments, so a string of length 65 would require three instantiated speccodes, of 32, 32, and 1 character.
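Since Example 23 is not reproduced here, the following is a hedged, self-contained C rendering of the speccode chain for "%abc%defg%": each stage matches its piece and hands the rest of the string to the next stage through a function-pointer array filled in by the spiff instantiator. All names are illustrative; the actual speccodes would be generated per pattern.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef bool (*speccode_fn)(const char *s, int stage);

static speccode_fn chain[8];   /* filled in by the spiff instantiator */

/* Hand the rest of the string to the next speccode in the sequence; an
   empty slot marks the end of the chain, where the string must also end. */
static bool next_stage(const char *s, int stage) {
    return chain[stage + 1] ? chain[stage + 1](s, stage + 1) : (*s == '\0');
}

/* Speccode for '%': try the remaining stages at every suffix. */
static bool match_percent(const char *s, int stage) {
    for (;; s++) {
        if (next_stage(s, stage)) return true;
        if (*s == '\0') return false;
    }
}

/* Speccodes for the constant segments, one generated per constant. */
static bool match_abc(const char *s, int stage) {
    return strncmp(s, "abc", 3) == 0 && next_stage(s + 3, stage);
}
static bool match_defg(const char *s, int stage) {
    return strncmp(s, "defg", 4) == 0 && next_stage(s + 4, stage);
}

/* Instantiated sequence for "%abc%defg%" (compare Example 24). */
static bool match_pattern(const char *s) {
    chain[0] = match_percent; chain[1] = match_abc;
    chain[2] = match_percent; chain[3] = match_defg;
    chain[4] = match_percent; chain[5] = NULL;
    return chain[0](s, 0);
}

int main(void) {
    printf("%d %d\n", match_pattern("xxabcyydefgzz"),   /* matches: 1 */
                      match_pattern("abcdef"));         /* fails:   0 */
    return 0;
}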

More generally, we have a method with a subset of arguments that are invariant. These invariants render some of the if statements, including those in recursive calls and within loops, deterministic. We unroll this sequence through a series of speccodes that invoke one another. So this appears to be a general transformation that works on loops and on recursive and non-recursive calls.

Since the actual sequence of speccodes that get invoked isn't known until runtime (when the actual pattern being matched is available), we could have the spiff instantiator fill in an array of function pointers that specify the sequence of speccodes to invoke.

Per-Query Spiff Sequencing

Per-Query Spiff Sequencing uses a meta-spiff to traverse an interpreted data structure and hot swapping to convert existing speccode into something similar to what would be emitted by a compiler. In some embodiments, the hot-swapping mechanism is used to turn switch/case blocks into specialized code that can be stitched together at runtime, according to the relationships among the various cases. Specifically, when a case is followed by another particular case during execution, hot-swapping will replace the calls to the branch-based dispatcher with direct jumps to the target branch. This applies to general dispatchers and the interpretive execution model. Instead of interpreting query plans and invoking the corresponding plan-node-specific functions, all dispatcher calls can be replaced by direct jumps to the child plan node, given a plan tree.
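As a hedged sketch of this stitching, assuming a simplified plan-tree representation: instead of returning to a switch/case dispatcher between nodes, each node's exec pointer is patched once, so execution jumps directly from parent to child.

typedef struct PlanNode PlanNode;
struct PlanNode {
    void (*exec)(PlanNode *);   /* patched to the node-specific speccode */
    PlanNode *child;
};

static void exec_scan(PlanNode *n)   { (void)n; /* produce tuples */ }
static void exec_filter(PlanNode *n) { n->child->exec(n->child); /* direct jump */ }
static void exec_agg(PlanNode *n)    { n->child->exec(n->child); /* direct jump */ }

/* "Stitching": given the plan tree Agg -> Filter -> Scan, install the
   per-node speccode pointers once; the interpretive dispatcher's
   switch/case is bypassed thereafter. */
static void stitch(PlanNode *agg, PlanNode *filter, PlanNode *scan) {
    agg->exec    = exec_agg;    agg->child    = filter;
    filter->exec = exec_filter; filter->child = scan;
    scan->exec   = exec_scan;   scan->child   = 0;
}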

Specialized Code Storage

When specialization is invoked, specialized code (speccode) is generated and may be stored in various locations along the field specialization process. For example, speccodes may involve invariants from both the oil field data 220 and the oil field simulator 230, or from both the config params 210 and the oil field simulator 230. In some embodiments, speccode stored in the Linux operating system may involve invariants from the simulator and the oil field data. In some embodiments, speccode may be stored in an external router or an external cloud service.

The speccode stored within the simulator may originate from the oil field data and from the simulator, and may be identified through basic field specialization. Other spiffs are stored with the operating system, router, and cloud, specializing code found in the indicated applications. In some embodiments, speccodes may flow from where they are stored to where they may be invoked (the application that provided the specialization candidate from which they were subsequently specialized). For example, the oil field data may store speccodes to be invoked at the external router. In some embodiments, speccode identifiers can reside with data or with applications and can also be included in communications with subsequent applications, indicating the relevant speccode to (later) invoke.

FIG. 3 is an illustration of field specialization as an elaboration of a paradigm of computer science, with an exemplary embodiment provided by this disclosure. The diagram comprises four quadrants 310, 320, 330 and 340, for the scenarios of data represented as data, code represented as data, data represented as code, and code represented as code, respectively.

In the early stages of computer architecture, from the Babbage machine through the 1930's, data was differentiated from code. The data was what was manipulated and the program code was the instructions for how to manipulate the data to effect a computation. This is represented in quadrant 310 in Figure 3 as data represented in binary format in a computer memory or storage device, that is, data stored as data, and source code represented in some other way (e.g., patch cords), that is, code represented as code.

In the 1940's, John von Neumann proposed a revolutionary architecture that stored the program in machine code in the computer's memory as numbers, mixing code and data. (Indeed, the code can be manipulated as data, and even changed during the running of the program.) This architecture is represented in quadrant 320, with code (machine instructions) represented as data.

In the 1960's there were some initial forays into combining code, in the form of a Lisp function, with data, say the value of one of the arguments, to produce a Lisp continuation, which is a Lisp function (code) paired with the value of that parameter: just a function with one fewer argument. This is a very particular way in which data are stored/encapsulated in code, as represented in quadrant 330.

In the 1980's the Postscript language was invented. This was code that when executed would create an image. Postscript is produced by a formatter, taking a document such as a Microsoft Word file, which is data, and converting it to a program: again, code as data, as represented in quadrant 320. The Postscript file produced from the Microsoft Word file is not an image to be directly printed, but instructions for drawing each letter of the document, so that that program could be executed, for example, within a Postscript printer or by a Postscript conversion program, to produce a bit-mapped image of the document.

Field specialization takes this idea further. Field specialization takes the values of invariants, that is, data, and uses these values to create a specialized code version of a portion of an application, such as a DBMS, which is code that can be executed. So a relation speccode is the result of specializing DBMS code using the schema of a relation (data). A tuple speccode is the result of using the data values within a tuple (row of a table). An O/S speccode is a specialization of a snippet of an operating system based on particular data values of particular invariants within that snippet; ditto for router speccodes.

This data-represented-as-code (as represented in quadrant 330) can be created in one application from a snippet in that application or another application, passed around between applications, and invoked by the target application when appropriate. The field specialization technology provides the means for identifying when such speccodes are effective in increasing performance, when they should be created, which invariants they should be specialized upon, how they can be communicated across applications, and when they should be invoked.

The implication is that for any coherent region within a data file, it is possible to ascertain the invariant values within that region, follow those values into areas of the application code, make speccodes out of those areas, and then associate those speccodes back to their regions. This perspective thus focuses on the originating data, rather than starting with the code and specializing it.

It should be emphasized that the above-described embodiments of the present disclosure, particularly any "preferred" embodiments, are merely possible examples of implementations, set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of the present disclosure and protected by the following claims.