Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A STATISTICAL CONTROL FLOW RESTORATION METHOD
Document Type and Number:
WIPO Patent Application WO/2009/091278
Kind Code:
A1
Abstract:
Embodiments of the present invention provide for collecting information on execution control flow in a manner highly correlated with the program performance characteristics. To achieve the correlation, the collection of branch records may be initiated upon reception of a profiling interrupt. The branch records may contain information on at least the source and target addresses of a taken branch instruction. The branch records may be collected automatically by dedicated performance monitoring logic and stored in a memory buffer or preserved in specifically provided registers. The collection may proceed until a predefined number of branches are collected (i.e., said memory buffer or registers are filled with the branch records) or until the reception of a subsequent profiling interrupt, whichever condition occurs first. In case the performance monitoring logic is capable of reporting information on the source and destination branch addresses only, the branch collection may be optionally initiated when a software task being monitored becomes active, and completed after the task goes to the inactive state. The collected branch records may then be processed in a manner that enables future control flow restoration. One embodiment of the present invention may compress the collected branching information and preserve the results in a non-volatile memory for further analysis, while other embodiments may employ methods of constructing control flow graphs at the time of collection and discard the collected branch records, thus eliminating the need for data compression and preservation.

Inventors:
BRATANOV STANISLAV VIKTOROVICH (RU)
Application Number:
PCT/RU2008/000024
Publication Date:
July 23, 2009
Filing Date:
January 17, 2008
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
INTEL CORP (US)
BRATANOV STANISLAV VIKTOROVICH (RU)
International Classes:
G06F11/34
Foreign References:
US20060230390A12006-10-12
US6247146B12001-06-12
US20020059562A12002-05-16
US6253338B12001-06-26
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD (B.Spasskaya str. 25, stroenie, Moscow 0, RU)
Download PDF:
Claims:
CLAIMS What is claimed is:

1. In a processing system that restores information on control flow for statistically prominent execution regions, a method comprising: collecting branch records in a memory buffer upon reception of a profiling interrupt; processing previously collected branch records upon reception of each subsequent profiling interrupt; and processing branch records if notified by the processing system that a predefined number of said branch records has been collected.

2. The method of claim 1, wherein branch records comprise information on at least branch instruction source and target addresses.

3. The method of claim 1, wherein a profiling interrupt comprises interruption of the normal processor execution flow generated in a manner correlated with any operational characteristic of the processing system.

4. The method of claim 1, wherein processing branch records comprises copying and transforming branch records in a manner that facilitates further control flow restoration.

5. The method of claim 4, wherein processing branch records further comprises enabling subsequent collection of branch records.

6. An article comprising: a machine accessible medium having a plurality of machine readable instructions, wherein when the instructions are executed by a processor, the instructions provide for restoring information on control flow for statistically prominent execution regions by: collecting branch records in a memory buffer upon reception of a profiling interrupt; processing previously collected branch records upon reception of each subsequent profiling interrupt; and processing branch records if notified by the processor that a predefined number of said branch records has been collected.

7. The article of claim 6, wherein branch records comprise information on at least branch instruction source and target addresses.

8. The article of claim 6, wherein a profiling interrupt comprises interruption of the normal processor execution flow generated in a manner correlated with any operational characteristic of a processing system.

9. The article of claim 6, wherein processing branch records comprises copying and transforming branch records in a manner that facilitates further control flow restoration.

10. The article of claim 9, wherein processing branch records further comprises enabling subsequent collection of branch records.

11. A processing system that restores information on control flow for statistically prominent execution regions, comprising: logic to collect branch records in a memory buffer upon reception of a profiling interrupt; logic to process previously collected branch records upon reception of each subsequent profiling interrupt; and logic to process branch records if notified by the processing system that a predefined number of said branch records has been collected.

12. The system of claim 11, wherein branch records comprise information on at least branch instruction source and target addresses.

13. The system of claim 11, wherein a profiling interrupt comprises interruption of the normal processor execution flow generated in a manner correlated with any operational characteristic of the processing system.

14. The system of claim 1 1, wherein processing branch records comprises copying and transforming branch records in a manner that facilitates further control flow restoration. 15. The system of claim 14, wherein processing branch records further comprises enabling subsequent collection of branch records.

Description:

A STATISTICAL CONTROL FLOW RESTORATION METHOD

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

1. FIELD

Embodiments of the present invention relate generally to computer program performance monitoring and analysis and, more specifically, to low intrusive methods of program logic restoration, such as constructing statistical control flow graphs for performance-critical execution regions.

2. DESCRIPTION

A control flow graph is one of the most fundamental software models describing the actual control transfers within a program and revealing the dynamic execution logic, not otherwise evident, even when studying the program source code. The control flow graph is irreplaceable in various types of software analysis and, in particular, if combined with appropriate performance information, enables efficient application of software optimization methods (such as procedure in-lining and code re-factoring based on branch probability estimations) and helps in discovering parallel execution potential (by means of detecting loops and estimating the number of loop iterations which can be executed in parallel).

Thus, the ability to reconstruct program flow logic and correlate it with performance characteristics is essential for modern computer program performance monitoring systems. Unfortunately, control flow graphs, when constructed by traditional means, suffer from enormous overhead that distorts the correlation between collected program performance data and the actual execution points, which limits the precision and applicability of the control flow graph model to advanced software analysis. The overhead of the traditional control flow graph tools may be explained by the fact that most of them are based on code instrumentation, and the finer grained that instrumentation is (each branch instruction is expected to be instrumented in the most

precise case) the higher the overhead. Moreover, the intrusiveness level of such instrumentation-based tools cannot be adjusted without negatively impacting their precision (it is hard to determine the critical execution paths to instrument in advance).

While the systems that try to correlate performance information collected in real time with the results of an independent program control flow analysis are less intrusive, they suffer from poorer precision, since no information on the actual program execution state is preserved along with performance characteristics (to decrease the intrusiveness) and the correct correspondence between such performance characteristics and independently determined program execution states cannot be established. Therefore, a need exists for the capability to decrease control flow graph construction overhead, thus improving the correlation of the performance data with the actual execution flow, and, at the same time, to form a representative control flow graph which will cover the most performance critical regions in any given program.

BRIEF DESCRIPTION OF THE DRAWINGS The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which:

Figure 1 is a diagram illustrating the construction of a control flow graph for a code sample based on branching information according to an embodiment of the present invention; Figure 2 is a diagram illustrating the contents of a branch trace record that reflect information on the control flow transfers according to an embodiment of the present invention;

Figure 3 is a diagram illustrating the process of collecting branching information for statistically prominent execution regions over time according to an embodiment of the present invention; and

Figure 4 is a flow diagram illustrating the process of collecting branch records upon reception of performance monitoring and branch tracing interrupts according to an embodiment of the present invention.

DETAILED DESCRIPTION An embodiment of the present invention is a method that provides for efficient program control flow restoration in a manner highly correlated with a program's performance by collecting information on executed (taken) branch instructions in the

context of performance monitoring (profiling) interrupts. The disclosed method relies on automated branch tracing capabilities of the processors commercially available from Intel Corporation.

Reference in the specification to "one embodiment" or "an embodiment" of the present invention means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase "in one embodiment" appearing in various places throughout the specification are not necessarily all referring to the same embodiment. The following definitions may be useful for understanding embodiments of the present invention described herein.

Basic Block of execution is a group of instructions that are executed in a logically consequent order, without transferring control to other instruction groups.

Statistical Control Flow Graph is a partial program control flow graph reconstructed for statistically prominent code regions, wherein each node represents a basic block and each branch instruction is assigned an estimated number of control transfers.

Branch Trace Store is the processor capability (available in IA-32 and Intel-64 processor families manufactured by Intel Corporation) to preserve the source and destination addresses of each taken branch instruction in a memory resident buffer (called a branch trace buffer).

Branch Trace Interrupt (BTI) is an interruption signal generated by the processor when the branch trace buffer is full.

Performance Event is a measurable change in processor operational characteristics (e.g., processor clock tick or bus access).

Performance Monitoring Interrupt (PMI) is an interruption signal generated by the processor when any one of the enabled performance event counters overflows.

Any reference in the specification to "memory buffer" should not be construed in a limiting sense since the term relates to any type of memory, being static or dynamic, volatile or non-volatile, general-purpose random-access memory or specifically designed register files or similar logic, irregardless of a particular computer system implementation embodying the present invention.

Figure 1 is a diagram illustrating the construction of a control flow graph for a code sample based on branching information. According to the figure, sample code 100 may be represented in the form of a control flow graph 102. For that purpose, memory access, comparison, and register increment instructions separated by a conditional branch instruction may be grouped in basic blocks 104 and 108. The conditional branch instruction may be represented as conditional block 106. The dynamic execution structure of sample code 100 is determined by the frequency of control transfers 110 (which indicates the number of repetitions of the sample code) and 112 (which corresponds to the number of times the control was transferred out of the code in question). Thus, in order to restore the program control flow, information on the conditional and unconditional branch instructions needs to be preserved. The behavior of each branch instruction may be characterized by its source and destination addresses, wherein the source address may comprise the address of the branch instruction itself, and the destination address may comprise the address of the instruction which gains control after executing said branch instruction.

To collect information necessary for restoring program control flow, embodiments of the present invention employ a processor capability to trace information on taken branch instructions available in processors manufactured by Intel Corporation. Figure 2 is a diagram illustrating the contents of a branch trace record that reflect information on the control flow transfers. According to the figure, for each taken branch instruction the processor forms branch trace record 200, comprising source address 202, destination address 204, and branch prediction information 206. The latter indicates whether execution of a particular branch instruction was predicted by the processor branch prediction logic. This functionality is optional for different processor models and, being important from the software analysis point of view, is not essential for embodiments of the present invention and does not limit the use and applicability of the herein disclosed method. Other processor implementations may report additional information in their branch trace records. As many modern operating systems are built to support concurrent execution of multiple tasks, and modern processors provide for task address space separation and virtualization, there may be a need to differentiate between branch trace records

collected for multiple tasks in the same memory buffer. In this case additional information provided in the branch trace records may include process identifier 208 and thread identifier 210. Other means of task identification, including complete per-task virtualization of the branch tracing resources may be employed by particular embodiments of the present invention to address specifics of particular operating systems and processor capabilities.

In order to correlate program control flow with performance information, the collection of branch records may be initiated upon reception of a profiling interrupt. The profiling interrupt may comprise any interruption of the normal execution flow due to a substantial change in operational characteristics of the computational system, including, but not limited to, system task scheduler state, execution time, executed instructions, consumed resources, etc. Embodiments of the present invention may employ various performance monitoring mechanisms available in modern processors to tie the program control flow information to the processor performance events. Figure 3 is a diagram illustrating the process of collecting branching information for statistically prominent execution regions over time. According to the figure, during task activity period 300 the collection of branch records may be repeatedly initiated. Thus, at time T 0 , the corresponding processor logic may be programmed to start collecting branching information in branch trace buffer 302. The collection may stop when said buffer overflows at time Ti. Then, upon reception of the profiling interrupt at time T 2 the collection of branch records may resume, and the contents of the branch trace buffer may be processed when the task being monitored becomes inactive at time T 3 .

One embodiment of the present invention that employs processor capabilities available from Intel Corporation may collect the branch traces in a memory buffer until said buffer overflows. Then, the branch collection software may be automatically notified by the processor logic on the overflow, and the branch records collected so far may be processed. In case the automatic notification capability is not supported by a particular processor, the processing of branch records may be delayed until the reception of a next profiling interrupt.

The process of branch record collection is summarized in Figure 4, which is a flow diagram illustrating the process of collecting branch records upon reception of

performance monitoring and branch tracing interrupts in accordance with an embodiment of the present invention. According to the figure, when the profiling interrupt occurs, the current state of the branch collection process may be tested at block 400. In case the collection of braches was not currently active, it may be initiated at block 402. Otherwise, the processor logic may be programmed to stop the collection at block 404. Then, at block 406, the previously collected branch records may be processed. The processing of branch records may comprise copying said branch records to a non-volatile memory location in order to enable future analysis. The processing may further comprise compressing branch records to minimize the amount of data to be preserved. It should be noted that implementing any other method of processing the branch data to no extent limits the scope of the present invention as soon as it facilitates the construction of a control flow graph, said construction to be performed either at the stage of branch record collection or at a post-processing stage, over the preserved data.

After having processed the previously collected information, the control may be passed to block 402 to re-initiate the collection of branch records.

In case the system that embodies the present invention provides mechanisms to notify the monitoring logic on the overflowing of the branch trace buffers or on collection of the specified number of branch records, the collection of branching information may be stopped at block 408, and the collected data may be processed at block 410 as described above for block 406. The collection of branches in this case may not be resumed until the arrival of a next profiling interrupt.

The statistical nature of the above described invention minimizes the intrusion of data collection, while providing enough information to restore the program control flow. Adjustable profiling frequency and trace buffer depth present extra benefits in balancing the precision and intrusiveness level of the statistical call graph construction. This enables the creation of and forming a basis for a wide range of advanced software analysis instruments that address the needs of automated sequential and parallel program development and optimization, as well as new processor architecture research, by introducing reliable means of capturing dynamic program execution information for such program code regions which affect the performance of the entire system.

For a C language example of an embodiment of the present invention refer to Appendix A. The provided code excerpt illustrates allocation of per-processor structures

for Debug Save Area and Branch Trace Store (as specified in the processor manuals available from Intel Corporation), initiation of branch tracing in a performance monitoring interrupt service routine, handling Branch Trace Store overflow condition, and processing the collected branching information. The processing of the branching information includes differential branch record compression and storing the resulting data in a file.

The provided code excerpt does not constitute a complete performance monitoring nor a control flow restoration system, is provided for illustrative purposes, and should not be viewed as a reference implementation with regard to both its functionality and efficiency.

One skilled in the art will recognize the option of implementing different schemes to process the collected branch records, depending on the actual branch record structure, available processor support, and the goals of a particular performance monitoring system - without deviating from the scope of the present invention. Furthermore, one skilled in the art will recognize that embodiments of the present invention may be implemented in other ways and using other programming languages.

The techniques described herein are not limited to any particular hardware or software configuration; they may find applicability in any computing or processing environment. The techniques may be implemented in logic embodied in hardware, software, or firmware components, or a combination of the above. The techniques may be implemented in programs executing on programmable machines such as mobile or stationary computers, personal digital assistants, set top boxes, cellular telephones and pagers, and other electronic devices, that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and one or more output devices. Program code is applied to the data entered using the input device to perform the functions described and to generate output information. The output information may be applied to one or more output devices. One of ordinary skill in the art may appreciate that the invention can be practiced with various computer system configurations, including multiprocessor systems, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks may be performed by

remote processing devices that are linked through a communications network.

Each program may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. However, programs may be implemented in assembly or machine language, if desired. In any case, the language may be compiled or interpreted.

Program instructions may be used to cause a general-purpose or special-purpose processing system that is programmed with the instructions to perform the operations described herein. Alternatively, the operations may be performed by specific hardware components that contain hardwired logic for performing the operations, or by any combination of programmed computer components and custom hardware components. The methods described herein may be provided as a computer program product that may include a machine readable medium having stored thereon instructions that may be used to program a processing system or other electronic device to perform the methods. The term "machine readable medium" used herein shall include any medium that is capable of storing or encoding a sequence of instructions for execution by the machine and that cause the machine to perform any one of the methods described herein. The term "machine readable medium" shall accordingly include, but not be limited to, solid- state memories, optical and magnetic disks. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, module, logic, and so on) as taking an action or causing a result. Such expressions are merely a shorthand way of stating the execution of the software by a processing system to cause the processor to perform an action or produce a result.

While this invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the invention, which are apparent to persons skilled in the art to which the invention pertains are deemed to lie within the spirit and scope of the invention.

APPENDIX A

© 2007 Intel Corporation

A C code example for the statistical branch tracing.

The present example shows one possible implementation of the statistical branch tracing algorithm being an essential part of the statistical call sequence restoration method.

The goal of the code is to allocate per-processor structures for Debug Save Area and Branch Trace Store as specified in the processor manuals available from Intel

Corp., to initiate branch tracing in a performance monitoring interrupt service routine, to handle Branch Trace Store overflow condition, and to process the collected branching information. The processing of the branching information includes differential branch record compression, that is, storing the difference between adjacent branch addresses. The resulting compressed buffer may be saved in a file.

The provided code excerpt does not constitute a complete performance monitoring nor a control flow restoration system, is provided for illustrational purposes, and should not be viewed as a reference implementation with regard to both its functionality and efficiency.

/*

// BTS Structures

*/

#defme CPU_BTS_SIZE 0x3000

#defϊne DS_AREA_MSR 0x0600

#define DEBUGCTL MSR 0x0 Id9

#define BTS_ENABLE_MASK_P4 0x003c #defϊne BTS_ENABLE_MASK_P6 0x03c0

#pragma pack(push, 1 )

typedef struct

{ void* bts base; void* bts index; void* bts absmax; void* bts threshold;

void* pebs base; void* pebs index; void* pebs absmax; void* pebs_threshold;

void* pebs_reset[2];

void* reserved[4];

void* padding[8];

} ds area t;

typedef struct

{ void* branch_from; void* branch to; void* prediction;

} btrecord_t;

#pragma pack(pop)

typedef struct

{ void* bts_ptr;

} pcb_t;

/// debug save areas extern ds_area_t dsa[VTSS_PROCESSORS_SUPPORTED];

/// processor control blocks extern pcb_t pcb[VTSS_PROCESSORS_SUPPORTED];

/*

// BTS Functions

*/

/// Allocate per-CPU Branch Trace Store buffers int alloc_bts()

{ int ij; char* bts_ptr; volatile char* access_ptr;

if(!(bts_ptr = malloc(CPU_BTS_SIZE * cpu_no)))

{ return ERR_NOMEMORY;

} for(i = 0; i < cpu_no; i++)

{ pcb[i].bts_ptr = (void*)&bts_ptr[i * CPU_BTS_SIZE];

/// make sure the memory is accessed and 'dirty' access_ptr = (volatile char*)pcb[i].bts_ptr;

for(j = 0; j < CPU_BTS_SIZE; j += 0x1000)

{ access_ptr[j] = 0; }

} return 0;

}

/// Initialize Debug Save Area for Branch Trace Store on the current processor void init_bts()

{ ds_area_t* dsa_ptr; /// a pointer to a per-CPU debug save area void* bts_ptr; /// a pointer to a per-CPU branch trace store

dsa_ptr = &dsa[curr_cpu()]; bts_ptr = pcb[curr_cpu()].bts_ptr;

dsa_ptr->bts_base = bts_ptr; dsa_ptr->bts_index = bts_ptr; dsa_ptr->bts_threshold = (void*)((size_t)bts_ptr

(CPU_BTS_SIZE / 4 * 3 / recsize * recsize)); dsa_ptr->bts_absmax = (void*)(((size_t)bts_ptr

(CPU_BTS_SIZE / recsize - 1) * recsize + I));

write_msr(DS_AREA_MSR, (size_t)dsa_ptr);

}

/// Test if the BTS buffer overflowed int has_bts_overflowed()

{ if(dsa[curr_cpu()].bts_index >= dsa[curr_cpu()].bts_threshold)

{ return 1;

} return 0; }

/// Enable hardware branch tracing void enable btsO

{ write_msr(DEBUGCTL_MSR, (hardcfg.family == OxOf)

BTS_ENABLE_MASK_P4 : BTS_ENABLE_MASK_P6); }

/// Disable hardware branch tracing void disable btsO

{ write_msr(DEBUGCTL_MSR, 0);

}

/// Common PMI and BTI handler void pmi_handler()

{ ack_apic_eoi(); ack_apic_pmi();

if(has_bts_overflowed())

{

/// invoke BTI handler bts handler(ctx);

} else

{ /// invoke PMI handler cntr_handler(eframe, ctx);

} }

/// BTS buffer overflow handler (BTI) void bts_handler(void* ctx)

{ void* bts base; void* bts index; void* bts_absmax;

disable_bts(); process_bts(); }

/// event counter overflow handler (PMI) void cntr_handler(exec_frame_t* eframe, void* ctx) { void* bts index; void* bts absmax; void* bts_base;

/// sample instruction pointer, re-program event counters, etc.

/// process and re-program BTS disable_bts(); process_bts(); init_bts(); enable btsQ;

}

/// Compress collected branch records and save them to the trace file void process_bts() { btrecord t* src; btrecord t* src end; unsigned char* dst; unsigned char dst_start[MAX_FILE_RECORD_SIZE]; size t offset; size t value; int prefix; int sign; int i, j;

src = (btrecord_t*)pcb[curr_cpu()].bts_ptr;

src end = (btrecord_t*)dsa[curr_cpu()].bts_index;

for(; src < src_end;)

{ /// differentially compress branching info and dump to a file dst = dst start = &thread_desc[tidx].scratch[0xl8];

for(offset = 0; (dst - dst_start < SAFE_FILE_RECORD_SIZE) && (src < src_end); src++)

{ src2 = (btrecord_t*)src;

for(i = 0; i < 2; i++)

{ value = ((size_t*)src)[i];

value -= offset; offset = ((size_t*)src)[i]; prefix = (int)((size_t)src->prediction « 3);

} sign = (value & (((size_t) 1 ) « ((sizeof(size_t) « 3) - 1 ))) ? Oxff : 0;

for(j = sizeof(size_t) - 1 ; j >= 0; j~)

{ if(((value » G « 3)) & Oxff) != sign) { break; } } prefix |= sign ? 0x40 : 0; prefix |= j + 1 ;

*dst++ = (unsigned char)prefix;

for(; j >= 0; j--)

{ *dst++ = (unsigned char)(value & Oxff); value »= 8; } } } /// store the resulting compressed BTS buffer to a file write_file(dst, (dst - dst start)); }