Title:
METHOD AND APPARATUS TO OPTIMIZE POWER SETTINGS FOR A WORKLOAD
Document Type and Number:
WIPO Patent Application WO/2017/171973
Kind Code:
A1
Abstract:
A system with granular power/performance management. The system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage. Each platform executes an optimization agent to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings. A candidate list creator exists as part of the optimization agent to aggregate a list of performance metrics associated with the plurality of permutations.

Inventors:
HOFFMAN ANDY (US)
BODAS DEVADATTA (US)
RAJAPPA MURALIDHAR (US)
ABOU GAZALA NEVEN (US)
SONG JUSTIN (US)
BALASUBRAMANIAN KAUSHIK (IN)
BIRRER THOMAS (US)
GREFE BENJAMIN (US)
FORBES MARVIN (US)
Application Number:
PCT/US2017/013440
Publication Date:
October 05, 2017
Filing Date:
January 13, 2017
Assignee:
INTEL CORP (US)
International Classes:
G06F1/32; G06F9/50
Foreign References:
US20120151490A1 (2012-06-14)
US20130261826A1 (2013-10-03)
US9292060B1 (2016-03-22)
US20110282982A1 (2011-11-17)
US20130179706A1 (2013-07-11)
Attorney, Agent or Firm:
MALLIE, Michael, J. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A system with granular power/performance management, the system comprising:

a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage;

an optimization agent to execute on each platform, the optimization agents to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings; and

a candidate list creator to aggregate a list of performance metrics associated with the plurality of permutations.

2. The system of claim 1, further comprising:

an administrative node to accept a user-defined criterion for system optimization, to display the selection list, and to accept a selection of a permutation from the list.

3. The system of claim 1, wherein the optimization agents are collectively to apply an optimal permutation of the settings to the plurality of platforms based on the metrics.

4. The system of claim 1, further comprising:

a load balancer to distribute tasks between the plurality of platforms.

5. The system of claim 4, wherein the load balancer comprises:

an optimal setting storage, the optimal setting storage to maintain an association between a task type and an optimal permutation of settings for the task type.

6. An apparatus with granular power management, comprising:

a platform having a processor to execute tasks;

a plurality of knobs to adjust settings of the platform;

an optimization agent to cause the processor to execute a workload under a plurality of permutations of the settings and tracks a performance metric for each permutation; and

wherein the optimization agent is to adjust the knobs to a permutation responsive to a comparison between the metric and a user-specified set of criteria.

7. The apparatus of claim 6, further comprising:

a user interface to accept the set of user-defined criteria for comparison with the metric.

8. The apparatus of claim 6, further comprising:

a setting storage to store an association between a type of task and an optimal setting permutation for the type of task.

9. The apparatus of claim 6, wherein the optimization agent is to select the optimal setting permutation from the setting storage for a task to be processed.

10. The apparatus of claim 7, further comprising:

a selection list compiler to aggregate a list of permutations and associated metrics, and wherein the user interface accepts a user selection of one of the permutations in the selection list for use in the apparatus.

11. A method to granularly control power/performance in a system, comprising:

applying a sample workload to an execution environment;

evaluating performance of the execution environment executing the workload at a plurality of permutations of platform settings; and

establishing one of the permutations of settings for use with future workloads based at least in part on the evaluation.

12. The method of claim 11, further comprising:

compiling a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.

13. The method of claim 12, further comprising:

displaying the selection list within a user interface; and

accepting a selection of one of the permutations from the list.

14. The method of claim 11, further comprising:

accepting a user-defined set of criteria that dictate an optimal platform setting permutation;

automatically setting the platform settings to the optimal permutation.

15. The method of claim 14, wherein the user-defined set of criteria include a ratio of power savings to performance impact.

16. The method of claim 11, wherein evaluating comprises:

comparing an effect of a permutation against a user-defined criterion; and

rejecting a permutation not satisfying the criterion for the workload.

17. The method of claim 12, wherein compiling comprises:

eliminating permutations not satisfying a user-defined set of criteria.

18. A non-transitory computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a set of operations to granularly control power/performance in a system comprising:

applying a sample workload to an execution environment; evaluating performance of the execution environment executing the workload at a plurality of permutations of platform settings; and

establishing one of the permutations of settings for use with future workloads based at least in part on the evaluation.

19. The non-transitory computer-readable medium of claim 17, wherein the instructions cause the processor to perform a set of operations further comprising:

compiling a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.

20. The non-transitory computer-readable medium of claim 17, wherein the instructions cause the processor to perform a set of operations further comprising:

accepting a user-defined set of criteria that dictate an optimal platform setting permutation;

automatically setting the platform settings to the optimal permutation.

21. The non-transitory computer-readable medium of claim 20, wherein the user-defined set of criteria include a ratio of power savings to performance impact.

22. The non-transitory computer-readable medium of claim 17, wherein evaluating causes the processor to perform a set of operations comprising:

comparing an effect of a permutation against a user-defined criterion; and

rejecting a permutation not satisfying the criterion for the workload.

23. A system with granular power performance control, the system comprising:

a plurality of platforms to execute tasks;

means for determining an optimal permutation of platform settings from a plurality of permutations of platform settings; and

means for applying the optimal permutation of platform settings to each platform in the plurality.

24. The system of claim 23, wherein the means for determining comprises:

means for causing execution of a workload on one platform of the plurality under one permutation of the platform settings; and

means for comparing a metric generated responsive to the execution with a user-defined value of a desired metric.

25. The system of claim 23, further comprising:

means for compiling a list of setting permutations associated with the metric generated responsive to executing a workload on the platforms.

Description:
METHOD AND APPARATUS TO OPTIMIZE POWER SETTINGS FOR A WORKLOAD

BACKGROUND

FIELD

Embodiments of the invention relate to power management. More specifically, embodiments of the invention relate to granular control of power performance tradeoffs in an execution environment.

BACKGROUND

Systems providing, for example, cloud services, often employ hundreds of thousands of servers to provide those services. Many servers are used for specific types of workloads or tasks. Depending on the tasks, power performance tradeoffs may exist. However, existing platforms generally support only three power modes: "performance," "balanced" and "power savings." These three modes are generally one-size-fits-all such that a single case or single type of workload may force the platform to, for example, always operate in the performance mode with the corresponding negative power tradeoff. Where scaled over hundreds of thousands of units, the unnecessary power usage becomes quite significant.

A similar problem exists in high performance computing (HPC). In HPC, a node or server runs a specific job (task) and may run for hours on that task. Because each task may have a different power performance optimal function, the one-size-fits-all existing methodology generally results in all nodes exhibiting inefficient power usage.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to "an" or "one" embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one.

Figure 1 is a block diagram of a system according to one embodiment of the invention.

Figure 2 is a flow diagram of operation according to one embodiment of the invention.

DETAILED DESCRIPTION

Figure 1 is a block diagram of a system according to one embodiment of the invention. Execution environment 106 includes a plurality of platforms 120-1, 120-2... 120-n (generically, platform 120). The number of platforms 120 may be arbitrarily large; for example, in the context of cloud services, the execution environment 106 may include hundreds of thousands of platforms 120. Execution environment 106 may be connected over a network 104 to an administrative node 102. Administrative node 102 displays a user interface 110 through which a user can specify optimization criteria 112 that may be used to optimize the power-performance characteristics of the execution environment 106.

Execution environment 106 may include a load balancer 122 to distribute incoming service requests amongst the platforms 120. Each platform 120 includes a processor 142 which executes tasks 144. Each platform 120 also includes a number of settings 146 which may be thought of as knobs 148-1, 148-2, 148-3... 148-n (generically, knob 148). As used herein, "knob" refers to a virtual knob that can be set to a plurality of values. Examples of knobs 148 that typically exist include:

- Knobs for selection, change, coordination, of frequencies, power states, idle states for:

- CPU cores, uncores, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), packages, sockets

- Input/output controllers, buses, devices (e.g., QuickPath Interconnect (QPI), UltraPath Interconnect (UPI), Front-side Bus (FSB), Double Data Rate (computer memory bus) (DDR), Peripheral Component Interconnect Express (PCIe), Inter-Integrated Circuit (I2C), Platform Environmental Control Interface (PECI), rings, etc.)

- Fabric/communication controllers, buses, devices (e.g., Serial Advanced Technology Attachment (SATA), Serial-Attached SCSI (SAS), Universal Serial Bus (USB), Video, Ethernet (ETH), computer interconnects, e.g., Stormlake and InfiniBand, etc.)

- Knobs for write-back/through, prefetching of data and instructions for:

- Data storage devices, non-volatile memory, memory, caches, pipelines, queues (e.g., DDR clock enable (CKE))

- Knobs for selection, change, coordination, of width of buses and links (e.g., L0p, L0s (L0x in general, which is the nomenclature used in configuring link widths, a feature designed to save power while activity is low for PCIe, QPI, UPI, FSB, rings, etc.))

- Knobs for selection of values for various Performance-Bias registers.

- Knobs for configurations of future generations and technologies of buses, links, interconnect, processors, chipsets, controllers, memory devices, etc.

As can be appreciated, this myriad of knobs provides a vast number of possible permutations of knob settings that collectively provide tremendous granularity to the power-performance tradeoff decision. By providing greater control of the granular knob settings, improved power-performance can be achieved.

The platform 120 also includes an optimization agent 152 that causes the processor 142 to execute a sample workload such as task 144 based on a particular permutation of knob settings 146, and a performance metric for that permutation may be associated with the task type. The optimization agent includes a candidate list creator that aggregates an indication of the permutation and the resulting metric as part of a candidate list 154. Thus, for example, in some embodiments a candidate list 154 may include an indication of the workload executed (A1), an indication of the criteria (C1) and an indication of the metric (Mx) corresponding to the permutation (Px).

By repeatedly executing the workload 144 with different permutations of knobs 148, an optimal permutation can be identified based on the resulting metric. Given the arbitrarily large number of knobs that presently exist or may exist in the future, it may be impractical to use a brute-force approach to go through every possible permutation. However, the general effect of adjusting a knob is known; that is, some knobs will improve energy usage characteristics but will reduce performance (e.g., reducing processor clock speed reduces performance and improves power usage). If the goal is to maximize performance, the permutations in which such a knob is changed may therefore be reduced or eliminated. Some embodiments may choose a set of knobs known to have the greatest impact on, e.g., power consumption and cycle through all permutations of those knobs. As used herein, "set" is deemed to have one or more members and does include the empty set. In the context of selecting a set of knobs, a suitable set is likely tens of knobs.
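The pruning idea described above can be sketched as follows. This is a minimal illustration, not the implementation of any embodiment: the knob names, their value ranges, and the helper functions `enumerate_permutations` and `prune_for_max_performance` are all hypothetical and do not appear in the specification.

```python
from itertools import product

# Hypothetical knob names and allowed values (assumptions for illustration only).
KNOBS = {
    "core_freq_mhz": [1200, 2000, 2800],
    "uncore_freq_mhz": [1200, 2400],
    "pcie_link_width": [4, 8, 16],
}


def enumerate_permutations(knobs):
    """Yield every combination of knob values as a dict (brute-force cycle)."""
    names = sorted(knobs)
    for values in product(*(knobs[name] for name in names)):
        yield dict(zip(names, values))


def prune_for_max_performance(knobs, pinned):
    """Pin knobs whose effect is already known to a single value.

    `pinned` maps a knob name to the one value to keep, so permutations
    that vary that knob are eliminated from the search space.
    """
    return {
        name: ([pinned[name]] if name in pinned else values)
        for name, values in knobs.items()
    }


all_perms = list(enumerate_permutations(KNOBS))
pruned_perms = list(
    enumerate_permutations(prune_for_max_performance(KNOBS, {"core_freq_mhz": 2800}))
)
print(len(all_perms), len(pruned_perms))
```

With three small knobs, pinning a single knob already cuts the search space to a third; with tens of knobs the reduction compounds in the same way.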

Additionally, while some embodiments may execute the sample workload repeatedly on a single platform with a different set of permutations on each execution, other embodiments execute the workload over a plurality of platforms, with each platform having a different permutation of the knobs 148. This permits parallel identification of the optimal permutation. Generally, as used herein, the "optimal permutation" is a permutation most closely matching the user-supplied criteria.
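As a rough sketch of the parallel variant, the snippet below assigns one permutation to each platform and collects the metrics concurrently. The function `evaluate_in_parallel`, the `run_workload` callback and the platform names are assumptions made for illustration; a thread pool merely stands in for whatever mechanism would dispatch the workload to real platforms.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_in_parallel(platforms, permutations, run_workload):
    """Run the sample workload once per (platform, permutation) pair.

    `run_workload(platform, permutation)` is a hypothetical callback that
    applies the permutation on that platform, executes the workload and
    returns a metric. Results are returned in the same order as the inputs.
    """
    pairs = list(zip(platforms, permutations))
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        metrics = list(pool.map(lambda pair: run_workload(*pair), pairs))
    return [(perm, metric) for (_, perm), metric in zip(pairs, metrics)]


# Stand-in workload: pretend efficiency falls as core frequency rises.
def fake_run(platform, permutation):
    return 1000.0 / permutation["core_freq_mhz"]


results = evaluate_in_parallel(
    ["platform-1", "platform-2", "platform-3"],
    [{"core_freq_mhz": f} for f in (1000, 2000, 4000)],
    fake_run,
)
```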

As noted above, a user may supply criteria to be used in permutation selection via the administrative node 102 over network 104. In some embodiments, the criteria may be a ratio of power savings to performance lost. Some embodiments allow complex specification of the criteria, and may include establishment of a performance floor and/or a ceiling for power usage. Other criteria may primarily specify, for example, that a tradeoff is acceptable when it achieves, e.g., a 2% power savings for each 1% performance decrease. Other ratios and metrics are within the scope of embodiments of the invention.
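One way such criteria might be encoded is shown below. The `Criteria` class, its field names, and the choice to measure savings and loss as percentages relative to a baseline run are assumptions for illustration; the specification does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criteria:
    # Required percent power saved per percent performance lost (e.g. 2.0).
    min_savings_per_perf_loss: float = 2.0
    # Optional performance floor, in the same units as the measured performance.
    perf_floor: Optional[float] = None
    # Optional ceiling for power usage, in watts.
    power_ceiling_w: Optional[float] = None


def satisfies(criteria, base_perf, base_power_w, perf, power_w):
    """Return True if a measured (perf, power) pair meets the criteria.

    Savings and loss are computed relative to a baseline measurement,
    which is an assumption of this sketch.
    """
    if criteria.perf_floor is not None and perf < criteria.perf_floor:
        return False
    if criteria.power_ceiling_w is not None and power_w > criteria.power_ceiling_w:
        return False
    perf_loss_pct = 100.0 * (base_perf - perf) / base_perf
    savings_pct = 100.0 * (base_power_w - power_w) / base_power_w
    if perf_loss_pct <= 0:
        # No performance was lost: accept as long as power did not increase.
        return savings_pct >= 0
    return savings_pct >= criteria.min_savings_per_perf_loss * perf_loss_pct


two_for_one = Criteria(min_savings_per_perf_loss=2.0)
# 5% slower but 15% less power: meets the 2%-per-1% rule.
print(satisfies(two_for_one, 100.0, 200.0, 95.0, 170.0))
```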

The optimization agent 152 provides the user interface 110 to the administrative node 102 to accept the optimization criteria 112. It also populates the benefit selection list 114 within the user interface 110 with entries from the candidate list 154 that satisfy the user-supplied criteria. Where multiple platforms are used to cycle knob permutations, candidates from the optimization agents 152 of all the platforms 120 can be assembled into the selection list 114 to be displayed to the user. In some embodiments a selection list compiler 134 may exist in a central location such as load balancer 122 to assemble the selection list from the candidate lists 154 of each platform 120. In other embodiments, the selection list compiler may be part of the user interface 110. In still other embodiments, the selection list compiler may be part of the optimization agent.
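A selection list compiler of the kind described could look roughly like the sketch below. The `Candidate` record, the single scalar metric where higher is better, and the de-duplication rule are assumptions; the specification only states that candidate lists from each platform are assembled into one selection list.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    workload: str       # indication of the workload executed
    permutation: tuple  # hashable form: sorted (knob, value) pairs
    metric: float       # assumed: performance per watt, higher is better


def compile_selection_list(candidate_lists, min_metric):
    """Merge per-platform candidate lists into one ranked selection list.

    Entries below `min_metric` are eliminated. If several platforms report
    the same workload and permutation, only the best metric is kept.
    """
    best = {}
    for candidates in candidate_lists:
        for c in candidates:
            if c.metric < min_metric:
                continue
            key = (c.workload, c.permutation)
            if key not in best or c.metric > best[key].metric:
                best[key] = c
    return sorted(best.values(), key=lambda c: c.metric, reverse=True)


platform_a = [
    Candidate("web", (("core_freq_mhz", 2000),), 1.4),
    Candidate("web", (("core_freq_mhz", 1200),), 0.8),
]
platform_b = [
    Candidate("web", (("core_freq_mhz", 2000),), 1.5),
    Candidate("web", (("core_freq_mhz", 2800),), 1.1),
]
selection_list = compile_selection_list([platform_a, platform_b], min_metric=1.0)
```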

The selection list 114 both displays permutations within the threshold of the optimization criteria and, in some embodiments, permits user selection of one of the permutations from the benefit selection list. The user interface 110 conveys the selection to the optimization agent 152, and knobs 148 are set consistent with that permutation for each platform 120. Some embodiments of the invention allow the optimization agent 152 to select a permutation based on the user-defined criteria 112 automatically, without providing the benefit selection list to the user.

In some embodiments, an association between a task type and an optimal permutation may be stored within the execution environment 106, such as in optimal setting storage 132 in load balancer 122, which maintains an association between task type and the optimal permutation. In this manner, tasks having a particular type can trigger the automatic setting of the desired permutation when the load balancer 122 sends a task to a particular platform 120.
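The stored association can be pictured as a simple lookup keyed by task type, consulted when a task is dispatched. The class names, the round-robin dispatch policy and the `apply` method below are hypothetical; they only illustrate the trigger described above.

```python
class OptimalSettingStorage:
    """Maps a task type to the knob permutation found optimal for it."""

    def __init__(self):
        self._by_task_type = {}

    def store(self, task_type, permutation):
        self._by_task_type[task_type] = dict(permutation)

    def lookup(self, task_type):
        permutation = self._by_task_type.get(task_type)
        return dict(permutation) if permutation is not None else None


class Platform:
    """Stand-in for a platform whose knobs can be set."""

    def __init__(self, name):
        self.name = name
        self.knobs = {}

    def apply(self, permutation):
        self.knobs.update(permutation)


class LoadBalancer:
    """Round-robin dispatcher (an assumed policy) that sets knobs per task type."""

    def __init__(self, platforms, storage):
        self._platforms = platforms
        self._storage = storage
        self._next = 0

    def dispatch(self, task_type):
        platform = self._platforms[self._next % len(self._platforms)]
        self._next += 1
        permutation = self._storage.lookup(task_type)
        if permutation is not None:
            # A known task type triggers automatic setting of its permutation.
            platform.apply(permutation)
        return platform


storage = OptimalSettingStorage()
storage.store("video-transcode", {"core_freq_mhz": 2800})
balancer = LoadBalancer([Platform("p1"), Platform("p2")], storage)
first = balancer.dispatch("video-transcode")
second = balancer.dispatch("unknown-task")
```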

While administrative node 102 is shown remote from the platform 120, in some embodiments administration may be local to the platform. For example, embodiments of the invention may be employed in a mobile environment such as a laptop computer, where the optimization agent optimizes the laptop based on a locally provided metric, and the selection of the permutation of knob settings may be locally administered through the mobile platform.

Figure 2 is a flow diagram of operation according to one embodiment of the invention. At block 202, an execution environment accepts user-defined criteria for optimal operation. Typical execution environments include cloud services server facilities, HPC facilities and mobile computing facilities. The criteria may be provided by an administrative node over a network or locally within the execution environment. At block 204, an optimization agent sets a permutation of knobs on a platform within the execution environment. At block 206, a sample workload is executed on the platform. The sample workload may be taken from a real data set or a fictitious set, which may be artificially created to be representative of the task to be performed by the platform. A metric is generated based on the execution of the workload. For example, a ratio of power to performance is one suitable metric.

If the metric generated from execution does not satisfy user-defined criteria at block 208, the effectiveness of the existing permutation may be compared against one or more prior permutations at block 209. The current permutation is rejected at block 210. Then at block 211, the result of the comparison may be used to predict a permutation with greater effectiveness. For example, if turning a knob in one direction has shown a negative effectiveness relative to a prior permutation, it may be inferred that the knob should be turned in the opposite direction to improve effectiveness. Use of a heuristic approach can more rapidly find an optimal permutation. If instead the metric satisfies the criteria, that metric and its corresponding permutation (or a representation thereof) are added to a candidate list at block 212. A determination is then made whether there are more permutations in the set of possible permutations desired to be tested at block 214. If there are more permutations at block 214, the process repeats.
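The loop of blocks 204 through 214, together with the directional heuristic, might be sketched for a single ordered knob as follows. This is one possible reading under stated assumptions: the function `heuristic_search`, the `measure` and `meets_criteria` callbacks, the reverse-on-regression rule and the example efficiency curve are all hypothetical.

```python
def heuristic_search(values, measure, meets_criteria, start=0):
    """Walk one ordered knob, reversing direction when effectiveness drops.

    `values` is the ordered list of settings for a single knob.
    `measure(value)` runs the sample workload and returns a metric where
    higher is better. `meets_criteria(metric)` is the user-defined test.
    Returns the candidate list as (value, metric) pairs.
    """
    candidates = []
    visited = set()
    index, step = start, 1
    previous = None
    while 0 <= index < len(values) and index not in visited:
        metric = measure(values[index])            # blocks 204 and 206
        visited.add(index)
        if meets_criteria(metric):                 # block 208
            candidates.append((values[index], metric))  # block 212
        elif previous is not None and metric < previous:
            # Blocks 209 to 211: this direction made things worse, so
            # return to the starting point and turn the knob the other way.
            step = -step
            index = start
        previous = metric
        index += step                              # block 214: next permutation
    return candidates


# Stand-in efficiency curve that peaks at 1.5 GHz.
def efficiency(freq_ghz):
    return 10.0 - (freq_ghz - 1.5) ** 2


found = heuristic_search(
    [1.0, 1.5, 2.0, 2.5, 3.0], efficiency, lambda m: m >= 9.9, start=2
)
```

In this run the search starts at 2.0 GHz, sees efficiency fall at 2.5 GHz, reverses, and finds 1.5 GHz without ever measuring 3.0 GHz.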

If there are no more permutations, at block 216 a selection list may be optionally displayed to allow a user to select their desired permutation. The selection list may be derived from the candidate list and may include information on the metric achieved by the corresponding permutation. A best permutation is selected from the selection list at block 220. As noted, the best permutation may be selected by a user through, for example, the administrative node where a selection list is provided to the user. Some embodiments may not display the selection list, and the best permutation may be selected automatically as the highest performance or lowest power option satisfying the user-defined criteria, depending on whether power or performance is desired for the particular task.
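Automatic selection without a displayed list could be as simple as the sketch below. The entry layout (dicts with `performance` and `power_w` keys) and the `prefer` parameter are assumptions for illustration; every entry is presumed to have already satisfied the user-defined criteria.

```python
def select_best(selection_list, prefer="performance"):
    """Pick the highest-performance or lowest-power entry from the list.

    Each entry is assumed to be a dict with "permutation", "performance"
    and "power_w" keys. Returns None when the list is empty.
    """
    if not selection_list:
        return None
    if prefer == "performance":
        return max(selection_list, key=lambda entry: entry["performance"])
    if prefer == "power":
        return min(selection_list, key=lambda entry: entry["power_w"])
    raise ValueError("prefer must be 'performance' or 'power'")


entries = [
    {"permutation": {"core_freq_mhz": 2800}, "performance": 100.0, "power_w": 200.0},
    {"permutation": {"core_freq_mhz": 2000}, "performance": 90.0, "power_w": 150.0},
]
fastest = select_best(entries, prefer="performance")
thriftiest = select_best(entries, prefer="power")
```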

The selected permutation is then applied to all knobs for all platforms in the execution environment that will execute the task at block 218. At block 220, the permutation may be stored in association with the task type so that future executions of tasks of the same type may have the optimal permutation automatically applied.

The following examples pertain to further embodiments. The various features of the different embodiments may be variously combined with some features included and others excluded to suit a variety of different applications. Some embodiments pertain to a system with granular power/performance management. The system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage. The platforms include an optimization agent to execute on each platform, the optimization agents to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings. A candidate list creator is used to aggregate a list of performance metrics associated with the plurality of permutations. In further embodiments, the system has an administrative node to accept a user-defined criterion for system optimization, to display the selection list, and to accept a selection of a permutation from the list.

In further embodiments, the optimization agents are collectively to apply an optimal permutation of the settings to the plurality of platforms based on the metrics.

In further embodiments, the system has a load balancer to distribute tasks between the plurality of platforms.

In further embodiments, the load balancer has an optimal setting storage, the optimal setting storage to maintain an association between a task type and an optimal permutation of settings for the task type.

Some embodiments pertain to an apparatus with granular power management. A platform has a processor to execute tasks. The platform also has a plurality of knobs to adjust settings of the platform. An optimization agent on the platform causes the processor to execute a workload under a plurality of permutations of the settings and tracks a performance metric for each permutation. The optimization agent is to adjust the knobs to a permutation responsive to a comparison between the metric and a user-specified set of criteria.

In further embodiments, the apparatus provides a user interface to accept the set of user-defined criteria for comparison with the metric.

In further embodiments, the apparatus has a setting storage to store an association between a type of task and an optimal setting permutation for the type of task.

In further embodiments, the optimization agent is to select the optimal setting permutation from the setting storage for a task to be processed.

In further embodiments, the apparatus has a selection list compiler to aggregate a list of permutations and associated metrics. The user interface accepts a user selection of one of the permutations in the selection list for use in the apparatus.

Some embodiments pertain to a method to granularly control power/performance in a system. The control is accomplished by applying a sample workload to an execution environment. Based on the application, performance of the execution environment executing the workload at a plurality of permutations of platform settings is evaluated. One of the permutations of settings is established for use with future workloads based at least in part on the evaluation.

In further embodiments, a benefit selection list is compiled for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations. In further embodiments, the selection list is displayed within a user interface and a selection of one of the permutations from the list is accepted.

In further embodiments, a user-defined set of criteria that dictate an optimal platform setting permutation is accepted. The platform settings are automatically set to the optimal permutation.

In further embodiments, the user-defined set of criteria include a ratio of power savings to performance impact.

In further embodiments, evaluating is accomplished by comparing an effect of a permutation against a user-defined criterion and rejecting a permutation not satisfying the criterion for the workload.

In further embodiments, compiling includes eliminating permutations not satisfying a user-defined set of criteria.

Some embodiments pertain to a non-transitory computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a set of operations to granularly control power/performance in a system. The instructions cause the processor to apply a sample workload to an execution environment. The instructions also cause the processor to evaluate performance of the execution environment executing the workload at a plurality of permutations of platform settings. The instructions also cause the processor to establish one of the permutations of settings for use with future workloads based at least in part on the evaluation.

In further embodiments, the instructions cause the processor to compile a benefit selection list for the permutations, the benefit selection list providing an indication of performance versus power tradeoff of the permutations.

In further embodiments, the instructions cause the processor to display the selection list within a user interface and accept a selection of one of the permutations from the list.

In further embodiments, the instructions cause the processor to accept a user-defined set of criteria that dictate an optimal platform setting permutation and automatically set the platform settings to the optimal permutation.

In further embodiments, the user-defined set of criteria include a ratio of power savings to performance impact.

In further embodiments, the instructions cause the processor to compare an effect of a permutation against a user-defined criterion and reject a permutation not satisfying the criterion for the workload. In further embodiments, the instructions cause the processor to eliminate permutations not satisfying a user-defined set of criteria.

Some embodiments pertain to a system with granular power performance control. The system has a plurality of platforms to execute tasks. The system also has means for determining an optimal permutation of platform settings from a plurality of permutations of platform settings and means for applying the optimal permutation of platform settings to each platform in the plurality.

In further embodiments, the means for determining has means for causing execution of a workload on one platform of the plurality under one permutation of the platform settings and means for comparing a metric generated responsive to the execution with a user-defined value of a desired metric.

In further embodiments, the system has means for compiling a list of setting permutations associated with the metric generated responsive to executing a workload on the platforms.

While embodiments of the invention are discussed above in the context of flow diagrams reflecting a particular linear order, this is for convenience only. In some cases, various operations may be performed in a different order than shown or various operations may occur in parallel. It should also be recognized that some operations described with respect to one embodiment may be advantageously incorporated into another embodiment. Such incorporation is expressly contemplated.

In the foregoing specification, the invention has been described with reference to the specific embodiments thereof. It will, however, be evident that various modifications and changes can be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.