Title:
MANAGING A MULTIPROCESSING COMPUTER SYSTEM
Document Type and Number:
WIPO Patent Application WO/2011/134716
Kind Code:
A1
Abstract:
A system and corresponding method for managing a multiprocessing computer system comprises a hardware control component for selecting a hardware management action responsive to a hardware policy; a virtualization component for presenting virtual hardware resources to a software task execution environment; a software workload management component for controlling at least one running software task and routing at least one new software task using the virtual hardware resources; and a communication component for signalling from the hardware control component across the virtualization component to the software workload management component to perform a software management action in compliance with the hardware management action. The hardware policy may be a hardware power management policy, and the software management action may comprise quiescing the at least one running software task or routing the new software tasks to a different software task execution environment.

Inventors:
JOHNSON PAUL (GB)
Application Number:
PCT/EP2011/053984
Publication Date:
November 03, 2011
Filing Date:
March 16, 2011
Assignee:
IBM (US)
JOHNSON PAUL (GB)
International Classes:
G06F9/50
Foreign References:
US20100037038A1  2010-02-11
EP2071458A1  2009-06-17
US20040111596A1  2004-06-10
US20090327781A1  2009-12-31
US20080104587A1  2008-05-01
Other References:
None
Attorney, Agent or Firm:
STRETTON, Peter, John (Intellectual Property Law, Hursley Park, Winchester, Hampshire SO21 2JN, GB)
Claims:
CLAIMS

1. A system for managing a multiprocessing computer system comprising:

a hardware control component for selecting a hardware management action responsive to a hardware policy;

a virtualization component for presenting virtual hardware resources to a software task execution environment;

a software workload management component for controlling at least one running software task and routing at least one new software task using said virtual hardware resources; and

a communication component for signalling from said hardware control component across said virtualization component to said software workload management component to perform a software management action in compliance with said hardware management action.

2. The system as claimed in claim 1, wherein said hardware policy is a hardware power management policy.

3. The system as claimed in claim 1 or claim 2, wherein said software management action comprises quiescing said at least one running software task.

4. The system as claimed in any preceding claim, wherein said software management action comprises routing said new software tasks to a different software task execution environment.

5. The system as claimed in any preceding claim, wherein said hardware management action comprises shutting down a hardware resource.

6. The system as claimed in any of claims 1 to 4, wherein said hardware management action comprises starting a hardware resource.

7. A method of operating a system for managing a multiprocessing computer system comprising:

selecting, by a hardware control component, a hardware management action responsive to a hardware policy;

presenting, by a virtualization component, virtual hardware resources to a software task execution environment;

controlling, by a software workload management component, at least one running software task and routing at least one new software task using said virtual hardware resources; and

signalling, by a communication component, from said hardware control component across said virtualization component to said software workload management component to perform a software management action in compliance with said hardware management action.

8. The method as claimed in claim 7, wherein said hardware policy is a hardware power management policy.

9. The method as claimed in claim 7 or claim 8, wherein said software management action comprises quiescing said at least one running software task.

10. The method as claimed in any of claims 7 to 9, wherein said software management action comprises routing said new software tasks to a different software task execution environment.

11. The method as claimed in any of claims 7 to 10, wherein said hardware management action comprises shutting down a hardware resource.

12. The method as claimed in any of claims 7 to 10, wherein said hardware management action comprises starting a hardware resource.

13. A computer program comprising computer program code to, when loaded into a computer system and executed thereon, cause said computer system to perform all the steps of the method of any of claims 7 to 12.

Description:
MANAGING A MULTIPROCESSING COMPUTER SYSTEM

Field of the Invention

The present invention relates to the management of multiprocessing computer systems, and more particularly to a system for combining dynamic workload management over a virtualization layer with control of power-consuming hardware resources.

Background of the Invention

Executing applications in a highly heterogeneous, dynamically balanced computer system or network of systems provides a highly available, responsive platform for user applications. Such systems are typically optimised for maximum possible throughput and minimum application response time.

Users of such complex systems today have an increasing need to manage additional constraints in the way of energy efficiency and cost control. Today most approaches to managing these constraints consist of manual schemes such as energy consumption monitoring by operations staff and manual server quiesce and shutdown, or simple automated execution of the same manual process. This does not cater well for the variable demand that may be exerted on such systems, nor for the responsiveness that modern systems require.

Certain market segments among users of such systems in specific geographies (e.g. the banking sector in China) have regulatory maxima placed on CPU consumption by the logical partitions within their servers. There is currently no way of meeting these requirements except by cost-intensive human monitoring and manual manipulation of system resource use.

It is desirable to address these shortcomings of known multiprocessing computer systems wherein the conflicting needs for dynamic workload management over a virtualization layer and for power management of hardware resources cannot be reconciled other than partially, and by means of expensive and potentially error-prone application and system redesign.

Summary of the Invention

The present invention accordingly provides, in a first aspect, a system for managing a multiprocessing computer system comprising: a hardware control component for selecting a hardware management action responsive to a hardware policy; a virtualization component for presenting virtual hardware resources to a software task execution environment; a software workload management component for controlling at least one running software task and routing at least one new software task using said virtual hardware resources; and a communication component for signalling from said hardware control component across said virtualization component to said software workload management component to perform a software management action in compliance with said hardware management action.

Preferably, said hardware policy is a hardware power management policy. Preferably, said software management action comprises quiescing said at least one running software task. Preferably, said software management action comprises routing said new software tasks to a different software task execution environment. Preferably, said hardware management action comprises shutting down a hardware resource. Preferably, said hardware management action comprises starting a hardware resource.

There is provided, in a second aspect, a method of operating a system for managing a multiprocessing computer system comprising: selecting, by a hardware control component, a hardware management action responsive to a hardware policy; presenting, by a virtualization component, virtual hardware resources to a software task execution environment; controlling, by a software workload management component, at least one running software task and routing at least one new software task using said virtual hardware resources; and signalling, by a communication component, from said hardware control component across said virtualization component to said software workload management component to perform a software management action in compliance with said hardware management action. Preferably, said hardware policy is a hardware power management policy. Preferably, said software management action comprises quiescing said at least one running software task. Preferably, said software management action comprises routing said new software tasks to a different software task execution environment. Preferably, said hardware management action comprises shutting down a hardware resource. Preferably, said hardware management action comprises starting a hardware resource.

In a third aspect, there is provided a computer program comprising computer program code to, when loaded into a computer system and executed thereon, cause said computer system to perform all the steps of the method of the second aspect.

Brief Description of the Drawings

A preferred embodiment of the present invention will now be described by way of example only, with reference to the accompanying drawings, in which:

Figure 1 shows the layers of a complex dynamically-routing, virtualized system in which the present invention may be embodied;

Figure 2 shows a simplified representation of multiprocessing system components in which the present invention may be embodied;

Figure 3 shows a method of operation of a system according to a preferred embodiment of the present invention; and

Figure 4 shows a further method of the operation of a system according to a preferred embodiment of the present invention.

Detailed Description of the Preferred Embodiment

In brief, the preferred embodiment of the present invention provides a system and method for communicating between a hardware resource layer of a complex computing system and a routing layer so that, in spite of an intervening virtualization layer, hardware resource policies may be enacted in the system without significantly impacting the operation of dynamic routing of application task workloads.

Figure 1 shows the layers of a complex dynamically-routing, virtualized system in which the present invention may be embodied. At the top of the system stack is located user layer 100, which typically consists of client systems on which "real-life" applications, such as business processes, are initiated and where user I/O interactions occur. Beneath user layer 100 lies task layer 102, in which application tasks are initiated to satisfy the requirements of user layer 100 applications. In reality, in such complex systems as are under consideration here, these underlying tasks may be dynamically routed among logical partitions, address spaces or the like, and this activity is achieved by routing layer 104, which may comprise a workload manager. The nature of such dynamically-routed task systems is that, at least conceptually, there is a virtualization layer 106 between the upper, software-implemented layers in the stack and the underlying hardware resources. The underlying hardware resources lie at the base of the stack in the hardware layer 108, and it is at this level in the stack that real, physical resources, such as, for example, processors and storage devices, are located, and where considerations such as power consumption, physical space requirements and the need for cooling must be taken into account.

An exemplary multiprocessing system of intercommunicating components in which the preferred embodiment of the present invention may be implemented is shown in much-simplified schematic form in Figure 2 and comprises multiprocessing system 200, in which a hardware controller 202 is in direct communication with a virtualization component 204. In order to provide communication between hardware controller 202 and software workload management component 206 for the purpose of correctly implementing hardware policies, communication component 208 is provided, which is capable of presenting appropriate data and signals both to hardware controller 202 via virtualization component 204 and to software workload management component 206, thus overcoming the "information gap" that is caused by having the virtualization layer 106 between the hardware layer 108 and the routing layer 104 of Figure 1. The preferred embodiment of the present invention introduces energy consumption metrics in the hardware layer, which understands hardware resources, such as CPU structures and storage, and their relationships to logical partitions and address spaces. A notification capability is provided between this hardware layer and the routing layer which controls allocation of work requests in the application task layer address spaces. A hardware policy can be applied to the energy consumption metrics at the hardware layer to identify hardware resources that can be relinquished to reduce energy consumption at any particular time.
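By way of illustration only, the application of a hardware power policy to per-CPU energy consumption metrics might be sketched as follows. The names Cpu, select_for_offlining and address_spaces_to_drain, the use of a single power cap in watts, and the choice of relinquishing the least-utilized CPUs first are all assumptions of this sketch and are not prescribed by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Hypothetical per-CPU energy metric kept by the hardware layer."""
    cpu_id: str
    watts: float           # current power draw
    utilization: float     # fraction of capacity in use, 0.0 to 1.0
    address_spaces: tuple  # address spaces mapped onto this CPU


def select_for_offlining(cpus, power_cap_watts):
    """Pick CPUs to relinquish until total draw is within the cap.

    Least-utilized CPUs are chosen first; this ordering is an assumption
    of the sketch, not something the description prescribes.
    """
    total = sum(c.watts for c in cpus)
    selected = []
    for cpu in sorted(cpus, key=lambda c: c.utilization):
        if total <= power_cap_watts:
            break
        selected.append(cpu)
        total -= cpu.watts
    return selected


def address_spaces_to_drain(selected):
    """Address spaces about which the routing layer should be notified."""
    return sorted({a for cpu in selected for a in cpu.address_spaces})
```

The second function reflects the point that only the hardware layer knows which address spaces are associated with a given CPU, so it is the hardware layer that must name them in any notification.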

The notification capability causes the routing layer to autonomically redistribute work away from, and subsequently back to, those address spaces that have been identified as associated with the underlying hardware resources. Thus, resource management policies and their implementation in the hardware layer can identify potential CPUs that are capable of being successfully "drained" (that is, have running tasks complete or quiesce and allow no new tasks to start). Corresponding events or signals can be issued for the logical partitions or address spaces associated with those CPUs to cause the running tasks to quiesce and to allow no new tasks to be started. It will be clear to one of ordinary skill in the art that simply removing any allocation from the logical partitions or address spaces for already-running tasks is not desirable as this would block work already in progress.

The routing layer, on receiving a notification to "drain", would then route any new requests away from these address spaces (although it will be clear to one of ordinary skill that this action is subject to any affinity relations between tasks and address spaces based on, for example, local data ownership by an address space). The address space passes through a quiescing-to-quiesced transition. A return notification, or observation of the quiesced state by a monitoring component in the hardware layer, could then activate the relinquishing of the hardware resources, for example, powering down the selected CPU. It will be clear to one of ordinary skill in the art that simple routing logic at the routing layer could not achieve this objective, as the mapping of address spaces and hardware (for example CPU allocation) is not known at that level.
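The "drain" behaviour described above might, purely as a sketch, be modelled at the routing layer as follows. The class Router, its method names, the least-loaded selection rule, and the decision to refuse affinity-bound tasks once an address space is fully quiesced are hypothetical choices made for illustration; the description does not specify them.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    QUIESCING = "quiescing"
    QUIESCED = "quiesced"


class Router:
    """Hypothetical routing layer tracking address-space state."""

    def __init__(self, address_spaces):
        self.state = {name: State.ACTIVE for name in address_spaces}
        self.running = {name: 0 for name in address_spaces}

    def drain(self, name):
        """Handle a "drain" notification from the hardware layer."""
        self.state[name] = State.QUIESCING
        self._settle(name)

    def route(self, affinity=None):
        """Choose an address space for a new task.

        A task with an affinity (for example, local data ownership) is
        still sent to its owning address space while that space is
        quiescing. Refusing it once the space is quiesced is an
        assumption of this sketch.
        """
        if affinity is not None:
            if self.state[affinity] is State.QUIESCED:
                raise RuntimeError("address space already quiesced")
            target = affinity
        else:
            candidates = [n for n, s in self.state.items()
                          if s is State.ACTIVE]
            if not candidates:
                raise RuntimeError("no active address space available")
            target = min(candidates, key=lambda n: self.running[n])
        self.running[target] += 1
        return target

    def complete(self, name):
        """Record that one running task in `name` has finished."""
        self.running[name] -= 1
        self._settle(name)

    def _settle(self, name):
        # Quiescing-to-quiesced transition once no tasks remain.
        if self.state[name] is State.QUIESCING and self.running[name] == 0:
            self.state[name] = State.QUIESCED
```

In this sketch the quiesced state is what a monitoring component in the hardware layer would observe before relinquishing the associated hardware resource.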

Turning to Figure 3, there is shown a method of operation of a multiprocessing system of the type illustrated in Figure 2 according to a preferred embodiment of the present invention. In Figure 3, the method of operation begins at START step 300, and at step 302, a hardware policy is stored. Test step 304 determines whether the system is compliant with the hardware policy. If the determination at test step 304 is positive, no further action needs to be taken, and the process completes at END step 320. In normal operation, the testing of test step 304 is iterative, and may be implemented by any of a number of known hardware monitoring systems. If the determination at test step 304 is negative, the hardware is not compliant with the policy and at step 306 one or more hardware resources are selected for offlining to bring the hardware into compliance with the hardware policy that was stored at step 302. At step 308 a signal or notification is sent to the routing layer 104, which begins, at step 310, to route newly-initiated tasks away from the logical partition, address space or the like that is consuming the hardware resources that were selected for offlining at step 306. At test step 312, it is determined whether it is now possible to relinquish the hardware resource that was selected at step 306, so that it may be offlined to bring the system into compliance with the hardware policy. If the determination at test step 312 is positive, a signal or notification is sent to the hardware layer 108 at step 318, and the process completes at END step 320. If the determination at test step 312 is negative, one or more running tasks need to be quiesced at step 314, and at test step 316, it is determined whether all such running tasks have been quiesced. If the determination at test step 316 is negative, steps 314 and 316 are repeated until the outcome of test step 316 is positive, when a signal or notification is sent to hardware layer 108 at step 318, and the process completes at END step 320.
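The flow of Figure 3 may be summarised, as a sketch only, by a single function whose behaviour at each step is supplied by the caller. The function name enforce_policy and all of its parameters are hypothetical; the returned list of step numbers exists only to make the path taken visible and forms no part of the method described.

```python
def enforce_policy(is_compliant, select_resources, notify_routing,
                   can_relinquish, quiesce_tasks, all_quiesced,
                   notify_hardware):
    """One pass through the Figure 3 flow; returns the steps visited."""
    trace = [304]
    if is_compliant():                   # test step 304
        return trace + [320]
    resources = select_resources()       # step 306
    trace.append(306)
    notify_routing(resources)            # steps 308 and 310
    trace += [308, 310]
    trace.append(312)
    if not can_relinquish(resources):    # test step 312
        while True:
            quiesce_tasks(resources)     # step 314
            trace += [314, 316]
            if all_quiesced(resources):  # test step 316
                break
    notify_hardware(resources)           # step 318
    trace += [318, 320]
    return trace
```

Because test step 304 is iterative in normal operation, a monitoring component would be expected to call such a function repeatedly rather than once.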

As will be clear to one of ordinary skill in the art, when the policy allows, or the workload requires, that additional hardware resources be assigned, the same notification capability can be used between the system layers to allow a hardware resource, such as a CPU, to be once more made available for workload and for address spaces associated with the resource to have new tasks routed to them.

This is illustrated in Figure 4, in which, after START step 400, test step 402 determines whether additional hardware resource is required, and either falls within the policy or is of a high enough priority to override a "soft cap" policy ("soft caps" are well known to those of ordinary skill in the art to be maximum limits that can be overridden in exceptional circumstances). If the determination at test step 402 is negative, processing continues and the test step is repeated as necessary, a process that may be implemented by any of a number of known hardware monitoring systems. If the determination at test step 402 is positive, a signal or notification is sent to the hardware layer 108 at step 404. At test step 406, it is determined whether additional quiesced or offlined hardware resource is available to meet the requirement of test step 402. If the determination at test step 406 is negative, the process completes at END step 414. If the determination at test step 406 is positive, the hardware resource is restarted at step 408. Test step 410 tests to determine whether the hardware resource is ready (for example, whether a CPU bootstrap process has completed and an operating system, virtual machine, logical partition or the like has been fully initialized). Test step 410 iterates until the outcome is positive, at which time a signal or notification is sent to the routing layer 104 at step 412, indicating that tasks may now be routed in such a way as to take advantage of the hardware resource, and the process completes at END step 414.
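The flow of Figure 4, including the "soft cap" override, might be sketched as follows. The Request type, the expression of the policy as a wattage cap, the numeric override_priority threshold and the polling interval are assumptions made for this sketch only. A negative outcome at test step 402 is represented here by returning None, leaving the caller to repeat the test as necessary.

```python
import time
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request for additional hardware resource."""
    watts_needed: float
    priority: int


def may_add_resource(request, current_watts, soft_cap_watts,
                     override_priority):
    """Test step 402: within policy, or able to override the soft cap."""
    within_policy = current_watts + request.watts_needed <= soft_cap_watts
    return within_policy or request.priority >= override_priority


def bring_online(request, current_watts, soft_cap_watts, override_priority,
                 offline_pool, restart, is_ready, notify_routing,
                 poll_seconds=0.0):
    """One pass through the Figure 4 flow.

    Returns the resource that was started, or None if nothing was started.
    """
    if not may_add_resource(request, current_watts, soft_cap_watts,
                            override_priority):
        return None                    # test step 402 negative
    # Step 404: the hardware layer is signalled; the remaining steps
    # stand in for the hardware layer's handling of that signal.
    if not offline_pool:               # test step 406 negative
        return None
    resource = offline_pool.pop()
    restart(resource)                  # step 408
    while not is_ready(resource):      # test step 410 iterates
        time.sleep(poll_seconds)
    notify_routing(resource)           # step 412
    return resource
```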

It will be readily appreciated that various further modifications to the embodiment described above will be apparent to a person of ordinary skill in the art.

It will be clear to one skilled in the art that the method of the present invention may suitably be embodied in a logic apparatus comprising logic means to perform the steps of the method, and that such logic means may comprise hardware components or firmware components.

It will be appreciated that the method described above may also suitably be carried out fully or partially in software running on one or more processors (not shown), and that the software may be provided as a computer program element carried on any suitable data carrier (also not shown) such as a magnetic or optical computer disc. The channels for the transmission of data likewise may include storage media of all descriptions as well as signal carrying media, such as wired or wireless signal media.

The present invention may suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions either fixed on a tangible medium, such as a computer readable medium, for example, diskette, CD-ROM, ROM, or hard disk, or transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.

Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink-wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.