

Title:
NON-RECURSIVE ADAPTIVE FILTER FOR PREDICTING THE MEAN PROCESSING PERFORMANCE OF A COMPLEX SYSTEM'S PROCESSING CORE
Document Type and Number:
WIPO Patent Application WO/2009/047664
Kind Code:
A1
Abstract:
The present invention describes a power management unit (120) and a corresponding method for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing. Thereby, a linear non-recursive adaptive filter (200) which performs a processor load prediction of the system's processing core is applied whose filter coefficients may e.g. be calculated based on the least mean square (LMS) optimization criterion or based on any other similarity measure, hence being able to reduce the power consumption of the system's entire processing subsystem. In this connection, the adaptive filter may e.g. be used to predict the regularity of the clock frequency in the processing core. By using this information, the linear non-recursive adaptive filter (200) predicts the duration of how long the processing core may lower its operating voltage to still be able to complete all its tasks in time. Thereby, a power-efficient filter implementation is provided for running the adaptive filter on a digital signal processor.

Inventors:
ECKHARD WALTERS (DE)
Application Number:
PCT/IB2008/053931
Publication Date:
April 16, 2009
Filing Date:
September 26, 2008
Assignee:
ST WIRELESS SA (CH)
ECKHARD WALTERS (DE)
International Classes:
G06F1/32
Domestic Patent References:
WO2006056824A2    2006-06-01
WO2004044720A2    2004-05-27
Foreign References:
US7111179B1    2006-09-19
EP0666527A1    1995-08-09
US20060287739A1    2006-12-21
Other References:
MIN LI ET AL: "A Novel Penalty Controllable Dynamic Voltage Scaling Scheme for Mobile Multimedia Applications", IEEE TRANSACTIONS ON MOBILE COMPUTING, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 5, no. 12, 1 December 2006 (2006-12-01), pages 1719 - 1733, XP011149687, ISSN: 1536-1233
Attorney, Agent or Firm:
PEZZOLI, Ennio et al. (Via Settembrini 40, Milano, IT)
Claims:
CLAIMS

1. A power management unit for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing, said power management unit (120) comprising or having access to an adaptive prediction filter (200) for predicting the regularity of the processing core's clock frequency ($f_c$) without requiring information about a scheduled processing load by executing a look-ahead prediction based on the processing core's sleep time ratio, the latter being monitored in a sliding observation window for N subsequent time slices.

2. A power management unit according to claim 1, wherein said adaptive prediction filter (200) is realized as a linear finite impulse response filter with (N+1) filter coefficients.

3. A power management unit according to any one of the preceding claims, wherein the adaptive prediction filter (200) provides amplification (202), summation (204) and delay elements (206) for calculating a predicted clock frequency ($f_c^{n+1}$) at a time slice (n+1) directly succeeding a current time slice (n) within said sliding observation window as a weighted average of measured clock frequencies ($f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$) at time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, 2, ..., N\}$ which are adapted to minimize the clock frequency prediction error.

4. A power management unit according to claim 3, comprising a digital signal processor which implements said adaptive prediction filter (200), said digital signal processor being adapted to calculate the minimized frequency prediction error and thus to calculate a minimized sleep duration of the processing core by applying a similarity measure.

5. A power management unit according to claim 4, wherein said similarity measure is given by the least mean square optimization criterion.

6. A power management unit according to any one of the preceding claims, wherein said complex low-power integrated system is a high-end cellular mobile terminal, a workstation, a notebook, a laptop, an organizer, a personal digital assistant, a pocket calculator or any other wireless or wire-bound, battery- or means-powered computing, communication and/or information processing device.

7. A complex low-power integrated system comprising a power management unit for controlling the performance and power consumption of the system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing, said power management unit (120) comprising or having access to an adaptive prediction filter (200) for predicting the regularity of the processing core's clock frequency ($f_c$) without requiring information about a scheduled processing load by executing a look-ahead prediction based on the processing core's sleep time ratio, the latter being monitored in a sliding observation window for N subsequent time slices.

8. A complex low-power integrated system according to claim 7, wherein said adaptive prediction filter (200) is realized as a linear finite impulse response filter with (N+1) filter coefficients.

9. A complex low-power integrated system according to any one of claims 7 or 8, wherein the adaptive prediction filter (200) provides amplification (202), summation (204) and delay elements (206) for calculating a predicted clock frequency ($f_c^{n+1}$) at a time slice (n+1) directly succeeding a current time slice (n) within said sliding observation window as a weighted average of measured clock frequencies ($f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$) at time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, 2, ..., N\}$ which are adapted to minimize the clock frequency prediction error.

10. A complex low-power integrated system according to claim 9, comprising a digital signal processor which implements said adaptive prediction filter (200), said digital signal processor being adapted to calculate the minimized frequency prediction error and thus to calculate a minimized sleep duration of the processing core by applying a similarity measure.

11. A complex low-power integrated system according to claim 10, wherein said similarity measure is given by the least mean square optimization criterion.

12. A method for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing, wherein an adaptive prediction filtering algorithm for predicting (S2) the regularity of the processing core's clock frequency ($f_c$) without requiring information about a scheduled processing load is applied which executes a look-ahead prediction while the sleep time ratio is monitored (S1) in a sliding observation window for N subsequent time slices.

13. A method according to claim 12, wherein said adaptive prediction filtering algorithm is based upon a filtering model using a linear finite impulse response filter (200) with (N+1) filter coefficients.

14. A method according to any one of claims 12 or 13, wherein the adaptive prediction filtering algorithm provides amplification, summation and delay operations for calculating a predicted clock frequency ($f_c^{n+1}$) at a time slice (n+1) directly succeeding a current time slice (n) within said sliding observation window as a weighted average of measured clock frequencies ($f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$) at time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, 2, ..., N\}$ which are adapted to minimize the clock frequency prediction error.

15. A method according to claim 14, comprising the step of calculating the minimized frequency prediction error and thus calculating a minimized sleep duration of the processing core by applying a similarity measure.

16. A method according to claim 15, wherein said similarity measure is given by the least mean square optimization criterion.

17. A software program product for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing when being installed and running on the system, wherein an adaptive prediction filtering algorithm for predicting (S2) the regularity of the processing core's clock frequency ($f_c$) without requiring information about a scheduled processing load is applied which executes a look-ahead prediction while the sleep time ratio is monitored (S1) in a sliding observation window for N subsequent time slices.

18. A software program product according to claim 17, wherein said adaptive prediction filtering algorithm is based upon a filtering model using a linear finite impulse response filter (200) with (N+1) filter coefficients.

19. A software program product according to any one of claims 17 or 18, wherein the adaptive prediction filtering algorithm provides amplification, summation and delay operations for calculating a predicted clock frequency ($f_c^{n+1}$) at a time slice (n+1) directly succeeding a current time slice (n) within said sliding observation window as a weighted average of measured clock frequencies ($f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$) at time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, 2, ..., N\}$ which are adapted to minimize the clock frequency prediction error.

20. A software program product according to claim 19, comprising the step of calculating the minimized frequency prediction error and thus calculating a minimized sleep duration of the processing core by applying a similarity measure.

21. A software program product according to claim 20, wherein said similarity measure is given by the least mean square optimization criterion.

Description:

DESCRIPTION

Non-Recursive Adaptive Filter for Predicting the Mean Processing Performance of a Complex System's Processing Core

FIELD OF THE INVENTION

The present invention proposes a method for minimizing the power consumption of a complex low-power integrated system's processing core and a non-recursive adaptive filter that is adapted to perform a processor load prediction of a complex system's processing core so as to minimize its processing clock frequency and thus being able to reduce power consumption of the entire processing subsystem. Thereby, a power-efficient filter implementation is provided for running the adaptive filter on a digital signal processor.

BACKGROUND OF THE INVENTION

There has been tremendous progress in semiconductor technology since the first ICs were introduced in the 1960's. Minimum feature sizes, i.e., minimum dimensions of integrated semiconductor structures, have become much smaller, and die sizes have increased. Consequences of this technology scaling trend are reduced device capacitances, higher integration densities, performance improvements and increased circuit complexities. Whereas circuit performance and the chip area were the major issues in IC design in the past, power consumption is now another major design criterion. This development has been driven mainly by the rapid growth of the portable consumer electronics market, where system running time, battery weight, and battery volume are critical parameters. The aforementioned increase in integration density and circuit performance, however, has led to enormous on-chip power and power densities. Since excessive total power and power density cause serious reliability problems, power consumption is no longer a specific problem of mobile applications. In fact, it is equally critical, if not more, in the design of high-performance ICs for non-battery-powered applications.

Throughout the last ten years, numerous approaches to low power design have been proposed. These include both software and hardware optimization strategies. Aside from regulation circuitries supporting advanced voltage scaling and architecture-driven voltage scaling strategies based on process parallelization and pipelining, system-based power management techniques are frequently employed.

Power management reduces the amount of energy wasted whenever parts of a system are not needed at all or not at full speed. With power management schemes the functionality and the performance of a system or circuit are adjusted to time-variant requirements. Examples of such methods are power supply shutdown, dynamic power management, clock gating, and adaptive supply voltage scaling.

In a simple embodiment of power management, a system component, e.g. a particular chip, is completely separated from the power supply via an external controllable regulator during idle periods. This is an effective way of avoiding unnecessary static and dynamic power dissipation in inactive components that does not complicate the design of the component to be shut down. A power manager unit (PMU) that controls the regulator is completely external, and the power supply pins are the only required interface to the power-managed component. Thus, the component can be designed in the traditional way without the need for any special power management support to be implemented. Major drawbacks of this power supply shutdown approach are the following. Firstly, there is a large power-on delay, which is the time it takes for the supply voltage to stabilize after being switched on again. Secondly, registers and other non-permanent memory cells lose their content.

A power supply shutdown can, in principle, be applied to blocks within an integrated circuit instead of to the entire chip. This, however, requires the power supply infrastructure on the chip to be modified such that the power supply nets of different blocks are separated from each other and made accessible from the exterior via separate pins. As a consequence, power supply shutdown is restricted to chips in their entirety or to a small number of large blocks on a chip.

For a better understanding of the principles of efficient power management, a brief description of how the dynamic power consumption $P_{dyn}$ of conventional CMOS-based low-power integrated semiconductor circuitries can be estimated and controlled will now be given. In digital CMOS circuits, which are used in the majority of microprocessors, power consumption can be modeled quite accurately by simple equations. CMOS circuits have both dynamic and static power consumption. Whereas static power consumption caused by bias and leakage currents usually remains under 1 mW, the dynamic component is the dominant source of power consumption for most of the CMOS microprocessors which are available on the market. Every transition of a digital circuit consumes power, because every charge and subsequent discharge of the digital circuit's capacitance results in a dissipation in the circuit's resistive components. As described in the article "Processor design for portable systems" (Journal of VLSI Signal Processing, August 1996) by T. Burd and R. Brodersen, dynamic power consumption of a CMOS microprocessor can be estimated by

$$P_{dyn} = \sum_{m=1}^{M} C_m \cdot f_m \cdot U_{DD}^2 \ \text{[W]}, \qquad (1)$$

where $M$ denotes the total number of gates in the circuit, $C_m$ the load capacitance of gate $g_m$, $f_m$ the specific switching frequency of gate $g_m$ (with $m \in \{1, 2, ..., M\}$), and $U_{DD}$ the circuit's supply voltage. It follows from equation (1) that a reduction of $U_{DD}$ is the most effective way to lower dynamic power consumption $P_{dyn}$. Lowering $U_{DD}$, however, creates the problem of increased circuit delay. An estimation of this circuit delay is given by

$$\tau \propto \frac{U_{DD}}{(U_G - U_T)^2}, \qquad (2)$$

where $\tau$ [ns] is the propagation delay of the CMOS transistor, $U_T$ [V] the threshold voltage, and $U_G$ the input gate voltage. This propagation delay restricts the clock frequency $f_c$ in a microprocessor. From equations (1) and (2) it follows that there is a fundamental tradeoff between switching speed and supply voltage. Processors can operate at a lower supply voltage, but only if clock frequency $f_c$ is reduced correspondingly to tolerate the increased propagation delay $\tau$. When assuming that dynamic power $P_{dyn}$ is dominant and that gates $g_m$ of the microprocessor form a collective switching capacitance $C_s$ with a common clock frequency $f_c$, it can be obtained that

$$P_{dyn} = C_s \cdot f_c \cdot U_{DD}^2 \ \text{[W]}. \qquad (3)$$

Equation (3) shows that a clock frequency reduction linearly decreases power, and that voltage reduction results in a quadratic power reduction. The critical path of a microprocessor is the longest path a signal must travel in a clock cycle $T_c = 1 / f_c$. The implicit constraint is that the propagation delay $\tau$ of the critical path must be smaller than $T_c$. In fact, the microprocessor ceases to function when $U_{DD}$ is lowered and propagation delay $\tau$ becomes too large to satisfy internal timings at clock frequency $f_c$ (see equation (2)). For a given clock frequency $f_c$, voltage scaling is then the mechanism to minimize power consumption.
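The scaling behavior stated by equation (3) can be checked numerically. The following is a minimal sketch only; the function name `dynamic_power` and the operating point (capacitance, frequency, voltage) are illustrative assumptions and do not come from the application.

```python
def dynamic_power(c_s: float, f_c: float, u_dd: float) -> float:
    """Equation (3): P_dyn = C_s * f_c * U_DD^2, in watts."""
    return c_s * f_c * u_dd ** 2


# Hypothetical operating point (illustrative values, not from the source).
C_S = 1e-9    # collective switching capacitance in farads
F_C = 200e6   # clock frequency in hertz
U_DD = 1.2    # supply voltage in volts

baseline = dynamic_power(C_S, F_C, U_DD)
half_clock = dynamic_power(C_S, F_C / 2, U_DD)      # frequency halved
half_voltage = dynamic_power(C_S, F_C, U_DD / 2)    # voltage halved

print(f"baseline power:   {baseline:.4f} W")
print(f"half clock:       {half_clock / baseline:.2f} x baseline")
print(f"half voltage:     {half_voltage / baseline:.2f} x baseline")
```

Halving the clock frequency halves the power, whereas halving the supply voltage reduces it to one quarter, which is why voltage reduction is the more effective lever once the clock frequency permits it.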

Complex low-power integrated electronic systems, such as e.g. high-end cellular mobile terminals, personal computers, workstations, notebooks, laptops, organizers, personal digital assistants, pocket calculators and other wireless or wire-bound, battery- or means-powered computing, communication and/or information processing devices, often apply advanced dynamic power management (DPM) schemes. Such systems contain various power-manageable components (PMC) controlled by a PMU. Each PMC provides a number of high performance, low power and sleep modes/states. The PMU, which may either be implemented in hardware or in software, continuously observes the system and puts the PMCs in appropriate states according to the actual requirements at certain points in time.

Dynamic power management is widely used in modern notebook computers and, hence, special notebook processors are designed as PMCs. This requires the instruction set, the clock network, the interrupts, etc. to be adapted to the requirements of dynamic power management. Most processors support different low power and sleep modes. In some modes, idle modules within the processors are not separated from the power supply as in the power supply shutdown approach. Instead, the respective parts of the clock network are switched off. If all inputs of the modules to be switched off are registered, there is absolutely no switching activity and, hence, no dynamic power dissipation in the idle modules. This technique is called global clock gating. In other modes, certain modules are actually separated from the power supply via internal switches in the power supply nets. Finally, for modules which are not completely idle but also not fully utilized, the clock frequency or the supply voltage or both may be momentarily reduced.

Although designing a PMC requires a significant amount of additional design effort, the most challenging task is the development of an effective power management policy (PMP) and its implementation as PMU firm- or software. This software should know about the power characteristics of all modules and be aware of the inevitable performance degradation and power overhead associated with going to and returning from different low power and sleep modes. An effective PMP should reliably predict the idle time of a module and accurately calculate the net power reduction.

The Advanced Power Management (APM) specification was the first industry standard in the field of DPM and has only recently been replaced by the more powerful Advanced Configuration and Power Interface (ACPI).

Local clock gating is another popular power management technique that requires only moderate additional design effort. It is frequently used in digital signal processors (DSPs), application specific processors, embedded processors and the like, but can be applied to practically any type of circuit. With local clock gating, the control signals that are used to deactivate certain parts of the clock network are locally generated in hardware. In principle, arbitrarily small sub-circuits can be deactivated in this way. Power management based on local clock gating is therefore an architectural-level rather than a high-level technique.

A relatively new power management approach is adaptive supply voltage scaling. This is a very attractive technique for dynamic power optimization if the requirements on the performance of a chip vary continuously over time. Instead of just switching off idle components of a system or idle modules on a chip, the clock frequency and the supply voltage are continuously adjusted to the instantaneous performance demand.

As mentioned above, a complex system, such as e.g. a high-end cellular mobile phone, requires measures to minimize power consumption of its major power supplied circuit elements. In the digital domain, the most power-consuming entities typically are processing cores. To reduce its power consumption, a processing core's supply voltage $U_{DD}$ must be reduced to its bare minimum. The low voltage limiting factor for a supply voltage is a critical parameter for the processing delay $\tau$, which is assumed to be shorter than clock cycle time $T_c$ of the processing core. The slower the clock frequency $f_c$, the higher the tolerable delay $\tau$ and the lower the tolerable supply voltage $U_{DD}$. On the other hand, said clock frequency $f_c$ must be high enough to perform a task in a given time frame. It can be observed that the processing performance requirements (in MIPS) usually vary over time. And typically, there is some regularity in the processing load over time (the "processing profile"). As there is an opportunity to save power by adapting operation clock frequency $f_c$ to the bare minimum of what is required for a certain time period to accomplish a task, a prediction of the required clock speed is needed. But clock frequency $f_c$ prediction is anything but trivial, as the clock frequency requirement depends on parameters such as e.g. the operation mode of the system, the behavior of the user and the air interface condition (e.g. the signal strength of a received wireless signal or a wireless signal to be transmitted). Analytic prediction of clock frequency $f_c$ is therefore a very difficult task. Hence, an algorithm that predicts the clock frequency of a complex system's processing core when being executed on said complex system is needed.

US 2003 / 0 217 296 A1 describes a method and an apparatus for performing adaptive runtime power management in an information processing system employing a central processing unit (CPU) and an operating system (OS). A CPU cycle tracker (CCT) module monitors critical CPU signals and generates CPU performance data based on the critical CPU signals. An adaptive CPU throttler (THR) module uses the CPU performance data, along with a CPU percent idle value fed back from the operating system, to generate a CPU throttle control signal during predefined runtime segments of the CPU run time. The CPU throttle control signal links back to the CPU and adaptively adjusts CPU throttling and, therefore, power usage of the CPU during each of the runtime segments.

SUMMARY OF THE INVENTION

Although there are means to estimate a complex system's processing performance (in million instructions per second, MIPS) for the next time slot by monitoring an open operating system scheduler queue, not every processing core has an open operating system installed on it. It may thus be an object of the present invention to provide a suitable measure for predicting clock frequency $f_c$ without requiring knowledge of the software load scheduled in the operating system. Moreover, prediction during runtime is required as the software may be too complex to predict performance requirements during design time.

Typically, every software application has some sort of main program, a RISC OS Toolkit (RTK), or at least a simple scheduler that calls tasks and detects idle states, during which the processor clock can be stalled to save power. The RTK is a class library for developing RISC OS application programs in C++; it differs from other such libraries currently available for RISC OS in its support for automatic layout by specifying the relationship between different visual components (for example, the fact that they are arranged in a grid or a column), thus eliminating the need for a template editor and allowing a layout to change at runtime to accommodate varying content. But as mentioned above, from a power perspective it is more efficient to reduce clock frequency $f_c$ so that tasks are accomplished just in time rather than to run and sleep.
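The benefit of finishing just in time rather than running fast and then sleeping can be illustrated with a small energy calculation based on equation (3). This is a sketch under stated assumptions: the slice length, cycle count, capacitance and both voltage levels are hypothetical, and it is simply assumed that the lower supply voltage is tolerable at the halved clock frequency. Static power and sleep-mode power are ignored.

```python
def task_energy(c_s: float, u_dd: float, cycles: float) -> float:
    """Dynamic energy for a fixed cycle count.

    E = P_dyn * t = (C_s * f_c * U_DD^2) * (cycles / f_c) = C_s * U_DD^2 * cycles,
    so at a fixed voltage the energy does not depend on the clock frequency.
    """
    return c_s * u_dd ** 2 * cycles


def busy_time(cycles: float, f_c: float) -> float:
    """Time in seconds needed to execute `cycles` at clock frequency f_c."""
    return cycles / f_c


# Hypothetical workload and operating points (not from the source).
C_S = 1e-9          # collective switching capacitance in farads
SLICE = 10e-3       # time slice length in seconds
CYCLES = 1e6        # cycles the task needs within one slice

# Run and sleep: full speed at full voltage, then idle for the rest of the slice.
f_fast, u_fast = 200e6, 1.2
# Just in time: half the clock, with an assumed lower tolerable supply voltage.
f_slow, u_slow = 100e6, 0.9

assert busy_time(CYCLES, f_fast) <= SLICE   # finishes early, sleeps 50 %
assert busy_time(CYCLES, f_slow) <= SLICE   # finishes exactly at the deadline

e_run_sleep = task_energy(C_S, u_fast, CYCLES)
e_just_in_time = task_energy(C_S, u_slow, CYCLES)
print(f"energy ratio (just in time / run and sleep): {e_just_in_time / e_run_sleep:.4f}")
```

Under these assumptions the saving comes entirely from the reduced voltage, which the slower clock makes tolerable; lowering the frequency alone at an unchanged voltage would leave the dynamic energy per task unchanged.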

The present invention is therefore dedicated to a power management unit and a method for minimizing the power consumption of a complex low-power integrated system's processing core. Thereby, an adaptive filter is used to predict the regularity of the clock frequency in the processing core. By using this information, the adaptive filter predicts the duration of how long the processing core may lower its operating voltage to still be able to complete all its tasks in time. A power-efficient filter implementation is provided for running the adaptive filter on a digital signal processor.

As mentioned above, a plurality of battery- and non-battery-powered applications and devices comprise suitable power management tools to manage the idle times of their processing cores. Furthermore, there is usually a time basis in each information processing system, so the time during which the system is allowed to sleep can be measured. In this connection, the present invention is dedicated to a suitable means for predicting clock frequency requirements by monitoring the sleep time ratio in a sliding observation window representing a time frame of N subsequent time slices, thereby using a non-recursive filtering model realized by an adaptive finite impulse response filter to execute a look-ahead prediction. As there is always some periodic behavior to be expected in a software processing profile, a finite impulse response filter can be used to detect this regularity by means of adaptive filter coefficients which are updated after the prediction for each one of a given set of subsequent time slices within a sliding observation window. For example, an algorithm which is based on the least mean square (LMS) optimization criterion can be applied to minimize sleep duration by stretching the clock periods in the particular time slices to the maximum tolerable length.
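Monitoring the sleep time ratio in a sliding observation window can be sketched as follows. The class `SleepRatioWindow` and the helper `required_frequency` are hypothetical names introduced for illustration. In particular, the conversion from sleep ratio to a clock frequency requirement (scaling the current frequency by the busy fraction) is an assumption; the application does not spell out this mapping.

```python
from collections import deque


class SleepRatioWindow:
    """Keeps the sleep time ratios of the most recent time slices."""

    def __init__(self, n_slices: int):
        # deque with maxlen drops the oldest entry automatically,
        # which gives the sliding behavior of the observation window.
        self._ratios = deque(maxlen=n_slices)

    def record(self, sleep_time: float, slice_time: float) -> float:
        """Store and return the sleep time ratio of one finished time slice."""
        if slice_time <= 0 or not 0 <= sleep_time <= slice_time:
            raise ValueError("need 0 <= sleep_time <= slice_time and slice_time > 0")
        ratio = sleep_time / slice_time
        self._ratios.append(ratio)
        return ratio

    def ratios(self) -> list:
        """Sleep ratios currently in the window, oldest first."""
        return list(self._ratios)


def required_frequency(f_c: float, sleep_ratio: float) -> float:
    """ASSUMPTION: the clock needed to finish just in time is the current
    clock scaled by the busy fraction (1 - sleep ratio)."""
    return f_c * (1.0 - sleep_ratio)


window = SleepRatioWindow(n_slices=3)
for sleep_ms in (1, 2, 3, 4):          # four slices of 10 ms each
    window.record(sleep_ms, 10)
print(window.ratios())                  # only the three most recent slices remain
print(required_frequency(100e6, 0.25))  # frequency requirement for a 25 % sleep ratio
```

The frequency requirements obtained this way form the sequence of measured values that a prediction filter can then operate on.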

To be more precise, a first exemplary embodiment of the present invention relates to a power management unit for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a certain level where outstanding computational operations and software tasks can be performed just in time for further processing. The power management unit may be implemented as an instance comprising or having access to an adaptive prediction filter which predicts the regularity of the processing core's clock frequency $f_c$ without requiring information about a scheduled processing load. According to the present invention, this is accomplished by monitoring the sleep time ratio in said sliding observation window and executing a look-ahead prediction.

The adaptive prediction filter mentioned above may e.g. be realized as a linear finite impulse response filter with (N+1) filter coefficients, wherein said filter provides amplification, summation and delay elements for calculating a predicted clock frequency $f_c^{n+1}$ at a time slice (n+1) directly succeeding a current time slice n within said sliding observation window as a weighted average of measured clock frequencies $f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$ at a number of time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, 2, ..., N\}$ which are adapted to minimize the clock frequency prediction error.
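Written out, the weighted average described above is $f_c^{n+1} = \sum_{k=0}^{N} a_k \cdot f_c^{n-k}$. A minimal sketch of this non-recursive prediction step follows; the function name `predict_next_frequency` and the numeric coefficients and frequencies are illustrative assumptions only.

```python
def predict_next_frequency(coeffs, history):
    """FIR prediction of the clock frequency for time slice n+1.

    coeffs[k]  is the weighting coefficient a_k, k = 0..N
    history[k] is the measured clock frequency at time slice n-k, k = 0..N
    """
    if len(coeffs) != len(history):
        raise ValueError("need exactly one coefficient per measured frequency")
    # Amplification (a_k * f), delay (index k) and summation of the filter taps.
    return sum(a * f for a, f in zip(coeffs, history))


# Hypothetical example with N = 3, i.e. four taps.
coeffs = [0.4, 0.3, 0.2, 0.1]             # a_0 .. a_3 (illustrative)
history = [100e6, 120e6, 80e6, 100e6]     # f at slices n, n-1, n-2, n-3 in Hz
predicted = predict_next_frequency(coeffs, history)
print(f"predicted clock frequency: {predicted / 1e6:.1f} MHz")
```

Because the output depends only on measured inputs and not on earlier outputs, the filter is non-recursive, i.e. it has a finite impulse response.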

The adaptive prediction filter may e.g. be implemented by a digital signal processor which is adapted to calculate the minimized frequency prediction error and thus to calculate a minimized sleep duration of the processing core by applying a certain similarity measure, wherein the latter may e.g. be given by the least mean square optimization criterion.
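One common way to adapt filter coefficients under the least mean square criterion is the textbook LMS update $a_k \leftarrow a_k + \mu \cdot e \cdot f_c^{n-k}$, where $e$ is the prediction error. The application names only the criterion, so the update rule, the step size `MU`, the function name `lms_step` and the periodic test profile below are all assumptions made for illustration. Frequencies are normalized to the range 0..1 to keep the step size well behaved.

```python
def lms_step(coeffs, history, measured, mu):
    """One LMS iteration: predict, compute the error, adapt the coefficients.

    coeffs[k]  is a_k, history[k] is the (normalized) frequency at slice n-k,
    measured   is the frequency actually observed at slice n+1.
    """
    predicted = sum(a * f for a, f in zip(coeffs, history))
    error = measured - predicted
    updated = [a + mu * error * f for a, f in zip(coeffs, history)]
    return updated, error


N = 3       # filter order, giving N + 1 = 4 coefficients
MU = 0.5    # hypothetical step size

# Hypothetical periodic processing profile (normalized clock frequencies).
profile = [0.2, 0.8, 0.5, 0.3]
samples = [profile[i % len(profile)] for i in range(2000)]

coeffs = [0.0] * (N + 1)
abs_errors = []
for n in range(N, len(samples) - 1):
    history = [samples[n - k] for k in range(N + 1)]
    coeffs, error = lms_step(coeffs, history, samples[n + 1], MU)
    abs_errors.append(abs(error))

print(f"first error: {abs_errors[0]:.3f}, last error: {abs_errors[-1]:.2e}")
print("adapted coefficients:", [round(a, 3) for a in coeffs])
```

For this strictly periodic profile the prediction error shrinks as the coefficients adapt, which illustrates how the filter can pick up regularity in a processing profile without knowledge of the scheduled load.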

According to the present invention, the aforementioned complex low-power integrated system may e.g. be given by a high-end cellular mobile terminal, a workstation, a notebook, a laptop, an organizer, a personal digital assistant (PDA), a pocket calculator or any other wireless or wire-bound, battery- or means-powered computing, communication and/or information processing device.

A second exemplary embodiment of the present invention relates to a method for controlling the performance and power consumption of a complex low-power integrated system's processing core by automatically reducing them to a level where outstanding computational operations and software tasks can be performed just in time for further processing. Thereby, an adaptive prediction filtering algorithm for predicting the regularity of the processing core's clock frequency $f_c$ without requiring information about a scheduled processing load is applied which executes a look-ahead prediction while the sleep time ratio is monitored in a sliding observation window for N subsequent time slices.

The adaptive prediction filtering algorithm mentioned above may e.g. be based upon a filtering model using a linear finite impulse response filter with (N+1) filter coefficients, wherein the adaptive prediction filtering algorithm may provide amplification, summation and delay operations for calculating a predicted clock frequency $f_c^{n+1}$ at a time slice (n+1) directly succeeding a current time slice n within said sliding observation window as a weighted average of measured clock frequencies $f_c^{n}, f_c^{n-1}, f_c^{n-2}, ..., f_c^{n-N}$ at a given number of time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using real-valued weighting coefficients $\{a_k \mid k = 0, 1, ..., N\}$ which are adapted to minimize the clock frequency prediction error.

Said method may e.g. comprise the step of calculating the minimized frequency prediction error and thus calculating a minimized sleep duration of the processing core by applying a similarity measure, wherein the latter may e.g. be given by the least mean square optimization criterion.

According to a third and a fourth exemplary embodiment, the present invention further refers to a complex low-power integrated system comprising a power management unit as described above and to a software program product for executing the above-described method when being installed and running on the system, respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

Advantageous features, aspects, and advantages of the invention will become evident from the following description, the appended claims and the accompanying drawings. Thereby,

Fig. 1 shows a block diagram of a computer system using a power management unit as known from the prior art,

Fig. 2 shows a schematic block diagram of a linear non-recursive adaptive prediction filter used by the proposed power management unit according to the first exemplary embodiment of the present invention, and

Fig. 3 shows a flowchart that illustrates the proposed method according to the second exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

In the following, the above-described power management unit and method will be explained in more detail with respect to special refinements and referring to the accompanying drawings and in comparison to the prior art. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but, on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

Referring now to the drawings, Fig. 1 shows a block diagram of a computer system 100 including a power management unit as known from prior-art document EP 0 666 527 A1, the disclosure of which is herewith incorporated by reference for illustrating the interconnections and interactions of the particular system components in an information processing system to which a power management unit as proposed by the present invention can advantageously be applied. As depicted within the figure, said computer system 100 comprises a microprocessor as given by central processing unit 102 (CPU), which may e.g. be realized as a model 80486 microprocessor, a system memory 104 as well as a peripheral device 108. Furthermore, said computer system 100 comprises a power switching unit 110, a clock generator 112 and a power management unit 120. Clock generator 112 is used for generating a CPU clock signal and a system clock signal, and power switching unit 110 provides power to the various components of the computer system. Peripheral device 108 is illustrative of a variety of peripheral devices such as e.g. a keyboard, a printer, a modem, etc.

As can further be taken from Fig. 1, power management unit 120 comprises a power control unit 122 coupled to power switching unit 110 as well as a clock control unit 124 coupled to clock generator 112. Power management unit 120 further includes a decoder 126, a mask register 128, a ready counter 130, a doze counter 132, a stand-by counter 134, and a power management state register 136 coupled to a bus 138. Finally, power management unit 120 also comprises a system monitor 140 coupled to mask register 128, and a power management state machine 142 coupled to power control unit 122 and clock control unit 124. Thereby, power management unit 120 is provided to regulate and minimize the power consumed by computer system 100. For the embodiment of Fig. 1, power switching unit 110 is controlled to selectively provide power to microprocessor 102, system memory 104, and peripheral device 108 depending upon the state of power management unit 120. Clock generator 112 may be similarly controlled such that the frequencies of the CPU clock signal and the system clock signal are varied depending upon the state of power management unit 120, as will be described in greater detail below.

Fig. 1 shows that power control unit 122 and clock control unit 124 control the power switching unit 110 and clock generator 112, respectively, depending upon the internal state of power management state machine 142. This power management state machine 142 may e.g. have a ready state, a doze state, a stand-by state, and a suspend state. During ready state, computer system 100 is considered full-on. All components of the computer system 100 are clocked at full speed and are powered-on. Power management state machine 142 enters the ready state upon power-up of the computer system and upon reset. Power management state machine 142 also enters the ready state when primary system activity is detected by system monitor 140 or in response to software writing of a ready state value into power management state register 136, as will be described below.

Transitions of power management state machine 142 from the ready state to the doze state if the computer system 100 is idle for a programmable amount of time are determined by ready counter 130 and system monitor 140. Power management state machine 142 can alternatively enter doze state via software writing of a doze state value into power management state register 136. During doze state, clock control unit 124 controls clock generator 112 such that the CPU clock signal is slowed down to a preprogrammed frequency. It is noted that during doze state, the system clock signal continues to be driven at maximum frequency and all components are powered-on.

Transitions of power management state machine 142 from the doze state to the stand-by state if the system is idle for a programmable amount of time without any primary activities occurring are determined by doze counter 132 and system monitor 140. The power management state machine 142 can alternatively enter the stand-by state via software writing to the power management state register 136. During the stand-by state, power control unit 122 causes the power switching unit 110 to remove power from peripheral device 108. In addition, during stand-by state, clock control unit 124 causes clock generator 112 to turn off the CPU clock signal. The system clock signal thereby continues to be driven at maximum frequency.

Transitions of power management state machine 142 to the suspend state from the stand-by state if the system is idle for a programmable amount of time without any primary activities occurring are determined by stand-by counter 134 and system monitor 140. Thereby, power management state machine 142 may alternatively enter the suspend state via software writing of a suspend state value into power management state register 136. When power management state machine 142 is in the suspend state, power control unit 122 causes power switching unit 110 to remove power from peripheral device 108, and clock control unit 124 causes clock generator 112 to stop both the CPU clock signal and the system clock signal. Depending upon the system, the power control unit 122 may further cause power switching unit 110 to remove power from microprocessor 102 and system memory 104.

Decoder 126 is provided for decoding I/O write cycles executed on bus 138 by, for example, microprocessor 102. During such I/O write cycles, mask register 128, ready counter 130, doze counter 132, stand-by counter 134, and power management state register 136 may be loaded with various data that controls the power management unit 120. Data is provided to the mask register 128, ready counter 130, doze counter 132, stand-by counter 134, and power management state register 136 from bus 138 via internal data bus 150. It is noted that bus 138 may be coupled to microprocessor 102 directly or through a bus bridge.

System monitor 140 monitors the microprocessor 102, system memory 104, and other system components to determine whether certain primary activity is occurring. For example, system monitor 140 may monitor the CPU local bus to determine whether certain cycles are currently being executed. System monitor 140 may similarly monitor various interrupt signals to determine the initiation of primary system activity.

If system monitor 140 detects primary system activity, a signal labeled "Primary System Activity" is provided to power management state machine 142. Mask register 128 allows the programmer to mask certain activities that are normally detected by system monitor 140. For example, the system programmer may desire to prevent activities of a video monitor from being considered "primary activity" by system monitor 140. Accordingly, the mask register 128 may be set such that activities of the video monitor are ignored.
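By way of illustration only, the masking of activities can be sketched in Python as a bitwise operation. The activity names and bit positions below are hypothetical and are not taken from the cited prior-art document; the sketch merely shows how a set mask bit causes the corresponding activity to be ignored.

```python
from enum import IntFlag


class Activity(IntFlag):
    """Hypothetical activity sources; the bit positions are assumptions."""
    KEYBOARD = 0b0001
    CPU_BUS_CYCLE = 0b0010
    INTERRUPT = 0b0100
    VIDEO = 0b1000


def primary_activity(detected, mask):
    """Return True if at least one detected activity is not masked out."""
    return (int(detected) & ~int(mask)) != 0


# Mask video activity so that it is not treated as primary activity.
mask_register = Activity.VIDEO

print(primary_activity(Activity.VIDEO, mask_register))
print(primary_activity(Activity.VIDEO | Activity.KEYBOARD, mask_register))
```

With the mask set as shown, video activity alone does not raise the primary activity signal, whereas a simultaneous keyboard entry still does.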

As stated previously, said power management state register 136 may be software programmed with one of several predetermined state values that controls the current state of power management state machine 142. A particular state value is written into power management state register 136 by executing an I/O write cycle on bus 138. Power management state register 136 thus accommodates Advanced Power Management (APM) software.

Ready counter 130, doze counter 132, and stand-by counter 134 may be configured within the system to protect against misbehaved software that does not operate according to, for example, the Advanced Power Management software standard. During operation, the ready counter 130 is loaded with a value that causes the ready counter 130 to count a period of time. As stated above, upon lapse of this programmable amount of time, power management state machine 142 makes the transition from the ready state to the doze state if primary system activity is not detected by system monitor 140. Similarly, doze counter 132 may be loaded with a value that causes the doze counter 132 to count for a programmable amount of time. Doze counter 132 controls the doze time-out period which causes power management state machine 142 to transition from doze state to stand-by state if primary system activity is not detected by system monitor 140. Finally, stand-by counter 134 may be loaded with a value that causes stand-by counter 134 to count a programmable amount of time.

The stand-by counter 134 controls the time-out period which causes the power management state machine 142 to transition from the stand-by state to the suspend state if primary activity is not detected by system monitor 140. The power management state machine 142 remains in suspend state until primary system activity is detected by system monitor 140 or until power management state register 136 is software written with a new state value. Primary system activity that causes power management state machine 142 to transition from the suspend state to the ready state may be, for example, the detection of a keyboard entry. It should further be noted that the ready counter 130, the doze counter 132, and the stand-by counter 134 are each reset when primary system activity is detected by system monitor 140.

In accordance with the power management unit 120 described above, Advanced Power Management software may be employed to control the state of the power management unit 120 via software I/O writes to power management state register 136. Power management unit 120 thereby protects against misbehaved software by providing ready counter 130, doze counter 132, and stand-by counter 134. If primary activity is undetected for an amount of time programmed within the various counters, power management state machine 142 successively enters several power reducing states during which the power to various components of the computer system may be removed and during which the frequencies of the CPU clock signal and the system clock signal may be reduced (or stopped). Thus, the power consumed by the computer system 100 is reduced even if software that is incognizant of the Advanced Power Management software standard is employed.
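By way of illustration only, the time-out driven state transitions of the prior-art power management state machine can be sketched in Python as follows. The class and method names are hypothetical, time is counted in abstract ticks, and the three counters are modelled by a single idle counter that is compared against the time-out of the current state; none of these details are taken from the cited document.

```python
from enum import Enum


class State(Enum):
    READY = 0
    DOZE = 1
    STANDBY = 2
    SUSPEND = 3


class PowerStateMachine:
    """Sketch of the ready/doze/stand-by/suspend transitions."""

    def __init__(self, ready_timeout, doze_timeout, standby_timeout):
        # Programmable time-outs (in ticks) for leaving each state.
        self.timeouts = {
            State.READY: ready_timeout,
            State.DOZE: doze_timeout,
            State.STANDBY: standby_timeout,
        }
        self.state = State.READY  # entered on power-up and on reset
        self.idle_ticks = 0

    def tick(self, primary_activity):
        """Advance one tick and return the resulting state."""
        if primary_activity:
            # Primary activity resets the counters and forces the ready state.
            self.state = State.READY
            self.idle_ticks = 0
            return self.state
        if self.state is State.SUSPEND:
            # Suspend is left only on activity or on a register write.
            return self.state
        self.idle_ticks += 1
        if self.idle_ticks >= self.timeouts[self.state]:
            self.state = State(self.state.value + 1)
            self.idle_ticks = 0
        return self.state

    def write_state(self, new_state):
        """Model a software write to the power management state register."""
        self.state = new_state
        self.idle_ticks = 0
```

In this sketch, an idle system steps through the states one by one, while a single tick with primary activity returns it to the ready state from any other state.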

According to the first exemplary embodiment of the invention, a linear adaptive finite impulse response (FIR) prediction filter having a non-recursive filtering structure as depicted in Fig. 2 is proposed. Therein, delay elements 206 allow an observation of a given processing core's clock frequency $f_c$ at discrete times $t_n = t_0 + n \cdot \Delta t$ (with $n \in \mathbb{N}_0$) within a sliding observation window. As can be taken from this figure, a discrete time-domain signal x[n] representing this clock frequency at time slice n, in the following denoted as $f_c^{n}$, is fed to the FIR filter's input port. The discrete time-domain output signal x[n-k] of the FIR filter's k-th delay element 206 (with $k \in \{1, 2, \ldots, N\}$) reflects the clock frequency at time slice (n-k), in the following denoted as $f_c^{n-k}$. The predicted clock frequency at a time slice (n+1) directly succeeding a current time slice n, in the following referred to as $f_c^{n+1}$ and represented by the discrete time-domain signal y[n] at the FIR filter's output port, is calculated as a weighted average of measured clock frequencies $f_c^{n}, f_c^{n-1}, f_c^{n-2}, \ldots, f_c^{n-N}$ at time slices (n, n-1, n-2, ..., n-N) preceding time slice (n+1) of the prediction, wherein these measured clock frequencies are represented by discrete time-domain signal x[n] at the FIR filter's input port and its time-delayed versions x[n-1], x[n-2], ..., x[n-N], respectively. Fig. 2 further shows that signals x[n], x[n-1], x[n-2], ..., x[n-N] are weighted with a set of filter coefficients $\{a_k \mid k = 0, 1, 2, \ldots, N\}$ that are specially adapted to minimize the clock frequency prediction error (e.g. the processing core's sleep duration). In frequency domain, the filtering procedure executed by prediction filter 200 can be expressed by the transfer function

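$$H(z) = \frac{Y(z)}{X(z)} = \sum_{k=0}^{N} a_k \cdot z^{-k} \qquad (4a)$$

using

$$X(z) = \mathcal{Z}\{x[n]\} = \sum_{n=0}^{\infty} x[n] \cdot z^{-n} \;\; [\text{Hz}] \quad (n \in \mathbb{N}_0) \quad \text{and} \qquad (4b)$$

$$Y(z) = \mathcal{Z}\{y[n]\} = \sum_{n=0}^{\infty} y[n] \cdot z^{-n} \;\; [\text{Hz}] \quad (n \in \mathbb{N}_0), \qquad (4c)$$

with $\{z_{0,k}\}_{k=1,\ldots,N-1}$, which are given by a function of filter coefficients $\{a_k\}_{k=0,\ldots,N-1}$, being the N-1 zeros of transfer function H(z), said zeros $z_{0,k}$ having an order $O_{0,k}$ in a range between 1 and N-1, and $\{z_{\infty,k}\}_{k=1,\ldots,N}$ with $z_{\infty,N} = z_{\infty,N-1} = \ldots = z_{\infty,1}$ being an N-fold pole at z = 0 of said transfer function H(z). Thereby, H(z) can be obtained by applying a one-sided z transform to the impulse response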

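$$y[n] = x[n] * h[n] = x[n] * \sum_{k=0}^{N} a_k \cdot \delta[n-k] = \sum_{k=0}^{N} a_k \cdot x[n-k] \;\; [\text{Hz}] \quad (n \in \mathbb{N}_0), \qquad (5a)$$

with

$$x[n] = \mathcal{Z}^{-1}\{X(z)\} = \frac{1}{2\pi j} \oint_C X(z) \cdot z^{n-1} \, dz \;\; [\text{Hz}] \quad \text{and} \qquad (5b)$$

$$y[n] = \mathcal{Z}^{-1}\{Y(z)\} = \frac{1}{2\pi j} \oint_C Y(z) \cdot z^{n-1} \, dz \;\; [\text{Hz}] \qquad (5c)$$

and

$$\delta[n] := \begin{cases} 1 & \text{for } n = 0 \\ 0 & \text{for } n \neq 0 \end{cases} \qquad (6a, b)$$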

being Dirac's delta function, and solving the hereby obtained equation, which is given in the form $Y(z) = H(z) \cdot X(z)$, for transfer function H(z). Therein, $z := e^{\sigma} \cdot e^{j 2\pi f} = e^{\sigma + j 2\pi f}$ is a complex-valued substitution variable representing a real-valued frequency f, $e^{\sigma}$ is a real-valued weighting factor for the magnitude of said substitution variable z, $j := \sqrt{-1}$ represents the imaginary unit and curve C is a closed integration path around z = 0 for calculating above circulation integrals, which may e.g. be realized as a circle |z| = R having a radius R being greater than the respective convergence radii $\rho_X$, $\rho_Y$, and $\rho_H$ of transfer functions X(z), Y(z) and H(z). In time domain, this filtering process can be expressed by the corresponding impulse response h[n] of said transfer function H(z):

$$h[n] = \mathcal{Z}^{-1}\{H(z)\} = \frac{1}{2\pi j} \oint_C H(z) \cdot z^{n-1} \, dz = \sum_{k=0}^{N} a_k \cdot \delta[n-k]. \qquad (7)$$

In equation (5a), discrete filter input signal x[n] can be identified as clock frequency $f_c^{n}$ at a current time slice n, discrete signals x[n-1], x[n-2], ..., x[n-N] can be identified as N measured clock frequencies $f_c^{n-1}, f_c^{n-2}, \ldots, f_c^{n-N}$ at past time slices (n-1), (n-2), ..., (n-N), and discrete filter output signal y[n] can be identified as the predicted clock frequency $f_c^{n+1}$ at time slice (n+1) directly succeeding the current time slice n and thus as a prediction for x[n+1].
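By way of illustration only, the weighted sum of equation (5a) can be sketched in Python as follows. The function name predict_next, the ordering of the history list and the numerical values (taken to be in MHz) are assumptions chosen for this example.

```python
def predict_next(history, coeffs):
    """Predict the clock frequency for time slice n+1.

    history[k] holds the measured frequency at time slice n-k
    (history[0] is the current slice); coeffs[k] is the weight a_k.
    """
    if len(history) != len(coeffs):
        raise ValueError("need exactly one measurement per coefficient")
    # Weighted sum over the current and the N preceding measurements.
    return sum(a * x for a, x in zip(coeffs, history))


# N = 2, i.e. three coefficients and three measurements.
print(predict_next([200.0, 180.0, 160.0], [0.5, 0.25, 0.25]))  # 185.0
```

The coefficients in this example are fixed by hand; in the adaptive filter they would be the result of the optimization described below.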

For calculating the (N+1) filter coefficients $\{a_k \mid k = 0, 1, 2, \ldots, N\}$ of the non-recursive, adaptive filter, a similarity measure, such as e.g. the least mean square (LMS) optimization criterion or any other optimization criterion, may be used for minimizing the prediction error. In case of using an LMS optimization criterion, said prediction error is given in the form

$$\overline{e^2}(\mathbf{a}) := \overline{\left( x[n+1] - \sum_{k=0}^{N} a_k \cdot x[n-k] \right)^2} \qquad (8)$$

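wherein $\mathbf{a} := [a_0, a_1, a_2, \ldots, a_N]^T \in \mathbb{R}^{N+1}$ denotes a coefficient vector whose elements are to be optimized by using necessary condition

$$\nabla \overline{e^2}(\mathbf{a}) = \frac{\partial\, \overline{e^2}(\mathbf{a})}{\partial\, \mathbf{a}} = \mathbf{0} \qquad (9)$$

in conjunction with the two sufficient conditions

$$\det \mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}}) > 0 \quad \text{and} \qquad (10a)$$

$$\lambda_k\big(\mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}})\big) > 0 \quad \forall\, k \in \{0, 1, \ldots, N\}. \qquad (10b)$$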

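Thereby,

$$\mathbf{H}_{\overline{e^2}}(\mathbf{a}) := \left( \frac{\partial^2\, \overline{e^2}(\mathbf{a})}{\partial a_k \, \partial a_l} \right)_{k,l \in \{0, 1, \ldots, N\}} \qquad (11)$$

with $\mathbf{H}_{\overline{e^2}}(\mathbf{a}) \in \mathbb{R}^{(N+1) \times (N+1)}$

is the Hesse matrix of said mean square error $\overline{e^2}(\mathbf{a})$, $\mathbf{a}^{\mathrm{opt}} := [\hat{a}_0, \hat{a}_1, \ldots, \hat{a}_N]^T$ denotes an optimized parameter vector whose elements are given by a set of (N+1) optimized parameters $\hat{a}_0, \hat{a}_1, \ldots, \hat{a}_N$, and the argument of multivariate prediction error function $\overline{e^2}(\mathbf{a})$ as described above is given by coefficient vector $\mathbf{a}$. In equation (10b), $\{\lambda_k(\mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}}))\}_{k = 0, 1, \ldots, N}$ denote the eigenvalues of Hesse matrix $\mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}})$, which can be calculated by solving characteristic equation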

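$$\det\Big( \mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}}) - \lambda_k\big(\mathbf{H}_{\overline{e^2}}(\mathbf{a}^{\mathrm{opt}})\big) \cdot \mathbf{I} \Big) = 0 \qquad (12)$$

for unknown variables $\lambda_k$ ($k \in \{0, 1, \ldots, N\}$), and

$$\mathbf{I} := \underbrace{\operatorname{diag}(1, 1, \ldots, 1)}_{(N+1)\ \text{matrix elements}} = (\delta_{kl})_{k,l \in \{0, 1, \ldots, N\}}, \quad \text{with} \qquad (13a)$$

$$\delta_{kl} := \begin{cases} 1, & \text{for } k = l \\ 0, & \text{for } k \neq l \end{cases} \quad \text{for } k, l \in \{0, 1, \ldots, N\} \qquad (13b)$$

being the Kronecker delta, denotes the (N+1)×(N+1)-dimensional identity matrix. The components of optimized parameter vector $\mathbf{a}^{\mathrm{opt}} := [\hat{a}_0, \hat{a}_1, \ldots, \hat{a}_N]^T = \arg\min_{\mathbf{a}} \overline{e^2}(\mathbf{a})$ are then substituted into the right side of equation (5a) instead of filter coefficients $a_0, a_1, a_2, \ldots, a_N$ in order to make the prediction as good as possible.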
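By way of illustration only, the minimization described above can be sketched in Python for a finite record of measured clock frequencies. The sketch assumes that the mean of the squared error is taken over the available record, which the description does not specify. Because the squared error is quadratic in the coefficients, setting its gradient to zero yields a system of linear normal equations, which is solved here in closed form by Gaussian elimination rather than by an iterative LMS update. The function names fit_coefficients and solve are hypothetical.

```python
def fit_coefficients(x, order):
    """Least-squares fit of a_0..a_order such that
    sum_k a_k * x[n-k] approximates x[n+1] over the record x."""
    m = order + 1
    # Accumulate the normal equations R a = p from the record.
    R = [[0.0] * m for _ in range(m)]
    p = [0.0] * m
    for n in range(order, len(x) - 1):
        past = [x[n - k] for k in range(m)]
        target = x[n + 1]
        for i in range(m):
            p[i] += past[i] * target
            for j in range(m):
                R[i][j] += past[i] * past[j]
    return solve(R, p)


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - s) / M[r][r]
    return y
```

For a linearly increasing record such as 100, 110, ..., 190 and order 1, the fit yields coefficients close to 2 and -1, which corresponds to the predictor x[n+1] ≈ 2·x[n] - x[n-1]. A record that does not determine the coefficients uniquely, such as a constant one, leads to singular normal equations and is reported as an error in this sketch.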

In this context, it should further be noted that measures must be applied to avoid clock underrun (which means the case where a task is not completed in time). This can be done either by applying a clock frequency margin $\Delta f_c$ or through a "panic mode", where a higher clock frequency is applied in case the timing becomes overcritical.
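By way of illustration only, both safeguards can be sketched in Python as follows. The function name select_frequency, the clamping to a supported range between f_min and f_max, and the choice of the maximum frequency in panic mode are assumptions; the description only requires that a higher clock frequency be applied.

```python
def select_frequency(predicted, margin, f_min, f_max, panic=False):
    """Choose the clock frequency for the next time slice.

    predicted: filter output for time slice n+1
    margin:    clock frequency margin added on top of the prediction
    panic:     True when the timing has become critical
    """
    if panic:
        # Assumed panic behaviour: run at the highest available frequency.
        return f_max
    # Add the margin and keep the result within the supported range.
    return min(f_max, max(f_min, predicted + margin))


print(select_frequency(185.0, 15.0, 50.0, 400.0))              # 200.0
print(select_frequency(185.0, 15.0, 50.0, 400.0, panic=True))  # 400.0
```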

The implementation of this adaptive FIR filter may e.g. be done in such a way that a track record of selected clock frequency values is stored in a random access memory (RAM) of a component comprising the processing subsystem. By means of a shared memory concept, either the same or another processing entity could run the filter algorithm to calculate the optimum clock frequency $f_c^{n+1}$ for the time slice (n+1) which directly succeeds a current time slice n. As a digital signal processor is especially suited for filter implementations and as the digital signal processor can operate in a very power efficient mode, it is recommended to use a digital signal processor for this task.

In Fig. 3, a flowchart which illustrates the proposed method according to an exemplary embodiment of the present invention is shown in form of an endless loop. After having initialized (S0) the start position of a sliding observation window which represents a time frame of N subsequent time slices, a look-ahead prediction for predicting the clock frequency $f_c$ of a complex low-power integrated system's processing core whose performance and power consumption are to be controlled is executed (S2) based on the monitored (S1) sleep time ratio of said processing core within this observation window. As indicated above, this prediction may e.g. be executed by calculating a predicted clock frequency $f_c^{n+1}$ at a time slice (n+1) directly succeeding a current time slice (n) within this observation window as a weighted average of measured clock frequencies $f_c^{n}, f_c^{n-1}, f_c^{n-2}, \ldots, f_c^{n-N}$ at time slices (n, n-1, n-2, ..., n-N) preceding said time slice (n+1), thereby using a set of real-valued weighting coefficients $\{a_k \mid k = 0, 1, \ldots, N\}$ which are specially adapted to minimize the clock frequency prediction error and thus to calculate a minimized sleep duration of the system's processing core by applying a similarity measure, such as e.g. given by the least mean square criterion. After that, the window start position of the sliding observation window is incremented (S3), and the procedure is continued again with step S1.
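By way of illustration only, the loop of steps S0 to S3 can be sketched in Python as follows. The sketch runs over a finite list of samples instead of an endless loop, uses fixed coefficients, and assumes that the clock frequency which would just have sufficed in a time slice equals the applied frequency multiplied by (1 - sleep time ratio). This relation, like the function name run_predictor, is an assumption and is not stated in the description.

```python
from collections import deque


def run_predictor(samples, coeffs):
    """Run steps S0 to S3 over (f_c, sleep_ratio) samples.

    Returns one predicted frequency per time slice once the
    observation window holds as many entries as there are coefficients.
    """
    window = deque(maxlen=len(coeffs))  # S0: initialise the window
    predictions = []
    for f_c, sleep_ratio in samples:
        # S1: monitor the sleep time ratio and derive the frequency
        # that would have sufficed (assumed relation, see lead-in).
        needed = f_c * (1.0 - sleep_ratio)
        # S3: appending to the bounded deque drops the oldest entry,
        # which shifts the observation window by one time slice.
        window.appendleft(needed)
        if len(window) == window.maxlen:
            # S2: look-ahead prediction as a weighted sum.
            predictions.append(sum(a * x for a, x in zip(coeffs, window)))
    return predictions


samples = [(200.0, 0.5), (200.0, 0.25), (200.0, 0.0)]
print(run_predictor(samples, [0.5, 0.5]))  # [125.0, 175.0]
```

A bounded double-ended queue is used here because it keeps the most recent measurement at index 0, matching the ordering of the coefficients, and discards the oldest measurement automatically.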

APPLICATIONS OF THE PRESENT INVENTION

The invention can advantageously be applied to multi-tasking and multi-threading systems with varying processing loads. Aside from being applied for clock rate based power management tasks which arise in the scope of personal computers, workstations, notebooks, laptops, organizers, personal digital assistants, pocket calculators, etc., the invention can also be applied to high-end cellular mobile terminals where baseband processing units and application processing units are implemented by a multi-processor concept with up to ten processors which have to be controlled in a coordinated way. Moreover, the invention may be used for power management of any other wireless or wire-bound, battery- or mains-powered computing, communication and/or information processing devices.

While the present invention has been illustrated and described in detail in the drawings and in the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, which means that the invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as e.g. an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as e.g. via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope of the invention.