Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
OPTIMIZATION SYSTEM AND METHOD FOR INTEGRATED CIRCUITS INCLUDING MULTI-PHASE LEVEL-SENSITIVE LATCHES
Document Type and Number:
WIPO Patent Application WO/2018/132131
Kind Code:
A1
Abstract:
The present invention includes an optimization method for an integrated circuit including multi-phase level-sensitive latches. Constraining a signal arrival time at a level-sensitive latch against the arrival time of a clock signal that is determined based on the path that the signal propagates through, the optimizer modifies the netlist of an integrated circuit including multi-phase level-sensitive latches by statistical logic optimization, level-sensitive latch removal and insertion, and/or insertion of timing error predicting and/or detecting apparatus. The optimizer is used in conjunction with other automatic design tools such as an automatic cell placer and wire router.

Inventors:
LIU BAO (US)
Application Number:
PCT/US2017/047039
Publication Date:
July 19, 2018
Filing Date:
August 16, 2017
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
LIU BAO (US)
International Classes:
H03K3/037
Foreign References:
US20090055786A12009-02-26
US20130086444A12013-04-04
US20070200597A12007-08-30
US20130214816A12013-08-22
US6763475B12004-07-13
Download PDF:
Claims:
What is claimed is:

1. A computer readable medium for storing instructions that when executed by a computer implement a computerized method of optimizing an integrated circuit including multiphase level-sensitive latches, said instructions of said computer readable medium implementing the method of

a) receiving a netlist of cells with timing characterization;

b) computing signal arrival times through signal propagation paths including those within a logic stage and across level-sensitive latches;

c) constraining a signal arrival time at a level -sensitive latch against the arrival time of a clock signal that turns said level-sensitive latch opaque; and d) modifying the netlist to meet said signal arrival time constraints.

2. A computer readable medium as described in claim 1 wherein signal arrival time

computation comprises the steps of

bl) setting arrival times for primary inputs;

b2) for each cycle c in a given range;

b3) for each phase p in a given range;

b4) setting arrival times for clocks of phase p in cycle c;

b5) computing signal arrival times across clock networks, at latch outputs, across combinational logic networks, and across any transparent level-sensitive latches; b6) exiting if no time borrowing occurs for all phases in the current cycle.

3. A computer readable medium as described in claim 1 wherein signal arrival time

computation comprises the steps of

bl) setting arrival times for primary inputs and clocks of all phases in the first cycle; b8) computing all signal arrival times across clock networks, at latch outputs, and across combinational logic networks;

b9) for each cycle c in a given range;

blO) for each phase p in a given range;

bl 1) advancing arrival times of clocks of phase p to the next cycle;

bl2) repeating b 8); and

bl3) exiting if no time borrowing occurs for all phases in the current cycle.

4. A computer readable medium as described in claim 1 wherein a signal arrival time at a level-sensitive latch data input is checked against the arrival time of a clock signal that is identified by

cl) tracing the path wherein said signal propagates;

c2) finding the clock signals that turn the level-sensitive latches in said path transparent; and

c3) returning the subsequent clock signal that turns the level-sensitive latch at the end of the path opaque.

5. A computer readable medium as described in claim 1 wherein signal arrival times are computed by a statistical method that takes parametric variations.

6. A computer readable medium as described in claim 1 wherein the netlist is modified by level-sensitive latch removal and insertion.

7. A computer readable medium as described in claim 6 wherein netlist modification comprises the steps of:

dl) finding the first logic gate g in a timing-critical path;

d2) finding all the level-sensitive latches L that send signals through a path across combinational logic to g;

d3) finding all the fanouts F of a net in a signal propagation path from a level- sensitive latch in L to g;

d4) removing all the level-sensitive latches in L;

d5) inserting level-sensitive latches driving all the fanouts in F; and

d6) inserting a level-sensitive latch at the output of g.

8. A computer readable medium as described in claim 6 wherein netlist modification

comprises the steps of:

d7) finding the last logic gate g in a timing-critical path;

d8) finding all the level-sensitive latches L that receive signals through a path across combinational logic from g;

d9) finding all the side inputs / of a logic gate in a signal propagation path from g to a level-sensitive latch in J;

dlO)removing all the level-sensitive latches in J;

dl l)inserting level-sensitive latches driving all the side inputs in /; and

dl2)inserting a level-sensitive latch at each input of g.

9. An apparatus including a finite state machine that takes inputs signals from

combinational logic networks across multiple clock phases, and gives a signal when a signal propagates through a timing-critical path.

10. A computer readable medium as described in claim 1 wherein modifying the netlist includes insertion of an apparatus as described in claim 9.

11. A computer readable medium as described in claim 10 wherein modifying the netlist comprises the steps of

el) identifying a timing-critical path that crosses one or a plurality of level -sensitive latches;

e2) identifying the side inputs of said path which have the least probabilities to take their respective non-controlling logic values; and

e3) inserting a finite state machine that takes input signals from said side inputs.

12. An apparatus including a comparator in an integrated circuit including multi-phase level- sensitive latches which compares the input and output signals of a level-sensitive latch when said latch is opaque and its preceding level-sensitive latch is also opaque.

13. A computer readable medium as described in claim 1 wherein modifying the netlist includes insertion of an apparatus as described in claim 12 at the end of a timing-critical path.

14. A computer readable medium as described in claim 1 wherein modifying the netlist includes insertion of a level-sensitive latch and an apparatus as described in claim 12 in a timing-critical path.

15. A computer readable medium as described in claim 1 wherein modifying the netlist includes combinational logic optimization which considers the occurrence probability for a signal to propagate through a timing-critical path.

16. A computer readable medium as described in claim 1 wherein in an iterative process at each iteration applied is a method including as described in either claim 7, 8, 11, 13, 14,

Description:
Optimization System and Method for Integrated Circuits Including Multi-Phase Level-Sensitive Latches

Cross-Reference to Related Application

[0001] This application is based upon and claims priority to U.S. provisional patent application US 62/446,494, entitled "Multi-Phase Latch-Based Integrated Circuit Design Methods and Paradigms," filed on January 15, 2017. The entire content of this application is incorporated herein by reference.

Background of the Invention

[0002] 1. Field of the Invention

[O003J This invention is related to an integrated circuit design technique that leads to improved performance, power consumption, and cost, especially in advanced technology nodes wherein traditional methodologies lead to pessimistic design in the presence of significant performance variability.

[0004] Technology scaling has driven the semiconductor industry in continuous cost reduction and performance improvement for decades. However, at advanced technology nodes such scaling has stopped leading to significant performance improvement because significant parametric variations from the manufacturing process and runtime environment result in significant performance variability, while the traditional synchronous integrated circuit design methodology demands the worst-case signal propagation delay across a combinational logic network not to exceed the clock cycle time.

[0005] 2. Brief Description of the Related Art

[0006] Traditional synchronous integrated circuits are mostly based on flip-flops that require a data signal to arrive before a triggering clock edge, and after the previous triggering clock edge. Traditional static timing analysis verifies this by calculating the best- and worst-case signal propagation delays across a combinational logic network, and checking signal arrival rimes at flip-flop data pins against a setup time and a hold time that can be found in a cell library. A signal must .arrive at a flip-flop data pin before the next triggering clock edge with a margin that needs to be larger than the setup time, and after the previous triggering clock edge with a margin that needs to be larger than the hold time. Further considerations include clock skew, uncertainty, etc. (0007] A multi-phase latch-based integrated circuit includes level-sensitive latches that are connected to clock signals of multiple phases. Specifically, in a typical implementation, a D flip- flop consists of two level -sensitive latches of complementary clock phases. A level-sensitive latch turns transparent or opaque such that a data signal can or cannot propagate through it when the clock signal is high or low. A sequence of level-sensitive latches that are connected to clock signals of multiple phases control a data signal propagating ' through them at a desired pace.

[0008] A level-sensitive latch has different timing reqxjirements compared with a flip-flop. While a signal must arrive at the data pin of a fl ip-flop before the flip-flop latches the signal, a l evel- sensitive latch allows a signal to arrive after the latch turns transparent while the signal still needs to arrive before the latch ruins opaque. In static timing analysis, setup timing checks for level-sensitive latches are performed in a method called time borrowing, wherein the borrow time or the amount of time by which the signal arrival time exceeds the arrival time of the opening clock signal or the clock signal that turns the latch transparent is excluded from the signal arrival time at the latch data pin which is subsequently compared against the arrival time of the opening clock signal, usually assuming a zero setup time. In this method, the data signal arrival time is compared against the opening clock edge for a level-sensitive latch which is the same for a flip-flop. Otherwise, for a level-sensitive latch the data signal arrival time needs to be compared against the closing clock edge or the clock edge that turns the latch opaque, since the setup time for a level-sensitive latch is characterized for the closing edge. The maximum borrow time is the arrival time of the closing clock edge minus the setup time, against which a signal arrival time at a level-sensitive latch data pin is still checked in static timing analysis. The borrow time is included in subsequent signal propagation delays in the next logic stage, in other words, any borrow time needs to be compensated in subsequent logic stages. As a result, while time borrowing allows latches to tolerate local performance variation leading to timing error rate reduction, replacing edge-triggered flip-flops by level-sensitive latches in an integrated circuit leads to limited performance improvement in traditional design methodologies.

[0009] A clock signal may propagate to the output of a level-sensitive latch through the enable pin, or when time borrowing occurs, in other words, a data signal arrives later than the opening clock edge, the data signal may propagate to the output of a level-sensitive latch through the data pin. In static timing analysis, the latest of such signal arrival times at the output of a level- sensitive latch is computed, from which subsequent signal propagation starts.

(0010] Traditional integrated circuit optimization techniques include combinational logic optimization and register retiming. In combinational logic optimization, a functionally- equivalent combinational logic implementation is found to better meet the design requirements such as in timing performance and power consumption. In register retiming, a functionally- equivalent implementation is found for a sequential circuit, e.g., by moving some of the combinational logic from one logic stage to another across certain flip-flops, to better meet the design requirements such as in timing performance. (0011] Asynchronous design tolerates performance variability, however, it has not become practical after decades of research. Alterative to synchronous and asynchronous integrated circuit design, variable latency design allows logic computation in a logic stage to complete in a flexible number of clock cycles. This leads to better-than-worst-case design, wherein the worst-case signal propagation delay across a logic stage is allowed to exceed the clock cycle time, while function correctness is achieved by identifying all timing errors, e.g., either from logic inputs in timing error prediction or logic outputs in timing error detection, subsequently invoking a timing error recovery or correction mechanism.

[0012] For example, Razor is a timing error detection technique which compares the signals captured in a flip-flop and a shadow level -sensitive latch. The original Razor design can be found in U.S. Pat, No. US 10/392,382, entitled "Error Detection and Recovery Within Processing Stages of an Integrated Circuit,'' filed on March 20, 2003, by Kriszti an Flautner, Todd Michael Austin, David Theodore Blaauw, and Trevor Nigel Mudge. Other Razor-type architectures can be found in U.S. Pat, No, US 14/702,426, entitled "Timing Violation Resilient Asynchronous Template," filed on May 1 , 2015, by Peter A. Beerel, Melvin Breuer, Benmao Cheng, and Dylan Hand.

[0013] A practical timing error prediction scheme can be found in a journal article entitled "Application- Specific Cross-Layer Optimization Based on Predictive Variable-Latency VLSI Design," by V, Be, -A. B_ Kahng, T. Karnik, B. Liu, M. Maleki and L, Wang, which appeared in ACM Journal on Emerging Technologies in Computing Systems (JETC) - Special Issue on •Cross-Layer System Design and Regular Papers, Volume 12, Issue 3, September 2015, Article No. 21, on pages 21:1-21:19.

Summary of the Invention

[0014] This invention is based on a method that constrains a signal arrival time at a data pin of a level-sensitive latch against the arrival time of a closing clock edge that is determined based on the path that the signal propagates through. The arrival time of signals which are launched at different clock phases and cycles and propagate through latches of different phases at different cycles need to be checked against the arrival time of different closing clock edges. For this we differentiate such signal arrival times in static timing analysis by computing the worst-case signal arrival time in each category at each timing node. Specifically, there is a sequenee of clock signals that turn latches transparent in a signal propagation path. At the end of the signal propagation path, the signal needs to arrive before the subsequent closing clock edge at the level- sensitive latch.

[0015] This method is less pessimistic than the traditional time borrowing method. The setup timing constraint is tighter for signal propagation paths that cross more latches of mul tiple phases, and approaches the same timing constraint in the traditional time borrowing method as the number of latches in a signal propagation path approaches infinity. However, in practice. such an infinite long signal propagation path does not need to consider, because (1) any loop does not need to repeat, and (2) when parametric variations are taken into consideration, without perfect correction, the canceling effect brings down the total signal propagation path delay as a path grows longer, such that the worst case does not occur at an infinite long signal propagation path.

[001.6} This invention includes an iterative methodology wherein at each iteration a signal arrival time at a level-sensitive latch through a signal propagation path is reduced by one of the following techniques.

[0017] This invention includes a technique that includes removal and insertion of level-sensitive latches at the start or end of a timing-critical signal propagation path such that the tiniing-critical path is shortened and the signal propagation delay in the path is reduced while the functionality of the circuit is kept intact, e.g., for given inputs the circuit gives the same outputs in the same cycles.

[0018] To reduce pessimism in the presence of parametric variations, this invention includes a timing error predicting apparatus in the form of a finite state machine that is constructed by identifying timing critical paths based on the said timing constraints , and taking some of the side inputs .of the identified timing critical paths as inputs for the finite state machine.

[0019] To handle performance variations that may not be predictable such as due to

unpredictable runtime uncertainties or inaccuracy in characterization, simulation or timing analysis, a timing error detecting apparatus is constructed, e.g., by identifying timing critical paths based on the said timing constr aints, and compari ng the input and output of the latch at the end of a timing-critical path.

[0020] Such a timing error predicting/detecting apparatus enables hetter-than-worst-case integrated circuit design or allows a signal propagation path delay to exceed the clock cycle time without compromising the functionality of the circuit, which is achieved by predicting/detecting every timing error occurrence, and invoking a timing error recovery/correction mechanism.

[0021] The performance of such an integrated circuit is based on the occurrence probability of predicted/detected timing errors. Subsequently, combinational logic is optimized for a statistical objective such as average performance or power consumption with consideration of ' the occurrence probability of predicted/detected timing errors.

Detailed Description of the Drawing

[0022] Advantages of the present invention will become apparent to those skilled in the art with the benefit of the following detailed description of embodiments and upon reference to the accompanying drawings in which: [0023] Figure 1 illustrates a level-sensitive- latch with a setup time and hold time around a closing clock edge, the maximum borrow time, and the borrow time for a signal arriving later than an opening clock edge.

[0024J Figure 2 illustrates signal arrival time computation at a level-sensitive latch output for setup time check in static timing analysis.

[0025] Figure 3 illustrates a set of timing constraints for an integrated circuit including two- phase level-sensitive latches, where n is the number of logic stages that a signal propagation path crosses.

[0026] Figure 4 illustrates a set of timing constraints for an integrated circuit including four- phase level-sensitive latches, where n is the number of logic stages that a signal propagation path crosses, k is the number of clock phases, m is the number of phases wherein a clock signal stays high.

[0027J Figure 5 illustrates a signal arrival time computation method for an integrated circuit including multi-phase level-sensitive latches, wherein after setting arrival times for primary inputs, for each clock cycle c from 0 to n, for each phase p from 0 to k, arrival times of clock signals .of phase p in cycle c are set, and subsequent signal arrival times are computed.

[0028] Figure 6 illustrates a signal arrival time computation method for an integrated circuit including multi-phase level-sensitive latches, wherein after initial signal arrival time computation after setting arrival times for ail primary inputs and clock signals of all phases in the first cycle, for each clock cycle c from 0 to n, for each phase p from 0 to k, arrival times of clock signals of phase in cycle c advance to the next cycle, and subsequent signal arrival times are computed.

[0029] Figure 7 illustrates level-sensitive latch removal and insertion around the start of a timing-critical path.

[Ό03Θ] Figur e 8 illustrates level -sensitive latch removal and insertion around the end of a timing- critieal path.

[0031] Figure 9 gives a timing error predicting apparatus in the form of finite state machine that predicts upcoming timing errors by sampling signals from different logic stages in an integrated circuit including multi-phase level-sensitive latches.

[0032] Figure 10 gives a timing error detecting apparatus which compares the input and output signals of a level- sensitive latch when the said level-sensitive latch and its preceding ievel- sensitive latch are both opaque in an integrated circuit including multi-phase level-sensitive latches.

[0033] Figur e 1 1 gives an overall optimization method for an integrated circuit including multiphase level-sensitive latches. [0034] ' While the invention may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but to the contrary, the intention is to cover ail modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

Detailed Description of the Illustrative Embodiment

[0035] It is to be understood that the present invention is not limited to particular devices or systems, which may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a", "an", and "the" includes singular and plural references unless the content clearly dictates otherwise.

[0036] Figure 1 shows a level-sensitive latch that is transparent when the clock signal□ is high. Its setup time and hold time are characterized relative to a closing clock edge, which are the minimum amount of time that a data signal must be held steady before and after the closing clock edge, respectively. Time borrowing occurs when a signal arrives at the data pin I) later than an opening clock edge. The borrow time is the gap between the data signal arrival time and the opening clock signal arrival time. The maximum borrow tune is the gap between the opening clock edge and the closing clock edge minus the setup time,

[0037] Figure 2 gives that, in static timing analysis, for setup time cheek, the worst-case signal arrival time at a level-sensitive latch output is given by the maximum of (I) the signal arrival time at a data input pin plus the signal propagation delay </,<· q from the data input pin to the output pin, and (2) the arrival time of a clock signal□ that turns the latch transparent plus the clock signal propagation delay d e k- q from the enable pin to the output pin.

[0038] In a multi-phase latch-based integrated circuit, a signal may propagate through a sequence of latches of multiple phases. When a clock signal turns a latch transparent, a signal propagates through it (usually across a combinational logic network) and reaches another latch that is opaque and will turn transparent at a subsequent cl ock phase. If the signal arrives before the receiving latch turns transparent, the signal waits before its subsequent propagation. If the signal arrives after the receiving latch turns transparent and while the receiving latch is transparent, i.e., time borrowing occurs, the signal continues propagate across the next combinational logic stage. If the signal arrives after the receiving latch turns transparent at a subsequent clock phase and then opaque, the signal cannot propagate through the receiving latch. It must wait for one more clock cycle to propagate through the receiving latch, in the meantime it may be destroyed by the logic computation in the subsequent clock cycle, causing a logic error. (0039] Consequently, a signal is required to arrive at a receiving level-sensitive latch before it rums opaque. The setup time characterizes the minimum allowable gap between a data signal arrival time at a level-sensitive latch and the arrival time of a clock edge that turns the level- sensitive latch opaque. Further, a data signal must remain stable after a clock edge that turns a levekseasitive: latch opaque. The hold time characterizes the minimum amount of time that a data signal must remain stable after a clock edge that turns a level-sensitive latch opaque.

{0040} Figure 3 gives a two-phase level-sensitive latch-based integrated circuit with two complementary clock signals, A signal propagation delay across the first logic stage is denoted tfj. The latest signal arrival time at a receiving level-sensitive latch at an output of the first logic stage is one clock cycle time minus the setup time of the receiving level-sensitive latch, allowing the signal to set up before the receiving level-sensitive latch turns opaque. The earliest signal arrival time at the recei ving level-sensiti ve latch must be larger than the hold time of the receiving level-sensitive latch, otherwise, the early signal may destroy the signal in the previous cycle. Any clock skew and uncertainty must be included, e.g., in the clock arrival time in these constraints.

[0041] Figure 4 gives a four-phase level-sensitive latch-based integrated circuit. Along any data signal propagation path, the level-sensitive latches turn transparent subsequently: after a level- sensitive latch tons transparent, a data signal propagates from it across a logic network, while at the output of the logic network a receiving level-sensiti ve latch turns transparent in a subsequent clock phase. Each level-sensitive latch is transparent for two consecutive clock phases and opaque for two consecutive clock phases. The signal propagation delay across the first logic stage must be smaller than three cl ock phases minus the setup time of the receiving level- sensitive latch, allowing the signal to set up before the receiving level-sensitive latch turns opaque; it further must be larger than one negative clock phase plus the hold time at the receiving level-sensitive latch, allowing the previous signal to hold after the receiving level- sensitive latch turns opaque. Such a hold time constraint is usually easily met as it requires the signal propagation delay be larger than a negative amount. While any clock skew and/or uncertainty needs to be included.

£0042} In general, let k be the number of clock phases, m be the number of clock phases wherein a level-sensitive latch is opaque, and n be the number of logic stages that a data propagation path crosses, the setup and hold tuning constraints in a multi-phase level-sensitive latch-based integrated circuit can be found in Figure 4. Specifically, as n increases by one, the maximum and minimum allowable data propagation path delay across n logic stages increase by the duration of a clock phase, respectively. This holds for a multi-phase level-sensitive latch-based circuit wherein a signal propagates through a sequence of level-sensitive latches which are controlled by clock signals of consecutive phases. In some cases, some phases may be skipped in a signal propagation path. More generally speaking, the setup time constraint for a signal arriving at a level- sensitive latch is given by the subsequent clock signal that turns the latch opaque, while the hold time constraint is given by the previous clock signal that turns the latch opaque. (0043] As a signal propagates more logic stages and level-sensitive latches which are controlled by clock signals of multiple phases in different clock cycles, the signal propagation path gets longer, and the average signal propagate path delay across a logic stage gets a tighter setup time constraint For example, from the inequalities in Figure 3 and Figure 4, as n approaches infinity, the setup time constraint for an average path delay in a logic stage approaches the clock cycle time minus the setup time, which is the same as in a flip-flop-based integrated circuit. However, such a long signal propagation path rarely occurs. In. static timing analysis, a data signal propagates through a level-sensitive latch only if it arrives at the level-sensitive latch later than the clock signal that turns the level-sensitive latch ijansparent; otherwise, the worst-case signal arri val time at a level-sensitive latch output is given by the arrival time of the clock signal that turns the level-sensitive latch transparent,

[0044] To compute signal arrival times across level-sensitive latches, one needs to identify all time borrowing occurrences, for that all clock signals need to be given. The arrival times of signals which are launched by level- sensitive latches of different phases or in different clock cycles need to be separated in static timing analysis. Further, the clock signals need to be updated for different clock cycles. Figure 5 gives a straight-forward method which is to advance in the time domain, e.g., first all clock signals of the first phase in the first clock cycle arrive, which triggers signal propagation across clock networks, at corresponding latch outputs, across corresponding combinational logic networks, , and across any transparent level-sensitive latches, before the clock signals of the next phase arrive.

[0045] Figure 6 gives another computation method. First the arrival times are set for all primary inputs and clock signals of all phases in the first clock cycle, then signal arrival times are computed across clock networks, at latch outputs, and across combinational logic networks. Subsequently the arrival times of the clock signals of the first phase in a list of clock phases which are sorted in an ascending order in the time domain advance to the next clock cycle. Then signal arrival times are computed across clock networks, at latch outputs, and across

combinational logic networks again. Note that these new signal arrival times are from signal propagation across two logic stages and need to be separated from the previous signal arrival, times which are from signal propagation across one logic stage. They have different setup time requirements. Subsequently for each clock phase p in the list of clock phases which are sorted in an ascending order in the time domain, the clock signals of phase p advance to the next clock cycle, and signal arrival times are computed across clock networks, at latch outputs, and across combinational logic networks. Note that this method is based on the assumption ..that a signal propagates through a sequence of level -sensitive latches of which clock signals arrive in an ascending order in the time domain.

(0046] After identifying a timing-critical signal propagation path, this invention modifies the netlist of an integrated circuit to improve its timing performance. In one embodiment, a timing- critical signal propagation path is shortened by level-sensitive latch retiming or moving a level- sensitive latch across a logic gate in the netlist. Such a logic gate can be at the start of a timing- critical path, as Figure 7 illustrates. To move a level-sensitive latch from, an input of a logic gate to its output(s) while keeping the sequence of level-sensitive latches in any signal propagation path controlled by the same sequence of clock signal s of the same phases , respectively, one needs to remove all the level-sensitive latches L in the fanin cone of the logic gate which send a signal to the logic gate without crossing any other latch, and insert a level-sensitive latch controlled by the same clock signal of the same phase as the removed level-sensitive latches at any fanout F from which a signal may propagate to other logic gates out of the fanin cone.

[Q047J Alternati vely, as Figure 8 illustrates, a timing-critical signal propagation path may be shortened by moving a level-sensitive latch from the output of a logic gate at the end of the path to its input(s). To keep the sequence of level -sensitive latches in any signal propagation path controlled by the same sequence of clock signals of the same phases, respectively, one needs to remove all the level-sensitive latches L in the fanout cone of the logic gate which the logic gate sends a signal to without crossing any other latch, and insert level-sensitive latches controlled by the same clock signals of the same phase as the removed l atches at any side input / from which a signal may propagate to a logic gate in the fanout cone.

[0048] In another embodiment, modification, of the netlist of an integrated circuit includes insertion of an apparatus that identifies occurrences of signal propagation through a timing- critical path, such that better-than- worst-case design can be applied, which allows a signal propagation path delay to exceed the clock cycle time while invoking a timing error recovery or correction mechanism at any occurrence of signal propagation through the timing-critical path. Figure 9 illustrates such an apparatus which includes a finite state machine that takes input signals from combinational logic networks across multiple clock phases. Such a finite state machine can be constructed by first identifying timing-critical paths across one or more logic stages in an integrated circuit including multi-phase level- sensitive latches, subsequently selecting a subset of the side inputs of the timing-critical paths, and designing a finite state machine that outputs logic one after receiving a sequence of signals from the selected side inputs across multiple phases in a. given pattern. For a logic gate in a signal propagation path, a side input is an input of the logic gate that is not in the path. For a signal to propagate through the path, a side input needs to take a non-controlling logic value with respect to the logic gate. For a signal to propagate through a timing-critical path across multiple phases, the side inputs need to take non-controlling logic values in specific phases, respectively. For example, Figure 9 shows two AND logic gates in a timing critical signal propagation path that crosses the two logic stages. The two side inputs of the two AND logic gates need to take logic one in two consecutive phases. A finite state machine can be constructed to output logic one in case that the. two side inputs take logic one in two consecutive phases.

{O049J The occurrence probability for a signal to propagate through a path is given by the occurrence probability for all the side inputs of the path to take respecti ve non-controlling logic values, and can be approximated by some of the least occurrence probabilities for a side input to take the respecti ve non-controlling logic value- As a result, the side inputs of the least occurrence probabilities to take respective non-controlling logic values are better to be selected as the input signals of a finite state machine in a timing error predicting apparatus, such that the occurrence probability for the apparatus to predict a timing error is minimized, while the sil icon cost of the apparatus is also minimized due to a small number of selected side inputs. Such an apparatus predicts every occurrence of signal propagation through a given timing-critical path, while any false alarm leads to performance degradation but not logic error.

[0050] Figure 10 gives a timing error detecting apparatus in an integrated eircuit wherein a signal may propagate through a sequence of level-sensitive latches which are controlled by clock signals in four phases. Such a sequence of latches turn transparent subsequently, and turn opaque subsequently. Each level-sensitive latch remains opaque in two consecutive phases. When a level-sensitive latch is opaque while its preceding level-sensitive latch is also opaque, there should be no signal arriving in the combinational logic network between the two opaque level- sensitive latches; otherwise, such a signal is a late arriving signal that cannot go through the receiving level-sensitive latch and will be destroyed in the subsequent logic computation. Such a late arriving signal can be detected by a timing error detecting apparatus that includes a group of comparators each taking the input and output of a level-sensitive latch at the end of a timing- critical signal propagation path. Or, in case that a signal may be too late for such an apparatus at the end of a timing -critical path to catch, an additional level-sensitive latch controlled by the same clock signal as of the le vel-sensi tive latch at the end of the path can be inserted in t he middle of a timing-critical path with a timing error detecting apparatus.

[0051] Insertion of such a timing error detecting apparatus includes identification of a group of timing-critical path which delays are within a timing margin for a given group of clock signals. A detected timing error may invoke a timing error recovery mechanism such as relaunching the faulty computation, or a timing error correction mechanism such as shifting the clock phases to capture a late signal.

[0052| For such an integrated circuit with a timing error predicting/detecting apparatus which invokes a timing error recovery/correetion mechanism for functional correctness, its average performance is given by the occurrence probability of predicted/detected timing errors, its performance without timing error occurrence, and its performance at the occurrence of a timing error including any timing error recovery/correction-indueed performance degradation.

Subsequently, a combinational logic network in such an integrated circuit can be optimized for a statistical objective such as average performance while taking the occurrence probability of predicted/detected timing errors into consideration. For example, a timing-critical path of a tiny occurrence probability for a signal to propagate through it is suitable for better-than- worst-case design by inserting a timing error predicting/detecting apparatus around it; while a timing-critical path of a significant occurrence probability for a signal to propagate through it is better for a combinational logic optimization or latch removal and insertion method to reduce its signal propagation delay. [0053] Figure 11 gives an overall optimization method for an integrated circuit including multiphase latches. Specifically, for the top timing-critical path identified by the said timing analysis method based on the said timing constraints, its delay can be reduced by either (1) latch removal and insertion at the start or end of the timing-critical path, (2) insertion of a timing error predicting or detecting apparatus which enables better-than-worst-case design possibly with a timing error recovery or correction mechanism, or (3) combinational logic optimization. The optimizer chooses the best optimization solution for each timing-critical path based on the performance improvement and the cost. The optimization process proceeds to the next timing- critical path, until all design requirements are met.

[0054] The preferred embodiment of the present invention, an optimizer for an integrated circuit including multi-phase latches, is thus described. In this patent, certain U.S. patents, U.S. patent applications, and other materials (e.g., articles) have been incorporated by reference. The text of such U.S. patents, U.S. patent applications, and other material is, however, only incorporated by reference to the extent that no conflict exists between such text and the other statements and dr awings set forth herein. In the event of such conflict, then any such conflicting text in such incorporated by reference U.S . patents, U.S. patent applications, and other materials is specifically not incorporated by reference in this patent.

[0055] Further modifications and alternative embodiments of various aspects of the invention will.be apparent to those skilled in the art in view of this description. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the general manner of carrying- out the invention. It is to be ' understood that the forms of the invention shovyn and described herein are to be taken as examples of embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed, and certain features of the lm¾ntion may be utilized independen tly, al l as would be apparent to one skilled in the art after having benefit of this description of the

invention- Changes may be made in the elements described herein without depar ting from the spirit and scope of the invention as described in the following claims.