Title:
MULTI-LANE DATA TRANSMISSION DE-SKEW
Document Type and Number:
WIPO Patent Application WO/2012/038546
Kind Code:
A1
Abstract:
There is described a de-skew circuit and a method of de-skewing a multi lane serial interface. De-skew can be done by delaying one of the recovered symbol clocks from a lane and using this delayed symbol clock to sample the data received from all the lanes. Thus, it is possible to mandate that the maximum possible skew is less than a symbol clock period for short interconnect lengths and also through stricter routing guidelines. The circuit and the method ensure low power and low latency de-skew of a multi-lane data transmission link.

Inventors:
BALAKRISHNAN BIPIN (NL)
GOULAHSEN ABDELAZIZ (FR)
Application Number:
PCT/EP2011/066622
Publication Date:
March 29, 2012
Filing Date:
September 23, 2011
Assignee:
ST ERICSSON SA (CH)
BALAKRISHNAN BIPIN (NL)
GOULAHSEN ABDELAZIZ (FR)
International Classes:
H04L7/00; H04L7/033; H04L25/14
Foreign References:
US20100008460A1  2010-01-14
US20030131301A1  2003-07-10
Other References:
None
Attorney, Agent or Firm:
VERDURE, Stephane et al. (52 rue de la Victoire, Paris Cedex 09, FR)
Claims:
CLAIMS

1. A method comprising:

a) receiving a plurality of symbol sequences in a first device (1), each of the symbol sequences having been transmitted in parallel by a second device (2) over a respective serial data link of a multi-lane serial interface that couples the first and second devices, and each of the symbol sequences including embedded clocking information and a plurality of instances of at least one pattern of non-data information known by the first device;

b) performing a process of de-skewing the serial data links, which comprises:

b1) extracting (26) symbol clocks respectively associated with each one of the active serial data links, based on the clocking information embedded in the symbol sequences respectively received through said active serial data links;

b2) selecting (31, 31a) an initial common sampling clock among the extracted symbol clocks respectively associated with the active serial data links,

b3) setting (31, 31b) an initial delay value for at least said common sampling clock;

b4) delaying (21) the common sampling clock by the delay value set for said common sampling clock;

b5) based on one instance of the pattern of non-data information, checking (22, 33) whether all the symbol sequences can be sampled correctly using the delayed common sampling clock; and, if not, repeating steps b3) to b5) for another instance of the pattern of non-data information, while setting another delay value at step b3) for the common sampling clock, until all the symbol sequences can be sampled correctly using the delayed common sampling clock (34) or an ultimate delay value is reached for said common sampling clock (39); and,

c) sampling (35) all the symbol sequences using the delayed common sampling clock if all the symbol sequences can be sampled correctly using said delayed common sampling clock.

2. The method of claim 1 wherein the initial common sampling clock selected at step b2) is the extracted symbol clock associated with the lowest active serial data link (Lane_0), i.e. the lowest numbered serial data link among the plurality of serial data links on which data transmission is ongoing.

3. The method of claim 2, wherein the initial delay value set at step b3) for the common sampling clock is a minimum possible delay value, and the delay value set and used at each subsequent iteration of steps b3) to b5) is incremented at each of said subsequent iterations, up to a maximum delay value tied to the maximum tolerated amount of skew.

4. The method of claim 2, wherein the initial delay value set at step b3) for the common sampling clock is a maximum delay value tied to the maximum tolerated amount of skew, and the delay value set and used at each subsequent iteration of steps b3) to b5) is decremented at each of said subsequent iterations, up to a minimum possible delay value.

5. The method of claim 1, wherein the initial common sampling clock selected at step b2) among the extracted symbol clocks respectively associated with the serial data links, is the symbol clock associated with the serial link on which the earliest transmission activity is detected.

6. The method of claim 5, wherein the initial delay value set at step b3) for the common sampling clock is a maximum delay value tied to the maximum tolerated amount of skew, and the delay value set and used at each subsequent iteration of steps b3) to b5) is decremented at each of said subsequent iterations.

7. The method of any one of claims 1 to 6, wherein if at step b5) the ultimate delay value is reached for the selected common sampling clock and still not all the symbol sequences can be sampled correctly using said delayed common sampling clock, then steps b3) to b5) are repeated for another common sampling clock selected among the remaining extracted symbol clocks respectively associated with the active serial data links.

8. The method of any one of claims 1 to 7, wherein hardware resources dedicated to the de-skewing on lanes whose symbol clock is not selected as final common sampling clock are deactivated.

9. The method of any one of claims 1 to 8, wherein the plurality of serial data links support the Mobile Industry Processor Interface, Low Latency Interface, MIPI LLI, data transfer protocol.

10. The method of claim 9, wherein only a subset of total number of serial data links in a given direction is used for data transmission depending on the bandwidth requirements.

11. A computer-readable storage medium, with computer-readable instructions stored therein for execution by a processor to perform the method of at least one of claims 1 to 10.

12. An electronic system comprising a multi-lane serial interface, a first device (1) and a second device (2) coupled by said multi-lane serial interface, wherein:

- the first device is adapted to receive a plurality of symbol sequences, each of said symbol sequences having been transmitted in parallel by the second device over a respective serial data link of the multi-lane serial interface, and each of the symbol sequences including embedded clocking information and a plurality of instances of at least one pattern of non-data information known by the first device;

- the first device comprises Clock and Data Recovery, CDR, units (26) adapted to extract symbol clocks respectively associated with each one of the serial data links, based on the clocking information embedded in the symbol sequences respectively received through said serial data links;

- the first device comprises a Lane Alignment circuit (20) having:

• a Lane and Delay Selector (23) adapted to select an initial common sampling clock among the extracted symbol clocks respectively associated with the active serial data links, and to set an initial delay value for said common sampling clock;

• Delay blocks (21) adapted to delay the common sampling clock by the delay value set for said common sampling clock under control of the Lane and Delay selector;

- a de-skew logic (22) adapted to check whether, based on one instance of the pattern of non data information, all the symbol sequences can be sampled correctly using the delayed common sampling clock; and, if not, to cause the Lane and Delay Selector set another delay value for the common sampling clock until all the symbol sequences can be sampled correctly using the delayed common sampling clock or an ultimate delay value is reached for said common sampling clock, or else to allow the sampling of all the symbol sequences using the delayed common sampling clock.

13. The electronic system of claim 12 wherein the de-skew logic is adapted to cause the Lane and Delay Selector select as initial common sampling clock the extracted symbol clock associated with the lowest active serial link (Lane_0), i.e. the lowest numbered one among the plurality of serial data links on which data transmission is ongoing.

14. The electronic system of claim 12 wherein the de-skew logic is adapted to cause the Lane and Delay Selector select as common sampling clock the symbol clock associated with the serial link on which the earliest transmission activity is detected.

15. The electronic system of any one of claims 12 to 14, wherein the de-skew logic is further adapted to, if the ultimate delay value is reached for the selected common sampling clock and still not all the symbol sequences can be sampled correctly using said delayed common sampling clock, cause the Lane and Delay Selector select another common sampling clock selected among the remaining extracted symbol clocks respectively associated with the active serial data links.

16. The electronic system of any one of claims 12 to 15 wherein the multi-lane serial interface supports the Mobile Industry Processor Interface, Low Latency Interface, MIPI LLI, data transfer protocol.

Description:
MULTI-LANE DATA TRANSMISSION DE-SKEW

TECHNICAL BACKGROUND

Technical Field

The present invention generally relates to a multi-lane data transmission de-skew process, and more specifically to a method and a circuit to ensure low latency de-skew of a multi-lane data transmission link.

It finds applications, in particular, in low latency data transfer interconnects such as MIPI LLI (Mobile Industry Processor Interface, Low Latency Interface) which can be used for instance in mobile terminal systems, e.g., cell phones, smart phones, etc. It is also appropriate for broader applications.

Related Art

The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not admitted to be prior art by inclusion in this section.

Increasing bandwidth requirements of chip-to-chip interfaces favour implementing them as serial interfaces rather than parallel interfaces. This is because parallel interfaces require a higher number of pins, which increases the cost of the system. Also, parallel interfaces are usually source-synchronous and hence transmit a clock signal along with the parallel data. The receiver samples the data using this transmitted clock. Due to routing length mismatches of traces on the PCB (Printed Circuit Board), impedance mismatches of traces, routing through vias, mismatches in chip transmit and receive I/Os, etc., the time needed by each signal (including the clock) to travel from the transmitter to the receiver varies from one signal to another. This phenomenon is known as "skew". The possible skew between the clock and data signals, and also among the different data signals, limits the maximum clock frequency at which the system may be operated, which conflicts with the requirement for high bandwidth with a lower number of pins. Serial interfaces bring down the cost, as the number of pins required for data transmission is lower, and also permit a higher operating frequency through embedded clocking. In order to cater to higher bandwidths than what is possible with a single serial signal, multiple serial signals may be coupled together to form a serial interface. Each serial signal is called a lane, and thus a serial interface comprised of multiple serial signals is called a "multi-lane serial interface". Multi-lane serial interfaces also suffer from skew between signals, which needs to be eliminated at the receiver. The process of removing skew is known as "de-skew".

Serial interfaces such as PCIe (Peripheral Component Interconnect Express), Infiniband etc., transmit a pre-determined training pattern on each serial lane. The training pattern is a group of special code words, that are encoded according to e.g. the 8b/10b encoding scheme, and that do not show up in the normal data transmission. For that reason, in the present description, such special code words shall sometimes be referred to as non data information. As the scheme name suggests, 8 bits of data are transmitted as a 10-bit entity called a symbol. The number of the code words in the training pattern depends on the skew tolerance to be achieved for the system. The receiver scans the training pattern on each lane and calculates the skew. Such a de-skew process allows for relaxed skew tolerances at the cost of added latency for data transmission.

Existing de-skew protocols/mechanisms scan for a training pattern (which usually consists of one or more 8b/10b code words or symbols) in elastic buffers which are respectively dedicated to each one of the lanes. It is only after all the active lanes have received this training pattern in their respective buffers that de-skew is done, by adjusting the read pointers for the buffers of each lane. Such a de-skew mechanism has the drawback that the latency involved in achieving data alignment among the lanes is not low. It is in the order of a few symbol clock periods, which are needed for writing data into the buffers, comparing the training pattern among the buffers, adjusting the read pointers, converting the pointers from binary to Gray code and vice versa, etc. Hence, the existing de-skew schemes are not suited for serial interfaces that need to support low latency. Secondly, such schemes are not optimal from the power efficiency perspective, as each lane has an elastic buffer which is written into using a respective independent clock and read from using a common clock. Each buffer is subject to write and read operations, and the clock driving circuitry also consumes power and cannot be turned off even after the de-skew value has been determined.

SUMMARY

The present invention proposes a data transfer system between integrated circuits and more particularly a method and circuit to deal with the reduction or removal of skew in such a data transfer system. Though embodiments shall be described herein in their application to low latency data transfer interconnects such as MIPI LLI, it will become apparent that their basic concept can be used for broader applications.

More precisely, a first aspect relates to a method comprising:

a) receiving a plurality of symbol sequences in a first device, each of the symbol sequences having been transmitted in parallel by a second device over a respective serial data link of a multi-lane serial interface that couples the first and second devices, and each of the symbol sequences including embedded clocking information and a plurality of instances of at least one pattern of non-data information known by the first device;

b) performing a process of de-skewing the serial data links, which comprises:

b1) extracting symbol clocks respectively associated with each one of the active serial data links, based on the clocking information embedded in the symbol sequences respectively received through said active serial data links;

b2) selecting an initial common sampling clock among the extracted symbol clocks respectively associated with the active serial data links,

b3) setting an initial delay value for at least said common sampling clock;

b4) delaying the common sampling clock by the delay value set for said common sampling clock; and,

b5) based on one instance of the pattern of non data information, checking whether all the symbol sequences can be sampled correctly using the delayed common sampling clock; and, if not, repeating steps b3) to b5) for another instance of the pattern of non-data information, while setting another delay value at step b3) for the common sampling clock, until all the symbol sequences can be sampled correctly using the delayed common sampling clock or an ultimate delay value is reached for said common sampling clock; and,

c) sampling all the symbol sequences using the delayed common sampling clock if all the symbol sequences can be sampled correctly using said delayed common sampling clock.

There is thus provided a clock delay based de-skew mechanism, instead of the elastic buffer based methods known in the prior art. This difference results in very low latency in de-skewing.
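As a purely illustrative sketch of steps b3) to b5), the following Python model sweeps the delay of one common sampling clock until every lane can be sampled. The function names, the representation of skew as an integer number of bit periods, and the window test that stands in for the pattern check are assumptions made for this example and are not taken from the embodiments.

```python
from typing import Optional, Sequence

# Assumption: 8b/10b coding, so one symbol clock period spans 10 bit periods.
SYMBOL_PERIOD_BITS = 10


def all_lanes_sampled_correctly(skews: Sequence[int], ref_lane: int, delay: int) -> bool:
    """Hypothetical stand-in for the pattern check of step b5).

    skews[i] is the arrival time of lane i, in bit periods. Lane i holds the
    known pattern during [skews[i], skews[i] + SYMBOL_PERIOD_BITS). The common
    sampling clock is the symbol clock of `ref_lane`, delayed by `delay`.
    """
    sample_time = skews[ref_lane] + delay
    return all(s <= sample_time < s + SYMBOL_PERIOD_BITS for s in skews)


def find_delay(skews: Sequence[int], ref_lane: int = 0,
               max_delay: int = SYMBOL_PERIOD_BITS - 1) -> Optional[int]:
    """Increment the delay from 0 until all lanes sample correctly.

    Returns the first working delay, or None once the ultimate delay value
    (max_delay) has been tried without success.
    """
    for delay in range(max_delay + 1):  # b3) set the delay, b4) delay the clock
        if all_lanes_sampled_correctly(skews, ref_lane, delay):  # b5) check
            return delay
    return None


# Lane 0 arrives first; lanes 1 to 3 arrive 3, 1 and 2 bit periods later.
print(find_delay([0, 3, 1, 2]))  # prints 3
```

In this simplified model the sweep stops at the smallest delay that covers the latest lane, which is why the result stays below one symbol clock period whenever the skew itself does.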

For short interconnect lengths, and also through stricter routing guidelines, it is possible to mandate skew tolerances on the order of hundreds of picoseconds (which corresponds to a duration of less than one symbol clock period, i.e. one 8b/10b code word duration, at high serial speeds of 5 GHz) so that data transfer with low latencies is possible. Since the maximum possible skew is less than a symbol clock period, de-skew can be done by delaying one of the recovered symbol clocks from a lane and using this delayed symbol clock to sample the data received from all the lanes.
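The orders of magnitude can be checked with a short calculation. The sketch below assumes that the "5 GHz" figure denotes a line rate of 5 Gbit/s and that 8b/10b coding is used (10 bits per symbol); the function names are illustrative only.

```python
def bit_period_ps(line_rate_gbps: float) -> float:
    """Duration of one bit on the serial lane, in picoseconds."""
    return 1000.0 / line_rate_gbps


def symbol_period_ps(line_rate_gbps: float, bits_per_symbol: int = 10) -> float:
    """Duration of one symbol (one symbol clock period), in picoseconds."""
    return bits_per_symbol * bit_period_ps(line_rate_gbps)


def skew_within_one_symbol(skew_ps: float, line_rate_gbps: float) -> bool:
    """True if the skew is shorter than one symbol clock period."""
    return skew_ps < symbol_period_ps(line_rate_gbps)


print(bit_period_ps(5.0))                  # 200.0
print(symbol_period_ps(5.0))               # 2000.0
print(skew_within_one_symbol(500.0, 5.0))  # True
```

Under these assumptions a skew of a few hundred picoseconds is well below the 2 ns symbol clock period.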

Assuming that the serial data links are numbered by decreasing order of their exhibited amount of skew, the initial common sampling clock selected at step b2) may be the extracted symbol clock associated with the lowest active serial data link, i.e. the lowest numbered serial data link among the plurality of serial data links on which data transmission is ongoing.

In this case, the initial delay value set at step b3) for the common sampling clock may be the minimum possible delay value, and the delay value set and used at each subsequent iteration of steps b3) to b5) is incremented at each of said subsequent iterations, up to a maximum delay value tied to the maximum tolerated amount of skew. In a variant, the initial delay value set at step b3) for the common sampling clock is a maximum delay value tied to the maximum tolerated amount of skew, and the delay value set and used at each subsequent iteration of steps b3) to b5) is decremented at each of said subsequent iterations, down to a minimum possible delay value.

In other embodiments, the initial common sampling clock selected at step b2) among the extracted symbol clocks respectively associated with the serial data links, is the symbol clock associated with the serial link on which the earliest transmission activity is detected.

In this case, the initial delay value set at step b3) for the common sampling clock is preferably a maximum delay value tied to the maximum tolerated amount of skew, and the delay value set and used at each subsequent iteration of steps b3) to b5) is decremented at each of said subsequent iterations.

In all of the above cases, the method allows de-skew tolerances of sub-symbol clock durations, e.g. the duration of a bit clock period, whereas elastic buffer based methods of the prior art restrict the granularity to integral multiples of the symbol clock period.

According to embodiments, if at step b5) the ultimate delay value is reached for the selected common sampling clock and still not all the symbol sequences can be sampled correctly using said delayed common sampling clock, then steps b3) to b5) may be repeated for another common sampling clock selected among the remaining extracted symbol clocks respectively associated with the active serial data links.
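The fallback to another lane's symbol clock can be pictured as an outer loop around the delay sweep. In the hypothetical Python model below, `check` is a caller-supplied function standing in for the pattern check of step b5); its name and signature, like the rest of the code, are assumptions made for illustration. Passing an increasing or a decreasing sequence of delays covers the incrementing and decrementing variants.

```python
from typing import Callable, Iterable, Optional, Tuple


def select_clock_and_delay(
    active_lanes: Iterable[int],
    delays: Iterable[int],
    check: Callable[[int, int], bool],
) -> Optional[Tuple[int, int]]:
    """Try each candidate lane clock in turn, sweeping the delay for each.

    check(lane, delay) must return True when all lanes can be sampled
    correctly with the symbol clock of `lane` delayed by `delay`.
    Returns (lane, delay) for the first success, or None if every lane
    reaches its ultimate delay value without success.
    """
    delay_list = list(delays)  # reused for every candidate lane
    for lane in active_lanes:          # b2) select a common sampling clock
        for delay in delay_list:       # b3) set the delay value
            if check(lane, delay):     # b4) and b5) delay the clock and check
                return lane, delay
    return None


def toy_check(lane: int, delay: int) -> bool:
    """Toy check: only the clock of lane 2, delayed by 1 unit, works."""
    return lane == 2 and delay == 1


print(select_clock_and_delay([0, 1, 2, 3], range(4), toy_check))  # (2, 1)
```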

Advantageously, hardware resources dedicated to the de-skewing on lanes whose symbol clock is not selected as final common sampling clock can be deactivated.

Indeed, once the lane whose symbol clock is used as the sampling clock for all lanes has been selected, the symbol clock drivers and the delay logic of all other lanes, for example, can be shut down so as to save power. The unused delay elements in the lane whose clock was selected can also be shut down. The de-skewing process therefore exhibits lower power consumption when unused clock drivers and delay logic are shut down in this way.
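A minimal sketch of such a power-save decision is given below. It assumes, for illustration only, that each lane has a delay chain of `chain_length` elements and that a delay of `selected_delay` units uses the first `selected_delay` elements of the chain; the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def power_save_plan(num_lanes: int, selected_lane: int, selected_delay: int,
                    chain_length: int) -> Tuple[List[bool], List[bool]]:
    """Return which resources stay powered once de-skew has converged.

    The first list has one entry per lane: True if the symbol clock driver
    and delay logic of that lane stay on. The second list has one entry per
    delay element of the selected lane: True if that element is still in use.
    """
    lane_enable = [lane == selected_lane for lane in range(num_lanes)]
    element_enable = [i < selected_delay for i in range(chain_length)]
    return lane_enable, element_enable


lanes_on, elements_on = power_save_plan(num_lanes=4, selected_lane=2,
                                        selected_delay=3, chain_length=8)
print(lanes_on)     # [False, False, True, False]
print(elements_on)  # [True, True, True, False, False, False, False, False]
```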

In one example of implementation, the plurality of serial data links support the Mobile Industry Processor Interface, Low Latency Interface, MIPI LLI, data transfer protocol.

In some embodiments only a subset of total number of serial data links in a given direction is used for data transmission, depending on the bandwidth requirements. Namely, only part of the total number of serial data links may be active. The method according to embodiments as defined above is then carried out only for the active serial links, i.e. lanes. The method thus adapts with scaling of active lanes, when some lanes are turned off or turned on.

A further aspect relates to a computer-readable storage medium, with computer-readable instructions stored therein for execution by a processor to perform the method of the first aspect above.

According to a third aspect, there is further proposed an electronic system comprising a multi-lane serial interface, a first device and a second device coupled by said multi-lane serial interface, wherein:

- the first device is adapted to receive a plurality of symbol sequences, each of said symbol sequences having been transmitted in parallel by the second device over a respective serial data link of the multi-lane serial interface, and each of the symbol sequences including embedded clocking information and a plurality of instances of at least one pattern of non-data information known by the first device;

- the first device comprises Clock and Data Recovery, CDR, units adapted to extract symbol clocks respectively associated with each one of the serial data links, based on the clocking information embedded in the symbol sequences respectively received through said serial data links;

- the first device comprises a Lane Alignment circuit having:

- a Lane and Delay Selector adapted to select an initial common sampling clock among the extracted symbol clocks respectively associated with the active serial data links, and to set an initial delay value for said common sampling clock;

- Delay blocks adapted to delay the common sampling clock by the delay value set for said common sampling clock under control of the Lane and Delay selector;

- a de-skew logic adapted to check whether, based on one instance of the pattern of non data information, all the symbol sequences can be sampled correctly using the delayed common sampling clock; and, if not, to cause the Lane and Delay Selector set another delay value for the common sampling clock until all the symbol sequences can be sampled correctly using the delayed common sampling clock or an ultimate delay value is reached for said common sampling clock, or else to allow the sampling of all the symbol sequences using the delayed common sampling clock.

According to embodiments, the de-skew logic is adapted to cause the Lane and Delay Selector select as initial common sampling clock the extracted symbol clock associated with the lowest active serial link, i.e. the lowest numbered one among the plurality of serial data links on which data transmission is ongoing.

In a variant, the de-skew logic is adapted to cause the Lane and Delay Selector select as common sampling clock the symbol clock associated with the serial link on which the earliest transmission activity is detected.

In some embodiments, the de-skew logic is further adapted to, if the ultimate delay value is reached for the selected common sampling clock and still not all the symbol sequences can be sampled correctly using said delayed common sampling clock, cause the Lane and Delay Selector select another common sampling clock selected among the remaining extracted symbol clocks respectively associated with the active serial data links.

For instance, the multi-lane serial interface may support the Mobile Industry Processor Interface, Low Latency Interface, MIPI LLI, data transfer protocol.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:

- FIG.1 is a block diagram of a chip-to-chip multi-lane serial interface;

- FIG.2 is a schematic block diagram of a lane alignment circuit, in one example corresponding to an interface with four lanes;

- FIG.3 is a schematic block diagram of a state machine describing an example of a method used to select the amount of delay for the lane whose symbol clock is chosen as the sample clock; and,

- FIG.4 is a schematic block diagram of a state machine describing an example of a method used to select the amount of delay and the lane whose symbol clock is chosen as the sample clock.

DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to FIG.1, there is shown therein a block diagram of a chip-to-chip serial interface made up of multiple lanes. It shall be understood, however, that embodiments of the multi-lane serial interface may be implemented in any manner suitable for the specific implementation, for instance for data transmission between IP blocks within a single System-on-Chip (SoC).

The shown system consists of two chip nodes, which are marked as local node 1 and remote node 2 in FIG.1.

FIG.1 shows a number N+1 of serial data links in a given direction (from local node to remote node) and a number M+1 of serial data links in the reverse direction (from remote node to local node), where N and M are integral numbers, to convey that the invention is valid for any generic multi-lane configuration. In what follows, the serial data links of which the multi-lane serial interface is made up at the physical level, and the lanes, i.e. the serial signals transmitted through such interface, may be referred to without specific distinction. Stated otherwise, for the purpose of the present description, the terms "serial data links" and "lanes" may be used as equivalents for designating the plurality of parallel channels used for simultaneously transmitting serial data.

In the shown example, the N+1 lanes and the M+1 lanes in the given and reverse directions, respectively, are labelled as Lane_0 to Lane_N and as Lane_0 to Lane_M, respectively. This numbering of the lanes may be fully arbitrary. In one embodiment, however, the lanes are ranked (and this ranking is reflected by their numbering) by order of their associated skew, which is in some way related to their respective length at the physical level. For instance, Lane_0 may be the most skewed lane and Lane_N or Lane_M may be the least skewed lane.

As shown in FIG.1, each node may implement a data transfer protocol such as MIPI LLI, and the protocol is shown as consisting of multiple layers which can be based on the OSI (Open System Interconnection) protocol stack. Such a protocol stack usually consists of at least a Physical (Phy) layer and a Data Link (DL) layer on top of the Physical layer. Protocols such as MIPI LLI and MIPI UniPro have an intermediate Physical Adapter (PA) layer which abstracts away the details of the Physical layer, such as the number of lanes, from the upper layers. This also permits supporting multiple Physical layers such as MIPI M-PHY or MIPI D-PHY, and using a corresponding Physical Adaptation layer. As can be understood by one of ordinary skill in the art, the system of FIG.1 closely resembles a system with the MIPI LLI protocol using a MIPI M-PHY as the Physical layer. In the shown example, the protocol stack further comprises a Transaction layer on top of the Data Link layer. The Physical Adapter layer, Data Link layer and Transaction layer form the Upper Protocol layers of the protocol stack.

The data transfer protocol stack usually consists of a Receiver (Rx) and Transmitter (Tx) functionality. This is shown by independent transmitter blocks 11 and 21, and receiver blocks 12 and 22, respectively, within each one of nodes 1 and 2 in FIG.1. The transmitter block 11 of node 1 transmits data over serial links, e.g. point-to-point links, that couple the nodes 1 and 2, said data being received by receiver block 22 of node 2. Conversely, the receiver block 12 of node 1 receives data over serial point-to-point links that couple the nodes 1 and 2, said data being transmitted by transmitter block 21 of node 2.

Let us first concentrate on the transmitter side, namely, on blocks 11 and 21 of nodes 1 and 2, respectively.

When the transmitter Physical Adaptation layer is present, it is responsible for taking the data from the upper layer and striping it (usually at byte granularity) across the active lanes when more than one serial lane is present. In implementations wherein there is no Physical Adaptation layer, this functionality can be done by the Physical layer itself. This feature can also be integrated into the Data Link layer; hence, the location of the data striping logic in no way restricts the present disclosure to the shown embodiments.
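By way of illustration, byte-granularity striping and the matching reassembly at the receiver can be sketched as follows. The round-robin assignment of bytes to lanes and the function names are assumptions made for this example; the description does not prescribe a particular striping order.

```python
from typing import List


def stripe(data: bytes, num_lanes: int) -> List[bytes]:
    """Distribute bytes over the active lanes in round-robin order.

    Byte k of `data` is sent on lane k % num_lanes.
    """
    return [data[lane::num_lanes] for lane in range(num_lanes)]


def unstripe(lanes: List[bytes]) -> bytes:
    """Reassemble the original byte stream from de-skewed lane data."""
    out = bytearray()
    longest = max((len(lane) for lane in lanes), default=0)
    for i in range(longest):
        for lane in lanes:
            if i < len(lane):  # trailing lanes may carry one byte fewer
                out.append(lane[i])
    return bytes(out)


lanes = stripe(b"ABCDEFG", 3)
print(lanes)            # [b'ADG', b'BE', b'CF']
print(unstripe(lanes))  # b'ABCDEFG'
```

The reassembly only yields the original stream if byte i of every lane belongs to the same transmitted group, which is precisely what lane alignment has to guarantee.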

The Physical layer then receives the data allocated to each lane, serializes it using a SERIALIZER unit (formed by blocks depicted as "SER_*" in FIG.1), and transmits the serialized data on each of the active serial lanes. It should be noted that there shall be as many SER_* units as there are serial lanes, namely N+1 such units for data transmission from node 1 to node 2, and M+1 such units for data transmission from node 2 to node 1.

The purpose of using more than one active serial lane is to cater to higher bandwidths than what is possible to be serviced with just one serial lane. Such serial interfaces are called multi-lane serial interfaces.

Clocking information is embedded into the serialized data by using well known embedded clocking schemes such as the 8b/10b encoding scheme which has been presented in the introductory part of the present description, or any of its known equivalents. In the example, words of 8 data bits are encoded into symbols of 10 bits. It follows that, in this example, the symbol clock period is ten times the bit clock period.

Let us turn now to the receiver side, namely, to blocks 12 and 22 of nodes 1 and 2, respectively.

The receiver Physical layer consists of Clock and Data Recovery (CDR) units 26, preferably one such CDR unit per lane, as shown in FIG.1 . Each CDR unit extracts the bit clock and also the serial data and outputs parallel data (depicted as "Data_*" in FIG.1 ) and a corresponding clock called symbol clock (depicted as "SymC_*" in FIG.1 ), for the associated lane.

In the example considered here, the "Data_*" signal is 8 bits wide and the corresponding symbol clock frequency is one-tenth of the serial bit clock frequency. It shall be appreciated that wider "Data_*" signals are possible with corresponding scaling of the symbol clocks and the width currently shown herewith is for illustrative purposes only. Each CDR unit further outputs a Data Ready signal (depicted as "DRdy_*" in FIG.1 ) which, when asserted, indicates that received data is valid. Still further, the CDR unit also outputs a bit clock (depicted as "BitC_*" in FIG.1 ).

Due to the mismatches in routing and various other reasons, examples of which have been listed in the introductory part of this description, there may be static variation in the arrival times of the serial data from each of the lanes. This static variation in arrival time is known as skew. In a multi-lane serial interface such as the one shown in FIG.1, skew is inevitable and has to be removed at the receiver before the bytes that were striped over the different lanes can be reassembled. This functionality is handled by embodiments described herein and is carried out by the lane alignment circuit 13 or 23, present in node 1 or node 2, respectively, as shown in FIG.1.

It shall be appreciated that, while an example of a data transmission system has been given above with reference to the block diagram of FIG.1 in the context of the MIPI LLI protocol, teachings of the present disclosure of embodiments are still relevant for data transfer protocols other than MIPI LLI, and also for protocols which are not based on the OSI protocol stack. Any person skilled in the art can easily modify/adapt embodiments in any manner suitable for the specific implementation, using any suitable hardware and/or software material.

FIG.2 shows an example of a lane alignment circuit 20 in further detail, in a specific implementation corresponding to a multi lane serial interface with four lanes. It will become apparent, however, that any other number of lanes may be present, depending on the specific implementation. In the shown example, the lane alignment circuit consists of four building blocks or groups of blocks, namely:

- Delay blocks 21;

- a Pattern Checking & Lane Select, Delay value Generator block 22;

- a Lane and Delay Selector & Power Save logic block 23; and,

- a Clock Buffer & Multiplexer block 24 with buffering functionality.

There is one Delay block per lane and hence there are four such Delay blocks shown in the example of FIG.2.

Each delay block is adapted to delay the symbol clock SymC_* of the corresponding lane by a given delay value, using e.g. a delay chain. The delay chain may, for instance, comprise an integral number of delay elements e.g. latches, flip-flops or similar synchronous logic gates.

Each delay element may be arranged to delay the symbol clock SymC_* by e.g. an amount of time equal to the bit transmission duration on the serial lane, i.e. to the bit clock period. As shown in FIG.2, bit clocks BitC_* are fed as inputs to the Delay blocks to illustrate one possible implementation using Flip-Flops as delay elements.

It should be noted, however, that any specific amounts of delay can be obtained by using different implementation methods. Hence, neither the granularity of delay nor the implementation is restricted by the present disclosure of embodiments.

The amount of delay applied by the Delay blocks 21 may be chosen based on respective selection signals Del_Sel[3:0] issued from the Lane and Delay Selector block 23. The width of the Del_Sel[3:0] signals is a function of the granularity of delay and hence the width of 4 bits presented herein is purely illustrative of the described examples of embodiments and is not limiting.
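By way of illustration only, the behaviour of one Delay block can be sketched in Python by modelling a symbol clock as a list of rising-edge times and shifting these edges by a whole number of bit clock periods. The function name `delay_symbol_clock`, the picosecond time base and the numeric values are assumptions made for this sketch and are not part of the described circuit; the 4-bit range merely follows the illustrative width of Del_Sel[3:0].

```python
def delay_symbol_clock(symc_edges_ps, del_sel, bit_period_ps):
    """Shift the rising edges of a symbol clock by del_sel delay elements.

    Assumption for this sketch: each delay element adds exactly one bit
    clock period, and del_sel is a 4-bit value (0..15).
    """
    if not 0 <= del_sel <= 0xF:
        raise ValueError("del_sel must fit in 4 bits")
    return [edge + del_sel * bit_period_ps for edge in symc_edges_ps]


# Hypothetical numbers: 800 ps bit period, hence an 8000 ps symbol period
# when the symbol clock runs at one-tenth of the bit clock frequency.
edges = [0, 8000, 16000]
delayed = delay_symbol_clock(edges, del_sel=3, bit_period_ps=800)
```

With these hypothetical values, every edge of the symbol clock is moved later by three bit clock periods, while the symbol clock period itself is unchanged.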

The delayed symbol clocks Clk#*_D respectively issued from the Delay blocks 21 associated with the lanes are fed to the Clock Buffer & Multiplexer block 24. The Clock Multiplexer 24 is used to select a data sampling clock AlignedSymC from among the plurality of delayed symbol clocks respectively associated with each lane. In the shown example of implementation, the select signal DELAYED_SYM_CLK_SELECT of the Clock Multiplexer block 24 is driven by the Lane and Delay Selector logic block 23. The selected symbol clock, as delayed by the corresponding one of the upstream Delay blocks 21, is to be used as the common sampling clock for sampling data.

The clock multiplexer's output is fed to external Clock Routing resources 25 as shown in FIG.2. The Clock Routing resources 25 may include an interconnect network, a wiring harness, etc., adapted to transmit the data sampling clock AlignedSymC, i.e. the delayed selected symbol clock, to any specific location where said signal is to be used in the system. In particular, the data sampling clock AlignedSymC is fed to the Pattern Checking logic 22 of the Lane Alignment circuit 20, either through the Clock Routing resources as shown, or via direct internal connection within the circuit 20.

The Lane and Delay Selector & Power Save logic block 23 may comprise registers 231 and 232 as shown in FIG.2, which can be programmed. For instance, the programmable registers 231 may contain a digital value which identifies the lane whose symbol clock should be selected as the common sampling clock used for sampling data received in all lanes. In the example where there are four lanes, a 2-bit value is sufficient for that purpose. Further, the programmable registers 232 may contain a delay value which determines the amount of delay that should be added to the selected symbol clock. The default (or reset) values set in these registers could be defined by the data transfer protocol. For instance, a value "Lane_0" could be the default lane and the amount of default delay could be equal to zero bit clock periods if the lane referred to by the value "Lane_0" is assumed to be the most skewed lane, with respect to the known topology of the electronic system.
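The role of registers 231 and 232 and of the clock multiplexer can be pictured with the following minimal Python sketch. The class and function names, as well as the clock names "Clk0_D" to "Clk3_D", are hypothetical and serve only to illustrate that a 2-bit lane value suffices for four lanes and that the selected value simply indexes the delayed clocks.

```python
from dataclasses import dataclass


def lane_select_width(num_lanes):
    """Number of bits needed to identify one lane out of num_lanes."""
    return max(1, (num_lanes - 1).bit_length())


@dataclass
class LaneDelayRegisters:
    lane_select: int = 0   # models register 231, default "Lane_0"
    delay_value: int = 0   # models register 232, default zero bit clock periods


def select_aligned_clock(regs, delayed_clocks):
    """Model of the clock multiplexer: pick the delayed clock of the selected lane."""
    return delayed_clocks[regs.lane_select]


regs = LaneDelayRegisters(lane_select=2, delay_value=3)
aligned = select_aligned_clock(regs, ["Clk0_D", "Clk1_D", "Clk2_D", "Clk3_D"])
```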

FIG.3 depicts an example of a state machine which describes the method used to select the amount of delay and the lane whose symbol clock is chosen as the common sampling clock. This state machine describes the operation of the proposed de-skew logic.

Such a method supports a power saving scheme which allows any lane to be turned off out of the total available lanes on the chip, to scale down the bandwidth if required. Also, the amount of delay is programmable as the skew of the lane chosen is not known in advance, since it depends on the level of transmission activity on the multi lane serial interface between nodes 1 and 2 of FIG.1.

It shall be noted that the names of signals depicted in FIG.3 are typed with a capital letter for enhancing clarity of the drawing and of the corresponding description which follows. These binary signals are described as being either "asserted" or "de-asserted" rather than "active high" or "active low", to indicate that proposed embodiments have no preference for active high or active low logic.

As shown in FIG.3, the state machine remains in a RESET state 30 until a Reset signal is de-asserted. This Reset signal may be a combination of a Chip Reset signal and the de-skew logic reset signal. The de-skew reset signal is de-asserted only after the number of active lanes is known.

For instance, the active lanes may be identified by readily available mechanisms such as the Link Start-up mechanism in MIPI UniPro. In other embodiments, the active lanes may be mandated by the data transfer protocol. In short, the data transfer protocol always knows how many lanes need to be active, and this information may also be used by the proposed embodiments. If the number of active lanes is changed at any point in time, then the de-skew Reset signal is asserted to reset the de-skew logic.

Once the state machine comes out of the RESET state upon the de-assertion of the Reset signal, it sets, at 31, the value of the lane whose delayed symbol clock is chosen as aligned symbol clock, i.e., common sampling clock (depicted as "AlignedSymC" in FIG.2).

In the shown example, this value may be set in register 231 to the value identifying the lowest active lane, i.e., the lane with the lowest numbered serial data link from among the plurality of serial data links that make up the multi lane serial interface.

It should be noted that the multi lane serial interface may consist of more than one serial data link (or lane) in each direction. For example, FIG.1 represents an interface made up of (N+1) lanes from node 1 to node 2 and (M+1) lanes from node 2 to node 1. The number of lanes in each direction of the multi lane serial interface depends on the bandwidth required by the application(s) using the interface, and some of the lanes can be turned off when lower bandwidths are needed. Such lanes are considered inactive as there is no data transmission on them, while active lanes are those lanes on which data transmission is ongoing. Furthermore, inactive lanes can be made active when more bandwidth is required by the application(s). For example, in FIG.1, when the bandwidth requirement from node 1 to node 2 is low, only one lane (for instance the lowest numbered lane, i.e., Lane_0) may be used for data transmission, resulting in Lane_0 being the only active lane and the remaining N lanes Lane_1 to Lane_N being inactive. Later on, for instance when high bandwidth becomes required, all of the (N+1) lanes may be used for data transmission if needed, resulting in all lanes being active and no lane being inactive. Any number of lanes from 1 to N+1 inclusive may be made active depending on the bandwidth requirements. Stated otherwise, only a subset of the total number N+1 of lanes in a direction may be used for data transmission depending on the bandwidth requirements of the running application(s).
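The notion of the lowest active lane used at step 31 can be illustrated by the short Python sketch below, in which the set of active lanes is represented by a list of booleans. This representation and the function name are assumptions of the sketch.

```python
def lowest_active_lane(active):
    """Return the index of the lowest numbered active lane.

    `active` is a list of booleans, index i standing for Lane_i.
    """
    for lane, is_active in enumerate(active):
        if is_active:
            return lane
    raise ValueError("no active lane")


# Hypothetical configuration: Lane_0 turned off, Lane_1 to Lane_3 active.
initial_lane = lowest_active_lane([False, True, True, True])
```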

In embodiments, the amount of delay may also be set to the minimum delay value, e.g. zero, in register 232.

It shall be appreciated, however, that the de-skew scheme works similarly if and when the symbol clock of any other active lane is chosen as initial common sampling clock. For instance, the symbol clock selected as common sampling clock may be the symbol clock associated with the lane on which the earliest transmission activity is detected. In that case, the initial amount of delay may be set in register 232 to the maximum delay value, which may be tied to, e.g. be equal to, the maximum tolerated amount of skew.

Once the values for the selected lane and amount of delay are set, the state machine waits in the WAIT_DRdy_All_Active_Lanes state 32 for all the data ready signals DRdy_* (see FIG.1) corresponding to each active lane to be asserted, to know whether each of them has received data.

Once all of the data ready signals DRdy_* are asserted, the state machine transitions to the PATTERN_CHECK state 33 to check whether all the data lanes can be sampled with the AlignedSymC clock. This may be achieved, for instance, using one and the same pattern of non-data information known by the receiver device, which may be transmitted on all active lanes e.g. during the training period. The sampling is considered to be correct if the value for the pattern retrieved on the receiver side matches the expected value. For instance, the pattern used may comprise special codes used by the physical layer to signify the start of a burst such as "Marker-0" in the case of MIPI M-PHY, or some special pattern that is transmitted on all active lanes by the data transfer protocol. The pattern used should preferably be transmitted on all active lanes, e.g. at regular time intervals, to speed up the de-skewing process.
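As a purely illustrative sketch, the comparison performed in the pattern checking step may be expressed in Python as follows. The value 0xA5 is an arbitrary placeholder and is not the actual encoding of any M-PHY marker; the list-based representation of lanes is likewise an assumption.

```python
def pattern_check(sampled, active, expected):
    """True if every active lane sampled the expected pattern value.

    Inactive lanes are ignored, whatever value they appear to carry.
    """
    return all(value == expected
               for value, is_active in zip(sampled, active)
               if is_active)


EXPECTED = 0xA5  # placeholder pattern value, for illustration only
ok = pattern_check([0xA5, 0xA5, 0x00, 0xA5], [True, True, False, True], EXPECTED)
bad = pattern_check([0xA5, 0x5A, 0x00, 0xA5], [True, True, False, True], EXPECTED)
```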

In the PATTERN_CHECK state 33, a test 34 is thus performed as explained above, to check whether all the lanes can be sampled correctly using the selected symbol clock as delayed by the selected delay value. If yes, then the current values stored in the registers 231 and 232 for the selected lane and selected delay value, respectively, are retained and this is indicated by asserting the Lane_delay_value_Ok signal at 35 and transitioning to the POWER_SAVE state 36. Also, before the transition to the POWER_SAVE state 36, the unused logic in the CDR blocks and Delay blocks 21 may be shut down by driving the "PD_*" signals accordingly (see FIG.2), to save power. More generally, any hardware resources dedicated to the processing of data for a lane which is not active are shut down to save power.

If the pattern is not sampled correctly on all the active lanes, this implies that the currently selected delay value is not sufficient. The signal Pattern_match_failed is therefore pulsed at step 37 to indicate to the upper layer that an opportunity to achieve de-skew was lost. Also, the amount of delay added to the selected lane's symbol clock is modified at step 38, for instance incremented by at least one delay element, e.g. one bit clock period.

After the increment, a test 39 is made to check whether the maximum delay value is exceeded. If it is not exceeded, then the skew is within the value tolerated by the data transfer protocol and the state machine goes to the WAIT_DRdy_All_Active_Lanes state 32. In this state 32, the state machine awaits the re-transmission of the pattern, which is done according to the data transfer protocol. The above loop is continued until either de-skew is achieved (i.e., there is a pattern match on all active lanes) or the amount of delay exceeds the maximum delay (which may be a function of the maximum skew tolerated) allowed by the data transfer protocol. In case it is determined at test 39 that the maximum skew is exceeded, then the Skew_violated signal is asserted at step 40 and reported to the upper data transfer protocol layers. Also, the state machine goes into the SKEW_VIOLATED state 41.

In some embodiments, the SKEW_VIOLATED state 41 may be exited only by assertion of the Reset signal, which brings the state machine into the RESET state 30. It is not a problem to reset the de-skew logic when the de-skew process is performed during a training period, since it does not jeopardize the reception of useful data.
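The loop of FIG.3 described above may be summarised, as a non-limiting sketch, by the following Python function. The callable `pattern_matches` is a hypothetical stand-in for one pass through states 32 and 33 (waiting for data ready on all active lanes, then checking the pattern); the function name, the returned tuple and the parameter names `delay_min` and `delay_max` are assumptions of the sketch.

```python
def deskew_fixed_lane(pattern_matches, delay_min, delay_max):
    """Search a delay for one selected lane, following the loop of FIG.3.

    pattern_matches(delay) returns True when the known pattern was sampled
    correctly on all active lanes with the given delay (hypothetical hook).
    Returns (final_state, delay, failed_attempts).
    """
    delay = delay_min
    failed_attempts = 0                  # counts Pattern_match_failed pulses
    while True:
        if pattern_matches(delay):       # test 34
            return "POWER_SAVE", delay, failed_attempts
        failed_attempts += 1             # step 37
        delay += 1                       # step 38: one more delay element
        if delay > delay_max:            # test 39
            return "SKEW_VIOLATED", None, failed_attempts


# Hypothetical link on which sampling becomes correct from 3 delay elements on.
result = deskew_fixed_lane(lambda d: d >= 3, delay_min=0, delay_max=15)
```

In this sketch each failed attempt corresponds to one lost opportunity, i.e. one re-transmission of the pattern that has to be awaited before the next delay value can be tried.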

The Pattern_match_failed signal, when pulsed, can be used by the local node, i.e. the receiver, to request from the remote transmitter that the pattern be re-sent, e.g. via negative acknowledgement of the data transfer protocol (such as MIPI LLI).

In order to achieve the pattern match more promptly, the PCB routing skew and Transmit skew between lanes, if known, can be used to determine the Delay_min value and also the lane selected. The Delay_max value is dependent on the number of delay elements in the Delay blocks 21 of FIG.2.

FIG.4 shows an extension of the state machine explained above with reference to FIG.3, which describes an example of a method used to select the amount of delay and the lane whose symbol clock is chosen as the common sampling clock. In FIG.4, the same elements as already shown in FIG.3 bear the same reference signs, and their description shall not be repeated.

As shown in FIG.4, the setting of the lane number identifying the initial lane whose symbol clock is selected as common sampling clock, on the one hand, and of the initially selected delay value, on the other hand, are illustrated by independent steps 31a and 31b, respectively, which replace the single step 31 represented in FIG.3. This is to convey the idea that the de-skew logic according to the embodiments considered here carries out an iterative process with two loops.

The first loop is the one which has been described above with reference to FIG.3. Steps 32, 34, 37, 38 and 39 are iterated until either de-skew is achieved (i.e., in case of pattern match on all active lanes at test 34) or the Delay_value exceeds the maximum delay allowed by the data transfer protocol at test 39.

If, on the contrary, no pattern match is achieved and the delay value (Delay_value) is determined at test 39 to exceed the maximum delay value that can be added (Delay_max, e.g. allowed by the number of delay elements available), then a test 42 is performed to determine whether the active lane considered is the last lane. As long as it is not, the lane number may be increased, e.g. by one, at step 43, Delay_value is reset to the minimum delay value (Delay_min) as indicated in FIG.4, and the first loop is repeated. This results in the symbol clock of another lane being tried as possible common sampling clock. This process may be repeated for other active lanes if necessary, in a second loop of higher level than the first loop.

This second loop of trying out each active lane, starting from the lowest active lane, is exited if the maximum active lane is reached and a pattern match still cannot be achieved. In such a case, the maximum skew is exceeded, the Skew_violated signal is asserted to notify the upper data transfer protocol layers accordingly, and the state machine goes into the SKEW_VIOLATED state 41. The SKEW_VIOLATED state is exited only by assertion of the Reset signal, thus bringing the state machine into the RESET state 30.
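The two nested loops of FIG.4 can be sketched in Python under a deliberately simplified timing model, which is an assumption of this sketch and not a statement about the actual circuit: on lane i the pattern symbol is taken to be stable from `arrival[i]` up to, but excluding, `arrival[i] + symbol_period`, all times being counted in bit clock periods, and set-up and hold margins are ignored. All names are hypothetical.

```python
def deskew_all_lanes(arrival, active, symbol_period, delay_min, delay_max):
    """Try each active lane in turn as the source of the common sampling clock.

    The candidate clock edge occurs at arrival[lane] + delay. A pattern
    match is modelled as this edge falling inside the stable window of
    every active lane. Returns (lane, delay) on success, or None when the
    last active lane is exhausted (SKEW_VIOLATED).
    """
    lanes = [i for i, is_active in enumerate(active) if is_active]
    for lane in lanes:                                   # second (outer) loop
        for delay in range(delay_min, delay_max + 1):    # first (inner) loop
            edge = arrival[lane] + delay
            if all(arrival[i] <= edge < arrival[i] + symbol_period
                   for i in lanes):
                return lane, delay
    return None


# Hypothetical arrival times (in bit clock periods) for four active lanes.
arrival = [0, 3, 1, 2]
all_on = [True, True, True, True]
found = deskew_all_lanes(arrival, all_on, symbol_period=10,
                         delay_min=0, delay_max=15)
```

With these hypothetical values, Lane_0 arrives earliest and needs three delay elements to reach the window of the latest lane. If fewer delay elements were available, the outer loop would move on to the next active lane, as described above.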

Since the skew is a static variation of delay among the plurality of lanes which stems from design choices made at e.g. the PCB routing of the electronic system, the proposed method may be carried out once e.g. during a training period, and not during normal data transfer since the lane and amount of delay have already been chosen. Thus, very low latencies can be achieved.

It should be noted that the method is described herein as starting with the minimum delay and incrementing to the maximum delay, but that any person skilled in the art can make adaptations or modifications to this method, such as starting with the maximum delay and decrementing to the minimum delay value, and so on.

Furthermore, it is also possible to design a logic which detects the earliest lane among all the active lanes, and adds the maximum tolerated amount of skew as delay to the earliest lane's symbol clock. The earliest lane may be the lane for which data are received first, i.e. the least skewed lane. Once properly delayed, its associated symbol clock can thus be used as aligned and common symbol clock for sampling data received from all lanes. A logic implementing such a scheme may be an alternative to the logic of the lane alignment circuits 13 and 23 as described above.
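This alternative scheme may be sketched in Python as follows, assuming (for the purposes of the sketch only) that the time at which transmission activity is first detected on each active lane is available as a dictionary keyed by lane number. The function and parameter names are hypothetical.

```python
def earliest_lane_setting(first_activity, max_skew_delay):
    """Pick the earliest active lane and apply the maximum tolerated skew.

    first_activity maps an active lane number to the time at which
    transmission activity was first detected on it (hypothetical input).
    Returns the (lane, delay) pair to be programmed.
    """
    lane = min(first_activity, key=first_activity.get)
    return lane, max_skew_delay


# Hypothetical detection times for active lanes 0, 1 and 3.
setting = earliest_lane_setting({0: 5, 1: 2, 3: 7}, max_skew_delay=9)
```

Unlike the iterative search, this variant requires no pattern re-transmission, at the cost of always applying the largest delay.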

Finally, the pattern checking can be done by software running on a CPU, and the lane alignment embodiment can easily be adapted accordingly. Also, the CPU can be used to program the lane number and the amount of delay in the registers of the lane alignment embodiment.

More generally, the present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in an information processing system, is able to carry out these methods. Computer program means or computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language. Such a computer program can be stored in a non-transitory manner on a computer or machine readable medium allowing data, instructions, messages or message packets, and other machine readable information to be read from the medium. The computer or machine readable medium may include non-volatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer or machine readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer or machine readable medium may comprise computer or machine readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allows a device to read such computer or machine readable information.

Expressions such as "comprise", "include", "incorporate", "contain", "is" and "have" are to be construed in a non-exclusive manner when interpreting the description and its associated claims, namely construed to allow for other items or components which are not explicitly defined also to be present. Reference to the singular is also to be construed to be a reference to the plural and vice versa.

While there has been illustrated and described what are presently considered to be the preferred embodiments of the present invention, it will be understood by those skilled in the art that various other modifications may be made, and equivalents may be substituted, without departing from the true scope of the present invention. Additionally, many modifications may be made to adapt a particular situation to the teachings of the present invention without departing from the central inventive concept described herein. Furthermore, an embodiment of the present invention may not include all of the features described above. Therefore, it is intended that the present invention not be limited to the particular embodiments disclosed, but that the invention include all embodiments falling within the scope of the invention as broadly defined above.

A person skilled in the art will readily appreciate that various parameters disclosed in the description may be modified and that various embodiments disclosed and/or claimed may be combined without departing from the scope of the invention.