Title:
INTEGRATED CIRCUIT AND METHOD FOR DATA TRANSFER IN A NETWORK ON CHIP ENVIRONMENT
Document Type and Number:
WIPO Patent Application WO/2006/048826
Kind Code:
A1
Abstract:
The present invention relates to an integrated circuit comprising a plurality of modules (P, C; MA, SL) coupled to an interconnect means (N) for a transaction-based communication between each other via connections (C1, C2) over the interconnect means (N). A plurality of shells (NIS) are each associated to one of said plurality of modules (P, C; MA, SL) for managing the communication between said plurality of processing modules (P, C; MA, SL) and said interconnect means (N), for inspecting messages received through a first connection (C1), and for forwarding a first part of said received messages through at least a second connection (C2) to at least a second one of said plurality of modules (P, C; MA, SL).

Inventors:
GANGWAL OM P (NL)
RIJPKEMA EDWIN (NL)
Application Number:
PCT/IB2005/053573
Publication Date:
May 11, 2006
Filing Date:
November 02, 2005
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
GANGWAL OM P (NL)
RIJPKEMA EDWIN (NL)
International Classes:
H04L12/56
Foreign References:
US6182183B1 2001-01-30
US6311212B1 2001-10-30
Other References:
BOLOTIN E ET AL: "QNoC: QoS architecture and design process for network on chip", JOURNAL OF SYSTEMS ARCHITECTURE, ELSEVIER SCIENCE PUBLISHERS BV., AMSTERDAM, NL, vol. 50, no. 2-3, February 2004 (2004-02-01), pages 105 - 128, XP004492175, ISSN: 1383-7621
DIELISSEN J ET AL: "Concepts and Implementation of the Philips Network-on-Chip", -, 13 November 2003 (2003-11-13), XP002330547
Attorney, Agent or Firm:
Eleveld, Koop J. (AA Eindhoven, NL)
Claims:
1. Integrated circuit, comprising: a plurality of modules (P, C; MA, SL) coupled to an interconnect means (N) for a transaction-based communication between each other via connections over the interconnect means (N), a plurality of shells (NIS) each associated to one of said plurality of modules (P, C; MA, SL) for managing the communication between said plurality of processing modules and said interconnect means (N), for inspecting messages received through a first connection, and for forwarding a first part of said received messages through at least a second connection to at least a second one of said plurality of modules.
2. Integrated circuit according to claim 1, wherein each of said shells (NIS) is adapted to inspect header information of said received messages and to select the first part of messages to be forwarded according to said header information.
3. Integrated circuit according to claim 2, wherein said header information comprises additional bits indicating whether parts of the messages are to be forwarded.
4. Integrated circuit according to claim 3, wherein said additional bits further indicate which part of the message is to be forwarded.
5. Integrated circuit according to claim 1, wherein said shell (NIS) comprises a connection table (CT) for storing forward information based on which the shell (NIS) decides to forward the first part of the messages.
6. Integrated circuit according to claim 1, wherein said first part of messages represents synchronization information (S1, S2) and a second part of messages not forwarded represents data (D1, D2) to be transferred to said processing module associated to said shell (NIS).
7. Data processing system comprising at least one integrated circuit according to any one of claims 1 to 6.
8. Method for transferring data in a network on chip environment having a plurality of modules (P, C; MA, SL) coupled to an interconnect means (N) for a transaction based communication between each other via connections over the interconnect means (N), comprising the steps of: managing the communication between said plurality of modules and said interconnect means, inspecting messages received through a first connection, and forwarding a first part of said received messages through at least a second connection to at least a second one of said plurality of modules.
Description:
Integrated circuit and method for data transfer in a network on chip environment

The present invention relates to an integrated circuit and a method for data transfer in a network on chip environment.

The concept of buffering is used within the area of streaming applications to adjust or compensate for different rates of data production and data consumption. Typically, such systems are modeled using the so-called Kahn process network (KPN) model as described in G. Kahn, "The semantics of a simple language for parallel programming", in Information Processing, J. L. Rosenfeld (ed.), 1974. Possible implementations of these KPN models are C-HEAP as described by Om Prakash Gangwal, Andre Nieuwland, and Paul Lippens, "A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems", in ISSS01 (International Symposium on System Synthesis), 2001, or ECLIPSE as described by M. J. Rutten, J. T. J. van Eijndhoven, E. J. D. Pol, E. G. T. Jaspers, P. van der Wolf, O. P. Gangwal, A. Timmer, "Eclipse: a heterogeneous multiprocessor architecture for flexible media processing", IEEE Design and Test of Computers, vol. 19, no. 4, pp. 39-50, July-Aug. 2002. The underlying idea of these methods is to separate the data transfer and the data synchronization between producers and consumers of data. Accordingly, the granularity of the data transfers and the granularity of the respective synchronization between the producer and the consumer are made independent, which is advantageous in the case of video processing applications where synchronization points may range widely from one block of 8x8 pixels to one field or one frame. A typical synchronization action consists of updating administrative information regarding the produced or consumed data and forwarding a signal to the respective consumer or producer. A memory for this administrative information may be provided centrally as a shared memory or distributed at places adjacent to the producer and/or consumer.

According to C-HEAP, the data information and the administrative information within bus-based systems are placed centrally within a shared memory as shown in Fig. 6. A data transfer between a producer P and a consumer C is performed via the bus B and the shared memory M, i.e. there is no direct data transfer between the producer and the consumer, while the synchronization signals S1, S2 are directly transferred between the producer and the consumer. In other words, the data transfer actions D1, D2 and the signaling actions S1, S2 are separated. Due to the nature of the shared infrastructure, the single shared bus, the data transfer actions and the signaling actions occur within a specified order, i.e. D1 -> S1 -> D2 -> S2. The producer P writes data D1 into the memory M, awaits an acknowledgement and issues a signaling action S1 to the consumer to notify the consumer that there is data present in the memory M. Thereafter, the consumer C accesses the memory by the data transfer action D2 in order to read the memory. When the data D1 previously written into the memory M by the producer P has been read or has been accessed by the consumer C, a respective signaling action S2 is issued directly to the producer P via the shared bus B in order to notify the producer P that the data has been read or accessed by the consumer.

An alternative architecture, namely a network on chip environment, is shown in Fig. 7. Within this network on chip four different connections must be implemented, i.e. D1, D2 for the transfer of the data and S1, S2 for the respective signaling. Due to the nature of these separate connections for data and signaling, the producer P first has to verify that the data D1 has actually been written into the memory M before it can forward a signal S1 to the consumer C. Therefore, some kind of mechanism must be put in place to tag the data D1 and to wait for a response, like a tag acknowledge, from the memory M in order to verify that the data has finally reached the memory M. The round-trip latency within a network on chip is usually high, e.g. 100 cycles. The round-trip latency from the producer to the memory and back to get the required acknowledgement may be acceptable for larger blocks, i.e. a larger granularity, but for a smaller granularity like a 16x16 block such a high latency or waiting time is not acceptable. Note that there may be other latencies present in the system that have the same order of magnitude.

It is therefore an object of the invention to provide a reduced latency for a data transfer as well as an improved network resource utilization within a network on chip environment.

This object is solved by the integrated circuit according to claim 1 as well as by the method for data transfer within a network on chip environment according to claim 8.

Therefore, an integrated circuit is provided comprising a plurality of modules coupled to an interconnect means for a transaction-based communication between each other via connections over the interconnect means. A plurality of shells are each associated to one of said plurality of modules for managing the communication between said plurality of processing modules and said interconnect means, for inspecting messages received through a first connection, and for forwarding a first part of said received messages through at least a second connection to at least a second one of said plurality of modules.

Accordingly, already created connections may be used to forward data to further modules such that the required number of connections can be reduced, in order to implement a transfer of data to multiple targets.

According to a further aspect of the invention each of said shells is adapted to inspect header information of said received messages and to select the first part of the messages to be forwarded according to said header information. Hence, the information which message is to be forwarded is stored in the header information of the messages. According to a further aspect of the invention additional bits further indicate which part of the message is to be forwarded. Hence, the information which part of the message is to be forwarded is stored in additional bits in the header of the messages.

According to still a further aspect of the invention the first part of messages represents synchronization information and a second part of messages not forwarded represents data to be transferred to said processing module associated to said shell.

Guarantees for the bandwidth and the throughput can thus be provided at data transfer level. The low bandwidth stream for the synchronization signaling and the high bandwidth stream for the data transfer are merged or combined. Throughput guarantees can thus be provided at a channel level (a KPN or C-HEAP channel level) for the Quality of Service QoS, i.e. predictability at the application level. Hence, the guarantees are translated from network level to application level.

The invention is also related to a method for transferring data in a network on chip environment having a plurality of modules coupled to an interconnect means for a transaction-based communication between each other via connections over the interconnect means. The communication between said plurality of modules and said interconnect means is managed. Messages received through a first connection are inspected. A first part of said received messages is forwarded through at least a second connection to at least a second one of said plurality of modules.

The invention is based on the idea to use existing connections within a network on chip environment, which are used to transfer first data from a first to a second module indirectly via a third module, to communicate second data between the first and second module.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiment(s) described hereinafter.

Fig. 1 shows a basic arrangement of a network on chip environment,

Fig. 2 shows a basic block diagram of a network on chip environment according to a first embodiment of the invention,

Fig. 3 shows a basic block diagram of a network on chip environment according to a second embodiment,

Fig. 4 shows an illustration of the communication within a network on chip according to Fig. 2,

Fig. 5 shows a basic block diagram of a network on chip environment according to a third embodiment,

Fig. 6 shows a basic block diagram of a bus based system according to prior art, and

Fig. 7 shows the basic block diagram of a network on chip environment according to prior art.

The following embodiments relate to systems on chip, i.e. a plurality of modules on a single chip or on multiple chips communicate with each other via some kind of interconnect. The interconnect is embodied as a network on chip NoC. The network on chip may include wires, buses, time-division multiplexing, switches, and/or routers within a network. At the transport layer of said network, the communication between the modules is performed over connections. A connection is considered as a set of channels, each having a set of connection properties, between a first module and at least one second module. For a connection between a first module and a single second module, the connection comprises two channels, namely one from the first module to the second module, i.e. the request channel, and a second from the second to the first module, i.e. the response channel. The request channel is reserved for data and/or messages from the first to the second module, while the response channel is reserved for data and/or messages from the second to the first module. However, if the connection involves one first and N second modules, 2*N channels are provided.
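The channel count described above can be illustrated with a minimal Python sketch. The names `Channel` and `build_connection` are hypothetical and not taken from the application; the sketch only assumes that every second module gets exactly one request channel and one response channel.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    source: str
    target: str
    kind: str  # "request" or "response"


def build_connection(first: str, seconds: List[str]) -> List[Channel]:
    """Return the channels of one connection (illustrative model only).

    For each second module there is a request channel (first -> second)
    and a response channel (second -> first), i.e. 2*N channels in total.
    """
    channels = []
    for second in seconds:
        channels.append(Channel(first, second, "request"))
        channels.append(Channel(second, first, "response"))
    return channels
```

With one first module and three second modules this yields six channels, matching the 2*N rule stated in the text.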

The modules as described in the following can be so-called intellectual property blocks IPs (computation elements, memories or subsystems which may internally contain interconnect modules) that interact with the network at network interfaces NI. A network interface NI can be connected to one or more IP blocks. Similarly, an IP block can be connected to more than one network interface.

Fig. 1 shows a basic block diagram of a network on chip according to a first embodiment. In particular, a master module MA and a slave module SL each with an associated network interface NI are depicted. Each module MA, SL is connected to a network N via its associated network interface NI, respectively. The network interfaces NI are used as interfaces between the master and slave modules MA, SL and the network N. The network interfaces NI are provided to manage the communication between the respective modules MA, SL and the network N, so that the modules can perform their dedicated operation without having to deal with the communication with the network or other modules. The network comprises a plurality of interconnected routers R. The routers R serve to forward packets (commands and data) to the next router R or to a network interface NI. For more details on a router architecture please refer to Rijpkema et al., "A Router Architecture for Networks on Silicon", Proceedings of Progress 2001, 2nd Workshop on Embedded Systems, or "Trade Offs in the Design of a Router with Both Guaranteed and Best-Effort Services For Networks on Chip", by Rijpkema et al. in Design, Automation and Test in Europe Conference and Exhibition (DATE'03), March 03 - 07, 2003, Munich, Germany.

The network interfaces NI are designed to provide services at the transport layer in the OSI reference model as it represents the first layer with services which are independent of the network implementation. This improves the decoupling between the computation and the communication, allowing the IP blocks and the network to be designed independently from each other. Transport layer services are provided by defining connections like a point-to-point connection or a multicast connection with specific properties, e.g. throughput, ordering etc. The communication between the IP blocks over the network is performed on the basis of a transaction-based protocol. Here, the master IP block issues a request message req which is executed by the slave IP block as addressed in the request message req. The slave IP block can respond with a response message resp. The request message req may constitute write or read commands at an address and possibly data. The response message resp may indicate the status of the command execution like an acknowledgement of the transaction execution and possibly also some data.
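As a rough illustration of the transaction-based protocol described above, the following Python sketch models a request message carrying a command, an address and optional data, and a response message carrying a status and optional data. The classes `Request`, `Response` and `SlaveMemory` and their fields are assumptions made for this sketch; the application does not define a concrete message layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    command: str               # "write" or "read" (assumed command set)
    address: int
    data: Optional[int] = None


@dataclass
class Response:
    ok: bool                   # status of the command execution
    data: Optional[int] = None


class SlaveMemory:
    """Hypothetical slave IP block that executes request messages."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def execute(self, req: Request) -> Response:
        if req.command == "write" and req.data is not None:
            self.cells[req.address] = req.data
            return Response(ok=True)          # acknowledgement only
        if req.command == "read" and req.address in self.cells:
            return Response(ok=True, data=self.cells[req.address])
        return Response(ok=False)             # unknown command or address
```

A master would issue `Request("write", 0x10, 7)`, receive an acknowledging response, and could later read the value back with `Request("read", 0x10)`.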

The network interface NI is designed to convert the packet-based communication of the network to the higher-level protocol of the IP modules. The network interface NI mainly comprises a network interface kernel and a network interface shell NIS. The kernel packetizes the messages and schedules them to the routers. The shell NIS implements the connections, transaction ordering and higher-level issues which are specific to the protocol of the IP module. The shell sequentializes commands and flags, addresses and write data in request messages and desequentializes messages into read data and write responses.

The signals to be transferred over the network are sequentialized into request and response messages req, resp which are supplied to the network and transported as packets. Each packet consists of several flits, which constitute the minimal transmission unit. This packetization is performed by the network interface and is therefore transparent to the IP modules. The communication over the network N is performed via connections. These connections may be unicast (i.e. one master, one slave), multicast (i.e. one master, multiple slaves, each slave executing each transaction), or narrowcast connections (i.e. one master, multiple slaves, each transaction being executed by only one slave).
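The splitting of a sequentialized message into flits can be sketched as follows. The flit size of three words and the function names `packetize` and `depacketize` are assumptions for illustration; the application does not specify a flit size or a packet header format, and this sketch omits packet headers altogether.

```python
from typing import List, Sequence


def packetize(words: Sequence[int], flit_size: int = 3) -> List[List[int]]:
    """Split a sequentialized message into flits of at most flit_size words."""
    if flit_size <= 0:
        raise ValueError("flit_size must be positive")
    return [list(words[i:i + flit_size]) for i in range(0, len(words), flit_size)]


def depacketize(flits: Sequence[Sequence[int]]) -> List[int]:
    """Reassemble the original word sequence on the receiving side."""
    return [word for flit in flits for word in flit]
```

Because the receiving network interface reverses the operation, the IP module sees only the original message, which is what makes the packetization transparent to it.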

Figure 2 shows a schematic block diagram of a network on chip environment according to a first embodiment. A producer P of data and a consumer C of data are connected to a memory M via a network N (or a network on chip). The memory M is coupled to the network via a network interface NI. This network interface NI may be part of the network on chip or may be arranged between the network and the memory M. The producer P and the consumer C can be so-called intellectual property blocks or IP modules (computation elements, memories, or subsystems which may internally contain interconnect modules) that may be connected with the network at network interfaces NI as well.

According to the first embodiment the data transfer and the synchronization signaling are performed through the same connection. The producer P creates a first connection C1 to the memory M to transfer data to the memory M, while the consumer C creates a second connection C2 to the memory M to access the memory M. The creation of the first and second connections C1, C2 is necessary for the data transfer in the first place. It should be noted that no additional or dedicated connection is created for the synchronization signal, i.e. the data as well as the synchronization signaling are transferred via the same connection. In other words, the messages relating to the data D1 to be transferred are sent through the first connection C1 from the producer P to the memory M, then the messages relating to the synchronization S1 are sent to the memory M. After the reception of the messages relating to the synchronization S1 by the memory M these messages S1 are forwarded to the consumer C through the already created second connection C2. The consumer C accesses the memory M via the second connection C2 and issues synchronization information S2 through the second connection C2 to the memory M which forwards these messages S2 to the producer P. Therefore, no additional network resources are required for sending the synchronization information S1, S2.
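The message flow of the first embodiment can be sketched in Python as follows. This is a simplified model under stated assumptions: messages are tuples tagged `"data"` or `"sync"`, the two connections are named `"C1"` and `"C2"`, and the class `MemoryNode` with its `outbox` queues is hypothetical. The sketch relies on messages of one connection arriving in the order they were sent, so a synchronization message can only arrive after the data that precedes it.

```python
from collections import deque


class MemoryNode:
    """Hypothetical model of the memory M together with its network interface."""

    def __init__(self):
        self.cells = {}
        # One outgoing queue per already existing connection.
        self.outbox = {"C1": deque(), "C2": deque()}

    def receive(self, connection, message):
        kind = message[0]
        if kind == "data":
            # Data part: stays at the memory, nothing is forwarded.
            _, address, value = message
            self.cells[address] = value
        elif kind == "sync":
            # Synchronization part: forwarded over the other connection.
            other = "C2" if connection == "C1" else "C1"
            self.outbox[other].append(message)
        else:
            raise ValueError("unknown message kind: %r" % (kind,))
```

In this model the producer sends D1 followed by S1 over C1 without waiting for an acknowledgement, and S1 appears in the C2 queue towards the consumer only after D1 has been written.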

The above scheme is advantageous as the waiting time for a confirmation of the reception of the data to be transferred to the memory M is eliminated, wherein the confirmation is needed to issue the synchronization signal. In addition, the required number of connections is reduced from 4 to 2, namely C1, C2 which are already present for the data transfer. Accordingly, costs in the network interface, e.g. for buffering, as well as in the routers are saved. If guaranteed bandwidth for a communication over the network on chip NoC is to be implemented at application level, two separate guaranteed throughput connections (one for the data transfer D1 and one for the synchronization S1) are required between the producer P and the memory M and two separate guaranteed throughput connections (one for the data transfer D2 and one for the synchronization S2) are required between the memory M and the consumer C. However, such an implementation is inefficient, as the Tag and Tagack (Tag acknowledge) exchange requires a lot of time, whereby the latency is increased. Additionally, further waiting time is required for the synchronization signaling to be conducted over a separate guaranteed throughput channel with low bandwidth. Hence, the latency is increased further. Accordingly, a larger application channel buffer latency and a larger token latency are incurred.

However, using the scheme according to the first embodiment can provide guarantees for the bandwidth and the throughput at data transfer level. The low bandwidth stream for the synchronization signaling and the high bandwidth stream for the data transfer are merged or combined, and this is implemented using the multi-target data transfer mechanism. In particular, throughput guarantees can be provided at a channel level (a KPN or C-HEAP channel level), which is a higher abstraction level. Thus throughput guarantees can be provided for the Quality of Service QoS and predictability at the application level. Hence, the guarantees are translated from network level to application level.

If C-HEAP channels are used, an ordering of the data and the synchronization signaling must be performed in the network interface.

In the first embodiment the principles of the invention have been described on the basis of a producer-memory-consumer architecture. However, a multicast architecture with more than one target or consumer is also possible, i.e. one producer, one memory and several consumers.

Figure 3 shows a basic block diagram of a network on chip environment according to a second embodiment. The elements of the network on chip correspond to those according to the first embodiment of Figure 2. In addition, the network interface comprises a network interface shell NIS. Alternatively, this shell NIS may also be next to existing network interfaces NI or between existing network interfaces and an associated IP module. Here, the information whether a message is to be forwarded is included into the data message itself, e.g. in the header of the message. This information is monitored by the shell NIS and data messages are forwarded to another network interface or another IP module. The additional bits N of the next or following destination of the message are incorporated into each data message. According to these additional bits N the shell NIS determines whether the payload Data1, Data2 comprises normal data, i.e. data which is destined for the IP module associated to the shell NIS, or whether this data is to be forwarded to another destination within the network on chip environment. If N=0 then the payload in the message is normal data and hence has reached its destination. The value of N may indicate that the Nth word after this header is the next header. This next header may contain the address of the next destination or the index of the queue of the next connection (for destination routing) of the payload or at least parts thereof. Alternatively, this next header may contain the route to the next destination (in case of source routing). The interpretation of the messages is performed by the shell NIS. For more information on the interpretation of the messages in the shell please refer to "An Efficient On-Chip Network Interface Offering Guaranteed Services, Shared-Memory Abstraction, and Flexible Network Programming", by Radulescu et al. in Design Automation and Test Conference (DATE '2004), Paris, February 2004.

After the interpretation, the shell NIS removes the first header, i.e. the head header, and filters all normal data messages and delivers them to its associated IP module. All other messages, i.e. those messages with an additional or next header, are forwarded with the additional or next header as normal header indicating the destination through the already existing connection. In other words, the data message is received by the memory M or its associated shell NIS from the producer P through the first connection C1, is interpreted by the shell NIS and is forwarded via the second connection C2 to the consumer C after the head message header or original message header is removed and the following header is made the new message header.
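One possible reading of this header inspection is sketched below in Python. It assumes that a message is a list of words, that a header is a `Header(destination, n)` tuple, and that the words between the head header and the next header are the normal data for the local IP module; `Header` and `inspect_message` are hypothetical names, and the real header encoding is not given in the application.

```python
from collections import namedtuple

# Hypothetical header word: a destination plus the additional bits N.
Header = namedtuple("Header", ["destination", "n"])


def inspect_message(words):
    """Split a received message into (local payload, message to forward).

    n == 0: the whole payload is normal data for the associated IP module.
    n  > 0: the n-th word after the head header is the next header; the head
            header is removed and the rest is forwarded with the next header
            acting as the new message header.
    """
    if not words or not isinstance(words[0], Header):
        raise ValueError("message must start with a header")
    head = words[0]
    if head.n == 0:
        return list(words[1:]), []
    if head.n >= len(words) or not isinstance(words[head.n], Header):
        raise ValueError("n does not point at a next header")
    local = list(words[1:head.n])
    forwarded = list(words[head.n:])
    return local, forwarded
```

For a message `[Header("M", 3), "d1", "d2", Header("C", 0), "s1"]` the shell at M would keep `d1` and `d2` and forward `[Header("C", 0), "s1"]`, which the shell at C then delivers in full.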

Figure 4 shows an illustration of the communication within a network on chip according to Fig. 2. In particular, the different possibilities to forward selected data messages to a further destination within the network on chip are depicted. Firstly, messages may be forwarded fwd as soon as the data messages indicating a read request req are received by a network interface. Secondly, the data messages are forwarded if the received request req is accepted by the associated IP module. Thirdly, the data messages are forwarded fwd as soon as the associated IP module returns a response resp.

However, the third option requires that a response resp is returned to the producer P before the data messages can be forwarded to the consumer C. Accordingly, this option imposes a stricter condition on the forwarding.

Although the forwarding of particular data messages has been described according to the second embodiment with reference to a producer-consumer concept, the forwarding of data messages may also be implemented for a multicast concept, i.e. more than one target, p1-m1-c1-m2-c2-c3. Here, a data message from the producer P1 may have the following structure:

({h1, n1=3}, d1, d2,

{h2, n2=4}, d3, d4, d5,

{h3, n3=0}, {h4, n4=0}, {h5, n5=1}, d6)

Every element above represents a word within a data message. The elements d1, d2 are transported as indicated by the header h1. The elements d3, d4 and d5 are delivered as indicated by the header h2. If d3 constitutes synchronization information, then a consumer will be notified that d1 and d2 have been stored in the memory m1.

In order to implement a multicast concept, the headers are implemented such that all destinations are targeted. Those data messages which are required for the multicast should not be removed from the message. This can be achieved by an additional bit in the header indicating whether the message is to be removed or not. E.g. the data may be moved after the next header in every destination node or network interface. Accordingly, if the initial message is (H1, D, H2, H3, H4) then after the first destination the message will have the following format (H2, D, H3, H4).
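The rewriting of a multicast message at each destination can be sketched as follows. The function `multicast_hop` and the convention that header words are strings starting with "H" are assumptions made only for this illustration; the sketch also assumes that the data is always kept, i.e. the "do not remove" bit is set.

```python
def multicast_hop(message, is_header=lambda word: word.startswith("H")):
    """Process one destination of a multicast message.

    The head header is removed, the data is delivered locally, and the data
    is moved after the next header so that the next destination receives it
    too. Returns (delivered data, message to forward or [] at the last hop).
    """
    body = message[1:]                       # drop the head header
    split = 0
    while split < len(body) and not is_header(body[split]):
        split += 1
    data, remaining = body[:split], body[split:]
    if not remaining:                        # last destination reached
        return list(data), []
    return list(data), [remaining[0], *data, *remaining[1:]]
```

Applied to `["H1", "D", "H2", "H3", "H4"]`, the first hop delivers `["D"]` and forwards `["H2", "D", "H3", "H4"]`, which is the format given in the text; repeating the step delivers D at every remaining destination.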

Figure 5 shows a basic block diagram of a network on chip environment according to a third embodiment. The elements of the network on chip correspond to those according to the first embodiment of Figure 2 and the second embodiment according to Figure 3. Here, connection tables CT are used to store the information for the forwarding of the messages. The connection tables CT may be implemented in the network interfaces NI or next to the network interfaces NI. The connection table stores the information which connections are connected to each other at a particular network interface. E.g. in Figure 1 the first and second connections C1, C2 are connected to each other.

The third embodiment differs from the second embodiment in that the forwarding information is not stored in the messages themselves but in the network interfaces or the tables next to the network interfaces. The processing of the respective messages according to the third embodiment is performed as described according to the second embodiment.
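A connection table lookup of this kind might look like the following Python sketch. The table layout (a mapping from an incoming connection to the list of connections it is linked to) and the function name `forward_via_table` are assumptions; the sketch further assumes that the part of the message to be forwarded has already been identified.

```python
from typing import Dict, List, Tuple


def forward_via_table(
    connection_table: Dict[str, List[str]],
    incoming: str,
    message: str,
) -> List[Tuple[str, str]]:
    """Return (outgoing connection, message) pairs for one received message.

    The forwarding decision comes from the table held at the network
    interface, not from bits carried in the message itself.
    """
    return [(out, message) for out in connection_table.get(incoming, [])]


# Hypothetical table for the memory-side interface: C1 and C2 are linked.
EXAMPLE_TABLE = {"C1": ["C2"], "C2": ["C1"]}
```

With `EXAMPLE_TABLE`, a synchronization message S1 received on C1 is sent out on C2, and a connection without a table entry causes nothing to be forwarded.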

The connection tables may be mapped to a shared memory.

Although the forwarding of particular data messages has been described according to the third embodiment with reference to a producer-consumer concept, the forwarding of data messages may also be implemented for a multicast concept, i.e. more than one target, p1-m1-c1-m2-c2-c3.

In other words, the above described multi-target data transfer scheme according to the first, second and third embodiments may be implemented based on a table containing the forwarding information, by a message header containing the forwarding information or by a separate dedicated unit next to the network interface. Please note that the last option will not work for a C-HEAP channel.

In the above embodiments the principles of the invention have been described on the basis of transferring data and sending respective synchronization signals. However, the principles of the invention are also applicable to any kind of message which is sent over the connections.