Title:
METHOD AND SYSTEM FOR BYTE SLICE PROCESSING DATA PACKETS AT A PACKET SWITCH
Document Type and Number:
WIPO Patent Application WO/2007/074423
Kind Code:
A2
Abstract:
A method and system for byte slice processing of received data packets at a packet switch is provided. When a data packet is received at the packet switch, the packet is buffered by a buffer manager and then processed by a processor to determine to which outgoing port to send the packet. When aggregate bandwidth of all incoming ports is high, resources of the processor can be optimized to minimize hardware logic, and maximize a processing rate of the switch. One optimization is to limit the number of bytes that are sent to the processor based on some preliminary identification of packet type together with known configuration information about the receiving port interface type, so that only information within the data packets that is deemed necessary is sent to the processor.

Inventors:
FISCHER STEPHEN (US)
KALAMPOUKAS LAMPROS (US)
SINGH KANWAR JIT (IN)
Application Number:
PCT/IB2006/055051
Publication Date:
July 05, 2007
Filing Date:
December 28, 2006
Assignee:
UTSTARCOM INC (US)
FISCHER STEPHEN (US)
KALAMPOUKAS LAMPROS (US)
SINGH KANWAR JIT (IN)
International Classes:
H04J1/02
Foreign References:
US20020085507A1
US6160819A
US6237052B1
US20050041665A1
US6690646B1
Attorney, Agent or Firm:
MACKINNON, Charles (Rolling Meadows, Illinois, US)
Claims:

[1] A method for processing data packets received at a packet switch comprising: receiving a data packet into a port interface module of the packet switch; identifying a packet type of the data packet; based at least on the packet type, removing a portion of the data packet to reduce a size of the data packet; adding an internal packet switch header to the reduced data packet so as to form a modified data packet in order to route payload of the data packet within the packet switch; and sending the modified data packet to a processor of the packet switch.

[2] The method of claim 1, further comprising dividing the modified data packet into n portions, such that each consecutive portion includes every subsequent nth byte of the modified data packet.

[3] The method of claim 2, wherein sending the modified data packet to the processor of the packet switch comprises sending each consecutive nth portion to one of n buffer managers.

[4] The method of claim 3, wherein sending each consecutive nth portion to one of n buffer managers comprises forwarding bytes located in a specific location within the data packet to the same buffer manager so that a byte at location k within the data packet is sent to a buffer manager identified by the following equation: destination buffer manager = k mod n.

[5] The method of claim 1, wherein removing the portion of the data packet to reduce the size of the data packet comprises removing at least one packet header.

[6] The method of claim 1, wherein the internal packet switch header carries information about the data packet including a data packet type, an arriving port number, and an indication of a number of bytes to send to the processor.

[7] The method of claim 1, further comprising within the processor processing the modified data packet by removing at least one packet header to form a second modified data packet.

[8] The method of claim 7, further comprising storing the second modified data packet in memory.

[9] The method of claim 1, wherein receiving the data packet into the port interface module of the packet switch comprises receiving the data packet from one network input.

[10] The method of claim 1, wherein receiving the data packet into the port interface module of the packet switch comprises receiving the data packet from a plurality of network inputs.

[11] The method of claim 10, further comprising temporarily storing the data packet in memory.

[12] The method of claim 1, wherein the port interface module includes multiple modules and wherein the processor includes n buffer managers, and wherein the method further comprises: receiving at each module a data packet; dividing the data packet at each module into n portions; and sending from each module each consecutive nth portion to one of the n buffer managers, wherein each buffer manager has multiple input ports so that each buffer manager receives portions of the data packets from each of the modules in parallel.

Overall Byte-Slicing

[13] A method for processing data packets received at a packet switch comprising: receiving a data packet; dividing the data packet into n portions, such that each subsequent portion includes every subsequent nth byte of the data packet; processing the n portions of the data packet; and storing the processed portions of the data packet.

[14] The method of claim 13, wherein dividing the modified data packet into n portions comprises byte slicing the data packet.

[15] The method of claim 13, further comprising modifying the data packet to include an internal packet switch addressing label.

[16] The method of claim 13, further comprising modifying the n portions of the data packet by removing at least packet header data.

Egress packet processing

[17] A method for processing data packets received at a packet switch comprising: receiving n portions of a data packet, such that each subsequent portion includes every subsequent nth byte of the data packet; reading an internal packet switch header in the n portions of the data packet so as to determine an outgoing port of the packet switch to which the data packet is to be forwarded; sending the n portions of the data packet to the identified outgoing port of the packet switch; removing the internal packet switch header from the n portions of the data packet; forming the n portions of the data packet into one data packet; and sending the one data packet into a network.

[18] The method of claim 17, further comprising storing the n portions of the data packet.

Packet Switch

[19] A packet switch comprising: a port interface module for receiving a data packet and performing a pre-classification of the data packet to determine a type of processing to be performed on the data packet, wherein the port interface module provides processing instructions for the data packet according to the pre-classification by prepending signature bytes to the data packet, the port interface module further operable to slice the data packet on a byte-by-byte basis for processing; and a line card coupled to the port interface module and including: buffer manager devices, each buffer manager device receiving a slice of the data packet from the port interface module and based on information in the signature bytes, the buffer manager devices identify a number of bytes of the data packet to be processed; a header processor for receiving the number of bytes from the buffer manager devices and assembling together the bytes to form a modified data packet, wherein the header processor processes the modified data packet and sends to each buffer manager device a portion of the modified data packet; and a traffic manager for informing the buffer manager devices when to send their portion of the modified data packet across a switch fabric.

[20] The packet switch of claim 19, wherein based on the pre-classification of the data packets, the port interface module removes bytes from the data packets.

[21] The packet switch of claim 19, wherein the signature bytes carry information about the data packet selected from the group consisting of a packet type, an arriving port number, and a number of bytes to be processed within the data packet.

[22] The packet switch of claim 19, wherein the buffer manager devices each send their respective portion of the modified data packet across the switch fabric when directed to do so by the traffic manager.

Description:


Method and System for Byte Slice Processing Data Packets at a Packet Switch

FIELD OF INVENTION

[1] The present invention relates to processing data packets at a packet switch (or router) in a packet switched communications network, and more particularly, to a method of dividing data packets for buffering and processing of the data packets.

BACKGROUND

[2] A switch within a data network receives data packets from the network via multiple physical ports, and processes each data packet primarily to determine on which outgoing port the packet should be forwarded. In a packet switch, a line card is typically responsible for receiving packets from the network, processing and buffering the packets, and transmitting the packets back to the network. In some packet switches, multiple line cards are interconnected via a switch fabric, which can route packets from one line card to another. On a line card, the direction of packet flow from network ports toward the switch fabric is referred to as 'ingress', and the direction of packet flow from the switch fabric toward the network ports is referred to as 'egress'.

[3] In the ingress direction of a typical line card in a packet switch, a packet received from the network is first processed by an ingress header processor, then stored in external memory by an ingress buffer manager, and then scheduled for transmission across the switch fabric by an ingress traffic manager. In the egress direction, a packet received from the switch fabric at a line card is processed by an egress header processor, stored in external memory by an egress buffer manager, and then scheduled for transmission to a network port by an egress traffic manager.

[4] In packet switches where bandwidth requirements are high, it is common for the aggregate bandwidth of all the incoming ports to exceed the feasible bandwidth of an individual device used for buffer management. In such cases, the buffer managers typically include multiple devices to achieve the required bandwidth. The aggregate input bandwidth can be split between multiple devices in the ingress buffer manager by dividing the number of incoming ports evenly among the number of buffer manager devices. However, when there is a single incoming interface from the network to the packet switch, it can become more difficult to split the incoming bandwidth among the multiple buffering devices.

[5] One method by which incoming bandwidth from a single high speed port is split over multiple buffering devices in a packet switch is through inverse multiplexing. Inverse multiplexing will send some packets to each of the available buffering devices in the packet switch in a load-balancing manner. For example, inverse multiplexing speeds up data transmission by dividing a data stream into multiple concurrent streams that are transmitted at the same time across separate channels to available buffering devices, and are then reconstructed at the port interface into the original data stream for transmission back into the network.

[6] Unfortunately, however, existing techniques used to decide which packets should be sent to which buffering device have some disadvantages. For example, if some packets from a particular flow are sent to one buffering device, and other packets from the same flow are sent to another buffering device, then data packets will likely arrive out of order at their final destination. This requires data packet re-ordering at the destination, which adds implementation complexity if the re-ordering is accomplished at high rate incoming interfaces (such as 40Gb/s). On the other hand, if some flow identification is used so that data packets from a certain flow are always sent to the same buffering device, then it becomes difficult to evenly balance the bandwidth among the available buffering devices. Such load balancing imperfections typically lead to performance loss.

SUMMARY

[7] In exemplary embodiments, aggregate bandwidth received by a packet switch from a network over one or more incoming ports is divided or sliced on a byte-by-byte basis to be sent to multiple devices in the switch, including line card buffer managers. Exemplary embodiments are useful when the aggregate bandwidth of one or more incoming switch ports exceeds the bandwidth capabilities of an individual buffer manager device, for example. The buffer manager devices together form a single high bandwidth interface to a switch fabric.

[8] In operation, when a data packet is received on an incoming port at a packet switch from the network, the data packet is buffered by a buffer manager and then processed by an ingress packet processor in the packet switch to determine to which outgoing port to send the packet, for example. When the aggregate bandwidth of all incoming ports is high, resources of the packet processor can be optimized to minimize hardware logic, minimize cost and maximize a packet processing rate of the packet switch, for example. One optimization of ingress packet processor resources is to limit the number of bytes which are sent to the processor. For example, if it is known that the processor will only ever need to process the first 64 bytes of a data packet, then a buffer manager can send only the first 64 bytes of each packet to the processor.

[9] The number of bytes sent to the processor can be further optimized. For example, if the processor is always ignoring a certain number of bytes at the start of a data packet header, then these bytes can be removed by a port interface module or buffer manager of the packet switch prior to sending the data packet header to the processor. As a specific example, if the processor is performing a destination Internet Protocol (IP) address lookup, then an Ethernet header of a data packet (if present) is not needed by the processor. Ethernet header bytes can therefore be stripped from the data packet prior to data packet processing.

[10] Still another optimization of the number of bytes sent to the processor is possible if some data packets can be identified as requiring fewer bytes to be processed than other packets. In such a case, the buffer manager can be instructed to send a variable number of header bytes to the processor on a packet-by-packet basis. The number of bytes to send to the processor can be determined by a port interface module of the packet switch based on some preliminary identification of packet type together with known configuration information about the receiving port interface type. For example, if a data packet is identified as an Address Resolution Protocol (ARP) packet, then it is known that this data packet will be forwarded to a CPU of the packet switch, so it is sufficient to send only enough bytes to the processor to identify the packet type as an ARP packet. On the other hand, if a packet is identified as an IPv4 data packet, then it is known that the IP header of the data packet is needed to determine where the data packet should be routed, so more bytes need to be sent to the processor than for the ARP packet, for example.
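
To make this packet-by-packet decision concrete, the following Python sketch shows one way such a lookup might be expressed. It is a minimal illustration only: the EtherType constants are standard values, but the byte counts and the function shape are assumptions for illustration, not values taken from this disclosure.

    ETHERTYPE_IPV4 = 0x0800   # standard EtherType values
    ETHERTYPE_ARP  = 0x0806

    def bytes_to_send(ethertype: int, default: int = 64) -> int:
        """Decide how many leading bytes to forward to the processor,
        based on a preliminary identification of packet type."""
        if ethertype == ETHERTYPE_ARP:
            return 8      # assumed: just enough to flag the packet as ARP
        if ethertype == ETHERTYPE_IPV4:
            return 40     # assumed: signature plus the full IP header
        return default    # otherwise fall back to a fixed prefix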

[11] In general, to limit or optimize the number of bytes that are sent to the processor, the receiving port within the switch divides received data packets based on known information about the data packets, the port at which the data packets were received, etc., so that only information within the data packets that is deemed necessary is sent to the processor, for example.

[12] Therefore, in one aspect, a method for processing data packets received at a packet switch is provided. The method includes receiving a data packet into a port interface module of the packet switch, identifying a packet type of the data packet, and based at least on the packet type, removing a portion of the data packet to reduce a size of the data packet. The method further includes adding an internal packet switch header to the reduced data packet so as to form a modified data packet in order to route payload of the data packet within the packet switch, and sending the modified data packet to a processor of the packet switch.

[13] In another aspect, the method for processing data packets received at the packet switch includes receiving a data packet, and dividing the data packet into n portions such that each subsequent portion includes every subsequent nth byte of the data packet. The method further includes processing the n portions of the data packet, and storing the processed portions of the data packet.

[14] These as well as other features, advantages and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description, with appropriate reference to the accompanying drawings.

BRIEF DESCRIPTION OF FIGURES

[15] Figure 1 is a block diagram illustrating one embodiment of a communication network.

[16] Figure 2 is a block diagram illustrating one example of a packet switch.

[17] Figure 3 is a block diagram illustrating a detailed example of the packet switch.

[18] Figure 4 is a block diagram illustrating one example of a port interface module and a line card for processing received data packets.

[19] Figure 5 conceptually illustrates one example of a received data packet.

[20] Figure 6 is a block diagram illustrating one example of a port module interface.

[21] Figure 7 is a block diagram illustrating one example configuration for distributing data from a port module interface to a line card.

[22] Figure 8 is a block diagram illustrating one example of a switch fabric.

[23] Figures 9A-9C conceptually illustrate one example of a byte-slicing data packet processing technique.

DETAILED DESCRIPTION

I. Communication Network

[24] Referring now to the figures, and more particularly to Figure 1, one embodiment of a communication network 100 is illustrated. It should be understood that the communication network 100 illustrated in Figure 1 and other arrangements described herein are set forth for purposes of example only, and other arrangements and elements can be used instead and some elements may be omitted altogether, depending on manufacturing and/or consumer preferences.

[25] By way of example, the network 100 includes a data network 102 coupled via a packet switch 104 to a client device 106, a server 108 and a switch 110. The network 100 provides for communication between computers and computing devices, and may be a local area network (LAN), a wide area network (WAN), an Internet Protocol (IP) network or some combination thereof.

[26] The packet switch 104 receives data packets from the data network 102 via multiple physical ports, and processes each individual packet to determine to which outgoing port the packet should be forwarded, and thus to which device (e.g., client 106, server 108 or switch 110) the packet should be forwarded. In cases where packets are received from the data network 102 over multiple low bandwidth ports, the aggregate input bandwidth can be split to multiple devices in the packet switch 104 by dividing the number of incoming ports evenly among the packet switch components. For example, to achieve 40Gb/s of full-duplex packet buffering and forwarding through the packet switch 104, four 10Gb/s full-duplex buffer engines can be utilized. However, when there is a single, high bandwidth (e.g., 40Gb/s physical interface) incoming interface from the data network 102, incoming bandwidth into the packet switch 104 is split among multiple buffering chips using a byte-slicing technique. Thus, the packet switch 104 may provide optimal performance both in the case where a large number of physical ports are aggregated over a single packet processing pipeline, as well as where a single high-speed interface (running at 40Gb/s) needs to be supported, for example.

[27] When the aggregate bandwidth of all incoming ports at the packet switch 104 is high, the resources of the packet switch 104 can be optimized to minimize hardware logic, minimize cost and maximize packet processing rate. One optimization of packet switch 104 resources includes limiting the number of bytes that are sent to a processor in the switch. For example, if it is known that the processor will only need to process the first 64 bytes of a packet, then only the first 64 bytes of each packet can be sent to the processor. The number of bytes sent to the processor can be further optimized as follows, for example: if the processor is always ignoring a certain number of bytes at the start of the header, then these bytes can be removed by the port interface module prior to sending the header to the processor. As an example, if the processor is performing a destination IP address lookup, then the Ethernet header is not needed by the processor. The Ethernet header bytes can therefore be stripped from the packet prior to the packet being sent to the processor.

[28] A further optimization of the number of bytes sent to the processor is accomplished by having some packets initially identified as requiring fewer bytes to be processed than other packets. In such a case, a variable number of header bytes can be sent to the processor on a packet-by-packet basis. The number of bytes to send to the processor can be determined by the port interface module based on some preliminary identification of packet type together with configuration information about the port interface type. For example, if a packet is identified as an ARP packet, then it is known that this packet will be forwarded to the CPU, so it is sufficient to only send enough bytes to the processor to identify the packet type as ARP. On the other hand, if a packet is identified as IPv4, then it is known that the IP header is needed to determine where the packet should be routed, so more bytes need to be sent to the processor than for the ARP packet.

[29] The packet switch 104 supports multiple types of packet services, such as for example L2 bridging, IPv4, IPv6, MPLS (L2 and L3 VPNs), on the same physical port. A port interface module in the packet switch 104 determines how a given packet is to be handled and provides special 'handling instructions' to packet processing engines in the packet switch 104. In the egress direction, the port interface module frames outgoing packets based on the type of the link interface. Example cases of the processing performed in the egress direction include: attaching appropriate SA/DA MAC addresses (for Ethernet interfaces), adding/removing VLAN tags, attaching a PPP/HDLC header (POS interfaces), and similar processes. In-depth packet processing, which includes packet editing, label stacking/unstacking, policing, load balancing, forwarding, packet multicasting supervision, packet classification/filtering and other functions, occurs at an ingress header processor engine in the packet switch.

II. Packet Switch

[30] Figure 2 illustrates a block diagram of one example of a packet switch 200. The packet switch 200 includes port interface modules 202-210 coupled through a midplane to packet processing cards or line cards 212-220, which each connect to a switch fabric 222. The packet switch 200 may include any number of port interface modules and any number of line cards depending on a desired operating application of the packet switch 200. The port interface modules 202-210, line cards 212-220 and switch fabric 222 may all be included on one chassis, for example.

[31] Each port interface module 202-210 connects to only one line card 212-220. The line cards 212-220 process and buffer received packets, enforce desired Quality-of-Service (QoS) levels, and transmit the packets back to the network. The line cards 212-220 are interconnected via the switch fabric 222, which can switch packets from one line card to another.

[32] Figure 3 is a block diagram illustrating a detailed example of the packet switch. In Figure 3, only one port interface module 300, which is connected to a line card 302, is illustrated. The line card 302 includes an ingress buffer manager 304, an ingress header processor 306, memory 308 including ingress memory 310 and egress memory 312, an ingress traffic manager 314, an egress buffer manager 316, an egress header processor 318 and an egress traffic manager 320.

[33] The ingress buffer manager 304 receives data from the port interface module 300 and passes some or all of the data to the ingress header processor 306. The ingress header processor 306 processes header information within the data and passes the processed data back to the ingress buffer manager 304, which stores the processed data with payload packet data in the buffer memory 310. The ingress header processor 306 determines to which output port the data will be sent, and the type of processing to be performed on the data, for example. Subsequently, the ingress traffic manager 314 will direct the ingress buffer manager 304 to pass the stored data to the switch fabric.

[34] The egress buffer manager 316 will receive data from the switch fabric and pass some or all of the data to the egress header processor 318. The egress header processor 318 processes header information within the data and passes the processed data back to the egress buffer manager 316, which stores the processed data with payload packet data in the buffer memory 312. Subsequently, the egress traffic manager 320 will direct the egress buffer manager 316 to pass the stored data to the port interface module 300, which in turn, sends the data to the network.

III. Byte-Slicing Data Packet Processing

[35] Figures 4 and 5 illustrate one example operation of the packet switch. Figure 4 is a block diagram illustrating one example of a port interface module 402 and a line card 404 processing a received data packet. The line card 404 includes a buffer manager 406 with buffer manager devices #1-#4, an ingress header processor 408, an ingress traffic manager 410 and buffer memory 412. The line card 404 also includes an egress header processor and an egress traffic manager, which are not shown for ease of illustration. In one embodiment, a packet received from the network is processed by the ingress header processor 408, stored in external memory 412 by an ingress buffer manager, and scheduled for transmission across the switch fabric by the ingress traffic manager 410. In the egress direction, a packet received from the switch fabric is processed by an egress header processor, stored in external memory by an egress buffer manager, and scheduled for transmission to a network port by an egress traffic manager.

[36] Figure 5 conceptually illustrates a received data packet. In Figure 5, the data packet is shown to include header bytes H1-H16 and data bytes D1-D96. The data packet is illustrated in a column format for ease of explanation of the byte-slicing processing technique used within disclosed embodiments.

[37] The packet switch implements a packet editing language where header/data bytes may be added, deleted, or altered at different packet processing stages. The decision of how to modify and what needs to be modified is performed by programmable engines that implement the port interface module 402, the ingress header processor 408 and the egress header processor. These engines, in addition to being able to perform specific types of packet header editing, also instruct the buffer manager devices #1-#4 (which have visibility into the complete packet payload) of additional editing that needs to occur. Within the packet switch, the aforementioned packet processing engines can be implemented with programmable devices (e.g., FPGAs), while the buffer management engines can be implemented utilizing ASIC technology, for example. This approach achieves a degree of flexibility and extensibility of the switch as it allows for continuous adaptation to new services and applications, for example.

[38] Within the packet switch, in-depth packet processing (which includes packet editing, label stacking/unstacking, policing, load balancing, forwarding, packet multicasting supervision, packet classification/filtering and other functions) occurs within the line card 404. The line card 404 operates on an internal packet signature, which is the result of packet pre-classification that occurred in the port interface module 402, as well as the actual header of the packet under processing.

[39] The packet switch enforces end-to-end QoS guarantees throughout by utilizing both ingress and egress traffic management. For example, in the ingress direction, the ingress traffic manager enforces service guarantees when there is contention between multiple cards sending traffic to the same egress switch fabric port. In the egress direction, the scheduler enforces guarantees and resolves contention between traffic streams destined to the same egress physical port.

[40] Initially, the port interface module 402 aggregates incoming traffic into four 10Gb/s pipelines, for example. So, in the case of low speed interfaces, multiple ports are first aggregated into each one of these four pipelines. Then, these four pipelines are multiplexed together, utilizing a byte-sliced method, into a single 40Gb/s pipeline. With the exception of some packet pre-classification that occurs on the line card 404, the majority of packet buffering, processing, QoS enforcement, and packet switching can be accomplished at 40Gb/s speeds. The 40Gb/s processing and buffering throughput is only one example, since other speeds may be supported by alternate processors, such as speeds up to 48Gb/s, for example. Thus, for example, the packet switch can support 48 10/100/1000 Ethernet ports (without overbooking), or 96 10/100/1000BT Ethernet ports (with 1:2 overbooking), or 4 10GE/OC192 ports (without overbooking), or 8 10GE ports (with 1:2 overbooking). Overbooking refers to a configuration where the amount of total network port bandwidth exceeds the available packet processing capacity; for example, 96 1Gb/s ports aggregate to 96Gb/s of network bandwidth against 48Gb/s of processing capacity, giving 1:2 overbooking.

[41] In the ingress direction, the port interface module 402 receives the data packet and checks for L1/L2/L3 packet correctness (e.g., CRC checks, IP checksums, packet validation, etc.). Once packet correctness is established, the port interface module 402 performs a high-level pre-classification of the received data packet, which in turn, determines the type of processing/handling for the data packet. Since the packet switch supports multiple types of packet services, such as for example L2 bridging, IPv4, IPv6, MPLS (L2 and L3 VPNs), on the same physical port, the port interface module 402 determines how a given packet is to be handled and provides special 'handling instructions' to packet processing engines, such as buffer manager devices #1-#4.

[42] The packet switch utilizes a method whereby the aggregate bandwidth received from the network over one or more incoming ports is sliced on a byte-by-byte basis to be transferred concurrently to multiple devices which comprise the buffer manager. Such a method is of particular importance when the aggregate bandwidth of the one or more incoming ports exceeds the bandwidth capabilities of an individual buffer manager device, for example. By utilizing a byte-slicing based approach, multiple buffer manager devices form a single high bandwidth interface to the switch fabric.

[43] Byte slicing is accomplished by dividing each data packet into N pieces, and forwarding each piece to a different buffer manager device. An N-level slicing is accomplished by forwarding exactly 1/Nth of each data packet to a given buffer engine. Thus, an N-level slicing requires the use of N buffer management engines, and more or fewer buffer engines may accordingly be included within the line card 404. Furthermore, the slicing technique forwards bytes located in a specific location within the packet to the same buffer management engine. For example, a byte at location k within a packet is sent to a buffering engine identified by the following equation:

[44] destination buffer engine = k mod N

[45] so that, for example, using a 4-level slicing method, bytes 2, 6, 10, etc., will all be sent to the second buffering engine.
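
As an illustrative sketch of this slicing rule, the following Python fragment divides a packet so that byte k lands on buffer engine k mod N (indices counted from zero here); the function name is hypothetical.

    def slice_packet(packet: bytes, n: int) -> list[bytes]:
        """Send byte k of the packet to slice k mod n."""
        return [packet[i::n] for i in range(n)]

    # 4-level slicing: bytes 2, 6, 10, ... all land on the same engine.
    slices = slice_packet(bytes(range(16)), 4)
    assert slices[2] == bytes([2, 6, 10, 14])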

[46] Given that the port interface module 402 is capable of processing approximately 12Gb/s of full-duplex traffic, and since the architecture/implementation shown in Figure 4 utilizes four buffer manager devices and thus slices the data packet into four slices, each slice going into or out of the port interface module 402 has approximately 3Gb/s of full-duplex bandwidth capacity. Correspondingly, since each buffer manager device handles one slice worth of traffic originating from or destined to the port interface module 402, each buffer manager device is capable of buffering and forwarding approximately 12Gb/s of full-duplex capacity. Note that the implementation of the byte-slicing method described above can handle issues related to coordination, synchronization and error handling across the multiple buffer management engines that handle different pieces of the same data packet (described below). The implementation in Figure 4 is for illustration purposes only, since other alternatives are possible as well. For example, incoming/outgoing data can be arranged in lines 128 bits wide. Slicing can then be accomplished at the word level (32 bits). As such, k in the formula above would represent the kth word within the packet, and not the kth byte.

[47] As a specific example of the byte-slicing processing method, consider steps 1-9 in Figure 4. Initially, as shown at step 1 in Figure 4, a data packet is received from the network and includes data payload (e.g., D1-D96) encapsulated by one or more headers (e.g., H1-H16) containing specific information about the packet, such as the packet's type, source address and destination address, for example. The multiple headers of a packet originate from the multiple protocol layers in the data network. For example, some data to be transferred over the network may be encapsulated with a TCP header at the Transport Layer to form a TCP packet, then encapsulated with an IP header at the Network Layer to form an IP packet, and then encapsulated with an Ethernet header at the Link Layer to form an Ethernet packet.

[48] Next, at step 2 in Figure 4, the port interface module 402 performs some identification of packet type. Based on packet type and interface type, some bytes within the data packet might be removed from the start of the packet header, e.g., Ethernet header bytes can be removed from an IP packet that arrives on an IP routing interface. In the example in Figure 4, header bytes H1-H4 are removed from the packet. The port interface module 402 also prepends some 'signature' bytes to the front of the data packet to carry certain information about the packet that may only be relevant within the packet switch. In particular, the packet signature carries information about the packet type, the arriving port number, and the number of bytes to send to the ingress header processor 408, for example. In the example in Figure 4, signature bytes S1-S8 are prepended to the data packet (as shown in Figure 5). The signature bytes may include a number of fields containing other information than described herein.
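
A minimal Python sketch of step 2 follows, assuming a standard 14-byte Ethernet II header and an invented 8-byte signature layout (packet type, a reserved byte, arriving port, byte count). The actual signature field order and sizes are assumptions for illustration; the description above only names the kinds of information carried.

    import struct

    ETH_HLEN = 14  # standard Ethernet II header length

    def ingress_preprocess(frame: bytes, pkt_type: int, port: int,
                           bytes_for_processor: int) -> bytes:
        """Strip the Ethernet header and prepend an internal signature."""
        payload = frame[ETH_HLEN:]  # e.g., the encapsulated IP packet
        signature = struct.pack("!BBHI", pkt_type, 0, port,
                                bytes_for_processor)  # 8 bytes, like S1-S8
        return signature + payload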

[49] The port interface module 402 then slices the modified data packet (e.g., the signature plus packet payload) on a byte-by-byte basis to send to the buffer manager 406, as shown at step 3. For example, the data packet is divided into 4 portions and one portion is sent to each of the buffer manager devices #1-#4. Each portion includes every fourth byte of the data packet. For instance, the first portion includes bytes 1, 5, 9, ..., the second portion includes bytes 2, 6, 10, ..., the third portion includes bytes 3, 7, 11, ..., and the fourth portion includes bytes 4, 8, 12, ..., etc. Thus, as shown in Figure 5, the data packet is conceptually divided into 4 columns, and each column is then sent to a buffer manager device.

[50] Each buffer manager device receives its slice of the data packet from the port interface module 402, as shown at step 4 in Figure 4, and based on information in the packet signature, the buffer manager devices send a limited number of bytes to the ingress header processor 408. In the example shown in Figure 4, the first 4 bytes on each slice are sent in parallel to the ingress header processor 408, namely signature bytes S1-S8 and header bytes H5-H12.

[51] The number of signature and header bytes that each buffer manager extracts and subsequently sends to the ingress processing engine is determined during the packet pre-classification process that occurred within the port interface module 402. One optimization of the ingress packet processor resources is for the buffer manager 406 to limit the number of bytes that are sent to the ingress header processor 408. For example, if it is known that the ingress header processor 408 will only need to process the first 64 bytes of a packet, then the buffer manager 406 can simply send only the first 64 bytes of each packet to the ingress header processor 408. The number of bytes sent to the ingress header processor 408 can further be optimized as follows, for example: if the ingress header processor 408 is always ignoring a certain number of bytes at the start of the header, then these bytes can be removed before sending the header to the ingress header processor 408. As an example, if the ingress header processor 408 is performing a destination IP address lookup, then the Ethernet header is not needed by the header processor. The Ethernet header bytes can therefore be stripped from the packet prior to the packet signature and packet header being sent to the ingress header processor 408.

[52] The ingress header processor 408 receives the packet slices and re-forms the signature and header bytes together into one packet by joining the bytes received from each slice, as shown at step 5 in Figure 4. For example, since the original packet (and consequently the attached internal packet signature) becomes sliced into four pieces within the byte-slicing process, one fourth of the signature and header information needed resides within each one of the four buffer management engines, and thus the ingress header processor 408 re-constructs the original packet signature and packet header by assembling together the pieces that it receives from each one of the buffer manager devices. The packet header then undergoes processing by the ingress header processor 408. In particular, the ingress header processor 408 performs a lookup of the packet's destination address to determine to which outgoing port the packet should be forwarded, for example.
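
The re-assembly at step 5 is the inverse of the byte-slicing shown earlier; a self-contained Python sketch (hypothetical function name, zero-based indexing):

    def unslice(slices: list[bytes]) -> bytes:
        """Interleave n slices byte by byte to restore the original order."""
        n = len(slices)
        out = bytearray(sum(len(s) for s in slices))
        for i, s in enumerate(slices):
            out[i::n] = s
        return bytes(out)

    # Round trip with 4-level slicing, even for lengths not divisible by 4.
    data = b"0123456789abcdef!?"
    assert unslice([data[i::4] for i in range(4)]) == data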

[53] The ingress header processor 408 modifies the signature bytes to include some new information about the packet, such as information concerning the outgoing port determined from the lookup of the packet's destination address, for example. The ingress header processor might also modify, add or remove some of the header bytes, e.g., the bytes of an MPLS (multi-protocol label switching) label header can be removed from the header, or the TTL (time-to-live) byte of an IP header can be decremented. In the example shown in Figure 4, the ingress header processor 408 sends back to the buffer manager devices #1-#4 modified signature bytes S1'-S8' plus new header bytes J1-J4, as shown at step 6. The bytes are sliced into 4 portions to send a portion to each of the four buffer manager devices.
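
As one concrete instance of such a header edit, the Python sketch below decrements the IPv4 TTL and refreshes the header checksum. For brevity the checksum is recomputed over the whole header; a real datapath would more likely patch it incrementally (RFC 1624). The sketch assumes a well-formed IPv4 header with TTL > 0.

    import struct

    def ipv4_checksum(header: bytes) -> int:
        """One's-complement sum over the header, checksum field zeroed."""
        s = sum(struct.unpack("!%dH" % (len(header) // 2), header))
        while s >> 16:
            s = (s & 0xFFFF) + (s >> 16)
        return ~s & 0xFFFF

    def decrement_ttl(hdr: bytearray) -> None:
        hdr[8] -= 1                   # TTL is byte 8 of the IPv4 header
        hdr[10:12] = b"\x00\x00"      # zero the checksum before summing
        hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))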

[54] Each buffer manager device then forms a 'slice' of the data packet by joining the original slice payload bytes received from the port interface module 402 (i.e., parts of the data packet not sent to the ingress header processor 408) with the modified signature bytes and modified/new header bytes received from the ingress header processor, as shown at step 7 in Figure 4. Some of the signature bytes carry information about how many of the original bytes should be dropped when forming this slice of the packet. In the example shown in Figure 4, four bytes of the original packet are dropped on each slice, namely bytes S1-S8 and H5-H12, and are replaced with modified/new bytes S1'-S8' and J1-J4. Each buffer manager device stores its slice of the packet in external buffer memory 412.

[55] Following this, the buffer manager devices send messages to the ingress traffic manager 410 to inform the ingress traffic manager 410 of the new packets that have been stored in buffer memory. The traffic manager 410 then performs a scheduling algorithm to determine an order in which packets should be sent to the switch fabric. For example, the traffic manager 410 may send the packets to the switch fabric based on a first-in-first-out algorithm. Other algorithms may be used as well, such as a hierarchical weighted fair queuing (WFQ) scheduling or traffic shaping policy to determine the time and order in which packets should be sent to the switch fabric. The traffic manager 410 then sends messages to the buffer manager devices to inform the devices of which packet should next be sent to the switch fabric, as shown at step 8 in Figure 4.
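
The scheduling policy itself is left open above; as a purely illustrative stand-in (a toy weighted round-robin, not the hierarchical WFQ scheduler named in the text), the decision might be sketched in Python as:

    from collections import deque

    class WrrScheduler:
        """Toy weighted round-robin over per-class queues of packet ids."""
        def __init__(self, weights: list[int]):
            self.weights = weights
            self.queues = [deque() for _ in weights]

        def enqueue(self, klass: int, packet_id: int) -> None:
            self.queues[klass].append(packet_id)

        def next_batch(self) -> list[int]:
            """Release up to 'weight' packets per class to the fabric."""
            batch = []
            for q, w in zip(self.queues, self.weights):
                for _ in range(min(w, len(q))):
                    batch.append(q.popleft())
            return batch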

[56] Each buffer manager device #1-#4 reads its slice of a packet from memory and sends the packet slice across the switch fabric once the ingress traffic manager 410 schedules transmission of the packet, as shown at step 9 in Figure 4. The data packet slices are then sent across the switch fabric and will be received by an egress buffer manager of a line card in the packet switch (e.g., possibly the same line card 404).

[57] The buffering and processing of packets on egress is substantially the same as processing described on ingress at steps 3-9. Slices of packet bytes are received on egress by buffer manager devices on a line card from the switch fabric. Some bytes of these slices are sent to the egress header processor, and the egress header processor sends modified/new signature and header bytes back to the buffer manager devices. The packet slices are stored in external buffer memory, and scheduled to be sent to outgoing ports by the egress traffic manager.

[58] On egress, the port interface module removes the signature bytes of the data packet before transmitting the packet onto an outgoing port to the network. The port interface module might also need to add some header bytes to the packet, e.g., an IP packet needs to have an Ethernet header added. Thus, the port interface module frames outgoing packets based on the type of the link interface. Additional examples of the processing performed in the egress direction include: attaching appropriate SA/DA MAC addresses (for Ethernet interfaces), adding/removing VLAN tags, attaching a PPP/HDLC header (POS interfaces), and similar processes.

[59] When the sliced bytes are joined back together at the ingress header processor 408 or the port interface module 402 on egress, it is necessary that the bytes from the same packet are re-joined together. If a transmission error occurs in the sending of bytes on one slice, then a mechanism is needed to detect the error at the receiver so that the bytes received on the other slices can be discarded, and the sliced bytes from the next packet can be correctly aligned together.

[60] Protection and alignment of sliced bytes between the devices in the byte-sliced architecture can be achieved by implementing a cyclic redundancy code (CRC) to protect the data transmitted on each slice. The sliced data can be discarded upon detection of a CRC error. The signature bytes can carry a 'packet id' on each slice, which is used to align the bytes so that the bytes correspond to the same packet.
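
A minimal Python sketch of this per-slice protection, using CRC-32 and a 4-byte packet id; the on-wire layout is an assumption for illustration, since the text does not fix the CRC width or field order:

    import struct
    import zlib

    def protect_slice(slice_bytes: bytes, packet_id: int) -> bytes:
        """Prepend a packet id and append a CRC-32 over id + slice."""
        body = struct.pack("!I", packet_id) + slice_bytes
        return body + struct.pack("!I", zlib.crc32(body))

    def check_slice(wire: bytes):
        """Return (packet_id, slice) or None on a CRC error, in which
        case the corresponding bytes on the other slices are discarded."""
        body, (crc,) = wire[:-4], struct.unpack("!I", wire[-4:])
        if zlib.crc32(body) != crc:
            return None
        return struct.unpack("!I", body[:4])[0], body[4:]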

[61] Some data within received data packets may include exception traffic. Exception traffic is packet traffic that requires special handling. Processing of exception traffic is typically performed by a system CPU within the buffer manager 406. The system CPU provides a powerful and flexible way to integrate and handle exception traffic within the fast packet processing pipeline. In the ingress direction, exception traffic is identified either by the port interface modules or the header processing units. Incoming exception packets are forwarded to the buffer manager CPU through an interface that looks to the fast data packet processing pipeline like another physical port. As such, CPU traffic is subjected to powerful classification/filtering, rate limiting, and scheduling/shaping policies, for example. Packets carrying signaling, routing or other critical information can be handled with pre-configured high priority. At the same time, low priority control traffic is rate limited.

[62] In addition, the system can discriminate among control packets selectively, for each protocol or control traffic type, for example. CPU overload is controlled through the use of the hardware-powered rate limiting and shaping capabilities. The CPU can inject special packets into the ingress side of the main data packet pipeline, and the packets can be forwarded to the appropriate egress port. Such action is accomplished by making the fast data processing pipeline look to the CPU as another PCI interface. Control packets are then routed or forwarded to the egress port using the fast data packet processing pipeline, for example.

IV. Port Interface Module Architecture

[63] The packet switch can receive data packets from the data network via multiple physical ports that may be configured in many different manners. Figure 6 illustrates a block diagram of one example of a port module interface 600. The port module interface 600 may receive packets from the data network over multiple low bandwidth ports, such as four 10 Gb/s input ports or forty 1 Gb/s input ports. In such a case, the aggregate input bandwidth can be split by a port interface receiver 602 to multiple port interface controllers 604, 606, 608 and 610 by dividing the number of incoming ports evenly among the controllers 604, 606, 608 and 610. To achieve 40Gb/s of full-duplex packet buffering and forwarding through the port module interface 600, four 10Gb/s full-duplex port interface controllers can be utilized.

[64] As an example, if data is received over four 10 Gb/s input ports, one 10 Gb/s input port can be directed to each of the controllers 604, 606, 608 and 610. Alternatively, if data is received over forty 1 Gb/s input ports, then ten 1 Gb/s input ports can be directed to each of the controllers 604, 606, 608 and 610. Here, the data may then be received at the controllers 604, 606, 608 and 610 and stored in temporary fast memory before being output over the controller's ports #1-#4 based on availability.

[65] When there is a single, high bandwidth (e.g., one 40Gb/s physical interface) incoming interface from the data network, incoming bandwidth into the port interface module 600 is split among the controllers 604, 606, 608 and 610. The data may be split between the multiple buffering chips using the byte-slicing technique. For example, the first four bytes of data received can be sent to the first controller 604, the second four bytes can be sent to the second controller 606, the third four bytes can be sent to the third controller 608 and the fourth four bytes can be sent to the fourth controller 610. This process can be repeated until all bytes of an incoming data stream have been distributed. The controllers 604, 606, 608 and 610 will then process the data and add information within signature bytes, and then divide their share of the data packet into bytes to be distributed to buffer managers.
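
This chunked round-robin distribution can be sketched in a few lines of Python (hypothetical helper; four controllers and 4-byte chunks, as in the example above):

    def distribute_chunks(stream: bytes, n: int = 4, chunk: int = 4) -> list[bytes]:
        """Deal consecutive 4-byte chunks of the stream to n controllers."""
        lanes = [bytearray() for _ in range(n)]
        for i in range(0, len(stream), chunk):
            lanes[(i // chunk) % n] += stream[i:i + chunk]
        return [bytes(lane) for lane in lanes]

    # The first chunk goes to the first controller, the second to the next.
    assert distribute_chunks(bytes(range(16)))[1] == bytes([4, 5, 6, 7])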

[66] Figure 7 is a block diagram of one example configuration of a port module interface 700 coupled to a line card 702. Figure 7 illustrates an example distribution of data from the port module interface 700 to the line card 702. In this example, the port module interface 700 receives data packets from the data network over four 10 Gb/s input ports. The port module interface 700 includes four modules 704, 706, 708 and 710, each of which receives data packets from one of the 10 Gb/s input ports. Each module 704, 706, 708 and 710 slices incoming data packets on a byte-by-byte basis and distributes the bytes to the buffer manager devices as shown in Figure 7 and as described above with reference to Figure 4. For example, module 704 divides incoming packets such that a first byte is sent to buffer device 712, a second byte is sent to buffer device 714, a third byte is sent to buffer device 716, a fourth byte is sent to buffer device 718, and the dispensing of the remaining bytes repeats in this manner until all bytes of the data packet are distributed. Further, each buffer manager device has four input ports, so that each buffer manager device can receive inputs from each of the port interface modules 704, 706, 708 and 710 in parallel.

[67] Subsequently, the buffer manager devices 712, 714, 716 and 718 send the processed data packet slices to an ingress header processor 720 for processing, and then the data is stored in memory 724, 726, 728 and 730. An ingress traffic manager 722 schedules the data for transfer through a switch fabric. An egress header processor 732 and an egress traffic manager 734 process data en route to the network, as described above with regard to Figure 4.

V. Switch Fabric Architecture

[68] Referring back momentarily to Figure 2, each line card 212-220 is connected to the switch fabric 222 through an external optical backplane interface. Capacity between each line card and a switch fabric port is approximately 80 Gb/s, for example. The switch fabric interface on the line card can be implemented as a group of 32 serial lanes running at 2.5Gb/s, for example. The serial speed can be configured to run up to 3.125Gb/s per lane. The effective capacity of each switch fabric interface is approximately 60Gb/s full-duplex, for example. The switch fabric 222 can be implemented utilizing a single stage, non-blocking, bufferless architecture. Traffic arriving at a given switch fabric port is divided into a number of slices. The number of slices depends on the number and size of cross-point devices that implement switching between ports. For example, the packet switch 200 may utilize five 72-port asynchronous cross-point devices or sixteen 8-port asynchronous cross-point devices. The cross-point sizes described here are chosen based on board complexity and cost reduction. Other cross-point device sizes can be used as well, as long as the number of serial lanes supported on a single device is greater than the number of switch fabric ports that need to be switched.

[69] One example of an architecture of a switch fabric 800 is conceptually illustrated in Figure 8. Here, traffic received at port #1 of the switch fabric is switched to destination port #3. The 32 serial data lanes are switched to the destination port through five cross-point devices. The manner in which the serial data lanes are sliced across multiple cross-point devices is not significant, as long as all ports follow the same slicing scheme. The cross-point devices are reconfigured continuously based on a configuration matrix computed by the switch fabric arbitration and scheduling module (SAM) 802. The SAM 802 receives requests from each ingress switch fabric port indicating the egress ports for which that ingress port has traffic. Furthermore, for each egress port, the ingress port provides QoS related information, such as the service class of traffic to be switched from each ingress port to each egress port. The SAM 802 then computes a cross-point configuration schedule that meets the QoS requirements of switched traffic while optimizing the overall switch fabric efficiency.

[70] The switch fabric 800 may support 10 packet processing card (PPC) ports, for example. The switch fabric 800 may be configured to support more or fewer PPC ports. In Figure 8, the switch fabric 800 is shown to include four PPCs. Note also that up to three PPCs may be interconnected directly in a non-blocking configuration without a switch fabric. This option allows for a lower cost base configuration for the packet switch, for example.

[71] Each PPC card can offer up to 48Gb/s of packet processing. Traffic switched across PPC cards is forwarded to external optical backplane interfaces. No switching fabric is required when there are up to three PPC cards installed in the system. A 4-port switch fabric is needed in the case where all four PPC slots are utilized. The configuration shown in Figure 8 can include two 4x10GE/OC192 MC cards, four 48-port 10/100/1000GE cards, and 4 PPCs, for example.

[72] The switch fabric 800 further includes two slots for system controller boards (SCs) (not shown). The SC cards perform central chassis management and control. Also, a variation of the SC card, referred to as SC/SF (system controller/switch fabric), integrates a 4-port switch fabric with the SC card. The switch fabric portion of the SC card may only be needed when there are 4 PPC cards installed in the system, for example. The system does not need a switch fabric when operating with up to 3 PPC cards.

[73] The switch fabric 800 also includes six media card (MC) slots, e.g., slots 0, 2, 6, 7, 9, and 10 (not shown). MC slots can be paired together as follows, for example: MC0 (slot 0) and MC1 (slot 2) may be paired together and be controlled by either one of the PPC cards installed next to them, that is PPC0 (slot 1) and PPC1 (slot 3); MC2 and MC3 (slots 6 and 7) may be paired together and may be interfaced to PPC2 (slot 8); and MC4 and MC5 (slots 9 and 10) may be paired together and interfaced to PPC3 (slot 11).

VI. Example Operation

[74] Figure 9 illustrates a detailed example of the byte-slicing technique described herein. Figure 9A conceptually illustrates a data packet that has been received at a port module interface of a packet switch according to embodiments described herein (such as described in Figure 7, for example). The data packet includes a destination MAC address (6 bytes), an SRC MAC address (6 bytes) and a payload type (PT) identifier all included within an Ethernet header as bytes H1-H4; an IP header (20 bytes) as bytes H5-H12; a TCP header (20 bytes) as bytes H13-H16; and possibly additional headers followed by the packet payload data and CRC data, for example. Thus, the data packet includes bytes H1-H16 followed by data bytes D1-Dn.

[75] A port interface module receives the data packet and strips off the Ethernet header and CRC information, and adds data packet signature information, as shown at Figure 9B. The signature information includes information such as packet type, incoming interface/port, outgoing interface/port, error checking CRC and signature size. Now, the data packet includes bytes S1-S8 followed by H5-H16 and D1-Dn.

[76] Subsequently, the port interface module divides the data packet into bytes and distributes the bytes to buffer manager devices on a line card on a byte-by-byte basis according to embodiments described above. For example, each buffer manager device will receive a signature byte. For this reason, the signature size field may be replicated within each slice of data that is sent to each buffer manager device because the signature field indicates the number of bytes that the buffer manager device will subsequently send to the ingress header processor.

[77] The ingress header processor will receive the indicated number of bytes from the buffer manager, and the buffer manager will temporarily hold the remaining bytes. Here, the ingress header processor will receive the signature and IP header bytes. The ingress header processor will determine that this data packet is an IPv4 packet, for example, and will subsequently perform an IP address lookup. The ingress header processor will output information back to the buffer manager, such as for example, a line card destination identifier (indicating a line card to which the data packet will be sent through the switch fabric), an outgoing physical port and QoS information. The ingress header processor modifies some of the signature bytes to include this information. In addition, the ingress header processor may remove some of the header information, such as some of the IP header information, since only the destination port, the TTL field, and an IP checksum may be needed.
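
For illustration, the destination lookup described above can be mimicked with a longest-prefix match in Python. The routing entries and the (line card, port) results are hypothetical, and a real switch would implement this step in TCAM or trie hardware rather than a linear scan:

    import ipaddress

    ROUTES = {  # hypothetical forwarding table: prefix -> (line card, port)
        ipaddress.ip_network("10.0.0.0/8"):  (2, 1),
        ipaddress.ip_network("10.1.0.0/16"): (3, 0),
        ipaddress.ip_network("0.0.0.0/0"):   (0, 0),
    }

    def lookup(dst: str) -> tuple:
        """Longest-prefix match on the destination IPv4 address."""
        addr = ipaddress.ip_address(dst)
        best = max((net for net in ROUTES if addr in net),
                   key=lambda net: net.prefixlen)
        return ROUTES[best]

    assert lookup("10.1.2.3") == (3, 0)   # the /16 wins over the /8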

[78] The buffer manager will receive the modified bytes from the ingress header processor and join these bytes with the bytes temporarily on hold (e.g., bytes not sent to the ingress header processor and stored on the buffer manager). The resulting data packet is conceptually illustrated in Figure 9C. The buffer manager device then writes the data packet to external memory (either memory external to the packet switch, or within the packet switch but external to the buffer manager). The buffer manager device further informs the ingress traffic manager of the location in memory where the data packet was stored.

[79] Subsequently, the ingress traffic manager will schedule the data packet to be switched across the switch fabric to the destination line card indicated within the signature byte information. The ingress traffic manager may schedule data packet transfer across the switch fabric based on a FIFO technique or other policies, for example.

[80] The switch fabric will receive the data packet and transfer the packet to the destination line card. On egress, the destination line card will receive the data packet slices and send the slices to buffer manager devices, which communicate with an egress header processor to process the data packet slices and transfer the processed slices to the port interface module. The port interface module then reforms the slices into one data packet and forwards the data packet to the network.

[81] It should be understood that the processes, methods and networks described herein are not related or limited to any particular type of software or hardware, unless indicated otherwise. For example, operations of the packet switch may be performed through application software, hardware, or both hardware and software. In view of the wide variety of embodiments to which the principles of the present embodiments can be applied, it is intended that the foregoing detailed description be regarded as illustrative rather than limiting, and it is intended to be understood that the following claims including all equivalents define the scope of the invention.