

Title:
CONTROL MECHANISMS FOR SWITCHING DEVICES
Document Type and Number:
WIPO Patent Application WO/2004/051497
Kind Code:
A1
Abstract:
Systems for in-band control that establish relationships between incoming data and one or more destinations to which the data is to be transmitted. In the in-band control embodiments described herein, a connection is established between two ends of a circuit (210) by provisioning at one end of the circuit. A circuit connection table is kept at both ends of the circuit (210). This table contains the new and existing circuit connection information. A software process writes to the connection table at the local end of the circuit (210) with the address information of the remote end of the circuit (210). The circuit connection information is periodically sent to the remote end of the circuit to establish or tear down connections.

Inventors:
FITZGERALD JEFFREY J (US)
SAAB HICHAM (US)
ENGLISH DANIEL W (US)
MISHRA RAJAESH (US)
Application Number:
PCT/US2003/038206
Publication Date:
June 17, 2004
Filing Date:
December 02, 2003
Assignee:
CEDAR POINT COMMUNICATIONS INC (US)
FITZGERALD JEFFREY J (US)
SAAB HICHAM (US)
ENGLISH DANIEL W (US)
MISHRA RAJAESH (US)
International Classes:
G06F15/16; G06F15/167; (IPC1-7): G06F15/167; G06F15/16
Foreign References:
US 6532239 B1 (2003-03-11)
US 5781537 A (1998-07-14)
US 6598106 B1 (2003-07-22)
US 6711171 B1 (2004-03-23)
Attorney, Agent or Firm:
Kelly, Edward J. (One International Place, Boston, MA, US)
Claims:
1. A system for providing inband control for transmitting data from a local end to a remote end and for remote provisioning of circuits, comprising a plurality of storage devices at the local end, a process for sorting incoming TDM data into the plurality of storage devices as a function of a destination associated with respective TDM data, and a process for reading data from the storage devices according to a predetermined sequence, whereby data transmitted from the local end is received at the remote end in a sequence representative of data destination.
2. A system according to claim 1, further comprising a remote provisioning process for monitoring incoming data and for creating a connection table at a local end for storing information representative of memory locations at the remote end being provisioned for storing data associated with a respective one of the destinations.
3. A system according to claim 2, further comprising a process for inband transmission of data representative of the connection table to the remote end for creating a remote connection table.
4. A system according to claim 2, further comprising an initialization process for transmitting a connection table to the remote end to allow transmission of data between the local and the remote end.
5. A system according to claim 2, further comprising an update process for monitoring incoming TDM data and altering the local and remote connection tables in response to detected changes in calls being handled by the system.
6. A system according to claim 5, wherein the update process further includes a remote table update process for generating control cells for inband communication of connection table data for updating the remote connection table.
7. A system according to claim 1, wherein the remote end and the local end each include connection tables for supporting bidirectional calls.
8. A system according to claim 1, wherein the connection table stores information for supporting 1:1 switching.
9. A system according to claim 1, wherein the connection table stores information for supporting 1:N switching.
10. A system according to claim 1, wherein the connection table stores information for supporting multicast or broadcast switching.
11. A system according to claim 4, wherein the initialization process is activated as part of a failover sequence.
12. A system according to claim 5, wherein the update process is responsive to a scheduling signal that schedules updates at a rate selected to employ a predefined amount of bandwidth.
13. A process for providing inband provisioning control for a switch, comprising the steps of processing incoming TDM data to identify a number of circuits to provision for, at a local end, provisioning memory locations at a remote end that are capable of storing data for respective ones of the identified circuits, generating a table representative of the circuits and provisioned memory locations, and transmitting the table as inband data packets to the remote end.
14. The process of claim 13, further comprising defining a sequence for transferring data between the local and remote end, and at the remote end, storing data into memory locations as a function of the order in which data occurs in the sequence.
15. The process of claim 13, further comprising at the remote end, building a connection table, and returning an acknowledge signal to the local end.
16. The process of claim 13, further comprising periodically transferring connection table data to the remote end.
17. The process of claim 13, further comprising providing connection tables at the local end and the remote end to support bidirectional calling.
18. The process of claim 13, further comprising transferring connection table data in response to a detected failure at the remote end.
19. The process of claim 13, further comprising determining a rate for updating a remote connection table as a function of available bandwidth.
20. A system for failing over a remote device, comprising a local switch having a connection table storing information representative of data flows being supported, a failure detector capable of detecting a failure out of a remote device, and a failover device capable of identifying an alternate remote device and delivering the connection table to the alternate remote device.
21. A process for arbitrating between active and protected status, comprising the steps of identifying a plurality of cards capable of communicating with each other, allowing each card to make a determination of the health of another one of the cards, allowing each card to deliver to another of the cards a vote representative of the respective card's determination of the health of the other card, and having a respective card determine, as a function of delivered votes, a health status representative of whether the card is to be isolated.
22. A process according to claim 21, wherein determining as a function of delivered votes includes determining as a function of a majority of votes.
23. A process according to claim 21, wherein determining as a function of delivered votes includes determining as a function of a plurality of votes.
24. A process according to claim 21, further comprising isolating a card as a function of delivered votes.
25. A process according to claim 21, wherein making a determination of the health of a card includes measuring response time, identifying a parity error, identifying a checksum error, and identifying a failure to respond to a command.
26. A process according to claim 24, wherein isolating a card includes entering a state that prevents the card from exchanging data.
27. A process according to claim 24, wherein isolating a card includes disabling a hardware interface to an external system bus.
28. A process according to claim 21, further comprising a self-diagnostic test for having a card monitor local parameters to determine a health status for the respective card.
29. A process according to claim 28, further comprising determining an isolation state in response to the self-diagnostic test.
30. A process according to claim 28, wherein the self-diagnostic test includes monitoring a heartbeat timer.
31. A process according to claim 21, further comprising the step of monitoring a control signal representative of an instruction to adjust between a protection state and an active state.
32. A system for arbitrating between an active state and a protected state, comprising a plurality of devices capable of exchanging data, a card monitor for monitoring parameters of other cards in the system representative of operating characteristics, a vote out mechanism, responsive to the monitored parameters, for generating a vote signal representative of an assessment of a card's operating condition, and a vote tally mechanism, responsive to vote signals received from cards in the system, and capable of changing an operational state of a card in response thereto.
33. A method of improving network availability in a segmented network, comprising the steps of: periodically transmitting a test message over a plurality of communication links from a source node in communication with a source network segment to a plurality of destination nodes, each of the plurality of destination nodes being in communication with a respective destination network segment; generating, for each of the plurality of destination nodes, a return message if the test message is received at the destination node; determining the status of each of the plurality of communication links in response to the return messages generated by the plurality of destination nodes; and providing the status of the plurality of communication links to each of the plurality of destination nodes that generated a return message.
34. The method of claim 33, wherein the step of determining the status further comprises indicating a fault in one of said one or more paths if said source node does not receive at least a predetermined number of return messages from said destination nodes in response to a predetermined number of test messages transmitted to said destination nodes.
35. The method of claim 33, and further comprising the step of configuring one of said paths between said source node and said one or more destination nodes in response to the determined status.
36. The method of claim 33, wherein the test message is an LLC type 1 frame format.
37. The method of claim 33, wherein the return message is an echo message generated in response to the test message.
38. The method of claim 33, wherein the source and destination nodes are selected from the group consisting of a host, a router and a load balancer.
39. The method of claim 33, and further comprising the step of updating a routing table in response to the determined status.
40.
41. The method of claim 33, wherein the step of configuring includes avoiding paths through dead links between nodes or paths connecting to unresponsive destination nodes.
42. The method of claim 33, wherein determining the status includes the steps of: waiting a predetermined period of time for the return message from a destination node, and if the status of the destination node has changed, the source node updating a local adjacency status table, and propagating an updated routing table to other nodes on the segmented network.
43. The method of claim 33, wherein said test message is not sent within the same segment.
44. The method of claim 33, wherein the test message is transmitted approximately once per second.
45. A system for improving availability comprising: a plurality of destination nodes in communication with a respective one of a plurality of destination network segments, each of the destination nodes configured to receive a test message through one of a plurality of communication links and generate a return message; a source node in communication with each of the plurality of destination nodes, the source node configured to provide a test message to each of the plurality of destination nodes, and for determining the status of each of the plurality of communications links in response to the return messages; and a configuration update module in communication with the source node and the plurality of destination nodes, the configuration update module providing a status message to each of the destination nodes that provides a return message to the source node.
46. The system of claim 44 wherein the source node transmits the test message approximately once per second.
47. The system of claim 44 wherein the source nodes and the destination nodes are selected from the group consisting of a host, a router and a load balancer.
48. The system of claim 44 wherein the test message is an LLC type 1 frame format.
49. The system of claim 44 wherein the return message is an echo message of the test message.
50. A system for improving network availability in a segmented network, comprising: a first network segment having a plurality of connected source nodes, a second network segment having a plurality of connected destination nodes, said second network segment connected to said first network segment over one or more paths; identification means for identifying from one or more source nodes one or more cooperating destination nodes, transmission means for periodically transmitting a test message over the one or more paths from a source node to one or more destination nodes; said transmission means in response to a return message received from said destination nodes, determining the status of said one or more paths; and status update means for providing said status to each of the plurality of destination nodes that generated a return message.
51. The system of claim 49, further comprising fault indicating means for indicating a fault in one of said one or more paths if said source node does not receive at least a predetermined number of return messages from said destination nodes in response to a predetermined number of test messages transmitted to said destination nodes.
52. The system of claim 49, further comprising configuration means for configuring one of said paths between said source node and said destination nodes in response to said determined status.
Description:
CONTROL MECHANISMS FOR SWITCHING DEVICES

BACKGROUND

[0001] Today, there is tremendous opportunity to change and improve the way individuals exchange data. Engineers and scientists are now developing technologies and devices that can be employed to allow cable subscribers to make telephone calls, transmit faxes and perform other telecommunication functions over the cable network. One technology simply combines two separate switches into one system: one that switches TDM data and one that switches packets. Other architectures have been suggested. In any case, the basic job of the switch is to provide a system that brings together the time division multiplexing (TDM) telephony switching technology used by telephony circuits with the IP switching technology employed by the cable network. Thus, as networks merge, systems are being designed to accommodate both TDM circuit traffic and packet traffic simultaneously.

[0002] Accordingly, there is a need in the art for a control scheme that operates with communication systems that support call distribution over a mixed network.

SUMMARY OF INVENTION

[0003] The systems and methods described herein include, among other things, systems and methods for in-band control signaling and for remote provisioning of circuits. As will be described in more detail hereinafter, the systems include in-band control mechanisms that establish relationships between incoming data and one or more destinations to which the data is to be transmitted. Typically, the relationship comprises a circuit or a data flow, and the in-band control mechanism provides for call set up and tear down as calls and flows begin and end.

[0004] In one embodiment, the in-band control mechanism works with systems that pass TDM traffic through a packet switch. Such a system may include a packet switch that has a plurality of data ports and is capable of routing packets between the plurality of data ports. A TDM encapsulation circuit processes a TDM data flow that is incoming to the switch. A circuit demultiplexer processes the incoming data flow to buffer data associated with different TDM circuits into different buffer locations. The different buffer locations may be associated with different circuits or destinations. The in-band control mechanism described herein performs provisioning control at one end of the circuit, and this provisioning establishes the connections between two ends of the circuit. To this end, a software process writes to a connection table at the local end of the circuit with the address information of the remote end of the circuit. The address information identifies memory locations where data associated with a particular call may be stored. The circuit connection table can be transmitted in-band to the remote end and is kept at both ends of the circuit. The table contains the new and existing circuit connection information. The connection information is sent to the remote end(s) to establish or tear down new connections either periodically or as the table content changes.
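
The following C sketch illustrates the kind of connection table this paragraph describes: a per-circuit record of the remote-end memory address, written by a local software process. The structure layout, field widths, and function names are illustrative assumptions; the text does not specify an implementation.

    #include <stdint.h>

    #define MAX_CIRCUITS 16000           /* capacity figure used later in the text */

    /* Hypothetical connection-table entry: each circuit maps to the remote-end
     * memory address where data for that circuit is to be stored. */
    struct conn_entry {
        uint16_t circuit_id;             /* CKT 0 .. CKT N, as in Table 2 below */
        uint32_t remote_addr;            /* provisioned memory location at remote end */
        uint8_t  active;                 /* 1 = connection established */
    };

    static struct conn_entry conn_table[MAX_CIRCUITS];

    /* The local software process writes remote address information into the
     * table; hardware later reads it out to form in-band control cells. */
    void provision_circuit(uint16_t ckt, uint32_t remote_addr)
    {
        conn_table[ckt].circuit_id  = ckt;
        conn_table[ckt].remote_addr = remote_addr;
        conn_table[ckt].active      = 1;
    }

    void tear_down_circuit(uint16_t ckt)
    {
        conn_table[ckt].active = 0;      /* next periodic update tears it down remotely */
    }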

[0005] In another aspect, the systems and methods described herein include systems and methods for arbitrating between a master and slave status or an active and protection status. More particularly, the systems and methods described herein may be employed in a system that provides for redundancy, such as hardware redundancy, although they may be applied to other systems as well, wherever there is an interest in detecting a failing circuit or device and isolating that circuit or device from the rest of the system to thereby prevent or reduce the likelihood of a larger system-wide failure.

[0006] In still another aspect, the invention provides a method and apparatus for improving local area network (LAN) availability by implementing a standards-based link up/link down status detection protocol on segment-to-segment communications paths. Also disclosed is a method to increase data throughput by employing a compaction method that substitutes fixed values in a packet header with a tag value. This reduces the amount of data that needs to be processed and allows for quicker amortization of overhead.
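
The compaction method is described only at this high level. The sketch below is a minimal illustration of the idea, assuming a scheme in which a fixed, repeated header prefix is replaced by a one-byte tag agreed on in advance by both ends; the header length, tag width, and fallback behavior are assumptions rather than details taken from the text.

    #include <stdint.h>
    #include <string.h>

    #define HDR_LEN 12   /* hypothetical length of the fixed header fields */

    /* One cached header per tag value, agreed on by sender and receiver. */
    static uint8_t tag_table[256][HDR_LEN];

    /* Substitute a known fixed header with its one-byte tag. Returns the
     * compacted length, or the original length if no tag matches. A real
     * scheme would also need a flag to mark packets sent uncompacted. */
    size_t compact_header(const uint8_t *pkt, size_t len, uint8_t *out)
    {
        for (int tag = 0; tag < 256; tag++) {
            if (memcmp(pkt, tag_table[tag], HDR_LEN) == 0) {
                out[0] = (uint8_t)tag;                    /* tag replaces header */
                memcpy(out + 1, pkt + HDR_LEN, len - HDR_LEN);
                return len - HDR_LEN + 1;
            }
        }
        memcpy(out, pkt, len);                            /* fall back: send as-is */
        return len;
    }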

[0007] Other objects and aspects of the invention will, in part, be obvious, and, in part, be shown from the following description of the systems and methods shown herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The present disclosure may be better understood and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

Figure 1 is a high-level block diagram of a prior art VoIP softswitch architecture.

Figure 2 is a high-level block diagram of the VoIP architecture according to one embodiment of the invention.

Figure 3 is a functional block diagram according to one embodiment of the invention.

Figures 4A-C are flowchart depictions of packet handling processes employed by the invention.

Figure 5 is a high-level block diagram of a switch fabric architecture of an exemplary embodiment of the invention.

Figure 6 is a schematic representation of the TDM data format employed in some embodiments of the invention.

Figure 7 depicts a functional block representation of a system according to the invention employing in-band control.

Figure 8 provides a graphical representation of two examples of data being encoded into control cells.

Figure 9 depicts a state diagram representation of operating modes employed by the invention.

Figure 10 depicts a functional block diagram showing communication paths between cards in a system.

Figure 11 depicts a flow chart representation of one process for isolating a card.

Figure 12 depicts a flow chart diagram of a firewall unlock process.

Figure 13 is a high-level block diagram of a Local Area Network (LAN) configured in accordance with one embodiment of the invention.

Figure 14 is a flowchart of a method of increasing LAN efficiency, according to one embodiment of the invention.

[0009] The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION

[0010] To provide an overall understanding of the invention, certain illustrative embodiments will now be described. For example, in one embodiment the systems and methods described herein provide a media switching system that may couple to the headend of a cable system operator and exchange data between the headend and the PSTN. Provisioning control occurs in-band, and calls are set up and torn down in response to the in-band control. The systems described herein allow cable system operators having two-way connectivity to offer, inter alia, voice services. As multiple calls and flows may be handled through a single switch, and as the number of calls and flows may change dynamically, the in-band control mechanisms facilitate efficient control over the calls and flows being handled through the system.

[0011] However, it will be understood by one of ordinary skill in the art that the systems and methods described herein can be adapted and modified for other applications and that such other additions and modifications will not depart from the scope hereof.

[0012] In prior art distributed telephony systems, such as the system 100 depicted in Figure 1, a Call Management Server 120 responds to the media termination adapters (MTAs) 101 that will be involved in the call by providing to each MTA the other MTA's IP address. The depicted MTA 101 may be a PacketCable client device that contains a subscriber-side interface to the physical telephony equipment 130 (e.g., a telephone), and a network-side signaling interface to call control elements in the network. An MTA typically provides the codecs, signaling and encapsulation functions required for media transport and call signaling. MTAs typically reside at the customer site and are connected to other network elements, such as for example via the Data Over Cable Service Interface Specification (DOCSIS) network. There are two common MTAs: standalone (S-MTA) and embedded (E-MTA), although any suitable MTA may be employed with the systems described herein. The IP network can thus route calls between MTAs based on traditional routing methods, path availability, cost, levels of congestion, number of hops, and other aspects of traffic and routing engineering. To this end, as shown in Figure 1, the system 100 includes telephone systems 130 coupled to the MTAs 101. The MTAs exchange data across the IP Network 170, with signaling data 125 and voice data (RTP) 105 traveling across the network and between the Call Management Server (CMS) 120 and the PSTN gateway 110. The system 100 further depicts that coupled to the PSTN gateway is the PSTN network 150 with a class 4 or 5 switch for establishing circuit connections to the depicted telephony equipment 115.

[0013] In operation, a customer using the telephone 130 can use the MTA 101 to connect to the IP network 170 where the CMS 120 can support the delivery of a call across the PSTN gateway and onto the PSTN 150. Typically, the CMS provides call control and signaling-related services for the MTA, and PSTN gateways.

[0014] The approach provided by system 100, although effective for data communications, proves to be non-causal and non-deterministic for time-sensitive services like voice, where path latency, quality, privacy, and security need to be tightly controlled.

[0015] By contrast, the systems and methods described herein provide paths that are deterministic, allowing control of the latency and the quality of the path to the switch. To this end, the systems and methods described herein provide, among other things, a single, highly integrated, secure, and reliable delivery point for multimedia services/data in a single network element with a single management system. By reducing the number of elements in the network, links are reduced, points of failure are reduced, and overall network reliability is increased. In addition, the level of integration provided by the described architecture allows for the efficient implementation of high-availability, fault-resilient methods of call-control, path restoration, and service delivery.

[0016] Figure 2 illustrates one exemplary embodiment of a system 200 according to the invention. As shown in Figure 2, the system 200 may comprise a plurality of MTAs 101, and a packet switch 210. The MTAs 101 and the packet switch 210 exchange information between themselves across the network 170.

System 200 also includes a circuit switch, depicted as a class 4 or 5 switch 116 that allows circuits to be established across the PSTN 150 to support calls made with the one or more telephones 115. In one application the system 200 is situated at the headend of a cable system. The system 200 may interface with the cable modem termination system (CMTS) in a packet-based architecture, and/or replace a Class 5 switch as the device between the public switching telephone network and the host digital terminal in a circuit environment, thereby offering a migration path from circuit switched telephony to packet switched telephony. The system 200 may, in certain embodiments, enable packet-to-packet, packet-to-circuit, circuit-to-packet and circuit-to-circuit calls over the same packet based switch fabric.

[0017] Accordingly, the system 200 depicts a system for supporting communication across an IP network and the PSTN. However, this architecture was chosen merely for the purpose of clarity in illustration and the systems and methods described herein are not so limited. For example, although Figure 2 depicts the PSTN network 150, the systems and methods described herein may be employed to support architectures that incorporate other types of switching networks. Thus, the systems and methods may be employed with any circuit-switched network that provides a network in which a physical path is obtained for and typically dedicated, or substantially dedicated, to a single connection between two or more end-points in the network for the duration of the connection. Moreover, although Figure 2 depicts an IP network 170, the systems and methods described herein are not so limited and may be employed with other packet-switched networks. Thus, the system 200 may be employed with other types of packet networks in which small units of data (packets) are routed through a network based on the destination address contained within a packet. This type of communication is typically described as connectionless, rather than dedicated like the circuit-switched network.

[0018] The switch 210 includes interfaces to the IP 170 and PSTN Networks 150, optional Denial of Service (DoS) attack protection, optional encryption and decryption unit, routing and bridging, and TDM Coding/Decoding (CODEC) functions, as shown in Figure 3.

[0019] RTP data units, conventionally referred to as "packets," originating from any MTA 101 in the IP network 170 are first received at an ingress port (not shown), processed by the Packet Interface 310, and sent, in one embodiment, to the Denial of Service Protection block 320. The DoS protection block 320 keeps Denial of Service attacks from reaching and degrading subsequent packet processing.

Packets are then decrypted to meet the security requirements of the IP network 170 and sent to the Routing & Bridging block 330.

[0020] Note that the "processing" referred to above includes reformatting the RTP data unit streams into encapsulated packet streams for use internal to the switch 210. These encapsulated packets (discussed further below) provide for efficient transport and receipt at the egress ports.

[0021] The Routing & Bridging block 330 applies the appropriate routing and/or bridging function based on the destination and services specified for the call to determine which egress port to send the data units out on. Packets can be rerouted (directed) back to the IP network 170, in which case they will be encrypted 324 and processed by the Packet Interface 310, or sent to the CODEC block 340.

[0022] The CODEC block performs standard coding and decoding functions such as those described in ITU Specifications G.711, G.729, G.168, and/or N-way bridging.

[0023] The depicted Circuit Interface 350 provides a standard DSO circuit interface to the PSTN; likewise, the Call Management and Media Gateway Controller 370 performs typical functions defined for VoIP telephony and currently practiced in the art. Arranging these functions as illustrated protects IP network users and services from malicious attack and provides a unique solution for providing carrier grade telephony and CALEA monitoring services in a VoIP network.

[0024] In one embodiment, internal communications and switching functions within the switch are implemented using direct fiber optic connections through a fiber optic backplane equipped with removable optical backplane connectors. One removable backplane connector is further described in U.S. Patent Application Serial No. 09/938,228, filed on August 23, 2001, incorporated herein by reference in its entirety.

[0025] The depicted system 200 utilizes a low-latency, low-overhead, bandwidth-efficient method for DS0 circuit encapsulation designed to carry circuit traffic as efficiently as packet traffic. This DS0 circuit encapsulation method may be configured to accommodate a mix of data units, packets, and VoIP traffic. In particular, it provides a method for encapsulating circuit traffic for low-overhead packet data unit switching through a fabric so that the specific delay requirements of voice and other latency-intolerant circuit traffic are met. Although the system 200 is described with reference to DS0 traffic, it is not so limited and may be applied to other types and formats of traffic traveling across the circuit-switched network. In such applications, the information being transmitted across or otherwise associated with a circuit will be identified and encapsulated for transmission through the switch 210.

[0026] One exemplary encapsulation process is illustrated in Figure 4A.

Here, the Ingress flow (whose processing is shown in Fig. 4A) is a data stream coming into the switch from the circuit network, i.e., the PSTN. The Egress flow (shown in Fig. 4B) is the data stream leaving the switch and entering the PSTN in a TDM time slot.

[0027] For an ingress flow, shown in Fig. 4A, processing begins when the circuit data unit is received (read) during the selected TDM time slot, step 405. The process then checks memory to determine the routing information corresponding to the received data unit, step 410. The data unit is directed to a particular queue based on the egress port number derived, at least in part, from the routing information, step 415. In one embodiment, the memory stores a connection table. In a circuit-switched system a connection may be established between two ends of a circuit by provisioning one end of the circuit. The connection may be within the same system or between physically separate systems. In either case, the system must have a transmission medium with available bandwidth to send control cells that are carried in-band and that contain the information that the remote end of the circuit employs to provision that circuit. The systems and methods described herein may be employed for switching/routing control information as well as data being communicated by the parties.

[0028] In one embodiment, a circuit connection table is kept at both ends of the circuit. This table contains the new and existing circuit connection information.

The circuit that is to be established may be constant bit rate (CBR) or variable bit rate (VBR). A software process writes to the connection table at the local end of the circuit with the address information of the remote end of the circuit. The circuit connection information may be periodically sent to the remote end(s) to establish or tear down new connections. The frequency of these updates may be determined by the requirements of the application and the amount of bandwidth that one is willing to forfeit. In applications where a live person is waiting for the connection to be established, an update frequency of 5-50 times a second is sufficiently fast. Because this mechanism may be embedded in hardware such as field-programmable gate arrays, it can be very fast and introduces little overhead on the system processor. In either case, the connection table stores information that may be employed by the switch 210 for maintaining routing information for the circuits being handled.

[0029] In one embodiment, the incoming TDM data is sorted and data associated with one circuit is stored in one of a plurality of FIFOs and data associated with a second separate destination is stored in a different respective FIFO. In this way, each FIFO acts as a memory buffer that stores incoming data for a particular destination. This process is graphically depicted in Figure 4C which shows the incoming TDM data being sorted and stored into separate storage devices based on the destination associated with each incoming packet of TDM data. To this end, the switch may include a circuit demultiplexer that processes the incoming data flow to buffer data associated with different TDM circuits into different buffer locations.

Once the incoming TDM data is sorted into separate storage devices, the system may read data from the storage devices in a predictable and controlled manner. On the other end, the switch may include a circuit demultiplexer that has a table access process for accessing the connection table and for providing data to one or more of the TDM circuits at the output port.
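
A minimal C sketch of this sorting step follows; the FIFO depth, destination count, and lookup function are assumptions used only for illustration.

    #include <stdint.h>

    #define NUM_DEST   8        /* hypothetical: one FIFO per destination */
    #define FIFO_DEPTH 256

    struct fifo {
        uint8_t  buf[FIFO_DEPTH];
        unsigned head;
    };

    static struct fifo dest_fifo[NUM_DEST];

    /* Hypothetical lookup, e.g. against the connection table, mapping an
     * incoming timeslot to the destination associated with its circuit. */
    extern unsigned dest_for_timeslot(unsigned timeslot);

    /* Sort one frame of incoming TDM bytes into per-destination FIFOs,
     * as depicted in Figure 4C. */
    void sort_tdm_frame(const uint8_t *frame, unsigned nslots)
    {
        for (unsigned ts = 0; ts < nslots; ts++) {
            struct fifo *f = &dest_fifo[dest_for_timeslot(ts)];
            f->buf[f->head] = frame[ts];
            f->head = (f->head + 1) % FIFO_DEPTH;
        }
    }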

[0030] Step 420 continues the process of steps 405 through 415 until enough data units are collected to fill a generated full size data unit (FSDU), testing for a frame boundary, step 425, after each addition to the queue. Once the FSDU is filled, a header is added to the FSDU, creating the encapsulated packet. The encapsulated data packet is then sent into the switch fabric and directed to the proper egress queue, in step 430. The process repeats at step 405 as long as there is data present at the ingress port.
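
One way steps 420 through 430 could look in C is sketched below for a 64-byte FSDU with a 2-byte header (the sizes paragraph [0044] uses); the header encoding and function names are assumptions.

    #include <stdint.h>

    #define FSDU_SIZE 64
    #define HDR_SIZE  2                          /* per paragraph [0044] */
    #define PAYLOAD   (FSDU_SIZE - HDR_SIZE)     /* 62 DS0 bytes */

    struct fsdu {
        uint8_t  bytes[FSDU_SIZE];
        unsigned fill;                           /* payload bytes collected so far */
    };

    extern void send_to_fabric(const uint8_t *fsdu, unsigned len);  /* hypothetical */

    /* Add one DS0 byte bound for a given egress port; send when full (step 430). */
    void fsdu_add(struct fsdu *f, unsigned egress_port, uint8_t data)
    {
        if (f->fill == 0) {                      /* new FSDU: write its header */
            f->bytes[0] = (uint8_t)egress_port;  /* destination fabric port */
            f->bytes[1] = 0x80;                  /* hypothetical: high TDM priority */
        }
        f->bytes[HDR_SIZE + f->fill++] = data;
        if (f->fill == PAYLOAD) {
            send_to_fabric(f->bytes, FSDU_SIZE);
            f->fill = 0;
        }
    }

    /* At the 125 usec frame boundary (step 425), send even a partly filled
     * FSDU so the method's delay stays bounded at one frame time. */
    void fsdu_flush(struct fsdu *f)
    {
        if (f->fill > 0) {
            send_to_fabric(f->bytes, FSDU_SIZE);
            f->fill = 0;
        }
    }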

[0031] The egress flow process, depicted in one embodiment in Fig. 4B, is similar. In step 450, the encapsulated data packet is received from the switching fabric and placed in a FIFO buffer. The header is read, step 455, and the source port is determined from the information stored in the header. The source port identifier is used to read the memory location corresponding to this FSDU to determine the correct timeslot for each data unit in the FSDU in step 460. The data units in the FSDU are then unpacked (i.e., re-formatted) and placed into jitter buffers corresponding to the destination timeslot for each data unit, step 465.

[0032] When the correct timeslot time arrives, the data units are read out of each jitter buffer and transmitted into the TDM stream.
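
The egress side (steps 450 through 465) might be sketched as follows; the jitter-buffer depth and the timeslot lookup are illustrative assumptions.

    #include <stdint.h>

    #define PAYLOAD      62
    #define JITTER_DEPTH 2          /* hypothetical: frames of jitter absorption */
    #define NUM_SLOTS    512

    static uint8_t  jitter_buf[NUM_SLOTS][JITTER_DEPTH];
    static unsigned wr_frame;       /* frame counter for the write side */

    /* Hypothetical lookup: source port plus position within the FSDU
     * determines the destination timeslot (step 460). */
    extern unsigned timeslot_for(unsigned src_port, unsigned pos);

    /* Unpack one received FSDU (Fig. 4B): the header identifies the source
     * port, and each data unit's position selects its timeslot (step 465). */
    void egress_unpack(const uint8_t *fsdu)
    {
        unsigned src_port = fsdu[0];            /* from the 2-byte header */
        for (unsigned pos = 0; pos < PAYLOAD; pos++) {
            unsigned ts = timeslot_for(src_port, pos);
            jitter_buf[ts][wr_frame % JITTER_DEPTH] = fsdu[2 + pos];
        }
    }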

[0033] The switching system processes described above with reference to Figures 4A and 4B may be realized as one or more software processes operating on a data processing platform. In that embodiment, the processes may be implemented as a C language computer program, or a computer program written in any high level language including C++, Fortran, Java or BASIC. Additionally, in applications where the process is code running on an embedded system, the computer programs may be written, in part or in whole, in microcode, or written in a high level language and compiled down to microcode that can be executed on the platform employed. The development of such programs is known to those of skill in the art, and such techniques are set forth in, for example, Stephen G. Kochan, Programming in C, Hayden Publishing (1983).

[0034] The depicted exemplary embodiment is used to switch both packet traffic and TDM-based DS0 traffic simultaneously using the same fabric. A packet switch (by definition) is designed to handle the specific requirements of packet traffic, and the system 200 may provide conventional packet switching functionality as described elsewhere in this disclosure.

[0035] Turning to Figure 5, one embodiment of a system 500 according to the invention is depicted. Specifically, Figure 5 illustrates a packet switch fabric 500 capable of passing TDM traffic. As shown, the packet switch fabric 500 has a plurality of data ports, 1, 2, ... 8. The switch is capable of routing FSDU packets between these data ports. As described in more detail below, in the depicted embodiment, several of the ingress ports, 3, 4, and 5, are managing both incoming and outgoing flows of TDM data. Port 5 is capable of supporting a combination of traffic types. The switch 500 includes several TDM encapsulation circuits 512 that process a respective TDM data flow. The circuit demultiplexer processes the incoming data flow to buffer data associated with different TDM circuits into different buffer locations, as shown in Figure 4C. An internal timer process monitors the amount of time available to fill the FSDU, and when the time period reaches the frame boundary, the depicted FSDU generators 514 create an FSDU that is filled with data associated with the TDM circuits. The FSDU generators 514 create header information that is added to allow the packet switch 500 to route the generated FSDU to a port associated with the respective TDM circuit.

[0036] Figure 5 further depicts that certain ports handle both types of traffic. An FSDU merge circuit 518 communicates with the FSDU generators 514 and the encapsulation circuits 512 to merge the generated packet flow for delivery to and from port 5.

[0037] As the ports 1, 2, ... 8 are depicted as bi-directional, the switch 500 may include a decapsulation circuit for processing a generated FSDU that has been passed through the packet switch. The decapsulation circuit provides data to one or more TDM circuits that are sending data from a port 1, 2, ... 8 of the packet switch 500.

[0038] A packet switch fabric 500 has a finite number of high-speed ports 510 (eight, for example), as shown in Figure 5, although any number can be used in practice. This implies that there may be eight ingress (input) ports and eight egress (output) ports to the switch fabric. In each cycle time, the eight inputs may be connected in some fashion to the eight outputs to create up to eight connections.

During this cycle time, one FSDU may be passed on each connection. Depending on traffic behavior, not every input may be able to connect to the output required for the FSDU it has to send (i.e., contention may arise). In this situation, it may take several cycle times for all eight inputs to send one FSDU. If all inputs wish to send their FSDU to the same output, it will take eight cycle times for all the FSDUs to be sent.

[0039] Switch traffic may consist of any arbitrary mix of packet traffic and DS0 traffic on the different ports. Some ports may be dedicated to packet traffic (e.g., port 511 in Fig. 5), some ports may be dedicated to DS0 traffic (e.g., port 513), and some ports may support a combination of traffic types (e.g., port 515). The depicted switch 500 allows for any combination of traffic types without affecting the performance characteristics required by the applications. Furthermore, the switch 500 is transparent to the actual information represented by the packet or TDM (DS0) data flows. Voice signals, data, FAX or modem signals, video, graphics, or any other information can be carried and switched with equal ease and facility by and within the systems described herein.

[0040] Typically, a DS0 TDM circuit carries a single byte of data every 125 usec. This time interval is more commonly referred to as a frame in the telecommunications arts. Since the packet fabric has an FSDU of between 64 bytes and 256 bytes, a single DS0 TDM circuit does not come close to filling an FSDU.

The remaining space in the FSDU would be wasted by filling it with padding if an individual DS0 circuit were dedicated to a single FSDU. Thus, it would be very inefficient to map a single byte of data into such a large FSDU.

[0041] One option is to wait a longer period of time in order to accumulate a larger number of bytes for the DS0 circuit. In order to fill a 64 byte FSDU, one would have to wait 8 msec, or 32 msec for a 256 byte FSDU. With voice calls, this represents a significant delay for the circuit and does not meet typical "toll" Quality of Service network switching requirements. It also requires a much larger amount of memory to handle the temporary storage of this data. Neither one of these options is ideal.

[0042] According to some embodiments, multiple DS0 TDM circuits are combined within each frame time to more completely fill an FSDU 610. This combination is illustrated in Figure 6. Since there is a fixed and manageable number of switch fabric ports, it is reasonable to fill FSDUs 610 with multiple DS0 circuits destined for the same switch fabric output port.

[0043] This is a dynamic process: as DS0 circuits destined for a particular switch fabric port come and go, the process dynamically adds and drops DS0 circuits to efficiently fill the FSDU 610 destined for each port. To this end, the packet switch 500 may include a dropped-circuit detector that determines if a circuit is dropped. In one embodiment, the dropped-circuit detector is a table that monitors the number of circuits in the data flow of TDM data. If a circuit is dropped, all circuits that are still active will slide to fill in the hole left by the dropped circuit. The dropped-circuit detector may be a hardware device, a software process or any combination thereof.
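
The "slide to fill the hole" behavior amounts to compacting the ordered list of active circuits, roughly as in this sketch (the array bookkeeping and names are assumptions):

    #include <stdint.h>

    #define MAX_CKT 62                    /* circuits per 64-byte FSDU payload */

    static uint16_t active_ckt[MAX_CKT];  /* circuit IDs, in FSDU byte order */
    static unsigned num_active;

    /* When a circuit is dropped, the remaining circuits slide down so the
     * FSDU stays densely packed; the in-band control cells then refresh the
     * connection tables at both ends to reflect the new ordering. */
    void drop_circuit(uint16_t ckt)
    {
        for (unsigned i = 0; i < num_active; i++) {
            if (active_ckt[i] == ckt) {
                for (unsigned j = i; j + 1 < num_active; j++)
                    active_ckt[j] = active_ckt[j + 1];   /* slide left */
                num_active--;
                return;
            }
        }
    }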

[0044] For each of the switch fabric ports receiving packet flow data, a separate circuit 520 (referring to Fig. 5) is used to generate FSDUs for traffic headed for that port. A header 620 at the beginning of each FSDU identifies the switch fabric port along with priority information to guide the FSDU properly through the switch fabric. The rest of the FSDU is filled with TDM data. For example, if a 64 byte FSDU contains 2 bytes of overhead, the remainder of the FSDU can be filled with 62 DS0 TDM channels. Within each frame, as a single byte of data from each circuit is received on the device, it is added to the FSDU for the particular destination port to which it should be sent. When an FSDU is full, it is sent to the packet switch and another FSDU is started for that port. In typical embodiments, all FSDUs are sent by the end of the frame, even if they are not completely filled, in order to keep the delay of the method to 125 usec. Using this scheme, the data is filled in the FSDUs with 62/64 or 96.9% efficiency per FSDU except for at most 16 FSDUs that may be only partially filled. With a 256 byte FSDU, this efficiency rises to 254/256 or 99.2% per FSDU. If the switch can handle 16,000 circuits, the 64 byte FSDU has an overall efficiency of 91.2% and the 256 byte FSDU has an overall efficiency of 93.4%. In comparison, putting a single DS0 TDM channel into an FSDU has an efficiency of only 1.6%.
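
The per-FSDU figures quoted above can be checked directly; this small program reproduces the 96.9%, 99.2%, and 1.6% values (the overall 91.2% and 93.4% figures additionally depend on how many FSDUs end each frame partially filled):

    #include <stdio.h>

    int main(void)
    {
        /* payload bytes / FSDU bytes, for the packing scheme described above */
        printf("64-byte FSDU:  %.1f%%\n", 100.0 * 62 / 64);     /* 96.9% */
        printf("256-byte FSDU: %.1f%%\n", 100.0 * 254 / 256);   /* 99.2% */
        printf("single DS0 in a 64-byte FSDU: %.1f%%\n", 100.0 * 1 / 64);  /* 1.6% */
        return 0;
    }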

[0045] On the output (egress) side of the switch fabric, the FSDUs are processed (reformatted) to remove the headers and pull out the individual DS0 channels. The routing information for each channel can be made available to the processing device at the output of the switch fabric through different methods that are well-known in the switching and routing arts. The DS0s are then put back into a TDM format again for transmission through the egress ports and into the network.

[0046] In conjunction with the method for packing the FSDUs as described above, since the DS0 data is very time sensitive, it must be passed through the switch fabric with a guaranteed maximum latency. As mentioned above, the switch fabric is shared with packet traffic that can exhibit bursty arrival behavior. To prevent packet traffic from competing with the high priority TDM traffic, the TDM-based FSDUs are assigned a priority level that is higher than that of the packet traffic.

Provided the bandwidth allocated for the particular priority level assigned to TDM traffic is not exceeded, the amount of TDM traffic is deterministic, and the latency for TDM traffic through the switch fabric can be guaranteed independent of the quantity of packet traffic. To this end, the switch 500 may include a priority circuit for associating a routing priority level with a generated FSDU. In one embodiment, the circuit adds priority data to the header information of the respective FSDUs.

Additionally, the switch 500 may include a bandwidth allocation process for allocating bandwidth for the generated FSDU traffic. The bandwidth allocation process may be a software process that provides a predetermined latency period for routing traffic through the packet switch.

[0047] The effect of this method is such that the circuit and packet traffic can be mixed in any combination through the switch fabric with neither type of traffic impacting the performance of the other.

[0048] Since the latency can be bounded, it is now possible to reconstruct the TDM stream on the output side of the switch fabric and align the individual data streams into the correct time slot assignment. In order to accomplish this, a jitter buffer is necessary to smooth out the variable delays encountered by going through the packet switch. Since low latency is important, it is necessary to size the jitter buffer as small as possible; however, the jitter buffer must be sized large enough to account for the minimum and maximum latencies through the switching fabric. Using commercially available packet switch fabrics, it is reasonable to constrain the jitter buffer to introduce an additional delay of no more than a single frame time of 125 microseconds.
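
A minimal sizing rule consistent with this paragraph, assuming the fabric's minimum and maximum latencies are known, is to cover the worst-case latency spread and round up to whole frames:

    /* Hypothetical jitter-buffer sizing: the buffer must absorb the latency
     * variation through the fabric; the text constrains the added delay to
     * at most one frame time (125 usec) on commercial fabrics. */
    #define FRAME_USEC 125

    unsigned jitter_frames(unsigned min_lat_usec, unsigned max_lat_usec)
    {
        unsigned spread = max_lat_usec - min_lat_usec;
        return (spread + FRAME_USEC - 1) / FRAME_USEC;   /* round up to frames */
    }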

[0049] This system architecture's switch fabric implementation thus produces a single switch fabric with the latency and throughput performance equivalent to having separate dedicated TDM and packet switches. It accomplishes this without the expense of multiple fabrics or the complexity of interconnecting them.

[0050] Although the systems and methods described have been directed to systems that exchange information, whether voice, data, fax, video, audio or multi-media, over the PSTN network 150, the systems and methods described herein are not so limited. The systems and methods described herein may be employed to support architectures that incorporate other types of switching networks. Thus, the systems and methods may be employed with other circuit-switched networks that provide a network in which a physical path is obtained for and typically dedicated, or substantially dedicated, to a single connection between two or more end-points in the network for the duration of the connection. Moreover, although Figure 2 depicts an IP network 170, the systems and methods described herein are not so limited and may be employed with other packet-switched networks. Thus, the system 200 may be employed with other types of packet networks in which small units of data (packets) are routed through a network based on the destination address contained within a packet. This type of communication is typically described as connectionless, rather than dedicated like the circuit-switched network.

[0051] In-Band Control Mechanism

[0052] Circuits are typically bi-directional entities that require simultaneous coordination of both ends of the circuit to establish the circuit.

Provisioning mechanisms may be out-of-band, which requires an additional communication channel, or in-band, which requires some available bandwidth for the setup control messages. In-band mechanisms are often preferred because they eliminate the need for this additional control channel. In either case, provisioning usually results in an asynchronous setup whereby one end of the circuit is initiated prior to the other, resulting in delays before the circuit can be used. In addition, an in-band provisioning mechanism allows the connection state to be refreshed periodically to prevent the termination of a connection in the case where connection state information is lost. A valuable benefit achieved by this refresh capability is to provide an efficient failover mechanism when the hardware retaining the circuit state information must be failed over to another piece of hardware.

[0053] The in-band control mechanism described here applies very little overhead to the communication channel and can easily be implemented in a programmable logic device. This mechanism applies seamlessly to a 1:1 or 1:N protection scheme.

[0054] In the in-band control embodiments described herein, a connection is established between two ends of a circuit by provisioning at one end of the circuit.

The connection may be within the same system or between physically separate systems. In either case, the system must have a transmission medium with available bandwidth to send control cells that are carried in-band and that contain the information that the remote end of the circuit employs to provision that circuit. In one embodiment, a circuit connection table is kept at both ends of the circuit. This table contains the new and existing circuit connection information. The circuit that is to be established can be constant bit rate (CBR) or variable bit rate (VBR). A software process writes to the connection table at the local end of the circuit with the address information of the remote end of the circuit. The circuit connection information is periodically sent to the remote end(s) to establish or tear down new connections.

The frequency of these updates may be determined by the requirements of the application and the amount of bandwidth that one is willing to forfeit. In applications where a live person is waiting for the connection to be established, an update frequency of 5-50 times a second is sufficiently fast. At the times designated by the update frequency, the connection information from the connection table is read by hardware, formed into control cells, routing headers are attached, and the control cells are sent to the remote end(s). Because this mechanism may be embedded in hardware, it can be very fast and introduces little to no overhead on the system processor.
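
In C, the hardware's periodic read-and-send step might look like the following sketch; the cell format, header bytes, and helper functions are assumptions, not details from the text.

    #include <stdint.h>

    #define CKTS_PER_CELL 14     /* hypothetical: addresses carried per control cell */

    extern unsigned num_active(void);                 /* connection-table queries */
    extern uint32_t remote_addr_of(unsigned idx);     /* (hypothetical helpers)   */
    extern void send_inband(const uint8_t *cell, unsigned len);

    /* Run 5-50 times per second: concatenate the remote-end destination
     * addresses in circuit order, prepend a routing header, and send the
     * result in-band. The ordering itself carries the source identity. */
    void send_connection_update(void)
    {
        uint8_t  cell[2 + 4 * CKTS_PER_CELL];
        unsigned n = num_active();

        for (unsigned base = 0; base < n; base += CKTS_PER_CELL) {
            unsigned len = 0;
            cell[len++] = 0xC0;                              /* hypothetical: control cell */
            cell[len++] = (uint8_t)(base / CKTS_PER_CELL);   /* cell sequence number */
            for (unsigned i = base; i < n && i < base + CKTS_PER_CELL; i++) {
                uint32_t a = remote_addr_of(i);
                cell[len++] = (uint8_t)(a >> 24);
                cell[len++] = (uint8_t)(a >> 16);
                cell[len++] = (uint8_t)(a >> 8);
                cell[len++] = (uint8_t)a;
            }
            send_inband(cell, len);
        }
    }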

[0055] The control cells are received by the remote end(s), parsed, and entered into the connection memory table at the device on that end. In one practice, the control cells are structured as a concatenation of destination addresses. Explicit source address information is not necessary, as the order in which data arrives at the remote end is representative of the source address for that data. Thus, this scheme employs the ordering of the addresses in the control cell to be representative of the source address for that data. The order is recorded in the connection table at the remote end. This control structure allows one to use a very efficient data cell structure.

The data in the data cells can be sent without address information, as the ordering of the data matches the ordering of the addresses in the control cells.

[0056] An acknowledgement cell is returned from destination to source to validate the receipt of these control cells. Once the source receives the acknowledgement, it is free to start sending data for the new connections. In one embodiment, a simple mechanism is utilized to identify to the destination that the source has started sending the updated table. A single bit is used in the data header to identify that a "table swap" has occurred. Because of this, the data for the new connections can be sent at any time without the concern of synchronizing the far end.

Thus the latency of the connection is irrelevant to the setup, and the scheme can be used without concern about latency.
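
At the remote end, parsing and the table-swap bit could be handled roughly as below; the double-buffered table, 4-byte address format, and bit position are assumptions for illustration.

    #include <stdint.h>

    #define MAX_CIRCUITS 16000

    /* Double-buffered remote connection table: arrival order within the
     * control cell is the implicit source identity, so entry i holds the
     * memory address for the i-th circuit. */
    static uint32_t remote_table[2][MAX_CIRCUITS];
    static unsigned current;                 /* which table data is using now */

    extern void send_ack(void);              /* hypothetical acknowledgement cell */

    /* Parse a control cell: a concatenation of 4-byte destination addresses. */
    void parse_control_cell(const uint8_t *cell, unsigned len)
    {
        unsigned next = 1 - current;
        for (unsigned i = 0; 4 * i + 4 <= len; i++) {
            const uint8_t *p = cell + 4 * i;
            remote_table[next][i] = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
                                  | (uint32_t)p[2] << 8  | p[3];
        }
        send_ack();   /* source may now start sending for the new connections */
    }

    /* The single "table swap" bit in each data header tells the remote end
     * when the source has switched to the updated table. */
    void on_data_header(uint8_t hdr)
    {
        current = hdr & 0x01;                /* hypothetical: bit 0 = table in use */
    }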

[0057] Since the control cell information is only sent periodically, it introduces only a small bandwidth penalty on the transmission channel. For example, voice calls are sent at a frequency of 8000 times per second. If the control information is sent at a rate of 5 to 50 times a second, the control cells are 160 to 1600 times less frequent than the data cells.

[0058] With this control cell structure, there is also considerable flexibility in the number of circuits that can be controlled. Control and data cells can be made arbitrarily long and not affect the protocol.

[0059] Each time connection changes need to be made, or on a periodic basis regardless of changes, the entire connection map, or portions thereof, may be sent to the remote/destination end. This provides advantages in the area of protection switching. In the case of a far-end equipment failure, where an equipment switch occurs but the connection information is lost, the control cells will arrive again and refresh the connection memory. In the case of a network protection switching event, the near-end can reroute the entire set of connections quickly, as the system may be fully implemented in hardware and is very bandwidth efficient.

[0060] Figure 7 depicts one embodiment of the system according to the invention for providing an efficient in-band control mechanism for remote provisioning of circuits. Specifically, Figure 7 depicts a system 700 that includes a switch fabric 710 that couples between a local end or aggregation point 720 and a plurality of remote ends 722A, 722B and 722C. As shown in Figure 7, the local aggregation point 720 can be a point at which the system 700 couples to an incoming trunk or trunks that may comprise a plurality of T1 lines delivering TDM data. As is known to those of skill in the art, T1 is a standardized TDM technology. T1 belongs to the physical layer in the OSI reference model, and T1 lines mostly connect between PABXs and COs. The T1 standard is mostly deployed in the USA, with E1 and J1 being other standards deployed in other areas. However, the systems and methods described herein do not depend on any particular standard or data format.

[0061] As shown in Figure 7, the TDM data is received at this local aggregation point and will be processed by the switch 710 to move data across a plurality of circuits being supported by that switch 710. To set up the circuits, the system 700 employs the above described in-band control mechanism that performs remote provisioning for bidirectional circuits, although these systems can also be used with unidirectional circuits or multicast or broadcast channels. As will be described more fully below with reference to Figures 4C and 8 and with reference to Table 2, the system 700 sets up flows or connections between calls. To this end, circuits are set up between local and remote points in the call. The system 700 uses in-band delivery of control cells which carry information that can be employed at the remote end for setting up the remote end of a circuit or a plurality of circuits.

[0062] In one practice, the system 700 sorts the incoming TDM data into a plurality of separate storage devices and each storage device may be associated with a particular destination, where a destination can represent a multiplicity of circuits. For example, the incoming TDM data may have information associated with four different calls to be made on four separate circuits. At the local end of the call the system can sort the incoming data based on the destination associated with that data and can store the data associated with one particular destination in a separate respective storage device. The local end of the circuit has a plurality of FIFO storage devices. In operation the incoming TDM data is sorted and data associated with one circuit is stored in one of the respective FIFOs and data associated with a second separate destination is stored in a different FIFO. In this way, each FIFO acts as a memory buffer that stores incoming data for a particular destination. This process is graphically depicted in Figure 4C which shows the incoming TDM data being sorted and stored into separate storage devices based on the destination associated with each incoming packet of TDM data. Once the incoming TDM data is sorted into separate storage devices, the system can read data from the storage devices in a predictable sequence. Accordingly, in the systems and methods described herein, information is read from the storage devices of Figure 4C according to an established sequence.

Thus, the sequence of data packets transmitted from the local end is selected to indicate with which circuit the data packet is associated. In the depicted embodiment, the system provides separate memory devices; however, in other embodiments, fewer memory devices may be provided and a software-based system can maintain separate locations for each of the circuits. In this way a series of FIFO memory devices may be simulated and used with the systems and methods described herein.

[0063] Table 2 depicts one example of a connection table that may be established at the local end to store information that is representative of the relationship between the sequence of data being transmitted from the local end and the circuit with which each data packet in that sequence is associated. As shown in Table 2, the connection table may include a circuit ID, such as, for example, CKT 0, as well as a memory address or range of memory addresses that represents where information associated with that respective circuit should be stored at the remote end. Information from the connection table may be encoded into control cells that may be transmitted in-band with the data being delivered from the local end to the remote end.

TABLE 2
Connection Table

CKT 0    Memory Add
CKT 1    Memory Add
  .
  .
  .
CKT N    Memory Add

[0064] Figure 8 presents two examples of how data may be encoded into control cells. Figure 8 depicts a first example wherein a routing header is attached to a plurality of control cells. Each control cell includes information that represents the relationship between the sequence of data packets being transmitted from the local end and the circuits with which the respective data packets are associated. Additional information within the control cell can include the memory addresses that are to be set aside at the remote end for storing the data being transmitted across a respective one of the circuits. In one embodiment, the remote end includes an addressable memory device that can store information within a specified address or within a specified range of addresses. Data packets associated with a particular source may be stored within an established address range. At the remote end, the control cell information may be received and stored within a control cell memory location. Once this control data is stored in the control cell memory location, the remote end can begin receiving data packets and can sort the data packets into particular addresses within a memory device based upon the order in which the data packets arrive at the remote end. As circuits are added or removed, new control cell data may be sent that will replace the existing control cell information stored in the connection table at the remote end.

Additionally, as these circuits may be bidirectional, both the remote and local ends may set up identical connection tables, thereby allowing for bidirectional transfer of data across the circuits.

[0065] Figure 8 also depicts an alternate embodiment wherein each control cell is associated with a separate routing header. In this embodiment, a plurality of routing headers may be associated with the control cell information so that the control cell information may be distributed across a plurality of remote ends. In this way, switching can occur not just in a 1:1 manner but also in a 1:N manner.
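For illustration, the connection table of Table 2 and the control cells of Figure 8 might be modeled as follows. This is a sketch under assumed field names and sizes; the patent does not fix a wire format, so the structure layout, batch size, and routing-header width here are illustrative only.

    #include <stdint.h>
    #include <stdio.h>

    /* One Table 2 entry: a circuit ID and the remote memory address
     * (or base of an address range) provisioned for that circuit. */
    typedef struct {
        uint16_t circuit_id;    /* e.g., CKT 0 .. CKT N */
        uint32_t remote_addr;   /* storage location at the remote end */
    } conn_entry_t;

    /* A control cell per the first example of Figure 8: one routing
     * header attached to a batch of connection-table entries that are
     * carried in-band with the data. */
    #define ENTRIES_PER_CELL 8  /* assumed batch size */

    typedef struct {
        uint32_t     routing_header;   /* directs the cell to a remote end */
        uint8_t      num_entries;
        conn_entry_t entries[ENTRIES_PER_CELL];
    } control_cell_t;

    /* Build a control cell carrying n table entries; sending such cells
     * periodically (or on change) establishes or tears down connections
     * at the remote end. */
    static control_cell_t build_control_cell(const conn_entry_t *table,
                                             uint8_t n, uint32_t route)
    {
        control_cell_t cell = { .routing_header = route, .num_entries = 0 };
        for (uint8_t i = 0; i < n && i < ENTRIES_PER_CELL; i++)
            cell.entries[cell.num_entries++] = table[i];
        return cell;
    }

    int main(void)
    {
        conn_entry_t table[] = { { 0, 0x1000 }, { 1, 0x2000 } };
        control_cell_t cell = build_control_cell(table, 2, 0x0A);
        printf("cell carries %u entries, routing header 0x%X\n",
               (unsigned)cell.num_entries, (unsigned)cell.routing_header);
        return 0;
    }

In the 1:N variant of Figure 8, the same entries would simply be emitted once per routing header, one cell for each remote end.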

[0066] The in-band control mechanism has been described above with reference to call setup and teardown. However, this mechanism may be employed to support other control functions, including call management; wireless services such as personal communications services (PCS), wireless roaming, and mobile subscriber authentication; local number portability (LNP); toll-free and toll wireline services; and enhanced call features such as call forwarding, calling party name/number display, and three-way calling.

[0067] Thus, the systems described herein may be employed to support VoIP (Voice over Internet Protocol); IP is a connectionless protocol that is typically understood to allow traffic (packets) to be routed along the paths with the least congestion.

Additionally and alternatively, the systems and methods described herein may be employed to support VoDSL (Voice over Digital Subscriber Line). As is generally known, DSL is employed by phone companies to provide data and communications services on their existing infrastructure. VoDSL can be delivered over ATM, frame relay, or IP. In such applications, the systems will typically include DSL modems (or Integrated Access Devices, IADs) in the consumer's home to interface between the DSL network service and the customer's voice/data equipment, and a voice gateway, which receives the voice traffic from the data network, depacketizes it, and sends it to a Class-5 switch on the PSTN 150. The switch provides dial tone, call routing, and other services for the circuit-switched network. In a further alternate embodiment, the systems and methods described herein may be used in VoFR (Voice over Frame Relay) applications. Such technology is used in corporate networks. Thus, in these applications, the systems and methods described herein may be employed for exchanging data across a LAN.

Further, other applications include VoATM (Voice over Asynchronous Transfer Mode). As is generally known in the art, ATM is a high-speed, scalable technology that supports voice, data, and video simultaneously. It uses short, fixed-length packets called cells for transport. Information is divided into the cells, transported, and then reassembled at the destination. ATM allows providers to guarantee QoS. Many carriers use ATM in the core of their networks. It is also chosen by large corporations to connect locations over a WAN. ATM employs out-of-band signaling, which enables it to establish call paths for the voice samples using signaling-only circuits that handle signaling for a large number of calls. The systems of the invention may be modified to support ATM networks, and in some embodiments may support out-of-band signaling.

[0068] Below are presented several systems and processes that provide improved equipment for delivering voice services over a cable network. As will be described below, these systems and processes provide redundant components that can be brought on line as needed to isolate a failed component from the system. To this end, the systems may employ a voting process that allows the cards in a system to vote on the health or operating condition of other cards in the system.

[0069] Distributed Arbitration Process with Voting to Detect Failed Cards

[0070] In a system with more than one hardware-based controller, a single controller typically acts as the master and the other controllers act as the slaves, ready to become the master if the master should fail. It is generally a difficult problem to correctly assign master and slave relationships to the controllers in the presence of all possible failures. Typically, software is written to implement a process that fails to consider certain hardware failures that will cause a failed controller to be given master status in the system. The systems and methods described herein provide a mechanism, integrated with both hardware and software, that makes a comprehensive or substantially comprehensive determination of which controller should be the master. In addition, the systems and methods described herein provide a mechanism for effectively isolating the failed controller from the system so that it cannot harm the system. Moreover, in the systems and methods described herein, the master/slave arbitration system employs a distributed architecture that distributes control of the arbitration system and process across a plurality of devices. In this way, the system is more resistant to error arising from a single point of failure.

[0071] The systems and methods include an arbitration process that allows for isolating the failed card. As will be seen from the following description, the arbitration process employs a software arbitration process to arbitrate between a plurality of cards to identify or select one of the cards to be the master. Any suitable master/slave arbitration process may be employed for initially selecting master/slave status for the plurality of cards, and suitable arbitration processes are known to those of skill in the art. For example, upon system initiation a master card may be selected by default, for example as a function of card slot. Once selected, the master card is typically activated or brought on line to function within the system. Any card or cards that provide redundancy support will typically, upon power-up, be isolated from the system. Again, the initial arbitration or selection of which of the available cards are to be brought on-line may be accomplished by any suitable technique, including by backplane wiring of the card slot.

[0072] The arbitration process includes a voting process that allows cards within the system to cast votes to determine the health of each of the individual cards.

The voting process thereby allows the system to determine whether a bad card is present and to make sure that a bad card has not been selected to be the master card for the system. By preventing a bad or malfunctioning card from being selected as the master, the systems and methods described herein guard against a system failure that may arise from appointing a malfunctioning card as the master card. Optionally and preferably, the systems described herein will also isolate a malfunctioning card from the system. Further optionally, the systems may select a new card as active or master as a result of isolating a failing active card.

[0073] Figure 9 depicts diagrammatically the master/slave (or active/protected) and healthy/isolated states employed by one embodiment of the system. In particular, Figure 9 depicts a state diagram 910 that includes an active state 912 and a protection state 914. In the active state 912, a card would be active and exchanging information across the back- or mid-plane with other cards in the system. In the protection state 914, as will be described in greater detail hereinafter, the card would be decoupled or otherwise logically and/or physically separated from the mid-plane, back-plane, or other cards in the system. As further shown in Figure 9, the card can move back and forth between the active state 912 and the protection state 914, as shown by the state diagram transfer lines 916. Similarly, within the active state 912, the card can alternate between a healthy state 918 and an isolated state 920. Similarly, in the protection state 914, the card can move between a healthy state 922 and an isolated state 924. Thus, Figure 9 depicts the set of states that a card may be in.

[0074] The state diagram 910 depicted in Figure 9 can be coded into a state machine of the kind commonly employed in digital logic design. The design and development of such state machines is known to those of skill in the art and is discussed in David J. Comer, Digital Logic and State Machine Design, CBS College Publishing, New York (1995). In a typical implementation, the state machine is built using a programmable logic device that has input and output pins that can receive and transmit signals to drive the state machine from one state to another and to generate the appropriate responses as it moves from state to state. Any suitable programmable device may be used, including commercially available devices from ATMEL Corporation of San Jose, California, including the ATMEL 6000 FPGA.

[0075] Figure 9 shows that in response to a particular state, the actions and operations of the card may be modified in a way that reduces the likelihood of overall system failure. This is true whether the card is in the active state 912 or the protection state 914. As described herein, a software master/slave arbitration process may be employed for performing active/protection arbitration regardless of state, as even a card in the protection state can cause system damage if it is failing. In one process, to distinguish between healthy cards and cards that are to be isolated, the systems and methods employ a health voting process that requires a majority (i.e., two of three) of cards to vote that a particular card is healthy.
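A software mirror of the Figure 9 state pairs might look like the sketch below. In the patent the state machine lives in a programmable logic device; the C rendering, the enum names, and the two-vote threshold constant are illustrative assumptions.

    #include <stdio.h>

    typedef enum { ROLE_ACTIVE, ROLE_PROTECTION } role_t;       /* 912 / 914 */
    typedef enum { HEALTH_HEALTHY, HEALTH_ISOLATED } health_t;  /* 918-924   */

    typedef struct {
        role_t   role;
        health_t health;
    } card_state_t;

    /* Two of three peer votes against a card drive it to isolation,
     * whatever its role; a card voted healthy may operate normally. */
    static void apply_votes(card_state_t *c, int unhealthy_votes)
    {
        c->health = (unhealthy_votes >= 2) ? HEALTH_ISOLATED : HEALTH_HEALTHY;
    }

    /* The software arbitration process may swap roles along the
     * transfer lines 916, e.g., when the active card is isolated. */
    static void swap_roles(card_state_t *a, card_state_t *b)
    {
        role_t t = a->role;
        a->role  = b->role;
        b->role  = t;
    }

    int main(void)
    {
        card_state_t scp7  = { ROLE_ACTIVE,     HEALTH_HEALTHY };
        card_state_t scp10 = { ROLE_PROTECTION, HEALTH_HEALTHY };

        apply_votes(&scp7, 2);              /* two peers vote it faulty */
        if (scp7.health == HEALTH_ISOLATED)
            swap_roles(&scp7, &scp10);      /* protection card takes over */

        printf("scp10 role: %s\n",
               scp10.role == ROLE_ACTIVE ? "active" : "protection");
        return 0;
    }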

[0076] This process handles system-level redundancy between different cards in any type of system. For purposes of illustration, the systems and processes are described with reference to a switch system that supports the switching of data being transferred across a network, including a cable network. In this example, the systems and methods will be discussed with respect to a switch system that has system control processor (SCP) cards and routing switch fabric (RSF) cards, including redundant pairs with stable common system control handling for alarms, reset, power control, communications, etc. As described herein, certain embodiments include hardware support for processor isolation, as well as isolation mechanisms for common system controls and insertion scenarios. Hardware support may also be provided for system power-up sequencing. The hardware portion of the redundancy circuit may contain signals which are distributed on a midplane to assist in health determination, as well as indicators for slot position and Active/Protection status. The redundancy circuitry may be identical on each card. In one embodiment, the software process implements an Active/Protection arbitration process after system start. Hardware, in this embodiment, is responsible for protection (via isolation) of the system.

[0077] In this embodiment, hardware-level health voting is employed to determine healthy/isolated status. Figure 10 depicts four cards, or at least the redundancy circuits on four cards, connected together, including two SCP cards (SCP 7 and SCP 10) and two RSF cards (RSF 8 and RSF 9). More particularly, Figure 10 depicts a system that comprises four circuit cards, shown in Figure 10 as an SCP card 1032 placed in slot number 7, RSF card 1033 placed in slot number 8, RSF card 1034 placed in slot number 9, and SCP card 1038 placed in slot number 10. As further depicted in Figure 10, each of the circuit cards 1032 through 1038 includes a vote-out logic circuit 1040 and a vote-in logic circuit 1042. The system 1030 includes two SCP cards and two RSF cards, and one of each pair of cards can be in the active state 912 and one in the protection state 914.

[0078] Each depicted card 1032-1038 has a six-pin interface. Three pins are connected to a circuit block 1040 titled "Vote Out" and three pins are connected to a circuit block 1042 labeled "Vote In". As shown, each of the three pins on the Vote Out block is connected to one Vote In pin of a respective one of the three other cards in the system 1030. Thus, the voting circuitry of each card is connected to the voting circuitry of each other card in the depicted system 1030. In this embodiment and for this purpose, all four cards behave identically, and have the same state machines and hardware. The hardware and state machines support the voting process.

[0079] Although Figure 10 depicts a system 1030 having two pairs of cards, where one card in each pair is active and the other card in the pair is protected, the systems are not so limited. For example, each card type may have multiple redundant cards or devices, and the system can select among the available redundant cards or devices as needed to replace a failing active or protected card. Additionally, the system 1030 employs hardware redundancy for each card type. This is not necessary, and in certain applications only some card types are part of the arbitration system. In still other embodiments, certain ones or all of the cards may be supported by software redundancy systems that activate software modules to simulate the functioning of one of the cards. Thus, in these embodiments, the software modules can exchange vote signals with the circuits and with other software modules. Moreover, in certain alternate embodiments the system may comprise a network of computers/servers having redundant components, and the arbitration system can bring systems online and offline as appropriate. Other systems and applications of the invention will be apparent to those of skill in the art.

[0080] In the embodiment of Figure 10, each card 1032-1038 can give and receive votes to and from the other three cards in the system 1030. In the embodiments described herein, no single card can change the status of another card; rather, a two-out-of-three vote is required to keep a card in a "healthy" state. Thus, if the software on a card suspects that another card in the system 1030 is faulty, that card can vote on the health of the suspect card. If another card in the system 1030 also suspects that the same card is faulty, it too can vote on the health of the suspect card. In this case, where two of the three cards vote that the card is faulty, the state machine of the suspect card can identify that its health is suspect and move to put itself in an isolation state where it cannot harm the rest of the system 1030. A card can suspect another card of being faulty if it appears to the first card that the other card is not responding, is responding too slowly, delivers data with parity or checksum errors, or for other similar reasons. In the embodiment of Figure 10, the voting system includes four cards and all three votes are considered. However, in other embodiments, where the number of cards may be larger or smaller, the percentage or the number of cards that need to vote a certain way to change the status of a card can vary. For example, in some cases all cards must vote the same way to change the health status of a card. In other embodiments, the system provides a weighting function that weighs the votes of some cards more heavily than others, or that allows one card to dictate status regardless of the votes received from other cards. In other embodiments, only certain cards connect to other cards. Still other embodiments and practices may be employed as the application requires.
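The wiring of Figure 10 can be simulated with a small table of votes. The sketch below assumes one vote line from each of the three peers and the two-of-three threshold described above; the array layout and names are illustrative.

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_CARDS    4
    #define VOTE_HEALTHY true
    #define VOTE_FAULTY  false

    /* votes[i][j]: the vote card j casts about card i (j != i). */
    static bool votes[NUM_CARDS][NUM_CARDS];

    /* A card isolates itself when at least two of its three peers vote
     * it faulty; its own opinion of itself is never counted. */
    static bool should_isolate(int card)
    {
        int faulty = 0;
        for (int peer = 0; peer < NUM_CARDS; peer++)
            if (peer != card && votes[card][peer] == VOTE_FAULTY)
                faulty++;
        return faulty >= 2;
    }

    int main(void)
    {
        /* Every vote starts healthy: the pulled-up default. */
        for (int i = 0; i < NUM_CARDS; i++)
            for (int j = 0; j < NUM_CARDS; j++)
                votes[i][j] = VOTE_HEALTHY;

        votes[1][0] = VOTE_FAULTY;   /* card 0 suspects card 1 */
        votes[1][2] = VOTE_FAULTY;   /* card 2 concurs */

        for (int c = 0; c < NUM_CARDS; c++)
            printf("card %d: %s\n", c,
                   should_isolate(c) ? "isolate" : "healthy");
        return 0;
    }

No single dissenting vote changes anything: removing either of the two faulty votes above leaves card 1 healthy.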

[0081] In one embodiment, the override is implemented as part of the redundancy circuit. To this end, the override may be part of the logical state machine implemented in a CPLD/FPGA device. The unlock sequence may be a set of predetermined logical operations, such as a unique memory access, mathematical operations using operands stored in preselected memory locations, and other such functions, that move a sequence lock through a series of steps. By following these steps, the programmable logic device will move through a series of states that result in granting the override software access to the control register that sets the state of the vote out signal or signals.

[0082] Vote Out Functions

[0083] The circuit in block 1040, in one embodiment, includes a firewall unlock sequence register to allow local software override and control of voting status for any of the 'vote out' signals. This provides a mechanism for higher-layer software redundancy protocols to vote in the hardware isolation mechanism (i.e., if local software on SCP #7 believes SCP #10 is bad, it may vote to cause isolation by accessing the vote control register). A failure of the hardware watchdog, a local protocol violation, or the assertion of reset will drive all 'vote out' signals to the healthy state (a defective card should not affect 2/3 voting; its vote is invalid when unstable).

This also applies when a card is voted into isolation: its own vote outputs will go "healthy".

[0084] Vote In Functions

[0085] In one practice, voting is by two out of three system cards. There is no vote from the local card; external inputs are the only votes present. No single card may take out another, and at least two of three must concur on the fourth's condition.

A card may not vote on its own health, other than negatively (failure due to heartbeat timeout, etc.). Optionally, a system card may isolate itself through other means. The Vote In function is implemented as a state machine that responds to input signals, including the vote input signals from the other cards. In one embodiment, the state machine is implemented in a programmable logic device, although other designs, including microcontroller-based designs, may also be employed.

[0086] As described above, a vote is accepted from each of the three other cards present in the system 1030. Vote inputs 1042 to the redundancy circuits are pulled to the healthy state at the input to the card receiving the vote. This, along with optional debounce, protects the cards against live card removal interfering with the voting. Votes may be debounced over a reasonable time period, using a local free-running timing oscillator. In a non-redundant configuration, or a configuration where a card is removed, the vote may be pulled to the healthy state; this is accomplished by pull-up resistors, typically on the plane or a termination card. This ensures effective deactivation of the voting portion of the redundancy circuit in this system configuration. Note that in a system configuration with two RSFs and a single SCP (redundant switch fabrics, but non-redundant controllers), it is possible for both RSFs to vote the primary SCP into isolation. In other embodiments, the state machine in the SCP card, or any of the cards, may prevent itself from being isolated if it is the only card performing a certain function. However, in a preferred embodiment, any card can be voted out if it appears to be malfunctioning to a significant number of cards in the system, or to other higher-level processes. In a case where the only remaining SCP card is voted out, a termination card or pull-ups on the card should place the card in an isolated state that does not harm the overall system 1030.
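Debouncing of the vote inputs might be sketched as below, assuming the votes are sampled on the local free-running oscillator; the sample threshold is an illustrative assumption.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define DEBOUNCE_SAMPLES 8   /* consecutive samples to accept a change */

    typedef struct {
        bool    stable_level;    /* last accepted vote level */
        bool    last_sample;
        uint8_t run;             /* consecutive identical raw samples */
    } debounce_t;

    /* Called once per oscillator tick with the raw vote-in pin level
     * (true = healthy, the level the pull-ups supply when a peer card
     * is absent or being hot-swapped). */
    static bool debounce_sample(debounce_t *d, bool raw)
    {
        if (raw == d->last_sample) {
            if (d->run < DEBOUNCE_SAMPLES)
                d->run++;
        } else {
            d->last_sample = raw;
            d->run = 1;
        }
        if (d->run >= DEBOUNCE_SAMPLES)
            d->stable_level = raw;   /* accept only a stable level */
        return d->stable_level;
    }

    int main(void)
    {
        debounce_t d = { true, true, DEBOUNCE_SAMPLES };  /* settled healthy */
        for (int i = 0; i < 10; i++)
            debounce_sample(&d, false);    /* sustained faulty vote */
        printf("accepted level: %s\n", d.stable_level ? "healthy" : "faulty");
        return 0;
    }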

[0087] Although the system and process discussed above with reference to the Figures describe a system that provides each card with on-board logic to vote on the health of other cards in the system and to respond to the votes of other cards, it will be apparent to those of skill in the art that the systems and processes are not so limited. For example, in other embodiments, the systems may provide a back-plane or mid-plane that includes some or all of the voting logic. Moreover, although the described systems provide a preferred approach that distributes the arbitration process, in other embodiments the systems may employ a central logic device that performs the voting arbitration process described above.

[0088] Along with the health voting process, other events may be monitored by the system cards 1032-1038, typically in hardware, to determine when a card should be isolated. One example presented below is a heartbeat monitor.

However, this is not the only event that may be of interest, and other events may also be monitored and employed to decide the proper state of the system card.

[0089] Heartbeat

[0090] In one embodiment, a retriggerable monostable timer runs in hardware and is to be accessed by software periodically to ensure the redundancy circuit does not isolate the card. This is a mechanism used to determine the health of the local processor/software. Reset or power cycling will restart the timer. Rather than have a fixed power-on delay to wait for software start, the timer's start may be initiated by the first heartbeat access by software. Control register access may be denied until the start of the hardware watchdog timer. In the event that software does not initialize to the point of being able to participate, the card may be voted at any time into isolation by the above-described 2/3 voting mechanism. Otherwise, the redundancy circuit will idle, waiting for the first heartbeat. The first access the software makes to the redundancy circuit will be to the hardware watchdog. Note that the card may not be isolated upon power-up, cold start, or reset. This allows it to participate in power sequencing, etc. However, control register access may be denied as indicated above until the heartbeats start and all registers are initialized.
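The retriggerable heartbeat watchdog might be modeled as follows; the millisecond tick and the timeout window are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HEARTBEAT_TIMEOUT_MS 100   /* assumed watchdog window */

    typedef struct {
        bool     started;       /* timer starts on the first heartbeat */
        uint32_t last_beat_ms;
    } watchdog_t;

    /* Software calls this periodically; per the text, the first call
     * both starts the timer and enables control-register access. */
    static void heartbeat(watchdog_t *w, uint32_t now_ms)
    {
        w->started = true;          /* retrigger */
        w->last_beat_ms = now_ms;
    }

    /* Hardware-side check: once started, a missed window isolates the
     * card; before the first beat the circuit merely idles, so a card
     * is never isolated at power-up, cold start, or reset. */
    static bool watchdog_expired(const watchdog_t *w, uint32_t now_ms)
    {
        return w->started &&
               (now_ms - w->last_beat_ms) > HEARTBEAT_TIMEOUT_MS;
    }

    int main(void)
    {
        watchdog_t w = { false, 0 };
        heartbeat(&w, 10);                 /* first beat starts the timer */
        printf("expired at t=50:  %d\n", watchdog_expired(&w, 50));
        printf("expired at t=200: %d\n", watchdog_expired(&w, 200));
        return 0;
    }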

[0091] Figure 11 depicts one example of an isolation sequence that may ensure that a processor is active before granting any register access. More particularly, Figure 11 depicts a process 1150 for isolating a card from the system 1030. As shown in Figure 11, the process begins in a step 1152 wherein the card powers up from a cold start. Once powered up, the process proceeds to step 1154, wherein an onboard software process waits to detect a heartbeat generated, typically, by the local oscillator or a circuit connected to the local oscillator. The heartbeat detect step 1154 waits for a heartbeat to ensure that the processor is up and active before any register access is granted. After step 1154, the process 1150 proceeds to step 1156, which is an idle mode. In the idle mode of step 1156, the circuit waits for an access attempt by the processor. Departure from step 1156 requires that a successful unlock sequence be carried out. If there is a successful unlock operation, then the process 1150 proceeds from step 1156 to step 1160.

[0092] At step 1160, the process 1150 waits to determine whether or not there is a heartbeat. This optional step 1160 ensures that the processor is still alive.

Once the heartbeat is detected, the process proceeds to step 1162, where it waits for the control register access to occur. In this step, the process 1150 will allow a write to the control register. Thus, in step 1162 the process 1150 grants access to the control register. After step 1162, if the access has occurred or there has been a timeout, the process can proceed back to step 1156, wherein process 1150 will idle until driven from that state.

[0093] In the alternative, at step 1162, if a heartbeat failure is detected, the process proceeds to step 1158, wherein the system is deemed to have failed and the health status is set to null, indicating that the card is not healthy. It will be noted that state 1158, indicating the card has failed, is accessible from states 1156 and 1160, either because a heartbeat failure occurs or because the system 1030 was unable to perform an unlock sequence on the control register. Thus, as shown, for any access to be successful, the following state process is to occur: the card is to present an active software heartbeat to the watchdog, and it is to pass an unlock sequence each time it desires access to a system control function. This mechanism reduces the timeframe in which the processor has access to the system control functions. Without the unlock mechanism, the processor could access the control functions at any time. With the unlock mechanism, the system 1030 is only vulnerable to processor failure between the unlock request and the actual register access. This substantially reduces the probability of failure (a double or greater fault is now required). The isolation level increases to complete isolation in the event of heartbeat or protocol failure, as the circuitry enacts full and immediate protection. These system functions do not require high-speed access, so the insertion of unlock time is not an issue.

[0094] During the intervening period of time post failure, but pre-isolation, the system 1030 is vulnerable to misbehavior by the defective card. Isolation may be bounded by hardware detection (i.e., watchdog timer, etc.) and/or software detection (voting input from other cards). During the time before isolation is enacted by the control response from the first mechanism that detects failure, the common system control functions are at risk. The unlock firewall protocol protects the memory locations during this time. Thus, a firewall unlock process may also be employed to protect against memory failures. One such process is depicted in Figure 12.

[0095] Thus, for example, in a system that requires a number of cards to vote on the health of a particular card before that card is deemed unhealthy, a meaningful period of time may pass before a faulty card is detected and taken out of commission. To guard against the failures that may occur during this time period, the system 1030 may include a firewall with an unlock procedure or protocol that mitigates the likelihood that the failing card will affect the system 1030 during this intervening period.

[0096] Turning to Figure 12, one process 1270 for unlocking a firewall is depicted. As shown therein, the process 1270 may begin in a step 1272 wherein the card powers up from a cold start. After step 1272, the process may proceed to step 1274, wherein the process waits to detect the heartbeat of the card. If a heartbeat is detected, then the system may proceed to step 1276, wherein an access is made at a particular location, typically a unique address. In step 1276, a counter is read, and that information is employed in step 1278 to access a code and count. If that access succeeds and the address was correct, then the process proceeds to step 1280, wherein an access can be made to a matching control register. If this access is successful, then the process may return to step 1274, wherein it waits for a heartbeat to be detected. As shown in Figure 12, if at any point during the process 1270 a step fails to complete successfully, such as because there is a boundary violation, a write access has occurred to a wrong address, a wrong-code timeout has occurred, or a bad value has been read, the system can proceed to step 1282, wherein the unlock process fails. Optionally, the failure of the unlock process can lead to an indication that the health of the card is questionable or has failed. This information can be used internally for allowing the local card to arbitrate its own health status.
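One way to render the Figure 12 unlock firewall in software is the state machine below. The specific unlock address, code value, and the code-with-count check are invented for the sketch; the patent specifies only that a sequence of predetermined accesses must succeed before the matching control register opens.

    #include <stdint.h>
    #include <stdio.h>

    typedef enum {
        LOCK_IDLE,     /* waiting for the access at the unique address */
        LOCK_CODE,     /* counter read; expect the code-and-count access */
        LOCK_OPEN,     /* one matching control-register access allowed */
        LOCK_FAILED    /* boundary violation, wrong address/code, timeout */
    } lock_state_t;

    #define UNLOCK_ADDR 0x0040u   /* assumed unique unlock address */
    #define UNLOCK_CODE 0xA5u     /* assumed expected code */

    typedef struct {
        lock_state_t state;
        uint8_t      count;   /* counter value read at the unlock step */
    } firewall_t;

    /* Drive the lock one step per register access; any out-of-sequence
     * access fails the unlock, which may in turn flag the local card's
     * health as suspect. */
    static void firewall_access(firewall_t *f, uint16_t addr, uint8_t value)
    {
        switch (f->state) {
        case LOCK_IDLE:
            f->state = (addr == UNLOCK_ADDR) ? LOCK_CODE : LOCK_FAILED;
            break;
        case LOCK_CODE:
            /* The code must incorporate the previously read count. */
            f->state = (value == (uint8_t)(UNLOCK_CODE ^ f->count))
                           ? LOCK_OPEN : LOCK_FAILED;
            break;
        case LOCK_OPEN:
            f->state = LOCK_IDLE;   /* relock after the single access */
            break;
        default:
            break;                  /* LOCK_FAILED is sticky here */
        }
    }

    int main(void)
    {
        firewall_t f = { LOCK_IDLE, 0x3C };              /* count as if read */
        firewall_access(&f, UNLOCK_ADDR, 0);             /* step 1276 */
        firewall_access(&f, 0x0041, UNLOCK_CODE ^ 0x3C); /* step 1278 */
        printf("%s\n", f.state == LOCK_OPEN ? "unlocked" : "failed");
        return 0;
    }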

[0097] In certain embodiments, there are six known events that drive card isolation. The card's access to the common system function nets will be isolated when: the card's own redundancy circuit detects a heartbeat timeout, and therefore isolates the card; the voting inputs of the other cards are not sufficient to keep the card active in the system; the software, through proper access to the firewall unlock protocol, voluntarily disables the card; an unlock sequence failure occurs; a parity error occurs; or reset is asserted to the card.

[0098] Improving LAN Availability

[0099] Presently disclosed is a method and apparatus for improving local area network (LAN) availability by implementing a standards-based link up/link down status detection protocol on segment-to-segment communications paths. Also disclosed is a method to increase data throughput by employing a compaction method that substitutes fixed values in a packet header with a tag value. This reduces the amount of data that needs to be processed and allows for quicker amortization of overhead.

[00100] The protocol for determining the status may employ the industry-standard Logical Link Control (LLC) Type 1 "test frame," described in IEEE Standard 802.2, to provide Ethernet status test messages and return responses. Continuous status information thus provided enables greater LAN efficiency by enabling rapid routing table updates in the LAN (or attached WAN), thus avoiding inefficient routing to or through disabled or unavailable (down) nodes.

[00101] According to one embodiment, a method of improving network availability in a segmented network includes the acts of periodically transmitting a test message over a plurality of communication links from a source node in communication with a source network segment to a plurality of destination nodes, each of the plurality of destination nodes being in communication with a respective destination network segment; generating, for each of the plurality of destination nodes, a return message if the test message is received at the destination node; determining the status of each of the plurality of communication links in response to the return messages generated by the plurality of destination nodes; and providing the status of the plurality of communication links to each of the plurality of destination nodes that generated a return message.

[00102] In another exemplary embodiment, the method disclosed first detects the initial state of the network by observing the routing table at the local host or node on which the method is operating. That node may be a router, load balancer, firewall, special-purpose device or simply a host connected to the network.

[00103] Next, messages are sent by that node to all connected nodes on the network. These messages are preferably LLC Type 1 test frame messages, but other standard types of compact messages are useable.

[00104] In one embodiment, the present method may operate simultaneously on all nodes in the network segment to be protected. Each node then performs its own self-discovery of adjacency and the status of the adjacent nodes and links. This information is then used to update an adjacency status table at each node with adjacency information seen from the perspective of that node.

[00105] In an alternate embodiment, less than all of the nodes in the segment may utilize the present method. More than one node should preferably operate, however, in order to provide redundancy.

[00106] In another embodiment, a fault in one of the one or more paths may be present if the source node does not receive at least a predetermined number of return messages from the destination nodes in response to a predetermined number of test messages transmitted to the destination nodes.

[00107] The status can be determined by waiting a pre-determined period of time for a return acknowledgment message, in one embodiment a simple echo of the transmitted test frame. If the status of any node has changed, as denoted by the failure to receive a return message from that node, signifying either a node or a link failure, the sending node updates its local adjacency status table. The status changes may then be incorporated into the local RIB/routing table, which is then propagated to all other routers on the network through standard means well known in the art.

[00108] Because each router will update its adjacency status table each time the local message/response cycle is completed, reflecting the true state of all links, LAN efficiency will be improved by avoiding routes through dead links or to unresponsive nodes. For example, a response wait period of approximately one second will allow router table updates approximately every few seconds, instead of the 5 to 10 minutes seen in the prior art. A test message is typically not sent within the same segment.

[00109] One or more of the nodes performing the above status discovery process may be, in some embodiments, simply one of the hosts on the network, or a dedicated device configured to act as a router (as that term and function is known in the art) with the added functionality necessary to implement the presently disclosed methods. Alternately, one or more of the status-discovering nodes may be a specially adapted hardware and/or software device dedicated to this function.

[00110] In an alternate embodiment, the local node may update its copy of the network routing table directly upon determining that a node on the network (or network segment) has not responded to the test message. The modified routing table may then be advertised and propagated to all other routers on the network.

[00111] According to another aspect of the invention, a system for improving availability includes a plurality of destination nodes, each in communication with a respective one of a plurality of destination network segments, each of the destination nodes configured to receive a test message through one of a plurality of communication links and generate a return message; a source node in communication with each of the plurality of destination nodes, the source node configured to provide a test message to each of the plurality of destination nodes and to determine the status of each of the plurality of communication links in response to the return messages; and a configuration update module in communication with the source node and the plurality of destination nodes, the configuration update module providing a status message to each of the destination nodes that provides a return message to the source node.

[00112] According to yet another aspect of the invention, a system for improving network availability in a segmented network includes a first network segment having a plurality of connected source nodes; a second network segment having a plurality of connected destination nodes, the second network segment connected to the first network segment over one or more paths; identification means for identifying from one or more source nodes one or more cooperating destination nodes; transmission means for periodically transmitting a test message over the one or more paths from a source node to one or more destination nodes, the transmission means, in response to a return message received from the destination nodes, determining the status of the one or more paths; and status update means for providing the status to each of the plurality of destination nodes that generated a return message.

[00113] According to yet another aspect of the invention, a 'compaction' method 'substitutes' fixed values in a packet header with a 'tag' value. In one embodiment, IPv4 frames that are not optioned and not fragmented are selected, which allows for removing the 'version', 'ihl', 'flags', and 'fragment offset' fields, saving 3 bytes. The Total Length and Checksum fields are then removed, saving an additional 4 bytes. Five bits are removed from the Type of Service field, and three bits are removed from the Time to Live field.

[00114] Figure 13 is a high-level block diagram of a typical LAN 1310 comprised of two segments 1312 and 1314. Each segment contains multiple links 1320 between nodes 1325. Nodes 1325 may be hosts, routers, load balancers, firewalls, or any other network device currently known or yet to be deployed in a network. Routers 1330A, 1330B, and 1330C are also nodes on the segments. Routers 1330A and 1330B can communicate with each other over paths 1 and 2, thereby connecting segments 1312 and 1314.

[00115] Network segments 1312 and 1314 may be Ethernet networks, although the present disclosure is broadly applicable to other network protocols.

Stated more formally, although an Ethernet is described, those skilled in the art will realize that networks other than those utilizing Ethernet protocols can be used.

Accordingly, the invention is not limited to any particular type of network.

[00116] Router 1330A, in one exemplary embodiment, may be configured to act as one of the status-discovering nodes for segment 1312. As such, router 1330A sends messages to all external (to segment 1312) nodes 1325, one node at a time, to see if the paths to them (e.g., paths 1 or 2) are operational. These messages may be LLC Type 1 test frames, although any short test messages with a regular and predefined format may be used. The Logical Link Control (LLC) layer is the higher of the two data link layer sub-layers defined by the IEEE in its Ethernet standards. The LLC sub-layer handles error control, flow control, framing, and MAC-sub-layer addressing. The most prevalent LLC protocol is IEEE Standard 802.2, which includes both connectionless and connection-oriented variants. As IEEE Standard 802.2 is well known to those of ordinary skill in the art, further definition and description herein is unnecessary.

[00117] Test frames are not sent to locally attached nodes, i.e., hosts 1325 within segment 1312, in order to reduce intra-segment traffic. Only nodes outside of segment 1312 (referred to herein as "destination" nodes) are sent messages.

[00118] Return messages are generated by the destination nodes and sent back to the source node (i.e., the status-discovering node) for collection and matching to transmitted test messages. The return message may be a simple echo of the test message, or a different, confirming message may be sent. Either way, the presence of a return message acknowledging (in some sense) the transmitted message provides a complete, end-to-end test of path continuity and therefore its status.

[00119] One advantage of using the LLC Type 1 test message is that it is purely a Layer 2 approach that does not propagate any overhead to Layer 3 or above in the protocol stack. Accordingly, the low overhead on the source and destination nodes makes for low round-trip delay and hence improved link fault detection timeliness.

[00120] Note that this statusing approach differs from the link integrity test used to determine the health of a link as far back as 10Base-T Ethernet. As described in the Cisco Press Internetworking Technology Handbook (online, at http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/index.htm, Chapter 2, accessed September 20, 2002):

[00121] 10Base-T was also the first Ethernet version to include a link integrity test to determine the health of the link. Immediately after power-up, the physical medium attachment (PMA) sublayer transmits a normal link pulse (NLP) to tell the NIC at the other end of the link that this NIC wants to establish an active link connection:

[00122] If the NIC at the other end of the link is also powered up, it responds with its own NLP.

[00123] If the NIC at the other end of the link is not powered up, this NIC continues sending an NLP about once every 16 ms until it receives a response.

[00124] The link is activated only after both NICs are capable of exchanging valid NLPs.

[00125] Clearly, the 10Base-T integrity check is only used at initial power-up, to establish the link between the Network Interface Cards (NICs) in two hosts.

The statusing mechanism herein described, by contrast, operates continuously to keep track of segment host status. Indeed, in some exemplary embodiments, the status test message is sent approximately once per second (rather than once only, at initialization, as in the prior art) in order to keep all status information current.

[00126] Figure 14 illustrates, in flowchart form, the process whereby the network efficiency is improved by the present disclosure. The process begins on power-up of a status-detecting node, 1410. Initially, each status-detecting node performs a discovery step 1415 to identify its nearest (adjacent) network neighbors outside of the status host's own network segment and their status, using conventional means. Alternatively, a status-detecting node may refer to the initial status and adjacency information supplied to it in a local configuration file.

[00127] Next, the status-detecting node begins sending test messages, 1420, to each nearest neighbor not within the status-detecting node's own segment. After each message, the status-detecting node waits a pre-determined time (on the order of 500 milliseconds) for a response, 1430. Test 1440 is a simple binary test on the reply received: if the reply matches the expected message (branch 1442), then the link or path is up and working. The status of that connection is then marked as "up" in the local adjacency status table, 1444.
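The loop of Figure 14 might be sketched as follows. The send/receive primitives are stubbed, since a real implementation would exchange raw 802.2 LLC Type 1 frames on the NIC; the table size and names are assumptions, while the timing values follow the text.

    #include <stdbool.h>
    #include <stdio.h>

    #define WAIT_MS  500    /* per-link reply wait (text's example) */
    #define CYCLE_MS 1000   /* pause before the next neighbor */

    typedef enum { LINK_UP, LINK_DOWN } link_status_t;

    typedef struct {
        const char    *name;
        link_status_t  status;   /* adjacency status table entry */
    } neighbor_t;

    /* Stubs for illustration; real code would put LLC test frames on
     * the wire and collect the echoes. */
    static void send_test_frame(const neighbor_t *n) { (void)n; }
    static bool wait_reply(const neighbor_t *n, int ms)
    { (void)n; (void)ms; return true; }

    static void status_cycle(neighbor_t *tbl, int n)
    {
        for (int i = 0; i < n; i++) {
            send_test_frame(&tbl[i]);                   /* step 1420 */
            bool ok = wait_reply(&tbl[i], WAIT_MS);     /* step 1430 */
            link_status_t s = ok ? LINK_UP : LINK_DOWN; /* test 1440 */
            if (s != tbl[i].status) {
                tbl[i].status = s;    /* steps 1444/1448: mark table */
                printf("%s marked %s; RIB update forced\n",
                       tbl[i].name, ok ? "up" : "down"); /* step 1455 */
            }
            /* wait CYCLE_MS before the next message (step 1460) */
        }
    }

    int main(void)
    {
        neighbor_t table[] = { { "router-1330B", LINK_DOWN } };
        status_cycle(table, 1);   /* stub reply marks the link up */
        return 0;
    }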

[00128] In some embodiments, the local adjacency status table is a separate table in the local routing information base (RIB); it may also be separate and distinct from the RIB. The adjacency status table is not, however, a part of the local routing table when that term is used as implying a distinction from the RIB.

[00129] If, however, the return message is not as expected or does not arrive at all within the pre-determined wait time, branch 1446 is taken and the link path status is marked as "down" in step 1448.

[00130] In a preferred embodiment, the pre-determined wait time is specified in a configuration table (or file) supplied to the status discovery process or coded into software as a default value of, for example, one second. This link-specific wait time may then be adjusted (not shown) according to the (known) speed of each link and the actual round-trip time (RTT) through means well known to those of ordinary skill in the art. Thus, for distant (long) links operating at slow speeds, the discovery process will increase the link-specific wait time during the initial discovery.

In particular, the method will never mark a link as "down" until it first verifies the RTT wait time by finding (and marking) the link as "up," as depicted by secondary test 1470.

[00131] In marking the link down in the adjacency status table, there may be several degrees of "down" indicated. The link may be down because it is overly congested, i.e., when no replies are received in the wait period for several tries.

Alternately, the link may be marked down because the destination node is itself down or congested. Furthermore, the link may be down because the network or a segment thereof is down, as signaled through, for example, a routine routing table update. This information may be included by using different symbols for the different states or by encoding the information using two or more bits through methods well known in the art.

[00132] The updated path status from either step 1444 or 1448 is then used to update the local node's adjacency status table 1450, which in turn forces a Routing Information Base (RIB) update, 1455. The process waits approximately one second, 1460, before sending a test message to the next host in step 1420, repeating the cycle indefinitely or until commanded to cease or power-down. (As noted above, in some embodiments the wait time is dynamically adjusted to reflect the actual RTT to each node).

[00133] The wait durations described above are examples only. Longer or shorter wait times 1430 (before declaring a lack of response message as a link "down" indicator) and 1460 (recycle time between messages) are also useable. The length of the wait determines the degree to which message traffic overhead (from the test messages and their responses) impacts the overall network's performance. Obviously, longer waits (especially at recycle step 1460) decrease message overhead, but at the cost of additional latency before status updates hit the router table and can be propagated through the network.

[00134] The present method can be practiced by a single node, by a plurality of nodes, or by all nodes in a segment or network. When multiple nodes each act as independent status discoverers, very rapid RIB/routing table updates will result as nodes, links, or paths come up or go down. In such a scenario, link state information may be updated on the order of once every five or ten seconds, a significant improvement over prior methods of monitoring link status.

[00135] According to another practice for improving network efficiency, a 'compaction' method is described herein that will 'substitute' fixed values in a packet header with a 'tag' value to reduce switching overhead. It is common for data and telecommunications switching gear to use ATM switch fabrics as their core switching matrix. These switch fabrics provide deterministic switching bandwidth at a commodity cost to the vendor. However, they require that variable-length packet data be "cellified" (broken into cells) before transmission over the switch matrix. This "cellification" process induces overhead into the I/O bandwidth of the data stream, increasing the total amount of bandwidth required to carry a given set of packets. The cellification process adds two types of overhead: a cell header, which provides switch routing and cell reassembly control, and padding added to cells when the data frame is not an even multiple of the cell size (which is commonly the case). This cell overhead is commonly referred to as the cell tax. The cell tax is especially painful when the original packet data size only slightly exceeds a single cell size. For these cases the cell tax is over 100% (twice the cell-header overhead plus padding of the cell size minus 1 byte).

[00136] An exemplary IP frame header is given in the Table below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Source Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

[00137] For example, take a cell size of 64 bytes with a four-byte cell header (i.e., each cell can carry 60 bytes of data). In this case a 60-byte frame exactly fits into a 64-byte cell and only incurs a 6.6% increase in the I/O bandwidth consumed across the switch fabric. However, a 61-byte frame requires two cells and incurs 109% overhead across the switch plane.

[00138] This 'overhead' effect is quickly amortized for packets that exceed two cell sizes, so it is only for small packet sizes just over the cell size that this inefficiency occurs. A solution to this problem requires that the initial packet size be reduced before the cellification process. While 'compression' algorithms exist, they require intense processor cycles and software complexity, as do label-based path substitution algorithms. Therefore, an approach that is stateless and applies to virtually all packets is preferable.
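The cell tax figures above follow directly from the cell geometry; the small program below reproduces them for the 64-byte cell with a four-byte header used in the example.

    #include <stdio.h>

    #define CELL_SIZE    64
    #define CELL_HEADER  4
    #define CELL_PAYLOAD (CELL_SIZE - CELL_HEADER)   /* 60 data bytes */

    /* Percentage of extra switch-fabric bandwidth consumed when a
     * frame of the given length is cellified. */
    static double cell_tax_pct(int frame_len)
    {
        int cells   = (frame_len + CELL_PAYLOAD - 1) / CELL_PAYLOAD;
        int carried = cells * CELL_SIZE;   /* bytes crossing the fabric */
        return 100.0 * (carried - frame_len) / frame_len;
    }

    int main(void)
    {
        printf("60-byte frame: %5.1f%% tax\n", cell_tax_pct(60)); /* ~6.7%   */
        printf("61-byte frame: %5.1f%% tax\n", cell_tax_pct(61)); /* ~109.8% */
        printf("53-byte frame: %5.1f%% tax\n", cell_tax_pct(53)); /* one cell */
        return 0;
    }

The 53-byte case shows the point of the compaction: removing 8 header bytes pulls a 61-byte frame back inside a single cell.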

[00139] Described herein is a 'compaction' method that will 'substitute' fixed values in a packet header with a 'tag' value. From analysis, it is a realization that the TCP/IP header in the example application carries 8 bytes that can be removed and substituted (from a 'standard' 20-byte header) by categorizing the following IP frame types as a standard frame type and recognizing some inherent aspects of IPv4 forwarding. The processes described herein then use a 'frame type code' as a tag across the switch fabric to indicate this type.

[00140] The following presents one example of a process and analysis that may be employed with the systems and methods described herein:

[00141] 1. Select IPv4 frames which are not optioned and not fragmented.

[00142] This allows for removing the 'version', 'ihl', 'flags', and 'fragment offset' fields (3 bytes).

[00143] 2. Remove the Total Length and Checksum fields.

[00144] The total length field is not needed once it is verified on input. The total frame length will be carried across the switch fabric in the frame header; thus it can be removed from the IP header. The checksum would have been verified on input and will need to be recalculated on output, so it can be removed. (4 bytes)

[00145] 3. Remove five bits from the Type of Service field.

[00146] 4. Remove three bits from the Time to Live field.

[00147] Both DSCP and IP Precedence mapping only use three bits of the TOS field. The maximum TTL value being used is 0x1f. (1 byte)

[00148] As set out above and as described herein, the compaction method will substitute fixed values in a packet header with a tag value. This reduces the amount of data that needs to be processed and allows for quicker amortization of overhead.
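For illustration, the four steps above might be applied to a 20-byte IPv4 header as sketched below. Field offsets follow RFC 791, but the packing of the surviving bits and the tag value are assumptions; the patented hardware is not bound to this layout.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define TAG_STD_IPV4 0x01   /* assumed 'frame type code' tag */

    /* The 12 surviving bytes of the original 20-byte header. */
    typedef struct {
        uint8_t  tos3_ttl5;      /* 3 kept TOS bits | 5 kept TTL bits */
        uint8_t  protocol;
        uint16_t identification;
        uint32_t src;
        uint32_t dst;
    } compact_hdr_t;

    /* Compact a standard (not optioned, not fragmented) IPv4 header.
     * Returns the tag on success, 0 if the frame does not qualify. */
    static int compact_ipv4(const uint8_t h[20], compact_hdr_t *out)
    {
        if (h[0] != 0x45)              /* step 1: version 4, IHL 5 only */
            return 0;
        if ((h[6] & 0x3F) || h[7])     /* reject MF flag or any offset
                                          (DF alone is acceptable)     */
            return 0;

        uint8_t tos3 = (h[1] >> 5) & 0x07;   /* step 3: keep 3 TOS bits */
        uint8_t ttl5 = h[8] & 0x1F;          /* step 4: keep 5 TTL bits */

        out->tos3_ttl5 = (uint8_t)((tos3 << 5) | ttl5);
        out->protocol  = h[9];
        memcpy(&out->identification, h + 4, 2);
        memcpy(&out->src, h + 12, 4);
        memcpy(&out->dst, h + 16, 4);
        /* Step 2 and the rest of step 1: total length, checksum,
         * version/ihl and flags/fragment-offset are simply dropped and
         * reconstructed on output. */
        return TAG_STD_IPV4;
    }

    int main(void)
    {
        uint8_t hdr[20] = { 0x45, 0xE0, 0x00, 0x28,   /* ver/ihl, TOS, len */
                            0x12, 0x34, 0x40, 0x00,   /* ident, DF, offset */
                            0x1F, 0x06, 0x00, 0x00,   /* TTL, proto, csum  */
                            10, 0, 0, 1, 10, 0, 0, 2 };
        compact_hdr_t c;
        if (compact_ipv4(hdr, &c))
            printf("compacted 20 -> %zu bytes\n", sizeof c);
        return 0;
    }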

[00149] The order in which the steps of the methods are performed is purely illustrative in nature. In fact, the steps can be performed in any order or in parallel, unless otherwise indicated.

[00150] The methods described herein may be performed in hardware, software, or any combination thereof, as those terms are currently known in the art. In particular, the present methods may be carried out by software, firmware, or microcode operating on a computer or computers of any type. Additionally, software embodying the invention may comprise computer instructions in any form (e.g., source code, object code, microcode, interpreted code, etc.) stored in any computer-readable medium (e.g., ROM, RAM, flash memory, magnetic media, punched tape or card, compact disc (CD) in any form, DVD, etc.). Furthermore, for purposes of clarity in illustration, the systems and methods described herein are discussed with reference to telephony applications.

However, the systems and methods described herein are not limited and may be employed in other applications including other uses of data packet technologies, including a range of multimedia services, multimedia conferencing, interactive gaming, video on demand, distance learning and general multimedia applications.

Furthermore, such software may also be in the form of a computer data signal embodied in a carrier wave, such as that found within the well-known Web pages transferred among devices connected to the Internet. Accordingly, the present invention is not limited to any particular platform, unless specifically stated otherwise in the present disclosure.

[00151] While particular embodiments of the present invention have been shown and described, it will be apparent to those skilled in the art that changes and modifications may be made without departing from this invention in its broader aspect and, therefore, the appended claims are to encompass within their scope all such changes and modifications as fall within the true spirit of this invention.