Title:
NETWORKING MODULE AND METHOD FOR FAULT-TOLERANT TRANSMISSION OF SYSTEM MANAGEMENT INFORMATION
Document Type and Number:
WIPO Patent Application WO/1995/020853
Kind Code:
A1
Abstract:
A fault-tolerant module (10) and method for a networking chassis provides a primary path (20) for transmission of system management information and a secondary path (26) for transmission of system management information in the event of failure of the primary path. The primary path includes a first microprocessor controller (22), coupled between the first system management bus and a processor (CPU) (14) located on the networking module. The secondary path includes a second microprocessor controller (28) system and a dual-port memory (30). The second controller system is coupled to the second system management bus and to a first port of the memory, and a second port of the memory is coupled to the CPU. The dual-port memory thus provides an interface between the CPU and the second microprocessor controller system, thus providing isolation and allowing the memory to be accessible by either processor. Environmental information and module identification information are stored in the memory; in the event of failure of the primary path, the information can be accessed and transmitted over the backup transmission path.

Inventors:
FEE BRENDAN
OLIVER CHRIS
Application Number:
PCT/US1995/001177
Publication Date:
August 03, 1995
Filing Date:
January 26, 1995
Assignee:
CABLETRON SYSTEMS INC (US)
International Classes:
G06F11/20; G06F13/00; H04L12/44; H04L69/40; (IPC1-7): H04L29/14; H04L12/40; G06F11/20
Other References:
D. GREENFIELD ET AL.: "SMART HUB VENDORS MOVE IN ON FDDI", DATA COMMUNICATIONS, vol. 20, no. 14, October 1991 (1991-10-01), NEW YORK US, pages 113 - 124
E. M. HINDIN: "WELLFLEET STRIKES BACK WITH ITS FASTEST ROUTER YET", DATA COMMUNICATIONS, vol. 20, no. 11, September 1991 (1991-09-01), NEW YORK US, pages 137 - 138
Claims:
CLAIMS
1. A networking module positionable in a chassis including a plurality of such modules and first and second system management buses connecting the modules, each module having a processor for processing data, characterized in a fault-tolerant communication and control system disposed on each module comprising: a first microprocessor controller coupled between the first system management bus and the processor, for forming a primary path for transmission of system management information; and a second microprocessor controller and a multi-port memory forming a secondary path for transmission of system management information, the second microprocessor controller coupled to the second system management bus and to a first port of the memory and a second port of the memory being coupled to the processor, wherein information stored in the memory is accessible by both the processor and the second microprocessor controller to enable fault-tolerant transmission on both system management buses.
2. The system of claim 1, wherein information concerning the status of the module is stored in the memory by the processor.
3. The system of claim 1, wherein the first microprocessor controller and the second microprocessor controller are powered by different power supplies.
4. The system of claim 1, wherein the second system controller includes a nonvolatile memory containing network variables.
5. The system of claim 4, wherein the second system controller includes a network variable monitoring protocol module for gathering, storing and transmitting the network variables.
6. The system of claim 4, wherein the network variables include at least one of environmental information and module identification information.
7. The system of claim 6, wherein the environmental information includes one or more of the power supply voltage, the power supply current, and the temperature of the module.
8. The system of claim 6, wherein the module identification information includes one or more of the module part number, serial number, and revision level.
9. The system of claim 1, wherein the module includes hardware and software for performing networking functions including at least one of bridging and routing.
10. The system of claim 1, further including a chassis and a chassis management agent for providing system management information on the busses.
11. The system of claim 1, including a reset line coupling the second system controller and processor for resetting the processor in the event of failure of the primary path.
12. The system of claim 1, including a reset line coupling the second system controller and processor for allowing the processor to reset the second controller system.
13. The system of claim 1, wherein the memory includes a nonvolatile memory.
14. The system of claim 1, wherein the multi-port memory is a dual-port memory.
15. In a chassis including a plurality of networking modules and first and second system management buses coupled to each networking module, each module having a processor for processing data, characterized in a fault-tolerant method of transmitting system management information on the buses, comprising the steps of: transmitting system management information over a primary path, the primary path including a first microprocessor controller coupled between the first system management bus and the processor; and transmitting system management information over a secondary path in the event of failure of the primary path, the secondary path including a second microprocessor controller coupled to the second system management bus and to a first port of a memory, and a second port of the memory being coupled to the processor.
Description:
NETWORKING MODULE AND METHOD FOR FAULT-TOLERANT TRANSMISSION OF SYSTEM MANAGEMENT INFORMATION

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates generally to system management buses used to control modules in a networking chassis. More particularly, the present invention relates to a fault-tolerant system management bus architecture for transmitting communications and control information between networking modules in a networking chassis.

Discussion of the Related Art

In a conventional networking chassis, a single network is connected to all of the networking modules for communication and control of individual modules. This is called an "out-of-band" network. The out-of-band network typically is not accessible from outside the chassis and is only used to transmit communication and control information ("system management information") between networking modules within the chassis. The network that connects the ports on the chassis together for purposes of transmitting data from one station or network segment to another is called the "in-band" network.

Furthermore, in a conventional networking chassis the central processing unit (CPU) located on the network module that controls communication over the in-band network is also used to control transmission over the out-of-band network. One problem with this type of system is that no fault tolerance is provided. If a problem occurs with the out-of-band network or with the networking module itself, there is no other path provided for transmission of system management information.

Some prior networking chassis have provided two out-of-band networks wherein if there is a failure in one of the networks, system management information can be transmitted over the second out-of-band network. However, even in these systems, if the CPU of the networking module is off or has failed, no information regarding the status of the networking module can be obtained.

Therefore, it is an object of the present invention to provide a networking module and method that provides fault-tolerant operation if there is a failure of the primary out-of-band network.

Another object is to provide a module and method that allows information to be gathered about a module even if there is a failure of the primary out-of-band network.

Another object is to provide a module and method that allows information to be gathered about the physical condition and environment of the networking module.

SUMMARY OF THE INVENTION

The present invention overcomes the disadvantages of the prior art by providing a fault-tolerant system management bus architecture for a networking chassis. The architecture provides a primary path for transmission of system management information and a secondary (back-up) path for transmission of system management information in the event of failure of the primary path. The networking chassis includes two system management buses coupled to each networking module. The primary path (primary out-of-band network) includes a first microprocessor controller coupled between the first system management bus and a processor (CPU) located on the networking module. The backup path (secondary out-of-band network) for transmission of system management information includes a second microprocessor controller system and a multi-port memory. The second controller system is coupled to the second bus and to a first port of the memory; a second port of the memory is coupled to the processor (located on the networking module). The memory provides an interface between the processor and the second controller system, thus providing isolation between the networking module and the second out-of-band network. The information stored in the memory is accessible by the processor on the networking module and the second microprocessor controller system.

In one embodiment of the invention, the primary system management bus is a ten megabit/second (10Mbps) ethernet network that operates according to IEEE Standard 802.3, and the second system management bus is a LOCALTALK network.

Preferably, the first microprocessor controller and the second microprocessor controller system are powered by different power supplies so that in the event of failure of the primary power source (that powers the first microprocessor controller), the backup path remains operational.

The primary path is used for transmission of system management information when the networking module and the first system management bus are operating properly. If the primary transmission path should fail, the second path may be used to obtain information concerning the status of the CPU on the networking module as well as to provide some limited processing of data that would normally be transmitted over the primary out-of-band network or processed by the CPU on the networking module.

Additionally, the second microprocessor controller system monitors environmental information such as the temperature of the module, module voltages, and module currents, and stores this information in the multi-port memory. The memory may also store module identification information (received from the CPU) such as part number, serial number, and revision level of the networking module. The environmental information and the module identification information (collectively, the "network variables") are stored in a nonvolatile memory so that in the event of a module failure or a failure of the primary power supply, the information is not lost and can still be transmitted over the secondary out-of-band network to the chassis management agent.

Other features and advantages of the present invention will be more readily understood and apparent from the following detailed description of the invention, which should be read in conjunction with the accompanying drawings, and from the claims which are appended at the end of the detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, which are incorporated herein by reference and in which like elements have been given like reference characters,

FIG. 1 is a schematic block diagram of one embodiment of a fault-tolerant system management bus architecture according to the present invention;

FIG. 2 is a schematic block diagram of a second microprocessor controller system for transmission of control data over the secondary out-of-band network in the system of FIG. 1;

FIG. 3 is a listing of the information that may be stored in a dual-port memory and a non-volatile memory of the second microprocessor controller system of FIG. 2;

FIG. 4 is a schematic block diagram of the software modules used to program the second microprocessor controller system;

FIG. 5 is a diagram of a data structure used in the programs of FIG. 4;

FIG. 6 is a module map for the second microprocessor controller system; and

FIG. 7 is a module map for the networking module on the CPU side.

DETAILED DESCRIPTION

The subject matter of the present application may be advantageously combined with the subject matters of the following copending and commonly owned applications filed on January 28, 1994, and which are hereby incorporated by reference in their entirety:

U.S. Serial No. 08/187,856 entitled "Distributed Chassis Agent For Network Management" by Fee, et al.

U.S. Serial No. 08/188,238 entitled "Network Having Secure Fast Packet Switching and Guaranteed Quality of Service" by Dobbins, et al.

The present invention is particularly useful in the networking chassis described in the referenced applications, which chassis includes a plurality of networking modules and a chassis backplane. Referring to FIG. 1, there is shown one such networking module 10 and a chassis backplane 12. The networking module 10 includes a central processing unit (CPU) 14 that is used to control the processing carried out on the data received over the in-band network (not shown) by module 10. The CPU 14 controls the module hardware, shown schematically as block 16, over address, control, and data lines 18. Circuitry 16 includes that necessary to carry out the processing functions on module 10, such as bridging and/or routing. In one embodiment of networking module 10, CPU 14 is an i960 processor available from Intel Corporation. The present invention may be used with any type of processor 14 and module hardware 16.

The primary path (out-of-band network) includes system management bus 20 and first microprocessor controller 22. CPU 14 receives system management information from bus 20 via controller 22 and the set of address, data, and control lines 24. In one embodiment, controller 22 is a DP83932B systems oriented network interface controller, available from National Semiconductor Corporation, and system management bus 20 is a conventional ethernet (IEEE 802.3) network operating with a bit transfer rate of ten megabits/second. System management bus 20 and microprocessor controller 22 form the primary path between networking module 10 and a chassis management agent (not shown) which provides the system management information on the buses.

A second system management bus 26 is also provided on chassis backplane 12. A second microprocessor controller system 28 is used to couple second bus 26 to module CPU 14 through a dual-port RAM 30 via two sets of address, control, and data lines 32 and 34. The combination of second bus 26, controller system 28, and dual-port RAM 30 provides a secondary or backup path for transmission of system management information. In one embodiment, system management bus 26 may be a LOCALTALK serial bus having an increased clock rate such that data may be transmitted over the bus at a rate of 1 megabit/second. LOCALTALK hardware specifications and software protocols are described in "Inside AppleTalk," second edition, published by Apple Computer, Inc., 1990, which is incorporated herein by reference.

A reset line 36 is provided from microprocessor system 28 to the CPU 14, allowing CPU 14 to be reset in the event of failure of, for example, the primary system management path or the software running on the module 10. In a like manner, reset line 38 is provided from module CPU 14 to the second controller system 28 to allow CPU 14 to reset the second controller system 28.

In this embodiment the secondary system management path is powered by a separate power supply from that used to power the rest of the circuitry on module 10. The second controller system 28 is powered by a separate 5-volt supply line 39 on chassis backplane 12. The remainder of module 10, including CPU 14 and the primary system management path is powered by a separate 48-volt supply line 40 on chassis backplane 12.

The 48-volt supply line is connected to a DC-to-DC converter 42, which converts the 48 volts into appropriate voltages (e.g., 5 volts) for supplying power to the various electrical components on networking module 10 (i.e., module hardware 16, CPU 14, and first controller 22). An enable line 44 from the second controller system 28 is used to enable or disable the converter 42, thus allowing the second controller system 28 to turn the module 10 on and off.

An analog-to-digital (A/D) converter 46 is also provided on module 10 to allow for monitoring of environmental variables. The digital output of A/D converter 46 is coupled to the second controller system 28 via data and control lines 48. A/D converter 46 is used to convert analog signals from DC-to-DC converter 42 representative of: the voltage of the supply line 40, the current delivered by the supply line 40, the voltage output by converter 42, and the current output by converter 42 over lines 50, 52, 54, and 56, respectively, so that these parameters can be monitored. In addition, A/D converter 46 receives a signal from temperature sensor 58, thus allowing the temperature of module 10 and its environment to be monitored. In one embodiment, A/D converter 46 is model number MC14051, available from Motorola Semiconductor.

The environmental information gathered by the second controller system 28 may be passed between networking modules via second system management bus 26 because this information is generally relatively small data packets that do not require the capabilities of the primary system management bus. However, the module processor 14 and the primary system management bus have access to the environmental information because it is stored in dual-port memory 30.

Reference is now made to FIG. 2 which illustrates in more detail the second microprocessor controller system 28. In one embodiment, the second controller system includes a microprocessor 70, which may be a Z80180 microprocessor available from Zilog Corporation. The second controller system 28 also includes a light emitting diode (LED) subsystem 72 including a programmable array logic (PAL) 74 and light emitting diodes 76. The subsystem 72 may be used to indicate the status of networking module 10 to a user. A bus driver, such as an RS-422 driver 78, is used to interface microprocessor 70 to second system management bus 26. Various memories 30, 82, 84, and 86 are coupled to microprocessor 70 using address and data lines 88 and 90, respectively. Memory 82 (EEPROM, 2K) is a nonvolatile memory used to store identification information about module 10, such as that shown in FIG. 3. Memory 84 (static RAM, 28K) is working memory for the microprocessor 70. Memory 86 (EPROM, 32K) is used to store the program that operates microprocessor 70.

As previously discussed, memory 30 (Dual Port RAM, 2K) provides an interface between second controller microprocessor 70 and CPU 14 on the networking module, and provides isolation between second controller system 28 and CPU 14. Dual-port memory 30 may be an IDT 7321 CMOS RAM available from Integrated Device Technology, Inc. An interrupt system is used to alert CPU 14 and microprocessor 70 to communications from either processor, by writing to a predefined memory location. For example, microprocessor 70 may write data into a predefined memory location, which causes memory 30 to generate an interrupt to processor 14. When processor 14 reads the data from the predefined memory location, an interrupt flag is reset and processor 14 can resume execution of its program. More generally, control of dual-port RAM 30 may be accomplished by using a shared memory manager program stored in program memory 86 as follows.
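The mailbox-style interrupt described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, the method names, and the `irq` dictionary are hypothetical, and the two mailbox offsets are taken from the module maps given in the Shared Memory Manager section (the SSC writes offset 7FEh to interrupt the host, and the host writes offset 7FFh to interrupt the SSC). Real dual-port RAM hardware raises the interrupt in silicon; the model merely mimics the set-on-write, clear-on-read behavior.

```python
class DualPortRam:
    """Toy model of a 2K dual-port RAM with one mailbox byte per direction."""

    SIZE = 0x800
    TO_HOST = 0x7FE  # mailbox written by the SSC, read by the host (per the module maps)
    TO_SSC = 0x7FF   # mailbox written by the host, read by the SSC (per the module maps)

    def __init__(self):
        self.cells = bytearray(self.SIZE)
        # Hypothetical interrupt flags, one per mailbox location.
        self.irq = {self.TO_HOST: False, self.TO_SSC: False}

    def write(self, offset, value):
        self.cells[offset] = value & 0xFF
        if offset in self.irq:
            self.irq[offset] = True   # writing a mailbox raises the interrupt

    def read(self, offset):
        if offset in self.irq:
            self.irq[offset] = False  # reading the mailbox resets the flag
        return self.cells[offset]
```

In this model the writer never touches the flag directly, which mirrors the description: the interrupt is a side effect of the memory access, so neither processor needs a separate signalling channel.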

Shared Memory Manager

Figs. 6-7 illustrate the shared memory manager; Fig. 6 is a module map for the second microprocessor controller system side, and Fig. 7 is a module map for the networking module CPU side. In the figures: "SSC" means the second microprocessor controller system; and "DP" means the dual-port RAM memory.

With regard to Fig. 6, shared memory is located at F800h - FFFFh, where F803h - FBFEh is writable by the SSC. F800h is the ack byte and F801h is the ack-ack byte for the SSC.

This module will receive a hardware interrupt from the host when there is a message in shared memory or when there is a diagnostic message to read from the interrupt code byte at 0FFFFh. It will interrupt the host by writing to location 0FFFEh.

The interrupt codes are:

(nondiagnostic interrupt code)
00 non-diagnostic message in shared memory

(diagnostic interrupt codes)
req 01 request status (written by SSC or host)
req 02 SSC should read data=addr pattern in DP (written by host)
req 03 SSC should read data=~addr pattern in DP (written by host)
req 04 SSC should write data=addr pattern in DP (written by host)
req 05 SSC should write data=~addr pattern in DP (written by host)
ack 81 running power-up diagnostics (host_status = testing) (written by host)
ack 82 running peripheral diagnostics (host_status = testing) (written by host)
ack 83 running operational firmware (host_status = OK) (written by SSC or host)
ack 84 SSC read data=addr pattern in DP (written by SSC)
ack 85 SSC read data=~addr pattern in DP (written by SSC)
ack 86 SSC wrote data=addr pattern in DP (written by SSC)
ack 87 SSC wrote data=~addr pattern in DP (written by SSC)
err F0 error running power-up diagnostics (host_status = failure) (written by host)
err F1 error running peripheral diagnostics (host_status = crippled) (written by SSC or host)
err F2 error running operational firmware (host_status = failure) (written by host)
err F3 SSC error reading data=addr pattern in DP (set ssc_status and host_status to no DPRAM communication) (written by SSC)
err F4 SSC error reading data=~addr pattern in DP (indicate DPRAM failure) (written by SSC)
err F5 SSC error writing data=addr pattern in DP (not sure how the SSC can detect) (written by SSC)
err F6 SSC error writing data=~addr pattern in DP (an error writing to DPRAM) (written by SSC)
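The listed codes fall into numeric bands (00, 01-05, 81-87, F0-F6), so a reader of the interrupt code byte could classify a value before acting on it. The following is a minimal sketch of that idea, not the disclosed firmware: the function name, the class labels, and the `HOST_STATUS` table name are hypothetical, while the code values and host_status strings come from the list above.

```python
# host_status values implied by the diagnostic codes listed above.
HOST_STATUS = {
    0x81: "testing",
    0x82: "testing",
    0x83: "OK",
    0xF0: "failure",
    0xF1: "crippled",
    0xF2: "failure",
}


def classify_interrupt_code(code):
    """Return 'message', 'req', 'ack', 'err', or 'unknown' for a one-byte code."""
    if code == 0x00:
        return "message"   # non-diagnostic message in shared memory
    if 0x01 <= code <= 0x05:
        return "req"       # diagnostic request
    if 0x81 <= code <= 0x87:
        return "ack"       # diagnostic acknowledgement
    if 0xF0 <= code <= 0xF6:
        return "err"       # diagnostic error
    return "unknown"
```

Values outside the listed bands are reported as unknown rather than guessed at, since the description does not define them.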

The valid ack or ack/ack codes are:

decode message on-board - 01h
send message off board - 02h
update message - 03h
received message - 80h

In regard to Fig. 7, shared memory is located at ab000000h - ab0007FFh, where ab000401h - ab0007FDh is writable by the host. ab0003FFh is the ack byte and ab000400h is the ack-ack byte for the host.

This module will receive a hardware interrupt from the host when there is a message in shared memory or when there is a diagnostic message to read from the interrupt code byte at ab0007FEh. It will interrupt the host by writing to location ab0007FFh.
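Both module maps describe the same 2K window seen from two address spaces, so an address on one side can be translated to the other by its offset. The sketch below illustrates this under two assumptions that are not stated in the description: the "ab" prefix of the host addresses is read as literal hexadecimal digits (it may equally be a placeholder for an implementation-specific base), and all constant and function names are hypothetical.

```python
SSC_BASE = 0xF800        # SSC view: F800h - FFFFh
HOST_BASE = 0xAB000000   # host view, ASSUMING "ab" is literal hex
DP_SIZE = 0x800          # 2K dual-port RAM

# Writable regions expressed as offsets into the dual-port RAM.
SSC_WRITABLE = range(0x003, 0x3FF)    # F803h - FBFEh inclusive
HOST_WRITABLE = range(0x401, 0x7FE)   # ab000401h - ab0007FDh inclusive


def ssc_to_host(addr):
    """Translate an SSC-side address to the equivalent host-side address."""
    offset = addr - SSC_BASE
    if not 0 <= offset < DP_SIZE:
        raise ValueError("address outside the shared window")
    return HOST_BASE + offset
```

With these numbers the SSC-writable and host-writable regions do not overlap, and the interrupt code byte that the SSC reads at 0FFFFh corresponds to the location ab0007FFh that the host writes.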

The interrupt codes are:

(nondiagnostic interrupt code)
00 non-diagnostic message in shared memory

(diagnostic interrupt codes)
req 01 request status
ack 83 running operational firmware (host_status = OK)

All other diagnostics interrupts will be used by the diagnostic image of the host only.

During operation, the system operates in a fault-tolerant, non-load sharing manner with first system management bus 20 and controller 22 acting as the primary transmission path and second system management bus 26 and controller system 28 acting as the secondary or backup transmission path. Except for environmental information, system management information is always transmitted over the primary path and the backup system is only used when there is a failure of the primary transmission path.
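The non-load-sharing policy described above reduces to a simple selection rule. The following sketch is illustrative only; the function name, the string labels, and the boolean health flag are hypothetical, and routing environmental information over the secondary bus reflects the earlier statement that such information may be passed via second system management bus 26.

```python
def select_path(kind, primary_ok):
    """Choose a transmission path for a piece of system management information.

    kind: "environmental" or "management" (hypothetical labels).
    primary_ok: True while the primary transmission path is healthy.
    """
    if kind == "environmental":
        return "secondary"   # small packets; the secondary bus suffices
    # All other traffic uses the backup path only after a primary failure.
    return "primary" if primary_ok else "secondary"
```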

The environmental variables, such as module voltage, current, and temperature, are stored in the dual-port memory 30. Module identification information, such as the module part number, serial number, and revision level (illustrated in FIG. 3), are stored in dual-port memory 30, as well as in nonvolatile memory 82. The module identification information is supplied by CPU 14 and stored in dual-port memory 30 during, for example, the initialization and power-up sequence of module 10. The identification information is also stored in nonvolatile memory 82 by microprocessor 70. Therefore, if the module 10, CPU 14, or the primary transmission path should fail, the module identification and environmental variables (collectively the "network variables") are still available to the backup transmission path. This provides a particular advantage in that this information can be accessed remotely so that the identity and type of networking module can be ascertained before a service person visits the actual location of the networking chassis. This provides a considerable time savings since the defective module type can be identified and a replacement part brought with the service person to the site of the networking chassis.

Reference is now made to FIG. 4, which illustrates, in schematic block diagram form, the software modules used to program the second microprocessor 70. The software modules include the shared memory manager software 100 (see prior disclosure of pseudo code implementation) that is used to control shared memory 30. Module 101 contains software for initializing the microprocessor 70 at power on or after a reset, as well as diagnostic routines. Module 102 is a monitor and control software module that is used to provide the interface between A/D converter 46 and DC-to-DC converter 42 and microprocessor 70. LLAP driver module 104 is a LOCALTALK link access protocol software module that provides the interface to LOCALTALK network 26. LLAP driver 104 may be designed in accordance with the aforementioned "Inside AppleTalk" reference. Module 106 is a message encode module (for information to be transmitted over second bus 26) and decode module (for messages received from second bus 26) that provides any necessary protocol translation between LLAP driver 104 and NVMP module 108. Message encode and decode module 106 may be designed in accordance with the protocols and specifications described in "Internetworking with TCP/IP, Principles, Protocols, and Architecture, Vol. 1," 2d Edition, by Douglas E. Comer, published by Prentice Hall, Incorporated, 1991, incorporated herein by reference in its entirety. NVMP module 108 is a Network Variable Monitoring Protocol module that is used to provide the interface between modules 100, 102, and 106 to gather, store, and transmit environmental and module identification information. The network variable monitoring protocol generally follows the Simple Network Management Protocol (SNMP) paradigm as referenced in the previously identified U.S. Serial No. 08/187,856.

Reference is now made to FIG. 5, which illustrates the format of a data structure used in the NVMP module 108 to transmit a message. The data structure 120 includes a number of fields. Each of the fields is one byte in length. The size field 122 contains the length of the entire data structure including the size field. The version field 124 identifies the version of the software used to format the message. The sequence field 126 contains a monotonically increasing number that is used to match a response with a corresponding message that initiated the transmission of the data structure.

The command field 128 is used to distinguish the different processing actions to be taken regarding the data structure. There are three commands that are recognized by NVMP module 108. The first command is a "set" command that is used to set values of an entire data structure. The second command is a "get" command that is used to get values of an entire data structure. The third command is a "trap" command that is used to indicate an alarm condition, such as host failure. The most significant bit of the command field is used to indicate whether there has been a response. The program always sends a response to inform the sending process whether or not the data structure was received successfully.
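The use of the most significant bit of the command byte as a response indicator can be sketched as follows. The description does not give numeric values for the three commands, so the values 01h, 02h, and 03h below are purely hypothetical, as are the function and constant names; only the role of the top bit comes from the text.

```python
RESPONSE_BIT = 0x80  # most significant bit of the one-byte command field

# HYPOTHETICAL command values; the description does not specify them.
COMMANDS = {"set": 0x01, "get": 0x02, "trap": 0x03}


def make_command(name, is_response=False):
    """Build a command byte, setting the top bit for a response."""
    value = COMMANDS[name]
    return value | RESPONSE_BIT if is_response else value


def parse_command(byte):
    """Split a command byte into (command name, is_response)."""
    names = {value: name for name, value in COMMANDS.items()}
    return names[byte & 0x7F], bool(byte & RESPONSE_BIT)
```

Keeping the response flag in the same byte as the command lets a reply reuse the request's command value unchanged, which fits the low-overhead intent of the protocol.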

The board referenced identification field 130 is an optional field and is used to indicate the particular networking module about which information is being gathered. The structure identification field 132 is used to reference the particular type of network variable. The structure instance field 134 is used to identify a particular instance of the network variable identified in field 132. The structure index field 136 is used to control access to the fields in the data structure. For example, if the index is set to 0, the whole data structure may be accessed.

The error status field 138 is used to indicate if an error has occurred. A value of "0" in this field may indicate that there has been no error. A value of "1" in this field may indicate that the requested operation identified an unknown structure. A value of "2" may indicate that the requested operation identified an unknown variable.

A value of "3" may indicate that the requested operation specified an incorrect syntax when trying to modify a structure or variable. A value of "4" may indicate that the requested operation tried to modify a structure or variable that is readable only. A value of "5" may indicate that the requested operation tried to read a command structure.
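The error status values listed above map naturally onto an enumeration. This is a sketch only: the member names and the `describe` helper are hypothetical labels for the meanings given in the text, and the numeric values 0 through 5 are those the description says the field "may" carry.

```python
from enum import IntEnum


class ErrorStatus(IntEnum):
    """Possible values of the error status field 138 (names are illustrative)."""
    NO_ERROR = 0
    UNKNOWN_STRUCTURE = 1
    UNKNOWN_VARIABLE = 2
    BAD_SYNTAX = 3
    READ_ONLY = 4
    COMMAND_STRUCTURE_READ = 5


def describe(status_byte):
    """Return the symbolic name for a status byte, or 'UNDEFINED'."""
    try:
        return ErrorStatus(status_byte).name
    except ValueError:
        return "UNDEFINED"
```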

Data field 140 is used to store the actual data for the variable being monitored, such as voltage, current, or temperature. For example, if the command is a "get" command, the data field will be empty for an incoming data packet and will be filled in with the appropriate data when the data structure is transmitted in reply to the request. If the command in the command field is a "set" command, then in an incoming data structure, the data field will contain data indicating the value to which the particular variable of interest is to be set.
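The data structure of FIG. 5 can be sketched as nine one-byte header fields followed by the data field. The layout below is an assumption for illustration: it takes the fields in the order they are described (size, version, sequence, command, board identification, structure identification, structure instance, structure index, error status), always includes the optional board identification field, and treats the data field as a variable-length run of bytes. The function names are hypothetical.

```python
import struct

# Nine unsigned one-byte header fields, in the order described in the text.
HEADER = struct.Struct("9B")


def pack_message(version, sequence, command, board_id, struct_id,
                 instance, index, error, data=b""):
    """Serialize a message; the size field counts the whole structure."""
    size = HEADER.size + len(data)
    if size > 0xFF:
        raise ValueError("message does not fit a one-byte size field")
    header = HEADER.pack(size, version, sequence, command, board_id,
                         struct_id, instance, index, error)
    return header + bytes(data)


def unpack_message(raw):
    """Return (header fields after size, data bytes) for a packed message."""
    fields = HEADER.unpack_from(raw)
    size = fields[0]
    if size != len(raw):
        raise ValueError("size field does not match message length")
    return fields[1:], raw[HEADER.size:size]
```

Under these assumptions a "get" request carries an empty data field, and the reply repeats the header with the data field filled in, for example a single byte holding a temperature reading.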

The NVMP protocol 108 is advantageous in the present invention because it provides a low overhead structure and method for gathering environmental and module identification information.

Having thus described one particular embodiment of the invention, various modifications and improvements will readily occur to those skilled in the art and are intended to be within the scope of the invention. Accordingly, the foregoing description is by way of example only and is not intended as limiting.