Title:
BACKUP COMMUNICATIONS SCHEME IN COMPUTER NETWORKS
Document Type and Number:
WIPO Patent Application WO/2017/044226
Kind Code:
A1
Abstract:
Various techniques for managing communications backup for computer networks are disclosed herein. In one embodiment, a method includes detecting an abnormal operating condition at a primary network node, the primary network node being coupled to a computing device via a first optical connection between an optical switch and the primary network node. In response to the detected abnormal operation condition, the method includes prompting the optical switch to switch from the first optical connection to a second optical connection between the optical switch and a standby network node. The method further includes instructing the standby network node to facilitate communications with the computing device based on the replicated network configuration.

Inventors:
RATTERREE GARY (US)
COX JEFF (US)
DEGRACE GERALD (US)
Application Number:
PCT/US2016/046096
Publication Date:
March 16, 2017
Filing Date:
August 09, 2016
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
International Classes:
H04L69/40; H04Q11/00
Foreign References:
US20130232382A1 2013-09-05
US20120076006A1 2012-03-29
US20070058973A1 2007-03-15
Other References:
None
Attorney, Agent or Firm:
MINHAS, Sandip et al. (US)
Claims:
CLAIMS

[c1] 1. A method for providing communications backup in a computer network, the method comprising:

detecting an abnormal operating condition at a primary network node, the primary network node being coupled to a computing device via a first optical connection between an optical switch and the primary network node;

in response to the detected abnormal operation condition, prompting the optical switch to switch from the first optical connection to a second optical connection between the optical switch and a standby network node having similar structure and function as the primary network node; and

instructing the standby network node to facilitate communications with the computing device once the optical switch is switched to the second optical connection, thereby allowing continued network communications of the computing device via the standby network node.

[c2] 2. The method of claim 1 wherein:

the optical switch is a primary optical switch;

the computer network includes a standby optical switch coupled to the standby network node; and

prompting the optical switch includes:

transmitting a first instruction to the primary optical switch to switch from being connected to the primary network node to being connected to the standby optical switch; and

transmitting a second instruction to the standby optical switch to connect the primary optical switch to the standby network node.

[c3] 3. The method of claim 1 wherein:

the computing device is a first computing device coupled to the primary network node via an optical multiplexer and the optical switch; and the computer network includes a second computing device coupled to the primary network node via the optical multiplexer and the optical switch; and wherein the optical multiplexer is configured to receive and multiplex signals from both the first and second computing devices and to transmit a multiplexed signal to the optical switch.

[c4] 4. The method of claim 1 wherein:

the standby network node is a first standby network node; and

the method further includes, in response to the detected abnormal operation condition,

selecting one of the first standby network node or a second standby network node; and

prompting the optical switch to switch from the first optical connection to a second optical connection that is between the optical switch and the selected one of the first standby network node or second standby network node.

[c5] 5. The method of claim 1 wherein:

the standby network node is a first standby network node;

the computer network further includes:

a second standby network node; and

a standby optical switch configured to selectively couple the primary network node to the first or second standby network node; and the method further includes, in response to the detected abnormal operation condition,

selecting one of the first standby network node or the second standby network node;

transmitting a first instruction to the primary optical switch to switch from being connected with the primary network node to being connected with the standby optical switch; and

transmitting a second instruction to the standby optical switch to connect the primary optical switch to the selected one of the first standby network node or the second standby network node.

[c6] 6. The method of claim 1 wherein:

the optical switch is a first primary optical switch;

the primary network node is a first primary network node;

the computing device is a first computing device coupled to the first primary network node via the first primary optical switch; the computer network also includes a second computing device coupled to a second primary network node via a second primary optical switch; and the first and second primary optical switches are both coupled to the standby optical switch.

[c7] 7. The method of claim 6 wherein:

detecting the abnormal operating condition includes detecting an abnormal operating condition at the first primary network node but not the second primary network node; and

prompting the optical switch includes:

transmitting a first instruction to the first primary optical switch to switch from being connected to the first primary network node to being connected to the standby optical switch; and transmitting a second instruction to the standby optical switch to connect the first primary optical switch to the standby network node.

[c8] 8. The method of claim 6 wherein:

detecting the abnormal operating condition includes detecting an abnormal operating condition at both the first primary network node and the second primary network node;

the method further includes selecting one of the first primary network node or the second primary network node based on operating profiles of the first computing device and the second computing device; and prompting the optical switch includes:

transmitting a first instruction to the first primary optical switch or the second primary optical switch to switch from being connected to the first primary network node or the second primary network node to being connected to the standby optical switch; and transmitting a second instruction to the standby optical switch to connect the first primary optical switch or the second primary optical switch to the standby network node.

[c9] 9. The method of claim 6 wherein:

the standby network node is a first standby network node;

the computer network further includes a second standby network node coupled to the standby optical switch; detecting the abnormal operating condition includes detecting an abnormal operating condition at both the first and second primary network nodes; and

prompting the optical switch includes:

transmitting a first instruction to the first primary optical switch to switch from being connected to the first primary network node to being connected to the standby optical switch;

transmitting a second instruction to the second primary optical switch to switch from being connected to the second primary network node to being connected to the standby optical switch; and transmitting a third instruction to the standby optical switch (i) to connect the first primary optical switch to the first standby network node and (ii) the second primary optical switch to the second standby network node.

[c10] 10. A computing device having a processor and a memory containing instructions executable by the processor to cause the processor to perform operations according to one of claims 1-9.

Description:
BACKUP COMMUNICATIONS SCHEME IN COMPUTER NETWORKS

BACKGROUND

[0001] Computer networks can have a large number of servers or other types of computing devices interconnected with one another by routers, switches, bridges, firewalls, or other network nodes via wired or wireless network links. The network nodes can enable communications among the computing devices by exchanging messages via the network links in accordance with one or more network protocols.

SUMMARY

[0002] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0003] Computer networks in datacenters can include multiple interconnected switches, routers, and other network nodes organized into a hierarchy, a mesh, or other suitable arrangements. For example, in one implementation, a single enclosure (e.g., a rack) can house multiple servers that are coupled to a single switch associated with the enclosure. Such a switch is sometimes referred to as a "top-of-rack" or "TOR" switch. Multiple TOR switches can then be connected to one or more Tier 1 or "T1" switches, each of which can in turn be connected to one or more Tier 2 or "T2" switches.

[0004] Typically, redundancy of T1, T2, or other upper-level switches can be readily provided, for example, by adding one or more extra switches. In contrast, providing redundancy for TOR switches can be challenging due to added costs and operating complexity. For instance, one solution includes installing two TOR switches for each enclosure housing multiple computing devices and provisioning two network interface controllers ("NICs") in each of the computing devices. However, such an arrangement can easily double the capital investments associated with the TOR switches. Also, the dual TOR switches may confuse the computing devices during operation because both TOR switches may be operating at the same time. As such, the computing devices can be more prone to communications failures with dual NICs communicating with dual TOR switches than using just one NIC for each computing device.

[0005] Several embodiments of the disclosed technology can provide efficient and cost effective TOR switch redundancy by implementing optical switching between multiple primary TOR switches and one or more standby TOR switches. In one implementation, computing devices in an enclosure can be individually coupled to an optical multiplexer via fiber optic cables. A primary optical switch can then couple the optical multiplexer to a primary TOR switch. The primary optical switch can switch the computing devices from being connected to the primary TOR switch to a standby optical switch when the primary TOR switch encounters abnormal operation conditions. In turn, the standby optical switch can couple the primary optical switch to a standby TOR switch operating in place of the primary TOR switch.

[0006] The standby TOR switch can be generally similar to the primary TOR switch in structure and function. As such, a single standby TOR switch can provide redundancy for two, four, eight, sixteen, thirty two, or any other suitable number of primary TOR switches. Thus, capital investments for providing redundancy to the primary TOR switches can be much lower than using dual TOR switches for each enclosure. Several embodiments of the disclosed redundancy scheme can also be more efficient than using dual TOR switches per enclosure because switching optical switches can be a simple operation. Optical switches can be more reliably switched than switching between a pair of active TOR switches. Thus, communications reliability of computer networks in datacenters can be improved.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Figure 1 is a schematic diagram illustrating a computer network having a standby network node in accordance with embodiments of the disclosed technology.

[0008] Figures 2A-2C are block diagrams showing software components suitable for the network controller of Figure 1 during various modes of operation in accordance with embodiments of the disclosed technology.

[0009] Figure 3 is a block diagram showing software components suitable for the controller of Figure 1 having multiple standby network nodes in accordance with embodiments of the disclosed technology.

[0010] Figure 4 is a flow diagram illustrating embodiments of a process of providing standby backup capabilities to a network node in a computer network in accordance with embodiments of the disclosed technology.

[0011] Figure 5 is a flow diagram illustrating embodiments of a process of detecting abnormal operating conditions at a network node in a computer network in accordance with embodiments of the disclosed technology.

[0012] Figure 6 is a flow diagram illustrating embodiments of a process of switching optical connections in a computer network in accordance with embodiments of the disclosed technology.

[0013] Figure 7 is a flow diagram illustrating embodiments of a process of enabling a standby network node in a computer network in accordance with embodiments of the disclosed technology.

[0014] Figure 8 is a schematic diagram illustrating another computer network having a standby network node with multiple input ports in accordance with embodiments of the disclosed technology.

[0015] Figure 9 is a computing device suitable for certain components of the computing frameworks in Figures 1-3 and 8.

DETAILED DESCRIPTION

[0016] Certain embodiments of systems, devices, components, modules, routines, and processes for managing backup capability of primary network nodes in a computer network are described below. In the following description, specific details of components are included to provide a thorough understanding of certain embodiments of the disclosed technology. A person skilled in the relevant art will also understand that the disclosed technology may have additional embodiments or may be practiced without several of the details of the embodiments described below with reference to Figures 1-9.

[0017] As used herein, the term "computer network" generally refers to an interconnected network that has a plurality of network nodes connecting a plurality of computing devices (e.g., servers) to one another and to other networks (e.g., the Internet). One example computer network can include a Gigabit Ethernet network implemented in a datacenter for providing various cloud-based computing services. The term "network node" generally refers to a physical or software emulated network device. In one example, a network node can include a TOR switch. In other examples, network nodes can include routers, other types of switches, hubs, bridges, load balancers, security gateways, firewalls, network name translators, and name servers. Each network node may be associated with one or more ports. As used herein, a "port" generally refers to a physical and/or logical communications interface through which data packets and/or other suitable types of communications can be transmitted and/or received. For example, switching one or more ports can include switching routing data from a first optical port to a second optical port, or switching from a first TCP/IP port to a second TCP/IP port.

[0018] The term "optical switch" generally refers to a switch configured to selectively switch signals in optical fibers or integrated optical circuits from one circuit or optical pathway to another. An optical switch can have a number of input and output ports. For example, a "1:2" optical switch includes a single input port and two selectively switchable output ports. A "32:1" optical switch includes thirty-two input ports selectively connectable to a single output port. In another example, a "16:2" optical switch includes sixteen input ports each selectively connectable to one of the two output ports. An optical switch can include mechanical, electro-optic, magneto-optic, or other suitable switching mechanisms. Example optical switches suitable for various embodiments of the disclosed technology include N77 series optical switches provided by Agilent Technologies of Santa Clara, California and S Series optical circuit switches provided by Calient Technologies of Goleta, California.

[0019] The term "standby" is used herein to denote a readiness for duty and/or immediate deployment. For example, a standby network node (e.g., a standby switch or router) can be generally similar in structure and/or function to a corresponding primary network node. The standby network node can also be suitably connected to other computing devices, network nodes, or other components of a computer network via, for example, fiber optic, Ethernet, or other suitable types of cables. In certain embodiments, the standby network node can be powered up and await instructions to perform certain functions in a computer network in place of the corresponding primary network node. In other embodiments, the standby network node can be in a power-save mode and may be awakened upon reception of certain instructions to perform the functions in place of the corresponding primary network node.
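As an illustration only, the N:M port notation above can be modeled in a few lines of Python. The class name `OpticalSwitch` and its methods are hypothetical and do not correspond to any vendor interface; the sketch also assumes that an output port carries at most one optical path at a time.

```python
from typing import Dict, Optional


class OpticalSwitch:
    """Toy model of an N:M optical switch (hypothetical, for illustration only)."""

    def __init__(self, num_inputs: int, num_outputs: int) -> None:
        if num_inputs < 1 or num_outputs < 1:
            raise ValueError("a switch needs at least one input and one output port")
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # Maps an input port index to the output port it is currently routed to.
        self._routes: Dict[int, int] = {}

    def connect(self, input_port: int, output_port: int) -> None:
        """Route input_port to output_port, replacing any earlier route."""
        if not 0 <= input_port < self.num_inputs:
            raise ValueError(f"no such input port: {input_port}")
        if not 0 <= output_port < self.num_outputs:
            raise ValueError(f"no such output port: {output_port}")
        # Assumption: one optical path per output port, so release any other
        # input that is currently routed to the requested output.
        for port, out in list(self._routes.items()):
            if out == output_port and port != input_port:
                del self._routes[port]
        self._routes[input_port] = output_port

    def output_for(self, input_port: int) -> Optional[int]:
        """Return the output port an input is routed to, or None if unrouted."""
        return self._routes.get(input_port)


# A "1:2" switch: one input that can be steered to either of two outputs.
one_by_two = OpticalSwitch(num_inputs=1, num_outputs=2)
one_by_two.connect(0, 0)
one_by_two.connect(0, 1)  # steer the single input to the second output
```

In this model a "32:1" switch would be `OpticalSwitch(32, 1)`, where connecting a new input to output 0 releases whichever input held it before.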

[0020] Figure 1 is a schematic diagram illustrating a computer network 100 having a standby network node as a backup for multiple primary network nodes in accordance with embodiments of the disclosed technology. As shown in Figure 1, the computer network 100 can include a network controller 120 and multiple network nodes 102 interconnecting multiple computing devices 106. Even though particular components are shown in Figure 1, in other embodiments, the computer network 100 can also include additional and/or different network nodes 102, computing devices 106, and/or other suitable types of components.

[0021] The network nodes 102 can be organized into a hierarchy, a mesh, or other suitable organizations. For instance, in the illustrated embodiment, the network nodes 102 can include primary network nodes 112 (illustrated as first primary network node 112a and second primary network node 112b), tier one network nodes 114, and tier two network nodes 116 interconnected with one another in a hierarchy. In particular, the primary network nodes 112 are individually connected with one or more tier one network nodes 114. In turn, the tier one network nodes 114 are individually connected with one or more tier two network nodes 116. Though not shown in Figure 1, the computer network 100 can include additional network nodes 102 at tier 3, tier 4, or any other suitable number of tiers. In Figure 1, a particular number of network nodes 102 is shown at each tier for illustration purposes. In other embodiments, the computer network 100 can include any suitable number of network nodes 102 at each tier. In further embodiments, the computer network 100 can also be connected to a core network (not shown).

[0022] As shown in Figure 1, the computing devices 106 can be organized into sets of computing devices 106. Each set can be individually associated with an enclosure 104 (illustrated as first enclosure 104a and second enclosure 104b). Each computing device 106 can be a network server, a storage server, a network storage device, or other suitable types of computing component. In certain embodiments, the enclosures 104 can include physical structures (e.g., racks, cabinets, shipping containers, etc.) housing the computing devices 106. In other embodiments, the enclosures 104 can be logical divisions or groupings of sets or subsets of the computing devices 106. In further embodiments, the enclosures 104 can be both physical structures that house the computing devices 106 and logical groupings of the housed computing devices 106. Even though only two enclosures 104 are illustrated in Figure 1, in other embodiments, the computer network 100 can include four, eight, sixteen, thirty two, or any suitable number of enclosures 104.

[0023] As shown in Figure 1, in each enclosure 104, multiple fiber optic cables connect a set of the computing devices 106 to an optical multiplexer 108 (individually identified as first and second optical multiplexers 108a and 108b). A pair of fiber optic cables (or a single fiber optic cable) carrying multiplexed signals can connect the optical multiplexer 108 to a primary optical switch 110 (illustrated as first primary optical switch 110a and second primary optical switch 110b). The optical multiplexer 108 can be configured to multiplex/de-multiplex signals to/from the computing devices 106 in the enclosure 104 utilizing wavelength division multiplexing, time division multiplexing, or other suitable multiplexing techniques. One example optical multiplexer suitable for the computer network 100 is a remotely controlled layer 1 A/B switch Model No. SW1044A-SM provided by Black Box Corporation of Lawrence, Pennsylvania. In other embodiments, each enclosure 104 can also include two or more optical multiplexers 108 (not shown) both connected to the primary optical switch 110 and individually to a subset of the computing devices 106 in each enclosure 104. In further embodiments, the optical multiplexers 108 may be omitted or integrated into the corresponding optical switches 110. As such, fiber optic cables can connect the computing devices 106 in each enclosure 104 directly to the optical switch 110.

[0024] Each enclosure 104 can also be associated with one of the primary network nodes 112. For example, as illustrated in Figure 1, the optical switches 110 in the enclosures 104 are individually connected with a corresponding primary network node 112 via, for example, fiber optic cables. The primary network node 112 can be configured to facilitate communications with all or a portion of the computing devices 106 in the individual enclosures 104. In certain embodiments, each primary network node 112 can be a TOR switch. In other embodiments, the primary network nodes 112 can also include load balancers, firewalls, or other suitable types of network devices. One example network device suitable for the primary network node 112 is a network switch (Model No. Cisco Catalyst 4500-X Switch) provided by Cisco Systems, Inc. of San Jose, California.

[0025] As shown in Figure 1, the computer network 100 can also include one or more standby network nodes 118 and a standby optical switch 111 configured to provide standby backup capability to the primary network nodes 112. In the illustrated embodiment, the primary optical switches 110 can each include an output port connected to an input port of the standby optical switch 111. As such, the primary optical switches 110 can have a 1:2 configuration with one input port connected to the optical multiplexer 108 and two output ports individually connected to the primary network node 112 and the standby network node 118. The standby optical switch 111 can have a 2:1 configuration with two input ports individually connected to the first and second primary optical switches 110a and 110b and an output port connected to the standby network node 118. In other embodiments, the standby optical switch 111 can also have 3:1, 4:1, 8:1, 16:1, 32:1, or other suitable configurations to accommodate additional primary optical switches 110 (not shown).
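The two-stage arrangement of Figure 1 can be traced with a small lookup, shown here as a minimal sketch. The string labels reuse the reference numerals of Figure 1, while the function `serving_node` and its arguments are hypothetical names introduced only for this illustration.

```python
from typing import Optional

# Hypothetical labels that borrow the reference numerals of Figure 1.
PRIMARY_NODE_BEHIND = {"110a": "112a", "110b": "112b"}  # 1:2 primary optical switches
STANDBY_NODE = "118"  # reached through the 2:1 standby optical switch 111


def serving_node(primary_switch: str,
                 primary_output: str,
                 standby_input: Optional[str]) -> Optional[str]:
    """Return the network node reached from a primary optical switch.

    primary_output is "primary" or "standby", naming which of the two outputs
    of the 1:2 switch is selected. standby_input names the primary optical
    switch that the standby optical switch 111 currently passes through, or
    None if no input is selected. None is returned when the path is open.
    """
    if primary_switch not in PRIMARY_NODE_BEHIND:
        raise ValueError(f"unknown primary optical switch: {primary_switch}")
    if primary_output == "primary":
        return PRIMARY_NODE_BEHIND[primary_switch]
    if primary_output == "standby":
        # The path is complete only if switch 111 has selected this input.
        return STANDBY_NODE if standby_input == primary_switch else None
    raise ValueError(f"unknown output selection: {primary_output}")


# Normal operation: enclosure 104a is served by primary network node 112a.
normal = serving_node("110a", "primary", None)
# After both optical switches are switched, 104a is served by standby node 118.
backup = serving_node("110a", "standby", "110a")
```

The sketch makes one property of the topology explicit: both optical switches have to be set consistently before the standby network node 118 is actually reachable.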

[0026] The standby network node 118 can have generally similar connectivity with higher level network nodes 102 as the primary network nodes 112. For example, in the illustrated embodiment, the standby network node 118 can be connected to one or more of the tier one network nodes 114. In other embodiments, the standby network node 118 can also be connected to one or more of the tier two or other suitable network nodes 102. In certain embodiments, the standby network node 118 can be generally similar in structure and function to the primary network nodes 112. In other embodiments, the standby network node 118 can have different structure and/or function than the primary network nodes 112. One example is described in more detail below with reference to Figure 8.

[0027] The network controller 120 can include a server, a virtual machine, or other suitable computing facilities operatively coupled to the computing devices 106, the primary optical switches 110, the primary network nodes 112, the standby optical switch 111, the standby network node 118, and/or other components of the computer network 100. In Figure 1, the network controller 120 is shown as being independent from the computing devices 106. In other embodiments, the network controller 120 can be hosted on one or more of the computing devices 106. In certain embodiments, the network controller 120 can include components of a software defined network ("SDN") controller associated with the computer network 100. In other embodiments, the network controller 120 can also include components of a cloud controller (e.g., Microsoft Azure™ controller) associated with the computer network 100.

[0028] In operation, the network nodes 102 can facilitate communications with the computing devices 106. For example, in certain embodiments, messages (e.g., packets) from a computing device 106a in the first enclosure 104a can be routed to another computing device 106b in the second enclosure 104b via a first optical connection along the first optical multiplexer 108a, the first primary optical switch 110a, and the first primary network node 112a to a tier one network node 114. The tier one and/or tier two network nodes 114 and 116 can then route the messages to the computing device 106b following a suitable protocol. The tier one and/or tier two network nodes 114 and 116 can also route the messages to a destination outside the computer network 100 via upper-level network nodes (not shown), core network nodes (not shown) or other suitable components.

[0029] During operation, the network controller 120 can be configured to monitor for an abnormal operating condition of one or more of the primary network nodes 112 and provide backup capabilities with the standby network node 118 accordingly. For example, in response to a detected abnormal operating condition at, for instance, the first primary network node 112a, the network controller 120 can be configured to cause the first primary optical switch 110a to switch from the first optical connection 113a to a second optical connection 113b between the first primary optical switch 110a and the standby network node 118. The network controller 120 can also be configured to cause the standby optical switch 111 to connect the first primary optical switch 110a to the standby network node 118. The network controller 120 can then enable the standby network node 118 to facilitate communications with the computing devices 106 in the first enclosure 104a in place of the first primary network node 112a. Similarly, in response to a detected abnormal operating condition at the second primary network node 112b, the network controller 120 can also cause the standby network node 118 to provide backup capability for the second primary network node 112b.

[0030] As such, the standby network node 118 can provide standby backup capabilities to two, three, or any suitable number of primary network nodes 112. Thus, capital investments for providing such standby backup capabilities can be much lower than providing dual primary network nodes (not shown) for each enclosure 104. Several embodiments of the computer network 100 can also operate more efficiently and reliably than using dual primary network nodes per enclosure. Optical switches such as the primary optical switches 110 and standby optical switch 111 can be more reliably operated than switching between a pair of active dual primary network nodes. Operations and components of the network controller 120 are described in more detail below with reference to Figures 2A-2C.
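The failover sequence described above can be sketched as follows. This is a minimal illustration rather than the controller of the disclosure: the class `FailoverController`, its fields, and the instruction strings are hypothetical, and the instructions are only recorded in a list instead of being sent to real devices. The sketch also assumes a single standby network node and simply declines a second request while that node is in use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FailoverController:
    """Hypothetical controller that records the instructions it would issue."""

    # Maps each primary network node to the primary optical switch in front of it.
    switch_for_node: Dict[str, str]
    # Primary network node currently backed up by the standby node, if any.
    standby_in_use_by: Optional[str] = None
    # (target kind, target label, instruction) tuples, in the order issued.
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def handle_abnormal_condition(self, primary_node: str) -> bool:
        """Fail over primary_node to the standby node; return True on success."""
        if primary_node not in self.switch_for_node:
            raise KeyError(f"unknown primary network node: {primary_node}")
        if self.standby_in_use_by is not None:
            # Assumption: one standby node, already committed to another failure.
            return False
        primary_switch = self.switch_for_node[primary_node]
        # First instruction: steer the primary optical switch to its standby output.
        self.log.append(("primary_optical_switch", primary_switch,
                         "select_standby_output"))
        # Second instruction: have the standby optical switch pass that input through.
        self.log.append(("standby_optical_switch", "111",
                         f"connect_input:{primary_switch}"))
        # Finally, enable the standby network node in place of the failed node.
        self.log.append(("standby_network_node", "118",
                         f"enable_in_place_of:{primary_node}"))
        self.standby_in_use_by = primary_node
        return True


controller = FailoverController({"112a": "110a", "112b": "110b"})
first = controller.handle_abnormal_condition("112a")
second = controller.handle_abnormal_condition("112b")
```

Declining the second request is only one possible policy; a deployment with more than one standby network node could assign each failed primary node its own standby instead.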

[0031] Figures 2A-2C are block diagrams showing software components suitable for the controller of Figure 1 during various modes of operation in accordance with embodiments of the disclosed technology. In particular, Figure 2A illustrates a normal operating mode in which both the first and second primary network nodes 112a and 112b are functioning properly. Figure 2B illustrates an operating mode in which an abnormal operating condition is detected at the first primary network node 112a but not the second primary network node 112b. Figure 2C illustrates another operating mode in which an abnormal operating condition is detected at both the first primary network node 112a and the second primary network node 112b. In Figures 2A-2C, active connections are illustrated as solid lines while non-active connections are illustrated as dashed lines.

[0032] In Figures 2A-2C and in other Figures hereinafter, individual software components, objects, classes, modules, and routines may be a computer program, procedure, or process written as source code in C, C++, Java, and/or other suitable programming languages. A component may include, without limitation, one or more modules, objects, classes, routines, properties, processes, threads, executables, libraries, or other components. Components may be in source or binary form. Components may include aspects of source code before compilation (e.g., classes, properties, procedures, routines), compiled binary units (e.g., libraries, executables), or artifacts instantiated and used at runtime (e.g., objects, processes, threads). Components within a system may take different forms within the system. As one example, a system comprising a first component, a second component and a third component can, without limitation, encompass a system that has the first component being a property in source code, the second component being a binary compiled library, and the third component being a thread created at runtime.

[0033] The computer program, procedure, or process may be compiled into object, intermediate, or machine code and presented for execution by one or more processors of a personal computer, a network server, a laptop computer, a smartphone, and/or other suitable computing devices. Equally, components may include hardware circuitry. A person of ordinary skill in the art would recognize that hardware can be considered fossilized software, and software can be considered liquefied hardware. As just one example, software instructions in a component can be burned to a Programmable Logic Array circuit, or can be designed as a hardware circuit with appropriate integrated circuits. Equally, hardware can be emulated by software. Various implementations of source, intermediate, and/or object code and associated data may be stored in a computer memory that includes read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash memory devices, and/or other suitable computer readable storage media excluding propagated signals.

[0034] As shown in Figure 2A, the network controller 120 can include a processor 130 operatively coupled to a memory 150. The processor 130 can include a microprocessor, a field-programmable gate array, and/or other suitable logic devices. The memory 150 can include volatile and/or nonvolatile media (e.g., ROM, RAM, magnetic disk storage media, optical storage media, flash memory devices, and/or other suitable storage media) and/or other types of computer-readable storage media configured to store data received from, as well as instructions for, the processor 130 (e.g., instructions for performing the methods discussed below with reference to Figures 4-7).

[0035] As shown in Figure 2A, the memory 150 can also contain records of sets of configuration information 152 associated with the primary network nodes 112. A set of the configuration information 152 can include data suitable to cause a network node 102 (Figure 1) to perform desired functions. For example, the configuration information 152 can include data of port configurations, routing tables, network addresses, connectivity configuration, enable/disable configuration, and/or other suitable information. In certain embodiments, the set of configuration information 152 can be collected from the primary network nodes 112 and updated on a continuous, periodic, or other suitable basis. In other embodiments, the set of configuration information 152 can be collected from the primary network nodes 112 and cached for a pre-determined period of time. In further embodiments, the set of configuration information 152 can be collected from the primary network nodes 112 on an ad hoc or other suitable basis.
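One way to picture the embodiment in which configuration information is cached for a pre-determined period of time is the short sketch below. The class `ConfigCache`, its method names, and the example keys are hypothetical; the disclosure does not specify a storage format or an expiry policy.

```python
import time
from typing import Any, Dict, Optional, Tuple


class ConfigCache:
    """Hypothetical time-limited store of per-node configuration information."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age_seconds = max_age_seconds
        # node id -> (time stored, copy of the configuration)
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def store(self, node_id: str, config: Dict[str, Any],
              now: Optional[float] = None) -> None:
        """Record the configuration collected from a primary network node."""
        timestamp = time.monotonic() if now is None else now
        self._entries[node_id] = (timestamp, dict(config))

    def fetch(self, node_id: str,
              now: Optional[float] = None) -> Optional[Dict[str, Any]]:
        """Return the cached configuration, or None if missing or too old."""
        current = time.monotonic() if now is None else now
        entry = self._entries.get(node_id)
        if entry is None:
            return None
        stored_at, config = entry
        if current - stored_at > self.max_age_seconds:
            # The entry outlived the pre-determined period; discard it.
            del self._entries[node_id]
            return None
        return dict(config)


cache = ConfigCache(max_age_seconds=60.0)
cache.store("112a", {"ports": 48, "enabled": True}, now=0.0)
fresh = cache.fetch("112a", now=30.0)   # still within the caching period
stale = cache.fetch("112a", now=120.0)  # older than the caching period
```

The explicit `now` argument is there only to make the example deterministic; when it is omitted the sketch falls back to `time.monotonic()`.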

[0036] The processor 130 can execute instructions to provide a plurality of software components 140 configured to facilitate providing backup capabilities to the primary network nodes 112. As shown in Figure 2A, the software components 140 include a detection component 133, a control component 135, and an output component 137 operatively coupled to one another. In one embodiment, all of the software components 140 can reside on a single computing device (e.g., a server). In other embodiments, the software components 140 can also reside on multiple distinct servers or computing devices. In further embodiments, the software components 140 may also include network interface components and/or other suitable modules or components (not shown).

[0037] The detection component 133 can be configured to detect an abnormal operating condition at the individual primary network nodes 112. In certain embodiments, the detection component 133 can be configured to receive one or more operating parameters 154 from the individual primary network nodes 112 and indicate an abnormal condition based on the received operating parameters 154. For example, the operating parameters 154 can include an average, accumulative, or other suitable types of throughput values at the primary network nodes 112. In other examples, the operating parameters 154 can include instantaneous or average transmission speed, instantaneous or average change in throughput, network load balancing parameters, and/or other suitable parameters. In certain embodiments, the detection component 133 can poll the primary network nodes 112 for the operating parameters on a continuous or periodic basis. In other embodiments, the primary network nodes 112 can be configured to automatically transmit the operating parameters 154 to the detection component 133.

[0038] The detection component 133 can then compare the received operating parameters 154 with a corresponding threshold value to indicate whether the primary network nodes 112 are associated with abnormal operating conditions. For example, in certain embodiments, the detection component 133 can indicate an abnormal operating condition at the primary network nodes 112a based on comparisons indicating the following:

• An associated average throughput over a period of time is below a threshold;

• An accumulated throughput over a period of time is below a threshold;

• An instantaneous transmission speed is below a threshold for a pre-determined period of time; or

• A change in throughput is greater than a throughput reduction threshold.

In other embodiments, the detection component 133 can indicate an abnormal operating condition at the primary network nodes 112a based on other suitable conditions.
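The four comparisons listed above can be sketched in a few lines. This is only an illustrative sketch, not the implementation of the detection component 133: the class names, field names, and units are hypothetical, and the disclosure does not specify how the operating parameters 154 are encoded.

```python
from dataclasses import dataclass


@dataclass
class OperatingParameters:
    """Hypothetical snapshot of the operating parameters 154 for one node."""
    avg_throughput: float          # average throughput over the observation window
    accumulated_throughput: float  # total throughput over the observation window
    slow_duration_s: float         # time the instantaneous speed has stayed below its threshold
    throughput_drop: float         # reduction in throughput since the previous sample


@dataclass
class Thresholds:
    """Hypothetical threshold values, one per comparison."""
    min_avg_throughput: float
    min_accumulated_throughput: float
    max_slow_duration_s: float
    max_throughput_drop: float


def is_abnormal(p: OperatingParameters, t: Thresholds) -> bool:
    """Return True if any one of the four listed conditions holds."""
    return (
        p.avg_throughput < t.min_avg_throughput
        or p.accumulated_throughput < t.min_accumulated_throughput
        or p.slow_duration_s >= t.max_slow_duration_s
        or p.throughput_drop > t.max_throughput_drop
    )
```

The conditions are combined with a logical OR, matching the "or" that joins the listed items; an implementation could equally weight or combine them differently.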

[0039] In other embodiments, the detection component 133 can be configured to detect abnormal operating conditions by receiving one or more status indicators 156 from the primary network nodes 112. For example, the status indicator 156 can indicate that one of the primary network nodes 112 is in a non-operating mode, e.g., device failure, software update, system maintenance, or other suitable modes. The detection component 133 can then indicate an abnormal operating condition at the primary network nodes 112a based on the status indicators 156.

[0040] In certain embodiments, the detection component 133 can indicate an abnormal condition at the individual primary network nodes 112 with an impact period associated with the indicated abnormal condition. For example, if the status indicator 156 indicates that a primary network node 112 is undergoing software update, the detection component 133 can indicate the abnormal operating condition with an associated impact period (e.g., 10 minutes). At the expiration of the impact period, the detection component 133 may re-check a status of the corresponding primary network node 112. In other embodiments, the detection component 133 can indicate an abnormal condition (e.g., system failure) at the primary network nodes 112 without an impact period. Thus, the indication of the abnormal operating condition can be indefinite. In further embodiments, the detection component 133 can re-check a status of the primary network nodes 112 even without an associated impact period, for instance, at pre-determined time intervals. The detection component 133 can also be configured to forward an indicated abnormal operating condition at the individual primary network nodes 112 to the control component 135 for further processing.
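One way to picture the three re-check behaviors described above (impact period, indefinite indication, and fixed-interval re-check) is the following sketch. The `AbnormalIndication` record and `next_recheck_time` helper are hypothetical names introduced for illustration; times are plain seconds on any clock the caller chooses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbnormalIndication:
    """Hypothetical record of an indicated abnormal operating condition."""
    node_id: str
    reported_at: float                # time the condition was indicated, in seconds
    impact_period_s: Optional[float]  # None means no impact period (indefinite)


def next_recheck_time(ind: AbnormalIndication,
                      default_interval_s: Optional[float] = None) -> Optional[float]:
    """Return when the node status should be re-checked, or None for never."""
    if ind.impact_period_s is not None:
        # Re-check when the impact period expires (e.g., after a software update).
        return ind.reported_at + ind.impact_period_s
    if default_interval_s is not None:
        # No impact period, but the controller still re-checks at a fixed interval.
        return ind.reported_at + default_interval_s
    # Indefinite indication with no scheduled re-check.
    return None
```

For a software update reported at time 0 with a 10-minute impact period, the helper schedules the re-check at 600 seconds.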

[0041] The control component 135 can be configured to provide standby backup capabilities to a primary network node 112 associated with an indicated abnormal operating condition from the detection component 133. Figure 2B illustrates an example in which the detection component 133 indicates an abnormal operating condition at the first primary network node 112a. As shown in Figure 2B, in response to the indicated abnormal operating condition, the control component 135 can be configured to prompt the first primary optical switch 110a to switch connection from the first primary network node 112a to the standby network node 118. For example, the control component 135 can cause the output component 137 to transmit:

• A first instruction 160a to the first primary optical switch 110a to switch connection from the first primary network node 112a to the standby optical switch 111; and

• A second instruction 160b to the standby optical switch 111 to connect the first primary optical switch 110a to the standby network node 118. As such, the first optical switch 110a can switch from the first optical connection 113a to the second optical connection 113b (as shown in the solid lines).

[0042] The control component 135 can also retrieve a set of configuration information 152 associated with the first primary network node 112a from the memory 150. The control component 135 can then be configured to cause the output component 137 to transmit the retrieved configuration information 152 to the standby network node 118 along with an instruction (not shown) to configure the standby network node 118 based on the transmitted configuration information 152. In certain embodiments, the standby network node 118 can provide a confirmation message (not shown) to the control component 135 confirming successful completion of configuration based on the transmitted configuration information 152. Upon receiving the confirmation message, the control component 135 can cause the output component 137 to transmit another instruction 160c to the standby network node 118 to facilitate communications with the computing devices 106 (Figure 1) based on the replicated configuration information 152. In other embodiments, the standby network node 118 can be configured to initiate facilitation of communications with the computing devices 106 once configuration is complete without the instruction 160c. As such, the computing devices 106 in the first enclosure 104a (Figure 1) can communicate with other computing devices 106 via the first optical multiplexer 108a, the first optical switch 110a, the standby optical switch 111, and the standby network node 118.
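The sequence described for Figure 2B (two switching instructions, configuration transfer, optional confirmation, then activation) can be sketched as below. This is a minimal sketch under stated assumptions: the message tuple layout, the `send` callback, and the function name `fail_over` are all hypothetical, and `send` is assumed to return `True` when the recipient acknowledges the message.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical message shape: (destination, kind, payload).
Message = Tuple[str, str, dict]


def fail_over(primary: str, primary_switch: str, standby_switch: str, standby: str,
              config_store: Dict[str, dict], send: Callable[[Message], bool],
              wait_for_confirmation: bool = True) -> List[Message]:
    """Issue the instructions for one failover and return them in order."""
    config = config_store[primary]  # raises KeyError if no configuration is on record
    sent: List[Message] = []

    def emit(dest: str, kind: str, payload: dict) -> bool:
        msg = (dest, kind, payload)
        sent.append(msg)
        return send(msg)

    # First instruction (160a): primary switch leaves the primary node.
    emit(primary_switch, "switch", {"from": primary, "to": standby_switch})
    # Second instruction (160b): standby switch connects through to the standby node.
    emit(standby_switch, "connect", {"from": primary_switch, "to": standby})
    # Replicate the stored configuration information on the standby node.
    confirmed = emit(standby, "configure", {"config": config})
    if wait_for_confirmation:
        if not confirmed:
            raise RuntimeError("standby node did not confirm configuration")
        # Further instruction (160c): begin facilitating communications.
        emit(standby, "activate", {})
    return sent
```

Passing `wait_for_confirmation=False` models the alternative in which the standby node starts facilitating communications on its own once configuration completes, so no activation instruction is sent.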

[0043] The output component 137 is configured to transmit instructions, configuration information 152, and/or other suitable types of data to the various components of the computer network 100 (Figure 1). In certain embodiments, the output component 137 can include a network interface controller. In other embodiments, the output component 137 can also include virtual network interface controller, a wireless network interface controller, or other suitable hardware/software components.

[0044] The control component 135 can also be configured to determine to provide standby backup capabilities to one or more selected primary network nodes 112 having abnormal operating conditions. Figure 2C illustrates an example in which both the first and second primary network nodes 112 have indicated abnormal operating conditions. In response to the indicated abnormal operating conditions at both the first and second primary network nodes 112, the control component 135 can be configured to determine a number of available standby network node(s) 118.

[0045] If the determined number of available standby network node(s) 118 is less than the number of primary network nodes 112 with abnormal operating conditions, in certain embodiments, the control component 135 can be configured to select one or more of the primary network nodes 112 based on, for example, an operating profile of the computing devices 106 associated with the primary network nodes 112, administrator preference, or other suitable criteria. The operating profile can include priority of tasks for execution, current operating modes of the computing devices 106, service availability guarantee associated with the computing devices 106, and/or other suitable characteristics. For instance, with respect to Figure 2C, if the computing devices 106 associated with the first primary network node 112a are currently performing higher priority tasks (e.g., web searching), and are associated with a higher service availability guarantee than those associated with the second primary network node 112b, the control component 135 can be configured to select the first primary network node 112a over the second primary network node 112b. Once the first primary network node 112a is repaired, replaced, or otherwise becomes functional again, the control component 135 can then select the second primary node 112b. In another example, an administrator can modify a selection preference between the first or second primary network node 112a or 112b during an outage of these components. In yet further examples, the control component 135 can select one of the first and second primary network nodes 112a and 112b based on an administrator designation, random selection, or other suitable basis.

[0046] Based on the selection, the control component 135 can be configured to provide standby backup capabilities to the selected primary network node(s) 112 as discussed in more detail above with reference to Figure 2B. In the illustrated embodiment, the first primary network node 112a is selected over the second primary network node 112b. As a result, the control component 135 can cause the first primary optical switch 110a to switch connection from the first primary network node 112a to the standby network node 118.
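The selection among several abnormal primary network nodes when standby nodes are scarce can be sketched as a simple ranking. The profile fields and the ordering rule (task priority first, then service availability guarantee) are assumptions made for illustration; the disclosure also allows administrator preference, random selection, or other criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatingProfile:
    """Hypothetical operating profile of the devices behind one primary node."""
    node_id: str
    task_priority: int       # assumption: larger value means higher priority
    availability_sla: float  # service availability guarantee, e.g. 0.999


def select_for_backup(abnormal: List[OperatingProfile], standby_count: int) -> List[str]:
    """Pick which abnormal primary nodes receive a standby node."""
    if standby_count <= 0:
        return []
    ranked = sorted(abnormal,
                    key=lambda p: (p.task_priority, p.availability_sla),
                    reverse=True)
    return [p.node_id for p in ranked[:standby_count]]
```

With one standby node available and node 112a carrying both the higher priority and the higher guarantee, the helper returns only `"112a"`, mirroring the Figure 2C example.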

[0047] If the determined number of available standby network node(s) 118 is not less than the number of primary network nodes 112 with abnormal operating conditions, the control component 135 can be configured to provide standby backup capabilities to all of the primary network nodes 112, as illustrated in Figure 3. As shown in Figure 3, the computer network 100 can include two standby network nodes 118 (individually identified as first and second standby network nodes 118a and 118b). The first and second standby network nodes 118a and 118b are both connected to the standby optical switch 111. The standby optical switch 111 can controllably connect the individual first and second standby network nodes 118a and 118b to the first and second optical switches 110a and 110b, respectively.

[0048] Upon receiving indication of abnormal operating conditions at both the first and second primary network nodes 112a and 112b, the control component 135 can be configured to cause the output component 137 to transmit:

• A first instruction 161a to the first optical switch 110a to switch connection from the first primary network node 112a to the standby optical switch 111;

• A second instruction 161b to the second optical switch 110b to switch connection from the second primary network node 112b to the standby optical switch 111; and

• A third instruction 161c to the standby optical switch 111 to connect the first primary optical switch 110a to the first standby network node 118a and to connect the second primary optical switch 110b to the second standby network node 118b.

The control component 135 can also cause the configuration information 152 of the first and second primary network nodes 112a and 112b to be replicated at the first and second standby network nodes 118a and 118b, respectively. Thus, the first and second standby network nodes 118a and 118b can facilitate communications with the computing devices 106 in the first and second enclosures 104a and 104b in place of the first and second primary network nodes 112a and 112b, respectively.

[0049] Even though only two standby network nodes 118a and 118b are illustrated in Figure 3, in other embodiments, the computer network 100 can also include three, four, or any suitable number of standby network nodes 118 (not shown). In certain embodiments, the number of standby network nodes 118 may be determined based on a threshold availability value (e.g., 99.9%) of the computing devices 106. In other embodiments, the number of standby network nodes 118 may be determined based on thresholds of capital investment, operating complexity, or other parameters.

[0050] Figure 4 is a flow diagram illustrating embodiments of a process 200 of providing standby backup capabilities to a network node in a computer network in accordance with embodiments of the disclosed technology. The process 200 is described below with reference to the computer network 100 and software components of Figures 1-2C. For example, the network node can be the first or second primary network node 112 connected to the computing devices 106 (Figure 1) via the first or second primary optical switches 110, respectively. In other embodiments, the process 200 can also be implemented in other suitable computer networks and/or hardware/software components.

[0051] As shown in Figure 4, the process 200 includes detecting an abnormal operating condition at the network node at stage 202, for example, by utilizing the detection component 133 of Figure 2A. In certain embodiments, detecting the abnormal operating condition can include continuously or periodically receiving operating parameters from the network node and comparing the received operating parameters with corresponding thresholds, as described in more detail below with reference to Figure 5. In other embodiments, detecting the abnormal operating condition can include receiving and analyzing status indicators from the network node. In further embodiments, detecting the abnormal operating condition can include a combination of comparing the received operating parameters with corresponding thresholds and analyzing status indicators from the network node. In yet further embodiments, detecting the abnormal operating condition can include receiving administrator input or utilizing other suitable techniques.

[0052] The process 200 can then include a decision stage 204 to determine whether an abnormal operating condition is detected at the network node. In response to determining that an abnormal operating condition is not detected at the network node, the process 200 includes reverting to detecting an abnormal operating condition at stage 202. In response to determining that an abnormal operating condition is detected at the network node, the process 200 includes switching optical connections from the network node to a standby network node at stage 206, for example, by utilizing the control component 135 of Figure 2A. In certain embodiments, switching optical connections includes switching one or more optical switches, as described in more detail below with reference to Figure 6. In other embodiments, switching optical connections can also include enabling/disabling optical switches and/or other suitable operations.

[0053] As shown in Figure 4, the process 200 can further include enabling the standby network node to operate in place of the network node having the indicated abnormal operation condition at stage 208. In certain embodiments, enabling the standby network node includes configuring the standby network node with the same configuration information as the network node, as described in more detail below with reference to Figure 7. In other embodiments, enabling the standby network node can also include verifying configuration of the standby network node and activating the configured standby network node via remote instructions or other suitable techniques.

[0054] Optionally, the process 200 can include re-checking the condition of the network node by reverting to detecting an abnormal operating condition at stage 202. In one embodiment, re-checking the condition of the network node can be based on an impact period associated with the indicated abnormal operating condition, as described in more detail above with reference to Figure 2A. In other embodiments, re-checking the condition of the network node can be at a pre-selected time interval (e.g., one hour), upon administrator input, or based on other suitable criteria. In response to a determination that the indicated abnormal operating condition is cleared and/or the network node is in normal operating status, the process 200 can optionally include reverting optical connections to the original configuration at stage 207. The process 200 can then optionally include returning the standby network node to a standby state at stage 209 by, for example, erasing configuration information from, re-initiating, and/or disabling the standby network node.
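The overall flow of process 200, including the optional revert path, can be viewed as a two-state machine. The sketch below is an illustration only: the state names and action strings are hypothetical labels, and each action stands in for the corresponding stage of Figure 4.

```python
from enum import Enum
from typing import List, Tuple


class State(Enum):
    MONITORING = "monitoring"  # primary node in service (stages 202 and 204)
    ON_STANDBY = "on_standby"  # standby node in service (after stages 206 and 208)


def step(state: State, abnormal_detected: bool) -> Tuple[State, List[str]]:
    """Run one pass of the loop and return (next state, actions to perform)."""
    if state is State.MONITORING:
        if abnormal_detected:
            # Stage 206 then stage 208.
            return State.ON_STANDBY, ["switch_optical_connections", "enable_standby_node"]
        # Nothing detected: revert to detecting at stage 202.
        return State.MONITORING, []
    # Already on standby: this pass is the optional re-check.
    if abnormal_detected:
        return State.ON_STANDBY, []
    # Condition cleared: stage 207 then stage 209.
    return State.MONITORING, ["revert_optical_connections", "return_standby_to_standby_state"]
```

A caller would invoke `step` each time the detection result is refreshed, for example when an impact period expires or at a pre-selected interval.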

[0055] Figure 5 is a flow diagram illustrating embodiments of a process 202 of detecting abnormal operating conditions at a network node in a computer network in accordance with embodiments of the disclosed technology. As shown in Figure 5, the process 202 can include receiving operating parameters from the network node at stage 212. The operating parameters can include, for example, various types of throughput, speed, change of throughput, and/or other suitable parameters, as discussed above with reference to Figure 2A. The process 202 can then include comparing the received operating parameters with threshold values at stage 214. The threshold values can be input by an administrator, based on historical values, and/or based on other suitable values. The process 202 can then include a decision stage 216 to determine whether the received operating parameters conform with the threshold values. For example, the process 202 can include determining whether a received throughput of the network node is below a threshold value. In response to determining that the received operating parameters conform with the threshold values, the process 202 includes indicating a normal operating condition at stage 218; otherwise, the process 202 includes indicating an abnormal operating condition at stage 220.

[0056] Figure 6 is a flow diagram illustrating embodiments of a process 206 of switching optical connections in a computer network in accordance with embodiments of the disclosed technology. As shown in Figure 6, the process 206 can include determining an optical connection path at stage 222, for example, by utilizing the control component 135 of Figure 2A. In one embodiment, determining the optical connection path includes determining an optical connection path from the computing devices 106 (Figure 1) associated with the network node to the standby network node. Based on the optical connection path, a switching pattern of one or more optical switches (e.g., the primary and/or standby optical switches 110 and 111 in Figure 1) can be determined.

[0057] The process 206 can then include switching one or more primary optical switches 110 at stage 224 by, for example, utilizing the output component 137 of Figure 2A to transmit a switching instruction to the first optical switch 110a to switch from the first primary network node 112a to the standby optical switch 111 shown in Figure 2B. The process 206 can also include switching the standby optical switch 111 by transmitting another switching instruction at stage 226, for example, to connect the first optical switch 110a to the standby network node 118 shown in Figure 2B. Even though the operations at stages 224 and 226 are shown as in series, in other embodiments, these operations may be performed generally concurrently.
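Deriving a switching pattern from an optical connection path can be sketched as follows. The path is modeled as an ordered list of component labels and the pattern as (switch, next hop) pairs; both representations and both function names are assumptions for illustration, since the disclosure does not prescribe a data model.

```python
from typing import List, Set, Tuple


def connection_path(multiplexer: str, primary_switch: str,
                    standby_switch: str, standby_node: str) -> List[str]:
    """Ordered hops from the computing devices' multiplexer to the standby node."""
    return [multiplexer, primary_switch, standby_switch, standby_node]


def switching_pattern(path: List[str], switches: Set[str]) -> List[Tuple[str, str]]:
    """For each optical switch on the path, the next hop it must select."""
    return [(hop, path[i + 1])
            for i, hop in enumerate(path[:-1])
            if hop in switches]
```

For the Figure 2B example, the path through components 108a, 110a, 111, and 118 yields one entry for the primary optical switch 110a (stage 224) and one for the standby optical switch 111 (stage 226). Because the entries are independent of one another, they could be transmitted in series or generally concurrently.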

[0058] Figure 7 is a flow diagram illustrating embodiments of a process 208 of enabling a standby network node in a computer network in accordance with embodiments of the disclosed technology. As shown in Figure 7, the process 208 can include retrieving a set of configuration information associated with the network node. In one embodiment, the set of configuration information can be retrieved from the memory 150 in Figure 2A. In other embodiments, the set of configuration information can be retrieved from the network node directly.

[0059] The process 208 can then include replicating the retrieved configuration information at the standby network node at stage 234. In one embodiment, replicating the configuration information includes transmitting the retrieved configuration information to the standby network node with an instruction to configure based on the configuration information. In other embodiments, configuration information may be replicated manually or via other suitable techniques. The process 208 can then include activating the standby network node with the replicated configuration information at stage 236. In one embodiment, activating the standby network node can be automatic. In other embodiments, activating the standby network node can include transmitting an activation instruction to the standby network node.

[0060] Figure 8 is a schematic diagram illustrating another computer network 300 having a standby network node with multiple input ports in accordance with embodiments of the disclosed technology. The computer network 300 can include components generally similar to those of the computer network 100 shown in Figure 1. As such, similar references denote similar components. Unlike the computer network 100 in Figure 1, the computer network 300 in Figure 8 does not include the standby optical switch 111. Instead, the computer network 300 includes a standby network node 311 having multiple optical input ports 302. In the illustrated embodiment, four optical input ports 302 are shown for illustration purposes. In other embodiments, the standby network node 311 can include two, three, or any other suitable number of optical input ports.

[0061] As shown in Figure 8, an output port from each of the primary optical switches 110 is connected to a corresponding optical input port 302 of the standby network node 311. As such, during operation, network controller 120 can cause the primary optical switches to switch from the first optical connection 113a to the second optical connection 113b without switching the standby optical switch 111 of Figure 1. The network controller 120 can then cause the standby network node 311 to provide standby backup capabilities to the primary network nodes 112 as discussed above with reference to Figures 1-2C.

[0062] Figure 9 illustrates a computing device 400 suitable for certain components of the computer network 100 in Figures 1-2B. For example, the computing device 400 may be suitable for the computing device 106 or the network controller 120 of Figure 1. In a very basic configuration 402, computing device 400 typically includes one or more processors 404 and a system memory 406. A memory bus 408 may be used for communicating between processor 404 and system memory 406.

[0063] Depending on the desired configuration, the processor 404 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 404 may include one or more levels of caching, such as a level one cache 410 and a level two cache 412, a processor core 414, and registers 416. An example processor core 414 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. An example memory controller 418 may also be used with processor 404, or in some implementations memory controller 418 may be an internal part of processor 404.

[0064] Depending on the desired configuration, the system memory 406 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. The system memory 406 can include an operating system 420, one or more applications 422, and program data 424. As shown in Figure 9, in certain embodiments, the application 422 may include, for example, the detection component 133, the control component 135, and the output component 137, as described in more detail above with reference to Figure 2A. In other embodiments, the application 422 can also include other suitable components. The program data 424 may include, for example, the configuration information 152. This described basic configuration 402 is illustrated in Figure 9 by those components within the inner dashed line.

[0065] The computing device 400 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 402 and any other devices and interfaces. For example, a bus/interface controller 430 may be used to facilitate communications between the basic configuration 402 and one or more data storage devices 432 via a storage interface bus 434. The data storage devices 432 may be removable storage devices 436, non-removable storage devices 438, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.

[0066] The system memory 406, removable storage devices 436, and non-removable storage devices 438 are examples of computer readable storage media. Computer readable storage media include storage hardware or device(s), examples of which include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other media which may be used to store the desired information and which may be accessed by computing device 400. Any such computer readable storage media may be a part of computing device 400. The term "computer readable storage medium" excludes propagated signals and communication media.

[0067] The computing device 400 may also include an interface bus 440 for facilitating communication from various interface devices (e.g., output devices 442, peripheral interfaces 444, and communication devices 446) to the basic configuration 402 via bus/interface controller 430. Example output devices 442 include a graphics processing unit 448 and an audio processing unit 450, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 452. Example peripheral interfaces 444 include a serial interface controller 454 or a parallel interface controller 456, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 458. An example communication device 446 includes a network controller 460, which may be arranged to facilitate communications with one or more other computing devices 462 over a network communication link via one or more communication ports 464.

[0068] The network communication link may be one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media as used herein may include both storage media and communication media.

[0069] The computing device 400 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. The computing device 400 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.

[0070] Specific embodiments of the technology have been described above for purposes of illustration. However, various modifications may be made without deviating from the foregoing disclosure. In addition, many of the elements of one embodiment may be combined with other embodiments in addition to or in lieu of the elements of the other embodiments. Accordingly, the technology is not limited except as by the appended claims.