Title:
STORAGE AREA NETWORK SWITCH REBOOT NOTIFICATION
Document Type and Number:
WIPO Patent Application WO/2016/133490
Kind Code:
A1
Abstract:
Example implementations relate to a reboot notification sent by a switch of a storage area network to a host or a storage device. The reboot notification may include a port identifier affected by the reboot. The host or the storage device may determine whether redundant SAN fabric is available. The switch may receive a response relating to whether redundant SAN fabric is available.

Inventors:
RAHUL KUMAR (US)
MOHAN RUPIN (US)
PUTTAGUNTA KRISHNA (US)
Application Number:
PCT/US2015/016121
Publication Date:
August 25, 2016
Filing Date:
February 17, 2015
Assignee:
HEWLETT PACKARD ENTPR DEV LP (US)
International Classes:
H04L12/937; H04L12/24
Domestic Patent References:
WO2011025511A1    2011-03-03
Foreign References:
US20080250042A1    2008-10-09
US20100217845A1    2010-08-26
US20070088870A1    2007-04-19
Other References:
VMWARE: "SAN Conceptual and Design Basics", 28 April 2014 (2014-04-28), Retrieved from the Internet
Attorney, Agent or Firm:
KWOK, Jonathan T. et al. (3404 E. Harmony Road, Mail Stop 7, Fort Collins CO, US)
Claims:
We claim:

1. A method comprising:

identifying, by a switch of a storage area network (SAN) fabric, a host or a storage device, each connected to the SAN fabric, to be affected by an initiated reboot of the switch;

if the identifying identifies an affected host, sending a reboot notification, by the switch, to the affected host, the reboot notification including: a host port identifier (ID) of the affected host if the affected host is local to the switch, or a storage device port ID targeted by the affected host if the affected host is remote to the switch;

if the identifying identifies an affected storage device, sending the reboot notification, by the switch, to the affected storage device, the reboot notification including a host port ID associated with the affected storage device; and

receiving, by the switch, a response from the affected host or the affected storage device relating to whether a redundant SAN fabric is available.

2. The method of claim 1, wherein the switch performs the identifying the affected host or the affected storage device by querying a name server database on the switch.

3. The method of claim 1, wherein the response is one of a plurality of responses received from a plurality of affected hosts or a plurality of affected storage devices,

the switch completes the reboot if each response of the plurality of responses indicates that redundant SAN fabric is available, and

the switch suspends the reboot if at least one response of the plurality of responses indicates that no redundant SAN fabric is available.

4. The method of claim 1, wherein the response received from the affected host relates to whether the affected host has redundant SAN fabric for connecting with logical units accessed via the host port ID of the affected host or via the storage device port ID targeted by the affected host.

5. The method of claim 1, wherein the response received from the affected storage device relates to whether redundant SAN fabric is available for connecting (a) logical units mapped to the host port ID associated with the affected storage device with (b) another host port ID on the same host as the host port ID associated with the affected storage device.

6. The method of claim 3, further comprising:

transmitting, if the switch suspends the reboot, a user notification that identifies an affected host or an affected storage device for which no redundant SAN fabric is available; and

receiving a user command to proceed with the reboot or to abort the reboot.

7. The method of claim 1, wherein the reboot notification and the response are each included in messages exchanged via a SAN protocol.

8. A non-transitory machine readable medium storing instructions executable by a processor of a host, the non-transitory machine readable medium comprising:

instructions to receive a reboot notification from a switch of a storage area network fabric indicating that the switch intends to reboot and including a port identifier (ID) affected by the reboot;

instructions to identify a logical unit accessed by the host via the port ID;

instructions to determine whether a redundant path is available for the host to access the identified logical unit without using the switch; and

instructions to send a response to the switch based on whether redundant paths are determined to be available.

9. The non-transitory machine readable medium of claim 8, wherein

the port ID is a host port ID if the host is local to the switch, and the port ID is a storage device port ID targeted by the host if the host is remote to the switch.

10. The non-transitory machine readable medium of claim 8, wherein the instructions to identify the logical unit and the instructions to determine whether a redundant path is available are included in a multipathing module of the host.

11. The non-transitory machine readable medium of claim 8, wherein the instructions to determine whether a redundant path is available calculate a number of remote active paths by subtracting a number of paths to the logical unit via the switch from a number of total active paths between the host and the logical unit, and

the number of remote active paths being one or greater indicates that at least one redundant path is available for the host to access the logical unit.

12. A non-transitory machine readable medium storing instructions executable by a processor of a storage device, the non-transitory machine readable medium comprising:

instructions to receive a reboot notification from a switch of a storage area network (SAN) fabric, the reboot notification indicating that the switch intends to reboot and including an affected host port identifier (ID) in the SAN fabric;

instructions to identify a logical unit mapped to the affected host port ID; and

instructions to determine another host port ID on the same host as the affected host port ID, the another host port ID being mapped to the logical unit and forming part of a redundant SAN fabric.

13. The non-transitory machine readable medium of claim 12, further comprising instructions to ping the another host port ID to verify if the another host port ID is reachable by the storage device.

14. The non-transitory machine readable medium of claim 13, further comprising:

instructions to send a reboot approval to the switch if each another host port ID of a plurality of another host port IDs associated with the storage device is verified by ping to be reachable by the storage device; and

instructions to send a reboot disapproval to the switch if at least one another host port ID of the plurality of another host port IDs cannot be verified by ping to be reachable by the storage device.

15. The non-transitory machine readable medium of claim 12, further comprising instructions to send a reboot disapproval to the switch if the storage device cannot identify another host port ID being on the same host as the affected host port ID and being mapped to the logical unit.

Description:
STORAGE AREA NETWORK SWITCH REBOOT NOTIFICATION

BACKGROUND

[0001] Computer networks can be implemented to allow networked devices, such as personal computers, servers, data storage devices, etc., to communicate and share resources. One type of network implementation is a storage area network (SAN), which can, for example, interconnect data storage devices with associated host devices. SANs may include network switches to route data traffic along routing paths between networked storage devices and host devices.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Various examples will be described below with reference to the following figures.

[0003] FIG. 1 is a block diagram of an example system that may make use of the present disclosure.

[0004] FIG. 2 is a flow diagram of a method for sending a reboot notification according to an example implementation.

[0005] FIG. 3 is a flow diagram of a method for completing or aborting a switch reboot according to an example implementation.

[0006] FIG. 4 is a flow diagram of a method for receiving a reboot notification according to an example implementation.

[0007] FIG. 5 is a flow diagram of a method for receiving a reboot notification according to an example implementation.

[0008] FIG. 6 is a block diagram showing a non-transitory, machine-readable medium encoded with instructions for receiving a reboot notification according to an example implementation.

[0009] FIG. 7 is a block diagram showing a non-transitory, machine-readable medium encoded with instructions for receiving a reboot notification according to an example implementation.

[0010] FIG. 8 is a block diagram showing a non-transitory, machine-readable medium encoded with instructions to send a reboot approval or to send a reboot disapproval according to an example implementation.

DETAILED DESCRIPTION

[0011] A storage area network (SAN) can be used to provide host devices (e.g., workstations, servers, and the like) with networked access to data storage devices (e.g., disk arrays, tape libraries, optical storage devices, and the like). A SAN may employ interconnected switches that form SAN fabrics (also referred to as routing paths) to route data between host devices and data storage devices. However, a SAN fabric may be disrupted when a switch reboots (for example, a switch may need to reboot following a firmware or driver upgrade), which in turn may cause undesirable service disruptions between host devices and data storage devices. To mitigate disruptions, some SAN implementations may include failover operations that place components and services of the SAN in a standby state until completion of the switch reboot, or other failover operations that move data traffic to a redundant SAN fabric if available, but such failover operations may nevertheless result in undesirable service delays and/or may still be vulnerable to disruptions. The above-described disruptions and delays may prove fatal in mission-critical systems. Moreover, in enterprise organizations, SAN environments may be large and complex with hundreds of switches, hosts, and storage devices, for example. In such large and complex SAN environments, an information technology (IT) professional may not be able to manually manage switch reboots to avoid disruptions and delays. Accordingly, it may be desirable for switch reboots to be well-coordinated and automated events between switches, host devices, and/or storage devices.

[0012] The techniques of the present application may, in some example implementations, identify a host device or a storage device that will be affected by a reboot of a switch, and send a reboot notification by the switch to the affected host or affected storage device. The switch may then receive a response from the affected host or the affected storage device relating to whether a redundant SAN fabric is available. The switch may then complete the reboot if the received response indicates that redundant SAN fabric is available or suspend the reboot if the received response indicates that no redundant SAN fabric is available. Accordingly, the techniques of the present application may be useful for coordinating a switch reboot among SAN components in an automated manner.

[0013] Referring now to the figures, FIG. 1 is a block diagram of an example storage area network (SAN) 100 that may make use of the present disclosure. The SAN 100 can include a plurality of switches (e.g., switches 130, 132, 134, 140, and 142) for connecting a host 102 to a logical unit 116 of a storage device 110 and to a logical unit 126 of a storage device 120, in a manner described herein below. The SAN 100 can implement a networking protocol such as Fibre Channel Protocol, Internet Small Computer System Interface (iSCSI), ATA over Ethernet (AoE), Fibre Channel over Ethernet (FCoE), and the like. It should be understood that the implementations described herein can be used or be adapted for use with SANs having greater or fewer components, different types of devices, and different network arrangements, without departing from the scope of the present disclosure.

[0014] The host 102 may be, for example, a server, a desktop computer, a workstation, a laptop computer, or the like. The host 102 may include at least one host port, such as, for example, host port 104 or host port 106. A host port may be, for example, an Ethernet port of a network interface card or a Fibre Channel port of a Fibre Channel host bus adapter. Each host port (104, 106) may be identified by a host port identifier (ID) (e.g., an eight-bit hexadecimal number). For example, the host port ID may be a Fibre Channel ID where the SAN 100 implements Fibre Channel Protocol. In some implementations, the host 102 may include a multipathing module that manages what logical units the host 102 may communicate with, and what routing paths are available for connecting the host 102 with those logical units. As used herein, the term "module" (e.g., as used in "multipathing module") may refer to a set of instructions encoded on a machine-readable medium of a device and executable by a processor of the device. Additionally or alternatively, a module can include a hardware device comprising electronic circuitry for implementing the functionality described below. Additional functionality of the host 102 is described further herein below with respect to at least the method 400 of FIG. 4.

[0015] The storage devices 110 and 120 may each be, for example, a disk array, a tape library, an optical storage device, or the like. The storage devices 110 and 120 may each include at least one storage device port, which may be of the same networking technology as the host port (e.g., an Ethernet port or a Fibre Channel port, as the case may be). For example, as depicted in FIG. 1, the storage device 110 may have at least a storage device port 112, and the storage device 120 may have at least storage device ports 122 and 124. Each storage device port may be identified by a corresponding storage device port ID (e.g., an eight-bit hexadecimal number). In some implementations, a portion or all of the storage capacity of a storage device may be presented as a logical unit (which may also be known as a logical disk or a virtual disk, among other terms). For example, as depicted in FIG. 1, the storage device 110 presents a logical unit 116 and the storage device 120 presents a logical unit 126. In some implementations, storage capacity from multiple storage devices may be collected and presented as a single logical unit. In some implementations, storage capacity of a single storage device may be presented as multiple logical units. A logical unit may be referred to by a logical unit number (LUN). A storage device may include a controller for processing data traffic between its storage device ports and any logical units of the storage device. Additional functionality of the storage devices 110 and 120 is described further herein below with respect to at least the method 500 of FIG. 5.

[0016] In operation of some implementations, the host, which may be referred to as an "initiator," initiates a protocol session (e.g., a session using Fibre Channel Protocol, iSCSI, AoE, FCoE, or the like) to perform a data transfer operation with a storage device, which may be referred to as a "target," over a SAN fabric. A SAN fabric, as will be described further herein below, may include interconnected switches. In some implementations, the host initiator may address the storage device target via the SAN fabric using a four-level hierarchy syntax such as Host:Bus:Target:LUN, which identifies (or targets), among other things, a storage device port ID of the storage device target. Once the storage device target receives data from the host, the storage device determines the appropriate LUN to complete that particular data transfer operation.

[0017] A SAN fabric refers to a network topology comprised of interconnected network switches that connects a host port of a host to a storage device port of a storage device. It should be appreciated that the term "switch" can include other devices for forming a SAN fabric, such as suitable routers, gateways, and other devices that can provide switch-like functionality for a SAN. As an example, in the SAN 100 of FIG. 1, data may be routed between the host 102 and the logical unit 126 of the storage device 120 by way of a routing path through a SAN fabric comprising the interconnected switches 130, 132, and 134 connected to the host port 104 and the storage device port 122. Alternatively, data can also be routed between the host 102 and the logical unit 126 of the storage device 120 by way of a routing path through a SAN fabric comprising interconnected switches 140 and 142 connected to the host port 106 and the storage device port 124. By virtue of the SAN fabric comprising switches 130, 132, and 134 being separate from the SAN fabric comprising switches 140 and 142, the SAN 100 is deemed to have redundant SAN fabrics (also referred to as redundant paths or redundant routing paths) for routing data between the host 102 and the logical unit 126. In the SAN 100 of FIG. 1, data may also be routed between the host 102 and the logical unit 116 of the storage device 110 (through the storage device port 112), but no redundant SAN fabric is depicted to connect host 102 and storage device 110.
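The redundancy described above can be illustrated by simple path counting, in the manner recited in claim 11: the number of active paths to a logical unit that remain when a switch is unavailable is the total number of active paths minus the number of paths through that switch. The following Python sketch is illustrative only. The `PATHS` table is a hypothetical encoding of the FIG. 1 topology inferred from this description, the figure's reference numerals stand in for real port IDs, and the function name is an assumption.

```python
# Hypothetical encoding of FIG. 1 (an assumption inferred from the text):
# (host port, switches traversed, storage device port, logical unit).
PATHS = [
    (104, (130, 132), 112, 116),
    (104, (130, 132, 134), 122, 126),
    (106, (140, 142), 124, 126),
]


def remaining_active_paths(logical_unit, switch):
    """Count active paths to a logical unit that do not use the given switch."""
    total = [p for p in PATHS if p[3] == logical_unit]
    via_switch = [p for p in total if switch in p[1]]
    # Total active paths minus paths through the switch, as in claim 11.
    return len(total) - len(via_switch)


def has_redundant_path(logical_unit, switch):
    """One or more remaining paths indicates that a redundant path exists."""
    return remaining_active_paths(logical_unit, switch) >= 1
```

Under these assumptions, logical unit 126 keeps a path when any of switches 130, 132, or 134 is unavailable, while logical unit 116 does not.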

[0018] An additional aspect of a SAN is zoning. For example, a SAN may be connected to multiple hosts and multiple storage devices, and may be zoned so as to restrict which hosts and storage devices are permitted to communicate with each other. For example, zoning may be achieved by restricting access to a host port, a storage device port, and/or a logical unit. In some implementations, zoning rules are enforced by the switches of a SAN. In some implementations, the example SAN 100 depicted in FIG. 1 may be part of a larger SAN, and it may be understood that the SAN fabrics of the SAN 100 are zoned to permit the host 102 to communicate with the storage devices 110 and 120 and the logical units 116 and 126.

[0019] The switches of the SAN 100 will now be described in further detail. In some implementations, at least one switch of the SAN 100 may perform the functionality described herein with respect to method 200 of FIG. 2, in response to the initiation of a reboot of that switch. In some implementations, each switch of the SAN 100 (e.g., switches 130, 132, 134, 140, and 142) may perform the functionality described in method 200, in response to the initiation of a reboot of the respective switch. To perform the functionality described in method 200, the switch or switches may include a set of instructions encoded on a machine-readable medium and executable by a processor of the device. Additionally or alternatively, the switch or switches can include a hardware device comprising electronic circuitry for implementing the functionality described in method 200. In some implementations, the steps of method 200 may be executed substantially concurrently or in a different order than shown in FIG. 2. In some implementations, method 200 may include more or fewer steps than are shown in FIG. 2. In some implementations, one or more of the steps of method 200 may, at certain times, be ongoing and/or may repeat. Illustrations of method 200 will be described with respect to the switches and other components depicted in FIG. 1.

[0020] Method 200 can begin at block 202. At block 204, a switch of a SAN fabric may identify a host or a storage device, each connected to the SAN fabric, that will be affected by an initiated reboot of the switch. The term "affected" may be understood to mean that the host or storage device may lose connectivity to other devices connected to the SAN fabric upon reboot of the switch. The switch may also be referred to herein as the "rebooting switch." Any hosts or storage devices identified by the switch at block 204 may be deemed an affected host or an affected storage device, respectively. More particularly, an affected host may be further deemed local to the switch or remote to the switch. In some implementations, a local host may be a host that is directly connected to the switch, while a remote host may be a host that is indirectly connected to the switch by way of at least one other switch. For example, in FIG. 1, the host 102 is local to the switch 130 and is remote to the switches 132 and 134. In some implementations, the switch can identify a plurality of affected hosts and/or a plurality of affected storage devices at block 204. In some implementations, the switch can perform block 204 by querying a name server database on the switch (which maintains SAN fabric information regarding hosts, host port IDs, storage devices, storage device port IDs, switches, and any connections therein forming the SAN fabric) to identify hosts and storage devices that are connected to and/or through the switch and thus would be affected by the switch reboot. The switch also can identify from the name server database whether the affected host is local or remote to the switch. Each switch in a SAN fabric may have a synchronized copy of the name server database.
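As a non-limiting illustration of block 204, the name server query may be sketched in Python as follows. The `Route` record, the `NAME_SERVER` table, and the function `identify_affected` are hypothetical names introduced only for this sketch, and the table is an assumed encoding of FIG. 1 rather than an actual name server database format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical name server record for one host-to-storage connection."""
    host: str
    host_port: int
    switches: tuple  # ordered from the host side to the storage side
    storage: str
    storage_port: int


# Assumed encoding of FIG. 1; reference numerals stand in for real IDs.
NAME_SERVER = [
    Route("host102", 104, (130, 132), "storage110", 112),
    Route("host102", 104, (130, 132, 134), "storage120", 122),
    Route("host102", 106, (140, 142), "storage120", 124),
]


def identify_affected(switch):
    """Return ({host: 'local' | 'remote'}, {storage devices}) for a rebooting switch."""
    hosts, storages = {}, set()
    for route in NAME_SERVER:
        if switch not in route.switches:
            continue
        # A host is local when the rebooting switch is its first hop.
        locality = "local" if route.switches[0] == switch else "remote"
        if hosts.get(route.host) != "local":
            hosts[route.host] = locality
        storages.add(route.storage)
    return hosts, storages
```

For example, `identify_affected(134)` reports host 102 as a remote affected host and storage device 120 as the only affected storage device.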

[0021] Illustrations of block 204 in view of the SAN 100 of FIG. 1 will now be described. In a case where the switch 130 initiated a reboot, the switch 130 may perform block 204 to identify host 102 to be a local affected host and to identify storage devices 110 and 120 to be affected storage devices. If the switch 132 initiated a reboot, the switch 132 may identify host 102 to be a remote affected host and may identify storage devices 110 and 120 to be affected storage devices. If the switch 134 initiated a reboot, the switch 134 may identify host 102 to be a remote affected host and may identify storage device 120 to be an affected storage device.

[0022] Referring again to FIG. 2, if the switch identifies an affected host at block 204 ("YES" at block 206), then operation passes to block 210. If the switch identifies an affected storage device at block 204 ("NO" at block 206 and "YES" at block 208), then operation passes to block 212. If the switch identifies neither an affected host nor an affected storage device, then the method 200 can proceed to end at block 216. In some implementations, if the switch identifies neither an affected host nor an affected storage device, the switch may complete the reboot prior to ending the method 200. Block 210 and block 212 will now be described in turn.

[0023] At block 210, the switch may send a reboot notification to the affected host identified at block 204. The reboot notification may indicate that the switch intends to reboot and may include a port ID (depending on whether the host is local or remote to the switch, as will be described) that, in essence, represents how much of the SAN fabric will be unavailable to the affected host if the switch reboots. If the affected host is local to the switch, then the reboot notification may include a host port ID of the affected host identified at block 204. Owing to the affected host being local to the rebooting switch, all SAN fabric through the host port connected to the rebooting switch will be unavailable. For example, the switch 130 may identify host 102 to be a local affected host at block 204 as described above, and at block 210, the switch 130 may send a reboot notification to the host 102 that includes the host port ID of the host port 104, which represents that the host 102 may lose connectivity to all SAN fabric through the host port 104 (e.g., the switches 130, 132, and 134, the storage devices 110 and 120, and the logical units 116 and 126). If, on the other hand, the affected host is remote to the switch, then the reboot notification may include a storage device port ID targeted by the affected host. For example, the switch 132 may identify host 102 to be a remote affected host at block 204 as described above, and at block 210, the switch 132 may send a reboot notification to the host 102 that includes the storage device port IDs targeted by the host 102, namely the storage device port IDs of the storage device ports 112 and 122. Similarly, the switch 134, which may identify host 102 to be a remote affected host at block 204 as described above, may send, at block 210, a reboot notification to the host 102 that includes the storage device port ID of the storage device port 122 (but not the storage device port ID of the storage device port 112, because the host 102 does not target the storage device port 112 through the switch 134). By virtue of sending a storage device port ID targeted by the affected host, the affected host may be made aware, in an efficient manner, of the portion of the SAN fabric that may become unavailable.
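The selection of port IDs at block 210 may be sketched as follows, by way of example only. The `ROUTES` table is an assumed encoding of FIG. 1 in which reference numerals stand in for port IDs, and the dictionary layout of the notification is hypothetical; it does not represent any actual SAN protocol message format.

```python
# Assumed encoding of FIG. 1:
# (host port ID, switches from host side to storage side, storage device port ID).
ROUTES = [
    (104, (130, 132), 112),
    (104, (130, 132, 134), 122),
    (106, (140, 142), 124),
]


def host_notification(switch):
    """Build a hypothetical reboot notification for the affected host, if any."""
    through = [r for r in ROUTES if switch in r[1]]
    if not through:
        return None  # no host is affected by rebooting this switch
    # Local host: the rebooting switch is the first hop from a host port.
    local_ports = sorted({hp for hp, sw, _ in through if sw[0] == switch})
    if local_ports:
        return {"rebooting_switch": switch, "host_port_ids": local_ports}
    # Remote host: report the storage device ports targeted through the switch.
    targets = sorted({sp for _, _, sp in through})
    return {"rebooting_switch": switch, "storage_port_ids": targets}
```

Under these assumptions, the switch 130 would report host port 104, the switch 132 would report storage device ports 112 and 122, and the switch 134 would report only storage device port 122, consistent with the illustrations above.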

[0024] In some implementations, the host port ID and/or the storage device port ID included in the reboot notification sent at block 210 can be identified by the switch from the name server database. Moreover, in some implementations, the switch may include the host port ID and/or the storage device port ID at block 210 only if they are zoned for communication with the affected host. In some implementations, the reboot notification can be included in or can be sent as a standardized message format of the SAN protocol (e.g., a registered state change notification (RSCN) in Fibre Channel protocol or Ethernet multicast messaging). Upon receipt of the reboot notification, the affected host (e.g., host 102) can perform the functionality of method 400 of FIG. 4, which will be described further herein below. After performing block 210, the switch may perform block 214, but before describing block 214, block 212 will first be described.

[0025] The switch performs block 212 if the switch identifies an affected storage device at block 204. At block 212, the switch may send a reboot notification to the affected storage device that may indicate that the switch intends to reboot and that may include a host port ID associated with the affected storage device (or, in other words, a host port ID of a host that may target the affected storage device). For example, in the above illustration where the switch 132 identifies storage devices 110 and 120 as affected storage devices at block 204, the switch 132 may send a reboot notification to each of the storage device 110 and the storage device 120 that includes the host port ID of the host port 104, owing to the host port 104 being associated with (i.e., zoned for communication via the SAN fabric with) the storage devices 110 and 120. As another example, the switch 134 may send a reboot notification to the storage device 120 that includes the host port ID of the host port 104. Analogously to block 210, the host port ID can be identified by the switch from the name server database, and the reboot notification can be included in or can be sent as a message exchanged via the SAN protocol (e.g., RSCN or Ethernet multicast messaging). Upon receipt of the reboot notification, the storage device 110 and/or 120 can perform the functionality of method 500 of FIG. 5 and/or method 600 of FIG. 6, which will be described further herein below. After performing block 212, the switch may perform block 214.
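By way of illustration, the host port IDs that block 212 places in the notification for each affected storage device may be gathered as sketched below. The `ZONED_ROUTES` table is an assumed encoding of FIG. 1, and the function name and return layout are hypothetical.

```python
# Assumed encoding of FIG. 1: (host port ID, switches traversed, storage device).
ZONED_ROUTES = [
    (104, (130, 132), "storage110"),
    (104, (130, 132, 134), "storage120"),
    (106, (140, 142), "storage120"),
]


def storage_notifications(switch):
    """Map each affected storage device to the host port IDs to include."""
    notes = {}
    for host_port, switches, storage in ZONED_ROUTES:
        if switch in switches:
            notes.setdefault(storage, set()).add(host_port)
    return {storage: sorted(ports) for storage, ports in notes.items()}
```

With this table, the switch 132 would notify both storage devices with host port 104, and the switch 134 would notify only the storage device 120.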

[0026] At block 214, the switch may receive a response from an affected host (i.e., a response determined by the affected host performing method 400, as described below) and/or a response from an affected storage device (i.e., a response determined by the affected storage device performing method 500 and/or method 600, as described below). In some implementations, the response can be included in or can be sent as a message exchanged via the SAN protocol (e.g., RSCN or Ethernet multicast messaging). The response may relate to whether a redundant SAN fabric is available to the affected host and/or the affected storage device. In different implementations, the response may include different information to indicate whether a redundant SAN fabric is available. For example, the response may be a binary approval/disapproval of the switch reboot. As another example, the response may specify the nature, quantity, and/or other details regarding available or unavailable redundant SAN fabric(s).

[0027] As described above, the switch may receive the response at block 214 from an affected host. In some implementations (e.g., for local affected hosts, after performing block 210), the response received from the affected host may relate particularly to whether the affected host has redundant SAN fabric for connecting with logical units accessed via the host port ID of the affected host. For example, returning to the above illustration where the rebooting switch 130 has sent a reboot notification to the local affected host 102 that includes the host port ID of the host port 104, at block 214 the switch 130 may receive a response from the host 102 indicating that the host 102 has redundant SAN fabric to reach logical unit 126 (e.g., via host port 106, switch 140, switch 142, and storage device port 124) but does not have any available redundant SAN fabric to reach logical unit 116. Accordingly, the response received by the switch 130 may indicate that the host 102 disapproves of rebooting the switch 130, because it would lose connectivity to the logical unit 116.

[0028] In some implementations (e.g., for remote affected hosts, after performing block 210), the response received from the affected host may relate to whether the affected host has redundant SAN fabric for connecting with logical units accessed via the storage device port ID targeted by the affected host. For example, returning to the above illustration where the rebooting switch 134 has sent a reboot notification to the remote affected host 102 that includes the storage device port ID of the storage device port 122, at block 214 the switch 134 may receive a response from the host 102 indicating that the host 102 has redundant SAN fabric to reach logical unit 126 (e.g., via host port 106, switch 140, switch 142, and storage device port 124). Accordingly, the response received by the switch 134 may indicate that the affected host 102 approves of rebooting the switch 134.
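One possible way for an affected host to arrive at the responses illustrated in the two preceding paragraphs is sketched below. The `HOST_PATHS` table is an assumed view of the host's multipathing information for FIG. 1, and the function name, the `kind` parameter, and the response layout are hypothetical.

```python
# Assumed multipathing view of host 102 in FIG. 1:
# (host port ID, storage device port ID, logical unit).
HOST_PATHS = [
    (104, 112, 116),
    (104, 122, 126),
    (106, 124, 126),
]


def host_response(port_ids, kind):
    """Decide approval from the port IDs carried in a reboot notification.

    kind is "host" for host port IDs (local host) or "storage" for
    storage device port IDs (remote host).
    """
    column = 0 if kind == "host" else 1
    affected = [p for p in HOST_PATHS if p[column] in port_ids]
    logical_units = {p[2] for p in affected}
    # A logical unit lacks redundancy if every path to it is affected.
    lacking = sorted(
        lun for lun in logical_units
        if not any(p[2] == lun and p not in affected for p in HOST_PATHS)
    )
    return {"approve": not lacking, "luns_without_redundancy": lacking}
```

With this table, a notification carrying host port 104 yields a disapproval naming logical unit 116, while a notification carrying storage device port 122 yields an approval.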

[0029] In some cases (e.g., following block 212), the switch may receive the response at block 214 from an affected storage device. In such a case, the response received from the affected storage device may relate to whether redundant SAN fabric is available for connecting (a) logical units that are mapped to host port ID(s) associated with the affected storage device with (b) another host port ID on the same host as the host port ID(s) associated with the affected storage device. For example, returning to the above illustration where the rebooting switch 134 has sent a reboot notification to the affected storage device 120 that includes the host port ID of the host port 104, at block 214 the switch 134 may receive a response from the affected storage device 120 indicating that redundant SAN fabric is available for connecting the logical unit 126 (which is associated with the host port 104, for example, by way of zoning) with a host port 106 on the host 102, the redundant SAN fabric being through host port 106, the switch 140, the switch 142, and the storage device port 124. In some implementations, the response received by the switch 134 may indicate that the affected storage device 120 approves of rebooting switch 134.

[0030] After block 214, method 200 can end at block 216. In some implementations, the switch may then perform method 300 of FIG. 3 after performing method 200. Prior to continuing to method 300, if the switch identified a plurality of affected hosts and/or a plurality of affected storage devices at block 204, the switch may repeat blocks of method 200 to receive (and collect) a plurality of responses from the affected hosts and/or affected storage devices. It should be understood that the switch may repeat blocks of method 200 sequentially or substantially simultaneously (i.e., in parallel) for the plurality of affected hosts and/or the plurality of affected storage devices.

[0031] Method 300 of FIG. 3 will now be described. At block 304, the switch analyzes the response or the plurality of responses received during method 200. If each of the responses (of the plurality of responses) indicates that redundant SAN fabric is available ("YES" at block 304), then operation proceeds to block 314 and the switch completes the reboot. After completing the reboot at block 314, method 300 can end at block 318. Returning to block 304, if, on the other hand, at least one of the responses (of the plurality of responses) indicates that no redundant SAN fabric is available ("NO" at block 304) for a corresponding affected host or affected storage device, then operation proceeds to block 306, and the switch suspends the reboot. In other words, in some implementations, the switch will not proceed to reboot unless every affected host and/or affected storage device has redundant SAN fabric, thus approving of the reboot. After suspending the reboot at block 306, the switch may proceed to perform block 308.
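The decision at block 304 can be illustrated with a minimal Python sketch. The `Response` record, its field names, and the `"complete"`/`"suspend"` labels are hypothetical and chosen only for illustration; the description does not define a message format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Response:
    sender: str             # hypothetical label for an affected host or storage device
    redundant_fabric: bool  # True if the sender reports redundant SAN fabric


def analyze_responses(responses: List[Response]) -> Tuple[str, List[str]]:
    """Sketch of block 304: complete the reboot only if every response approves."""
    # Collect senders that reported no redundant SAN fabric.
    lacking = [r.sender for r in responses if not r.redundant_fabric]
    if lacking:
        # "NO" at block 304: suspend (block 306) and report who lacks redundancy.
        return "suspend", lacking
    # "YES" at block 304: proceed to complete the reboot (block 314).
    return "complete", []
```

In this sketch the list returned with `"suspend"` is the kind of information a switch could place in the user notification of block 308.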

[0032] At block 308, the switch may transmit a user notification that identifies an affected host or an affected storage device for which no redundant SAN fabric is available (as determined, for example, from a response received at block 214 of method 200). For example, the user notification may be transmitted to a user terminal, server, workstation, desktop computer, laptop computer, mobile device, or the like, that is configured with functions for managing the SAN. By virtue of the user notification, a SAN user (e.g., an IT professional, a network administrator, or the like) may discover hosts, storage devices, and/or logical units that are operating without redundant SAN fabric or with damaged redundant SAN fabric, which otherwise may go undiscovered in large, complex SANs.

[0033] At block 310, the switch may receive a user command to proceed with the reboot or to abort the reboot. The user command may be in accordance with a user's response to the user notification. For example, in response to the user notification, a user may take corrective action to establish or repair redundant SAN fabric, which may trigger updates to the name server databases of the switches and/or the multipathing module of the host, among other things. After taking corrective action, the user may, in some instances, send a user command to the switch to abort the reboot and may subsequently initiate a new switch reboot, which in turn may cause the switch to perform methods 200 and/or 300 with the revised and/or repaired SAN fabrics. As another example, the user may decide that losing connectivity for the affected host and/or affected storage device without redundant SAN fabric is not fatal, and the user may send a user command to the switch to proceed with the reboot. If the user command is to proceed with the reboot ("YES" at block 312), then the switch completes the reboot at block 314 and method 300 ends at block 318. If the user command is to abort the reboot ("NO" at block 312), then the switch aborts the reboot at block 316 and method 300 ends at block 318.

[0034] The functionality of a host (e.g., the host 102) will now be described with reference to method 400 of FIG. 4. In some implementations, method 400 may be performed by the host 102 after a switch performs block 210 of method 200 and before the switch performs block 214 of method 200. Method 400 begins at block 402. At block 404, the host 102 receives a reboot notification from a switch of a SAN fabric (e.g., a fabric of the SAN 100) indicating that the switch intends to reboot and including a port ID affected by the reboot. For example, the received reboot notification may be the reboot notification sent by a switch at block 210 of method 200. In some implementations, the port ID included in the reboot notification depends on whether the host is local or remote to the switch, as described above with respect to block 210. If the host is local to the switch, the port ID may be a host port ID of the host. As an illustration, the host 102 may receive from the switch 130 (to which host 102 is local) a reboot notification including the host port ID of the host port 104, which represents that all the SAN fabric through the host port 104 will be unavailable to the host 102 if the switch 130 reboots. If the host is remote to the switch, the port ID may be a storage device port ID targeted by the host. To illustrate, the host 102 may receive from the switch 134 (to which the host 102 is remote) a reboot notification including the storage device port ID of storage device port 122, because the host 102 may target the storage device port 122 (and more particularly, the logical unit 126 accessed via the storage device port 122).

[0035] At block 406, the host identifies a logical unit accessed via the port ID received at block 404. In some implementations, the host can identify the logical unit by referring to a multipathing module included therein. The multipathing module may manage a list or table of logical units associated with or accessible by the host 102, and the routing paths that are available between the host 102 and those logical units. For example, the multipathing module may maintain a table of Host:Bus:Target:LUN addresses zoned for targeting by the host. To illustrate, if the host 102 receives at block 404 the host port ID of the host port 104 in a reboot notification from the switch 130, the host 102 may identify logical units 116 and 126 as being accessed via the host port 104. As another illustration, if the host 102 receives at block 404 the storage device port ID of the storage device port 122 in a reboot notification from the switch 134, the host 102 may identify the logical unit 126 as being accessed via the storage device port 122.
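The lookup at block 406 can be sketched in Python as follows. The `Path` record and the `PATHS` table are hypothetical stand-ins for whatever the multipathing module actually maintains; the entries mirror the FIG. 1 illustrations given in this description, with reference numerals used as identifiers.

```python
from typing import List, NamedTuple, Tuple


class Path(NamedTuple):
    host_port: str
    switches: Tuple[str, ...]
    storage_port: str
    logical_unit: str


# Hypothetical multipathing table for host 102, following the FIG. 1 illustrations.
PATHS = [
    Path("104", ("130", "132"), "112", "116"),
    Path("104", ("130", "132", "134"), "122", "126"),
    Path("106", ("140", "142"), "124", "126"),
]


def logical_units_via_port(paths: List[Path], port_id: str) -> List[str]:
    """Sketch of block 406: logical units reached through the notified port ID."""
    # The port ID may be a host port ID (local host) or a storage device
    # port ID (remote host), so both ends of each path are checked.
    return sorted({p.logical_unit for p in paths
                   if port_id in (p.host_port, p.storage_port)})
```

With this table, host port 104 yields logical units 116 and 126, and storage device port 122 yields logical unit 126, matching the two illustrations above.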

[0036] At block 408, the host determines whether a redundant path (i.e., a redundant SAN fabric) is available for the host to access the logical unit identified at block 406 without using the rebooting switch. To perform block 408, the host can refer to the multipathing module to identify a routing path between the host and the logical unit that does not route data through the rebooting switch. More particularly, in some implementations, the host may perform block 408 by calculating a "number of remote active paths" by subtracting a "number of paths to the logical unit via the switch" from a "number of total active paths between the host and the logical unit." Both the "number of paths to the logical unit via the switch" and the "number of total active paths between the host and the logical unit" may be determined from the multipathing module. The calculated "number of remote active paths" being one or greater indicates that at least one redundant path is available for the host to access the logical unit. An illustration of the foregoing calculation will now be described for the case where the switch 130 of FIG. 1 has sent a reboot notification to the host 102 and the host 102 has identified that logical units 116 and 126 can be accessed via the host port 104. For the logical unit 116, the host 102 may determine from the multipathing module that the "number of total active paths between the host and the logical unit" is one (i.e., through a routing path including the host port 104, the switches 130 and 132, and the storage device port 112) and the "number of paths to the logical unit via the switch" also is one (i.e., also through a routing path including the host port 104, the switches 130 and 132, and the storage device port 112). Thus, for the logical unit 116, the host 102 may calculate the "number of remote active paths" to be zero, which indicates that the host 102 does not have a redundant path available to the logical unit 116. For the logical unit 126, the host 102 may determine from the multipathing module that the "number of total active paths between the host and the logical unit" is two (i.e., through a first routing path including the switches 130, 132, and 134, and through a second routing path including the switches 140 and 142) and the "number of paths to the logical unit via the switch" is one (i.e., the first routing path including the switches 130, 132, and 134). Thus, for the logical unit 126, the host 102 may calculate the "number of remote active paths" to be one, which indicates that the host 102 has a redundant path to the logical unit 126 (i.e., owing to the second routing path).
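The "number of remote active paths" calculation of block 408 can be sketched in Python as follows. The `Path` record and `PATHS` table are hypothetical and simply encode the FIG. 1 routing paths described above; how a real multipathing module stores paths is not specified here.

```python
from typing import List, NamedTuple, Tuple


class Path(NamedTuple):
    switches: Tuple[str, ...]  # switches traversed by this routing path
    logical_unit: str


# Hypothetical active paths known to host 102 (per the FIG. 1 illustration).
PATHS = [
    Path(("130", "132"), "116"),
    Path(("130", "132", "134"), "126"),
    Path(("140", "142"), "126"),
]


def remote_active_paths(paths: List[Path], logical_unit: str,
                        rebooting_switch: str) -> int:
    """Total active paths to the logical unit minus those via the rebooting switch."""
    to_lun = [p for p in paths if p.logical_unit == logical_unit]
    total_active = len(to_lun)
    via_switch = sum(1 for p in to_lun if rebooting_switch in p.switches)
    return total_active - via_switch


# Switch 130 rebooting: logical unit 116 has no redundant path, 126 has one.
assert remote_active_paths(PATHS, "116", "130") == 0
assert remote_active_paths(PATHS, "126", "130") == 1
```

A result of one or greater corresponds to at least one redundant path being available.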

[0037] At block 410, the host may send a response to the rebooting switch based on whether redundant paths are determined to be available at block 408. In some implementations, the host may send a binary approval or disapproval of the reboot of the switch. In some implementations, the host may send details that identify a logical unit, a storage device, and/or a storage device port to which the host cannot connect via a redundant path. For example, as illustrated above when the switch 130 is rebooting, the host 102 has a redundant path available to the logical unit 126 but does not have a redundant path available to the logical unit 116, and accordingly, the host 102 may send a reboot disapproval and/or identify the logical unit 116 to the switch 130 as not having a redundant path. As another example, when the switch 134 is rebooting, the host 102 has a redundant path available to the logical unit 126 as described above and thus may send a reboot approval to the switch 134. After block 410, method 400 may end at block 412.
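One way the response of block 410 might be assembled is sketched below in Python. The dictionary layout and the key names `approve` and `no_redundant_path` are assumptions made for illustration; the description leaves the response format open.

```python
from typing import Dict, List, Union


def build_host_response(
    remote_paths: Dict[str, int],
) -> Dict[str, Union[bool, List[str]]]:
    """Sketch of block 410.

    remote_paths maps each identified logical unit to its calculated
    "number of remote active paths" from block 408.
    """
    # Logical units with zero remote active paths have no redundant path.
    lacking = sorted(lun for lun, count in remote_paths.items() if count < 1)
    return {
        "approve": not lacking,        # binary approval or disapproval
        "no_redundant_path": lacking,  # optional detail for the switch
    }
```

For the switch 130 example above, passing `{"116": 0, "126": 1}` produces a disapproval that names logical unit 116.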

[0038] The functionality of a storage device (e.g., the storage device 110 or 120) will now be described with reference to method 500 of FIG. 5. In some implementations, method 500 may be performed by the storage device after a switch has performed block 212 of method 200 and before the switch performs block 214 of method 200. Method 500 begins at block 502. At block 504, the storage device may receive a reboot notification indicating that a switch of a SAN fabric (e.g., a fabric of SAN 100) intends to reboot. The reboot notification also may include an affected host port ID in the SAN fabric. For example, the received reboot notification may be the reboot notification sent by a switch at block 212 of method 200, and the affected host port ID may be the host port ID included in that reboot notification. As an illustration, the storage device 120 may receive a reboot notification from the switch 134 that includes the host port ID of the host port 104, because connectivity between the host 102 (from host port 104) and the storage device 120 (via storage device port 122) may be affected by a reboot of the switch 134. It should be understood that, in a SAN that interconnects multiple hosts to a storage device, the reboot notification received by that storage device may include a plurality of affected host port IDs for each host that may target or communicate with that storage device.

[0039] At block 506, the storage device may identify a logical unit mapped to the affected host port ID included in the reboot notification received at block 504. In some implementations, the storage device maintains a LUN-host mapping, which identifies which logical units can be accessed by which hosts and/or host ports in a SAN. The storage device can perform block 506 by referring to the LUN-host mapping. In some implementations where a plurality of affected host port IDs are included with the reboot notification, the storage device identifies logical unit(s) mapped to each of the host port IDs. To continue the preceding illustration, the storage device 120 may identify at block 506 that the logical unit 126 is mapped to the host port ID of the host port 104 included in a reboot notification from the rebooting switch 134.
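The lookup at block 506 can be sketched in Python as follows. Representing the LUN-host mapping as a dictionary from logical unit to a set of host port IDs is an assumption for illustration only; the entry shown reflects the storage device 120 example.

```python
from typing import Dict, List, Set

# Hypothetical LUN-host mapping held by storage device 120:
# logical unit -> host port IDs allowed to access it.
LUN_HOST_MAPPING: Dict[str, Set[str]] = {
    "126": {"104", "106"},
}


def logical_units_for_host_port(mapping: Dict[str, Set[str]],
                                host_port_id: str) -> List[str]:
    """Sketch of block 506: logical units mapped to the affected host port ID."""
    return sorted(lun for lun, ports in mapping.items() if host_port_id in ports)
```

Here the affected host port ID of host port 104 resolves to logical unit 126. When a notification carries several affected host port IDs, the same function could be called once per ID.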

[0040] At block 508, the storage device may determine (or identify) another host port ID that is on the same host as the affected host port ID and that is mapped to the logical unit identified at block 506. The another host port ID thus may be deemed to form part of a redundant fabric with respect to the host and the logical unit. The storage device may perform block 508 by referring again to the LUN-host mapping. If a plurality of affected host port IDs are included with the reboot notification, then the storage device may determine another host port ID for each affected host port ID included in the reboot notification. To illustrate block 508 (and continuing from the preceding illustration), the storage device 120 may determine from the LUN-host mapping that the logical unit 126 is associated with host port 104 and host port 106, which are both ports on the same host 102. Accordingly, the storage device 120 may determine the host port ID of the host port 106 to be the another host port ID at block 508.

[0041] If another host port ID is not identified at block 508 ("NO" at block 510), then operation passes to block 518, where the storage device sends a reboot disapproval to the rebooting switch and method 500 can end. If another host port ID is identified at block 508 ("YES" at block 510), then operation passes to block 512. At block 512, the storage device may ping the another host port ID determined at block 508 to verify if the another host port ID is reachable by the storage device. If a plurality of another host port IDs are determined at block 508 for a plurality of affected host port IDs included in the reboot notification, the storage device may ping each of the another host port IDs. In some implementations, if all of the another host port IDs are verified by ping to be reachable by the storage device ("YES" at block 514), then the storage device may send a reboot approval to the rebooting switch at block 516, and method 500 may end at 520. On the other hand, if at least one of the another host port IDs cannot be verified by ping to be reachable by the storage device ("NO" at block 514), then the storage device may send a reboot disapproval to the rebooting switch at block 518, and method 500 may end at 520. In some implementations, the reboot disapproval may also include information regarding which of the another host port IDs (and/or the corresponding host) is not reachable by the storage device.
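Blocks 508 through 518 can be sketched together in Python as follows. The data structures, the `port_to_host` lookup, and the injected `ping` callable are all hypothetical; in particular, `ping` stands in for whatever reachability check a storage device actually uses, and every affected host port ID is assumed to appear in `port_to_host`.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def storage_device_response(
    lun_map: Dict[str, Set[str]],     # logical unit -> mapped host port IDs
    port_to_host: Dict[str, str],     # host port ID -> owning host (assumed known)
    affected_ports: Iterable[str],    # affected host port IDs from the notification
    ping: Callable[[str], bool],      # hypothetical reachability check
) -> Tuple[str, List[str]]:
    alternates: Set[str] = set()
    for affected in affected_ports:
        host = port_to_host[affected]
        for lun, ports in lun_map.items():
            if affected not in ports:
                continue  # this logical unit is not mapped to the affected port
            # Block 508: other ports on the same host mapped to the same logical unit.
            others = {p for p in ports
                      if p != affected and port_to_host.get(p) == host}
            if not others:
                return "disapprove", []  # "NO" at block 510, then block 518
            alternates |= others
    # Blocks 512 and 514: ping every alternate host port ID.
    unreachable = sorted(p for p in alternates if not ping(p))
    if unreachable:
        return "disapprove", unreachable  # block 518, naming unreachable ports
    return "approve", []  # block 516
```

For example, with logical unit 126 mapped to host ports 104 and 106 on host 102, an affected port 104, and a `ping` that succeeds, the sketch returns an approval; if the ping to 106 fails, it returns a disapproval that names 106.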

[0042] FIG. 6 is a block diagram illustrating a host 600 that includes a machine-readable medium encoded with instructions to respond to a reboot notification according to an example implementation. In some implementations, the host 600 may be or may form part of the host 102 of FIG. 1, and may be useful for performing at least one of the steps of method 400 of FIG. 4. In some implementations, the host 600 can form part of a laptop computer, a desktop computer, a workstation, a server, a mobile phone, a tablet computing device, and/or other electronic device. In some implementations, the host 600 can include at least one host port and can be communicatively coupled through the host port to a SAN (e.g., the SAN 100 of FIG. 1) and its fabrics, and more particularly, to switches (e.g., the switches 130, 132, 134, 140, and/or 142), to storage devices (e.g., storage device 110 and/or storage device 120), and/or to logical units presented by the storage devices (e.g., logical unit 116 and/or logical unit 126).

[0043] In some implementations, the host 600 is a processor-based system and may include a processor 602 coupled to a machine-readable medium 604. The machine-readable medium 604 may be encoded with a set of executable instructions 606, 608, 610, 612, and 614. Additionally or alternatively, the processor 602 may include electronic circuitry for performing the functionality described herein, including the functionality of instructions 606, 608, 610, 612, and/or 614. In some implementations, the host 600 may include a multipathing module that manages SAN fabric information, such as the address of logical units with which the host 600 can communicate, and the routing paths available between the host 600 and those logical units. In some implementations, at least one of the instructions 606, 608, 610, 612, and/or 614 may be included in the multipathing module.

[0044] Instructions 606 may receive a reboot notification from a switch of a storage area network fabric indicating that the switch intends to reboot. The reboot notification may also include a port identifier (a port ID) affected by the reboot. If the host 600 is local to the switch, the port ID may be a host port ID in some implementations. If the host 600 is remote to the switch, the port ID may be a storage device port ID targeted by the host 600 in some implementations. In some implementations, instructions 606 may be useful for performing block 404 of method 400. Instructions 608 may identify a logical unit accessed by the host 600 via the port ID (e.g., the port ID included in the reboot notification received by instructions 606). In some implementations, instructions 608 may be useful for performing block 406 of method 400. Instructions 610 may determine whether a redundant path is available for the host 600 to access an identified logical unit (e.g., the logical unit identified by instructions 608) without using the switch that intends to reboot. In some implementations, instructions 610 may determine whether a redundant path is available by calculating a number of remote active paths by subtracting a number of paths to the logical unit via the rebooting switch from a number of total active paths between the host 600 and the logical unit. In the foregoing calculation, the number of remote active paths being one or greater indicates that at least one redundant path is available for the host 600 to access the logical unit. In some implementations, instructions 610 may be useful for performing block 408 of method 400. Instructions 612 may send a response to the rebooting switch based on whether redundant paths are determined to be available by, for example, instructions 610. In some implementations, instructions 612 may be useful for performing block 410 of method 400.

[0045] FIG. 7 is a block diagram illustrating a storage device 700 that includes a machine-readable medium encoded with instructions to receive a reboot notification. In some implementations, the storage device 700 may be or may form part of the storage devices 110, 120 of FIG. 1, and may be useful for performing at least one of the steps of method 500 of FIG. 5. In some implementations, the storage device 700 can form part of a laptop computer, a desktop computer, a workstation, a server, a mobile phone, a tablet computing device, and/or other electronic device. In some implementations, the storage device 700 may present a logical unit (e.g., the logical unit 116 of the storage device 110 or the logical unit 126 of the storage device 120). In some implementations, the storage device 700 can include at least one storage device port and can be communicatively coupled through the storage device port to a SAN (e.g., the SAN 100 of FIG. 1) and its fabrics, and more particularly, to switches (e.g., the switches 130, 132, 134, 140, and/or 142) and to host(s) (e.g., the host 102).

[0046] In some implementations, the storage device 700 may be a processor-based system and may include a processor 702 coupled to a machine-readable medium 704. The machine-readable medium 704 may be encoded with a set of executable instructions 706, 708, and 710. Additionally or alternatively, the processor 702 may include electronic circuitry for performing the functionality described herein, including the functionality of instructions 706, 708, and/or 710.

[0047] Instructions 706 may receive a reboot notification from a switch of a storage area network fabric. The reboot notification may indicate that the switch intends to reboot and may include an affected host port identifier (affected host port ID) in the SAN fabric. In some implementations, instructions 706 may be useful for performing block 504 of method 500. Instructions 708 may identify a logical unit mapped to the affected host port ID. In some implementations, instructions 708 may be useful for performing block 506 of method 500. Instructions 710 may determine another host port ID on the same host as the affected host port ID, where the another host port ID is mapped to the logical unit. The another host port ID may be deemed to form part of a redundant SAN fabric. In some implementations, instructions 710 may be useful for performing block 508 of method 500.

[0048] FIG. 8 is a block diagram illustrating a storage device 800 that includes a machine-readable medium encoded with instructions to send a reboot approval or to send a reboot disapproval. The storage device 800 may be analogous to the storage device 700 in many respects. As with the storage device 700, the storage device 800 may be a processor-based system and may include a processor 802 coupled to a machine-readable medium 804. The machine-readable medium 804 may be encoded with a set of executable instructions 806, 808, 810, and 812. Additionally or alternatively, the processor 802 may include electronic circuitry for performing the functionality described herein, including the functionality of instructions 806, 808, 810, and/or 812. The storage device 800 may be or may form part of the storage devices 110, 120, and may be useful for performing at least one of the steps of method 500.

[0049] Instructions 806 may ping another host port ID (e.g., the another host port ID determined by instructions 710 of storage device 700) to verify if the another host port ID is reachable by the storage device 800. In some implementations, instructions 806 may be useful for performing block 512 of method 500. Instructions 808 may send a reboot approval to the switch if each another host port ID of a plurality of another host port IDs associated with the storage device 800 is verified by ping to be reachable by the storage device 800. In some implementations, instructions 808 may be useful for performing block 514 and/or block 516 of method 500. Instructions 810 may send a reboot disapproval to the switch if at least one another host port ID of the plurality of another host port IDs cannot be verified by ping to be reachable by the storage device 800. In some implementations, instructions 810 may be useful for performing block 514 and/or block 518 of the method 500. Instructions 812 may send a reboot disapproval to the switch if the storage device 800 cannot identify another host port ID being on the same host as the affected host port ID and being mapped to the logical unit. In some implementations, instructions 812 may be useful for performing block 510 and/or block 518 of the method 500.

[0050] In view of the foregoing description, it can be appreciated that switches seeking to reboot can first coordinate with hosts, storage devices, and/or logical units connected to the switch in a SAN before rebooting, to determine if redundant SAN fabric is available for those hosts, storage devices, and/or logical units. Accordingly, such a system of switches, hosts, storage devices, and/or logical units may minimize disruptions and downtime due to unmitigated switch reboots. Users, such as IT professionals and network administrators, may thus be less concerned about hosts losing access to logical units, and vice versa, when a switch initiates a reboot. Moreover, because a switch may not reboot in some implementations unless redundant SAN fabric is available, system availability and uptime (e.g., as expressed by a "number of nines") may improve, which in turn may allow for competitive service level agreements. Additionally, by virtue of the foregoing description, users may be made aware of connections between hosts and logical units that do not have redundant SAN fabric, and users can thus act to establish redundant SAN fabric. Furthermore, by virtue of using a SAN protocol message (e.g., RSCN or Ethernet multicast messaging) to coordinate between the switch and other SAN components, the coordination may be implemented on at least some existing infrastructure at minimal additional expense.

[0051] In the foregoing description, numerous details are set forth to provide an understanding of the subject matter disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the following claims cover such modifications and variations.