Title:
A SYSTEM FOR RAPID REMOTE MANAGEMENT OF EQUIPMENT
Document Type and Number:
WIPO Patent Application WO/2006/047583
Kind Code:
A2
Abstract:
This disclosure relates to techniques for restoration/restarting/rebooting of computer system hardware, or the like, from a remote location. When a remote system (20) goes offline, it often takes its primary communication link offline along with the system failure. By providing an independent controller (20) at the remote site, which has an in-band and an out-of-band communication link, the failure can be reported to a distant surveillance system (37) despite the system failure. In addition, the controller can sequentially restart the devices of the remote system in the order and with the time spacing required to ensure that the system will operate at restart. Consequently, remote systems can both report and correct failures without on-site intervention.

Inventors:
SWANSON DAN (US)
Application Number:
PCT/US2005/038542
Publication Date:
May 04, 2006
Filing Date:
October 26, 2005
Assignee:
WICOM TECHNOLOGIES (US)
SWANSON DAN (US)
International Classes:
H01H35/00
Foreign References:
US20020161868A1
US20040059903A1
US20040190699A1
US20030023843A1
US20030233463A1
Attorney, Agent or Firm:
LASKY, Michael B. et al. (LLC6500 City West Parkway,Suite 10, Minneapolis MN, US)
Claims:

1. A method of managing remote computer equipment from a surveillance computer at one location and at least one remote device at a distant location and a controller at said at least one distant location, comprising the steps of: a. sensing a plurality of conditions proximate the remote device; b. transmitting the sensed conditions to the controller independently of the remote device; c. attempting to transmit sensed conditions from said controller to the surveillance computer via an available primary in-band network; d. determining if the transmission was successfully received by the surveillance computer and, if not, transmitting said sensed conditions on an available alternate out of band network.

2. The method of claim 1 further including the step of sensing whether the remote device is operational and if not, rebooting the device.

3. The method of claim 2 wherein said remote device includes a plurality of linked devices which are independently rebootable, and wherein the method further includes the steps of: a. determining an acceptable restarting sequence for rebooting each device such that it will interoperate with the other devices when all have been rebooted; b. establishing a predetermined wait time for each device after restarting before restarting the next device in sequence; c. issuing commands from the controller to restart the devices, in the acceptable sequence with the predetermined wait times inserted between each command.

4. The method of claim 3 further including the step of monitoring at least one device to determine if it has successfully restarted.

5. The method of claim 4 wherein said monitoring step includes transmitting a command to a device next in line after the at least one device and monitoring said device to determine if the command was received at said next device through said monitored device, thereby indicating that the monitored device must be functioning.

6. The method of claim 5 wherein said monitoring step includes pinging said next device in line and monitoring the response received.

7. The method of claim 4 wherein said monitoring includes determining if said device has successfully restarted by measurement of indirect environmental characteristics adjacent the device, but without direct contact with the device.

8. The method of claim 7 wherein said indirect determination includes sensing parameters indicative of a successful restart of the device.

9. The method of claim 8 wherein said parameters include measuring a parameter of the device which indicates a successful operation condition after reboot, storing said parameter in said remote controller, comparing said stored parameter to monitored conditions and if said parameter is achieved, rebooting the next device in line until all devices are restarted.

10. An intelligent controlled power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a remote controller for managing the power to said feeds; c. a data table containing the reboot time frame and reboot sequence of each device managed by said power management system; d. a program for powering down and then powering up each power feed and its connected device in the sequence and in the time frame derived from said data table;

so that said devices may be restarted in the order and time sequence necessary to permit interoperability of said devices.

11. The system of claim 10 wherein said data table includes specific time delays required for each device to complete its reboot.

12. The system of claim 11 wherein said specific time delays are derived from direct measurement of the reboot times of each device and are at least as long as such reboot times.

13. The system of claim 11 wherein said specific time delays are determined by using estimated reboot times for each particular device.

14. The system of claim 11 wherein said data table includes the maximum power consumption allowable during reboot and wherein the system further includes a sensor for measuring power consumption and wherein said reboot sequence is timed so that said sequence does not result in power consumption exceeding the maximum allowable.

15. An intelligent controlled power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a controller for managing the power to said feeds; c. a data table containing the reboot sequence of each device managed by said power management system; d. at least one sensor capable of determining the completion of the reboot of at least some of said devices; e. a program for powering down and then powering up each power feed and connected device in the sequence and in the time frame derived from said sensor; so that said devices may be restarted in the order and time sequence required to permit interoperability between said devices.

16. The system of claim 15 wherein said at least one sensor includes a sensor for detecting a quiescent steady state ready condition of a device after reboot.

17. The system of claim 15 wherein said sensor includes an optical sensor for detecting the presence of an indicator light on the sensed device.

18. The system of claim 15 wherein said at least one sensor includes a sensor capable of sensing the delivery of a data packet through the sensed device, thereby ascertaining the operability of said device.

19. An intelligent controlled power management controller for reliably reporting the status of a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a remote controller for managing the power to said feeds; c. at least one sensor capable of determining a characteristic which could affect the operational status of at least one of said devices; d. a primary in-band communication network connected to said controller for communicating status information to a distant surveillance system; e. a secondary out of band communications network, separate from and independent of said in-band network; f. a program configured to first send a status report of device operability to said surveillance system via said in-band network, and if an acknowledgement of receipt is not received from said surveillance system, then sending said status report via said out of band network.

20. The controller of claim 19 wherein said status report is generated in response to said sensor detecting a non-operational condition of a device.

21. The controller of claim 19 wherein said device is periodically tested and if a non-operational status of the device is detected, said report is generated.

22. The controller of claim 19 wherein said sensor periodically pings said device to determine its operational status.

23. The controller of claim 19 wherein said sensor determines the temperature of said device and said program compares said temperature to a predetermined maximum temperature and if said sensed temperature exceeds said maximum, an alarm report is issued to said surveillance system.

24. The controller of claim 19 wherein said sensor determines the temperature of environment proximate said device and said program compares said temperature to a predetermined maximum temperature and if said sensed temperature exceeds said maximum, an alarm report is issued to said surveillance system.

25. The controller of claim 19 wherein said sensor monitors the ramp rate of the temperature proximate said device and said program compares said temperature to a predetermined maximum ramp rate increase in temperature and if said sensed ramp rate exceeds said maximum, said controller is configured to issue an alarm report to said surveillance system.

26. The controller of claim 19 wherein said sensor determines the power consumption of said device and said program compares said consumption to a predetermined consumption and if said sensed consumption exceeds said predetermined, an alarm report is issued to said surveillance system.

27. The system of claim 26 wherein said power consumption is measured on a device having a hard drive subsystem and when said consumption exceeds a predetermined value, the controller turns off said drive subsystem and reports same to said surveillance system.

28. The system of claim 26 wherein said power consumption is measured on a device having a hard drive subsystem and when said consumption exceeds a predetermined value, the controller generates a backup copy of said drive subsystem, then turns off said drive subsystem and reports same to said surveillance system.

29. The controller of claim 19 wherein said sensor determines state of closure of a door and when an open door condition is found, an alarm report is issued to said surveillance system.

30. The controller of claim 19 wherein said sensor determines the power consumption of said device and said program compares said sensed consumption to a predetermined normal power consumption for said device and if said predetermined consumption is exceeded, said program will download at least a portion of the contents of said device to a remote backup separate therefrom.

31. The controller of claim 30 wherein said remote device is associated with said surveillance system.

32. A system of a plurality of chainable rebooting power strips capable of controlling line power to a plurality of power feeds comprising: a. a master controllable power strip including a controller for managing the power to said feeds; b. at least one slave controllable power feed strip connectable to said plurality of computer devices; c. at least one sensor connected to said controller for determining the operational status of at least one device powered by any of said strips; d. a program for powering down and then powering up each power feed and connected device determined to be non-operational; e. a communications link removably connectable between successive power strips for connecting said master power strip to at least one slave strip in a chain;

so that the controller via said program, can operate any power feed in any chained strip.

33. The system of claim 32 wherein said strips are supplied line power by connection of one power strip to a successive strip, and wherein the communication link between the strips includes communicating over the power line as communication carrier.

34. A method for restoring a frozen software application running over a wide area network in multiple disparate geographic locations, on a system having a main server and a plurality of remote systems, comprising the steps of: a. placing a known data record on the main server; b. from at least one of the remote systems, periodically querying the main server with respect to said record; c. if the main server does not reply with the expected response to the data record query issued by a remote system, remotely issuing a reboot command from the remote system to restart at least the application hosting the data record on the main server.

35. The method of claim 34 further including the steps of: a. placing a known data record on each of the remote systems; b. from any system, periodically querying the remote systems; c. if any of the remote systems does not reply with the expected record from a query issued, remotely issuing a reboot command to that remote system to restart at least the application hosting the record on that remote system.

36. A method of watchdog monitoring a plurality of interconnected computer devices to automatically maintain their operational state, comprising the steps of: a. periodically checking the operation of each of the devices by sending it a predetermined command and watching for the proper expected response; b. detecting an improper response in any of the monitored devices; c. determining a necessary sequence in which said devices must be restarted to interoperate correctly; d. restarting at least those devices which must be restarted in said sequence; e. reporting the detection of the improper response, and the restart, to a remote site.

37. The method of claim 36 wherein the reporting includes the step of attempting to transmit the report on a first in-band communications channel and if that fails, transmitting the report on a second out-of-band channel.

38. A method of managing remote computer equipment from a surveillance computer at one location and at least one remote device at a distant location and a controller at said at least one distant location, comprising the steps of: a. sensing a plurality of conditions proximate the remote device; b. transmitting the sensed conditions to the controller independently of the remote device; c. attempting to transmit sensed conditions from said controller to the surveillance computer via an available primary in-band network; d. determining if the transmission was successfully received by the surveillance computer and, if not, transmitting said sensed conditions on an available alternate out of band network.

39. A method of managing remote computer equipment from a first location to at least one remote device at a distant location comprising the steps of: a. sensing a condition concerning the remote device; b. transmitting the sensed conditions to the first location by means independent of the remote device; c. attempting to transmit the sensed condition via an available primary in-band network; d. determining if the transmission was successfully received at the first location and, if not, transmitting said sensed condition on an available alternate out of band network.

40. A power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a schedule of reboot time frames and a reboot sequence of each device managed by said power management system; c. means for powering down and then powering up each power feed and its connected device in the sequence and in the time frame derived from said data table; so that said devices may be restarted in the order and time sequence necessary to permit interoperability of said devices.

Description:

A SYSTEM FOR RAPID REMOTE MANAGEMENT OF EQUIPMENT

Technical Field

This disclosure relates to techniques for restoration/restarting/rebooting of computer system hardware, or the like, from a remote location.

Background of the Invention

Studies show that 94 percent of businesses suffering a catastrophic data loss due to a power failure, blackout, flood, earthquake, virus, or human error go out of business within two years. Of those companies, 43 percent are forced to close at the time of the event, while 51 percent continue to operate for another two years before closing.

We often condition ourselves into thinking the only thing that can be done with disasters is to create plans on how to recover as quickly as possible. There are whole industries built up around disaster recovery all the way from disaster planning consultants, to magazines, to newsletters, to hot backup sites.

Strategies range from simple agreements with other companies to use each other's equipment in case something happens to full blown hot backup site contracts with disaster plans that are practiced on a regular basis. The ultimate in contingency planning is a completely redundant self-healing design so that in the event of a disaster, the system rolls over to the backup system, seamlessly, without users even being aware of what happened.

The costs of designing, purchasing, operating and maintaining completely redundant sites are so high that most companies do not implement these designs.

Returning to what is possible within companies' tight budgets, the goal should be to be proactive: prevent disasters whenever possible and recover as rapidly as possible. The leading cause of data loss is power failure or power surge (45.3%). This has become common knowledge, and most people have purchased Uninterruptible Power Supplies (UPS) to keep their systems going when momentary power outages occur. What they fail to do is purchase the very expensive diesel generator backups that can sustain a long-term outage of electricity. So if the UPS is only good for 1 or 2 hours, they will be without power in less than 120 minutes.

If on the other hand you were notified when the power feeding the UPS had failed, you may be able to take some immediate steps to restore power quickly. Sometimes the cause for the outage can be a simple accounting mistake and a simple phone call to the utility company will get power restored right away.

The third highest cause of data loss is fire or explosion (8.2%). While most companies have some sort of burglar alarm, many fail to have smoke alarms or fire alarms, so things can burn up before anyone is aware of it. Even those with some sort of fire detection often only have those fire cone detectors that must melt in the heat of a fire before releasing the water or halon to extinguish the flames.

On the other hand, a simple temperature probe that would alarm when the temperature in cabinets started warming up could give extra minutes or hours before disaster actually strikes. Often just shutting down the equipment that is warming up is all that is needed to prevent a full blown, fire gutting disaster in your computer room.

The fifth most common cause of lost data is flood and water damage (6.7%). We immediately think of the massive flooding caused by hurricanes or torrential rain. Little can be done when that level of intensity occurs other than to have a contingency site geographically distant from it. But much flooding and water damage occurs when pipes leak or break. These pipes could be in the ceiling or walls, causing the water to accumulate at ever higher levels, eventually shorting out some piece of equipment or wiring, rendering the data center useless.

On the other hand if the water level was detected when it was not harmful and the appropriate people were dispatched, the water could be turned off or the ruptured pipe repaired before turning the whole mess into a first class disaster.

There are also certain conditions that, if they could be known when they occur, would allow some proactive action to be taken. These include knowing when the cover of an equipment box is opened, which could mean someone is tampering with or stealing equipment. Or the temperature may have exceeded the recommended operating temperature of the installed equipment, or the moisture level may have risen to the point of short-circuiting the equipment. The automatic detection and reporting of such conditions could be valuable, preventing damage or letting a responsible person know that action needs to be taken. In addition, there are some remedial actions which may forestall or perhaps even prevent greater disasters from occurring. To accomplish this, it would be necessary to know the precursor conditions remotely and have the ability to do something about them.

Possible solutions would be:

- Having a live technician constantly on the site

- Sending out a technician to manually cycle the power on a system

Neither of these is satisfactory in many situations because of cost or speed of response.

As you can see, given the right tools, sensors and alarm procedures, more can be done to prevent problems from becoming full-blown disasters through early detection and reporting of problems by direct or indirect measurements which could signal trouble.

Even having the capability to provide early warning of problems may not be enough. But what would happen if the disaster cuts off the pathway for early warning? Then the early warning is never transmitted and a full scale disaster may ensue.

The present invention addresses methodologies for early detection and reporting which overcome prior art solutions. The full scope of the invention can be determined from the claims, with specific exemplary embodiments detailed below.

Summary

The following summary is part of the overall disclosure but is not intended to replace a full reading of the entire disclosure. Likewise it is not a substitute for the claims which define the scope of the invention. It is a convenience to assist in understanding the invention.

There are many aspects to this invention.

For example, one aspect of the invention is a method of managing remote computer equipment from a surveillance computer at one location and at least one remote device at a distant location and a controller at said at least one distant location, comprising the steps of: a. sensing a plurality of conditions proximate the remote device; b. transmitting the sensed conditions to the controller independently of the remote device; c. attempting to transmit sensed conditions from said controller to the surveillance computer via an available primary in-band network; d. determining if the transmission was successfully received by the surveillance computer and, if not, transmitting said sensed conditions on an available alternate out of band network.

The term surveillance system or computer is used as a convenience and is to be broadly interpreted as some form of monitoring system, whether a computer or not. The term network is also a convenience for a broad concept intended to be any communication channel or link.
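By way of illustration only, the in-band attempt followed by the out-of-band fallback described above might be sketched as follows. The channel callables and the function name `report_conditions` are hypothetical and do not appear in the disclosure; each channel is assumed to return True only when the surveillance system acknowledges receipt.

```python
from typing import Callable

# Hypothetical channel type: sends a report and returns True only if the
# surveillance system acknowledged receipt. Not part of the disclosure.
Channel = Callable[[dict], bool]


def report_conditions(conditions: dict, in_band: Channel, out_of_band: Channel) -> str:
    """Try the primary in-band link first, then the out-of-band link."""
    for name, channel in (("in-band", in_band), ("out-of-band", out_of_band)):
        try:
            if channel(conditions):
                return name  # acknowledged on this link
        except OSError:
            # Assumption: a link error is treated like a missing acknowledgement.
            pass
    return "undelivered"
```

In such a sketch the out-of-band channel could be any independent link; the disclosure defines "network" broadly as any communication channel.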

A further aspect of the invention may include a method further including the step of sensing whether the remote device is operational and, if not, rebooting the device.

Note that restarting and rebooting are used interchangeably to mean reinitializing the device so that it is refreshed and ready to operate. This can be a soft boot, cold boot or similar.

A further aspect of the invention may include a method wherein the remote device includes a plurality of linked devices which are independently rebootable, and wherein the method further includes the steps of: a. determining an acceptable restarting sequence for rebooting each device such that it will interoperate with the other devices when all have been rebooted; b. establishing a predetermined wait time for each device after restarting before restarting the next device in sequence; c. issuing commands from the controller to restart the devices, in the acceptable sequence with the predetermined wait times inserted between each command.

The term interoperate or communicate is meant to have the broad meaning of being able to function or work together in the manner the device(s) is supposed to function when it is operating correctly, i.e. not locked or frozen.
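As a minimal sketch of the sequenced restart described above, assuming the acceptable sequence and wait times have already been determined, the controller logic might look like the following. The name `restart_in_sequence`, the `restart` callable and the example device names are hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple


def restart_in_sequence(
    plan: Sequence[Tuple[str, float]],
    restart: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart each device in the given order, waiting its predetermined
    time before the next restart command is issued."""
    restarted = []
    for device, wait_seconds in plan:
        restart(device)        # issue the restart command for this device
        restarted.append(device)
        sleep(wait_seconds)    # predetermined wait before the next device
    return restarted


# Hypothetical plan: a router must be up before the switch, and so on.
EXAMPLE_PLAN = [("router", 90.0), ("switch", 30.0), ("server", 120.0)]
```

Passing the `sleep` callable as a parameter is a design choice made here so the sequence can be exercised without real delays.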

A further aspect of the invention may include a method further including the step of monitoring at least one device to determine if it has successfully restarted.

Successfully restarting means that it will operate as intended, as opposed to being frozen or locked because of some fault which is normally curable with a restart.

A further aspect of the invention may include a method wherein said monitoring step includes transmitting a command to a device next in line after the at least one device and monitoring said device to determine if the command was received at said next device through said monitored device, thereby indicating that the monitored device must be functioning.

A further aspect of the invention may include a method wherein said monitoring step includes pinging said next device in line and monitoring the response received.

The term ping is to be broadly interpreted as querying the device by any means and receiving some form of response.
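The inference described above, that a reply from the next device in line shows the monitored device is passing traffic, might be sketched as follows. The function name, the `ping` callable and the assumption that the devices form a simple linear chain are illustrative only.

```python
from typing import Callable, List, Sequence


def devices_known_functioning(
    chain: Sequence[str], ping: Callable[[str], bool]
) -> List[str]:
    """Return the devices shown to be functioning because a device further
    down the chain answered a query that had to pass through them.

    Assumption: chain[0] is nearest the controller and each query to
    chain[i] travels through chain[0] .. chain[i-1].
    """
    farthest = 0
    for index in range(1, len(chain)):
        if ping(chain[index]):
            farthest = index  # everything before chain[index] relayed the query
    return list(chain[:farthest])
```

Consistent with the broad definition of ping given above, the `ping` callable here stands for any query that yields some form of response.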

A further aspect of the invention may include a method wherein said monitoring includes determining if said device has successfully restarted by measurement of indirect environmental characteristics adjacent the device, but without direct contact with the device.

The term indirect environmental characteristics is to be defined broadly, and many examples are provided in the detailed description. It also means that an aspect of the device can be measured and extrapolated to indirectly determine if it is functioning properly.

A further aspect of the invention may include a method wherein said indirect determination includes sensing parameters indicative of a successful restart of the device.

A further aspect of the invention may include a method wherein said parameters include measuring a parameter of the device which indicates a successful operation condition after reboot, storing said parameter in said remote controller, comparing said stored parameter to monitored conditions and if said parameter is achieved, rebooting the next device in line until all devices are restarted.

The term parameter is intended to be interpreted broadly as an aspect of the device which can be detected.

A further aspect of the invention may include an intelligent controlled power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a remote controller for managing the power to said feeds; c. a data table containing the reboot time frame and reboot sequence of each device managed by said power management system; d. a program for powering down and then powering up each power feed and its connected device in the sequence and in the time frame derived from said data table;

so that said devices may be restarted in the order and time sequence necessary to permit interoperability of said devices.

The term data table is not to be read literally as a "table" or tabular as such, but as a compilation of data in some readable form.

A further aspect of the invention may include a system wherein said data table includes specific time delays required for each device to complete its reboot.

It is understood that the time delay is not an absolute number but a time sufficient to know that the restart is complete. It can be longer of course.

A further aspect of the invention may include a system wherein said specific time delays are derived from direct measurement of the reboot times of each device and are at least as long as such reboot times.

A further aspect of the invention may include a system wherein said specific time delays are determined by using estimated reboot times for each particular device.

A further aspect of the invention may include a system wherein said data table includes the maximum power consumption allowable during reboot and wherein the system further includes a sensor for measuring power consumption and wherein said reboot sequence is timed so that said sequence does not result in power consumption exceeding the maximum allowable.

Power consumption is interpretable as any aspect of power, including voltage change or current fluctuation.
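One way the power-limited timing described above could work, offered only as a sketch, is to delay each power-up until the measured consumption leaves enough headroom for that device's reboot draw. The per-device reboot draw figures, the polling approach and all names below are assumptions not found in the disclosure.

```python
import time
from typing import Callable, List, Sequence, Tuple


def power_limited_restart(
    plan: Sequence[Tuple[str, float]],
    read_watts: Callable[[], float],
    power_up: Callable[[str], None],
    max_watts: float,
    wait: Callable[[float], None] = time.sleep,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> List[str]:
    """Power up devices in order, delaying each one until the sensed
    consumption plus its expected reboot draw stays within max_watts."""
    started = []
    for device, reboot_watts in plan:
        polls = 0
        while read_watts() + reboot_watts > max_watts:
            if polls >= max_polls:
                return started  # give up rather than exceed the limit
            wait(poll_seconds)
            polls += 1
        power_up(device)
        started.append(device)
    return started
```

The sensor reading is taken as watts here for simplicity, although the disclosure notes that power consumption may be interpreted as any aspect of power.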

A further aspect of the invention may include an intelligent controlled power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a controller for managing the power to said feeds; c. a data table containing the reboot sequence of each device managed by said power management system; d. at least one sensor capable of determining the completion of the reboot of at least some of said devices; e. a program for powering down and then powering up each power feed and connected device in the sequence and in the time frame derived from said sensor;

so that said devices may be restarted in the order and time sequence required to permit inter-operability between said devices.

Power feeds are to be interpreted broadly. A power feed is any means to supply power to a device. An electric socket is merely one such structure.

A further aspect of the invention may include a system wherein said at least one sensor includes a sensor for detecting a quiescent steady state ready condition of a device after reboot.

Quiescent is similar to a steady state condition, preferably when no event is occurring on the device which changes the readings.
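As an illustrative sketch of detecting such a quiescent condition, a controller could treat a device as ready once its most recent sensor readings stay within a small band. The window size, the tolerance and the function name are assumptions chosen for the example.

```python
from typing import Sequence


def is_quiescent(samples: Sequence[float], window: int = 5, tolerance: float = 1.0) -> bool:
    """Return True when the last `window` readings vary by no more than
    `tolerance`, suggesting a steady state after reboot."""
    if len(samples) < window:
        return False  # not enough readings yet to judge
    recent = samples[-window:]
    return max(recent) - min(recent) <= tolerance
```

For example, a power reading that spikes during reboot and then settles would fail this test during the spike and pass it once the readings level off.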

A further aspect of the invention may include a system wherein said sensor includes an optical sensor for detecting the presence of an indicator light on the sensed device.

The term light is to be considered broadly, not merely as an illumination source but as a radiation source which is detectable externally.

A further aspect of the invention may include a system wherein said at least one sensor includes a sensor capable of sensing the delivery of a data packet through the sensed device, thereby ascertaining the operability of said device.

A data packet is a term of convenience, but should be read broadly to include any message or information which can be delivered to the device.

A further aspect of the invention may include an intelligent controlled power management controller for reliably reporting the status of a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a remote controller for managing the power to said feeds; c. at least one sensor capable of determining a characteristic which could affect the operational status of at least one of said devices; d. a primary in-band communication network connected to said controller for communicating status information to a distant surveillance system; e. a secondary out of band communications network, separate from and independent of said in-band network; f. a program configured to first send a status report of device operability to said surveillance system via said in-band network, and if an acknowledgement of receipt is not received from said surveillance system, then sending said status report via said out of band network.

A further aspect of the invention may include a controller wherein said status report is generated in response to said sensor detecting a non-operational condition of a device.

A non-operational condition should be interpreted broadly to mean any condition where the device is not able to function as intended.

A further aspect of the invention may include a controller wherein said device is periodically tested and if a non-operational status of the device is detected, said report is generated.

A further aspect of the invention may include a controller wherein said sensor periodically pings said device to determine its operational status.

A further aspect of the invention may include a controller wherein said sensor determines the temperature of said device and said program compares said temperature to a predetermined maximum temperature and if said sensed temperature exceeds said maximum, an alarm report is issued to said surveillance system.

A further aspect of the invention may include a controller wherein said sensor determines the temperature of the environment proximate said device and said program compares said temperature to a predetermined maximum temperature and if said sensed temperature exceeds said maximum, an alarm report is issued to said surveillance system.

A further aspect of the invention may include a controller wherein said sensor monitors the ramp rate of the temperature proximate said device and said program compares said ramp rate to a predetermined maximum ramp rate of temperature increase and if said sensed ramp rate exceeds said maximum, said controller is configured to issue an alarm report to said surveillance system.
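The absolute and ramp-rate temperature checks described in the preceding aspects can be illustrated with a short sketch. The sample format (seconds, degrees), the function names and the units (degrees per minute) are assumptions made for the example and are not taken from the disclosure.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in seconds, temperature in degrees)


def max_ramp_rate(samples: List[Sample]) -> float:
    """Largest temperature rise rate, in degrees per minute, between consecutive samples."""
    rates = [
        (t1 - t0) / ((s1 - s0) / 60.0)
        for (s0, t0), (s1, t1) in zip(samples, samples[1:])
        if s1 > s0  # skip duplicate or out-of-order timestamps
    ]
    return max(rates, default=0.0)


def check_temperature(samples: List[Sample], max_temp: float, max_ramp: float) -> List[str]:
    """Return the list of alarm names raised by the latest samples."""
    alarms = []
    if samples and samples[-1][1] > max_temp:
        alarms.append("over-temperature")
    if max_ramp_rate(samples) > max_ramp:
        alarms.append("ramp-rate")
    return alarms
```

With samples of 20, 21 and 25 degrees taken one minute apart, the largest ramp is 4 degrees per minute, so a 2 degree per minute limit raises the ramp-rate alarm even though a 30 degree absolute limit has not been reached.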

A further aspect of the invention may include a controller wherein said sensor determines the power consumption of said device and said program compares said consumption to a predetermined consumption and if said sensed consumption exceeds said predetermined, an alarm report is issued to said surveillance system.

A further aspect of the invention may include a system wherein said power consumption is measured on a device having a hard drive subsystem and when said consumption exceeds a predetermined value, the controller turns off said drive subsystem and reports same to said surveillance system.

A further aspect of the invention may include a system wherein said power consumption is measured on a device having a hard drive subsystem and when said consumption exceeds a predetermined value, the controller generates a backup copy of said drive subsystem, then turns off said drive subsystem and reports same to said surveillance system.

The term hard drive is to be interpreted broadly as a storage device of any kind.

A further aspect of the invention may include a controller wherein said sensor determines the state of closure of a door and when an open door condition is found, an alarm report is issued to said surveillance system.

A further aspect of the invention may include a controller wherein said sensor determines the power consumption of said device and said program compares said consumption to a predetermined normal power consumption for said device and if said normal consumption is exceeded, said program will download at least a portion of the contents of said device to a remote backup separate therefrom.

A further aspect of the invention may include a controller wherein said remote device is associated with said surveillance system.

A further aspect of the invention may include a system of a plurality of chainable rebooting power strips capable of controlling line power to a plurality of power feeds comprising: a. a master controllable power strip including a controller for managing the power to said feeds; b. at least one slave controllable power feed strip connectable to said plurality of computer devices; c. at least one sensor connected to said controller for determining the operational status of at least one device powered by any of said strips; d. a program for powering down and then powering up each power feed and connected device determined to be non-operational; e. a communications link removably connectable between successive power strips for connecting said master power strip to at least one slave strip in a chain; so that the controller, via said program, can operate any power feed in any chained strip.

The term strip is merely one of convenience and means any physical form which can contain multiple power feeds. A strip is a typical form factor, but not a limitation thereof.

A further aspect of the invention may include a system wherein said strips are supplied line power by connection of one power strip to a successive strip, and wherein the communication link between the strips includes communicating over the power line as the communication carrier.

A further aspect of the invention may include a method for restoring a frozen software application running over a wide area network in multiple disparate geographic locations, on a system having a main server and a plurality of remote systems, comprising the steps of: a. placing a known data record on the main server; b. from at least one of the remote systems, periodically querying the main server with respect to said record; c. if the main server does not reply with the expected response to the data record query issued by a remote system, remotely issuing a reboot command from the remote system to restart at least the application hosting the data record on the main server.

A further aspect of the invention may include a method further including the steps of: a. placing a known data record on each of the remote systems; b. from any system, periodically querying the remote systems; c. if any of the remote systems does not reply with the expected record to a query issued, remotely issuing a reboot command to that remote system to restart at least the application hosting the record on that remote system.
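The known-record check in the two methods above might look like the following sketch. The `query` and `reboot` callables are hypothetical placeholders for the real database query and the real reboot command, neither of which is specified in this disclosure.

```python
from typing import Callable


def check_known_record(query: Callable[[], str],
                       expected: str,
                       reboot: Callable[[], None]) -> bool:
    """Query the planted record once; issue a reboot if the reply is missing or wrong.

    Returns True when the expected record came back, False when a reboot was issued.
    """
    try:
        reply = query()
    except OSError:
        # No reply at all (timeout, refused connection) counts as a wrong reply.
        reply = None
    if reply != expected:
        reboot()
        return False
    return True
```

A caller would run this on a timer, for example once a minute, so that a frozen application is noticed during quiet hours rather than by the first user query of the morning.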

A further aspect of the invention may include a method of watchdog monitoring a plurality of interconnected computer devices to automatically maintain their operational state, comprising the steps of: a. periodically checking the operation of each of the devices by sending it a predetermined command and watching for the proper expected response; b. detecting an improper response in any of the monitored devices; c. determining a necessary sequence in which said devices must be restarted to interoperate correctly; d. restarting at least those devices which must be restarted in said sequence; e. reporting the detection of the improper response, and the restart, to a remote site.

Suffice it to say that a watchdog system is a colloquial term meaning surveillance or supervision and will not likely include 4 pawed animals.

A further aspect of the invention may include a method wherein the reporting includes the step of attempting to transmit the report on a first in-band communications channel and if that fails, transmitting the report on a second out-of-band channel.

A further aspect of the invention may include a method of managing remote computer equipment from a surveillance computer at one location and at least one remote device at a distant location and a controller at said at least one distant location, comprising the steps of: a. sensing a plurality of conditions proximate the remote device; b. transmitting the sensed conditions to the controller independently of the remote device; c. attempting to transmit sensed conditions from said controller to the surveillance computer via an available primary in-band network; d. determining if the transmission was successfully received by the surveillance computer and, if not, transmitting said sensed conditions on an available alternate out-of-band network.

A further aspect of the invention may include a method of managing remote computer equipment from a first location to at least one remote device at a distant location comprising the steps of: a. sensing a condition concerning the remote device; b. transmitting the sensed condition to the first location by means independent of the remote device; c. attempting to transmit the sensed condition via an available primary in-band network; d. determining if the transmission was successfully received at the first location and, if not, transmitting said sensed condition on an available alternate out-of-band network.

A further aspect of the invention may include a power management system for controlling a remote system having a plurality of independently rebootable devices comprising: a. a plurality of controllable power feeds connected to said plurality of devices; b. a schedule of reboot time frames and a reboot sequence of each device managed by said power management system; c. means for powering down and then powering up each power feed and its connected device in the sequence and in the time frame derived from said data table; so that said devices may be restarted in the order and time sequence necessary to permit interoperability of said devices.
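A table-driven power cycle of the kind described in this aspect can be sketched as below. The row layout (feed number, off time, settle time) and the `set_power` callable are assumptions chosen for illustration; the disclosure does not specify the format of the schedule.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical schedule rows: (feed number, seconds to hold power off,
# seconds to wait after power-up before touching the next feed).
Schedule = List[Tuple[int, float, float]]


def run_reboot_schedule(schedule: Schedule,
                        set_power: Callable[[int, bool], None],
                        sleep: Callable[[float], None] = time.sleep) -> None:
    """Power-cycle each feed in table order, honoring its time frame."""
    for feed, off_seconds, settle_seconds in schedule:
        set_power(feed, False)   # power down the feed
        sleep(off_seconds)       # hold it off long enough to reset the device
        set_power(feed, True)    # power it back up
        sleep(settle_seconds)    # give the device its allotted boot time
```

For example, a schedule of `[(1, 5, 120), (2, 5, 60)]` would cycle feed 1, wait two minutes for the device on it to come up, and only then cycle feed 2.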

Brief Description of the Figures

The appended figures are for purposes of illustrating embodiments of the invention and together with the written text, form the invention disclosure:

Figure 1 is a schematic view of one embodiment of the invention;

Figure 1a is a schematic view of the remote location in Fig. 2;

Figure 2 is a series of waveform traces from LEDs;

Figure 3 is a schematic view of another aspect of the invention;

Figure 4 is a view of an out-of-band communication path;

Figure 5 is a block diagram view of an embodiment with a complete controller at a remote site;

Figure 6 is a block diagram view of a controller in card form;

Figure 7 is a block diagram view of a radio board for an embodiment of a controller;

Figure 8 is a block diagram view of a serial card for an embodiment of a controller;

Figure 9 is a block diagram view of a relay board for an embodiment of a controller;

Figure 10 is a representation of one embodiment of a controller in power strip form factor;

Figure 11 is a representation of another embodiment of a controller in stacked power strip form factor; and

Figure 12 is a representation of a daisy chain of power strips with a master and slave controllers.

Detailed Description

Detection of risk factors and a rapid recovery system to get computers, systems, equipment and communications gear back online as quickly as possible are achieved by: 1) use of direct sensing, 2) use of indirect sensing, 3) use of in-band and out-of-band signaling to transmit sensed conditions and 4) restart procedures designed to ensure that multi-layered systems start in the correct sequence to work properly.

This means that using a remote control signaling system (which we refer to as in-band) may allow for remote restart/recovery, but often the cause of computer failure will cause the in-band connection to be lost as well. For example, if in-band is an internet connection, a flood or A/C failure would likely cause that connection to be lost also. Thus a secondary connection, which we call out-of-band, is necessary for a truly secure communications linkage.

For purposes of this disclosure, in-band and out-of-band (or out-band, i.e. independent of the method of transmission for in-band, including a secondary method identical to the in-band, but not on the same circuits, wires, cables, wireless link, satellite, etc.) refer to alternate communications channels, preferably wholly independent channels which would not be affected by the failure of one relative to the other. The most preferred in-band could be the internet via cable, DSL, T-1, fiber, etc., and the out-of-band could therefore be wireless, satellite, POTS (dialup), etc. Other mixtures are possible, such as DSL in-band and cable modem out-of-band, since they tend not to be on the same "last mile" circuits (the last few meters from the utility pole/connection to the building), but may be commingled farther downstream. The concept is to assume that one of the systems (in-band) is cheaper and/or faster and/or more easily available, but that it may be susceptible to failure for the same reasons as the systems which are being monitored, such as fire, water, collapse, etc. Thus the out-of-band system may be slower and/or more expensive, but it is not needed for long periods of time and does not require heavy bandwidth, because its primary purpose is to restart equipment or report conditions or both, and perhaps even to restart the in-band system.

Restart sequencing (or sequential booting) capability is vitally important. Many systems cannot be started in the wrong order or they will not link properly. The problem may be that existing systems do not have outputs which give an indication of their actual reboot/restart state.

There are several reasons it is important to sequentially reboot systems. One is simple power consumption. If one rebooted all of the equipment on a circuit at once, it would likely blow the circuit breaker. The reason for this is that upon startup, equipment will often consume 5 to 10 times as much current as in steady state operation. This is due to charging up capacitors in power supplies, spinning up disk drives, starting fans, etc.
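The effect of staggering on peak current can be shown with a little arithmetic. The sketch below assumes, purely for illustration, that each device briefly draws its steady-state current multiplied by a single inrush factor and that a staggered start waits for each device to settle before starting the next; the function names and the example figures (three 2 A devices, a factor of 5) are hypothetical.

```python
from typing import Sequence


def simultaneous_peak(steady_amps: Sequence[float], inrush_factor: float) -> float:
    """Peak draw if every device starts at the same instant."""
    return sum(amps * inrush_factor for amps in steady_amps)


def staggered_peak(steady_amps: Sequence[float], inrush_factor: float) -> float:
    """Peak draw if each device starts only after the previous ones have settled."""
    peak = 0.0
    running = 0.0  # steady-state draw of devices that are already up
    for amps in steady_amps:
        peak = max(peak, running + amps * inrush_factor)
        running += amps
    return peak
```

For three devices that each draw 2 A in steady state with an inrush factor of 5, starting all at once peaks at 30 A, while starting them one at a time peaks at 14 A, which is the difference between tripping and not tripping a 15 A breaker.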

Another important reason for a sequential reboot is that the order in which certain equipment comes up is very important. For instance, if you are using the DHCP function of a router, where it gives devices a valid IP address upon startup, and a device looking for a valid IP address doesn't find it in time, the device will time out and the machine will become inaccessible, as most types of equipment don't retry indefinitely.

The present invention provides multiple ways of determining how to properly time the reboot sequence, though more are possible within the spirit of this invention. The simplest is to set (program) a fixed delay between devices. These delays can be customized by port, so you can allow the router, for example, 2 minutes to reboot before starting the next device, which might be a database server. For that device one would want to provide more time because of the more complex operating system. While this will work, it is not the optimal timing, and care has to be taken to allow for systems slowing down: it is not unusual for a Windows server to take longer and longer to boot up as more and more applications are added to the system.

So in addition to a specific time delay by device we can do the following:
a. Have a sensor that measures the various blinking patterns or ready light patterns so that when the right LED is lit, the next device can be rebooted. It is necessary to either know (from documentation) or observe the LED sequences to be able to determine which sequences mean what.
b. Many devices will allow users to add functionality to the startup sequence, such as adding commands to a batch file used for startup. In this case a specific application can send a packet over IP to the controller indicating that it is ready for operation and it is OK to reboot the next device in line (i.e. that this device is now stable).
c. The controller can make inquiries of the device; once it gets the right response from the device it knows to reboot the next device in line. This response could be anything from a database lookup response on a database server to a status check on a router. These inquiries can be over a serial port but preferably over a TCP/IP connection. A router can be checked by pinging a device on the other side of the router; when that device becomes available, the routing functions are there and it will be safe to boot the next device.
d. If the device doesn't respond within a certain amount of time, the controller can be programmed to recycle power one more time just in case the reboot didn't go quite well.
e. If after several tries a system doesn't come up and function properly, an alarm notification describing which device failed can be sent to the controller, which will send it over in-band if available and over out-of-band if not.
f. The current going through the power socket can be measured to see if it has reached its quiescent current. Once the current has been stable on a device for more than X seconds, the device has usually settled down and is awaiting instructions.
g. Not only does the system reboot in the right order, with minimal timing, but it also can do this across controllers at different locations. The server can synchronize events like sequential rebooting across the world. So if rebooting the main router in the home city must happen before rebooting the systems in various other cities, it can be synchronized anywhere.
h. The reboot request can be sent through any of the communication links that a controller is connected to, including TCP/IP. So if a device can be rebooted without cycling power, the signal can be sent from the controller or server directly to the device.
i. If a system has the power recycled and still doesn't work, it means that it wasn't simply frozen but has more systemic issues. The system can report this condition as well so a service technician knows what system or part to bring for repair/replacement.
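Items c, d and e above (wait for a readiness response, power-cycle again on timeout, raise an alarm after several tries) can be combined into one loop, sketched below. All the callables (`power_cycle`, `is_ready`, `send_alarm`) and the default timings are hypothetical; a real controller would plug in its own LED sensor, ping or application query as the readiness test.

```python
import time
from typing import Callable, Sequence


def sequential_reboot(devices: Sequence[str],
                      power_cycle: Callable[[str], None],
                      is_ready: Callable[[str], bool],
                      send_alarm: Callable[[str], None],
                      timeout: float = 120.0,
                      poll: float = 5.0,
                      max_tries: int = 3,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Reboot devices in order, moving on only when each one reports ready."""
    for device in devices:
        for _attempt in range(max_tries):
            power_cycle(device)
            deadline = clock() + timeout
            while clock() < deadline:
                if is_ready(device):
                    break            # device is up
                sleep(poll)
            else:
                continue             # timed out: power-cycle it again
            break                    # ready: go on to the next device
        else:
            send_alarm(device)       # every try failed: report and stop
            return False
    return True
```

Stopping at the first device that never comes up is a design choice made for this sketch; it avoids booting downstream devices that depend on the failed one.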

If a system is still functioning enough that a resident program can be communicated with, the resident program can reboot the computer or application based upon a signal from the controller or server.

Therefore it is important to provide the option of indirect sensing of conditions in addition to direct sensing. Heat/water/humidity/fire/smoke/temp etc. are largely measurable directly, though sometimes it may have to be by indirect sensing. For example, the heat of a server room will tell one if the A/C is running properly but not if one server is overheated. A temp sensor adjacent to the output fan of a server, once calibrated for normal temp, will tell if one machine is laboring, and indirectly, perhaps that the entire server room is too warm (possible A/C failure). The sensor can sense absolute/actual temperature or ramp rate, where changes in temp are more important than actual temp.

Current draw sensing is usually an indirect method of determining such things as bad bearings on mechanical systems (fans, drives) and will permit early warning of future problems (seized bearings, fires).

While much of this specification is directed toward detection of failure conditions, of equal importance is prevention and prediction of failures. This is more difficult to achieve, but this invention has the unique ability to do so.

As mentioned above, sensing of current draw of a system can indicate many things. For example, if the controller's sensors monitor a server's current draw over a long period of time and store that data (preferably locally), it will be known when a system is drawing ever increasing amounts of current even in a quiescent state. This will likely indicate that a mechanical parts failure is looming. In the case of a hard drive, wear on the bearing will increase current draw in a quiescent state over time, so sampling and storing data will provide critical information which can be used to detect and predict failures. A failing power supply will show quiescent and maximum current draws increasing over time.

With this information, the remote (or surveillance) system can issue a warning to the system manager to take action.

Even more importantly, the remote or surveillance system can be programmed to preemptively shut down a system likely to fail to save the data from total destruction (hard drive crash or fire for example.)

In the preferred embodiment, measuring current at the controller power strip socket (see Figure 10) would provide a plug and play (PNP) solution. The system would be configured to monitor current draw from that socket, create a data table of current vs. time and calculate changes (deltas) for quiescent and peak values. A predetermined trigger value would be created over a short time as the do-not-exceed current and if exceeded, an alarm condition would be reported and/or action taken to shut down the system. In addition, the system could be configured to instruct the failing computer to download its data on the in-band or out-of-band link to a backup location/server before shutdown. This would allow a WAN to have central redundancy without having to co-locate spare equipment at every location.
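The quiescent and peak deltas and the do-not-exceed trigger described above could be computed along the following lines. This is a sketch under stated assumptions: the median of a sample window is used as a stand-in for the quiescent current, and the `drift_limit` warning threshold and the function names are hypothetical additions, not part of the disclosure.

```python
from statistics import median
from typing import Sequence, Tuple


def current_deltas(baseline: Sequence[float], recent: Sequence[float]) -> Tuple[float, float]:
    """Return (quiescent delta, peak delta) in amps between two sample windows.

    The median of each window approximates the quiescent current.
    """
    quiescent_delta = median(recent) - median(baseline)
    peak_delta = max(recent) - max(baseline)
    return quiescent_delta, peak_delta


def evaluate_current(baseline: Sequence[float], recent: Sequence[float],
                     do_not_exceed: float, drift_limit: float) -> str:
    """Classify the recent window as 'alarm', 'warning' or 'ok'."""
    quiescent_delta, _peak_delta = current_deltas(baseline, recent)
    if max(recent) > do_not_exceed:
        return "alarm"      # trigger value exceeded: report and/or shut down
    if quiescent_delta > drift_limit:
        return "warning"    # quiescent draw is creeping upward
    return "ok"
```

A "warning" result corresponds to the early prediction case, where the system manager is told to act before the do-not-exceed current is ever reached.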

Prevention can also sometimes be accomplished without human on-site intervention. For example, if a sensor detects an overheat condition, whether from a failed A/C unit or an overheating computer, the remote system or the surveillance system could activate a room fan or open bypass ductwork to the outside air (if cooler outside according to temp sensing or meteorological reports). A fire condition might be quenched by Halon gas release. Other methods of automatic prevention are possible with the sensing and remote control capabilities of the system.

Detection of faults is a further problem solved by this invention.

Fault tolerant systems are known, but they are often fooled by tests like PING because that part of the code is still working while the application has really failed.

So rather than wait for a user to discover the problem, the present invention reports the problem and then takes corrective action. It can also detect the fault automatically, with either automatic or manual corrective action taken as soon as the fault is detected, thus achieving virtual fault tolerance.

The system tests TCP/IP based devices at an application level and then reports those systems that don't respond. The controller can either automatically take corrective action or send a notification to someone who will.

Several examples of application testing can be:
a. Sending a TCP/IP request to a device on another network and checking the results. If the expected results were obtained, the router is still functioning; if the expected results were not returned, it could mean either the server or the remote system was having a problem, but now a network administrator can be notified and take appropriate action.
b. Sending a request for a web page to a web server. By checking for some unique content on a specific page, the server can be checked for proper functionality.
c. Sending a request for a certain database lookup to a database server; checking the results against the expected results will determine if it is working properly.
d. Most TCP/IP devices can be tested in a similar way consistent with this invention.
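Example b above, requesting a page and checking it for known unique content, can be sketched with the Python standard library. The function names and the marker-string approach are assumptions made for this illustration; `fetch` is injectable so the check can be exercised without a network.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def web_server_ok(url: str, marker: str,
                  fetch: Callable[[str], str] = fetch_page) -> bool:
    """True if the page could be fetched and contains the expected unique content."""
    try:
        return marker in fetch(url)
    except OSError:
        # Connection failures, timeouts and HTTP errors all count as "not working".
        return False
```

Unlike a plain ping, this check fails when the web server process answers on the network but serves an error page or the wrong content.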

While this functionality is available from public systems, having a system outside the firewall testing systems inside the firewall would violate security policies. The present system is preferably a "trusted" system and doesn't require expensive SNMP management software packages; it provides a unique solution for secure data prevention, detection and rapid recovery.

Another purpose of current draw sensing is billing. If billing is partly based on energy consumption and a server farm has exceeded its allowed power usage, a report to the billing entity can be obtained in real time and without visiting the site. In fact, the potential for this system to be applied to billing and other revenue enhancement purposes is equally as important as its disaster prevention/recovery usage.

An audible sensor can also be useful, especially when programmed to "hear", i.e. respond to, certain patterns. A UPS (uninterruptible power supply) will contain its own audible alarm. If it is impossible to interface with the UPS ports directly, a known alarm "beep" sequence can be detected and reported as a UPS alarm, perhaps preventing a shutdown within the backup time.

The reboot state of a computer (or worse yet, of a sub component) is harder to measure directly, but if there is a "ready" or status LED/light, it may be enough to know that the light is illuminated/flashing/etc. to be certain that that part of the reboot sequence is done and the system may proceed to the next device in the chain. Another approach to sequencing is to "ping" devices as they come on line to know that they work. If a router, switch and server must be rebooted in that order, it is possible to ping the router, and later the server. The switch may be too unintelligent to interrogate, but because the router is up and the server is up, the switch must be OK. The initial programming of this system requires not only that the sensor be properly placed, but also that tests be run to find the appropriate delay timing for each device.

Another effective method of detecting whether a component is working correctly, and of knowing whether the reboot process is complete, is to attempt to communicate with the next successive device further down the chain, where the signal must pass through the suspect (or rebooted) device.

For example, in the case of a router or a firewall, a packet of data can be sent to the controller/management computer 37 (see figure 1) through firewall 35 or a router (not shown). If the controller/management computer 37 sends back a proper anticipated response, then it is clear that the router is working. If not, either the router is not working, the reboot sequencing was wrong, or not enough time was given to the router to finish its reboot.

The system can be rebooted/resequenced or just try again a bit later. This is all programmable.

Another form of resequencing which may be critical is in wide area networks (WANs) or other geographically disparate systems.

For example, if a database server located in one city goes down, the remote terminals or local intranets in other cities which depend on the data server will likewise be down, but they may not know it until a new query is sent. During quiet periods, such as the early morning from 2am to 5am, some systems are not queried, yet they are dead. If this were known, it would be possible to fix them during quiet times instead of waiting for the morning "data rush hour" to reveal the problem.

Likewise if the remote sites are down, they cannot know when the main data server will be up again and thus may unnecessarily reboot multiple times in the hope that the main server has now been restored. Anyone who has lost connection with an exchange email server will understand the frustration of this cat and mouse game of "is the server restored yet?"

Since the controller/management server at the remote location is independent of the local terminals or servers, it can repeatedly ping the main (for example data) server to determine if it is running properly or "up" and 1) if it finds that it is down, it can report that information to the surveillance server and 2) when it finds the main server restored, it can restart the process at its local system or (soft or hard) reboot the local system, whichever seems better according to programmed choices or direction from the surveillance server. A "ping" is an inquiry to the system. There are many types, from just looking for the IP address of the target server to running a query on the target computer's program (such as a database query) and receiving a known reply. This could be done by loading/planting a record on the main server and querying that record for the expected/known answer. This would make the controller's or surveillance server's task (either would work) easier because the status of the main server would be objectively determined.

The same pinging method can be done by the surveillance computer of the remote controllers to insure that they are live and working properly.

The traditional approach to redundancy is to design, purchase and maintain complete redundancy in all systems and communications networks.

So a typical installation would have redundant routers, redundant data links, redundant database servers, redundant web servers, etc. These are very expensive to design with because of the challenge of making them work in a coordinated effort. Without special software and protocols, having two servers doesn't make them take over each other's functions in case one fails. So it is more complex to create software for, it is more expensive to maintain, it is more expensive to repair. It is easily more than twice as expensive as a non redundant system.

The typical, well executed and well maintained redundant system can get to 5 or fewer minutes of downtime per year.

A seminal discovery of one further aspect of this invention is that most failures are not hardware based. In recent years, hardware elements, especially non-moving parts, have become so reliable that the type of redundancy which is truly cost effective is software redundancy, or the need to keep the software running.

A non-redundant system will typically be down for 30-50 hours per year, but as noted, these failures are generally not hardware but systems freezing. In the 80s the MTBF for a disk drive, for instance, was 2,000 hours or about 3 months of 7/24 use. Today, these are often in the range of 500,000 hours or more than 50 years of 7/24 use. But we still experience systems freezing and the need to reboot systems.
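The downtime and MTBF figures quoted above can be put on a common footing with some simple arithmetic. The sketch below assumes a 365-day year of continuous (7/24) operation; the function names are illustrative only. By this arithmetic, 2,000 hours of continuous use is roughly 83 days and 500,000 hours is roughly 57 years.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation


def availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year a system is up, given its yearly downtime in hours."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF in hours of continuous use to years."""
    return mtbf_hours / HOURS_PER_YEAR


def mtbf_days(mtbf_hours: float) -> float:
    """Convert an MTBF in hours of continuous use to days."""
    return mtbf_hours / 24.0
```

On the same basis, 5 minutes of downtime per year corresponds to about 99.999% availability, while 30 to 50 hours per year corresponds to roughly 99.4% to 99.7%.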

So instead of being down for 3 hours at a time, which is the average time to repair considering detecting the problem, isolating the problem and then dispatching a knowledgeable person to correct the problem, the key to faster recovery is early detection. In the present invention it can be anticipated that the mean time to repair is less than five minutes and often closer to 2 minutes for monitored software.

This is accomplished as set forth herein by a plurality of approaches to monitor the state of the software, not just the hardware.

One such approach to ensuring that systems are running is to use a watchdog methodology.

In addition to pinging entire systems (which must be done remotely), it is possible to have the processors tested locally on a periodic basis.

The remote controller may include a watchdog timer subsystem which pings or "thumps" its own (or another) microprocessor. Remember that the remote controller is only valuable if it is known to be working, but in the rare case that it itself is not running, there is a need to reboot the remote controller. The watchdog timer subsystem pings and then expects a certain response. If it doesn't get one, it will issue a reboot, and then when rebooted the remote computer can report to the surveillance system that an extraordinary event has occurred.
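A software analogue of the watchdog timer described above is sketched below. In practice a watchdog of this kind is usually a hardware timer; the class here, with its hypothetical `kick` and `poll` methods and injected `reboot` callable, only illustrates the timing logic.

```python
import time
from typing import Callable


class WatchdogTimer:
    """Issue a reboot if the monitored processor stops responding in time."""

    def __init__(self, timeout: float, reboot: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.reboot = reboot
        self.clock = clock
        self.last_kick = clock()

    def kick(self) -> None:
        """Called whenever the monitored processor answers a ping correctly."""
        self.last_kick = self.clock()

    def poll(self) -> bool:
        """Check the timer; reboot and return True if the response is overdue."""
        if self.clock() - self.last_kick > self.timeout:
            self.reboot()
            self.last_kick = self.clock()  # restart the count after the reboot
            return True
        return False
```

The `reboot` callable is also a natural place to queue the report of an extraordinary event that the controller sends once it is running again.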

While the advantages of remote management are so great that they should be seen as essential elements, their very nature makes terrorist attack on such systems a national security issue even for non-military applications. For example, electric power distribution systems must be running 24/7. Oil and gas fields and their distribution networks must not be shut down even for a short time because restart is so slow.

Therefore complete end-to-end encryption is advisable to prevent malicious spoofing or transmission of bogus information which would result in a shutdown when no emergency condition actually existed. Many known systems exist, for example SSL over IP, Triple DES over satellite, or proprietary systems.

This sequence programming can be done on site or remotely, possibly by trial and error.

In the preferred embodiment the invention could include:

- A system which remotely cycles the power automatically once the control sequence is received via out-of-band or in-band communications - including wireless and wired communications
o The system can remotely reboot equipment from any distance through several different means:
o The likely in-band (though it could be out-of-band) signaling method is over the Ethernet connection - SNMP or Telnet
o The likely out-of-band (though it could be in-band) signaling method is wireless (f. ex. Aeris or Cellemetry networks), satellite such as Iridium, or modem over the POTS network (plain old telephone system)

- The system can sense many environmental and computer conditions, establish and measure threshold information and announce the condition via in-band or out-of-band signal, such as that the temperature is too high, the access door is open, moisture levels exceed recommended levels, etc.
o In one embodiment sensors can use 4-20 mA loop technology. Other sensors, such as digital sensors, independent sensors via wireless (short range) communications or even wired communications, can be used.
o In any event, if a predetermined condition occurs, such as a switch closure (indicating a monitored event has occurred), a temperature reaching a certain threshold, or a value occurring that is inside or outside an acceptable range, an alarm will be sent, or an action will be taken as a result. If the in-band signaling is available the alarm can be sent over it, and if in-band is not functioning, the system will transmit on the out-of-band signaling communications links.

- To ascertain certain triggering conditions, the system can remotely read the information from a piece of equipment via, for example, a serial port (or any other available port) and report that data via wireless link to the remote management system. It may even be necessary to monitor LEDs (status lights) on the equipment if no port is available or if such ports may lock up or fail before transmitting the critical data. For example, the LED for hard drive access may be an indirect indicator of a frozen system. If the LED is constantly on or off despite known events (such as backup) which would require disk access, or shows long term flickering indicative of "churning", the remote sensor can be programmed to indicate a possible fault condition requiring reboot. Of course careful study of the system's peculiarities is needed to prevent false reports. The monitoring can be optical, IR or similar, reading either digitized analog information or genuine analog information such as PSI, gas flow, fluid volume (gallons/liters) processed, etc.
o In addition to reading information from the sensors directly, it is possible to interface to other pieces of equipment with digital links, with Ethernet, or with serial ports, thus reading their information and reporting it directly or when it reaches certain thresholds. Thresholds may be determined by measuring normal conditions and simulating abnormal ones and setting thresholds therebetween.

- With the threshold condition known, remote control can then be provided via:

o Digital outputs

o Relay outputs

o A/C power plug outputs (for example a power strip, for power cycling)

o Serial output

o 4-20 mA loop output

o Likewise, this can be accomplished by other communications links such as Bluetooth and USB.

- The system can remotely read analog values via A/D converter o A multi-channel A/D converter can read any analog value coming across our input leads.

- Complete end to end security can be provided - data transmitted to device, data received from the device and data back and forth from the server is preferably all encrypted from end to end so that an unauthorized third party may not hack into the system and cause a reboot or worse. o The control system can be secure from the time the data is carried from the user to the control server via SSL encryption, to the encryption and encapsulation of the security from the control server to the remote unit. All messages going back and forth are preferably completely encrypted and secure.

An important alternate embodiment is the ability to daisy chain multiple units together. Controllers can be daisy chained to control more power outlets, add more analog or digital inputs and outputs. This enables more and more systems to be synchronized whether for sequential reboot, or individual control of units by their managers.

The daisy chain communications can occur between units via:

a. RS485 using Modbus or any other droplink protocol

b. RS232 links

c. Ethernet links

d. Bluetooth

e. Zigbee

f. Even communication over the 120 V power lines

g. Satellite

h. Cellular

i. Other communications links

Typical daisy chaining is done locally, by any of the above links or similar. On the other hand, it is possible to daisy chain on top of the AC lines. See Figure 12. Therefore, one could simply plug multiple controller strips together (plugging one into the next) and the communication link protocol (such as RS485) can be carried over the AC line voltage, meaning no separate cabling is needed. It is possible to have master and slave controllers as shown in Figure 12, or to have a master controller and dumb power feeds as slaves with no independent controller function, just the ability to receive commands from the master.

o In environments where there is a heavy concentration of equipment, such as a corporate data center or a server farm, it is vital that the power sequencing include more than just the circuits available in one box. By daisy chaining from one master to another and allowing masters to talk to each other, one can sequence dozens, even hundreds, of computers so that they come up (restart/reboot) in the order that proper functioning requires. Daisy chaining provides a simpler configuration, since the chaining of remote management systems provides part of the sequencing and each remote management system box is able to report its (current and reboot) status both in and out of band. As an alternative to true daisy chaining, the system can provide a lookup table of which systems must boot in what order; the systems can then communicate with each other, or pass tokens to each other, to signify that they have restarted in the required sequence.

o The preferred protocol for the daisy chain bus is RS485 with the Modbus protocol. This enables the system to interoperate with 10,000s of other pieces of equipment. The data from these other pieces of equipment can then be used to set thresholds and alarms that notify users when parameters are out of the desired range. It also allows for the control of much more sophisticated equipment, trying a wide variety of recovery methods prior to recycling power.

o Modbus allows us to talk to other daisy chained boxes of our own, to receive inputs from 1000s of other pieces of equipment and to control 1000s of other pieces of equipment, all with the idea of collecting and transmitting data but, most of all, creating an even higher reliability environment to prevent the loss of data and the destruction of equipment.

- In a preferred embodiment the system is configured as a power-strip-like device with one or more remotely controlled power sockets for plugging in computers and network equipment. The advantage of the power strip configuration is that it is plug and play compatible and interchangeable with unintelligent (dumb) power strips. See Figure 10. Alternative configurations are possible within the scope of this invention and include stacking of elements in a slide-in box. See Figure 11.

o For example, one implementation calls for 15 amp isolation relays controlling the power to 120/220 VAC circuits.

- The system can power down and power up in a unique sequence depending on the requirements of the user's application. This is mandatory for many pieces of equipment, since the power-up order is critical to obtaining IP addresses from routers or to finding a database server before hosting sites.

o If all systems are powered up at the same time, the system that can reinitialize fastest will start to look around to make sure the other network components it needs in order to function properly are present. Even though the other pieces of equipment may actually be there, and may actually have power, they may not have initialized completely when the first system needs them. As a result, when everything finally comes up and runs, the system as a whole may not work: the web server has no idea where the database server is, and neither one has an IP address because the DHCP server in the router wasn't fully initialized when the two servers came up, so they can't talk to anyone. So in one embodiment there are a plurality of power outlets in the box, each with programmable sequencing. In a simple installation, the user simply plugs in the components in the order of reboot and sets the timing therebetween or uses the default delay. One configuration allows the master to talk to other masters in the daisy chain, and the masters to control their slaves, thereby orchestrating a massive number of units to come up in the right sequence. By assigning sequencing to the masters, the slaves will come on line in accordance with the programming or defaults of their master. This would allow for a massive installation in short order.

o One of the significant benefits of this design is that it eliminates operator errors and reduces troubleshooting requirements; it also means one doesn't need as many trained operators. A simple click of the mouse (programming) takes care of the proper sequencing of power in hundreds of time-sensitive pieces of equipment.
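A programmable power sequence of this kind can be pictured as an ordered list of steps, each naming a box in the daisy chain, a socket, the desired state, and the delay before the next step. The sketch below is illustrative only: `Step`, `run_sequence`, and the `switch` callback are hypothetical names, and the actual relay control is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Step:
    box: int        # unit in the daisy chain (0 = master in this sketch)
    socket: int     # power socket on that unit, e.g. 1..8
    on: bool        # True = apply power, False = remove power
    delay_s: float  # time to wait after this step before the next one


def run_sequence(
    steps: Iterable[Step],
    switch: Callable[[int, int, bool], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Apply each step in order, waiting the configured delay after it."""
    for step in steps:
        switch(step.box, step.socket, step.on)
        sleep(step.delay_s)
```

For example, a router on socket 3 could be given 60 seconds before the database server on socket 1 is powered, with the web server on a slave unit coming up last. Passing a different `sleep` makes the sequence easy to dry-run without waiting.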

In Figure 1, an overview of one embodiment is shown. The remote site 20 contains a plurality of sensors and devices to be sensed. The in-band link is not shown, but it is any of a number of commonly available connections such as Ethernet to cable modem, T-1, DSL, etc.

The out-of-band link is illustrated by bolt 22, which links by long range wireless 24, satellite or other out-of-band system to a link, in this case internet 26, and to the surveillance or management server 28, which monitors all reported activity in or out-of-band. This server could have data tables on known conditions and be able to compare them to abnormal conditions. In such a case raw data from the sensors would be sent to this server, and decisions about abnormal conditions would be made at the surveillance server 28. A system which requires less bandwidth (which may be relevant to remote regions where data transmission is very costly) locates a server in the remote location 20, which performs the data table lookups locally and reports only the abnormal condition, not the raw data which supported that determination. Of course, the condition which caused the failure of the monitored equipment may also cause the failure of the remotely located server.

Because humans often must ultimately make the decision of what action to take in view of data reports, or at least be notified of actions automatically taken by the surveillance system, a report can be sent, preferably over the internet, though out-of-band is also possible, to a notification server 30. One type would be a voice response server which sends a voice message over POTS/PSTN circuits 32 to a cell network 34, a POTS phone 36 to a phone 38, which verbally advises the manager of the condition and possible response. A configuration computer 37 is linkable to the remote site 20, and particularly to its controller (see Figure 1a), to allow local setup, though the setup can also be accomplished remotely by the surveillance server or any other computer on the internet.

Figure 1a is a view of some of the components which make up the remote site 20. The controller 40 may also contain the power control feeds/strips 47 (integrated or connectable), which are more fully described in Figures 5-11.

The controller 40 may be a computer or a basic device which can control the power feed to the various other remote devices 41 (web server), 42 (database server), 43 (router) or any number of other devices, which may have to interoperate/communicate with each other, often in a sequential fashion (for example router to web server to database server). Thus rebooting/restarting of these devices must be done in a proper sequence in order to ensure that one is ready before the next one is restarted. Controller 40 therefore may contain a software program (or ROM or other memorized instruction) to execute the sequential restarting without the above-mentioned conflicts. This programming can be done by testing each component for its minimum reboot time and inputting at least that minimum time between devices; by knowing the average time to reboot for such a device in general and using that as a bare minimum; or by observing (sensing) some characteristic of the device, directly or indirectly, which will identify when the restart process is complete and the device has reached a quiescent steady state indicative of a completed restart.
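The two timing strategies described here, a fixed minimum reboot time and sensing that the device has finished restarting, can be combined. The following sketch waits at least a minimum time and then polls a readiness check until a timeout. The names `wait_until_ready` and `is_ready` are hypothetical, and how readiness is sensed (a ping, an LED state, a serial banner) is left to the caller.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    min_wait_s: float,
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once min_wait_s has passed and is_ready() reports True.

    Return False if timeout_s elapses first, so the caller can raise an
    alarm instead of starting the next device in the sequence.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if elapsed >= min_wait_s and is_ready():
            return True
        if elapsed >= timeout_s:
            return False
        sleep(poll_s)
```

The controller would call this between restarting one device and the next, for example after powering the router and before powering the web server.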

The controller 40 may also monitor by sensors 45 and 46 (as examples a door open, and temperature) to report an alarm condition (door open as a possible break in and temp as a hazardous condition). There are a multitude of sensors possible which can be connected to the controller 40 which can function to cause a restart through control of the power feeds or to send a report over the in or out band channels to a surveillance system.

A condition which shuts down the in-band circuit does not interfere with communication and action via an out-of-band circuit all the way to the human manager.

Figure 4 illustrates a partial view of a typical system. The EFM (electronic flow meter) 400 is monitored by sensors leading to the MCU (master control unit) 402, which relays the information, usually out-of-band, to satellite 404 and ground station 406. The signals are forwarded by, for example, the internet 407 to a surveillance server 408 and to a SCADA (Supervisory Control and Data Acquisition) server.

Figure 5 is a block diagram of a master control unit 402, which has a satellite radio 502 and an interface 504 to the processor 506 and software. Power is supplied through management circuits 508 and conditioning 510, and sensor inputs (analog or digital) 512 are provided. In a very remote location where only satellite is available, the in-band signal must be satellite, but this is not preferred.

Figures 6, 7, 8 and 9 respectively show the preferred design with separate boards, namely, the MPU board 600, the Radio board 700, serial board 800 (for serial input sensors) and relay board 900 for systems requiring isolation.

Figure 10 illustrates a plug and play power strip version which contains the above-mentioned elements (with MPU board 600 including analog inputs for appropriate sensors).

Figure 11 illustrates a unit like Figure 10 except that the boards are racked or stacked to create a different footprint.

Preferred system overview

• The main system board should function for the master, slave and single port unit

• The system should be flexible in design to allow for the most anticipated future requirements (such as having the Ethernet hardware in place, so that just the software needs to be developed to make the system Ethernet enabled).

• The system should be designed for reliability at every stage o Using components that have a high MTBF o Using part sizing so heat is not a problem o Using cooling techniques to minimize hot spots

• The system should be designed to be easy to use and require little or no customer support

Power control can be provided by:

• The system must be able to control the following number of 115 VAC (or other local voltage) outlets o Master - 8 - outlets o Slave - 8 - outlets

The power distribution PCBs can have: o Relay drivers to enable the relays to be on for an indefinite period - so there must be adequate cooling for the relay drivers o Spark arresting capacitors and resistors on the output side of the relay

• It is highly preferable that the relays be Normally Closed relays so that a power failure signals a status change. Likewise, it is preferable that all circuits be designed so that a power failure condition signals a change, whether or not a discrete relay is used.

• The circuits should power up without cycling power on the relays

• If a watchdog (supervision system) timer resets the control processor, the power should not cycle on the relays during any portion of the self test and coming back online

One preferred radio system (out-of-band) is the Skybility CMM 8920 radio

The preferred package construction can be a 1 U high in a 19" rack mounted configuration for the 8001 and 8002 with: o Front panel

Master

• Label each power LED with the socket number, 1...8 starting with one on the left. The overall section will be labeled "Power Sockets"

• Ethernet port - "Ethernet Port"

• Serial port - "Command Port"

• Modem port - "Modem Port"

• Sensor inputs - "Sensor Inputs" - label each 1..8

• Digital outputs - "Digital Outputs"

• Daisy chain port - "Daisy Chain Port"

Slave

• Label each power LED with the socket number, 1...8 starting with one on the left. The overall section will be labeled "Power Sockets"

• Serial port - "Command Port"

• Sensor inputs - "Sensor Inputs" - label each 1..8

• Digital outputs - "Digital Outputs"

• Daisy chain port - "Daisy Chain Port In"

• Daisy chain port - "Daisy Chain Port Out"

Single port unit could look like this:

• Label the power LED with "Power Socket"

• Ethernet port - "Ethernet Port"

• Serial port - "Command Port"

• Modem port - "Modem Port"

• Sensor inputs - "Sensor Inputs" - label each 1..8

• Digital outputs - "Digital Outputs"

• Daisy chain port - "Daisy Chain Port"

• The system packaging should be the smallest sized box that will fit all of the parts and provide adequate cooling. This should include optional mounting brackets so it can be mounted flat or hung from the end (see figure X)

This invention envisions an unlimited number and variety of sensors, including those not yet invented or not yet practical for use, since they will be capable of enhancing this inventive concept without deviation therefrom.

The following are merely exemplary:

• Configuration o Default - each port can have a default sensor type associated with it

Port 1 = Door opening (i.e. Is a monitored door left open)

Port 2 = Temperature o Each sensor input port can be configured to support any 4-20 ma loop sensor - 2 wire version

• Type of inputs o 4-20 ma loop - this can handle the "digital" styled inputs such as switch closure as well as analog values. Eases the powering of sensors as well as transmitting results over long distances. It is also very low cost to add to each sensor. o Analog -or 4-20 ma loop instead o Digital -Serial -Digital Outputs


• Type of outputs o Opto isolated o Application must supply its own power

• Idiot proof connection o For example: DB 9 male - opposite of connector used on modems so no confusion will take place.

Preferred Ethernet Connection could be as follows:

• The MAC address of each Ethernet controller is unique. Add a check digit to the MAC address and that will serve as our serial number. By this method, an existing serialized component can provide the necessary unique identification without creating new serial number IDs. The likelihood of a duplicate MAC address with a check digit is extremely remote.

• This will also have the ability to do application level checking of servers, routers, and also any device that can be pinged.
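The idea of deriving a serial number from the MAC address plus a check digit can be sketched as below. The text does not say which check-digit scheme is intended, so this example assumes a simple one: the sum of the twelve hexadecimal digits modulo 16, appended as a thirteenth hex digit. The function names are hypothetical. A simple sum catches single-digit errors but not transposed digits; a stronger scheme could be substituted.

```python
HEX_DIGITS = "0123456789ABCDEF"


def serial_from_mac(mac: str) -> str:
    """Build a 13-character serial: 12 MAC hex digits plus one check digit.

    The check digit (sum of hex digit values modulo 16) is an assumed
    scheme chosen for illustration only.
    """
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("expected a 48-bit MAC address in hex")
    check = sum(int(c, 16) for c in digits) % 16
    return digits + HEX_DIGITS[check]


def is_valid_serial(serial: str) -> bool:
    """Recompute the check digit and compare it with the one supplied."""
    try:
        return len(serial) == 13 and serial_from_mac(serial[:12]) == serial.upper()
    except ValueError:
        return False
```

Under this assumed scheme, the MAC address `00:1A:2B:3C:4D:5E` yields the serial `001A2B3C4D5EB`.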

Preferred Daisy Chaining could be as follows:

• A master could control any number of slave units

• The daisy chaining signal should be capable of reaching at least 50 feet/15 m

• Serial port styled signaling

• Standards o Using RS485 protocol o RJ45 connectors

• Baud rate - 9600 baud

• Ports can be used to : o Download new versions of the software o Configure software o Set parameters o Read and display existing configuration o Perform self test o Turn on and off ports manually

The preferred communication / Modem Port can be used to: o Same functionality as radio - commands, parameters, etc.

o Also same functionality as serial port o Download new versions of the software o Configure software o Set parameters o Read and display existing configuration o Perform self test o Turn on and off ports manually

The system may have visual/audible indicators:

• Power Socket LEDs o These LEDs are on if the socket has 115 VAC

• Main Power LED o This LED shows that there is power to the unit, it will come from the output of the power supply

• Signal Strength LED o This LED will show the strength of the power signal o It may stay solid red when the system powers up o This LED may be used to convey complex information about the power conditions. An exemplary waveform can be found in Figure 2. For example: if it barely has a signal, it may pulse ¼ second on, 2 seconds off, for 1 beat (see Figure 2, no. 50); if it is at full strength, it will pulse ½ second on, ¼ second off, for 5 beats (52). In-between strengths will be 2, 3 or 4 pulses (54, 56, 58) followed by a 2 second pause.

The following are preferred sensor configurations: Interface types:

Preferred 4-20ma loop on all sensors

4 ma = off

20 ma = on

Analog signals will be between 4 and 20 ma
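Reading a 4-20 mA loop in software amounts to a linear mapping from current to an engineering value, plus an on/off decision for "digital" use. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the 12 mA midpoint used to separate off from on is an assumed choice, since the text gives only the 4 mA and 20 mA endpoints.

```python
def loop_to_value(current_ma: float, low: float, high: float) -> float:
    """Linearly map 4 mA to `low` and 20 mA to `high`.

    A current outside 4-20 mA is treated as a wiring or sensor fault.
    """
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError("loop current outside 4-20 mA, possible fault")
    return low + (current_ma - 4.0) / 16.0 * (high - low)


def loop_to_state(current_ma: float, midpoint_ma: float = 12.0) -> bool:
    """Interpret the loop as a "digital" input: True means on.

    The 12 mA midpoint is an assumption made for this sketch.
    """
    return current_ma >= midpoint_ma
```

For a temperature sensor spanning -40 to 60 degrees, a 12 mA reading maps to 10 degrees. One practical benefit of the 4 mA "live zero" is that a broken wire reads 0 mA and is rejected here as a fault rather than mistaken for an off state.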

Sensor type (examples) : o Temperature sensor

o Water level sensor o External power sensor o Door opening sensor o Humidity sensors o Smoke alarm interface o Motion sensor o Sound - glass breakage o Air flow o Other

The preferred software configuration includes: System updating:

• The system will be able to replace itself with new updated code

• It may be capable of accepting the new code via: o Ethernet connection o Command (Serial) port - primary method o Modem Port - same capability as Command Port o Wireless data - when the network is capable of so much data

• The master may be able to download updates to each slave

To insure that the system itself is alive and well, it may contain a watch dog/supervision control:

• The system needs to check itself periodically. If it doesn't respond, it should reboot itself. This is known as supervision and allows the user to know that the system is working even though there is no alarm condition. It is possible to set up one computer system as a dummy to generate known alarm conditions as a way to periodically test the function of the system. Other solutions are possible, such as an array of dummy sensors (inputs) which are set to predetermined values and serve as calibration reports, much as a color bar test pattern provides color and hue information for television monitors.
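The supervision idea can be illustrated with a software watchdog: the main loop "kicks" the timer while it is healthy, and a periodic check triggers a reset if the kicks stop. This is a sketch only; the class and method names are hypothetical, and a real controller would more likely rely on a hardware watchdog timer in the processor.

```python
import time
from typing import Callable


class Watchdog:
    """Calls on_expire if kick() has not been called within timeout_s."""

    def __init__(
        self,
        timeout_s: float,
        on_expire: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._on_expire = on_expire
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Called by the main loop to show that it is still responding."""
        self._last_kick = self._clock()

    def check(self) -> bool:
        """Called periodically; returns True if the watchdog fired."""
        if self._clock() - self._last_kick > self._timeout_s:
            self._on_expire()
            self._last_kick = self._clock()  # restart the timing window
            return True
        return False
```

In use, `on_expire` would be the routine that reboots the control processor.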

A typical daisy chain arrangement could work as follows:

• Master o Send commands to each slave o Receive data from each slave o Send downloaded program update o Send downloaded parameter updates

• Slave o Receive commands from master o Receive commands relayed from upline relay slave o Receive downloaded program update o Receive downloaded parameter updates o Relay messages to next slave o Relay messages from next slave

Typical power control outputs could be as follows:

• Turn power on until further notice

• Turn power off until further notice

• Turn power on for xx seconds and then turn off

• Turn power off for xx seconds and then turn on

• Turn off power in multiple sockets with the following sequence and timing between removing socket power

• Turn on power in multiple sockets with the following sequence and timing between applying socket power

• Combination of power down and power up sequences

• Follow power sequence based upon input conditions - this would result in an intelligent sequencing which takes real-time conditions into consideration. An example would be to watch LEDs on certain devices and, by knowing their characteristic response in working and fault conditions, make decisions on what sequence to start and when.

Inputs can be monitored in many ways, such as:

• Each port can be configured as a "digital" port (4 ma = off, 20 ma = on)

• Each port can be configured as an "analog" port (analog data between 4 and 20 ma)

• Commands

o If "digital" port goes from off to on, send signal to host (the host being either a local server or the remote surveillance server connected by in/out band)

o If "digital" port goes from on to off, send signal to host

o If "digital" port pulses on for 2 seconds, send signal to host

o If "digital" port pulses off for 2 seconds, send signal to host

o If "digital" port goes from off to on, execute power sequence X

o If "digital" port goes from on to off, execute power sequence Y

o If "analog" port is above a certain threshold, send signal to host

o If "analog" port is below a certain threshold, send signal to host

o If "analog" port is above a certain threshold, execute power sequence X

o If "analog" port is below a certain threshold, execute power sequence Y

A self-test procedure may also be incorporated:

• Upon power up

• If it fails - send command to user

The likely in-band software for the Ethernet control may use:

• TCP/IP

• SNMP

• Application testing, so the system can check a preselected web page and look at the results to see if it is working. If not, the system can send a warning to the server - application not working. This solution provides unique control, since one can design the web page for the particular testing purposes.

The likely out-of-band will be radio which can:

• Initialize radio configuration and execute Skybility ® (or other system) commands

System level commands may include:

• These are the high level commands to and from the host - these can be communicated over the in-band signaling on the Ethernet port or via the out-of-band signaling via wireless radio network (such as Aeris®)

o From host

Execute the power control command

Execute the digital output command

Execute a self test and report back the results

Request the status of all inputs "analog" and "digital"

Request the value of an individual port

Request the status of all sockets

Request the version numbers (both of the Surveillance server and of the radio)

Request radio signal strength

Update power up sequence (ports and timing)

Update power down sequence (ports and timing)

Configure input monitoring requirements

Disable device - stolen (device is being used by an unauthorized party - render it inoperative)

The ability to disable ports is useful to allow the management system to continue to operate when some sensors are not reporting or are damaged or defective. It is possible that equipment was legitimately removed from a site (or stolen), in which case the management system would constantly report a fault condition. Bypassing this condition, so as not to disturb the human monitor with inaccurate reports, allows for less frequent on-site inspection of the remote site.

o To host

Send the results of each of the above commands

Send the status of all inputs

Send the status of specific input

Send the status of all sockets

Send notification that an "input trigger" has occurred along with related information / data

Additional security is achieved by :

• Software made sensitive to the radio's version number in case there are different command sequences required. This will provide a multi-level security control

• Software may be set to minimize the number of packets sent, since each has a cost associated with it. An intelligent version of the system can be programmed to minimize out-of-band costs. In very remote locations, satellite radio may even be the in-band system and full reporting may be very expensive. Adjusting the packet size or other data transmission parameters to dynamically recognize the band cost may therefore become important.

• Software can be set to use automatic rollover of communications channels (in/out band) when the cheapest link fails. Typically, the cheapest link is the in-band one, so when multiple out-of-band solutions are available to the remote controller, it will look to a data table located at the controller (preferably downloaded periodically from the Surveillance server as pricing changes).

• Out-of-band prioritization filtering: Alarm mode data (a sensor reporting a fault) is minimal in size, but bulk data collection (such as the long-term status of devices at a remote location, or the operation of non-computer equipment such as pumps, transformers, relays, etc.) can be very voluminous; thus the cost of data transmission may need to be optimized on the fly (i.e. in real time, based upon the above-mentioned updated tables).
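The cost-based rollover described above can be sketched as a lookup in a cost table restricted to the links that are currently working. The link names, the per-kilobyte prices, and the function name `pick_link` are all hypothetical values chosen for illustration.

```python
from typing import Dict, Optional, Set


def pick_link(cost_per_kb: Dict[str, float], available: Set[str]) -> Optional[str]:
    """Return the cheapest link that is currently available, or None."""
    candidates = [name for name in cost_per_kb if name in available]
    if not candidates:
        return None
    return min(candidates, key=lambda name: cost_per_kb[name])


# Hypothetical cost table, as might be downloaded from the surveillance server.
COSTS = {"ethernet": 0.0, "cellular": 0.05, "satellite": 1.20}
```

With this table, losing the Ethernet link while cellular and satellite remain up would select cellular; updating the table changes the choice without changing the code.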

The system will be accessible on the web at a website.

The remote manager can obtain access to status information over the web. In the simplest configuration, the power strip itself may be a web server so that the user can dial directly into the strip.

For example a user may define the trigger point for each sensor input.

■ Examples are:

• If the input goes from on to off

• If the input goes from off to on

• If the input pulses on for 2 seconds

• If the input pulses off for 2 seconds

• If the input increases to a certain value

• If the input decreases to a certain value
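The edge and threshold triggers listed above can be detected by comparing each new reading with the previous one. The following sketch covers the on/off transitions and the "increases to" and "decreases to" cases; the 2-second pulse triggers need timing information and are omitted. Function names and event labels are hypothetical.

```python
from typing import List


def digital_triggers(prev_on: bool, now_on: bool) -> List[str]:
    """Report an edge when a "digital" input changes state."""
    if not prev_on and now_on:
        return ["off_to_on"]
    if prev_on and not now_on:
        return ["on_to_off"]
    return []


def analog_triggers(prev: float, now: float, threshold: float) -> List[str]:
    """Report when an "analog" input crosses the threshold in either direction."""
    events = []
    if prev < threshold <= now:
        events.append("increased_to")
    if prev > threshold >= now:
        events.append("decreased_to")
    return events
```

Comparing against the previous sample means a trigger fires once at the crossing rather than on every reading while the value stays beyond the threshold.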

The consequences of a trigger being detected may be:

• Send an email with multiple recipients

• Send a notification to the voice response unit to call a phone number and play a message - one or more phone numbers - it should allow for the user to respond and log the response. When the caller receives the call, they need to respond. If no response after X minutes, the next person on the list will be called in an escalation process. - option to broadcast to a number of phone numbers instead of escalation process.

• Initiate a power sequence.
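The escalation and broadcast options for voice notification can be sketched as two small routines. Here `place_call` stands in for the voice response unit: it is a hypothetical callback that returns `True` if the person called responded within the allowed time.

```python
from typing import Callable, List, Optional


def escalate(contacts: List[str], place_call: Callable[[str], bool]) -> Optional[str]:
    """Call each contact in order until one responds; return that contact."""
    for number in contacts:
        if place_call(number):
            return number
    return None  # nobody responded, the caller may log or retry


def broadcast(contacts: List[str], place_call: Callable[[str], bool]) -> List[str]:
    """Call every contact and return the list of those who responded."""
    return [number for number in contacts if place_call(number)]
```

In the escalation case the list order encodes who is called first; in the broadcast case everyone on the list is called regardless of earlier responses.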

The sensors (devices) may be controlled as follows:

Select a device

Select a port

Select an action

• Port control (either power socket or digital output) o Turn on port o Turn off port o Initiate power on sequence (one or more ports)

o Initiate power off sequence (one or more ports) o Initiate power control sequence (one or more ports)

• Remote configuration o Send updated power sequence o Send updated single port power cycle timer value o Send input configuration o Send all configuration information in packed packets to minimize the required number of packets

• Other o Report status of all power sockets o Report status of all input ports o Report status of all digital outputs o Report status of selected port o Execute self test and report results o Request radio signal strength o Request version numbers

Of system firmware

Of radio

o NOTE - warn user if request will require more than one packet to return information

o The user can view transactions, for example:

View logs of transmissions

View logs of messages received

■ View billing information for current period

■ View billing information for previous period

View log of alarm events

View log of alarm event response - phone pad responses - caller ID - i.e. Employee ID

o Customer Support department may have the:

■ Ability to get into anyone's account to execute commands

Ability to send password to accounts email address

Ability to flag device as stolen, and disable that device/sensor so as not to trigger further (false) alarms.

Because human monitors may be anywhere at anytime, a voice response system allows the monitor to learn of the condition, and take action without even having computer access.

Typical prompts may be as follows:

Prompt user for account number

Prompt user for user number (client can have more than one user)

Prompt user for PIN

Allow them to select device

Allow them to select port

Allow them to select commands

Allow them to respond to alarms and record their responses

Cell/mobile phone SMS (text message) response is also possible

The software design of the system is illustrated in Figure 3.

The functions in figure 3 are defined as:

- ES - Email server

- VRS - Voice response system

- WS - Web server

- ACS - Aeris® (or similar satellite) control system - resides on the web server but interacts with the Aeris® network

- IFP - Interface packet that is used between the ACS and the Aeris net server

- AN - Aeris Net Server

- AP - Aeris packet

- RP - Remote program (denoted as 8001 )

- DC - Daisy chained unit (denoted as 8002)

- CPC - Control PC used to download and interact with the RP and DC

RP originated data can function as follows (of course steps can be deleted, rearranged or combined):

1. When a sensor reaches an alarm condition, it will trigger the RP to send an encrypted version of the alert to the AN via an AP.

2. The AN in turn will forward the data to the ACS portion of the web server via IFP. The content of the data is transparent to this transaction.

3. Once the packet is received by the ACS it will decrypt the packet, acknowledge the receipt back to the RP via the AN.

4. The ACS will then load the unencrypted packet into a database record keyed off of the session ID and then will call the WS with the session ID.

5. The WS will look up the session ID in the DB and take the appropriate action (log the transaction, send an email alert, have the VRS call the user, or ignore it).

6. The WS will send an acknowledgement packet to the RP via ACS.

7. ACS will forward the acknowledgement packet to the AN via IFP

8. AN will forward the acknowledgement packet to the RP via the AP

9. The RP will stop the timer for timeout and retransmission of previous message.
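From the RP's point of view, steps 1 and 9 amount to sending the alert, starting a timeout timer, and retransmitting until an acknowledgement arrives. The sketch below shows only that loop. The names `send_with_retry`, `send`, and `wait_for_ack`, and the cap on attempts, are assumptions for the example; encryption and the packet formats are not modeled.

```python
from typing import Callable, Optional


def send_with_retry(
    send: Callable[[], None],
    wait_for_ack: Callable[[float], bool],
    timeout_s: float,
    max_attempts: int,
) -> Optional[int]:
    """Send, wait up to timeout_s for an acknowledgement, and retry.

    Returns the attempt number on which the acknowledgement arrived, or
    None if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        send()
        if wait_for_ack(timeout_s):
            return attempt  # acknowledgement received, stop the timer
    return None
```

Because each out-of-band packet has a cost, capping the number of attempts keeps a dead link from generating unbounded charges.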

Web originated data:

1. The user signs onto the WS and is authenticated via the DB. The communication between the end user and the WS is via SSL.

2. The user picks an option like initiate sequence 8 power control. This information is looked up in the DB along with the MIN.

3. The WS puts the needed data into DB and calls ACS with a session id

4. The ACS looks up the session ID in the DB and gathers the appropriate information. It encrypts the data and sends the packet to the AN via IFP.

5. The AN relays the information to the RP via the AP.

6. The RP executes the command and sends an acknowledgement to the ACS via the AP to the AN, which relays it to the ACS via the IFP. When the ACS receives the acknowledgement it turns off the retransmission counter and stops the timeout timer.

The software flows as follows:

- Look for input trigger and take trigger action

- Look for data (from any source) and take response action

- Self test

- Communications acknowledgement

- Communications retry if no acknowledgement received

- Power up sequence (allow for sequence to include 0 or more daisy chained modules)

- Power sequence (allow up or down with delays in between.)

- Send data to server

- Receive data from server

- Look for the change from 4 to 20 ma loop sensors

- Watch dog timer to reset itself if things quit responding (i.e. it seems that the system is frozen and not responding)

- Treat the serial port as the command port when the right control characters are sent back and forth, otherwise treat it as the modem port.

- Need to report the signal strength to the LED

- Need to report the signal strength to the serial port, via wireless, via Ethernet

- Need to run SNMP on the Ethernet port (mirror all of the functions that are available wirelessly and via the command port)

- Read the MAC address of the serial port for the unique identifier

- Ability to load a new version of the software via command port, Ethernet SNMP, even the wireless link (should not be used very often because of the expense of wireless packets), via modem port when the serial port is configured in this mode

- Ability to control 16 daisy chained devices including getting sensor information and including their sockets as part of an elaborate power up or down sequence scheme.

- Configuration of the machine should include:

o Uploading power up/down sequences

■ These should probably be thought of as generic, so there isn't really a power up or a power down sequence, just a sequence. So the sequence would include: box number (in case of a daisy chained environment), socket number

o Setting alert thresholds for each sensor input

o Creating auto response actions: if a sensor input triggers, then it may require an automatic output, i.e. to turn on or off a power sequence

o Inquire.

- Monitor mode

o Inquire about the version number of the software, radio, hardware, etc.

o Perform self test

o Display all internal values including the value of each sensor

o Initiate a single output

o Initiate power sequence

- Sample default configuration to be:

o Input 1 = Temperature sensor - analog

o Input 2 = water level sensor - digital

o Input 3 = external power sensor - digital

o Input 4 = door opening sensor - digital

- Implement all Power Control Functions (p 17 of System Requirements document)

- One more power control command - cycle power every X hours for Y minutes

- See input monitoring below

- Timeout / retransmission

- Error reporting

- Linking to satellite
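Several of the items above (generic sequences identified by box and socket number, delays between steps, alert thresholds, and automatic responses to a sensor trigger) can be tied together in a short sketch. The names `Step`, `Threshold`, `run_sequence` and `check_input`, and the choice of a simple low/high range as the threshold, are assumptions for illustration. The linear scaling of a 4 to 20 mA reading is the conventional one for current loop sensors and is not spelled out in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Step:
    box: int              # unit in the daisy chain
    socket: int           # socket on that unit
    on: bool              # True = power up, False = power down
    delay_s: float = 0.0  # wait after this step before the next one


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float
    sequence: Optional[str] = None  # auto response sequence, if any


def loop_to_units(current_ma: float, lo: float, hi: float) -> float:
    """Map a 4-20 mA loop current linearly onto the range lo..hi."""
    return lo + (current_ma - 4.0) / 16.0 * (hi - lo)


def run_sequence(steps: List[Step],
                 switch: Callable[[int, int, bool], None],
                 sleep: Callable[[float], None]) -> None:
    """Execute a generic sequence: one socket at a time, with delays."""
    for step in steps:
        switch(step.box, step.socket, step.on)
        if step.delay_s > 0:
            sleep(step.delay_s)


def check_input(value: float, threshold: Threshold,
                sequences: Dict[str, List[Step]],
                switch: Callable[[int, int, bool], None],
                sleep: Callable[[float], None]) -> bool:
    """Return True on an alert; run the auto response if one is configured."""
    if threshold.low <= value <= threshold.high:
        return False
    if threshold.sequence is not None:
        run_sequence(sequences[threshold.sequence], switch, sleep)
    return True
```

Because a step carries its own on/off flag, the same structure serves for power up, power down, or mixed sequences, which matches the suggestion that sequences be treated as generic.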

Preferred Software - daisy chained unit

- Same as above, except that a daisy chained unit won't be able to talk to a radio directly; it will do all of its communications with the outside world via the daisy chain ports and, of course, the command port.

- Relay messages from one box to the next in a daisy chain

- Other than relaying information, it will preferably be controlled by the master not by another slave.

- It can have software uploaded to it via the daisy chain link

- It can be configured via daisy chain link

- Power up sequences would preferably be handled entirely by the master. The master would then just tell each slave individually to turn a socket on or off, and the master would control the timing and sequence. Reliability would be higher if the power sequencing were handled by each slave, since the communications link could die in the midst of a sequence; that would ruin a sequence centralized at the master but would still allow it to complete at the slave. Optionally, all of the slaves are synchronized and all sequences are handled the same way, rather than deciding on a case by case basis whether a sequence should be controlled by the master or distributed.

- One more power control command - cycle power every X hours for Y minutes

- See input monitoring below
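The master-controlled arrangement described above, with each unit relaying messages it does not own to the next box, can be sketched as follows. The class `Slave`, the function `master_run`, and the `(box, socket, on, delay)` tuple layout are hypothetical names chosen for this example. A real link would be a serial connection rather than a direct method call.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Slave:
    """A daisy chained unit that acts on its own messages and relays others."""

    def __init__(self, box: int, next_unit: Optional["Slave"] = None) -> None:
        self.box = box
        self.next_unit = next_unit
        self.sockets: Dict[int, bool] = {}

    def receive(self, target_box: int, socket: int, on: bool) -> bool:
        """Return True if some unit in the chain handled the command."""
        if target_box == self.box:
            self.sockets[socket] = on
            return True
        if self.next_unit is None:
            return False  # end of the chain: nobody owns this box number
        return self.next_unit.receive(target_box, socket, on)


def master_run(first: Slave,
               steps: List[Tuple[int, int, bool, float]],
               sleep: Callable[[float], None]) -> bool:
    """Master owns timing and order; slaves only switch single sockets."""
    for box, socket, on, delay_s in steps:
        if not first.receive(box, socket, on):
            return False  # sequence is abandoned part-way through
        sleep(delay_s)
    return True
```

The early return in `master_run` illustrates the reliability trade-off noted above: if a command cannot be delivered, a centrally controlled sequence stops in the middle.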

Preferred Software for the Configuration PC

- Upload new software version

- Configure each software version

- Manually control inputs and outputs

- Simulate an input value so the rest of the alert broadcasts will happen

- Execute self test

- Create easy to use interface since our clients who use this may not have technical skills.

- Ability to change power control sequences

- Ability to trigger power control sequences

- Ability to read power control sequences

- Execute all of the commands described for the master above

- The baud rate may be fixed, or it may be changeable by sending an RF packet with the new baud rate in it.

- The outputs that control the relays may be in parallel with the opto-isolated digital outputs or provided as separate ports.

- All communications should preferably be done on the same channel unless specifically requested differently. In other words if a radio message is received for some data, the data should be sent back via the radio.
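The same-channel rule in the last item can be expressed in a few lines. This is a minimal sketch: the function `respond`, the `reply_via` override, and the channel names used below are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional

Channel = Callable[[bytes], None]


def respond(channels: Dict[str, Channel], received_on: str, data: bytes,
            reply_via: Optional[str] = None) -> str:
    """Reply on the channel the request arrived on unless told otherwise."""
    name = reply_via if reply_via is not None else received_on
    channels[name](data)
    return name
```

For example, a request that arrives over the radio is answered over the radio, while a request that explicitly asks for a reply on another port names that port in `reply_via`.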

Software on the web server that speaks to the user may be as follows:

- Choose which communication link is available (in/out band)

o Satellite

o Aeris® (satellite) net

o SNMP

o Modem

- Talk to user via SSL once they have signed in

- Login

- Resend user ID and password if that is forgotten

- Update account information

- Register a new user

- Register a new MIN

- Take ecommerce solutions

- Enable or disable Voice Response Unit

- Manage Voice Response Unit user IDs and passwords (can support multiple users and multiple passwords, all numeric)

- Timeout retransmission

- Error reporting
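Choosing among the available links can be sketched as an ordered fallback: try the preferred link first and move to the next one if the transmission is not confirmed. The function name `send_with_fallback`, the convention that a link returns True on confirmed delivery, and the ordering of links in the usage note are assumptions, not details given in this disclosure.

```python
from typing import Callable, List, Optional, Tuple

Link = Tuple[str, Callable[[bytes], bool]]


def send_with_fallback(links: List[Link], packet: bytes) -> Optional[str]:
    """Try each link in priority order; return the name of the one that worked."""
    for name, send in links:
        try:
            if send(packet):   # True means delivery was confirmed
                return name
        except OSError:
            continue           # link is down; try the next one
    return None                # every link failed
```

A caller might list an in-band link such as Ethernet first and out of band links such as a modem or satellite after it, so that the out of band path is used only when the in-band path fails.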

Software on the web server that talks to the voice response unit could be configured as follows:

- Ask for authentication

- Control which messages are played in which order

- Will create an escalation process if no one responds to the call

- Will accept commands for each unit

o Initiate power control sequence

o Request status of port (includes daisy chain number if applicable)

- Timeout retransmission

- Error reporting
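The escalation process mentioned above, used when no one responds to a call, could take the form of the sketch below. The function `escalate`, the idea of cycling through an ordered contact list, and the default of two rounds are assumptions for illustration; the disclosure does not describe the escalation policy in detail.

```python
from typing import Callable, List, Optional


def escalate(contacts: List[str],
             place_call: Callable[[str], bool],
             rounds: int = 2) -> Optional[str]:
    """Call contacts in order, repeating the list, until someone responds."""
    for _ in range(rounds):
        for contact in contacts:
            if place_call(contact):  # True means the call was answered
                return contact
    return None                      # nobody responded; report an error
```

A result of `None` would be handed to the error reporting path listed above.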

Of course, variations within the general theme of this invention are intended to be covered by this disclosure. Such variations and alternatives include either methods which would be known to a person skilled in this art or technologies which have yet to be discovered but would be interchangeable with those disclosed. Reference should be had to the claims for the scope of the invention.