

Title:
SYSTEM AND METHOD FOR MULTICAST MESSAGE PROCESSING
Document Type and Number:
WIPO Patent Application WO/2008/064281
Kind Code:
A2
Abstract:
A system and method for providing a regional multicast protocol that maps multicast groups to regions of multicast group overlap and manages multicast message dissemination and recovery simultaneously on a regional basis.

Inventors:
BIRMAN KENNETH PAUL (US)
OSTROWSKI KRZYSZTOF (US)
Application Number:
PCT/US2007/085329
Publication Date:
May 29, 2008
Filing Date:
November 21, 2007
Assignee:
CORNELL RES FOUNDATION INC (US)
BIRMAN KENNETH PAUL (US)
OSTROWSKI KRZYSZTOF (US)
International Classes:
H04L12/56; H04L12/18
Foreign References:
US20060184695A1
US20030039225A1
US20040029524A1
US7075904B1
US7027400B2
Attorney, Agent or Firm:
CHAPMAN, Kathleen et al. (125 Summer Street, Boston, MA, US)
Claims:

1. A method (200) for multicast message processing in a communications network (23) having a sender node (45) and receiver nodes (43), comprising the steps of: determining (201), from a multicast message (13) being routed in the communications network (23), a multicast message group (18) having receiver Internet Protocol (IP) addresses (24) for the receiver nodes (43); determining (203) at least one region (37) having a region subset of IP addresses (24) from the multicast message group (18); assigning (205) an IP address (24) to each at least one region (37) to enable delivery of the multicast message (13) to the region subset; creating (207) at least one partition (57) in the at least one region (37), each of the at least one partition (57) including a partition subset of the region subset of IP addresses (24) and the receiver node IP address; and simultaneously routing (211) the multicast message (13) from an application associated with the sender node (45) to one of the at least one region (37) through a plurality of senders and feeders, passing (213) a token (55) among receiver nodes (43) in the region subset and the partition subset to gather control information related to the multicast message (13), and recovering (213), using the control information, lost multicast messages destined for the receiver node (43A) from: other receiver nodes (43B, 43C) in the partition subset if the lost multicast message can be recovered from the partition subset, or other receiver nodes (43B, 43C) in the region subset if the lost multicast message cannot be recovered from the partition subset, or the sender node (45) if the lost multicast message cannot be recovered from either the partition subset or the region subset.

2. The method of claim 1 further comprising the steps of: determining a group view (20) which is a snapshot of members of the multicast message group (18);

sending the multicast message (13) to the group view; and determining, from the at least one region and from the group view, at least one region view, the at least one region view being a snapshot of the receiving nodes (43) of the at least one region.

3. The method of either of claims 1 or 2 further comprising the steps of: assigning a group IP address to the multicast message group (18); and delivering the multicast message (13) to the group IP address.

4. The method of any of claims 1-3 wherein said simultaneous step further comprises the steps of: creating a group feed (15); locating a group sender (17) for the group feed (15); and registering the group feed (15) with the group sender (17).

5. The method of any of claims 1-4 further comprising the steps of: assigning at least one sequence number to the multicast message (13); sending a request (27) from the group view feed (35) to a region view sender (49) to send the multicast message (13) to a region view (39); and notifying the group sender (17) when the request (27) is complete.

6. The method of any of claims 1-5 wherein said step of recovering lost multicast messages further comprises the steps of: setting receive statuses of the multicast messages (13) by the receiver nodes (43); accumulating the receive status for each of the receiver nodes (43) in the region

(37) of the receiver node (43); sending a regional acknowledgement (72) to the sender node (45) based on the accumulated receive status of all the receiver nodes (43) in the region view (39) if all the receiver nodes (43) in the region view (39) have received the multicast message (13); sending a regional negative acknowledgement (74) to the sender node (45) based on the accumulated receive status of the receiver nodes (43) in the region view (39) if any

of the receiver nodes (43) in the region view (39) have not received the multicast message (13) and if none of the receiver nodes (43) in the region view (39) are caching the multicast message (13); and sending a partition negative acknowledgement (73) to the sender node (45) if no caching server (52) receives the multicast message (13).

7. The method of claim 6 wherein said step of sending the regional acknowledgement further comprises the step of: deleting the multicast messages (13) from the caching servers (52).

8. The method of any of claims 1-7 wherein said step of passing the token (55) further comprises the step of: sending and receiving the token (55) through a unicast protocol, wherein the unicast protocol is a communication between one said receiver node (43A) and a single other said receiver node (43B).

9. The method of any of claims 1-8 wherein said step of recovering lost multicast messages further comprises the step of: sending recovery information derived from the control information to the sender node (45) identifying the lost multicast messages, including the step of: sending, by the at least one region (37), a recovery request that includes information that is associated with the token (55); deleting the recovery information from the control information when the lost multicast messages have been acknowledged by the at least one region (37); including the recovery information when new data are received from the sender node (45) into the at least one region (37); upon the occurrence of an event, creating an Input/Output (I/O) queue (61) capable of handling I/O events from sockets (63); creating an alarm queue (65) for timer-based events;

creating a message request queue (29) for requests from the application; creating I/O batches from the I/O events and timer-based batches from timer-based events from the alarm queue (65); creating an I/O event quantum, an alarm event quantum, and a timer-based event quantum; draining the multicast message (13) from the socket (63) if the multicast message (13) is found on the socket (63) before the I/O batches and the timer-based batches are processed; continuously processing the I/O batches and the timer-based batches according to the I/O event quantum and the timer-based quantum; and waiting for I/O, if the multicast message (13) is not available to drain the I/O batches or the timer-based batches to process.

10. The method of any of claims 1-9 further comprising the step of: processing multicast messages (13) including the steps of: registering a data feed with a channel; pulling the multicast message (13) from the data feed; unregistering the data feed from the channel when there is no multicast message (13) to pull; and signaling the channel when the multicast message (13) is available at the data feed.

11. The method of any of claims 1-10 wherein said step of recovering lost multicast messages further comprises the steps of: exchanging information among the receiver nodes (43) in the at least one region

(37), wherein the information is associated with packets encountered by the receiver nodes (43) in the at least one region (37); storing the packets not yet acknowledged by the at least one region (37) in the caching servers (52);

determining, in the at least one region (37), a highest sequence number of the packets that are unlikely to still be in transit and that further have been received by some of the receiver nodes (43); including the highest sequence number in the token (55); when the token (55) visits the receiver node (43), determining, by the receiver node (43), from the highest sequence number, which packets the receiver node (43) has missed; if the missing packets are cached in at least one partition (57) of the receiver node (43), receiving, by the receiver node (43), the packet from another of the receiver nodes (43) within the at least one partition (57); if the missing packets are not cached in the at least one partition (57) of the receiver node (43), sending, by the receiver node (43), a request (27) in the token (55) to another of the receiver nodes (43); enabling the receiver nodes (43) in the region view (39) to provide the missed packets to the receiver nodes (43) in the region view (39) that reported the missed packets; determining a set of successful packets that all the receiver nodes (43) in the region view (39) have and a set of aggregate missed packets that all the receiver nodes (43) in the region view (39) have missed; providing an acknowledgement (71) to the sender node (45) from the region (37) for the successful packets; providing a negative acknowledgement (73) to the sender node (45) from the region (37) for the set of aggregate missed packets; and purging the set of successful packets from the caching servers (52).

12. The method of any of claims 1-11 further comprising the steps of: circulating an intra-partition token (81) having contents based on an inter-partition token (79) to all the receiver nodes (43) in the at least one partition (57) to collect the recovery information; circulating an inter-partition token (79), having contents based on the intra-partition token (81), among a plurality of the at least one partition (57) to collect regional information by aggregating the recovery information;

for each region (37), forming a set of partition leaders (77) by assigning a partition node in each at least one partition (57) as one of the set of partition leaders (77), each of the set of partition leaders (77) configured to cooperatively perform the step of recovering the lost multicast messages identified by the recovery information; assigning a region leader (75) from the set of partition leaders (77), that is configured to perform said step of recovering lost messages identified by the regional information; and periodically providing the regional information to the receiver nodes (43) in the region view (39) to determine the set of successful packets.

13. The method of any of claims 1-12 further comprising the steps of: receiving region membership changes; changing the set of partition leaders (77) based on the region membership changes; changing the region leader (75) based on the region membership changes; creating, by the partition leader (77), one of the intra-partition tokens (81) for the at least one partition (57) that includes the partition leader (77); updating the intra-partition token (81) with the recovery information; passing the intra-partition token (81) among the receiver nodes (43) in the at least one partition (57); for each visit by the intra-partition token (81): calculating a largest sequence number among the packets received in the region view (39); and setting the largest sequence number as a cutoff point for loss recovery for a succeeding visit by the intra-partition token (81) by including the largest sequence number in the intra-partition token (81); when the intra-partition token (81) is received: determining from a comparison of the recovery information in the intra-partition token (81) with receiver node information in the receiver node (43), a set of lost packets that can be recovered;

recovering the set of lost packets by requesting a set of the lost multicast messages from other receiver nodes (43) in the at least one partition (57); and replacing the recovery information with the receiver node information, if the intra-partition token (81) is received from one of the receiver nodes (43) in the same at least one partition (57); requesting the missing packets from another of the receiver nodes (43) in the at least one partition (57) if the recovery information in the intra-partition token (81) does not include the missing packets and if the missing packets are cached in the at least one partition (57); sending at least one of the set of successfully received packets to another of the receiver nodes (43) if the recovery information includes the at least one of the set of successfully received packets that is not in the set of missing packets; updating the aggregate negative acknowledgement; calculating a maximum continuous acknowledged interval that represents the highest sequence number of a numbered packet that was, along with all numbered packets having lower sequence numbers, successfully transmitted in the region view (39); and sending the maximum continuous acknowledged interval to the sender node (45).

14. The method of any of claims 1-13 further comprising the steps of: calculating minimum, average and maximum rate estimates among the rates collected from all of the receiver nodes (43) in the region (37); delivering the rate estimates to the region leader (75); periodically calculating, in the region leader (75), an estimate of a maximum admissible rate for the region (37) based on the maximum rate estimates; dividing the estimate of the maximum admissible rate among all the sender nodes (45) that send the multicast messages (13) into the region (37); providing the result of the step of dividing to the sender node (45) and other of the receiver nodes (43) that are multicasting; and setting a multicast rate in the sender node (45) as a maximum of:

a minimum rate at which receiver nodes (43) in the region view (39) can receive the multicast messages (13) sent by the sender node (45); and a minimum rate necessary to overcome message stagnation.

15. The method of any of claims 9-14 further comprising the steps of: organizing the events into batches up to a limit determined by the quanta; processing unicast messages that are received; processing multicast messages (13) that are received; processing unicast messages to be sent; processing multicast messages (13) to be sent; processing disk I/O; processing timer events; and processing down-calls scheduled through a non-blocking queue from other threads.

16. A method for multicast message processing comprising the steps of: determining a region (37) that includes receiver nodes (43) that receive multicast messages (13) addressed to multicast message groups (18); dividing the region (37) to form partitions (57); configuring a dissemination subsystem (26) to communicate with the region (37); delivering, by the dissemination subsystem (26), the multicast message (13) to the region (37); delivering the multicast messages (13) to the receiver nodes (43) within the region (37); configuring a recovery subsystem (30) to recover lost multicast messages in the region (37); aggregating status information, by using a single protocol, about the multicast messages (13) from receiver nodes (43) in the region (37) that electronically communicate with the dissemination subsystem (26) and the recovery subsystem (30); and

using the aggregated status information to drive the recovery subsystem (30), control sending rates, and control when the multicast messages (13) are purged from caching servers (52).

17. A system (100) for multicast message processing comprising: multicast messages (13) and multicast message groups (18) representing sets of receiver IP addresses including receiver node IP addresses for receiver nodes (43); an application (11) configured to: determine said multicast message groups (18) from said multicast messages (13); determine regions (37) having subsets of IP addresses (24) from said multicast message groups (18) including receiver node IP addresses; assign IP addresses (24) to said regions (37) to enable delivery of said multicast messages (13) to a region subset of IP addresses (24); and create partitions (57) in said regions (37), said partitions including a partition subset of said IP addresses (24) from said region subset and including receiver node IP addresses; a dissemination subsystem (26) configured to route said multicast messages (13) from a sender node (45) to said regions (37) through a plurality of senders and feeders; and a recovery subsystem (30) configured to: pass a token (55) among said receiver nodes (43) in said region subset and said partition subset to gather control information related to said multicast messages (13); and recover, using said control information, lost multicast messages destined for said receiver nodes (43) from: another of said receiver nodes (43) in said region subset if said lost multicast messages can be recovered from said region subset; another of said receiver nodes (43) in said partition subset if said lost multicast messages cannot be recovered from said region subset; and

from said sender node (45) if said lost multicast messages cannot be recovered from said region subset or said partition subset; wherein said dissemination subsystem (26) and said recovery subsystem (30) are configured to execute substantially simultaneously.

18. The system (100) of claim 17 wherein said dissemination subsystem (26) and said recovery subsystem (30) are configured to execute substantially sequentially.

19. The system (100) of either of claims 17 or 18 wherein said application (11) is further configured to assign group IP addresses to said multicast message groups (18) and deliver said multicast messages (13) to said group IP addresses.

20. The system (100) of any of claims 17-19 further comprising: a group view (20) including a group snapshot of receiver IP addresses associated with said multicast message groups (18) when said multicast messages (13) are sent; a region view (39) including a region view snapshot of said receiver IP addresses associated with said region (37) when said multicast messages (13) are sent; a mapping from said group view (20) to said region view (39); a Group Membership Service (GMS) configured to determine said group snapshot, said region snapshot, and updates to said group snapshot and said region snapshot; a group feed (15) configured to send a request (27) to said application (11) and receive said multicast messages (13) in response to said request (27), said group feed (15) configured to buffer received said multicast messages (13); a group sender (17) configured to be registered with said group feed (15), said group sender (17) further configured to pull said multicast messages (13) from said group feed (15), said group sender (17) configured to send a poll (67) to said group feed (15) and can receive said multicast messages (13) from said group feed (15) as a result of said poll (67); a group view sender (19) configured to: receive said request (27) from said group sender (17);

respond to said request (27); receive said multicast message (13) from said group sender (17) in response to said request (27); and assign a group view sequence number associated with said multicast message group (18) to said multicast message (13); at least one group view feed (35) associated with at least one said region view (39) in said group view (20), said at least one group view feed (35) being owned by said group view sender (19), said at least one group view feed (35) configured to: receive said request (27) from said group view sender (19); respond to said request (27); receive said multicast message (13) from said group view sender (19); and assign a sequence number associated with at least one said region view (39) to said multicast message (13); a region view sender (49) configured to: receive said request (27) from said at least one group view feed (35); respond to said request (27); receive said multicast message (13) from said at least one group view feed (35); and notify said group view sender (19) when said request (27) is complete; wherein said region view sender (49) is registered with said at least one group view feed (35); a regional acknowledgement (72) based on an accumulated receive status for said multicast message (13) of all said receiver nodes (43) in said region view (39) if all said receiver nodes (43) in said region view (39) have received said multicast message (13); a regional negative acknowledgement (74) based on an accumulated receive status for said multicast message (13) of all said receiver nodes (43) in said region view (39) if any of said receiver nodes (43) in said region view (39) have not received said multicast message (13) and said multicast message (13) cannot be recovered in said region (37); at least one caching server (52) configured to store said multicast message (13) for said recovery subsystem (30); and

a partition negative acknowledgement (73) created if no said at least one caching server (52) receives said multicast message (13); wherein said regional acknowledgement (72), said regional negative acknowledgement (74), and said partition negative acknowledgement (73) are made available to said sender node (45).

21. The system (100) of claim 20 wherein said recovery subsystem (30) is configured to: delete said multicast messages (13) from said at least one caching server (52); send and receive said token (55) through a unicast protocol; send recovery information derived from said control information; delete said recovery information from said control information when said regional acknowledgement (72) has been sent; and include said recovery information with new said multicast messages (13) received from said sender node (45) into said region (37).

22. The system (100) of either of claims 20 or 21 further configured to: process I/O events, alarm events, and timer-based events by: create an I/O queue (61) that can handle said I/O events from a socket (63); create I/O batches from said I/O events and timer-based batches from timer-based events from an alarm queue (65); create an I/O event quantum and a timer-based event quantum; drain said multicast messages (13) from said socket (63) if said multicast messages (13) are found on said socket (63) before said I/O batches and said timer-based batches are processed; continuously process said I/O batches and said timer-based batches according to said I/O event quantum and said timer-based quantum; wait for I/O if said multicast messages (13) are not available to drain said I/O batches or said timer-based batches; and maintain: a received list of all packets received by said region (37) to said receiver nodes (43) in said region (37);

a not acknowledged set of said packets that are not yet acknowledged by said region (37) and are stored in said at least one caching server (52); and an aggregate set of missed said packets based on a combination of said packets missed by each said receiver node (43) determined by comparing said received list with a list of said packets received by each said receiver node (43) in said region view (39).

23. The system (100) of any of claims 17-22 wherein said recovery subsystem (30) is further configured to: determine, by each of said receiver nodes (43), when missed packets are unlikely to arrive; report said missed packets as missed; enable said receiver nodes (43) in said region view (39) to provide missed packets to said receiver nodes (43) in said region view (39) that reported said missed packets; determine a set of successful packets that all said receiver nodes (43) in said region view (39) have; determine a set of aggregate said missed packets that all said receiver nodes (43) in said region view (39) have missed; provide an acknowledgement (71) to said sender node (45) from said region (37) for said successful packets; provide a negative acknowledgement (73) to said sender node (45) from said region (37) for said set of aggregate missed packets; purge said successful packets from said at least one caching server (52); maintain: an intra-partition token (81) having contents based on said token (55) and that can be circulated to all said receiver nodes (43) in said partition (57) to collect said control information; an inter-partition token (79) having contents based on said intra-partition token (81) that can be circulated among a plurality of said partitions (57) to collect regional information by aggregating said control information;

a set of partition leaders (77) determined for each said region (37) by assigning a partition node in each said partition (57) as one of said set of partition leaders (77), each of said set of partition leaders (77) being configured to cooperatively recover said lost multicast messages identified by said control information; and a region leader (75) chosen from said set of partition leaders (77) and configured to recover said lost multicast messages identified by said regional information; and periodically provide said regional information to said receiver nodes (43) in said region view (39) to determine a set of successful packets.

24. The system (100) of any of claims 17-23 wherein said partition leader (77) is configured to: create an intra-partition token (81) for each partition (57); update said intra-partition token (81) with said control information; and pass said intra-partition token (81) among said receiver nodes (43) in the partition (57).

25. The system (100) of any of claims 17-24 wherein said receiver nodes (43) in said region view (39) are configured to:

(a) receive region membership changes;

(b) change said set of partition leaders (77) based on said region membership changes;

(c) change said region leader (75) based on said region membership changes; (d) for each visit by said intra-partition token (81): calculate a largest sequence number among said packets in said region view (39); and set said largest sequence number as a cutoff point for loss recovery for a succeeding visit by said intra-partition token (81) by including said largest sequence number in said intra-partition token (81);

(e) when said intra-partition token (81) is received:

determine from a comparison of said recovery information in said intra-partition token (81) with receiver node information in one of said receiver nodes (43), said lost multicast messages that can be recovered; recover said lost multicast messages (13) by requesting said lost multicast messages from other said receiver nodes (43) in said partition (57); and replace said recovery information with said receiver node information; (f) if said intra-partition token (81) is received from one of said receiver nodes (43) in a same said partition (57): determine missing packets at said receiver nodes (43); request said missing packets from another of said receiver nodes (43) in said partition (57) if said recovery information in said intra-partition token (81) does not include said missing packets and if said missing packets are cached in said partition (57); send successfully received packets to said other of said receiver nodes (43) if said recovery information includes said successfully received packets that do not include said missing packets and that are cached at one of said receiver nodes (43); update an aggregate of said negative acknowledgements (73); calculate a maximum continuous acknowledged interval that represents a highest sequence number of a numbered said multicast message (13) that was, along with all said numbered multicast messages having lower sequence numbers, successfully transmitted in said region (37); and send said maximum continuous acknowledged interval to said sender node (45).

26. The system (100) of any of claims 23-25 wherein said region leader (75) is configured to: dynamically regulate a multicast transmission rate from said sender node (45) by performing the steps of: collecting a transmission rate from all said receiver nodes (43) in said region (37) by use of a token protocol;

calculating minimum, average and maximum rate estimates among said collected transmission rates; periodically calculating an estimate of a maximum admissible rate for said region (37) based on said maximum rate estimates; dividing said estimate of the maximum admissible rate among all said sender nodes (45) that send said multicast messages (13) into said region (37); providing a result of said dividing to said sender node (45); and setting a multicast rate in said sender node (45) as a maximum of: a first minimum rate at which said receiver nodes (43) in said region view (39) can receive said multicast messages (13) that are sent by said sender node (45), and a second minimum rate necessary to overcome message stagnation.

27. The system (100) of any of claims 22-26 wherein said receiver nodes (43) in said region view (39) are configured to: perform event processing including the steps of: organizing said I/O events, alarm events, and timer-based events into batches up to a limit determined by said I/O event quantum and said timer-based quantum; processing unicast messages that are received; processing said multicast messages (13) that are received; processing unicast messages to be sent; processing said multicast messages (13) to be sent; processing disk I/O; processing said timer-based events; and processing down-calls scheduled through a non-blocking queue from other threads.

Description:

SYSTEM AND METHOD FOR MULTICAST MESSAGE PROCESSING

BACKGROUND [0001] Multicast messages are messages sent from a sender node to multiple receiver nodes. The receiver nodes can be members of multicast message groups, and messages can be destined for multiple multicast message groups. Multicast message groups can overlap, that is, can have as members some or all of the same receiver nodes. Currently, multiple multicast groups can be processed as follows: (1) "lightweight groups" in which application groups are mapped into broadcasts in an underlying group spanning all receivers, and then receiver nodes filter the received messages to drop unwanted messages, and (2) separate protocols for each multicast message group. [0002] Lightweight groups operate most efficiently when the number of nodes and transmission data rates in the multicast message group are below the point at which receiver nodes are presented with huge numbers of undesired packets that waste resources and fill up receive buffers. To use lightweight groups where there are a large number of nodes and a high transmission data rate, agents that filter the multicast messages for the receiver nodes could be provided. However, this approach introduces extra message hops, and the agents experience load linear in the system size and multicast rate.

[0003] Executing separate protocols for each multicast group incurs overhead linear in the number of groups for acknowledgements, negative acknowledgements, and other control traffic, and contention between multicast message groups for communication resources, both within and outside of individual nodes, and can cause packet loss.

[0004] What is needed is a system that provides for processing multicast messages that can accommodate large numbers of receiver nodes and high data transmission rates.

SUMMARY

[0005] The needs set forth above as well as further and other needs and advantages are addressed by the present embodiment described herein. [0006] For a better understanding of the present embodiment, together with other and further objects thereof, reference is made to the accompanying drawings and detailed description.

DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

[0007] FIG. 1 is a schematic block diagram of an embodiment of a system for multicast message processing;

[0008] FIG. 2 is a schematic block diagram of the control and data flow to regions and partitions; [0009] FIG. 3 is a schematic block diagram of message queue feeding and sending;

[00010] FIG. 4 is a schematic block diagram of multicast message groups divided into regions;

[00011] FIG. 5 is a schematic block diagram of inter-partition and intra-partition token passing; and

[00012] FIG. 6 is a flow chart of an embodiment of a method for multicast message processing.

DETAILED DESCRIPTION

[00013] The present embodiment is now described more fully hereinafter with reference to the accompanying drawings. The following configuration description is presented for illustrative purposes only. Any computer configuration satisfying the speed and interface requirements herein described may be suitable for implementing the system of the present embodiment. The

system and method of the present embodiment are described in The Power of Indirection: Achieving Multicast Scalability by Mapping Groups to Regional Underlays, Ostrowski, K., Birman, K, Phanishayee, A., NSDI 2006, incorporated herein in its entirety by reference. [00014] The method of the present embodiment for multicast message processing can include, but is not limited to, the steps of determining, from a multicast message, a multicast message group having receiver Internet Protocol (IP) addresses for receiver nodes, and determining at least one region having a region subset of IP addresses from the multicast message group. The method can also include the steps of assigning an IP address to each at least one region to enable delivery of the multicast message to the region subset, and creating at least one partition in the at least one region, where each at least one partition includes a partition subset of the region subset of IP addresses and at least one partition includes the receiver node IP address. [00015] The method can also include the steps of simultaneously initiating a dissemination subsystem and a recovery subsystem, where the dissemination subsystem routes the multicast message from an application associated with the sender node to at least one of the regions through a plurality of senders and feeds, and where the recovery subsystem (1) passes a token among nodes in the region subset and the partition subset to gather control information related to the multicast message and (2) recovers, using the control information, lost multicast messages destined for the receiver node (a) from other receiver nodes in the partition subset if the lost multicast message can be recovered from the partition subset, or (b) from other receiver nodes in the region subset if the lost multicast message cannot be recovered from the partition subset, or (c) from the sender node if the lost multicast message cannot be recovered from either the partition subset or the region subset.
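
The regional mapping and the tiered recovery order described above can be illustrated with a minimal Python sketch. The names below (partition_region, recover_lost_message, the cache dictionaries, and the sender object with a retransmit method) are hypothetical illustrations under simplifying assumptions, not part of the disclosed system.

```python
# Illustrative sketch only: split a region's receiver IP subset into partitions,
# then recover a lost message by trying the partition first, the region second,
# and the original sender last, mirroring the recovery order described above.

def partition_region(region_ips, partition_size):
    """Divide the region subset of IP addresses into fixed-size partitions."""
    ips = sorted(region_ips)
    return [ips[i:i + partition_size] for i in range(0, len(ips), partition_size)]

def recover_lost_message(seq_no, partition_cache, region_cache, sender):
    """Tiered recovery: partition subset, then region subset, then sender node."""
    if seq_no in partition_cache:        # another receiver in the partition cached it
        return partition_cache[seq_no]
    if seq_no in region_cache:           # a caching server elsewhere in the region has it
        return region_cache[seq_no]
    return sender.retransmit(seq_no)     # fall back to the sender node (hypothetical API)
```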

[00016] The method can further include the steps of determining a group view which is a snapshot of members of the multicast message group and sending the multicast message to the group view. A conventional Group Membership

Service (GMS) can perform snapshots of group membership, i.e. group views,

and, at the time the multicast message is sent, the multicast message is assigned to the latest snapshot known to the sender node. The method can also include the step of determining, from the at least one region and from the group view, at least one region view which is a snapshot of the members of the at least one region. As with group views, the GMS can perform snapshots of region membership, i.e. region views, and, at the time the multicast message is sent, the multicast message is assigned to the latest snapshot known to the sender node. With respect to determining region views, nodes could be, for example, assigned to the same region based on similarity of interest. In any case, the mapping must be such that if group view X maps to region views Y1, ..., Yn, then every node in group view

X must be in at least one of the region views Y1, ..., Yn. [00017] The method can further include the optional steps of assigning a group IP address to a multicast message group and delivering the multicast message to the group IP address. [00018] The step of simultaneously initiating the dissemination subsystem and the recovery subsystem can include, but is not limited to, the steps of (1) creating a group feed, (2) locating a group sender for the group feed, and (3) registering the group feed with the group sender. When the application associated with the sender node sends a message to a group, the application creates a group feed for the multicast message group. The purpose of the group feed is to provide multicast messages that are to be sent from the application to the multicast message group. The application can then find or create the group sender for the multicast message group, and can then register the group feed with the group sender. The group sender can maintain a plurality of group view senders, one for each group view. As stated previously, at the time the multicast message is sent, it is associated with a group view. Thus the multicast message, when pulled by the group sender from the group feed, is routed to the group view sender. The group view sender can maintain a plurality of region view feeds, and the multicast message being routed can be placed in each of the plurality of region view feeds, each of which is registered with a region view sender. The region

view senders can pull the multicast message from the region view feed and pass it to the region sender for multicasting to the per-region IP address. [00019] Simultaneously, the region view sender can register the multicast message with a collector recovery agent for recovery, and can pass the multicast message to a regional controller so that the multicast message can be delivered to the application if the sender node is itself a member of the group. The method can further include the steps of (1) assigning at least one sequence number to the multicast message, (2) sending a request from the group view feed to a region view sender to send the multicast message to at least one region view, and (3) notifying the group sender when the request is complete.
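
As a minimal sketch of the feed-and-sender chain in paragraphs [00018] and [00019], the Python classes below are hypothetical simplifications: a single Feed class stands in for the group feed and the region view feeds, and only the group-view sequence numbering is shown.

```python
from collections import deque

class Feed:
    """A message buffer that a registered sender pulls from (hypothetical simplification)."""
    def __init__(self):
        self.queue = deque()
    def push(self, msg):
        self.queue.append(msg)
    def pull(self):
        return self.queue.popleft() if self.queue else None

class GroupViewSender:
    """Assigns a group-view sequence number and places the message in every
    region view feed spanned by the group view."""
    def __init__(self, region_view_feeds):
        self.region_view_feeds = region_view_feeds   # one Feed per region view
        self.seq = 0
    def send(self, msg):
        self.seq += 1
        for feed in self.region_view_feeds:
            feed.push((self.seq, msg))               # region view senders pull from here

def pump(group_feed, group_view_sender):
    """Group sender loop: pull from the group feed and route to the group view sender."""
    msg = group_feed.pull()
    while msg is not None:
        group_view_sender.send(msg)
        msg = group_feed.pull()
```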

[00020] The step of recovering lost multicast messages can include, but is not limited to, the steps of (1) setting receive statuses of multicast messages by receiver nodes, (2) accumulating the receive status for each receiver node in the region of the receiver node, (3) sending a regional acknowledgement to the sender node based on the accumulated receive status of all nodes in the region view if all nodes in the region view have received the multicast message, (4) sending a regional negative acknowledgement to the sender node based on the accumulated receive status of nodes in the region view if any node in the region view has not received the multicast message and if no nodes in the region view are caching the multicast message, and (5) sending a partition negative acknowledgement to the sender node if no caching server receives the multicast message. The step of sending the regional acknowledgement can include, but is not limited to, the step of deleting the multicast messages from the caching servers. The step of passing the token can include, but is not limited to, the step of sending and receiving the token through a unicast protocol which is a communication between one node and a single other node, and is supported by, for example, Internet Protocol version 6. [00021] The step of recovering lost multicast messages from the sender node can further include, but is not limited to, the step of sending recovery information derived from the control information to the sender node identifying lost multicast messages. In this situation, the at least one region sends a recovery request that includes information that is associated with the token. The method

can further include the step of deleting the recovery information from the control information when lost multicast messages have been acknowledged by the region. Recovery information can be included in the token for a particular sender node when data from the sender node arrives into the at least one region. The recovery information can be deleted from the token for the sender node when there are no multicast messages to be recovered, and no other changes have occurred. The method of the present embodiment can still further include the step of including the recovery information when new data are received from the sender node into the at least one region. [00022] The method can still further include the steps of, upon the occurrence of an event, (1) creating an Input/Output (I/O) queue capable of handling I/O events from sockets, (2) creating an alarm queue for timer-based events, (3) creating a request queue for requests from the application, (4) creating I/O batches from the I/O events and timer-based batches from timer-based events from an alarm queue, (5) creating an I/O event quantum, an alarm event quantum, and a timer-based event quantum, (6) draining the multicast message from the socket if the multicast message is found on the socket before the I/O batches and the timer-based batches are processed, (7) continuously processing the I/O batches and the timer-based batches in round-robin fashion according to the I/O event quantum and the timer-based quantum, and (8) waiting for I/O, if the multicast message is not available to drain the I/O batches or the timer-based batches to process.
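
The batched event processing in paragraph [00022] could be organized along the following lines. This is a rough Python sketch under simplifying assumptions: only I/O and timer batches are shown, the handlers are supplied by the caller, and the alarm queue is a plain list of (due_time, event) pairs kept in due-time order.

```python
import select
import time

def run_event_loop(sock, alarm_queue, handle_io, handle_timer,
                   io_quantum=16, timer_quantum=16):
    """Process I/O batches and timer-based batches round-robin, bounded by their quanta."""
    while True:
        # Drain multicast packets already waiting on the socket into an I/O batch.
        io_batch = []
        while len(io_batch) < io_quantum and select.select([sock], [], [], 0)[0]:
            io_batch.append(sock.recvfrom(65535))

        # Collect the timer-based events that are due into a timer batch.
        timer_batch = []
        now = time.monotonic()
        while alarm_queue and alarm_queue[0][0] <= now and len(timer_batch) < timer_quantum:
            timer_batch.append(alarm_queue.pop(0))

        if not io_batch and not timer_batch:
            select.select([sock], [], [], 0.01)   # nothing to process: wait briefly for I/O
            continue

        for packet in io_batch:
            handle_io(packet)
        for _, event in timer_batch:
            handle_timer(event)
```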

[00023] In the method of the present embodiment, the feeder can process, but is not limited to processing, multicast messages according to the steps of (1) registering a data feed with a channel, (2) pulling the multicast message from the data feed, (3) unregistering the data feed from the channel when there is no multicast message to pull, and (4) signaling the channel when the multicast message is available at the data feed. [00024] The multicast message and the lost multicast messages can include packets.
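
A possible reading of the feeder behavior in paragraph [00023] is sketched below in Python; the Channel and DataFeed classes and the handler callback are hypothetical stand-ins for the described register, pull, unregister, and signal cycle.

```python
class Channel:
    """Pulls messages from registered data feeds (hypothetical simplification)."""
    def __init__(self, handler):
        self.handler = handler          # downstream processing for pulled messages
        self.feeds = []
    def register(self, feed):
        if feed not in self.feeds:
            self.feeds.append(feed)
    def unregister(self, feed):
        self.feeds.remove(feed)
    def drain(self):
        for feed in list(self.feeds):
            msg = feed.pull()
            while msg is not None:      # pull until the feed runs dry
                self.handler(msg)
                msg = feed.pull()
            self.unregister(feed)       # no message to pull: drop the registration

class DataFeed:
    def __init__(self, channel):
        self.channel = channel
        self.buffer = []
    def push(self, msg):
        self.buffer.append(msg)
        self.channel.register(self)     # signal the channel that a message is available
    def pull(self):
        return self.buffer.pop(0) if self.buffer else None
```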

[00025] The step of recovering lost multicast messages can include, but is not limited to including, the steps of (1) exchanging information among the receiver nodes in the at least one region, where the information is associated with the packets encountered by the receiver nodes in the at least one region, and (2) storing the packets not yet acknowledged by the region in caching servers. The at least one region can determine the highest sequence number of the packets that are unlikely to still be in transit and that further have been received by some of the receiver nodes. The highest sequence number can be included in the token. When the token visits a receiver node, the receiver node can determine from the highest sequence number which packets it has missed. If the missing packets are cached in the receiver node's partition, the receiver node can receive the packet from another node within the partition, for example, by requesting the missing packet from a predecessor node, i.e. the node that passed the token to the receiver node, or by informing a successor node, i.e. the node to which the receiver node passes the token, of the missing packets so that the successor nodes can forward the missing packets to the receiver node. If the missing packets are not cached in the partition of the receiver node, the receiver node can send a request in the token to another node, for example, a random node, in another partition, for the packet. The token can contain enough information so that packets that have been missed by all nodes in the partition and the region can be determined. The method can further include the steps of (1) enabling the nodes in the region view to provide the missed packets to the nodes in the region view that reported the missed packets, (2) determining a set of successful packets that all the nodes in the region view have and a set of aggregate missed packets that all the nodes in the region view have missed, (3) providing an acknowledgement to the sender node from the region for the successful packets, (4) providing a negative acknowledgement to the sender node from the region for the set of aggregate missed packets, and (5) purging the set of successful packets from the caching servers. The term acknowledgement is used herein to refer to a message that indicates that sent information was received. The term negative acknowledgement is used herein to refer to a message that indicates that expected information was not received.
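
The token-driven loss detection of paragraph [00025] might be approximated as follows; the token fields "cutoff" and "requests" and the on_token_visit function are hypothetical names, and the actual transfer of packet payloads is omitted.

```python
def on_token_visit(token, received, partition_cache):
    """token: {'cutoff': highest sequence number unlikely to still be in transit,
               'requests': set of sequence numbers to request from other partitions}.
    received: set of sequence numbers this receiver node has.
    partition_cache: packets cached within this node's partition, keyed by sequence number."""
    missed = [seq for seq in range(1, token["cutoff"] + 1) if seq not in received]
    for seq in missed:
        if seq in partition_cache:
            received.add(seq)            # recover the cached copy from within the partition
        else:
            token["requests"].add(seq)   # ask a node in another partition via the token
    # Keep the cutoff monotone as the token continues to circulate through the region.
    token["cutoff"] = max(token["cutoff"], max(received, default=0))
    return token
```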

[00026] The token of the present embodiment is identified herein by other names, depending on where it is circulating. An inter-partition token circulates among the nodes in the region, and an intra-partition token circulates among the nodes in the partition. The method of the present embodiment can further include the steps of (1) circulating an intra-partition token having contents based on the inter-partition token to all nodes in the partition to collect the recovery information, (2) circulating an inter-partition token, having contents based on the intra-partition token, among a plurality of partitions to collect regional information by aggregating the recovery information, (3) for each region, forming a set of partition leaders by assigning a partition node in each partition as one of the set of partition leaders, each of the set of partition leaders configured to cooperatively perform the step of recovering lost messages identified by the recovery information, (4) assigning a region leader from the set of partition leaders, that is configured to perform the step of recovering lost messages identified by the regional information, and (5) periodically providing the regional information to the nodes in the region view to determine the set of successful packets.
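
To make the partition-leader and region-leader roles of paragraph [00026] concrete, the Python sketch below elects leaders by lowest address and aggregates intra-partition token state into regional information; the election rule and the token field names are assumptions for illustration only.

```python
def elect_leaders(partitions):
    """partitions: list of lists of node addresses.
    Returns (region_leader, partition_leaders); lowest address wins (assumed rule)."""
    partition_leaders = [min(p) for p in partitions]
    region_leader = min(partition_leaders)
    return region_leader, partition_leaders

def aggregate_regional_info(intra_tokens):
    """Combine intra-partition tokens into regional information: packets every
    receiver has (can be acknowledged) and packets every receiver has missed."""
    successful = set.intersection(*(t["have"] for t in intra_tokens))
    aggregate_missed = set.intersection(*(t["missed"] for t in intra_tokens))
    return {"ack": successful, "nak": aggregate_missed}
```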

[00027] The method of the present embodiment can still further include the steps of (1) receiving region membership changes, (2) changing the set of partition leaders based on the region membership changes, and (3) changing the region leader based on the region membership changes. The method can further include the steps of (1) creating, by the partition leader, one of the intra-partition tokens for the partition that includes the partition leader, (2) updating the intra-partition token with the recovery information, and (3) passing the intra-partition token among the receiver nodes in the partition. For each visit by the intra-partition token, the method can include the steps of (1) calculating a largest sequence number among the packets received in the region view, and (2) setting the largest sequence number as a cutoff point for loss recovery for a succeeding visit by the intra-partition token by including the largest sequence number in the intra-partition token. When the intra-partition token is received, the method can include the steps of (1) determining from a comparison of the recovery

information in the intra-partition token with receiver node information in the receiver node, a set of lost packets that can be recovered, (2) recovering the set of lost packets by requesting the set of lost messages from other nodes in the partition, and (3) replacing the recovery information with the receiver node information, if the intra-partition token is received from a node in the same partition. The step of replacing can enable the successor node to request or provide packets for the receiver node. The method can further include the steps of (1) requesting the missing packets from another node in the partition if the recovery information in the intra-partition token does not include the missing packets and if the missing packets are cached in the partition, and (2) sending at least one of the set of successfully received packets to the other node if the recovery information includes the at least one of the set of successfully received packets that is not in the set of missing packets, identified by, for example, packet identifiers listed in the token, and that is cached at the receiver node. The method can also include the steps of (1) updating the aggregate negative acknowledgement, (2) calculating a maximum continuous acknowledged interval that represents the highest sequence number of a numbered packet that was, along with all numbered packets having lower sequence numbers, successfully transmitted in the region view, and (3) sending the maximum continuous acknowledged interval to the sender node. There can be several types of recovery information, for example, recovery information that a predecessor node has placed in the token, and that describes what packets the predecessor node has, and recovery information that can describe the successor node that can receive the token sent by the predecessor node. To continue with the example, if the predecessor node is missing packets that the successor node is caching, the successor node may send them to the predecessor node, or if the predecessor node has packets that the successor node is missing, the successor node may request the packets from the predecessor node. The successor node can detect missing packets because the successor node can extract recovery information placed in the token by the predecessor node and compare it with its own recovery information

that it calculates. The successor node's comparison of recovery information can determine what actions the successor node will take. [00028] The method of the present embodiment can even still further include the step of dynamically regulating a multicast transmission rate from the sender node including the steps of (1) calculating minimum, average and maximum rate estimates among the rates collected from all nodes in the region, (2) delivering the rate estimates to the region leader, (3) periodically calculating, in the region leader, an estimate of a maximum admissible rate for the region based on the maximum rate estimates, (4) dividing the estimate of the maximum admissible rate among all the sender nodes that send the multicast messages into the region, (5) providing the result of the step of dividing to the sender node and other nodes that are multicasting, and (6) setting a multicast rate in the sender node as a maximum of (a) a minimum rate at which nodes in the region view can receive data multicast by the sender node, and (b) a minimum rate necessary to overcome message stagnation, for example after long periods of nonactivity or a series of massive losses, which lower bound is calculated by a smoothed estimate in each node and the minimum of those is taken by the region leader and divided by the number of senders into the region. [00029] As stated previously, in the present embodiment, there can be, for example, three types of events, each having associated quanta: I/O events, alarm events, and timer-based events. The method of the present embodiment can even still further include the steps of (1) organizing the events into batches up to a limit determined by the quanta, and (2) prioritizing the I/O events including the steps of (a) processing unicast messages that are received first, (b) processing multicast messages that are received second, (c) processing unicast messages to be sent third, (d) processing multicast messages to be sent fourth, (e) processing disk I/O fifth, (f) processing timer events sixth, and (g) processing down-calls scheduled through a non-blocking queue from other threads last. [00030] An alternate method of the present embodiment for multicast message processing can include, but is not limited to, the steps of (1) determining a region that includes receiver nodes that receive multicast messages addressed to

multicast groups, (2) dividing the region to form partitions, (3) configuring a dissemination subsystem to communicate with the region, (4) delivering, by the dissemination subsystem, the multicast message to the region, (5) delivering the multicast messages to the receiver nodes within the region, (6) configuring a recovery subsystem to recover lost multicast messages in the region, (7) aggregating status information, by using a single protocol, about the multicast messages from nodes in the region that electronically communicate with the dissemination subsystem and the recovery subsystem, and (8) using the aggregated status information to drive the recovery subsystem, control sending rates, and control when messages are purged from caching servers.
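
The rate regulation described in paragraph [00028] above could be reduced to the small Python helpers below; the function names, the way the admissible rate is estimated, and the equal split among senders are illustrative assumptions rather than the disclosed algorithm.

```python
def region_rate_estimates(node_rates):
    """Rates collected from all receiver nodes in the region via the token protocol."""
    return min(node_rates), sum(node_rates) / len(node_rates), max(node_rates)

def sender_share(max_admissible_rate, num_senders):
    """Region leader divides the admissible rate among the senders multicasting into the region."""
    return max_admissible_rate / max(1, num_senders)

def sender_multicast_rate(min_receivable_rate, anti_stagnation_rate):
    """Sender multicasts at the maximum of the slowest receiver's rate and the
    minimum rate needed to overcome message stagnation."""
    return max(min_receivable_rate, anti_stagnation_rate)
```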

[00031] A further alternate method differs from the previously described method in that, rather than performing dissemination and recovery for each region separately, dissemination for all regions in a group can be performed at the same time, while recovery can happen on a per-region basis. Specifically, rather than multicasting the message to IP addresses of all regions, per-region headers and the data that would normally be sent in multiple IP multicasts to each of the regions, could be sent to a per-group IP multicast address. Thus, the packet that is sent to a per-group IP multicast address can include a single copy of the actual data, as well as a vector of communication packet headers, one header for each region view over which the group view spans. These headers can include sequence numbers for each region view to which the message is destined. [00032] The system of the present embodiment for multicast message processing can include, but is not limited to, multicast messages and multicast message groups representing sets of receiver IP addresses including receiver node IP addresses for receiver nodes. The system can further include an application that can perform the steps of (1) determining, from the multicast messages, the multicast message groups, (2) determining regions having subsets of IP addresses from the multicast message groups including receiver node IP addresses, (3) assigning IP addresses to the regions to enable delivery of the multicast messages to a region subset of IP addresses, and (4) creating partitions in the regions that include a partition subset of the IP addresses from the region subset and that

include receiver node IP addresses. The system of the present embodiment can also include a dissemination subsystem that can route the multicast messages from a sender node to the regions through a plurality of senders and feeds, and a recovery subsystem that can perform the steps of (1) passing a token among nodes in the region subset and the partition subset to gather control information related to the multicast messages, and (2) recovering, using the control information, lost multicast messages destined for the receiver nodes (a) from other receiver nodes in the region subset if the lost multicast messages can be recovered from the region subset, (b) from another receiver node in the partition subset if the lost multicast messages cannot be recovered from the region subset, and (c) from the sender node if the lost multicast messages cannot be recovered from the region subset or the partition subset, where the dissemination subsystem and the recovery subsystem can execute substantially simultaneously or sequentially. The application can assign group IP addresses to the multicast message groups and deliver multicast messages to the group IP addresses.
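
The alternate dissemination of paragraph [00031], in which one packet sent to the per-group IP multicast address carries a single copy of the data plus a vector of per-region-view headers, might look like the following sketch; the GroupPacket layout and counter handling are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GroupPacket:
    data: bytes
    region_headers: Dict[str, int] = field(default_factory=dict)  # region view id -> sequence number

def build_group_packet(data, region_view_counters):
    """Assign the next sequence number in every destination region view and attach
    the resulting headers to one shared copy of the data."""
    headers = {}
    for region_view in region_view_counters:
        region_view_counters[region_view] += 1
        headers[region_view] = region_view_counters[region_view]
    return GroupPacket(data=data, region_headers=headers)
```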

[00033] The system of the present embodiment can further include a group view including a group snapshot of receiver IP addresses associated with the multicast message groups when the multicast messages are sent, and a region view including a region view snapshot of the receiver IP addresses associated with the region when the multicast messages are sent. There is a mapping from group views to region views. Group and region snapshots and updates are determined by the GMS and delivered to the nodes which can rely on the GMS information needed for the dissemination and recovery subsystems. The system can even further include a group feed capable of sending a request to the application and receiving the multicast messages as a result of the request. The group feed is maintained by the application, which registers the group feed with a group sender. The group sender can pull messages from the group feed when the group sender is ready to do so. The application can place messages in the group feed to buffer them there, a function that is referred to herein as a push. Alternatively, the application could, for example, register a callback that the group feed can invoke, and in which callback the application can return one or

more messages to the group feed, a function that is referred to herein as a pull. In general, in a push, the application can place a message in a feed, which acts as a message buffer or message queue, and the application may not participate in message sending afterwards. In the pull, on the other hand, the application can configure the feed to fetch messages from a given place.
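
The push and pull styles described in paragraph [00033] can be contrasted in a short sketch; the GroupFeed class and its method names are assumptions, not the disclosed interface. A group sender would call next_message when it is ready to send, which matches the pull-when-ready behavior described above.

```python
class GroupFeed:
    """Push: the application buffers messages in the feed and walks away.
    Pull: the feed invokes an application-supplied callback to fetch messages on demand."""
    def __init__(self, pull_callback=None):
        self.buffer = []
        self.pull_callback = pull_callback
    def push(self, msg):
        self.buffer.append(msg)              # push style: feed acts as a message queue
    def next_message(self):
        if self.buffer:
            return self.buffer.pop(0)
        if self.pull_callback is not None:
            return self.pull_callback()      # pull style: fetch from the application
        return None
```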

[00034] The group sender can send a poll to the group feed and can receive the multicast messages from the group feed as a result of the poll. A group view sender can perform the steps of (1) receiving the request from the group sender, (2) responding to the request, and (3) receiving the multicast message from the group sender in response to the request. The group view sender can also assign a group view sequence number associated with the multicast message group to the message.

[00035] The system can further include at least one group view feed associated with the at least one region view in the group view. There are as many group view feeds as there are region view feeds. The group view feeds are owned by the group view sender and are registered with the region view senders when the group view sender is ready to send data. The group view feed can perform the steps of (1) receiving the request from the group view sender, (2) responding to the request, (3) receiving the multicast message from the group view sender, and (4) assigning a sequence number associated with a region view to the multicast message. The system can even further include a region view sender that can perform the steps of (1) receiving the request from the group view feed, (2) responding to the request, (3) receiving the multicast message from the group view feed, and (4) notifying the group view sender when the request is complete. [00036] The system of the present embodiment can further maintain (1) a regional acknowledgement based on an accumulated receive status for the multicast message of all nodes in the region view if all the nodes in the region view have received the multicast message, (2) a regional negative acknowledgement based on an accumulated receive status for the multicast message of all nodes in the region view if any of the nodes in the region view has not received the multicast message and the message cannot be recovered in the
region, and (3) a partition negative acknowledgement created if no caching server receives the multicast message, where the regional acknowledgement, the regional negative acknowledgement, and the partition negative acknowledgement are made available to the sender node. At least one node in the system can be designated as a caching server that can store the multicast message for the recovery subsystem.

[00037] In the system of the present embodiment, the recovery subsystem can perform the steps of deleting the multicast messages from the caching servers and sending and receiving the token through a unicast protocol. Further, the recovery subsystem can perform the step of sending recovery information derived from the control information, i.e., information that is being circulated in the token, to the sender node, identifying the lost multicast messages, and deleting the recovery information from the control information when the regional acknowledgement has been sent. The recovery subsystem can still further perform the step of including the recovery information with new data received from the sender node into the region. If new messages are received, for example, from a sender node that had not been previously multicasting into the region, the information in the token can include information relevant to the sender node and the messages it is multicasting. [00038] The system of the present embodiment can process I/O events, alarm events, and timer-based events by performing the steps of (1) creating an I/O queue that can handle I/O events from a socket, (2) creating I/O batches from the I/O events and timer-based batches from timer-based events from an alarm queue, (3) creating an I/O event quantum and a timer-based event quantum, (4) draining the multicast messages from the socket if the multicast messages are found on the socket before the I/O batches and the timer-based batches are processed, (5) continuously processing the I/O batches and the timer-based batches in round-robin fashion according to the I/O event quantum and the timer-based event quantum, and (6) waiting for I/O if there are no multicast messages to drain and no I/O batches or timer-based batches to process.
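
A rough sketch of the batch-and-quantum event processing of paragraph [00038] follows. The 100 ms and 50 ms quanta are taken from the illustrative values given later in this description; the queue and function names are assumptions, not the embodiment's interfaces.

```python
# Rough sketch of round-robin, quantum-bounded event processing. Queue names,
# the 100 ms / 50 ms quanta, and the socket-drain step loosely mirror the
# description above; everything else is a simplifying assumption.
import time
from collections import deque

IO_QUANTUM = 0.100       # seconds per I/O batch
TIMER_QUANTUM = 0.050    # seconds per timer batch

def run_core_loop(io_queue, alarm_queue, drain_socket, wait_for_io, stop):
    while not stop():
        drain_socket()                         # pull any queued multicasts off the socket first
        if not io_queue and not alarm_queue:
            wait_for_io()                      # nothing to do: block until I/O arrives
            continue
        for queue, quantum in ((io_queue, IO_QUANTUM), (alarm_queue, TIMER_QUANTUM)):
            deadline = time.monotonic() + quantum
            while queue and time.monotonic() < deadline:
                handler = queue.popleft()      # process one entry of the current batch
                handler()

if __name__ == "__main__":
    done = []
    io = deque([lambda: done.append("io-1"), lambda: done.append("io-2")])
    timers = deque([lambda: done.append("timer-1")])
    run_core_loop(io, timers, drain_socket=lambda: None,
                  wait_for_io=lambda: None, stop=lambda: not (io or timers))
    print(done)
```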

[00039] The recovery subsystem can maintain, but is not limited to maintaining, (1) a received list of all the packets received by nodes in the region, (2) a not acknowledged set of packets that are not yet acknowledged by the region and are stored in the caching server, and (3) an aggregate set of missed packets based on a combination of the packets missed by each node determined by comparing the received list with a list of packets received by each node in the region view. The recovery subsystem can perform the steps of (1) determining, by each node, when missed packets are unlikely to arrive and reporting missed packets as missed, (2) enabling nodes in the region view to provide missed packets to the nodes in the region view that reported missed packets, (3) determining a set of successful packets that all the nodes in the region view have, and a set of aggregate missed packets that all the nodes in the region view have missed, (4) providing an acknowledgement to the sender node from the region for the successful packets, (5) providing a negative acknowledgement to the sender node from the region for the set of aggregate missed packets, and (6) purging the successful packets from the caching server.
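
The aggregation of per-node receive status into a regional acknowledgement, a set recoverable within the region, and a regional negative acknowledgement can be sketched as simple set operations. The function below is illustrative only; its name and the dictionary-based bookkeeping are assumptions.

```python
# Sketch of how per-node receive status could be combined into a regional ACK
# set, an aggregate-missed set, and packets still recoverable from peers.
# Function and variable names are assumptions, not the embodiment's terms.

def aggregate_region_status(transmitted, received_by_node):
    """transmitted: set of sequence numbers sent into the region.
    received_by_node: dict mapping node name -> set of received sequence numbers."""
    successful = set.intersection(*received_by_node.values())      # held by every node
    missed_somewhere = transmitted - successful
    missed_by_all = transmitted - set.union(*received_by_node.values())
    return {
        "regional_ack": successful & transmitted,                  # report to sender, purge caches
        "recover_in_region": missed_somewhere - missed_by_all,     # some peer still has these
        "regional_nak": missed_by_all,                             # nobody has them: ask the sender
    }

if __name__ == "__main__":
    status = aggregate_region_status(
        transmitted={1, 2, 3, 4},
        received_by_node={"43A": {1, 2, 3}, "43B": {1, 2, 4}, "43C": {1, 2, 3, 4}},
    )
    print(status)
```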

[00040] The recovery subsystem can further maintain (1) an intra-partition token having contents based on the token and that can be circulated to all nodes in the partition to collect the control information, (2) an inter-partition token having contents based on the intra-partition token that can be circulated among a plurality of the partitions to collect regional information by aggregating the control information, (3) a set of partition leaders determined for each region by assigning a partition node in each partition as one of the partition leaders, each of the set of partition leaders being configured to cooperatively recover lost multicast messages identified by the control information, and (4) a region leader chosen from the set of partition leaders and configured to recover lost multicast messages identified by the regional information. The recovery subsystem can perform the step of periodically providing regional information to nodes in the region view to determine the set of successful packets. [00041] The nodes in the region view can perform the steps of (1) receiving region membership changes, (2) changing the set of partition leaders based on
region membership changes, and (3) changing the region leader based on the region membership changes. The partition leader can perform the steps of (1) creating an intra-partition token for each partition, (2) updating the intra-partition token with the control information, and (3) passing the intra-partition token among the receiver nodes in the partition. For each visit by the intra-partition token, the nodes in the region view can perform the steps of (1) calculating a largest sequence number among packets in the region view, and (2) setting the largest sequence number as a cutoff point for loss recovery for a succeeding visit by the intra-partition token by including the largest sequence number in the intra-partition token. When the intra-partition token is received, the nodes in the region view can perform the steps of (1) determining, from a comparison of the recovery information in the intra-partition token with receiver node information in the receiver node, lost multicast messages that can be recovered, (2) recovering lost multicast messages by requesting lost multicast messages from other nodes in the partition, and (3) replacing the recovery information with receiver node information. If the intra-partition token is received from a node in the same partition, the nodes in the region view can perform the steps of (1) determining missing packets at the receiving node, (2) requesting missing packets from another node in the partition if the recovery information in the intra-partition token does not include the missing packets and if the missing packets are cached in the partition, (3) sending successfully received packets to the other node if the recovery information includes successfully received packets that do not include the missing packets, and that are cached at the receiver node, (4) updating the aggregate negative acknowledgement, (5) calculating a maximum continuous acknowledged interval that represents the highest sequence number of a numbered message that was, along with all numbered messages having lower sequence numbers, successfully transmitted in the region, and (6) sending the maximum continuous acknowledged interval to the sender node. [00042] The region leader can dynamically regulate a multicast transmission rate from the sender node by performing the steps of (1) collecting a transmission rate from all nodes in the region by use of a
token protocol, (2) calculating minimum, average, and maximum rate estimates among the collected transmission rates, (3) periodically calculating an estimate of a maximum admissible rate for the region based on the maximum rate estimates, (4) dividing the estimate of the maximum admissible rate among all the sender nodes that send the multicast messages into the region, (5) providing the result of the division to the sender node, and (6) setting a multicast rate in the sender node as a maximum of (a) a minimum rate at which nodes in the region view can receive data multicast by the sender node, and (b) a minimum rate necessary to overcome message stagnation. [00043] The receiver node can perform event processing in steps including the steps of (1) organizing the events into batches up to a limit determined by the event quanta, and (2) prioritizing the I/O events according to the steps of (a) processing unicast messages that are received first, (b) processing multicast messages that are received second, (c) processing unicast messages to be sent third, (d) processing multicast messages to be sent fourth, (e) processing disk I/O fifth, (f) processing timer events sixth, and (g) processing down-calls scheduled through a non-blocking queue from other threads last. [00044] An alternate system for multicast message processing of the present embodiment can include, but is not limited to, (1) a region that includes receiver nodes that receive multicast messages addressed to multicast groups, (2) partitions created by dividing the region, and (3) a dissemination subsystem that can perform the step of delivering the multicast messages to the region. The region can perform the step of delivering the multicast messages to the receiver nodes within the region. The alternate system can further include a recovery subsystem and a protocol for aggregating status information about the multicast messages from nodes in the region that are in electronic communication with the dissemination subsystem and the recovery subsystem. The recovery subsystem can perform the steps of using the aggregated status information to recover lost multicast messages and to control when the multicast messages are purged from caching servers. In the alternate system, dissemination for all regions in a group is performed at the same time, while recovery happens on a per-region basis.
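
The rate-regulation steps above, together with the lower-bound variant detailed in paragraph [00064], can be sketched as follows. The function name, the messages-per-second units, and the stagnation floor value are illustrative assumptions.

```python
# Sketch of the region leader's rate calculation: collect per-node receive-rate
# estimates, take the lower bound, divide it among the active senders, and
# apply a floor that prevents stagnation. Names and numbers are illustrative.

def admissible_sender_rate(node_rate_estimates, active_senders, stagnation_floor):
    """node_rate_estimates: messages/sec each receiver reports it can absorb."""
    lower_bound = min(node_rate_estimates)          # slowest receiver constrains the region
    per_sender = lower_bound / max(1, active_senders)
    return max(per_sender, stagnation_floor)        # never drop below the anti-stagnation rate

if __name__ == "__main__":
    rates = [950.0, 1200.0, 870.0]                  # smoothed estimates from three receivers
    print(admissible_sender_rate(rates, active_senders=2, stagnation_floor=10.0))  # 435.0
```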

[00045] Referring now to FIG. 1, system 100 can include, but is not limited to, application 11, which can determine, from multicast message 13, multicast message group 18, determine region 37 having a region subset of IP addresses from multicast message group 18 including the receiver node IP address, assign IP address 24 to region 37 to enable delivery of multicast message 13 to the region subset, and create partition 57 in region 37, where partition 57 includes a partition subset of the region subset of the IP addresses and includes the receiver node IP address. System 100 can also include group feed 15 and group sender 17. Application 11 can create group feed 15 and then locate group sender 17 for that group feed 15. Group feed 15 can send request 27 to application 11 and receive multicast message 13 as a result of request 27 through group feed 15. Group feed 15 can include message request queue 29, message input queue 31, and message output queue 33. Message request queue 29 sends requests 27 to application 11, which responds by filling message input queue 31. Group feed 15 can choose multicast messages 13 from message output queue 33 to provide to group sender
17 in response to poll 67. Group feed 15 can maintain callbacks that it could invoke to obtain multicast messages 13. Group feed 15 can be rate controlled so as to provide messages at a pre-selected rate, for example, a maximum rate. If group feed 15 cannot produce multicast messages 13, group feed 15 can be unregistered from group sender 17, and application 11 can reregister group feed 15 before sending more multicast messages 13 to it. Group sender 17, with which group feed 15 is registered, can send poll 67 to group feed 15 and receive multicast message 13 from group feed 15 as a result of poll 67. Group sender 17 can send multicast message 13 to a specific multicast message group 18, and retrieve multicast messages 13 to send from the various group feeds 15 registered with it.

Group sender 17 can buffer a queue of multicast messages 13 to send, can maintain feeds registered with it that can provide more multicast messages 13, and can maintain requests created for multicast messages 13 that are currently being processed. Group sender 17 can control message flow by, for example, limiting the number of multicast messages 13 that can be concurrently processed.

Group sender 17 can process a new multicast message 13 by determining a
current group view 20, associating the multicast message 13 with group view 20, creating request 27, and directing request 27 to group view sender 19 for processing.
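
The flow control described for group sender 17, which limits how many multicast messages are concurrently processed and pulls more only when capacity frees up, might be sketched as below. The class, its methods, and the in-flight counter are hypothetical simplifications.

```python
# Illustrative sketch of a group sender that polls its registered feeds and
# limits how many multicast requests are in flight at once. All identifiers
# are hypothetical.

class GroupSender:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.feeds = []                     # registered group feeds

    def register(self, feed):
        self.feeds.append(feed)

    def poll_feeds(self):
        """Pull messages from registered feeds while capacity remains."""
        sent = []
        for feed in list(self.feeds):
            while self.in_flight < self.max_in_flight:
                message = feed.next_message()
                if message is None:         # feed exhausted: stop polling it for now
                    break
                self.in_flight += 1         # a request is now being processed
                sent.append(message)
            if self.in_flight >= self.max_in_flight:
                break
        return sent

    def request_done(self):
        self.in_flight -= 1                 # called when a sub-request completes

if __name__ == "__main__":
    class ListFeed:
        def __init__(self, items): self.items = list(items)
        def next_message(self): return self.items.pop(0) if self.items else None

    sender = GroupSender(max_in_flight=2)
    sender.register(ListFeed(["m1", "m2", "m3"]))
    print(sender.poll_feeds())              # ['m1', 'm2'] -- m3 waits for capacity
    sender.request_done()
    print(sender.poll_feeds())              # ['m3']
```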

[00046] Continuing to refer to FIG. 1, system 100 can include group view sender 19, group view feed 35, and region view sender 49. Group view sender 19 can receive request 27 from group sender 17, respond to request 27, and receive multicast message 13 from group sender 17. Group view sender 19 can deliver multicast messages 13 to group view 20. Each multicast message 13 can receive a sequence number, assigned by group view sender 19, and associated with group view 20. Each group view sender 19 can maintain separate sequence numbers.

Group view sender 19 can create sub-requests, one for each region view 39 that group view 20 maps to. The sub-requests can be placed in group view feed 35 that acts as a buffer between group view 20 and the corresponding region views 39. Group view feed 35 can receive request 27 from group view sender 19, respond to request 27, and receive multicast message 13 from group view sender
19. Group view feed 35 can buffer sub-requests associated with request 27 for group view 20 and directed to region view 39. If the buffer is full, group sender 17 can stop accepting new multicast messages 13. Only if all group view feeds 35 associated with region view 39 have space for new multicast messages 13 will group sender 17 start pulling more multicast messages 13 from group feed 15.

Region view sender 49 can receive request 27 from group view feed 35, respond to request 27, receive multicast message 13 from group view feed 35, and notify group sender 17 when request 27 is complete. Region view sender 49 can process sub-requests to deliver multicast messages 13 to region view 39. Each sub-request can be numbered, and the numbering can be within the context of region view 39. Each region view sender 49 can maintain a separate sequence of sub-request numbers. Region view sender 49 can initiate both dissemination subsystem 26 and recovery subsystem 30 for each sub-request, after which time, dissemination subsystem 26 and recovery subsystem 30 can perform processing concurrently. Region view sender 49 can limit the rate at which new sub-requests are picked up from feeds to control resource usage and can match a desired
sending rate for region view 39, as obtained through information in token 55 and delivered to sender node 45. Region view sender 49 can provide each sub-request to region sender 51 for dissemination. Also, region view sender 49 can provide each sub-request to regional controller 25 for delivery to sender node 45 in a loop-back, as well as to collector recovery agent 47 to initiate recovery of multicast message 13. Region view sender 49 can further notify group sender 17 that the sub-request is done when region view sender 49 is notified that multicast message 13 is transmitted and recovery is done. Group sender 17 can notify application 11 that request 27 is fully completed when all sub-requests are done. Region sender 51 can send multicast message 13, for example, unreliably to an IP multicast address of region 37. Each multicast message 13 can be associated with various values, for example, a sequence number within group view 20, as assigned by application 11, and another sequence number associated with region view 39. Regional controller 25 can be a point of entry for all incoming multicast messages 13. Regional controller 25 recognizes incoming multicast messages 13 and processes them according to their characteristics. Collector recovery agent 47 can track recovery status of multicast messages 13 from sender node 45. Collector recovery agent 47 can track, for example, a recovery record that is associated with multicast message 13, and can update the recovery record as sender node 45 receives acknowledgements (ACK) 71 and negative acknowledgements
(NAK) 73 from receiver nodes 43 (also known herein as caching servers 52). Collector recovery agent 47 can notify region view sender 49 when recovery for multicast message 13 completes. [00047] With further reference to FIG. 1, system 100 can further include dissemination subsystem 26 operating simultaneously with recovery subsystem
30. Dissemination subsystem 26 can route multicast message 13 from sender node 45 to region 37 through a plurality of senders and feeds, and can include, but is not limited to including, regional controller 25 which can provide an interface between region view sender 49 and region sender 51, and region sender 51 which can provide multicast message 13 to region 37. Recovery subsystem 30 can pass token 55 among nodes in a region subset and a partition subset of nodes in region
37 to gather control information related to multicast message 13 and can recover, from another receiver node 43B, using the control information, lost multicast messages destined for receiver node 43A in the partition subset if the lost multicast message can be recovered from the partition subset, from a receiver node 43C in the region subset if the lost multicast message cannot be recovered from the partition subset, and from sender node 45 if the lost multicast message cannot be recovered from the region subset or the partition subset. System 100 can include caching servers 52, which are specific receiver nodes 43 that can store multicast messages 13 until they are acknowledged by all the nodes in region 37. [00048] Continuing to refer to FIG. 1, multicast messages 13 that are transmitted to group 18 can be assigned group sequence numbers within group view 20. Group view 20 maps to a set of region views 39, thus determining receiver nodes 43. System 100 can attempt to deliver multicast message 13 to receiver nodes 43 by creating, for each request to send multicast message 13 to group 18, referred to as a group request, a set of sub-requests as described above, referred to as regional requests, one for each region 37, and can process the sub-requests independently. Though region view 39 may be logically decomposed into several partitions 57, a single IP address can be used to transmit multicast message 13 to region view 39. [00049] Continuing to still further refer to FIG. 1, system 100 includes a regional multicast protocol that proceeds as follows. Packets can be transmitted to region 37 using an IP address 24 assigned to region 37. Nodes in region 37 can recover from packet losses from peers within region 37, using a token ring protocol described later to acknowledge received packets and to provide negative acknowledgements for dropped packets. Receiver nodes can also provide feedback that sender node 45 can use to adjust its multicast rate for maximum throughput. Sender node 45 can buffer messages to accommodate a typically higher latency of acknowledgements to sender node 45. To reduce the number of buffers sender node 45 maintains, and to allow peer-to-peer recovery among receiver nodes 43, for every multicast message 13, a preselected number of nodes can be designated as caching servers 52 that can cache the received data for the
purpose of peer-to-peer loss recovery, until all nodes in region 37 have acknowledged the received data. Sender node 45 can stop buffering the received data including multicast message 13 when the received data have been acknowledged according to information in all caching servers 52. [00050] Continuing to still further refer to FIG. 1, as receiver nodes 43 request to join groups 18, groups 18 are formed. For each group 18, system 100 can access group views 20 generated by a conventional GMS 3 which represent versions of groups 18. As more receiver nodes 43 request to join or leave group 18, group view 20 is updated, so that each group 18 has a current group view 20 associated with it. Each group view 20 includes a set of receiver nodes 43 referred to as group members. A current set of receiver nodes 43 included in group view 20 is fixed at the time group view 20 is created. [00051] Continuing to yet still further refer to FIG. 1, one or more regions
37 are associated with group 18. As receiver nodes 43 join and leave the configuration, the assignment of receiver nodes 43 to regions 37 also changes.

Thus, region view 39 provides a snapshot of region 37 at a particular time that determines a set of receiver nodes 43 in region view 39. System 100 also maintains a mapping from group views 20 to region views 39. Specifically, for each group view 20, there is a list of one or more region views 39 associated with it. This association is created when group view 20 is created. The association is such that if a group view X has region views Y_1, Y_2, ..., Y_K associated with it, the list of receiver nodes 43 in all region views Y_1, Y_2, ..., Y_K can be the same as or a superset of the list of receiver nodes 43 in group view X. [00052] Continuing to even yet still further refer to FIG. 1, to reduce the likelihood of dropping packets and to make the system 100 capable of overcoming periods of instability triggered by various external events, system 100 can prioritize operations, for example, in order to maintain regularity of token circulation, by giving control information a high processing priority. In the illustrative embodiment, all I/O completion events are assigned to one of several priority queues based on their type. The actual processing of events in the queues is deferred until no new I/O completion events are available.

Processing then continues in the order of decreasing priority. In the illustrative embodiment, processing priorities are, for example, (1) process all the unicast traffic, which represents the control information such as, for example, token 55, ahead of processing multicast messages 13 to make the system more stable, and (2) process received data before initiating any new transmissions in order to minimize packet loss. In addition to prioritizing, system 100 can provide for relative synchrony, defined herein as nodes maintaining an up-to-date view of the state of their peers in order to avoid unnecessary forwarding. [00053] Referring now primarily to FIG. 2, in system 100 (FIG. 1), recovery from packet loss associated with regional requests occurs on a region-by-region basis. If sender node 45 fails, receiver nodes in each region view 39 recover multicast message 13 among themselves, and it can happen that some region views 39 deliver multicast message 13 but others do not. Conventional higher-level protocols can be used to manage such a failure case. In regional requests, a region sequence number is assigned to multicast message 13 within region views
39, and system 100 attempts to deliver multicast message 13 until every receiver node in region view 39 either has acknowledged multicast message 13 or is reported by the GMS as faulty. In order to provide a structure for token passing, loss recovery and caching mechanisms, each region view 39 is divided into multiple partitions 57. Consistency among the region views 39 is provided by the conventional GMS.

[00054] Referring now primarily to FIG. 3, system 100 (FIG. 1) can include core thread 64 and a single I/O completion port, referred to herein as I/O queue 61, that is created to handle all I/O events, including confirmation of packets received, completed transmissions and errors, for unicast and multicast traffic, for sockets 63 created by system 100. Sockets 63 and I/O queue 61 can be included in kernel 12. The completion port shown here as I/O queue 61 can be realized in any way that provides for all I/O events initiated by or related to the system of the present embodiment to be available for retrieval by using a call in a synchronous, blocking or non-blocking manner. Core thread 64 can continuously process events from I/O queue 61 as well as from its own alarm queue 65, which can be, for example, a priority queue implemented as a splay tree in which core thread 64 can store timer-based events, and
from a lock-free queue of application requests (not shown). The lock-free queue can be used to establish communication between a potentially multi-threaded application and a single-threaded implementation such as the system of the present embodiment, and can be implemented by conventional "compare-and-swap" style operations. I/O events, alarm events, and application requests (such as a request of the application to register a feed it created for a given group with the group sender for that group) can be processed in a round-robin fashion, and in batches to minimize overhead. A quantum of, for example,
100ms for a batch of I/O events and 50ms for a batch of timer events, can be used, but any suitable quantum is within the scope of the present embodiment. Additionally, in order to minimize losses, when received packets are found on socket 63, socket 63 may be synchronously drained of some or all the received packets queued on it before any other processing takes place. In the situation where there are no events to process, the core thread 64 can await I/O in a blocking system call. In the situation where application 11 places a request on the lock-free queue of application requests while core thread 64 is in a blocking system call, a special, custom I/O operation can be initiated to resume core thread 64 so that it can process the request.
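
The downcall path of the preceding paragraph can be approximated with a small sketch. In place of the lock-free queue and the special custom I/O operation, the sketch assumes a thread-safe queue and a loopback datagram socket as the wake-up mechanism; all names are illustrative.

```python
# Sketch of the downcall path: application threads post requests and resume a
# core thread that is blocked awaiting I/O. The embodiment describes a
# lock-free (compare-and-swap) queue and a custom I/O operation; this uses a
# thread-safe queue and a self-addressed UDP datagram as a simplified stand-in.
import queue
import socket
import threading

requests = queue.SimpleQueue()

# The core thread blocks on this socket when it has nothing to do.
wakeup = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
wakeup.bind(("127.0.0.1", 0))

def post_downcall(fn):
    """Called from application threads: enqueue a request and resume the core thread."""
    requests.put(fn)
    wakeup.sendto(b"x", wakeup.getsockname())   # stand-in for the special I/O operation

def core_thread(iterations):
    for _ in range(iterations):
        wakeup.recv(1)                          # blocking wait for I/O (or a wake-up)
        while True:
            try:
                requests.get_nowait()()         # run all pending application requests
            except queue.Empty:
                break

if __name__ == "__main__":
    t = threading.Thread(target=core_thread, args=(1,))
    t.start()
    post_downcall(lambda: print("feed registered with group sender"))
    t.join()
```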

[00055] Continuing to refer to FIG. 3, a single socket 63, for example, can be used for all incoming unicast and multicast traffic, executing, for example, protocols based on the conventional User Datagram Protocol (UDP). Multiple sockets 63 could be used as well. A pull scheme can be used in which a component that intends to send data creates an output channel by registering a data feed, several of which are described with respect to FIG. 1, which can include a callback that returns objects to be transmitted. Data can be pulled from the feed asynchronously, as rate control and resource limitations permit. When no more data are available, the feed can be internally deactivated to eliminate useless polling. The component can then signal the feed when new data are available in order to reactivate the feed. Wrappers can be used that allow application 11 to
use a push interface, e.g. a SendMessage call. Buffering and message batching can be used as well.

[00056] Continuing to refer primarily to FIG. 3, multithreaded applications can be supported by system 100 (FIG. 1). Calls made by application 11 to core thread 64, referred to as downcalls, e.g. to signal a channel, can be implemented by posting special requests to I/O queue 61 and/or inserting them into, for example, nonblocking queues, implemented with, for example, conventional compare-and-swap (CAS) operations. Calls made by core thread 64 to application 11, referred to as upcalls, e.g. to pull data from a feed, can be made directly. Serialization can be used in which arbitrary types of objects can implement a serialization interface that requires a class to have a uniquely assigned number, and objects to be able to store their contents by appending data to a header or buffers to a scatter-gather pool, and to load an object's contents from a given header and buffer. [00057] Referring now to FIG. 4, a system with large numbers of overlapping groups typically has a much smaller number of regions 37 of overlap. In FIG. 4, eighteen nodes have subscribed to three hundred groups but these overlap in just seven regions 37. Regions 37 of overlap may be created in a way such that nodes in region 37 are members of similar groups, in which case regions 37 exhibit properties that are referred to as fate and interest sharing. Interest sharing arises when nodes in region 37 can be configured to receive the same or almost the same multicast messages 13. In FIG. 4, sender node 45, instead of multicasting in two hundred groups, might do so in six, and can pack small messages targeted to a given region 37 into larger packets, increasing network utilization and reducing the receiver overhead. Fate sharing arises when nodes in a region 37 experience similar workload, suffer from similar bursts of traffic, and often miss the same packets, for example when they are dropped at sender node 45 or by a switch or other network device on the path from sender node 45 to part of communications network 23 where most nodes in region 37 are located. Caching multicast messages 13 in regions 37, for example, in caching servers 52, allows for recovering missing messages from other nodes within region 37 on a
peer-to-peer basis. Peer-to-peer recovery and reporting aggregate status to sender node 45 can protect system 100 from situations such as those when all nodes in region 37 miss a packet. In this situation, rather than having the nodes individually seek a retransmission from sender node 45, system 100 reports their aggregated loss to sender node 45, which can retransmit the packet in a single IP multicast.
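
The collapse of many overlapping groups into a small number of regions can be illustrated by grouping nodes with identical subscriptions. The sketch below is a simplified assumption about how such a mapping could be computed and is not the embodiment's group membership service.

```python
# Sketch of regions of overlap derived from group subscriptions: nodes that
# subscribe to the same set of groups fall into the same region. This is a
# simplified illustration of the FIG. 4 idea, not the actual mapping algorithm.
from collections import defaultdict

def regions_of_overlap(subscriptions):
    """subscriptions: dict node -> set of group names. Returns region -> nodes."""
    regions = defaultdict(list)
    for node, groups in subscriptions.items():
        regions[frozenset(groups)].append(node)   # identical membership => same region
    return regions

if __name__ == "__main__":
    subs = {
        "n1": {"g1", "g2"}, "n2": {"g1", "g2"}, "n3": {"g1", "g2"},
        "n4": {"g2", "g3"}, "n5": {"g2", "g3"},
        "n6": {"g3"},
    }
    for groups, nodes in regions_of_overlap(subs).items():
        print(sorted(groups), "->", nodes)
    # Six nodes and three groups collapse into three regions of overlap.
```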

[00058] Referring now to FIG. 5, loss recovery in system 100 (FIG. 1) can be accomplished by a token passing protocol in which nodes in partition 57 form an intra-partition token ring. Partitions 57 within region 37, in turn, form a single regional inter-partition token ring. A selected node, for example a node with the smallest address among all those nodes in each partition 57 that have not failed, is partition leader 77, and one partition leader 77 in region 37 is also region leader 75. Periodically, region leader 75 generates token 55 (FIG. 1) to travel across region 37. Region leader 75 (acting as partition leader 77) circulates token 55 around its own partition as intra-partition token 81. After intra-partition token 81 is received by partition leader 77, partition leader 77 passes intra-partition token 81 to another partition leader 77 in another partition 57 as an inter-partition token 79. Inter-partition token 79 circulates around partition 57 as intra-partition token 81, then again intra-partition token 81 is passed to another partition leader 77. This process continues until partition leader 77 of the last partition 57 in region 37 passes intra-partition token 81 back to region leader 75. Tokens 55 are passed using a conventional TCP-like reliable unicast protocol. [00059] Continuing to refer primarily to FIG. 5, tokens 55 (FIG. 1) serve the following purposes: (1) determine which packets were received throughout region 37, and which of those packets are unlikely to be in transit; (2) report missed packets to other nodes in region 37 and exchange packets among nodes in region 37, where both a push model, in which a node forwards packets without request from another node as it learns that the other node is missing packets, and a pull model, in which a node may request forwarding of packets if it learns that another node has them, can be used to exchange packets among nodes; (3) determine which packets have been received by all nodes throughout region 37 and report
the received packets to sender node 45 as collective regional ACK 72 (FIG. 2); (4) determine which packets were missed by all nodes responsible for caching and are not recoverable and report those packets to sender node 45 as a collective regional NAK 74 (FIG. 2); and (5) distribute information about packets that can be purged from cache, and packets that are unlikely to be in transit and thus should be actively recovered in a peer-to-peer manner.
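
The bookkeeping a token might carry per sender in order to serve these purposes can be sketched as a small record. The field names below are assumptions chosen for readability, not terms of the embodiment.

```python
# Sketch of the per-sender state a circulating token might carry to serve the
# purposes listed above. Field names are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class SenderTokenState:
    sender: str
    cutoff: int = 0                                  # highest sequence number known in the region
    nak: set = field(default_factory=set)            # packets missed at the node last visited
    regional_ack: int = 0                            # all packets up to here received by every node
    regional_nak: set = field(default_factory=set)   # missed by every caching node
    purgeable: set = field(default_factory=set)      # safe to drop from caching servers

if __name__ == "__main__":
    state = SenderTokenState(sender="45", cutoff=120, nak={117, 119}, regional_ack=110)
    print(state)
```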

[00060] Referring now primarily to FIGs. 2 and 5, system 100 (FIG. 1) can include a single token 55 (FIG. 1) per region 37 (FIG. 2) for all sender nodes 45 (FIG. 1). Data relative to each sender node 45 actively multicasting into region 37 can occupy a part of token 55. Thus, when more sender nodes 45 multicast into region 37, the size of token 55 can grow, but the rate at which control packets circulate can stay the same. Information for a given sender node 45 can be included in token 55 when it multicasts data into region 37 and can be deleted from token 55 when all packets known to have been transmitted are acknowledged by all nodes in region 37, until some new data are received.

Herein, the term token is used to describe the portion of a circulating data structure occupied by a specific sender node 45. Note that the leadership, partition leader 77 (FIG. 5) and region leader 75 (FIG. 5), might change as the GMS notifies receiver nodes about changes in membership in group 18 (FIG. 1) and region 37 (FIG. 2). Nodes can update their status based on information from the GMS and can assume leadership in a distributed manner. Since updates from the GMS may arrive at different nodes at different times, there might occasionally be multiple self-elected leaders, which is entirely consistent with the operation of system 100. A node that the GMS considers as dead in region 37 can be excluded by other nodes in region 37 from participating in token passing and packet recovery. If the node that is considered dead is actually operating, the node can rejoin system 100 through the GMS. In order to determine which packets have been transmitted, nodes use token 55 to calculate, each time token 55 is circulated to a node of partition 57, the largest sequence number among packets received in region 37. That largest sequence number, known as a cutoff point for loss recovery, is distributed through token 55 the next time it circulates through the
nodes in partition 57 and region 37. While token 55 is circulating, nodes in region 37 consider as missing all packets with this or lower sequence numbers that they have not yet received. The natural delay in circulation of token 55 can insure the reception of recently transmitted packets that might still be buffered by some nodes in their receive queues.
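
The two-round cutoff rule just described, in which the largest sequence number observed during one circulation becomes the loss-recovery cutoff for the next, can be sketched as follows. The function and its per-node sets are illustrative simplifications.

```python
# Sketch of the two-round cutoff rule: the largest sequence number seen
# anywhere in the region during one circulation becomes the cutoff point for
# loss recovery during the next circulation. Names are illustrative.

def circulate_once(nodes, previous_cutoff):
    """nodes: list of sets of received sequence numbers (one set per node)."""
    observed_max = previous_cutoff
    missing_reports = {}
    for index, received in enumerate(nodes):
        observed_max = max(observed_max, max(received, default=0))
        # Packets at or below the *previous* cutoff that a node lacks are treated as lost.
        missing_reports[index] = {s for s in range(1, previous_cutoff + 1) if s not in received}
    return observed_max, missing_reports   # observed_max becomes the next round's cutoff

if __name__ == "__main__":
    region = [{1, 2, 3, 5}, {1, 2, 4, 5}, {1, 2, 3, 4, 5, 6}]
    cutoff, _ = circulate_once(region, previous_cutoff=0)     # first round: learn max = 6
    cutoff2, missing = circulate_once(region, previous_cutoff=cutoff)
    print(cutoff, missing)   # node 0 is missing {4, 6}, node 1 is missing {3, 6}
```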

[00061] Continuing to refer primarily to FIGs. 2 and 5, when receiver node
43 (FIG. 1) generates or receives token 55 (FIG. 1), receiver node 43 creates a NAK 73 (FIG. 2) representing the ranges of packets with numbers up to the cutoff point cached in partition 57 (FIG. 2) and missed at receiver node 43 and, if token 55 was received from a predecessor node in the same partition 57, receiver node
43 compares its NAK 73 with a similar NAK 73 placed in token 55 by the predecessor node. All NAKs 73 reported by the predecessor node but not reported by receiver node 43 are forwarded, for example, by a push, by receiver node 43 to the predecessor node. All NAKs 73 reported by receiver node 43 but not by the predecessor node are to be requested, for example, by a pull, from the predecessor node. In the latter case, receiver node 43 can, for example, send a single pull request with a list of NAK ranges. If token 55 is to be passed to another node in the same partition 57, receiver node 43 can store its NAK 73 in token 55. In the illustrative embodiment, there is one such NAK 73 in token 55, but the scope of the embodiment allows for other possibilities. In the illustrative embodiment, push and pull requests can be issued between neighbors on the ring to control the size of token 55. It is thus possible for receiver node 43 to pull a packet from a predecessor node, and at the same time have it forwarded by a successor node on the ring. It is also possible for receiver node 43 to either pull or push the packet, depending on the information stored in token 55.
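
The neighbour comparison described above reduces to two set differences: packets the predecessor reported missing but the receiver has are pushed, and packets the receiver reports missing but the predecessor did not are pulled. A minimal sketch, with hypothetical names:

```python
# Sketch of the neighbour comparison: when a node receives the intra-partition
# token, it compares its missing set with the one its predecessor stored in the
# token, pushing what the predecessor lacks and pulling what it lacks itself.

def plan_exchange(my_missing, predecessor_missing):
    push = predecessor_missing - my_missing   # I have these; forward them unasked
    pull = my_missing - predecessor_missing   # predecessor has these; request them
    return push, pull

if __name__ == "__main__":
    mine = {104, 107}
    predecessor = {101, 107}
    push, pull = plan_exchange(mine, predecessor)
    print("push to predecessor:", push)   # {101}: reported by it but not by me
    print("pull from predecessor:", pull) # {104}: reported by me but not by it
```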

[00062] Referring now primarily to FIG. 5, the rules described with respect to token passing, referred to herein as a token protocol, can insure recovery of the cached packets among nodes within a single region 37 (FIG. 2) provided that the packet was delivered to at least one of the caching servers 52 (FIG. 1). To recover from packets missed by the whole partition 57 and therefore not cached on any of caching servers 52, nodes in partition 57 can use token 55 to calculate
the intersection of their NAKs 73, which is then included in a regional NAK 74 (FIG. 2) forwarded to sender node 45 (FIG. 1) by partition leader 77 (FIG. 5). Each partition 57 can report its own losses independently. While processing token 55, receiver node 43 can also create NAKs 73 for packets cached in other partitions 57 and can send pull requests to partition leaders 77 of other partitions
57, for example, in the form of one request per partition 57 containing a compressed list. These requests are satisfied, for example, as soon as data are available. Partition leaders 77 usually hold the most up-to-date packets, and because a push involves simply transferring data between neighbors, while a pull requires that data be requested first, forwarding towards partition leader 77 can be more efficient. However, choosing a random node in every partition 57, for example, is another solution within the scope of the present embodiment. Token 55 can be used to calculate the maximum value such that all packets with numbers up to this value are stable within region 37. The maximum value can be sent by region leader 75 (FIG. 5), in the form of ACK 71
(FIG. 2), to sender node 45 (FIG. 2), and can also be circulated in token 55 to indicate which messages are to be purged from the caching server caches. Token 55 can also optionally be used to calculate a list of numeric ranges, representing the set of sequence numbers of packets that are stable within region 37. ACK 71 sent to sender node 45 may contain a list of numeric ranges, as opposed to a single numeric value, to facilitate a more efficient cleanup at the cost of more expensive processing.

[00063] Optimization possibilities can include, but are not limited to, (1) controlling the number of NAK ranges reported by nodes; (2) delaying requests for a given packet by nodes outside of the packet's caching partition until the packet is stable on all caching servers 52 on which the given packet is cached in order to (a) control the amount of packet state information required, (b) insure that pull requests can be satisfied immediately, and (c) simplify handling of the case where the same packet is requested twice in different token rounds; and (3) generating token 55 once per second, for example. In the illustrative embodiment, control traffic, including, for example, token 55, can be handled by a
reliable TCP-like unicast protocol, and can be rate-controlled. The rate can be set to a value that minimizes the impact it might have on multicast traffic. [00064] Referring now primarily to FIGs. 1 and 5, in the illustrative embodiment, the rate of multicast message transmission can be controlled by an adaptive rate control scheme in which sender node 45 (FIG. 1) adjusts its multicast rate based on a value received from nodes in region 37 (FIG. 1) and representing the lower bound on the rate at which all nodes in region 37 can receive multicast messages 13 (FIG. 1) from sender node 45. The value can be a function of a smoothed estimate maintained by each receiver node 43 (FIG. 1) in region 37 that represents the rate at which receiver node 43 is currently receiving new, previously unseen packets, the smoothed estimate computed, for example, by a moving average of the node's rate history calculated in fixed intervals. Region leader 75 (FIG. 5) can periodically calculate a minimum of these smoothed estimates and divide the minimum by the number of sender nodes 45 currently multicasting into region 37. This minimum can be sent with ACK 71
(FIG. 2) to all sender nodes 45, and it represents a lower bound on the currently available bandwidth. Each sender node 45 can then set the rate at which it sends multicast messages 13 as the maximum of (1) a pre-determined minimum rate necessary to kick system 100 out of stagnation after a long period of inactivity or a series of massive losses, and (2) one plus a growth coefficient, multiplied by the previously computed minimum, where the growth coefficient can represent a tradeoff between the latency at which the adaptive scheme of system 100 tunes up to the maximum available rate and losses resulting from tuning it too high. This rate can grow over time because it is slightly higher than the rate declared by the slowest receiver node 43. Further, the actual capacity of region 37 is higher than the rate communicated to sender node 45, thus possibly reducing loss rates. Sender node 45 is tuned according to a monotonically increasing function of a parameter that is adjusted periodically, for example, at most once per second, and is itself a function of the measured actual sending rate, calculated by sender node 45. In the illustrative embodiment, for example, (1) there is an "internal" control component that can adjust the sending rate based on a supplied parameter. It can
be any component, provided that it has the property that the sending rate it produces is a function f(a) of the supplied parameter a, where f is monotonically increasing. The credit system described below is only one of many possible ways such functionality could be included in system 100; (2) there is an "external" control component that can "tune" the internal component. The external component can be required if no single internal component can send at a rate f(a) exactly equal to a desired rate a. The external component can attempt to tune the internal component by asking it to send at a rate slightly higher than the desired rate. The external component is an adaptive scheme that works as follows. First, its output can be equal to a supplied desired rate. Then, its output can be periodically changed. Specifically, the output (which is the parameter that the external component provides to the internal component) can be periodically increased by the product of a parameter "inertia" and a value obtained by taking the difference between the number one and the proportion of the desired sending rate to the actual, observed sending rate. The parameter "inertia" can allow for a tradeoff between the stability of the mechanism and the speed at which system 100 can adjust the sending rate to match the maximum capacity of receiver nodes 43. [00065] Further, message sending can be metered according to the credit system, for example, sending a message consumes a credit out of a fixed pool of a pre-determined size, and sender node 45 (FIG. 1) can send multicast messages 13 (FIG. 1) if the credit level is positive. Credits can be recreated over time. When the credit level is less than a pre-selected threshold, a timer can be set for a recovery interval that is based on the minimum of (1) a high credit pool size minus the current credit level, and (2) the maximum number of credits to recover, the minimum being divided by the multicast send rate. When the timer expires, the number of credits can be increased by the time that has elapsed multiplied by the multicast send rate. If the credit level is still below the high credit pool size, the timer can be reset using the above formula until that credit level is reached.
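
The credit pool of paragraph [00065] can be sketched as a simple meter in which each send consumes a credit and credits regenerate over elapsed time at the multicast send rate. The class below, its constants, and the monotonic-clock refill are illustrative assumptions.

```python
# Rough sketch of credit-based send metering: each send consumes one credit,
# and credits are replenished over elapsed time at the multicast send rate.
# The constants and names are assumptions for illustration only.
import time

class CreditMeter:
    def __init__(self, pool_size, send_rate):
        self.pool_size = pool_size       # "high credit pool size"
        self.send_rate = send_rate       # credits regenerated per second
        self.credits = float(pool_size)
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.credits = min(self.pool_size,
                           self.credits + (now - self.last_refill) * self.send_rate)
        self.last_refill = now

    def try_send(self):
        """Return True if a message may be sent now (consumes one credit)."""
        self._refill()
        if self.credits > 0:
            self.credits -= 1
            return True
        return False

if __name__ == "__main__":
    meter = CreditMeter(pool_size=3, send_rate=1000.0)
    print([meter.try_send() for _ in range(5)])   # first few succeed, then the pool runs dry
```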

[00066] Referring now primarily to FIG. 6, method 200 can include, but is not limited to, the steps of determining, from a multicast message 13 (FIG. 1), at least one multicast message group 18 (FIG. 1) having receiver IP addresses 24 (FIG. 1) for receiver nodes 43 (FIG. 1), determining at least one region 37 (FIG. 1) having a region subset of IP addresses 24 from the at least one multicast message group 18, assigning an IP address 24 to the at least one region 37 to enable delivery of the multicast message 13 to the region subset, creating at least one partition 57 (FIG. 1) in the region 37, the at least one partition 57 including a partition subset of the region subset of the IP addresses 24, and simultaneously initiating a dissemination subsystem 26 (FIG. 1) and a recovery subsystem 30
(FIG. 1). The dissemination subsystem 26 can route the multicast message 13 from a sender node 45 to the at least one region 37 through a plurality of senders and feeds. The recovery subsystem 30 can pass a token 55 (FIG. 1) among receiver nodes 43 in the region subset and the partition subset to gather control information related to multicast message 13, and can recover, using the control information, at least one lost multicast message destined for receiver node 43 from another receiver node 43 in the partition subset if the at least one lost multicast message can be recovered from the partition subset, from another receiver node 43 in the region subset if the at least one lost multicast message cannot be recovered from the partition subset, and from sender node 45 if the at least one lost multicast message cannot be recovered from the partition subset or the region subset.

[00067] Method 200 (FIG. 6) can be, in whole or in part, implemented electronically. Signals representing actions taken by elements of the system can travel over electronic communications media and from node to node in communications network 23 (FIG. 1). Control and data information can be electronically processed and stored on computer-readable media such as medium 53 (FIG. 1). Method 200 can be implemented to execute on a node in computer communications network 23. Common forms of computer-readable media, such as medium 53, can include, but are not limited to, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CDROM or any other
optical medium, punched cards, paper tape, or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, or any other memory chip or cartridge, a carrier wave, electronic signal, or any other medium from which a computer can read.

[00068] Although various embodiments have been described, it should be realized that a wide variety of further and other embodiments are within the scope of this disclosure.

[00069] What is claimed is: