Title:
SYSTEM AND APPARATUS FOR A GUARANTEED EXACTLY ONCE PROCESSING OF AN EVENT IN A DISTRIBUTED EVENT-DRIVEN ENVIRONMENT
Document Type and Number:
WIPO Patent Application WO/2018/234265
Kind Code:
A1
Abstract:
The present invention relates to an event-driven processing system for ensuring exactly-once processing of an event in a data grid, the system comprising a grid of nodes providing processing and in-memory data, the system further comprising a cache module provided by the data grid, an event processing module provided by the data grid, and a reactive stream module connected to but not provided by the grid, and configured to execute an event when instructed by the event processing module, wherein: (a) the cache module comprises at least a cache for storing identifying information about events to be processed in a node ("to-handle cache"); (b) the event processing module comprises at least a first node comprising i. a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system; and ii. a processing interceptor logic ("notifier") for copying the entity reference to the reactive stream; and (c) the reactive stream module comprises i. a processing module ("reactive stream") for distributed execution of a request, ii. an entity specific logic for updating the first and/or second cache after an event has been processed and related transactions have been committed; and iii. a control logic for verifying data in the first cache relating to a specific event to be processed against committed transactions, and for rolling back any transaction for events no longer present in the to-handle cache prior to commitment, such that an event entity is transacted exactly once.

Inventors:
GOOVAERTS DANIEL (BE)
GRIMBERS PAUL (BE)
Application Number:
PCT/EP2018/066175
Publication Date:
December 27, 2018
Filing Date:
June 19, 2018
Assignee:
THE GLUE (BE)
International Classes:
G06F9/46
Foreign References:
US20130151889A1 (2013-06-13)
US20150242138A1 (2015-08-27)
Attorney, Agent or Firm:
HGF BV (NL)
Claims

1. An event-driven processing system for ensuring guaranteed exactly-once event processing in a data grid, the system comprising a grid of nodes providing processing and in-memory data, the system further comprising a cache module provided by the data grid, an event processing module provided by the data grid, and a reactive stream module connected to but not provided by the grid, and configured to execute an event when instructed by the event processing module, wherein:

(a) the cache module comprises at least a cache for storing identifying information about events to be processed in a node ("to-handle cache"); and optionally, a cache for storing the data of the event entities to be processed in a node ("event cache");

(b) the event processing module comprises at least a first node comprising

i. a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system; and

ii. a processing interceptor logic ("notifier") for copying the entity reference to the reactive stream; and

(c) the reactive stream module comprises

iii. a processing module ("reactive stream") for parallel execution of events,

iv. an entity specific logic for updating the first and/or second cache after an event has been processed and related transactions have been committed; and

v. a control logic for verifying that data in the to-handle cache relating to a specific event to be processed is still present for a transaction to be committed, and for rolling back any transaction for events no longer present in the to-handle cache, such that an event entity is transacted exactly once.

2. A system according to claim 1, wherein cache module (a) comprises at least

i. a first cache for storing identifying information about events to be processed in a node ("event cache"); and

ii. a second cache for storing data associated with the scheduling and handling of the event handled by the node ("to-handle cache").

3. A system according to claim 1 or claim 2, wherein the at least first node comprises

iii. a request processor for receiving a request from the request handler and identifying an event to be processed based on the request; and

iv. an internal in-memory cache ("changed entity list") for listing an event id, data associated with the event identifier and event run time, and data associated with whether the node is designated a primary node scheduled to execute the event, or as a secondary backup node; and

v. processing interceptor logic ("notifier") for transferring the identifying information about the event from the first event (entity) cache to the second (to-handle) cache, and for copying identifying information about an event entity from the second (to-handle) cache to the "changed entity list".

4. A system according to any one of claims 1 to 3, wherein the processing module comprises

vi. an event dispatching logic ("dispatcher") for dispatching the event data to an event commitment and execution module specific for a given event, and

vii. event specific logic for assessing if data concerning a specific event to be executed is still present in the second cache, and logic for updating the second cache when an event transaction is about to be committed.

5. A system according to any one of the previous claims, wherein on each update in the second cache, the data in the second cache are written to disk.

6. A system according to any one of the previous claims, wherein the cache module also comprises a third cache ("entity archive") for maintaining an archive of completed events, and configured to hold data placed by the processing module logic on a completed event, after removal from the first cache.

7. A system according to any one of the previous claims, wherein the cache module also comprises a fourth entity cache ("dead letter cache") for storing data relating to events that could not be executed due to a system-external factor, for reference by an operator.

8. A system according to any one of the previous claims, further comprising an entity specific logic to initiate a roll back of a transaction that is about to be committed, but where no corresponding entry is found in the to-handle cache.

9. A system according to any one of the previous claims, wherein the node is a first node and the system further comprises one or more second nodes comprising a separate cache ("Changed entity list cache"), processing interceptor logic, and event transfer logic, wherein the node holding the primary copy ("the primary node") of a referenced event is determined as the primary active node in the second ("to-handle") cache, whereas the one or more nodes holding a secondary copy are determined as the secondary or back-up nodes that are not executing the referenced event.

10. A system according to claim 9, wherein on each update in the first and/or the second cache, a recovery logic copies data in the changed entity list cache of the first node into the changed entity list of the one or more nodes holding a secondary copy ("the secondary or back-up nodes"), to update the data in their internal cache accordingly from the first and/or second cache.

11. A system according to any one of the previous claims, further comprising watchdog logic which is configured, at a regular time interval or on a predetermined event, to verify all listed entity events in the changed entity list and to compare them with the data in the second "to-handle" cache, and to retrigger the processing interceptor logic in the node that is designated as holding the primary copy of the reference entity stored in the changed entity list cache, preferably after a predetermined time, more preferably at a time interval suitable to guarantee prompt processing by the reactive stream, yet more preferably in an interval of from 1 to 20 seconds.

12. A system according to claim 11, wherein each node comprises a watchdog logic, the watchdog logic configured to check the entries on the to-handle cache, and to determine whether the node in which the watchdog is activated holds the primary copy of the referenced entity, the age of the entry reference, and whether the entry reference is on the reactive stream; and wherein, if the age of the entry reference is smaller than a predetermined maximum age and the entry reference is not yet on the reactive stream, the watchdog logic directly places the reference on the reactive stream.

13. A system according to any one of the previous claims, wherein on each update in one or more of the first and the second cache, the data in the first and the second cache are written to disk.

14. A system according to any one of the previous claims, wherein the grid comprises logic configured to promote a secondary node to a primary node if an event is found that should have been handled by the primary node.

15. A system according to any one of the previous claims, wherein the watchdog logic is configured to directly add an event to the reactive stream to ensure prompt processing.

16. A system according to any one of the previous claims, wherein the reactive stream comprises a dispatcher logic configured for splitting the reactive stream into distributed parallel streams, wherein each stream sequentially handles the events relating to a specific referenced entity.

17. A system according to any one of the previous claims, wherein the reactive stream comprises an entity processor logic configured to verify if there are overdue entity details in the second cache that need to be processed, and to move the entity reference directly to the transaction in the reactive stream.

18. A system according to claim 17, further comprising a processor configured to move an entity reference to a dead letter cache for the attention of an operator if an issue external to the system prohibits the completion of a transaction.

19. A system according to claim 17 or claim 18, further comprising a processor configured to start entity specific logic to initiate a roll back of the transaction if there is no corresponding entity in the to-handle cache.

20. A system according to any one of the preceding claims, further configured to synchronously copy data to a second and/or further node located at a different location to ensure stability of the system.

21. A system according to claim 20, wherein the at least a second node is located on a server at a first predetermined physical distance of at least 10 km away from the server running the first node and first data cache.

22. A system according to claim 20 or 21, further configured to provide, and to asynchronously copy data to a further secondary node at a second predetermined physical distance, preferably on a server located at a distance of more than 100 km away, to be verified by an asynchronous recovery logic.

23. A process for providing a singular event for a given event request or external event, the request comprising at least one type of event, wherein an event has a beginning and an end and comprises a plurality of actions triggered by the external events or events translated from user requests received in a system according to any one of claims 1 to 22, the process comprising: i. providing a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system;

ii. providing a first grid cache for storing identifying information about events to be processed in a node;

iii. providing a second, to-handle cache for storing data associated with the scheduled events;

iv. providing at least a first node, each comprising a request processor for receiving a request from the request handler and identifying an event to be processed based on the request; and preferably providing a secondary node scheduled to contain an inactive backup version of the information about events to be processed in the primary node;

v. providing processing interceptor logic for starting the processing steps on saving of identifying information about an event in the second cache; and

vi. providing a reactive stream to commit and complete the transactions related to the event entity; and

vii. providing event transfer logic for transferring the identifying information about an event from the first cache to an entity archive cache on completion of the processing steps associated with the event; and

viii. providing a control logic and an entity specific logic for rolling back any transactions where there is no corresponding entry on the to-handle cache prior to commitment.

24. The process according to claim 23, further comprising

(a) storing identifying information about an event to be processed in at least a first node in the first cache, and storing data on which node is to be determined as the node holding the primary copy of the event in the second cache, and instructing the event processing module comprising the at least a first node to activate a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system; and

(b) copying the entity reference to the reactive stream by a processing interceptor logic ("notifier"); and

(c) dispatching the entity reference to the reactive stream module for committing transactions, and upon completion of the event transactions, transferring the data from the first and/or second cache to an entity archive cache, and rolling back any transaction for events no longer present in the first cache, such that an event entity is transacted once only.

25. The process according to claim 24, wherein step (a) includes writing the subsequent changes in the status information to the first and/or second cache, and wherein a node is determined as a primary node for performing an event, and wherein at least a second node is determined as secondary node that is not active to perform the same event, and wherein the first and second cache are updated accordingly.

26. The process according to any one of claims 23 to 25, wherein if the primary node is not active or is deactivated before executing a transaction, the grid logic promotes a secondary node to primary status, configured to transact the event as a primary node.

27. Machine executable instructions that when executed by at least one processor cause the at least one processor to perform the process of any one of claims 23 to 26.

28. A non-transitory machine readable medium comprising machine executable instructions according to claim 27.

29. Machine readable storage storing machine executable instructions according to claim 27.

30. Use of a system according to any one of claims 1 to 21 to ensure prompt event execution even in case of a partial system failure, while ensuring a singular execution of an event.

Description:

System and Apparatus for a Guaranteed Exactly Once Processing of an Event in a Distributed Event-Driven Environment

Field of the Invention

Aspects of the present invention relate to data storage and retrieval, and more particularly to a method, system and computer program product for the creation and deployment of a specific data event in an in-memory data grid or similar data storage arrangement, particularly for an institution possessing large amounts of data in mission critical systems. The invention particularly relates to a front-end interface with a handler connected to a node, storing data concerning events captured by an event interface, to provide a more rapid response by the system and apparatus to a customer's request, and guaranteed exactly-once execution of the actions related to the event.

Background of the Invention

In many environments one of the most important features is to ensure that a running application continues to run, and executes an order or activity precisely once, even in the event of one or more system or software faults. So-called mission critical systems in telecommunications, military, navigational, financial and other real-time processing systems rely on their inbuilt redundancy, or replication, to repair and avoid faults, and to ensure a singular, i.e. only-once, execution of any specific request.

While in traditional monolithic applications faults, e.g. of servers, networks or subsystems, are accounted for by having a physical backup system or application replica take over and continue to provide the service, this does not ensure that a particular transaction or event has been processed and completed, which may result in a doubling or omission of a certain action once a server or replica has taken over. Failure of a server or another part of the computing system, moreover, requires a system restart, whereby the status of processes has to be reconstructed from the back-up data that has been collected, and hence is typically slow and costly to implement.

Redundancy to avoid such issues can also be achieved by running a complete system in parallel. In such linearly and sequentially structured systems, issues can be corrected quite easily, and hence it can be ensured that a process resulting from a particular request is performed only once. However, these processes are unsuitable for high data volumes and high event traffic.

In recent years, distributed computing, allowing for parallel processing through the use of microservice architecture, such as for instance in an Event Driven Architecture (EDA), has become popular. EDA has now become more widely applied, in particular among users with large amounts of data and increasing numbers of transactions. Such users include utilities, e.g. electricity, gas, heat and water providers; telecom providers; and transport providers, both traditional, such as railway systems, and more recent developments, such as the navigation systems of e.g. autonomous cars; as well as processes for in-time planning of production; logistic processes such as traffic planning for trains, subways, waterways, bridges, air control and the like; electricity or natural gas provision; and the provision of health services. Banking institutions also stand out in this respect, as they operate large databases and need to handle multiple requests by a variety of different users, and many external events.

All of these large scale and high data traffic processes have in common that they critically depend on the accuracy of the data that effects changes in the system, and are hence mission critical.

In particular, it is critical in such processes that a certain transaction is singular, i.e. it is performed once, and only once.

However, as Event-based architectures inherently decouple system components, the implementation of fault tolerant systems in a system with EDA has been particularly challenging vis-à-vis the traditional linear architecture. In EDA systems, instead of being able to manage transactions inside the boundaries of a single monolithic application, the system must be able to manage consistency using transactions that are distributed across a network of many different applications and databases.

Due to the decentralised nature of distributed computing, in particular in systems using a data grid, issues may arise with the recognition of errors and with securing execution in case of a partial system failure: the interrelated nature of the grid makes it more difficult to spot and correct errors, as the distributed state means that diagnostics and error handling are more complex than in sequential, synchronous, predictive, transactional centralized systems.

In an event driven system, an application can be notified that an event has happened and then take action. However, if something goes wrong, e.g. a machine goes down while the event is happening, then the event handling is stopped. Even if the machine is recovered, the processing of the event would then not be complete, because the event would have been recorded as having already happened, and any records of the event and the processing states would be lost. Examples of such systems include the system and process disclosed in US2013/151889, whereby the system only comprises a primary system, which may cause a data overload when rolling back transactions.

Accordingly, there is a need for improved data processing systems, allowing distributed computing and event driven architecture, as well as a guaranteed and stable process to ensure that a certain event activity, or data event, is guaranteed to be performed, but only once.

Summary of the Invention

Accordingly, the present invention relates to an event-driven processing system for ensuring a singular event in a data grid, the system comprising a grid of nodes providing processing and in-memory data, the system further comprising a cache module provided by the data grid, an event processing module provided by the data grid, and a reactive stream module connected to but not provided by the grid, and configured to execute an event when instructed by the event processing module, wherein:

(a) the cache module comprises at least a first cache for storing identifying information about events to be processed in a node ("to-handle cache");

(b) the event processing module comprises at least a first node comprising

i. a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system; and

ii. a processing interceptor logic ("notifier") for copying the entity reference to the reactive stream; and

(c) the reactive stream module comprises

i. a processing module ("reactive stream") for parallel execution of an event,

ii. an entity specific logic for updating the at least first and, optionally, second cache after an event has been processed and related transactions have been committed; and

iii. a control logic for verifying data in the first cache relating to a specific event to be processed with transactions that are to be committed by the reactive stream, and for either removing the data from the stream for an event that is committed, or for rolling back any transaction for events no longer present in the first cache, such that an event entity is transacted exactly once.

The improved system preferably is compatible with an existing data processing apparatus and system of the institution. The present invention also advantageously relates to a process for providing a singular event for a given event request or external event, the request comprising at least one type of event, wherein an event has a beginning and an end and comprises a plurality of actions triggered by the external events or events translated from user requests received in the subject system, the process comprising:

i. providing a request handler for receiving a request corresponding to an event associated with a number of processing steps to be taken in the system;

ii. providing a first grid cache for storing identifying information about events to be processed in a node;

iii. providing a second cache for storing data associated with the scheduled events, i.e. the so-called to-handle cache;

iv. providing at least a first node each comprising a request processor for receiving a request from the request handler and identifying an event to be processed based on the request; and preferably providing a secondary node scheduled to contain an inactive backup version of the information about events to be processed in the primary node;

v. providing processing interceptor logic for starting the processing steps on saving of identifying information about an event in the first cache;

vi. providing a reactive stream to commit and complete the transactions related to the event entity;

vii. providing event transfer logic for transferring the identifying information about an event from the first cache to an entity archive cache on completion of the processing steps associated with the event; and

viii. providing a control logic and an entity specific logic for rolling back any transactions that had been duplicated.

In a further aspect the present invention relates to machine executable instructions that when executed by at least one processor cause the at least one processor to perform the subject process. In a further aspect the present invention relates to a non-transitory machine readable medium comprising machine executable instructions according to the subject process. In a further aspect the present invention relates to machine readable storage storing machine executable instructions according to the present process. In yet a further aspect, the present invention also relates to the use of a system according to the invention to increase the stability of the system by ensuring that an event is handled once and only once.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The present invention is further described in the detailed description which follows in reference to the noted plurality of drawings by way of non-limiting examples of embodiments of the present invention in which like reference numerals represent similar parts throughout the several views of the drawings and wherein:

FIG. 1 is a flow diagram illustrating a process that makes use of the architecture, from an applications standpoint, of a preferred embodiment of a server in the system and apparatus of this invention. Preferably the system and apparatus of this invention comprise at least two parallel servers, one being a back-up for the other, each with its own front-end interface, event interface, back-end interface, and plurality of nodes.

FIG. 2 is a flow diagram of the operation of the front-end interface or request handler of the system and apparatus, and the different entities that are active. The request handler communicates with front end applications for receiving external requests from users to allow the users to access services provided via the system and apparatus. Figure 2 illustrates the specific working of the subject system and process, depicting the most relevant components.

FIG. 3 is a flow diagram of the operation of the watch dog process that guards processing of the events in case of failure of one or more servers and puts the events on the reactive stream.

FIG. 4 is a flow diagram of the operation of the event interface of the system and apparatus of this invention, whereby different nodes handle a mission critical request, allowing for a singular execution, and providing an interlock that avoids doubling of activities.

The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The following paragraph illustrates the meaning of the reference numerals in the figures:

Figure 1:

107 external request handler

121 connective node

Figure 3 (watchdog flow):

based on a timer, start watch dog; for each entry reference on the "changed entry list": if the node holds the primary copy of the referenced entity, the age of the entry reference is smaller than the maximum age, and the reference is not yet on the reactive stream, put the reference on the reactive stream (Watch Dog, Scheduler)

Figure 4:

402 Entity creator

403 Create entity

404 Persist entity

405 Start transaction

406 Put entity in entity cache

407 Start transaction

413 Interceptor logic

414 After commit

416 Dispatcher

417 Entity Processor

419 After completion

420 To control cache

421 Transaction monitor

422 Put entity reference in 'to-handle-cache' with time stamp (moment of creation and number of attempts =0)

423 Puts entity reference in changed entity list

424 Commit Transaction

425 Commit Transaction

426 For each reference on the « changed entity list »

427 Node holds primary copy of the referenced entity

428 Copy entity reference to reactive stream

431 Remove entity details from entity cache

432 Empty « changed entries list »

435 Group entity references by Entity-ID and move each group to a dedicated stream for that entity-ID

436 When there are entity references on the stream, the reactive forwarder passes the entities to the entity processor and removes them from the stream

437 Start transaction

438 Start transaction

440 number of attempts < maximum number of attempts

441 Move entity reference from 'to-handle-cache' to 'dead-letter-cache'

442 Commit transaction

443 Commit transaction

444 Updates to caches

445 Increment number of attempts

446 Start the entity specific logic

447 Entity specific logic

448 Entity specific logic Reports « error »?

450 Roll back of transaction

452 Entity reference is still in « To-handle cache »

453 Remove Entity reference from « To-handle cache »

454 Move entity from « Entity cache » to « Entity archive cache »

455 Commit Transaction

456 Commit Transaction

457 Roll back of Transaction

458 Roll back of Transaction

Detailed Description of the Invention

In an event driven system, an application can be notified that an event has happened and then take action. If something goes wrong, e.g. a machine goes down while the event is happening, then the event handling is stopped. Even if the machine is recovered, the processing of the event wouldn't complete, because the event has "already happened" and any records of the event and the processing states would be lost. For example, if an event occurs at 12:00, the processing starts at 12:01 (but does not complete), and the system goes down at 12:01 and is not recovered until 12:02, then the event would not be processed. For most systems the event is not considered of high importance, so there is no method in place to manage this: as illustrated in the figures, if the system goes down then the activity is not performed. Event sourcing is a generic mechanism for recording events, but handling of events may occur more than once or not at all.

The present system advantageously guarantees processing of an event, and only once, while using minimal redundancy.

When a system is notified of an event, the event is immediately stored in a cache, and on disk, so that it does not get lost.

The present system permits the following: After a failure, the system can start processing, look at the series of events which have happened but which have not yet been handled, and restart processing these events. As soon as the event has happened, it is stored permanently in memory and then written to a disk. This advantageously combines features of the in-memory data grid, as the grid allows fast storage and so stores as soon as the event has occurred, and then, if there is a failure, the system knows exactly what has not been processed, with features of a linear system, e.g. a reactive stream that does not run inside the grid.

The data grid advantageously further allows for an event to be written to the in-memory data grid and stored to a disk automatically. By storing what has been processed, it is possible to check which events have not been processed and find which ones need processing.
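As a rough illustration of this store-first idea, the following minimal sketch (plain Java; the EventStore class, the tab-separated journal format and all method names are invented for this example and are not the patent's implementation) journals each event to disk before any processing starts, so that after a restart the journal can be replayed to find exactly which events were stored but never handled:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EventStore {
    private final ConcurrentHashMap<String, String> toHandleCache = new ConcurrentHashMap<>();
    private final Path journal;

    public EventStore(Path journal) { this.journal = journal; }

    /** Store first, process later: the event survives a crash of this node. */
    public synchronized void accept(String eventId, String payload) throws IOException {
        toHandleCache.put(eventId, payload);   // in-memory "not yet handled" marker
        Files.writeString(journal, "EVENT\t" + eventId + "\t" + payload + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Called only after processing has completed and committed. */
    public synchronized void markHandled(String eventId) throws IOException {
        toHandleCache.remove(eventId);
        Files.writeString(journal, "DONE\t" + eventId + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** After a restart: replay the journal to find stored-but-unhandled events. */
    public List<String> recoverUnhandled() throws IOException {
        Map<String, String> pending = new LinkedHashMap<>();
        for (String line : Files.readAllLines(journal)) {
            String[] f = line.split("\t", 3);
            if (f[0].equals("EVENT")) pending.put(f[1], f[2]);   // stored
            else pending.remove(f[1]);                           // handled
        }
        return List.copyOf(pending.keySet());
    }
}
```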

Unless otherwise stated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fibre cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Preferably, cache module (a) comprises at least a first cache for storing identifying information about events/entities to be processed in a node ("event/entity cache"); and a second cache for storing data associated with the scheduling and handling of the event handled by the node ("to-handle cache").

Preferably, the at least first node comprises a request processor for receiving a request from the request handler and identifying an event to be processed based on the request; an internal in-memory list ("changed event list") for listing an event, data associated with the event identifier and event run time, and data associated with whether the node is designated a primary node scheduled to execute the event, or as a secondary backup node; and a first processing interceptor logic ("notifier") for transferring the identifying information about the event entity from the first event (entity) cache to the second (to-handle) cache, and for copying identifying information about an event entity from the second (to-handle) cache to the "changed entity list".
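A hedged sketch of such a notifier in plain Java (the EntityRef record, the use of ConcurrentHashMap as a stand-in for a grid cache, and all names are assumptions made for illustration): on save, the entity reference is placed in the to-handle cache and on the node-local changed entity list; after commit, each listed reference is pushed to the reactive stream and the list is emptied:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class Notifier {
    /** An entity reference: id, creation time stamp, and number of attempts (cf. 422). */
    record EntityRef(String entityId, long createdAt, int attempts) {}

    private final ConcurrentHashMap<String, EntityRef> toHandleCache;  // grid-wide stand-in
    private final Deque<EntityRef> changedEntityList = new ArrayDeque<>(); // node-local

    Notifier(ConcurrentHashMap<String, EntityRef> toHandleCache) {
        this.toHandleCache = toHandleCache;
    }

    /** Interceptor hook: invoked when an event entity is saved in the entity cache. */
    synchronized void onEntitySaved(String entityId) {
        EntityRef ref = new EntityRef(entityId, System.currentTimeMillis(), 0);
        toHandleCache.put(entityId, ref);  // "still to be handled" marker
        changedEntityList.add(ref);        // picked up by the after-commit logic
    }

    /** After commit: push each reference to the reactive stream, then empty the list. */
    synchronized void afterCommit(Consumer<EntityRef> reactiveStream) {
        changedEntityList.forEach(reactiveStream);
        changedEntityList.clear();         // the "after complete" step
    }
}
```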

Also, new events may be created as part of the processing of an earlier event. These new events are then processed in the same way, i.e. they are placed into the appropriate caches, and from there the processing starts.

Preferably the processing module comprises an event dispatching logic ("dispatcher") for dispatching the event entity data to an event commitment and execution module specific for a given entity, and entity specific logic for assessing if data concerning a specific event entity to be executed is still present in the first and/or second cache, and logic for updating the first and/or second cache after an event has been completed.

Preferably, on each update in the first and the second cache, the data in the first and the second cache are written to disk. Preferably, the cache module also comprises a third cache ("entity archive") for maintaining an archive of completed events, configured to hold data placed by the processing module logic on a completed event, after removal from the first cache. Preferably, the cache module also comprises a fourth entity cache ("dead letter cache") for storing data relating to events that could not be executed due to a system-external factor, for reference by an operator. Preferably, the system further comprises an entity specific logic to initiate a roll back of a transaction relating to a previously completed event, and to update the first cache accordingly.

Preferably, the node is a first node and the system further comprises one or more second nodes comprising a separate cache ("Changed entity list cache"), processing interceptor logic, and event transfer logic, wherein the node holding the primary copy ("the primary node") of a referenced event is determined as the primary active node in the second ("to handle") cache, whereas the one or more nodes holding a secondary copy are determined as the secondary or back-up nodes that are not executing the referenced event.

Preferably, on each update in the first and/or the second cache, a recovery logic copies the data in the changed entity list cache of the first node into the changed entity list of the one or more nodes holding a secondary copy ("the secondary or back-up nodes"), which update the data in their internal cache accordingly from the to-handle cache.

The recovery logic may thus select "to be handled" events for which the node on which the recovery logic runs is holding the primary copy of the event, and may place them on the changed entity list.

Preferably, the recovery logic runs on every node, so also on the nodes where a backup copy is held. However, on the "backup" node, the "backed up" events are not selected, so no duplicate processing occurs. Only if that back-up node is made primary does the watch dog select the scheduled events from the to-handle cache, to be processed in the now-primary node.

Preferably, the system further comprises watchdog logic configured, at a regular time interval or on a predetermined event, to verify all listed entity events in the changed entity list cache, to compare them with the data in the second cache ("to-handle cache") and the first cache, and to retrigger the processing interceptor logic in the node that is designated as holding the primary copy of the reference entity stored in the changed entity list cache, preferably after a predetermined time, more preferably at a time interval suitable to guarantee prompt processing by the reactive stream, yet more preferably in an interval of from 1 to 20 seconds.

Preferably, each node comprises a watchdog logic, the watchdog logic configured to check the entries on the second cache, and to determine whether the node in which the watchdog is activated holds the primary copy of the referenced entity, the age of the entry reference, and whether the entry reference is on the reactive stream; and configured, where the age of the entry reference is smaller than a predetermined maximum age and the entry reference is not yet on the reactive stream, to place the reference directly on the reactive stream.
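The watchdog check just described might be sketched as follows (illustrative only: it reuses the EntityRef record from the notifier sketch above, and the primaryOwner function is an assumed stand-in for the grid's partition-ownership lookup):

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

class Watchdog {
    private final String nodeId;
    private final Map<String, Notifier.EntityRef> toHandleCache;
    private final Set<String> onStream;                 // entity ids already on the stream
    private final Consumer<Notifier.EntityRef> reactiveStream;
    private final long maxAgeMillis;

    Watchdog(String nodeId, Map<String, Notifier.EntityRef> toHandleCache,
             Set<String> onStream, Consumer<Notifier.EntityRef> reactiveStream,
             long maxAgeMillis) {
        this.nodeId = nodeId;
        this.toHandleCache = toHandleCache;
        this.onStream = onStream;
        this.reactiveStream = reactiveStream;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Run on a timer (e.g. every 1-20 s); primaryOwner maps an entity id to its primary node. */
    void runOnce(Function<String, String> primaryOwner) {
        long now = System.currentTimeMillis();
        for (Notifier.EntityRef ref : toHandleCache.values()) {
            boolean primaryHere = nodeId.equals(primaryOwner.apply(ref.entityId()));
            boolean youngEnough = now - ref.createdAt() < maxAgeMillis;
            boolean notOnStream = !onStream.contains(ref.entityId());
            if (primaryHere && youngEnough && notOnStream) {
                reactiveStream.accept(ref);             // put the reference back on the stream
            }
        }
    }
}
```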

Preferably, on each update in the first and the second cache, the data in the first and the second cache are written to disk.

Preferably, the watchdog logic is configured to directly add an event to the reactive stream to ensure prompt processing. Preferably, the reactive stream comprises a dispatcher logic configured for splitting the reactive stream into distributed parallel streams, wherein each stream sequentially handles the events relating to a specific referenced entity. Preferably, the reactive stream comprises an entity processor logic configured to verify if there are overdue entity details in the second cache that need to be processed, and to move the entity reference directly to the transaction in the reactive stream.

The system and method according to the present invention also comprise grid logic configured to promote a secondary node to a primary node if an event is found that should have been handled by the primary node, wherein the promotion is updated in the second cache. Once the secondary node has been promoted to become a primary node, the watchdog logic running in that new primary node will then pick up the events listed in the to-handle cache, and convey them to processing.
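A minimal sketch of that promotion step (illustrative only; primary ownership is modelled as a plain map, where the real system would update the grid's ownership metadata in the second cache):

```java
import java.util.Map;

class GridPromotionLogic {
    /** Re-point every entity whose primary copy was on the failed node at its backup node. */
    static void promoteBackup(Map<String, String> primaryOwnerByEntity,
                              String failedNode, String backupNode) {
        primaryOwnerByEntity.replaceAll((entityId, owner) ->
                owner.equals(failedNode) ? backupNode : owner);
        // The watchdog running on backupNode will now select these entries from the
        // to-handle cache on its next pass and convey them to processing.
    }
}
```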

Preferably, the system further comprises a processor configured to move an entity reference to a dead letter cache for the attention of an operator if an issue external to the system prohibits the completion of a transaction.

Preferably, the system further comprises a processor configured to start entity specific logic to initiate a roll back of the transaction, and to update the first cache accordingly.

The term Event-Driven Architecture (EDA) relates to a process and system that permits implementing multistage business processes that deliver goods, services and information with minimum delay.

EDA is based on distributed processing, whereby "nodes", i.e. microservices, react to incoming events, and publish events in response. Event channels then transport events from one node to the next, usually asynchronously. This means the system is quicker and responds faster, as a reaction follows as soon as an event is triggered, and usually no central response is required to further the process. EDA is often used in conjunction with in-memory data grids (IMDG) to store data that is more frequently used or requested, as it allows for fast scaling. The IMDG in turn may be used in conjunction with traditional databases. Keeping the more frequently used or accessed data in the cache of the IMDG enables faster data access, as the data is accessed from memory rather than from the database, which also reduces stress on the database. Also, IMDGs have the advantage of providing caches which are distributed over a number of nodes, thereby acting as a channel and allowing a faster exchange of events.
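The cache-aside behaviour described here can be illustrated in a few lines (a generic sketch, not tied to any particular IMDG product; the database function is a placeholder for the slow backing store):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CacheAside<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database;              // slow backing store

    CacheAside(Function<K, V> database) { this.database = database; }

    V get(K key) {
        // computeIfAbsent hits the database only on a cache miss;
        // repeated reads are served from memory, reducing database load
        return cache.computeIfAbsent(key, database);
    }
}
```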

As all of the processing logic is collated in a single large user application, namely the grid, there is always the risk of some code or action being corrupted or incomplete when introducing a change, leading to a potential total or partial failure of the entire system.

The term "event based processing" as used herein preferably means event-driven data programming that has an application flow control determined by events or changes in state, event based processing is an ordered structure or chain of events and functions. It provides various connectors that allow alternative and parallel execution of processes. Furthermore it is specified by the usages of logical operators, such as OR, AND, and XOR. Event based processing requires non-local semantics, i.e., the execution behaviour of a particular node within an event based processing stream may depend on the state of other parts of the event based processing.

The term "activity" or "function" as used herein preferably means an active component of an event based processing that has decision-making authority and that typically consumes time and resources.

The term "event" preferably means, in accordance with DIN 69900, a condition of an event based processing that has occurred and causes a sequence of activities. An event is a passive component of an event based processing and has no decision-making authority. Events can trigger activities.

The term "node" as used herein preferably means a self-contained storage node of an event based processing, especially a node interconnected with other nodes in a storage grid so that any node can communicate with any other node without the data having to pass through a centralized switch. Each node contains its own storage medium, microprocessor, indexing capability, and management layer. Because event based processing requires non-local semantics, the execution behaviour of a particular node within an event based processing may depend on the state of other nodes of the event based processing, possibly far away. A cluster of several nodes may share a common switch, but each node is also connected to at least one other node cluster. Nodes are individual parts of a larger data structure, such as linked lists and tree data structures.

A node, in this regard, may be considered to include any subset of the available soft- and hardware resources in a data processing system, and generally, though not necessarily, may include one or more processors, one or more memory devices, and in some instances additional hardware resources such as input/output (I/O) resources, network resources, or other types of resources capable of being allocated to a logical partition. Nodes in some embodiments may be implemented as multi-chip modules, circuit boards or cards, books, drawers, racks, slots, chassis, or even combinations thereof.

Nodes may also be self-contained computer systems in some embodiments. In embodiments where event-driven optimization is used to optimize data stability or power consumption, nodes may also be defined based upon groupings of hardware resources that can be separately powered on, powered off, or otherwise controlled to increase or decrease power consumption.

Thus, from the perspective of the herein-described embodiments, a node may include at least a collection of processors and memory that may be powered down, or powered completely off, separately from other nodes.

Preferably nodes are stored in grids. Grid storage introduces a new level of fault-tolerance and redundancy. If one storage node fails or a pathway between two nodes is interrupted, the network can reroute access another way or to a redundant node. This reduces the need for online maintenance, which practically eliminates downtime. Also, the multiple paths between pairs of nodes ensure that a storage grid can maintain optimum performance under conditions of fluctuating load. Also, grid storage is scalable: if a new storage node is added, it can be automatically recognized by the rest of the grid. This reduces the need for expensive hardware upgrades and downtime. Suitable software packages for grid storage for handling the stored data are available from Gemfire, Hazelcast, XAP Gigaspaces and GridGain. In this regard, the software package used preferably acts as a data access and processing layer between the application and the data storage, and uses memory (RAM) as the primary data repository for data being processed, rather than a disk.

Normally, grid storage systems for handling stored data involve the use of only one data cache, accessible by all nodes. However, as not all the nodes in the system and apparatus of this invention have the same functionality, each separate event handled by the system and apparatus of this invention must only end up in the nodes that can handle it. To this end, the system and apparatus of this invention have different dedicated data caches for each event type, and each of these data caches is only accessible by the nodes that have the code for this event type. GridGain, for example, offers technology to have data stored in a dedicated data cache and distributed only to a predetermined set of nodes. When data is put in a data cache, it is sufficient just to inform GridGain of the name of the dedicated data cache. In this regard, the names of each set of dedicated data caches can be stored and accessible to GridGain in a configuration cache.

In the system and apparatus of this invention, each dedicated data cache is provided with a specific name for each of the events according to the invention for which it is storing data. The nodes which can handle a specific event put the dedicated data cache name in a first entity cache, and into three other caches, all arranged in the grid. When a node starts up, it registers itself with the grid and requests that the grid provide access to the dedicated data cache with the specific name of the event handled by the node.

The grid cache is also informed when nodes start up and which are available. The dedicated cache contains the event type and the event name. By each node registering itself in the grid, the grid knows from the configuration cache which nodes should be able to access the dedicated data cache with the specific event name. The request handler knows the name of the dedicated data cache by looking in the to-handle cache. When the request handler wants to send the request to the nodes, it puts it in the correct named dedicated data cache, and the grid then knows to which nodes it can possibly send the request. Using the dedicated data caches and the to-handle cache in this way, the request handler only needs to know the name(s) of the dedicated data cache(s) for each request. The request handler also designates a node as primary node, i.e. holding the primary active copy of the event.

Nodes that are designated to have a secondary, back-up copy of the event will be termed "secondary" or back-up nodes.
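The naming, registration and routing scheme described above might be sketched roughly as follows (all class and method names are invented for illustration; a BlockingQueue stands in for a dedicated data cache, and none of this reflects GridGain's actual API):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class EventRouting {
    // event type -> dedicated data cache name (the "configuration cache")
    private final Map<String, String> configCache = new ConcurrentHashMap<>();
    // cache name -> nodes registered as able to handle that event type
    private final Map<String, Set<String>> registrations = new ConcurrentHashMap<>();
    // cache name -> the dedicated data cache itself
    private final Map<String, BlockingQueue<String>> dedicatedCaches = new ConcurrentHashMap<>();

    void defineEventType(String eventType, String cacheName) {
        configCache.put(eventType, cacheName);
        dedicatedCaches.putIfAbsent(cacheName, new LinkedBlockingQueue<>());
    }

    /** A starting node registers itself for the event types it has code for. */
    void registerNode(String nodeId, String eventType) {
        String cacheName = configCache.get(eventType);
        registrations.computeIfAbsent(cacheName, k -> ConcurrentHashMap.newKeySet())
                     .add(nodeId);
    }

    /** The request handler only needs the cache name, never the node list. */
    void route(String eventType, String request) throws InterruptedException {
        dedicatedCaches.get(configCache.get(eventType)).put(request);
    }
}
```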

"Events" as used herein preferably means the actions (e.g., authorisations, verifications, back- end, queries, etc.) occurring in an event-based processing, triggered by external requests by customers, or by the processing of earlier events. For instance, in the case of a banking environment, herein, an event entity of a customer can occur between a front-end security layer and a back-end data processing layer of a bank, as follows: A customer's event entity can be broken down in multiple small events, such as: 1. his/her checking his bank account to know its balance; 2. subsequently starting to make a payment from his/her account; 3. subsequently confirming and finalising the payment whereby an event occurs and the event ends. A event in accordance with this invention could be, for example an event for a bank account: Each entry and the current balance of a customer, as well as each authorisation, verification, querying back-end etc. occurring internally, triggered by events and external requests, is recorded and updated and all changes can be viewed; an event of a consent status: Each change of a client's consent to do something on his behalf, e.g., by a trader, is recorded and updated and all changes can be viewed; an event of energy management, e.g., electricity/gas metre: Each gas and/or electricity metre reading is recorded and all changes can be viewed; a event of computer/phone backups: All system backups are recorded and all changes can be viewed so that the reason for any problem with computer/phone can be identified.

Figure 1 illustrates a preferred embodiment of the apparatus and system (101) including an external event (102) handler (107). A user terminal (103) for sending requests and receiving responses is linked through the security layer (104) to the front end interface (106). The front end interface comprises one or more request handlers (105) that check the request. The request handler also has access to the Entity ID and Entity Data Cache, and to the grid, to route the request to a node (413) with appropriate event processing logic. A similar line-up may exist for different events, whereby a separate event handler may effectively be considered a different kind of node.

The node preferably resides in a container and comprises a request processor (415), an event processor (418), and logic to process a particular event entity. A primary node and one or more secondary nodes of the same type share the event and data caches, which are accessible only to nodes of this type (416), whereas all nodes may share the dynamic configuration cache (411).

Once an event is processed, the event processor negotiates and communicates with the communication event data cache and the reactive stream (425) through a communication node (422). The system thus links a back end, and provides a front end for users, and is highly scalable as additional nodes can be added, and operated using the dynamic configuration cache in a grid.

Figure 2 shows a process flow in the system, i.e. when a request comes in and the request handler forwards the request to the grid. The grid provides the following caches: an Entity Cache (209A); a To-Handle Cache (209B), shown with various events distributed over two nodes; a Dead Letter Cache (209C); and an Entity Archive Cache (209D). The system further comprises a Transaction Monitor (221), an observable repository (220), After Completion logic (212) and After Commit logic (214), a watchdog logic (211), a Notifier (213) and an in-memory Changed Entity List (210), which is kept in memory in the node and contains all data from the "to-handle cache". The system further comprises a reactive stream processing system comprising a Dispatcher (216) and, per journey ID, a dedicated entity processor. Preferably, events are targeted to a specific journey instance. Events for different journey instances may, for instance, be processed in parallel, but events for the same journey instance will preferably be handled sequentially, as illustrated in the sketch below. This is done by the dispatcher logic, which moves the events to a new reactive stream only containing events for the same journey instance. The request handler finds a request processor, executes the request process as soon as possible and stores the request for future reference. The request processor then translates the request into an event and puts the event in a cache; as part of the processing, the event is preferably persisted to disc.
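As an illustrative sketch only (class and method names are hypothetical), per-journey sequential handling with cross-journey parallelism can be approximated with one single-threaded executor lane per journey instance:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class Dispatcher {
    // One single-threaded lane per journey instance: events for the same
    // journey are executed in submission order; different journeys run in parallel.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    void dispatch(String journeyId, Runnable eventProcessing) {
        lanes.computeIfAbsent(journeyId, id -> Executors.newSingleThreadExecutor())
             .submit(eventProcessing);
    }
}
```

Each lane plays the role of the dedicated reactive stream containing only events for one journey instance.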

As part of this transaction, the event reference is also put in the "to-handle cache" and on the "changed entity" list. When this transaction commits, the "after commit" logic is triggered. For each event on the "changed entity" list, the notifier is triggered on the node which holds the primary copy of the event. The notifier then places the event on the reactive stream of that node. When this has been done for all the entries on the "changed entity" list, the "after completion" logic is triggered, which empties the "changed entity" list.

Thus, the event process is initiated by the reactive stream and, once completed, the event is put in the first grid cache and written to disc or to another non-transitory memory, rather than to a queue, so that data is never lost. The primary event node executes the actual event process steps, from the event interceptor to passing the event on to the reactive stream.

During this period, a recovery logic (not shown in this diagram) preferably ensures that, if a node crashes due to a technology problem and events are not handled, the event will be picked up and handled by the processor when the system recovers. As soon as a new event comes in, it is stored in the first cache and the interceptor is triggered. If processing is interrupted, the event would not be processed again on its own, since the event is already in the cache but no trace exists of its actual execution. The recovery logic, also referred to herein as the watchdog, therefore checks at a regular time interval whether the events in the cache have been handled; if they have not, it retriggers the interceptor and places older events directly onto the reactive stream.

Figure 3 shows how the watchdog logic operates in more detail: a Scheduler (319) starts the watchdog logic (311), based on an order given by a timer. The watchdog then verifies (302), for each entry reference on the "Changed entry list" (310), whether the node it operates in holds the primary copy of the referenced entity, whether the age of the entry reference is smaller than the maximum age, and whether the reference is not yet found on a reactive stream; if these conditions are not all met, it reverts to checking the "Changed entry list". If the age is above the maximum and the primary node is active, the watchdog puts the reference on the reactive stream directly.
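A hedged sketch of this sweep (the PrimaryChecker and ReactiveStream interfaces are hypothetical stand-ins for the grid and stream machinery):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class Watchdog {
    private final Map<String, Instant> changedEntryList = new ConcurrentHashMap<>();
    private final Duration maxAge;
    private final PrimaryChecker primary;
    private final ReactiveStream stream;

    Watchdog(Duration maxAge, PrimaryChecker primary, ReactiveStream stream) {
        this.maxAge = maxAge;
        this.primary = primary;
        this.stream = stream;
    }

    /** Scheduler: runs the watchdog sweep at a regular interval. */
    void start(ScheduledExecutorService scheduler, long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::sweep, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    private void sweep() {
        Instant now = Instant.now();
        changedEntryList.forEach((entryRef, createdAt) -> {
            boolean tooOld = Duration.between(createdAt, now).compareTo(maxAge) > 0;
            // Only act on entries whose primary copy lives on this node, whose
            // age exceeds the maximum, and which are not yet on a stream.
            if (tooOld && primary.holdsPrimaryCopy(entryRef) && !stream.contains(entryRef)) {
                stream.publish(entryRef); // re-trigger processing directly
            }
        });
    }

    interface PrimaryChecker { boolean holdsPrimaryCopy(String entryRef); }
    interface ReactiveStream { boolean contains(String ref); void publish(String ref); }
}
```

For instance, watchdog.start(Executors.newSingleThreadScheduledExecutor(), 30) would sweep the list every 30 seconds.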

Figure 4 shows a flowchart of the steps and system (401) entities involved in the handling and execution of an event. An Entity Creator (402) creates an event entity (403). This entity is then placed (404) in the first cache as a persisting entity, and the Observable repository (420) then initiates the transaction (405) by putting (406) the entity into the first Entity Cache of the grid. The grid also puts the Entity Reference in the second "To Handle" Cache (422), with a time stamp (creation TS and numberOfAttempts=0). A primary node puts the Entity Reference in the Changed Entity List (423), and the commit transaction interceptor logic (42) commits the transaction (425); i.e. at the end of the transaction, the "commit" is executed, which assures that all the changes that have been made are made durable. A transaction monitor of the grid handles the various caches and the entries therein, and ensures that the data is regularly updated and written to disc where necessary.

After Commit (414), for each entry reference on the "changed entry list" (426), the system verifies (427) whether the node holds the primary copy of the referenced entity, i.e. whether it is the primary node to execute the event. If the node is not a primary node but a secondary or back-up node, the system (427) instructs (431) the grid to execute the Notifier (413) on the node which holds the primary copy of the referenced entity. The Notifier (413) then copies (428) the entity reference to the reactive stream (415). After Completion (414), the system empties (432) the Changed Entries list cache of the node. When there are entity references on the stream, a reactive stream forwarder (416) passes the events to the Dispatcher and removes them from the stream. The Dispatcher groups (435) the entity references by Entity Id and moves each group to a dedicated reactive stream for that Entity Id. This advantageously allows the events on the reactive stream to be distributed, while at the same time allowing sequential handling of a specific entity for avoidance of duplication, all without using any static memory, which is left to the Grid. The relevant entity processor (417) then takes the events listed on the stream in the order they are listed, starts a transaction (437) to update the node, and updates (438) the cached data in the transaction monitor (421) of the grid.

Again, when there are entity references on the stream, the reactive forwarder passes the events to the Entity Processor (436) and removes them from the stream.

Now, in order to safeguard the exactly-once processing, the reactive stream also makes the following assessment: the reactive stream retrieves the entity details from the entity cache (431). It then checks recurrently whether the entity Id shows a number of attempts to transact above the predetermined maximum number of attempts (440). If yes, it moves the entity reference from the "to-handle cache" (the second cache in the grid) to the third cache, the "Dead Letter Cache", for attention by an operator. This is in particular relevant if the connection to a back office system, e.g. a bank database or the like, has failed, so that an event cannot be handled due to a cause external to the system. This also commits a transaction in the node and a transaction in the cache, so that the event entity is removed from the first and second cache.
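By way of a non-limiting sketch (the cache types and names below are illustrative placeholders for the grid caches), this max-attempts guard might look as follows:

```java
import java.util.Map;

final class MaxAttemptsGuard {
    private final Map<String, Integer> toHandleCache;   // entity ref -> numberOfAttempts
    private final Map<String, Integer> deadLetterCache; // entries parked for an operator
    private final int maxAttempts;

    MaxAttemptsGuard(Map<String, Integer> toHandleCache,
                     Map<String, Integer> deadLetterCache, int maxAttempts) {
        this.toHandleCache = toHandleCache;
        this.deadLetterCache = deadLetterCache;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the entity was parked in the dead letter cache. */
    boolean parkIfExhausted(String entityRef) {
        Integer attempts = toHandleCache.get(entityRef);
        if (attempts != null && attempts > maxAttempts) {
            // Move the reference out of the to-handle cache so it is no
            // longer retried, and park it for manual operator attention.
            deadLetterCache.put(entityRef, attempts);
            toHandleCache.remove(entityRef);
            return true;
        }
        return false;
    }
}
```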

If the entity Id, and the history of the processing attempts held in the to-handle cache, shows a number of attempts to transact below the predetermined maximum number of attempts (440), the system may also increment the number of attempts (445) and start (446) the entity specific logic (447). This Entity Specific Logic (447) then reviews the data relating to the entity Id in the first and second caches, and either updates the caches if the transaction is completed, or, if the Entity Specific Logic returns an "Error" (448), i.e. the event data belonging to the specific event is no longer present in the to-handle cache, initiates a roll-back of the relevant transaction (457 and 458) in the node and the cache, respectively. If the Entity Specific Logic finds that there is no erroneous doubling of activities, it removes (453) the entity reference still in the to-handle cache from the to-handle cache, moves (454) the entity from the Entity Cache to the Entity Archive Cache, and commits the relevant transactions (455 and 456) in both node and grid. By building the control into the step of removing the data from the to-handle cache, the system is extremely efficient: the small number of potential cases where two or more nodes, or two or more reactive streams, happen to execute the same event means that only in a very small number of cases a rollback needs to be initiated, whereas the large majority of cases will simply lead to a normal exactly-once execution. In particular in combination with the watchdog logic, and the grid logic that restarts or adds nodes, the present process and system are able to provide a failure-proof, guaranteed exactly-once execution of an event, with minimal duplication of resources, while allowing both nodes and reactive streams to be multiplied, thereby catering for variation in traffic.
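The control step itself, i.e. "remove from the to-handle cache or roll back", can be sketched as follows; the Transaction interface and the cache maps are hypothetical stand-ins for the grid and node transaction machinery:

```java
import java.util.Map;

final class ExactlyOnceProcessor {
    private final Map<String, Object> toHandleCache;
    private final Map<String, Object> entityCache;
    private final Map<String, Object> entityArchiveCache;

    ExactlyOnceProcessor(Map<String, Object> toHandleCache,
                         Map<String, Object> entityCache,
                         Map<String, Object> entityArchiveCache) {
        this.toHandleCache = toHandleCache;
        this.entityCache = entityCache;
        this.entityArchiveCache = entityArchiveCache;
    }

    void process(String entityRef, Transaction tx, Runnable entitySpecificLogic) {
        entitySpecificLogic.run();
        // Control step: the reference must still be in the to-handle cache.
        // If another node or stream already executed this event, the
        // reference is gone and the whole transaction is rolled back.
        if (toHandleCache.remove(entityRef) == null) {
            tx.rollback();
            return;
        }
        // Normal path: archive the entity and commit exactly once.
        Object entity = entityCache.remove(entityRef);
        entityArchiveCache.put(entityRef, entity);
        tx.commit();
    }

    interface Transaction { void commit(); void rollback(); }
}
```

On a concurrent map, the remove call is an atomic check-and-remove, so at most one caller ever reaches the commit path for a given reference.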

Step a) of the present process preferably includes writing the subsequent changes in the status information to the first and/or second cache, wherein a node is determined as a primary node for performing an event, and wherein at least a second node is determined as a secondary node that is not active to perform the same event, and wherein the first and second cache are updated accordingly. Preferably, if the primary node is not active, or is deactivated before executing a transaction, the grid comprises logic that promotes a secondary node to primary status, configured to transact the event as a primary node.

In this manner, a transaction that may have been omitted is guaranteed to be started by the watchdog logic. In the case this leads to a duplication, e.g. due to asynchronous data, the roll-back process initiated by the entity specific logic on the reactive stream will catch any such activity, as set out above. The combination of the two steps, and the use of the grid cache for non-transitory data, allows the process to be operated at optimal speed and with optimal distribution without using too much disc space and incurring the associated delay, while ensuring a guaranteed, prompt and singular execution.

In the present system, event handlers and event processors are preferably provided that can define, implement, test, deploy, monitor and/or change events. The grid advantageously allows this by keeping data and processing together, limiting pan-system data transfer.

Every event is preferably deployed in a separate container. The solution is therefore a concrete practical and useful realisation of the high level microservices concept. By creating each event as an individual, runnable microservice, it can advantageously be modified without touching the already existing nodes, therefore reducing the risk of system collapse.

When an event is started, the system can then start up a new secondary node as back-up functionality in parallel to the primary node. This advantageously allows variance and novel offerings and events to be introduced without having to stop the existing system, and without affecting the performance. In the case that a node should not work appropriately, the secondary node can simply be stopped and the issues addressed, without having to stop any of the processes.

Examples for events are as follows: in a banking context, a customer may request a transaction to be performed, e.g. a transfer of money from an account. This request is then handled by a request handler, which appoints a primary node for execution and one or more secondary nodes that act as inactive back-up nodes. Once the request is started, normally the node will run its process, which may include checking of balances, correctness of the account numbers, entitlement of the customer to perform the act, and exchange requirements if, for instance, the currency is changing. Obviously, this process must critically be executed once, and exactly once, as otherwise the transaction would be doubled up. In case the primary node is not reacting, e.g. due to a power failure or a cut in the communication line, the transaction might not be performed at all, which may lead to real-life issues, e.g. an eviction if the payment was a rental payment, or similarly undesired effects. Alternatively, a back-up and restart procedure will be extremely slow, and may upset the entire procedure, also for other transactions.

The present solution can be used to create a flexible system which allows built-in redundancy and bullet-proof singular handling of a request, while using only minimal additional resources. Furthermore, geographical stability can be achieved to avoid issues due to local events, e.g. power outages or line disruptions, by arranging for a geographically stabilised environment. Also, peaks in data traffic can be accommodated by using distant components that are not under full load, e.g. due to different times of day.

The term "container" as used herein preferably means a software package, also referred to as microservice of an event based processing that contains everything needed to run the software package: code, runtime, system tools, system libraries - anything that can be installed on a server. A container guarantees that its software will always run the same, regardless of its environment. A container generally includes an entire runtime environment: an application, plus all its dependencies, libraries and other binaries, and configuration files needed to run it, bundled into one package. By containerizing an application platform and its dependencies, the effects of differences in OS distributions and underlying infrastructure can be avoided. Suitable container software packages are for instance available from Docker, Linux Containers, FreeBSD jails, AIX Workload Partitions, Solaris Containers and CoreOS rkt.

The containerisation of the nodes advantageously ensures operability and development; security through redundancy; isolation of nodes in case of issues; simple access restrictions; the required inter-node communication; and encryption; while permitting changeability and versioning without disturbing on-going activities.

The system and apparatus preferably also comprise a copy of the container comprising the one or more nodes. The system and apparatus more preferably also have the container and its copy run on different servers.

The system and apparatus preferably also have each node in communication with a configuration cache accessible by the front-end interface. In this regard, each node is configured to store information about the one or more events it is configured to process according to data in the configuration cache, and the front-end interface is configured to route the request based on information in the configuration cache to the node.

The system and apparatus preferably also have each node being configured to generate an event ID for each commenced event process and communicate the event ID to the front-end interface, wherein the front-end interface is configured to route a request for a commenced event process to the node based on the event ID. The event ID is then shared between the primary node and the secondary node or nodes, through the first and second (and further) caches.
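For illustration only (all names hypothetical), routing on the event ID can be as simple as a shared map from event ID to the node that commenced the process:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FrontEndRouter {
    // Shared via the grid caches between primary and secondary nodes:
    // event ID -> identifier of the node handling that event process.
    private final Map<String, String> eventIdToNode = new ConcurrentHashMap<>();

    /** Called when a node commences an event process and reports its event ID. */
    void registerCommencedEvent(String eventId, String nodeId) {
        eventIdToNode.put(eventId, nodeId);
    }

    /** Routes a follow-up request for a commenced event process. */
    String routeRequest(String eventId) {
        String nodeId = eventIdToNode.get(eventId);
        if (nodeId == null) {
            throw new IllegalStateException("No node registered for event " + eventId);
        }
        return nodeId;
    }
}
```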

The system and apparatus preferably also have one or more actions of at least one of said events comprising a request for information stored in a back-end system and the back-end interface is configured to pre-load information from the back-end system into a data cache accessible by a plurality of nodes upon occurrence of an event. The system and apparatus more preferably also have the event corresponding to a user initiating an event process.

The system and apparatus preferably also comprise a back-end system, such as a banking or insurance system, or a utility, traffic control, or otherwise system that requires a guaranteed exactly once execution.

The system and apparatus preferably also have the plurality of events comprising at least one out of: a plurality of actions for checking a user's account balance, a plurality of actions for transferring money into or out of a bank account, a plurality of actions for making a payment, and a plurality of actions for a financial instrument transaction.

As set out above, Figure 1 illustrates the basic architecture, from an applications standpoint, of a server in the system and apparatus of this invention. Preferably the system and apparatus of this invention comprise at least two parallel servers, one being a back-up for the other, each with its own front-end interface, event interface, back-end interface, plurality of nodes and request handler.

The functionality of the system and apparatus and each of its servers is implemented via a central processor that manages the launching of script files and controls the operation of each server. The central processor utilizes a central service utility that runs in the background and automates tasks within the system and apparatus. Thus, the central service utility includes two types of utilities, one that runs on the individual servers and one that runs across all of the servers.

The central service utility utilizes an event-driven design to perform tasks by monitoring a set of directories on the various servers and identifying the presence of an event before initiating, or triggering, an associated script or application. Multiple scripts and flags can be used together to complete tasks, and each task may consist of multiple scripts and/or third party programs. An event may include an empty file, a file comprising a single line of data, or a complete data file; and a flag file contains data that indicates what task is to be performed based on the event.
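By way of a hedged illustration, such directory monitoring could be realised with the standard java.nio.file.WatchService; the monitored directory and the task wiring are assumptions of this sketch:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public final class DirectoryEventMonitor {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path watched = Paths.get("/var/service/events"); // hypothetical monitored directory
        WatchService watcher = FileSystems.getDefault().newWatchService();
        watched.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take(); // blocks until a file event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path file = (Path) event.context();
                // A flag file's contents would indicate which task to trigger;
                // here we merely log the arrival of the event file.
                System.out.println("Event file detected: " + file);
            }
            key.reset(); // re-arm the key to keep receiving events
        }
    }
}
```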

The central service utility supports tasks performed by standard internet-based services (e.g., Internet Information Services (IIS) and Active Server Page Network (ASP.NET) services) and standard software-framework-based services (e.g., Component Object Model Plus (COM+) and .NET services). The internet-based services provide functionality for the robust, interactive data exchange processes of the present invention, and provide functionality for presenting data to users of the various systems of the I PI 100 in a web-browser-type format. The software-framework-based services provide functionality for centrally managing all of the business logic and routines utilized by the present invention.

The system preferably comprises a grid of nodes providing processing and in-memory data, the grid of nodes comprising a first node comprising said first dedicated data cache and said first dedicated logic.

The system preferably comprises a request handler to receive requests for services from users, the request handler comprising logic and data to determine which event version a request relates to, and logic to route the request to the first node upon a determination that the request relates to a version that can be handled by the first node.

Looking at the architectural diagram in Figure 1, a request comes in and the request handler forwards the request to the grid. The request handler finds the request processor and executes the request process as soon as possible and stores the request for future reference.

The request processor then translates the request into an event. After the request processor, the event process is initiated and the event is put in a cache rather than a queue, so that data is never lost. The steps in the primary event node now run from the event interceptor to the end. During this period there is a watchdog (not disclosed in this diagram) which ensures that, if nodes crash due to a problem and events are not handled, the event will be picked up and handled by the processor when the system recovers.

As soon as an event comes in, it is stored in the cache. The interceptor is triggered if there is a new event placed into the cache and so is not triggered when the event is already in there.

If processing is interrupted at this point, the new event would not be processed because the event is already in the cache. The watchdog therefore checks to see if the events in the cache have been handled and if they have not it retriggers the interceptor.

The data stored in the primary node is also stored in at least one other, secondary node, and is written to a disk/cache in case the primary node fails. If the node does fail, the watchdog detects this and triggers the interceptor on the other node, which is already running but not processing, since it was designated as the secondary, back-up node.

Accordingly, in the subject system, preferably the first and second node(s) comprise a configuration cache accessible by the request handler, the configuration cache being configured to store information about the one or more event versions the first node can handle, wherein the request handler is configured to route the request to the first node based on information in the configuration cache of the first node.

In the subject system, preferably the grid of nodes comprises a second node comprising said second dedicated cache and said second dedicated logic. In the subject system, preferably the first data cache and logic are provided by a first container and the second data cache and logic are provided by a second container.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Figure 3 illustrates the "normal flow" of the process. The actual scheduling of the processing uses a "reactive stream". These are in-memory queues. This reactive stream is preferably part of a reactive framework which is not distributed, i.e. there are no "backup copies" in other nodes, thereby minimizing the amount of redundancy required for the subject system. This may for instance be in a Java environment. When a technical problem occurs that precludes a node or server from acting, e.g. a server or connection going down, these reactive streams disappear.

Contrary to the traditional use of an IMDG, where only one data cache is provided that is accessible by all nodes, the present system and process provide a configuration cache and a separate data cache. As not all of the nodes have the same functionality, it must be assured that events only end up in the nodes that can handle them. Therefore the system uses dedicated caches per event type, and these caches are only accessible by the nodes that have the code for this event type.

As the IMDG offers the technology to have data stored in a cache which is distributed over a set of nodes, when placing data in a cache, the Grid will be informed about the cache's unique name or denominator. Preferably, dedicated caches are employed with a specific name/denominator for each of the event types. Advantageously, then, those nodes which can handle a specific event type may place the cache name in the configuration cache. When a node starts up, it also registers itself with the grid and requests access from the grid to the cache with the specific name. As a result, the cache, referred to herein as the dynamic configuration cache, contains the event type and the cache name. By the node registering itself in the grid, the grid knows which nodes access the cache with a specific name. The request handler in turn knows the name of the cache by looking in the configuration cache. When the request handler wants to send the request to the nodes, it puts it in the correctly named cache, and the grid then knows to which nodes it can possibly send it.

Using the caches in this way, the request handler only needs to know the name of the cache in which to put the requests; hence for the request handler, it is completely transparent how many nodes are available that can handle the request.

Accordingly, the use of a defined configuration cache, which is populated when the nodes start up and which is available to all the nodes, advantageously permits a reduction in data transfer and in time lost finding the cache. While distributed cache technology itself is typically provided by an IMDG, the present system makes use of a dynamic configuration cache.

A second cache comprises backend data in the in-memory data grid, for performance reasons. The original copy of this data is always kept in the backend, and to retrieve the most recent value, a "data retrieval instruction event" is employed. In order to reduce the need for continuously requesting the most up-to-date data from the back-end databases, data of a certain age may be employed without upsetting the process, an approach also called "usage-based freshness". When an event needs to retrieve data from the backend, it first looks for the latest data in the in-memory cache. For this, it specifies the maximum age that the piece of data should have, where the age is the elapsed time since the data was stored in the cache. If the data in the cache is older than this maximum age, the cache will start a "data retrieval instruction event" to get the "latest" data from the backend. When this data retrieval event, which is coupled to a "communication event", brings back the value from the backend, it updates the value in the cache and also stores the timestamp. It then returns the data to the original event; the data is by now fresh enough, as it was just retrieved.
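A minimal sketch of such a usage-based-freshness lookup (the cache entry type and the backend call are assumptions of this illustration):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class FreshnessCache {
    private record Entry(Object value, Instant storedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, Object> backendLookup; // the "data retrieval instruction event"

    FreshnessCache(Function<String, Object> backendLookup) {
        this.backendLookup = backendLookup;
    }

    /** Returns cached data if younger than maxAge, otherwise refreshes from the backend. */
    Object get(String key, Duration maxAge) {
        Entry entry = cache.get(key);
        if (entry != null
                && Duration.between(entry.storedAt(), Instant.now()).compareTo(maxAge) <= 0) {
            return entry.value(); // fresh enough: no backend round trip needed
        }
        Object latest = backendLookup.apply(key);          // fetch the "latest" value
        cache.put(key, new Entry(latest, Instant.now()));  // update value and timestamp
        return latest;
    }
}
```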

Similarly, pre-fetching may be employed as an optimization, by coupling events that are typically used for a certain action. For example, when a money transfer event is started, it is already clear that during a later stage in the processing the debtor account details need to be checked. Hence a data retrieval event directed to this data can already retrieve these values from the backend, even though they are not immediately required, so that the data is present in the in-memory cache.
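Continuing the previous sketch (the event coupling shown is purely illustrative), pre-fetching amounts to asynchronously warming the cache when a coupled event starts:

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

final class PrefetchingMoneyTransfer {
    private final FreshnessCache cache; // from the previous sketch

    PrefetchingMoneyTransfer(FreshnessCache cache) { this.cache = cache; }

    void onTransferStarted(String debtorAccountId) {
        // The debtor account details will be needed in a later step, so a
        // data retrieval event is started now to warm the in-memory cache.
        CompletableFuture.runAsync(() ->
                cache.get("account-" + debtorAccountId, Duration.ZERO)); // forces a backend fetch
    }
}
```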

Backend pre-load: In the case of very large databases, loading substantially all the data into the IMDG may not be economical or practical. Under such circumstances, a user may pre-load selective data that may be expected to be frequently requested or used into the IMDG. For data not pre-loaded in the IMDG, an IMDG client-pre-loader or loader plug-in may act as a data access layer to fetch or collect any frequently requested data from the database and cache this data in the IMDG.

Frequently requested data may be defined as any data or data object which is used, requested or accessed a set number of times or frequency that exceeds a predetermined threshold or is accessed a preset number of times over a selected time duration.

At a regular interval, a watchdog logic may be activated, as depicted in Figures 3 and 4.

The processing of the watchdog is there to assure that every "entity" is handled, even in case of problems.

The present invention also has the benefit of allowing the system to be stabilised against geographical disasters. Backups are usually required for disaster recovery. Usually, two machines or data centres that are storing the same data have to be close in proximity so that the updates are synchronised, because if they are far away there is a delay due to the time taken for data to reach the second machine. The normal maximum distance to avoid latency problems is 30 km.

In the case of a regional disaster, typically a third machine is kept further away and is unsynchronised, which means there is a small window of data that will be lost if the first and second machines fail. Conventionally, solutions exist where one node is located on the east coast of the US and one node is located on the west coast, and data is updated asynchronously, with the risk of losing some data if one node goes down. However, there is a trade-off between proximity and performance, which makes widely spaced-apart systems slow.

To solve this problem, a deployment model for the subject invention advantageously may comprise two synchronous machines in close proximity, e.g. in neighbouring buildings; a third machine still in close proximity but further away than the second machine, e.g. spaced 15 km (10 to 25 km) away; and a fourth asynchronous machine which is distanced very far, e.g. 100 km or more, from the other machines, e.g. on the other side of the world. Preferably, the system is further configured to synchronously copy data to a second and/or further node located at a different location to ensure stability of the system. Preferably, the at least second node is located on a server at a first predetermined physical distance of at least 10 km away from the server running the first node and first data cache. Preferably, processing is only performed on the closest two machines, while the other machines only carry back-up nodes and share the cache. Once an event is handled, nothing more needs to be done, as data will be available at multiple machines in case one or more fail. Accordingly, the present invention also relates to a system further configured to synchronously copy data to a second and/or further node and a second or further cache located at a different location to ensure stability of the system. Preferably, at least a second node is located on a server at a first predetermined physical distance in a range of from 10 to 20 km away from the server running the first node and first data cache. Advantageously, the system also asynchronously copies data to a further secondary node at a second predetermined physical distance, preferably on a server located at a distance of more than 100 km away, to be compared by a further asynchronous recovery logic.

Preferably, the system is further configured to provide, and to asynchronously copy data to a further secondary node at a second predetermined physical distance, preferably on a server located at a distance of more than 100 km away, to be verified by an asynchronous recovery logic.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to embodiments of the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of embodiments of the invention. The embodiment was chosen and described in order to best explain the principles of embodiments of the invention and the practical application, and to enable others of ordinary skill in the art to understand embodiments of the invention for various embodiments with various modifications as are suited to the particular use contemplated. Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art appreciate that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiments shown and that embodiments of the invention have other applications in other environments.