Title:
SYSTEMS AND METHODS FOR GENERATING, MONITORING, AND ANALYZING EVENT NETWORKS FROM EVENT DATA
Document Type and Number:
WIPO Patent Application WO/2020/201830
Kind Code:
A1
Abstract:
Systems and methods for generating a consistent multi-level event network from event data representing a complex process made up of many sub-processes, each involving different entities, are described herein. Event data can include different event types, a timestamp, and an identifier associated with each event. The event data can be sorted into a single table. A set of nodes can be generated based on a unique combination of the event type and the identifier associated with each event. The nodes can include at least one root node that is associated with a root entity and at least one leaf node that is associated with a leaf entity. Starting from a leaf node, nodes that are associated with a common identifier can be linked in an iterative manner until the root node is reached. One or more event networks can be identified based on the links and relations between the nodes.

Inventors:
DELSANTE MASSIMILIANO (IT)
BACIU CLAUDIU (RO)
Application Number:
PCT/IB2020/000248
Publication Date:
October 08, 2020
Filing Date:
April 06, 2020
Assignee:
COGNITIVE TECH LIMITED (MT)
DELSANTE MASSIMILIANO (IT)
BACIU CLAUDIU (RO)
International Classes:
G06F16/26; G06F16/2458; G06Q30/06
Foreign References:
US20170213167A1 2017-07-27
Attorney, Agent or Firm:
GILL JENNINGS & EVERY LLP (GB)
Claims:
CLAIMS

1. A system for generating an event network from event logs, the system comprising:

a database configured to receive a set of event logs associated with a sequence of n events, each event being of a different event type than each other event of the sequence of n events, each event log of the set of event logs associated with at least one event of the sequence of n events, each event log including a) timestamp information and b) a unique identifier for each event associated therewith;

a controller communicably coupled to the database and configured to:

assign an event identifier to each event log of the set of event logs based on the corresponding timestamp information to generate a set of event identifiers;

generate a set of nodes based on the set of event logs and the set of event identifiers, each node based on each unique combination of the event type and the unique identifier in the set of event logs, each node including a listing of the event identifiers of the set of event identifiers associated with its event type and with its unique identifier, the set of nodes including (i) at least one root node associated with event 1 of the sequence of n events and (ii) at least two leaf nodes associated with event n of the sequence of n events;

generate, for each leaf node, a corresponding event network to generate at least two event networks, including for each leaf node:

link the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith;

iteratively link each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is linked,

wherein each event network includes the leaf node, the at least one root node, and all nodes linked therebetween, and wherein the at least two event networks have at least one node in common;

a display device communicably coupled to the controller and configured to generate a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.

2. The system of claim 1, wherein each event log associated with an event other than the event 1 or the event n includes two or more event identifiers.

3. The system of claim 1, wherein the set of event logs received by the database includes a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other.

4. The system of claim 3, wherein the controller is configured to sort the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables prior to assigning the set of event identifiers.

5. The system of claim 1, wherein each event log further includes a relationship attribute representing a link between that event and at least one other event in the sequence of n events.

6. The system of claim 5, wherein the relationship attribute is an event identifier of the at least one other event.

7. The system of claim 1, wherein linking each node in the set of nodes to another node in the set of nodes visually includes rendering at least one of an incoming transition to that node from the other node or an outgoing transition from that node to the other node.

8. The system of claim 7, wherein the number of event networks for the set of event logs is equal to the number of nodes in the set of nodes without outgoing transitions.

9. The system of claim 7, wherein the controller is further configured to generate a first event network of the at least two event networks by starting from a first leaf node of the at least two leaf nodes and traversing all nodes linked therebetween until the at least one root node.

10. The system of claim 1, wherein the controller is further configured to assign an event identifier with a Boolean value, the event identifier being associated with an event occurring in both the event networks of the at least two event networks.

11. The system of claim 1, wherein the database further includes frequency information for each event of the sequence of n events and associated with the at least two event networks, and wherein the controller is further configured to, for a first event network of the at least two event networks:

determine frequency information for each event in the first event network based on the frequency information associated with the at least two event networks; and

generate a frequency analysis model for the first event network, the frequency analysis model including a visual representation of the frequency information for each event in the first event network.

12. The system of claim 11, wherein the controller is further configured to:

determine a cost associated with each event in the first event network based at least in part on the frequency information associated with the at least two event networks; and

generate a cost analysis model of the first event network, the cost analysis model including a visual representation of the cost associated with each event in the first event network.

13. The system of claim 1, wherein the database further includes average throughput time for each event in the sequence of n events and associated with the at least two event networks, and wherein the controller is further configured to, for a first event network of the at least two event networks:

generate a performance analysis model for the first event network, the performance analysis model including a visual representation of the average throughput time for each event in the first event network.

14. The system of claim 1, wherein the database further includes information related to automation for each event of the sequence of n events and associated with the at least two event networks, and

wherein the controller is further configured to, for a first event network of the at least two event networks: determine an automation level for each event in the first event network based on the automation associated with that event; and

generate a rework analysis model for the first event network, the rework analysis model including a visual representation of the automation level for each event in the first event network.

15. A method for generating an event network from event logs, the method comprising:

receiving a set of event logs associated with a sequence of n events, each event being of a different event type than each other event of the sequence of n events, each event log associated with at least one event of the sequence of n events, each event log including a) timestamp information and b) a unique identifier for each event associated therewith;

assigning an event identifier to each event log of the set of event logs based on its corresponding timestamp information to generate a set of event identifiers;

generating a set of nodes based on the set of event logs and the set of event identifiers, each node based on each unique combination of the event type and unique identifier in the set of event logs, each node including a listing of the event identifiers of the set of event identifiers associated with that node’s event type and unique identifier, the set of nodes including (i) at least one root node associated with event 1 of the sequence of n events and (ii) at least two leaf nodes associated with event n of the sequence of n events;

identifying, for each leaf node, a corresponding event network to identify at least two event networks, including for each leaf node:

linking the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith;

iteratively linking each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is reached,

wherein each event network includes the leaf node, the at least one root node, and all nodes linked therebetween, and wherein the at least two event networks have at least one node in common;

generating a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.

16. The method of claim 15, wherein each event log associated with an event other than the event 1 or the event n includes two or more event identifiers.

17. The method of claim 15, wherein the set of event logs includes a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other.

18. The method of claim 17, further comprising, prior to the assigning, sorting the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables.

19. The method of claim 15, wherein each event log further includes a relationship attribute representing a link between that event and at least one other event in the sequence of n events.

20. The method of claim 19, wherein the relationship attribute is a respective event identifier of the at least one other event.

21. The method of claim 15, wherein linking each node in the set of nodes to another node in the set of nodes visually includes rendering at least one of an incoming transition to that node from the other node or an outgoing transition from that node to the other node.

22. The method of claim 21, wherein the number of event networks for the set of event logs is equal to the number of nodes in the set of nodes without outgoing transitions.

23. The method of claim 21, wherein generating a first event network of the at least two event networks further includes starting from a first leaf node of the at least two leaf nodes and traversing all nodes linked therebetween until the at least one root node.

24. The method of claim 1, further comprising:

for a first event network of the at least two event networks:

determining frequency information of each event in the first event network based on frequency information associated with the at least two event networks, the event log including frequency information for each event of the sequence of n events and associated with the at least two event networks; and

generating a frequency analysis model for the first event network, the frequency analysis model including a visual representation of the frequency information for each event in the first event network.

25. The method of claim 24, further comprising:

determining a cost associated with each event in the first event network based at least in part on the frequency information associated with the at least two event networks; and

generating a cost analysis model of the first event network, the cost analysis model including a visual representation of the cost associated with each event in the first event network.

26. The method of claim 15, further comprising:

for a first event network of the at least two event networks:

generating a performance analysis model for the first event network, the performance analysis model including a visual representation of an average throughput time for each event in the first event network.

27. A non-transitory computer readable medium for generating an event network from event logs executing on a computing device encoded with instructions that, when executed, cause a processor to:

receive a set of event logs associated with a sequence of n events, each event being of a different event type than each other event of the sequence of n events, each event log associated with at least one event of the sequence of n events, each event log including a) timestamp information and b) a unique identifier for each event associated therewith;

assign an event identifier to each event log of the set of event logs based on its corresponding timestamp information to generate a set of event identifiers;

generate a set of nodes based on the set of event logs and the set of event identifiers, each node based on each unique combination of the event type and unique identifier in the set of event logs, each node including a listing of the event identifiers of the set of event identifiers associated with that node’s event type and unique identifier, the set of nodes including at least one root node associated with event 1 of the sequence of n events, and including at least two leaf nodes associated with event n of the sequence of n events;

identify, for each leaf node, a corresponding event network to identify at least two event networks, including for each leaf node:

link the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith;

iteratively link each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is reached,

wherein each event network includes the leaf node, the at least one root node, and all nodes linked therebetween, and wherein the at least two event networks have at least one node in common;

generate a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.

28. The non-transitory computer readable medium of claim 27, wherein each event log associated with an event other than the event 1 or the event n includes two or more event identifiers.

28. The non-transitory computer readable medium of claim 27, wherein the set of event logs includes a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other.

29. The non-transitory computer readable medium of claim 28, wherein the instructions further cause the processor to:

prior to the assigning, sort the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables.

Description:
Systems and Methods for Generating, Monitoring, and Analyzing Event Networks from Event Data

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the priority benefit, under 35 U.S.C. § 119(e), of U.S. Application No. 62/829,875, entitled “Systems and Methods for Discovering Event Networks,” filed on April 5, 2019, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

[0002] A process can be considered to be a set of linked activities or events for delivering a desirable output. Today, most companies require a deep understanding of their processes in order to remain competitive. It can be important to understand how certain processes work, and if shifts in such processes are possible for flexibility, cost savings, etc. For example, an understanding of vendors meeting past delivery dates can provide insight into which vendor may have the best chance of meeting an upcoming committed delivery date.

[0003] Process mining can provide answers to such questions. Process mining utilizes stored activity/event data to discover or identify a process, also sometimes referred to as a process model or event network, that is carried out by the organization, perhaps implicitly or without any explicit description or order. Once discovered, the process can then be analyzed. Said another way, process mining can be used for discovering, monitoring, and improving processes by extracting knowledge from event logs.

[0004] Event data can come from a variety of different sources, such as a database system, a comma-separated values (CSV) file or spreadsheet, a transaction log, an Enterprise Resource Planning (ERP) system, a message log, an Application Programming Interface (API), etc. Event data can include any kind of information, such as patient data in a hospital, trading information, data from websites or social media, and/or the like. Process mining can allow companies to connect to varied data platforms and integrate with multiple data sources in order to extract and analyze information relating to processes.

[0005] Additionally, process mining can allow companies to audit their processes and improve them by answering compliance related and performance related questions. Therefore, potential problems can be detected before they can have a negative impact on the operations of a company.

[0006] However, an unresolved technological challenge of process mining is modeling and analyzing multi-level processes. A multi-level process includes one or more complex processes involving steps/events/entities linked by many-to-many relationships. Examples of multi-level processes include P2P (Purchase to Pay) and O2C (Order to Cash) processes. As an example, a P2P process can be made up of different subprocesses, such as purchasing, ordering, invoicing, and payment. Multiple instances of purchasing can be linked to a single order. In a similar manner, multiple instances of an order may be linked to a single purchase. Therefore, each subprocess has to be analyzed separately in order to provide visibility on the actual end-to-end P2P process.

SUMMARY

[0007] Process mining for multi-level processes should solve the problem of data convergence and the problem of data divergence. More specifically, data divergence can occur when several events of the same type are related to a single case (a case is a sequence of events). Data convergence can occur when one event is related to multiple cases. Existing process mining techniques do not address the challenge of modeling and analyzing multi-level processes.
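The two problems named above can be illustrated with a minimal sketch. The record layout (event id, event type, case id) and all sample values are hypothetical and are not taken from this disclosure; the sketch only shows how divergence and convergence, as defined above, could be detected in a flat list of event-to-case relations.

```python
from collections import defaultdict

# Hypothetical (event_id, event_type, case_id) relations; all values are illustrative.
relations = [
    ("e1", "order", "case-A"),
    ("e2", "invoice", "case-A"),
    ("e3", "invoice", "case-A"),   # a second invoice event in the same case
    ("e4", "payment", "case-A"),
    ("e4", "payment", "case-B"),   # the same payment event also belongs to case-B
]

def find_divergence(rows):
    """Return (case_id, event_type) pairs where several distinct events of one type share a case."""
    events = defaultdict(set)
    for event_id, event_type, case_id in rows:
        events[(case_id, event_type)].add(event_id)
    return sorted(key for key, ids in events.items() if len(ids) > 1)

def find_convergence(rows):
    """Return the ids of events that are related to more than one case."""
    cases = defaultdict(set)
    for event_id, _event_type, case_id in rows:
        cases[event_id].add(case_id)
    return sorted(event_id for event_id, ids in cases.items() if len(ids) > 1)

divergent = find_divergence(relations)
convergent = find_convergence(relations)
```

In this sample, the two invoice events in case-A are an instance of divergence, and the payment event shared by case-A and case-B is an instance of convergence.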

[0008] Systems and methods disclosed herein address the challenges of modeling and analyzing multi-level processes.

[0009] In some implementations, a system for generating an event network from event logs is disclosed herein. The system can include a database configured to receive a set of event logs associated with a sequence of n events. Each event can be a different event type than each other event of the sequence of n events. Each event log of the set of event logs can be associated with at least one event of the sequence of n events. Each event log can include a) timestamp information and b) a unique identifier for each event associated therewith.

[0010] The system can also include a controller communicably coupled to the database and configured to: assign an event identifier to each event log of the set of event logs based on the corresponding timestamp information to generate a set of event identifiers; generate a set of nodes based on the set of event logs and the set of event identifiers; and generate, for each leaf node, a corresponding event network to generate at least two event networks. Each node can be based on each unique combination of the event type and the unique identifier in the set of event logs. Each node can include a listing of the event identifiers of the set of event identifiers associated with its event type and with its unique identifier. The set of nodes can include (i) at least one root node associated with event 1 of the sequence of n events and (ii) at least two leaf nodes associated with event n of the sequence of n events.
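One possible reading of the identifier assignment and node generation described above is sketched below. The row layout (a `timestamp` plus an `ids` mapping from event type to unique identifier), the function names, and the sample purchase-to-pay values are assumptions made for illustration; the disclosure does not prescribe a data format.

```python
from collections import defaultdict

# Hypothetical event logs. Each row carries a timestamp and the unique identifier of
# every event it is associated with, keyed by event type. All values are illustrative.
event_logs = [
    {"timestamp": 3, "ids": {"order": "O1", "invoice": "I2"}},
    {"timestamp": 1, "ids": {"order": "O1"}},
    {"timestamp": 5, "ids": {"invoice": "I2", "payment": "P2"}},
    {"timestamp": 2, "ids": {"order": "O1", "invoice": "I1"}},
    {"timestamp": 4, "ids": {"invoice": "I1", "payment": "P1"}},
]

def assign_event_identifiers(logs):
    """Number the logs 1..len(logs) in timestamp order (ties keep their input order)."""
    ordered = sorted(logs, key=lambda row: row["timestamp"])
    return [dict(row, event_id=i) for i, row in enumerate(ordered, start=1)]

def generate_nodes(numbered_logs):
    """Map each unique (event type, unique identifier) pair to its list of event identifiers."""
    nodes = defaultdict(list)
    for row in numbered_logs:
        for event_type, unique_id in row["ids"].items():
            nodes[(event_type, unique_id)].append(row["event_id"])
    return dict(nodes)

numbered = assign_event_identifiers(event_logs)
nodes = generate_nodes(numbered)
```

With this sample, the order node is the root node (event 1 of the sequence) and the two payment nodes are leaf nodes (event n of the sequence).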

[0011] The controller can be configured to, for each leaf node: link the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith; and iteratively link each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is linked. Each event network can include the leaf node, the at least one root node, and all nodes linked therebetween. The at least two event networks can have at least one node in common. The system can include a display device communicably coupled to the controller and configured to generate a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.
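The leaf-to-root linking described above can be sketched as follows. The node table, the `sequence` list, and the function name `build_event_network` are hypothetical; the sketch treats two nodes as linkable when their event identifier listings intersect, which is one interpretation of "having the same event identifier associated therewith".

```python
# Hypothetical node table: (event type, unique identifier) -> listed event identifiers.
nodes = {
    ("order", "O1"): [1, 2, 3],
    ("invoice", "I1"): [2, 4],
    ("invoice", "I2"): [3, 5],
    ("payment", "P1"): [4],
    ("payment", "P2"): [5],
}
sequence = ["order", "invoice", "payment"]  # event 1 ... event n (assumed order)

def build_event_network(leaf, nodes, sequence):
    """Walk from a leaf node back to the root level, linking nodes that share an event identifier."""
    network = {leaf}
    links = set()
    frontier = {leaf}
    # Visit event n-1, n-2, ..., 1 in turn.
    for level in range(len(sequence) - 2, -1, -1):
        previous_type = sequence[level]
        next_frontier = set()
        for node in frontier:
            for candidate, event_ids in nodes.items():
                if candidate[0] == previous_type and set(event_ids) & set(nodes[node]):
                    links.add((candidate, node))  # transition from candidate to node
                    next_frontier.add(candidate)
        network |= next_frontier
        frontier = next_frontier
    return network, links

leaves = [node for node in nodes if node[0] == sequence[-1]]
networks = {leaf: build_event_network(leaf, nodes, sequence)[0] for leaf in leaves}
overlap = set.intersection(*networks.values())
```

Here each payment node yields its own event network, and `overlap` holds the nodes that the networks have in common, which is the area a visual representation would highlight.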

[0012] In some implementations, each event log associated with an event other than the event 1 or the event n includes two or more event identifiers. In some implementations, the set of event logs received by the database includes a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other. In some implementations, the controller is configured to sort the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables prior to assigning the set of event identifiers.
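A minimal sketch of sorting n referentially constrained tables into a single chronological table is given below. The table names, column names (`order_id`, `invoice_id`, `payment_id`, `created`), and the `schema` description of key and foreign-key columns are all assumptions for illustration; the referential constraint is modeled here simply as a foreign key carried into the merged row.

```python
from datetime import date

# Hypothetical source tables, one per event type. All names and values are illustrative.
orders = [{"order_id": "O1", "created": "2020-01-02"}]
invoices = [
    {"invoice_id": "I2", "order_id": "O1", "created": "2020-01-09"},
    {"invoice_id": "I1", "order_id": "O1", "created": "2020-01-05"},
]
payments = [{"payment_id": "P1", "invoice_id": "I1", "created": "2020-01-20"}]

# (rows, event type, own key column, foreign key column or None)
schema = [
    (orders, "order", "order_id", None),
    (invoices, "invoice", "invoice_id", "order_id"),
    (payments, "payment", "payment_id", "invoice_id"),
]

def to_single_table(schema):
    """Flatten all tables into one list of rows and sort it chronologically."""
    rows = []
    for table, event_type, key, foreign_key in schema:
        for record in table:
            rows.append({
                "event_type": event_type,
                "unique_id": record[key],
                # The foreign key preserves the referential constraint in the merged row.
                "parent_id": record[foreign_key] if foreign_key else None,
                "timestamp": date.fromisoformat(record["created"]),
            })
    return sorted(rows, key=lambda row: row["timestamp"])

single_table = to_single_table(schema)
```

Event identifiers could then be assigned by numbering the rows of `single_table` in order.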

[0013] In some implementations, each event log can further include a relationship attribute representing a link between that event and at least one other event in the sequence of n events. In some implementations, the relationship attribute is an event identifier of the at least one other event. In some implementations, linking each node in the set of nodes to another node in the set of nodes visually can include rendering at least one of an incoming transition to that node from the other node or an outgoing transition from that node to the other node. In some implementations, the number of event networks for the set of event logs is equal to the number of nodes in the set of nodes without outgoing transitions.
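The relationship between the number of event networks and the nodes without outgoing transitions can be sketched as follows. The node labels and the transition list are hypothetical sample data, with each transition written as a (source, target) pair.

```python
# Hypothetical nodes and directed transitions (source, target); values are illustrative.
nodes = ["O1", "I1", "I2", "P1", "P2"]
transitions = [("O1", "I1"), ("O1", "I2"), ("I1", "P1"), ("I2", "P2")]

def nodes_without_outgoing(nodes, transitions):
    """Return the nodes that never appear as the source of a transition."""
    sources = {source for source, _target in transitions}
    return [node for node in nodes if node not in sources]

terminal_nodes = nodes_without_outgoing(nodes, transitions)
network_count = len(terminal_nodes)
```

In this sample, the two payment nodes have no outgoing transitions, so two event networks would be expected.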

[0014] In some implementations, the controller can be further configured to generate a first event network of the at least two event networks by starting from a first leaf node of the at least two leaf nodes and traversing all nodes linked therebetween until the at least one root node. In some implementations, the controller can be further configured to assign an event identifier with a Boolean value, the event identifier being associated with an event occurring in both the event networks of the at least two event networks.
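The Boolean value mentioned above is not specified further in this passage. The sketch below shows one possible reading, under the assumption that an event identifier is flagged as true when it is listed by a node that belongs to every event network; the data and the function name `flag_shared_events` are hypothetical.

```python
# Hypothetical node table and event networks; all values are illustrative.
nodes = {
    ("order", "O1"): [1, 2, 3],
    ("invoice", "I1"): [2, 4],
    ("invoice", "I2"): [3, 5],
    ("payment", "P1"): [4],
    ("payment", "P2"): [5],
}
networks = {
    "P1": {("order", "O1"), ("invoice", "I1"), ("payment", "P1")},
    "P2": {("order", "O1"), ("invoice", "I2"), ("payment", "P2")},
}

def flag_shared_events(nodes, networks):
    """Map each event identifier to True if a node common to all networks lists it (assumed rule)."""
    shared_nodes = set.intersection(*networks.values())
    flags = {}
    for node, event_ids in nodes.items():
        for event_id in event_ids:
            flags[event_id] = flags.get(event_id, False) or node in shared_nodes
    return flags

shared = flag_shared_events(nodes, networks)
```

Other flagging rules are equally conceivable; the point of the sketch is only that the flag marks events occurring in both event networks.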

[0015] In some implementations, the database can further include frequency information for each event of the sequence of n events and associated with the at least two event networks. The controller can be further configured to, for a first event network of the at least two event networks: determine frequency information for each event in the first event network based on the frequency information associated with the at least two event networks, and generate a frequency analysis model for the first event network, the frequency analysis model including a visual representation of the frequency information for each event in the first event network.
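A frequency analysis model of this kind could be sketched as below. The stored frequencies, the choice of a per-node count as the "frequency information", and the text-based bar rendering are assumptions for illustration; the disclosure does not fix how frequency is measured or drawn.

```python
# Hypothetical frequency information stored for all nodes across the event networks.
all_frequencies = {
    ("order", "O1"): 3,
    ("invoice", "I1"): 2,
    ("invoice", "I2"): 2,
    ("payment", "P1"): 1,
    ("payment", "P2"): 1,
}
# Hypothetical first event network, listed from root to leaf.
first_network = [("order", "O1"), ("invoice", "I1"), ("payment", "P1")]

def frequency_model(network, frequencies):
    """Restrict the stored frequency information to the nodes of one event network."""
    return {node: frequencies[node] for node in network}

def render_bars(model):
    """Render a plain-text bar per node as a stand-in for a visual representation."""
    lines = []
    for (event_type, unique_id), count in model.items():
        lines.append(f"{event_type}:{unique_id:<4} {'#' * count} ({count})")
    return "\n".join(lines)

model = frequency_model(first_network, all_frequencies)
report = render_bars(model)
```

Printing `report` gives one bar per event in the first event network; a graphical front end could replace `render_bars` without changing `frequency_model`.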

[0016] In some implementations, the controller can be further configured to: determine a cost associated with each event in the first event network based at least in part on the frequency information associated with the at least two event networks, and generate a cost analysis model of the first event network, the cost analysis model including a visual representation of the cost associated with each event in the first event network.

[0017] In some implementations, the database can further include average throughput time for each event in the sequence of n events and associated with the at least two event networks. The controller can be further configured to, for a first event network of the at least two event networks: generate a performance analysis model for the first event network, the performance analysis model including a visual representation of the average throughput time for each event in the first event network.

[0018] In some implementations, the database can further include information related to automation for each event of the sequence of n events and associated with the at least two event networks. The controller can be further configured to, for a first event network of the at least two event networks: determine an automation level for each event in the first event network based on the automation associated with that event, and generate a rework analysis model for the first event network, the rework analysis model including a visual representation of the automation level for each event in the first event network.

[0019] In some implementations, a method for generating an event network from event logs is disclosed herein. The method can include receiving a set of event logs associated with a sequence of n events. Each event can be a different event type than each other event of the sequence of n events. Each event log can be associated with at least one event of the sequence of n events. Each event log can include a) timestamp information and b) a unique identifier for each event associated therewith. The method can also include assigning an event identifier to each event log of the set of event logs based on its corresponding timestamp information to generate a set of event identifiers.

[0020] The method can also include generating a set of nodes based on the set of event logs and the set of event identifiers. Each node can be based on each unique combination of the event type and unique identifier in the set of event logs. Each node can include a listing of the event identifiers of the set of event identifiers associated with that node’s event type and unique identifier. The set of nodes can include (i) at least one root node associated with event 1 of the sequence of n events and (ii) at least two leaf nodes associated with event n of the sequence of n events.

[0021] The method can also include identifying, for each leaf node, a corresponding event network to identify at least two event networks, including for each leaf node: linking the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith; and iteratively linking each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is reached. Each event network can include the leaf node, the at least one root node, and all nodes linked therebetween. The at least two event networks can have at least one node in common. The method can also include generating a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.

[0022] In some implementations, each event log associated with an event other than the event 1 or the event n includes two or more event identifiers. In some implementations, the set of event logs can include a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other. In some implementations, the method can further include, prior to the assigning, sorting the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables.

[0023] In some implementations, each event log can further include a relationship attribute representing a link between that event and at least one other event in the sequence of n events. In some implementations, the relationship attribute is a respective event identifier of the at least one other event. In some implementations, linking each node in the set of nodes to another node in the set of nodes visually can include rendering at least one of an incoming transition to that node from the other node or an outgoing transition from that node to the other node. In some implementations, the number of event networks for the set of event logs is equal to the number of nodes in the set of nodes without outgoing transitions. In some implementations, generating a first event network of the at least two event networks further includes starting from a first leaf node of the at least two leaf nodes and traversing all nodes linked therebetween until the at least one root node.

[0024] In some implementations, the method can further include for a first event network of the at least two event networks: determining frequency information of each event in the first event network based on frequency information associated with the at least two event networks, the event log including frequency information for each event of the sequence of n events and associated with the at least two event networks, and generating a frequency analysis model for the first event network. The frequency analysis model can include a visual representation of the frequency information for each event in the first event network.

[0025] In some implementations, the method can further include determining a cost associated with each event in the first event network based at least in part on the frequency information associated with the at least two event networks, and generating a cost analysis model of the first event network, the cost analysis model including a visual representation of the cost associated with each event in the first event network.

[0026] In some implementations, the method can further include, for a first event network of the at least two event networks: generating a performance analysis model for the first event network. The performance analysis model can include a visual representation of an average throughput time for each event in the first event network.

[0027] In some implementations, a non-transitory computer readable medium for generating an event network from event logs is disclosed herein. The non-transitory computer readable medium can be executed on a computing device encoded with instructions that, when executed, cause a processor to: receive a set of event logs associated with a sequence of n events, each event being of a different event type than each other event of the sequence of n events, each event log associated with at least one event of the sequence of n events, each event log including a) timestamp information and b) a unique identifier for each event associated therewith; assign an event identifier to each event log of the set of event logs based on its corresponding timestamp information to generate a set of event identifiers; and generate a set of nodes based on the set of event logs and the set of event identifiers, each node based on each unique combination of the event type and unique identifier in the set of event logs, each node including a listing of the event identifiers of the set of event identifiers associated with that node’s event type and unique identifier, the set of nodes including at least one root node associated with event 1 of the sequence of n events, and including at least two leaf nodes associated with event n of the sequence of n events.

[0028] The instructions can also include instructions that, when executed, cause the processor to: identify, for each leaf node, a corresponding event network to identify at least two event networks, including for each leaf node: link the leaf node to one or more nodes corresponding to event n-1 in the sequence of n events, the leaf node and the one or more nodes corresponding to event n-1 having the same event identifier associated therewith; iteratively link each of the one or more nodes corresponding to event n-1 to one or more nodes corresponding to event n-2 having the same event identifier associated therewith until the at least one root node is reached.

[0029] Each event network can include the leaf node, the at least one root node, and all nodes linked therebetween. The at least two event networks have at least one node in common. The instructions can also include instructions that, when executed, cause the processor to generate a visual representation of the at least two event networks illustrating an area of overlap having the at least one node in common.
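By way of non-limiting illustration only, the node generation and leaf-to-root linking summarized above could be sketched in Python as follows. The sketch assumes that each event log carries its own identifier plus the identifier of the preceding entity; the names `SEQUENCE`, `LOGS`, `build_nodes`, and `event_network` are hypothetical, the identifiers are borrowed from the P2P example of FIG. 3 described later, and the timestamps are invented for the sketch.

```python
from collections import defaultdict

# Assumed event order for this sketch: event 1 (root) ... event n (leaf).
SEQUENCE = ["PR", "PO", "GR", "PI"]

# (timestamp, identifiers carried by the log). Timestamps are hypothetical.
LOGS = [
    ("2018-01-02", {"PR": "10070"}),
    ("2018-01-03", {"PR": "10080"}),
    ("2018-01-04", {"PO": "42580", "PR": "10070"}),
    ("2018-01-05", {"PO": "42580", "PR": "10080"}),
    ("2018-01-06", {"PO": "42761", "PR": "10080"}),
    ("2018-01-08", {"GR": "90017", "PO": "42580"}),
    ("2018-01-09", {"GR": "89705", "PO": "42580"}),
    ("2018-01-10", {"GR": "89158", "PO": "42761"}),
    ("2018-01-12", {"PI": "32688", "GR": "90017"}),
    ("2018-01-13", {"PI": "32688", "GR": "89705"}),
    ("2018-01-14", {"PI": "32079", "GR": "89158"}),
]


def build_nodes(logs):
    """Map each (event type, identifier) pair to the event IDs that mention it."""
    nodes = defaultdict(set)
    ordered = sorted(logs, key=lambda log: log[0])  # chronological order
    for event_id, (_, ids) in enumerate(ordered, start=1):
        for event_type, identifier in ids.items():
            nodes[(event_type, identifier)].add(event_id)
    return dict(nodes)


def event_network(leaf, nodes):
    """Walk from a leaf node back to the root level via shared event IDs."""
    network, links, frontier = {leaf}, set(), {leaf}
    for level in range(len(SEQUENCE) - 1, 0, -1):
        previous_type = SEQUENCE[level - 1]
        next_frontier = set()
        for node in frontier:
            for candidate, event_ids in nodes.items():
                if candidate[0] == previous_type and event_ids & nodes[node]:
                    links.add((candidate, node))
                    next_frontier.add(candidate)
        network |= next_frontier
        frontier = next_frontier
    return network, links


NODES = build_nodes(LOGS)
LEAVES = [node for node in NODES if node[0] == SEQUENCE[-1]]
NETWORKS = {leaf: event_network(leaf, NODES)[0] for leaf in LEAVES}
# Nodes shared by the two event networks (the area of overlap).
OVERLAP = NETWORKS[("PI", "32688")] & NETWORKS[("PI", "32079")]
```

Under these assumptions the two event networks share the node for purchase request 10080, which is the kind of overlap a visual representation could highlight.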

[0030] In some implementations, each event log associated with an event other than event 1 or event n can include two or more event identifiers. In some implementations, the set of event logs can include a set of n tables, each table corresponding to a different event of the set of n events, the set of n tables being referentially constrained with respect to each other. In some implementations, the instructions can further cause the processor to: prior to the assigning, sort the set of event logs into a single table in chronological order based on the referential constraints between the set of n tables.

[0031] All combinations of the foregoing concepts and additional concepts are discussed in greater detail below (provided such concepts are not mutually inconsistent) and are part of the inventive subject matter disclosed herein. In particular, all combinations of claimed subject matter appearing at the end of this disclosure are part of the inventive subject matter disclosed herein. The terminology used herein that also may appear in any disclosure incorporated by reference should be accorded a meaning most consistent with the particular concepts disclosed herein.

BRIEF DESCRIPTIONS OF THE DRAWINGS

[0032] The skilled artisan will understand that the drawings primarily are for illustrative purposes and are not intended to limit the scope of the inventive subject matter described herein. The drawings are not necessarily to scale; in some instances, various aspects of the inventive subject matter disclosed herein may be shown exaggerated or enlarged in the drawings to facilitate an understanding of different features. In the drawings, like reference characters generally refer to like features (e.g., functionally similar and/or structurally similar elements).

[0033] FIG. 1 illustrates an example P2P process.

[0034] FIG. 2 illustrates the many to many relationships of entities/events involved in a P2P process.

[0035] FIG. 3 illustrates the relationships among entity instances that could be stored in a schema disclosed herein.

[0036] FIG. 4 illustrates data divergence in an event network discovered by traditional methods for discovering event networks.

[0037] FIG. 5 illustrates data convergence in an event network discovered by traditional methods for discovering event networks.

[0038] FIG. 6 illustrates an example event network identified and/or generated by the systems and methods described herein.

[0039] FIG. 7 illustrates a method for identifying and generating event networks from event data.

[0040] FIG. 8 illustrates identification of nodes and relationships in a multi-level process using the systems and methods described herein.

[0041] FIG. 9 illustrates an example event network generated using the systems and methods described herein.

[0042] FIG. 10A illustrates a first part of an example frequency model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0043] FIG. 10B illustrates a second part of the example frequency model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0044] FIG. 11A illustrates a first part of an example performance model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0045] FIG. 11B illustrates a second part of the example performance model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0046] FIG. 12A illustrates a first part of an example cost model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0047] FIG. 12B illustrates a second part of the example cost model obtained by applying the systems and methods described herein on a P2P process for an automotive company.

FIG. 13A illustrates a first part of an example process model (rework model) obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0048] FIG. 13B illustrates a second part of the example process model (rework model) obtained by applying the systems and methods described herein on a P2P process for an automotive company.

[0049] FIG. 14 illustrates an example automotive P2P process in order to show the advantages offered by the systems and methods described herein.

[0050] FIG. 15A illustrates a frequency view model derived for the case in FIG. 14 using the systems and methods.

[0051] FIG. 15B illustrates a cost view model derived for the case in FIG. 14 using the systems and methods.

[0052] FIG. 15C illustrates a rework and automation model derived for the case in FIG. 14 using the systems and methods.

[0053] FIG. 16 illustrates event networks derived for the case in FIG. 14 using a traditional approach by selecting invoice as the process identifier.

[0054] FIG. 17 illustrates event networks derived for the case in FIG. 14 using a traditional approach by selecting purchase order as the process identifier.

[0055] FIG. 18A illustrates a first part of an event network generated by a simulation engine described herein to simulate multi-level processes for an industrial automotive P2P process.

[0056] FIG. 18B illustrates a second part of the event network generated by a simulation engine described herein to simulate multi-level processes for an industrial automotive P2P process.

[0057] FIG. 19 illustrates details of an invoice registered event in an example simulated model.

[0058] FIG. 20 illustrates an example simulated process and the relationships between the entities.

[0059] FIG. 21A illustrates a first event network for the simulated case.

[0060] FIG. 21B illustrates a second event network for the simulated case.

[0061] FIG. 21C illustrates a third event network for the simulated case.

DETAILED DESCRIPTION

[0062] Systems and methods for discovering event networks from event data are described herein. Event networks are derived from event data that includes information about events.

[0063] An event can represent any suitable activity (e.g., messaging), transaction (e.g., a trade), and/or other data (e.g., patient data) in a system that processes data electronically. All events that are to be assigned to a transaction (e.g., a business transaction) can form a case; that is, a case is a related set of events. Put differently, a case can be a trace (i.e., a sequence of events). Each case can be identified by its individual case identifier (case ID). The case ID is a representation of a case. It is unique to every case and can serve to uniquely identify the case. A case ID can be alphanumeric, i.e., it can include a combination of letters and numbers.

[0064] Each event can have an associated case ID, a time stamp, and other attributes. The time stamps can correspond to logged start and end time of the event. Events are listed together with their attributes in an event log. Attributes include the case ID, the time stamps, and other attributes of the event (e.g., relationship attributes that represent a relation/link between two or more events, etc.) recorded by the system. An event log can represent one or more cases of a process. A process can be considered to be a set of linked activities or events, aimed to deliver a desirable output. A process instance can identify an instance of a process that corresponds to the lifecycle of a single entity. A process instance identifier can refer to the instance of the entity.
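As a non-limiting sketch, an event log entry with a case ID, start and end time stamps, and further attributes could be represented as below, with cases recovered as chronologically ordered traces. The class `Event`, the function `traces`, and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    case_id: str            # alphanumeric case identifier
    activity: str           # name of the recorded activity
    start: datetime         # logged start time
    end: datetime           # logged end time
    attributes: dict = field(default_factory=dict)  # e.g., relationship attributes


def traces(event_log):
    """Group events by case ID and order each case chronologically."""
    cases = defaultdict(list)
    for event in event_log:
        cases[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(events, key=lambda e: e.start)]
        for case_id, events in cases.items()
    }


# Hypothetical log; the entries are deliberately not in chronological order.
LOG = [
    Event("C1", "Order Creation", datetime(2018, 1, 4, 12, 0), datetime(2018, 1, 4, 12, 5)),
    Event("C1", "Request Creation", datetime(2018, 1, 2, 9, 0), datetime(2018, 1, 2, 9, 10)),
    Event("C2", "Request Creation", datetime(2018, 1, 3, 8, 0), datetime(2018, 1, 3, 8, 15)),
]
TRACES = traces(LOG)
```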

[0065] Process mining is the visualization and analysis of processes on the basis of event logs using algorithms and mathematical procedures. Put differently, process mining provides a set of tools and/or techniques that has the potential to utilize stored activity/event data to discover process models, also referred to as event networks, for subsequent analysis. Event logs can be created by data preparation from available process data/initial raw data in order to enable process mining.

[0066] Process mining can enable discovery, conformance checking, and improvement of processes by linking event logs with process models/event networks. The resulting event networks can represent real world process flow based on actual event data, rather than on assumptions or tacit knowledge. Event networks can configure and improve software systems that model and support real world process flows.

[0067] Technological Problems in the Field of Process Mining

[0068] An unresolved technological problem in process mining is the modeling of multi-level processes. A multi-level process includes one or more complex processes involving events linked by many-to-many relationships. Examples of such processes in a real-world setting are the P2P (Purchase to Pay) process and the O2C (Order to Cash) process.

[0069] Standard process mining approaches do not address the problems of data convergence and data divergence. Data divergence can occur when several events of the same type are related to a single case, i.e., a single trace of events from beginning to end. Data convergence can occur when one event is related to multiple cases. Event networks generated for multi-level process models using standard process mining approaches often introduce new self-loops that are incorrect (data divergence problem). This increases the complexity and ambiguity of analyzing the processes using these generated event networks. Additionally, in some cases, the generated event networks erroneously duplicate events (data convergence problem). This can bias the analysis of processes using these generated event networks.

[0070] These are technological problems that are rooted in the technical field of process mining since processes that need to be analyzed and monitored by companies almost always include a large volume of event data/logs (e.g., tens of thousands of events), and the resulting event networks, when accurately determined, can be extremely intricate. Given the size and scope of this event data, generating event networks from such data manually is, for all practical purposes, impossible.

[0071] Process mining techniques should be able to organize and sort such vast quantities of event data, and then use the organized data to accurately generate the complex event networks that give rise to them. Furthermore, these techniques should be able to provide valuable insights based on the data while simultaneously addressing the data divergence and data convergence problems for multi-level processes.

[0072] An example of a multi-level process is a P2P process, although the considerations discussed here can be applicable to any multi-level process. FIG. 1 illustrates the structure of an example P2P process 102 solely for purposes of explanation using an everyday scenario, where the starting event is the creation of a Purchase Request 104 and the end event is the issuance of the Purchase Invoice 110. As seen in FIG. 1, the P2P process 102 includes the events purchase request 104, purchase order 106, goods receipt 108, and purchase invoice 110.

[0073] FIG. 2 illustrates the many-to-many relationships (represented by asterisks (*) in FIG. 2) of entities involved in a P2P process. As seen in FIG. 2, purchase request 204 is followed by the event purchase order 206. There can be a many-to-many relationship between these two events. Put differently, two instances of the event purchase request 204 can be associated with three different instances of the event purchase order 206. For example, two purchase requests can be invoked and sent by a company to three different vendors, thereby linking two instances of the event purchase request 204 with three instances of purchase order 206. Purchase order 206 is followed by goods receipt 208 and then by purchase invoice 210. In a similar manner, there can be many-to-many relationships between purchase order 206 and goods receipt 208, and between goods receipt 208 and purchase invoice 210. The sequence of events for the P2P process is, therefore, purchase request 204 - purchase order 206 - goods receipt 208 - purchase invoice 210.

[0074] Now consider a P2P process case that involves the following events: a) 2 Purchase Requests (i.e., 2 instances of purchase request 104/204 in FIG. 1 and FIG. 2), b) 2 Purchase Orders (i.e., 2 instances of purchase order 106/206 in FIG. 1 and FIG. 2), c) 3 Goods Receipts (i.e., 3 instances of goods receipt 108/208 in FIG. 1 and FIG. 2), and d) 2 Purchase Invoices (i.e., 2 instances of purchase invoice 110/210 in FIG. 1 and FIG. 2). Each event/entity can refer to the same P2P process case but can have its own lifecycle.

[0075] Typically, ERP (Enterprise Resource Planning) systems like SAP® (Systems, Applications, and Products), JD Edwards©, Oracle®, Infor, etc. are entity oriented, not process oriented, so the events are stored in relational tables linked by reference constraints.

[0076] Consider data for the example P2P process that is stored as illustrated in the tables below. The tables represent different event types. For instance, table 1 represents purchase request 104 in FIG. 1 and/or purchase request 204 in FIG. 2. The recorded event in this case is the request creation. Table 2 represents purchase order 106 in FIG. 1 and/or purchase order 206 in FIG. 2. The recorded event in this case is the order creation. Table 3 represents goods receipt 108 in FIG. 1 and/or goods receipt 208 in FIG. 2. The recorded event is the registration of goods receipt. Table 4 represents purchase invoice 110 in FIG. 1 and/or purchase invoice 210 in FIG. 2. The recorded event is the registration of the invoice sent by the supplier.

[0077] Each row in every table represents an instance of that event type. Put differently, each row in the table represents an instance of the recorded event associated with that table. For example, every row in table 1 represents an instance of the event type purchase request 104 in FIG. 1 and/or purchase request 204 in FIG. 2. Therefore, every time a request creation is recorded, table 1 can store an ID associated with the request creation (i.e., PR ID), a row number associated with the request creation (i.e., PR ROW), and a timestamp associated with the request creation.

[0078] In a P2P process, such as P2P process 102 in FIG. 1, purchase request 104 is followed by the event purchase order 106. Therefore, purchase order 106 is linked to the purchase request 104. This information can be captured in the tables. Put differently, the tables can also include information to represent an association with different entity types. For example, in table 2, the columns PR ID and PR ROW link an instance of purchase order to a specific instance of purchase request by linking table 2 back to table 1. Put differently, every time an order creation is recorded, table 2 can store an ID associated with the order creation (i.e., PO ID), a row number associated with the order creation (i.e., PO ROW), and a timestamp associated with the order creation. The table 2 also links that order creation to its corresponding request creation via PR ROW and PR ID. Therefore, an instance of the purchase order 106 can be linked back to an instance of the purchase request 104.

[0079] In a similar manner, as seen in FIG. 1, purchase order 106 is followed by goods receipt 108, linking goods receipt 108 to purchase order 106. This information can be captured by the columns PO ID and PO ROW in table 3, which link the corresponding instances of goods receipt back to an instance of purchase order in table 2. Goods receipt 108 is followed by purchase invoice 110, linking goods receipt 108 to purchase invoice 110. This information can be captured by the columns GR ID and GR ROW in table 3, which link the corresponding instances of purchase invoice in table 4 back to the corresponding instances of goods receipt in table 3.

PURCHASE REQUEST TABLE

TABLE 1

PURCHASE ORDER TABLE

TABLE 2

GOODS RECEIPT TABLE

TABLE 3

INVOICE TABLE

TABLE 4

[0080] FIG. 3 illustrates a graphical view of the relationships among entity instances that could be stored in a schema as shown in the example above. That is, consider an example of entity instances stored in table 1, table 2, table 3, and table 4. A graphical view of the relationships among these entity instances is illustrated in FIG. 3.

[0081] For example, consider that two instances of purchase request are stored in table 1. The PR IDs for these instances in this example are PR 10070 (represented as 304a in FIG. 3) and PR 10080 (represented as 304b in FIG. 3). In this example, purchase request 304a is associated with one instance of purchase order (stored in table 2). Purchase request 304a is linked to PO 42580 (represented as 306a in FIG. 3). Purchase request 304b is linked to two instances of purchase order (stored in table 2). Purchase request 304b is linked to PO 42580 (represented as 306a in FIG. 3) and PO 42761 (represented as 306b in FIG. 3).

[0082] Purchase order 306a is associated with two instances of goods receipt (stored in table 3). Purchase order 306a is linked to GR 90017 (represented as 308a in FIG. 3) and GR 89705 (represented as 308b in FIG. 3). Purchase order 306b is associated with one instance of goods receipt (stored in table 3). Purchase order 306b is linked to GR 89158 (represented as 308c in FIG. 3).

[0083] Goods receipt 308a is associated with one instance of purchase invoice (stored in table 4). Goods receipt 308a is linked to PI 32688 (represented as 310a in FIG. 3). Goods receipt 308b is also associated with one instance of purchase invoice (stored in table 4). Goods receipt 308b is linked to PI 32688 (represented as 310a in FIG. 3). Goods receipt 308c is associated with another instance of purchase invoice (stored in table 4). Goods receipt 308c is linked to PI 32079 (represented as 310b in FIG. 3).

[0084] Table 1, table 2, table 3, and table 4, as well as FIG. 1, FIG. 2, and FIG. 3 illustrate an example of a complex multi-level P2P process.
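Purely as an illustrative sketch, the referentially constrained tables described above could be modeled with an in-memory SQLite database, and the links could be followed from each invoice back to its purchase requests with a chain of joins. The table and column names, the row numbers, and the timestamps are assumptions made for this sketch; only the entity IDs and their links are taken from the FIG. 3 example. In particular, placing the GR ID and GR ROW reference columns in the invoice table is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE purchase_request (
    pr_id TEXT, pr_row INTEGER, ts TEXT,
    PRIMARY KEY (pr_id, pr_row));
CREATE TABLE purchase_order (
    po_id TEXT, po_row INTEGER, ts TEXT, pr_id TEXT, pr_row INTEGER,
    PRIMARY KEY (po_id, po_row),
    FOREIGN KEY (pr_id, pr_row) REFERENCES purchase_request (pr_id, pr_row));
CREATE TABLE goods_receipt (
    gr_id TEXT, gr_row INTEGER, ts TEXT, po_id TEXT, po_row INTEGER,
    PRIMARY KEY (gr_id, gr_row),
    FOREIGN KEY (po_id, po_row) REFERENCES purchase_order (po_id, po_row));
CREATE TABLE purchase_invoice (
    pi_id TEXT, pi_row INTEGER, ts TEXT, gr_id TEXT, gr_row INTEGER,
    PRIMARY KEY (pi_id, pi_row),
    FOREIGN KEY (gr_id, gr_row) REFERENCES goods_receipt (gr_id, gr_row));
""")

# Row numbers and timestamps below are hypothetical.
conn.executemany("INSERT INTO purchase_request VALUES (?, ?, ?)", [
    ("10070", 1, "2018-01-02"), ("10080", 1, "2018-01-03"), ("10080", 2, "2018-01-03")])
conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?, ?, ?)", [
    ("42580", 1, "2018-01-04", "10070", 1),
    ("42580", 2, "2018-01-04", "10080", 1),
    ("42761", 1, "2018-01-06", "10080", 2)])
conn.executemany("INSERT INTO goods_receipt VALUES (?, ?, ?, ?, ?)", [
    ("90017", 1, "2018-01-08", "42580", 1),
    ("89705", 1, "2018-01-09", "42580", 2),
    ("89158", 1, "2018-01-10", "42761", 1)])
conn.executemany("INSERT INTO purchase_invoice VALUES (?, ?, ?, ?, ?)", [
    ("32688", 1, "2018-01-12", "90017", 1),
    ("32688", 2, "2018-01-12", "89705", 1),
    ("32079", 1, "2018-01-14", "89158", 1)])

# Follow the reference columns from each invoice back to its purchase requests.
INVOICE_TO_REQUEST = conn.execute("""
SELECT DISTINCT pi.pi_id, pr.pr_id
FROM purchase_invoice AS pi
JOIN goods_receipt AS gr ON gr.gr_id = pi.gr_id AND gr.gr_row = pi.gr_row
JOIN purchase_order AS po ON po.po_id = gr.po_id AND po.po_row = gr.po_row
JOIN purchase_request AS pr ON pr.pr_id = po.pr_id AND pr.pr_row = po.pr_row
ORDER BY pi.pi_id, pr.pr_id
""").fetchall()
```

In this sketch, invoice 32688 traces back to both purchase requests, while invoice 32079 traces back only to purchase request 10080.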

[0085] For multi-level processes, process mining should solve the problem of data convergence and the problem of data divergence, described in greater detail below. Briefly, in the scenario of data convergence, one event is related to multiple cases, i.e., is part of multiple traces. In the scenario of data divergence, several events of the same type are related to a single case. The problems of data convergence and data divergence can complicate case identification and should be solved in order to avoid the generation of biased statistics.

[0086] Standard Process Mining Approach

[0087] This section describes the technological problems of data divergence and data convergence that occur when event networks are created from the event log of the example shown above using the standard approach of process mining.

[0088] Data Divergence Problem

[0089] Data divergence occurs when a case is related to multiple events of the same event type. More specifically, due to data divergence, standard process mining techniques that generate event networks for multi-level processes often generate incorrect self-loops. This makes the event networks incorrect, resulting in downstream errors upon further analysis.

[0090] Consider the purchase order 306a in FIG. 3 (i.e., purchase order with order ID 42580) as a case ID to define the P2P process. The trace for a case using this case ID can be seen below:

42580 : PO1 : Order Creation : 04/01/2018 12:00 : Lucas

TABLE 5

[0091] As seen, the purchase order 306a (order ID 42580) has two Goods Receipt events GR1 (with GR ID 90017 represented as 308a in FIG. 3) and GR2 (with GR ID 89158 represented as 308c in FIG. 3) and two Invoice Registration events PI1 (with PI ID 32688 represented as 310a in FIG. 3) and PI2 (with PI ID 32079 represented as 310b in FIG. 3).

[0092] FIG. 4 illustrates a process model 400 obtained by applying traditional process mining to this trace. In FIG. 4, two goods receipts 408 refer to different invoices 410, both of which are related to the purchase order 404, but are not related to each other. The self-loops (self-loop 414a and self-loop 414b) on Goods Receipt 408 and Invoice Registration 410 are misleading because they indicate a rework on the same activity. Rework indicates repeated activity. For instance, it indicates repeated activity of invoice registration 410 by changing minor details after the event has already been created. When an invoice is not registered correctly the first time, the activity might need to be repeated. The complexity and ambiguity of the process model increase when more goods receipts and invoices are linked to the case, as the divergence problem introduces new loops.
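The origin of the misleading self-loops can be sketched, in a non-limiting manner, with a simple directly-follows count over the single-case trace. The function name `directly_follows` is hypothetical, and the ordering of the trace (both goods receipts before both invoice registrations) is an assumption made for this sketch.

```python
from collections import Counter


def directly_follows(trace):
    """Count how often one activity is immediately followed by another."""
    return Counter(zip(trace, trace[1:]))


# Assumed chronological trace for case ID 42580 (PO1, GR1, GR2, PI1, PI2).
TRACE = [
    "Order Creation",
    "Goods Receipt",
    "Goods Receipt",
    "Invoice Registration",
    "Invoice Registration",
]

DFG = directly_follows(TRACE)
# An edge from an activity to itself is drawn as a self-loop in the model.
SELF_LOOPS = sorted(a for (a, b) in DFG if a == b)
```

Here the two distinct goods receipts and the two distinct invoices collapse onto single activities, so the model shows self-loops even though no rework took place.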

[0093] Data Convergence Problem

[0094] Data convergence occurs when one event is related to multiple cases. More specifically, due to data convergence, standard process mining techniques that generate event networks for multi-level processes often duplicate events. This can bias, and render erroneous, the analysis resulting from analyzing such event networks.

[0095] Consider purchase order as a case ID to define the P2P process. For the purchase order 306a in FIG. 3 (i.e., purchase order with order ID 42580), and purchase order 306b in FIG. 3 (i.e., purchase order with order ID 42761), using purchase order 306a and purchase order 306b as the case ID, the trace can be seen below:

TABLE 6

[0096] As seen, the receipt event of the invoice with PI ID 32688 represented as 310a in FIG. 3, which is related to two different purchase orders (e.g., the purchase order 306a in FIG. 3 (i.e., purchase order with order ID 42580), and purchase order 306b in FIG. 3 (i.e., purchase order with order ID 42761)) is considered twice.

[0097] FIG. 5 illustrates a process model 500 obtained by using traditional process mining techniques using this trace. As seen in FIG. 5, the event invoice receipt PI1 (i.e., invoice receipt 510) is considered to be two different events. This data convergence problem can lead to extracting duplicate events and biased statistics.
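A minimal, non-limiting sketch of how flattening an event log by a single case definition duplicates a shared event is given below. The dictionary `EVENTS` and the function `flatten` are hypothetical; the relation of invoice PI1 to both purchase orders follows the example above.

```python
from collections import Counter

# event label -> (activity, purchase orders the event relates to)
EVENTS = {
    "PO1": ("Order Creation", ["42580"]),
    "PO2": ("Order Creation", ["42761"]),
    "PI1": ("Invoice Receipt", ["42580", "42761"]),
}


def flatten(events):
    """Emit one row per (case, event) pair, as a single-case-ID log would."""
    rows = []
    for label, (activity, case_ids) in events.items():
        for case_id in case_ids:
            rows.append((case_id, label, activity))
    return rows


ROWS = flatten(EVENTS)
COPIES = Counter(label for _, label, _ in ROWS)
DUPLICATED = [label for label, count in COPIES.items() if count > 1]
# Frequency of the activity as it would be reported from the flattened log.
INVOICE_FREQUENCY = sum(1 for _, _, activity in ROWS if activity == "Invoice Receipt")
```

The single invoice receipt is counted twice in the flattened log, which is the kind of bias the convergence problem introduces into frequency statistics.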

[0098] Existing Approaches to Data Convergence and Data Divergence

[0099] Choosing different notions of a case for the process (i.e., different case definitions) has been proposed as a solution to the divergence and convergence problem. Put differently, the existing approaches consider a case as a sequence of activities that are related to the lifecycle of a single entity. Using this approach, a case is uniquely identified by the case ID of the entity whose lifecycle is described by the process. This solution might solve the divergence and convergence problems partially, but not completely (as discussed below).

[00100] Again employing the P2P example of FIGS. 1-3, choosing the invoice as the case definition (i.e., the case is identified by the case ID of the entity invoice receipt), the many-to-many relation between the invoices and purchase orders causes the resulting event log to still suffer from divergence and convergence. Choosing invoice as the case definition would duplicate the events related to the purchase order. Choosing the goods receipt as the case definition (i.e., the case is identified by the case ID of the entity goods receipt) could eventually solve the divergence problem, but worsen the convergence problem, since purchase order PO1 is considered as an event for both GR1 and GR2.

[00101] Furthermore, it can also be very difficult for the traditional approaches to define or to retrieve an optimal definition of a case from all possible case definitions found in relational data. This is because ERP systems store a large number of entities related to a process, and the relational structure allows each entity to have its own lifecycle which may or may not be related to other entities.

[00102] Moreover, as ERP systems support the operation of an entire organization with dozens of large processes, it can be almost impossible to describe the behavior of a system using only one case definition.

[00103] Rather, the lifecycle of a single entity offers only a partial view of the business process. A business process can have multiple partial views (e.g., the P2P process can be seen from the point of view of the Order or of another entity), but no single view can describe by itself the way the organization is working.

[00104] To summarize, existing approaches of process mining consider a case as a sequence of activities related to the lifecycle of a single entity. Using this approach, a case is uniquely identified by the case ID of the entity whose lifecycle is described by the process. Therefore, a process variant can be identified by the hash value of the activity sequence. Unfortunately, as discussed above, this approach fails if applied to a multi-level process, due to data divergence/convergence problem.
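As a non-limiting sketch of identifying a process variant by the hash value of its activity sequence, the following uses SHA-256 from the Python standard library. The function name `variant_id` and the choice of hash function and separator are assumptions; the source does not specify them.

```python
import hashlib


def variant_id(activities):
    """Hash an ordered activity sequence into a variant identifier."""
    # The unit separator keeps activity names from running together.
    joined = "\x1f".join(activities)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


VARIANT_A = variant_id(["Request Creation", "Order Creation", "Goods Receipt", "Invoice Receipt"])
VARIANT_B = variant_id(["Request Creation", "Order Creation", "Goods Receipt", "Invoice Receipt"])
VARIANT_C = variant_id(["Request Creation", "Order Creation", "Invoice Receipt"])
```

Two cases with the same activity sequence receive the same identifier, while a case with a different sequence receives a different one.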

[00105] Multi-level Process Mining Solution

The multi-level process mining solution described herein provides one or more technological improvements over existing process mining methodologies. The multi-level process solution described herein begins with rethinking the concept of a case for a multi-level process. Using this solution, the case can be dynamically identified by discovering the correlation among the entities. The entities can be mapped in the event log according to their mutual relationships. Therefore, it is possible to build their correlation/link the entities while parsing the event log.

[00106] Referring to the example in FIG. 1, FIG. 2, and FIG. 3 (and table 1 - table 4) of the P2P process, the events, that is, purchase request (purchase request 104 in FIG. 1 and/or purchase request 204 in FIG. 2), purchase order (purchase order 106 in FIG. 1 and/or purchase order 206 in FIG. 2), goods receipt (goods receipt 108 in FIG. 1 and/or goods receipt 208 in FIG. 2), and purchase invoice (purchase invoice 110 in FIG. 1 and/or purchase invoice 210 in FIG. 2), can be mapped and analyzed. The mapping is as follows: 1) purchase request; 2) purchase order; 3) goods receipt; and 4) purchase invoice.

[00107] The mapping order corresponds to the natural sequence of the entities: a) a purchase request is created before the corresponding order; b) a purchase order is created after the corresponding request and before the goods receipt; c) a goods receipt is registered after the corresponding order and before the purchase invoice; and d) a purchase invoice is registered after the goods receipt.

[00108] Therefore, the purchase request is the root entity/event because it does not receive any incoming event from other entities. The purchase invoice is the leaf entity/event because it does not generate any event on other entities. Applying a mapping rule to the example in FIG. 3 and table 1 - table 4 to map the PR ID, PO ID, GR ID, and PI ID from different tables to each other, the event log resembles the following:

Participating Entities (Root Entity, Leaf Entity)

TABLE 7

[00109] The events recorded in the above event log are identified as belonging to the same case. Based on the entity mapping, the case is detected as the set of events involving the correlated entities. During the discovery phase (e.g., the process of generating and/or discovering the event network), the multi-level process solution described herein takes into account that a single event can belong to one or more cases. Therefore, such events are considered just one time to avoid data divergence and data convergence. At the end of discovery, the set of events belonging to the same process is normalized. Put differently, if the same event ID is present more than once it is reported just once, in order to obtain the right multiplicity between subprocesses, so that throughput time, resource allocation and cost allocation are correctly calculated.
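The normalization step described above can be sketched, without limitation, as reporting each event ID once while preserving the order of first occurrence. The function `normalize` and the sample event IDs are hypothetical.

```python
from collections import Counter


def normalize(events):
    """Report each event ID once, keeping the first occurrence in order."""
    seen = {}
    for event_id, activity in events:
        seen.setdefault(event_id, activity)
    return list(seen.items())


# Hypothetical events collected for one case; "e3" was reached along two paths.
COLLECTED = [
    ("e1", "Request Creation"),
    ("e2", "Order Creation"),
    ("e3", "Invoice Receipt"),
    ("e3", "Invoice Receipt"),
]

NORMALIZED = normalize(COLLECTED)
FREQUENCY = Counter(activity for _, activity in NORMALIZED)
```

After normalization the invoice receipt contributes once to frequency, throughput time, and cost figures rather than once per path.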

[00110] FIG. 6 illustrates a process model for the above example generated using the multi-level process mining solution described herein. Specifically, process variants of a multi-level process can be detected as unique sequences of different transitions. Put differently, process variants of a multi-level process can identify a unique event flow in the multi-level process. For example, in FIG. 6, there are two unique transitions from request creation 604 to order creation 606, from order creation 606 to goods receipt 608, and from goods receipt 608 to invoice receipt 610, indicating two variants for the process starting from the root entity with PR ID 10080 to the leaf entity with PI ID 32688. Referring back to FIG. 3, starting at purchase request 304b the two process variants are: 1) purchase request 304b - purchase order 306a - goods receipt 308b - purchase invoice 310a; and 2) purchase request 304b - purchase order 306b - goods receipt 308a - purchase invoice 310a.

[00111] In this manner, the multi-level process can be properly modeled and statistics (e.g., cost analysis, frequency analysis, time analysis, etc.) are not biased. Moreover, a user can see and analyze the multi-level process in a manner similar to a process with one-to-one relationships without extra effort. Therefore, an innovative aspect is the capability to present several derived processes in a single event network.

[00112] Overview of an Example System for Implementing the Multi-level Process Solution

[00113] An example system for implementing the multi-level process solution can include a computing device. Some non-limiting examples of computing devices include mobile devices and smart phones (e.g., Apple iPhone®, Samsung Galaxy®, Google Pixel®, etc.), computers (e.g., desktops, personal computers, laptops, etc.), and/or tablets and e-readers (e.g., Apple iPad®, Samsung Galaxy® Tab, Microsoft Surface®, Amazon Kindle®, etc.).

[00114] The computing device can include a processor (also referred to herein as a “controller”). In some implementations, such a processor can be a central processing unit (CPU). One form of processor is referred to as a microprocessor. CPUs use communicative circuits to pass binary encoded signals acting as instructions to enable various operations. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory (e.g., registers, cache memory, random access memory, etc.). Such communicative instructions may be stored and/or transmitted in batches (e.g., batches of instructions) as programs and/or data components to facilitate desired operations. These stored instruction codes, e.g., programs, may engage the CPU circuit components and other motherboard and/or system components to perform desired operations.

[00115] The processor can be communicatively coupled to a database. In some implementations, the computing device includes the database. The database can be stored in a memory associated with the computing device (e.g., a fixed and/or removable magnetic disk drive; a magneto-optical drive; an optical drive (e.g., Blu-ray, CD ROM/RAM/Recordable (R)/ReWritable (RW), DVD R/RW, HD DVD R/RW, etc.); an array of devices (e.g., Redundant Array of Independent Disks (RAID)); solid state memory devices (USB memory, solid state drives (SSD), etc.); other processor-readable storage mediums; and/or other devices of the like). In some implementations, the database can be stored on a cloud platform (e.g., Microsoft Azure®, Amazon® web services, IBM® cloud computing, etc.).

[00116] The processor and the database can be communicatively coupled to a display device. In some implementations, the computing device can include a display interface which acts as the display device. In some implementations, a display device separate from the computing device can be communicatively coupled to the processor.

[00117] The database can be configured to store event data associated with a process. In some implementations, event data can be in the form of one or more tables. In some implementations, event data can be event logs of a process. As seen above, an event log can include a sequence of events with each event being of a different event type than the other events. The event log can include information such as a timestamp, a case ID, a name attribute that indicates the activity, and a relationship attribute that indicates the relationship between one or more events. The case ID is a unique identifier for each event.

[00118] As seen in the event log (table 7) above, other than the root entity and the leaf entity, every other row in the event log can include two or more (two in the example shown in table 7) case IDs for each entry, indicating that the events are linked together. In some implementations (e.g., table 1 - table 4), the event log can be a set of tables with each table corresponding to a different event. The set of tables can be referentially constrained with respect to each other.

[00119] The controller communicably coupled to the database can use the event data to generate a process model for the multi-level process.

[00120] Example Method for Implementing the Multi-level Process Solution

[00121] FIG. 7 illustrates a method 800 for implementing the multi-level process solution. The following assumptions can be made by the method 800: 1) the events in the event data (e.g., event logs) need not be presented, provided, or otherwise made available in any particular order (chronological or other); and 2) there can be a finite list of possible event types an event can belong to (e.g., “Request Creation”, “Order Creation”, “Invoice Receipt”).

[00122] The method 800 can be implemented by a controller. When the event data (e.g., event logs) is a set of tables with each table corresponding to a different event, the method 800 can sort the event data into a single table in chronological order based on the referential constraints between the tables. The sorted table illustrates the relationship between different entity types. Therefore, after sorting the data into a single table, the table can include, for each entry, the event type, the timestamp, and one or more event identifiers. For instance, for an entry relating to a purchase request, the entry can include a PR ID, purchase request as the event type, and the timestamp relating to that particular purchase request. In a similar manner, for an entry relating to a purchase order, the entry can include a PO ID, purchase order as the event type, the timestamp relating to that purchase order, and a PR ID linking that purchase order entry to a specific purchase request.
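The sorting step described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the field names (`pr_id`, `po_id`, `timestamp`), the sample rows, and the helper names `to_events` and `merge_tables` are all assumptions introduced for the example.

```python
from datetime import datetime

# Hypothetical per-entity tables; field names and values are illustrative only.
purchase_requests = [
    {"pr_id": "PR1", "timestamp": "2020-01-02T09:00:00"},
]
purchase_orders = [
    {"po_id": "PO1", "pr_id": "PR1", "timestamp": "2020-01-03T10:30:00"},
]

def to_events(rows, event_type, id_fields):
    """Turn the rows of one table into entries with a shared layout."""
    return [
        {
            "event_type": event_type,
            "timestamp": datetime.fromisoformat(row["timestamp"]),
            # Keep the row's own identifier plus any foreign keys it carries.
            "ids": {field: row[field] for field in id_fields if field in row},
        }
        for row in rows
    ]

def merge_tables(tables):
    """Concatenate all tables and sort the entries chronologically."""
    events = [
        event
        for rows, event_type, id_fields in tables
        for event in to_events(rows, event_type, id_fields)
    ]
    return sorted(events, key=lambda event: event["timestamp"])

# The input order is deliberately not chronological.
single_table = merge_tables([
    (purchase_orders, "purchase order", ["po_id", "pr_id"]),
    (purchase_requests, "purchase request", ["pr_id"]),
])
```

In this sketch the purchase order entry keeps both its own PO ID and the PR ID that ties it to a purchase request, which is the information the later linking steps rely on.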

[00123] Once the event log is sorted into a single table, at 804, a controller can assign a case ID and/or an event identifier to each entry of the event log. The case IDs can be assigned based on timestamp information associated with each entry.

[00124] At 806, the method 800 can generate a set of nodes based on the event logs and the event identifiers of each entry of the event logs. More specifically, the method 800 can identify combinations of event type and the event identifiers. Each unique combination can be associated with a node. For example, a unique combination of a PR ID and purchase request type would generate a node. The set of nodes can include at least one root node that is associated with a root entity and at least one leaf node that is associated with a leaf entity.
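As a rough sketch of step 806, the nodes can be obtained by de-duplicating the (event type, event identifier) pairs of the sorted entries. The entries below and the function name `build_nodes` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical sorted entries, each reduced to (event type, event identifier).
entries = [
    ("purchase request", "PR1"),
    ("purchase order", "PO1"),
    ("purchase order", "PO1"),  # a second event on the same purchase order
    ("goods receipt", "GR1"),
]

def build_nodes(entries):
    """Return one node per unique (event type, identifier) pair, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeated pairs.
    return list(dict.fromkeys(entries))

nodes = build_nodes(entries)
```

Two events that refer to the same purchase order collapse into a single node, so the node count reflects entities rather than events.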

[00125] At 808, the method 800 can generate a process model and/or an event network for each leaf node. In order to generate a process model, the method 800 begins with a leaf node. The method 800 then links the leaf node to one or more nodes that have the same event identifier as the leaf node. For instance, consider a leaf node of event type purchase invoice with an event identifier PI 1. Consider that an entry of goods receipt includes an event identifier GR 1, a timestamp associated with that entry, and the event identifier PI 1 linking the purchase invoice PI 1 to the goods receipt GR 1. A node of the event type goods receipt with event identifier GR 1 is generated for this entry. Since the leaf node and the node with event identifier GR 1 have the same event identifier PI 1, the method 800 can link the leaf node to the node with the event identifier GR 1.

[00126] At 810, the method 800 can iteratively link each of the nodes, starting from the leaf node, to other nodes with which the particular node shares an event identifier, until the root node is reached. For instance, the method 800 can start from a node of the type purchase invoice. The method 800 can link this node to a node of the type goods receipt provided both these nodes share a common event identifier. The method 800 can then link the node of the type goods receipt to a node of the type purchase order provided both these nodes share a common event identifier. The method 800 can then link the node of the type purchase order to a node of the type purchase request provided both these nodes share a common event identifier. If the node of the type purchase request is a root node, the iterative process can end here.
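The leaf-to-root linking can be illustrated with a small sketch. It assumes a fixed ordering of entity types from root to leaf and represents each node by the set of identifiers carried by its entries; the `LEVELS` list, the sample identifiers, and the function `paths_to_root` are assumptions made for this example rather than part of the described method.

```python
# Entity types ordered from root to leaf (assumed ordering for a P2P process).
LEVELS = ["purchase request", "purchase order", "goods receipt", "purchase invoice"]

# Hypothetical nodes: (event type, own identifier) -> identifiers carried by its entries.
nodes = {
    ("purchase request", "PR1"): {"PR1"},
    ("purchase order", "PO1"): {"PO1", "PR1"},
    ("goods receipt", "GR1"): {"GR1", "PO1", "PI1"},
    ("purchase invoice", "PI1"): {"PI1"},
}

def paths_to_root(leaf, nodes):
    """Walk from a leaf towards the root, linking nodes that share an identifier."""
    paths = []

    def walk(node, path):
        level = LEVELS.index(node[0])
        if level == 0:
            # Root reached: store the chain in root-to-leaf order.
            paths.append(path[::-1])
            return
        parent_type = LEVELS[level - 1]
        for candidate, identifiers in nodes.items():
            # Link only nodes of the preceding type with a common identifier.
            if candidate[0] == parent_type and identifiers & nodes[node]:
                walk(candidate, path + [candidate])

    walk(leaf, [leaf])
    return paths

chains = paths_to_root(("purchase invoice", "PI1"), nodes)
```

Each returned chain runs from a root node to the chosen leaf node; a leaf with several upstream branches would yield several chains.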

[00127] A process model and/or an event network is this chain of linked nodes that includes at least one leaf node, at least one root node, and all the nodes that are linked therebetween. Put differently, an event network can be generated by starting from a leaf node and traversing all nodes linked therebetween until a root node. In some implementations, two or more process models can have one or more nodes in common. At 810, the process models can be displayed on a display device.

[00128] Example Event Networks Discovered by the Multi-level Process Solution

[00129] Data preparation of event logs can include extracting and transforming source data by an Extract, Transform, Load (ETL) procedure to generate an event log that can be imported into a system that performs process mining. In the example solution described herein, the data preparation process is greatly simplified because the solution can recognize the correlated entities, and therefore the correlated sub-processes, directly from the relational tables.

[00130] Therefore, the event log of a multi-level process can be automatically obtained through the composition of the event logs of each single process, according to the relational tables and the multiplicity of the relationships.

[00131] For the example in FIG. 2 and FIG. 3, the traces can be extracted for the four flat processes. The denormalization (same event type repeated in the log of the flat process) can be directly obtained by the foreign keys of the relational model. The traces are shown below:

TABLE 8

PURCHASE ORDER LOG

TABLE 9

TABLE 10

PURCHASE INVOICE LOG

TABLE 11

[00132] Starting from the flat traces, the solution described herein (i.e., method 800 in FIG. 7) can automatically sort the rows in chronological order and use the relationships to identify the subprocesses in a transparent way. The solution does not require any explicit action on the user’s part to map the business entities and their relationships. The solution produces the resulting trace of the multi-level process as illustrated below.

TABLE 12

[00133] As shown above, the table 12 includes entities each having its own lifecycle while participating in one or more variants of the multi-level process.

[00134] As disclosed herein, the term “process instance” refers to identifying the lifecycle of each instance of a process. The above example therefore includes: a) 2 Purchase Request process instances -> Process Identifiers: 10070, 10080; b) 2 Purchase Order process instances -> Process Identifiers: 42580, 42761; c) 3 Goods Receipt process instances -> Process Identifiers: 89158, 89705, 90017; and d) 2 Purchase Invoice process instances -> Process Identifiers: 32079, 32688.

[00135] As discussed in method 800, the relationship model between process instances can be generated in an iterative manner from the chronologically-sorted events. FIG. 8 illustrates the identification of nodes and relationships for the above example using the method 800 in FIG. 7.

[00136] As discussed above, the nodes can be identified as a pair of event type and their respective process instance identifier. For example, node 904a is identified as a combination of the event type “request creation” and process instance identifier “10070”, which is a unique combination. Similarly, node 904b is identified as a combination of the event type “request creation” and process instance identifier “10080.” Node 906a is identified as a combination of the event type “order creation” and process instance identifier “42580.” Node 906b is identified as a combination of the event type “order creation” and process instance identifier “42761.”

[00137] The relations of the process model can be generated whenever an event can be assigned to multiple process instances (e.g., has multiple process identifiers). More specifically, when a node associated with an event type has the same identifier as another node associated with a different event type (both the event types being consecutive events), these nodes can be linked. For example, the event with event index 2 from table 12 has two process identifiers, “10070” and “42580.” Therefore, the node 906a associated with the event type “order creation” has two process identifiers, “10070” and “42580.” The node 904a associated with the event type “request creation” has event identifier “10070.” This is the event with event index 0 in table 12. Therefore, node 906a can be linked to node 904a since both the nodes have a common event identifier “10070.” In a similar manner, node 906a can be linked to node 904b and node 906b can be linked to node 904b.

[00138] The value assigned to each node can represent the list of event indexes that have been matched to that respective node. For instance, from table 12, process identifier “10070” is associated with event indexes “0” and “2.” Therefore, the values assigned to node 904a include “0” and “2.” In a similar manner, process “10080” is associated with event indexes “1,” “3,” and “4.” Therefore, the values assigned to node 904b include values “1,” “3,” and “4.”
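The matching of event indexes to process identifiers can be sketched as below. Only rows 0 and 2 follow what the text states about table 12; rows 1, 3, and 4 are assumed here purely so that the example reproduces the index lists quoted above, and the function name `index_by_identifier` is hypothetical.

```python
from collections import defaultdict

# (event type, process identifiers) in chronological order.
# Rows 0 and 2 follow the text; rows 1, 3 and 4 are assumed for illustration.
events = [
    ("request creation", ["10070"]),         # index 0
    ("request creation", ["10080"]),         # index 1 (assumed)
    ("order creation", ["10070", "42580"]),  # index 2
    ("order creation", ["10080", "42580"]),  # index 3 (assumed)
    ("order creation", ["10080", "42761"]),  # index 4 (assumed)
]

def index_by_identifier(events):
    """Collect, for every process identifier, the indexes of the events that carry it."""
    values = defaultdict(list)
    for index, (_event_type, identifiers) in enumerate(events):
        for identifier in identifiers:
            values[identifier].append(index)
    return dict(values)

values = index_by_identifier(events)
```

An event that carries two identifiers contributes its index to both, which is what allows two nodes to be linked through a shared event.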

[00139] FIG. 9 illustrates example process models generated using the solution described herein for the event logs shown in table 8 - table 11. After sorting the event logs (as seen in table 12), the nodes and relationships can be identified. More specifically, a node is associated with a unique combination of event type and process identifier. An event type can be associated with an entity type. For instance, event type “Request creation” can be associated with entity type “Purchase Request.” In a similar manner, event type “order creation” can be associated with the entity type “Purchase order.”

[00140] The relationships can be generated by starting from a leaf node and traversing until the root node by linking nodes that are associated with a common identifier. For the example in table 12, starting from the leaf node 1010a until root node 1004a generates the link 1004a -> 1006a -> 1008a -> 1010a. In a similar manner, starting from leaf node 1010a until root node 1004b generates the links: a) 1004b -> 1006a -> 1008a -> 1010a; and b) 1004b -> 1006b -> 1008b -> 1010a. And, starting from leaf node 1010b until root node 1004b generates the link 1004b -> 1006b -> 1008c -> 1010b. Therefore, for each of the leaf nodes 1010a and 1010b a respective process model is generated. Put differently, two process models 1000a and 1000b can be generated using the multi-level process solution described herein.

[00141] As seen in FIG. 9, the leaf nodes 1010a and 1010b do not have any outgoing relationships. Therefore, the number of generated multi-level process identifiers is equal to the number of nodes without outgoing relationships (e.g., leaf nodes). As described above, the operation of assigning a multi-level process identifier to the existing events can be performed by traversing the relationship model, starting from leaf nodes towards all nodes without incoming relationships (e.g., root nodes). Moreover, each event can be assigned one or more multi-level process identifiers. For example, node 1004b and 1006b can be assigned multi-level process identifier 1000a and 1000b. However, node 1004a is assigned process identifier 1000a while node 1010b is assigned process identifier 1000b.
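The assignment of multi-level process identifiers can be sketched as a backward traversal from each leaf node. The edge list below is read off the chains listed for FIG. 9 in the text; the data layout and the function name `assign_process_ids` are assumptions made for this illustration.

```python
# Parent -> child links taken from the chains listed in the text for FIG. 9.
edges = [
    ("1004a", "1006a"), ("1004b", "1006a"), ("1004b", "1006b"),
    ("1006a", "1008a"), ("1006b", "1008b"), ("1006b", "1008c"),
    ("1008a", "1010a"), ("1008b", "1010a"), ("1008c", "1010b"),
]
# Leaf node -> multi-level process identifier, as named in the text.
process_of_leaf = {"1010a": "1000a", "1010b": "1000b"}

# Leaf nodes are the nodes without outgoing relationships.
leaves = sorted({child for _, child in edges} - {parent for parent, _ in edges})

def assign_process_ids(edges, process_of_leaf):
    """Propagate each leaf's process identifier to every node that reaches it."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    assigned = {}
    for leaf, process_id in process_of_leaf.items():
        stack = [leaf]
        while stack:
            node = stack.pop()
            ids = assigned.setdefault(node, set())
            if process_id in ids:
                continue  # already visited for this process identifier
            ids.add(process_id)
            stack.extend(parents.get(node, []))
    return assigned

assigned = assign_process_ids(edges, process_of_leaf)
```

With these edges, nodes 1004b and 1006b end up in both process models, while 1004a belongs only to 1000a and 1010b only to 1000b, in line with the paragraph above.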

[00142] In some implementations, an additional parameter, e.g., a Boolean value, can be assigned to each node to indicate whether or not an event is shared by at least two event networks. This can prevent the duplication of the event in a resulting multi-level process model when the nodes are parsed. For example, Boolean values for nodes that are shared by two or more event networks can be “TRUE”, “0”, and/or the like. Boolean values for nodes that belong to a single event network can be “FALSE”, “1”, and/or the like. Then, nodes that have the Boolean value “TRUE” are identified as shared nodes and are included once in the resulting process model. For instance, with reference to FIG. 9, a Boolean value of “TRUE” can be assigned to the event request creation 1004b with identifier 10080 in order to prevent duplication, while a Boolean value of “FALSE” can be assigned to the event goods receipt 1008c with identifier 89158. Therefore, as discussed above, event networks and/or process models can be generated for multi-level processes in a simple manner, overcoming the data convergence and the data divergence problems.

[00143] Industrial Applications of Multi-Level Process Mining

[00144] The multi-level process mining solution described herein can be applied to several industrial processes. Using the multi-level process mining solution, a process can be analyzed along its entire lifecycle by discovering correct relationships between the different events/entities involved.

[00145] One example of such analysis includes a frequency analysis. FIGS. 10A and 10B are an illustration of a frequency view of an industrial automotive P2P process. FIGS. 10A and 10B illustrate an example process model obtained by applying the multi-level process mining solution described herein on a P2P process for an automotive company.

[00146] As seen in FIGS. 10A and 10B, boxes (e.g., box 1104a, box 1104b, box 1104c, etc.) represent activities. In some implementations, the boxes can have different color borders with each color identifying an entity type that the activity belongs to. The links between the boxes are transitions (e.g., transition 1102a, transition 1102b, transition 1102c, etc.) between the activities. FIGS. 10A and 10B illustrate a process model generated using the multi-level process mining solution described herein (e.g., method 800 in FIG. 7).

[00147] The numbers next to the transitions in FIGS. 10A and 10B indicate the number of times that specific transition has occurred. For instance, in this example, transition 1102b has occurred 19675 times. The number within every box indicates the number of times that specific activity has occurred. For example, 19884 inside box 1104a indicates the number of times the activity “requisition created” has occurred.
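The counting behind a frequency view can be sketched as follows. This is a simplified illustration that counts activities and directly-following pairs over flat traces; it does not reproduce the multi-level handling of shared events, and the traces and the function name `frequency_view` are hypothetical.

```python
from collections import Counter

# Hypothetical traces: each is the ordered list of activities of one instance.
traces = [
    ["requisition created", "requisition released", "order item created"],
    ["requisition created", "order item created"],
]

def frequency_view(traces):
    """Count how often each activity and each directly-following pair occurs."""
    activity_counts = Counter()
    transition_counts = Counter()
    for trace in traces:
        activity_counts.update(trace)
        # Pair every activity with its immediate successor in the trace.
        transition_counts.update(zip(trace, trace[1:]))
    return activity_counts, transition_counts

activity_counts, transition_counts = frequency_view(traces)
```

The activity counts correspond to the numbers inside the boxes and the transition counts to the numbers next to the links.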

[00148] As seen in FIGS. 10A and 10B, some incoming transitions (e.g., transition 1102b) have frequencies that are greater than some outgoing transitions (e.g., transition 1102c). This is likely because some activities can be carried out in parallel and/or because of reworks. Consider transition 1102g: the number in the bracket represents the number of times this transition is executed in parallel with another activity in the same process instance. For instance, transition 1102g is executed in parallel with another activity 10 times in the same process instance.

[00149] The circle icons at the bottom right corner of the nodes indicate model coverage. For example, circle 1106c is a full circle indicating that the model covers all relationships of the activity “order item created.” Put differently, all the incoming and outgoing relationships of this activity are displayed in the model. Circles 1106g and 1106i are partially full circles indicating that the model provides only partial coverage. That is, not all incoming and outgoing relationships of the activities “order changed: payment terms,” and “order changed: delivery dates” are displayed in the model. Model coverage represents the level of detail associated with the visualization of the process model. If a user chooses 100% level of coverage for activities and 100% level of coverage for transitions, all the boxes will display full coverage. Put differently, the detail level (i.e., model coverage) can hide certain details in the displayed process model (i.e., if the user chooses partial coverage and not 100% coverage).

[00150] In some implementations, boxes can have a background color that visually represents the frequency of an activity. For instance, a darker background color may indicate that the corresponding activity is more frequent. In some implementations, the outlines of the boxes can have different colors that indicate the entity to which the activity refers.

[00151] Another example of an analysis using the multi-layer process solution includes a performance analysis. FIGS. 11A and 11B are an illustration of a time performance view of an industrial automotive P2P process. FIGS. 11A and 11B illustrate an example process model obtained by applying the multi-level process mining solution described herein on a P2P process for an automotive company.

[00152] In FIGS. 11A and 11B, the numbers next to the transitions (e.g., transition 1202a, transition 1202b, etc.) represent the average waiting time for the transitions. For example, transition 1202b has an average waiting time of 4 days and 12 hours. The number within the boxes (e.g., 0ms within box 1204a) represents the average time to execute the activity (e.g., activity “requisition created”). In this example, the numbers within the boxes are 0 because the corresponding event log includes only a column with the start timestamp and does not include the end timestamp. In this scenario, the average execution time is embedded in the waiting time.
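A minimal sketch of how an average waiting time per transition could be derived when only start timestamps are available is shown below. The traces, the timestamps, and the function name `average_waiting_times` are hypothetical; the sketch simply averages the gap between consecutive start timestamps, so any execution time is folded into the waiting time as described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical traces of (activity, start timestamp) pairs.
traces = [
    [("requisition created", datetime(2020, 1, 1)),
     ("requisition released", datetime(2020, 1, 5))],
    [("requisition created", datetime(2020, 2, 1)),
     ("requisition released", datetime(2020, 2, 6))],
]

def average_waiting_times(traces):
    """Average the gap between consecutive start timestamps for each transition."""
    gaps = defaultdict(list)
    for trace in traces:
        for (source, start), (target, next_start) in zip(trace, trace[1:]):
            gaps[(source, target)].append(next_start - start)
    return {
        transition: sum(values, timedelta(0)) / len(values)
        for transition, values in gaps.items()
    }

waits = average_waiting_times(traces)
```

Here the two gaps are 4 days and 5 days, so the average for the single transition is 4 days and 12 hours.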

[00153] The number 1210a represents the average and the maximum waiting request for the activity “requisition created.” The first number “37.3” represents the average waiting request and the second number “59” represents the maximum waiting request for the activity “requisition created.” The number 1208a represents the average and the maximum waiting time for the activity “requisition created.” The first number “91d 23h” represents the average waiting time and the second number “255d 10h” represents the maximum waiting time for the activity “requisition created.” The number 1212a represents the average and the maximum resource location for the activity “requisition created.” The number “18.5” represents the average resource location and the number “39” represents the maximum resource location.

[00154] Another example of an analysis using the multi-layer process solution includes a cost analysis. FIGS. 12A and 12B are an illustration of a cost view of an industrial automotive P2P process. FIGS. 12A and 12B illustrate an example process model obtained by applying the multi-level process mining solution described herein on a P2P process for an automotive company.

[00155] The model in FIGS. 12A and 12B shows the average cost associated with an activity. For instance, the average cost of the activity “requisition created” is 7. The frequency of this activity is 19,884. Therefore, the cost associated with all executions of this activity is 139,188 (7 × 19,884), which is also represented in the model. As seen in FIGS. 12A and 12B, since costs are associated with activities, the costs are shown inside the boxes and are not represented by the transitions.

[00156] Another example of an analysis using the multi-layer process solution includes a rework analysis. Rework is the repetition of the same activity for the same entity. For instance, rework of a purchase order in a P2P process may signify the repetition of placing a specific order by changing minor details of the event purchase order. FIGS. 13A and 13B are an illustration of a rework and automation view of an industrial automotive P2P process. FIGS. 13A and 13B illustrate an example process model obtained by applying the multi-level process mining solution described herein on a P2P process for an automotive company.

[00157] In some instances, an event log can include a column to indicate whether an activity is performed manually or is automated. The process model in FIGS. 13A and 13B uses this information to show the automation level of each activity. For instance, the automation level of “requisition released” is 0%, indicating that this activity is performed manually. However, the automation level of “order item created” is 34%, indicating that this activity is automated by 34%.
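One way to read the automation level is as the share of an activity's executions that are flagged as automated in the event log. That reading is an assumption made for this sketch, as are the sample rows and the function name `automation_levels`.

```python
from collections import defaultdict

# Hypothetical log rows: (activity, automated flag from the manual/automated column).
rows = [
    ("requisition released", False),
    ("requisition released", False),
    ("order item created", True),
    ("order item created", False),
    ("order item created", False),
    ("order item created", True),
]

def automation_levels(rows):
    """Return, per activity, the percentage of executions flagged as automated."""
    counts = defaultdict(lambda: [0, 0])  # activity -> [automated, total]
    for activity, automated in rows:
        counts[activity][0] += int(automated)
        counts[activity][1] += 1
    return {
        activity: 100.0 * automated / total
        for activity, (automated, total) in counts.items()
    }

levels = automation_levels(rows)
```

In this sample, an activity that is never flagged as automated gets a level of 0%, matching the manual case described above.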

[00158] To understand the advantages offered by the multi-level process mining solution compared to traditional mining approaches, consider a process shown in FIG. 14. As seen in FIG. 14, five purchase order lines (e.g., purchase order 1506a-1506e) can be generated by five purchase requisition lines (e.g., purchase requisition 1504a-1504e) and received in five separate goods receipts (e.g., goods receipt 1508a-1508e). All of these order lines are registered in a unique invoice 1510.

[00159] FIGS. 15A, 15B, and 15C illustrate the process model derived for the case in FIG. 14 using the multi-level process mining solution. In FIGS. 15A, 15B, and 15C, the relationships between entities are preserved. FIGS. 15A, 15B, and 15C illustrate the frequency view model 1600a, the cost view model 1600b, and the rework and automation view model 1600c respectively. As seen in FIGS. 15A, 15B, and 15C, the process flow is linear until “invoice registered” 1616a, 1616b, and/or 1616c. Five goods receipts (e.g., represented as 1614a, 1614b, and 1614c) are registered with a unique invoice (i.e., invoice 1510 in FIG. 14). In frequency view model 1600a, this relationship is preserved on the transition between 1614a and 1616a. In the cost model, this relation is preserved in the activity goods registered 1614a as frequency times cost (5 times 7). The invoice activity is triggered only once, thereby making the invoicing less expensive. No reworks occur within this case.

[00160] Now consider a traditional process mining approach. As discussed above, a unique process identifier will need to be determined. Consider a scenario where invoice is selected as the process identifier. The resulting process model is illustrated in FIG. 16. FIG. 16 illustrates the frequency view model 1700a and the rework and automation model 1700b when invoice is selected as the process identifier. As illustrated in FIG. 16, non-existing self-loops can occur along the frequency view model 1700a using the traditional process mining approach. This happens because each activity before the Invoice Registration (1716a and 1716b) is performed five times. However, from the Invoice perspective (1 unique case) this results in a repetition making the process model inconsistent.

[00161] Now, consider a case when purchase order is selected as the process identifier. FIG. 17 illustrates the frequency view model 1800a and cost model 1800b when purchase order is selected as the process identifier. As seen in process model 1800a, the frequencies of invoice registered 1816a and invoice cleared 1818a are incorrect. The frequency of the activities 1816a and 1818a is shown as five, even if in reality, all the order lines are registered to a single invoice. This happens because, from the perspective of purchase order, five order lines are invoiced. Therefore, the process model recognizes five invoice registrations and five clearing activities.

[00162] This inconsistency can lead to costs related to invoicing activities being strongly biased because of frequency errors. In this case, the costs are five times higher than the real cost because the costs of activities associated with invoicing can contribute five times to the total cost instead of contributing just once.
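The frequency and cost bias described above can be reproduced with a small sketch that mirrors the FIG. 14 case of five order lines registered in a single invoice. The identifiers, the unit cost of 7 for the invoicing activity, and the function names are assumptions made for illustration.

```python
# Hypothetical data mirroring the FIG. 14 case: five order lines, one invoice.
order_lines = ["PO-a", "PO-b", "PO-c", "PO-d", "PO-e"]
invoice_of = {order_line: "INV-1" for order_line in order_lines}
INVOICE_COST = 7  # assumed cost of one "invoice registered" execution

def flattened_invoice_count(order_lines, invoice_of):
    """Traditional flattening: one case per order line, each repeating the invoice event."""
    return sum(1 for order_line in order_lines if order_line in invoice_of)

def multilevel_invoice_count(invoice_of):
    """Multi-level view: each distinct invoice is counted once."""
    return len(set(invoice_of.values()))

flat_count = flattened_invoice_count(order_lines, invoice_of)
true_count = multilevel_invoice_count(invoice_of)
flat_cost = flat_count * INVOICE_COST
true_cost = true_count * INVOICE_COST
```

With purchase order as the case identifier the invoice registration is counted five times, so its cost contribution is five times the real one.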

[00163] Multi-Level Process Simulation

[00164] In addition to deriving the process models for an end-to-end multi-level process, the systems and methods described herein can perform various types of what-if analysis on a multi-level process. In some implementations, the system for implementing the multi-process solution can include a simulation engine to perform what-if analysis on multi-level processes. Put differently, the multi-level process solution described herein can include the functionalities of the simulation engine (as described below) to generate process events starting from multi-level process models that are discovered by the multi-level process solution.

[00165] In some implementations, the controller in the system for implementing the multi-process solution can include instructions to execute functionalities of the simulation engine that are described herein. The simulation engine can generate simulated scenarios with multi-level process properties.

[00166] These simulation scenarios can be generated from process models derived from the multi-level process mining solution described herein. Therefore, the simulation data can be automatically set according to the real process. Some example data retrieved from the discovered models and used by the simulation engine include: a) historical case arrival distribution wherein the simulation follows the same case arrival distribution as real process; b) relationships and correlations between different entities; c) activity execution time and waiting time; d) resource allocation; and e) execution flow probabilities.

[00167] Put differently, a user and/or a company may want to analyze a specific scenario in a multi-level process. For example, analyzing the effect of automating a specific event or activity in a multi-level process can offer a deep understanding about the impact that activity has on the multi-level process. For instance, the cost of the entire process may be reduced by 50% by automating that specific event. The simulation engine can enable the analysis of such what-if scenarios. Additionally, the simulation engine can generate contextual data with the right distribution and correlation. Contextual data are optional columns provided in the rows of the event log that are mapped as event attributes in the discovered process, to enable a multi-dimensional data analysis. Some non-limiting examples of contextual attributes for event types relating to purchase order can include - order material, order quantity, order currency, order category, order purchase group, order purchase organization, order supplier, order supplier country, order supplier city, etc. Some contextual attributes can be correlated. For instance, an order supplier country that is Italy will include an order supplier city that is a city in Italy. The simulation engine can replicate the distribution of contextual data preserving their correlations.
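One simple way to replicate contextual data while preserving correlations is to sample whole attribute combinations from their historical joint frequencies instead of sampling each attribute independently. The sketch below illustrates that idea only; it is not taken from the described simulation engine, and the historical rows and the function name `sample_context` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical historical (order supplier country, order supplier city) pairs.
historical = [
    ("Italy", "Milan"),
    ("Italy", "Rome"),
    ("Germany", "Berlin"),
    ("Italy", "Milan"),
]

def sample_context(historical, n, seed=0):
    """Draw n attribute combinations following the historical joint distribution."""
    rng = random.Random(seed)  # seeded for reproducibility
    joint = Counter(historical)
    combinations = list(joint)
    weights = [joint[combination] for combination in combinations]
    # Sampling whole pairs keeps country and city consistent with each other.
    return rng.choices(combinations, weights=weights, k=n)

simulated = sample_context(historical, n=100)
```

Because only observed pairs can be drawn, a simulated order with supplier country Italy always carries an Italian supplier city.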

[00168] A user can decide the what-if scenario and the corresponding event network can be generated. An analysis of the corresponding frequency model, cost model, performance model, and/or the rework model can provide an impact that scenario has on the process itself.

[00169] Consider an example industrial automotive P2P process such as the example industrial automotive P2P process discussed above. In one example, the simulation engine can perform what-if analysis in which the “invoice registered” activity can be automated by 80%. In some implementations, automation of 80% can be equivalent to five minutes of robotic execution time.

[00170] FIGS. 18A and 18B illustrate a process model (frequency view) generated by the simulation engine for the industrial automotive P2P process. As seen in FIGS. 18A and 18B, the process model generated by the simulation engine reflects a workflow of the discovered process. The 80% automation of the “invoice registered” activity by the simulation engine can result in performance and cost improvement of the process.

[00171] The automation level, and consequently the waiting time and the resource allocation time on the “invoice registered” activity, can be changed according to the what-if hypothesis. FIG. 19 illustrates the invoice registered in an example simulated model. 2012a represents the automation level of 80% in this example. 2010b represents the waiting time (e.g., 6 days 3 hours/ 69 days 3 hours) after the simulation. 2008b represents the resource allocation after the simulation.

[00172] To illustrate that the simulated process preserves the multi-level process properties of the original process, a single simulated case can be isolated. FIG. 20 illustrates an example simulated case and the relationships between the entities. As seen in FIG. 20, two purchase order lines 2106a and 2106b are generated by two purchase requisition lines 2104a and 2104b and received in two separate goods receipts 2108a and 2108b. The purchase order lines are registered in a unique invoice 2110.

[00173] FIGS. 21A, 21B, and 21C illustrate the process models for the simulated case. FIGS. 21A, 21B, and 21C include the frequency view 2200a, cost view 2200b, and rework 2200c. As seen from the process models in FIGS. 21A, 21B, and 21C, the simulated case contains improvements (improvements expected by the what-if analysis) to the original process. The rework model 2200c shows that the “invoice registered” 2216c activity can be completely automated and performed by a robot. Therefore, as seen in the cost model 2200b, the cost associated with “invoice registered” 2216b is lower since no human resources are spent on this activity.

Conclusion

[00174] While various inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize or be able to ascertain, using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.

[00175] Also, various inventive concepts may be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

[00176] All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.

[00177] The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”

[00178] The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

[00179] As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.

[00180] As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

[00181] In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.