

Title:
SYSTEMS AND METHODS FOR EVALUATION OF EVENTS BASED ON A REFERENCE BASELINE ACCORDING TO TEMPORAL POSITION IN A SEQUENCE OF EVENTS
Document Type and Number:
WIPO Patent Application WO/2013/036269
Kind Code:
A1
Abstract:
Systems and methods for evaluation of events are provided. A user-specific reference baseline comprising a set of temporally-ordered sequences of events is generated. An event of a sequence of events in a current session is received. A determination is made as to whether the event at least partially matches the reference baseline using an attribute of the event and a temporal position of the event within the sequence of events in the current session.

Inventors:
SINGLA ANURAG (US)
BLOCK ROBERT (US)
Application Number:
PCT/US2011/057139
Publication Date:
March 14, 2013
Filing Date:
October 20, 2011
Assignee:
HEWLETT PACKARD DEVELOPMENT CO (US)
SINGLA ANURAG (US)
BLOCK ROBERT (US)
International Classes:
G06F11/30; G06F9/44; G06F21/00
Foreign References:
US6725287B1 (2004-04-20)
US20020107841A1 (2002-08-08)
US7251584B1 (2007-07-31)
US6601048B1 (2003-07-29)
US7424742B1 (2008-09-09)
Other References:
See also references of EP 2754049A4
Attorney, Agent or Firm:
CHATTERJEE-MARATHE, Naya (Intellectual Property Administration, 3404 E Harmony Road, Mail Stop 3, Fort Collins, Colorado, US)
Claims:
WHAT IS CLAIMED IS:

1. A method for evaluation of events, the method comprising:

generating, by a computing device, a user-specific reference baseline comprising a set of temporally-ordered sequences of events;

receiving an event of a sequence of events in a current session;

determining whether the event at least partially matches the reference baseline using an attribute of the event and based on a temporal position of the event within the sequence of events in the current session; and

analyzing a condition of a rule based on the determination of whether the event at least partially matches the reference baseline.

2. The method of claim 1, wherein the reference baseline further comprises timing data associated with each sequence in the set of temporally-ordered sequences of events, wherein each sequence in the set is a distinctive pattern of historical events associated with the user.

3. The method of claim 1, wherein determining whether the event at least partially matches the reference baseline comprises:

identifying the user-specific reference baseline among a plurality of reference baselines based on a user identifier associated with the received event;

selecting an event in a sequence of the reference baseline having the same temporal position as the temporal position of the received event;

comparing the attribute of the received event to a same attribute of the event in the reference baseline, wherein the attribute of the received event is an event type.

4. The method of claim 3, wherein determining whether the event at least partially matches the reference baseline further comprises determining whether the attribute of the received event is within a deviation threshold specified in the condition of the rule.

5. The method of claim 1, wherein at least one sequence in the set of temporally-ordered sequences of events is flagged as being either normal user behavior or anomalous user behavior.

6. The method of claim 5, wherein analyzing the condition of the rule comprises:

determining the rule condition is "false" if the event matches the at least one sequence which is flagged as normal user behavior; and

determining the rule condition is "true" if the event matches the at least one sequence which is flagged as anomalous user behavior.

7. The method of claim 1, wherein determining whether the received event at least partially matches the reference baseline comprises:

tracking a time gap between a time of occurrence of the received event and a time of occurrence of a previously received event of the sequence of events in the current session;

comparing the time gap to timing data of the reference baseline.

8. The method of claim 1, wherein each sequence in the set of temporally-ordered sequences of events is associated with timing data, and wherein the timing data includes statistics generated using a time of occurrence of events in a raw baseline.

9. A system for evaluation of events, the system comprising:

a processor; and

a memory coupled to the processor, the memory configured to store a data list including a plurality of user-specific reference baselines;

wherein the processor is configured to:

generate a first user-specific reference baseline of the plurality of user-specific reference baselines, the first user-specific reference baseline comprising a set of temporally-ordered sequences of events;

receive an event of a sequence of events in a current session; determine whether the event at least partially matches the reference baseline using an attribute of the event and based on a temporal position of the event within the sequence of events in the current session;

analyze a condition of a rule based on the determination of whether the event at least partially matches the reference baseline; and

generate a correlation event.

10. The system of claim 9, wherein the reference baseline further comprises timing data associated with each sequence in the set of temporally-ordered sequences of events, wherein each sequence in the set is a distinctive pattern of historical events associated with the user.

11. The system of claim 9, wherein the processor is configured to determine whether the event at least partially matches the reference baseline by:

identifying the user-specific reference baseline among a plurality of reference baselines based on a user identifier associated with the received event;

selecting an event in a sequence of the reference baseline having the same temporal position as the temporal position of the received event;

comparing the attribute of the received event to a same attribute of the event in the reference baseline, wherein the attribute of the received event is an event type.

12. The system of claim 11, wherein the processor is configured to determine whether the event at least partially matches the reference baseline by:

determining whether the attribute of the received event is within a deviation threshold specified in the condition of the rule.

13. The system of claim 9, wherein at least one sequence in the set of temporally-ordered sequences of events is flagged as being either normal user behavior or anomalous user behavior.

14. A non-transitory computer-readable medium storing a plurality of instructions to control a data processor to evaluate events, the plurality of instructions comprising instructions that cause the data processor to:

generate a first user-specific reference baseline of the plurality of user-specific reference baselines, the first reference baseline comprising a set of temporally-ordered sequences of transactional events, wherein each event is a part of a financial transaction;

receive a transactional event of a sequence of transactional events in a current session;

determine whether the transactional event at least partially matches the reference baseline using an attribute of the transactional event and based on a temporal position of the transactional event within the sequence of transactional events in the current session;

analyze a condition of a rule based on the determination of whether the transactional event at least partially matches the reference baseline; and correlate the transactional event with a security-related event.

15. The non-transitory computer-readable medium of claim 14, wherein the instructions that cause the data processor to determine whether the event at least partially matches the reference baseline comprise instructions that cause the data processor to:

identify the user-specific reference baseline among a plurality of reference baselines based on a user identifier associated with the received event;

select an event in a sequence of the reference baseline having the same temporal position as the temporal position of the received event;

compare the attribute of the received event to a same attribute of the event in the reference baseline, wherein the attribute of the received event is an event type.

Description:
SYSTEMS AND METHODS FOR EVALUATION OF EVENTS BASED ON A REFERENCE BASELINE ACCORDING TO TEMPORAL POSITION IN A SEQUENCE OF EVENTS

I. CROSS-REFERENCES

[0001] This application claims priority from co-pending U.S. Provisional Patent Application No. 61/532,968, filed September 9, 2011, titled "RULE-BASED ANALYSIS OF EVENTS BASED ON DEVIATION FROM A BASELINE" (Attorney Docket No. 82844890), which is hereby incorporated by reference, as if set forth in full in this document, for all purposes. This application incorporates by reference in its entirety U.S. Pat. No. 7,984,502, filed October 1, 2008, titled "PATTERN DISCOVERY IN A NETWORK SYSTEM."

II. BACKGROUND

[0002] Computer networks and systems have become indispensable tools for modern business. Today terabits of information on virtually every subject imaginable are stored in and accessed across such networks by users throughout the world. Much of this information is, to some degree, confidential and its protection is desired. Intrusion detection systems (IDS), including fraud detection systems (FDS), have been developed to detect unauthorized use of information and resources and to help uncover attempts by unauthorized persons and/or devices to gain access to computer networks and the information stored therein.

[0003] There are two complementary approaches to detecting intrusions: knowledge-based approaches and behavior-based approaches. Many IDS tools in use today are knowledge-based. Knowledge-based intrusion detection techniques involve comparing the captured data to information regarding existing techniques to exploit vulnerabilities. When a match is detected, an alarm is triggered. Behavior-based intrusion detection techniques, on the other hand, attempt to spot intrusions by observing deviations from normal or expected behaviors of the system or the users (models of which are extracted from reference information collected by various means). When a suspected deviation is observed, an alarm is generated.

[0004] Traditional security systems use rules to correlate events. Rules may be used to analyze and correlate user activity events to identify intrusions, and more specifically behavioral patterns that deviate from the norm. While these mechanisms are powerful enough to support many standard correlation use cases, some intrusions, such as sophisticated fraudulent attacks, may go undetected or may be detected only after a highly complex set of rules has been specified.

III. BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The present disclosure may be better understood and its numerous features and advantages made apparent by referencing the accompanying drawings.

[0001] FIG. 1 is a topological block diagram of a network security system in accordance with an embodiment.

[0006] FIG. 2 is a process flow diagram for generation of a reference baseline in accordance with an embodiment.

[0007] FIG. 3 is a process flow diagram for analysis of user activity events based on deviation from a reference baseline in accordance with an embodiment.

[0008] FIG. 4 is a process flow diagram for triggering a response action in accordance with an embodiment.

[0009] FIG. 5 illustrates a computer system in which an embodiment may be implemented.

IV. DETAILED DESCRIPTION

[0010] Security systems receive events from potentially thousands of sources. Events may be cross-correlated with rules to provide security-related intelligence that would not be identified by individual devices. In general, correlation can indicate that different events from different sources are associated with a common incident, as defined by correlation rules. More specifically, correlation includes, for example, discovering the relationships among events, inferring the significance of those relationships, prioritizing the events and meta-events, and providing a framework for taking action.

[0011] As used herein, a rule or "correlation rule" is a procedure that comprises a set of simple or complex conditions, which may be combined with other constructs such as aggregation, groupings, and triggers. A rule is used in many ways, such as: to evaluate incoming events for specific conditions and patterns; to correlate information from different events using rule correlation as well as other constructs like active lists, session lists, and threat level calculations; to infer the significance of events; and to initiate actions in response to events.

[0012] In other words, rules express conditions against which event streams are evaluated. The outcome of the evaluation provides information to derive the meaning out of the event streams. When a match is determined, the rule may initiate an action in response.

[0013] In addition to conditions, a rule may further include a threshold (e.g., a number of occurrences or a running total), a time duration, a join criterion, and/or an aggregation criterion. For example: if (failed login attempt) occurs (from the same source IP address) (10 times) within (1 minute) then (Action).

[0014] For this rule, the condition is "failed login attempt," the threshold number of occurrences is "10," the time duration is "1 minute," and the aggregation criterion is "from the same source IP address."
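The threshold rule above can be sketched in Python as a per-source sliding window. This is a minimal illustration under stated assumptions, not the rules engine's actual implementation: the `Event` and `ThresholdRule` names are hypothetical, timestamps are plain seconds, and resetting the window after the rule fires is an assumed design choice so that the action runs once per burst.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    name: str         # event type, e.g. "failed login attempt"
    source_ip: str    # field used as the aggregation criterion
    timestamp: float  # seconds


class ThresholdRule:
    """Fires when `threshold` events satisfying `condition` arrive from the
    same source IP within `window_seconds` (hypothetical sketch)."""

    def __init__(self, condition, threshold=10, window_seconds=60.0):
        self.condition = condition
        self.threshold = threshold
        self.window_seconds = window_seconds
        # One sliding window of timestamps per aggregation key (source IP).
        self.windows = defaultdict(deque)

    def evaluate(self, event):
        """Return True when the rule fires (i.e., the action should run)."""
        if event.name != self.condition:
            return False
        window = self.windows[event.source_ip]
        window.append(event.timestamp)
        # Discard occurrences that fall outside the time duration.
        while event.timestamp - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.threshold:
            window.clear()  # assumption: reset so the action fires once per burst
            return True
        return False


rule = ThresholdRule("failed login attempt")
# Ten failures from one address, five seconds apart (45 seconds in total).
fired = [rule.evaluate(Event("failed login attempt", "10.0.0.5", t)) for t in range(0, 50, 5)]
print(fired[-1])  # True
```

Keying the windows by source IP is what implements the aggregation criterion; a different criterion (for example, per user) would only change the dictionary key.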

[0015] Rule conditions may refer to various data models. For example, a rule condition may refer to fields or properties of a network and asset model, which is a representation of nodes and/or machines on a network. The properties may include open ports, operating system, vulnerabilities, business classification, etc.

[0016] A rule condition may also refer to a data list, such as an active list or a session list. Session lists associate users with their event traffic on the network. More specifically, a session list is a configurable table that maintains temporal data related to user sessions (e.g., DHCP session information, VPN session information, mappings from users to their roles and the corresponding period of time for which those attributes are valid, etc.).

[0017] An active list is a configurable table that aggregates specified fields of events. Active lists provide tracking for specific events over a period of time (e.g., days, weeks, etc.). The aggregated data may be made available for correlation. For example, a data list may keep track of a running total of purchases made using a particular credit card for a billing cycle. The running total is referenced by a rule, for example, to alert the credit card holder upon reaching a credit limit.
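A running-total active list like the credit card example can be sketched as a dictionary keyed by the aggregated fields. The `ActiveList` class, the `credit_limit_rule` function, the card identifier, and the limit are hypothetical; floats are used for brevity, whereas a production version would more likely use `decimal.Decimal` for currency.

```python
from collections import defaultdict


class ActiveList:
    """Hypothetical active list that aggregates the amount field of purchase
    events, keyed by (card, billing cycle)."""

    def __init__(self):
        self.running_totals = defaultdict(float)

    def add(self, card, billing_cycle, amount):
        """Aggregate one purchase event and return the updated running total."""
        key = (card, billing_cycle)
        self.running_totals[key] += amount
        return self.running_totals[key]


def credit_limit_rule(active_list, card, billing_cycle, amount, credit_limit):
    """Rule that references the running total; returns an alert message or None."""
    total = active_list.add(card, billing_cycle, amount)
    if total >= credit_limit:
        return f"card {card}: running total {total:.2f} reached limit {credit_limit:.2f}"
    return None


purchases = ActiveList()
alerts = [credit_limit_rule(purchases, "XXXX-1111", "2011-10", amt, 500.0)
          for amt in (120.0, 200.0, 190.0)]
print(alerts[-1])  # card XXXX-1111: running total 510.00 reached limit 500.00
```

Because the billing cycle is part of the key, a new cycle starts a fresh running total without any explicit reset.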

[0018] In the context of intrusion detection, temporally-ordered sequences of user action events, and the time gap between each of these events, are relevant for identifying anomalies and abnormal behavior. More specifically, comparing a user's activity to that user's baseline behavior is relevant for the timely detection of anomalies for fraud detection.

[0019] However, rules are generally static in nature and are thus insufficient to capture the differences in patterns of activities among all users in a network and/or system. Furthermore, traditional rules engines allow the specification of rules that correlate events using simple join conditions, which typically relate different events based on matching field values, and threshold conditions that specify the number of events that need to occur in a certain time period. While these mechanisms are powerful enough to support many standard correlation use cases, the detection of sophisticated fraudulent attacks often demands more complex conditions than can be easily specified in this manner.

[0020] Moreover, a complicated set of static rules may be used to capture sequences of events and detect missing events within a specified event sequence. For a sequence with more than two events, detecting all possible sequence deviations would demand creating an exponential number of rules, including rules that chain other rules to implement a state machine, for example.

[0021] A framework to detect anomalies in the behavior of web-based financial transactions (e.g., account access) and other types of user action events is described, according to one embodiment. A user-specific baseline is determined based on an identified pattern of a temporally-ordered sequence of events and/or based on time gaps between each of the events in the sequence. Rules may be used to detect when an online transaction is likely to be fraudulent based on the properties of the online session. Events corresponding to a user's interaction with an online account or other session (e.g., web session) are compared against the user-specific baseline. Significant deviations from a user's baseline or expected behavior may be flagged as potential fraud. Furthermore, the fraudulent or suspicious events may be correlated with security information to provide a more complete view of the system by tapping into previously disparate sections of a security system.

[0022] In the case of online financial transaction fraud, detection is based on analysis of a user's online session. For a particular user, a normal online session might entail visiting seven web pages in a particular order, e.g., login page, homepage, account balance page, transfer funds page, etc. In this example, the fund transfer transaction is completed in about five minutes. If, in a particular instance, this user skips an otherwise usual step for this transaction (e.g., does not navigate to the account balance page), executes steps out of the expected order, spends merely 20 seconds before completing the transaction, or the time interval between the expected steps deviates significantly from the normal behavior for that user, the anomaly may be indicative of fraudulent access.
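The session-level deviations described in this example (a skipped step, steps out of order, an unusually short session) can be checked with a small sketch. The page names come from the example above, shortened to four; `SessionProfile`, `session_deviations`, and the `min_fraction` cutoff are hypothetical, and per-step interval checks are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SessionProfile:
    pages: list              # the user's usual page order
    typical_duration: float  # seconds, e.g. 300 for "about five minutes"


def session_deviations(profile, visited, duration, min_fraction=0.25):
    """List the ways one completed session deviates from the user's profile.

    min_fraction is a hypothetical tuning knob: a session shorter than
    min_fraction * typical_duration is reported as suspiciously fast.
    """
    findings = []
    missing = [p for p in profile.pages if p not in visited]
    if missing:
        findings.append("skipped: " + ", ".join(missing))
    # Compare the relative order of the pages both sequences share
    # (first visit only, so a revisited page is not counted twice).
    expected_order = [p for p in profile.pages if p in visited]
    actual_order = list(dict.fromkeys(p for p in visited if p in profile.pages))
    if actual_order != expected_order:
        findings.append("out of order")
    if duration < min_fraction * profile.typical_duration:
        findings.append(f"too fast: {duration:.0f}s vs ~{profile.typical_duration:.0f}s")
    return findings


profile = SessionProfile(["login", "home", "account balance", "transfer funds"], 300.0)
print(session_deviations(profile, ["login", "home", "transfer funds"], 20.0))
# ['skipped: account balance', 'too fast: 20s vs ~300s']
```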

[0023] Systems and methods for evaluation of events are provided, in one embodiment. A user-specific reference baseline comprising a set of temporally-ordered sequences of events is generated. An event of a sequence of events in a current session is received. A determination is made as to whether the event at least partially matches the reference baseline using an attribute of the event and a temporal position of the event within the sequence of events in the current session. A condition of a rule is analyzed, and a correlation event is generated.

[0024] FIG. 1 is a topological block diagram of a network security system 100 in accordance with an embodiment. System 100 includes agents 12a-n, at least one manager 14 and at least one console 16 (which may include browser-based versions thereof). In some embodiments, agents, managers and/or consoles may be combined in a single platform or distributed in two, three or more platforms (such as in the illustrated example). The use of this multi-tier architecture supports scalability as a computer network or system grows.

[0025] Agents 12a-n are software programs (i.e., machine-readable instructions) that provide efficient, real-time (or near real-time) local event data capture and filtering from a variety of network security devices and/or applications. The typical sources of security events are common network security devices, such as firewalls, intrusion detection systems and operating system logs. Agents 12a-n can collect events from any source that produces event logs or messages and can operate at the native device, at consolidation points within the network, and/or through simple network management protocol (SNMP) traps.

[0026] Agents 12a-n are configurable through both manual and automated processes and via associated configuration files. Each agent 12 may include at least one software module including a normalizing component, a time correction component, an aggregation component, a batching component, a resolver component, a transport component, and/or additional components. These components may be activated and/or deactivated through appropriate commands in the configuration file.

[0027] Manager 14 may comprise server-based components that further consolidate, filter and cross-correlate events received from the agents, employing a rules engine 18 and a centralized event database 20. One role of manager 14 is to capture and store all of the real-time and historic event data to construct (via database manager 22) a complete, enterprise-wide picture of security activity. The manager 14 also provides centralized administration, notification (through at least one notifier 24), and reporting, as well as a knowledge base 28 and case management workflow. The manager 14 may be deployed on any computer hardware platform, and one embodiment utilizes a database management system to implement the event data store component. Communications between manager 14 and agents 12a-n may be bi-directional (e.g., to allow manager 14 to transmit commands to the platform hosting agents 12a-n) and encrypted. In some installations, managers 14 may act as concentrators for multiple agents 12a-n and can forward information to other managers (e.g., deployed at a corporate headquarters).

[0028] Manager 14 also includes at least one event manager 26, which is responsible for receiving the event data messages transmitted by agents 12a-n and/or other managers, and receiving event summary data from other managers. Event manager 26 is also responsible for generating event data messages such as correlation events and audit events. Where bi-directional communication with agents 12a-n is implemented, event manager 26 may be used to transmit messages to agents 12a-n. If encryption is employed for agent-manager communications, event manager 26 is responsible for decrypting the messages received from agents 12a-n and encrypting any messages transmitted to agents 12a-n.

[0029] Once the event data messages have been received, the event data is passed to the rules engine 18. Rules engine 18 is configured to cross-correlate the event data with correlation rules in order to identify rule matches.

[0030] The actions triggered by the rules may include notifications transmitted (e.g., via notifier 24) to designated destinations (e.g., security analysts may be notified via the consoles 16, email messages, a call to a telephone, cellular telephone, voicemail box and/or pager number or address, or by way of a message to another communication device and/or address such as a facsimile machine, etc.) and/or instructions to network devices (e.g., via agents 12, etc.).

[0031] Consoles 16 are computer- (e.g., workstation-) based applications that allow security professionals to perform day-to-day administrative and operation tasks such as event monitoring, rules authoring, incident investigation and reporting. A single manager 14 can support multiple consoles 16.

[0032] In some embodiments, a browser-based version of the console 16 may be used to provide access to security events, knowledge base articles, reports, notifications and cases. That is, the manager 14 may include a web server component accessible via a web browser hosted on a personal or handheld computer (which takes the place of console 16) to provide some or all of the functionality of a console 16. Browser access is particularly useful for security professionals who are away from the consoles 16 and for part-time users. Communication between consoles 16 and manager 14 is bi-directional and may be encrypted.

[0033] Through the above-described architecture, a centralized or decentralized environment may be supported. This is useful because an organization may want to implement a single instance of system 100 and use an access control list to partition users. Alternatively, the organization may choose to deploy separate systems 100 for each of a number of groups and consolidate the results at a "master" level. Such a deployment can also achieve a "follow-the-sun" arrangement where geographically dispersed peer groups collaborate with each other by passing oversight responsibility to the group currently working standard business hours. Systems 100 can also be deployed in a corporate hierarchy where business divisions work separately and support a roll-up to a centralized management function.

[0034] The network security system 100 also includes anomaly detection capabilities. In one embodiment, manager 14 further includes a raw baseline module 31A, a reference baseline module 31B, and a local memory 32.

[0035] Raw baseline module 31A is configured to receive a set of events, such as security events, from at least one of agents 12a-n via event manager 26, from event database 20 via the database manager 22, or from event manager 26. Raw baseline module 31A is further configured to group the events per-user and per-session, using a user ID and a session ID, respectively, as provided in an event.

[0036] Grouping by session may be performed, for example, according to a session identifier (ID) (e.g., web session identifier) of an event. In another embodiment, the session-based grouping may be performed by examining the timestamp of an event and determining a set of events to be part of a same session where the timestamps of the events in the set are close in time. In other words, the events are grouped by time proximity among the events (e.g., transactions).
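Both grouping strategies can be sketched as follows, assuming a hypothetical `Event` record that carries a user ID, a timestamp in seconds, and an optional session ID. The 30-minute inactivity gap used to split sessions by time proximity is an assumption; the description does not specify how close the timestamps must be.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    user_id: str
    timestamp: float                  # seconds
    name: str
    session_id: Optional[str] = None  # e.g. a web session ID, when available


def group_by_session_id(events):
    """Group events per user and per session using IDs carried in the event."""
    groups = defaultdict(list)
    for e in events:
        groups[(e.user_id, e.session_id)].append(e)
    return dict(groups)


def group_by_time_proximity(events, max_gap=1800.0):
    """Per user, start a new session whenever the gap to the previous event
    exceeds max_gap seconds (hypothetical 30-minute default)."""
    per_user = defaultdict(list)
    for e in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        per_user[e.user_id].append(e)
    sessions = []
    for user_events in per_user.values():
        current = [user_events[0]]
        for prev, e in zip(user_events, user_events[1:]):
            if e.timestamp - prev.timestamp > max_gap:
                sessions.append(current)  # close the session at the long gap
                current = []
            current.append(e)
        sessions.append(current)
    return sessions


evts = [Event("u1", 0, "Logon"), Event("u1", 5, "View Acct Bal"),
        Event("u1", 7200, "Logon"), Event("u2", 10, "Logon")]
print([[e.name for e in s] for s in group_by_time_proximity(evts)])
```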

[0037] Raw baseline module 31A is further configured to identify the sequence of user actions taken during the session and the time gaps between those actions, thereby generating a raw baseline. Raw baseline module 31A is configured to store the raw baseline in a data list, such as an active list and/or session list of data list module 30.

[0038] Reference baseline module 31B is configured to generate a reference baseline against which a sequence of received events is analyzed to identify anomalies. More specifically, the reference baseline module 31B is configured to use the raw baselines to identify, on a per-user basis, common and distinctive patterns of temporally-ordered sequences of events that make up the historical record. Furthermore, the time gaps between each of the events in a distinctive sequence of events are determined.
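One way to derive timing data for a distinctive sequence is to take, for each position, the mean and spread of the gaps preceding it across all sessions that followed that sequence. The sketch below assumes exact-sequence matching and uses the population standard deviation (`statistics.pstdev`) so that a single matching session still yields a value; `gap_statistics` and the sample flows are hypothetical.

```python
from statistics import mean, pstdev


def gap_statistics(flows, sequence):
    """Per-position timing data for one distinctive sequence.

    flows: event flows from a user's raw baseline, each a time-ordered list of
    (event_type, timestamp) pairs. Only flows whose event types equal
    `sequence` exactly contribute (a simplifying assumption). Returns one
    (mean, population std dev) pair for the gap preceding each position >= 1.
    """
    matching = [f for f in flows if [name for name, _ in f] == list(sequence)]
    if not matching:
        return []
    stats = []
    for i in range(1, len(sequence)):
        gaps = [f[i][1] - f[i - 1][1] for f in matching]
        stats.append((mean(gaps), pstdev(gaps)))
    return stats


flows = [
    [("Logon", 0), ("View Acct Bal", 5), ("Summary", 12)],
    [("Logon", 100), ("View Acct Bal", 109), ("Summary", 116)],
    [("Logon", 500), ("Add Payee", 560)],
]
print(gap_statistics(flows, ["Logon", "View Acct Bal", "Summary"]))
```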

[0039] The reference baseline module 31B is further configured to compute statistics about an event flow, including the number of occurrences of individual events across all the sessions for a particular event flow. Raw baseline module 31A and reference baseline module 31B may be standalone modules, as shown, or may be integrated with another component, such as rules engine 18.
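Occurrence statistics of this kind can be sketched with `collections.Counter`. The function name, and the split between total occurrences and per-session presence (a support ratio), are assumptions made for illustration.

```python
from collections import Counter


def occurrence_statistics(sessions):
    """sessions: event-type lists, one per session, for one user's event flow.

    Returns (counts, support): total occurrences of each event across all
    sessions, and the fraction of sessions in which the event appears at
    least once.
    """
    counts = Counter()
    presence = Counter()
    for events in sessions:
        counts.update(events)         # every occurrence
        presence.update(set(events))  # at most once per session
    n = len(sessions)
    support = {name: presence[name] / n for name in presence} if n else {}
    return counts, support


counts, support = occurrence_statistics([
    ["Logon", "View Acct Bal", "Summary", "Schedule Payment"],
    ["Logon", "View Acct Bal", "Add Payee"],
    ["Logon", "Rewards Summary", "Rewards Summary"],
])
print(counts["Logon"], support["View Acct Bal"])
```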

[0040] Rules engine 18 is configured to receive a set of events, such as security events, from at least one of agents 12a-n via event manager 26, from event database 20 via the database manager 22, or from event manager 26. Furthermore, rules engine 18 is configured to compare a sequence of incoming events and time data (including time gaps) of a current session to the reference baseline associated with the particular user, and to initiate a rule firing where the current session information does not match the reference baseline.
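The per-event comparison performed by rules engine 18 (and recited in claims 3, 4, and 7) can be sketched as follows: look up the baseline by user ID, select the baseline event at the same temporal position, compare event types, and then compare the time gap against the baseline's timing data within a deviation threshold. This is a sketch under simplifying assumptions, not the patented implementation: each user has a single reference sequence, and `BaselineStep`, `BaselineMatcher`, `mean_gap`, and `max_dev` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaselineStep:
    event_type: str
    mean_gap: float = 0.0  # expected seconds since the previous event
    max_dev: float = 0.0   # hypothetical deviation threshold from the rule condition


@dataclass
class SessionState:
    position: int = 0                  # temporal position of the next event
    last_time: Optional[float] = None  # time of the previously received event


class BaselineMatcher:
    def __init__(self, baselines):
        # Simplification: one reference sequence per user ID.
        self.baselines = baselines
        self.sessions = {}  # (user_id, session_id) -> SessionState

    def on_event(self, user_id, session_id, event_type, timestamp):
        """Return reasons the event deviates (empty list means it matches)."""
        steps = self.baselines.get(user_id)
        if steps is None:
            return ["no reference baseline for user"]
        state = self.sessions.setdefault((user_id, session_id), SessionState())
        pos = state.position
        reasons = []
        if pos >= len(steps):
            reasons.append(f"unexpected event at position {pos}")
        elif steps[pos].event_type != event_type:
            reasons.append(f"position {pos}: expected {steps[pos].event_type}, got {event_type}")
        elif state.last_time is not None:
            gap = timestamp - state.last_time
            if abs(gap - steps[pos].mean_gap) > steps[pos].max_dev:
                reasons.append(f"position {pos}: gap {gap:.0f}s outside "
                               f"{steps[pos].mean_gap:.0f}+/-{steps[pos].max_dev:.0f}s")
        state.position += 1
        state.last_time = timestamp
        return reasons


matcher = BaselineMatcher({"user1": [
    BaselineStep("Logon"),
    BaselineStep("View Acct Bal", mean_gap=6, max_dev=4),
    BaselineStep("Summary", mean_gap=7, max_dev=5),
    BaselineStep("Schedule Payment", mean_gap=180, max_dev=90),
]})
results = [matcher.on_event("user1", "s8", etype, t) for etype, t in
           [("Logon", 0), ("View Acct Bal", 5), ("Summary", 12), ("Schedule Payment", 20)]]
print(results[-1])  # ['position 3: gap 8s outside 180+/-90s']
```

A non-empty result corresponds to the rule condition being satisfied. Because matching is strictly positional, a single skipped step shifts every later event out of alignment, which is one reason to combine this check with sequence-level comparisons.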

[0041] The network security system 100 also includes data list capabilities. In one embodiment, manager 14 further includes a data list module 30 and a local memory 32. Data list module 30 is configured to maintain session lists and/or active lists. Furthermore, data list module 30 is configured to receive a set of events, such as security events, from at least one of agents 12a-n via event manager 26 or from the event manager 26 itself, and/or to receive user-specific baselines from at least one of raw baseline module 31A and reference baseline module 31B. The session lists and/or active lists may be maintained in tables (i.e., master table and/or local tables) in local memory 32. In one embodiment, each record in a data list table represents a user-specific, session-specific event flow. There may be multiple event flows for a single user, for example, if the user engages in multiple sessions of a financial transaction.

[0042] Local memory 32 may be any appropriate storage medium and may be located on manager 14 itself, in a cluster containing manager 14, or on a network node accessible to manager 14.

[0043] BASELINE GENERATION

[0044] The detection of potential intrusions, including fraud during the course of financial transactions, is facilitated by the generation of a reference baseline, which represents a pattern of historical (e.g., normal or anomalous) behavior or business patterns of a user. These patterns in the reference baseline are then used to detect anomalies in the user's behavior.

[0045] FIG. 2 is a process flow diagram for generation of a reference baseline in accordance with an embodiment. The depicted process flow 200 may be carried out by execution of sequences of executable instructions. In another embodiment, various portions of the process flow 200 are carried out by components of a network security system, an arrangement of hardware logic, e.g., an Application-Specific Integrated Circuit (ASIC), etc. For example, blocks of process flow 200 may be performed by execution of sequences of executable instructions in a raw baseline module of the network security system. The raw baseline module may be deployed, for example, at a manager in the network security system. Furthermore, blocks of process flow 200 may be performed by execution of sequences of executable instructions in a reference baseline module of the network security system. The reference baseline module may be deployed, for example, at a manager in the network security system.

[0046] The network security system monitors and analyzes events, such as security events and application events. In the context of web-based financial transactions, the application events may include user activity events/financial transactions events, e.g., navigation by a user through an online financial system, for example via web clicks to review a bank balance, initiate a transfer of funds in and out of the user's credit card account, add a payee, update account information, etc. In one embodiment, each page-by-page traversal through the financial system by a particular user and during a particular session (e.g., web session, login session, etc.) is grouped in an event flow. As such, an event flow is a collection of events on a per-user and per-session basis. The user and/or session information of an event may be determined from fields in the event. The events include security events, application events (e.g., financial transactions events), and other types of events generated by agents deployed at systems for which the security system has oversight.

[0047] At step 210, a raw baseline is determined. As used herein, a raw baseline is a set of historical event flows and time gap information associated with a particular user. The raw baseline may include event flows of multiple sessions. These event flows make up the historical record associated with a user. In one embodiment, an event flow is organized into a temporally-ordered sequence of events on a per-user and per-session basis. Furthermore, the time gaps between each of the events in the sequence and/or the overall time spent in the session are determined for each event flow. The time gap between one event and the next sequential event may be determined by taking the difference between the time of occurrence of one event and the time of occurrence of the next sequential event.
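The raw baseline computation in step 210 can be sketched as grouping historical events per user and per session, ordering each group by time, and taking the difference between consecutive timestamps. The tuple layout, the dictionary keys, and `build_raw_baseline` are hypothetical choices for illustration.

```python
from collections import defaultdict


def build_raw_baseline(events):
    """Build per-user raw baselines from historical events.

    events: iterable of (user_id, session_id, event_type, timestamp) tuples,
    timestamps in seconds. Returns {user_id: [event_flow, ...]} where each
    event flow records the session, its temporally-ordered events as
    (event_type, timestamp, gap) triples, and the overall time spent in the
    session. The first event of a flow has no predecessor, so its gap is None.
    """
    grouped = defaultdict(list)
    for user_id, session_id, event_type, ts in events:
        grouped[(user_id, session_id)].append((event_type, ts))

    baseline = defaultdict(list)
    for (user_id, session_id), rows in grouped.items():
        rows.sort(key=lambda row: row[1])  # temporal order within the session
        ordered, prev_ts = [], None
        for event_type, ts in rows:
            gap = None if prev_ts is None else ts - prev_ts
            ordered.append((event_type, ts, gap))
            prev_ts = ts
        baseline[user_id].append({
            "session": session_id,
            "events": ordered,
            "duration": rows[-1][1] - rows[0][1],
        })
    return dict(baseline)


raw = build_raw_baseline([
    ("user1", "s4", "View Acct Bal", 5),
    ("user1", "s4", "Logon", 0),
    ("user1", "s4", "Summary", 12),
    ("user1", "s5", "Logon", 100),
])
print(raw["user1"][0]["events"])
```

Sorting inside each group means events may arrive out of order (as they can from distributed agents) without affecting the computed gaps.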

[0048] Table 1 below is an example of a partial raw baseline associated with User 1, as may be contained in an event table of a data list. The raw baseline includes time gap information.

TABLE 1

User  Session  Event             Time         Time Gap
1     3        Summary           1:20:12 pm   7 sec
1     3        Schedule Payment  1:23:12 pm   3 min
1     4        Logon             11:10:00 am  —
1     4        View Acct Bal     11:10:05 am  7 sec
1     4        Summary           11:10:12 am  3 sec
1     4        Schedule Payment  1:13:12 am   4 min
1     5        Logon             9:00:00 am   —
1     5        View Acct Bal     9:00:04 am   8 sec
1     5        Add Payee         9:01:04 am   1 min
1     6        Logon             3:30:00 pm   —
1     6        Rewards Summary   3:31:00 pm   1 min

[0049] In one embodiment, an event flow represents "normal" behavior of the user. In another embodiment, an event flow represents "anomalous" behavior. For example, a postmortem analysis may be performed in which the event flow is flagged as either normal or anomalous.

[0050] In one embodiment, the raw baseline is constructed from existing historical event flows (including sequences of user action events) from a transaction log, event database, or the like, for example as a batch operation. The raw baseline may then be updated in real time or in batch mode as new events are received, for example by a manager in a security system.

[0051] At step 220, a distinctive, temporally-ordered sequence of events is determined based on the raw baseline. In one embodiment, the event flows and timing data associated with the raw baseline are analyzed, and a pattern of business or behavior of the particular user is determined. Pattern recognition or identification may involve merging the event flows in the user-specific raw baseline such that, when a pattern is identified, the distinctive sequence of events is represented as a temporally-ordered sequence of events.

[0052] In one embodiment, the history of the user's actual page-by-page traversal through a financial system is analyzed for patterns. For example, it may be gleaned from the raw baseline that, before scheduling a payment for a monthly credit card bill, the user typically begins a session at the login page of the credit card's online site, views the account balance, views a summary of transactions for the current pay period, and finally navigates to the schedule payment page. This pattern is then represented as a temporally-ordered sequence of events.

[0053] Furthermore, pattern discovery may be performed, for example, by tracking the number of occurrences of a particular event in a user-specific raw baseline and generating a support graph of user activity from the raw baseline and the number of occurrences. Pattern discovery is further described in U.S. Pat. No. 7,984,502, filed October 1, 2008, which is hereby incorporated by reference herein in its entirety. Other methods of pattern discovery may also be applied.
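As a minimal illustration of identifying distinctive sequences, the sketch below merges identical event-type sequences across a user's sessions and keeps those seen in at least `min_support` sessions. This frequency count is an assumed stand-in chosen for brevity, not the support-graph method of U.S. Pat. No. 7,984,502; the function name and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

def distinctive_sequences(
    flows: Iterable[Sequence[str]], min_support: int = 1
) -> Dict[Tuple[str, ...], int]:
    """Return each event-type sequence seen in at least `min_support` sessions.

    `flows` holds one event-type sequence per historical session of a single
    user. Identical sequences are merged and counted (a simple frequency
    heuristic assumed for illustration).
    """
    counts = Counter(tuple(flow) for flow in flows)
    return {seq: n for seq, n in counts.items() if n >= min_support}
```

Applied to the event-type sequences of sessions 3 through 6 of Table 1, a `min_support` of 2 would keep only the Logon, View Acct Bal, Summary, Schedule Payment sequence, while a `min_support` of 1 keeps all three distinct sequences.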

[0054] At step 230, timing data for the sequence of events is determined based on the raw baseline. In one embodiment, the timing data includes various statistics about the sequence of events such as minimum time, maximum time, average time, and standard deviation.

[0055] In one embodiment, all event flows in the raw baseline (for a particular user) having the same sequence of events, or a subset thereof, are grouped together. As previously described, each event flow also includes data about the time gap between each pair of sequential events in the flow. For example, as shown in Table 1, in one event flow (i.e., session ID 3), the time gap between navigation from a login page to a view account balance page is 5 seconds. In another event flow (i.e., session ID 4), the time gap between these two pages is 7 seconds.

[0056] The minimum time statistic may be the shortest time the user historically took to traverse from one particular event to the next sequential event (or another later-occurring event) in an event flow during the course of a session. Using the example shown in Table 1, there are three event flows that include a traversal from a login page to a view account balance page. In this example, the minimum time for traversing from the login page to the view account balance page is 5 seconds.

[0057] The maximum time statistic may be the longest time the user historically took to traverse from one event to the next sequential event (or another later-occurring event) in an event flow during the course of a session. Using the example shown in Table 1, the maximum time for traversing from the login page to the view account balance page is 7 seconds.

[0058] Likewise, the average time statistic may be the statistical average of the time gaps across all historically-occurring event flows from one particular event to the next sequential event (or another later-occurring event). Using the example shown in Table 1, the average time for traversing from the login page to the view account balance page is 6 seconds, and the standard deviation for the same page-by-page traversal is 1 second. Other time-based statistics may also be determined.
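The statistics in [0056] through [0058] can be reproduced with Python's standard `statistics` module. In this sketch the gaps of 5 and 7 seconds come from sessions 3 and 4 as described in [0055]; the third gap of 6 seconds is an assumed value chosen so that the results match the stated minimum, maximum, average, and standard deviation. The function name `timing_stats` is hypothetical.

```python
import statistics
from typing import Dict, Sequence

def timing_stats(gaps: Sequence[float]) -> Dict[str, float]:
    """Summarize historical time gaps (in seconds) for one event-to-event transition."""
    if not gaps:
        raise ValueError("need at least one historical gap")
    return {
        "min": min(gaps),
        "max": max(gaps),
        "avg": statistics.mean(gaps),
        # Sample standard deviation needs two values; report 0.0 for a single one.
        "std": statistics.stdev(gaps) if len(gaps) > 1 else 0.0,
    }

# 5 s (session 3) and 7 s (session 4) per [0055]; 6 s is an assumed third value.
login_to_balance = timing_stats([5, 7, 6])
```

The stated 1-second standard deviation corresponds to the sample standard deviation (`statistics.stdev`); the population form (`statistics.pstdev`) would give about 0.82 seconds for the same three gaps.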

[0059] A reference baseline is generated, at step 240, based on the distinctive sequence of events and the associated timing data. As used herein, a reference baseline is a set of distinctive temporally-ordered sequences of events that have been identified as being representative of a user-specific pattern of business or behavior. The reference baseline also includes the timing data associated with each distinctive sequence of events.
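One way to hold a reference baseline entry, and to derive its per-step timing data by merging raw-baseline flows that share the same event sequence, is sketched below. The `Step` and `ReferencePattern` classes, the `label` field (anticipating the normal/anomalous flags discussed later), and the input layout of `(event_type, gap_seconds)` pairs are illustrative assumptions rather than structures defined in the source.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

@dataclass
class Step:
    event_type: str
    avg_gap: Optional[float]   # seconds since the previous event; None for the first step
    std_gap: Optional[float]

@dataclass
class ReferencePattern:
    pattern_id: int
    steps: List[Step] = field(default_factory=list)
    label: str = "normal"      # hypothetical flag: "normal" or "anomalous"

Flow = Sequence[Tuple[str, Optional[float]]]   # (event_type, gap_seconds)

def build_reference_pattern(pattern_id: int, flows: Sequence[Flow],
                            label: str = "normal") -> ReferencePattern:
    """Merge raw-baseline flows that share one event sequence into a reference pattern."""
    types = [t for t, _ in flows[0]]
    if any([t for t, _ in f] != types for f in flows):
        raise ValueError("all flows must have the same event-type sequence")
    steps = [Step(types[0], None, None)]
    for i in range(1, len(types)):
        gaps = [f[i][1] for f in flows]   # historical gaps at this position
        std = statistics.stdev(gaps) if len(gaps) > 1 else 0.0
        steps.append(Step(types[i], statistics.mean(gaps), std))
    return ReferencePattern(pattern_id, steps, label)
```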

[0060] The patterns and the timing data in the reference baseline are used to detect anomalies in the user's behavior. In one embodiment, the reference baseline is generated and/or updated on a fixed schedule, for example to reflect ongoing changes in how the user uses and interacts with a system, such as a financial system. Updates to the reference baseline may be based on updates performed to the corresponding raw baseline.

[0061] Table 2 below is an example of a partial reference baseline associated with User 1, as may be generated based on the partial raw baseline of Table 1. The partial reference baseline of Table 2 includes timing data (e.g., timing statistics).

TABLE 2

[0062] At least three distinctive reference patterns are identified in the partial reference baseline of Table 2. Two timing statistics are included for each event: the average time gap from the previous event in the sequence, and the standard deviation of that time gap. Various other timing statistics may also be included in the reference baseline.

[0063] ANOMALY DETECTION

[0064] As previously described, the detection of potential intrusions, including fraud during the course of financial transactions, is facilitated by the use of a reference baseline. More specifically, the distinctive sequences in the reference baseline are used to detect anomalies in the user's behavior. Sequences of events in a current session are analyzed according to rules. Rule evaluation may include comparing the sequences of the current session against the historical patterns in the reference baseline. Significant deviations from the reference baseline may result in rule violation and detection of an anomaly, which can be indicative of a potential intrusion.

[0065] FIG. 3 is a process flow diagram for analysis of user activity events based on deviation from a reference baseline in accordance with an embodiment. The depicted process flow 300 may be carried out by execution of sequences of executable instructions. In another embodiment, various portions of the process flow 300 are carried out by components of a network security system, an arrangement of hardware logic, e.g., an Application-Specific Integrated Circuit (ASIC), etc. For example, blocks of process flow 300 may be performed by execution of sequences of executable instructions in a rules engine of the network security system. The rules engine may be deployed, for example, at a manager in the network security system.

[0066] As previously described, the network security system monitors and analyzes events, such as security events. More specifically, the events are evaluated against rules. The rules may specify conditions that involve the evaluation of temporally-ordered sequences of events and the time gaps between those events in the sequence. An example of such a rule is:

RULE 1

If (transactional anomaly) occurs (by a User ID) (standard deviation of time gap up to 1 standard deviation for the User ID) then (Action).

[0067] For this rule, the condition is "transactional anomaly," where a sequence of events for the user's current session fails to match the reference baseline for that user. Furthermore, the threshold range of validity for finding a rule match is a "standard deviation of time gap up to 1 standard deviation for the User ID," and the aggregation criterion is "by the User ID."

[0068] Typically, rule conditions are static, in that a single pattern of activity applies to all users of a transactional system. As described herein, the rule condition in this example involves tracking of a sequence of events to determine if an anomaly is present. When a sequence of a current session is evaluated, the user-specific reference baseline is used as a comparison point. As such, the rules as described herein are not static, but instead, the rule conditions are expressed such that a specific pattern of activity for each user is used to evaluate the current session's sequence of events.

[0069] Furthermore, many rule conditions are Boolean in nature, meaning that the conditions are either satisfied or not. As described herein, the rules may specify a deviation threshold or other type of threshold that allows a range of validity for satisfying the rule condition. The deviation threshold may be expressed such that duplicate, missing, and/or out-of-order events are taken into account, in addition to timing data and/or temporal position data. The same rule may also include Boolean conditions.

[0070] At step 305, an event is received for a current session. As each event is received, it is evaluated against various rules in real-time.

[0071] The rule evaluation involves iteratively matching the events of a flow for the current session to a user-specific reference baseline. In other words, a comparison is performed to determine whether the received event in a flow of a current session matches a pattern in the reference baseline. In one embodiment, a rules engine performs this comparison.

[0072] Specifically, at step 310, it is determined whether there is at least a partial match with an identified pattern (represented as a distinctive sequence of events) in a user-specific reference baseline. In particular, a user ID associated with the received event is determined, and a reference baseline specific to that user ID is identified.

[0073] The received event is compared to the reference baseline. In particular, a temporal position of the event within an event flow of a current session is determined. For example, the event with the earliest time of occurrence among the events in the flow may have the first temporal position, the event with the second earliest time of occurrence may have the second temporal position, etc. The received event is compared to the events in the reference baseline having the same temporal position. Moreover, an attribute of the received event (e.g., event type) is compared to the same attribute of those events in the reference baseline.
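The position-by-position comparison of [0073] might look like the following minimal sketch, which uses event type as the compared attribute. The layout of `reference` (a mapping from a hypothetical pattern ID to a list of event types for one user) and the name `matching_patterns` are assumptions for illustration.

```python
from typing import Dict, List, Sequence

def matching_patterns(reference: Dict[int, Sequence[str]],
                      flow_types: Sequence[str]) -> List[int]:
    """Return IDs of reference sequences consistent with the current flow so far.

    The event at temporal position i of the current session (0 = earliest) is
    compared only with position i of each reference sequence, using the
    event-type attribute.
    """
    matches = []
    for pattern_id, seq in reference.items():
        if len(flow_types) > len(seq):
            continue  # the flow already has more events than this pattern
        if all(seq[i] == event_type for i, event_type in enumerate(flow_types)):
            matches.append(pattern_id)
    return matches
```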

[0074] A time gap between the time of occurrence of the received event and the time of occurrence of the previous event in the current session, or other timing information, is tracked. In one embodiment, in addition to comparing the temporal position, the time gap associated with the received event is compared to the timing data in the reference baseline.

[0075] During the course of the comparison based on temporal position, timing information, or other attribute, the determination of a match may be static and/or may be based on deviation thresholds. The deviation thresholds specify a tolerance (e.g., range of validity) within which a match is still determined. The tolerance may be implemented as a rule condition.

[0076] For example, a user-specific reference baseline may include a sequence of event 1 followed by event 2, with an average time gap of 7 seconds. The reference baseline may include an attribute listing the average time gap between these specific events. For purposes of rule matching, a current session's sequence of events may be deemed to match the reference baseline as long as the time gap is within one standard deviation (here, 2 seconds) of the average time gap in the reference baseline. The deviation threshold may be expressed with respect to timing data, as described in this example, or with respect to another attribute such as temporal position, such that a tolerance is expressed for missing events or for events that occur out of order relative to what is expected. For example, the non-occurrence of some events of a sequence in the reference baseline may be tolerated, whereas the non-occurrence of others may not be tolerated.
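Two kinds of tolerance from [0076] can be sketched as small predicates. `gap_within_tolerance` checks the timing deviation threshold: with the example's average of 7 seconds and standard deviation of 2 seconds, and `k` set to 1, gaps from 5 to 9 seconds are accepted. `matches_with_optional` is an assumed full-sequence check in which events whose types appear in `optional` may be absent; it does not attempt to tolerate reordering. Both function names and the `k` parameter are hypothetical.

```python
from typing import AbstractSet, Sequence

def gap_within_tolerance(gap: float, avg_gap: float, std_gap: float,
                         k: float = 1.0) -> bool:
    """True if the observed gap lies within k standard deviations of the baseline average."""
    return abs(gap - avg_gap) <= k * std_gap

def matches_with_optional(flow: Sequence[str], seq: Sequence[str],
                          optional: AbstractSet[str]) -> bool:
    """True if `flow` equals `seq` after omitting some events whose types are in `optional`.

    Events must still appear in the expected order; only listed event types may
    be missing. Greedy left-to-right matching suffices because optionality is
    decided per event type.
    """
    i = 0  # position in the current session's flow
    for expected in seq:
        if i < len(flow) and flow[i] == expected:
            i += 1                  # expected event present at this position
        elif expected not in optional:
            return False            # a required event is missing or out of order
    return i == len(flow)           # reject unexpected extra events
```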

[0077] When evaluating the received event, if a partial match is not found, it is determined at step 315 that there is no sequence match between the reference baseline and the sequence of which the received event is a part.

[0078] At step 317, an outcome of the rule condition is determined. The reference baseline may include sequences that are indicative of "normal" user behavior and/or "anomalous" user behavior. In one embodiment, where the reference baseline includes only sequences indicative of "normal" user behavior, any incoming sequence of a current session that does not match the reference baseline is deemed to be suspect, and the rule condition (which is looking for anomalous behavior) returns "true" at step 317, i.e., the rule fires. Where the reference baseline includes only sequences indicative of "anomalous" user behavior, any incoming sequence of a current session that does not match the reference baseline is deemed to be non-suspect, and the rule condition returns "false" at step 317.

[0079] In another embodiment, where the reference baseline includes sequences indicative of both "anomalous" and "normal" user behavior, if an incoming sequence of a current session does not match the reference baseline, the rule condition returns "true," resulting in a rule firing, since the user's actions were unexpected (based on history) and should be investigated. An alert generated from this type of rule firing may be assigned a lower priority level than, for example, an alert generated when an anomalous sequence is matched. In yet another embodiment, the rule condition returns "false" in this case, for example if a policy goal is to minimize the likelihood of false positives or otherwise to limit rule firings to matched anomalous sequences.

[0080] When evaluating the received event, if a partial match is found at step 310, it is then determined at step 320 whether there is a full match. As used herein, a full match occurs where the flow of events in the current session matches an entire sequence of events in the reference baseline. If there is no full match, processing returns to step 305, where another event in the current sequence is received. The comparisons at steps 310 and 320 may occur as a single step, on an event-by-event basis in real time as the events are received by a manager. Matching may also be performed on an aggregate basis, operating on a set of events of the current session at once.
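The iterative loop of steps 305, 310, 315, 320, and 325 can be sketched as a small per-session matcher that is fed one event at a time. This is a sketch under assumptions: the `SessionMatcher` and `Status` names are hypothetical, only the event-type attribute is compared, and timing tolerances are omitted for brevity.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple

class Status(Enum):
    NO_MATCH = "no sequence match"   # step 315
    PARTIAL = "partial match"        # step 310 passed, no full match yet (step 320)
    FULL = "sequence match"          # step 325

class SessionMatcher:
    """Feeds one session's events, in order, against a user's reference sequences."""

    def __init__(self, reference: Dict[int, Sequence[str]]):
        self.reference = reference   # hypothetical layout: pattern ID -> event types
        self.flow: List[str] = []    # event types received so far in this session

    def receive(self, event_type: str) -> Tuple[Status, List[int]]:
        """Step 305: accept the next event, then report the match status."""
        self.flow.append(event_type)
        n = len(self.flow)
        # Step 310: patterns whose first n events equal the flow (partial match).
        candidates = [pid for pid, seq in self.reference.items()
                      if list(seq[:n]) == self.flow]
        if not candidates:
            return Status.NO_MATCH, []
        # Step 320: a candidate of exactly length n is fully matched.
        full = [pid for pid in candidates if len(self.reference[pid]) == n]
        if full:
            return Status.FULL, full
        return Status.PARTIAL, candidates
```

Fed the events A, B, and E in turn against the three flows described in [0083] (A-B-C-D, A-B-E, and A-F), this matcher would report a partial match on all three patterns, then a partial match on the first two, and finally a full match on the second.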

[0081] Where a full match is determined at step 320, it is determined at step 325 that a sequence match is found. As previously mentioned, an outcome of the rule condition is then determined at step 330. The reference baseline may include sequences that are indicative of "normal" user behavior and/or "anomalous" user behavior. In one embodiment, where the reference baseline includes only sequences indicative of "normal" user behavior, any incoming sequence of a current session that matches the reference baseline is deemed to be non-suspect, and the rule condition (which is looking for anomalous behavior) returns "false." Where the reference baseline includes only sequences indicative of "anomalous" user behavior, any incoming sequence of a current session that matches the reference baseline is deemed to be suspect, and the rule condition returns "true."

[0082] In another embodiment, the reference baseline includes sequences indicative of "anomalous" and/or "normal" user behavior. Where an incoming flow of a current session matches the reference baseline, it is determined whether the matching sequence in the reference baseline is flagged or otherwise marked as "anomalous" or "normal." If the match is with an "anomalous" sequence, the rule condition returns "true," and if the match is with a "normal" sequence, the rule condition returns "false."
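The outcomes described in [0078], [0079], [0081], and [0082] can be collected into a single decision function. In this sketch, `baseline_kind` ("normal", "anomalous", or "mixed") describes what the user's reference baseline contains, and `fire_unmatched_in_mixed` selects between the two policies of [0079]; both parameter names are hypothetical. A return value of `True` corresponds to an anomaly-seeking rule condition, such as Rule 1, returning "true."

```python
from typing import Optional

def rule_condition(matched_label: Optional[str], baseline_kind: str,
                   fire_unmatched_in_mixed: bool = True) -> bool:
    """Return True when an anomaly-seeking rule condition is satisfied.

    matched_label: flag of the fully matched reference sequence ("normal" or
        "anomalous"), or None when there was no sequence match (step 315).
    baseline_kind: "normal", "anomalous", or "mixed", i.e. which kinds of
        sequences the user's reference baseline contains.
    """
    if matched_label is not None:
        # Steps 325/330: the flag of the matched sequence decides.
        return matched_label == "anomalous"
    # Steps 315/317: no sequence match.
    if baseline_kind == "normal":
        return True                      # unfamiliar behavior is suspect
    if baseline_kind == "anomalous":
        return False                     # not a known anomalous pattern
    if baseline_kind == "mixed":
        return fire_unmatched_in_mixed   # policy choice described in [0079]
    raise ValueError(f"unknown baseline kind: {baseline_kind!r}")
```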

[0083] For purposes of explanation, a reference baseline associated with a particular user may include the three reference baseline flows shown in Table 2: 1) Event A (Logon), Event B (View Account Balance), Event C (Summary), and Event D (Schedule Payment); 2) Event A (Logon), Event B (View Account Balance), and Event E (Add Payee); and 3) Event A (Logon) and Event F (Rewards Summary). A rule, such as Rule 1 as described above, is evaluated on the current session's events.

[0084] In the current session, for example, Event A is received at step 305. Event A is compared to the reference baseline sequences. Since all three sequences begin with Event A, it is determined that there is at least a partial match. Where there is no match, for example if Event G is received at step 305, it is determined at step 315 that there is no sequence match. Where the reference baseline includes sequences indicative of normal behavior, a determination of no sequence match leads to characterizing the current session's events as suspicious, and a rule condition may return "true" at step 317 in response.

[0085] The comparison continues from step 310 to step 320, where it is determined whether there is a full match based on the flow of events (and timing information) already received for the current session. If there is no full match yet, the system continues attempting to match subsequently received events. Specifically, processing returns to step 305, where another event is received. Thus, the matching and comparison process occurs iteratively. In a subsequent iteration, at step 310, it is determined whether the current flow of events can be at least partially matched.

[0086] Continuing with the example above, if Event B is received in a second iteration within 5 seconds of Event A, it is determined, based on the deviation tolerance specified in Rule 1, that the current session's flow (which now includes Event A followed by Event B) at least partially matches both reference pattern ID 1 and reference pattern ID 2.

[0087] At step 320, it is determined whether there is a full match as yet. At this point, the current flow includes just Event A and Event B. To be a full match with the reference pattern ID 1 , the flow of the current session should also include Events C and D within the deviation tolerance. Likewise, to be a full match with the reference pattern ID 2, the flow should also include Event E within the deviation tolerance. As such, there is no full match. The system may then iterate again, and the next event is received at step 305.
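The worked example of [0083] through [0087] can be reproduced with a timing-aware variant of the matching step. Because the contents of Table 2 are not recoverable here, the sketch uses the login-to-view-account-balance statistics from [0058] (an average of 6 seconds and a standard deviation of 1 second) and leaves every other step without timing data (`None`), so only the event type is checked for those steps. `REFERENCE`, `step_matches`, and `evaluate` are hypothetical names, and `k=1.0` mirrors the one-standard-deviation threshold of Rule 1.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (event_type, avg_gap_s, std_gap_s); None means no timing data in this sketch.
Step = Tuple[str, Optional[float], Optional[float]]

REFERENCE: Dict[int, List[Step]] = {
    1: [("Logon", None, None), ("View Acct Bal", 6.0, 1.0),
        ("Summary", None, None), ("Schedule Payment", None, None)],
    2: [("Logon", None, None), ("View Acct Bal", 6.0, 1.0), ("Add Payee", None, None)],
    3: [("Logon", None, None), ("Rewards Summary", None, None)],
}

def step_matches(step: Step, event_type: str, gap: Optional[float], k: float) -> bool:
    """Same event type at this temporal position, and gap within k std devs if known."""
    expected_type, avg, std = step
    if event_type != expected_type:
        return False
    if avg is None or std is None or gap is None:
        return True   # nothing to compare the timing against
    return abs(gap - avg) <= k * std

def evaluate(flow: Sequence[Tuple[str, Optional[float]]],
             reference: Dict[int, List[Step]] = REFERENCE,
             k: float = 1.0) -> Tuple[List[int], List[int]]:
    """Return (partially matched IDs, fully matched IDs) for the flow so far."""
    partial: List[int] = []
    full: List[int] = []
    n = len(flow)
    for pid, steps in reference.items():
        if n > len(steps):
            continue
        if all(step_matches(steps[i], t, gap, k) for i, (t, gap) in enumerate(flow)):
            (full if n == len(steps) else partial).append(pid)
    return partial, full
```

Under these assumptions, Event B arriving 5 seconds after Event A still partially matches patterns 1 and 2, as in [0086], while a gap of 9 seconds would fall outside the tolerance and leave no candidate pattern.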

[0088] Once a full match is determined and a sequence match is found at step 325, the outcome of the rule condition is determined at step 330. Where the reference baseline includes sequences indicative of normal behavior, the current session's events are characterized as normal, and the rule condition returns "false" at step 330.

[0089] In one embodiment, the reference baseline may include sequences of normal activity and/or anomalous activity. In other words, an event sequence that a user has taken in the past, which therefore might be represented in the reference baseline, may be a normal event flow or an anomalous flow. For example, the first time a user executes a particular sequence, the current flow may not match at step 310, resulting in a determination of no sequence match at step 315.

[0090] The new sequence may be fed back to the reference baseline generation flow of FIG. 2 to update the raw baseline and/or reference baseline. The system does not know whether this current sequence is normal or anomalous activity until, for example, a post-mortem analysis is performed. During the post-mortem analysis, the user may be contacted and asked to validate the activity. The user may enter the result and thereby flag the new sequence as normal or anomalous. If normal, the new sequence is added to the raw baseline so that future matching event sequences will find a match and not fire the rule. If anomalous, the new sequence may also be added to the baseline but flagged as anomalous, such that future matches to this pattern will cause the rule to be fired.
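The feedback loop of [0089] through [0091] can be reduced to storing each validated sequence together with its flag, and firing when a sequence is either unknown or flagged anomalous. In this minimal sketch, the reference baseline is assumed to be a dictionary keyed by the tuple of event types, and `record_post_mortem`, `lookup`, and `should_fire` are hypothetical names; timing data is left out.

```python
from typing import Dict, Optional, Sequence, Tuple

Baseline = Dict[Tuple[str, ...], str]   # event-type sequence -> "normal" | "anomalous"

def record_post_mortem(baseline: Baseline, sequence: Sequence[str], label: str) -> None:
    """Store a sequence validated during post-mortem analysis together with its flag."""
    if label not in ("normal", "anomalous"):
        raise ValueError("label must be 'normal' or 'anomalous'")
    baseline[tuple(sequence)] = label

def lookup(baseline: Baseline, sequence: Sequence[str]) -> Optional[str]:
    """Return the stored flag, or None when there is no sequence match."""
    return baseline.get(tuple(sequence))

def should_fire(baseline: Baseline, sequence: Sequence[str]) -> bool:
    """Fire when no flow matches or the matched flow is flagged anomalous ([0091])."""
    return lookup(baseline, sequence) in (None, "anomalous")
```

A sequence seen for the first time fires the rule; once the user validates it as normal, the same sequence no longer fires, whereas a sequence recorded as anomalous keeps firing on every future match.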

[0091] As such, a rule may be fired if either no flow match is found, as shown in FIG. 3, or a match is found with a flow that is flagged as anomalous and/or otherwise suspicious.

[0092] FIG. 4 is a process flow diagram for triggering a response action in accordance with an embodiment. The depicted process flow 400 may be carried out by execution of sequences of executable instructions. In another embodiment, various portions of the process flow 400 are carried out by components of a network security system, an arrangement of hardware logic, e.g., an Application-Specific Integrated Circuit (ASIC), etc. For example, blocks of process flow 400 may be performed by execution of sequences of executable instructions in a rules engine of the network security system. The rules engine may be deployed, for example, at a manager in the network security system.

[0093] At step 410, it is detected that a rule condition returned "true," for example after an event in a current session was evaluated against the rule. At step 415, an action may be triggered in response.

[0094] The actions triggered by the rules may include executing a pre-determined command or script, updating a data list, logging an alert, sending alerts to a console or notification designees, setting custom severity levels for the alert based on cumulative activity, adding a source to a list of suspicious sources, adding a target to a vulnerability list, triggering another rule, or any combination of these actions.
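A rules engine could map configured action names to handler functions and run them when a rule condition returns "true" (steps 410 and 415). The sketch below covers only two of the listed actions, logging an alert and adding a source to a list of suspicious sources; the action names, the `state` dictionary, and the `user_id` and `source_ip` event fields are all hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict, dict], None]

def log_alert(state: dict, event: dict) -> None:
    state.setdefault("alerts", []).append(
        f"transactional anomaly for user {event['user_id']}")

def add_suspicious_source(state: dict, event: dict) -> None:
    state.setdefault("suspicious_sources", set()).add(event["source_ip"])

ACTIONS: Dict[str, Handler] = {
    "log_alert": log_alert,
    "add_suspicious_source": add_suspicious_source,
}

def on_rule_result(fired: bool, event: dict, action_names: List[str],
                   state: dict) -> dict:
    """Steps 410/415: when the rule condition returned true, run each configured action."""
    if fired:
        for name in action_names:
            ACTIONS[name](state, event)
    return state
```

Keeping actions in a name-to-handler table lets each rule list its own combination of actions, matching the "any combination of these actions" wording above.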

[0095] Furthermore, the action may include triggering another rule. For example, the information from transactional events corresponding to a user's interaction with an online account may be further correlated with network security events and with knowledge of the network infrastructure including but not limited to asset model, vulnerabilities of various machines and other attacks that are underway in the system in real-time, as the simultaneous presence of attacks and detected anomalies may increase the likelihood of fraud. As such, correlation may be performed for multiple data sources.

[0096] In other words, the separate rule may include rule conditions that look for fraud-related rule firings and also look for security-related rule firings, for example for a target host. When the conditions are satisfied, correlation may be performed thereby bridging transactional or fraud-based event knowledge with security data.

[0097] In one embodiment, the action includes generating a correlation event. The correlation event may include information about the base events that triggered the rule in addition to rule-specific information. In one embodiment, the correlation event may be used for condition-matching, thus allowing chaining of multiple rules.
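The cross-source correlation of [0095] through [0097] can be sketched as joining fraud-related and security-related rule firings on a shared target host and emitting a correlation event that carries the base events of both. Because the output has the same shape as its inputs, it could itself be matched by a further rule, which is the chaining described above. The `CorrelationEvent` fields, the rule name "fraud_and_security", and the join on `target_host` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CorrelationEvent:
    rule_name: str
    target_host: str
    base_events: List[dict] = field(default_factory=list)

def correlate(fraud_firings: List[CorrelationEvent],
              security_firings: List[CorrelationEvent]) -> List[CorrelationEvent]:
    """Emit one correlation event per (fraud, security) firing pair on the same host."""
    by_host: Dict[str, List[CorrelationEvent]] = {}
    for s in security_firings:
        by_host.setdefault(s.target_host, []).append(s)
    out = []
    for f in fraud_firings:
        for s in by_host.get(f.target_host, []):
            out.append(CorrelationEvent(
                rule_name="fraud_and_security",
                target_host=f.target_host,
                # Carry the base events of both inputs so later rules can inspect them.
                base_events=f.base_events + s.base_events,
            ))
    return out
```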

[0098] FIG. 5 illustrates a computer system in which an embodiment may be implemented. The system 500 may be used to implement any of the computer systems described above. The computer system 500 is shown comprising hardware elements that may be electrically coupled via a bus 524. The hardware elements may include at least one central processing unit (CPU) 502, at least one input device 504, and at least one output device 506. The computer system 500 may also include at least one storage device 508. By way of example, the storage device 508 can include devices such as disk drives, optical storage devices, and solid-state storage devices such as a random access memory ("RAM") and/or a read-only memory ("ROM"), which can be programmable, flash-updateable, and/or the like.

[0099] The computer system 500 may additionally include a computer-readable storage media reader 512, a communications system 514 (e.g., a modem, a network card (wireless or wired), an infra-red communication device, etc.), and working memory 518, which may include RAM and ROM devices as described above. In some embodiments, the computer system 500 may also include a processing acceleration unit 516, which can include a digital signal processor (DSP), a special-purpose processor, and/or the like.

[0100] The computer-readable storage media reader 512 can further be connected to a computer-readable storage medium 510, together (and in combination with storage device 508 in one embodiment) comprehensively representing remote, local, fixed, and/or removable storage devices plus any tangible non-transitory storage media, for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information (e.g., instructions and data). Computer-readable storage medium 510 may be non-transitory, such as hardware storage devices (e.g., RAM, ROM, EPROM (erasable programmable ROM), EEPROM (electrically erasable programmable ROM), hard drives, and flash memory). The communications system 514 may permit data to be exchanged with the network and/or any other computer described above with respect to the system 500. Computer-readable storage medium 510 includes a rules engine 525, a raw baseline module 526, and a reference baseline module 527.

[0101] The computer system 500 may also comprise software elements, which are machine readable instructions, shown as being currently located within a working memory 518, including an operating system 520 and/or other code 522, such as an application program (which may be a client application, Web browser, mid-tier application, etc.). It should be appreciated that alternate embodiments of a computer system 500 may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.

[0102] The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made.

[0103] Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example of a generic series of equivalent or similar features.