Title:
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND NON-TRANSITORY COMPUTER READABLE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2021/215019
Kind Code:
A1
Abstract:
An information processing apparatus (1) for determining an anomaly score for each element in a sequence of events w1, w2, ..., wn, includes: a block scoring component (10) that, given the start position i and the end position i + na - 1 of a block, estimates a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and a score aggregation component (20) that weights each score p(w|i, na) based on the prior distribution p(na) and then uses either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Inventors:
ANDRADE SILVA DANIEL GEORG (JP)
OKAJIMA YUZURU (JP)
Application Number:
PCT/JP2020/018364
Publication Date:
October 28, 2021
Filing Date:
April 23, 2020
Assignee:
NEC CORP (JP)
International Classes:
G06N20/00; G06F21/55
Domestic Patent References:
WO2019163154A1 (2019-08-29)
Foreign References:
JP2005236863A (2005-09-02)
US20170097863A1 (2017-04-06)
US20180349801A1 (2018-12-06)
US20190243743A1 (2019-08-08)
Attorney, Agent or Firm:
IEIRI, Takeshi (JP)
Claims:
CLAIMS

[Claim 1]

An information processing apparatus for determining an anomaly score for each element in a sequence of events w1, w2, ..., wn, comprising: a block scoring component that, given the start position i and the end position i + na - 1 of a block, estimates a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and a score aggregation component that weights each score p(w|i, na) based on the prior distribution p(na) and then uses either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (5): p(j) = Σ_{(i, na)} 1_{[i, i + na - 1]}(j) p(i, na|w)

[Claim 2]

The information processing apparatus according to claim 1, wherein the score aggregation component weights each score p(w|i, na) based on the prior distribution p(na) and prior knowledge about the start position i conditional on na (given in the form of a prior p(i|na)).

[Claim 3]

The information processing apparatus according to claim 1, wherein the calculation of p(w|i, na) is limited to the combination of i and na for which p(na) is non-zero.

[Claim 4]

An information processing method for determining an anomaly score for each element in a sequence of events w1, w2, ..., wn, comprising: estimating, given the start position i and the end position i + na - 1 of a block, a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and weighting each score p(w|i, na) based on the prior distribution p(na) and then using either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (3): p(j) = 1_{[i*, i* + na* - 1]}(j)

[Claim 5]

The information processing method according to claim 4, wherein the weighting of each score p(w|i, na) is based on the prior distribution p(na) and prior knowledge about the start position i conditional on na (given in the form of a prior distribution p(i|na)).

[Claim 6]

The information processing method according to claim 4, wherein the calculation of p(w|i, na) is limited to the combination of i and na for which p(na) is non-zero.

[Claim 7]

A non-transitory computer readable medium storing a program for causing a computer to execute an information processing method, the information processing method comprising: estimating, given the start position i and the end position i + na - 1 of a block, a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and weighting each score p(w|i, na) based on the prior distribution p(na) and then using either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (5): p(j) = Σ_{(i, na)} 1_{[i, i + na - 1]}(j) p(i, na|w)

[Claim 8]

The non-transitory computer readable medium according to claim 7, wherein the weighting of each score p(w|i, na) is based on the prior distribution p(na) and prior knowledge about the start position i conditional on na (given in the form of a prior distribution p(i|na)).

[Claim 9]

The non-transitory computer readable medium according to claim 7, wherein the calculation of p(w|i, na) is limited to the combination of i and na for which p(na) is non-zero.

Description:
DESCRIPTION

Title of Invention

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND NON-TRANSITORY COMPUTER READABLE MEDIUM

Technical Field

[0001]

The present invention relates to an information processing apparatus, an information processing method and a program for determining anomaly scores for events that are arranged in time (e.g. log data).

Background Art

[0002]

A general method often used for detecting anomalies is to use a uni-gram or n-gram model. These models use the past n - 1 events to predict the next event. However, such models do not allow prior knowledge about the block structure of an anomaly to be included. For example, a cyber-attack is likely to occur in a relatively short time window (e.g. a few hours). As a result, several attack-related events may occur in one block.
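As an illustration of this limitation (not part of the disclosure; the event names and the add-one smoothing choice below are made-up assumptions), a uni-gram detector scores an event only by the rarity of its type in normal training data, so identical events receive identical scores regardless of where in the log they occur:

```python
import math
from collections import Counter

def unigram_scores(train, log):
    """Uni-gram anomaly scores: -log of each event type's smoothed
    frequency in normal training data. The score depends only on the
    event type, never on its position, so block structure is lost."""
    counts = Counter(train)
    vocab = set(train) | set(log)
    total = len(train) + len(vocab)  # add-one smoothing mass
    return [-math.log((counts[e] + 1) / total) for e in log]

train = ["ssh", "word", "ssh", "word", "ssh"]   # normal activity
log = ["word", "scan", "ssh", "scan", "word"]   # "scan" is rare
scores = unigram_scores(train, log)
# both "scan" events get the same high score, wherever they appear
assert scores[1] == scores[3]
```

Because the two "scan" scores are equal no matter how the events are arranged, a uni-gram detector cannot prefer the hypothesis that the anomalies form one contiguous block.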

[0003]

Fig. 2 shows an example of log data which consists of a sequence of events with various process names. In Fig. 2, "user" denotes the programs that were executed by the owner of the computer, while "attacker" denotes the programs that were executed by an intruder. The attack occurs in one block. However, n-gram anomaly detection systems cannot exploit the block structure. Therefore, as shown in Fig. 3, the detected anomaly scores have high values almost uniformly spread over the log data, and such systems cannot identify the block of the attack.

[0004]

Another possible method is to use a hidden Markov model (HMM). However, in a standard HMM, it is difficult to control the length of the attack (block size) and the number of blocks. Similarly, recent state space models for anomaly detection, as in NPL 1 (Du et al. 2017), do not allow prior knowledge about the block size to be incorporated.

Citation List

Non Patent Literature

[0005]

NPL 1: Du, M., Li, F., Zheng, G., Srikumar, V.: DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1285-1298 (2017).

Summary of Invention

Technical Problem

[0006]

The anomaly detection method described in NPL 1 (Du et al. 2017) does not allow prior knowledge about the concentration of anomaly-related events (hereinafter referred to as the block size) to be incorporated. In cases where anomalies tend to occur in blocks, for example a cyber attack resulting in many anomalies within a small time window, the method described in NPL 1 cannot determine anomaly scores appropriately.

[0007]

The present disclosure has been made in view of the aforementioned problem and aims to provide an information processing apparatus, an information processing method and a program capable of appropriately determining anomaly scores.

Solution to Problem

[0008]

An information processing apparatus according to a first exemplary aspect of the present disclosure is an information processing apparatus for determining an anomaly score for each element in a sequence of events w1, w2, ..., wn, including: a block scoring component that, given the start position i and the end position i + na - 1 of a block, estimates a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and a score aggregation component that weights each score p(w|i, na) based on the prior distribution p(na) and then uses either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (5): p(j) = Σ_{(i, na)} 1_{[i, i + na - 1]}(j) p(i, na|w)

An information processing method according to a second exemplary aspect of the present disclosure is an information processing method for determining an anomaly score for each element in a sequence of events w1, w2, ..., wn, including: estimating, given the start position i and the end position i + na - 1 of a block, a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and weighting each score p(w|i, na) based on the prior distribution p(na) and then using either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (3): p(j) = 1_{[i*, i* + na* - 1]}(j)

A non-transitory computer readable medium storing a program according to a third exemplary aspect of the present disclosure is a non-transitory computer readable medium storing a program for causing a computer to execute an information processing method, the information processing method including: estimating, given the start position i and the end position i + na - 1 of a block, a block anomaly score p(w|i, na) based on the difference between the distribution of events in [i, i + na - 1] and that in [1, i - 1] U [i + na, n]; and weighting each score p(w|i, na) based on the prior distribution p(na) and then using either the maximum (obtained from Equations (2) and (3)) or the average (obtained from Equations (4) and (5)) to determine an anomaly score for each event.

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

Equation (4): p(i, na|w) ∝ p(w|i, na) p(i, na)

Advantageous Effects of Invention

[0011]

According to the present disclosure, it is possible to provide an information processing apparatus, an information processing method and a program capable of appropriately determining anomaly scores. The present disclosure tends to produce high anomaly scores for events that are close together (occurring in one block) rather than being uniformly spread over time.

Brief Description of Drawings

[0012]

[Fig. 1]

Fig. 1 shows a block diagram showing an example configuration of an information processing apparatus (or proposed method) for determining an anomaly score of each event according to an embodiment of the present disclosure.

[Fig. 2]

Fig. 2 shows an example where the anomaly is a sequence of processes that were executed by an attacker.

[Fig. 3]

Fig. 3 shows the result of an anomaly detection that is based on uni-grams.

[Fig. 4]

Fig. 4 shows the output of the block scoring component for i = 1 and na = 3.

[Fig. 5]

Fig. 5 shows the output of the block scoring component for i = 6 and na = 3.

[Fig. 6]

Fig. 6 shows the output of all block scores (left hand side) and after reweighting and normalization with prior knowledge (right hand side).

[Fig. 7]

Fig. 7 shows the final output of the information processing apparatus (or proposed method).

[Fig. 8]

Fig. 8 is a block diagram illustrating a hardware configuration example of the information processing apparatus.

Description of Embodiments

[0013]

Hereinafter, specific embodiments to which the present disclosure including the above-described example aspects is applied will be described in detail with reference to the drawings. In the drawings, the same elements are denoted by the same reference signs, and repeated descriptions are omitted for clarity of the description.

[0014]

Fig. 1 shows an example configuration of an information processing apparatus (or method) according to an embodiment of the present disclosure. The information processing apparatus 1 includes Block Scoring Component 10 and Score Aggregation Component 20. The information processing apparatus 1, in a first step, calculates an anomaly score not for a single event (element), but for all plausible blocks of one or more events. The resulting block anomaly scores are then combined with the user's prior assumption about the length of blocks to arrive at a final anomaly score for each single event. These core components will be described in more detail below.

[0015]

The information processing apparatus 1 assumes discrete-time events. In particular, the time index set {a, a + 1, a + 2, ..., b}, where a < b, is denoted by [a, b]. The event at time (index) j ∈ [1, n] is denoted as wj. The sequence of observed events is denoted as w = (w1, w2, ..., wn), which may also be referred to as the log file or log data. Here, i denotes the index at which an anomaly starts, and na denotes the number of events that are considered anomalous (the number of anomaly events). Hereinafter, na may also be called the block size (of the anomaly).

[0016]

Block Scoring Component 10

First, given i and na, Block Scoring Component 10, as shown in Fig. 1, receives event sequence data 101 and calculates a score representing the likelihood that the anomaly occurred in the interval [i, i + na - 1]. In other words, Block Scoring Component 10 estimates the plausibility of an anomaly starting at position i and ending at position i + na - 1. This score might be interpreted as the conditional probability p(w|i, na).

For example, we may define

Equation (1): p(w|i, na) = ma / ka

where ka is the number of different event types in the interval [i, i + na - 1] and ma is the number of event types that occurred in the interval [i, i + na - 1] but not in [1, i - 1] U [i + na, n].

[0017]
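A minimal sketch of this block score, assuming Equation (1) takes the form ma / ka (the equation itself appears only as an image in the source, so this reading is an assumption; the event names are made up):

```python
def block_score(w, i, na):
    """Block anomaly score p(w|i, na) as the fraction of event types in
    the 1-based block [i, i+na-1] that never occur outside it
    (assumed form of Equation (1): ma / ka)."""
    block = set(w[i - 1 : i - 1 + na])                 # types in the block (ka of them)
    outside = set(w[: i - 1]) | set(w[i - 1 + na :])   # types elsewhere in the log
    ka = len(block)                                    # distinct types in block
    ma = len(block - outside)                          # types exclusive to block
    return ma / ka

# Toy log: "scan" and "dump" appear only inside the block [3, 5]
w = ["ssh", "word", "scan", "scan", "dump", "word", "ssh"]
assert block_score(w, 3, 3) == 1.0   # block {scan, dump} is fully exclusive
```

A block whose event types all reappear in normal activity scores 0, so the score directly contrasts the block's distribution with the rest of the log, as the paragraph above describes.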

Score Aggregation Component 20

Based on the user's prior belief about the index i where the anomaly starts and the block size na, expressed as a prior distribution p(i, na), Score Aggregation Component 20 may aggregate all scores p(w|i, na). For example, if the user is ignorant about the index where the anomaly starts, Score Aggregation Component 20 may use a uniform distribution for p(i|na) and any prior p(na) 102 as follows:

p(i, na) = p(i|na) p(na) = p(na) / (n - na + 1)

For example, the prior p(na) 102 shown in Fig. 1 may be modeled as a truncated Poisson distribution or a negative binomial distribution.
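A sketch of such a truncated Poisson prior p(na); the parameter values below are illustrative assumptions, not taken from the disclosure:

```python
import math

def truncated_poisson_prior(lam, n_min, n_max):
    """Prior p(na) over block sizes: Poisson(lam) restricted to
    [n_min, n_max] and renormalized so the retained mass sums to 1."""
    pmf = {na: lam**na * math.exp(-lam) / math.factorial(na)
           for na in range(n_min, n_max + 1)}
    z = sum(pmf.values())          # mass kept after truncation
    return {na: p / z for na, p in pmf.items()}

# Illustrative choice: expect blocks of roughly 3 events, at most 6
prior = truncated_poisson_prior(lam=3.0, n_min=1, n_max=6)
assert abs(sum(prior.values()) - 1.0) < 1e-12
```

Truncation makes p(na) exactly zero outside [n_min, n_max], which is also what enables the computational savings discussed in paragraph [0022].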

[0019]

Based on p(i, na), Score Aggregation Component 20 aggregates the scores p(w|i, na) for all na ∈ [1, n] and i ∈ [1, n - na + 1], to obtain a score p(j) representing the degree of plausibility that j is related to an anomaly. The score p(j) may be referred to as the "anomaly score".

[0020]

For example, using the maximum a-posteriori (MAP) estimate, Score Aggregation Component 20 obtains the maximum as follows:

Equation (2): (i*, na*) = argmax_{(i, na)} p(w|i, na) p(i, na)

and the corresponding score p(j) is set as

Equation (3): p(j) = 1_{[i*, i* + na* - 1]}(j)

As described above, Score Aggregation Component 20 can use the maximum to determine an anomaly score for each event.

[0021]

As an alternative, Score Aggregation Component 20 can use the full posterior probability distribution:

Equation (4): p(i, na|w) ∝ p(w|i, na) p(i, na)

where the normalization constant is Z = Σ_{(i, na)} p(w|i, na) p(i, na), and then set

Equation (5): p(j) = Σ_{(i, na)} 1_{[i, i + na - 1]}(j) p(i, na|w)

where 1_A(j) denotes the indicator function, which is 1 if j ∈ A, and 0 otherwise. As described above, Score Aggregation Component 20 can use the weighted average, which is implicitly specified by the use of the proportional sign in Equation (4), to determine an anomaly score for each event.
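The two aggregation variants can be sketched end to end as follows. This assumes the ma/ka reading of Equation (1) and a uniform p(i|na); the event names and the prior are made-up illustrations, not the patented implementation:

```python
def block_score(w, i, na):
    # assumed form of Equation (1): fraction of event types exclusive to the block
    block = set(w[i - 1 : i - 1 + na])
    outside = set(w[: i - 1]) | set(w[i - 1 + na :])
    return len(block - outside) / len(block)

def aggregate(w, prior_na):
    """Score Aggregation Component sketch: posterior p(i, na|w) ∝
    p(w|i, na) p(i|na) p(na) with uniform p(i|na), then per-event scores
    via the MAP block (Equations (2)-(3)) and via the full posterior
    (Equations (4)-(5))."""
    n = len(w)
    post = {}
    for na, p_na in prior_na.items():
        starts = range(1, n - na + 2)
        p_i = 1.0 / len(starts)                     # uniform p(i | na)
        for i in starts:
            post[(i, na)] = block_score(w, i, na) * p_i * p_na
    z = sum(post.values()) or 1.0                   # normalization constant Z
    post = {k: v / z for k, v in post.items()}      # Equation (4)

    # Equations (2)-(3): indicator of the single most plausible block
    i_h, na_h = max(post, key=post.get)
    map_scores = [1.0 if i_h <= j < i_h + na_h else 0.0 for j in range(1, n + 1)]

    # Equation (5): total posterior mass of the blocks covering position j
    avg_scores = [sum(p for (i, na), p in post.items() if i <= j < i + na)
                  for j in range(1, n + 1)]
    return map_scores, avg_scores

w = ["ssh", "word", "scan", "scan", "dump", "word", "ssh"]
map_s, avg_s = aggregate(w, {3: 1.0})   # toy prior: all mass on block size 3
```

The MAP variant marks exactly one block, while the averaged variant spreads posterior mass over every block that covers an event, so events near the center of a plausible block receive the highest scores.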

[0022]

In general, the computational costs are in O(n^2), which can be computationally infeasible for large n. Therefore, we might assume that na ∈ [nmin, nmax], where nmin and nmax are the minimal and maximal numbers of anomaly events that are considered plausible, i.e., for all na outside [nmin, nmax], p(na) = 0. Therefore, if nmax - nmin ∈ O(1), then the computational costs are reduced to O(n).

[0023]
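The block counts behind these complexity statements can be checked directly; the parameter values below are illustrative assumptions:

```python
def num_blocks(n, n_min=1, n_max=None):
    """Number of (i, na) pairs evaluated for a length-n log when the
    block size na is restricted to [n_min, n_max]. Unrestricted this is
    n(n+1)/2, i.e. O(n^2); for a constant-width range it grows as O(n)."""
    n_max = n if n_max is None else min(n_max, n)
    return sum(n - na + 1 for na in range(n_min, n_max + 1))

assert num_blocks(100) == 100 * 101 // 2        # all blocks: quadratic in n
assert num_blocks(100, 2, 4) == 99 + 98 + 97    # constant range: linear in n
```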

The score aggregation component 20 weights each score p(w|i, na) based on the prior p(na) 102 and prior knowledge about the start position i conditional on na (given in the form of a prior p(i|na)).

[0024]

Finally, we explain an example with reference to Figs. 4 to 7. Block Scoring Component 10 receives, as input, the event sequence shown in Fig. 2. Here, we assume the following prior for na: p(na) = 1 if na = 3, and p(na) = 0 otherwise. Furthermore, we assume a uniform prior on p(i|na), i.e. p(i|na) = 1 / (n - na + 1). We define p(w|i, na) as in Equation (1). Furthermore, for calculating the final anomaly score p(j), we use the full posterior probability distribution as in Equation (5).

[0026]

Fig. 4 shows the output of the block scoring component for i = 1 and na = 3. Fig. 5 shows the output of the block scoring component for i = 6 and na = 3. "Attack block" shown in Figs. 4 and 5 indicates the block in which all events are assumed to be attacks (i.e. a consecutive sequence of anomalies). All outputs of the block scoring component 10 are shown on the left hand side in Fig. 6. The block scoring component 10 provides the outputs to Score Aggregation Component 20.

[0027]

Next, Score Aggregation Component 20 weights each output based on the prior distribution p(i, na) as shown in Equation (4). Here, this means placing all weight on na = 3. After normalization, Score Aggregation Component 20 obtains the posterior distribution p(i, na|w) shown on the right hand side in Fig. 6. Finally, using the weighting as in Equation (5), Score Aggregation Component 20 arrives at the anomaly scores for each event, shown in Fig. 7. If the information processing apparatus 1 considers all events with anomaly scores larger than 0.0 as attacks, then it correctly detects all true attack events and has only one false alarm (the event "Word" in Fig. 7). As described above, the present disclosure can appropriately determine anomaly scores, and tends to produce high anomaly scores for one or more events that are close together (occurring in one block) rather than uniformly spread over time.

[0028]

Specification via a Generative Model

Here, we give another example of the specification of the likelihood function p(w|i, na) (used by the Block Scoring Component 10) and the prior distribution p(i, na) (used by the Score Aggregation Component 20). Let us denote by 1_c the all-ones vector of dimension c. Furthermore, the Poisson distribution with mean parameter λ truncated to the interval [a, b] is denoted by Poisson_[a, b](λ). A summary of all parameters is given in Table 1.

Fig. 8 is a block diagram illustrating a hardware configuration example of the information processing apparatus. As shown in Fig. 8, the information processing apparatus 1 includes a network interface 1201, a processor 1202 and a memory 1203. The network interface 1201 is used to communicate with a network node. The network interface 1201 may include, for example, a network interface card (NIC) compliant with the IEEE 802.3 series.

[0030]

The processor 1202 performs the processing of the information processing apparatus 1 described with reference to the sequence diagrams and the flowchart in the above embodiments by reading software (a computer program) from the memory 1203 and executing the software. The processor 1202 may be, for example, a microprocessor, an MPU or a CPU. The processor 1202 may include a plurality of processors.

[0031]

The memory 1203 is configured by a combination of a volatile memory and a non-volatile memory. The memory 1203 may include a storage disposed apart from the processor 1202. In this case, the processor 1202 may access the memory 1203 via an unillustrated I/O interface.

[0032]

In the example in Fig. 8, the memory 1203 is used to store a software module group.

The processor 1202 can perform processing of the information processing apparatus described in the above embodiments by reading these software module groups from the memory 1203 and executing the software module groups.

[0033]

Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (Blu-ray (registered trademark) Disc), and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.). The program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.

[0034]

The foregoing description, for purposes of explanation, has been given with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications. Others skilled in the art are thereby enabled to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.

[0035]

Although the disclosure and examples have been fully described with reference to the accompanying figures, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the appended claims.

Industrial Applicability

[0036]

The present disclosure is applicable to an information processing apparatus for detecting anomalies that occur in a short time window rather than being uniformly spread over time in cyber attacks. Also, the information processing apparatus is applicable for forensic purposes.

Reference Signs List

[0037]

1 Information processing apparatus
10 Block Scoring Component
20 Score Aggregation Component
101 Event sequence data
102 Block size prior distribution
201 Anomaly probabilities