Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
SYSTEM FOR ANOMALY DETECTION ON CAN BUS DATA WITH SPARSE AND LOW RANK DECOMPOSITION OF TRANSFER ENTROPY MATRIX
Document Type and Number:
WIPO Patent Application WO/2018/067227
Kind Code:
A1
Abstract:
Described is a system for detecting cyber intrusions based on analysis of network traffic. During operation, the system performs a statistical analysis of message tuning on network traffic to produce a temporal dependency matrix representative of temporal dependency between different message types in the network traffic. The sets of temporal dependency matrices are decomposed into component matrices, where at least one component matrix represents typical properties of these matrices and at least one other component matrix represents atypical properties of the matrices. A new temporal dependency matrix is generated based on new network traffic. Finally, anomalous behavior is detected in the new network traffic by comparing component matrices of the new temporal dependency matrix, with component matrices of the temporal dependency matrices tinder normal operating conditions.

Inventors:
NI KANG-YU (US)
PAYTON DAVID (US)
Application Number:
PCT/US2017/045734
Publication Date:
April 12, 2018
Filing Date:
August 07, 2017
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
HRL LAB LLC (US)
International Classes:
H04L12/40; H04L29/06
Foreign References:
US20110276682A12011-11-10
US20100275262A12010-10-28
US20140325364A12014-10-30
KR101334017B12013-12-12
Other References:
ATEF ABDELKEFI ET AL.: "Robust Traffic Anomaly Detection with Principal Component Pursuit", CONEXT STUDENT WORKSHOP, 30 November 2010 (2010-11-30), Philadelphia, USA, XP058400927, Retrieved from the Internet
See also references of EP 3523944A4
Attorney, Agent or Firm:
TOPE-MCKAY, Cary, R. (US)
Download PDF:
Claims:
CLAIM

claimed is:

A system for detecting cyberintrosions based on ana-lysis -of network traffic:, the system compiisiiig;

one or more processors and a memory, the memory being a non- transitory computer-readable medium ha ving executable- instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of:

performing a statistical analysis of message ti mi ng on network traffic, to produce a temporal dependency matrix representative of temporal dependency between different message types in the network traffic; decomposing sets- of temporal dependency matrices into component matrices, where at least one component matrix represents typical properties of these matr ices and at least one other component matrix represents atypical properties of the matrices;

generating a new temporal dependency matrix based on new network traffic; and

detecting anomalous behavior in the new network traffic by comparing component matrices of the new temporal dependency matrix with component matrices of the temporal dependency matrices under normal operating conditions.

The system as set forth in Claim 1, wherein temporal dependency matrix representative of temporal dependency between different message types in the network traffic is a Transfer Entropy Matrix (TE ) produced using Transfer Entropy calculations.

3. The system- as set forth in. Claim 2, wherein decomposing sets of temporal dependency matrices into component matrices is performed using sparse low- rank decomposition. 4, The system as set forth in Claim 3, wherein detecting anomalous behavior in the new network traffic is performed by first peforrmng sparse low-rank

decompositi on of individual Transfer Entropy matrices (TEM) and then comparing an 4t norm of a sparse component. | | .jwith a mean of a training set norm.

5. The system as set forth in- Claim 4, wherein detecting .anomalous behavior in 'the new network traffic further comprises determinaing a norm and standard deviation from sparse low-rank decompositions of multiple TEMs and performing a t-test to determine if the new network traffic exceeds a pre-set bound.

6. The system as set forth in Claim 5, wherein apon detecting anomalous behavio in the new network traffic, further comprising an operation of intiating a reactive response.

7. The system as set forth in Claim 6, wherein the reactive response includes

causing a vehicle to initiate, a safe-mode, where at least one vehicle system is disabled. 8. The system as set forth in Claim 3 , wherein decomposing sets of temporal

dependency matrices into component matrices is performed using sparse low- rank decomposition.

The system as set forth in Claim 1 , wherein detecting anomalous behavior in the new network traffic is performed by first pefomimg sparse low-rank

decompositi on of individual Transfer Entropy matrices (TE ) and then comparing an 4 norm of a sparse component, l¾lli th a mean of a training- set norm.

10, The system as set forth in Claim 1 , wherein detecting anomalous behavior in the new network traffic further comprises determinaing a norm and standard deviation from sparse low-rank decompositions of multiple Transfer Entropy matrices (TEM) and performing a Mest to determine if the new network traffic- exceeds a pre-sei bound, 1 1. The system a set forth as Claim 1, wherein upon detecting anomalous behavio in the new network traffic, further comprising an. operation of intiating a reactive response.

12. The system as set forth in Claim 3 1 , wherein the reactive response includes causing a vehicle to initiate a safe-mode, where at lest one vehicle system is disabled.

13. A -computer program product for detecting cyber intrusions based on analysis of network traffic, the computer program product comprising:

& non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions by one or more processors, the one or more processors perform operations of:

performing a statistical analysis of message timing on network traffic to produce a temporal dependency matrix representative of temporal dependency between different message types in the network traffic; decomposing sets of temporal dependency matr ices into component matrices, where at least one component matrix represents typical properties of these matrices and a t least one other component matrix represents atypical properties of the matrices; generatin a new temporal dependency matrix based on new network traffic; and

detecting anomalous behavior in the new network traffic by comparing component matrices of tne new temporal dependency matrix. with component matrices of the temporal dependency matrices under normal operating conditions.

14. The Computer program product as. set forth in Claim 1 ÷ wherein temporal

dependency matrix representative of temporal dependency between different message types in the network traffic is a Transfer Entropy Matrix (TEM) produced using Transfer Entropy calculations.

15. The computer program product as set forth in. Claim 13, wherein decomposing sets of temporal dependency matrices into component matrices is performed using sparse low-rank decomposition.

16. The computer program product as set forth in Claim 1 , wherein detecting

anomalous behavior in the new network traffic is performed by first peforming sparse low-rank decomposition of individual Transfer Entropy matrices (TEM) and then comparing an 4t noon of a sparse component, |z5|jwith a mean of a training set norm.

17. The computer program product as set forth i Claim .13, wherein detecting

anomalous behavior i the new network traffic further comprises deiemiinaing a norm and standard deviation from sparse low-rank decompositions of multiple Transfer Entropy matrices (TEMs) and performing a t-test to determine if the new network traffic exceeds a p.re~set bound.

18. The computer program product as set forth in Claim 13, wherein upon detecting anomalous behavior in the new network traffic, further comprising instructions for causing the one or more processors to perform an operation of mtiating a reactive response.

1 . The computer prograiri product as set . forth in 'Claim 18, wherein the reactive response includes causing a vehicle to initiate a. safe-mode, where at least one vehicle system is disabled.

20. A computer implemented method for detecting cyber intrusions based on

analysis of network traffic, the method comprising an act of;

causing one or more processets to execute instructions encoded o a non-transitory computer-readable medium, such thai upon execution, the one or more processors perform operations of

performing a statistical analysis of message timing on network traffic to produce a temporal dependency matri representative of "temporal dependency between different message types in the network traffic; decomposing sets of temporal dependency matrices into component matrices, where at least one component matrix represents typical properties of these matrices and at least one other component matrix represents atypical properties of the matrices;

generating a new temporal dependency matri based on new network traffic; and

detecting anomalous behavior in the new network traffic by comparing component matrices of the new temporal dependency matrix with component matrices of the temporal dependency matrices under normal operating conditions.

21. The computer implemented method as set forth in Claim 20, wherein temporal dependency matrix .representative of temporal dependency between different message types in the network traffic is a Transfer Entropy Matrix (TEM) produced using Transfer Entropy calculations.

22. The compiiier implemented method as set forth in Claim '20, wherein decomposing sets of temporal dependency matrices into component ma trices is performed using sparse low-rank decomposition.

23. The compiiier implemented method as set forth in Claim 20, wherein detecting anomalous behavior in the new network traffic is performed by first pefoxming sparse low-rank decompositio of individual Transfer Entropy matrices (TEM) and then comparing an s norm of a sparse component, |%i:l ith a mean of a training set norm.

24. The computer implemented method as set forth in Claim 20, wherein detecting anomalous behavior in the new network traffic further - comprises deterrnina ng a norm and standard deviation from sparse low-rank decompositions -of multiple Transfer Entropy matrices (TEMs) and performing a t-test to determine if the new network traffic exceeds a pre- set bound.

25. The computer implemented method as set forth in Claim 20, wherein upon detecting anomalous behavior in the new network: traffic, further comprising an operation of intiating a reacti ve response.

26. The computer implemented method as set forth in Claim 25, wherein the

reactive response includes causing a vehicle to initiate a safe-mode, where at least one vehicle svstero is disabled.

Description:
[0001] SYSTEM FOR ANOMALY DETECTIO ON CAN BUS DATA WITH SPARSE AND LOW RANK DECOMPOSITION OF TRANSFER ENTROPY

MATRIX

[0002] GOVERNMENT RIGHTS

[0003] This invention was made with government support under U.S. Government Contract Number D l 5PC00223, entitled, "Side Channel Causal Analysis for Design of Cyber-Physical Security " The government has certain rights in the invention.

[0004] CROSS-REFERENCE TO RELATED APPLICATIONS

[0005] This is a non-provisional patent application of LIS Serial No. 62/405,716, filed on October 07, 2016, the entirety of which is hereby incorporated by reference.

[0006] BACKGROUND OF INVENTION

[0007] (1) Field of Inventio

[0008] The present invention relates to an anomaly detection system and, more specifically, to a system for detecting anomalies on CAN bits data using sparse and low rank decomposition.

[0009] (2) Description of Related Art

[00010] Anomaly detection is the process by which anomal us data can be detected to prevent attacks or intrusion of malicious* data. Man known attacks on automobiles involve some form of spoofing or altering CAN bus messages. For instance, if an attacker can cause another module to go into diagnostic mode, they can stop that module's messages from appearing on the bus, which allows the attacker to replace those messages with their own. Depending on the module, these spoof messages can potentially put passengers in serious danger. [0001 1} Attempts have been made to address this issue. For instance, the researchers in Tay ' ler proposed a. frequency-based anomaly detection method to compare current and historical packet timing (see the List of Incorporated Literature References below, Reference No. 6). Their algorithm measures inter-packet timing over a sliding window. They found that the Hamming distance of data packets was an unreliable measure of normality. The inter-packet timing statistic is reliable for detecting inserted packets, with a one-class support vector machine. However, if the normal packet is not periodic, then detection of extra insertions could be more challenging. Moreover, their method is unlikely to work for other types of ttacks, suc as changing the packet order .

[00012] Thus, a continuing need exists for a system that for anomaly detection on CAN bus data.

[000 3] SUMMARY OF INVENTION

! 14] This disclosure provides a system for detecting cyber intrusions based on analysis of network traffic. The system comprises one or more processors and a memory. The memory is a non- transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform several operations. During operation, the system performs a statistical analysis of message timing on network traffic to produce a temporal dependency matrix representative of temporal dependency betwee different message types in. the network traffic. The sets of temporal dependency matrices are decomposed into component matrices., where at least one component raatTi represents typical properties of these matrices and at least one other component matrix represents atypical properties of the matrices. A new temporal dependency matrix is generated based on new network traffic. Fhiallv, anomalous behavior is detected in the new network. traffic by comparing component .matrice -of the. new temporal dependency matrix with component matrices of the temporal dependency matrices trader normal, operating conditions,

[00015) In another asepct, the temporal dependency matrix representative of

temporal dependency between different message types in the network traffic is a Transfer Entropy Matrix (TEM) produced using Transfer Entropy calculations.

[00016} Further, decomposing sets of temporal dependency matrices into component- matrices is performed using sparse low-rank decomposition,

[00017] In another aspect, detecting anomalous behavior in th new network traffic is performed b first petorming sparse low-rank decomposition, of individual Transfer Entropy matrices (TEM ) and then comparing an. t norm of a sparse component, |¾| j wiihi a mea of a training set norm,

[00018] Further, detecting anomalous behavior in the new network traffic further comprises determinamg a norm and standard deviation from sparse low-rank, decompositions of multiple TEMs and performing a t-test to determine if the new network traffic exceeds a pre-set hound,

[00 19] In another asepct, upon detecting anomalous behavior in. the new network traffic, the system satiates a reactive response. The reactive response includes causing a vehicle to initiate a safe-mode, where at least one vehicle system is disabled.

[00020] Finally, the present invention also includes a computer program product and a computer implemented method. The computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors, such that upon execution of the infractions, the one .-or more processors perform the operations listed herein. Alternatively, the computer implemented method includes an act of causing a conipiiter to execute such instructions and perfomi the resulting operations.

[00021 ] BRIEF DESCRIPTION OF THE DRAWINGS

[00022] The objects, features and advantages of the present invention will be

apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:

[00023] F1Q. 1 is a block diagram depicting the components of a system according to various embodiments of the present invention;

[00024] FIG. 2 is an illustration of a computer program product embodyin an aspect of the present invention;

[00025] FIG. 3 A is a flowchart illustrating a process of .anomaly detection according to some embodiments of the present invention; [00026] FIG. 3B is an illustration depicting an example Transfer Entropy Matrix (TEM);

[00027] FIG. 4A is an illustration depicting an example TE data matrix: [00028] FIG. 4B is a grap depicting exampl results for 18 evaluations of corrupted, data;

[00029] FIG. 5 A is a graph depicting example results where all messages of a given type are delayed by a small-time delta; [00030] FIG. SB is a graph depicting example results where the first message to follow a given message type are removed;

[00031 ) FIG. SB is a graph depicting example results where the second message to follow a given message type are removed; and

[00032] FIG. 6 is a flow chart depicting a process flow for the system in various embodiments.

100033] DETAILED DESCRIPTION

[00034] The present invention relates to an anomaly detection system and, more specifically, to a system for detecting anomalies on CAN bus data using spaise and low rank decomposition. The following description is presented to enable one of ordinary skill in the art. to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may he applied to a wide range of aspects. Thus, the present invention is not intended to be limi ted to the aspects presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein,

[00035] In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the presen invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known stmctures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. [00036] The reader's attention is directed to all papers and docoments which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. AH the features disclosed in this specification, (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, eqat valerit or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

[00037] Furtberrfiore, airy element in a claim that does not explicitl state "means for 5 performing a specified function, or "step for" performing a specific function, is not to be interpreted as a "mea s" or "step" clause as specified i 35 U.S.C. Section 1 12, Paragraph 6. In particular, the use of "step of or "act of in the claims herein is not intended to invoke the provisions of 35 U.S.C. 1 12, Paragraph 6,

[00038] Before describing the invention in detail, first a list of cited references is provided. Next, a description of the various principal aspects of the present invention is provided. Subsequently, an introduction provides the reader with a general understanding of the present invention. Finally, specific details of various embodiment of the present invention are provided to give an

understanding of the specific aspects.

[00039] (! ) List of Incorporated Literature References

[00040] The following references are cited throughout this application. For clarity and convenience, the references are listed herein as a central, resource for the reader. The following references are hereby incorporated by reference as though fully set . forth herein. The references are cited in the applic tion by referring to the corresponding literature reference number, as follows:

1 , E. Candes, X. Li, Y- Ma, and J, Wright, "Robust Principal Component.

Analysis? 5* , IEEE ΡΑΜΪ 201 1.

2. M. Mardani, G. Mateos, and G. B, Giaruiakis, ''Unveiling anomalies in large-scale networks via sparsity and low rank," in Proc. of 45th Asilomar Conf. on Signal, Systems and Computers, Pacific Grove, CA, Nov. 20.1 1 , pp. 403-407.

3. Kang-Yu Ni and T sai-Ching Lu, "Information Dynamic Spectrum

Characterizes System Instability Toward Critical Transitions," EPS Data

Science, 3:28, 2014,

4. Thomas Schreiber, "Measuring Information Transfer", Phys. Rev. Lett.

85(2): 461-464, 2000.

5. M Staniek, Lehnertz, " Symbolic Transfer Entropy". Phys. Rev. Lett, 100(15), 2008,

6. Taylor, Adrian, Nathalie j ' apkowk , and Sylvara Leblauc. "Frequency- based anomaly detection for the automotive CAN bos", WCICSS 20.15,

7. T. Zhou and D. Tao. "GoDec: Randomized low-rank & sparse matrix decomposition in noisy case." ICML 201 I,

[000 1 ] (2) Principal Aspects

[00042] Various embodiments of the in vention include three "principal 5 ' aspects.

The first is a system for anomaly detection. The system is typically in the form of a .computer system operating software or in the form of "hard-coded" .instruction set. This system, may be incorporated into a wide variety of devices that provide different functionalities. The second, principal aspect is a method, typically in the form of software, operated using a data processing system (computer). The third principal aspect is a computer program product. The computer program product generally represents computer-readable instructions stored on a noa-fraasitorv ' computer-readfehte medium sack as an. optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape. Other * non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories. These aspects will be described in more detail below.

[00043] A block diagram depicting an example of a system (i.e., computer system

100) of the present invention is provided in FIG. ί . The computer system 100 is configured to perform calculations, processes, operations, and/or functions associated .with a program or algorithm. In one aspect, certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform specific actions and exhibit specific behavior, such as described herein.

[00044] The computer system 100 may include an address data bus 102 that is

configured to communicate information. Additionally, one or more data processing units, such as a processor 104 (or processors), are coupled with the address/data bus 102. The processor 04 is configured to process information and instructions. In an aspect, the processor 1 4 is a microprocessor.

Alternatively, the processor 104 may be a different type of processor such as a parallel processor, application-specific integrated circuit (ASIC), programmable logic array (PLA), complex programmable logic device (CPLD), or a field programmable gate array (FPGA),

[00045] The computer system 1 0 is configured to utilize one or more data storage units. The computer system 100 may include a volatile memory unit 106 (e.g., random access . -memory ("RAM"), static RAM, dynamic RAM, etc) coupled wit the address/data bus 102, wherein, a volatile memor wait 106 is configured to store information .and ' instructions for the processor 104. The computer system 100 further ma include a non~ volatile memory unit 108 (e.g., read-only memory ("ROM"), programmable ROM ("PROM"), erasable programmable

RDM ("EPROM"), electrically erasable programmable ROM "EEPROM"), flash memory , etc.) coupled with the address/data bus 102, wherein the nonvolatile memory unit 108 is configured, to store static information, and instructions for the processor 104. Alternatively, the computer system 100 may execute instructions retrieved from an online data storage unit such as in

"Cloud" computing, in an aspect the computer system 100 also may include one or more interfaces, such as an interface- 1 10, coupled with the address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc) communication technology. in one aspect, the computer system 100 may include an input device 1 12 coupled with the address/data bus 102, wherein the input device 1 12 is configured to communicate information and command selections to the processor 100. In accordance with one aspect, the input device 1 12 is analphanumeric input device, such as a keyboard, that ma include alphanumeric and/or function keys. Alternatively, the inpu device 1 12 may be an input device other than an alphanumeric input device. In an aspect, the computer system 100 may include a cursor control device 114 coupled with the

address/data bus 1 2, wherein the cursor control device i 14 is configured to communicate user input information and/or command selections to the processor 100, I an aspect, the cursor control device 1 14 is implemented ' using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen. The foregoing notw thstanding, in an aspect, the cursor control device .1 14 is directed and/or activated via input from the input device 1 12, such as in response to the use of special keys and key sequence commands associated with the input device 1 12. in an alternative aspect, the cursor control device 1 14 is configured to be directed or guided by voice commands.

[00047] hi an aspect, the computer system 100 further-may include cue or more optional computer usable data storage devices, such as a storage device 1 16, coupled with the address/data bus 102. The storage device 1 16 is configured to store information and/or computer executable instructions. In one aspect, the storage device 116 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive ("HDD"), floppy diskette, compact disk read only memory ("CD-ROM"), digital versatile disk ("DVD"}). Pursuant to one aspect, a display device 118 is coupled with the address/data bus 102, wherein the display device 1 18 is configured to display video and/or graphics, in a aspect, the display device 1 18 may include a cathode ray tube ("CRT"h liquid crystal display ("LCD"), field emission display ("FED"), plasma display, or an other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.

[00048] The computer system 100 presented herein is an example comparing

environment in accordanc with an aspect. However, the ^i raiting example of the computer system 100 is not strictly limited to being a computer system.

For example, an aspect provides that the computer system 100 represents a type of data processing analysis that may be used in accordance with various aspects described herein. Moreover, other computing systems may also be

implemented, indeed, the spirit and scope of the present technology is not limited to my single data processing emtftonroent Thus, in. an aspect one or more operations of vario us aspects of the present technology are controlled or implemented using computer-executable instructions, -s ch as program modules, being executed by a computer. In one implementation, -such program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, an aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote compute -storage media including memory-storage devices.

£00049] An illustrative diagram of a computer program product (i.e., storage device) embodying the present invention is depicted in FIG. 2, The computer program product is depicted as floppy disk 200 or an optical disk 202 such as a CD or DVD, However, as mentioned previously, the computer program product generally represents computer-readable instructions stored on any compatible non-transitory computer-readable medium. The term "instructions" as used with respect to this invention generally indicates set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules. Non-limiting examples of "instruction" include computer program code (source or object code) and "hard-coded" electronics (Le. computer operations coded into a computer chip). The ¾&truction" is stored on any .non-transitory computer-readable medium, such as in the memory of a computer or on a floppy disk, a CD-ROM, and a flash drive. In either event, the instructions are encoded on a non-transitory computer-readable medium.

[00050] (3) Introduction [00051] Many fctio n attacks on systems.,, such as automobile, systetns, involve some form of spoofing or altering CAN hiss messages. For instance, if an attacker can cause another module to go into diagnostic mode, they can stop that module's messages from appearing on the bus, which allows the attacker to replace those messages with their own. Depending on the module, these spoof messages can potentially put passengers in serious danger. The system described herein defends against attacks like this by looking at the timing of the bus messages. Any attempt to spoof messages will have a difficult time avoiding alteration to the relative timing between messages. If the system can detect such timing alterations, then the system can be applied to detect a wide range of different attacks.

[00052] Thus, this disclosure provides a anomaly detection method that is suitable for detecting subtle changes in the timing of CAN Bus messages. The method starts by computing the transfer entropy (TE) matrix for message events. TE matrices (TE. s) are used to capture the causal relationship between the occurrence of one message event and another. The sparse and low rank (SLR) decompositi n technique is then applied to lean) the suhspace of normal TE matrices. To determine whether the given observed data is normal or abnormal, the system calculates the TEMs of the timing of CAN Bus messages. Then, each TEM is decomposed into a component in the norma! TEM subspace and a sparse residual component. The spar se residual components are used to determine whether the given data is normal or abnormal

[00053] As can be appreciated by those skilled in th art, there are a number of

applications in which the system and method described herein can be implemented. For example, the system can be incorporated into vehicles (e.g., car, airplane, drones, etc.) for cybersecurity to address the problem of cyber intrusion detection for vehicles, and in particular automobiles. The growing media attention to hacked cars has made it very clear that many commerc al automobiles are potentially vulnerable to life-threatening eyher-atiaeks. " ibis •system and method addresses the problem of detecting the presence of such attacks before they can cause serious harm. However, while the analysis described in this disclosure is focused on analysis of CAN bus data, which is particular to automobiles, there are similar data busses in other vehicles such as aircraft to which these same analysis techniques could apply. Gi ven these similarities, the potential application of this invention could range anywhere from providing a cyber intrusion monitor for automobiles all the way to cyber intrusion monitoring for commercial aircraf

[00054] (4) Specific Details of Various Embodiments

[00055] This disclosure provides a new anomaly detection method that is suitable for detecting subtle changes in the timing of CAN Bus messages. Many known attacks on automobiles involve some form of spoofing or altering CA bus messages.- For instance, i -an -attacker can cause another module to go into diagnostic mode, they can stop that module's messages from appearing on the bus and then they can replace those messages with their own. Depending on the module, these spoof messages can potentially put passengers in serious danger. The present invention defends against attacks .like this by looking at the timing of the bus messages based on an assumption that any attempt to spoof messages will have a difficult time avoiding alteration to the relati ve timing between messages. Thus, if the system or method can detect such timing alterations, then the system can detect a wide range of ' different attacks.

[00056] As shown in FIG. 3 A, the system starts by computing 300 the transfer

entropy (TE) matrix (i.e., temporal dependency matrix) for message events of trusted messages to establish a normal baseline (i.e., which can be designated as a normal TE matrix under a normal operating conditions). An example of such a TE matrix 3 0 is- shown in FIG, 3B. The Transfer Entropy analysis of a CAN ■Bus message time series provides a measure of time dependency between different messag types. .In the TE Matrix 320 in FIG, 3B {29x29 pixels, because t here are 29 message types used), each column represents one of the several CAN Bus message types and the same is true for each row. Thus, the

TE Matrix 320 is a temporal dependency matrix. Bright squares show where a message type in a gi ven column tends to be preceded by a message type from the corresponding row. [00057] Here, each message type is treated as a spike event, regardless of the

message content. The time series of each message type is a spike time series where one represents an event of that message type. The matrix 320 captures the TE relationship between these message types. Essentially, if there is some causa! relationship between the occurrence of one message type and another , that should be revealed in the TE matrix 320. With a suitable time- series of

CA Bus messages, the system ends up computi ng a time-series of TE matrices. These matrices capture the fact tha message timing among message types and the underlying causal structure can change as a vehicle transitions between different chiving modes.

[00058] Referring again to FIG. 3 A, the next step is to compute 302 a sparse low- rank SLR decomposition of the multiple TE matrices captured over a series of outings (e.g., vehicle outings, etc.). Assuming these are captured from a vehicle that has not been compromised, the SLR decomposition provides one matrix (the S ow-rank component) that is a compact representation of what the range- of normal TE matrices should look like (under normal operation), and another matrix (the sparse component) that describes the aspects of the TE matrix that cannot be characterized by the low-rank component (abnormal properties). This is the residual or error of the matrix approximation. [00059] As noted above, using sparse low-rank decomposition, th system, perfomid a best fit for known good patterns ' to a low-rank -matrix combined with a sparse matrix- When new data is fit to this same low-rank matrix the .magnitude of the sparse matrix residual provides a measure of anomaly as shown in FIGs, 4A and 4B. FIG, 4A depicts a TE data matrix 400: stacking each 29x29 TEM (an example TEM matrix is shown in FIG. 3B) as a column vector and showing the first 1000 columns. The y-axis represents the entry of the 29x29 TEM, while the x-axis represents time stamps , and the intensity bar represents the value of the TEM entries. FIG, 4B is a graph illustrating the result for 18 evaluations of the corrupted data. The error bars correspond to 18 differen training -sets 404 and corresponding test sets 406, where the test data has a single message type removed. In this example, it is observed that the error bars 406 of corrupted data are above the error bars 404 of normal data, indicating the ability of anomaly detection. Note that the x-axis represents each experiment, while the y- a is represents residual (the Li norm of the sparse component, which is described later).

[00060] Referring again to FIG, 3 A , once the residual or error of the matrix

approximation is determined, the system can the look at any new TE matrix and measure 304 its similarity to the normal TE matrices by how small this residual is. If the residual is too large, then the system designates 306 that there is some anomaly in the C A bus message traffic. A residual is too large if it is above the error bar of the normal TEM. Further details regarding the system and process are provided below.

[0006.1 ] (4.1 ) Transfer Entropy

[00062] Transfe entropy (TE) (e.g., see Literature Reference No. 4} is a directional measure of information flow between pairs of time series, such as power co sum tion, thermal status, and electromagnetic emissions side-channels, as well as CAN bus messages. The present system uses TE to detect intrusions by identifying, anomalies in the causa! relationships between system modules (see Literature Reference No. 3 for a description of such system modules}.

[00063] The system depicts the interactions among sensors as a graph with nodes and causal links where each node is a module. TE quantifies how much information is transferred from the current state into the future .from, one node to another and therefore is directional. The TE .measure from node x f to node x is defined as:

where τ is the time delay/shift of information transport. The system finds the optimal time shift that maximizes time-delay dependent correlations . , which allows the system to determine the strength of causal links. The nodes in this application are the message types.

[00064] Suspicious activities is inferred by monitoring the changes in causal

.structure of CAN Bus messages and side channels. For instance, a modul might start processing messages that ft had previousl ignored, or start sending messages it shouldn't; this can make die causal structure deviate from normal. Using the scale-invariant TE measure (the scale-invariant TE measure is described i Literature Reference No. 5), the svsiem can identify causal chances over a broad range of different time frames, represented by TE matrices.

[00065] (4.2) Sparse and Low Rank (SLR) Decomposition

[00066] S LR. decomposition is a set of provahly optimal and efficient mathematical techniques for identifying and decomposing the low-variation structure from higli-dimeimonai raw data (see Literature Reference No. 1 for a detailed discussion, of SLR), It is also known as Robust PGA because it is designed to handle grossly corrupted data, rather than assuming the data noise is

independently and identically distributed Gaussian. Suppos thai there is a stationary camera that is viewing a scene for the task of intruder or foreground detection. If there are many raw image frames obtained over the coarse of a day, each frame ca be stacked as a column vector of the data matrix X, which can be decomposed to X « I - 5. where L is the low-rank matrix that represents the background scene and S is the sparse matrix that represents the sparse foreground and deviations from the convex Lambertiaii model, e.g., shadows and reflec tion, that cannot be modeled as Gaussia noise. The low-rank matrix I is extremely low-tank relative to the image size, the length of the columns in X. It has been shown that the low-rank, and sparse components of the data matrix X can he exactly decomposed by the following convex optimization, Principal Component Pursuit (PCP) (see Literature Reference No. 1): m!aj J!* + xuhleet to ' — L - 5 where the nuclear norm fj¾. takes the sum of the singular values of and the # I -norm ' $S\\ t is the absolute sum of the entries of S. The mmhrnzer I provides a background template for stationary camera. The minimizer S contains the detected foreground.

[00067] (4.3) Learning Normal States of TEM with SLR

[00068] To apply SLR, the transfer entropy matrices are organized into a data matrix X„ in which each column is a TEM. The fast SLR method is applied (the fast. SLR method is described i Literature Reference No. 7), which decompose the matrix .A ' into three niairices: a low-rank component ! that represents the norma! operation state structure, a sparse component S to tolerate a small number of the data .matrix eu.iri.es that deviates Ironi the normal operation (abnormal

operation), and a Gaussian .noise component ( This is expressed in the

following:

X■ ■ L 4- S - G, s bfeel o ra:nk(.l)≤ r, j]S| s < k n 1 X T , where r is a small integer, is the -i no m that is the number of nonzero entries, ' is the sparsity percentage, 2 is the size of the TEM, and F is the total number of TEMs, which are constructed by calculating the TEMs over sliding windows (these sliding windows are typically overlapping, but do not have to). This iwimilaiion can be solved by the following optimization problem, (see;

Literature Reference No, ?):

{L I ~ : argmfel Ji— I— 5|| , subject ©- r k{L}≤f f jfSjj^ k x » 2 · Γ ,

where !Hj r is the Frobenius norm. The Frobenius norm is matrix norm defined

as || tij ? where <t ti are the y fc entry otA. The optimization problem is solved by solving the following subproblems until convergence i J. t subject to ratik ≤ r (

s bject to j|S||§≤k x n 2 X T .

The first subprobiem. with the sparse matrix 5' fixed, is the PC A problem. The classical method to solve this is to perform S.VD decomposition, sort the singular values in the descending order, and then keep the first r singular values and set the rest to ^ero. This gives the exact low rank matrix solution, but since SVD requires a cubic complexity, the algorithm is impractical for large daiasets. The fast low-rank approximation method i Zhou et al. (i.e., Literature Reference No, 7) uses bilateral random projeciicms (BRP), To approximate the low-rank matrix with ra k r for a given matrix X(it would b X— S^fer the first subproblem), the first ste is to compute ¾ ~-XA % and F≥ ~ I T 4 ¾ where A t and A t are m x r and n x r random matrices, .respectively. The rank-?- approximation The computation is fast since * ¾ is a r x r matrix for a small r. The second subproblem is solved by hard- thresholding.

[00070] (4.4) Anomaly Detection Method with SLR of TEM

[00071] The anomaly detection method of this disclosure is described in the

.following stages and steps:

[00072] (4.4.1) Training:

[00073] Training requires a set of inputs. The inputs for normal data consists of the timing of the C AN bus messages for each message type. The training process proceeds as follows:

Construct TEMs over time, an nXnXT tensor T (e.g. 29X29X500 when there are 29 message types and the total number of s liding windo ws for training is 500), Although not limited thereto, these are the numbers in the example reduction to practice section. Each TEM is obtained by calculating the TE values of airwise message timing over a time window of duration Δ : .

Generate the n 2 XT training, data matrix X by stacking TEMs as column vectors. For example, the dimension of the data matrix is 841 X 14751, Decompose the data matrix X with SLR,:

M ~ 1 - 5 4· G subject to rank (£) < r, |5| § kx-m XT and get the basis vector set ¾ (the basis vectors with the largest k singular values) from the extracted low rank matrix L. 4. For each column vector % of th data.matrix if, decompose it into three components: «— % 4- % -f %. where % is the projection οί onto the '¾ vector space, and is ' the sparse component of x in the orthogonal space ¾ and x s is the residual

5. Compute the mean μ and standard deviation a of the l' x norm of the sparse component, !%i l5 over ail column vectors %.

The training process described above results in a set of outputs, the outputs being basis vector space ¾, error mean μ and standard deviation of the sparse errors under normal conditions.

[00074] (4.4.2) Detection

[00075] Once the system is set, it is operable to monitor messages and detect

anomalies in the messages. For example, dining operation, inputs into the system include observed data that comprises the timing of the CAN bus messages for each message type. Detection also uses the basis vector space ¾, error mean μ and standard deviation a of the sparse errors obtained from training. The are at. Least, two methods for implementing the system on input .data. Method One is directed to batch processing. For example, a vehicle may have a targe batch of messages in die syste and periodically performs a diagnostic function where a batch of messages are input through the system for anomaly detection. Alternatively, Method Two is directed to stream processing. For example, the vehicle or other system simply processes the messages as they stream into or through the vehicle or other system. Each of these methods are described in further detail below.

[00076] (4.4.2J) Method One (for batch processing);

1. Construct TE s over time, an «X¾X tensor where each TE is obtained by calculating the TE values of pairwise message timing over a time window of duration Α ε , The variable t indicates the number of sliding windows (each window with dotation & t ).

2. Reshape ¥ ¾a¾s into an n ' M matrix Z by stacking TEMs as column

vectors,

3. For each column vector z of the testing matrix Z, decompose ft into three components; z = ¾ - ¾ - τ ΰ , where ¾ is the projection of onto the ¾. vector space, and∑ s is the sparse component of 2 in the orthogonal space 0 ¾ and ¾ is the residual

4. Compute the mean and standard deviation i¾ st of the 4 norm of the sparse component, 11% II 3., over all column vectors s.

5. Perform a statistical test such as a t-test to determine if the mea

and standard deviation i¾ est is above a pre-deterrnined error threshold relative to the mean μ and standard deviation , in which case the system indicates die input data contains abnormal activities.

The output in this process are the mean μ and standard de viation a of the sparse errors, and an anomaly indicator indicating that the input data, contains abnormal activities. (4.4.2,2) Method Two (for stream processing);

. At each time interval & tt calculate the TE values of pairwise message timing over that time window to obtain an Xn 1ΈΜ,

2. Reshape this nXn TEM into an n 2 vector

3. Decompose vector z into three components; 2— z - ¾ 4- ¾ where ¾ is the projection of 2 onto the ¾ vector space obtained from training, and ¾ is the sparse component of in the orthogonal space ¾ and ¾ is the res dual.

A. Compute the t norm of the sparse component, !%! t . 5. Evaluate whether t¾! s > μ -f , where is a pre-deteraiined threshold that defines a level outside of normal sufficient to trigger an anomaly warning.

The output in this method is an anomaly warning each time |%l i > μ ρ®.

[00078] Method One evaluates multiple TEMs to get statistics of the sparse error for anomaly detection, while Method Two evaluates an individual TEM to give anomaly warning at each time step. The anomaly indicator of Method One is less likely to produce false alarms because it is looking at a longer window of the data; while Method Two provides qoicker feedback since it only evaloates a single TEM, so it is better suited to detect momentary hacks that interfere with CAN Bus activity for a ver short time. Furthermore, after an anomaly detection or warning is activated, by identifying the entries in ¾ that are large in magnitude, the system can zero in on the particular message types that are corrupted. This may provide some insight in to the type of hack that is being attempted and into the particul r vehicle subsystems that the hacker is attempting to influence.

[00079] (4 , .3) Response

[00080] Once an attack is detected, there are many options for response. A key

concern is that the response not be more harmful than the hack itself. Therefore; it is essential that the response to a detected hack does not interfere with the normal safe operation of the vehicle (or other system in which the invention is implemented), especially since the vehicle is likely to be in motion at the time the attack is detected. This invention proposes the following non- limiting set of alternative reactive responses that may be initiated in reaction to detecting anomalous data or an intrusion.: Upon anomaly detection, the system provides a. warning light or sound to the dri ver so the y are a ware imm ediately mat there co uld be a problem. This would then leave the driver to decide appropriate actions given the current -drcumstaxH.es. For instance, if the driver is on a freeway traveling at high speeds, they may decide to exit the freeway as soon as possible and stick to surface roads until they can get to a nearby dealer. Upon anomaly detection, the system provides an electronic message to the auto manufacturer. Ma ufacturers may be able to collect hack information from multiple vehicles and thereby determine a potential threat of a multi-vehicle attack.

Upon anomaly detection., the system triggers or initiates a "safe-mode" for the vehicle that temporarily disables all advanced automated driving and automated parking accessories that have the potential to control brakes, steering, or other vital functions, it would also cut off all nonessential access to the CAN Bus from components such as the

infotainment system and any CAN Bus p!ug-ίη devices. Essentially, in such a. safe-mode, the vehicle would be drivable but would be temporarily missing most of the added luxury features that enhance the driving experience. Resetting the vehicle out of the "safe-mode" woul d require either a visit to the dealer or a remote software refresh.

[0008.1 ] (4.5) Reduction to Practice

[00082] To demonstrate that the system as described herein works effectively, the anomaly detection method was performed using CAN Bus dat collected from a vehicle. The data consists of the time stamps of the message activities, which are the message types, along with the contents in the message types. Time series were constructed of each message type (there are 29 message types and the length of the time series is 15251 ), using the collective time stamps of the message activities. Each time series is a sequence of 0's and Vs, where 0 indicates no activity of the message type at ' the time stamp and 1 indicates an activity of th message type at the time stamp. The tithe window interval A t for TEM calculat on is 500, and there are 1 75 Ϊ windows. From this data, a subset (Γ ~ 50Ο) of time series were extracted that were unaltered for training and another subset of time series {£=500) was manually altered in various ways to emulate various types of attacks. For instance and as shown in FIG. 5A, the normal data was altered by delaying the occurrence of one particular message type in the time series. This might mimic the effects of an attacker who

purposely blocks a particular message whenever it occurs on the bus but then re~ inserts a message of the same type with, different con tents, in another data set and as shown in FIG, SB, the occurrences of the first message were removed to follow a particular message type. In. a third dat set and as shown in FIG. SC, the second message was removed to follow a particular message type. The results of these tests (with Method One) are shown in FIGs. 5 A through 5C. in each plot. 18 results are shown from 18 sets of training and testing data. The error bars show magnitude of the residual within one standard deviation. The normal data error bars 502 are for the normal data and the evaluation error bars 500 are for the modified data. A true anomal detection occurs when the evaluation error bar 500 (those on the top portion of the graph in this example) is above the normal data error bar 502, It was observed that most of the two error bars are well separated, indicating the ability of detect anomaly. Thus, using sparse low-rank decomposition, the system can identify anomalies even when relatively subtle changes are made to the CAN Bus time series. Thus, in summary and as shown in FIG. 6, the system operates in some embodiments by performing a statistical analysis 600 of message timing on network traffic to produce a temporal dependency matrix representative of temporal dependency between different message types in the network traffic;

decomposing sets of temporal dependency matrices into component matrices 602; generating a new temporal dependency matrix based on new network traffic 604; and detecting anomalous behavior in the ne network traffic 606. A variety of actions can he irritated if anomaliotis behavior is detected. For example, the system can initate a reactive response 608, such as causing a vehicle to initiate a safe-mode, where at lest one vehicle system is disabled. Finally, while this invention has been described in terms of several embodiments, one of ordinary skill in the art will readily recognize that the invention may have other applications in other environments, it should be noted that many embodiments and implementations are possible. Further, the i ilowing claims are In no way intended t limit the scope of the present invention to the specific embodiments described above. In addition, any recitation of "means for" is intended to evoke a means-plus-ftmction reading of an element and a claim, whereas, any elements that do not specifically use the recitation "means for 5 , are not intended to he read as means-plus-ftinction elements, even if the claim otherwise includes the word "means". Further, while particular method steps have been recited in a particular order, the method steps may occur in any desired order and fall within the scope of the present invention.