

Title:
COMPUTER INTRUSION DETECTION
Document Type and Number:
WIPO Patent Application WO/2014/094034
Kind Code:
A1
Abstract:
Methods of generating a configuration for a block-based neural network (BBNN) structure, and intrusion detection systems, are disclosed. A flow collector is configured to collect packets destined for a computing device and produce an input vector for each detected flow of packets. A BBNN structure is developed and configured to label each input vector as either normal or an intrusion. An FPGA-based hardware system is provided in which the BBNN structure is embedded to present a flexible structure for identification of the time-varying nature of intruders as well as new and unknown intrusions. A database is configured to store the labelled input vectors. A computing device is configured to execute a real-time detection process, wherein a pre-training process is configured to process the labelled vectors to generate a configuration for the BBNN structure. A detection device operates with stable convergence rates under a flexible structure in which a high detection rate is maintained with a low false alarm rate.

Inventors:
JIANG FRANK (AU)
FRATER MICHAEL (AU)
Application Number:
PCT/AU2013/001430
Publication Date:
June 26, 2014
Filing Date:
December 04, 2013
Assignee:
NEWSOUTH INNOVATIONS PTY LTD (AU)
International Classes:
G06F21/76
Other References:
TRAN, Q A ET AL.: "A Real-time NetFlow-based Intrusion Detection System with Improved BBNN and High-Frequency Field Programmable Gate Arrays", IEEE 11TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, June 2012 (2012-06-01), pages 201-208
Attorney, Agent or Firm:
SPRUSON & FERGUSON (Sydney, New South Wales 2001, AU)
Claims:
1. An intrusion detection system comprising: a flow collector configured to collect packets destined for a computing device and produce an input vector for each detected flow of packets; a block-based neural network structure developed and configured to label each input vector as either normal or an intrusion; an FPGA-based hardware system in which said block-based neural network structure is embedded to present a flexible structure for identification of the time-varying nature of intruders and new and unknown intrusions; a database configured to store labelled input vectors; a computing device configured to execute a real-time detection process, wherein a pre-training process is configured to process the labelled input vectors to generate a configuration for the block-based neural network structure; and a detection device that operates with stable convergence rates under a flexible structure in which a high detection rate is maintained with a low false alarm rate.

2. The system as claimed in claim 1, wherein the high detection rate reaches around 99%, and the low false positive rate (false alarm rate) is lower than 5%.

3. A method of generating a configuration for a block-based neural network structure, the method comprising: a. initializing a position and a velocity of each component of each particle in a particle swarm, the weighted structure encoded in a 2-D array representation as the configuration of a block-based neural network structure; b. developing new movements of particles to solve a stagnation problem; c. determining whether a termination condition is met by the particle swarm, and if so, returning the configuration represented by a global best position of all the particles in the swarm; otherwise d. updating the velocity of each component of each particle using a velocity update rule; e. updating the position of each component of each particle using a position update rule and the corresponding updated component velocity; f. mutating at least one particle using a mutation rule; g. reproducing the swarm using the mutated particles; h. updating the global best position of all the particles in the reproduced swarm; and i. repeating steps a. to g. until the termination condition is met; wherein the velocity update rule uses as a target best position for the particle a best position for a particle other than the current particle being updated.

4. The method as claimed in claim 3, wherein, in respect of the stagnation problem, "stagnation" means prematurely converged.

Description:
COMPUTER INTRUSION DETECTION

Related Application

[0001] The present patent application claims the benefit of the earlier filing date of Australian Provisional Patent Application No. 2012905589 filed 20 December 2012 of the same title, which is incorporated herein by reference in its entirety.

Technical Field

[0002] The present disclosure relates to computer security and in particular to detecting intrusion into computing devices using reconfigurable hardware.

Background

[0003] Large-scale distributed computing infrastructure is presenting new challenges to conventional intrusion detection, prevention, and self-healing security systems. For example, the past few years have seen over 600 million dollars in losses accrue annually across Australian businesses due to computer security incidents. As it is impossible to completely prevent computer attacks, intrusion detection systems (IDSs) are designed to minimize the damage caused by potential attackers.

[0004] Traditionally, IDSs include network-based and host-based implementations. They can also be roughly divided into two primary categories: rate-based detection and content-based detection (i.e., signature- and anomaly-based). A rate-based IDS blocks suspicious traffic whenever a threshold, e.g. of the number of packets or connections in a given time interval, is reached. A content-based IDS is triggered by anomalous behaviour of the network (e.g., port scanning), by matching a signature of the behaviour with that of known anomalies, such as Sasser, Blaster, MyDoom, etc. Content-based IDSs are generally classified into the following categories: (1) statistical features and correlation analysis (signal processing approaches); (2) artificial intelligence-based, rule-based, and agent-based approaches; and (3) data mining-based approaches. However, in addition to the very limited public IDS datasets, current IDS systems in use primarily face the following two difficulties: (1) a comparatively high error rate of detection, especially for "new" (unknown or evolving) attacks; and (2) a low speed of detection, especially in large-scale real-time environments, such as core networks.

[0005] A purely software-implemented IDS can adapt itself effectively to new attacks.

However, its detection speed is limited. Many modern intrusion detection processes run too slowly in software to have any practical value, and cannot meet the requirements in a large-scale distributed environment in future computing scenarios such as cloud computing, where enormous amounts of data are merged into a centralized infrastructure. In contrast, a purely hardware-implemented IDS will provide a relatively high detection speed in comparison with existing software IDSs, but is not able to detect new attacks.

Summary

[0006] Aspects of the present invention seek to overcome, or at least ameliorate, one or more of the above disadvantages of existing arrangements.

[0007] Disclosed is an intrusion detection system that is hardware-cored with adaptive capability to cope with both high rates of incoming data and evolving attacks. The disclosed IDS uses a high-sampling-rate-enabled field-programmable gate array (FPGA), on which is implemented an adaptive Block-Based Neural Network (BBNN) engine to perform attack-related pattern recognition. The internal block structure of the BBNN matches well with the reconfigurable logic gate array structure of FPGA hardware. The BBNN is software-driven to allow dynamic reconfiguration. The configuration of the BBNN is determined during an offline training stage by an optimisation process referred to as Spiral Particle Swarm Optimisation with Wavelet Mutation (SPSOWM). The disclosed IDS has higher speed than purely software implementations and the ability to adapt to unknown and/or evolving attacks with lower false alarm rates and higher detection rates in a real-time environment than conventional approaches.

[0008] In accordance with an aspect of the invention, there is provided an intrusion detection system. The system comprises: a flow collector configured to collect packets destined for a computing device and produce an input vector for each detected flow of packets; a block-based neural network structure developed and configured to label each input vector as either normal or an intrusion; an FPGA-based hardware system in which the block-based neural network structure is embedded to present a flexible structure for identification of the time-varying nature of intruders as well as new and unknown intrusions; a database configured to store labelled input vectors; a computing device configured to execute a real-time detection process, wherein a pre-training process is configured to process the labelled input vectors to generate a configuration for the block-based neural network structure; and a detection device that operates with stable convergence rates under a flexible structure in which a high detection rate is maintained with a low false alarm rate.

[0009] In accordance with an aspect of the invention, there is provided a method of generating a configuration for a block-based neural network structure. The method comprises: initializing a position and a velocity of each component of each particle in a particle swarm, the weighted structure encoded in a 2-D array representation as the configuration of a block-based neural network structure; developing new movements of particles to solve a stagnation problem; determining whether a termination condition is met by the particle swarm, and if so, returning the configuration represented by a global best position of all the particles in the swarm; otherwise updating the velocity of each component of each particle using a velocity update rule; updating the position of each component of each particle using a position update rule and the corresponding updated component velocity; mutating at least one particle using a mutation rule; reproducing the swarm using the mutated particles; updating the global best position of all the particles in the reproduced swarm; and repeating the foregoing steps until the termination condition is met; wherein the velocity update rule uses as a target best position for the particle a best position for a particle other than the current particle being updated.

Brief Description of Drawings

[00010] At least one embodiment of the invention is described below with reference to the drawings, in which:

[00011] Fig. 1 is a block diagram of an intrusion detection system according to one embodiment;

[00012] Fig. 2 is a block diagram illustrating a block-based neural network used in the system of Fig. 1;

[00013] Fig. 3 is a block diagram illustrating the four possible structures of each block in the block-based neural network of Fig. 2;

[00014] Figs. 4A and 4B form a schematic block diagram of a general purpose computing device on which the training process in the system of Fig. 1 may be implemented;

[00015] Fig. 5 is a flow chart illustrating a particle swarm optimisation (PSO) algorithm with mutation that may be used to implement the training process 140 of Fig. 1; and

[00016] Figs. 6A and 6B illustrate the difference between typical particle paths in standard PSO and SPSOWM.

Description of Embodiments

[00017] Fig. 1 is a block diagram of an intrusion detection system (IDS) 100 according to one embodiment. Packets arrive at the flow collector 110 from one or more routers (not shown). The flow collector 110 analyses the packets to produce an input vector for each flow in the manner described below. In the seven-layer OSI model of computer networking, 'packet' strictly refers to a data unit at layer 3. A 'flow' is a sequence of packets sent from a particular source to a particular unicast, anycast, or multicast destination that the source desires to label as a flow. A flow comprises all packets in a specific transport connection or media stream.

During the training phase, the automatic switch 120 is in the lower position. Each vector has a corresponding label (intrusion or normal) with which it is stored in the database 150. The labels on the vectors may be obtained from the input packets or may be manually applied by an operator during the training phase. A training process 140, described in detail below, processes the labelled vectors in the database 150 to generate a configuration for a block-based neural network (BBNN) 130, implemented in hardware. In one implementation, the BBNN 130 is implemented using a field programmable gate array (FPGA). In some implementations, the training process 140 and the database 150 are implemented on the same physical device. In other implementations, they are implemented on separate physical devices.

[00018] During the operation phase, the switch 120 is in the upper position, so that incoming vectors pass into the BBNN 130, which processes each vector to generate a label (intrusion or normal) for the corresponding flow. The generated labels are passed to subsequent stages for possible action, such as general detection or re-examination. The labelled vectors are also stored in the database 150 for subsequent training phases.

[00019] The flow collector 110 is implemented in hardware to enable it to operate at the required real-time rate; processing time is at the millisecond level for acceptable amounts of input data. The processing carried out by the flow collector 110 depends on the protocol followed by the input packets. In one implementation, the input packets are NetFlow packets, which include among their fields Source and Destination IP addresses, Source and Destination Ports, and a timestamp. The flow collector 110 parses the fields of each input packet and generates, for each flow, a vector comprising the four features shown in Table 1:

Table 1: Flow Features used by the IDS 100
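As a rough illustration of the flow collector's role, the per-flow aggregation can be sketched in software as below. The flow key uses the NetFlow fields named in the text; the four aggregate features are illustrative placeholders only, since the contents of Table 1 are not reproduced in this text.

```python
from collections import defaultdict

def collect_flows(packets):
    """Group packets into flows keyed by (src IP, dst IP, src port, dst port)
    and aggregate a per-flow feature vector. The four features below are
    illustrative placeholders, not the actual features of Table 1."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])
        flows[key].append(pkt)
    vectors = {}
    for key, pkts in flows.items():
        times = [p["timestamp"] for p in pkts]
        vectors[key] = [
            len(pkts),                           # packet count
            sum(p["bytes"] for p in pkts),       # total bytes
            max(times) - min(times),             # flow duration
            len({p["src_port"] for p in pkts}),  # distinct source ports
        ]
    return vectors
```

In the actual system this aggregation runs in hardware at line rate; the sketch only shows the data flow from packets to one vector per flow.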

[00020] The flow collector 110 scales each feature value x to the range [-1, 1] using the following equation:

x' = 2(x - x_min) / (x_max - x_min) - 1    (1)

[00021] where x_max and x_min are the maximum and minimum values of the feature x over the whole dataset.
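A sketch of this scaling, assuming the standard min-max normalisation into [-1, 1]:

```python
def scale_feature(x, x_min, x_max):
    """Min-max scale a raw feature value x into the range [-1, 1],
    where x_min and x_max are taken over the whole dataset."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0
```

For example, with x_min = 0 and x_max = 10, the values 0, 5, and 10 map to -1.0, 0.0, and 1.0 respectively.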

[00022] The flow collector 110 then forwards the scaled vector to the database 150 or the BBNN 130 for storage or processing.

[00023] In one implementation, the BBNN 130 is implemented using the Cyclone III FPGA Starter Board manufactured by the Altera Corporation. This model has a 50 MHz clock, 32 Mbytes of SRAM, 4 LEDs, and a USB programming interface.

[00024] Fig. 2 is a block diagram illustrating a BBNN 200 used as the BBNN 130 in the system 100 of Fig. 1. The BBNN 200 is a two-dimensional array of interconnected blocks, e.g. 220. The input 210 to the BBNN 200 is a vector x of n components, or features, x_j. The BBNN 200 comprises m rows or layers, each comprising n blocks, so the blocks 220 are labelled as B_ij, where i runs from 1 to m and j runs from 1 to n. The output 230 of the BBNN 200 is a vector y of n components, each labelled as y_j. Each output y_j is a function of a summation of weighted inputs and a bias. As shown in Fig. 2, the rightmost output node has been arbitrarily chosen as the output of the network, and all the redundant output nodes are marked as *. The rightmost node output is 1 (intrusion) or 0 (no intrusion).

[00025] Each individual block 220 is a basic signal processing unit (neuron) having four variable input/output nodes labelled 1 (left), 2 (right), 3 (top), and 4 (bottom). Node 1 is always an input node and node 2 is always an output node, so signal flow is always from left to right, whereas nodes 3 and 4 can be either input or output, so there are four possible configurations for each block 220. For example, in Fig. 2, the top node 3 and the bottom node 4 of the block B_22 (220) are both outputs.

[00026] Fig. 3 is a block diagram illustrating the four possible configurations 310, 320, 330, and 340 of each block 220 in the BBNN 200 of Fig. 2. In the configuration 310 (type 1), the top node 3 is an input and the bottom node 4 is an output. In the configuration 320 (type 2), the top node 3 and the bottom node 4 are both inputs. In the configuration 330 (type 3), the top node 3 and the bottom node 4 are both outputs. Finally, in the configuration 340 (type 4), the top node 3 is an output and the bottom node 4 is an input. The inputs to a block are labelled as u_i, where i is 1, 3, or 4, whereas the outputs are labelled as v_j, where j is 2, 3, or 4. The output at each output node is computed as the summation of the block's weighted inputs and a bias term b_j.

v_j = Σ_{i=1..4} w_ij u_i + b_j    (2)

where w_ij is the weight from input node i to output node j. Because node 1 is always an input and node 2 is always an output, the weights w_11, w_21, w_31, w_41, w_22, w_23, w_24, w_33, and w_44 are identically zero. A type 1 configuration 310 has four non-zero weights (w_12, w_14, w_32, and w_34) and two non-zero biases (b_2 and b_4). A type 2 configuration 320 has three non-zero weights (w_12, w_32, and w_42) and one non-zero bias (b_2). A type 3 configuration 330 has three non-zero weights (w_12, w_13, and w_14) and three non-zero biases (b_2, b_3, and b_4). A type 4 configuration 340 has four non-zero weights (w_12, w_13, w_42, and w_43) and two non-zero biases (b_2 and b_3). Therefore, each block 220 may be completely characterised by two bits (indicating one of the four configurations) and at most six scalar parameters.
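Equation (2) can be sketched per block as follows. The node-role tables for each configuration type follow the description above; the dict-based representation of weights and biases is illustrative only, not the hardware encoding.

```python
def block_output(config_type, inputs, weights, biases):
    """Compute the outputs of one BBNN block per equation (2):
    v_j = sum over input nodes i of w_ij * u_i, plus bias b_j.
    inputs: dict node -> value for the block's input nodes;
    weights: dict (i, j) -> w_ij; biases: dict j -> b_j.
    Node 1 is always an input and node 2 always an output; the roles of
    nodes 3 and 4 depend on the configuration type (1 to 4)."""
    IN_NODES = {1: (1, 3), 2: (1, 3, 4), 3: (1,), 4: (1, 4)}
    OUT_NODES = {1: (2, 4), 2: (2,), 3: (2, 3, 4), 4: (2, 3)}
    outputs = {}
    for j in OUT_NODES[config_type]:
        outputs[j] = biases.get(j, 0.0) + sum(
            weights.get((i, j), 0.0) * inputs[i] for i in IN_NODES[config_type]
        )
    return outputs
```

For instance, a type 3 block has node 1 as its only input and fans out to nodes 2, 3, and 4, matching the three non-zero weights listed above.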

[00027] The inputs x_j to the BBNN 200 may be identified with the top inputs u_3 of the first-layer blocks B_1j, which must therefore be of type 1 or type 2. The connection directions between the blocks are flexible: in Fig. 2 the connections always run from left to right, but some connections can run from right to left, so the structural constraints are minimised. The outputs y_j of the BBNN 200 may be identified with the bottom outputs v_4 of the m-th layer blocks B_mj, which must therefore be of type 1 or type 3. The BBNN 130 may therefore be characterised by 10mn parameters, each representable by at least four bits, and 2mn "structure" bits.
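One consistent reading of the 10mn figure is 7 possible weights (w_12, w_13, w_14, w_32, w_34, w_42, w_43) plus 3 possible biases (b_2, b_3, b_4) per block; the sketch below computes the resulting encoding budget under that assumption.

```python
def bbnn_encoding_size(m, n, bits_per_param=4):
    """Bits needed to encode an m-by-n BBNN: 2 structure bits per block
    (one of four configurations) plus 10 scalar parameters per block
    (7 possible weights and 3 possible biases), each in at least 4 bits."""
    structure_bits = 2 * m * n
    parameter_bits = 10 * m * n * bits_per_param
    return structure_bits + parameter_bits
```

A 2-by-4 network at 4 bits per parameter would thus need 16 structure bits plus 320 parameter bits, 336 bits in total.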

[00028] Figs. 4A and 4B collectively form a schematic block diagram of a general purpose computing device 400, upon which the training process 140 and/or the database 150 may be implemented.

[00029] As seen in Fig. 4A, the computing device 400 is formed by a computer module 401, input devices such as a keyboard 402, a pointer device 403, and a microphone 480, and output devices including a printer 415, a display device 414, and loudspeakers 417. An external Modulator-Demodulator (Modem) transceiver device 416 may be used by the computer module 401 for communicating to and from a communications network 420 via a connection 421. The network 420 may be a wide-area network (WAN), such as the Internet or a private WAN.

Where the connection 421 is a telephone line, the modem 416 may be a "dial-up" modem. Alternatively, where the connection 421 is a high capacity (e.g. cable) connection, the modem 416 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 420.

[00030] The computer module 401 typically includes at least one processor unit 405, and a memory unit 406 formed, for example, from semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The module 401 also includes a number of input/output (I/O) interfaces including an audio-video interface 407 that couples to the video display 414, loudspeakers 417 and microphone 480, an I/O interface 413 for the keyboard 402 and pointer device 403, and an interface 408 for the external modem 416 and printer 415. In some implementations, the modem 416 may be incorporated within the computer module 401, for example within the interface 408. The computer module 401 also has a local network interface 411 which, via a connection 423, permits coupling of the computing device 400 to a local computer network 422, known as a Local Area Network (LAN). As also illustrated, the local network 422 may also couple to the wide network 420 via a connection 424, which would typically include a so-called "firewall" device or device of similar functionality. The interface 411 may be formed by an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement.

[00031 ] The interfaces 408 and 413 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). In particular, the computing device 400 may interface with the BBNN 130 over such a USB interface 408.

[00032] Storage devices 409 are provided and typically include a hard disk drive (HDD) 410. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. A reader 412 is typically provided to interface with an external non-volatile source of data. A portable computer readable storage device 425, such as an optical disk (e.g. CD-ROM, DVD), USB-RAM, or floppy disk, may then be used as an appropriate source of data to the system 400. In particular, the database 150 may be implemented in the storage device 409 or the portable computer readable storage device 425.

[00033] The components 405 to 413 of the computer module 401 typically communicate via an interconnected bus 404 and in a manner which results in a conventional mode of operation of the computing device 400 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or computing devices evolved therefrom.

[00034] The training process 140 may be implemented using the computing device 400 as one or more software programs 433 executable within the computing device 400. In particular, with reference to Fig. 4B, the steps of the training process 140 are effected by instructions 431 in the software 433 that are carried out within the computing device 400. The software instructions 431 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the training process 140 and a second part and the corresponding code modules manage a user interface between the first part and the user.

[00035] The software 433 is generally loaded into the computing device 400 from a computer readable medium, and is then typically stored in the HDD 410, as illustrated in Fig. 4A, or the memory 406, after which the software 433 can be executed by the computing device 400. In some instances, the programs 433 may be supplied to the user encoded on one or more storage media 425 and read via the corresponding reader 412 prior to storage in the memory 410 or 406. Computer readable storage media refers to any non-transitory tangible storage medium that participates in providing instructions and/or data to the computing device 400 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, semiconductor memory, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the computer module 401. A computer readable storage medium having such software or computer program recorded on it is a computer program product. The use of such a computer program product in the computer module 401 effects an apparatus for training the BBNN 130.

[00036] Alternatively the software 433 may be read by the computing device 400 from the networks 420 or 422 or loaded into the computing device 400 from other computer readable media. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software programs, instructions and/or data to the computer module 401 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

[00037] The second part of the software 433 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 414. Through manipulation of typically the keyboard 402 and the mouse 403, a user of the computing device 400 and the software 433 may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the software associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 417 and user voice commands input via the microphone 480.

[00038] Fig. 4B is a detailed schematic block diagram of the processor 405 and a "memory" 434. The memory 434 represents a logical aggregation of all the memory devices (including the HDD 410 and semiconductor memory 406) that can be accessed by the computer module 401 in Fig. 4A.

[00039] When the computer module 401 is initially powered up, a power-on self-test (POST) program 450 executes. The POST program 450 is typically stored in a ROM 449 of the semiconductor memory 406. A program permanently stored in a hardware device such as the ROM 449 is sometimes referred to as firmware. The POST program 450 examines hardware within the computer module 401 to ensure proper functioning, and typically checks the processor 405, the memory (409, 406), and a basic input-output system software (BIOS) module 451, also typically stored in the ROM 449, for correct operation. Once the POST program 450 has run successfully, the BIOS 451 activates the hard disk drive 410. Activation of the hard disk drive 410 causes a bootstrap loader program 452 that is resident on the hard disk drive 410 to execute via the processor 405. This loads an operating system 453 into the RAM memory 406, upon which the operating system 453 commences operation. The operating system 453 is a system level application, executable by the processor 405, to fulfill various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

[00040] The operating system 453 manages the memory (409, 406) in order to ensure that each process or application running on the computer module 401 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 400 must be used properly so that each process can run effectively. Accordingly, the aggregated memory 434 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computing device 400 and how such is used.

[00041] The processor 405 includes a number of functional modules including a control unit 439, an arithmetic logic unit (ALU) 440, and a local or internal memory 448, sometimes called a cache memory. The cache memory 448 typically includes a number of storage registers 444-446 in a register section. One or more internal buses 441 functionally interconnect these functional modules. The processor 405 typically also has one or more interfaces 442 for communicating with external devices via the system bus 404, using a connection 418.

[00042] The program 433 includes a sequence of instructions 431 that may include conditional branch and loop instructions. The program 433 may also include data 432 which is used in execution of the program 433. The instructions 431 and the data 432 are stored in memory locations 428-430 and 435-437 respectively. Depending upon the relative size of the instructions 431 and the memory locations 428-430, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 430.

Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 428-429.

[00043] In general, the processor 405 is given a set of instructions which are executed therein. The processor 405 then waits for a subsequent input, to which it reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 402, 403, data received from an external source across one of the networks 420, 422, data retrieved from one of the storage devices 406, 409 or data retrieved from a storage medium 425 inserted into the corresponding reader 412. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 434.

[00044] The training process 140 uses input variables 454, which are stored in the memory 434 in corresponding memory locations 455-457. The training process 140 produces output variables 461, which are stored in the memory 434 in corresponding memory locations 462-464.

Intermediate variables may be stored in memory locations 459, 460, 466 and 467.

[00045] The register section 444-446, the arithmetic logic unit (ALU) 440, and the control unit 439 of the processor 405 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 433. Each fetch, decode, and execute cycle comprises:

(a) a fetch operation, which fetches or reads an instruction 431 from a memory location 428;

(b) a decode operation in which the control unit 439 determines which instruction has been fetched; and

(c) an execute operation in which the control unit 439 and/or the ALU 440 execute the instruction.

[00046] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 439 stores or writes a value to a memory location 432.

[00047] Each step or sub-process in the training process 140 is associated with one or more segments of the program 433, and is performed by the register section 444-447, the ALU 440, and the control unit 439 in the processor 405 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 433.

[00048] The training process 140 may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of the training process 140. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.

[00049] The training process 140 comprises two tasks: structure optimisation and parameter optimisation. Structure optimisation corresponds to the determination of the type of each block 220 in the BBNN 130. Parameter optimisation corresponds to determining the weights and biases in each block 220. In one implementation, the training process 140 is a genetic algorithm (GA). The genetic algorithm searches for the global optimum of structure and parameters for the BBNN 130, within permitted combinations of structure and parameters, based on a given fitness function of the structure and parameters. The fitness function seeks to achieve an optimised detection rate while the false positive rate is kept at an acceptable value. The GA is used to find a maximum value of the fitness function, which occurs at an "optimum" set of structures and parameters of the BBNN 130. However, as far as the model-free neural network structure is concerned, the relative slowness of a training process 140 implemented by a GA constrains the ability of such an IDS 100 to cope with a real-time application. In other implementations, the training process 140 is a Particle Swarm Optimisation (PSO) algorithm.
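One way to read the constrained objective just described is as a penalised fitness. The penalty form below is an assumption for illustration, not the patent's actual function; the 5% threshold echoes the figure in claim 2.

```python
def fitness(detection_rate, false_alarm_rate, max_false_alarm=0.05):
    """Illustrative fitness for BBNN training: reward the detection rate,
    but penalise solutions whose false-alarm rate exceeds an acceptable
    threshold. The linear penalty is a hypothetical choice."""
    if false_alarm_rate > max_false_alarm:
        return detection_rate - (false_alarm_rate - max_false_alarm)
    return detection_rate
```

Under this reading, two candidate configurations with equal detection rates are ranked by whether they respect the false-alarm constraint.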

[00050] To describe PSO algorithms, the following notation is used. X(t) denotes a particle swarm containing P particles at the t-th iteration. Each particle is represented by a vector x^p(t), where p = 1, ..., P, containing κ components denoted as x_j^p(t) for j = 1, ..., κ. The value of each component x_j^p(t) of x^p(t) represents the position of the particle indexed by p along the j-th coordinate axis. Each particle position x^p(t) represents a candidate optimising solution for the predetermined objective function denoted as f(x). An objective function may also be evaluated on the entire particle swarm X(t), based on the values of the objective function f(x^p(t)) for p = 1, ..., P. The objective function depicts the desired intrusion detection rate performance for the BBNN, given a condition that the false alarm rate is less than an acceptable value. The aim of a PSO algorithm is to optimise the objective function f(X(t)) by iterative refinement of the particle positions x^p(t). In the case where the objective function f is a fitness function, "optimisation" means maximisation. In the case where the objective function f is a cost function, "optimisation" means minimisation.

[00051] In PSO algorithms, the positions x^p(t) of the particles in the swarm X(t) are first initialised and the iteration number t is set to zero. The objective function f(X(t)) is evaluated on the swarm X(t). If a termination condition, typically dependent on the value of the objective function f(X(t)), is not met, the swarm X(t) is updated from iteration t-1 to iteration t by updating the velocity v^p(t) of each particle according to a velocity update rule, and then updating the position x^p(t) of each particle according to a position update rule using the updated velocity v^p(t). A new swarm is then "reproduced" from the updated particle positions x^p(t). A "best" position x̂^p of each particle and a global "best" position x̂ of the reproduced swarm are then updated. The algorithm then returns to test the termination condition on the reproduced swarm; if satisfied, the global "best" position x̂ is returned; otherwise, a further update is performed.
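The loop of the preceding paragraph can be sketched as follows. This is an illustrative real-valued PSO minimising a toy cost function, not the binary BBNN training described later; the sphere objective, swarm size, and all constants are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

P, K, T = 20, 4, 200                      # particles, components, iterations (assumed)
W, PHI1, PHI2, VMAX = 0.7, 1.5, 1.5, 0.5  # inertia, accelerations, clip (assumed)

def cost(x):                              # toy cost function to be minimised
    return np.sum(x * x, axis=-1)

x = rng.uniform(-1.0, 1.0, (P, K))        # initialise particle positions
v = np.zeros((P, K))                      # initialise particle velocities
best_x = x.copy()                         # per-particle "best" positions
g = best_x[np.argmin(cost(best_x))].copy()  # global "best" position

for t in range(1, T + 1):
    r1, r2 = rng.random((P, K)), rng.random((P, K))
    # velocity update towards personal and global bests, then clipping
    v = W * v + PHI1 * r1 * (best_x - x) + PHI2 * r2 * (g - x)
    v = np.clip(v, -VMAX, VMAX)
    x = x + v                             # position update (reproduce new swarm)
    improved = cost(x) < cost(best_x)     # update per-particle bests
    best_x[improved] = x[improved]
    g = best_x[np.argmin(cost(best_x))].copy()  # update global best

print(float(cost(g)))
```

After the iteration budget is exhausted, the global best position is returned with a cost close to the true optimum of zero.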

[00052] In PSO algorithms with mutation, the swarm X(t) is "mutated" according to a mutation rule after the updating steps, and the new swarm is reproduced from the mutated swarm.

[00053] Fig. 5 is a flow chart illustrating a PSO-based method 500 that may be used to implement the training process 140 of Fig. 1. In the implementation in which the training process 140 is implemented as software in the computing device 400, the method 500 is carried out by the processor 405 in concert with the software 433.

[00054] In the method 500, the particle positions x^p(t) are binary-valued, and the binary values of each component x_j^p(t) of each particle position encode the structure and parameters of the BBNN 200. Each block 220 in the BBNN 200 is encoded by six parameters and two structure bits, as mentioned above. Each parameter is encoded by B bits. Since there are mn blocks 220 in the BBNN 200, mn(6B+2) bits are required in total to encode the structure and parameters of the BBNN 200. Each particle in the swarm X(t) therefore comprises κ = mn(6B+2) components that are grouped into mn groups of 6B+2, each encoding a block 220 in the BBNN 200. Within each group, 6 subgroups of B bits each encode the six parameters of the block 220, while the remaining two bits encode the structure of the block 220. (For a block of type 2, e.g. 340 in Fig. 3, only four parameters are needed, so the fifth and sixth subgroups are ignored.) The two structure bits indicate the directions of the vertical signals indexed by 3 and 4 in Fig. 3: 0 indicates downward flow, while 1 indicates upward flow.
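The grouping of particle bits described above can be sketched as a decoding routine. The grid size m, n, the bit width B, and the fixed-point mapping of each B-bit subgroup into a weight range of [-1, 1] are illustrative assumptions; the specification does not state how the parameter bits are scaled.

```python
import numpy as np

m, n, B = 2, 2, 4                         # grid size and bits per parameter (assumed)

def decode(bits: np.ndarray):
    """Split a binary particle into (weights, structure_bits) per block:
    mn groups of 6B+2 bits, six B-bit parameters plus two structure bits."""
    assert bits.size == m * n * (6 * B + 2)
    blocks = []
    for group in bits.reshape(m * n, 6 * B + 2):
        params = group[: 6 * B].reshape(6, B)
        # interpret each B-bit subgroup as an unsigned integer, then map
        # it into [-1, 1] (assumed fixed-point convention)
        ints = params @ (1 << np.arange(B - 1, -1, -1))
        weights = 2.0 * ints / (2 ** B - 1) - 1.0
        structure = tuple(group[6 * B:])  # flow directions of signals 3 and 4
        blocks.append((weights, structure))
    return blocks

rng = np.random.default_rng(1)
particle = rng.integers(0, 2, m * n * (6 * B + 2))
blocks = decode(particle)
print(len(blocks), len(blocks[0][0]))     # → 4 6
```

Each of the mn = 4 blocks receives six weights and a two-bit structure tuple; for a type-2 block the caller would simply ignore the fifth and sixth weights.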

[00055] The method 500 starts at step 510, where the iteration number t is initialised to zero. The position x^p(t) and velocity v^p(t) of each particle in the swarm X(t) are initialised to suitable initial values, such as zero, at the following step 520. Step 520 also initialises the "best" position x̂^p of each particle to the corresponding initialised value x^p(0), and the global "best" position x̂ of the entire swarm X(t) to the position x̂^p that optimises the objective function f(x̂^p) over all p.

[00056] Step 530 follows, at which the objective function f(X(t)) is evaluated on the particle swarm X(t) for the current iteration. Step 540 then tests whether the termination condition is met. For example, the termination condition might involve evaluating f(X(t)); if the predefined ratio is achieved or if the predefined iteration number is reached, the process terminates. If so ("Y"), the method 500 concludes at step 590, in which a best global position or velocity is returned. Otherwise ("N"), the method 500 proceeds to step 545, at which the iteration number t is incremented. At the next step 550, the velocity v^p(t) of each particle is updated according to a velocity update rule that typically involves one or more of the current best position x̂^p, the current global best position x̂, and the current position x^p(t-1). As part of the velocity update rule, step 550 "clips" the updated velocity v^p(t) to remain in the range [-v_max, v_max] for a predetermined maximum velocity v_max, where v_max is typically set to 10% - 20% of the dynamic range of the corresponding component.
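The clipping step can be illustrated as follows; the 15% figure is an arbitrary choice within the 10% - 20% band mentioned above, and the per-component ranges are hypothetical.

```python
import numpy as np

# Hypothetical dynamic range of three components of the search space.
component_min = np.array([0.0, -1.0, 0.0])
component_max = np.array([1.0,  1.0, 5.0])

# v_max per component: 15% of that component's dynamic range (assumed).
v_max = 0.15 * (component_max - component_min)

def clip_velocity(v: np.ndarray) -> np.ndarray:
    """Clip each velocity component into [-v_max, v_max]."""
    return np.clip(v, -v_max, v_max)

v = np.array([0.9, -0.05, 2.0])
print(clip_velocity(v))
```

The first and third components exceed their limits and are clipped to 0.15 and 0.75 respectively, while the second passes through unchanged.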

[00057] The method 500 then proceeds to step 555, at which the position x^p(t) of each particle is updated using a position update rule involving the updated velocity v^p(t). The velocity update rule used at step 550 and the position update rule used at step 555 are described in detail below.

[00058] At the following step 560, the position x^p(t) of each particle in the swarm X(t) is mutated according to a mutation rule, applied in the normal way. The mutation rule used at step 560 is described in detail below. The method 500 then at step 570 reproduces a new swarm X(t) from the mutated swarm.

[00059] Step 580 follows, at which the best position x̂^p of each particle and the global best position x̂ of the reproduced swarm X(t) are updated. Step 580 updates x̂^p to the current particle position x^p(t) if the objective function f(x^p(t)) is "more optimal" than the objective function f(x̂^p) of the current best position x̂^p, and updates x̂ to one of the best positions x̂^p if the objective function f(x̂^p) for some p is "more optimal" than the objective function f(x̂). That is, in the case where f is a fitness function to be maximised,

x̂^p = { x^p(t), if f(x^p(t)) > f(x̂^p)
      { x̂^p,    otherwise                                            (3)

x̂ = { x̂^p, if f(x̂^p) > f(x̂) for some p ∈ [1, P]
    { x̂,   otherwise                                                 (4)

[00060] In the case where f is a cost function to be minimised, equations (3) and (4) are still used, but with the inequalities reversed.

[00061] After step 580, the method 500 returns to step 530 for another iteration, if needed.

[00062] One example of a velocity update rule is given by:

v_j^p(t) = k( w·v_j^p(t-1) + φ1·r1·(x̂_j^p - x_j^p(t-1)) + φ2·r2·(x̂_j - x_j^p(t-1)) )    (5)

[00063] where:

• w is an inertia constant;

• φ1 and φ2 are acceleration constants;

• r1 and r2 are random numbers between 0 and 1; and

• k is a constriction factor that is derived from stability analysis of equation (5) as

k = 2 / | 2 - φ - √(φ² - 4φ) |                                       (6)

[00064] where φ = φ1 + φ2; φ1 and φ2 are chosen such that φ > 4, to ensure the method 500 does not converge prematurely.
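A numeric sketch of this velocity update follows. The closed form used for the constriction factor, k = 2/|2 - φ - √(φ² - 4φ)|, is the expression commonly obtained from stability analysis for φ > 4 and should be treated as an assumption here, as should the constant values.

```python
import math
import numpy as np

phi1, phi2 = 2.05, 2.05            # acceleration constants (assumed)
phi = phi1 + phi2                  # phi > 4 avoids premature convergence
# Constriction factor from stability analysis (assumed standard form).
k = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
w = 0.9                            # inertia constant (assumed)

def velocity_update(v_prev, x_prev, p_best, g_best, rng):
    """One application of equation (5) to all components of one particle."""
    r1, r2 = rng.random(v_prev.shape), rng.random(v_prev.shape)
    return k * (w * v_prev
                + phi1 * r1 * (p_best - x_prev)
                + phi2 * r2 * (g_best - x_prev))

rng = np.random.default_rng(2)
v = velocity_update(np.zeros(3), np.zeros(3), np.ones(3), np.ones(3), rng)
print(round(k, 4))                 # → 0.7298
```

With both best positions ahead of the particle, every updated velocity component is non-negative, pulling the particle toward them.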

[00065] A suitable selection of the inertia constant w provides a balance between global and local explorations of the objective function. In one implementation of the velocity update rule embodied in equation (9), the inertia constant w is dynamically set to decrease over time as a function of t/T, where T is the maximum number of iterations.

[00066] In standard PSO algorithms, the position update rule is given by

x_j^p(t) = x_j^p(t-1) + v_j^p(t)                                     (8)

[00067] However, such a position update rule is not suitable for binary-valued particle positions x_j^p(t), as in the method 500.

[00068] In one implementation, the PSO-based algorithm used by the training process 140 is a spiral PSO with wavelet mutation (SPSOWM) algorithm. According to SPSOWM, the velocity update rule used at step 550 of the method 500 is a variant of equation (5) given by:

v_j^p(t) = k( w·v_j^p(t-1) + φ1·r1·(x̂_j^((p+t) mod P)(t) - x_j^p(t-1)) + φ2·r2·(x̂_j - x_j^p(t-1)) )    (9)

[00069] The difference between equation (9) and the velocity update rule of equation (5) is that the "target" best position for the particle indexed by p is not fixed as the best position x̂^p for that particle, as in equation (5), but varies over the iterations t through the best positions of the other particles in the swarm, i.e. x̂^((p+t) mod P)(t), where P is the number of particles in the swarm. (The modulo-P operation ensures that the index of the "target" best position for the particle indexed by p, which changes with every iteration t, is always in the range [0, P-1].)
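The rotating "target" index can be sketched as follows, assuming the index takes the form (p + t) mod P:

```python
P = 5  # number of particles in the swarm (assumed)

def target_index(p: int, t: int) -> int:
    """Index of the particle whose best position particle p chases at
    iteration t; the modulo keeps it in the range [0, P-1]."""
    return (p + t) % P

# Over P consecutive iterations, particle 2 cycles through the best
# positions of every particle in the swarm, including its own.
print([target_index(2, t) for t in range(P)])  # → [2, 3, 4, 0, 1]
```

Because the target changes every iteration, no particle fixates on a single attractor, which is what produces the spiral-like paths discussed below.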

[00070] In conventional PSO algorithms, the velocity and position update rules embodied in equations (5) and (8) cause each particle to tend to move directly towards its own best position x̂^p and the global best position x̂, often leading to stagnation as all particles catch up with the global best position x̂ and stop moving before the true optimal value is reached.

[00071] In SPSOWM, by contrast, the velocity update rule used at step 550 of the method 500 is defined by equation (9) such that each particle tends to follow a spiral path towards the best position of a different particle, as well as the global best position x̂.

[00072] Figs. 6A and 6B illustrate the difference between typical particle paths in conventional PSO (Fig. 6A) and SPSOWM (Fig. 6B), as particles (e.g. 610 and 660) tend to move towards the global best position (600 and 650) in straight-line and spiral paths (620 and 670) respectively.

[00073] According to SPSOWM, the position update rule used at step 555 of the method 500 is defined as

x_j^p(t) = { 1, if r < Sigmoid(v_j^p(t))
           { 0, otherwise                                            (10)

[00074] where r is a random number between 0 and 1, and the sigmoid function is defined as

Sigmoid(x) = 1 / (1 + e^(-x))                                        (11)
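A sketch of a sigmoid-based binary position update, assuming the usual binary-PSO convention that each component becomes 1 with probability Sigmoid(v):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def position_update(v: np.ndarray, rng) -> np.ndarray:
    """Set each binary component to 1 with probability Sigmoid(v_j)."""
    r = rng.random(v.shape)              # r uniform in [0, 1)
    return (r < sigmoid(v)).astype(int)

rng = np.random.default_rng(3)
v = np.array([-10.0, 0.0, 10.0])
print(position_update(v, rng))
```

A strongly negative velocity almost always yields 0, a strongly positive one almost always yields 1, and a zero velocity leaves the bit equally likely to be 0 or 1.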

[00075] Also, according to the SPSOWM, the position update of step 555 is cancelled if a particle that is not the global best particle reaches the position x̂ of the global best particle as a result of the position update applied at step 555. That is,

x^p(t) → x^p(t-1) if (x^p(t-1) ≠ x̂) and (x^p(t) = x̂)                (12)

[00076] Equation (12) prevents a particle whose position x^p(t) is different from the global best particle position x̂ from ever reaching the global best particle position x̂.
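The cancellation rule of equation (12) can be sketched as follows (the function name and array representation are illustrative):

```python
import numpy as np

def apply_cancellation(x_new: np.ndarray, x_old: np.ndarray,
                       g_best: np.ndarray) -> np.ndarray:
    """Revert the position update if a particle that was not the global
    best would land exactly on the global best position."""
    if not np.array_equal(x_old, g_best) and np.array_equal(x_new, g_best):
        return x_old.copy()              # cancel: keep the previous position
    return x_new

g = np.array([1, 0, 1])                  # global best position
x_old = np.array([0, 0, 1])              # particle was not the global best
x_new = np.array([1, 0, 1])              # update would reach the global best
print(apply_cancellation(x_new, x_old, g))  # → [0 0 1]
```

The update is cancelled, so the particle keeps its previous position instead of collapsing onto the global best, preserving swarm diversity.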

[00077] The mutation rule used at step 560 of the method 500 is applied dependent on a predetermined probability of mutation, p_m, between 0 and 1. For each component x_j^p(t) of the position x^p(t) of each particle at each iteration t, a random number evenly distributed between 0 and 1 is generated. If the generated random number is less than the predetermined probability of mutation p_m, the mutation rule is applied to that component x_j^p(t).

[00078] According to SPSOWM, the mutation rule used at step 560 of the method 500 is a wavelet mutation rule. The wavelet mutation rule operates on the position x_j^p(t) of the j-th component of the p-th particle as follows: a random number evenly distributed between 0 and 1 is generated. If the generated random number is less than a predetermined function of x_j^p(t), the component x_j^p(t) is set to 1; otherwise, the component x_j^p(t) is set to 0. The predetermined function is Sigmoid(wm_j^p(t)), where the sigmoid function is defined as in equation (11),

[00079] and its argument wm_j^p(t) is defined as

[00080] where σ is a mutation parameter. If σ is positive, approaching 1, the mutated position x_j^p(t) of the particle will tend to the maximum value of 1. Conversely, when σ is negative, approaching -1, the mutated position x_j^p(t) of the particle will tend to the minimum value of 0. A larger value of |σ| gives a larger searching space for the particle position x_j^p(t); when |σ| is small, it gives a smaller searching space for fine-tuning the particle position x_j^p(t).

[00081] The mutation parameter σ is defined by

σ = (1/√a) ψ(φ/a)                                                    (14)

[00082] where φ is a random number evenly distributed between -2.5a and 2.5a, ψ is the Morlet wavelet function defined as

ψ(x) = e^(-x²/2) cos(5x)                                             (15)

[00083] and a is a dilation parameter. The value of the dilation parameter a is set to vary with the value of t/T to meet the fine-tuning purpose, where T is the total number of iterations. To perform a local search when t is large, the value of a should increase as t/T increases to reduce the significance of the mutation. Hence, a is defined by a monotonic increasing function of t/T defined as follows:

a = e^( -ln(g)·(1 - t/T)^ζ_wm + ln(g) )                              (16)

[00084] where ζ_wm is the shape parameter of the monotonic increasing function, and g is the upper limit of the dilation parameter a.
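The wavelet mutation machinery can be sketched as follows. The Morlet wavelet with its Gaussian envelope, the mutation parameter σ = ψ(φ/a)/√a, and the dilation schedule a = exp(-ln(g)·(1 - t/T)^ζ_wm + ln(g)) are reconstructions consistent with the constraints stated above (a = 1 at t = 0, a = g at t = T, monotonically increasing); treat the exact forms and the parameter values as assumptions.

```python
import math
import random

def morlet(x: float) -> float:
    """Morlet wavelet: Gaussian envelope times a cosine, symmetric about 0."""
    return math.exp(-x * x / 2.0) * math.cos(5.0 * x)

def dilation(t: int, T: int, zeta_wm: float = 2.0, g: float = 10000.0) -> float:
    """Monotonically increasing dilation: 1 at t = 0, g at t = T (assumed form)."""
    return math.exp(-math.log(g) * (1.0 - t / T) ** zeta_wm + math.log(g))

def mutation_parameter(t: int, T: int, rng: random.Random) -> float:
    """Draw sigma = psi(phi/a)/sqrt(a) with phi uniform in [-2.5a, 2.5a]."""
    a = dilation(t, T)
    phi = rng.uniform(-2.5 * a, 2.5 * a)
    return morlet(phi / a) / math.sqrt(a)

rng = random.Random(4)
print(round(dilation(0, 100), 4), round(dilation(100, 100), 1))  # → 1.0 10000.0
sigma = mutation_parameter(50, 100, rng)
print(-1.0 <= sigma <= 1.0)  # → True
```

Early in the run (a near 1) the mutation parameter can reach the ends of [-1, 1], giving a wide search; late in the run (a near g) it shrinks towards 0, which is the fine-tuning behaviour described above.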

[00085] Because of the symmetry property of the Morlet wavelet in equation (15), the overall positive mutation and the overall negative mutation throughout the evolution are nearly the same. This property gives better solution stability (a smaller standard deviation of the solution values upon many trials).