

Title:
EARLY CLASSIFICATION OF NETWORK FLOWS
Document Type and Number:
WIPO Patent Application WO/2017/065627
Kind Code:
A1
Abstract:
The invention proposes a method for early classification of network flows with a training phase comprising capturing full-length training network flows, assigning classes to the captured full-length training network flows, training a predictor model on the full-length training network flows, obtaining truncated training network flows from the full-length training network flows, applying the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes, comparing the training classes predicted with the predictor model on the truncated training network flows with the assigned classes, and training a corrector model on the truncated training network flows taking into account the comparison of the training classes with the assigned classes. A prediction phase comprises receiving first packets of a non-classified network flow that is a subject for early classification, obtaining truncated non-classified network flows from the non-classified network flow, applying the predictor model to the truncated non-classified network flows, and outputting predictor model classification results, applying the corrector model to the truncated non-classified network flows taking into account the predictor model classification results, and outputting corrector model classification results, and combining the corrector model classification results to make a final prediction for the non-classified network flow.

Inventors:
SEROV ALEXANDER ALEXEEVICH (CN)
GLUKHOV VALERY NIKOLAEVITCH (CN)
Application Number:
PCT/RU2015/000662
Publication Date:
April 20, 2017
Filing Date:
October 12, 2015
Assignee:
HUAWEI TECH CO LTD (CN)
International Classes:
H04L12/24; H04L12/26
Foreign References:
EP 2521312 A2 (2012-11-07)
US 8095635 B2 (2012-01-10)
Other References:
ZHENGZHENG XING ET AL.: "Early classification on time series", Knowledge and Information Systems, Springer-Verlag, vol. 31, no. 1, 26 April 2011 (2011-04-26), pages 105-127, XP035036978, ISSN: 0219-3116, DOI: 10.1007/s10115-011-0400-x
CHARLES ELKAN: "Nearest Neighbor Classification", 29 September 2008 (2008-09-29), XP055281391, Retrieved from the Internet [retrieved on 2016-06-17]
ZHENGZHENG XING; JIAN PEI; PHILIP S. YU: "Early classification of time series", Knowledge and Information Systems, vol. 31, no. 1, 2012, pages 105-127, XP035036978, DOI: 10.1007/s10115-011-0400-x
RAJAHALME ET AL.: "RFC 3697", 2004, IETF, article "IPv6 Flow Label Specification"
QUITTEK ET AL.: "RFC 3917", 2004, IETF, article "Requirements for IP Flow Information Export (IPFIX)"
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY AND PARTNERS" LTD. (RU)
Claims

1. Method for early classification of network flows,

the method comprising a training phase (101) and a prediction phase (111),

the training phase (101) comprising:

- capturing (102) full-length training network flows,

- assigning (103) classes (c) to the captured full-length training network flows,

- training (104) a predictor model on the full-length training network flows,

- obtaining (105) truncated training network flows from the full-length training network flows,

- applying (106) the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes (c),

- comparing (107) the training classes (c) predicted with the predictor model on the truncated training network flows with the assigned classes (c),

- training (108) a corrector model on the truncated training network flows taking into account the comparison of the training classes (c) with the assigned classes (c),

the prediction phase (111) comprising:

- receiving (112) first packets of a non-classified network flow that is a subject for early classification,

- obtaining (113) truncated non-classified network flows from the non-classified network flow,

- applying (114) the predictor model to the truncated non-classified network flows, and outputting predictor model classification results (c),

- applying (115) the corrector model to the truncated non-classified network flows taking into account the predictor model classification results (c), and outputting corrector model classification results,

- combining (116) the corrector model classification results to make a final prediction for the non-classified network flow.

2. Method according to claim 1,

wherein training (104) the predictor model on the full-length training network flows comprises:

- extracting (201) a first set of features from the captured full-length training network flows to obtain a first training feature vector (x), and

- training (202) the predictor model on the first training feature vector (x) and on the respective classes (c) assigned to the captured full-length training network flows.

3. Method according to claim 1 or claim 2,

wherein applying (106) the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes (c), comprises:

- extracting (301) the first set of features from the truncated training network flows to obtain a second training feature vector (x),

- extracting (302) a second set of features from the truncated training network flows to obtain a third training feature vector (x),

- applying (303) the predictor model to the second training feature vector (x) of the truncated training network flows so as to obtain the training classes (c),

wherein training (108) the corrector model on the truncated training network flows taking into account the comparison of the training classes (c) with the assigned classes (c) comprises:

- training (304) the corrector model on the third training feature vector (x), on the assigned classes (c), and on the training classes (c).

4. Method according to any of the preceding claims,

wherein applying (114) the predictor model to the truncated non-classified network flows comprises:

- extracting (401) a first set of features from the truncated non-classified network flows,

- computing (402) a first prediction feature vector (x) with the first set of features from the truncated non-classified network flows,

- supplying (403) the computed first prediction feature vector (x) to the predictor model to obtain a corresponding predictor model classification result (c).

5. Method according to claim 4,

wherein applying (115) the corrector model to the truncated non-classified network flows comprises:

- extracting (501) a second set of features from the truncated non-classified network flows,

- computing (502) a second prediction feature vector (x) with the second set of features from the truncated non-classified network flows,

- supplying (503) the computed second prediction feature vector (x) and the obtained predictor model classification result (c) to the corrector model to obtain a classification of the truncated non-classified network flows.

6. Method according to any of the claims 2 to 5,

wherein the first set of features extracted from a network flow comprise:

- the transport layer protocol of the flow,

- the mean size of the packets in the flow,

- the standard deviation of the size of the packets in the flow,

- the inter-arrival time of the packets in the flow,

- the standard deviation of the inter-arrival time of the packets in the flow,

- the entropy of the application layer payloads of the packets in the flow, or

- any combination thereof.

7. Method according to any of the claims 3 to 6,

wherein the second set of features extracted from a network flow comprise:

- the transport layer protocol of the flow,

- the conditional probability (p(c|x)) of the second predictor model classification result (c) given the second feature vector (x) computed on the first L packets of the flow,

- the conditional probability (p(c|x)) of the second predictor model classification result (c) given the second feature vector (x) computed on the first k packets of the flow, with k<L,

- the relative position of the k-th packet in the truncated flow, k/L,

- the relative difference of the mean size of the first k packets of the truncated flow and of all packets in the truncated flow,

- the relative difference of inter-arrival time of the first k packets of the truncated flow and of all packets in the truncated flow,

- the relative difference of the entropy of the application layer payloads of the first k packets of the truncated flow and of all packets in the truncated flow, or

- any combination thereof.

8. Method according to any of the preceding claims,

wherein obtaining (113) the truncated non-classified network flows from the non-classified network flow comprises:

obtaining truncated non-classified network flows by discarding last packets of the received first packets.

9. Method according to any of the preceding claims,

wherein at least one of the truncated non-classified network flows is obtained by truncating the non-classified network flow at a truncation length (L).

10. Method according to claim 9,

wherein the truncation length (L) is fixed.

11. Method according to claim 9 or 10,

wherein the non-classified network flow is additionally truncated at a length shorter than the truncation length (L) to obtain shorter non-classified network flows, and the prediction phase is additionally applied to the shorter non-classified network flows to improve the accuracy of the classification.

12. Computer program having a program code for performing the method according to any of the preceding claims, when the computer program runs on a computing device.

13. Device for early classification of network flows,

the device being adapted to carry out a training phase and a prediction phase,

the device being adapted to, in the training phase:

- capture full-length network flows, and assign classes (c) to the captured full-length training network flows,

- train a predictor model on the full-length training network flows,

- obtain truncated training network flows from the full-length training network flows,

- apply the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes (c),

- compare the training classes (c) predicted with the predictor model on the truncated training network flows with the assigned labels (c),

- train a corrector model on the truncated training network flows taking into account the comparison of the training classes (c) with the assigned classes (c),

the device being adapted to, in the prediction phase:

- receive first packets of a non-classified network flow that is a subject for early classification,

- obtain truncated non-classified network flows from the non-classified network flow,

- apply the predictor model to the truncated non-classified network flows, and output predictor model classification results (c),

- apply the corrector model to the truncated non-classified network flows taking into account the predictor model classification results (c), and output corrector model classification results,

- combine the corrector model classification results to make a final prediction for the non-classified network flow.

Description:
EARLY CLASSIFICATION OF NETWORK FLOWS

TECHNICAL FIELD

The present invention generally relates to the field of network flow classification. Specifically, the invention relates to machine learning techniques for early network traffic classification and for identification of, for example, the applications causing traffic flows. Such classification serves the management of traffic services, which is supposed to be done in real time.

BACKGROUND

The prior art comprises some known approaches for performing a network traffic classification. The simplest approach for performing a network traffic classification relies upon port numbers. This approach leads to a relatively fast classification method since it does not inspect packet payloads. However, it is applicable only if fixed ports are in use. Also, this approach cannot detect disguised traffic.

Flow-based analysis approaches have proven some efficiency. A flow is a set of packets which share the same transport layer protocol, the same source and destination IP addresses, and the same source and destination TCP or UDP port numbers.
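The flow definition above can be sketched in code. The parsed-packet representation and its field names below are illustrative assumptions, not part of this document:

```python
from collections import defaultdict

def flow_key(pkt):
    """Build the 5-tuple flow key from a parsed packet dict.

    The dict fields (proto, src_ip, src_port, dst_ip, dst_port)
    are hypothetical names chosen for illustration.
    """
    return (pkt["proto"], pkt["src_ip"], pkt["src_port"],
            pkt["dst_ip"], pkt["dst_port"])

def group_into_flows(packets):
    """Group a packet trace into flows keyed by the 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Packets sharing all five fields land in the same flow; everything downstream (feature extraction, classification) then operates per flow.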

Deep Packet Inspection (DPI) is another approach for performing a network traffic classification that is based on a pattern matching technique. DPI may inspect packet payloads along with the headers and thus is a much more computationally demanding method. Advantageously, this method works in a port-agnostic way by comparing application signatures against the network flows to be classified. However, DPI necessitates the maintenance of a bulky application signatures database.

Other approaches include behavioral and statistical analysis. The former characterizes network hosts by building their profiles, while the latter is based on machine learning classification algorithms trained on statistical features. The features can be built either on packet-level or on flow-level statistics. Fig. 6 shows a comparison of the packet-level statistics and the flow-level statistics according to the state of the art. In the case of packet-level statistics, the averaging is done over the flows that belong to a particular application for each of the first few packets of the flows, as shown in Fig. 6. In the case of flow-level statistics, the averaging is done over all packets of a particular flow.

A machine learning treatment of the classification problem implies a two-phase workflow. The first phase, called training phase, comprises a generalization over the labeled instances using their features. The second phase of the workflow, called prediction phase, comprises a prediction for unlabeled, i.e., non-classified instances. Usually, the training phase ends up with either a generative or a discriminative statistical model. In the course of network traffic analysis, such a model can be built off-line from packet captures. Before the training phase, the packets should be assembled in flows and these flows should be mapped to a feature space. Moreover, one should assign a label to each flow according to the application that produced it. Once built on a labeled dataset, the model can be applied to unlabeled, i.e., non-classified flows to infer their applications during the prediction phase. The prediction phase does not require a lot of computing resources and thus can be performed in real time.
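The two-phase workflow can be illustrated with a deliberately minimal sketch. The nearest-centroid rule below merely stands in for whatever generative or discriminative model is actually trained; it is not the model proposed by this document:

```python
from statistics import mean

def train(labeled):
    """Training phase: generalize over labeled feature values.

    `labeled` maps each flow label to a list of 1-D feature values;
    the "model" is simply the per-class centroid.
    """
    return {label: mean(values) for label, values in labeled.items()}

def predict(model, value):
    """Prediction phase: assign the label with the closest centroid."""
    return min(model, key=lambda label: abs(model[label] - value))
```

Training is the expensive, off-line step; prediction is a cheap lookup that can run in real time, which mirrors the division of labor described above.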

Being processed on-line, the packets of the non-classified flows arrive one by one, so that at any instant of time the prediction phase has to deal not with the whole non-classified flow but only with its first few packets, which can be referred to as a truncated flow. The sooner the prediction phase is able to classify a flow using only its first packets, the earlier it is possible to apply a certain policy or rule to boost the overall performance of the network. Thus, the earliness of the prediction, which is determined by the length of a truncated flow, is crucial.
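A truncated flow, in these terms, is simply a prefix of a flow's packet sequence; a minimal sketch:

```python
def truncated_flows(flow_packets, lengths):
    """Yield truncations of a flow at each requested prefix length.

    On-line, only the first packets of a flow have arrived, so an
    early classifier must operate on such prefixes.
    """
    for k in lengths:
        yield flow_packets[:k]
```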

In the following, state-of-the-art methods related to the topic of network traffic classification and its earliness are summarized.

The document US 8 095 635 B2 proposes to manage network traffic for improved availability of network services. The document is related to network traffic classification based on flow-level statistics that can be obtained using standard NetFlow records. NetFlow refers specifically to the Cisco implementation of packet-trace statistics, wherein equivalent approaches include the NetStream approach of Huawei. Some features built from those records are irrelevant to the application classes and thus should be eliminated to avoid unnecessary computations. In the document, it is proposed to rank the features using the following symmetric uncertainty measure:

SU(A_i, C) = 2 [H(A_i) + H(C) - H(A_i, C)] / [H(A_i) + H(C)]

wherein H represents the entropy function, (A_1, ..., A_m) represent the features, and C represents the flow class, which is considered as a feature.

In said document, it is further proposed to determine the goodness of any given subset, S, of features by the following value:

G(S) = [ Σ_{A_i ∈ S} SU(A_i, C) ] / sqrt( Σ_{A_i, A_j ∈ S} SU(A_i, A_j) )

By adding features sequentially in decreasing order of symmetric uncertainty and monitoring the amount of increase in the goodness measure, the desired set of features can be selected. An implementation of the proposed method can utilize the NetFlow sampling feature. As shown in the document, packet sampling does not significantly affect the accuracy of the classifier because when the sampling rate is low a more homogeneous set of large flows can survive the sampling, which perceivably leads to a more accurate classification.
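The symmetric uncertainty ranking can be estimated on a discrete sample as follows. This is the standard estimator SU(A, C) = 2·I(A; C) / (H(A) + H(C)), given here on the assumption that it matches the measure intended in the cited document:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(feature, cls):
    """SU(A, C) = 2 * I(A; C) / (H(A) + H(C)), estimated on a sample."""
    h_a, h_c = entropy(feature), entropy(cls)
    if h_a + h_c == 0:
        return 0.0
    # I(A; C) = H(A) + H(C) - H(A, C)
    mi = h_a + h_c - entropy(list(zip(feature, cls)))
    return 2 * mi / (h_a + h_c)
```

SU is symmetric and normalized to [0, 1]: a feature identical to the class scores 1, and a constant (uninformative) feature scores 0.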

However, this method disadvantageously does not address the problem of early classification of network flows.

The document "Early classification of time series" by Zhengzheng Xing, Jian Pei, Philip S. Yu, Knowledge and Information Systems, vol. 31, no. 1, 2012, pages 105-127, proposes an approach for early classification of time series (ECTS) operating on time series of the type:

t = (v_1, v_2, ..., v_L)

wherein L represents the full length of the time series.

Being based on the 1NN classification algorithm, the method proposed in said document relies upon a distance measure between two time series according to the following equation:

dist(s, t) = sqrt( Σ_{i=1}^{L} (s_i - t_i)^2 )    (4)

Said document proposes to elaborate the concept of Minimum Prediction Prefix (MPP). By construction, for a time series t, any prefix longer than MPP(t) is redundant in 1NN classification and can be removed to achieve early classification.
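The 1NN scheme with the Euclidean distance (4) can be sketched as follows, assuming fixed-length numeric series as in the cited work:

```python
import math

def euclidean(s, t):
    """dist(s, t) = sqrt(sum_i (s_i - t_i)^2), both series of length L."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))

def one_nn(train, query):
    """1NN classification: return the label of the closest training series.

    `train` is a list of (series, label) pairs.
    """
    return min(train, key=lambda item: euclidean(item[0], query))[1]
```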

However, the ECTS method described in said document is based on the 1NN classification algorithm and uses the following assumptions:

1. any time series is a sequence of pairs (timestamp, value),

2. all time series are of length L,

3. there is a metric defining distance between two time series, and

4. the training dataset is a sufficiently large and uniform sample of time series.

Inevitably, these assumptions have to be satisfied when the method is adjusted to network flows. First of all, the 1NN classification is by no means the best classification algorithm. There are in fact a number of other classifiers, such as Naive Bayes, support vector machines (SVM) or decision trees, that demonstrate superior performance. Secondly, it is not possible to guarantee "uniformness" of the training dataset even if the Euclidean metric (4) is redesigned to account for packet features that can be either nominal (e.g. header flags) or numerical (e.g. payload size). Thirdly, the above-mentioned assumptions 1 and 3 necessitate the storage of packet-level features that are much more memory consuming than flow-level features. Finally, the prediction length varies for each time series that has been reliably classified by the ECTS method.

SUMMARY

Having recognized the above-mentioned disadvantages and problems, the present invention aims to improve the state of the art. In particular, the object of the present invention is to provide an improved early classification of network flows.

In contrast to known classification methods, where the optimization goal is set to maximize the classification accuracy, the present invention particularly intends to improve the early classification by optimizing the earliness as long as the classification accuracy is satisfactory.

The above-mentioned object of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the respective dependent claims.

According to a first aspect of the present invention, there is provided a method for early classification of network flows. The method comprises a training phase and a prediction phase.

The training phase comprises capturing full-length training network flows. The training phase comprises assigning classes to the captured full-length training network flows. The training phase comprises training a predictor model on the full-length training network flows. The training phase comprises obtaining truncated training network flows from the full-length training network flows. The training phase comprises applying the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes. The training phase comprises comparing the training classes predicted with the predictor model on the truncated training network flows with the assigned classes. The training phase comprises training a corrector model on the truncated training network flows taking into account the comparison of the training classes with the assigned classes. The prediction phase comprises receiving first packets of a non-classified network flow that is a subject for early classification. The prediction phase comprises obtaining truncated non-classified network flows from the non-classified network flow. The prediction phase comprises applying the predictor model to the truncated non-classified network flows, and outputting predictor model classification results. The prediction phase comprises applying the corrector model to the truncated non-classified network flows taking into account the predictor model classification results, and outputting corrector model classification results. The prediction phase comprises combining the corrector model classification results to make a final prediction for the non-classified network flow.
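The prediction phase can be sketched as follows. The predictor, corrector, and feature extractors are caller-supplied callables (hypothetical interfaces, not fixed by this document), and the majority vote over the corrector's results is an illustrative assumption, since the text leaves the combination rule open:

```python
from collections import Counter

def predict_early(flow_packets, lengths, predictor, corrector,
                  feat_pred, feat_corr):
    """Sketch of the claimed prediction phase.

    For each truncation length, the predictor classifies the prefix
    from its own feature vector, and the corrector refines that
    result using a second feature vector.
    """
    corrected = []
    for k in lengths:                          # obtain truncated flows
        prefix = flow_packets[:k]
        c_hat = predictor(feat_pred(prefix))   # predictor model result
        corrected.append(corrector(feat_corr(prefix), c_hat))
    # Combine the corrector results into a final prediction
    # (majority vote, as an assumed combination rule).
    return Counter(corrected).most_common(1)[0][0]
```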

Thereby, the proposed method tackles the problem of early classification of network flows by means of machine learning classification techniques. Advantageously, the method balances the tradeoff between the accuracy and the earliness of the prediction. Further on, the method is based on flow-level rather than packet-level features. Further on, the method is generic enough to avoid limitations imposed by any particular machine learning classification algorithm.

Thereby, the proposed method supplements a baseline feature-based network flow classification algorithm, called predictor model, with a corrector model which captures the sensitivity of the algorithm towards flow truncation. This technique facilitates accomplishing flow classification earlier and, at the same time, with higher accuracy. Upon being trained in off-line mode, the proposed method may classify truncated non-classified flows in real time. First, it applies the original algorithm, i.e. the predictor model, to the features computed on truncated flows, then computes the features used by the corrector model, and finally applies the corrector model to improve the prediction made by the original algorithm.

In a first possible implementation form of the method according to the first aspect, training the predictor model on the full-length training network flows comprises: extracting a first set of features from the captured full-length training network flows to obtain a first training feature vector, and training the predictor model on the first training feature vector and on the respective classes assigned to the captured full-length training network flows.

In a second possible implementation form of the method according to the first aspect as such or according to the first implementation form of the first aspect, applying the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes, comprises: extracting the first set of features from the truncated training network flows to obtain a second training feature vector, extracting a second set of features from the truncated training network flows to obtain a third training feature vector, and applying the predictor model to the second training feature vector of the truncated training network flows so as to obtain the training classes. Training the corrector model on the truncated training network flows taking into account the comparison of the training classes with the assigned classes comprises training the corrector model on the third training feature vector, on the assigned classes, and on the training classes.

In a third possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, applying the predictor model to the truncated non-classified network flows comprises: extracting a first set of features from the truncated non-classified network flows, computing a first prediction feature vector with the first set of features from the truncated non-classified network flows, and supplying the computed first prediction feature vector to the predictor model to obtain a corresponding predictor model classification result.

In a fourth possible implementation form of the method according to the third implementation form of the first aspect, applying the corrector model to the truncated non-classified network flows comprises: extracting a second set of features from the truncated non-classified network flows, computing a second prediction feature vector with the second set of features from the truncated non-classified network flows, and supplying the computed second prediction feature vector and the obtained predictor model classification result to the corrector model to obtain a classification of the truncated non-classified network flows.

In a fifth possible implementation form of the method according to any of the first to fourth implementation forms of the first aspect, the first set of features extracted from a network flow comprise: the transport layer protocol of the flow, the mean size of the packets in the flow, the standard deviation of the size of the packets in the flow, the inter-arrival time of the packets in the flow, the standard deviation of the inter-arrival time of the packets in the flow, the entropy of the application layer payloads of the packets in the flow, or any combination thereof.
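A sketch of the first feature set computed on one flow; the per-packet fields (`size`, `time`, `payload`, `proto`) are assumed names chosen for illustration:

```python
import math
from collections import Counter
from statistics import mean, pstdev

def payload_entropy(payload):
    """Byte entropy (bits) of an application-layer payload."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())

def first_feature_vector(packets):
    """First set of features of claim 6, computed on one flow."""
    sizes = [p["size"] for p in packets]
    times = [p["time"] for p in packets]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return {
        "proto": packets[0]["proto"],
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_iat": mean(gaps),
        "std_iat": pstdev(gaps),
        "payload_entropy": mean(payload_entropy(p["payload"]) for p in packets),
    }
```

Averaging the per-packet payload entropy across the flow is one possible reading of "the entropy of the application layer payloads of the packets in the flow"; computing it over the concatenated payloads would be another.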

In a sixth possible implementation form of the method according to any of the second to the fifth implementation forms of the first aspect, the second set of features extracted from a network flow comprise: the transport layer protocol of the flow, the conditional probability of the second predictor model classification result given the second feature vector computed on the first L packets of the flow, the conditional probability of the second predictor model classification result given the second feature vector computed on the first k packets of the flow with k<L, the relative position of the k-th packet in the truncated flow k/L, the relative difference of the mean size of the first k packets of the truncated flow and of all packets in the truncated flow, the relative difference of inter-arrival time of the first k packets of the truncated flow and of all packets in the truncated flow, the relative difference of the entropy of the application layer payloads of the first k packets of the truncated flow and of all packets in the truncated flow, or any combination thereof. Thereby, the corrector model may be based on features different from those used by the original algorithm, i.e. by the predictor model. Moreover, it may treat derivatives of the predictions made by the original algorithm on truncated flows, as features.
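A partial sketch of the second feature set. How the conditional probabilities `p_full` and `p_k` are obtained from the predictor model depends on the chosen classifier, so they are taken here as given inputs; only the positional and relative-difference features are computed:

```python
from statistics import mean

def relative_diff(part, whole):
    """Relative difference of a prefix statistic against the whole flow,
    guarded against a zero denominator."""
    return (part - whole) / whole if whole else 0.0

def second_feature_vector(packets, k, p_full, p_k):
    """Partial second set of features of claim 7 (sketch).

    `packets` is the truncated flow of length L; `k` < L marks an
    earlier prefix; `p_full` and `p_k` are the predictor's conditional
    class probabilities on the first L and first k packets.
    """
    L = len(packets)
    sizes = [p["size"] for p in packets]
    return {
        "p_full": p_full,
        "p_k": p_k,
        "rel_pos": k / L,
        "rel_mean_size": relative_diff(mean(sizes[:k]), mean(sizes)),
    }
```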

In a seventh possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, obtaining the truncated non-classified network flows from the non-classified network flow comprises obtaining truncated non-classified network flows by discarding last packets of the received first packets.

In an eighth possible implementation form of the method according to the first aspect as such or according to any of the implementation forms of the first aspect, at least one of the truncated non-classified network flows is obtained by truncating the non-classified network flow at a truncation length.

In a ninth possible implementation form of the method according to the eighth implementation form of the first aspect, the truncation length is fixed.

In a tenth possible implementation form of the method according to the eighth or the ninth implementation form of the first aspect, the non-classified network flow is additionally truncated at a length shorter than the truncation length to obtain shorter non-classified network flows, and the prediction phase is additionally applied to the shorter non-classified network flows to improve the accuracy of the classification.

According to a second aspect of the present invention, there is provided a computer program having a program code for performing a method according to the first aspect of the present invention when the computer program runs on a computing device.

According to a third aspect of the present invention, there is provided a device for early classification of network flows. The device is adapted to carry out a training phase and a prediction phase. The device is adapted to, in the training phase, capture full-length network flows, and assign classes to the captured full-length training network flows, train a predictor model on the full-length training network flows, obtain truncated training network flows from the full-length training network flows, apply the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes, compare the training classes predicted with the predictor model on the truncated training network flows with the assigned labels, train a corrector model on the truncated training network flows taking into account the comparison of the training classes with the assigned classes. The device is adapted to, in the prediction phase, receive first packets of a non-classified network flow that is a subject for early classification, obtain truncated non-classified network flows from the non-classified network flow, by discarding the last received packets, apply the predictor model to the truncated non-classified network flows, and output predictor model classification results, apply the corrector model to the truncated non-classified network flows taking into account the predictor model classification results, and output corrector model classification results, and combine the corrector model classification results to make a final prediction for the non-classified network flow.

It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application, as well as the functionalities described to be performed by the various entities, are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by an external entity is not reflected in the description of a specific detailed element of the entity which performs that specific step or functionality, it should be clear to a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.

BRIEF DESCRIPTION OF DRAWINGS

The above aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which

Fig. 1 shows a schematic diagram of a method for early classification of network flows according to an embodiment of the present invention,

Fig. 2 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention,

Fig. 3 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention,

Fig. 4 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention,

Fig. 5 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention,

Fig. 6 shows a comparison of the packet-level statistics and the flow-level statistics,

Fig. 7 shows a classification workflow according to the prior art,

Fig. 8 shows a classification workflow according to an embodiment of the present invention,

Fig. 9 shows the improvement of an embodiment of the present invention in terms of earliness and accuracy.

DETAILED DESCRIPTION OF EMBODIMENTS

Fig. 1 shows a schematic diagram of a method for early classification of network flows according to an embodiment of the present invention.

The method comprises a training phase 101 and a prediction phase 111. The training phase 101 may be carried out off-line, i.e. before the classification, from network traffic captures in the form of training network flows. It may be carried out off-line since this phase is not time critical. The prediction phase 111, on the other hand, may be carried out on-line, i.e. the classification may be done in real time while the network traffic to be classified is received. This prediction phase 111 may be carried out on-line so as to ensure that an early classification can be reached by optimizing the earliness as well as the classification accuracy.

The method of the diagram comprises the following steps during the training phase 101.

In a first step 102, full-length training network flows are captured. Training network flows are network flows captured during the training phase. A network flow may be either a Transmission Control Protocol (TCP) or a User Datagram Protocol (UDP) network flow passing an observation point in the network during a certain time interval. A network flow may be characterized by a 5-tuple consisting of:

- source IP address,

- source port,

- destination IP address,

- destination port, and

- transport layer protocol.

The transport layer protocol is the protocol of layer 4 according to the Open Systems Interconnection (OSI) model. For example, the transport layer protocol may be either TCP or UDP. Each network flow may be transferred through a pair of sockets created by communicating applications.

The network flow relates particularly to packet switching networks and may correspond to a sequence of packets from a source to a destination. The source may be a network entity defined by the source IP address and by the source port, while the destination may be another network entity defined by the destination IP address and by the destination port. Capturing a network flow then may consist in capturing said sequence of packets and assembling said sequence of packets in a network flow.
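As a purely illustrative sketch (not part of the patent itself), capturing and assembling a packet sequence into network flows keyed by the 5-tuple described above could look as follows; the packet field names are assumptions:

```python
from collections import defaultdict, namedtuple

# Hypothetical packet representation: a dict carrying the five header fields.
FlowKey = namedtuple("FlowKey",
                     ["src_ip", "src_port", "dst_ip", "dst_port", "protocol"])

def group_packets_into_flows(packets):
    """Assemble a captured packet sequence into flows, one packet list per 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
        flows[key].append(pkt)
    return dict(flows)
```

All packets sharing the same 5-tuple end up in the same flow, which matches the common-properties definition of RFC 3917 cited below.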

According to RFC 3697, "IPv6 Flow Label Specification", Rajahalme et al., IETF, 2004, a network flow may be a sequence of packets sent from a particular source to a particular unicast, anycast, or multicast destination that the source desires to label as a flow, and such a network flow could consist of all packets in a specific transport connection or a media stream. According to RFC 3917, "Requirements for IP Flow Information Export (IPFIX)", Quittek et al., IETF, 2004, a network flow may be a set of IP packets passing an observation point in the network during a certain time interval, wherein all packets belonging to a particular flow may have a set of common properties.

A full-length network flow consists of all packets transferred through a pair of sockets during its lifetime. Alternatively, and particularly in practice, a full-length network flow may not consist of all the packets transferred through the pair of sockets during its lifetime, but may consist of the packets captured during a sufficiently long period of time, i.e. captured during a period of time longer than a threshold.

In a further step 103, classes c are assigned to the captured full-length training network flows. The classes c, or labels, may be assigned to each captured full-length training network flow according to the application that produced it. The full-length training network flows may be grouped by the applications that produced them, and corresponding classes c may be assigned to said flows. Alternatively, the classes c may be assigned according to, for example, the platform or the services that produce the network flows.

In a further step 104, a predictor model is trained on the full-length training network flows.

In a further step 105, truncated training network flows are obtained from the full-length training network flows.

In a further step 106, the predictor model trained on the full-length training network flows is applied to the truncated training network flows, to obtain a plurality of training classes c.

In a further step 107, the training classes c predicted with the predictor model on the truncated training network flows are compared with the assigned classes c.

In a further step 108, a corrector model is trained on the truncated training network flows taking into account the comparison of the training classes c with the assigned classes c.

In the training phase 101, the predictor model is trained 104 on the full-length training network flows and applied 106 to the truncated training network flows to obtain training classes. The corrector model is trained 108 on these obtained training classes, while also considering the results of a comparison 107 between the results of the predictor model and a class assignment 103.

In particular, during the training phase 101, first, the predictor model is trained 104 on full-length training network flows; then the predictor model is applied 106 to make predictions on the truncated training network flows in order to observe whether the predictor model misclassifies a truncated training network flow or not. By doing so, the invention provides the input for the corrector model to be trained 108 on. In this sense, the corrector model is a meta-model or, speaking in terms of ensemble learning, a kind of stacked generalization.
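The training phase described above can be summarized in a short, non-authoritative sketch; the function and model names are assumptions, a single fixed truncation length is used for simplicity, and any classifier exposing fit/predict methods (e.g. a scikit-learn-style estimator) could be plugged in:

```python
def train_phase(full_flows, labels, extract_x, extract_x2,
                predictor, corrector, trunc_len):
    """Train the predictor on full-length flows, then the corrector on
    truncated flows together with the predictor's (possibly wrong) guesses."""
    # Step 104: train the predictor model on full-length flow features.
    X_full = [extract_x(flow) for flow in full_flows]
    predictor.fit(X_full, labels)

    # Step 105: obtain truncated training flows (first trunc_len packets).
    truncated = [flow[:trunc_len] for flow in full_flows]

    # Step 106: apply the predictor to the truncated flows to get training classes.
    guesses = predictor.predict([extract_x(flow) for flow in truncated])

    # Steps 107-108: the corrector sees the corrector features plus the
    # predictor's guess, and is trained against the assigned classes.
    X_corr = [extract_x2(flow) + [guess]
              for flow, guess in zip(truncated, guesses)]
    corrector.fit(X_corr, labels)
    return predictor, corrector
```

Feeding the predictor's guess into the corrector's feature vector is what makes this a stacked (second-level) model.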

The method of the diagram comprises the following steps during the prediction phase 111.

In a step 112, first packets of a non-classified network flow that is a subject for early classification are received. In the diagram of Fig. 1, the step 112 is carried out after the step 108 of the training phase 101. In this respect, the training phase 101 is preferably performed off-line and the prediction phase 111 on-line.

In a further step 113, truncated non-classified network flows are obtained from the non-classified network flow. The truncated non-classified network flows may be obtained on-line after receiving a given number of packets by combining said received given number of packets. The combined packets of a truncated non-classified network flow are preferably a part of the totality of the packets belonging to the non-classified network flow. A truncated non-classified network flow may be obtained by discarding the last received packets.
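In code, a truncated flow as described here is simply a prefix of the received packet sequence; a minimal sketch (the function names are mine, not the patent's):

```python
def truncate_flow(packets, k):
    """Keep the first k received packets; later packets are discarded."""
    return packets[:k]

def all_truncations(packets):
    """Every truncated flow obtainable from the packets received so far."""
    return [packets[:k] for k in range(1, len(packets) + 1)]
```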

In a further step 114, the predictor model is applied to the truncated non-classified network flows, and predictor model classification results c are outputted.

In a further step 115, the corrector model is applied to the truncated non-classified network flows taking into account the predictor model classification results c, and corrector model classification results are outputted.

In a further step 116, the corrector model classification results are combined to make a final prediction for the non-classified network flow.

In the prediction phase 111, a class of a flow is predicted by applying 114 the predictor model and applying 115 the corrector model to truncated non-classified network flows. In fact, noise occurs when the predictor model is applied to a truncated non-classified network flow instead of the full-length non-classified network flow. By applying the corrector model, this noise can be reduced.

The prediction phase 111 may in fact start without waiting for all packets of a network flow to be classified. The corrector model may be applied to truncated non-classified network flows that are shorter than the non-classified network flow supplied to the predictor model. So, during the prediction phase 111, the predictor model may be applied 114 once to all the received packets, as it is done for a full-length training network flow during the training phase 101. Then, the corrector model is applied 115 multiple times to the truncated non-classified network flows obtained from said received packets supplied to the predictor model. The multiple predictions obtained from the corrector model are then combined 116.

In step 115, the corrector model yields multiple predictions, i.e. multiple corrector model classification results that are made on all truncated non-classified network flows obtained 113 from the received 112 packets that are supplied in step 114 to the predictor model. These predictions may be combined 116, for example by scoring, to produce a final prediction of the class of the non-classified network flow.
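The prediction phase sketched above can be written out as follows (again with assumed names): the predictor runs once on all received packets, the corrector runs on every truncation, and the corrector's outputs are combined by a simple score, here a majority vote:

```python
from collections import Counter

def predict_phase(packets, extract_x, extract_x2, predictor, corrector):
    # Step 114: apply the predictor once to all received packets.
    c_hat = predictor.predict([extract_x(packets)])[0]

    # Step 115: apply the corrector to each truncated flow, given c_hat.
    votes = [corrector.predict([extract_x2(packets[:k]) + [c_hat]])[0]
             for k in range(1, len(packets) + 1)]

    # Step 116: combine the corrector results by scoring (majority vote here).
    return Counter(votes).most_common(1)[0][0]
```

Other scoring rules (e.g. summing class probabilities over the truncations) would slot in at step 116 without changing the structure.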

Fig. 2 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention. According to this embodiment, the step 104 of the embodiment of Fig. 1 of training the predictor model on the full-length training network flows comprises steps 201 and 202.

In step 201, a first set of features is extracted from the captured full-length training network flows to obtain a first training feature vector x.

In step 202, the predictor model is trained on the first training feature vector x and on the respective classes c assigned to the captured full-length training network flows.

Fig. 3 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention. According to this embodiment, step 106 of the embodiment of Fig. 1, of applying the predictor model trained on the full-length training network flows to the truncated training network flows, to obtain a plurality of training classes c, comprises steps 301, 302 and 303. Further on, step 108 of the embodiment of Fig. 1, of training the corrector model on the truncated training network flows taking into account the comparison of the training classes c with the assigned classes c, comprises step 304.

In step 301, the first set of features is extracted from the truncated training network flows to obtain a second training feature vector x.

In step 302, a second set of features is extracted from the truncated training network flows to obtain a third training feature vector x.

In step 303, the predictor model is applied to the second training feature vector x of the truncated training network flows so as to obtain the training classes c.

In step 304, the corrector model is trained on the third training feature vector x, on the assigned classes c, and on the training classes c.

Fig. 4 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention. According to this embodiment, the step 114 of the embodiment of Fig. 1, of applying the predictor model to the truncated non-classified network flows, comprises steps 401 to 403.

In step 401, a first set of features is extracted from the truncated non-classified network flows.

In step 402, a first prediction feature vector x is computed with the first set of features from the truncated non-classified network flows.

In step 403, the computed first prediction feature vector x is supplied to the predictor model to obtain a corresponding predictor model classification result c.

Fig. 5 shows a schematic diagram of a method for early classification of network flows according to a further embodiment of the present invention. According to this embodiment, step 115 of the embodiment of Fig. 1, of applying the corrector model to the truncated non-classified network flows, comprises steps 501 to 503.

In step 501, a second set of features is extracted from the truncated non-classified network flows.

In step 502, a second prediction feature vector x is computed with the second set of features from the truncated non-classified network flows.

In step 503, the computed second prediction feature vector x and the obtained predictor model classification result are supplied to the corrector model to obtain a classification of the truncated non-classified network flows.

According to a further embodiment of the invention, traffic captures may contain training network flows that can be grouped, for example, by the applications that produced them, wherein c represents the classes or labels for those application groups. The invention proposes a baseline feature-based classification algorithm capable of learning on training network flows to predict the applications for non-classified or unlabeled network flows using their features. In the proposed learning approach, it is proposed to extract 201 features for the full-length training network flows and to train 202 the predictor model accordingly in the training phase 101. Once the predictor model has been fitted to data, it may be used for prediction. For example, it is proposed to derive the conditional posterior probability p(c|x) for this predictor model. If x is a first training feature vector computed for a full-length training network flow, the value p(c|x) represents the probability that the full-length training network flow belongs to the class c. Then the application that produced the flow can be found by means of the following equation:

class(x) ← argmax_c p(c|x)    (5)

In the prior art, it is assumed that the features have been computed on full-length training network flows, which means that an averaging is carried out over all packets of each flow. Fig. 7 shows a corresponding classification workflow according to the prior art. In an off-line training phase, a training dataset corresponding to the full-length training network flows is used to train the model, i.e. the predictor model. In an on-line prediction phase, the trained predictor model is applied to a received test stream for carrying out a prediction in real time.
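For illustration only, equation (5) amounts to picking the class with the largest posterior probability; a minimal sketch, assuming the posteriors p(c|x) are available as a class-to-probability mapping (as e.g. a probabilistic classifier would provide):

```python
def classify(posteriors):
    """Equation (5): argmax over classes c of the posterior p(c|x)."""
    return max(posteriors, key=posteriors.get)
```

For example, classify({"skype": 0.7, "p2p": 0.3}) selects "skype".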

Now, according to the present invention, it is proposed to compute the same set of features on truncated versions of the full-length training network flows. If averaging is performed over only the first few packets of the full-length training network flow, it is possible to obtain a second training feature vector x̂ that is computed on a truncated training network flow. Since both the first training feature vector x and the second training feature vector x̂ realize the same set of features, called first set of features, it is possible to apply the above-mentioned equation (5) to the second training feature vector x̂ so as to perform an early classification of the truncated flow according to the following equation:

class(x̂) ← argmax_c p(c|x̂)    (6)

The earliness of the prediction is determined either by the truncation length or by the time interval between the first and the last packets in every truncated network flow to be classified. Although both restriction types are eligible, the first one may particularly be imposed. That is, the present invention strives for the most accurate prediction that may be achieved on truncated non-classified network flows given a fixed truncation length. At the same time, the invention takes advantage of full-length training network flows during the training phase.

Since the second feature vector x̂, computed on a truncated flow, is noisy, the class ĉ predicted according to equation (6) may differ from the class c computed on the corresponding full-length flow according to equation (5). The core idea of the present invention consists in building a supplemental model, called corrector model, that captures the sensitivity of the predictor model towards flow truncation. The wording "sensitivity" here denotes the difference between the predictions made by the predictor model on truncated network flows and the predictions for the corresponding full-length network flows. The corrector model is adapted to predict pairs (c, ĉ). Its features x̃ may be different from those of the predictor model. Moreover, being a second-level model, the corrector model may utilize features derived from predictions made by the predictor model.

It may be considered that p(c|x̃, ĉ) is the posterior probability that a flow belongs to class c given that its first packets have composed a truncated flow with features x̃, and that this truncated flow has been classified by the predictor model to belong to class ĉ. Then the class of the flow may be computed as follows:

class(x̃) ← argmax_c p(c|x̃, ĉ)    (7)

Fig. 8 shows a classification workflow for early classification of network flows according to an embodiment of the present invention.

Both predictor and corrector models are trained in off-line mode. First, it is proposed to extract 201 features x from the full-length training network flows, which are considered to be instances for the predictor model's construction, and to train 202 the predictor model on these features and the associated class labels.

Second, it is proposed to sample packets in the full-length training network flows and to compose 105 truncated training network flows. By construction, any truncated training network flow consists of all packets of a sampled flow, starting from the first packet and ending with the packet drawn by sampling. To build the corrector model, it is proposed to extract 301, 302 the features x̂ and x̃ from the truncated training network flows and to apply 303 the already constructed predictor model to x̂ to obtain the training classes ĉ, which represent a guess, computed by the predictor model on the truncated training network flows, about the classes of the corresponding full-length training network flows.

Then it is proposed to confront 107 the training classes ĉ with the classes c predicted by the predictor model on the corresponding full-length training network flows, and to train 304 the corrector model on the features x̃ and the pairs (c, ĉ).

After the models have been trained, they are used for on-line classification during the prediction phase 111. For a network flow to be classified, first, the features x̂ are computed 401, 402 and supplied 403 to the predictor model to obtain ĉ from equation (6). Then, the features x̃ are computed 501, 502 and supplied 503 to the corrector model to obtain the class c from equation (7).

If the truncation length L is fixed and resources are available to repeat the classification procedure described above several times for truncated flows of various lengths shorter than L, which correspond to the flow to be classified, several values of c may be obtained, which can be combined, for example by scoring, to further improve the accuracy.

An embodiment of the method has been tested on traffic captures which contained flows produced by four consecutive Skype VoIP calls and a P2P client working in the background. For the test, the flows have been labeled with a Deep Packet Inspection (DPI) utility that extracted the Skype traffic. The following table shows the statistics for these traffic captures:

A baseline support vector machine (SVM) algorithm has been applied to these captures in round-robin fashion four times. During each experiment, it has been tried to detect Skype flows in one capture using the other three for learning. After each experiment, performance metrics such as precision, recall and the F1-score, which combines the first two metrics, have been evaluated.
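The reported metrics can be computed as sketched below; this helper is not from the patent, and the class label "skype" is only an example of the positive class:

```python
def binary_metrics(y_true, y_pred, positive="skype"):
    """Precision, recall and the F1-score for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```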

The following features (elements of x and x̂) have been extracted from the instances supplied to the predictor model:

- Transport layer protocol of the flow (TCP or UDP).

- Mean size of the packets in the flow.

- Standard deviation of the size of the packets in the flow.

- Inter-arrival time of packets in the flow.

- Standard deviation of the inter-arrival time of the packets in the flow.

- Entropy of the application layer payloads of the packets in the flow.
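The predictor feature list above can be sketched as a feature extractor; packet objects carrying a size, an arrival time and a payload are assumptions about the data model, not taken from the patent:

```python
import math
from statistics import mean, pstdev

def payload_entropy(payload):
    """Shannon entropy (bits) of the byte distribution of a payload."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = {b: payload.count(b) for b in set(payload)}
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def predictor_features(flow):
    sizes = [p["size"] for p in flow["packets"]]
    times = [p["time"] for p in flow["packets"]]
    gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])] or [0.0]
    payload = b"".join(p["payload"] for p in flow["packets"])
    return [
        1.0 if flow["protocol"] == "TCP" else 0.0,  # transport layer protocol
        mean(sizes),                                # mean packet size
        pstdev(sizes),                              # std of packet size
        mean(gaps),                                 # inter-arrival time
        pstdev(gaps),                               # std of inter-arrival time
        payload_entropy(payload),                   # payload entropy
    ]
```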

The following features (elements of x̃) have been extracted from the instances supplied to the corrector model:

- Transport layer protocol of the flow (TCP or UDP).

- p(c|x) for each application class c, with x computed on the first L packets of the flow.

- p(c|x) for each application class c, with x computed on the first k packets (k < L) of the flow.

- Relative position of the k-th packet in the truncated flow, k/L.

- Relative difference of the mean size of the first k packets of the truncated flow and of all packets in the truncated flow.

- Relative difference of inter-arrival time of the first k packets of the truncated flow and of all packets in the truncated flow.

- Relative difference of the entropy of the application layer payloads of the first k packets of the truncated flow and of all packets in the truncated flow.
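Each of the "relative difference" features above compares a statistic over the first k packets with the same statistic over all L packets of the truncated flow; a minimal sketch for the mean-size variant (the helper names are assumptions):

```python
from statistics import mean

def relative_difference(first_k_value, all_value):
    """(stat over first k packets - stat over all packets) / stat over all,
    guarding against division by zero."""
    return (first_k_value - all_value) / all_value if all_value else 0.0

def mean_size_rel_diff(sizes, k):
    """Relative difference of the mean packet size: first k vs all packets."""
    return relative_difference(mean(sizes[:k]), mean(sizes))
```

The inter-arrival-time and entropy variants follow the same pattern with the corresponding statistic.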

Fig. 9 shows the improvement of an embodiment of the present invention in terms of earliness and accuracy, and particularly shows the achieved accuracy boost measured for various truncation lengths of the classified network flows.

Said Fig. 9 shows the overall F1-score measured for the truncation length L varying from 1 to 10. The dark color depicts the accuracy of the baseline SVM algorithm, while the lighter color represents the improvement attained by the present invention, which ranges from 13% up to 30%. As shown in Fig. 9, the highest improvement of up to 30% is achieved for truncation lengths 3, 4 and 5.

The horizontal dotted line corresponds to the accuracy of the baseline algorithm measured for full-length flows. The method of the present invention reaches the same accuracy using only the first five packets of each flow, which means it can accomplish the classification much faster.

The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.