METHOD OF MONITORING NETWORK TRAFFIC BY MEANS OF DESCRIPTIVE METADATA

Title:

METHOD OF MONITORING NETWORK TRAFFIC BY MEANS OF DESCRIPTIVE METADATA

Document Type and Number:

WIPO Patent Application WO/2011/051750

Abstract:

The present invention provides a process for traffic monitoring comprising the steps of classifying information either as information relevant for traffic classification or as relevant for traffic accounting, generating metadata based on the information relevant for traffic classification, wherein this metadata information comprises the data necessary to classify a packet or flow into a specific application or service, and exporting the generated metadata through an interface adapted to store and/or send the metadata to another device or module. This process reduces the complexity of DPI equipment and makes it possible to build real-time traffic characterization systems which scale well in networks with high traffic volumes.

Inventors:

RAMON SALGUERO FRANCISCO JAVIER (ES)
GARCIA DE BLAS GERARDO (ES)

Application Number:

PCT/IB2009/007474

Publication Date:

May 05, 2011

Filing Date:

October 29, 2009

Assignee:

TELEFONICA SA (ES)
RAMON SALGUERO FRANCISCO JAVIER (ES)
GARCIA DE BLAS GERARDO (ES)

Foreign References:

US20050047333A1

2005-03-03

Other References:

See references of EP 2494744A4

Attorney, Agent or Firm:

CARPINTERO LOPEZ, Francisco (S.L.C/ Alcal, 35 Madrid, ES)

Claims:

CLAIMS

1. - Process for network traffic monitoring comprising the steps of:

a. Classifying information either as information relevant for traffic classification or as relevant for traffic accounting;

b. Generating metadata based on the information relevant for traffic classification, wherein this metadata information comprises the data necessary to classify a packet or flow into a specific application or service;

c. Exporting the generated metadata through an interface adapted to store and/or send the metadata to another device or module.

2. - Process according to claim 1 wherein the metadata also comprises signatures, strings of characters in the URL or the host of an HTTP GET request.

3. - Process according to any of claims 1-2 wherein the information relevant for traffic accounting includes the volume of bytes and packets per flow.

4. - Process according to any of the preceding claims wherein the metadata and the information for traffic accounting are correlated so as to provide a full classification of all the traffic flows into specific applications or services.

5. - Process according to any of the preceding claims wherein the metadata and the information for traffic accounting are provided as output and traffic is reclassified later on with updated signatures.

Description:

METHOD OF MONITORING NETWORK TRAFFIC BY MEANS OF DESCRIPTIVE METADATA

FIELD OF THE INVENTION

This invention belongs to the area of communication networks and, more specifically, to the field of traffic monitoring.

BACKGROUND OF THE INVENTION

Traffic monitoring is an important procedure in data network management, since it makes it possible to anticipate the need to upgrade node capacity or link bandwidth before the network becomes congested. This capacity planning is done on a time scale of weeks or months and is a common operating procedure in operators' networks.

Traffic monitoring also allows building traffic matrixes, which contain the amount of traffic exchanged between each source and destination node or group of nodes.

These traffic matrixes are very useful for network planning since, given a known network routing scheme, it is possible to obtain the load or traffic of each link. With this information, a network that is resilient to single or double failures in nodes or links can be built.

One of the main aspects of traffic monitoring is traffic identification or classification, which deals with the identification of the traffic as belonging to a specific application or service. Future capacity demands differ from one application or service to another. Classifying traffic into different types allows finer-grained policies to be applied in capacity planning, anticipating the real needs for equipment upgrades. Besides, it allows building traffic matrixes for each application or service, thus making the network planning finer. It also helps in network operation, since it allows diagnosing the reasons behind an unexpected growth of traffic in specific links.

Traffic identification can be done in several ways:

• Based on port numbers. This technique relies on the fact that end applications use known Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) ports for their connections, so traffic can be classified from the ports used. For instance, web traffic uses the HTTP and HTTPS protocols, which use TCP port numbers 80 and 443 respectively. The traffic identification can be done by using well-known lists of ports such as the official port numbers assigned by the Internet Assigned Numbers Authority (IANA), as well as unofficial port numbers used by specific applications. This technique can be applied for real-time traffic classification (on the fly), as well as for off-line analysis (from stored traffic traces).

• Based on flow and packet patterns. These techniques are based on the examination of flow or packet characteristics such as the packet sizes, the port numbers, the TCP flags in case of TCP flows, the number of TCP connections established from a host towards a destination host with the same source or destination ports, duration of TCP connections, etc. These characteristics are compared with previously defined patterns, thus establishing the type of the traffic in terms of similarity to a traffic pattern.

Some of these techniques can be applied in real time, but others can only be applied off-line since they require the whole traces. For instance, if one of the criteria used to classify the traffic is the number of TCP connections established from a host towards a destination host with the same source or destination ports, this classification cannot be done until the whole trace is processed.

• Based on deep packet inspection (DPI). This traffic classification is based on the analysis of the payload transported inside TCP and UDP protocols. For that reason, this traffic classification is commonly called Layer 7 classification. The traffic classification can be based on the identification of application protocol primitives inside the payload or on the identification of patterns (appearance of specific strings of bytes in the payload).
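By way of illustration only, the contrast between port-based and payload-based identification can be sketched as follows. The port table and the payload signature table are hypothetical; only ports 80 and 443 are taken from the text above, and a real DPI engine would use far richer signatures than substring matching.

```python
from typing import Optional

# Hypothetical port table; only 80/HTTP and 443/HTTPS come from the text.
WELL_KNOWN_PORTS = {80: "http", 443: "https"}

# Hypothetical payload signatures: byte strings searched for in the payload.
PAYLOAD_SIGNATURES = {
    b"BitTorrent protocol": "bittorrent",
    b"GET ": "http",
}


def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Port-based classification: look up either port in a known list."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return None


def classify_by_payload(payload: bytes) -> Optional[str]:
    """DPI-style classification: search the payload for known byte strings."""
    for signature, label in PAYLOAD_SIGNATURES.items():
        if signature in payload:
            return label
    return None
```

In this sketch, a peer-to-peer handshake sent to port 80 would be labelled "http" by `classify_by_port` but "bittorrent" by `classify_by_payload`, which is the kind of disagreement that motivates payload inspection.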

In the 1990s, Cisco developed Netflow, a network protocol that runs on Cisco IOS-enabled equipment to collect Internet Protocol (IP) traffic information. Although it is a proprietary solution, it is also supported by platforms other than IOS, such as Juniper Networks' routers. Routers enabled with Netflow generate Netflow records, a traffic summary of bytes and packets sent/received per flow (a flow is a tuple composed of source IP address, destination IP address, transport protocol, source port, destination port) during some period of time (typically 5 minutes). These Netflow records are exported in a specific format to Netflow collectors, where records from several Netflow-enabled routers are received. Currently, Netflow is becoming an Internet Engineering Task Force (IETF) standard, called Internet Protocol Flow Information eXport (IPFIX), which is based on Netflow Version 9.
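As a minimal sketch of per-flow accounting, the five-tuple described above can be used as a dictionary key under which bytes and packets are accumulated. The names `FlowKey` and `account` are hypothetical, and this is not the Netflow record format, only an illustration of the aggregation it summarizes.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """The five-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


def account(packets: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, Dict[str, int]]:
    """Aggregate (flow key, packet size in bytes) pairs into per-flow totals."""
    totals = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for key, size in packets:
        totals[key]["bytes"] += size
        totals[key]["packets"] += 1
    return dict(totals)
```

A collector would typically reset or export such totals at the end of each reporting period.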

Although Netflow is a solution for traffic accounting, not for traffic classification, the generated Netflow records can be post-processed to perform traffic analysis and classification based on port numbers, or on flow patterns.

However, over the last decade, traffic classification based on port numbers has become less accurate for several reasons:

• Different applications can use the same ports. For instance, many applications use TCP port number 80 since this port number is typically not filtered by firewalls. Besides, some applications started to use TCP port number 80 as a way to disguise their traffic as HTTP and avoid being filtered.

• Some applications do not use a specific port, so they cannot be identified by port number and are therefore becoming more difficult to filter with specific firewall rules.

• Traffic embedded in HTTP (TCP port 80) will always be classified as web traffic, with no distinction of the type of traffic transported inside it. Currently, video traffic is transported inside the HTTP protocol, so with this technique it is impossible to discriminate video traffic transported inside HTTP traffic.

• Classification requires establishing priority between ports, thus leading to false positives. A flow could have source port 4600 and destination port 5000 and it is necessary to establish priority rules in order to decide if the traffic belongs to the application that uses port number 4600 or the application that uses port number 5000.
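The port priority problem in the last item can be illustrated with a short sketch. The application names and the priority order are hypothetical; the port numbers 4600 and 5000 are those used in the example above.

```python
# Hypothetical mapping from port number to application.
PORT_APPS = {4600: "app_a", 5000: "app_b"}

# Hypothetical priority order: ports listed earlier win when both match.
PORT_PRIORITY = [5000, 4600]


def classify_with_priority(src_port, dst_port, priority=PORT_PRIORITY):
    """Return the application of the highest-priority port present in the flow."""
    for port in priority:
        if port in (src_port, dst_port):
            return PORT_APPS[port]
    return "unknown"
```

The same flow is labelled "app_b" with the default order and "app_a" if the order is reversed, so whichever rule is chosen will misclassify some flows.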

Due to these drawbacks, Netflow records have become less used as a way to classify the traffic, although they are still used worldwide for traffic accounting and for building intra-domain and inter-domain traffic matrixes.

In order to fill the gap in traffic classification left by Netflow, several commercial products, such as those from Sandvine, iPoque or Cisco, have appeared. They inspect the packets traversing a link and classify each packet "on the fly" using DPI techniques or flow and packet patterns. Each packet is either identified as belonging to a specific kind of application or classified as unknown, but once the packet has been analysed there is no possibility of further analysis. As a result of this classification, the tools provide reports with the traffic of each application during each time interval.

These solutions offer the best performance for real-time classification in terms of detection errors (number of false positives, that is, flows classified into one type when they really belong to another type) and detection percentage (percentage of traffic not classified as unknown). The risk of false positives is very low with this equipment, since it inspects the packet payload and it is almost always possible to establish unambiguous packet patterns for each kind of traffic. Regarding the detection percentage, some studies performed by the DPI equipment manufacturers show that only around 10% of traffic in current operators' networks is classified as unknown.

However, the current solutions for traffic classification have several drawbacks:

• They are not modular, since they perform the tasks of traffic classification and traffic accounting in a single piece of equipment. The complexity of the accounting forces the equipment to keep state for each flow, which makes it highly demanding in memory and processing and, consequently, more expensive. This complexity grows with the link bandwidth, since at high data rates the equipment needs to keep state for more flows.

• The information about the traffic classification cannot be exported for further analysis. As stated before, there are exporting formats for traffic accounting (e.g. Netflow performs accounting of bytes per flow), but there is no way to export the decisions about traffic classification. Once a packet is classified, the packet is deleted and no information about this classification is exported. This has the drawback that it is not possible to reclassify the packets later. If some packets are classified as unknown, these packets cannot be reclassified into another category, even if the methods to identify traffic improve. Besides, the equipment needs to be updated in order to keep the signatures current, which is what allows the traffic to be classified in the right category. Since the information about the traffic classification is not exported and reclassification is not possible, the equipment has to be updated frequently.

Therefore, a process is sought that reduces the complexity of DPI equipment and makes it possible to build real-time traffic characterization systems which scale well in networks with high traffic volumes.

BRIEF DESCRIPTION OF THE DRAWINGS

To complete the description and in order to provide for a better understanding of the invention, a set of drawings is provided. Said drawings form an integral part of the description and illustrate preferred embodiments of the invention, which should not be interpreted as restricting the scope of the invention, but just as an example of how the invention can be embodied. The drawings comprise the following figures:

Figure 1. - shows an example of a possible implementation of the invention.

Figure 2. - is an example of how the modules in the possible implementation could be grouped into a single piece of equipment.

Figure 3. - is an example of how the modules in the possible implementation could be grouped into three different pieces of equipment.

DESCRIPTION OF THE INVENTION

The invention thus consists in a procedure of traffic classification that distinguishes between the information describing the type of content (application, service) transported in the payload of certain packets, and the information related to traffic accounting (count of bytes per flow). This strategy allows decoupling the techniques to obtain both types of information: traffic classification and traffic accounting.

The present invention consists in a new process for traffic monitoring comprising the steps of:

a. Classifying information either as information relevant for traffic classification (META) or as relevant for traffic accounting (ACC).

b. Generating metadata based on the information relevant for traffic classification, wherein this metadata information comprises the data necessary to classify a packet or flow into a specific application or service.

c. Exporting the generated metadata through an interface adapted to store and/or send the metadata to another device or module.

The metadata can also include other information such as:

• Information that can help to classify traffic later on. For instance, signatures to identify and classify traffic could be improved so that traffic which was not identified at a specific moment could be identified later on with the new signatures. One example of this reclassification could be the video traffic embedded in HTTP. This traffic can be identified by the appearance of specific strings of characters in the URLs (patterns). These patterns could change at any moment, so that this traffic would be considered unknown. In order to identify this traffic, the metadata information generated from web data packets could include the whole URL of all unidentified HTTP GET requests. An off-line analysis could be performed on the generated metadata, inspecting the URLs of the unidentified HTTP GET requests, thus generating new patterns and enabling the off-line classification of those traffic flows matching the given metadata.

• Information useful for purposes other than traffic classification. For instance, the metadata information generated from web data packets could include the host of the HTTP GET request, in order to get statistics of visited hosts.

The ACC information includes, for example, the volume of bytes and packets per flow.
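Steps a to c can be sketched as follows, purely as an illustration under stated assumptions. The packet is modelled as a dictionary with hypothetical field names, the metadata is limited to the URL and host of an HTTP GET request, and the export interface is any callable that accepts a string; none of these choices is prescribed by the process itself.

```python
import json


def split_information(packet):
    """Step a: separate classification-relevant (META) from accounting (ACC) information."""
    flow = (packet["src_ip"], packet["dst_ip"], packet["protocol"],
            packet["src_port"], packet["dst_port"])
    meta = {"flow": flow, "payload": packet["payload"]}
    acc = {"flow": flow, "bytes": packet["length"], "packets": 1}
    return meta, acc


def generate_metadata(meta):
    """Step b: keep only the data needed to classify the flow (here, URL and host)."""
    payload = meta["payload"]
    if not payload.startswith(b"GET "):
        return None  # this sketch only describes HTTP GET requests
    lines = payload.decode("latin-1").split("\r\n")
    parts = lines[0].split(" ")
    url = parts[1] if len(parts) > 1 else ""
    host = ""
    for line in lines[1:]:
        if line.lower().startswith("host:"):
            host = line.split(":", 1)[1].strip()
    return {"flow": meta["flow"], "url": url, "host": host}


def export_metadata(metadata, send):
    """Step c: export through an interface; `send` may store or transmit the record."""
    send(json.dumps(metadata))
```

For instance, `send` could be the `append` method of a list for local storage, or a function that writes the record to a socket towards another module.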

DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

Figure 1 shows an example of the procedure that could be followed in order to classify traffic with the methodology described in this invention. A packet is intercepted by a traffic capturing module (module 1). Then it is passed to a traffic detection module (module 2), which classifies the traffic either as META or as ACC. From the META information, the metadata generation module (module 3) extracts the interesting information, called metadata, and the metadata is exported by the exporting module (module 4) towards a correlation module (module 5). Module 5 also receives the traffic accounting from a traffic accounting module (modules 6 and 7). Traffic accounting can be generated in different ways, since the functionality of traffic accounting has been decoupled from the traffic classification. In the figure, module 6 generates the traffic accounting from the ACC information. On the other hand, module 7 in the figure performs the traffic accounting from other sources (module 8); for instance, the traffic accounting could be performed by a Netflow collector which receives the Netflow records from several routers. Finally, module 5 correlates the metadata and the traffic accounting, providing a full classification of all the traffic flows into specific applications or services.
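The role of the correlation module (module 5) could be sketched as a join on the flow identifier. This is an assumption-laden illustration: the record layouts are hypothetical, and the metadata records are assumed to already carry an "application" label derived from the metadata by some earlier step.

```python
from collections import defaultdict


def correlate(metadata_records, accounting_records):
    """Join metadata and accounting on the flow key and total traffic per application."""
    app_by_flow = {m["flow"]: m["application"] for m in metadata_records}
    report = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for record in accounting_records:
        # Flows for which no metadata was received are reported as unknown.
        app = app_by_flow.get(record["flow"], "unknown")
        report[app]["bytes"] += record["bytes"]
        report[app]["packets"] += record["packets"]
    return dict(report)
```

Because the join only needs the flow key, the accounting records could equally come from module 6 or from an external source such as a Netflow collector.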

The possible implementation depicted in Figure 1 is only a functional scheme. Functionalities of the different modules could be grouped into a single piece of equipment or separated into different pieces of equipment. Figure 2 shows an example of how the modules in the possible implementation could be grouped into a single piece of equipment (Equipment 1), such as the DPI equipment.

Figure 2 shows that, besides the report on traffic classification, the traffic accounting and metadata are provided as output, making it possible to reclassify the traffic later on with updated signatures.
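Reclassification with updated signatures can be illustrated by re-running a classifier over stored metadata. In this sketch the signatures are hypothetical URL substrings mapped to application names, and the stored metadata is assumed to hold the URL of each HTTP GET request.

```python
def classify_url(url, signatures):
    """Return the application whose signature appears in the URL, else 'unknown'."""
    for pattern, application in signatures.items():
        if pattern in url:
            return application
    return "unknown"


def reclassify(stored_metadata, signatures):
    """Classify every stored metadata record again with the given signature set."""
    return {m["flow"]: classify_url(m["url"], signatures) for m in stored_metadata}
```

Since only the stored metadata is needed, the original packets do not have to be retained for a flow first reported as unknown to be assigned to an application later.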

Figure 3, on the other hand, shows an example of how the modules in the possible implementation could be grouped into three different pieces of equipment. In the figure, Equipment 1 could be identified as DPI equipment simpler than the current equipment (it will not perform the accounting). Equipment 1 could also be a router card specialized in the identification, generation and exporting of metadata. The role of Equipment 2 is currently performed, for instance, by Netflow collectors, which generate the traffic accounting information from the Netflow records of the routers. Finally, Equipment 3 would be a new device that performs the storage and correlation of information to generate the reports on traffic classification.

The invention allows the current DPI equipment to focus on classification, generating metadata useful for identifying the type of traffic, whereas the traffic accounting could be done by different equipment (e.g. a router enabled with Netflow). In this way, the complexity of DPI equipment can be reduced, and it becomes possible to build real-time traffic characterization systems which scale well in networks with high traffic volumes.

Furthermore, DPI equipment would not need to keep state for all identified flows for traffic accounting, so its memory and processing requirements will be lower. Traffic accounting could be done by systems such as Netflow collectors, which are commonly used and deployed in operators' networks. This reduction of complexity in the DPI equipment would imply CAPEX savings for operators. Also, due to the reduction of complexity in the DPI equipment, the functionality of detection and classification could be transferred to specific router cards, thus eliminating the need for new equipment and, consequently, decreasing the OPEX associated with managing one or more pieces of DPI equipment per network Point of Presence (PoP).

Due to the generation and exporting of metadata, the traffic classification becomes more flexible, since it is possible to add rules for metadata generation that can help to reclassify, in a second stage, traffic that was classified as unknown in a first stage. This is not possible with current DPI equipment: once a packet is classified, it is deleted, so packets classified as unknown cannot be reclassified. However, with the proposed invention, an off-line analysis could be done from the metadata to reclassify unknown flows.

The metadata information can also be used for purposes other than traffic classification. Statistics of visited hosts could be generated from metadata information which includes the hosts of the HTTP GET requests. Another example of useful information that could be extracted as metadata is the codec rates of videos embedded in web pages, whose distribution could allow predicting increases in network traffic due to changes in codec rates.
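As a small illustration of the statistics of visited hosts mentioned above, the exported metadata could simply be counted per host. The record layout with a "host" field is a hypothetical assumption.

```python
from collections import Counter


def host_statistics(metadata_records):
    """Count HTTP GET requests per host, skipping records without a host."""
    return Counter(m["host"] for m in metadata_records if m.get("host"))
```

The resulting counter can be queried with `most_common` to list the most visited hosts.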

In this text, the term "comprises" and its derivations (such as "comprising", etc.) should not be understood in an excluding sense, that is, these terms should not be interpreted as excluding the possibility that what is described and defined may include further elements, steps, etc.

On the other hand, the invention is obviously not limited to the specific embodiments described herein, but also encompasses any variations that may be considered by any person skilled in the art, within the general scope of the invention as defined in the claims.
