METHODS AND APPARATUS FOR PROCESSING DATA IN A NETWORK

Title:

METHODS AND APPARATUS FOR PROCESSING DATA IN A NETWORK

Document Type and Number:

WIPO Patent Application WO/2016/151311

Kind Code:

Abstract:

The present invention relates to methods of processing data in a network, a method of load balancing in a field programmable gate array in processing packetised data, a method of monitoring data in a network (50) and related apparatus (10). In an aspect, the method comprises receiving a stream of data packets over a network interface representing different flows. A Field Programmable Gate Array (12) is used to perform variable position keyword and/or signature matching in the data packets indicative of the presence of an application protocol of interest or application protocol event of interest. The results of the keyword or signature matching are communicated to software (21). A determination is made based on the results that a flow is of interest, and flow information is extracted for the flow. Based on the flow information, the software causes packets belonging to a flow of interest or related flows to be processed differently (25,27) from packets of other flows.

Inventors:

RUDD MARTIN (GB)
SMITH DOMINIC (GB)

Application Number:

PCT/GB2016/050788

Publication Date:

September 29, 2016

Filing Date:

March 22, 2016

Assignee:

TELESOFT TECH LTD (GB)

International Classes:

H04L47/2475

Other References:

HAOYU SONG ET AL: "Snort offloader: a reconfigurable hardware NIDS filter", INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2005 : [FPL] ; 24 - 26 AUG. 2005, [TAMPERE HALL, TAMPERE, FINLAND ; PROCEEDINGS], IEEE OPERATIONS CENTER, PISCATAWAY, NJ, 24 August 2005 (2005-08-24), pages 493 - 498, XP010839944, ISBN: 978-0-7803-9362-2, DOI: 10.1109/FPL.2005.1515770
QUISHI DING ET AL: "Kill Switch: Hardware-Based Line-Rate Filtering and Capture of 10Gb/s Ethernet Network", CSEE 4840 SPRING 2013 PROJECT PROPOSAL WORKING-REAL GROUP, 6 November 2014 (2014-11-06), pages 1 - 3, XP055281249, Retrieved from the Internet [retrieved on 20160616]
KIM NAMUK ET AL: "A Scalable Carrier-Grade DPI System Architecture Using Synchronization of Flow Information", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, US, vol. 32, no. 10, 1 October 2014 (2014-10-01), pages 1834 - 1848, XP011565261, ISSN: 0733-8716, [retrieved on 20141127], DOI: 10.1109/JSAC.2014.2358836

Attorney, Agent or Firm:

BECK GREENER et al. (GB)

Claims:

CLAIMS

1. A method of processing data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

with a Field Programmable Gate Array, performing variable position keyword and/or signature matching in the data packets indicative of the presence of an application protocol of interest or application protocol event of interest;

communicating the results of the keyword or signature matching to software;

determining based on the results that a flow is of interest and extracting flow information for the flow;

based on the flow information, causing with the software packets belonging to a flow of interest or related flows to be processed differently from packets of other flows.

2. A method according to claim 1, wherein processing differently comprises discarding packets or routing packets belonging to flows of interest or related flows.

3. A method according to claim 1 or claim 2, comprising performing further monitoring functions on non-discarded or routed packets.

4. A method according to any of claims 1 to 3, comprising extracting the source IP address for flows of interest and processing differently packets having that source IP address, wherein related flows are flows originating from that source IP address.

5. A method according to any of claims 1 to 4, wherein the keywords or signatures used in matching include one or any combination of:

a specific application protocol at the beginning of a packet payload;

application protocol header fields;

parameters in the uniform resource indicator field;

a fully qualified host name or referrer field;

a host name address mapped to a known organisation.

6. A method according to any of claims 1 to 5, comprising generating statistics for a plurality of packets in a particular flow;

comparing the statistics for the flow with the expected values for a flow of a known type and determining a degree of match;

determining a flow is of interest based on the degree of match in the statistics together with the results of the keyword and/or signature matching.

7. A method according to claim 6, wherein the statistics comprise one or more of:

packet size;

separation of packets;

separation of segments.

8. A method according to claim 6 or claim 7, wherein a FPGA generates some or all of the statistics and passes the results to the software to determine a flow is of interest.

9. A method according to any of claims 6 to 8, comprising creating with the FPGA a checksum or signature of the packet header fields of incoming packets indicative of the identity of the flow, the FPGA creating a flow record for each newly observed checksum or signature to store statistics for each flow.

10. A method according to claim 9, comprising sending the flow record to the software when a predetermined number of packets in a flow have been processed or after a predetermined time period has elapsed.

11. A method according to any of claims 1 to 10, wherein a heuristic process is used on the results of the keyword or signature matching and statistics matching to determine whether a flow is of interest.

12. A method according to any of claims 1 to 11, comprising applying predetermined rules defining how flows or related flows are differently processed based on whether or not a flow is of interest.

13. A method according to any of claims 1 to 12, comprising creating with the FPGA a checksum or signature of the packet header fields of incoming packets indicative of the identity of the flow.

14. A method according to any of claims 1 to 13, comprising making flows of interest available to a human operator to confirm that the flow is of interest.

15. A method according to any of claims 1 to 14, wherein flows of interest are streamed video and/or audio.

16. A method according to any of claims 1 to 15, comprising the FPGA passing protocol events or packets belonging to flows of interest to the software, the software determining the presence of multiple segments in the same flow and/or the gap between segments in the same flow and including the determination in the process of determining whether a flow is of interest.

17. A method according to any of claims 1 to 16, wherein the FPGA comprises at least a first filter arranged to perform said variable position keyword and/or signature matching and a second filter programmable by the software, the software, having determined a flow of interest, dynamically programming the second filter of the FPGA to filter packets relating to that flow or having the same source IP address as that flow.

18. A method according to claim 17, wherein the second filter additionally filters packets containing keywords or signatures corresponding to protocol events of interest in the flows of interest.

19. A method according to claim 17 or claim 18, wherein the second filter results are returned to the software.

20. A method according to any of claims 17 to 19, comprising the FPGA delaying packets between the first and second filters.

21. A method according to any of claims 1 to 20, comprising, in determining whether a flow is of interest, processing only a subset of flows at any one time and periodically cycling between the subsets so that all flows are processed in turn.

22. A method according to claim 21, comprising creating with the FPGA a checksum or signature of the packet header fields of incoming packets indicative of the identity of the flow, and based on a subset of one or more digits of the checksum or signature, processing packets having a particular value of the subset of one or more digits and periodically cycling between all of the combinations of values of the subset so that each flow is processed in turn.

23. A method of load balancing in a field programmable gate array in processing packetised data, the method comprising:

receiving packetised data over an interface;

generating a signature or checksum based on one or more header fields in the packets;

based on a subset of one or more digits of the signature or checksum, processing packets having a particular value of the subset of one or more digits;

cycling between all of the combinations of values of the subset in turn.

24. A method according to claim 23, wherein the header fields used in the checksum include at least source and destination IP address.

25. A method of monitoring data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

analysing the packets to determine whether the flows are of a target type and therefore of interest;

extracting the source IP address from flows determined to be of interest;

forwarding packets for monitoring according to whether they contain the extracted source IP address;

performing further monitoring on the forwarded packets.

26. A method according to claim 25, wherein the analysing comprises performing variable position keyword or signature matching in the packets of data to determine the presence of an application protocol of interest and/or generating statistics for the packets in the individual flows.

27. A method of processing data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

performing variable position keyword and/or signature matching in the packets of data indicative of the presence of an application protocol of interest or application protocol event of interest;

generating statistics for a plurality of packets in a particular flow;

comparing the statistics for the flow with the expected values for a flow of a known type and determining a degree of match;

determining a flow is of interest based on the degree of match in the statistics together with the results of the keyword and/or signature matching;

extracting flow information for the flow of interest;

based on the flow information, causing packets belonging to the flow of interest or related flows to be processed differently from packets of other flows.

28. Apparatus for processing data in a network, the apparatus comprising:

a network interface adapted to receive a stream of data packets representing different flows between different endpoints in the network;

a Field Programmable Gate Array configured to perform variable position keyword and/or signature matching in the data packets indicative of the presence of an application protocol of interest or application protocol event of interest;

a memory storing software and a processor arranged to execute the software, the processor when executing the software being configured to:

receive the results of the keyword or signature matching;

determine based on the results that a flow is of interest and extract flow information for the flow;

based on the flow information, cause packets belonging to a flow of interest or related flows to be processed differently from packets of other flows.

29. Apparatus for load balancing in processing packetised data, apparatus comprising:

an interface configured to receive packetised data;

a Field Programmable Gate Array configured to:

generate a signature or checksum based on one or more header fields in the packets;

based on a subset of one or more digits of the signature or checksum, process packets having a particular value of the subset of one or more digits;

cycle between all of the combinations of values of the subset in turn.

30. Apparatus for monitoring data in a network, the apparatus comprising:

a network interface configured to receive a stream of data packets representing different flows between different endpoints in the network;

an analyser configured to analyse the packets to determine whether the flows are of a target type and therefore of interest and extract the source IP address from flows determined to be of interest;

a forwarder configured to forward packets for monitoring according to whether they contain the extracted source IP address.

31. Apparatus for processing data in a network, the apparatus comprising:

a network interface configured to receive a stream of data packets representing different flows between different endpoints in the network;

an analyser configured to:

perform variable position keyword and/or signature matching in the packets of data indicative of the presence of an application protocol of interest or application protocol event of interest;

generate statistics for a plurality of packets in a particular flow;

compare the statistics for the flow with the expected values for a flow of a known type and determine a degree of match;

determine a flow is of interest based on the degree of match in the statistics together with the results of the keyword and/or signature matching;

extract flow information for the flow of interest;

based on the flow information, cause packets belonging to the flow of interest or related flows to be processed differently from packets of other flows.

32. Apparatus arranged to perform the method of any of claims 1 to 27.

Description:

Methods and Apparatus for Processing Data in a Network

The present invention relates to methods of processing data in a network, a method of load balancing in a field programmable gate array in processing packetised data, a method of monitoring data in a network and related apparatus. Preferred embodiments relate generally to methods and apparatus for identifying and selecting for further processing data traffic types carried in a packet data network using a field programmable gate array.

The growth in internet traffic caused by increasing use of high bandwidth services including video causes problems for systems that need to monitor, filter or process traffic within the network before it reaches the end user or consumer of the data. As background, according to the Cisco® Visual Networking Index (VNI) published Jun 10, 2014:

Annual global IP traffic will surpass the zettabyte (1000 exabytes) threshold in 2016. Global IP traffic will reach 1.1 zettabytes per year or 91.3 exabytes (one billion gigabytes) per month in 2016. By 2018, global IP traffic will reach 1.6 zettabytes per year.

Global IP traffic has increased more than fivefold in the past 5 years, and will increase threefold over the next 5 years. Overall, IP traffic will grow at a compound annual growth rate (CAGR) of 21 percent from 2013 to 2018.

Content delivery networks will carry over half of Internet traffic by 2018. Fifty-five percent of all Internet traffic will cross content delivery networks by 2018 globally, up from 36 percent in 2013. Global IP traffic will reach 1.1 zettabytes per year.

Traffic from wireless and mobile devices will exceed traffic from wired devices by 2018. By 2018, wired devices will account for 39 percent of IP traffic, while Wi-Fi and mobile devices will account for 61 percent of IP traffic. In 2013, wired devices accounted for the majority of IP traffic at 56 percent. Global Internet traffic in 2018 will be equivalent to 64 times the volume of the entire global Internet in 2005. Globally, Internet traffic will reach 14 gigabytes (GB) per capita by 2018, up from 5 GB per capita in 2013.

Video Highlights

It would take an individual over 5 million years to watch the amount of video that will cross global IP networks each month in 2018. Every second, nearly a million minutes of video content will cross the network by 2018.

Globally, IP video traffic will be 79 percent of all consumer Internet traffic in 2018, up from 66 percent in 2013. This percentage does not include video exchanged through peer-to-peer (P2P) filesharing. The sum of all forms of video (TV, video on demand [VoD], Internet, and P2P) will be in the range of 80 to 90 percent of global consumer traffic by 2018.

Internet video to TV doubled in 2013. Internet video to TV will continue to grow at a rapid pace, increasing fourfold by 2018. Internet video to TV traffic will be 14 percent of consumer Internet video traffic by 2018, up from 11 percent in 2013. Consumer VoD traffic will double by 2018. The amount of VoD traffic by 2018 will be equivalent to 6 billion DVDs per month.

Content delivery network traffic will deliver over half of all internet video traffic by 2018. By 2018, 67 percent of all Internet video traffic will cross content delivery networks, up from 53 percent in 2013.

For systems that need to monitor, route or process IP traffic in the network, it is becoming increasingly difficult and commercially unsustainable to continue to purchase additional capacity in both hardware and software.

One solution is to identify traffic types that are of high value to an end user or a network operator and then treat each differently. For example, traffic can be identified by user so that all the traffic for a specific individual is identified and grouped together for treatment, or traffic for a particular service is identified and grouped together, for example high definition paid-for video content for a number of users. Treatment of the traffic can include, but is not limited to, (i) routing to specific processing systems for analysis or (ii) discarding. Discarding certain unwanted traffic types reduces the traffic volume that is routed to monitoring and analysis tools. By routing certain traffic types and discarding others, the monitoring and analysis hardware and software can be optimised and reduced.

Since video is forecast to utilise the largest bandwidth, for systems that need to process video separately there is a need to identify video streams quickly for separate treatment.

One example is in lawful intercept and intelligence gathering, where authorised agencies are interested in communications content between end users of interest. In such a scenario, video on demand is not of interest as this is generally fixed content that does not change between individuals. In this scenario, the video content is identified and then discarded.

Another example is quality assurance of the delivery of high definition paid for video content, where it is important for a content provider and network operator to understand if the quality of the delivered video stream meets the required paid for quality standard. In this scenario the video content is identified and then routed to quality analysis tools.

However, individuals may use video to communicate. Thus, identifying video content alone may not be sufficient. It may be necessary to distinguish between video content served as a video on demand service and video content served as a peer to peer video communication (such as a peer to peer video chat, in application video chat etc.).

A content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centres across the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance. CDNs serve a large fraction of the Internet content today, including web objects (text, graphics and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social networks.

Content providers such as media companies and e-commerce vendors pay CDN operators to deliver their content to their audience of end-users. In turn, a CDN pays ISPs, carriers, and network operators for hosting its servers in their data centres.

Identifying domain names of well-known content providers (for example netflix.com, amazonprime.com, hulu.com ... ) is insufficient to identify video streams, since the use of CDNs means that the actual source of the video data is likely to be on a server that may not be easily identifiable with known VoD provider identities.

Accordingly, there is a need for improved systems to monitor, filter or process traffic at the high speed data rates encountered in today's networks, where existing techniques are becoming increasingly inadequate.

According to a first aspect of the present invention, there is provided a method of processing data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

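with a Field Programmable Gate Array, performing variable position keyword and/or signature matching in the data packets indicative of the presence of an application protocol of interest or application protocol event of interest;

communicating the results of the keyword or signature matching to software;

determining based on the results that a flow is of interest and extracting flow information for the flow;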

based on the flow information, causing with the software packets belonging to a flow of interest or related flows to be processed differently from packets of other flows.

The method allows a Field Programmable Gate Array to carry out the variable position pattern matching to identify flows of interest. A flow indicates one or more packets of data that exist within a communications session between two uniquely identifiable end points. Searching for variable position patterns in received data packets is very processor intensive in a software-only implementation. Such implementations are currently incapable of processing packet data streams in excess of 10Gbps. Variable position pattern detection is implemented in the FPGA to detect application protocols, protocol events and the like. In embodiments, as described below, this may be combined with generation of statistics of a communications session implemented in the FPGA. Both may be combined to determine that a communications session is probably carrying certain content data types including but not limited to adaptive video and/or audio bit stream.

It will be appreciated that not all packets in a particular flow need be analysed to determine that the flow is of interest. Once sufficient packets have been analysed to determine that a flow is of interest, i.e. relates to a particular data type, the flow can be marked as being of interest and thereafter all packets in that flow or related flows can be treated accordingly. Implementing these techniques in the FPGA allows identification of such content types in data streams running at 100Gbps or greater. At the same time, the software can provide more detailed analysis and decision making based on the functionality implemented in the FPGA.

The method can be used to identify flows of interest, i.e. flows containing user defined communications types for further treatment. Such types include but are not limited to non-personal communications streamed video content, also known as video on demand or broadcast video. Other types that may be identified and processed include but are not limited to other streaming media such as voice, peer-to-peer traffic and botnet traffic.

Flows of interest or related flows are given further treatment. Related flows are flows that share some attribute with the identified flow of interest, such as sharing a source IP address or destination IP address. Further treatment may be exclusion from monitoring for the purposes of lawful interception and intelligence gathering. Other treatment methods may be used such as but not limited to routing and content processing, used, for example, by quality of experience monitoring systems.

In an embodiment, processing differently comprises discarding packets or routing packets belonging to flows of interest or related flows. This functionality can be performed by a separate router configured by the software or by a filter of the FPGA.

In an embodiment, the method comprises extracting the source IP address for flows of interest and processing differently packets having that source IP address, wherein related flows are flows originating from that source IP address. Thus a determination can be made that a flow is of interest, e.g. of a particular type such as video on demand, and that any other flow originating from the same source can be assumed to be of a similar type and given similar treatment.

In an embodiment, the keywords or signatures used in matching include one or any combination of:

a specific application protocol at the beginning of a packet payload;

application protocol header fields;

parameters in the uniform resource indicator field;

a fully qualified host name or referrer field;

a host name address mapped to a known organisation.

The matching can simply check to see if fields are present in the packet or values or parameters in the fields can be extracted for further processing, e.g. further processing by the software, as part of confirming that the flow is of interest. For example, the matching might confirm the presence of keywords expected to be found in an HTTP GET request, which can immediately reduce the number of flows that need to be given further analysis.
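
By way of illustration only, the variable position matching described above can be modelled in software as follows. This is a minimal Python sketch of the matching logic, not the FPGA implementation; the keyword list and the function names `match_keywords` and `extract_field` are assumptions introduced for the example.

```python
from typing import Dict, Optional, Sequence

# Illustrative keywords only (assumption); a real system would use its own list.
KEYWORDS = (b"GET ", b"Host:", b"Referer:")


def match_keywords(payload: bytes, keywords: Sequence[bytes] = KEYWORDS) -> Dict[bytes, int]:
    """Return {keyword: offset} for every keyword found at any position."""
    hits = {}
    for keyword in keywords:
        offset = payload.find(keyword)  # -1 when the keyword is absent
        if offset >= 0:
            hits[keyword] = offset
    return hits


def extract_field(payload: bytes, name: bytes) -> Optional[bytes]:
    """Extract the value that follows a field name, up to the end of the line."""
    start = payload.find(name)
    if start < 0:
        return None
    start += len(name)
    end = payload.find(b"\r\n", start)
    if end < 0:
        end = len(payload)
    return payload[start:end].strip()


payload = b"GET /seg1.ts HTTP/1.1\r\nHost: cdn.example.net\r\n\r\n"
hits = match_keywords(payload)           # presence check only
host = extract_field(payload, b"Host:")  # value passed on for further processing
```

The first function corresponds to simply checking that fields are present; the second corresponds to extracting a value for further processing by the software.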

In an embodiment, the method comprises generating statistics for a plurality of packets in a particular flow;

comparing the statistics for the flow with the expected values for a flow of a known type and determining a degree of match;

determining a flow is of interest based on the degree of match in the statistics together with the results of the keyword and/or signature matching.

In an embodiment, the statistics comprise one or more of:

packet size;

separation of packets;

separation or existence of multiple segments.

Statistics can be generated, for example, by observing a flow of a known type and averaging the various metrics over a number of packets and/or flows to obtain expected values to be compared with newly observed flows of unknown types. Statistics can look at average values or variance in values. For example, video on demand might have packets of a predetermined size and separation with small variance and be delivered in multiple segments from a source A to a destination B. This can be coupled with keyword matching, e.g. multiple HTTP GET requests with B as the source and A as the destination, indicative of requests for segments of video content, in determining that a flow between A and B is of interest.
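
The comparison of flow statistics with expected values can be sketched as follows. This is a hypothetical Python model: the metric names, the 25 percent tolerance, the 0.75 threshold and the scoring rule (fraction of metrics within tolerance) are all assumptions chosen for illustration and are not taken from the specification.

```python
from statistics import mean, pstdev


def flow_statistics(sizes, timestamps):
    """Summarise packet size and packet separation for one flow."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "mean_size": mean(sizes),
        "size_stdev": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "gap_stdev": pstdev(gaps) if gaps else 0.0,
    }


def degree_of_match(observed, expected, tolerance=0.25):
    """Fraction of expected metrics that the observed flow matches.

    A metric matches when it lies within a relative tolerance of the
    expected value (assumed scoring rule).
    """
    matched = 0
    for name, target in expected.items():
        allowed = tolerance * abs(target) if target else tolerance
        if abs(observed[name] - target) <= allowed:
            matched += 1
    return matched / len(expected)


def is_of_interest(keyword_hits, score, threshold=0.75):
    """Combine keyword matching results with the statistical degree of match."""
    return keyword_hits > 0 and score >= threshold


# Expected values as might be learned from a flow of a known type (illustrative).
expected_vod = {"mean_size": 1400, "mean_gap": 0.01}
stats = flow_statistics([1400] * 5, [0.00, 0.01, 0.02, 0.03, 0.04])
score = degree_of_match(stats, expected_vod)
```

Requiring both a keyword hit and a sufficient score mirrors the combination of keyword matching and statistics described above; other combination rules are equally possible.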

In an embodiment, a FPGA generates some or all of the statistics and passes the results to the software to determine a flow is of interest. Thus, the FPGA can be delegated the more computationally demanding tasks by the software and the software can analyse the results, such as comparing the statistics with the statistics for flows of known types, to find the degree of match.

In an embodiment, the method comprises creating with the FPGA a checksum or signature of the packet header fields of incoming packets indicative of the identity of the flow, the FPGA creating a flow record for each newly observed checksum or signature to store statistics for each flow. This provides a convenient way of identifying flows within the FPGA when gathering statistics for each flow.
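
As an illustrative software model of this bookkeeping, the sketch below derives a flow identifier from packet header fields and keeps one record per newly observed identifier. The use of CRC-32 over a five-field tuple, and the names `FlowRecord`, `flow_checksum` and `update_flow`, are assumptions; the specification does not prescribe a particular checksum or record layout.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Per-flow statistics store (fields are illustrative)."""
    packets: int = 0
    total_bytes: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


def flow_checksum(src_ip, dst_ip, src_port, dst_port, protocol):
    """Checksum of header fields identifying the flow (CRC-32 is an assumption)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key)


def update_flow(table, header, size, now):
    """Create a record for a newly observed checksum, then update its statistics."""
    checksum = flow_checksum(*header)
    record = table.get(checksum)
    if record is None:
        record = FlowRecord(first_seen=now)
        table[checksum] = record
    record.packets += 1
    record.total_bytes += size
    record.last_seen = now
    return checksum


table = {}
header = ("10.0.0.1", "192.0.2.7", 40000, 80, "tcp")
first = update_flow(table, header, 1400, now=0.00)
second = update_flow(table, header, 1400, now=0.01)
```

Note that any fixed-width checksum can collide, so two distinct flows may occasionally share a record in a model like this one.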

In an embodiment, the method comprises sending the flow record to the software when a predetermined number of packets in a flow have been processed or after a predetermined time period has elapsed. Thus, only a subset of packets in a flow need be analysed in generating the statistics.

In an embodiment, a heuristic process is used on the results of the keyword or signature matching and statistics matching to determine whether a flow is of interest.

In an embodiment, the method comprises applying predetermined rules defining how flows or related flows are differently processed based on whether or not a flow is of interest. The rule can be to drop packets from further processing or route packets to an entity or process.

In an embodiment, flows of interest are streamed video and/or audio.
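
The condition for sending a flow record to the software, namely a predetermined packet count or a predetermined elapsed time, can be sketched as below. The thresholds (32 packets, 2 seconds), the tuple layout and the function names are hypothetical values chosen for the example.

```python
def should_export(packets_seen, first_seen, now, max_packets=32, max_age=2.0):
    """True when the packet count or the age of the record reaches its limit."""
    return packets_seen >= max_packets or (now - first_seen) >= max_age


def collect_exports(table, now, max_packets=32, max_age=2.0):
    """Remove and return the records that are due to be sent to the software.

    `table` maps a flow identifier to a (packets_seen, first_seen) tuple.
    """
    due = {
        flow_id: record
        for flow_id, record in table.items()
        if should_export(record[0], record[1], now, max_packets, max_age)
    }
    for flow_id in due:
        del table[flow_id]
    return due


records = {1: (32, 9.9), 2: (3, 9.5), 3: (3, 1.0)}
exported = collect_exports(records, now=10.0)
```

Here flow 1 is exported because it reached the packet limit, flow 3 because it reached the age limit, and flow 2 remains in the table.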

In an embodiment, the method comprises making flows of interest available to a human operator to confirm that the flow is of interest. For instance, a filter of the FPGA can be programmed by the software to extract packets for a particular flow and to present these to a human user. The human user can confirm to the software that the flow is of the target type. In the case of a video/audio stream, this can be suitably decoded and displayed for the user to confirm that the stream is of interest.

In an embodiment, the method comprises the FPGA passing protocol events or packets belonging to flows of interest to the software, the software determining the presence of multiple segments in the same flow and/or the gap between segments in the same flow and including the determination in the process of determining whether a flow is of interest. This can help the software gather further information about possible flows of interest that have been identified by the pattern/signature matching stage, for example specific events such as the start of a user browsing session, which the software can use in confirming whether the flow is of interest.

In an embodiment, the FPGA comprises at least a first filter arranged to perform said variable position keyword and/or signature matching and a second filter programmable by the software, the software, having determined a flow of interest, dynamically programming the second filter of the FPGA to filter packets relating to that flow or having the same source IP address as that flow.

In an embodiment, the second filter additionally filters packets containing keywords or signatures corresponding to protocol events of interest in the flows of interest.

In an embodiment, the second filter results are returned to the software.

In an embodiment, the method comprises the FPGA delaying packets between the first and second filters. This allows for latency and delays in the software in determining a flow is of interest and programming the second filter in response so packets are not missed.
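
The effect of the delay can be illustrated with a simple software model of a delay line placed between the two filters. In hardware this would typically be a FIFO buffer; the Python class below, its name `DelayLine` and the 0.5 second delay are assumptions made purely for illustration.

```python
from collections import deque


class DelayLine:
    """Hold packets for a fixed time before releasing them to the second filter."""

    def __init__(self, delay):
        self.delay = delay
        self._queue = deque()  # (arrival_time, packet), oldest first

    def push(self, packet, now):
        self._queue.append((now, packet))

    def pop_ready(self, now):
        """Release every packet that has been held for at least `delay`."""
        ready = []
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft()[1])
        return ready


line = DelayLine(delay=0.5)
second_filter_sources = set()

line.push({"src_ip": "203.0.113.9"}, now=0.0)  # packet leaves the first filter
early = line.pop_ready(now=0.3)                # still held in the delay line
second_filter_sources.add("203.0.113.9")       # software programs the second filter
released = line.pop_ready(now=0.5)
caught = [p for p in released if p["src_ip"] in second_filter_sources]
```

Because the packet is released only after the second filter has been programmed, it is still caught even though the software decision arrived after the packet did.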

In an embodiment, the method comprises, in determining whether a flow is of interest, processing only a subset of flows at any one time and periodically cycling between the subsets so that all flows are processed in turn. This can be useful at high data rates by processing only a proportion of packets at any time in determining flows of interest. In effect, the incoming flows of packets are sampled so that a complete picture of the flows is built up over time. The further processing of flows of interest and related flows can operate on the complete, i.e. non-sampled, stream of packets based on the determination of flows of interest which is based on a sampled set of flows.

In an embodiment, the method comprises creating with the FPGA a checksum or signature of the packet header fields of incoming packets indicative of the identity of the flow, and based on a subset of one or more digits of the checksum or signature, processing packets having a particular value of the subset of one or more digits and periodically cycling between all of the combinations of values of the subset so that each flow is processed in turn.

According to a second aspect of the present invention, there is provided a method of load balancing in a field programmable gate array in processing packetised data, the method comprising:

receiving packetised data over an interface;

generating a signature or checksum based on one or more header fields in the packets;

based on a subset of one or more digits of the signature or checksum, processing packets having a particular value of the subset of one or more digits;

cycling between all of the combinations of values of the subset in turn.

This provides a way of processing packets at high data rates to build up a picture of the overall flows present in the stream over time. In an embodiment, the header fields used in the signature or checksum include at least source and destination IP address. The signature or checksum may also include source and destination port and protocol.
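
A minimal Python sketch of this sampling scheme follows. It assumes a CRC-32 signature over the source and destination IP addresses and treats the two low-order binary digits of the signature as the subset; both choices, and the function names, are illustrative assumptions rather than details taken from the specification.

```python
import zlib


def flow_signature(src_ip, dst_ip):
    """Signature over header fields (CRC-32 is an assumed choice)."""
    return zlib.crc32(f"{src_ip}|{dst_ip}".encode())


def selected(signature, active_value, subset_bits=2):
    """True when the low-order digits of the signature equal the active value."""
    mask = (1 << subset_bits) - 1
    return (signature & mask) == active_value


def next_value(active_value, subset_bits=2):
    """Advance to the next combination of subset digits, wrapping around."""
    return (active_value + 1) % (1 << subset_bits)


signatures = [flow_signature(f"10.0.0.{i}", "192.0.2.1") for i in range(20)]
processed = set()
value = 0
for _ in range(4):  # one full cycle over the 2-bit subset
    processed.update(s for s in signatures if selected(s, value))
    value = next_value(value)
```

With a 2-bit subset roughly a quarter of the flows are processed at any one time, and after one full cycle every flow has been selected exactly once.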

According to a third aspect of the present invention, there is provided a method of monitoring data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

analysing the packets to determine whether the flows are of a target type and therefore of interest;

extracting the source IP address from flows determined to be of interest;

forwarding packets for monitoring according to whether they contain the extracted source IP address;

performing further monitoring on the forwarded packets.

This provides a way of excluding packets belonging to certain types of flow from further monitoring, which can greatly reduce the processing demand. This is particularly important when monitoring packets at high data rates, where it may be impractical to monitor every packet. This aspect applies a pre-processing step by which certain flows of a target type are eliminated before the monitoring process. This recognises that certain servers may exclusively be the source of flows of a target content type and that therefore none of the flows from such a server are of interest for further monitoring. For instance, video on demand streaming content servers may be excluded from monitoring as the content is unlikely to contain personal communications between two users and therefore is of no interest for intelligence monitoring.

In an embodiment, the analysing comprises performing variable position keyword or signature matching in the packets of data to determine the presence of an application protocol of interest and/or generating statistics for the packets in the individual flows.

According to a fourth aspect of the present invention, there is provided a method of processing data in a network, the method comprising:

receiving a stream of data packets over a network interface representing different flows between different endpoints in the network;

performing variable position keyword and/or signature matching in the data packets indicative of the presence of an application protocol of interest or application protocol event of interest;

generating statistics for a plurality of packets in a particular flow;

comparing the statistics for the flow with the expected values for a flow of a known type and determining a degree of match;

based on the flow information, causing packets belonging to the flow of interest or related flows to be processed differently from packets of other flows.

This aspect combines variable position keyword matching, which finds application protocols and events of interest indicative of flows of interest, with the collection of statistics relating to known flow types of interest. Combining the techniques gives a better determination of whether or not a flow is of interest and should be treated differently. Advantageously, an FPGA can perform many of the highly intensive computational steps, such as the variable position matching and some of the statistics generation, the results of which can be passed to software to apply more sophisticated analysis of flows of potential interest and to make a determination of whether the flow is of interest.

According to a fifth aspect of the present invention, there is provided apparatus for carrying out any of the methods described above.

According to a sixth aspect of the present invention, there is provided apparatus for processing data in a network, the apparatus comprising:

a network interface adapted to receive a stream of data packets representing different flows between different endpoints in the network;

receive the results of the keyword or signature matching;

determine based on the results that a flow is of interest and extract flow information for the flow;

based on the flow information, cause packets belonging to a flow of interest or related flows to be processed differently from packets of other flows.

According to a seventh aspect of the present invention, there is provided apparatus comprising:

an interface configured to receive packetised data;

a Field Programmable Gate Array configured to:

generate a signature or checksum based on one or more header fields in the packets;

based on a subset of one or more digits of the signature or checksum, process packets having a particular value of the subset of one or more digits;

cycle between all of the combinations of values of the subset in turn.

According to an eighth aspect of the present invention, there is provided apparatus comprising:

a network interface configured to receive a stream of data packets representing different flows between different endpoints in the network;

an analyser configured to analyse the packets to determine whether the flows are of a target type and therefore of interest and extract the source IP address from flows determined to be of interest;

a forwarder configured to forward packets for monitoring according to whether they contain the extracted source IP address.

According to a ninth aspect of the present invention, there is provided apparatus comprising a network interface configured to receive a stream of data packets representing different flows between different endpoints in the network;

an analyser configured to:

perform variable position keyword and/or signature matching in the packets of data indicative of the presence of an application protocol of interest or application protocol event of interest;

generate statistics for a plurality of packets in a particular flow;

compare the statistics for the flow with the expected values for a flow of a known type and determine a degree of match;

determine a flow is of interest based on the degree of match in the statistics together with the results of the keyword and/or signature matching;

extract flow information for the flow of interest;

based on the flow information, cause packets belonging to the flow of interest or related flows to be processed differently from packets of other flows.

It will be appreciated that any features expressed herein as being provided "in one example" or "in an embodiment" or as being "preferable" may be provided in combination with any one or more other such features together with any one or more of the aspects of the present invention.

Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:

Figure 1 shows the operation of a typical Video on Demand system;

Figure 1a shows an example of apparatus for processing data on a packet switched network according to an embodiment of the present invention;

Figure 2 shows schematically an example of keyword matching according to an embodiment of the present invention;

Figure 3 shows schematically an example of a load balancer according to an embodiment of the present invention;

Figure 4 shows schematically an example of statistics being generated in accordance with an embodiment of the present invention;

Figure 5 shows schematically an example of apparatus for processing data on a packet switched network according to an embodiment of the present invention.

Reference is now made to Figure 1 which shows a typical sequence of steps in the delivery of video content to a video client via a Content Distribution Network (CDN).

1. Registration and payment

A subscriber 100 to a Video on Demand (VoD) service registers an account and provides payment information to a server located in a VoD Data Centre 101. Once authorised, the subscriber 100 will be given access to lists of available content. Registration, payment and browsing lists of hosted content are not bandwidth intensive. It is therefore of little interest to identify such activity.

2. Selection of video playback

For some VoD services, when a title is selected for play the first operation is to download a soft (software-based) player appropriate to the viewing device (phone, tablet, PC, set-top box, TV, ...) from a VoD Control Server 102. Although downloaded for each different title played, the soft player is small, so this is a low bandwidth activity compared to streaming and viewing actual video content and is therefore not of interest to be identified.

3. Authentication

The user is then authenticated. Authentication takes place between the subscriber 100 and VoD Control Server 102 to ensure that the subscriber is permitted to access the requested content.

4. Manifest file fetch

After authentication, the player on the subscriber's device 100 fetches a manifest file from the control server 102. This manifest file contains information telling the video client 100 where to fetch the video content. Again, this is a low bandwidth activity and of little interest to most applications.

Steps 3 and 4 are delivered over encrypted sessions, for example SSL 103. This makes extracting useful information to be used to identify video streams difficult and time consuming, especially at very high data rates, for example at 100Gbps and higher.

5. Video streaming starts according to the manifest file

Video streaming is controlled by instructions in the manifest file that the video client 100 downloads. The manifest file provides the player with information to conduct adaptive video streaming from a content server 105. Manifest files are client-specific and are generated according to each client's playback capability. For instance, if the user player indicates it is capable of rendering h.264 encoded video, h.264 format video is included in the manifest file. If the player indicates that it can only play back .wmv format, only .wmv format video is included.

The manifest file contains several key pieces of information including the list and priority/rank of the available CDN content servers, location of trickplay data, video/audio chunk URLs for multiple quality levels, and timing parameters such as time-out interval and polling interval.

6. Audio and video chunk downloading

Audio and video contents are downloaded in chunks (also known as segments). Download sessions are more frequent at the beginning so as to build up the player buffer. Once the buffer is sufficiently filled, downloads become periodic. The manifest file contains multiple audio and video quality levels. For each quality level, it contains the URLs for individual CDNs.

7. Trickplay

Most players support pause, rewind, forward and random seek, collectively referred to as "trickplay". This is achieved by downloading a set of thumbnail images for periodic snapshots. The thumbnail resolution, pixel aspect, trickplay interval, and CDN from where to download the trickplay file are described in the manifest file.

8. Dynamic Adaptive Streaming

Most players support fetching and playing video and audio content at variable quality and bit rate allowing continuous viewing as network conditions (latency, congestion, available bandwidth) change. Examples include DASH (Dynamic Adaptive Streaming over HTTP) and Apple's HTTP Live Streaming (HLS) solution.

Adaptive streaming works by breaking the content into a sequence of small HTTP-based file segments 104, each segment containing a short interval of playback time of content that is potentially many hours in duration, such as a movie or the live broadcast of a sports event. The content is typically made available at a variety of different bit rates. As the content is played back, the client automatically selects from the alternatives the next segment to download and play back based on current network conditions. The client selects the segment with the highest bit rate possible that can be downloaded in time for play back without causing stalls or rebuffering events in the playback. Thus, an adaptive bit rate client can seamlessly adapt to changing network conditions, and provide high quality play back without stalls or rebuffering events.

In the following text, "flow" indicates one or more packets of data that exist within a communications session between two uniquely identifiable end points, each end point having an IP address.

Figure 1a shows an example of an apparatus 10 for processing packetised data in a network according to an embodiment of the present invention. The apparatus 10 has an interface 11 for monitoring or "tapping" a node or interface of a communication network 50 carrying communication sessions in streams of 100Gbps or higher that may require treatment, for example for routing or for discarding.

One such traffic type may be non-personal video streaming (as shown in Figure 1). This content type is delivered in a communication session between two or more end points. The video stream may be delivered through dynamic adaptive bit rate streaming from a content delivery network.

However, it will be appreciated that the techniques disclosed are not limited to being used with video data. Other types that may be identified and processed include but are not limited to other streaming media such as voice, audio, peer-to-peer traffic and/or botnet traffic.

The apparatus 10 comprises a Field Programmable Gate Array (FPGA) 12 and a software component 13 stored in memory and executed by a processor. The FPGA and software component communicate with each other and cooperate to perform the various techniques described herein. For instance, the FPGA 12 may be provided together with the interface 11 on a PC card for plugging into a local computer bus (e.g. PCI) of a computer (e.g. a PC), comprising a processor, memory, storage, input and display means, etc., which is programmed with appropriate software 13.

The FPGA 12 performs variable position keyword matching 17 within packet data to detect protocol events, such as HTTP GET, 200 OK, etc. Such keyword matching is implemented in FPGA in order to process data at 100Gbps or higher.

The FPGA 12 also performs generation of statistics 19 calculated from the observed characteristics of a number of sequential packets in the same flow. In the case of video streaming this is the flow of packets between two end points carrying one or more video chunks/segments, associated together using packet headers and transport protocol descriptors. Statistics include but are not limited to packet size, inter-packet gap and inter-chunk gap. Such statistics generation is implemented in FPGA in order to process data at 100Gbps or higher.

In embodiments, both keyword matching 17 and statistics generation 19 are performed. However, as will be apparent from the following disclosure, either of these techniques can be used individually.

Optionally, the FPGA 12 also performs load balancing 15 on packets received over the interface 11 before they are further processed to determine flows of interest. This may comprise use of a CRC or hashing algorithm on packet header information elements, including but not limited to source IP address, destination IP address, source port, destination port and protocol, to quickly associate packets with a particular communications session (flow) at 100Gbps. The generated CRC or hash, or a sub-portion of the CRC or hash, may then be used to select only some of the flows to be routed to either statistics generation or keyword matching.

The results of the keyword detection 17 and/or statistics generation 19 steps are passed to the software component 13 for analysis and to determine flows of interest. Different detection criteria can be combined in a heuristic manner in order to generate a probability that the communication to an endpoint is the target content type, for example delivery of video and/or audio content to a player.

In one example, keyword matching may identify individual DASH content chunks. Multiple content chunks delivered to a single address may indicate use of an adaptive bit rate streaming method such as DASH and may therefore indicate that the associated stream is video on demand.

Optionally, once a stream is identified as a target content type, for example an adaptive video stream, a passive monitoring function 23 can provide the capability for presenting the content for inspection, e.g. on a display in the case of video content, in order for a human administrator to confirm whether the identified communication stream is valid or not.

If it is confirmed that the identified stream does contain the target content (for example video on demand), flow information such as the source IP address is extracted.

That source IP address is then marked as a source for the target content. All future communication sessions from that IP address are considered as being the same, e.g. VoD, and treated according to pre-set rules applied by block 28. Pre-set rules may include but are not limited to discarding packets 25 and routing 27 the packets. Such rules can be applied to the incoming stream of packets by a filter in the FPGA or by a separate router under the control of the software.

Keyword Matching

Figure 2 shows in more detail an arrangement of functional blocks implemented in FPGA 12 for variable keyword matching suitable for use in the apparatus of Figure 1a. The bounds of the functionality implemented in FPGA are shown by the dashed line 207 in Figure 2.

In this arrangement, IP traffic consisting of data packets is connected to the system through a physical interface 201. These packets may be routed through a load balancer 204 that progressively selects only a sub-set of all of the flows for further processing.

Selected flows are then routed to a first, or front end filter 202. This filter is programmed to search for specific keywords and patterns that can identify specific application protocols such as HTTP. For instance, protocol events such as HTTP GET, 200 OK may be identified to indicate HTTP protocol packets. In this way, packets from other protocols, such as FTP, SMTP, etc. can be eliminated from further processing. The application protocol keyword or patterns can potentially be at any position in the packetised data, so variable position matching is preferably used. The keywords and patterns used to match packets against application protocols are stored within the FPGA for rapid access and fast programming. The FPGA scans through each packet received looking for the keywords and signatures at any position. Searching for fixed patterns and keywords at variable positions is very processor intensive if implemented in software only. In this way the FPGA is able to significantly speed up the detection and extraction of protocols.
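By way of illustration only, the behaviour of the front end filter 202 can be modelled in software as below. This is a minimal sketch, not the FPGA implementation: the table `KEYWORDS`, the event names and the functions `match_keywords` and `front_end_filter` are hypothetical names introduced for this example, and a plain byte search stands in for the parallel matching logic of the FPGA.

```python
from typing import List, Tuple

# Hypothetical keyword table. "GET" and "200 OK" are the example HTTP
# protocol events named in the description; the event labels are assumptions.
KEYWORDS = {
    b"GET": "HTTP_GET",
    b"200 OK": "HTTP_200_OK",
}


def match_keywords(payload: bytes) -> List[Tuple[int, str]]:
    """Return (offset, event) for every keyword found at any position."""
    hits = []
    for keyword, event in KEYWORDS.items():
        start = payload.find(keyword)
        while start != -1:
            hits.append((start, event))
            # Continue searching after this match so later positions are found.
            start = payload.find(keyword, start + 1)
    return sorted(hits)


def front_end_filter(payload: bytes) -> bool:
    """Pass a packet on to the host software only if some keyword matched."""
    return bool(match_keywords(payload))
```

In this model a packet carrying no listed keyword is dropped from further processing, which mirrors the way the first filter stage reduces the volume of data delivered to software.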

Any packets that potentially contain an application protocol that may indicate a flow of interest, such as HTTP, are passed to the Host Software 205 outside the FPGA for further processing. Further processing may include further qualifying checks. This might include checking whether the packet contains indications of multiple different protocols, which might indicate a false positive result. For example, where keyword matching has found the keyword "GET", indicating a possible HTTP protocol event, the further processing can examine the packet to determine whether in fact the packet contains an email protocol and the word "GET" happened to be used in the body of an email message. If found, such false positive matches can be eliminated. Alternatively, the further processing may include further, more sophisticated checks on the data to positively confirm the packets are the required application protocol. As will be appreciated, the first filter stage 202 has significantly reduced the amount of data being passed to the software compared with the incoming data rate, meaning that the software has time to implement more sophisticated checks, if desired.

If this step confirms that the packet(s) delivered from the Filters 202 are the required application protocol, the software 205 extracts the addressing parameters from the packet(s) to identify the specific flow, and the second stage Filters 206 in the FPGA are programmed by the software 205 to deliver packets only for the specified flow. This might for example be used to inspect a flow to enable a user to confirm that the flow is correctly classified, i.e. step 23 in Figure 1B. Alternatively, the filters can be used to discard packets within the FPGA where a separate host based router is not used.

All packets may also be routed into a delay buffer 203. This buffer allows for delays and latency in the Host Software 205. Thus, this gives time for the software 205 to confirm flows of interest and program the second stage filter 206 so that all packets in that flow are subject to filtering by the second stage filter 206.
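The interaction between the delay buffer 203 and the second stage filter 206 can be sketched as follows. This is an illustrative software model under stated assumptions: the class `DelayedSecondStage`, its methods and the measurement of delay in packet slots rather than time are all hypothetical simplifications, not details taken from the embodiment.

```python
from collections import deque


class DelayedSecondStage:
    """Hold packets for a fixed number of slots before second stage filtering."""

    def __init__(self, delay_slots: int):
        self.buffer = deque()          # models delay buffer 203
        self.delay_slots = delay_slots
        self.selected_flows = set()    # flows programmed by the host software

    def program_flow(self, flow_key) -> None:
        """Called by the host software once a flow is confirmed as of interest."""
        self.selected_flows.add(flow_key)

    def push(self, flow_key, packet) -> list:
        """Queue a packet and return packets leaving the buffer that pass the filter."""
        self.buffer.append((flow_key, packet))
        delivered = []
        while len(self.buffer) > self.delay_slots:
            key, pkt = self.buffer.popleft()
            if key in self.selected_flows:
                delivered.append(pkt)
        return delivered
```

For example, with a delay of two slots, a packet of flow "A" that arrives before the software has programmed flow "A" is still delivered, provided the programming happens before that packet leaves the buffer.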

When looking for specific application protocol events to identify specific user activity, including but not limited to web browsing, streaming video, or peer-to-peer traffic within a data tunnel, the second stage filters 206 are also programmed with fixed protocol keyword matching so that if any match is identified, the Host Software 205 is notified of a specific application protocol event within the specific flow that was previously identified by Filters 202, e.g. the start of web browsing, the start of a new page, or the start of video streaming. In this way, the software can program the second stage filters 206 to forward all packets in a session or just packets relating to a particular event in a session, e.g. the start of a video playback session.

The front end filters 202 can optionally also be configured to filter specific IP packet header address field values in the packets (this filtering is commonly known as n-tuple) and deliver these packets to the software 205 for further processing. This can be useful if a particular source IP address has been identified as being likely to originate flows of interest. The software 205 can decide whether or not the delivered packets are of interest before programming the second stage filters 206 as appropriate.

In this way the software offloads to the FPGA both fixed and variable position signature and pattern scanning in each packet received, variable position scanning being very processor intensive if implemented in software alone. The FPGA also de-tunnels data in the event that the network traffic being monitored is in a tunnelled network, e.g. a GPRS network. The FPGA also delivers protocol events to the Host Software, allowing pre-programmed significant events, programmed as keywords and signatures in the FPGA, such as start of web browsing, to be delivered to Host Software.

This technique of offloading processing within the FPGA enables significantly higher data rates to be processed by the monitoring system, in excess of the 10Gbps achievable in current software implementations and beyond 100Gbps in current designs.

Keyword Matching Example

As will be appreciated, the matching is handled on a case-by-case basis, depending on the service. The keywords and signatures used will vary by territory and other factors, hence there is no one absolute rule. An example is given below for streaming video matching.

In the case of streaming video, for example the previously mentioned broadcast video or video on demand, the system is only concerned with the content delivery itself. The pre-authentication, authentication and handover steps are of no interest, largely because the final handover is opaque.

To detect the video content delivery streams, a multi-stage evaluation is employed. An example of a first stage is to detect flows that contain an HTTP GET request from the client to the video content server. This can be determined, for example, by matching the following criteria: transport is TCP, destination port is 80, and the HTTP 'GET' keyword is present at the very beginning of the packet's payload. Matches to other HTTP keywords used in HTTP headers are also detected so that components from HTTP headers can be easily extracted and used for subsequent analysis. As an illustration, an example from Netflix (RTM) is shown below:

GET /range/27340475-29035909?o=AQE20iLdXIQWUBPLs5PblEIG2i8KW3j9mRt_mvNEFUhNWZdEzt2b35XvemXyEezzhLKSskt4ESTU7UmhFWWTnflwolmPtFdPYu7TUrpGpq-i3ifCZGIodul__mg&v=3&e=1415741015&t=P09y_sLwG2Dd48_FtjMScGeUrTM&random=232733573 HTTP/1.1

Host: 185.2.223.145

User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:33.0) Gecko/20100101 Firefox/33.0

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Accept-Language: en-US,en;q=0.5

Accept-Encoding: gzip, deflate

Connection: keep-alive

The FPGA filter provides keyword matches on the terms "GET", "HTTP", "Host:", "User-Agent:", "Accept:", "Accept-Language:", "Accept-Encoding:", "Connection:".

The host in this example is a bare IP address, but in some instances a Fully Qualified Domain Name (FQDN) will be present. There may also be an HTTP referrer (e.g. www.netflix.com) present. Different combinations of header information, where present, identified by matching header keywords in FPGA, can be used to make a more confident identification.

The second stage of this process is to look at the URI pattern. With Netflix the following is expected:

1/ The URI follows the regex "/range/[[:digit:]]+-[[:digit:]]+"

2/ The URI contains a base64 encoded parameter 'o' (authentication data)

3/ The URI contains a base64 encoded parameter 't' (authentication token)

4/ The URI contains an integer parameter 'v' (the protocol version)

5/ The URI contains an integer parameter 'e' (expiry timestamp - typically 8 hours advanced from the first HTTP request timestamp)

6/ The URI contains an integer parameter 'random'
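The six checks listed above can be expressed in software along the following lines. This is a minimal sketch only: the function name `looks_like_range_request` is hypothetical, the parameter names are those visible in the example request, and the assumption that the encoded parameters use the URL-safe base64 alphabet is inferred from the characters seen in that example rather than stated in the source.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Path check corresponding to "/range/[[:digit:]]+-[[:digit:]]+".
RANGE_PATH = re.compile(r"^/range/\d+-\d+$")
# Assumption: URL-safe base64 characters with optional '=' padding.
BASE64URL = re.compile(r"^[A-Za-z0-9_-]+=*$")


def looks_like_range_request(uri: str) -> bool:
    """Return True if the URI satisfies all six pattern checks."""
    parts = urlsplit(uri)
    if not RANGE_PATH.match(parts.path):
        return False
    params = parse_qs(parts.query)

    def single(name):
        values = params.get(name, [])
        return values[0] if len(values) == 1 else None

    # Encoded parameters: authentication data and authentication token.
    for name in ("o", "t"):
        value = single(name)
        if value is None or not BASE64URL.match(value):
            return False
    # Integer parameters: protocol version, expiry timestamp, random value.
    for name in ("v", "e", "random"):
        value = single(name)
        if value is None or not (value.isascii() and value.isdigit()):
            return False
    return True
```

The check on the expiry timestamp being roughly 8 hours ahead of the request is omitted here because it needs the capture time of the packet, which this sketch does not model.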

The URI in the example above contains matches for all of the above factors, i.e.

/range/27340475-29035909?o=AQE20iLdXIQWUBPLs5PblEIG2i8KW3j9mRt_mvNEFUhNWZdEzt2b35XvemXyEezzhLKSskt4ESTU7UmhFWWTnflwolmPtFdPYu7TUrpGpq-i3ifCZGIodul__mg&v=3&e=1415741015&t=P09y_sLwG2Dd48_FtjMScGeUrTM&random=232733573

If all of these factors match, then it is possible to say with reasonable confidence that the traffic in the flow being analysed is Netflix streaming video. If the HTTP referrer field is present, then this can confirm the match.

As an additional and optional stage, it may be advantageous to identify the CDN provider (e.g. Level 3, Akamai, Amazon, ...) to allow all traffic from a specific CDN provider to be treated in the same way. In the example above, the information obtained that can be used to identify the CDN provider is an IP address. In some circumstances this may match a known allocated IP address range. In the example above this maps back to ipv4_1.lagg0.c046.lhr004.ix.nflxvideo.net, which is owned by Netflix Corporation. In other cases we may have an FQDN in addition to the IP address. Using both of these factors we can try to identify the CDN provider.

Load Balancing

Figure 3 shows an arrangement of functional elements arranged in the FPGA to optionally load balance flows. A technique may be applied to progressively scan through portions of the 100Gbps data stream in order to sequentially process one part of the 100Gbps at any one time. As each part is processed, the system steps on to the next and so on, so that over a period of time a picture is built up across the complete 100Gbps data stream. This allows parts of the FPGA system that are not able to process the full 100Gbps data rate to operate below 100Gbps, while over a period of time processing traffic from the complete 100Gbps stream in steps. Thus, the stream is in effect sampled when identifying flows of interest, although all packets belonging to those flows are processed.

Data is grouped into data packets 303 that enter the system. Each packet consists of a header part 301 and a payload. The header part contains defined Information Elements (IEs). Examples are shown in Figure 3, including protocol/next header, source address, destination address, source port and destination port.

Some or all of the IEs are extracted from the header, including preferably at least the source and destination address, and fed into a CRC or hash calculating algorithm 302 that produces a binary number, the value of which is dependent on the values of the IEs in the packet header 301. Packets belonging to the same flow contain the same header IEs and will therefore result in the same CRC value as shown in 304. A second processing element 305 selects only certain flows based on the CRC value. In the example shown in Figure 3, the two least significant binary bits are used. Hence for Time = x+1, flows matching xx11 will be selected, in this case Flow B. The selected flows are routed for further processing to 307.

The processing 305 progressively scans through sets of CRC values, and restarts at the beginning. In this way, all flows are processed, but only a sub-set at any one time. This allows the processing element 307 to operate at a lower throughput than the system input, 100Gbps, but fully process all of the flows once every CRC/hash detect value has been cycled through.
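The selection performed by elements 302 and 305 can be illustrated with the following sketch. It is a software model under stated assumptions: CRC-32 from the Python standard library stands in for whichever CRC or hash the FPGA uses, the byte layout of the key is arbitrary, and the mapping from time step to bit pattern (step number modulo the number of groups) is an assumption that need not match the ordering drawn in Figure 3. The function names `flow_crc` and `selected` are hypothetical.

```python
import socket
import struct
import zlib


def flow_crc(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
             protocol: int) -> int:
    """CRC over the header information elements that identify a flow."""
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HHB", src_port, dst_port, protocol))
    return zlib.crc32(key)


def selected(crc: int, time_step: int, bits: int = 2) -> bool:
    """True if the low `bits` bits of the CRC match the group for this step."""
    groups = 1 << bits
    return (crc & (groups - 1)) == (time_step % groups)
```

Because every packet of a flow yields the same CRC, a flow falls into exactly one group, so stepping `time_step` through all group values visits each flow once per cycle.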

As an example implementation, if the algorithm 305 operates on 4 binary digits, this provides 2^4 or 16 groups of flows. Hence 1/16th of the total flows will be processed at any one time. If the next stage processing 307 takes 1s to process all of the flows routed to it, the system will be able to scan through all of the flows in the 100Gbps input in 16s.

Generation of Flow Statistics in FPGA

Figure 4 illustrates techniques for generating statistics on each flow.

Certain functions are implemented in FPGA to process data at above 10Gbps.

The boundary of the FPGA is shown by the dotted line 406. Data packets sent into the system may be routed through Filters 401. This filtering may restrict packets to those relating to application protocols of interest. For instance, in Figure 1a, variable keyword matching can be used to detect flows of interest before statistics are generated on packets in those flows.

Alternatively or additionally, this filtering may perform load balancing as described previously.

A flow record 405 is maintained for each detected flow. Filtered packets are routed through functionality 402 that identifies each flow and checks for new flows. Flows are identified by calculating a hash/CRC on packet header parameters. A table of all detected flows and their hash/CRC and status is maintained as part of the flow records 405. The hash/CRC for each filtered packet is checked against those entries marked as being active as each packet arrives. If a new flow is detected, i.e. there is no active entry matching that hash/CRC value, a new flow record 405 is generated at block 404.

As each packet for a flow is received at block 403, measurements are taken, including but not limited to the packet size and the gap between this packet and the previous packet in that flow. Measurements are written into the flow record 405.

Generally, the FPGA can recognise the start of a packet by identifying the signature pattern of bits signifying the start of the packet. The header fields, from which the hash/CRC is calculated, are at known locations relative to the start of the packet. The size of a packet can be determined by counting the number of bytes from the start of a packet to the start of the next packet. The gap between packets can be the number of bytes and/or the time delay between packets, determined by maintaining a counter or timer value in the flow record for the arrival of the last packet in a flow, which can be compared with the counter or timer value for the arrival of the current packet.

Flow records can be marked as complete, i.e. non-active, either after a fixed number of packets are received or after a timeout. To achieve this, a count of the total number of packets received in a flow is kept in the flow record 405, or a timer is initialised when a new flow is recognised.
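The flow record handling described above can be modelled as in the sketch below. It is illustrative only: the names `FlowRecord` and `FlowTable`, the limits `MAX_PACKETS` and `TIMEOUT_S`, and the choice to check the timeout only when a packet arrives are assumptions made for this example, and the gap is measured here as a time delay rather than a byte count.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_PACKETS = 4    # hypothetical: record is complete after this many packets
TIMEOUT_S = 5.0    # hypothetical: record is complete this long after creation


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    sizes: List[int] = field(default_factory=list)
    gaps: List[float] = field(default_factory=list)
    active: bool = True


class FlowTable:
    def __init__(self):
        self.active: Dict[int, FlowRecord] = {}
        self.complete: List[FlowRecord] = []

    def on_packet(self, flow_hash: int, size: int, now: float) -> None:
        record = self.active.get(flow_hash)
        if record is not None and now - record.first_seen > TIMEOUT_S:
            self._close(flow_hash)      # timed out: start a fresh record
            record = None
        if record is None:
            record = FlowRecord(first_seen=now, last_seen=now)
            self.active[flow_hash] = record
        else:
            record.gaps.append(now - record.last_seen)
        record.sizes.append(size)
        record.last_seen = now
        if len(record.sizes) >= MAX_PACKETS:
            self._close(flow_hash)

    def _close(self, flow_hash: int) -> None:
        record = self.active.pop(flow_hash)
        record.active = False
        self.complete.append(record)
```

Completed records accumulate in `complete`, which corresponds to the records that would be retrieved by the host software.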

Complete flow records are retrieved from the FPGA into the host software 407. A comparison algorithm 409 in the host software can then compare the measured statistics for a flow against those already prepared for known flow types. If a reasonable degree of match is found, it can be said with a degree of confidence that the unidentified flow is of the same type as the known flow. Thresholds of match may be configured in order to increase accuracy.

Combination of Techniques

Figure 5 illustrates an example of how the different techniques may be combined in order to classify a flow from a 100Gbps packet data stream. 100Gbps packet data enters the system at 501. Load balancing may be applied at 502, as described in relation to Figure 3, to progressively scan through the flows in the 100Gbps packet data stream.

All or a subset of the streams, depending on the functionality of 502, may be fed into statistics generation functions 503 and also into keyword matching functions 504.

Any statistics and measurements generated by function 503 can be compared against statistics of known data flow types at function 510. Flows whose statistics match those of a known data flow type within a predetermined tolerance can be determined to be of that type. This can generate both a matching flow type and a degree of confidence or accuracy.
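One possible form of such a comparison is sketched below. It is an assumption for illustration only: the text does not specify the statistics, the distance measure or how confidence is derived. Here each statistic must lie within a relative tolerance of the reference value, and confidence falls linearly from 1 to 0 as the mean relative error approaches the tolerance. The names relative_error and classify are hypothetical.

```python
def relative_error(measured, reference):
    """Relative difference between a measured statistic and its reference value."""
    if reference == 0:
        return 0.0 if measured == 0 else float("inf")
    return abs(measured - reference) / abs(reference)


def classify(measured, profiles, tolerance=0.2):
    """Return (flow_type, confidence), or (None, 0.0) if no profile matches.

    `profiles` maps a known flow type to its reference statistics; every
    statistic must be within `tolerance` for the type to be considered.
    """
    best_type, best_confidence = None, 0.0
    for flow_type, reference in profiles.items():
        errors = [relative_error(measured[name], ref) for name, ref in reference.items()]
        if max(errors) > tolerance:
            continue  # outside the predetermined tolerance for this type
        mean_error = sum(errors) / len(errors)
        confidence = 1.0 - mean_error / tolerance if tolerance > 0 else 1.0
        if best_type is None or confidence > best_confidence:
            best_type, best_confidence = flow_type, confidence
    return best_type, best_confidence
```

Tightening the tolerance reduces false matches at the cost of leaving more flows unclassified.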

The software can also perform analysis 506 relating to multiple chunks or segments that are part of the same flow. The software looks at header information and protocol header fields such as session identifiers to associate different chunks together. Additional processing on protocol detection can look for patterns such as repeated HTTP GETs as used in DASH. As described in relation to Figure 2, the second stage filters 206 can be programmed by the software to forward to the software specific events for a flow of interest, such as GET events, which can be used for this analysis.
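A minimal sketch of this multiple segment analysis follows. It assumes that the events forwarded to the software arrive as (session identifier, method, path) tuples and that a session with several GETs for distinct paths is a candidate for segmented delivery; the event format, the threshold and the names group_get_events and looks_segmented are all hypothetical.

```python
from collections import defaultdict


def group_get_events(events):
    """Associate GET events with their session identifier.

    `events` is an iterable of (session_id, method, path) tuples.
    """
    by_session = defaultdict(list)
    for session_id, method, path in events:
        if method == "GET":
            by_session[session_id].append(path)
    return dict(by_session)


def looks_segmented(paths, min_requests=3):
    """True if a session shows repeated GETs for distinct resources."""
    return len(set(paths)) >= min_requests
```

A real detector would probably also consider request timing and path structure; this sketch only shows the association of chunks by session identifier and the counting of repeated requests.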

One or any combination of keyword matching 504, multiple segment detection 506 and statistics comparison 503 may be used to identify the type of a protocol flow.

Combinational logic 508 can be used to perform the analysis and to determine flows of interest 509. Different detection criteria can be combined in a heuristic manner in order to generate a probability that the communication to an endpoint is the target content type, for example delivery of video and/or audio content to a player.
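One simple heuristic of this kind is a weighted average of per-criterion scores, sketched below. The weights, the threshold and the names combine_scores and is_flow_of_interest are assumptions for illustration; the text does not prescribe a particular combination rule.

```python
def combine_scores(scores, weights):
    """Weighted average of detection scores, each in the range 0.0 to 1.0.

    `scores` maps a criterion name to its score; `weights` maps the same
    names to their relative importance.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * scores[name] for name in scores) / total_weight


def is_flow_of_interest(probability, threshold=0.6):
    """Decide whether the combined probability marks a flow of interest."""
    return probability >= threshold


# Hypothetical weights for the three detection criteria.
WEIGHTS = {"keyword": 0.5, "segments": 0.25, "statistics": 0.25}
```

For example, a flow with a keyword match and detected segments but no statistics match would score 0.75 with these weights, which exceeds the hypothetical threshold of 0.6.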

Embodiments of the present invention have been described with particular reference to the example illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the present invention.
