Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A METHOD OF OBSERVING PACKETS IN A NETWORK AND A NETWORK NODE
Document Type and Number:
WIPO Patent Application WO/2019/020518
Kind Code:
A1
Abstract:
The present invention discloses a method, in a network node, of monitoring packets, the method comprising steps of: a) obtaining a basic key from a packet, the basic key comprising a first number a of parameters charactering the packet, the first number a being a natural number; b) determining a first specification comprising a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d < a; c) generating a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification; d) updating, a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm;

Inventors:
EINZIGER GIL (IL)
WAISBARD EREZ (IL)
Application Number:
PCT/EP2018/069787
Publication Date:
January 31, 2019
Filing Date:
July 20, 2018
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
H04L29/06; H04L12/26
Foreign References:
US7424489B12008-09-09
Other References:
RAN BEN BASAT ET AL: "Constant Time Updates in Hierarchical Heavy Hitters", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 21 July 2017 (2017-07-21), XP080778457, DOI: 10.1145/3098822.3098832
GRAHAM CORMODE ET AL: "Finding hierarchical heavy hitters in streaming data", ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (TKDD), ASSOCIATION FOR COMPUTING MACHINERY, INC, US, vol. 1, no. 4, 2 February 2008 (2008-02-02), pages 1 - 48, XP058336864, ISSN: 1556-4681, DOI: 10.1145/1324172.1324174
M. ALIZADEH; S. YANG; M. SHARIF; S. KATTI; N. MCKEOWN; B. PRABHAKAR; S. SHENKER: "pFabric: Minimal Near-optimal Datacenter Transport", ACM SIGCOMM, 2013, pages 435 - 446
A. R. CURTIS; J. C. MOGUL; J. TOURRILHES; P. YALAGANDULA; P. SHARMA; S. BANERJEE: "DevoFlow: Scaling Flow Management for Highperformance Networks", ACM SIGCOMM, 2011, pages 254 - 265
A. KABBANI; M. ALIZADEH; M. YASUDA; R. PAN; B. PRABHAKAR: "AFQCN: Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers", IEEE HOTI, 2010, pages 58 - 65, XP031757020
T. BENSON; A. ANAND; A. AKELLA; M. ZHANG: "MicroTE: Fine Grained Traffic Engineering for Data Centers", ACM CONEXT, 2011, pages 8
P. GARCIA-TEODORO; J. E. DIAZ-VERDEJO; G. MACIA-FERNANDEZ; E. VAZQUEZ: "Anomaly-Based Network Intrusion Detection: Techniques, Systems and Challenges", COMPUTERS AND SECURITY, 2009, pages 18 - 28, XP025839371, DOI: doi:10.1016/j.cose.2008.08.003
L. YING; R. SRIKANT; X. KANG: "The Power of Slightly More than One Sample in Randomized Load Balancing", IEEE INFOCOM, April 2015 (2015-04-01), pages 1131 - 1139, XP033208380, DOI: doi:10.1109/INFOCOM.2015.7218487
M. ALIZADEH; T. EDSALL; S. DHARMAPURIKAR; R. VAIDYANATHAN; K. CHU; A. FINGERHUT; V. T. LAM; F. MATUS; R. PAN; N. YADAV: "CONGA: Distributed Congestion-aware Load Balancing for Datacenters", ACM SIGCOMM, 2014, pages 503 - 514
G. EINZIGER; R. FRIEDMAN: "TinyLFU: A Highly Efficient Cache Admission Policy", EUROMICRO PDP, 2014, pages 146 - 153, XP032584958, DOI: doi:10.1109/PDP.2014.34
Y. ZHANG; S. SINGH; S. SEN; N. DUFFIELD; C. LUND: "Online Identification of Hierarchical Heavy Hitters: Algorithms, Evaluation, and Applications", ACM IMC, 2004, pages 101 - 114
V. SEKAR; N. DUFFIELD; O. SPATSCHECK; J. VAN DER MERWE; H. ZHANG: "LADS: Large-scale Automated DDOS Detection System", USENIX ATEC, 2006, pages 16 - 16
M. MITZENMACHER; T. STEINKE; J. THALER: "Hierarchical Heavy Hitters with the Space Saving Algorithm", PROCEEDINGS OF THE MEETING ON ALGORITHM ENGINEERING & EXPERMIMENTS, SER. ALENEX, 2012, pages 160 - 174
G. CORMODE; F. KORN; S. MUTHUKRISHNAN; D. SRIVASTAVA: "Finding Hierarchical Heavy Hitters in Streaming Data", ACM TRANS. KNOWL. DISCOV. DATA, vol. 1, no. 4, February 2008 (2008-02-01), XP058336864, DOI: doi:10.1145/1324172.1324174
"Finding Hierarchical Heavy Hitters in Data Streams", VLDB, 2003, pages 464 - 475
L. JOSE; M. YU: "Online measurement of large traffic aggregates on commodity switches", USENIX HOT-ICE, 2011
G. CORMODE; F. KORN; S. MUTHUKRISHNAN; D. SRIVASTAVA: "Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-dimensional Data", SIGMOD, 2004, pages 155 - 166
J. HERSHBERGER; N. SHRIVASTAVA; S. SURI; C. D. T'OTH: "Space Complexity of Hierarchical Heavy Hitters in Multi-dimensional Data Streams", ACM PODS, 2005, pages 338 - 347, XP058233837, DOI: doi:10.1145/1065167.1065211
Y. LIN; H. LIU: "Separator: Sifting Hierarchical Heavy Hitters Accurately from Data Streams", ADMA, SER. ADMA, 2007, pages 170 - 182, XP047296802, DOI: doi:10.1007/978-3-540-73871-8_17
P. TRUONG; F. GUILLEMIN: "Identification of heavyweight address prefix pairs in IP traffic", ITC, September 2009 (2009-09-01), pages 1 - 8, XP031552574
A. METWALLY; D. AGRAWAL; A. E. ABBADI: "Efficient Computation of Frequent and Top-k Elements in Data Streams", ICDT, 2005
S. MUTHUKRISHNAN: "Data Streams: Algorithms and Applications", FOUNDATIONS AND TRENDS IN THEORETICAL COMPUTER SCIENCE, vol. 1, no. 2, 2005, pages 117 - 236
N. BANDI; A. METWALLY; D. AGRAWAL; A. EL ABBADI: "SIGMOD", 2007, ACM, article "Fast data stream algorithms using associative memories", pages: 247 - 256
E. D. DEMAINE; A. L'OPEZ-ORTIZ; J. I. MUNRO: "Frequency estimation of internet packet streams with limited space", EATCS ESA, 2002, pages 348 - 360
R. M. KARP; S. SHENKER; C. H. PAPADIMITRIOU: "A simple algorithm for finding frequent elements in streams and bags", ACM TRANSACTIONS DATABASE SYSTEMS, vol. 28, no. 1, March 2003 (2003-03-01), XP058099071, DOI: doi:10.1145/762471.762473
G. S. MANKU; R. MOTWANI: "Approximate frequency counts over data streams", VLDB, 2002
V. SIVARAMAN; S. NARAYANA; O. ROTTENSTREICH; S. MUTHUKRISHNAN; J. REXFORD: "Smoking out the heavy-hitter flows with hashpipe", CORR, vol. abs/1611, 2016, Retrieved from the Internet
P. BOSSHART; D. DALY; G. GIBB; M. IZZARD; N. MCKEOWN; J. REXFORD; C. SCHLESINGER; D. TALAYCO; A. VAHDAT; G. VARGHESE: "P4: Programming protocol-independent packet processors", SIGCOMM COMPUT. COMMUN. REV., vol. 44, no. 3, July 2014 (2014-07-01), pages 87 - 95, Retrieved from the Internet
R. BEN-BASAT; G. EINZIGER; R. FRIEDMAN; Y. KASSNER: "Optimal elephant flow detection", IEEE INFOCOM, 2017
"Randomized admission policy for efficient top-k and frequency estimation", IEEE INFOCOM, 2017
Attorney, Agent or Firm:
BERTHIER, Karine (FR)
Download PDF:
Claims:
Claims

1 . A method, in a network node (400), of monitoring packets, the method comprising steps of:

a) obtaining (S101 ) a basic key from a packet, the basic key comprising a first number a of parameters charactering the packet, the first number a being a natural number;

b) determining (S102) a first specification comprising a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d≤ a ;

c) generating (S103) a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification;

d) updating (S104), a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm.

2. A method according to claim 1 , further comprising steps of:

e) receiving a query request about the heavy hitter;

f) determining (S105), based on the Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents the heavy hitter.

3. A method according to claim 1 , further comprising steps of:

g) receiving a query request about a hierarchical heavy hitter;

h) determining whether there is a hierarchical heavy hitter based on the first results updated in step e).

4. A method according to claim 2, the first result being a frequency of the respective extended key,

step f) further comprising:

f1 ) determining, for a predetermined time period, for each of the extended keys, whether a frequency of the respective extended key is equal to or above a predetermined threshold Θ N , wherein, N represents a number of packets monitored during the predetermined time period, and Θ represents a threshold parameter given by an user or an operator;

f1 1 ) if so, determining the location of the aggregation indicated in the respective extended key as the heavy hitter.

5. A method according to claim 1 , wherein, the parameters comprised in the basic key is selected from the group consisting of source IP, source port, destination IP, destination port, protocol.

6. A method according to claim 1 , wherein, the first specification is indicated by a user or an operator of the network.

7. A method according to claim 1 , wherein, the extended key comprises the modified basic key, and in the modified basic key, bits from the least significant bit to the location indicated in the extended key are unified.

8. A method according to claim 7, wherein, in the modified basic key comprised in the extended key, bits from the least significant bit to the location indicated in the extended key are zeroed.

9. A method according to claim 1 , for ith entry comprised in the first specification,

1 < i≤ d , the parameter comprising br bits, the granularity indicating that the parameter can be modified on a resolution of i bits, the third number H is determined as:.

rounded up to the nearest integer value.

10. A network node (400), configured to:

a) obtain (S101 ) a basic key from a packet, the basic key comprising a first number of parameters charactering the packet, the first number a being a natural number; b) determine (S102) a first specification comprising a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d≤ a ;

c) generate (S103) a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification;

d) update (S104), a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm.

1 1 . A network node (400) according to claim 10, further configured to:

e) receive a query request about the heavy hitter;

f) determine (S105), based on the Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents the heavy hitter.

12. A network node (400) according to claim 10, further configured to:

g) receive a query request about a hierarchical heavy hitter;

h) determine whether there is a hierarchical heavy hitter based on the first results updated in step e).

13. A network node (400) according to claim 10, the first result being a frequency of the respective extended key,

the network node (400) being further configured to:

f1 ) determine, for a predetermined time period, for each of the extended keys, whether a frequency of the respective extended key is equal to or above a predetermined threshold Θ N , wherein, N represents a number of packets monitored during the predetermined time period, and Θ represents a threshold parameter given by an user or an operator;

f1 1 ) if so, the network node being further configured to determine the location of the aggregation indicated in the respective extended key as the heavy hitter.

14. A network node (400) according to claim 10, wherein, the parameters comprised in the basic key is selected from the group consisting of source IP, source port, destination IP, destination port, protocol.

15. A network node (400) according to claim 10, wherein, the first specification is indicated by a user or an operator of the network.

Description:
A method of observing packets in a network and a network node

Field of the invention

The invention relates to communication technology, in particular to a method of observing packets in a network.

Background

Network measurements are essential for a variety of network functionalities such as traffic engineering, load balancing, quality of service, caching, anomaly and intrusion detection [1 ]-[8]. A major challenge in performing and maintaining network measurements comes from rapid line rates and the large number of active flows.

Previous works suggested identifying Heavy Hitter (HH) flows that account for a large portion of the traffic. Indeed, approximate HH are used in many functionalities and can be captured quickly and efficiently. However, applications such as anomaly detection and Distributed Denial of Service (DDoS) attack detection require more sophisticated measurements [9], [10]. In such attacks, each device generates a small portion of the traffic but their combined volume is overwhelming. HH measurement is therefore insufficient as each individual device is not a heavy hitter.

Hierarchical Heavy Hitters (HHH) account aggregates of flows that share certain Internet protocol (IP) prefixes. The structure of IP addresses implies a prefix based hierarchy as defined more precisely below. In the DDoS example, HHH can identify IP prefixes that are suddenly responsible for a large portion of traffic and such an anomaly may very well be a manifesting attack. Further, HHH can be collected in one dimension, e.g. , a single source IP prefix hierarchy, or in multiple dimensions, e.g., a hierarchy based on both source and destination IP prefixes. HHH measurement can be done according to multiple packet header fields. In this patent, term "dimension" of an HHH measurement is the number of header fields that participate in the measurement. Specifically, dimension is used to represent the number of parameters that are considered while analysing aggregation. The nature of the hierarchy changes from system to system, in devices near edge servers it often make sense to only analyze aggregates based on the source I P field of packets. Alternatively, in backbone links it is useful to perform a two-dimensional source/destination analysis. Such an analysis may indicate a manifesting attack against an edge server. In this patent, term "flavor" of the HHH measurement provides the following details about the measurement: a. which fields participate in the HHH analysis; b. the structure of hierarchy for each field. Specifically, flavor is used to indicate the parameters that are considered while analysing aggregation and the granularity corresponding to the respective parameter. Term "granularity" refers to the structure of the hierarchy in a specific field. Specifically, it indicates the resolution of aggregation.

The main difficulty of performing HHH measurement in network devices is that current HHH algorithms use different data structures for each flavor of the HHH measurement. For example, we require different data structure for one and two dimensional hierarchies. We also require different structures depending if the measurement is done in bit or byte granularity. This dependence between the performed measurement and the used data structures prevents reuse of memory for multiple measurement configurations and requires hardware vendors to hard code a measurement configuration in their devices. Thus, HHH measurements are difficult to perform in current technology.

The object of the present invention is to improve the memory efficiency and allow to reuse the same hardware for multiple measurement tasks.

Summary of the Invention

The object of the invention is achieved by the method in the claims.

According to one aspect of the invention, there is provided a method, in a network node, of monitoring packets, the method comprising steps of: a) obtaining a basic key from a packet, the basic key comprising a first number a of parameters charactering the packet, the first number a being a natural number; b) determining a first specification comprising a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d≤a , c) generating a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification; d) updating, a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm.

In a preferred embodiment, the method further comprises a step of: e) receiving a query request about a heavy hitter; f) determining, based on a Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents a heavy hitter.

In a preferred embodiment, the method further comprises a step of: g) receiving a query request about a hierarchical heavy hitter; h) determining whether there is a hierarchical heavy hitter based on the first results updated in step e).

In a preferred embodiment, the first result being a frequency of the respective extended key, step f) further comprising: f1 ) determining, for a predetermined time period, for each of the extended keys, whether a frequency of the respective extended key is equal to or above a predetermined threshold Θ N , wherein, N represents a number of packets monitored during the predetermined time period, and Θ represents a threshold parameter given by an user or an operator; f1 1 ) if so, determining the location of the aggregation indicated in the respective extended key as the heavy hitter.

In a preferred embodiment, the parameters comprised in the basic key is selected from the group consisting of source IP, source port, destination IP, destination port, protocol. In a preferred embodiment, the first specification is indicated by a user or an operator of the network.

In a preferred embodiment, in the modified basic key comprised in the extended key, bits from the least significant bit to the location indicated in the extended key are unified.

In a preferred embodiment, in the modified basic key comprised in the extended key, bits from the least significant bit to the location indicated in the extended key are zeroed.

In a preferred embodiment, for i th entry comprised in the first specification, 1 < i≤ d , the parameter comprising bits, the granularity indicating that the parameter can be modified on a resolution of i bits, the third number H is determined as:.

rounded up to the nearest integer value.

According to another aspect of the present invention, there is provided a network node, configured to: a) obtain a basic key from a packet, the basic key comprising a first number a of parameters charactering the packet, the first number a being a natural number; b) determine a first specification comprising a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d < a c) generate a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification; d) update, a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm. In a preferred embodiment, the network node is further configured to: e) receive a query request about a heavy hitter; f) determine, based on a Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents a heavy hitter.

In a preferred embodiment, the network node is further configured to: g) receive a query request about a hierarchical heavy hitter; h) determine whether there is a hierarchical heavy hitter based on the first results updated in step e).

In a preferred embodiment, the first result is a frequency of the respective extended key, and the network node is further configured to: f1 ) determine, for a predetermined time period, for each of the extended keys, whether a frequency of the respective extended key is equal to or above a predetermined threshold Θ N , wherein, N represents a number of packets monitored during the predetermined time period, and Θ represents a threshold parameter given by an user or an operator; f1 1 ) if so, the network node being further configured to determine the location of the aggregation indicated in the respective extended key as the heavy hitter.

In a preferred embodiment, the parameters comprised in the basic key is selected from the group consisting of source IP, source port, destination IP, destination port, protocol.

In a preferred embodiment, the first specification is indicated by a user or an operator of the network.

According to the present invention, each HHH update operation is transformed into multiple simpler HH updates. Then, the common HH infrastructure is modified with all the simpler HH updates.

The present invention allows every flavor of hierarchical heavy hitter measurement as well as plain heavy hitters to be performed using the same data structure and thus conserve space and make HHH analysis practical to implement in network devices. It also adds the unique ability to define the measurement flavor at runtime. The same hardware is utilized for any of these measurement tasks. This enables hardware to offer a programmable measurement framework. It improves the flexibility and efficiency of HHH measurement.

Brief description of the figures

The features and advantages of the invention will be more completely understood by appreciating the following detailed description of preferred embodiments with reference to the figures, wherein

Fig. 1 depicts a schematic flow chart of the method according to an embodiment of the present invention;

Fig. 2 depicts an example of locations of aggregation according to an embodiment of the present invention;

Fig. 3 depicts an exemplary Algorithm 1 according to an embodiment of the present invention;

Fig. 4 depicts a schematic block diagram of a network node 400 according to one embodiment of the invention.

Detailed description

Prior to the detailed description of the embodiments of the present invention, some general definitions will be given as below.

IP addresses are considered to form a hierarchical domain with either bit or byte size granularity. Fully specified IP addresses are the lowest level of the hierarchy and can be generalized. We use U to denote the domain of fully specified items. For example, 181 .7.20.6 is a fully specified IP address, and 181 .7.20.* generalizes it by a single byte. Similarly, 181 .7.* generalizes it by two bytes and formally, a fully specified IP address is generalized by any of its prefixes. The parent of an item is the longest prefix that generalizes it. In two dimensions, we consider a tuple containing source and destination I P addresses. A fully specified item is fully specified in both dimensions. For example, (<181 . 7. 20. 6>→<208. 67. 222. 222>) is fully specified. In two dimensional hierarchies, each item has two parents, e.g., (<181 . 7. 20.*>→<208. 67. 222. 222>) and (<181 . 7. 20. 6>→<208. 67. 222. *>) are both parents to (<181 . 7. 20. 6>→ <208. 67. 222. 222>).

Definition 1 : Generalization.

For two prefixes p , </, we denote V - Ί if in any dimension it is either a prefix of q or is equal to q. We also denote the set of elements that are generalized by p with H p = {< > € j r p} ^ anc | those generalized by a set of prefixes P by

HP u p€ p H p _ | f p t , and p≠q > we denote p q.

In a single dimension, the generalization relation defines a vector going from fully generalized to fully specified. In two dimensions, the relation defines a lattice where each item has two parents. An example of a two dimensional lattice will be described later with respect to Fig. 2.

Definition 2: Given a prefix p and a set of prefixes P , we define as the set of prefixes:

{h : h€ P, h -< p, h' e P s.t. h - h' - p}

Intuitively, .g., let p = 142.

contains < 142.14.13.* >

We consider a stream S, where at each step a packet of an item e arrives. Packets belong to a hierarchical domain of size H , and can be generalized by multiple prefixes as explained above. Given a fully specified item e , f e is the frequency of e in S. Definition 3 extends this notion to prefixes. Definition 3: Frequency

Given a prefix p , the frequency of is:

The current number of packets in the stream is denoted N. Our goal is to identify the hierarchical heavy hitter prefixes whose frequency is above the threshold φ■ N ) . However, if the frequency of a prefix exceeds the threshold then so is the frequency of all its ancestors. For compactness, we are interested in prefixes whose frequency is above the threshold due to non HHH siblings. This motivates the definition of conditioned frequency C{p\P) . Intuitively, C{p\P) measures the additional traffic prefix p adds to a set of previously selected HHHs ( P ), and it is defined as follows.

Definition 4: Conditioned frequency

The conditioned frequency of a refix p with respect to a prefix set P is:

Cp\P is derived by subtracting the frequency of fully specified items that are already generalized by items in P from p 's frequency (f p ). In two dimensions, exclusion inclusion principles are used to avoid double counting.

Definition 5: An algorithm solves the (e, < 5) - FREQUENCY ESTIMATION problem if for any prefix (x), it provides f x s.t.:

Definition 6: Heavy hitter (HH)

Given a threshold (Θ ) a fully specified item ( e ) is a heavy hitter if its frequency CO is above the threshold: Θ - Ν , i.e., f e ≥Θ· Ν . We now continue and describe how exact hierarchical heavy hitters are found. To that end, partition the hierarchy to levels as explained in Definition 7.

Definition 7: Hierarchy Depth

Define L, the depth of a hierarchy, as follows: Given a fully specified element e , we consider a set of prefixes such that: e -< p l -< p 2 , ... -< p L where e ≠ p l ≠ p 2 ≠ ...≠ p L and L is the maximal size of that set. We also define the function level (p) that given a prefix p returns p 's maximal location in the chain, i.e., the maximal chain of generalizations that ends in p .

To calculate exact heavy hitters, we go over fully specified items (level o) and add their heavy hitters to the set HHHo. Using HHHo, we calculate conditioned frequency for prefixes in level 1 and if C p ^ H > #- Nwe add p to HHHi . We continue this process until the last level ( I ) and the exact heavy hitters are the set HHH L . Next, we define HHH formally.

Definition 8: Hierarchical HH (HHH)

The set HHHo contains the fully specified items e s.t. f e > Θ·Ν . Given a prefix p from level (i) , 0≤l≤L , we define:

HHH / = HHH / l > Θ

· Ν)\.

The set of exact hierarchical heavy hitters ΗΗΗ is defined as the set HHH L .

Finding exact hierarchical heavy hitters requires plenty of space. Indeed, even finding exact (non hierarchical) heavy hitters requires linear space [20]. Such a memory requirement is prohibitively expensive and motivates finding approximate

HHHs.

Definition 9 ( ε ,δ) approximate HHH: An algorithm solves (ε,δ) - APPROXI MATE H IERARCHICAL HEAVY HITTERS if after processing any stream S of length N, it returns a set of prefixes ( P ) that satisfies the following conditions:

Accuracy: for every prefix p≡P, < sN .

Coverage: for every prefix q £ p ·. < ΘΝ .

Approximate HHH are a set of prefixes ( P ) that satisfies accuracy and coverage; there are many possible sets that satisfy both these properties. Unlike exact HHH, we do no require that for p≡P , C p ^ p > ΘΝ . Unfortunately, if we add such a requirement then [16] proved a lower bound of Ω(— ^-) space, where d is the θ

number of dimensions. This is considerably more space than is used in the present invention (—) that when Θ∞∞ is also— .

ε Θ

The present invention may be implemented as software, firmware or hardware in a network node. In one embodiment, the network node may be a router or part of the data plane of the router. In other embodiments, the present invention may be implemented in other location in the network, for example: an entity that is communicatively connected to a router. In the following, the present invention will be described as software in a router.

Fig. 1 depicts a schematic flow chart of the method according to an embodiment of the present invention.

In step S1 01 , a basic key B is obtained from a packet. The basic key comprises a first number a of parameters charactering the packet, the first number a being a natural number. In one embodiment, the parameters comprised in the basic key is selected from the group consisting of source I P, source port, destination I P, destination port, protocol.

Normally, parameters charactering the packet are included in the packet's header fields. A router may extract these according to the requirement of a user or an operator. In one embodiment, the first number a = 5 , and the basic key is expressed using a 5-tuple of the form (srclP, srcPort, dstIP, dstPort, protocol). In the

embodiment shown in Fig. 1 , the first number a = 1 , the basic key comprises only one parameter, i.e., the source IP. In another embodiment, the first number a = 2 , the basic key comprises the source IP and the destination IP.

In step S102, a first specification is determined. The first specification may be indicated by a user or an operator of the network. The first specification comprises a second number d of entries. Each entry indicates a parameter comprised in the basic key, and a granularity corresponding to the parameter. The second number d reflects the dimension of the measurement and is a natural number and not bigger than the first number d≤a .

In one embodiment, each entry may be expressed as <id, granularity >, with id indicating a parameter comprised in the basic key, and the granularity defines the resolution of aggregation in the parameter indicated by id. In one embodiment, granularities are given as natural numbers that specifies how many bits constitute each hierarchy level. For example, if the field is IP source or destination addresses, the granularity is often 8 bits (one byte) as it is common to parse IP addresses in that manner. However, if the field is source (or destination) port, the granularity is expected to be 32 bits which indicates that the entire header is single domain without hierarchical order (in IPv4, port number is a 32 bit long).

In the embodiment shown in Fig. 1 , the first specification comprises only one entry of the form <SourcelP, 8 bits>, d = 1 . The measurement is carried out based on an analysis of the source IP and the source IP address may be generalized every 8 bits. In another embodiment, the basic key comprises 5 parameters, and is expressed as (srclP, srcPort, dstIP, dstPort, protocol). The user specification may comprise two entries, <SourcelP, 8 bits> and <DestinationlP, 8 bits>, d = 2 . In that case, the measurement is carried out based on an analysis of the source IP and the

destination IP, while the source I P address and the destination IP address both may be generalized every 8 bits. In one embodiment, granularity=1 means that a * can be placed in every bit of the field, while granularity = 32, means that it can be placed in 4 byte intervals. In the present invention, placing * into a field means that bit(s) from the least significant bit to the location of * in that field is(are) of arbitrarily value.

In one embodiment, granularity=32 may be used to include ports in the flavor. In that case we either provide a specific port to the measurement, or leave the entire port field as *. A skilled person shall understand, the value of the granularity is not limited to the given example. In another embodiment, granularity= 8 means that a * can be placed in every byte of the field. The number of bits in the granularity is logarithmic in the maximal field size. For example, for I Pv4, 5 bits are sufficient and for IPv6, 6 bits are enough.

In step S103, a third number H of extended keys are generated for the packet based on the basic key and the first specification. Each extended key comprises the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other. The third number H indicates a number of all possible locations of the aggregation according to the first specification.

In one embodiment, for i th entry comprised in the first specification, 1 < i≤ d , the parameter comprising b r bits, the granularity indicating that the parameter can be generalized on a resolution of i bits, the third number H is determined as:.

where the notation | x | means that x is rounded up to the nearest integer value.

Fig. 2 shows an example of locations of aggregation according to an embodiment of the present invention. In the embodiment shown in Fig. 2, the basic key comprises two parameters, i.e. the source I P and the destination IP. The user specification comprises two entries <SourcelP, 8 bits> and <Destinationl P, 8 bits>. destination IP are respectively 32 bits, therefore, there are

25 possible locations of aggregation. All of them are

shown in Fig. 2.

In Fig.2, each lattice node is generalized by all nodes that are upper or more to the left. The most generalized node (*,*) is called fully general and the most specified node (s1 .s2.s3.s4; d1 .d2.d3.d4) is called fully specified. The parents of each node are directly above it and directly to the left. The third number H is the hierarchy's size defined as the number of nodes in the lattice.

In another embodiment, in IPv4, byte level one dimensional hierarchies imply H=5 as each IP address is divided into four bytes and querying * is also allowed.

The location of the * in Fig.2 suggests the location of generalization. The location of generalization shall be encoded and incorporated into the extended key. In the following, an exemplary method of encoding the location of generalization is described with respect to Fig.2.

In Fig.2, the two * locations need to be encoded. Since IP addresses are at most 64 bit long, the most general application in the context of the present invention requires additional log 2 (64 ) = 6 bits per field. Thus, if the basic key is of the form

< sourcelP , destinatio nIP > ,

an encoding of the form

< sourcelP , destinatio nIP , destOffset , srcOffset > is used. The location of the * is given in the additional offset fields. If the field has no star, then an offset of -1 is used.

In a preferred embodiment, bits that are after the star are zeroed to preserve the semantics. For example the prefix (*,*) is encoded as < sourcelP , destinationIP, 0,0 > .

If we do not zero out the relevant bits of sourcelP and destinationIP, then different packets will have a different (*,*) encoding. Once they are zeroed, the encoding of all (*,*) prefixes becomes:

< 0.0.0.0, 0.0.0.0,0,0 > .

Using this encoding technique a unique key is generated for each of the update operations. The additional memory for key storage grows logarithmic with each of the dimensions.

In another embodiment, bits from the least significant bit to the location indicated in the extended key may be unified in other possible manner, for example, all bits from the least significant bit to the location indicated in the extended key may be replaced with "1 ".

A skilled person shall understand, the method used to encode the locations of generalization is not limited to the given example, other methods can also be alternatively or additionally used, as long as the locations of generalization can be recovered later from the extended keys.

Generally speaking, given a user specification S p , and a basic key b, extended keys may be generated. In one embodiment, the extended keys are of the form

Where * is the location of the * in field i of the basic key, 1 < i≤ a . If the user specification places no * in field i , we encode * =-1. We denote by ]¾ the generated set that includes all the combinations of extended keys for b with all possible locations of * according to the user specification.

In step S104, a first result related to heavy hitter determination is updated for each of the extended keys based on a Heavy Hitter algorithm. In one embodiment, in step S104, the hierarchical heavy hitter update is broken into | ]¾ | plain heavy hitter update that are all performed on the underlying heavy hitters algorithm. In another embodiment, the first result is an estimated (or accurate] frequency of the respective extended key.

Specifically, in step S104 the HH algorithm receives ]¾ and performs an update for every extended key in ]¾. Any HH algorithm that satisfies Definition 5 can be used in the present invention.

In one embodiment, Space Saving [19], a popular (non hierarchical) heavy hitters algorithm is used in step S104. The algorithm was realized in Ternary content- addressable memory (TCAM) memories are widely used in packet switching and allow parallel pattern search [21 ]. TCAM is expensive and therefore limited in switches. The present invention allows reusing the same TCAM for multiple flavors of measurements.

Other algorithms that satisfy the requirement include [22]- [24]. As far as hardware implementations goes, HashPipe [25] implements heavy hitter functionality in P4 [26] (where P4 is an open source language that is used to describe the data flow of network devices). Alternatively or additionally, the work of [27] suggests a method to identify heavy hitters using two simple hash tables. Alternatively, the work of [28] suggests a hardware implementation of k-way set associative memory. Such memory is extensively used to construct hardware caches and hash tables.

In one embodiment, upon receiving a query request about a heavy hitter, it is determined in step S105, based on a Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents a heavy hitter.

Specifically, it is determined, for a predetermined time period, for each of the extended keys, whether a frequency of the respective extended key is equal to or above a predetermined threshold Θ N , wherein, N represents a number of packets monitored during the predetermined time period, and Θ represents a threshold parameter given by an user or an operator. If so, the location of the aggregation indicated in the respective extended key may be determined as the heavy hitter. In another embodiment, upon receiving a query request about a hierarchical heavy hitter, it is determined whether there is a hierarchical heavy hitter based on the first results updated in step S104.

Fig. 3 shows an exemplary Algorithm 1 according to an embodiment of the present invention.

In Fig.3, the HHH query method is exemplified in Algorithm 1 Line 1 , method query. In this method, we start with fully specified heavy hitter prefixes and declare them as heavy hitters. Then, we consider prefixes that directly generalize fully specified items and calculate their conditioned frequency with regard to the already selected items. The process is repeated until the hierarchy is exhausted.

Fig. 4 shows a schematic block diagram of a network node 400 according to one embodiment of the invention.

The network node 400 is configured to obtain a basic key from a packet via a first interface 401. The basic key comprises a first number a of parameters charactering the packet, the first number a being a natural number. In the embodiment shown in Fig.4, the network node 400 is configured to obtain a first specification via a second interface 402. In another embodiment, the first specification may be preconfigured in the network node 400. The first specification comprises a second number d of entries, each entry indicating a parameter comprised in the basic key, and a granularity corresponding to the parameter, the second number d being a natural number and not bigger than the first number d≤a .

The network node 400 is configured to generate a third number H of extended keys for the packet based on the basic key and the first specification, each extended key comprising the basic key or a modified basic key and an indication of a location of an aggregation, the location of the aggregation indicated in each extended key being different from each other, the third number H indicating a number of all possible locations of the aggregation according to the first specification. The network node 400 is further configured to update, a first result related to heavy hitter determination for each of the extended keys based on a Heavy Hitter algorithm.

The network node 400 is further configured to receive a query request via a third interface 403. In one embodiment, the query request is about a heavy hitter. The network node 400 is further configured to determine based on a Heavy Hitter algorithm, for each of the extended keys, whether the respective extended key represents a heavy hitter.

In another embodiment, the query request is about a hierarchical heavy hitter. The network node 400 is further configured determine whether there is a hierarchical heavy hitter based on the first results updated based on the Heavy Hitter algorithm.

The results of the determination are output via the fourth interface 404.

List of symbols

Symbol Meaning

First number

d Second number, dimension of the

measurement

H Third number, hierarchy's size

S Stream

User specification

S P

N Current number of packets (in all flows)

U Domain of fully specified items ε Accuracy parameter

δ Confidence parameter

θ Threshold parameter

Conditioned frequency of q with respect to P

Frequency of prefix q

Upper, lower bound for f q

List of cited references

[I ] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S.

Shenker, "pFabric: Minimal Near-optimal Datacenter Transport," ACM SIGCOMM, pp. 435-446, 2013.

[2] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S.

Banerjee, "DevoFlow: Scaling Flow Management for Highperformance Networks," in ACM SIGCOMM, 201 1 , pp. 254-265.

[3] A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar, "AFQCN:

Approximate Fairness with Quantized Congestion Notification for Multi-tenanted Data Centers," in IEEE HOTI, 2010, pp. 58-65.

[4] T. Benson, A. Anand, A. Akella, and M. Zhang, "MicroTE: Fine Grained Traffic Engineering for Data Centers," in ACM CoNEXT, 201 1 , p. 8.

[5] P. Garcia-Teodoro, J. E. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez, "Anomaly-Based Network Intrusion Detection: Techniques, Systems and

Challenges," Computers and Security, pp. 18-28, 2009.

[6] L. Ying, R. Srikant, and X. Kang, "The Power of Slightly More than One Sample in Randomized Load Balancing," in IEEE INFOCOM, April 2015, pp. 1 131-1 139.

[7] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, V. T. Lam, F. Matus, R. Pan, N. Yadav, and G. Varghese, "CONGA: Distributed Congestion-aware Load Balancing for Datacenters," in ACM SIGCOMM, 2014, pp. 503-514.

[8] G. Einziger and R. Friedman, "TinyLFU: A Highly Efficient Cache Admission Policy," in Euromicro PDP, 2014, pp. 146-153.

[9] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, Online Identification of Hierarchical Heavy Hitters: Algorithms, Evaluation, and Applications," in ACM IMC, 2004, pp. 101-1 14.

[10] V. Sekar, N. Duffield, O. Spatscheck, J. van der Merwe, and H. Zhang, "LADS: Large-scale Automated DDOS Detection System," in USENIX ATEC, 2006, pp. 16- 16.

[I I ] M. Mitzenmacher, T. Steinke, and J. Thaler, "Hierarchical Heavy Hitters with the Space Saving Algorithm," in Proceedings of the Meeting on Algorithm Engineering & Expermiments, ser. ALENEX, 2012, pp. 160- 174.

[12] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, "Finding Hierarchical Heavy Hitters in Streaming Data," ACM Trans. Knowl. Discov. Data, vol. 1 , no. 4, pp. 2: 1-2:48, Feb. 2008. [13] , "Finding Hierarchical Heavy Hitters in Data Streams," in VLDB, 2003, pp.

464-475.

[14] L. Jose and M. Yu, Online measurement of large traffic aggregates on commodity switches," in USENIX Hot-ICE, 201 1 .

[15] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, "Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-dimensional Data," in SIGMOD, 2004, pp. 155-166.

[16] J. Hershberger, N. Shrivastava, S. Suri, and C. D. T ' oth, "Space Complexity of Hierarchical Heavy Hitters in Multi-dimensional Data Streams," in ACM PODS, 2005, pp. 338-347.

[17] Y. Lin and H. Liu, "Separator: Sifting Hierarchical Heavy Hitters Accurately from Data Streams," in ADMA, ser. ADMA, 2007, pp. 170- 182.

[18] P. Truong and F. Guillemin, "Identification of heavyweight address prefix pairs in IP traffic," in ITC, Sept 2009, pp. 1-8.

[19] A. Metwally, D. Agrawal, and A. E. Abbadi, "Efficient Computation of Frequent and Top-k Elements in Data Streams," in ICDT, 2005.

[20] S. Muthukrishnan, "Data Streams: Algorithms and Applications," Foundations and Trends in Theoretical Computer Science, vol. 1 , no. 2, pp. 1 17-236, 2005.

[21 ] N. Bandi, A. Metwally, D. Agrawal, and A. El Abbadi, "Fast data stream algorithms using associative memories," in SIGMOD. ACM, 2007, pp. 247-256.

[22] E. D. Demaine, A. L ' opez-Ortiz, and J. I. Munro, "Frequency estimation of internet packet streams with limited space," in EATCS ESA, 2002, pp. 348-360.

[23] R. M. Karp, S. Shenker, and C. H. Papadimitriou, "A simple algorithm for finding frequent elements in streams and bags," ACM Transactions Database Systems, vol. 28, no. 1 , Mar. 2003.

[24] G. S. Manku and R. Motwani, "Approximate frequency counts over data streams," in VLDB, 2002.

[25] V. Sivaraman, S. Narayana, O. Rottenstreich, S. Muthukrishnan, and J. Rexford, "Smoking out the heavy-hitter flows with hashpipe," CoRR, vol. abs/161 1 .04825, 2016. [Online]. Available: http://arxiv.org/abs/161 1 .04825

[26] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C.

Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, "P4: Programming protocol-independent packet processors," SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, pp. 87-95, Jul. 2014. [Online]. Available:

http://doi.acm.Org/10.1 145/2656877.2656890 [27] R. Ben-Basat, G. Einziger, R. Friedman, and Y. Kassner, Optimal elephant flow detection," IEEE INFOCOM, 2017.

[28] , "Randomized admission policy for efficient top-k and frequency estimation," IEEE INFOCOM, 2017.