


Title:
DETECTING NETWORK TRAFFIC
Document Type and Number:
WIPO Patent Application WO/2020/160755
Kind Code:
A1
Abstract:
A network analyser for detecting traffic of a specific type on a network by analysing records of traffic on the network that indicate for each of a plurality of communication streams a number of packets conveyed, a volume of data conveyed, a start time and an end time, each of the communication streams being between a single source entity and a single destination entity, the network analyser being configured to compute at least one of the following metrics: a. based on an estimation, for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, a frequency of occurrence of messages of a certain size in each direction and a number of messages in each direction, a representation of the extent to which the distribution of those frequencies is skewed; b. the ratio of the size of messages in each direction for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity; c. the distribution of the rate of messages from a remote entity to a local entity; d. the distribution of the rate of messages from a local entity to a remote entity; e. for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, the degree of correlation between the mean packet size in each direction; and f. for a communication stream between a source entity and a destination entity, the degree of correlation between the number of packets in that stream and the mean packet size in that stream; and to estimate the presence of the traffic of a specific type in dependence on the or each computed metric.

Inventors:
RUSSO MICHELE (DE)
SMDIC NEDIM (DE)
LASKOV PAVEL (LI)
Application Number:
PCT/EP2019/052745
Publication Date:
August 13, 2020
Filing Date:
February 05, 2019
Assignee:
HUAWEI TECH CO LTD (CN)
RUSSO MICHELE (DE)
International Classes:
H04L12/24; G06Q20/06; H04L12/26
Foreign References:
US20100046377A1 2010-02-25
US20150215327A1 2015-07-30
US20150229661A1 2015-08-13
US20180331922A1 2018-11-15
Other References:
None
Attorney, Agent or Firm:
KREUZ, Georg (DE)
Claims:
CLAIMS

1. A network analyser for detecting traffic of a specific type on a network by analysing records of traffic on the network that indicate for each of a plurality of communication streams a number of packets conveyed, a volume of data conveyed, a start time and an end time, each of the communication streams being between a single source entity and a single destination entity, the network analyser being configured to compute at least one of the following metrics:

a. based on an estimation, for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, of a frequency of occurrence of messages of a certain size in each direction and a number of messages in each direction, a representation of the extent to which the distribution of those frequencies is skewed;

b. the ratio of the size of messages in each direction for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity;

c. the distribution of the rate of messages from a remote entity to a local entity;

d. the distribution of the rate of messages from a local entity to a remote entity;

e. for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, the degree of correlation between the mean packet size in each direction; and

f. for a communication stream between a source entity and a destination entity, the degree of correlation between the number of packets in that stream and the mean packet size in that stream;

and to estimate the presence of the traffic of a specific type in dependence on the or each computed metric.

2. The network analyser as claimed in claim 1, wherein the network analyser is configured to estimate the occurrence of keepalive packets in each communication stream from a remote entity to a local entity and to adjust its computation of those of metrics a, b, c and d that it computes in dependence on that estimation.

3. The network analyser as claimed in claim 2, wherein the network analyser is configured to estimate the occurrence of keepalive packets between a source entity and a destination entity by searching for recurrence of upper bounds on the recorded interval between communication streams from that source entity to that destination entity.

4. The network analyser as claimed in any preceding claim, the network analyser being configured to, for any of metrics a to d, estimate the occurrence of a respective type of message in each communication stream of the pair, the said messages in dependence on which metric a, b, c or d is computed being restricted to those messages estimated to be of one of the respective types.

5. The network analyser as claimed in claim 4, wherein for each communication stream from a local entity to a remote entity the type of message is a blockchain solution submission message.

6. The network analyser as claimed in claim 4 or 5, wherein for each communication stream from a remote entity to a local entity the type of message is a blockchain computation request message.

7. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to treat metric c as indicating an increased likelihood of the presence of traffic of the specific type if the said distribution of metric c is more concentrated.

8. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to treat metric d as indicating an increased likelihood of the presence of traffic of the specific type if the said distribution of metric d has a distribution of a predetermined form.

9. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to treat metric e as indicating an increased likelihood of the presence of traffic of the specific type if the said degree of correlation of metric e is lower.

10. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to treat metric f as indicating an increased likelihood of the presence of traffic of the specific type if the said degree of correlation of metric f is lower and/or is lower for traffic in one direction between a pair of network entities than in the other direction.

11. The network analyser as claimed in any preceding claim, wherein the traffic of a specific type is cryptocurrency mining traffic.

12. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to train an artificial intelligence model to detect traffic of the specific type in dependence on one or more of the said metrics.

13. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to train the model by one-class classification.

14. The network analyser as claimed in any preceding claim, wherein the network analyser is configured to input one or more of the said metrics to an artificial intelligence model to thereby form an estimate of the presence of traffic of the specific type.

15. A method for detecting traffic of a specific type on a network by analysing records of traffic on the network that indicate for each of a plurality of communication streams a number of packets conveyed, a volume of data conveyed, a start time and an end time, each of the communication streams being between a single source entity and a single destination entity, the method comprising:

computing at least one of the following metrics:

a. based on an estimation, for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, of a frequency of occurrence of messages of a certain size in each direction and a number of messages in each direction, a representation of the extent to which the distribution of those frequencies is skewed;

b. the ratio of the size of messages in each direction for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity;

c. the distribution of the rate of messages from a remote entity to a local entity;

d. the distribution of the rate of messages from a local entity to a remote entity;

e. for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, the degree of correlation between the mean packet size in each direction; and

f. for a communication stream between a source entity and a destination entity, the degree of correlation between the number of packets in that stream and the mean packet size in that stream;

and estimating the presence of the traffic of a specific type in dependence on the or each computed metric.

Description:
DETECTING NETWORK TRAFFIC

This disclosure relates to detecting network traffic of a specific, predetermined type.

In organisations where multiple users share a network, it is valuable to be able to detect abuse of network or computing resources. One form of abuse is using the resources to participate in an external activity such as cryptocurrency mining. Other examples of unwanted participation might include the identification of large prime numbers, the analysis of astronomical data or the computation of hashes for purposes other than cryptocurrency mining, for example for implementing a proof-of-work blockchain for administrative purposes.

Cryptocurrency mining involves receiving input data and then computing a value dependent on that data. The value is difficult to calculate. Nowadays, most cryptocurrency mining is undertaken by mining pools. In a mining pool, a pool administrator distributes jobs between members of the pool. The members of the pool are known as miners. The administrator transmits work instructions known as New Job messages to the miners. Each miner performs calculations on the data sent in its New Job message. It may then transmit to the administrator a message indicating a potential solution for the sought value in the form of a Solution Submission message. Thus the traffic between the administrator and a participant in the pool involves a series of New Job messages in one direction and Solution Submission messages in the other direction. The Stratum protocol is commonly used for this purpose. Other examples of unwanted participation may involve a similar form of message exchange. Figure 1 illustrates potential mining communications of various types.

The traffic between an administrator and a miner may be encrypted, meaning that the bit content of the traffic itself cannot be analysed directly to identify New Job messages and Solution Submission messages. The traffic may pass via a VPN, meaning that traffic from a miner in a local network to a remote administrator cannot be identified from occurrence of the network address of the administrator (even if it is known) as a packet source or destination address. Notwithstanding these difficulties, it would be desirable to be able to detect network traffic that is indicative of unwanted activity of the types described above.

Network analysis tools capture and analyse data about traffic on a network. One example of a network analysis tool is the Huawei NetStream. Another example is NetFlow. Devices such as the Huawei NetStream collect and store data on communication streams. Devices of this type can monitor traffic entering or leaving an organisation’s local network. For example, when one or more packets are being transmitted from a local source device having a source address to a remote destination address, information on that communication stream can be captured. Similarly, information can be captured on a communication stream when one or more packets are being received from a remote source address addressed to a local destination device. A convenient way to capture this data is to begin accumulating data about the communication stream when the stream is first identified. Subsequently, once a predetermined timeout period has elapsed with no packets of the stream having passed, the stream can be assumed to have ended and a data record can be written to a database indicating the accumulated information. An example of the information that may be included in the record is:

- source address

- destination address

- number of packets

- total number of bytes in those packets

- start time of the stream

- end time of the stream (i.e. the time when the last packet of the stream passed)

- source port

- destination port

- protocol

Figure 2 shows examples of NetStream records. The timeout may, for example, be 100 seconds. It is possible that the communication stream has not ended when the record is written, but is temporarily inactive. In that case, when the next packet passes it will be treated as indicating the start of a new communication stream from the same source address to the same destination address. In the approach described above, a bidirectional flow is described by two records: one for the traffic in each direction. However, the records could be coalesced so that bidirectional flows are described by a common record.
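The record fields listed above can be pictured as a small data structure. The following is a minimal sketch only: the class name FlowRecord, its field names and the helper is_reverse_of are hypothetical and are not part of NetStream or of any real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow record (hypothetical field names)."""
    src_addr: str
    dst_addr: str
    packets: int       # number of packets in the stream
    total_bytes: int   # total number of bytes in those packets
    start_time: int    # seconds since the epoch
    end_time: int      # time when the last packet of the stream passed
    src_port: int
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 for TCP

    @property
    def duration(self) -> int:
        """Length of the recorded stream in seconds."""
        return self.end_time - self.start_time

    def is_reverse_of(self, other: "FlowRecord") -> bool:
        """True if `other` describes the opposite direction of the same flow."""
        mine = (self.src_addr, self.src_port, self.dst_addr, self.dst_port, self.protocol)
        theirs = (other.dst_addr, other.dst_port, other.src_addr, other.src_port, other.protocol)
        return mine == theirs
```

With such a structure, the two records that describe a bidirectional flow can be paired by testing is_reverse_of on records whose time periods overlap.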

According to one aspect there is provided a network analyser for detecting traffic of a specific type on a network by analysing records of traffic on the network that indicate for each of a plurality of communication streams a number of packets conveyed, a volume of data conveyed, a start time and an end time, each of the communication streams being between a single source entity and a single destination entity, the network analyser being configured to compute at least one of the following metrics: a. based on an estimation, for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, of a frequency of occurrence of messages of a certain size in each direction and a number of messages in each direction, a representation of the extent to which the distribution of those frequencies is skewed; b. the ratio of the size of messages in each direction for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity; c. the distribution of the rate of messages from a remote entity to a local entity; d. the distribution of the rate of messages from a local entity to a remote entity; e. for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, the degree of correlation between the mean packet size in each direction; and f. for a communication stream between a source entity and a destination entity, the degree of correlation between the number of packets in that stream and the mean packet size in that stream; and to estimate the presence of the traffic of a specific type in dependence on the or each computed metric.

According to a second aspect there is provided a method for detecting traffic of a specific type on a network by analysing records of traffic on the network that indicate for each of a plurality of communication streams a number of packets conveyed, a volume of data conveyed, a start time and an end time, each of the communication streams being between a single source entity and a single destination entity, the method comprising: computing at least one of the following metrics: a. based on an estimation, for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, of a frequency of occurrence of messages of a certain size in each direction and a number of messages in each direction, a representation of the extent to which the distribution of those frequencies is skewed; b. the ratio of the size of messages in each direction for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity; c. the distribution of the rate of messages from a remote entity to a local entity; d. the distribution of the rate of messages from a local entity to a remote entity; e. for a pair of communication streams relating to overlapping time periods and being in either direction between a pair of a source entity and a destination entity, the degree of correlation between the mean packet size in each direction; and f. for a communication stream between a source entity and a destination entity, the degree of correlation between the number of packets in that stream and the mean packet size in that stream; and estimating the presence of the traffic of a specific type in dependence on the or each computed metric.

Any one of the said metrics or any combination of two or more of them may be computed. The or each remaining metric may be not computed. This may allow the system to efficiently compute an effective set of metrics.

The network analyser may be configured to estimate the occurrence of keepalive packets in each communication stream from a remote entity to a local entity and to adjust its computation of those of metrics a, b, c and d that it computes in dependence on that estimation. This may improve the effectiveness of those metrics.

The network analyser may be configured to estimate the occurrence of keepalive packets between a source entity and a destination entity by searching for recurrence of upper bounds on the recorded interval between communication streams from that source entity to that destination entity. This may be an effective mechanism for detecting keepalive packets. This may be especially effective when the network traffic is being analysed based on aggregated records of traffic (e.g. in a single direction between a particular source and a particular destination) over a period ending when there has been no such traffic for a predetermined time.

The network analyser may be configured to, for any of metrics a to d, estimate the occurrence of a respective type of message in each communication stream of the pair, the said messages in dependence on which metric a, b, c or d is computed being restricted to those messages estimated to be of one of the respective types. This may improve the effectiveness of those metrics. For each communication stream from a local entity to a remote entity the type of message may be a blockchain solution submission message. For each communication stream from a remote entity to a local entity the type of message may be a blockchain computation request message.

The network analyser may be configured to treat metric c as indicating an increased likelihood of the presence of traffic of the specific type if the said distribution of metric c is more concentrated. The network analyser may be configured to treat metric d as indicating an increased likelihood of the presence of traffic of the specific type if the said distribution of metric d has a distribution of a predetermined form. The network analyser may be configured to treat metric e as indicating an increased likelihood of the presence of traffic of the specific type if the said degree of correlation of metric e is lower. The network analyser may be configured to treat metric f as indicating an increased likelihood of the presence of traffic of the specific type if the said degree of correlation of metric f is lower and/or is lower for traffic in one direction between a pair of network entities than in the other direction. In each case, these criteria may improve the effectiveness of detection of the specific traffic.

The traffic of a specific type may be cryptocurrency mining traffic.

The network analyser may be configured to train an artificial intelligence model to detect traffic of the specific type in dependence on one or more of the said metrics. The network analyser may be configured to train the model by one-class classification. This may be an efficient way to train the model.

The network analyser may be configured to input one or more of the said metrics to an artificial intelligence model to thereby form an estimate of the presence of traffic of the specific type. This may allow traffic of the specific type to be detected in real-time or non-training traffic.

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 illustrates types of mining activity.

Figure 2 shows examples of NetStream records.

Figure 3 shows an outline of the traffic detection algorithm.

Figure 4 shows the architecture of a network.

Figure 5 illustrates mining traffic flows.

Figure 6 illustrates an occurrence pattern of Stratum traffic.

Figure 7 shows the steps in training a model.

Figure 8 shows the steps in using the model to analyse non-training traffic.

First, an outline will be given of the approach to be described in detail below for heuristically detecting network traffic of a specific, predetermined type. Figure 3 illustrates the approach. Initially a training phase is undertaken in order to train a model for detecting the specific traffic. Later the model is used to detect the traffic in live data. In the training phase training data 1 is provided. The training data may be in the form of records of individual time-bound communication streams as described above, for example NetStream records. The training data is analysed in a traffic inference block 2. The traffic inference block attempts to reconstruct artefacts of the communication sessions that led to the records. In step 3 it reconstructs TCP sessions; in step 4 it windows traffic by time and in step 5 it extracts features from that traffic. The extracted features serve as metrics by which the traffic can be analysed. Next, a training process 6 is undertaken on the features in order to form the weightings of an artificial intelligence model 7. The model is stored for future use. In the later phase, real-time traffic 8 is input to the traffic inference block 2, which extracts features in the same way as in the training phase. Then the features are input to the trained model. The model applies the predetermined weightings and outputs an indication of whether the traffic is indicative of the specific type of traffic. The hardware that implements the traffic inference block and that stores the model could be the same in both phases. Alternatively, one set of hardware could develop the trained model, and then the parameters of the model could be transmitted to and stored by another set of hardware which could perform the second phase, potentially at a location remote from where the model was trained.

The present approach will be described below with reference to NetStream records, but this is just an example and records from another network analysis system could be used.

The present approach will be described below with reference to mining the Monero cryptocurrency but this is just an example and it could be used to detect the mining of other cryptocurrencies.

The present approach will be described below with reference to the Stratum protocol, but this is just an example and it could be used to detect traffic using protocols other than Stratum.

It should be noted that different mining pools have different configurations. For this reason the system cannot rely on simple hardcoded rules: e.g. detecting New Job messages of a specific size.

Every N seconds, the system may receive as input a collection of NetStream data collected from a tap/span in the network to be protected. Mining is typically performed over the TCP protocol. The TCP protocol supports long communication sessions. NetStream records are collected with a timeout that can be shorter than the length of the mining TCP session, for example 100 seconds. To handle this case, the first step (3) of the method is to reassemble unidirectional TCP NetStream flows into TCP conversations spanning multiple flow records. The algorithm groups flow records together based on their source and destination addresses. In step 4 the NetStream flow records are split into overlapping successive windows by time, each window having the same time length. The overlap is due to the sliding window nature of splitting. This is to ensure that time-dependent features have a comparable value among all time windows.
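Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation described here: records are plain dictionaries with hypothetical keys "src", "dst", "start" and "end", a conversation is keyed by the unordered pair of addresses, and a record is assigned to every window that it overlaps.

```python
from collections import defaultdict


def group_conversations(records):
    """Group unidirectional records into conversations.

    Both directions between the same two addresses fall under one key,
    the unordered address pair.
    """
    conversations = defaultdict(list)
    for rec in records:
        conversations[frozenset((rec["src"], rec["dst"]))].append(rec)
    for recs in conversations.values():
        recs.sort(key=lambda r: r["start"])
    return dict(conversations)


def sliding_windows(records, length, step):
    """Yield (window_start, records overlapping that window).

    Windows all have the same length; successive windows overlap
    whenever step < length.
    """
    if not records:
        return
    t = min(r["start"] for r in records)
    last = max(r["end"] for r in records)
    while t <= last:
        yield t, [r for r in records if r["start"] < t + length and r["end"] >= t]
        t += step
```

For example, sliding_windows(recs, length=100, step=50) produces windows of 100 seconds that each share half their span with the next one. The values 100 and 50 are arbitrary choices for illustration.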

Before the resulting collection of time-windowed TCP conversations can be fed to the algorithm, it is desirable for it to be converted to a format that can conveniently be processed in subsequent steps. In step 5 the algorithm computes a number of features capable of capturing the intrinsic network behaviour of the communication between the miners and the pool/services from which they receive the jobs. The features can be described by vectors.

After the pre-processing step done in block 2, the feature vectors are sent to an algorithm in block 6 for one-class classification. In machine learning, one-class classification tries to identify objects of a specific class by learning from a training set containing only the objects of that class. The present implementation uses Isolation Forest, but any other one-class classification algorithm can be used, e.g., One-Class Support Vector Machine. In the present case, the training class is constituted only by mining traffic. The model is trained to identify mining traffic from a window of NetStream data of predetermined duration T. The output of the training step is a one-class classification model. In other examples, the training data could include traffic of the specific type to be detected (in this case mining traffic) and other traffic. Then the training can be done with knowledge from the training data of which training data relates to the specific type of traffic that is to be detected.

After the learning phase described above, the system can operate in prediction mode. In prediction mode, the model 7 is presented with feature vectors for real-time traffic. The model predicts whether or not each of those feature vectors represents mining traffic. Conveniently, predictions for the same TCP conversations can be grouped together and the conversation (i.e. the bidirectional communication stream) assigned a risk score depending on how many windows have been found to be classified as mining. When the risk score exceeds a predetermined value the stream may be assumed to represent mining traffic and the system may automatically take action in response to that determination: for example it may block traffic between the addresses involved in that stream and/or transmit an alert to an administrator.
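The grouping of per-window predictions into a per-conversation risk score might look like the sketch below. The text only says that the score depends on how many windows are classified as mining, so the use of a fraction of windows, the function names and the default threshold of 0.5 are all assumptions made for illustration.

```python
def conversation_risk_scores(window_predictions):
    """Map conversation id -> fraction of its windows classified as mining.

    window_predictions is an iterable of (conversation_id, is_mining) pairs,
    one pair per analysed time window.
    """
    totals, hits = {}, {}
    for conv_id, is_mining in window_predictions:
        totals[conv_id] = totals.get(conv_id, 0) + 1
        hits[conv_id] = hits.get(conv_id, 0) + (1 if is_mining else 0)
    return {conv_id: hits[conv_id] / totals[conv_id] for conv_id in totals}


def flag_conversations(window_predictions, threshold=0.5):
    """Return the conversations whose risk score exceeds the threshold."""
    scores = conversation_risk_scores(window_predictions)
    return sorted(conv_id for conv_id, score in scores.items() if score > threshold)
```

The flagged conversations would then be the candidates for blocking or for an alert to an administrator.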

Some advantages of this approach over some prior art methods include that cryptocurrency mining can be detected from simplified representations of traffic, such as NetStream data; that it does not depend on deep packet inspection and is therefore suitable for large networks and has milder hardware requirements; that it can operate at relatively modest cost because it does not require the deployment of host agents; that it reduces the possibility of evading detection; that it simplifies network security management; and that it reduces maintenance effort since the model does not require continual updating.

Figure 4 shows an example of the network architecture in which the present system may be employed.

Stratum is a clear-text communication protocol that mining pools and miners use to communicate. It is built over TCP/IP and it uses the JSON-RPC format. A typical exchange of Stratum messages during mining is depicted in figure 5.

Once a miner successfully logs into a mining pool it starts receiving jobs from the pool administrator. These are given in the form of New Job messages. New Job messages generally have the following format:

{"jsonrpc":"2.0","method":"job","params":{"blob":"0909aefe9de005621aad068034342918072dd44ddb9534f1a75b2e1fc8c71c26dada37abdc6d0b0000000033ad940d75bc03988711de6b68313380a44c12cf044111deaa1e908b5af14f0704","job_id":"16864","target":"cf8b0000"}}

On receiving a New Job message the miner starts to work on the mathematical problem (i.e. mining) using the information contained in the New Job message. As soon as it finds a solution it submits the solution to the administrator in the form of a Solution Submission message. Solution Submission messages generally have the following format:

{"id":76,"jsonrpc":"2.0","method":"submit","params":{"id":"1","job_id":"16871","nonce":"b51100e0","result":"77b70dd08e35514827a0cd9e75aec4eeb28a6816c2fd90dc649285d4ca050000"}}

The pool verifies the solution and sends back a result message (a Submission Result message). Submission Result messages generally have the following format:

{"id":76,"jsonrpc":"2.0","result":{"status":"OK"},"error":null}

This process, marked as “Unit” in figure 5, repeats until the network connection is terminated.
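To make the three message shapes concrete, the sketch below labels a clear-text Stratum message by inspecting its JSON fields. It is illustrative only: it applies where the payload is visible, whereas the detection approach described here works without access to packet contents. The function name and the returned labels are hypothetical.

```python
import json


def classify_stratum_message(raw):
    """Label a clear-text Stratum JSON message by its shape."""
    msg = json.loads(raw)
    method = msg.get("method")
    if method == "job":
        return "new_job"               # pool -> miner work instruction
    if method == "submit":
        return "solution_submission"   # miner -> pool candidate solution
    if "result" in msg and "method" not in msg:
        # A reply carrying a result; other replies may share this shape.
        return "submission_result"
    return "other"
```

A message with "method":"job" is therefore treated as a New Job message, one with "method":"submit" as a Solution Submission message, and a reply carrying only a result as a Submission Result message.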

Figure 6 shows a detailed extract of a communication between a miner and a Monero public pool over Stratum. The first and second rows show the TCP/Stratum packets and their size sent by the pool and the miner, respectively, during the mining process. The plot is based on a full packet capture (DPI) but the present algorithm works over more limited NetStream data. As an example, the network traffic sent in the window delimited by the dashed vertical lines is summarized in the following NetStream records:

Source: Miner address

Destination: Pool address

Packets: 11

Bytes: 1753

Start time: 1530189072

End time: 1530189144

Source port: 50122

Destination port: 7777

TCP flags: 24

Protocol: 6

Source: Pool address

Destination: Miner address

Packets: 6

Bytes: 952

Start time: 1530189072

End time: 1530189144

Source port: 7777

Destination port: 50122

TCP flags: 24

Protocol: 6
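As a small worked example of a quantity that can be derived from such records, the mean packet size in each direction follows directly from the byte and packet counts. The sketch below reads the miner-to-pool packet count as 11; the helper name is hypothetical.

```python
def mean_packet_size(total_bytes, packets):
    """Mean packet size in bytes, or 0.0 for an empty record."""
    return total_bytes / packets if packets else 0.0


# Counts taken from the two example records above.
miner_mean = mean_packet_size(1753, 11)  # miner -> pool
pool_mean = mean_packet_size(952, 6)     # pool -> miner
print(f"miner -> pool: {miner_mean:.1f} bytes per packet")
print(f"pool -> miner: {pool_mean:.1f} bytes per packet")
```

Mean packet sizes of this kind are the inputs to the correlation-based metrics e and f.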

Before the time-windowed TCP conversations can be fed to the algorithm, they are encoded. The encoding is carried out in the feature extraction phase. Examples of features that may be detected and the manner in which they are detected are described below.

Feature 1: Heuristic based on speculative estimations of protocol-specific packets

The Stratum protocol communicates over TCP. As part of a typical TCP conversation carrying mining traffic, a miner transmits acknowledgement (ACK) messages. The number of ACK messages sent by a miner is generally equal to the number of Solution Submission messages it sends plus the number of New Job messages it receives. The Solution Submission messages influence the total number of ACKs because for each of these the pool sends back a Submission Result message that gets acknowledged by the miner.

Generally, the miner sends only two types of packets during a mining session, which are ACK and Solution Submission. However, some protocols implement keepalive messages. For example, figure 6 shows tcp_keep_alive messaging. For feature 1 it is advantageous to estimate whether a keepalive mechanism is in use. This can be done in the following way.

For each NetStream record summarizing the miner’s network traffic in a TCP conversation (e.g. the dashed selection in Fig. 6) the total number of packets and the total number of bytes are known. In addition, any ACK packets can be assumed to be of one of the sizes specified by IANA, i.e. 40, 44, 48, 52, 56, 60, 64, 68, 72, 76 or 80 bytes. Also, for a specific protocol the size of a Solution Submission message can be lower bounded. In the case of Stratum, the size of a Solution Submission message is greater than or equal to the ACK size + 72 bytes. 72 bytes is the minimum size of a Solution Submission message which contains only the values of the nonce and result keys of the JSON packet format illustrated above. If a communication stream from a local miner to a remote destination relates to a mining protocol it can be supposed that the total number of packets is equal to the number of ACKs plus the number of keepalive messages (if any) plus the number of Solution Submission messages.

The system estimates whether there is a keepalive mechanism. It is known that if a keepalive mechanism is in use then the interval between successive NetStream records describing one direction of the TCP conversation is upper bounded by a specific number, that number being lower than the NetStream aging timeout (e.g. 100s). In addition, this bounding amount often repeats more than once in a conversation. If keepalive is in use then for each pair of successive NetStream records for a potential miner-to-pool conversation the system computes the least number of keepalive packets they could contain. It can be supposed that when the NetStream records’ inter-arrival time is equal to the discovered keepalive duration then one keepalive has been sent. Furthermore, the number of possible keepalives can be estimated by considering the transmission rate.
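The keepalive estimate described above might be sketched as follows. This is an illustrative sketch only, not the applicant's implementation: the function name, the `tolerance` and `min_repeats` parameters, and the use of record start times are all assumptions introduced here. It treats the largest gap between successive records as the candidate bound, and accepts it only if it is below the aging timeout and repeats.

```python
def estimate_keepalive_interval(record_times, aging_timeout=100.0,
                                tolerance=1.0, min_repeats=2):
    """Return a candidate keepalive interval in seconds, or None.

    record_times: sorted start times of the NetStream records for one
    direction of a conversation (hypothetical input format).
    """
    # Gaps between successive records in the same direction.
    gaps = [later - earlier
            for earlier, later in zip(record_times, record_times[1:])]
    if not gaps:
        return None
    # The largest gap is the candidate upper bound on the interval.
    bound = max(gaps)
    # A bound at or above the aging timeout says nothing about keepalive.
    if bound >= aging_timeout:
        return None
    # The bound should repeat more than once (within a tolerance).
    repeats = sum(1 for gap in gaps if bound - gap <= tolerance)
    return bound if repeats >= min_repeats else None
```

A conversation whose largest gap recurs several times yields that gap as the estimate; one whose largest gap occurs only once, or reaches the aging timeout, yields `None`.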

Once this information has been computed the speculation process for the miner can be started.

For each NetStream record relating to a potential mining conversation from the miner the system tests the feasible ACK sizes and, using the information listed above, it builds a constrained system of mathematical equations for which it then finds the list of possible solutions. These have the following format:

[Solution Submission size, number of Solution Submissions, number of New Jobs, ACK size, number of ACKs, number of keepalives]
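One way to picture the speculation for a single miner-side record is a brute-force enumeration over the feasible ACK sizes. The sketch below is hedged: it assumes that all Solution Submission messages in a record share one size and that keepalive packets have the same size as an ACK, neither of which is stated in the text, and the function and constant names are hypothetical.

```python
ACK_SIZES = range(40, 84, 4)   # 40, 44, ..., 80 bytes, as listed above
MIN_SUBMISSION_EXTRA = 72      # Stratum lower bound given in the text

def speculate_miner_record(total_packets, total_bytes, n_keepalive=0):
    """Enumerate tuples (submission size, n submissions, n New Jobs,
    ACK size, n ACKs, n keepalives) consistent with one record."""
    solutions = []
    for ack_size in ACK_SIZES:
        for n_sub in range(total_packets - n_keepalive + 1):
            # packets = ACKs + keepalives + Solution Submissions
            n_ack = total_packets - n_keepalive - n_sub
            # ACKs = Solution Submissions + New Jobs received
            n_new_jobs = n_ack - n_sub
            if n_new_jobs < 0:
                break  # only gets more negative as n_sub grows
            # Assumption: a keepalive packet is ACK-sized.
            remaining = total_bytes - (n_ack + n_keepalive) * ack_size
            if n_sub == 0:
                if remaining == 0:
                    solutions.append(
                        (0, 0, n_new_jobs, ack_size, n_ack, n_keepalive))
                continue
            # Assumption: every submission in the record has one size.
            if remaining <= 0 or remaining % n_sub:
                continue
            sub_size = remaining // n_sub
            if sub_size >= ack_size + MIN_SUBMISSION_EXTRA:
                solutions.append((sub_size, n_sub, n_new_jobs,
                                  ack_size, n_ack, n_keepalive))
    return solutions
```

For a record of 10 packets and 720 bytes, the enumeration includes the tuple for two 152-byte submissions acknowledged with 52-byte ACKs, alongside any other combinations that satisfy the same constraints.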

After iterating over all the miner’s records in the candidate TCP conversation, the system starts speculations for the NetStream records relating to communications from the potential pool to the potential miner. The NetStream records give the following information: total number of packets, total number of bytes. Further, in the Stratum protocol the total number of packets from the pool is equal to the number of ACKs from the pool plus the number of keepalive messages plus the number of New Job messages plus the number of Submission Result messages. It is also known that the number of Solution Submission messages (from the miner) equals the number of Submission Result messages (from the pool), and that the number of the miner’s keepalive messages equals the number of the pool’s keepalive messages. It is also known that the number of ACK-only packets sent from the pool is not equal to the number of Solution Submission messages from the miner: since Stratum employs TCP, the pool often embeds the ACK flag for a Solution Submission message in the Submission Result packet it sends. In the case of Monero it is also known that the size of a New Job message is greater than or equal to the ACK size plus 152 bytes. 152 bytes is the minimum size of a New Job message which contains only the blob, which is the data the miner must have in order to start mining. It is also known that the size of the Submission Result message equals the ACK size plus a value that is from 63 to 70 inclusive. Across different pools the structure of the Submission Result message of Stratum is typically the same. What changes is the value of the id key in the JSON packet (see example above). The id keeps track of the number of Solution Submission messages from the miner. By bounding the Submission Result size at the ACK size plus 70 it can be assumed that the miner is detected before millions of submissions are sent, which might take days.

In addition, for each NetStream record for communications from the pool the system has a corresponding list of feasible speculations regarding the miner’s network traffic, which has been retrieved in the previous step. By reassembling the NetStream records into TCP conversations the system can also find, within each conversation, the pairs of flows which summarize the traffic sent in the same time window but in opposite directions (from pool to miner and vice versa).

At this point, the system has formed a list of tuples having the following structure:

(pool’s NetStream 1, [list of solutions from the speculation on the corresponding miner’s NetStream 1]), (pool’s NetStream 2, [list of solutions from the speculation on the corresponding miner’s NetStream 2]), ...

This list can be combined with the information available about the pool, and iterated over to obtain a constrained system of mathematical equations. From its results the system can obtain a final solutions list. An example of an element in such a list is given below.

(Number of repetitions, Solution Submission size, number of Solution Submissions, New Job size, number of New Jobs, ACK size), ...

From this list the following features can be computed:

1. Percentage of times the top solution (highest number of repetitions) repeats with respect to the number of available solutions in the final list.

2. Difference between maximum and minimum number of repetitions of the final list of solutions.

3. Standard deviation of the number of repetitions.

The reasoning behind these features is that if the TCP conversation contains mining network traffic then the present problem will have a single solution that repeats multiple times across different NetStream pairs, whereas if the traffic is not related to mining there will be many feasible solutions, each with a small number of repetitions.

4. Packet size ratio of New Job versus Solution Submission. It has been observed that the ratio of these two Stratum messages is almost constant across different mining pools.

5. Standard deviation of pool’s New Job rate. At the end of the speculation process, for each NetStream record of the pool, the system has estimated how many New Job messages have been sent in a specific time window equal to the NetStream record duration. Thus, it knows the per-record New Job rate. Then, it computes the standard deviation by considering all the rates of the different pool’s NetStream records composing the TCP conversation. This feature aims at capturing the fact that a mining pool sends New Job messages at an almost constant rate (the rate is different across pools).

6. Standard deviation of miner’s Solution Submission rate. This follows an analogous process to the above feature. This feature aims at capturing the variable difficulty (VARDIFF) algorithm deployed by mining pools. Variable difficulty will adjust the New Job difficulty (the target key in the New Job JSON packet shown above) so that the miner submits a specific number of shares a minute.
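The first three features in the list above can be illustrated with a short sketch over the repetition counts of the final solutions list. The names are hypothetical, and the first feature is interpreted here as the top solution's share of all repetitions, which is one possible reading of the text rather than a confirmed definition. The population standard deviation is likewise an assumption.

```python
from statistics import pstdev

def repetition_features(repetitions):
    """Compute skew indicators from per-solution repetition counts.

    repetitions: the first element of each tuple in the final
    solutions list (hypothetical input format).
    """
    if not repetitions:
        raise ValueError("final solutions list is empty")
    top = max(repetitions)
    return {
        # Assumed reading of feature 1: share of all repetitions
        # accounted for by the most repeated solution.
        "top_share_pct": 100.0 * top / sum(repetitions),
        # Feature 2: spread between most and least repeated solution.
        "rep_range": top - min(repetitions),
        # Feature 3: standard deviation of the repetition counts.
        "rep_std": pstdev(repetitions),
    }
```

A mining-like conversation, with one dominant solution, gives a high share and a large range, while many equally rare solutions give a low share and a standard deviation near zero.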

Feature 2: Correlation between the mean packet size of client and server

The feature is computed by iterating over the set of NetStream records that together represent a single time-windowed TCP conversation. First, the records are grouped by source address. Then, per group, the system computes the mean packet size over the different NetStream records, thus obtaining two lists of per-record mean packet sizes: one for local to remote traffic and one for remote to local traffic. On these lists, different correlation metrics are computed. An example of a suitable metric is the Pearson correlation coefficient.

These features try to capture the following behaviour observed in mining. When the pool sends a New Job message the miners start to work on it, and stop working on their previous jobs. For this reason, after the pool sends a New Job it usually takes some time before the miner finds a plausible solution. At the network level this often translates into a pool’s NetStream record showing a high mean packet size (since a New Job message is the biggest Stratum message, see Fig. 6) and the corresponding miner’s NetStream record showing a low mean size, because the miner has just acknowledged the reception of the New Job but needs some time to find a solution. Conversely, when the miner sends a Solution Submission message the pool usually has not sent a New Job message in the previous seconds, and additionally the pool needs to send a Submission Result, plus sometimes an ACK, which decrease the mean packet size since they are remarkably smaller than New Job messages. Hence in mining, the two lists of mean packet sizes are anti-correlated.
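A minimal sketch of the Feature 2 computation follows. The record layout (dictionaries with `src`, `bytes` and `packets` keys) is a hypothetical stand-in for NetStream records, and the sketch assumes the two directions yield equally long lists already paired in time order with at least one packet per record.

```python
from math import sqrt

def mean_packet_sizes(records, source):
    """Per-record mean packet size for records sent by `source`."""
    return [r["bytes"] / r["packets"] for r in records if r["src"] == source]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # sketch convention: constant series carry no signal
    return cov / sqrt(var_x * var_y)

def feature_2(records, local, remote):
    """Correlation between the two directions' mean packet sizes."""
    return pearson(mean_packet_sizes(records, local),
                   mean_packet_sizes(records, remote))
```

A value close to -1 corresponds to the anti-correlated pattern described above, in which large pool records coincide with small miner records and vice versa.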

Feature 3: Correlation between number of packets and the mean packet size

This feature is similar to Feature 2, but it looks instead at the correlation between the mean packet size and the number of packets per NetStream record, both for traffic flowing in the same direction (same source address).

The feature captures the fact that when a miner submits a solution it also triggers a response from the pool, which the miner then needs to ACK (TCP traffic). Hence, a Solution Submission message usually results in an increased number of packets sent by both miner and pool. Furthermore, this causes the mean packet size of the miner to increase and that of the pool to decrease. So in this case there is correlation for traffic in one direction and anti-correlation for traffic in the other direction.

Feature 4: Ratio of transmitted bytes

The feature is computed by iterating over the NetStream records forming a time-windowed TCP conversation. For each NetStream record in one direction (from A to B) the system finds the corresponding answer (records from B to A) and computes the ratio of bytes sent to bytes received. On the final series of values, it computes the five-number summary and the mean, standard deviation, interquartile range and maximum-to-minimum ratio.
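The summary statistics for Feature 4 could be computed along the following lines. This is a sketch under assumptions: the input is taken to be already-matched (bytes sent, bytes received) pairs, the quartiles use the "inclusive" method of Python's `statistics.quantiles`, and the population standard deviation is used. The text does not specify any of these choices.

```python
from statistics import mean, pstdev, quantiles

def byte_ratio_features(pairs):
    """Summarise the sent/received byte ratio over matched records.

    pairs: list of (bytes_sent, bytes_received) tuples, one per pair
    of opposite-direction records (hypothetical input format).
    """
    ratios = [sent / received for sent, received in pairs if received > 0]
    if len(ratios) < 2:
        raise ValueError("need at least two usable record pairs")
    q1, median, q3 = quantiles(ratios, n=4, method="inclusive")
    return {
        # Five-number summary: min, Q1, median, Q3, max.
        "five_number": (min(ratios), q1, median, q3, max(ratios)),
        "mean": mean(ratios),
        "std": pstdev(ratios),
        "iqr": q3 - q1,
        "max_min_ratio": max(ratios) / min(ratios),
    }
```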

Feature 5: Mean packet size

This feature looks at the distribution of the mean packet size for NetStream records flowing in the same direction (same source address). Specifically, it computes the standard deviation, interquartile range and (maximum-to-minimum) mean size.

Feature 6: Transmission rate

This feature looks at the distribution of the transmission rate (bytes/s) for NetStream records flowing in the same direction (same source address). Specifically, it computes the standard deviation, interquartile range and minimum rate.
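As an illustration of Feature 6, the per-record rate can be derived from the byte count and the start and end times that each record carries. The field names below are hypothetical, and the quartile method and population standard deviation are assumptions of this sketch.

```python
from statistics import pstdev, quantiles

def transmission_rate_features(records):
    """Spread of the bytes-per-second rate over same-direction records."""
    # Records with zero duration are skipped to avoid division by zero.
    rates = [r["bytes"] / (r["end"] - r["start"])
             for r in records if r["end"] > r["start"]]
    if len(rates) < 2:
        raise ValueError("need at least two records with a duration")
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    return {"std": pstdev(rates), "iqr": q3 - q1, "min": min(rates)}
```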

The above three features (4, 5 and 6) are aimed at addressing the behaviour that in a mining communication there is a finite set of possible packets exchanged and there is a constant communication between miner and pool which follows a specific pattern (see the block marked “unit” in Figure 5).

Other possible features include:

- The number of bytes sent from each of a candidate client and a candidate server, and their interquartile range and standard deviation.

- The packet inter-arrival time for traffic in each direction between a candidate client and a candidate server, and their interquartile range, standard deviation and maximum-minimum difference.

Once the collection of time-windowed TCP conversations has been properly encoded, as explained above, the extracted feature vectors are fed as input to the trained model.

The detection model follows a one-class classification approach. One-class classification (OCC) tries to identify objects of a specific class by learning from a training set containing only the objects of that class. This can make one-class classification harder than conventional multi-class/binary classification. The task in OCC is to define a classification boundary around the positive (or target) class, such that it accepts as many objects as possible from the positive class, while it minimizes the chance of accepting non-positive (or outlier) objects. Since only one side of the boundary can be determined, in OCC, it is hard to decide, on the basis of just one class how tightly the boundary should fit in each of the directions around the data. It is also harder to decide which attributes should be used to find the best separation of the positive and non-positive class objects. To aid this separation process, the present system uses one or more of the above features for the target class.

The positive (target) class is represented by NetStream-formatted TCP conversations containing mining traffic only and these constitute the training data. This data is used to train the OCC algorithm which is based on Isolation Forest.

The training process is illustrated in Figure 7. The training data is first grouped by TCP conversations which are then split into overlapping windows. The amount of overlap (S) and the overall window size (T) are tuneable parameters of the model. The windowed TCP conversations go through the feature extraction process which encodes each of them in a feature vector. The collection of all these vectors is used to train the algorithm which is based on the Isolation Forest.
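The windowing step can be sketched as below, with T as the window size and S as the overlap, so that successive windows start T - S apart. Assigning a record to a window by its start time, and the dictionary record layout, are assumptions of this sketch rather than details given in the text.

```python
def split_into_windows(records, window_size, overlap):
    """Split one conversation's records into overlapping time windows.

    window_size corresponds to T and overlap to S in the text; both
    are in the same time unit as the records' "start" field.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if not records:
        return []
    step = window_size - overlap
    first = min(r["start"] for r in records)
    last = max(r["start"] for r in records)
    windows = []
    window_start = first
    while window_start <= last:
        # Assumption: a record belongs to a window if it starts in it.
        members = [r for r in records
                   if window_start <= r["start"] < window_start + window_size]
        if members:
            windows.append(members)
        window_start += step
    return windows
```

Each returned window would then be encoded into one feature vector by the feature extraction process.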

An advantage of using one-class classification is that, because the mining traffic has unique characteristics, an OCC model primed with the feature set described above can provide a high detection rate and/or accuracy and may be easily integrated in any network environment without the need to frequently re-train the model. This can ease the task of managing the network.

The detection process is depicted in Figure 8. Once the detection model is deployed in a network environment, it receives every R minutes a collection of NetStream records summarizing the network traffic over the preceding N minutes. This is termed live or real-time network traffic. This data passes to the pre-processing block 2 where it undergoes the same transformation as in the training phase. The output of the pre-processing block is a sequence of feature vectors where each vector is extracted from a specific window of a TCP conversation. These vectors are then shown to the trained model, which labels each of them as either mining or not mining. Finally, labelled vectors belonging to the same TCP conversation are grouped together and an anomaly score is assigned to each conversation. The higher the number of feature vectors for a single TCP conversation labelled as mining, the higher the anomaly score. Other algorithms can be used to process the or each selected feature, or combinations of selected features, to develop an indication of the likelihood that a particular stream represents traffic of the predetermined type that is sought. The operations for training or analysis may be performed using one or more devices having one or more processors and access to memory storing software executable by the processor(s) to perform the operations described above.
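The final grouping step might look like the sketch below. The text states only that the score grows with the number of vectors labelled as mining, so using the raw count as the score is an assumption, as are the function name and input format.

```python
from collections import defaultdict

def conversation_scores(labelled_vectors):
    """Aggregate per-window labels into a per-conversation score.

    labelled_vectors: iterable of (conversation_id, is_mining) pairs,
    one per feature vector labelled by the model (hypothetical format).
    """
    scores = defaultdict(int)
    for conversation_id, is_mining in labelled_vectors:
        # Assumption: the score is the count of mining-labelled windows.
        scores[conversation_id] += int(is_mining)
    return dict(scores)
```

Normalising the count by the number of windows per conversation would be an alternative if conversations of different lengths need to be compared.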

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.