Title:
SPAN CATEGORIZATION
Document Type and Number:
WIPO Patent Application WO/2021/058104
Kind Code:
A1
Abstract:
A system for categorizing spans in microservice applications, each span being a vector of property-value pairs, the system comprising a span categorization module configured to: receive a plurality of spans; extract a plurality of features from each span; and categorize the plurality of spans into a plurality of categories in dependence on the extracted features.

Inventors:
CARDOSO JORGE (DE)
SHAKHAT ILYA (DE)
Application Number:
PCT/EP2019/076076
Publication Date:
April 01, 2021
Filing Date:
September 26, 2019
Assignee:
HUAWEI TECH CO LTD (CN)
CARDOSO JORGE (DE)
International Classes:
G06F11/07; G06F11/34
Domestic Patent References:
WO2019099558A1, 2019-05-23
Foreign References:
US20180309637A1, 2018-10-25
US9531736B1, 2016-12-27
US20190057015A1, 2019-02-21
US20160269482A1, 2016-09-15
Other References:
"Fast webpage classification using URL features", CIKM '05 PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, pages 325 - 326
Attorney, Agent or Firm:
KREUZ, Georg (DE)
Claims:
CLAIMS

1. A system for categorizing spans in microservice applications, each span being a vector of property-value pairs, the system comprising a span categorization module configured to: receive a plurality of spans; extract a plurality of features from each span; and categorize the plurality of spans into a plurality of categories in dependence on the extracted features.

2. The system as claimed in claim 1, wherein the system further comprises a span buffer configured to select a set of k spans from the plurality of spans, the span buffer being configured to repeatedly perform the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans.

3. The system as claimed in claim 1 or claim 2, wherein the system further comprises a learning module configured to perform the following steps: extract a plurality of features from the plurality of spans; form a series of span abstractions corresponding to the subset of spans, each span abstraction comprising the features of a respective span; form a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identify nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and select the features of the paths to each such node as a pattern and storing that pattern for future detection.

4. The system as claimed in claim 3 as dependent on claim 2, wherein the plurality of spans is the set of k spans selected by the span buffer.

5. The system of claim 3 or claim 4, wherein the span categorization module is configured to extract features from the plurality of spans corresponding to the features of the pattern.

6. The system of any of claims 3 to 5, wherein the pattern corresponds to a category and the span categorization module is configured to categorize spans into that category by matching the extracted features of spans to the features of the pattern.

7. The system as claimed in any of claims 3 to 6, wherein the learning module is configured to cease mapping span abstractions to the trie when the rate of change of the number of such nodes is approximately zero.

8. The system of any of claims 3 to 7, wherein the predetermined threshold is the mean of the incoming edge frequencies of all of the nodes in the span trie.

9. The system of any preceding claim, wherein each span is a vector of property-value pairs describing the state of a microservice at a given time, and the spans in each category form a time series of spans.

10. The system of claim 9, wherein the system is further configured to assign a time series ID to the spans in each category.

11. The system of any preceding claim, wherein the extracted features comprise components of a URL.

12. The system of any preceding claim, wherein the extracted features comprise at least one of a method, URL endpoint, path or port.

13. The system of claim 12 as dependent on any of claims 3 to 11, wherein the port number of the spans is used as the root node of the span trie.

14. A method for categorizing spans in microservice applications, each span being a vector of property-value pairs, the method comprising: receiving a plurality of spans; extracting a plurality of features from each span; and categorizing the plurality of spans into a plurality of categories in dependence on the extracted features.

15. A method for selecting a set of k spans from a plurality of spans, each span being a vector of property-value pairs, the method comprising repeatedly performing the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans.

16. A method for determining a pattern of features for extracting from a span in a microservice application, each span being a vector of property-value pairs, the method comprising: extracting a plurality of features from a plurality of spans; forming a series of span abstractions corresponding to the spans, each span abstraction comprising the features of a respective span; forming a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identifying nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and selecting the features of the paths to each such node as a pattern and storing that pattern for future detection.

17. A computer program that, when executed by a computer, causes the computer to perform the method of any of claims 14 to 16.

Description:
SPAN CATEGORIZATION

FIELD OF THE INVENTION

This invention relates to categorizing data, for example for analysis in microservice applications, using real-time distributed traces and spans, online model learning and machine learning.

BACKGROUND

For large-scale and complex microservice applications, it is fundamental to monitor and analyse streams of distributed traces to ensure quality of service and high reliability.

Among other benefits, the use of distributed tracing technology enables the monitoring of the communication of microservice applications using traces. Traces are composed of spans. They contain information that can be used to detect anomalies and performance bottlenecks at runtime. A span S_t is a vector of property-value pairs (p_i, v_i) describing the state, response time, and/or other characteristics of a microservice at a given time t. Collectively, spans of the same type are grouped together to form a time series {S_1, S_2, ..., S_T}, where the subscript t indicates time. Time series of spans may conveniently be used to analyse microservice applications. For example, if the property p_i = response time, characterizing inter-microservice communication, suddenly increases its value v_i at times t_i, t_{i+1}, t_{i+2}, this may indicate a runtime problem with the underlying distributed application.

However, analysing millions of traces and their spans as time series requires the aggregation of spans using meaningful attributes. Thus, a categorization technique is required to assign spans to distinct categories, often to time series, for a posteriori fine-grained analysis. Classic techniques are inadequate, since they use distance metrics incompatible with the properties of spans. Furthermore, a form of online learning solution is needed since, a priori, there is no training set representing all classes: the initial number of span classes is unknown.

To create time series by grouping spans, existing solutions use the Uniform Resource Locators (URLs) (RFC 1738) of inter-microservice calls. For example, when microservice m_1 calls microservice m_2, the distributed tracing system records a span S_t = [(start, ts_start), (end, ts_end), (url, m_2)], where ts_start and ts_end are the timestamps indicating when the request started and when it ended (i.e., the response time is ts_end - ts_start), and url is the URL of microservice m_2 which was called (for example, http://192.168.0.12:5002/v2/AB12345/instance/98CD765/delete). Depending on the level of aggregation required, pre-processing can be used to remove the scheme (i.e., “http”), the host (i.e., “192.168.0.12”) and the port number (i.e., “:5002”). Afterwards, distance or string-based matching methods are used to group spans with the same pre-processed URL. The immediate limitation is that when the path /v2/AB12345/instance/98CD765/delete contains so-called path parameters, existing approaches generate too many groups. As a negative consequence, many time series are also generated. This occurs because AB12345 is a customer ID and 98CD765 is a resource ID. Both are variables. Often, these path parameters have no effect on the response time of remote procedure calls and it is desirable to ignore them.

Researchers have investigated a similar problem which involves the clustering of web pages using their URLs. These approaches could potentially be used to solve the problem at hand of grouping similar spans using URLs. Since web pages are documents for human consumption, the methods used were from the field of text mining and text analysis. Existing approaches extract orthographic features from URLs to describe sequences of text units, such as n-grams, to be processed by machine learning algorithms (for example, Support Vector Machines).

Notable attempts include that described in ‘Fast webpage classification using URL features’, CIKM '05 Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 325-326. This approach segments the URL into meaningful chunks and adds component, sequential and orthographic features to model salient patterns. The resulting binary features are used in supervised maximum entropy modelling.

The approach described in ‘Purely URL-based topic classification’, WWW 2009, Proceedings of the 18th International Conference on World Wide Web, uses two methods to extract feature vectors: using whole tokens as features, and using letter n-grams of the tokens. Support Vector Machine (SVM), Naive Bayes (NB) and Maximum Entropy (ME) machine learning algorithms are used on the extracted features.

Distance metrics can also be used to classify data. For example, the approach described in ‘CLUE: Clustering for Mining Web URLs’, published in the 2016 28th International Teletraffic Congress (ITC 28), uses the concept of URL distance to compose clusters of URLs using the well-known DBSCAN algorithm. However, using such distance metrics can be problematic, since they often make the assumption that path elements are words or contain words. As a result, when these metrics are used to group spans with URLs /v2/AB12345/instance/98CD765/delete and /v2/XY98765/instance/12AB345/delete, the generated token sets {v2, AB12345, instance, 98CD765, delete} and {v2, XY98765, instance, 12AB345, delete} are separated by a large syntactic distance. Nonetheless, in the domain of inter-microservice communication their distance is very small, since they refer to the same function call. They simply have different parameters. Thus, in the field of distributed traces, the semantics of URL paths cannot be ignored.

‘Highly Efficient Algorithms for Structural Clustering of Large Websites’, WWW 2011, Proceedings of the 20th International Conference on World Wide Web, discloses a scalable algorithm for structurally clustering webpages for extraction using only the URLs of the webpages. The algorithm runs in linear time complexity in the number of webpages. It uses a closed-world approach and assumes that all the objects are known before clustering URLs, and therefore cannot adapt to form new categories during processing.

It is desirable to develop a categorization method that overcomes these problems.

SUMMARY OF THE INVENTION

According to a first aspect there is provided a system for categorizing spans in microservice applications, each span being a vector of property-value pairs, the system comprising a span categorization module configured to: receive a plurality of spans; extract a plurality of features from each span; and categorize the plurality of spans into a plurality of categories in dependence on the extracted features.

The system may further comprise a span buffer configured to select a set of k spans from the plurality of spans, the span buffer being configured to repeatedly perform the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans. This may allow a representative sample of spans to be analysed if there are insufficient computational resources available to analyse all incoming spans. This may allow the system to handle span bursts.

The system may further comprise a learning module configured to perform the following steps: extract a plurality of features from the plurality of spans; form a series of span abstractions corresponding to the subset of spans, each span abstraction comprising the features of a respective span; form a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identify nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and select the features of the paths to each such node as a pattern and storing that pattern for future detection. This may allow the features to be extracted from the spans in order to categorize them to be identified. This may also allow for progressive learning, as span categories may be learned as new spans are observed by the system.

The plurality of spans may be the set of k spans selected by the span buffer. This may allow the system to learn from a representative sample of spans to be analysed if there are insufficient computational resources available to learn from all incoming spans. This may allow the system to handle span bursts.

The span categorization module may be configured to extract features from the plurality of spans corresponding to the features of the pattern. The pattern may correspond to a category and the span categorization module may be configured to categorize spans into that category by matching the extracted features of spans to the features of the pattern. Therefore, the system may learn the features to extract in order to meaningfully categorize the plurality of spans.

The learning module may be configured to cease mapping span abstractions to the trie when the rate of change of the number of such nodes is approximately zero. The approach may therefore automatically stop when the change rate derivative converges. This may provide an effective stop condition preventing the system from analysing redundant spans which do not add additional information to the knowledge model.

The predetermined threshold may be the mean of the incoming edge frequencies of all of the nodes in the span trie. This may be a convenient criterion for determining schema nodes.

Each span may be a vector of property-value pairs describing the state of a microservice at a given time, and the spans in each category form a time series of spans. The system may be further configured to assign a time series ID to the spans in each category. This allows each category to represent a time series of spans. The system may therefore be used, for example, for time series analysis.

The extracted features may comprise components of a URL. The extracted features may comprise at least one of a method, URL endpoint, path or port. This may allow for improved processing efficiency and the system may achieve a low running time complexity. This makes the approach scalable and suitable to efficiently process spans originating from large-scale microservice applications.

The port number of the spans may be used as the root node of the span trie. This may be a convenient implementation.

According to a second aspect there is provided a method for categorizing spans in microservice applications, each span being a vector of property-value pairs, the method comprising: receiving a plurality of spans; extracting a plurality of features from each span; and categorizing the plurality of spans into a plurality of categories in dependence on the extracted features.

According to a third aspect there is provided a method for selecting a set of k spans from a plurality of spans, each span being a vector of property-value pairs, the method comprising repeatedly performing the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans. This may allow a representative sample of spans to be analysed if there are insufficient computational resources available to analyse all incoming spans. This may allow the system to handle span bursts.

According to a fourth aspect there is provided a method for determining a pattern of features for extracting from a span in a microservice application, each span being a vector of property-value pairs, the method comprising: extracting a plurality of features from a plurality of spans; forming a series of span abstractions corresponding to the spans, each span abstraction comprising the features of a respective span; forming a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identifying nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and selecting the features of the paths to each such node as a pattern and storing that pattern for future detection. This may allow the features to be extracted from the spans in order to categorize them to be identified. This may also allow for progressive learning, as span categories may be learned as new spans are observed by the system.

According to a fifth aspect there is provided a computer program that, when executed by a computer, causes the computer to perform the methods described above. The computer program may be provided on a non-transitory computer readable storage medium.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:

Figure 1 illustrates an example of the architecture of a span categorization system and its relationships to the context environment.

Figure 2 shows an example of property-value pairs in a span.

Figure 3 shows an example of a span categorization system.

Figure 4 illustrates the interaction between the span buffer, the tracing server and the span extractor.

Figure 5 illustrates a method for selecting a set of k spans from a plurality of spans in the span buffer.

Figure 6 illustrates the steps of a progressive learning process to extract patterns from a plurality of spans.

Figure 7 shows an example of the structure of a HTTP endpoint.

Figure 8 shows examples of span abstractions.

Figure 9 illustrates port indexing.

Figures 10(a)-(c) show a span trie constructed using three qualified paths (q. paths).

Figure 11 shows a span trie with terminal nodes.

Figure 12 illustrates identifying schema and parameter nodes in a span trie.

Figure 13 illustrates the relative change of node types as more spans are analysed.

Figures 14(a) and 14(b) illustrate reducing a span trie.

Figure 15 illustrates a method for determining a pattern of features for extracting from a span.

Figure 16 illustrates a method of categorizing spans.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a progressive learning technique for span categorization which assigns distinct, but related, spans into categories and dynamically adapts as new categories appear in span streams. For example, spans with similar URLs are assigned to the same category since they inherently have similar behaviour.

Categorization is closely related to the concept of classification. The term categorization refers to the process of dividing the open-world of spans into groups whose members are in some way similar. On the other hand, classification refers to the process of assigning elements to classes in an existing classification system.

Figure 1 shows the context architecture of the proposed categorization system 100, particularly its relationship with a microservice application 101, a tracing server 102 and a data and processing tier 103.

The system 100 is particularly suitable for use with large-scale, complex distributed systems implemented by following a microservice architecture, which is an architecture style for developing software systems exposed as (micro)services interconnected by a network. Such a microservice application is shown at 101 in Figure 1. Services interact with other services using an inter-process communication protocol such as HTTP, AMQP, or a binary protocol such as TCP. The system described herein exemplifies its use with the HTTP protocol and REST, but it is not limited to them.

Distributed tracing may be used to monitor the microservice application. For each request, detailed statistics, metrics, and logging data is generated so that operators can understand distributed traffic flow and debug problems as they occur. Tracing infrastructures generate so-called traces which are collections of spans. Each span describes the state of a microservice application when an important milestone started or completed its execution. As an example, consider spans which capture the response time of communication between microservices. Spans may be collected, grouped, and analysed using statistical and machine learning algorithms with one of the following objectives in mind: transaction monitoring, root cause analysis, service dependency analysis, and performance optimization. The analysis may be achieved by ordering spans to form a time series using timestamps for the x-axis and a value of a property, such as intra-service call response time, for the y-axis.

As shown in Figure 1, the microservices of microservice application 101 communicate by calling other microservices via endpoints. During microservice-to-microservice communication using HTTP, the tracing library client 104 generates traces which are pathways showing how the handling of requests was conducted. Traces can be generated using distributed tracing systems (for example, Jaeger or Zipkin) or traditional logging (for example, Unix syslog). The tracing server module 102 receives tracing data generated by the microservices via tracing library clients. The received data is in the form of a span. Each span contains several property-value (p_i, v_i) pairs. Figure 2 shows the following properties as examples: trace id, span id, timestamp, application name, method, protocol, endpoint, response time, req_msg_size, rsp_msg_size and result_code.

In the data and processing tier 103, the spans associated with each category, which in this example are assigned a time series id, are stored in a database. In the embodiment shown in Figure 1 , the database is a time-series database (TSDB). Afterwards, artificial intelligence and statistical methods are applied to the time series for operations such as transaction monitoring, root cause analysis, service dependency analysis and performance optimization. This module may conveniently implement various behaviour change detection and anomaly detection algorithms, using parallel data analysis frameworks such as Spark for efficient processing.

Large-scale microservice applications can generate hundreds of thousands of traces which in turn generate millions of different spans. Each span generated by the tracing infrastructure has an invocation endpoint. If a data centre has 50,000 servers, each server has 10 services, and each service has 10 endpoints, which can be called with 10 different parameters, then there are 50 million different invocation endpoints which need to be monitored and analysed. If no specific method is used, the invocation endpoint can be used as a unique identifier for a time series to store the spans of an endpoint. Nonetheless, this has a major drawback: it generates too many time series, and many of those time series only contain a few observations, since the spans get diluted into a large pool of time series.

The span categorization system described herein is an efficient implementation which addresses this problem by assigning spans to categories, which may optionally afterwards be assigned to time series.

Figure 3 shows the overall architecture of the categorization system 300. The components of the system include span buffer 301, progressive learning module 302, span categorization module 303, span extractor 304, model constructor module 305, model adaptor module 306, pattern extractor module 307 and span collector 308. The operation of these components will now be described in the following sections.

The span buffer 301 is a component shared by the tracing server (102 in Figure 1) and by the span extractor 304 to mediate the different processing speeds so that these two modules can operate without mutual interference.

For large-scale microservice applications, the number of spans generated per unit of time can be much larger than the number of spans which can be analysed. Without any specific mechanism, the tracing server 102 can easily overload the span extractor 304 and progressive learning module 302. While adopting a buffer enables a strong decoupling to handle different speeds of operation, a buffer typically has a limited capacity. Thus, situations in which spans cannot be processed to build a span model are unavoidable.

In situations where the spans cannot fit into memory all at once, the span buffer may store k spans and drop all of the subsequent incoming spans. While simple, this approach does not guarantee that the spans stored in the buffer are a statistically representative sample of the population. This is important since, based on the operation of the microservice application, it can happen that a large set of related requests are executed in close temporal proximity. In such a scenario, the buffer contains a set of spans which is not representative of the true span population. Hence, a mechanism is required which enables a sample of k spans to be randomly chosen, having seen n spans, from an unknown span domain size N, and which guarantees that each span has an equal probability k/n of being chosen. To address this problem, a reservoir sampling mechanism is adopted which runs in O(n) time and uses O(k) space to sample spans. As illustrated in Figure 4, the approach comprises making a pass over the stream of n spans while maintaining and updating a span buffer structure. The buffer 301 stores the first k items of the span stream. Then, it iterates through the rest of the span stream S. For the i-th item of S, where i > k, it selects this span with probability k/i. If the buffer decides to store the new span, it randomly removes an old span from its structure and puts the new span into its place. It can be shown by induction that, when the whole stream is traversed, the buffer 301 contains k random samples. The reservoir sampling approach has a complexity of O(n), making it an efficient buffering solution for span categorization.

Figure 5 illustrates a method for selecting a set of k spans from a plurality of spans in the span buffer, each span being a vector of property-value pairs. The method comprises repeatedly performing steps 501 to 503. At step 501, the method comprises receiving an i-th span from the plurality of spans. At step 502, the method comprises randomly selecting that i-th span with a probability of k/i. At step 503, the method comprises, if that i-th span is selected, storing it as one of the k spans.
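By way of illustration, the buffering behaviour described above can be sketched in Python as follows. This is a minimal sketch of standard reservoir sampling rather than the actual interface of span buffer 301; the stream, the buffer size k and the span dictionaries are placeholder assumptions.

import random

def reservoir_sample_spans(span_stream, k):
    """Keep a uniform random sample of k spans from a stream of unknown length.
    The first k spans are stored; the i-th span (i > k) then replaces a randomly
    chosen buffered span with probability k/i."""
    buffer = []
    for i, span in enumerate(span_stream, start=1):
        if i <= k:
            buffer.append(span)
        else:
            j = random.randint(1, i)   # uniform draw in [1, i]
            if j <= k:                 # occurs with probability k/i
                buffer[j - 1] = span   # evict a random old span
    return buffer

# Usage (hypothetical span stream of property-value dictionaries):
sample = reservoir_sample_spans(({"span_id": n} for n in range(100000)), k=1000)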

This method may be conveniently applied where there are insufficient computational resources to process all incoming spans. This ensures that a representative sample of spans is input into the progressive learning module, the operation of which will now be described.

The span categorization system may categorize a large number of spans by applying a procedure implemented by the progressive learning module 302. From a conceptual perspective, progressive learning can be formulated as assigning a Boolean value to each pair (s_i, c_j) ∈ S × C, where S is the domain of spans and C = {c_1, c_2, ...} is a set of dynamically learned categories:

f((s_i, c_j)) → {T, F}, s_i ∈ S, c_j ∈ C, |S| » |C|    (1)

A value of T (true) assigned to (s_i, c_j) indicates a decision to assign s_i to c_j, while a value of F (false) indicates a decision not to assign s_i to c_j. Formally, the task is to approximate the unknown target function f: S × C → {T, F} (that describes how spans should be categorized) by means of a function f': S × C → {T, F}, called the categorizer, such that f and f' concur as much as possible. Similar spans are assigned to the same category since they exhibit similar behaviours expressed by a subset of their property-value pairs (p_i, v_i). Compared to existing techniques from the field of machine learning, the categories of spans are not pre-defined. Instead, they are learned dynamically as spans are seen.

The progressive learning module 302 is responsible for executing six main steps, as illustrated in Figure 6. These steps are:

- Step 601 : Extract Features

- Step 602: Index Ports

- Step 603: Model Spans

- Step 604: Label Nodes

- Step 605: If the Model is to be Extended: Go to Step 601 else Go to Step 606

- Step 606: Extract Patterns

The steps 601-604 are executed by the model constructor module 305, step 605 is executed by the model adaptor module 306, while step 606 is executed by the pattern extractor module 307. These steps are described in the following sections.

During learning, and for each span read from the span buffer 301, step 601 extracts information related to the calls made between microservices. In one embodiment, the following fields are extracted from the structure of a span (shown in Figure 2): method and endpoint. To make a call, a method is used together with the endpoint of a remote microservice. Calls between microservices are made using a method, also known as an HTTP method or HTTP verb. Examples of methods are GET, POST, PUT, and DELETE. Advantageously, the method is extracted from spans, since in many cases it is correlated with the behaviour of microservice calls. For example, the calls GET /v1/customer/123/ and DELETE /v1/customer/123/ may have totally different response times, since the first call retrieves information about customers while the second call deletes records about customers. Intuitively, the task of retrieving information and the task of deleting a record are very different procedures which have different response times. Thus, these endpoints should be assigned to different categories. Besides a method, a microservice call also includes an HTTP endpoint. An example of an HTTP endpoint is shown in Figure 7. An endpoint is a URL structure which identifies the endpoint (remote procedure) called during inter-process communication. An endpoint has a schema, user, password, host, port, path, query, and fragment (several of these elements are optional). In this embodiment, the following elements are further extracted from an endpoint: port and path. These two elements are considered since they are also highly correlated with, for example, the response time of invoking a remote microservice. Different microservices are typically assigned to an internal, well-known, unique port. Thus, different ports refer to distinct microservices and different paths refer to distinct functions. The path component is divided into path segments (keywords) using the separators {/, ?} to identify keyword boundaries.

For example, a span with a method GET and with the endpoint http://user:pass@cloud.com:5005/v2/instances/list is processed by the model constructor module to create the following structure, called a span abstraction: <port=5005, method=GET, path=(v2, instances, list)>

Figure 8 shows additional examples of the results of the span extraction step.

Whilst the above example describes the use of the properties method, port, and path, it is also possible to include additional or alternative properties. For example, the host can be included and concatenated with the port number to enable a more fine-grained categorization when the characteristics of the host are relevant. The HTTP result code (see Figure 2) may also be extracted to enable a categorization which also accounts for successful and unsuccessful remote microservice calls.
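As a rough illustration of step 601, a span abstraction of the form shown above could be produced as follows. This is a minimal Python sketch assuming spans arrive as dictionaries keyed by the property names of Figure 2; it is not the actual implementation of the span extractor 304.

from urllib.parse import urlsplit

def extract_span_abstraction(span):
    """Build a span abstraction <port, method, path segments> from a span's
    method and endpoint fields."""
    parts = urlsplit(span["endpoint"])
    # Split the path component into keywords using '/' and '?' as separators.
    segments = tuple(s for chunk in parts.path.split("/")
                     for s in chunk.split("?") if s)
    return {"port": parts.port, "method": span["method"], "path": segments}

span = {"method": "GET",
        "endpoint": "http://user:pass@cloud.com:5005/v2/instances/list"}
print(extract_span_abstraction(span))
# {'port': 5005, 'method': 'GET', 'path': ('v2', 'instances', 'list')}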

In step 602, the feature of the port number of span abstractions is used as a key to create an entry in a hash table, with its value referencing the root node of a trie structure, as shown in Figure 9. The trie is used to model spans for a specific port. Since ports are distributed uniformly across the hash table, the table enables spans to be efficiently assigned to a trie until a complete model is constructed.

While categorizing spans using only the port as a property is typically not computationally expensive, other more fine-grained categorization implementations require efficient indexing. For example, the number of keys of the form <host ip, port> can grow exponentially and generate several million entries for planet-scale microservice applications. Thus, the constant time O(1) access to the structure provided by hash tables is particularly advantageous.

In step 603, a trie data structure (also known as a prefix tree) is used to create a model for the spans. In its original form, a trie is a tree-like data structure comprising nodes and edges. It is similar to a binary tree, and it is a very efficient implementation of an ordered index for text-based keys. Retrieving and determining the existence of paths stored in a trie is linear. Thus, the trie is suitable for modelling spans. This structure is herein referred to as a span trie.

Let T = (N, E) be a span trie, with N being the set of nodes, E the set of edges, and r ∈ N the root node of T. The alphabet Σ of the span trie is the set of path segments and methods: {p | p ∈ path_i} ∪ {method_i}. Let S be a set of s strings over the alphabet Σ. A span trie T that stores the strings in S is a structure in which each node of T (except the root node) is labelled with a character c ∈ Σ. T has s terminal nodes, each associated with one string in S. The path from the root node to a terminal node corresponds to exactly one string.

To construct a span trie, the method and the path from a span abstraction are retrieved from the span extractor 304. Both features are joined to create a string s of S called a qualified path, with the form of a sequence (method, path). For example, (GET, v2, instance, list). Qualified paths (or q. paths) are added to the span trie.

Figures 10(a)-(c) respectively show a trie with the following three qualified paths which were added over a period of time:

Path 1: (a, b, c, d, e)

Path 2: (a, b, c, d, f)

Path 3: (a, g, h, d, f)

Nodes a, b, c, d, e, ... correspond to the elements of a qualified path, e.g., a=GET, b=v2, c=instance, d=AB123456, e=list. During the trie construction, edges are annotated with the number of qualified paths which go through them. This number is called the edge frequency. The frequency of an edge is a mapping ef: E → N\{0}, denoted ef(u, v). It represents the “flow” that passes through an edge. For example, in the span trie in Figure 10(b), the edge (a, b) has a frequency of ef(a, b) = 2.

Terminal nodes are used to distinguish nodes as end of string nodes. These nodes do not have outgoing edges or are nodes for which the number of incoming edges is greater than the number of outgoing edges. These nodes enable the identification of the last node of q. paths. In the figures, these nodes are coloured using light grey.

Figure 11 shows a span trie constructed using the following three qualified paths:

• (GET, v1, XYZ11, list)

• (GET, v1, XYZ11, list, RST22)

• (GET, v2, XYZ99, list, RST88)

The terminal nodes are identified as list, RST22 and RST88, shown at 1101, 1102 and 1103 in Figure 11 respectively.

Inserting and searching for a qualified path is efficient and can be achieved in linear time complexity O(n), where n is the length of the path.
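For concreteness, the construction of a span trie annotated with edge frequencies might look as follows. This is a minimal Python sketch under the definitions above; the class and function names are illustrative, not those of the patent's implementation.

class TrieNode:
    def __init__(self, label):
        self.label = label
        self.children = {}           # label -> TrieNode
        self.incoming_frequency = 0  # ef(parent, this node)

def insert_qualified_path(root, q_path):
    """Insert a qualified path (method, segment, ...) into the span trie,
    incrementing the edge frequency of every edge along the way."""
    node = root
    for label in q_path:
        child = node.children.get(label)
        if child is None:
            child = TrieNode(label)
            node.children[label] = child
        child.incoming_frequency += 1
        node = child
    return node  # last node of the qualified path (a terminal node candidate)

# The three qualified paths of Figure 11, inserted under one root (one root per port).
root = TrieNode(label=None)
for q_path in [("GET", "v1", "XYZ11", "list"),
               ("GET", "v1", "XYZ11", "list", "RST22"),
               ("GET", "v2", "XYZ99", "list", "RST88")]:
    insert_qualified_path(root, q_path)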

In step 604, the nodes are labelled. A span trie can be used to efficiently categorize spans to be assigned to a specific time series. A simple categorization approach comprises concatenating the elements of qualified paths from the root to terminal nodes to create a unique category or time series identifier. Nonetheless, a closer observation of the span trie structure reveals that three types of disjoint nodes exist. These types of nodes are method nodes, schema nodes and parameter nodes. For example, in the qualified path (GET, v1, XYZ11, list, RST22), GET is a method node, v1 and list are schema nodes, and XYZ11 and RST22 are parameter nodes. XYZ11 and RST22 correspond to path parameters.

These parameters are arguments that can be passed within an endpoint to parametrize the response of a microservice call. REST-based microservice applications support four types of parameters: 1) path parameters, 2) query string parameters, 3) header parameters, and 4) request body parameters. Described herein are examples of handling path parameters. Other types of parameters can be considered in other implementations by also adding them to a span trie.

Path parameters are found within the path of an endpoint, before the query string (identified with the symbol ‘?’). Path parameters are represented herein using curly brackets. For example, the path /v1/XYZ11/list/RST22 has two path parameters, XYZ11 and RST22, and the path is represented as /v1/{P1}/list/{P2}, where P1 and P2 are the parameters.

If a span trie is used to categorize spans, it needs to identify which nodes are parameter nodes, since these are not taken into account by the categorization procedure. Otherwise, the number of categories will be large and it is likely that only a few spans will be assigned to each category, which may make the application of analytics rather useless.

The objective of step 604 is therefore to label each node as a schema node or as a parameter node.

Two important insights from analysing span tries need to be considered for the node labelling task. Firstly, the number of distinct parameter nodes is substantially larger than the number of distinct schema nodes. Secondly, schema nodes occur with a higher frequency than parameter nodes. As qualified paths are added to a span trie, the number of distinct schema nodes has been observed to grow slowly when compared to the number of parameter nodes, since the domain of parameters is usually much larger than the schema domain. For example, if parameters {P1} and {P2} of the path /v1/{P1}/list/{P2} are a customer ID and a product ID, respectively, the size of their domains should be much larger than the number of elements that belong to schema nodes such as v1 and list.

To simplify the description of the node labelling procedure, in this example it is assumed that each depth d_i, i ∈ {2, ..., n}, of a span trie starting at the root node only has schema nodes or parameter nodes, but not both. The level of the root node is 0 and level 1 corresponds to method nodes. The level of all the other nodes is calculated using a breadth-first search (BFS) traversal. BFS queues every node at a fixed depth before visiting the next depth. Furthermore, two consecutive depth levels cannot contain parameter nodes. For microservice applications, this translates into having URL paths for which parameters always occur in the same position using the separator ‘/’, and no two parameters can occur in sequence. The following observation holds for span tries: |P_i| » |S_j|, where P_i and S_j are the sets of parameter and schema nodes at depths i and j respectively. The main insight from this observation is that, once sufficient spans have been observed, the number of distinct parameter nodes is far greater than the number of schema nodes. Distinct schema nodes are fewer and have a higher incoming edge frequency than parameter nodes. Parameter nodes are more numerous and have a lower incoming edge frequency than schema nodes. Thus, to identify schema nodes, the nodes with a high incoming edge frequency are identified. Correspondingly, parameter nodes have a low incoming edge frequency.

A first approach is to use a threshold variable k. When a node has a number of incoming edges greater than k, it is labelled as a schema node; otherwise it is a parameter node. However, this approach may sometimes fail since, as more spans are analysed, the number of incoming edges of some parameter nodes may increase beyond the threshold k, causing them to be incorrectly classified. On the other hand, schema nodes which are infrequent (as they are not often part of traces) would be misclassified as parameter nodes. A labelling method should therefore not use a global threshold, but should instead use information that is local and contextual to each node.

An alternative approach is to partition all of the nodes in the trie into two sets based on a threshold of the incoming edge frequency of nodes. A threshold for the two sets can be found by, for example, calculating the mean of the incoming edge frequencies. A high edge frequency is then one above the mean, and a low edge frequency one below the mean. An alternative approach, more robust to outliers, is to sort the edge frequencies, identify the two most distant consecutive edge frequencies in the sorted sequence, and use their mean as the boundary between low and high frequencies.

Figure 12 shows an example using the mean of the incoming edge frequencies to separate the two sets. In this example, the average edge frequency is 2.42. Therefore, the nodes with a high incoming edge frequency (>2.42) are {GET, v1, v2, list, info}, shown at 1201, 1202, 1203, 1204 and 1205 respectively. The nodes with a high incoming frequency are the schema nodes and are coloured in black in Figure 12. The procedure has a low running time complexity and is executed in O(n + m), where n and m are the number of nodes and edges, by analysing all the nodes once and counting the outgoing edges.
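A minimal sketch of this labelling step, reusing the TrieNode class from the earlier sketch and the mean incoming edge frequency as the threshold, might look as follows (one possible choice; the largest-gap threshold mentioned above could be substituted).

def label_nodes(root):
    """Label each non-root node as a schema node or a parameter node by
    comparing its incoming edge frequency with the mean over all nodes."""
    nodes, queue = [], [root]
    while queue:                                  # breadth-first traversal
        node = queue.pop(0)
        queue.extend(node.children.values())
        if node is not root:
            nodes.append(node)
    # Assumes a non-empty trie.
    threshold = sum(n.incoming_frequency for n in nodes) / len(nodes)
    return {node: ("schema" if node.incoming_frequency > threshold else "parameter")
            for node in nodes}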

In step 605, the labelling of nodes is suspended. The execution of steps 601-604 provides a method to add span abstractions to a span trie and to automatically colour terminal, schema and parameter nodes. For efficiency reasons, it is advantageous to stop the learning phase as soon as possible. Thus, a mechanism is required to estimate, at any given time, how many spans still need to be analysed to build a representative span trie. This presents a two-fold challenge, in that the system does not know how many distinct span abstractions exist and the system does not know when sufficient spans have been analysed to build a representative span trie. It is assumed that each span observed is random, independent from other spans, with equal probabilities, and that the total number of parameter nodes presented by all the spans is far greater than the number of schema nodes, for example, |P| » |S|. In such a scenario, it is desirable to determine the average number of spans which need to be analysed to see all of the possible parameter and schema nodes and, thus, stop the learning phase. It is assumed that a microservice application generates |S| spans over its lifespan, with N distinct nodes (schema or parameter nodes). The microservice application starts its operation and starts generating a stream of spans. Each time a new span is generated, the span trie is updated. It is desirable to know the estimated number of spans which need to be analysed to see all the N distinct nodes. E(S) denotes the expected number of spans which need to be analysed to see all the N distinct node types. Then E(0) = 0, and

E(S) = N · H_N = N · (1/1 + 1/2 + ... + 1/N)    (2)

where H_N is the partial sum of the harmonic series of the first N positive integers. For example, for N = 2, E(S) = 3 and for N = 7, E(S) = 18.15. Thus, 3 and 19 observations are needed to identify 2 and 7 distinct node types, respectively. For larger values of N the following well-known approximation may be used:

H_N ≈ ln N + γ    (3)

where γ ≈ 0.5772 is the Euler-Mascheroni constant. E(S) can be approximated as N → +∞ by:

E(S) ≈ N(ln N + γ)    (4)

The result is that, on average, N(ln N + γ) spans need to be collected to analyse at least one node of each type. As shown in Figure 13, when N = 100, 519 observations on average are required to collect all the different span types. Since parameter nodes typically have a larger domain size, when N = 10000, it takes 97876 observations.
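These expected values follow directly from E(S) = N·H_N; a quick check in Python (illustrative only) reproduces the figures quoted above:

import math

def expected_spans(n_distinct):
    """E(S) = N * H_N: expected number of spans to observe before each of the
    N distinct node types has been seen at least once."""
    return n_distinct * sum(1.0 / i for i in range(1, n_distinct + 1))

def expected_spans_approx(n_distinct):
    """Approximation E(S) ~ N (ln N + gamma) for large N."""
    return n_distinct * (math.log(n_distinct) + 0.5772)

print(round(expected_spans(7), 2))          # 18.15
print(round(expected_spans(100)))           # 519
print(round(expected_spans(10000)))         # 97876
print(round(expected_spans_approx(10000)))  # 97875 (approximation)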

These results offer the important insight that the derivative of the number of distinct schema nodes, as a function of the number of spans observed, stops changing much sooner than the derivative of the number of parameter nodes, since the domain size of schema nodes is smaller. Thus, the relative change rate of schema node labels may be analysed and the derivative used as a stop condition when it reaches the value zero (or a value close to, or approximately, zero). If the change rate of parameter nodes were used to derive a stop condition, it could happen that the training phase would never finish, as new parameter nodes can be seen over time. This scenario may occur, for example, if new customers or products are regularly added to an application and the endpoints use customer/product IDs to pass arguments to microservice calls.

Thus, the approach uses the following procedure to stop the learning phase:

1. α = 0.000001; β = 10

2. acc = 0

3. count_old = max_int

4. while acc < β:

5.     span = read_span()

6.     T = update_span_trie(span)

7.     count_new = |high_incoming_edge_frequency(T)|

8.     Δ = (count_new - count_old) / count_old

9.     if Δ < α:

10.         acc += 1

11.     else:

12.         acc = 0

While loop (4) controls how many spans the learning phase needs to process before terminating. Each iteration reads a span (5), updates the span trie T (6), and counts the number of nodes labelled as schema nodes (7) using the function |high_incoming_edge_frequency(T)|. The delta variable Δ stores the change rate of the span trie T (8). Variable acc is an accumulator which memorizes the number of consecutive iterations that had a zero or very small change rate (9-12). When β consecutive spans have not caused a change to the structure of the trie T (i.e., when the loop condition acc < β no longer holds), the procedure stops the learning phase.
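A runnable rendering of this procedure might look as follows. It is a sketch only: read_span, update_span_trie and high_incoming_edge_frequency are placeholders for the functions named in the pseudocode above, and refreshing count_old at the end of each iteration is an added assumption so that Δ is computed relative to the previous iteration.

def learn_until_stable(read_span, update_span_trie, high_incoming_edge_frequency,
                       alpha=0.000001, beta=10):
    """Stop the learning phase once beta consecutive spans leave the number of
    schema nodes (nodes with a high incoming edge frequency) essentially unchanged."""
    acc = 0
    count_old = None
    trie = None
    while acc < beta:
        span = read_span()
        trie = update_span_trie(span)
        count_new = len(high_incoming_edge_frequency(trie))
        if count_old:
            delta = (count_new - count_old) / count_old
            acc = acc + 1 if delta < alpha else 0
        count_old = count_new  # assumption: compare against the previous iteration
    return trie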

Once the learning phase has ended, when the rate of change of the number of schema nodes (those with an incoming edge frequency that is greater than a predetermined threshold) is approximately zero, the identified schema nodes are stabilized and step 606 in Figure 6 extracts patterns corresponding to the features of the identified paths, which will be used to assign spans to categories (and, optionally, time series ids) in the span categorization module.

The first procedure to execute in step 606 is to compress the span tries by replacing parameter nodes. For each span trie, the procedure is as follows. Firstly, for each level of nodes in the span trie, low edge frequency nodes (i.e. nodes having an incoming edge frequency that is below a predetermined threshold) having the same parent node are merged into a unique parameter node. Then, the newly created parameter nodes are each labelled with a unique identifier (for example, {P1}). Next, the edge frequency of the new parameter nodes is updated. For example, the span trie of Figure 14(a) is reduced to the trie shown in Figure 14(b).

The reduction of the trie is efficient and can be achieved in O(n + m), where n and m are the number of nodes and edges, using topological sort or the Breadth-First Search (BFS) algorithm. The reduced span trie can be further optimized by merging parameter nodes at the same level with different parent nodes, and by transforming the structure into a deterministic acyclic finite state automaton (DAFSA). Since a DAFSA allows the same vertices to be reached by multiple paths, this alternative data structure uses significantly fewer vertices than a trie.

The second procedure of step 606 is to extract patterns from the reduced span tries. Firstly, all of the paths from the root to the terminal nodes are determined. Then, for each path, the path support is calculated.

The path support is calculated as follows. For nodes with outgoing edges, the support is calculated as:

support = incoming_edge_frequency - outgoing_edge_frequency    (5)

For the other nodes without outgoing edges, the support is equal to the incoming edge frequency.

For example, in the span trie of Figure 14(b), the three existing patterns have the following support:

• port=1343, method=GET, path=v1, support = 1

• port=1343, method=GET, path=v1, {P1}, list, {P3}, support = 4

• port=1343, method=GET, path=v2, {P2}, info, support = 3

The most frequent patterns are used to create golden patterns. Each pattern with a support greater than or equal to a threshold λ is kept (alternatively, the top k patterns may be kept). The other patterns are ignored. In the previous example, if λ = 3, then the patterns <port=1343, method=GET, path=(v1, {P1}, list, {P3})> and <port=1343, method=GET, path=(v2, {P2}, info)> are kept.
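As an illustration, the support computation and golden-pattern selection over a reduced span trie could be sketched as follows, reusing the TrieNode class from the earlier sketch; the tuple layout and the threshold parameter lam are illustrative assumptions.

def extract_golden_patterns(root, port, lam):
    """Walk a reduced span trie, compute the support of every root-to-terminal
    path, and keep the patterns whose support is at least lam."""
    patterns = []

    def walk(node, path):
        outgoing = sum(c.incoming_frequency for c in node.children.values())
        # Leaf nodes: support equals the incoming edge frequency.
        support = (node.incoming_frequency - outgoing) if node.children else node.incoming_frequency
        if support > 0 and support >= lam:   # terminal node of at least one qualified path
            patterns.append((port, path[0], tuple(path[1:]), support))
        for child in node.children.values():
            walk(child, path + [child.label])

    for child in root.children.values():
        walk(child, [child.label])
    return patterns

# For the reduced trie of Figure 14(b) this would yield patterns such as
# (1343, 'GET', ('v2', '{P2}', 'info'), 3), mirroring the second golden pattern above.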

Using the Depth-First Search (DFS) algorithm, extracting all the paths between the root node and the terminal nodes is an efficient procedure, since it can be achieved in O(n + m), where n and m are the number of nodes and edges respectively.

Each pattern extracted from a span trie T is transformed into a 4-tuple (port, method, regex, ts_id), where regex is a regular expression for the endpoint and ts_id is a time series id. For example, the second pattern from the previous golden patterns is transformed into the 4-tuple:

• port = 1343

• method = GET

• regex = ^/v2/\w/list/\w$

• ts_id = 1343_GET_v2_*_info_*

The port and method fields are copied from the golden pattern. The regular expression regex is created by joining the path elements using the slash separator. Parameter nodes labelled with {Px} are transformed to match any character in the set [a-zA-Z0-9_], represented with \w in the previous example.

The time series id may be created by providing a unique string representation of the concatenation of the port, the method and the path regular expression. For example, this may be done by replacing slashes by underscores and the \w character set by the star symbol *. If domain knowledge is available, other mechanisms can be used to create regular expressions and time series ids.

The regex field is included into the general regular expression which will be used to parse HTTP/URL endpoints:

ep_regex = ^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:<Port>)(<regex>)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$

The output of step 606 (extract span patterns) is a list of 3-tuples (method, ep_regex, ts_id). The patterns are therefore represented by regular expressions.
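By way of illustration, the transformation of a golden pattern into such a tuple might be sketched as follows. The \w+ used for parameter nodes and the underscore/star naming of the ts_id are assumptions for the sketch; the exact regex and id formats of the patent's implementation may differ (see the example tuple above).

import re

def golden_pattern_to_tuple(port, method, path_elements):
    """Turn a golden pattern into (port, method, regex, ts_id); path elements
    written as '{..}' are parameter nodes."""
    regex_parts, id_parts = [], [str(port), method]
    for element in path_elements:
        if element.startswith("{"):            # parameter node, e.g. '{P2}'
            regex_parts.append(r"\w+")         # assumption: one or more [a-zA-Z0-9_] characters
            id_parts.append("*")
        else:                                  # schema node
            regex_parts.append(re.escape(element))
            id_parts.append(element)
    regex = "^/" + "/".join(regex_parts) + "$"
    return port, method, regex, "_".join(id_parts)

print(golden_pattern_to_tuple(1343, "GET", ["v2", "{P2}", "info"]))
# (1343, 'GET', '^/v2/\\w+/info$', '1343_GET_v2_*_info')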

The extracted span patterns may therefore each correspond to a category. When analysing the spans, the span categorization module can categorize spans into that category by matching the extracted features of spans to the features of the pattern.

Figure 15 summarises the above steps and illustrates a method for determining a pattern of features for extracting from a span in a microservice application, each span being a vector of property-value pairs. At step 1501, the method comprises extracting a plurality of features from a plurality of spans. At step 1502, the method comprises forming a series of span abstractions corresponding to the spans, each span abstraction comprising the features of a respective span. At step 1503, the method comprises forming a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie. At step 1504, the method comprises identifying nodes in the trie whose incoming edge frequency is greater than a predetermined threshold. At step 1505, the method comprises selecting the features of the paths to each such node as a pattern and storing that pattern for future detection.

In situations where there are insufficient computational resources to process all incoming spans, the progressive learning module may only process and extract patterns from the subset of spans selected by the span buffer. However, if sufficient resources are available, the progressive learning module may learn from all incoming spans (i.e., in the span extractor, features are extracted from all incoming spans).

Once the features to be extracted have been identified in the progressive learning module, these features are extracted from incoming spans in the span categorization module, 303 in Figure 3, the operation of which will now be described.

The span categorization module 303 processes new spans. It receives spans and extracts their method and endpoint using the same procedure described in step 601 (extracting features). These fields are matched against the method and the regular expression ep_regex of golden patterns.

When a match exists with a pattern, the span categorization module 303 emits the category to which a span is assigned, and may optionally assign a time series id to the span. Otherwise, the None keyword is returned and the span can be stored by the span collector module 308 for further processing.

Using the previous example, if a microservice A calls another microservice B, and a new span with method GET and with endpoint http://192.168.5.2:1343/v2/CUST03/list/PROD09 is generated by the trace server, the span categorization module iterates over the golden patterns to identify the category, and/or time series id, to emit. Since one of the golden patterns matches the method GET and the endpoint of the span, the system emits the corresponding time series id 1343_GET_v2_*_info_*.

This step is efficient, since building a Deterministic Finite Automaton (DFA) using a standard algorithm to recognize an ep_regex is done only once and has a running time of O(2^n), where n is the size of the regex. For microservice application endpoints, n is typically less than 10. Thus, the complexity of creating m patterns can be approximated by m·O(1). Once the DFA is built, processing an endpoint is linear, O(n), where n is the number of elements of a path.
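A rough sketch of this matching step, under the same placeholder pattern format as in the earlier sketches, is shown below; the compiled pattern and ts_id in the usage example are hypothetical and only serve to mirror the GET call discussed above.

import re
from urllib.parse import urlsplit

def categorize_span(span, golden_patterns):
    """Match a new span against golden patterns and emit its ts_id, or None when
    no pattern matches (the span is then kept by the span collector).
    golden_patterns: list of (port, method, compiled_path_regex, ts_id) tuples."""
    parts = urlsplit(span["endpoint"])
    for port, method, path_regex, ts_id in golden_patterns:
        if span["method"] == method and parts.port == port and path_regex.match(parts.path):
            return ts_id
    return None

patterns = [(1343, "GET", re.compile(r"^/v2/\w+/list/\w+$"), "1343_GET_v2_*_list_*")]
span = {"method": "GET",
        "endpoint": "http://192.168.5.2:1343/v2/CUST03/list/PROD09"}
print(categorize_span(span, patterns))   # 1343_GET_v2_*_list_*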

When the span collector module 308 has stored more than k spans, the learning phase is restarted. Since no golden pattern was able to match k span abstractions, it is likely that the analysis of the spans in the span collector module will make the structure of the span trie change and thus generate a new set of patterns.

Figure 16 illustrates a method for categorizing spans in microservice applications, each span being a vector of property-value pairs. This method is performed by the span categorization module. At step 1601, the method comprises receiving a plurality of spans. At step 1602, the method comprises extracting a plurality of features from each span. At step 1603, the method comprises categorizing the plurality of spans into a plurality of categories in dependence on the extracted features.

Each of the modules described above may comprise a processor and a non-volatile memory. Each module may comprise more than one processor and more than one memory. The memory may store data that is executable by the processor. The processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium. The computer program may store instructions for causing the processor to perform its methods in the manner described herein. The components may be implemented in physical hardware or in the cloud.

Some advantages of the present invention will now be described.

The categorization system is advantageously able to handle span bursts. Since large-scale microservice applications can generate millions of spans in short periods of time, the mechanism is able to decouple span producers from span consumers. Traditional solutions which rely only on the use of message queuing systems for buffering do not guarantee the construction of a representative span model. The present invention uses a sound statistical sampling technique to cope with span bursts. This makes the process scalable.

The system also allows for improved processing efficiency. The system achieves a low running time complexity of O(nk), where n is the number of spans and k is the number of elements of a path (typically a constant number, less than 10), by exploring the use of method calls and endpoints for categorizing spans. This makes the approach scalable and suitable to efficiently process spans originating from large-scale microservice applications.

Traditional approaches often rely on the notion of distance to cluster or classify URLs (i.e., endpoints). However, researchers have shown that these measures are in many scenarios inadequate since endpoints separated by a very small distance may provide information about different concepts, while distant endpoints may be related to resources with similar concepts. Thus, the present approach relies on a probabilistic model which takes into account the semantics of remote REST calls using URLs.

The method proposed is particularly suitable for analysing microservice applications for which the initial number of span categories is unknown and online learning from real-time data is required. It draws its fundamental operation from the field of machine learning by using a progressive and open-world learning approach. Many traditional classification methods make a closed-world assumption, i.e., the classes to which an element can be assigned are identified during training and are the ones which will be used in testing. No new classes can appear during testing. This principle does not hold for large-scale, dynamic microservice applications which are in permanent change and evolution. Microservices start, reboot, and shut down as development teams introduce, split, improve, and deprecate services, which leads to the constant change of span types. The system therefore uses an open-world view, since new span categories can appear during testing which were not seen during training. This approach is particularly suited for large-scale, dynamic microservice applications which are in permanent change and evolution, and contrasts with the closed-world assumption taken by many existing classification methods.

The system also allows for progressive learning. Span categories are learned as new spans are observed by the system. The approach automatically stops when the change rate derivative converges. This characteristic provides an effective stop condition preventing the system from analysing redundant spans which do not add additional information to the knowledge model.

The processing and categorization of spans uses efficient data structures, such as hash tables and tries, to offer a solution with a linear running time, resulting in improved performance.

This innovation brings several benefits for the development of commercial systems and tools for microservice application performance management (APM). Such applications use time series analysis to support important tasks, such as anomaly detection, behaviour prediction, change point detection, and trend analysis.

Although the categorization system and methods described herein are described as being particularly advantageous for use for time series analysis, the system and methods may also be applied to any field of data analysis and not only time series analysis. They may also be applied to other applications outside of microservice applications.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.