

Title:
METHOD FOR SERVICE INSTANCE LEVEL WORKLOAD MONITORING FOR AMF MANAGED APPLICATIONS
Document Type and Number:
WIPO Patent Application WO/2016/170387
Kind Code:
A1
Abstract:
A monitoring engine, which includes a monitoring server and a set of monitoring clients, is adapted to measure the workload of a service in a system managed by Availability Management Framework (AMF). The system includes one or more components that receive component service instance (CSI) assignments for the service. Each monitoring client runs on a node in the system and is adapted to detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment, and to obtain a system resource usage of the component resulting from the assignment based on the information. The monitoring server is adapted to receive the system resource usage attributable to the service from the set of monitoring clients, and to aggregate the system resource usage to obtain the workload of the service.

Inventors:
KHAN MEHRAN (CA)
TOEROE MARIA (CA)
Application Number:
PCT/IB2015/052879
Publication Date:
October 27, 2016
Filing Date:
April 20, 2015
Assignee:
ERICSSON TELEFON AB L M (PUBL) (SE)
KHAN MEHRAN (CA)
TOEROE MARIA (CA)
International Classes:
G06F9/50
Domestic Patent References:
WO2008149302A1 (2008-12-11)
Foreign References:
US20050216860A1 (2005-09-29)
US20120240129A1 (2012-09-20)
Other References:
None
Attorney, Agent or Firm:
RAHMER, David et al. (8400 Decarie Boulevard, Town of Mount Royal, Québec H4P 2N2, CA)
Claims:
CLAIMS

What is claimed is:

1. A computer-implemented method (1100) of a monitoring engine (500, 1200, 1322) for measuring workload of a service in a system (120, 220) managed by Availability Management Framework (AMF), the method comprising:

detecting (1101), at runtime, an assignment of a component service instance (CSI) of the service to a component to obtain information identifying the assignment, wherein the system includes one or more components that receive CSI assignments for the service;

obtaining (1102) a system resource usage of the component resulting from the assignment based on the information; and

aggregating (1103) the system resource usage attributable to the service from the one or more components to obtain the workload of the service.

2. The method of claim 1, wherein the monitoring engine includes a monitoring server (100, 1210) and at least one monitoring client (200, 1220), the method further comprising:

receiving, by the monitoring server, component workload data at fixed intervals from the at least one monitoring client; and

generating a histogram of the workload of the service based on the component workload data.

3. The method of claim 1, wherein detecting the assignment further comprises:

detecting a new assignment, a changed assignment or a removed assignment of the component using an instrumented component interface.

4. The method of claim 1, wherein obtaining the system resource usage further comprises:

identifying, among processes that run in the system, a process that implements the assignment to the component based on the information; and

obtaining the system resource usage of the process.

5. The method of claim 1, wherein the system resource usage comprises at least one of: memory usage and processor usage.

6. The method of claim 1, further comprising:

placing, by a monitoring client, the system resource usage of the component into a data structure that includes at least an identifier of the component and an identifier of the CSI; and

sending the data structure from the monitoring client to a monitoring server for aggregation.

7. The method of claim 6, wherein the data structure additionally includes a High Availability (HA) state of the component for the assignment.

8. The method of claim 1, wherein aggregating the system resource usage further comprises:

constructing a hierarchical data structure in which a service instance (SI) of the service is higher in hierarchy than a set of CSIs belonging to the SI, and the set of CSIs are higher in hierarchy than the set of components that are assigned the set of CSIs; and

calculating the workload of the SI by combining the system resource usage of all components under the SI in hierarchy.

9. The method of claim 8, wherein the hierarchical data structure contains a hierarchical key-value sequence for: SI, node, HA state, CSI, component and usages.

10. The method of claim 8, further comprising:

updating, during runtime, the hierarchical data structure at a component level to reflect a changed CSI assignment for the service; and

re-calculating the workload of the SI using the updated hierarchical data structure.

11. A monitoring engine system (500, 1200) for measuring workload of a service in a system (120, 220) managed by Availability Management Framework (AMF), the monitoring engine system comprising:

a set of monitoring clients (200, 1220), each of the monitoring clients runs on a node in the system, wherein the system includes one or more components that receive component service instance (CSI) assignments for the service, and wherein each of the monitoring clients is adapted to:

detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment; and

obtain a system resource usage of the component resulting from the assignment based on the information; and

a monitoring server (100, 1210) coupled to the set of monitoring clients, the monitoring server adapted to:

receive the system resource usage attributable to the service from the set of monitoring clients; and

aggregate the system resource usage to obtain the workload of the service.

12. The monitoring engine system of claim 11, wherein the monitoring server is further adapted to:

receive component workload data at fixed intervals from the set of monitoring clients; and

generate a histogram of the workload of the service based on the component workload data.

13. The monitoring engine system of claim 11, wherein each monitoring client is adapted to detect a new assignment, a changed assignment or a removed assignment of the component using an instrumented component interface.

14. The monitoring engine system of claim 11, wherein each monitoring client is further adapted to:

identify, among processes that run in the system, a process that implements the assignment to the component based on the information; and

obtain the system resource usage of the process.

15. The monitoring engine system of claim 11, wherein the system resource usage comprises at least one of: memory usage and processor usage.

16. The monitoring engine system of claim 11, wherein each monitoring client is further adapted to:

place the system resource usage of the component into a data structure that includes at least an identifier of the component and an identifier of the CSI; and

send the data structure to the monitoring server for aggregation.

17. The monitoring engine system of claim 16, wherein the data structure additionally includes a High Availability (HA) state of the component for the assignment.

18. The monitoring engine system of claim 11, wherein the monitoring server is further adapted to:

construct a hierarchical data structure in which a service instance (SI) of the service is higher in hierarchy than a set of CSIs belonging to the SI, and the set of CSIs are higher in hierarchy than the set of components that are assigned the set of CSIs; and

calculate the workload of the SI by combining the system resource usage of all components under the SI in hierarchy.

19. The monitoring engine system of claim 18, wherein the hierarchical data structure contains a hierarchical key-value sequence for: SI, node, HA state, CSI, component and usages.

20. The monitoring engine system of claim 18, wherein the monitoring server is further adapted to:

update, during runtime, the hierarchical data structure at a component level to reflect a changed CSI assignment for the service; and

re-calculate the workload of the SI using the updated hierarchical data structure.

21. A monitoring engine system (1200) for measuring workload of a service in a system (120, 220) managed by Availability Management Framework (AMF), the monitoring engine system comprising:

a set of monitoring client modules (1220, 200), each of the monitoring client modules runs on a node in the system, wherein the system includes one or more components that receive component service instance (CSI) assignments for the service, and wherein each of the monitoring client modules further comprises:

a detection module (1201) adapted to detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment; and

a data retrieval module (1202) adapted to obtain a system resource usage of the component resulting from the assignment based on the information; and

a monitoring server module (1210, 100) coupled to the set of monitoring client modules, the monitoring server module further comprising:

an interface module (1203) adapted to receive the system resource usage attributable to the service from the set of monitoring clients; and

an aggregation module (1204) adapted to aggregate the system resource usage to obtain the workload of the service.

Description:
METHOD FOR SERVICE INSTANCE LEVEL WORKLOAD MONITORING FOR AMF MANAGED APPLICATIONS

TECHNICAL FIELD

[0001] Embodiments of the invention relate to workload monitoring for Availability Management Framework (AMF) managed systems.

BACKGROUND

[0002] The term "elasticity" in the context of a cloud system refers to the ability of the cloud system to automatically adapt to workload changes by provisioning and de-provisioning system resources. Thus, at any point in time, the available resources match the current demand as closely as possible. In a typical cloud system, elasticity is managed based on system resource usage of virtual machines (VMs) running an application; i.e., the system resource usage of the VMs is equated to the system resource usage of the application hosted in the VMs.

[0003] However, for an Availability Management Framework (AMF) managed application that runs in a cloud system, elasticity is managed based on system resource usage of service instances (SIs). AMF is a middleware implemented according to the Service Availability (SA) Forum specifications for ensuring the availability of services of a component-based clustered system. AMF manages the availability of application services based on an AMF configuration. An AMF configuration includes two kinds of entities: the service provider entities and the service entities. The smallest service provider entity is a component, on which AMF performs error detection, isolation and repair. A component represents a specific resource such as a process, which is capable of providing a set of functionalities of an application. Service entities such as component service instances (CSIs) represent workload. To provide a given functionality at runtime, AMF assigns a CSI to the component. Higher level services are combinations of these functionalities and they are represented in the AMF configuration as SIs. One or more components that collaborate to provide an SI are placed together to compose a service unit (SU). Hence, an SI is the workload assigned to an SU at runtime.

[0004] For AMF managed applications, the workload, represented as CSIs and SIs, is distributed in the system according to the applicable redundancy model defined by the AMF configuration and a current state (referred to as the high availability (HA) state) of the service provider entities.

[0005] AMF may use an elasticity engine to manage elasticity. The elasticity engine manages the resource usage by manipulating the application configurations, which results in AMF redistributing the SI assignments according to the configuration change. The elasticity engine receives workload changes in terms of SIs as input.

[0006] Most of the currently available workload monitoring solutions are either directed to specific hardware, such as Central Processing Unit (CPU) utilization and memory usage, or developed for popular proprietary cloud systems, such as Amazon Web Services™ or Windows Azure™. Currently, there is no monitoring solution that can provide the elasticity engine with workload changes in terms of SIs.

SUMMARY

[0007] According to one embodiment, a method performed by a monitoring engine is provided for measuring workload of a service in a system managed by AMF. The method comprises the steps of detecting, at runtime, an assignment of a component service instance (CSI) of the service to a component to obtain information identifying the assignment, wherein the system includes one or more components that receive CSI assignments for the service; obtaining a system resource usage of the component resulting from the assignment based on the information; and aggregating the system resource usage attributable to the service from the one or more components to obtain the workload of the service.

[0008] According to another embodiment, a monitoring engine system is provided for measuring workload of a service in a system managed by AMF. The monitoring engine system comprises: a set of monitoring clients and a monitoring server coupled to the set of monitoring clients. Each of the monitoring clients runs on a node in the system. The system includes one or more components that receive CSI assignments for the service. Each of the monitoring clients is adapted to: detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment; and obtain a system resource usage of the component resulting from the assignment based on the information. The monitoring server is adapted to: receive the system resource usage attributable to the service from the set of monitoring clients; and aggregate the system resource usage to obtain the workload of the service.

[0009] According to yet another embodiment, a monitoring engine system is provided for measuring workload of a service in a system managed by AMF. The monitoring engine system comprises a set of monitoring client modules and a monitoring server module coupled to the set of monitoring client modules. Each of the monitoring client modules runs on a node in the system. The system includes one or more components that receive CSI assignments for the service. Each of the monitoring client modules further comprises: a detection module adapted to detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment; and a data retrieval module adapted to obtain a system resource usage of the component resulting from the assignment based on the information. The monitoring server module further comprises: an interface module adapted to receive the system resource usage attributable to the service from the set of monitoring clients; and an aggregation module adapted to aggregate the system resource usage to obtain the workload of the service.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to "an" or "one" embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0011] Figure 1 illustrates service workload measurement in a system managed by AMF according to one embodiment.

[0012] Figure 2 illustrates a cloud system environment in which service workload is monitored according to one embodiment.

[0013] Figure 3 illustrates a monitoring client that detects CSI assignments to components according to one embodiment.

[0014] Figure 4 illustrates an example of component workload measurement according to one embodiment.

[0015] Figure 5 illustrates the architecture of a monitoring engine according to one embodiment.

[0016] Figure 6 illustrates an example of a hierarchical data structure generated by a monitoring server according to one embodiment.

[0017] Figure 7 illustrates an example of an SI hash table and an SI-usage hash table according to one embodiment.

[0018] Figure 8 is a flow diagram illustrating a method for generating or updating an SI hash table according to one embodiment.

[0019] Figure 9 is a flow diagram illustrating a method for generating or updating an SI-usage hash table according to one embodiment.

[0020] Figure 10 illustrates an example of SI workload measurement by a monitoring server and monitoring clients according to one embodiment.

[0021] Figure 11 is a flow diagram illustrating a method of a monitoring engine for measuring workload of a service in a system managed by AMF according to one embodiment.

[0022] Figure 12 illustrates a block diagram of a system for measuring service workload according to one embodiment.

[0023] Figure 13 illustrates a block diagram of a system for measuring service workload according to another embodiment.

DETAILED DESCRIPTION

[0024] In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

[0025] Embodiments of the invention provide a method and system of a monitoring engine that measures the workload of a service in an AMF managed system. More specifically, the "workload" measured by the monitoring engine is the system resource usage, and the "service" is the "SI" defined in an AMF configuration. In one embodiment, the monitoring engine includes a monitoring server and one or more monitoring clients. Each monitoring client is hosted on a node in a cluster. The monitoring clients collect component workload data and transmit the data to the monitoring server. The monitoring server aggregates the component workload data and generates SI workload data as output. In one embodiment, the monitoring clients transmit the component workload data to the monitoring server at fixed time intervals. The monitoring server generates a histogram of the SI workload based on the component workload data from all monitoring clients to enable continuous workload monitoring.
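
As an illustrative aside, the continuous monitoring just described can be pictured with a minimal Python sketch (Python being the language of the PSUtil module mentioned later in this description). The sketch records per-SI workload samples arriving at fixed intervals and bins them into a histogram; the sample store, function names and bin width are assumptions for illustration and not part of any embodiment.

    from collections import defaultdict

    si_samples = defaultdict(list)   # SI DN -> workload samples received at fixed intervals

    def record_sample(si_dn, cpu_usage):
        # Called once per reporting interval with the workload computed for the SI.
        si_samples[si_dn].append(cpu_usage)

    def histogram(si_dn, bin_width=1.0):
        # Bin the collected samples; bin_width is an assumed, illustrative parameter.
        counts = {}
        for sample in si_samples[si_dn]:
            bucket = int(sample // bin_width) * bin_width
            counts[bucket] = counts.get(bucket, 0) + 1
        return counts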

[0026] The monitoring engine described herein solves the problem of measuring the system resource usage in terms of SIs. The SIs of AMF managed applications are managed dynamically and distributed in the system according to the runtime state of the available SUs and the applicable redundancy model. Mapping the system resource usage measured in a system to the SIs is difficult because of the dynamic nature of the SI distribution. The use of the monitoring engine solves the problem of collecting and interpreting the system resource usage in terms of SIs. The monitoring engine completes a larger system (e.g., a cloud computing system) that bears the responsibility of managing the elasticity as well as high availability of AMF managed applications.

[0027] In one embodiment, the monitoring engine identifies components, CSIs and Sis by their Distinguished Names (DNs). Each of the AMF configuration entities has a DN that includes the entity's Relative Distinguished Name (RDN) and the DN of its parent entity. An entity's DN can be retrieved from the DN of its children. For example, the DN of an SI is included in the DN of its CSIs.
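
By way of illustration only, the following Python sketch shows how a parent entity's DN can be recovered from a child's DN by dropping the child's leading RDN. The helper name is hypothetical; the DN strings follow the "rdn, parentDN" convention of the examples given later in this description.

    def parent_dn(dn):
        # Everything after the first comma-separated RDN is the parent's DN.
        _, _, parent = dn.partition(",")
        return parent.strip()

    csi_dn = "safCsi=AmfDemo, safSi=AmfDemo, safApp=AmfDemo1"
    si_dn = parent_dn(csi_dn)    # "safSi=AmfDemo, safApp=AmfDemo1"
    app_dn = parent_dn(si_dn)    # "safApp=AmfDemo1"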

[0028] Figure 1 is a diagram illustrating a monitoring server 100 for measuring the workload of two SIs in a system 120 managed by AMF according to one embodiment. In this embodiment, the system 120 includes N virtual machines (VMs). As an example, each of VM1 and VM3 hosts a component C1 that is assigned a CSI of a first SI (SI1), and each of VM2, VM3, ..., VMN hosts a component C2 that is assigned a CSI of a second SI (SI2). These VMs report the system resource usage of their respective components (each shown as a circle with a solid outline) that have CSI assignments to the monitoring server 100. Each CSI assignment may be active or standby. Active means that the component is the primary service provider for that CSI, whereas standby means that the component is acting as a secondary (redundant) entity for that CSI. The VMs do not report the system resource usage of spare components; for example, VM2 reports the system resource usage of C2 but not the spare component C1 (shown as a circle with a dotted outline). In one embodiment, the VMs send the system resource usage reports to the monitoring server 100 at fixed time intervals, and the monitoring server 100 aggregates the reports and generates a workload measurement for each SI.

[0029] Figure 2 is a diagram illustrating a cloud system environment in which service workload is monitored according to one embodiment. A cloud computing system 220 including three interconnected nodes (e.g., physical hosts or virtual machines) provides services to users in response to users' requests. In this example, each node hosts two components (e.g., C1 and C2) and a monitoring client 200. Each monitoring client 200 measures, for the hosting node, the system resource usage of each component that has a CSI assignment. The monitoring clients 200 then report the measured usage (i.e., workload per component) to the monitoring server 100. The monitoring server 100 aggregates the reports and outputs the workload of each SI. In one embodiment, the output is sent to a workload analyzer 230, which determines the change (if any) in SI workload. The change in SI workload is sent to an elasticity engine 250, which outputs resource provisioning or de-provisioning instructions to the cloud computing system 220 to dynamically adjust allocated resources for each service.

[0030] In one embodiment, the monitoring server 100 and one or more of the monitoring clients 200 are collectively referred to as a monitoring engine. The monitoring engine architecture follows the server-client model. Each monitoring client 200 is hosted on a node that also hosts an AMF managed component. The monitoring server 100 may be hosted on a separate node, which may or may not be part of the cluster in which the monitoring clients 200 are hosted. Alternatively, the monitoring server 100 may be hosted on the same node as one of the monitoring clients 200. In all scenarios, the monitoring server 100 is reachable from all of the monitoring clients 200.

[0031] The monitoring engine described herein resolves three main challenges in service workload monitoring. First, the monitoring engine is capable of retrieving the distribution of CSI assignments in the system at runtime. Second, the monitoring engine is capable of retrieving the system resource usage that is relatable to the CSI assignments to the nodes. Third, the monitoring engine is capable of aggregating the system resource usage according to the distribution of CSI assignments to express the usage in terms of AMF services (i.e., SIs). Techniques used by the monitoring engine for resolving each of the three challenges are described below.

[0032] Figure 3 is a diagram illustrating the detection of CSI assignments to components by a monitoring client according to one embodiment. The mechanism for the detection, as described below, resolves the aforementioned first challenge for service workload monitoring. The detection of CSI assignments enables the retrieval of CSI assignment distribution in the system at runtime. To manage the workload of a component 310, AMF 330 interacts with the component 310 through an AMF Application Programming Interface (API) 312. For each new CSI assignment, CSI assignment change or CSI assignment removal from the component 310, AMF 330 dispatches a callback to the component 310 through this interface 312. In one embodiment, the component 310 is instrumented with tracing probes; e.g., LTTng User Space Tracing (UST) probes (where LTTng is a tracing framework for Linux). Alternatively or additionally, the API 312 may be instrumented with the tracing probes. The tracing probes, at runtime, detect AMF callbacks and extract the CSI assignment information, such as the component DN, DN of the CSI assigned to it, the HA state assigned to the component on behalf of the CSI, and the Identifier (ID) of the process executing on behalf of the CSI assignment (that is, the ID of the process implementing the CSI assignment as part of the component). The monitoring client 200 includes an LTTng UST session daemon 350 that retrieves the CSI assignment information from the instrumented component 310.

[0033] Figure 4 is a diagram illustrating component workload measurement according to one embodiment. The component workload measurement on each node resolves the aforementioned second challenge of service workload monitoring; i.e., retrieving system resource usage that is relatable to the CSI assignments to the nodes. For each process that runs in the system, system resource usage 420 can be measured by a tracing tool. As described in connection with Figure 3, a component is implemented by a process that runs in a system. As the process ID has been obtained by the LTTng UST session daemon 350, the monitoring client 200 can retrieve the system resource usage of that process using the process ID. In this embodiment, the CSI assignment information (i.e., UST data 430) from the LTTng UST session daemon 350 indicates a new UST event. The UST data 430 may include, but is not limited to: a timestamp, a CSI identifier (e.g., CSI DN), a component identifier (e.g. component DN) and a process ID. The monitoring client 200 then incorporates the system resource usage identified by the process ID and the UST data 430 into component workload data 440, and transmits the component workload data 440 to the monitoring server 100.
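
For illustration, the two records described above can be pictured as the following Python data structures. The class and field names are assumptions; only the listed contents (timestamp, CSI DN, component DN, process ID, usages, HA state) come from this description.

    from dataclasses import dataclass

    @dataclass
    class UstData:                 # the "UST data 430" of one detected CSI assignment event
        timestamp: float
        csi_dn: str                # DN of the assigned CSI
        component_dn: str          # DN of the component
        pid: int                   # ID of the process implementing the assignment

    @dataclass
    class ComponentWorkload:       # the "component workload data 440" sent to the server
        component_dn: str
        csi_dn: str
        ha_state: str              # e.g., "Active" or "Standby"
        cpu_usage: float
        memory_usage: int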

[0034] In one embodiment, the workload (or system resource usage) associated with each process can be measured by analyzing the traces generated by a kernel tracing session. There are a number of tools for measuring system workload on a Linux platform and other platforms. For example, LTTng kernel tracing can be used to measure system workload since kernel traces include all task level interactions between the operating system and installed applications. Thus, per-process system resource usage in a Linux host may be obtained by analyzing kernel trace data on that host. Using the LTTng kernel tracing to measure the system workload can be performance intensive since the kernel traces include a large volume of additional data which is not necessarily related to the system workload. An alternative to the LTTng kernel tracing is the Python PSUtil, which is a Python module for measuring system workload per-process with low overhead.
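
As a minimal sketch of the PSUtil alternative mentioned above (assuming the third-party psutil package is installed), the per-process usage for the process ID obtained from the UST event could be read as follows; the function name and the units are illustrative.

    import psutil

    def process_usage(pid):
        proc = psutil.Process(pid)
        return {
            "cpu_usage": proc.cpu_percent(interval=0.1),              # percent over a short sample
            "memory_usage": proc.memory_info().rss // (1024 * 1024),  # resident set size in MB
        }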

[0035] Figure 5 is a diagram illustrating the architecture of a monitoring engine 500 according to one embodiment. In one embodiment, the monitoring engine 500 includes one or more monitoring clients 200 (only one is shown) communicatively coupled to the monitoring server 100. Each monitoring client 200 is hosted on one of N nodes in a cluster. In one embodiment, each monitoring client 200 includes a monitoring engine client daemon 540, which, at startup, initiates an LTTng kernel tracing session 520 and an LTTng UST tracing session 350 in a Linux kernel 570. Although a Linux platform and specific tracing tools are described herein, it is understood that the monitoring engine 500 can be deployed on any software and hardware platforms with applicable tracing tools.

[0036] As described and illustrated in Figure 3, when AMF assigns a CSI to a component (e.g., the component 310) by dispatching a CSI set callback, the details of the callback to the instrumented component can be retrieved by the LTTng UST session 350 at runtime. In one embodiment, the UST data 430, which is extracted from callback information, includes one or more of: the component DN, the DN of the assigned CSI, the component's HA state and the process ID. The UST data 430 is forwarded from the LTTng UST session 350 to the monitoring engine client daemon 540. Based on the UST data 430, the monitoring engine client daemon 540 fetches, from the system resource usage 420 generated by the LTTng kernel tracing session 520 (or Python PSUtil), the workload associated with the component that has been assigned the CSI. The monitoring engine client daemon 540 wraps the workload information related to the component, along with its assigned CSI and HA state, into one workload data structure, i.e., the component workload data 440, which is transmitted to the monitoring server 100 through a client network daemon 550. The client network daemon 550 manages data transmission, reception, serialization and deserialization between the monitoring client 200 and the monitoring server 100. In one embodiment, the component workload data 440 is transmitted from the monitoring clients 200 to the monitoring server 100 at fixed time intervals. In one embodiment, the fixed time interval can be set (e.g., configured) by a cloud administrator by updating a monitoring client configuration. In an alternative embodiment, the component workload data 440 may be transmitted by the monitoring clients 200 to the monitoring server 100 when requested by a cloud administrator.
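
A hypothetical sketch of the client network daemon's transmit loop follows, based on the JSON-over-TCP transport described in paragraph [0041]; the server address, the newline framing and the collect_workloads callable are assumptions for illustration.

    import json
    import socket
    import time

    INTERVAL_SECONDS = 2.0   # the fixed, administrator-configurable reporting interval

    def client_loop(server_host, server_port, collect_workloads):
        # collect_workloads() is assumed to return a dict keyed by component DN,
        # shaped like the JSON example in paragraph [0041].
        with socket.create_connection((server_host, server_port)) as sock:
            while True:
                payload = json.dumps(collect_workloads())
                sock.sendall(payload.encode("utf-8") + b"\n")
                time.sleep(INTERVAL_SECONDS)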

[0037] In one embodiment, the monitoring server 100 includes a server network daemon 560, which receives the component workload data 440 from all monitoring clients 200 on the N nodes. In one embodiment, the component workload data 440 may be received at fixed intervals. The server network daemon 560 decodes the component workload data 440 from each node into decoded per-component workload objects 535, and forwards the decoded per-component workload objects 535 to a monitoring engine server daemon 510 in the monitoring server 100.
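
A matching server-side sketch, again hypothetical, decodes each newline-framed JSON message into per-component workload objects and hands them to the aggregation module; the aggregator attribute and its update hook are assumed names.

    import json
    import socketserver

    class WorkloadHandler(socketserver.StreamRequestHandler):
        def handle(self):
            for line in self.rfile:                    # one JSON message per line
                workload_objects = json.loads(line)    # decoded per-component workloads
                self.server.aggregator.update(workload_objects)   # assumed hook

    # e.g., server = socketserver.ThreadingTCPServer(("", 5000), WorkloadHandler)
    #       server.aggregator = aggregation_module
    #       server.serve_forever()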

[0038] In one embodiment, the monitoring engine server daemon 510 includes an aggregation module 580, which further includes a data structure generator 581 and an SI usage aggregator 582. The aggregation module 580 creates a hierarchical data structure for the component-CSI assignments, keeping the system usage at the bottom level of the data structure. The workload per SI (i.e., SI workload data 590) is then calculated by aggregating the system resource usage from the bottom to the top of the hierarchy. The operations performed by the aggregation module 580 resolve the aforementioned third challenge of measuring workload in terms of SIs.

[0039] Figure 6 illustrates an example of a hierarchical data structure in the form of a tree structure 600 generated by the data structure generator 581 according to one embodiment. The tree structure 600 is populated according to the following hierarchy from top to bottom: SI 630, its CSIs 620, the components 310 serving the CSI assignments, and the component's workload and HA state for the CSI assignment. The tree structure 600 is updated by the data structure generator 581 with each newly arrived component workload data 440. After the tree structure 600 is updated, the SI usage aggregator 582 calculates the SI workload data 590 by aggregating the component workloads following its associated tree path. In one embodiment, the workload aggregation is performed at fixed time intervals to generate the system resource usage per SI. In another embodiment, the tree structure 600 may include an additional SIs level as the root on top of multiple SI tree nodes 630. Then the sub-tree under each SI tree node 630 represents the hierarchical data structure for the SI.

[0040] Scenarios in which the tree structure 600 is updated include, but are not limited to, the following examples. When AMF changes the assignments of a CSI, the components' HA states may change in the tree structure 600. For example, the component which had the standby assignment may now have the active assignment, and vice versa. If a CSI is assigned to a new component, the new workload measurement of the component replaces the old measurement in the tree structure. Furthermore, if a CSI assignment is completely removed from a component, or if the component crashes, that component is removed from the tree structure 600.

[0041] In one embodiment, the component workload data 440 in Figures 4-6 may be transmitted from each monitoring client 200 to the monitoring server 100 in the form of JavaScript Object Notation (JSON) objects using the Transmission Control Protocol (TCP) at every fixed time interval (e.g., every two seconds). In one embodiment, the JSON object may contain the following data structure for each component that has a CSI assignment:

    'safComp=AmfDemo, safSu=SC-1, safSg=AmfDemo, safApp=AmfDemo1': {
        'cpu_usage': 1.0,
        'memory_usage': 8,
        'CSI': 'safCsi=AmfDemo, safSi=AmfDemo, safApp=AmfDemo1',
        'HAState': 'Active'},

[0043] At the monitoring server 100 end, the server's aggregation module 580 converts all the JSON objects received from all the nodes in the cluster into a hierarchical data structure. In one embodiment, the hierarchical data structure is a nested hash table, also referred to as an SI hash table. The nested hash table is an alternative to the tree structure 600 as previously described with reference to Figure 6. While converting JSON objects into the nested hash table, the aggregation module 580 (more specifically, the data structure generator 581 of Figure 5) follows the hierarchical key-value sequence: SIs > SI > Node > HA-State > CSI > Component > Usages. Analogous to the tree structure 600 in which there is one sub-tree per SI, the nested hash table lists all of the SIs in the system; i.e., there is one hash table per SI. Similarly, the analogy between the sub-trees and hash tables applies to the other levels such as CSI, Component and Usages.
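
For illustration, one decoded component workload record could be folded into such a nested hash table along the documented key sequence as in the following Python sketch; the function name is assumed and the field names mirror the JSON example above.

    def update_si_table(si_table, node, component_dn, record):
        csi_dn = record["CSI"]
        si_dn = csi_dn.split(",", 1)[1].strip()     # the SI DN is embedded in the CSI DN
        usages = {"cpu_usage": record["cpu_usage"], "mem_usage": record["memory_usage"]}
        (si_table.setdefault(si_dn, {})             # SIs > SI
                 .setdefault(node, {})              # > Node
                 .setdefault(record["HAState"], {}) # > HA-state
                 .setdefault(csi_dn, {})            # > CSI
         )[component_dn] = usages                   # > Component > Usages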

[0044] From the bottom of this sequence, the usage values are mapped against their respective attribute names (e.g., CPU_usage: 4.4%, mem_usage: 8 MB, etc.). Then the usage hash tables are mapped against component DNs, which are mapped against CSI DNs, and so on. Following the hierarchical key-value sequence, the aggregation module 580 constructs an SI hash table containing, for example, the following content:

"safSi=AmfDemo, safApp=AmfDemol " : {

"nodel " : {

"Active" : {

"safCsi=AmfDemo, safSi=AmfDemo, safApp=AmfDemol " : {

"safComp=AmfDemo, safSu=SC-l, safSg=AmfDemo, safApp=AmfDemol": { "cpu_usage" : 1.0,

"mem_usage: 8} ... } .. . } . . . } ... }

[0045] Note that at this stage, the aggregation module 580 has an SI hash table listing all SIs in the cluster, which is updated at regular time intervals to reflect any change in the workload and its distribution.

[0046] If a JSON object from any of the monitoring clients 200 contains information about a CSI DN that contains an SI DN not listed in the existing SI hash table, the aggregation module 580 creates a new entry for that SI. However, if an SI DN is not found in consecutive JSON objects (e.g., two consecutive JSON objects) from the same monitoring client 200, that SI entry is deleted from the SI hash table.
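
Hypothetical bookkeeping for this rule might look as follows; the miss limit of two matches the example above, while the function and variable names are illustrative.

    MISS_LIMIT = 2   # consecutive reports from the same client without the SI DN

    def prune_si_entries(si_table, miss_counts, client, seen_sis):
        # seen_sis: set of SI DNs present in the client's latest JSON object.
        for si_dn in list(si_table):
            if si_dn in seen_sis:
                miss_counts[(client, si_dn)] = 0
            else:
                miss_counts[(client, si_dn)] = miss_counts.get((client, si_dn), 0) + 1
                if miss_counts[(client, si_dn)] >= MISS_LIMIT:
                    del si_table[si_dn]   # SI entry deleted from the SI hash table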

[0047] Next, the aggregation module 580 (more specifically, the SI usage aggregator 582 of Figure 5) creates an SI-usage hash table by recursively adding and normalizing the usage data for each level of the SI hash table. Each key of the SI-usage hash table is mapped against the workload on that SI. Figure 7 illustrates an example of an SI hash table 710 and an SI-usage hash table 720 generated from the SI hash table 710 according to one embodiment.

[0048] Figure 8 is a flow diagram illustrating a method 800 performed by the aggregation module 580 of the monitoring server 100 according to one embodiment. The method 800 begins with the aggregation module 580 creating an empty nested hash table (e.g., the SI hash table 710 of Figure 7) in the following hierarchical key-value sequence: SIs > SI > Node > HA-state > CSI > Component > Usages (block 810). The Usages level is the only level with actual system resource usage values mapped against keys, which are the names of system resource usage entities (e.g., RAM usage, CPU usage, etc.). Each of the other levels of the SI hash table has another hash table as its value. For instance, the keys at the SI level of the hash table are the names of the SIs, while the keys at the next level are the names of the nodes or hosts of the monitoring clients 200.

[0049] In one embodiment, the monitoring server 100 receives component workload data from all monitoring clients 200 in the cluster, and de-serializes and decodes the received component workload data (block 820) to generate decoded component workload objects. The monitoring server 100 then extracts, from the decoded component workload objects, the component DN, the DN of its assigned CSI, the HA state, the node name or host name, and system resource usage values (block 830). The aggregation module 580 uses the extracted information to construct or update a path or a key-value sequence to populate the new usage values into the SI hash table (block 840).

[0050] During the table update, the aggregation module 580 determines whether there is an HA state change in the SI hash table (block 850). If there is an HA state change, there might be duplicate entries in the SI hash table in two separate paths. Furthermore, if one or more of the SI entries are not updated in the SI hash table after a component workload object is received from each monitoring client 200, these SI entries are considered to be outdated entries. The aggregation module 580 checks for such duplicated entries and outdated entries after every SI hash table update. If there is any duplicated entry, the older entry is deleted (block 860). Similarly, any outdated entry is deleted (block 860).

[0051] Figure 9 illustrates a method 900 performed by the aggregation module 580 for aggregating the system resource usage values per SI according to one embodiment. After each SI hash table update as shown in Figure 8, the aggregation module 580 starts the aggregation of the system resource usage values by duplicating the SI hash table (block 910). The usage values mapped against all the keys in the usage level of the duplicated SI hash table are added to their sibling keys' corresponding usage values and normalized recursively until their parent level is the SI level. That is, the aggregation module 580 determines whether the level above the usage keys is the SI level (block 920). If it is not, the last two levels of the hash table (i.e., the Usages level and one of the Component, CSI, HA-state and Node levels) are combined: the usage values for the siblings of the parent level are added and normalized, and the parent level is removed (block 930), thereby creating a new aggregated usage level in place of the parent level for which the aggregation was performed. For example, if the parent level is Component and three components are assigned the same CSI, the usage values for all three components are added and normalized (e.g., divided by three); the Usages level thereby moves one level up and the Component level is removed, i.e., the sequence becomes SIs > SI > Node > HA-state > CSI > Usages. Note that during this process, the aggregation module 580 manipulates only the last two levels: Usages and Component. The recursion is complete when the level above the usage keys is the SI level. Once the recursion is complete, the duplicated SI hash table has been transformed into an SI-usage hash table, which contains the system resource usage of the AMF managed application mapped against SIs. In one embodiment, the aggregation module 580 outputs the per-SI system resource usage as the output of the monitoring engine 500 (block 940). Alternatively or additionally, the aggregation module 580 may generate a graphical output (such as a histogram) from the resulting SI-usage hash table to show the system resource usage per SI over time.
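
A compact Python sketch of this recursion follows; it mirrors the described duplicate-then-aggregate behavior (sum the siblings at each level and normalize by their count until the usages sit directly under the SI level), and the function names are assumptions.

    import copy

    def aggregate_level(subtree):
        # A Usages dict holds only numbers; anything else is an intermediate level.
        if all(isinstance(v, (int, float)) for v in subtree.values()):
            return subtree
        children = [aggregate_level(child) for child in subtree.values()]
        combined = {}
        for child in children:
            for name, value in child.items():
                combined[name] = combined.get(name, 0) + value
        # Normalize by the number of siblings, as in the divided-by-three example.
        return {name: value / len(children) for name, value in combined.items()}

    def si_usage_table(si_table):
        table = copy.deepcopy(si_table)   # work on a duplicate of the SI hash table
        return {si_dn: aggregate_level(subtree) for si_dn, subtree in table.items()}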

[0052] Figure 10 is a diagram illustrating an example of SI workload measurement by the monitoring server 100 and monitoring clients 200 according to one embodiment. In this example, there are three nodes, each hosting one of the monitoring clients 200. Each monitoring client 200 transmits component workload data 440 at fixed time intervals to the monitoring server 100. The monitoring server 100 constructs or updates an SI hash table 1001 (e.g., the SI hash table 710 of Figure 7) according to the received component workload data 440, and performs SI path-wise aggregation to generate an SI-usage hash table 1102, which contains the per-SI workload data 590.

[0053] Figure 11 is a flow diagram illustrating a method 1100 of a monitoring engine for measuring workload of a service in a system managed by AMF. The method 1100 comprises detecting, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment (block 1101). The system includes one or more components that receive CSI assignments for the service. The method 1100 further comprises obtaining a system resource usage of the component resulting from the assignment based on the information (block 1102); and aggregating the system resource usage attributable to the service from the one or more components to obtain the workload of the service (block 1103). In one embodiment, the monitoring engine includes a monitoring server and at least one monitoring client. The monitoring server receives component workload data at fixed intervals from the at least one monitoring client, and generates a histogram of the workload of the service based on the component workload data.

[0054] The methods of Figure 11 may be performed by hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one embodiment, the method 1100 of Figure 11 may be performed by a monitoring engine system 1200 of Figure 12 and/or by a computer system 1300 of Figure 13.

[0055] Figure 12 illustrates a monitoring engine system 1200 for measuring workload of a service in a system managed by AMF according to one embodiment. The monitoring engine system 1200 comprises a set of monitoring client modules 1220 (e.g., the monitoring client 200 of Figure 5) and a monitoring server module 1210 (e.g., the monitoring server 100 of Figure 5). Each of the monitoring client modules 1220 runs on a node in the system, which includes one or more components that receive CSI assignments for the service. Each of the monitoring client modules 1220 further comprises: a detection module 1201 adapted to detect, at runtime, an assignment of a CSI of the service to a component to obtain information identifying the assignment; and a data retrieval module 1202 adapted to obtain a system resource usage of the component resulting from the assignment based on the information. The monitoring server module 1210 further comprises: an interface module 1203 adapted to receive the system resource usage attributable to the service from the set of monitoring clients 1220; and an aggregation module 1204 adapted to aggregate the system resource usage to obtain the workload of the service.

[0056] Figure 13 illustrates a diagrammatic representation of a machine in the exemplary form of a computer system 1300 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In one embodiment, the computer system 1300 may be part of a network node (e.g., a router, switch, bridge, controller, base station, etc.). In one embodiment, the computer system 1300 may operate in a cloud computing environment where multiple server computers in one or more service centers collectively provide computing services on demand. The computer system 1300 may be a server computer, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines (e.g., computers) that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

[0057] The computer system 1300 includes a processing device 1302. The processing device 1302 represents one or more general-purpose processors, each of which can be: a microprocessor, a central processing unit (CPU), a multicore system, or the like. The processing device 1302 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In one embodiment, the processing device 1302 is adapted or operative to perform the methods of Figures 8, 9 and 11. In one embodiment, the processing device 1302 is adapted or operative to execute the operations of a monitoring engine 1322 (e.g., the monitoring engine 500 of Figure 5), which contains instructions executable by the processing device 1302 to perform the methods of Figures 8, 9 and 11.

[0058] In one embodiment, the processing device 1302 is coupled to one or more memory devices such as: a main memory 1304 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM), etc.), a secondary memory 1318 (e.g., a magnetic data storage device, an optical magnetic data storage device, etc.), and other forms of computer-readable media, which communicate with each other via a bus or interconnect 1330. The memory devices may also include different forms of read-only memories (ROMs), different forms of random access memories (RAMs), static random access memory (SRAM), or any type of media suitable for storing electronic instructions. In one embodiment, the memory devices may store the code and data of the monitoring engine 1322. In the embodiment of Figure 13, the monitoring engine 1322 may be located in one or more of the locations shown as dotted boxes and labeled by the reference numeral 1322. In alternative embodiments the monitoring engine 1322 may be located in other location(s) not shown in Figure 13.

[0059] The computer system 1300 may further include a network interface device 1308. A part or all of the data and code of the monitoring engine 1322 may be transmitted or received over a network 1320 via the network interface device 1308.

[0060] In one embodiment, the monitoring engine 1322 can be implemented using code and data stored and executed on one or more computer systems (e.g., the computer system 1300). Such computer systems store and transmit (internally and/or with other electronic devices over a network) code (composed of software instructions) and data using computer-readable media (also referred to as a machine-readable medium, a processor-readable medium, or a computer usable medium having a computer readable program code embodied therein), such as non-transitory tangible computer-readable media (e.g., magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), digital versatile disc read only memory (DVD-ROM), memory device (volatile or non-volatile), or similar storage mechanism) and transitory computer-readable transmission media (e.g., electrical, optical, acoustical or other form of propagated signals, such as carrier waves, infrared signals). A non-transitory computer-readable medium of a given computer system typically stores instructions for execution on one or more processors of that computer system.

[0061] The operations of the flow diagrams of Figures 8, 9 and 11 have been described with reference to the exemplary embodiments of Figures 12 and 13. However, it should be understood that the operations of the flow diagrams of Figures 8, 9 and 11 can be performed by embodiments of the invention other than those discussed with reference to Figures 12 and 13, and the embodiments discussed with reference to Figures 12 and 13 can perform operations different from those discussed with reference to the flow diagrams. While the flow diagrams of Figures 8, 9 and 11 show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).

[0062] It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense.