Title:
A DEVICE FOR APPLYING ARTIFICIAL INTELLIGENCE IN A COMMUNICATION NETWORK
Document Type and Number:
WIPO Patent Application WO/2021/052556
Kind Code:
A1
Abstract:
A device comprising an interface receiving a first set of input states and a second set of input states, and a processor determining an output state using a first neural network, which is a pre-trained neural network, and a second neural network, which is a reinforcement learning neural network. The first neural network determines a first output based on the first set of input states, and the second neural network determines a second output based on the second set of input states. The device comprises an output selector, which obtains the first and the second outputs and at least one of a first performance indicator from the first neural network and a second performance indicator from the second neural network; and selects as the output state the first or the second output based on the at least one of the first performance indicator and the second performance indicator.

Inventors:
ABBOUD OSAMA (DE)
KHALILI RAMIN (DE)
Application Number:
PCT/EP2019/074682
Publication Date:
March 25, 2021
Filing Date:
September 16, 2019
Assignee:
HUAWEI TECH CO LTD (CN)
ABBOUD OSAMA (DE)
International Classes:
G06N3/00; H04L12/24; G06N3/04; G06N3/08
Foreign References:
US20190156247A12019-05-23
US8370280B12013-02-05
EP2871803A12015-05-13
EP3223457A12017-09-27
US20180278486A12018-09-27
Other References:
"3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Study of Enablers for Network Automation for 5G (Release 16)", 3GPP STANDARD; TECHNICAL REPORT; 3GPP TR 23.791, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. SA WG2, no. V16.2.0, 11 June 2019 (2019-06-11), pages 1 - 124, XP051753968
Attorney, Agent or Firm:
KREUZ, Georg (DE)
CLAIMS

1. A device (200) comprising: an interface (201) for receiving a first set of input states and a second set of input states; at least one processor (203) configured to determine an output state using a first neural network (301) and a second neural network (303), wherein the first neural network (301) is a pre-trained neural network and the second neural network (303) is a reinforcement learning neural network, wherein:

- the first neural network (301) is configured to receive the first set of input states as input and determine a first output on the basis of the first set of input states, and

- the second neural network (303) is configured to receive the second set of input states and determine a second output based on the second set of input states; and an output selector (205) configured to:

- obtain the first and the second outputs and at least one of a first performance indicator from the first neural network (301) and a second performance indicator from the second neural network (303); and

- select as the output state the first output or the second output based on the at least one of the first performance indicator and the second performance indicator.

2. The device (200) of claim 1, wherein the processor (203) is configured to execute at least a program code for establishing the first neural network (301) and the second neural network (303).

3. The device (200) of claim 1 or 2, wherein the first and the second set of input states are at least partially overlapping; and wherein the first neural network (301) is configured to feed the first output to the second neural network, and the second neural network (303) is configured to use the first output as a training parameter.

4. The device (200) of any one of the preceding claims, the device (200) being adapted to communicate with one or more network entities in a communication network, wherein the first and second sets of input states comprise network parameters and/or resource information of at least one network entity in the communication network, and the first and second outputs comprise modified network parameters and/or resource information of the at least one network entity in the communication network.

5. The device (200) of claim 4, wherein the interface (201) is configured to receive a set of feedback information from the at least one network entity, the set of feedback information being associated with the at least one network entity using the network parameters and/or resource information to perform a communication service for one or more user entities.

6. The device (200) of claim 5, wherein the set of feedback information comprises at least one of packet losses between the at least one network entity and the one or more user entities, quality of service and load of the at least one network entity.

7. The device (200) of claim 5 or 6, wherein the processor (203) is configured to train the second neural network (303) using the set of feedback information as a further training parameter in order to obtain the second output from the second neural network (303).

8. The device (200) of any one of the preceding claims 5 to 7, wherein the processor (203) is configured to calculate the first performance indicator indicating a success rate associated with the communication service performed by the at least one network entity using the first output and/or to calculate a second performance indicator associated with the communication service performed by the at least one network entity using the second output, and wherein the output selector (205) is configured to select the first output or the second output as the output state on the basis of the at least one of the first performance indicator and the second performance indicator, in particular, to select the second output as the output state if the second performance indicator is higher than the first performance indicator, or vice versa.

9. The device (200) of any one of the preceding claims 5 to 8, wherein the processor (203) is configured to calculate the second performance indicator associated with the communication service performed by the at least one network entity using the second output on the basis of the set of feedback information received from the at least one network entity.

10. The device (200) of any one of the preceding claims 5 to 9, wherein the first and second sets of input states and/or the first and second outputs comprise at least one of total communication resources, available communication resources to be assigned to the at least one network entity, the number of the one or more user entities, positions and velocities of the one or more user entities, fading and path losses between the at least one network entity and the one or more user entities, packet loss or load on user plane functions, user quality of experience, user quality of service, user equipment connection quality, radio access network state and/or load, edge cloud state and/or load, transport network state and/or load, core network state and/or load.

11. The device (200) of any one of the preceding claims 5 to 10, wherein the communication service comprises at least one of allocating communication resources to the one or more user entities, adaptation of the one or more user entities to a transport network or an edge network, increase or decrease of available virtualization resources of the one or more network entities, predicting changes in the one or more user entities and/or user movement patterns, predicting outages or misses of service level agreements, SLAs, building user profiles and detecting users based on network access patterns, selection of edge network or origin cloud system to serve certain content requests.

12. The device (200) of any one of the preceding claims 4 to 11, wherein the communication network comprises at least one access network (801a, b), at least one core network (805) and/or at least one edge network (803), wherein the device (200) is configured to be located in the at least one access network (801a, b), the at least one core network (805) and/or the at least one edge network (803).

13. The device (200) of any one of the preceding claims 5 to 12, wherein the interface (201) is configured to transmit information comprising the output state, the communication service, the second set of input states and the set of feedback information to a further device for determining a further output state for a further network entity.

14. The device (200) of any one of the preceding claims 4 to 13, wherein the interface (201) is configured to receive the second set of input states from a network data analytics function (101), NWDAF, in the communication network.

15. The device (200) of claim 14, wherein the interface (201) is configured to receive the set of feedback information from the at least one network entity via the network data analytics function (101), NWDAF, in the communication network.

16. The device (200) of any one of the preceding claims 4 to 15, being integrated with a network data analytics function (101), NWDAF, in the communication network.

17. A communication coordinator (501) for coordinating a plurality of devices (200a-c) according to any one of claims 1 to 16 in a communication network, wherein the communication coordinator (501) is configured to: receive information comprising a communication service, an output state, a second set of input states and a set of feedback information from a device of the plurality of devices (200a-c), and transmit the information to another device of the plurality of devices (200a-c).

18. The communication coordinator (501) of claim 17, wherein the communication network comprises a plurality of access networks (901a-c) and wherein each of the plurality of devices (200a-c) is located in one of the plurality of access networks (901a-c), respectively.

19. A method for determining an output state comprising the steps of: receiving a first set of input states and a second set of input states; determining, by at least one processor, an output state using a first neural network and a second neural network; receiving, by a first neural network, the first set of input states as input and determining, by the first neural network, a first output on the basis of the first set of input states; receiving, by a second neural network, the second set of input states and determining, by the second neural network, a second output based on the second set of input states, wherein the first neural network is a pre-trained neural network and the second neural network is a reinforcement learning neural network; obtaining, by an output selector, the first and the second outputs and at least one of a first performance indicator from the first neural network (301) and a second performance indicator from the second neural network (303); and selecting, by the output selector, as the output state the first output or the second output based on the at least one of the first performance indicator and the second performance indicator.

20. The method of claim 19, wherein the first and the second set of input states are at least partially overlapping; and further comprising: feeding, by the first neural network (301), the first output to the second neural network, and using, by the second neural network (303), the first output as a training parameter.

21. The method according to claim 19 or 20, for communicating with one or more network entities in a communication network, wherein the first and second sets of input states comprise network parameters and/or resource information of at least one network entity in the communication network, and the first and second outputs comprise modified network parameters and/or resource information of the at least one network entity in the communication network.

22. The method of any one of claims 19 to 21, further comprising: receiving a set of feedback information from the at least one network entity, the set of feedback information being associated with the at least one network entity using the network parameters and/or resource information to perform a communication service for one or more user entities.

23. The method of claim 22, further comprising: training the second neural network (303) using the set of feedback information as a further training parameter in order to obtain the second output from the second neural network (303).

24. The method of claim 22 or 23, wherein the first and second sets of input states and/or the first and second outputs comprise at least one of total communication resources, available communication resources to be assigned to the at least one network entity, the number of the one or more user entities, positions and velocities of the one or more user entities, fading and path losses between the at least one network entity and the one or more user entities, packet loss or load on user plane functions, user quality of experience, user quality of service, user equipment connection quality, radio access network state and/or load, edge cloud state and/or load, transport network state and/or load, core network state and/or load.

25. The method of any one of claims 22 to 24, wherein the communication service comprises at least one of allocating communication resources to the one or more user entities, adaptation of the one or more user entities to a transport network or an edge network, increase or decrease of available virtualization resources of the one or more network entities, predicting changes in the one or more user entities and/or user movement patterns, predicting outages or misses of service level agreements, SLAs, building user profiles and detecting users based on network access patterns, selection of edge network or origin cloud system to serve certain content requests.

26. A computer program product including program code for performing the method according to any one of claims 19 to 25, when the program code is run by a processor.

DESCRIPTION

A DEVICE FOR APPLYING ARTIFICIAL INTELLIGENCE IN A COMMUNICATION NETWORK

TECHNICAL FIELD

In general, the present invention relates to the field of communication networks. More specifically, the present invention relates to a device which applies artificial intelligence to a communication network.

BACKGROUND

In the 5th Generation (5G) mobile technology, a Network Data Analytics Function (NWDAF) 101 as shown in figure 1, part of the mobile 5G system 100, provides slice and general data analytics. The data analytics can be used by other functions, e.g., the session management function (SMF) 103, policy control function (PCF) 105, access and mobility management function (AMF) 107, user plane function (UPF) 109, application function (AF) 111, network exposure function (NEF) 113, unified data repository (UDR) 115 or any other network functions (NFs) 121a,b. The main purpose is to make use of the data of the different NFs to improve the performance of the system.

The general framework for analytics in mobile systems and 3GPP standards focuses on the interfaces that can be used by the different NFs 121a,b and how they can deliver information regarding their status and performance to the NWDAF 101. In addition, it defines how the NWDAF 101 stores the information and analyses it to produce analytics information back into the system.

Data analytics in a 5G system has so far focused on standardizing how to collect data and how to expose the analytics, but does not address the core challenges of analytics: complexity and learning time. To address these challenges, it may be useful to apply artificial intelligence or machine learning to data analytics in 5G network systems.

Generally speaking, there are two types of machine learning or artificial intelligence (AI): a first type is traditional AI, in which a neural network is trained to produce a certain output based on a certain input. As an example, the input (also referred to as the environment model) can be pictures and whether they contain cats or not. The trained neural network can by itself detect whether any image (never seen before) contains a cat or not. It can be trained reasonably fast and can achieve good results, especially if deep learning is used.

However, traditional AI requires an environment model and does not react quickly to changes. Furthermore, it is difficult to judge beforehand which metrics are relevant. An environment model might not be available or might be too simple, and the input has to be specified, i.e., whether an input image contains a cat.

A second type of AI is reinforcement learning AI, in which a neural network trains itself by interacting with the environment. No input or environment model has to be specified. The neural network evolves and learns by interacting with the environment, i.e. the input. Reinforcement learning AI is able to take feedback from the environment into account and to react to environmental changes. It can take into account hundreds of parameters whose impact on the model might not be obvious.

However, reinforcement learning AI might cause errors by making bad decisions, especially at the beginning of the learning period or when an abnormality occurs. It takes some time until the learning of reinforcement learning AI produces good results.

In summary, traditional AI requires labeled data or a model of the system, which might not be available in many cases. Moreover, it is difficult to determine beforehand which metrics are relevant; in practice, only a subset of input metrics might be used for training. Reinforcement learning AI, in turn, requires more time to learn how to interact with the environment, and its initial results cannot be used.

Regarding applicability to mobile systems, traditional AI relies on simplistic models or assumptions for training; both the complexity and the dynamicity of mobile network systems pose strong limitations on its usage in complex mobile communication systems. Reinforcement learning AI, on the other hand, requires more time to learn to interact with network systems, and the strategies and decisions generated during the initial phase cannot be used. Furthermore, it may fail to make good decisions when an abnormal situation occurs, e.g., a blackout in the network.

Thus, neither traditional AI nor reinforcement learning AI is directly applicable to mobile communication systems due to the above-mentioned limitations. In light of the above, there is a need for an improved device allowing artificial intelligence to be applied to communication network systems in an efficient way.

SUMMARY

It is an object of the invention to provide an improved device allowing artificial intelligence to be applied to communication network systems in an efficient way.

The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

Generally, the present invention relates to a device for applying artificial intelligence to a communication network. More specifically, embodiments of the invention provide an improved hybrid and distributed approach for enabling artificial intelligence (AI) in a communication system such as a 5G mobile system.

This hybrid approach is also referred to as “transitional AI” hereafter: a traditional AI or a classical optimization policy is used as a baseline in the first place, and a reinforcement AI then learns from the baseline and takes into account the feedback from the environment, while including additional metrics and parameters which are not used by the baseline, according to embodiments of the invention.

More specifically, according to a first aspect a device is provided, which comprises: an interface for receiving a first set of input states and a second set of input states; at least one processor configured to determine an output state using a first neural network and a second neural network, wherein the first neural network is a pre-trained neural network and the second neural network is a reinforcement learning neural network, wherein the first neural network is configured to receive the first set of input states as input and determine a first output on the basis of the first set of input states, and the second neural network is configured to receive the second set of input states and determine a second output based on the second set of input states.

The device further comprises an output selector which is configured to: obtain the first and the second outputs and at least one of a first performance indicator from the first neural network and a second performance indicator from the second neural network; and select as the output state the first output or the second output based on the at least one of the first performance indicator and the second performance indicator.

In a further possible implementation form of the first aspect, the processor is configured to execute at least a program code for establishing or realizing the first neural network and the second neural network.

Thus, an improved device is provided, allowing transitional artificial intelligence (AI), i.e. both a pre-trained neural network and a reinforcement learning neural network, to be applied to communication network systems in an efficient way.

In a further possible implementation form of the first aspect, the first and the second set of input states are at least partially overlapping; the first neural network is configured to feed the first output to the second neural network and the second neural network is configured to use the first output as a training parameter.

Thus, the second neural network, i.e. the reinforcement learning neural network, can produce good results, such as useful strategies and decisions, in an efficient way, in particular during the initial learning phase or when an abnormality occurs.

In a further possible implementation form of the first aspect, the device is adapted to communicate with one or more network entities in a communication network, wherein the first and second sets of input states comprise network parameters and/or resource information of at least one network entity in the communication network, and the first and second outputs comprise modified network parameters and/or resource information of the at least one network entity in the communication network.

Thus, the device can communicate with the network entities and receive network parameters and/or resource information regarding the network entities in the communication network efficiently.

In a further possible implementation form of the first aspect, the interface is configured to receive a set of feedback information from the at least one network entity, the set of feedback information being associated with the at least one network entity using the network parameters and/or resource information to perform a communication service for one or more user entities.

Thus, the device can efficiently receive feedback information from the network entities about the services they provide to user entities in a communication system.

In a further possible implementation form of the first aspect, the set of feedback information comprises at least one of packet losses between the at least one network entity and the one or more user entities, quality of service and load of the at least one network entity.

Thus, the device can efficiently receive feedback information from the network entities about the quality of the services they provide or the load of the network entities themselves.

In a further possible implementation form of the first aspect, the processor is configured to train the second neural network using the set of feedback information as a further training parameter in order to obtain the second output from the second neural network.

Thus, the device can train the second neural network according to the environment, i.e. the feedback information about the services provided by the network entities, in an efficient and effective manner and improve the final output by the second neural network.

In a further possible implementation form of the first aspect, the processor is configured to calculate the first performance indicator indicating a success rate associated with the communication service performed by the at least one network entity using the first output and/or to calculate a second performance indicator associated with the communication service performed by the at least one network entity using the second output, and the output selector is configured to select the first output or the second output as the output state on the basis of the at least one of the first performance indicator and the second performance indicator, in particular, to select the second output as the output state if the second performance indicator is higher than the first performance indicator, or vice versa.

Thus, a better output between the first and second outputs provided by the first and second neural networks, respectively, can be determined in an efficient manner.

In a further possible implementation form of the first aspect, the processor is configured to calculate the second performance indicator associated with the communication service performed by the at least one network entity using the second output on the basis of the set of feedback information received from the at least one network entity.

Thus, the device can evaluate outputs provided by the neural network by taking into account the environment, i.e. the feedback information about the services provided by the network entities, in an efficient and effective manner.

In a further possible implementation form of the first aspect, the first and second sets of input states and/or the first and second outputs comprise total communication resources, available communication resources to be assigned to the at least one network entity, the number of the one or more user entities, positions and velocities of the one or more user entities, fading and path losses between the at least one network entity and the one or more user entities, packet loss or load on user plane functions, user quality of experience, user quality of service, user equipment connection quality, radio access network state and/or load, edge cloud state and/or load, transport network state and/or load, core network state and/or load.

Thus, the device can train neural networks according to the situation of a communication network and the environment, i.e. the feedback information about the services provided by the network entities, in an efficient and effective manner as well as improve output results by neural networks.

In a further possible implementation form of the first aspect, the communication service comprises allocating communication resources to the one or more user entities, adaptation of the one or more user entities to a transport network or an edge network, increase or decrease of available virtualization resources of the one or more network entities, predicting changes in the one or more user entities and/or user movement patterns, predicting outages or misses of service level agreements, SLAs, building user profiles and detecting users based on network access patterns, and selection of edge network or origin cloud system to serve certain content requests.

Thus, the device can help the network entities to provide communication services in an improved manner.

In a further possible implementation form of the first aspect, the communication network comprises at least one access network, at least one core network and/or at least one edge network, wherein the device is configured to be located in the at least one access network, the at least one core network and/or the at least one edge network.

Thus, the devices realizing the transitional AI can be deployed at dedicated locations of a communication network efficiently.

In a further possible implementation form of the first aspect, the interface is configured to transmit information comprising the output state, the communication service, the second set of input states and the set of feedback information to a further device for determining a further output state for a further network entity.

Thus, the devices realizing the transitional AI can be deployed at dedicated locations of the communication network and these devices can interact with each other efficiently.

In a further possible implementation form of the first aspect, the interface is configured to receive the second set of input states from a network data analytics function, NWDAF, in the communication network.

Thus, the device can receive the second set of input states from a network data analytics function (NWDAF) efficiently.

In a further possible implementation form of the first aspect, the interface is configured to receive the set of feedback information from the at least one network entity via the network data analytics function, NWDAF, in the communication network.

Thus, the device can receive the set of feedback information via a network data analytics function (NWDAF) efficiently.

In a further possible implementation form of the first aspect, the device is integrated with a network data analytics function, NWDAF, in the communication network.

Thus, the device can be integrated with a network data analytics function, NWDAF, in the communication network efficiently.

According to a second aspect a communication coordinator is provided for coordinating a plurality of devices according to the first aspect in a communication network, wherein the communication coordinator is configured to: receive information comprising a communication service, an output state, a second set of input states and a set of feedback information from a device of the plurality of devices, and transmit the information to another device of the plurality of devices.

Thus, an improved communication coordinator is provided, allowing for coordination of a plurality of devices which realize transitional AI and apply it to communication network systems.

In a further possible implementation form of the second aspect, the communication network comprises a plurality of access networks and wherein each of the plurality of devices is located in one of the plurality of access networks respectively.

Thus, the devices realizing the transitional AI can be deployed at dedicated locations of the communication network and these devices can interact with each other efficiently.

According to a third aspect, a method for determining an output state is provided, comprising the steps of: receiving a first set of input states and a second set of input states; determining, by at least one processor, an output state using a first neural network and a second neural network; receiving, by a first neural network, the first set of input states as input and determining, by the first neural network, a first output on the basis of the first set of input states; receiving, by a second neural network, the second set of input states and determining, by the second neural network, a second output based on the second set of input states, wherein the first neural network is a pre-trained neural network and the second neural network is a reinforcement learning neural network; obtaining, by an output selector, the first and the second outputs and at least one of a first performance indicator from the first neural network and a second performance indicator from the second neural network; and selecting, by the output selector, as the output state the first output or the second output based on the at least one of the first performance indicator and the second performance indicator.

In a further possible implementation form of the third aspect, the first and the second set of input states are at least partially overlapping; and further comprising feeding, by the first neural network, the first output to the second neural network; and using, by the second neural network, the first output as a training parameter.

In a further possible implementation form of the third aspect, the method is for communicating with one or more network entities in a communication network, wherein the first and second sets of input states comprise network parameters and/or resource information of at least one network entity in the communication network, and the first and second outputs comprise modified network parameters and/or resource information of the at least one network entity in the communication network.

In a further possible implementation form of the third aspect, the method further comprises receiving a set of feedback information from the at least one network entity, the set of feedback information being associated with the at least one network entity using the network parameters and/or resource information to perform a communication service for one or more user entities.

In a further possible implementation form of the third aspect, the method further comprises training the second neural network using the set of feedback information as a further training parameter in order to obtain the second output from the second neural network.

In a further possible implementation form of the third aspect, the first and second sets of input states and/or the first and second outputs comprise at least one of total communication resources, available communication resources to be assigned to the at least one network entity, the number of the one or more user entities, positions and velocities of the one or more user entities, fading and path losses between the at least one network entity and the one or more user entities, packet loss or load on user plane functions, user quality of experience, user quality of service, user equipment connection quality, radio access network state and/or load, edge cloud state and/or load, transport network state and/or load, core network state and/or load.

In a further possible implementation form of the third aspect, the communication service comprises at least one of allocating communication resources to the one or more user entities, adaptation of the one or more user entities to a transport network or an edge network, increase or decrease of available virtualization resources of the one or more network entities, predicting changes in the one or more user entities and/or user movement patterns, predicting outages or misses of service level agreements, SLAs, building user profiles and detecting users based on network access patterns, selection of edge network or origin cloud system to serve certain content requests.

The method according to the third aspect can be extended into implementation forms corresponding to the implementation forms of the client device according to the first and second aspects. Hence, an implementation form of the method comprises the feature(s) of the corresponding implementation form of the device and communication coordinator.

The advantages of the methods according to the third aspect and its further implementations are the same as those for the corresponding implementation forms of the client device according to the first and second aspects and their implementations.

According to a fourth aspect a computer program product is provided, which includes program code for performing the method according to any one of claims 19 to 25, when the program code is run by a processor.

A further aspect of the invention also relates to a computer program, characterized in program code, which when run by at least one processor causes said at least one processor to execute any method according to embodiments and aspects of the invention.

According to still a further aspect, the invention also relates to a computer program product comprising a computer readable medium and said mentioned computer program, wherein said computer program is included in the computer readable medium, and the computer readable medium comprises one or more from the group: ROM (Read-Only Memory), PROM (Programmable ROM), EPROM (Erasable PROM), Flash memory, EEPROM (Electrically EPROM) and hard disk drive.

The invention can be implemented in hardware and/or software.

BRIEF DESCRIPTION OF THE DRAWINGS

Further embodiments of the invention will be described with respect to the following figures, wherein:

Figure 1 shows a schematic diagram illustrating a system architecture of a 5G communication network;

Figure 2 shows a schematic diagram illustrating a device applying a pre-trained neural network and a reinforcement learning neural network according to an embodiment;

Figure 3 shows a schematic diagram illustrating a process of applying a traditional artificial intelligence (AI) and a reinforcement learning artificial intelligence (AI) to a communication network system according to an embodiment;

Figure 4 shows a schematic diagram illustrating an action selection entity of a device according to an embodiment;

Figure 5 shows a schematic diagram illustrating a communication coordinator according to an embodiment for coordinating a plurality of transitional machine learning entities for applying a traditional AI and a reinforcement learning AI according to an embodiment;

Figure 6 shows a schematic diagram illustrating a transitional machine learning entity according to an embodiment as an independent entity in a 3GPP communication network;

Figure 7 shows a schematic diagram illustrating a transitional machine learning entity according to an embodiment integrated with a network data analytics function in a 3GPP communication network;

Figure 8 shows a schematic diagram illustrating a communication network system according to an embodiment comprising at least one access network, at least one core network and/or at least one edge network, each network being deployed with a transitional machine learning entity according to an embodiment;

Figure 9 shows a schematic diagram illustrating a communication network according to an embodiment for coordinating a plurality of transitional machine learning entities according to an embodiment;

Figure 10 shows a schematic diagram illustrating allocation of transmission resources for vehicle-to-vehicle communication by a transitional machine learning entity according to an embodiment; and

Figure 11 shows a schematic diagram illustrating a mapping of a transitional machine learning entity according to an embodiment to a vehicle-to-vehicle communication 1000 according to an embodiment.

In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF EMBODIMENTS

In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the present invention may be placed. It will be appreciated that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present invention is defined by the appended claims.

For instance, it will be appreciated that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.

Moreover, in the following detailed description as well as in the claims embodiments with different functional blocks or processing units are described, which are connected with each other or exchange signals. It will be appreciated that the present invention covers embodiments as well, which include additional functional blocks or processing units that are arranged between the functional blocks or processing units of the embodiments described below.

Finally, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.

As will be described in more detail under reference to figures 2 and 3, embodiments of the invention relate to a device 200 which applies “transitional artificial intelligence” (AI) in a communication system such as a 5G mobile system, in which a traditional AI or a classical optimization policy is used as a baseline in the first place, and a reinforcement AI then learns from the baseline and takes into account feedback from the environment, while including additional metrics and parameters which are not used by the baseline. The device 200 comprises an action selection entity (also referred to as an output selector 205 in the following) which selects the best action to use among the actions performed by the baseline and the reinforcement AI.

Embodiments of the invention prevent a bad action performed by the reinforcement AI during the learning phase, or when an abnormality occurs, from having a negative impact; enable learning from existing traditional AI or classical optimization solutions to increase the convergence speed; and enable the usage of hundreds of metrics, which is not scalable with classical optimization methods.

More specifically, the device 200 comprises an interface 201, at least one processor 203 and an output selector 205, as shown in figure 2.

In an embodiment, the interface 201 is configured to receive a first set of input states and a second set of input states, wherein the first and the second set of input states are at least partially overlapping.

In an embodiment, the processor 203 is configured to execute at least a program code for establishing a first neural network and a second neural network and to determine an output state using the first neural network and the second neural network.

In an embodiment, the first neural network is a pre-trained neural network (such as a traditional AI or a classical optimization policy) and the second neural network is a reinforcement learning neural network.

In an embodiment, the first neural network is configured to receive the first set of input states as input and determine a first output on the basis of the first set of input states, and the second neural network is configured to receive the second set of input states and determine a second output based on the second set of input states. Moreover, the first neural network is configured to feed the first output to the second neural network, and the second neural network is configured to use the first output as a training parameter.

In an embodiment, the output selector 205 is configured to obtain the first and the second outputs and at least one of a first performance indicator from the first neural network and a second performance indicator from the second neural network; and select the first output or the second output as the output state based on the at least one of the first performance indicator and the second performance indicator.

In an embodiment, the device 200 is adapted to communicate with one or more network entities in a communication network, wherein the first and second sets of input states comprise network parameters and/or resource information of at least one network entity in the communication network, and the first and second outputs comprise modified network parameters and/or resource information of the at least one network entity in the communication network.

In an embodiment, the first and second sets of input states and/or the first and second outputs comprise, for instance, total communication resources, available communication resources to be assigned to the at least one network entity, the number of the one or more user entities, positions and velocities of the one or more user entities, fading and path losses between the at least one network entity and the one or more user entities, packet loss or load on user plane functions, user quality of experience, user quality of service, user equipment connection quality, radio access network state and/or load, edge cloud state and/or load, transport network state and/or load, core network state and/or load.

In an embodiment, the interface 201 is further configured to receive a set of feedback information from the at least one network entity, wherein the set of feedback information is associated with the at least one network entity using the network parameters and/or resource information to perform a communication service for one or more user entities and the set of feedback information comprises, for example, at least one of packet losses between the at least one network entity and the one or more user entities, quality of service and load of the at least one network entity.

A communication service performed by the network entities can be, for instance, allocating communication resources to the one or more user entities, adaptation of the one or more user entities to a transport network or an edge network, increase or decrease of available virtualization resources of the one or more network entities, predicting changes in the one or more user entities and/or user movement patterns, predicting outages or misses of service level agreements, SLAs, building user profiles and detecting users based on network access patterns, or selection of edge network or origin cloud system to serve certain content requests.

In an embodiment, the processor 203 is configured to train the second neural network using the set of feedback information as a further training parameter in order to obtain the second output from the second neural network.

In an embodiment, the processor 203 is configured to calculate the first performance indicator indicating a success rate associated with the communication service performed by the at least one network entity using the first output.

Similarly, the processor 203 can also calculate a second performance indicator associated with the communication service performed by the at least one network entity using the second output, wherein the processor is configured to calculate the second performance indicator on the basis of the set of feedback information received from the at least one network entity.

After the above calculation, the output selector 205 according to an embodiment is configured to select the second output as the output state on the basis of the first performance indicator and the second performance indicator, if the second performance indicator is higher than the first performance indicator, or vice versa.
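The selection logic described above can be illustrated with a short Python sketch; the function and parameter names are illustrative assumptions, not part of the claimed device:

def select_output_state(first_output, second_output,
                        first_indicator, second_indicator):
    # Select the first or the second output as the output state based on
    # the performance indicators: prefer the output of the reinforcement
    # learning network when its indicator is higher, and vice versa.
    if second_indicator > first_indicator:
        return second_output  # reinforcement learning output
    return first_output       # pre-trained baseline output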

The device 200 as discussed above will also be referred to as a “transitional machine learning entity” hereafter, and the output selector 205 as an “action selection entity”. In the following, both entities will be demonstrated in further detail in conjunction with more embodiments.

Figure 3 shows a schematic diagram illustrating a process 300 of applying the transitional artificial intelligence (AI) to a 5G communication network by a transitional machine learning entity 200 according to an embodiment, wherein the transitional machine learning entity 200 comprises an action selection entity 205 as shown in figure 3 according to an embodiment.

The 5G communication network shown in figure 3 comprises a plurality of network functions, e.g., session management function (SMF) 103, policy control function (PCF) 105, access and mobility management function (AMF) 107, user plane function (UPF) 109, application function (AF) 111, network exposure function (NEF) 113, and unified data repository (UDR) 115.

In step 1, a baseline policy 301 (such as a traditional AI or a classical optimization solution) is used to perform an action based on a subset of input metrics during an initiation phase. The baseline policy 301 does not take feedback from the network into account; it follows its learned or dedicated policy.

In step 2, the baseline policy action is fed into the reinforcement learning neural network 303 and used to train the reinforcement learning neural network 303.

In step 3, feedback from network functions (e.g. packet loss, quality of service, load, etc.) is used to determine the reward, which will be used by the reinforcement learning neural network 303 to train its policy. The reinforcement learning neural network 303 takes into account all relevant metrics from the environment during training, while the baseline 301 may use only a subset of these metrics to take an action.

In step 4, when the reinforcement learning neural network 303 produces good actions, as measured through an action certainty indicator, the action of the reinforcement learning neural network 303 is used. The selected action is sent to the reinforcement learning neural network 303 for consideration in the next iteration and further training.

Optionally, the action selection entity 205 can switch back to the baseline if it detects bad performance by the reinforcement learning neural network 303, according to an embodiment. This could happen when an abnormality occurs in the network for which the reinforcement learning neural network 303 has never been trained or which it has never encountered before.
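The four steps above, including the optional fallback to the baseline, can be summarized in the following hypothetical Python sketch; BaselinePolicy, RLAgent and the environment object are illustrative stand-ins, and the certainty threshold is an assumption:

CERTAINTY_THRESHOLD = 0.8  # assumed cut-off for "good" RL actions

def transitional_step(baseline, rl_agent, env, state):
    # Step 1: the baseline policy 301 acts on a subset of input metrics.
    baseline_action = baseline.act(state.baseline_metrics)
    # Step 2: the baseline action is fed into the RL network 303 for training.
    rl_agent.observe_baseline(state.all_metrics, baseline_action)
    # Step 3: network feedback (packet loss, QoS, load, ...) determines the
    # reward used to train the RL policy.
    rl_action, certainty = rl_agent.act(state.all_metrics)
    # Step 4: use the RL action once its certainty indicator is high enough;
    # otherwise fall back to the baseline (e.g. after an abnormality).
    action = rl_action if certainty >= CERTAINTY_THRESHOLD else baseline_action
    feedback = env.apply(action)
    rl_agent.learn(reward=feedback.reward, taken_action=action)
    return action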

The embodiments of the invention provide in particular the following advantages: there are no requirements on training data or data labeling, as feedback from the environment drives the learning of the system; dynamic changes can be taken into consideration by the reinforcement AI; and learning from different locations may be aggregated, which will be further demonstrated with reference to figures 8 and 9 in the following.

In short, the embodiments of the invention provide a device, a system and corresponding methods applying an improved hybrid and distributed approach for machine learning in a communication system such as a 5G mobile system, which provide the best result out of the two methods (i.e. a first one by traditional, offline AI or classical optimization solutions and a second one by online reinforcement learning) while mitigating the limitations of both.

During an initiation phase (off-policy learning), traditional AI or a baseline policy is applied (e.g. a trained policy or a heuristic). Secondly, the reinforcement learning AI is trained: the baseline action is fed into the reinforcement AI for training, and the decision from the baseline is used to enhance the learning of the reinforcement AI.

Thirdly, feedback from the network functions (e.g. packet loss, quality of service, load, etc.) is used to determine the reward; the reward then determines the fitness of the policy and is used as a quantitative metric to train the reinforcement AI. When the traditional AI makes a good action, a higher reward will be applied. When the traditional AI makes a bad action, a lower reward will be applied.

Finally, when the reinforcement AI policy is producing good actions, measured by an action certainty indicator, this policy will be selected by an output selector, i.e. an action selection entity 205 according to an embodiment.

As two policies are running in parallel, a selection entity 205 is needed to select which decision to use, either from the baseline policy/traditional AI or from the reinforcement AI. An embodiment highlighting how this selection works is shown in figure 4.

Figure 4 shows a schematic diagram illustrating an action selection entity 205 of a transitional machine learning entity 200 according to an embodiment for applying a traditional AI 301 and a reinforcement learning AI 303.

Generally speaking, the action selection entity 205 is responsible for selecting the action to be applied. In other words, it decides when to use results of the baseline 301 and when to use results of the reinforcement AI 303. The main input for the decision is an action certainty indicator, which indicates how certain the decision of the reinforcement AI 303 is.

A state-of-the-art action certainty indicator is the so-called Temporal Difference (TD), but other metrics may be used. Therefore, the action selection entity 205 enables switching between these two policies. It is mainly helpful in dynamic environments when sudden changes happen and the reinforcement AI 303 thus needs some time to adjust to these changes.

The action selection entity 205 uses the following possible data: the history of performed actions, the history of the action certainty indicator, related and historical rewards, and/or the current action certainty indicator.
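A minimal Python sketch of a TD-based action certainty indicator follows; the sliding window size and the mapping from TD errors to a certainty value in (0, 1] are assumptions, not specified by the embodiment:

from collections import deque

class ActionCertainty:
    def __init__(self, window=50):
        # Keep a sliding history of recent temporal-difference errors.
        self.td_errors = deque(maxlen=window)

    def update(self, reward, value, next_value, gamma=0.99):
        # One-step TD error: delta = r + gamma * V(s') - V(s).
        self.td_errors.append(reward + gamma * next_value - value)

    def certainty(self):
        # No evidence yet: treat the reinforcement AI as uncertain.
        if not self.td_errors:
            return 0.0
        mean_abs = sum(abs(d) for d in self.td_errors) / len(self.td_errors)
        # Small average |TD error| means stable value estimates, which is
        # taken here to mean the policy's actions are more certain.
        return 1.0 / (1.0 + mean_abs)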

Within complex networks, various devices applying transitional machine learning will be running in parallel. These could be deployed at different network locations such as the radio access network (RAN), transport network, core network, etc., as well as at different system layers such as the MAC layer, transport layer, and application layer. This means that multiple transitional machine learning entities will be running in parallel and will need to interact with each other.

Therefore, a communication coordinator that interacts with the different transitional machine learning entities running in the different domains is needed. To realize the interaction, interfaces are required to send and receive the machine learning relevant metrics and parameters. The interfaces are categorized as either an Entity-Entity interface 505 or an Entity-Coordinator interface 507, as shown in figure 5.

Figure 5 shows a schematic diagram illustrating a communication coordinator 501 according to an embodiment for coordinating a plurality of transitional machine learning entities 200a-c applying transitional AI according to an embodiment.

The communication coordinator 501 is configured to: receive information comprising a communication service, an output state, a second set of input states and a set of feedback information from an entity of the plurality of transitional machine learning entities 200a-c applying transitional AI, and transmit the information to another entity of the plurality of transitional machine learning entities 200a-c applying transitional AI, wherein the communication network comprises a plurality of access networks and wherein each of the plurality of transitional machine learning entities 200a-c is located in one of the plurality of access networks, respectively.

In an embodiment, the Entity-Entity interface 505 can exchange good actions and learned policies by exchanging the following messages: send_actions(policy_ID, reward_function) and send_learned_policy(policy_ID, learned_policy). In an embodiment, the Entity-Coordinator interface 507 can send a locally-learned policy and local constraints as well as receive a globally-learned policy by exchanging the following messages: send_actions(policy_ID, reward_function) and send_learned_policy(policy_ID, learned_policy).
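The two messages could be modeled as in the following illustrative Python sketch; the field types and the receive method are assumptions made for illustration:

from dataclasses import dataclass

@dataclass
class SendActions:
    policy_ID: str
    reward_function: str   # e.g. a named or serialized reward function

@dataclass
class SendLearnedPolicy:
    policy_ID: str
    learned_policy: bytes  # e.g. serialized policy parameters

def share_policy(peer, policy_ID, reward_function, learned_policy):
    # Sent over the Entity-Entity interface 505 to a peer entity, or over
    # the Entity-Coordinator interface 507 to the coordinator 501.
    peer.receive(SendActions(policy_ID, reward_function))
    peer.receive(SendLearnedPolicy(policy_ID, learned_policy))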

In an embodiment, the device 200, i.e. the transitional machine learning entity 200, could be a separate entity in the 3GPP system, such as a 5G network system, and provide actions for the other network functions, as shown in figure 6.

The 5G communication network 600 shown in figure 6 comprises the following network functions (NFs): Network Data Analytics Function (NWDAF) 101, session management function (SMF) 103, policy control function (PCF) 105, access and mobility management function (AMF) 107, user plane function (UPF) 109, application function (AF) 111, network exposure function (NEF) 113, and unified data repository (UDR) 115.

In an embodiment, data could be retrieved from the NFs directly or from the NWDAF 101, and the actions could impact virtualization layer aspects, allocated resources, location, etc. The transitional machine learning entity 200 could respond to optimization requests for different network functions as well as the NWDAF. A baseline policy/traditional AI is assumed to be available for each optimization goal of the different NFs.

In a further embodiment, the transitional machine learning entity 200 could be integrated into the Network Data Analytics Function (NWDAF) 101 in a communication network 700, as shown in figure 7. The NWDAF 101 is a function with a main focus on how to collect and expose analytic information to other functions. Benefits of this embodiment include that no additional data needs to be exchanged between functions, as it is already available in the NWDAF 101.

Generally speaking, there are no impacts on the existing Service-Based Architecture (SBA) interfaces, such as those to discover data sources, collect data from NFs, and provide data to NFs.

However, there are impacts and enhancements to existing interfaces as follows: the interface to initiate data analytics could be enhanced to include optimization by e.g. traditional or transitional AI; the interface between different NWDAF instances could be enhanced to include reference points for sending and receiving learnings and rewards.

In a further embodiment, the transitional machine learning entity 200a-d could be deployed at dedicated locations of a communication network system 800, as shown in figure 8, wherein the communication network system 800 comprises at least one access network 801a-b, at least one core network 805 and/or at least one edge network 803, and wherein the transitional machine learning entity 200a-d is configured to be located in the at least one access network 801a-b, the at least one core network 805 and/or the at least one edge network 803.

In the radio access network (RAN) 801a-b, the transitional machine learning entity 200a-b can impact radio resource allocations, in which case the action could be adapting the allocated time/frequency slots to reduce radio interference.

In the transport network, the transitional machine learning entity can impact transport resources, in which case the action could be adapting the usage of different transport networks to balance load or minimize resource utilization.

In the core network 805, the transitional machine learning entity 200d can impact states and resources of different network functions.

This embodiment enables the exchange of transitional learning (good actions and learned policies) at different network locations, such as the radio access network (RAN), transport network, edge network, core network, etc. This is realized by the communication coordinator 501 and by interfaces between the different transitional learning entities and the coordinator to exchange good actions and learned policies. This embodiment enables a distributed Al system, makes it scalable, and allows learning from different geographical regions to benefit the whole system.

According to an embodiment, multiple transitional machine learning entities 200a-c could be deployed at different radio access networks (RANs) 901a-c, as shown in figure 9.

As can be seen from figure 9, the learning results from the different RANs 901a-c are sent to the communication coordinator 501 according to an embodiment. Further, the learning results could be collected, an independent policy could be generated, and the learned policy sent back to the RANs 901a-c.

Furthermore, the global policy could be used in conjunction with the local policy, and the defined interfaces are used to exchange the learned policy and reward functions according to a further embodiment.
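
One conceivable way (not specified in the patent) to generate the independent global policy and use it in conjunction with a local policy is simple parameter averaging, in the spirit of federated averaging; a minimal sketch, assuming each learned policy is represented as a flat NumPy parameter vector:

    import numpy as np

    def aggregate_policies(local_policies):
        # Coordinator side: average the parameter vectors collected from the
        # RANs into one independent global policy.
        return np.mean(np.stack(local_policies), axis=0)

    def blend_policies(global_policy, local_policy, alpha=0.5):
        # RAN side: use the global policy in conjunction with the local one
        # via a weighted mix; alpha is an assumed tuning knob.
        return alpha * global_policy + (1.0 - alpha) * local_policy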

Figure 10 shows a schematic diagram illustrating allocation of transmission resources for vehicle-to-vehicle (V2V) communication 1000 by the transitional machine learning entity 200 according to an embodiment.

Vehicles broadcast their information to neighboring vehicles, and there are a number of transmission blocks (TBs) which the vehicles can use for these message broadcasts. If two vehicles in transmission range of each other broadcast on the same TB, a collision occurs and the vehicles within the collision area receive no messages.

As can be seen from figure 10, if the two vehicles 1001, 1003 in the middle transmit data on the first transmission block (TB #1), a collision occurs and the vehicles 1005, 1007 marked with a circle will not receive any of the broadcast messages.

Transmission blocks (TBs) are assigned by a base station, which manages the vehicles in its operating range and should assign TBs to the vehicles such that collision (message loss) is minimized.
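
As an illustrative baseline only (the patent does not fix the assignment algorithm), the base station could greedily give a newly entering vehicle the TB least used among that vehicle's neighbors:

    def assign_tb(neighbor_tbs, num_tbs):
        # neighbor_tbs: TB indices currently used by vehicles in transmission
        # range of the new vehicle. Picking the least contended TB minimizes
        # the immediate collision (message loss) risk.
        usage = [0] * num_tbs
        for tb in neighbor_tbs:
            usage[tb] += 1
        return usage.index(min(usage))

    # Example: neighbors occupy TB 0 twice and TB 1 once, so TB 2 is chosen.
    assert assign_tb([0, 0, 1], num_tbs=4) == 2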

In a first step, embodiments of the invention can, for offline training, model this scenario in a simulator, applying some simplifications such as a simple model of the physical channel or a simple mobility model. Embodiments of the invention can train a first neural network by considering the following: input state: the number of vehicles in the range of the base station and the usage of each resource block (in the form of a matrix); action: assigning a TB to a new vehicle entering the range of the base station; reward: the success rate (1 - collision rate).
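
A minimal sketch of how the input state, the one-hot action and the reward described above might be encoded (the exact encodings are assumptions, not prescribed by the patent):

    import numpy as np

    def encode_state(num_vehicles, tb_usage):
        # Input state: the vehicle count concatenated with the flattened
        # TB-usage matrix.
        return np.concatenate(([float(num_vehicles)],
                               np.asarray(tb_usage, dtype=np.float32).flatten()))

    def one_hot_action(tb_index, num_tbs=10):
        # Action: one-hot vector; one_hot_action(3) yields
        # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], i.e. the fourth TB is assigned.
        return [1 if i == tb_index else 0 for i in range(num_tbs)]

    def reward(num_broadcasts, num_collisions):
        # Reward: success rate = 1 - collision rate.
        return 1.0 if num_broadcasts == 0 else 1.0 - num_collisions / num_broadcasts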

In this embodiment, the action is a vector comprising a number of 0s and 1s, e.g. (0, 0, 0, 1, 0, 0, 0, 0, 0, 0), i.e., the fourth transmission block is assigned to the vehicle. To train the second neural network, i.e. the reinforcement Al, in a real (i.e. not simulated) environment, embodiments of the invention apply actions made by the first, trained neural network: embodiments of the invention feed the number of vehicles in the range of the base station as the input state to the trained neural network and get the action which determines what TB to use for a new vehicle entering the range of the base station.

To train the second neural network, embodiments of the invention require the following information: the current state of the environment S_t; the applied action A_t; the reward R_t received from the environment; and the new state of the environment S_{t+1} after applying the action.

The above SARS information would be fed to a Stochastic Gradient Descent (SGD) algorithm which adjusts the parameters of the second neural network: if the action yields a good reward, the parameters are adjusted such that the probability of taking such an action in the future, when observing a similar state, is increased. If the action yields a poor reward, the parameters are adjusted such that taking such an action in the future, when observing such a state, becomes less probable.
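
One possible realization of this SGD update is a REINFORCE-style policy-gradient step; the sketch below uses a linear softmax policy for brevity and is an assumption, not the patent's prescribed loss:

    import numpy as np

    def softmax(z):
        z = z - z.max()              # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def sgd_policy_update(W, state, action_idx, reward, lr=0.01):
        # One SGD step on a linear softmax policy pi(a|s) = softmax(W @ s).
        # A good (positive) reward raises the probability of action_idx in
        # this state; a poor (negative) reward lowers it.
        probs = softmax(W @ state)
        grad_log_pi = -np.outer(probs, state)   # gradient term over all actions
        grad_log_pi[action_idx] += state        # extra term for the taken action
        return W + lr * reward * grad_log_pi    # ascend reward-weighted log-prob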

As embodiments of the invention are interacting with a real environment, more detailed state information (containing information about the position and velocity of the vehicles, the level of fading and path loss in the environment, etc.) can be used.

During the initial phase of training, embodiments of the invention consider the action made by the first, trained Al (e.g. the baseline policy) and assume that this is the action the second neural network made. The reward is the reward (performance) feedback R_t received from the environment after applying action A_t. The new state of the environment S_{t+1} is the state of the environment after applying A_t and receiving R_t.
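
Put together, one bootstrap step of this initial phase might look as follows (a sketch reusing sgd_policy_update from above; env_step is a hypothetical callable standing in for the real environment):

    def bootstrap_step(state, baseline_policy, W, env_step, lr=0.01):
        # Act with the first, trained network, but update the second
        # network's parameters as if it had chosen the action itself.
        action = baseline_policy(state)                 # assured action A_t
        reward_t, next_state = env_step(state, action)  # R_t and S_{t+1}
        W = sgd_policy_update(W, state, action, reward_t, lr)
        return W, next_state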

While embodiments of the invention use the more assured actions performed by the first, trained neural network, hence avoiding the cost of applying a poor action chosen by the second, untrained agent, embodiments of the invention can still train the second neural network and learn from the actions performed by the trained neural network.

However, the first neural network is trained over a simulator using a simplified model and hence has limited applicability. Embodiments of the invention thus in parallel assess the quality of the decisions made by the second neural network (not by applying its actions, as that might be costly, but by evaluating another metric, such as the temporal difference error).

Embodiments of the invention can start to use the actions performed by the second neural network as soon as the quality of its results is ensured, e.g., as soon as the temporal difference error falls below some threshold. This decision is performed by an action selection entity or output selector 205, as already discussed above: as input, the output selector 205 receives the action made by the first, trained neural network and the action made by the second neural network, as well as the temporal difference error calculated by the second neural network for the previous action. The output selector 205 then decides which action to apply.
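
A minimal sketch of the output selector's decision rule, assuming a scalar temporal difference error and an arbitrary threshold (the patent does not specify the threshold value):

    def select_action(baseline_action, rl_action, td_error, threshold=0.1):
        # Trust the reinforcement AI only once the temporal difference error
        # of its previous action has fallen below the threshold; otherwise
        # fall back to the pre-trained (baseline) network's action.
        if td_error is not None and abs(td_error) < threshold:
            return rl_action
        return baseline_action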

Figure 11 shows a schematic diagram illustrating a mapping 1100 of the transitional machine learning entity 200 to the vehicle-to-vehicle (V2V) communication 1000 as shown in figure 10 according to an embodiment.

In step 1 , a success rate without collision is used to train a traditional Al 301 and generate the base line policy 301 , wherein the success rate might be generated in a simulation model.

In step 2, additional parameters, such as tunnel length, are used to train a reinforcement Al 303.

In step 3, the decision made by the baseline policy 301 on assigning transmission blocks to vehicles (a matrix mapping car numbers to assigned blocks) is sent to the action selection entity 205.

In step 4, the decision from the action selection entity 205 on assigning transmission blocks to vehicles (a matrix mapping car numbers to assigned blocks) is sent to the reinforcement Al 303.

In step 5, an action certainty indicator is derived when applying the gradient descent.

In step 6, feedback about the performance of the applied action (e.g., success rate or 1 - collision rate) is sent to the reinforcement Al 303.
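
Steps 3 to 6 could be tied together in a loop along the following lines (a loose, illustrative mapping that reuses the select_action and sgd_policy_update sketches above; td_error is a stand-in scalar that the reinforcement Al would in practice derive in step 5):

    import numpy as np

    def transitional_loop(state, baseline_policy, W, env_step, steps=100,
                          td_error=1.0):
        for _ in range(steps):
            a_base = baseline_policy(state)           # step 3: baseline decision
            a_rl = int(np.argmax(W @ state))          # reinforcement AI proposal
            action = select_action(a_base, a_rl, td_error)     # step 4: selection
            reward_t, next_state = env_step(state, action)     # step 6: feedback
            W = sgd_policy_update(W, state, action, reward_t)  # step 5: SGD step
            state = next_state
        return W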

By combining the advantages of a traditional Al 301 and a reinforcement Al 303, the embodiments of the invention thus advantageously reduce the negative impact of a bad action performed by the reinforcement Al during its learning phase or when an abnormality occurs, and also enable learning from existing traditional Al or classical optimization solutions to increase the convergence speed of the reinforcement Al.

While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example" and "e.g." are merely meant as an example, rather than as the best or optimal. The terms "coupled" and "connected", along with their derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact or are not in direct contact with each other.

Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.

Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.