Title:
A METHOD AND A DEVICE FOR GENERATING A FORWARDING-PLANE PROGRAM CODE FOR A NETWORK ELEMENT
Document Type and Number:
WIPO Patent Application WO/2018/162788
Kind Code:
A1
Abstract:
A method and a device for generating a forwarding-plane program code for a network element are presented. The method comprises setting (201) a predetermined forwarding-plane program code to represent a starting point for iterative reward-driven learning and repeating (202) a reward-driven learning process so as to develop the forwarding-plane program code into a satisfactory form. The iterative reward-driven learning can be guided by requirements to be met by the forwarding-plane program code as well as by the implementation costs of the functionalities defined by the forwarding-plane program code. Therefore, the human work needed for developing or optimizing the forwarding-plane program code can be avoided or at least reduced.

Inventors:
NIEMINEN JUHA-PETTERI (FI)
Application Number:
PCT/FI2017/050151
Publication Date:
September 13, 2018
Filing Date:
March 06, 2017
Assignee:
CORIANT OY (FI)
International Classes:
H04L12/24; G06N20/00
Other References:
MARTÍNEZ-PLUMED, Fernando, et al., "Learning with Configurable Operators and RL-Based Heuristics", 24 September 2012, Network and Parallel Computing, Lecture Notes in Computer Science, Springer International Publishing, Cham, pages 1-16, ISBN 978-3-540-77365-8, ISSN 0302-9743, XP047040548.
Attorney, Agent or Firm:
FINNPATENT OY (FI)
Claims:
What is claimed is:

1. A device (103) for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support predetermined forwarding-plane functionalities, characterized in that the device comprises a processor system (104) configured to:

- set a predetermined forwarding-plane program code (C0) to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeat a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action (ai(k)) from among modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci), the selection being based on quality values (Qi(1, 2, ...)) each relating to the forwarding-plane program code (Ci) and to one of the modification actions (ai(1, 2, ...)) and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the predetermined forwarding-plane functionalities, each of the modification actions (ai(1, 2, ...)) being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code (Ci) according to the selected modification action (ai(k)), and

c) updating a particular one of the quality values (Qi(k)) which relates to the forwarding-plane program code (Ci) prior to the modification and to the selected modification action (ai(k)), the updating being based on i) the quality values (Qi+1(1, 2, ...)) related to the modified forwarding-plane program code (Ci+1) and to modification actions (ai+1(1, 2, ...)) suitable for the modified forwarding-plane program code and on ii) a reward value (η) indicating the ability of the forwarding-plane program code (Ci) to support the predetermined forwarding-plane functionalities prior to the modification.

2. A device according to claim 1, wherein the processor system is configured to select the modification action (ai(k)) from among the modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci) based on the quality values (Qi(1, 2, ...)) and on cost change values (Ki - Ki+1(1, 2, ...)) each indicating a change of an implementation cost of the forwarding-plane program code caused by one of the modification actions (ai(1, 2, ...)).

3. A device according to claim 2, wherein the processor system is configured to compute selection values S(1, 2, ...) for the modification actions (ai(1, 2, ...)) in accordance with the following equation and to select the modification action (ai(k)) whose selection value is greatest:

S(1, 2, ...) = Qi(1, 2, ...) - r (Ki - Ki+1(1, 2, ...)),

where Qi(1, 2, ...) are the quality values relating to the forwarding-plane program code (Ci) and to the modification actions (ai(1, 2, ...)), Ki is the implementation cost of the forwarding-plane program code (Ci), each of Ki+1(1, 2, ...) is the implementation cost of the forwarding-plane program code (Ci) after being modified according to one of the modification actions (ai(1, 2, ...)), and r is a cost factor.

4. A device according to any of claims 1-3, wherein the processor system is configured to run, at least once amongst repetitions of the reward-driven learning process, a process comprising:

- a random selection of a modification action from among the modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci),

- the process step b), and

- the process step c).

5. A device according to any of claims 1-4, wherein the processor system is configured to apply random selection for selecting one from two or more modification actions which are equal to each other in light of a selection criterion related to the process step a).

6. A device according to any of claims 1-5, wherein the processor system is configured to set initial values of the quality values to be equal to each other.

7. A device according to any of claims 1-5, wherein the processor system is configured to set initial values of the quality values to be randomly generated values.

8. A device according to any of claims 1-7, wherein the processor system is configured to update the one of the quality values Qi(k) which relates to the forwarding-plane program code (Ci) prior to the modification and to the selected modification action (ai(k)) in accordance with the following equation:

Qi(k)updated = (1 - α)Qi(k) + α(η + γ max{Qi+1(1, 2, ...)}),

where Qi(k)updated is the updated quality value Qi(k), η is the reward value indicating the ability of the forwarding-plane program code (Ci) to support the predetermined forwarding-plane functionalities prior to the modification, Qi+1(1, 2, ...) are quality values related to the modified forwarding-plane program code (Ci+1) and to modification actions (ai+1(1, 2, ...)) suitable for the modified forwarding-plane program code, α is a low-pass filtering factor, and γ is a discount factor modelling uncertainty of reward values related to modifications of the forwarding-plane program code.

9. A device according to any of claims 1-8, wherein the processor system is configured to stop repeating the reward-driven learning process in response to a situation in which the reward value has not improved for a predetermined number of repetitions of the reward-driven learning process.

10. A device according to claim 2 or 3, wherein the processor system is configured to stop repeating the reward-driven learning process in response to a situation in which the reward value minus the implementation cost has not improved for a predetermined number of repetitions of the reward-driven learning process.

11. A device according to any of claims 1-10, wherein the processor system is configured to stop repeating the reward-driven learning process in response to a situation in which the forwarding-plane program code supports the predetermined forwarding-plane functionalities.

12. A device according to any of claims 1-11, wherein the processor system is configured to stop repeating the reward-driven learning process in response to a situation in which the reward-driven learning process has been repeated a predetermined number of times.

13. A device according to any of claims 1-12, wherein the processor system is configured to load the forwarding-plane program code on the programmable processing system of the network element.

14. A network element (101) for a data transfer network, the network element comprising:

- a data interface (109) for receiving data from a data transfer network and for transmitting data to the data transfer network,

- a programmable processing system (102), and

- a device (103) according to claim 13 for generating a forwarding-plane program code that comprises computer executable instructions for controlling the programmable processing system to support predetermined forwarding-plane functionalities.

15. A network element according to claim 14, wherein the network element is at least one of the following: an Internet Protocol "IP" router, a Multiprotocol Label Switching "MPLS" switch, a packet optical switch, an Ethernet switch, a software-defined networking "SDN" controlled network element.

16. A method for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support predetermined forwarding-plane functionalities, characterized in that the method comprises:

- setting (201) a predetermined forwarding-plane program code (C0) to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeating (202) a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action (ai(k)) from among modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci), the selection being based on quality values (Qi(1, 2, ...)) each relating to the forwarding-plane program code (Ci) and to one of the modification actions (ai(1, 2, ...)) and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the predetermined forwarding-plane functionalities, each of the modification actions (ai(1, 2, ...)) being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code (Ci) according to the selected modification action (ai(k)), and

c) updating a particular one of the quality values (Qi(k)) which relates to the forwarding-plane program code (Ci) prior to the modification and to the selected modification action (ai(k)), the updating being based on i) the quality values (Qi+1(1, 2, ...)) related to the modified forwarding-plane program code (Ci+1) and to modification actions (ai+1(1, 2, ...)) suitable for the modified forwarding-plane program code and on ii) a reward value (η) indicating the ability of the forwarding-plane program code (Ci) to support the predetermined forwarding-plane functionalities prior to the modification.

17. A method according to claim 16, wherein the modification action (ai(k)) is selected from among the modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci) based on the quality values (Qi(1, 2, ...)) and on cost change values (Ki - Ki+1(1, 2, ...)) each indicating a change of an implementation cost of the forwarding-plane program code caused by one of the modification actions (ai(1, 2, ...)).

18. A method according to claim 17, wherein the method comprises computing selection values S(1, 2, ...) for the modification actions (ai(1, 2, ...)) in accordance with the following equation and selecting the modification action (ai(k)) whose selection value is greatest:

S(1, 2, ...) = Qi(1, 2, ...) - r (Ki - Ki+1(1, 2, ...)),

where Qi(1, 2, ...) are the quality values relating to the forwarding-plane program code (Ci) and to the modification actions (ai(1, 2, ...)), Ki is the implementation cost of the forwarding-plane program code (Ci), each of Ki+1(1, 2, ...) is the implementation cost of the forwarding-plane program code (Ci) after being modified according to one of the modification actions (ai(1, 2, ...)), and r is a cost factor.

19. A method according to any of claims 16-18, wherein the method comprises running, at least once amongst repetitions of the reward-driven learning process, a process comprising:

- a random selection of a modification action from among the modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci),

- the process step b), and

- the process step c).

20. A method according to any of claims 16-19, wherein the method comprises applying random selection for selecting one from two or more modification actions which are equal to each other in light of a selection criterion related to the process step a).

21. A method according to any of claims 16-20, wherein the method comprises setting initial values of the quality values to be equal to each other.

22. A method according to any of claims 16-20, wherein the method comprises setting initial values of the quality values to be randomly generated values.

23. A method according to any of claims 16-22, wherein the one of the quality values Qi(k) which relates to the forwarding-plane program code (Ci) prior to the modification and to the selected modification action (ai(k)) is updated in accordance with the following equation:

Qi(k)updated = (1 - α)Qi(k) + α(η + γ max{Qi+1(1, 2, ...)}),

where Qi(k)updated is the updated quality value Qi(k), η is the reward value indicating the ability of the forwarding-plane program code (Ci) to support the predetermined forwarding-plane functionalities prior to the modification, Qi+1(1, 2, ...) are quality values related to the modified forwarding-plane program code (Ci+1) and to modification actions (ai+1(1, 2, ...)) suitable for the modified forwarding-plane program code, α is a low-pass filtering factor, and γ is a discount factor modelling uncertainty of reward values related to modifications of the forwarding-plane program code.

24. A method according to any of claims 16-23, wherein the repeating of the reward-driven learning process is stopped in response to a situation in which the reward value has not improved for a predetermined number of repetitions of the reward-driven learning process.

25. A method according to claim 17 or 18, wherein the repeating of the reward-driven learning process is stopped in response to a situation in which the reward value minus the implementation cost has not improved for a predetermined number of repetitions of the reward-driven learning process.

26. A method according to any of claims 16-25, wherein the repeating of the reward-driven learning process is stopped in response to a situation in which the forwarding-plane program code supports the predetermined forwarding-plane functionalities.

27. A method according to any of claims 16-26, wherein the repeating of the reward-driven learning process is stopped in response to a situation in which the reward-driven learning process has been repeated a predetermined number of times.

28. A method according to any of claims 16-27, wherein the method further comprises loading (203) the forwarding-plane program code on the programmable processing system of the network element.

29. A computer program for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support predetermined forwarding-plane functionalities, characterized in that the computer program comprises computer executable instructions for controlling a programmable processor to:

- set a predetermined forwarding-plane program code (C0) to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeat a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action (ai(k)) from among modification actions (ai(1, 2, ...)) suitable for the forwarding-plane program code (Ci), the selection being based on quality values (Qi(1, 2, ...)) each relating to the forwarding-plane program code (Ci) and to one of the modification actions (ai(1, 2, ...)) and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the predetermined forwarding-plane functionalities, each of the modification actions (ai(1, 2, ...)) being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code (Ci) according to the selected modification action (ai(k)), and

c) updating a particular one of the quality values (Qi(k)) which relates to the forwarding-plane program code (Ci) prior to the modification and to the selected modification action (ai(k)), the updating being based on i) the quality values (Qi+1(1, 2, ...)) related to the modified forwarding-plane program code (Ci+1) and to modification actions (ai+1(1, 2, ...)) suitable for the modified forwarding-plane program code and on ii) a reward value (η) indicating the ability of the forwarding-plane program code (Ci) to support the predetermined forwarding-plane functionalities prior to the modification.

30. A computer program product comprising a non-transitory computer readable medium encoded with a computer program according to claim 29.

Description:
A method and a device for generating a forwarding-plane program code for a network element

Field of the disclosure

The disclosure relates generally to configuring a network element, such as a router, to operate as a part of a data transfer network. More particularly, the disclosure relates to a method and a device for generating a forwarding-plane program code that comprises computer executable instructions for controlling a network element to support desired forwarding-plane functionalities. Furthermore, the disclosure relates to a computer program for generating a forwarding-plane program code for a network element, and to a network element such as a router.

Background

The forwarding-plane, which is sometimes called the data-plane or the user-plane, defines the network functionality which decides what to do with data frames arriving at network elements of a data transfer network. Each network element can be for example an Internet Protocol "IP" router, a Multiprotocol Label Switching "MPLS" router "LSR", a packet optical switch, an Ethernet switch, and/or a software-defined networking "SDN" controlled network element. Each data frame can be for example an Internet Protocol "IP" packet, an Ethernet frame, or some other protocol data unit "PDU" that is routed in a data transfer network. For example, the forwarding-plane functionalities of an "MPLS" label switching router "LSR" comprise label switching functionality for forwarding data frames based on labels carried by the data frames. Based on a label carried by a received data frame, an LSR decides an egress port of the LSR via which the data frame is to be sent and replaces the label with a new label. The new label is used by a subsequent LSR to perform a new forwarding decision.

In many network elements, the forwarding-plane is at least partly implemented with a programmable processing system and with a forwarding-plane program code that comprises computer executable instructions for controlling the programmable processing system to support desired forwarding-plane functionalities. The forwarding-plane program code has to fulfill several requirements. Firstly, the forwarding-plane program code has to adapt the programmable processing system to support desired forwarding-plane functionalities. Secondly, the implementation costs of the forwarding-plane functionalities have to be at an acceptable level. The implementation costs mean for example memory usage, memory bus usage, consumption of processing time, and/or other factors which indicate the consumption and/or usage of hardware "HW" resources for implementing and/or running the forwarding-plane functionalities. In many cases, a forwarding-plane program code can be complex and thus a significant amount of human work can be needed for developing and/or optimizing a forwarding-plane program code.

Summary

The following presents a simplified summary in order to provide a basic understanding of some aspects of various invention embodiments. The summary is not an extensive overview of the invention. It is neither intended to identify key or critical elements of the invention nor to delineate the scope of the invention. The following summary merely presents some concepts in a simplified form as a prelude to a more detailed description of exemplifying embodiments of the invention.

In accordance with the invention, there is provided a new device for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support desired forwarding-plane functionalities. The device comprises a processor system configured to:

- set a predetermined forwarding-plane program code to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeat a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action from among modification actions suitable for the forwarding-plane program code, the selection being based on quality values each relating to the forwarding-plane program code and to one of the modification actions and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the desired forwarding-plane functionalities, each of the modification actions being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code according to the selected modification action, and

c) updating the particular one of the quality values which relates to the forwarding-plane program code prior to the modification and to the selected modification action, the updating being based on i) the quality values related to the modified forwarding-plane program code and to modification actions suitable for the modified forwarding-plane program code and on ii) a reward value indicating the ability of the forwarding-plane program code to support the desired forwarding-plane functionalities prior to the modification.

The above-presented iterative reward-driven learning is guided by the desired forwarding-plane functionalities which are to be implemented with the forwarding-plane program code. Thus, the iterative reward-driven learning drives the forwarding-plane program code towards a form capable of supporting the desired forwarding-plane functionalities. Furthermore, the iterative reward-driven learning can be guided by implementation costs which indicate the consumption and/or usage of hardware "HW" resources when running the forwarding-plane functionalities with the aid of each development phase of the forwarding-plane program code. Therefore, human work for developing and/or optimizing a forwarding-plane program code can be avoided or at least reduced. Iterative reward-driven learning of the kind described above is also called reinforcement learning.

In accordance with the invention, there is provided also a new network element that can be for example an Internet Protocol "IP" router, a multiprotocol label switching "MPLS" router "LSR", a packet optical switch, an Ethernet switch, and/or a software-defined networking "SDN" controlled network element. The network element comprises:

- a data interface for receiving data from a data transfer network and for transmitting data to the data transfer network,

- a programmable processing system, and

- a device according to the invention for generating a forwarding-plane program code that comprises computer executable instructions for controlling the programmable processing system to support the forwarding-plane functionalities.

In accordance with the invention, there is provided also a new method for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support desired forwarding-plane functionalities. The method comprises:

- setting a predetermined forwarding-plane program code to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeating a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action from among modification actions suitable for the forwarding-plane program code, the selection being based on quality values each relating to the forwarding-plane program code and to one of the modification actions and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the desired forwarding-plane functionalities, each of the modification actions being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code according to the selected modification action, and

c) updating the particular one of the quality values which relates to the forwarding-plane program code prior to the modification and to the selected modification action, the updating being based on i) the quality values related to the modified forwarding-plane program code and to modification actions suitable for the modified forwarding-plane program code and on ii) a reward value indicating the ability of the forwarding-plane program code to support the desired forwarding-plane functionalities prior to the modification.

In accordance with the invention, there is provided also a new computer program for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support desired forwarding-plane functionalities. The computer program comprises computer executable instructions for controlling a programmable processor to:

- set a predetermined forwarding-plane program code to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeat a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action from among modification actions suitable for the forwarding-plane program code, the selection being based on quality values each relating to the forwarding-plane program code and to one of the modification actions and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the desired forwarding-plane functionalities, each of the modification actions being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code according to the selected modification action, and

c) updating the particular one of the quality values which relates to the forwarding-plane program code prior to the modification and to the selected modification action, the updating being based on i) the quality values related to the modified forwarding-plane program code and to modification actions suitable for the modified forwarding-plane program code and on ii) a reward value indicating the ability of the forwarding-plane program code to support the desired forwarding-plane functionalities prior to the modification.

In accordance with the invention, there is provided also a new computer program product. The computer program product comprises a non-volatile computer readable medium, e.g. a compact disc "CD", encoded with a computer program according to the invention.

A number of exemplifying and non-limiting embodiments of the invention are described in the accompanying dependent claims. Various exemplifying and non-limiting embodiments of the invention, both as to constructions and methods of operation, together with additional objects and advantages thereof, will be best understood from the following description of specific exemplifying and non-limiting embodiments when read in connection with the accompanying drawings. The verbs "to comprise" and "to include" are used in this document as open limitations that neither exclude nor require the existence of unrecited features.

The features recited in dependent claims are mutually freely combinable unless otherwise explicitly stated. Furthermore, it is to be understood that the use of "a" or "an", i.e. a singular form, throughout this document does not exclude a plurality.

Brief description of the figures

Exemplifying and non-limiting embodiments of the invention and their advantages are explained in greater detail below with reference to the accompanying drawings, in which:

figure 1 shows a schematic illustration of a network element according to an exemplifying and non-limiting embodiment of the invention, and

figure 2 shows a flowchart of a method according to an exemplifying and non-limiting embodiment of the invention for generating a forwarding-plane program code for controlling a network element of a data transfer network.

Description of exemplifying and non-limiting embodiments

The specific examples provided in the description below should not be construed as limiting the scope and/or the applicability of the accompanied claims. Lists and groups of examples provided in the description below are not exhaustive unless otherwise explicitly stated.

Figure 1 shows a schematic illustration of a network element 101 according to an exemplifying and non-limiting embodiment of the invention. The network element 101 can be for example an Internet Protocol "IP" router, a multiprotocol label switching "MPLS" switch, a packet optical switch, an Ethernet switch, and/or a software-defined networking "SDN" controlled network element. The network element 101 comprises a data interface 109 for receiving data from a data transfer network 110 and for transmitting data to the data transfer network 110. The network element 101 comprises a programmable processing system 102 that can be programmed to manage data in accordance with desired forwarding-plane functionalities. The programmable processing system 102 may comprise for example a programmable network processor "NP". The network element 101 comprises a device 103 according to an embodiment of the invention for generating a forwarding-plane program code that comprises computer executable instructions for controlling the programmable processing system 102 to support the desired forwarding-plane functionalities. The device 103 can be configured to load the generated forwarding-plane program code on the programmable processing system 102 so as to enable the programmable processing system 102 to manage data in accordance with the desired forwarding-plane functionalities.

The device 103 of the network element comprises a processor system 104 configured to set a predetermined forwarding-plane program code C0 to represent a starting point for successive modifications of the forwarding-plane program code. The forwarding-plane program code C0 can be for example a predetermined set of computer executable instructions. It is also possible that the forwarding-plane program code C0 is an empty set, i.e. the successive modifications are started from null. The processor system 104 is configured to repeat a reward-driven learning process that comprises the following process steps a)-c), where i = 0 at the first execution of the reward-driven learning process, i = 1 at the second execution of the reward-driven learning process, i = 2 at the third execution of the reward-driven learning process, and so on:

a) selecting a modification action ai(k) from among modification actions ai(j) suitable for the forwarding-plane program code Ci, the selection being based on quality values Qi(j) each relating to the forwarding-plane program code Ci and to one of the modification actions ai(j), where j is an index 1, 2, 3, ... identifying the modification action under consideration and k is an index identifying the selected modification action,

b) modifying the forwarding-plane program code Ci according to the selected modification action ai(k), this yielding the next development phase of the forwarding-plane program code, i.e. the forwarding-plane program code Ci+1, and

c) updating the quality value Qi(k) which relates to the forwarding-plane program code Ci prior to the modification and to the selected modification action ai(k).

The above-mentioned forwarding-plane program code Ci represents the development phase of the forwarding-plane program code at the beginning of the 'i+1'th execution of the above-mentioned reward-driven learning process. Each of the quality values Qi(j) expresses the ability of the forwarding-plane program code to support the desired forwarding-plane functionalities if the forwarding-plane program code Ci were modified according to the modification action ai(j). Each of the modification actions ai(j) can be one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code Ci, removing one or more of the computer executable instructions from the forwarding-plane program code Ci, adding one or more computer executable instructions to the forwarding-plane program code Ci, and keeping the forwarding-plane program code Ci unchanged. The quality value Qi(k) can be updated based on the quality values Qi+1(j) related to the modified forwarding-plane program code Ci+1 and to modification actions ai+1(j) suitable for the modified forwarding-plane program code Ci+1 and on a reward value η indicating the ability of the forwarding-plane program code Ci to support the desired forwarding-plane functionalities.
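The overall learning loop can be illustrated with the following sketch. It is a minimal, non-authoritative illustration in Python: a candidate program is assumed to be representable as a tuple of instructions, and the callables enumerate_actions, apply_action, reward, select_action, update_q and stop are hypothetical placeholders which a concrete system would have to provide.

    # Minimal sketch of the reward-driven learning loop (process steps a-c).
    # All helper callables are hypothetical placeholders, not patent text.
    def generate_forwarding_plane_code(c0, enumerate_actions, apply_action,
                                       reward, select_action, update_q, stop):
        q = {}          # quality values Qi(j), keyed by (program, action)
        program = c0    # the predetermined starting point C0
        while not stop(program):
            actions = enumerate_actions(program)         # actions ai(1, 2, ...)
            action = select_action(q, program, actions)  # process step a)
            modified = apply_action(program, action)     # process step b)
            eta = reward(program)                        # reward for Ci
            update_q(q, program, action, eta,            # process step c)
                     modified, enumerate_actions(modified))
            program = modified
        return program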

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to select the modification action ai(k) from among the modification actions ai(j) suitable for the forwarding-plane program code Ci based on: 1) the quality values Qi(j) and 2) cost change values Ki - Ki+1(j), each of which indicates the change of the implementation cost of the forwarding-plane program code which would be caused by the modification action ai(j). Ki is the implementation cost of the forwarding-plane program code Ci, and each of Ki+1(j) is the implementation cost of the forwarding-plane program code Ci if the forwarding-plane program code Ci were modified according to the corresponding modification action ai(j).

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to compute selection values S(j) for the modification actions ai(j) in accordance with the following equation and to select the modification action whose selection value is greatest:

S(j) = Qi(j) - r (Ki - Ki+1(j)), (1)

where r is a cost factor that expresses how much relative weight is given to the implementation costs with respect to the ability to support the desired forwarding-plane functionalities. The cost factor r is typically less than one.
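A rough Python sketch of this selection rule is given below. The helpers apply_action and cost are hypothetical placeholders, and the default cost factor r = 0.1 is an illustrative value only; the sketch simply evaluates equation (1) for every candidate action and picks the maximizer.

    # Sketch of the selection rule of equation (1):
    # S(j) = Qi(j) - r*(Ki - Ki+1(j)); choose the action with greatest S(j).
    def select_by_selection_value(q, program, actions, apply_action, cost, r=0.1):
        k_i = cost(program)  # implementation cost Ki of the current code Ci
        def selection_value(action):
            k_next = cost(apply_action(program, action))   # Ki+1(j)
            return q.get((program, action), 0.0) - r * (k_i - k_next)
        return max(actions, key=selection_value)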

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to run, at least once amongst repetitions of the above-described reward-driven learning process, a process comprising:

- a random selection of a modification action from among the modification actions ai(j) suitable for the forwarding-plane program code Ci,

- the above-mentioned process step b), and

- the above-mentioned process step c).

The random selection can be used occasionally instead of the selection according to the above-mentioned process step a) in order to avoid a situation in which the forwarding-plane program code is stuck on a local optimum that is, however, weaker than the global optimum of the iterative reward-driven learning. In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to apply random selection for selecting one from two or more modification actions which are equal to each other in light of the selection criterion related to the above-mentioned process step a). In this exemplifying case, the tiebreaking in the selection is carried out as a random selection. It is, however, also possible that the tiebreaking is carried out in accordance with a suitable deterministic rule.
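Both ideas can be sketched together as an ε-greedy policy with random tiebreaking. The exploration probability epsilon is an assumption of this sketch, not a parameter named in the text:

    import random

    # Sketch: occasional random selection plus random tiebreaking among
    # modification actions whose quality values are equal.
    def select_with_exploration(q, program, actions, epsilon=0.05):
        if random.random() < epsilon:
            return random.choice(actions)         # occasional random selection
        best = max(q.get((program, a), 0.0) for a in actions)
        ties = [a for a in actions if q.get((program, a), 0.0) == best]
        return random.choice(ties)                # random tiebreak among equals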

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to update the quality value Qi(k) which relates to the forwarding-plane program code Ci and to the selected modification action ai(k) in accordance with the following equation:

Qi(k)updated = (1 - α)Qi(k) + α(η + γ maxj{Qi+1(j)}), (2)

where Qi(k)updated is the updated quality value Qi(k), η is the reward value indicating the ability of the forwarding-plane program code Ci to support the desired forwarding-plane functionalities, Qi+1(j) are quality values related to the modified forwarding-plane program code Ci+1 and to modification actions ai+1(j) suitable for the modified forwarding-plane program code Ci+1, α is a low-pass filtering factor, and γ is a discount factor modelling uncertainty of the reward values related to future modifications of the forwarding-plane program code. The low-pass filtering factor α is typically less than one, and also the discount factor γ is typically less than one.
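Equation (2) is the familiar Q-learning update. A minimal sketch, with illustrative default values alpha = 0.5 and gamma = 0.9 (both less than one, as the text suggests):

    # Sketch of the quality-value update of equation (2):
    # Qi(k) <- (1 - alpha)*Qi(k) + alpha*(eta + gamma * max_j Qi+1(j)).
    def update_quality(q, program, action, eta, next_program, next_actions,
                       alpha=0.5, gamma=0.9):
        best_next = max((q.get((next_program, a), 0.0) for a in next_actions),
                        default=0.0)            # max over Qi+1(j)
        old = q.get((program, action), 0.0)     # Qi(k) prior to the update
        q[(program, action)] = (1 - alpha) * old + alpha * (eta + gamma * best_next)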

The initial values of the quality values Qi(j) can be set to be equal to each other, i.e. a flat start. It is also possible to set the initial values of the quality values to be randomly generated values. It is worth noting that the initial values are to be set also for the quality values of future modifications of the forwarding-plane program code because the future term "Qi+1(j)" appears in the above-presented updating equation (2). It can be shown that the quality values Qi(j) of the iterative reward-driven learning defined with the above-mentioned process steps a)-c) and with the above-presented updating equation (2) converge toward quality values which express the abilities of different development phases of the forwarding-plane program code to support the desired forwarding-plane functionalities. A mathematical proof of the convergence can be found, for example, in the publication: Carlos Ribeiro and Csaba Szepesvári, "Q-learning combined with spreading: Convergence and results", Proceedings of the ISRF-IEE International Conference: Intelligent and Cognitive Systems (Neural Networks Symposium), pages 32-36, 1996.
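The two initialization strategies can be sketched with lazily initialized quality tables; a defaultdict conveniently covers the quality values of future, not-yet-visited modifications as well. This is an illustrative implementation choice, not mandated by the text:

    import collections
    import random

    # Flat start: every quality value defaults to the same constant.
    def flat_q_table(initial_value=0.0):
        return collections.defaultdict(lambda: initial_value)

    # Random start: every quality value defaults to a freshly drawn random value.
    def random_q_table(low=0.0, high=1.0):
        return collections.defaultdict(lambda: random.uniform(low, high))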

Instead of the above-presented updating equation (2), it is also possible to train a neural network to provide the quality values Qi(j) for different modification actions at different development phases of the forwarding-plane program code.

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to stop repeating the above-described reward-driven learning process in response to a situation in which the above-mentioned reward value η has not improved for a predetermined number of repetitions of the reward-driven learning process.

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to stop repeating the above-described reward-driven learning process in response to a situation in which the reward value minus the implementation cost, i.e. η - Ki, has not improved for a predetermined number of repetitions of the reward-driven learning process.

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to stop repeating the above-described reward-driven learning process in response to a situation in which the forwarding-plane program code Ci supports the desired forwarding-plane functionalities.

In a device according to an exemplifying and non-limiting embodiment of the invention, the processor system 104 is configured to stop repeating the above-described reward-driven learning process in response to a situation in which the reward-driven learning process has been repeated a predetermined number of times, i.e. the index i has reached a predetermined limit.

The programmable processing system 102 of the network element 101 comprises one or more processors 105, at least one of which is a programmable processor. Furthermore, the processing system 102 may comprise one or more dedicated hardware processors such as for example an application specific integrated circuit "ASIC" and/or a configurable hardware processor such as for example a field programmable gate array "FPGA". The processing system 102 may comprise one or more memory circuits 106, each of which can be e.g. a random access memory circuit "RAM" or a content access memory circuit "CAM". The processor system 104 may comprise one or more processors 107, each of which can be a programmable processor provided with appropriate software, a dedicated hardware processor such as for example an application specific integrated circuit "ASIC", or a configurable hardware processor such as for example a field programmable gate array "FPGA". The processor system 104 may comprise one or more memory circuits 108, each of which can be e.g. a random access memory circuit "RAM".

In the exemplifying case illustrated in figure 1, the device 103 for generating the forwarding-plane program code is a part of the network element 101. In many cases, however, a device for generating the forwarding-plane program code is a separate entity with respect to a network element on which the generated forwarding-plane program code is loaded for production use. The device can be implemented for example with programmatic means running in a server. The server can run a process for generating the forwarding-plane program code, where the process may include sending test data frames and analysing received data frames to determine rewards. After the forwarding-plane program code has been generated, it can be uploaded to an appropriate network element for production use.

Figure 2 shows a flowchart of a method according to an exemplifying and non-limiting embodiment of the invention for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support predetermined forwarding-plane functionalities. The method comprises the following actions:

action 201: setting a predetermined forwarding-plane program code C0 to represent a starting point for successive modifications of the forwarding-plane program code, and

action 202: repeating a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met, i = 0, 1, 2, 3, ...:

a) selecting a modification action ai(k) from among modification actions ai(1, 2, ...) suitable for the forwarding-plane program code Ci, the selection being based on quality values Qi(1, 2, ...) each relating to the forwarding-plane program code Ci and to one of the modification actions ai(1, 2, ...) and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the predetermined forwarding-plane functionalities, each of the modification actions ai(1, 2, ...) being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code Ci according to the selected modification action ai(k), and

c) updating the particular one of the quality values Qi(k) which relates to the forwarding-plane program code Ci prior to the modification and to the selected modification action ai(k), the updating being based on i) the quality values Qi+1(1, 2, ...) related to the modified forwarding-plane program code Ci+1 and to modification actions ai+1(1, 2, ...) suitable for the modified forwarding-plane program code and on ii) a reward value η indicating the ability of the forwarding-plane program code Ci to support the predetermined forwarding-plane functionalities prior to the modification.

A method according to an exemplifying and non-limiting embodiment of the invention further comprises loading, action 203, the generated forwarding-plane program code on the programmable processing system of the network element.

In a method according to an exemplifying and non-limiting embodiment of the invention, the modification action ai(k) is selected from among the modification actions ai(1, 2, ...) suitable for the forwarding-plane program code Ci based on the quality values Qi(1, 2, ...) and on cost change values Ki - Ki+1(1, 2, ...) each indicating a change of an implementation cost of the forwarding-plane program code caused by the respective one of the modification actions ai(1, 2, ...). Ki is the implementation cost of the forwarding-plane program code Ci, and each of Ki+1(1, 2, ...) is the implementation cost of the forwarding-plane program code Ci after being modified according to the respective one of the modification actions ai(1, 2, ...).

A method according to an exemplifying and non-limiting embodiment of the invention comprises computing selection values S(1, 2, ...) for the modification actions ai(1, 2, ...) in accordance with the following equation and selecting the modification action ai(k) whose selection value is greatest:

S(1, 2, ...) = Qi(1, 2, ...) - r (Ki - Ki+1(1, 2, ...)), where r is a cost factor.

A method according to an exemplifying and non-limiting embodiment of the invention comprises running, at least once amongst the repetitions of the reward-driven learning process, a process comprising:

- a random selection of a modification action from among the modification actions ai(1, 2, ...) suitable for the forwarding-plane program code Ci,

- the above-mentioned process step b), and

- the above-mentioned process step c).

A method according to an exemplifying and non-limiting embodiment of the invention comprises applying random selection for selecting one from two or more modification actions which are equal to each other in light of a selection criterion related to the above-mentioned process step a).

A method according to an exemplifying and non-limiting embodiment of the invention comprises setting initial values of the quality values to be equal to each other.

A method according to an exemplifying and non-limiting embodiment of the invention comprises setting initial values of the quality values to be randomly generated values.

In a method according to an exemplifying and non-limiting embodiment of the invention, the quality value Qi(k) which relates to the forwarding-plane program code Ci prior to the modification and to the selected modification action ai(k) is updated in accordance with the following equation:

Qi(k)updated = (1 - α)Qi(k) + α(η + γ max{Qi+1(1, 2, ...)}),

where Qi(k)updated is the updated quality value Qi(k), η is the reward value indicating the ability of the forwarding-plane program code Ci to support the desired forwarding-plane functionalities prior to the modification, Qi+1(1, 2, ...) are quality values related to the modified forwarding-plane program code Ci+1 and to modification actions ai+1(1, 2, ...) suitable for the modified forwarding-plane program code, α is a low-pass filtering factor, and γ is a discount factor modelling uncertainty of reward values related to future modifications of the forwarding-plane program code.

In a method according to an exemplifying and non-limiting embodiment of the invention, the repeating of the reward-driven learning process is stopped in response to a situation in which the reward value has not improved for a predetermined number of repetitions of the reward-driven learning process.

In a method according to an exemplifying and non-limiting embodiment of the invention, the repeating of the reward-driven learning process is stopped in response to a situation in which the reward value minus the implementation cost has not improved for a predetermined number of repetitions of the reward-driven learning process.

In a method according to an exemplifying and non-limiting embodiment of the invention, the repeating of the reward-driven learning process is stopped in response to a situation in which the forwarding-plane program code supports the predetermined forwarding-plane functionalities.

In a method according to an exemplifying and non-limiting embodiment of the invention, the repeating of the reward-driven learning process is stopped in response to a situation in which the reward-driven learning process has been repeated a predetermined number of times.
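The four stopping criteria above can be combined into a single monitor. The following sketch is illustrative only; the names patience and max_iterations are assumptions of this sketch, and score can be either the reward value η or the cost-adjusted value η - Ki:

    # Sketch of a combined stopping criterion: stop when the score has not
    # improved for `patience` repetitions, when the code already supports
    # the required functionalities, or when an iteration limit is reached.
    class StoppingMonitor:
        def __init__(self, patience=50, max_iterations=10000):
            self.patience = patience
            self.max_iterations = max_iterations
            self.best_score = float("-inf")
            self.since_improvement = 0
            self.iterations = 0

        def should_stop(self, score, functionalities_supported=False):
            self.iterations += 1
            if score > self.best_score:
                self.best_score = score
                self.since_improvement = 0
            else:
                self.since_improvement += 1
            return (functionalities_supported
                    or self.since_improvement >= self.patience
                    or self.iterations >= self.max_iterations)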

A computer program according to an exemplifying and non-limiting embodiment of the invention comprises computer executable instructions for controlling a programmable processor to carry out actions related to a method according to any of the above-described exemplifying embodiments of the invention. A computer program according to an exemplifying and non-limiting embodiment of the invention comprises software modules for generating a forwarding-plane program code that comprises computer executable instructions for controlling a programmable processing system of a network element to support predetermined forwarding-plane functionalities. The software modules comprise computer executable instructions for controlling a programmable processor to:

- set a predetermined forwarding-plane program code C0 to represent a starting point for successive modifications of the forwarding-plane program code, and

- repeat a reward-driven learning process comprising the following process steps a)-c) until a predetermined criterion is met:

a) selecting a modification action ai(k) from among modification actions ai(1, 2, ...) suitable for the forwarding-plane program code Ci, the selection being based on quality values Qi(1, 2, ...) each relating to the forwarding-plane program code Ci and to one of the modification actions ai(1, 2, ...) and expressing an ability of the forwarding-plane program code after being modified according to the one of the modification actions to support the predetermined forwarding-plane functionalities, each of the modification actions ai(1, 2, ...) being one of the following: changing one or more of the computer executable instructions of the forwarding-plane program code, removing one or more of the computer executable instructions from the forwarding-plane program code, adding one or more computer executable instructions to the forwarding-plane program code, and keeping the forwarding-plane program code unchanged,

b) modifying the forwarding-plane program code Ci according to the selected modification action ai(k), and

c) updating the quality value Qi(k) which relates to the forwarding-plane program code Ci prior to the modification and to the selected modification action ai(k), the updating being based on i) the quality values Qi+1(1, 2, ...) related to the modified forwarding-plane program code Ci+1 and to modification actions ai+1(1, 2, ...) suitable for the modified forwarding-plane program code and on ii) a reward value η indicating the ability of the forwarding-plane program code Ci to support the predetermined forwarding-plane functionalities prior to the modification.

The above-mentioned software modules can be e.g. subroutines or functions implemented with a suitable programming language and with a compiler suitable for the programming language and the programmable processor under consideration. It is worth noting that source code written in a suitable programming language also represents the computer executable software modules, because the source code contains the information needed for controlling the programmable processing system to carry out the above-presented modification actions, and compiling changes only the format of the information. Furthermore, it is also possible that the programmable processing system is provided with an interpreter so that a source code implemented with a suitable programming language does not need to be compiled prior to running.

A computer program product according to an exemplifying and non-limiting embodiment of the invention comprises a computer readable medium, e.g. a compact disc "CD", encoded with a computer program according to an embodiment of the invention. A signal according to an exemplifying and non-limiting embodiment of the invention is encoded to carry information defining a computer program according to an embodiment of the invention. In this exemplifying case, the computer program can be downloadable from a server that may constitute a part of a cloud service.

A simple example that illustrates an exemplifying and non-limiting embodiment of the invention is presented below.

In this example, a forwarding-plane program code is generated to control a network element to function as an MPLS label switch router "LSR". Based on a label contained by a received data frame, e.g. an IP data packet, the LSR selects an egress port from among the egress ports of the LSR, replaces the label with a new label, and transmits the data frame via the selected egress port. The new label is used by a subsequent LSR to perform a new forwarding decision.

In this example, it is assumed that the following computer executable instructions can be used as elements of the forwarding-plane program code:

- LOOKUP(label) - performs a lookup by using a label contained by a received data frame as a lookup key, and returns an egress port and a new label,

- WRITE(label) - writes the new label to the data frame, the new label being a result of the LOOKUP instruction,

- TRANSMIT(egress port) - sends the data frame to the egress port provided as a parameter whose value is received as a result of the LOOKUP instruction,

- NOP - no operation.

The implementation costs of the above-mentioned computer executable instructions can be determined based on e.g. the usage of scarce hardware "HW" resources and/or processing time. For example, the LOOKUP instruction may utilize a memory bus and require clock cycles. Thus, the implementation cost of the LOOKUP instruction can be higher than that of e.g. a computer executable instruction which requires the same number of clock cycles but does not utilize the memory bus.

In this example, it is assumed that the implementation cost of each of the TRANSMIT, NOP, and WRITE instructions is 1, and the implementation cost of the LOOKUP instruction is 2. The NOP is assumed to consume clock cycles and thus its implementation cost differs from zero.

In this example, the iterative reward-driven learning is configured to provide the following rewards (a short code sketch of this reward model is given after the list):

- 0 when no operation is taken, i.e. a data frame is silently discarded,

- 10 when a data frame is sent to a correct egress port but its label is wrong, and

- 100 when an egress port and the label are both correct.
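These cost and reward models can be sketched as follows. The reward for a frame sent to a wrong egress port is not specified in the text, so the sketch assumes 0 in that case; the helper names are illustrative:

    # Sketch of the toy example's implementation-cost and reward models.
    INSTRUCTION_COST = {"LOOKUP": 2, "WRITE": 1, "TRANSMIT": 1, "NOP": 1}

    def implementation_cost(program):
        # Sum of per-instruction costs; `program` is a sequence of names.
        return sum(INSTRUCTION_COST[instruction] for instruction in program)

    def reward(frame_sent, port_correct, label_correct):
        if not frame_sent:
            return 0      # frame silently discarded
        if port_correct and label_correct:
            return 100    # egress port and label both correct
        if port_correct:
            return 10     # correct egress port, wrong label
        return 0          # wrong port: assumed 0, not specified in the text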

When the iterative reward-driven learning starts, the forwarding-plane program code C0 has a single NOP instruction and the Q values have been initialized, for example to zeros. During the repeated reward-driven learning processes i = 1, 2, 3, ..., the forwarding-plane program code Ci experiments with different sequences of computer executable instructions, for example {WRITE, NOP}. However, this kind of forwarding-plane program code receives a reward of 0 because a data frame is not sent out at all.

At some point of the iterative reward-driven learning, the forwarding-plane program code will comprise the LOOKUP and TRANSMIT instructions in the correct order. The forwarding-plane program code may also include e.g. various NOP instructions. Such a forwarding-plane program code will receive a reward of 10 and incur an implementation cost corresponding to the included computer executable instructions. As a corollary, modification actions ai(1, 2, ...) which modify the forwarding-plane program code to contain the LOOKUP and TRANSMIT instructions in the correct order will have higher Q values, making them more probable modification actions. Additionally, the effect of the implementation costs on the iterative reward-driven learning increases the probability of modification actions which eliminate unnecessary computer executable instructions from the forwarding-plane program code, e.g. NOP instructions.

At some point of the iterative reward-driven learning, the forwarding-plane program code will comprise the LOOKUP, WRITE, and TRANSMIT instructions in this order. Such a forwarding-plane program code will receive a reward of 100. The implementation cost of the forwarding-plane program code is at least 4, and the implementation cost can be more if the forwarding-plane program code comprises e.g. additional NOP instructions. The reward of 100 causes modification actions which modify the forwarding-plane program code to comprise the LOOKUP, WRITE, and TRANSMIT instructions in this order to have higher Q values, and thereby the probability of such modification actions is increased. After reaching the reward of 100 for the first time, the iterative reward-driven learning may continue, also including rounds where the reward is less than 100. This kind of iteration may be beneficial e.g. in order to remove unnecessary computer executable instructions from the forwarding-plane program code. In this example, the ideal forwarding-plane program code is the sequence {LOOKUP, WRITE, TRANSMIT} and no other computer executable instructions.
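Using the cost and reward sketches above, the ideal end state of this example can be checked directly:

    # Illustrative check of the ideal toy program with the sketches above.
    ideal_program = ("LOOKUP", "WRITE", "TRANSMIT")
    print(implementation_cost(ideal_program))   # prints 4, as stated above
    print(reward(frame_sent=True, port_correct=True, label_correct=True))   # 100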

The specific examples provided in the description given above should not be construed as limiting the scope and/or the applicability of the appended claims. Lists and groups of examples provided in the description given above are not exhaustive unless otherwise explicitly stated.