

Title:
NODES AND METHODS FOR ML-BASED CSI REPORTING
Document Type and Number:
WIPO Patent Application WO/2023/224533
Kind Code:
A1
Abstract:
A method, performed by a first node for training a first Neural Network, NN,-based encoder or decoder of a system of two or more encoders or decoders. The method comprises: receiving (801), from a second node configured for training a second encoder or decoder, a proposal for a common loss calculation method for training the two or more encoders or decoders and a proposal for a set of common NN architecture parameters; determining (802) a common loss calculation method for training the two or more encoders or decoders based on the received proposal for the common loss calculation method; determining (803) a set of common NN architecture parameters for training the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters; training (805) the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data; providing (807) a first set of trained values of common trainable decoder parameters to a third node; and receiving (808), from the third node, common updated values of the common trainable decoder parameters.

Inventors:
KARAPANTELAKIS ATHANASIOS (SE)
TIMO ROY (SE)
VANDIKAS KONSTANTINOS (SE)
TESLENKO MAXIM (SE)
SHOKRI GHADIKOLAEI HOSSEIN (SE)
ALABBASI ABDULRAHMAN (SE)
ELEFTHERIADIS LACKIS (SE)
Application Number:
PCT/SE2023/050474
Publication Date:
November 23, 2023
Filing Date:
May 15, 2023
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G06N3/0455; H04B7/06; H04B17/24; H04B7/0413
Domestic Patent References:
WO2021208061A1 (2021-10-21)
WO2022040678A1 (2022-02-24)
WO2022040655A1 (2022-02-24)
WO2022086949A1 (2022-04-28)
Foreign References:
US20210266763A1 (2021-08-26)
Other References:
ERICSSON: "Discussions on AI-CSI", 3GPP DRAFT; R1-2203282, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE ; 650, ROUTE DES LUCIOLES ; F-06921 SOPHIA-ANTIPOLIS CEDEX ; FRANCE, vol. RAN WG1, no. Online; 20220516 - 20220527, 29 April 2022 (2022-04-29), Mobile Competence Centre ; 650, route des Lucioles ; F-06921 Sophia-Antipolis Cedex ; France, XP052152910
Attorney, Agent or Firm:
BOU FAICAL, Roger (SE)
Claims:
CLAIMS

1. A method, performed by a first node (701, 711), for training a first Neural Network, NN, -based encoder (701-1) or decoder (711-1) of a system of two or more NN-based encoders (701-1, 702-1) or decoders (711-1, 712-1) to encode Channel State Information (CSI) or decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), and the method comprises: receiving (801), from a second node (702, 712) configured for training a second encoder (702-1) or decoder (712-1) of the two or more encoders (701-1, 702- 1) or decoders (711-1, 712-1), a proposal for a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and a proposal for a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); determining (802) a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) based on the received proposal for the common loss calculation method; determining (803) a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) based on the received proposal for the set of common NN architecture parameters; training (805) the first encoder (701-1) or decoder (711-1) based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data; obtaining (806), based on training the first encoder (701-1) or decoder (711- 1), a first set of trained values of common trainable encoder or decoder parameters; providing (807) the first set of trained values of the common trainable encoder or decoder parameters to a third node (703, 713); and receiving (808), from the third node (703, 713), common updated values of the common trainable decoder parameters.

2. The method according to claim 1 , further comprising: receiving (800a), from the second node (702, 712), a proposal for a common type of encoder or decoder training, wherein the type of encoder or decoder training comprises: a) a first type in which the two or more encoders (701-1 , 702-1) train together with respective two or more trainable decoders (711-1 , 712-1) for which the decoder trainable parameters are to be trained or in which the two or more decoders (711-1, 712-1) train together with respective two or more trainable encoders (701-1, 702-1) for which the encoder trainable parameters are to be trained, and b) a second type in which the two or more encoders (701-1, 702-1) train together with respective two or more fixed decoders (711-1, 712-1) for which the decoder trainable parameters are not to be trained or in which the two or more decoders (711-1 , 712-1) train together with respective two or more fixed encoders (701-1 , 702-1) for which the encoder trainable parameters are not to be trained; and determining (800b) the common type of encoder or decoder training based on at least the proposal for the common type of encoder or decoder training; and training (805) the first encoder (701-1) or the first decoder (711-1) further based on the determined common type of encoder or decoder training.

3. The method according to claims 1-2, wherein the common NN architecture parameters are associated with one or more common NN layers of the first encoder (701-1) and the second encoder (702-1), or of the first decoder (711-1) and the second decoder (712-1), and wherein the one or more common NN layers is a subset of NN layers of the first encoder (701-1) or the first decoder (711-1) and/or a subset of NN layers of the second encoder (702-1) or the second decoder (712-1).

4. The method according to claims 1-3, wherein at least one common NN layer is a convolutional layer.

5. The method according to claims 1-4, wherein a first payload (Yi) of the first compressed channel data is of the same size as a second payload (YK) of a second compressed channel data used for training of the second encoder or the second decoder.

6. The method according to claims 1-5, wherein the common loss calculation method is based on any of the following: a common loss function, or different loss functions together with L1 and/or L2 regularizers, and/or methods for maintaining a custom decoder per encoder such as personalized federated learning.

7. The method according to claims 1-6, further comprising transmitting (804), to the second node (702, 712), the common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and the set of common NN architecture parameters for training each of the two or more NN-based encoders or decoders (711-1, 712-1).

8. The method according to claims 1-7, further comprising: obtaining (806) a first initial set of values of common trainable encoder or decoder parameters, obtaining first channel data; and obtaining first compressed channel data.

9. The method according to claims 1-8, wherein the first node (701, 711) is configured to perform the method of any one of claims 1-8 based on an availability of the first node (701, 711) in terms of current load.

10. A method, performed by a second node (702, 712), for training a second Neural Network, NN,-based encoder (701-1) or decoder (711-1) of a system of two or more NN-based encoders (701-1, 702-1) or decoders (711-1, 712-1) to encode Channel State Information (CSI) or to decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), and the method comprises: transmitting (811), to a first node (701, 711) configured for training a first encoder (701-1) or decoder (711-1) of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1), a proposal for a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and a proposal for a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); receiving (812), from the first node (701, 711), the common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and the set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); training (813) the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and second encoded channel data; obtaining (814), based on training the second encoder or decoder, a second set of trained values of the common trainable encoder or decoder parameters; providing (815) the second set of trained values of the common trainable encoder or decoder parameters to a third node (703, 713); receiving (816), from the third node (703, 713), common updated values of the common trainable encoder or decoder parameters.

11. The method according to claim 10, further comprising: obtaining a second initial set of values of common trainable encoder or decoder parameters, obtaining the encoded second channel data for calculating a second loss value based on the common loss calculation method; and obtaining second compressed channel data.

12. A method, performed by a third node (703, 713), for training a system (700) of two or more Neural Network, NN,-based decoders (711-1, 712-1) or encoders (701-1, 702-1) to encode Channel State Information (CSI) or to decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), the method comprises: receiving (821) a first set of trained values of common trainable encoder or decoder parameters from a first node (701, 711) configured for training a first encoder (701-1) or decoder (711-1) of the system (700) of two or more decoders (711-1, 712-1) or encoders (701-1, 702-1); receiving (822) a second set of trained values of common trainable encoder or decoder parameters from a second node (702, 712) configured for training a second encoder (702-1) or decoder (712-1) of the system (700) of two or more decoders (711-1, 712-1) or encoders (701-1, 702-1); computing (823) common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm; and transmitting (824), to the first and second nodes (701, 702, 711, 712), the computed common updated values of the common trainable decoder parameters.

13. A first node (701, 711) for training a first Neural Network, NN,-based encoder (701-1) or decoder (711-1) of a system of two or more NN-based encoders (701-1, 702-1) or decoders (711-1, 712-1) to encode Channel State Information, CSI, or decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), and configured to: receive, from a second node (702, 712) configured for training a second encoder (702-1) or decoder (712-1) of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1), a proposal for a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and a proposal for a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); determine a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) based on the received proposal for the common loss calculation method; determine a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) based on the received proposal for the set of common NN architecture parameters; train the first encoder (701-1) or decoder (711-1) based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data; obtain, based on training the first encoder (701-1) or decoder (711-1), a first set of trained values of common trainable decoder parameters; provide the first set of trained values of the common trainable encoder or decoder parameters to a third node (703, 713); and receive, from the third node (703, 713), common updated values of the common trainable encoder or decoder parameters.

14. The first node (701, 711) according to claim 13, further configured to perform the method of any one of claims 2-9.
15. A second node (702, 712) for training a second Neural Network, NN,-based encoder (701-1) or decoder (711-1) of a system of two or more NN-based encoders (701-1, 702-1) or decoders (711-1, 712-1) to encode Channel State Information (CSI) or to decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), and configured to: transmit, to a first node (701, 711) configured for training a first encoder (701-1) or decoder (711-1) of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1), a proposal for a common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and a proposal for a set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); receive, from the first node (701, 711), the common loss calculation method to be used for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1) and the set of common NN architecture parameters for training each of the two or more encoders (701-1, 702-1) or decoders (711-1, 712-1); train the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and second encoded channel data; obtain, based on training the second encoder or decoder, a second set of trained values of the common trainable encoder or decoder parameters; provide the second set of trained values of the common trainable encoder or decoder parameters to a third node (703, 713); receive, from the third node (703, 713), common updated values of the common trainable encoder or decoder parameters.

16. The second node (702, 712) according to claim 15, further configured to perform the method of claim 11.

17. A third node (703, 713) for training a system (700) of two or more Neural Network, NN,-based decoders (711-1, 712-1) or encoders (701-1, 702-1) to encode Channel State Information (CSI) or to decode the encoded CSI associated with a wireless channel between a wireless communications device (121) and a radio access node (111) in a wireless communications network (100), and configured to: receive a first set of trained values of common trainable encoder or decoder parameters from a first node (701, 711) configured for training a first encoder (701-1) or decoder (711-1) of the system (700) of two or more decoders (711-1, 712-1) or encoders (701-1, 702-1); receive a second set of trained values of common trainable encoder or decoder parameters from a second node (702, 712) configured for training a second encoder (702-1) or decoder (712-1) of the system (700) of two or more decoders (711-1, 712-1) or encoders (701-1, 702-1); compute common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm; and transmit, to the first and second nodes (701, 702, 711, 712), the computed common updated values of the common trainable decoder parameters.

18. A computer program (1003, 1103, 1203), comprising computer readable code units which when executed on a node (701 , 711 , 702, 712) causes the node (701 , 711 , 702, 712) to perform the method according to any one of claims 1-12.

19. A carrier (1005, 1105, 1205) comprising the computer program according to the preceding claim, wherein the carrier (1005, 1105, 1205) is one of an electronic signal, an optical signal, a radio signal and a computer readable medium.

Description:
NODES AND METHODS FOR ML-BASED CSI REPORTING

TECHNICAL FIELD

The embodiments herein relate to nodes and methods for ML-based CSI reporting. A corresponding computer program and a computer program carrier are also disclosed.

BACKGROUND

In a typical wireless communication network, wireless devices, also known as wireless communication devices, mobile stations, stations (STA) and/or User Equipments (UE), communicate via a Local Area Network such as a Wi-Fi network or a Radio Access Network (RAN) to one or more core networks (CN). The RAN covers a geographical area which is divided into service areas or cell areas. Each service area or cell area may provide radio coverage via a beam or a beam group. Each service area or cell area is typically served by a radio access node, e.g., a Wi-Fi access point or a radio base station (RBS), which in some networks may also be denoted, for example, a NodeB, eNodeB (eNB), or gNB as denoted in 5G. A service area or cell area is a geographical area where radio coverage is provided by the radio access node. The radio access node communicates over an air interface operating on radio frequencies with the wireless device within range of the radio access node.

Specifications for the Evolved Packet System (EPS), also called a Fourth Generation (4G) network, have been completed within the 3rd Generation Partnership Project (3GPP) and this work continues in the coming 3GPP releases, for example to specify a Fifth Generation (5G) network also referred to as 5G New Radio (NR). The EPS comprises the Evolved Universal Terrestrial Radio Access Network (E-UTRAN), also known as the Long Term Evolution (LTE) radio access network, and the Evolved Packet Core (EPC), also known as System Architecture Evolution (SAE) core network. E-UTRAN/LTE is a variant of a 3GPP radio access network wherein the radio access nodes are directly connected to the EPC core network rather than to RNCs used in 3G networks. In general, in E-UTRAN/LTE the functions of a 3G RNC are distributed between the radio access nodes, e.g. eNodeBs in LTE, and the core network. As such, the RAN of an EPS has an essentially “flat” architecture comprising radio access nodes connected directly to one or more core networks, i.e. they are not connected to RNCs. To compensate for that, the E-UTRAN specification defines a direct interface between the radio access nodes, this interface being denoted the X2 interface.

Wireless communication systems in 3GPP

Figure 1 illustrates a simplified wireless communication system. Consider the simplified wireless communication system in Figure 1, with a UE 12, which communicates with one or multiple access nodes 103-104, which in turn are connected to a network node 106. The access nodes 103-104 are part of the radio access network 10.

For wireless communication systems pursuant to 3GPP Evolved Packet System (EPS), also referred to as Long Term Evolution, LTE, or 4G, standard specifications, such as specified in 3GPP TS 36.300 and related specifications, the access nodes 103-104 correspond typically to Evolved NodeBs (eNBs) and the network node 106 corresponds typically to either a Mobility Management Entity (MME) and/or a Serving Gateway (SGW). The eNB is part of the radio access network 10, which in this case is the E-UTRAN (Evolved Universal Terrestrial Radio Access Network), while the MME and SGW are both part of the EPC (Evolved Packet Core network). The eNBs are inter-connected via the X2 interface, and connected to the EPC via the S1 interface, more specifically via S1-C to the MME and S1-U to the SGW.

For wireless communication systems pursuant to 3GPP 5G System, 5GS (also referred to as New Radio, NR, or 5G) standard specifications, such as specified in 3GPP TS 38.300 and related specifications, on the other hand, the access nodes 103-104 correspond typically to a 5G NodeB (gNB) and the network node 106 corresponds typically to either an Access and Mobility Management Function (AMF) and/or a User Plane Function (UPF). The gNB is part of the radio access network 10, which in this case is the NG-RAN (Next Generation Radio Access Network), while the AMF and UPF are both part of the 5G Core Network (5GC). The gNBs are inter-connected via the Xn interface, and connected to the 5GC via the NG interface, more specifically via NG-C to the AMF and NG-U to the UPF.

To support fast mobility between NR and LTE and avoid change of core network, LTE eNBs may also be connected to the 5G-CN via NG-U/NG-C and support the Xn interface. An eNB connected to 5GC is called a next generation eNB (ng-eNB) and is considered part of the NG-RAN. LTE connected to 5GC will not be discussed further in this document; however, it should be noted that most of the solutions/features described for LTE and NR in this document also apply to LTE connected to 5GC. In this document, when the term LTE is used without further specification it refers to LTE-EPC.

NR uses Orthogonal Frequency Division Multiplexing (OFDM) with configurable bandwidths and subcarrier spacing to efficiently support a diverse set of use-cases and deployment scenarios. With respect to LTE, NR improves deployment flexibility, user throughputs, latency, and reliability. The throughput performance gains are enabled, in part, by enhanced support for Multi-User Multiple-Input Multiple-Output (MU-MIMO) transmission strategies, where two or more UEs receive data on the same time-frequency resources, i.e., by spatially separated transmissions.

A MU-MIMO transmission strategy will now be illustrated based on Figure 2. Figure 2 illustrates an example transmission and reception chain for MU-MIMO operations. Note that the order of modulation and precoding, or demodulation and combining respectively, may differ depending on the implementation of MU-MIMO transmission.

A multi-antenna base station with N_TX antenna ports is simultaneously, e.g., on the same OFDM time-frequency resources, transmitting information to several UEs: the sequence S^(1) is transmitted to UE(1), the sequence S^(2) is transmitted to UE(2), and so on. An antenna port may be a logical unit which may comprise one or more antenna elements. Before modulation and transmission, precoding is applied to each sequence to mitigate multiplexing interference - the transmissions are spatially separated.

Each UE demodulates its received signal and combines receiver antenna signals to obtain an estimate Ŝ^(i) of the transmitted sequence. Neglecting other interference and noise sources except the MU-MIMO interference, this estimate for UE(i) may be expressed as

Ŝ^(i) = H^(i) W^(i) S^(i) + H^(i) Σ_{j≠i} W^(j) S^(j),

where H^(i) is the downlink channel observed by UE(i) and W^(i) is the precoder applied to S^(i). The second term represents the spatial multiplexing interference, due to MU-MIMO transmission, seen by UE(i). A goal for a wireless communication network may be to construct a set of precoders to meet a given target. One such target may be to make the norm ||H^(i) W^(i)|| large (this norm represents the desired channel gain towards user i) and the norms ||H^(j) W^(i)||, j ≠ i, small (these norms represent the interference of user i's transmission received by user j).

In other words, the precoder W^(i) shall correlate well with the channel H^(i) observed by UE(i), whereas it shall correlate poorly with the channels observed by other UEs.

To construct precoders W^(i), i = 1, ..., J, that enable efficient MU-MIMO transmissions, the wireless communication network may need to obtain detailed information about the users' downlink channels H^(i), i = 1, ..., J. The wireless communication network may, for example, need to obtain detailed information about all the users' downlink channels H^(i), i = 1, ..., J.
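As a toy illustration of why such channel knowledge is needed, the sketch below (Python/NumPy, single-antenna UEs and a zero-forcing design; these are simplifying assumptions for illustration, not taken from the application) constructs precoders that make the desired gains ||H^(i) W^(i)|| large while driving the cross terms ||H^(j) W^(i)||, j ≠ i, towards zero:

```python
import numpy as np

# Toy MU-MIMO setup: J single-antenna UEs, an N_TX-antenna base station.
rng = np.random.default_rng(0)
n_tx, n_ue = 8, 4

# Rows of H are the downlink channels H(1), ..., H(J) (one row per UE).
H = (rng.standard_normal((n_ue, n_tx)) + 1j * rng.standard_normal((n_ue, n_tx))) / np.sqrt(2)

# Zero-forcing: columns of W are the precoders W(1), ..., W(J).
W = np.linalg.pinv(H)
W /= np.linalg.norm(W, axis=0, keepdims=True)      # unit-power precoders

G = H @ W                                           # G[i, j] = H(i) W(j)
desired = np.abs(np.diag(G))                        # |H(i) W(i)| -> large
interference = np.abs(G - np.diag(np.diag(G)))      # |H(i) W(j)|, j != i -> ~0
print("desired gains:", np.round(desired, 3))
print("max residual interference:", np.round(interference.max(), 6))
```

Note that the zero-forcing design requires the full matrix of downlink channels; without it, the cross terms cannot be suppressed.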

In deployments where full channel reciprocity holds, detailed channel information may be obtained from uplink Sounding Reference Signals (SRS) that are transmitted periodically, or on demand, by active UEs. The wireless communication network may directly estimate the uplink channel from SRS and, therefore (by reciprocity), the downlink channel H^(i).

However, the wireless communication network cannot always accurately estimate the downlink channel from uplink reference signals. Consider the following examples:

In frequency division duplex (FDD) deployments, the uplink and downlink channels use different carriers and, therefore, the uplink channel may not provide enough information about the downlink channel to enable MU-MIMO precoding.

In TDD deployments, the wireless communication network may only be able to estimate part of the uplink channel using SRS because UEs typically have fewer TX branches than RX branches (in which case only certain columns of the precoding matrix may be estimated using SRS). This situation is known as partial channel knowledge.

If the wireless communication network cannot accurately estimate the full downlink channel from uplink transmissions, then active UEs need to report channel information to the wireless communication network over the uplink control or data channels. In LTE and NR, this feedback is achieved by the following signalling protocol:

- The wireless communication network transmits Channel State Information reference signals (CSI-RS) over the downlink using N ports.

- The UE estimates the downlink channel (or important features thereof, such as eigenvectors of the channel or of the Gram matrix of the channel, one or more eigenvectors that correspond to the largest eigenvalues of an estimated channel covariance matrix, one or more Discrete Fourier Transform (DFT) basis vectors (described below), or orthogonal vectors from any other suitable and defined vector space that best correlate with an estimated channel matrix or an estimated channel covariance matrix, or the channel delay profile) for each of the N antenna ports from the transmitted CSI-RS; a sketch of extracting such covariance eigenvectors is given after this list.

- The UE reports CSI (e.g., channel quality index (CQI), precoding matrix indicator (PMI), rank indicator (RI)) to the wireless communication network over an uplink control channel and/or over a data channel.

- The wireless communication network uses the UE’s feedback, e.g., the CSI reported from the UE, for downlink user scheduling and MIMO precoding.
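As an illustration of one of the channel features mentioned in the second step above, the sketch below (Python/NumPy, with synthetic per-subcarrier channel estimates; the dimensions and data are assumptions for illustration only) extracts the eigenvectors corresponding to the largest eigenvalues of an estimated channel covariance matrix:

```python
import numpy as np

# Toy per-subcarrier channel estimates towards the N CSI-RS ports.
rng = np.random.default_rng(1)
n_ports, n_samples = 8, 100
h = rng.standard_normal((n_samples, n_ports)) + 1j * rng.standard_normal((n_samples, n_ports))

# Sample channel covariance matrix, averaged over frequency.
R = (h.conj().T @ h) / n_samples

# eigh returns eigenvalues of a Hermitian matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(R)
dominant = eigvecs[:, -2:]          # the two strongest eigenvectors
print("largest eigenvalues:", np.round(eigvals[-2:].real, 2))
print("feature shape to be compressed/reported:", dominant.shape)
```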

In NR, both Type I and Type II reporting are configurable, where the CSI Type II reporting protocol has been specifically designed to enable MU-MIMO operations from uplink UE reports, such as the CSI reports.

The CSI Type II normal reporting mode is based on the specification of sets of Discrete Fourier Transform (DFT) basis functions in a precoder codebook. The UE selects and reports L DFT vectors from the codebook that best match its channel conditions (like the classical codebook precoding matrix indicator (PMI) from earlier 3GPP releases). The number of DFT vectors L is typically 2 or 4 and it is configurable by the wireless communication network. In addition, the UE reports how the L DFT vectors should be combined in terms of relative amplitude scaling and co-phasing.

Algorithms to select L, the L DFT vectors, and co-phasing coefficients are outside the specification scope - left to UE and network implementation. Or, put another way, the 3GPP Rel-16 specification only defines signaling protocols to enable the above message exchanges.

In the following, “DFT beams” will be used interchangeably with DFT vectors. This slight shift of terminology is appropriate whenever the base station has a uniform planar array with antenna elements separated by half of the carrier wavelength.

The CSI Type II normal reporting mode is illustrated in Figure 3 and described in 3GPP TS 38.214, “Physical layer procedures for data” (Release 16). The selection and reporting of the L DFT vectors b_n and their relative amplitudes a_n is done in a wideband manner; that is, the same beams are used for both polarizations over the entire transmission frequency band. The selection and reporting of the DFT vector co-phasing coefficients are done in a subband manner; that is, DFT vector co-phasing parameters are determined for each of multiple subsets of contiguous subcarriers. The co-phasing parameters are quantized such that e^{jθ_n} is taken from either a Quadrature Phase-Shift Keying (QPSK) or 8-Phase Shift Keying (8PSK) signal constellation.

With k denoting a sub-band index, the precoder W_k reported by the UE to the network can (per polarization and up to normalization) be expressed as follows:

W_k = Σ_{n=1}^{L} a_n e^{jθ_{n,k}} b_n
The Type II CSI report can be used by the network to co-schedule multiple UEs on the same OFDM time-frequency resources. For example, the network can select UEs that have reported different sets of DFT vectors with weak correlations. The CSI Type II report enables the UE to report a precoder hypothesis that trades CSI resolution against uplink transmission overhead.
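As a simplified illustration of the subband combining described above, the sketch below (Python/NumPy) reconstructs a single-polarization precoder per subband from L DFT beams, wideband amplitudes and QPSK co-phasing indices. The beam construction, the reported values, and the omission of oversampling, dual polarization and exact normalization are all simplifying assumptions for illustration, not the 3GPP codebook verbatim:

```python
import numpy as np

n_tx, L, n_subbands = 8, 2, 4

def dft_beam(n_tx: int, m: int) -> np.ndarray:
    """m-th column of a plain (oversampling-free) DFT codebook."""
    return np.exp(2j * np.pi * m * np.arange(n_tx) / n_tx) / np.sqrt(n_tx)

beams = [dft_beam(n_tx, m) for m in (0, 1)]           # reported beam indices
amps = np.array([1.0, 0.5])                           # reported wideband amplitudes
qpsk = np.exp(1j * np.pi / 2 * np.arange(4))          # QPSK co-phasing alphabet
cophase_idx = np.array([[0, 1, 2, 3],                 # reported per-subband co-phasing indices
                        [3, 2, 1, 0]])

W = np.zeros((n_tx, n_subbands), dtype=complex)
for k in range(n_subbands):
    for n in range(L):
        W[:, k] += amps[n] * qpsk[cophase_idx[n, k]] * beams[n]
    W[:, k] /= np.linalg.norm(W[:, k])                # unit-norm precoder per subband
print("reconstructed precoders, one column per subband:", W.shape)
```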

NR 3GPP Release 15 supports Type II CSI feedback using port selection mode, in addition to the above normal reporting mode. In this case,

- The base station transmits a CSI-RS port in each one of the beam directions.

- The UE does not use a codebook to select a DFT vector (a beam), instead the UE selects one or multiple antenna ports from the CSI-RS resource of multiple ports.

Type II CSI feedback using port selection gives the base station some flexibility to use non-standardized precoders that are transparent to the UE. For the port-selection codebook, the precoder reported by the UE can be described as follows (per subband k, up to normalization):

W_k = Σ_{n=1}^{L} a_n e^{jθ_{n,k}} e_{p_n}

Here, the vector e_{p_n} is a unit vector with only one non-zero element, which can be viewed as a selection vector that selects a port from the set of ports in the measured CSI-RS resource. The UE thus feeds back which ports it has selected, the amplitude factors and the co-phasing factors.

Autoencoders for Artificial Intelligence (AI)-based CSI reporting

Recently, neural network (NN)-based autoencoders (AEs) have shown promising results for compressing downlink MIMO channel estimates for uplink feedback. That is, the AEs are used to compress downlink MIMO channel estimates. The compressed output of the AE is then used as uplink feedback. For example, the prior art document Zhilin Lu, Xudong Zhang, Hongyi He, Jintao Wang, and Jian Song, “Binarized Aggregated Network with Quantization: Flexible Deep Learning Deployment for CSI Feedback in Massive MIMO System”, arXiv, 2105.00354v1, May 2021, provides a recent summary of academic work.

An AE is a type of Neural Network (NN) that may be used to compress and decompress data in an unsupervised manner.

Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. As a result, unsupervised learning algorithms may first self-discover any naturally occurring patterns in that training data set. Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples and discarding the rest. This contrasts with supervised learning in which the training data include pre-assigned category labels, often by a human, or from the output of non-learning classification algorithm.

Figure 4a illustrates a fully connected (dense) AE. The AE may be divided into two parts: an encoder (used to compress the input data ), and a decoder (used to recover important features of the input data).

The encoder and decoder are separated by a bottleneck layer that holds a compressed representation, Y in Figure 4a, of the input data X. The variable Y is sometimes called the latent representation of the input X. More specifically,

- The size of the bottleneck (latent representation) Y is smaller than the size of the input data X. The AE encoder thus compresses the input features X to Y. The decoder part of the AE tries to invert the encoder’s compression and reconstruct X with minimal error, according to some predefined loss function.

AEs may have different architectures. For example, AEs may be based on dense NNs (like Figure 4a), multi-dimensional convolution NNs, recurrent NNs, transformer NNs, or any combination thereof. However, all AE architectures possess an encoder-bottleneck-decoder structure.
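As an illustration of the encoder-bottleneck-decoder structure, the following minimal sketch (Python/NumPy, with arbitrary toy dimensions and random, untrained weights; all values are assumptions for illustration) maps an input X through a small bottleneck Y and back to a reconstruction of X:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """One fully connected layer with a ReLU activation."""
    return np.maximum(w @ x + b, 0.0)

dim_x, dim_hidden, dim_y = 64, 32, 8                  # bottleneck Y is smaller than X
W1, b1 = rng.standard_normal((dim_hidden, dim_x)) * 0.1, np.zeros(dim_hidden)
W2, b2 = rng.standard_normal((dim_y, dim_hidden)) * 0.1, np.zeros(dim_y)
W3, b3 = rng.standard_normal((dim_hidden, dim_y)) * 0.1, np.zeros(dim_hidden)
W4, b4 = rng.standard_normal((dim_x, dim_hidden)) * 0.1, np.zeros(dim_x)

x = rng.standard_normal(dim_x)                        # input features X
y = dense(dense(x, W1, b1), W2, b2)                   # encoder output: latent Y
x_hat = dense(dense(y, W3, b3), W4, b4)               # decoder output: reconstruction of X
print(len(x), len(y), len(x_hat))                     # 64 8 64
```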

Figure 4b illustrates how an AE may be used for Al-based CSI reporting in NR during an inference phase (that is, during live network operation).

The UE estimates the downlink channel (or important features thereof) using configured downlink reference signal(s), e.g., CSI-RS. For example, the UE estimates the downlink channel as a 3D complex-valued tensor, with dimensions defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency units (the granularity of which is configurable, e.g., SubCarrier (SC) or subband).

In Figure 4b the 3D complex-valued tensor is illustrated as a rectangular hexahedron with lengths of the sides defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency (SC).

The UE uses a trained AE encoder to compress the estimated channel, or important features thereof, down to a binary codeword. The binary codeword is reported to the network over an uplink control channel and/or data channel. In practice, this codeword will likely form one part of a channel state information (CSI) report that may also include rank, channel quality, and interference information. The CSI may be used for MU-MIMO precoding to shape an “energy pattern” of a wireless signal transmitted by the gNB.
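The text does not specify how the encoder output is turned into bits; one simple possibility (purely an assumption for illustration) is sign-based binarization of the latent vector, sketched below in Python/NumPy:

```python
import numpy as np

def binarize(latent: np.ndarray) -> np.ndarray:
    """1 bit per latent dimension: sign-based binarization of the encoder output."""
    return (latent >= 0).astype(np.uint8)

latent = np.array([0.7, -1.2, 0.1, -0.3])   # hypothetical encoder (bottleneck) output
codeword = binarize(latent)
print(codeword)                              # e.g. [1 0 1 0], carried in the CSI report
```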

The network uses a trained AE decoder to reconstruct the estimated channel or the important features thereof. The decompressed output of the AE decoder is used by the network in, for example, MIMO precoding, scheduling, and link adaption.

The architecture of an AE (e.g., structure, number of layers, nodes per layer, activation functions, etc.) may need to be tailored for each particular use case, e.g., for CSI reporting. The tailoring may be achieved via a process called hyperparameter tuning. For example, properties of the data (e.g., CSI-RS channel estimates), the channel size, uplink feedback rate, and hardware limitations of the encoder and decoder may all need to be considered when designing the AE’s architecture.

After the AE’s architecture is fixed, it needs to be trained on one or more datasets.

To achieve good performance during live operation in a network (the so-called inference phase), the training datasets need to be representative of the actual data the AE will encounter during live operation in a network.

The training process involves numerically tuning the AE’s trainable parameters (e.g., the weights and biases of the underlying NN) to minimize a loss function on the training datasets. The loss function may be, for example, the Mean Squared Error (MSE) loss calculated as the average of the squared error between the UE’s downlink channel estimate H and the network’s reconstruction Ĥ, i.e., ||H - Ĥ||². The purpose of the loss function is to meaningfully quantify the reconstruction error for the particular use case at hand.

The training process is typically based on some variant of the gradient descent algorithm, which, at its core, comprises three components: a feedforward step, a back propagation step, and a parameter optimization step. We now review these steps using a dense AE (i.e., a dense NN with a bottleneck layer, see Figure 4a) as an example.

Feedforward: A batch of training data, such as a mini-batch, (e.g., several downlink-channel estimates) is pushed through the AE, from the input to the output. The loss function is used to compute the reconstruction loss for all training samples in the batch. The reconstruction loss may be an average reconstruction loss for all training samples in the batch.

The feedforward calculations of a dense AE with N layers (n = 1, 2, ..., N) may be written as follows: the output vector a^(n) of layer n is computed from the output a^(n-1) of the previous layer using the equations

z^(n) = W^(n) a^(n-1) + b^(n),
a^(n) = g(z^(n)).

In the above equations, W^(n) and b^(n) are the trainable weights and biases of layer n, respectively, and g is an activation function (for example, a rectified linear unit).

Back propagation (BP): The gradients (partial derivatives of the loss function, L, with respect to each trainable parameter in the AE) are computed. The back propagation algorithm sequentially works backwards from the AE output, layer-by-layer, back through the AE to the input. The back propagation algorithm is built around the chain rule for differentiation: when computing the gradients for layer n in the AE, it uses the gradients for layer n + 1. For a dense AE with N layers, the back propagation calculations for layer n may be expressed with the following equations:

δ^(n) = ((W^(n+1))^T δ^(n+1)) * g'(z^(n)),
∂L/∂W^(n) = δ^(n) (a^(n-1))^T,
∂L/∂b^(n) = δ^(n),

where * here denotes the Hadamard multiplication of two vectors.

Parameter optimization: The gradients computed in the back propagation step are used to update the AE’s trainable parameters. An approach is to use the gradient descent method with a learning rate parameter (α) that scales the gradients of the weights and biases, as illustrated by the following update equations:

W^(n) ← W^(n) - α ∂L/∂W^(n),
b^(n) ← b^(n) - α ∂L/∂b^(n).
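The three steps above can be illustrated with a minimal sketch (Python/NumPy; the toy dimensions, synthetic data and plain gradient descent instead of a special optimizer are assumptions for illustration) of one training iteration for a two-layer dense AE with MSE reconstruction loss:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_x, dim_y, batch = 32, 4, 16
alpha = 0.01                                     # learning rate

W1, b1 = rng.standard_normal((dim_y, dim_x)) * 0.1, np.zeros((dim_y, 1))   # encoder
W2, b2 = rng.standard_normal((dim_x, dim_y)) * 0.1, np.zeros((dim_x, 1))   # decoder

def train_step(X):                               # X: (dim_x, batch) mini-batch
    global W1, b1, W2, b2
    # Feedforward
    Z1 = W1 @ X + b1
    Y = np.maximum(Z1, 0.0)                      # g = ReLU, bottleneck output
    X_hat = W2 @ Y + b2                          # reconstruction
    loss = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=0))
    # Back propagation (chain rule, layer by layer)
    dX_hat = (X_hat - X) / X.shape[1]
    dW2, db2 = dX_hat @ Y.T, dX_hat.sum(axis=1, keepdims=True)
    dY = W2.T @ dX_hat
    dZ1 = dY * (Z1 > 0)                          # Hadamard product with g'(z)
    dW1, db1 = dZ1 @ X.T, dZ1.sum(axis=1, keepdims=True)
    # Parameter optimization: plain gradient descent
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
    return loss

X = rng.standard_normal((dim_x, batch))          # one (mini-)batch of training data
for _ in range(200):                             # repeat until performance is acceptable
    loss = train_step(X)
print("final batch MSE:", round(float(loss), 4))
```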

A core idea here is to make small adjustments to each parameter with the aim of reducing the loss over the (mini-)batch. It is common to use special optimizers to update the AE’s trainable parameters using gradient information. The following optimizers are widely used to reduce training time and improve overall performance: adaptive subgradient methods (AdaGrad), RMSProp, and adaptive moment estimation (ADAM).

The above steps (feedforward, back propagation, parameter optimization) are repeated many times until an acceptable level of performance is achieved on the training dataset. An acceptable level of performance may refer to the AE achieving a pre-defined average reconstruction error over the training dataset (e.g., normalized MSE of the reconstruction error over the training dataset is less than, say, 0.1). Alternatively, it may refer to the AE achieving a pre-defined user data throughput gain with respect to a baseline CSI reporting method (e.g., a MIMO precoding method is selected, and user throughputs are separately estimated for the baseline and the AE CSI reporting methods).

The above steps use numerical methods (e.g., gradient descent) to optimize the AE’s trainable parameters (e.g., weights and biases). The training process, however, typically involves optimizing many other parameters (e.g., higher-level hyperparameters that define the model or the training process). Some example hyperparameters are as follows:

• The architecture of the AE (e.g., convolutional, transformer) including types of layers (e.g., dense).

• Architecture-specific parameters (e.g., the number of nodes per layer in a dense network, or the kernel sizes of a convolutional network).

• The depth or size of the AE (e.g., number of layers).

• The activation functions used at each node within the AE.

• The mini-batch size (e.g., the number of channel samples fed into each iteration of the above training steps).

• The learning rate for gradient descent and/or the optimizer.

• The regularization method (e.g., weight regularization or dropout).

Additional validation datasets may be used to tune such hyperparameters.
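As one illustration of using a validation dataset to choose between candidate hyperparameters, the sketch below (Python/NumPy) compares two bottleneck sizes and keeps the one with the lowest validation reconstruction error. The synthetic data and the truncated-SVD "linear autoencoder" stand-in for a trained AE are assumptions made only to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.standard_normal((200, 32)) @ rng.standard_normal((32, 32))  # training set
valid = rng.standard_normal((50, 32)) @ rng.standard_normal((32, 32))   # validation set

def fit_linear_ae(data, bottleneck):
    """Return an orthonormal basis spanning the top-'bottleneck' components (linear AE stand-in)."""
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:bottleneck].T                     # shape: (32, bottleneck)

def nmse(data, basis):
    recon = data @ basis @ basis.T               # encode then decode
    return np.sum((data - recon) ** 2) / np.sum(data ** 2)

scores = {k: nmse(valid, fit_linear_ae(train, k)) for k in (4, 8)}   # two candidate hyperparameters
best = min(scores, key=scores.get)
print("validation NMSE per bottleneck size:", scores, "-> keep", best)
```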

The process of designing or creating an AE (hyperparameter tuning and model training) may be expensive - consuming significant time, compute, memory, and power resources.

AE-based CSI reporting is of interest for the 3GPP Rel-18 “AI/ML on PHY” study item, for example because of the following reasons:

AEs may include non-linear transformations (e.g., activation functions) that help improve compression performance and, therefore, help improve MU-MIMO performance for the same uplink overhead. For example, the normal Type II CSI codebooks in 3GPP Rel-16 are based on linear DFT transformations and Singular Value Decomposition (SVD), which cannot fully exploit redundancies in the channel for compression.

AEs may be trained to exploit long-term redundancies in the propagation environment and/or site (e.g., antenna configuration) for compression purposes. For example, a particular AE does not need to work well for all possible deployments. Improved compression performance is obtained by learning which channel inputs it needs to (and doesn’t need to) reliably reconstruct at the base station.

AEs may be trained to compensate for antenna array irregularities, including, for example, non-uniformly spaced antenna elements and non-half-wavelength element spacing. The Type II CSI codebooks in Rel-15 and Rel-16, for example, use a two-dimensional DFT codebook designed for a regular planar array with perfect half-wavelength element spacing.

AEs may be trained to be robust against, or updated (e.g., via transfer learning and training) to compensate for, partially failing hardware as the massive MIMO product ages. For example, over time one or more of the multiple Tx and Rx radio chains in the massive MIMO antenna arrays at the base station may fail, compromising the effectiveness of Type II CSI feedback. Transfer learning implies that parts of a previous neural network that has learned a different but often related task are transferred to the current network in order to speed up the learning process of the current network.

SUMMARY

As mentioned above, the AE training process may be a highly iterative process that may be expensive - consuming significant time, compute, memory, and power resources.

Therefore, it may be expected that AE architecture design and training will largely be performed offline, e.g., in a development environment, using appropriate compute infrastructure, training data, validation data, and test data. Data for training, validation, and testing may be collected from one or more of the following examples: real measurements recorded in live networks, synthetic radio channel data from, e.g., 3GPP channel models or ray tracing models and/or digital twins, and mobile drive tests.

Validation data may be part of the development and tuning of the NN, whereas the test data may be applied to the final NN. For example, a “validation dataset” may be used to optimize AE hyperparameters (like its architecture). For example, two different AE architectures may be trained on the same training dataset. Then the performance of the two trained AE architectures may be validated on the validation dataset. The architecture with the best performance on the validation dataset may be kept for the inference phase. In other words, validation may be performed on the same data set as the training, but on “unseen” data samples (e.g. taken from the same source). Testing may be performed on a new data set, usually from another source, and it tests the NN’s ability to generalize.

The training of the AE in Figure 4c has some similarities with split NNs, where an NN is split into two or more sections and where each section consists of one or several consecutive layers of the NN. These sections of the NN may be in different entities/nodes and each entity may perform both feedforward and back propagations. For example, in the case of splitting the NN into two sections, the feedforward outputs of a first section are pushed to a second section. Conversely, in the back propagation step, the gradients of the first layer of the second section are pushed into the last layer of the first section.
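The exchange across the split can be illustrated with the following minimal sketch (Python/NumPy; toy dimensions, a single dense layer per section and plain gradient descent are assumptions for illustration). Only activations flow forward over the cut and only the gradient at the cut flows backward; neither entity sees the other's weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 16)) * 0.1          # section 1 (held by entity A)
W2 = rng.standard_normal((16, 4)) * 0.1          # section 2 (held by entity B)
alpha = 0.05

x = rng.standard_normal((16, 8))                 # training batch held by entity A

for _ in range(100):
    # Entity A: feedforward through section 1, push activations y to entity B.
    y = np.maximum(W1 @ x, 0.0)
    # Entity B: feedforward through section 2, compute loss and its own gradients.
    x_hat = W2 @ y
    d_xhat = (x_hat - x) / x.shape[1]
    dW2 = d_xhat @ y.T
    d_y = W2.T @ d_xhat                          # gradient at the cut, pushed back to A
    # Entity A: continue back propagation through section 1 and update.
    dW1 = (d_y * (y > 0)) @ x.T
    W1 -= alpha * dW1
    W2 -= alpha * dW2

mse = np.mean((W2 @ np.maximum(W1 @ x, 0.0) - x) ** 2)
print("MSE after split training:", round(float(mse), 4))
```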

The split NN (a.k.a. split learning) was introduced primarily to address privacy issues with user data. In the training of an AE for CSI reporting, however, the privacy (proprietary) aspects of the sections (encoder and decoder) are of interest, and training channel data may need to be shared to calculate reconstruction errors.

Autoencoders for CSI reporting -- a multi-vendor perspective

In AE-based CSI reporting, the AE encoder is in the UE and the AE decoder is in the wireless communications network, usually in the radio access network. The UE and the wireless communications network are typically represented by different vendors (manufacturers), and, therefore, the AE solution needs to be viewed from a multi-vendor perspective with potential standardization (e.g., 3GPP standardization) impacts.

It is useful to recall how 3GPP 5G networks support uplink physical layer channel coding (error control coding).

The UE performs channel encoding and the network performs channel decoding.

The channel encoders have been specified in 3GPP, which ensures that the UE’s behaviour is understood by the network and may be tested.

The channel decoders, on the other hand, are left for implementation (vendor proprietary).

If 3GPP specifies one or more AE-based CSI encoders for use in the UEs, then the corresponding AE decoders in the network may be left for implementation (e.g., constructed in a proprietary manner by training the decoders against the specified AE encoders). Figure 4d illustrates a network vendor training an AE decoder with a specified (untrainable) AE encoder. In short, and as described above, a training method for the decoder may comprise comparing a loss function of the channel and the decoded channel, or some features thereof, computing the gradients (partial derivatives of the loss function, L, with respect to each trainable parameter in the AE) by back propagation, and updating the decoder weights and biases.

Some fundamental differences between AE-based CSI reporting and channel coding are as follows:

Channel coding has a long and well-developed academic literature that enabled 3GPP to pre-select a few candidate architectures (or types); namely, turbo codes, linear parity check codes, and polar codes. Channel codes may all be mathematically described as linear mappings that, in turn, may be written into a standard. Therefore, synthetic channel models may be sufficient to design, study, compare, and specify channel codes for 5G.

- AEs for CSI feedback, on the other hand, have more architectural options and require many tuneable parameters (possibly hundreds of thousands). It is preferred that the AEs are trained, at least in part, on real field data that accurately represents live, in-network, conditions.
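Returning to the Figure 4d setup described above, the following is a minimal sketch (Python/NumPy; the toy dimensions, synthetic channel data, linear decoder and plain gradient descent are assumptions for illustration) of decoder-only training against a specified, frozen encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((4, 16)) * 0.1        # specified encoder: frozen, never updated
W_dec = rng.standard_normal((16, 4)) * 0.1        # proprietary decoder: trainable
alpha = 0.05

H = rng.standard_normal((16, 32))                 # training channel data (toy)

for _ in range(200):
    Y = np.maximum(W_enc @ H, 0.0)                # output of the specified encoder
    H_hat = W_dec @ Y                             # decoder reconstruction
    d_hat = (H_hat - H) / H.shape[1]              # gradient of the MSE loss w.r.t. H_hat
    W_dec -= alpha * (d_hat @ Y.T)                # update only the decoder weights
    # W_enc is intentionally left untouched: its weights are fixed by specification.

nmse = np.sum((W_dec @ np.maximum(W_enc @ H, 0.0) - H) ** 2) / np.sum(H ** 2)
print("training NMSE after decoder-only training:", round(float(nmse), 3))
```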

The standardization perspectives on AE-based CSI reporting may be summarized as follows:

• AE encoder, or AE decoder, or both may be standardized in a first scenario:
  o Training within 3GPP (e.g., NN architectures, weights and biases are specified),
  o Training outside 3GPP (e.g., NN architectures are specified),
  o Signalling for AE-based CSI reporting/configuration is specified.

• AE encoder and AE decoder may be implementation specific (vendor proprietary) in a second scenario:
  o Interfaces to the AE encoder and AE decoder are specified,
  o Signalling for AE-based CSI reporting/configuration is specified.

AE-based CSI reporting has at least the following implementation/standardization challenges and issues to solve:

• The AE encoder and the AE decoder may be complicated NNs with thousands of tuneable parameters (e.g., weights and biases) that potentially need to be open and shared, e.g., through signalling, between the network and UE vendors.

• The UE’s compute and/or power resources are limited, so the AE encoder will likely need to be known in advance to the UE such that the UE implementation may be optimized for its task.
  o The AE encoder’s architecture will most likely need to match chipset vendors’ hardware, and the model (with weights and biases possibly fixed) will need to be compiled with appropriate optimizations. The process of compiling the AE encoder may be costly in time, compute, power, and memory resources. Moreover, the compilation process requires specialized software tool chains to be installed and maintained on each UE.

• The AE may depend on the UE’s, and/or network’s, antenna layout and RF chains, meaning that many different trained AEs (NNs) may be required to support all types of base station and UE designs.

• The AE design is data driven, meaning that the AE performance will depend on the training data. A specified AE (either encoder or decoder or both) developed using synthetic training data (e.g., specified 3GPP channel models) may not generalize well to radio channels observed in real deployments.
  o To reduce the risks of overfitting to synthetic data, one may need to refine the 3GPP channel models and/or share a vast amount of field data for training purposes. Here, overfitting means that the AE generalizes poorly to real data, or data observed in the field; e.g., the AE achieves good performance on the training dataset, but when used in the real world, e.g. on the test set, it has poor performance.

• In specifying either an AE encoder or an AE decoder, there may be a need for 3GPP to agree on at least one reference AE decoder (resp. encoder). These reference models will be needed to provide a minimal framework for discussions and specification work, but they may leave room for vendor specific implementations of the AE decoder (resp. encoder).

Given the above challenges and issues with multi-vendor AE-based CSI reporting, there is a need for a standardized procedure that enables training of the AE-encoder (implemented by a UE/chipset vendor) and multiple AE-decoders (implemented by one or several network vendor(s)). The joint training procedure may protect proprietary implementations of the AE encoder and decoder; that is, it may not expose details of the encoder and/or decoder trained weights and loss function to the other party.

Another challenge with the state of the art is that of hardware compute and memory consumption in UEs and gNBs. Specifically, due to the multi-vendor nature of 3GPP networks, UEs potentially need to implement different encoders for different NW vendors’ decoders. Similarly, NW vendors may need to implement different decoders for different UE/chipset vendors’ encoders. For example, it is common that a UE will need to hand over and/or roam between gNBs of different NW vendors (it is commonplace for a mobile network operator to use different NW vendors). Similarly, a gNB will serve UEs that have different radio chipset vendors. As described previously, the CSI encoders and decoders may be built on proprietary architectures and thus be different among radio chipset vendors. The same is true for NW vendors.

To be able to have multiple encoder/decoder models available at the same time, both the UE and the NW need to either maintain all models in memory or switch between models when needed. Both approaches are computationally expensive. In addition:

- Maintaining models in memory is a power-consuming task, as multiple models of different complexity will need to be stored in some form of volatile memory, such as, in most cases, Random-Access Memory (RAM).

- Dynamically loading models in real time is challenging given that CSI reporting will share limited hardware resources with other radio functionality. While some UEs may have the capability to support this type of process without affecting their normal operation, some resource-constrained UEs, such as those in Narrowband IoT (NB-IoT), may not have the computational and storage capacity and/or the short- or long-term power reserves to maintain all encoders in memory or switch between encoders. On the NW side, there are similar concerns, as the current generation of baseband boards has limited computational capacity.

A reference method to train the network’s (NW) AE decoders for receiving CSI reports in live networks that enables proprietary AE encoders for CSI in the UE and also proprietary AE decoders in the NW will now be briefly described.

In the reference method the NW constructs a training dataset for each UE AE encoder by logging the UE’s CSI report received over the air interface (the AE encoder output) together with the NW’s SRS-based estimate of the UL channel. The resulting dataset may then be used to train the NW’s AE decoder without having to know the UE’s AE encoder, since the NW knows, from the dataset, both the input and the output of the encoder. This solution assumes that the CSI-RS based estimated downlink channel measured by the UE, i.e., the input to the AE encoder, can be well approximated by the uplink channel measured by the NW using the SRSs.
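The reference method can be sketched as follows (Python/NumPy; the synthetic data, the linear decoder and the variable names such as Y_logged and H_srs are assumptions for illustration): the NW trains its decoder on the logged (CSI report, SRS-based uplink estimate) pairs without ever seeing the UE's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim, chan_dim = 500, 4, 16

Y_logged = rng.standard_normal((latent_dim, n_samples))          # logged CSI reports (encoder outputs)
H_srs = rng.standard_normal((chan_dim, latent_dim)) @ Y_logged   # SRS-based UL channel estimates (toy)

W_dec = rng.standard_normal((chan_dim, latent_dim)) * 0.1        # NW's trainable decoder
alpha = 0.01
for _ in range(500):
    H_hat = W_dec @ Y_logged                                     # decoder reconstruction
    d_hat = (H_hat - H_srs) / n_samples                          # MSE gradient
    W_dec -= alpha * (d_hat @ Y_logged.T)                        # decoder-only update

nmse = np.sum((W_dec @ Y_logged - H_srs) ** 2) / np.sum(H_srs ** 2)
print("decoder NMSE on the logged dataset:", round(float(nmse), 4))
```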

Instead of supporting “fully proprietary AE encoders” in the UE, another reference solution to the above problem is to split the AE encoder into two parts - a UE proprietary part and a standardized part. More specifically, the UE vendor may implement a proprietary mapping (e.g., an NN) from the channel measurements on its receive antenna ports (e.g. the CSI-RS-based channel estimate) to a standardized channel feature space. The standardized channel feature space may be a latent representation of the channel designed using, for example, DFT basis vectors.

One solution to address the challenge of operating multiple encoders on the UE side, or decoders on the NW side, as result of their proprietary parts, may be to use a model aggregation approach. In model aggregation, individually trained models are used in combination to provide better predictions. An example of a model aggregation approach is ensemble learning techniques such as boosting and bagging, which use multiple algorithms to obtain a better prediction than when using one of such algorithms in isolation. One of their main disadvantages is that they are computationally expensive, as multiple iterations or parallel executions of different algorithms are required.

As mentioned above, the NW may have to deploy and maintain many different decoders - potentially one for each UE encoder. Similarly, a UE may have to train and maintain multiple encoders, to be able to provide compression of CSI in a NW-vendor-compatible way, for the potentially many vendors a UE will encounter throughout its lifetime. In both cases, supporting many UE encoder or NW decoder models may result in excessive training and model management costs (e.g., computational and power consumption-related costs), especially in non-stationary networks where the distribution (second order statistics) of the channel changes and therefore each of those models may need to be retrained.

Furthermore, a limitation of one of the approaches outlined above is that the decoder can only reconstruct standardized channel features. That is, any channel state information lost in the UE’s proprietary mapping from its CSI-RS measurements to the standardized channel feature space cannot be recovered by the BS.

An object of embodiments herein may be to obviate some of the above-mentioned problems. Specifically, an object of embodiments herein may be to train CSI AE-encoders for a multi-vendor environment.

According to a first aspect, the object is achieved by a method, performed by a first node for training a first Neural Network, NN,-based encoder or decoder of a system of two or more encoders or decoders.

The method comprises receiving, from a second node configured for training a second encoder or decoder, a proposal for a common loss calculation method for training the two or more encoders or decoders and a proposal for a set of common NN architecture parameters.

The method further comprises determining a common loss calculation method for training the two or more encoders or decoders based on the received proposal for the common loss calculation method.

The method further comprises determining a set of common NN architecture parameters for training the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.

The method further comprises training the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data.

The method further comprises providing a first set of trained values of common trainable decoder parameters to a third node.

The method further comprises receiving, from the third node, common updated values of the common trainable decoder parameters.

According to a second aspect, the object is achieved by a first node. The first node is configured to perform the method according to the first aspect above.

According to a third aspect, the object is achieved by a method, performed by a second node for training a second NN-based encoder or decoder of a system of two or more NN-based encoders or decoders to encode Channel State Information CSI or to decode the encoded CSI associated with a wireless channel between a wireless communications device and a radio access node in a wireless communications network. The method comprises transmitting, to a first node configured for training a first encoder or decoder of the two or more encoders or decoders, a proposal for a common loss calculation method to be used for training each of the two or more encoders or decoders and a proposal for a set of common NN architecture parameters for training each of the two or more encoders or decoders.

The method further comprises receiving, from the first node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more encoders or decoders.

The method further comprises training the second encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, second channel data and second encoded channel data.

The method further comprises obtaining, based on training the second encoder or decoder, a second set of trained values of the common trainable decoder parameters.

The method further comprises providing the second set of trained values of the common trainable encoder or decoder parameters to a third node.

The method further comprises receiving, from the third node, common updated values of the common trainable encoder or decoder parameters.

According to a fourth aspect, the object is achieved by a second node. The second node is configured to perform the method according to the third aspect above.

According to a fifth aspect, the object is achieved by a method, performed by a third node for training a system of two or more NN-based decoders or encoders to encode Channel State Information CSI or to decode the encoded CSI associated with a wireless channel between a wireless communications device and a radio access node in a wireless communications network.

The method comprises receiving a first set of trained values of common trainable encoder or decoder parameters from a first node configured for training a first encoder or decoder of the system of two or more decoders or encoders.

The method further comprises receiving a second set of trained values of common trainable encoder or decoder parameters from a second node configured for training a second encoder or decoder of the system of two or more decoders or encoders.

The method further comprises computing common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm.

The method further comprises transmitting, to the first and second nodes, the computed common updated values of the common trainable decoder parameters.

According to a sixth aspect, the object is achieved by a third node. The third node is configured to perform the method according to the fifth aspect above.

According to a further aspect, the object is achieved by a computer program comprising instructions which, when executed by a processor, cause the processor to perform actions according to any of the aspects above.

According to a further aspect, the object is achieved by a carrier comprising the computer program of the aspect above, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

The above aspects provide a possibility to train AI-based encoders and decoders for CSI reporting in an efficient way. For example, embodiments herein allow for creation of any one or more of:
- a global encoder model that may compress CSI signals for a UE regardless of the NW decoder, e.g., regardless of the mobile network the UE is attached to.
- a global decoder model that may decompress CSI signals for a NW regardless of the UE encoder, e.g., regardless of the vendor of the radio chipset of the UE that is attached to a specific NW cell.

In both the above cases, there are savings on computational, power and storage resources, as the UE and/or NW do not need to maintain multiple NN models in memory and/or switch between NN models at runtime.

In addition to the above, an additional advantage is that chipset and/or network vendors do not share any sensitive data while training.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of embodiments herein are described in more detail with reference to attached drawings in which:

Figure 1 is a schematic block diagram illustrating embodiments of a wireless communications network.

Figure 2 illustrates an example transmission and reception chain for MU-MIMO operations.

Figure 3 is a schematic block diagram illustrating CSI type II normal reporting mode.

Figure 4a is a schematic block diagram illustrating a fully connected (dense) AE.

Figure 4b is a schematic block diagram illustrating how an AE may be used for AI-based CSI reporting in NR during an inference phase.

Figure 4c is a schematic block diagram illustrating training of an AE.

Figure 4d is a schematic block diagram illustrating a network vendor training of an AE decoder with a specified (untrainable) AE encoder.

Figure 5 is a schematic block diagram illustrating embodiments of a wireless communications network.

Figure 6 is a schematic block diagram illustrating a reference solution that addresses issues of proprietary decoder/encoder during training.

Figure 7a is a schematic block diagram illustrating a system of nodes for decoder training according to embodiments herein.

Figure 7b is a schematic block diagram illustrating a system of nodes for decoder training according to embodiments herein.

Figure 7c is a schematic block diagram illustrating a system of nodes for encoder training according to embodiments herein.

Figure 7d is a schematic block diagram illustrating a system of nodes for encoder training according to embodiments herein.

Figure 8a is a flow chart illustrating a method performed by a first node for encoder training according to embodiments herein.

Figure 8b is a flow chart illustrating a further method performed by a first node for encoder or decoder training according to embodiments herein.

Figure 8c is a flow chart illustrating a method performed by a second node for encoder or decoder training according to embodiments herein.

Figure 8d is a flow chart illustrating a further method performed by a third node for encoder or decoder training according to embodiments herein.

Figure 9a is a signaling diagram illustrating a method for decoder training according to embodiments herein.

Figure 9b is a signaling diagram illustrating a first part of a method for decoder training according to embodiments herein.

Figure 9c is a signaling diagram illustrating a second part of a method for decoder training according to embodiments herein.

Figure 9d is a signaling diagram illustrating a method for decoder training according to embodiments herein.

Figure 10 is a schematic block diagram illustrating a first node according to embodiments herein.

Figure 11 is a schematic block diagram illustrating a second node according to embodiments herein.

Figure 12 is a schematic block diagram illustrating a third node according to embodiments herein.

Figure 13 schematically illustrates a telecommunication network connected via an intermediate network to a host computer.

Figure 14 is a generalized block diagram of a host computer communicating via a base station with a user equipment over a partially wireless connection.

Figures 15-18 are flowcharts illustrating methods implemented in a communication system including a host computer, a base station and a user equipment.

DETAILED DESCRIPTION

As mentioned above, AI-based CSI reporting in wireless communication networks may be improved in several ways. An object of embodiments herein is therefore to improve AI-based CSI reporting in wireless communication networks.

Embodiments herein relate to wireless communication networks in general. Figure 5 is a schematic overview depicting a wireless communications network 100 wherein embodiments herein may be implemented. The wireless communications network 100 comprises one or more RANs and one or more CNs. The wireless communications network 100 may use a number of different technologies, such as Wi-Fi, Long Term Evolution (LTE), LTE-Advanced, 5G, New Radio (NR), Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications/enhanced Data rate for GSM Evolution (GSM/EDGE), Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), just to mention a few possible implementations. Embodiments herein relate to recent technology trends that are of particular interest in a 5G context; however, embodiments are also applicable in further developments of existing wireless communication systems such as WCDMA and LTE.

Network nodes, such as radio access nodes, operate in the wireless communications network 100. Figure 5 illustrates a radio access node 111. The radio access node 111 provides radio coverage over a geographical area, a service area referred to as a cell 115, which may also be referred to as a beam or a beam group of a first radio access technology (RAT), such as 5G, LTE, Wi-Fi or similar. The radio access node 111 may be a NR-RAN node, transmission and reception point e.g. a base station, a radio access node such as a Wireless Local Area Network (WLAN) access point or an Access Point Station (AP STA), an access controller, a base station, e.g. a radio base station such as a NodeB, an evolved Node B (eNB, eNode B), a gNB, a base transceiver station, a radio remote unit, an Access Point Base Station, a base station router, a transmission arrangement of a radio base station, a stand-alone access point or any other network unit capable of communicating with a wireless device within the service area depending e.g. on the radio access technology and terminology used. The respective radio access node 111 may be referred to as a serving radio access node and communicates with a UE with Downlink (DL) transmissions on a DL channel 123-DL to the UE and Uplink (UL) transmissions on an UL channel 123-UL from the UE.

A number of wireless communications devices operate in the wireless communication network 100, such as a UE 121.

The UE 121 may be a mobile station, a non-access point (non-AP) STA, a STA, a user equipment and/or a wireless terminal, that communicates via one or more Access Networks (AN), e.g. RAN, e.g. via the radio access node 111 to one or more core networks (CN) e.g. comprising a CN node 130, for example comprising an Access Management Function (AMF). It should be understood by those skilled in the art that "UE" is a non-limiting term which means any terminal, wireless communication terminal, user equipment, Machine Type Communication (MTC) device, Device to Device (D2D) terminal, or node e.g. smart phone, laptop, mobile phone, sensor, relay, mobile tablets or even a small base station communicating within a cell.

A reference solution that addresses issues of proprietary decoder/encoder during training will now be described in relation to Figure 6. In detail Figure 6 illustrates a first node 601 comprising a Neural Network, NN, -based Auto Encoder, AE, -encoder 601-1. The first node 601 may also be referred to as a training apparatus.

The first node 601 is configured for training the AE-encoder 601-1 in a training phase of the AE-encoder 601-1. The AE-encoder 601-1 is trained to provide encoded CSI from a first communications node, such as the UE 121, to a second communications node, such as the radio access node 111, over a communications channel, such as the UL channel 123-UL, in a communications network, such as the wireless communications network 100. The CSI is provided in an operational phase of the AE-encoder wherein the AE-encoder 601-1 is comprised in the first communications node 121.

Figure 6 further illustrates a second node 602 comprising an NN-based AE-decoder 602-1 and having access to the channel data. The second node 602 may provide a network-controlled training service for AE-encoders to be deployed in the first communications node 121, such as a UE. The NN-based AE-decoder 602-1 may comprise a same number of input nodes as a number of output nodes of the AE-encoder 601-1.

The first node 601 may have access to one or more trained NN-based AE-encoder models for encoding the CSI. The second node 602 may have access to one or more trained NN-based AE-decoder models for decoding the encoded CSI provided by the first node 601.

The implementation of the AE-decoder 602-1 may not be fully known to the first node 601. For example, the implementation of the AE-decoder 602-1 may be proprietary to the vendor of a certain base station. However, some parameters of the AE-decoder 602-1, like a number of inputs of the AE-decoder 602-1, may be known to the first node 601. Thus, the implementation of the AE-decoder excluding the encoder-decoder interface may not be known to the first node 601.

Figure 6 further illustrates a further node 603 comprising a channel database 603-1. The channel database 603-1 may be a channel data source.

In Figure 6 the first node 601, the second node 602 and the further node 603 have been illustrated as single units. However, as an alternative, each node 601, 602, 603 may be implemented as a Distributed Node (DN) and functionality, e.g. comprised in a cloud 140 as shown in Figure 6, may be used for performing or partly performing the methods. There may be a respective cloud for each node. Figure 6 may also be seen as an illustration of an embodiment of a training interface between the second node 602 providing the network-controlled training service and the UE/chipset-vendor training apparatus 601. In other words, Figure 6 illustrates a standardized development domain training interface that enables UE/chipset vendors and NW vendors to jointly train a UE encoder together with a NW decoder, without exposing proprietary aspects of the encoders and decoders.

Thus, the reference solution of Figure 6 is designed to allow the UE side to be able to train the encoder without knowledge of the NW side’s decoder loss function, output, or internal gradients of the decoder. The decoder may therefore be proprietary, without having to disclose the loss function or the architecture of the network, except for the input layer, that may be standardized, e.g., by 3GPP. At the same time, the reference solution allows for fully proprietary development of encoders, i.e., they do not need to be standardized or their architecture shared with the NW. However, there may still be room for improvements with respect to compute, storage and power concerns on the NW and UE side as a result of training multiple encoders or multiple decoders.

A multi-vendor training setup may consist of a channel data service and a NW-decoder training service:

- The channel data service provides training, validation, and test channel data.
- The NW-vendor controlled training service provides a solution for UE/chipset vendors (e.g., research and/or development labs) to train candidate UE encoders against the NW's pre-trained decoders.

Details of the second node 602 and/or the network-controlled training service, such as decoder architecture, trainable parameters, a reconstructed channel H, a loss function, and a method to compute gradients may be transparent to the UE/chipset-vendor training apparatus 601. Instead, UE/chipset-vendor training apparatus 601 is provided with the gradients of the input to the decoder.

Embodiments herein solve the above problems by introducing solutions for training universal encoders and/or decoders. More specifically, embodiments herein are directed to:

- Systems and methods for training a universal decoder at the NW side using information, such as encoded channel data, from multiple encoders on the UE side.
- Systems and methods for training a universal encoder at the UE side using information, such as gradients of the loss function, from multiple decoders on the network side.

The solutions to the above problems may involve two actions.

In a first action, the UE/chipset and NW vendors may agree on:

- one or more common loss functions to be used when training encoders and/or decoders, and
- common ML model architectures, or a baseline part of the ML model architecture, for the encoders and/or decoders.

The one or more common loss functions may be approximately the same. In some embodiments, different regularizers such as L1 and L2 may be used to generate additional loss so that the loss functions produce approximately the same output, given a set of training data. The approximation is defined here by a scalar, i.e., a threshold which should not be exceeded when comparing the outputs of the loss functions on the same dataset. For example, assuming two loss functions, the polynomials fitting their respective loss curves, indicating the loss reduction rate, may be computed. Comparing those polynomials using, e.g., a vector similarity approach on the coefficients of the indeterminates could be one approach to identify how similar they are. The result of the comparison may then be compared to the threshold, and if the result satisfies a threshold condition, such as being equal to or below the threshold, this means that the regularizers were set to correct values.
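A minimal sketch of such a loss-curve comparison is given below, in Python using numpy. The polynomial degree, the cosine-similarity measure and the threshold value are illustrative assumptions, not values mandated by embodiments herein.

import numpy as np

def loss_curves_aligned(loss_a, loss_b, degree=3, threshold=0.1):
    # Fit a polynomial to each loss curve (one loss value per epoch) and compare coefficients.
    coeffs_a = np.polyfit(np.arange(len(loss_a)), np.asarray(loss_a, dtype=float), degree)
    coeffs_b = np.polyfit(np.arange(len(loss_b)), np.asarray(loss_b, dtype=float), degree)
    # One possible vector-similarity approach: cosine distance between coefficient vectors.
    cos_sim = np.dot(coeffs_a, coeffs_b) / (np.linalg.norm(coeffs_a) * np.linalg.norm(coeffs_b))
    distance = 1.0 - cos_sim
    # Threshold condition: equal to or below the threshold means the regularizers are acceptable.
    return distance <= threshold

# Example with two synthetic, slowly decreasing loss curves.
curve_1 = [1.0 / (1 + 0.3 * e) for e in range(20)]
curve_2 = [1.05 / (1 + 0.28 * e) for e in range(20)]
print(loss_curves_aligned(curve_1, curve_2))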

In some embodiments where the other side (encoder/decoder) is fixed then it is not required to agree on a common loss function.

The second action performs federated learning between encoder and/or decoder models (possibly from different vendors) to train universal encoders and/or decoders. The federated learning algorithms need not be applied to the whole encoder/decoder models - they can be applied to identified "common parts", leaving room for UE/chipset and NW vendor differentiation.

In order to create a NN model applicable to every vendor (either an encoder on the UE side or a decoder on the NW side), vendors may first align their individual model architectures and loss function, as well as other hyperparameters of the training process.

Embodiments herein allow for communication between UE/chipset vendors and NW vendors to prepare (offline or online) training infrastructure for distributed training of CSI encoders and decoders via federated learning. The embodiments may be implemented or illustrated, at least in part, with an interface which may be standardized.

Embodiments herein disclose a method for federated learning of a global encoder or decoder model for CSI compression or reconstruction, from multiple UE or NW vendors, wherein the encoder or decoder is trained in a distributed manner.

Federated learning techniques such as federated averaging train algorithms in a distributed manner, across different nodes known as workers. Training is done in an iterative way, wherein in every global iteration, known as an "epoch", workers train a local model with their own data and then submit the trained model weights to a server. The server aggregates the weights to generate a global model, which is sent to the workers for another round of training (i.e., another global iteration, or epoch). Over time, the global model will incorporate learnings from different worker models. There are several algorithms for federated learning, including federated averaging (FedAvg), which averages the weights of worker models to calculate the weights of a global model. Privacy-preserving extensions such as secure aggregation may also be used to protect the transfer of model weights between workers and the server.
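The following is a minimal sketch of such a worker/server loop, in Python with numpy, using plain dictionaries of arrays as model weights. The Worker class and its local_train method are hypothetical stand-ins for a vendor's real training procedure.

import numpy as np

class Worker:
    # Hypothetical worker holding local data; local_train stands in for real NN training.
    def __init__(self, data):
        self.data = data

    def local_train(self, weights):
        # Placeholder "training": nudge each weight toward the mean of the local data.
        return {k: v + 0.1 * (self.data.mean() - v) for k, v in weights.items()}

def fedavg(worker_weights):
    # Server-side federated averaging: average the submitted weights, tensor by tensor.
    return {k: np.mean([w[k] for w in worker_weights], axis=0) for k in worker_weights[0]}

workers = [Worker(np.random.randn(100)) for _ in range(3)]
global_weights = {"layer1": np.zeros((4, 4)), "layer2": np.zeros(4)}
for epoch in range(5):  # each global iteration corresponds to one "epoch"
    local_results = [w.local_train({k: v.copy() for k, v in global_weights.items()}) for w in workers]
    global_weights = fedavg(local_results)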

Embodiments herein allow for creation of either or both of:
- a global encoder model that may compress CSI signals for a UE regardless of the NW decoder, e.g., regardless of the mobile network the UE is attached to.
- a global decoder model that may decompress CSI signals for a NW regardless of the UE encoder, e.g., regardless of the vendor of the radio chipset of the UE that is attached to a specific NW cell.

In both the above cases, there are savings on computational, power and storage resources, as the UE and/or NW do not need to maintain multiple NN models in memory and/or switch between NN models at runtime.

In addition to the above, an additional advantage is that chipset and/or network vendors do not share any sensitive data while training.

Embodiments herein may comprise one or more systems of nodes according to Figure 6 above. For example, a system of nodes according to embodiments herein may comprise the following components:
- A Channel Data Service (CDS), which provides input to the encoder 601-1. The CDS corresponds to the channel database 603-1 of the further node 603. As previously described, this input contains features indicating an estimated channel quality. The estimated channel quality may, as previously discussed, be a 3-dimensional tensor where the dimensions correspond to Tx antenna ports of a gNB, Rx antenna ports of a UE and frequency (divided into either subcarriers or subbands).

- The encoder 601-1, which compresses the 3-dimensional tensor of the CDS to a compressed representation known as a codeword. The CDS 603-1 and the encoder 601-1 may coexist on the UE-chipset vendor side.

- The decoder 602-1, which reconstructs the estimated channel quality from the codeword. The decoder 602-1 exists on the NW vendor side.

In embodiments herein multiple encoders or multiple decoders are trained in parallel or sequentially. Figure 7a illustrates a system of nodes 700 in which embodiments herein may be implemented. For example, Figure 7a illustrates a first node 701 configured for training a first encoder 701-1 and a second node 702 configured for training a second encoder 702-1. Figure 7a further illustrates a further first node 711 configured for training a first decoder 711-1 and a further second node 712 configured for training a second decoder 712-1.

In some embodiments herein the first node 701 and the further first node 711 will assume the role of a driver node, which may initiate and/or coordinate the training of multiple decoders or encoders.

Figure 7a further illustrates latent spaces Y1, ..., YK, channel data H1, ..., HK, reconstructed channel data Ĥ1, ..., ĤK, loss functions L1, ..., LK, and gradients G1, ..., GK of the loss with respect to the trainable parameters φ1, ..., φK of the encoders and θ1, ..., θK of the decoders.

In a first embodiment, illustrated in Figure 7a, multiple NW vendors 1, ... K train their decoders 711-1, 712-1 using data from different encoders 701-1, 702-1 from different multiple UE vendors 1, ... K. Every NW vendor training a decoder using data provided from a local CDS 701-2, 702-2 and encoder output, may be considered a “worker”. In addition to the “worker”, there exists the role of the “server”. The server aggregates all worker-provided model weights to a global model, in this case a global decoder. The role of the “server” may be assumed by a NW vendor, as illustrated in Figure 7a, but may also be external to the NW vendor, e.g., a third party cloud service. The server periodically collects the weights of locally trained decoders and aggregates them, e.g., using a federated learning algorithm to a global model. This embodiment may specifically apply in cases where the decoder is to be deployed to all participating network vendors.

In a second embodiment, illustrated in Figure 7b, a single network vendor operates all the participating decoders. This network vendor establishes training sessions with multiple UE vendors and also assumes the role of the "server". This embodiment may be suited for cases where the decoders are exclusively deployed by a single network vendor. In the second embodiment the network vendor may need to negotiate with the different chipset vendors about the loss function and model.

In a third embodiment, illustrated in Figure 7c, multiple UE chipset vendors train their encoders using output data from different NW decoders from multiple network vendors. As is the case with the first embodiment, this embodiment may be suited in cases where the encoder is to be deployed at all participating UE vendors.

In a fourth embodiment, illustrated in Figure 7d, a single UE chipset vendor trains its multiple encoders using output data from different respective NW decoders. As is the case with the second embodiment, this approach may be suited in cases where the encoder is to be exclusively deployed by the single UE chipset vendor.

Note that in the first and the second embodiments federated training of a global decoder is performed. In the process, the encoders receive gradients which in turn may be used to update (e.g., train) their weights; alternatively, the encoders may remain static, i.e., not updated during the process. It may be up to the chipset vendors to apply and accept the changes to their encoders. Similarly, in the third and fourth embodiments the decoders may be updated or not during training of the encoders.

It should be noted that in the cases of embodiments one and three above, and depending on the agreements between chipset vendors and network vendors, multiple training sessions may take place between different pairs of vendors. Even the same chipset vendor may have separate training sessions for different decoders. For example, as illustrated in Figure 7e, a chipset vendor may train two separate encoders for two network vendors. In this case, an aggregation of local training results, such as local NN trainable parameters, may happen at multiple levels. At a first level, an encoder 2 for the chipset vendor is determined based on federation of training results from two encoders 2.1, 2.2 trained with decoders 2.1, 2.2 of different network vendors. At a second level, a global encoder for different chipset vendors is established based on the training of the encoders 1, 2 at the first level.

In a fifth embodiment, instead of training one common neural network, such as encoder or decoder, that works with all corresponding neural networks, such as decoders or encoders, different encoders for different NW decoders or different decoders for different encoders may be created but they share parts of their neural networks.

In order to train the neural networks, layers of the neural network may be marked either as a common or a dedicated layer. Workers send the trained weights of local models that belong only to common layers to the server, keeping the weights of dedicated layers to themselves. The server may perform federated averaging of the weights and then send the results back to the workers, which update their common layers accordingly.

The approach is similar to what is shown in, e.g., Figure 7d. However, a difference is that the workers only send weights of common layers and the server accordingly does not hold any complete record of decoder or encoder network weights.

With this approach, different encoders will be trained, tailored for different decoders, while those encoders will share the weights of common layers. The storage in memory of such encoders may be minimized by keeping only one copy of the weights of the common layers, while only the weights of the dedicated layers need to be stored for each individual encoder. The encoder information may be stored, for example, in a volatile memory of the UE (e.g., RAM) or in a non-volatile memory (e.g., EEPROM). A UE also needs less time to switch between different encoders because only the weights of dedicated layers need to be switched.

The choice of which layers to make dedicated versus common may be made empirically, based on the ability of the layers to provide maximum customization of the encoder or decoder network relative to the amount of data associated with storing the weights of the layers. Good candidates for such layers may be convolutional layers, since the weights of convolutional layers are generally used multiple times (depending on the network configuration they may be used thousands of times) during inference. On the other hand, the weights of fully connected layers are used only once during inference. Convolutional layers therefore have, compared to fully connected layers, a better potential for a bigger relative impact on customizing the performance of the encoder or the decoder relative to the number of weights in the layers.
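A minimal sketch of the common/dedicated split is given below in Python with numpy. The layer names and the choice of which layers are common are illustrative assumptions; workers send only the common-layer weights to the server, which averages them and returns the result.

import numpy as np

COMMON_LAYERS = {"layer_a", "layer_b"}   # assumed shared across all workers
# Any other layer in a worker's model is treated as dedicated and kept local.

def extract_common(state_dict):
    # Weights a worker would send to the server.
    return {k: v for k, v in state_dict.items() if k in COMMON_LAYERS}

def average_common(common_updates):
    # Server-side federated averaging over common layers only.
    return {k: np.mean([u[k] for u in common_updates], axis=0) for k in common_updates[0]}

def merge_back(state_dict, averaged_common):
    # Each worker overwrites its common layers, keeping its dedicated layers untouched.
    merged = dict(state_dict)
    merged.update(averaged_common)
    return merged

worker_models = [
    {"layer_a": np.random.randn(8, 8), "layer_b": np.random.randn(8), "dedicated_out": np.random.randn(16)},
    {"layer_a": np.random.randn(8, 8), "layer_b": np.random.randn(8), "dedicated_out": np.random.randn(16)},
]
averaged = average_common([extract_common(m) for m in worker_models])
worker_models = [merge_back(m, averaged) for m in worker_models]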

Appropriate methods to train NN-based encoders and decoders for CSI reporting are provided below. Exemplifying methods according to embodiments herein will now be described with reference to the flow charts in Figure 8a and Figure 8b. Figure 8b illustrates some optional actions which may be combined with the actions of Figure 8a.

The flow charts illustrate a method, performed by a first node 701, 711, for training a first Neural Network, NN, -based encoder 701-1 or decoder 711-1 of a system of two or more NN-based encoders 701-1, 702-1 or decoders 711-1, 712-1 to encode Channel State Information CSI or decode the encoded CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

Thus, the method is either for training the first NN-based encoder 701-1 to encode CSI, or for training the first NN-based decoder 711-1 to decode the encoded CSI. As mentioned above, the CSI is associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

In an optional action, the first node initializes the training process by sending a suggested-parameters-request message to all worker nodes. The suggested-parameters-request message is a message comprising or indicating a request to respond with suggested NN architecture parameters.

In an optional action 800a the first node 701, 711 may receive, from the second node, a proposal for a common type of encoder or decoder training, wherein the type of encoder or decoder training comprises: a) a first type in which the two or more encoders train together with respective two or more trainable decoders for which the decoder trainable parameters are to be trained, or in which the two or more decoders train together with respective two or more trainable encoders for which the encoder trainable parameters are to be trained, and b) a second type in which the two or more encoders train together with respective two or more fixed decoders for which the decoder trainable parameters are not to be trained, or in which the two or more decoders train together with respective two or more fixed encoders for which the encoder trainable parameters are not to be trained.

In an optional action 800b the first node 701, 711 may determine the common type of encoder or decoder training based on at least the proposal for the common type of encoder or decoder training.

In action 801 the first node 701, 711 receives, from the second node 702, 712 configured for training the second encoder or decoder of the two or more encoders or decoders, a proposal for a common loss calculation method to be used for training each of the two or more encoders or decoders and a proposal for a set of common NN architecture parameters for training each of the two or more encoders or decoders.

Common NN architecture parameters may for example be NN weights and biases.

The common NN architecture parameters may be associated with one or more common NN layers of the first encoder and the second encoder, or of the first decoder and the second decoder. In some embodiments herein the common NN layers are a subset of the NN layers of the first encoder or the first decoder and/or a subset of the NN layers of the second encoder or the second decoder. Thus, the common NN layers may be a subset of the NN layers of the first decoder and the second decoder. In another embodiment, associated with training of the encoder side, the common NN layers may be a subset of the NN layers of the first encoder and the second encoder.

The at least one common NN layer may be a convolutional layer.

Thus, the first node may receive, from all worker nodes, a suggested-parameters-response message with proposals on training process and NN model-architecture parameters.

The suggested-parameters-response message may contain a description of the loss function used during the training at the worker node. The model architecture description may contain the number, size and type of layers used, including input, output, and hidden layers.

In action 802 the first node 701 , 711 determines a common loss calculation method to be used for training each of the two or more encoders or decoders based on the received proposal for the common loss calculation method.

In action 803 the first node 701, 711 determines a set of common NN architecture parameters for training each of the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.

Thus, the first node may select a set of training process parameters and an NN model architecture using a selection process and may send them as a suggested-parameters-selection message to all worker nodes (in action 804 below). The selection may be made from the received suggestions. The training process parameters may include learning rate, size of the gradient step and batch size.

The suggested-parameters-selection message may contain a selection of the loss function. The loss function may be determined by means of a majority vote over the suggested-parameters-response messages, or by relative vendor importance, as stored in the first node prior to the training process.

The suggested-parameters-selection message may contain a mapping function relating a worker node to a weight, which weights the other parameters, the gradients or the updated NN weights. The mapping function may be determined by means of a majority vote over the suggested-parameters-response messages, or by relative vendor importance.

The suggested-parameters-selection message may contain a description of the selected model architecture. The description may contain either the complete original suggested-parameters-response message or a part of it, in terms of the number of NN layers and the NN layer position, which may be either at the "front" or the "back" of the model. By the "front" of the model is meant one or more layers at the beginning of the model, and by the "back" is meant one or more layers at the end of the model.
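The following sketch illustrates, in Python, how the first node could assemble a suggested-parameters-selection message from the received responses. The field names, the natural-number loss codes and the majority rule are assumptions for illustration only.

from collections import Counter

responses = [
    {"vendor_id": 1, "loss_function": 1},   # 1 = mean squared error
    {"vendor_id": 2, "loss_function": 1},
    {"vendor_id": 3, "loss_function": 3},   # 3 = mean absolute error
]

# Loss function chosen by majority among the suggested-parameters-response messages.
selected_loss = Counter(r["loss_function"] for r in responses).most_common(1)[0][0]

# Mapping function relating each worker node to a weight (here simply equal importance).
worker_weights = {r["vendor_id"]: 1.0 / len(responses) for r in responses}

suggested_parameters_selection = {
    "loss_function": selected_loss,
    "worker_weights": worker_weights,
    # Partial architecture description: a common sub-architecture at the "front" of the model.
    "architecture": {"num_common_layers": 3, "position": "front"},
    "learning_rate": 1e-3,
    "batch_size": 64,
}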

In an optional action 804 the first node 701 , 711 may transmit, to the second node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more NN-based encoder or decoders.

The worker nodes may respond to the first node with a suggested-parameters-selection-response message, either with a full or a partial acknowledgement of the training parameters and the NN model architecture.

The first node may initialize a federated learning session between the worker nodes and the first node, with the first node acting as a server executing a federated learning algorithm.

In action 805 the first node 701 , 711 trains the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and encoded first channel data. In some embodiments the first node 701, 711 trains the first encoder or the first decoder further based on the determined common type of encoder or decoder training.

A first payload of the first encoded channel data may be of the same size as a second payload of a second encoded channel data used for training of the second encoder or the second decoder. In embodiments herein a size of the payload may refer to a format or shape of the payload. Thus, the first payload of the first encoded channel data may be of the same format or shape as the second payload of the second encoded channel data. Further, the first payload of the first encoded channel data may be of the same data type as the second payload of the second encoded channel data.

For example, the first node 701 may train the first encoder 701-1 by making the first encoder 701-1 encode the first channel data based on the set of common NN architecture parameters. The encoded first channel data is sent to the first decoder 711-1 in the further first node 711, which decodes the encoded first channel data. A loss is calculated based on the common loss calculation method, the first channel data and the decoded first channel data. The loss function may take as inputs the first channel data and the decoded channel data, where the decoded channel data corresponds to the decoded encoded channel data. For example, the first decoder 711-1 may decode the encoded first channel data. The loss function may compare the original channel data (ground truth) to the decoded channel data to compute a loss.

Gradients of the loss with respect to the common NN architecture parameters may also be calculated. The loss and the gradients may be used to update the common NN architecture parameters for the first encoder 701-1.

Similarly, the loss and the gradients may be used to update the common NN architecture parameters for the first decoder 711-1.
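As a non-normative illustration, one training step of this kind may be sketched in Python with PyTorch as follows. The layer sizes, the MSE loss and the optimizer are assumptions for the example and are not the agreed common loss calculation method or common NN architecture parameters themselves.

import torch
import torch.nn as nn

channel_dim, codeword_dim = 256, 32
encoder = nn.Sequential(nn.Linear(channel_dim, 64), nn.ReLU(), nn.Linear(64, codeword_dim))
decoder = nn.Sequential(nn.Linear(codeword_dim, 64), nn.ReLU(), nn.Linear(64, channel_dim))
common_loss = nn.MSELoss()   # stand-in for the agreed common loss calculation method
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

channel_data = torch.randn(16, channel_dim)    # first channel data (ground truth)
encoded = encoder(channel_data)                # first encoded channel data (codeword / latent space)
decoded = decoder(encoded)                     # reconstructed channel data
loss = common_loss(decoded, channel_data)      # loss between ground truth and reconstruction

optimizer.zero_grad()
loss.backward()    # gradients of the loss with respect to the trainable parameters
optimizer.step()   # update of the trainable encoder and decoder parameters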

The common loss calculation method may be based on any of the following: a common loss function, different loss functions together with L1 and/or L2 regularizers, or methods for maintaining a custom decoder per encoder, such as personalized federated learning, which will be described below in relation to Figure 9d.

As part of the training or before the training the first node 701 , 711 may obtain a first initial set of values of common trainable encoder or decoder parameters.

The first node 701, 711 may further obtain the first (uncompressed) channel data. The channel data may be obtained as input for calculating a first loss value based on the common loss calculation method. The first node 701, 711 may further obtain encoded first channel data, e.g., as output from the first encoder 701-1. The encoded first channel data may also be referred to as compressed first channel data. The encoded first channel data may be used as input to the first decoder 711-1.

In action 806 the first node 701, 711 obtains, based on training the first encoder or decoder, a first set of trained values of common trainable encoder or decoder parameters.

In action 807 the first node 701 , 711 provides the first set of trained values of the common trainable encoder or decoder parameters to the third node.

In action 808 the first node 701 , 711 receives, from the third node, common updated values of the common trainable encoder or decoder parameters.

The first node 701, 711 may then re-train the first encoder or decoder based on the received common updated values of the common trainable encoder or decoder parameters, the common loss calculation method and the set of common NN architecture parameters. The re-training may be repeated, e.g., until a required level of performance has been achieved. The performance may be measured by the loss.

Further training may be done on all workers using the global weights. Each training loop may be called an "epoch". There may be many epochs. The number of epochs may either be preconfigured or based on the performance of the global model (e.g., using an accuracy metric). An epoch may be determined (negotiated) as an NN parameter (or hyper-parameter).

In some embodiments herein the first node is configured to perform the method of Figure 8a or Figure 8b based on an availability of the first node in terms of current load.

The role of the first node 701, 711 (the driver node) may be assigned at random, pre-agreed between NW vendors, or may be assigned based on a number of objective factors, such as the availability of each gNB, e.g., in terms of current load or resources (compute, storage, or network resources). The current load may be expressed as a combination of, or as one of, available throughput on the uplink and downlink interfaces, number of attached UEs in active and idle state, voltage ripples in the power supply unit, etc.

The driver may alternatively be selected based on level of authority (e.g., “tier-1” operator). In case of a single vendor, selection defaults to the single vendor itself.

The driver may be an entity that coordinates the federation and performs the aggregation. The driver may be external to the nodes comprising the encoders or decoders. However, the driver may also be an internal driver.

The above method will now be described from the network vendor side, or in other words from the decoder side. The method is performed by the first node 711. The method is for training the first NN-based decoder 711-1 of the system of two or more NN-based decoders 711-1 , 712-1 to decode encoded Channel State Information CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

The method comprises:

Receiving, from the second node 712 configured for training the second decoder of the two or more NN-based decoders, the proposal for the common loss calculation method to be used for training each of the two or more NN-based decoders and the proposal for the set of common NN architecture parameters for training each of the two or more NN-based decoders;

Determining the common loss calculation method to be used for training each of the two or more NN-based decoders based on the received proposal for the common loss calculation method;

Determining the set of common NN architecture parameters for training each of the two or more NN-based decoders based on the received proposal for the set of common NN architecture parameters;

Training the first decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data;

Obtaining, based on training the first decoder, the first set of trained values of common trainable decoder parameters; providing the first set of trained values of the common trainable decoder parameters to the third node; and receiving, from the third node, common updated values of the common trainable decoder parameters.

The above method will now be described from the UE vendor side, or in other words from the encoder side. The method is performed by the first node 701. The method is for training the first NN-based encoder 701-1 of the system of two or more NN-based encoders 701-1 , 702-1 to encode Channel State Information CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

The method comprises: receiving, from the second node 702 configured for training the second encoder of the two or more NN-based encoders, the proposal for the common loss calculation method to be used for training each of the two or more NN-based encoders and the proposal for the set of common NN architecture parameters for training each of the two or more NN-based encoders; determining the common loss calculation method to be used for training each of the two or more NN-based encoders based on the received proposal for the common loss calculation method; determining the set of common NN architecture parameters for training each of the two or more NN-based encoders based on the received proposal for the set of common NN architecture parameters; training the first encoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data; obtaining, based on training the first encoder, the first set of trained values of common trainable encoder parameters; providing the first set of trained values of the common trainable encoder parameters to the third node; and receiving, from the third node, common updated values of the common trainable encoder parameters.

Exemplifying methods according to embodiments herein will now be described with reference to a flow chart in Figure 8c. The flow chart illustrates a method, performed by the second node 702, 712, for training the second NN-based encoder 702-1 or decoder 712-1 of the system 700 of two or more NN-based encoders 701-1 , 702-1 or decoders 711-1 , 712-1 to encode Channel State Information CSI or to decode the encoded CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

In action 811 the second node 702, 712 transmits, to the first node 701 , 711 configured for training the first encoder or decoder of the two or more encoders or decoders, the proposal for the common loss calculation method to be used for training each of the two or more encoders or decoders and the proposal for the set of common NN architecture parameters for training each of the two or more encoders or decoders.

In action 812 the second node 702, 712 receives, from the first node, the common loss calculation method to be used for training each of the two or more encoders or decoders and the set of common NN architecture parameters for training each of the two or more encoders or decoders.

In action 813 the second node 702, 712 trains the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and encoded second channel data. The second channel data may differ from first channel data used for training of the first decoder.

As part of the training or before the training the second node 702, 712 may obtain a second initial set of values of common trainable encoder or decoder parameters.

The second node 701, 711 may further obtain the second (uncompressed) channel data. The channel data may be obtained as input for calculating a second loss value based on the common loss calculation method.

The second node 702, 712 may further obtain encoded second channel data, e.g., as output from the second encoder 702-1.

For example, the second node 702 may train the second encoder 702-1 by making the second encoder 702-1 encode the second channel data based on the set of common NN architecture parameters. The encoded second channel data is sent to the second decoder 712-1 in the further second node 712, which decodes the encoded second channel data. A loss is calculated based on the common loss calculation method, the second channel data and the decoded second channel data. The loss function may take as inputs the second channel data and the decoded channel data, where the decoded channel data corresponds to the decoded encoded channel data. For example, the second decoder 712-1 may decode the encoded second channel data. The loss function may compare the original channel data (ground truth) to the decoded channel data to compute a loss.

Gradients of the loss with respect to the common NN architecture parameters may also be calculated. The loss and the gradients may be used to update the common NN architecture parameters for the second encoder 702-1.

Similarly, the loss and the gradients may be used to update the common NN architecture parameters for the second decoder 712-1. In action 814 the second node 702, 712 obtains, based on training the second encoder or decoder, a second set of trained values of the common trainable encoder or decoder parameters.

In action 815 the second node 702, 712 provides the second set of trained values of the common trainable encoder or decoder parameters to the third node 703, 713.

In action 816 the second node 702, 712 receives, from the third node 713, common updated values of the common trainable encoder or decoder parameters.

The second node 702, 712 may then re-train the second encoder or decoder based on the received common updated values of the common trainable encoder or decoder parameters, the common loss calculation method and the set of common NN architecture parameters. The re-training may be repeated, e.g., until a required level of performance has been achieved. The performance may be measured by the loss.

Further training may be done on all workers using the global weights. Each training loop may be called an "epoch". There may be many epochs. The number of epochs may either be preconfigured or based on the performance of the global model (e.g., using an accuracy metric).

Exemplifying methods according to embodiments herein will now be described with reference to a flow chart in Figure 8d. The flow chart illustrates a method, performed by the third node 703, 713, for training the system 700 of two or more NN-based decoders 711-1, 712-1 or encoders 701-1, 702-1 to encode Channel State Information CSI or to decode the encoded CSI associated with the wireless channel between the wireless communications device 121 and the radio access node 111 in the wireless communications network 100.

In action 821 the third node 703, 713 receives the first set of trained values of common trainable encoder or decoder parameters from the first node 701, 711 configured for training the first encoder or decoder of the system 700 of two or more decoders 711-1, 712-1 or encoders 701-1, 702-1.

In action 822 the third node 703, 713 receives the second set of trained values of common trainable encoder or decoder parameters from the second node 702, 712 configured for training the second encoder 702-1 or decoder 712-1 of the system 700 of two or more decoders 711-1, 712-1 or encoders 701-1 , 702-1.

In action 823 the third node 703, 713 computes common updated values of the common trainable encoder or decoder parameters based on a distributed optimization algorithm and further based on the received first set of trained values and second set of trained values as input to the distributed optimization algorithm.

The distributed optimization algorithm may be one of the following: federated averaging, federated weighted averaging, federated stochastic gradient descent, or federated learning with dynamic regularization.

The federated learning algorithm may be replaced with any other distributed optimization algorithm that may solve a global optimization problem using a set of computational workers without the need to collect their private datasets, for example distributed stochastic gradient descent.
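As a minimal sketch of the aggregation in action 823, the example below (Python with numpy) computes a federated weighted average of the two received sets of trained values. The per-node weights are an illustrative assumption, e.g., taken from the mapping function agreed during the preparation phase.

import numpy as np

def weighted_average(param_sets, node_weights):
    # Combine the sets of trained values received from the first and second nodes.
    total = sum(node_weights)
    return {k: sum(w * p[k] for w, p in zip(node_weights, param_sets)) / total
            for k in param_sets[0]}

first_set = {"common_layer": np.random.randn(32, 32)}
second_set = {"common_layer": np.random.randn(32, 32)}
common_updated_values = weighted_average([first_set, second_set], node_weights=[0.5, 0.5])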

In action 824 the third node 703, 713 transmits, to the first and second nodes 701 , 702, 711, 712, the computed common updated values of the common trainable decoder parameters.

Detailed embodiments

Figures 9a and 9b illustrate the flow for a global decoder creation in a single NW vendor and a multi NW vendor embodiment respectively. Similar processes exist for creation of the global encoder.

The process may be split in a training phase and in an operational phase. In the training phase the federated autoencoder is generated, while in the operational phase the decoder in the gNB is used to be able to reproduce an original input from the latent representation (or latent space) that is sent from the UE.

In embodiments disclosed herein, an offline training process is assumed where UE1 and UE2 of Figure 9a each represent a node hosted on the respective chipset vendor's administrative domain/cloud infrastructure and gNB represents a corresponding further node on the NW vendor's side. The UEs of Figure 9a are responsible for training an encoder while the gNB of Figure 9a is responsible for training a decoder. In some example embodiments, the encoder on the UE side is frozen and as such it is not retrained. In these example embodiments the decoder on the other side is retrained in a federated way to produce a global decoder that may work with all participating encoders originating from the different UE chipset vendors.

In actions 1-3 of Figure 9a the CDS (Channel Data Source) sends a set of data partitioned in batches to each UE (UE1 and UE2) and the gNB. This is needed so that the gNB may reproduce the loss between the ground truth and its output.

Actions 4-16 of Figure 9a operate in a loop which is repeated for several rounds or iterations in a federated learning context. In the beginning of each round (actions 4, 5) the gNB produces specific decoders which specialize for each UE chipset vendor. Actions 6-15 relate to the training of each decoder. In this example the two decoders are trained sequentially. However, the training of these decoders may also take place in parallel. For every epoch and for every batch, the UE sends its latent space (latent_space_1 and latent_space_2, respectively, for each encoder) and the gNB's decoder computes a reproduction Ĥ of the ground truth, which is compared with the ground truth H as provided by each batch. Based on a given loss function (e.g., MSE) the gNB node computes a first reconstruction loss l1 and a second reconstruction loss l2, which are then used in a backpropagation process to retrain the decoders. In action 16, at the end of each round, the specialized decoders are averaged into one by computing the average of the weights of each layer of the decoders.

In the operational phase the global decoder produced previously is used to reproduce CSI data in action 18 from the input it receives in action 17 from each UE. The reproduced CSI data is used to set up a physical channel with the UE in action 19. Instead of receiving CSI data directly, the gNB receives (action 17) an encoded representation of such data, which is decoded (action 18) in the gNB using the decoder that was averaged in action 16.

Figures 9b and 9c illustrate an embodiment where a global decoder is trained from multiple NW vendors. In this embodiment, before initiating the federated learning, a preparation phase illustrated in Figure 9b takes place where:

One of the gNBs of the NW vendors assumes the role of the "driver". The role may be assigned at random, pre-agreed between NW vendors, or may be assigned based on a number of objective factors, such as the availability of each gNB (in terms of current load, said load expressed as a combination of, or as one of, available throughput on the uplink and downlink interfaces, number of attached UEs in active and idle state, voltage ripples in the power supply unit, etc.).

The driver (gNB1 in Figures 9b and 9c) may start the method by sending a suggested-parameters-request message to the other participating NW vendors. The other participating NW vendors respond with a set of network parameters that may include:

o The loss function, which may be encoded as a natural number, e.g., 1 corresponding to mean squared error, 2 for mean squared logarithmic error loss, 3 for mean absolute error loss, etc.

o An identifier of the vendor, which may also be a natural number, e.g., 1 corresponding to a first network vendor, 2 to a second network vendor, 3 to a third network vendor, etc.

o A description of the architecture of the model, which may include:

■ A description of the input layer, which may include a description of the input in terms of structure, e.g., 1-dimensional list, 2-dimensional matrix, etc., but also in terms of the datatype of the input, e.g., float32, float64, int32, etc.

■ A description of the hidden layers, which may include the number of neurons, the type of layer, e.g., dense/fully connected, convolutional, etc., as well as the activation function and the connections of these neurons to the next layer.

■ A description of the output layer, which may include the structure of the output and its datatype (similar to the input layer description).
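As an illustration only, one such suggested-parameters-response could be encoded as the following Python dictionary; the concrete values and the nesting are assumptions and not a standardized format.

suggested_parameters_response = {
    "loss_function": 1,   # e.g., 1 = mean squared error, 2 = mean squared logarithmic error, 3 = mean absolute error
    "vendor_id": 2,       # natural-number identifier of the network vendor
    "architecture": {
        "input_layer": {"structure": "2-dimensional matrix", "dtype": "float32"},
        "hidden_layers": [
            {"type": "convolutional", "neurons": 128, "activation": "relu"},
            {"type": "dense", "neurons": 64, "activation": "relu"},
        ],
        "output_layer": {"structure": "1-dimensional list", "dtype": "float32"},
    },
}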

The driver then selects a type of architecture to use for the decoder and a loss function. There may be several alternative approaches when making this selection:

o When it comes to the loss function, the selection may be based on majority, e.g., which loss function most of the network vendors prefer, or on relative priority between vendors (e.g., this priority may be calculated based on the number of total cells each vendor has, or its coverage, etc.). In another embodiment, adjustment of the individual lambda parameter for regularization coefficients added to loss functions, such as L1 and L2, may be considered. Regularization adds a penalty to an existing loss function, and the significance of the penalty may be indicated by a lambda coefficient (L1 and L2). This means that it is, in a way, possible to control/shape the behaviour of the loss function. Choosing different lambda values for different loss functions may affect the behaviour of these loss functions and eventually align the behaviour to a common baseline. In embodiments herein the loss functions may be aligned to produce similar probability value distributions. The loss curves produced by the loss functions during training may have similar probability value distributions (in terms of type and parameter). Such regularization terms may be signalled to different vendors.

o When it comes to the architecture, in embodiments where all suggested architectures are the same, the selection is straightforward. In embodiments where all suggested architectures are not the same, a sub-architecture of a number of first or last layers that are the same may be selected for the proposal. This practically means that the workers of the NW vendors will perform partial federated learning on the common part of the architecture. When the federated learning process is complete, the common part of the architecture may be transferred to the complete models of the NW vendors, which may be trained further using proprietary data. In yet another embodiment, the driver may provide recommendations, e.g., using dropout layers to model architectures. Dropout layers indicate which neurons' activation functions may be zeroed out, therefore changing the architecture of the network. It is conceivable that in some cases, the use of different dropout layers on different decoder architectures in individual NW vendors may lead the architectures of the decoders to eventually converge.
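A minimal sketch of such lambda-weighted regularization is given below in Python with PyTorch. The lambda values are per-vendor tuning knobs chosen during the negotiation; the model and loss used here are placeholders, not the agreed common architecture or loss function.

import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_loss(base_loss, parameters, lambda_l1=0.0, lambda_l2=1e-4):
    # Add lambda-weighted L1 and L2 penalties to a vendor's own loss value.
    params = list(parameters)   # materialize, since parameter generators can only be iterated once
    l1 = sum(p.abs().sum() for p in params)
    l2 = sum((p ** 2).sum() for p in params)
    return base_loss + lambda_l1 * l1 + lambda_l2 * l2

model = nn.Linear(8, 4)
prediction, target = model(torch.randn(2, 8)), torch.randn(2, 4)
loss = regularized_loss(F.mse_loss(prediction, target), model.parameters())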

Once the initial negotiation process between the different vendors is complete, the process continues with iterative loops for training the individual NW decoders and federating them into a global decoder. This is described in Figure 9c. Note that Figure 9c is divided into two parts, part 1 and part 2, on separate drawing sheets. The process then continues by feeding the weights of the global decoder back to the individual NW decoders. Although the process is similar to the one described in Figure 9a, a point to note is that Figures 9b and 9c show that the driver is the one performing the federation into a global model. This may not always be the case, as the federation of a global decoder model may also be done outside the driver, e.g., by a trusted third-party authority. A minimal sketch of one such federation round is given below.
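
The following Python sketch illustrates one federation round under the assumption that the common trainable decoder parameters are exchanged as NumPy arrays keyed by layer name, and that plain federated averaging is used. The names and the choice of averaging are illustrative assumptions, not the only possible federation method.

import numpy as np

def federate(vendor_weight_sets):
    """Average each common decoder parameter across vendors (plain federated averaging)."""
    keys = vendor_weight_sets[0].keys()
    return {k: np.mean([w[k] for w in vendor_weight_sets], axis=0) for k in keys}

# Two vendors report their trained values of the same common decoder layer.
vendor_1 = {"dense_1/kernel": np.array([[0.2, 0.4], [0.1, 0.3]])}
vendor_2 = {"dense_1/kernel": np.array([[0.4, 0.0], [0.3, 0.5]])}

global_decoder = federate([vendor_1, vendor_2])
print(global_decoder["dense_1/kernel"])   # element-wise mean, to be fed back to both vendors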

Figure 9d illustrates an embodiment for personalized federated learning. Figure 9d describes a variation of the process described in Figure 9a in which a personalized version of each decoder is maintained for each chipset. The personalized version is obtained in actions 8 and 12, respectively, for the two different UE vendors in the example embodiment. The goal of the solver may be to estimate a function u which is UE-specific and which, when applied to the decoder output, produces a result that is close to the specialized decoder (decoder_1 and decoder_2) and not as close to the global model (or decoder).

The function u personalizes the model. It acts as a filter on the model weights which, when applied, reduces the loss towards the reconstruction loss of the input for the specific combination of encoder and decoder. In essence it acts against the global decoder, but since it is only a parameter it is used only in case the global decoder fails, e.g., when there is a high Block Error Rate for the channel that is established using the global decoder. A minimal sketch of applying such a function is given below.
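
As a purely illustrative example, assuming that the function u is realized as a set of element-wise scaling factors on the global decoder weights, the personalization could be applied as in the following Python sketch. The representation of u and all names are assumptions made for the sketch, not the disclosed method.

import numpy as np

def apply_personalization(global_weights, u):
    """Apply the UE-specific function u as a filter on the global decoder weights."""
    return {name: u[name] * w for name, w in global_weights.items()}

global_weights = {"dense_1/kernel": np.array([[0.3, 0.2], [0.1, 0.4]])}   # global decoder
u_ue_vendor_1  = {"dense_1/kernel": np.array([[1.1, 0.9], [1.0, 0.8]])}   # assumed filter for UE vendor 1

personalized = apply_personalization(global_weights, u_ue_vendor_1)
print(personalized["dense_1/kernel"])   # weights shifted towards decoder_1's behaviour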

In Figure 9d, loops 8 and 12 determine a function that personalizes the decoder to the UE instead of the global decoder. The gNB then has access to a global decoder and also to one or more such functions, which may be triggered in step 24 of Figure 9d if the global model underperforms. The triggered function is applied to the global model and personalizes it.

Thus, once such a function has been obtained, it may be used in the operational phase, for example in the case where it is noticed that a UE fails to accept the physical channel that has been established. The specialized decoder may then be used instead for that UE vendor, as sketched below.
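
The following Python sketch illustrates the operational fallback logic under the assumption that a Block Error Rate threshold triggers the switch from the global decoder to the personalized decoder of the affected UE vendor. The threshold value and all names are illustrative assumptions.

BLER_THRESHOLD = 0.1  # illustrative trigger level, not a value taken from the disclosure

def select_decoder(measured_bler, global_decoder, personalized_decoders, ue_vendor):
    """Fall back to the UE-vendor-specific decoder if the global decoder underperforms."""
    if measured_bler > BLER_THRESHOLD and ue_vendor in personalized_decoders:
        return personalized_decoders[ue_vendor]
    return global_decoder

decoder = select_decoder(0.25, "global_decoder",
                         {"ue_vendor_1": "decoder_1", "ue_vendor_2": "decoder_2"},
                         "ue_vendor_1")
print(decoder)   # -> decoder_1, since the measured BLER 0.25 exceeds the threshold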

In a different embodiment the same approach may be used to establish specialized models for specific environmental conditions such as location, line-of-sight, etc. The main prerequisite in both approaches is for the network vendor (denoted as gNB in the sequence diagram) to receive input from the UE when such a model is trained. In other words, it is possible to extend the concept of personalization not just to a specific chipset vendor but also to specific conditions of the channel. For example, instead of training per UE/NW vendor, it is possible to train for a specific UE but for different conditions, e.g., location, line of sight and other conditions that are known to affect the physical channel.

Figure 10 shows an example of the first node 701, 711 and Figure 11 shows an example of the second node 702, 712. Figure 12 shows an example of the third node 703, 713. The first node 701, 711 may be configured to perform the method actions of Figure 8a and Figure 8b above. The second node 702, 712 may be configured to perform the method actions of Figure 8c above. The third node 703, 713 may be configured to perform the method actions of Figure 8d above.

The first node 701, 711, the second node 702, 712 and the third node 703, 713 may comprise the units illustrated in Figures 10 to 12 to perform the method actions above.

The first node 701, 711, the second node 702, 712 and the third node 703, 713 may each comprise a respective input and output interface, I/O, 1006, 1106, 1206 configured to communicate with each other. The input and output interface may comprise a wireless receiver (not shown) and a wireless transmitter (not shown).

The first node 701, 711, the second node 702, 712 and the third node 703, 713 may each comprise a respective processing unit 1001, 1101, 1201 for performing the above method actions. The respective processing unit 1001, 1101, 1201 may comprise further sub-units which will be described below.

The first node 701, 711, the second node 702, 712 and the third node 703, 713 may further comprise a respective receiving unit 1010, 1120, 1210 and a transmitting unit 1060, 1110, 1230 which may receive and transmit messages and/or signals.

The first node 701, 711 is further configured to receive, from the second node 702, 712, a proposal for a common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and a proposal for a set of common NN architecture parameters for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1.

Correspondingly, the second node 702, 712 is configured to, e.g. by the transmitting unit 1110 being configured to, transmit, a proposal for a common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and a proposal for a set of common NN architecture parameters for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1.

The second node 702, 712 is further configured to, e.g. by the receiving unit 1120 being configured to, receive, from the first node 701, 711, the common loss calculation method and the set of common NN architecture parameters.

The third node 703, 713 is configured to, e.g. by the receiving unit 1210 being configured to, receive, the first set of trained values of common trainable encoder or decoder parameters from the first node 701, 711 and receive a second set of trained values of common trainable encoder or decoder parameters from the second node 702, 712.

The first node 701, 711 is further configured to determine the common loss calculation method based on the received proposal for the common loss calculation method, and configured to determine the set of common NN architecture parameters.

The respective receiving unit 1010, 1120 of the first node 701, 711 and the second node 702, 712 may be configured to receive, from the third node 703, 713, common updated values of the common trainable decoder parameters.

More generally, the first node 701, 711 and the second node 702, 712 are each configured to receive, from the third node 703, 713, common updated values of the common trainable decoder parameters.

The first node 701, 711 may further comprise a determining unit 1020 which for example may determine the common loss calculation method to be used for training each of the two or more encoders or decoders based on the received proposal for the common loss calculation method, and/or determine the set of common NN architecture parameters for training each of the two or more encoders or decoders based on the received proposal for the set of common NN architecture parameters.

The first node 701, 711 is further configured to train the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data. The second node 702, 712 is further configured to train the second encoder or decoder based on the common loss calculation method, and the set of common NN architecture parameters, second channel data and second encoded channel data.

The first node 701, 711 may further comprise a training unit 1030, to train the first encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, first channel data and first encoded channel data which is based on the first channel data.

The second node 702, 712 may further comprise a training unit 1130 to train the second encoder or decoder based on the common loss calculation method, the set of common NN architecture parameters, second channel data and second encoded channel data which is based on the second channel data.
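
As a minimal illustration, assuming a purely linear decoder and mean squared error as the agreed common loss calculation method, one local training pass on the first channel data and the first encoded channel data could look as in the following Python sketch. The shapes, the learning rate and the use of plain gradient descent are assumptions made for the sketch.

import numpy as np

rng = np.random.default_rng(0)
N, d, k = 64, 32, 8                  # samples, channel dimension, code dimension (assumed)
X = rng.standard_normal((N, d))      # first channel data
Z = rng.standard_normal((N, k))      # first encoded channel data (produced from X by the encoder)

W = rng.standard_normal((k, d)) * 0.01   # common trainable decoder parameters
lr = 0.05                                # assumed step size

for step in range(200):
    X_hat = Z @ W                        # decoder reconstruction of the channel data
    residual = X - X_hat
    loss = np.mean(residual ** 2)        # common loss calculation method (MSE)
    grad_W = -2.0 / residual.size * Z.T @ residual
    W -= lr * grad_W                     # local update of the common decoder parameters

print(f"final local training loss: {loss:.4f}")  # the trained values of W would then be provided to the third node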

The first node 701, 711 may further comprise an obtaining unit 1040 configured to obtain, based on training the first encoder or decoder, the first set of trained values of the common trainable encoder or decoder parameters.

The second node 702, 712 may further comprise an obtaining unit 1140 configured to obtain, based on training the second encoder or decoder, the second set of trained values of the common trainable encoder or decoder parameters.

The first node 701, 711 and the second node 702, 712 may further comprise a respective providing unit 1050, 1150, e.g., to provide the respective first and second sets of trained values of the common trainable decoder parameters to the third node.

In some embodiments herein the first node 701, 711 is further configured to receive, from the second node 702, 712, the proposal for the common type of encoder or decoder training. As mentioned above, the type of encoder or decoder training comprises:

a) a first type in which the two or more encoders 701-1, 702-1 train together with respective two or more trainable decoders 711-1, 712-1 for which the decoder trainable parameters are to be trained, or in which the two or more decoders 711-1, 712-1 train together with respective two or more trainable encoders 701-1, 702-1 for which the encoder trainable parameters are to be trained, and

b) a second type in which the two or more encoders 701-1, 702-1 train together with respective two or more fixed decoders 711-1, 712-1 for which the decoder trainable parameters are not to be trained, or in which the two or more decoders 711-1, 712-1 train together with respective two or more fixed encoders 701-1, 702-1 for which the encoder trainable parameters are not to be trained.

The first node 701, 711 may then be further configured to determine the common type of encoder or decoder training based on at least the proposal for the common type of encoder or decoder training. A minimal sketch contrasting the two training types is given below.
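
The following Python sketch contrasts the two training types under the simplifying assumption of a linear encoder and decoder trained with mean squared error: in the first type the decoder counterpart is trainable and updated together with the encoder, while in the second type the decoder counterpart is fixed and only the encoder is trained. All names, sizes and the training procedure are illustrative assumptions.

import numpy as np

def train(X, train_decoder, steps=200, lr=0.05, k=8, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    A = rng.standard_normal((d, k)) * 0.1    # trainable encoder parameters
    W = rng.standard_normal((k, d)) * 0.1    # decoder counterpart (trainable or fixed)
    for _ in range(steps):
        Z = X @ A                            # encoder output (encoded channel data)
        R = X - Z @ W                        # reconstruction error
        grad_A = -2.0 / R.size * X.T @ R @ W.T
        A -= lr * grad_A                     # the encoder is trained in both types
        if train_decoder:                    # first type: decoder counterpart is also trained
            grad_W = -2.0 / R.size * Z.T @ R
            W -= lr * grad_W
        # second type: the decoder stays at its fixed values
    return np.mean(R ** 2)

X = np.random.default_rng(1).standard_normal((64, 32))   # assumed channel data
print("first type (trainable decoder):", round(train(X, True), 4))
print("second type (fixed decoder):   ", round(train(X, False), 4))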

The first node 701, 711 may be further configured to train the first encoder 701-1 or the first decoder 711-1 further based on the determined common type of encoder or decoder training.

The first node 701, 711 may further be configured to transmit, to the second node 702, 712, the common loss calculation method to be used for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1 and the set of common NN architecture parameters for training each of the two or more encoders 701-1, 702-1 or decoders 711-1, 712-1.

In some embodiments herein the first node 701, 711 is further configured to obtain a first initial set of values of common trainable encoder or decoder parameters. Then the first node 701, 711 may further be configured to obtain first channel data and to obtain first compressed channel data.

In some embodiments herein the second node 702, 712 is configured to obtain a second initial set of values of common trainable encoder or decoder parameters. Then the second node 702, 712 may be configured to obtain the encoded second channel data for calculating a second loss value based on the common loss calculation method and to obtain second compressed channel data.

The embodiments herein may be implemented through a respective processor or one or more processors, such as the respective processor 1004, 1104 and 1204, of a processing circuitry in the first node 701, 711, the second node 702, 712 and the third node 703, 713 together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the respective first node 701, 711, the second node 702, 712 and the third node 703, 713. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the respective first node 701, 711, the second node 702, 712 and the third node 703, 713.

The first node 701, 711, the second node 702, 712 and the third node 703, 713 may further comprise a respective memory 1002, 1102 and 1202 comprising one or more memory units. The memory comprises instructions executable by the processor in the first node 701, 711, the second node 702, 712 and the third node 703, 713.

Each respective memory 1002, 1102 and 1202 is arranged to be used to store e.g. information, data, configurations, and applications to perform the methods herein when being executed in the respective first node 701, 711, the second node 702, 712 and the third node 703, 713.

In some embodiments, a respective computer program 1003, 1103 and 1203 comprises instructions, which when executed by the at least one processor, cause the at least one processor of the respective first node 701, 711, the second node 702, 712 and the third node 703, 713 to perform the actions above.

In some embodiments, a respective carrier 1005, 1105 and 1205 comprises the respective computer program, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

Those skilled in the art will also appreciate that the units described above may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g. stored in the respective first node 701, 711, the second node 702, 712 and the third node 703, 713, that, when executed by the respective one or more processors such as the processors described above, perform the method actions described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuitry (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a system-on-a-chip (SoC).

With reference to Figure 13, in accordance with an embodiment, a communication system includes a telecommunication network 3210, such as a 3GPP-type cellular network, which comprises an access network 3211, such as a radio access network, and a core network 3214. The access network 3211 comprises a plurality of base stations 3212a, 3212b, 3212c, such as the source and target access node 111, 112, AP STAs, NBs, eNBs, gNBs or other types of wireless access points, each defining a corresponding coverage area 3213a, 3213b, 3213c. Each base station 3212a, 3212b, 3212c is connectable to the core network 3214 over a wired or wireless connection 3215. A first user equipment (UE) such as a Non-AP STA 3291 located in coverage area 3213c is configured to wirelessly connect to, or be paged by, the corresponding base station 3212c. A second UE 3292 such as a Non-AP STA in coverage area 3213a is wirelessly connectable to the corresponding base station 3212a. While a plurality of UEs 3291, 3292 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE is connecting to the corresponding base station 3212.

The telecommunication network 3210 is itself connected to a host computer 3230, which may be embodied in the hardware and/or software of a standalone server, a cloud-implemented server, a distributed server or as processing resources in a server farm. The host computer 3230 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider. The connections 3221, 3222 between the telecommunication network 3210 and the host computer 3230 may extend directly from the core network 3214 to the host computer 3230 or may go via an optional intermediate network 3220. The intermediate network 3220 may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network 3220, if any, may be a backbone network or the Internet; in particular, the intermediate network 3220 may comprise two or more sub-networks (not shown).

The communication system of Figure 13 as a whole enables connectivity between one of the connected UEs 3291, 3292, such as e.g. the UE 121, and the host computer 3230. The connectivity may be described as an over-the-top (OTT) connection 3250. The host computer 3230 and the connected UEs 3291, 3292 are configured to communicate data and/or signaling via the OTT connection 3250, using the access network 3211, the core network 3214, any intermediate network 3220 and possible further infrastructure (not shown) as intermediaries. The OTT connection 3250 may be transparent in the sense that the participating communication devices through which the OTT connection 3250 passes are unaware of routing of uplink and downlink communications. For example, a base station 3212 may not or need not be informed about the past routing of an incoming downlink communication with data originating from a host computer 3230 to be forwarded (e.g., handed over) to a connected UE 3291. Similarly, the base station 3212 need not be aware of the future routing of an outgoing uplink communication originating from the UE 3291 towards the host computer 3230.

Example implementations, in accordance with an embodiment, of the UE, base station and host computer discussed in the preceding paragraphs will now be described with reference to Figure 14. In a communication system 3300, a host computer 3310 comprises hardware 3315 including a communication interface 3316 configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 3300. The host computer 3310 further comprises processing circuitry 3318, which may have storage and/or processing capabilities. In particular, the processing circuitry 3318 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The host computer 3310 further comprises software 3311, which is stored in or accessible by the host computer 3310 and executable by the processing circuitry 3318. The software 3311 includes a host application 3312. The host application 3312 may be operable to provide a service to a remote user, such as a UE 3330 connecting via an OTT connection 3350 terminating at the UE 3330 and the host computer 3310. In providing the service to the remote user, the host application 3312 may provide user data which is transmitted using the OTT connection 3350.

The communication system 3300 further includes a base station 3320 provided in a telecommunication system and comprising hardware 3325 enabling it to communicate with the host computer 3310 and with the UE 3330. The hardware 3325 may include a communication interface 3326 for setting up and maintaining a wired or wireless connection with an interface of a different communication device of the communication system 3300, as well as a radio interface 3327 for setting up and maintaining at least a wireless connection 3370 with a UE 3330 located in a coverage area (not shown in Figure 14) served by the base station 3320. The communication interface 3326 may be configured to facilitate a connection 3360 to the host computer 3310. The connection 3360 may be direct or it may pass through a core network (not shown in Figure 14) of the telecommunication system and/or through one or more intermediate networks outside the telecommunication system. In the embodiment shown, the hardware 3325 of the base station 3320 further includes processing circuitry 3328, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The base station 3320 further has software 3321 stored internally or accessible via an external connection.

The communication system 3300 further includes the UE 3330 already referred to. Its hardware 3335 may include a radio interface 3337 configured to set up and maintain a wireless connection 3370 with a base station serving a coverage area in which the UE 3330 is currently located. The hardware 3335 of the UE 3330 further includes processing circuitry 3338, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The UE 3330 further comprises software 3331, which is stored in or accessible by the UE 3330 and executable by the processing circuitry 3338. The software 3331 includes a client application 3332. The client application 3332 may be operable to provide a service to a human or non-human user via the UE 3330, with the support of the host computer 3310. In the host computer 3310, an executing host application 3312 may communicate with the executing client application 3332 via the OTT connection 3350 terminating at the UE 3330 and the host computer 3310. In providing the service to the user, the client application 3332 may receive request data from the host application 3312 and provide user data in response to the request data. The OTT connection 3350 may transfer both the request data and the user data. The client application 3332 may interact with the user to generate the user data that it provides.

It is noted that the host computer 3310, base station 3320 and UE 3330 illustrated in Figure 14 may be identical to the host computer 3230, one of the base stations 3212a, 3212b, 3212c and one of the UEs 3291, 3292 of Figure 13, respectively. This is to say, the inner workings of these entities may be as shown in Figure 14 and, independently, the surrounding network topology may be that of Figure 13.

In Figure 14, the OTT connection 3350 has been drawn abstractly to illustrate the communication between the host computer 3310 and the user equipment 3330 via the base station 3320, without explicit reference to any intermediary devices and the precise routing of messages via these devices. Network infrastructure may determine the routing, which it may be configured to hide from the UE 3330 or from the service provider operating the host computer 3310, or both. While the OTT connection 3350 is active, the network infrastructure may further take decisions by which it dynamically changes the routing (e.g., on the basis of load balancing considerations or reconfiguration of the network).

The wireless connection 3370 between the UE 3330 and the base station 3320 is in accordance with the teachings of the embodiments described throughout this disclosure. One or more of the various embodiments improve the performance of OTT services provided to the UE 3330 using the OTT connection 3350, in which the wireless connection 3370 forms the last segment. More precisely, the teachings of these embodiments may improve the data rate, latency, power consumption and thereby provide benefits such as reduced user waiting time, relaxed restriction on file size, better responsiveness, extended battery lifetime.

A measurement procedure may be provided for the purpose of monitoring data rate, latency and other factors on which the one or more embodiments improve. There may further be an optional network functionality for reconfiguring the OTT connection 3350 between the host computer 3310 and UE 3330, in response to variations in the measurement results. The measurement procedure and/or the network functionality for reconfiguring the OTT connection 3350 may be implemented in the software 3311 of the host computer 3310 or in the software 3331 of the UE 3330, or both. In embodiments, sensors (not shown) may be deployed in or in association with communication devices through which the OTT connection 3350 passes; the sensors may participate in the measurement procedure by supplying values of the monitored quantities exemplified above, or supplying values of other physical quantities from which software 3311 , 3331 may compute or estimate the monitored quantities. The reconfiguring of the OTT connection 3350 may include message format, retransmission settings, preferred routing etc.; the reconfiguring need not affect the base station 3320, and it may be unknown or imperceptible to the base station 3320. Such procedures and functionalities may be known and practiced in the art. In certain embodiments, measurements may involve proprietary UE signaling facilitating the host computer’s 3310 measurements of throughput, propagation times, latency and the like. The measurements may be implemented in that the software 3311, 3331 causes messages to be transmitted, in particular empty or ‘dummy’ messages, using the OTT connection 3350 while it monitors propagation times, errors etc.

FIGURE 15 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 15 will be included in this section. In a first action 3410 of the method, the host computer provides user data. In an optional subaction 3411 of the first action 3410, the host computer provides the user data by executing a host application. In a second action 3420, the host computer initiates a transmission carrying the user data to the UE. In an optional third action 3430, the base station transmits to the UE the user data which was carried in the transmission that the host computer initiated, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional fourth action 3440, the UE executes a client application associated with the host application executed by the host computer.

FIGURE 16 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 16 will be included in this section. In a first action 3510 of the method, the host computer provides user data. In an optional subaction (not shown) the host computer provides the user data by executing a host application. In a second action 3520, the host computer initiates a transmission carrying the user data to the UE. The transmission may pass via the base station, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional third action 3530, the UE receives the user data carried in the transmission.

FIGURE 17 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 17 will be included in this section. In an optional first action 3610 of the method, the UE receives input data provided by the host computer. Additionally or alternatively, in an optional second action 3620, the UE provides user data. In an optional subaction 3621 of the second action 3620, the UE provides the user data by executing a client application. In a further optional subaction 3611 of the first action 3610, the UE executes a client application which provides the user data in reaction to the received input data provided by the host computer. In providing the user data, the executed client application may further consider user input received from the user. Regardless of the specific manner in which the user data was provided, the UE initiates, in an optional third action 3630, transmission of the user data to the host computer. In a fourth action 3640 of the method, the host computer receives the user data transmitted from the UE, in accordance with the teachings of the embodiments described throughout this disclosure.

FIGURE 18 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 13 and Figure 14. For simplicity of the present disclosure, only drawing references to Figure 18 will be included in this section. In an optional first action 3710 of the method, in accordance with the teachings of the embodiments described throughout this disclosure, the base station receives user data from the UE. In an optional second action 3720, the base station initiates transmission of the received user data to the host computer. In a third action 3730, the host computer receives the user data carried in the transmission initiated by the base station.

When using the word "comprise" or "comprising", it shall be interpreted as non-limiting, i.e. meaning "consist at least of".

The embodiments herein are not limited to the above described preferred embodiments. Various alternatives, modifications and equivalents may be used.