


Title:
A NODE AND METHODS FOR TRAINING A NEURAL NETWORK ENCODER FOR MACHINE LEARNING-BASED CSI
Document Type and Number:
WIPO Patent Application WO/2023/211346
Kind Code:
A1
Abstract:
A method, performed by a node, for training an NN encoder for encoding CSI associated with a wireless channel. The method comprises obtaining (610) a first metric and a second metric of the wireless channel. The method further comprises obtaining (611) a first and a second uncompressed CSI as input to the compressive NN encoder based on the obtained first and second metrics. The method further comprises calculating (612), with the compressive NN encoder and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI. The method further comprises calculating (613) a first distance between the first and second uncompressed CSIs in an input space. The method further comprises calculating (614) a second distance between the first and second compressed CSIs in an output space. The method further comprises calculating (615) a loss value for the compressive NN encoder based on the first and second distances. The method further comprises updating trainable parameters of the NN encoder based on the loss value.

Inventors:
RINGH EMIL (SE)
TIMO ROY (SE)
RANJAN RAKESH (SE)
Application Number:
PCT/SE2023/050390
Publication Date:
November 02, 2023
Filing Date:
April 26, 2023
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G06N3/088; G06N3/045; G06N3/0455; G06N3/0475; H04B7/0417; H04L25/02
Domestic Patent References:
WO2022056502A1, 2022-03-17
Foreign References:
US20180367192A1, 2018-12-20
US20220094411A1, 2022-03-24
Other References:
FERRAND, PAUL, ET AL.: "Triplet-Based Wireless Channel Charting: Architecture and Experiments", IEEE Journal on Selected Areas in Communications, 7 June 2021 (2021-06-07), XP011866625, DOI: 10.1109/JSAC.2021.3087251
HUAWEI, HISILICON: "Updated views on Rel-18 AI/ML", 3GPP Draft RP-212155, 3rd Generation Partnership Project (3GPP), TSG RAN, Electronic Meeting, 13-17 September 2021, 6 September 2021 (2021-09-06), XP052049440
WANG, CHENGUANG; PAN, KAIKAI; TINDEMANS, SIMON; PALENSKY, PETER: "Training Strategies for Autoencoder-based Detection of False Data Injection Attacks", 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), IEEE, 26 October 2020 (2020-10-26), pages 1-5, XP033855411, DOI: 10.1109/ISGT-Europe47291.2020.9248894
NTT DOCOMO, INC.: "Discussion on other aspects on AI/ML for CSI feedback enhancement", 3GPP Draft R1-2204376, 3rd Generation Partnership Project (3GPP), RAN WG1, e-Meeting, 9-20 May 2022, 28 April 2022 (2022-04-28), XP052153504
ERICSSON: "Discussions on AI-CSI", 3GPP Draft R1-2203282, 3rd Generation Partnership Project (3GPP), RAN WG1, Online, 16-27 May 2022, 29 April 2022 (2022-04-29), XP052152910
Attorney, Agent or Firm:
BOU FAICAL, Roger (SE)
Claims:
CLAIMS

1. A method, performed by a node (601), for training a compressive Neural Network, NN, encoder (601-1) for encoding Channel State Information, CSI, associated with a wireless channel (123-DL) between a radio access node (111) and a wireless communications device (121) in a wireless communications network (100), the method comprises: obtaining (610) a first metric and a second metric of the wireless channel (123-DL); obtaining (611) a first and a second uncompressed CSI as input to the compressive NN encoder (601-1) based on the obtained first and second metrics of the wireless channel (123-DL); calculating (612), with the compressive NN encoder (601-1) and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI; calculating (613) a first distance between the first and second uncompressed CSIs in an input space; calculating (614) a second distance between the first and second compressed CSIs in an output space; calculating (615) a loss value for the compressive NN encoder (601-1) based on the first distance and based on the second distance; and updating (616) trainable parameters of the compressive NN encoder (601-1) based on the calculated loss value.

2. The method according to claim 1, wherein calculating (615) the loss value for the compressive NN encoder (601-1) based on the first distance comprises: selecting the first and second uncompressed CSI input values based on the first distance.

3. The method according to any of the claims 1-2, wherein calculating (615) the loss value is based on a loss function which is dependent on the first distance and/or on the second distance.

4. The method according to claim 3, wherein the loss function is dependent on an absolute value of a difference between the first distance and the second distance.

5. The method according to any of the claims 1-4, further comprising: obtaining (610) a third metric of the wireless channel (123-DL); obtaining (611) a third uncompressed CSI as input to the compressive NN encoder (601-1) based on the obtained third metric of the wireless channel (123-DL); calculating (612), with the compressive NN encoder (601-1) and based on the third uncompressed CSI, a third compressed output value representing a third encoded CSI; calculating (613) a third distance between the first and third uncompressed CSIs in an input space; calculating (614) a fourth distance between the first and third compressed CSIs in an output space; calculating (615) the loss value for the compressive NN encoder (601-1) based on the first distance, the second distance, the third distance and the fourth distance; and updating (616) trainable parameters of the compressive NN encoder (601-1) based on the calculated loss value.

6. The method according to claim 5, further comprising clustering the uncompressed CSIs based on the first distance and/or the third distance.

7. The method according to claim 5 or 6, further comprising selecting the uncompressed CSIs pseudo-randomly such that the first distance is smaller than a first threshold distance and the third distance is larger than a second threshold distance.

8. The method according to any of the claims 1-7, wherein the uncompressed CSIs are any one or more of: measured channel matrices, channel matrices after signal processing, eigenvectors for precoding-vector feedback, a coefficient space after beam- and/or delay-transformation.

9. The method according to any of the claims 1-8, wherein the output space is the set of all length-n binary sequences, which fulfils |Ω_Y| = 2^n, and the second distance is a Hamming metric; or wherein the output space is a discrete set of points representing centroids of quantizer bins and the second distance is a Euclidean distance between the centroids of the quantizer bins; or wherein the output space is a digital representation of a linear vector space and the second distance is a cosine similarity between the vectors of the linear vector space.

10. The method according to any of the claims 1-9, wherein the first distance computes a cosine similarity between main eigenvectors of a transmitter-side covariance matrix.

11. The method according to any of the claims 1-10, wherein, if the first distance is less than a first value which is larger than zero, then the second distance is less than a second value which is larger than zero and based on the first value, and wherein, if the first distance is larger than a third value which is larger than zero, then the second distance is larger than a fourth value which is larger than zero and based on the third value.
12. The method according to any of the claims 1-11, wherein the node (601) is a network node (511) of the wireless communications network (100), the method further comprising: receiving (700) a compressed CSI associated with the wireless channel (123-DL); obtaining (701) one or more candidate uncompressed CSIs associated with the wireless channel (123-DL); obtaining (702) corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs; calculating (703) a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space; and selecting (704) a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

13. The method according to claim 12, wherein selecting (704) the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs comprises selecting the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs that minimizes the distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

14. The method according to any of the claims 12-13, wherein the network node (511) comprises a decoder and wherein obtaining (701) the one or more candidate uncompressed CSIs associated with the wireless channel (123-DL) comprises decoding the received compressed CSI with the decoder.

15. The method according to claim 14, wherein the decoder comprises a same number of input nodes as a number of output nodes of the compressive NN encoder (601-1).

16. A node (601), for training a compressive Neural Network, NN, encoder (601-1) for encoding Channel State Information, CSI, associated with a wireless channel (123- DL) between a radio access node (111) and a wireless communications device (121) in a wireless communications network (100), the node (601) being configured to: obtain a first metric and a second metric of the wireless channel (123-DL); obtain a first and a second uncompressed CSI as input to the compressive NN encoder (601-1) based on the obtained first and second metrics of the wireless channel (123-DL); calculate, with the compressive NN encoder (601-1) and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI; calculate a first distance between the first and second uncompressed CSIs in an input space; calculate a second distance between the first and second compressed CSIs in an output space; calculate a loss value for the compressive NN encoder (601-1) based on the first distance and based on the second distance; and update trainable parameters of the compressive NN encoder (601-1) based on the calculated loss value.

17. The node (601) according to claim 16, configured to perform the method of any of claims 2-15.

18. A method, performed by a network node (511), for handling CSI in a wireless communications network (100), wherein the network node (511) comprises a compressive Neural Network, NN, encoder (511-1) for encoding Channel State Information, CSI, associated with a wireless channel (123-DL) between a radio access node (111) and a wireless communications device (121) in the wireless communications network (100), the method comprises: receiving (700) a compressed CSI associated with the wireless channel (123-DL); obtaining (701) one or more candidate uncompressed CSIs associated with the wireless channel (123-DL); obtaining (702) corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs; calculating (703) a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space; and selecting (704) a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

19. A network node (511), for handling CSI in a wireless communications network (100), wherein the network node (511) comprises a compressive Neural Network, NN, encoder (511-1) for encoding Channel State Information, CSI, associated with a wireless channel (123-DL) between a radio access node (111) and a wireless communications device (121) in the wireless communications network (100), the network node (511) being configured to: receive a compressed CSI associated with the wireless channel (123-DL); obtain one or more candidate uncompressed CSIs associated with the wireless channel (123-DL); obtain corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs; calculate a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space; and select a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

20. A computer program (803, 903), comprising computer readable code units which, when executed by at least one processor, cause the at least one processor to perform the method according to any one of claims 1-15 or 18.

21. A computer-readable storage medium (805, 905), having stored thereon a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any of the claims 1-15 or 18.

Description:
A NODE AND METHODS FOR TRAINING A NEURAL NETWORK ENCODER FOR

MACHINE LEARNING-BASED CSI

TECHNICAL FIELD

The embodiments herein relate to nodes and methods for training a neural network encoder for Machine Learning-based CSI. A corresponding computer program and a computer program carrier are also disclosed.

BACKGROUND

In a typical wireless communication network, wireless devices, also known as wireless communication devices, mobile stations, stations (STA) and/or User Equipments (UE), communicate via a Local Area Network such as a Wi-Fi network or a Radio Access Network (RAN) to one or more core networks (CN). The RAN covers a geographical area which is divided into service areas or cell areas. Each service area or cell area may provide radio coverage via a beam or a beam group. Each service area or cell area is typically served by a radio access node, e.g., a Wi-Fi access point or a radio base station (RBS), which in some networks may also be denoted, for example, a NodeB, eNodeB (eNB), or gNB as denoted in 5G. A service area or cell area is a geographical area where radio coverage is provided by the radio access node. The radio access node communicates over an air interface operating on radio frequencies with the wireless device within range of the radio access node.

Specifications for the Evolved Packet System (EPS), also called a Fourth Generation (4G) network, have been completed within the 3rd Generation Partnership Project (3GPP) and this work continues in the coming 3GPP releases, for example to specify a Fifth Generation (5G) network also referred to as 5G New Radio (NR). The EPS comprises the Evolved Universal Terrestrial Radio Access Network (E-UTRAN), also known as the Long Term Evolution (LTE) radio access network, and the Evolved Packet Core (EPC), also known as System Architecture Evolution (SAE) core network. E- UTRAN/LTE is a variant of a 3GPP radio access network wherein the radio access nodes are directly connected to the EPC core network rather than to RNCs used in 3G networks. In general, in E-UTRAN/LTE the functions of a 3G RNC are distributed between the radio access nodes, e.g. eNodeBs in LTE, and the core network. As such, the RAN of an EPS has an essentially “flat” architecture comprising radio access nodes connected directly to one or more core networks, i.e. they are not connected to RNCs. To compensate for that, the E-UTRAN specification defines a direct interface between the radio access nodes, this interface being denoted the X2 interface.

Wireless communication systems in 3GPP

Figure 1 illustrates a simplified wireless communication system, with a UE 12 which communicates with one or multiple access nodes 103-104, which in turn are connected to a network node 106. The access nodes 103-104 are part of the radio access network 10.

For wireless communication systems pursuant to 3GPP Evolved Packet System (EPS), also referred to as Long Term Evolution (LTE) or 4G, standard specifications, such as specified in 3GPP TS 36.300 and related specifications, the access nodes 103-104 typically correspond to Evolved NodeBs (eNBs) and the network node 106 typically corresponds to a Mobility Management Entity (MME) and/or a Serving Gateway (SGW). The eNB is part of the radio access network 10, which in this case is the E-UTRAN (Evolved Universal Terrestrial Radio Access Network), while the MME and SGW are both part of the EPC (Evolved Packet Core network). The eNBs are inter-connected via the X2 interface, and connected to the EPC via the S1 interface, more specifically via S1-C to the MME and S1-U to the SGW.

For wireless communication systems pursuant to 3GPP 5G System, 5GS (also referred to as New Radio, NR, or 5G) standard specifications, such as specified in 3GPP TS 38.300 and related specifications, on the other hand, the access nodes 103-104 typically correspond to a 5G NodeB (gNB) and the network node 106 typically corresponds to an Access and Mobility Management Function (AMF) and/or a User Plane Function (UPF). The gNB is part of the radio access network 10, which in this case is the NG-RAN (Next Generation Radio Access Network), while the AMF and UPF are both part of the 5G Core Network (5GC). The gNBs are inter-connected via the Xn interface, and connected to the 5GC via the NG interface, more specifically via NG-C to the AMF and NG-U to the UPF.

To support fast mobility between NR and LTE and avoid change of core network, LTE eNBs may also be connected to the 5G-CN via NG-U/NG-C and support the Xn interface. An eNB connected to 5GC is called a next generation eNB (ng-eNB) and is considered part of the NG-RAN. LTE connected to 5GC will not be discussed further in this document; however, it should be noted that most of the solutions/features described for LTE and NR in this document also apply to LTE connected to 5GC. In this document, when the term LTE is used without further specification it refers to LTE-EPC.

NR uses Orthogonal Frequency Division Multiplexing (OFDM) with configurable bandwidths and subcarrier spacing to efficiently support a diverse set of use-cases and deployment scenarios. With respect to LTE, NR improves deployment flexibility, user throughputs, latency, and reliability. The throughput performance gains are enabled, in part, by enhanced support for Multi-User Multiple-Input Multiple-Output (MU-MIMO) transmission strategies, where two or more UEs receive data on the same time-frequency resources, i.e., by spatially separated transmissions.

A MU-MIMO transmission strategy will now be illustrated based on Figure 2. Figure 2 illustrates an example transmission and reception chain for MU-MIMO operations. Note that the order of modulation and precoding, or demodulation and combining respectively, may differ depending on the implementation of MU-MIMO transmission.

A multi-antenna base station with $N_{TX}$ antenna ports is simultaneously, e.g., on the same OFDM time-frequency resources, transmitting information to several UEs: the sequence $s^{(1)}$ is transmitted to UE(1), the sequence $s^{(2)}$ is transmitted to UE(2), and so on. An antenna port may be a logical unit which may comprise one or more antenna elements. Before modulation and transmission, precoding is applied to each sequence to mitigate multiplexing interference - the transmissions are spatially separated.

Each UE demodulates its received signal and combines receiver antenna signals to obtain an estimate of the transmitted sequence. Neglecting other interference and noise sources except the MU-MIMO interference, this estimate $\hat{s}^{(i)}$ for UE(i) may be expressed as

$\hat{s}^{(i)} = H^{(i)} W^{(i)} s^{(i)} + \sum_{j \neq i} H^{(i)} W^{(j)} s^{(j)}$,

where $H^{(i)}$ is the downlink channel of UE(i) and $W^{(j)}$ is the precoder applied to the sequence $s^{(j)}$.

The second term represents the spatial multiplexing interference, due to MU-MIMO transmission, seen by UE(i). A goal for a wireless communication network may be to construct a set of precoders to meet a given target. One such target may be to make the norm $\|H^{(i)} W^{(i)}\|$ large (this norm represents the desired channel gain towards user i), and the norm $\|H^{(j)} W^{(i)}\|$, $j \neq i$, small (this norm represents the interference of user i's transmission received by user j).

In other words, the precoder $W^{(i)}$ shall correlate well with the channel $H^{(i)}$ observed by UE(i), whereas it shall correlate poorly with the channels observed by other UEs.

To construct precoders $W^{(1)}, \ldots, W^{(J)}$ that enable efficient MU-MIMO transmissions, the wireless communication network may need to obtain detailed information about the users' downlink channels $H^{(i)}$, $i = 1, \ldots, J$; for example, it may need detailed information about all the users' downlink channels.

In deployments where full channel reciprocity holds, detailed channel information may be obtained from uplink Sounding Reference Signals (SRS) that are transmitted periodically, or on demand, by active UEs. The wireless communication network may directly estimate the uplink channel from SRS and, therefore (by reciprocity), the downlink channel $H^{(i)}$.

However, the wireless communication network cannot always accurately estimate the downlink channel from uplink reference signals. Consider the following examples:

In frequency division duplex (FDD) deployments, the uplink and downlink channels use different carriers and, therefore, the uplink channel may not provide enough information about the downlink channel to enable MU-MIMO precoding.

In TDD deployments, the wireless communication network may only be able to estimate part of the uplink channel using SRS because UEs typically have fewer TX branches than RX branches (in which case only certain columns of the precoding matrix may be estimated using SRS). This situation is known as partial channel knowledge.

If the wireless communication network cannot accurately estimate the full downlink channel from uplink transmissions, then active UEs need to report channel information to the wireless communication network over the uplink control or data channels. In LTE and NR, this feedback is achieved by the following signalling protocol:

- The wireless communication network transmits Channel State Information reference signals (CSI-RS) over the downlink using N ports.

- The UE estimates the downlink channel (or important features thereof, such as eigenvectors of the channel or of the Gram matrix of the channel, one or more eigenvectors that correspond to the largest eigenvalues of an estimated channel covariance matrix, one or more Discrete Fourier Transform (DFT) base vectors (described below) or orthogonal vectors from any other suitable and defined vector space that best correlate with an estimated channel matrix or an estimated channel covariance matrix, or the channel delay profile) for each of the N antenna ports from the transmitted CSI-RS.

- The UE reports CSI (e.g., channel quality index (CQI), precoding matrix indicator (PMI), rank indicator (RI)) to the wireless communication network over an uplink control channel and/or over a data channel.

- The wireless communication network uses the UE’s feedback, e.g., the CSI reported from the UE, for downlink user scheduling and MIMO precoding.

In NR, both Type I and Type II reporting are configurable, where the CSI Type II reporting protocol has been specifically designed to enable MU-MIMO operations from uplink UE reports, such as the CSI reports.

The CSI Type II normal reporting mode is based on the specification of sets of Discrete Fourier Transform (DFT) basis functions in a precoder codebook. The UE selects and reports L DFT vectors from the codebook that best match its channel conditions (like the classical codebook precoding matrix indicator (PMI) from earlier 3GPP releases). The number of DFT vectors L is typically 2 or 4 and it is configurable by the wireless communication network. In addition, the UE reports how the L DFT vectors should be combined in terms of relative amplitude scaling and co-phasing.

Algorithms to select L, the L DFT vectors, and the co-phasing coefficients are outside the specification scope and left to UE and network implementation. Or, put another way, the 3GPP Rel-16 specification only defines signaling protocols to enable the above message exchanges.

In the following, “DFT beams” will be used interchangeably with DFT vectors. This slight shift of terminology is appropriate whenever the base station has a uniform planar array with antenna elements separated by half of the carrier wavelength. The CSI Type II normal reporting mode is illustrated in Figure 3, and described in 3GPP TS 38.214, “Physical layer procedures for data” (Release 16). The selection and reporting of the L DFT vectors $b_n$ and their relative amplitudes $a_n$ are done in a wideband manner; that is, the same beams are used for both polarizations over the entire transmission frequency band. The selection and reporting of the DFT vector co-phasing coefficients are done in a subband manner; that is, DFT vector co-phasing parameters are determined for each of multiple subsets of contiguous subcarriers. The co-phasing parameters are quantized such that each coefficient is taken from either a Quadrature Phase-Shift Keying (QPSK) or 8-Phase Shift Keying (8PSK) signal constellation.

With k denoting a sub-band index, the precoder reported by the UE to the network may be expressed as follows (up to normalization):

$w_k = \sum_{n=1}^{L} a_n e^{j\varphi_{n,k}} b_n$,

where $b_n$ are the selected DFT vectors, $a_n$ their relative amplitudes, and $e^{j\varphi_{n,k}}$ the co-phasing coefficient of beam n on sub-band k.

The Type II CSI report may be used by the network to co-schedule multiple UEs on the same OFDM time-frequency resources. For example, the network may select UEs that have reported different sets of DFT vectors with weak correlations. The CSI Type II report enables the UE to report a precoder hypothesis that trades CSI resolution against uplink transmission overhead.
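For concreteness, the following minimal NumPy sketch assembles such per-subband precoders from L reported DFT beams, wideband amplitudes and per-subband co-phasing coefficients; the function name, the array shapes and the unit-power normalization are illustrative assumptions, not taken from the specification.

import numpy as np

def type2_precoder(beams, amplitudes, cophasing):
    """Assemble per-subband precoders from Type II CSI report quantities.

    beams:      (N_tx, L) complex array of selected DFT vectors b_n.
    amplitudes: (L,) real array of wideband relative amplitudes a_n.
    cophasing:  (K, L) complex array of per-subband co-phasing
                coefficients, e.g. drawn from a QPSK or 8PSK constellation.
    Returns:    (K, N_tx) array with one precoding vector w_k per subband.
    """
    w = (beams * amplitudes) @ cophasing.T            # w_k = sum_n a_n c_{k,n} b_n
    w = w / np.linalg.norm(w, axis=0, keepdims=True)  # unit power per subband
    return w.T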

NR 3GPP Release 15 supports Type II CSI feedback using port selection mode, in addition to the above normal reporting mode. In this case,

- The base station transmits a CSI-RS port in each one of the beam directions.

- The UE does not use a codebook to select a DFT vector (a beam), instead the UE selects one or multiple antenna ports from the CSI-RS resource of multiple ports.

Type II CSI feedback using port selection gives the base station some flexibility to use non-standardized precoders that are transparent to the UE. For the port-selection codebook, the precoder reported by the UE may be described as follows (up to normalization):

$w_k = \sum_{n=1}^{L} a_n e^{j\varphi_{n,k}} e_{p_n}$.

Here, the vector $e_{p_n}$ is a unit vector with only one non-zero element, which may be viewed as a selection vector that selects a port from the set of ports in the measured CSI-RS resource. The UE thus feeds back which ports it has selected, the amplitude factors and the co-phasing factors.

Autoencoders for Artificial Intelligence (AI)-based CSI reporting

Recently, neural network (NN)-based autoencoders (AEs) have shown promising results for compressing downlink MIMO channel estimates for uplink feedback. That is, the AEs are used to compress downlink MIMO channel estimates. The compressed output of the AE is then used as uplink feedback. For example, prior art document Zhilin Lu, Xudong Zhang, Hongyi He, Jintao Wang, and Jian Song, “Binarized Aggregated Network with Quantization: Flexible Deep Learning Deployment for CSI Feedback in Massive MIMO System”, arXiv:2105.00354v1, May 2021, provides a recent summary of academic work.

An AE is a type of Neural Network (NN) that may be used to compress and decompress data in an unsupervised manner.

Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. As a result, unsupervised learning algorithms may first self-discover any naturally occurring patterns in the training data set. Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples and discarding the rest. This contrasts with supervised learning, in which the training data include pre-assigned category labels, often assigned by a human or by the output of a non-learning classification algorithm.

Figure 4a illustrates a fully connected (dense) AE. The AE may be divided into two parts: an encoder (used to compress the input data), and a decoder (used to recover important features of the input data).

The encoder and decoder are separated by a bottleneck layer that holds a compressed representation, Y in Figure 4a, of the input data X. The variable Y is sometimes called the latent representation of the input X. More specifically,

The size of the bottleneck (latent representation) Y is smaller than the size of the input data X. The AE encoder thus compresses the input features X to Y.

The decoder part of the AE tries to invert the encoder’s compression and reconstruct X with minimal error, according to some predefined loss function. AEs may have different architectures. For example, AEs may be based on dense NNs (like Figure 4a), multi-dimensional convolution NNs, recurrent NNs, transformer NNs, or any combination thereof. However, all AE architectures possess an encoder-bottleneck-decoder structure.
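To make the encoder-bottleneck-decoder structure concrete, the following is a minimal NumPy sketch that initializes a dense AE as a list of per-layer weight and bias pairs; the layer sizes, the initialization scheme and the function name are illustrative assumptions, not a prescribed implementation.

import numpy as np

def init_dense_ae(layer_sizes, seed=0):
    """Initialize a dense AE, e.g. layer_sizes = [64, 32, 8, 32, 64]:
    the smallest entry (here 8) is the bottleneck that holds the
    compressed latent representation Y of the input X.

    Returns a list of (W, b) tuples, one per layer n = 1..N.
    """
    rng = np.random.default_rng(seed)
    params = []
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_out, d_in))
        params.append((W, np.zeros(d_out)))
    return params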

Figure 4b illustrates how an AE may be used for Al-based CSI reporting in NR during an inference phase (that is, during live network operation).

The UE estimates the downlink channel (or important features thereof) using configured downlink reference signal(s), e.g., CSI-RS. For example, the UE estimates the downlink channel as a 3D complex-valued tensor, with dimensions defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency units (the granularity of which is configurable, e.g., SubCarrier (SC) or subband).

In Figure 4b the 3D complex-valued tensor is illustrated as a rectangular hexahedron with lengths of the sides defined by the gNB’s Tx-antenna ports, the UE’s Rx antenna ports, and frequency (SC).

The UE uses a trained AE encoder to compress the estimated channel, or important features thereof, down to a binary codeword. The binary codeword is reported to the network over an uplink control channel and/or data channel. In practice, this codeword will likely form one part of a channel state information (CSI) report that may also include rank, channel quality, and interference information. The CSI may be used for MU-MIMO precoding to shape an “energy pattern” of a wireless signal transmitted by the gNB.

The network uses a trained AE decoder to reconstruct the estimated channel or the important features thereof. The decompressed output of the AE decoder is used by the network in, for example, MIMO precoding, scheduling, and link adaptation.

The architecture of an AE (e.g., structure, number of layers, nodes per layer, activation functions, etc.) may need to be tailored for each particular use case, e.g., for CSI reporting. The tailoring may be achieved via a process called hyperparameter tuning. For example, properties of the data (e.g., CSI-RS channel estimates), the channel size, uplink feedback rate, and hardware limitations of the encoder and decoder may all need to be considered when designing the AE’s architecture.

After the AE’s architecture is fixed, it needs to be trained on one or more datasets.

To achieve good performance during live operation in a network (the so-called inference phase), the training datasets need to be representative of the actual data the AE will encounter during live operation in a network.

The training process involves numerically tuning the AE’s trainable parameters (e.g., the weights and biases of the underlying NN) to minimize a loss function on the training datasets. The loss function may be, for example, the Mean Squared Error (MSE) loss calculated as the average of the squared error between the UE’s downlink channel estimate $H$ and the network’s reconstruction $\hat{H}$, i.e., $\|H - \hat{H}\|^2$. The purpose of the loss function is to meaningfully quantify the reconstruction error for the particular use case at hand.

The training process is typically based on some variant of the gradient descent algorithm, which, at its core, comprises three components: a feedforward step, a back propagation step, and a parameter optimization step. We now review these steps using a dense AE (i.e., a dense NN with a bottleneck layer, see Figure 4a) as an example.

Feedforward: A batch of training data, such as a mini-batch, (e.g., several downlink-channel estimates) is pushed through the AE, from the input to the output. The loss function is used to compute the reconstruction loss for all training samples in the batch. The reconstruction loss may be an average reconstruction loss for all training samples in the batch.

The feedforward calculations of a dense AE with N layers (n = 1, 2, ..., N) may be written as follows: the output vector $a^{(n)}$ of layer n is computed from the output $a^{(n-1)}$ of the previous layer using the equations

$z^{(n)} = W^{(n)} a^{(n-1)} + b^{(n)}$ and $a^{(n)} = g(z^{(n)})$.

In the above equations, $W^{(n)}$ and $b^{(n)}$ are the trainable weights and biases of layer n, respectively, and g is an activation function (for example, a rectified linear unit).

Back propagation (BP): The gradients (partial derivatives of the loss function, L, with respect to each trainable parameter in the AE) are computed. The back propagation algorithm sequentially works backwards from the AE output, layer-by-layer, back through the AE to the input. The back propagation algorithm is built around the chain rule for differentiation: when computing the gradients for layer n in the AE, it uses the gradients for layer n + 1. For a dense AE with N layers, the back propagation calculations for layer n may be expressed with the following equations:

$\delta^{(n)} = \big( (W^{(n+1)})^T \delta^{(n+1)} \big) * g'(z^{(n)})$, $\partial L / \partial W^{(n)} = \delta^{(n)} (a^{(n-1)})^T$, $\partial L / \partial b^{(n)} = \delta^{(n)}$,

where * here denotes the Hadamard multiplication of two vectors.

Parameter optimization: The gradients computed in the back propagation step are used to update the AE’s trainable parameters. An approach is to use the gradient descent method with a learning rate parameter (α) that scales the gradients of the weights and biases, as illustrated by the following update equations:

$W^{(n)} \leftarrow W^{(n)} - \alpha \, \partial L / \partial W^{(n)}$ and $b^{(n)} \leftarrow b^{(n)} - \alpha \, \partial L / \partial b^{(n)}$.

A core idea here is to make small adjustments to each parameter with the aim of reducing the loss over the (mini) batch. It is common to use special optimizers to update the AE’s trainable parameters using gradient information. The following optimizers are widely used to reduce training time and improve overall performance: adaptive subgradient methods (AdaGrad), RMSProp, and adaptive moment estimation (ADAM).
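For illustration, the three steps above may be sketched as one NumPy training iteration for a dense AE, using the (W, b) parameter list from the initialization sketch earlier, ReLU hidden layers, a linear output layer and an MSE reconstruction loss; all of these choices, and the names, are illustrative assumptions, and a practical implementation would typically rely on an automatic-differentiation framework and one of the optimizers above.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    return (z > 0).astype(z.dtype)

def train_step(params, x_batch, lr=1e-3):
    """One feedforward / back propagation / gradient-descent step for a
    dense AE trained to reconstruct its own input (MSE loss)."""
    # Feedforward: z^(n) = W^(n) a^(n-1) + b^(n), a^(n) = g(z^(n)),
    # with g = ReLU on hidden layers and a linear output layer.
    a, zs, acts = x_batch, [], [x_batch]
    for i, (W, b) in enumerate(params):
        z = a @ W.T + b
        a = z if i == len(params) - 1 else relu(z)
        zs.append(z)
        acts.append(a)

    # MSE reconstruction loss, averaged over the mini-batch.
    err = a - x_batch
    loss = float(np.mean(np.sum(err ** 2, axis=1)))

    # Back propagation: work backwards layer-by-layer (chain rule).
    delta = 2.0 * err / x_batch.shape[0]  # dL/dz at the linear output
    grads = []
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads.append((delta.T @ acts[i], delta.sum(axis=0)))
        if i > 0:
            delta = (delta @ W) * relu_grad(zs[i - 1])  # Hadamard product
    grads.reverse()

    # Parameter optimization: plain gradient descent with learning rate lr.
    params = [(W - lr * gW, b - lr * gb)
              for (W, b), (gW, gb) in zip(params, grads)]
    return params, loss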

The above steps (feedforward, back propagation, parameter optimization) are repeated many times until an acceptable level of performance is achieved on the training dataset. The terminology epoch is used to describe when all of the training data has been used in the above steps. An acceptable level of performance may refer to the AE achieving a pre-defined average reconstruction error over the training dataset (e.g., normalized MSE of the reconstruction error over the training dataset is less than, say, 0.1). Alternatively, it may refer to the AE achieving a pre-defined user data throughput gain with respect to a baseline CSI reporting method (e.g., a MIMO precoding method is selected, and user throughputs are separately estimated for the baseline and the AE CSI reporting methods).

The above steps use numerical methods (e.g., gradient descent) to optimize the AE’s trainable parameters (e.g., weights and biases). The training process, however, typically involves optimizing many other parameters (e.g., higher-level hyperparameters that define the model or the training process). Some example hyperparameters are as follows:

• The architecture of the AE (e.g., dense, convolutional, transformer).

• Architecture-specific parameters (e.g., the number of nodes per layer in a dense network, or the kernel sizes of a convolutional network).

• The depth or size of the AE (e.g., number of layers).

• The activation functions used at each node within the AE.

• The mini-batch size (e.g., the number of channel samples fed into each iteration of the above training steps).

• The learning rate for gradient descent and/or the optimizer.

• The regularization method (e.g., weight regularization or dropout).

Additional validation datasets may be used to tune such hyperparameters.

The process of designing an AE (hyperparameter tuning and model training) may be expensive - consuming significant time, compute, memory, and power resources.

AE-based CSI reporting is of interest for 3GPP Rel 18 “AI/ML on PHY” study item for example because of the following reasons:

AEs may include non-linear transformations (e.g., activation functions) that help improve compression performance and, therefore, help improve MU-MIMO performance for the same uplink overhead. For example, the normal Type II CSI codebooks in 3GPP Rel 16 are based on linear DFT transformations and Singular Value Decomposition (SVD), which cannot fully exploit redundancies in the channel for compression.

AEs may be trained to exploit long-term redundancies in the propagation environment and/or site (e.g., antenna configuration) for compression purposes. For example, a particular AE does not need to work well for all possible deployments. Improved compression performance is obtained by learning which channel inputs it needs to (and doesn’t need to) reliably reconstruct at the base-station.

AEs may be trained to compensate for antenna array irregularities, including, for example, non-uniformly spaced antenna elements and non-half wavelength element spacing. The Type II CSI codebooks in Rel 15 and 16, for example, use a two-dimensional DFT codebook designed for a regular planar array with perfect half wavelength element spacing.

AEs may be trained to be robust against, or updated (e.g., via transfer learning and training) to compensate for, partially failing hardware as the massive MIMO product ages. For example, over time one or more of the multiple Tx and Rx radio chains in the massive MIMO antenna arrays at the base station may fail, compromising the effectiveness of Type II CSI feedback. Transfer learning implies that parts of a previous neural network that has learned a different but often related task are transferred to the current network in order to speed up the learning process of the current network.

SUMMARY

Typically, the AE training process is a highly iterative process that may be expensive - consuming significant time, compute, memory, and power resources.

An AE approach for CSI reporting may require an encoder in the UE to directly interact with a decoder in the gNB over the air interface. The UE’s encoder and gNB’s decoder may need to be trained together in some way to be compatible. After the encoder and decoder are trained, they still need to be deployed to the NW and UE respectively, and configured to work with one another. Some solutions to enable joint training and sharing of large AE encoders and/or AE decoders may be:

- Sharing or standardizing (parts of) the UE’s encoder to enable the NW vendor to train decoders for its gNBs.

- Sharing or standardizing (parts of) the gNB’s decoder to enable the UE/chipset vendor to train encoders for UEs.

- Establishing and standardizing cross vendor development domain training infrastructure where UE/chipset vendors and NW vendors may collaborate to jointly train encoders and decoders.

Thus, methods for training may require that the UE’s encoder and the gNB’s decoder are jointly trained e.g., via dedicated cross-vendor development-domain training infrastructure and/or shared between the UE/chipset and NW vendors. In addition, such methods leave the air interface standard incomplete in the sense that only trained AE encoders and decoders understand the UE’s CSI report (bits signalled over Uplink Control Information (UCI)). That is, the meaning of the bits signalled by the UE’s encoder is not defined in a 3GPP technical specification; instead, the meaning of the bits is determined by the data and joint training process.

In addition, sharing AE encoders and/or decoders may lead to a situation where NW vendors and UE/chipset vendors need to implement and manage the lifecycle of many AE encoders and/or decoders - resulting in more complex and expensive products.

An object of embodiments herein may be to obviate some of the above-mentioned problems. Specifically, an object of embodiments herein may be to provide AI-based CSI from a wireless communications device, such as a UE, to a network node, such as a base station, in an efficient manner.

According to an aspect, the object is achieved by a method, performed by a node, such as a server or a wireless communications device or a network node.

The method is for handling CSI in a wireless communications network. The method comprises obtaining a first metric and a second metric of the wireless channel.

The method further comprises obtaining a first and a second uncompressed CSI as input to the compressive NN encoder based on the obtained first and second metrics of the wireless channel.

The method further comprises calculating, with the compressive NN encoder and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI.

The method further comprises calculating a first distance between the first and second uncompressed CSIs in an input space.

The method further comprises calculating a second distance between the first and second compressed CSIs in an output space.

The method further comprises calculating a loss value for the compressive NN encoder based on the first distance and based on the second distance.

The method further comprises updating trainable parameters of the compressive

NN encoder based on the calculated loss value.

According to a second aspect, the object is achieved by a node. The node is configured to perform the method according to the first aspect above.

According to a third aspect, the object is achieved by a method, performed by a network node, such as radio access node.

The method is for handling CSI in a wireless communications network. The network node comprises a compressive NN encoder for encoding CSI associated with a wireless channel between a radio access node and a wireless communications device in the wireless communications network.

The method comprises receiving a compressed CSI associated with the wireless channel.

The method further comprises obtaining one or more candidate uncompressed CSIs associated with the wireless channel.

The method further comprises obtaining corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs.

The method further comprises calculating a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

The method further comprises selecting a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.
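As an illustration of these actions, the following is a minimal Python sketch, assuming (for illustration only) that the compressed CSIs are length-n binary codewords and that the output-space distance is the Hamming metric, one of the output-space options described herein; the function and variable names are illustrative:

import numpy as np

def select_candidate_csi(encoder, candidates, received_codeword):
    """Select the candidate uncompressed CSI whose encoding is closest,
    in the output space, to the received compressed CSI.

    encoder:           maps an uncompressed CSI to a binary codeword.
    candidates:        list of candidate uncompressed CSIs.
    received_codeword: length-n binary array received over the uplink.
    """
    def hamming(a, b):
        # Hamming metric on the output space of length-n binary sequences.
        return int(np.sum(a != b))

    distances = [hamming(encoder(c), received_codeword) for c in candidates]
    best = int(np.argmin(distances))  # minimizes the output-space distance
    return candidates[best], distances[best]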

According to a fourth aspect, the object is achieved by a network node. The network node is configured to perform the method according to the third aspect above.

According to a further aspect, the object is achieved by a computer program comprising instructions, which when executed by a processor, causes the processor to perform actions according to any of the aspects above.

According to a further aspect, the object is achieved by a carrier comprising the computer program of the aspect above, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

The above aspects provide a possibility to enable AI-based CSI reporting using proprietary implementations on both the network side and the UE side.

BRIEF DESCRIPTION OF THE DRAWINGS

In the figures, features that appear in some embodiments are indicated by dashed lines.

The various aspects of embodiments disclosed herein, including particular features and advantages thereof, will be readily understood from the following detailed description and the accompanying drawings, in which:

Figure 1 illustrates a simplified wireless communication system,

Figure 2 illustrates an example transmission and reception chain for MU-MIMO operations,

Figure 3 is a block diagram schematically illustrating CSI type II normal reporting mode,

Figure 4a schematically illustrates a fully connected, i.e., dense, AE,

Figure 4b is a block diagram schematically illustrating how an AE may be used for AI-enhanced CSI reporting in NR during an inference phase,

Figure 5a illustrates a wireless communication system according to embodiments herein,

Figure 5b illustrates an example of a compressive neural network with eight input nodes and two output nodes,

Figure 5c illustrates a Siamese network,

Figure 5d illustrates an encoder and a method according to embodiments herein,

Figure 6a illustrates a node and a method according to embodiments herein,

Figure 6b illustrates a node and a further method according to embodiments herein,

Figure 6c illustrates a node and a further method according to embodiments herein,

Figure 6d illustrates a node and a further method according to embodiments herein,

Figure 6e is a flowchart and illustrates a method, performed by a node, according to some embodiments herein,

Figure 6f is a flowchart and illustrates a method, performed by a node, according to some further embodiments herein,

Figure 7a is a schematic block diagram illustrating a wireless communications device and a network node,

Figure 7b is a combined flowchart and signalling diagram according to some embodiments herein,

Figure 7c is a flowchart and illustrates a method, performed by a network node, according to embodiments herein,

Figure 8 is a block diagram schematically illustrating a node according to embodiments herein,

Figure 9 is a block diagram schematically illustrating a network node according to embodiments herein,

Figure 10 schematically illustrates a telecommunication network connected via an intermediate network to a host computer.

Figure 11 is a generalized block diagram of a host computer communicating via a base station with a user equipment over a partially wireless connection.

Figures 12 to 15 are flowcharts illustrating methods implemented in a communication system including a host computer, a base station and a user equipment.

DETAILED DESCRIPTION

As mentioned above, AI-based CSI reporting in wireless communication networks may be improved in several ways. An object of embodiments herein is therefore to improve AI-based CSI reporting in wireless communication networks.

Embodiments herein relate to wireless communication networks in general. Figure 5a is a schematic overview depicting a wireless communications network 100 wherein embodiments herein may be implemented. The wireless communications network 100 comprises one or more RANs and one or more CNs. The wireless communications network 100 may use a number of different technologies, such as Wi-Fi, Long Term Evolution (LTE), LTE-Advanced, 5G, New Radio (NR), Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications/enhanced Data rate for GSM Evolution (GSM/EDGE), Worldwide Interoperability for Microwave Access (WiMax), or Ultra Mobile Broadband (UMB), just to mention a few possible implementations. Embodiments herein relate to recent technology trends that are of particular interest in a 5G context; however, embodiments are also applicable in the context of to-be-developed future wireless communication systems, such as, e.g., 6G, and in further developments of existing wireless communication systems, such as, e.g., WCDMA and LTE.

Network nodes, such as radio access nodes, operate in the wireless communications network 100. Figure 5a illustrates a radio access node 111. The radio access node 111 provides radio coverage over a geographical area, a service area referred to as a cell 115, which may also be referred to as a beam or a beam group of a first radio access technology (RAT), such as 5G, LTE, Wi-Fi or similar. The radio access node 111 may be an NR-RAN node, a transmission and reception point, e.g. a base station, a radio access node such as a Wireless Local Area Network (WLAN) access point or an Access Point Station (AP STA), an access controller, a base station, e.g. a radio base station such as a NodeB, an evolved Node B (eNB, eNode B), a gNB, a base transceiver station, a radio remote unit, an Access Point Base Station, a base station router, a transmission arrangement of a radio base station, a stand-alone access point or any other network unit capable of communicating with a wireless device within the service area depending e.g. on the radio access technology and terminology used. Since the embodiments are described in a 5G context, although not limited to such, sometimes the word gNB is used in place of the radio access node 111. Also, the abbreviation BS (base station) is used in place of the radio access node 111. The radio access node 111 may be referred to as a serving radio access node and communicates with a UE with Downlink (DL) transmissions on a DL channel 123-DL to the UE and Uplink (UL) transmissions on an UL channel 123-UL from the UE.

A number of wireless communications devices operate in the wireless communication network 100, such as a UE 121.

The UE 121 may be a mobile station, a non-access point (non-AP) STA, a STA, a user equipment and/or a wireless terminal, that communicates via one or more Access Networks (AN), e.g. RAN, e.g. via the radio access node 111, to one or more core networks (CN), e.g. comprising a CN node 130, for example comprising an Access Management Function (AMF). It should be understood by the person skilled in the art that “UE” is a non-limiting term which means any terminal, wireless communication terminal, user equipment, Machine Type Communication (MTC) device, Device to Device (D2D) terminal, or node, e.g. smart phone, laptop, mobile phone, sensor, relay, mobile tablet or even a small base station communicating within a cell.

A solution to the problems described above may be to formulate the compression problem in a way that ensures that latent representations of the channels (CSI reports) have a well-defined structure that may be standardized. This standardized structure may then be used by UE/chipset and NW vendors to train their models independently using proprietary methods and data, while still ensuring the UE-side encoding is compatible with the gNB-side processing of the CSI report.

Embodiments herein make use of distance metric learning in a NN which will be explained first before embodiments of a method to train a compressive NN encoder for encoding CSI are disclosed. In short, the trainable parameters of an NN encoder may be based on a loss which in turn may be based on a first distance between two inputs to the NN encoder and a second distance between the respective outputs from the NN encoder.

One-shot learning and triplet loss

Distance metric learning is a branch of AI/ML that has been used in nearest neighbor classification, comparisons of text, image similarity, and person identification.

One version of distance metric learning uses triplet loss functions to train Siamese Networks as follows. Consider a compressive neural network that takes a high-dimensional image x, from some set $\Omega_X$, and outputs a low-dimensional latent vector $y = f(x)$. An example of such a compressive neural network (a dense feedforward neural network), with eight input nodes and two output nodes, is shown in Figure 5b. The output space of the neural network, $\Omega_Y$, is sometimes called the latent space.

An idea with distance metric learning is to train the compressive neural network on a labelled dataset to cluster outputs based on the similarity of their corresponding inputs; e.g., images with the same label should be clustered in the latent space $\Omega_Y$. More formally, let $x_1$ and $x_2$ be different inputs with labels $l_1$ and $l_2$, respectively. For example, imagine that $x_1$ is an image of person $l_1$ and $x_2$ is an image of person $l_2$. Let $M_2(\cdot, \cdot)$ be a metric operating on the output space of the neural network $\Omega_Y$. That is, $M_2(f(x_1), f(x_2))$ represents the “distance” between the compressed versions of $x_1$ and $x_2$.

Now consider a triple of inputs $(x_A, x_P, x_N)$ and their corresponding labels satisfying the following:

- The anchor $x_A$ and its label $l_A$,
- The positive $x_P$ and its label $l_P = l_A$, and
- The negative $x_N$ and its label $l_N \neq l_A$.

A triplet loss function may be used to train the compressive neural network to cluster data in the latent space $\Omega_Y$ by comparing the distance $M_2(f(x_A), f(x_P))$ against the distance $M_2(f(x_A), f(x_N))$. For example, the neural network may be trained to minimize the loss function:

$L = \max\big( M_2(f(x_A), f(x_P)) - M_2(f(x_A), f(x_N)) + \gamma,\; 0 \big)$.

The hyper-parameter $\gamma > 0$ is a margin that gives robustness to the learned metric and prevents the network from mapping all elements to the same point in latent space.
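For illustration, this triplet loss may be sketched in Python as follows, assuming a generic encoder callable f and taking $M_2$ to be the Euclidean distance in the latent space (both choices are illustrative assumptions):

import numpy as np

def triplet_loss(f, x_anchor, x_pos, x_neg, margin=0.2):
    """Triplet loss max(M2(f(xA), f(xP)) - M2(f(xA), f(xN)) + margin, 0).

    margin is the hyper-parameter gamma > 0 that keeps the network from
    mapping all elements to the same point in the latent space.
    Here M2 is taken to be the Euclidean distance in the latent space.
    """
    d_pos = np.linalg.norm(f(x_anchor) - f(x_pos))  # anchor-positive distance
    d_neg = np.linalg.norm(f(x_anchor) - f(x_neg))  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)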

In Figure 5c, two copies of the trained compressive neural network are stacked on top of each other to form a so-called Siamese network.

The top neural network is used to process reference images and their labels; for example, a reference picture $x_{ref}$ of company employee $l_{ref}$ from a database is input to the network.

The bottom neural network is used to process test images; for example, a picture x of a person from a security camera - a picture without a corresponding label.

If the distance $M_2(f(x_{ref}), f(x))$ is small, then one may assume that the reference image $x_{ref}$ and the test image $x$ have the same label $l_{ref}$. For example, in the employee example above:

If the distance is small, then we might declare that $x_{ref}$ and $x$ are pictures of the same employee $l_{ref}$.

Conversely, if the distance is large, then we might declare that $x_{ref}$ and $x$ are pictures of different employees - pictures with different labels.

In practice, the neural networks processing the reference and test images might not be identical. Further, the metric $M_2$ does not need to be a metric in the mathematical sense; it may, for example, also be a pseudo-metric. Designing and training such systems requires manual, application-specific tuning of the trainable parameters, e.g., the weights and biases of the underlying NN.

The above idea is used, for example, in one-shot learning problems to recognize and distinguish human faces. Figure 5d illustrates how an encoder according to embodiments herein maps from an input space $\Omega_X$ to an output space $\Omega_Y$ and how the points and distances relate. Figure 5d shows a fully connected encoder, but other architectures may also apply. The encoder is trained using a loss function described below, according to agreed input space $\Omega_X$, output space $\Omega_Y$, and metrics $M_1$ and $M_2$. Each company may train its own encoder to do the encoding, since the metrics are agreed upon. Note that what is called input space may be different depending on feedback and ML architecture. It can, e.g., either be the direct measurements, before any pre-processing, or the result of certain feature extraction. The input space may be agreed on, but there may be additional data processing before the neural network that may be proprietary. Examples of input spaces are:

• The raw measurement space, i.e., the space of channel matrices.

• The space of channel matrices after further filtering/smoothing/noise reduction techniques.

• The space of (eigen)vectors, for precoding-vector feedback.

• The coefficient space after beam- and/or delay-transformation with possible truncation. Applicable to both full-channel and precoding-vector feedback. In the former it would be a space of channel matrices, and in the latter, it would be a space of (eigen)vectors. Here the coefficient space refers to a space of relative amplitude scaling and co-phasing coefficients of the channel matrices or the precoding-vectors.
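As an illustration of an input-space metric $M_1$ over such spaces, the following minimal NumPy sketch computes one minus the cosine similarity between the main eigenvectors of the transmitter-side covariance matrices, in line with the eigenvector-based option mentioned herein; the shapes and the use of 1 - |cosine similarity| as the distance are illustrative assumptions:

import numpy as np

def input_distance(H1, H2):
    """Illustrative input-space metric M1: one minus the cosine similarity
    between the main eigenvectors of the transmitter-side covariance
    matrices of two channel realizations H1, H2 of shape (N_rx, N_tx)."""
    def main_eigvec(H):
        R = H.conj().T @ H            # transmitter-side covariance matrix
        _, V = np.linalg.eigh(R)      # eigenvalues in ascending order
        return V[:, -1]               # eigenvector of the largest eigenvalue
    v1, v2 = main_eigvec(H1), main_eigvec(H2)
    cos_sim = np.abs(v1.conj() @ v2)  # phase-invariant cosine similarity
    return 1.0 - cos_sim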

Figures 6a to 6d illustrate details of a node 601 which may perform method embodiments disclosed herein. The node 601 comprises a compressive Neural Network (NN) encoder 601-1. The node 601 may further comprise a CSI-providing source 601-2. The CSI-providing source 601-2 provides the uncompressed CSI to the NN encoder 601-1. The uncompressed CSI may be based on measured channel data.

The node 601 may also be referred to as a training apparatus. In some embodiments herein the node 601 corresponds to a server, such as a UE or chipset vendor server or a network vendor server. In some other embodiments herein the node 601 corresponds to the UE 121. In yet some further embodiments herein the node 601 corresponds to the radio access node 111.

The node 601 is configured for training the NN encoder 601-1 in a training phase of the NN encoder 601-1. The NN encoder 601-1 is trained to provide encoded CSI from a first communications node, such as the UE 121, to a second communications node, such as the radio access node 111, over a communications channel, such as the UL channel 123-UL, in a communications network, such as the wireless communications network 100. The CSI is provided in an operational phase of the encoder, wherein an equivalent NN encoder 601-1 may be comprised in the UE 121 and/or the radio access node 111.

Embodiments herein may be implemented both for full-channel CSI feedback, i.e., where the whole observed channel, and possibly also noise covariance estimates, is sent as feedback, and for precoding-vector CSI feedback, i.e., where one or more suggested precoding vectors are sent as feedback. Detailed explanations for each case follow below.

There are different ways to construct loss functions that train the encoder to achieve this result. Some examples are:

• Figure 6a illustrates a method using the loss function L = |M1(xi, xj) − M2(f(xi), f(xj))| for pairs of points xi, xj.

• Figure 6b illustrates a method using a version of triplet loss as the loss function. However, the triplet loss function may not be directly usable to achieve the goal: the description above assumes that a grouping/labelling has been done so that the training triplets (anchor xA, positive xP, negative xN) may be chosen, which is not the case for embodiments herein. Embodiments herein may overcome this in different ways (see also the sketch following this list): o The metric M1 may be used to induce a clustering of the training examples (e.g., using spectral clustering), which then allows the triplet loss to be used by picking xA and xP from the same cluster and xN from another cluster. o The sample xA may be chosen pseudo-randomly, and then xP may be chosen pseudo-randomly such that M1(xA, xP) is small enough; analogously, xN may be chosen pseudo-randomly such that M1(xA, xN) is large enough. Small enough and large enough may be implemented using threshold values, which may be tuned as hyper-parameters.

• Using the loss function L = max(0, |M1(xA, xP) − M2(f(xA), f(xP))| − |M1(xA, xN) − M2(f(xA), f(xN))| + γ), the samples xA, xP, and xN may be chosen as described in the two sub-bullets above, and γ > 0 is a margin (and a hyper-parameter). This enforces that the distance in latent space between two points, as measured by M2, more closely/faithfully represents the distance between the corresponding two points in input space, as measured by M1, if the corresponding two points in input space are close and/or in the same cluster.

• Figure 6c illustrates a method wherein continuity requirements may be explicitly trained for, e.g., by using the following processes and losses: o Pick two inputs xi, xj such that M1(xi, xj) is small, either directly from the dataset or by letting xj be a small, pseudo-random perturbation of xi. Then use a loss that penalizes a large M2(f(xi), f(xj)), e.g., L = M2(f(xi), f(xj)). o Pick two inputs xi, xj such that M1(xi, xj) is large, either directly from the dataset or by letting xj be a large, pseudo-random perturbation of xi. Then use a loss that penalizes a small M2(f(xi), f(xj)), e.g., L = max(0, δ − M2(f(xi), f(xj))) for some margin δ > 0.

Here M1, which is based on the distance between the inputs, may be used to choose how to calculate the loss L. In the description above, the selection (the if-statements based on M1) is performed when selecting data points from the CSI-providing source 601-2, i.e., the node 601 may choose to only work/compute on pairs that fulfil one of those requirements. It is, however, also possible to work with a loss function containing the selection (the if-statements based on M1). The former may be more efficient in terms of the CPU/GPU clock cycles needed to train.

• Figure 6d illustrates a method wherein the compressive neural network may be trained on the reference CSI features separately. o For example, for a reference input xref with prescribed output yref, one may use the metric M2 to quantify the closeness of f(xref) and the corresponding reference yref. o Alternatively, one may use a classical loss function (e.g., cosine similarity or normalized mean squared error) to quantify the closeness of f(xref) and yref.

For the suggested loss functions described above, different powers and/or scalings may be used. In general, for any monotonic function g of one variable, g(L) may be used as a loss function, where L is one of the loss functions described above. Moreover, combinations of the above-described loss functions may also be used, e.g., some weighted mean of multiple losses, or training certain epochs and/or batches with certain loss functions.
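The following sketch illustrates the distance-matching loss of Figure 6a and the threshold-based triplet selection of Figure 6b; the metrics M1 and M2, the encoder f, and the threshold values are assumptions supplied by the caller, and the selection assumes the thresholds leave the candidate sets non-empty.

```python
import numpy as np

def distance_matching_loss(f, x_i, x_j, M1, M2):
    """Loss |M1(xi, xj) - M2(f(xi), f(xj))|: the latent-space distance
    should track the input-space distance."""
    return abs(M1(x_i, x_j) - M2(f(x_i), f(x_j)))

def select_triplet(dataset, M1, close_threshold, far_threshold, rng):
    """Pseudo-randomly pick an anchor, a positive with small M1 and a
    negative with large M1; the thresholds are tunable hyper-parameters."""
    x_a = dataset[rng.integers(len(dataset))]
    positives = [x for x in dataset if x is not x_a and M1(x_a, x) < close_threshold]
    negatives = [x for x in dataset if M1(x_a, x) > far_threshold]
    x_p = positives[rng.integers(len(positives))]
    x_n = negatives[rng.integers(len(negatives))]
    return x_a, x_p, x_n
```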

In practical implementations the metric M1 may be:

• For full-channel CSI feedback: o The Frobenius norm of the difference xi − xj. o The L2/spectral norm of the difference xi − xj.

• For precoding-vector CSI feedback: o The Euclidean norm of the difference xi − xj. o A function of the cosine similarity, e.g., 1 − cs(xi, xj).

In practical implementations the metric M2 may be:

• For full-channel CSI feedback: o The Frobenius norm of the difference f(xi) − f(xj). o The L2/spectral norm of the difference f(xi) − f(xj). o The Hamming distance between the binary vectors f(xi) and f(xj).

• For precoding-vector CSI feedback: o The Euclidean norm of the difference f(xi) − f(xj). o The Hamming distance between the binary vectors f(xi) and f(xj).

In the above description, cs(xi, xj) denotes the cosine similarity between the vectors xi and xj, which is defined as the cosine of the angle between the vectors. In complex vector spaces there are different angles that may be considered. In the context of embodiments herein, any of these angles, and others, could be used to define cosine similarity.
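A sketch of the example metrics listed above, assuming NumPy arrays; the Hermitian-angle definition of cosine similarity is only one of the admissible choices for complex vectors mentioned in the preceding paragraph.

```python
import numpy as np

def frobenius_dist(X, Y):
    return np.linalg.norm(X - Y, ord='fro')   # full-channel M1/M2

def spectral_dist(X, Y):
    return np.linalg.norm(X - Y, ord=2)       # L2/spectral norm of the difference

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)              # precoding-vector M1/M2

def hamming_dist(y1, y2):
    return int(np.count_nonzero(y1 != y2))    # binary latent vectors

def cosine_similarity(x, y):
    # Hermitian angle |x^H y| / (||x|| ||y||); other angle definitions are possible
    return abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
```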

Regardless of the feedback type of the CSI (full-channel or precoding-vector), if the input and/or output is interpreted as a matrix, then the corresponding metric, M1 and/or M2 respectively, may be applied on the columns/rows of the matrix, i.e., applied on each column/row separately. The distance may then be judged individually per column/row, or by some aggregate, e.g., a mean. As an example, for precoding-vector feedback, the input may be a matrix containing the suggested precoding vectors per subband as columns, and the output may have certain bits corresponding to a certain subband (as this would, e.g., allow up- and down-scaling of the bandwidth). Then M1 may measure on the columns of the input, corresponding to the per-subband suggested precoding vectors, and M2 may measure on the corresponding bits in the output.

Appropriate methods to train the NN encoder 601-1 for CSI reporting will now be described with reference to a flow chart in Figure 6e.

In action 610 the node 601 obtains a first metric and a second metric of the wireless channel.

In action 611 the node 601 obtains a first and a second uncompressed CSI xi, xj as input to the compressive NN encoder based on the obtained first and second metrics of the wireless channel. The uncompressed CSIs may be any one or more of: measured channel matrices, channel matrices after signal processing, eigenvectors for precoding-vector feedback, a coefficient space after beam- and/or delay-transformation.

In training, the node 601 may use the same encoder multiple times, e.g., twice or three times. Hence, there may be one copy of the encoder in training, and that copy is called once for each uncompressed CSI.

In action 612 the node 601 calculates, with the compressive NN encoder and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI.

In action 613 the node 601 calculates a first distance between the first and second uncompressed CSIs in an input space. The first distance may, for example, be computed as a cosine similarity between the main eigenvectors of a transmitter-side covariance matrix.

In action 614 the node 601 calculates a second distance between the first and second compressed CSIs in an output space.

The output space may be the set of all length-n binary sequences, which fulfils |Y| = 2^n, and the second distance may be a Hamming metric. In some other embodiments the output space is a discrete set of points representing centroids of quantizer bins and the second distance is a Euclidean distance between the centroids of the quantizer bins. In yet some further embodiments the output space is a digital representation of a linear vector space and the second distance is a cosine similarity between the vectors of the linear vector space.

In action 615 the node 601 calculates a loss value for the compressive NN encoder based on the first distance and based on the second distance.

Calculating the loss value for the compressive NN encoder based on the first distance may comprise selecting the first and second uncompressed CSI input values based on the first distance measure. In a first example, the first and second uncompressed CSI input values are selected such that they are close, e.g., such that the first distance measure is below a first threshold distance. This may be the case for training to make sure that uncompressed CSI that is close in input space (first distance) leads to compressed CSI that is close in the output space (second distance).

In a second example, the first and second uncompressed CSI input values are selected such that they are far away, e.g., such that the first distance measure is above a second threshold distance. This may be the case for training to make sure that uncompressed CSI that is far away in input space (first distance) leads to compressed CSI that is far away in the output space (second distance).

Further conditions may apply. For example, if the first distance is less than a first value which is larger than zero then the second distance is less than a second value which is larger than zero and based on the first value. In another example, if the first distance is larger than a third value which is larger than zero then the second distance is larger than a fourth value which is larger than zero and based on the third value.

More specifically, if the input satisfies M1(xi, xj) ≤ ε for some ε > 0, then the output may satisfy M2(f(xi), f(xj)) ≤ δ, where ε and δ are related (e.g., δ = αε for some fixed positive real number α).

If the input satisfies M1(xi, xj) ≥ γ1, then the output may satisfy M2(f(xi), f(xj)) ≥ γ2 for some γ1 and γ2 that are related by, for example, γ2 = αγ1. Such conditions may be part of pre-deployment tests.
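A sketch of such a pre-deployment test over sampled input pairs, assuming the hypothetical relation δ = α·ε between the input- and output-space thresholds:

```python
def continuity_test(f, pairs, M1, M2, eps, alpha):
    """Check that close input pairs (M1 <= eps) map to close outputs
    (M2 <= alpha * eps); returns False on the first violation."""
    for x_i, x_j in pairs:
        if M1(x_i, x_j) <= eps and M2(f(x_i), f(x_j)) > alpha * eps:
            return False  # a close input pair was mapped far apart
    return True
```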

In some embodiments calculating the loss value for the compressive NN encoder is based on a loss function which is dependent on the first distance and/or on the second distance. For example, the loss function may be dependent on an absolute value of a difference between the first distance and the second distance.

In action 616 the node 601 updates trainable parameters of the compressive NN encoder based on the calculated loss value.

Some further methods to train the NN encoder 601-1 for CSI reporting will now be described with reference to a flow chart in Figure 6f. These further methods are related to the embodiments described in relation to Figure 6b. As mentioned above, Figure 6b illustrates a method using a version of triplet loss as the loss function.

In action 620 the node 601 obtains a third metric of the wireless channel (123-DL).

In action 621 the node 601 obtains a third uncompressed CSI as input to the compressive NN encoder (601-1) based on the obtained third metric of the wireless channel (123-DL). The method may further comprise clustering the uncompressed CSIs based on the first distance and/or the third distance.

The first, second and third uncompressed CSIs may correspond to xA, xP and xN. For example, the first uncompressed CSI may correspond to xA, the second uncompressed CSI may correspond to xP, and the third uncompressed CSI may correspond to xN. Then, as mentioned above in relation to Figure 6b, the metric M1 may be used to induce a clustering of the training examples (e.g., using spectral clustering), which then allows the triplet loss to be used by picking xA and xP from the same cluster and xN from another cluster.

The method may further comprise selecting the uncompressed CSIs pseudo-randomly such that the first distance is smaller than a first threshold distance and the third distance is larger than a second threshold distance. For example, the sample xA may be chosen pseudo-randomly, and then xP may be chosen pseudo-randomly such that M1(xA, xP) is small enough; analogously, xN may be chosen pseudo-randomly such that M1(xA, xN) is large enough. Small enough and large enough may be implemented using threshold values, which may be tuned as hyper-parameters.

In action 622 the node 601 calculates, with the compressive NN encoder (601-1) and based on the third uncompressed CSI, a third compressed output value representing a third encoded CSI.

In action 623 the node 601 calculates a third distance between the first and third uncompressed CSIs in an input space.

In action 624 the node 601 calculates a fourth distance between the first and third compressed CSIs in an output space.

In action 625 the node 601 calculates the loss value for the compressive NN encoder (601-1) based on the first distance, the second distance, the third distance and the fourth distance.

In action 626 the node 601 updates trainable parameters of the compressive NN encoder (601-1) based on the calculated loss value.

Inference phase

Figure 7a illustrates details of a wireless communications device 521, such as the UE 121, and a network node 511, such as the radio access node 111. The wireless communications device 521 may comprise a first compressive NN encoder 521-1 for encoding uncompressed CSI a into a compressed state f(a).

The wireless communications device 521 may further comprise a CSI-providing source 521-2. The CSI-providing source 521-2 provides the uncompressed CSI a to the NN-based encoder 521-1. The uncompressed CSI a may be based on measured channel data.

The wireless communications device 521 may provide the compressed state f(a) to the network node 511. The compressed state may be part of a CSI report further comprising, e.g., a channel quality index (CQI) and a rank indicator (RI). The compressed state f(a) may be related to a single-layer transmission or a multi-layer transmission.

The network node 511 may comprise a second compressive NN encoder 511-1 for encoding candidates of uncompressed CSI.

In some embodiments the network node 511 comprises a codebook 511-3 comprising codewords b1, b2 representing candidates of uncompressed CSI.

As described above the compressed CSI f(a) may be compared with encoded versions g(b1), g(b2) of the candidates in order to conclude whether or not any of the candidates b1, b2 are close enough to the uncompressed CSI a.

The network node 511 may further comprise a decoder (not shown in Figure 7a) for decoding the compressed CSI f(a) provided by the wireless communications device 521. Such decoded (decompressed) CSI may be used in addition to or in place of the codewords as a candidate for the uncompressed CSI a at the wireless communications device 521.

The NN-based decoder may comprise a same number of input nodes as the number of output nodes of the NN encoders 521-1, 511-1.

In Figure 7a the network node 511 has been illustrated as a single unit. However, as an alternative, the network node 511 may be implemented as a Distributed Node (DN) with functionality, e.g. comprised in a cloud 140 as shown in Figure 7a, for performing or partly performing method embodiments herein.

Appropriate methods to handle NN-based CSI reporting in the inference phase are provided below. Exemplifying methods according to embodiments herein will now be described with reference to a signaling diagram in Figure 7b.

When deployed, one embodiment of the system would work as depicted in Figure 7b. A UE estimates the downlink channel using CSI-RS measurements. The channel estimation step may then be followed by a feature extraction/data pre-processing step; for example, the UE computes precoder vectors in a subband or wideband manner. The UE encodes the channel estimate (or extracted channel features) using its trained compressive neural network encoder f. The latent space output of the compressive neural network, f(a), is transmitted to the BS over the UL control and/or data channels.

The gNB has its own trained compressive neural network g and a MIMO precoding codebook B, which may or may not be standardized. The gNB compares f(a) and g(b) for all b ∈ B and selects the MIMO precoding vector that minimizes M2(f(a), g(b)).

In some embodiments when the distance M2 is less than a threshold distance in the pre-defined output space, the candidate b is selected as CSI, even if the second metric M2 is not the minimum for this candidate b.

Intuitively, the CSI report here may be thought of as a precoding matrix indicator from a codebook B, but the codebook B does not need to be fixed (or specified) because a small M2(f(a), g(b)) implies that M1(a, b) is small (a condition guaranteed by the training process defined above).

The distance may also be used as a quality indication for how well the codebook represents the conditions observed by the UE: if the minimum distance is large, that implicitly implies that no codeword in the codebook represents the observed UE conditions well.
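A sketch of this codebook search at the network side; the quality threshold and the on-the-fly encoding of codewords are illustrative assumptions (the latent representations may equally well be precomputed and cached, as discussed below).

```python
import numpy as np

def select_codeword(report, codebook, g, M2, quality_threshold):
    """Pick the codeword whose latent representation g(b) is closest to
    the received CSI report, and flag when even the best codeword is a
    poor representation of the observed conditions."""
    latents = [g(b) for b in codebook]          # may be precomputed and cached
    dists = [M2(report, y) for y in latents]
    best = int(np.argmin(dists))
    codebook_ok = dists[best] <= quality_threshold
    return codebook[best], dists[best], codebook_ok
```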

In the previous description of a deployment scenario the encoders may be the same, f = g, if the standard specifies an encoder or if there are bilateral agreements to share the trained compressive neural networks between vendors. However, the encoders may also be different proprietary implementations, i.e., f ≠ g, following the specified input and output spaces and the corresponding metrics M1 and M2.

The codebook may be:

• Fixed by the standard.

• Developed by the network vendor in a centralized and proprietary way, and fixed across sets or groups of gNBs.

• Local and different for each gNB (or groups thereof). Such a codebook may need to be initialized but may then be updated dynamically. An initial codebook may be achieved by, e.g.: a measurement campaign; utilizing a legacy codebook; transfer from a gNB deployed in a similar environment; or any of the means described above for how a codebook may be set. The codebook may be fixed, or updated, e.g., continuously over time using, e.g., reinforcement learning, or at specific time instances based on off-line training.

As a further enhancement, to allow for cross-vendor operation based on independent training, the standard may provide a set of reference points x1, x2, ..., xn with their corresponding prescribed mappings y1, y2, ..., yn. Specifically, for any encoder f it would be required that f(xi) = yi for all i = 1, 2, ..., n, at least up to some error margin. This may be used for both training and testing purposes.

As a further enhancement to the reference points, the standard may prescribe a continuity requirement of the encoder. Specifically, for any encoder f it would be required that, for any point a that satisfies M1(a, xi) ≤ ε for some prescribed parameter ε > 0, the following inequality has to hold: M2(f(a), yi) ≤ δ, for some prescribed parameter δ > 0. This would be valid for all, or some, of the prescribed pairs (xi, yi), and the continuity parameters ε, δ could be the same across all pairs or different for different pairs. This continuity requirement could also be prescribed for all points a in the input space Ω X, or prescribed to hold for a certain fraction of all points in the input space.

The proposed solution enables AI-based CSI reporting using proprietary implementations on both the network side and the UE side. Only the input space, the output space, and the two metrics (M1 and M2) need to be agreed on. This also has the potential of creating a simpler standard than, e.g., an encoder-decoder-type solution where the interoperability of the encoder and decoder needs to be assured. The proposed solution allows for rather simple testing of proposed implementations. Testing may in this context mean, e.g., tests performed by the developing vendor, such as the UE vendor and/or the NW vendor, during development of the encoders, and/or tests specified by 3GPP WG RAN4, and/or tests specified by 3GPP WG RAN5, and/or cross-vendor bilateral/multilateral interoperability testing. Thus, the tests may be performed by the node 601. The ease of testing stems from the fact that only encoders are visible to the standard, and hence a test only needs to consider whether pairs of feedback quantities satisfy: if M1(xi, xj) is small, then M2(f(xi), f(xj)) is small; and if M1(xi, xj) is large, then M2(f(xi), f(xj)) is large.

What the network does with the feedback information remains proprietary. A network vendor may either develop their own decoder or apply a codebook-based approach where codewords are compared with the CSI feedback, in the metric M 2 . The codebook may be site specific.

Apart from the above-mentioned advantages the training is greatly simplified as a joint training is not needed across companies. Embodiments herein would also allow the NW vendor to train a decoder to work with its trained compressive neural network.

Exemplifying methods according to embodiments herein will now be described with reference to a flow chart in Figure 7c. The flow chart illustrates a method, performed by the network node 511, such as the radio access node 111, for handling CSI in the wireless communications network 100.

In action 700 the network node 511 receives a compressed CSI associated with the wireless channel 123-DL. For example, the network node 511 may receive, over the radio-based air interface 123-UL and using a standardized radio transmission protocol, the CSI based on output from the trained NN-based encoder.

In action 701 the network node 511 obtains one or more candidate uncompressed CSIs b, such as one or more codewords, associated with the wireless channel 123-DL. The candidate uncompressed CSI may belong to the same input space as the uncompressed CSI in the UE 521. In some embodiments the network node 511 comprises a decoder, and then obtaining the one or more candidate uncompressed CSIs associated with the wireless channel 123-DL comprises decoding the received compressed CSI with the decoder. The decoder may comprise the same number of input nodes as the number of output nodes of the compressive NN encoder 601-1.

In a further action 702 the network node 511 obtains corresponding one or more candidate compressed CSIs g(b) based on an encoding of the candidate uncompressed CSIs b. The candidate compressed CSIs may be pre-calculated from the codewords. In some other embodiments obtaining the candidate compressed CSI comprises calculating, with the second NN encoder 511-1 and based on the candidate uncompressed CSI, the candidate compressed CSI.

In a further action 703 the network node 511 calculates a distance M2 between the compressed CSIs in the output space. In other words, the network node 511 calculates a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

In a further action 704 the network node 511 selects a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space. In other words, the network node 511 selects a candidate codeword based on the distance M 2 between the compressed CSIs in the output space. For example, the network node 511 selects the codeword that minimizes the distance M 2 between the compressed CSIs in the output space. In other words, selecting the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs may comprise selecting the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs that minimizes the distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

Embodiments herein will now be summarized.

1. The input space, X, of the compressive neural network in the UE may be standardized, e.g., by 3GPP. The input space may for example be:

a. The raw measurement space, i.e., the space of measured channel matrices.

b. The space of extracted (eigen)vectors, for precoding-vector feedback.

The input space may also be any of the above with further model-based reduction and mapping techniques. For example, 3GPP may specify that the UE should map its CSI-RS channel (feature) estimate to L abstract signal ports. Such signal ports may be defined, for example, via Rel-16 Type II CSI reporting codebooks.

The UE’s CSI-RS based estimate of the downlink channel feature may be represented by a point x ∈ X.

2. 3GPP standardizes the output (latent) space Y of the compressive neural network. The following lists some examples of the space Y:

a. Y could be the set of all length-n binary sequences, so that |Y| = 2^n.

b. Y could be a discrete set of points representing the centroids of quantizer bins (e.g., resulting from uniform scalar quantization of encoder output nodes, or lattice-based quantization).

c. Y could be a digital representation of a linear vector space, e.g., an approximation of C^n obtained by representing the real and imaginary parts in single-precision floating-point format, effectively giving an uplink control channel payload size of 64n bits. The vectors in the linear vector space may be represented in other formats than single-precision floating-point, with both larger and smaller numbers of bits. It is also possible to apply further quantization schemes to the numbers (see the sketch after this list).

3. 3GPP standardizes a metric M1 for the input space X to quantify distances between CSI features. For example, M1 might compute the cosine similarity between the main (right) eigenvectors of the transmitter-side covariance matrix (in a sub-band or wideband manner).

4. 3GPP standardizes a metric M2 for the output space Y to quantify distances between the compressed CSI features. For example:

a. If Y is the set of all length-n binary sequences y = y[0], y[1], ..., y[n − 1], then M2 could be the Hamming metric:

M2(y, y') = Σ I(y[k], y'[k]), summed over k = 0, ..., n − 1,

where I: {0,1} x {0,1} -> {0,1} is the indicator function, equal to 1 when y[k] ≠ y'[k] and 0 otherwise.

b. If Y is a discrete set of points (e.g., representing the centroids of quantizer bins), then M2 could be the Euclidean distance between the centroids (or some generalization thereof).

c. If Y is a digital representation of a linear vector space, then M2 could be the Euclidean distance between the vectors (or some generalization thereof), or the cosine similarity between the vectors.

5. 3GPP specifies continuity requirements for a trained compressive neural network:

a. If the input satisfies M1(xi, xj) ≤ ε, then the output must satisfy M2(f(xi), f(xj)) ≤ δ, where ε and δ are related (e.g., δ = αε for some fixed positive real number α).

b. If the input satisfies M1(xi, xj) ≥ γ1, then the output must satisfy M2(f(xi), f(xj)) ≥ γ2, for some γ1 and γ2 that are related by, for example, γ2 = αγ1.

6. 3GPP specifies a small set of reference CSI feature examples X ref with corresponding outputs Y ref. For example:

a. X ref may be a collection of representative channels for the Rel-15 Type I or Rel-16 Type II codebooks. That is, 3GPP selects a representative channel for each rank and precoding matrix.

b. X ref may be DFT beams, in the case of precoding-vector feedback (example 1b). A corresponding set Y ref may for example be generated by drawing (at the time of specification, and then fixing) random binary codewords (example 4a) or Gaussian random variables (example 4b).

7. UE/chipset and NW vendors each train a compressive neural network with input and output spaces defined by 1 and 2 above. The compressive neural networks should be trained so that conditions 5 and 6 are satisfied, using the metrics defined in 3 and 4. Detailed examples of how to do that have been outlined above.

8. The NW vendor deploys its trained compressive neural network. The UE vendor deploys its trained compressive neural network. The NW vendor's trained compressive neural network and the UE vendor's compressive neural network are used together in a Siamese setup.

9. The NW vendor has one or multiple codebook(s). The codewords in a codebook have a corresponding representation in latent space, which may be precomputed once the codeword is known. For each UE report, a codeword is chosen from the codebook based on the latent representation of the codeword being close to the report, as measured by M 2 . The BS may search through the codebook for the codeword whose latent representation is closest to what the BS received from the UE in the CSI report. The BS may use its own encoder to calculate the reference compressed output since the BS should compare the distance in latent space between the received report and the codewords in its own codebook. However, if the codebook is static (or semi-static), then from a computational point of view the NW may precompute the latent space representations and cache these. Since it may have to do many comparisons with them (one comparison with every codeword for every UE CSI report received) it saves processing.

10. The codebook may be standardized by 3GPP; built centrally by the NW vendor and implemented in all BS nodes (or a few large groups thereof) with the possibility of updates; or locally constructed/updated in each deployed BS (or groups thereof).

a. The codebook may be constructed from a legacy codebook.

b. The codebook may be constructed from statistics of applied precoder vectors in legacy deployments. E.g., some of the most commonly used precoder vectors are put into the codebook. This may be implemented centrally at the NW vendor, or locally per BS (or groups thereof), as a preparation for the deployment of the compressive neural network.

c. A codebook that is local to a BS may be created by a measurement campaign, where the CSI is measured at certain places in physical space. The campaign may be done either with special equipment, e.g., at locations deemed likely for a UE to be at during operations, or by live UEs.

d. All the above-described codebooks (individually or jointly) may be used as an initial codebook that is updated over time in a BS (or group thereof).
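Returning to the output-space examples in item 2 above: a length-n complex latent vector in single-precision costs 2 x 32 = 64 bits per entry, hence the 64n payload of example (c). The following sketch maps a real-valued encoder output into the binary output space of example (a); sign quantization is an illustrative choice, not a prescribed one.

```python
import numpy as np

def to_binary_latent(z):
    """Map a real-valued encoder output z (length n) to a length-n binary
    sequence, i.e., an element of the output space Y with |Y| = 2**n."""
    return (z > 0).astype(np.uint8)  # sign quantization per output node
```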

Some of the above specification may impose a minimal structure on all trained compressive neural networks. This trained structure may be sufficient for a NW vendor trained compressive neural network to be compatible with a UE/chipset trained compressive neural network (even though the neural networks are trained independently by different vendors using proprietary techniques and data).

Figure 8 shows an example of the node 601 and Figure 9 shows an example of the network node 511. The node 601 may be configured to perform the method actions of Figure 6e above. The network node 511 may be configured to perform the method actions of Figure 7c above.

Thus, the node 601 is configured for training a compressive NN encoder 601-1 for encoding CSI associated with a wireless channel 123-DL between a radio access node 111 and a wireless communications device 121 in a wireless communications network 100.

The node 601 is configured to: obtain a first metric and a second metric of the wireless channel 123-DL; obtain a first and a second uncompressed CSI as input to the compressive NN encoder 601-1 based on the obtained first and second metrics of the wireless channel 123-DL; calculate, with the compressive NN encoder 601-1 and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI; calculate a first distance between the first and second uncompressed CSIs in an input space; calculate a second distance between the first and second compressed CSIs in an output space; calculate a loss value for the compressive NN encoder 601-1 based on the first distance and based on the second distance; and update trainable parameters of the compressive NN encoder 601-1 based on the calculated loss value.

In some embodiments the node 601 is configured to calculate the loss value for the compressive NN encoder 601-1 based on the first distance by being configured to select the first and second uncompressed CSI input values based on the first distance.

The node 601 may be configured to calculate the loss value based on a loss function which is dependent on the first distance and/or on the second distance.

In some embodiments the node 601 is further configured to: obtain a third metric of the wireless channel 123-DL; obtain a third uncompressed CSI as input to the compressive NN encoder 601-1 based on the obtained third metric of the wireless channel 123-DL; calculate, with the compressive NN encoder 601-1 and based on the third uncompressed CSI, a third compressed output value representing a third encoded CSI; calculate a third distance between the first and third uncompressed CSIs in an input space; calculate a fourth distance between the first and third compressed CSIs in an output space; calculate the loss value for the compressive NN encoder 601-1 based on the first distance, the second distance, the third distance and the fourth distance; and update trainable parameters of the compressive NN encoder 601-1 based on the calculated loss value.

The node 601 may be configured to cluster the uncompressed CSIs based on the first distance and/or the third distance.

In some embodiments the node 601 is further configured to select the uncompressed CSIs pseudo-randomly such that the first distance is smaller than a first threshold distance and the third distance is larger than a second threshold distance.

When the node 601 is a network node 511 of the wireless communications network 100, then the node 601 may be further configured to: receive a compressed CSI associated with the wireless channel 123-DL; obtain one or more candidate uncompressed CSIs associated with the wireless channel 123-DL; obtain corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs; calculate a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space; and select a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

In some embodiments, when the node 601 is a network node 511, the node 601 is further configured to select the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs by being configured to select the candidate uncompressed CSI out of the one or more candidate uncompressed CSIs that minimizes the distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

In some embodiments the network node 511 comprises a decoder and then the node 601 may further be configured to obtain the one or more candidate uncompressed CSIs associated with the wireless channel 123-DL by being configured to decode the received compressed CSI with the decoder.

The decoder may comprise the same number of input nodes as the number of output nodes of the compressive NN encoder 601-1.

As mentioned above, the network node 511 is configured for handling CSI in the wireless communications network 100. The network node 511 may comprise the compressive NN encoder 511-1 for encoding CSI associated with the wireless channel 123-DL between the radio access node 111 and the wireless communications device 121 in the wireless communications network 100. The network node 511 may be configured to receive the compressed CSI associated with the wireless channel 123-DL.

The network node 511 may be further configured to obtain one or more candidate uncompressed CSIs associated with the wireless channel 123-DL. The network node 511 may be further configured to obtain corresponding one or more candidate compressed CSIs based on an encoding of the candidate uncompressed CSIs.

The network node 511 may be further configured to calculate a respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

The network node 511 may be further configured to select a candidate uncompressed CSI out of the one or more candidate uncompressed CSIs based on the respective distance between the one or more obtained candidate compressed CSIs and the received compressed CSI in the output space.

The node 601 and the network node 511 may each comprise a respective input and output interface, I/O, 806, 906 configured to communicate with other nodes, see Figures 8-9. The input and output interface may comprise a wireless receiver (not shown) and a wireless transmitter (not shown).

The node 601 and the network node 511 may each comprise a respective processing unit 801, 901 for performing the above method actions. The respective processing unit 801, 901 may comprise further sub-units which will be described below.

The node 601 and the network node 511 may each comprise a respective obtaining unit 810, 920, e.g. for obtaining input values such as uncompressed CSI for the node 601 and compressed CSI for the network node 511.

The network node 511 may further comprise a receiving unit 910 which may receive messages and/or signals.

The node 601 and the network node 511 may each comprise a respective calculating unit 820, 930 which for example may calculate the first distance and/or the second distance and/or the loss of the NN encoder.

The node 601 may further comprise an updating unit 830 which for example may update the trainable parameters of the NN encoder during training. The trainable parameters of the NN encoder may be updated based on the calculated loss.

The network node 511 may further comprise a selecting unit 940 which may select a codeword among multiple codewords based on the calculated second distance.

The embodiments herein may be implemented through a respective processor or one or more processors, such as the respective processor 804, and 904, of a processing circuitry in the node 601 and the network node 511 depicted in Figures 8-9 together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the respective node 601 and network node 511. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the respective node 601 and network node 511.

The node 601 and the network node 511 may further comprise a respective memory 802, and 902 comprising one or more memory units. The memory comprises instructions executable by the processor in the node 601 and network node 511.

Each respective memory 802 and 902 is arranged to be used to store e.g. information, data, configurations, and applications to perform the methods herein when being executed in the respective node 601 and network node 511.

In some embodiments, a respective computer program 803 and 903 comprises instructions, e.g., in the form of computer readable code units, which when executed by the at least one processor, cause the at least one processor of the respective node 601 and network node 511 to perform the actions above.

In some embodiments, a respective carrier 805 and 905 comprises the respective computer program, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium. In other words, the computer-readable storage medium 805, 905 has stored thereon the computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to the actions above.

Those skilled in the art will also appreciate that the units described above may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g. stored in the respective node 601 and network node 511, that perform as described above when executed by the respective one or more processors. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a system-on-a-chip (SoC).

With reference to Figure 10, in accordance with an embodiment, a communication system includes a telecommunication network 3210, such as a 3GPP-type cellular network, which comprises an access network 3211, such as a radio access network, and a core network 3214. The access network 3211 comprises a plurality of base stations 3212a, 3212b, 3212c, such as the source and target access node 111, 112, AP STAs, NBs, eNBs, gNBs or other types of wireless access points, each defining a corresponding coverage area 3213a, 3213b, 3213c. Each base station 3212a, 3212b, 3212c is connectable to the core network 3214 over a wired or wireless connection 3215. A first user equipment (UE) such as a Non-AP STA 3291 located in coverage area 3213c is configured to wirelessly connect to, or be paged by, the corresponding base station 3212c. A second UE 3292 such as a Non-AP STA in coverage area 3213a is wirelessly connectable to the corresponding base station 3212a. While a plurality of UEs 3291, 3292 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE is connecting to the corresponding base station 3212.

The telecommunication network 3210 is itself connected to a host computer 3230, which may be embodied in the hardware and/or software of a standalone server, a cloud-implemented server, a distributed server or as processing resources in a server farm. The host computer 3230 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider. The connections 3221, 3222 between the telecommunication network 3210 and the host computer 3230 may extend directly from the core network 3214 to the host computer 3230 or may go via an optional intermediate network 3220. The intermediate network 3220 may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network 3220, if any, may be a backbone network or the Internet; in particular, the intermediate network 3220 may comprise two or more sub-networks (not shown).

The communication system of Figure 10 as a whole enables connectivity between one of the connected UEs 3291, 3292, such as e.g. the UE 121, and the host computer 3230. The connectivity may be described as an over-the-top (OTT) connection 3250. The host computer 3230 and the connected UEs 3291, 3292 are configured to communicate data and/or signaling via the OTT connection 3250, using the access network 3211, the core network 3214, any intermediate network 3220 and possible further infrastructure (not shown) as intermediaries. The OTT connection 3250 may be transparent in the sense that the participating communication devices through which the OTT connection 3250 passes are unaware of routing of uplink and downlink communications. For example, a base station 3212 may not or need not be informed about the past routing of an incoming downlink communication with data originating from a host computer 3230 to be forwarded (e.g., handed over) to a connected UE 3291. Similarly, the base station 3212 need not be aware of the future routing of an outgoing uplink communication originating from the UE 3291 towards the host computer 3230.

Example implementations, in accordance with an embodiment, of the UE, base station and host computer discussed in the preceding paragraphs will now be described with reference to Figure 11. In a communication system 3300, a host computer 3310 comprises hardware 3315 including a communication interface 3316 configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 3300. The host computer 3310 further comprises processing circuitry 3318, which may have storage and/or processing capabilities. In particular, the processing circuitry 3318 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The host computer 3310 further comprises software 3311, which is stored in or accessible by the host computer 3310 and executable by the processing circuitry 3318. The software 3311 includes a host application 3312. The host application 3312 may be operable to provide a service to a remote user, such as a UE 3330 connecting via an OTT connection 3350 terminating at the UE 3330 and the host computer 3310. In providing the service to the remote user, the host application 3312 may provide user data which is transmitted using the OTT connection 3350.

The communication system 3300 further includes a base station 3320 provided in a telecommunication system and comprising hardware 3325 enabling it to communicate with the host computer 3310 and with the UE 3330. The hardware 3325 may include a communication interface 3326 for setting up and maintaining a wired or wireless connection with an interface of a different communication device of the communication system 3300, as well as a radio interface 3327 for setting up and maintaining at least a wireless connection 3370 with a UE 3330 located in a coverage area (not shown in Figure 11) served by the base station 3320. The communication interface 3326 may be configured to facilitate a connection 3360 to the host computer 3310. The connection 3360 may be direct or it may pass through a core network (not shown in Figure 11) of the telecommunication system and/or through one or more intermediate networks outside the telecommunication system. In the embodiment shown, the hardware 3325 of the base station 3320 further includes processing circuitry 3328, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The base station 3320 further has software 3321 stored internally or accessible via an external connection.

The communication system 3300 further includes the UE 3330 already referred to. Its hardware 3335 may include a radio interface 3337 configured to set up and maintain a wireless connection 3370 with a base station serving a coverage area in which the UE 3330 is currently located. The hardware 3335 of the UE 3330 further includes processing circuitry 3338, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The UE 3330 further comprises software 3331, which is stored in or accessible by the UE 3330 and executable by the processing circuitry 3338. The software 3331 includes a client application 3332. The client application 3332 may be operable to provide a service to a human or non-human user via the UE 3330, with the support of the host computer 3310. In the host computer 3310, an executing host application 3312 may communicate with the executing client application 3332 via the OTT connection 3350 terminating at the UE 3330 and the host computer 3310. In providing the service to the user, the client application 3332 may receive request data from the host application 3312 and provide user data in response to the request data. The OTT connection 3350 may transfer both the request data and the user data. The client application 3332 may interact with the user to generate the user data that it provides. It is noted that the host computer 3310, base station 3320 and UE 3330 illustrated in Figure 11 may be identical to the host computer 3230, one of the base stations 3212a, 3212b, 3212c and one of the UEs 3291, 3292 of Figure 10, respectively. This is to say, the inner workings of these entities may be as shown in Figure 11 and independently, the surrounding network topology may be that of Figure 10.

In Figure 11, the OTT connection 3350 has been drawn abstractly to illustrate the communication between the host computer 3310 and the user equipment 3330 via the base station 3320, without explicit reference to any intermediary devices and the precise routing of messages via these devices. Network infrastructure may determine the routing, which it may be configured to hide from the UE 3330 or from the service provider operating the host computer 3310, or both. While the OTT connection 3350 is active, the network infrastructure may further take decisions by which it dynamically changes the routing (e.g., on the basis of load balancing consideration or reconfiguration of the network).

The wireless connection 3370 between the UE 3330 and the base station 3320 is in accordance with the teachings of the embodiments described throughout this disclosure. One or more of the various embodiments improve the performance of OTT services provided to the UE 3330 using the OTT connection 3350, in which the wireless connection 3370 forms the last segment. More precisely, the teachings of these embodiments may improve the data rate, latency, power consumption and thereby provide benefits such as reduced user waiting time, relaxed restriction on file size, better responsiveness, extended battery lifetime.

A measurement procedure may be provided for the purpose of monitoring data rate, latency and other factors on which the one or more embodiments improve. There may further be an optional network functionality for reconfiguring the OTT connection 3350 between the host computer 3310 and UE 3330, in response to variations in the measurement results. The measurement procedure and/or the network functionality for reconfiguring the OTT connection 3350 may be implemented in the software 3311 of the host computer 3310 or in the software 3331 of the UE 3330, or both. In embodiments, sensors (not shown) may be deployed in or in association with communication devices through which the OTT connection 3350 passes; the sensors may participate in the measurement procedure by supplying values of the monitored quantities exemplified above, or supplying values of other physical quantities from which software 3311 , 3331 may compute or estimate the monitored quantities. The reconfiguring of the OTT connection 3350 may include message format, retransmission settings, preferred routing etc.; the reconfiguring need not affect the base station 3320, and it may be unknown or imperceptible to the base station 3320. Such procedures and functionalities may be known and practiced in the art. In certain embodiments, measurements may involve proprietary UE signaling facilitating the host computer’s 3310 measurements of throughput, propagation times, latency and the like. The measurements may be implemented in that the software 3311, 3331 causes messages to be transmitted, in particular empty or ‘dummy’ messages, using the OTT connection 3350 while it monitors propagation times, errors etc.

FIGURE 12 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 10 and Figure 11. For simplicity of the present disclosure, only drawing references to Figure 12 will be included in this section. In a first action 3410 of the method, the host computer provides user data. In an optional subaction 3411 of the first action 3410, the host computer provides the user data by executing a host application. In a second action 3420, the host computer initiates a transmission carrying the user data to the UE. In an optional third action 3430, the base station transmits to the UE the user data which was carried in the transmission that the host computer initiated, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional fourth action 3440, the UE executes a client application associated with the host application executed by the host computer.

FIGURE 13 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 10 and Figure 11. For simplicity of the present disclosure, only drawing references to Figure 13 will be included in this section. In a first action 3510 of the method, the host computer provides user data. In an optional subaction (not shown) the host computer provides the user data by executing a host application. In a second action 3520, the host computer initiates a transmission carrying the user data to the UE. The transmission may pass via the base station, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional third action 3530, the UE receives the user data carried in the transmission.

FIGURE 14 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 10 and Figure 11. For simplicity of the present disclosure, only drawing references to Figure 14 will be included in this section. In an optional first action 3610 of the method, the UE receives input data provided by the host computer. Additionally or alternatively, in an optional second action 3620, the UE provides user data. In an optional subaction 3621 of the second action 3620, the UE provides the user data by executing a client application. In a further optional subaction 3611 of the first action 3610, the UE executes a client application which provides the user data in reaction to the received input data provided by the host computer. In providing the user data, the executed client application may further consider user input received from the user. Regardless of the specific manner in which the user data was provided, the UE initiates, in an optional third subaction 3630, transmission of the user data to the host computer. In a fourth action 3640 of the method, the host computer receives the user data transmitted from the UE, in accordance with the teachings of the embodiments described throughout this disclosure.

FIGURE 15 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station such as an AP STA, and a UE such as a Non-AP STA, which may be those described with reference to Figure 10 and Figure 11. For simplicity of the present disclosure, only drawing references to Figure 15 will be included in this section. In an optional first action 3710 of the method, in accordance with the teachings of the embodiments described throughout this disclosure, the base station receives user data from the UE. In an optional second action 3720, the base station initiates transmission of the received user data to the host computer. In a third action 3730, the host computer receives the user data carried in the transmission initiated by the base station.

When using the word "comprise" or "comprising" it shall be interpreted as non-limiting, i.e. meaning "consist at least of".

The embodiments herein are not limited to the above described preferred embodiments. Various alternatives, modifications and equivalents may be used.

NUMBERED EMBODIMENTS

1. A method, performed by a node (601), such as a server, for training a compressive Neural Network, NN, encoder (601-1) for encoding CSI associated with a wireless channel between a radio access node (111) and a wireless communications device (121) in a wireless communications network (100), the method comprises:

Obtaining (610) a first metric and a second metric of the wireless channel;

Obtaining (611) a first and a second uncompressed CSI (xi, xj) as input to the compressive NN encoder based on the obtained first and second metrics of the wireless channel;

Calculating (612), with the compressive NN encoder and based on the first and second uncompressed CSI, a respective first and second compressed output value representing a first and second encoded CSI; calculating (613) a first distance between the first and second uncompressed CSIs in an input space; calculating (614) a second distance between the first and second compressed CSIs in an output space; calculating (615) a loss value for the compressive NN encoder based on the first distance and based on the second distance; and updating (616) trainable parameters of the compressive NN encoder based on the calculated loss value.

2. The method according to embodiment 1, wherein calculating the loss value for the compressive NN encoder based on the first distance comprises:

Selecting the first and second uncompressed CSI input values based on the first distance measure.

3. The method according to any of embodiments 1-2, wherein calculating the loss value is based on a loss function which is dependent on the first distance and/or on the second distance.