

Title:
METHOD FOR TRANSFORMING DATA AND RELATED DEVICE
Document Type and Number:
WIPO Patent Application WO/2023/224509
Kind Code:
A1
Abstract:
Embodiments of this application provide a method for transforming data and a related device. The method includes: obtaining an input image, wherein the input image includes N pixels, N is a positive integer; performing a nonlinear transformation on values of the N pixels to obtain N first pixel values; obtaining, according to a quantized model and the N first pixel values, M second pixel values, wherein M is a positive integer; performing a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values; determining, according to the M third pixel values, an output image. The proposed technique makes it possible to increase the quality of quantized models in which the data is unevenly distributed, while the inference time of the quantized models remains practically unchanged.

Inventors:
FILIPPOV ALEXANDER NIKOLAEVICH (CN)
SOLODSKIKH KIRILL IGOREVICH (CN)
CHIKIN VLADIMIR MAXIMOVICH (CN)
SONG DEHUA (CN)
KAMENEV STANISLAV YURYEVICH (CN)
ZHELAVSKAYA IRINA (CN)
Application Number:
PCT/RU2022/000165
Publication Date:
November 23, 2023
Filing Date:
May 19, 2022
Assignee:
HUAWEI TECH CO LTD (CN)
FILIPPOV ALEXANDER NIKOLAEVICH (CN)
International Classes:
G06N20/00; H04N19/42
Domestic Patent References:
WO2022006556A1, 2022-01-06
WO2004017286A2, 2004-02-26
Foreign References:
JP2019201255A, 2019-11-21
US20200098304A1, 2020-03-26
Attorney, Agent or Firm:
LAW FIRM "GORODISSKY & PARTNERS" LTD. (RU)
Claims:
CLAIMS

What is claimed is:

1. A method for transforming data, characterized in comprising: obtaining an input image, wherein the input image comprises N pixels, N is a positive integer; performing a nonlinear transformation on values of the N pixels to obtain N first pixel values; obtaining, according to a quantized model and the N first pixel values, M second pixel values, wherein M is a positive integer; performing a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values; and determining, according to the M third pixel values, an output image.

2. The method according to claim 1, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, comprises: performing a polynomial transformation on the values of the N pixels to obtain the N first pixel values.

3. The method according to claim 2, wherein a bounded degree of the polynomial transformation is less than 5.

4. The method according to claim 1, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, comprises: performing a gamma correction on the values of the N pixels to obtain the N first pixel values.

5. The method according to any one of claims 1-4, wherein a part or all of parameters of the nonlinear transformation are obtained by training, and wherein data used to train the parameters are used to train the quantized model.

6. The method according to claim 1, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, comprises: performing the nonlinear transformation, by looking up a first transformation table, on the values of the N pixels to obtain the N first pixel values.

7. The method according to any one of claims 1-6, wherein bitwidth of the quantized model is less than bitwidth of the input image.

8. An electronic device, characterized in comprising: an input unit, configured to obtain an input image, wherein the input image comprises N pixels, N is a positive integer; a processing unit, configured to perform a nonlinear transformation on values of the N pixels to obtain N first pixel values; the processing unit, further configured to obtain, according to a quantized model and the N first pixel values, M second pixel values, wherein M is a positive integer; the processing unit, further configured to perform a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values; and an output unit, configured to determine, according to the M third pixel values, an output image.

9. The device according to claim 8, wherein the processing unit is specifically configured to perform a polynomial transformation on the values of the N pixels to obtain the N first pixel values.

10. The device according to claim 9, wherein a bounded degree of the polynomial transformation is less than 5.

11. The device according to claim 8, wherein the processing unit is specifically configured to perform a gamma correction on the values of the N pixels to obtain the N first pixel values.

12. The device according to any one of claims 8-11, wherein a part or all of parameters of the nonlinear transformation are obtained by training, and wherein data used to train the parameters are used to train the quantized model.

13. The device according to claim 8, wherein the processing unit is specifically configured to perform the nonlinear transformation, by looking up a first transformation table, on the values of the N pixels to obtain the N first pixel values.

14. The device according to any one of claims 8-13, wherein bitwidth of the quantized model is less than bitwidth of the input image.

19. A computer readable storage medium, wherein the computer readable storage medium stores instructions, and when the instructions run on a server, the server is enabled to perform the method according to any one of claims 1 to 7.

20. A device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run the computer program, so that the device performs the method according to any one of claims 1 to 7.

21. A computer program product, wherein when the computer program product runs on a server, the server is enabled to perform the method according to any one of claims 1 to 7.

Description:
METHOD FOR TRANSFORMING DATA AND RELATED DEVICE

TECHNICAL FIELD

[0001] Embodiments of the present invention relate to the field of artificial intelligence technologies, and more specifically, to a method for transforming data and a related device.

BACKGROUND

[0002] In computer science, there are two main numerical types: integers and floating points. The floating points are often used in machine learning models. For example, input data of the machine learning models and parameters of the machine learning models, e.g., weights of the machine learning model and input data of inner model layers, are floating point values. However, integer arithmetic works faster on hardware than floating point arithmetic. Therefore, a quantization technique is proposed to accelerate inference as well as to reduce memory and power consumption on hardware.

[0003] The aim of the quantization is to transform floating point values into integer values. For example, input data of the machine learning models may be transform from floating point values into integer values, and the machine learning models may execute operations with integer values rather than floating point values. However, input data in various tasks have their own specifics, which makes quantization difficult.

SUMMARY

[0004] Embodiments of this application provide method for transforming data and a related device. The technical solution may improve the quality of data quantization.

[0005] According to a first aspect, an embodiment of this application provides a method for transforming data, wherein the method includes: obtaining an input image, wherein the input image includes N pixels, N is a positive integer; performing a nonlinear transformation on values of the N pixels to obtain N first pixel values; obtaining, according to a quantized model and the N first pixel values, M second pixel values, wherein M is a positive integer; performing a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values; determining, according to the M third pixel values, an output image.

[0006] The proposed data transformation method does not depend on the machine learning model or the quantization format and does not require model changes. It allows for efficient implementations, and the transformation of inputs and outputs has practically no effect on the inference time of the quantized model. Thus, the proposed technique makes it possible to increase the quality of quantized models in which the data is unevenly distributed, while the inference time of the quantized models remains practically unchanged.

[0007] In a possible design, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, includes: performing a polynomial transformation on the values of the N pixels to obtain the N first pixel values.

[0008] The polynomial transformation is an efficient nonlinear transformation method. According to the polynomial transformation, an electronic device may perform the nonlinear transformation more efficiently. Furthermore, the polynomial transformation may be performed by using Horner's method. The electronic device may perform the polynomial transformation using fused multiply-add (FMA) operations, making the nonlinear transformation more efficient and faster.

[0009] In a possible design, wherein a bounded degree of the polynomial transformation is less than 5.

[0010] A bounded degree of less than 5 may achieve a good balance between the calculation time of the polynomial transformation and the quality of the result.

[0011] In a possible design, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, includes: performing a gamma correction on the values of the N pixels to obtain the N first pixel values.

[0012] In a possible design, wherein a part or all of parameters of the nonlinear transformation are obtained by training, and wherein data used to train the parameters are used to train the quantized model.

[0013] According to this design, the nonlinear transformation and the reverse nonlinear transformation may be trained together with the quantized model. Therefore, the nonlinear transformation and the reverse nonlinear transformation may be more suitable for the quantized model, and may achieve a high quality result.

[0014] In a possible design, wherein the performing a nonlinear transformation on values of the N pixels to obtain N first pixel values, includes: performing the nonlinear transformation, by looking up a first transformation table, on the values of the N pixels to obtain the N first pixel values.

[0015] Although storing the first transformation table will consume the storage of the electronic device, it is the fastest nonlinear transformation compared with the polynomial transformation and the gamma-degamma correction. In a possible design, one of the nonlinear transformation or the reverse nonlinear transformation may be given by a lookup table and the other one may be given by the polynomial transformation/the gamma-degamma correction. For example, the forward transformation may be given by a lookup table, while the reverse transformation may be given by a polynomial. Any combination options are possible.

[0016] In a possible design, wherein bitwidth of the quantized model is less than bitwidth of the input image.

[0017] This technique can significantly increase the quality of quantized models in the tasks of quantization of models whose bitwidth is less than the bitwidth of the input data, for example, in the tasks of 8-bit full-int quantization of models that process 12-bit raw data.

[0018] According to a second aspect, an embodiment of this application provides an electronic device, and the electronic device has a function of implementing the method in the first aspect. The function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or the software includes one or more modules corresponding to the function.

[0019] According to a third aspect, an embodiment of this application provides a computer readable storage medium, including instructions. When the instructions run on a computer, the computer is enabled to perform the method in the first aspect or any possible implementation of the first aspect.

[0020] According to a fourth aspect, an electronic device is provided, including a processor, a memory, and a communications interface. The processor is connected to the memory and the communications interface. The memory is configured to store instructions, the processor is configured to execute the instructions, and the communications interface is configured to communicate with another network element under control of the processor. When the processor executes the instructions stored in the memory, the processor is enabled to perform the method in the first aspect or any possible implementation of the first aspect.

[0021] According to a fifth aspect, a chip system is provided, where the chip system includes a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run the computer program, so that a server on which the chip is disposed performs the method in the first aspect or any possible implementation of the first aspect.

[0022] According to a sixth aspect, a computer program product is provided, wherein when the computer program product runs on a server, the server is enabled to perform the method in the first aspect or any possible implementation of the first aspect.

DESCRIPTION OF DRAWINGS

[0023] FIG. 1 is a flowchart of an embodiment of the method for transforming data.

[0024] FIG. 2 shows a pipeline corresponding to the method for transforming data.

[0025] FIG. 3 shows the difference between a typical case of distribution of model input data and the result of the transformation proposed by the present application.

[0026] FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of this application.

[0027] FIG. 5 is a schematic block diagram of an electronic device according to an embodiment of this application.

[0028] FIG. 6 is a schematic diagram of a system architecture according to an embodiment of this application.

DESCRIPTION OF EMBODIMENTS

[0029] The following describes the technical solutions in this application with reference to the accompanying drawings.

[0030] In artificial intelligence (AI) models, using integer arithmetic may give a significant speed-up. But in general it is not easy to obtain integer values by scaling. Hardware is restricted by its number of bits (that is, the bitwidth of the hardware). For example, an 8-bit signed integer may represent 2^8 (or 256) integer values. If the bitwidth of the hardware is 8 bits, the hardware may support integer values from -128 to 127. If two floating point values (0.1 and 0.0001) are transformed into two integer values by scaling, and the scale parameter for transforming the two floating point values is 10000, then the result of the transformation is 1000 and 1. However, 1000 is more than 127, and the hardware cannot support this integer value. Therefore, the scale parameter 10000 cannot be used to transform the floating point values on this hardware.
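For illustration, the following is a minimal Python sketch of the scaling problem described in paragraph [0030]; all names are illustrative and not part of the application.

    # Sketch of the int8 scaling problem: a single scale parameter cannot
    # represent both 0.1 and 0.0001 within the signed 8-bit range.
    INT8_MIN, INT8_MAX = -128, 127

    def scale_to_int8(values, scale):
        """Scale floating point values and check them against the int8 range."""
        scaled = [round(v * scale) for v in values]
        for s in scaled:
            if not INT8_MIN <= s <= INT8_MAX:
                raise OverflowError(f"{s} does not fit into a signed 8-bit integer")
        return scaled

    # scale_to_int8([0.1, 0.0001], 10000) raises OverflowError (1000 > 127),
    # while scale_to_int8([0.1, 0.0001], 1000) returns [100, 0] and loses 0.0001.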

[0031] The procedure of searching for scale parameters and tuning parameters of the AI models to obtain integers is called quantization. The AI models mentioned in the present application may include machine learning models, deep learning models, neural network models, deep neural network (DNN) models, or the like. Quantization is a powerful method for acceleration and compression of the AI models. Full-int quantization of AI models implies the use of exclusively integer arithmetic in computing the AI models, and therefore requires quantization of the first and last model layers, as well as the model input. For current hardware, using different arithmetic for different network parts entails significant overhead during model calculation, and therefore full-int quantized models are much more efficient than quantized models in which some parts are calculated at a larger bitwidth, for example, 16 or 32 bits (even if only the first or last layer). However, the need to quantize absolutely all layers often leads to a deterioration in the quality of quantization. Thus, due to its efficiency, full-int quantization has high business value, but the significant quality degradation of such quantized models complicates the use of full-int quantization.

[0032] The distribution of input data of the AI models is often uneven, especially in some computer vision tasks. In some deep learning tasks, the distribution of the input data, and at the same time the distribution of the inputs to the inner layers, is non-uniform, and most of the data values can be concentrated in certain areas. This fact is often a serious obstacle to existing methods of traditional uniform quantization of the AI models, especially when the bitwidth of the quantized model is less than the bitwidth of the original input data. Data in various tasks have their own specifics, and may have a special distribution that makes quantization difficult. For example, raw data in denoising tasks use 12 bits for representation, and their distributions are often concentrated around zero, which significantly degrades the quality of traditional 8-bit quantization of models in such tasks. Also, for example, the color distribution of night urban photos has a peak around zero and a very long tail. Due to the large quantization error of the input data, as well as of the inputs of the inner model layers, the quality of the final quantized model can significantly decrease. A special type of quantization is full-integer quantization: it has high business value, since it allows the most effective implementations of quantized networks, but it suffers the most from the described problem.

[0033] The present application provides a novel technique for network quantization, which makes it possible to reduce the impact of the described issue. More specifically, the present application proposes a new data transformation method for quantized models that makes the distribution of the inputs of the first and other model layers closer to the uniform distribution. This reduces the data quantization error, and as a result the inputs of the model layers become much easier to quantize.

[0034] FIG. 1 is a flowchart of an embodiment of a method for transforming data.

[0035] 101, An electronic device obtains an input image. The input image includes N pixels, wherein N is a positive integer.

[0036] In some cases, the electronic device may be a computer. For example, the electronic device may be a personal computer, a laptop, a mobile phone, a tablet computer, a virtual reality device, and so on.

[0037] In some other cases, the electronic device may be a component in the computer. For example, the electronic device may be a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a system on chip (SoC), and so on.

[0038] The input image may be a picture, a part of a picture, a frame of a video, a part of a frame, and so on.

[0039] 102, The electronic device performs a nonlinear transformation on values of the N pixels to obtain N first pixel values.

[0040] The N first pixel values are in one-to-one correspondence with the N pixels. In other words, the electronic device performs the nonlinear transformation on the value of the n-th pixel among the N pixels to obtain the n-th first pixel value among the N first pixel values, n = 1, ..., N.

[0041] 103, The electronic device obtains M second pixel values according to a quantized model and the N first pixel values. M is a positive integer.

[0042] The only restriction on the model mentioned in step 103 is that the model is a quantized model. The quantized model may be a full-int quantized model or any other type of quantized model. The present application does not limit the functionality of the quantized model or the computer vision task performed by using the quantized model. In other words, the quantized model may be any model used for a computer vision task. For example, the quantized model may be a model for image enhancement, a model for image compression, a model for image segmentation, or the like.

[0043] The value of M may be determined according to the task of the quantized model. For example, if the quantized model is used for denoising the input image, M may be equal to N; if the quantized model is used for increasing resolution of the input image, M may be greater than N; if the quantized model is used for extracting a specific element (e.g., a person, a license plate, and the like) in the input image, M may be less than N.

[0044] 104, The electronic device performs a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values.

[0045] 105, The electronic device determines an output image according to the M third pixel values.

[0046] For example, if the quantized model is used for increasing the resolution of the input image, the output image may be a clearer image; if the quantized model is used for extracting a specific element, the output image may be an image of the extracted specific element.

[0047] The method shown in FIG. 1 may be regarded as a pipeline. FIG. 2 shows the pipeline corresponding to the method shown in FIG. 1.

[0048] As shown in FIG. 2, a pipeline 200 includes three units: an input unit 210, a processing unit 220, and an output unit 230.

[0049] The input unit 210 is configured to obtain the input image.

[0050] The processing unit 220 includes three subunits: a nonlinear transformation subunit 221, a quantized model subunit 222, and a reverse nonlinear transformation subunit 223.

[0051] The nonlinear transformation subunit 221 is configured to perform the nonlinear transformation on values of the N pixels of the input image to obtain N first pixel values.

[0052] The nonlinear transformation subunit 221 is used to perform the nonlinear transformation on each value of the N pixels to obtain the N first pixel values. The nonlinear transformation does not change the numerical type of the inputs of the nonlinear transformation subunit 221. In other words, if the inputs of the nonlinear transformation subunit 221 are floating point numbers, the outputs of the nonlinear transformation subunit 221 will be floating point numbers as well. Therefore, the numerical type of the N first pixel values and of the N pixels' values is floating point.

[0053] The quantized model subunit 222 is configured to process the N first pixel values to obtain M second pixel values.

[0054] The quantized model subunit 222 may first quantize the input data (that is, the N first pixel values), then use the quantized model to process the quantized input data, and finally dequantize the output of the quantized model to obtain the output data (that is, the M second pixel values). Therefore, the work of the quantized model subunit 222 may be summarized as: quantization, quantized model processing, and dequantization (reverse quantization).
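For illustration, the following is a minimal Python sketch of the quantization, quantized model processing, and dequantization flow of the quantized model subunit 222, assuming one possible scheme (uniform affine quantization to uint8); the application itself does not fix the quantization method, and run_integer_model is an illustrative stand-in for the integer-only model.

    import numpy as np

    def quantize(x, scale, zero_point):
        # Uniform affine quantization of floating point data to uint8.
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        # Reverse quantization of integer data back to floating point.
        return (q.astype(np.float32) - zero_point) * scale

    def quantized_model_subunit(first_pixel_values, run_integer_model,
                                in_scale, in_zp, out_scale, out_zp):
        q_in = quantize(first_pixel_values, in_scale, in_zp)   # quantization
        q_out = run_integer_model(q_in)                        # integer-only inference
        return dequantize(q_out, out_scale, out_zp)            # dequantization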

[0055] As described above, the nonlinear transformation subunit 221 does not transform the floating point numbers into integer numbers. Therefore, the transformation of the numerical type of the N first pixel values may be performed by the quantized model subunit 222. The quantized model subunit 222 may first convert the N first pixel values from floating point numbers into integer numbers (that is, the quantization). For ease of description, the result of the quantization is referred to as N first integer values. The N first integer values are in one-to-one correspondence with the N first pixel values. Each of the N first integer values is obtained according to the corresponding first pixel value. In other words, the quantized model subunit 222 transforms the n-th first pixel value among the N first pixel values into the n-th first integer value among the N first integer values. The key to the present application is how to transform the values of the input image to get the inputs of the quantized model subunit 222 (that is, how the nonlinear transformation subunit 221 works) and how to transform the outputs of the quantized model subunit 222 to get the output image (that is, how the reverse nonlinear transformation subunit 223 works). Therefore, there is no limitation on the method for quantizing/dequantizing the floating point numbers in the present application. In other words, the quantized model subunit 222 may perform full-int quantization or any other type of quantization on the N first pixel values to obtain the N first integer values.

[0056] After the quantization, the quantized model subunit 222 may use the quantized model to process the N first integer values. As described above, there is no limitation on the function of the quantized model in the present application. The N first integer values are inputs of the quantized model, and outputs of the quantized model are M second integer values.

[0057] After obtaining the M second integer values, the quantized model subunit 222 may transform the M second integer values into M floating point values (that is, the M second pixel values). Similarly, the M second integer values are in one-to-one correspondence with the M second pixel values. Each of the M second pixel values is obtained according to the corresponding second integer value. In other words, the quantized model subunit 222 transforms the m-th second integer value among the M second integer values into the m-th second pixel value among the M second pixel values, wherein m = 1, ..., M.

[0058] The reverse nonlinear transformation subunit 223 is configured to perform the reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values. Similarly, the reverse transformation performed by the reverse nonlinear transformation subunit 223 does not change the numerical type of the M second pixel values.

[0059] The output unit 230 is configured to determine the output image according to the M third pixel values.

[0060] The distribution of machine learning model inputs is often uneven, especially in some computer vision tasks. In some machine learning tasks, the distribution of the input data, and at the same time the distribution of the inputs to the inner layers, is non-uniform, and most of the data values can be concentrated in certain areas. This fact is often a serious obstacle to existing methods of traditional uniform quantization of machine learning models, especially when the bitwidth of the quantized models is less than the bitwidth of the original input data. Data in various tasks have their own specifics, and may have a special distribution that makes quantization difficult. For example, raw data in denoising tasks use 12 bits for representation, and their distributions are often concentrated around zero, which significantly degrades the quality of traditional 8-bit quantization of models in such tasks. Also, for example, the color distribution of night urban photos has a peak around zero and a very long tail. Due to the large quantization error of the input data, as well as of the inputs of the inner model layers, the quality of the final quantized models can significantly decrease.

[0061] The method shown in FIG. 1 or the pipeline shown in FIG. 2 provides a novel technique for transforming the input data of the quantized model, which makes it possible to reduce the impact of the described issue. The new data transformation and de-transformation method (pipeline) for computing quantized models makes the distribution of the inputs of the first and other model layers closer to the uniform distribution. This reduces the data quantization error, and as a result the inputs of the model layers become much easier to quantize.

[0062] The technical solution of the present application uses non-linear transformations of input and output data aimed at improving the quality of data quantization. For convenience, the proposed technical solution of the present application may be called Non-linear Data Transformations for Quantization (NDTQ), and the pipeline shown in FIG. 2 may be called an NDTQ pipeline.

[0063] FIG. 3 shows the difference between a typical case of distribution of model input data and the result of the transformation proposed by the present application.

[0064] The left graph in FIG. 3 presents a distribution of original input data that takes values from 0 to 1. Most of the data are clustered around zero and only a few pixels have values exceeding half of the dynamic range (greater than 0.5). Hence, if one selects a quantization range equal to [0, 1.0], then most of the values will be quantized with a low degree of approximation; if, on the other hand, one shrinks the range down to [0, 0.5], then the pixels belonging to that range will be quantized more precisely, but strong, unacceptable saturation of bright pixels will occur: the clipping function (during quantization) will cut the data without an opportunity to recover the lost information. Quantization of the inputs to the inner layers of a machine learning model may face the same problem, especially when the machine learning model is used for an image-to-image task. For that type of task, the input image is transferred through the model by skip connections. Consequently, the distribution of the feature maps of such layers will be close to the input one.
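For illustration, the following is a minimal Python sketch of the range trade-off described in paragraph [0064], assuming an 8-bit uniform quantizer; the values are illustrative.

    import numpy as np

    def quantize_to_range(x, lo, hi, levels=256):
        # Uniformly quantize x to the range [lo, hi] and return the
        # dequantized approximation.
        step = (hi - lo) / (levels - 1)
        q = np.round((np.clip(x, lo, hi) - lo) / step)
        return lo + q * step

    x = np.array([0.01, 0.02, 0.05, 0.8])      # mostly dark, one bright pixel
    wide = quantize_to_range(x, 0.0, 1.0)      # coarse steps (~0.0039) for dark values
    narrow = quantize_to_range(x, 0.0, 0.5)    # finer steps, but 0.8 is clipped to 0.5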

[0065] The present application proposes to use transformations of the input and output of machine learning models, which make better utilization of the whole data range (see the right graph in FIG. 3). The proposed data transformations are tuned automatically during the training stage and/or the inference stage of quantized models, which makes it possible to obtain and use optimal data transformations for each quantization task. Moreover, the proposed data transformations allow for efficient implementation. Thus, the proposed technique makes it possible to increase the quality of quantized models in computer vision tasks in which the data is unevenly distributed, and at the same time the inference time of quantized models practically does not change.

[0066] The proposed data transformation method may be compatible with any quantization techniques and any training method, and the proposed data transformation method does not depend on the machine learning model and quantization format and does not require model changes. The proposed data transformation method allows for effective implementations. The proposed data transformation of inputs and outputs will practically not change the inference time of the quantized model. Further, the proposed data transformation method demonstrates a serious impact in quantization tasks. This technique can significantly increase the quality of quantized models in the tasks of quantization of models whose bitwidth is less than the bitwidth of the input data, for example, in the tasks of 8-bit full-int quantization of models that process 12-bit raw data. The proposed technique makes the use of effective low-bit full-int quantization more accessible.

[0067] The forward transformation (that is the nonlinear transformation performed by the nonlinear transformation subunit 221) makes the distribution of inputs of the first and other model layers closer to the uniform distribution, and as a result, the inputs to model layers become much easier to quantize using the uniform quantization. The reverse transformation (that is the reverse nonlinear transformation performed by the reverse nonlinear transformation subunit 223) returns back to the desired space of the model output.

[0068] As described above, the key to the present application is how the nonlinear transformation subunit 221 and the reverse nonlinear transformation subunit 223 work. The nonlinear transformation and the reverse nonlinear transformation are described in more detail below.

[0069] In some embodiments, the nonlinear transformation subunit 221 may perform a polynomial transformation on the values of the N pixels of the input image. In other words, the nonlinear transformation performed by the nonlinear transformation subunit 221 is the polynomial transformation.

[0070] Formula 1 shows a normal form of the polynomial transformation.

[0071] P(x) = a₀ + a₁·x + a₂·x² + ... + aₖ·xᵏ (formula 1)

[0072] where a₀, a₁, a₂, ..., aₖ are parameters of the polynomial transformation, k is a bounded degree of the polynomial transformation, and k is a positive integer.

[0073] Corresponding to the present application, the nonlinear transformation subunit 221 may use formula 2 to obtain the n-th first pixel value among the N first pixel values:

[0074] P(xₙ) = a₀ + a₁·xₙ + a₂·xₙ² + ... + aₖ·xₙᵏ (formula 2)

[0075] where xₙ is the value of the n-th pixel among the N pixels, P(xₙ) is the n-th first pixel value among the N first pixel values, a₀, a₁, a₂, ..., aₖ are parameters of the polynomial transformation, and k is a bounded degree of the polynomial transformation.

[0076] The bounded degree of the polynomial transformation (that is, k in formulas 1 and 2) is related to the calculation time of the polynomial transformation and the quality of the result of the computer vision task for the input image. In general, if the value of the bounded degree is larger, the quality of the result will be better, but the calculation time will be longer; if the value of the bounded degree is smaller, the calculation time will be shorter, but the quality of the result will be worse. Thus, the value of the bounded degree may be selected to achieve a good balance between the calculation time of the polynomial transformation and the quality of the result; for example, the value of the bounded degree may be less than 5 (that is, k < 5).

[0077] In some embodiments, there is an efficient implementation of the polynomial transformation that uses fused multiply-add (FMA) operations. An FMA operation has the form d = a + b·c. Therefore, the electronic device may calculate values of the polynomial according to Horner's method. According to Horner's method, formula 1 may be represented as:

[0078] P(x) = a₀ + x·(a₁ + x·(a₂ + ... + x·(aₖ₋₁ + x·aₖ)...)) (formula 3)
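For illustration, the following is a minimal Python sketch of Horner's method for formula 3; the coefficient values are illustrative.

    def horner(coeffs, x):
        # coeffs = [a0, a1, ..., ak]; returns a0 + a1*x + ... + ak*x**k.
        # Each step has the FMA form d = a + b*c, so on supporting hardware
        # one fused multiply-add per coefficient evaluates the polynomial.
        acc = coeffs[-1]
        for a in reversed(coeffs[:-1]):
            acc = a + acc * x
        return acc

    # Example: a 4th-degree polynomial (k = 4), within the bound of [0076]:
    # horner([0.0, 2.0, -1.5, 0.6, -0.1], 0.25)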

[0079] In some embodiments, the nonlinear transformation subunit 221 may perform a gamma correction on the values of the N pixels of the input image. In other words, the nonlinear transformation performed by the nonlinear transformation subunit 221 is the gamma correction.

[0080] Formula 4 shows a normal form of the gamma correction.

[0081] Gamma(x) = (a·x + b)^(1/p) (formula 4)

[0082] where p, a, and b are parameters of the gamma correction.

[0083] Corresponding to the present application, the nonlinear transformation subunit 221 may use formula 5 to obtain the n-th first pixel value among the N first pixel values:

[0084] Gamma(xₙ) = (a·xₙ + b)^(1/p) (formula 5)

[0085] where xₙ is the value of the n-th pixel among the N pixels, Gamma(xₙ) is the n-th first pixel value among the N first pixel values, and p, a, and b are parameters of the gamma correction.
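For illustration, the following is a minimal Python sketch of the gamma correction of formula 5; the parameter values are illustrative (in practice p, a, and b are preset or obtained by training, as described below).

    import numpy as np

    def gamma_transform(pixels, p, a, b):
        # Formula 5: Gamma(x_n) = (a*x_n + b)^(1/p), applied elementwise.
        return (a * pixels + b) ** (1.0 / p)

    x = np.array([0.0, 0.04, 0.25, 1.0])
    gamma_transform(x, p=2.0, a=1.0, b=0.0)  # square root: [0.0, 0.2, 0.5, 1.0]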

[0086] In some embodiments, the nonlinear transformation subunit 221 may perform the nonlinear transformation, by looking up a first transformation table, on the values of the N pixels to obtain the N first pixel values. Table 1 is an example of the first transformation table.

Table 1

[0087] The nonlinear transformation subunit 221 may find the results of the nonlinear transformation according to Table 1. For example, according to Table 1, if the value of a pixel is 0.25, then the nonlinear transformation result for that pixel is 0.5.

[0088] The first transformation table may be determined according to an arithmetic transformation, e.g., the polynomial transformation, the gamma correction, or the like.
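For illustration, the following is a minimal Python sketch of a first transformation table built from an arithmetic transformation; the square root and the 1024-entry resolution are illustrative choices (the square root is consistent with the 0.25 -> 0.5 example above).

    import numpy as np

    TABLE_SIZE = 1024
    grid = np.linspace(0.0, 1.0, TABLE_SIZE)
    first_transformation_table = np.sqrt(grid)  # table built from sqrt

    def lut_transform(pixels):
        # Map each pixel value in [0, 1] to the nearest table entry.
        idx = np.round(pixels * (TABLE_SIZE - 1)).astype(np.int64)
        return first_transformation_table[idx]

    # lut_transform(np.array([0.25])) -> approximately array([0.5])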

[0089] The reverse nonlinear transformation subunit 223 performs a reverse nonlinear transformation corresponding to the nonlinear transformation performed by the nonlinear transformation subunit 221.

[0090] In some embodiments, the reverse nonlinear transformation performed by the reverse nonlinear transformation subunit 223 may be an exact reverse transformation of the corresponding nonlinear transformation. The exact reverse transformation means that all parameters used in the reverse nonlinear transformation are the same as the parameters used in the corresponding nonlinear transformation. Take a Gamma-Degamma correction as an example: the nonlinear transformation subunit 221 uses the following formula to perform the gamma correction:

[0091] Gamma(x) = (a₁·x + b₁)^(1/p₁) (formula 6)

[0092] where p₁, a₁, and b₁ are parameters of the gamma correction. The reverse nonlinear transformation subunit 223 may use the following formula to perform the degamma correction:

[0093] Degamma(x) = (x^p₂ - b₂) / a₂ (formula 7)

[0094] where p₂, a₂, and b₂ are parameters of the degamma correction. If the reverse nonlinear transformation is the exact reverse transformation, then p₁ = p₂, a₁ = a₂, and b₁ = b₂.
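For illustration, the following is a minimal Python sketch of an exact reverse pair (formulas 6 and 7) with shared, illustrative parameters (p₁ = p₂, a₁ = a₂, b₁ = b₂).

    def gamma(x, p, a, b):
        # Formula 6: Gamma(x) = (a*x + b)^(1/p).
        return (a * x + b) ** (1.0 / p)

    def degamma(y, p, a, b):
        # Formula 7: Degamma(y) = (y^p - b) / a, the exact inverse of gamma.
        return (y ** p - b) / a

    p, a, b = 2.2, 1.0, 0.0
    x = 0.37
    round(degamma(gamma(x, p, a, b), p, a, b), 12) == x  # True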

[0095] In some embodiments, the reverse nonlinear transformation may be an inexact reverse transformation of the corresponding nonlinear transformation. The inexact reverse transformation means that the method used for the reverse nonlinear transformation corresponds to the method used for the nonlinear transformation, but the parameters of the reverse nonlinear transformation may be different from the parameters of the nonlinear transformation. For example, if the nonlinear transformation subunit 221 performs the gamma correction, the reverse nonlinear transformation subunit 223 will perform the degamma correction, but the parameters for the degamma correction may be different from the parameters for the gamma correction. Taking formulas 6 and 7 as an example, if the reverse nonlinear transformation is the inexact reverse transformation, then p₁ ≠ p₂, a₁ ≠ a₂, or b₁ ≠ b₂. For the polynomial transformation, the bounded degree of the nonlinear transformation may be different from that of the reverse nonlinear transformation. For example, the bounded degree of the nonlinear transformation may be 4 whereas the bounded degree of the reverse transformation corresponding to the nonlinear transformation may be 3.

[0096] In some embodiments, the reverse transformation can be not only an exact or inexact reverse of the nonlinear transformation, but also absolutely independent of the nonlinear transformation. This means that the reverse transformation can not only have its own parameters, independent of the nonlinear transformation, but can also have an absolutely independent format. For example, the forward transformation can be given by a lookup table, while the reverse transformation can be given by a polynomial. Any combination options are possible.

[0097] It may be understood that the above-mentioned nonlinear transformation methods are only examples of the nonlinear transformation rather than limitations. The nonlinear transformation may have other forms. For example, the nonlinear transformation subunit 221 may use an AI model to transform the values of the pixels of the input image.

[0098] In some embodiments, the parameters of the nonlinear transformation (i.e., a₀ to aₖ in formula 1, and p, a, and b in formula 4) may be obtained by training. For example, the parameters of the nonlinear transformation may be trained together with the quantized model. In other words, a training system may train the nonlinear transformation and the quantized model on training inputs from a training data repository to determine the parameters of the nonlinear transformation and the parameters of the quantized model. Therefore, at the training stage of the quantized model, the parameters of the nonlinear transformation are variable. The nonlinear transformation at the training stage may be regarded as a trainable data transformation. This makes it possible to improve the final quality of the quantized model. Once the training is complete, the parameters of the nonlinear transformation are fixed, and the nonlinear transformation with fixed parameters may be regarded as a fixed data transformation. At the inference stage, the fixed data transformation may be used to obtain the inputs of the quantized model. Correspondingly, the parameters of the reverse transformation corresponding to the nonlinear transformation may be trained together with the parameters of the nonlinear transformation and the quantized model.
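For illustration, the following is a minimal Python sketch (assuming the PyTorch framework, which the application does not prescribe) of training the gamma parameters jointly with a model; the quantization-aware details (straight-through estimators, regularization, and so on) are omitted, and the model is an illustrative stand-in.

    import torch

    class TrainableGamma(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.p = torch.nn.Parameter(torch.tensor(2.0))  # trainable power
            self.a = torch.nn.Parameter(torch.tensor(1.0))
            self.b = torch.nn.Parameter(torch.tensor(0.0))

        def forward(self, x):
            # Formula 4 with trainable parameters; clamp keeps the base positive.
            return (self.a * x + self.b).clamp(min=1e-8) ** (1.0 / self.p)

    transform = TrainableGamma()
    model = torch.nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the quantized model
    opt = torch.optim.Adam(list(transform.parameters()) + list(model.parameters()))

    x, target = torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8)
    loss = torch.nn.functional.mse_loss(model(transform(x)), target)
    loss.backward()
    opt.step()  # transformation parameters and model weights update together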

[0099] In an embodiment, all of the parameters of the nonlinear transformation may be obtained by training. In another embodiment, part of the parameters of the nonlinear transformation are obtained by training. For example, the power of the nonlinear transformation (i.e., the parameter p in formula 4) may be obtained by training, and other parameters of the nonlinear transformation may be preset numbers.

[00100] An example of a computer vision problem is the Raw Joint Denoising-Demosaicking (JDD) task. For the Raw JDD task, the input data have a non-uniform distribution and are difficult to quantize (even to 8 bits). In the case of Raw JDD, sufficiently light values could be quantized with less precision, because noise and color change in light areas are not visible to the human eye, while values around zero (darker areas) could be quantized with higher precision, because the noise from quantization in the dark areas is much more noticeable.

[00101] In the general case, the proposed nonlinear transformations can be arbitrary, and their polynomial approximations can be used for efficiency at the inference stage. In the case of quantization of Raw JDD models, the nonlinear transformations performed on the inputs of the quantized model make the input distribution wider around zero. Square-root-like (SQRT-like) functions (that is, the Gamma-Degamma correction) help to quantize dark areas more precisely than light areas. As a result, in the following experiments the baseline full 8-bit quantization is compared with quantization using non-trainable square root NDTQ, as well as with quantization using polynomial trainable NDTQ that approximates SQRT-like functions. In order to get the maximum effect from NDTQ, it is sufficient to use 4th-degree polynomials for the forward transformation and 3rd-degree polynomials for the reverse transformation. Further, keeping the bias parameters at zero in the forward and reverse NDTQ formulas leads to an increase in quality.

[00102] The operation of the proposed NDTQ pipeline (that is, the pipeline shown in FIG. 2) is demonstrated using the example of two Raw JDD models: the AIRAW1 and AIRAW2 models. In these experiments, full 8-bit quantization of these models in the SNPE format is performed.

[00103] Quantized models are trained by using straight-through estimators and smooth regularization of weights and activations for quantization. However, one can use any other approach to quantization-aware training. The training method should be chosen depending on the task, and the proposed technique is compatible with any training method. To evaluate the contribution of the proposed technique, the same training protocol is used to train all quantized models.

[00104] At the beginning of training of the baseline quantized model, the baseline quantized model is initialized with a pre-trained full precision model without NDTQ, and at the beginning of training of the quantized model with NDTQ, the quantized model is initialized with a pre-trained full precision model with fixed (square root) non-trainable NDTQ. The pre-trained full precision models are trained for the same amount of time and with the same training configurations, and have the same quality.

[00105] As a quality metric, PSNR of quantized models relative to the original full precision model on RGB images from the test datasets is measured. Also, examples of output images of quantized models are demonstrated.

[00106] The conducted experiments demonstrate that the proposed NDTQ technique improves the quality of Raw JDD full 8-bit quantized models.

[00107] The AIRAW1 model consists of 24 convolutions. Using the example of this model, the results of the baseline full 8-bit quantization are compared with those of full 8-bit quantization with non-trainable square root NDTQ, as well as with trainable polynomial NDTQ.

[00108] It is observed that using NDTQ improves the quality of the full 8-bit quantized model, in particular, the noise level is reduced, and the details and color are improved.

[00109] The PSNRs corresponding to the different full 8-bit quantized models are shown in Table 2.

Table 2

[00110] The AIRAW2 model consists of 30 convolutions. This model solves a more complex JDD task than the AIRAW1 model. The result of the baseline full 8-bit quantization is compared with that of the full 8-bit quantization with polynomial trainable NDTQ.

[00111] The perceptual quality of the quantized model with NDTQ exceeds the perceptual quality of the baseline quantized model; in particular, the noise level is reduced, and the details and color are improved. PSNRs corresponding to the different full 8-bit quantized models are shown in Table 3.

Table 3

[00112] FIG. 4 is a schematic block diagram of an electronic device 400 according to an embodiment of this application. As shown in FIG. 4, the electronic device 400 includes: an input unit 401, a processing unit 402, and an output unit 403.

[00113] The input unit 401 is configured to obtain an input image, wherein the input image includes N pixels, N is a positive integer.

[00114] The processing unit 402 is configured to perform a nonlinear transformation on values of the N pixels to obtain N first pixel values;

[00115] The processing unit 402 is further configured to obtain, according to a quantized model and the N first pixel values, M second pixel values, wherein M is a positive integer;

[00116] The processing unit 402 is further configured to perform a reverse transformation corresponding to the nonlinear transformation on the M second pixel values to obtain M third pixel values;

[00117] The output unit 403 is configured to determine, according to the M third pixel values, an output image.

[00118] It should be understood that the electronic device 400 in this embodiment of this application may correspond to the electronic device in the above-mentioned embodiments, and the foregoing and other management operations and/or functions of the units in the electronic device are separately used to implement corresponding steps of the foregoing methods. For brevity, details are not described herein again.

[00119] As shown in FIG. 5, an electronic device 500 may include a transceiver 501, a processor 502, and a memory 503. The memory 503 may be configured to store code, instructions, and the like executed by the processor 502.

[00120] It should be understood that the processor 502 may be an integrated circuit chip and has a signal processing capability. In an implementation process, steps of the foregoing method embodiments may be completed by using a hardware integrated logic circuit in the processor, or by using instructions in a form of software. The processor may be a general purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), a system on chip (SoC) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, the steps, and the logical block diagrams that are disclosed in the embodiments of the present invention. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of the present invention may be directly performed and completed by a hardware decoding processor, or may be performed and completed by using a combination of hardware in the decoding processor and a software module. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the foregoing methods in combination with hardware in the processor.

[00121] It may be understood that the memory 503 in the embodiments of the present invention may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM) and is used as an external cache. By way of example rather than limitation, many forms of RAMs may be used, and are, for example, a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchronous link dynamic random access memory (Synchronous link DRAM, SLDRAM), and a direct rambus random access memory (Direct Rambus RAM, DR RAM).

[00122] It should be noted that the memory in the systems and the methods described in this specification includes but is not limited to these memories and a memory of any other appropriate type.

[00123] Referring to FIG. 6, an embodiment of the present application provides a system architecture 600. As shown in the system architecture 600, a data collection device 660 is configured to collect training data. In this embodiment of this application, the training data includes one or more images (namely, training samples) and real results corresponding to the one or more images. The training data may be stored into a database 630. A training device 620 may obtain a quantized model 601, the parameters of the nonlinear transformation, and the parameters of the reverse nonlinear transformation through training based on the training data maintained in the database 630. Details about the training procedure may be found in the foregoing embodiments and are not described herein again. It should be noted that, in actual application, the training data maintained in the database 630 is not necessarily all collected by the data collection device 660, and may be received from another device. In addition, it should be noted that the training device 620 does not necessarily perform training completely based on the training data maintained in the database 630 to obtain the target model/rule 601, and may obtain training data from a cloud or another place to perform model training. The foregoing description shall not be construed as a limitation on this embodiment of this application.

[00124] The target model/rule 601 obtained by the training device 620 through training may be applied to different systems or devices, for example, applied to an execution device 610 shown in FIG. 6. The execution device 610 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR) device, a virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or the like. In FIG. 6, an I/O interface 612 is configured on the execution device 610 and is configured to exchange data with an external device. A user may input data into the I/O interface 612 by using a customer device 640. In this embodiment of this application, the input data may include an input image. The input image may be an image collected by the execution device 610 by using the data collection device 660, may be an image in the database 630, or may be an image from the customer device 640.

[00125] A preprocessing module 613 is configured to perform preprocessing based on the input data (for example, the input image) received by the I/O interface 612. In this embodiment of this application, the preprocessing module 613 may be configured to implement one or more of the following operations: the nonlinear transformation, the reverse nonlinear transformation, and the like; and is further configured to implement another preprocessing operation.

[00126] In a related processing procedure in which the execution device 610 preprocesses the input data or a calculation module 611 of the execution device 610 performs calculation, the execution device 610 may invoke data, code, and the like in a data storage system 650 to implement corresponding processing, and may also store, into the data storage system 650, data, an instruction, and the like obtained through corresponding processing.

[00127] Finally, the I/O interface 612 returns a processing result, for example, the foregoing obtained image processing result, to the customer device 640, to provide the processing result for the user.

[00128] It should be noted that the training device 620 may obtain, through training based on different training data, corresponding quantized models 601 for different targets that are alternatively referred to as different tasks. The corresponding quantized models 601 may be used to implement the foregoing targets or complete the foregoing tasks, to provide a required result for the user.

[00129] In a case shown in FIG. 6, the user may manually provide the input data. The input data may be manually provided by using a screen provided on the I/O interface 612. In another case, the customer device 640 may automatically send the input data to the I/O interface 612. If the customer device 640 needs to obtain authorization from the user to automatically send the input data, the user may set corresponding permission on the customer device 640. The user may view, on the customer device 640, a result output by the execution device 610. Specifically, the result may be displayed or may be presented in a form of sound, an action, or the like. The customer device 640 may also be used as a data collection end to collect, as shown in the figure, the input data that is input into the I/O interface 612 and the output result that is output from the I/O interface 612, use the input data and the output result as new sample data, and store the new sample data into the database 630. Certainly, alternatively, the customer device 640 may not perform collection, and the I/O interface 612 directly stores, into the database 630 as new sample data, the input data that is input into the I/O interface 612 and the output result that is output from the I/O interface 612, as shown in the figure.

[00130] It should be noted that FIG. 6 is merely a schematic diagram of a system architecture provided in an embodiment of the present application. A location relationship between a device, a component, a module, and the like shown in the figure constitutes no limitation. For example, in FIG. 6, the data storage system 650 is an external memory relative to the execution device 610. In another case, the data storage system 650 may be alternatively disposed in the execution device 610.

[00131] An embodiment of this application further provides a system chip, where the system chip includes an input/output interface, at least one processor, at least one memory, and a bus. The at least one memory is configured to store instructions, and the at least one processor is configured to invoke the instructions of the at least one memory to perform operations in the methods in the foregoing embodiments.

[00132] An embodiment of this application further provides a computer storage medium, where the computer storage medium may store a program instruction for performing any of the foregoing methods.

[00133] Optionally, the storage medium may be specifically the memory 503.

[00134] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.

[00135] It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiment. Details are not described herein again.

[00136] In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

[00137] The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.

[00138] In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

[00139] When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer readable storage medium. Based on such an understanding, the technical solutions in this application essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

[00140] The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.