Title:
EXTRACTING FEATURES FROM SENSOR DATA
Document Type and Number:
WIPO Patent Application WO/2022/157230
Kind Code:
A4
Abstract:
An encoder is trained together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets in a sequence of multiple training steps. Each training step comprises: in a first phase of the training step, updating the set of encoder parameters based on the unannotated sensor data sets, with the aim of optimizing a self-supervised loss function, without updating the set of task-specific parameters of the perception component, and in a second phase of the training step, updating the set of task-specific parameters based on the annotated sensor data sets, with the aim of optimizing a task-specific loss function, wherein the encoder as updated in the first phase of that training step processes a data representation of each annotated sensor data set to extract features therefrom, wherein the perception component processes the extracted features to compute an output therefrom, and wherein the task-specific loss is defined on the output and the associated annotation for each annotated sensor data set for learning a desired perception task. In performing the sequence of multiple training steps, the method alternates repeatedly between the first phase and the second phase.
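
For orientation, the following PyTorch-style sketch shows one way the alternating schedule described above could look in code. All names here (encoder, head, the data loaders and the two loss callables) are hypothetical placeholders, and this variant applies a single update per phase with the encoder frozen in the second phase; it is a minimal illustration under those assumptions, not the application's implementation.

    import itertools
    import torch

    def train(encoder, head, unannotated_loader, annotated_loader,
              self_supervised_loss, task_loss, num_steps):
        enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
        head_opt = torch.optim.Adam(head.parameters(), lr=1e-4)
        unannotated = itertools.cycle(unannotated_loader)
        annotated = itertools.cycle(annotated_loader)

        for _ in range(num_steps):
            # First phase: self-supervised update of the encoder parameters only.
            x = next(unannotated)
            enc_opt.zero_grad()
            self_supervised_loss(encoder, x).backward()
            enc_opt.step()                     # the head receives no update here

            # Second phase: task-specific update of the perception component only.
            x, y = next(annotated)
            head_opt.zero_grad()
            with torch.no_grad():              # encoder frozen in this variant
                features = encoder(x)
            task_loss(head(features), y).backward()
            head_opt.step()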

Inventors:
REDFORD JOHN (GB)
SHARMA ANUJ (GB)
DOKANIA PUNEET (GB)
Application Number:
PCT/EP2022/051205
Publication Date:
September 22, 2022
Filing Date:
January 20, 2022
Assignee:
FIVE AI LTD (GB)
International Classes:
G06N20/00; G06N3/08; G06V20/56
Attorney, Agent or Firm:
WOODHOUSE, Thomas, Duncan (GB)
Claims:
AMENDED CLAIMS received by the International Bureau on 02 August 2022 (02.08.2022)

1. A computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets, each annotated sensor data set having an associated annotation, the encoder having a set of encoder parameters, and the perception component having a set of task-specific parameters, the method comprising:
performing a sequence of multiple training steps, wherein each training step comprises:
in a first phase of the training step, updating the set of encoder parameters based on the unannotated sensor data sets, with the aim of optimizing a self-supervised loss function, without updating the set of task-specific parameters of the perception component, and
in a second phase of the training step, updating the set of task-specific parameters based on the annotated sensor data sets, with the aim of optimizing a task-specific loss function, wherein the encoder as updated in the first phase of that training step processes a data representation of each annotated sensor data set to extract features therefrom, wherein the perception component processes the extracted features to compute an output therefrom, and wherein the task-specific loss is defined on the output and the associated annotation for each annotated sensor data set for learning a desired perception task;
whereby, in performing the sequence of multiple training steps, the method alternates between the first phase and the second phase, thereby interleaving the training of the perception component with the training of the encoder.

2. The method of claim 1, wherein the self-supervised loss function is defined on positive training examples, each positive training example comprising at least two associated data representations of the same sensor data set.

3. The method of claim 2, wherein the self-supervised loss function is a contrastive loss function that is optimized in the first phase with the aim of identifying associated data representations.
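
By way of illustration only: an InfoNCE-style loss is one common contrastive loss defined on positive pairs of the kind recited in claims 2 and 3. The sketch below assumes z1 and z2 hold the (projected) features of the two associated data representations of a batch of sensor data sets, with row i of each forming a positive pair and all other rows acting as negatives; the application does not commit to this particular loss.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(z1, z2, temperature=0.1):
        # z1, z2: [N, D] features of two associated representations of the
        # same N sensor data sets; positive pairs lie on the diagonal.
        z1 = F.normalize(z1, dim=1)
        z2 = F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature          # [N, N] cosine similarities
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)     # identify the associated row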

4. The method of claim 2, wherein the at least two associated data representations are related by a transformation parameterized by at least one numerical transformation value, wherein the encoder extracts respective features from the at least two associated data representations of each positive training example, wherein at least one numerical output value is computed from the extracted features, and wherein the self-supervised loss function is a regression loss function that encourages the at least one numerical output value to match the at least one numerical transformation value parameterizing the transformation.
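
One possible reading of claim 4, sketched under stated assumptions: the two representations are related by a transformation parameterized by a single numerical value (a rotation angle, say), and a small regressor predicts that value from the pair of extracted features. encoder, regressor and transform are illustrative names, not from the application.

    import torch
    import torch.nn.functional as F

    def transformation_regression_loss(encoder, regressor, x, transform, t):
        # x: batch of data representations; t: [N, 1] numerical transformation
        # values; transform(x, t) yields the second, transformed representation.
        f1 = encoder(x)                        # features of the first view
        f2 = encoder(transform(x, t))          # features of the transformed view
        pred = regressor(torch.cat([f1, f2], dim=1))   # predicted value, [N, 1]
        return F.mse_loss(pred, t)             # encourage a match with t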

5. The method of any preceding claim, wherein a projection component projects features extracted by the encoder from a feature space into a projection space, wherein the self-supervised loss is defined in the projection space, and wherein a set of projection parameters of the projection component is updated in the first phase simultaneously with the set of encoder parameters.
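
A common realization of such a projection component (hedged; the dimensions and architecture below are illustrative, not taken from the application) is a small MLP whose parameters are optimized together with the encoder's in the first phase:

    import torch
    import torch.nn as nn

    encoder = nn.Linear(64, 2048)              # stand-in for a real encoder
    projection = nn.Sequential(                # feature space -> projection space
        nn.Linear(2048, 512), nn.ReLU(),
        nn.Linear(512, 128),
    )
    # The self-supervised loss is evaluated on projection(encoder(x)), and the
    # first-phase optimizer covers both parameter sets; the perception
    # component is excluded.
    phase1_opt = torch.optim.Adam(
        list(encoder.parameters()) + list(projection.parameters()), lr=1e-4)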

6. The method of any preceding claim, wherein the set of encoder parameters is frozen in the second phase.

7. The method of any of claims 1 to 5, wherein the set of encoder parameters is further updated in the second phase based on the task-specific loss, simultaneously with the set of perception parameters.
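
The second-phase variants of claims 6 and 7 differ only in whether gradients flow into the encoder. A minimal sketch with illustrative names:

    import torch

    def phase2_step(encoder, head, optimizer, x, y, task_loss, freeze_encoder):
        optimizer.zero_grad()
        if freeze_encoder:                 # claim 6: encoder parameters frozen
            with torch.no_grad():
                features = encoder(x)
        else:                              # claim 7: encoder updated jointly, so
            features = encoder(x)          # optimizer must include its parameters
        task_loss(head(features), y).backward()
        optimizer.step()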

8. The method of any preceding claim, wherein a single update is applied to the set of encoder parameters in the first phase of each training step, and a single update is applied to the set of perception parameters in the second phase of each training step.

9. The method of any of claims 1 to 7, wherein multiple updates are applied to the set of encoder parameters in the first phase of each training step, and/or multiple updates are applied to the set of perception parameters in the second phase of each training step.

10. The method of claim 9, wherein a different number of updates is applied in the second phase than in the first phase.

11. The method of claim 10, wherein a greater number of updates is applied in the second phase than in the first phase.
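
Schematically, claims 8 to 11 only fix how many gradient updates each phase applies per training step; the counts below are illustrative hyperparameters, not values from the application:

    PHASE1_UPDATES = 1   # encoder updates per training step (claim 8: one each)
    PHASE2_UPDATES = 4   # head updates per step (claims 10-11: more in phase 2)
    NUM_STEPS = 100

    for step in range(NUM_STEPS):
        for _ in range(PHASE1_UPDATES):
            pass  # one self-supervised update of the encoder (first phase)
        for _ in range(PHASE2_UPDATES):
            pass  # one task-specific update of the perception head (second phase)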

12. The method of any preceding claim, wherein each data representation is an image or voxel representation.

13. The method of claim 12, wherein each data representation is an image or voxel representation of a 2D or 3D point cloud.

14. The method of any preceding claim, wherein each sensor data set comprises 3D sensor data.
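
For concreteness, one standard way to obtain a voxel representation of a 3D point cloud, as in claims 12 to 14, is occupancy binning; the grid size and bounds below are illustrative:

    import numpy as np

    def voxelize(points, grid=(32, 32, 32), bounds=(-40.0, 40.0)):
        # points: [N, 3] array of x/y/z coordinates; returns a binary
        # occupancy grid suitable as input to a voxel encoder.
        lo, hi = bounds
        idx = ((points - lo) / (hi - lo) * np.array(grid)).astype(int)
        idx = np.clip(idx, 0, np.array(grid) - 1)
        vox = np.zeros(grid, dtype=np.float32)
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
        return vox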

15. The method of any preceding claim, wherein in the first phase of each training step, the set of encoder parameters is updated based on the annotated and unannotated sensor data sets, wherein the self-supervised loss function is independent of the annotations.

16. The method of any preceding claim, wherein each annotated dataset comprises real sensor data.

17. The method of claim 16, wherein the associated annotation is a manual annotation.

18. A computer system comprising an encoder and a perception component, each trained in accordance with any preceding claim, wherein the encoder is configured to receive an input sensor data representation and extract features therefrom, and the perception component is configured to use the extracted features to interpret the input sensor data representation.

19. A training computer program configured, when executed on one or more computer processors, to implement the method of any of claims 1 to 17.