

Title:
A SYSTEM AND METHOD FOR QUALITY CHECK OF LABELLED IMAGES
Document Type and Number:
WIPO Patent Application WO/2023/126280
Kind Code:
A1
Abstract:
A method (200) and systems (100) for identifying mislabeled images from a set of labelled images, for a deep neural network, are described. A sequence of a plurality of input labelled images (102) is provided as an input to a segmentation network (116) for generating predictions for each image from said set of labelled images (102). A scoring module (118) is configured to compute two or more scoring functions for each image from the set of images (102) using the predictions generated by the segmentation network (116). A quality check module (120) is configured to identify mislabeled images from the set of labelled images (102) by visualizing said computed two or more scoring functions in a multi-dimensional graphical representation.

Inventors:
VINOJ JOHN HOSAN EBENEZER KOIL PILLAI (IN)
KUMAR ABHIJEET (IN)
KALE AMIT ARVIND (IN)
MULLICK KOUSTAV (IN)
Application Number:
PCT/EP2022/087313
Publication Date:
July 06, 2023
Filing Date:
December 21, 2022
Assignee:
BOSCH GMBH ROBERT (DE)
ROBERT BOSCH ENGINEERING AND BUSINESS SOLUTIONS PRIVATE LTD (IN)
International Classes:
G06N3/091; G06F18/214; G06V10/774; G06V10/776; G06V10/778
Domestic Patent References:
WO2019137196A1 (2019-07-18)
Foreign References:
CN105404896A (2016-03-16)
Other References:
UMAA REBBAPRAGADA ET AL: "Active Label Correction", DATA MINING (ICDM), 2012 IEEE 12TH INTERNATIONAL CONFERENCE ON, IEEE, 10 December 2012 (2012-12-10), pages 1080 - 1085, XP032311139, ISBN: 978-1-4673-4649-8, DOI: 10.1109/ICDM.2012.162
Attorney, Agent or Firm:
ROBERT BOSCH GMBH (DE)
Claims:
CLAIMS

We Claim:

1. A computing system (100) comprising: a memory (110); and a processor (112), coupled to the memory (110), configured to provide a set of a plurality of labelled images (102) to a segmentation network (116), wherein said segmentation network (116) is configured to generate predictions for each image from said set of labelled images (102); characterized in that: a scoring module (118) is configured to compute two or more scoring functions for each image from the labelled images (102), using the predictions generated by the segmentation network (116); and a quality check module (120) is configured to identify mislabeled images from the set of labelled images (102) by visualizing image patches from the set of labelled images (102), obtained from a multi-dimensional graphical representation, wherein said multi-dimensional graphical representation is obtained from varying values of said two or more scoring functions for each image from the labelled images (102).

2. The computing system (100) as claimed in claim 1, wherein said quality check module (120) is configured to generate a two-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images (102).

3. The computing system (100) as claimed in claim 1, wherein said segmentation network (116) is communicatively coupled to a deep neural network (300).

4. The computing system (100) as claimed in claim 3, wherein said segmentation network (116) is configured to generate predictions of segmentation based on a plurality of classifiers and a plurality of training data from the deep neural network (300).

5. The computing system (100) as claimed in claim 1, wherein said segmentation network (116) is a network trained on a plurality of labelled images.

6. The computing system (100) as claimed in claim 1, wherein said two or more scoring functions include functions such as a performance metric (IoU), probability scores, uncertainty scores, and/or combinations thereof.

7. A computer-implemented method (200) for identifying mislabeled images from a set of labelled images, for a deep neural network, the method (200) comprising the steps of: receiving (201) a set of a plurality of labelled images (102), and generating predictions for each image from said set of labelled images (102), by a segmentation network (116); computing (202) two or more scoring functions by using the generated predictions for each of the labelled images (102), by a scoring module (118); and identifying (203) mislabeled images from the set of labelled images (102) by visualizing image patches from the set of labelled images (102), obtained from a multi-dimensional graphical representation, wherein said multi-dimensional graphical representation is obtained from varying values of said two or more scoring functions for each image from the labelled images (102), by using a quality check module (120).

8. The computer-implemented method (200) as claimed in claim 7, wherein said method (200) further comprises a step (204) of generating a multi-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images (102).

9. The computer-implemented method (200) as claimed in claim 7, wherein said method (200), after step (203) of identifying mislabeled images from the set of labelled images (102), further comprises a step (205) wherein a QC admin directly selects regions of abundant mislabeling in said multi-dimensional graphical representation.

10. The computer-implemented method (200) as claimed in claim 9, wherein said method (200), after step (205) by the QC admin, further comprises a step (206) wherein a QC worker receives regions of abundant mislabeling in said multi-dimensional graphical representation, iterates over each image in the assigned grid, and marks them for relabeling with comments to be sent for further labelling.


Description:
A SYSTEM AND METHOD FOR QUALITY CHECK OF

LABELLED IMAGES

FIELD OF THE INVENTION

[0001] The present subject matter relates, in general, to a system and method for a unified interactive framework for identifying labelling errors to improve the labelling process, specifically for autonomous driving applications.

BACKGROUND OF THE INVENTION

[0002] A few machine learning techniques use machine learning models that can learn from unlabeled data without any human intervention. Such machine learning techniques are known as unsupervised learning. For example, a deep learning model may segment data into groups (or clusters) based on patterns the model finds in the data. These groups can then be used as labels, so that the data can be used to train a supervised learning model.

[0003] Recent deep learning technologies for supervised tasks in the domain of computer vision require labelled training data. Human labelling efforts are costly and grow exponentially with the size of the dataset, costing industries a huge amount for labelling. Automated semantic segmentation, which involves assigning a class label to each pixel of an image, is an example of a task for which obtaining labelled training data is especially costly. The problem becomes even more acute when the domain of interest lies in the area of autonomous driving. In most domains that develop such autonomous driving functions, huge numbers of video/image sequences are captured by vehicles mounted with different reference sensors, covering over a hundred thousand miles. Manual curation of data in datasets such as the above becomes an issue when the budget for annotating data is limited.

[0004] A huge amount of clean labelled data is required for training a supervised deep neural network. For training such networks, large datasets such as MSCOCO, Mapillary Vistas, or YouTube-8M are required. Conventional approaches for handling the huge amounts of data have focused on unsupervised methods (no labels required), weakly/semi-supervised methods (partial labels required), and (semi-)automatic labelling of data. Such methods may use segmentation predictions as instances of polygons by annotating a set number of pixels in each iteration. For this annotation, human intervention is still required. The major focus of these works is to drastically reduce the annotation time for human labellers. While most of them succeed in reducing annotation time, at least a few iterations of the algorithm are required to provide human-comparable results.

[0005] Current works in the literature focus on reducing the amount of annotated data required by training a model on annotated data and utilizing the model for prediction. The predicted images are further added to the dataset after manual inspection by humans. However, this process is flawed. Firstly, the dataset may be poorly labelled and hence may lead to poor performance of the model. Secondly, each predicted image to be added to the dataset needs to be verified manually, which makes this a resource-intensive operation. Therefore, there is a need for a solution that reduces this resource-intensive process by limiting the number of images to be verified by humans.

[0006] A prior art document, WO2019137196A1, discloses an image annotation information processing method and apparatus, a server, and a system. Supervision and judgment processing logic for a plurality of nodes with different processing results can be provided. When image annotation information goes wrong, the results can be automatically returned, so that operators can perform review, modification, etc. The professional ability of the operators can be improved by continuous auditing feedback interaction, image annotation efficiency is gradually improved, and training-set picture annotation accuracy is greatly improved. According to the embodiments, annotation quality can be effectively ensured, timely and effective information feedback is provided in the workflow, and the operating efficiency of sample image annotation information is improved.

[0007] Another prior art document, CN105404896A, discloses an annotation data processing method and an annotation data processing system. The annotation data processing method comprises the following steps: step S110: the similarity of multiple annotation results related to annotation tasks is calculated; step S120: the similarity is compared with a similarity threshold, the process going to step S130 if the similarity is greater than or equal to the similarity threshold, and to step S140 if the similarity is less than the similarity threshold; step S130: it is determined that the multiple annotation results pass quality detection; and step S140: it is determined that the multiple annotation results do not pass quality detection. According to the annotation data processing method and system, the quality of the annotation results is automatically detected by utilizing the similarity, so that annotation staff can obtain the quality of the annotation results in a timely manner and then correct annotation errors in a timely manner; thus, annotation accuracy can be effectively enhanced.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWING

[0008] The detailed description is provided with reference to the accompanying figures, wherein:

[0009] FIG. 1 illustrates a system environment for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter;

[0010] FIG. 2 illustrates a flow chart of a method for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter; and

[0011] FIGs. 3a & 3b illustrate graphical representations of the scoring functions for each image in a 2-d plane, in accordance with an example implementation of the present subject matter.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0012] FIG. 1 illustrates a system environment for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter. The present subject matter describes various approaches to obtain a set of correctly labelled images from a larger set of input labelled images and send the mislabeled images, for further labelling. In an example, the set of labelled input images 102 may contain autonomous driving scene images with varied semantic layout and content. In an example, the set of labelled input images 102 may be images from a driving scene comprising a traffic sign, vehicle(s), pedestrian(s), and so on.

[0013] The system environment may include a computing system 100 and a neural network architecture. The computing system 100 may be communicatively coupled to the neural network architecture. In an example, the computing system 100 may be directly or remotely coupled to the neural network architecture. Examples of the computing system 100 may include, but are not limited to, a laptop, a notebook computer, a desktop computer, and so on.

[0014] The computing system 100 may include a memory 110. The memory 110 may include any non-transitory computer-readable medium including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.

[0015] In an example, the computing system 100 may also include a processor 112 coupled to the memory 110. The processor 112 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any other devices that manipulate signals and data based on computer-readable instructions. Further, functions of the various elements shown in the figures, including any functional blocks labelled as “processor(s)”, may be provided using dedicated hardware as well as hardware capable of executing computer-readable instructions. Further, the computing system 100 may include interface(s) 114. The interface(s) 114 may include a variety of interfaces, for example, interface(s) for users. The interface(s) 114 may include data output devices. In an example, the interface(s) 114 may provide an interactive platform for receiving the input images from a user.

[0016] In an example implementation of the present subject matter, a method and a system are proposed for a unified and an interactive framework for identifying labelling errors. The computing system 100 includes a segmentation network 116, a scoring module 118 and a quality check module 120. In one embodiment, the segmentation network 116 is communicatively coupled to the deep neural network 300.

[0017] The segmentation network 116 is configured to generate predictions for each image from the set of labelled images 102. In one embodiment, the segmentation network 116 generates predictions of segmentation based on a plurality of classifiers and a plurality of training data from the deep neural network 300. In this embodiment, the segmentation network 116 is trained to identify pixels belonging to different classes. For generating the predictions, each labelled sample of an image is treated as an oracle, but the predictions and labelling might differ. This difference can be used to compute a scoring function, defined between the labels and the predicted segmentations of each image, which measures the dissimilarity/similarity between them.

[0018] For the labelled images, each pixel in an image belongs to a single class from the classification schema. The belongingness of a pixel to a class is deterministic. For labelling images, labels are provided to each pixel using this class ideology, and hence the labels do not have probability values associated with them. This leads to unambiguous/concrete markings of the labels of the labelled images 102.
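As an illustration only (the patent provides no code), the deterministic nature of these labels can be contrasted with the network's soft probabilities via a one-hot encoding; the function name and array shapes below are assumptions:

```python
import numpy as np

def labels_to_onehot(label_map, num_classes):
    """A label map assigns each pixel exactly one class, so its one-hot
    encoding is deterministic: probability 1 for the labelled class and
    0 elsewhere, unlike the soft probabilities a segmentation network
    produces. `label_map` is an H x W integer array of class indices."""
    return np.eye(num_classes, dtype=np.float32)[label_map]
```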

[0019] The scoring module 118 is configured to compute two or more scoring functions for each image from the set of images 102 using the predictions generated by the segmentation network 116. In one example, the two or more scoring functions may include functions such as a performance metric (IoU) score, probability scores, uncertainty scores, and/or combinations thereof. In one embodiment, the performance metric score may be used to determine the accuracy of the segmentation network 116. In one example, the performance metric score may include Intersection over Union (IoU), such as for semantic segmentation, which measures the similarity score. In one embodiment, the scoring module 118 is further configured to compute a confidence score using probability values on the classes for pixels of each image from the set of labelled images 102 from the segmentation network 116.

[0020] In one embodiment, for computing the performance metric (IoU), the scoring module 118 compares the human labels with the prediction output generated by the segmentation network 116. In a similar manner, to compute the uncertainty score (entropy), the scoring module 118 interprets the neural network outputs as probability scores.
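For illustration only, the two per-image scores of paragraph [0020] can be sketched in Python with NumPy; the function names and array shapes are assumptions, not part of the disclosed system:

```python
import numpy as np

def mean_iou(label, pred, num_classes):
    """Per-image mean Intersection over Union between a human label map
    and the network's predicted class map (both H x W integer arrays)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(label == c, pred == c).sum()
        union = np.logical_or(label == c, pred == c).sum()
        if union > 0:  # skip classes absent from both label and prediction
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

def mean_entropy(probs):
    """Per-image mean pixel-wise entropy of softmax outputs
    (an H x W x C array of class probabilities); high values indicate
    the network is unsure about its prediction."""
    eps = 1e-12  # guard against log(0)
    ent = -np.sum(probs * np.log(probs + eps), axis=-1)
    return float(ent.mean())
```

A low `mean_iou` paired with a low `mean_entropy` is the combination the quality check module later exploits: the network confidently disagrees with the human label.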

[0021] The quality check module 120 is configured to identify mislabeled images from the set of labelled images 102 by visualizing image patches from the set of labelled images 102, obtained from a multi-dimensional graphical representation. Herein, the multi-dimensional graphical representation is obtained from varying values of the two or more scoring functions for each image from the labelled images 102. In one embodiment, the quality check module 120 is further configured to generate a two-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images 102.

[0022] The scoring module 118 enables a scoring mechanism which treats labels (from the original annotators) as oracles. The process of quality check approves the oracle status of each labelled image from the set of images 102 individually. Hence, it might be erroneous to treat the labels as oracles for each instance. This insight motivates another scoring function, which captures the certainty/confidence score for the predictions and/or labels generated by the segmentation network 116.

[0023] In one example, the segmentation network 116, for each pixel, provides probability values on the classes, thus enabling the use of an uncertainty/confidence score. The confidence score is used to define a (class-specific) uncertainty score on each prediction generated by the segmentation network 116. In this example, entropy computes the uncertainty of the prediction, i.e., high entropy values indicate that the network is unsure about its prediction, while low entropy denotes strong confidence. In addition to entropy, other metrics may be envisaged which are variants of the basic entropy definition.

[0024] In this embodiment, the identified mislabeled images are further sent for re-labelling. Herein, the set of mislabeled images is a sub-set of the set of labelled images 102.

[0025] FIG. 2 illustrates a flow chart of a method 200 for identifying mislabeled images from a set of labelled images, for a deep neural network, in accordance with an example implementation of the present subject matter. The method 200 may be implemented by the computing system 100 including the memory 110, the processor 112, and the interface(s) 114 of FIG. 1. Further, the computing system 100 may be communicatively coupled with the neural network architecture as described in FIG. 1. Although the method 200 is described in the context of a system that is similar to the computing system 100 of FIG. 1, other suitable devices or systems may be used for execution of the method 200.

[0026] Referring to FIG.2, at block 201, the method 200 may include receiving a set of plurality of labelled images 102 and generating predictions for each image from said set of labelled images 102, by a segmentation network 116. In an example, the input labelled images 102 may be images from a driving scene comprising a traffic sign, vehicle(s), pedestrian(s), and so on.

[0027] At block 202, the method 200 may include computing two or more scoring functions by using the generated predictions for each of the labelled images 102 by a scoring module 118.

[0028] At block 203, the method 200 may include identifying mislabeled images from the set of labelled images 102 by visualizing image patches from the set of labelled images 102, obtained from a multi-dimensional graphical representation, by using a quality check module 120. Herein, the multi-dimensional graphical representation is obtained from varying values of said two or more scoring functions for each image from the labelled images 102. In one embodiment, the method 200 further comprises a step 204 of generating a multi-dimensional graphical representation using the computed scoring functions for allowing faster selection of mislabeled images from the set of the labelled images 102. In this embodiment, the identified mislabeled images provide a compact sub-set of the sequence of the set of labelled images 102 for further labelling.
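Blocks 201-202 can be tied together in a minimal sketch; `predict_fn` stands in for the segmentation network 116 and is assumed to return per-pixel class probabilities of shape H x W x C (these names, like the inline scores, are illustrative only):

```python
import numpy as np

def score_images(images, labels, predict_fn, num_classes):
    """For each labelled image, generate a prediction (block 201) and
    compute two scoring functions (block 202): per-image mean IoU
    against the human label and mean per-pixel entropy. Returns a list
    of (iou, uncertainty) pairs used to build the scatter plot."""
    scores = []
    for img, lab in zip(images, labels):
        probs = predict_fn(img)          # block 201: network prediction
        pred = probs.argmax(axis=-1)     # hard class map per pixel
        ious = []
        for c in range(num_classes):
            union = np.logical_or(lab == c, pred == c).sum()
            if union:
                ious.append(np.logical_and(lab == c, pred == c).sum() / union)
        iou = float(np.mean(ious)) if ious else 0.0
        unc = float((-probs * np.log(probs + 1e-12)).sum(axis=-1).mean())
        scores.append((iou, unc))
    return scores
```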

[0029] In one embodiment, at block 205, the method 200 further comprises a step in which a QC admin directly selects regions of abundant mislabeling in said multi-dimensional graphical representation. Herein, the QC admin can select regions of abundant mislabeling after step 203 of identifying mislabeled images from the set of labelled images 102. In this embodiment, a scatter application showing the multi-dimensional graphical representation may be presented to the QC admin for selecting regions of abundant mislabeling.

[0030] Further in this embodiment, at block 206, the method 200 further comprises a step in which a QC worker receives regions of abundant mislabeling in said multi-dimensional graphical representation from the QC admin, iterates over each image in the assigned grid, and marks them for relabeling with comments to be sent for further labelling. All images assigned for relabeling are then passed to labellers for relabeling.
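The hand-off from the admin's selected region to per-worker grids can be sketched as follows; the uniform nx-by-ny split and the `(x, y, image_id)` tuple layout are assumptions, since the patent only states that the selection is divided into smaller grids:

```python
def split_into_grids(points, xmin, xmax, ymin, ymax, nx=2, ny=2):
    """Divide the admin's selected bounding box in score space
    (xmin..xmax by ymin..ymax) into an nx-by-ny grid and group the
    images falling in each cell, so that each cell can be assigned to
    a QC worker. `points` is a list of (x, y, image_id) tuples."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    grids = {}
    for x, y, img_id in points:
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue  # outside the admin's selection
        cell = (int((x - xmin) // dx), int((y - ymin) // dy))
        grids.setdefault(cell, []).append(img_id)
    return grids
```

Each value in the returned dictionary is one worker's assignment: the list of images to iterate over and mark for relabeling.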

[0031] FIGs. 3a & 3b show exemplary graphical representations using the computed scoring functions for allowing faster selection of mislabeled images from the set of labelled images, generated by the quality check module 120. Herein, in FIG. 3a, taking the scoring functions to be the performance metric score (e.g., IoU) and the confidence score (i.e., uncertainty), each image can be placed in a 2-d scatter plot. This scatter plot is defined by the metric score/IoU on the y-axis and the confidence score/uncertainty on the x-axis. In FIG. 3a, since images are not point objects, the center of each image is aligned with its corresponding IoU and uncertainty scores.

[0032] Instead of providing vanilla images on the scatter plots shown in FIGs. 3a & 3b, where visualizing errors might be difficult, the present invention uses different (colored) views of the images. To enable fast tagging of mislabeled examples, the scatter plots have regions of interest, shown in FIG. 3b, where finding images with mislabeling and/or incorrect labelling is easy. This follows from the fact that when the network predictions and labels disagree (low IoU) but the network remains confident about its predictions (low uncertainty), the probability of finding mislabeling increases. Similarly, images with high IoU and low uncertainty point towards no or minor errors (and hence no QC use-case). As shown in FIG. 3b, the scatter plot is divided into 4 regions as shown in Table I, where Region 2 is of interest, containing mislabeled images to quality check. In scenarios where the deep learning network reflects a high confidence score in its predictions, these region boundaries may be heavily distorted. But as shown in FIG. 3a, such regions are still easy to segregate with little effort from a human annotator. FIG. 3b shows an exemplary scatter plot for image patches containing objects such as vehicles on the BDD dataset. Here we observe that the Regions (1, 2, 3, 4) are distorted, but nevertheless a structure can be seen in the scatter plot.
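The region logic can be illustrated with a small helper. The thresholds and the numbering of Regions 1, 3, and 4 are assumptions made here for illustration (the disclosure fixes only Region 2 as low IoU with low uncertainty, and notes that real boundaries are chosen interactively and may be distorted):

```python
def scatter_region(iou, uncertainty, iou_thresh=0.5, unc_thresh=0.5):
    """Assign an image's (IoU, uncertainty) point to one of four
    scatter-plot regions. Region 2 (low IoU, low uncertainty: the
    network confidently disagrees with the human label) is where
    mislabeled images concentrate and is the quality-check target.
    Region numbering other than 2 is an illustrative assumption."""
    if iou >= iou_thresh:
        return 1 if uncertainty < unc_thresh else 4  # likely correct / noisy
    return 2 if uncertainty < unc_thresh else 3      # likely mislabeled / unsure
```

For example, an image the network predicts confidently but that scores a low IoU against its label (`scatter_region(0.1, 0.2)`) lands in Region 2 and would be flagged for quality check.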

TABLE I

[0033] The present invention provides an interactive user application built around the scatter plot. The application provides a region selector tool which can be utilized for selection of any desired region. The selector can then be used effectively to choose datapoints with mislabeling. This selection tool, combined with the regions of interest in the scatter plots, allows for faster selection of mislabeled patches. FIG. 3 shows the main window of the scatter QC application.

[0034] In one embodiment of the present invention, a two-step procedure for identifying the mislabeled images for relabeling may be employed. In the first step of this procedure, a quality check (QC) admin uses the scatter QC application directly and selects regions of abundant mislabeling from a scatter plot as shown in FIG. 3b. This region of interest is automatically further divided into smaller grids, each of which is passed down to another QC worker. The QC worker iterates over each image in the assigned grid and marks them for relabeling with remarks. All images assigned for relabeling are then passed to labellers for relabeling. The scatter QC application allows for bounding-box-based region selection on the scatter plots. Human-assisted inputs utilizing this bounding box selection can effectively focus on the regions where mislabelings are abundant. The QC admin and QC workers further divide and select images for relabeling while focusing on regions of mislabeling. In the last stage of Labeling as a Service (LaaS), the human annotators relabel the patch.

[0035] The core of the present invention is a unified system and an interactive framework for identifying labelling errors. In alternate embodiments of the present invention, the application areas of the computing system 100 may include semantic segmentation, object detection, classification, and the like.

[0036] The present invention focuses on finding mislabeling for a specific class instead of finding potential mislabeling for all classes in the schema at once. Moreover, the present invention provides approaches for the computation of an evaluation metric (mean IoU) and an uncertainty score (mean entropy) for a sample labelled image. While most of the literature and patents have focused on (semi-)automated labelling and reduction in time and clicks (for annotators), the present invention focuses on the quality check process. In general, the quality check process is a tedious, repetitive process where errors are supposedly fewer (as compared to the corrections to be done in automated labelling), and hence it is likely that poor-quality labels pass through quality check. In contrast, the present invention is not completely dependent on human intervention for the quality check of a huge number of labelled images.

[0037] Although aspects of the present disclosure have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not limited to the specific features or methods described herein. Rather, the specific features and methods are disclosed as examples of the present disclosure.