Title:
SYSTEM AND METHOD FOR MONITORING PERFORMANCE OF AN OBJECT CLASSIFICATION SYSTEM
Document Type and Number:
WIPO Patent Application WO/2023/217374
Kind Code:
A1
Abstract:
Aspects concern a method for monitoring performance of an object classification system comprising the steps of: receiving an image of a sorting area, the image comprising an object to be sorted and a sorter or part thereof associated with picking the object; identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the sorter or part thereof; determining based on the first region of interest and the second region of interest, whether the object is picked up; wherein if the object is picked up, comparing the actual object-type of the object picked up with a predicted object-type; and determining if a performance parameter is achieved.

Inventors:
YAN WAI (SG)
NGO CHI TRUNG (SG)
JEON JIN HAN (SG)
ANDALAM SIDHARTA (SG)
Application Number:
PCT/EP2022/062920
Publication Date:
November 16, 2023
Filing Date:
May 12, 2022
Assignee:
BOSCH GMBH ROBERT (DE)
International Classes:
G06V10/776; G06V20/52; G06V40/20
Other References:
LEE KYUNGJUN ET AL: "Hands Holding Clues for Object Recognition in Teachable Machines", CCS '18: PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, ACM PRESS, NEW YORK, NEW YORK, USA, 2 May 2019 (2019-05-02), pages 1 - 12, XP058634791, ISBN: 978-1-4503-6201-6, DOI: 10.1145/3290605.3300566
PEURSUM P ET AL: "Object labelling from human action recognition", PERCOM '03 PROCEEDINGS OF THE FIRST IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS, IEEE, US, 26 March 2003 (2003-03-26), pages 399 - 406, XP032384619, ISBN: 978-0-7695-1893-0, DOI: 10.1109/PERCOM.2003.1192764
Claims:
CLAIMS

1. A method for monitoring performance of an object classification system comprising the steps of: a. receiving an image of a sorting area, the image comprising an object to be sorted and a sorter or part thereof associated with picking the object; b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the sorter or part thereof; c. determining based on the first region of interest and the second region of interest, whether the object is picked up; wherein if the object is picked up, d. comparing the actual object-type of the object picked up with a predicted object-type; and e. determining if a performance parameter is achieved.

2. The method of claim 1, wherein the step of determining whether the object is picked up is based on gesture recognition.

3. The method of claim 1 or 2, further comprising the step of storing the first region of interest, the second region of interest, the actual object-type and the predicted object-type as a database entry.

4. The method of any one of the preceding claims, wherein the predicted object-type is determined using a deep machine learning model.

5. The method of any one of the preceding claims, wherein the performance parameter is the number of matched results between the predicted object-type and the actual object type.

6. The method of claims 4 or 5, wherein if the performance parameter is below a predetermined threshold, further including the step of retraining the deep machine learning model.

7. A system for monitoring performance of an object classification system comprising an image capturing device positioned or arranged to obtain an image of a sorting area, the image comprising an object to be sorted and a part of a sorter associated with picking the object; a processor arranged in data or signal communication with the image capturing module to receive the image and identify a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the sorter or part thereof; determine based on the first region of interest and the second region of interest, whether the object is picked up; wherein if the object is picked up, compare the actual object-type of the object picked up with a predicted object-type; and determine if a performance parameter is achieved.

8. The system of claim 7, wherein the processor includes an object detection/tracking module configured to identify the first region of interest, track the first region of interest on the sorting area, and assign a unique identifier associated with the tracked object to be sorted.

9. The system of claim 7 or 8, wherein the processor includes a gesture recognition module to identify the second region of interest and determine if the object is picked up.

10. The system of any one of claims 7 to 9, wherein the performance parameter is the number of matched results between the predicted object-type and the actual object type.

11. The system of claim 10, wherein if the performance parameter is below a predetermined threshold, further including the step of retraining a deep machine learning model associated with providing the predicted object-type.

12. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of a. receiving an image of a sorting area, the image comprising an object to be sorted and a sorter or part thereof associated with picking the object; b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the sorter or part thereof; c. determining based on the first region of interest and the second region of interest, whether the object is picked up; wherein if the object is picked up, d. comparing the actual object-type of the object picked up with a predicted object-type; and e. determining if a performance parameter is achieved.

Description:
SYSTEM AND METHOD FOR MONITORING PERFORMANCE OF AN OBJECT CLASSIFICATION SYSTEM

TECHNICAL FIELD

[0001] The disclosure relates to a method and a system for monitoring performance of an object classification system.

BACKGROUND

[0002] Various object classification systems have been proposed for various purposes, such as waste classification to facilitate recycling. Such object classification systems may increasingly include artificial intelligence (AI) based components or models to automatically predict object-type, so as to reduce the need for manual labor. Although such AI components or models are typically trained extensively before actual deployment, there is a lack of monitoring tools or devices to track the performance of these object classification systems in the real world. In some systems, one or more benchmark datasets may be provided, but such benchmark datasets may not be updated or may not reflect real situations. In particular, performance of AI models may degrade over time due to possible variations between the real-world data encountered during operation and the data used to train the AI models.

[0003] There exists a need to provide a real-world monitoring method and system to monitor an operating object classification system.

SUMMARY

[0004] A technical solution is proposed where an object classification system utilizes relatively affordable cameras to localize and track moving objects (e.g. plastic items) on a conveyor belt and to monitor a part of the sorters used to retrieve or pick the objects (e.g. hand gestures). In the object classification system, each sorter is assigned to one specific object type, with the specific object type assumed to be the ground truth label. The classified object type and an associated image of an object picked by a sorter are stored in a database for comparison with a predicted result from an AI model. The comparison may be used to compute a performance parameter. In some embodiments, the performance parameter may be associated with the number of matched results between the predicted result and the actual sorted result. If the performance parameter drops below a pre-determined threshold, the performance of the object classification system may be deemed to have degraded, and the AI model may undergo re-training with new, previously unseen real-world data for greater accuracy. The re-training may utilize the classified object type (as ground truth label) and the associated image of the object picked by the sorter, as stored in the database.

[0005] According to the present disclosure, a method for monitoring performance of an object classification system as claimed in claim 1 is provided. A system for monitoring performance of an object classification system according to the disclosure is defined in claim 7. A computer program comprising instructions to execute the computer-assisted method is defined in claim 12.

[0006] The dependent claims define some examples associated with the method and system, respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The disclosure will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:

- FIG. 1 is a flow chart of a method for monitoring performance of an object classification system according to some embodiments;

- FIGS. 2A and 2B show a system architecture of a system for monitoring performance of an object classification system according to some embodiments;

- FIG. 3 shows an embodiment of the entries stored in a database for use in deriving a performance parameter of an object classification system;

- FIGS. 4A and 4B show specific examples of a deep learning framework associated with the detection and tracking of one or more objects;

- FIGS. 5A to 5C illustrate hand landmarks and key point detection on a left human hand with single and multiple fingers according to some embodiments of the gesture recognition module;

- FIGS. 6A to 6E illustrate combined hand gesture recognition and object recognition forming a hybrid model for determining if an object is picked; and

- FIG. 7 shows a schematic illustration of a processor 210 for sorting objects according to some embodiments.

DETAILED DESCRIPTION

[0008] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. Other embodiments may be utilized and structural, logical changes may be made without departing from the scope of the disclosure. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

[0009] Embodiments described in the context of one of the systems or methods are analogously valid for the other systems or methods.

[0010] Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.

[0011] In the context of some embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.

[0012] As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

[0013] As used herein, the term “object” includes any object, particularly a recyclable or reusable object that may be sorted according to type. For example, plastic objects may be sorted according to various object types. Examples of object types include High-Density Polyethylene (HDPE), Polypropylene (PP), Polystyrene (PS), Low-Density Polyethylene (LDPE), Polyvinyl Chloride (PVC) and Polyethylene Terephthalate (PET) plastic objects. Such objects may include bottles, jars, containers, plates, bowls etc. of various shapes, sizes, and forms (e.g. partially compressed or distorted).

[0014] As used herein, the terms “associate”, “associated”, and “associating” indicate a defined relationship (or cross-reference) between at least two items. For instance, a part of the sorter (e.g. a hand) used to pick an object for sorting may be the part of the sorter associated with picking the object. A captured image associated with an object may include a defined region of interest which focuses on the object for further processing via object detection algorithm(s).

[0015] As used herein, the term “sort” broadly includes at least one of classification, categorization and arrangement.

[0016] As used herein, the term “sorter” includes a human tasked to sort an object according to a type assigned to the human. For example, a first human worker may be assigned to sort HDPE plastic bottles and a second human worker may be assigned to sort PET plastic bottles. Correspondingly, a part of a sorter associated with picking the object may include a hand of a human. It is appreciable that the part of a sorter associated with picking the object may also include a mechanical device, such as a mechanical claw or a synthetic hand, to assist a sorter. It is appreciable that the object-type assigned to each sorter is considered the ground truth label.

[0017] As used herein, the term “module” refers to, forms part of, or includes an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor (shared, dedicated, or group) that executes code; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. The term module may include memory (shared, dedicated, or group) that stores code executed by the processor.

[0018] FIG. 1 shows a flow chart of a method for monitoring the performance of an object classification system according to an embodiment. The method 100 comprises the steps of: a. receiving an image of a sorting area, the image comprising an object to be sorted and a sorter associated with picking the object (step S102); b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the sorter (step S104); c. determining based on the first region of interest and the second region of interest, whether the object is picked up (step S106); wherein if the object is picked up, d. comparing the object-type of the object picked up with a predicted object-type (step S108); and e. determining if a performance parameter is achieved (step S110).
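
By way of non-limiting illustration, the following Python sketch outlines steps S104 to S110 for a single received image (step S102). The callables find_object_roi, find_sorter_roi, is_picked_up and predict_type are hypothetical placeholders for the modules described further below; the performance parameter may then be derived by the caller as the fraction of matched results.

```python
from typing import Callable, Tuple

def monitor_frame(
    image,
    find_object_roi: Callable,   # hypothetical: returns the first region of interest (object)
    find_sorter_roi: Callable,   # hypothetical: returns the second region of interest (hand)
    is_picked_up: Callable,      # hypothetical: step S106 decision from the two ROIs
    predict_type: Callable,      # AI model providing the predicted object-type
    assigned_type: str,          # ground-truth object-type assigned to the sorter
    matches: int,
    total: int,
) -> Tuple[int, int]:
    """One pass of steps S104 to S110 for an image received in step S102 (sketch)."""
    object_roi = find_object_roi(image)                 # step S104: first ROI
    sorter_roi = find_sorter_roi(image)                 # step S104: second ROI
    if is_picked_up(object_roi, sorter_roi):            # step S106
        total += 1
        if predict_type(object_roi) == assigned_type:   # step S108: compare object-types
            matches += 1
    return matches, total                               # step S110: performance = matches / total
```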

[0019] Steps S102 to S106 may be assisted by an intelligent module to guide a sorter toward retrieving or picking the correct assigned object for sorting. Such an intelligent module may be an artificial intelligence module or a machine learning module trained to identify the correct object-type based on image recognition and classification techniques, to detect and track an object at a sorting area, and to further determine if the object has been retrieved from a sorting platform. The output of such an intelligent module is a predicted object-type associated with the object.

[0020] In step S108, the comparing step may include determining if there is a match between the object-type of the object picked up and the predicted object-type. In some embodiments, before the comparison of step S108, an image of the object which is picked up, as well as the corresponding object-type, may be stored as entries in a database (step S112). The image of the object and the information relating to object-type may be converted to suitable data formats for storage.
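
As a non-limiting example of step S112, the entries may be persisted in a relational database. The sketch below assumes an SQLite store with a hypothetical table named picks; the actual storage backend, schema and serialization format are implementation choices.

```python
import sqlite3

def store_entry(conn: sqlite3.Connection, object_id: str, image_bytes: bytes,
                actual_type: str, predicted_type: str) -> None:
    """Store one step-S112 entry: serialized image, actual object-type, predicted object-type."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS picks "
        "(object_id TEXT, image BLOB, actual_type TEXT, predicted_type TEXT)"
    )
    conn.execute(
        "INSERT INTO picks VALUES (?, ?, ?, ?)",
        (object_id, image_bytes, actual_type, predicted_type),
    )
    conn.commit()
```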

[0021] In step S110, the performance parameter may be associated with the number of matched results between the predicted object-type and the actual object-type. In some embodiments, if the number of matched results falls below a certain pre-determined threshold, re-training of the intelligent module may be required. In another embodiment, if the number of matched results is equal to the pre-determined threshold, further analysis may be made to determine if re-training of the intelligent module is necessary. Such further analysis may include the use of historical data, such as the last time the intelligent module was trained or re-trained, and past occurrences and/or frequency of wrong predictions.
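
The following sketch illustrates one possible re-training decision consistent with the paragraph above; the 30-day staleness window and the count of recent wrong predictions are assumed values used purely for illustration and are not prescribed by this disclosure.

```python
from datetime import datetime, timedelta

def needs_retraining(matched_results: int, threshold: int,
                     last_trained: datetime, recent_wrong_predictions: int) -> bool:
    """Decide whether the intelligent module should be re-trained (illustrative policy)."""
    if matched_results < threshold:
        return True                                                   # below threshold: re-train
    if matched_results == threshold:                                  # at threshold: consult history
        stale = datetime.now() - last_trained > timedelta(days=30)    # assumed policy
        return stale or recent_wrong_predictions > 10                 # assumed policy
    return False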

[0022] In some embodiments, the image with the regions of interest (ROIs) and the object-type of the object picked up are stored in a database.

[0023] Steps S102 to S112 are shown in a specific order. However, it is contemplated that other arrangements are possible. Steps may also be combined in some cases. For example, steps S104 and S106 may be combined.

[0024] In some embodiments, the method 100 may be implemented as at least part of a sorting system 200 for sorting plastic objects. FIG. 2A illustrates such a sorting system 200 having a movable platform in the form of a conveyor belt 202. Objects to be sorted 204 are placed on the conveyor belt 202, and these objects are denoted as A, B, and C. In FIG. 2A, two sorters are shown in the form of human workers S1 and S2. The human worker S1 is assigned to pick and sort an object A, and the human worker S2 is assigned to pick and sort an object B. The object A may belong to an object-type, which may be a class of plastic (e.g. HDPE), and the object B may belong to another class of plastic (e.g. PET).

[0025] One or more image capturing devices 206 are positioned at suitable locations in the vicinity of the sorting area to capture images associated with each human worker S1, S2 and the objects 204. The image capturing devices 206 may be in the form of an RGB camera or video-recorder (to capture sequences of images known as frames). In the embodiment shown in FIG. 2A, one image capturing device 206a is assigned to a sorting area 208a associated with sorter S1 and another image capturing device 206b is assigned to another sorting area 208b associated with sorter S2. Although the illustrated embodiment shows one image capturing device 206 associated with each of the sorters S1 and S2, it is appreciable that one image capturing device 206 may be assigned to two or more sorters, or two or more image capturing devices 206 may be assigned to one sorter.

[0026] Data obtained from the image capturing devices 206a, 206b may be sent to a computer processor 210. In some embodiments, the computer processor 210 may be an edge computing device configured to host edge containers for the purpose of predictive maintenance. The computer processor 210 may be arranged in data or signal communication with a database 216.

[0027] FIG. 2B shows an implementation of the method 100 in the form of a two-stage model according to some embodiments. One or more images captured of the sorting areas 208 associated with objects A, B, C and D may be sent as input data into an object detection/tracking module 212. The output of the object detection/tracking module 212 includes (a.) the location of regions of interest in the form of one or more bounding boxes, and (b.) a unique identifier assigned to each of the objects A, B, C and D. The output of the object detection/tracking module 212 forms the input dataset, or part of the input dataset, fed into a gesture recognition module 214. The gesture recognition module 214 and the object detection/tracking module 212 may include respective input interfaces configured to receive and parse image data. Output from the gesture recognition module 214 may be in the form of whether an object A, B, C and/or D is picked, based on the ROI and the presence/absence of the unique identifiers of the respective objects. If an object A is determined to be picked by a sorter, for example S1, the assigned object-type (for example HDPE plastic type) and the respective ROI are stored in the database 216.
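
A minimal sketch of the two-stage flow of FIG. 2B is given below. The callables detector_tracker and gesture_recognizer stand in for modules 212 and 214; their exact interfaces, and the use of a simple list as the database, are assumptions made only for illustration.

```python
def process_frame(frame, detector_tracker, gesture_recognizer, database, assigned_type):
    """Two-stage flow of FIG. 2B for one captured frame (sketch)."""
    # Stage 1 (module 212): bounding boxes (ROIs) and a unique identifier per object.
    tracked_objects = detector_tracker(frame)             # e.g. {object_id: bounding_box}
    # Stage 2 (module 214): decide per object whether it has been picked up.
    for object_id, roi in tracked_objects.items():
        if gesture_recognizer(frame, roi):                # True if the object is picked
            database.append({"id": object_id, "roi": roi, "actual_type": assigned_type})
```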

[0028] FIG. 3 shows an embodiment of the entries stored in the database 216, illustrating three different possibilities. In the first-row entry, where the object A is picked for sorting by a sorter (S1) assigned to pick HDPE objects, there is a match between the assigned object-type and the predicted object-type produced by the intelligent module. In the second-row entry, where the object B is picked for sorting by a sorter (S2) assigned to pick PET objects, there is a mismatch between the assigned object-type and the predicted object-type produced by the intelligent module. In some embodiments, the number of mismatches over a pre-defined period, e.g. an entire working day, is calculated to evaluate the performance of an intelligent module, which may include an AI model. If a statistical measure, for example the average recall (True Positive Rate) or F1-score (weighted average of precision and recall), is less than a pre-determined threshold, e.g. 80%, the intelligent module may be required to undergo re-training.
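
For illustration, the statistical measures mentioned above may be computed from the stored database rows as follows; the row keys "actual" and "predicted" are assumed names for the columns of FIG. 3.

```python
def precision_recall_f1(rows, object_type):
    """Compute precision, recall and F1-score for one object-type from stored database rows."""
    tp = sum(1 for r in rows if r["actual"] == object_type and r["predicted"] == object_type)
    fp = sum(1 for r in rows if r["actual"] != object_type and r["predicted"] == object_type)
    fn = sum(1 for r in rows if r["actual"] == object_type and r["predicted"] != object_type)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

Re-training may then be triggered when, for example, the recall or F1-score computed over the chosen evaluation period (e.g. one working day) falls below the 80% threshold mentioned above.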

[0029] In the third-row entry where there is an unknown, this may indicate a new object-type not previously accounted for in the system. A sorter may be assigned to such “unknown” object types, and the existence of such unknown object types may indicate a systemic error where re-training of the intelligent module and/or the sorter is required.

[0030] FIGS. 4A and 4B show a particular embodiment of the object detection/tracking module 212 in the form of a pre-trained YOLOv4 (You Only Look Once) model to detect moving objects (bounding boxes of objects) on a sorting area comprising a movable platform (e.g. conveyor belt), and a DeepSORT tracking algorithm working in tandem with the YOLOv4 model to assign a unique tracking identifier to each object on the conveyor belt.

[0031] The input dataset included videos of various object types such as PET bottles, HDPE bottles, LDPE shrink wraps, jars, containers, cardboard, etc. of different shapes, sizes and forms. In order to record the data, these objects may be placed on a conveyor belt moving at a speed of 0.45 m/s. The image capturing device may be an RGB camera placed at suitable locations (e.g. on top of the conveyor belt), and the data of the objects on the conveyor belt may be continuously recorded using the RGB camera. The dataset may be split into training and test datasets: the training dataset included 85% of the total data and the test dataset the remaining 15%. To help the deep learning model generalize, augmentation was applied to the training dataset, including 90-degree rotations, 5% cropping, 2 px blur, and horizontal and vertical flipping.
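
A simple way to obtain the 85/15 split described above is sketched below. The augmentation operations (90-degree rotations, 5% cropping, 2 px blur, flips) would typically be applied with an image augmentation library and are not shown here; the fixed seed is an assumption used only to make the split reproducible.

```python
import random

def split_dataset(samples, train_fraction=0.85, seed=42):
    """Shuffle and split samples into training and test subsets (85/15 by default)."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]   # (training set, test set)
```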

[0032] The YOLOv4 object detection model may be trained for a predetermined number of epochs with a predetermined batch size, for example 2000 epochs (passes) with a batch (sample) size of 64. The training process was monitored by conducting an evaluation on the test dataset images after every 100 epochs. The metric used to assess the accuracy of the predictions was mAP (mean Average Precision), which calculates the percentage of correct predictions by the YOLOv4 model by comparing the number of correct predictions to the actual number of ground truths. The mAP metric was calculated at an IoU (Intersection over Union) threshold of 0.5. The IoU is a metric which calculates the overlapping area between the predicted bounding box and the ground truth bounding box [6]. After training, the YOLOv4 object detection model achieved a mAP of 94.8%.
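
The IoU referred to above may be computed for two axis-aligned bounding boxes as in the following sketch; boxes are assumed to be given as (x_min, y_min, x_max, y_max) pixel coordinates.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width (0 if disjoint)
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height (0 if disjoint)
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A predicted box is typically counted as a correct prediction when its IoU with a ground truth box is at least 0.5, consistent with the threshold used above.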

[0033] In relation to FIG. 4B, after the object detection/tracking module 212 is trained, a real-time tracking algorithm with a deep association metric, known as the DeepSORT tracking algorithm, is used to track the detected objects along the conveyor belt. The DeepSORT algorithm receives the input video of the sorting area comprising a plurality of frames and uses feature matching: it extracts the relevant features of an object in the first frame and compares them with the features of the objects in the successive frames. The algorithm tracks each of the plurality of objects 204 when the features match within a threshold in the following successive frames. When tracking, the DeepSORT algorithm generates a unique tracking identifier (also referred to as an object ID) for each object.
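
A hedged sketch of the detection-and-tracking loop is shown below. The callables detect and tracker are hypothetical wrappers around a trained YOLOv4 model and a DeepSORT instance, since their exact APIs depend on the particular implementation used; only the OpenCV video reading calls are taken as given.

```python
import cv2  # OpenCV, used here only to read the recorded video of the sorting area

def track_video(video_path, detect, tracker):
    """Yield (frame, tracks) pairs; `detect` and `tracker` are hypothetical wrappers."""
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        detections = detect(frame)                  # e.g. [(bounding_box, confidence, class), ...]
        tracks = tracker.update(detections, frame)  # assigns/keeps a unique tracking identifier
        yield frame, tracks
    capture.release()
```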

[0034] The gesture recognition module 214 may include hardware and/or software components to implement a machine learning (ML) and/or artificial intelligence (AI) model, such as the known MediaPipe™ model, to recognize or predict a hand gesture: a part of the sorter, e.g. a hand (or part thereof), is detected using an object recognition technique, and the landmark model is analyzed to determine if the hand is in an open or closed position. In some embodiments, the part of the hand may include a palm of the hand and/or a back portion of the hand. It is appreciable that the gesture recognition module 214 is able to perform hand gesture recognition even if the hand (or part thereof) of the sorter is covered by a glove. The MediaPipe™ framework may be adopted for building multimodal (e.g. video, audio, any time series data), cross-platform (e.g. Android, iOS, web, edge devices) applied ML pipelines.
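
For illustration, the hand landmarks of FIG. 5A may be obtained with the MediaPipe Hands solution roughly as follows; exact API details may vary between MediaPipe versions, and single-image, single-hand processing is assumed for simplicity.

```python
import cv2
import mediapipe as mp

def hand_landmarks(bgr_image):
    """Return the 21 hand landmarks as normalized (x, y) pairs, or None if no hand is found."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB
    if not results.multi_hand_landmarks:
        return None
    return [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]
```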

[0035] FIGS. 5A to 5C show various illustrations of hand landmarks (FIG. 5A) and key point detection on a human left hand with a single finger (FIG. 5B) and multiple fingers (FIG. 5C). It is appreciable that the landmark and key point detection may also be performed on a human right hand. The key points are marked 510 in FIG. 5B and FIG. 5C, showing the human left hand in an open position, and in FIG. 6B, showing the human left hand in a closed position.

[0036] FIG. 6A shows a state diagram associated with the prediction of whether the sorter’s hand is opened or closed based on the output provided by the gesture recognition module 214. The distance between the key points is considered in order to predict if the hand is open (state 602) or closed (state 604). The hand is considered to be closed when an average distance between all 10 key points (five fingers) is below or equal to a pre-determined threshold value, for example an average distance of 100 pixels between two or more key points 510 set for each finger. The toggling between state 602 and state 604 is based on whether the distance between the key points, which may be re-calculated at every pre-determined interval, exceeds or falls below this threshold.
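
A minimal sketch of the open/closed decision is given below, assuming the finger key points are available as pixel coordinates (e.g. after scaling normalized landmarks by the image size). The use of the average pairwise distance and the 100-pixel default follow the example above; the exact key-point selection is an assumption.

```python
from itertools import combinations
from math import dist

def hand_is_closed(key_points_px, threshold_px=100.0):
    """Treat the hand as closed (state 604) when the average pairwise key-point distance
    in pixels is at or below the threshold; otherwise treat it as open (state 602)."""
    pairs = list(combinations(key_points_px, 2))
    if not pairs:
        return False
    average = sum(dist(p, q) for p, q in pairs) / len(pairs)
    return average <= threshold_px
```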

[0037] FIG. 6B illustrates a human hand position regarded as a closed-position gesture output, with key points on each finger.

[0038] FIG. 6C shows a state diagram associated with the prediction of whether an object is determined to be present or absent based on the output provided by the object detection/tracking module 212. An object is determined to be present and in state 606 if the bounding box and the tracking identifier are both present. An object is determined to be absent and in state 608 if the bounding box and/or the tracking identifier disappear (that is, over the course of tracking, the identifier was present in some image frames and then disappears in subsequent image frames).

[0039] The outputs from both state diagrams shown in FIGS. 6A and 6C are combined to determine if an assigned sorter has picked up an object. FIG. 6D shows the state diagram for this combination based on the outputs provided by the modules 212 and 214.

[0040] FIG. 6E illustrates the image or video capture of an object being picked by an assigned sorter for sorting, corresponding to a transition from a state 610 (left side frame) to a state 612 (right side frame). The gesture recognition module 214 is shown to be capable of generating a landmark model of the back of the hand to determine if the hand is in an open or closed position.

[0041] It is contemplated that the object detection and tracking identifier is used to check whether each of the objects disappears over the predetermined period, as indicated by the image frames, and to output the class of the object. This assumes that, during picking of an object, the hand (or part of the sorter) covers part of or the entirety of the object, and therefore the object detection/tracking module 212 is no longer able to detect the object. In other words, the bounding box and the tracking identifier of the object disappear, and this data is used to predict the disappearing object and its class, as shown in FIG. 6E.
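
The combined decision of FIGS. 6D and 6E may be sketched as follows. The pairing of a particular hand with a particular object is deliberately simplified here: the function merely reports tracked objects whose identifiers disappear between frames while the assigned sorter's hand is in the closed state.

```python
def picked_objects(previous_ids, current_ids, hand_closed: bool):
    """Return identifiers of objects treated as picked in the current frame (sketch)."""
    disappeared = set(previous_ids) - set(current_ids)   # state 608: bounding box / identifier gone
    return disappeared if hand_closed else set()         # combined with state 604 (hand closed)
```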

[0042] It is contemplated that other computer vision/image processing techniques known to a skilled person may be combined to form further embodiments that supplement and/or replace the ML/AI algorithms. For example, instead of a one-stage feature detector, a multi-stage feature detector may be envisaged.

[0043] In some embodiments, the processor 210 may include hardware components such as server computer(s) arranged in a distributed or non-distributed configuration to implement characterization databases. The hardware components may be supplemented by a database management system configured to compile one or more industry-specific characteristic databases. In some embodiments, the industry-specific characteristic databases may include analysis modules to correlate one or more datasets with an industry. Such analysis modules may include an expert rule database, a fuzzy logic system, or any other artificial intelligence module.

[0044] FIG. 7 shows a server computer system 700 according to an embodiment. The server computer system 700 includes a communication interface 702 (e.g. configured to receive captured images from the image capturing device(s) 206). The server computer system 700 further includes a processing unit 704 and a memory 706. The memory 706 may be used by the processing unit 704 to store, for example, data to be processed, such as data associated with the captured images, intermediate results output from the modules 212, 214, and/or final results to be stored in the database 216.

[0045] In some embodiments, the unique identifier assigned to each object includes information relating to the class of the object and the assigned sorter.

[0046] In some embodiments, a computer-readable medium is provided including program instructions, which, when executed by one or more processors, cause the one or more processors to perform the methods according to the embodiments described above. The computer-readable medium may include a non-transitory computer-readable medium.

[0047] In some embodiments, the ML/AI algorithms may be trained using supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and/or deep learning methods. In some embodiments, the ML/AI algorithms may include algorithms such as neural networks, fuzzy logic, evolutionary algorithms, etc.

[0048] It should be noted that the server computer system 700 may be a distributed system including a plurality of computers.

[0049] While the disclosure has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims. The scope of the disclosure is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.