Title:
METHOD AND DEVICE FOR RECOGNIZING OBJECT
Document Type and Number:
WIPO Patent Application WO/2019/171116
Kind Code:
A1
Abstract:
The present disclosure relates to a method and device for recognizing an object. The method includes: acquiring an image of the object; inputting the image into a first machine learning model, and acquiring, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model; inputting at least the first output into a second machine learning model, and acquiring, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and providing the first output and the second output corresponding to the first output.

Inventors:
YANAGAWA, Yukiko (801 Minamifudodo-cho, Horikawahigashiiru, Shiokoji-dori, Shimogyo-ku, Kyoto-shi, Kyoto, 〒600-8530, JP)
Application Number:
IB2018/051387
Publication Date:
September 12, 2019
Filing Date:
March 05, 2018
Assignee:
OMRON CORPORATION (801 Minamifudodo-cho, Horikawahigashiiru, Shiokoji-dori, Shimogyo-ku, Kyoto-shi, Kyoto, 〒600-8530, JP)
International Classes:
G06K9/00; G06K9/62
Foreign References:
US20180012107A1 (2018-01-11)
US20170364771A1 (2017-12-21)
US20170206426A1 (2017-07-20)
JP2017538999A (2017-12-28)
Attorney, Agent or Firm:
INABA, Yoshiyuki et al. (TMI Associates, 23rd Floor, Roppongi Hills Mori Tower, 6-10-1 Roppongi, Minato-ku, Tokyo 106-6123, JP)
Claims:
Claims

1. A method for recognizing an object, comprising:

acquiring an image of the object;

inputting the image into a first machine learning model, and acquiring, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model;

inputting at least the first output into a second machine learning model, and acquiring, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and

providing the first output and the second output corresponding to the first output.

2. The method according to claim 1, wherein the second output includes a second probability corresponding to a class of the object in the image.

3. The method according to claim 1 or 2, wherein inputting at least the first output into the second machine learning model includes:

inputting the first output and the image into the second machine learning model.

4. The method according to claim 3, wherein inputting the first output and the image into the second machine learning model includes:

associating the one or more first probabilities with the image; and inputting the one or more first probabilities and the image associated with each other into the second machine learning model.

5. The method according to any one of claims 1 to 4, further comprising:

acquiring the first machine learning model and the second machine learning model which have been trained by means of a training method including:

pre-acquiring a training image of the object, a class of the object as label data, and one or more predefined features of the object, the predefined features being defined in association with the class of the object;

using the training image and the one or more predefined features to train the first machine learning model; and

using an output of the first machine learning model and the label data to train the second machine learning model.

6. The method according to claim 5, wherein pre-acquiring the one or more predefined features includes: acquiring the one or more predefined features from a table in which the one or more predefined features are associated with the class of the object.

7. The method according to any one of claims 1 to 6, further comprising:

determining whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and

if it is determined that the correlation does not satisfy the predetermined standard, notifying a user.

8. The method according to any one of claims 1 to 6, further comprising:

determining whether the first output is not consistent with the second output; and

if it is determined that the first output is not consistent with the second output, notifying a user.

9. The method according to any one of claims 1 to 8, wherein the one or more features of the object include a color feature, texture feature, shape feature or space relationship feature of the class of the object.

10. A device for recognizing an object, comprising:

an acquisition unit configured to acquire an image of the object; a processing unit configured to

receive the image of the object,

input the image into a first machine learning model and acquire, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model, and input at least the first output into a second machine learning model and acquire, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and

a providing unit configured to provide the first output and the second output corresponding to the first output.

11. The device according to claim 10, wherein the second output includes a second probability of a class of the object in the image.

12. The device according to claim 10 or 11, wherein the processing unit is configured to input the first output and the image into the second machine learning model.

13. The device according to claim 12, wherein the processing unit is configured to:

associate the one or more first probabilities with the image; and input the one or more first probabilities and the image associated with each other into the second machine learning model.

14. The device according to any one of claims 10 to 13, further comprising:

a determination unit configured to determine whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and

a notification unit configured to, if it is determined that the correlation does not satisfy the predetermined standard, notify a user.

15. The device according to any one of claims 10 to 13, further comprising:

a determination unit configured to determine whether the first output is not consistent with the second output; and

a notification unit configured to, if it is determined that the first output is not consistent with the second output, notify a user.

16. A device for training the device according to any one of claims 10 to 15, the device for training comprising: an acquisition unit configured to pre-acquire a training image of the object, a class of the object as label data, and one or more predefined features of the object, wherein the predefined features are defined in association with the class of the object; and

a training unit configured to

train the first machine learning model using the training image and the one or more predefined features, and

train the second machine learning model using an output of the first machine learning model and the label data.

17. The device according to claim 16, further comprising:

a storage unit configured to store a table in which the one or more predefined features are associated with the class of the object, wherein

the acquisition unit is configured to acquire the one or more predefined features from the table.

18. A program for recognizing an object, comprising instructions which, when the program is executed by a computer, cause the computer to perform the method according to any one of claims 1 to 9.

19. A storage medium that stores a program for recognizing an object, the program including instructions which, when the program is executed by a computer, cause the computer to perform the method according to any one of claims 1 to 9.

Description:
Method and device for recognizing object

Technical Field

The present disclosure relates to a method and device for recognizing an object.

Background

In recent years, research on recognizing objects such as persons or vehicles with neural networks has received increasing attention.

Object recognition using a Convolutional Neural Network (CNN) is described in JP2017-538999A (Patent Document 1). The CNN outputs a recognition result upon receiving an input such as an image, as a result of processing in the CNN with optimized parameters set through training.

Such a conventional configuration prevents a person from knowing the judgment process carried out by the CNN. Therefore, whether the object is successfully recognized, as well as the cause thereof, is also unknown, which makes it difficult to improve the accuracy of the object recognition.

Citation list

Patent documents

Patent document 1: JP2017-538999A

Summary

According to a first aspect of the present disclosure, a method for recognizing an object is provided, and the method includes: acquiring an image of the object; inputting the image into a first machine learning model, and acquiring, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model; inputting at least the first output into a second machine learning model, and acquiring, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and providing the first output and the second output corresponding to the first output.

In such a manner, the features of the object recognized from the image are output by the first machine learning model and input into the second machine learning model, so as to output the recognition result in relation to the recognized features in the image. By providing both the recognized features in the image and the recognition result of the object, a user can estimate the features of the object that affect success or failure in the object recognition, as well as the training data to be used in additional training required for improving the accuracy of the object recognition.
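As a concrete illustration of this data flow, the following Python sketch wires two stand-in models together. It is a minimal sketch only: the model internals are placeholders, and the probability values are taken from the example of Fig. 3 described later, not computed by a real model.

def first_model(image):
    # First machine learning model: returns the first output, i.e. the
    # probability that each predefined feature appears in the image.
    return {"there are tires": 0.60,
            "the shape is a quadrangle": 0.32,
            "there are two legs": 0.08,
            "the shape is a cylinder": 0.10}

def second_model(first_output):
    # Second machine learning model: returns the second output, i.e. the
    # recognition result as class probabilities.
    return {"pedestrian": 0.03, "building": 0.05,
            "vehicle": 0.90, "tree": 0.02}

def recognize(image):
    first_output = first_model(image)
    second_output = second_model(first_output)
    # Both outputs are provided together, so the user can relate the
    # recognized features to the recognition result.
    return first_output, second_output

features, result = recognize(image=None)  # the image is omitted in this sketch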

In the foregoing method, the second output may include a second probability corresponding to a class of the object in the image.

In the foregoing method, inputting at least the first output into the second machine learning model may include: inputting the first output and the image into the second machine learning model. Further, inputting the first output and the image into the second machine learning model may include: associating the one or more first probabilities with the image; and inputting the one or more first probabilities and the image associated with each other into the second machine learning model.

In such a manner, in addition to the first output, the image is input into the second machine learning model, so that recognition accuracy of the second machine learning model may be improved.

The method may further include: acquiring the first machine learning model and the second machine learning model which have been trained by means of a training method including: pre-acquiring a training image of the object, a class of the object as label data, and one or more predefined features of the object, the predefined features being defined in association with the class of the object; using the training image and the one or more predefined features to train the first machine learning model; and using an output of the first machine learning model and the label data to train the second machine learning model. Further, pre-acquiring the one or more predefined features may include: acquiring the one or more predefined features from a table in which the one or more predefined features are associated with the class of the object.

In such a manner, the first machine learning model acquires the ability to recognize the features of the object, and the second machine learning model acquires the ability to recognize the object.

The foregoing method may further include: determining whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and, if it is determined that the correlation does not satisfy the predetermined standard, notifying a user. In such a manner, the association between the features recognized from the image and the recognition result of the object may be established and prompted, so that the user may estimate whether recognition succeeds or not on the basis of the features recognized from the image; in particular, if recognition of the object fails, the user may identify the specific feature whose imperfect recognition is a main cause of the recognition failure.

The foregoing method may further include: determining whether the first output is not consistent with the second output; and, if it is determined that the first output is not consistent with the second output, notifying a user.

In such a manner, the user is notified that there may be an error in the recognition result. Thus, the user can easily determine what features affect object recognition, so that the factors that lower the object recognition accuracy can be effectively estimated.

In the foregoing method, the one or more features of the object may include the color feature, texture feature, shape feature or space relationship feature of the class of the object.

According to a second aspect of the present disclosure, a device for recognizing an object is provided, and the device includes: an acquisition unit configured to acquire an image of the object; a processing unit configured to receive the image of the object, input the image into a first machine learning model and acquire, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model, and input at least the first output into a second machine learning model and acquire, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and a providing unit configured to provide the first output and the second output corresponding to the first output.

In such a manner, similar to the first aspect, a user can estimate the features of the object that affect success or failure in the object recognition, as well as the training data to be used in additional training required for improving the accuracy of the object recognition.

In the foregoing device, the second output may include a second probability of a class of the object in the image.

In the foregoing device, the processing unit may be configured to input the first output and the image into the second machine learning model. Further, the processing unit may be configured to: associate the one or more first probabilities with the image; and input the one or more first probabilities and the image associated with each other in the second machine learning model.

The foregoing device may further include: a determination unit configured to determine whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and a notification unit configured to, if it is determined that the correlation does not satisfy the predetermined standard, notify a user.

The foregoing device may further include: a determination unit configured to determine whether the first output is not consistent with the second output; and a notification unit configured to, if it is determined that the first output is not consistent with the second output, notify a user.

According to a third aspect of the present disclosure, a device for training the above device is provided, and the device includes: an acquisition unit configured to pre-acquire a training image of the object, a class of the object as label data, and one or more predefined features of the object, wherein the predefined features are defined in association with the class of the object; and a training unit configured to train the first machine learning model using the training image and the one or more predefined features, and train the second machine learning model using an output of the first machine learning model and the label data. The device may further include: a storage unit configured to store a table in which the one or more predefined features are associated with the class of the object, wherein the acquisition unit is configured to acquire the one or more predefined features from the table.

In such a manner, the first machine learning model acquires the ability to recognize the features of the object, and the second machine learning model acquires the ability to recognize the object. Thus, the first machine learning model and the second machine learning model can be constructed to be utilized in recognizing the object according to the present disclosure.

According to a fourth aspect of the present disclosure, a program for recognizing an object is provided, and the program includes instructions which, when the program is executed by a computer, cause the computer to perform the foregoing method.

According to a fifth aspect of the present disclosure, a storage medium is provided, and the storage medium stores a program for recognizing an object, the program including instructions which, when the program is executed by a computer, cause the computer to perform the foregoing method.

Brief Description of the Drawings

The drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present application. The schematic embodiments of the present disclosure and the descriptions thereof are used to explain the present disclosure, and do not constitute improper limitations to the present disclosure. In the drawings:

Fig. 1 is a hardware structure of a recognition system according to an implementation mode of the present disclosure;

Fig. 2 is a functional block diagram of a recognition system according to an implementation mode of the present disclosure;

Fig. 3 is an exemplary block diagram of output of a first machine learning model and second machine learning model of a recognition system according to an implementation mode of the present disclosure;

Fig. 4 is an exemplary block diagram of a content stored in a storage unit of a recognition system according to an implementation mode of the present disclosure;

Fig. 5 is a flowchart of a recognition phase in a recognition method according to an implementation mode of the present disclosure;

Fig. 6 is a flowchart of a learning phase in a recognition method according to an implementation mode of the present disclosure;

Fig. 7 is a functional block diagram of a recognition system according to another implementation mode of the present disclosure;

Fig. 8 is a flowchart of a recognition phase in a recognition method according to another implementation mode of the present disclosure;

Fig. 9 is a functional block diagram of a device for training according to another implementation mode of the present disclosure; and

Fig. 10 is a flowchart of a learning phase in a recognition method according to another implementation mode of the present disclosure.

Detailed Description of the Embodiments

In order to make those skilled in the art better understand the present disclosure, the implementation modes of the present disclosure are clearly and completely described below in combination with the accompanying drawings of the present disclosure. Apparently, the described implementation modes are merely a part of the implementation modes of the present disclosure, rather than all of them. All other implementation modes obtained by those skilled in the art based on the implementation modes in the present disclosure, without creative efforts, shall fall within the protection scope of the present disclosure.

At first, a hardware structure of a recognition system 100 for recognizing an object according to an implementation mode of the present disclosure is described.

Fig. 1 is a schematic diagram of a hardware structure of a recognition system 100 according to an implementation mode of the present disclosure. As shown in Fig. 1, the recognition system 100 may, for example, be implemented by a computer of a general-purpose computer architecture. The recognition system 100 may include a processor 110, a main memory 112, a memory 114, an input interface 116, a display interface 118 and a communication interface 120. These parts may, for example, communicate with one another through an internal bus 122.

The processor 110 loads a program stored in the memory 114 into the main memory 112 and executes it, thereby realizing the functions and processing described hereinafter. The main memory 112 may be structured as a nonvolatile memory, and serves as a working memory required for program execution by the processor 110.

The input interface 116 may be connected with an input unit such as a mouse and a keyboard, and receives an instruction input by an operator operating the input unit.

The display interface 118 may be connected with a display, and may output various processing results generated by program execution of the processor 110 to the display.

The communication interface 120 is configured to communicate with a Programmable Logic Controller (PLC), a database device and the like through a network 200.

The memory 114 may store programs that cause a computer to function as the recognition system 100, for example, an object recognition program and an Operating System (OS).

The object recognition program stored in the memory 114 may be installed in the recognition system 100 through an optical recording medium such as a Digital Versatile Disc (DVD) or a semiconductor recording medium such as a Universal Serial Bus (USB) memory. Alternatively, the object recognition program may be downloaded from a server device or the like on the network.

The object recognition program according to the implementation mode may also be provided in combination with another program. Under such a condition, the object recognition program does not include the modules included in the other program of such a combination, but cooperates with the other program for processing. Therefore, the object recognition program according to the implementation mode may also take the form of a combination with another program.

Fig. 1 shows an example of implementing the recognition system 100 by virtue of a general-purpose computer. However, the present disclosure is not limited thereto, and all or part of its functions may be realized through a dedicated circuit, for example, an Application Specific Integrated Circuit (ASIC) or a Field-Programmable Gate Array (FPGA). In addition, part of the processing of the recognition system 100 may also be implemented on an external device connected with the network.

<Embodiment 1>

Fig. 2 is a functional block diagram of a recognition system 200 for recognizing an object according to an implementation mode of the present disclosure. Each part of the recognition system 200 will be described below in detail, taking as an example the case where the recognition system 200 is a system for recognizing an object during driving assistance or self-driving.

The recognition system 200 may include a storage unit 218, an acquisition unit 202, a processing unit including a first machine learning model 204 and a second machine learning model 206, a display unit 208, a determination unit 210, a prompting unit 212, a first training unit 214 and a second training unit 216. These units may be implemented by the above recognition system 100, and the division or combination of these units is not limited to that described here.

The acquisition unit 202 is, for example, a camera mounted on a vehicle. In addition, the acquisition unit 202 may also be an image acquisition unit configured on a mobile phone, a smart phone or other similar mobile equipment. When the recognition system 200 is used to recognize an object under a running condition of a vehicle, the mobile equipment is equipment capable of being attached to the vehicle. The acquisition unit 202 acquires an image of a recognition target object such as a movable object (for example, a pedestrian, an animal and a vehicle) or a still object (for example, a still obstacle, a road sign and a traffic light) appearing in the vicinity (for example, in front) of the vehicle.

Under some conditions, the acquisition unit 202 is not required to acquire an image in real time. For example, the acquisition unit 202 may be a device pre-storing the image, or the acquisition unit 202 may acquire image data from another source (for example, a server and a memory). In addition, the acquisition unit 202 may also be arranged outside the recognition system 200. For example, the acquisition unit 202 communicates with the recognition system 200 through a network.

In addition, the acquisition unit 202 may further preprocess the acquired image data. For example, contrast adjustment and brightness balancing may be performed on the image data to broaden the dynamic range presented in the captured image. In some embodiments, the image data is further scaled to a bit depth suitable for feeding into an image recognition algorithm.
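As a hedged illustration of such preprocessing, the following NumPy sketch applies a simple contrast/brightness adjustment and rescales the pixel values; the adjustment factors and image size are illustrative assumptions, not values from the disclosure.

import numpy as np

def preprocess(image, contrast=1.2, brightness=10.0):
    # Contrast/brightness adjustment on an 8-bit image (illustrative factors).
    img = image.astype(np.float32) * contrast + brightness
    img = np.clip(img, 0.0, 255.0)
    # Rescale to [0, 1], a range commonly fed into recognition algorithms.
    return img / 255.0

frame = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)  # stand-in capture
print(preprocess(frame).shape)  # (224, 224, 3)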

The acquisition unit 202 inputs the acquired image into the first machine learning model 204 that has been trained, and the first machine learning model 204 outputs one or more first probabilities respectively corresponding to one or more predefined features of the recognition target object in the image as a first output. A first probability may refer to the probability that the corresponding predefined feature is included in the image.

For example, the one or more features of the object characterize a color feature, texture feature, shape feature or space relationship feature corresponding to a class of the object. The one or more predefined features are defined in association with the classes of the recognition target objects, as shown in Fig. 4.

Under such a condition, the recognition target object includes, but is not limited to, a pedestrian, a vehicle and the like. If the object is a pedestrian, the predefined features include, for example, “there are two legs” and “the shape is a cylinder”. If the object is a vehicle, the predefined features include, for example, “there are tires” and “the shape is a quadrangle”.

An exemplary block diagram of the first machine learning model 204 of a recognition system 200 according to an implementation mode of the present disclosure is shown in Fig. 3. As shown in Fig. 3, the first machine learning model 204 outputs the first probabilities respectively corresponding to the predefined features in the image. For example, the probability corresponding to the feature “there are tires” in the image is 60%; the probability corresponding to the feature “the shape is a quadrangle” in the image is 32%; the probability corresponding to the feature “there are two legs” in the image is 8%; and the probability corresponding to the feature “the shape is a cylinder” in the image is 10%.

The second machine learning model 206 that has been trained takes the one or more first probabilities output by the first machine learning model 204 as input, and outputs a recognition result of the object in the image as a second output. Here, the recognition result may indicate whether a recognition target object is included in the image. For example, the recognition target objects include a pedestrian and a vehicle.

An exemplary block diagram of the second machine learning model 206 of a recognition system according to an implementation mode of the present disclosure is shown in Fig. 3. As shown in Fig. 3, the second machine learning model 206 outputs the recognition result of the object in the image. For example, the probability that the object in the image is a “pedestrian” is 3%; the probability that the object in the image is a “building” is 5%; the probability that the object in the image is a “vehicle” is 90%; and the probability that the object in the image is a “tree” is 2%.

In addition, the second machine learning model 206 may further take both the one or more first probabilities output by the first machine learning model 204 and the image acquired by the acquisition unit 202 as the input, and output the recognition result of the object in the image as the second output.

Under such a condition, the acquired image of the object is also taken as the input, so that the recognition accuracy of the second machine learning model for the object may be improved. The first probabilities may be associated with the image acquired by the acquisition unit 202. For example, the vector or matrix representing the first probabilities and the three-dimensional matrix representing the image may be combined into one matrix, so as to be taken as the object to be recognized for input into the second machine learning model 206.
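One possible realization of this combination, sketched below in NumPy, broadcasts each first probability into a constant plane appended to the image channels, so that the second machine learning model receives a single array. The shapes are illustrative assumptions; the disclosure does not fix a particular combination scheme.

import numpy as np

def combine(image, first_probs):
    # Broadcast each probability into an H x W plane and append it to the
    # three image channels, yielding one combined input array.
    h, w, _ = image.shape
    planes = np.broadcast_to(first_probs.reshape(1, 1, -1), (h, w, first_probs.size))
    return np.concatenate([image, planes], axis=2)

image = np.zeros((224, 224, 3), dtype=np.float32)
first_probs = np.array([0.60, 0.32, 0.08, 0.10], dtype=np.float32)
print(combine(image, first_probs).shape)  # (224, 224, 7): 3 image + 4 feature channels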

As mentioned above, if the second machine learning model 206 only takes the first probabilities output by the first machine learning model 204 as the input, computing resources are saved. The user may select the specific data taken as the input by the second machine learning model 206 according to the requirement on recognition accuracy.

The display unit 208 may be a liquid crystal display. The display unit 208 displays the first output of the first machine learning model 204 and the second output, corresponding to the first output, of the second machine learning model 206 to the user. For example, the display unit may display the output of the first machine learning model 204 and the second machine learning model 206 in the manner shown in Fig. 3. In addition, the display unit 208 may perform displaying in a manner that associates the class of the object with the corresponding features of the object. For example, the class “pedestrian” and the corresponding features “there are two legs” and “the shape is a cylinder” may be displayed in the same color or the same text format. Under such a condition, since the class of the object and the corresponding features are displayed in a unified manner, it is easier for the user to grasp the results.

Here, the display unit 208 may be regarded as a specific example of the “providing unit” in the present disclosure. However, the present disclosure is not limited thereto. For example, the providing unit may also be implemented in other manners. For example, the first output and the second output may be provided to the user through a paper document, a cloud download, or an email.

In addition, the determination unit 210 judges whether the corresponding relationship between the first output of the first machine learning model 204 and the second output, corresponding to the first output, of the second machine learning model 206 satisfies a predetermined standard, based on the display of the display unit 208. For example, suppose the object is a pedestrian but the probability of the corresponding feature “there are two legs” is only 8%, while the predetermined standard is that, under the condition that the object is a pedestrian, the probability of the corresponding feature “there are two legs” is 70%. In this case, the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output does not satisfy the predetermined standard.
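This determination can be sketched as a simple threshold check, as below. The standard table holds only the 70% example given above; any further entries would be application-specific assumptions.

STANDARD = {("pedestrian", "there are two legs"): 0.70}  # predetermined standard

def satisfies_standard(first_output, recognized_class):
    # Return False if any feature probability required for the recognized
    # class falls below its predetermined standard.
    for (cls, feature), threshold in STANDARD.items():
        if cls == recognized_class and first_output.get(feature, 0.0) < threshold:
            return False
    return True

first_output = {"there are two legs": 0.08, "the shape is a cylinder": 0.10}
if not satisfies_standard(first_output, "pedestrian"):
    print("notify user")  # handled by the prompting unit 212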

Under such a condition, the determination unit 210 outputs a judgment result to the prompting unit 212. Then, the prompting unit 212 prompts the user. For example, the prompting unit 212 may be a buzzer, that is, the user is prompted with a sound. In addition, the prompting unit 212 may also be a visual prompting unit, for example, a Light-Emitting Diode (LED) lamp. The prompting unit 212 may further be a vibrator, or a combination of the abovementioned types.

In the above embodiment, the recognition system 200 may further include a storage unit 218; in another implementation mode, the storage unit 218 may also be arranged outside the recognition system 200. Here, the first output and the second output may be provided to and stored in the storage unit 218. The storage unit 218 may also be regarded as a specific example of the “providing unit” in the present disclosure. The user may read the stored outputs of the first machine learning model 204 and the second machine learning model 206 from the storage unit 218, and analyze or operate on the outputs.

As shown in Fig. 4, the storage unit 218 maps both the features “there are tires” and “the shape is a quadrangle” into the class “vehicle” of the object in a many-to-one manner, and maps both the features “there are two legs” and “the shape is a cylinder” into the class “person” of the object in the many-to-one manner. In addition, one feature may also be mapped into multiple classes of the object in a one-to-many manner.
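A minimal sketch of such a table follows, using a Python dictionary; the entries mirror Fig. 4 as described above, and the lookup helpers are illustrative.

FEATURE_TABLE = {
    "vehicle": ["there are tires", "the shape is a quadrangle"],
    "person":  ["there are two legs", "the shape is a cylinder"],
}

def features_for_class(cls):
    # Many-to-one: several features are mapped into one class.
    return FEATURE_TABLE.get(cls, [])

def classes_for_feature(feature):
    # One-to-many: one feature may also be mapped into multiple classes.
    return [c for c, feats in FEATURE_TABLE.items() if feature in feats]

print(features_for_class("person"))  # ['there are two legs', 'the shape is a cylinder']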

In such a manner, the features of the object recognized from the image are output by the first machine learning model and input into the second machine learning model, so as to output the recognition result in relation to the recognized features in the image. By providing both the recognized features in the image and the recognition result of the object through various providing units such as the storage unit 218, a user can estimate the features of the object that affect success or failure in the object recognition, as well as the training data to be used in additional training required for improving the accuracy of the object recognition.

For example, if a specific feature of the object is not recognized in the image when recognition of the object fails, then the failure in recognizing the specific feature may be estimated as a factor of the failure in object recognition. Based on such estimation, the user can perform additional learning or augmented training for improving the accuracy in recognizing the specific feature, in order to improve the accuracy in recognizing the object.

In addition, in the conventional art, if recognition of the object fails, a person has no idea what additional learning data should be prepared for construction of a CNN capable of accurately recognizing the object. Therefore, in the conventional art, it is unlikely that efficient learning can be implemented during machine-learning-based recognition. In the present disclosure, however, the machine learning may proceed effectively.

In addition, according to an implementation mode of the present disclosure, the recognition system 200 may further include a first training unit 214 and a second training unit 216. In another implementation mode, the first training unit 214 and the second training unit 216 may also be arranged outside the recognition system 200. Under such a condition, for example, the first training unit 214 and the second training unit 216 communicate with the recognition system 200 through the network, and the first training unit 214 and the second training unit 216 may correspond to the “device for training” in the present disclosure.

The first training unit 214 trains the first machine learning model 204 in the following manner: pre-acquiring a training image, a class of an object in the training image and one or more features of the object; respectively storing the class of the object and the one or more features of the object corresponding to the training image as the predefined class of the object and the one or more predefined features of the object; and taking the class of the object as label data, and using the training image and the one or more predefined features of the object corresponding to the label data to train the first machine learning model.

Specifically, the first training unit 214 captures images through, for example, a camera, thereby collecting a large amount of image data including the recognition target object. The user defines a name of an object included in the captured image as the label data, together with features of the object. For example, as mentioned above, according to the requirement of a specific application, the user defines that the required recognition target objects include a pedestrian, a vehicle and the like. If the object is a pedestrian, the user defines one or more features of the object, for example, “there are two legs” and “the shape is a cylinder”. If the object is a vehicle, the user defines one or more features of the object, for example, “there are tires” and “the shape is a quadrangle”.

In addition, the features “there are two legs” and “the shape is a cylinder” are mapped into the class “person” of the object in the many-to-one manner. One feature may also be mapped into multiple classes of the object in the one-to-many manner. Under such a condition, the one or more features of the object are stored in the table in association with the class of the object. Then, the captured image and the one or more features of the object corresponding to the label data are stored as training data.

The storage unit 218 may store the one or more predefined features of the object in a table in a manner of association with the class of the object. Fig. 4 is an exemplary block diagram of a content stored in a storage unit of a recognition system according to an implementation mode of the present disclosure. In this case, the one or more predefined features of the object may be acquired from the table in the storage unit 218.

In another implementation mode, the storage unit 218 may also be arranged outside the recognition system 200, and communicates with the recognition system 200 through the Internet.

Then, the first training unit 214 uses the training data to train the first machine learning model 204.

A case in which deep learning by a neural network is performed is shown in Fig. 3. In such a case, by comparing the probability corresponding to each feature with the label data and improving the consistency therebetween, the algorithm parameters of the neural network may be adjusted so as to obtain a learned neural network. In the present disclosure, the first machine learning model 204 may be implemented not only by the above neural network, but also by other common learning machines.

In addition, the second training unit 216 is configured to train the second machine learning model 206, wherein the second training unit 216 uses the first probabilities output by the first machine learning model 204 and the above label data as training data to train the second machine learning model 206.
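The two training stages can be sketched with small PyTorch networks, as below. The architectures, layer sizes and synthetic data are assumptions for illustration only; the disclosure does not fix a specific learning machine.

import torch
from torch import nn, optim

NUM_FEATURES, NUM_CLASSES = 4, 4

# Stand-in models; any common learning machine could be used instead.
first_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                            nn.Linear(64, NUM_FEATURES), nn.Sigmoid())
second_model = nn.Linear(NUM_FEATURES, NUM_CLASSES)

images = torch.rand(16, 1, 28, 28)                                # synthetic training images
feature_labels = torch.randint(0, 2, (16, NUM_FEATURES)).float()  # predefined features per image
class_labels = torch.randint(0, NUM_CLASSES, (16,))               # class of the object as label data

# Stage 1: train the first model against the predefined features.
opt1 = optim.Adam(first_model.parameters())
for _ in range(100):
    opt1.zero_grad()
    loss = nn.functional.binary_cross_entropy(first_model(images), feature_labels)
    loss.backward()
    opt1.step()

# Stage 2: train the second model on the first output and the label data.
opt2 = optim.Adam(second_model.parameters())
for _ in range(100):
    opt2.zero_grad()
    first_out = first_model(images).detach()  # first probabilities as input
    loss = nn.functional.cross_entropy(second_model(first_out), class_labels)
    loss.backward()
    opt2.step()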

Similarly, the second machine learning model 206 in the present disclosure may be implemented not only by the above neural network, but also by other common learning machines.

Therefore, through the recognition system of the present disclosure, the user may determine whether the object is successfully recognized or not, as well as the cause thereof. In addition, if recognition of the object fails, the user may know what additional learning data should be prepared for construction of a CNN capable of accurately recognizing the object. Therefore, compared with the conventional art, machine-learning-based recognition may be trained efficiently.

Fig. 5 is a flowchart of a recognition phase in a recognition method according to an implementation mode of the present disclosure.

Specific operations of the recognition phase will be described below in detail.

The recognition phase starts with Step 502, in which the acquisition unit 202 acquires an image of a recognition target object. When the recognition method is applied to recognizing the object under a running condition of a vehicle, the acquisition unit 202 may acquire an image of a movable object (for example, a pedestrian, an animal and a vehicle) or a still object (for example, a still obstacle, a road sign and a traffic light) appearing in the vicinity (for example, in front) of the vehicle.

Under some conditions, in Step 502, the acquisition unit 202 is not required to acquire the image in real time. For example, the acquisition unit 202 may acquire image data from another source (for example, a server and a memory).

Referring back to Fig. 3, in Step 504, the first machine learning model 204 outputs one or more first probabilities respectively corresponding to one or more predefined features of the recognition target object in the image. For example, the probability corresponding to the feature “there are tires” in the image is 60%; the probability corresponding to the feature “the shape is a quadrangle” in the image is 32%; the probability corresponding to the feature “there are two legs” in the image is 8%; and the probability corresponding to the feature “the shape is a cylinder” in the image is 10%.

Then, processing goes to Step 506. In Step 506, the second machine learning model 206 outputs the recognition result of the object in the image. For example, the probability that the object in the image is a “pedestrian” is 3%; the probability that the object in the image is a “building” is 5%; the probability that the object in the image is a “vehicle” is 90%; and the probability that the object in the image is a “tree” is 2%.

In addition, in Step 506, the second machine learning model 206 may further take both the one or more first probabilities output by the first machine learning model 204 in Step 504 and the image of the object acquired by the acquisition unit 202 in Step 502 as the input, and output the recognition result of the object in the image as the second output. Under such a condition, the acquired image of the object is also taken as the input, so that the recognition accuracy of the second machine learning model 206 for the object may be improved.

Then, processing goes to Step 508. In this step, the display unit 208 displays, to the user, the first output of the first machine learning model 204 in Step 504 and the corresponding second output of the second machine learning model 206 in Step 506.

Then, processing goes to Step 510. In this step, the determination unit 210 judges whether the corresponding relationship between the first output in Step 504 and the second output in Step 506 satisfies a predetermined standard, based on the display of the display unit 208 in Step 508. For example, suppose the object is a pedestrian but the probability of the corresponding predefined feature “there are two legs” is only 8%, while the predetermined standard is that, under the condition that the object is a pedestrian, the probability of the corresponding predefined feature “there are two legs” is 70%. In this case, the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output does not satisfy the predetermined standard.

Under such a condition, processing goes to Step 512. In this step, the prompting unit 212 prompts the user. Then, processing is ended.

In addition, in Step 510, under the condition that the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output satisfies the predetermined standard, processing is ended.

In such a manner, an association between the features recognized from the image and the recognition result of the object may be established and prompted, so that the user may estimate whether recognition succeeds or not on the basis of the features recognized from the image; in particular, if recognition of the object fails, the user may determine the specific feature whose imperfect recognition is a main cause of the recognition failure. The user may estimate the feature with a low recognition rate, so that additional learning is performed to further increase the recognition rate of that feature, whereby the recognition rate of the object may be improved.

Fig. 6 is a flowchart of a learning phase in a recognition method according to an implementation mode of the present disclosure.

Specific operations of the learning phase will be described below in detail.

The learning phase starts with Step 602. In this step, the first training unit 214 trains the first machine learning model 204 in the following manner: pre-acquiring a training image, a class of an object in the training image and one or more features of the object; respectively storing the class of the object and the one or more features of the object, acquired in association with the training image, as the predefined class of the object and the one or more predefined features of the object; and taking the class of the object in the training image as label data, and using the training image and the one or more predefined features of the object corresponding to the label data to train the first machine learning model 204.

Then, processing goes to Step 604. In this step, the second training unit 216 trains the second machine learning model 206, wherein the second training unit 216 uses the output of the first machine learning model 204 and the class, taken as the label data, of the object in the training image to train the second machine learning model 206.

In addition, the learning phase may further include the following step. In this step, the storage unit 218 stores the one or more predefined features of the object in a table in a manner of association with the class of the object, as shown in Fig. 4. This step may be implemented in advance of Steps 602 and 604.

After Step 604, processing is ended. Thus, the initial training for constructing the first machine learning model 204 and the second machine learning model 206 has been completed, and the constructed first machine learning model 204 and second machine learning model 206 may be used in the recognition system 200.

Further, since the features of the object that affect success or failure can be estimated in the recognition phase, the first training unit 214 may retrain the first machine learning model 204 with respect to specific features of the object. For example, if the recognition accuracy of the first machine learning model 204 for a specific feature is low, images with the specific feature may be added as training data to implement augmented training of the first machine learning model 204 for the specific feature, thereby improving the recognition accuracy for the specific feature. For example, if the recognition accuracy of the first machine learning model 204 for a feature of the pedestrian (for example, “there are two legs”) is low, the user may add an image including the two legs of a person as a training image.

On the basis of the recognition results in the above recognition phase, the features of the object that affect success or failure of image recognition may be estimated, and the second training unit 216 may then retrain the second machine learning model 206 with respect to specific types of the object. For example, if the accuracy of the second machine learning model 206 for an object of a specific type is low, images including the object of the specific type may be added as training data to implement augmented training of the second machine learning model 206 for the specific type, thereby improving the recognition accuracy for the specific type.

<Embodiment 2>

Fig. 7 shows a recognition system 300 according to an implementation mode of the present disclosure. In the case where an industrial robot that performs, for example, pick-and-place operations is deployed at a factory's production site, the industrial robot needs to stop its operation to prevent a safety accident if an object such as a person appears within a predetermined range of the industrial robot. On the other hand, an object such as a mobile robot that performs, for example, transportation operations may appear within the predetermined range of the industrial robot and interact with it; in that case, the industrial robot does not need to stop its operation. The recognition system 300 may be a system that recognizes these objects (recognition target objects) at the production site, that is, recognizes a person and a mobile robot from images (including video), and controls the industrial robot according to the recognized object.

The recognition system 300 may be part of a robotic safety system (not shown). The robotic safety system may further include a surveillance camera, and the image captured by the surveillance camera may be provided to the recognition system 300 for use in recognizing objects in the image.

The recognition system 300 may include an acquisition unit 302, a processing unit 304, and a prompting unit 306. The acquisition unit 302 may acquire an image of the predetermined range of the industrial robot from the surveillance camera and provide the image to the processing unit 304.

The processing unit 304 may include a first machine learning model 3041 and a second machine learning model 3042 for outputting a recognition result based on the image provided by the acquisition unit 302. The image is inputted into the first machine learning model 3041, and the first machine learning model 3041 outputs first probabilities corresponding to the predefined features of the recognition target object in the image, as a first output that is provided to the second machine learning model 3042. The predefined features here are features defined in association with the recognition target objects. According to the first output, or both the first output and the image, the second machine learning model 3042 outputs the recognition result of the object in the image as a second output.

The prompting unit 306 displays the first output and the second output correspondingly on a management purpose display.

Part or all of the recognition system 300 may be implemented using a general purpose computer, but the present disclosure is not limited thereto. For example, all or a part of the functions may be implemented by a dedicated circuit such as an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). The first machine learning model 3041 and the second machine learning model 3042 may be implemented by the same processor or by separate processors.

The operation of the recognition system 300 in the recognition phase is described below in conjunction with Fig. 8.

As shown in Fig. 8, in step 802, the acquisition unit 302 acquires an image of the predetermined range of the industrial robot from the surveillance camera. In step 804, the image is inputted into the processing unit 304. Specifically, the image is inputted into the first machine learning model 3041 to obtain the first output. For example, the first output may represent the probability/likelihood that a particular feature (each predefined feature) is included in the image. An example of the first output is shown in Table 1.

(Table 1)

In step 806, the second machine learning model 3042 receives the first output as an input, and outputs the object recognition result as a second output. For example, the second output may represent the probability/likelihood that a particular object (each recognition target object) is included in the image. An example of the second output is shown in Table 1. In step 806, the second machine learning model 3042 may also receive both the first output and the image as inputs, so as to improve recognition accuracy.

In step 808, for example, the processing unit 304 may send the object recognition result as the second output to a controller (not shown) of the industrial robot. Based on the object recognition result, the controller determines whether the recognized object is a person or a mobile robot, and sends a stop signal to the industrial robot when the recognized object is a person.
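The controller-side decision in step 808 might look as follows; the 0.5 threshold and the stop-signal interface are hypothetical stand-ins for illustration, not parts of the disclosed system.

def on_recognition_result(second_output, threshold=0.5):
    # Pick the most probable class from the second output.
    recognized = max(second_output, key=second_output.get)
    if recognized == "person" and second_output[recognized] >= threshold:
        send_stop_signal()  # hypothetical interface to the industrial robot
    # a recognized "mobile robot" does not require stopping the operation

def send_stop_signal():
    print("stop signal sent to industrial robot")

on_recognition_result({"person": 0.85, "mobile robot": 0.15})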

In step 810, the prompting unit 306 displays the first output and the corresponding second output on a management purpose display (not shown) external to the recognition system 300. The management purpose display can be placed near the industrial robot. Alternatively, the prompting unit 306 may also send the first output and the corresponding second output to a management purpose computer, or store the first output and the corresponding second output in the memory or in the cloud. In this case, by analyzing the stored data, a factor that lowers the recognition accuracy can be estimated more reliably, and the first machine learning model 3041 and the second machine learning model 3042 may be subjected to additional learning so as to improve recognition accuracy.

In step 812, if the first output is not consistent with the second output, the prompting unit 306 notifies the user that there may be an error in the recognition result. Here, an example of "non-consistency" includes a case where the features corresponding to the recognized object determined by the second output do not match the features determined by the first output (the recognized features); more specifically, a feature not included in the predefined features is included in the recognized features, or a feature included in the predefined features is not included in the recognized features.

Table 2 shows an example in which the recognized object is determined to be a "person" based on the second output (e.g., based on the result of a comparison of the second output with a predetermined threshold). In this case, the predefined features include "head", "cylindrical body", "two legs", and "two arms". However, the recognized features, based on the first output (e.g., based on the result of a comparison of the first output with a predetermined threshold), include "head", "cylindrical body", and "box-shaped body". That is, the recognized features do not include "two legs" and "two arms", while the predefined features do not include "box-shaped body". Thus, by comparing the predefined features corresponding to the recognized object determined by the second output with the recognized features determined by the first output, it is determined that there is non-consistency between the first output and the second output.

As a way of notifying the user, for example, a sound or voice message may be outputted to alert the user when the non-consistency is detected, or the outputs with the non-consistency are highlighted on the management purpose display to alert the user. Thereby, the user can easily determine what features affect object recognition, so that the factors that lower the object recognition accuracy can be effectively estimated.

(Table 2)
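The non-consistency check can be sketched as a set comparison, as below, using the "person" example from Table 2; the 0.5 decision threshold is an assumption.

PREDEFINED = {"person": {"head", "cylindrical body", "two legs", "two arms"}}

def recognized_features(first_output, threshold=0.5):
    # Threshold the first output into the set of recognized features.
    return {f for f, p in first_output.items() if p >= threshold}

def is_consistent(first_output, recognized_object):
    # Non-consistency: a predefined feature is missing from the recognized
    # features, or an extra feature such as "box-shaped body" is present.
    return recognized_features(first_output) == PREDEFINED[recognized_object]

first_out = {"head": 0.9, "cylindrical body": 0.8, "box-shaped body": 0.7,
             "two legs": 0.1, "two arms": 0.2}
if not is_consistent(first_out, "person"):
    print("notify user: possible error in the recognition result")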

In another example, instead of step 808, the object recognition result as the second output may be displayed on the management purpose display through the prompting unit 306.

Also, the order of the above-mentioned steps may be changed as needed. For example, the non-consistency between the first output and the second output mentioned in step 812 may be determined before step 810, and the first output and the second output may be displayed on the management purpose display unless the non-consistency is detected.

The operation of the recognition system 300 during the learning phase is described below in conjunction with Figs. 9 and 10. Fig. 9 shows a functional block diagram of a training device 400 according to an implementation mode of the present disclosure, and Fig. 10 shows a flowchart of training the recognition system 300 using the training device 400.

As shown in Fig. 9, the training device 400 may include, for example, an acquisition unit 402 and a training unit 404. The acquisition unit 402 is used for acquiring training data. The training unit 404 uses the training data to train the first machine learning model 3041 and the second machine learning model 3042 included in the recognition system 300. The training unit 404 includes, for example, a first training unit 4041 that performs training on the first machine learning model 3041 and a second training unit 4042 that performs training on the second machine learning model 3042. Alternatively, the first machine learning model 3041 and the second machine learning model 3042 may also be trained by the same training unit.

Here, the learning phase may be an initial training for constructing the first machine learning model 3041 and the second machine learning model 3042. As an example of a machine learning model, for example, a neural network can be used.

Part or all of the training device 400 may be implemented using a general purpose computer, but the present disclosure is not limited thereto. For example, all or part of the functions may be implemented by a dedicated circuit such as an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). The first training unit 4041 and the second training unit 4042 may be implemented by the same processor or by separate processors.

The operation of the training device 400 in the learning phase will be described below with reference to Fig. 10.

In step 1002, the acquisition unit 402 acquires the image data and the label data corresponding to the image data. Here, an image including the recognition target object, i.e., a person or a mobile robot, may be provided as image data. In addition, the label data "person" is given to an image (or image region) including a person, and the label data "mobile robot" is given to an image (or image region) including a mobile robot. The label data can be viewed as the correct value (the desired output value of the machine learning model) for the corresponding input value of the machine learning model.

In step 1004, the acquisition unit 402 acquires one or more predefined features defined in association with each recognition target object; examples of the predefined features are shown in Table 3. For example, the object "person" has features such as "head", "cylindrical body", "two legs", and "two arms". Here, the acquisition unit 402 may read the predefined features corresponding to "person" and "mobile robot" from a database or table prepared in advance. Alternatively, the acquisition unit 402 may also acquire the predefined features based on the user's input.

(Table 3)

In step 1006, the first training unit 4041 trains the first machine learning model 3041. Here, the training data used includes the above-described image data and the predefined features associated with the object corresponding to the label data of the image data. The first machine learning model 3041 performs supervised learning so as to generate a first machine learning model 3041 capable of recognizing one or more features of an object included in the input image.

In step 1008, the second training unit 4042 trains the second machine learning model 3042. Here, the training data used includes the output of the first machine learning model 3041 and the label data of the image data. The second machine learning model 3042 performs supervised learning so as to generate a second machine learning model 3042 capable of recognizing the object included in the image based on the output of the first machine learning model 3041.

Here, the training data used may further include the image data corresponding to the output of the first machine learning model 3041. In this case, the second machine learning model 3042 may output the recognition result using features other than the features included in the output of the first machine learning model 3041, so that recognition accuracy may be improved.

The descriptions here are made in combination with driving assistance and factory management, and are not intended to limit the present disclosure. As will be understood by those skilled in the art, the present invention may be applied more widely to image-based object recognition of other types. An image as an input may be a still image or a moving image of any type, including a thermal image, an infrared image, and a range image (depth image). Also, an object to be recognized may be an object of any type, including a movable object or a still object.

All or part of the device and system for recognizing the object may be implemented in the form of a software functional unit. When sold or used as an independent product, the software functional unit may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a piece of computer equipment (which may be a personal computer, a server or network equipment) to execute all or part of the steps of the method according to each example of the present disclosure. The foregoing storage medium includes various media capable of storing program code, such as a USB disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and may further include a data stream downloaded from a server or a cloud.

The foregoing is only the preferred implementation modes of the present disclosure, and it should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the present disclosure. These improvements and modifications should be regarded as falling within the scope of protection of the present disclosure.

Reference signs

100, 200, 300 recognition system

110 processor

112 main memory

114 memory

116 input interface

118 display interface

120 communication interface

122 internal bus

200 network


202, 302, 402 acquisition unit

204, 3041 first machine learning model

206, 3042 second machine learning model

208 display unit

210 determination unit

212, 306 prompting unit

214, 4041 first training unit

216, 4042 second training unit

218 storage unit

304 processing unit

400 training device

404 training unit