

Title:
METHOD AND APPARATUS FOR VISUALLY RECOGNIZING AN OBJECT
Document Type and Number:
WIPO Patent Application WO/2021/115615
Kind Code:
A1
Abstract:
A method for visually recognizing an object (5) based on an image provided by a camera (7) viewing an object supporting surface (6), comprises the steps of: a) providing (S1) three-dimensional shape data of an object (5), b) selecting (S2) at least one pose of the object (5) in which the object can be stably positioned on a supporting surface; c) calculating (S4), from said shape data and said selected pose, an expected image (10) of at least part of the object (5), and d) deciding (S7) that the object (5) exists in the field of view of the camera (7) if the expected image (10) is found to match at least part of the image provided by the camera (7).

Inventors:
DAI FAN (DE)
Application Number:
PCT/EP2019/085081
Publication Date:
June 17, 2021
Filing Date:
December 13, 2019
Assignee:
ABB SCHWEIZ AG (CH)
International Classes:
G06T1/00; G06V10/75
Foreign References:
JPH02110788A (1990-04-23)
US9665800B1 (2017-05-30)
Attorney, Agent or Firm:
BEETZ & PARTNER (DE)
Claims:
Claims

1. A method for visually recognizing an object (5) based on an image provided by a camera (7) viewing an object supporting surface (6), comprising the steps of:
a) providing (S1) three-dimensional shape data of an object (5),
b) selecting (S2) at least one pose of the object (5) in which the object can be stably positioned on a supporting surface;
c) calculating (S4), from said shape data and said selected pose, an expected image (10) of at least part of the object (5), and
d) deciding (S7) that the object (5) exists in the field of view of the camera (7) if the expected image (10) is found to match at least part of the image provided by the camera (7).

2. The method of claim 1, wherein the selecting step b) (S2) is carried out based on the three-dimensional shape data.

3. The method of claim 1 or 2, wherein the selecting step b) (S2) comprises the steps of
b1) selecting three points (13a, 13b) at a surface of the object (5) such that the three points (13a, 13b) define a tangential plane which does not intersect the object (5), and
b3) selecting the pose such that the tangential plane coincides with the supporting surface (6), and that the object (5) is located on and above the tangential plane.

4. The method of claim 3, further comprising the step of
b2) checking whether a surface normal (8) of the tangential plane extending through the centre of gravity (12) of the object (5) intersects a triangle defined by the three points selected in step b1),
wherein step b3) is carried out only if the surface normal does intersect the triangle.

5. The method of claim 1 or 2, wherein the selecting step b) comprises
b1') finding a centre of gravity (12) of the object (5),
b2') identifying a point (11') on the surface of the object (5) where the distance (r) to the centre of gravity (12) is locally minimum; and
b3') selecting a pose in which the centre of gravity (12) is directly above the identified point (11').

6. The method of one of the preceding claims, wherein the selecting step b) (S2) comprises selecting a plurality of poses, an order of the poses is defined (S3) according to their stability, and deciding step d) (S7) is carried out for a pose of less than maximum stability only if the expected image of each more stable pose has been found not to match the image provided by the camera (7).

7. The method of claim 6, wherein for each pose the area of the expected image of the object is estimated, and a pose is judged to be the more stable, the larger its area is.

8. The method of claim 6, wherein for each pose the height (r) of the centre of gravity (12) of the object (5) is calculated, and a pose is judged to be the more stable, the lower its centre of gravity is.

9. The method of one of the preceding claims, wherein the calculating step comprises
- calculating the expected image (10) under the assumption that the object (5) is placed on said supporting surface (6) and is viewed by said camera (7).

10. The method of any of the preceding claims, comprising placing the camera (7) so that a surface normal (8) of the supporting surface (6) that extends through the camera (7) is within the field of view of the camera (7).

11. The method of any of the preceding claims, wherein the image provided by the camera (7) and the expected image (10) comprise depth information.

12. The method of claim 11, wherein the calculating step further comprises

- selecting a section plane (15) which extends horizontally and/or in parallel to the supporting surface (6) and intersects the object (5), and

- selecting as said part of the object (5) the part of the object which is above the section plane (15).

13. A robot system comprising an object supporting surface (6), a camera (7) mounted above the object supporting surface (6), a robot (1) and a controller (9) that is connected to the camera (7) for receiving images therefrom, is adapted to recognize an object (5) on the object supporting surface (6) using the method of any of the preceding claims, and is adapted to control manipulation of the object (5) by the robot (1) responsive to a pose and a location of the object (5) on the supporting surface (6).

14. A computer program product comprising computer executable instructions which, when executed on a computer, cause the computer to carry out the method of any of claims 1 to 12.

15. A data carrier having recorded on it, in computer executable form, the computer program product of claim 14.

Description:
Method and apparatus for visually recognizing an object

The present invention relates to a method for recognizing an object using machine vision, and to an apparatus for carrying out the method.

Present day robots are efficient at highly repetitive tasks, e.g. for assembling components that are supplied to a predetermined position within the range of action of the robot, and with a predetermined pose, so that the robot can do its job "blindfolded". Robot systems that can adapt to changes in their environment are much more difficult to implement; for doing so, reliable and efficient identification of objects such as tools or workpieces in the environment of a robot is essential.

Enabling recognition of objects by a robot system requires that an "idea" of the looks of an object to be recognized is implemented in the system, so that the system can decide whether image information it receives from a camera matches the "idea" or not. For many workpieces that are being processed in a manufacturing environment, such an "idea" is available in the form of CAD data, which, having been used for producing the workpieces, provide a readily available and highly precise description of these. Conventionally, matching has been carried out by obtaining a point cloud of coordinates of a surface of an object of interest from the camera data and checking whether there is a rotation by which these coordinates can be made to fit surface coordinates of the CAD data. Doing this requires a huge amount of processing power, making object-recognizing robot systems expensive both to deploy and to operate.

An object of the present invention is to enable object recognition with reduced processing power.

The object is achieved, according to a first aspect of the invention, by a method for visually recognizing an object based on an image provided by a camera viewing an object supporting surface, comprising the steps of:
a) providing three-dimensional shape data of an object that is to be recognized,
b) selecting at least one pose of the object in which the object can be stably positioned on a supporting surface;
c) calculating, from said shape data and said selected pose, an expected image of at least part of the object, as it might be seen by the camera if the object existed on the object supporting surface, and
d) deciding that the object exists in the field of view of the camera if the expected image is found to match at least part of the image provided by the camera.

The invention relies on the fact that objects which have no rotation symmetry tend to have only a small number of poses in which they are stable on a supporting surface, and that therefore, when trying to identify such an object in image data, it is sufficient to try to match the image data from the camera to expected images of the object in its stable poses, thereby reducing considerably the required amount of calculation. The method is also applicable to objects having rotation symmetry, though. Although these have an infinite number of equally stable poses, these can be handled as one under the present invention, since all yield the same expected image.
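To make the four steps concrete, the following Python sketch outlines the overall flow. It is a minimal illustration only; the helpers select_stable_poses, render_expected_image and find_match are hypothetical placeholders standing in for the procedures described below, not functions disclosed in this application.

    # Minimal sketch of steps a)-d); all helpers are hypothetical placeholders.

    def recognize(shape_data, camera_image):
        # a) three-dimensional shape data of the object is given (e.g. CAD data)
        # b) select poses in which the object can rest stably on the surface
        poses = select_stable_poses(shape_data)
        for pose in poses:
            # c) calculate the expected image of the object in this pose
            expected = render_expected_image(shape_data, pose)
            # d) decide the object exists if the expected image matches
            #    part of the image provided by the camera
            region = find_match(expected, camera_image)
            if region is not None:
                return pose, region   # object recognized in this pose
        return None                   # object not found in the field of view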

The selecting step b) can be carried out based on the same shape data as the calculation of the expected image. Poses that can be selected in step b) can be immediately apparent from the shape data, e.g. if these explicitly specify planar facets of the object's surface. Else, the selecting step b) can comprise the sub-steps of
b1) selecting three points at a surface of the object such that the three points define a tangential plane which does not intersect the object, and
b3) selecting the pose such that the tangential plane coincides with the supporting surface, and the object is located on and above the tangential plane.
In this way, it is possible to identify three points of a planar facet on which the object might rest stably, but it is also possible to identify a pose in which the object rests on three isolated tips.

For example, in step b1), three points can initially be chosen at random on the surface of the object. A plane defined by these three points will usually intersect the object. By repeatedly identifying, in the one part of the object which is on a given side of the intersection plane, the point of the object which is farthest from the intersection plane, and replacing by it the one point among the initial three which is closest to it, the points will converge, if such points exist, towards three points that have a common tangential plane.

A pose in which these three points touch the supporting plane will be stable only if the centre of gravity is directly above the triangle defined by these points; if not, there is the possibility of the object tipping over and assuming a position where the centre of gravity is still lower. Therefore, it is useful to check whether a surface normal of the tangential plane extending through the centre of gravity of the object intersects the triangle defined by the three points selected in step b1), and to carry out step b3) only if the surface normal does intersect the triangle.
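One possible reading of this iteration and of the subsequent triangle check is sketched below, assuming the shape data are available as a cloud of surface vertices (a numpy array of shape (N, 3)). The tolerance, the rule for choosing which side of the plane to work on, and the barycentric point-in-triangle test are illustrative choices, not taken from the application.

    import numpy as np

    def tangential_support(vertices, rng, tol=1e-6):
        # b1) start from three random surface points; iterate until the plane
        # they define no longer intersects the object
        pts = vertices[rng.choice(len(vertices), size=3, replace=False)].copy()
        while True:
            n = np.cross(pts[1] - pts[0], pts[2] - pts[0])
            n /= np.linalg.norm(n)                  # degenerate (collinear)
            d = (vertices - pts[0]) @ n             # triples are not handled
            if np.sum(d > tol) > np.sum(d < -tol):  # orient n so the object's
                n, d = -n, -d                       # bulk lies below the plane
            if d.max() <= tol:                      # nothing protrudes any more:
                return pts, n                       # common tangential plane found
            far = vertices[np.argmax(d)]            # farthest protruding point
            i = np.argmin(np.linalg.norm(pts - far, axis=1))
            pts[i] = far                            # replace the closest support point

    def cog_above_triangle(pts, n, cog):
        # b2) check that the surface normal through the centre of gravity
        # pierces the support triangle (barycentric test on the projection)
        q = cog - ((cog - pts[0]) @ n) * n          # project CoG onto the plane
        v0, v1, v2 = pts[1] - pts[0], pts[2] - pts[0], q - pts[0]
        d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
        d20, d21 = v2 @ v0, v2 @ v1
        den = d00 * d11 - d01 * d01
        v = (d11 * d20 - d01 * d21) / den
        w = (d00 * d21 - d01 * d20) / den
        return v >= 0 and w >= 0 and v + w <= 1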

If the three points converge into one, the surface of the object between them is likely to be convex. A convex surface may also provide a stable pose, if it has a point where the distance to the centre of gravity of the object is locally minimum. Therefore, an alternative way of selecting a pose can comprise the sub-steps
b1') finding a centre of gravity of the object,
b2') identifying a point on the surface of the object where the distance to the centre of gravity is locally minimum, and
b3') selecting a pose in which the centre of gravity is directly above the identified point.
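This alternative amounts to a discrete descent over the surface, as in the following sketch. It assumes a mesh given as vertex positions plus an adjacency list (neighbors[i] holding the indices of the vertices adjacent to vertex i); this data format is an assumption made for illustration, not part of the disclosure.

    import numpy as np

    def local_min_support(vertices, neighbors, cog, start):
        # b1') cog is the centre of gravity; b2') walk downhill in distance
        # to the CoG until no adjacent vertex is closer
        i = start
        while True:
            r = np.linalg.norm(vertices[i] - cog)
            closer = [j for j in neighbors[i]
                      if np.linalg.norm(vertices[j] - cog) < r]
            if not closer:
                return i   # b3') stable pose: CoG directly above vertex i
            i = min(closer, key=lambda j: np.linalg.norm(vertices[j] - cog))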

Many objects have various poses in which they can rest stably on a supporting surface and which an object to be recognized might therefore assume; these poses may be assumed with different probabilities if the object is dropped or otherwise placed on the surface without paying attention to its pose. In order to minimize recognition processing times, it is advantageous to check first for a match between the image provided by the camera and the expected image for the most likely pose. Therefore, if the selecting step b) comprises selecting a plurality of poses, an order of the poses should be defined according to their stability, and deciding step d) should be carried out for a pose of less than maximum stability only if the expected image of each more stable pose has been found not to match the image provided by the camera.

There are various ways of judging the stability of a pose, which differ in reliability and in the amount of processing involved. One possible way is to estimate for each pose the area of the expected image of the object associated to it, and to judge a pose to be the more stable, the larger its area is. Since the expected images have to be calculated anyway, only a minimum of additional processing is required for the judgment.

A more reliable judgment can be made by calculating for each pose the height of the centre of gravity of the object, and judging a pose to be the more stable, the lower its centre of gravity is.

In order to make matching easy and reliable, expected images should be calculated under the assumption that the object is placed on said supporting surface and is viewed by said camera. Specifically, a distance between the camera and the supporting surface and a focal length of the camera should be taken into account, in order for the expected images to have the same size as a real image of the object on the supporting surface would have. Further, an angle under which the camera would view the object on the supporting surface should be taken into account.
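For a camera looking straight down onto the surface, taking the camera distance and the focal length into account reduces to a pinhole projection, as in this sketch; the frame convention and parameter names are assumptions made for illustration.

    import numpy as np

    def project_points(points, focal_px, cam_height, cx, cy):
        # points: (N, 3) object points, z measured upwards from the supporting
        # surface; the camera looks straight down from cam_height above it.
        # focal_px: focal length in pixels; (cx, cy): principal point.
        z_cam = cam_height - points[:, 2]   # distance from camera to each point
        u = cx + focal_px * points[:, 0] / z_cam
        v = cy + focal_px * points[:, 1] / z_cam
        return np.stack([u, v], axis=1)     # pixel coordinates of each point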

Given that an object which is resting on the supporting surface and is viewed from an oblique direction will yield different images if rotated around the surface normal, it is plausible that recognition will be easier if the object is viewed along the surface normal, since then the visible regions of the object will not vary due to a rotation. Therefore, the camera should be placed so that a surface normal of the supporting surface that extends through the camera is within the field of view of the camera. Ideally, this surface normal should coincide with an optical axis of the camera.

For reliable recognition, it is helpful if the image provided by the camera comprises depth information, i.e. information on the distance between the camera and items or surface points visible in the image. Such information can be provided e.g. by a stereoscopic camera. For the expected images, such depth information is straightforwardly extracted from the three-dimensional shape data.

The calculation effort involved with processing these depth data can be limited by selecting a section plane which extends horizontally and/or in parallel to the supporting surface and intersects the object, and selecting as said part of the object to be matched the part of the object which is above the section plane.
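With a downward-looking depth camera, applying the section plane to the camera image reduces to a threshold on measured depth, e.g. as below; the array layout is an assumption.

    import numpy as np

    def above_section_plane(depth_map, cam_height, plane_height):
        # A pixel's imaged surface point sits at height cam_height - depth
        # above the supporting surface; keep only pixels above the section plane.
        return (cam_height - depth_map) > plane_height   # boolean mask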

According to a second aspect of the invention, the above-mentioned object is achieved by a robot system comprising an object supporting surface, a camera mounted above the object supporting surface, a robot and a controller that is connected to the camera for receiving images therefrom, is adapted to recognize an object on the object supporting surface using the method described above, and is adapted to control manipulation of the object by the robot responsive to a pose and a location of the object.

Further aspects of the invention relate to a computer program product comprising computer executable instructions which, when executed on a computer, cause the computer to carry out the method as described above, and to a data carrier in which said computer program product is recorded in computer executable form.

Further features and advantages of the invention will become apparent from the subsequent description of embodiments thereof, referring to the appended drawings.

Fig. 1 schematically illustrates a robot system implementing the present invention;

Fig. 2 illustrates an exemplary object and expected images to be obtained from it;

Fig. 3 globally illustrates a method for recognizing the object;

Fig. 4 illustrates a method for finding and sorting stable poses of the object; and

Fig. 5 illustrates a method for matching an expected image to an image from the camera.

Fig. 1 is a highly schematic view of a robot system in which the invention is applicable. It comprises a robot 1 of the articulate type, with a stationary base and a plurality of links 2 interconnected by joints 3 having one or two degrees of rotational freedom, and an end effector 4 for manipulating workpieces 5 on a supporting surface 6. Alternatively, the robot might be of the gantry type, with members being linearly displaceable in mutually perpendicular directions, preferably in parallel and perpendicular to surface 6.

For simplicity, the supporting surface 6 is shown in Fig. 1 as a single plane, but it is obvious that it may comprise several planes at different levels on which workpieces or other objects to be manipulated might rest.

Above the supporting surface 6, a camera 7 is provided. A surface normal 8 of the supporting surface 6 that extends through camera 7 is within its field of view; preferably, the surface normal 8 is in the centre of the field of view of the camera 7. The camera may be stationary; in a gantry robot, it might also be mounted to one of its mobile members, in order to be displaceable at least parallel to the surface 6.

A controller 9 receives image data from camera 7 and outputs movement commands to robot 1.

Although the workpieces 5 shown in Fig. 1 are all identical, their images as seen by the camera 7 are highly dissimilar. It should be noted that the shape of the workpieces 5 shown here has been chosen exclusively for illustrative purposes, and that the invention is not limited to any particular shape of workpieces, tools or whatever object the robot 1 is to manipulate.

As can be seen more clearly in Fig. 2, the shape of the workpiece 5 is derived from a cuboid, two sides 5u, 5d of which are curved. The others, 5f, 5a, 5l, 5r, are flat. The workpiece can rest stably on each of these six sides; the six views 10u, 10d, 10f, 10a, 10l, 10r shown in Fig. 2 are the images of the workpiece that would be obtained by the camera 7 looking onto the workpiece 5 from above in each of these stable poses. In the following, the poses will be referred to as u, d, f, a, l, r, depending on which of the sides is facing the camera. E.g. among the four workpieces shown in Fig. 1, the left- and rightmost ones are in pose u, and the other two in poses f and r, respectively.

Being specular images of one another, views r and l are easy to distinguish from one another, and from the other views, whereas the outlines of views u, d and f, a are practically identical. In order to tell apart these views, the camera 7 is preferably capable of providing depth information.

In order to recognize a workpiece 5 on surface 6, the controller 9 retrieves three-dimensional shape data thereof (step S1 of Fig. 3). These data can be CAD data in any conventional format that have been used for manufacturing the workpieces 5.

In the next step S2, possible poses of the virtual workpiece 5 described by the shape data are determined. The shape data may be vector data which, inter alia, directly specify coordinates of flat surfaces of the workpiece, such as sides 5f, 5a. In that case, any pose of the workpiece in which one of these flat surfaces forms a bottom surface of the workpiece and coincides with surface 6 can be regarded as a pose which the workpiece 5 might assume.

Based on the three-dimensional shape data, coordinates of a centre of gravity of the workpiece can be calculated. If that is done, it is possible to check, for each of the above-mentioned poses where a flat side coincides with the surface 6, whether the centre of gravity is above the side or not. If it isn't, the workpiece will tip over, and the corresponding pose is not a pose which the workpiece 5 might assume.

In order to find poses in which the workpiece rests on non-flat sides, other methods are used. In a method illustrated in Figs. 4a, b, a point 11 on the surface of the workpiece 5 is selected arbitrarily, and the processor calculates a pose in which, as shown in Fig. 4a, a plane tangent to the surface at point 11 coincides with supporting surface 6. The distance r to the centre of gravity 12 of workpiece 5 is calculated for point 11 and for surface points in its vicinity. Among these, the point closest to the centre of gravity 12 is selected as a new point 11, and the process is repeated. If no closer point exists (as in Fig. 4b), the process has converged, and a pose in which the point 11' thus found is directly underneath the centre of gravity 12 is stable.
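The application leaves open how the centre of gravity is computed from the shape data. One standard possibility, assuming the CAD data can be converted to a closed (watertight) triangle mesh of uniform density, is the signed-tetrahedron formula sketched below; this is an illustrative choice, not the disclosed method.

    import numpy as np

    def centre_of_gravity(vertices, faces):
        # Decompose the solid into tetrahedra spanned by each surface triangle
        # and the origin; sum signed volumes and volume-weighted centroids.
        a, b, c = (vertices[faces[:, k]] for k in range(3))
        vol = np.einsum('ij,ij->i', a, np.cross(b, c)) / 6.0  # signed volumes
        centroid = (a + b + c) / 4.0   # tetrahedron centroid (4th vertex: origin)
        return (vol[:, None] * centroid).sum(axis=0) / vol.sum()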

In another method illustrated in Fig. 4c and d, three points 13a-c at the surface of the workpiece 5 are selected (of which only two, 13a and 13b, are shown in the Fig.). Selection can be completely random; preferably the points are selected such that distances between them are larger than a predetermined fraction of an overall dimension of the workpiece 5. The selected points 13a-c define a plane which in Figs. 4c, d is represented aligned with surface 6. By following a gradient of the workpiece surface from point 13a, the controller finds a locally lowest point 14a and rotates the workpiece 5 so that points 13b, c remain in the supporting surface while point 14a is raised up to it. The procedure is repeated for points 13b, 13c.

After a few re-iterations, the process converges, yielding as its result a pose in which three points 14a, b, c of the workpiece touch the surface 6, whereas surrounding surface points of the workpiece either touch surface 6 or are above it. This pose may be assumed to be a stable pose. Preferably, the controller 9 further verifies whether the centre of gravity 12 is above the triangle defined by points 14a, b, c or not. If it is not, the pose is likely not to be stable but to tip over. In that case, the pose is discarded, and the process is repeated.

All of the above methods may be repeated a predetermined number of times and/or combined with other methods in order to find a variety of stable poses. The poses that are most likely to occur in practice are reliably found by the methods described above.

In the next step S3, the poses that were found in step S2 are ordered according to their stability (which correlates with the likeliness of workpiece 5 assuming a particular one of these poses when dropped onto surface 6). If the centre of gravity of the workpiece is known, its distance from surface 6 can be calculated for each pose, and a pose is judged to be the less stable, the higher above surface 6 the centre of gravity is. From Fig. 4b, it can be seen that pose u (in which the distance r_u is the radius of the dash-dot circle drawn around the centre of gravity 12) is the most stable pose, followed closely by pose d (in which the supporting surface would coincide with dash-dot line 6d). Pose f (in which side 5a would rest on surface 6) is clearly less stable.
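Representing each pose by a rotation matrix, the ordering of step S3 can be sketched as follows; the pose encoding is a hypothetical choice, since the application leaves it open.

    import numpy as np

    def cog_height(R, vertices, cog):
        # Height of the centre of gravity above surface 6 when the object,
        # rotated by R, rests on its lowest point
        z = (vertices @ R.T)[:, 2]
        return (R @ cog)[2] - z.min()

    def order_by_stability(rotations, vertices, cog):
        # S3: most stable pose (lowest centre of gravity) first
        return sorted(rotations, key=lambda R: cog_height(R, vertices, cog))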

Alternatively, the distance between the surface 6 and the point of workpiece 5 farthest from it may be determined for each pose, and a pose is judged to be the less stable, the larger the distance is.

In a simple alternative, poses might be ordered simply based on the size of their respective expected images 10: As can be seen clearly in Fig. 2, the most stable poses, u and d, have the largest expected images 10u, 10d, and the least stable ones, f and a, have the smallest, 10f and 10a.

For each of the poses, an expected image of workpiece 5 is calculated in step S4. As is apparent from Fig. 2, the exemplary workpiece 5 considered here would have six stable poses, in each of which one of sides 5u, 5d, 5f, 5a, 5l, 5r faces the camera 7. For example, if 5u faces the camera 7, the image that would be expected to be seen by the camera would be a rectangle, whereas if side 5r faces the camera, the image would be expected to be an outline having two short straight edges parallel to one another, one long convexly curved edge, and another long edge which is part convex, part concave. It is immediately apparent that distinguishing these images from one another, or from an image that would be obtained with side 5f or 5a facing the camera, will be straightforward.

On the other hand, distinguishing between poses (shown in side view in the bottom row of Fig. 5) where side 5u or 5d faces the camera is not so straightforward if only brightness and, possibly, hue information is taken into account. The outline of the expected image is nearly identical, and a distinction has to rely mainly on views of sides 5f, 5a, which occupy only a small fraction of the image. The camera 7 is therefore designed to provide also depth information. Based on the depth information, image data can be discarded if they originate from surface points whose distance from the support surface 6 is below a predetermined threshold, defined by a plane 15 parallel to and above surface 6. As shown in the top row of Fig. 5, this yields two clearly different expected images: a two-part one, 10u', for concave-convex side 5u facing the camera, and a single-part one, 10d', for convex side 5d facing the camera 7. Thus, by a judicious choice of the threshold, easily distinguishable expected images can be obtained for a great variety of differently-shaped workpieces.

In operation, controller 9 obtains an image from camera 7 (S5) and begins processing it by searching (S6) first for the expected image associated to the most stable pose. This can be done using conventional pattern recognition algorithms that are known to the skilled person and are therefore not described here, such as blob and closest bounding box processing. If this expected image is matched successfully (S7) to a region of the image from the camera, it is decided that a workpiece 5 is present within the reach of the robot 1. In that case, the search for the workpiece is successfully terminated with a minimum of calculation effort, and controller 9 controls (S8) the robot 1 to process the workpiece 5 in a way adapted to its pose and azimuthal orientation on surface 6, obtained from the image processing described above. (There might be other workpieces 5 in the image from the camera, but if just one of these is to be worked on at a time, the others can remain unrecognized for the time being.) If no workpiece in the most stable pose can be identified in the camera image, the next less stable pose is selected (S9), and step S6 is repeated, searching, this time, for the pose selected in step S9. In this way, one pose after the other can be searched for, and only if none is found, it is decided (S10) that there is no workpiece 5 within the field of view of the camera 7.
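The search loop S5-S10 can be summarized as follows; match_template stands in for whatever conventional pattern recognition routine (e.g. blob or bounding-box processing) is used, and is not specified by the application.

    def search_workpiece(camera_image, expected_images):
        # expected_images: list of (pose, image) pairs, ordered in step S3
        # from most to least stable
        for pose, expected in expected_images:
            region = match_template(camera_image, expected)   # S6
            if region is not None:
                return pose, region   # S7/S8: workpiece found; control robot
            # S9: fall through to the next less stable pose
        return None                   # S10: no workpiece in the field of view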


Reference numerals

1 robot

2 link

3 joint

4 end effector

5 workpiece

6 supporting surface

7 camera

8 surface normal

9 controller

10 view

11 point

12 centre of gravity

13 point

14 point

15 plane