Title:
METHOD AND APPARATUS FOR GENERATING SUPERPIXEL CLUSTERS
Document Type and Number:
WIPO Patent Application WO/2015/049129
Kind Code:
A1
Abstract:
A method and an apparatus (20) for generating a superpixel cluster for an image are described. A retrieving unit (23) retrieves (10) a depth map associated to the image. A depth map generating unit (24) then generates (11) an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image. A determining unit (25) determines (12) fitting errors for the fitted planes, whereas a calculating unit (26) calculates (13) edge information for the superpixels using the adapted depth map. Using at least the fitting errors and the edge information, a grouping unit (27) groups (14) superpixels into a cluster.

Inventors:
GANDOLPH DIRK (DE)
JACHALSKY JOERN (DE)
PUTZKE-ROEMING WOLFRAM (DE)
SCHLOSSER MARKUS (DE)
Application Number:
PCT/EP2014/070278
Publication Date:
April 09, 2015
Filing Date:
September 23, 2014
Assignee:
THOMSON LICENSING (FR)
International Classes:
G06T7/00
Other References:
BABETTE DELLEN ET AL: "Segmenting color images into surface patches by exploiting sparse depth data", APPLICATIONS OF COMPUTER VISION (WACV), 2011 IEEE WORKSHOP ON, IEEE, 5 January 2011 (2011-01-05), pages 591 - 598, XP031913628, ISBN: 978-1-4244-9496-5, DOI: 10.1109/WACV.2011.5711558
GUPTA SAURABH ET AL: "Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. PROCEEDINGS, IEEE COMPUTER SOCIETY, US, 23 June 2013 (2013-06-23), pages 564 - 571, XP032493260, ISSN: 1063-6919, [retrieved on 20131002], DOI: 10.1109/CVPR.2013.79
HOANG TRINH ET AL: "Structure and motion from road-driving stereo sequences", COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2010 IEEE COMPUTER SOCIETY CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 13 June 2010 (2010-06-13), pages 9 - 16, XP031728965, ISBN: 978-1-4244-7029-7
BARRERA FERNANDO ET AL: "Multispectral piecewise planar stereo using Manhattan-world assumption", PATTERN RECOGNITION LETTERS, vol. 34, no. 1, 29 August 2012 (2012-08-29), pages 52 - 61, XP028955943, ISSN: 0167-8655, DOI: 10.1016/J.PATREC.2012.08.009
Attorney, Agent or Firm:
SCHMIDT-UHLIG, Thomas (European Patent Operations, Karl-Wiechert-Allee 74, Hannover, DE)
Claims:
CLAIMS

1. A method for generating a superpixel cluster for an image, the method comprising the steps of:

- retrieving (10) a depth map associated to the image;

- generating (11) an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- determining (12) fitting errors for the fitted planes;

- calculating (13) edge information for the superpixels using the adapted depth map; and

- grouping (14) superpixels into a cluster using at least the fitting errors and the edge information.

2. The method according to claim 1, wherein for each superpixel a best fitting plane is determined.

3. The method according to claim 1 or 2, wherein the planes are fitted using orthogonal regression.

4. The method according to one of the preceding claims, wherein the fitting error for a superpixel is determined (12) by calculating an average of absolute differences between an original depth and an approximated depth for all pixels within the superpixel.

5. The method according to one of the preceding claims, wherein for two adjacent superpixels the edge information is calculated (13) by averaging absolute differences between approximated depths for pixel pairs in a neighborhood set of the two adjacent superpixels.

6. The method according to claim 5, wherein the neighborhood set comprises all pixel pair combinations for boundary pixels of a first superpixel of the two adjacent superpixels and directly adjacent boundary pixels of a second superpixel of the two adjacent superpixels.

7. The method according to one of the preceding claims, wherein a presence of an object edge within a superpixel is detected when the determined (12) fitting error exceeds a fitting error threshold.

8. The method according to claim 7, wherein a presence of an object edge aligned with or in proximity of a superpixel boundary is detected when the determined (12) fitting error does not exceed the fitting error threshold and the calculated (13) edge information exceeds an edge information threshold.

9. An apparatus (20) configured to generate a superpixel cluster for an image, the apparatus (20) comprising:

- a retrieving unit (23) configured to retrieve (10) a depth map associated to the image;

- a depth map generating unit (24) configured to generate (11) an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- a determining unit (25) configured to determine (12) fitting errors for the fitted planes;

- a calculating unit (26) configured to calculate (13) edge information for the superpixels using the adapted depth map; and

- a grouping unit (27) configured to group (14) superpixels into a cluster using at least the fitting errors and the edge information.

10. A computer readable storage medium having stored therein instructions enabling generating a superpixel cluster for an image, which, when executed by a computer, cause the computer to:

- retrieve (10) a depth map associated to the image;

- generate (11) an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- determine (12) fitting errors for the fitted planes;

- calculate (13) edge information for the superpixels using the adapted depth map; and

- group (14) superpixels into a cluster using at least the fitting errors and the edge information.

Description:
METHOD AND APPARATUS FOR GENERATING SUPERPIXEL CLUSTERS

FIELD OF THE INVENTION

The invention relates to a method and an apparatus for generating superpixel clusters for an image, and more specifically to a method and an apparatus for generating improved superpixel clusters using depth or disparity information.

BACKGROUND OF THE INVENTION

Today there is a trend to create and deliver richer media experiences to consumers. In order to go beyond the abilities of either sample-based (video) or model-based (CGI) methods, novel representations for digital media are required. One such media representation is the SCENE media representation (http://3d-scene.eu). Therefore, tools need to be developed for the generation of such media representations, which allow captured 3D video to be seamlessly combined with CGI.

The SCENE media representation will allow the manipulation and delivery of SCENE media to either 2D or 3D platforms, in either linear or interactive form, by enhancing the whole chain of multidimensional media production. Special focus is on spatio-temporal consistent scene representations. The project also evaluates the possibilities for standardizing a SCENE Representation Architecture (SRA). A fundamental tool used for establishing the SCENE media representation is the deployment of over-segmentation on video. See, for example, R. Achanta et al.: "SLIC Superpixels Compared to State-of-the-Art Superpixel Methods", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34 (2012), pp. 2274-2282. The generated segments, also known as superpixels or patches, help to generate metadata representing a higher abstraction layer, which is beyond pure object detection.

Subsequent processing steps applied to the generated superpixels allow the description of objects in the video scene and are thus closely linked to the model-based CGI representation.

A novel application evolving from the availability of superpixels is the generation of superpixel clusters, creating a higher abstraction layer that represents a patch-based object description of the scene. The process of superpixel cluster generation requires an analysis of different superpixel connectivity attributes. These attributes can be, for example, color similarity, depth/disparity similarity, and the temporal consistency of superpixels. The cluster generation is usually done semi-automatically, meaning that an operator selects a single initial superpixel in a frame and the algorithm automatically proposes a cluster of adjacent, similar superpixels containing the initially selected one. The contour of such a group of superpixels should be aligned with the major object edges within a frame to minimize the effort required to manually or semi-automatically refine or correct the groups of superpixels.

A well-known clustering method for image segmentation is based on color analysis. The color similarity of different picture areas is quantified with a color distance and is used to decide whether a candidate area is included in or excluded from a cluster. However, this method does not work reliably in cases where scene objects cannot be distinguished by their color structures. In such cases, clustering based on color information will combine superpixels belonging to different objects, e.g. a person and the background, into one superpixel cluster.

One example of generating superpixel clusters is described in Patent Application PCT/EP14/070139. The described approach refines an initially generated superpixel cluster by broadening the color data base incorporated for distance measures. This is realized by considering geometrical distances and color similarities with respect to an initially selected superpixel. The new superpixel cluster is formed by building a set union of previous, independently generated superpixel clusters.

SUMMARY OF THE INVENTION

It is thus an object of the present invention to propose an improved solution for generating superpixel clusters making use of depth or disparity information.

According to the invention, a method for generating a superpixel cluster for an image comprises the steps of:

- retrieving a depth map associated to the image;

- generating an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- determining fitting errors for the fitted planes;

- calculating edge information for the superpixels using the adapted depth map; and

- grouping superpixels into a cluster using at least the fitting errors and the edge information.

Accordingly, an apparatus configured to generate a superpixel cluster for an image comprises:

- a retrieving unit configured to retrieve a depth map associated to the image;

- a depth map generating unit configured to generate an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- a determining unit configured to determine fitting errors for the fitted planes;

- a calculating unit configured to calculate edge information for the superpixels using the adapted depth map; and

- a grouping unit configured to group superpixels into a cluster using at least the fitting errors and the edge information.

Similarly, a computer readable storage medium has stored therein instructions enabling generating a superpixel cluster for an image, which, when executed by a computer, cause the computer to:

- retrieve a depth map associated to the image;

- generate an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image;

- determine fitting errors for the fitted planes;

- calculate edge information for the superpixels using the adapted depth map; and

- group superpixels into a cluster using at least the fitting errors and the edge information.

In order to improve upon the clustering of superpixels, depth or disparity information can be evaluated in addition to color information. It should be noted that in the following the terms depth and disparity are used synonymously. In those cases in which objects cannot be distinguished reliably based on their color dissimilarity, the clustering can result in a group of superpixels belonging to different objects. Such cases can often be handled reliably by additionally analyzing the depth information available for each pixel in the image. By analyzing depth maps provided with the images, object borders can be reliably detected. This gives an indication whether a superpixel should be included in a cluster or not. To this end a reliable method is provided that finds appropriate thresholds to indicate object edges in depth maps. These edges can be either between adjacent superpixels or even within a superpixel.

The disparity map is approximated with hyper-planes, preferably using orthogonal regression. Then, object edges are detected by analyzing the plane fitting error, giving a further reliable indication of object edges.

It should be noted that the way in which the information about object edges is obtained is also suitable for analyzing disparity or depth information in general.

For a better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a single image of a video sequence,

Fig. 2 depicts a depth map for the image of Fig. 1,

Fig. 3 depicts an example of the disparity distribution within superpixel areas,

Fig. 4 shows an adapted depth map obtained using hyper-plane fitting,

Fig. 5 shows details of a depth map highlighting three adjacent superpixels,

Fig. 6 shows a neighborhood set for a superpixel,

Fig. 7 illustrates neighborhood examples for two adjacent superpixels,

Fig. 8 shows superpixel fitting errors calculated for the image of Fig. 1,

Fig. 9 schematically shows a method according to the invention for generating superpixel clusters, and

Fig. 10 illustrates an apparatus adapted to implement a solution according to the invention for generating superpixel clusters.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In the following it is useful to distinguish two cases. In the first case a major object edge is aligned with the boundary of superpixels that are similar in color. In the second case a major object edge lies within a superpixel. This happens, for example, when the underlying superpixel algorithm failed to detect an object boundary, e.g. due to a missing color edge, and thus the superpixel boundary is not aligned with the object boundary.

According to the proposed solution the information on object edges extracted from the depth maps is used to improve the grouping of superpixels, which is initially based only on color information and thus can fail in those cases in which object edges are not aligned with color edges.

Fig. 1 shows an image from a video sequence where the colors of objects located in the scene are very similar to the color of the background. The white surface of the mannequins is quite similar to the light grey of the background.

In such cases, the grouping of superpixels can be improved by analyzing the additional depth information provided in the form of a depth map as shown in Fig. 2. Known superpixel algorithms utilize color information to generate superpixels. Fig. 3 depicts an example of the disparity distribution within the superpixel areas. The two cases distinguished above can occur. Especially in those cases with missing color edges, the superpixel boundaries are not necessarily aligned with the object edges.

As can be seen in Figs. 2 and 3, the depth map represents the surface of objects in the scene. The difficulty in detecting object edges originates from the fact that object surfaces can be planar, convex, or concave, and that in regions of missing color edges object edges are not necessarily aligned with superpixel boundaries. Thus, the abrupt changes of disparity values indicating an object edge can also occur within a superpixel. A further difficulty for a reliable determination of object edges is that disparity maps can be noisy and error-prone, especially at the edges of moving objects in the scene.

The proposed solution uses an approximation of the disparity map by means of hyper-planes for detecting object edges. For each superpixel the best fitting hyper-plane is determined, preferably using orthogonal regression, which minimizes the accumulated squares of the orthogonal distances that remain between the calculated hyper-plane and the original depth. Fig. 4 shows the result of such a hyper-plane fitting. As can be seen, planes oriented in space describe the depth within each superpixel.
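The orthogonal-regression plane fit described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names and the representation of a superpixel as an N x 3 array of (x, y, disparity) samples are assumptions made for this example. The fit uses the standard total-least-squares result: the best plane passes through the centroid of the points, and its normal is the singular vector belonging to the smallest singular value.

```python
import numpy as np

def fit_plane_orthogonal(points):
    """Fit a plane to N x 3 points (x, y, disparity) by orthogonal
    regression (total least squares): the plane passes through the
    centroid, and its normal is the right singular vector with the
    smallest singular value, i.e. the direction that minimizes the
    accumulated squared orthogonal distances to the plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]  # row of V^T for the smallest singular value
    return centroid, normal

def plane_disparity(centroid, normal, x, y):
    """Approximated disparity of the fitted plane at pixel (x, y),
    assuming the plane is not parallel to the viewing axis
    (normal[2] != 0)."""
    nx, ny, nz = normal
    cx, cy, cz = centroid
    # solve n . (p - c) = 0 for the disparity component of p
    return cz - (nx * (x - cx) + ny * (y - cy)) / nz
```

Evaluating `plane_disparity` over all pixels of a superpixel yields the adapted depth map of that superpixel's area.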

Two steps are defined for the detection of object edges, which in combination are robust and reliable. Before giving the mathematical definition of the edge detection, a short example easing the explanation is presented. Fig. 5 shows details of a depth map highlighting three adjacent superpixels, SP1, SP2, and SP3. While the left Figs. 5 a1 and a2 depict the original disparity map, the right Figs. 5 b1 and b2 show the approximated disparities using the hyper-planes. Figs. 5 a1 and b1 depict a simplified profile of the three superpixels in focus. As can be seen within the original disparity map (Fig. 5 a1), the superpixel SP2 has two different depth levels due to an object edge splitting the superpixel area. The same is true for superpixel SP3, but here the edge is located in close proximity to the superpixel's boundary. The depths within superpixel SP1 are uncritically homogeneous. Fig. 5 a2 shows the simplified depth profiles for the transitions between the adjacent superpixels SP1 to SP2 and SP1 to SP3.

Figs. 5 b1 and b2 depict the situation when the approximated disparity information is used. While the depth for superpixel SP1 remains nearly unchanged, the depth in superpixel SP2 becomes a slanted plane. For superpixel SP3 the depth is approximated by a plane that is parallel to its original position. Fig. 5 b2 again shows the simplified depth profiles, indicating the changes to the original profiles in light grey. A new effect is visible in Fig. 5 b1: the transition between superpixels SP1 and SP3 now has a visible gap, which was missing before as the depth change was located within superpixel SP3.

The example of SP2 highlights the second case described above, whereas superpixel SP3 exemplifies a special case of the second case, i.e. an object edge being in close proximity to the superpixel boundary, as well as the first case, i.e. an object edge at the superpixel boundary.

Two indicators are introduced for the reliable detection of object edges. These indicators are the plane fitting error and the edge information, which are calculated as follows:

$$SP_{fittingErr}(\sigma) = \frac{1}{|\sigma|} \sum_{i \in \sigma} \left| disp_{orig}(i) - disp_{approx}(i) \right| \qquad (1)$$

$$SP_{edgeInfo}(\sigma_1, \sigma_2) = \frac{1}{|U(\sigma_1, \sigma_2)|} \sum_{j \in U(\sigma_1, \sigma_2)} \left| disp_{approx}(v_1(j)) - disp_{approx}(v_2(j)) \right| \qquad (2)$$

$$U(\sigma_1, \sigma_2) = \{ [s_{1,1}, s_{2,1}]; \ldots; [s_{1,k}, s_{2,k}] \} \qquad (3)$$

The fitting error SP_fittingErr is determined per superpixel by calculating the average of the absolute differences between the original disparity and the approximated disparity for all pixels within the selected superpixel. The value σ is the set of all pixels comprised by the selected superpixel, and disp_orig and disp_approx return the depth value for the pixel i. |σ| is the cardinality of σ.
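Equation (1) translates directly into code. The sketch below is an illustration only; representing the superpixel σ as a boolean mask over the disparity images is an assumption of this example, not part of the original disclosure.

```python
import numpy as np

def fitting_error(disp_orig, disp_approx, sp_mask):
    """SP_fittingErr of Equation (1): the mean absolute difference
    between the original and the plane-approximated disparity over
    all pixels of one superpixel, selected by the boolean mask."""
    return float(np.abs(disp_orig[sp_mask] - disp_approx[sp_mask]).mean())
```

A large value indicates that a single plane describes the superpixel's depth poorly, hinting at an object edge inside the superpixel.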

The edge information SP_edgeInfo is determined for two adjacent superpixels. It is calculated by averaging the absolute differences between the approximated depths for pixel pairs in the neighborhood set U(σ_1, σ_2) of the superpixels σ_1 and σ_2. The neighborhood set U(σ_1, σ_2) - see Equation (3) - comprises all pixel pair combinations of the boundary pixels of superpixel σ_1 with the directly adjacent boundary pixels of superpixel σ_2, assuming a connectivity of 8, as illustrated in Fig. 6. The functions v_1 and v_2 used in Equation (2) return for a pixel pair j the pixel belonging to σ_1 and σ_2, respectively.

Fig. 7 gives an example for two adjacent superpixels, for which the neighborhood set comprises the following pixel pairs:

U(σ_1, σ_2) = { [s_{1,2}, s_{2,1}]; [s_{1,2}, s_{2,2}]; [s_{1,2}, s_{2,3}]; [s_{1,2}, s_{2,4}]; [s_{1,2}, s_{2,5}]; [s_{1,3}, s_{2,4}]; [s_{1,3}, s_{2,5}] }
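The construction of the neighborhood set and the averaging of Equation (2) can be sketched as below. Again this is only an illustration under assumed data structures: a label image assigning each pixel to a superpixel, and an approximated-disparity image produced by the plane fitting.

```python
import numpy as np

def edge_info(disp_approx, labels, sp1, sp2):
    """SP_edgeInfo of Equation (2): the average absolute difference
    of approximated disparities over the neighborhood set U(sp1, sp2),
    i.e. all 8-connected pixel pairs whose first pixel belongs to
    superpixel sp1 and whose second pixel belongs to superpixel sp2."""
    h, w = labels.shape
    diffs = []
    for y in range(h):
        for x in range(w):
            if labels[y, x] != sp1:
                continue
            # examine the 8-neighborhood of every sp1 pixel
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == sp2:
                        diffs.append(abs(disp_approx[y, x] - disp_approx[ny, nx]))
    return float(np.mean(diffs)) if diffs else 0.0
```

Only boundary pixels contribute, since interior pixels of sp1 have no 8-connected neighbor inside sp2.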

The object edges can be detected using Equation (1) and Equation (2). The plane fitting error is checked first and indicates the presence of an object edge within a superpixel if the error is above a certain fitting error threshold Δ_1. If the error is equal to or below this threshold, i.e. if the plane fitting error is considered uncritical, the edge information is calculated using Equation (2). If the edge information is larger than a certain edge information threshold Δ_2, the presence of an object edge aligned with or in close proximity to a superpixel boundary is detected.
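The two-step test above reduces to a simple decision rule. The function name and the string return values in this sketch are invented for illustration; only the ordering of the two checks and the two thresholds Δ_1 and Δ_2 come from the text.

```python
def detect_object_edge(fit_err, edge_val, delta_1, delta_2):
    """Two-step edge test: a fitting error above delta_1 indicates an
    object edge inside the superpixel; otherwise, edge information
    above delta_2 indicates an edge aligned with, or in close
    proximity to, the superpixel boundary."""
    if fit_err > delta_1:
        return "edge_within_superpixel"
    if edge_val > delta_2:
        return "edge_at_boundary"
    return "no_edge"
```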

Fig. 8 shows the superpixel fitting error of Equation (1), calculated for the image in Fig. 1 and scaled to the luminance. The areas with higher intensity in the picture show larger fitting errors and perfectly mark superpixels at object borders in the scene. Using the information about object edges as defined in Equations (1) and (2), the grouping of superpixels based on color information can be improved. If the goal is to group only superpixels that lie within the contour of an object, the assignment of a superpixel to a group is rejected if the following condition is not true:

$$SP_{fittingErr} \leq \Delta_1 \;\wedge\; SP_{edgeInfo} \leq \Delta_2$$

A method according to the invention for generating a superpixel cluster for an image is schematically shown in Fig. 9. In a first step a depth map associated to the image is retrieved 10. Then an adapted depth map is generated 11 by fitting planes to areas in the depth map corresponding to superpixels of the image. Once the adapted depth map is available, fitting errors are determined 12 for the fitted planes and edge information is calculated 13 for the superpixels using the adapted depth map. Finally, using at least the fitting errors and the edge information, superpixels are grouped 14 into a cluster.

Fig. 10 schematically illustrates an apparatus 20 adapted to implement a solution according to the invention for generating a superpixel cluster for an image. The apparatus has an input 21 for receiving the image, the superpixels for the image, and a depth map associated to the image, e.g. from a network or a local storage 22. Of course, the superpixels for the image as well as the depth map may likewise be generated by dedicated circuitry (not shown) within the apparatus 20. A retrieving unit 23 retrieves 10 a depth map associated to the image. A depth map generating unit 24 then generates 11 an adapted depth map by fitting planes to areas in the depth map corresponding to superpixels of the image. A determining unit 25 determines 12 fitting errors for the fitted planes, whereas a calculating unit 26 calculates 13 edge information for the superpixels using the adapted depth map. Using at least the fitting errors and the edge information, a grouping unit 27 groups 14 superpixels into a cluster. The resulting superpixel cluster is preferably made available for further processing via an output 28. Of course, the different units 23, 24, 25, 26, 27 may likewise be fully or partially combined into a single unit or implemented as software running on a processor. In addition, the input 21 and the output 28 may likewise be combined or partially combined into a single bi-directional interface.