

Title:
METHOD FOR ALIGNING A 3D MODEL OF AN OBJECT WITH AN IMAGE COMPRISING AT LEAST A PORTION OF SAID OBJECT TO OBTAIN AN AUGMENTED REALITY
Document Type and Number:
WIPO Patent Application WO/2022/152778
Kind Code:
A2
Abstract:
The invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object, said method comprising the following steps: defining (12) a capture position of the image; displaying (15) the 3D model simulated from said capture position; identifying (16) remarkable edges in the 3D model, said remarkable edges having the same reference orientation; selecting (17) an edge among the identified remarkable edges from the displayed 3D model; displaying (18) the image captured by the camera; identifying (20) lines of interest in the image captured by the camera, said lines of interest having an orientation identical to said reference orientation; selecting (21) a line, among said lines of interest, corresponding to the selected edge; and aligning (22) and/or transforming the 3D model so that the selected edge matches with the selected line.

Inventors:
MARTINEZ GONZALEZ JAVIER (BE)
VILLERET ANTOINE (BE)
Application Number:
PCT/EP2022/050605
Publication Date:
July 21, 2022
Filing Date:
January 13, 2022
Assignee:
AGC GLASS EUROPE (BE)
International Classes:
G06T19/00
Domestic Patent References:
WO2014114118A1 2014-07-31
Foreign References:
US10482663B2 2019-11-19
Other References:
J. ALISON NOBLE, "Finding corners", Image and Vision Computing, vol. 6, 1988, pages 121-128
JIANBO SHI; TOMASI, "Good features to track", Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994
Attorney, Agent or Firm:
AGC GLASS EUROPE (BE)
Claims:
CLAIMS

1- Method (10) for aligning a 3D model (40) of an object with an image (50) comprising at least a portion of said object, said image being captured by a camera in order to obtain an augmented reality, said method comprising the following steps:

- defining (12) a capture position of the image (50) relative to the object;

- displaying (15) the 3D model (40) simulated from said capture position (31);

- identifying (16) remarkable edges (41) in the 3D model (40), said remarkable edges (41) having the same reference orientation;

- selecting (17) an edge (43) among the identified remarkable edges (41) from the displayed 3D model (40);

- displaying (18) the image (50) captured by the camera;

- identifying (20) lines of interest (44) in the image (50) captured by the camera, said lines of interest (44) having an orientation identical to said reference orientation;

- selecting (21) a line (48), among said lines of interest (44), corresponding to the selected edge (43); and

- aligning (22) and/or transforming the 3D model (40) so that the selected edge (43) matches with the selected line (48).

2- Method according to claim 1, wherein said method comprises a step of selection (11) of the 3D model (40) among several 3D models and/or a subpart (22) of the 3D model (40) prior to the step of defining (12) the capture position (31).

3- Method according to claim 1 or 2, wherein the step of defining (12) a capture position (31) is carried out on a 2D above representation of the object extracted from the 3D model (40).

4- Method according to claim 3, wherein said method comprises a step of defining (13) an angular position (a1-a2) of the image (50) relative to the object using a rotation of the 2D above representation around the capture position (31); the step of displaying (15) the 3D model (40) being simulated from said capture position (31) and said angular position (a1-a2).

5- Method according to any of claims 1 to 4, wherein the step of displaying (15) the 3D model (40) simulated from said capture position (31) is realized by extracting (14) the capture features of said camera and by displaying (15) the 3D model (40) with said capture features.

6- Method according to any of claims 1 to 5, wherein in the step of displaying (15) the 3D model (40) simulated from said capture position (31), said 3D model (40) is superposed to the image (50) captured by the camera.

7- Method according to any of claims 1 to 6, wherein said 3D model (40) is built using a polygon mesh (60) where each polygon (A-R) has at least three sides; the step of identifying remarkable edges (41) in the 3D model (40) being carried out by the following sub-steps:

a. detecting (16a) the edges of the 3D model (40) formed by the alignment on a same line of at least two sides of two consecutive polygons (A-R);

b. selecting (16b) the edges where the line is orientated along said reference orientation; and

c. selecting (16c) the edges with a length greater than a threshold value.

8- Method according to any of claims 1 to 7, wherein said reference orientation is vertical.

9- Method according to any of claims 1 to 8, wherein said method comprises a step of calibration (19) of the ground position (46) in the displayed image (50) after the step of displaying (18) the image (50).

10- Method according to any of claims 1 to 9, wherein the step of identifying (20) lines of interest in the image (50) captured by the camera is carried out by the following sub-steps:

a. capturing (51, 61) an image (50) of the object;

b. identifying (52, 62) lines of interest (44) in the captured image (50) using a line detection algorithm;

c. determining (53, 63) the orientation of the identified lines of interest (44) using an orientation device; and

d. filtering (53, 64) the identified lines of interest (44) with the same orientation as said reference orientation.

11- Method according to claim 10, wherein the step of identifying (52, 62) lines of interest (44) in the captured image (50) is realized using an algorithm chosen among: the Line Segment Detector, a contour detection, Canny, Sobel, plane intersections or Hough transform.

12- Method according to claim 10 or 11, wherein said orientation device corresponds to an accelerometer, a gyroscope and/or a magnetometer.

13- Method according to claim 10 or 11, wherein said camera includes at least one intensity sensor.

14- Method according to claim 10 or 11, wherein said camera includes at least one depth sensor, the step of determination (63) of the orientation of the identified lines of interest (44) being realized using a reprojection of the image captured by the depth sensor in the 3D coordinate system of the 3D model (40) using the camera capture features and said orientation device.

15- Method according to any of claims 10 to 12 and any of claims 13 to 14, wherein said camera includes at least one depth sensor and at least one intensity sensor, the step of identifying (20) lines of interest (44) being carried out by merging the filtered lines obtained from the at least one depth sensor and the filtered lines obtained by the at least one intensity sensor before the step of selecting (21) the line (48).

Description:
METHOD FOR ALIGNING A 3D MODEL OF AN OBJECT WITH AN IMAGE COMPRISING AT LEAST A PORTION OF SAID OBJECT TO OBTAIN AN AUGMENTED REALITY

Technical field

The present invention relates to the field of augmented reality.

More precisely, the invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object wherein said augmented reality is obtained by the superposition of the 3D model and the image captured by a camera.

Advantageously, compared to existing methods, the invention allows a faster and more accurate alignment of the 3D model on an image, even when a user is moving and the image stream is changing.

The invention finds applications in multiple domains. For instance, the invention can be applied to the domain of engineering and construction where it can help a user to visualize an architecture project and plan for its construction during the preparation phase. It can also help a user during the building phase and afterwards for the verification and maintenance steps.

Background art

Broadly speaking, the world as we see it can be perceived as real or virtual. We view the real world on a daily basis using our sight. A virtual world, on the other hand, is generated by synthetic video, audio and haptic stimulus. Virtual worlds can be further categorized into immersive and non-immersive environments. Immersive environments submerge the user in an artificial world where vision, audition and touch are mainly computer-generated. For instance, immersion can be achieved using Virtual Reality (VR) head-mounted displays or Cave Automatic Virtual Environments (CAVE) systems. VR can be non-immersive as well. For instance, we come across 3D computer games and Computer-Aided Design (CAD) tools where virtual avatars and objects don’t necessarily provide a sense of complete immersion.

Between real and virtual environments, we observe a unique class of environments called Mixed or Augmented Reality (MR or AR). Following this description, AR can be defined as a method of combining real and virtual worlds to obtain an interface that enhances the visual scene normally seen by the user. The overlaid virtual objects can be additive to the natural environment or destructive to the natural environment, meaning that they can mask at least part of it.

AR systems have many applications ranging from learning, team collaboration, and behavioral studies to industrial applications such as Maintenance Repair and Operations (MRO), crew training and robotics.

Overall, AR can help a user in:

- enhancing his perception, cognition and performance, therefore resulting in an overall improvement in his task performance,

- improving his situational awareness and decision making by providing detailed technical information directly in his field of vision,

- guiding him through complex tasks such as technical operation procedures,

- helping him in the diagnosis and isolation of faults in complex systems such as hardware or electrical connections,

- providing him with a virtual tutorial for support and training during assembly of said complex systems, and

- assisting him in testing and verifying said complex systems.

For instance, a user can use AR to display the theoretical position of a piece of equipment in the real world. The user can then compare this theoretical position with the real position where said equipment was installed. If the two positions do not match, the user will be able to tell at a glance and save time by correcting the mistake rapidly.

As another example, AR can be used to show an operator how to perform a procedure, typically a maintenance or service operation. Step-by-step, the system displays the operations to perform directly onto the real equipment. The user can also navigate from one step to another, possibly take pictures and annotate them to create a report of his intervention.

Classically, AR is available via software applications developed for deployment on a wide range of devices such as smartphones or tablets, but it is also increasingly used on wearable devices such as glasses or any other optical see-through system. In order to create the augmented reality, the system uses cameras embedded in these devices to capture an image or a video of the real world. The virtual object is generally created beforehand using computer-aided design (CAD). This tool is commonly used to digitally create 2D drawings and 3D models of real-world objects before they are manufactured. With 3D CAD, it is possible to review, simulate, and modify designs easily.

Afterwards, a chosen 3D model is aligned onto the captured video. Alignment is one of the major challenges of AR. It relates to the ability to display 3D augmentations with the exact location, orientation and scale onto the objects captured from the real world. The quality of the alignment has a huge impact on the end-user experience.

The first requirement for a good alignment is the ability to recognize the target real-world object and its location relative to the user. There are two basic approaches for real-world object recognition: methods using markers and marker-less methods.

Markers are specific visual patterns visible in the real world. These markers often include additional visually encoded information, which may be used to identify them in the images or video streams. These markers are used for calibration and provide good estimates for the location and orientation of the target object relative to the camera.

For instance, document WO2014/114118 describes a method of detection of two-dimensional markers such as QR codes through a camera video. The two-dimensional markers contain at least position information on a corresponding virtual object. When the markers are detected and read, the system can align the virtual object with reality using the position information.

However, this technique suffers from several drawbacks. First, it requires adding markers to the reality, which is sometimes not practical. Moreover, these markers can become dirty or be torn out, making them unusable. Finally, this technique can be quite inaccurate since it largely relies on the correct positioning of small markers in the environment.

Other techniques allow recognizing a target object without needing markers: pattern-based recognition or structural recognition. Pattern-based recognition refers to a technique that requires the target object to have a well recognizable texture, such as a specific pattern on a tissue. The recognition of the object is then based on finding this specific texture on the real-world object. In commercial applications, this approach is for instance used to augment product catalogues. Another approach is to recognize an object using simple shapes. For instance, document US10482663 discloses an object-recognition engine which is configured to compare objects captured by the camera with a plurality of objects stored in a database. For each captured object, the engine tries to match the captured object with the shape of an object stored in the database. However, this shape-based approach becomes problematic when the object cannot be captured in its entirety.

Another approach overcoming this drawback is to recognize an object based on remarkable points. With this approach, the 3D model can be aligned over the captured image from the reality by superposing the remarkable points from the 3D model with the corresponding remarkable points in the captured image.

A well-known approach for aligning a 3D model with reality is to use a corner as the remarkable point. Thus, the alignment method consists in finding a corner in the 3D model and aligning it with a corresponding corner in the captured image of the real world. For instance, in order to find corners in the 3D model, it is possible to project the 3D model on a plane and use a corner detection algorithm. An example of corner detection on a 2D projected image is disclosed in the scientific publication: J. Alison Noble, "Finding corners", Image and Vision Computing, Volume 6, Issue 2, 1988, Pages 121-128.

This method using a corner as a remarkable point is mainly employed because the detection of corners in the 3D model is simple. However, identifying the corresponding corner in the captured image of the real world is often a hard task for the user. Indeed, given current device capabilities, this whole process is carried out manually, thus requiring the user to manually adapt the position and the direction of the capture device in order to align the selected corner from the 3D model onto the corresponding corner in the captured image of the real world. Typically, the whole alignment process may take around 30 seconds or more.

Thus, the technical problem to be solved is how to develop a faster alignment method in order to match the 3D model with reality, which does not require as many manipulations from the user as existing methods.

Disclosure of the invention

In order to solve this technical problem, the invention proposes to use a line with a predetermined orientation as a remarkable indicator. Matching lines is a very old technique that was abandoned because current 3D models are becoming more and more complex and contain too many different lines. As a consequence, it is very difficult for a user to select a specific line among all these lines. For instance, there are typically millions of lines in a 3D model of a building.

Moreover, current line extraction algorithms are not robust, in the sense that it is hard to extract the same lines in two images of the same object taken from a different point of view or with a different luminosity. Furthermore, lines are a very poor descriptor because they contain little information and are sensitive to rotations, scaling and perspective.

The literature is abundantly clear about the shortcomings of such methods, and they are generally dropped in favor of corners, projected floors, or specific feature points as a starting point for 3D-model-to-captured-image alignment. See for instance the scientific publication: Jianbo Shi and Tomasi, "Good features to track", 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.

All these drawbacks are overcome by using a predetermined orientation of the line as a remarkable indicator. Indeed, if, for instance, only the vertical lines are selected, the number of displayed lines will be limited, and the user will be able to choose more easily and rapidly the line he wishes to use as a remarkable indicator. Moreover, it is also easier to detect a specific line in a captured image when its orientation is known beforehand. Thus, knowing the predetermined orientation of the lines of interest increases the robustness of line extraction algorithms, making them more accurate than, for instance, corner detection algorithms.

Nowadays, corner detection in a captured image is carried out manually. Hence, the identification of lines with a selected orientation is a faster method because it can be carried out automatically. Moreover, automating corner detection in an image is a very difficult task because it requires finding the orientation of the identified angle in 3D space. If this orientation is not correctly determined, the alignment can be completely wrong. Thus, automating the detection of lines with a selected orientation carries a lower error risk than automating the detection of angles.

All these features contribute to improving the overall end-user experience and reduce the time of alignment to typically around 10 seconds.

In other words, the invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object and captured by a camera in order to obtain an augmented reality, said method comprising the following steps:

- defining a capture position of the image relative to the object;

- displaying the 3D model simulated from said capture position;

- identifying remarkable edges in the 3D model, said remarkable edges having the same reference orientation;

- selecting an edge among the identified remarkable edges from the displayed 3D model;

- displaying the image captured by the camera;

- identifying lines of interest in the image captured by the camera, said lines of interest having an orientation identical to said reference orientation;

- selecting a line, among said lines of interest, corresponding to the selected edge; and

- aligning and/or transforming the 3D model so that the selected edge matches with the selected line.
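As an illustration of the final aligning step, the sketch below (not taken from the patent; all names and conventions are hypothetical) computes the 2D similarity transform that maps a selected vertical model edge onto the selected image line, assuming both are expressed as bottom-to-top pixel segments:

```python
# Illustrative sketch (not the patent's implementation): a minimal 2D
# adjustment that maps a selected vertical model edge onto a selected
# vertical image line. Segments are pairs of (x, y) pixel coordinates,
# each given bottom-to-top.

def align_vertical_edge(edge, line):
    """Return (scale, dx, dy) so that scale * edge + (dx, dy) == line."""
    (ex0, ey0), (ex1, ey1) = edge
    (lx0, ly0), (lx1, ly1) = line
    scale = abs(ly1 - ly0) / abs(ey1 - ey0)   # match apparent size
    dx = lx0 - scale * ex0                    # horizontal shift
    dy = ly0 - scale * ey0                    # vertical shift
    return scale, dx, dy

def apply_transform(transform, point):
    scale, dx, dy = transform
    x, y = point
    return (scale * x + dx, scale * y + dy)
```

For example, mapping the model edge ((100, 400), (100, 200)) onto the image line ((150, 420), (150, 320)) yields a scale of 0.5 plus a translation, and applying that transform to the edge's top endpoint lands it on the line's top endpoint.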

According to the invention, a camera is a device capable of recording an image of an object. A camera may include at least one sensor that detects the information used to make the image. For instance, the sensor may be an intensity sensor or a depth sensor. The camera may include several sensors with the same or different ways of recording an image (intensity and/or depth sensors). The camera may be embedded in a device such as a tablet or a smartphone.

Such a method may also comprise a step of selection of the 3D model among several 3D models and/or a subpart of the 3D model prior to the step of defining the capture position.

Indeed, the 3D model might be a large project, such as a building, that can be separated into smaller projects such as floors or rooms. Hence, it might be easier for a user to select a subpart of the 3D model in order to find an exploitable remarkable edge. In fact, selecting a smaller project will further reduce the number of selectable edges and thus the time needed for a user to select an edge among them.

According to an embodiment of the invention, the step of defining a capture position is carried out on a 2D above representation of the object extracted from the 3D model. This method relies on a simplified global representation of the object. It is indeed easier for a user to navigate in a 2D representation rather than in a 3D representation. The user will thus save time in selecting his capture position and reduce the overall process duration.

For instance, if the object is a building, the user might be more comfortable identifying his position on the map of a floor rather than on a 3D model. Indeed, a user is more used to reading maps, and it is more difficult to navigate in a representation with three dimensions than in one with only two.

According to another embodiment of the invention, the method comprises a step of definition of an angular position of the image relative to the object using a rotation of the 2D above representation around the capture position; the step of displaying the 3D model being simulated from said capture position and said angular position.

In other words, the user can rotate the 2D above representation in order to match it with his viewpoint. Indeed, the user generally knows his position in an environment and is capable of orienting the 3D model relative to his position in order to ease the match between the 3D model and the reality. Moreover, the definition of an angular position helps the displaying process, as the system does not need to test every angular position to find the best way to display the 3D model and which part of it to display. Thus, this additional step saves computing time.

To be more precise, the step of displaying the 3D model simulated from said capture position is realized by extracting the capture features of said camera and by displaying the 3D model with said capture features.

According to the invention, the capture features are the intrinsic and extrinsic parameters. The intrinsic parameters are the parameters intrinsic to the camera itself, such as the focal length and lens distortion. On the other hand, the extrinsic parameters are the parameters used to describe the transformation between the camera and its external world.
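As a reminder of how intrinsic parameters act, the standard pinhole-projection snippet below (a textbook illustration, not specific to the invention) maps a 3D point in the camera frame to pixel coordinates; lens distortion is omitted for brevity:

```python
# Standard pinhole-camera model (illustrative, not from the patent):
# a 3D point in camera coordinates is projected to pixel coordinates
# using the focal lengths (fx, fy) and principal point (cx, cy).

def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point (camera frame, z > 0) to (u, v) pixels."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)
```

Rendering the 3D model with the same (fx, fy, cx, cy) as the physical camera is what makes the simulated view and the captured image directly comparable.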

Indeed, a camera may display the captured reality in a distorted way compared to our vision. A 3D model displayed using the camera capture features will present the same distortions as the camera. Thus, it will be easier for a user to compare the 3D model with the captured image and to identify objects with similarities if they are distorted the same way. Therefore, the user will be quicker to select a remarkable edge from the 3D model to match with a line from the reality. To ease this process even further, in the step of displaying the 3D model simulated from said capture position, said 3D model is superposed on the image captured by the camera.

This feature allows the user to check that the alignment process is proceeding smoothly. Indeed, if the user sees that the displayed 3D model does not match the captured reality at all, he can react more quickly because he is able to see the differences at a single glance. On the contrary, if the 3D model were not displayed on the captured reality, the user might more easily forget what the 3D model looks like and try to match two completely different objects.

According to another embodiment of the invention, said 3D model is built using a polygon mesh where each polygon has at least three sides; the step of identifying remarkable edges in the 3D model being carried out by the following sub-steps:

a. detecting the edges of the 3D model formed by the alignment on a same line of at least two sides of two consecutive polygons;

b. selecting the edges where the line is orientated along said reference orientation; and

c. selecting the edges with a length greater than a threshold value.

Mesh modeling is a type of modeling used to build 3D objects out of smaller components such as polygons. Each polygon is a completely flat shape defined by the position of its points and connecting edges. Very complicated models of any shape can be built entirely out of polygons. The precision and fidelity of the model can be tuned by increasing or decreasing the number of polygons in the model. Indeed, polygons are flexible and can be rendered quickly by computers.
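Sub-steps (b) and (c) of the edge identification above can be sketched as follows (a simplified illustration under assumed names and thresholds; the detection of candidate edges in sub-step (a) is taken as given):

```python
import math

# Simplified sketch of sub-steps (b) and (c): candidate edges are 3D
# segments; keep those whose direction is close to the vertical
# reference axis and whose length exceeds a threshold. The min_length
# and max_angle_deg values are arbitrary illustrative defaults.

def is_remarkable(edge, min_length=1.0, max_angle_deg=5.0):
    (x0, y0, z0), (x1, y1, z1) = edge
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length < min_length:
        return False                       # sub-step (c): too short
    cos_angle = abs(dz) / length           # angle to the vertical (z) axis
    return cos_angle >= math.cos(math.radians(max_angle_deg))  # sub-step (b)

def remarkable_edges(edges, **kw):
    return [e for e in edges if is_remarkable(e, **kw)]
```

A 2-metre vertical edge passes both filters, while a short vertical stub or a long horizontal edge is rejected.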

In a preferred embodiment, the reference orientation of the edges and lines used as a remarkable indicator is vertical. Indeed, vertical lines are more prominent and reliable than horizontal lines in the natural environment.

In another embodiment, said method comprises a step of calibration of the ground position in the displayed image after the step of displaying the image.

In other words, the user is asked to select the ground in the real image captured by the camera. This step helps in positioning the 3D model height relative to the captured image.

In a specific embodiment, the step of identifying lines of interest in the image captured by the camera is carried out by the following sub-steps:

a. capturing at least one image of the object;

b. identifying lines of interest in the captured image using a line detection algorithm;

c. determining the orientation of the identified lines of interest using an orientation device; and

d. filtering the identified lines of interest with the same orientation as said reference orientation.

Preferably, the step of identifying lines of interest in the captured image is realized using an algorithm chosen among: the Line Segment Detector, a contour detection, Canny, Sobel, plane intersections or the Hough transform. In this application, these algorithms are particularly efficient at detecting edges in an image.
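To illustrate the idea of extracting lines with a known orientation, the toy detector below finds vertical segments as runs of set pixels in a binary edge map; it is only a stand-in for the far more general algorithms named above (LSD, Canny, Hough transform, and so on):

```python
# Toy vertical-line detector (illustrative only): on a binary edge map
# (list of rows of 0/1 values), a vertical segment is a run of set
# pixels within one column that is at least min_run pixels long.

def vertical_segments(edge_map, min_run=3):
    """Return segments as (column, row_start, row_end), inclusive."""
    height, width = len(edge_map), len(edge_map[0])
    segments = []
    for x in range(width):
        run_start = None
        for y in range(height + 1):        # one extra step to flush runs
            on = y < height and edge_map[y][x]
            if on and run_start is None:
                run_start = y
            elif not on and run_start is not None:
                if y - run_start >= min_run:
                    segments.append((x, run_start, y - 1))
                run_start = None
    return segments
```

A real pipeline would run a detector such as these on the camera frame and pass the resulting segments to the orientation filtering of sub-steps (c) and (d).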

More precisely, the orientation device used to determine the orientation of the identified lines of interest corresponds to an accelerometer, a gyroscope and/or a magnetometer.

For instance, the accelerometer can give information on acceleration forces. Such forces may be static, like the continuous force of gravity. An accelerometer can thus measure the gravitational pull to determine the angle at which an object is tilted with respect to the Earth. This feature is used to determine the orientation of the identified lines of interest in the captured image. As another example, the gyroscope uses the Earth's gravity to help determine orientation. The magnetometer measures the direction, strength, or relative change of a magnetic field. Vector magnetometers in particular can measure the component of the magnetic field in a specific direction, relative to the spatial orientation of the camera.
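One way such an orientation device can be used (a sketch under assumptions, not the patent's algorithm) is to project the measured gravity vector into the image plane and keep only the detected lines whose direction lies within a tolerance of it:

```python
import math

# Illustrative filter: the gravity vector projected into the image
# plane gives the direction that "vertical" lines should have in the
# captured image. Keep lines within tol_deg of that direction.

def line_angle(p0, p1):
    return math.atan2(p1[1] - p0[1], p1[0] - p0[0])

def filter_by_gravity(lines, gravity_2d, tol_deg=10.0):
    """lines: [(p0, p1), ...]; gravity_2d: gravity projected in image."""
    ref = math.atan2(gravity_2d[1], gravity_2d[0])
    kept = []
    for p0, p1 in lines:
        # angular difference modulo 180 degrees (lines are undirected)
        diff = abs(line_angle(p0, p1) - ref) % math.pi
        diff = min(diff, math.pi - diff)
        if math.degrees(diff) <= tol_deg:
            kept.append((p0, p1))
    return kept
```

With gravity pointing straight down the image, a vertical segment is kept while a horizontal one is discarded, which implements sub-step (d) of the identification step.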

According to the type of the sensor, the method according to the invention may be carried out differently.

According to a first embodiment of the invention, said camera includes at least one intensity sensor.

For instance, a RGB sensor is an intensity sensor. It contains an array of photodiodes covered by red, blue and green filters. The photodiodes receive the light coming from the object and convert it into a current in order to estimate the intensity of each color. The resulting image does not contain information on the orientation of lines in the 3D camera space, which is why this embodiment requires using an orientation device to select lines with a specific orientation.

According to another embodiment of the invention, said camera includes at least one depth sensor, the step of determination of the orientation of the identified lines of interest being realized using a reprojection of the image captured by the depth sensor in the 3D coordinate system of the 3D model using the camera capture features and said orientation device.

A depth sensor is a sensor capable of measuring the distance between the sensor and a particular point of an object. This is generally achieved by measuring the distortion and travel time of a returning electromagnetic wave compared to the emitted one. This information can therefore be used to retrieve a 3D representation of the object.

For instance, a lidar sensor is a depth sensor capable of measuring distances by illuminating a target object with laser light and measuring the reflection. Differences in laser travel time and wavelength can be used to retrieve a 3D representation of the object.

Thus, the resulting image already contains information on the orientation of lines in the 3D camera space. Therefore, this type of sensor allows using the depth information contained in the image in order to create a 3D representation of the image and retrieve the orientation of a line in the 3D camera space. The orientation device is required during this step in order to determine the orientation of the depth sensor with respect to the reality.
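The reprojection idea can be sketched as follows (hypothetical names; a simplified pinhole back-projection): each pixel with a depth value is lifted to a 3D point in the camera frame, and the 3D direction of a detected line is then compared with the vertical direction given by the orientation device:

```python
import math

# Illustrative sketch: back-project a pixel plus its depth to a 3D
# point using the camera intrinsics, then test whether the 3D segment
# between two such points is vertical with respect to the gravity
# direction supplied by the orientation device.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) + depth -> 3D point in the camera coordinate frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def is_vertical_3d(p0, p1, up, max_angle_deg=5.0):
    """up: unit gravity/vertical direction in the camera frame."""
    d = [b - a for a, b in zip(p0, p1)]
    norm = math.sqrt(sum(c * c for c in d))
    cos_angle = abs(sum(c * u for c, u in zip(d, up))) / norm
    return cos_angle >= math.cos(math.radians(max_angle_deg))
```

Unlike the intensity-only case, no 2D heuristic is needed: the verticality test is carried out directly on the reconstructed 3D segment.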

This method is advantageously quicker than other methods, but the resolution of depth sensors is currently relatively low, which complicates line detection.

By contrast, intensity sensors nowadays have a higher resolution and line detection on their images is more precise; depth sensors, however, provide depth information. This is why, in a preferred embodiment, said camera contains both at least one depth sensor and at least one intensity sensor, the step of identifying lines of interest being carried out by merging the filtered lines obtained from the at least one depth sensor and the filtered lines obtained by the at least one intensity sensor before the step of selecting the line.

In this embodiment, it is possible to combine the information coming from both sensors in order to improve the robustness of the line detection. Moreover, it is possible to compare and merge the lines identified on both depth and intensity sensors images in order to improve the reliability of the line detection.
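One possible merging heuristic (an assumption for illustration, not prescribed by the text) is to pair lines from the two sensors that are close in image position, average the paired ones, and keep unmatched lines from either sensor:

```python
# Illustrative merge of vertical lines detected on the depth image and
# on the intensity image. Lines are (x, y_top, y_bottom) in pixels;
# two lines whose columns differ by at most max_dist are treated as
# one detection and averaged.

def merge_lines(depth_lines, intensity_lines, max_dist=5.0):
    merged, used = [], set()
    for dl in depth_lines:
        match = None
        for i, il in enumerate(intensity_lines):
            if i not in used and abs(dl[0] - il[0]) <= max_dist:
                match = i
                break
        if match is None:
            merged.append(dl)              # depth-only detection
        else:
            used.add(match)                # seen by both sensors: average
            il = intensity_lines[match]
            merged.append(tuple((a + b) / 2 for a, b in zip(dl, il)))
    # intensity-only detections are kept as-is
    merged.extend(il for i, il in enumerate(intensity_lines) if i not in used)
    return merged
```

Lines confirmed by both sensors end up with averaged positions, while detections unique to one sensor still survive, which is one way to realize the robustness gain described above.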

Brief description of the figures

The different aspects of the present invention will now be described in more detail, with reference to the appended drawings showing various exemplifying embodiments of the invention, which are provided by way of illustration and not of limitation. The drawings are schematic representations and not true to scale. The drawings do not restrict the invention in any way. More advantages will be explained with examples.

Figure 1 is a block diagram of the method for aligning a 3D model of an object with an image captured by a camera according to an embodiment of the invention;

Figure 2 is a representation of the step of selection of the 3D model among several 3D models and/or a subpart of the 3D model of the method from figure 1;

Figure 3a is a representation of the steps of definition of a capture position of the method from figure 1;

Figure 3b is a representation of the steps of definition of an angular position of the method from figure 1;

Figure 4 is a representation of the step of displaying the 3D model of the method from figure 1;

Figure 5 is a representation of a mesh of a 3D model according to an embodiment;

Figure 6 is a representation of the step of selection of a remarkable edge in the 3D model of the method from figure 1;

Figure 7 is a representation of the step of selection of a line of interest in the image from reality of the method from figure 1;

Figure 8 is a representation of the step of calibration of the ground position in the image from reality of the method from figure 1 ;

Figure 9 is a block diagram of the line detection algorithm depending on the nature of the image sensor;

Figure 10 is a representation of different positions of the camera relative to the object and the resulting captured images;

Figure 11 is a representation of the triangulation process using the camera position and captured images from figure 10; and

Figure 12 is a representation of the global process of 3D alignment in the case of a building project.

Detailed description

For a better understanding, the scale of each component in the drawings might be different from the actual scale. Figure 1 is a block diagram presenting the main steps of the method 10 for aligning a 3D model 40 of an object with an image 50 of the real world captured by a camera, according to an embodiment of the invention.

In the following description, unless otherwise specified, the expression "substantially" means to within 10%, preferably to within 5%.

Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps. Where an indefinite or definite article is used when referring to a singular noun e.g. "a" or "an", "the", this includes a plural of that noun unless something else is specifically stated.

The method 10 may be hosted by an application that the user can install on his phone, tablet or any dedicated device. The device shall be equipped with at least a camera in order to capture the environment and render an augmented version of it. To this end, the environment shall contain at least part of an object that is to be augmented. The method 10 according to the invention will then be able to recognize a particular feature of this object in the 3D model 40 and in the reality and match the two in order to create an augmented reality.

As shown in figure 2, the method 10 according to the invention includes a first step 11 of selection of a project 21. The project 21 contains at least a 3D model 40 of an object generated using a dedicated software. The 3D model 40 is a representation of the object using a collection of points positioned in a 3D space, also called a point cloud. The points may be connected by various geometric entities such as polygons, lines or curved surfaces. For instance, the 3D model 40 is built using a mesh 60 of triangles A-R such as illustrated in figure 5.

In the case of a building project 21, as shown in figure 12, a designer may create a Building Information Model (BIM); Building Information Modeling is a process commonly used in the construction field to manage the entire life of the building, from its creation to its dismantling. A BIM is supported by various tools involving the generation and management of digital representations of physical and functional characteristics of the building project 21. The BIM also includes one or several 3D models 40 of the building. The project 21 may be stored in the memory of the user's device or in the cloud and be accessed using said device or any type of device including an internet connection and a web browser. This solution makes it possible to access the BIM at any time and to synchronize in real time with the BIM project as soon as an update is made.

As some projects 21 may be very large, they can sometimes be divided into subparts. For instance, a building project 21 may be separated into subprojects 22, each subproject 22 containing a 3D model 40 of a floor of the building. For instance, as illustrated in figure 2, the application can display a list of the subprojects 22 and the user may therefore select the subproject 22 he needs to display.

Furthermore, the method 10 according to the invention also includes a second step 12 of definition of a capture position 31 of the image 50 relative to the object. In other words, because the user is usually holding the device containing a camera that is capturing an image 50 of the environment, the user is asked to define his position relative to the object that needs to be augmented. For instance, in the case of a building project 21, the user is asked to define his position in the building.

For this purpose, the application may display the selected subproject 22 in three dimensions. The user can then navigate through the 3D model 40 using a control pad. However, as it might be difficult to navigate through the 3D model 40 of an entire floor, the application may also display a 2D overhead representation of the object. For instance, as illustrated in figures 3a and 3b, in the case of a building project 21, a map 30 of the floor can be displayed to the user. The user can then select his position 31 on the map 30.

Optionally, the method 10 may also ask the user to define 13 his angular position α1-α2 relative to the displayed object. In other words, the user is asked to define how the object is oriented relative to his position 31, so that the selection of the resulting edge can be achieved via the device in 3D. For instance, in the case of a building project 21, the user may define how the 3D model of the floor is positioned relative to the direction he is facing. In order to define this angular position α1-α2, the user may rotate the 2D overhead representation around his capture position 31 as shown in figures 3a and 3b.

Moreover, the method 10 may also extract 14 the capture features of the camera by which the scene is captured. Indeed, depending on the type of camera, the displayed image 50 of the environment can vary considerably. For instance, a wide-angle video camera captures images with a fisheye effect, meaning that the image 50 is distorted and all the objects are drawn in perspective from the center. The capture features may include intrinsic parameters of the camera, such as the focal length and lens distortion, and extrinsic parameters of the camera, used to describe the transformation between the camera and the real world. As the 3D model 40 and the image 50 from reality will ultimately be superposed, the method extracts 14 the capture features of the camera in order to display 15 the 3D model 40 with said capture features.

Thus, the fifth step of the method 10 is to display 15 to the user the 3D model 40 using the defined capture position 31, the defined angular position α1-α2 and the capture features of the camera.

For instance, if a user is localized in a room of the building, facing a wall of the room where a window needs to be installed, the user will see, displayed on the screen of the device, a 3D model 40 of this wall containing said window to install.

Optionally, as shown in figure 4, the 3D model 40 is superposed to the image 50 captured by the camera in order for the user to be able to compare the reality and the 3D model 40 on the screen of his device without having to check with his eyes what the reality looks like. Indeed, as mentioned previously, his vision does not have the same capture features as the camera and the wall that he sees in the reality might not look the same through the camera lens.

Once the model is displayed, the next step of the method 10 is to identify 16 remarkable edges 41 in the 3D model 40. A remarkable edge 41 is for instance a line separating two adjacent walls or a wall and the ceiling. The edges 41 are identified using the mesh 60 of the 3D model 40.

For example, as shown in figure 5, if the mesh 60 is made out of triangles A-R, the edges 41 are identified by detecting the alignment of consecutive triangles A-R on a same line. For instance, in figure 5, triangles K and O are adjacent triangles, meaning that they share a side. The straight line passing through this shared side also passes through the side shared by triangles L and M. Thus, a line passing through two consecutive sides of adjacent triangles can be identified as an edge 41. However, such an edge might not be the longest one available. Thus, only the longest edges 41 are selected and presented to the user. For instance, in figure 5, the edge 41 extends from triangle O to triangle R. Moreover, the identified edges 41 can be filtered depending on their length: only the edges 41 with a length greater than a threshold value may be displayed to the user.
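The mesh-based edge identification described above can be sketched as follows. This is a minimal 2D illustration, not the application's actual implementation; the function names, the point coordinates and the tolerance are hypothetical:

```python
import numpy as np

def collinear(p, q, r, tol=1e-6):
    """True if points p, q, r lie on one straight line (2D cross product ~ 0)."""
    u, v = q - p, r - p
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

def merge_collinear_sides(sides, min_length=1.0):
    """Chain consecutive collinear triangle sides into longer edges.

    `sides` is a list of (p, q) segments shared by adjacent triangles,
    given as 2D numpy points. Returns the merged edges whose length is
    at least `min_length` (the display threshold mentioned in the text).
    """
    edges = []
    for p, q in sides:
        for i, (a, b) in enumerate(edges):
            # extend an existing edge if the new side lies on the same line
            if collinear(a, b, p) and collinear(a, b, q):
                pts = [a, b, p, q]
                # keep the two extreme points along the line direction
                d = b - a
                t = [np.dot(x - a, d) for x in pts]
                edges[i] = (pts[int(np.argmin(t))], pts[int(np.argmax(t))])
                break
        else:
            edges.append((p, q))
    return [(a, b) for a, b in edges if np.linalg.norm(b - a) >= min_length]
```

Two sides lying on the same line are merged into one longer edge, while short isolated sides are filtered out by the length threshold.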

A further step is to select the edges 41 oriented along a same reference orientation. The reference orientation can either be predetermined or be chosen by the user depending on the nature of the 3D model 40. For instance, only horizontal lines can be chosen, or only vertical lines, or lines at a predetermined angle, or within a range of predetermined angles, with the vertical, for instance lines at an angle of between 40° and 60° to the vertical.
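A possible orientation filter, here assuming hypothetical 2D edges and the 40° to 60° angle range with respect to the vertical given as an example above, might look like:

```python
import numpy as np

def edge_angle_to_vertical(p, q):
    """Angle in degrees between segment pq and the vertical axis."""
    d = np.asarray(q, float) - np.asarray(p, float)
    vertical = np.array([0.0, 1.0])
    cosang = abs(np.dot(d, vertical)) / np.linalg.norm(d)
    return np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0)))

def filter_by_orientation(edges, min_deg=40.0, max_deg=60.0):
    """Keep only the edges whose angle to the vertical lies in [min_deg, max_deg]."""
    return [(p, q) for p, q in edges
            if min_deg <= edge_angle_to_vertical(p, q) <= max_deg]
```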

The resulting identified edges 41 are displayed on the 3D model 40. The user is then invited, by a message 42 appearing on the screen for instance, as shown in figure 6, to select 17 an edge on the 3D model 40. If several edges are present in the selected area, the closest edge to the selected zone is selected. The selected edge 43 may be highlighted compared to the other edges in order for the user to check if the right edge was selected.

The next step 18 of the method 10 consists in displaying the camera feed. This feed can either be a static image or a video. Optionally, the user may be asked to select the ground position 46 in the image 50 by a message 47 appearing on the screen for instance, as shown in figure 8. This step is used to calibrate 19 the ground position 46 in order to ease the process of alignment of the 3D model 40 on the captured image 50. Indeed, the ground position 46 will help in positioning the height of the 3D model 40. For instance, for a 3D model 40 of a building, the calibration step 19 will help in positioning the walls of the room relative to the ground so that they do not appear to float.

Then, the image 50 is processed in real time in order to identify 20 lines of interest 44.

Depending on the nature of the image sensor, the image processing may be carried out differently. Indeed, many types of sensors can be used to obtain an image of an object. For instance, it is possible to use RGB sensors, infrared sensors, visible light sensors, hyperspectral sensors, multi-lens sensors, structured light sensors, lidar sensors, etc.

Overall, these sensors can be divided into two categories:

- a first type of sensor, called a depth sensor, which contains information on the position of the lines of interest 44 in the 3D camera space. For instance, a lidar sensor is a depth sensor; and

- a second type of sensor, called an intensity sensor, which contains no information on the position of the lines of interest 44 in the 3D camera space. For instance, an RGB sensor is an intensity sensor.

Figure 9 is a block diagram of the image processing method depending on the nature of the sensor. The left part of the block diagram is dedicated to the treatment of an image captured by at least one intensity sensor. The right part of the block diagram is dedicated to the treatment of a map captured by a depth sensor. As an example, the intensity sensor is either an RGB ultra-wide sensor with the following features: 1920 x 1440 pixels, 24 bits, 60 fps and a fixed focus of 13 mm, and/or an RGB wide sensor with 1920 x 1080 pixels, 24 bits, 60 fps and an autofocus of 29 mm. On the other hand, the depth sensor may be a lidar sensor with 256 x 192 pixels, 32 bits, 60 fps and a fixed focus. More generally, the invention may be applied to any type of sensor embedded in a mobile device.

In a first example, the camera may include only an intensity sensor. In this case, the first step of the identification 20 of lines of interest 44 is to take 51 an intensity image 50 of the object. An intensity image is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its intensity value. The lines of interest 44 are characterized by sharp changes in the image brightness. Thus, in order to identify these lines of interest 44, the image 50 is processed 52 using edge detection algorithms such as the Line Segment Detector algorithm, a contour detection algorithm, the Canny algorithm, the Sobel algorithm or a plane-intersection algorithm. These algorithms detect variations in the pixel values by using derivatives, thresholds and/or various other methods.
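As a from-scratch illustration of such gradient-based edge detection, a minimal Sobel filter on a grayscale image might look as follows (in practice an optimized library implementation of Canny, Sobel or a line segment detector would be used; the function name and threshold here are hypothetical):

```python
import numpy as np

def sobel_edges(img, thresh):
    """Binary edge map from the Sobel gradient magnitude of a grayscale image.

    Pixels whose gradient magnitude exceeds `thresh` are marked as edge
    candidates; a line detector would then be run on this map.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                    # vertical gradient
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * ky)
    mag = np.hypot(gx, gy)
    return mag > thresh
```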

Once the algorithm has been applied, it is possible to extract the (x,y) coordinates of the lines of interest 44. Among them, lines shorter than a threshold value, skewed lines and lines that are not temporally robust are filtered out.

However, the lines of interest 44 identified in the intensity image are a projection from the 3D space onto a 2D image. Thus, it is impossible to know their position in the 3D space without further information. In order to filter 53 the remaining lines so as to select only the lines with the same 3D orientation as the selected edge 43, the orientation device contained in the camera is used. This orientation device may be an accelerometer, a gyroscope and/or a magnetometer. For instance, an accelerometer can measure the gravitational pull to determine the angle at which an object is tilted with respect to the Earth. This feature is used to determine the lines of interest 44 that are collinear to the selected edge 43 in the captured image 50.

In the case of a vertical orientation for the selected edge 43, a gravity vector, obtained thanks to the orientation device, is first projected from the optical axis at infinity. For this purpose, the gravity vector is placed at infinity along the optical axis and scaled to be infinite in size. Afterwards, the gravity vector is projected back onto the sensor. The point where the projected gravity vector meets the sensor is called the central horizon point.

In a further step, the identified lines of interest 44 are compared to this central horizon point. The lines of interest 44, which are often line segments, are extended in order to check whether they cross the projected gravity vector at substantially the central horizon point position, meaning within 5 pixels and preferably within 2 pixels. If this is the case, the lines of interest 44 are vertical.
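Under the usual pinhole-camera assumption, the central horizon point is the vanishing point of vertical lines, obtained by projecting the gravity direction through the camera intrinsics; each segment can then be tested against it. This sketch is one interpretation of the step described above, with hypothetical names and values:

```python
import numpy as np

def vertical_vanishing_point(K, gravity):
    """Project the gravity direction through intrinsics K to the image point
    where all vertical 3D lines converge (the 'central horizon point')."""
    v = K @ gravity           # homogeneous image coordinates
    return v[:2] / v[2]       # assumes the camera is tilted (v[2] != 0)

def is_vertical_segment(p, q, vp, tol_px=5.0):
    """True if the infinite line through segment pq passes within tol_px
    pixels of the vanishing point vp (5 px as in the text, preferably 2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    return abs(np.dot(vp - p, n)) <= tol_px
```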

As another example, the camera may include only a depth sensor such as a lidar sensor. In this case, the first step of the identification 20 of the lines of interest 44 is to capture 61 an image of the object. With a lidar sensor, it is possible to capture a depth map and/or a confidence map of the object.

To be more precise, the depth map is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its value, which is the average distance traveled by a light beam in order to be reflected by a specific area of the object. On the other hand, a confidence map is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its confidence value. It helps in estimating the signal quality. Indeed, depending on the nature of the object and the material it is made of, the light used to measure the object depth can be reflected differently and even be deviated. Thus, the confidence map measures the intensity of the signal until saturation of the sensor for each point of the object.

In a further step, the depth image is processed 62, using a line detection algorithm such as, for instance, the Hough transform algorithm, in order to detect the lines of interest 44 by finding large variations of depth from one pixel to another. Afterwards, the (x,y) coordinates of the lines of interest 44 are extracted.
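A minimal sketch of the depth-discontinuity detection that would precede the line fitting (e.g. by the Hough transform) might be as follows; the function name and jump threshold are hypothetical:

```python
import numpy as np

def depth_discontinuities(depth, jump=0.5):
    """Binary mask of pixels where depth jumps sharply to a neighbour,
    i.e. candidate points of depth edges before line fitting."""
    mask = np.zeros(depth.shape, bool)
    mask[:, :-1] |= np.abs(np.diff(depth, axis=1)) > jump   # horizontal jumps
    mask[:-1, :] |= np.abs(np.diff(depth, axis=0)) > jump   # vertical jumps
    return mask
```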

Following this step, the depth image is reprojected 63 in three dimensions using the information contained in the depth image. Moreover, the orientation device is used to determine how the camera is tilted with respect to the real world. This reprojection 63 is preferably carried out using the camera parameters in order to adapt the 3D representation to the camera features and deformations.
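The reprojection of a depth map to 3D camera space, assuming a standard pinhole intrinsic matrix K (an assumption not stated in the application, which only refers to the camera parameters), can be sketched as:

```python
import numpy as np

def backproject(depth, K):
    """Reproject a depth map to 3D camera-space points using intrinsics K."""
    fx, fy = K[0, 0], K[1, 1]   # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]   # principal point
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack([x, y, z])   # (h, w, 3) array of 3D points
```

A real implementation would also undo the lens distortion before back-projecting, as suggested by the reference to the camera features and deformations.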

The identified lines 44 are then filtered 64 in order to select only the lines 44 with the same orientation as the selected edge 43, knowing their orientation in the 3D space thanks to reprojection step 63.

The resulting identified lines 44 are displayed 20 on the captured image 50. In both cases, the user is then invited, by a message 45 appearing on the screen for instance, as shown in figure 7, to select 21 a line on the image 50 from reality. Otherwise, the edge can be selected by the user by moving the camera until the selected edge is aligned on the real edge. If several lines 44 are present in the selected area, the closest line 44 to the selected zone is selected. The selected line 48 may be highlighted compared to the other lines 44 in order for the user to check if the right line was selected. In order to ease the process of selection of a line from the real image 50, the 3D model 40 is preferably hidden during this step.

In the case of an image 50 captured by an intensity sensor, the process includes additional steps. Indeed, the selected line 48 is tracked 54 to compute its projection from the 2D image 50 space to the 3D model 40 space. In fact, several lines from the 3D real world may have the same projection onto a certain plane. In other words, with an image taken from a single point of view, it is impossible to determine the 3D orientation of this selected line 48. The projection can be computed either via automatic plane detection of the ground or ceiling, or via a manual operation by the user.

It is also possible to track the line 48 by asking the user to move away from his position 31 in order to triangulate the depth of the line 48. Indeed, taking several images of the same line 48 from different points of view will disambiguate the position of the line 48 in the 3D space. For instance, figure 10 shows an example of three images k-1, k, k+1 taken during the user's movement around the object. Each captured image k-1, k, k+1 shows the object from a different point of view. The first piece of information required is the position of the camera relative to the line 48 for every captured image k-1, k, k+1. For example, the camera position can be determined using the internal position tracker of the device.

Once this position is known, it is possible to triangulate 55 the position of the line 48 using the information within the different images taken from different camera positions, as described above. Indeed, as shown in figure 11, when two sensors O1, O2 observe an object point P, the centers of the sensors O1, O2 and the considered object point P define a triangle. Within this triangle, the distance between the sensors is the base b, and it is known thanks to the camera positions. By determining the angles γ between the rays from the centers of the sensors O1, O2 and the base b, it is possible to calculate, using triangular relations, the coordinates of the object point P, and thus the 3D coordinates of the object by calculating the position of multiple object points P.
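The triangulation described above can be illustrated with the classic midpoint method for two observation rays. This is a generic sketch, not the application's specific algorithm; the names are hypothetical:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Triangulate the 3D point closest to two observation rays.

    o1, o2 are the sensor centres; d1, d2 are unit ray directions toward
    the observed point P. Returns the midpoint of the shortest segment
    joining the two rays (assumes the rays are not parallel).
    """
    o1, d1 = np.asarray(o1, float), np.asarray(d1, float)
    o2, d2 = np.asarray(o2, float), np.asarray(d2, float)
    b = o2 - o1                     # baseline between the two sensors
    d = np.dot(d1, d2)
    denom = 1.0 - d * d
    t1 = (np.dot(b, d1) - np.dot(b, d2) * d) / denom
    t2 = (np.dot(b, d1) * d - np.dot(b, d2)) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```

With noise-free rays the two closest points coincide and the midpoint is the exact object point P.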

In a preferred embodiment, the camera may include two RGB sensors separated by a fixed, known distance d. It is therefore possible to use the two images captured by the two RGB sensors in order to triangulate the object position without requiring the user to move around. Similarly, the camera may include more than two RGB sensors and/or other types of sensors, such as hyperspectral sensors, multi-lens sensors or structured light sensors, in order to obtain a better triangulation of the object position.

As soon as the triangulation 55 is correctly carried out, the process goes on to the next step, which is the alignment 22 of the 3D model 40 with the reality 50.

In a preferred embodiment of the invention, the camera includes at least both an intensity sensor and a depth sensor. This embodiment may make it possible to compare the lines identified in both captured images and to confirm or deny the existence of an identified line 44.

Considering the current resolution of depth sensors, it is possible to combine the high resolution of the intensity sensor with the accurate depth measurement of the depth sensor. Indeed, as mentioned before, a depth sensor nowadays has a lower resolution than an intensity sensor. Typically, a depth sensor has a resolution that is a hundred times lower than that of an intensity sensor. However, in order to use the depth information from the depth sensor to determine the 3D projection of the image captured by the intensity sensor, the resolutions of the two sensors must be adapted to one another. The adaptation may take into account the parameters of both sensors and the distance between the two sensors.
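As a crude illustration of such resolution adaptation, a nearest-neighbour lookup of depth values for each intensity pixel might be sketched as follows. This sketch assumes the two sensors are already rectified onto the same optical axis; a real adaptation would also account for the baseline between the sensors and each sensor's parameters, as noted above:

```python
import numpy as np

def depth_for_intensity(depth, intensity_shape):
    """Look up, for every intensity pixel, the nearest depth-map pixel.

    `depth` is the low-resolution depth map; `intensity_shape` is the
    (height, width) of the high-resolution intensity image.
    """
    dh, dw = depth.shape
    ih, iw = intensity_shape
    rows = np.arange(ih) * dh // ih   # nearest depth row for each intensity row
    cols = np.arange(iw) * dw // iw   # nearest depth column for each intensity column
    return depth[np.ix_(rows, cols)]
```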

As a result, the pixels of the image captured by the intensity sensor include depth information coming from the depth sensor, and thus information on the orientation of the selected lines 44. This embodiment may therefore make it possible to omit the steps of tracking 54 and triangulating 55 the object on the image captured by the intensity sensor.

It is therefore possible to merge the results of both sensors in order to improve the recognition of the 3D position of each point of the object.

Finally, the alignment step 22 consists in aligning the selected edge 43 from the 3D model 40 with the selected line 48 from the captured image 50 via a rigid transformation, meaning that the line is only rotated and translated, not rescaled. The obtained transformation is also applied to the rest of the 3D model 40.
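In 2D, a rigid transformation (rotation plus translation, with no rescaling) mapping the selected edge onto the selected line can be sketched as follows; the function name and the convention of anchoring the edge start to the line start are hypothetical:

```python
import numpy as np

def rigid_transform_2d(edge, line):
    """Rotation R and translation t mapping segment `edge` onto segment `line`.

    Returns (R, t) such that R @ p + t moves a model point p so that the
    edge start lands on the line start and the edge direction matches the
    line direction. No scaling is applied, as in a rigid transformation.
    """
    (a, b), (c, d) = np.asarray(edge, float), np.asarray(line, float)
    u = (b - a) / np.linalg.norm(b - a)          # unit edge direction
    v = (d - c) / np.linalg.norm(d - c)          # unit line direction
    cos, sin = np.dot(u, v), u[0] * v[1] - u[1] * v[0]
    R = np.array([[cos, -sin], [sin, cos]])      # rotation taking u to v
    t = c - R @ a                                # translation anchoring a to c
    return R, t
```

The same (R, t) would then be applied to every point of the 3D model, as the text describes for the obtained transformation.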

Once the alignment process is complete, if the user moves around the object, the 3D model can stay aligned with the camera image feed using a detection of the movement of the object and/or a detection of the movement of the capture device.

However, during the movement of the user, there is still a risk of misalignment of the 3D model with the camera image feed. If this problem occurs, the alignment process must be restarted.

To conclude, the invention is an alignment method for matching a 3D model with reality that does not require many manipulations from the user and that is faster than existing methods. Thus, the invention improves the user experience during the first alignment step of an AR system, but also during the realignment step.