
Title:
SELECTIVE LOCALIZATION AND MAPPING OF A PARTIALLY KNOWN ENVIRONMENT
Document Type and Number:
WIPO Patent Application WO/2023/204740
Kind Code:
A1
Abstract:
There is provided mechanisms for localizing and mapping a partially known environment for a device. A method is performed by an image processing device. The method comprises obtaining sensor data, as sensed by the device, of the environment in which the device is located. The method comprises performing localization of the device as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device with respect to a previously computed partial map of the environment. The method comprises determining that the localization fails. The localization has failed due to a failure having a failure type. The failure has lasted during an elapsed failure time. The method comprises determining to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment.

Inventors:
MATEUS ANDRÉ (SE)
ARAÚJO JOSÉ (SE)
CUBERO PAULA (SE)
HERNANDEZ SILVA ALEJANDRA (SE)
GOMEZ BLAZQUEZ CLARA (SE)
Application Number:
PCT/SE2022/050381
Publication Date:
October 26, 2023
Filing Date:
April 19, 2022
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G06T7/73; G01C21/16; G05D1/02; G06T7/246; G06T17/05; G06V10/74; G06V20/10
Domestic Patent References:
WO2004059900A2, 2004-07-15
WO2021121564A1, 2021-06-24
Foreign References:
US10390003B1, 2019-08-20
US11069082B1, 2021-07-20
US11037320B1, 2021-06-15
US20200047340A1, 2020-02-13
US10395117B1, 2019-08-27
US20120106828A1, 2012-05-03
EP3112969A1, 2017-01-04
US20170278231A1, 2017-09-28
Attorney, Agent or Firm:
LUNDQVIST, Alida (SE)
Claims:
CLAIMS

1. A method for localizing and mapping a partially known environment (150) for a device (110), the method being performed by an image processing device (140, 200, 500, 600), the method comprising: obtaining (S102) sensor data, as sensed by the device (110), of the environment (150) in which the device (110) is located; performing (S104) localization of the device (110) as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device (110) with respect to a previously computed partial map of the environment (150); determining (S108) that the localization fails, wherein the localization has failed due to a failure having a failure type, and wherein the failure has lasted during an elapsed failure time; and determining (S110) to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment (150).

2. The method according to claim 1, wherein the method further comprises: determining (S112) that the localization is successful for at least the M latest obtained frames of sensor data, and when having done so: performing (S116) the localization by operating the odometry algorithm with a default set of odometry parameters.

3. The method according to claim 1 or 2, wherein performing localization of the device (110) involves matching the sensor data to points in the partial map, and wherein the method further comprises: obtaining (S106) an indication of number of failed matches and/or discontinuities of the incrementally estimated pose, and wherein that the localization fails is determined as a function of the number of failed matches and/or discontinuities of the incrementally estimated pose.

4. The method according to any preceding claim, wherein a first failure type is defined by the number of matches being below a threshold value for an elapsed failure time corresponding to M consecutive frames of sensor data, and wherein a second failure type is defined by a median number of matches for an elapsed failure time corresponding to N > M consecutive frames of sensor data being below the threshold value or the pose estimate being discontinuous, and wherein the failure type is either of the first failure type or of the second failure type.

5. The method according to any preceding claim, wherein the localization was last successful for the sensor data up to and including frame j, and wherein the discontinuities of the incrementally estimated pose are determined by rotation and translation errors between relative transformations of the incrementally estimated pose for the sensor data up to and including frame j and the incrementally estimated pose for the sensor data up to and including frame k > j being larger than rotation and translation error thresholds.

6. The method according to any preceding claim, wherein the frames of sensor data are represented by image frames of the environment (150), wherein the localization was last successful for the sensor data up to and including image frame j, and wherein the discontinuities of the incrementally estimated pose are determined by a reprojection error of a transform being performed from the localization for the sensor data of image frame k > j to points in image frame j being larger than an error threshold.

7. The method according to claim 4, wherein the odometry parameters of the odometry algorithm are adapted when the failure is of the first failure type.

8. The method according to claim 4, wherein the building of the new map is started when the failure is of the second failure type.

9. The method according to claim 4 or 8, wherein the new map is built at least based on the N > M consecutive frames of sensor data being below the threshold value when the median number of matches for the elapsed failure time corresponding to N > M consecutive frames of sensor data is below the threshold value.

10. The method according to any preceding claim, wherein the new map is built by performing simultaneous localization and mapping, SLAM, on the frames of sensor data.

11. The method according to any preceding claim, wherein the method further comprises: fusing (S114) the new map with the previously computed partial map.

12. An image processing device (140, 200, 500) for localizing and mapping a partially known environment (150) for a device (110), the image processing device (140, 200, 500, 600) comprising processing circuitry (510), the processing circuitry being configured to cause the image processing device (140, 200, 500, 600) to: obtain sensor data, as sensed by the device (110), of the environment (150) in which the device (110) is located; perform localization of the device (110) as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device (110) with respect to a previously computed partial map of the environment (150); determine that the localization fails, wherein the localization has failed due to a failure having a failure type, and wherein the failure has lasted during an elapsed failure time; and determine to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment (150).

13. An image processing device (140, 200, 600) for localizing and mapping a partially known environment (150) for a device (110), the image processing device (140, 200, 500, 600) comprising: an obtain module (610) configured to obtain sensor data, as sensed by the device (110), of the environment (150) in which the device (110) is located; a localization module (620) configured to perform localization of the device (110) as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device (110) with respect to a previously computed partial map of the environment (150); a determine module (640) configured to determine that the localization fails, wherein the localization has failed due to a failure having a failure type, and wherein the failure has lasted during an elapsed failure time; and a determine module (650) configured to determine to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment (150).

14. The image processing device (140, 200, 500, 600) according to claim 12 or 13, further being configured to perform the method according to any of claims 2 to 11.

15. A computer program (720) for localizing and mapping a partially known environment (150) for a device (110), the computer program comprising computer code which, when run on processing circuitry (510) of an image processing device (140, 200, 500), causes the image processing device (140, 200, 500) to: obtain (S102) sensor data, as sensed by the device (110), of the environment (150) in which the device (110) is located; perform (S104) localization of the device (110) as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device (110) with respect to a previously computed partial map of the environment (150); determine (S108) that the localization fails, wherein the localization has failed due to a failure having a failure type, and wherein the failure has lasted during an elapsed failure time; and determine (S110) to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment (150).

16. A computer program product (710) comprising a computer program (720) according to claim 15, and a computer readable storage medium (730) on which the computer program is stored.

Description:
SELECTIVE LOCALIZATION AND MAPPING OF A PARTIALLY KNOWN ENVIRONMENT

TECHNICAL FIELD

Embodiments presented herein relate to a method, an image processing device, a computer program, and a computer program product for localizing and mapping a partially known environment for a device.

BACKGROUND

In general terms, simultaneous localization and mapping (SLAM) refers to the computational problem of constructing, or updating, a map of an unknown environment whilst simultaneously keeping track of a device’s location within it.

Implementation of SLAM algorithms commonly comprises a front-end and a back- end. The front-end is responsible for odometry estimation, i.e., the displacement between consecutive sensor measurements (e.g., images, point-clouds, etc.). The current pose estimate is then given from integration of the computed displacements. Since odometry is prone to drift, the back-end is responsible for not only building the map from the poses estimates, but also for reducing drift by finding loop closures. The front-end is usually run at the sensor frame rate (i.e., the rate at which sensor data of the environment is acquired by a sensor of the device), whilst the back-end is more time and power consuming, therefore operating only on a smaller set of sensor data, referred to as keyframes. In general terms, SLAM algorithms are considered computationally heavy and energy consuming, especially the back-end, which is responsible for the map building.

Even though alternatives have been presented to reduce the growth rate of the map, and to relax the pose graph (e.g., by performing sparsification and pruning), the power consumption is still high. This is an issue for power-constrained devices, such as battery-operated devices like handheld communication devices. Some algorithms provide the ability to run re-localization to handle tracking/odometry failures (e.g., ORB-SLAM). However, such re-localization algorithms are reliant on the ability to match sensor data to map data with appearance-based methods. For example, in ORB-SLAM this is achieved by querying an image database to find similar images. Nevertheless, such re-localization algorithms are sensitive to symmetry in the environment. As an illustrative example, consider two hallways with the same wallpaper and objects along their sides. If a map is built based on sensor data acquired by a first device in one of the hallways and a second device then tries to perform localization in the same map, but entering the other hallway, there exist correspondences between the map of the previously visited hallway and the one being visited now. This can cause the pose estimate of the localization mission to jump from one hallway to the other, creating a geometric inconsistency.

Localization algorithms assume that the environment is fully known (i.e., that a map of the entire environment is already available), and thus that the device can be localized with respect to the map. This localization consists of finding correspondences between the map and the sensor data. For example, localization might involve matching image points to three-dimensional (3D) map points in a visual localization, querying an image database for similar previously seen images, or matching 3D points to 3D points if a depth sensor is available. Afterwards, a localization algorithm is run to obtain the pose between the sensor and the map. Since the correspondences are prone to outliers, i.e., erroneous matches, robust cost functions can be used in the optimization, or random sample consensus (RANSAC) based approaches can be applied. Further in this respect, if the localization fails, by either not being coherent with the sensor data and/or by the device being misplaced, one remedy is to equip the SLAM algorithm with the ability to handle the so-called kidnapped robot problem. One approach, if the full environment has been mapped, is to perform re-localization, i.e., to localize with respect to the map and reset the odometry estimate.

More sophisticated re-localization approaches are based on Deep Learning, as shown by H. Jo and E. Kim, “New Monte Carlo Localization Using Deep Initialization: A Three-Dimensional LiDAR and a Camera Fusion Approach,” in IEEE Access, vol. 8, pp. 74485-74496, 2020, doi: 10.1109/ACCESS.2020.2988464. This paper shows the ability to initialize an estimate for the pose using a Deep Neural Network (DNN), which is inputted into a Particle Filter based localization algorithm. The re-initialization is triggered by a localization failure module, which compares the difference between the poses estimated by the network and the ones estimated by the filter. Some drawbacks with this approach are: 1) the re-initialization does not use the 3D map of the environment required by the localization filter, 2) the approach is based on using the DNN for pose regression, which is not guaranteed to generalize from the training data and is more related to image retrieval than to pose estimation methods, and 3) the DNN is trained by running a SLAM algorithm on the same scene, which assumes that the map used for localization represents the entire scene.

Another localization technique that handles differences between the pose-graph of the previous mapping session and the current pure localization session is presented by S. Macenski and I. Jambrecic, “SLAM Toolbox: SLAM for the dynamic world,” in Journal of Open Source Software, 6(61), 2783, https://doi.org/10.21105/joss.02783. Localization is performed by keeping a memory of past measurements which are added temporally to the original pose-graph, after matching to the map. This makes it possible to anchor the current mission to the previous one, without changing the map, since the new factors added to it are removed after a specified elapsed time. This approach is also reliant on a full map of the environment being available in order to anchor most of the new measurements. Furthermore, the map must be rich enough to obtain matches throughout the session.

SUMMARY

An object of embodiments herein is to address the above issues with known techniques for localization and for simultaneous localization and mapping.

A particular object is to provide techniques for accurate and computationally efficient localization and mapping of partially known environments.

According to a first aspect there is presented a method for localizing and mapping a partially known environment for a device. The method is performed by an image processing device. The method comprises obtaining sensor data, as sensed by the device, of the environment in which the device is located. The method comprises performing localization of the device as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device with respect to a previously computed partial map of the environment. The method comprises determining that the localization fails. The localization has failed due to a failure having a failure type. The failure has lasted during an elapsed failure time. The method comprises determining to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment.

According to a second aspect there is presented an image processing device for localizing and mapping a partially known environment for a device. The image processing device comprises processing circuitry. The processing circuitry is configured to cause the image processing device to obtain sensor data, as sensed by the device, of the environment in which the device is located. The processing circuitry is configured to cause the image processing device to perform localization of the device as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device with respect to a previously computed partial map of the environment. The processing circuitry is configured to cause the image processing device to determine that the localization fails. The localization has failed due to a failure having a failure type. The failure has lasted during an elapsed failure time. The processing circuitry is configured to cause the image processing device to determine to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment.

According to a third aspect there is presented an image processing device for localizing and mapping a partially known environment for a device. The image processing device comprises an obtain module configured to obtain sensor data, as sensed by the device, of the environment in which the device is located. The image processing device comprises a localization module configured to perform localization of the device as a function of frames of the sensor data by operating an odometry algorithm with a set of odometry parameters to, from the sensor data per each frame of sensor data, incrementally estimate pose of the device with respect to a previously computed partial map of the environment. The image processing device comprises a determine module configured to determine that the localization fails. The localization has failed due to a failure having a failure type. The failure has lasted during an elapsed failure time. The image processing device comprises a determine module configured to determine to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment.

According to a fourth aspect there is presented a computer program for localizing and mapping a partially known environment for a device, the computer program comprising computer program code which, when run on an image processing device, causes the image processing device to perform a method according to the first aspect.

According to a fifth aspect there is presented a computer program product comprising a computer program according to the fourth aspect and a computer readable storage medium on which the computer program is stored. The computer readable storage medium could be a non-transitory computer readable storage medium.

Advantageously, these aspects provide efficient selection between either performing localization only or performing SLAM, depending on the cause of the localization failure.

Advantageously, these aspects avoid SLAM being performed when unnecessary, thereby saving computational resources.

Advantageously, these aspects do not need access to a complete map of the environment, but only trigger SLAM to be performed when necessary.

That is, by performing localization only in previously mapped regions and starting mapping only during long periods of localization failure, the herein disclosed aspects allow the power consumption of the device to be reduced compared to traditional SLAM algorithms, which run continuously.

Advantageously, and in contrast to the prior art, these aspects enable the image processing device to handle situations with a partial map of the environment by either ramping up the performance of the odometry algorithm (by adapting the odometry parameters) or by starting to build a new map of the environment that can later be fused with the existing partial map.

Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.

Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, module, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, module, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

BRIEF DESCRIPTION OF THE DRAWINGS

The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram illustrating a system according to embodiments;

Fig. 2 schematically illustrates an image processing device according to an embodiment;

Figs. 3 and 4 are flowcharts of methods according to embodiments;

Fig. 5 is a schematic diagram showing functional units of an image processing device according to an embodiment;

Fig. 6 is a schematic diagram showing functional modules of an image processing device according to an embodiment; and

Fig. 7 shows one example of a computer program product comprising computer readable storage medium according to an embodiment.

DETAILED DESCRIPTION

The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.

Fig. 1 is a schematic diagram illustrating a system 100 according to embodiments. Fig. 1(a) illustrates a system 100a where localization and mapping of a partially known environment 150 is to be performed for a device 110. The device 110 is configured to implement some application that requires an accurate pose estimate. The device 110 comprises a sensor 120 for acquiring sensor data of the partially known environment 150. The sensor might be a 2D camera (such as a monocular camera) or a 3D camera (such as a stereo camera), a RADAR module, a LIDAR module, etc., optionally in combination with one or more complementary sensors, such as an Inertial Measurement Unit (IMU), etc. The device 110 comprises a communications interface 130 for communicating the sensor data over a communications link 160 to an image processing device 140. The image processing device 140 is configured to perform the localization and mapping for the device 110. Fig. 1(b) illustrates a system 100b similar to the system 100a but where the image processing device 140 is provided in the device 110. The image processing device 140 is thus part of the device 110. The device 110 could be of different types, such as portable wireless devices, mobile stations, mobile phones, handsets, wireless local loop phones, user equipment (UE), smartphones, laptop computers, tablet computers, network-connectable sensor devices, network-connectable vehicles, gaming equipment, and Internet-of-things (IoT) devices.

Issues with known techniques for localization and mapping have been disclosed above. The embodiments disclosed herein therefore relate to mechanisms for localizing and mapping a partially known environment 150 for a device 110. In order to obtain such mechanisms there is provided an image processing device 140, 200, 500, 600, a method performed by the image processing device 140, 200, 500, 600, and a computer program product comprising code, for example in the form of a computer program, that, when run on an image processing device 140, 200, 500, 600, causes the image processing device 140, 200, 500, 600 to perform the method. At least some of the herein disclosed embodiments are directed to localization and mapping of partially known environments 150. In this respect, when performing several mapping missions, or sessions, in the same environment 150, and there is a high overlap with previously mapped regions, a low-power device 110 should refrain from remapping those areas and instead only perform localization up to the point where there is no longer an overlap. Furthermore, if the device 110 is located in a region without any overlap for only a short period of time, it might not be advantageous to start the map building, but instead to focus on adapting, or even optimizing, the odometry parameters.

Fig. 2 is a block diagram of an image processing device 200 according to embodiments. A frame localization block 210 implements localization of the device 110, finding the pose of the current sensor data with respect to a previously computed map 170 of the environment 150. An odometry block 220 implements an odometry algorithm which takes sensor data, as provided by the sensor, and, based on odometry parameters, incrementally estimates the pose of the device 110. A localization failure detection block 230 implements logic that identifies if the result from the frame localization block 210 is reliable. If the result is identified as not reliable, the localization failure detection block 230 triggers the decision to either adapt the odometry parameters of the odometry block 220 or to start map building (as implemented by a map builder block 240). The map builder block 240 is configured to build a map of the environment 150 given the odometry estimated poses and the features of the frame localization module, to find loop closures, and to optimize the pose-graph and reconstructed features. A map fusion block 250 is implemented to combine the new map built by the map builder block 240 with the previously computed map.

In general terms, the determination of switching between localization only and mapping (either mapping only or mapping combined with localization) is based on detecting localization failure, both by checking appearance-based consistency (i.e., finding matches between sensor data and the map) and geometric consistency (e.g., checking for discontinuities in the pose estimate).

If the localization is failing due to lack of matches (e.g., the number of matches is smaller than some threshold a in a time window of size M), then an odometry parameter adaptation that improves pose accuracy even at the cost of more power is performed. Hence, in some embodiments, a first failure type is defined by the number of matches being below a threshold value (represented by the threshold a) for an elapsed failure time corresponding to M consecutive frames of sensor data. Further, if the device 110 again can perform successful localization without the threshold a being violated, then a switch is made back to performing localization only with default odometry parameters. On the other hand, if the localization fails due to geometric inconsistency (such as a wrong pose estimate), or if the threshold a keeps being violated in a time window of size N (where N > M), this is an indication that the map built thus far does not contain sufficient information, and thus map building is started. Hence, in some embodiments, a second failure type is defined by a median number of matches for an elapsed failure time corresponding to N > M consecutive frames of sensor data being below the threshold value, or by the pose estimate being discontinuous. The failure type is either of the first failure type or of the second failure type.
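By way of a non-limiting illustration (not taken from the application itself), the failure-type decision described above could be sketched in Python as follows; the window sizes M and N, the threshold value, and all function and variable names are illustrative assumptions, and the geometric consistency result is assumed to be computed separately and passed in as a flag:

```python
from collections import deque
from statistics import median

# Illustrative window sizes and match threshold; the application leaves
# the concrete values to the implementation.
M = 5          # short window for the first failure type
N = 15         # long window (N > M) for the second failure type
THRESHOLD_A = 10  # minimum number of matches/inliers ("threshold a")


def classify_failure(match_history, pose_discontinuous):
    """Return 'first', 'second', or None given the recent match counts.

    match_history: sequence of the number of matches/inliers per frame,
                   newest last.
    pose_discontinuous: result of the geometric consistency check.
    """
    recent = list(match_history)
    # Second failure type: geometric inconsistency, or the median number of
    # matches over the last N frames stays below the threshold.
    if pose_discontinuous:
        return "second"
    if len(recent) >= N and median(recent[-N:]) < THRESHOLD_A:
        return "second"
    # First failure type: matches below the threshold for M consecutive frames.
    if len(recent) >= M and all(m < THRESHOLD_A for m in recent[-M:]):
        return "first"
    return None


# Example usage: ten frames with too few matches, geometrically consistent.
history = deque([3, 4, 2, 5, 1, 3, 2, 4, 3, 2], maxlen=N)
print(classify_failure(history, pose_discontinuous=False))  # -> 'first'
```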

Optionally, if combined localization and map building is being performed, the frequency at which the mapping tries to find correspondences in the thus far built map can be reduced. Mapping is only performed until the localization is successful or the mission, or session, reaches its end. If mapping has been performed, map fusion can be performed. If the mission, or session, is not over, a switch is made back to performing localization only with default odometry parameters.

Further details of localizing and mapping a partially known environment 150 for the device 110 will be disclosed next with reference to the flowchart of Fig. 3. The methods are performed by the image processing device 140, 200, 500, 600. The methods are advantageously provided as computer programs 720.

S102: The image processing device 140, 200 obtains sensor data of the environment 150 in which the device 110 is located. The sensor data has been sensed by the device 110.

S104: The image processing device 140, 200 performs localization of the device 110.

The localization is performed as a function of frames of the sensor data and by operating an odometry algorithm with a set of odometry parameters. The odometry algorithm is operated to, from the sensor data per each frame of sensor data, incrementally estimate the pose of the device 110 with respect to a previously computed partial map of the environment 150.

S108: The image processing device 140, 200 determines that the localization fails. The localization has failed due to a failure having a failure type. The failure has lasted during an elapsed failure time.

S110: The image processing device 140, 200 determines to, based on the failure type and/or the elapsed failure time, either adapt the set of odometry parameters or start building a new map of the environment 150.

Embodiments relating to further details of localizing and mapping a partially known environment 150 for a device 110 as performed by the image processing device 140, 200, 500, 600 will now be disclosed.

Aspects of how the image processing device 140, 200 might determine that the localization fails will now be disclosed.

In some aspects, that the localization fails is determined by checking appearance-based consistency (i.e., finding matches between sensor data and the existing map) and geometric consistency (e.g., checking for discontinuities in the pose estimate). Hence, in some embodiments, performing localization of the device 110 involves matching the sensor data to points in the partial map, and the image processing device 140, 200 is configured to perform (optional) step S106.

S106: The image processing device 140, 200 obtains an indication of number of failed matches and/or discontinuities of the incrementally estimated pose.

That the localization fails is then determined as a function of the number of failed matches and/or discontinuities of the incrementally estimated pose.

When relying on a partial map of the environment, the frame localization is susceptible to failure. The localization might fail due to different types of failure (above referred to as failure type). For example, the localization might fail due to failing to find enough matches and/or inliers (i.e., matches that support the estimate of the localization), failing to compute a pose, or failing to correctly estimate the pose due to incorrectly found matches. Furthermore, if there is a low number of matches and/or inliers, the reliability of the estimate becomes low as well.

Furthermore, it might be impossible for the image processing device 140, 200 to estimate the pose. For example, in the 2D-3D case, at least three matches are required for a pose to be estimated. Nevertheless, for configurations where there can be more than one solution of the pose, the minimum number of matches might be set to four. With so few matches, the reliability of the localization might be compromised. This is one reason for comparing the number of matches to the above-disclosed threshold a. Hence, the number of matches and/or inliers must be larger than the threshold a for the estimate of the pose to be considered valid. In this respect, the value of the threshold a might be dependent on the actual localization algorithm used, and the data sources. For example, if the localization involves finding correspondences between image points (which are in 2D) and map points (which are in 3D), the localization algorithm used can be Perspective-n-Point, for which the threshold a could be set to the order of 10. If the threshold a is violated, the frame localization block 210 does not send an estimate of the pose to the odometry block 220 but still sends the number of matches/inliers to the localization failure detection block 230.
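For the 2D-3D case mentioned above, a minimal sketch using OpenCV's Perspective-n-Point solver could look as follows; the function name, the inlier threshold value, and the calling convention are assumptions for illustration and do not reflect the actual implementation of the frame localization block 210:

```python
import numpy as np
import cv2

THRESHOLD_A = 10  # minimum number of inliers ("threshold a"), illustrative value


def localize_frame(map_points_3d, image_points_2d, camera_matrix):
    """Estimate the camera pose from 2D-3D matches and report reliability.

    Returns (pose_valid, rvec, tvec, n_inliers). The pose is only forwarded to
    the odometry block when pose_valid is True; the inlier count is always
    reported to the failure detection logic.
    """
    if len(image_points_2d) < 4:
        # Too few matches to even attempt a pose (cf. the 2D-3D minimum above).
        return False, None, None, 0

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d.astype(np.float64),
        image_points_2d.astype(np.float64),
        camera_matrix,
        distCoeffs=None,
    )
    n_inliers = 0 if inliers is None else len(inliers)
    pose_valid = bool(ok) and n_inliers >= THRESHOLD_A
    return pose_valid, rvec, tvec, n_inliers
```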

The localization failure detection block 230 thereby keeps track of the number of matches and inliers from the frame localization block 210. To prevent reactive triggering, i.e., always triggering either adaptation of the odometry parameters or the start of building a new map every time a frame of sensor data results in fewer than a matches, the localization failure detection block 230 keeps track of matches/inliers in a window of size M. This results in decisions being made over a wider time window.

Furthermore, the median number of matches/inliers might be taken as the decision variable. By using the median value instead of the mean value, singular failures can be discarded. For example, if a comparatively high level of blur is experienced in one of the M frames, the number of matches in that one frame is low, but the number of matches in the remaining frames is high. In this case, if the mean value is used, the low number of matches in that single frame might lead to reactive triggering, even though localization failed for only one single frame. Thus, a failure by lack of matches might only be identified if the median number of matches across the latest M frames is below the threshold a. In some examples, if the odometry block 220 is using sensor data in terms of visual information (such as images) and employs a semi-direct method, frames of sensor data that are likely to provide a low number of features might be identified. A semi-direct visual odometry method can provide the frame localization block 210 with metrics on the average and median gradient intensities and/or the distribution of high-intensity gradient areas per image. If an image presents a low gradient intensity (e.g., due to blur, observing a flat colored wall, etc.), the frame localization block 210 can skip that frame. Then, if there are M skipped frames, an appearance-based localization failure occurs. Given the low gradient intensity, the frame localization block 210 can abstain from detecting features in the image, and thus save power and computation time. If the odometry block 220 is using semi-direct methods, the frame localization block 210 only needs to extract sparse features for localization.
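A minimal sketch of the gradient-intensity pre-check mentioned for semi-direct methods is given below; the gradient threshold and the function name are illustrative assumptions:

```python
import numpy as np
import cv2

GRADIENT_THRESHOLD = 20.0  # illustrative; tuned per sensor and scene


def frame_has_enough_texture(gray_image):
    """Return False for frames whose median gradient magnitude is low
    (e.g., blurred frames or a flat colored wall), so that feature
    extraction and frame localization can be skipped for them."""
    gx = cv2.Sobel(gray_image, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_image, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)
    return float(np.median(magnitude)) >= GRADIENT_THRESHOLD
```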

Another failure type is erroneous matches that consequently yield wrong pose estimates. This occurs if the great majority of the matches is incorrect, and thus it is possible to find an inlier set that supports an incorrect pose estimate. Incorrect matches might arise due to lack of information provided by the feature descriptor used by the frame localization block 210 and/or structured environments 150 that tend to be symmetric. To identify such a situation, geometric consistency of consecutive frame localizations can be checked.

In some embodiments, the localization was last successful for the sensor data up to and including frame j, and the discontinuities of the incrementally estimated pose are determined by rotation and translation errors (below denoted ε_t and ε_r) between relative transformations of the incrementally estimated pose for the sensor data up to and including frame j and the incrementally estimated pose for the sensor data up to and including frame k > j being larger than rotation and translation error thresholds. In further detail, let T_k^L be the current pose (i.e., rotation and translation represented by a 4 x 4 matrix in the special Euclidean group, or Lie group, SE(3), consisting of a rotation matrix and a translation vector) from the frame localization block 210 at instant k, and let T_j^L be the most recent previous successful frame localization (i.e., one that passed the appearance-based and geometric consistency checks). Furthermore, let T_k^O and T_j^O be the odometry pose estimates at instants k and j, respectively. The odometry is estimated at a higher rate (such as at the same rate as new sensor data is acquired) whilst the localization is run at a lower rate. In this case instant k refers to the current time instant where the latest localization was computed, and instant j refers to the time instant where the previous successful localization occurred. Furthermore, an odometry estimate is available for each sensor measurement. The relative transformations, T_Δ^rel, in both frames are given as:

T_Δ^rel = (T_k^Δ)^(-1) T_j^Δ, where Δ ∈ {O, L}.

The geometric consistency is verified by checking the rotation and translation errors between the relative transformations in both frames. The translation error, denoted ε_t, is defined as the norm of the difference between both translation components. That is:

ε_t = || t(T_O^rel) − t(T_L^rel) ||,

where t(·) denotes the translation component of a transformation. The rotation error, denoted ε_r, is defined as:

ε_r = acos( (Trace(R(T_O^rel)^T R(T_L^rel)) − 1) / 2 ),

where R(·) denotes the rotation component of a transformation.

If both errors ε_t and ε_r are above respective thresholds β_1 and β_2, then the localization is inconsistent and thus incorrect.
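A minimal NumPy sketch of this consistency test is given below, assuming 4 x 4 homogeneous transformation matrices and purely illustrative values for the thresholds β_1 and β_2:

```python
import numpy as np

BETA_1 = 0.10   # translation error threshold [m], illustrative
BETA_2 = 0.05   # rotation error threshold [rad], illustrative


def relative_transform(T_k, T_j):
    """T_rel = (T_k)^-1 * T_j for 4x4 homogeneous transforms."""
    return np.linalg.inv(T_k) @ T_j


def geometric_inconsistency(T_k_loc, T_j_loc, T_k_odo, T_j_odo):
    """Return True when the localization at instant k disagrees with the
    odometry, i.e., both the translation and rotation errors between the
    two relative transforms exceed their thresholds."""
    rel_loc = relative_transform(T_k_loc, T_j_loc)
    rel_odo = relative_transform(T_k_odo, T_j_odo)

    # Translation error: norm of the difference of the translation parts.
    eps_t = np.linalg.norm(rel_loc[:3, 3] - rel_odo[:3, 3])

    # Rotation error: angle of the relative rotation between both estimates.
    R_err = rel_loc[:3, :3].T @ rel_odo[:3, :3]
    cos_angle = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    eps_r = np.arccos(cos_angle)

    return eps_t > BETA_1 and eps_r > BETA_2
```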

The geometric consistency check is tied to the assumption that the localization at instant j is valid. The issue with this assumption arises when determining the first successful localization. For this determination, when solely relying on the appearance-based check, a high number of matches/inliers (some factor multiplied by the threshold a) might be required.

If the localization failure detection block 230 comprises, or implements the functionality of, a drift module, the drift module can be exploited to tune the thresholds β_1 and β_2. In this respect and in general terms, the higher the time displacement between odometry estimates, the higher the drift. By exploiting a drift module, this can be accounted for by increasing or decreasing the thresholds β_1 and β_2, preventing misclassification of geometric inconsistency. The thresholds are increased or decreased according to the expected drift for the time difference between instants k and j, where instant k refers to a successful frame localization (i.e., one that passed the appearance-based check); the time between the instants can thus vary.

In some examples, if the sensor is a camera, the frame localization block 210 might provide the matches found, and not only the number of matches found, and keep track of features across multiple frames. The geometric consistency can then be computed by taking the tracked 2D-3D matched points, applying the relative transformation from instant k to instant j to the 3D points, and checking the reprojection error of those points to the image at instant j. Thus, in some embodiments, the frames of sensor data are represented by image frames of the environment 150, the localization was last successful for the sensor data up to and including image frame j, and the discontinuities of the incrementally estimated pose are determined by a reprojection error of a transform being performed from the localization for the sensor data of image frame k > j to points in image frame j being larger than an error threshold. This geometric consistency check exploits the feature points directly, which can help identify whether the localization failure is related to the matched points or to a failure in the pose estimation algorithm. If it is the latter, a pose may be recomputed with the lowest reprojection error points. If the median reprojection error is above a threshold γ, then the localization is incorrect. If incorrect matches occur, the reprojection error will be high. Similar to the thresholds β_1 and β_2, the threshold γ can also be adapted by exploiting an available odometry drift module. A similar check can be performed for the first appearance-valid localization. Reprojection errors can be computed in a window of size Q for each frame acquired since the attempted localized frame. This requires the rate of the feature tracking to be increased.

In some examples the matches are computed in a single instance, i.e., there is no tracking of 2D points. In this scenario, a localization failure can be identified if at least 6 points fall outside the image at instant j, after reprojection. As before, this check can be applied in a window of size Q after obtaining the first appearance-valid localization to ensure its geometric validity. Another possibility is to consider any combination of two or more metrics, for example in a weighted sum. This might improve the robustness of the failure detection. The computation of the reprojection errors uses the relative transformation computed to assess the pose error in the first test. The additional computation has linear complexity in the number of matches, which is not significant with respect to the overall process (odometry and frame localization).
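A minimal sketch of the reprojection-based check is given below; the threshold value for γ, the function name, and the assumption that the pose implied by the localization at instant k has already been expressed in the frame of instant j are all illustrative:

```python
import numpy as np
import cv2

GAMMA = 4.0  # median reprojection error threshold [pixels], illustrative


def reprojection_failure(points_3d, points_2d_j, rvec_j, tvec_j, camera_matrix):
    """Project the tracked 3D points into image frame j with the pose
    implied by the localization at frame k (here assumed to already be
    expressed in frame j) and flag a failure when the median reprojection
    error exceeds GAMMA."""
    projected, _ = cv2.projectPoints(
        points_3d.astype(np.float64), rvec_j, tvec_j, camera_matrix, None)
    projected = projected.reshape(-1, 2)
    errors = np.linalg.norm(projected - points_2d_j, axis=1)
    return float(np.median(errors)) > GAMMA
```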

When the pose T_k^L is incorrect, the frame localization block 210 needs to signal to the odometry block 220 to discard that estimate. This is in order to prevent sudden changes in the pose estimates, since such sudden changes can lead to issues in applications using those estimates.

Aspects of how the image processing device 140, 200 might adapt the odometry parameters will now be disclosed.

In some embodiments, the odometry parameters of the odometry algorithm are adapted when the failure is of the first failure type. In further detail, if the localization failure is due to a low number of matches over a window of size M, the odometry parameters are adapted before starting to build a new map. In this way, the map builder block 240 is prevented from being started prematurely (which would require additional power from the device). This prevention is beneficial if, for example, the device is only visiting an unknown (i.e., unmapped) region during a short period of time.

To adapt, or even optimize, the odometry parameters, predefined configurations of default odometry parameters can be exploited. For example, if three modes for the odometry parameters are provided, such as: (1) low power and accuracy, (2) medium power and accuracy, and (3) high power and accuracy, then the default odometry parameters can be defined by mode (1). A switch can then be made from mode (1) to either mode (2) or mode (3), depending on the available load on the device and the duration during which the localization has failed.
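A minimal sketch of such a mode-based adaptation is given below; the three parameter sets, the specific parameters (feature count, pyramid levels, iterations), and the load-based selection rule are illustrative assumptions rather than values from the application:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdometryParameters:
    max_features: int      # number of features tracked per frame
    pyramid_levels: int    # image pyramid depth
    max_iterations: int    # optimizer iterations per frame


# Predefined configurations: (1) low, (2) medium, (3) high power/accuracy.
MODES = {
    1: OdometryParameters(max_features=100, pyramid_levels=2, max_iterations=5),
    2: OdometryParameters(max_features=300, pyramid_levels=3, max_iterations=10),
    3: OdometryParameters(max_features=800, pyramid_levels=4, max_iterations=20),
}
DEFAULT_MODE = 1


def select_mode(localization_failed, available_load_ratio):
    """Stay in the default low-power mode while localization succeeds;
    on failure, ramp up as far as the current device load permits."""
    if not localization_failed:
        return DEFAULT_MODE
    return 3 if available_load_ratio > 0.5 else 2
```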

An alternative to this is to actually modify the odometry parameters to allow for a better tradeoff between power consumption and odometry accuracy. For example, design space exploration (DSE) can be performed a priori using available tools to find Pareto front configurations, which can then replace the default odometry parameters. Another alternative is to exploit the data acquired by DSE experiments to train a regressor that, given the configuration and sensor data, can predict the power consumption and odometry estimation error. This regressor allows for fast inference, and thus can be used for online optimization of the odometry parameters.

Aspects of how the image processing device 140, 200 might build a new map of the environment 150 will now be disclosed.

In some embodiments, building of the new map is started when the failure is of the second failure type. That is, in some aspects, when the number of matches/inliers continues to be below the threshold a for more than N frames (where N > M) and/or the geometric consistency test fails, the localization failure detection block 230 triggers the map builder block 240 to start mapping the environment. In this respect, the geometric consistency failure might be regarded as a catastrophic failure, and thus in this case the map builder block 240 can be started directly without first adapting the odometry parameters and trying to find a valid localization.

In some embodiments, the new map is built by performing SLAM on the frames of sensor data. In general terms, the building of the new map is related to the sensor setup, particularly to the availability of depth data. Two different scenarios are therefore envisioned.

In a first scenario, depth data is available. In this scenario the building of the new map involves registering consecutive point clouds using the poses estimated by the odometry block 220. Those poses are inserted into a pose graph. The bottlenecks of this approach are the size of the map (which will grow fast) and the bundle adjustment when a loop closure is found. An option to reduce map size is to select a subset of the most representative point clouds (i.e., keyframes) to build the new map. This will also reduce the number of nodes in the pose graph, which consequently speeds up the bundle adjustment.
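One common way to realize such keyframe selection is sketched below, under the assumption that keyframes are chosen by pose displacement since the last keyframe; the thresholds and function name are illustrative assumptions:

```python
import numpy as np

MIN_TRANSLATION = 0.25  # [m], illustrative
MIN_ROTATION = 0.15     # [rad], illustrative


def is_new_keyframe(T_last_keyframe, T_current):
    """Promote the current frame to a keyframe only when the device has
    moved or rotated enough since the last keyframe, so that the pose
    graph and the stored point clouds grow slowly."""
    T_rel = np.linalg.inv(T_last_keyframe) @ T_current
    translation = np.linalg.norm(T_rel[:3, 3])
    cos_angle = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    rotation = np.arccos(cos_angle)
    return translation > MIN_TRANSLATION or rotation > MIN_ROTATION
```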

In a second scenario, depth data is not available. The 3D points then need to be triangulated from two or more views. In this scenario, three possible implementations are envisioned. In a first implementation both the odometry block 220 and the frame localization block 210 are exploiting the same features. The odometry block 220 keeps track of the features, and those features are then used by the frame localization block 210 and the map builder block 240. This implementation can be based on ORB-SLAM. In a second implementation the odometry block 220 is using specialized features for fast estimation. Those features are usually in an insufficient number to create a representative map. Thus, the frame localization block 210 uses more informative features, which are then used by the map builder block 240. This implementation can be based on maplab. In a third implementation the odometry block 220 uses a direct/semi-direct method, i.e., uses all or most of the image to estimate the pose displacement. Since it is computationally expensive to keep track of such a high number of points, the frame localization block 210 might exploit descriptor-based features. In this case, the map builder block 240 uses the features/points from both the odometry block 220 and the frame localization block 210 to build the new map. This implementation can be based on LSD-SLAM.

In other examples, if the device has the storage capacity, the device can store the sensor data of the N frames where the localization failure is appearance-based and use that data to start the map building process. Therefore, in some embodiments, the new map is built at least based on the N > M consecutive frames of sensor data being below the threshold value when the median number of matches for the elapsed failure time corresponding to N > M consecutive frames of sensor data is below the threshold value. The new map will thereby cover a wider unmapped region, and can improve robustness of the fused map by finding a more representative set of keyframes. This does not apply to the geometric inconsistency failure. This approach allows for a wider coverage of the localization failure area. If the map builder block 240 is not triggered (i.e., a successful localization is found prior to N frames), the already stored data can be discarded.

When the map builder block 240 receives the signal that the localization estimates are valid, the map building is stopped, and the new map can be optimized (i.e., execute bundle adjustment) and sent to the map fusion block 250.

Aspects of how the image processing device 140, 200 might determine that the localization again is successful will now be disclosed. In some aspects, even though localization has failed, and either the odometry parameters were adapted or the building of a new map was started, the frame localization keeps attempting to perform localization in the already available map.

This makes it possible to identify when the device is again visiting an already mapped region and enables connections to be made between the pose graphs of the new map and the previously existing map. This in turn enables the pose to be re-anchored if only using the odometry block 220, or loop-closures to be found between the new map and the already available map if using the map builder block 240. The procedure to signal either of the above-mentioned situations starts if both localization checks are verified: the number of matches/inliers needs to be above the threshold a for M consecutive localizations, and all the poses in the window need to be geometrically consistent. Hence, in some embodiments, the image processing device 140, 200 is configured to perform (optional) steps S112 and S116.

S112: The image processing device 140, 200 determines that the localization is successful for at least the M latest obtained frames of sensor data.

Step S116 is then entered. However, as will be further disclosed below, optional step S114 might be entered after step S112 but before step S116.

S116: The image processing device 140, 200 performs the localization by operating the odometry algorithm with a default set of odometry parameters.

In some examples, to save power (e.g., if the map builder block 240 was started), the rate at which frame localization is attempted can be decreased. Then, whenever a localization is valid, the rate is iteratively increased until it reaches the predefined rate. If a localization failure occurs, the rate is lowered to the minimum once again. The minimum can be found by profiling the power consumption of the frame localization block 210 and the map builder block 240 and selecting an appropriate rate that accounts for the tradeoff between localization and mapping.
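A minimal sketch of such a rate controller is given below; the rate limits and the step size are illustrative assumptions and would in practice be obtained from the power profiling mentioned above:

```python
MIN_RATE_HZ = 0.5   # illustrative lower bound found by power profiling
MAX_RATE_HZ = 10.0  # illustrative predefined (default) rate
RATE_STEP_HZ = 1.0  # illustrative increment per valid localization


def update_localization_rate(current_rate, localization_valid):
    """Iteratively raise the rate towards the predefined value after each
    valid localization; drop back to the minimum on a failure."""
    if localization_valid:
        return min(MAX_RATE_HZ, current_rate + RATE_STEP_HZ)
    return MIN_RATE_HZ
```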

When localization passes both previous tests, two scenarios are possible: (1) the odometry parameters are reset to default values (if the map builder block 240 was not started), or (2) the map fusion block 250 is triggered to merge the new map with the already available map. Aspects of how the image processing device 140, 200 might perform map fusion will now be disclosed.

In some embodiments, the image processing device 140, 200 is configured to perform (optional) step S114.

S114: The image processing device 140, 200 fuses the new map with the previously computed partial map.

In further detail, the map fusion block 250 is triggered by the localization failure detection block 230 if the map builder block 240 was started and the number of matches/inliers of the frame localization is above the threshold a for M consecutive localizations, and all those M localizations are geometrically consistent. Each such localization is then added to the pose graph as a loop closure edge. The nodes to connect can be found using place recognition methods like Bag-of-Words.
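A minimal, data-structure-level sketch of such a merge is given below; the PoseGraph representation, the edge layout, and the function names are illustrative assumptions, and the subsequent pose graph optimization is not shown:

```python
from dataclasses import dataclass, field


@dataclass
class PoseGraph:
    nodes: dict = field(default_factory=dict)   # node id -> 4x4 pose
    edges: list = field(default_factory=list)   # (id_a, id_b, relative pose, kind)


def fuse_pose_graphs(existing, new, loop_closures):
    """Merge the new map's pose graph into the existing one and connect the
    two by loop-closure edges; the joint graph is then handed to a pose
    graph optimizer (not shown here).

    loop_closures: list of (existing_node_id, new_node_id, relative_pose)
                   found e.g. by Bag-of-Words place recognition.
    """
    fused = PoseGraph(nodes=dict(existing.nodes), edges=list(existing.edges))
    # Offset the new map's node ids so they do not clash with existing ones.
    offset = max(fused.nodes, default=-1) + 1
    fused.nodes.update({offset + i: pose for i, pose in new.nodes.items()})
    fused.edges.extend((offset + a, offset + b, rel, kind)
                       for a, b, rel, kind in new.edges)
    # Loop-closure edges anchor the new sub-graph to the previously built map.
    fused.edges.extend((ex_id, offset + new_id, rel, "loop_closure")
                       for ex_id, new_id, rel in loop_closures)
    return fused
```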

In general terms, the map fusion is related to the sensor setup, particularly to the availability of depth data. Two different scenarios are therefore envisioned.

In a first scenario, depth data is available. The map fusion then involves optimizing the joint pose graph, i.e., a graph with the vertices and edges of the already available map and the new map connected by the loop-closure edges. The result is a single pose graph of the complete map. The point-clouds can then be joined using the optimized poses. If the overlap of the point-clouds is high it may be necessary to combine the points in those regions, to avoid storing redundant data.

In a second scenario, depth data is not available. As above, the map fusion first involves optimization of the joint pose graph. The triangulation of the points may have a high error, given the current pose estimates. To tackle this issue, methods such as in ORB-SLAM can be applied to update the point positions using the optimized poses and then run a full bundle adjustment, which optimizes both the points and the poses by minimizing the reprojection error of each point across all camera poses that observe that point.

After map fusion, the map is updated, and the frame localization block 210 starts using the fused map. This prevents remapping of the regions that caused the localization failures.

One particular embodiment of a method for localizing and mapping a partially known environment 150 for a device 110, as performed by an image processing device 140, 200, will be disclosed next with reference to the flowchart of Fig. 4.

S201: Localization is performed by the frame localization block 210. This localization is continuously performed in the background.

S202: It is checked whether a localization failure occurs or not. If a localization failure occurs (yes), step S203 is entered. If a localization failure does not occur (no), step S207 is entered.

S203: It is checked whether the localization failure is appearance based or not. If the localization failure is appearance based (yes), step S204 is entered. If the localization failure is not appearance based (no), step S206 is entered.

S204: It is checked whether the localization failure has occurred over N or more consecutive localizations or not. If the localization failure has occurred over N or more consecutive localizations (yes), step S206 is entered. If the localization failure has not occurred over N or more consecutive localizations (no), step S205 is entered.

S205: The odometry block 220 is started to adapt the odometry parameters. The thus adapted odometry parameters are then used for localization.

S206: The map builder block 240 is started to build a new map. Optionally, the rate at which the localization is performed by the frame localization block 210 in the background is lowered.

S207: It is checked whether the map builder block 240 was started or not (i.e., whether step S206 was entered or not). If the map builder block 240 was started (yes), step S208 is entered. If the map builder block 240 was not started (no), step S209 is entered.

S208: The map fusion block 250 is started to merge the new map with the already available map.

S209: It is checked whether the odometry block 220 was started or not (i.e., whether step S205 was entered or not). If the odometry block 220 was started (yes), step S210 is entered. If the odometry block 220 was not started (no), step S202 is entered.

S210: The adapted odometry parameters are reset to default odometry parameters.
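Put together, the decision logic of steps S202 to S210 could be sketched as follows; the helper flags, return values, and function name are illustrative assumptions, and the localization, odometry, map building, and map fusion themselves are assumed to be implemented by the respective blocks described above:

```python
def process_localization_result(state, failure, appearance_based, persisted_n_frames):
    """One pass through the decision logic of steps S202 to S210.

    state: dict holding the flags 'mapping_started' and 'odometry_adapted'.
    Returns the action to trigger for the current frame, or None.
    """
    if failure:                                            # S202
        if appearance_based and not persisted_n_frames:    # S203, S204
            state["odometry_adapted"] = True
            return "adapt_odometry_parameters"             # S205
        state["mapping_started"] = True
        return "start_map_builder"                         # S206
    if state.pop("mapping_started", False):                # S207
        return "fuse_maps"                                 # S208
    if state.pop("odometry_adapted", False):               # S209
        return "reset_odometry_parameters"                 # S210
    return None
```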

Fig. 5 schematically illustrates, in terms of a number of functional units, the components of an image processing device 500 according to an embodiment. Processing circuitry 510 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), etc., capable of executing software instructions stored in a computer program product 710 (as in Fig. 7), e.g. in the form of a storage medium 530. The processing circuitry 510 may further be provided as at least one application specific integrated circuit (ASIC), or field programmable gate array (FPGA).

Particularly, the processing circuitry 510 is configured to cause the image processing device 500 to perform a set of operations, or steps, as disclosed above. For example, the storage medium 530 may store the set of operations, and the processing circuitry 510 may be configured to retrieve the set of operations from the storage medium 530 to cause the image processing device 500 to perform the set of operations. The set of operations may be provided as a set of executable instructions.

The processing circuitry 510 is thereby arranged to execute methods as herein disclosed. The storage medium 530 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory. The image processing device 500 may further comprise a communications interface 520 at least configured for communications with other entities, functions, nodes, and devices, as in Fig. 1. As such the communications interface 520 may comprise one or more transmitters and receivers, comprising analogue and digital components. The processing circuitry 510 controls the general operation of the image processing device 500, e.g., by sending data and control signals to the communications interface 520 and the storage medium 530, by receiving data and reports from the communications interface 520, and by retrieving data and instructions from the storage medium 530. Other components, as well as the related functionality, of the image processing device 500 are omitted in order not to obscure the concepts presented herein.

Fig. 6 schematically illustrates, in terms of a number of functional modules, the components of an image processing device 600 according to an embodiment. The image processing device 600 of Fig. 6 comprises a number of functional modules: an obtain module 610 configured to perform step S102, a localization module 620 configured to perform step S104, a determine module 640 configured to perform step S108, and a determine module 650 configured to perform step S110. The image processing device 600 of Fig. 6 may further comprise a number of optional functional modules, such as any of an obtain module 630 configured to perform step S106, a determine module 660 configured to perform step S112, a fuse module 670 configured to perform step S114, and a localization module 680 configured to perform step S116.

In general terms, each functional module 610:680 may in one embodiment be implemented only in hardware and in another embodiment with the help of software, i.e., the latter embodiment having computer program instructions stored on the storage medium 530 which when run on the processing circuitry makes the image processing device 500, 600 perform the corresponding steps mentioned above in conjunction with Fig 6. It should also be mentioned that even though the modules correspond to parts of a computer program, they do not need to be separate modules therein, but the way in which they are implemented in software is dependent on the programming language used. Preferably, one or more or all functional modules 610:680 may be implemented by the processing circuitry 510, possibly in cooperation with the communications interface 520 and/or the storage medium 530. The processing circuitry 510 may thus be configured to from the storage medium 530 fetch instructions as provided by a functional module 610:680 and to execute these instructions, thereby performing any steps as disclosed herein.

The image processing device 140, 200, 500, 600 may be provided as a standalone device or as a part of at least one further device. For example, the image processing device 140, 200, 500, 600 may be provided in the device 110, as in Fig. 1(b). Alternatively, functionality of the image processing device 140, 200, 500, 600 may be distributed between at least two devices. A first portion of the instructions performed by the image processing device 140, 200, 500, 600 may be executed in a first device, and a second portion of the instructions performed by the image processing device 140, 200, 500, 600 may be executed in a second device; the herein disclosed embodiments are not limited to any particular number of devices on which the instructions performed by the image processing device 140, 200, 500, 600 may be executed. Hence, the methods according to the herein disclosed embodiments are suitable to be performed by an image processing device 140, 200, 500, 600 residing in a cloud computational environment. Therefore, although a single processing circuitry 510 is illustrated in Fig. 5, the processing circuitry 510 may be distributed among a plurality of devices, or nodes. The same applies to the functional modules 610:680 of Fig. 6 and the computer program 720 of Fig. 7.

Fig. 7 shows one example of a computer program product 710 comprising computer readable storage medium 730. On this computer readable storage medium 730, a computer program 720 can be stored, which computer program 720 can cause the processing circuitry 510 and thereto operatively coupled entities and devices, such as the communications interface 520 and the storage medium 530, to execute methods according to embodiments described herein. The computer program 720 and/or computer program product 710 may thus provide means for performing any steps as herein disclosed.

In the example of Fig. 7, the computer program product 710 is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. The computer program product 710 could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM) and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory or a Flash memory, such as a compact Flash memory. Thus, while the computer program 720 is here schematically shown as a track on the depicted optical disk, the computer program 720 can be stored in any way which is suitable for the computer program product 710.

The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.