Title:
METHOD AND SYSTEM FOR DETERMINING A STATE OF A CAMERA
Document Type and Number:
WIPO Patent Application WO/2022/144653
Kind Code:
A1
Abstract:
The invention relates to a method for determining a state x k (8) of a camera (11) at a time t k , the state x k (8) being a realization of a state random variable X k , wherein the state is related to a state-space model of a movement of the camera (11). The method comprises the following steps: a) receiving an image (1) of a scene of interest (15) in an indoor environment (15) captured by the camera (11) at the time t k , wherein the indoor environment (15) comprises N landmarks (9) having known positions in a world coordinate system (12), N being a natural number; b) receiving a state estimate x^ k (2) of the camera (11) at the time t k , c) determining (3) positions of M features in the image (1), M being a natural number; d) receiving (4) distance data indicative of distance between the M features and the corresponding M landmarks (9), respectively; e) determining (5) an injective mapping estimate from the M features into the set of the N landmarks (9) using at least (i) the positions of the M features in the image and (ii) the state estimate (2); f) using the determined injective mapping estimate (5) to set up (6) an observation model in the state-space model, wherein the observation model is configured for mapping the state random variable X k of the camera onto a joint observation random variable Z k , wherein at the time t k , an observation z k is a realization of the joint observation random variable Z k , and wherein the observation z k comprises (i) the position of at least one of the M features in the image, and (ii) the distance data indicative of distance; and g) using (7) (i) the state estimate, (ii) the observation model, and (iii) the observation z k , to determine the state x k (8) of the camera at the time t k . The invention also relates to a computer program product and to an assembly.

Inventors:
HEHN MARKUS (CH)
ROSSETTO FABIO (CH)
Application Number:
PCT/IB2021/061637
Publication Date:
July 07, 2022
Filing Date:
December 13, 2021
Assignee:
VERITY AG (CH)
International Classes:
G06T7/277
Other References:
KOSAKA A ET AL: "AUGMENTED REALITY SYSTEM FOR SURGICAL NAVIGATION USING ROBUST TARGET VISION", PROCEEDINGS 2000 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. CVPR 2000. HILTON HEAD ISLAND, SC, JUNE 13-15, 2000; [PROCEEDINGS OF THE IEEE COMPUTER CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION], LOS ALAMITOS, CA : IEEE COMP., 13 June 2000 (2000-06-13), pages 187 - 194, XP001035639, ISBN: 978-0-7803-6527-8
PLANK HANNES ET AL: "High-performance indoor positioning and pose estimation with time-of-flight 3D imaging", 2017 INTERNATIONAL CONFERENCE ON INDOOR POSITIONING AND INDOOR NAVIGATION (IPIN), IEEE, 18 September 2017 (2017-09-18), pages 1 - 8, XP033261470, DOI: 10.1109/IPIN.2017.8115878
ALVARADO VASQUEZ BIEL PIERO E ET AL: "Sensor Fusion for Tour-Guide Robot Localization", IEEE ACCESS, vol. 6, 12 December 2018 (2018-12-12), pages 78947 - 78964, XP011694870, DOI: 10.1109/ACCESS.2018.2885648
Attorney, Agent or Firm:
P&TS SA (AG, LTD.) (CH)
Claims:
Claims

1. Method for determining a state xk (8) of a camera (11) at a time tk, the state xk (8) being a realization of a state random variable Xk, wherein the state is related to a state-space model of a movement of the camera (11), the method comprising:
a) receiving an image (1) of a scene of interest (15) in an indoor environment (15) captured by the camera (11) at the time tk, wherein the indoor environment (15) comprises N landmarks (9) having known positions in a world coordinate system (12), N being a natural number;
b) receiving a state estimate x̂k (2) of the camera (11) at the time tk;
c) determining (3) positions of M features in the image (1), M being a natural number smaller than or equal to N, wherein an injective mapping between the M features and the N landmarks exists;
d) receiving (4) distance data indicative of distance between the M features and the corresponding M landmarks (9), respectively;
e) determining (5) an injective mapping estimate from the M features into the set of the N landmarks (9) using at least (i) the positions of the M features in the image and (ii) the state estimate (2);
f) using the determined injective mapping estimate (5) to set up (6) an observation model in the state-space model, wherein the observation model is configured for mapping the state random variable Xk of the camera onto a joint observation random variable Zk, wherein at the time tk, an observation zk is a realization of the joint observation random variable Zk, and wherein the observation zk comprises (i) the position of at least one of the M features in the image, and (ii) the distance data indicative of distance; and
g) using (7) (i) the state estimate, (ii) the observation model, and (iii) the observation zk, to determine the state xk (8) of the camera at the time tk.

2. Method according to claim 1, wherein the joint observation random variable Zk comprises M observation random variables Zk,i, i = 1, ..., M, wherein each of the M observation random variables Zk,i comprises a distance data random variable Dk,i, and wherein the observation zk comprises observations zk,i, i = 1, ..., M.

3. Method according to claim 2, wherein the observation model is configured to model a 3D-to-2D projection of each of the M landmarks (9) corresponding to the M features to the corresponding feature, respectively, and wherein the injective mapping estimate, subsequently termed IME, links the M features with the M landmarks (9), wherein, for a feature i of the M features, the corresponding landmark is landmark IME(i), and wherein, for a feature-landmark pair (i, IME(i)), the observation model links the observation random variable Zk,i to the state random variable Xk, Zk,i = hIME(i)(Xk), wherein the observation model function hIME(i) is dependent on landmark IME(i), and wherein the observation model comprises the observation model functions hIME(i), i = 1, ..., M.

4. Method according to claim 3, wherein the observation model function hIME(i) is configured to map the state random variable Xk onto the distance data random variable Dk,i, wherein the distance data indicative of distance dk,i, dk,i being a realization of the distance data random variable Dk,i, relates to the distance between feature i and landmark IME(i).

5. Method according to any of the preceding claims, wherein the determining (7) of the state xk (8) using (i) the state estimate x̂k (2), (ii) the observation model, and (iii) the observation zk, is done by using update equations provided by applying an extended Kalman filter to the state-space model, wherein the update equations comprise the Jacobian matrix of the observation model, wherein the Jacobian matrix is evaluated at the state estimate x̂k (2).

6. Method according to claim 5, wherein a separate Jacobian matrix is used for each observation model function hIME(i), i = 1, ..., M, and wherein the update equations, using the respective separate Jacobian matrix, are consecutively and independently invoked for all M features.

7. Method according to one of the preceding claims, wherein the distance data indicative of distance are embodied as distances, which distances are provided by a time-of-flight (TOF) camera (11) as distances between the TOF camera and the M landmarks (9) corresponding to the M features, respectively.

8. Method according to any one of the preceding claims, wherein the image (1) is captured by the camera (11) as a light source (10) is operated to emit light which illuminates the scene of interest (15).

9. Method according to claim 8, wherein the distance data indicative of distance are embodied as intensity information for each of the M features.

10. Method according to claim 9, wherein each of the M observation model functions comprises a respective illumination model which is configured to map the state random variable Xk onto the respective distance data random variable Dk,i, i = 1, ..., M, the distance data random variables statistically modelling intensity information, wherein the illumination model i, i = 1, ..., M, uses at least (i) a power of light emitted by the light source and an estimated light source position, (ii) a directivity of light emission by the light source, (iii) a reflectivity of landmark IME(i), and (iv) the known position of landmark IME(i) in the world coordinate system, for mapping the state random variable Xk onto the distance data random variables Dk,i, i = 1, ..., M.

11. Method according to any one of the preceding claims, wherein the M observation model functions each comprise a camera model of the camera (11).

12. Method according to claim 11, wherein the camera model is embodied as a pinhole camera model.

13. Computer program product comprising instructions which when executed by a computer, cause the computer to carry out a method according to any one of claims 1 to 12.

14. Assembly, comprising (a) a camera (11), (b) a plurality of landmarks (9), and (c) a controller, wherein the controller is configured to carry out a method according to one of claims 1 to 12.

15. Assembly according to claim 14, further comprising a time-of-flight (TOF) camera and/or a light source (10).

Description:
Method and System for Determining a State of a Camera

Field of the invention

[0001] The present invention relates to a method for determining a state x k of a camera at a time t k , and to a computer program product and assembly.

Background to the invention

[0002] Indoor navigation of robots, for example drones, is an important problem, e.g., in the field of automatic warehousing. To facilitate indoor navigation, the robot, e.g., the drone, needs to know its current position with respect to its environment. Contrary to outdoor environments in which GNSS (Global Navigation Satellite Systems) can be employed, providing a high localization accuracy, GNSS in indoor environments is often not reliable due to signal attenuation and multi-path effects. Existing RF localization technologies for indoor and outdoor spaces also struggle with signal attenuation and multi-path effects limiting the usability in complex environments, for instance, in the presence of a significant amount of metal.

[0003] In the prior art, optical localization systems for indoor localization are known. Such optical localization systems extract information from images captured by a camera. The location of an object of which the pose is to be determined can then be computed using triangulation techniques after relating the coordinates of features in the two-dimensional camera image to three-dimensional rays corresponding to said features. The relation between image coordinates and three-dimensional rays is typically captured in a combination of first-principle camera models (such as pinhole or fisheye camera models) and calibrated distortion models (typically capturing lens characteristics, mounting tolerances, and other deviations from a first-principle model).

[0004] In optical localization systems for determining the location of an object known in the prior art, the camera can be rigidly mounted outside the object, observing the motion of the object ("outside-in tracking"), or the camera can be mounted on the object itself, observing the apparent motion of the environment ("inside-out tracking"). While outside-in tracking localization systems typically determine the location of the object relative to the known locations of the camera(s), inside-out tracking systems like SLAM (Simultaneous Localization and Mapping) typically generate a map of the environment in which the object moves. The map is expressed in an unknown coordinate system but can be related to a known coordinate system in case the locations of at least parts of the environment are already known or if the initial pose of the camera is known. In both cases, some error will accumulate as the map is expanded away from the initial field of view of the camera or from the parts of the environment with known location. The potential for propagating errors is a problem for applications where the location information must be referred to external information, for example to display the location of the object in a predefined map, to relate it to the location of another such object, or when the location is used to guide the object to a location known in an external coordinate system.

[0005] Outside-in optical localization systems typically scale very poorly to larger localization systems because at every point, the object must be seen by several cameras in order to triangulate the 3D position of the object. Especially for large spaces where only a few objects are tracked, this is not economically viable.

[0006] The position and orientation of a camera, e.g., mounted on a drone, may be summarized in a state, and the state may be tracked over time. Existing methods for determining the state of a camera, however, do not provide an adequate level of accuracy, making them insufficient for use in many applications.

[0007] It is an object of the present invention to mitigate at least some of the disadvantages associated with the methods for determining a state x k of a camera known from the state of the art.

Summary of the invention

[0008] According to a first aspect of the present invention there is provided a method for determining a state x k of a camera, involving the steps recited in claim 1. Further optional features and embodiments of the method of the present invention are described in the dependent patent claims.

[0009] The invention relates to a method for determining a state x k of a camera at a time t k , the state x k being a realization of a state random variable X k , wherein the state is related to a state-space model of a movement of the camera. The method comprises the following steps:
a) receiving an image of a scene of interest in an indoor environment captured by the camera at the time t k , wherein the indoor environment comprises N landmarks having known positions in a world coordinate system, N being a natural number;
b) receiving a state estimate x̂ k of the camera at the time t k ;
c) determining positions of M features in the image, M being a natural number smaller than or equal to N, wherein an injective mapping between the M features and the N landmarks exists;
d) receiving distance data indicative of distance between the M features and the corresponding M landmarks, respectively;
e) determining an injective mapping estimate from the M features into the set of the N landmarks using at least (i) the positions of the M features in the image and (ii) the state estimate;
f) using the determined injective mapping estimate to set up an observation model in the state-space model, wherein the observation model is configured for mapping the state random variable X k of the camera onto a joint observation random variable Z k , wherein at the time t k , an observation z k is a realization of the joint observation random variable Z k , and wherein the observation z k comprises (i) the position of at least one of the M features in the image, and (ii) the distance data indicative of distance; and
g) using (i) the state estimate, (ii) the observation model, and (iii) the observation z k , to determine the state x k of the camera at the time t k .
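
For orientation, the state-space model referred to above can be written in the standard discrete-time form shown below. This is a generic textbook formulation added here for readability, not a formula taken from the application; f denotes the state-transition model (possibly with a control input u), h_k the observation model set up in step f), and w_k, v_k the process and observation noise terms.

\begin{aligned}
X_k &= f(X_{k-1}, u_{k-1}) + w_{k-1} \\
Z_k &= h_k(X_k) + v_k
\end{aligned}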

[0010] The orientations of the N landmarks in the world coordinate system may also be known. Alternatively, the distance data indicative of distance may also relate to distances between a camera center of the camera and the M landmarks corresponding to the M features.

[0011] In principle, the number of features may also be larger than N in case outliers are detected as features. In this case, M would be larger than N. Such outliers may be removed during different processing steps: they could be removed during the determining of the injective mapping estimate, for example; outliers could also be removed before the determining of the injective mapping estimate based on (i) the received distance data indicative of distance, (ii) the state estimate, and (iii) the known positions of the N landmarks in the world coordinate system, e.g., by excluding features for which no plausible landmark may be identified with respect to the respective distance data indicative of distance. It may hence be assumed that - in case outliers are present - such outliers are removed: the M features are features which correspond to actual landmarks.

[0012] In an embodiment of the method according to the invention, the joint observation random variable Z k comprises M observation random variables Z k,i , i = 1, ..., M, wherein each of the M observation random variables Z k,i comprises a distance data random variable D k,i , and wherein the observation z k comprises observations z k,i , i = 1, ..., M.

[0013] In a further embodiment of the method according to the invention, the observation model is configured to model a 3D-to-2D projection of each of the M landmarks corresponding to the M features to the corresponding feature, respectively. The injective mapping estimate, subsequently termed IME, links the M features with the M landmarks, wherein, for a feature i of the M features, the corresponding landmark is landmark IME(i), and wherein, for a feature-landmark pair (i, IME(i)), the observation model links the observation random variable Z k,i to the state random variable X k : Z k,i = h IME(i) (X k ), wherein the observation model function h IME(i) is dependent on landmark IME(i), and wherein the observation model comprises the observation model functions h IME(i) , i = 1, ..., M.

[0014] In a further embodiment of the method according to the invention, the observation model function h IME(i) is configured to map the state random variable X k onto the distance data random variable D k,i , wherein the distance data indicative of distance d k,i , d k,i being a realization of the distance data random variable D k,i , relates to the distance between feature i and landmark IME(i).

[0015] Alternatively, the distance data indicative of distance d k i as well as the distance data random variable D k i may relate to the distance between the camera center of the camera and landmark IME(i).

[0016] In a further embodiment of the method according to the invention, the determining of the state x k using (i) the state estimate x̂ k , (ii) the observation model, and (iii) the observation z k is done by using update equations provided by applying an extended Kalman filter to the state-space model, wherein the update equations comprise the Jacobian matrix of the observation model, wherein the Jacobian matrix is evaluated at the state estimate x̂ k .
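
For reference, the update equations of a standard extended Kalman filter have the form below (textbook notation, not quoted from the application), where P_k^- denotes the predicted state covariance, R_k the observation noise covariance, and H_k the Jacobian matrix of the observation model evaluated at the state estimate x̂_k:

\begin{aligned}
H_k &= \left.\frac{\partial h_k}{\partial x}\right|_{\hat{x}_k}, \qquad
K_k = P_k^- H_k^\top \left(H_k P_k^- H_k^\top + R_k\right)^{-1}, \\
x_k &= \hat{x}_k + K_k\left(z_k - h_k(\hat{x}_k)\right), \qquad
P_k = \left(I - K_k H_k\right) P_k^- .
\end{aligned}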

[0017] In a further embodiment of the method according to the invention, a separate Jacobian matrix is used for each observation model function h IME(i) , i = 1, ..., M, and the update equations, using the respective separate Jacobian matrix, are consecutively and independently invoked for all M features.
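
One common way to invoke the update consecutively for the M features is to fold the observations in one at a time, reusing the refreshed estimate. The NumPy sketch below illustrates this general technique; the argument names (hs, Hs, zs, Rs for the per-feature observation model functions, Jacobian functions, observations and noise covariances) are hypothetical and not taken from the application.

import numpy as np

def sequential_ekf_update(x_est, P, hs, Hs, zs, Rs):
    """Apply the EKF update once per feature, reusing the refreshed estimate."""
    x, P = x_est.copy(), P.copy()
    for h_i, H_i, z_i, R_i in zip(hs, Hs, zs, Rs):
        H = H_i(x)                                   # separate Jacobian for feature i
        S = H @ P @ H.T + R_i                        # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
        x = x + K @ (z_i - h_i(x))                   # state update with feature i's observation
        P = (np.eye(len(x)) - K @ H) @ P             # covariance update
    return x, P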

[0018] In a further embodiment of the method according to the invention, the distance data indicative of distance are embodied as distances, which distances are provided by a time-of-flight (TOF) camera as distances between the TOF camera and the M landmarks corresponding to the M features, respectively.

[0019] The TOF camera may have a camera center and determine distances between its camera center and the M landmarks corresponding to the M features.

[0020] TOF camera functionality may be provided as part of the camera. Alternatively, the TOF camera may be a separate device. In case the TOF camera is a separate device, a coordinate transformation between the TOF camera and the camera may be assumed to be known. Measurements carried out by the TOF camera may then be transferred into a local coordinate system of the camera, and thereby compared to the image captured by the camera.
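
Purely as an illustration, transferring a point measured by a separate TOF camera into the local coordinate system of the camera amounts to applying the known rigid-body transform between the two devices; the rotation and translation values below are placeholder calibration data, not values from the application.

import numpy as np

# Placeholder extrinsic calibration: rotation and translation from the TOF-camera
# frame to the camera frame (assumed known; values are illustrative only)
R_cam_tof = np.eye(3)
t_cam_tof = np.array([0.05, 0.0, 0.0])   # metres

def tof_point_to_camera_frame(p_tof):
    """Express a 3D point measured in the TOF-camera frame in the camera frame."""
    return R_cam_tof @ np.asarray(p_tof) + t_cam_tof

A distance or 3D point measured by the TOF camera can then be compared directly with the image captured by the camera.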

[0021] In a further embodiment of the method according to the invention, the image is captured by the camera as a light source is operated to emit light which illuminates the scene of interest.

[0022] In a further embodiment of the method according to the invention, the distance data indicative of distance are embodied as intensity information for each of the M features.

[0023] The term intensity information may, e.g., refer to an average intensity of a feature, or to a maximum intensity of a feature. The average intensity and the maximum intensity of a feature may be determined from those pixels of the image sensor capturing the image which capture the feature.

[0024] In a further embodiment of the method according to the invention, each of the M observation model functions comprises a respective illumination model which is configured to map the state random variable X k onto the respective distance data random variable D k,i , i = 1, ..., M, the distance data random variables statistically modelling intensity information, wherein the illumination model i, i = 1, ..., M, uses at least (i) a power of light emitted by the light source and an estimated light source position, (ii) a directivity of light emission by the light source, (iii) a reflectivity of landmark IME(i), and (iv) the known position of landmark IME(i) in the world coordinate system, for mapping the state random variable X k onto the distance data random variables D k,i , i = 1, ..., M.
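
A minimal sketch of such an illumination model is given below, assuming a simple distance fall-off combined with a directivity factor and the landmark reflectivity; the functional form and all parameter names are illustrative assumptions, not the specific model of the invention.

import numpy as np

def expected_intensity(power, light_pos, light_axis, directivity, reflectivity, landmark_pos):
    """Predict the image intensity of the feature caused by landmark IME(i).

    Combines (i) emitted power and estimated light source position, (ii) a simple
    directivity factor, (iii) the landmark reflectivity, and (iv) the known landmark
    position; the distance fall-off exponent is a modelling choice (2 is used here).
    """
    to_landmark = np.asarray(landmark_pos) - np.asarray(light_pos)
    d = np.linalg.norm(to_landmark)
    cos_angle = float(np.dot(to_landmark / d, light_axis / np.linalg.norm(light_axis)))
    gain = max(cos_angle, 0.0) ** directivity       # attenuation away from the emission axis
    return power * gain * reflectivity / d**2       # received intensity decreases with distance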

[0025] The estimated light source position may be estimated from the state estimate x̂ k in case a geometrical relationship of the light source to the camera is known.

[0026] In a further embodiment of the method according to the invention, the M observation model functions each comprise a camera model of the camera.

[0027] In a further embodiment of the method according to the invention, the camera model is embodied as a pinhole camera model.
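
For completeness, the standard pinhole projection (textbook form; f is the focal length in pixels, (c_x, c_y) the principal point, and (X_c, Y_c, Z_c) the landmark position expressed in the camera frame, which itself depends on the camera state) reads:

\begin{aligned}
u = f\,\frac{X_c}{Z_c} + c_x, \qquad v = f\,\frac{Y_c}{Z_c} + c_y .
\end{aligned}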

[0028] According to a further aspect of the present invention there is provided a computer program product comprising instructions which when executed by a computer, cause the computer to carry out a method according to the invention.

[0029] According to a further aspect of the present invention there is provided an assembly, comprising (a) a camera, (b) a plurality of landmarks, and (c) a controller, wherein the controller is configured to carry out a method according to the invention.

[0030] In an embodiment of the assembly according to the invention, the assembly further comprises a time-of-flight (TOF) camera and/or a light source.

[0031] The assembly may comprise a camera and a separate TOF camera. A coordinate transformation between the camera and the separate TOF camera may be assumed to be known, implying that measuring results obtained by either camera may be translated between the respective local coordinate systems of the two cameras.

Brief description of drawings

[0032] Exemplary embodiments of the invention are disclosed in the description and illustrated by the drawings in which:

Figure 1 shows a schematic depiction of the method according to the invention for determining a state x k of a camera at a time t k ; and

Figure 2 shows a schematic depiction of a drone comprising a light source and a camera, wherein the drone is configured to fly in an indoor environment, wherein landmarks are arranged at a plurality of positions in said indoor environment.

Detailed description of drawings

[0033] Fig. 1 shows a schematic depiction of the method according to the invention for determining a state x k of a camera at a time t k . The state x k may comprise a 3D position and a 3D orientation of the camera at the time t k . The 3D position and the 3D orientation may be expressed with respect to a world coordinate system which is a predefined reference frame. The state x k may additionally comprise 3D velocity information of the camera at the time t k , wherein said 3D velocity information may, for example, also be expressed with respect to the world coordinate system. As the camera may move through an indoor environment over time, its state may need to be tracked to determine current positions and orientations of the camera.

[0034] At the time t k , the camera may capture an image 1 of a scene of interest in the indoor environment comprising N landmarks. The positions (and possibly orientations) of the N landmarks in the indoor environment are known in the world coordinate system. Since, at the time t k , the camera has a specific position and orientation, not all the N landmarks may be visible to the camera. For example, J ≤ N landmarks may be visible to the camera at the time t k , which J landmarks are projected by the camera onto the image 1 of the scene of interest. The projection of a landmark into an image is termed a 'feature'. From the J landmarks projected onto the image 1, M ≤ J features may be identified, and their 2D positions in the image determined 3. The 2D position of a feature may relate to the 2D position of, e.g., a centroid of said feature. Some of the J landmarks may be positioned and oriented relative to the camera at the time t k in such a way that their projections onto the image are too small, too dim, or otherwise badly detectable. In this case, M may be strictly smaller than J, i.e., M < J, and the remaining J - M landmarks which are projected by the camera onto the image 1 may be disregarded/not detected. Features may be determined using a scale-invariant feature transform, for example, or using a speeded up robust feature detector, or using a gradient location and orientation histogram detector, or using any other feature detector known from the prior art, or using a custom feature detector tailored to possible shapes of the landmarks in the indoor environment. It is also assumed that the M features are features which correspond to projections of landmarks onto the image, i.e., that outliers, which are projections of other objects which are not landmarks onto the image, are removed from the image.
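
As one illustrative alternative to the feature detectors listed above, bright and compact landmark projections can be located with a simple threshold-and-centroid scheme; the sketch below assumes a grayscale image as a NumPy array, and the threshold and minimum blob size are arbitrary example values.

import numpy as np
from scipy import ndimage

def detect_bright_features(image, threshold=200, min_pixels=3):
    """Return (row, col) centroids of connected bright blobs in a grayscale image."""
    mask = image >= threshold                   # candidate landmark pixels
    labels, num = ndimage.label(mask)           # connected components
    centroids = []
    for idx in range(1, num + 1):
        if np.sum(labels == idx) >= min_pixels: # discard tiny blobs, likely noise
            centroids.append(ndimage.center_of_mass(mask, labels, idx))
    return centroids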

[0035] The image 1 is captured by an image sensor of the camera. The image sensor has a position and orientation in the world coordinate system, wherein said position and orientation of the image sensor at the time t k may be implicitly encoded in the state x k of the camera. A feature with a specific 2D position in the image thereby also has a 3D position in space, wherein the 3D position corresponds to the 3D position of the point on the image sensor corresponding to the specific 2D position of the feature.

[0036] In a next step, an injective mapping estimate from the M features to the N landmarks is determined 5. Since typically it holds that M < N, the injective mapping estimate is typically only injective and not surjective as well. The injective mapping estimate describes which landmark of the N landmarks induced which feature of the M features in the image. To determine 5 such an injective mapping estimate, a position/orientation of the camera at the time t k may need to be known. Instead of the current state x k , however, only a state estimate x̂ k 2 is available. Starting with the state estimate x̂ k 2, the injective mapping estimate may be determined, wherein during the determination of the injective mapping estimate, approximations to the state x k may be constructed. Since the injective mapping estimate is an injective function from one set to another, it may be represented in functional notation as IME(·), wherein the domain on which the injective mapping estimate is configured to operate is the set of M features, and the range is the set of N landmarks: a feature i is linked to landmark IME(i) through the injective mapping estimate.
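
The application does not prescribe a particular association algorithm; as a sketch of one simple possibility, the landmarks can be projected into the image using the state estimate x̂ k and each feature greedily matched to the nearest still-unassigned projection, which yields an injective mapping by construction. The project argument below is a hypothetical camera projection function.

import numpy as np

def injective_mapping_estimate(feature_px, landmark_pos, x_est, project):
    """Greedy nearest-neighbour assignment of image features to landmarks.

    feature_px:   (M, 2) array of feature positions in the image
    landmark_pos: (N, 3) array of known landmark positions in the world frame
    project:      function (x_est, landmark_pos) -> (N, 2) predicted pixel positions
    Returns a dict mapping feature index i to landmark index IME(i).
    """
    predicted_px = project(x_est, landmark_pos)
    ime, used = {}, set()
    for i, f in enumerate(feature_px):
        for j in np.argsort(np.linalg.norm(predicted_px - f, axis=1)):
            if int(j) not in used:              # enforce injectivity
                ime[i] = int(j)
                used.add(int(j))
                break
    return ime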

[0037] Using the determined feature-to-landmark assignment IME(·), in a next step an observation model is set up 6. The observation model is configured to map a state random variable X k , wherein the state x k is a realization of the state random variable, onto a joint observation random variable Z k , the joint observation random variable termed joint since it probabilistically describes observations related to the M features. Observation z k is a realization of the joint observation random variable Z k , wherein said observation is obtained through an actual measurement process, or through a computation carried out on data provided by an actual measurement process. The joint observation random variable Z k may comprise M observation random variables Z k,i , i = 1, ..., M, wherein each of the M observation random variables may statistically describe observations related to the respective feature. The M observation random variables may be statistically independent from one another, or the joint observation random variable may comprise a probability distribution which does not factor into a product of probability distributions of the M observation random variables.

[0038] The observation model may comprise M observation model functions, wherein each of the M observation model functions may be configured to map the state random variable X k onto the respective observation random variable Z k,i , i = 1, ..., M. Each observation random variable Z k,i , i = 1, ..., M, may comprise a distance data random variable D k,i , i = 1, ..., M, and a random variable related to the 2D position of feature i, i = 1, ..., M, respectively. Distance data indicative of distance d k,i , i = 1, ..., M, may be realizations of the distance data random variables. The presence of a distance data random variable in an observation random variable implies that a quantity related to a distance between a feature i and its corresponding landmark is measured. The corresponding landmark may be the landmark which actually caused feature i (through projection by the camera) as well as observations associated to feature i. The corresponding landmark may be equal to landmark IME(i) in case the determined 5 injective mapping estimate assigns features to landmarks in a correct way. The term distance between a feature i and its corresponding landmark may relate to a distance between the 3D position of said feature i and a known 3D position of said corresponding landmark in the world coordinate system. Instead of a distance between the 3D position of a feature and its corresponding landmark, a distance between a camera center of the camera and the corresponding landmark may be used.

[0039] The distance data random variables D k,i , i = 1, ..., M, may statistically model intensities of features, and/or actual distances between a feature and its corresponding landmark. The intensity of a feature comprises, e.g., information on the distance between the feature and its corresponding landmark, because the intensity of a feature typically decreases with increasing distance between the feature and its corresponding landmark. Distance data indicative of distance are received 4 by the method according to the invention as part of the observation z k .

[0040] The observation model therefore models the mapping of the M landmarks - which M landmarks correspond to the M features by way of the determined injective mapping estimate - onto an image plane on which the image sensor is located according to the state random variable X k . The observation model may also comprise processing steps, e.g., for extracting a 2D position of a projected landmark, i.e., a feature, the 2D position, e.g., being a centroid of the feature in the image. The observation model may comprise a mathematical camera model, e.g., embodied as pinhole camera model, which mathematically describes the projection of a point in three-dimensional space onto the image plane on which the image sensor of the camera lies. To map the state random variable X k onto a distance random variable D k i , i = 1, ... , M, the observation model may comprise an illumination model in case a distance random variable relates to an intensity of a feature, or it may comprise a distance estimation model for determining a distance between a feature i and landmark IME(i).
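
Putting these pieces together, a single observation model function for feature i might predict the pixel position of landmark IME(i) together with a camera-centre-to-landmark distance. The sketch below is one possible realization, assuming for illustration that the state provides the camera centre and a world-to-camera rotation matrix, and using hypothetical pinhole intrinsics f_px, c_x, c_y.

import numpy as np

f_px, c_x, c_y = 600.0, 320.0, 240.0   # hypothetical pinhole intrinsics (pixels)

def observation_model_fn(cam_center, R_wc, landmark_pos):
    """Predict (u, v, distance) for one landmark given the camera pose encoded in the state."""
    p_cam = R_wc @ (np.asarray(landmark_pos) - np.asarray(cam_center))  # landmark in camera frame
    u = f_px * p_cam[0] / p_cam[2] + c_x                                # pinhole projection
    v = f_px * p_cam[1] / p_cam[2] + c_y
    d = float(np.linalg.norm(p_cam))                                    # distance to the landmark
    return np.array([u, v, d])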

[0041] An illumination model may model power losses of light emitted by a light source between emission by the light source and reception by the camera. The landmarks may be embodied as retroreflectors having a retroreflector-specific reflectivity, and the light source may be used for illuminating the retroreflectors. Light reflected by the retroreflectors may then appear brightly in the image 1 captured by the camera. The illumination model may comprise the reflectivity of a landmark IME(i) at which the emitted light is reflected. The illumination model may also comprise a distance (potentially with statistical uncertainty) between a feature i and its landmark IME(i), wherein the distance may be obtained based on the state estimate x̂ k and a known position of the landmark IME(i) in the world coordinate system. The illumination model may further comprise a power of light emitted by the light source, a directivity of the light source, and an estimated light source position. In case the relative position and orientation of the camera to the light source is known, the estimated light source position may be determined using the state estimate x̂ k . The illumination model may be a part of the observation model.

[0042] In case the distance random variables D k,i , i = 1, ..., M, statistically model actual distances between features i and landmarks IME(i), i = 1, ..., M, respectively, distances may be measured using a time-of-flight (TOF) camera. The TOF camera can be a phase-based TOF camera, or a pulse-based TOF camera. The TOF camera may provide a distance between a feature and its corresponding landmark. TOF camera functionality may be a part of the camera, or the TOF camera may be a separate device. In case the TOF camera and the camera are separate devices, geometrical transformations between the TOF camera and the camera may be known, implying that measurements carried out using the TOF camera can be related to measurements carried out by the camera.

[0043] The observation model is part of a state-space model used for tracking a movement of the camera through space. Besides the observation model, the state-space model may typically comprise a state-transition model. The state-transition model describes how the state itself evolves over time. In case the camera is mounted on a drone, for example, the state-transition model may comprise equations modelling drone flight, the equations potentially comprising control input used for controlling the drone flight. The state-transition model typically also comprises a further term modelling statistical uncertainty in state propagation. The observation model and/or the state-transition model may be linear or nonlinear in their input, which input is the state of the camera. The state-transition model may also have the control input as input.
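
As a simple illustration of a state-transition model (a constant-velocity sketch with an isotropic process-noise term, not the drone flight model alluded to above), a state holding 3D position and 3D velocity can be propagated from t k to t k+1 as follows:

import numpy as np

def propagate_constant_velocity(x, P, dt, q=0.1):
    """Propagate a [position(3), velocity(3)] state and its covariance over dt seconds."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)       # position integrates velocity
    Q = q * np.eye(6)                # crude model of the statistical uncertainty in state propagation
    return F @ x, F @ P @ F.T + Q    # predicted state estimate and covariance for the next time step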

[0044] In case the observation model and the state-transition model are both linear, a Kalman filter may be used for determining 7 the state x k 8 at the time t k , using at least the state estimate x̂ k , the observation model and the observation z k , which observation is a realization of the joint observation random variable Z k . The observation comprises (i) the 2D positions of the M features in the image, and (ii) the distance data indicative of distance, e.g., embodied as intensities of features or as measured distances between features and their respective corresponding landmarks. During the determining 7 of the state x k , the state estimate x̂ k is used as input to the observation model (alternatively, an approximation to the state determined during the determining of the injective mapping estimate may be used as input to the observation model). In case the observation model and/or the state-transition model are nonlinear, an extended Kalman filter may be used, wherein the extended Kalman filter linearizes the nonlinear equations. Both Kalman filter and extended Kalman filter provide update equations for updating the state estimate x̂ k using the observation model and the measured observation. Once the state x k 8 has been determined 7, it may be propagated in time, e.g., from time t k to time t k+1 , using the state-transition model, the propagation in time providing a state estimate for the state of the camera at the time t k+1 . Instead of Kalman filters, particle filters may be used, or state observers such as Luenberger observers may be used, or any other filtering technique known from the state of the art. The state estimate x̂ k+1 may be taken as a new state estimate for determining the state x k+1 at the time t k+1 .

[0045] The update equations of the Kalman filter or of the extended Kalman filter may be invoked at once for all M features, or separately for each feature of the M features. In case an extended Kalman filter is used, a Jacobian of the observation model needs to be computed with respect to the state, and the Jacobian is evaluated at the state estimate x̂ k (or alternatively at an approximation to the state determined during the determining of the injective mapping estimate). In case the extended Kalman filter is separately invoked for each of the M features, a separate Jacobian may be determined for each of the M observation model functions h IME(i) , i = 1, ..., M. In case the time t k+1 - t k between the capture of consecutive images by the camera is not long enough to process all M features, not all the M features may be considered during the updating of the state.

[0046] Fig. 2 shows a schematic depiction of a drone comprising a light source 10 and a camera 11, wherein the drone is flying in an indoor environment 15. Landmarks 9, which in this particular example are embodied as retroreflectors, are arranged at a plurality of positions in the indoor environment 15. The landmarks 9 may be mounted on a ceiling of the indoor environment 15. At any given pose (comprising position and orientation) of the drone, some landmarks 9 may be visible to the camera 11 - in Fig. 2 indicated by lines between the landmarks 9 and the camera 11 - while other landmarks 9 may not be visible to the camera 11. The positions of the landmarks 9 may be known in a world coordinate system 12 which is a predefined reference frame 12, and the current location of the drone may be expressed as a drone coordinate system 13 which is a second reference frame 13, wherein a coordinate transformation 14 may be known between the world coordinate system 12 and the drone coordinate system 13. In case the camera 11 and the light source 10 are mounted rigidly to the drone and their pose relative to the drone is known, the pose of the camera 11 and of the light source 10 can be related to the world coordinate system 12 using the drone coordinate system 13. The current position of the drone can be determined using image(s) of scene(s) of interest 15 in the indoor environment 15, specifically of the landmarks 9 having known positions. Alternatively, or in addition, the drone may be equipped with an inertial measurement unit, which inertial measurement unit may be also used for pose determination. The light source 10 may be an isotropically emitting light source, or it may be a directional light source emitting in a non-isotropic manner. Light source 10 and camera 11 are ideally close to each other, specifically in case the landmarks 9 are embodied as retroreflectors. The camera 11 may also be mounted on top of the drone during a normal movement condition of the drone, i.e., next to the light source 10. The term normal movement condition may refer to a usual movement of the drone with respect to a ground of the scene of interest. The drone may additionally comprise a time-of-flight (TOF) camera for directly measuring distances to the landmarks 9. TOF camera functionality may be provided by a separate TOF camera, or TOF camera functionality may be included in the camera 11.