Title:
METHOD AND APPARATUSES FOR DETERMINING POSITIONS OF MULTI-DIRECTIONAL IMAGE CAPTURE APPARATUSES
Document Type and Number:
WIPO Patent Application WO/2018/100230
Kind Code:
A1
Abstract:
This specification describes a method comprising performing image re-projection on each of a plurality of first images (21), wherein each first image is captured by a camera (11) of a respective one of a plurality of multi-directional image capture apparatuses (10), thereby to generate a plurality of re-projected second images (22) which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.

Inventors:
WANG TINGHUAI (FI)
YOU YU (FI)
FAN LIXIN (FI)
ROIMELA KIMMO (FI)
Application Number:
PCT/FI2017/050749
Publication Date:
June 07, 2018
Filing Date:
October 31, 2017
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
G06T7/70; G01C11/02; G03B37/04; G06T3/00; G06T7/80; H04N13/20
Domestic Patent References:
WO2012082127A1 2012-06-21
Foreign References:
US20150302561A1 2015-10-22
US20140125771A1 2014-05-08
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:

1. A method comprising:

performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;

processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and

based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.

2. The method of claim 1, wherein a plurality of second images are generated from each first image.

3. The method of claim 1 or claim 2, wherein each of the second images has a different viewing direction compared to each of the other second images.

4. The method of any one of the preceding claims, wherein the first images are fisheye images.

5. The method of any one of the preceding claims, wherein the second images are rectilinear images.

6. The method of any one of the preceding claims, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.

7. The method of any one of the preceding claims, wherein the determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:

determining a position of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated positions of the virtual cameras; and

determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.

8. The method of claim 7, wherein the determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:

determining outliers and inliers in the generated positions of the virtual cameras; and

determining the positions of each of the cameras based only on the inliers.

9. The method of any one of the preceding claims, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the method further comprises:

based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.

10. The method of claim 9, wherein the determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras comprises:

determining an orientation of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated orientations of the virtual cameras; and

determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.

11. The method of claim 9 or claim 10, wherein the position of each of the plurality of multi-directional image capture apparatuses is determined based on both the generated positions and the generated orientations of the virtual cameras.

12. The method of any one of claims 7 to 11, further comprising:

determining a pixel to real world distance conversion factor based on the determined positions of the cameras.

13. The method of any one of claims 7 to 12, further comprising:

determining an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.

14. The method of claim 13, wherein the up-vector is determined by:

determining two respective vectors between the position of one of the cameras and the positions of two other cameras; and

determining the cross product of the two vectors.

15. Apparatus configured to perform a method according to any one of claims 1 to 14.

16. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any one of claims 1 to 14.

17. Apparatus comprising:

at least one processor; and

at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:

perform image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;

process the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and

based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.

18. The apparatus of claim 17, wherein a plurality of second images are generated from each first image.

19. The apparatus of claim 17, wherein each of the second images has a different viewing direction compared to each of the other second images.

20. The apparatus of claim 17, wherein the first images are fisheye images.

21. The apparatus of claim 17, wherein the second images are rectilinear images.

22. The apparatus of claim 17, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.

23. The apparatus of claim 17, wherein the determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:

determining a position of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated positions of the virtual cameras; and

determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.

24. The apparatus of claim 23, wherein the determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:

determining outliers and inliers in the generated positions of the virtual cameras; and

determining the positions of each of the cameras based only on the inliers.

25. The apparatus of claim 17, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, causes the apparatus to:

determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.

26. The apparatus of claim 25, wherein the determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras comprises:

determining an orientation of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated orientations of the virtual cameras; and

determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.

27. The apparatus of claim 25, wherein the position of each of the plurality of multidirectional image capture apparatuses is determined based on both the generated positions and the generated orientations of the virtual cameras.

28. The apparatus of claim 23, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:

determine a pixel to real world distance conversion factor based on the determined positions of the cameras.

29. The apparatus of claim 23, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:

determine an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.

30. The apparatus of claim 29, wherein the up-vector is determined by:

determining two respective vectors between the position of one of the cameras and the positions of two other cameras; and

determining the cross product of the two vectors.

31. A computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causes performance of: performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;

processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and

based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.

32. Apparatus comprising:

means for performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi- directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;

means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and

means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.

Description:
Methods and Apparatuses for Determining Positions of Multi- Directional Image Capture Apparatuses

Technical Field

The present specification relates to methods and apparatuses for determining positions of multi-directional image capture apparatuses.

Background

Camera pose registration is an important technique used to determine positions and orientations of image capture apparatuses such as cameras. The recent advent of commercial multi-directional image capture apparatuses, such as 360° camera systems, brings new challenges with regard to the performance of camera pose registration in a reliable, accurate and efficient manner.

Summary

According to a first aspect, this specification describes a method comprising performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses. A plurality of second images may be generated from each first image.

Each of the second images may have a different viewing direction compared to each of the other second images. The first images may be fisheye images.

The second images may be rectilinear images.

The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras. The determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras, and determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.

The determination of a position of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining outliers and inliers in the generated positions of the virtual cameras, and determining the positions of each of the cameras based only on the inliers.

The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the method may further comprise determining an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.

The determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras may comprise determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras, determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.

The position of each of the plurality of multi-directional image capture apparatuses may be determined based on both the generated positions and the generated orientations of the virtual cameras. The method may further comprise determining a pixel to real world distance conversion factor based on the determined positions of the cameras.

The method may further comprise determining an up-vector of each of the multidirectional image capture apparatuses based on the determined positions of the cameras. The up-vector may be determined by determining two respective vectors between the position of one of the cameras and the positions of two other cameras, and determining the cross product of the two vectors.

According to a second aspect, this specification describes apparatus configured to perform any method described with reference to the first aspect.

According to a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method described with reference to the first aspect.

According to a fourth aspect, this specification describes apparatus comprising at least one processor, and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: perform image re- projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, process the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.

A plurality of second images may be generated from each first image.

Each of the second images may have a different viewing direction compared to each of the other second images.

The first images may be fisheye images.

The second images may be rectilinear images.

The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.

The determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras, and determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.

The determination of a position of each of the cameras of each of the plurality of multidirectional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining outliers and inliers in the generated positions of the virtual cameras, and determining the positions of each of the cameras based only on the inliers.

The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor may cause the apparatus to determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.

The determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras may comprise determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras, and determining the orientation of each of the plurality of multidirectional image capture apparatuses based on the determined orientations of the cameras.

The position of each of the plurality of multi-directional image capture apparatuses may be determined based on both the generated positions and the generated orientations of the virtual cameras. The computer program code, when executed by the at least one processor, may cause the apparatus to determine a pixel to real world distance conversion factor based on the determined positions of the cameras.

The computer program code, when executed by the at least one processor, may cause the apparatus to determine an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras. The up-vector may be determined by determining two respective vectors between the position of one of the cameras and the positions of two other cameras, and determining the cross product of the two vectors. According to a fifth aspect, this specification describes a computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causes performance of performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.

The computer-readable code stored on the medium of the fifth aspect may further cause performance of any of the operations described with reference to the method of the first aspect. According to a sixth aspect, this specification describes apparatus comprising means for performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and means for determining a position of each of the plurality of multidirectional image capture apparatuses based on the generated positions of the virtual cameras.

The apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.

Brief Description of the Drawings

For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following descriptions taken in connection with the accompanying drawings, in which:

Figure 1 illustrates an example of multiple multi-directional image capture apparatuses in an environment;

Figure 2 illustrates an example of processing of an image captured by a multi-directional image capture apparatus to generate re-projected images;

Figures 3A to 3C illustrate the determination of the position and orientation of a multidirectional image capture apparatus relative to a reference coordinate system;

Figure 4 illustrates an example of the determination of an up-vector of a multi-directional image capture apparatus;

Figure 5 is a flowchart illustrating examples of various operations described herein;

Figure 6 is a schematic diagram of an example configuration of computing apparatus configured to perform various operations described herein; and

Figure 7 illustrates an example of a computer-readable storage medium with computer readable instructions stored thereon.

Detailed Description

In the description and drawings, like reference numerals may refer to like elements throughout.

Figure 1 illustrates a plurality of multi-directional image capture apparatuses 10 located within an environment. The multi-directional image capture apparatuses 10 may, in general, be any apparatus capable of capturing images of the scene 13 from multiple different perspectives simultaneously. For example, multi-directional image capture apparatus 10 may be a 360° camera system (also known as an omnidirectional camera system or a spherical camera system). However, it will be appreciated that multidirectional image capture apparatus 10 does not necessarily have to have full angular coverage of its surroundings and may only cover a smaller field of view.

The term "image" used herein refers generally to visual content captured by multidirectional image capture apparatus 10. For example, an image may be a photograph or a single frame of a video. As illustrated in Figure 1, each multi-directional image capture apparatus 10 may comprise a plurality of cameras 11. The term "camera" used herein may refer to a sub-part of a multi-directional image capture apparatus 10 which performs the capturing of images. As illustrated, each of the plurality of cameras 11 of multi-directional image capture apparatus 10 may be facing a different direction to each of the other cameras 11 of the multi-directional image capture apparatus 10. As such, each camera 11 of a multidirectional image capture apparatus 10 may have a different field of view, thus allowing the multi-directional image capture apparatus 10 to capture images of a scene 13 from different perspectives simultaneously.

Similarly, as illustrated in Figure 1, each multi-directional image capture apparatus 10 may be at a different location to each of the other multi-directional image capture apparatuses 10. Thus, each of the plurality of multi-directional image capture apparatuses 10 may capture images of the environment (via their cameras 11) from different perspectives simultaneously.

In the example scenario illustrated in Figure 1, a plurality of multi-directional image capture apparatuses 10 are arranged to capture images of a particular scene 13 within the environment. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and orientation of each of the multi-directional image capture apparatuses 10. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the multi-directional image capture apparatuses 10 relative to each other to be determined, which may be useful for a number of functions. For example, such information may be used for any of: performing 3D reconstruction of the captured environment, 3D registration of multi-directional image capture apparatuses 10 with respect to other sensors such as LiDAR (Light Detection and Ranging) or infrared (IR) depth sensors, audio positioning of audio sources, playback of object-based audio with respect to multi-directional image capture apparatus 10 location, and presenting multi-directional image capture apparatus positions as 'hotspots' to which a viewer can switch during virtual reality (VR) viewing.

One way of determining the positions of multi-directional image capture apparatuses 10 is to use Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the multi-directional image capture apparatuses 10. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high.

Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a multi-directional image capture apparatus 10. Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences. However, when used on images captured by multiple multi-directional image capture apparatuses 10, SfM analysis may be unreliable due to unreliable determination of point correspondences between images.

A computer vision method for performing camera pose registration which may address some or all of the challenges mentioned above will now be described.

Figure 2 illustrates one of the plurality of multi-directional image capture apparatuses 10 of Figure 1. A camera 11 of the multi-directional image capture apparatus 10 may capture a first image 21. The first image 21 may be an image of a scene within the field of view 20 of the camera 11. In some examples, the lens of the camera 11 may be a fish-eye lens and so the first image 21 may be a fish-eye image (in which the camera field of view is enlarged). However, the method described herein may be applicable for use with lenses and resulting images of other types. More specifically, the camera pose registration method described herein may also be applicable to images captured by a camera with a hyperbolic mirror in which the camera optical centre coincides with the focus of the hyperbola and images captured by a camera with a parabolic mirror and an orthographic lens in which all reflected rays are parallel to the mirror axis and the orthographic lens is used to provide a focused image.

The first image 21 may be processed to generate one or more second images 22. More specifically, image re-projection may be performed on the first image 21 to generate one or more re-projected second images 22. For example, if the first image 21 is not a rectilinear image (e.g. a fish-eye image), it may be re-projected to generate one or more second images 22 which are rectilinear images (as illustrated by Figure 2). The type of re-projection may be dependent on the algorithm used to analyse the second images. For instance, as is explained below, a structure from motion algorithm, which is typically used to analyse rectilinear images, may be used, in which case the re-projection may be selected so as to generate rectilinear images. However, it will be appreciated that, in general, the re-projection may generate any type of second image, as long as the image type is compatible with the algorithm used to analyse the re-projected images.
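By way of illustration only, the following is a minimal numpy sketch of such a re-projection, assuming an equidistant fisheye model (r = f·θ) for the first image and a pinhole model for the virtual camera; the function name, parameters and nearest-neighbour sampling are illustrative simplifications, not details taken from this specification.

```python
import numpy as np

def reproject_to_rectilinear(fisheye_img, f_fish, f_virt, out_size, R_virt):
    """Re-project an equidistant fisheye first image onto a rectilinear
    second image seen by a virtual camera with rotation R_virt relative
    to the real camera. Nearest-neighbour sampling for brevity."""
    h, w = out_size
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each virtual pixel to a unit ray in the virtual camera frame.
    rays = np.stack([(u - cx) / f_virt, (v - cy) / f_virt,
                     np.ones_like(u, dtype=float)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate the rays into the real (fisheye) camera frame.
    rays = rays @ R_virt.T
    # Equidistant fisheye model: image radius r = f * theta.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fish * theta
    fh, fw = fisheye_img.shape[:2]
    src_x = np.clip(np.round((fw - 1) / 2.0 + r * np.cos(phi)).astype(int), 0, fw - 1)
    src_y = np.clip(np.round((fh - 1) / 2.0 + r * np.sin(phi)).astype(int), 0, fh - 1)
    return fisheye_img[src_y, src_x]
```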

Each re-projected second image 22 may be associated with a respective virtual camera. A virtual camera is an imaginary camera which does not physically exist, but which corresponds to a camera which would have captured the re-projected second image 22 with which it is associated. A virtual camera is defined by virtual camera parameters which represent the configuration of the virtual camera required in order to have captured the second image 22. As such, for the purposes of the methods and operations described herein, a virtual camera can be treated as a real physical camera. For example, each virtual camera has, among other virtual camera parameters, a position and orientation which can be determined.

When a plurality of re-projected second images 22 are generated (e.g. Figure 2 illustrates nine re-projected second images 22 being generated), each re-projected second image 22 may have a different viewing direction compared to each of the other second images 22. In other words, the virtual camera of each second image 22 may have a different orientation compared to each of the other virtual cameras. Similarly, the orientation of each of the virtual cameras may also be different to the orientation of the real camera 11 which captured the first image 21. Furthermore, each virtual camera may have a smaller field of view than the real camera 11 as a result of the re-projection. The virtual cameras may have overlapping fields of view with each other.

The orientations of the virtual cameras may be pre-set. In other words, the re-projection of the first image 21 may generate second images 22 with associated virtual cameras which each have a certain pre-set orientation relative to the orientation of the real camera 11. For example, the orientation of each virtual camera may be pre-set such that it has certain yaw, pitch and roll angles relative to the real camera 11.
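A hedged sketch of such pre-set virtual camera orientations follows; the particular yaw and pitch values (a 3x3 fan of ±45°) are assumptions for illustration, not values prescribed by this specification.

```python
import numpy as np

def rot_yaw_pitch(yaw, pitch):
    """Rotation matrix applying a pre-set yaw (about y) then pitch (about x),
    both in radians, relative to the real camera's orientation."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    R_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    R_pitch = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return R_yaw @ R_pitch

# Nine virtual cameras: a 3x3 grid of viewing directions around the real camera axis.
preset_orientations = [rot_yaw_pitch(np.radians(y), np.radians(p))
                       for y in (-45.0, 0.0, 45.0) for p in (-45.0, 0.0, 45.0)]
```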

It will be appreciated that, in general, any number of second images 22 may be generated. Generally speaking, generating more second images 22 leads to less distortion in each of the second images 22, but may also increase computational complexity. The precise number of second images may be chosen based on the scene/environment being captured by the multi-directional image capture apparatus 10.

The re-projection process described with reference to Figure 2 may be performed for a plurality of first images 21 respectively captured by a plurality of cameras 11 of the multi-directional image capture apparatus 10. Furthermore, the same process may be performed for each of a plurality of multi-directional image capture apparatuses 10 which are capturing the same general environment, e.g. the plurality of multi-directional image capture apparatuses 10 as illustrated in Figure 1. In this way, all of the first images 21 captured by a plurality of multi-directional image capture apparatuses 10 of a particular scene may be processed as described above. It will be appreciated that the first images 21 may correspond to images of a scene at a particular moment in time. For example, if the plurality of multi-directional image capture apparatuses 10 are capturing video images, a first image 21 may correspond to a single video frame of a single camera 11, and all of the first images 21 may be video frames that are captured at the same moment in time.

Figures 3A to 3C illustrate the process of determining the position and orientation of a multi-directional image capture apparatus 10. In Figures 3A to 3C, each arrow 31, 32, 33 represents the position and orientation of a particular element in a reference coordinate system 30. The base of the arrow represents the position and the direction of the arrow represents the orientation. More specifically, each arrow 31 in Figure 3A represents the position and orientation of a virtual camera associated with a respective second image, each arrow 32 in Figure 3B represents the position and orientation of a real camera 11 (determined based on the positions and orientations of the re-projected second images 22 derived from the first image 21 captured by the real camera), and the arrow 33 in Figure 3C represents the position and orientation of the multi-directional image capture apparatus 10.

After generating the one or more second images, the one or more second images are processed to generate respective positions of the virtual cameras associated with the second images, the generated positions being relative to the reference coordinate system 30. The processing of the one or more second images may also generate respective orientations of the virtual cameras relative to the reference coordinate system 30. The processing may involve processing a plurality of the second images generated from first images captured by a plurality of different multi-directional image capture apparatuses 10.

It will be appreciated that, in order to perform the processing for a plurality of multidirectional image capture apparatuses 10, it may be necessary for the multi-directional image capture apparatuses 10 to have at least partially overlapping fields of view with each other (for example, in order to allow point correspondence determination as described below).

The output of the processing for one multi-directional image capture apparatus is illustrated by Figure 3A. As shown, each cluster 34 of arrows 31 in Figure 3A represents the virtual cameras corresponding to a single first image of a single real camera. The above described processing may be performed by using a structure from motion (SfM) algorithm to determine the position and orientation of each of the virtual cameras. The SfM algorithm may operate by determining point correspondences between various ones of the second images and determining the positions and orientations of the virtual cameras based on the determined point correspondences. For example, the determined point correspondences may impose certain geometric constraints on the positions and orientations of the virtual cameras, which can be used to solve a set of quadratic equations to determine the positions and orientations of the virtual cameras relative to the reference coordinate system 30. More specifically, in some examples, the SfM process may involve any one of or any combination of the following operations: extracting image features, matching image features, estimating camera position, reconstructing 3D points, and performing bundle adjustment.
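A complete SfM implementation is beyond the scope of this description, but the following hedged sketch shows the core two-view step (feature matching followed by essential-matrix estimation and pose recovery) using OpenCV, which is assumed to be available; a full pipeline would additionally triangulate 3D points and run bundle adjustment across all second images.

```python
import cv2
import numpy as np

def two_view_pose(img1, img2, K):
    """Relative pose between two virtual cameras from point correspondences.
    img1, img2: grayscale second images; K: 3x3 virtual camera intrinsics."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Determine point correspondences (feature matching).
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Geometric constraints imposed by the correspondences: essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Recover relative rotation and unit-scale translation of the second camera.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```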

Once the positions of the virtual cameras have been determined, the position of each of the real cameras 11 relative to the reference coordinate system 30 may be determined based on the determined positions of the virtual cameras. Similarly, once the orientations of the virtual cameras have been determined, the orientation of each of the real cameras 11 relative to the reference coordinate system 30 may be determined based on the determined orientations of the virtual cameras. For example, the position of each real camera may be determined by averaging the positions of the virtual cameras corresponding to the real camera. Similarly, the orientation of each real camera may be determined by averaging the orientation of the virtual cameras corresponding to the real camera. In other words, referring to Figures 3A to 3C, each cluster of arrows 34 in Figure 3A may be averaged to obtain a corresponding arrow 32 in Figure 3B.

The above described determination of the positions of each of the real cameras 11 may further involve determining outliers and inliers in the generated positions of the virtual cameras and determining the positions of each of the real cameras 11 based only on the inliers. For example, the above mentioned averaging may involve only averaging the inlier positions. This may improve the accuracy of the determined positions of the real cameras 11.

The inlier and outlier determination may be performed according to:

$$d_i = \left| c_i - \mathrm{Median}(C_{virtual}) \right|, \quad \forall\, c_i \in C_{virtual}$$

$$d_M = \mathrm{Median}\left(\{ d_1, \ldots, d_N \}\right)$$

$$\mathrm{inliers} = \left\{ c_i : \frac{d_i}{d_M} < m \right\}$$

where $C_{virtual}$ is the set of the positions of the virtual cameras, $d_i$ is a measure of the difference between the position of a virtual camera and the median position of all of the virtual cameras, $d_M$ is the median absolute deviation (MAD), and $m$ is a threshold value below which a determined virtual camera position is considered an inlier (for example, $m$ may be set to 2).
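A minimal numpy sketch of this inlier selection follows, interpreting $|\cdot|$ as the Euclidean distance between a virtual camera position and the element-wise median position (a reading consistent with, though not spelled out by, the expressions above).

```python
import numpy as np

def mad_inliers(positions, m=2.0):
    """Keep virtual camera positions c_i whose distance to the median
    position, divided by the median absolute deviation, is below m."""
    positions = np.asarray(positions, dtype=float)   # shape (N, 3)
    d = np.linalg.norm(positions - np.median(positions, axis=0), axis=1)
    d_mad = np.median(d)
    return positions[d / d_mad < m]

# The real camera position may then be taken as the mean of the inliers:
# camera_position = mad_inliers(virtual_positions).mean(axis=0)
```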

It will therefore be understood from the above expressions that a virtual camera may be determined to be an inlier if the difference between its position and the median position of all of the virtual cameras, divided by the median absolute deviation, is less than a threshold value. That is to say, for a virtual camera to be considered an inlier, the difference between its position and the median position of all of the virtual cameras must be less than a threshold number of times larger than the median absolute deviation.

The orientation of each real camera may be determined in the following way. The orientation of each virtual camera may be represented by a rotation matrix $R_v$. Similarly, the orientation of each real camera 11 relative to the reference coordinate system 30 may be represented by a rotation matrix $R_i$. The orientation of each virtual camera relative to its corresponding real camera 11 may be known, as this may be pre-set (as described above with reference to Figure 2), and may be represented by a rotation matrix $R_{vi}$. Thus, the rotation matrix of each virtual camera may be used to obtain a rotation matrix for the real camera 11 according to:

$$R_i = R_v R_{vi}^{-1}$$

Put another way, the rotation matrix of a real camera ($R_i$) may be determined by multiplying the rotation matrix of a virtual camera ($R_v$) onto the inverse of the rotation matrix representing the orientation of the virtual camera relative to the orientation of the real camera ($R_{vi}^{-1}$).

For example, if there are nine virtual cameras corresponding to each real camera (as illustrated in Figure 3) then nine rotation matrices are obtained for the orientation of each real camera 11. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for each real camera 11. The set of Euler angles may then be averaged according to:

$$\bar{\theta} = \arctan\left(\frac{\sum_{i=1}^{9} \sin\theta_i}{\sum_{i=1}^{9} \cos\theta_i}\right)$$

where $\bar{\theta}$ represents the averaged Euler angles for a real camera and $\theta_i$ represents the set of Euler angles. Put another way, the averaged Euler angles are determined by calculating the sum of the sines of the set of Euler angles divided by the sum of the cosines of the set of Euler angles, and taking the arctangent of the ratio. $\bar{\theta}$ may then be converted back into a rotation matrix representing the final determined orientation of the real camera 11.

It will be appreciated that the above formula is for the specific example in which there are nine virtual cameras per real camera 11 - the maximum value of $i$ may vary according to the number of virtual cameras generated.
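The averaging may be sketched as follows; np.arctan2 is used in place of a plain arctangent so that the correct quadrant is selected, a standard circular-mean refinement assumed here rather than stated in the text.

```python
import numpy as np

def average_euler_angles(angles):
    """Average a set of Euler angle triplets (radians), per component:
    arctangent of the summed sines over the summed cosines."""
    angles = np.asarray(angles, dtype=float)   # shape (N, 3): e.g. yaw, pitch, roll
    return np.arctan2(np.sin(angles).sum(axis=0), np.cos(angles).sum(axis=0))
```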

In some examples, unit quaternions may be used instead of Euler angles for the abovementioned process. The use of unit quaternions to represent orientation is a known mathematical technique and will not be described in detail here. Briefly, quaternions $q_1, q_2, \ldots, q_N$ corresponding to the virtual camera rotation matrices may be determined. Then, the quaternions may be transformed, as necessary, to ensure that they are all on the same side of the 4D hypersphere. Specifically, one representative quaternion $q_M$ is selected and the signs of any quaternions $q_i$ where the dot product of $q_M$ and $q_i$ is less than zero may be inverted. Then, all quaternions $q_i$ (as 4D vectors) may be summed into an average quaternion $q_A$, and $q_A$ may be normalised into a unit quaternion $q_A'$. The unit quaternion $q_A'$ may represent the averaged orientation of the camera and may be converted back to other orientation representations as desired. Using unit quaternions to represent orientation may be more numerically stable than using Euler angles.
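A minimal sketch of this quaternion averaging, assuming the quaternions are given as 4D numpy vectors and taking the first quaternion as the representative $q_M$:

```python
import numpy as np

def average_quaternions(quats):
    """Average unit quaternions: sign-align to a representative, sum the
    4D vectors, then normalise back to a unit quaternion."""
    q = np.asarray(quats, dtype=float)          # shape (N, 4)
    q_m = q[0]                                  # representative quaternion q_M
    signs = np.where(q @ q_m < 0.0, -1.0, 1.0)  # flip quaternions on the far side
    q_a = (q * signs[:, None]).sum(axis=0)      # summed (unnormalised) quaternion q_A
    return q_a / np.linalg.norm(q_a)            # unit quaternion q_A'
```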

Once the orientation of each real camera 11 of a multi-directional image capture apparatus 10 is known, the orientation of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined in the following way. The orientation of the multi-directional image capture apparatus 10 may be represented by a rotation matrix $R_{dev}$. The orientation of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 may be known, and may be represented by a rotation matrix $R_{i,dev}$. Thus, the rotation matrices $R_i$ of the real cameras 11 may be used to obtain a rotation matrix for the multi-directional image capture apparatus 10 according to:

$$R_{dev} = R_i R_{i,dev}^{-1}$$

Put another way, the rotation matrix of a multi-directional image capture apparatus ($R_{dev}$) can be determined by multiplying the rotation matrix of a real camera ($R_i$) onto the inverse of the matrix representing the orientation of the real camera relative to the orientation of the multi-directional image capture apparatus ($R_{i,dev}^{-1}$).

For example, if there are six real cameras 11 corresponding to the multi-directional image capture apparatus 10 (as illustrated in Figure 3) then six rotation matrices are obtained for the orientation of the multi-directional image capture apparatus 10. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for the multi-directional image capture apparatus 10. The set of Euler angles may then be averaged and converted into a final rotation matrix representing the orientation of the multi-directional image capture apparatus 10. This may be done using the same process as described above, with corresponding equations. Similarly, as above, unit quaternions may be used instead of Euler angles.

The position of the multi-directional image capture apparatus 10 may be determined in the following way. The position of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 may be known, and may be represented by a vector $v_{i,dev}$. However, $v_{i,dev}$ is relative to a local coordinate system of the multi-directional image capture apparatus. To obtain the position of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 in the reference coordinate system 30, $v_{i,dev}$ may be rotated according to:

$$v^{w}_{i,dev} = R_{dev} \, v_{i,dev}$$

where $R_{dev}$ is the final rotation matrix of the multi-directional image capture apparatus 10 as determined above, and $v^{w}_{i,dev}$ is a vector representing the position of each real camera 11 relative to the multi-directional image capture apparatus 10 in the reference coordinate system 30. As such, the position of a real camera 11 relative to its corresponding multi-directional image capture apparatus (in the reference coordinate system 30) may be determined by multiplying the final rotation matrix of the multi-directional image capture apparatus 10 onto the position of the real camera relative to the multi-directional image capture apparatus in the local coordinate system of the multi-directional image capture apparatus. Therefore, the position of the multi-directional image capture apparatus 10 may be determined according to:

$$c_{dev} = c_i - v^{w}_{i,dev}$$

where $c_i$ represents the position vector of each of the real cameras 11 as determined above, $v^{w}_{i,dev}$ represents the position of each real camera 11 relative to the multi-directional image capture apparatus as determined above, and $C_{dev}$ is the set of position vectors of the multi-directional image capture apparatus 10 in the reference coordinate system. Put another way, a position of the multi-directional image capture apparatus 10 may be determined by taking the difference between the position vector of a real camera 11 and the position vector of the real camera relative to the multi-directional image capture apparatus.
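A minimal sketch of this position computation, assuming the per-camera offsets $v_{i,dev}$ are known in the device's local coordinate frame:

```python
import numpy as np

def device_position_candidates(cam_positions, cam_offsets_local, R_dev):
    """One device position candidate per real camera: c_i minus the
    camera's local offset v_i,dev rotated into the reference frame."""
    c = np.asarray(cam_positions, dtype=float)       # (N, 3), reference frame
    v = np.asarray(cam_offsets_local, dtype=float)   # (N, 3), device frame
    v_world = v @ R_dev.T                            # v_w = R_dev @ v, per row
    return c - v_world                               # the candidate set C_dev
```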

The same inlier and outlier determination and averaging process as described above may then be applied to $C_{dev}$ to obtain a final position for the multi-directional image capture apparatus 10, with the set of positions of the multi-directional image capture apparatus 10 substituted for the determined positions of the virtual cameras.

In examples in which only one second image 22 is generated for each first image 21, and thus only one virtual camera's position and/or orientation is determined, the position of the real camera 11 may simply be determined to be the position of the one virtual camera, and the orientation of the real camera 11 may simply be determined to be the orientation of the one virtual camera.

Once the positions of the real cameras 11 in the reference coordinate system 30 have been determined, a pixel to real world distance conversion factor may be determined. This may be performed by determining the distance between a pair of real cameras 11 on a multi-directional image capture apparatus 10 in both pixels and in a real world distance (e.g. metres). The pixel distance may be determined from the determined positions of the real cameras 11 in the reference coordinate system. The real world distance may be known already from known physical parameters of the multi-directional image capture apparatus 10. The pixel to real world distance conversion factor may then simply be calculated by taking the ratio of the two distances. This may be further refined by calculating the factor based on multiple different pairs of real cameras 11 of the multi-directional image capture apparatus 10, determining outliers and inliers (for example, in the same way as described above), and averaging the inliers to obtain a final pixel to real world distance conversion factor. The pixel to real world distance conversion factor is denoted $S_{pixel2meter}$ in the present specification.
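The following sketch illustrates the conversion factor computed over all camera pairs; the median is used here as a simple robust stand-in for the inlier/outlier filtering and averaging described above.

```python
import numpy as np

def pixel_to_meter_factor(cam_positions_px, cam_positions_m):
    """S_pixel2meter from all camera pairs: pixel-space distance divided
    by the known physical distance, robustly aggregated."""
    factors = []
    n = len(cam_positions_px)
    for i in range(n):
        for j in range(i + 1, n):
            d_px = np.linalg.norm(np.subtract(cam_positions_px[i], cam_positions_px[j]))
            d_m = np.linalg.norm(np.subtract(cam_positions_m[i], cam_positions_m[j]))
            factors.append(d_px / d_m)
    return float(np.median(factors))   # median as a stand-in for MAD filtering
```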

In addition, an up-vector of each of the multi-directional image capture apparatuses 10 may also be determined based on the determined positions of the real cameras 11. As illustrated in Figure 4, this may be performed by determining two vectors $V_1$ and $V_2$ between the position of one of the real cameras 11 and the positions of two other real cameras 11. As such, the up-vector may be determined based on the positions of a group of three real cameras 11. The up-vector may be determined by determining the cross product of $V_1$ and $V_2$ in accordance with the right-hand rule. As illustrated in Figure 4, $V_3$ is the result of the cross product of $V_1$ and $V_2$ and represents the direction of the up-vector. $V_3$ may be normalised to obtain a unit vector representing the up-vector.
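A minimal sketch of the up-vector computation from three camera positions:

```python
import numpy as np

def up_vector(p0, p1, p2):
    """Up-vector of a device from three of its camera positions, assumed
    to lie in a plane perpendicular to gravity (right-hand rule)."""
    v1 = np.asarray(p1, dtype=float) - np.asarray(p0, dtype=float)  # V1
    v2 = np.asarray(p2, dtype=float) - np.asarray(p0, dtype=float)  # V2
    v3 = np.cross(v1, v2)              # V3 = V1 x V2
    return v3 / np.linalg.norm(v3)     # normalised unit up-vector
```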

It will be appreciated that the up-vector of a multi-directional image capture apparatus 10 may be defined based on a group of real cameras 11 of the multi-directional image capture apparatus 10 which are, in normal use, intended to be in a plane that is perpendicular to gravity. As such, the up-vector may be another representation of the orientation of the multi-directional image capture apparatus 10. Further, if it is assumed that the multidirectional image capture apparatus 10 is placed in an orientation in which the plane of the cameras in the group is actually perpendicular to gravity, the up-vector may correspond to the real world up-vector (the vector opposite in direction to the local gravity vector). The up-vector may provide further information which can be used in 3D reconstruction of the captured environment. In some instances, the reference coordinate system discussed herein may not correspond exactly with the real world (for instance, the "up" direction in the reference coordinate system may not correspond with "up" direction in the real world). As such, assuming that the multi-directional image capture apparatuses were/are being used in a level orientation, the calculated up-vector may allow a 3D reconstruction of the captured environment to be aligned with the real world (e.g. by ensuring that the up-vector is pointing in an up direction in the 3D reconstruction).

As above, a set of up-vectors may be determined for each multi-directional image capture apparatus 10 based on determining $V_1$, $V_2$ and $V_3$ for a plurality of different groups of three cameras. Outliers and inliers may then be determined (in the same way as above, with the set of determined up-vectors substituted for the determined positions of the virtual cameras) and a final up-vector may be determined based only on the inliers (e.g. by averaging the inliers). Once the up-vector is determined, it may be rotated to align with a known local gravity vector (which is, for instance, determined using an accelerometer forming part of, or otherwise co-located with, the multi-directional image capture apparatus 10) to determine the real world up-vector in the reference coordinate system 30 (if it is not already aligned).

Once final positions for a plurality of multi-directional image capture apparatuses 10 have been determined, the relative positions of the plurality of multi-directional image capture apparatuses may be determined according to:

$$t_{ij} = \frac{c^{j}_{dev} - c^{i}_{dev}}{S_{pixel2meter}}$$

In the above equation, $t_{ij}$ represents the relative position of one of the plurality of multi-directional image capture apparatuses (apparatus $j$) relative to another one of the plurality of multi-directional image capture apparatuses (apparatus $i$). $c^{j}_{dev}$ is the position of apparatus $j$ and $c^{i}_{dev}$ is the position of apparatus $i$. $S_{pixel2meter}$ is the pixel to real world distance conversion factor.

As will be understood from the above expression, a vector representing the relative position of one of the plurality of multi-directional image capture apparatuses relative to another one of the plurality of multi-directional image capture apparatuses may be determined by taking the difference between their positions. This may be divided by the pixel-to-real world distance conversion factor depending on the scale desired.
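As a worked illustration (a hypothetical helper, not part of the specification):

```python
import numpy as np

def relative_position(c_dev_j, c_dev_i, s_pixel2meter):
    """t_ij: position of apparatus j relative to apparatus i, converted
    from pixel units into metres by S_pixel2meter."""
    return (np.asarray(c_dev_j, dtype=float)
            - np.asarray(c_dev_i, dtype=float)) / s_pixel2meter
```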

As such, the positions of all of the multi-directional image capture apparatuses 10 relative to one another may be determined in the reference coordinate system 30.

Figure 5 is a flowchart showing examples of operations as described herein.

At operation 5.1, a plurality of first images 21 which are captured by a plurality of multidirectional image capture apparatuses 10 may be received. For example, image data corresponding to the first images 21 may be received at computing apparatus 60 (see Figure 6). At operation 5.2, image re-projection may be performed on each of the first images 21 to obtain one or more re-projected second images 22 corresponding to respective virtual cameras. At operation 5.3, the second images 22 may be processed to obtain positions and orientations of the virtual cameras. For example, the second images 22 may be processed using a structure from motion algorithm.

At operation 5.4, positions and orientations of real cameras 11 may be determined based on the positions and orientations of the virtual cameras determined at operation 5.3.

At operation 5.5, a pixel-to-real world distance conversion factor may be determined based on the positions of the real cameras 11 determined at operation 5.4. At operation 5.6, an up-vector of each multi-directional image capture apparatus 10 may be determined based on the positions of the real cameras 11 determined at operation 5.4.

At operation 5.7, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the real cameras 11 determined at operation 5.4.

At operation 5.8, positions of the plurality of multi-directional image capture apparatuses 10 relative to each other may be determined based on the positions of the plurality of multi-directional image capture apparatuses 10 determined at operation 5.7.

It will be appreciated that the position of a real camera 11 as described herein may be the position of the centre of a lens of the real camera 11. Similarly, the position of a virtual camera may be the position of the centre of a virtual lens of the virtual camera. The position of the multi-directional image capture apparatus 10 may be the centre of the multi-directional image capture apparatus (e.g. if a multi-directional image capture apparatus is spherically shaped, its position may be defined as the geometric centre of the sphere).

Figure 6 is a schematic block diagram of an example configuration of computing apparatus 60, which may be configured to perform any of or any combination of the operations described herein. The computing apparatus 60 may comprise memory 61, processing circuitry 62, an input 63, and an output 64. The processing circuitry 62 may be of any suitable composition and may include one or more processors 62A of any suitable type or suitable combination of types. For example, the processing circuitry 62 may be a programmable processor that interprets computer program instructions and processes data. The processing circuitry 62 may include plural programmable processors. Alternatively, the processing circuitry 62 may be, for example, programmable hardware with embedded firmware. The processing circuitry 62 may be termed processing means. The processing circuitry 62 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 62 may be referred to as computing apparatus.

The processing circuitry 62 described with reference to Figure 6 is coupled to the memory 61 (or one or more storage devices) and is operable to read/write data to/from the memory. The memory 61 may store thereon computer readable instructions 612A which, when executed by the processing circuitry 62, may cause any one of or any combination of the operations described herein to be performed. The memory 61 may comprise a single memory unit or a plurality of memory units upon which the computer-readable instructions (or code) 612A is stored. For example, the memory 61 may comprise both volatile memory 611 and non-volatile memory 612. For example, the computer readable instructions 612A may be stored in the non-volatile memory 612 and may be executed by the processing circuitry 62 using the volatile memory 611 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memories 61 in general may be referred to as non-transitory computer readable memory media.

The input 63 may be configured to receive image data representing the first images 21 described herein. The image data may be received, for instance, from the multi-directional image capture apparatuses 10 themselves or may be received from a storage device. The output 64 may be configured to output any of or any combination of the camera pose registration information described herein. As discussed above, the camera pose registration information output by the computing apparatus 60 may be used for various functions as described above with reference to Figure 1.

Figure 7 illustrates an example of a computer-readable medium 70 with computer-readable instructions (code) stored thereon. The computer-readable instructions (code), when executed by a processor, may cause any one of or any combination of the operations described above to be performed.

Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, whether instructions for a processor, or configuration settings for a fixed-function device, gate array, programmable logic device, etc.

As used in this application, the term "circuitry" refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagram of Figure 5 is an example only and that various operations depicted therein may be omitted, reordered and/or combined. For example, it will be appreciated that operations 5.5 and 5.6 as illustrated in Figure 5 may be omitted.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.