
Title:
MULTIPLE VIEW COLOUR RECONSTRUCTION
Document Type and Number:
WIPO Patent Application WO/2018/078222
Kind Code:
A1
Abstract:
This specification describes a method comprising, for each of plural constituent portions of input image data (311) captured by at least one image sensor overlaid with a colour filter array, determining a depth value (315) for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture (313) and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array. The method further comprises projecting pixels of the input image data into an output image perspective (321) based on the determined depth values (312), the associated directions of capture and/or the associated locations of capture, and information describing the output image perspective. The method further comprises performing colour reconstruction (330) based on the pixels of the input image data projected into the output image perspective.

Inventors:
ROIMELA KIMMO (FI)
Application Number:
PCT/FI2017/050744
Publication Date:
May 03, 2018
Filing Date:
October 30, 2017
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
H04N21/218; G06T15/20; H01L31/0216
Foreign References:
EP2327059B1 (2014-08-27)
US20100259595A1 (2010-10-14)
US20100097444A1 (2010-04-22)
Other References:
ZITNICK, CL. ET AL.: "High-quality video view interpolation using a layered representation", ACM TRANSACTIONS ON GRAPHICS (TOG) - PROCEEDINGS OF ACM SIGGRAPH 2004, vol. 23, no. 3, 12 August 2004 (2004-08-12), pages 600 - 608, XP002354522, Retrieved from the Internet [retrieved on 20180212]
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
Claims

1. A method comprising:

for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, determining a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array;

projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and

performing colour reconstruction based on the pixels of the input image data projected into the output image perspective.

2. The method of claim 1, wherein projecting pixels of the input image data into the output image perspective comprises associating each pixel in the output image perspective with one of the determined depth values.

3. The method of claim 2, wherein associating each pixel in the output image perspective with one of the determined depth values comprises:

projecting the determined depth values into three-dimensional space based on the associated directions of capture and/or the associated locations of capture; and

projecting the projected depth values into the output image perspective such that each pixel location in the output image perspective is associated with a single depth value.

4. The method of claim 3, wherein projecting the projected depth values into the output image perspective comprises:

responding to an instance of more than one of the projected depth values corresponding with a pixel location in the output image perspective by associating only the depth value indicating the shortest depth with the pixel location.

5. The method of any of claims 2 to 4, wherein projecting the pixels of the input image data into the output image perspective comprises:

for each output pixel in the output image perspective:

estimating an orientation in three-dimensions of a plane of a surface of an imaged object at the location of the output pixel based on the depth values associated with locations of neighbouring output pixels in the output image perspective; and projecting one or more non-occluded pixels of the input image data which contribute to the output pixel onto the plane.

6. The method of claim 5, wherein performing colour reconstruction based on the pixels of the input image data projected into the output image perspective comprises:

colour-reconstructing each output pixel in the output image perspective based on the contributing pixels of the input image data which have been projected onto the plane estimated for that output pixel.

7. The method of claim 6, wherein colour-reconstructing each output pixel in the output image perspective comprises:

for each output pixel, computing a weighted average of the projected contributing pixels associated with each colour of the colour filter array.

8. The method of claim 7, wherein each contributing pixel is given a weight that has an inverse relationship with a distance between the location of projection onto the plane of the contributing pixel and the location of the output pixel.

9. The method of any preceding claim, wherein at least two of the plural constituent portions of input image data were captured with different rotational orientations relative to their respective direction of capture.

10. The method of any preceding claim, wherein the method is performed at the image capture system that captures the input image data.

11. Apparatus configured to perform the method of any preceding claim.

12. An image capture system configured to perform the method of any of claims 1 to 10.

13. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform the method of any of claims 1 to 10.

14. Apparatus comprising:

at least one processor; and

at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus:

for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, to determine a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array;

to project pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and

to perform colour reconstruction based on the pixels of the input image data projected into the output image perspective.

15. The apparatus of claim 14, wherein causing the apparatus to project pixels of the input image data into the output image perspective comprises causing the apparatus to associate each pixel in the output image perspective with one of the determined depth values.

16. The apparatus of claim 15, wherein causing the apparatus to associate each pixel in the output image perspective with one of the determined depth values comprises causing the apparatus:

to project the determined depth values into three-dimensional space based on the associated directions of capture and/or the associated locations of capture; and

to project the projected depth values into the output image perspective such that each pixel location in the output image perspective is associated with a single depth value.

17. The apparatus of claim 16, wherein causing the apparatus to project the projected depth values into the output image perspective comprises causing the apparatus:

to respond to an instance of more than one of the projected depth values corresponding with a pixel location in the output image perspective by associating only the depth value indicating the shortest depth with the pixel location.

18. The apparatus of claim 15, wherein causing the apparatus to project the pixels of the input image data into the output image perspective comprises causing the apparatus:

for each output pixel in the output image perspective:

to estimate an orientation in three-dimensions of a plane of a surface of an imaged object at the location of the output pixel based on the depth values associated with locations of neighbouring output pixels in the output image perspective; and to project one or more non-occluded pixels of the input image data which contribute to the output pixel onto the plane.

19. The apparatus of claim 18, wherein causing the apparatus to perform colour reconstruction based on the pixels of the input image data projected into the output image perspective comprises causing the apparatus:

to colour-reconstruct each output pixel in the output image perspective based on the contributing pixels of the input image data which have been projected onto the plane estimated for that output pixel.

20. The apparatus of claim 19, wherein causing the apparatus to colour-reconstruct each output pixel in the output image perspective comprises causing the apparatus:

for each output pixel, to compute a weighted average of the projected contributing pixels associated with each colour of the colour filter array.

21. The apparatus of claim 20, wherein causing the apparatus to compute a weighted average of the projected contributing pixels associated with each colour of the colour filter array comprises causing the apparatus to give each contributing pixel a weight that has an inverse relationship with a distance between the location of projection onto the plane of the contributing pixel and the location of the output pixel.

22. The apparatus of claim 14, wherein at least two of the plural constituent portions of input image data were captured with different rotational orientations relative to their respective direction of capture.

23. The apparatus of claim 14, wherein the apparatus is an image capture system.

24. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least:

for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, determining a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array;

projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and

performing colour reconstruction based on the pixels of the input image data projected into the output image perspective.

25. Apparatus comprising:

means for determining, for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array;

means for projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and

means for performing colour reconstruction based on the pixels of the input image data projected into the output image perspective.

Description:
Multiple View Colour Reconstruction

Field

This specification relates to the field of image processing, particularly that of colour reconstruction. More specifically, the specification relates to the colour reconstruction of image data describing multiple different perspectives of a scene.

Background

Digital cameras typically utilise a monochrome image sensor overlaid with a colour filter array (e.g. RGB) to enable the capture of colour information. The colour filter array is commonly arranged in a so-called Bayer pattern, where each group of 2x2 pixels on the image sensor receives one red, one blue, and two green pixels. This pattern is then reconstructed into a full-colour image at the resolution of the image sensor by estimating the intensity of the two missing colour channels for each pixel from the pixels in the neighbourhood.
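As a point of reference only, a minimal single-view reconstruction of this kind might be sketched as follows. This is a simple bilinear-style interpolation over an assumed RGGB Bayer layout using numpy/scipy; it is not the multi-view method described later in this specification.

```python
import numpy as np
from scipy.ndimage import convolve

def debayer_bilinear(mosaic):
    """Simple single-view demosaic of an RGGB Bayer mosaic (H x W, float).

    Missing colour channels at each pixel are estimated from neighbouring
    samples of those colours, as described in the text above.
    """
    h, w = mosaic.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Which colour each photo-site samples (RGGB phase is an assumption).
    r_mask = ((rows % 2 == 0) & (cols % 2 == 0)).astype(float)
    b_mask = ((rows % 2 == 1) & (cols % 2 == 1)).astype(float)
    g_mask = 1.0 - r_mask - b_mask

    out = np.zeros((h, w, 3))
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    for c, mask in enumerate((r_mask, g_mask, b_mask)):
        samples = mosaic * mask
        # Normalised convolution: average only the available neighbours.
        interp = convolve(samples, k, mode="mirror") / np.maximum(
            convolve(mask, k, mode="mirror"), 1e-9)
        # Keep the measured sample where this colour was actually captured.
        out[..., c] = np.where(mask > 0, mosaic, interp)
    return out
```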

Colour reconstruction (e.g. that involved in "debayering") is non-trivial and sometimes results in colour artefacts, such as Moiré patterns in high-frequency details or aliasing at sharp edges. To reduce these issues, an optical low-pass filter is typically placed in front of the colour filter array and the imaging sensor. However, while this may reduce the presence and extent of colour artefacts, imaging resolution is also reduced.

Multiple capture systems (such as Nokia's OZO®) may differ from common digital cameras in that they have several imaging units (each comprising an image sensor overlaid with a colour filter array), each of which captures a separate colour image. These images can then be combined in post-processing into panoramic and/or stereo images.

The imaging resolution of sensors commonly used in multiple capture systems is relatively poor, particularly for VR use cases, and the resolution of the captured content is nowhere near the visual acuity of human vision. Increasing sensor resolution poses its own problems, however, as smaller photo-sites can result in reduced sensitivity to light, more noise, and lower dynamic range. As such, engineers in the field are looking for other ways in which to improve the resolution of the images produced by multiple capture devices.

Summary

In a first aspect, this specification describes a method comprising, for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, determining a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array. The method further comprises projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective. The method further comprises performing colour reconstruction based on the pixels of the input image data projected into the output image perspective.

Projecting pixels of the input image data into the output image perspective may comprise associating each pixel in the output image perspective with one of the determined depth values. Associating each pixel in the output image perspective with one of the determined depth values may comprise projecting the determined depth values into three-dimensional space based on the associated directions of capture and/or the associated locations of capture, and projecting the projected depth values into the output image perspective such that each pixel location in the output image perspective is associated with a single depth value. Projecting the projected depth values into the output image perspective may comprise responding to an instance of more than one of the projected depth values corresponding with a pixel location in the output image perspective by associating only the depth value indicating the shortest depth with the pixel location.

Projecting the pixels of the input image data into the output image perspective may comprise, for each output pixel in the output image perspective, estimating an orientation in three-dimensions of a plane of a surface of an imaged object at the location of the output pixel based on the depth values associated with locations of neighbouring output pixels in the output image perspective, and projecting one or more non-occluded pixels of the input image data which contribute to the output pixel onto the plane. Performing colour reconstruction based on the pixels of the input image data projected into the output image perspective may comprise colour-reconstructing each output pixel in the output image perspective based on the contributing pixels of the input image data which have been projected onto the plane estimated for that output pixel. Colour-reconstructing each output pixel in the output image perspective may comprise, for each output pixel, computing a weighted average of the projected contributing pixels associated with each colour of the colour filter array. Each contributing pixel may be given a weight that has an inverse relationship with a distance between the location of projection onto the plane of the contributing pixel and the location of the output pixel. At least two of the plural constituent portions of input image data may have been captured with different rotational orientations relative to their respective direction of capture.

Additionally or alternatively, the method may be performed at the image capture system that captures the input image data.

In a second aspect, this specification describes apparatus configured to perform any method as described with reference to the first aspect.

In a third aspect, this specification describes an image capture system configured to perform any method as described with reference to the first aspect.

In a fourth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the first aspect.

In a fifth aspect, this specification describes apparatus comprising at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus: for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, to determine a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array; to project pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and to perform colour reconstruction based on the pixels of the input image data projected into the output image perspective. Causing the apparatus to project pixels of the input image data into the output image perspective may comprise causing the apparatus to associate each pixel in the output image perspective with one of the determined depth values. Causing the apparatus to associate each pixel in the output image perspective with one of the determined depth values may comprise causing the apparatus to project the determined depth values into three-dimensional space based on the associated directions and/or locations of capture and to project the projected depth values into the output image perspective such that each pixel location in the output image perspective is associated with a single depth value. Causing the apparatus to project the projected depth values into the output image perspective may comprise causing the apparatus to respond to an instance of more than one of the projected depth values corresponding with a pixel location in the output image perspective by associating only the depth value indicating the shortest depth with the pixel location. Causing the apparatus to project the pixels of the input image data into the output image perspective may comprise causing the apparatus, for each output pixel in the output image perspective, to estimate an orientation in three-dimensions of a plane of a surface of an imaged object at the location of the output pixel based on the depth values associated with locations of neighbouring output pixels in the output image perspective, and to project one or more non-occluded pixels of the input image data which contribute to the output pixel onto the plane. Causing the apparatus to perform colour reconstruction based on the pixels of the input image data projected into the output image perspective may comprise causing the apparatus to colour-reconstruct each output pixel in the output image perspective based on the contributing pixels of the input image data which have been projected onto the plane estimated for that output pixel. Causing the apparatus to colour-reconstruct each output pixel in the output image perspective may comprise causing the apparatus, for each output pixel, to compute a weighted average of the projected contributing pixels associated with each colour of the colour filter array.
Causing the apparatus to compute a weighted average of the projected contributing pixels associated with each colour of the colour filter array may comprise causing the apparatus to give each contributing pixel a weight that has an inverse relationship with a distance between the location of projection onto the plane of the contributing pixel and the location of the output pixel.

At least two of the plural constituent portions of input image data may have been captured with different rotational orientations relative to their respective direction of capture.

Alternatively or additionally, the apparatus may be an image capture system.

In a sixth aspect, this specification describes a computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least, for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, determining a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array; projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture and information describing the output image perspective; and performing colour reconstruction based on the pixels of the input image data projected into the output image perspective. The computer-readable code stored on the medium of the sixth aspect may further cause performance of any of the operations described with reference to the method of the first aspect.

In a seventh aspect, this specification describes apparatus comprising means for determining, for each of plural constituent portions of input image data captured by at least one image sensor overlaid with a colour filter array, a depth value for each input image pixel in the constituent portion, wherein each constituent portion of the input image data has a different associated direction of capture and/or a different associated location of capture and represents a different view of a scene, and wherein each pixel of input image data is associated with a respective one of the colours of the colour filter array. The apparatus further comprises means for projecting pixels of the input image data into an output image perspective based on the determined depth values, the associated directions and/or locations of capture, and information describing the output image perspective. The apparatus further comprises means for performing colour reconstruction based on the pixels of the input image data projected into the output image perspective. The apparatus of the seventh aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.

Brief Description of the Figures

For better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:

Figure 1 is an example of a multiple image capture system configured to capture multiple images with various different perspectives of a scene;

Figure 2 is a schematic illustration of part of an image capture device, specifically an image sensor overlaid with a colour filter array;

Figure 3 is a functional block diagram of image processing apparatus for performing colour reconstruction based on input image data describing multiple different perspectives of a scene;

Figure 4 is a flow chart illustrating various operations which may be performed by the image processing apparatus of Figure 3;

Figure 5 is a schematic illustration of an example hardware configuration of the image processing apparatus of Figure 3; and

Figure 6 is an illustration of a computer-readable medium upon which computer readable code may be stored.

Detailed Description

In the description and drawings, like reference numerals refer to like elements throughout.

Figure 1 is an example of a multiple image capture system 100 configured to capture multiple images with various different perspectives of a scene. In the example of Figure 1, the multiple image capture system 100 is configured to capture image data from a plurality of directions, e.g. simultaneously, by way of plural image sensors. In other examples, however, the multiple image capture system 100 may be configured to capture one image at a time, with the perspective of the capturing image sensor being changed between each image capture by changing the capture direction and/or the location of the sensor.

Multiple image capture systems 100 (which may be referred to as multiple capture systems or, simply, multi-capture systems) may be used, for instance, to generate panoramic or immersive images or video.

In the example of Figure 1, the multi-capture system 100 comprises a plurality of image capture devices 110, at least some of which are arranged to capture image data from different perspectives of a scene. In the example of Figure 1, each image capture device 110 has a different orientation (or, put another way, faces in a different direction of capture) relative to the other capture devices. In other examples, some of the capture devices 110 may have the same or a similar orientation but their different location results in a different perspective being captured. Each of the image capture devices 110 may comprise a camera module. Although the capture devices of the system 100 of Figure 1 are integrated into a single device, it will be appreciated that the system may instead be formed of separate capture devices.

As illustrated in a simplified manner in Figure 2, each capture device 110 comprises an image sensor 111 and a colour filter array (CFA) 112. The image sensor 111 comprises an array of light sensitive regions (photo-sites) 113, each of which is configured to output a sample (or pixel), the value of which is indicative of the intensity of light falling on the region. For simplicity, only four of the light sensitive regions have been given reference numerals (113a to 113d). However, it will be understood that each of the boxes of which the image sensor 111 of Figure 2 is made up is a light sensitive region.

Although shown offset from the image sensor 111, the CFA 112 overlies the image sensor 111. The CFA includes a plurality of colour filtering regions 114 configured to filter light passing through the colour filtering region and falling on the corresponding light sensitive region below. Each colour filtering region 114 permits light of a specific colour (in the example of Figure 2, one of red, green and blue). The colour filtering regions 114 are arranged in a particular pattern, depending on the type of the CFA 112. In this example, the CFA 112 is a Bayer filter so the colour filtering regions are arranged accordingly (each 2x2 group of regions including two green regions and one each of blue and red). However, it will be appreciated that the concepts described herein are not limited to use with Bayer filters; they are also applicable to CFAs of different types.

As a result of the CFA 112 overlying the image sensor 111, the output from each image sensing region is associated with a particular colour of the colour filter arrangement. Specifically, the output from each image sensing region is indicative of an intensity of light, which is incident on the image sensing region, of the specific colour of the corresponding region of the CFA. For instance, as can be seen from Figure 2, the output of sensing region 113a is associated with the colour red R, the outputs from sensing regions 113b and 113c are associated with the colour green G, and the output from sensing region 113d is associated with the colour blue B.
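Purely as an illustration, the association between photo-site position and CFA colour could be represented as below. The RGGB phase of the pattern is an assumption made for this sketch; other CFA types would use a different mapping.

```python
import numpy as np

def cfa_colour_index(height, width):
    """Return an array whose entry (y, x) names the CFA colour associated
    with the photo-site at that position: 0 = red, 1 = green, 2 = blue.

    Assumes an RGGB Bayer layout consistent with Figure 2.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    idx = np.full((height, width), 1)            # green by default
    idx[(rows % 2 == 0) & (cols % 2 == 0)] = 0   # red sites
    idx[(rows % 2 == 1) & (cols % 2 == 1)] = 2   # blue sites
    return idx

# Each raw sample from the sensor can then be tagged with its colour, e.g.:
# colour_of_sample = cfa_colour_index(h, w)[y, x]
```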

Image data derived from the image sensor 111 as a whole indicates the intensity of light incident on each of the image sensing regions. Based on knowledge of the configuration of the CFA 112 and the image data, it is possible to reconstruct a full colour image at the resolution of the image sensor 111. This is called colour reconstruction or demosaicing and, in the specific case in which a Bayer filter has been used, may be referred to as "debayering". In the example of a Bayer filter, reconstruction may include estimating the intensity of the two missing colour channels for each pixel from the samples in the neighbourhood.

This specification describes processing methods for performing colour reconstruction on image data derived from a multiple capture (or multi-view) system. As will be understood from the below description, the colour reconstruction processing described herein may result in the number of image data samples available per output image pixel and/or the resolution of the output image being increased, without increasing the resolution of the individual capturing image sensor(s). Specifically, the resolution of the output image may be increased compared to the output image resolution obtainable for each lens/sensor combination and also compared to the resolution of panoramas or other virtual reality renderings stitched from the individual output images. The benefits of a higher number of samples and/or a higher resolution are more pronounced the more overlapping views are included in the input image data.

In addition, multi-view colour-reconstruction as described herein may also decouple the input images from the output images. This may enable, for instance, less redundancy when coding the output image(s). As an example, and as will be appreciated from the below description, input image data representing multiple different views of a scene (each including at least one overlapping region) can be processed using the multi-view colour reconstruction process described herein with the output being, for instance, a single panoramic image. This may be particularly beneficial if the multi-view colour-reconstruction is performed at the multi-capture system, in which case the input image data may not need to be permanently stored or transmitted.

Methods for performing colour reconstruction on input image data derived from a multiple capture system will now be described with reference to Figures 3 and 4. As will be appreciated, the colour-reconstruction described herein is performed based on the whole input image data and not based on one constituent portion of the data (each of which represents a different view of a scene) at a time. Figure 3 is a schematic illustration of image processing apparatus 300 comprising functional blocks (in the form of hardware and/or software) which may function together to output at least one reconstructed colour image from input image data representing multiple views of a scene, for instance derived from a multiple capture system 100, such as that illustrated in Figure 1. Figure 4 is a flow chart illustrating various operations which may be performed so as to process input image data derived from a multiple capture system in such a way as to arrive at a colour-reconstructed image.

The image processing apparatus 300 of Figure 3 comprises a depth-mapping function 310. The depth mapping function is configured to receive input image data. As discussed above, the input image data is derived from at least one image sensor overlaid with a colour filter array. The input image data is formed of constituent portions of image data each having a different associated direction of image capture (put another way, the direction in which the sensor is facing when capturing the constituent portion of image data) and/or a different location of capture (put another way, the location of the sensor when capturing the constituent portion of image data). As such, each constituent portion of input image data represents a different view or perspective of a scene. Each constituent portion includes pixels or samples representative of intensity of light incident at each image sensing region of the image sensor which captures it. Each pixel or sample is associated with a particular colour of the CFA 112 overlying the image sensor during image capture. In some examples, each constituent portion of image data may be derived from a different image sensor. In other examples, each constituent portion of image data may be captured using the same image sensor (or group of image sensors) but at different times.

The depth mapping function 310 is configured to compute depth map data 312. The depth map data 312 includes depth values corresponding to each pixel in input image data 311. The depth value is indicative of the distance of the object represented by the pixel from the multiple capture system 100. As will be described in more detail below with reference to Figure 4, the depth map data 312 may be generated based on the input image data 311 and/or depth sensor data 314. Depth sensor data 314 may be generated by one or more depth sensors 315.

The depth map data 312 is registered relative to the multiple capture system 100. In this way, it is possible to determine which portion of depth map data 312 is associated with which part of the scene surrounding the multiple capture system 100. More specifically, it is possible to determine which depth value is associated with which specific location within the scene.

The depth map data 312 that is generated by the depth mapping function 310 is passed to the pixel projection function 320. The pixel projection function 320 is configured to project pixels of the input image data 311 into an output image perspective (or view). The pixel projection is performed based on the input image data 311, the depth map data 312, and information descriptive of the output image 321. The information descriptive of the output image 321 describes the characteristics of the one or more output images which are to be reconstructed. Such characteristics may include one or more of the number of output images, their geometry, their virtual location (that is, the location of the output image relative to the capture system in terms of distance and/or direction), their resolution, and their orientation.

The pixel projection function 320 may be configured to project pixels of the input image data 311 into one or more output image views by associating each pixel in the output image view ("output pixel") with a depth value from the depth map data 312. Subsequently, the pixel projection function 320 projects the input image pixels which correspond with the output image view into the output image view based on the depth value associated with each output pixel.

Either or both of the depth mapping function 310 and the pixel projection function 320 may be configured to utilise camera configuration information 313 which describes the configuration of the multiple capture system during capture of the input image data 311. Such configuration information 313 may describe the intrinsic and extrinsic parameters of each captured view. For instance, the configuration information 313 may specify one or more of the number of captured views, the direction of capture associated with each view, the field of view associated with each of the different views, the location of the sensor(s) when capturing each view, the rotational orientation of the sensor relative to the lens axis, lens distortion, and the image centre-point on the sensor. For instance, the depth mapping function 310 may use the camera configuration information 313 for registration to the camera system and/or to generate a three-dimensional depth map. Alternatively, this projection of the depth map data into three dimensions may be performed by the pixel projection function 320. The pixel projection function 320 may also (or alternatively) utilise the camera configuration information 313 for performing the projection of the input image pixels into the output image view.

The projected input pixel data 322 generated by the pixel projection function 320 is passed to a colour reconstruction function 330. The colour reconstruction function is configured to colour-reconstruct (for instance, to "debayer") the projected input pixel data 322 thereby to produce output image data 331 representing one or more output images. The output image data 331 may then be passed to memory 350 for storage or to a display device 340 for display.
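By way of a hedged illustration, the per-view configuration information 313 might be held in a structure along the following lines; the field names are illustrative assumptions rather than identifiers used in this specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class ViewConfiguration:
    """Intrinsic and extrinsic parameters for one captured view."""
    position: np.ndarray          # 3-vector: location of the sensor at capture
    rotation: np.ndarray          # 3x3 matrix: direction of capture and roll
    focal_length_px: float        # focal length expressed in pixels
    principal_point: Tuple[float, float]  # image centre-point on the sensor (cx, cy)
    distortion: np.ndarray        # lens distortion coefficients
    field_of_view_deg: float      # horizontal field of view

@dataclass
class CameraConfiguration:
    """Camera configuration information 313: one entry per captured view."""
    views: List[ViewConfiguration] = field(default_factory=list)
```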

The functionality described above with reference to Figure 3 will now be described in more detail with respect to the flow chart of Figure 4.

In operation S4.1, the input image data is received by the image processing apparatus 300. The method of Figure 4 may be performed post-capture, in which case the input image data is captured and stored and, subsequently, the input image data is retrieved from storage and provided (in operation S4.1) for processing. In such examples, the image processing apparatus 300, for instance as illustrated in Figure 3, may be located remotely from the multiple capture system 100. Alternatively, the input image data may be received from the multiple capture system 100 in substantially real-time. In such examples, the image processing apparatus 300 may be communicatively coupled with the multiple capture system 100. For instance, the functionality of the image processing apparatus 300 may be performed by the multiple capture system 100.

In operation S4.2, the image processing apparatus 300 determines a depth value associated with each pixel of input image data. Put another way, the image processing apparatus 300 creates depth map data 312.

Generation of the depth map data 312 can be performed in a number of ways. For instance, it may be generated by visual depth mapping analysis performed on the input image data. Alternatively, or in combination with visual depth mapping analysis, it may be generated using data 314 from external sensors 315. This data may be collected at the time of image capture using sensors 315 which are co-located with the multiple capture system 100. Such sensors may include but are not limited to LiDAR, time-of-flight infrared, and/or ultrasound sensors. Such techniques (both visual and sensor-based) produce a dense map of distances to the scene elements around the multiple capture system 100. Preferably, the depth map data 312 comprises a specific depth value per input image pixel. This may be achieved using purely visual analysis or, alternatively, may be achieved using depth sensor data with refinement based on visual analysis. However, as will be appreciated, the process may be implemented without a specific depth value associated with each input image pixel. Instead, groups of input image pixels (e.g. a 2x2 group) may be associated with a specific depth value.

In examples in which visual depth mapping analysis is employed, it may be performed using the green image channel only, thereby reducing the amount of data processing required. Alternatively, depths based on all colour channels (e.g. red, green, and blue) may be computed. In such examples, the best one of the channels may then be selected by voting between the three based on the saliency of features in each of the channels.
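One possible, non-authoritative realisation of the per-channel voting is sketched below, using local gradient magnitude as a stand-in for feature saliency; the specification does not fix a particular saliency measure, so this criterion is an assumption.

```python
import numpy as np

def select_depth_channel(depths_per_channel, images_per_channel):
    """Pick, per pixel, the depth estimated from the most salient colour channel.

    depths_per_channel: list of three H x W depth maps (from R, G, B analysis).
    images_per_channel: list of three H x W intensity images.
    """
    saliency = []
    for img in images_per_channel:
        gy, gx = np.gradient(img.astype(float))
        saliency.append(np.hypot(gx, gy))        # gradient magnitude as saliency
    saliency = np.stack(saliency)                # 3 x H x W
    best = np.argmax(saliency, axis=0)           # winning channel per pixel
    depths = np.stack(depths_per_channel)        # 3 x H x W
    return np.take_along_axis(depths, best[None], axis=0)[0]
```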

In operation S4.3, the image processing apparatus 300 projects the depth values associated with the input image pixels into three dimensions relative to a location of the multiple-capture system 100. This is performed using the camera configuration information 313. For instance, in examples in which the capture devices of the system 100 are arranged in a spherical array (such as that illustrated in Figure 1), the depth map values are projected into a three-dimensional space surrounding an origin corresponding to the position of the multiple capture system 100.

In operation S4.4, the image processing apparatus 300 "re-projects" the projected depth values into one or more output image perspectives based on the output image specification. As mentioned above, the output image specification may define the number of output images that are to be generated based on the input image data as well as their geometry, virtual location, and resolution. For instance, the output image specification may specify plural output images each corresponding 1:1 with the fields of view of the individual image sensors of the multiple capture system 100. In other examples, the output image specification may specify a monoscopic panorama or a pair of stereoscopic panoramas, depending on the configuration of the multiple capture system (e.g. whether or not it is configured to capture stereoscopic images). By way of example only, if the output image is to be a cylindrical panorama, the three-dimensional depth values are re-projected onto a virtual cylindrical surface with characteristics (resolution, geometry, orientation etc.) corresponding to those of the output image.

As will be appreciated, the fields of view of the different image sensors typically overlap with one another and, as such, there may be pixels in more than one constituent input image data portion which correspond to a single point in three-dimensional space. Consequently, the depth map data may include multiple depth values which correspond to the same location in three-dimensional space (particularly at the overlapping regions). At least because of this, when the projected depth values are projected into the output image perspective, more than one depth value may be projected onto a single output image pixel. This situation (of multiple depth values corresponding to a single output pixel) may occur even more frequently (not just at pixels corresponding to the overlapping regions) when the resolution of the output image(s) is less than that of the image sensors.
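A rough sketch of operations S4.3 and S4.4 for the cylindrical-panorama example is given below. It assumes simple pinhole intrinsics K and a pose (R, t) per view, ignores lens distortion, and places the panorama at the capture system's origin; all of these simplifications are assumptions made for illustration.

```python
import numpy as np

def backproject_to_world(depth, K, R, t):
    """Operation S4.3 (sketch): lift every input pixel's depth value into 3D
    world coordinates using the view's intrinsics K and pose (R, t)."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x N
    rays = np.linalg.inv(K) @ pix                   # camera-space rays (z = 1)
    pts_cam = rays * depth.reshape(1, -1)           # scale each ray by its depth
    return (R @ pts_cam + t.reshape(3, 1)).T        # N x 3 world points

def project_to_cylinder(points, pano_width, pano_height, v_fov=np.pi / 2):
    """Operation S4.4 (sketch): re-project 3D points onto a virtual cylindrical
    panorama centred on the capture system's origin (z forward, x right assumed)."""
    x, y, z = points.T
    theta = np.arctan2(x, z)                        # azimuth, -pi..pi
    r = np.hypot(x, z)
    phi = np.arctan2(y, r)                          # elevation
    u = (theta + np.pi) / (2 * np.pi) * pano_width  # output column
    v = (phi / v_fov + 0.5) * pano_height           # output row
    dist = np.hypot(r, y)                           # distance from origin
    return u, v, dist
```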

In view of this potential conflict between different depth values for a single output pixel location, in operation S4.5, the image processing apparatus 300 is configured to resolve any conflicts by selecting the nearest depth value (that is, the depth value representing the distance that is closest to the multiple capture system 100). The result of this operation is projected depth map data including plural depth values, each associated one-to-one with a different output pixel location.
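The conflict resolution of operation S4.5 behaves like a z-buffer. A minimal sketch, assuming the projected output-image coordinates have already been computed (for instance by the projection sketch above), is given below.

```python
import numpy as np

def resolve_depth_conflicts(us, vs, dists, pano_height, pano_width):
    """Keep, for every output pixel location, only the nearest projected
    depth value (operation S4.5, sketch)."""
    zbuf = np.full((pano_height, pano_width), np.inf)
    for u, v, d in zip(us.astype(int), vs.astype(int), dists):
        # Only keep a depth if it is the shortest seen so far at this pixel.
        if 0 <= v < pano_height and 0 <= u < pano_width and d < zbuf[v, u]:
            zbuf[v, u] = d
    return zbuf
```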

In operation S4.6, the image processing apparatus 300 estimates, for each output pixel location, a local plane based on the depth values associated with output pixels surrounding the output pixel location. This local plane is local to the output pixel and represents an orientation in three dimensions of a surface of an imaged object at the location on the object represented by the output pixel. Next, in operation S4.7, the image processing apparatus 300 projects the input image pixels which contribute to a particular output pixel onto the estimated local plane for that output pixel. Those input image pixels which contribute to the output pixel may be determined based on a distance between the location of the output pixel and the locations of the input image pixels when projected onto the output image perspective.
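The plane estimation of operation S4.6 could, for instance, be realised from the depths of the immediate neighbours of an output pixel, as in the following sketch. The choice of neighbours, the use of viewing rays, and the handling of degenerate cases are assumptions, not requirements of this specification.

```python
import numpy as np

def estimate_local_plane(zbuf, ray_dirs, v, u):
    """Estimate the plane of the imaged surface at output pixel (v, u).

    zbuf:     H x W map of per-output-pixel depths (from operation S4.5).
    ray_dirs: H x W x 3 unit viewing ray for each output pixel.
    Returns (point_on_plane, unit_normal). Boundary handling is omitted.
    """
    # 3D positions of the pixel and its right / down neighbours.
    p  = ray_dirs[v, u]     * zbuf[v, u]
    px = ray_dirs[v, u + 1] * zbuf[v, u + 1]
    py = ray_dirs[v + 1, u] * zbuf[v + 1, u]
    # Plane spanned by the two neighbour directions.
    normal = np.cross(px - p, py - p)
    norm = np.linalg.norm(normal)
    if norm < 1e-12:
        normal = -ray_dirs[v, u]     # degenerate neighbourhood: face the viewer
    else:
        normal = normal / norm
    return p, normal
```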

For instance, the contributing input pixels may be those input pixels which, when projected into the output image perspective, overlap with the output pixel. In some other examples, a maximum number of contributing pixels per colour channel may be defined. In such examples, the projected input image pixels of each colour, up to that defined limit, which are closest to (and, optionally, are within a defined distance from) the output pixel location could be selected as contributing pixels.

The exact location at which each input pixel is projected onto the estimated plane varies in dependence on the location in 3D space of the input pixel. As such, the result of the projection of operation S4.7 is, for each of the output pixels, plural input image pixels projected onto the local plane in an area surrounding and encompassing the output pixel location. When projecting the input pixels onto the local plane, occluded input pixels may be omitted. That is, input image pixels which are occluded by other pixels from a different constituent portion of the image data (which represents a different view of the scene) may be omitted from projection. As such, occluded pixels may not be taken into account during the subsequent operation of determining the colour for each output pixel. The image processing apparatus 300 may therefore be configured to identify (and omit from further processing) the occluded pixels for each output pixel location.

Occluded input image pixels for a particular output pixel location P may be those input pixels in the direction of P that, for a given constituent image portion captured by a sensor that is a distance A from the output pixel location P, have an associated depth map value that is less than the distance A. Put another way, for a location X of a capturing sensor and an output pixel location P, the location P is occluded from the capturing sensor if the depth map value for input image data derived from the sensor at the location X in the direction of P is less than the distance from X to P.
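Expressed as code, the occlusion test just described might look as follows; the depth_lookup helper is a hypothetical function standing in for a lookup into the per-view depth map, not an identifier from this specification.

```python
import numpy as np

def is_occluded(P, X, depth_lookup, eps=1e-3):
    """Return True if output-pixel world location P is occluded from the
    capturing sensor located at X.

    depth_lookup(X, direction) is assumed to return the depth map value
    recorded by the view at X along the given unit direction.
    """
    direction = P - X
    distance_to_P = np.linalg.norm(direction)
    recorded_depth = depth_lookup(X, direction / distance_to_P)
    # Occluded if that view saw something nearer than P along the same ray.
    return recorded_depth < distance_to_P - eps
```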

Subsequently, in operation S4.8, the image processing apparatus 300 performs colour reconstruction based on the input image pixels which have been projected onto the output image perspective. Specifically, for each of the output pixels, the image processing apparatus 300 determines the colour of each output pixel based on the colours of the input image pixels that have been projected onto the local plane for that output pixel. In addition, the distance in the plane between the input image pixels and the output pixel location may also be taken into account when determining the reconstructed colour for an output image pixel. For instance, a weighted average per colour of all input image pixels which contribute to the output image pixel may be computed, with the weighting being based on the distance (in the plane) between the output image pixel location and the projected input pixel. In some examples, the weights may be inversely proportional to the distance. As such, projected input pixels which are further away from the output pixel location may contribute less to the reconstructed colour than do pixels which are closer to the output pixel location. As will be appreciated, other colour reconstruction algorithms may alternatively be applied in operation S4.8. However, these may be more computationally expensive than simply using a weighted average as described above. One such algorithm which might be utilised is described in "Non-uniform sampling, image recovery from sparse data and the discrete sampling theorem" by Leonid P. Yaroslavsky, Gil Shabat, Benny G. Salomon, Ianir A. Ideses and Barak Fishbain (Department of Physical Electronics, Faculty of Engineering, Tel Aviv University, Tel Aviv 69978, Israel; retrieved from https://arxiv.org/ftp/arxiv/papers/0808/0808.3728.pdf).
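A minimal sketch of the inverse-distance weighted reconstruction of operation S4.8 for a single output pixel is given below; the exact weighting function and the handling of colours with no contributing samples are assumptions made for illustration.

```python
import numpy as np

def reconstruct_output_pixel(contributions, eps=1e-6):
    """Colour-reconstruct one output pixel (operation S4.8, sketch).

    contributions: list of (colour_index, value, distance) tuples, where
    colour_index is 0/1/2 (R/G/B), value is the raw input sample, and
    distance is measured in the local plane from the output pixel location.
    """
    out = np.zeros(3)
    for c in range(3):
        samples = [(v, d) for (ci, v, d) in contributions if ci == c]
        if not samples:
            continue                      # no sample of this colour: leave as 0 (or inpaint)
        weights = np.array([1.0 / (d + eps) for _, d in samples])
        values = np.array([v for v, _ in samples])
        # Nearer contributing pixels receive larger weights.
        out[c] = np.sum(weights * values) / np.sum(weights)
    return out
```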

Once the colour has been determined for each output pixel, the output image data 331 may be provided. The output image data may indicate an intensity/brightness and a colour for each pixel in the output image(s). The output image data may be stored on memory 350 for later retrieval and/or may be sent for display on a display device 340. The display device and/or storage device may be in wired or wireless connection with the image processing apparatus 300.

In some examples, the multi-capture system 100 may be configured to utilise different orientations in the "roll" direction (i.e. the multiple views of the scene are rotated differently around the lens axis). Put another way, some of the multiple views may be captured with different rotational orientations relative to their respective direction of capture. For instance, in a system comprising multiple image sensors, at least some of the image sensors may be rotated differently about their lens axis. Having a different orientation in the "roll" direction causes pixels of the different views of the scene to be oriented differently to one another. Consequently, the distribution of the colours in the input image data (which includes constituent portions derived using the different orientations in the "roll" direction) due to the CFA may be less regular. This may serve to reduce the occurrence and/or extent of aliasing artefacts.

The multi-view colour reconstruction process described above with reference to Figures 3 and 4 may have increased computational requirements when compared to regular debayering (e.g. one view at a time). The results may also be dependent on the quality of the depth information available. In view of this, there may be benefits derived from performing the reconstruction as a post-process (after capture of the raw data), when issues from, for instance, incorrect depth estimates or poor camera calibration can be corrected.

Figure 5 is a schematic illustration of an example hardware configuration with which the image processing apparatus 300 described with reference to Figures 3 and 4 may be implemented.

The image processing apparatus 300 comprises processing apparatus 50. The processing apparatus 50 is configured to receive the input image data and to perform colour reconstruction as described with reference to Figures 3 and 4. The input image data may be received at the processing apparatus 50 via an input interface 53. In the example of Figure 5, the input image data is received directly from the image sensors 110 of the multi-capture system. In such examples, the image processing apparatus 300 may form part of the multi-capture system 100. In other examples, however, the input image data may be received at the processing apparatus 50 via wired communication (e.g. via the input interface 53) or wireless communication (via transceiver 54 and antenna 55) from the multi-capture system 100 or from a storage medium. In some other examples, the input image data may be pre-stored in the memory 51 which forms part of the processing apparatus 50.

After colour reconstruction has been performed, the processing apparatus 50 may provide the colour-reconstructed output image data via an output interface 56. The output image data may be provided for display via a display device 340 or to a storage device for storage and later retrieval. In some instances, the output image data may be transmitted wirelessly via the transceiver 54 and antenna 55 to a display device or a storage device as appropriate. Additionally or alternatively, the output image data may be stored in local storage 51 at the processing apparatus 50 for later retrieval.

The processing apparatus 50 may comprise processing circuitry 52 and memory 51.

Computer-readable code 512A may be stored on the memory 51 which, when executed by the processing circuitry 52, causes the processing apparatus 50 to perform any of the operations described herein. Example configurations of the memory 51 and processing circuitry 52 will be discussed in more detail below.

In implementations in which the image processing apparatus 300 is a device designed for human interaction, the user may control the operation of the image processing apparatus 300 by means of a suitable user input interface UII (not shown) such as key pad, voice commands, touch sensitive screen or pad, combinations thereof or the like. A speaker and a microphone (also not shown) may also be provided, for instance in conjunction with the display 340. Furthermore, the image processing apparatus 300 may comprise appropriate connectors (either wired or wireless) to other devices and/or for connecting external accessories thereto.

Some further details of components and features of the above-described apparatus 300 and alternatives for them will now be described. The processing apparatus 50 may comprise processing circuitry 52 communicatively coupled with memory 51. The memory 51 has computer readable instructions 512A stored thereon, which when executed by the processing circuitry 52 causes the processing apparatus 50 to cause performance of various ones of the operations described with reference to Figures 1 to 4. The processing apparatus 50 may in some instances be referred to, in general terms, as "apparatus", "computing apparatus" or "processing means".

The processing circuitry 52 may be of any suitable composition and may include one or more processors 52A of any suitable type or suitable combination of types. Indeed, the term "processing circuitry" should be understood to encompass computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures. For example, the processing circuitry 52 may be a programmable processor that interprets computer program instructions 512A and processes data. The processing circuitry 52 may include plural programmable processors. Alternatively, the processing circuitry 52 may be, for example, programmable hardware with embedded firmware. The processing circuitry 52 may alternatively or additionally include one or more specialised circuits such as field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices, etc.

The processing circuitry 52 is coupled to the memory 51 and is operable to read/write data to/from the memory 51. The memory 51 may comprise a single memory unit or a plurality of memory units, upon which the computer readable instructions (or code) 512A is stored. For example, the memory 51 may comprise both volatile memory 511 and non-volatile memory 512. In such examples, the computer readable instructions/program code 512A may be stored in the non-volatile memory 512 and may be executed by the processing circuitry 52 using the volatile memory 511 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM, etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memory 51 may be referred to as one or more non-transitory computer readable memory media or one or more storage devices. Further, the term 'memory', in addition to covering memory comprising both one or more non-volatile memory and one or more volatile memory, may also cover one or more volatile memories only, or one or more non-volatile memories only. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

The computer readable instructions/program code 512A may be pre-programmed into the processing apparatus 50. Alternatively, the computer readable instructions 512A may arrive at the processing apparatus 50 via an electromagnetic carrier signal or may be copied from a physical entity 60 such as a computer program product, a memory device or a record medium such as a CD-ROM or DVD, an example of which is illustrated in Figure 6. The computer readable instructions 512A may provide the logic and routines that enable the apparatus 300 to perform the functionality described above. The combination of computer-readable instructions stored on memory (of any of the types described above) may be referred to as a computer program product. In general, references to computer program, instructions, code etc. should be understood to express software for a programmable processor, firmware such as the programmable content of a hardware device, instructions for a processor, or configured or configuration settings for a fixed-function device, gate array, programmable logic device, etc.

The transceiver and antenna 54, 55 may be adapted for any suitable type of wireless communication including but not limited to a Bluetooth protocol, a cellular data protocol or a protocol in accordance with IEEE 802.11.

The input and/or output interfaces 53, 56 may be any suitable type of wired interface. When one or both of the interfaces is configured for wired connection with another device, they may be, for instance but not limited to, physical Ethernet or USB interfaces.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above- described functions may be optional or may be combined.

Although various aspects of the methods, apparatuses described herein are set out in the independent claims, other aspects may comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.