BABON FRÉDÉRIC (FR)
DOYEN DIDIER (FR)
EP3065394A1 | 2016-09-07
US20130076931A1 | 2013-03-28
US20140092281A1 | 2014-04-03
US9497380B1 | 2016-11-15
CLAIMS:

1. A method comprising: obtaining data including corresponding metadata information about a content captured by a camera; analyzing said data to determine information about positioning and respective color of one or more pixels of said content; determining at least two views of said content, each having a plurality of different pixels; and generating a view synthesis from said views by constructing a single synthetized pixel using said plurality of different pixels of each view.

2. An apparatus comprising at least one processor configured to: obtain data including corresponding information about a content; analyze said data to determine information about positioning and respective color of one or more pixels of said content; determine at least two views of said content, each having a plurality of different pixels; generate a view synthesis from said views by constructing a single synthetized pixel using said plurality of different pixels of each view; and render said content using view synthesis.

3. The method of claim 1 or apparatus of claim 2, wherein said first view includes an image using a filter and a second image that includes at least one color component, the at least one color component including non-interpolated data from the first image and interpolated data based on the first image.

4. The method of claim 1 or 3 or the apparatus of claim 2 or 3, wherein said metadata indicates one or more locations of the non-interpolated data and one or more colors applied by the filter at the one or more locations.

5. The method of any of claims 1 or 3-4, further comprising providing a final rendering of said content using view synthesis.

6. The apparatus of any of claims 2-4, further configured to render said content using view synthesis.

7. The method of claims 1 or 3-5 or apparatus of any of claims 2-4 or 6, wherein said data is RAW data.

8. The method of claim 7 or apparatus of claim 7, wherein said RAW data comprises data at least partially captured by a color filter array (CFA) disposed on a camera.

9. The method of claim 8 or apparatus of claim 8, wherein said CFA is a Bayer filter.

10. The method of any of claims 1 or 3-5 or 7-9 or apparatus of any of claims 2-4 or 6-9, wherein said captured content was captured by a plenoptic camera providing multiple views.

11. The method of any of claims 1 or 3-9 or apparatus of any of claims 2-9, wherein said metadata includes an Auxiliary Video Information (AVI) InfoFrame transmitted with said content.

12. The method of claim 11 or apparatus of claim 11, wherein said AVI InfoFrame conforms to CEA-861-D, used by HDMI to transmit information about said content.

13. A method comprising: obtaining an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data; and obtaining metadata indicating one or more locations in the at least one color component that have the non-interpolated data.

14. A device comprising: at least one processor configured to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data; and obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.

15. The method of claim 13 or device of claim 14, wherein said metadata includes non-interpolated data.

16. A non-transitory computer-readable medium storing computer-executable instructions executable to perform the method of any of claims 1, 3-5, 7-13, or 15.
[0043] Similarly, from equation (3), $(x_k', y_k', z_k')$ denotes the re-projected coordinate of $(x, y, z)$ from the virtual camera into the real camera $k$. The great advantage of this approach is that the integer coordinates $(x, y, z)$ of the virtual color cube are computed with a backward warping approach, which is made possible thanks to the sampling of $z$ by the cube.
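As an illustration of this backward-warping step, the sketch below re-projects a voxel $(x, y, z)$ of the virtual cube into a real camera. It assumes a generic pinhole model; the matrices `K_virt`, `K_real` and the pose `(R, t)`, as well as all the names, are illustrative assumptions and not the patent's exact equation (3).

```python
import numpy as np

def backward_warp(x, y, z, K_virt, K_real, R, t):
    """Re-project voxel (x, y, z) of the virtual color cube into a real camera.

    Minimal sketch assuming a generic pinhole model: K_virt/K_real are 3x3
    intrinsic matrices, (R, t) the pose of the real camera relative to the
    virtual one. Conventions are illustrative only.
    """
    # Back-project the virtual pixel (x, y) to a 3D point at depth z.
    ray = np.linalg.inv(K_virt) @ np.array([x, y, 1.0])
    P = z * ray
    # Project that 3D point into the real camera k.
    p = K_real @ (R @ P + t)
    xk, yk = p[0] / p[2], p[1] / p[2]   # non-integer coordinates (x_k', y_k')
    return xk, yk
```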
[0044] An example is shown in FIG. 4. FIG. 4 provides an illustration of 6 consecutive slices of a virtual color cube (top-left: 3 foreground slices; bottom-right: 3 background slices). The virtual color cube is similar to a focal stack where only objects lying at the given slice are visible. The foreground objects have been removed.

[0045] Virtual image computation can be provided by stacking the virtual color cube. The virtual color cube is merged to form a unique virtual color image. It is first required to compute the consensus cube $Consensus_{synth}(x, y, z)$ and the visibility cube $SoftVis_{synth}(x, y, z)$ associated with the color virtual images. Similarly to equation (5), the computation is done by averaging the $M'$ initial consensus or visibility cubes. Both cubes defined above are combined into $CC(x, y, z)$:
$$CC(x, y, z) = \min\big(Consensus_{synth}(x, y, z),\ SoftVis_{synth}(x, y, z)\big) \qquad (8)$$
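A minimal sketch of this combination step, assuming the per-camera cubes are already available as numpy arrays of identical shape:

```python
import numpy as np

def combined_cube(consensus_cubes, visibility_cubes):
    """Combine per-camera cubes into CC(x, y, z) per equation (8).

    `consensus_cubes` and `visibility_cubes` are assumed to be lists of M'
    arrays of shape (H, W, Z) with values in [0, 1].
    """
    # Average the M' initial consensus and visibility cubes (cf. equation (5)).
    consensus_synth = np.mean(consensus_cubes, axis=0)
    softvis_synth = np.mean(visibility_cubes, axis=0)
    # Equation (8): element-wise minimum of the two cubes.
    return np.minimum(consensus_synth, softvis_synth)
```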
[0046] The CC is a kind of probability which varies between 0 and 1. The typical values are:

- If a given $CC(x, y, z)$ is equal to 1, all cameras agree that an object lies at the distance $z$ from the virtual camera and is seen at the coordinate $(x, y)$ within the virtual camera.
- A high value ($CC > 50\%$) is rare; it corresponds to an object for which the depth estimation was accurate (textured areas) and which is positioned exactly on a slice of the virtual camera and quite close to the slices of the real cameras.
- CC values are mostly equal to 0, since many slices do not match any object.
- For objects with few details, the depth maps extracted from the raw images do not agree and the raw consensus is low; it can be as low as $1/N$, where $N$ is the number of cameras. In this case the CC is also low, with values around $1/N$.
- CC values can be lower than $1/N$ for objects which lie between 2 slices, so CC values equal to a few percent are common.
The color slices are then weighted by consensus and accumulated until the ray visibility reaches zero.
[0047] In practice, the virtual color cube is saved with pixels made of 4 values: Red, Green, Blue and Alpha (RGBA). The RGB encodes the color as computed by equation (5). The alpha encodes the $CC(x, y, z)$ component as computed by equation (8).
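The accumulation described above can be sketched as front-to-back compositing of the RGBA cube, where alpha carries CC. The patent's own accumulation equation is not reproduced in this text, so the standard over-compositing rule below is an assumed stand-in, with slices ordered from nearest to farthest:

```python
import numpy as np

def stack_cube(rgba_cube):
    """Merge an RGBA virtual color cube into a single virtual image.

    Sketch under stated assumptions: `rgba_cube` has shape (Z, H, W, 4) with
    slices ordered front (z=0) to back, and the alpha channel stores
    CC(x, y, z). Standard front-to-back alpha compositing is used here as an
    illustrative stand-in for the patent's accumulation rule.
    """
    H, W = rgba_cube.shape[1:3]
    image = np.zeros((H, W, 3))
    visibility = np.ones((H, W, 1))     # remaining ray visibility
    for z in range(rgba_cube.shape[0]):
        rgb = rgba_cube[z, ..., :3]
        cc = rgba_cube[z, ..., 3:4]     # consensus/visibility weight
        image += visibility * cc * rgb  # weight the slice color by CC
        visibility *= (1.0 - cc)        # rays attenuate toward zero
        if np.all(visibility < 1e-3):   # stop once visibility is exhausted
            break
    return image
```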
[0048] Figure 3 shows an application of the view synthesis. As shown, the 4 left images are those observed by the 4 central cameras. The right image is the synthetic view of a virtual camera located at the middle of the 4 central ones. An algorithm can be applied to the images captured with a matrix of 4 x 4 cameras. 4 consensus and visibility cubes are computed with 128 slices for the 4 central cameras. All depth maps contribute to computing the consensus and visibility cubes: the set M is made of 15 cameras. The synthetic color cube is computed with the 4 central cameras: the set M' is made of 4 cameras. Figure 5 illustrates a detailed view of the 4 original images (4 images on the left) and the synthetized image (right image). This algorithm can produce very accurate results even with scenes made of complex occlusions. It requires a large amount of memory for the M' consensus and visibility cubes. The memory occupation can be decreased by applying the complete process slice per slice, but care must be taken since a slice of the virtual color cube will intersect several slices of the consensus and visibility cubes associated with the original images. Slice-per-slice computation is not feasible for a matrix of cameras in which the cameras are not roughly located on the same plane and pointing in the same orientation.
[0049] In one embodiment, an algorithm can also be provided that allows for view synthesis with the CFA information provided. In this embodiment, the view synthesis is computed by merging several input pixels $I_k(x_k', y_k')$ into a single pixel of the synthetic view slice, as described in equation (5).
[0050] The coordinate $(x_k', y_k')$ is not an integer coordinate, and interpolation is required to estimate the pixel value at that coordinate. The coordinate $(x_k', y_k')$ is decomposed into $(\lfloor x_k' \rfloor + \{x_k'\},\ \lfloor y_k' \rfloor + \{y_k'\})$, where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest pixel and $\{\cdot\}$ is the fractional part.
The pixel value at a non-integer coordinate is estimated using the 2 x 2 or 4 x 4 real pixels surrounding that coordinate. The interpolation defines the weight associated with the surrounding pixels as a function of the fractional part of the coordinate. For instance, with a bi-linear interpolation the interpolated pixel value is computed with 4 weights associated with the 4 surrounding pixels:

$$\begin{aligned} I_k(x_k', y_k') = {} & (1 - \{x_k'\})(1 - \{y_k'\})\, I_k(\lfloor x_k' \rfloor, \lfloor y_k' \rfloor) \\ & + \{x_k'\}(1 - \{y_k'\})\, I_k(\lfloor x_k' \rfloor + 1, \lfloor y_k' \rfloor) \\ & + (1 - \{x_k'\})\{y_k'\}\, I_k(\lfloor x_k' \rfloor, \lfloor y_k' \rfloor + 1) \\ & + \{x_k'\}\{y_k'\}\, I_k(\lfloor x_k' \rfloor + 1, \lfloor y_k' \rfloor + 1) \end{aligned} \qquad (10)$$
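A direct transcription of equation (10), assuming the image is held in a numpy array indexed `image[row, col]`:

```python
import numpy as np

def bilinear(image, x, y):
    """Bi-linear interpolation of `image` at a non-integer (x, y), per eq. (10).

    Minimal sketch assuming `image` is a numpy array indexed as
    image[row, col] with (x, y) = (col, row) strictly inside the image.
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0            # fractional parts {x}, {y}
    return ((1 - fx) * (1 - fy) * image[y0, x0]
            + fx * (1 - fy) * image[y0, x0 + 1]
            + (1 - fx) * fy * image[y0 + 1, x0]
            + fx * fy * image[y0 + 1, x0 + 1])
```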
[0051] The input images $I_k$ are assumed to be color images which have been computed by demosaicing the raw images recorded by the pixel sensor. The interpolation given in equation (10) is applied equally to the Red, Green and Blue components. One notices that if the values $\{x_k'\}$ and $\{y_k'\}$ are close to 0 or 1, the interpolated pixel value $I_k(x_k', y_k')$ is almost equal to one of the surrounding input pixels; by contrast, if $\{x_k'\}$ and $\{y_k'\}$ are close to 0.5, it is a mix of the surrounding pixels. In the case of bi-linear interpolation, the 4 surrounding pixel values listed in the previous equation are associated with given colors of the CFA mounted on top of the pixel sensor. If one keeps only the interpolated pixel values $I_k(x_k', y_k')$ such that $(\{x_k'\}, \{y_k'\})$ is small, then $I_k(x_k', y_k') \approx I_k(\lfloor x_k' \rfloor, \lfloor y_k' \rfloor)$. The input pixels are associated with one color recorded according to the CFA. The $I_k(x_k', y_k')$ input pixels are averaged into a single pixel of the synthetic view according to equation (5). By keeping only the real color component as observed by the input pixels, the demosaicing is performed naturally. The strategy is to accumulate only color components which have been observed, and discard the color components estimated by demosaicing, an idea developed in [2].
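The sketch below illustrates that strategy: keep only projected coordinates that land close to a real sensor sample and accumulate the single CFA color that sample observed. The Bayer layout and the 0.1 closeness threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Assumed Bayer layout of the 2x2 CFA cell: R at (0,0), G at (1,0) and (0,1),
# B at (1,1); channel indices 0, 1, 2 = R, G, B.
BAYER = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 2}

def accumulate_observed(raw, x, y, accum, weight, thresh=0.1):
    """Accumulate the observed CFA color of a projected coordinate (x, y).

    `raw` is the mosaiced sensor image; `accum` and `weight` are length-3
    running sums for one synthetic pixel (cf. equation (5)). Only projections
    landing near a real sample are kept; demosaiced estimates are discarded.
    """
    fx, fy = x - np.floor(x), y - np.floor(y)
    # Keep only projections whose fractional parts are near 0 or 1.
    if min(fx, 1 - fx) < thresh and min(fy, 1 - fy) < thresh:
        xi, yi = int(round(x)), int(round(y))   # nearest real pixel
        c = BAYER[(xi % 2, yi % 2)]             # the color that pixel observed
        accum[c] += raw[yi, xi]
        weight[c] += 1.0
```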
[0052] The view synthesis algorithm can be improved as follows. The 3 distances $d_{R,G,B}(x_k', y_k')$ between the coordinate $(x_k', y_k')$ and the 3 closest observed color components are used to weight the interpolated pixel value $I_k(x_k', y_k')$. The coordinate $(C_x\{x_k'\}, C_y\{y_k'\})$ is the coordinate of the interpolated pixel within the CFA. In the case of a Bayer CFA, $(2\{x_k'\}, 2\{y_k'\})$ is the coordinate of the projected pixel within the Bayer matrix. This coordinate is used to compute the closest distance $d_{R,G,B}$ to each color component of the CFA. FIG. 8 provides the distances between the interpolated pixel coordinate and the three colors of a Bayer CFA. It should be noted that the distance $d_{i,j}(x_k', y_k')$ between the projected coordinate $(x_k', y_k')$ and the center of the filter $F(i, j)$ is computed by:

$$d_{i,j}(x_k', y_k') = \sqrt{\big(2\{x_k'\} - i\big)^2 + \big(2\{y_k'\} - j\big)^2} \qquad (11)$$
[0053] The distance is computed for each pixel of the CFA. In the case of the Bayer matrix, one computes 4 distances $d_{i,j}$. The distance to a given color $C$ is computed by taking the minimum value of the distances to that color. For instance, for the Bayer matrix illustrated in FIG. 8, $d_R = d_{0,0}$, $d_G = \min(d_{1,0}, d_{0,1})$ and $d_B = d_{1,1}$.
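A short sketch of this distance computation, following the mapping above; placing the filter centers at the integer positions $(i, j)$ of the 2 x 2 Bayer cell is the same assumption as in the reconstruction of equation (11):

```python
import numpy as np

def bayer_distances(fx, fy):
    """Distances from a projected coordinate to the Bayer colors (cf. [0053]).

    `fx`, `fy` are the fractional parts {x_k'}, {y_k'}; (2*fx, 2*fy) is the
    coordinate within the 2x2 Bayer cell. Filters are assumed at integer
    positions R(0,0), G(1,0)/(0,1), B(1,1), matching d_R = d_00, etc.
    """
    px, py = 2.0 * fx, 2.0 * fy
    d = {(i, j): np.hypot(px - i, py - j) for i in (0, 1) for j in (0, 1)}
    d_R = d[(0, 0)]
    d_G = min(d[(1, 0)], d[(0, 1)])
    d_B = d[(1, 1)]
    return d_R, d_G, d_B
```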
[0054] The view synthesis algorithm is updated by modifying equation (5), which becomes:

[0055] where $d_C$ is the distance between the projected coordinate $(x_k', y_k')$ and the color $C$. Equation (12) is applied for the 3 color components.
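Equation (12) itself is not reproduced in this text, so the sketch below only illustrates the shape of the modification under an assumed inverse-distance weighting: each input camera's contribution to a color channel is down-weighted by its distance $d_C$ to the nearest observed sample of that color. The weight function is an assumption, not the patent's actual formula:

```python
import numpy as np

def weighted_merge(samples):
    """Illustrative per-color merge in the spirit of the modified equation (5).

    `samples` is a list of (rgb_value, (d_R, d_G, d_B)) pairs, one per input
    camera. The weight w(d) = 1 / (d + eps) stands in for the unreproduced
    equation (12) and is not the patent's actual weighting.
    """
    eps = 1e-3
    out = np.zeros(3)
    norm = np.zeros(3)
    for rgb, dists in samples:
        # Closer observed sample of a color => larger weight for that channel.
        w = 1.0 / (np.asarray(dists) + eps)
        out += w * np.asarray(rgb)
        norm += w
    return out / norm
```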
[0056] Figure 9 provides one example of an embodiment that compares a slice of the virtual color cube. The picture on the right is estimated with the original version, while the one on the left is the version that uses the CFA information.

[0057] In order to motivate the need for having the CFA information associated with an MVD content, a view synthesis using the CFA information introduced above has been partially implemented. Figure 9 also illustrates one slice of the virtual color cube corresponding to one object lying at that slice. The right image shows the original version as computed by equation (5), and the left image shows the same slice computed with equation (12).
[0059] FIG. 9 provides an example of a result that can be obtained. In FIG. 9, a slice of a virtual color cube is compared with the version using the CFA information. Reference 910, which appears in the right image, shows the original version as computed by equation (5), and the left image (905) shows the same slice computed with equation (12).
[0060] Figure 10 is a flowchart illustration according to one embodiment. As shown, in step 1010 data including corresponding metadata information about a content captured by a camera is received, and in step 1020 this data is analyzed to determine information about positioning and respective color of each pixel of said content. Step 1030 deals with determining at least two views of the content, each having a plurality of different pixels. In step 1040, a view synthesis is generated from said views by constructing a single synthetized pixel using said plurality of different pixels of each view. In step 1050, a final rendering can optionally be provided.

[0061] Figure 11 schematically illustrates a general overview of an encoding and decoding system according to one or more embodiments. The system of Figure 11 is configured to perform one or more functions and can have a pre-processing module 11300 to prepare a received content (including one or more images or videos) for encoding by an encoding device 11400. The pre-processing module 11300 may perform multi-image acquisition, merging of the acquired multiple images in a common space and the like, acquiring of an omnidirectional video in a particular format, and other functions to allow preparation of a format more suitable for encoding. Another implementation might combine the multiple images into a common space having a point cloud representation. The encoding device 11400 packages the content in a form suitable for transmission and/or storage for recovery by a compatible decoding device 11700. In general, though not strictly required, the encoding device 11400 provides a degree of compression, allowing the common space to be represented more efficiently (i.e., using less memory for storage and/or less bandwidth required for transmission). In the case of a 3D sphere mapped onto a 2D frame, the 2D frame is effectively an image that can be encoded by any of a number of image (or video) codecs. In the case of a common space having a point cloud representation, the encoding device 11400 may provide point cloud compression, which is well known, e.g., by octree decomposition. After being encoded, the data is sent to a network interface 11500, which may typically be implemented in any network interface, for instance one present in a gateway. The data can then be transmitted through a communication network 11500, such as the internet, but any other network may be foreseen. The data is then received via a network interface 11600, which may be implemented in a gateway or in a device. After reception, the data is sent to a decoding device 11700. Decoded data are then processed by the device 11800, which can also be in communication with sensors or user input data. The decoder 11700 and the device 11800 may be integrated in a single device (e.g., a smartphone, a game console, a STB, a tablet, a computer, etc.). In another embodiment, a rendering device 11900 may also be incorporated.

[0062] In one embodiment, the decoding device 11700 can be used to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data, and to obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.

[0063] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations.
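The flowchart of Figure 10 can be sketched end to end as below. Everything in this toy version is an illustrative stand-in: the per-pixel average in step 1040 is not the patent's consensus-based synthesis, and the function and parameter names are hypothetical:

```python
import numpy as np

def synthesize_views(raw_views, metadata):
    """Toy walk-through of the Figure 10 steps (1010-1050).

    `raw_views` is assumed to be a list of (H, W, 3) arrays and `metadata` a
    dict describing the capture; the synthesis is a plain per-pixel average
    standing in for the patent's algorithm.
    """
    # Step 1010: obtain the captured data together with its metadata.
    data = [np.asarray(v, dtype=float) for v in raw_views]
    # Step 1020: analyze positioning and color of the pixels (elided here:
    # the arrays already carry position implicitly and color explicitly).
    # Step 1030: determine at least two views with different pixels.
    if len(data) < 2:
        raise ValueError("view synthesis needs at least two views")
    # Step 1040: construct each synthetized pixel from the pixels of every view.
    synthesis = np.mean(data, axis=0)
    # Step 1050 (optional): final rendering, here a clip to displayable range.
    return np.clip(synthesis, 0.0, 255.0).astype(np.uint8)

# Usage with two 4x4 toy views:
views = [np.full((4, 4, 3), 100.0), np.full((4, 4, 3), 140.0)]
print(synthesize_views(views, {"camera": "toy"}).mean())  # -> 120.0
```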
Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of this disclosure.