IMAGE REPRESENTATION OF A SCENE - KONINKLIJKE PHILIPS NV

Title:

IMAGE REPRESENTATION OF A SCENE

Document Type and Number:

WIPO Patent Application WO/2020/156844

Kind Code:

Abstract:

An apparatus comprises a receiver (301) for receiving an image representation of a scene. A determiner (305) determines viewer poses for a viewer with respect to a viewer coordinate system. An aligner (307) aligns a scene coordinate system with the viewer coordinate system by aligning a scene reference position with a viewer reference position in the viewer coordinate system. A renderer (303) renders view images for different viewer poses in response to the image representation and the alignment of the scene coordinate system with the viewer coordinate system. An offset processor (309) determines the viewer reference position in response to an alignment viewer pose where the viewer reference position is dependent on an orientation of the alignment viewer pose and has an offset with respect to a viewer eye position for the alignment viewer pose. The offset includes an offset component in a direction opposite to a view direction of the viewer eye position.

More Like This:

JP6878177	Information processing equipment, information processing methods and programs
WO/2022/040014	BILLBOARD LAYERS IN OBJECT-SPACE RENDERING
JP2002342394	SYSTEM AND METHOD FOR AUTOMATICALLY DISPLAYING LONGITUDINAL SECTIONAL PERSPECTIVE

Inventors:

BRULS WILHELMUS (NL)
VAREKAMP CHRISTIAAN (NL)
KROON BART (NL)

Application Number:

PCT/EP2020/051205

Publication Date:

August 06, 2020

Filing Date:

January 19, 2020

Export Citation:

Click for automatic bibliography generation Help

Assignee:

KONINKLIJKE PHILIPS NV (NL)

International Classes:

G06T15/20; G06F3/01; H04N13/00

Domestic Patent References:

WO2015155406A1

2015-10-15

Foreign References:

US20170003764A1

2017-01-05

Attorney, Agent or Firm:

PHILIPS INTELLECTUAL PROPERTY & STANDARDS (NL)

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS:

1. An apparatus for rendering images, the apparatus comprising:

a receiver (301) for receiving an image representation of a scene, the image representation being provided with respect to a scene coordinate system, the scene coordinate system including a reference position;

a determiner (305) for determining viewer poses for a viewer, the viewer poses being provided with respect to a viewer coordinate system;

an aligner (307) for aligning the scene coordinate system with the viewer coordinate system by aligning the scene reference position with a viewer reference position in the viewer coordinate system;

a Tenderer (303) for rendering view images for different viewer poses in response to the image representation and the alignment of the scene coordinate system with the viewer coordinate system;

the apparatus further comprising

an offset processor (309) arranged to determine the viewer reference position in response to a first viewer pose being a viewer pose for which aligment is performed, the viewer reference position being dependent on an orientation of the first viewer pose and having an offset with respect to a viewer eye position for the first viewer pose, the offset including an offset component in a direction opposite to a view direction of the viewer eye position;

wherein the receiver (301) is arranged to receive an image data signal comprising the image representation and further comprising an offset indication; and wherein the offset processor (309) is arranged to determine the offset in response to the offset indication.

2 The apparatus of claim 1 wherein the offset component is no less than 2 cm.

3. The apparatus of any previous claim wherein the offset component is no more than 12 cm.

4. The apparatus of any previous claim wherein the offset processor (309) is arranged to determine the offset in response to an error metric for at least one viewer pose, the error metric being dependent on candidate values of the offset.

5. The apparatus of claim 4 wherein the offset processor (309) is arranged to determine the error metric for a candidate value in response to a combination of error metrics for a plurality of viewer poses.

6. The apparatus of any of the previous claims 4 or 5 wherein the error metric for a viewer pose and a candidate value of the offset comprises an image quality metric for a view image for the viewer pose synthesized from at least one image of the image

representation, the at least one image having a position relative to the viewer pose depending on the candidate value.

7. The apparatus of any of the previous claims 4 or 5 wherein the error metric for a viewer pose and a candidate value of the offset comprises an image quality metric for a view image for the viewer pose synthesized from at least two images of the image representation, the at least two images having reference positions relative to the viewer pose depending on the candidate value.

8. The apparatus of any previous claim wherein the image representation includes an omni-directional image representation.

9. The apparatus of any previous claim wherein the offset comprises an offset component in a direction perpendicular to a view direction of the viewer eye position.

10. The apparatus of any previous claim wherein the offset comprises a vertical component.

11. An apparatus for generating an image signal, the apparatus comprising:

a receiver (201) for receiving a number of images representing a scene from one or more poses;

a representation processor (205) for generating image data providing an image representation of the scene, the image data comprising the number of images and the image representation being provided with respect to a scene coordinate system, the scene coordinate system including a scene reference position;

an offset generator (209) for generating an offset indication, the offset indication being indicative of an offset to apply between the scene reference position and a viewer eye position when aligning the scene coordinate system to a viewer coordinate system, the offset including an offset component in a direction opposite to a view direction of the viewer eye position;

an output processor (207) for generating the image signal to comprise the image data and the offset indication.

12. The apparatus of any previous claim wherein the offset generator (209) is arranged to determine the offset in response to an error metric for at least one viewer pose, the error metric being dependent on candidate values of the offset.

13. The apparatus of claim 12 wherein the offset generator (209) is arranged to determine the error metric for a candidate value in response to a combination of error metrics for a plurality of viewer poses.

14. A method of rendering images, the method comprising:

receiving an image representation of a scene, the image representation being provided with respect to a scene coordinate system, the scene coordinate system including a reference position;

determining viewer poses for a viewer, the viewer poses being provided with respect to a viewer coordinate system;

aligning the scene coordinate system with the viewer coordinate system by aligning the scene reference position with a viewer reference position in the viewer coordinate system;

rendering view images for different viewer poses in response to the image representation and the alignment of the scene coordinate system with the viewer coordinate system;

the method further comprising

determining the viewer reference position in response to a first viewer pose, the viewer reference position being dependent on an orientation of the first viewer pose and having an offset with respect to a viewer eye position for the first viewer pose, the offset including an offset component in a direction opposite to a view direction of the viewer eye position;

wherein receiving the image representation of the scene comprises receiving an image data signal comprising the image representation and further comprising an offset indication; and further comprising determining the offset in response to the offset indication.

15. A method for generating an image signal, the method comprising:

receiving a number of images representing a scene from one or more poses; generating image data providing an image representation of the scene, the image data comprising the number of images and the image representation being provided with respect to a scene coordinate system, the scene coordinate system including a scene reference position;

generating an offset indication, the offset indication being indicative of an offset to apply between the scene reference position and a viewer eye position when aligning the scene coordinate system to a viewer coordinate system, the offset including an offset component in a direction opposite to a view direction of the viewer eye position;

generating the image signal to comprise the image data and the offset indication. 16. A computer program product comprising computer program code means adapted to perform all the steps of claims 14 or 15 when said program is run on a computer.

Description:

FIELD OF THE INVENTION

The invention relates to image representation of a scene and in particular, but not exclusively, to generation of an image representation and rendering of images from this image representation as part of a virtual reality application.

BACKGROUND OF THE INVENTION

The variety and range of image and video applications have increased substantially in recent years with new services and ways of utilizing and consuming video being continuously developed and introduced.

For example, one service being increasingly popular is the provision of image sequences in such a way that the viewer is able to actively and dynamically interact with the system to change parameters of the rendering. A very appealing feature in many applications is the ability to change the effective viewing position and viewing direction of the viewer, such as for example allowing the viewer to move and“look around” in the scene being presented.

Such a feature can specifically allow a virtual reality experience to be provided to a user. This may allow the user to e.g. (relatively) freely move about in a virtual environment and dynamically change his position and where he is looking. Typically, such virtual reality applications are based on a three-dimensional model of the scene with the model being dynamically evaluated to provide the specific requested view. This approach is well known from e.g. game applications, such as in the category of first person shooters, for computers and consoles.

It is also desirable, in particular for virtual reality applications, that the image being presented is a three-dimensional image. Indeed, in order to optimize immersion of the viewer, it is typically preferred for the user to experience the presented scene as a three- dimensional scene. Indeed, a virtual reality experience should preferably allow a user to select his/her own position, camera viewpoint, and moment in time relative to a virtual world.

Typically, virtual reality applications are inherently limited in that they are based on a predetermined model of the scene, and typically on an artificial model of a virtual world. It is often desirable for a virtual reality experience to be provided based on real world capture. However, in many cases such an approach is restricted or tends to require that a virtual model of the real world is built from the real world captures. The virtual reality experience is then generated by evaluating this model.

However, the current approaches tend to be suboptimal and tend to often have a high computational or communication resource requirement and/or provide a suboptimal user experience with e.g. reduced quality or restricted freedom.

In many e.g. virtual reality applications a scene may be represented by an image representation, such as e.g. by one or more images representing specific view poses for the scene. In some cases, such images may provide a wide-angle view of the scene and may cover e.g. a full 360° view or cover a full view sphere.

In many applications, and specifically for virtual reality applications, an image data stream is generated from data representing the scene such that the image data stream reflects the user’s (virtual) position in the scene. Such an image data stream is typically generated dynamically and in real time such that it reflects the user’s movement within the virtual scene. The image data stream may be provided to a Tenderer which renders images to the user from the image data of the image data stream. In many applications, the provision of the image data stream to the Tenderer is via a bandwidth limited communication link. For example, the image data stream may be generated by a remote server and transmitted to the rendering device e.g. over a communication network. However, for most such applications it is important to maintain a reasonable data rate to allow efficient communication.

It has been proposed to provide a virtual reality experience based on 360° video streaming where a full 360° view of a scene is provided by a server for a given viewer position thereby allowing the client to generate views for different directions. Specifically, one of the promising applications of virtual reality (VR) is omnidirectional video (e.g.

VR360 or VR180). The approach tends to result in a high data rate and therefore the number of view points for which a full 360° view sphere is provided is typically limited to a low number.

As a specific example, virtual reality glasses have entered the market. These glasses allow viewers to experience captured 360 degree (panoramic) video. These 360 degree videos are often pre-captured using camera rigs where individual images are stitched together into a single spherical mapping. In some such embodiments, images representing a full spherical view from a given viewpoint may be generated and transmitted to a driver which is arranged to generate images for the glasses corresponding to the current view of the user.

In many systems, an image representation of a scene may be provided where the image representation includes images and often depth for one or more capture points/ view points in the scene. In many such systems, a Tenderer may be arranged to dynamically generate views that match a current local viewer pose. In such systems, a viewer pose may dynamically be determined, and views dynamically generated to match this viewer pose.

Such an operation requires the viewer pose to be aligned with or mapped to the image representation. This is typically done by positioning the viewer at a given optimal or default position in the scene/ image representation at the start of the application and then tracking the viewer movement relative to this. The optimal or default position is typically selected to correspond to a position for which the image representation comprises image data, i.e. to a capture or anchor position.

However, as the viewer pose changes from this position, view interpolation and synthesis is required and this will tend to introduce degradation and artefacts, thereby reducing the image quality.

Hence, an improved approach for processing and generating image representations of a scene would be advantageous. In particular, a system and/or approach that allows improved operation, increased flexibility, an improved virtual reality experience, reduced data rates, increased efficiency, facilitated distribution, reduced complexity, facilitated implementation, reduced storage requirements, increased image quality, improved rendering, an improved user experience and/or improved performance and/or operation would be advantageous.

SUMMARY OF THE INVENTION

Accordingly, the Invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.

According to an aspect of the invention there is provided an apparatus for rendering images, the apparatus comprising: a receiver for receiving an image representation of a scene, the image representation being provided with respect to a scene coordinate system, the scene coordinate system including a reference position; a determiner for determining viewer poses for a viewer, the viewer poses being provided with respect to a viewer coordinate system; an aligner for aligning the scene coordinate system with the viewer coordinate system by aligning the scene reference position with a viewer reference position in the viewer coordinate system; a Tenderer for rendering view images for different viewer poses in response to the image representation and the alignment of the scene coordinate system with the viewer coordinate system; the apparatus further comprising: an offset processor arranged to determine the viewer reference position in response to a first viewer pose being a viewer pose for which aligment is performed, the viewer reference position being dependent on an orientation of the first viewer pose and having an offset with respect to a viewer eye position for the first viewer pose, the offset including an offset component in a direction opposite to a view direction of the viewer eye position; wherein the receiver is arranged to receive an image data signal comprising the image representation and further comprising an offset indication; and wherein the offset processor is arranged to determine the offset in response to the offset indication.

The invention may provide an improved operation and/or performance in many embodiments. The invention may in particular provide improved image quality for a range of viewer poses.

The approach may in many embodiments provide an improved user experience, e.g. it may in many scenarios allow a flexible, efficient, and/or high performance Virtual Reality (VR) application. In many embodiments, it may allow or enable a VR application with a substantially improved trade-off between image qualities for different viewer poses.

The approach may be particularly suited to e.g. broadcast video services supporting adaptation to movement and head rotation at the receiving end.

The image representation may comprise one or more images of the scene.

Each image of an image representation may be associated with and linked to a viewing or capture pose for the scene. The viewing or capture pose may be provided with reference to the scene coordinate system. The scene reference position may be any position in the scene coordinate system. The scene reference position is independent of the viewer pose. The scene reference position may be a predetermined and/or fixed position. The scene reference position may be unchanged between at least some consecutive alignments.

The rendering of the view images may be subsequent to the aligning. The offset component may be in a direction opposite to the view direction of the viewer eye position by being opposite to a view direction for the alignment viewer pose.

The first viewer pose may also be referred to as the alignment viewer pose (it is the viewer pose for which alignment is performed). The alignment/ first viewer pose may be indicative/ represent/ describe the viewer eye position and the view direction of the viewer eye position. The alignment/ first viewer pose comprises data allowing the viewer eye position and the view direction of the viewer eye position to be determined.

The offset indication may be indicative of a target offset to apply between the scene reference position and a viewer eye position when aligning the scene coordinate system to the viewer coordinate system. The target offset may include an offset component in a direction opposite to a view direction of the viewer eye position.

In accordance with an optional feature of the invention, the offset component is no less than 2 cm.

This may provide a particularly advantageous operation in many embodiments. It may in many scenarios allow a sufficiently high quality improvement to be achieved for many viewer poses.

In some embodiments, the offset component is no less than 1 cm, 4 cm, 5 cm, or even 7 cm. Larger offsets have been found to improve the quality of the images generated for view poses corresponding to head rotations while potentially degrading the image quality for forwards views, although typically to a much lower degree.

In accordance with an optional feature of the invention, the offset component is no more than 12 cm.

This may provide a particularly advantageous operation in many embodiments. It may in many scenarios provide improved image quality trade off for images generated for different viewer poses.

In some embodiments, the offset component is no more than 8 cm or 10 cm.

In accordance with an optional feature of the invention, the receiver (301) is arranged to receive an image data signal comprising the image representation and further comprising an offset indication; and wherein the offset processor is arranged to determine the offset in response to the offset indication.

This may provide advantageous operation in many systems and scenarios, such as in particular for broadcasting scenarios. It may allow offset optimization to be performed simultaneously for a plurality of different rendering devices.

In accordance with an optional feature of the invention, the offset processor is arranged to determine the offset in response to an error metric for at least one viewer pose, the error metric being dependent on candidate values of the offset. This may provide improved operation and/or performance in many

embodiments. It may in particular in many embodiments allow an improved trade-off in quality for different viewer poses, and may in many embodiments allow a dynamic optimization. It may further in many embodiments, allow a low complexity and/or efficient/ low resource operation.

The offset may in many embodiments be determined as an offset resulting a (combined) minimum error metric for one or more viewer poses.

In some embodiments, the error metric may represent an error measure or value for a continuous range of candidate values. For example, the error metric may be represented as a function of the candidate offset, and the offset may be determined as the candidate offset for which this function is minimized.

In some embodiments, only a discrete number of candidate offset values are considered and the offset processor may determine an error metric/ measure/ value for each of these candidate values. It may then determine the offset as the candidate offset for which the lowest error metric was found (e.g. after combining error metrics for a plurality of viewer poses).

In accordance with an optional feature of the invention, the offset processor is arranged to determine the error metric for a candidate value in response to a combination of error metrics for a plurality of viewer poses.