


Title:
METHOD, DEVICE AND PROGRAM STORAGE UNIT FOR LIVE BROADCASTING OF SPHERICAL VIDEO
Document Type and Number:
WIPO Patent Application WO/2017/164798
Kind Code:
A1
Abstract:
In the direct transmission of spherical video one or more additional video streams are rendered into the spherical video so that they are perceived as video screens in the spherical environment that is created. Rendering of the additional video streams is carried out in the same stage as combining the camera unit's video streams into one spherical video.

Inventors:
PERSSON MAGNUS (SE)
NILSSON JONATHAN (SE)
DANIELSSON TORKEL (SE)
Application Number:
PCT/SE2017/050271
Publication Date:
September 28, 2017
Filing Date:
March 21, 2017
Assignee:
VOYSYS AB (SE)
International Classes:
H04N5/232; H04N5/45
Domestic Patent References:
WO 2002/009445 A2 (2002-01-31)
WO 2015/092968 A1 (2015-06-25)
Foreign References:
US 2016/0065946 A1 (2016-03-03)
Attorney, Agent or Firm:
AWAPATENT AB (SE)
Claims:
CLAIMS

1. Method of, in the direct transmission of spherical video to a receiver screen, rendering additional video to the spherical video,

characterised in that:

video streams from cameras used to film an environment are rendered to form a first spherical video stream and

in the same rendering, one or more additional video streams are rendered into the spherical video stream to form video screens in the virtual environment created by the first spherical video stream,

so that a second spherical video stream is created which includes the first spherical video stream and said additional video streams.

2. Method according to claim 1 in which rendering is brought about in a first processing unit and in which the second spherical video stream is sent to a second processing unit which is connected to the receiver screen.

3. Method according to claim 1 or 2 in which the receiver screen comprises at least one pair of VR glasses.

4. Method according to any one of the previous claims in which a 3D motor is used during rendering.

5. Method according to claim 4 in which rendering of the spherical video comprises:

the fact that a model of the camera unit is entered, which contains sufficient information for each pixel on the camera unit's sensor(s) to be mapped to the direction from which light has arrived,

the fact that a projection model is entered which describes how each pixel coordinate in the resulting video can be converted into a camera view in the 3D motor, with position, direction and visual field area,

the fact that with the help of the 3D motor the video streams from the camera(s) in the camera unit are projected back onto a surface which coincides with the geometry of the room in which the camera unit is located,

the fact that the position, rotation and size of the respective video screen to be rendered are entered,

the fact that a 3D model of the respective virtual video screens is entered,

the fact that additional video streams are projected onto the surface of each screen model placed according to the entered data, and

the fact that the resulting video is created by going through all the pixels and, for each pixel, taking the camera view in the 3D model which corresponds precisely to that pixel according to the entered projection model, with the addition of mixing different cameras' images and linear interpolation between adjoining pixels.

6. Method according to claim 5 where creation of the model of the camera unit includes:

the position and rotation of the optical point of intersection of each camera lens being entered and calibrated,

an equation describing the characteristics of the respective lens being established, and

if required, each lens being calibrated to find optimum constants for the formula describing the lens characteristics.

7. Device which is designed to implement the method according to any one of claims 1-6.

8. Program storage unit which can be read by machine, in which the program storage unit embodies a program with instructions which can be executed by the machine in order to carry out the program steps for decoding video data, wherein the program steps include the steps of one or more methods according to claims 1-6.

9. Method of, in the direct transmission of video to VR glasses, rendering additional video to a spherical video characterised by the following steps:

- Video streams from the camera used to film the environment are rendered to a spherical video with known technology

- In the same rendering one or more additional video streams are rendered into the spherical video stream so that they are perceived as video screens in the virtual environment created by the spherical video.

10. Method according to claim 9 in which a 3D motor is used during rendering.

11. Method according to claim 10 in which rendering of the spherical video is characterised by the following steps:

- A model of the camera unit is entered, which contains sufficient information for each pixel on the camera unit's sensor(s) to be mapped to the direction from which light has arrived,

- A projection model is entered which describes how each pixel coordinate in the resulting video can be converted into a camera view in the 3D motor, with position, direction and visual field area,

- With the help of the 3D motor the video streams from the camera(s) in the camera unit are projected back onto a surface which coincides with the geometry of the room in which the camera unit is located,

- The position, rotation and size of the respective video screen to be rendered are entered,

- A 3D model of the respective virtual video screens is entered,

- Additional video streams are projected onto the surface of each screen model placed according to the entered data, and

- The resulting video is created by going through all the pixels and, for each pixel, taking the camera view in the 3D model which corresponds precisely to that pixel according to the entered projection model, with the addition of mixing different cameras' images and linear interpolation between adjoining pixels where this is applicable.

12. Method according to claim 11 where creation of the model of the camera unit is characterised by:

- The position and rotation of the optical point of intersection of each camera lens being entered and calibrated

- A known formula which describes the respective lens characteristics as well as possible being entered

- If required, each lens being calibrated in accordance with a known method to find optimum constants for the formula describing the lens characteristics.

13. Device which is designed to implement one or more of the methods according to any one of claims 9-12.

14. Program storage unit which can be read by machine, in which the program storage unit embodies a program with instructions which can be executed by the machine in order to carry out the program steps for decoding video data, wherein the program steps include the steps of one or more methods according to claims 9-12.

Description:
METHOD, DEVICE AND PROGRAM STORAGE UNIT FOR LIVE BROADCASTING OF SPHERICAL VIDEO

Technical domain

The present invention relates to a method, a device and a program storage unit for, during live transmissions, creating VR video (panorama video or 360 video) with embedded virtual TV screens which show additional video in such a way that the user perceives this as TV screens when the video is played in VR glasses.

Technical position

VR glasses allow the user to look around in images which have a larger area/field of vision than is shown on the glasses' screens at any one moment. Built-in direction sensors determine which part of the picture is to be shown. The technology has created a requirement for large fields of vision in the video material being played. In order to deliver video with a large field of vision, camera rigs with several cameras are often used, each of which is provided with a wide-angle lens. With two or more cameras, videos with a 360-degree field of vision can be created, for example.

Known methods combine video images from several cameras at the transmitting end into one large rectangular video, often through an equirectangular transform.

In certain live transmissions there is a need for so-called "second screens" with live transmissions from other events. An example is a football or ice hockey broadcast, where the viewer wants to check what is happening in other matches.

Known methods can, in a VR player (at the receiving end), take in flat video streams and render them so that they are experienced as virtual TV screens. These virtual TV screens do not provide the same feeling of being present as a spherical video. Existing technology is lacking when a feeling of being present is to be created in combination with showing spherical video, as the virtual TV screen has to be rendered simultaneously with the spherical video. Weaker units (such as mobile telephones) lack the capacity to decode several video streams, and synchronisation problems often occur. In addition, high bandwidth and decoding capacity are required to stream all transmissions to the receiver.

The invention envisages a method of, at the transmitting end, rendering all video streams into a single spherical VR video in one step. The method means that only one stream has to be sent, encoded and decoded, and that synchronisation of video sources can be carried out in a controlled environment.

Problem solution

The limitations and deficiencies of known technology are addressed by a method, a device and a program storage unit in that they have the characteristics set out in the following claims.

Summary

One aim of this document is to bring about a method of direct transmission of spherical video which at least partially eliminates the drawbacks of the prior art.

The invention is defined by the attached independent claims. Forms of embodiment are evident from the dependent claims, the following description and the attached drawing.

According to a first aspect a method is brought about of, during direct transmission of spherical video to a receiver screen, rendering additional video to the spherical video. The method consists in rendering video streams from cameras used to film an environment to form a first spherical video stream and, in the same rendering, rendering one or more additional video streams into the spherical video stream to form video screens in the virtual environment created by the first spherical video stream. Through the method a second spherical video stream is created which includes the first spherical video stream and said additional video streams. "Direct transmission" means transmission essentially in real time, which in most cases only involves delays relating to data processing and buffering.

The receiver screen can be of any type, such as, but not limited to, stationary and portable computers, readers, smartphones or VR glasses.

Through the method a spherical video stream is created with one or more video streams which are perceived by the user as separate video screens in the virtual environment which the spherical video stream brings about.

Rendering is brought about in a first processing unit, and the second spherical video stream can be sent to a second processing unit which is connected to the receiver screen. For example, the first processing unit can be located with the operator in charge of rendering and/or transmission of the video stream, and the second processing unit can be located at a receiver.

Hence there can be a large number of second processing units. In most cases the second video stream is sent via a computer network such as the internet.

The receiver screen can comprise at least one pair of VR glasses. A 3D motor can be used during rendering.

Rendering of the spherical video can include the fact that a model of the camera unit is entered, which contains sufficient information so that each pixel on the camera unit's sensor(s) can be mapped to the direction from which light has arrived, the fact that a projection model is entered which describes how each pixel coordinate in the resulting video can be converted into a camera view in the 3D motor, with position, direction and visual field area, the fact that with the help of the 3D motor the video streams from the camera(s) in the camera unit are projected onto a surface which coincides with the geometry of the room in which the camera unit is located, the fact that the position, rotation and size of the respective video screen to be rendered are entered, the fact that a 3D model of the respective virtual video screens is entered, the fact that additional video streams are projected onto the surface of each screen model placed according to the entered data, and the fact that the resulting video is created by going through all the pixels and, for each pixel, taking the camera view in the 3D model which corresponds precisely to that pixel according to the entered projection model, with the addition of mixing different cameras' images and linear interpolation between adjoining pixels where this is applicable.

The model of the camera unit can consist of known models, such as a pinhole lens model or any geometric camera model.

The projection model should correspond as closely as possible to the camera model, as it is to produce an image of the first video stream.

The term "enter" includes pre-programming, obtaining suitable models or data from a server as well as ad-hoc entering.

Creating the model of the camera unit can include entering and calibrating the position and rotation of the optical point of intersection of each camera lens, entering a formula which describes the characteristics of the respective lens as well as possible, and, if required, each lens being calibrated in accordance with a known method to find optimum constants for the formula describing the lens characteristics.

The formula can be pre-programmed or entered as required. Formulas for lens characteristics are known per se. The method of lens calibration, which is selectable, is also known per se.
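Purely as an illustration (not part of the application), the camera unit model described above could be represented by a structure along the following lines; the field names, the use of a quaternion for rotation and the five lens constants are assumptions made for this sketch only.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LensModel:
    # Constants k1..k5 of the lens formula r = k1*theta + ... + k5*theta^5
    # (the fish-eye example given in the detailed description); other lens
    # types would use a different formula.
    k: Tuple[float, float, float, float, float]

@dataclass
class CameraModel:
    # Position (metres) and rotation (here a quaternion, as an assumption)
    # of the optical point of intersection of this camera's lens,
    # entered or obtained through calibration.
    position: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]
    lens: LensModel

@dataclass
class CameraUnitModel:
    # One entry per camera/sensor in the camera rig.
    cameras: List[CameraModel]

# Example: a two-camera rig with ideal equidistant fish-eye lenses (r = theta).
unit = CameraUnitModel(cameras=[
    CameraModel((0.0, 0.0, 0.05), (1.0, 0.0, 0.0, 0.0), LensModel((1.0, 0, 0, 0, 0))),
    CameraModel((0.0, 0.0, -0.05), (0.0, 0.0, 1.0, 0.0), LensModel((1.0, 0, 0, 0, 0))),
])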

According to a second aspect a device is brought about which is designed to implement the method described above.

According to a third aspect a program storage unit is brought about which can be read by machine, in which the program storage unit embodies a program with instructions which can be executed by the machine in order to carry out the program steps for decoding video data, wherein the program steps include the steps of one or more of the methods described above.

According to a fourth aspect a method is brought about of, in the direct transmission of video to VR glasses, rendering additional video into a spherical video characterised by the following steps:

- Video streams from the camera used to film the environment are rendered to a spherical video with known technology

- In the same rendering one or more additional video streams are rendered into the spherical video stream so that they are perceived as video screens in the virtual environment created by the spherical video.

The method can include using a 3D motor during rendering.

The method can consist in the fact that rendering of the spherical video includes the following stages:

- A model of the camera unit is entered, which contains sufficient information for each pixel on the camera unit's sensor(s) to be mapped to the direction from which light has arrived,

- A projection model is entered which describes how each pixel coordinate in the resulting video can be converted into a camera view in the 3D motor, with position, direction and visual field area,

- With the help of the 3D motor the video streams from the camera(s) in the camera unit are projected back onto a surface which coincides with the geometry of the room in which the camera unit is located,

- The position, rotation and size of the respective video screen to be rendered are entered,

- A 3D model of the respective virtual video screens is entered,

- Additional video streams are projected onto the surface of each screen model placed according to the entered data, and

- The resulting video is created by going through all the pixels and, for each pixel, taking the camera view in the 3D model which corresponds precisely to that pixel according to the entered projection model, with the addition of mixing different cameras' images and linear interpolation between adjoining pixels where this is applicable.

The method can involve the creation of the model of the camera unit including:

- The position and rotation of the optical point of intersection of each camera lens being entered and calibrated

- A known formula which describes the respective lens characteristics as well as possible being entered

- If required, each lens being calibrated in accordance with a known method to find optimum constants for the formula describing the lens characteristics.

According to a fifth aspect a device is brought about which is designed to implement one or more of the methods according to the fourth aspect.

According to a sixth aspect a program storage unit is brought about which can be read by machine, in which the program storage unit embodies a program with instructions which can be executed by the machine in order to carry out the program steps for decoding video data, wherein the program steps include the steps of one or more of the methods described above.

List of figures

Fig. 1 Example of a system which can be used to implement the invention.

Detailed description of the invention

The invention relates to a scenario in which one or more cameras/sensors are mounted in a camera rig with the aim of creating a spherical video with a large visual field/image size and in which a viewer watches this spherical video in live transmission with VR glasses.

For example, the camera unit should be able to be placed in an ice rink to give the viewer the feeling of being present in the ice rink when a match is going on.

So that the viewer can also check what is happening in other matches, TV broadcasts from these are shown on virtual screens in the virtual environment of the ice rink.

Fig. 1 and the following description are an example of how the invention can be implemented:

- Sensors/cameras (10) record video over a large visual field. The video is transported to a 3D motor (13) with suitable video transmission technology, e.g. SDI cables.

- The TV broadcasts (12) to be shown in the virtual environment are transported to the 3D motor in a suitable manner, for example through streaming over an IP network.

- Detailed parameters of the camera unit are entered into the 3D motor or generated on site through a calibration process. The parameters describe, among other things, the optical intersection point of each camera/sensor, with position and rotation, and are used to combine the images into a seamless spherical video.

- A formula for the optical properties of each lens is entered in the same way or generated through calibration. Depending on which lens is used various formulae can be used, but for a fish-eye lens the following formula can be used for example:

r: radial distance from the optical centre on the image sensor

θ: angle from the optical axis

k1...k5: lens parameters

r = k1*θ + k2*θ^2 + k3*θ^3 + k4*θ^4 + k5*θ^5

- If required the lens is calibrated with known technology in order to optimise the values of the constants used in the formula. However, for many lenses sufficiently good values can be found by searching through already carried out calibrations.
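By way of a hedged sketch (our own illustration, not taken from the application): the formula above maps the angle θ from the optical axis to a radial distance r on the sensor, while back-projection needs the inverse mapping. Assuming the polynomial is monotonic within the lens's field of view, which holds for typical fish-eye lenses, the inverse can be found numerically, for example by bisection.

def radius_from_angle(theta, k):
    """r = k1*theta + k2*theta^2 + ... + k5*theta^5 (k is the list [k1..k5])."""
    return sum(c * theta ** (i + 1) for i, c in enumerate(k))

def angle_from_radius(r, k, theta_max=1.6, iterations=50):
    """Invert the lens polynomial by bisection, assuming it is monotonic
    up to theta_max radians (an assumption for this sketch)."""
    lo, hi = 0.0, theta_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if radius_from_angle(mid, k) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: an ideal equidistant fish-eye (r = theta), i.e. k1 = 1, k2..k5 = 0.
k = [1.0, 0.0, 0.0, 0.0, 0.0]
print(angle_from_radius(radius_from_angle(0.7, k), k))  # prints approximately 0.7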

- A 3D model describing the respective virtual video screens is entered. In addition to the respective screen's shape, its position and rotation are entered. For example, a flat surface in the form of a rectangle can be used. If the size is set to 160 cm by 90 cm and the surface is placed 3 metres from the centre of the camera unit, the virtual screen will be perceived as a large TV screen at normal everyday room distance.
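As a rough check of the stated dimensions (our own arithmetic, not given in the application): a surface 160 cm wide and 90 cm high placed 3 metres from the viewpoint subtends roughly 2*arctan(0.8/3.0) ≈ 30 degrees horizontally and 2*arctan(0.45/3.0) ≈ 17 degrees vertically, which is comparable to a large TV screen viewed across a living room.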

- A surface which approximately corresponds to the geometry in the ice rink is entered into the 3D motor. The centre point of this surface is the same point as the centre point of the camera unit. If the camera unit is placed centrally in the ice rink a sphere with a radius of 10 metres can be used as an approximation for example.

- A projection model for the resulting video is entered. In this example the known equirectangular projection and the resolution 4096x2048 are used:

l = Pi - 2 * Pi * x / w

f = Pi / 2 - Pi * y / h

Where:

l = azimuth in radians
f = elevation in radians
x = horizontal pixel coordinate
y = vertical pixel coordinate
w = 4096
h = 2048

and where (0, 0) corresponds to the midpoint of the resulting video.
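As an illustration only (a sketch under the stated example resolution, not code from the application), the projection model can be expressed as a mapping from a pixel coordinate to a viewing direction; the z-forward/x-right/y-up axis convention is our own assumption.

import math

W, H = 4096, 2048  # resolution used in this example

def pixel_to_direction(x, y, w=W, h=H):
    """Map an equirectangular pixel to a unit direction vector, using
    l = Pi - 2*Pi*x/w (azimuth) and f = Pi/2 - Pi*y/h (elevation) as in
    the projection model above."""
    azimuth = math.pi - 2.0 * math.pi * x / w
    elevation = math.pi / 2.0 - math.pi * y / h
    c = math.cos(elevation)
    return (c * math.sin(azimuth),   # x: right
            math.sin(elevation),     # y: up
            c * math.cos(azimuth))   # z: forward

# The image midpoint maps to azimuth 0, elevation 0, i.e. straight ahead:
print(pixel_to_direction(W / 2, H / 2))  # -> (0.0, 0.0, 1.0)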

- For each frame in the video the following steps are now carried out:

1) A frame is captured from the respective camera/sensor in the camera unit (11)

2) The image which is captured from the respective camera/sensor is projected back from the model of the camera's optical point of intersection to the projection surface (the sphere with a radius of 10 metres in this example). For this projection the formula for the respective lens's optical properties is used, so that each pixel can be projected to the position corresponding to the direction in which the corresponding light fell on the sensor.

3) A frame is captured from each video source (12) to be shown on the virtual video screen.

4) Images captured from the respective video sources are projected onto a corresponding model of the virtual TV screen with the aid of the 3D motor. In this example these screens are within the projection surface which means they end up in front of the background video during rendering.

5) The resulting image frame is rendered through an iteration over all pixels and images. For each pixel the above projection model is used to calculate which direction vector corresponds to the pixel's coordinates. With the aid of the 3D motor, image data for the pixel is obtained with a camera placed in the centre of the model and a direction according to the direction vector.

- The video is transmitted with suitable technology, for example through streaming over an IP network.
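The following is a deliberately simplified sketch of steps 4) and 5) above (our own illustration, not the application's implementation): for every output pixel, the direction given by the projection model is traced, the virtual 1.6 m by 0.9 m screen placed 3 metres straight ahead is tested first, and otherwise a stand-in background colour represents the spherical video projected onto the 10-metre sphere. Real frames from the camera unit and the additional video sources would replace the stand-in colour functions.

import math

W, H = 256, 128  # small output for the sketch; the example above uses 4096x2048

def pixel_to_direction(x, y, w=W, h=H):
    # Equirectangular projection model from the example above.
    az = math.pi - 2.0 * math.pi * x / w
    el = math.pi / 2.0 - math.pi * y / h
    return (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))

def sample_screen(d, dist=3.0, half_w=0.8, half_h=0.45):
    """If the ray from the camera-unit centre along direction d hits the
    virtual 1.6 m x 0.9 m screen placed 3 m straight ahead, return a
    stand-in colour for the additional video; otherwise None."""
    if d[2] <= 0.0:
        return None
    t = dist / d[2]
    px, py = d[0] * t, d[1] * t
    if abs(px) <= half_w and abs(py) <= half_h:
        return (255, 255, 255)   # stand-in for the second-screen frame
    return None

def sample_background(d):
    # Stand-in for the camera-unit video projected onto the 10 m sphere.
    return (0, 64, 128)

def render_frame():
    frame = []
    for y in range(H):
        row = []
        for x in range(W):
            d = pixel_to_direction(x + 0.5, y + 0.5)
            row.append(sample_screen(d) or sample_background(d))
        frame.append(row)
    return frame

frame = render_frame()
print(frame[H // 2][W // 2])  # centre pixel falls on the virtual screen: (255, 255, 255)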

- A VR player (14) receives the streamed video.

- For each frame in the video the following steps are now carried out:

1) The VR player obtains a direction vector from the direction sensor (16) which is fitted in the VR glasses (15). The direction sensor uses known technology to determine the direction in which the VR glasses are pointed.

2) Using the direction vector, the projection model and a lens model for the lenses in the glasses, the VR player calculates which pixels are to be rendered to the VR glasses. This is done with known technology and there are several VR players available on the market which can play video coded with precisely the equirectangular projection model used in this example.
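As a counterpart sketch for the receiving end (again our own illustration with the same assumed axis convention, not the VR player's actual code), inverting the equirectangular projection model gives the pixel coordinate that corresponds to a gaze direction reported by the direction sensor.

import math

W, H = 4096, 2048

def direction_to_pixel(d, w=W, h=H):
    """Inverse of the equirectangular projection model used in this example:
    recover azimuth l and elevation f from a unit direction vector and map
    them back to pixel coordinates (x, y)."""
    dx, dy, dz = d
    azimuth = math.atan2(dx, dz)                      # l in (-Pi, Pi]
    elevation = math.asin(max(-1.0, min(1.0, dy)))    # f
    x = (math.pi - azimuth) * w / (2.0 * math.pi)
    y = (math.pi / 2.0 - elevation) * h / math.pi
    return x % w, y

# Looking straight ahead (z forward) maps to the image midpoint:
print(direction_to_pixel((0.0, 0.0, 1.0)))  # -> (2048.0, 1024.0)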

- The viewer now perceives being in the environment in which the camera is placed, but with a number of virtual TV screens deployed.