


Title:
RECOMMENDATION METHOD AND SYSTEM
Document Type and Number:
WIPO Patent Application WO/2018/002423
Kind Code:
A1
Abstract:
A method comprises determining at least two objects of interest (610, 620-1, 620-2, 630) in an image viewable by a user (602), assigning respective weightings to the objects (610, 620-1, 620-2, 630), the weighting of an object (610, 620-1, 620-2, 630) being related to a measure of quality of a user's (602) view in a viewing direction that includes the object (610, 620-1, 620-2, 630) and recommending a perspective in the image based on the assigned weightings.

Inventors:
LAAKSONEN LASSE JUHANI (FI)
Application Number:
PCT/FI2017/050445
Publication Date:
January 04, 2018
Filing Date:
June 15, 2017
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
G06F3/04815; A63F13/525; G02B27/00; G06F3/01
Foreign References:
US20140081956A1 (2014-03-20)
EP0859996A1 (1998-08-26)
US20160142626A1 (2016-05-19)
US20160080643A1 (2016-03-17)
Other References:
TATZGERN, M. ET AL.: "Multi-perspective compact explosion diagrams", COMPUTERS & GRAPHICS, vol. 35, no. 1, 9 November 2010 (2010-11-09), pages 135 - 147, XP028132923, ISSN: 0097-8493
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
Claims

1. A method comprising:

determining at least two objects of interest in an image viewable by a user;

assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

recommending a perspective in the image based on the assigned weightings.

2. The method according to claim 1, wherein assigning respective weightings comprises determining the importance of the respective objects for the user.

3. The method according to claim 1 or 2, wherein assigning respective weightings further comprises identifying compositional elements within the image including respective ones of the objects.

4. The method according to claim 3, wherein assigning respective weightings further comprises determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects.

5. The method according to claim 3 or 4, wherein identifying compositional elements comprises performing image analysis of the image containing respective ones of the objects.

6. The method according to any one of the preceding claims, wherein recommending a perspective comprises selecting a viewing position with a view that includes an object having a relatively high weighting.

7. The method according to claim 6, wherein the perspective includes more than one object of interest.

8. The method according to any one of the preceding claims, wherein at least one of the objects of interest is determined in response to user selection.

9. The method according to any one of the preceding claims, wherein at least one of the objects of interest is determined by automatic selection.

10. The method according to any one of the preceding claims, comprising capturing the image using a presence capture device arranged to capture images from a plurality of directions.

11. The method according to claim 10, wherein the perspective comprises a distance and a rotational angle from a current position and orientation of the presence capture device.

12. The method according to any one of claims 1 to 9, comprising displaying the image using a virtual reality VR headset.

13. The method according to any one of the preceding claims, wherein recommending a perspective comprises displaying a graphical indicator in the view.

14. The method according to claim 13, wherein the graphical indicator comprises first and second graphical elements, wherein the perspective is attained by aligning the first and second graphical elements.

15. The method according to any one of the preceding claims, comprising matching compositional elements in the image with predetermined reference images.

16. Apparatus configured to perform a method according to any of the preceding claims.

17. Apparatus comprising:

means for determining at least two objects of interest in an image viewable by a user;

means for assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

means for recommending a perspective in the image based on the assigned weightings.

18. Apparatus according to claim 17, wherein the means for assigning respective weightings comprises means for determining the importance of the respective objects for the user.

19. Apparatus according to claim 17 or 18, wherein the means for assigning respective weightings further comprises means for identifying compositional elements within the image including respective ones of the objects.

20. Apparatus according to claim 19, wherein the means for assigning respective weightings further comprises means for determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects.

21. Apparatus according to claim 19 or 20, wherein the means for identifying compositional elements comprises means for performing image analysis of the image containing respective ones of the objects.

22. Apparatus according to any of claims 17 to 21, wherein the means for recommending a perspective comprises means for selecting a viewing position with a view that includes an object having a relatively high weighting.

23. Apparatus according to claim 22, wherein the perspective includes more than one object of interest.

24. Apparatus according to any of claims 17 to 23, comprising means for determining at least one of the objects of interest in response to user selection.

25. Apparatus according to any of claims 17 to 24, comprising means for determining at least one of the objects of interest by automatic selection.

26. Apparatus according to any of claims 17 to 25, comprising means for capturing the image using a presence capture device arranged to capture images from a plurality of directions.

27. Apparatus according to claim 26, wherein the perspective comprises a distance and a rotational angle from a current position and orientation of the presence capture device.

28. Apparatus according to any one of claims 17 to 27, comprising means for viewing the image using a virtual reality VR headset.

29. Apparatus according to any of claims 17 to 28, wherein the means for recommending a perspective comprises means for displaying a graphical indicator in the view.

30. Apparatus according to claim 29, wherein the graphical indicator comprises first and second graphical elements, wherein the perspective is attained by means for aligning the first and second graphical elements.

31. Apparatus according to any of claims 17 to 30, comprising means for matching compositional elements in the image with predetermined reference images.

32. A head mounted display comprising an apparatus according to any of claims 17 to 31.

33. A presence capture device comprising an apparatus according to any of claims 17 to 31.

34. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any of claims 1 to 15.

35. A non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising:

determining at least two objects of interest in an image viewable by a user;

assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

recommending a perspective in the image based on the assigned weightings.

36. Apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor to perform:

determining at least two objects of interest in an image viewable by a user;

assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

recommending a perspective in the image based on the assigned weightings.

37. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

assigning respective weightings by determining the importance of the respective objects for the user.

38. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

assigning respective weightings by identifying compositional elements within the image including respective ones of the objects.

39. Apparatus according to claim 38, wherein the computer-readable code when executed controls the at least one processor to perform:

assigning respective weightings by determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects.

40. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

identifying compositional elements by image analysis of the image containing respective ones of the objects.

41. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

recommending a perspective by selecting a viewing position with a view that includes an object having a relatively high weighting.

42. Apparatus according to claim 36, wherein the perspective includes more than one object of interest.

43. Apparatus according to claim 36, wherein at least one of the objects of interest is determined in response to user selection.

44. Apparatus according to claim 36, wherein at least one of the objects of interest is determined by automatic selection.

45. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform: capturing the image using a presence capture device arranged to capture images from a plurality of directions.

46. Apparatus according to claim 45, wherein the perspective comprises a distance and a rotational angle from a current position and orientation of the presence capture device.

47. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

displaying the image using a virtual reality VR headset.

48. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

recommending a perspective by displaying a graphical indicator in the view.

49. Apparatus according to claim 48, wherein the graphical indicator comprises first and second graphical elements, wherein the perspective is attained by aligning the first and second graphical elements.

50. Apparatus according to claim 36, wherein the computer-readable code when executed controls the at least one processor to perform:

matching compositional elements in the image with predetermined reference images.

Description:
Recommendation Method and System

Field

This disclosure relates generally to the provision of virtual reality content, particularly but not exclusively to recommending a viewing position for an image processing device such as a presence capture device or a VR headset.

Background

When experiencing virtual reality (VR) content, such as a VR computer game, a VR movie or "Presence Capture" VR content, users generally wear a specially-adapted head-mounted display device (which may be referred to as a VR device) which renders the visual content. An example of such a VR device is the Oculus Rift (RTM), which allows a user to watch 360-degree visual content captured, for example, by a presence capture device such as the Nokia OZO (RTM) camera.

Summary

According to an aspect, a method comprises determining at least two objects of interest in an image viewable by a user, assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object and recommending a perspective in the image based on the assigned weightings.

Assigning respective weightings may comprise determining the importance of the respective objects for the user.

Assigning respective weightings may further comprise identifying compositional elements within the image including respective ones of the objects.

Assigning respective weightings may further comprise determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects.

Identifying compositional elements may comprise performing image analysis of the image containing respective ones of the objects.

Recommending a perspective may comprise selecting a viewing position with a view that includes an object having a relatively high weighting. The perspective may include more than one object of interest.

At least one of the objects of interest may be determined in response to user selection.

At least one of the objects of interest may be determined by automatic selection.

The method may comprise capturing the image using a presence capture device arranged to capture images from a plurality of directions.

The perspective may comprise a distance and a rotational angle from a current position and orientation of the presence capture device.

The method may comprise viewing the image using a virtual reality VR headset.

Recommending a perspective may comprise displaying a graphical indicator in the view.

The graphical indicator may comprise first and second graphical elements, wherein the perspective is attained by aligning the first and second graphical elements.

The method may comprise matching compositional elements in the image with predetermined reference images.

According to another aspect, there is provided apparatus configured to perform a method as defined above.

According to another aspect, there is provided apparatus comprising: means for determining at least two objects of interest in an image viewable by a user, means for assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object, and means for recommending a perspective in the image based on the assigned weightings.

The means for assigning respective weightings may comprise means for determining the importance of the respective objects for the user.

The means for assigning respective weightings may further comprise means for identifying compositional elements within the image including respective ones of the objects.

The means for assigning respective weightings may further comprise means for determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects.

The means for identifying compositional elements may comprise means for performing image analysis of the image containing respective ones of the objects.

The means for recommending a perspective may comprise means for selecting a viewing position with a view that includes an object having a relatively high weighting. The perspective may include more than one object of interest.

The apparatus may comprise means for determining at least one of the objects of interest in response to user selection. The apparatus may comprise means for determining at least one of the objects of interest by automatic selection.

The apparatus may comprise means for capturing the image using a presence capture device arranged to capture images from a plurality of directions.

The perspective may comprise a distance and a rotational angle from a current position and orientation of the presence capture device.

The apparatus may comprise means for viewing the image using a virtual reality VR headset.

The means for recommending a perspective may comprise means for displaying a graphical indicator in the view. The graphical indicator may comprise first and second graphical elements, wherein the perspective is attained by means for aligning the first and second graphical elements.

The apparatus may comprise means for matching compositional elements in the image with predetermined reference images.

A head mounted display may comprise an apparatus as defined above. A presence capture device may comprise an apparatus as defined above.

According to another aspect, there are provided computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method as defined above.

According to a further aspect, there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: determining at least two objects of interest in an image viewable by a user;

assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

recommending a perspective in the image based on the assigned weightings.

According to yet another aspect, there is provided apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor to perform:

determining at least two objects of interest in an image viewable by a user;

assigning respective weightings to the objects, the weighting of an object being related to a measure of quality of a user's view in a viewing direction that includes the object; and

recommending a perspective in the image based on the assigned weightings.

The computer-readable code when executed may control the at least one processor to perform:

assigning respective weightings by determining the importance of the respective objects for the user.

The computer-readable code when executed may control the at least one processor to perform:

assigning respective weightings by identifying compositional elements within the image including respective ones of the objects.

The computer-readable code when executed may control the at least one processor to perform: assigning respective weightings by determining the degree of preference of perspectives based on the relation between the respective object and the identified elements within the image including respective ones of the objects. The computer-readable code when executed may control the at least one processor to perform:

identifying compositional elements by image analysis of the image containing respective ones of the objects. The computer-readable code when executed may control the at least one processor to perform:

recommending a perspective by selecting a viewing position with a view that includes an object having a relatively high weighting. The perspective may include more than one object of interest.

At least one of the objects of interest may be determined in response to user selection.

At least one of the objects of interest may be determined by automatic selection.

The computer-readable code when executed may control the at least one processor to perform:

capturing the image using a presence capture device arranged to capture images from a plurality of directions.

The perspective may comprise a distance and a rotational angle from a current position and orientation of the presence capture device.

The computer-readable code when executed may control the at least one processor to perform:

displaying the image using a virtual reality VR headset.

The computer-readable code when executed may control the at least one processor to perform:

recommending a perspective by displaying a graphical indicator in the view. The graphical indicator may comprise first and second graphical elements, and the perspective may be attained by aligning the first and second graphical elements.

The computer-readable code when executed may control the at least one processor to perform:

matching compositional elements in the image with predetermined reference images.

Brief Description of the Figures

For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:

Figure 1 is a schematic illustration of a system for presence capture;

Figure 2a shows a schematic illustration of a presence capture device;

Figures 2b and 2c show schematic diagrams of the controller device 10 and the server apparatus 12 shown in Figure 1;

Figure 3 shows an exemplary schematic of the software application 40;

Figures 4a and 4b illustrate an example of a conventional composition recommendation system for a digital camera;

Figures 5a and 5b illustrate an example of obtaining VR video content using the presence capture device;

Figure 6 illustrates the concept of recommendation for VR video content;

Figure 7 shows a high-level block diagram for an algorithm that is used to determine the perspective recommendation for a main object of interest;

Figure 8 shows a high-level block diagram for the algorithmic steps for the case where there are multiple main and secondary objects in the captured image;

Figures 9a to 9c illustrate user interface UI aids;

Figure 10 illustrates an example of the first and second graphical indicators in a VR video content; and

Figures 11a to 11d illustrate the perspective recommendation method in comparison with a conventional composition recommendation method.

Detailed Description

In the description and drawings, like reference numerals may refer to like elements throughout.

Presence capture has introduced a new way to view real-life content. Essentially, devices and applications attempt to replace the view around us by a captured view around another point in space, thus allowing us to experience a different space as if we were there. For example, the Oculus Rift (RTM) virtual reality headset or a similar device may be used to enjoy such content. Figure 1 is a schematic illustration of a system 1 for presence capture by a user 2.

During the presence capturing process using a presence capture device 5, the user 2 may be able to monitor the captured image using a headset display 14 or a display in a controller device 10. The controller device 10 may communicate with a remote server apparatus 12. VR content may cover, but is not limited to, computer-generated VR content, content captured by the presence capture device 5, or a combination of the two. VR content may cover any type or combination of types of immersive media, or multimedia, content.

The presence capture device 5 may be a device comprising an array of content capture modules for capturing video and/or audio content from various different directions. For instance, the presence capture device may include a 2D (e.g. circular) array of content capture modules for capturing video and/or audio content from a wide range of angles (e.g. 360-degrees) in a single plane. The presence capture device may include a 3D array such as a spherical or partly spherical array of content capture modules for capturing content from a wide range of angles in multiple different planes.

Figure 2a shows a schematic illustration of a presence capture device 5 (such as Nokia's OZO), which includes a spherical array of video capture modules 51 to 58. Although not visible in the Figure, the presence capture device may further comprise plural audio capture modules (e.g. directional microphones) for capturing audio from various directions around the presence capture device 5. It should be noted that the device 5 may include additional video/audio capture modules which are not visible from the perspective of Figure 2a. The presence capture device 5 may therefore capture content derived from all directions.

The output of such devices may be plural streams of video content and/or plural streams of audio content. These may be combined so as to provide VR content for consumption by the user 2 at the headset display 14. The controller device 10 may be configured to cause provision of the VR content being captured to the user 2 via the headset display 14. The headset display 14 may be configured to provide a visual component of the VR content being captured with the presence capture device 5 to the user 2. The headset display 14 may comprise a dedicated virtual reality device which is specifically configured for provision of VR content (for instance Oculus Rift (RTM)) or may be a general-purpose device which is currently being utilised to provide immersive VR content, for instance, a smartphone utilised with a VR mount. In alternative embodiments, the display 14 may comprise other types of display device such as a portable display device, for example, but not limited to, a smart phone or a tablet computer.

Figures 2b and 2c show schematic diagrams of the controller device 10 and the server apparatus 12 shown in Figure 1.

The server apparatus 12 may be, for instance, any type of LAN-based or cloud-based server. The controller device 10 may include one or more transceivers 105 and associated antennas 106 for enabling wireless communication (e.g. via Wi-Fi or Bluetooth) with the server apparatus 12, which comprises a transceiver 121 and antenna 122. As shown in Figure 2c, the server apparatus 12 comprises a controller 120.

The controller device 10 and the server apparatus 12 may communicate via an interface that may be wired or wireless using any suitable protocol. The server apparatus may comprise an Input/Output interface 123. The controllers 100, 120 of the controller device and the server apparatus comprise processing circuitry 1001, 1201 communicatively coupled with memory 1002, 1202. The memory 1002, 1202 has computer readable instructions 1002A, 1202A stored thereon, which when executed by the processing circuitry 1001, 1201 cause the processing circuitry 1001, 1201 to perform various operations.

The processing circuitry 1001, 1201 of any of the controller device 10 and the server apparatus 12 may be of any suitable composition and may include one or more processors 1001A, 1201A of any suitable type or suitable combination of types. For example, the processing circuitry 1001, 1201 may be a programmable processor that interprets computer program instructions 1002A, 1202A and processes data. The processing circuitry 1001, 1201 may include plural programmable processors. Alternatively, the processing circuitry 1001, 1201 may be, for example, programmable hardware with embedded firmware. The processing circuitry 1001, 1201 may be termed processing means. The processing circuitry 1001, 1201 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 1001, 1201 may be referred to as computing apparatus. The processing circuitry 1001, 1201 is coupled to the respective memory (or one or more storage devices) 1002, 1202 and is operable to read/write data to/from the memory 1002, 1202.

The memory 1002, 1202 may comprise a single memory unit or a plurality of memory units, upon which the computer readable instructions (or code) 1002A, 1202A is stored. For example, the memory 1002, 1202 may comprise both volatile memory 1002-2, 1202-2 and non-volatile memory 1002-1, 1202-1. For example, the computer readable instructions 1002A, 1202A may be stored in the non-volatile memory 1002-1, 1202-1 and may be executed by the processing circuitry 1001, 1201 using the volatile memory 1002-2, 1202-2 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memories in general may be referred to as non-transitory computer readable memory media. The term 'memory', in addition to covering memory comprising both non-volatile memory and volatile memory, may also cover one or more volatile memories only, one or more non-volatile memories only, or one or more volatile memories and one or more non-volatile memories.

The computer readable instructions 1002A, 1202A may be pre-programmed into the controller device 10 and the server apparatus 12. Alternatively, the computer readable instructions 1002A, 1202A may arrive at the apparatus 10, 12 via an electromagnetic carrier signal or may be copied from a physical entity such as a computer program product, a memory device or a record medium such as a CD-ROM or DVD. The computer readable instructions 1002A, 1202A may provide the logic and routines that enable the controller device 10 and the server apparatus 12 to perform the desired functionality. The combination of computer-readable instructions stored on memory (of any of the types described above) may be referred to as a computer program product.

Where applicable, wireless communication capability of the controller device 10 and the server apparatus 12 may be provided by a single integrated circuit. It may alternatively be provided by a set of integrated circuits (i.e. a chipset). The wireless communication capability may alternatively be a hardwired, application-specific integrated circuit (ASIC).

As will be appreciated, the controller device 10 and the server apparatus 12 described herein may include various hardware components which may not have been shown in the Figures.
For instance, the controller device 10 may in some implementations include a portable computing device such as a mobile telephone or a tablet computer and so may contain components commonly included in a device of the specific type. Similarly, the controller device 10 and the server apparatus 12 may comprise further optional software components which are not described in this specification since they may not have direct interaction to embodiments of this specification.

Embodiments may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays FPGA, application specific circuits ASIC, signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor or firmware, such as the programmable content of a hardware device, whether as instructions for a processor or as configuration settings for a fixed function device, gate array, programmable logic device, etc.

As used in this application, the term 'circuitry' refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone, or a similar integrated circuit in a server, a cellular network device, or other network device.

In particular, the server apparatus 12 may include a reference image database 1220 configured to store reference images and information regarding compositional elements to provide images that are balanced and pleasing to the eye.

A software application 40, such as an OZO remote application, may be installed in the controller device 10.

Figure 3 shows an exemplary schematic of the software application 40. The software application 40 may be configured to control the presence capture device 5. For example, the software application 40 may be equipped with a user interface 310 to control various control parameters of the camera, in this example the exposure time of the camera. Plural streams of video content and/or plural streams of audio content from the array of video capture modules 51 to 58 of the presence capture device 5 may be monitored with the camera display 320 within the software application 40.

Control may be performed wirelessly using the transceiver 105 and the antenna 106 of the controller device. The software application 40 may also support interactive preview of VR content captured by the presence capture device 5 with the headset display 14.

The output of such devices may be plural streams of video content and/or plural streams of audio content. These may be combined so as to provide VR content by the software application 40 to be sent to the headset display 14. Where the video portion of the VR content is available only from a certain viewpoint, pre-processing of the VR content may be performed by the software application 40 prior to rendering the VR content, such that a user can view all of the surroundings from various viewpoints. When a head-mounted display (HMD) such as the Oculus Rift or a similar device is used as the device 14 to view the VR content, these devices are arranged to replace the view around the user with the VR content captured by the presence capture device, such that the user experiences being placed in the space captured in the VR content.

When a presence capture device is used for capturing VR video content, different considerations should be taken into account compared to traditional image capture using a conventional 2D capture device such as a digital camera.

When people look at conventional 2D visual content such as movies, photographs and paintings, it is well known in the visual arts that people generally prefer the arrangement in which main elements are not centred. Often the arrangement of a 2D image follows various compositional rules, such as the Golden ratio, the Rule of Thirds, and so on.

In contrast, in real life people will look directly at a main object of interest, placing it in the centre of their vision. Therefore, in VR content viewed using a head-mounted display (HMD), the main object of interest will spend most of the time in the middle of the user's view. Compositional aids, such as cues given by a user interface UI, may therefore help enhance the user's experience by building a more scenic arrangement around the main object of interest.

Another consideration that differs from conventional 2D images is that, while head-mounted displays offer complete freedom to view the VR content from all viewpoints, people tend to consume mainly the most interesting piece of content within their current vision. For example, a user sitting in a chair will not turn around very often to check out the scenery behind them. Therefore, user experience may be enhanced when a UI guides the user to view the most interesting or important part of the content. In particular, various UIs may be devised to recommend to the user the most enjoyable viewing perspective by taking into consideration the content lying both inside and outside the frame. The current embodiments relate to a perspective recommendation system for presence capture devices.

This is in contrast to conventional composition recommendation systems that help users to place the main point of interest elsewhere in the frame to create a more balanced and pleasing composition. In other words, the composition is now defined as distances and rotations in relation to the camera position and the various secondary objects of interest in the scene.

Figures 4a and 4b illustrate an example of a conventional composition recommendation system for a digital camera. Figure 4a illustrates an example where a user 402 takes a 2D digital photograph of an object of interest 410 using a digital camera 400, against a background 420. The digital camera 400 may include a display 405, which shows a preview of the image to be taken, which is determined by the current position of the digital camera 400. Figure 4b illustrates an example of a conventional composition recommendation system installed in the digital camera 400. A recommendation system user interface UI 430 may appear in the display 405, in this example as a vertical and a horizontal dotted line. The recommendation system UI 430 may direct the user 402 to follow the crossing point of the two dotted lines. The user 402 may correspondingly shift the position of the camera 400 so that the object of interest 410 lies at the intersection of the lines. The resulting image may have an improved composition compared to the one obtained in the example of Figure 4a. This composition may follow the rule of thirds, as illustrated in Figure 4b. Alternatively, the user may rotate the digital camera 400 to follow the guidance given by the recommendation system UI 430. This would result in a different perspective to the image shown in Figure 4b. However, the composition in terms of the position of the main object of interest 410 in the frame would be the same.
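As an illustration of the conventional 2D guidance described above, the sketch below computes the shift that would bring a detected object of interest onto the nearest rule-of-thirds intersection of the frame. It is a hypothetical example; the function and parameter names are not taken from the disclosure.

```python
def rule_of_thirds_shift(frame_w, frame_h, obj_x, obj_y):
    """Return the (dx, dy) shift that moves the object of interest from its
    current frame position onto the nearest rule-of-thirds intersection."""
    # The four rule-of-thirds intersections of the frame.
    targets = [(frame_w * i / 3, frame_h * j / 3) for i in (1, 2) for j in (1, 2)]
    # Pick the intersection closest to the object's current position.
    tx, ty = min(targets, key=lambda t: (t[0] - obj_x) ** 2 + (t[1] - obj_y) ** 2)
    return tx - obj_x, ty - obj_y

# Example: a 1920x1080 preview with the object detected at the frame centre;
# the UI would guide the user to shift the framing by roughly this amount.
dx, dy = rule_of_thirds_shift(1920, 1080, 960, 540)
print(dx, dy)
```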

Figures 5a and 5b illustrate an example of obtaining VR video content using the presence capture device 505.

Figure 5a illustrates an example where a user 502 obtains VR video content of an object of interest 510 against a background 520, using the presence capture device 505. The presence capture device 505 may be held by the user 502 during operation or may be a head mounted display. The VR video content may be monitored by the user using a display 514, which shows an image of the VR video content being obtained. The display 514 shows the image that is determined by the current position of the presence capture device 505. Similarly to Figure 4a, the main object of interest 510 may appear near the middle of the frame within the display 514.

Figure 5b illustrates why a recommendation system UI 530, similar to the recommendation system UI 430 of the conventional 2D digital camera, may not properly function when it is implemented for the presence capture device 505. As shown in Figure 5b, the position of the presence capture device 505 may be shifted according to the guidance of the recommendation system UI 530, similarly to the case in Figure 4b.

However, since the user 502 (not shown in this Figure) is now watching immersive VR video content from a presence capture device 505 instead of a conventional 2D digital photograph, the user 502 is likely to place the object of interest 510 in the centre of their vision by turning their head to look directly at the object of interest 510. As a result, the conventional recommendation system UI 530 in this case merely achieves an arbitrary rotation of perspective relative to the background 520. In relation to VR video content, the relevance of the composition should therefore be considered differently from the case of conventional 2D photography to establish a more suitable concept of composition recommendation.

In conventional 2D photography, the entire photograph or the entire painting is in a viewer's vision, and their eyes will wander around looking at its details. The compositional quality lies in the arrangement of these details.

In a 3D virtual reality environment, people tend to simply look at the objects right in front of them. Since everything cannot be in a single natural view in a real or immersive environment, their eyes will be mostly centred, and only wander around the central reference spot to a certain degree. The user in a VR environment will therefore generally be looking primarily at the main object of interest.

Therefore, recommendation for the arrangement of elements in VR video content may advantageously be directed to the perspective rather than the position of the main object of interest in the frame.

The description below is made mainly in relation to the capturing and recording of VR content, but the implementation also applies for a user viewing pre-recorded VR content.

Figure 6 illustrates the concept of recommendation for VR video content. VR content of the main object of interest 610, in this example a performer, is obtained by a user using a presence capture device 605 in three different positions. The three different positions of the presence capture device are numbered 605-1, 605-2, 605-3. From the first position 605-1, the VR content may show the main object of interest 610 from behind the performer looking towards the foreground. Since the presence capture device 605 may be capable of recording from all directions, when the user 602 turns around to study the surroundings, secondary objects will come into the view of the user 602. Such secondary objects may be, for example, first and second groups of trees 620-1, 620-2 and a castle 630. The first position 605-1 may be desired if all of the elements in the surroundings are to be separately recorded, for example to allow the user to freely study the immersive surroundings. However, if the performer 610 is the main theme of the VR content, the first position 605-1 may not be the most appropriate. The second position 605-2 captures the performer 610 in front of the first group of trees 620-1 as a background within a single view of the user 602. Therefore, this composition is more balanced than that of the first position 605-1. The third position 605-3 captures the performer 610 between the first and second groups of trees 620-1, 620-2, with the road 640 meandering towards the castle 630 in the background. The third position 605-3 therefore arguably gives the most interesting and balanced single composition in this example.

The example in Figure 6 illustrates that a perspective or view-based recommendation system is needed, rather than a composition recommendation system, for capturing/consuming VR video content with pleasing compositions/perspectives. The system may separately evaluate a pleasing composition for views towards several objects and combine the perspectives to decide on a single recommended perspective for the user 602, defined by a recommended user or capture device position.

The example in Figure 6 illustrates that the intended experience may largely dictate the decision of the recommended perspective. The intended experience may be reflected in the decision process by assigning weights to each object. For example, the decision process may receive a direct user input where at least two desired objects are defined. The at least two desired objects may further be grouped into at least two categories that reflect their desired role in the view, such as 'must appear' or 'should appear' or 'may appear'. The categorization may alternatively be realized as weights that may be understood or denoted, e.g., as numerical values in the range [0, 1]. In one example, the main direction of view, towards the performer 610, is weighted heavily. If it is intended that the user 602 in addition views other interesting things in the background, the secondary directions of view, for example towards objects such as the trees 620-1, 620-2 and the castle 630, may also be weighted heavily.

The preferred capture perspective, determined by the location and angle, for example user viewing angle, of the presence capture device, may be decided based on optimizing the viewing location, i.e. selecting a capture or viewing location corresponding to the combination of the weighted 'criteria' for each individual object of interest in the VR content. The term 'criterion/criteria' will be explained in more detail below. The objects may be designated either via user input, as described above, or automatically.

Any element other than the objects in the view may be considered in evaluating the compositional quality and optimising the viewing location. The relationship between these extra elements and each object may be important in evaluating the measure of quality of a user's view. These extra elements will be termed 'compositional elements'. For example, the position of the horizon in the view may be identified via image analysis and may be important, in relation to an object being considered, in evaluating the compositional quality of a certain viewing position. The horizon in this case may be a compositional element. The compositional elements may be identified via image analysis. Alternatively, object A may be a compositional element of object B if object A appears as part of the composition in relation to object B. The compositional elements may be defined in relation to each object depending on the perspective.
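As an informal illustration of the weighting described above, the sketch below maps the role categories 'must appear', 'should appear' and 'may appear' to numerical weights in [0, 1]. The particular numeric values and names are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical mapping from a desired role in the view to a weight in [0, 1].
ROLE_WEIGHTS = {
    "must appear": 1.0,
    "should appear": 0.6,
    "may appear": 0.3,
}

def assign_weights(objects_of_interest):
    """objects_of_interest: dict mapping an object name to its role label,
    obtained from direct user input or automatic selection.
    Returns a dict mapping each object name to its numerical weight."""
    return {name: ROLE_WEIGHTS[role] for name, role in objects_of_interest.items()}

# Example loosely based on Figure 6: the performer is weighted most heavily.
weights = assign_weights({
    "performer 610": "must appear",
    "castle 630": "should appear",
    "trees 620-1": "may appear",
    "trees 620-2": "may appear",
})
```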

Based on the compositional elements of each object, a perspective-dependent criterion may be determined for each object as a function of capture position. A criterion refers to a measure of preference for a composition from a certain perspective, which may be derivable from an algorithm. The algorithm may be predetermined by the user 2. The algorithm may be accessed from the server apparatus 12.

In optimising the viewing location, the weight assigned to each object, described above, may be considered in addition to the perspective-dependent criterion evaluated for each object. For example, even if a higher weight is set on object A than object B, the criterion value for object B from a certain perspective may be evaluated much higher than that of object A. In that case, a perspective with object B in the view may be presented as the result of optimisation. How weights and criterion values are considered in the optimisation process will be described in more detail later.

The user may provide further inputs and parameters in the optimisation process depending on the details of each implementation. For example, in addition to the weights for the objects of interest, the recommendation system may receive as input the desired amount of angular movement for each perspective corresponding to a secondary object of interest. This provides a way to take user activity into consideration. The recommendation system may also take as input the degree of difficulty in finding certain objects of interest and switching to the perspective that includes those objects of interest. The recommendation system may also take as input whether a certain secondary object of interest is allowed to appear as a background of another object. The recommendation system may also take as input the desired maximum or minimum number of secondary objects of interest which can appear as a background to the main object of interest. For example, if the minimum number of background objects is set to 2 in the example of Figure 6, only the perspective of the third position 605-3 may be chosen.
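The optional inputs described in this passage (desired angular movement towards each secondary object, difficulty of finding an object, whether it may serve as a background, and the minimum or maximum number of background objects) could be gathered into a single parameter structure passed to the recommendation system. The sketch below is one possible representation under those assumptions; all type and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SecondaryObjectParams:
    desired_angular_movement_deg: float = 0.0   # how much rotation towards this object is desired
    finding_difficulty: float = 0.0             # 0 = trivial to find, 1 = very hard to find
    allowed_as_background: bool = True          # may appear behind another object

@dataclass
class RecommendationInputs:
    object_weights: Dict[str, float] = field(default_factory=dict)
    secondary_params: Dict[str, SecondaryObjectParams] = field(default_factory=dict)
    min_background_objects: int = 0             # e.g. 2 would leave only position 605-3 in Figure 6
    max_background_objects: Optional[int] = None
```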

The optimisation process will be described in further detail below in Figures 7 and 8.

Figure 7 shows a high-level block diagram for an algorithm that is used to determine the perspective recommendation for the specific case of a single main object of interest.

In step 7.1, the array of video capture modules 51 to 58 (shown in Figure 2a) within the presence capture device 5 may start recording the scene to capture video content. Where the algorithm is being applied to a pre-captured VR environment, this step may be omitted. In this case, it will be understood that references to a captured image or content should be considered as references to pre-recorded images or content.

In step S7.2, the main object of interest may be selected either automatically, or by accepting user input 710 via an appropriate user interface UI.

In steps S7.3 or S7.4, compositional elements in the captured VR video content are analysed. In this example, compositional elements comprise aspects of the image other than the main object of interest, but which nevertheless contribute to the overall composition of the image. The compositional elements give context to the main object of interest. For example, compositional elements in Figure 6 may comprise the arrangement of groups of trees or the spaces between them, as opposed to the trees themselves. Since a single main object is being considered in this case, the compositional elements in an image may be obtained through image analysis. A perspective-based criterion may then be obtained for the main object of interest as a function of capture position relative to the compositional elements, effectively providing a measure of compositional quality.

The analysis of the compositional elements may be performed locally at the controller device 10 by the software application 40 (step S7.3) or at the server apparatus 12 (step S7.4). Where it is performed at the server apparatus 12, the server apparatus 12 may refer to a reference image database 720 to analyse the compositional elements of the reference images. In other words, the compositional elements may be compared with reference images stored in the reference image database 720 with known balanced and pleasing image compositions.
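A minimal sketch of how such a comparison against the reference images (steps S7.4 and S7.5) might be realised is shown below, assuming the compositional elements of an image have already been reduced to a numerical feature vector; the feature representation, the contents of database 720 and the similarity threshold are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two compositional feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_against_references(features, reference_db, threshold=0.9):
    """Return the best-matching reference composition, or None if no reference
    exceeds the similarity threshold.
    reference_db: dict mapping reference id -> feature vector."""
    best_id, best_score = None, 0.0
    for ref_id, ref_features in reference_db.items():
        score = cosine_similarity(features, ref_features)
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id if best_score >= threshold else None
```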

In step S7.5, it is decided whether there is a match between at least one reference image and the analysed compositional elements in the captured image. In this case, information on composition stored in the reference image database 720 may be used instead of the analysis from step S7.4. The purpose of this step is to obtain known good compositions at a given place or other contextually relevant scene.

In step S7.6, the distance between the main object and the presence capture device 505, 605 or the viewing position of the user 502, 602 in the VR environment may be obtained. The rotation required to view the main object may also be evaluated with respect to the current viewing angle of the presence capture device or of the user in the VR environment. This step will be described in more detail later with a more generalised algorithm given in Figure 8.

In step S7.7, the results of evaluation from step S7.6 and analysis from step S7.3 (or alternatively S7.5) are used to select a desirable perspective for the main object and therefore to determine the position and angle of the presence capture device or the user with respect to the main object. In step S7.8, the recommended distance and rotation relative to the main object may be evaluated and stored. In step S7.9, a recommendation may be presented in a user interface, for example, the display of the presence capture device. The recommendations from the recommendation system may be presented to the user via any suitable device UI. The recommendation may comprise the recommended position and direction of viewing, illustrated on the display. The recommendation may alternatively comprise displaying the reference capture images stored in the reference image database 720. In step S7.10, the user may react to the recommendation presented by the system to adjust the capture position of the presence capture device, or in the VR viewing environment to adjust the viewing position. These actions trigger an update of the recommendation system via a feedback loop 730 to repeat steps S7.6 to S7.10. If the adjusted distance and rotation information matches the recommendation, no recommended adjustment is presented to the user.
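The following sketch summarises the feedback loop of steps S7.6 to S7.10 for the single-object case. The helper functions for reading the current capture pose, evaluating the perspective-based criterion and presenting the recommendation are placeholders, as is the convergence tolerance; this is an illustrative outline rather than the disclosed implementation.

```python
def recommend_perspective_loop(get_current_pose, evaluate_criterion,
                               candidate_poses, present_recommendation,
                               tolerance=0.1):
    """Repeatedly recommend the best candidate pose until the user's adjusted
    pose matches the recommendation (steps S7.6 to S7.10 with feedback loop 730)."""
    while True:
        # S7.6: obtain the current distance and rotation relative to the main object.
        current = get_current_pose()
        # S7.7/S7.8: pick the candidate pose with the best criterion value.
        recommended = max(candidate_poses, key=evaluate_criterion)
        # If the adjusted pose already matches, no adjustment is presented.
        if pose_distance(current, recommended) < tolerance:
            return recommended
        # S7.9: present the recommendation; S7.10: the user reacts, then loop.
        present_recommendation(recommended)

def pose_distance(a, b):
    """Simple placeholder metric over (distance, rotation) pose tuples."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```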

The recommendation algorithm may run locally at the controller device 10 via the software application 40 or it may be part of a service from the server apparatus 12. Using the server apparatus 12 for running the algorithm may give more robust results because more input data may be available and more computational complexity may be allowed.

Figure 8 shows a high-level block diagram for the algorithmic steps for the case where there are multiple main and secondary objects in the captured image. This is the generalized embodiment of that given above in Figure 7 where there may be only one main object. Steps S8.1, S8.3 to S8.5 and S8.9 to S8.10 correspond generally to S7.1, S7.3 to S7.5 and S7.9 to S7.10 described above, respectively. In step S8.21, the main and secondary objects are selected, either automatically or via user input 810. Automatically selected objects may be deselected by a user so as to limit the overall number of objects and therefore the computational complexity.

It will be understood that there may be more than one main object, for example where two or more objects are considered to be of equivalent importance. It will further be understood that in this example, where there are multiple main objects, in contrast to Figure 7, a compositional element includes the secondary objects as well as any other element in the scene that may be obtained through image analysis, such as the position of the horizon. Compositional elements are considered relative to each main object, as will be described in more detail below.

It will further be understood that the classification of objects as main objects or secondary objects can be done in various ways and is not limited to those specifically presented in this disclosure. For example, where there is a combination of user input and automatic object selection, the user inputs may indicate the main objects while the remaining automatic selections may indicate secondary objects.

In step S8.22, a weighting may be defined for each of the objects of interest. This may be based, at least in part, on user input 810. The weighting may define the relative importance of the main and secondary objects. A separate weighting for each of the objects may be defined for each camera position and angle within the VR video content. In step S8.23, which may be omitted, rotation parameters for the secondary objects can be determined or input via user input 810. The rotation parameters define the amount of angular movement that is permitted when viewing a secondary object.

As mentioned above in relation to step S7.3, a perspective-dependent criterion is obtained for the main object of interest as a function of capture position relative to the compositional elements. In the case of step S8.3, such a perspective-based criterion is obtained for each object. The criterion may be a numerical value. Weightings of objects or views come into play by affecting the criteria. The evaluated criterion value of a highly weighted object may be rendered to contribute more significantly in the optimisation process. For example, when objects A and B appear as parts of the individual composition of object C, objects A and B are the compositional elements of object C. If object A has a higher weight than object B, the effect of object A can be made greater than the effect of object B in optimising the recommended perspective.

Referring back to Figure 6, as an example implementation, the highest weight is given to the performer 610, the next highest weight to the castle, a lower weight to the road and the lowest rating to the groups of trees. Weights may also be given to compositional features other than the objects themselves, for example, to the opening between the two groups of trees 620-1, 620-2.

In step S8.6, the distance between the main and secondary objects and the presence capture device or the viewing position of the user in the VR environment may be obtained. The rotation required to view the main and secondary objects may also be evaluated with respect to the current viewing angle of the presence capture device or of the user in the VR environment. Step S8.6 may include and utilise the following (a sketch of the distance and rotation calculation is given after this list):

• GPS positioning data of the main and secondary objects

• Tagging the objects with HAIP (High Accuracy Indoor Positioning) positioning devices

• Estimation of angle and distance with visual analysis

• Distance calculation based on a 3D depth map acquired by the camera system

• Triangular calculation of the position of secondary objects. In more detail, viewed at a certain distance to a main object, the information on the relative position of a secondary object to the main object may be reflected in the apparent size of the secondary object in the view. The algorithm may consider the apparent size of the secondary object at various distances from a main object and weight the secondary object based on a predetermined composition rule.

• Triangular calculation of the position of secondary objects that may be outside the view of the main object.
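
A minimal sketch of the geometric part of step S8.6 is given below, assuming object and camera positions are available in a common ground-plane coordinate frame, for example from GPS data or HAIP tags. The pinhole-style apparent-size approximation is an assumption made for the example.

import math

def distance_and_rotation(camera_xy, camera_yaw_deg, object_xy):
    """Distance to an object and the rotation needed to face it from the
    current camera (or viewer) yaw, both evaluated in the ground plane."""
    dx = object_xy[0] - camera_xy[0]
    dy = object_xy[1] - camera_xy[1]
    distance = math.hypot(dx, dy)
    bearing_deg = math.degrees(math.atan2(dy, dx))
    # Signed rotation in (-180, 180] from the current viewing angle to the object.
    rotation_deg = (bearing_deg - camera_yaw_deg + 180.0) % 360.0 - 180.0
    return distance, rotation_deg

def apparent_size_deg(real_size_m, distance_m):
    """Approximate angular size (degrees) of an object at a given distance,
    used as a proxy for how large a secondary object appears in the view."""
    return math.degrees(2.0 * math.atan2(real_size_m / 2.0, distance_m))

# Example: viewer at the origin facing along +x; a secondary object about 20 m away at 45 degrees.
d, rot = distance_and_rotation((0.0, 0.0), 0.0, (14.1, 14.1))
size = apparent_size_deg(3.0, d)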

Some of the features described above also apply to step S7.6 where appropriate.

In steps S7.6 and S8.6, the user 2 may define a set of candidate capture positions for which the analysis is carried out. The candidate capture positions may be obtained by, for example, tagging the positions with HAIP tags. Alternatively, the candidate capture positions may be obtained by placing the presence capture device at each candidate position in turn. In steps S7.6 and S8.6, when using positioning tags such as HAIP positioning tags, at least some of the objects may be defined in the algorithm as area boundaries rather than points. Alternatively, an area within which a tagged object may move may be defined.

In step S8.6, the movement of some objects may be modelled based on additional input parameters obtained, for example, from the server apparatus 12 or from a separate service. For example, it may be desirable to include a moving secondary object in the view for as long a time as possible.
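
As one hypothetical way of modelling such movement, the object's trajectory may be predicted and the time it remains inside a candidate view counted, as in the sketch below. The constant-velocity assumption and the simple horizontal field-of-view test are illustrative choices only.

import math

def time_in_view(start_xy, velocity_xy, camera_xy, camera_yaw_deg,
                 fov_deg=90.0, horizon_s=60.0, step_s=1.0):
    """Seconds, out of horizon_s, for which a moving object is predicted to remain
    within the camera's horizontal field of view, assuming constant velocity."""
    half_fov = fov_deg / 2.0
    visible = 0.0
    t = 0.0
    while t <= horizon_s:
        x = start_xy[0] + velocity_xy[0] * t
        y = start_xy[1] + velocity_xy[1] * t
        bearing = math.degrees(math.atan2(y - camera_xy[1], x - camera_xy[0]))
        offset = (bearing - camera_yaw_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= half_fov:
            visible += step_s
        t += step_s
    return visible

# A candidate perspective that keeps a moving secondary object (e.g. a ship) in view
# for longer may be scored higher.
seconds_visible = time_in_view((50.0, -30.0), (0.0, 1.0), (0.0, 0.0), 0.0)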

In step S8.6, a camera position, namely a distance and a rotation relative to each main and secondary object, is obtained.

In steps S8.7 and S8.8, the information from step S8.6 is combined with the compositional element features to provide a recommendation of the capture location, or of the distance and rotation relative to the main and secondary objects.

In more detail, at step S8.7, the individual perspective-based criteria for each object are combined and an optimised location is selected; this may be done in any meaningful way. For example, where the individual criteria are numerical values, they may simply be summed, and the perspective corresponding to the maximum sum may be selected as the optimised perspective or view. The criterion value of a highly weighted object may be made to contribute more significantly, for example by multiplying the criterion value of each object by its weight at each perspective before summing in step S8.7. However, the implementation of step S8.7 is not limited to these examples. This step may further consider how close, in terms of the calculated value, the current combination is to the best combination, that is, to the recommended perspective. This may alter the response provided at step S8.9. For example, different feedback may be provided when the current perspective is poor, as compared to when the current perspective is almost as good as the recommended one.
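
A non-authoritative sketch of one possible implementation of this combining step is given below: each object's criterion value is multiplied by its weight, the products are summed per candidate perspective, and the highest total is selected. The closeness ratio used to grade the feedback of step S8.9 is an assumption made for the example.

def recommend_perspective(per_object_criteria, weights):
    """per_object_criteria: {perspective_id: {object_name: criterion_value}}
    weights:               {object_name: weight}

    Returns the recommended perspective, its score, and each perspective's
    closeness to the best score (usable to vary the feedback in step S8.9)."""
    scores = {
        pid: sum(weights[name] * value for name, value in criteria.items())
        for pid, criteria in per_object_criteria.items()
    }
    best = max(scores, key=scores.get)
    best_score = scores[best]
    closeness = {pid: (s / best_score if best_score else 1.0) for pid, s in scores.items()}
    return best, best_score, closeness

criteria = {
    "pose_1": {"performer": 0.9, "castle": 0.2},
    "pose_2": {"performer": 0.7, "castle": 0.8},
}
best, score, closeness = recommend_perspective(criteria, {"performer": 1.0, "castle": 0.7})
# A closeness value near 1.0 for the current perspective would mean it is almost as good
# as the recommendation; a much lower value would trigger different feedback.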

The algorithm given in Figure 8 may take into consideration the arrangement of the objects outside the user's main view, i.e. the view that includes the main object of interest. The algorithm given in Figure 8 may provide a perspective where the secondary objects are appropriately arranged in the main viewing direction such that there is at least one interesting secondary view in a second direction that falls within the user's main view. According to the algorithm given in Figure 8, a capture location with a high quality main view and a high quality secondary view is preferred over one with an equally good main view but no interesting secondary view.

The intended experience may affect the optimisation carried out at step S8.7. If the intention is to provide several good views for the user to study, a single view combining all selected objects is not desirable. This intention may, for example, be used to change how the weights are applied, how the criteria are combined, or which rules are used during optimisation.

Since the processing may be simplified when there is only one main object, any VR video content image may be treated with the algorithm given in Figure 7, for example to limit computational complexity, to simplify a user interface, or for any other reason.

In steps S7.3 and S8.3, the lighting and the position of dominant light sources in the VR video content, such as the sun, may be considered. Alternatively, the lighting may be analysed separately. For example, lighting may influence the position- and rotation-based weightings in step S8.22. In general, it may not be desirable to have a very strong light source such as the sun within the most interesting part of the VR video content.
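
One hypothetical way of folding lighting into the position- and rotation-based weightings of step S8.22 is to penalise perspectives in which a dominant light source lies close to the viewing direction, as sketched below. The penalty shape and the cut-off angle are arbitrary illustrative choices.

def lighting_penalty(view_yaw_deg, light_bearing_deg, max_penalty=0.5, cutoff_deg=40.0):
    """Multiplicative weight adjustment in [1 - max_penalty, 1.0]: the closer a
    strong light source (e.g. the sun) is to the viewing direction, the lower
    the resulting weight of that perspective."""
    offset = abs((light_bearing_deg - view_yaw_deg + 180.0) % 360.0 - 180.0)
    if offset >= cutoff_deg:
        return 1.0
    return 1.0 - max_penalty * (1.0 - offset / cutoff_deg)

# Looking almost straight into the sun roughly halves the perspective's weight here.
adjusted_weight = 0.8 * lighting_penalty(view_yaw_deg=10.0, light_bearing_deg=12.0)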

Figure 9 illustrates one type of user interface (UI) aid in the 3D view that may be used to present a recommendation in steps S7.9 and S8.9. The user viewing the VR video content may be provided with first and second graphical indicators 910 and 920 inside a frame of view, or viewfinder 900, of the display 14 shown in Figure 1. The first graphical indicator 910 represents the recommended perspective, capture position, or distance and angle from the main object, obtained as a result of the algorithms discussed above. The position of the first graphical indicator 910 in the VR environment indicates the recommended capture position or recommended viewing position determined from the optimal distance from the main object. The angle of the first graphical indicator 910, indicated by the axis of the cylinder, represents the recommended viewing angle or capture angle. According to the algorithms described above, the angle indicated by the first graphical indicator 910 may be directed towards the main object of interest. The second graphical indicator 920 represents the capture position and direction of the presence capture device 5 or the viewing direction of the user 2. The position and direction may be obtained if the presence capture device 5 or the display 14, such as a head-mounted display, is equipped with sensors for position and direction, such as a GPS sensor, and/or sensors for measuring spatial orientation, such as an accelerometer.

In steps S7.10 and S8.10, where the user 2 may take action to adjust the capture position or the viewing position, the user 2 may be guided by the first graphical indicator 910 and the second graphical indicator 920 such that the user 2 may attempt to overlay the two graphical indicators on top of each other.
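
The overlay of the two graphical indicators can be thought of as driving a position error and an angle error towards zero. In the following sketch the tolerance values are invented for illustration and are not taken from the embodiments.

import math

def indicator_mismatch(current_pos, current_yaw_deg, recommended_pos, recommended_yaw_deg):
    """Positional and angular mismatch between the second indicator 920 (current pose)
    and the first indicator 910 (recommended pose)."""
    pos_error = math.dist(current_pos, recommended_pos)
    yaw_error = abs((recommended_yaw_deg - current_yaw_deg + 180.0) % 360.0 - 180.0)
    return pos_error, yaw_error

def indicators_aligned(pos_error, yaw_error, pos_tol_m=0.3, yaw_tol_deg=5.0):
    """True once the user has effectively overlaid indicator 920 on indicator 910."""
    return pos_error <= pos_tol_m and yaw_error <= yaw_tol_deg

pos_err, yaw_err = indicator_mismatch((1.0, 0.2), 80.0, (1.1, 0.1), 84.0)
aligned = indicators_aligned(pos_err, yaw_err)  # True within these illustrative tolerances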

Figure 9a depicts this situation in a 3D view. Figure 9b shows the progression of the user 2 in improving the match, or overlay, of the two graphical indicators 910 and 920 in a lateral direction, viewed from the front. Figure 9c shows similar progress, but in a parallel direction, viewed from the side.

To make it easier for the user 2 to overlay the two graphical indicators 910 and 920, different view directions, such as the three given in Figures 9a, 9b and 9c, may alternate in time according to input parameters defined by the user 2.

Figure 10 illustrates an example of the first and second graphical indicators in VR video content. Figure 10 represents what the user sees through a display such as a head-mounted display (HMD). The first graphical indicator 1010 corresponds to the recommended location, while the second graphical indicator 1020 visualizes the viewing direction. The distance between the first and second graphical indicators 1010 and 1020 depends on the user's distance to the first graphical indicator 1010 and on the user's view. The algorithm recommends the perspective indicated by the first graphical indicator 1010 such that the main object of interest 1030 can be viewed or captured with the most interesting and balanced composition. The user 2 may attempt to align the second graphical indicator 1020 with the first graphical indicator 1010 to follow the instruction.

Figures 11a to 11d illustrate the perspective recommendation method in comparison with a conventional composition recommendation method.

Figure 11a illustrates a scene photographed using a 2D camera with a conventional recommendation system applied. The user 2 has been instructed by the indicators 1150 and 1160 to place the main object 1130 roughly at a third of the horizontal and/or vertical extent of the image. This generally helps to create a pleasing composition.
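
For reference, the "thirds" positions used by this conventional recommendation are simply the intersections of the lines dividing the frame into three equal parts, as in the trivial sketch below; the frame size is an arbitrary example.

def rule_of_thirds_points(width, height):
    """The four intersection points of the rule-of-thirds grid for a frame."""
    xs = (width / 3.0, 2.0 * width / 3.0)
    ys = (height / 3.0, 2.0 * height / 3.0)
    return [(x, y) for x in xs for y in ys]

# For a 1920 x 1080 frame, the main object 1130 would be placed near one of these points.
thirds = rule_of_thirds_points(1920, 1080)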

In Figure 11b, this same recommendation system has been applied to capturing 360-degree footage with the presence capture device 5 or to viewing VR video content with the display 14. However, as the user 2 turns their head to view the most interesting part of the image 1130, the ongoing conversation between the two actors, a less interesting composition is obtained. Regardless of the indicators 1150, 1160, 1165, the resulting composition is a simple rotation of the presence capture device 5 or of the viewing direction of the user 2. It is this less interesting composition that the user is likely to view for the majority of the viewing time. In this example, a secondary object 1140, a ship moving in the background, still happens to appear at a point that would be desirable for a still image. However, since it is a moving object, it will drift out of the viewfinder 1100. The exact position of the horizon will also depend on the rotation of the user's head or of the presence capture device.

In Figure 11c, the user has been instructed, according to the algorithm described above, to change the position of the presence capture device 5 or the perspective of the user 2. Figure 11c shows the image in the frame 1100 immediately after the user 2 has followed the instructions by, for example, trying to overlay the first and second graphical indicators.

The resulting image may arguably be less interesting than Figure 11a, which shows that the current method may be less suited than the conventional method to helping with traditional photographic compositions. In this example, however, the algorithm has considered a secondary object outside the frame 1100. The user 2 may naturally turn their head toward the main object 1130, which may rotate the view of Figure 11c and result in the view of Figure 11d.

Figure 11d finally presents an illustration of the "main view" for a user viewing 360-degree footage of the VR video content. Compared to Figure 11c, the rotation creates a mismatch at the "rule-of-thirds" line on the horizontal indicator 1150 with respect to the main object of interest 1130. However, the same rotation has simultaneously brought a secondary object of interest 1170, the lighthouse that was outside the frame 1100, into view exactly at the optimal position in the user's view, indicated by the intersection of the two indicators 1150 and 1165. Therefore, the user may be offered a pleasing overall composition in the direction in which they may be expected to be viewing for the majority of the time.

As described above, embodiments may further consider additional information for moving objects. In this example, the system has maximized the time the ship spends in the frame by considering its route and timetable against the placement of the dominant objects of interest.

In addition to optimizing the main direction of view, embodiments help to provide pleasing secondary directions of view. These may be controlled with object and direction weights, for example by a user specifying what is important, and with rotation weights or ranges, for example by a user specifying what kind of user activity is expected or desired. The overall content consumption experience may thus be improved.

It will be appreciated that the above described embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present application.

Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.