Title:
SYSTEM AND METHOD FOR OBJECT DETECTION IN IMMERSIVE IMAGES
Document Type and Number:
WIPO Patent Application WO/2023/222182
Kind Code:
A9
Abstract:
An immersive technology system (100), a device and a method for detecting objects within an immersive image (104i), for determining fear of a user (108) using an immersive medium (104), and/or for reducing a fear level of a user (108) using an immersive medium (104) are provided. They make it possible to detect a fear trigger within the immersive medium (104) by detecting objects within the immersive medium (104), to determine fear of a user (108) using the immersive medium (104) by detecting a reaction of the user (108) in response to providing the immersive medium (104) to the user (108) considering the detected fear trigger, and to reduce the fear level of the user (108) by modifying the immersive medium (104), such as the objects detected within the immersive medium (104), and/or the presentation of the immersive medium (104).

Inventors:
POHL DANIEL (DE)
Application Number:
PCT/EP2022/063164
Publication Date:
February 29, 2024
Filing Date:
May 16, 2022
Assignee:
IMMERVR GMBH (DE)
International Classes:
G06V20/70; G06T3/00; G06T5/00; G06T15/20; G06V20/00; G06V20/20; G06F3/01; G06T15/00
Attorney, Agent or Firm:
VIERING, JENTSCHURA & PARTNER MBB (DE)
Claims:

1. A device for detecting objects within an immersive image, the device comprising: one or more processors configured to:

• provide an immersive image associated with at least a portion of a sphere;

• tessellate at least the portion of the sphere into a plurality of polygons such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image;

• for one or more polygons of the plurality of polygons, project at least the corresponding part of the immersive image onto the sphere and generate a perspective image representing at least the corresponding part of the immersive image; and

• for each generated perspective image, detect one or more objects within the respective perspective image using semantic image segmentation.

2. The device according to claim 1, wherein one or more polygons of the plurality of polygons are quadrilaterals.

3. The device according to claim 1 or 2, wherein each generated perspective image represents the corresponding part of the immersive image and an additional part of the immersive image surrounding the corresponding part.

4. The device according to any one of claims 1 to 3, wherein the immersive image has an Equirectangular format or is converted into an Equirectangular format; wherein the one or more processors are configured to:

• divide the immersive image into an upper edge region, a lower edge region, and a center region located between the upper edge region and the lower edge region, wherein the upper edge region of the immersive image is associated with a first sub-portion of the portion of the sphere and wherein the lower edge region of the immersive image is associated with a second sub-portion of the portion of the sphere, wherein the one or more polygons comprise each polygon associated with the first sub-portion and each polygon associated with the second sub-portion; and

• detect one or more objects within the center region of the immersive image using image segmentation.

5. The device according to any one of claims 1 to 4, wherein the one or more processors are further configured to: for each generated perspective image:

• generate a respective depth image comprising depth information regarding the one or more objects detected within the respective perspective image;

• determine a respective size of each of the one or more objects detected within the respective perspective image and/or a respective distance of each of the one or more objects from a center of the sphere.

6. An immersive technology system, comprising:

• one or more output devices configured to provide computer-simulated reality in accordance with an immersive medium to a user;

• one or more sensors configured to detect sensor data representing a reaction of the user in response to providing the immersive medium to the user via the one or more output devices;

• one or more processors configured to:

o determine, using the sensor data, whether the reaction of the user is associated with a fear reaction;

o detect one or more objects within the immersive medium;

o for each of the detected one or more objects, determine, whether the respective object is associated with at least one fear;

o in the case that it is determined that the reaction of the user is associated with the fear reaction and that the respective object is associated with at least one fear, increase a probability of the user having the at least one fear;

o determine that the user has the at least one fear in the case that the probability associated with the at least one fear is above a predefined fear threshold value.

7. The immersive technology system according to claim 6,

• wherein the immersive medium comprises an immersive image;

• wherein the one or more output devices comprise a display output device configured to display the immersive image; and

• wherein the one or more processors are configured to detect at least one object of the one or more objects within the immersive image using the device in accordance with any one of claims 1 to 5.

8. The immersive technology system according to claim 7,

• wherein the one or more sensors comprise at least one sensor configured to detect a viewing direction of the user;

• wherein the one or more processors are configured to detect the one or more objects within the immersive image in the field of view of the viewing direction of the user.

9. The immersive technology system according to any one of claims 6 to 8,

• wherein the immersive medium comprises audio data; wherein the one or more output devices comprise an audio output device configured to output an audio signal in accordance with the audio data; and wherein the one or more processors are configured to detect at least one object of the one or more objects within the audio data using audio object detection; and/or

• wherein the immersive medium comprises haptic data; wherein the one or more output devices comprise a haptic output device configured to output a haptic signal in accordance with the haptic data; and wherein the one or more processors are configured to detect at least one object of the one or more objects within the haptic data.

10. A device, comprising: one or more processors configured to:

• detect one or more objects within an immersive medium;

• for each object of the detected one or more objects:

o determine, whether the respective object is associated with at least one fear;

o in the case that it is determined that the respective object is associated with at least one fear, determine a fear level associated with the object and determine, whether a user has the at least one fear; and

o in the case that it is determined that the user has the at least one fear,

■ prevent the immersive medium from being presented to the user, or

■ provide data to one or more output devices indicating a modified presentation of the immersive medium to the user, and/or

■ modify at least the respective object within the immersive medium to reduce the fear level.

11. The device according to claim 10, wherein the immersive technology system according to any one of claims 6 to 9 is employed to determine whether the user has the at least one fear.

12. The device according to claim 10 or 11,

• wherein the immersive medium comprises an immersive image;

• wherein the one or more processors are configured to detect at least one object of the one or more objects within the immersive image using the device in accordance with any one of claims 1 to 5.

13. The device according to any one of claims 10 to 12, wherein the one or more processors are configured to provide the data to a display output device, wherein the provided data indicate the display output device:

• to reduce a field of view of the immersive image when displaying the immersive image to the user;

• to present the immersive image on a virtual user device;

• to present the immersive image as a monoscopic immersive image provided that the immersive image is a stereoscopic immersive image;

• to change, provided that the immersive image is a stereoscopic immersive image, the respective position of the immersive image associated with the left eye and the immersive image associated with the right eye by changing the interpupillary distance associated with the stereoscopic immersive image;

• to change a height position of the user within the computer-simulated reality;

• to change the lateral position of the user within the computer-simulated reality provided that the immersive image represents six degrees of freedom;

• to increase a transparency of the display output device.

14. The device according to any one of claims 10 to 13, wherein the one or more processors are configured to modify at least the at least one object within the immersive image to reduce the fear level by:

• blurring the respective object;

• changing the color of at least the respective object;

• cutting the respective object out of the immersive image and reconstructing the immersive image using image inpainting;

• replacing the respective object with another object which is not associated with a fear of the user;

• adding an artificial floor below the user.

15. A method for detecting objects within an immersive image, the method comprising:

• providing an immersive image associated with at least a portion of a sphere;

• tessellating at least the portion of the sphere into a plurality of polygons such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image;

• for one or more polygons of the plurality of polygons, projecting at least the corresponding part of the immersive image onto the sphere and generating a perspective image representing at least the corresponding part of the immersive image; and

• for each generated perspective image, detecting one or more objects within the respective perspective image using semantic image segmentation.

16. A method for determining fear of a user using an immersive technology system, the method comprising:

• providing computer-simulated reality in accordance with an immersive medium to a user using an immersive technology system;

• detecting sensor data representing a reaction of the user in response to providing the immersive medium to the user;

• determining, using the sensor data, whether the reaction of the user is associated with a fear reaction;

• detecting one or more objects within the immersive medium;

• for each of the detected one or more objects, determining, whether the respective object is associated with at least one fear;

• in the case that it is determined that the reaction of the user is associated with the fear reaction and that the respective object is associated with at least one fear, increasing a probability of the user having the at least one fear; and

• determining that the user has the at least one fear in the case that the probability associated with the at least one fear is above a predefined fear threshold value.

17. A method for reducing a fear level of a user using an immersive medium, the method comprising:

• detecting one or more objects within an immersive medium;

• for each object of the detected one or more objects:

o determining, whether the respective object is associated with at least one fear;

o in the case that it is determined that the respective object is associated with at least one fear, determining a fear level associated with the object and determining, whether a user has the at least one fear; and

o in the case that it is determined that the user has the at least one fear,

■ preventing the immersive medium from being presented to the user, or

■ providing data to one or more output devices indicating a modified presentation of the immersive medium to the user, and/or

■ modifying at least the respective object within the immersive medium to reduce the fear level of the user.

Description:

SYSTEM AND METHOD FOR OBJECT DETECTION IN IMMERSIVE IMAGES

Various embodiments generally relate to an immersive technology system, a device and a method for detecting objects within an immersive image, for determining fear of a user using an immersive medium, and/or for reducing a fear level of a user using an immersive medium. As an illustrative example, the immersive technology system, the device and the method may make it possible to detect a fear trigger within an immersive medium by detecting objects within the immersive medium, to determine fear of a user using the immersive medium considering the detected fear trigger, and to reduce the fear level of the user by modifying the immersive medium, such as the objects detected within the immersive medium.

Content for computer-simulated reality (such as augmented reality and virtual reality) is represented by immersive media to provide an immersive experience to a user using a dedicated immersive technology system including, for example, a head-mounted display. This immersive experience can feel completely real to the user, which may not only lead to joyful situations but also to uncomfortable situations in which the user feels fear (e.g., due to natural fear, phobias, or even anxiety). Therefore, it may be helpful if the immersive technology system can detect fear reactions of a user and/or reduce a user-specific fear level associated with an immersive medium. This may allow the immersive technology system to improve the overall immersive experience in a user-specific manner.

According to various aspects, an immersive technology system, a device and a method are provided which are capable of detecting fear reactions of a specific user and/or modifying an immersive medium to reduce a user-specific fear level, thereby improving the immersive experience of the user.

According to various aspects, an immersive technology system, a device and a method are provided which are capable of detecting fear triggers within immersive media. For example, the immersive technology system, the device and the method are capable of detecting fear triggers within an immersive image by employing semantic image segmentation. With the rise of computer-simulated reality in the consumer market, various different types and formats of immersive images have emerged. It has been recognized that distortions within immersive images (e.g., Tissot distortions or Tissot's indicatrices within Equirectangular immersive images) may prevent common (two-dimensional) semantic image segmentation algorithms from being usefully employed to detect objects within immersive images. Training a semantic image segmentation algorithm on immersive images, on the other hand, may require a high effort with respect to the amount of required training data, the required computational cost, the required time investment, etc. According to various aspects, the immersive technology system, the device and the method are capable of detecting objects within an immersive image using a common (two-dimensional) semantic image segmentation algorithm. They may detect the objects within an immersive image by converting the immersive image into a plurality of perspective images and by applying the semantic image segmentation algorithm to each of the perspective images.

In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:

FIG. 1A shows an immersive technology system according to various aspects;

FIG. 1B to FIG. 1E each show a respective processing flow according to various aspects;

FIG. 2A to FIG. 2C show various exemplary immersive images having different formats;

FIG. 2D shows three degrees-of-freedom and six degrees-of-freedom of a user in accordance with associated immersive media;

FIG. 2E and FIG. 2F each show distortions within a respective immersive image having an Equirectangular format;

FIG. 2G shows an immersive image having an Equirectangular format and FIG. 2H shows a segmentation image generated by image segmentation applied on the immersive image of FIG. 2G;

FIG. 3A and FIG. 3B show an exemplary tessellation of a sphere into a plurality of polygons according to various aspects;

FIG. 3C shows an immersive image divided into an upper edge region, a lower edge region, and a center region according to various aspects; FIG. 3D shows a tessellation of a portion of a sphere into a plurality of polygons according to various aspects, wherein the portion corresponds to the upper edge region and the lower edge region of FIG. 3C;

FIG. 4 shows an immersive image and a depth image generated from the immersive image;

FIG. 5 shows artifacts within a stitched immersive image;

FIG. 6A shows a flow diagram of a method for detecting objects within an immersive image according to various aspects;

FIG. 6B shows a flow diagram of a method for determining fear of a user using an immersive medium according to various aspects; and

FIG. 6C shows a flow diagram of a method for reducing a fear level of a user using an immersive medium.

The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. Various embodiments are described in connection with methods and various embodiments are described in connection with devices. However, it may be understood that embodiments described in connection with methods may similarly apply to the devices, and vice versa.

The term "circuit" may be understood as any kind of a logic implementing entity, which may be hardware, software, firmware, or any combination thereof. Thus, in an embodiment, a "circuit" may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A "circuit" may also be software being implemented or executed by a processor, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. The term "processor" may be understood as any kind of entity capable to process data and/or signals. For example, the data or signals may be handled according to at least one (i.e., one or more than one) specific function performed by the processor. A processor may include or may be an analog circuit, a digital circuit, a mixed signal circuit, a logic circuit, a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a programmable gate array (FPGA), an integrated circuit, or any combination thereof. Any other method of implementing the respective functions, described in more detail below, may also be understood to include a processor or logic circuit. It is understood that one or more of the method steps described in detail herein may be carried out (e.g., implemented) by a processor, through one or more specific functions performed by the processor. The processor may therefore be arranged to carry out any of the information processing methods or components thereof described herein.

Computer-simulated reality provides a highly immersive experience to users which may, on the one hand, lead to joyful situations but, on the other hand, also to uncomfortable situations in which a user feels fear due to natural fear, a phobia, or even an anxiety. Various aspects relate to an immersive technology system, a device and a method for detecting fear(s) of a specific user and for modifying immersive media user-specifically to reduce the fear developed by the specific user. Thereby, the immersive technology system, the device and the method are capable of improving the computer-simulated immersive experience of the user in a user-specific manner.

FIG. 1A shows an immersive technology system 100 according to various aspects. The immersive technology system 100 may include one or more processors 102. The immersive technology system 100 may include one or more output devices 106. Each output device of the one or more output devices 106 may be configured to provide computer-simulated reality to a user 108. The one or more output devices 106 may be configured to provide computer-simulated reality in accordance with an immersive medium 104 to the user 108. Each of the one or more output devices 106 may be configured to provide at least a part of the immersive medium 104 to the user 108. For example, the immersive medium 104 may include an immersive image and the one or more output devices 106 may include a display output device configured to display the immersive image. The immersive medium 104 may include an immersive video. An immersive video may include a plurality of images within a predefined time period (e.g., 30 frames per second, 60 frames per second, 120 frames per second, etc.). It is noted that any processing of an immersive image described herein may be correspondingly carried out for an image of an immersive video. In the case that the immersive medium 104 includes an immersive video, the processing described herein may be carried out for images of the immersive video at predefined time intervals and/or after a predefined number of images (e.g., every tenth image of the immersive video to name an example) and/or each image associated with a key frame of the immersive video. An immersive image, as described herein, may be any kind of image that allows to display, via a dedicated device, computer-simulated reality content in accordance with the image. Hence, an immersive image may show content which allows to provide computer-simulated reality. The display output device may be, for example, a head-mounted display. For example, the immersive medium 104 may include audio data and the one or more output devices 106 may include at least one audio output device configured to output an audio signal in accordance with the audio data. The audio data may represent speech, voice, etc. The audio output device may be, for example, a speaker or a headphone. For example, the immersive medium 104 may include haptic data and the one or more output devices 106 may include at least one haptic output device configured to output a haptic signal in accordance with the haptic data. The haptic data may represent information when and how the haptic signal is to be applied. The haptic signal may be a force, (e.g., focused) ultrasound, a vibration, and/or a motion, etc. A haptic output device may be, for example, a hand controller, a vibrator included in a head-mounted display, a haptics vest, a full body haptics suit, a mouth haptics device (e.g., for lips and/or teeth), etc. It is noted that the above haptic signals only serve as examples and that any other haptics trigger may be employed using a suitable haptic output device. The same applies to the audio output device and the display output device. As an example, an immersive medium may include a disco photo as immersive image, audio data representing disco music, and haptic data representing a beat (e.g., a bass beat) of the disco music. The haptic data may be stored in a memory as an individual file or may be included as meta data of another file.

The herein-described processes carried out by the one or more processors 102 may be carried out by one or more processors of another device separate from the immersive technology system 100. As an example, the object detection described herein may be carried out on a cloud server. However, for illustration, the processes are described herein as being carried out by the one or more processors 102 of the immersive technology system 100.

According to various aspects, the immersive technology system 100 may include one or more sensors 110. The one or more sensors 110 may be configured to detect sensor data representing a reaction of the user 108 in response to providing the immersive medium 104 to the user 108 via the one or more output devices 106. The one or more sensors 110 may include at least one sensor configured to detect a viewing direction of the user 108, such as an orientation sensor configured to detect a head orientation of the user 108 and/or an eye tracking sensor configured to detect eye tracking data representing an eye viewing direction of the user 108. The one or more sensors 110 may include at least one sweat sensor (e.g., a sweat biosensor) configured to detect a sweat production of the user 108. The one or more sensors 110 may include at least one pulse sensor (e.g., a pulse sensor included in a smart watch) configured to detect a pulse of the user 108. The one or more sensors 110 may include at least one face tracking camera configured to detect at least a lower part of the face of the user 108. The one or more sensors 110 may include at least one body tracking sensor configured to detect a motion of the user 108. A body tracking sensor may be a motion sensor within the equipment the user 108 is equipped with. A body tracking sensor may be a camera pointed at the user 108 to detect a motion of the user 108. A body tracking sensor may be a camera located within the equipment the user 108 is equipped with to detect a motion of the user 108 based on the motion of the surroundings. A body tracking sensor may be a position sensor (e.g., within shoes of the user 108). The one or more sensors 110 may include at least one microphone configured to capture audio generated by the user 108. The one or more sensors 110 may include a brain-computer interface configured to detect a reaction of the user 108 (e.g., fear) within the brain of the user 108.

According to various aspects, the one or more processors 102 may be configured to process the immersive medium 104 and to provide instructions representing the immersive medium 104 to the one or more output devices 106. The instructions may include information when and how to display the immersive image, to apply the audio signal, and/or to apply the haptic signal.

Computer-simulated reality (CR) may be related to any kind of immersive environment. The immersive environment may take place in the physical world with, optionally, information (e.g., objects) added virtually (e.g., the computer-simulated reality may be an augmented reality (AR)). The immersive environment may take place in a virtual world (e.g., the computer-simulated reality may be a virtual reality (VR)). It is understood that the virtual world may show a simulation of real-world content. The immersive environment may take place in both, the physical world and the virtual world (e.g., the computer-simulated reality may be a mixed reality (MR)). The immersive environment may be a combination of AR, VR, and MR (e.g., the computer-simulated reality may be an extended reality (XR)). Thus, the immersive medium 104 may be associated with AR, VR, MR, and/or XR.

The immersive medium 104 may represent a specific content of the computer-simulated reality. An immersive image of the immersive medium 104 may be an image (e.g., photo) taken in the real world, an image rendered by a computer, or an image taken in the real world into which computer-rendered features are added. The immersive image may represent a specific area within the computer-simulated reality. This area may be defined by a number of degrees the immersive image fills the computer-simulated reality with content. Illustratively, the area may define how many degrees the user 108 can move his angle of view while still seeing computer-simulated reality content.

In an example, a half-sphere may be filled with content. The phrase “filled with content”, as used herein, may describe that a pixel having a pixel value (e.g., different from black) may be present. In this case, the immersive image may represent 180 degrees of content. Illustratively, the user 108 can move his head 90 degrees in both directions, left and right, from a center point and still see computer-simulated reality content. In this case, the immersive image may have a half-spherical format. A half-spherical format may be advantageous over a spherical format in that it is easier to capture the immersive image with a camera without the camera (and/or a camera crew) being seen in the immersive image. For example, a stereoscopic, half-spherical immersive image can be easily created using a camera with two lenses. However, in the case of a half-spherical immersive image, the immersive experience may be lowered by seeing the black area in the other region of the sphere in the case that the user 108 moves his/her head.

In another example, a full-sphere may be filled with content. In this case, the immersive image may represent 360 degrees of content. Illustratively, the user 108 can move his head anywhere and still see computer-simulated reality content. In this case, the immersive image may have a (full-)spherical format. A spherical format may be advantageous over a half-spherical format in that the whole 360 degrees around the user are filled with content improving the immersive experience. However, it may be difficult to capture a 360 degrees immersive image without seeing the camera in the image. The 360 degrees immersive image may be created using stitching which, however, may lead to artifacts and, thus, lower the immersive experience.

The above examples serve as illustration. The content of the computer-simulated reality, as represented by the format of the immersive image, may be associated with any number of degrees (filled with content). According to various aspects, the format of the immersive image may be associated with a content of 70 degrees or greater. For example, the immersive image may have a format of 130 degrees (CR130, such as VR130), 140 degrees (CR140, such as VR140), 270 degrees (CR270, such as VR270), etc.

An immersive image may have an Equirectangular format, a Fisheye format or a Cubemap format. The immersive image may also have another format. According to various aspects, the immersive image may have any kind of format that can be converted into an Equirectangular format (e.g., any kind of map projection). In this case, the one or more processors, described herein, may be configured to convert the immersive image into the Equirectangular format prior to processing the immersive image as described herein. The Equirectangular format is characterized by a map projection in which meridians are mapped to vertical straight lines of constant spacing (e.g., for meridional intervals of constant spacing) and circles of latitude are mapped to horizontal straight lines of constant spacing (e.g., for constant intervals of parallels). The projection is neither equal area nor conformal. Hence, spherical or half-spherical content may be mapped to a rectangle. The Fisheye format is characterized by a circular representation. The circle in the center of the image may be filled with content whereas the region outside the circle may be black. A Fisheye format may be, for example, captured using a camera having wide-angle lens(es). The Cubemap format is characterized by six images representing six faces of a cube. “Folding” the six images may lead to the cube and a spectator located inside the cube can see the six images providing spherical content of 360 degrees. An immersive image may be a stereoscopic image or a monoscopic image. In the case of a monoscopic image, the immersive image may include the same content for the left eye and the right eye of the user 108. In the case of a stereoscopic image, the immersive image may include different content for the left eye and the right eye of the user (e.g., a slightly different perspective).
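To make the mapping concrete, the following minimal sketch (an illustration, not part of the original disclosure) converts between pixel positions of an Equirectangular immersive image and unit directions on the sphere; the axis convention (y up, z forward) is an assumption chosen for this example.

```python
import numpy as np

def equirect_pixel_to_direction(w_idx, h_idx, width, height):
    """Map an Equirectangular pixel position to a unit direction on the sphere.

    Longitude (yaw) is spread linearly over the image width and latitude
    (pitch) over the image height, mirroring the constant-spacing property of
    the Equirectangular projection described above.
    """
    lon = (w_idx / width) * 2.0 * np.pi - np.pi        # -pi .. +pi
    lat = np.pi / 2.0 - (h_idx / height) * np.pi       # +pi/2 (top) .. -pi/2 (bottom)
    return np.array([np.cos(lat) * np.sin(lon),        # x (right)
                     np.sin(lat),                      # y (up)
                     np.cos(lat) * np.cos(lon)])       # z (forward)

def direction_to_equirect_pixel(direction, width, height):
    """Inverse mapping: unit direction on the sphere to pixel coordinates."""
    x, y, z = direction / np.linalg.norm(direction)
    lon = np.arctan2(x, z)
    lat = np.arcsin(y)
    return (lon + np.pi) / (2.0 * np.pi) * width, (np.pi / 2.0 - lat) / np.pi * height
```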

FIG. 2A shows an exemplary immersive image 200A having an Equirectangular format. An immersive image, as described herein, may be characterized by a width, W, in a width direction, w, and a height, H, in a height direction, h. FIG. 2B shows an exemplary immersive image 200B having a Fisheye format. FIG. 2C shows an exemplary immersive image 200C having a Cubemap format.

The immersive medium 104 may be, for example, associated with three degrees-of- freedom or six degrees-of-freedom.

An exemplary illustration is shown in FIG. 2D. In the case of three degrees-of-freedom 202, the user 108 can pitch, yaw, and roll his/her head (i.e., three rotational movements of the head). In the case of six degrees-of-freedom 204, the user 108 can, in addition to the three rotational movements of the head, move translationally forward/backward, left/right, and up/down. An example of a six degrees-of-freedom 204 immersive image is an immersive image having a Lightfield format. This may allow the user 108 to freely position and rotate inside the immersive world. It is noted that the Lightfield format is an example of a format associated with six degrees-of-freedom 204 and that aspects described herein exemplarily for an immersive image having the Lightfield format apply accordingly to any other format capable of representing six degrees-of-freedom 204.

An immersive image may be associated with distortions. FIG. 2E shows distortions 210 (as objects with dashed pattern) within the immersive image 200A, exemplarily for immersive images having the Equirectangular format. These distortions are also called Tissot distortions or Tissot’s indicatrix (or indicatrices). Thus, the Equirectangular projection illustrated in FIG. 2E is represented by the Tissot indicatrix. As shown, a lower edge portion and an upper edge portion of immersive images having the Equirectangular format are highly distorted, in particular stretched. FIG. 2F shows another exemplary immersive image 200F having the Equirectangular format. In FIG. 2F, a portion 206 of the lower edge region of the immersive image 200F is zoomed out, illustrating that a person located in the lower edge region is shown in a stretched manner. If a common (two-dimensional) semantic image segmentation algorithm is employed to detect objects within the immersive image 200F, it may not be possible to detect the person within the lower edge region due to the distortions (since the semantic image segmentation algorithm may be trained on two-dimensional non-distorted images). This may apply to all regions associated with distortions. Generally, a (e.g., flat) two-dimensional image (in some aspects referred to as perspective image) may not be used to provide computer-simulated reality.

As shown in FIG. 2G, the immersive image 200A may include a plurality of pixels p(h, w) arranged adjacent to each other. The plurality of pixels p(h, w) may define the width, W, of the immersive image 200A and the height, H, of the immersive image 200A. Each of the plurality of pixels p(h, w) may be associated with a position within the immersive image 200A different from the position of the other pixels. The position of each pixel may be associated with a position in height direction, h, and a position in width direction, w. FIG. 2H shows a segmentation image 200H generated from the immersive image 200A using a two-dimensional semantic image segmentation algorithm. The segmentation image 200H may include a plurality of labels l(h, w). Each label of the plurality of labels l(h, w) may be bijectively assigned to a corresponding pixel of the plurality of pixels p(h, w) of the immersive image 200A. A label may represent a class associated with an object represented within the immersive image 200A. However, wrong labels may be assigned to pixels which are associated with distortions within the immersive image 200A. Hence, a two-dimensional semantic image segmentation algorithm may fail to correctly label an immersive image.
The same applies to immersive images having a Fisheye format or a Cubemap format. In the case of an immersive image having the Fisheye format, the distortions increase closer to the edge of the circular image. In the case of an immersive image having the Cubemap format, the border(s) between two or more neighboring faces of the cube induces distortions.

A two-dimensional (e.g., semantic) image segmentation algorithm, as used herein, may be or may employ any (e.g., common) two-dimensional image segmentation algorithm, such as an image segmentation convolutional neural network (CNN), the Mask R-CNN (e.g., as described in He et al.: “Mask R-CNN”, arXiv:1703.06870v3, 2018), the DeepLabV3 algorithm (e.g., as described in Chen et al.: “Rethinking Atrous Convolution for Semantic Image Segmentation”, arXiv:1706.05587v3, 2017), Graphonomy (e.g., as described in Gong et al.: “Graphonomy: Universal Human Parsing via Graph Transfer Learning”, arXiv:1904.04536v1, 2019), or any other algorithm capable of carrying out image segmentation (e.g., semantic image segmentation) of an image.
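As an illustration of how one of the segmentation algorithms listed above could be applied to a single perspective image, the following sketch uses the pretrained DeepLabV3 model shipped with torchvision; the model choice and the file name are assumptions made for illustration, and the disclosure does not prescribe any particular implementation.

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Hypothetical perspective image generated as described herein.
perspective = Image.open("perspective_0.png").convert("RGB")

# Pretrained two-dimensional semantic segmentation network (DeepLabV3 is used
# purely as an example; newer torchvision versions use the `weights=` argument
# instead of `pretrained=True`).
model = torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    logits = model(preprocess(perspective).unsqueeze(0))["out"]  # [1, C, H, W]
labels = logits.argmax(dim=1).squeeze(0)  # per-pixel class labels l(h, w)
```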

Further training a two-dimensional (e.g., semantic) image segmentation algorithm using immersive images is computationally costly since the shape of an object (e.g., a human, a face, a car, etc.) varies depending on the position of the object within the immersive image due to the distortions described above.

According to various aspects, the one or more processors 102 may be capable of detecting objects within an immersive image using any two-dimensional image processing algorithm, such as any (e.g., semantic) image segmentation algorithm, as described in the following with reference to the process flows shown in FIG. 1B:

The immersive medium 104 may include an immersive image 104i. As described herein, an immersive image, such as the immersive image 104i, may represent a predefined number of degrees of a sphere which are filled with content. Hence, the immersive image 104i may be associated with at least a portion of the sphere (e.g., with a portion of the sphere, such as a half sphere, or with the full sphere). The one or more processors 102 may be configured to process the immersive image 104i.

The one or more processors 102 may be configured to tessellate at least the portion of the sphere into a plurality of polygons 112. A tessellation, as described herein, is understood as fully dividing the associated portion of the sphere or the full sphere into disjoint (e.g., non-overlapping) polygons. The plurality of polygons 112 may include different types of polygons or the same type of polygons. According to various aspects, the polygons of the plurality of polygons 112 may be the same type of polygon, i.e., having the same number of sides and edges. However, dimensions of one or more polygons of the plurality of polygons 112 may be different from the other polygons. For example, some polygons may be larger or smaller. For example, some polygons may have a different shape (e.g., different angles between their sides). It is understood that the polygons of the plurality of polygons 112 may also have the same dimensions. The one or more processors 102 may be configured to tessellate at least the portion of the sphere into the plurality of polygons 112 such that each polygon of the plurality of polygons 112 corresponds to a respective part of the immersive image 104i .

FIG. 3A shows an exemplarily tessellated full-sphere 300 according to various aspects. The tessellated full-sphere 300 may be tessellated into the plurality of polygons 112. The plurality of polygons 112 may include a top polygon 302t, a bottom polygon 302b, and side-on polygons 302s. According to various aspects, each side-on polygon 302s (i.e., each polygon of the plurality of polygons 112 except for the top polygon 302t and the bottom polygon 302b) may be of the same type of polygon. According to various aspects, each side-on polygon 302s may be a quadrilateral (i.e., having four sides and four corners). Illustratively, the full-sphere 300 may be tessellated into a polyhedron with quadrilateral faces. It is noted that, for the purpose of processing, the one or more processors 102 may be configured to tessellate the sphere (e.g., the portion of the sphere or the full sphere) into triangles, wherein two (or more) neighboring triangles form the respective polygon, such as the quadrilateral. According to various aspects, each quadrilateral described herein may be formed by two triangles. A quadrilateral may be, for example, a rectangle, a trapezoid, a parallelogram, a rhombus, etc.

In the case of tessellating the full sphere, the plurality of polygons may include a number of polygons in the range from about 12 to about 50 (e.g., in the range from about 30 to about 50).
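A minimal sketch of one possible tessellation, assuming a regular latitude/longitude grid of side-on quadrilaterals between a top and a bottom cap; the concrete ring/column counts and the cap size are illustrative assumptions that merely fall within the ranges mentioned above.

```python
import numpy as np

def tessellate_sphere(n_rings=4, n_cols=8, cap_deg=25.0):
    """Tessellate the full sphere into a top polygon, a bottom polygon and
    side-on quadrilaterals, each described by its latitude/longitude bounds
    (which correspond to a part of an Equirectangular immersive image).

    With the default (assumed) parameters this yields 4 * 8 + 2 = 34 polygons,
    within the range of about 12 to about 50 mentioned above.
    """
    polygons = []
    cap = np.deg2rad(cap_deg)
    # Top and bottom polygons (the polar caps).
    polygons.append({"lat": (np.pi / 2 - cap, np.pi / 2), "lon": (-np.pi, np.pi)})
    polygons.append({"lat": (-np.pi / 2, -np.pi / 2 + cap), "lon": (-np.pi, np.pi)})
    # Side-on quadrilaterals between the caps.
    lat_edges = np.linspace(-np.pi / 2 + cap, np.pi / 2 - cap, n_rings + 1)
    lon_edges = np.linspace(-np.pi, np.pi, n_cols + 1)
    for i in range(n_rings):
        for j in range(n_cols):
            polygons.append({"lat": (lat_edges[i], lat_edges[i + 1]),
                             "lon": (lon_edges[j], lon_edges[j + 1])})
    return polygons
```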

With reference to FIG. 1B, the one or more processors 102 may be configured to, for one or more polygons (e.g., for each polygon) of the plurality of polygons 112, project at least the corresponding part of the immersive image 104i onto the sphere and generate a respective perspective image 114 representing at least the corresponding part of the immersive image 104i. A perspective image, as used herein, may refer to a two-dimensional image. Illustratively, the immersive image 104i may represent a portion of the sphere or the full sphere and the one or more processors 102 may tessellate this portion or full sphere into the plurality of polygons 112 such that each polygon represents a corresponding part of the immersive image 104i; then the one or more processors 102 may generate, for one or more of the polygons (e.g., for each polygon), a respective perspective image which shows at least this corresponding part of the immersive image 104i.

According to some aspects, each generated perspective image 114 may represent (e.g., show) only the corresponding part of the immersive image 104i . Hence, in this case, there may be no overlap between the content of images generated for neighboring polygons. According to other aspects, each generated perspective image 114 may represent (e.g., show) the corresponding part of the immersive image 104i and an additional part of the immersive image 104i surrounding the corresponding part. In this case, there may be an overlap between the content of perspective images generated for neighboring polygons. This may improve the subsequent object detection since objects may be shown completely whereas in the case of no overlap, objects which are present at border(s) between two or more neighboring polygons, may be shown in the perspective images only in part, thereby, reducing the probability of a correct classification via (semantic) image segmentation. Hence, in the case that each generated perspective image 114 represents (e.g., shows) the corresponding part and the additional part of the immersive image 104i, the accuracy of the subsequent (semantic) image segmentation is improved. In the case that the generated perspective image 114 represents the corresponding part of the immersive image 104i, the polygon may be a rectangle such that the perspective image 114 is a rectangular perspective image. In the case that the generated perspective image 114 represents the corresponding part and the additional part of the immersive image 104i, the area covered by the corresponding part and the additional part may be rectangular such that the perspective image 114 is a rectangular perspective image.

A ratio between an area of the corresponding part of the immersive image 104i and an area of the additional part (surrounding the corresponding part) of the immersive image 104i may be in the range from about 1.4 to about 3.1. Hence, each side of the respective (e.g., rectangular) perspective image may be greater than the side of the corresponding polygon (e.g., the corresponding quadrilateral) by about 15 % to about 30 %. As an example, the one or more processors may be configured to generate a frame, for example a rectangular frame, surrounding the (e.g., rectangular) polygon. The edges of the frame may extend from the edge of the polygon by about 15 % to about 30 % of the length of the respective edge of the polygon. The perspective image 114 may represent the area covered by the polygon and the frame.

For example, the polygon may be a rectangle covering an area of 800 pixels times 400 pixels of the immersive image 104i. In the case that the edges of the frame extend from the edge of the rectangle by 15% (e.g., by 7.5 % in each dimension of each edge), the frame covers an area of 920 pixels times 460 pixels minus the area covered by the rectangle, resulting in 103200 additional pixels. Hence, the ratio between the area covered by the rectangle and the area covered by the frame may be:

800 * 400 / ((920 * 460) - (800 * 400)) ≈ 3.10

In the case that the edges of the frame extend from the edge of the rectangle by 30% (e.g., by 15 % in each dimension of each edge), the frame covers an area of 1040 pixels times 520 pixels minus the area covered by the rectangle resulting in 220800 additional pixels. Hence, the ratio between the area covered by the rectangle and the area covered by the frame may be:

800 * 400 / ((1040 * 520) - (800 * 400)) ≈ 1.45
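The two worked examples can be reproduced with a small helper; the function name is hypothetical.

```python
def overlap_ratio(poly_w, poly_h, extend_pct):
    """Ratio between the area covered by the polygon and the area of the
    surrounding frame when each edge is extended by `extend_pct` percent."""
    frame_area = (poly_w * (1 + extend_pct / 100.0)) * (poly_h * (1 + extend_pct / 100.0))
    poly_area = poly_w * poly_h
    return poly_area / (frame_area - poly_area)

print(round(overlap_ratio(800, 400, 15), 2))  # 3.1, cf. the first example above
print(round(overlap_ratio(800, 400, 30), 2))  # 1.45, cf. the second example above
```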

FIG. 3B shows the exemplarily tessellated full-sphere 300. As shown on the left, a first perspective image 1 may be generated which shows, for a first side-on polygon 302s(1), the corresponding part and the additional part of the immersive image 104i. As shown on the right, a second perspective image 2 may be generated which shows, for a second side-on polygon 302s(2), the corresponding part and the additional part of the immersive image 104i. Illustratively, there may be a part of the immersive image 104i which may be represented (e.g., shown) by the first perspective image 1 and the second perspective image 2. A respective perspective image may be generated for each polygon of the plurality of polygons 112. Illustratively, the sphere may be "peeled like an onion". According to various aspects, a respective quadrilateral perspective image may be generated for the top polygon and the bottom polygon. These quadrilateral perspective images may have some overlap with the perspective images generated for each of the neighboring side-on polygons.

Illustratively, the one or more processors 102 may be configured to generate one or more perspective images 114 (e.g., a plurality of perspective images) which represent a part of the immersive image 104i or the complete immersive image 104i two-dimensionally.

As described herein, an immersive image, such as the immersive image 104i, may have an Equirectangular format, a Fisheye format, or a Cubemap format. According to some aspects, the one or more processors 102 may be configured to generate the perspective images 114 directly from the immersive image 104i having the respective format by projecting the immersive image 104i onto the sphere and generating the perspective image accordingly. According to other aspects, the one or more processors 102 may be configured to convert the immersive image 104i to another format prior to generating the perspective images 114. For example, the immersive image 104i may have a Fisheye format and the one or more processors 102 may be configured to convert the Fisheye format into an Equirectangular format prior to projecting at least the corresponding part of the immersive image associated with a respective polygon onto the sphere and generating the respective perspective image representing at least the corresponding part of the immersive image.

As shown in FIG. 2E, the lower edge region and the upper edge region of an immersive image having the Equirectangular format may be highly distorted (e.g., stretched) as compared to a center portion of the immersive image (see distortions 210 in FIG. 2E). According to various aspects, in the case that the immersive image 104i has the Equirectangular format (or is converted into the Equirectangular format), the one or more processors 102 may be configured to divide the immersive image 104i into an upper edge region, a lower edge region, and a center region and may generate the perspective images 114 for the (e.g., distorted) upper edge region and lower edge region. As exemplarily shown in FIG. 3C, the one or more processors 102 may (disjointly) divide the immersive image 200F into the upper edge region 312, the lower edge region 314, and the center region 316 located between the upper edge region 312 and the lower edge region 314. The respective size of the upper edge region 312, the lower edge region 314, and the center region 316 may be predefined (e.g., as a respective percentage in the height direction or in relation to each other). The upper edge region 312 of the immersive image 200F may be associated with an upper region of the sphere and the lower edge region 314 may be associated with a lower region of the sphere. As shown in FIG. 3D, the one or more processors 102 may be configured to tessellate an upper region 322 of the sphere which is associated with the upper edge region 312 and a lower region 324 of the sphere which is associated with the lower edge region 314 into the plurality of polygons. The one or more processors 102 may be configured to generate the perspective images 114 for the plurality of polygons as described herein and may apply a selected two-dimensional image processing algorithm (e.g., a two-dimensional semantic image segmentation algorithm 116) to each perspective image. According to various aspects, the selected two-dimensional image processing algorithm (e.g., the two-dimensional semantic image segmentation algorithm 116) may be applied directly to the center region 316 of the immersive image. Since the center region includes a lower amount of distortions 210, the selected two-dimensional image processing algorithm may be capable of processing this part of the immersive image directly without generating one or more perspective images of this part first. This may reduce the processing cost, such as the processing cost for detecting the one or more objects within the immersive image 104i.
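A minimal sketch of this region split, assuming (purely for illustration) that the predefined region sizes are given as a fraction of the image height; the edge regions would then be processed via perspective images, while the center region can be passed to the two-dimensional segmentation algorithm directly.

```python
import numpy as np

def split_equirect_regions(equirect, edge_fraction=0.25):
    """Split an Equirectangular immersive image into an upper edge region,
    a center region and a lower edge region along the height direction.

    `edge_fraction` (the share of the image height assigned to each edge
    region) is an assumed value; the text only states that the sizes are
    predefined.
    """
    height = equirect.shape[0]
    top = int(height * edge_fraction)
    bottom = int(height * (1.0 - edge_fraction))
    return equirect[:top], equirect[top:bottom], equirect[bottom:]
```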

With reference to FIG. 1B, the one or more processors 102 may be configured to implement the two-dimensional semantic image segmentation algorithm 116. The two-dimensional semantic image segmentation algorithm 116 may be configured to (e.g., trained to) detect (e.g., classify) objects within an image. The one or more processors 102 may be configured to detect, for each generated perspective image 114, one or more objects 118 within the respective perspective image using two-dimensional semantic image segmentation (e.g., using the two-dimensional semantic image segmentation algorithm 116).

According to various aspects, the one or more processors 102 may be configured to generate a segmentation image of the immersive image 104i by combining the objects detected for all perspective images 114.
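One possible way to combine the per-perspective results into a single segmentation image is sketched below; the caller-supplied helper rays_for_view is assumed to reproduce the ray directions used when the perspective images were generated, and overlapping regions are simply overwritten for brevity (a confidence-based merge would also be possible).

```python
import numpy as np

def merge_labels_into_equirect(label_maps, views, equirect_shape, rays_for_view):
    """Fuse per-perspective label images l(h, w) into one segmentation image
    of the Equirectangular immersive image.

    `views` holds the (yaw, pitch, fov_deg) used for each perspective image;
    `rays_for_view` is an assumed helper returning the per-pixel ray
    directions for such a view.
    """
    h_src, w_src = equirect_shape
    segmentation = np.full((h_src, w_src), -1, dtype=np.int64)  # -1 = unlabeled
    for labels, (yaw, pitch, fov_deg) in zip(label_maps, views):
        rays = rays_for_view(yaw, pitch, fov_deg, labels.shape[0])
        lon = np.arctan2(rays[..., 0], rays[..., 2])
        lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
        w_idx = ((lon + np.pi) / (2 * np.pi) * w_src).astype(int) % w_src
        h_idx = ((np.pi / 2 - lat) / np.pi * h_src).astype(int).clip(0, h_src - 1)
        segmentation[h_idx, w_idx] = labels
    return segmentation
```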

It is understood that the two-dimensional semantic image segmentation is an example of processing a perspective image. The one or more processors 102 may be configured to implement any (two-dimensional image processing) algorithm configured to process perspective images. Hence, the one or more processors 102 may be configured to process an immersive image to generate perspective images as described above and may then apply the algorithm to each perspective image. This makes it possible to process any immersive image using a (e.g., common) two-dimensional image processing algorithm.

An example is shown in FIG. 1C: The one or more processors 102 may be configured to implement an algorithm 120 configured to generate a respective depth image 122 of each perspective image 114. A depth image, as used herein, may include depth information of the processed perspective image. The one or more processors 102 may be configured to combine the generated depth images into a depth map. Hence, the depth map may represent depth information of the immersive image 104i. FIG. 4 shows an exemplary immersive image 400 and a graphical representation 402 of a depth map generated for the immersive image 400 as described above.
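Analogously, the per-perspective depth images may be fused into a depth map of the immersive image. In the following sketch, estimate_depth stands in for any monocular depth estimation algorithm (none is prescribed by the text) and rays_to_equirect_indices is an assumed helper performing the same pixel-to-direction bookkeeping as in the label merge above.

```python
import numpy as np

def build_depth_map(perspective_images, views, equirect_shape,
                    estimate_depth, rays_to_equirect_indices):
    """Generate a depth image 122 per perspective image 114 and fuse the
    results into a single depth map of the immersive image."""
    h_src, w_src = equirect_shape
    depth_map = np.zeros((h_src, w_src), dtype=np.float32)
    for image, view in zip(perspective_images, views):
        depth = estimate_depth(image)                       # per-view depth image
        h_idx, w_idx = rays_to_equirect_indices(view, depth.shape, equirect_shape)
        depth_map[h_idx, w_idx] = depth
    return depth_map
```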

According to various aspects, the one or more processors 102 may be configured to apply more than one two-dimensional image processing algorithm on the generated perspective images 114 (and optionally the center region of the immersive image in the case that the immersive image is divided as described above). For example, the one or more processors 102 may be configured to detect one or more objects in each perspective image 114 using the two-dimensional semantic image segmentation algorithm 116 and to generate a respective depth image 122 for each perspective image 114 using the algorithm 120. The one or more processors 102 may be configured to determine a respective size of each of the one or more objects detected within the respective perspective image using the generated depth image 122. The one or more processors 102 may be configured to determine a respective distance of each of the one or more objects from a center of the sphere using the generated depth image 122. The size of the one or more objects and/or their distance from the center of the sphere may be determined using the distance between the camera lenses of the camera used for capturing the immersive image 104i . The distance between the camera lenses may correspond to the interpupillary distance (IPD) of humans.

Illustratively, using the distance between the camera lenses may make it possible to map the depth values of the depth image to units like centimeters or meters.
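For a stereoscopic immersive image this mapping can follow the standard stereo relation depth = focal length × baseline / disparity, with the distance between the camera lenses as baseline; the concrete baseline value below is an assumed, typical interpupillary distance, not a value taken from the disclosure.

```python
def metric_depth_from_disparity(disparity_px, focal_px, baseline_m=0.063):
    """Convert stereo disparity (in pixels) into metric depth (in meters).

    `baseline_m` is the distance between the two camera lenses; 0.063 m is an
    assumed value in the range of typical human interpupillary distances.
    """
    return focal_px * baseline_m / disparity_px
```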

According to various aspects, the immersive image 104i may be a stereoscopic immersive image. In this case, the one or more processors 102 may be configured to carry out the above described image processing either for one of the two images (i.e., either the image associated with the left eye or the image associated with the right eye of the user 108) or for both images of the stereoscopic immersive image.

According to various aspects, as an alternative to the above described tessellation, the one or more processors 102 may be configured to generate the perspective images from the immersive image 104i in a different manner. According to this alternative, the one or more processors 102 may be configured to render the immersive image 104i . As known to the skilled person, rendering (also called image synthesis) is the process of generating an image considering geometry, viewpoint, texture, lighting, and shading information describing the (e.g., virtual) scene. In the case of an immersive image, rendering the immersive image may include projecting the immersive image on a sphere. Images (e.g., immersive images) may be rendered, for example, using rasterization or ray-tracing. The one or more processors 102 may be configured to, for each virtual camera position of one or more virtual camera positions, generate, for each virtual viewing direction of one or more virtual viewing directions, a respective perspective image by taking a screenshot of a corresponding part of the rendered immersive image from the respective virtual camera position in the respective virtual viewing direction. Hence, from each virtual camera position screenshots from the rendered immersive image may be taken in one or more virtual viewing directions. These screenshots may be perspective images and, as described above, a selected image processing algorithm (e.g., the two-dimensional semantic image segmentation algorithm 116) may be applied on each of the perspective images. This may be, for example, advantageous in the case that the immersive image 104i is associated with six degrees-of-freedom. These screenshots may be stereoscopic or monoscopic.
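A sketch of this rendering-based alternative, assuming hypothetical helpers render_scene and take_screenshot (neither is named in the text) and an illustrative, coarse sampling of viewing directions:

```python
def screenshot_perspectives(scene, camera_positions, viewing_directions, take_screenshot):
    """Generate perspective images by taking a screenshot of the rendered
    immersive image from each virtual camera position in each virtual viewing
    direction; `take_screenshot` is an assumed helper capturing one view."""
    return [take_screenshot(scene, position, direction)
            for position in camera_positions
            for direction in viewing_directions]

# Illustrative sampling (assumed values): a single camera position, and a grid
# of yaw/pitch viewing directions in degrees; more positions would be used for
# content associated with six degrees-of-freedom.
camera_positions = [(0.0, 0.0, 0.0)]
viewing_directions = [(yaw, pitch) for yaw in (0, 90, 180, 270) for pitch in (-45, 0, 45)]
```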

As described above, the immersive experience (also referred to as feeling of “presence”) can feel completely real to the user 108, which may lead to uncomfortable situations in which the user 108 feels fear. A “fear”, as used herein, may relate to natural fear, a phobia, or an anxiety. “Natural fear”, as used herein, may be understood as a natural emotion that protects people (hence, including the user 108) from harm when they face real and imminent danger. A “phobia”, as used herein, may be understood as an excessive fear or an anxiety related to specific objects or situations that are out of proportion to the actual danger they present. An “anxiety”, as used herein, may be understood as an intense fear that may be triggered by a stimulus that is excessive, unpredictable and unfocused. An “anxiety” may persist for a long time (e.g., several minutes or even up to hours) after the trigger (if there is a trigger) is removed. It is understood that there may be some correlation since a phobia may cause natural fear and/or anxiety. Therefore, natural fear(s), phobia(s), and anxiety (or anxieties) may be described as “fear” in general. According to various aspects, the immersive technology system 100 may be capable of determining whether the user 108 has a specific fear or not. FIG. 1D shows a processing flow for determining fear of the user 108 using the immersive medium 104 via the immersive technology system 100.

The one or more processors 102 may be configured to detect one or more objects 124 within the immersive medium 104. The objects of the one or more objects 124 may be visual objects, audio objects, and/or haptic objects.

For example, the immersive medium 104 may include the immersive image 104i and the one or more processors 102 may be configured to detect at least one object of the one or more objects 124 as a visual object within the immersive image 104i. The one or more processors 102 may be configured to detect the at least one object within the immersive image 104i using semantic image segmentation. The detection of the at least one object within the immersive image 104i may be carried out via tessellation (such that the at least one object may correspond to the one or more objects 118) and/or via taking screenshots within the rendered immersive image as described herein. The one or more sensors 110 may include the at least one sensor configured to detect a viewing direction of the user 108 (such as the orientation sensor and/or the eye tracking sensor) and the one or more processors 102 may be configured to detect the at least one object within the immersive image 104i in the field of view in viewing direction of the user 108. According to various aspects, the immersive image 104i may be a stereoscopic immersive image including a first immersive image associated with the left eye and a second immersive image associated with the right eye. In this case, the one or more processors 102 may be configured to detect the at least one object within the first immersive image and/or the second immersive image.

The immersive medium 104 may include audio data. The one or more processors 102 may be configured to detect at least one object of the one or more objects 124 within the audio data as audio object. The one or more processors 102 may be configured to detect the at least one object within the audio data using audio object detection. An example of audio object detection is described in Sukthankar et al.: “Semantic Learning for Audio Applications: A Computer Vision Approach”, IEEE Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06), 2006.

The immersive medium 104 may include haptic data. The one or more processors 102 may be configured to detect at least one object of the one or more objects 124 within the haptic data as haptic object (e.g., an applied force, a vibration, etc.).

The one or more processors 102 may be configured to determine, in 126, for each object of the detected one or more objects 124, whether the respective object is associated with at least one (sensation of) fear, i. Each fear, i ∈ I with i ≥ 1, may be associated with at least one object. Each fear, i, may be associated with more than one object and each object may be associated with more than one fear. According to various aspects, a predefined list including one or more associated objects for each fear, i ∈ I, may be stored within a memory device and provided to the one or more processors 102. The one or more processors 102 may be configured to determine for each fear, i ∈ I, within the predefined list, whether at least one of the one or more associated objects is part of the detected one or more objects 124. Illustratively, each of the detected one or more objects may be labelled as not fear-related or may be associated with one or more fears, i.
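
As a minimal sketch of the predefined-list lookup described above (assuming the detected one or more objects 124 are available as text labels; the fear-to-object mapping shown is purely illustrative and not prescribed by this disclosure):

# Illustrative predefined list: fear i -> labels of associated objects.
FEAR_OBJECTS = {
    "arachnophobia": {"spider", "spider_web"},
    "acrophobia": {"cliff", "ledge", "rooftop_edge"},
    "cynophobia": {"dog"},
}

def fears_for_objects(detected_labels):
    """Return, for each fear in the predefined list, the detected objects associated with it."""
    detected = set(detected_labels)
    return {fear: detected & objects
            for fear, objects in FEAR_OBJECTS.items()
            if detected & objects}

# Example: label each detected object as fear-related or not.
triggered = fears_for_objects(["spider", "tree", "dog"])
# -> {"arachnophobia": {"spider"}, "cynophobia": {"dog"}}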

According to various aspects, the one or more processors 102 may be configured to detect the one or more objects 124 by detecting a plurality of elements within the immersive medium (e.g., the immersive image 104i, the audio data, and/or the haptic data), by determining for each element a respective probability of each of a plurality of objects, and by associating the element with the object having the highest probability. Alternatively, all objects having a probability equal to or greater than a threshold value may be added to the one or more objects. According to various aspects, the one or more processors 102 may be configured to adjust the determined probabilities using context information. For example, the immersive medium 104 may include a sequence of immersive images including at least a first immersive image and a (e.g., consecutive) second immersive image. The one or more processors 102 may be configured to detect one or more objects within the first immersive image and one or more objects within the second immersive image. The one or more processors 102 may be configured to determine, whether the same object is detected within the first immersive image and the second immersive image and, in the case that it is determined that the same object is detected within the first immersive image and the second immersive image, increase the probability associated with this object (or lower the predefined object detection threshold value). Illustratively, in the case that two consecutive images show the same object, it is assumed that the object is correctly labelled and the probability of the object is increased. The probability may represent a confidence level. This may also apply to consecutive objects within audio data. Similarly, the one or more processors 102 may be configured to detect one or more objects within the immersive image 104i and one or more objects within the audio data and may determine, whether the same object or connected objects are detected within the immersive image 104i and the audio data. For example, a bee may be shown in the immersive image 104i and someone may talk about bees represented by the audio data. In this case, the bee may be detected as visual object within the immersive image 104i and also as audio object within the audio data. In another example, the bee may be shown in the immersive image 104i and the audio data may represent a droning of bees. In this case, connected objects within the immersive image 104i and the audio data may be detected (a bee and a droning of bees). The one or more processors 102 may be configured to increase the probability associated with this object/these objects in the case that it is determined that the same object or connected objects are detected within the immersive image and the audio data. It is noted that not detecting a specific object may also be related to the object or another object. For example, not detecting any sound may increase the probability of some detected objects which are not associated with any sound (e.g., ants). This may be an indirect trigger. Similarly, text shown in the immersive image (e.g., detected via optical character recognition) may be used as an indirect trigger or as a visual object.
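
The context-based confidence adjustment for consecutive immersive images could, for instance, be sketched as follows; the boost factor, the threshold value, and the representation of a detection as a label-to-probability mapping are assumptions made for illustration only:

def adjust_with_context(prev_detections, curr_detections, boost=1.2, threshold=0.5):
    """Increase the probability of objects that were also detected in the previous image.

    Detections are assumed to be dicts mapping an object label to a probability.
    """
    adjusted = {}
    for label, prob in curr_detections.items():
        if label in prev_detections:
            prob = min(1.0, prob * boost)  # same object in two consecutive images
        adjusted[label] = prob
    # Keep only objects whose (possibly boosted) probability reaches the threshold.
    return {label: p for label, p in adjusted.items() if p >= threshold}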

According to various aspects, the one or more processors 102 may be configured to determine depth information for each object of the detected one or more objects 124 (e.g., by generating the depth map as described herein).

Table 1 shows various fears and an example of their detection. It is understood that there are many other fears which may be detected accordingly by detecting visual objects, audio objects, and/or haptic objects associated with the respective fear. It is noted that the detection of the fears serves as an example only and that only some of the described parameters may be detected and/or that other parameters may be detected which are associated with the respective fear. It is noted that the categorization serves for illustration only. For detection, the specific values (probabilities of detected objects, sound level, etc.) or classes (e.g., low/medium/high or true/false) may be employed. For example, sound may be classified as “loud” in the case that the sound level is equal to or greater than a predefined threshold value.

Further, there may be fears which are not yet known and/or which may be experienced specifically for immersive media. Fears which are not yet known may be detected similarly to fears not known to the user 108. For example, in the case that a fear reaction of the user 108 is detected (as described below), the detected objects and a respective probability indicating that the object induces this fear may be stored and, in the case that the probability that the object induces the fear is greater than a predefined threshold, it may be determined that the user has a fear associated with this object. According to various aspects, one or more immersive media related to the determined fear may be presented to the user to verify whether the user has the fear (by detecting the fear reaction of the user) or not (by detecting no fear reaction responsive to presenting the immersive media).

As mentioned above, there may be fears experienced specifically for immersive media. Immersive images (e.g., 360 degrees immersive images) may be created using stitching. However, stitching may lead to artifacts within the stitched images present as areas filled with a color value in a predefined manner. Hence, the artifacts may be one-colored areas. As an example, the areas may be black, leading to black holes (black since there is no content) within the stitched images. For example, when generating 360 mono images through a plurality of individual perspective images (e.g., photos) using a smartphone or tablet, some software may not be able to get data from all angles and instead leave these areas black. These black holes may, for example, have a size of 0.5 by 0.5 meters. As another example, these filled areas may be filled with a predefined color value, the color value of a neighboring pixel, or an average (e.g., median or arithmetic average) over a predefined number of nearby pixels. However, in all cases, the user may recognize these filled areas as being non-realistic inside the otherwise real-looking content. FIG. 5 shows one-colored artifacts 504, 506 within a zoomed region 502 of a stitched immersive image 500. Due to the immersive experience, these artifacts may result in an uneasy feeling of the user 108 or even fear. For example, they may give the impression as if the world were ripped apart like in Sci-Fi movies and can, therefore, create an unpleasant feeling which might trigger fear. They may be detected as one-colored (e.g., black) objects having no content. Another example of a fear experienced specifically for immersive media is the fear of seeing oneself realistically, different from a mirror image (since the immersive image may include an image of the user 108 themselves). Another example of a fear experienced specifically for immersive media is the fear of hearing sounds that cannot be there in the specific situation shown in the immersive image. For example, the user 108 may hear a sound from outside (i.e., the real world), whereas a completely unrelated situation is shown in the immersive image. It is understood that these are only examples and that there may be other fears specific to immersive media. All of these fears may be detected (and optionally mitigated) as described herein.

The one or more processors 102 may be configured to determine a respective fear level for each fear, i ∈ I, using the detected one or more objects 124. For example, each object may be associated with a predefined fear value and the fear level of the respective fear, i, may be increased by this predefined fear value in the case that the detected object is associated with the fear, i. Initially, the fear level may be zero. According to various aspects, the predefined fear value may be reduced in the case that the object is detected within the immersive medium 104 more than once. For example, in the case that two spiders are detected within an immersive image, the fear level associated with the fear of spiders may be increased by the predefined fear value (e.g., 30 points) for the first spider and a value less than the predefined fear value (e.g., 20 points) for the second spider. This takes into account that a fear may be triggered by a specific object but may not increase linearly with the number of objects. The one or more processors 102 may be configured to adjust the predefined fear value depending on the size of the object. For example, the predefined fear value may be adjusted depending on a ratio between the size of the object and the common size or common size range of the object. As an example, in the case that a three-meter large spider is detected, the predefined fear value associated with the fear of spiders may be increased, whereas in the case that a three-meter large elephant is detected, the predefined fear value associated with the fear of elephants may be kept unchanged. The one or more processors 102 may be configured to adjust the predefined fear value depending on the distance of the object from a predefined point (e.g., the camera location). The distance may be determined using the depth information associated with the detected object (e.g., using the depth map). As an example, in the case that the fear-triggering object is located far away from the user 108, the predefined fear value may be kept unchanged, whereas in the case that the user 108 is located directly next (e.g., less than 10 meters) to the detected object, the predefined fear value may be increased. According to various aspects, the predefined fear value associated with a detected object and a corresponding fear, i, may be adjusted as a function of the distance, the size, how often the object is detected, etc. Similarly, the predefined fear value may be reduced depending on the above-described parameters (e.g., instead of being kept unchanged).
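
A minimal sketch of the fear-level computation described above is given below; the repeat decay, the size and distance adjustments, and the specific values are illustrative choices rather than prescribed parameters:

def fear_level(objects, base_value=30.0, repeat_decay=0.66,
               near_distance=10.0, distance_boost=1.5, size_boost=1.5):
    """Accumulate a fear level for one fear, i, from the detected objects associated with it.

    objects: list of dicts with keys "size_ratio" (size divided by the common size of the
    object class) and "distance" (meters from the predefined point, e.g. the camera location).
    """
    level = 0.0
    for n, obj in enumerate(objects):
        value = base_value * (repeat_decay ** n)       # later detections of the same kind add less
        if obj.get("size_ratio", 1.0) > 1.0:           # unusually large object
            value *= size_boost
        if obj.get("distance", float("inf")) < near_distance:  # object close to the user
            value *= distance_boost
        level += value
    return level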

The one or more processors 102 may be configured to classify each fear, i ∈ I, depending on the determined fear level. For example, each fear, i, may be classified into True (“fear”) in the case that the determined fear level is equal to or greater than a predefined trigger threshold value and into False (“no fear”) otherwise. As another example, each fear, i, may be classified into a low trigger in the case that the determined fear level is lower than a predefined first trigger threshold value, into a medium trigger in the case that the determined fear level is equal to or greater than the predefined first trigger threshold value and lower than a predefined second trigger threshold value (greater than the predefined first trigger threshold value), and into a high trigger in the case that the determined fear level is equal to or greater than the predefined second trigger threshold value. These are only examples and other kinds of fear level trigger-detection are possible. Using these triggers, the one or more processors 102 may be configured to determine, for each fear, i ∈ I, whether the respective fear is triggered or not. Thus, a fear, i, may be triggered either directly in the case that an object associated with the fear is detected or depending on the determined fear level. Using the trigger(s) associated with the determined fear level takes into account that an object which is associated with a fear, but far away, hidden, and/or small compared to its common size, may not trigger the fear.
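
The classification of a determined fear level into trigger classes could, purely for illustration, look as follows (the threshold values are assumptions):

def classify_trigger(level, first_threshold=30.0, second_threshold=60.0):
    """Map a determined fear level to a trigger class."""
    if level >= second_threshold:
        return "high"
    if level >= first_threshold:
        return "medium"
    return "low"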

As an example, the initial fear level associated with the fear of spiders (arachnophobia) starts at 0. After detecting a spider as visual object within the immersive image 104i, the predefined fear value is adjusted in accordance with the size of the spider (e.g., increased in the case that the spider covers a 5 degree times 5 degree field of view in the 180 degree times 180 degree immersive image) and/or the distance of the spider (e.g., increased (e.g., linearly or quadratically) as the distance to the user 108 decreases). The fear level may be increased by the adjusted fear value. In the case that the fear level is equal to or greater than a predefined trigger threshold value, the fear of spiders is determined (e.g., triggered).

An exemplary illustration for detecting the fears, i ∈ I, is shown in FIG. 1E. The one or more processors 102 may be configured to determine, in 156, a respective fear level for each fear, i ∈ I, using a list of visual objects 150 (e.g., including a respective position and/or respective size of each object within the immersive image 104i), the depth image 122 (in some aspects referred to as depth map), a list of audio objects 152 (e.g., sounds, voices, etc.), and/or a list of haptic objects 154. The one or more processors 102 may be configured to, for each fear having a determined fear level equal to or greater than an associated predefined trigger threshold value, output, in 158, the fear and corresponding information regarding the object(s) associated with the fear. The corresponding information regarding the object(s) associated with the fear may include the determined fear level (in some aspects referred to as fear index), the position (e.g., elevation and azimuth) of the visual object(s), the size (e.g., a radius of a circle surrounding the object or the lengths of the sides of a rectangle surrounding the object, etc.) of the visual object(s), a point in time or a timeframe in which the audio object occurs within the audio data, a point in time or a timeframe in which the haptic object occurs within the haptic data, etc.
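
The information output in 158 could, for example, be collected in a simple per-fear record; the field names and types below are assumptions for illustration only:

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TriggeredFear:
    fear: str                                  # e.g. "arachnophobia"
    fear_level: float                          # the fear index determined in 156
    visual_positions: List[Tuple[float, float]] = field(default_factory=list)  # (elevation, azimuth)
    visual_sizes: List[float] = field(default_factory=list)                    # e.g. radius of enclosing circle
    audio_timeframes: List[Tuple[float, float]] = field(default_factory=list)  # (start, end) in seconds
    haptic_timeframes: List[Tuple[float, float]] = field(default_factory=list)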

With reference to FIG. 1D, the one or more sensors 110 may be configured to detect sensor data 128 representing a reaction of the user 108 in response to providing the immersive medium 104 to the user 108 via the one or more output devices 106. The sensor data provided by a respective sensor of the one or more sensors 110 may be associated with a predefined time interval after providing the immersive medium 104 to the user 108. The reaction of the user 108 in response to the immersive medium 104 may refer to different body reactions and each body reaction may be associated with a specific reaction time. For example, in response to feeling fear, an eye reaction and/or a movement of the body may have a lower reaction time than the production of sweat. Therefore, the time interval in which the sensor data of a respective sensor are acquired (or which sensor data of acquired sensor data are used) may depend on the reaction time associated with the type of reaction and, thus, the type of sensor. Hence, the sensor data 128 may be associated with one or more time intervals after providing the immersive medium 104 to the user 108. As an example, upon reacting with fear to an immersive medium, the user 108 may close his/her eyes and/or may avert his/her gaze from the fear-inducing object(s) first (e.g., within a first predefined time interval), then the pulse of the user 108 may increase due to fear (e.g., within a second predefined time interval), and even later the sweat production of the user 108 may increase (e.g., within a third predefined time interval). It is noted that the predefined time intervals may at least partially overlap with each other.

The one or more processors 102 may be configured to, in 130, determine, using the sensor data 128, whether the reaction of the user 108 is associated with a fear reaction. According to various aspects, the sensor data 128 may represent one or more individual reactions of the user 108. A probability that the reaction of the user 108 is associated with a fear reaction may be adapted (e.g., increased, decreased, or kept the same) for each of the one or more individual reactions. For example, the one or more sensors 110 may include the at least one sensor configured to detect a viewing direction of the user 108 (such as the orientation sensor and/or the eye tracking sensor), and the one or more processors 102 may be configured to determine, using the sensor data provided by the at least one sensor, whether the user 108 averts his/her gaze from the one or more objects 124 detected within a field of view of the user 108 in viewing direction within the predefined time interval associated with the at least one sensor and to increase the probability of the fear reaction in the case that it is determined that the user 108 averts his/her gaze from the detected one or more objects 124 within the predefined time interval after the immersive image 104i of the immersive medium 104 is provided to the user 108. For example, the one or more sensors 110 may include the sweat sensor and the one or more processors 102 may be configured to increase the probability of the fear reaction with increasing sweat production within the predefined time interval associated with the sweat sensor. For example, the one or more sensors 110 may include the pulse sensor and the one or more processors 102 may be configured to increase the probability of the fear reaction with increasing pulse of the user 108 within the predefined time interval associated with the pulse sensor. For example, the one or more sensors 110 may include the face tracking camera and the one or more processors 102 may be configured to determine, whether at least the lower part of the face of the user 108 detected within the predefined time interval associated with the face tracking camera indicates fear and to increase the probability of the fear reaction in the case that it is determined that the detected lower part of the face of the user indicates fear. For example, the one or more sensors 110 may include the at least one body tracking sensor and the one or more processors 102 may be configured to determine, using the motion of the user 108 detected via the at least one body tracking sensor, whether the user 108 moves back within the predefined time interval associated with the at least one body tracking sensor and to increase the probability of the fear reaction in the case that it is determined that the user 108 moves back within the predefined time interval. For example, the one or more sensors 110 may include the microphone and the one or more processors 102 may be configured to determine, whether the audio captured within the predefined time interval associated with the microphone indicates fear, and to increase the probability of the fear reaction in the case that it is determined that the audio captured within the predefined time interval indicates fear. As an example, the user 108 screaming may indicate fear. As another example, a hard and/or rapid breathing of the user 108 may indicate fear.
The one or more sensors 110 may include the brain-computer-interface and the one or more processors 102 may be configured to increase the probability of the fear reaction in the case that fear within the brain of the user 108 is detected via the brain-computer-interface within the predefined time interval associated with the brain-computer-interface. It is noted that the above examples are for illustration only and that any sensor configured to detect a reaction of the user 108 in response to an immersive medium may be used. The above described relations may be implemented in any suitable manner. For example, the probability of the fear reaction may be increased by a respective predefined probability value in the case that a value of the parameter detected by the respective sensor is greater than a predefined threshold value (or two or more predefined threshold values may be implemented with respectively associated predefined probability values). In another example, the probability of the fear reaction may be increased as a function of (e.g., linearly with) the detected value of the respective parameter. This kind of implementation may be the same for each sensor or different for the sensors.
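
A minimal sketch of combining the individual reactions into a fear-reaction probability, assuming per-sensor indicator values (each normalized to the range 0 to 1) have already been extracted from the sensor data 128 within their respective predefined time intervals; the weights are illustrative:

def fear_reaction_probability(indicators, weights=None):
    """Combine individual reactions (each in [0, 1]) into one fear-reaction probability.

    indicators: dict such as {"gaze_averted": 1.0, "pulse_increase": 0.6,
    "sweat_increase": 0.3, "moved_back": 1.0, "scream_detected": 0.0}.
    """
    weights = weights or {"gaze_averted": 0.25, "pulse_increase": 0.25,
                          "sweat_increase": 0.2, "moved_back": 0.2,
                          "scream_detected": 0.1}
    prob = sum(weights.get(name, 0.0) * value for name, value in indicators.items())
    return min(1.0, prob)

# The reaction is treated as a fear reaction if the probability reaches the
# predefined fear reaction threshold value, e.g.:
# is_fear_reaction = fear_reaction_probability(indicators) >= 0.5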

The one or more processors 102 may be configured to determine that the reaction of the user 108 is associated with the fear reaction in the case that the probability of the fear reaction is equal to or greater than a predefined fear reaction threshold value.

As an example of the determination of the fear reaction (in 130), an immersive image 104i showing a spider as detected visual object 124 may be presented to the user 108. The determined fear level may indicate a fear of spiders (e.g., determined depending on the size and/or distance of the spider) (“Yes” in 126). The sensor data 128 may indicate that the user 108 looks at the spider, that the pulse increases by 20 heart beats within a predefined time interval of 5 seconds, that the amount of sweat increases by 25 percent within a predefined time interval of 30 seconds after seeing the spider, that the pupil dilation of the user 108 responsive to looking at the spider increases by 15 percent, and that the user 108 moved back after seeing the spider. The probability of the fear reaction may be increased (in accordance with one or more predefined values) in accordance with the indicators of the sensor data 128 leading to a fear reaction probability greater than the associated predefined fear reaction threshold value such that the one or more processors 102 may determine (in 130) that the reaction of the user 108 is associated with the fear reaction. Due to the determined fear reaction and the triggered fear of spiders in this example, the probability that the user 108 has the fear of spiders is increased (e.g., in 132).

The one or more processors 102 may be configured to, in 132, adjust, for each detected fear, i, (“Yes” in 126) the probability of the user 108 having the fear depending on the reaction of the user 108. For example, in the case that the fear reaction is determined (“Yes” in 130, i.e., that the reaction of the user 108 is associated with a fear reaction), the one or more processors 102 may increase the fear probability, pi, of the fear, i. For example, in the case that no fear reaction is determined (“No” in 130, i.e., that the reaction of the user 108 is not associated with a fear reaction), the one or more processors 102 may decrease the fear probability, pi, of the fear, i. The one or more processors 102 may be configured to determine (in 134), whether the fear probability, pi, of the fear, i, is equal to or greater than a predefined fear threshold value, pth,i. The one or more processors 102 may be configured to determine, in 136, that the user 108 has the fear, i, in the case that the fear probability, pi, is equal to or greater than the predefined fear threshold value, pth,i (“Yes” in 134). The one or more processors 102 may be configured to determine, in 138, that the user 108 does not have the fear, i, in the case that the fear probability, pi, is less than the predefined fear threshold value, pth,i (“No” in 134). Thereby, fears which have been determined for the user 108 may also be removed in the case that the user 108 does not show a fear reaction (anymore). For example, the one or more processors 102 may be capable of determining that the user 108 gets used to the objects such that they do not trigger the fear or that the overall fear of the user is reduced (e.g., the user may no longer have the fear).
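
The adjustment of the fear probability, pi, and the subsequent threshold decision (132 to 138) could be sketched as follows; the step sizes and the threshold value are illustrative:

def update_fear_probability(p_i, fear_reaction, step_up=0.2, step_down=0.1, p_th=0.7):
    """Adjust the probability that the user has fear i and decide whether the fear is present."""
    if fear_reaction:           # "Yes" in 130
        p_i = min(1.0, p_i + step_up)
    else:                       # "No" in 130
        p_i = max(0.0, p_i - step_down)
    has_fear = p_i >= p_th      # corresponds to 134 / 136 / 138
    return p_i, has_fear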

It is noted that all thresholds or threshold values described herein may be equal for all fears, i ∈ I, or may be different for at least one fear, i, from the other ones. For example, each fear, i, may be associated with an individual threshold value.

The described determination of fears (or no fears) of the user 108 may be employed in various situations. For example, the user 108 may (via an input device of the immersive technology system 100) initially input fears he has (or thinks to have) and these fears may be adapted by adding fears (in the case it is determined, in 136, that the user 108 has a fear, i) and/or removing fears (in the case it is determined, in 138, that the user 108 does not have a fear, i). As another example, a predefined list of fears may be initially stored for each user. According to yet another example, initially no fears may be known at all and fears may only be determined as described above.

The fears determined for the user 108 may be stored in a memory device as user-specific fear information (e.g., as part of a user profile). This user-specific fear information may be used for treating the fears of the user 108 and/or inducing fear to the user 108 and/or for providing mitigation strategies for reducing the fear level of an immersive medium.

For example, a selected fear of the user 108 may be treated such that the one or more processors 102 may select one or more immersive media which show at least one object associated with (at least) the selected fear of the user 108 and in which the fear level associated with the fear is below a predefined therapy threshold value. Illustratively, low fear triggers may be provided to the user 108 so that the user 108 gets used to them (exposure therapy). The predefined therapy threshold value may be increased over time. The sensor data acquired by the one or more sensors 110 may be used to monitor the user 108 while treating the selected fear. For example, the treatment may be interrupted or stopped in the case that the pulse and/or the sweat production are greater than a respective predefined stop threshold value.

According to another example, one or more detected fears of the user 108 may be used to provide a horror video to the user 108. In this case, content of the horror video may be selected such that objects triggering the fear of the user are present.

According to various aspects, the one or more processors 102 may be configured to mitigate the user-specific fear(s) since the fear(s) of the user 108 and the objects which induce (e.g., trigger) these fear(s) may be known (e.g., determined as described above). In the case that the fear-triggering object(s) are classified using three or more categories (e.g., no trigger, low trigger, medium trigger, and high trigger), the one or more processors 102 may be configured to carry out the below described mitigation strategies for all fear-triggering object(s) or only for the ones associated with specific classes (e.g., only the ones having the medium trigger and/or high trigger). According to various aspects, the one or more processors 102 may be configured to carry out the below described mitigation strategies for the fear-triggering object(s) having a fear level equal to or greater than a predefined threshold value. The one or more processors 102 may be configured to mitigate the fear by preventing that the immersive medium 104 is presented to the user 108 (and, thus, not triggering the fear). Thereby, the fear level is reduced. Illustratively, the immersive medium 104 may be removed from a playlist. For example, the immersive image 104i may be removed from a sequence of immersive images. For example, the immersive image 104i may be removed from an immersive video or the immersive video itself may be removed from a playlist.

The one or more processors 102 may be configured to reduce the fear level associated with the immersive medium 104 by providing data to at least one of the one or more output devices 106 indicating a modified presentation of the immersive medium 104 to the user 108.

For example, the provided data may indicate the display output device to reduce a field of view of the immersive image 104i when displaying the immersive image 104i to the user 108. As an example, the immersive image 104i may be a spherical or half-spherical immersive image and the field of view of the immersive image 104i may be reduced (e.g., to 90 degrees, to 70 degrees, etc.). The field of view may be reduced depending on the fear level associated with the fear-triggering object(s). For example, in the case that the fear level of the fear-triggering object corresponds to the low trigger, the field of view may be reduced to a first predefined field of view value (e.g., 70 degrees), in the case that the fear level of the fear-triggering object corresponds to the medium trigger, the field of view may be reduced to a second predefined field of view value (e.g., 55 degrees), and in the case that the fear level of the fear-triggering object corresponds to the high trigger, the field of view may be reduced to a third predefined field of view value (e.g., 35 degrees). Hence, the higher the determined fear level is, the more the field of view may be reduced (and the more the immersion is reduced).
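
The trigger-dependent reduction of the field of view could be expressed as a simple lookup; the degree values below merely restate the examples given above:

FOV_BY_TRIGGER = {"low": 70.0, "medium": 55.0, "high": 35.0}  # degrees

def reduced_field_of_view(trigger_class, default_fov=180.0):
    """Return the field of view to use when displaying the immersive image."""
    return FOV_BY_TRIGGER.get(trigger_class, default_fov)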

The provided data may indicate the display output device to present the immersive image 104i on a virtual user device (e.g., tablet, smartphone, laptop, a (e.g., TV-sized) screen, etc.) and, optionally, to reduce the visibility of the background of the virtual user device (e.g., by blurring the background, by reducing the color density of the background, by darkening the background, etc.). The virtual user device may be movable (e.g., a virtual tablet in the hands of the user 108) or stationary (e.g., a TV-sized screen) within the computer-simulated reality. In the case that the immersive medium is associated with six degrees-of-freedom, the content displayed on the movable virtual user device may depend on the orientation and/or position of the movable virtual user device. In the case that the virtual user device is stationary, the immersive technology system 100 may be configured such that the user 108 can rotate around in the immersive image 104i using a (e.g., virtual reality) controller.

The provided data may indicate the display output device to present the immersive image 104i as monoscopic immersive image provided that the immersive image is a stereoscopic immersive image.

The provided data may indicate the display output device to change, provided that the immersive image 104i is a stereoscopic immersive image, the respective position of the immersive image associated with the left eye and the immersive image associated with the right eye by changing the interpupillary distance associated with the stereoscopic immersive image (e.g., increasing the interpupillary distance from about 6.3 cm to a greater value). Thereby, the object size and, hence, the fear level may be reduced.

The provided data may indicate the display output device to change a height position of the user 108 within the computer-simulated reality. By virtually increasing the height of the user 108, the object may look smaller which may result in a reduced fear level.

The provided data may indicate the display output device to change the lateral position of the user 108 within the computer-simulated reality provided that the immersive image represents six degrees-of-freedom (e.g., in the case that the immersive image has a Lightfield format or any other six degrees-of-freedom 204 representation format). By this, the user 108 may be moved to a position further away from the object which reduces the fear level induced by the object.

The provided data may indicate the display output device to increase a transparency of the display output device. Thereby, the real world in the surrounding of the user 108 may be shown which reduces the immersive experience and, thus, the induced fear level.

Illustratively, the real world in the surrounding of the user 108 may be mixed into the immersive medium.

The one or more processors 102 may be configured to reduce the fear level associated with the immersive medium 104 by providing data to the audio output device indicating a modified presentation of the audio data of the immersive medium 104 to the user 108. For example, the one or more processors 102 may be configured to detect at least one object of the one or more objects 124 within the audio data using audio object detection and the provided data may indicate the audio output device to decrease an output volume when presenting an audio signal in accordance with the at least one object within the audio data. The one or more processors 102 may be configured to reduce the fear level associated with the immersive medium 104 by modifying at least one fear-triggering object within the immersive medium 104.

The one or more processors 102 may be configured to modify at least the fear-triggering object within the immersive image.

As an example, the one or more processors 102 may be configured to blur the object. This may reduce the visibility of the object and, thus, also reduce the induced fear level. The object may be blurred by applying a Gaussian blur filter.
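
Blurring only the region of the fear-triggering object could, for instance, be done with OpenCV as sketched below, assuming a bounding box of the detected object in pixel coordinates; the kernel size is an illustrative choice:

import cv2

def blur_object(image, bbox, ksize=51):
    """Blur the fear-triggering object inside its bounding box (x, y, width, height)."""
    x, y, w, h = bbox
    region = image[y:y + h, x:x + w]
    # Gaussian blur with an odd kernel size; sigma is derived from the kernel size.
    image[y:y + h, x:x + w] = cv2.GaussianBlur(region, (ksize, ksize), 0)
    return image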

The one or more processors 102 may be configured to change the color of the at least one fear-triggering object. For example, the color of the object may be changed to a less realistic color, thereby, reducing the fear level. As an example, the color of blood may be changed to yellow making it less realistic. According to another example, a one-colored box may be put around the object to hide the object completely.

The one or more processors 102 may be configured to cut the fear-triggering object out of the immersive image 104i and to reconstruct the immersive image using image inpainting. Image inpainting can reconstruct an image depending on the residual parts of the image. Since the other objects within the immersive image might not indicate anything regarding the fear-triggering object, image inpainting may result in an image not showing the fear-triggering object but other objects.
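
Cutting the object out and reconstructing the image could, for instance, be sketched with OpenCV's inpainting function, assuming a binary mask marking the pixels of the fear-triggering object; the inpainting radius is an illustrative choice:

import cv2

def remove_object_by_inpainting(image, object_mask, radius=5):
    """Fill the masked object region from the residual parts of the image.

    object_mask: single-channel uint8 array, non-zero where the object is located.
    """
    return cv2.inpaint(image, object_mask, radius, cv2.INPAINT_TELEA)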

The one or more processors 102 may be configured to replace the fear-triggering object with another object which is not associated with a fear of the user 108. For example, in the case that the user 108 has a fear of kids, all detected objects associated with kids may be changed to other objects (e.g., to adults).

The one or more processors 102 may be configured to add objects to the immersive image 104i to reduce the fear level.

For example, in the case that the user 108 has a fear of heights, an artificial floor may be added below the user 108. This may give the feeling of stability and not falling over.

The one or more processors 102 may be configured to modify at least the fear-triggering object within the audio data. As an example, the one or more processors 102 may be configured to remove at least the at least one object from the audio data. According to some aspects, the one or more processors 102 may be configured to remove only the at least one object from the audio data. According to other aspects, the one or more processors 102 may be configured to remove at least the sentence, which includes the at least one object, from the audio data.

The one or more processors 102 may be configured to detect at least one object of the one or more objects 124 within the haptic data and to modify at least the fear-triggering object within the haptic data. As an example, the one or more processors 102 may be configured to remove at least the at least one object from the haptic data.

These examples may reduce the immersive experience and, thereby, also the induced fear level. Thus, the immersive technology system 100 may be capable to reduce the probability that a user 108 exhibits fear when using an immersive medium. These mitigation strategies may be combined in any suitable manner. Hence, one or more mitigation strategies may be combined. As an example, in the case that the immersive medium 104 is removed, only one mitigation strategy may be employed. According to another example, one or more objects within the immersive medium may be modified and the field of view may be reduced, thereby, employing more than one mitigation strategy.

According to some aspects, each fear may be associated with different mitigation strategies depending on the fear level. For example, one or more first mitigation strategies (e.g., modifying the fear-triggering objects) may be employed for fear-triggering objects associated with the low trigger, one or more different second mitigation strategies (e.g., reducing the field of view) may be employed for fear-triggering objects associated with the medium trigger, and one or more (e.g., different) third mitigation strategies (e.g., removing the immersive medium from the playlist) may be employed for fear-triggering objects associated with the high trigger.
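
The assignment of mitigation strategies to trigger classes could be organized as a lookup; the mapping below only restates the examples of this paragraph and is not exhaustive:

MITIGATION_BY_TRIGGER = {
    "low": ["modify_fear_triggering_objects"],
    "medium": ["reduce_field_of_view"],
    "high": ["remove_immersive_medium_from_playlist"],
}

def mitigation_strategies(trigger_class):
    """Return the mitigation strategies to employ for a fear-triggering object."""
    return MITIGATION_BY_TRIGGER.get(trigger_class, [])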

For illustration, exemplary mitigation strategies of selected fears are shown in Table 2.

Even though the mitigation strategies are described for immersive media, they may also be used for perspective images (e.g., which are part of perspective videos).

A perspective image may be presented on a technology system capable to present three-dimensional content (e.g., 3D-TV, a tablet with auto-stereoscopy, a Nintendo 3DS, etc.). In the case that fear-triggering object(s) are detected within the perspective image, the perspective image may be removed from the playlist, the provided data may indicate the technology system to reduce the field of view, to change the content to the movable virtual user device, to blur the fear-triggering object(s), to change the color of the fear-triggering object(s), to cut out the fear-triggering object(s), to change to monoscopic content, to decrease the audio volume or to remove the audio object(s), to change the IPD, etc.

A perspective image may be presented on a technology system capable to present only two-dimensional content (a PC screen, a TV, a tablet, a smartphone, etc.), such as devices capable to present low-field of view content only. In the case that fear-triggering object(s) are detected within the perspective image, the perspective image may be removed from the playlist, the provided data may indicate the technology system to reduce the field of view, to blur the fear-triggering object(s), to change the color of the fear-triggering object(s), to cut out the fear-triggering object(s), to decrease the audio volume or to remove the audio object(s), to change the IPD, etc.

New immersive content may be pre-processed as described herein prior to presenting the pre-processed immersive content to the user 108. This will provide an overall improved experience for the user when using immersive media.

According to various aspects, an immersive video may include one or more parallel playback strings which are selected via playback decisions/options. A user may manually select the decision (also referred to as playback option), thereby defining which content will be presented. Knowing the fear(s) of users (e.g., determined as described herein and stored as respective user profile) may make it possible to automatically select the decision such that the presented content induces a lower fear level than the other string would induce. According to various aspects, the determined fear(s) of the user 108 may be considered already when rendering the immersive medium through a rendering engine. When playing back the immersive medium on a dynamic streaming source (e.g., Netflix or a custom created video format which could be delivered on discs), the user profile regarding fear(s) may be set there as well and, depending on this, the immersive technology system 100 can decide to play back the scene without the fear-triggering object(s).

Being capable of detecting fear-triggering objects within immersive media makes it possible to consider these fear-triggers in a lot of circumstances. According to various aspects, a content creation system may include one or more processors configured to detect fear-triggering objects within created immersive images. For example, the one or more processors may be configured to render an immersive image and to detect fear-triggering objects within the rendered immersive image. The content creation system may include a user interface (e.g., a display device) for providing information to a user (e.g., content creator) using (e.g., controlling) the content creation system. The one or more processors may be configured to provide instructions to the user interface to inform the user about the fear-triggering objects (and optionally the type of fear associated with the objects). For example, the one or more processors may be configured to implement a content creation software which is capable of informing the content creator about the detected fear-triggering objects. This may allow the user (e.g., the content creator) to adapt the immersive image in the case that he/she does not want to trigger any fears. The content creation system may be or may include a camera having a display as user interface. The camera may be configured to show an image as preview on the display and the one or more processors may be configured to detect fear-triggering objects within this preview image. The camera may be configured to capture an immersive image and the one or more processors may be configured to detect fear-triggering objects within this captured immersive image. This may allow the user (e.g., content creator) using the camera to consider the fear-triggering objects (e.g., to change the scene and to recapture the immersive image). As described herein, the immersive image may be an image of an immersive video. Hence, the rendered immersive image may be an image of a rendered immersive video and the captured immersive image may be an image of a captured immersive video.

FIG. 6A shows a flow diagram of a method 600A for detecting objects within an immersive image according to various aspects. The method 600A may include providing an immersive image associated with at least a portion of a sphere (in 602A). The method 600A may include tessellating at least the portion of the sphere into a plurality of polygons (e.g., a plurality of quadrilaterals) such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image (in 604A). The method 600A may include, for one or more polygons (e.g., for each polygon) of the plurality of polygons, projecting at least the corresponding part of the immersive image onto the sphere and generating a perspective image representing at least the corresponding part of the immersive image (in 606A). The method 600A may include, for each generated perspective image, detecting one or more objects within the respective perspective image using (semantic) image segmentation (in 608A).

FIG. 6B shows a flow diagram of a method 600B for determining fear of a user using an immersive medium according to various aspects. The method 600B may include providing computer-simulated reality in accordance with an immersive medium to a user using an immersive technology system (in 602B).

The method 600B may include detecting sensor data representing a reaction of the user in response to providing the immersive medium to the user (in 604B). The method 600B may include determining, using the sensor data, whether the reaction of the user is associated with a fear reaction (in 606B). The method 600B may include detecting one or more objects (e.g., visual objects, audio objects, haptic objects, etc.) within the immersive medium (in 608B). The immersive medium may include an immersive image and at least one of the one or more objects may be (in 608B) detected within the immersive image using the method 600A. The method 600B may include, for each of the detected one or more objects, determining, whether the respective object is associated with at least one (sensation of) fear (in 610B). The method 600B may include, in the case that it is determined that the reaction of the user is associated with the fear reaction and that the respective object is associated with at least one fear, increasing a probability of the user having the at least one fear (in 612B). The method 600B may include determining that the user has the at least one fear in the case that the probability associated with the at least one fear is above a predefined fear threshold value (in 614B).

FIG. 6C shows a flow diagram of a method 600C for reducing a fear level of a user using an immersive medium. The method 600C may include detecting one or more objects within an immersive medium (in 602C). The immersive medium may include an immersive image and at least one of the one or more objects may be (in 602C) detected within the immersive image using the method 600A. The method 600C may, in 604C, for at least one object (e.g., each object) of the detected one or more objects include: determining, whether the respective object is associated with at least one (sensation of) fear (in 606C). The method 600C may, in 604C, for each object of the detected one or more objects include: in the case that it is determined that the respective object is associated with at least one fear, determining a fear level associated with the object and determining, whether a user has the at least one fear (in 608C). Method 600B may be carried out to determine, in 608C, whether the user has the at least one fear. The method 600C may include, in the case that it is determined that the user has the at least one fear: preventing the immersive medium from being presented to the user (in 610C). Alternatively, the method 600C may include, in the case that it is determined that the user has the at least one fear: providing data to one or more output devices indicating a modified presentation of the immersive medium to the user and/or modifying at least the respective object within the immersive medium to reduce the fear level of the user (in 612C).

In the following, various aspects of this disclosure will be illustrated. It is noted that aspects described with reference to the device or the immersive technology system may be accordingly implemented in the method and vice versa.

Example 1 is a device for detecting objects within an immersive image, the device including: one or more processors configured to: provide an immersive image associated with at least a portion of a sphere; tessellate at least the portion of the sphere into a plurality of polygons (e.g., a plurality of quadrilaterals) such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image; for one or more polygons (e.g., for each polygon) of the plurality of polygons, project at least the corresponding part of the immersive image onto the sphere and generate a perspective image representing at least the corresponding part of the immersive image; and for each generated perspective image, detect one or more objects within the respective perspective image using (semantic) image segmentation.

In Example 2, the subject matter of Example 1 can optionally include that the immersive image is an image of an immersive video.

In Example 3, the subject matter of Example 1 or 2 can optionally include that each polygon of the plurality of polygons is a quadrilateral.

In Example 4, the subject matter of any one of Examples 1 to 3 can optionally include that the immersive image is associated with the full sphere, that the one or more processors are configured to tessellate the full sphere into the plurality of polygons, and that the plurality of polygons includes a number of polygons in the range from about 12 to about 50 (e.g., in the range from about 30 to about 50).

In Example 5, the subject matter of any one of Examples 1 to 4 can optionally include that each generated perspective image represents the corresponding part of the immersive image and an additional part of the immersive image surrounding the corresponding part.

In Example 6, the subject matter of Example 5 can optionally include that a ratio between an area of the corresponding part of the immersive image and an area of the additional part of the immersive image is in the range from about 1.4 to about 3.

In Example 7, the subject matter of any one of Examples 1 to 6 can optionally include that the immersive image has an Equirectangular format, a Fisheye format, or a Cubemap format.

In Example 8, the subject matter of any one of Examples 1 to 7 can optionally include that the immersive image has a Fisheye format; wherein the one or more processors are configured to convert the Fisheye format into an Equirectangular format prior to projecting at least the corresponding part of the immersive image associated with a respective polygon onto the sphere and generating the respective perspective image representing at least the corresponding part of the immersive image.

In Example 9, the subject matter of any one of Examples 1 to 8 can optionally include that the immersive image has an Equirectangular format or is converted into an Equirectangular format; wherein the one or more processors are configured to: divide the immersive image into an upper edge region, a lower edge region, and a center region located between the upper edge region and the lower edge region, wherein the upper edge region of the immersive image is associated with a first sub-portion of the portion of the sphere and wherein the lower edge region of the immersive image is associated with a second sub-portion of the portion of the sphere, wherein the one or more polygons include each polygon associated with the first sub-portion and each polygon associated with the second sub-portion; and detect one or more objects within the center region of the immersive image using image segmentation.

In Example 10, the subject matter of any one of Examples 1 to 9 can optionally include that the one or more processors are further configured to: for each generated perspective image (and optionally the center region of the immersive image, provided Example 10 is combined with Example 9): generate a respective depth image including depth information regarding the one or more objects detected within the respective perspective image; determine a respective size of each of the one or more objects detected within the respective perspective image and/or a respective distance of each of the one or more objects from a center of the sphere.

Example 11 is a device for generating a depth map of an immersive image, the device including: one or more processors configured to: provide an immersive image associated with at least a portion of a sphere; tessellate at least the portion of the sphere into a plurality of polygons such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image; for each polygon of the plurality of polygons, project at least the corresponding part of the immersive image onto the sphere and generate a perspective image representing at least the corresponding part of the immersive image; for each generated perspective image, generate a respective depth image including depth information; and generate a depth map of the immersive image by combining the generated depth images.

Example 12 is a device for detecting objects within an immersive image, the device including: one or more processors configured to: render an immersive image; for each virtual camera position of one or more virtual camera positions, generate, for each virtual viewing direction of one or more virtual viewing directions, a respective perspective image by taking a screenshot of a corresponding part of the rendered immersive image from the respective virtual camera position in the respective virtual viewing direction; and for each generated perspective image, detect one or more objects within the respective perspective image using image segmentation.

Example 13 is an immersive technology system including: one or more output devices configured to provide computer-simulated reality in accordance with an immersive medium to a user; one or more sensors configured to detect sensor data representing a reaction of the user in response to providing the immersive medium to the user via the one or more output devices; one or more processors configured to: determine, using the sensor data, whether the reaction of the user is associated with a fear reaction; detect one or more objects (e.g., visual objects, audio objects, haptic objects, etc.) within the immersive medium; for each of the detected one or more objects, determine, whether the respective object is associated with at least one (sensation of) fear; in the case that it is determined that the reaction of the user is associated with the fear reaction and that the respective object is associated with at least one fear, increase a probability of the user having the at least one fear; determine that the user has the at least one fear in the case that the probability associated with the at least one fear is above a predefined fear threshold value.

In Example 14, the subject matter of Example 13 can optionally include that the one or more processors are further configured to decrease the probability of the user having the at least one fear in the case that it is determined that the object is associated with the at least one fear and that the reaction of the user is not associated with the fear reaction.

In Example 15, the subject matter of Example 13 or 14 can optionally include that the immersive medium includes an immersive image; that the one or more output devices include a display output device configured to display the immersive image; and that the one or more processors are configured to detect at least one object of the one or more objects within the immersive image using image segmentation (e.g., using the device in accordance with any one of Examples 1 to 12).

In Example 16, the subject matter of Example 15 can optionally include that the one or more sensors include at least one sensor configured to detect a viewing direction of the user; wherein the one or more processors are configured to detect the one or more objects within the immersive image in the field of view of the viewing direction of the user.

In Example 17, the subject matter of Example 16 can optionally include that the at least one sensor includes an orientation sensor configured to detect a head orientation of the user and/or an eye tracking sensor configured to detect eye tracking data representing an eye viewing direction of the user.

In Example 18, the subject matter of Example 16 or 17 can optionally include that the one or more processors are configured to: determine, whether the user averts his gaze from the detected one or more objects within a predefined time interval after the immersive image is displayed via the display output device; increase a probability of the fear reaction in the case that it is determined that the user averts his gaze from the detected one or more objects within the predefined time interval after the immersive image is displayed via the display output device; and determine that the reaction of the user is associated with the fear reaction in the case that the probability of the fear reaction is above a predefined fear reaction threshold value.
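One possible reading of the gaze-aversion logic of Examples 16 to 18 is sketched below, assuming head-orientation or eye-tracking samples are available as timestamped unit direction vectors. The time window, the angular tolerance, the probability step and the threshold are illustrative.

```python
import numpy as np

FEAR_REACTION_THRESHOLD = 0.6   # stand-in for the predefined fear reaction threshold value

def gaze_averted(gaze_samples, object_direction, t_display, window_s=2.0, angle_deg=30.0):
    """True if, within `window_s` seconds after the image is displayed, the gaze first
    points at the detected object and then moves clearly away from it.

    gaze_samples: list of (timestamp, unit-vector) pairs from head/eye tracking.
    object_direction: unit vector towards the detected object.
    """
    limit = np.cos(np.radians(angle_deg))
    looked_at = False
    for t, g in gaze_samples:
        if t < t_display or t > t_display + window_s:
            continue
        if np.dot(g, object_direction) > limit:
            looked_at = True
        elif looked_at:                 # was on the object, now looks away
            return True
    return False

def update_fear_reaction_probability(p, averted, step=0.3):
    """Increase the fear reaction probability when gaze aversion is observed."""
    return min(1.0, p + step) if averted else p

# The reaction counts as a fear reaction once p exceeds FEAR_REACTION_THRESHOLD.
```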

In Example 19, the subject matter of any one of Examples 15 to 18 can optionally include that the immersive image is a stereoscopic immersive image including a first immersive image associated with the left eye and a second immersive image associated with the right eye; and wherein the one or more processors are configured to detect the at least one object within the first immersive image and/or the second immersive image using image segmentation.

In Example 20, the subject matter of any one of Examples 13 to 19 can optionally include that the immersive medium includes audio data; wherein the one or more output devices include an audio output device configured to output an audio signal in accordance with the audio data; wherein the one or more processors are configured to detect at least one object of the one or more objects within the audio data using audio object detection.

In Example 21, the subject matter of any one of Examples 13 to 20 can optionally include that the immersive medium includes haptic data (e.g., representing information when and how haptic feedback is to be applied); wherein the one or more output devices include a haptic output device configured to output a haptic signal (e.g., a force, (focused) ultrasound, a vibration, and/or a motion) in accordance with the haptic data; wherein the one or more processors are configured to detect at least one object of the one or more objects within the haptic data.

In Example 22, the subject matter of any one of Examples 13 to 21 can optionally include that the one or more processors are configured to determine that the reaction of the user is associated with the fear reaction in the case that a probability of the fear reaction is above a predefined fear reaction threshold value.

In Example 23, the subject matter of Example 22 can optionally include that the one or more sensors include a sweat sensor (e.g., a sweat biosensor) configured to detect a sweat production of the user; wherein the one or more processors are configured to: increase the probability of the fear reaction with increasing sweat production within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user.

In Example 24, the subject matter of Example 22 or 23 can optionally include that the one or more sensors include a pulse sensor (e.g., included in a smart watch) configured to detect a pulse of the user; wherein the one or more processors are configured to: increase the probability of the fear reaction with increasing pulse within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user.

In Example 25, the subject matter of any one of Examples 22 to 24 can optionally include that the one or more sensors include a face tracking camera configured to detect at least a lower part of the face of the user; wherein the one or more processors are configured to: determine, whether the lower part of the face of the user detected within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear; and in the case that it is determined that the detected lower part of the face of the user indicates fear, increase the probability of the fear reaction.

In Example 26, the subject matter of any one of Examples 22 to 25 can optionally include that the one or more sensors include at least one body tracking sensor (e.g., a motion sensor, a camera pointed at the user, a camera located within equipment the user is equipped with, a position sensor (e.g., within shoes), etc.) configured to detect a motion of the user; wherein the one or more processors are configured to: determine, using the detected motion of the user, whether the user moves back within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user; and in the case that it is determined that the user moves back within the predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user, increase the probability of the fear reaction.

In Example 27, the subject matter of any one of Examples 22 to 26 can optionally include that the one or more sensors include a microphone configured to capture audio generated by the user; wherein the one or more processors are configured to: determine, whether the audio captured within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear (e.g., screaming and/or hard/rapid breathing); and in the case that it is determined that the audio captured within the predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear, increase the probability of the fear reaction.

In Example 28, the subject matter of any one of Examples 22 to 27 can optionally include that the one or more sensors include a brain-computer-interface configured to detect fear within the brain of the user; wherein the one or more processors are configured to: in the case that fear within the brain of the user is detected via the brain-computer-interface within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user, increase the probability of the fear reaction.
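The individual sensor cues of Examples 22 to 28 can be combined into a single fear-reaction probability. The sketch below shows one possible weighting; the field names, weights and caps are assumptions for illustration and are not prescribed by the examples.

```python
from dataclasses import dataclass

@dataclass
class SensorWindow:
    """Sensor observations within the predefined time interval after presentation."""
    sweat_increase: float = 0.0          # relative increase in sweat production (Example 23)
    pulse_increase: float = 0.0          # relative increase in heart rate (Example 24)
    face_indicates_fear: bool = False    # lower-face expression (Example 25)
    moved_back: bool = False             # body tracking (Example 26)
    audio_indicates_fear: bool = False   # screaming / rapid breathing (Example 27)
    bci_indicates_fear: bool = False     # brain-computer-interface (Example 28)

def fear_reaction_probability(w: SensorWindow) -> float:
    """Combine the cues into one probability; each cue only ever increases it."""
    p = 0.0
    p += min(0.3, 0.3 * w.sweat_increase)
    p += min(0.3, 0.3 * w.pulse_increase)
    p += 0.2 if w.face_indicates_fear else 0.0
    p += 0.2 if w.moved_back else 0.0
    p += 0.2 if w.audio_indicates_fear else 0.0
    p += 0.4 if w.bci_indicates_fear else 0.0
    return min(1.0, p)
```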

In Example 29, the subject matter of any one of Examples 13 to 28 can optionally include that the one or more processors are configured to: for each object of a plurality of objects, determine a respective probability that the respective object is in the immersive medium; detect the objects of the plurality of objects which have a respective probability equal to or greater than a predefined object detection threshold value as the one or more objects within the immersive medium.

In Example 30, the subject matter of Example 29 can optionally include that the immersive medium includes a sequence of immersive images, the sequence including at least a first immersive image and a (e.g., consecutive) second immersive image; wherein the one or more processors are configured to: detect one or more objects within the first immersive image and one or more objects within the second immersive image; determine, whether the same object is detected within the first immersive image and the second immersive image; and in the case that it is determined that the same object is detected within the first immersive image and the second immersive image, increase the probability associated with this object (or lower the predefined object detection threshold value).

In Example 31, the subject matter of Example 29 or 30 can optionally include that the immersive medium includes an immersive image and corresponding audio data; wherein the one or more processors are configured to: detect one or more objects within the immersive image and one or more objects within the audio data; determine, whether the same object (e.g., a bee within the immersive image and someone speaking about bees within the audio data) is or connected objects (e.g., a bee within the immersive image and a droning of bees within the audio data) are detected within the immersive image and the audio data; and in the case that it is determined that the same object is or connected objects are detected within the immersive image and the audio data, increase the probability associated with this object.
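Examples 29 to 31 describe thresholded detections whose probabilities are boosted when the same or connected objects are found in consecutive images or in the accompanying audio data. A compact sketch, with illustrative threshold and boost values:

```python
OBJECT_DETECTION_THRESHOLD = 0.5   # stand-in for the predefined object detection threshold value

def confirmed_objects(frame1_scores, frame2_scores, audio_labels,
                      boost_consecutive=0.15, boost_audio=0.15):
    """frameN_scores: dict label -> detection probability in that immersive image.
    audio_labels: labels of the same or connected objects detected in the audio data.
    Returns the labels accepted as objects within the immersive medium."""
    accepted = set()
    for label, p in frame1_scores.items():
        if label in frame2_scores:        # same object in consecutive immersive images
            p = min(1.0, p + boost_consecutive)
        if label in audio_labels:         # same / connected object in the audio data
            p = min(1.0, p + boost_audio)
        if p >= OBJECT_DETECTION_THRESHOLD:
            accepted.add(label)
    return accepted
```

Lowering the threshold instead of raising the probability, as the examples also allow, would be an equivalent formulation.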

In Example 32, the subject matter of any one of Examples 13 to 31 can optionally include that the one or more processors are further configured to provide one or more immersive media to the one or more output devices to provide the one or more immersive media to the user, wherein each of the one or more immersive media includes at least one object associated with at least one detected fear of the user (e.g., for therapy or horror).

Example 33 is a device including: one or more processors configured to: detect one or more objects within an immersive medium; for each object of the detected one or more objects: determine, whether the respective object is associated with at least one (sensation of) fear; in the case that it is determined that the respective object is associated with at least one fear, determine a fear level associated with the object and determine, whether a user has the at least one fear; and in the case that it is determined that the user has the at least one fear, prevent the immersive medium from being presented to the user, or provide data to one or more output devices indicating a modified presentation of the immersive medium to the user, and/or modify at least the respective object within the immersive medium to reduce the fear level.
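One way to express the three-way decision of Example 33 (prevent presentation, provide data indicating a modified presentation, or modify the object itself) is sketched below; the fear-level thresholds and the returned strategy names are assumptions chosen only for illustration.

```python
def choose_mitigation(object_fears, user_fears, fear_level,
                      level_block=0.8, level_modify=0.4):
    """Decide how to handle one detected object for one user (Example 33).

    object_fears / user_fears: sets of fear labels; the two thresholds are illustrative.
    """
    if not (set(object_fears) & set(user_fears)):
        return "present_unchanged"
    if fear_level >= level_block:
        return "prevent_presentation"   # do not present the immersive medium at all
    if fear_level >= level_modify:
        return "modify_object"          # e.g. blur, recolor, inpaint or replace the object
    return "modify_presentation"        # e.g. reduced field of view or a virtual screen
```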

In Example 34, the subject matter of Example 33 can optionally include that the immersive technology system according to any one of Examples 13 to 32 is employed to determine whether the user has the at least one fear.

In Example 35, the subject matter of Example 33 or 34 can optionally include that the immersive medium includes an immersive image and that the one or more processors are configured to detect at least one object of the one or more objects within the immersive image using image segmentation (e.g., using the device in accordance with any one of Examples 1 to 12).

In Example 36, the subject matter of Example 35 can optionally include that the one or more processors are configured to determine the fear level associated with the respective object depending on the size of the object (e.g., depending on a ratio between the size of the object and the common size or common size range of the object).

In Example 37, the subject matter of Example 35 or 36 can optionally include that the one or more processors are configured to: generate a depth map of the immersive image (e.g., using the device in accordance with Example 10 or 11), the depth map including depth information regarding each object of the detected one or more objects; determine, using the depth map, a respective size of each object of the detected one or more objects and/or a respective distance of each object of the one or more objects from a predefined point; and determine the fear level associated with the respective object depending on the determined size and/or determined distance of the object.
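A sketch of the size/distance-based fear level of Examples 36 and 37, assuming a per-pixel object mask, a metric depth map and a known solid angle per pixel. The typical object size, the maximum relevant distance, the size approximation and the equal weighting are all illustrative assumptions.

```python
import numpy as np

def fear_level_for_object(mask, depth_map, solid_angle_per_pixel_sr,
                          common_size_m=0.02, max_relevant_distance_m=5.0):
    """Estimate a fear level in [0, 1] from apparent size and distance.

    mask: boolean per-pixel mask of the detected object in the immersive image.
    depth_map: per-pixel distance from the viewpoint (e.g. centre of the sphere), in metres.
    solid_angle_per_pixel_sr: solid angle covered by one pixel, in steradians.
    common_size_m: typical real-world size of this object class (assumption).
    """
    distance = float(np.median(depth_map[mask]))
    solid_angle = mask.sum() * solid_angle_per_pixel_sr
    # Approximate angular diameter from the solid angle, then metric size from distance.
    size = 2.0 * distance * np.tan(np.sqrt(solid_angle) / 2.0)
    size_factor = min(1.0, size / common_size_m)              # larger than usual -> higher level
    proximity_factor = max(0.0, 1.0 - distance / max_relevant_distance_m)
    return 0.5 * size_factor + 0.5 * proximity_factor
```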

In Example 38, the subject matter of any one of Examples 35 to 37 can optionally include that the one or more processors are configured to provide the data to a display output device, wherein the provided data indicate the display output device: to reduce a field of view of the immersive image when displaying the immersive image to the user; to present the immersive image on a virtual user device (e.g., tablet, smartphone, laptop, a (e.g., TV-sized) screen, etc.) and to reduce the visibility of the background of the virtual user device (e.g., by blurring the background, by reducing the color density of the background, by darkening the background, etc.) (the user device may be movable or stationary within the computer-simulated reality); to present the immersive image as a monoscopic immersive image provided that the immersive image is a stereoscopic immersive image; to change, provided that the immersive image is a stereoscopic immersive image, the respective position of the immersive image associated with the left eye and the immersive image associated with the right eye by changing the interpupillary distance associated with the stereoscopic immersive image; to change a height position of the user within the computer-simulated reality; to change the lateral position of the user within the computer-simulated reality provided that the immersive image represents six degrees of freedom (e.g., in the case that the immersive image includes a Lightfield format or any other six-degrees-of-freedom representation format); to increase a transparency of the display output device.

In Example 39, the subject matter of any one of Examples 35 to 38 can optionally include that the one or more processors are configured to modify at least the at least one object within the immersive image to reduce the fear level by: blurring the respective object; changing the color of at least the respective object (e.g., changing the color of the respective object or putting a one-colored box around the respective object); cutting the respective object out of the immersive image and reconstructing the immersive image using image inpainting; replacing the respective object with another object which is not associated with a fear of the user; adding an artificial floor below the user.

In Example 40, the subject matter of any one of Examples 35 to 39 can optionally include that the immersive image is a stereoscopic immersive image including a first immersive image associated with the left eye and a second immersive image associated with the right eye; and wherein the one or more processors are configured to modify the at least one object within the first immersive image and the second immersive image.
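The pixel-level mitigations of Example 39 map naturally onto standard image operations. The sketch below uses OpenCV's Gaussian blur and inpainting as stand-ins; it is one possible implementation, not the one prescribed by the example, and the kernel size, fill colour and inpainting radius are illustrative.

```python
import cv2
import numpy as np

def defuse_object(image, mask, strategy="blur"):
    """Modify a detected fear-triggering object in one immersive image.

    image: HxWx3 uint8 image (one eye of a stereoscopic pair, or a monoscopic image).
    mask: HxW boolean mask of the object, as produced by image segmentation.
    """
    mask_u8 = mask.astype(np.uint8) * 255
    if strategy == "blur":
        blurred = cv2.GaussianBlur(image, (51, 51), 0)
        return np.where(mask_u8[..., None] > 0, blurred, image)
    if strategy == "recolor":
        out = image.copy()
        out[mask_u8 > 0] = (128, 128, 128)        # one-coloured box / flat colour
        return out
    if strategy == "inpaint":
        # Cut the object out and reconstruct the region from its surroundings.
        return cv2.inpaint(image, mask_u8, 5, cv2.INPAINT_TELEA)
    return image

# For a stereoscopic immersive image (Example 40) the same call is applied to the
# left-eye and right-eye images with their respective masks.
```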

In Example 41, the subject matter of any one of Examples 33 to 40 can optionally include that the immersive medium includes audio data; wherein the one or more processors are configured to detect at least one object of the one or more objects within the audio data using audio object detection; and wherein the one or more processors are configured to provide the data to an audio output device, wherein the provided data indicate the audio output device to decrease an output volume when presenting an audio signal in accordance with the at least one object within the audio data.

In Example 42, the subject matter of any one of Examples 33 to 40 can optionally include that the immersive medium includes audio data; wherein the one or more processors are configured to detect at least one object of the one or more objects within the audio data using audio object detection; and wherein the one or more processors are configured to remove at least the at least one object from the audio data.

In Example 43, the subject matter of any one of Examples 33 to 42 can optionally include that the immersive medium includes haptic data; wherein the one or more processors are configured to detect at least one object of the one or more objects within the haptic data; and wherein the one or more processors are configured to remove at least the at least one object from the haptic data.

In Example 44, the subject matter of any one of Examples 33 to 43 can optionally include that the one or more processors are configured to: for each object of a plurality of objects, determine a respective probability that the respective object is in the immersive medium; detect the objects of the plurality of objects which have a respective probability equal to or greater than a predefined object detection threshold value as the one or more objects within the immersive medium.

In Example 45, the subject matter of Example 44 can optionally include that the immersive medium includes a sequence of a first immersive image and a (e.g., consecutive) second immersive image; wherein the one or more processors are configured to: detect one or more objects within the first immersive image and one or more objects within the second immersive image; determine, whether the same object is detected within the first immersive image and the second immersive image; and in the case that it is determined that the same object is detected within the first immersive image and the second immersive image, increase the probability associated with this object (or lower the predefined object detection threshold value).

In Example 46, the subject matter of Example 44 or 45 can optionally include that the immersive medium includes an immersive image and corresponding audio data; wherein the one or more processors are configured to: detect one or more objects within the immersive image and one or more objects within the audio data; determine, whether the same object (e.g., a bee within the immersive image and someone speaking about bees within the audio data) is or connected objects (e.g., a bee within the immersive image and a droning of bees within the audio data) are detected within the immersive image and the audio data; and in the case that it is determined that the same object is or connected objects are detected within the immersive image and the audio data, increase the probability associated with this object.

Example 47 is a method for detecting objects within an immersive image, the method including: providing an immersive image associated with at least a portion of a sphere; tessellating at least the portion of the sphere into a plurality of polygons (e.g., a plurality of quadrilaterals) such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image; for one or more polygons (e.g., for each polygon) of the plurality of polygons, projecting at least the corresponding part of the immersive image onto the sphere and generating a perspective image representing at least the corresponding part of the immersive image; and for each generated perspective image, detecting one or more objects within the respective perspective image using (semantic) image segmentation.

In Example 48, the subject matter of Example 47 can optionally include that the immersive image is an image of an immersive video.

In Example 49, the subject matter of Example 47 or 48 can optionally include that each polygon of the plurality of polygons is a quadrilateral.

In Example 50, the subject matter of any one of Examples 47 to 49 can optionally include that the immersive image is associated with the full sphere, that tessellating at least the portion of the sphere includes tessellating the full sphere into the plurality of polygons, and that the plurality of polygons includes a number of polygons in the range from about 12 to about 50 (e.g., in the range from about 30 to about 50).

In Example 51, the subject matter of any one of Examples 47 to 50 can optionally include that each generated perspective image represents the corresponding part of the immersive image and an additional part of the immersive image surrounding the corresponding part.

In Example 52, the subject matter of Example 51 can optionally include that a ratio between an area of the corresponding part of the immersive image and an area of the additional part of the immersive image is in the range from about 1.4 to about 3.

In Example 53, the subject matter of any one of Examples 47 to 52 can optionally include that the immersive image has an Equirectangular format, a Fisheye format, or a Cubemap format.

In Example 54, the subject matter of any one of Examples 47 to 53 can optionally include that the immersive image has a Fisheye format; wherein the method further includes converting the Fisheye format into an Equirectangular format prior to projecting at least the corresponding part of the immersive image associated with a respective polygon onto the sphere and generating the respective perspective image representing at least the corresponding part of the immersive image.

In Example 55, the subject matter of any one of Examples 47 to 54 can optionally include that the immersive image has an Equirectangular format or is converted into an Equirectangular format; wherein the method further includes: dividing the immersive image into an upper edge region, a lower edge region, and a center region located between the upper edge region and the lower edge region, wherein the upper edge region of the immersive image is associated with a first sub-portion of the portion of the sphere and wherein the lower edge region of the immersive image is associated with a second sub-portion of the portion of the sphere, wherein the one or more polygons include each polygon associated with the first sub-portion and each polygon associated with the second sub-portion; and detecting one or more objects within the center region of the immersive image using image segmentation.
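A small sketch of the region split of Example 55; the 25 % edge fraction is an assumption, chosen only to illustrate separating the strongly distorted pole regions from the centre band, which can be segmented directly.

```python
import numpy as np

def split_equirectangular(equi, edge_fraction=0.25):
    """Split an Equirectangular image into upper edge, centre and lower edge regions.

    The centre band is distorted little and can be segmented directly; the upper and
    lower bands (the poles) are handled via the per-polygon perspective views instead.
    """
    h = equi.shape[0]
    upper_end = int(h * edge_fraction)
    lower_start = int(h * (1.0 - edge_fraction))
    upper = equi[:upper_end]
    center = equi[upper_end:lower_start]
    lower = equi[lower_start:]
    return upper, center, lower
```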

In Example 56, the method of any one of Examples 47 to 55 can optionally further include: for each generated perspective image (and optionally the center region of the immersive image, provided that the method is combined with Example 55): generating a respective depth image including depth information regarding the one or more objects detected within the respective perspective image; and determining a respective size of each of the one or more objects detected within the respective perspective image and/or a respective distance of each of the one or more objects from a center of the sphere.

Example 57 is a method for generating a depth map of an immersive image, the method including: providing an immersive image associated with at least a portion of a sphere; tessellating at least the portion of the sphere into a plurality of polygons such that each polygon of the plurality of polygons corresponds to a respective part of the immersive image; for each polygon of the plurality of polygons, projecting at least the corresponding part of the immersive image onto the sphere and generating a perspective image representing at least the corresponding part of the immersive image; for each generated perspective image, generating a respective depth image including depth information; and generating a depth map of the immersive image by combining the generated depth images.

Example 58 is a method for detecting objects within an immersive image, the method including: rendering an immersive image; for each virtual camera position of one or more virtual camera positions, generating, for each virtual viewing direction of one or more virtual viewing directions, a respective perspective image by taking a screenshot of a corresponding part of the rendered immersive image from the respective virtual camera position in the respective virtual viewing direction; and for each generated perspective image, detecting one or more objects within the respective perspective image using image segmentation.

Example 59 is a method for determining fear of a user using an immersive technology system, the method including: providing computer-simulated reality in accordance with an immersive medium to a user using an immersive technology system; detecting sensor data representing a reaction of the user in response to providing the immersive medium to the user; determining, using the sensor data, whether the reaction of the user is associated with a fear reaction; detecting one or more objects (e.g., visual objects, audio objects, haptic objects, etc.) within the immersive medium; for each of the detected one or more objects, determining, whether the respective object is associated with at least one (sensation of) fear; in the case that it is determined that the reaction of the user is associated with the fear reaction and that the respective object is associated with at least one fear, increasing a probability of the user having the at least one fear; and determining that the user has the at least one fear in the case that the probability associated with the at least one fear is above a predefined fear threshold value.

In Example 60, the subject matter of Example 59 can optionally further include decreasing the probability of the user having the at least one fear in the case that it is determined that the object is associated with the at least one fear and that the reaction of the user is not associated with the fear reaction.

In Example 61, the subject matter of Example 59 or 60 can optionally include that the immersive medium includes an immersive image and that the computer-simulated reality is provided to the user using a display output device; wherein the method further includes detecting at least one object of the one or more objects within the immersive image using image segmentation (e.g., using the method in accordance with any one of Examples 47 to 58).

In Example 62, the subject matter of Example 61 can optionally further include detecting a viewing direction of the user; and detecting the one or more objects within the field of view of the viewing direction of the user.

In Example 63, the subject matter of Example 62 can optionally further include determining, whether the user averts his gaze from the detected one or more objects within a predefined time interval after the immersive image is displayed via the display output device; increasing a probability of the fear reaction in the case that it is determined that the user averts his gaze from the detected one or more objects within the predefined time interval after the immersive image is displayed via the display output device; and determining that the reaction of the user is associated with the fear reaction in the case that the probability of the fear reaction is above a predefined fear reaction threshold value.

In Example 64, the subject matter of any one of Examples 59 to 63 can optionally include that the immersive image is a stereoscopic immersive image including a first immersive image associated with the left eye and a second immersive image associated with the right eye; and wherein detecting the at least one object within the immersive image includes detecting the at least one object within the first immersive image and/or the second immersive image using image segmentation.

In Example 65, the subject matter of any one of Examples 59 to 64 can optionally include that the immersive medium includes audio data; wherein the method includes detecting at least one object of the one or more objects within the audio data using audio object detection.

In Example 66, the subject matter of any one of Examples 59 to 65 can optionally include that the immersive medium includes haptic data (e.g., representing information when and how haptic feedback is to be applied); wherein the method includes detecting at least one object of the one or more objects within the haptic data.

In Example 67, the subject matter of any one of Examples 59 to 66 can optionally include that the method includes determining that the reaction of the user is associated with the fear reaction in the case that a probability of the fear reaction is above a predefined fear reaction threshold value.

In Example 68, the subject matter of Example 67 can optionally further include increasing the probability of the fear reaction with increasing sweat production within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user.

In Example 69, the method of Example 67 or 68 can optionally further include: increasing the probability of the fear reaction with increasing pulse within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user.

In Example 70, the method of any one of Examples 67 to 69 can optionally further include: determining, whether the lower part of the face of the user detected within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear; and in the case that it is determined that the detected lower part of the face of the user indicates fear, increasing the probability of the fear reaction.

In Example 71, the subject matter of any one of Examples 67 to 70 can optionally further include: determining, using a detected motion of the user, whether the user moves back within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user; and in the case that it is determined that the user moves back within the predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user, increasing the probability of the fear reaction.

In Example 72, the subject matter of any one of Examples 67 to 71 can optionally further include: determining, whether user generated audio captured within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear (e.g., screaming and/or hard/rapid breathing); and in the case that it is determined that the audio captured within the predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user indicates fear, increasing the probability of the fear reaction.

In Example 73, the subject matter of any one of Examples 67 to 72 can optionally further include: detecting fear within the brain of the user using a brain-computer-interface; and, in the case that fear within the brain of the user is detected within a predefined time interval after providing the computer-simulated reality in accordance with the immersive medium to the user, increasing the probability of the fear reaction.

In Example 74, the subject matter of any one of Examples 59 to 73 can optionally further include: for each object of a plurality of objects, determining a respective probability that the respective object is in the immersive medium; and detecting the objects of the plurality of objects which have a respective probability equal to or greater than a predefined object detection threshold value as the one or more objects within the immersive medium.

In Example 75, the subject matter of Example 74 can optionally include that the immersive medium includes a sequence of immersive images, the sequence including at least a first immersive image and a (e.g., consecutive) second immersive image; wherein the method further includes: detecting one or more objects within the first immersive image and one or more objects within the second immersive image; determining, whether the same object is detected within the first immersive image and the second immersive image; and in the case that it is determined that the same object is detected within the first immersive image and the second immersive image, increasing the probability associated with this object (or lowering the predefined object detection threshold value).

In Example 76, the subject matter of Example 74 or 75 can optionally include that the immersive medium includes an immersive image and corresponding audio data; wherein the method further includes: detecting one or more objects within the immersive image and one or more objects within the audio data; determining, whether the same object (e.g., a bee within the immersive image and someone speaking about bees within the audio data) is or connected objects (e.g., a bee within the immersive image and a droning of bees within the audio data) are detected within the immersive image and the audio data; and in the case that it is determined that the same object is or connected objects are detected within the immersive image and the audio data, increasing the probability associated with this object.

In Example 77, the subject matter of any one of Examples 59 to 76 can optionally further include: providing one or more immersive media to one or more output devices to provide the one or more immersive media to the user, wherein each of the one or more immersive media includes at least one object associated with at least one detected fear of the user (e.g., for therapy or horror).

Example 78 is a method for reducing a fear level of a user using an immersive medium, the method including: detecting one or more objects within an immersive medium; for each object of the detected one or more objects: determining, whether the respective object is associated with at least one (sensation of) fear; in the case that it is determined that the respective object is associated with at least one fear, determining a fear level associated with the object and determining, whether a user has the at least one fear; and in the case that it is determined that the user has the at least one fear, preventing the immersive medium from being presented to the user, or providing data to one or more output devices indicating a modified presentation of the immersive medium to the user, and/or modifying at least the respective object within the immersive medium to reduce the fear level of the user.

In Example 79, the subject matter of Example 78 can optionally include that the method according to any one of Examples 59 to 77 is employed to determine whether the user has the at least one fear.

In Example 80, the subject matter of Example 78 or 79 can optionally include that the immersive medium includes an immersive image and that the method further includes: detecting at least one object of the one or more objects within the immersive image using image segmentation (e.g., using the method in accordance with any one of Examples 47 to 58).

In Example 81, the subject matter of Example 80 can optionally include that the method includes determining the fear level associated with the respective object depending on the size of the object (e.g., depending on a ratio between the size of the object and the common size or common size range of the object).

In Example 82, the subject matter of Example 80 or 81 can optionally further include: generating a depth map of the immersive image (e.g., using the method in accordance with Example 56 or 57), the depth map including depth information regarding each object of the detected one or more objects; determining, using the depth map, a respective size of each object of the detected one or more objects and/or a respective distance of each object of the one or more objects from a predefined point; and determining the fear level associated with the respective object depending on the determined size and/or determined distance of the object.

In Example 83, the subject matter of any one of Examples 80 to 82 can optionally include that the method includes providing the data to a display output device, and the provided data can indicate the display output device: to reduce a field of view of the immersive image when displaying the immersive image to the user; to present the immersive image on a virtual user device (e.g., tablet, smartphone, laptop, a (e.g., TV-sized) screen, etc.) and to reduce the visibility of the background of the virtual user device (e.g., by blurring the background, by reducing the color density of the background, by darkening the background, etc.) (the user device may be movable (mitigation strategy 3) or stationary (mitigation strategy 4) within the computer-simulated reality); to present the immersive image as a monoscopic immersive image provided that the immersive image is a stereoscopic immersive image; to change, provided that the immersive image is a stereoscopic immersive image, the respective position of the immersive image associated with the left eye and the immersive image associated with the right eye by changing the interpupillary distance associated with the stereoscopic immersive image; to change a height position of the user within the computer-simulated reality; to change the lateral position of the user within the computer-simulated reality provided that the immersive image represents six degrees of freedom (e.g., in the case that the immersive image includes a Lightfield format or any other six-degrees-of-freedom representation format); to increase a transparency of the display output device.

In Example 84, the subject matter of any one of Examples 80 to 83 can optionally include that the method includes modifying at least the at least one object within the immersive image to reduce the fear level by: blurring the respective object; changing the color of at least the respective object (e.g., changing the color of the respective object or putting a one-colored box around the respective object); cutting the respective object out of the immersive image and reconstructing the immersive image using image inpainting; replacing the respective object with another object which is not associated with a fear of the user; adding an artificial floor below the user.

In Example 85, the subject matter of any one of Examples 80 to 84 can optionally include that the immersive image is a stereoscopic immersive image including a first immersive image associated with the left eye and a second immersive image associated with the right eye; and wherein the method includes modifying the at least one object within the first immersive image and the second immersive image.

In Example 86, the subject matter of any one of Examples 78 to 85 can optionally include that the immersive medium includes audio data; wherein the method further includes: detecting at least one object of the one or more objects within the audio data using audio object detection; and providing the data to an audio output device, wherein the provided data indicate the audio output device to decrease an output volume when presenting an audio signal in accordance with the at least one object within the audio data.

In Example 87, the subject matter of any one of Examples 78 to 85 can optionally include that the immersive medium includes audio data; wherein the method further includes: detecting at least one object of the one or more objects within the audio data using audio object detection; and removing at least the at least one object from the audio data.

In Example 88, the subject matter of any one of Examples 78 to 85 can optionally include that the immersive medium includes haptic data; wherein the method further includes: detecting at least one object of the one or more objects within the haptic data; and removing at least the at least one object from the haptic data.

In Example 89, the subject matter of any one of Examples 78 to 88 can optionally further include: for each object of a plurality of objects, determining a respective probability that the respective object is in the immersive medium; and detecting the objects of the plurality of objects which have a respective probability equal to or greater than a predefined object detection threshold value as the one or more objects within the immersive medium.

In Example 90, the subject matter of Example 89 can optionally include that the immersive medium includes a sequence of a first immersive image and a (e.g., consecutive) second immersive image; wherein the method further includes: detecting one or more objects within the first immersive image and one or more objects within the second immersive image; determining, whether the same object is detected within the first immersive image and the second immersive image; and in the case that it is determined that the same object is detected within the first immersive image and the second immersive image, increasing the probability associated with this object (or lowering the predefined object detection threshold value).

In Example 91, the subject matter of Example 89 or 90 can optionally include that the immersive medium includes an immersive image and corresponding audio data; wherein the method further includes: detecting one or more objects within the immersive image and one or more objects within the audio data; determining, whether the same object (e.g., a bee within the immersive image and someone speaking about bees within the audio data) is or connected objects (e.g., a bee within the immersive image and a droning of bees within the audio data) are detected within the immersive image and the audio data; and in the case that it is determined that the same object is or connected objects are detected within the immersive image and the audio data, increasing the probability associated with this object.

Example 92 is a non-transitory computer-readable medium having instructions recorded thereon which, when executed by one or more processors, cause the one or more processors to carry out the method according to any one of Examples 47 to 91.

Although the disclosure refers to fear (fear-triggering objects and fear of a user), the above-described principles may analogously apply to other feelings of the user, such as preferences. In this case, the objects associated with preferences may not be modified but rather selected to be presented to the user.

Table 1: Fears and examples of their detection

Table 2: Fears and examples of their mitigation strategies