

Title:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR REPRODUCING SPATIAL AUDIO
Document Type and Number:
WIPO Patent Application WO/2019/197709
Kind Code:
A1
Abstract:
There are disclosed various methods, apparatuses and computer program products for audio encoding and reproduction. A method comprises determining, in a first device, parameters for at least two reverberator methods for reproducing a reverberated signal; obtaining information about capabilities of the reverberator methods to reproduce the reverberated signal; producing at least one multidimensional preference information for the reverberator methods; and transmitting the information on the reverberators, the at least one multidimensional preference and related parameters to a second device.

Inventors:
ERONEN ANTTI (FI)
LEHTINIEMI ARTO JUHANI (FI)
MATE SUJEET SHYAMSUNDAR (FI)
LEPPÄNEN JUSSI (FI)
Application Number:
PCT/FI2019/050211
Publication Date:
October 17, 2019
Filing Date:
March 13, 2019
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
H04S7/00; H04R29/00; G06F3/01; G06F17/10; G10L19/008; H03H17/00; H04S3/00
Domestic Patent References:
WO2016053432A12016-04-07
Foreign References:
US20140153727A12014-06-05
US20120057715A12012-03-08
Other References:
See also references of EP 3777249A4
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
CLAIMS

1. A method comprising:

determining, in a first device, one or more parameters for at least two reverberator methods for reproducing a reverberated signal;

determining, for each reverberator method, a preference attribute describing a capability of the reverberator method to reproduce the reverberated signal;

determining, for each reverberator method, at least one further preference attribute describing a usage suitability of the reverberator method; and

transmitting for each reverberator method, at least an identification of the reverberator method, said one or more parameters for reproducing a reverberated signal and the preference attributes to a second device.

2. The method according to claim 1, wherein said one or more parameters for reproducing a reverberated signal comprise at least one of the following:

- room impulse response RIR;

- filter parameters, such as coefficients, delay line lengths, filter orders;

- a reference or an identifier for directly or indirectly adjusting any of the above.

3. The method according to claim 1 or 2, wherein the preference attributes comprise at least one of the following:

- Content Reproduction Quality CRQ;

- Bitrate B;

- Computational Complexity CC;

- Latency L.

4. The method according to claim 3, further comprising

determining weights for the preference attributes;

calculating a preference attribute for Overall Quality (OQ) on the basis of the weighted preference attributes; and

transmitting the preference attribute for Overall Quality (OQ) and the weights to the second device.

5. The method according to claim 4, further comprising

determining the weights for the preference attributes based on a profile associated with rendering.

6. An apparatus comprising:

means for determining one or more parameters for at least two reverberator methods for reproducing a reverberated signal;

means for determining, for each reverberator method, a preference attribute describing a capability of the reverberator method to reproduce the reverberated signal;

means for determining, for each reverberator method, at least one further preference attribute describing a usage suitability of the reverberator method; and

means for transmitting for each reverberator method, at least an identification of the reverberator method, said one or more parameters for reproducing a reverberated signal and the preference attributes to a second apparatus.

7. The apparatus according to claim 6, wherein said one or more parameters for reproducing a reverberated signal comprise at least one of the following:

- room impulse response RIR;

- filter parameters, such as coefficients, delay line lengths, filter orders;

- a reference or an identifier for directly or indirectly adjusting any of the above.

8. The apparatus according to claim 6 or 7, wherein the preference attributes comprise at least one of the following:

- Content Reproduction Quality CRQ;

- Bitrate B;

- Computational Complexity CC;

- Latency L;

- Overall Quality OQ.

9. The apparatus according to any of claims 6 to 8, further comprising:

means for determining weights for the preference attributes;

means for calculating a preference attribute for Overall Quality (OQ) on the basis of weighted preference attributes; and

means for transmitting the preference attribute for Overall Quality (OQ) and the weights to the second device.

10. The apparatus according to claim 9, further comprising

means for determining the weights for the parameters based on a profile associated with rendering.

11. An apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform:

determine one or more parameters for at least two reverberator methods for reproducing a reverberated signal;

determine, for each reverberator method, a preference attribute describing a capability of the reverberator method to reproduce the reverberated signal;

determine, for each reverberator method, at least one further preference attribute describing a usage suitability of the reverberator method; and

transmit for each reverberator method, at least an identification of the reverberator method, said one or more parameters for reproducing a reverberated signal and the preference attributes to a second device.

12. A method comprising:

receiving, in an apparatus, for each of a plurality of reverberator methods, at least an identification of the reverberator method, one or more parameters for reproducing a reverberated signal and preference attributes describing a capability of the reverberator method to reproduce the reverberated signal and a usage suitability of the reverberator method; and

selecting, based on the preference attributes, a reverberator method to be used for reproducing the reverberation in said apparatus.

13. The method according to claim 12, wherein the preference attributes comprise at least one of the following:

- Content Reproduction Quality CRQ;

- Bitrate B;

- Computational Complexity CC;

- Latency L;

- Overall Quality OQ.

14. The method according to claim 13, wherein the preference attributes are associated with weights for a subset of preference attributes, wherein the preference attribute for the Overall Quality OQ is calculated on the basis of the subset of weighted preference attributes.

15. The method according to claim 14, wherein the plurality of reverberator methods are received in a list arranged in ascending order based on the value of the Overall Quality OQ.

16. The method according to claim 14 or 15, wherein selecting the reverberator method to be used for reproducing the reverberation further comprises

comparing a received first set of the weights to a second set of weights maintained by the apparatus;

using, in response to the received first set of the weights not matching the second set of weights, the second set of weights for calculating updated values of Overall Quality OQ for the reverberator methods; and

selecting the reverberator method to be used for reproducing the reverberation based on the updated values of Overall Quality OQ.

17. The method according to claim 16, wherein the second set of weights maintained by the apparatus is based on capabilities and application requirements of the apparatus.

18. An apparatus comprising:

means for receiving, for each of a plurality of reverberator methods, at least an identification of the reverberator method, one or more parameters for reproducing a reverberated signal and preference attributes describing a capability of the reverberator method to reproduce the reverberated signal and a usage suitability of the reverberator method; and

means for selecting, based on the preference attributes, a reverberator method to be used for reproducing the reverberation in said apparatus.

19. The apparatus according to claim 18, wherein the preference attributes comprise at least one of the following:

- Content Reproduction Quality CRQ;

- Bitrate B;

- Computational Complexity CC;

- Latency L;

- Overall Quality OQ.

20. The apparatus according to claim 18 or 19, wherein the preference attributes are associated with weights for a subset of preference attributes, wherein the preference attribute for the Overall Quality OQ is calculated on the basis of the subset of weighted preference attributes.

21. The apparatus according to claim 20, further comprising

means for comparing a received first set of the weights to a second set of weights maintained by the apparatus;

means for using, in response to the received first set of the weights not matching the second set of weights, the second set of weights for calculating updated values of Overall Quality OQ for the reverberator methods; and

means for selecting the reverberator method to be used for reproducing the reverberation based on the updated values of Overall Quality OQ.

22. The apparatus according to any of claims 18 - 21, wherein the preference attributes comprise an identifier, such as a name or a type, of each reverberator.

23. The apparatus according to any of claims 18 - 22, further comprising means for signaling information on the reverberators available in said apparatus to a second apparatus determining the at least one preference attribute about a plurality of reverberators.

24. The apparatus according to claim 23, further comprising

means for signaling, upon a change in configuration or environment of the apparatus, updated information on the reverberators and/or their parameters to the second apparatus.

25. The apparatus according to claim 23 or 24, further comprising

means for signaling that only a subset of the reverberator parameters available in said apparatus are adjustable.

26. An apparatus comprising at least one processor and at least one memory, said at least one memory having computer program code stored thereon, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform:

receive, for each of a plurality of reverberator methods, at least an identification of the reverberator method, one or more parameters for reproducing a reverberated signal and preference attributes describing a capability of the reverberator method to reproduce the reverberated signal and a usage suitability of the reverberator method; and

select, based on the preference attributes, a reverberator method to be used for reproducing the reverberation in said apparatus.

Description:
AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR

REPRODUCING SPATIAL AUDIO

TECHNICAL FIELD

[0001] The present invention relates to an apparatus, a method and a computer program for reproducing spatial audio with reverberation.

BACKGROUND

[0002] Volumetric video and audio data represent a three-dimensional scene with spatial audio, which can be used as input for virtual reality (VR), augmented reality (AR) and mixed reality (MR) applications. The user of the application can move around in the blend of physical and digital content, and digital content presentation is modified according to user’s position and orientation. Most of the current applications operate in three degrees-of-freedom (3-DoF), which means that head rotation in three axes yaw/pitch/roll can be taken into account. However, the development of VR/AR/MR applications is eventually leading to 6-DoF volumetric virtual reality, where the user is able to freely move in a Euclidean space (x, y, z) and rotate his/her head (yaw, pitch, roll).

[0003] For reproducing the spatial audio of the 3D scene in a natural way, reverberation characteristic to the 3D scene should be applied. Reverberation refers to the persistence of sound in a space after the actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying a spatial impression of the 3D scene, reproducing reverberation perceptually accurately is important. In VR applications, the reverberation relates either to the real acoustic characteristics of the captured VR content or to the virtual acoustic characteristics of the virtual world. In AR or MR applications, the reverberation may instead or in addition relate to the physical consumption environment where the rendering device operates.

[0004] Currently, for reproducing reverberation in a client device, generic perceptual or physical parameters for representing reverberation, or parameters which are tailored for a specific reverberation method, are determined in a server. The parameters are sent to the client device, which uses the parameters with a predetermined reverberation method to add the necessary reverberation during rendering.

[0005] However, different real-world platforms, such as mobile devices, PCs, TV sets, home media devices, etc., can have a different set of reverberator methods available. Moreover, the rendering devices may have different computational and network capabilities, and obtaining perceptually accurate reverberation may require some of the reverberation parameters and/or methods to be preferred over the others. Consequently, reverberation parameters determined in a straightforward manner by the server are non-optimal for use across heterogeneous rendering devices.

SUMMARY

[0006] Now, an improved method and technical equipment implementing the method have been invented, by which the above problems are alleviated. Various aspects of the invention include methods, apparatuses and computer readable media comprising a computer program or a signal stored therein, which are characterized by what is stated in the independent claims. Various details of the invention are disclosed in the dependent claims and in the corresponding images and description.

[0007] According to a first aspect, there is provided a method comprising: determining, in a first device, parameters for at least two reverberator methods for reproducing a reverberated signal; obtaining information about capabilities of the reverberator methods to reproduce the reverberated signal; producing at least one multidimensional preference information for the reverberator methods; and transmitting the information on the reverberators, the at least one multidimensional preference and related parameters to a second device.

[0008] According to an embodiment, the parameters of said multidimensional preference information comprise at least one of the following:

- Content Reproduction Quality CRQ;

- Bitrate B;

- Computational Complexity CC;

- Latency L;

- Overall Quality OQ.

[0009] According to an embodiment, the preference information comprises weights for each parameter of the preference information.

[0010] According to an embodiment, the method further comprises determining the weights for the parameters based on a profile associated with rendering.

[0011] According to a second aspect, there is provided a method comprising receiving, in an apparatus, information about the reverberators, at least one multidimensional preference information about a plurality of reverberators and related parameters; and selecting, based on the preference information, a reverberator method to be used for reproducing the reverberation in said apparatus.

[0012] Apparatuses according to some embodiments comprise at least one processor and at least one memory, said at least one memory having code stored thereon, which when executed by said at least one processor, causes the apparatus to perform the above methods.

[0013] Apparatuses according to some embodiments comprise means for carrying out the above methods.

[0014] Computer readable storage media according to some embodiments comprise code for use by an apparatus, which when executed by a processor, causes the apparatus to perform the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] For a more complete understanding of example embodiments of the present invention, reference is now made to the following descriptions taken in connection with the accompanying drawings in which

[0016] Fig. 1 shows a system capable of capturing and encoding volumetric video and audio data for representing a 3D scene with spatial audio;

[0017] Fig. 2 shows an example of devices suitable for implementing the embodiments;

[0018] Fig. 3 shows a model of room impulse response;

[0019] Fig. 4 shows a flow chart for determining preference information for reverberators according to an embodiment;

[0020] Fig. 5 shows a simplified block chart of the system according to an embodiment;

[0021] Fig. 6 shows an example of synthesizing a reverberated signal according to an embodiment;

[0022] Fig. 7 shows a flow chart for determining preference information for reverberators according to an embodiment;

[0023] Fig. 8 shows an example of an auditory filterbank model for the calculation of perceptual features from reverberation;

[0024] Fig. 9 shows an example of a reverberator which can be used in various embodiments;

[0025] Fig. 10 shows an example of another reverberator which can be used in various embodiments;

[0026] Fig. 11 shows a flow chart for selecting a reverberator based on the preference information according to an embodiment;

[0027] Fig. 12 shows a flow chart for updating parameters of a reverberator based on the preference information according to an embodiment; and

[0028] Fig. 13 shows a simplified block chart of the system according to an embodiment.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

[0029] In the following, several embodiments will be described in the context of volumetric or 6DOF audio coding. It is to be noted, however, that while some of the embodiments are described relating to certain audio coding technologies, the invention is not limited to any specific volumetric audio technology or standard. In fact, the different embodiments have applications in any environment where reverberation information of a spatial audio scene is required to be conveyed. Thus, applications including but not limited to general computer gaming, virtual reality, or other applications of digital virtual acoustics can benefit from the use of the embodiments.

[0030] Fig. 1 shows a system for capturing, encoding, decoding, reconstructing and viewing a three-dimensional scene, that is, for 3D video and 3D audio digital creation and playback. The system is capable of capturing and encoding volumetric video and audio data for representing a 3D scene with spatial audio, which can be used as input for virtual reality (VR), augmented reality (AR) and mixed reality (MR) applications. The task of the system is that of capturing sufficient visual and auditory information from a specific scene to be able to create a scene model such that a convincing reproduction of the experience, or presence, of being in that location can be achieved by one or more viewers physically located in different locations and optionally at a time later in the future. Such reproduction requires more information than can be captured by a single camera or microphone, in order that a viewer can determine the distance and location of objects within the scene using their eyes and their ears. To create a pair of images with disparity, two camera sources are used. In a similar manner, for the human auditory system to be able to sense the direction of sound, at least two microphones are used (the commonly known stereo sound is created by recording two audio channels). The human auditory system can detect the cues, e.g. in timing and level difference of the audio signals, to detect the direction of sound.

[0031] The system of Fig. 1 may consist of three main parts: image/audio sources, a server and a rendering device. A video/audio source SRC1 may comprise multiple cameras CAM1, CAM2, ..., CAMN with overlapping fields of view so that regions of the view around the video capture device are captured from at least two cameras. The video/audio source SRC1 may comprise multiple microphones uP1, uP2, ..., uPN to capture the timing and phase differences of audio originating from different directions. The video/audio source SRC1 may comprise a high-resolution orientation sensor so that the orientation (direction of view) of the plurality of cameras CAM1, CAM2, ..., CAMN can be detected and recorded. The cameras or the computers may also comprise or be functionally connected to means for forming distance information corresponding to the captured images, for example so that the pixels have corresponding depth data. Such depth data may be formed by scanning the depth or it may be computed from the different images captured by the cameras. The video source SRC1 comprises or is functionally connected to, or each of the plurality of cameras CAM1, CAM2, ..., CAMN comprises or is functionally connected to, a computer processor and memory, the memory comprising computer program code for controlling the source and/or the plurality of cameras. The image stream captured by the video source, i.e. the plurality of the cameras, may be stored on a memory device for use in another device, e.g. a viewer, and/or transmitted to a server using a communication interface. It needs to be understood that although a video source comprising three cameras is described here as part of the system, a different number of camera devices may be used instead as part of the system.

[0032] It also needs to be understood that although microphones uP1 to uPN have been depicted along with cameras CAM1 to CAMN in Fig. 1, this does not need to be the case.
For example, a possible scenario is that closeup microphones are used to capture audio sources at close proximity to obtain a dry signal of each source such that minimal reverberation and ambient sounds are included in the signal created by the closeup microphone source. The microphones co-located with the cameras can then be used for obtaining a wet or reverberant capture of the entire audio scene, where the effect of the environment, such as reverberation, is captured as well. It is also possible to capture the reverberant or wet sound of single objects with such microphones if each source is active at a different time. Alternatively or in addition, individual room microphones can be positioned to capture the wet or reverberant signal. Furthermore, each camera CAM1 through CAMN can comprise several microphones, such as two or eight or any suitable number. There may also be additional microphone arrays which enable capturing spatial sound as first-order ambisonics (FOA) or higher-order ambisonics (HOA). As an example, a SoundField microphone can be used.

[0033] One or more two-dimensional video bitstreams and one or more audio bitstreams may be computed at the server SERVER or a device RENDERER used for rendering, or another device at the receiving end. The devices SRC1 and SRC2 may comprise or be functionally connected to one or more computer processors (PROC2 shown) and memory (MEM2 shown), the memory comprising computer program (PROGR2 shown) code for controlling the source device SRC1/SRC2. The image/audio stream captured by the device may be stored on a memory device for use in another device, e.g. a viewer, or transmitted to a server or the viewer using a communication interface COMM2. There may be a storage, processing and data stream serving network in addition to the capture device SRC1. For example, there may be a server SERVER or a plurality of servers storing the output from the capture device SRC1 or device SRC2 and/or forming a visual and auditory scene model from the data from devices SRC1, SRC2. The device SERVER comprises or is functionally connected to a computer processor PROC3 and memory MEM3, the memory comprising computer program PROGR3 code for controlling the server. The device SERVER may be connected by a wired or wireless network connection, or both, to sources SRC1 and/or SRC2, as well as to the viewer devices VIEWER1 and VIEWER2, over the communication interface COMM3.

[0035] For viewing and listening to the captured or created video and audio content, there may be one or more reproduction devices REPROC1 and REPROC2. These devices may have a rendering module and a display and audio reproduction module, or these functionalities may be combined in a single device. The devices may comprise or be functionally connected to a computer processor PROC4 and memory MEM4, the memory comprising computer program PROG4 code for controlling the reproduction devices. The reproduction devices may consist of a video data stream receiver for receiving a video data stream and for decoding the video data stream, and an audio data stream receiver for receiving an audio data stream and for decoding the audio data stream. The video/audio data streams may be received from the server SERVER or from some other entity, such as a proxy server, an edge server of a content delivery network, or a file available locally in the viewer device. The data streams may be received over a network connection through communications interface COMM4, or from a memory device MEM6 like a memory card CARD2. The reproduction devices may have a graphics processing unit for processing of the data to a suitable format for viewing. The reproduction device REPROC1 may comprise a high-resolution stereo-image head-mounted display for viewing the rendered stereo video sequence. The head-mounted display may have an orientation sensor DET1 and stereo audio headphones. The reproduction device REPROC2 may comprise a display (either two-dimensional or a display enabled with 3D technology for displaying stereo video), and the rendering device may have an orientation detector DET2 connected to it. Alternatively, the reproduction device REPROC2 may comprise a 2D display, since the volumetric video rendering can be done in 2D by rendering the viewpoint from a single eye instead of a stereo eye pair. The reproduction device REPROC2 may comprise audio reproduction means, such as headphones or loudspeakers.

[0035] It needs to be understood that Fig. 1 depicts one SRC1 device and one SRC2 device, but generally the system may comprise more than one SRC1 device and/or SRC2 device.

[0036] The present embodiments relate to providing spatial audio in a 3D scene, such as in the system depicted in Figure 1. In other words, the embodiments relate to volumetric or six-degrees-of-freedom (6-DoF) audio, and more generally to augmented reality (AR), virtual reality (VR) or mixed reality (MR). AR/VR/MR is volumetric by nature, which means that the user is able to move around in the blend of physical and digital content, and the digital content presentation is modified according to user position and orientation.

[0037] It is expected that AR/VR/MR is likely to evolve in stages. Currently, most applications are implemented as 3-DoF, which means that head rotation in three axes yaw/pitch/roll can be taken into account. This facilitates the audio-visual scene remaining static in a single location as the user rotates his head.

[0038] The next stage could be referred to as 3-DoF+ (or restricted/limited 6-DoF), which will facilitate limited movement (translation, represented in Euclidean space as x, y, z). For example, the movement might be limited to a range of some tens of centimeters around a location.

[0039] The ultimate target is 6-DoF volumetric virtual reality, where the user is able to freely move in a Euclidean space (x, y, z) and rotate his head (yaw, pitch, roll).

[0040] It is noted that the term "user movement" as used herein refers to any user movement, i.e. changes in (a) head orientation (yaw/pitch/roll) and (b) user position performed either by moving in the Euclidean space or by limited head movements. The user can move by physically moving in the consumption space, while either sensors mounted in the environment track his location in an outside-in fashion, or sensors co-located with the head-mounted display (HMD) device track his location. Sensors co-located in an HMD or a mobile device mounted in an HMD can generally be either inertial sensors, such as a gyroscope, or image/vision-based motion sensing devices.

[0041] Figure 2 depicts example devices for implementing various embodiments. It is noted that the capturing features and rendering features may be implemented either on the same or different devices. The capture side may be implemented on a computing device with at least a processor and a memory and a connection to at least two microphones. The at least two microphones are used to capture audio signals, which enable room impulse response (RIR) measurements to be performed. On the basis of the RIR measurements, reverberation information of the surrounding space may be determined. The capture device transmits at least part of the captured audio signals and the determined reverberation information to the rendering device, which renders the audio signal with reverberation using the determined reverberation method and parameters. The rendering device can be, for example, a mobile device comprising a memory and a processor and a connection to headphones or loudspeakers.
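Although the disclosure does not mandate any particular analysis, a common way to derive reverberation information from a measured RIR, as mentioned in paragraph [0041], is Schroeder backward integration of the impulse response energy followed by a line fit to the energy decay curve. The sketch below (Python with NumPy; the function names and the synthetic toy RIR are illustrative, not part of the claimed method) estimates the reverberation time RT60:

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = rir.astype(float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]        # integrate energy from the tail
    edc /= edc[0]                              # normalise to 0 dB at t = 0
    return 10.0 * np.log10(np.maximum(edc, 1e-12))

def estimate_rt60(rir, fs):
    """Estimate RT60 by fitting the -5..-25 dB decay and extrapolating to -60 dB."""
    edc = schroeder_edc_db(rir)
    t = np.arange(len(edc)) / fs
    mask = (edc <= -5.0) & (edc >= -25.0)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)   # decay rate in dB per second
    return -60.0 / slope

# Toy RIR: exponentially decaying noise shaped to reach -60 dB at 0.5 s
fs = 16000
t = np.arange(int(0.8 * fs)) / fs
rng = np.random.default_rng(0)
rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / 0.5)
print(round(estimate_rt60(rir, fs), 2))
```

In practice the RIR would typically be band-filtered first so that per-band reverberation parameters can be derived for the reverberator methods.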

[0042] Reverberation refers to the persistence of sound in a space after the actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying spatial impression of an environment, reproducing reverberation perceptually accurately is important.

[0043] Figure 3 depicts a simplified model of a room impulse response (RIR) which characterizes the reflections and reverberation in a space. After the direct sound, the listener hears directional early reflections as the sound bounces from the walls, the floor and the ceiling. After some point, individual reflections can no longer be perceived but the listener hears diffuse late reverberation. Some of the embodiments further below focus on modeling the diffuse late reverberation.

[0044] The devices shown in Figure 2 may operate according to the ISO/IEC JTC1/SC29/WG11 or MPEG (Moving Picture Experts Group) future standard called MPEG-I, which will facilitate rendering of audio for 3DoF, 3DoF+ and 6DoF scenarios. The technology will be based on ISO/IEC 23008-3:201x, MPEG-H 3D Audio Second Edition. MPEG-H 3D Audio is used for the core waveform carriage (encoding, decoding) in the form of objects, channels, and Higher-Order Ambisonics (HOA). The goal of MPEG-I is to develop and standardize technologies comprising metadata over the core MPEG-H 3D Audio and new rendering technologies to enable 3DoF, 3DoF+ and 6DoF audio transport and rendering.

[0045] MPEG-I Audio phase 2 (6DoF) standardization aims to define a bitstream comprising parametric metadata to enable 6DoF rendering over an MPEG-H 3D Audio bitstream, as well as new rendering technologies. One requirement is that the MPEG-I renderer should be able to interface with external reverberators on the rendering device. This implies that there needs to be a way in MPEG-I to carry parameters and information enabling selection of a suitable reverberator and its parameters, since there are numerous different reverberation methods and algorithms possible.

[0046] Therefore, there is a need to develop methods which determine a suitable reverberation method and its parameters among a set of candidate reverberation methods which can be available at a client.
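As an illustration of the kind of side information discussed above, the reverberator identifications, their parameters and the preference attributes could be bundled as follows. This is a purely hypothetical Python sketch; the field names, method names and numeric values are invented for illustration and do not correspond to any actual MPEG-I bitstream syntax:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ReverberatorEntry:
    """One reverberator method as signalled from the server to the renderer."""
    method_id: str      # e.g. "convolution" or "FDN" (hypothetical identifiers)
    params: dict        # filter coefficients, delay line lengths, RIR reference, ...
    crq: float          # Content Reproduction Quality preference attribute
    bitrate: float      # Bitrate preference attribute
    cc: float           # Computational Complexity preference attribute
    latency: float      # Latency preference attribute

@dataclass
class ReverberatorMetadata:
    """The full payload: per-method entries plus the server's attribute weights."""
    weights: dict
    entries: list = field(default_factory=list)

meta = ReverberatorMetadata(
    weights={"crq": 0.4, "bitrate": 0.1, "cc": 0.3, "latency": 0.2},
    entries=[
        ReverberatorEntry("convolution", {"rir_ref": "room_a"}, 0.9, 0.3, 0.2, 0.6),
        ReverberatorEntry("FDN", {"delay_lines": [1009, 1123, 1259, 1361]},
                          0.7, 0.9, 0.8, 0.9),
    ],
)
print(asdict(meta)["entries"][0]["method_id"])  # convolution
```

Serialising such a structure (here via `asdict`) stands in for whatever binary metadata syntax the eventual standard defines.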

[0047] In the following, an enhanced method for producing preference information relating to different reverberation methods will be described in more detail, in accordance with an embodiment.

[0048] Considering a first aspect, i.e. the operation on the server side, there is now provided a method for determining the suitability of different reverberator methods for reproducing reverberation, producing preference information of different reverberator methods, and transmitting the preference information to a rendering device.

[0049] The method, which is disclosed in Figure 4, comprises determining (400), in a first device, one or more parameters for at least two reverberator methods for reproducing a reverberated signal; determining (402) for each reverberator method, a preference attribute describing a capability of the reverberator method to reproduce the reverberated signal; determining (404) for each reverberator method, at least one further preference attribute describing a usage suitability of the reverberator method; and transmitting (406) for each reverberator method, at least an identification of the reverberator method, said one or more parameters for reproducing a reverberated signal and the preference attributes to a second device.

[0050] In particular, the determined one or more parameters for reproducing a reverberated signal are mapped to the parameters of different reverberation methods and the preference attributes are created for the usage of each method.

[0051] According to an embodiment, said one or more parameters for reproducing a reverberated signal comprise at least one of the following:

- room impulse response (RIR);

- filter parameters, such as coefficients, delay line lengths, filter orders;

- a reference or an identifier for directly or indirectly adjusting any of the above.

[0052] Examples for determining the parameters for reproducing a reverberated signal are described more in detail further below.

[0053] According to an embodiment, the preference attributes comprise at least one of the following:

- CRQ (Content Reproduction Quality)

- B (Bitrate)

- CC (Computational Complexity)

- L (Latency)

- OQ (Overall Quality)

[0054] Thus, compared to previously known solutions, there is provided selection between the different available reverberation methods depending on how well they can fulfil the goals of reproducing the captured reverberation and meeting bitrate, computational or latency constraints. Further, the preference attributes are provided to the rendering device to be used in selecting the reverberation method. The reverberation methods and their parameters, along with the preference attributes and the weights used for ranking the reverberation methods based on the different attributes, are signaled to the rendering device.

[0055] Figure 5 depicts an overview of the procedure according to the above method, where a reverberated audio signal is captured along with a dry audio signal. The reverberation parameters are analysed from the reverberated audio signal. Thereafter, the suitability of various reverberation methods in reproducing the reverberation is determined. When the best reverberator or a ranked list of reverberation methods has been determined, information describing the selected reverberator and its parameters is transmitted along with the dry signal to the client device. In addition, preference attributes are transmitted which include information on why a certain reverberator was selected; the preference attributes may include information on how well the reverberator reproduces the reverberation with the determined parameters, its computational complexity, and the latency it causes to the processed signal.

[0056] The client device then selects, from the reverberators available at the client device, the most suitable reverberator on the basis of the preference attributes and renders (synthesizes) the reverberated audio using the dry audio signal, said one or more parameters for reproducing a reverberated signal and the preference attributes.

[0057] Figure 6 illustrates an example of a method for synthesizing the reverberated signal in accordance with the method disclosed in "Creating interactive virtual acoustic environments" by Savioja, Huopaniemi, Lokki, Vaananen, J. Audio Eng. Soc., Vol. 47, No. 9, Sep. 1999. In general, the early reflections and the reverberator parameters can be obtained based on a room impulse response (RIR) measurement and analysis of the measured response. RIR estimation can operate on any input signals with sufficient frequency content.

[0058] The input signal is fed to a delay line, and the direct sound and the directional early reflections are read at suitable delays. The delays corresponding to the early reflections can be obtained by analysing the time delays of the early reflections from a measured or idealized room impulse response, such as the one shown in Figure 3. The direct sound is fed to a source directivity and/or distance/gain attenuation modelling filter T0(z). The attenuated and directionally-filtered direct sound is then passed to the reverberator. In a conceived implementation scenario of MPEG-I, the reverberator is implemented in the rendering device or in hosting software such as a Digital Audio Workstation (DAW), and the rest of the rendering is taken care of by the MPEG-I renderer. The metadata indicating which reverberator method should be used and its parameters are selected and communicated as disclosed in the embodiments. The output of the filter T0(z) is also fed to a set of head-related transfer function (HRTF) filters which spatially position the direct sound to the correct direction with respect to the listener's head. The processing for the early reflections is analogous to the direct sound; they can also be subjected to level adjustment and directionality processing and then HRTF filtering to maintain their spatial position.
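As a rough sketch of the tapped delay line described above, the following Python example reads the direct sound and two early reflections at their analysed delays and gains. The function name, delays and gains are illustrative assumptions; source directivity, distance attenuation and HRTF filtering are omitted.

```python
import numpy as np

def render_taps(x, taps):
    """Sum delayed, scaled copies of the dry signal x.

    taps: list of (delay_in_samples, gain) pairs, e.g. the direct
    sound at delay 0 plus early reflections analysed from a measured
    RIR such as the one in Figure 3. (Illustrative sketch only.)
    """
    max_delay = max(d for d, _ in taps)
    y = np.zeros(len(x) + max_delay)
    for delay, gain in taps:
        y[delay:delay + len(x)] += gain * x  # one tap off the delay line
    return y

# Dry impulse plus a direct path and two early reflections
x = np.array([1.0, 0.0, 0.0, 0.0])
y = render_taps(x, [(0, 1.0), (5, 0.5), (9, 0.25)])
print(y[0], y[5], y[9])  # → 1.0 0.5 0.25
```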

[0059] To create a multichannel reverberator, two sets of parameters, one for the left channel and one for the right channel, are used to create incoherent outputs. Similarly, for loudspeaker reproduction there are as many reverberators as there are output channels.

[0060] Finally, the HRTF-filtered direct sound, the HRTF-filtered early reflections and the non-HRTF-filtered reverberation are summed to produce the signals for the left and right ear for binaural reproduction.

[0061] Although not shown in Figure 6, the listener's head orientation, indicated for example as yaw, pitch, and roll, can be used to update the directions of the direct sound and early reflections, as well as the sound source directionality.

[0062] Although not shown in Figure 6, the listener's position can be used to update the directions and distances to the direct sound and early reflections.

[0063] Figure 7 shows some more detailed embodiments for producing the preference attributes on the server side. At least some of the steps of Figure 7 may be considered sub-steps of Figure 4.

[0064] The step of determining (700) one or more parameters for at least two reverberator methods for reproducing a reverberated signal may be considered to comprise one or more of the following sub-steps:

[0065] obtaining (700a) a reverberated signal;

[0066] extracting (700b), from the reverberated signal, parameters describing the reverberation;

[0067] obtaining (700c) a reverberation method with undetermined reverberation parameters from a list of available reverberation methods; and

[0068] adjusting (700d) the reverberation parameters of the obtained reverberation method so as to be capable of producing the reverberation.

[0069] The parameters describing the reverberation may comprise at least the room impulse response (RIR), for example, like the one depicted in Figure 3. The RIR can be estimated or measured from a captured audio signal. The RIR can alternatively be simulated with geometrical models of virtual acoustics or other suitable means.

[0070] A model for the room impulse response (RIR) measurement from captured audio signals is to assume that the system is linear and time invariant. In this case, the sound source signal i(t) is convolved with the system's impulse response h(t) (the RIR):

o(t) = h(t) * i(t)

where o(t) is the measured, wet, signal (captured by the array) and * the convolution operator. If represented with the complex transfer functions by applying the Fourier transform, we get

O(f) = H(f) · I(f)

where O(f) = FFT(o(t)), FFT denotes the Fourier transform, and f is the frequency. If we solve for the system transfer function we get

H(f) = O(f) / I(f)

[0071] The impulse response is obtained by taking the real part of the inverse Fourier transform (IFFT):

h(t) = Re(IFFT(H(f)))

[0072] As the input signal i(t), maximum length sequences or sinusoidal sweeps with logarithmically increasing frequencies can be used. For some other methods of RIR measurement, any input signals with sufficient frequency content can be used.
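The FFT-division recipe for RIR estimation can be sketched in a few lines of Python. The function name, the regularization constant and the synthetic test signal are illustrative assumptions, not part of the text.

```python
import numpy as np

def estimate_rir(wet, dry, n_fft=None):
    """Estimate a room impulse response by spectral division.

    Assumes a linear time-invariant system: o(t) = h(t) * i(t),
    so H(f) = O(f) / I(f) and h(t) = Re(IFFT(H(f))).
    """
    n = n_fft or len(wet)
    O = np.fft.rfft(wet, n)
    I = np.fft.rfft(dry, n)
    eps = 1e-12                # regularize near-zero dry-spectrum bins
    H = O / (I + eps)
    return np.fft.irfft(H, n)  # real part via the real inverse FFT

# Synthetic check: convolve a known sparse RIR with a noise excitation
rng = np.random.default_rng(0)
i_sig = rng.standard_normal(4096)           # broadband excitation
h_true = np.zeros(256)
h_true[[0, 40, 90]] = [1.0, 0.5, 0.25]      # direct sound + two reflections
o_sig = np.convolve(i_sig, h_true)          # wet signal (full convolution)
h_est = estimate_rir(o_sig, i_sig)
print(np.allclose(h_est[:256], h_true, atol=1e-6))  # → True
```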

[0073] From a list of reverberation methods available at the server, one reverberator method is selected at a time and the parameters of the reverberation method are adjusted so that the measured reverberation can be reproduced. Some non-limiting examples of different reverberators as well as different methods of adjusting their parameters are described further below.

[0074] Generally, the adjusting of parameters may be carried out by analysing suitable parameter values from the measured RIR, for example, estimating a value for the reverberation time (RT) by analysing the slope of decay of the reverberant tail of the RIR. Here, for example the beginning and the end of the diffuse late reverberation may be determined, and the parameters of the reverberation method adjusted so that the diffuse late reverberation can be reproduced. Alternatively, the parameter values can be varied starting from an initial value until the reverberator produces an output which is close to the measured RIR.

[0075] The number and type of the reverberation parameters used for adjusting the reverberation method may vary. The parameters may comprise, for example, digital filter parameters such as coefficients, delay line lengths, filter orders, reverberator method names or other identifiers, software implementation references or identifiers, even a software or pseudocode description of the implementation of the reverberator, or any other suitable representation.

[0076] The parameter values which produce the best match to the measured reverberation are stored.

[0077] The step of determining (702), for each reverberator method, a preference attribute describing a capability of the reverberator method to reproduce the reverberated signal may be considered to comprise one or more of the following sub-steps:

[0078] comparing (702a) reverberation times of the modelled reverberation and the measured reverberation on a plurality of frequency bands; and

[0079] determining (702b) a value for content reproduction quality of said reverberation method on the basis of differences in reverberation times of the modelled reverberation and the measured reverberation on said plurality of frequency bands; and

[0080] analysing (702c) further parameters indicating capabilities of the reverberator methods to reproduce the reverberated signal.

[0081] Accordingly, information on how well each reverberator reproduces the reverberated signal is obtained to indicate how well the reverberator is able to reproduce the measured reverberation. The reproduction quality may be evaluated on frequency bands, such as octave frequency bands. The goodness evaluation comprises measuring the reverberation time on the frequency bands of the artificial and the measured reverberation and comparing the measured reverberation times. The reverberation time measurement is performed at each frequency band, for example as an RT60 measurement by calculating the time it takes for the sound pressure level to decrease by 60 dB, measured from the moment the excitation signal for the RIR measurement is ended.
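One common way to realize such an RT60 measurement is Schroeder backward integration of the squared impulse response followed by a line fit on the decay curve. The fit range (-5 dB to -25 dB, a T20-style estimate) and the function below are illustrative assumptions, not mandated by the text.

```python
import numpy as np

def rt60_from_rir(rir, fs):
    """Estimate RT60 from an RIR.

    Backward-integrates the squared RIR to get a smooth energy decay
    curve, then extrapolates the -5..-25 dB slope to -60 dB.
    """
    edc = np.cumsum(rir[::-1] ** 2)[::-1]          # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0] + 1e-30)   # normalized decay in dB
    i5 = np.argmax(edc_db <= -5)                   # first index below -5 dB
    i25 = np.argmax(edc_db <= -25)                 # first index below -25 dB
    slope = (edc_db[i25] - edc_db[i5]) / ((i25 - i5) / fs)  # dB per second
    return -60.0 / slope

# Synthetic exponential decay with a known RT60 of 0.5 s
fs = 8000
t = np.arange(fs) / fs
rt_true = 0.5
rir = np.exp(-6.91 * t / rt_true)   # amplitude falls 60 dB in rt_true seconds
print(round(rt60_from_rir(rir, fs), 2))  # → 0.5
```

In practice the RIR would first be band-filtered (e.g. into octave bands) and this estimate computed per band.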

[0082] The content reproduction quality (CRQ) value may be calculated, for example, as mean(RelativeAbsoluteRTDifference), where RelativeAbsoluteRTDifference is a vector which contains the relative absolute differences in reverberation times at the plurality of frequency bands between the modelled and measured reverberation time, and mean denotes the average. Thus, at band i the relative absolute difference is

abs(ReverberationTimeMeasured(i) - ReverberationTimeModeled(i)) / ReverberationTimeMeasured(i).

[0083] Herein, the smaller the CRQ value of a reverberation method, the better said reverberation method is capable of reproducing the measured reverberation.
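The CRQ computation described above can be sketched as follows; the band values are illustrative.

```python
import numpy as np

def content_reproduction_quality(rt_measured, rt_modeled):
    """CRQ: mean relative absolute RT difference across frequency bands.

    Smaller is better; 0 means the reverberator matches the measured
    reverberation time on every band.
    """
    rt_measured = np.asarray(rt_measured, dtype=float)
    rt_modeled = np.asarray(rt_modeled, dtype=float)
    rel = np.abs(rt_measured - rt_modeled) / rt_measured
    return float(np.mean(rel))

# Per-band RT60s in seconds (illustrative values); one band is off by 50 %
measured = [0.5, 0.5, 1.0, 2.0]
modeled = [0.5, 0.75, 1.0, 2.0]
print(content_reproduction_quality(measured, modeled))  # → 0.125
```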

[0084] One alternative for obtaining the CRQ value is to compare the reverberation signal generated with the reverberator to the measured reverberation signal. This could be done, for example, by correlation or via a mean-squared error, if it is desired to model the reverberation as accurately as possible.

[0085] A further alternative is to analyse the goodness of a reverberator in a perceptual sense. For this, an auditory model such as the one depicted in Figure 8 can be applied. Such an auditory model has been presented in Karjalainen, Jarvelainen: "More about this reverberation science: Perceptually good late reverberation", in Proc. AES 111th Convention, 2001 Sep 21-24, New York, USA. The filterbank divides the input signal into perceptually motivated frequency bands, using, for example, the Bark, ERB (equivalent rectangular bandwidth), or mel-frequency scale. This could be a gammatone filterbank, for example. This is followed by an envelope detector for half-wave rectification and a low-pass filter which aims to model the synchrony loss in neural firings toward higher frequencies. The lowpass filter can be a second order lowpass filter with 1-2 kHz cutoff frequency. The loudness or modulation block performs temporal integration or modulation analysis. For loudness calculation it can contain a lowpass filter and for modulation analysis a bandpass filter.

[0086] The auditory model produces a loudness spectrum or modulation spectrum on a perceptually motivated frequency scale. The measured room impulse response and the impulse response of the reverberator algorithm can be fed through the auditory filterbank to produce a loudness spectrum and a modulation spectrum. These spectra can then be compared, for example, by calculating a log spectral difference to obtain a measure of similarity. The more similar the spectra of the reverberator impulse response are to the spectra of the measured impulse response, the better the goodness.
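A minimal sketch of the log spectral difference comparison follows; the RMS-of-dB-differences form is one reasonable choice, not prescribed by the text.

```python
import numpy as np

def log_spectral_distance(spec_a, spec_b):
    """Log spectral difference between two positive band spectra.

    RMS difference of the log-magnitudes in dB; smaller means the
    reverberator's spectrum is closer to the measured one.
    """
    a_db = 10 * np.log10(np.asarray(spec_a, dtype=float))
    b_db = 10 * np.log10(np.asarray(spec_b, dtype=float))
    return float(np.sqrt(np.mean((a_db - b_db) ** 2)))

measured = [1.0, 0.5, 0.25, 0.125]   # loudness spectrum per band (illustrative)
candidate = [1.0, 0.5, 0.25, 0.125]  # identical spectrum → distance 0
print(log_spectral_distance(measured, candidate))  # → 0.0
```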

[0087] Alternatively or in addition, various perceptual features can be calculated from the measured room impulse response and the room impulse response created with the artificial reverberator. Such features include, for example, reverberation time, direct-to-reverberant ratio, early-reflection-to-late-response ratio, echo density, or other suitable features. The feature values calculated from the two responses can be compared, and the closer they are, the better the goodness score given to the artificial reverberator. Feature values can be compared by first normalizing the features with the overall mean and variance of the respective feature values and then applying the Euclidean distance. Various other feature value comparison methods known in the art can be used.

[0088] The step of determining, for each reverberator method, at least one further preference attribute describing a usage suitability of the reverberator method, described in Figure 4 as step 404, may further comprise analysing (702c) further parameters indicating capabilities of the reverberator methods to reproduce the reverberated signal.

[0089] Further preference attributes describing a usage suitability of the reverberator method to reproduce the reverberated signal may comprise one or more of bitrate, computational complexity and latency.

[0090] Thus, the bitrate (B) required for transmitting a reverberator's parameters and attributes may be used as a factor in selecting the reverberator. The bitrate may be calculated, for example, as the number of (kilo)bits required for transmitting the parameter values at the required bit resolution for each reverberator. Thus, the more parameters a reverberator has and the more bits are required for encoding each parameter value, the larger the bitrate the reverberator requires. A normalized bitrate score may be obtained by normalizing the bitrate required for the parameters of a particular reverberator with the bitrate required for transmitting the parameters of an equivalent convolutional reverberator (calculating the ratio). Herein, smaller values of B are again better.

[0091] Another factor which can be used in selecting the reverberator is its Computational Complexity (CC). Each reverberator method can be associated with a computational complexity value, or the run-time of the reverberator can be measured when it is applied to create the reverberated signal, and this can be used as a factor in deciding the reverberator suitability. In general, computationally lighter reverberator methods are preferred over computationally more complex ones. In the main embodiment, the computational complexity is obtained from a table which tabulates the number of arithmetic operations needed for processing an input of a certain length with each reverberator. A normalized Computational Complexity value may be obtained by calculating the ratio of the number of arithmetic operations for this reverberator to the number of arithmetic operations required for an equivalent convolutional reverberator. Again, a smaller value of CC is better.

[0092] Alternatively or in addition to tabulated numbers of required arithmetic operations, the Computational Complexity can be analysed by calculating the time it takes for a reverberator to process a certain input. The computational complexity can be represented as a number relative to the computational complexity of an equivalent convolution reverberator, obtained by dividing the run time of the reverberator by the run time of the convolution reverberator.

[0093] The latency caused by a reverberator may also be taken into account when making the selection. Low latency is important in applications which need to operate in real time, for example in two-way communication. The Latency (L) score may be calculated as the ratio of the input/output latency of a candidate reverberator to the input/output latency of an equivalent convolutional reverb. Again, a smaller value of L is better.

[0094] As shown in Figure 7, the steps of adjusting the reverberation parameters of the obtained reverberation method so as to be capable of producing the measured reverberation (700d) and obtaining information about capabilities of the reverberator methods to reproduce the reverberated signal (702a - 702c) are repeated for all available reverberation methods for which the parameters have not yet been defined. For each reverberation method, the parameters (CRQ, B, CC, L) indicating capabilities of said reverberator method to reproduce the reverberated signal are stored. If there are further reverberated signals available, then the above process may be repeated from the beginning for each further reverberated signal.

[0095] According to an embodiment, the method further comprises determining weights for the preference attributes; calculating a preference attribute for Overall Quality (OQ) on the basis of weighted preference attributes; and transmitting the preference attribute for Overall Quality (OQ) and the weights to the second device.

[0096]

[0097] The steps of determining and calculating may be considered to comprise one or more of the following sub-steps:

[0098] determining (704a) weights for the attributes (CRQ, B, CC, L) indicating capabilities of said reverberator method to reproduce the reverberated signal;

[0099] calculating (704b) overall quality (OQ) for each of the reverberator methods on the basis of weighted attributes; and

[0100] sorting (704c) the list of available reverberator methods on the basis of the overall quality (OQ) values of each reverberator method.

[0101] Accordingly, weights are obtained for CRQ, B, CC, and L, and the weighted attribute values are used to calculate the OQ. The weights can be denoted WCRQ, WB, WCC, and WL, respectively. The Overall Quality (OQ) for a reverberator may be obtained as a weighted sum of CRQ, B, CC, and L. The smaller the value of OQ, the better the quality across the different metrics. Thus,

OQ = WCRQ · CRQ + WB · B + WCC · CC + WL · L,

where WCRQ is the weight for CRQ, WB is the weight for B, WCC is the weight for CC, WL is the weight for L, and · denotes the product. Weights may be determined, for example, as values between 0 and 1 and normalized such that they sum to unity. In a default scenario, equal weights of 0.25 may be used.
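The weighted sum and the subsequent ranking of reverberators can be sketched as follows; the attribute values of the two candidate reverberators are illustrative.

```python
def overall_quality(crq, b, cc, l, weights=(0.25, 0.25, 0.25, 0.25)):
    """OQ = WCRQ*CRQ + WB*B + WCC*CC + WL*L; smaller is better.

    Default equal weights as in the text; weights are assumed to be
    normalized to sum to one.
    """
    w_crq, w_b, w_cc, w_l = weights
    return w_crq * crq + w_b * b + w_cc * cc + w_l * l

# Illustrative attribute values (CRQ, B, CC, L), normalized so that
# smaller is better; the convolution reverberator is the reference,
# so its B, CC and L ratios are 1.
velvet = overall_quality(0.10, 0.20, 0.15, 0.30)
convolution = overall_quality(0.00, 1.00, 1.00, 1.00)
ranked = sorted([("velvet", velvet), ("convolution", convolution)],
                key=lambda item: item[1])  # ascending OQ, best first
print(ranked[0][0])  # → velvet
```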

[0102] According to an embodiment, the weights for the summation (WCRQ, WB, WCC, WL) may be signalled from the renderer, depending on how important modeling goodness, low bitrate, low computational complexity, or low latency is. In some embodiments, all attributes are signalled to the renderer, which makes the final decision based on application requirements and system characteristics. For example, if an application has requested good reverberation modeling, then the weight for the modeling goodness score CRQ is increased. Correspondingly, if the computational load of the rendering system is high and/or the rendering is done on a mobile device with limited computational capability, reverberators with low computational complexity CC and reasonably good modeling accuracy CRQ can be preferred.

[0103] According to an embodiment, the weights for the attributes may be determined based on a profile associated with the rendering. For example, there can be a

- high quality profile which associates a high value for WCRQ and lower values for the other weights;

- real-time profile which associates high values for WL and WCC and lower values for the other weights;

- bandwidth-constrained profile which associates a high value for WB and lower values for the other weights.

[0104] When OQ values for different reverberators are obtained, the list of available reverberators is sorted in an ascending order based on OQ. Thus, the preferred reverberator having the smallest value of OQ is placed at the top of the list.

[0105] Upon transmitting (706) the identifications of the reverberators, said one or more parameters for reproducing a reverberated signal and the preference attributes to a second (e.g. a rendering) device, any suitable data structure may be used. For example, the following data structure may be formed for transmitting the necessary information to the rendering device:

<MultidimensionalReverberatorPreferenceInformation>
  <Timestamp>value</Timestamp>
  <WeightsForPreferenceInformation>
    <WCRQ>value</WCRQ>
    <WB>value</WB>
    <WCC>value</WCC>
    <WL>value</WL>
  </WeightsForPreferenceInformation>
  <Reverberator>
    <PreferenceInformation>
      <OQ>value</OQ>
      <CRQ>value</CRQ>
      <B>value</B>
      <CC>value</CC>
      <L>value</L>
    </PreferenceInformation>
    <Identifier>IdentifierOfReverberator</Identifier>
    <Parameter>
      <ParameterIdentifier>ParameterIdentifier</ParameterIdentifier>
      <ParameterValue>ParameterValue</ParameterValue>
    </Parameter>
    <Parameter>
      <ParameterIdentifier>ParameterIdentifier</ParameterIdentifier>
      <ParameterValue>ParameterValue</ParameterValue>
    </Parameter>
  </Reverberator>
</MultidimensionalReverberatorPreferenceInformation>

[0106] It is noted that the reverberation attributes can be associated with a particular space. For example, different rooms or sub-volumes of a 6DOF volume may have different sets of suitability values. For 3DOF content, different sectors (angular parts) of the volume may be associated with different suitability values. Furthermore, in order to convey a desired artistic or creative intent, the reverberator attributes may be associated with a certain audio object (e.g., a sound of an artist reproduced in a certain way irrespective of where s(he) is) or with a subset of the content.

[0107] According to an embodiment, the data structure for transmitting the information to the rendering device may comprise one or more information fields for transmitting the above information. The information fields may enable signalling which allows the client to make an optimal choice, such as the content-specific parameters superseding the space-specific parameters or vice versa.

[0108] The signalling may be performed as HTTP/XML/JSON or SIP/SDP or RTSP/SDP or using other suitable means or a bitstream. The values can be quantized for efficient representation and the final data stream can furthermore be compressed for efficient storage and transmission.

[0109] As mentioned above, various reverberators may be available at the rendering device. In the following, a non-exhaustive example list of some known reverberators is given. Future reverberators can be added to the system by defining an identifier for them and the set of parameters to be transmitted.

[0110] Convolution reverberator

[0111] The convolution reverberator is the reference reverberator which uses the measured RIR as such for convolutional reverb. The advantage is that no analysis and modeling is needed, but the disadvantages are a large delay and high computational complexity. The parameters to be transmitted contain the measured RIR samples.

[0112] Velvet noise reverberator

[0113] The velvet noise reverberator is depicted in Figure 9. The implementation of the velvet noise reverberator has been described in more detail in "Late reverberation synthesis using filtered velvet noise" by Valimaki, Holm-Rasmussen, Alary, Lehtonen, Appl. Sci. 2017, 7, 483; doi:10.3390/app7050483. The velvet noise reverberator enables computationally efficient reverberation synthesis using sparse velvet noise.

[0114] For the velvet noise reverberator, the step of adjusting the reverberator parameters involves dividing the measured RIR into short non-overlapping temporal segments, and then approximating each segment with filtered noise. Instead of Gaussian noise, the method uses sparse velvet noise enabling fast velvet noise convolution (VNC) calculation. Non-uniform temporal segmentation of the RIR can be used to estimate the velvet noise sequences. The filters Hm are spectral coloration filters and they are estimated by fitting a linear prediction model to each segment to model their overall lowpass characteristics. The following describes the procedure in more detail:

[0115] A velvet noise sequence is created as follows:

- Create an impulse train with period Td, that is, a repeating sequence of a one followed by Td zeros

- Each impulse location is randomly offset within the period Td

- The sign of each impulse is randomly changed

[0116] The following Matlab code may be used to implement the procedure:

function [velvet] = velvetnoise(Nd, fs, N)
%VELVETNOISE Generate a sequence of velvet noise
% Nd: desired pulse density per second
% fs: sampling rate in Hz
% N: length of the velvet noise sequence in samples
Td = fs / Nd;
m = 1:N;
r1 = rand(1,N);
k = round(m * Td + r1 .* (Td - 1));
k = k(k <= N);
r2 = rand(1, length(k));
velvet = zeros(1,N);
velvet(k) = 2 * round(r2) - 1;
end

[0117] In the above code, rand generates a pseudorandom number from the standard uniform distribution on the open interval (0, 1), round rounds to the nearest integer, and length returns the length of a vector.

[0118] For example, if the sampling rate is 44100 Hz, the pulse density can be selected as 2205 pulses/second, which leads to the value of Td = 20 samples as the average pulse distance.

[0119] Convolution with the velvet noise sequence can be represented as

y(n) = x(n) * s(n) = Σ x(n − k(m+)) − Σ x(n − k(m−))

with the sums taken over the positive and negative impulses, respectively, where x(n) is the input signal, s(n) is the velvet noise sequence, * denotes the convolution, and k(m+) and k(m−) represent the indices of the positive and negative impulses in the velvet noise sequence.

[0120] The reverberation part of the input room impulse response is next divided into non-overlapping segments Wm, m = 1, ..., M, which will be modelled with sequences of velvet noise. The segments start when the diffuse reverberant part starts (individual reflections are no longer detectable), for example as in Figure 3. The segments end when the reverberant tail has decayed in amplitude to the noise floor and the level no longer significantly decreases. For example, the segments can end when the measured reverberation time T30 or T60 is reached. Non-uniform segmentation is used, so that the first segments are shorter and the last segments longer. For example, for a reverberant tail of length two seconds, 20 segments can be used, such that the first segment is 28 ms long and the last segment 17 ms long. If the reverberant tail is shorter, then fewer segments can be used. In some cases, the segments can be designed such that a new segment starts when the spectral bandwidth of the RIR to be modelled changes over a predetermined threshold. The spectral bandwidth can be determined, for example, as the frequency at which the spectrum level has decayed 20 dB from its maximum value.

[0121] For each non-overlapping segment, the system determines the parameters of the velvet noise, a spectral shaping all-pole filter, and an attenuation factor. The velvet noise parameters for a segment Wm include the noise density Nd,m. The noise density can be adjusted individually for each segment, so that in the beginning a larger noise density is used, such as Nd,1 = 100. For the later segments, the noise density can be decreased. A low noise density of Nd,M = 40 can be sufficient for the last segment. In between these, the noise density can gradually decrease.

[0122] The spectral shaping filter Hm can be modelled as an all-pole filter. For example, a tenth order linear prediction (LP) model can be used. The parameters of the LP model can be obtained, for example, using the Yule-Walker method. The parameters include the coefficients am,1, ..., am,L, where L = 10. Furthermore, a gain coefficient gm for the all-pole filter can be obtained as the square root of the prediction error estimate. The coefficients and the gain can further be quantized and their indices can be obtained for storing into the bitstream.

[0123] The attenuation factor Gm adjusts the output of this path to match the RIR segment being modeled. The attenuation factor can be adjusted based on the ratio of the average power of the RIR segment to the average power of a one-second velvet noise sequence with the same pulse density as used for this segment, filtered with the all-pole filter. The ratio can be, for example, sqrt(RIR_segment_average_power / filtered_velvet_noise_average_power). Optionally, the whitening filter corresponding to the all-pole filter can be used to whiten the RIR segment before the average power calculation.

[0124] To make the reverberation more diffuse and to smooth the transitions between segments, Schroeder allpass filters can further be used. Such a filter has the transfer function

A(z) = (g + z^(-N)) / (1 + g·z^(-N))

[0125] A total of K such filters can be added in cascade to the reverberator. For example, four Schroeder allpass filters with delay line lengths (N) equal to 225, 341, 441, and 556 samples can be used. Other values can also be used. The coefficient g = 0.7 or another suitable value can be used.
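A single Schroeder allpass section with transfer function A(z) = (g + z^(-N)) / (1 + g·z^(-N)), and the four-section cascade with the delay lengths given above, can be sketched as follows. Since an allpass filter preserves signal energy, the example checks that the cascaded impulse response still has (almost) unit energy; the sample-by-sample implementation form is an illustrative choice.

```python
import numpy as np

def schroeder_allpass(x, g, N):
    """One Schroeder allpass section A(z) = (g + z^-N) / (1 + g z^-N),
    implemented with an N-sample delay line holding the internal state."""
    y = np.zeros(len(x))
    buf = np.zeros(N)
    idx = 0
    for n in range(len(x)):
        v = x[n] - g * buf[idx]      # input minus feedback
        y[n] = g * v + buf[idx]      # feedforward plus delayed state
        buf[idx] = v
        idx = (idx + 1) % N
    return y

# Cascade of four sections with the delay lengths from the text
x = np.zeros(65536)
x[0] = 1.0                           # unit impulse
y = x
for N in (225, 341, 441, 556):
    y = schroeder_allpass(y, g=0.7, N=N)
# An allpass cascade has |H(f)| = 1, so the impulse response energy is 1
print(np.isclose(np.sum(y ** 2), 1.0))  # → True
```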

[0126] In summary, the parameters included in the bitstream for the velvet noise reverberator include at least some of the following, for each segment Wm:

- Segment beginning index in samples. The next segment begins one sample after the previous one ends. For the last segment, the end index needs to be added as well.

- Pulse densities Nd,m

- All-pole filter coefficients am,1, ..., am,L or their indices when the coefficients are quantized, and the gain coefficient gm. The coefficients can also be represented in another form such as reflection coefficients or line spectral frequencies, or, when quantized, as the corresponding indices.

- Attenuation factor Gm

[0127] In addition, the Schroeder allpass filter parameters for the K allpass filters can be transmitted. These include the coefficients g and the delay line lengths for the K filters.

[0128] All these parameters can have default values. Only the values differing from the default ones can be signalled.

[0129] In addition, the levels, delays, and directions of arrival for the early reflections can be transmitted. These are transmitted independently of the selected reverberator method.

[0130] Feedback delay network

[0131] Another possible reverberator is the feedback delay network (FDN), depicted in Figure 10 for order 3. It is characterized by the delays M1 through M3, coefficients b1 through b3, c1 through c3, g1 through g3, and q11 through q33. The parameters of the FDN can be adjusted such that the reverberation times in octave bands of the artificial and measured reverberation match.
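An order-3 FDN loop of this form can be sketched as follows; the delay lengths, gains, and the feedback matrix Q used here are illustrative choices, not values from the application:

```python
import numpy as np

def fdn_order3(x, delays=(1687, 1601, 2053),
               b=(1.0, 1.0, 1.0), c=(0.3, 0.3, 0.3),
               g=(0.93, 0.93, 0.93), Q=None):
    """Sketch of an order-3 feedback delay network. Each delay-line output
    is attenuated by g_i, mixed by the 3x3 feedback matrix Q (q11..q33)
    and fed back; b_i and c_i are the input and output gains."""
    if Q is None:
        # An orthogonal feedback matrix keeps the loop stable (|g|<1) and diffuse.
        Q = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
    lines = [np.zeros(d) for d in delays]
    ptr = [0, 0, 0]
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Read the sample written delays[i] steps ago, apply per-line gain.
        outs = np.array([g[i] * lines[i][ptr[i]] for i in range(3)])
        y[n] = float(np.dot(c, outs))
        fb = Q @ outs
        for i in range(3):
            lines[i][ptr[i]] = b[i] * x[n] + fb[i]
            ptr[i] = (ptr[i] + 1) % delays[i]
    return y
```

With loop gains below one, the impulse response decays exponentially; the per-band reverberation time would in practice be tuned by making the g_i frequency dependent.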

[0132] Various other reverberator methods are available and can be utilized, for example, the reverberator methods disclosed in "Fifty years of artificial reverberation," by V. Valimaki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 5, pp. 1421-1448, July 2012, and "More than 50 years of artificial reverberation," by V. Valimaki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, 2016. In these publications, reverberation methods are divided into four main categories:

- Delay networks simulate the reverberation using delay lines, filters, and feedback connections;

- Convolution algorithms apply convolution to a dry input signal using a captured or simulated room impulse response (RIR);

- Computational acoustics simulates the propagation of sound and its reflections in a given geometry using, for example, ray tracing methods;

- Virtual analog models simulate tapes, plates, or springs; devices which were formerly used for producing reverberation effects.

[0133] Binaural and multichannel reverberators

[0134] To create a multichannel reverberator, two sets of parameters, one for the left channel and one for the right channel, are used to create incoherent outputs. Similarly, for loudspeaker reproduction there are as many reverberators as there are output channels, or the reverberator produces as many outputs as there are output channels. The RIR to be modelled is obtained either from a single, monophonic recording, or from a recording matching the desired reproduction setup (binaural or multichannel). In the case of a binaural or multichannel RIR, the RIR modelling with a reverberator is performed for each channel separately and parameters are signalled for each channel. In the case of a single RIR, the modelling is performed only once, and in the rendering two or more outputs of the reverberator are taken to create incoherent outputs. For example, in the case of the velvet noise reverberator, two uncorrelated velvet noise sequences can be used to create two different reverberators, otherwise with the same parameters, so that the reverberated output signals will be uncorrelated while having similar reverberation characteristics.
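Generating two independently drawn velvet noise sequences for incoherent left/right outputs can be sketched as follows; the grid-based pulse placement follows the usual velvet noise construction, and the names and parameter values are ours:

```python
import numpy as np

def velvet_noise(length, density, fs=48000, rng=None):
    """Velvet noise: a sparse +/-1 pulse sequence with `density` pulses
    per second, one randomly placed pulse per grid period of fs/density
    samples."""
    rng = np.random.default_rng() if rng is None else rng
    Td = fs / density                     # average pulse period in samples
    seq = np.zeros(length)
    m = 0
    while int(m * Td) < length:
        # Random position within the m-th grid period, random sign.
        k = int(m * Td + rng.random() * (Td - 1))
        if k < length:
            seq[k] = 1.0 if rng.random() < 0.5 else -1.0
        m += 1
    return seq

# Two independently drawn sequences give two reverberators that share the
# same filters and gains but produce mutually uncorrelated outputs.
left = velvet_noise(48000, 2000, rng=np.random.default_rng(1))
right = velvet_noise(48000, 2000, rng=np.random.default_rng(2))
```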

[0135] According to an embodiment, the set of reverberation parameters to be signaled includes the name, type, or other identifier of a reverberator method, and its parameters. Each reverberator has at least one parameter which can be adjusted, and the parameter types and numbers depend on the reverberator method. For example, the reverberator type can be a Schroeder reverb, a feedback delay network, or velvet noise convolution. The possible reverberator parameters include filter coefficients, delay line lengths, filter orders, software implementation references or identifiers, or even a software or pseudocode description of the implementation of the reverberator, or any other suitable representation.

[0136] According to an embodiment, the reverberator and the signalled reverberation parameters can change over time. This takes care of updating the reverberation characteristics as the environment changes. If the reverberator does not change, it is sufficient to signal only the parameter values. Furthermore, if some parameter values do not change compared to the previous time instant, their values do not need to be signalled.

[0137] According to an embodiment, the preference attributes can be delivered as a space-time varying metadata track where each timestamp corresponds to a media segment.

[0138] According to an embodiment, different parts of a physical space can have different reverberator methods and/or parameters. Different parts of a physical space can be identified based on differences in the reverberated signal or based on location data associated with the reverberated signal. The reverberated signal can be measured at different locations in a space. It is desirable that the method finds a reverberator, and parameters for it, for all points so that the reverberation at different locations can be reproduced. To enable reproducing reverberation for different points in space, the reverberator parameters are assigned to location or area definitions. For example, coordinates indicating the measurement points of the original RIR can be added to the reverberator parameters to indicate that the reverberator parameters are valid between those points. Alternatively, boundaries of areas indicating the possible sound source position and listening position can be used to indicate the validity of the reverberator parameters.

[0139] Furthermore, it is desirable to be able to morph or interpolate the reverberator parameters between different points, so that transitions from one listening point to another can be made. For this, the method may comprise analysing the smoothness of the reverberator output when the parameters are interpolated between different points. The more consistent and/or smoother the reverberator output is when its parameters are interpolated between the available measurement points, the better score the reverberator gets. Thus, a new score, referred to as TransitionSmoothness (TS), can be added to the bitstream.
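One way such a smoothness score could be computed is sketched below. The linear interpolation of parameter vectors and the inverse-of-largest-jump scoring are our assumptions for illustration, not the application's definition of TS:

```python
import numpy as np

def interpolate_params(p_a, p_b, t):
    """Linear interpolation of reverberator parameter dictionaries between
    two measurement points (t=0 at point A, t=1 at point B)."""
    return {k: (1 - t) * np.asarray(p_a[k]) + t * np.asarray(p_b[k])
            for k in p_a}

def transition_smoothness(render, p_a, p_b, steps=10):
    """Illustrative TransitionSmoothness (TS) score: render the
    reverberator at interpolated parameter points and score the inverse
    of the largest jump in output energy between consecutive steps.
    `render` maps a parameter dict to a scalar output energy."""
    energies = [render(interpolate_params(p_a, p_b, t))
                for t in np.linspace(0.0, 1.0, steps)]
    jumps = np.abs(np.diff(energies))
    return 1.0 / (1.0 + float(np.max(jumps)))
```

A reverberator whose output changes smoothly along the interpolation path gets a score close to 1; abrupt jumps pull the score down.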

[0140] According to an embodiment, the server receives information about the availability of different reverberator methods from another apparatus, such as the rendering device, and the server selects the reverberator method among the reverberator methods indicated as available.

[0141] According to an embodiment, the server receives information that only a subset of the reverberator method's parameters can be adjusted while the other parameters have fixed values. There can be, for example, a flag which indicates which parameters can be adjusted and which are fixed. Alternatively, this can be indicated by providing a range within which each parameter can be adjusted. The server then optimizes the reverberator selection by adjusting only the adjustable parameters, within their allowed ranges, and does not need to transmit non-adjustable parameter values to the client.
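Range-restricted adjustment of this kind might look as follows; the data layout and names are illustrative assumptions:

```python
def clamp_adjustable(params, ranges):
    """Adjust only parameters declared adjustable: each adjustable
    parameter carries an allowed [lo, hi] range and is clamped to it;
    parameters without a signalled range are fixed and left untouched."""
    out = dict(params)
    for name, (lo, hi) in ranges.items():
        if name in out:
            out[name] = min(max(out[name], lo), hi)
    return out
```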

[0142] It is noted that instead of selecting a suitable reverberator for reproducing measured reverberation, the embodiments are also applicable to reproducing idealized, perceptually suitable reverberation. For example, WO 2015/103024 A1 provides one solution for designing perceptually optimal binaural room impulse response (BRIR) filters to be used in headphone virtualizers. The approach may be further adjusted according to the embodiments disclosed herein for selecting the most appropriate reverberation method for producing perceptually optimal reverberation. Thus, in this embodiment the system receives as input an idealized reverberated signal, the parameters of which it attempts to model with a reverberator method and its parameters.

[0143] Considering a second aspect, i.e. the operation on the client side, there is now provided a method for receiving the parameters for different alternative reverberators and their multidimensional preference information, along with the weights used in calculating the multidimensional preference information.

[0144] The method, which is disclosed in Figure 11, comprises receiving (1100), for each of a plurality of reverberator methods, at least an identification of the reverberator method, one or more parameters for reproducing a reverberated signal, and preference attributes describing a capability of the reverberator method to reproduce the reverberated signal and a usage suitability of the reverberator method; and selecting (1102), based on the preference attributes, a reverberator method to be used for reproducing the reverberation.

[0145] According to an embodiment, the preference attributes comprise at least one of the following:

- CRQ (Content Reproduction Quality)

- B (Bitrate)

- CC (Computational Complexity)

- L (Latency)

- OQ (Overall Quality)

[0146] The client may receive the information, for example, in the data structure depicted in the server-side step 706 above. The data structure may be read from the audio bitstream. The data structure contains multidimensional preference information for reverberator selection.

[0147] According to an embodiment, the preference attributes are associated with weights for a subset of preference attributes, wherein the preference attribute for the Overall Quality OQ is calculated on the basis of the subset of weighted preference attributes.
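A weighted combination of this kind might be computed as follows; the weighted-sum rule, the attribute values, and the weights are illustrative assumptions:

```python
def overall_quality(attrs, weights):
    """OQ as a weighted combination of the subset of preference
    attributes for which weights are given (illustrative rule)."""
    return sum(weights[k] * attrs[k] for k in weights)

# Hypothetical preference attributes for two candidate reverberators.
reverbs = {
    "velvet_noise": {"CRQ": 0.9, "B": 0.7, "CC": 0.8, "L": 0.9},
    "fdn":          {"CRQ": 0.8, "B": 0.9, "CC": 0.9, "L": 0.9},
}
weights = {"CRQ": 0.5, "B": 0.1, "CC": 0.2, "L": 0.2}
oq = {name: overall_quality(a, weights) for name, a in reverbs.items()}
```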

[0148] According to an embodiment, the plurality of reverberator methods are received in a list arranged in ascending order based on the value of the Overall Quality OQ.

[0149] According to an embodiment, the step of selecting the reverberator method to be used for reproducing the reverberation may further comprise comparing a received first set of weights to a second set of weights maintained by the apparatus; using, in response to the received first set of weights not matching the second set of weights, the second set of weights for calculating updated values of Overall Quality OQ for the reverberator methods; and selecting the reverberator method to be used for reproducing the reverberation based on the updated values of Overall Quality OQ.

[0150] According to an embodiment, the second set of weights maintained by the apparatus is based on capabilities and application requirements of the apparatus.

[0151] The above embodiments are illustrated in Figure 12, where the client, as a part of receiving (1100), for each of a plurality of reverberator methods, at least an identification of the reverberator method, one or more parameters for reproducing a reverberated signal and the preference attributes, also receives (1200) a first set of weights for each preference attribute. The client obtains (1202) a second set of weights for ranking different reverberator methods based on application requirements and device state. Depending on the application and device state, different importance and thus different weights should be assigned to the preference attributes. For example, for real-time low-latency operation W_L and W_CC are important. On the other hand, for high quality reproduction W_CRQ is important. Each application can use the weights to adjust application specific preferences for different criteria. The operating system or application resource management may also adjust the preferences according to the device's overall computational load, resource budget, or available bandwidth.

[0152] The client then reads (1204) the received first set of weights used by the server for calculating the preference attributes and determines (1206) whether the importance of the different parameters used by the server, such as CRQ, B, CC, and L, matches the capabilities and application requirements of the client device.

[0153] If there is a match, the client device continues from step (1102) by selecting, based on the preference attributes, the first reverberator method on the list arranged in ascending order based on the value of the Overall Quality OQ to be used for reproducing the reverberation, and initializes said reverberator with the given parameters. The client reads the parameters associated with this reverberator and passes the parameter values to the corresponding reverberator implementation on the client device. If the reverberator has already been initialized, the client replaces the old parameter values with the new values. Interpolation of parameter values can be used to ensure a smooth audible output.

[0154] On the other hand, if the importance of the different parameters of the preference information used by the server does not match the second set of weights for ranking different reverberator methods based on the capabilities and application requirements of the client device, the client device reads (1208) the preference attributes for the different reverberators. The client device then uses the updated weights (i.e. the second set of weights based on the capabilities and application requirements of the client device) to calculate (1210) updated Overall Quality (OQ) values for all reverberators available at the client device. On the basis of the updated OQ values, the reverberators available at the client device are sorted (1212) into a new list, which may be used in selecting (1102), based on the preference information, the first reverberator method on the list to be used for reproducing the reverberation and initializing said reverberator with the given parameters.
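The client-side re-ranking described above can be sketched as follows; the data layout and the choice to sort so that the highest updated OQ comes first are illustrative assumptions:

```python
def rerank(reverb_list, server_weights, client_weights):
    """Keep the server-provided order when the weight sets match;
    otherwise recompute OQ with the client's own weights and sort the
    candidate reverberators so the most preferred comes first."""
    if server_weights == client_weights:
        return list(reverb_list)
    def oq(entry):
        return sum(client_weights[k] * entry["attrs"][k]
                   for k in client_weights)
    return sorted(reverb_list, key=oq, reverse=True)

candidates = [
    {"id": "fdn",    "attrs": {"CRQ": 0.8, "CC": 0.9}},
    {"id": "velvet", "attrs": {"CRQ": 0.9, "CC": 0.8}},
]
# The client values reproduction quality more than the server did:
ranked = rerank(candidates, {"CRQ": 0.5, "CC": 0.5}, {"CRQ": 0.8, "CC": 0.2})
```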

[0155] According to an embodiment, the client may signal information on the available reverberator methods to the server determining the reverberator. This is illustrated in Figure 13, where compared to Figure 5, an additional block of signaling the available reverberation methods has been added on the side of the rendering device. Besides the available reverberation methods, the client may also signal any other relevant information, such as intention to carry out the rendering on headphones, intention to use a loudspeaker on a phone, the audio reproduction system used in the room, etc. The server may then adjust the reverberator selection and their parameters accordingly. If the application preferences or the rendering environment change during the rendering, for example due to changed configuration of the rendering device, updated information may be sent to the server, which may then readjust the reverberator selection and their parameters.

[0156] According to an embodiment, the client may signal to the server that only a subset of the reverberator parameters can be adjusted while other parameters have fixed values. There can be, for example, a flag which indicates which parameters can be adjusted and which remain fixed. Alternatively, this can be indicated by providing a range within which each parameter can be adjusted. The server then optimizes the reverberator selection by adjusting only the adjustable parameters, within their allowed ranges, and does not need to transmit non-adjustable parameter values to the client.

[0157] In the above, the embodiments relating to determining the suitability of a reverberator have been disclosed as carried out on the server. According to an embodiment, the client device is configured to carry out all or a subset of the operations relating to determining the suitability of a reverberator. In the latter option, the operations relating to determining the suitability of a reverberator may be divided so that a part of the operations is performed in the server and another part in the client device.

[0158] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0159] Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

[0160] Programs, such as those provided by Synopsys, Inc. of Mountain View, California, and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

[0161] The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.