
Title:
METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR RECORDING AND INTERPOLATION OF AMBISONIC SOUND FIELDS
Document Type and Number:
WIPO Patent Application WO/2020/148650
Kind Code:
A1
Abstract:
A method of recording ambisonic sound fields with a spatially distributed plurality of ambisonic microphones comprises a step of recording sound signals from the plurality of ambisonic microphones, a step of converting the recorded sound signals to ambisonic sound fields, and a step of interpolation of the ambisonic sound fields. According to the invention, the method comprises a step of generating synchronizing signals for particular ambisonic microphones for synchronized recording of the sound signals from the plurality of ambisonic microphones. The step of interpolation of the ambisonic sound fields includes filtering the sound signals from particular microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between the point of interpolation and the m-th microphone, applying gradual distance-dependent attenuation, and applying re-balancing with amplification of the 0th-order ambisonic component and attenuation of the remaining ambisonic components. The invention further concerns a recording system and a computer program product.

Inventors:
JANUSZKIEWICZ LUKASZ (PL)
PATRICIO EDUARDO (PL)
KUKLASINSKI ADAM (PL)
RUMINSKI ANDRZEJ (PL)
ZERNICKI TOMASZ (PL)
Application Number:
PCT/IB2020/050265
Publication Date:
July 23, 2020
Filing Date:
January 14, 2020
Assignee:
ZYLIA SPOLKA Z OGRANICZONA ODPOWIEDZIALNOSCIA (PL)
International Classes:
H04S7/00
Domestic Patent References:
WO2018064528A1, 2018-04-05
WO2003092260A2, 2003-11-06
WO2017137921A1, 2017-08-17
Foreign References:
US10349194B1, 2019-07-09
Other References:
EDUARDO PATRICIO (ZYLIA) ET AL: "Report on Recording of Test Material for 6DoF Audio", no. m46067, 9 January 2019 (2019-01-09), XP030214555, Retrieved from the Internet [retrieved on 2019-01-09]
NILS PETERS (QUALCOMM) ET AL: "On the application of multiple HOA streams for MPEG-I", no. m44875, 4 October 2018 (2018-10-04), XP030192449, Retrieved from the Internet [retrieved on 2018-10-04]
PATRICIO EDUARDO ET AL: "Toward Six Degrees of Freedom Audio Recording and Playback Using Multiple Ambisonics Sound Fields", AES Convention 146, March 2019, AES, 60 East 42nd Street, Room 2520, New York 10165-2520, USA, 10 March 2019 (2019-03-10), XP040706485
JOT, J. ET AL.: "Group Report: A spatial audio format with 6 Degrees of freedom", The Twenty-Second Annual Interactive Audio Conference - Project Bar-B-Q, 2017
TYLKA, J. G., CHOUEIRI, E.: "Comparison of techniques for binaural navigation of higher-order ambisonic soundfields", Audio Engineering Society Convention 139, October 2015, Audio Engineering Society
SCHULTZ, F., SPORS, S.: "Data-based binaural synthesis including rotational and translatory head-movements", Audio Engineering Society Conference: 52nd International Conference: Sound Field Control - Engineering and Perception, September 2013, Audio Engineering Society
NOISTERNIG, M., SONTACCHI, A., MUSIL, T., HOLDRICH, R.: "A 3D ambisonic based binaural sound reproduction system", Audio Engineering Society Conference: 24th International Conference: Multichannel Audio, The New Reality, June 2003, Audio Engineering Society
PLINGE, A., SCHLECHT, S. J., THIERGART, O., ROBOTHAM, T., RUMMUKAINEN, O., HABETS, E. A.: "Six-Degrees-of-Freedom Binaural Audio Reproduction of First-Order Ambisonics with Distance Information", Audio Engineering Society Conference: 2018 AES International Conference on Audio for Virtual and Augmented Reality, August 2018, Audio Engineering Society
TYLKA, J. G., CHOUEIRI, E.: "Soundfield Navigation using an Array of Higher-Order Ambisonic Microphones", Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality, September 2016, Audio Engineering Society
Attorney, Agent or Firm:
BURY, Marek (PL)
Claims:
Claims

1. A method of recording and interpolation of ambisonic sound fields with a spatially distributed plurality of ambisonic microphones (510, 520, ..., 590), comprising a step of recording sound signals from the plurality of ambisonic microphones (510, 520, ..., 590), a step of converting the recorded sound signals to ambisonic sound fields and a step of interpolation of the ambisonic sound fields, characterized in that during the step of recording it further comprises a step of generating synchronizing signals (S1, S2, ..., S9) for particular ambisonic microphones (510, 520, ..., 590) for synchronized recording of sound signals from the plurality of ambisonic microphones (510, 520, ..., 590), and during the step of interpolation of the ambisonic sound fields it includes

filtering ambisonic fields from particular microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between the point of interpolation and the m-th microphone,

applying gradual distance-dependent attenuation,

applying re-balancing with amplification of the 0th-order ambisonic component and attenuation of the remaining ambisonic components of order greater than 0.

2. Method according to claim 1, characterized in that before the step of recording the plurality of ambisonic microphones (510, 520, ..., 590) is arranged in an equilateral triangular grid forming a diamond shape.

3. Method according to claim 2, characterized in that the equilateral triangular grid is substantially planar.

4. Method according to claim 2, characterized in that the equilateral triangular grid is distributed in three dimensions.

5. Method according to any of claims 1 to 4, characterized in that the cut-off frequency f_c(d_m) decreases linearly with distance d_m when d_m exceeds a predefined value.

6. Method according to any of claims 1 to 4, characterized in that the cut-off frequency f_c(d_m) decreases exponentially with distance d_m when d_m exceeds a predefined value.

7. Method according to any of claims 1 to 6, characterized in that the attenuation of ambisonic components of order greater than zero increases exponentially with distance d_m when d_m exceeds a predefined value t_t.

8. A system for recording and interpolation of ambisonic sound fields comprising a recording device (500) and a plurality of ambisonic microphones (510, 520, ..., 590), characterized in that it has means for generation of individual synchronization signals (S1, S2, ..., S9) and the recording device (500) is adapted to execute a method as defined in claim 1.

9. System according to claim 8, characterized in that the plurality of ambisonic microphones is arranged in an equilateral triangular grid forming a diamond shape.

10. System according to claim 8, characterized in that the equilateral triangular grid is substantially planar.

11. System according to claim 8, characterized in that the equilateral triangular grid is distributed in three dimensions.

12. System according to any of claims 8 to 11, characterized in that the means for generating synchronization signals are individual sound emitters located in proximity of particular ambisonic microphones (510, 520, ..., 590).

13. System according to claim 12, characterized in that at least a subset of the plurality of ambisonic microphones comprises identical ambisonic microphones and the sound emitters are located on the ambisonic microphones (510, 520, ..., 590) within this subset in the same place.

14. System according to claim 8, characterized in that the ambisonic microphones comprise microphone sensor capsules with individual analog-to-digital converters and the means for generating synchronization signals comprise a common generator of synchronization signals delivered to the analog-to-digital converters of individual microphone sensor capsules.

15. Computer program product for recording and interpolation of ambisonic sound fields, which, when executed on a processing device fed with sound signals recorded from a plurality of ambisonic microphones, is adapted to cause the processing device to execute conversion of the sound signals to ambisonic sound fields and interpolation of said ambisonic sound fields,

characterized in that the interpolation includes

filtering ambisonic sound fields from particular microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between a given point of interpolation and the m-th microphone,

applying gradual distance-dependent attenuation, and applying re-balancing with amplification of the 0th-order ambisonic component and attenuation of the remaining ambisonic components.

16. Computer program product according to claim 15, characterized in that it is adapted to cause the processing device to detect sound synchronization signals in the recorded signals from particular ambisonic microphones and to synchronize sound recorded from particular ambisonic microphones prior to conversion and interpolation.

Description:
Method, system and computer program product for recording and interpolation of ambisonic sound fields

[0001] The invention concerns recording of ambisonic sound fields. More specifically, the invention concerns interpolation of ambisonic sound fields obtained from conversion of sound signals recorded with ambisonic microphones.

[0002] A sound field is the dispersion of sound energy within a space with given boundaries. Ambisonics is a sound format used for representation of the sound field that takes into account its directional properties. In first-order Ambisonics the sound field is decomposed into 4 ambisonic components - spherical harmonics. In higher-order Ambisonics (HOA) the number of ambisonic components is higher, and thus a higher spatial resolution of the sound field decomposition can be achieved. Decoding of an ambisonic sound field enables reproduction of the sound field at any point of the surrounding space represented by a sphere which originates from the point of recording.
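For orientation only (this is a standard property of the ambisonic representation, not specific to the invention): an ambisonic representation of order N comprises

$$P = (N + 1)^2$$

components, so first order gives 4 components and third order gives 16.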

[0003] Immersive media has seen a significant increase in popularity and, as related technologies are developed, its usefulness has also grown, with potential applications in entertainment, research, commerce and education. Six degrees of freedom (6DoF) usually refers to the physical displacement of a rigid body in space. It combines 3 rotational (roll, pitch and yaw) and 3 translational (up-down, left-right and forward-back) movements. The term is also used to refer to the freedom of navigation in immersive/VR environments. While 6DoF has long been a standard in computer gaming, with widely available tools to implement both immersive audio and video, the same cannot be said about cinematic audio and video scenarios. Most VR audio content available nowadays presents a 3DoF (three-degrees-of-freedom) scenario, in which the user occupies a single, fixed point of view allowing rotation, but not translation movements. There have been noticeable advancements in volumetric videography, e.g. as disclosed in US patent no. US10349194 and the publication of international patent application no. WO2003092260, which are relevant to VR/AR applications. On the other hand, there is still much to be done regarding live recorded 6DoF audio solutions.

[0004] According to Jot, J. et al. (2017), Group Report: A spatial audio format with 6 Degrees of freedom, The Twenty-Second Annual Interactive Audio Conference - Project Bar-B-Q, there is growing interest in 6DoF audio, but the solutions for live recorded scenarios are still very limited. Live recorded 6DoF audio can be particularly useful in scenarios in which it is relevant to capture the acoustic characteristics of a specific space, e.g. a concert room, or synchronized spatially spread sound sources (e.g. performing arts; sports events). It is possible to point to two main approaches to live recorded 6DoF audio rendering.

[0005] The first type of scenario makes use of a single ambisonic recording with simulated off-center listening perspectives - such a scenario is discussed in detail e.g. in Tylka, J. G., & Choueiri, E. (2015, October), Comparison of techniques for binaural navigation of higher-order ambisonic soundfields, in Audio Engineering Society Convention 139, Audio Engineering Society; Schultz, F., & Spors, S. (2013, September), Data-based binaural synthesis including rotational and translatory head-movements, Audio Engineering Society Conference: 52nd International Conference: Sound Field Control - Engineering and Perception, Audio Engineering Society; or Noisternig, M., Sontacchi, A., Musil, T., & Holdrich, R. (2003, June), A 3D ambisonic based binaural sound reproduction system, Audio Engineering Society Conference: 24th International Conference: Multichannel Audio, The New Reality, Audio Engineering Society.

[0006] The second type of scenario relies on simultaneous spatially adjacent recordings and was discussed by Plinge, A., Schlecht, S. J., Thiergart, O., Robotham, T., Rummukainen, O., & Habets, E. A. (2018, August), in Six-Degrees-of-Freedom Binaural Audio Reproduction of First-Order Ambisonics with Distance Information, Audio Engineering Society Conference: 2018 AES International Conference on Audio for Virtual and Augmented Reality, Audio Engineering Society, and by Tylka, J. G., & Choueiri, E. (2016, September), Soundfield Navigation using an Array of Higher-Order Ambisonic Microphones, Audio Engineering Society Conference: 2016 AES International Conference on Audio for Virtual and Augmented Reality, Audio Engineering Society.

[0007] Tylka et al. disclosed a method and a system for recording an ambisonic sound field with a spatially distributed plurality of higher-order ambisonic (HOA) microphones. Sound signals are recorded with the plurality of ambisonic microphones and afterwards converted to ambisonic fields. Values of the field in between the ambisonic microphones are interpolated. Ambisonic microphones are matrices of microphones for recording spatial audio. An example of such a HOA microphone is disclosed in WO2017137921A1. The aim of interpolation is to reproduce 6DoF sound in the space between the ambisonic microphones.

[0008] Plinge et al. disclosed 6DoF reproduction of recorded content based on spatially distributed positions and dedicated transformations for obtaining virtual signals at arbitrary positions of the listener.

[0009] In experimentation the inventors found that known methods of interpolation of ambisonic sound fields, of recording, and of conversion of sound signals from ambisonic microphones to ambisonic sound fields do not work as effectively as expected in general, and in particular tend to fail for particular positions of the virtual observer with respect to the recording microphones.

[0010] A method of recording and interpolation of ambisonic sound fields with a spatially distributed plurality of ambisonic microphones comprises a step of recording sound signals - the so-called A-format - from the plurality of ambisonic microphones, a step of converting the recorded sound signals to ambisonic sound fields and a step of interpolation of the ambisonic fields. The method according to the invention is special in that during the step of recording it further comprises a step of generating synchronizing signals for particular ambisonic microphones for synchronized recording of sound signals from the plurality of ambisonic microphones. That generation of individual signals allows synchronization precise enough to capture spatial properties of the ambisonic sound fields captured by the plurality of ambisonic microphones. During the step of interpolation of the ambisonic sound fields the method includes filtering sound signals from particular ambisonic microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between the point of interpolation (the virtual listener's position) and the m-th microphone, applying gradual distance-dependent attenuation, and applying re-balancing with amplification of the 0th-order ambisonic component and attenuation of the remaining components of order greater than 0. Application of distance-dependent individual filtration and fading allows reducing the disadvantageous impact of signals from ambisonic microphones that are further away from the listener's position. In particular, attenuation of the ambisonic components of order greater than 0 allows elimination of irrelevant sound directivity information while preserving the contribution of its energy. Amplification of the 0th-order ambisonic component allows compensation of the energy change and a more natural perception of the sound.

[0011] Advantageously, before the step of recording, the plurality of ambisonic microphones is arranged in an equilateral triangular grid forming a diamond shape, substantially planar or three-dimensional. Use of a planar grid is advantageous as the processing runs faster, while a three-dimensional (3D) distribution enables recording of the sound field in the volume of the room.

[0012] Advantageously, the cut-off frequency f_c(d_m) decreases linearly with distance d_m when d_m exceeds a predefined value.

[0013] Alternatively, the cut-off frequency f_c(d_m) decreases exponentially with distance d_m when d_m exceeds a predefined value.

[0014] Advantageously, the attenuation of ambisonic components of order greater than zero increases exponentially with distance d_m when d_m exceeds a predefined value t_t.

[0015] A system for recording and interpolation of ambisonic sound fields comprising a recording device and a plurality of ambisonic microphones according to the invention has means for generation of individual synchronization signals, and the recording device is adapted to execute a method according to the invention.

[0016] Advantageously, the plurality of ambisonic microphones is arranged in an equilateral triangular grid forming a diamond shape.

[0017] Advantageously, the equilateral triangular grid is substantially planar or, alternatively, it is distributed in three dimensions.

[0018] Means for generating the synchronization signals are individual sound emitters located in proximity of particular ambisonic microphones.

[0019] Advantageously, at least a subset of the plurality of ambisonic microphones comprises identical ambisonic microphones and the sound emitters are located on the ambisonic microphones within this subset in the same place.

[0020] Advantageously, the ambisonic microphones comprise microphone sensor capsules with individual analog-to-digital converters, and the means for generating the synchronization signal comprise a common generator of synchronization signals delivered to the analog-to-digital converters of individual microphone sensor capsules.

[0021] A computer program product for recording and interpolation of ambisonic sound fields, when executed on a processing device fed with sound signals recorded from a plurality of ambisonic microphones, is adapted to cause the processing device to execute conversion of the sound signals to ambisonic sound fields and interpolation of said ambisonic sound fields. The interpolation includes filtering ambisonic sound fields from particular microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between the point of interpolation and the m-th microphone, applying gradual distance-dependent attenuation, and applying re-balancing with amplification of the 0th-order ambisonic component and attenuation of the remaining ambisonic components of higher order.

[0022] Advantageously, the computer program product is adapted to cause the processing device it is run on to detect sound synchronization signals in the recorded signals from particular ambisonic microphones and to synchronize sound recorded from particular ambisonic microphones prior to conversion and interpolation.

[0023] A system for recording ambisonic sound fields according to the invention comprises a number of ambisonic microphones connected to a processing unit adapted to generate synchronization signals and to receive recording results.

[0024] The invention is described below in detail, with reference to the attached figures, wherein:

Fig. 1 shows an exemplary playback program user interface;

Fig. 2 shows a top view of the virtual room with sound sources and microphone placement indications: (1) TV set, (2) phone and (3) fan;

Fig. 3 shows absolute MUSHRA scores for Test 1 and Test 2; the 95% confidence intervals (13 listeners) are plotted;

Fig. 4 shows differential MUSHRA scores (3OA vs other conditions) for Test 1 and Test 2, with 95% confidence intervals;

Fig. 5 shows a block diagram of an embodiment of the recording system according to the invention.

[0025] A method according to the invention requires signals from a plurality of HOA microphones arranged in a grid covering an area (flat) or a volume (3D space).

[0026] The entire area or volume that is to be made navigable in the resulting recording needs to be covered by the grid. A uniform grid composed of equilateral triangles proved to be particularly effective. Experiments with square grids were also successful. In cases when full 6DoF with height is to be recorded, several layers of the grid may be stacked one above the other, possibly with an offset. The orientation of each HOA microphone in the grid should be the same, i.e. the "front" and "top" of all microphones should point in the same directions, respectively.

[0027] In the present embodiment of the system according to the invention, 9 HOA ambisonic microphones were used. The ZYLIA ZM-1 spherical microphone array, providing 19 channels from 19 microphone sensor capsules and disclosed in WO2017137921A1, proved to be a particularly well-suited HOA microphone. The 9 HOA microphones were used together with the state-of-the-art ZYLIA Ambisonics software A-B converter, capable of producing ambisonics B-format of up to the third order, run on a processing unit.

[0028] RAW audio captured from the capsules of an ambisonic microphone is represented as a multi-channel recording in the so-called A-format. Since each ambisonic microphone can have different characteristics, such as the number of microphone sensor capsules, the type of capsules and the arrangement of the capsules, the A-format is specific to the ambisonic microphone model. The ambisonic sound field is represented in the B-format, which is derived from the A-format by means of convolution of the raw multi-channel signals with a dedicated matrix of impulse responses. The resulting B-format ambisonic sound fields are subjected to the user's distance-dependent interpolation process. The A-B conversion in this example is performed as disclosed in Moreau, S., Daniel, J., & Bertet, S. (2006, May), 3D sound field recording with higher order ambisonics - Objective measurements and validation of a 4th order spherical microphone, in 120th Convention of the AES. Yet, other state-of-the-art conversion mechanisms are also applicable.
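As an illustration only, a minimal Python sketch of such an A-format to B-format conversion is given below. The function name, the array shapes and the use of NumPy/SciPy are assumptions made for the example; the actual ZYLIA converter implementation is not disclosed in this text.

```python
# Hypothetical sketch: A-format capsule signals are convolved with a matrix of
# impulse responses specific to the microphone model to obtain B-format components.
import numpy as np
from scipy.signal import fftconvolve

def a_to_b_format(a_format: np.ndarray, conversion_irs: np.ndarray) -> np.ndarray:
    """Convert A-format capsule signals to B-format ambisonic components.

    a_format       : (n_capsules, n_samples) raw capsule signals.
    conversion_irs : (n_components, n_capsules, ir_length) impulse-response matrix.
    Returns          (n_components, n_samples + ir_length - 1) B-format signals.
    """
    n_components, n_capsules, ir_length = conversion_irs.shape
    n_samples = a_format.shape[1]
    b_format = np.zeros((n_components, n_samples + ir_length - 1))
    for p in range(n_components):       # each ambisonic component...
        for c in range(n_capsules):     # ...is a sum of filtered capsule signals
            b_format[p] += fftconvolve(a_format[c], conversion_irs[p, c])
    return b_format
```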

[0029] It has been found that conversion of the recorded sound signals to an ambisonic sound field requires precise synchronization. Ambisonic microphones provide mechanisms for synchronization of the particular microphone sensors that are part of a single ambisonic microphone, but in order to perform an effective interpolation of the ambisonic sound fields a precise synchronization of the sound fields across whole ambisonic microphones is also required.

[0030] A block diagram of an embodiment of the system according to the invention is shown in Fig. 5. It comprises a recording device 500 and a plurality of nine ambisonic microphones 510, 520, ..., 590 connected to the recording device and feeding sound signals to the recording device 500. The recording device generates individual synchronization signals with a synchronization module 501. The synchronization signals are delivered to the particular ambisonic microphones.

[0031] As the ZYLIA ZM-1 does not support external synchronization through a word clock or USB input, a dedicated synchronization method was applied. The method is based on hardware and software components:

• piezoelectric buzzers as sound emitters, driven by a common synchronization signal delivered to them electronically. The buzzers were attached at the base of each ZM-1 microphone in exactly the same position, near the same microphone sensor capsule;

• a software tool detecting a synchronization impulse played by the buzzers near the beginning and end of each recording and synchronizing/aligning the recorded signals accordingly.

[0032] Such a synchronization method allows the beginning of the recording from each HOA microphone to be time-aligned as well as the sample clock drift to be estimated. This operation allows for linear interpolation of audio samples.
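A minimal sketch of such buzzer-based alignment is shown below, assuming the buzzer waveform is available as a template signal. The function names and the simple two-impulse drift model are illustrative assumptions, not the actual software tool.

```python
# Hypothetical sketch: detect the sync impulses near the start and end of a
# recording, align the start, and compensate clock drift by linear interpolation.
import numpy as np
from scipy.signal import correlate

def find_impulse(signal: np.ndarray, template: np.ndarray) -> int:
    """Index at which the synchronization impulse best matches the template."""
    corr = correlate(signal, template, mode="valid")
    return int(np.argmax(np.abs(corr)))

def align_and_compensate_drift(signal, template, ref_start, ref_end):
    """Align a recording to a reference recording and compensate sample clock drift.

    ref_start, ref_end : impulse positions found in the reference recording.
    """
    half = len(signal) // 2
    start = find_impulse(signal[:half], template)        # impulse near the start
    end = half + find_impulse(signal[half:], template)   # impulse near the end
    aligned = signal[start:]                              # time-align the beginning
    ratio = (ref_end - ref_start) / (end - start)         # clock ratio vs. reference
    n_out = int(round(len(aligned) * ratio))
    old_idx = np.arange(len(aligned))
    new_idx = np.linspace(0, len(aligned) - 1, n_out)
    return np.interp(new_idx, old_idx, aligned)           # linear interpolation of samples
```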

[0033] The ambisonic microphones are identical and have the form of a sphere with 19 microphone sensor capsules. Each of the ambisonic microphones has an individual buzzer attached to the same point on the surface of the sphere, close to the same capsule. That allows the most precise synchronization.

[0034] Each ambisonic microphone delivers 19 sound signals from its individual capsules. The sound signals are converted to ambisonic sound fields. In the space between the ambisonic microphones, the sound fields obtained from them are interpolated. Synchronization of the sound fields, resulting from prior synchronization (alignment) of the sound signals, proved to have a strong effect on the quality of not only the conversion but also the interpolation.

[0035] Actual alignment of the recorded sound signals may be done either at the recording stage or at the stage of post-processing of the signals and conversion.

[0036] The computer program product according to the invention, when run on the processing device, causes in post-processing a conversion of sound signals to ambisonic sound fields and interpolation of the ambisonic sound fields in the manner presented below.

[0037] The computer program product may be further adapted to detect synchronization signals and cause alignment of the signals, or even adapted to be run on the recording device 500 and control the whole recording process.

[0038] Alternative mechanisms for synchronization are also available. Synchronization of the microphone array signals can be performed by application of a dedicated timecode audio signal. The timecode signal is distributed as a single-channel audio signal which is attached as an additional audio channel to the raw multi-channel signals of all microphone arrays used in the system.

[0039] Another way of synchronization is to feed a common word clock signal to all of the analog-to-digital converters used for every single capsule of all of the microphone arrays in the system.

[0040] The method according to the invention provides a playback mechanism capable of interpolating ambisonic sound fields at locations of a virtual observer between the physical ambisonic microphones used during the recording stage. The computer program product according to the invention in some embodiments is run on the recording device and performs synchronization, conversion and interpolation together with the recording process, while in others it is used for post-processing of previously recorded and synchronized signals. It can also receive raw signals and incorporate a software tool to detect the synchronization audio signals from the buzzers and synchronize them in post-processing.

[0041] The method according to the invention of ambisonic sound field interpolation operates on time-domain ambisonic components which we denote y_{m,p}(n), where m is the number of the HOA microphone, p is the ambisonic component index, and n is the sample index. The interpolated ambisonic component x_p(n) is calculated as a sum of contributions from all HOA microphones in the recording grid. These contributions are calculated by a distance-dependent filtering and scaling of the original ambisonic components. Denoting the number of HOA microphones in the recording grid by M, the distance between the point of interpolation and the m-th microphone by d_m, the scaling function by a_p(d_m), the filter by h(d_m), and using the convention that (a*b)(n) is the convolution of signals a(n) and b(n), the interpolated signal can be expressed by:
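The formula itself is not reproduced in this text; based on the definitions just given, it presumably has the form

$$ x_p(n) = \sum_{m=1}^{M} a_p(d_m)\,\bigl(h(d_m) * y_{m,p}\bigr)(n). $$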

[0042] The distance-dependent h(d_m) is a first-order low-pass infinite impulse response filter whose cut-off frequency f_c is equal to 20 kHz when d_m is below a threshold value t_f, and falls linearly with a slope s_f < 0 when d_m is above t_f:
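The formula is not reproduced here; from the description it presumably reads

$$ f_c(d_m) = \begin{cases} 20\ \text{kHz}, & d_m \le t_f,\\ 20\ \text{kHz} + s_f\,(d_m - t_f), & d_m > t_f. \end{cases} $$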

[0043] Even better results may be obtained when applying an exponential decrease of f_c(d_m) for d_m exceeding t_f.

[0044] The scaling function a_p(d_m) has two components, l(d_m) and k_p(d_m).

[0045] l(d_m) applies a gradual fading of contributions from far-away ambisonic microphones, corresponding to free-space attenuation - linear in the dB scale.

[0046] Additionally, a re-balancing of the ratio between the 0th-order omni-directional ambisonic component (p=0) and the directional components (p>0) of higher orders is applied by means of the k_p(d_m) component.

[0047] Similarly to the filtering operation described above, the fading l(d_m) and the component re-balancing k_p(d_m) are progressively applied only when d_m exceeds the corresponding threshold values t_l and t_k. Beyond these distances l(d_m) and k_p(d_m) change linearly (in dB): the greater the distance, the stronger the attenuation and the greater the dominance of the omni-directional component over the directional ones. The mathematical formulation of the above follows:
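The formulas are not reproduced in this text; expressed in dB and following the description above, they presumably take the form

$$ l(d_m)\,[\text{dB}] = \begin{cases} 0, & d_m \le t_l,\\ s_l\,(d_m - t_l), & d_m > t_l, \end{cases} \qquad k_p(d_m)\,[\text{dB}] = \begin{cases} 0, & d_m \le t_k,\\ s_{k,p}\,(d_m - t_k), & d_m > t_k, \end{cases} $$

with the slope s_l < 0 and the slopes s_{k,p} depending on the component index p.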

[0048] Distance-dependent attenuation and ambisonics order re-balancing are formulated nearly identically, cf. (4) and (5). However, the attenuation slopes for ambisonics component re-balancing can be different for each ambisonics component index p. Typically, this slope will be positive for the zeroth-order ambisonic component and negative for higher-order ambisonic components:
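The relation is presumably the sign condition on the slopes,

$$ s_{k,p=0} > 0, \qquad s_{k,p>0} < 0, $$

consistent with the parameter values reported in [0063].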

[0049] Consequently, the contributions from far-away HOA microphones are not only attenuated but also contribute less to the direction of arrival of the interpolated signal, due to the attenuation of the higher-order ambisonic components.

[0050] Attenuation of higher-order ambisonic components results in a change of total energy which, if not compensated, would be perceived by a human listener as an unnatural decrease in sound level. That change is compensated by an increase of the 0-order ambisonic component, because s_{k,p=0} > 0.
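For illustration, a compact Python sketch of the interpolation described in [0041]-[0050] follows. It assumes NumPy/SciPy, uses a first-order Butterworth filter as the distance-dependent low-pass (the exact filter design is not specified in this text), and takes default slope and threshold values from [0063] where available; the values of s_f and of the lower cut-off limit are purely illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter

def cutoff_hz(d_m, t_f=1.4, s_f=-4000.0, f_max=20000.0, f_min=200.0):
    """Cut-off frequency f_c(d_m): constant below t_f, falling linearly above it."""
    if d_m <= t_f:
        return f_max
    return max(f_min, f_max + s_f * (d_m - t_f))

def gain_db(d_m, threshold, slope_db_per_m):
    """Distance-dependent gain in dB: 0 dB below the threshold, linear above it."""
    return 0.0 if d_m <= threshold else slope_db_per_m * (d_m - threshold)

def interpolate(y, distances, fs=48000, t_f=1.4, s_f=-4000.0,
                t_l=1.4, s_l=-38.0, t_k=1.4, s_k0=10.0, s_kp=-126.0):
    """Interpolate ambisonic components at a virtual listener position.

    y         : (M, P, N) array - M microphones, P ambisonic components, N samples.
    distances : (M,) distances d_m between the interpolation point and microphones.
    Returns     (P, N) interpolated ambisonic signal x_p(n).
    """
    M, P, N = y.shape
    x = np.zeros((P, N))
    for m in range(M):
        d_m = distances[m]
        b, a = butter(1, cutoff_hz(d_m, t_f, s_f) / (fs / 2))  # filter h(d_m)
        l_db = gain_db(d_m, t_l, s_l)                          # fading l(d_m)
        for p in range(P):
            s_k = s_k0 if p == 0 else s_kp                     # re-balancing slope
            k_db = gain_db(d_m, t_k, s_k)                      # k_p(d_m)
            a_p = 10 ** ((l_db + k_db) / 20)                   # scaling a_p(d_m)
            x[p] += a_p * lfilter(b, a, y[m, p])
    return x
```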

[0051] More advanced methods based on physical modeling of the sound field have been proposed in the past by Plinge et al. and Tylka et al. The relative simplicity of the method according to the invention, however, allows a real-time interactive system to be implemented and used on a personal computer.

[0052] An interactive system was developed to test the interpolation method according to the invention on simultaneous adjacent ambisonic recordings. Its final design choices, regarding functionality and parameter control, were based on the general theoretical proposition and the need to perform interactive subjective evaluations. The system has two main components: an input/control application (a representational navigable 3D environment) and an application that executes all the necessary audio transformations based on the navigation input data, having the interface shown in Fig. 2.

[0053] The positioning data sent from the navigable 3D scene to the playback component is used to calculate the distance between the listener's position and the center of each sound field. This distance is the main reference value to control the interpolation mechanism. So, for any given sound field, as the listener moves farther from the center, the following sound transformations occur: (a) the volume level fades out; (b) a low-pass filter is applied, and (c) the ambisonic image is gradually reduced to 0th order. It is possible to set a distance threshold (a point at which the transformation starts) and a range that determines the distance necessary to go from 0 to 100% applied transformation. For volume, the full range of transformation goes from the original volume to -75.6 dB; for low-pass filtering the cut-off frequency is gradually shifted from 20 kHz (no filtering) to 200 Hz with 6 dB attenuation per octave; for the ambisonic order transformation, crossfading is done between the original order (1st or 3rd) and the 0th order. Both threshold and range parameters are given in meters. The flexibility of defining thresholds and ranges for each transformation, consistently, across all sound fields, is meant to provide room for experimentation and different interpolation configurations.
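A small sketch of this threshold-and-range control is given below. The linear ramp between 0 and 100% and the geometric sweep of the cut-off frequency are assumptions; the end-point values (-75.6 dB, 20 kHz to 200 Hz, cross-fade to 0th order) follow the description above.

```python
def transform_amount(distance_m, threshold, range_m):
    """Fraction of a transformation applied at a given distance, clipped to [0, 1]."""
    return min(max((distance_m - threshold) / range_m, 0.0), 1.0)

def playback_params(distance_m, threshold=1.4, range_m=2.0):
    """Illustrative mapping from listener distance to playback transformations."""
    amt = transform_amount(distance_m, threshold, range_m)
    gain_db = amt * -75.6                             # volume fade-out
    cutoff_hz = 20000.0 * (200.0 / 20000.0) ** amt    # 20 kHz -> 200 Hz sweep (assumed geometric)
    order_crossfade = amt                             # 0 = original order, 1 = 0th order
    return gain_db, cutoff_hz, order_crossfade
```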

[0054] The system considers a specific microphone arrangement as seen on the central area of the application's user interface (Fig. ). The distance between microphones, a, in meters, can be set in the program to match the distance used during recording. This parameter is essential to calculate the position of each microphone in the grid and, consequently, perform the necessary distance-based interpolations.

[0055] The output of the interpolated ambisonics sound fields is sent to a binaural decoder and can be listened to on headphones. The standard ambisonics rotation transformations are done by IEM's 'Scene Rotator' VST plug-in.

[0056] The playback system is capable of 5-degrees-of-freedom playback. Vertical translation movement (up and down) is not included; it could be implemented in a future iteration for playback of recording grids with microphone arrays placed at different elevations.

[0057] Spatial attributes of a recorded acoustic scene are preserved when using the proposed strategy for interpolation of multiple ambisonics sound fields. The following aspects were of particular interest:

• naturalness and realism of the perceived direction and distance of sound sources,

• naturalness and smoothness of auditory image evolution when moving across the scene.

[0058] To this end a modified MUSHRA methodology (as disclosed in International Telecommunication Union, "ITU-R BS.1534-3, Method for the subjective assessment of intermediate quality level of audio systems," 2015) was adopted, with audio-visual stimuli presented by means of a computer screen and stereo headphones. This allowed the test subjects to have a visual reference regarding the true placement of sound sources in the scene.

[0059] The audio component of the stimuli was prepared as follows. An acoustic scene comprising three sound sources was recorded in a room measuring 4.5 x 6.5 x 2.8 m and exhibiting an average reverberation time of 0.26 s. The sources were chosen to have different tonal and temporal characteristics. The first source was a floor-standing fan that was switched on throughout the recording session. Strips of foil were attached to it in order to make the airflow more audible. Two 5-inch loudspeakers were used as the second and third sources. A sound of a phone ringing intermittently was played through one of the loudspeakers and a cartoon soundtrack through the other one. The three sources were arranged in a triangle around the center of the room, 2.5 to 3.5 meters from one another.

[0060] The above-mentioned sources were recorded by a system made up of 9 ZYLIA ZM-1 HOA microphones arranged in an equilateral triangular grid forming a diamond shape encompassing substantially the entire room.

[0061] The distance between adjacent microphones in the grid was 1.6 m and the height of all the microphones above the floor was 1.7 m. Since the HOA microphone grid was two-dimensional (without height), the resulting recording did not contain full 6DoF information. This was deemed sufficient for the purpose of this evaluation. In addition to the HOA microphones, three large-diaphragm condenser microphones were used to record each of the sources from a short distance. The directional characteristic of these microphones was set to cardioid, which resulted in a high degree of separation between the recorded sources.

[0062] The signals registered by the HOA microphones were time-aligned using the system described in Section 2 and subsequently transformed to the Ambisonic domain using the A-B converter. The ambisonics-encoded signals were processed by the proposed interpolation method and subsequently binauralized by the IEM rotator and binaural decoder plugins within Max MSP, described in Section 3.

[0063] Since the recording took place in a relatively small room, the low-pass filtering functionality of the proposed method was not used. The remaining parameters of the interpolator were set as follows:

t_l = t_k = 1.4 m, s_l = -38 dB/m,

s_{k,p=0} = 10 dB/m, s_{k,p>0} = -126 dB/m

[0064] Three different renderings of the ambisonics sound fields were prepared as stimuli for the test:

• The 0th-order ambisonics (0OA) interpolated by cross-fading according to listener position. This was included as the hidden anchor in the test.

• The 1st-order ambisonics (1OA) interpolated using the proposed method.

• The 3rd-order ambisonics (3OA) interpolated using the proposed method.

• The 0OA signal contained no spatial cues apart from loudness changes according to distance from a given source.

[0065] The fourth stimulus condition was prepared by spatializing the signals of the cardioid microphones at the original positions of the sound sources in the room using the Google Resonance decoder and room reverberation simulator (ResonanceAudioRoom Unity audio component). This stimulus was used as the reference in the MUSHRA test.

[0066] Other tests and recordings showed good results of the method according to the invention for

t_l ∈ (0.3 m, 2 m)

t_k ∈ (0.3 m, 2 m)

[0067] The visual component of the stimuli was prepared in the Unity 3D engine and consisted of an interactively navigable virtual recreation of the room where the sound signals were recorded. The fan and the phone were represented by 3D objects of a fan and a phone, respectively. At the position of the third source, playing a cartoon soundtrack, a TV receiver object was placed. The dimensions of the room and the positions of the sources within it corresponded to the physical room dimensions and source positions. A top view of the virtual room is shown in Fig. 2.

[0068] The virtual camera was controllable by means of a keyboard and mouse in a way similar to computer games with first person perspective.

[0069] Rendering of the audio component of the stimuli was synchronized with the 3D visual scene by linking the Unity 3D session with the Max MSP implementation of the proposed interpolation method via OSC messages. This allowed synchronization of the position and orientation of the virtual listener in the audio scene with the position and orientation of the virtual camera in the 3D visual scene. This system allowed for interactive audio-visual exploration of the virtual room in 5DoF. However, in order to better control the evaluation experiment, a pre-rendered video of the room was prepared in which the virtual viewer and listener move on a predefined path around the room. The movement trajectory in the pre-rendered video included two translation dimensions (front-back and left-right) and one rotation dimension (pitch). By removing the interactive aspect during the MUSHRA test and using a pre-rendered cinematic presentation instead, we were able to ensure that all participants of the experiment experienced the same stimuli. The visual component of the stimulus was rendered once and was used for all four audio stimuli described above.

[0070] The presentation system consisted of a personal computer with a player application enabling gapless playback switching between the various audio stimuli included in the test while at the same time displaying the visual component, which was common between all conditions. The test interface was presented to test subjects on a separate computer from the one used for stimuli presentation. Two questions were asked:

• Test 1: On a scale from 0 to 100, how natural and realistic is the acoustic localization of sound sources with respect to their position in the video?

• Test 2: On a scale from 0 to 100, how natural and smooth is the evolution of distance and position of sound objects when changing the listening point in the scene (translation and rotation)?

[0071] Additionally, participants were asked to write notes regarding the general listening impression.

[0072] The listening tests were done with 15 trained subjects with an average age of 29.5 years (standard deviation of 5.1). 4 subjects were female. 12 subjects had previous experience with MUSHRA listening tests. Most of the subjects were familiar with the acoustics of the room in which the test item was recorded. All of the subjects scored the Reference system over 90 in both tests; however, 2 of them scored the 1OA-based systems lower than the Anchor. Therefore, the scores of those subjects were removed from the statistical analysis of the results.

[0073] Fig. 3 shows the absolute scores with 95% confidence intervals for Test 1 and Test 2. For both tests the Reference system performed significantly better than the other assessed systems. Still, the performance of the 3OA-based system was rated as "Excellent" on the MUSHRA scale, with average scores of 79.5 for Test 1 and 79.8 for Test 2. The confidence intervals of the 1OA- and 3OA-based systems overlap by 4-5 MUSHRA points. However, in the differential scores (Fig. 4) it can be noticed that for both Tests the 3OA-based system performed better than the 1OA-based one, showing a statistically significant improvement.

[0074] It is noteworthy that, although the scores of Test 1 and Test 2 of individual subjects varied significantly, the averaged scores of these Tests show a high level of correlation.

[0075] As the results of the MUSHRA evaluation show, the proposed method can be a viable way to interpolate simultaneous adjacent ambisonics recordings, providing a decent level of consistency in terms of sound source localization and perception of the translation movement within the recorded audio scene. During the test subjects also reported that:

• The 3OA-based system had more convincing ambient sound than the Reference and 1OA-based systems.

• The 1OA- and 3OA-based systems sound more realistic in terms of recreation of the room acoustic properties.

• The 3OA-based system provides a better sense of localization and immersion of the sound than the 1OA-based system.

• Acoustic localization of the sound sources in the Reference signal is more obvious, but it sounds artificial.

[0076] The system and method according to the invention are highly applicable for virtual reality purposes. The computer program product according to the invention in some embodiments may be fed with signals already synchronized at the recording step, or may detect synchronization signals and execute channel synchronization prior to conversion of the sound signals to the ambisonic sound field.

[0077] It is stressed that the description above illustrates rather than limits the invention, and that those skilled in the art will be able to easily provide many alternative embodiments and recording scenarios.

[0078] The computer program product according to the invention may be provided on a tangible or non-tangible data carrier, including memory devices and data connections. Variants of the computer program product may be used directly in the recording process or in post-processing of previously recorded signals.