VILKAMO JUHA TAPIO (FI)
WO2021069793A1 | 2021-04-15 |
GB2595475A | 2021-12-01 |
US20200015028A1 | 2020-01-09 |
GB2574239A | 2019-12-04 |
CLAIMS 1. An apparatus comprising means for: obtaining a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; using at least the associated spatial metadata to determine directional distribution information for the indirect audio; determining rendering information corresponding to the determined directional distribution information; and enabling rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata. 2. An apparatus as claimed in claim 1, wherein the indirect audio comprises non-directional audio. 3. An apparatus as claimed in any preceding claim, wherein the indirect audio comprises diffuse audio. 4. An apparatus as claimed in any preceding claim, wherein the determined directional distribution information indicates one or more directions associated with the indirect audio. 5. An apparatus as claimed in any preceding claim, wherein the rendering information comprises a target covariance matrix of the audio signals. 6. An apparatus as claimed in any of claims 1 to 4, wherein the rendering information comprises diffuse sound gains for channels of a multichannel loudspeaker arrangement. 7. An apparatus as claimed in any preceding claim, wherein the means are for using at least the associated spatial metadata to determine direction information for the direct audio. 8. An apparatus as claimed in any preceding claim, wherein the associated spatial metadata comprises information that enables mixing of audio signals so as to enable rendering of the spatial audio in a selected audio format. 9.
An apparatus as claimed in any preceding claim, wherein the associated spatial metadata comprises, for one or more frequency sub-bands, information indicative of at least one of: a sound direction; and sound directionality. 10. An apparatus as claimed in any preceding claim, wherein the associated spatial metadata comprises, for one or more frequency sub-bands, one or more prediction coefficients. 11. An apparatus as claimed in any preceding claim, wherein the associated spatial metadata comprises one or more coherence parameters. 12. An electronic device comprising an apparatus as claimed in any preceding claim, wherein the electronic device is at least one of: a telephone, a camera, a computing device, a teleconferencing apparatus. 13. A method comprising: obtaining a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; using at least the associated spatial metadata to determine directional distribution information for the indirect audio; determining rendering information corresponding to the determined directional distribution information; and enabling rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata. 14. A method as claimed in claim 13, wherein the indirect audio comprises non-directional audio. 15. A method as claimed in any of claims 13 to 14, wherein the indirect audio comprises diffuse audio. 16. A method as claimed in any of claims 13 to 15, wherein the determined directional distribution information indicates one or more directions associated with the indirect audio. 17. A method as claimed in any of claims 13 to 16, wherein the rendering information comprises a target covariance matrix of the audio signals. 18.
A method as claimed in any of claims 13 to 16, wherein the rendering information comprises diffuse sound gains for channels of a multichannel loudspeaker arrangement. 19. A method as claimed in any of claims 13 to 18, wherein using at least the associated spatial metadata comprises determining direction information for the direct audio. 20. A method as claimed in any of claims 13 to 19, wherein the associated spatial metadata comprises information that enables mixing of audio signals so as to enable rendering of the spatial audio in a selected audio format. 21. A method as claimed in any of claims 13 to 20, wherein the associated spatial metadata comprises at least one of: for one or more frequency sub-bands, information indicative of at least one of: a sound direction; and sound directionality; and for one or more frequency sub-bands, one or more prediction coefficients. 22. A method as claimed in any of claims 13 to 21, wherein the associated spatial metadata comprises one or more coherence parameters. 23. A computer program comprising computer program instructions that, when executed by processing circuitry, cause: obtaining a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; using at least the associated spatial metadata to determine directional distribution information for the indirect audio; determining rendering information corresponding to the determined directional distribution information; and enabling rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata. 24.
An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; determine directional distribution information for the indirect audio using at least the associated spatial metadata; determine rendering information corresponding to the determined directional distribution information; and enable rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata.
where x(θ, Φ), y(θ, Φ), and z(θ, Φ) are the Cartesian coordinates corresponding to θ, Φ. C D,x (k) comprises a contribution of indirect audio that originates mostly from the X direction, C D,y (k) comprises a contribution of indirect audio that originates mostly from the Y direction and C D,z (k) comprises a contribution of indirect audio that originates mostly from the Z direction. The indirect audio binaural covariance matrix with directional distribution can be determined using the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) and the diffuse binaural covariance matrices in the X, Y, and Z directions C D,x (k), C D,y (k), and C D,z (k). For example, the indirect audio binaural covariance matrix with directional distribution can be determined by:

C D (k, n) = r x,diff (k, n) C D,x (k) + r y,diff (k, n) C D,y (k) + r z,diff (k, n) C D,z (k)

In examples where the indirect audio ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) are equal (that is, where each of the ratios is 1/3), the resulting indirect audio binaural covariance matrix C D (k, n) might be close to the uniform diffuse binaural covariance matrix C D,uni (k). In some examples, it might be possible to add slight tuning to the values of C D,x (k), C D,y (k), and C D,z (k) so that their average is exactly C D,uni (k). Therefore, rendering of audio signals with an even diffuse distribution (that is, where r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) are all 1/3) produces the same results as implementations that do not use examples of the disclosure. The covariance matrix determiner 411 provides covariance matrices 413 as an output. The covariance matrices 413 that are provided as the output can comprise the input covariance matrix C x (k, n) and the target covariance matrix C y (k, n). In the above equations it is implied that the processing is performed in a unified manner within the bins of each band k. In some examples the processing can be performed with a higher frequency resolution, such as for each frequency bin b.
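The weighted combination of the per-axis diffuse covariance matrices can be sketched as follows. This is a minimal illustration in Python/NumPy; the function name and 2x2 matrix size are illustrative, not taken from the specification:

```python
import numpy as np

def indirect_covariance(ratios, cov_x, cov_y, cov_z):
    # Weight the per-axis diffuse binaural covariance matrices by the
    # diffuse sound ratios and sum them. With equal ratios (1/3 each)
    # the result is the plain average of the three matrices, which
    # approximates the uniform diffuse covariance matrix.
    r_x, r_y, r_z = ratios
    return r_x * cov_x + r_y * cov_y + r_z * cov_z
```

With equal ratios the call reduces to averaging the three matrices, which matches the even-diffuse-distribution case described above.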
In such examples the equations given above would be adapted so that the covariance matrices are determined for each bin b, but using the parameters of the rendering metadata 315 for the band k where the bin resides. In some examples the input covariance matrices and the target covariance matrices can be temporally averaged. The temporal averaging could be implemented using infinite impulse response (IIR) averaging, finite impulse response (FIR) averaging or any other suitable type of temporal averaging. The covariance matrix determiner 411 can be configured to perform the temporal averaging so that the temporally averaged covariance matrices 413 are provided as an output. In this example for obtaining the target covariance matrix only parameters relating to direction and energy ratios have been considered. In other examples other parameters can be taken into consideration when obtaining the target covariance matrix. For example, in addition to the direction and energy ratios, spatial coherence parameters or any other suitable parameters could be considered. The use of other types of parameters can enable spatial audio outputs to be provided in formats other than binaural formats and/or can improve the accuracy with which the spatial sounds can be reproduced. The processing matrix determiner 415 is configured to receive the covariance matrices 413 C x (k, n) and C y (k, n) as an input. The processing matrix determiner 415 is configured to use the covariance matrices 413 C x (k, n) and C y (k, n) to determine processing matrices M(k, n) and M r (k, n). Any suitable process can be used to determine the processing matrices M(k, n) and M r (k, n). In some examples the process that is used can comprise determining mixing matrices for processing audio signals with a measured covariance matrix C x (k, n), so that they attain a determined target covariance matrix C y (k, n).
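The IIR-style temporal averaging mentioned above could, as one hedged sketch, be a one-pole smoother; the smoothing coefficient alpha is an illustrative choice, not a value from the specification:

```python
import numpy as np

def iir_average(cov_prev, cov_new, alpha=0.1):
    # One-pole IIR smoothing of a covariance matrix: the smoothed
    # estimate tracks the new per-frame estimate with a time constant
    # set by alpha (larger alpha reacts faster, smaller smooths more).
    return (1.0 - alpha) * cov_prev + alpha * cov_new
```

The same update would be applied independently to the input and target covariance matrices at each frame.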
Such methods can be used to generate binaural audio signals or surround loudspeaker signals or other types of audio signals. To formulate the processing matrices the method can comprise using a matrix such as a prototype matrix. The prototype matrix is a matrix that indicates, for the optimization procedure, which kind of signals are meant for each of the outputs. This can be within the constraint that the output must attain the target covariance matrix. In examples where the spatial audio output format is a binaural format, the prototype matrix could be:

[ 1.0 0.0
  0.0 1.0 ]

This prototype matrix indicates that the signal for the left ear is predominantly rendered from the left pre-processed transport channel and the signal for the right ear is predominantly rendered from the right pre-processed transport channel. In some examples the orientation of the user's head can be tracked. If it is determined that the user is now facing towards the rear half-sphere then the prototype matrix would be:

[ 0.0 1.0
  1.0 0.0 ]

The processing matrix determiner 415 may be configured to determine the processing matrices M(k, n) and M r (k, n), based on the prototype matrix and the input and target covariance matrices, using means described in Vilkamo, J., Bäckström, T., & Kuntz, A. (2013). Optimized covariance domain framework for time-frequency processing of spatial audio. Journal of the Audio Engineering Society, 61(6), 403-411. The processing matrix determiner 415 is configured to provide the processing matrices M(k, n) and M r (k, n) 417 as an output. The processing matrices M(k, n) and M r (k, n) 417 are provided as an input to a decorrelate and mix block 405. The decorrelate and mix block 405 also receives the pre-processed transport audio signals x(b, n) 403 as an input. The decorrelate and mix block 405 can comprise any means that can be configured to decorrelate and mix the pre-processed transport audio signals x(b, n) 403 based on the processing matrices M(k, n) and M r (k, n) 417.
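The head-orientation-dependent choice of prototype matrix could be sketched as follows. The identity and channel-swap matrices are an assumption consistent with the description (left output from left transport channel when facing front, channels swapped when facing the rear half-sphere); the function name is illustrative:

```python
import numpy as np

def prototype_matrix(facing_front):
    # Front-facing: left/right binaural outputs are drawn predominantly
    # from the left/right transport channels (identity prototype).
    # Rear-facing: the channels are swapped, reflecting the mirrored
    # left/right geometry when the listener faces backwards.
    if facing_front:
        return np.eye(2)
    return np.array([[0.0, 1.0],
                     [1.0, 0.0]])
```

In a head-tracked renderer the front/rear decision would be driven by the tracked yaw angle per frame.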
Any suitable process can be used to decorrelate and mix the pre-processed transport audio signals x(b, n) 403. In some examples the decorrelating and mixing of the pre-processed transport audio signals x(b, n) 403 can comprise processing the pre-processed transport audio signals x(b, n) 403 with the same prototype matrix that has been applied by the processing matrix determiner 415 and decorrelating the result to generate decorrelated signals x D (b, n). The decorrelated signals x D (b, n) (and the pre-processed transport audio signals x(b, n) 403) can then be mixed using any suitable mixing procedure to generate time-frequency audio signals 407. In some examples the following mixing procedure can be used to generate the time-frequency audio signals 407:

y(b, n) = M(k, n) x(b, n) + M r (k, n) x D (b, n)

where the band k is the band in which bin b resides. As mentioned previously, the notation that has been used here implies that the temporal resolutions of the processing matrices M(k, n) and M r (k, n) 417 and the pre-processed transport audio signals x(b, n) 403 are the same. In other examples they could have different temporal resolutions. For example, the temporal resolution of the processing matrices 417 could be sparser than the temporal resolution of the pre-processed transport audio signals 403. In such examples an interpolation process, such as linear interpolation, could be applied to the processing matrices 417 so as to achieve the same temporal resolution as the pre-processed transport audio signals 403. The interpolation rate can be dependent on any suitable factor. For example, the interpolation rate can be dependent on whether or not an onset has been detected. Fast interpolation can be used if an onset has been detected and normal interpolation can be used if an onset has not been detected. The decorrelate and mix block 405 provides the time-frequency spatial audio signals 407 as an output.
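The mixing procedure above can be sketched for a single time-frequency tile; the variable names mirror the notation in the text, and the function name is illustrative:

```python
import numpy as np

def mix_tile(M, M_r, x, x_d):
    # y(b, n) = M(k, n) x(b, n) + M_r(k, n) x_D(b, n):
    # combine the pre-processed transport signals with their
    # decorrelated versions using the two processing matrices.
    return M @ x + M_r @ x_d
```

In a full renderer this would be evaluated for every bin b, using the matrices of the band k in which the bin resides.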
The time-frequency spatial audio signals 407 are provided as an input to an inverse filter bank 409. The inverse filter bank 409 is configured to apply an inverse transform to the time-frequency spatial audio signals 407. The inverse transform that is applied to the time-frequency spatial audio signals 407 can be a corresponding transform to the one that is used to convert the decoded transport audio signals 323 to time-frequency transport audio signals 327 in Fig. 3. The inverse filter bank 409 is configured to provide spatial audio output 111 as an output. The spatial audio output 111 is provided in any suitable audio format. Different examples could use methods other than the covariance-matrix-based rendering of the example in Fig. 4. For instance, in other examples the audio signals could be divided into directional and non-directional parts. A ratio parameter from the spatial metadata could be used to divide the signals into directional and non-directional parts. The directional part could then be positioned to virtual loudspeakers using amplitude panning or any other suitable means. The non-directional part could be distributed to all loudspeakers and decorrelated. The processed directional and non-directional parts could then be added together. Each of the virtual loudspeakers can then be processed with HRTFs to obtain the binaural output. Systems that implement examples of the disclosure can therefore provide a distribution of indirect audio to virtual loudspeakers in which the indirect audio is not evenly distributed but instead is based on the indirect audio ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) and the directions of the virtual loudspeakers.
As an example, the non-directional sound gain for a virtual loudspeaker could be obtained by multiplying the squared x-coordinate of the virtual loudspeaker by the diffuse sound ratio r x,diff (k, n), multiplying the squared y-coordinate of the virtual loudspeaker by the diffuse sound ratio r y,diff (k, n), multiplying the squared z-coordinate of the virtual loudspeaker by the diffuse sound ratio r z,diff (k, n), and summing the results. Then, the non-directional sound gains of all the virtual loudspeakers could be normalized so that the squared sum of them equals one. A similar approach could also be used for multichannel loudspeaker output, in which case the virtual loudspeakers would be replaced by actual loudspeakers. In the examples described above the bitstream 107 only comprises a single encoded transport audio signal 319 and mixing is performed using the single transport audio signal and decorrelated versions of it. As a result, each input signal to the mixing had the same energy, and the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) could be computed without taking the energy into account. It is not necessary to compute the energy for the decorrelated channels because this should correspond to the energy of the transport audio signal from which they were created (that is, the first, omnidirectional, transport audio signal). The energy values for the transport audio signals can be used, instead. In some examples, the bitstream 107 could comprise a plurality of transport audio signals. In such examples the energy needs to be taken into account when the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) are being computed. In such examples, the energies E(k, n, j) of the transport audio signals are computed in frequency bands:

E(k, n, j) = Σ b |s'(b, n, j)|²

where the sum is over the bins b of band k and s'(b, n, j) is the j:th channel signal of s'(b, n).
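The virtual-loudspeaker gain computation described above can be sketched as follows, with loudspeaker directions given as unit vectors; the function name is illustrative:

```python
import numpy as np

def nondirectional_gains(speaker_dirs, r_x, r_y, r_z):
    # Per speaker: gain = x^2 * r_x + y^2 * r_y + z^2 * r_z, i.e. the
    # squared Cartesian coordinates weighted by the diffuse sound
    # ratios and summed. Then normalize so the squared sum of the
    # gains over all speakers equals one.
    d = np.asarray(speaker_dirs, dtype=float)
    g = d[:, 0]**2 * r_x + d[:, 1]**2 * r_y + d[:, 2]**2 * r_z
    return g / np.sqrt(np.sum(g**2))
```

For actual multichannel loudspeaker output the same function would be applied to the physical loudspeaker directions.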
The diffuse sound gains can then be computed using the energies E(k, n, j) and the mixing matrix A(i, j, k, n). It should be noted that these equations could also be used for scenarios with a single transport audio signal, even though they are computationally more complex. The rest of the processing can be performed as was presented above. That is, the diffuse sound ratios can be computed using these diffuse sound gains. Examples of the disclosure can be implemented in systems that allow for head tracking of the listener's head orientation. In such examples the matrix A(i, j, k, n) can be rotated using a rotation matrix according to the listener head orientation before the diffuse sound gains g x,diff (k, n), g y,diff (k, n), and g z,diff (k, n) are estimated. For example, when an FOA signal is generated from the transport audio signals by y(b, n) = A(k, n) s'(b, n), then a rotated FOA signal could be generated by

ŷ(b, n) = R(n) A(k, n) s'(b, n) = Â(k, n) s'(b, n)

where R(n) is an FOA rotation matrix according to the listener head orientation. The rotation matrix can mix the X, Y, Z channels to new X, Y, Z channels so that they are aligned according to the current listener head position. Therefore, using the rotated mixing matrix Â(k, n) in place of A(k, n) in the equations enables the head orientation to be taken into account. In the above example the spatial metadata in the spatial audio signal was provided in an FOA format. Other formats could be used for the spatial metadata in other examples. In such examples, the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) can be estimated in a different way. For instance, in some examples the spatial metadata can comprise direction (azimuth, elevation) θ(k, n), Φ(k, n) and direct-to-total energy ratio r(k, n) parameters. This kind of spatial metadata can be obtained from mobile devices having a microphone array attached to them or from any other suitable type of device or by using any suitable processes.
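The rotation of the mixing matrix according to head orientation could be sketched as follows. The channel ordering W, X, Y, Z and the helper name are assumptions for illustration (FOA channel conventions vary); the W channel is left unchanged and only the X, Y, Z rows are mixed by the 3x3 head-rotation matrix:

```python
import numpy as np

def rotate_foa_mixing(A, R3):
    # Build a 4x4 FOA rotation: the omnidirectional W component is
    # invariant under rotation, while the X/Y/Z components are mixed
    # by the 3x3 head-rotation matrix R3. The rotated mixing matrix
    # A_hat = R_foa @ A is then used in place of A.
    R_foa = np.eye(4)
    R_foa[1:, 1:] = np.asarray(R3, dtype=float)
    return R_foa @ A
```

With the identity rotation the mixing matrix is unchanged, corresponding to the listener facing the reference direction.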
In such examples, the diffuse sound ratios can be estimated based on an average direction. This can be implemented as follows. The directions are converted to Cartesian coordinates x(k, n), y(k, n), z(k, n):

x(k, n) = cos(Φ(k, n)) cos(θ(k, n))
y(k, n) = cos(Φ(k, n)) sin(θ(k, n))
z(k, n) = sin(Φ(k, n))

The diffuse sound gains can be determined by averaging the absolute values of these coordinates over time, weighted by how "diffuse" the sound is estimated to be (e.g., by 1 − r(k, n)), for example:

g x,diff (k, n) = (1/N) Σ n'=n−N+1..n |x(k, n')| (1 − r(k, n'))

where N denotes the time interval over which the averaging is performed, and correspondingly for g y,diff (k, n) and g z,diff (k, n). The averaging can also be performed using IIR (infinite impulse response) smoothing or any other suitable process. The rest of the processing can be performed as described above. For example, the diffuse sound ratios can be computed using these diffuse sound gains. These formulas for the diffuse sound gains can also be energy-weighted so that time indices n' with a higher energy have a greater effect on the average result than time indices n' with a lower energy. In some examples the spatial metadata could comprise a plurality of different types of parameters or different types of parameters could be obtained from the spatial metadata. For example, the spatial metadata could comprise both SPAR (Spatial Audio Rendering) and DirAC (Directional Audio Coding) parameters or any other suitable type of parameters. In such cases a first type of parameters could be used to compute the diffuse sound ratios and a second type of parameters could be used for the rendering. For example, the diffuse sound ratios could be computed using the SPAR parameters and the DirAC parameters could be used for the rendering. In some examples the spatial metadata might be available in a first format for a first set of frequencies and the spatial metadata might be available in a second format for a second set of frequencies. For instance, the spatial metadata could comprise SPAR parameters for a first set of frequencies and DirAC parameters for a second set of frequencies.
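The conversion and weighted averaging described above can be sketched as follows, for one band with arrays over the time indices n'. The 1/N normalization and the azimuth/elevation convention (azimuth in the horizontal plane, elevation from it) are assumptions for illustration:

```python
import numpy as np

def gains_from_directions(azi, ele, ratio):
    # Convert (azimuth, elevation) direction parameters to Cartesian
    # unit vectors, then average the absolute coordinates weighted by
    # (1 - ratio), i.e. by how diffuse each time index is estimated
    # to be. Returns (g_x, g_y, g_z) for the band.
    azi, ele, ratio = (np.asarray(a, dtype=float) for a in (azi, ele, ratio))
    x = np.cos(ele) * np.cos(azi)
    y = np.cos(ele) * np.sin(azi)
    z = np.sin(ele)
    w = 1.0 - ratio          # diffuseness weight per time index
    n = len(w)
    return (np.sum(np.abs(x) * w) / n,
            np.sum(np.abs(y) * w) / n,
            np.sum(np.abs(z) * w) / n)
```

An energy-weighted variant would additionally scale each term by the frame energy before normalizing.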
In these cases, different processes could be used for estimating the diffuse sound ratios for the different frequencies. In some cases, different processes can also be used for determining the rendering metadata 315 at different frequencies. In the examples shown in Figs. 3 and 4 the directional distribution information such as the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) is estimated within the decoder 109. In some examples the directional distribution information such as the diffuse sound ratios r x,diff (k, n), r y,diff (k, n), and r z,diff (k, n) can be estimated within the encoder 105 and then transmitted to the decoder 109. The directional distribution information can be transmitted with the spatial metadata and any other suitable parameters. In the above examples the directional distribution information was obtained for the X, Y and Z axes separately. In other examples the directional distribution information can be obtained for other coordinate systems. For instance, the diffuse component could be determined in a rotated coordinate system such that the rotation adaptively maximizes, or substantially maximizes, the energy of the first-axis diffuse component. As an example, there could be one source at 45 degrees azimuth and 45 degrees elevation, and another at -135 degrees azimuth and -45 degrees elevation (that is, in the opposite direction). In this case, an embodiment would measure and reproduce the diffuse component predominantly on that axis, instead of focusing on fixed X, Y and Z coordinates. In some examples, the organization of the diffuse component does not need to follow any rotated or unrotated set of axes.
For example, when a FOA covariance matrix is determined, based on measuring the covariance matrix of signal y(b, n) = A(k, n) s'(b, n), it is possible to use a minimum-variance distortionless response (MVDR) beamforming method to determine the spatial energy spectrum at a surrounding spatial distribution of directions. To do this the energy from the d:th direction (that is, corresponding to sound arriving from direction of arrival d (DOA d )) can be denoted E'(d, k, n). These energy values can be utilized to spatially weight the diffuse binaural covariance matrix:

C d (k, n) = ( Σ d h(DOA d , k) h H (DOA d , k) E'(d, k, n) ) / ( Σ d E'(d, k, n) )

where h(DOA d , k) is the HRTF vector for DOA d and band k. In the above formula, it is also possible to use temporal averaging over prior temporal indices and to apply (1 − r(k, n)) weighting to emphasize the more diffuse temporal steps in the estimate. The above formula to determine C d (k, n) can also be used when the method of determining the diffuse sound distribution follows a coordinate system such as X, Y, Z or a rotated one, by first mapping the ratio values g (x,y,z),diff (k, n) (or rotated similar values) to the spatial energy distribution values E'(d, k, n), and then applying the above formula. In the above described examples, the decorrelated sound was generated based on decorrelating the left and right pre-processed transport audio signals 403 and mixing them to obtain a residual component. In mono cases where there is a single transport audio signal this means that the mono sound is decorrelated to left and right decorrelated sounds. The left and right decorrelated sounds are mixed using a covariance-matrix based rendering scheme. This can assume an input covariance matrix (of the decorrelated part) to be a diagonal matrix.
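The energy-weighted diffuse binaural covariance can be sketched as follows, for D directions and two binaural channels; the function name and array shapes are illustrative:

```python
import numpy as np

def weighted_diffuse_covariance(hrtfs, energies):
    # hrtfs: (D, 2) complex HRTF vectors h(DOA_d, k) for one band k.
    # energies: (D,) spatial energy estimates E'(d, k, n).
    # C = sum_d h h^H E'(d) / sum_d E'(d): each direction contributes
    # its HRTF outer product, weighted by its estimated energy.
    H = np.asarray(hrtfs, dtype=complex)
    E = np.asarray(energies, dtype=float)
    C = np.einsum('di,dj,d->ij', H, H.conj(), E)
    return C / np.sum(E)
```

With a uniform energy distribution this reduces to the plain average of the HRTF outer products over all directions.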
In some examples, the mono transport audio signal can be decorrelated just once and a two-channel signal can be generated by providing the decorrelated signal to the first channel, and an inverted (multiplied by -1) decorrelated signal to the second channel. In such cases, the input covariance matrix of the decorrelated part is not diagonal; instead, the cross-term of the covariance matrix is the same as the diagonal value, but with a negative sign. This procedure can be used for the situations where the decorrelators themselves fall short in generating signals that can be assumed to be fully incoherent. In some examples, the diffuse binaural covariance matrix can be generated based on the estimated FOA covariance matrix. In such examples, the transport signals covariance matrix can be determined by

C s (k, n) = Σ b s'(b, n) s' H (b, n)

where the sum is over the bins b of band k. Then, C s (k, n) is zero-padded to size 4x4, and the first diagonal value is placed on the zero-padded diagonal values, to obtain a padded matrix C s' (k, n). The FOA covariance matrix is then

C FOA (k, n) = A(k, n) C s' (k, n) A H (k, n)

The FOA matrix can be rotated according to the head orientation by

C FOA,rot (k, n) = R(n) C FOA (k, n) R T (n)

Then, the diffuse binaural covariance matrix can be determined as

C d (k, n) = H FOA (k) C FOA,rot (k, n) H FOA H (k)

where H FOA (k) is an FOA-to-binaural processing matrix. In this method, if the spatial metadata denotes that audio is indirect audio then the target covariance matrix is predominantly based on the FOA-to-binaural rendering scheme. However, if the spatial metadata denotes the sound to be direct audio then the target covariance matrix is predominantly based on the rendering metadata consisting of the direction parameters. This covariance matrix C d (k, n) can also be estimated by first generating a signal y d (b, n) = H FOA (k) R(n) A(k, n) s'(b, n) and then measuring the covariance matrix of the signal y d (b, n). Fig. 5 schematically shows an example apparatus 501 that could be used in some examples of the disclosure.
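The chain of matrix operations above can be sketched end to end. The identity matrices used in the illustration stand in for real mixing, rotation, and FOA-to-binaural matrices; all names are illustrative:

```python
import numpy as np

def diffuse_binaural_from_foa(C_s_padded, A, R, H_foa):
    # C_FOA = A C_s' A^H: map the padded transport covariance to FOA.
    C_foa = A @ C_s_padded @ A.conj().T
    # Rotate the FOA covariance according to the head orientation R.
    C_rot = R @ C_foa @ R.T
    # Map to a binaural covariance with the FOA-to-binaural matrix.
    return H_foa @ C_rot @ H_foa.conj().T
```

Shapes: C_s_padded and A are 4x4, R is a 4x4 FOA rotation, and H_foa is 2x4, giving a 2x2 binaural covariance.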
The apparatus 501 could comprise a controller apparatus and could be provided within an electronic device such as a telephone, a camera, a computing device, a teleconferencing apparatus or any other suitable type of device. In the example of Fig. 5 the apparatus 501 comprises at least one processor 503 and at least one memory 505. It is to be appreciated that the apparatus 501 could comprise additional components that are not shown in Fig. 5. In the example of Fig. 5 the apparatus 501 can be implemented as processing circuitry. In some examples the apparatus 501 can be implemented in hardware alone, can have certain aspects in software (including firmware) alone, or can be a combination of hardware and software (including firmware). As illustrated in Fig. 5 the apparatus 501 can be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 507 in a general-purpose or special-purpose processor 503 that can be stored on a computer readable storage medium (disk, memory etc.) to be executed by such a processor 503. The processor 503 is configured to read from and write to the memory 505. The processor 503 can also comprise an output interface via which data and/or commands are output by the processor 503 and an input interface via which data and/or commands are input to the processor 503. The memory 505 is configured to store a computer program 507 comprising computer program instructions (computer program code 509) that controls the operation of the apparatus 501 when loaded into the processor 503. The computer program instructions, of the computer program 507, provide the logic and routines that enable the apparatus 501 to perform the methods illustrated in Figs. 2 to 4. The processor 503, by reading the memory 505, is able to load and execute the computer program 507.
The apparatus 501 therefore comprises: at least one processor 503; and at least one memory 505 including computer program code 509, the at least one memory 505 and the computer program code 509 configured to, with the at least one processor 503, cause the apparatus 501 at least to perform: obtaining a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; using, at least the associated spatial metadata to determine directional distribution information for the indirect audio; determining rendering information corresponding to the determined directional distribution information; and enabling rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata. As illustrated in Fig. 5 the computer program 507 can arrive at the apparatus 501 via any suitable delivery mechanism 511. The delivery mechanism 511 can be, for example, a machine readable medium, a computer-readable medium, a non-transitory computer-readable storage medium, a computer program product, a memory device, a record medium such as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a solid-state memory, an article of manufacture that comprises or tangibly embodies the computer program 507. The delivery mechanism can be a signal configured to reliably transfer the computer program 507. The apparatus 501 can propagate or transmit the computer program 507 as a computer data signal. 
In some examples the computer program 507 can be transmitted to the apparatus 501 using a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPan (IPv6 over low power personal area networks), ZigBee, ANT+, near field communication (NFC), radio frequency identification, wireless local area network (wireless LAN) or any other suitable protocol. The computer program 507 comprises computer program instructions for causing an apparatus 501 to perform at least the following: obtaining a spatial audio signal comprising one or more audio signals and associated spatial metadata wherein the associated spatial metadata is configured to enable rendering of spatial audio from the one or more audio signals and wherein the spatial audio comprises direct audio and indirect audio; using at least the associated spatial metadata to determine directional distribution information for the indirect audio; determining rendering information corresponding to the determined directional distribution information; and enabling rendering of the spatial audio using the determined rendering information, the one or more audio signals and the associated spatial metadata. The computer program instructions can be comprised in a computer program 507, a non-transitory computer readable medium, a computer program product, a machine readable medium. In some but not necessarily all examples, the computer program instructions can be distributed over more than one computer program 507. Although the memory 505 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable and/or can provide permanent/semi-permanent/dynamic/cached storage. Although the processor 503 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable.
The processor 503 can be a single-core or multi-core processor.

References to "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc. or to a "controller", "computer", "processor" etc. should be understood to encompass not only computers having different architectures, such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures, but also specialized circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed-function device, gate array or programmable logic device etc.

As used in this application, the term "circuitry" can refer to one or more or all of the following: (a) hardware-only circuitry implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g. firmware) for operation, but the software might not be present when it is not needed for operation. This definition of circuitry applies to all uses of this term in this application, including in any claims.
As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.

The blocks illustrated in Figs. 2 to 4 can represent steps in a method and/or sections of code in the computer program 507. The illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks, and the order and arrangement of the blocks can be varied. Furthermore, it can be possible for some blocks to be omitted.

The term 'comprise' is used in this document with an inclusive, not an exclusive, meaning. That is, any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use 'comprise' with an exclusive meaning then it will be made clear in the context by referring to "comprising only one..." or by using "consisting".

In this description, reference has been made to various examples. The description of features or functions in relation to an example indicates that those features or functions are present in that example. The use of the term 'example' or 'for example' or 'can' or 'may' in the text denotes, whether explicitly stated or not, that such features or functions are present in at least the described example, whether described as an example or not, and that they can be, but are not necessarily, present in some of or all other examples. Thus 'example', 'for example', 'can' or 'may' refers to a particular instance in a class of examples.
A property of the instance can be a property of only that instance, or a property of the class, or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example can, where possible, be used in that other example as part of a working combination but does not necessarily have to be used in that other example.

Although examples have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the claims. Features described in the preceding description may be used in combinations other than the combinations explicitly described above. Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain examples, those features may also be present in other examples whether described or not.

The term 'a' or 'the' is used in this document with an inclusive, not an exclusive, meaning. That is, any reference to X comprising a/the Y indicates that X may comprise only one Y or may comprise more than one Y, unless the context clearly indicates the contrary. If it is intended to use 'a' or 'the' with an exclusive meaning then it will be made clear in the context. In some circumstances the use of 'at least one' or 'one or more' may be used to emphasize an inclusive meaning, but the absence of these terms should not be taken to imply any exclusive meaning.

The presence of a feature (or combination of features) in a claim is a reference to that feature (or combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features).
The equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way. The equivalent features include, for example, features that perform substantially the same function, in substantially the same way, to achieve substantially the same result.

In this description, reference has been made to various examples using adjectives or adjectival phrases to describe characteristics of the examples. Such a description of a characteristic in relation to an example indicates that the characteristic is present in some examples exactly as described and is present in other examples substantially as described.

Whilst endeavoring in the foregoing specification to draw attention to those features believed to be of importance, it should be understood that the Applicant may seek protection via the claims in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings, whether or not emphasis has been placed thereon.

I/we claim: