

Title:
DEVICES AND METHODS FOR PERSONALIZED BINAURAL AUDIO RENDERING
Document Type and Number:
WIPO Patent Application WO/2023/143727
Kind Code:
A1
Abstract:
A data processing apparatus (110) is disclosed for providing a head-related transfer function, HRTF, and/or head-related impulse response, HRIR, personalized for a user (120). The data processing apparatus (110) comprises a processing circuitry (111) configured to obtain information about a pinna concha height value and a pinna concha width value of an ear of the user (120) and provide at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120). The data processing apparatus (110) allows quick and efficient provision of high-quality personalized HRTFs and/or HRIRs for a large number of users, which helps to improve spatial localization and timbral accuracy of binaural audio.

Inventors:
DINAKARAN MANOJ (DE)
PANG LIYUN (DE)
GROSCHE PETER (DE)
POLLOW MARTIN (DE)
Application Number:
PCT/EP2022/052016
Publication Date:
August 03, 2023
Filing Date:
January 28, 2022
Assignee:
HUAWEI TECH CO LTD (CN)
DINAKARAN MANOJ (DE)
International Classes:
H04S1/00
Domestic Patent References:
WO2012104297A12012-08-09
Foreign References:
US20060274901A12006-12-07
US20130169779A12013-07-04
US20190014431A12019-01-10
Other References:
ZOTKIN D N ET AL: "HRTF personalization using anthropometric measurements", APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2003 IEEE WORKSHOP ON, NEW PALTZ, NY, USA, OCT. 19-22, 2003, PISCATAWAY, NJ, USA, IEEE, 19 October 2003 (2003-10-19), pages 157 - 160, XP010697926, ISBN: 978-0-7803-7850-6, DOI: 10.1109/ASPAA.2003.1285855
DELLEPIANE M ET AL: "Reconstructing head models from photographs for individualized 3D-audio processing", COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS, WILEY-BLACKWELL, OXFORD, vol. 27, no. 7, 23 January 2009 (2009-01-23), pages 1719 - 1727, XP071487533, ISSN: 0167-7055, DOI: 10.1111/J.1467-8659.2008.01316.X
Attorney, Agent or Firm:
KREUZ, Georg M. (DE)
Claims:
CLAIMS

1. A data processing apparatus (110) for providing a head-related transfer function, HRTF, and/or head-related impulse response, HRIR, personalized for a user (120), wherein the data processing apparatus (110) comprises a processing circuitry (111) configured to: obtain information about a pinna concha height value and a pinna concha width value of an ear of the user (120); and provide at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120).

2. The data processing apparatus (110) of claim 1, wherein the data processing apparatus (110) is configured to obtain an image of the ear of the user (120) and to determine, based on the image, the pinna concha height value and the pinna concha width value for obtaining the information about the pinna concha height value and the pinna concha width value of the ear of the user (120).

3. The data processing apparatus (110) of claim 1 or 2, wherein the data processing apparatus (110) further comprises a memory (115) configured to store a plurality of 3D mesh models for a plurality of pinna concha height anchor values and a plurality of pinna concha width anchor values.

4. The data processing apparatus (110) of any one of the preceding claims, wherein for providing the at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120) the processing circuitry (111) is configured to: select at least one of a plurality of 3D mesh models of at least a portion of a generic human head, including an ear, based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120); and generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model.

5. The data processing apparatus (110) of claim 4, wherein the processing circuitry (111) is configured to select the at least one 3D mesh model based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120) by selecting those 3D mesh models of the plurality of 3D mesh models, for which an absolute difference between the pinna concha height value of the ear of the user (120) and the pinna concha height anchor value is smaller than a threshold and/or an absolute difference between the pinna concha width value of the ear of the user (120) and the pinna concha width anchor value is smaller than a threshold.

6. The data processing apparatus (110) of claim 5, wherein the threshold is about 0.3 cm.

7. The data processing apparatus (110) of claim 5 or 6, wherein the processing circuitry (111) is configured to generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model by adjusting the selected 3D mesh model such that an absolute difference between the pinna concha height value of the ear of the user (120) and a pinna concha height value of the adjusted selected 3D mesh model is smaller than a resolution threshold and/or an absolute difference between the pinna concha width value of the ear of the user (120) and a pinna concha width value of the adjusted selected 3D mesh model is smaller than a resolution threshold and by determining the at least one personalized HRTF and/or HRIR based on the adjusted selected 3D mesh model.

8. The data processing apparatus (110) of claim 7, wherein the resolution threshold is between 0.020 and 0.030 cm.

9. The data processing apparatus (110) of any one of the preceding claims, wherein the data processing apparatus (110) further comprises a memory (115) configured to store the at least one personalized HRTF and/or HRIR as at least one anchor HRTF and/or HRIR together with the pinna concha height value and the pinna concha width value of the ear of the user (120).

10. The data processing apparatus (110) of claim 1, wherein for providing the at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120) the processing circuitry (111) is configured to select the at least one personalized HRTF and/or HRIR by selecting those anchor HRTFs and/or HRIRs of a plurality of anchor HRTFs and/or HRIRs for which an absolute difference between the pinna concha height value of the ear of the user (120) and a pinna concha height value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold and/or an absolute difference between the pinna concha width value of the ear of the user (120) and a pinna concha width value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold.

11. The data processing apparatus (110) of claim 10, wherein the resolution threshold is between 0.020 and 0.030 cm.

12. An audio rendering apparatus (100) for rendering of an input signal, wherein the audio rendering apparatus (100) comprises: a data processing apparatus (110) of any one of the preceding claims; and one or more transducers (101a, 101b) configured to render one or more audio signals based on the input signal and the at least one personalized HRTF and/or HRIR provided by the data processing apparatus (110).

13. The audio rendering apparatus (100) of claim 12, wherein the one or more transducers (101a, 101b) comprise one or more headphone transducers (101a, 101b).

14. A cloud server (200) for providing a head-related transfer function, HRTF, and/or a head-related impulse response, HRIR, personalized for a user (120), wherein the cloud server (200) comprises: a communication interface (203) configured to receive information about a pinna concha height value and a pinna concha width value of the user (120); and a data processing apparatus (110) according to any one of claims 1 to 11; wherein the communication interface (203) is further configured to transmit the at least one personalized HRTF and/or HRIR to an audio rendering apparatus (100).

15. The cloud server (200) of claim 14, wherein the information about the pinna concha height value and the pinna concha width value of the user (120) comprises an image of an ear of the user (120).

16. A data processing method (500) for providing a head-related transfer function, HRTF, and/or a head-related impulse response, HRIR, personalized for a user (120), wherein the data processing method (500) comprises: obtaining (501) information about a pinna concha height value and a pinna concha width value of an ear of the user (120); and providing (503) at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user (120).

17. A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (500) of claim 16, when the program code is executed by the computer or the processor.

Description:
Devices and methods for personalized binaural audio rendering

TECHNICAL FIELD

The present disclosure relates to audio processing and audio rendering in general. More specifically, the disclosure relates to devices and methods for personalized binaural audio rendering.

BACKGROUND

Binaural rendering may be used for rendering 3D audio over headphones based on spatial filters known as head-related transfer functions (HRTFs) or head-related impulse responses (HRIRs). These filters describe how a sound source at any given angle with respect to the head of a listener results in time, level and spectral differences of the received signals at the ear canals of the listener. However, these spatial filters are unique to the individual listener, since they depend on the anatomic details of the head and the ears of the listener. Generic HRTFs or HRIRs based on averaged head and ear shapes are typically used but have drawbacks in terms of incorrect perception of location of rendered sound sources as well as tonality. Personalized HRTFs or HRIRs, i.e. HRTFs or HRIRs adapted to the individual listener, provide an improved audio experience, but are more difficult to obtain.
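The filtering described above can be illustrated with a minimal sketch, assuming the HRIRs are available as short lists of filter taps. The function names `convolve` and `render_binaural` and the toy tap values are hypothetical and are not taken from the disclosure; a practical renderer would typically use measured HRIRs with many more taps and a fast convolution.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Full linear convolution of two finite sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Filter one mono source with the left-ear and right-ear HRIRs."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs (made-up values) for a source on the listener's left:
# the right ear receives the sound one sample later and attenuated.
hrir_left = [1.0, 0.0]
hrir_right = [0.0, 0.5]
left, right = render_binaural([1.0, 2.0, 3.0], hrir_left, hrir_right)
```

In this toy case the one-sample delay and the halved amplitude at the right ear stand in for the time and level differences mentioned above; spectral differences would arise from longer, frequency-shaping HRIRs.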

The most precise approach to obtain individual HRTFs or HRIRs personalized for a user is a direct acoustic measurement. This, however, requires a special setup in an anechoic chamber with many loudspeaker setups and a microphone inserted at the ear canal of the respective user, making it highly impractical if the number of users is large.

Another approach for obtaining individual HRTFs or HRIRs personalized for a user is based on a high-quality 3D scanning of the head and pinna of the respective user and using the scans for determining the individual HRTFs or HRIRs using numerical methods. Due to the necessity to obtain the high-quality 3D scans of the head and pinna of the respective user, this approach is also highly impractical if the number of users is large.

Another more practical approach for obtaining personalized HRTFs or HRIRs for a large number of users is to find, based on the respective user's anthropometric features, the best match from a database of HRTFs or HRIRs. The main challenge for this approach is that there are around 27 anthropometric parameters and that there usually is no proper correlation between the anthropometric parameters and HRTFs or HRIRs. Thus, using the entire set of anthropometric parameters may degrade the process of finding the best match HRTFs from the database.

SUMMARY

It is an objective to provide improved devices and methods for providing personalized HRTFs or HRIRs for personalized binaural audio rendering.

The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect a data processing apparatus is provided for providing a head-related transfer function, HRTF, and/or head-related impulse response, HRIR, personalized for a user. The data processing apparatus comprises a processing circuitry configured to obtain information about a pinna concha height value and a pinna concha width value of an ear of the user. Moreover, the processing circuitry is further configured to provide at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user.

The data processing apparatus according to the first aspect addresses the dual challenge of finding the most influential anthropometric parameters and creating an optimal database covering the entire range of the influential anthropometric parameters. The data processing apparatus according to the first aspect allows quick and efficient provision of high-quality personalized HRTFs and/or HRIRs for a large number of users, which helps to improve spatial localization and timbral accuracy of binaural audio.

In a further possible implementation form, the data processing apparatus is configured to obtain an image of the ear of the user and to determine, based on the image, the pinna concha height value and the pinna concha width value for obtaining the information about the pinna concha height value and the pinna concha width value of the ear of the user.

In a further possible implementation form, the data processing apparatus further comprises a memory configured to store a plurality of 3D mesh models for a plurality of pinna concha height anchor values and a plurality of pinna concha width anchor values. The memory may comprise a database to store the plurality of 3D mesh models as a 2D array of mesh models for the plurality of pinna concha height anchor values and the plurality of pinna concha width anchor values.

In a further possible implementation form, for providing the at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user the processing circuitry is configured to select at least one of a plurality of 3D mesh models of at least a portion of a human head, including an ear, based on the information about the pinna concha height value and the pinna concha width value of the ear of the user, and generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model. The human head may be a generic human head or a real human head.

In a further possible implementation form, the processing circuitry is configured to select the at least one 3D mesh model based on the information about the pinna concha height value and the pinna concha width value of the ear of the user by selecting those 3D mesh models of the plurality of 3D mesh models, for which an absolute difference between the pinna concha height value of the ear of the user and the pinna concha height anchor value is smaller than a threshold and/or an absolute difference between the pinna concha width value of the ear of the user and the pinna concha width anchor value is smaller than a threshold.

In a further possible implementation form, the threshold is about 0.3 cm.

In a further possible implementation form, the processing circuitry is configured to generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model by adjusting the selected 3D mesh model such that an absolute difference between the pinna concha height value of the ear of the user and a pinna concha height value of the adjusted selected 3D mesh model is smaller than a resolution threshold and/or an absolute difference between the pinna concha width value of the ear of the user and a pinna concha width value of the adjusted selected 3D mesh model is smaller than a resolution threshold and by determining the at least one personalized HRTF and/or HRIR based on the adjusted selected 3D mesh model.

In a further possible implementation form, the resolution threshold is between 0.020 and 0.030 cm. The resolution threshold may be about 0.025 cm.

In a further possible implementation form, the memory of the data processing apparatus is configured to store the at least one personalized HRTF and/or HRIR as at least one anchor HRTF and/or HRIR together with the pinna concha height value and the pinna concha width value of the ear of the user. This allows generating a database of personalized HRTFs and/or HRIRs for a plurality of pinna concha height and width values.

In a further possible implementation form, for providing the at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user the processing circuitry is configured to select the at least one personalized HRTF and/or HRIR by selecting those anchor HRTFs and/or HRIRs of a plurality of anchor HRTFs and/or HRIRs for which an absolute difference between the pinna concha height value of the ear of the user and a pinna concha height value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold and/or an absolute difference between the pinna concha width value of the ear of the user and a pinna concha width value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold.

In a further possible implementation form, the resolution threshold is between 0.020 and 0.030 cm. The resolution threshold may be about 0.025 cm.

According to a second aspect an audio rendering apparatus for rendering of an input signal is provided. The audio rendering apparatus comprises a data processing apparatus according to the first aspect. Moreover, the audio rendering apparatus comprises one or more transducers configured to render one or more audio signals based on the input signal and the at least one personalized HRTF and/or HRIR provided by the data processing apparatus. The audio rendering apparatus according to the second aspect may be a mobile phone. The audio rendering apparatus according to the second aspect may comprise a camera for capturing the image of the ear of the user.

In a further possible implementation form, the one or more transducers comprise one or more headphone transducers.

According to a third aspect a cloud server for providing a head-related transfer function, HRTF, and/or a head-related impulse response, HRIR, personalized for a user is provided. The cloud server comprises a communication interface configured to receive information about a pinna concha height value and a pinna concha width value of the user. Moreover, the cloud server comprises a data processing apparatus according to the first aspect. The communication interface of the cloud server is further configured to transmit the at least one personalized HRTF and/or HRIR to an audio rendering apparatus. The at least one personalized HRTF and/or HRIR may be transmitted to the audio rendering apparatus via a communication network, such as the Internet.

In a further possible implementation form, the information about the pinna concha height value and the pinna concha width value of the user comprises an image of an ear of the user.

According to a fourth aspect a data processing method for providing a head-related transfer function, HRTF, and/or a head-related impulse response, HRIR, personalized for a user is provided. The data processing method comprises: obtaining information about a pinna concha height value and a pinna concha width value of an ear of the user; and providing at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user.

The method according to the fourth aspect can be performed by the data processing apparatus according to the first aspect. Thus, further features of the method according to the fourth aspect result directly from the functionality of the data processing apparatus according to the first aspect as well as its different implementation forms and embodiments described above and below.

According to a fifth aspect a computer program product is provided, comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method according to the fourth aspect, when the program code is executed by the computer or the processor.

Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:

Fig. 1 is a schematic diagram illustrating an audio rendering apparatus comprising a data processing apparatus according to an embodiment for providing personalized HRTFs or HRIRs;

Fig. 2 is a schematic diagram illustrating a cloud server comprising a data processing apparatus according to an embodiment for providing personalized HRTFs or HRIRs to an audio rendering apparatus;

Fig. 3 is a diagram illustrating an algorithm for generating personalized HRTFs or HRIRs based on 3D mesh models implemented by a data processing apparatus according to an embodiment;

Fig. 4 is a schematic diagram illustrating steps implemented by a data processing apparatus according to an embodiment for generating a database of HRTFs or HRIRs and providing personalized HRTFs or HRIRs based on the database; and

Fig. 5 is a flow diagram illustrating a data processing method for providing personalized HRTFs or HRIRs according to an embodiment.

In the following, identical reference signs refer to identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following description, reference is made to the accompanying figures, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Figure 1 is a schematic diagram illustrating an audio rendering apparatus 100 according to an embodiment for binaural rendering of an input signal. In an embodiment, the audio rendering apparatus 100 may be a mobile phone 100 and may further comprise a communication interface 103 and a camera 107 as illustrated in figure 1.

As illustrated in figure 1, the audio rendering apparatus 100 comprises a left ear transducer 101a, e.g. a loudspeaker 101a, configured to generate a left ear audio signal based on a left ear transducer driver signal, and a right ear transducer 101b, e.g. a loudspeaker 101b, configured to generate a right ear audio signal based on a right ear transducer driver signal, for a user 120. In an embodiment, the transducers 101a, 101b may be implemented in the form of headphone transducers 101a, 101b.

Moreover, the audio rendering apparatus 100 shown in figure 1 comprises a data processing apparatus 110 according to an embodiment. As will be described in more detail below, the data processing apparatus 110 is configured to provide a head-related transfer function, HRTF, and/or head-related impulse response, HRIR, personalized for the user 120. To this end, the data processing apparatus 110 comprises a processing circuitry 111, e.g. one or more processors 111, configured to obtain information about a pinna concha height value and a pinna concha width value of an ear of the user 120 and provide at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120. The processing circuitry 111 may be implemented in hardware and/or software and may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. As illustrated in figure 1, the data processing apparatus 110 may further comprise a memory 115 configured to store executable program code which, when executed by the processing circuitry 111, causes the data processing apparatus 110 to perform the functions and methods described herein.

The transducers 101a, 101b of the audio rendering apparatus 100 are configured to render one or more audio signals based on an input signal and the at least one personalized HRTF and/or HRIR provided by the data processing apparatus 110.

Figure 2 is a schematic diagram illustrating a cloud server 200 comprising the data processing apparatus 110 according to an embodiment for providing personalized HRTFs or HRIRs to an audio rendering apparatus 100 according to a further embodiment. By means of the data processing apparatus 110 the cloud server 200 is configured to provide a HRTF and/or a HRIR personalized for the user 120. To this end, as illustrated in figure 2, the cloud server 200 may further comprise a communication interface 203 configured to receive the information about the pinna concha height value and the pinna concha width value of the user 120, for instance, from the audio rendering apparatus 100. The communication interface 203 is further configured to transmit the at least one personalized HRTF and/or HRIR via a communication network, such as the Internet, to the audio rendering apparatus 100 for enabling the audio rendering apparatus 100 to render audio based on the at least one personalized HRTF and/or HRIR.

In an embodiment, the information about the pinna concha height value and the pinna concha width value of the user 120 may comprise an image of an ear of the user 120. Thus, in an embodiment, the data processing apparatus 110 is configured to obtain an image of the ear of the user 120 and to determine, based on the image, the pinna concha height value and the pinna concha width value for obtaining the information about the pinna concha height value and the pinna concha width value of the ear of the user 120.

Further embodiments of the data processing apparatus 110 shown in figures 1 and 2 will be described in the following under further reference to figures 3 and 4. Figure 3 illustrates an algorithm for generating personalized HRTFs or HRIRs based on a plurality of 3D mesh models of at least a portion of a generic human head, including an ear, as implemented by the data processing apparatus 110 according to an embodiment, while figure 4 is a schematic diagram illustrating steps implemented by the data processing apparatus 110 according to an embodiment for generating a database 115a of HRTFs or HRIRs and providing personalized HRTFs or HRIRs based on the database 115a, which may be provided in the memory 115 of the data processing apparatus 110.

In an embodiment, the plurality of 3D mesh models of at least a portion of a generic human head, including an ear, may be stored for a plurality of different pinna concha height anchor values and a plurality of different pinna concha width anchor values in the database 115a provided in the memory 115 of the data processing apparatus 110. In an embodiment, the plurality of different pinna concha height anchor values and the plurality of different pinna concha width anchor values may have values in the range from about 2.4 to 3.6 centimeters and from about 1.2 to 2.2 centimeters, respectively. In the embodiment described in the context of figure 3, for providing the at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120 the processing circuitry 111 of the data processing apparatus 110 is configured to select at least one of the plurality of 3D mesh models of at least a portion of a generic human head, including an ear, based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120. Moreover, the processing circuitry 111 of the data processing apparatus 110 is configured to generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model.
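By way of illustration only, the storage of anchor mesh models as a 2D array indexed by height and width anchor values may be sketched as follows. The value ranges are those given above; the anchor spacing `ANCHOR_STEP_CM`, the helper names and the mesh identifiers are assumptions made for this sketch and are not specified in the disclosure.

```python
from typing import List

HEIGHT_RANGE_CM = (2.4, 3.6)  # pinna concha height range from the description
WIDTH_RANGE_CM = (1.2, 2.2)   # pinna concha width range from the description
ANCHOR_STEP_CM = 0.2          # assumption: the anchor spacing is not disclosed


def anchor_values(lo: float, hi: float, step: float) -> List[float]:
    """Evenly spaced anchor values from lo to hi (inclusive)."""
    count = int(round((hi - lo) / step)) + 1
    return [round(lo + i * step, 3) for i in range(count)]


def build_anchor_grid(heights: List[float], widths: List[float]) -> List[List[str]]:
    """2D array of placeholder mesh identifiers: rows are heights, columns are widths."""
    return [[f"mesh_h{h:.1f}_w{w:.1f}" for w in widths] for h in heights]


heights = anchor_values(*HEIGHT_RANGE_CM, ANCHOR_STEP_CM)
widths = anchor_values(*WIDTH_RANGE_CM, ANCHOR_STEP_CM)
grid = build_anchor_grid(heights, widths)
```

With the assumed spacing of 0.2 centimeters the grid holds 7 x 6 = 42 anchor models, and any measurement inside the stated ranges lies within 0.1 centimeters of an anchor value in each dimension.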

In an embodiment, the processing circuitry 111 of the data processing apparatus 110 is configured to select the at least one 3D mesh model based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120 by selecting those 3D mesh models of the plurality of 3D mesh models, for which an absolute difference between the pinna concha height value of the ear of the user 120 and the pinna concha height anchor value is smaller than a threshold and/or an absolute difference between the pinna concha width value of the ear of the user 120 and the pinna concha width anchor value is smaller than a threshold. In figure 3 this is illustrated as a plurality of orbits created around each of the plurality of different pinna concha height anchor values and the plurality of different pinna concha width anchor values. In an embodiment, the threshold may be about 0.3 centimeters for defining the plurality of orbits created around each of the plurality of different pinna concha height anchor values and the plurality of different pinna concha width anchor values.
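The threshold-based selection described above may be sketched as follows. The 0.3 centimeter threshold is taken from the description; the `AnchorModel` type, the `require_both` flag (which switches between the "and" and the "or" reading of the "and/or" condition) and the example anchor values are hypothetical.

```python
from typing import List, NamedTuple


class AnchorModel(NamedTuple):
    name: str
    height_cm: float  # pinna concha height anchor value
    width_cm: float   # pinna concha width anchor value


SELECTION_THRESHOLD_CM = 0.3  # "about 0.3 cm" per the description


def select_models(
    models: List[AnchorModel],
    user_height_cm: float,
    user_width_cm: float,
    threshold_cm: float = SELECTION_THRESHOLD_CM,
    require_both: bool = True,
) -> List[AnchorModel]:
    """Return the anchor models whose orbit contains the user's measurements."""
    selected = []
    for model in models:
        height_ok = abs(user_height_cm - model.height_cm) < threshold_cm
        width_ok = abs(user_width_cm - model.width_cm) < threshold_cm
        matches = (height_ok and width_ok) if require_both else (height_ok or width_ok)
        if matches:
            selected.append(model)
    return selected


models = [
    AnchorModel("A", 2.4, 1.2),
    AnchorModel("B", 3.0, 1.7),
    AnchorModel("C", 3.6, 2.2),
]
both = select_models(models, 3.1, 1.8)
either = select_models(models, 3.1, 1.3, require_both=False)
```

For a user with a concha height of 3.1 cm and a width of 1.8 cm only model B satisfies both conditions. Under the "or" reading, a user measuring 3.1 cm by 1.3 cm matches model A through the width condition and model B through the height condition.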

In an embodiment, the processing circuitry 111 of the data processing apparatus 110 is configured to generate the at least one personalized HRTF and/or HRIR based on the selected 3D mesh model by adjusting the selected 3D mesh model such that an absolute difference between the pinna concha height value of the ear of the user 120 and a pinna concha height value of the adjusted selected 3D mesh model is smaller than a resolution threshold and/or the absolute difference between the pinna concha width value of the ear of the user 120 and a pinna concha width value of the adjusted selected 3D mesh model is smaller than a resolution threshold and by determining the at least one personalized HRTF and/or HRIR based on the adjusted selected 3D mesh model, for instance, by means of numerical simulations based on the adjusted selected 3D mesh model. In an embodiment, the resolution threshold is between 0.020 and 0.030 centimeters, in particular 0.025 centimeters. In an embodiment, the data processing apparatus 110 is configured to store the thus determined personalized HRTF and/or HRIR as at least one anchor HRTF and/or HRIR together with the pinna concha height value and the pinna concha width value of the ear of the user 120 in the database 115a.
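The disclosure does not prescribe a particular deformation technique. Purely as an illustration of the resolution criterion, the following sketch assumes that the concha region is given as a small set of vertices, that the x axis corresponds to the concha width and the y axis to the concha height, and that a simple axis-wise scaling about the centroid is an acceptable stand-in for the adjustment. All names are hypothetical; a practical system would deform the full head and ear mesh smoothly before the numerical simulation.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, z) in centimeters

RESOLUTION_THRESHOLD_CM = 0.025  # within the 0.020 to 0.030 cm range of the description


def extent(vertices: List[Vertex], axis: int) -> float:
    """Size of the vertex set along one axis."""
    coords = [v[axis] for v in vertices]
    return max(coords) - min(coords)


def adjust_concha(vertices: List[Vertex], target_height_cm: float,
                  target_width_cm: float) -> List[Vertex]:
    """Scale x (width) and y (height) about the centroid to reach the targets."""
    width, height = extent(vertices, 0), extent(vertices, 1)
    if width == 0 or height == 0:
        raise ValueError("degenerate concha region")
    sx, sy = target_width_cm / width, target_height_cm / height
    cx = sum(v[0] for v in vertices) / len(vertices)
    cy = sum(v[1] for v in vertices) / len(vertices)
    return [(cx + (x - cx) * sx, cy + (y - cy) * sy, z) for x, y, z in vertices]


def within_resolution(vertices: List[Vertex], target_height_cm: float,
                      target_width_cm: float,
                      threshold_cm: float = RESOLUTION_THRESHOLD_CM) -> bool:
    """Check the height and width differences against the resolution threshold."""
    return (abs(target_height_cm - extent(vertices, 1)) < threshold_cm
            and abs(target_width_cm - extent(vertices, 0)) < threshold_cm)


# Toy anchor concha of 3.0 cm height and 1.6 cm width (made-up vertices).
anchor = [(0.0, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 3.0, 0.5), (0.0, 3.0, 0.5)]
adjusted = adjust_concha(anchor, target_height_cm=3.1, target_width_cm=1.8)
```

Here the unadjusted anchor misses the user's height by 0.1 cm and therefore fails the resolution check, whereas the scaled vertex set satisfies it in both dimensions.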

As already mentioned above, figure 4 is a schematic diagram illustrating steps implemented by the data processing apparatus 110 according to an embodiment for generating the database 115a of personalized HRTFs or HRIRs and providing the personalized HRTFs or HRIRs from the database 115a. As will be described in more detail below, in the embodiment shown in figure 4, for providing the personalized HRTFs and/or HRIRs based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120, the processing circuitry 111 of the data processing apparatus 110 is configured to select the personalized HRTFs and/or HRIRs by selecting those anchor HRTFs and/or HRIRs of the plurality of anchor HRTFs and/or HRIRs stored in the database 115a for which an absolute difference between the measured pinna concha height value of the ear of the user 120 and the pinna concha height value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold and/or an absolute difference between the measured pinna concha width value of the ear of the user 120 and a pinna concha width value associated with the respective anchor HRTF and/or HRIR is smaller than a resolution threshold.

In a stage 401 illustrated in an upper portion of figure 4, the pinna concha height and width values of the ear of the user 120 are measured, i.e. determined, based on an image of the ear of the user 120. In a further stage 403 of figure 4, the best matching anchor HRTFs and/or HRIRs for the measured pinna concha height and width values of the ear of the user 120 are selected from the database 115a. In a further stage, the best matching anchor HRTFs and/or HRIRs are provided to the audio rendering apparatus 100 for rendering, for instance, a left and a right audio signal based on the best matching anchor HRTFs and/or HRIRs.
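The online selection from the database may be sketched as a simple lookup. The following is an illustrative sketch under stated assumptions: the database 115a is modeled as a plain dictionary keyed by (height, width) anchor values in centimeters, the stored values are placeholder strings rather than actual HRTF or HRIR data, and "best matching" is interpreted as the entry with the smallest worst-case difference among the entries within the resolution threshold. None of these choices is prescribed by the description.

```python
def best_matching_anchor(database, height_cm, width_cm, resolution_cm=0.025):
    """Return the stored entry closest to the measurement, or None.

    Only entries whose height AND width differences are both below the
    resolution threshold are considered (an assumed reading of "and/or").
    """
    best_key = None
    best_distance = None
    for anchor_height, anchor_width in database:
        height_diff = abs(height_cm - anchor_height)
        width_diff = abs(width_cm - anchor_width)
        if height_diff < resolution_cm and width_diff < resolution_cm:
            distance = max(height_diff, width_diff)
            if best_distance is None or distance < best_distance:
                best_key = (anchor_height, anchor_width)
                best_distance = distance
    return None if best_key is None else database[best_key]


# Hypothetical database content; the strings stand in for HRTF/HRIR data.
DATABASE = {
    (1.900, 1.600): "hrir_a",
    (1.925, 1.600): "hrir_b",
    (2.000, 1.700): "hrir_c",
}

match = best_matching_anchor(DATABASE, height_cm=1.91, width_cm=1.605)
no_match = best_matching_anchor(DATABASE, height_cm=2.5, width_cm=2.5)
```

Because the lookup involves only comparisons against stored anchor values, no acoustic simulation is needed at this stage. A measurement outside the range covered by the database yields `None` in this sketch.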

The lower portion of figure 4 illustrates different processing stages implemented by the data processing apparatus 110 for creating the database 115a. In stages 411, 413 of figure 4 the range of models may be grouped based on limits for the pinna concha height and width values. In a stage 415 of figure 4 the best matching anchor 3D mesh model is selected. In a stage 417 of figure 4, the best matching anchor 3D mesh model is adjusted, i.e. deformed to match the measured pinna concha height and width values of the ear of the user 120 in the way already described above in more detail. By doing this for a plurality of different measured pinna concha height and width values of the ear of the user 120 the database 115a is generated. As will be appreciated, the database 115a of HRTFs and/or HRIRs may be created offline and stored in the cloud (e.g. for the embodiment shown in figure 2). So, in an embodiment, only the stages illustrated in the upper portion of figure 4 need to be performed online based on the two pinna measurements, resulting in a very rapid generation of the personalized HRTFs and/or HRIRs for a given user 120. At runtime the HRTFs may be loaded by the audio rendering apparatus 100, which may produce the final headphone mix for 3D audio. Thus, no additional computational complexity is introduced at runtime.

In an embodiment, the distance between neighboring values of the plurality of different pinna concha height anchor values and the plurality of different pinna concha width anchor values may be in the range of 0.02 to 0.03 centimeters, in particular about 0.025 centimeters. It has been found that an anthropometric measurement resolution in the range of 0.02 to 0.03 centimeters, in particular about 0.025 centimeters, is sufficient, as the resulting changes in the interaural level difference and spectral difference are well below just noticeable difference values. In other words, when the change of these two pinna anthropometric measurement values is less than, for instance, 0.025 centimeters, the difference in HRTFs cannot be perceived by the human ear. If the change is greater than, for instance, 0.025 centimeters, then the difference between the HRTFs may be perceived.
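The effect of the anchor spacing can be illustrated with a short sketch that snaps a measured value to the nearest anchor value on a regular grid. The function name and the lower bound of 1.0 centimeters for the grid are hypothetical; only the spacing of 0.025 centimeters is taken from the description. With such a grid, the nearest anchor value deviates from the measurement by at most half the spacing, i.e. 0.0125 centimeters, which is below the 0.025 centimeter value stated above.

```python
STEP_CM = 0.025  # spacing between neighboring anchor values (from the description)


def snap_to_grid(value_cm, minimum_cm, step_cm=STEP_CM):
    """Return (grid index, anchor value in cm) of the nearest grid point.

    minimum_cm is the smallest anchor value of the grid (an assumption;
    the description does not state the covered range here).
    """
    index = round((value_cm - minimum_cm) / step_cm)
    return index, minimum_cm + index * step_cm


# Hypothetical measurement of 1.913 cm on a grid starting at 1.0 cm.
index, anchor_cm = snap_to_grid(1.913, minimum_cm=1.0)
deviation_cm = abs(anchor_cm - 1.913)
```

In this example the measurement is mapped to grid index 37, i.e. an anchor value of about 1.925 centimeters, with a deviation of about 0.012 centimeters.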

Figure 5 is a flow diagram illustrating a data processing method 500 for providing a HRTF and/or a HRIR personalized for the user 120. The data processing method 500 comprises a step 501 of obtaining information about a pinna concha height value and a pinna concha width value of an ear of the user 120. The data processing method 500 further comprises a step 503 of providing at least one personalized HRTF and/or HRIR based on the information about the pinna concha height value and the pinna concha width value of the ear of the user 120.

The data processing method 500 can be performed by the data processing apparatus 110 according to an embodiment. Thus, further features of the data processing method 500 result directly from the functionality of the data processing apparatus 110 as well as its different embodiments described above and below.

The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit.