

Title:
ACOUSTIC ENVIRONMENT MAPPING
Document Type and Number:
WIPO Patent Application WO/2018/029341
Kind Code:
A1
Abstract:
A method for estimating a map of an acoustic environment including a set of known sources, a set of sensor arrays, and a central processing unit which is synchronized with the known sources and sensor arrays. The method includes sequentially emitting a known measurement signal from each known source, and, in each sensor array, estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and computing a quality measure of the estimated position. The position estimates and quality measures of each position estimate are transmitted to the central processing unit, which aligns all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.

Inventors:
KJÆR NIELSEN JESPER (DK)
Application Number:
PCT/EP2017/070429
Publication Date:
February 15, 2018
Filing Date:
August 11, 2017
Assignee:
BANG & OLUFSEN AS (DK)
International Classes:
H04S7/00; G01S5/18
Foreign References:
US20120295637A1 (2012-11-22)
CN102901949A (2013-01-30)
US8279709B2 (2012-10-02)
Other References:
ANDREAS M ALI ET AL: "An Empirical Study of Collaborative Acoustic Source Localization", INFORMATION PROCESSING IN SENSOR NETWORKS, 2007. IPSN 2007. 6TH INTERNATIONAL SYMPOSIUM ON, IEEE, PI, 1 April 2007 (2007-04-01), pages 41 - 50, XP031158396, ISBN: 978-1-59593-638-7
LEWIS GIROD ET AL: "The design and implementation of a self-calibrating distributed acoustic sensing platform", SENSYS'06 : PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON EMBEDDED NETWORKED SENSOR SYSTEMS : OCT. 31 - NOV. 3, 2006, BOULDER, COLORADO, USA, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, NY, USA, 31 October 2006 (2006-10-31), pages 71 - 84, XP058318518, ISBN: 978-1-59593-343-0, DOI: 10.1145/1182807.1182815
AYLLON DAVID ET AL: "Indoor Blind Localization of Smartphones by Means of Sensor Data Fusion", IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 65, no. 4, 1 April 2016 (2016-04-01), pages 783 - 794, XP011602096, ISSN: 0018-9456, [retrieved on 20160309], DOI: 10.1109/TIM.2015.2494629
CHRISTOPHER M. BISHOP.: "Pattern Recognition and Machine Learning", 2006, SPRINGER
FABIO CROSILLA; ALBERTO BEINAT: "Use of generalised Procrustes analysis for the photogrammetric block adjustment by independent models", ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, vol. 56, no. 3, 2002, pages 195 - 209
J. R. JENSEN ET AL.: "On Frequency Domain Models for TDOA Estimation", PROC. IEEE INT. CONF. ACOUST., SPEECH, SIGNAL PROCESS, 2015
M. A. KOSCHAT; D. F. SWAYNE: "A weighted Procrustes criterion", PSYCHOMETRIKA, vol. 56, no. 2, 1991, pages 229 - 239
J. K. NIELSEN ET AL.: "Grid Size Selection for Nonlinear Least-Squares Optimisation in Spectral Estimation and Array Processing", PROC. EUROPEAN SIGNAL PROCESSING CONF., 2016
Attorney, Agent or Firm:
AWAPATENT AB (SE)
Claims:
CLAIMS

1. A method for estimating a map of an acoustic environment including a set of known sources, comprising:

providing a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;

providing a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;

sequentially emitting a known measurement signal from each known source;

in each sensor array:

- estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and

- computing a quality measure of the estimated position;

transmitting the position estimates of each known source in each local coordinate system and the quality measure of each position estimate to the central processing unit; and

in the central processing unit, aligning all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.

2. The method according to claim 1, wherein at least one sensor array is co-located with one of the known sources.

3. The method according to claim 1 or 2, wherein the acoustic environment includes more than two known sources.

4. The method according to any one of the preceding claims, wherein the known sources are loudspeakers.

5. The method according to any one of the preceding claims, wherein each sensor array has at least three microphones.

6. The method according to any one of the preceding claims, wherein the sensor array estimates the position in a two-dimensional plane.

7. The method according to claim 6, wherein each sensor array is a uniform circular array (UCA).

8. The method according to any one of the preceding claims, wherein the environment further comprises at least one unknown source which is unsynchronized with said central processing unit, and wherein the method further comprises:

sequentially emitting an unknown signal from each unknown source;

in each sensor array:

- estimating a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array, and

- computing a quality measure of the estimated DOA;

transmitting DOA estimates of each unknown source in each local coordinate system and the quality measure of each DOA estimate to the central processing unit; and

in the central processing unit, determining the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.

9. The method according to claim 8, wherein the at least one unknown source includes a user at an unknown listening position.

10. A system for estimating a map of an acoustic environment including a set of known sources, comprising:

a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;

a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;

wherein each sensor array is configured to:

- sequentially receive a known measurement signal from each known source,

- estimate a position of the currently emitting known source in a local coordinate system of the sensor array,

- compute a quality measure of the estimated position, and

- transmit, to the central processing unit, position estimates of each known source in said local coordinate system and a quality measure of each position estimate; and

wherein the central processing unit is configured to align all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.

11. The system according to claim 10, wherein at least one sensor array is co-located with one of the known sources.

12. The system according to claim 10 or 11, wherein the acoustic environment includes more than two known sources.

13. The system according to any one of claims 10 - 12, wherein the known sources are loudspeakers.

14. The system according to any one of claims 10 - 13, wherein each sensor array has at least three microphones.

15. The system according to any one of claims 10 - 14, wherein each sensor array is configured to estimate the position in a two-dimensional plane.

16. The system according to claim 15, wherein each sensor array is a uniform circular array (UCA).

17. The system according to any one of claims 10 - 16, wherein the environment further comprises at least one unknown source which is unsynchronized with said central processing unit;

wherein each sensor array is further configured to:

- sequentially receive an unknown signal from each unknown source,

- estimate a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array,

- compute a quality measure of the estimated DOA, and

- transmit DOA estimates of each unknown source in each local coordinate system and a quality measure of each DOA estimate to the central processing unit; and

wherein the central processing unit is further configured to determine the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.

18. The system according to claim 17, wherein the at least one unknown source includes a user at an unknown listening position.

Description:
ACOUSTIC ENVIRONMENT MAPPING

Field of the invention

The present invention relates to estimation of loudspeaker topology, i.e. to the determination of a map of an acoustic environment including a set of known sources and at least one unknown source. In particular, the present invention relates to estimation of loudspeaker positions relative to each other and to a listener.

Background

The listening experience is highly influenced by the position of the loudspeakers relative to the listener and the room in which they are placed. For example, in a stereo set-up, the two loudspeakers and the listener should ideally be placed at the vertices of an equilateral triangle, and in a surround set-up, the loudspeakers should be placed at certain angles on a circle centered on the listening position. Unfortunately, the loudspeakers and listener(s) are often not placed in their ideal positions, since other interior design considerations often take higher priority. However, if the positions of the loudspeakers and the listener are known, signal processing algorithms can, to a certain extent, compensate for the non-ideal positions.

As illustrated in figure 1, an acoustic environment includes a set of loudspeakers 1 in an acoustic enclosure 2 such as a room, and a listener at a given position 3 in the enclosure 2. In order to achieve a satisfactory listening experience, the audio rendering should take the position of the loudspeakers and the listening position into account. In order to achieve that, a map containing the position of the loudspeakers and the listening position is required. Preferably, the map should also contain the orientation of the loudspeakers.

A map including the position (and possibly also the orientation) of the loudspeakers in an acoustic environment is not only useful in the above scenario. Future consumer audio will to a much wider extent be object-based, in which case it becomes more important for the rendering system to know the speaker layout. The typical approach to compensating for a non-ideal system layout is to measure the transfer function from all loudspeakers to the listening position. Unfortunately, this is quite inconvenient for the listener, and this approach does not directly create a speaker and listener map that can be used for rendering object-based audio faithfully.

Another approach is provided in US 8,279,709. Here, a microphone is placed on each speaker, and impulse responses from each speaker are measured to determine distances between each pair of speakers. The distance matrix is then used to estimate the position of each speaker.

However, this approach still does not provide the orientation of the loudspeakers. Also, the method in US 8,279,709 is sensitive to errors in the microphone measurements. Finally, the method requires a significant amount of data to be sent between each speaker and the central computation unit.

General disclosure of the Invention

It is an object of the present invention to at least mitigate some of the problems mentioned above, and provide an improved way to determine a map of an acoustic environment including a set of known sources and optionally one or more unknown sources.

According to a first aspect of the present invention, this and other objects are achieved by a method for estimating a map of an acoustic environment including a set of S known sources, comprising:

providing a set of M sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;

providing a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;

sequentially emitting a known measurement signal from each known source;

in each sensor array:

estimating a position of the currently emitting known source in a local coordinate system of the sensor array, and computing a quality measure of the estimated position;

transmitting position estimates of each known source in each local coordinate system and the quality measure of each position estimate to the central processing unit; and

in the central processing unit, aligning all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.

According to a second aspect of the present invention, the above object is achieved by a system for estimating a map of an acoustic environment including a set of known sources, comprising:

a set of sensor arrays, each sensor array including at least two microphones and processing circuitry for processing sound signals received by the microphones;

a central processing unit, said central processing unit being synchronized with said known sources and with said sensor arrays;

wherein each sensor array is configured to:

- sequentially receive a known measurement signal from each known source,

- estimate a position of the currently emitting known source in a local coordinate system of the sensor array,

- compute a quality measure of the estimated position, and

- transmit, to the central processing unit, position estimates of each known source in said local coordinate system and a quality measure of each position estimate; and

wherein the central processing unit is configured to align all local coordinate systems based on the position estimates, using the position estimate quality measure to give relatively higher weight to more accurate estimates than to less accurate estimates, thereby providing a map including a position of each known source, and a position and orientation of each sensor array.

According to these aspects, a set of M sensor arrays is used to create a map containing the positions and orientations of the sensor arrays, and the positions of a set of known sources (e.g. loudspeakers).

The "known" sources are thus synchronized with the sensor arrays and with the central processing unit. In a loudspeaker setup, all loudspeakers are connected to the same system, and synchronization within a few tens of microseconds is typically required for playback of audio.

By emitting a known signal, the synchronized sensor array can determine distance as well as direction. Accordingly, the sensor array can determine the position of the source in its own, local coordinate system.
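As a rough illustration of why synchronization gives range, the sketch below estimates the propagation delay of a known signal at a single microphone by cross-correlation and converts it to a distance. It is a simplified stand-in, not the ML estimator of [5] used in the detailed description; the function name `estimate_distance`, the speed of sound of 343 m/s, and the assumption that the recording starts exactly at the emission time are all illustrative assumptions.

```python
import numpy as np

def estimate_distance(received, reference, fs, c=343.0):
    """Estimate the source distance from one microphone recording of a known signal.

    Assumes (hypothetically) that the recording starts exactly when the source
    starts emitting, i.e. source and sensor are synchronized, and that c is the
    speed of sound in m/s. Resolution is limited to whole samples.
    """
    # Full cross-correlation; output index len(reference) - 1 corresponds to zero lag
    corr = np.correlate(received, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    # Propagation delay in seconds times the speed of sound
    return c * lag / fs
```

Under the same assumptions, a synchronization error of 20 µs shifts the estimated range by 343 × 20·10⁻⁶ m ≈ 6.9 mm, which is one way to see why synchronization on the order of tens of microseconds is adequate. Estimating the direction additionally requires comparing the delays across the microphones of the array.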

The sensor arrays and sources (referred to as slaves) are connected to the central processing unit (referred to as the master). As position estimates and quality measures are calculated locally in each sensor array, only low rate data transmission is required from the slaves to the master. The required data rate can typically be provided by available connections, without the need for high speed data transmission.

The sensor arrays and the known sources (i.e. loudspeakers) do not need to be co-located, and the orientation of the sensor arrays may be unknown. However, in one embodiment, at least one sensor array is co-located with one of the known sources. Thereby, the estimated orientation of the sensor array can be used to determine the orientation of the co-located known source.

Preferably, each sensor array includes at least three microphones when the positions are estimated in two-dimensional space, and at least four microphones when the positions are estimated in three-dimensional space.

According to one embodiment, the acoustic environment further includes one or several unknown sources (e.g. a user in a listening position), and the method further includes:

sequentially emitting an unknown signal from each unknown source;

in each sensor array:

estimating a direction of arrival (DOA) of the currently emitting unknown source in the local coordinate system of the sensor array, and

computing a quality measure of the estimated DOA;

transmitting DOA estimates of each unknown source in each local coordinate system and the quality measure of each DOA estimate to the central processing unit; and

in the central processing unit, determining the position of each unknown source in the map based on the DOA estimates, using the DOA estimate quality measures to give relatively higher weight to more accurate estimates than to less accurate estimates.

Brief description of the drawings

The present invention will be described in more detail with reference to the appended drawings, showing currently preferred embodiments of the invention.

Figure 1 schematically shows an acoustic environment.

Figure 2 schematically shows an acoustic environment according to an embodiment of the present invention.

Figure 3 is a block diagram of a sensor array in figure 2.

Figure 4 is a flow chart of a method according to an embodiment of the present invention.

Figure 5 shows a map of sensor arrays and known sources determined according to an embodiment of the invention.

Figure 6 is a more detailed view of a part of the map in figure 5.

Figure 7 shows a map of sensor arrays, known sources and one unknown source determined according to an embodiment of the invention.

Figure 8 shows a map of sensor arrays, known sources and two unknown sources determined according to an embodiment of the invention.

Detailed description of preferred embodiments

As illustrated in figure 1, an acoustic environment includes a set of S sources (loudspeakers) 1 placed in an acoustic enclosure such as a room 2. As illustrated in figure 2, a set of M sensor arrays 4 have been placed in the environment. As shown in figure 3, each sensor array 4 can include a plurality of microphones 5 and processing circuitry 6 for processing sound signals received by the microphones 5. A central processing unit 7, hereinafter referred to as "master", is connected to and synchronized with each sensor array 4. The sources 1 are also synchronized with the sensor arrays 4, and will therefore be referred to as "known" sources.

The environment may further include one or several "unknown" sources 8, i.e. sources which are not synchronized with the central processing unit 7. An example of such an "unknown" source is a listener located at a given listening position 3.

According to the present invention, a map of the environment is obtained by a method illustrated in figure 4. In brief, and with reference to figures 2 and 4, the method includes:

1. In turn, the S known sources 1 emit a known source signal (step S1) while the M sensor arrays 4 estimate the position of these sources in their own local coordinate systems 11 (step S2). In addition to the position estimates, the sensor arrays 4 also compute a quality matrix describing the accuracy of the estimated positions (step S3).

2. In step S4, the position estimates of the S sources in the M local coordinate systems 11 are transmitted to the master 7 along with the quality matrices. The master 7 now rotates and translates the local coordinate systems so that they fit (are aligned) as well as possible (step S5). The quality matrices are used in this process to ensure that the most accurate estimates have a higher weight than the less accurate estimates. The result of the process is a map of all sensor arrays 4 and known sources 1 in a common coordinate system 12.

3. If the environment includes unknown sources 8, in step S6 the R unknown sources 8 emit an unknown source signal while the M sensor arrays 4 estimate the direction of arrival (DOA) in their own coordinate systems 11 (step S7). An unknown source may, for example, be a talking person or a mobile phone. Since the source signal is non-synchronized and often in the far field of the arrays, only the DOA and not the distance is estimated. Again, a quality matrix describing the accuracy of the estimated DOAs is computed (step S8).

4. In step S9, the DOA estimates of the R unknown sources 8 in the M local coordinate systems 11 are transmitted to the master 7 along with the quality matrices. The master 7 now finds the R points that best describe the estimated DOAs (step S10). The quality matrices are used in this process to ensure that the most accurate estimates have a higher weight than the less accurate estimates. The result of the process is a map of all sensors and sources emitting a known or an unknown source signal.

There is no upper bound on the values of S, M, and R. However, at least two sensor arrays 4 are required to estimate the position of an unknown source 8. If only one sensor array 4 is present, only the direction of arrival (DOA) of an unknown source can be estimated. With two or more sensor arrays 4, the position of an unknown source 8 can be triangulated.
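To make the triangulation concrete, the following sketch places one unknown source on a 2D grid by maximising the sum of von Mises log-likelihoods of the per-array DOA estimates, in the spirit of the grid search mentioned for M > 2 in Step 4 of the detailed description. The array positions and orientations are assumed to be already known from the map; the function name `triangulate_grid`, the grid limits and the step size are hypothetical choices, and the objective is the plain von Mises log-likelihood rather than the rewritten form referred to in Step 4.

```python
import numpy as np

def triangulate_grid(positions, orientations, doas, kappas, xlim, ylim, step=0.01):
    """Grid-search ML position of one unknown source from per-array DOA estimates.

    positions:    (M, 2) array positions in the map frame
    orientations: (M,) rotation of each local frame relative to the map frame (rad)
    doas:         (M,) DOA estimates, each in its array's local frame (rad)
    kappas:       (M,) von Mises concentrations (larger = more reliable estimate)
    """
    xs = np.arange(xlim[0], xlim[1] + step / 2, step)
    ys = np.arange(ylim[0], ylim[1] + step / 2, step)
    gx, gy = np.meshgrid(xs, ys)                       # candidate source positions
    loglik = np.zeros_like(gx)
    for (px, py), phi, doa, kappa in zip(np.asarray(positions, float),
                                         orientations, doas, kappas):
        # DOA this array would report, in its own frame, for a source at each grid point
        predicted = np.arctan2(gy - py, gx - px) - phi
        # von Mises log-likelihood up to an additive constant; cos() handles 2*pi wrapping
        loglik += kappa * np.cos(doa - predicted)
    iy, ix = np.unravel_index(np.argmax(loglik), loglik.shape)
    return np.array([gx[iy, ix], gy[iy, ix]])
```

The same search also works for M = 2. Larger κ values make an array's bearing dominate the result, which is how the DOA quality measures give more weight to the more accurate estimates.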

Figures 5 and 6 show an example of a map including four UCA sensor arrays and four known sources (i.e. all source signals are assumed known and synchronized with the sensor arrays). The map has been determined with a signal-to-noise ratio of 20 dB, a data length of 999 samples, and a sampling frequency of 4 kHz. The true positions of the sources and sensors are marked by dots and crosses, respectively. The estimated sensor positions are marked by circles and the estimated source positions by stars. Finally, the quality matrices are illustrated as lines (which are in fact compressed ellipses) indicating the one-standard-deviation uncertainty contours. Thus, bigger ellipses indicate more uncertain estimates, and vice versa.

Aside from the fact that all the source and sensor locations are estimated with very high precision, the most interesting feature is the uncertainty ellipses. The ellipses have almost no extension in the direction corresponding to range from the source, indicating that we are much more certain about the range than about the DOA. This means that we get much more accurate estimates of the sources if we use weighted Procrustes analysis.

Figure 6 shows a part of figure 5 in which the ellipses for one source are easier to see. Two ellipses are much bigger than the other two and this is hardly surprising since the corresponding sensor arrays are further away from this source.
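The one-standard-deviation contours in figures 5 and 6 can be traced directly from a 2 × 2 quality matrix. Assuming, as in Step 1 below, that the quality matrix is the Fisher information (so that its inverse is the covariance of the position estimate), the sketch below generates points on the contour {p : (p − c)ᵀ Q (p − c) = 1} via an eigendecomposition. The function name `uncertainty_ellipse` and its parameters are hypothetical.

```python
import numpy as np

def uncertainty_ellipse(quality, center=(0.0, 0.0), n=100):
    """Return n points (shape (n, 2)) on the one-standard-deviation contour.

    quality: 2x2 quality matrix, assumed here to be the inverse covariance.
    """
    cov = np.linalg.inv(np.asarray(quality, dtype=float))
    # Eigenvectors give the ellipse axes; square roots of eigenvalues give the semi-axes
    eigvals, eigvecs = np.linalg.eigh(cov)
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    unit_circle = np.vstack([np.cos(t), np.sin(t)])          # shape (2, n)
    offsets = eigvecs @ (np.sqrt(eigvals)[:, None] * unit_circle)
    return offsets.T + np.asarray(center, dtype=float)
```

A quality matrix with one very large eigenvalue produces the "compressed" ellipses seen in the figures: almost no extent along that eigenvector (here, the range direction) and a long axis across it.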

If we assume that sources emitting an unknown source signal are in the near field of the arrays, the positions of these sources can be estimated jointly with the positions of sources emitting a known source signal. In this case, step 3 above (steps S6-S8 in figure 4) is a part of step 1, and step 4 (steps S9-S10) can be omitted. An illustration of such a joint estimation is given in figures 7 and 8.

In figure 7, the source signal of the source at (3, 0) is unknown, while in figure 8, the source signals of the sources at (3, 0) and (2, 2) are unknown. The other sources are known, and all other conditions are the same as in figures 5 and 6.

Clearly, the ellipses for the unknown sources are now much bigger indicating that they are much harder to estimate. However, the sources are still estimated fairly accurately since two of the arrays have a fairly good estimate of the source. This further demonstrates the power of weighted Procrustes analysis.

In the following, a more detailed disclosure of an example of the present invention will be provided.

Step 1: Source localisation

Many source localisation algorithms already exist in the scientific literature for various array geometries. In principle, any array geometry can be used as long as

• at least three sensors not on the same line are used for 2D source localisation, and

• at least four sensors not in the same plane are used for 3D source localisation.
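The geometric condition in the list above amounts to a rank test on the microphone coordinates: after subtracting their centroid, the positions must span the full plane (2D) or space (3D). A minimal sketch, with a hypothetical function name and an assumed numerical tolerance:

```python
import numpy as np

def supports_localisation(mic_positions, tol=1e-9):
    """True if the microphones are not all on one line (2-D) or in one plane (3-D).

    mic_positions: array of shape (num_mics, dim) with dim = 2 or 3.
    """
    p = np.asarray(mic_positions, dtype=float)
    dim = p.shape[1]
    if p.shape[0] < dim + 1:
        return False                      # need at least 3 mics in 2-D, 4 in 3-D
    centred = p - p.mean(axis=0)
    # Full rank of the centred coordinates means the mics span the space
    return np.linalg.matrix_rank(centred, tol=tol) == dim
```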

For 2D localisation, we have used a uniform circular array (UCA) as the array geometry, since its range and DOA estimation performance are independent of the direction of the source. To estimate the source position using a UCA, we use the maximum likelihood (ML) estimator in [5], since it is optimal in a statistical sense. Assuming white Gaussian measurement noise, the covariance matrix of the estimator is the inverse Fisher information matrix (FIM) for large enough data sizes. We use this matrix as an inverse quality matrix, and, for a source in the far field, it is given by

where x is the source position, d is the distance to the source in meters, θ is the DOA in radians, β is the gain from the source to the sensor array, N is the number of data points, K is the number of sensors in the array, r is the array radius in meters, c is the propagation speed, is the i'th DFT coefficient of the source signal, and $\sigma^2$ is the estimated noise variance.

To find the quality matrix, we invert and obtain

Clearly, how accurately we can estimate the x- and y-coordinates of x varies with θ, and the difference can be huge. For example, when $\theta = \pi/2$, the error in the x-coordinate can be expected to be $2d^2/r^2 - 1$ times bigger than the error in the y-coordinate. This can be a lot, depending on d and r. By including these quality matrices as inverse weighting matrices in step 2, we automatically account for the differences in uncertainty.
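The dependence on θ can be reproduced with a generic first-order (delta-method) propagation from polar to Cartesian coordinates. The sketch below is not the FIM in (1.1), which is not reproduced here; it assumes independent range and DOA errors with standard deviations `sigma_d` and `sigma_theta`, which are hypothetical inputs, and shows that at θ = π/2 the x-error is governed by d·σ_θ while the y-error is governed by σ_d.

```python
import numpy as np

def cartesian_covariance(d, theta, sigma_d, sigma_theta):
    """First-order covariance of x = (d cos(theta), d sin(theta)),
    assuming independent range and DOA errors."""
    # Jacobian of the polar-to-Cartesian map with respect to (d, theta)
    J = np.array([[np.cos(theta), -d * np.sin(theta)],
                  [np.sin(theta),  d * np.cos(theta)]])
    polar_cov = np.diag([sigma_d ** 2, sigma_theta ** 2])
    return J @ polar_cov @ J.T

cov = cartesian_covariance(d=3.0, theta=np.pi / 2, sigma_d=0.01, sigma_theta=0.02)
# cov is approximately [[0.0036, 0], [0, 0.0001]]: std 0.06 m in x vs 0.01 m in y
```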

A quality matrix for a source in the near field can also be found (see details later).

Step 2: Combining independent source localisation estimates

The algorithm for combining the independent source localisation estimates into one map is described below. The algorithm is a variant of weighted generalised orthogonal Procrustes analysis.

Assume that the true coordinates of S sources in a reference coordinate system are given as the columns in the matrix X. In the coordinate system of the m'th sensor array these global coordinates are observed rotated and translated as

where $Q_m$ and $t_m$ are a rotation matrix and a translation vector, respectively. Unfortunately, we do not observe $X_m$ directly, but only the noisy version

where $e_m = \mathrm{vec}(E_m)$ is a white Gaussian noise vector with an unknown variance, and $W_m$ is a block diagonal matrix of the form

The weighting matrix $V_{mn}$ is a 2 × 2 matrix proportional to the inverse FIM in (1.1) (see step 1 above). Combining these two equations gives the signal model

for m = 1, …, M, where

The task is now to estimate X given the weighting matrices and the observations

By stacking all the $y_m$'s on top of each other for m = 1, …, M, we obtain the signal model where

For a known Q, the weighted least-squares estimate of X and can easily be retrieved from the weighted least-squares estimate of z, which is given by

For white Gaussian noise with an unknown variance $\sigma^2$, the maximum likelihood estimator of Q is

It is well known from generalised Procrustes analysis that a closed-form solution to the above problem is only available when M = 2 and the same weights are applied to each column of $E_m$. In this case, a D-dimensional eigenvalue decomposition can be used to compute the relative rotation between the two coordinate systems [2]. If M > 2 and the same weights are applied to each column of $E_m$, the estimates of X and Q are computed iteratively as detailed in [2]. In this case, $Q_m$ is estimated from (1.8) for m = 1, …, M as the solution to

Although this looks complicated, the estimate of $Q_m$ is the result of an eigenvalue decomposition when the same weights are applied to each column of $E_m$. Since the uncertainties in the x- and y-coordinates can be far from satisfying this condition in our case, we will not describe the detailed solution here. Instead, we seek a solution for a general weighting matrix. According to [2], (1.20) can only be solved iteratively, and the authors refer to [4]. In [4], an iterative algorithm is suggested, but it seems to be very sensitive to the starting point. Specifically, the authors suggest that at least 20 random starting points should be tried, and that the unweighted solution is not suitable as a starting point. This is a major drawback of the algorithm, and we therefore propose a different approach. In our initial case, the dimension of the problem is D = 2, so the rotation matrix can be written as

Thus, the complete problem has $M - 1$ nonlinear parameters. For many loudspeakers, it might be computationally very intensive to optimise such a high-dimensional nonlinear cost function (especially if we move to 3D), so we instead attack the problem as it is traditionally solved in generalised orthogonal Procrustes analysis. That is, we solve a number of 1D nonlinear optimisation problems where the objective is given in (1.20). In 3D, we instead get a series of 2D nonlinear optimisation problems, which are not too costly to solve.
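A minimal sketch of such a scheme in 2D is given below. It alternates between (a) a 1D search over each array's rotation angle, with the translation solved in closed form for each candidate angle, and (b) an information-weighted average of all arrays' estimates to update the map. This is one possible realisation of the weighted approach described above, not necessarily the exact algorithm behind (1.20). The assumptions are that every array observes every source, that array 0 defines the map frame (leaving M − 1 angles), that the quality matrices are inverse covariances in each array's local frame, and that a coarse-to-fine grid is accurate enough; all function names are hypothetical. The poses map local to map coordinates, x ≈ R_m y + s_m, so R_m plays the role of $Q_m^T$ and s_m of $-Q_m^T t_m$.

```python
import numpy as np

def rot(phi):
    """2-D rotation matrix for angle phi (rad)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def fit_pose(Y, P, X, n_coarse=720, n_refine=3):
    """1-D search for the angle (and closed-form translation) mapping one array's
    local estimates Y (S, 2) onto the current map X (S, 2), weighted by P (S, 2, 2)."""
    def cost_and_shift(phi):
        R = rot(phi)
        G = R @ P @ R.T                                  # weights expressed in the map frame
        # Closed-form weighted translation for this rotation
        s = np.linalg.solve(G.sum(axis=0), np.einsum("nij,nj->i", G, X - Y @ R.T))
        r = Y @ R.T + s - X                              # residuals in the map frame
        return np.einsum("ni,nij,nj->", r, G, r), s
    step = 2 * np.pi / n_coarse
    grid = -np.pi + step * np.arange(n_coarse)
    for _ in range(n_refine + 1):                        # coarse grid, then finer local grids
        best = grid[int(np.argmin([cost_and_shift(p)[0] for p in grid]))]
        grid = np.linspace(best - step, best + step, 21)
        step /= 10
    return best, cost_and_shift(best)[1]

def weighted_procrustes_2d(Y, P, n_iter=10):
    """Y: (M, S, 2) local position estimates; P: (M, S, 2, 2) quality matrices.
    Returns the map X (S, 2) and poses R (M, 2, 2), s (M, 2) with x ~ R[m] @ y + s[m]."""
    M = Y.shape[0]
    R = np.tile(np.eye(2), (M, 1, 1))
    s = np.zeros((M, 2))
    X = Y[0].copy()                                      # initial map: the reference array's view
    for _ in range(n_iter):
        for m in range(1, M):                            # one 1-D problem per non-reference array
            phi, s[m] = fit_pose(Y[m], P[m], X)
            R[m] = rot(phi)
        G = np.einsum("mij,mnjk,mlk->mnil", R, P, R)     # R_m P_mn R_m^T
        Z = np.einsum("mij,mnj->mni", R, Y) + s[:, None, :]
        # Information-weighted average of all arrays' estimates of each source
        num = np.einsum("mnij,mnj->ni", G, Z)
        X = np.linalg.solve(G.sum(axis=0), num[..., None])[..., 0]
    return X, R, s
```

Because each quality matrix is rotated into the map frame before averaging, an array that is precise in range but poor in DOA pulls the map strongly only along its range direction, which is the effect visible in figures 5 to 8.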

Step 3: DOA Estimation

In the third step, the R sources with an unknown source signal emit their signals in turn. In a very similar fashion to step 1, the sensor arrays produce DOA estimates and quality measures. The ML estimator is only a minor modification of the estimator in step 1. The inverse FIM for this estimator is given by

where we have used the same notation as above. Asymptotically, the ML estimator is efficient and Gaussian distributed. Since the DOA is 2π-periodic, we model the distribution of the DOA estimator with the circular normal distribution (von Mises distribution), which is given by [1, pp. 105-110]

where $I_0(\kappa)$ is the zeroth-order modified Bessel function of the first kind, given by

For large κ, the von Mises distribution converges to the normal distribution with mean $\theta_0$ and variance $1/\kappa$. Since we expect the variance of the DOA estimate to be small, we can therefore set κ as
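The normal approximation can be checked numerically. The sketch below compares the von Mises density with a normal density of variance 1/κ; the function names, the example DOA variance and the evaluation points are assumptions. NumPy's `np.i0` is the zeroth-order modified Bessel function of the first kind; it overflows for κ above roughly 700, so a production version would work with log-densities instead.

```python
import numpy as np

def von_mises_pdf(theta, theta0, kappa):
    """Von Mises density on the circle; np.i0 is the modified Bessel function I_0."""
    return np.exp(kappa * np.cos(theta - theta0)) / (2.0 * np.pi * np.i0(kappa))

def normal_pdf(theta, mean, var):
    return np.exp(-0.5 * (theta - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

doa_var = 0.05 ** 2           # assumed DOA variance in rad^2 (e.g. from an inverse FIM)
kappa = 1.0 / doa_var         # concentration chosen so that 1/kappa equals the variance
theta0 = 0.3
# Evaluate within +/- 2 standard deviations of the mean
theta = theta0 + np.linspace(-2.0, 2.0, 9) * np.sqrt(doa_var)
vm = von_mises_pdf(theta, theta0, kappa)
nm = normal_pdf(theta, theta0, doa_var)
```

With κ = 400 the two densities agree to within a fraction of a percent over this range, which supports using the Gaussian-derived variance to set κ.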

Since the source signal is unknown, we know neither α nor β. However, we can estimate at all sensors.

Step 4: Source Position Estimation

Assume that the unknown source is positioned at location z in a reference coordinate system. In the coordinate system of sensor array m, this point is given by

where $Q_m$ and $t_m$ are the rotation matrix and translation vector estimated in step 2. Since the source signal is unknown, we only estimate the angle $\theta_m$ of the polar representation of $z_m(z)$, not its length. In terms of the source and sensor array positions, this angle is given by

where $e_m$ is the unit vector in the m'th dimension. As alluded to above, we assume that our estimator is unbiased and distributed as a von Mises random variable. Thus, we model the m'th estimator as

where is the DOA estimate. Since it is reasonable to assume that the M sensor arrays produce independent observation errors, we model the joint distribution as

Clearly, the maximum likelihood estimator of z is now the argument maximizing the above distribution w.r.t. z. Thus, we have to solve the optimisation problem

where S is the set of sensor array positions. By inserting the definition of $\theta_m(z)$ and performing a number of manipulations, the above objective function can be rewritten as

where

In the case of M = 2, the solution to the above problem is given by

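The ML criterion above can be sketched numerically. Under the independent von Mises model, the log-likelihood is, up to an additive constant, Σ_m κ_m cos(θ̂_m − θ_m(z)), so it can simply be evaluated on a 2D grid of candidate source positions. The sketch below assumes NumPy, the local-coordinate convention z_m = Q_m z + t_m, and a hypothetical three-array geometry with noiseless DOAs; it is an illustration of the grid-search approach, not the closed-form M = 2 solution of the text.

```python
import numpy as np

def rot(phi):
    """2D rotation matrix by angle phi (radians)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

# Hypothetical geometry: array centres and orientations in the reference frame.
positions = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
orientations = np.array([0.3, -1.2, 2.0])
kappa = np.array([300.0, 150.0, 500.0])        # illustrative von Mises concentrations

# Assumed convention for the local coordinates: z_m = Q_m z + t_m.
Q = np.array([rot(phi).T for phi in orientations])
t = np.array([-Qm @ pm for Qm, pm in zip(Q, positions)])

def local_angles(z):
    """DOA theta_m(z) of point(s) z, shape (..., 2), in each array's local frame -> (..., M)."""
    zl = np.einsum('mij,...j->...mi', Q, z) + t
    return np.arctan2(zl[..., 1], zl[..., 0])

def log_likelihood(z, theta_hat):
    """Von Mises log-likelihood up to an additive constant."""
    return np.sum(kappa * np.cos(theta_hat - local_angles(z)), axis=-1)

z_true = np.array([1.3, 1.7])
theta_hat = local_angles(z_true)               # noiseless DOA estimates for the demo

xs = np.linspace(-1.0, 5.0, 301)               # 0.02 m grid spacing
ys = np.linspace(-1.0, 5.0, 301)
grid = np.stack(np.meshgrid(xs, ys), axis=-1)  # grid[i, j] = (xs[j], ys[i])
ll = log_likelihood(grid, theta_hat)
iy, ix = np.unravel_index(np.argmax(ll), ll.shape)
z_hat = np.array([xs[ix], ys[iy]])
print(z_hat)
```

In practice the grid resolution trades accuracy against cost; a coarse grid followed by a local refinement is a common compromise.
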
For M > 2, a solution can be found by a search on a 2D grid.

Details of step 1 and step 3

In the description above, many details of step 1 and step 3 of the four-step procedure were left out for the sake of clarity. They are described in full below. The description also includes an extension of the algorithm which compensates for non-ideal microphone responses. Practical microphones do not have an ideal response, and this affects the source localisation in step 1 and step 3. If the microphone responses are known in terms of impulse responses, however, the above algorithm can be extended to take the non-ideal microphones into account, thus producing more accurate estimates.

Modelling periodic signals

The reason for considering the periodic signal model is that

1. a time-shift of a finite-length periodic signal leads to a phase shift in the frequency domain

2. the filtering of an N-periodic signal with an N-length FIR filter can be implemented using circular convolution without any zero-padding.
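
Both properties can be checked numerically with a short sketch (NumPy assumed; the signal length, delay and filter are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
s = rng.standard_normal(N)          # one period of an N-periodic signal

# Property 1: a circular time shift by tau samples is a phase shift in the DFT domain.
tau = 5
k = np.arange(N)
S = np.fft.fft(s)
shifted_direct = np.roll(s, tau)    # s((n - tau) mod N)
shifted_fft = np.fft.ifft(S * np.exp(-2j * np.pi * k * tau / N)).real

# Property 2: filtering the periodic signal with an FIR filter of length <= N equals
# circular convolution over one period, with no zero-padding.
h = rng.standard_normal(N)
periods = np.tile(s, 3)                          # three periods of s
linear = np.convolve(periods, h)[N:2 * N]        # steady-state output for one period
circular = np.fft.ifft(np.fft.fft(h) * S).real   # circular convolution via the DFT

print(np.allclose(shifted_direct, shifted_fft), np.allclose(linear, circular))
```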

The applicability of these two properties will be apparent later. First, however, we will consider the model of such a periodic signal. Any periodic signal can be written as

where f_0 is the fundamental frequency in radians/sample, A_l > 0 is the amplitude of the l'th harmonic component, is the phase of the l'th harmonic component, and L ≤ ⌊N/2⌋. Using Euler's identity, we obtain that

where

In vector form, the signal model can be written as

where we have defined

We now assume that the fundamental frequency is f_0 = 2π/N.

This is equivalent to saying that the signal is periodic in N. When we can design the signal, this assumption is easily satisfied. When we cannot, it is nevertheless an assumption that is nearly always made, often implicitly, in source localisation [3]. We also set the number of harmonic components to its maximum value, L = ⌊N/2⌋.

This is not a critical assumption when the signal is known, since we can always set some of the amplitudes to zero.
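
To illustrate why choosing f_0 = 2π/N is convenient, the following sketch (NumPy assumed, an odd N so that there is no Nyquist bin, and no DC term in the harmonic sum) synthesises a periodic signal from random amplitudes and phases and reads them back from a single FFT, since harmonic l then falls exactly on DFT bin l with X[l] = (N/2) A_l exp(jφ_l):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 31                      # odd period, so there is no Nyquist bin
L = N // 2                  # maximum number of harmonics, floor(N/2)
f0 = 2 * np.pi / N          # fundamental frequency in radians/sample
A = rng.uniform(0.5, 2.0, L)
phi = rng.uniform(-np.pi, np.pi, L)

n = np.arange(N)
l = np.arange(1, L + 1)
s = np.sum(A[:, None] * np.cos(l[:, None] * f0 * n[None, :] + phi[:, None]), axis=0)

# With f0 = 2*pi/N, harmonic l falls exactly on DFT bin l: X[l] = (N/2) A_l exp(j phi_l).
X = np.fft.fft(s)
A_hat = 2 * np.abs(X[1:L + 1]) / N
phi_hat = np.angle(X[1:L + 1])
print(np.allclose(A_hat, A), np.allclose(np.exp(1j * phi_hat), np.exp(1j * phi)))
```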

The above assumptions were made to facilitate a fast implementation using a fast Fourier transform. To see this, we will reformulate the above signal model slightly. We first define the DFT matrix

where, as assumed above . It then follows that

when L = ⌊N/2⌋ and

It is also desirable to establish a link between the DFT of a periodic signal and the vector u_L. This link is given by

where of s (0) and

Although not obvious, the vector is real-valued and contains the unique elements from the DFT.
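
One possible way to form such a real-valued vector is sketched below (NumPy assumed). The exact ordering used in the text is not recoverable here, so the packing (DC bin, then the real and imaginary parts of each complex bin, then the Nyquist bin for even N) and the names pack_real and unpack_real are hypothetical; the point is that a real length-N signal is fully described by N real numbers taken from its DFT.

```python
import numpy as np

def pack_real(x):
    """Hypothetical packing of the unique DFT elements of a real length-N signal into N reals."""
    N = len(x)
    X = np.fft.rfft(x)                    # bins 0..floor(N/2); the rest follow by conjugate symmetry
    parts = [X[0].real]                   # the DC bin of a real signal is real
    for k in range(1, (N + 1) // 2):      # bins that are genuinely complex
        parts += [X[k].real, X[k].imag]
    if N % 2 == 0:
        parts.append(X[N // 2].real)      # the Nyquist bin is real for even N
    return np.array(parts)

def unpack_real(v, N):
    """Inverse of pack_real: rebuild the half spectrum and return the real signal."""
    X = np.zeros(N // 2 + 1, dtype=complex)
    X[0] = v[0]
    for i, k in enumerate(range(1, (N + 1) // 2)):
        X[k] = v[1 + 2 * i] + 1j * v[2 + 2 * i]
    if N % 2 == 0:
        X[N // 2] = v[-1]
    return np.fft.irfft(X, n=N)

rng = np.random.default_rng(7)
for N in (8, 9):
    x = rng.standard_normal(N)
    v = pack_real(x)
    print(N, len(v), np.allclose(unpack_real(v, N), x))
```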

Using these expressions, we obtain an alternative formulation for the periodic signal which is given by

where

Linear filtering of a periodic signal

Filtering a signal s(n) through an FIR filter of length M can be written as

where  is the impulse response of the filter. If we have N such filtered samples, we can write this as

where we have defined

If the signal s (n) is periodic in N, we can then rewrite the above as

where we have now defined

The two equations are identical since s(n + N) = s(n) when s(n) is N-periodic. The reason for considering the second form is that H is a circulant matrix, which can be diagonalised by the discrete Fourier transform so that

where

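A quick numerical check of the diagonalisation is sketched below (NumPy assumed). Here F is the unnormalised DFT matrix, so its inverse is conj(F)/N; this normalisation is an assumption and may differ from the one used in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 16, 5                         # period N and FIR length M <= N
h = np.zeros(N)
h[:M] = rng.standard_normal(M)       # impulse response, zero-padded to one period

# Circulant matrix with first column h: H[n, m] = h[(n - m) mod N].
n = np.arange(N)
H = h[(n[:, None] - n[None, :]) % N]

F = np.fft.fft(np.eye(N))            # unnormalised DFT matrix, F[k, n] = exp(-2j*pi*k*n/N)
F_inv = np.conj(F) / N
D = F @ H @ F_inv                    # should be diagonal, with the DFT of h on the diagonal
print(np.allclose(D, np.diag(np.fft.fft(h))))
```
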
Modelling sensor array measurements from a periodic source

Assume that we have K sensors, each with its own direction-dependent impulse response vector, where p_s is the position of the source. The source emits an N-periodic signal x(0), which is received by each sensor a sensor-dependent number of samples later. Using the results from the previous two sections, we can model the k'th received sensor signal as

where the last equality follows since  . The scalar β_k is a gain factor from the source to the k'th sensor. In the far-field model, all β_k are the same, whereas we assume that the near-field model is given by

If we insert the near-field model for β_k into the signal model, we obtain

If we now concatenate all of the sensor data into one big vector, we obtain that

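A minimal sketch of this multichannel model in the DFT domain is shown below. It assumes NumPy, applies each delay as a per-bin phase shift (property 1) and each impulse response as a circular convolution (property 2), and uses an odd N so that non-integer delays never touch a Nyquist bin. The function name sensor_signals and all numeric values are hypothetical, and the near-field gain model is not reproduced; the gains are simply passed in.

```python
import numpy as np

def sensor_signals(s, gains, delays, responses):
    """Hypothetical helper: y_k = gain_k * (h_k circularly convolved with s delayed by delays[k]).

    s: one period of the source signal (length N, N odd so there is no Nyquist bin)
    gains: (K,) gain factors; delays: (K,) delays in samples (may be non-integer)
    responses: (K, N) impulse responses, zero-padded to length N
    """
    N = len(s)
    w = 2 * np.pi * np.fft.rfftfreq(N)                  # bin frequencies in radians/sample
    S = np.fft.rfft(s)
    Hk = np.fft.rfft(responses, axis=-1)                # (K, N//2 + 1)
    phase = np.exp(-1j * np.outer(delays, w))           # delay = phase shift per bin
    Y = gains[:, None] * Hk * phase * S[None, :]
    return np.fft.irfft(Y, n=N, axis=-1)                # (K, N) real sensor signals

rng = np.random.default_rng(3)
N, K = 63, 4
s = rng.standard_normal(N)
gains = np.array([1.0, 0.8, 0.6, 0.9])
delays = np.array([0, 3, 7, 12])                        # integer delays for the check
responses = np.zeros((K, N))
responses[:, :4] = rng.standard_normal((K, 4))
y = sensor_signals(s, gains, delays.astype(float), responses)

def circ_conv(a, b):
    """Direct circular convolution, used only as a reference."""
    n = len(a)
    return np.array([sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)])

y_ref = np.array([g * circ_conv(h, np.roll(s, d)) for g, h, d in zip(gains, responses, delays)])
print(np.allclose(y, y_ref))
```

The same function accepts non-integer delays unchanged, which is what makes the periodic model attractive for fractional-delay steering.
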
Unknown source signal

If the source signal s(0) is unknown, we cannot distinguish between β and s(0). We therefore set β = 1 and keep in mind that we are estimating a scaled version of the source signal. Given the source position p_s, the least-squares estimate of the source signal is

Note that Q {rjk) and A] ( (p s ) are diagonal matrices so no matrix inversions are necessary. Also note th

In practice, however, we can set also for an even data length and non-integer r\ k since anti-aliasing filters will ensure that H for the frequency

The nonlinear least squares (NLS) estimate of the source is now given by

where the NLS objective is given by

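For ideal microphones, the unknown-source estimator reduces to a per-bin least-squares fit: for each candidate position, the source spectrum is estimated from the aligned channels, and minimising the residual is equivalent to maximising Σ_ω |Σ_k G_k*(ω) Y_k(ω)|² / Σ_k |G_k(ω)|², where G_k is the modelled transfer function to sensor k. The sketch below assumes NumPy, pure far-field delays for a hypothetical 6-microphone UCA, and a 1° DOA grid; it does not model non-ideal microphone responses, and the function names are illustrative. Because the source signal is unknown, only relative delays matter, so the common distance term is dropped.

```python
import numpy as np

def nls_unknown_source(Y, G):
    """NLS objective for an unknown source: sum over bins of |G^H y|^2 / ||G||^2.

    Y: (K, N) DFT of the K sensor signals; G: (K, N) modelled transfer functions from
    the source to each sensor for one candidate position (here: pure delays).
    """
    num = np.abs(np.sum(np.conj(G) * Y, axis=0)) ** 2
    den = np.sum(np.abs(G) ** 2, axis=0)
    return np.sum(num / den)

def uca_far_field_delays(theta, K, radius, fs, c=343.0):
    """Far-field delays in samples relative to the array centre for a UCA (assumed geometry)."""
    psi = 2 * np.pi * np.arange(K) / K           # microphone angles
    return -fs * radius / c * np.cos(theta - psi)

rng = np.random.default_rng(4)
N, K, radius, fs = 255, 6, 0.05, 48000.0
w = 2 * np.pi * np.fft.fftfreq(N)                # bin frequencies, radians/sample
S = np.fft.fft(rng.standard_normal(N))           # unknown source spectrum
theta_true = np.deg2rad(40.0)

G_true = np.exp(-1j * np.outer(uca_far_field_delays(theta_true, K, radius, fs), w))
y = np.fft.ifft(G_true * S[None, :], axis=-1).real + 0.01 * rng.standard_normal((K, N))
Y = np.fft.fft(y, axis=-1)

grid = np.deg2rad(np.arange(0.0, 360.0, 1.0))
J = [nls_unknown_source(Y, np.exp(-1j * np.outer(uca_far_field_delays(t, K, radius, fs), w)))
     for t in grid]
theta_hat = grid[int(np.argmax(J))]
print(np.rad2deg(theta_hat))
```
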
Known source signal

If the source signal s(0) is known, we can estimate the non-negative gain parameter β. Since the gain factor is non-negative, we have to solve a constrained optimisation problem of the form

Fortunately, the standard unconstrained solution is easily modified to solve the constrained problem. First, suppose that p_s is given. Then the unconstrained estimate of β is the least-squares solution, which is given by

Note that the denominator is easily evaluated since the matrices involved are diagonal. If the unconstrained estimate is non-negative, the constraint is inactive. Otherwise, the constraint is active and β = 0.

Thus, we have that

We now insert this estimate into the LS objective and obtain

where

Since the denominator is positive and does not depend on the source location, we can rewrite the optimisation problem as

where

The non-zero part of the objective can be written as

Since the denominator is non-zero and the numerator is a squared quantity, we can rewrite the optimisation problem into its final form of

where the NLS objective is given by

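The constrained gain estimate has a simple closed form for a real-valued model y ≈ βa with β ≥ 0, where a is the modelled noise-free received signal for a given source position: β̂ = max(0, aᵀy / aᵀa), and the minimised residual is ‖y‖² − max(0, aᵀy)²/‖a‖². The sketch below assumes NumPy; the real time-domain formulation and the function names are illustrative rather than taken from the text.

```python
import numpy as np

def constrained_gain(y, a):
    """Non-negative LS gain for y ≈ beta * a with beta >= 0 (a: modelled, noise-free signal)."""
    beta_ls = a @ y / (a @ a)          # unconstrained least-squares solution
    return max(beta_ls, 0.0)           # constraint active -> beta = 0

def known_source_objective(y, a):
    """Quantity to maximise over the source position: max(a^T y, 0)^2 / ||a||^2."""
    return max(a @ y, 0.0) ** 2 / (a @ a)

rng = np.random.default_rng(5)
a = rng.standard_normal(50)
y = 2.0 * a + 0.1 * rng.standard_normal(50)
b = constrained_gain(y, a)
print(b, known_source_objective(y, a), constrained_gain(-a, a))
```

Maximising known_source_objective over candidate positions is equivalent to minimising the residual, since ‖y‖² does not depend on the position.
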
2D uniform circular array

So far, we have not assumed a particular array structure. The array structure establishes a map between a source location and the delays. That is, we can model the array structure as a function of the source location. Here, we consider a 2D uniform circular array (UCA) in both the near field and the far field. In 2D, we can write the source location in polar coordinates as

where d and θ are the distance to the source in metres and the direction to the source in radians, respectively. Similarly, the microphone locations are known in either Euclidean or polar coordinates. For the k'th microphone, we have that

For a UCA centred at the origin of the coordinate system,

In the near-field, we have that

where f_s and c are the sampling frequency in Hz and the propagation speed in metres per second, respectively. The propagation speed depends approximately on the temperature T in degrees Celsius via

In the far-field, we have that

The useful property of the far-field model is that the distance d and the angle θ appear in two separate terms. Thus, inserting the far-field expression for the delays into the vector gives

where

The angles ζ and ρ_k represent the distance to the source and to the k'th microphone, respectively, expressed in radians. The definitions are straightforward, but somewhat cumbersome to write down in general. However, when is an integer, we have that

meaning that u(ζ) is a DFT vector. The definition of B(θ) depends on whether N is even or odd, even when is an integer. When this is the case, we have that

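The near- and far-field delay models for a UCA can be sketched as follows, assuming NumPy, microphones at angles ψ_k = 2πk/K on a circle of radius r, and delays from the source to each microphone in samples. The far-field version uses the first-order approximation ‖p_s − p_k‖ ≈ d − r cos(θ − ψ_k), which is what separates d and θ into two terms. The temperature formula c ≈ 331.3 + 0.606 T m/s is a commonly used approximation and an assumption here, not necessarily the expression in the text; the function names are hypothetical.

```python
import numpy as np

def speed_of_sound(temp_c):
    """Common linear approximation c ≈ 331.3 + 0.606 T m/s (T in deg C); assumed, not from the text."""
    return 331.3 + 0.606 * temp_c

def uca_mic_positions(K, radius):
    """Microphone positions and angles for a UCA centred at the origin."""
    psi = 2 * np.pi * np.arange(K) / K
    return radius * np.stack([np.cos(psi), np.sin(psi)], axis=-1), psi

def near_field_delays(d, theta, K, radius, fs, c):
    """Exact delay in samples from a source at polar position (d, theta) to each microphone."""
    mics, _ = uca_mic_positions(K, radius)
    src = d * np.array([np.cos(theta), np.sin(theta)])
    return fs / c * np.linalg.norm(src - mics, axis=-1)

def far_field_delays(d, theta, K, radius, fs, c):
    """First-order (far-field) approximation: distance and angle appear in separate terms."""
    _, psi = uca_mic_positions(K, radius)
    return fs / c * (d - radius * np.cos(theta - psi))

c = speed_of_sound(20.0)
K, radius, fs, theta = 8, 0.05, 48000.0, 0.7
err_far = np.max(np.abs(near_field_delays(3.0, theta, K, radius, fs, c)
                        - far_field_delays(3.0, theta, K, radius, fs, c)))
err_near = np.max(np.abs(near_field_delays(0.2, theta, K, radius, fs, c)
                         - far_field_delays(0.2, theta, K, radius, fs, c)))
print(f"c = {c:.2f} m/s, max error at 3 m: {err_far:.3f} samples, at 0.2 m: {err_near:.3f} samples")
```

With a 5 cm radius at 48 kHz, the far-field approximation is off by a small fraction of a sample at 3 m but by most of a sample at 0.2 m, which is why a separate near-field model is useful.
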
Quality matrices

To derive the quality (or weighting) matrices, we first have to compute the Fisher information matrix (FIM) for the case of a known and an unknown source signal.

Recall that the signal model for the microphone data is given by

where

If we assume that the noise is white and Gaussian with an unknown variance σ², the vector y is distributed as

where ϑ contains the unknown model parameters, except σ², and

For the case of an unknown source signal, we assume that β = 1 so that

For the case of a known source signal, we have that

Thus, we have to find the derivatives of the mean vector w.r.t. these parameters to find an expression for the FIM. Since p_s is a common parameter for both a known and an unknown source signal, we start with this. When we do not assume that (it is not differentiable), it is relatively straightforward to derive that

where

Quality matrix for a known source signal

When the source signal is known, we have to find the derivative w.r.t. β. This is

Thus, the FIM is

where

The inverse FIM is, therefore, given by

from which we can extract the inverse quality matrix as

Quality matrix for an unknown source signal

When the source signal is unknown, we set β = 1 and have to find the derivative w.r.t. s(0). This is

Thus, the FIM is

where

The inverse FIM is, therefore, given by

from which we can extract the inverse quality matrix as

Note that some of the DFT matrices F cancel when calculating the term B A^{-1} B^T.
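
The construction of the quality matrices can be checked numerically. For a real-valued Gaussian model with mean μ(ϑ) and white noise of variance σ², the FIM of ϑ is JᵀJ/σ², with J the Jacobian of μ; the σ² block decouples from ϑ. Partitioning the FIM into a position block P, a nuisance block A (the gain or the source signal) and cross terms B, the position block of the inverse FIM is (P − B A⁻¹ Bᵀ)⁻¹. The sketch below assumes NumPy, a hypothetical far-field UCA with a known source and an unknown gain, and finite-difference derivatives in place of the analytic ones derived above; κ for the von Mises model can then be taken as the reciprocal of the resulting DOA variance.

```python
import numpy as np

def numerical_jacobian(f, params, eps=1e-6):
    """Central-difference Jacobian of the mean vector f(params) (shape (n,)) w.r.t. params."""
    params = np.asarray(params, dtype=float)
    cols = []
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        cols.append((f(params + step) - f(params - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def position_crlb(f, params, sigma2, n_pos):
    """Inverse-FIM block for the first n_pos parameters via the Schur complement."""
    J = numerical_jacobian(f, params)
    fim = J.T @ J / sigma2          # FIM of N(f(params), sigma2 * I) w.r.t. params
    P = fim[:n_pos, :n_pos]         # position block
    B = fim[:n_pos, n_pos:]         # cross terms
    A = fim[n_pos:, n_pos:]         # nuisance block (e.g. gain or source signal)
    return np.linalg.inv(P - B @ np.linalg.solve(A, B.T))

# Hypothetical example: known source, far-field UCA, parameters (theta, beta).
rng = np.random.default_rng(6)
N, K, radius, fs, c = 127, 6, 0.05, 48000.0, 343.0
S = np.fft.rfft(rng.standard_normal(N))
w = 2 * np.pi * np.fft.rfftfreq(N)
psi = 2 * np.pi * np.arange(K) / K

def mean_vector(p):
    """Stacked noise-free microphone signals for DOA p[0] and gain p[1]."""
    theta, beta = p
    tau = -fs * radius / c * np.cos(theta - psi)
    Y = beta * np.exp(-1j * np.outer(tau, w)) * S[None, :]
    return np.fft.irfft(Y, n=N, axis=-1).ravel()

params = np.array([0.5, 1.0])
sigma2 = 0.01
crlb_theta = position_crlb(mean_vector, params, sigma2, n_pos=1)
J = numerical_jacobian(mean_vector, params)
full = np.linalg.inv(J.T @ J / sigma2)      # reference: invert the whole FIM
kappa = 1.0 / crlb_theta[0, 0]
print(crlb_theta[0, 0], full[0, 0], kappa)
```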

Bibliography

[1] Christopher M. Bishop. Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006. ISBN: 0387310738.

[2] Fabio Crosilla and Alberto Beinat. "Use of generalised Procrustes analysis for the photogrammetric block adjustment by independent models". In: ISPRS Journal of Photogrammetry and Remote Sensing 56.3 (2002), pp. 195-209.

[3] J. R. Jensen et al. "On Frequency Domain Models for TDOA Estimation". In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. 2015.

[4] M. A. Koschat and D. F. Swayne. "A weighted Procrustes criterion". In: Psychometrika 56.2 (1991), pp. 229-239.

[5] J. K Nielsen et al. "Grid Size Selection for Nonlinear Least-Squares Optimisation in Spectral Estimation and Array Processing". In: Proc. European Signal Processing Conf. 2016.