Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
SIGNAL PROCESSING SYSTEMS FOR ACOUSTIC TOUCH DETECTION
Document Type and Number:
WIPO Patent Application WO/2012/025733
Kind Code:
A1
Abstract:
This invention relates to methods, apparatus, and computer programme code for processing acoustic signal data to determine where and/or whether an object has been tapped with a stylus, pen, finger nail or the like. We describe a method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of a plurality of tap-sensing regions of said object, mean value data and covariance data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms and said covariance data defining covariance of said at least two waveforms; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and from said mean value data and said covariance data from said plurality of regions.

Inventors:
GODSHILL SIMON (GB)
Application Number:
PCT/GB2011/051494
Publication Date:
March 01, 2012
Filing Date:
August 08, 2011
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
INPUTDYNAMICS LTD (GB)
GODSHILL SIMON (GB)
International Classes:
G06F3/043
Domestic Patent References:
WO2006069596A12006-07-06
WO2009093056A12009-07-30
WO2009093056A12009-07-30
Foreign References:
US20030217873A12003-11-27
US7511711B22009-03-31
US7345677B22008-03-18
US7511711B22009-03-31
US7345677B22008-03-18
Attorney, Agent or Firm:
MARKS & CLERK LLP (Cambridge Cambridgeshire CB2 1LA, GB)
Download PDF:
Claims:
CLAIMS:

1. A method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of a plurality of tap- sensing regions of said object, mean value data and covariance data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms and said covariance data defining covariance of said at least two waveforms; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and from said mean value data and said covariance data from said plurality of regions.

2. A method as claimed in claim 1 wherein said determining of said tapped region comprises determining from said captured tap data and from said mean value data and said covariance data from said plurality of regions, probability data defining a probability, for each of said regions, that said unknown location is in said region; and determining a said tapped region of said object for said unknown location from said probability data.

3. A method as claimed in claim 2 wherein said determining of said probability data comprises determining, for each said region and for each of a plurality of possible tap times, a probability of capturing said captured tap data.

4. A method as claimed in claim 3 further comprising pre-processing said captured tap data to identify a subset of said possible tap times for said determining of said probability data. 5. A method as claimed in claim 3 or 4 wherein said determining of said tapped region further comprises scaling said probability for each of said regions by one or both of: i) a probability of no tap having been captured; and ii) a prior probability of a said tap in the region for each of said plurality of possible tap times. 6. A method as claimed in any preceding claim wherein said processing of said labelled training data further comprises decomposing said covariance of said at least two waveforms into a plurality of basis functions each with a respective weighting, wherein said covariance data comprises data defining said respective weightings and providing basis function data defining said basis functions; and wherein said determining of said tapped region comprises determining said tapped region from said tap data and from said mean value data and covariance data from said plurality of regions and from said basis function data.

7. A method as claimed in claim 6 wherein a different set of said functions is determined for each said region.

8. A method as claimed in claim 6 or 7 further comprising determining at least one said basis function common to said plurality of tap-sensing regions. 9. A method as claimed in any one of claims 6 to 8 wherein said determining of said tapped region comprises, for each said region, estimating a set parameters to represent said captured data in terms of said basis functions for said region, and determining said tapped region by classifying said estimated sets of parameters for said regions.

10. A method as claimed in any one of claims 6 to 8 wherein said determining of said tapped region comprises operating on said captured tap data with a filter using each of said basis functions.

11. A method as claimed in any one of claims 6 to 8 when dependent on claim 3 wherein said determining of said tapped region comprises determining, for each said region, j, and for a set of possible tap times n0, and for captured tap data y, a said probability p(y \ j,n0) > or a '°9 likelihood ratio dependent on said probability p(y I j,n0) , by applying to said data y a first FIR filter having coefficients determined by values of t , where f represents a said set of basis functions for region j, and a second FIR filter having coefficients determined by values of said mean value data.

12. A method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data using a statistical model having a plurality of different parameters associated with each said region to determine from said waveforms, for each said region, statistical data representing the variability of said parameters resulting from tapping said object in said region; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and said statistical data.

13. A method as claimed in claim 12 wherein said statistical model comprises a pluralilty of basis functions each with a respective weighting, and wherein said statistical data represents the variability of one or both of said basis functions and said weightings.

14. A method as claimed in claim 13 wherein a different set of said functions is determined for each said region.

15. A method as claimed in claim 12, 13 and 14 wherein said determining of said tapped region comprises determining probability data defining a probability, for each of said regions, that said unknown location is in said region; and determining a said tapped region of said object for said unknown location from said probability data.

16. A method as claimed in claim 15 wherein said determining of said probability data comprises determining, for each said region and for each of a plurality of possible tap times, a probability of capturing said captured tap data; preferably further comprising pre-processing said captured tap data to identify a subset of said possible tap times for said determining of said probability data.

17. A method as claimed in claim 12 wherein said statistical model includes a scale parameter representing a strength of a said tap; wherein said determining of said tapped region comprises determining probability data defining a probability, for each of said regions, that said unknown location is in said region, and determining a said tapped region of said object for said unknown location from said probability data; and wherein said determining of said probability data comprises determining, for each said region and for each of a plurality of possible tap times and for each of a plurality of possible values of said scale parameter, a probability of capturing said captured tap data.

18. A method as claimed in any preceding claim wherein said object is a portable electric device and wherein said acoustic sensor is a microphone. 19. A method as claimed in claim 18 further comprising covering said microphone to attenuate external background noise.

20. A method as claimed in any preceding claim used to identify when a single said region is tapped, wherein a second said region comprises a remainder portion of the surface of said object.

21. A method of detecting the location of a tap on a portable electronic device having an acoustic sensor comprising a microphone, the method comprising performing, on said portable electronic device, the steps of capturing tap data and determining a said tapped region as recited in any preceding claim.

22. A method of providing data for detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of plurality of tap- sensing regions of said object, mean value data and covariance data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms and said covariance data defining covariance of said at least two waveforms; storing said mean value data and said covariance data from said plurality of regions for use with captured tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor for determining a tapped region of said object.

23. A method as claimed in claim 22 wherein said processing of said labelled training data further comprises decomposing said covariance of said at least two waveforms into a plurality of basis functions each with a respective weighting, wherein said covariance data comprises data defining said respective weightings, and providing basis function data defining said basis functions; the method further comprising storing said basis function data for use with said mean value data and said covariance data in determining said tapped region.

24. A method of providing data for detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data using a statistical model having a plurality of different parameters associated with each said region to determine from said waveforms, for each said region, statistical data representing the variability of said parameters resulting from tapping said object in said region; and storing said statistical data for use with captured tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor for determining a tapped region of said object.

25. A method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; retrieving statistical data for a plurality of regions of said object wherein said statistical data comprises data characterising waveforms from said sensor when said object is tapped in each of a plurality of tap-sensing regions of the object; and determining said tapped region of said object from said tap data and from said statistical data for said plurality of regions.

26. a method as claimed in claim 25 wherein said statistical data comprises data for a statistical model having a plurality of different parameters associated with each said region, and wherein said statistical data represents the variability of said parameters dependent on the variability in said waveforms when said object is tapped in each of said tap-sensing regions. 27. A method as claimed in claim 26 wherein said statistical model comprises a pluralilty of basis functions each with a respective weighting, and wherein said statistical data represents the variability of one or both of said basis functions and said weightings.

28. A method as claimed in claim 25, 26 or 27 wherein said statistical data comprises mean value data and said covariance data comprising data characterising, respectively, a mean and a covariance of said waveforms from said sensor. 29. A method as claimed in claim 28 wherein said covariance data comprises data defining said covariance in terms of a plurality of basis functions each with a respective weighting, wherein said weightings define said covariance, wherein said statistical data further comprises basis function data defining said basis functions, and wherein said determining of said tapped region comprises determining said tapped region from said tap data and from said mean value data and said covariance data from said plurality of regions and from said basis function data.

30. A method as claimed in claim 29 wherein a set of said functions is specific to a said region.

31. A carrier carrying processor control code to, when running, implement the method of any proceeding claim.

32. A computer system for providing data for detecting the location of a tap on an object having at least one acoustic sensor, the computer system comprising: an input to receive data from said acoustic sensor; working memory; program memory storing processor control code, and a processor coupled to said input, to said working memory and to said program memory; and wherein said program memory stores code for implementing the method of claim 22, 23 or 24.

33. An electronic device configured to detect the location of a tap on the device, the device comprising: an acoustic sensor; working memory; program memory storing processor control code, and a processor coupled to said input, to said working memory, and to said program memory; wherein said program memory stores code for implementing the method any one of claims 25 to 30.

Description:
SIGNAL PROCESSING SYSTEMS FOR ACOUSTIC TOUCH DETECTION

FIELD OF THE INVENTION

This invention relates to methods, apparatus, and computer programme code for processing acoustic signal data to determine where and/or whether an object has been tapped with a stylus, pen, finger nail or the like.

BACKGROUND TO THE INVENTION

We have previously described, in WO2009/093056, a mobile phone handset incorporating one or more acoustic sensors to pick up the unique acoustic signature of an input area struck by the user. We now describe details of some particularly preferred approaches which have been tested and found to work effectively. Other background prior art can be found in patents assigned to Sensitive Object, in particular US7.51 1.711 and US7,345,677.

Broadly speaking an aspect of the problem we address is to make acoustical measurements from one or more microphones embedded in a physical device. This may be an electronic device such as a mobile phone or laptop computer or a microphone or accelerometer may be attached to an inert object to make the object touch sensitive. The acoustic signal from a user tapping on the device or object using, for example, a stylus, pen tip, finger nail, or finger, is detected and analysed (either in the device or remotely) in order to determine where the object has been tapped. The resolution of this determination is variable - for example the object may be provided with a grid of touch sensitive regions akin to a keyboard, or a simple differentiation may be made between, say, two regions such as one or other half of the device/object. In still other approaches the determination may simply be as to whether the device has been tapped in a particular region; or high resolution or quasi-continuous determination of a tap location may be made. In a typical application an icon may be displayed on the screen of a mobile phone in a touch-sensitive region, and user-tapping of the icon may then be employed, for example to navigate a menu. However it will be appreciated that many applications of the technology are possible. Although the concept is simple, there are significant difficulties in achieving reliable operation, This is because the received "tap" waveforms are embedded in environmental noise such as speech, music, vehicle noise, general background noise and so forth, as well as interference. Moreover some of this noise/interference may mimic the acoustic signal from a tap. Another substantial difficulty is that the waveform measured when repeatedly tapping at the same position is not repeatable but exhibits a random variability, presumably from small differences in how the tapping is actually performed, how the device is held, and so forth.

There therefore exists a need for algorithms which can accurately and repeatably identify the region of a device on which a user has tapped.

SUMMARY OF THE INVENTION

According to the first aspect of the invention there is therefore provided a method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of a plurality of tap-sensing regions of said object, mean value data and covariance data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms and said covariance data defining covariance of said at least two waveforms; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and from said mean value data and said covariance data from said plurality of regions.

In embodiments the labelled training data comprises location data and (digitised) time domain analogue waveforms captured for a plurality of taps at each position of a grid of touch sensitive regions, preferably using a variety of different tapping styles/instruments and the like. The waveforms for each grid region are time-aligned and then, for the set of time-aligned waveforms, mean and covariance functions are calculated. The mean comprises time series data, in effect an average waveform (and thus is a vector; the covariance comprises a matrix, in effect expressing the variability of the tap waveforms. Each touch sensitive region has a pair of these functions, that is in embodiments a mean value vector and a covariance matrix. The skilled person will recognise that a matched filter classifier might only use the mean, but embodiments of the techniques we describe improve upon this by using the variability/covariance data, in some preferred embodiments applying principle component analysis techniques as described later. Thus embodiments of the techniques we describe achieve improved performance by improved modelling of the random variabilities mentioned in the introduction above. It will be recognised that the resolution of the location data/tap sensing regions may be selected according to the application and they range from very low resolution, for example just one of two regions, to very high or quasi-continuous resolution.

The mean and covariance data enable the calculation of the probability of a captured set of tap data (y) given a tap at a particular region j and with a particular start time/sample n 0 , p(y \ j,n 0 ) . In embodiments a formula for this probability p is based upon an assumption that the probability of tap location and time is Gaussian, and background noise is assumed (in the later formulae x denotes a clean tap signal without background noise). However, this Gaussian assumption is not necessary, and, further, a probability calculated with a Gaussian assumption seems to be reasonably accurate even though the underlying statistics may not be Gaussian.

The probabilities p(y \ j,n 0 ) may be calculated for all the tap-sensing regions j and for all the potential tap start positions n 0 , thus providing a matrix of probabilities and the most likely (j,n 0 ) may be selected by identifying the largest probability (an argmax approach). Alternatively a Bayesian approach may be employed, multiplying each probability by the prior probability of a tap at region j, time n 0 , p(j,n 0 ) . As described in more detail later, if all tap positions are equally likely, the prior probability of a tap at region y is 1 / J , where J is the total number of tap-sensing regions; similarly assuming each tap time is equally likely, this introduces a factor of 1/ N , where N is the total number of possible tap times. Preferably the probability of no tap (l - «) is also included, where a is a non-critical tuneable parameter.

With either the Bayesian or argmax approach the probabilities calculated may comprise a ratio of the probability of a tap to the probability of no tap (in the later mathematics labelled as a j = 0 case), in which case a ratio of > unity indicates that it is more likely than not that there has been a tap. In practice to reduce the risk of false positives a threshold of > unity for this ratio of probabilities may be employed.

In some preferred embodiments a calculated probability p(y \ j, n Q ) takes account of

(is dependent on) a determined level of background noise ( σ^ ). The background noise may be determined from the captured tap data, for example by taking an average value of the data (assuming the tap does not add much energy) or by a more sophisticated technique. An example procedure then performs a calculation over two loops, one running over possible tap positions j, a second running over tap start time n 0 in the captured tap data _y, for each j, n 0 calculating the probability of the captured data, more preferably the ratio of this to the no-tap null hypothesis, the tap region j defining which set of mean value data and covariance data to use in the probability calculation. In an electronic device the mean value data and covariance data for each of the tap-sensing regions may be stored in non-volatile memory, for example at manufacture or downloaded to the device later. In some embodiments the loop runs over all the possible tap times ( n 0 ), but preferably, to reduce the amount of computation, the calculation is performed over only a sub-set of the possible tap times ( n 0 's) for example by calculating the energy in the captured signal and discarding the low energy portions of the signal and restricting the probability calculations to higher than average portions of the captured signal, where a tap is most likely. By setting an appropriate energy threshold a relatively restricted range of potential tap (start) times may be employed.

The probability determination may be performed in software, hardware, or a combination of the two. In particular, in some preferred implementations of the probability calculation described later using basis functions the probability calculation involves a step which is effectively implementation of an FIR (finite impulse response) filter which may, for example, be efficiently implemented in hardware. In some preferred implementations of the method, a multi-component model of the tap waveforms for each region captured during the training procedure is employed. Thus the captured tap data at y is modelled as a sum of basis functions t with amplitudes determined by a corresponding set of unknown parameters Θ, together with background noise. This is motivated by the observation that tapping repeatedly at the same position results in different waveforms because there is a significant amount of unavoidable random variation in, for example, how the object/device is held, precisely how the tap takes place, and also, surprisingly, based on environmental parameters such as temperature. Thus the idea is to model the random variation using a relatively small number of underlying components, which enables modelling of the wave shape as well as the amplitude of the tap response. This is not as simple as just considering the object to be "ringing" with different frequency components when tapped - instead the idea is to approximate the covariance of the waveforms arising when repeatedly tapping the same region j by a set of underlying components. By employing a relatively small number of components the covariance is effectively denoised so that the determined probabilities are more robust. In general the components used to model sets of waveforms from different tap regions j are different (although in embodiments there may be some components in common - see later). In practice a relatively small total number of components (/) has been found to be adequate, in embodiments <10, for example 4 or 5.

When modelling the covariance of the waveforms from a particular tap region it is convenient to assume a Gaussian distribution with a zero mean (a zero mean because the mean value data has already been subtracted off). The covariance data for the waveforms is typically a matrix (with one axis as time/sample), and this can be decomposed into a set of basis functions by any one of a range of techniques with which the skilled person will be familiar including, but not limited to, singular value decomposition, principal component analysis (an orthogonal decomposition), independent component analysis, non-negative matrix factorisation, latent variable analysis, and so forth. Thus in one approach the covariance matrix C is decomposed into eigenvalues/eigenvectors and the eigenvectors with eigenvalues below a threshold disregarded. (The threshold may be set, for example by examining the relative fall-off of the eigenvalues for the data set). Then the eigenvalues give the variances of the weighting components Θ of the basis functions, and the eigenvectors provide the set of basis functions ί for a particular tap-sensitive region j.

When the probabilities p(y \ j, n 0 ) are calculated these are calculated using covariance data for a region defined by the aforementioned eigenvalues and eigenvectors, that is the set of basis functions f for a region j and the corresponding variances of the weightings Θ. In such an approach typically the sets of basis functions f for each region are stored in non-volatile memory on the electronic device performing the tap location. The determination of the sets of basis functions, and the expression of the covariance data in terms of these, may be performed during the training phase. Thus the variance of the weights of the basis functions in the model are used rather than the "raw" covariance of the wave forms (the raw covariance data may be discarded once the basis function components for each region have been determined). Thus it will be recognised that the variance of the wave forms when tapping a particular region is expressed, in embodiments, in terms of the variance of components of a decomposition of (the covariance of) these waveforms. Once the probabilities p(y \ j,n 0 ) have been determined they may be processed in a number of ways, as previously described. Alternatively, however, in embodiments of this multicomponent approach it is not necessary to determine the probabilities p( I n o) in order to determine the tapped region - instead determination of a tapped region may employ the estimated weights of the basis functions for the tapped regions. Thus, more particularly, a set of basis weightings Θ may be determined for each candidate tapped region (and optionally tap time) and input into a classifier such as a neural network, support vector machine or the like. This is because the basis function weightings Θ can be treated as features extracted from the data. As previously mentioned, although in embodiments the basis functions are different for different tap positions, some or all of the basis functions may be common to one or more of the tap-sensitive regions. To achieve this there may be initially a decomposition across a plurality of different tap-sensitive regions, performing a principal component analysis (or similar) on all the data, thus selecting components representing taps over the plurality of touch-sensitive regions. Then, because basis functions for the individual regions are also needed, a second decomposition may be performed region-by-region.

Optionally embodiments of the method may employ training of different parameter sets (including basis functions) for different temperatures. Then an internal temperature sensor of the phone (or other electronic device) may be used to switch between different models (parameter sets). Additionally or alternatively embodiments may automatically detect which temperature the device is at from the waveform data (again using probabilities, in a similar manner to that described above). A similar

consideration/approach applies for different input modalities, e.g. pen/stylus/finger and so forth. Again, different sets of templates/parameters could be learned for different environments; again, optionally, embodiments may automatically detect which type of input mode is being employed from the waveform data (using the determined probabilities).

Although we have described embodiments in which one of a set of (predetermined) tap-sensitive regions is identified, a variant of the approach may be employed to determine when a single tap-sensitive region is tapped, for example a region to wake- up a device. In this case a j = 2 model may be employed in which one of the regions is the region to be tapped, for example the wake-up region, and the other region comprises part or all of the remainder of the object/device. The invention also contemplates a simplified version of this in which the system is trained by tapping just a single region, in effect a j = 1 model and a probability p(y \ j,n 0 ) is determined. In such a case a simplified training regime may be employed, repeatedly tapping on the single touch-sensitive region.

In some preferred embodiments of the above described procedures the touch-sensitive object is a portable electronic device, for example a mobile phone, PDA (personal digital assistant), laptop computer or the like. Preferably such a device incorporates a microphone which is used as the acoustic sensor. However in other applications an object may be made tap-sensitive by attaching an acoustic sensor, for example a microphone or accelerometer, and training a system to identify tapped regions, as previously described. In the above described aspects of the invention preferably mean value data and covariance data are employed, but potentially other models of the variability of the tapping waveforms may be employed, for example using a non-linear, non-Gaussian model, a Bayesian approach, or the like. Thus aspects of the invention contemplate other types of statistical data employed to represent variability of the captured tap waveforms, in particular in the context of a multi component model approach as described above.

Thus in a related aspect the invention provides a method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data using a statistical model having a plurality of different parameters associated with each said region to determine from said waveforms, for each said region, statistical data representing the variability of said parameters resulting from tapping said object in said region; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and said statistical data.

The algorithms can be extended to classify taps that do not lie exactly on the training grid. This is important for devices where menu grids change during navigation, e.g. web browsing. In a simple version of this, templates and components are 'interpolated' to give new templates/components at new off-grid tapping positions and classification proceeds using these new templates at the new grid positions. This allows a continuous coverage of the whole screen in principle - ie. not grid-based at all.

The algorithms can be extended to classify taps that do not lie exactly on the training grid. This is important for devices where menu grids change during navigation, e.g. web browsing. In a simple version of this, templates and components are 'interpolated' to give new templates/components at new off-grid tapping positions and classification proceeds using these new templates at the new grid positions. This allows a continuous coverage of the whole screen in principle - ie. not grid-based at all. The inventive concept may also be extended/modified to identify off-grid tapping using interpolation of templates and/or covariance functions.

Thus in a further aspect the invention provides a method [and portions of the method implemented on a training computer system and on a portable electronic device] of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of a plurality of tap-sensing regions of said object, statistical data comprising at least mean value data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms; processing said mean value data from said waveforms of a plurality of said regions to perform one or both of interpolation between and extrapolation from said means of said waveforms of said regions to provide mean value data defining waveform mean values at regions in addition to the regions used to capture said training data, preferably to increase a resolution at which mean value data for said regions is available; capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; and determining a tapped region of said object from said tap data and said statistical data. In a simple implementation the captured tap data from an unknown location may be employed in a ML (maximum likelihood) or MAP (maximum apriori probability) or similar process, in combination with the interpolated/extrapolated mean value data, to identify the unknown tap location. However in more sophisticated approaches the statistical data statistical data may comprise further data, as previously described, for example covariance data and/or data defining one or more sets of basis functions. In such a case the statistical data may be used to determine probabilities for the unknown tap location along similar lines to those previously described, in particular to identify a most probable location for the tap.

The aforementioned aspects of the invention relate to both capturing training data and to use of the training data to identify a tapped region of an object/device. However the skilled person will recognise that in many instances these two parts of the procedure will be separated so that, for example, training of a system might occur only once for, say, a particular handset or particular model of handset, for example under control of a manufacturer. However, in particular for a portable device as contrasted with an inert object to which an acoustic sensor has been attached, the identification of the tapped region will generally be lifted by software and/or hardware on the electronic device.

Thus the invention also contemplates as separate aspects:

1. the portion of the methods according to aspects/embodiments of the invention described above implemented in a training phase in which classification data such as mean value data and covariance data, or statistical data, is determined for a touch-sensitive object/device; and

2. the portion of the above described procedures according to aspects/embodiments of the invention implemented on an electronic device processing a captured acoustic signal using the data previously derived from a training phase (classification data such as mean value data and covariance data, or other statistical data) to identify a most probably tapped region.

Thus in a related aspect the invention provides a method of providing data for detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: storing a set of labelled training data, said labelled training data comprising digitised waveform data of waveforms captured from said sensor for taps at a plurality of different locations on said object in combination with location data indicating for each said waveform the location of the tap; processing said labelled training data to determine, for each of plurality of tap-sensing regions of said object, mean value data and covariance data for at least two said waveforms captured from taps in the said region, said mean value data defining a mean of said at least two waveforms and said covariance data defining covariance of said at least two waveforms; storing said mean value data and said covariance data from said plurality of regions for use with captured tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor for determining a tapped region of said object..

Similarly the invention also provides a method of detecting the location of a tap on an object having at least one acoustic sensor, the method comprising: capturing tap data comprising a digitised waveform of a tap at an unknown location captured by said sensor; retrieving statistical data for a plurality of regions of said object wherein said statistical data comprises data characterising waveforms from said sensor when said object is tapped in each of a plurality of tap-sensing regions of the object; and determining said tapped region of said object from said tap data and from said statistical data for said plurality of regions.

The invention further provides processor control code to implement the above- described systems and methods, for example on a general purpose computer system or on a digital signal processor (DSP). The code is provided on a carrier such as a disk, CD- or DVD-ROM, programmed memory such as non-volatile memory (eg Flash) or read-only memory (Firmware). Code (and/or data) to implement embodiments of the invention may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code. As the skilled person will appreciate such code and/or data may be distributed between a plurality of coupled components in communication with one another.

The invention also provides a computer system comprising a processor, working memory and programme memory, the programme memory storing processor control code to implement a method according to an aspect/embodiment of the invention as described above.

Thus in a further aspect there is provided a computer system for providing data for detecting the location of a tap on an object having at least one acoustic sensor, the computer system comprising: an input to receive data from said acoustic sensor; working memory; program memory storing processor control code, and a processor coupled to said input, to said working memory and to said program memory; and wherein said program memory stores code for implementing a method as described above.

In a further, complementary aspect there is provided an electronic device configured to detect the location of a tap on the device, the device comprising an acoustic sensor and processor control code to implement a method (or a portion of a method) identifying a tapped region) as described above. BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which:

Figure 1 shows a block diagram of a computer system configured to implement a tap- sensing training procedure according to an embodiment of an aspect of the invention;

Figure 2 shows a block diagram of an electronic device configured to implement a procedure for identifying a most likely tapped region of a plurality of tap-sensing regions of the device, according to an embodiment of an aspect of the invention;

Figure 3 shows an example of a mobile phone in which the internal microphone has been provided with a connection to enable the capture and processing of labelled training data for determining statistical data to enable the device to be configured to be tap-sensitive according to an embodiment of the invention;

Figure 4 shows an example of a wooden board object with a microphone attached to the back to enable capture and processing of labelled training data so that the device can be configured to provide a grid of tap-sensitive regions as illustrated;

Figure 5 shows examples of tapping with a stylus and with a finger nail on the mobile phone of figure 3 on one of a grid of tap-sensitive regions, the device being configured to identify the tapped region according to an embodiment of the invention;

Figure 6 shows an example of a captured acoustic waveform of a single tap;

Figure 7 shows examples of waveforms from two taps on each of two different spots;

Figure 8 illustrates variability of a captured tap waveform with temperature; Figure 9 shows an example of a set of training data waveforms captured from a single tap-sensitive region, after time-alignment showing a mean value (thick line) and variability of the waveforms (thin lines);

Figure 10 shows output data from a decomposition of a covariance matrix of a set of waveforms of the type illustrated in figure 9, showing a decomposition of the covariance into a set of eigenvalues and corresponding eigenvectors, showing, in this example, that after four or five components the eigenvalues of the basis functions are below a threshold and may be neglected;

Figure 11 illustrates example output from a correlation detector implementing a probabilistic correlation using a multicomponent model having mean value data, covariance data, and basis function data defining sets of basis functions as inputs, as illustrated in the block diagram of figure 2 (dots indicating detected taps);

Figure 12 illustrates variability in the basis functions (ie templates/components) for a tapped region;

Figure 13 shows an example of a tap waveform embedded in a high level of background noise; and

Figure 14 shows examples of mean value waveforms (templates) for 12 different tap- sensitive regions on an example mobile phone.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Broadly speaking we will describe approaches which involve a training phase in which one or more users taps repeatedly in one or more tapping modalities (stylus, finger, different temperatures) at specified locations on the device - e.g. on a carefully measured lattice of points on the screen surface of a mobile phone. (It has been found that there is variability caused by changes of temperature and change of device, e.g. different mobile phones of the same model). A training algorithm then learns templates and parameters used for subsequent classification of the data. Random Process method

We first describe our random process method. The main idea here is to model the variability in tapping styles, style of handset holding, different handsets (of the same model type) and environmental conditions using a multiple component model, see for example the templates learned from the MAP method under different temperatures. It appears that the temperature has a significant effect on the mean template - see Figure 8. Thus it seems questionable whether a simple ML or MAP method would be able to cope with this level of variation in tap waveform. Pulse variability has also been detected when users hold the devices in different ways, have the device on a surface and tap with nails or different styluses. One can label data based on position rather than the other factors mentioned above, and record the variability within the significant components.

In this general approach the tapping waveform at each location is modelled as a random process, whose parameters are different at each tapping location. The process is non-stationary because of the nature of the tapping process. If a tap occurs at time index n 0 the tapping process is regarded as a random draw from a random process

{x„;n 0 , j} where n 0 is the tapping time and j is the tapping position. There is no reason to assume that the process is Gaussian, other than analytic tractability, and it is quite likely that non-Gaussian heavy-tailed processes could do a good job of modelling the tapping processes. However, for simplicity, consider the Gaussian Process as a specific case. Here we will model a mean function for the process μ 1 of length M . and a covariance function c J (n l , n 2 ) = E[ x - , )(Λ 2 - ¼ )] , defined for any pair of time indices 0 < « 1 < y + l , 0 < « 2 < . +1 , which may be compiled into a full covariance matrix C J , whose n x ,n 2 t element is c J (n n 2 ) . Then, a tap occurs at time n 0 , the probability distribution for a sequence of measurements x = [Χ, , .,., Λ^] is given by a multivariate Gaussian distribution that is obtained by time shifting the mean and covariance functions to the appropriate tap starting time n 0 : P(x I J, «o ) = N ( J OO )> C J (n 0 ))

where

// (/¾) = [0 ... 0 // ... ^ 0 ... 0

such that the first non-zero element of μ ] 0 ) is at position n Q and the n,m th element of C J (n 0 ) is defined

0, otherwise

Training of the mean and covariance functions for this process can be done by obtaining many time-aligned tapping pulses from each tapping position and computing the mean and covariance function for ea

In the presence of zero-mean Gaussian background noise v ~ N(0,∑ v ) the distribution of the observed data y = x + v is then straightforwardly obtained as

p(y \ j,n 0 ) = N(^(n 0 ), C J (n 0 ) +∑ v )

In the classification algorithm this can be numerically evaluated using the following formula:

The computation of the matrix inverse in this expression will simplify substantially in certain cases, notably in the case where the noise v is independent and hence has a diagonal covariance matrix.

1. Non-Gaussian extensions - scaled Gaussian processes

While the Gaussian process models user and environmental variability, we may wish to separate out a tapping strength parameter from the model, the notion being that the tapping strength should affect only the amplitude of the waveform and not its shape (assuming here a linear system). This can be done by modelling the process as a scaled version of the random process, and hence moving to a scaled mixture of Gaussian Processes (i.e. now in the non-Gaussian arena):

with the new observation being y = x' + v . We have immediately that

P(y I j, η 0 , κ) = Ν{κμ ] (n 0 ), C j (n 0 ) +∑ v )

and K can be treated as a (positive-valued) random variable with prior pdf ρ(κ) , for example a discrete point-mass distribution gamma, inverted-gamma, or the square-root versions of these. The most natural choice would be the square-root inverted gamma distribution since it yields an analytic result that p(x \ j, n 0 ) is a multivariate Student distribution. We can now obtain a modified version of p(y \ j,n 0 ,ic) as with the new observation being y = x' + v . We have immediately that

P(y I « ο ) = κ 2 0 ) + τ ν )ρ{κ)άκ

In the discrete point-mass case the integral will drop out straightforwardly as a finite summation of weighted Gaussian terms (a 'Gaussian mixture process'). In the continuous cases the integral can be approximated by a discrete summation or dealt with by more sophisticated computational tools such as Monte Carlo integration.

2. Classification method

Classification of tapping position can then be achieved, for example, by Bayesian probabilities (classical or Maximum Likelihood would also work here): p(y)

where p(y) could be computed as V . p(y \ j,n 0 )p(j,n 0 ) if required, but will not usually be necessary as it is constant for any given observed data y . p(y) can however be used for selecting between different modelling specifications. See below for more detailed discussion of how to perform classification decisions using Bayesian decision theory and expected cost.

Multiple components method

The multiple component model proposed here is also a Gaussian process model, with the covariance function expressed in a reduced dimension. As such, these methods are strictly a subset of those in the previous section on more general Gaussian processes, although in the multiple components case we develop the mathematical formulae separately here. We are appealing here to dimensionality reduction arguments from Principal Components Analysis (PCA) and its random process interpretation through the Karhunen-Loeve Transform (KLT). Note that the extension to scale-mixture processes can be done in an exactly analogous fashion to that of the previous section.

1. Multiple Component Model

Consider the case where a particular tapping position j e {\,2,...,J} occurs at a time index n 0 . A particular instance of a tap location j is modelled as a linear combination of one or more component waveforms t , of length M J elements, where i is the component number. In addition each tapping position has a fixed mean component vector μ 3 associated with it, also of length M J .

At this stage, consider the component waveforms as fixed and known. In practice, these will need to be pre-specified or learned from training data, see later information.

A particular tap occurs at a time index n 0 , say, and we hence shift the component waveforms along to time n 0 and form a shifted vector of template components: (¾) = [o ... o ¾ I ... tj Mt o ... o] r such that the first non-zero element of t 3 (n 0 ) is at position n 0 . Where the template component overshoots the end of the vector (i.e. when n 0 +M - -l > N , where N is the length of the current data record), the component vectors are truncated in the obvious way. μ 3 is shifted to become μ (η 0 ) in a similar way.

We then assume a linear instantaneous model where a vector of time observations y = [}>i,...,y N ] T are modelled as a nois linear combination:

where t](n 0 ) is the z ' th template component out of a total of / components, shifted along to start at time n 0 , Θ 3 is the amplitude of the z ' th component and v = [v v ...,v N .

Take Θ/ to be a random variable, vectorised to @ J = [θ{ ,...,0] ] T , describing the amplitude scale of each component . Θ ; is then taken to be a Gaussian random vector:

p(® J \ n 0 ,j) = N(jt e J ,∑ g J ), (6) The noise v may be non-stationary, non-Gaussian, or both, and a heavy-tailed non- Gaussian background might be considered as a suitable option in noisy environments such as cafes where sudden loud noises are expected. However, for analytic tractability assume now that the noise background is a zero-mean Gaussian random vector:

v ~ N(0,∑ v )

(non-zero mean can easily be incorporated as required) .

The simplest assumption for the background noise interference is independent and Gaussian:

and hence∑ v = σ Ι Ν where is the identity matrix of length N . It is straightforward to include a dependent Gaussian structure, such as an AR (auto-regression) or ARMA (auto-regression moving average) process, or a more general Gaussian process, in the mathematics, as desired, which would lead to non-identity forms for∑ v .

At this stage, we assume that , ∑ J S and ∑ v are all pre-specified and known - in practice these would be obtained from prior reasoning or learned from training data. Considering this model in matrix notation:

where

t J (r>o) = [ti(ri 0 )t 2 J (no)...tj(n 0 )]

The likelihood function for the data is now readily obtained as:

p(y I j,n 0 J ) = N(t J (n 0 J + J (n 0 ),∑ v )

We have specified here a fully Gaussian model for parameters and likelihood. This substantially simplifies calculations and is used in the example calculations below, but we note that a non-Gaussian model may prove even more successful in the classification task at hand and can in principle be incorporated into our framework.

2. Multiple Component Detector/Classifier

It is now possible to formulate a tap detector and classifier based on the above model. This could be done using classical detection theory, least squares-based fitting of the model, or Bayesian approaches. Here we focus on the latter Bayesian approaches for their simplicity and flexibility.

The Bayesian approach will compute the posterior tap probability:

p(J> n 0 \ y)∞ p( I »O)PU, «O) p(j, n 0 \ y) can be computed for all time-offsets n 0 and all tap positions j and optimised:

j, n 0 = avgmaX j no pU, n 0 \ y) and , o) taken as the MAP estimate of tapping position and tap identity.

Note that we will also wish to test the probability of the 'null' hypothesis, i.e. that there is just background noise present and no tap at all. We identify this hypothesis as the y ' = 0 case. Since only noise is present, the likelihood is just the probability of the noise process, i.e in the Gaussian case:

p(y \ j = 0,n 0 ) = N(0,∑ v ) p(j, n 0 ) is the prior for tap identity and tap position. A naive version of this, used successfully in our work, assigns all tap identities as equally probable at all times: where a is the probability that any tap has occurred in the time interval Ι,. , . ,Ν .

A more sophisticated version could incorporate that fact that some tap positions are more regularly used than others (e.g. common navigation functions), or

the fact that there is temporal dependence between taps (if the previous tap

was a particular type then the next tap is likely to be of another type,...).

In practice the maximising value of p(j, n 0 \ y) is monitored and a detection is only declared when p(j, n 0 \ y) exceeds a threshold. It will often be convenient to look at the ratio of posterior probabilities for each j > 0 , since this avoids any calculation of the normalisation constant in p(j,n 0 \ y)∞ p(y | j,n 0 )p(j,n 0 )

PU> n o \ y _ p(y I )PU> N

P(j = 0, n 0 1 y) p(y \ j = 0, n 0 )p(j and we can then monitor this ratio (or more conveniently) its logarithm, declaring a detection whenever it exceeds a pre-specified threshold.

More generally, decision theory can be used to make the detection. If, for example, we are only interested in tap identity and not tap timing, we can compute instead:

N

j = argmax y ^(j | y) = argmax,.∑ p(j, n 0 1 y)

n 0 =l

Alternatively, we may work directly with a cost function that expresses the cost of particular classification errors, C(j,j) and choose our estimate to minimise the expected cost:

The cost function could, for example, penalise false detections more than false alarms, and so forth.

Having described the general detection/classification framework, we now need to specify how to compute the marginal likelihood p(y \ j,n 0 ) , as used for the posterior probability (1 1):

P(y 1 7 o) = £ P( > ® J I j, " 0 )d® J = , P (y \ I j,n 0 )d& J

The two terms in the integrand have already been specified in (6) and (10). Since they are both Gaussian we can perform this integral using standard Gaussian integral identities:

p(y I j,n 0 J )p{& I j,n 0 J (n 0 )@ J + μ> (n 0 ),∑ r )N(@ J \ μ^„)

(y-M J (n a )-t J (n 0 )@ J ) r ∑; 1 (y-u J (n 0 ) -

(In the notation used, for clarity vectors/matrices are not always bolded and some superscripts/subscripts are omitted for clarity, eg j and n 0 for t). Rearranging the exponent, we have: μί Y VW - W )

where:

Hence the integral is given by:

For posterior probability ratios we require the ratio of this likelihood to the 'null' j = 0 likelihood, since we have for the ratio:

p(J> "o \ y) = p(y \ j> n o )PU> )

PU = °> «o I y) p( I J = °> "<>)/>(./ = °> «o)

This Ratio simplifies the above expression somewhat:

This is the general form of the result. Substantial simplifications arise from various useful special cases. In particular, consider the white noise case with ∑ v = diag(^) and the case of orthogonal components where t J (n 0 ) T t J (n 0 ) = I . Then:

& = φ- ι β

Note in particular that the term t J (n 0 ) T y can be computed for all n 0 by a simple FIR filtering operation on y with coefficients given by the template component vector tj (reversed in time ordering). Similarly, the data dependent terms in (y - J (n 0 )) T ( - J (n 0 )) can be computed by simple summing operations and a further filtering operation on y with coefficients μ } . Remaining terms can be precomputed and stored in advance of receiving the data. 2 · 2

Moreover, if the prior covariance matrix for ® J is diagonal,∑ = diag(^ ,.-., ^ ) and the prior mean vector is zero, we have further simplifications in form.

Consider now a case with μ & ] = 0 , in order to give a concrete example of the simplest computations that could be involved in the classfication process.

Look first at β :

= - (FILTERYWITHT (n 0 ) - TMU )

where 'FILTERYWITHT" is implemented for all values of n 0 and j as a simple bank of J FIR filters applied to the data y , each filter having coefficients equal to one of the (time-reversed) component waveforms for tapping position j , and where

TMI ' = ί- ! ΰ ) Τ μ- ί 0 ) = ΐ μ-' is a constant vector for each tapping position which may be computed off-line and stored prior to running in use mode.

Then, Φ is a diagonal matrix:

and hence the inverse of Φ is just

Then, @ J is obtained as a sim le summation of terms:

where the subscript refers as before to the th waveform component at tapping position j . Now, looking at the final expression for probability ratio we require: β τ φ- ι β =∑β,®',

/=1

/ ¾ ~1 μί = 0, (since = 0 here) ∑ y (w 0 ) = FILTERYWITHMU 7 (/¾ )

where as before 'FILTERYWITHMU' is implemented for all n 0 and a particular as an FIR filter operation applied to the data y , with coefficients equal this time to the (time- reversed) process mean vector μ ] . Finall , the matrix determinant terms simplify to:

Putting the whole thing together now into the steps for classification of a data sequence y , and highlighting the parts that can be pre-computed and stored, we have:

START OF ALGORITHM:

· Pre-trained/computed and stored in NVRAM for j - \ : J

t J N by I - dimensional array

μ- 1 N - dimensional array

a Q J , (I - dimensional array containing elements σ^ β , )

LOGPRIORRRATIO = log(a) - log(l -a) - log(J) (scalar variable)

• For each chunk of data y , of length N time samples:

Determine background noise variance [this is the simplest version - just assume that background noise variance equals received signal energy]:

2 1 1 V N 2

1

• For = l : J

- Compute 'FILTERYWITHT:

For / = 1 : / Apply an FIR filter to the data y , the filter having coefficients equal to the corresponding (time-reversed) component waveform for tapping position j and component i , i.e. the time-reversed version of tj . The output is an N - dimensional vector FILTERYWITHT for a particular value of j and i , and for all values of n 0 = 1 : N .

End

- Compute 'FILTERYWITHMU':

Apply an FIR filter to the data y , the filtter having coefficients equal to the corresponding (time-reversed) mean process vector for tapping position j , i.e. the time-reversed version of μ ί . The output is an N -dimensional vector

FILTERYWITH 7 for a particular value of j , and for all values of n 0 = 1 : N .

- For n 0 - \ : N

• Calculate β , a vector of length / :

BETA = - (FILTERYWITHT (n 0 ) - TMU y )

• Compute @ J , a vector of length / :

THETA - ^ — j (FILTERYWITHT/ (n 0 ) - TMU )

• Compute:

BETAPHIBETA = ΒΕΤΑ,ΤΗΕΤΑ

YTSIGMAVMU = - FILTERY WITHMU y (n 0 )

DETSIGTHETAPHI =

• Compute log-likelihood ratio:

LOGLIKERATIO = - log(DETSIGTHETAPHI) -

- (-BETAPHIBETA - 2 x YTSIGMAVMU + MUTMU / σ ν 2 ) • Compute Bayesian log-posterior probability ratio and store in as an N by J -dimensional array in RAM:

LOGPOSTRATKV "o = LOGLIKERATIO + LOGPRIOR ATIO

End

End

• Find indices n 0 , y ,of maximum element of array LOGPOSTRATIO . If this maximum exceeds a fixed threshold, return j as the detected tap position and n 0 as the detected tap time. Otherwise, return 'no tap detected'.

END OF ALGORITHM

A special case of the above has 1 = 1 . Then the models reduce to a type of Bayesian matched filter/correlation detector.

3. Training the Multiple component model

Several components of the model were assumed known in the last section, specifically t J , ∑ Q , / , μ·* and σ ν . These can all be learned from labelled training data, in which a user is prompted to tap one or more times (possibly many more than one) in all J specified tapping positions, according to some known deterministic or random pattern of positions, and possibly in a variety of tapping styles and environments. The parameters of the classification model are then learned.

In one realisation of the training, the templates are drawn from a known dictionary of components, such as a wavelet, Fourier or Gabor dictionary. In this case the training algorithm will learn which elements of the specified dictionary should be used in the model, and also learn the prior distribution of the coefficients, i.e. the μ θ and Σ Θ terms.

The dictionary elements and their parameters can for example be learned using pursuit methods, or sparse Bayesian/ likelihood modelling methods.

In another realisation, the template component vectors are automatically learned from the data, including their number. Sophisticated approaches based on independent component analysis and probability modelling are possible here, and would not necessarily deliver orthogonal components. A simple and successful approach however involved performing a Principal Components Analysis (PCA) on the aligned training data. This comprises a detection and alignment phase in which the training examples are all aligned in time using correlation matching or similar, and then orthogonal component vectors are computed by PCA for each tapping position. This is achieved by first subtracting the overall mean of the aligned training examples. This is then used as μ 3 in the above detection procedure. Then the sample correlation matrix is obtained from the mean-corrected, aligned training data - both can be achieved as described in the section on Gaussian Processes. An eigenvector/eigenvalue decompostion is then applied to the sample correlation matrix.

/ 2

The most significant eigenvalues of the correlation matrix are then used as the σ @( in the detection model. A threshold is used to determine the number of significant components to retain, i.e. the number of components / , which may be different for each tapping position. The corresponding eigenvectors then form the orthogonal set of template components t{ . This procedure is repeated for each tapping position j , returning different template components and variances for each j ,

In a refinement of the above paragraph, the learned template components are then used with the training data to form better estimates of the prior parameters∑ J e , and possibly to learn non-zero values for the mean parameters μ θ } .

In a further refinement the template vectors themselves are also refined, starting with the PCA vectors as an initialiser. The refined component vectors may or may not be orthogonal, depending on the details of the procedure.

In the above approach, a separate set of template components and prior parameters is learned for each tapping position. In another version of our approach, a common set of template components is learned from the whole unlabelled training dataset, using the above PCA approach, or a more sophisticated approach such as probability modelling. Then, using these template components, separate prior parameters are learned for each tapping position, including most likely non-zero mean terms μ θ 3 . as well as possibly fully populated covariance matrix∑ J Q

In another version of the above, additional training data is obtained from many randomised tapping positions on the device, not generally coinciding with the tapping positions being currently trained. Template components are then learned by e.g. PCA from this entire randomised training set, and these templates are used for determining the prior parameters for labelled training data from the current training grid. This approach should lead to greater robustness to tapping variations, especially when detecting 'off-grid' taps (see later). 4. Background modelling

The background noise parameter may be learned automatically from new data as it is received. In the simplest version we simply set σ ν 2 equal to the mean-squared value of the current data y . This leads to a conservative over-estimate for the noise. In more refined versions outlier and spike removal are first employed in order to get a more reliable noise estimate.

More generally, a correlated noise model may be estimated and incorporated directly through the noise covariance matrix ∑ v . This may be obtained by estimating the parameters of a corresponding autoregressive (AR) or autoregressive moving-average (ARMA) model from the data and computing the corresponding covariance matrix. As above, robust estimation methods can be used to reduce the impact of the tap transients on the estimates. Ideally, a short section of data just before and/or after a candidate tap pulse would be used for this estimation. More generally, a Gaussian Process background model can be incorporated in a similar way. 5. Preconditioning of data

It has been found beneficial to apply a high-pass filter to training and test data prior to training classification. This makes the algorithms much more robust in the presence of heavy background noise such as planes or trains or cars.

6. Adaptation of templates with time

It is proposed to use good pulse detections over time to slowly adapt the template and prior model parameters. This is in anticipation of slow changes to the physical characteristics of the device over time.

7. Detection of off-grid taps

The algorithms can be extended to classify taps that do not lie exactly on the training grid. This is important for devices where menu grids change during navigation, e.g. web browsing. In a simple version of this, templates and components are 'interpolated' to give new templates/components at new off-grid tapping positions and classification proceeds using these new templates at the new grid positions. This allows a continuous coverage of the whole screen in principle - ie. not grid-based at all. 8. Multiple microphones

All of the above have multiple microphone extensions that could perform better.

9. Low complexity detection stage

Computation can be substantially reduced by running a low-complexity matched or correlation detector to find candidate tap positions. This would involve running one or at most a few FIR filters on the data, with coefficients based on e.g. a mean template from a large unlabelled training set. Even more can be saved by first running a short term energy detector and only trying the correlation detector once a high energy region is detected. Only when a strong candidate detection and alignment has been made is the full component-based detector be called into play. This substantially reduces computational burden.

Implementation of some preferred embodiments

In this embodiment we use the multi-component Gaussian model. Training is done by PCA analysis of the labelled training data.

1. Training

A large quantity of labelled training data is digitally recorded from the acoustic sensor on the device and transferred onto a PC via an analogue or digital link. Here labelling means that we record which spot is being tapped along with the audio waveform itself. This will involve tapping many times at each spot located on the device. Tapping may be carried out using different styli, fingernail, fingerpad, different tapping strengths and styles, different handholds, different users, different environmental temperatures and indeed different examples of the same type of device (e.g. multiple handsets of the same model). This process is typically carried out once for each batch of handsets in the factory or in the laboratory, although it could be repeated later on in the life of the device.

Figure 1 shows a block diagram of a computer system configured to implement this method. Figures 3 and 5 show use of an internal microphone of a mobile phone to capture training data, Figure 4 shows a microphone-equipped inert object. In more detail, Figure 1 shows a block diagram of the implementation of the training algorithm. The training data x J is acquired by the user tapping multiple times on a specific spot / . The ΛΓ taps in this data stream x j is then detected and aligned with each other as an ensemble in a matrix. Figure 1 shows this as χί ~ [ x i " j X ir] and as the N taps superimposed on each other in the little plot. From the ensemble the mean J and the covariance matrix C j is computed, and from the covariance matrix cS the eigenvalues and eigenvectors are derived. The <? largest eigenvalues are selected and the corresponding eigenvectors are stored as the templates = [ β * ' · α *' '"' a i] while the eigenvectors are stored as σ β = diag [X{, λ{,

For each tapping position, a single waveform corresponding to one representative tap is selected (by visual examination of the training data in an audio editor). This waveform is trimmed and used as the coefficients in a correlation detector which determines the alignment of all training taps from this tapping position. These aligned taps are then extracted, trimmed to the same length (typically around 0.005s long) and training of template components is carried out. Figure 6 shows an example of a waveform from a single tap, and Figure 7 two tap waveforms from each of two tap positions. Figure 9 shows example of time-aligned waveforms from a single tap- sensitive region showing the mean (thick line) and also the variability of these waveforms.

First the mean and covariance fu

Then the eigenvectors and eigenvalues of Q j are computed. The eigenvalues are then sorted by decreasing magnitude and a number of components to use is selected on the basis of how these decay towards zero. Figure 10 shows the decay of eigenvalues with component number, from a decomposed covariance matrix. Figure 12 indicates basis function (template) variability. The template components are then set equal to the eigenvectors which correspond to the highest eigenvalues, while the prior variances σ θ ' > are set equal to the corresponding eigenvalues. In this embodiment the prior means are all then set to zero

Thus the training process yields the prior parameters for each tapping position,

• 2 - 2

Q = diag(<r^ ,...,σ @ ) and the mean vectors μ 3 [=0 here]. 2. Testing/use mode

In the use mode, audio is continuously recorded from the device. In the simplest version of the approach, this audio is then split up into chunks or frames of around 0.05s and each chunk is separately analysed to determine whether a tap has occurred and if so where the user tapped. (In more sophisticated implementations the data may be input sample by sample and classifications are updated on-line on a sample by sample basis - both versions fit with the above detailed mathematical framework).

For each chunk of data y , of length N samples (where N is, say, of order 1000), loop over possible tap times n 0 and tapping positions j :

Determine background noise variance (this is the simplest version - just assume that background noise variance equals received signal energy):

2 1 * N-1 2

1 ' n=l

For 7 = 1 : J

For n 0 = \ : N

Compute likelihood-ratio function: β =∑i ¾ + t ; to ) r (y - μ 1 ("o ))

6 = φ- /3

Compute Bayesian posterior probability ratio

End

End

p(j,"o\y)

Find maximum of p(j=0,n 0 \y) over all j > 0 and « 0 . If the maximum exceeds a fixed threshold, return that detected j and /¾ pair. Otherwise, return no detection

U = ).

Figure 2 shows a block diagram of a portable electronic device configured to implement this method. Figure 13 shows an example of the acoustic signal of a tap embedded in noise; Figure 14 shows example basis functions (templates); and Figure 11 shows example detection statistics output data for the system of Figure 2 (dots indicate a detected tap).

In more detail, Figure 2 shows a block diagram of the implementation of the use/testing stage of the algorithm. The diagram shows how a tap Ύ incident on the tapping device is picked up by the microphone on the device and transmitted to a computer. Within the computer the algorithm calculates the statistics, as outlined below, utilizing data from the training stage of the algorithm. The statistics of the signal is then evaluated and the most probable class or spot is displayed to the user unless the models do not yield a sufficiently high probability, as determined by a threshold, in which case no tap will be registered.

3. Testing/use mode for a Gaussian process model

An alternative testing mode is obtained for the Gaussian process model. Recall that in this case only the means fi J and covariances Q J are required, so no eigenvector- eigenvalue decomposition is performed in the training stage (otherwise training is identical). Testing then proceeds exactly as before, but using a different formula for the likelihood function p(y\ j,n 0 ):

For each chunk of data y , of length N samples, loop over possible tap times n 0 and tapping position j :

Determine background noise variance [this is the simplest version - just assume that background noise variance equals received signal energy]:

2 1 1 V N 2

For j = \: J

For n 0 = 1 : N

Compute a likelihood-ratio function:

P(y\j,n 0 )= } exp((y- J (n 0 )) T (C J (n 0 ) +∑ (y- J (n 0 )))

π) l c \ η 0 + ι ν)\

p(y\j = ,n 0 ) ^exp( ∑; )

Compute Bayesian posterior probability ratio

pU, n 0 \y) _ p(y 1 j, n 0 )p(j, )

PU = o» "o I y) p(y I J = ° 5 « 0 = o. «o )

End

End

p(i»"o )

Find maximum of p(j=0,n 0 \y) over all j>0 and « 0 . If the maximum exceeds a fixed threshold, return that detected j and n 0 pair. Otherwise, return no detection

(7 = 0).

4. Testing mode for a simple case

Here we give a very specific implementation for a simple case where noise is iid Gaussian, multi-component model is assumed with orthogonal components and the prior coefficient means are set to zero. The derivation of this is given above, and the algorithm is repeated there. Arrays are treated as column vectors throughout and dimensions are given as columns x rows.

START OF ALGORITHM:

• Pre-trained/computed and stored in NVRAM for j = 1 : J

t J (N by / - dimensional array)

μ 3 (N - dimensional array)

σ^, (I - dimensional array containing elements o J Kt )

LOGPRIORR ATIO = log(a) - log(l - a) - log(J) (scalar)

TMU y = \J T μ 1 , I - dimensional array

MUTMU = μ 3 T μ 1 ' , (scalar variable)

• For each chunk of data y , of length N time samples:

Determine background noise variance [this is the simplest version - just assume that background noise variance equals received signal energy]:

1 N

• For j = 1 : J

- Compute 'FILTERYWITHT':

For i = 1 : 1

Apply an FIR filter to the data y , the filtter having coefficients equal to the corresponding (time-reversed) component waveform for tapping position j and component i , i.e. the time-reversed version of . The output is an N - dimensional vector FILTERYWITHT for a particular value of j and , and for all values of n 0 = 1 : N .

End

- Compute 'FILTERYWITHMU':

Apply an FIR filter to the data y , the filtter having coefficients equal to the corresponding (time-reversed) mean process vector for tapping position j , i.e. the time-reversed version of μ' . The output is an N -dimensional vector FILTERYWITHMU j for a particular value of j , and for all values of n 0 = l : N .

For n n - \ : N Calculate β , a vector of length / :

BETA = - (FILTERYWITFTP (n 0 ) - TMU ; )

Compute @ J , a vector of length / :

THETA

Compute:

BETAPHIBETA = ΒΕΤΑ,ΤΗΕΤΑ YTSIGMAVMU = - FILTER YWITHMU 7 0 )

DETSIGTHETAPHI

• Compute log-likelihood ratio:

LOGLIKERATIO = - log(DETSIGTHETAPHI) - (-BETAPHIBETA - 2 x YTSIGMAVMU + MUTMU / σ] )

• Compute Bayesian log-posterior probability ratio and store in as an N by J -dimensional array in RAM:

LOGPOSTRATKV = LOGLIKERATIO + LOGPRIORRATIO

End

End

• Find indices n 0 , ,of maximum element of array LOGPOSTRATIO . If this maximum exceeds a fixed threshold, return j as the detected tap position and n 0 as the detected tap time. Otherwise, return 'no tap detected'.

END OF ALGORITHM

5. Other versions of the testing algorithm

Other versions of the likelihood function, such as non-Gaussian models for the taps and/or the background noise, or the scaled mixture versions of the models also fit into a testing framework with a corresponding structure to that above, simply by replacing the calculation of the terms p(y \ j,n 0 ) and p(y \ j = 0,n 0 ) with their alternative versions, for example using equation (6) for p(y | j ' 0 )when the scaled Gaussian mixture model is to be implemented.

We have thus described a device as having one or more acoustical sensors and an algorithm which classifies tapping positions in the presence of user and environmental variability through use of a statistical model. In embodiments the model is a non- stationary random process having different parameters for each tapping position, time- shifted to coincide with the time of arrival of the tapping waveform: {x„; n 0 , j} where n 0 is the tapping time and j is the tapping position.

The random process may a Gaussian Process whose mean and/or covariance functions are parameterised differently for each tapping position:

p(x \ j,n 0 ) = N(p J (n 0 )), C J (n 0 ))

where

0 ... 0 μ{ μ{ o - o

such that the first non-zero element of is at position n 0 and the n,m the element of C J (n 0 ) is defined

n 0 , m - n 0 ), 0 < n - n 0 ,m - n 0 < M j

otherwise

In embodiments the statistical model is a random probability mixture of processes: x' = KX

More particularly the statistical model may be a random probability mixture of Gaussian processes:

The statistical model may comprise multiple waveform components, each of which models one aspect of variability in the tapping process, and where the tapping waveform is modelled as a random combination of the individual components. In particular in embodiments the tapping random process is modelled as a linear weighted sum of the individual component waveforms and (optionally) a mean process:

where the weights 9f are random variables. More particularly the random parameters are multivariate Gaussian:

p(@ J \ n 0 ,j) = N( e J ,∑i), (3)

In a statistical model as described above an additive background noise process v may be modelled as:

y = x + v

which may be non-Gaussian or Gaussian, in a Gaussian example:

v ~ N(0,∑ v )

For example in which the background is iid Gaussian, i.e.∑ v = σ^Ι Ν .

The background may be modelled as a correlated process, for example an AR or ARMA process, or general Gaussian process. The background parameters may be estimated from surrounding (non-tap, pure background) data or jointly from the whole data including the tap, optionally using outlier robust statistical methods, or by probabilistic modelling.

Additionally or alternatively a classifier may be employed, in particular which uses measured tapping waveforms and any of the above statistical models (with or without background modelling) to determine (either or both) when (n 0 ) and where ( j ) the user has tapped on the device. Likelihood-based or classical decision methods may be used to perform classification. Alternatively classification may be based on the

Bayesian posterior probability

pU, n Q \ y)∞ p(y \ j, n 0 )p(j, n 0 ) (4)

In particular in which the Maximum a Posteriori classification is chosen, for j >= 0 : j,n 0 =argmax Jno p(j,n 0 \y)

or j = argma j p(j | y) = argmax, /?(j 0 1.y)

Alternatively the Maximum Posterior probability ratio classification is chosen, for j > 0 :

Λ . PU, «O I y)

^X; = 0,» 0 |^)

or =

and no tap classification is made (y = 0) unless the maximum value exceeds a pre- specified threshold.

Alternatively the classification is made using Bayesian Decision Theory: In the above procedures p(y \ j,n 0 ) may be computed as:

p(y\j,n 0 ) = N(^(n 0 )),C J (n Q ) +∑ v )

Alternatively p(y\j,n 0 ) may be computed for j>0 as: -^(„ 0 ))))

J 0

& = Φ- χ β

In the above p(y\ j,n 0 )/ p(y\ j = 0,n 0 ) may be computed for j>0 as: p(y \ J,n 0 ) = i

P(y \ j = 0,n 0 ) 1/2 Φ | 1/2

In any of the above procedures labelled training data may be obtained from one or more users tapping in one or more different styles and in one or more environmental conditions. Preferably the labelled training tap data are detected and time-aligned using e.g. correlation-based detector (matched filter). Optionally the aligned data are preconditioned by energy or peak amplitude normalisation. In embodiments the mean and covariance functions are learned from labelled and aligned training data by:

Where the tapping random process is modelled as a linear weighted sum of the individual component waveforms components and prior parameter variances may beobtained by Principal Components Analysis of the estimated covariance matrix.

There are other more variants on the training that may be employed as previously indicated including, for example: randomised unlabelled training over whole surface to get a single set of component templates (plus estimation of prior parameters for each j from labelled training data); adaptation of templates over time, detection of off-grid taps; multiple microphones; a low complexity detection stage. The training and/or test/use data may be pre-conditioned by high-pass/low-pass filtering in order to attenuate low/high frequency interference.

There are further options. For example one could employ a basic template based detector (eg MAP maximum apriori probability and/or ML maximum likelihood), and a corresponding training algorithm, implemented in a mobile device using matched filtering technology. The maths of the testing/use (tap identification) phase are a special case of the multiple component model when J = 1 and there is no mean process μ'.

We have described, inter alia, a multiple component-based detector and its training algorithm. This algorithm is aimed at modelling variability between taps at the same physical point. Lots of training data is obtained and multiple components are estimated for each tapping point - this can be done in the simplest case by a Principal Components Analysis of the training taps - leading to orthogonal components, or using a more sophisticated model that leads to dependent components. As described, there are configuration options available - for example, learn different components for each tapping point, or have a common pool of components for all tapping points.

In embodiments we model the background interference directly, learning the level of it automatically from non-tapping parts of the signal, but modelling it as white Gaussian noise. A possible extension is to learn the spectrum or autocorrelation function of the noise and build this into the detector - in effect using coloured rather than white noise. The maths can straightforwardly be adapted for this case. Once again, the spectrum can be estimated from non-tapping data. Other options include: High pass filtering/pre-processing to remove low-frequency variability; using sparse training and interpolation method applied in a basic single template mode; and using sparse training and interpolation, applied to PCA-type multiple component mode. The tap-characterising waveforms may change over time due to heavy use or damage to the phone. Thus in a multiple component model one could adapt the templates/components (basis functions) with time, preferably rather slowly. One could also provide a re-start function that recognises that the system is no longer working and that new tapping data needs to be input by the user for recalibration purposes. Additionally or alternatively methods for detecting Off-grid' taps could be implemented, for example including randomised (location) training to learn the template components. Similarly methods for providing/using data for dealing with multiple temperatures and multiple handsets could be implemented. As previously described, computational savings may be made by implementing a reduced-complexity detection stage. The methods may be extended to provide scaling parameter estimation/marginalisation for multi-component model and/or classification in the presence of clipping/saturation.

No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.