

Title:
CAPTURING AND PROCESSING SOUND SIGNALS
Document Type and Number:
WIPO Patent Application WO/2018/077713
Kind Code:
A4
Abstract:
A system comprising a microphone arranged to capture sound from an environment, and an ultrasound emitter configured to emit an emitted ultrasound signal into an environment. The microphone is arranged to capture a received audio signal from the environment, comprising a component in the human audible range. The microphone is also arranged to capture a received ultrasound signal comprising reflections of the emitted ultrasound signal, or else the system comprises another, co-located microphone arranged to capture the received ultrasound signal. Either way, the system further comprises a controller implemented in software or hardware or a combination thereof, wherein the controller is configured to process the received audio signal in dependence on the received ultrasound signal.

Inventors:
STANFORD-JASON ANDREW (GB)
MULLER HENDRIK LAMBERTUS (GB)
Application Number:
PCT/EP2017/076673
Publication Date:
September 20, 2018
Filing Date:
October 19, 2017
Assignee:
XMOS LTD (GB)
International Classes:
G06F3/01; G10L25/84; G10L15/08; G10L15/22; G10L21/0208
Attorney, Agent or Firm:
TOWNSEND, Martyn James (GB)
Claims:
AMENDED CLAIMS

received by the International Bureau on 03 August 2018 (03.08.2018)

1. A system comprising:

an ultrasound emitter configured to emit an emitted ultrasound signal into an environment;

sound sensing equipment comprising a microphone or more than one co-located microphone, wherein at least one of the one or more microphones of said sound sensing equipment is arranged to capture from said environment a received audio signal comprising a component in a human audible range, and wherein at least one of said one or more microphones of the sound sensing equipment is arranged to capture a received ultrasound signal comprising reflections of the emitted ultrasound signal; and

a controller configured to perform operations of:

- monitoring for one or more predetermined wake-up words in the received audio signal;

- waking up a target device from a standby state in response to a positive detection of at least one of the one or more wake-up words; and

- performing a gesture detection process, by using the ultrasound signal to detect the reflections of the emitted ultrasound signal and thereby detect user gestures performed by a user in said environment;

wherein the controller is configured to declare the positive detection of the at least one wake-up word at least partially in dependence on being accompanied by a user gesture as detected based on said gesture detection process.
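By way of illustration only, the gesture-dependent wake-up decision of claim 1 might be sketched as follows. The claim does not say how the gesture influences the decision; this sketch assumes one possible policy, in which a detected gesture lowers the confidence score required of the wake-word detector. The names `Frame` and `declare_wake_word` and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    wake_word_score: float  # hypothetical keyword-spotter confidence in [0, 1]
    gesture_detected: bool  # hypothetical output of the ultrasound gesture detector


def declare_wake_word(frame, base_threshold=0.8, gesture_threshold=0.5):
    """Declare a positive wake-word detection, partly depending on a gesture.

    Assumed policy: an accompanying gesture lowers the required score, so a
    marginal audio detection is accepted only when a gesture is also seen.
    """
    threshold = gesture_threshold if frame.gesture_detected else base_threshold
    return frame.wake_word_score >= threshold
```

A stricter policy, in which no wake-up is ever declared without a gesture, would equally fall within "at least partially in dependence on".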

2. The system of claim 1, wherein the sound sensing equipment comprises:

an audio filter arranged to receive and filter an input signal from the one or more microphones to produce the received audio signal representing the audio component by passing a first frequency range comprising the audio component but filtering out higher frequencies; and

an ultrasound filter arranged to receive and filter an instance of the same input signal derived from the same microphone as the audio filter to produce the received ultrasound signal comprising the ultrasound reflections by passing a second frequency range comprising the ultrasound component but filtering out lower frequencies including at least the audio component.

3. The system of claim 2, wherein:

the audio filter takes the form of an audio decimator which is also arranged to downsample the input signal to an audio sampling frequency retaining the first frequency range but not higher frequencies, thereby producing said received audio signal at the audio sampling frequency.

4. The system of claim 2 or 3, wherein the ultrasound filter is also arranged to filter out frequencies higher than the second frequency range.

5. The system of claim 4, wherein the ultrasound filter takes the form of an ultrasound decimator which is also arranged to downsample the input signal to an ultrasound sampling frequency retaining the second frequency range but not higher frequencies, thereby producing said received ultrasound signal at the ultrasound sampling rate.

6. The system of claim 4 or 5, wherein the ultrasound filter has a configurable pass band.
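By way of illustration only, the audio and ultrasound decimators of claims 2 to 6 might be sketched as two FIR filters fed from the same input signal, each followed by downsampling. Every number here is an assumption rather than something the claims specify: a 192 kHz input rate, an 8 kHz audio cutoff decimated to 16 kHz, and a default 20 kHz to 40 kHz ultrasound pass band decimated to 96 kHz. The windowed-sinc design and all function names are likewise hypothetical.

```python
import math

FS_IN_HZ = 192_000  # assumed sampling rate of the shared microphone input signal


def lowpass_taps(cutoff_hz, fs_hz=FS_IN_HZ, num_taps=101):
    """Hamming-windowed sinc low-pass taps (num_taps odd), unity gain at DC."""
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def bandpass_taps(low_hz, high_hz, fs_hz=FS_IN_HZ, num_taps=101):
    """Band-pass taps formed as the difference of two low-pass filters."""
    upper = lowpass_taps(high_hz, fs_hz, num_taps)
    lower = lowpass_taps(low_hz, fs_hz, num_taps)
    return [u - l for u, l in zip(upper, lower)]


def fir_decimate(samples, taps, factor):
    """Apply the FIR filter, computing only every `factor`-th output sample."""
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k in range(min(len(taps), i + 1)):
            acc += taps[k] * samples[i - k]
        out.append(acc)
    return out


def split_bands(mic_samples, ultrasound_band_hz=(20_000, 40_000)):
    """Derive audio (16 kHz) and ultrasound (96 kHz) signals from one input."""
    audio = fir_decimate(mic_samples, lowpass_taps(8_000), 12)
    low, high = ultrasound_band_hz  # configurable pass band, as in claim 6
    ultrasound = fir_decimate(mic_samples, bandpass_taps(low, high), 2)
    return audio, ultrasound


def tone(freq_hz, num_samples=1920, fs_hz=FS_IN_HZ):
    """Test helper: a unit-amplitude sine wave at the input sampling rate."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]


def rms(samples):
    """Test helper: root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Calling `split_bands(tone(1_000))` should leave the tone mainly in the audio output, and `split_bands(tone(30_000))` mainly in the ultrasound output. A practical design would likely use longer or multi-stage filters to tighten the transition bands.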

7. The system of any of claims 2 to 6, wherein:

the input signal initially includes frequencies higher than the second frequency range, comprising high frequency noise; and

the system comprises a preliminary filter arranged to filter the input signal before input to the audio and ultrasound filters, by passing the first and second frequency ranges but filtering out at least some of the high frequency noise.

8. The system of claim 7, wherein the preliminary filter takes the form of a preliminary decimator which is also arranged to downsample the input signal to an initial downsampled sampling frequency before input to the audio and ultrasound filters, the initial downsampled sampling frequency retaining said first and second frequency ranges but not a higher frequency range comprising at least some of said high frequency noise.
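By way of illustration only, the sampling rates along the decimator chain of claims 3, 5 and 8 can be checked with simple arithmetic: each decimator's output rate must exceed twice the highest frequency it is meant to retain. The function `plan_rates`, the 3.072 MHz raw input rate and the decimation factors used below are assumptions chosen for this example and do not come from the claims.

```python
def plan_rates(raw_fs_hz, prelim_factor, audio_factor, ultrasound_factor,
               audio_top_hz, ultrasound_top_hz):
    """Return the sampling rate after each decimator, checking Nyquist limits."""
    prelim_fs = raw_fs_hz / prelim_factor
    audio_fs = prelim_fs / audio_factor
    ultrasound_fs = prelim_fs / ultrasound_factor
    # The preliminary rate must retain both ranges; the ultrasound range is higher.
    if prelim_fs / 2 <= ultrasound_top_hz:
        raise ValueError("preliminary rate too low to retain the ultrasound range")
    if ultrasound_fs / 2 <= ultrasound_top_hz:
        raise ValueError("ultrasound rate too low to retain the ultrasound range")
    if audio_fs / 2 <= audio_top_hz:
        raise ValueError("audio rate too low to retain the audio range")
    return {"preliminary": prelim_fs, "audio": audio_fs, "ultrasound": ultrasound_fs}
```

Under these assumed figures, `plan_rates(3_072_000, 8, 24, 4, 7_000, 40_000)` gives a 384 kHz preliminary rate, a 16 kHz audio rate and a 96 kHz ultrasound rate.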

9. The system of any preceding claim, wherein the controller is configured to process the received audio signal to be transmitted as part of a voice call.

10. The system of any preceding claim, wherein the system is incorporated in the target device.

11. The system of any preceding claim, wherein the target device takes the form of one of:

- a television set or set-top box,

- a smart household appliance,

- a mobile user terminal,

- a desktop computer,

- a server, or

- a robot.

12. The system of any preceding claim, wherein the controller is configured to:

apply a noise model in order to remove ambient noise from the received audio signal, the noise model modelling ambient noise originating from said environment;

perform a motion detection process, by using the received ultrasound signal to detect the reflections of the emitted ultrasound and based thereon to detect motion in the environment;

perform a noise classification to classify whether or not the received audio signal currently consists only of ambient noise, by classifying the received audio signal as ambient noise at least partially in dependence on not being accompanied by motion in the environment as detected based on said motion detection process; and

train the noise model based on the received audio signal during periods when the received audio signal is classified as ambient noise according to said noise classification, but suppressing the training during periods when the received audio signal is not classified as ambient noise according to said noise classification.
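By way of illustration only, the motion-gated training of the noise model in claim 12 might be sketched as follows. The sketch makes two simplifying assumptions: the noise model is a running average of per-band power used for spectral subtraction, and the noise classification depends solely on the absence of ultrasound-detected motion, whereas the claim only requires that it depend on this at least partially. The names `NoiseModel` and `process_frame` are hypothetical.

```python
class NoiseModel:
    """Running estimate of ambient noise power per frequency band (illustrative)."""

    def __init__(self, num_bands, smoothing=0.9):
        self.power = [0.0] * num_bands
        self.smoothing = smoothing

    def train(self, band_powers):
        # Exponential moving average towards the newly observed band powers.
        a = self.smoothing
        self.power = [a * p + (1 - a) * x for p, x in zip(self.power, band_powers)]

    def remove_noise(self, band_powers):
        # Spectral subtraction, floored at zero.
        return [max(x - p, 0.0) for x, p in zip(band_powers, self.power)]


def process_frame(model, band_powers, motion_detected):
    """Train only when the frame is classified as ambient noise, then denoise."""
    is_ambient = not motion_detected  # assumed classification rule
    if is_ambient:
        model.train(band_powers)
    # When motion is detected, training is suppressed and the model is unchanged.
    return model.remove_noise(band_powers)
```

With `smoothing=0.5`, a motion-free frame `[4.0, 2.0]` moves an all-zero model to `[2.0, 1.0]`, while a later frame accompanied by motion leaves the model untouched.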

13. The system of any preceding claim, wherein the system further comprises a sound source, and wherein the controller is configured to:

apply an echo model in order to remove echoes of the sound source from the received audio signal, thereby producing an echo-cancelled version of the received audio signal, the echo model modelling an echo response of said environment;

when the echo-cancelled version of the audio signal diverges from quiescence, perform an echo response classification to classify whether or not the divergence is due to a change in the echo response of the environment, by classifying the divergence as being due to a change in the echo response at least partially in dependence on being accompanied by a change in the reflections of the emitted ultrasound signal received in the received ultrasound signal; and

train the echo model based on the received audio signal during periods when the divergence is classified as being due to a change in the echo response according to said echo response classification, but suppressing the training during periods when the divergence is classified as not due to a change in the echo response according to said echo response classification.
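By way of illustration only, the ultrasound-gated training of the echo model in claim 13 might be sketched with a normalised least-mean-squares (NLMS) adaptive filter. NLMS is a stand-in chosen for this example; the claim does not name an adaptation algorithm. The flag `ultrasound_changed` is assumed to come from a separate comparison of successive ultrasound reflections, and the class name, function name and quiescence threshold are hypothetical.

```python
class EchoCanceller:
    """Adaptive FIR echo model updated with NLMS (illustrative choice)."""

    def __init__(self, num_taps=4, step_size=0.5):
        self.weights = [0.0] * num_taps
        self.history = [0.0] * num_taps  # most recent far-end samples, newest first
        self.step_size = step_size

    def cancel(self, far_sample, mic_sample):
        """Subtract the modelled echo of the sound source from the mic sample."""
        self.history = [far_sample] + self.history[:-1]
        estimate = sum(w * x for w, x in zip(self.weights, self.history))
        return mic_sample - estimate

    def adapt(self, residual):
        """One NLMS update of the echo model using the current residual."""
        norm = sum(x * x for x in self.history) + 1e-9
        gain = self.step_size * residual / norm
        self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]


def process_sample(aec, far_sample, mic_sample, ultrasound_changed,
                   quiescence_threshold=1e-3):
    """Cancel echo; train only if the divergence coincides with an ultrasound change."""
    residual = aec.cancel(far_sample, mic_sample)
    diverged = abs(residual) > quiescence_threshold
    if diverged and ultrasound_changed:
        aec.adapt(residual)  # divergence attributed to a changed echo response
    # Otherwise training is suppressed (the divergence may be near-end speech).
    return residual
```

The purpose of the gating is that near-end speech, which also disturbs the residual, does not corrupt the echo model unless the ultrasound reflections indicate that the environment itself has changed.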

14. The system of any preceding claim, wherein the microphone takes the form of a directional microphone comprising an array of sound sensing elements, and wherein the controller is configured to:

based on the array of sound sensing elements, determine a direction of arrival of the received ultrasound signal; and

determine a direction of arrival of the received audio signal at least partially based on the direction of arrival of the received ultrasound signal.
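By way of illustration only, the direction-of-arrival steps of claim 14 might be sketched for a two-element array under a far-field assumption. The ultrasound direction is taken from the inter-element time delay, and it is then used to choose among candidate audio directions. Both the delay-based estimate and the nearest-candidate rule are assumptions of this sketch, as are the function names and the 343 m/s speed of sound.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def doa_from_delay(delay_s, spacing_m):
    """Far-field arrival angle in degrees from the delay between two elements."""
    ratio = SPEED_OF_SOUND_M_S * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))


def fuse_audio_doa(audio_candidates_deg, ultrasound_doa_deg):
    """Pick the audio direction candidate closest to the ultrasound direction."""
    return min(audio_candidates_deg, key=lambda a: abs(a - ultrasound_doa_deg))
```

For example, with audio candidates at -40, 25 and 70 degrees and an ultrasound direction of 30 degrees, `fuse_audio_doa` selects 25 degrees.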

15. A system comprising:

an ultrasound emitter configured to emit an emitted ultrasound signal into an environment;

sound sensing equipment comprising a microphone or more than one co-located microphone, wherein at least one of the one or more microphones of said sound sensing equipment is arranged to capture from said environment a received audio signal comprising a component in a human audible range, and wherein at least one of the one or more microphones of the sound sensing equipment is arranged to capture a received ultrasound signal comprising reflections of the emitted ultrasound signal; and

a controller configured to perform operations of:

- applying a noise model in order to remove ambient noise from the received audio signal, the noise model modelling ambient noise originating from said environment;

- performing a motion detection process, by using the received ultrasound signal to detect the reflections of the emitted ultrasound and based thereon to detect motion in the environment;

- performing a noise classification to classify whether or not the received audio signal currently consists only of ambient noise, by classifying the received audio signal as ambient noise at least partially in dependence on not being accompanied by motion in the environment as detected based on said motion detection process; and

- training the noise model based on the received audio signal during periods when the received audio signal is classified as ambient noise according to said noise classification, but suppressing the training during periods when the received audio signal is not classified as ambient noise according to said noise classification.

16. The system of claim 15, wherein the sound sensing equipment comprises:

an audio filter arranged to receive and filter an input signal from the one or more microphones to produce the received audio signal representing the audio component by passing a first frequency range comprising the audio component but filtering out higher frequencies; and

an ultrasound filter arranged to receive and filter an instance of the same input signal derived from the same microphone as the audio filter to produce the received ultrasound signal comprising the ultrasound reflections by passing a second frequency range comprising the ultrasound component but filtering out lower frequencies including at least the audio component.

17. The system of claim 16, wherein:

the audio filter takes the form of an audio decimator which is also arranged to downsample the input signal to an audio sampling frequency retaining the first frequency range but not higher frequencies, thereby producing said received audio signal at the audio sampling frequency.

18. The system of claim 16 or 17, wherein the ultrasound filter is also arranged to filter out frequencies higher than the second frequency range.

19. The system of claim 18, wherein the ultrasound filter takes the form of an ultrasound decimator which is also arranged to downsample the input signal to an ultrasound sampling frequency retaining the second frequency range but not higher frequencies, thereby producing said received ultrasound signal at the ultrasound sampling rate.

20. The system of claim 18 or 19, wherein the ultrasound filter has a configurable pass band.

21. The system of any of claims 16 to 20, wherein:

the input signal initially includes frequencies higher than the second frequency range, comprising high frequency noise; and

the system comprises a preliminary filter arranged to filter the input signal before input to the audio and ultrasound filters, by passing the first and second frequency ranges but filtering out at least some of the high frequency noise.

22. The system of claim 21, wherein the preliminary filter takes the form of a preliminary decimator which is also arranged to downsample the input signal to an initial downsampled sampling frequency before input to the audio and ultrasound filters, the initial downsampled sampling frequency retaining said first and second frequency ranges but not a higher frequency range comprising at least some of said high frequency noise.

23. The system of any of claims 15 to 22, wherein the processing which the controller is configured to perform with the assistance of the ultrasound signal comprises: identifying speech in the received audio signal, and controlling a target device in dependence on the identified speech.

24. The system of any of claims 15 to 23, wherein the controller is configured to: process the received audio signal to be transmitted as part of a voice call.

25. The system of any of claims 15 to 24, wherein the controller is configured to:

monitor for one or more predetermined wake-up words in the received audio signal;

wake up a target device from a standby state in response to a positive detection of at least one of the one or more wake-up words; and

perform a gesture detection process, by using the ultrasound signal to detect the reflections of the emitted ultrasound signal and thereby detect user gestures performed by a user in said environment;

wherein the controller is configured to declare the positive detection of the at least one wake-up word at least partially in dependence on being accompanied by a user gesture as detected based on said gesture detection process.

26. The system of claim 23 or 25, wherein the system is incorporated in the target device.

27. The system of claim 23, 25 or 26, wherein the target device takes the form of one of:

- a television set or set-top box,

- a smart household appliance,

- a mobile user terminal,

- a desktop computer,

- a server, or

- a robot.

28. The system of claim 23, 25, 26 or 27, wherein the target device is arranged to run a virtual digital assistant or to access a virtual digital assistant hosted on a server, and wherein the waking-up of the target device from the standby state comprises waking up the virtual digital assistant such that the virtual digital assistant will respond to further voice commands or queries detected in the audio signal.

29. The system of any of claims 15 to 28, wherein the system further comprises a sound source, and wherein the controller is configured to:

apply an echo model in order to remove echoes of the sound source from the received audio signal, thereby producing an echo-cancelled version of the received audio signal, the echo model modelling an echo response of said environment;

when the echo-cancelled version of the audio signal diverges from quiescence, perform an echo response classification to classify whether or not the divergence is due to a change in the echo response of the environment, by classifying the divergence as being due to a change in the echo response at least partially in dependence on being accompanied by a change in the reflections of the emitted ultrasound signal received in the received ultrasound signal; and

train the echo model based on the received audio signal during periods when the divergence is classified as being due to a change in the echo response according to said echo response classification, but suppressing the training during periods when the divergence is classified as not due to a change in the echo response according to said echo response classification.

30. The system of any of claims 15 to 29, wherein the microphone takes the form of a directional microphone comprising an array of sound sensing elements, and wherein the controller is configured to:

based on the array of sound sensing elements, determine a direction of arrival of the received ultrasound signal; and

determine a direction of arrival of the received audio signal at least partially based on the direction of arrival of the received ultrasound signal.