


Title:
A METHOD AND AN APPARATUS FOR ENHANCING AN AUDIO SIGNAL CAPTURED IN AN INDOOR ENVIRONMENT
Document Type and Number:
WIPO Patent Application WO/2018/219459
Kind Code:
A1
Abstract:
A first audio signal is received from a primary audio signal input. The first audio signal is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, and the first audio signal includes a primary audio signal that is to be enhanced. A second input indicative of a second audio signal is received from a secondary audio signal source. The second audio signal is associated with a second IDI indicating the importance of the second audio signal. A determination that the first audio signal includes the second audio signal is performed. A modification, based on the first IDI and the second IDI, of the first audio signal is performed to obtain a modified version of the first audio signal that enhances the primary audio signal.

Inventors:
WANG KEVEN (SE)
FERSMAN ELENA (SE)
KARAPANTELAKIS ATHANASIOS (SE)
MOKRUSHIN LEONID (SE)
Application Number:
PCT/EP2017/063266
Publication Date:
December 06, 2018
Filing Date:
June 01, 2017
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G10L21/0208; G10L21/0216; H04L12/18; H04R25/00
Foreign References:
US20150195641A12015-07-09
US8606249B12013-12-10
EP2779162A22014-09-17
Other References:
None
Attorney, Agent or Firm:
ERICSSON (SE)
Claims:
CLAIMS

What is claimed is:

1. A method of enhancing an audio signal captured in an indoor environment, the method comprising:

receiving (415) from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, wherein the first audio signal includes a primary audio signal that is to be enhanced;

receiving (420), from a secondary audio signal source, a second input indicative of a second audio signal, wherein the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment;

determining (430) that the first audio signal includes the second audio signal;

modifying (435), based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, wherein the modified version of the first audio signal enhances the primary audio signal; and

causing (440) the modified version of the first audio signal to be output to a receiver.

2. The method of claim 1, wherein the method is performed in a network device coupled with the secondary audio signal source through a network, and the method further comprises synchronizing (410) a local clock of the network device with a reference clock that causes the local clock of the network device to be in synchronization with a local clock of the secondary audio signal source.

3. The method of any of claims 1-2, wherein the second input is an identifier of a second audio signal, and the method further comprises retrieving (425) a second audio signal from a database of stored audio signals based on the identifier of the second audio signal.

4. The method of any of claims 1-2, wherein the second input is a second audio signal.

5. The method of any of claims 3-4, wherein the determining that the first audio signal includes the second audio signal includes:

synchronizing (460) a first segment of the first audio signal and a second segment of the second audio signal based on a first and a second timestamps respectively associated with each one of the first segment and the second segment, wherein the first and the second timestamps are indicative of the time the first segment and the second segment are received.

6. The method of any of claims 3-5, wherein the determining (430) that the first audio signal includes the second audio signal further includes:

identifying (465) based on a pattern matching mechanism whether the first audio signal includes the second audio signal.

7. The method of any of claims 1-6, wherein the modifying (435), based on the first IDI and the second IDI, the first audio signal includes determining (470) that the first IDI and the second IDI indicate that the first audio signal is of higher importance than the second audio signal.

8. The method of claim 7, wherein the first IDI and the second IDI are numerical values, and wherein determining (470) that the first IDI and the second IDI indicate the first audio signal is of higher importance than the second audio signal includes determining (472) that the first IDI is greater than the second IDI.

9. The method of any of claims 3-8, wherein the modifying (435), based on the first IDI and the second IDI, the first audio signal includes performing at least one of:

cancelling (475) the second audio signal from the first audio signal causing the primary audio signal of the first audio signal to be enhanced; and

increasing (480) a volume of the primary audio signal causing the primary audio signal to be enhanced.

10. A machine-readable medium comprising computer program code which when executed by a computer carries out the method of any of claims 1-9.

11. An apparatus for enhancing an audio signal captured in an indoor environment, the apparatus comprising:

an audio signal alteration unit to perform the following operations:

receiving (415) from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, wherein the first audio signal includes a primary audio signal that is to be enhanced,

receiving (420), from a secondary audio signal source, a second input indicative of a second audio signal, wherein the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment,

determining (430) that the first audio signal includes the second audio signal,

modifying (435), based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, wherein the modified version of the first audio signal enhances the primary audio signal, and

causing (440) the modified version of the first audio signal to be output to a receiver.

12. The apparatus of claim 11, wherein the apparatus is included in a network device coupled with the secondary audio signal source through a network, and the operations further include synchronizing (410) a local clock of the network device with a reference clock that causes the local clock of the network device to be in synchronization with a local clock of the secondary audio signal source.

13. The apparatus of any of claims 11-12, wherein the second input is an identifier of a second audio signal, and the operations further include retrieving (425) the second audio signal from a database of stored audio signals based on the identifier of the second audio signal.

14. The apparatus of any of claims 11-12, wherein the second input is a second audio signal.

15. The apparatus of any of claims 13-14, wherein the determining that the first audio signal includes the second audio signal includes:

synchronizing (460) a first segment of the first audio signal and a second segment of the second audio signal based on a first and a second timestamps respectively associated with each one of the first segment and the second segment, wherein the first and the second timestamps are indicative of the time the first segment and the second segment are received.

16. The apparatus of any of claims 13-15, wherein the determining (430) that the first audio signal includes the second audio signal further includes:

identifying (465) based on a pattern matching mechanism whether the first audio signal includes the second audio signal.

17. The apparatus of any of claims 10-16, wherein the modifying (435), based on the first IDI and the second IDI, the first audio signal includes determining (470) that the first IDI and the second IDI indicate that the first audio signal is of higher importance than the second audio signal.

18. The apparatus of claim 17, wherein the first IDI and the second IDI are numerical values, and wherein determining (470) that the first IDI and the second IDI indicate the first audio signal is of higher importance than the second audio signal includes determining (472) that the first IDI is greater than the second IDI.

19. The apparatus of any of claims 13-18, wherein the modifying (435), based on the first IDI and the second IDI, the first audio signal includes performing at least one of:

cancelling (475) the second audio signal from the first audio signal causing the primary audio signal of the first audio signal to be enhanced; and

increasing (480) a volume of the primary audio signal causing the primary audio signal to be enhanced.

Description:
A METHOD AND AN APPARATUS FOR ENHANCING AN AUDIO SIGNAL CAPTURED IN AN INDOOR ENVIRONMENT

TECHNICAL FIELD

[0001] Embodiments of the invention relate to the field of computer augmented hearing; and more specifically, to enhancing an audio signal captured in an indoor environment.

BACKGROUND

[0002] Noise cancellation technologies are based on isolating and enhancing one sound while suppressing or attenuating other sounds in an environment. In existing noise cancellation solutions, the choice of the sound to be enhanced is determined based on a priori information about the different sounds present in the environment. For example, a noise cancellation system can be pre-trained to classify audio signals corresponding to different sounds based on whether the audio signals can be classified as "echoes" or not. In another example, a noise cancellation system may be preconfigured to assume that the strongest audio signal received corresponds to the most important sound and is therefore the audio signal that needs to be enhanced.

[0003] However, the existing solutions of noise cancellation have several drawbacks and disadvantages. Existing noise cancellation systems face significant challenges when attempting to eliminate background sounds from a primary sound as often the background sounds are mixed with the primary sound that needs to be enhanced. The noise cancellation solutions do not provide a way of identifying the sounds that should be cancelled from an audio signal to enhance the primary sound.

SUMMARY

[0004] It is an object of the invention to provide an improved alternative to the above techniques and prior art by providing methods and apparatuses for enhancing an audio signal captured in an indoor environment.

[0005] One general aspect includes a method of enhancing an audio signal captured in an indoor environment, the method including: receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.

[0006] A machine-readable medium including computer program code which when executed by a computer carries out the method including the operations of receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.

[0007] One general aspect includes an apparatus for enhancing an audio signal captured in an indoor environment, the apparatus including: an audio signal alteration unit to perform the following operations: receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:

[0009] Figure 1 illustrates a block diagram of an exemplary audio system 100 for enabling adaptive audio signal alteration, in accordance with some embodiments.

[0010] Figure 2A illustrates a transactional diagram of exemplary operations for receiving audio signals at an audio signal alteration unit, in accordance with some embodiments.

[0011] Figure 2B illustrates a transactional diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.

[0012] Figure 3A illustrates exemplary audio samples received at the audio signal alteration unit, in accordance with some embodiments.

[0013] Figure 3B illustrates exemplary audio samples synchronized at the audio signal alteration unit, in accordance with some embodiments.

[0014] Figure 4A illustrates a flow diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.

[0015] Figure 4B illustrates a flow diagram of exemplary operations for determining that a first audio signal includes a second audio signal, in accordance with some embodiments.

[0016] Figure 4C illustrates a flow diagram of exemplary operations for modifying, based on importance designation indicators, one or more audio signals, in accordance with some embodiments.

[0017] Figure 5 illustrates a block diagram of an exemplary implementation of the audio signal alteration unit, in accordance with some embodiments.

DETAILED DESCRIPTION

[0018] The following description describes methods and apparatus for enhancing an audio signal captured in an indoor environment. In the following description, numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

[0019] References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0020] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot-dash, and dots) may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.

[0021] In the following description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.

[0022] The embodiments of the present invention present a method, an apparatus and a system for enhancing an audio signal captured in an indoor environment. The system includes multiple sound sources and an audio signal alteration unit. The sound sources are coupled with the audio signal alteration unit and transmit to the audio signal alteration unit audio signals or an indication of an audio signal. The audio signal alteration unit receives information about the generated/captured audio signals from the various sound sources and mixes them such that background noise is altered and a primary sound is enhanced.

[0023] The embodiments described below provide a method and an apparatus for enhancing an audio signal captured in an indoor environment. A first audio signal is received from a primary audio signal input. The first audio signal is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, and the first audio signal includes a primary audio signal that is to be enhanced. A second input indicative of a second audio signal is received from a secondary audio signal source. The second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment. A determination that the first audio signal includes the second audio signal is performed. A modification, based on the first IDI and the second IDI, of the first audio signal is performed to obtain a modified version of the first audio signal. The modified version of the first audio signal enhances the primary audio signal. The modified version of the first audio signal is output to a receiver.
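The flow summarized above can be illustrated with a short Python sketch. It is not the implementation of the embodiments: it assumes that both signals are already time-aligned lists of samples of equal length, it uses a normalized correlation with an arbitrary threshold as a stand-in for the "includes" determination, and it cancels the second audio signal by subtracting a least-squares-scaled copy of it. The names correlation, includes and enhance, and the threshold of 0.3, are hypothetical and chosen only for illustration.

```python
import math

def correlation(a, b):
    """Normalized correlation of two equal-length sample lists (range -1..1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def includes(first, second, threshold=0.3):
    """Stand-in 'first includes second' test; the threshold is an assumption."""
    return abs(correlation(first, second)) >= threshold

def enhance(first, second, first_idi, second_idi):
    """Return a modified first signal with a less important second signal removed."""
    if not includes(first, second) or first_idi <= second_idi:
        return list(first)  # nothing to cancel
    # Least-squares estimate of how strongly the second signal appears in the first.
    energy = sum(y * y for y in second)
    gain = sum(x * y for x, y in zip(first, second)) / energy
    return [x - gain * y for x, y in zip(first, second)]

# Toy data: a "voice" and a "TV" signal that do not overlap sample by sample.
voice = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
tv = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
captured = [v + 0.5 * t for v, t in zip(voice, tv)]  # first audio signal

cleaned = enhance(captured, tv, first_idi=5, second_idi=1)
```

In this toy case the two components are orthogonal, so the subtraction recovers the voice samples; real recordings would additionally require delay and room-response handling, which this sketch leaves out.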

[0024] Figure 1 illustrates a block diagram of an exemplary audio system 100 for enabling adaptive audio signal alteration, in accordance with some embodiments. The audio system 100 includes an audio input 105, one or more secondary audio signal sources 103, an audio signal alteration unit (AAU) 102, an audio output 107, an audio signal(s) database 104, and optionally a configuration unit 106. The audio system 100 is a telecommunication system that allows a first user of the system to transmit audio to a second user of the audio system 100. The second user is located remotely from the location of the first user. In some embodiments, the audio system 100 can be part of an audio/video system enabling audio and video telecommunication. For example, the audio system 100 may be part of a teleconference system enabling the first user to communicate with one or more remote users. While the embodiments below will be described with reference to a first user communicating with a second user through the audio system 100, one of ordinary skill in the art would understand that this is intended to be exemplary only and should not be limiting. For example, any number of users can be located at two or more locations communicating through the audio system 100 without departing from the scope of the present invention.

[0025] The audio system 100 is operative to receive input audio signal(s), e.g., first and secondary input audio data, and analyze the signal(s) to identify secondary noises/sounds that need to be cancelled or attenuated in order to enhance a primary audio signal. In an exemplary scenario, the first user of the audio system 100 may be speaking through the primary audio input 105 in an indoor environment 101, e.g., a teleconference room, a room in a private residence, a collaborative workspace, etc. The indoor environment may include one or more additional audio signal sources, i.e., the secondary audio signal sources 103 that can be referred to as audio sources 103. The additional audio signal sources create or record sounds that occur in the indoor environment. For example, the audio sources 103 can include a washing machine, a television (TV), a microphone recording outdoor noises that enter the room from a window, etc. These sounds are considered to be secondary and will often cause the second user, at the other end of the audio system 100, to receive a sound that mixes the primary audio signal with some or all of the secondary sounds/noises. The mixed sounds/audio signals deteriorate the experience of the second user. The audio system 100 is operative to receive the first audio signal which includes the primary audio signal, e.g., the voice of the first user, mixed with one or more sounds from the secondary audio signals, e.g., the sound of the washing machine, the sound of the TV, the sound from the outdoor environment, and/or any other sounds that can be heard in the indoor environment where the first user is located. The audio system 100 is operative to alter the first audio signal to obtain a modified version (4) which includes an enhanced primary audio signal, consequently improving the experience of the second user.

[0026] With reference to Figure 1, at operation 3, a first audio signal is received at the primary audio input 105. In some embodiments, the primary audio input 105 includes a microphone operative to convert incoming sounds into electrical input audio signal(s). In some embodiments, the primary audio input 105 may include an Analog-to-Digital Converter (ADC) operative to convert sound(s) into a digital input audio signal.

[0027] The first audio signal includes a mix of a primary audio signal that is representative of a primary sound such as the voice of the first user and one or more additional audio signals that represent different sounds from the environment of the first user. The additional audio signals are ambient sounds that occur in the environment of the user while the user is speaking via the audio system 100. The first audio signal is associated with a time value indicating the time at which the signal is received at the audio system 100.

[0028] The audio system 100 may further receive other input audio signals from the secondary audio signal sources 103. Each one of the secondary audio signal sources 103A-C transmits to the audio signal alteration unit 102 data about the sound it produces or captures. Such data may come either in the form of a digital audio stream, referred to as an audio signal, containing the sound itself, or in the form of an identifier, referred to as an audio signal ID, that identifies an audio stream at a storage medium, e.g., the audio signal database 104. Each item of data, audio signal or audio signal ID, from the audio signal sources 103 is associated with a time value indicating the time at which the signal is received at the audio system 100.
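As an illustration of how such time values might be used, the following Python sketch trims two segments so that they start at the same instant. It rests on assumptions not stated in the text: each time value is taken to mark the first sample of its segment, both segments share one sample rate, and the clocks are already synchronized. The function name align and its parameters are hypothetical.

```python
def align(first, first_ts, second, second_ts, sample_rate):
    """Trim two sample lists so that index 0 of each refers to the same instant.

    first_ts and second_ts are time values in seconds, assumed to mark the
    first sample of each segment; sample_rate is in samples per second.
    """
    offset = round((second_ts - first_ts) * sample_rate)  # offset in samples
    if offset > 0:
        first = first[offset:]       # second started later: drop the head of first
    elif offset < 0:
        second = second[-offset:]    # first started later: drop the head of second
    length = min(len(first), len(second))
    return first[:length], second[:length]

# The second segment starts 0.5 s after the first, i.e. 2 samples later at 4 Hz.
a, b = align([0, 1, 2, 3, 4, 5], 1.0, [12, 13, 14], 1.5, sample_rate=4)
```

After alignment, sample-by-sample comparison or subtraction of the two segments becomes meaningful.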

[0029] In one embodiment, active audio signal source 103A transmits an audio signal ID (2a), which, when received by the AAU 102, is used at operation 21a to retrieve, at operation 21b, the audio signal corresponding to that audio signal ID from the audio signal database 104. In one non-limiting example, the audio signal may be a sound emitted by the washing machine when the cleaning cycle is complete. This sound has been previously stored at the audio signal database 104, and when the washing machine emits that sound, instead of sending the sound to the audio signal alteration unit 102, the ID associated with that particular sound is transmitted. In one embodiment, the ID of the audio signal is determined automatically using acoustic or audio fingerprinting. In this example, the ID is a fingerprint of the audio signal, generated either at the active audio signal source and transmitted, or by the AAU 102 upon reception of the audio signal from the active audio signal source. The fingerprint is typically a few seconds long and is used to retrieve the audio signal from the audio signal database 104. The audio signal database 104 stores, a priori, fingerprint and signal versions of all audio streams and, on request from the AAU 102, matches the generated audio fingerprint with the stored fingerprints, finally returning the audio signal of the closest match. In another embodiment, the active audio signal source 103B and the passive audio signal source 103C transmit, at operations 2b and 2c respectively, respective audio signals to the AAU 102. These audio signals include the audio stream, and the AAU 102 does not need to access the audio signal database 104 to retrieve the audio signals.
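The two delivery paths described in this paragraph can be sketched as follows in Python. The message layout (a dictionary with "audio_signal", "audio_signal_id" and "timestamp" keys), the function name resolve_secondary_signal and the in-memory dictionary standing in for the audio signal database 104 are all assumptions made for illustration; fingerprint-based closest-match lookup is not modelled here.

```python
def resolve_secondary_signal(message, database):
    """Return (samples, timestamp) for a message from a secondary source.

    `message` is a hypothetical dict holding either "audio_signal" (the
    stream itself) or "audio_signal_id" (a key into the stored signals),
    plus the "timestamp" at which the message was received.
    """
    if "audio_signal" in message:
        # Stream sent directly, as for sources 103B and 103C (2b, 2c).
        samples = message["audio_signal"]
    else:
        # Only an ID was sent, as for source 103A (2a): look the signal up.
        samples = database[message["audio_signal_id"]]
    return samples, message["timestamp"]

# Stand-in for the audio signal database 104.
database = {"wm-end-of-cycle": [0.2, -0.2, 0.2, -0.2]}

by_id = resolve_secondary_signal(
    {"audio_signal_id": "wm-end-of-cycle", "timestamp": 12.5}, database)
direct = resolve_secondary_signal(
    {"audio_signal": [0.1, 0.0], "timestamp": 12.5}, database)
```

Either way, the caller ends up with the samples of the second audio signal and the associated time value, so the later processing does not need to know which path was taken.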

[0030] The secondary audio signal sources 103 may include active or passive audio signal sources. An active audio signal source is a device that produces one or multiple sounds by itself. In addition to normal playback of the digital content, an active audio signal source is also operative to send the same content to the AAU 102 over a communication network, e.g., network 110. For example, a digital TV, active audio signal source 103B, or a connected speaker, not illustrated, are active sound producers as they generate sounds. A washing machine, active audio signal source 103A, can also be considered as an active sound producer as it produces sounds during its operations, e.g., relay clicks, motor noises, end of cycle alarms, etc. One can think about other connected devices of this type, e.g., a connected door reporting lock sound when closed, etc.

[0031] A passive audio signal source is a device operative to record one or more sounds from an entity that is not capable of reporting its sounds. A passive audio signal source, e.g., source 103C, includes one or more microphones to capture sounds produced by another source and transforms them into digital format. The passive audio signal source is an "agent" of the passive entities and sends captured sounds to the AAU 102 on behalf of the passive entities. For example, a passive audio signal source 103C can be a microphone deployed near a window operative to capture and report, to the AAU 102, sounds from the outdoor environment.

[0032] The AAU 102 is operative to receive from the secondary audio signal sources and/or from the configuration unit 106 audio signal metadata, which can be referred to as "metadata", associated with different sounds that may be included in the first audio signal. The audio signal metadata is associated with an audio signal and includes an importance designation indicator (IDI). The IDI indicates an importance of the audio signal with respect to one or more other audio signals in the indoor environment. In order to enhance the primary audio signal of the first audio signal originating from the primary audio input 105, the IDI of the first audio signal indicates that the primary audio input 105 is the source of the most important signal. In contrast, the IDI associated with each one of the secondary audio signal sources 103 indicates that the audio signal received from these sources is of lower importance than the audio signal from audio input 105. In some embodiments, the IDI can take one of two Boolean values, i.e., a first value indicating that the corresponding audio signal is "important" and a second value indicating that the corresponding audio signal is "non-important". In some embodiments, the IDI can take a value from a range of numerical values, e.g., a 0 to N scale, where N=5 and where a value of 0 indicates the least important audio signal and the value N=5 indicates the most important audio signal. In these embodiments, the degree of alteration of a secondary signal can depend on its importance relative to other secondary audio signals.
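A minimal Python sketch of the two IDI forms follows. The comparison works for both Boolean and numerical IDIs. The linear mapping from IDI to a retained gain is purely an assumption used to show how a degree of alteration could follow from the IDI; the text does not specify any particular mapping, and the names is_more_important and secondary_gain are hypothetical.

```python
def is_more_important(first_idi, second_idi):
    """True when the first IDI outranks the second.

    Python's True/False compare like 1/0, so the same test covers both the
    Boolean form and the 0..N numerical form of the IDI.
    """
    return first_idi > second_idi

def secondary_gain(idi, n=5):
    """Hypothetical linear mapping from an IDI on a 0..N scale to a gain.

    IDI 0 gives gain 0.0 (signal fully removed) and IDI N gives gain 1.0
    (signal left untouched). The linear shape is an assumption.
    """
    if not 0 <= idi <= n:
        raise ValueError("IDI outside the 0..N scale")
    return idi / n
```

For example, with N=5 a secondary signal with an IDI of 2 would keep 40% of its amplitude under this assumed mapping, while one with an IDI of 0 would be removed entirely.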

[0033] In some embodiments, an administrator of the audio system 100 can configure, through a communication interface, not shown, each one of the secondary audio signal sources 103 individually with an associated IDI. Additionally or alternatively, the administrator may configure the AAU 102, through the configuration unit 106, to associate each audio signal originating from a given audio source or audio input with an IDI. In these embodiments, the administrator may identify the source of the audio signal and the audio signal with an audio signal descriptor.

[0034] The metadata also includes the audio signal descriptor. The audio signal descriptor includes one or more parameter values that define the audio source and the audio signal. In some embodiments, the audio signal descriptor may include only an identification of the audio source, for example, when the audio source is a source of a single audio signal, such as outdoor noise. In other embodiments, the audio signal descriptor may include an identification of the audio source and an identification of the audio signal. This is the case, for example, when the audio source is a source of more than one sound (e.g., the washing machine may generate different noises, such as the sound of the motor or the alarms, and each sound has a corresponding identifier). In some embodiments, the audio signal descriptor includes a semantic description of the audio signal. The audio signal descriptor can have several forms; for example, a JSON (JavaScript Object Notation) document can be used. Below is a non-limiting example of an audio signal descriptor for the audio signal source 103B (TV) and for the audio signal source 103A (washing machine), respectively:

[0035] {

[0036] "model":"KSP342423"

[0037] "manufacturer": "Panasonic"

[0038] "type":"sound;home_electronics";"television_set";"channel_static"

[0039] "audio_stream_available":true

[0040] }

[0041]

[0042] {

[0043] "model":"KW432423"

[0044] "manufacturer": "Kenwood"

[0045] "type":"sound;home_electronics";"washing_machine";"clothes_wash"

[0046] "audio_stream_available":no

[0047] }
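As printed, the descriptors above are not strictly well-formed JSON (there are no commas between members, the "type" member is a run of separate strings, and one Boolean is written as `no`). The following Python sketch shows how a descriptor of this kind might be parsed once normalized. The normalized layout (a list for "type", `false` for the Boolean), the class `AudioSignalDescriptor`, and the function `parse_descriptor` are assumptions for illustration and are not part of the described system.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSignalDescriptor:
    model: str
    manufacturer: str
    type: List[str]  # semantic path, e.g. category down to specific sound
    audio_stream_available: bool


def parse_descriptor(text: str) -> AudioSignalDescriptor:
    """Parse a normalized (well-formed) JSON descriptor document."""
    raw = json.loads(text)
    return AudioSignalDescriptor(
        model=raw["model"],
        manufacturer=raw["manufacturer"],
        type=list(raw["type"]),
        audio_stream_available=bool(raw["audio_stream_available"]),
    )


# Assumed normalization of the washing machine example from the text.
WASHING_MACHINE = """
{
  "model": "KW432423",
  "manufacturer": "Kenwood",
  "type": ["sound", "home_electronics", "washing_machine", "clothes_wash"],
  "audio_stream_available": false
}
"""

descriptor = parse_descriptor(WASHING_MACHINE)
```

A source whose descriptor reports that no audio stream is available would be a natural candidate for the identifier-based retrieval from the audio signal database 104.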

[0048] The audio system 100 also includes an audio output 107 that is coupled with the AAU 102 for outputting the modified version of the first audio signal. The audio output 107 is included in a device at a location remote from the primary audio input 105. The audio output 107 may include speaker(s) and a Digital-to-Analog Converter.

[0049] Data transfer between the different components of the audio system can be done using wired or wireless networks 110, e.g. 3G, WiFi, etc. The network 110 can be a combination of several networks, e.g., local area networks, wide area networks, cellular networks, etc., coupling the various components. In some embodiments, the AAU 102 is part of a network device 202 that also includes the primary audio input 105. In these embodiments, the AAU 102 and the primary audio input 105 are located at the same location and are either part of the same physical device or coupled through a local communication link. In other embodiments, the AAU 102 can be included within the same network device 204 as the audio output. This ND 204 is remote from the primary audio input 105. In these embodiments, the AAU 102 and the audio output 107 are located at the same location and are either part of the same physical device or coupled through a local communication link. The AAU 102 is operative to receive the audio signals from the primary audio input 105 and the secondary audio signal sources 103, to receive the metadata associated with each audio signal, and to create a modified version of the first audio signal that enhances the primary audio signal, e.g., the voice of the first user, based on the metadata.

[0050] Figure 2A illustrates a transactional diagram of exemplary operations for receiving audio signals at an audio signal alteration unit, in accordance with some embodiments. In some embodiments, the operations described with reference to Figures 2A-B occur following the receipt by the AAU 102 of the metadata associated with the audio signals originating from the secondary audio signal sources 103 and from the primary audio input 105. As described above with reference to Figure 1, the AAU 102 receives, either from each one of the audio sources 103 and the primary audio input 105, from the configuration unit 106, or from a combination of both, metadata associated with each audio signal that is to originate from the audio sources 103 and the primary audio input 105. This step of receiving the metadata can be performed during a configuration operation of the audio system 100 and is performed independently of the operations described below with reference to Figures 2A-B. In other embodiments, the metadata is transmitted to the AAU 102 along with the audio signals when the audio sources and the primary audio input are in operation and actively transmitting the audio signals to the AAU 102. In these embodiments, there is no separate step for transmitting and receiving the metadata.

[0051] Referring back to Figure 2A, the AAU 102 receives, at operation 201, from the primary audio signal input 105 a first audio signal that is associated with an IDI. The IDI indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment. The IDI indicates that the first audio signal is more important than other audio signals that may be received at the AAU 102. For example, when the IDI is one of two Boolean values, the IDI of the first audio signal indicates that it is important. In another example, when the IDI is from a range of values including more than two values, the IDI of the first audio signal has the highest value. The first audio signal includes a primary audio signal that is to be enhanced. While the first audio signal includes the primary audio signal, e.g., the voice of a first user using the audio system 100 for communicating with a second user at the end of the communication network, this primary audio signal is mixed with one or more other secondary audio signals, e.g., sounds from the indoor environment that originate from the secondary audio signal sources 103.

[0052] At operation 203a, the AAU 102 receives, from a secondary audio signal source, e.g., active audio signal source 103A, which can be a washing machine, an input indicative of an audio signal. The audio signal is associated with an IDI indicating the importance of the audio signal with respect to one or more other audio signals in the indoor environment. In some embodiments, the audio source 103A may also transmit additional information, e.g., the metadata and additional information related to the audio signal, to the AAU 102. The audio source 103A may report an amplitude of the audio signal, the IDI, and information about the source of the audio signal. In some embodiments, the audio sources 103 report the audio signal they emit by transmitting an identifier of the audio signal instead of the audio signal. For example, the audio source 103A transmits an audio signal ID to the AAU 102. The IDs uniquely identify the audio signals and enable the AAU 102 to retrieve the actual audio signals from the audio signal database 104.

[0053] Upon receipt of the audio signal ID, the AAU 102 transmits a request for the audio signal to the audio signal database 104. The audio signal database 104 stores audio signals that can be generated by one or more audio sources. For example, the audio signal database 104 can include multiple audio signals that can be generated by the washing machine, e.g., alarm sounds, motor sounds, etc. These audio signals are identified with an audio signal ID, which can be an alphanumerical value such as a fingerprint or other value that uniquely identifies the audio signal. At operation 203c, the audio signal database 104 retrieves the audio signal based on the first audio signal ID. In some embodiments, the database 104 may receive the amplitude of the first audio signal, in addition to the ID, from the AAU 102. At the time the audio signal is retrieved from the database, it can also be transformed, at operation 203d, according to the requested amplitude so that the first audio signal from the audio signal database 104 is, when played back, as close as possible to the actual audio signal being generated at the active audio signal source 103A. In some embodiments, the operation 203d is not performed at the audio signal database 104 but instead at the audio signal alteration unit 102, when the audio signal is received. At operation 203e, the audio signal retrieved from the database, which may or may not have been transformed, is transmitted to the AAU 102.
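The retrieval and transformation of operations 203c and 203d can be sketched in Python as follows. The in-memory dictionary standing in for the audio signal database 104, the signal IDs, and the choice of peak-based rescaling are all assumptions for illustration; the description only states that the signal is transformed according to the requested amplitude.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the audio signal database 104.
# Keys are audio signal IDs; values are normalized sample sequences.
AUDIO_SIGNAL_DB: Dict[str, List[float]] = {
    "wm-motor": [0.0, 0.5, -0.5, 0.25],
    "wm-alarm": [1.0, -1.0, 1.0, -1.0],
}


def retrieve_signal(signal_id: str, amplitude: Optional[float] = None) -> List[float]:
    """Look up a stored signal by ID and optionally rescale its peak.

    If `amplitude` is given, the samples are scaled so that the largest
    absolute sample equals `amplitude` (an assumed transformation).
    """
    samples = list(AUDIO_SIGNAL_DB[signal_id])  # copy, so the store is untouched
    if amplitude is None:
        return samples
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples  # a silent signal cannot be rescaled
    scale = amplitude / peak
    return [s * scale for s in samples]
```

The same function could run either at the database or at the AAU 102, which matches the two placements of operation 203d described above.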

[0054] At operation 205, the AAU 102 receives from the audio source 103B, e.g., a TV, another audio signal. In some embodiments, when the audio source 103B is a TV located in the indoor environment of the first user, the TV continuously sends the sound it is producing in a digital format, digital audio signal, to the AAU 102 in addition to outputting the sound to its speakers. This audio signal is associated with an IDI classifying the sound as less important than the audio signal originating from the audio input 105.

[0055] At operation 207, the AAU 102 receives from the audio source 103C, e.g., a microphone for recording outdoor noise that is located on or near a window, another audio signal. In some embodiments, when the audio source 103C is a microphone for recording outdoor noise that is located on or near a window, the microphone continuously captures and sends the sound of an outdoor environment in a digital format, digital audio signal, to the AAU 102. This audio signal is associated with an IDI classifying the sound as less important than the audio signal originating from the audio input 105. While the embodiments herein describe three secondary audio signals this is intended to be exemplary. Any number of secondary audio signal sources can be coupled with the AAU 102 and consequently any number of secondary audio signals can be received at the AAU 102 without departing from the scope of the present invention.

[0056] Each audio signal received at the AAU 102 may be received at a different time. The difference in the time of receipt between the various audio signals can be caused by the time at which the audio signal is actually produced/recorded by the audio source, as well as by propagation delays caused by the transmission of these audio signals in the network 110 towards the AAU 102. The delays can also be caused by different relative distances and angular shifts between the primary audio input 105 and the secondary audio signal sources, by audio wave reflections, and/or by the varying processing capacity of the electronic devices producing or recording these signals. Therefore, while a sound may be produced/recorded at a given time, the audio signal associated with that sound can be received at the AAU 102 with a varying delay. When the inputs associated with the audio signals, 201, 203a, 205, and 207, are received at the AAU 102, they are associated with a time of receipt.

[0057] Figure 3A illustrates exemplary audio samples received at the audio signal alteration unit, in accordance with some embodiments. The audio signals received at the AAU 102 can be received as a set of successive samples with a determined frequency or alternatively as a single file. In some embodiments, when the audio sources, e.g., audio source 103B-C, and audio input 105, stream an audio signal to the AAU 102, the audio signal is received as segments. Alternatively, when the audio signal is received at the AAU 102 as a result of the retrieval from the database based on the ID, the audio signal can be retrieved as a single file and either the AAU 102 or the database 104 can segment the file into one or more segments. While in Figure 3A only a single sample is illustrated for a given audio source, this is intended to be exemplary only. One of ordinary skill in the art would understand that each audio signal received from an audio source is comprised of multiple successive segments received at the determined frequency. For example, in the illustrated example, the audio samples are at a 44.1 kHz frequency, i.e., 44,100 samples per second.
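Segmenting a retrieved file into fixed-length segments can be sketched as below. The segment duration and the function name `segment` are assumptions for illustration; the description does not state a segment length, only the rate of 44,100 samples per second.

```python
from typing import List

SAMPLE_RATE_HZ = 44_100  # samples per second, as in the illustrated example


def segment(
    samples: List[float],
    segment_ms: int,
    sample_rate: int = SAMPLE_RATE_HZ,
) -> List[List[float]]:
    """Split a sample sequence into consecutive segments of `segment_ms` each.

    The last segment may be shorter if the length is not an exact multiple.
    """
    per_segment = sample_rate * segment_ms // 1000
    if per_segment <= 0:
        raise ValueError("segment_ms is too small for this sample rate")
    return [samples[i:i + per_segment] for i in range(0, len(samples), per_segment)]
```

With an assumed 100 ms segment length, one second of audio at this rate splits into ten segments of 4,410 samples each.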

[0058] Each segment is associated with a time at which the audio signal is received at the AAU 102. As indicated in Figure 3A, each segment 302, 304, 306, and 308 has a different time of receipt: t(audio input), t(outdoor noise), t(washing machine), and t(TV) respectively. The difference between the various times of receipt of the signals can be caused by the network propagation delay, e.g., for audio source 103B-C and audio input 105, and/or by the audio signal identification delay, i.e., the time caused by the receipt of the ID and the retrieval of the audio signal from the database 104, as for audio source 103A. The delays can also be caused by different relative distances and angular shifts between the primary audio input 105 and the secondary audio signal sources, by audio wave reflections, and/or various processing capacity of the electronic devices.

[0059] Figure 2B illustrates a transactional diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments. When the audio signals are received from the secondary audio signal sources 103 and the audio input 105, the AAU 102 is operative to determine, at operation 209, that the first audio signal includes one or more second audio signals. As discussed above, the first audio signal includes a mix of the primary audio signal, e.g., the voice of the first user, and at least one other audio signal representative of a sound produced by another source. Thus, the AAU 102 determines if the first audio signal includes such a secondary audio signal and identifies which one it is, based on the additional audio signals received from the secondary audio signal sources, at operations 203d, 205, and 207.

[0060] The operations of determining that the first audio signal includes one or more other audio signals can be performed according to several embodiments. In one embodiment, the operation 209 includes synchronizing, at operation 217, a first segment of the first audio signal and a second segment of a second audio signal based on first and second timestamps respectively associated with the first segment and the second segment. The first and the second timestamps are indicative of the time the first segment and the second segment are received at the AAU 102.

[0061] In some embodiments, a reference clock and a network transmission delay-aware protocol are used to synchronize the internal clocks of each one of the secondary audio signal sources 103, the primary audio input 105 and the AAU 102. A non-limiting example of a delay-aware protocol that can be used is the Network Time Protocol (NTP). In these embodiments, a reference clock is used to synchronize the internal clocks. When a segment, e.g., segment 302, 304, 306, or 308, is received at the AAU 102, each one is associated with a timestamp header, indicating the time, in the synchronized clock system, that the associated sound is received at the AAU 102.
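A minimal Python sketch of timestamp-based alignment follows. It assumes that every segment carries its time of receipt in the shared clock and that a per-source delay estimate is available; the `Segment` class, the `align` function, and the delay table are hypothetical names introduced for illustration and are not defined in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    source: str          # e.g. "audio_input", "tv"
    received_at: float   # seconds, in the shared (e.g., NTP-synchronized) clock
    samples: List[float]


def align(segments: List[Segment], delay_s: Dict[str, float]) -> Dict[str, float]:
    """Estimate when each segment's sound occurred, relative to the earliest.

    The estimated occurrence time is the time of receipt minus the assumed
    delay for that source; sources missing from `delay_s` get zero delay.
    """
    occurred = {s.source: s.received_at - delay_s.get(s.source, 0.0) for s in segments}
    t0 = min(occurred.values())
    return {source: t - t0 for source, t in occurred.items()}
```

The resulting offsets indicate how far each segment must be shifted so that all segments line up on the time at which the sounds actually occurred.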

[0062] In some embodiments, the approach above may not take into consideration the audio propagation delay from passive sound producers to the AAU 102. However, this propagation delay is quite small when compared with the network propagation delay, given that the speed of sound is 340.9 m/sec at sea level, e.g., for a 10 meter distance the audio propagation delay is approximately 30 ms. In some embodiments, the audio propagation delay can be aggregated in the total delay considered when synchronizing the audio sources with the AAU 102.
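The arithmetic behind the propagation delay figure can be checked with a few lines of Python. The function names are illustrative assumptions; the speed of sound is the value quoted in the paragraph above.

```python
SPEED_OF_SOUND_M_PER_S = 340.9  # value quoted in the text


def acoustic_delay_ms(distance_m: float, speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time for sound to travel `distance_m`, in milliseconds."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed * 1000.0


def total_delay_ms(network_ms: float, distance_m: float) -> float:
    """Aggregate the network delay and the acoustic delay into one figure."""
    return network_ms + acoustic_delay_ms(distance_m)
```

For a 10 meter distance, `acoustic_delay_ms(10.0)` evaluates to about 29.3 ms, which is consistent with the rounded figure given in the text.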

[0063] In other embodiments, in addition to synchronizing the audio signals, a pattern matching mechanism is performed, at operation 219, to identify the various secondary audio signals within the first audio signal. The pattern matching mechanism compares the content of the incoming secondary audio signals, e.g., 203d, 205, and 207, with the first audio signal 201 to determine the exact time at which the secondary audio signal occurs within the first audio signal. The pattern matching mechanism can be performed through various methods without departing from the scope of the current invention. For example, the pattern matching mechanism is performed by creating a spectrogram, which has three dimensions of time, amplitude and frequency, of the first audio signal, e.g., using a short-time Fourier transform, and analyzing occurrences of the secondary audio signal spectrograms in the spectrogram of the first audio signal to determine a match between the two spectrograms in the time dimension.
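As one simple stand-in for the pattern matching mechanism, the Python sketch below locates a secondary signal inside the first audio signal using time-domain cross-correlation. This is not the spectrogram-based approach given as the example above; it is a deliberately simpler technique chosen to keep the sketch dependency-free, and the function name `best_offset` is an assumption.

```python
from typing import Sequence


def best_offset(mixture: Sequence[float], pattern: Sequence[float]) -> int:
    """Return the sample offset at which `pattern` best correlates with `mixture`.

    Uses an unnormalized sliding dot product, so loud regions of the
    mixture can bias the result; a real system would likely normalize.
    """
    if len(pattern) == 0 or len(pattern) > len(mixture):
        raise ValueError("pattern must be non-empty and no longer than mixture")
    best_index, best_score = 0, float("-inf")
    for offset in range(len(mixture) - len(pattern) + 1):
        # zip stops at the end of `pattern`, so only the overlap is scored.
        score = sum(m * p for m, p in zip(mixture[offset:], pattern))
        if score > best_score:
            best_index, best_score = offset, score
    return best_index
```

The returned offset plays the role of the "exact time at which the secondary audio signal occurs within the first audio signal", expressed in samples.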

[0064] Figure 3B illustrates exemplary audio samples synchronized at the audio signal alteration unit, in accordance with some embodiments. Upon completion of the synchronization operation and optionally of the pattern matching operation, the AAU 102 obtains audio signals that are synchronized and has identified when each secondary audio signal, e.g., audio signal 203d, 205, and 207, occurs with reference to the first audio signal 201, as illustrated with reference to Figure 3B. Figure 3B illustrates the segments 302-308 for which the various delays caused by the system have been accounted for and the segments are aligned based on the actual time they have occurred.

[0065] Referring back to Figure 2B, the AAU 102 may now modify, based on the respective IDIs of the various audio signals received, the first audio signal to obtain a modified version of the first audio signal. The modified version of the first audio signal enhances the primary audio signal. In some embodiments, modifying the first audio signal includes determining that the IDI associated with the first audio signal, i.e., signal received from the primary audio input 105, indicates that this audio signal is of higher importance than the secondary audio signals received from the secondary audio signal sources 103. For example, when the first IDI associated with the first audio signal and the secondary IDIs associated with the secondary audio signals are numerical values, determining that the first IDI indicates that the first audio signal is of higher importance than the secondary audio signals includes determining that the first IDI is greater than any of the secondary IDIs.

[0066] Modifying the first audio signal includes performing at least one of cancelling the secondary audio signals from the first audio signal and increasing a volume of the primary audio signal. In some embodiments, the AAU 102 may perform either of the two operations, cancelling the secondary audio signals or increasing the volume of the primary audio signal, or a combination of the two operations. In all cases, this causes the primary audio signal to be enhanced. At operation 213, the modified version of the first audio signal is sent to the output from which it can be heard by the second user. For example, when the primary audio signal is the sound of the voice of the first user, the mechanism discussed above causes this audio signal to be received at the output of the audio system 100 with a better quality and clarity without distracting background sounds/noises.
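The modification step can be sketched in Python as follows. The sketch assumes that the secondary signals are already synchronized, that they appear in the first audio signal at the same amplitude as received, and that plain sample-wise subtraction is an acceptable form of cancelling. These are simplifying assumptions; the description does not specify the cancellation method, and the function name `enhance` is hypothetical.

```python
from typing import List, Sequence, Tuple


def enhance(
    first: Sequence[float],
    first_idi: int,
    secondaries: Sequence[Tuple[Sequence[float], int, int]],
    gain: float = 1.0,
) -> List[float]:
    """Cancel less important secondary signals from `first`, then apply gain.

    Each secondary is (samples, idi, offset), where `offset` is the sample
    index in `first` at which the secondary signal starts.
    """
    out = list(first)
    for samples, idi, offset in secondaries:
        if idi >= first_idi:
            continue  # only cancel signals ranked below the first signal
        for i, s in enumerate(samples):
            j = offset + i
            if 0 <= j < len(out):
                out[j] -= s
    # Clamp to [-1.0, 1.0] so the boosted signal stays in normalized range.
    return [max(-1.0, min(1.0, x * gain)) for x in out]
```

Calling the function with `gain=1.0` performs cancellation only, while passing an empty list of secondaries with a larger gain performs the volume increase only, mirroring the "at least one of" wording above.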

[0067] The operations in the flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.

[0068] Figure 4A illustrates a flow diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.

[0069] At operation 410, a local clock of the network device including the AAU 102 is synchronized with a reference clock. The synchronization causes the local clock of the network device to be in synchronization with a local clock of the secondary audio signal sources 103 and with a local clock of the primary audio input 105. At operation 415, the AAU 102 receives from a primary audio signal input, audio input 105, a first audio signal that is associated with a first importance designation indicator that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment. The first audio signal includes a primary audio signal that is to be enhanced.

[0070] At operation 420, the AAU 102 receives from a secondary audio signal source, e.g., audio source 103 A, 103B, and/or 103C, a second input indicative of a second audio signal. The second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment. In some embodiments, the second input is an identifier of a second audio signal, and the operations move to operation 425 at which the AAU 102 retrieves a second audio signal from a database 104 of stored audio signals based on the identifier of the second audio signal. In other embodiments, the second input is a second audio signal and the operation 425 is skipped.

[0071] At operation 430, the AAU 102 determines that the first audio signal includes the second audio signal. Figure 4B illustrates a flow diagram of exemplary operations for determining that a first audio signal includes a second audio signal, in accordance with some embodiments. In some embodiments, determining that the first audio signal includes the second audio signal includes synchronizing, at operation 460, a first segment of the first audio signal and a second segment of the second audio signal based on first and second timestamps respectively associated with the first segment and the second segment. The first and the second timestamps are indicative of the time the first segment and the second segment are received at the AAU 102. In some embodiments, determining that the first audio signal includes the second audio signal can also include identifying, at operation 465, based on a pattern matching mechanism whether the first audio signal includes the second audio signal.

[0072] At operation 435, the first audio signal is modified based on the first IDI and the second IDI to obtain a modified version of the first audio signal. The modified version of the first audio signal enhances the primary audio signal. Figure 4C illustrates a flow diagram of exemplary operations for modifying, based on importance designation indicators, one or more audio signals, in accordance with some embodiments.

[0073] In some embodiments, modifying the first audio signal includes determining, operation 470, that the IDI associated with the first audio signal, i.e., signal received from the primary audio input 105, indicates that this audio signal is of higher importance than the secondary audio signals received from the secondary audio signal sources 103. For example, when the first IDI associated with the first audio signal and the secondary IDIs associated with the secondary audio signals are numerical values, determining, at operation 472, that the first IDI indicates that the first audio signal is of higher importance than the secondary audio signals includes determining that the first IDI is greater than any of the secondary IDIs.

[0074] Modifying the first audio signal includes performing at least one of cancelling, at operation 475, the secondary audio signals from the first audio signal and increasing, at operation 480, a volume of the primary audio signal. In some embodiments, the AAU 102 may perform either of the two operations, cancelling the secondary audio signals or increasing the volume of the primary audio signal, or a combination of the two operations. In all cases, this causes the primary audio signal to be enhanced. At operation 440, the modified version of the first audio signal is caused to be output to a receiver. For example, when the primary audio signal is the sound of the voice of the first user, the mechanism discussed above causes this audio signal to be received at the output of the audio system 100 with a better quality and clarity without distracting background sounds/noises.

[0075] The various embodiments described herein present clear advantages with respect to prior techniques for enhancing an audio signal in an indoor environment. In contrast to the prior art solutions which faced significant challenges when attempting to distinguish a primary sound that needs to be enhanced from other background sounds/noises in an audio signal, the embodiments herein enable the use of metadata associated with the background sounds, i.e., the secondary audio signals, as well as the receipt of these sounds from the sources creating or recording them in order to modify an audio signal consequently enhancing the primary audio signal. Therefore the embodiments of the present invention enable a superior noise cancellation that improves the experience of the users communicating through the audio system 100.

[0076] Architecture:

[0077] An electronic device stores and transmits, internally and/or with other electronic devices over a network, code, which is composed of software instructions and which is sometimes referred to as computer program code or a computer program, and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media, e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory, and machine-readable transmission media (also called a carrier), e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals. Thus, an electronic device, e.g., a computer, includes hardware and software, such as a set of one or more processors, e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding, coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off, when power is removed, and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory, e.g., dynamic random access memory (DRAM), static random access memory (SRAM), of that electronic device.

[0078] The audio system 100 can be implemented on one or more electronic devices as will be described with reference to the following figure. Figure 5 illustrates a block diagram of an exemplary implementation of audio alteration unit 102 for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments. The physical, i.e., hardware, Audio Device 500 is an electronic device that can perform some or all of the operations and methods described above for one or more of the embodiments. The physical audio device 500 can include one or more I/O interfaces, processor(s) ("processor circuitry") 510, and a memory 505.

[0079] The processor(s) 510 may include one or more data processing circuits, such as a general purpose and/or special purpose processor, e.g., microprocessor and/or digital signal processor. The processor(s) is configured to execute the audio signal alteration unit instance 512. The audio signal alteration unit instance 512, when executed by the processor, is operative to perform the operations described with reference to Figures 1-4C. Although the various modules of Fig. 5 are shown to be included as part of the processor 510, one having ordinary skill in the art will appreciate that the various modules may be stored separately from the processor, for example, in a non-transitory computer readable storage medium. The processor can execute the module stored in the memory, e.g., the audio signal alteration unit code 522, to perform some or all of the operations and methods described above. Accordingly, the processor can be configured by execution of various modules to carry out some or all of the functionality disclosed herein. The audio device 500 further includes an audio signal database 504 stored in the memory 505.

[0080] The audio device 500 also includes a set of one or more physical Input/Output (I/O) interface(s) to establish connections and communication between the different components of the audio device 500 and with external electronic devices. For example, the set of I/O interfaces can include a microphone and an ADC for receiving input audio signals, speakers and a DAC for outputting audio signals to a user, a secondary audio input for receiving other audio signals, and a communication interface (e.g., BLE) for communicating with external electronic devices.

[0081] As described above with reference to Figure 1, in some embodiments, the audio signal alteration unit can be included within the same audio device as the audio input receiving the primary audio signal. For example, audio device 500 may include a microphone and the audio signal alteration unit. In other embodiments, the audio signal alteration unit can be included within the same audio device as the audio output receiving the modified audio signal with the enhanced primary audio signal. For example, audio device 500 may include a speaker and the audio signal alteration unit.

[0082] Each one of the secondary audio signal sources 103 is an electronic device communicatively coupled through the network with the audio device 500.

[0083] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described and can be practiced with modification and alteration within the scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.