Title:
AUTOMOTIVE SOUND RECOGNITION SYSTEM FOR ENHANCED SITUATION AWARENESS
Document Type and Number:
WIPO Patent Application WO/2012/097150
Kind Code:
A1
Abstract:
Sound recognition systems for a vehicle and methods for increasing auditory situation awareness in a vehicle are provided. A sound recognition system includes at least one ambient sound microphone (ASM), at least one vehicle cabin receiver (VCR) and a processor. The ASM is disposed on the vehicle and configured to capture ambient sound external to the vehicle. The VCR is configured to deliver audio content to a vehicle cabin of the vehicle. The processor is coupled to the at least one ASM and the at least one VCR. The processor is configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.

Inventors:
USHER JOHN (GB)
GOLDSTEIN STEVEN W (US)
CASALI JOHN G (US)
Application Number:
PCT/US2012/021077
Publication Date:
July 19, 2012
Filing Date:
January 12, 2012
Assignee:
PERSONICS HOLDINGS INC (US)
USHER JOHN (GB)
GOLDSTEIN STEVEN W (US)
CASALI JOHN G (US)
International Classes:
H04B1/00
Foreign References:
US20070127734A1 (2007-06-07)
US20080240458A1 (2008-10-02)
Attorney, Agent or Firm:
SPADT, Jonathan H. et al. (P.O. Box 980, Valley Forge, PA, US)
Claims:
What is Claimed:

1. A sound recognition system for a vehicle comprising:

at least one ambient sound microphone (ASM), disposed on the vehicle, configured to capture ambient sound external to the vehicle;

at least one vehicle cabin receiver (VCR) configured to deliver audio content to a vehicle cabin of the vehicle; and

a processor, coupled to the at least one ASM and the at least one VCR, the processor configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.

2. The system according to claim 1, wherein the sound signature includes at least one of a non-verbal warning sound or a verbal warning sound.

3. The system according to claim 2, wherein the non-verbal warning sound includes at least one of an alarm, a horn, a siren or a noise.

4. The system according to claim 2, wherein the verbal warning sound includes one or more spoken words associated with a verbal warning.

5. The system according to claim 1, wherein the processor is configured to selectively adjust a volume of the audio content delivered to the vehicle cabin when the at least one sound signature is detected.

6. The system according to claim 1, wherein the processor is configured to reproduce the ambient sound associated with the at least one sound signature within the vehicle cabin.

7. The system according to claim 1, wherein the processor is configured to selectively mix the audio content with the ambient sound when the at least one sound signature is detected.

8. The system according to claim 1, further including a memory configured to store a target sound captured by the at least one ASM for learning the corresponding sound signature.

9. The system according to claim 8, wherein the memory is configured to store the target sound responsive to an indication by a user of the system.

10. The system according to claim 8, wherein the memory is configured to store the target sound automatically by the system.

11. The system according to claim 1, further including an audio interface coupled to the processor to receive the audio content from at least one of a media player or a mobile phone.

12. The system according to claim 1, wherein the system is configured to transmit one or more of the ambient sound and the at least one sound signature to a remote location.

13. A method for increasing auditory situation awareness in a vehicle, the method comprising the steps of:

capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle;

monitoring the ambient sound for a target sound by detecting a sound signature corresponding to the target sound in the ambient sound; and

adjusting a delivery of audio content by at least one vehicle cabin receiver (VCR) to a vehicle cabin of the vehicle based on the target sound.

14. The method according to claim 13, wherein the adjusting of the delivery of the audio content includes mixing the target sound with the audio content for delivery to the vehicle cabin.

15. The method according to claim 14, wherein the target sound is mixed with the audio content in accordance with a priority of the target sound.

16. The method according to claim 13, wherein the adjusting of the delivery of the audio content includes at least one of:

passing the target sound to the at least one VCR,

amplifying the target sound for delivery to the vehicle cabin,

attenuating the target sound for delivery to the vehicle cabin,

generating an audible message based on the target sound for delivery to the vehicle cabin, or

replacing the target sound with a predetermined sound corresponding to the target sound for delivery to the vehicle cabin.

17. The method according to claim 13, the method further including: detecting at least one of a direction of a sound source or a speed of the sound source generating the target sound from the sound signature; and

indicating the at least one of the direction or the speed of the sound source in the vehicle cabin.

18. The method according to claim 13, the method further including transmitting a warning notification to other devices.

19. The method of claim 13, wherein the target sound includes at least one of an alarm, a horn, a siren, a spoken utterance or a noise.

20. The method according to claim 13, wherein the detecting of the sound signature includes detecting a spoken utterance in the ambient sound associated with a verbal warning, the method further including: indicating the verbal warning in the vehicle cabin.

21. The method according to claim 13, the method further including: acquiring a current location of the vehicle;

associating the current location with a sound signature; and updating a sound signature library containing a plurality of predetermined target sounds with the sound signature associated with the current location.

22. A method for sound signature detection for a vehicle, the method comprising:

capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; and

receiving a directive to learn a sound signature within the ambient sound, wherein a voice command or an indication from a user is received and is used to initiate the steps of capturing and learning.

23. The method according to claim 22, further including saving the sound signature at least one of locally on the vehicle or remotely to a server.

24. The method according to claim 22, further including adapting a previously learned warning sound model using the sound signature within the ambient sound.

25. A method for personalized listening in a vehicle, the method comprising:

capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle;

detecting a sound signature within the ambient sound that is associated with a warning sound; and

mixing the warning sound with audio content delivered to the vehicle cabin via at least one vehicle cabin receiver (VCR) in accordance with a priority of the warning sound and a personalized hearing level (PHL).

26. The method according to claim 25, wherein the detecting of the sound signature includes:

retrieving learned models from a database;

comparing the sound signature to the learned models; and identifying the warning sound from the learned models responsive to the comparison.

27. The method according to claim 25, further including enhancing auditory cues in the warning sound relative to the audio content based on a spectrum of the ambient sound captured at the at least one ASM.

Description:
AUTOMOTIVE SOUND RECOGNITION SYSTEM FOR ENHANCED SITUATION AWARENESS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is related to and claims the benefit of U.S. Provisional Application No. 61/432,016, entitled "AUTOMOTIVE SOUND RECOGNITION SYSTEM FOR ENHANCED SITUATION AWARENESS," filed on January 12, 2011, the contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

[0002] The present invention relates to a device that monitors sound directed to a vehicle cabin, and more particularly, though not exclusively, to an audio system and method that detects ambient warning sounds and adjusts audio delivered to a vehicle cabin based on the detected warning sounds to enhance auditory situation awareness.

BACKGROUND OF THE INVENTION

[0003] People that use audio systems in vehicles generally do so for either music enjoyment or voice communication. The user is generally immersed in the audio experience when using such devices and is acoustically isolated within a sealed vehicle cabin. Background noises in the external vehicle environment (e.g., road, engine, wind and traffic noise) can contend with the acoustic sounds produced from these devices. As the background noise levels change, the user may need to adjust the volume to listen to their music over the background noise. Alternatively, the level of reproduced audio may be automatically increased, for example, by audio systems that increase the audio level as the vehicle velocity increases (i.e., to compensate for the rise in noise level from road, engine and aerodynamic noise). One example of such an automatic gain control system is described in US patent No. 5,081,682.

SUMMARY OF THE INVENTION

[0004] Aspects of the present invention relate to a sound recognition system for a vehicle. A sound recognition system includes at least one ambient sound microphone (ASM), at least one vehicle cabin receiver (VCR) and a processor. The ASM is disposed on the vehicle and configured to capture ambient sound external to the vehicle. The VCR is configured to deliver audio content to a vehicle cabin of the vehicle. The processor is coupled to the at least one ASM and the at least one VCR. The processor is configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.

[0005] Aspects of the present invention also relate to methods for increasing auditory situation awareness in a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; monitoring the ambient sound for a target sound by detecting a sound signature corresponding to the target sound in the ambient sound; and adjusting a delivery of audio content by at least one vehicle cabin receiver (VCR) to a vehicle cabin of the vehicle based on the target sound.

[0006] Aspects of the present invention further relate to methods for sound signature detection for a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; and receiving a directive to learn a sound signature within the ambient sound. A voice command or an indication from a user is received and is used to initiate the steps of capturing and learning.

[0007] Aspects of the present invention also relate to methods for personalized listening in a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; detecting a sound signature within the ambient sound that is associated with a warning sound; and mixing the warning sound with audio content delivered to the vehicle cabin via at least one vehicle cabin receiver (VCR) in accordance with a priority of the warning sound and a personalized hearing level (PHL).

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The invention may be understood from the following detailed description when read in connection with the accompanying drawing. It is emphasized, according to common practice, that various features of the drawings may not be drawn to scale. On the contrary, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. Moreover, in the drawing, common numerical references are used to represent like features. Included in the drawing are the following figures:

[0009] FIG. 1 is a pictorial diagram of a vehicle including an exemplary automotive sound recognition system for enhanced situation awareness in accordance with an embodiment of the present invention;

[0010] FIG. 2 is a block diagram of the system shown in FIG. 1 in accordance with an exemplary embodiment of the present invention;

[0011] FIG. 3 is a flowchart of an exemplary method for ambient sound monitoring and warning detection in accordance with an embodiment of the present invention;

[0012] FIG. 4 illustrates various system modes in accordance with an exemplary embodiment of the present invention;

[0013] FIG. 5 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;

[0014] FIG. 6 is a flowchart of an exemplary method for managing audio delivery based on detected sound signatures in accordance with an embodiment of the present invention;

[0015] FIG. 7 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;

[0016] FIG. 8 is a pictorial diagram for mixing ambient sounds and warning sounds with audio content in accordance with an exemplary embodiment of the present invention; and

[0017] FIG. 9 is a flowchart of an exemplary method for updating the sound signature detection library dependent on the vehicle location in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0018] The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

[0019] Processes, techniques, apparatus, and materials as known by one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the enabling description where appropriate, for example, the fabrication and use of transducers. Additionally, in at least one exemplary embodiment, the sampling rate of the transducers can be varied to pick up pulses of sound, for example pulses shorter than 50 milliseconds.

[0020] In all of the examples illustrated and discussed herein, any specific values, for example the sound pressure level change, should be interpreted to be illustrative only and non-limiting. Thus, other examples of the exemplary embodiments could have different values.

[0021] Note that herein when referring to correcting or preventing an error or damage (e.g., hearing damage), a reduction of the damage or error and/or a correction of the damage or error are intended.

[0022] Automotive vehicle operators are often auditorially removed from their external ambient environment. Ambient sound cues, such as oncoming emergency (and non-emergency) vehicle sound alerts, are often not heard by the vehicle operator due to acoustic isolation of the vehicle cabin and internal cabin noise from engine and road noise and, especially, due to loud music and speech reproduction levels in the vehicle cabin.

[0023] Accordingly, background noises in the external vehicle environment contend with the acoustic sounds produced from the vehicle audio system. The vehicle operator thus becomes auditorially disassociated from their ambient environment, thereby increasing the danger of accidents from collisions with oncoming vehicles. A need therefore exists for improving the auditory situation awareness of vehicle operators by automatically alerting the operator to ambient warning sounds.

[0024] Music and speech audio reproduction levels in vehicles and ambient sound levels are antagonistic. For example, vehicle operators typically play vehicle audio devices louder to hear over the traffic and general urban noise. The same applies to voice communication.

[0025] Automotive vehicle operators are often auditorially removed from their external ambient environment. For example, high sound isolation from the external environment may be provided by cabin structural insulation, close-fitting window seals and thick or double-paned glass. Ambient sound cues (from external acoustic signals), such as oncoming emergency (and non-emergency) vehicle sound alerts; vocal messages from pedestrians; and sounds generated by the operator's own vehicle, may often not be heard by the vehicle operator.

[0026] To summarize, the reduced "situation awareness" of the vehicle operator may be a consequence of at least two principal factors. One factor includes acoustic isolation of the vehicle cabin (e.g., from the vehicle windows and structural isolation). A second factor includes sound masking. The sound masking may include masking from internal cabin noise (such as from engine and road noise) and masking from loud music reproduction levels within the vehicle. The masking effect may be further compounded with telephone communications, where the vehicle operator's attention may be further distracted by the conversation. Telephone conversation, thus, may introduce an additional cognitive load that may further reduce the vehicle operator's auditory situation awareness of the vehicle surroundings.

[0027] The reduction of the situation awareness of the vehicle operator may lead to danger. For example, the personal safety of the vehicle operator may be reduced. In addition, the personal safety of other vehicle operators and pedestrians in the vicinity of the vehicle may also be threatened.

[0028] One definition of situation awareness includes "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future". While some definitions are specific to the environment from which they were adapted, the above definition may be applicable across multiple task domains from visual to auditory modalities.

[0029] A method and system is herein disclosed to address this problem of reduced auditory situation awareness of vehicle operators. In an exemplary embodiment, ambient warning sounds in the vicinity of a vehicle may be automatically detected and may be actively reproduced in the vehicle cabin, to inform the vehicle operator of detected sounds. According to an exemplary embodiment, a library of known warning sounds may be acquired automatically based on the vehicle's location.

[0030] Personal safety of the vehicle operator and passengers in the vehicle may thereby be enhanced by exemplary systems and methods of the present invention, as described herein. Accordingly, the safety of other vehicles (such as oncoming emergency vehicles, other motorists, and pedestrians) may also be increased. The safety benefit comes not only from the enhanced auditory situation awareness, but also via reduced driver workload. For example, the system may reduce the burden on the driver to constantly visually scan the environment for emergency vehicles or other dangers that may also have recognizable acoustical signatures (that may ordinarily be inaudible inside the vehicle cabin).

[0031] One focus of the present invention is to enhance (i.e., increase) the auditory situation awareness of a typical vehicle operator and, thereby, improve the personal safety of the vehicle operator, and other motorists and pedestrians.

[0032] Referring to FIG. 1, a pictorial diagram of vehicle 102 including an exemplary automotive sound recognition system (ASRS) 100 for enhanced auditory situation awareness is shown. ASRS 100 may include user interface 106, central audio processor system 114 (also referred to herein as processor 114), indicator 116 and at least one loudspeaker (for example, right loudspeaker 112 and left loudspeaker 120) (also referred to herein as vehicle cabin receivers (VCRs) 112, 120). ASRS 100 may also include one or more ambient microphones (for example, right microphone 104, front microphone 108, rear microphone 110 and left microphone 122) for capturing ambient sound external to vehicle 102. ASRS 100 may also include at least one vehicle cabin microphone (VCM) 118 for capturing sound within vehicle cabin 126.

[0033] Processor 114 may be coupled to one or more of user interface 106, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122. Processor 114 may be configured to control acquisition of ambient sound signals from ambient microphones 104, 108, 110, 122 and (optionally) a cabin sound signal from VCM 118. Processor 114 may be configured to analyze ambient and/or cabin sound signals, and to present information by system 100 to vehicle operator 124 (such as via VCRs 112, 120 and/or indicator 116) responsive to the analysis.

[0034] In operation, processor 114 may be configured to receive AC signal 107 and reproduce AC signal 107 through VCRs 112, 120 into vehicle cabin 126. Processor 114 may also be configured to receive ambient sound signals from respective ambient microphones 104, 108, 110, 122. Processor 114 may also be configured to receive a cabin sound signal from VCM 118.

[0035] Based on an analysis of the ambient sound signals (and, optionally, the cabin sound signal), processor 114 may mix the ambient sound signal from at least one of ambient microphones 104, 108, 110, 122 with AC signal 107. The mixed signal may be output to VCRs 112, 120. Accordingly, acoustic cues in the ambient signal (such as an ambulance siren, a vocal warning from a pedestrian, or a vehicle malfunction sound) may be passed into vehicle cabin 126, thereby providing detectable and spatial localization cues for vehicle operator 124.

[0036] AC signal 107 may include any audio signal provided to (and/or generated by) processor 114 that may be reproduced through VCRs 112, 120. AC signal 107 may correspond to (without being limited to) at least one of the following exemplary signals: a music or voice audio signal from a music audio source (for example, a radio, a portable media player, a computing device); voice audio (for example, from a telephone, a radio device or an occupant of vehicle 102); or an audio warning signal automatically generated by vehicle 102 (for example, in response to a backup proximity sensor, an unbelted passenger restraint, an engine malfunction condition, or other audio alert signals). AC signal 107 may be manually selected by vehicle operator 124 (for example, with user interface 106), or may be automatically generated by vehicle 102 (for example, by processor 114).

[0037] Although in FIG. 1, two loudspeakers 112, 120 are illustrated, ASRS 100 may include more or fewer loudspeakers. For example, ASRS 100 may have more than two loudspeakers for right, left, front and back balance of sound in vehicle cabin 126. As another example, ASRS 100 may include five loudspeakers (and a subwoofer) for 5.1 channel surround sound. It is understood that, in general, ASRS 100 may include one or more loudspeakers.

[0038] User interface 106 may include any suitable user interface capable of providing parameters for one or more of processor 114, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122. User interface 106 may include, for example, one or more buttons, a pointing device, a keyboard and/or a display device.

[0039] Processor 114 may also issue alerts to vehicle operator 124, for example, via indicator 116. Indicator 116 may provide alerts via a visual indication, an auditory indication (such as a tonal alert) and/or a haptic indication. Indicator 116 may include any suitable indicator such as (without being limited to): a display (such as a heads-up display), a loudspeaker or a haptic transducer (for example, mounted in the vehicle's steering wheel or operator seat).

[0040] In an exemplary embodiment, processor 114 may also use ambient microphones 104, 108, 110, 122 and/or VCM 118 and VCRs 112, 120 to cancel a background noise component (such as road noise) in vehicle cabin 126. For example, the noise cancellation may be centered at the position of vehicle operator 124.

[0041] Ambient microphones 104, 108, 110, 122 may be positioned on vehicle 102 (for example, on an exterior of vehicle 102 or any other suitable location) such that ambient microphones 104, 108, 110, 122 may transduce sound that is external to vehicle 102. In general, ambient microphones 104, 108, 110, 122 may be configured to detect specific sounds in a vicinity of vehicle 102. Although four ambient microphones 104, 108, 110, 122 are illustrated in the positions (i.e., front, right, left and rear of vehicle 102) shown in FIG. 1, in general, system 100 may include any number of microphones and at least one ambient sound microphone. An ambient sound signal (from one or more of ambient microphones 104, 108, 110, 122) may also be mixed with AC signal 107 before being presented through at least one cabin loudspeaker 112, 120.

[0042] According to an exemplary embodiment, processor 114 may determine a sound pressure level (SPL) of vehicle cabin 126 (referred to herein as the cabin SPL) by analyzing a signal level and signal gain reproduced with at least one of loudspeakers 112, 120, and the sensitivity of respective loudspeakers 112, 120. In another exemplary embodiment, processor 114 may determine the cabin SPL via VCM 118. Use of VCM 118 may allow consideration of other sound sources in vehicle cabin 126 (i.e., other than sound sources contributed by loudspeakers 112, 120), such as an air conditioning system, and sound from other passengers in vehicle 102.

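As a rough illustration of the first approach, the cabin SPL can be approximated from the drive signal and the loudspeaker's rated sensitivity. The sketch below is only illustrative; the function names, the 4-ohm load, and the neglect of cabin gain and listening distance are assumptions, not details from this disclosure.

```python
import math

def estimate_cabin_spl(signal_rms_volts, gain, sensitivity_db_1w_1m,
                       impedance_ohms=4.0):
    """Rough cabin SPL from the drive level and the loudspeaker's rated
    sensitivity (dB SPL at 1 W / 1 m); cabin gain and listening distance
    are ignored in this sketch."""
    power_watts = (signal_rms_volts * gain) ** 2 / impedance_ohms
    return sensitivity_db_1w_1m + 10.0 * math.log10(max(power_watts, 1e-12))
```
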
[0043] ASRS 100 may be coupled to a remote location (not shown), for example, by wireless communication. Information collected by ASRS 100 may be provided to the remote location (such as for further analysis).

[0044] Referring to FIG. 2, a block diagram of ASRS 100 in accordance with an exemplary embodiment is shown. As illustrated, the ASRS 100 can include processor 114 operatively coupled to the Ambient Sound Microphone (ASM) 201, one or more VCRs 112, 120, and VCM 118 via one or more Analog to Digital Converters (ADC) 202 and Digital to Analog Converters (DAC) 203. In FIG. 2, ASM 201 may represent one or more of ambient microphones 104, 108, 110, 122 shown in FIG. 1. ASRS 100 can also include an audio interface 212, operatively coupled to the processor 114, that receives AC signal 107 (for example, from a media player, a cell phone, or voice mail) and delivers it to the processor 114.

[0045] The processor 114 may include sound signature detection block 214 and may monitor the ambient sound captured by the ASM 201 for warning sounds in the environment, such as an alarm (e.g., bell, emergency vehicle, security system, etc.), siren (e.g., police car, ambulance, etc.), voice (e.g., "help", "stop", "police", etc.), or specific noise type (e.g., breaking glass, gunshot, etc.). The memory 208 can store sound signatures for previously learned warning sounds to which the processor 114 refers for detecting warning sounds. The sound signatures can be resident in the memory 208 or downloaded to processor 114 via the transceiver 204 during operation as needed. Upon detecting a warning sound, the processor 114 can report the warning to the vehicle operator 124 (also referred to herein as user 124) via audio delivered from the VCRs 112, 120 to the vehicle cabin.

[0046] The processor 114, responsive to detecting warning sounds, can adjust the audio content signal 107 and the warning sounds delivered to the vehicle cabin 126. The processor 114 can actively monitor the sound exposure level inside the vehicle cabin 126 and adjust the audio to within a safe and subjectively optimized listening level range. The processor 114 can utilize computing technologies such as a microprocessor, Application Specific Integrated Chip (ASIC), and/or a digital signal processor (DSP) with associated storage memory 208 such as Flash, ROM, RAM, SRAM, DRAM or other like technologies for controlling operations.

[0047] The ASRS 100 can further include a transceiver 204 that can support singly or in combination any number of wireless access technologies including without limitation Bluetooth™, Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), and/or other short or long range communication protocols. The transceiver 204 can also provide support for dynamic downloading over-the-air to the ASRS 100. It should be noted also that next generation access technologies can also be applied to the present disclosure.

[0048] The power supply 210 can utilize common power management technologies such as replaceable batteries, supply regulation technologies, and charging system technologies for supplying energy to the components of the ASRS 100 and to facilitate portable applications. A motor (not shown) can be a single supply motor driver coupled to the power supply 210 to improve sensory input via haptic vibration (for example, via indicator 116 (FIG. 1) configured as a haptic indicator), e.g. connected to the vehicle steering wheel or vehicle operator chair. As an example, the processor 114 can direct the motor to vibrate or pulse responsive to an action, such as a detection of a warning sound or an incoming voice call.

[0049] FIG. 3 is a flowchart of a method 300 for vehicle ambient sound monitoring and warning detection in accordance with an exemplary embodiment. The method 300 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 300, reference will be made to components of FIG. 2, although it is understood that the method 300 can be implemented in any other manner using other suitable components.

[0050] As shown in step 302, the processor 114 can monitor the environment for warning sounds, such as an alarm, a horn, a voice, or a noise. Each of the warning sounds can have certain identifiable features that characterize the sound. The features can be collectively referred to as a sound signature which can be used for recognizing the warning sound. As an example, the sound signature may include statistical properties or parametric properties of the warning sound. For example, a sound signature can describe prominent frequencies with associated amplitude and phase information. As another example, the sound signature can contain principal components identifying the most likely recognizable features of a warning sound.

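As an illustration of such a signature, the sketch below reduces a sound frame to its few most prominent frequencies with their amplitudes and phases. All names are hypothetical; the disclosure does not prescribe a particular implementation.

```python
import numpy as np

def spectral_signature(frame, sample_rate, k=8):
    """Reduce a frame to its k most prominent frequencies, each with
    amplitude and phase, as a simple sound signature."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    amps = np.abs(spectrum)
    top = np.argsort(amps)[-k:][::-1]   # indices of the k strongest bins
    return [(freqs[i], amps[i], np.angle(spectrum[i])) for i in top]
```
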
[0051] At step 304, the processor 114 may detect the warning sounds within the environment based on the sound signatures. As described below, feature extraction techniques may be applied to the ambient sound captured at the ASM 201 to generate the sound signatures. Pattern recognition approaches may be applied based on known sound signatures to detect the warning sounds from their corresponding sound signatures. More specifically, sound signatures may be compared to learned models to identify a corresponding warning sound.

[0052] At step 306, the processor 114 may adjust sound delivered to the vehicle cabin 126 in view of a detected warning sound. Upon detecting a warning sound in the ambient sound of the user's environment, the processor 114, at step 308, may generate an audible alarm within the vehicle cabin 126 that identifies the detected sound signature.

[0053] The audible alarm can be a reproduction of the warning sound, an amplification of the warning sound (or the entire ambient sound), a text-to-speech message (e.g., synthetic voice) identifying the warning sound, a haptic vibration via indicator 116 (configured as a haptic indicator), or an audio clip. For example, the processor 114 can generate a sound bite (i.e., audio clip) corresponding to the detected warning sound such as an ambulance, fire engine, or other environmental sound. As another example, the processor 114 can synthesize a voice to describe the detected warning sound (e.g., "ambulance approaching"). At step 310, processor 114 may send a message to a mobile device identifying the detected sound signature (e.g., "alarm sounding").

[0054] FIG. 4 illustrates system modes of ASRS 100 in accordance with an exemplary embodiment. The system mode may be manually selected by user 124, for example, by pressing a button; or automatically selected, for example, when the processor 114 detects it is in an active listen state or in a media state. As shown in FIG. 4, the system mode can correspond to Signature Sound Pass Through Mode (SSPTM), Signature Sound Boost Mode (SSBM), Signature Sound Rejection Mode (SSRJM), Signature Sound Attenuation Mode (SSAM), and Signature Sound Replacement Mode (SSRM).

[0055] In SSPTM mode, ambient sound captured at the ASM 201 is passed transparently to the VCRs 112, 120 for reproduction within the vehicle cabin 126. In this mode, the sound produced in the vehicle cabin 126 sufficiently matches the ambient sound outside the vehicle cabin 126, thereby providing a "transparency" effect. That is, the loudspeakers 112, 120 in the vehicle cabin 126 recreate the sound captured at the ASM 201. The processor 114, by way of sound measured at the VCM 118, may adjust the properties of sound delivered to the vehicle cabin 126 so the sound within the occluded vehicle cabin 126 is the same as the ambient sound outside the vehicle 102.

[0056] In SSBM mode, warning sounds and/or ambient sounds are amplified upon the processor 114 detecting a warning sound. The warning sound can be amplified relative to the normal level received, or amplified above an audio content level if audio content is being delivered to the vehicle cabin 126.

[0057] In SSRJM mode, sounds other than warning sounds may be rejected upon the processor 114 detecting a specific sound signature. The specific sound can be minimized relative to the normal level received. In SSAM mode, sounds other than warning sounds can be attenuated. For example, in both SSRJM and SSAM modes, annoying sounds or noises not associated with warning sounds can be suppressed. For instance, by way of a learning session, the user 124 can establish which sounds are considered warning sounds (e.g., "ambulance") and which sounds are considered non-warning sounds (e.g., "jackhammer"). The processor 114, upon detecting non-warning sounds, can thus attenuate or reject these sounds within the vehicle cabin 126.

[0058] In SSRM mode, warning sounds detected in the environment can be replaced with audible warning messages. For example, the processor 114, upon detecting a warning sound, can generate synthetic speech identifying the warning sound (e.g., "ambulance detected"). In such regard, the processor 114 may audibly report the warning sound identified, thereby relieving the user 124 from having to interpret the warning sound. The synthetic speech can be mixed with the ambient sound (e.g., amplified, attenuated, cropped, etc.), or played alone with the ambient sound muted.

[0059] FIG. 5 is a flowchart of a method 500 for sound signature detection in accordance with an exemplary embodiment. The method 500 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 500, reference will be made to components of FIG. 2, although it is understood that the method 500 can be implemented in any other manner using other suitable components.

[0060] The method can start at step 502, in which the processor 114 can enter a learn mode. Notably, the processor 114, upon completion of a learning mode or previous learning configuration, can start instead at step 520. In the learning mode of step 502, the processor 114 can actively generate and learn sound signatures from ambient sounds within the environment. In learning mode, the processor 114 can also receive previously trained learning models to use for detecting warning sounds in the environment.

[0061] In an active learning mode, the user 124 can press a button or otherwise (e.g., voice recognition) initiate a recording of ambient sounds in the environment. For example, the user can, upon hearing a new warning sound in the environment ("car horn"), activate the processor 114 to learn the new warning sound. Upon generating a sound signature for the new warning sound, it can be stored in the user defined database 504. In another arrangement, the processor 114, upon detecting a unique sound characteristic of a warning sound, can ask the user 124 if they desire to have the sound signature for the unique sound learned. In such regard, the processor 114 may actively sense sounds and may query the user 124 about the environment to learn the sounds. Moreover, the processor 114 can organize learned sounds based on environmental context, for example, in city and country environments.

[0062] In an exemplary embodiment, ASRS 100 (FIG. 2) may provide for delayed recording, to allow a previously encountered sound to be learned. For example, ASRS 100 may include a buffer to store ambient sounds recorded for a period of time. User 124 may review the recorded ambient sounds and select one or more of these recorded ambient sounds for learning (such as via user interface 106 (FIG. 1)).

[0063] In another learning mode, trained models can be retrieved from an on-line database 506 for use in detecting warning sounds. The previously learned models can be transmitted on a scheduled basis to the processor 114, or as needed, depending on the environmental context. For example, the processor 114, upon detecting traffic noise, may retrieve sound signature models associated with warning sounds (e.g., ambulance, police car) in traffic. In another embodiment, upon the processor 114 detecting conversational noise (e.g., people talking), sound signature models for verbal warnings ("help", "police") may be retrieved. Groups of sound signature models may be retrieved based on the environmental context or on user directed action.

[0064] As shown in step 508, the ASRS processor 114 can also generate speech recognition models for warning sounds corresponding to voice, such as "help", "police", "fire", etc. The speech recognition models may be retrieved from the on-line database 506 or the user defined database 504. In the latter, for example, the user 124 may say a word or enter a text version of a word to associate the word with a verbal warning sound. For instance, the user 124 may define a set of words of interest along with mappings to their meanings, and then use keyword spotting to detect their occurrences. If the user 124 enters an environment wherein another individual says the same word (e.g., "help"), the processor 114 may inform the user 124 of the verbal warning sound.

[0065] For other acoustic sounds, the processor 114 may generate sound signature models as shown in step 510. Notably, the processor 114 itself may generate the sound signature models, or transmit the captured warning sounds to external systems (e.g., a remote server) that generate the sound signature models. Such learning may be conducted off-line in a training phase, and the processor 114 can be uploaded with the new learning models.

[0066] It should also be noted that the learning models can be updated during use of the ASRS 100, for example, when the processor 114 detects warning sounds. The detected warning sounds can be used to adapt the learning models as new warning sound variants are encountered. For example, the processor 114, upon detecting a warning sound, can use the sound signature of the warning sound to update the learned models in accordance with the training phase. In such an embodiment, a first learned model is adapted based on new training data collected in the environment by the processor 114. In such regard, for example, a new set of "horn" warning sounds could be included in real-time training without discarding the other "horn" sounds already captured in the existing model.

[0067] Upon completion of learning, uploading, or retrieval of sound signature models, the processor 114 can monitor and report warning sounds within the environment. As shown in step 520, ambient sounds (e.g., an input signal) within the environment are captured by the ASM 201. The ambient sounds can be digitized by way of the ADC 202 and stored temporarily to a data buffer in memory 208 as shown in step 522. The data buffer may be capable of holding enough data to allow for generation of a sound signature as will be described ahead in FIG. 7.

[0068] In another configuration, the processor 114 can implement a "look ahead" analysis system by way of the data buffer for reproduction of pre-recorded audio content, using a data buffer to offset the reproduction of the audio signal. The look-ahead system allows the processor 114 to analyze potentially harmful audio artifacts (e.g., high-level onsets, bursts, etc.), either received from an external media device or detected with the ambient microphones 201, in-situ before they are reproduced. The processor 114 can thus mitigate the audio artifacts in advance to reduce timbral distortion effects caused by, for instance, attenuating high-level transients.

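A minimal sketch of such a look-ahead stage, assuming fixed-size frames and a simple peak limit (the class name, frame count, and limit value are hypothetical):

```python
import collections
import numpy as np

class LookAheadLimiter:
    """Delay playback by a few frames so high-level transients can be
    attenuated before they reach the vehicle cabin receivers."""
    def __init__(self, lookahead_frames=3, limit=0.9):
        self.queue = collections.deque()
        self.lookahead = lookahead_frames
        self.limit = limit

    def process(self, frame):
        self.queue.append(np.asarray(frame, dtype=float))
        if len(self.queue) <= self.lookahead:
            return np.zeros_like(self.queue[-1])   # delay line still filling
        out = self.queue.popleft()
        peak = max(np.max(np.abs(f)) for f in self.queue)  # peek at future frames
        if peak > self.limit:                              # upcoming transient
            out = out * (self.limit / peak)                # pre-attenuate smoothly
        return out
```
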
[0069] At step 524, signal conditioning techniques may be applied to the ambient sound, for example, to suppress noise or gate the noise to a predetermined threshold. Other signal processing steps such as threshold detection shown in step 526 may be used to determine whether ambient sounds should be evaluated for warning sounds. For instance, to conserve computational processing resources (e.g., battery, processor), only ambient sounds that exceed a predetermined power level may be evaluated for warning sounds. Other metrics such as signal spectrum, duration, and stationarity may be considered in determining whether the ambient sound is analyzed for warning sounds. Notably, other metrics (e.g., context aware) may also be used to determine when the ambient sound should be processed for warning sound detection.

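A minimal sketch of the power gate described above; the threshold value is hypothetical, and a real system would also weigh spectrum, duration, and stationarity:

```python
import numpy as np

def should_analyze(frame, threshold_db=-40.0):
    """Gate the recognizer: only frames whose mean power exceeds a
    threshold are passed on for sound signature analysis, conserving
    battery and processor resources."""
    power_db = 10.0 * np.log10(np.mean(np.square(frame)) + 1e-12)
    return power_db > threshold_db
```
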
[0070] If at least one property (e.g., power, spectral shape, duration, etc.) of the ambient sound exceeds a threshold (or adaptive threshold), the processor 114 at step 530 can proceed to generate a sound signature for the ambient sound. In one embodiment, the sound signature is a feature vector which can include statistical parameters or salient features of the ambient sound.

[0071] An ambient sound with a warning sound (e.g., "bell", "siren"), such as shown in step 532, is generally expected to exhibit features similar to sound signatures for similar warning sounds (e.g., "bell", "siren") stored in the user defined database 504 or the on-line database 506. The processor 114 can also identify a direction and speed of the sound source if it is moving, for example, by evaluating Doppler shift as shown in steps 534 and 536. The processor 114, by way of beam-forming among multiple ASMs 201, may also estimate a direction of a sound source generating the warning sound.

[0072] The speed and bearing of the sound source can also be estimated using pitch analysis to detect changes predicted by Doppler effect, or alternatively by an analysis in changes in relative phase and magnitude between the two ASM signals. The processor 114, by way of a sound recognition engine, may detect general warning signals such as car horns or emergency sirens (and other signals referenced by ISO 7731) using spectral and temporal analysis.

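To make the Doppler relationship concrete: for a moving source, the pitch heard while it approaches differs from the pitch heard while it recedes, and the true emitted frequency cancels out of their combination. The sketch below (names hypothetical) recovers the source speed from the two observed pitches.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def source_speed_from_doppler(f_approaching_hz, f_receding_hz):
    """Speed of a moving siren from the pitch heard while it approaches
    versus recedes: v = c * (fa - fr) / (fa + fr)."""
    return SPEED_OF_SOUND * (f_approaching_hz - f_receding_hz) / (
        f_approaching_hz + f_receding_hz)

# Example: a siren heard at 450 Hz approaching and 410 Hz receding is
# moving at roughly 343 * 40 / 860, i.e. about 16 m/s (57 km/h).
```
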
[0073] The processor 114 can also analyze the ambient sound to determine if a verbal warning (e.g., "help", "police", "excuse me") is present. As shown in step 540, the sound signature of the ambient sound can be analyzed for speech content. For example, the sound signature may be analyzed for voice information, such as vocal cord pitch periodicities, time-varying voice formant envelopes, or other articulation parameter attributes. Upon detecting the presence of voice in the ambient sound, the processor 114 can perform key word detection (e.g., "help") in the spoken content as shown in step 542. Speech recognition models as well as language models may be used to identify key words in the spoken content. As previously noted, the user 124 may say or enter one or more warning sounds that may be mapped to associated learning models for sound signature detection.

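A minimal keyword-spotting sketch over a recognizer's transcript; the word list and mappings are hypothetical, and a production system would score keyword models directly on the audio as described above.

```python
# Hypothetical verbal-warning vocabulary with mapped meanings.
VERBAL_WARNINGS = {"help": "call for help", "police": "police alert",
                   "fire": "fire alert"}

def spot_keywords(transcript):
    """Return (keyword, meaning) pairs found in a recognized transcript."""
    return [(w, VERBAL_WARNINGS[w])
            for w in transcript.lower().split() if w in VERBAL_WARNINGS]
```
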
[0074] As shown in step 552, the user 124 may also provide user input to direct operation of the processor 114, for example, to select an operational mode as shown in step 550. As one example, the operational mode can enable, disable or adjust monitoring of warning sounds. For example, in a listening mode, the processor 114 may mix audio content with ambient sound while monitoring for warning sounds. In a quiet mode, the processor 114 may suppress or attempt to actively cancel all noises except detected warning sounds.

[0075] The user input may be in the form of a physical interaction (e.g., button press) or a vocalization (e.g., spoken command). The operating mode can also be controlled by a prioritizing module as shown in step 554. The prioritizing module may prioritize warning sounds based on severity and context. For example, if the user 124 is in a phone call, and a warning sound is detected, the processor 114 may audibly inform the user 124 of the warning and/or present a text message of the warning sound. If the user 124 is listening to music or a voice communication, and a warning sound is detected, the processor 114 may automatically shut off the music or voice audio and alert the user. The user 124, by way of user interface 106 (FIG. 1) or an administrator, may rank warning sounds and instruct the processor 114 how to respond to warnings in various contexts.

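A sketch of one possible prioritizing table, assuming a small set of contexts and responses (all entries are illustrative; the disclosure leaves the ranking to the user or an administrator):

```python
# (warning sound, context) -> response; entries are illustrative only.
PRIORITIES = {
    ("siren", "driving"): "interrupt",     # shut off music, play the warning
    ("siren", "phone_call"): "notify",     # duck the call, speak an alert
    ("car_horn", "driving"): "mix",        # mix the horn with audio content
    ("jackhammer", "driving"): "ignore",   # learned non-warning sound
}

def response_for(warning, context):
    """Default to a notification when no rule has been configured."""
    return PRIORITIES.get((warning, context), "notify")
```
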
[0076] FIG. 6 is a flowchart of a method 600 for managing audio delivery based on detected sound signatures in accordance with an exemplary embodiment. The method 600 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 600, reference will be made to components of FIG. 2, although it is understood that the method 600 can be implemented in any other manner using other suitable components.

[0077] As noted previously, the audio interface 212 can supply audio content (e.g., music, cell phone, voice mail, etc.) to the processor 114. In such regard, the user 124 may listen to music, talk on the phone, receive voice mail, or perform other audio related tasks while the processor 114 additionally monitors warning sounds in the environment. During normal use, when a warning sound is not present, the processor 114 may operate normally to recreate the sound experience requested by the user 124. If however the processor 114 detects a warning sound, the processor 114 may manage audio content delivery to notify the user 124 of the warning sound. Managing audio content delivery can include adjusting or overriding other current audio settings.

[0078] By way of example, as shown in step 602, the audio interface 212 receives audio content from a media player, such as a portable music player, or cell phone. The audio content can be delivered to the user's vehicle cabin by way of the VCRs 112, 120 as shown in step 604.

[0079] At step 606, the processor 114 monitors ambient sound in the environment captured at the ASM 201. Ambient sound may be sampled at sufficient data rates (e.g., 8 kHz, 16 kHz, and 32 kHz) to allow for feature extraction of sound signatures. Moreover, the processor 114 may adjust the sampling rate based on the information content of the ambient signal. For example, upon the ambient sound exceeding a first threshold, the sampling rate may be set to a first rate (e.g., 4 kHz). As the ambient sound increases in volume, or as prominent features are identified, the sampling rate may be increased to a second rate (e.g., 8 kHz) to increase signal resolution. Although the higher sampling rate may improve resolution of features, the lower sampling rate may preserve computational resources (e.g., power supply 210, processor 114) while providing minimally sufficient feature resolution.

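A one-threshold version of this rate policy is sketched below; the dB threshold and the two rates are placeholders for whatever a real system would tune.

```python
def select_sampling_rate(ambient_level_db, threshold_db=70.0):
    """Low rate to conserve power while the ambient sound is quiet;
    higher rate for better feature resolution once the level rises."""
    return 4_000 if ambient_level_db < threshold_db else 8_000
```
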
[0080] If, at step 608, a sound signature is detected, the processor 114 may determine a priority of the detected sound signature (at step 610). The priority establishes how the processor 114 manages audio content. Notably, warning sounds for various environmental conditions and user experiences can be learned. Accordingly, the user 124 or an administrator can establish priorities for warning sounds. Moreover, these priorities may be based on environmental context. For example, if a user 124 is in a warehouse where loading vehicles emit a beeping sound, sound signatures for such vehicles can be given the highest priority. A user 124 may also prioritize learned warning sounds, for example, via a user interface on a paired device (e.g., cell phone), or via speech recognition (e.g., "prioritize - 'ambulance' - high").

[0081] Upon detecting a warning sound and identifying a priority, the processor 114, at step 612, selectively manages at least a portion of the audio content based on the priority. For example, if the user 124 is listening to music during the time a warning sound is detected, the processor 114 may decrease the music volume to present an audible notification. This may represent one indication that the processor 114 has detected a warning sound.

[0082] At step 614, the processor 114 may further present an audible notification to the user 124. For example, upon detecting a "horn" sound, a text-to-speech message can be presented to the user 124 to audibly inform them that a horn sound has been detected (e.g., "horn detected"). Information related to the warning sound (e.g., direction, speed, priority, etc.) may also be presented with the audible notification.

[0083] In a further arrangement, the processor 114 may send a message to a device operated by the user 124 to visually display the notification, as shown in step 616. For example, if the user has disengaged audible notification, the processor 114 may transmit a text message containing the warning to a paired device (e.g., a cell phone) or to indicator 116 configured as a visual indicator. Moreover, the processor 114 may beacon out an audible alarm to other devices within a vicinity, for example via Wi-Fi (IEEE 802.11). Other devices in a proximity of the user 124 may sign up to receive audible alarms from the processor 114. In such regard, the processor 114 can beacon a warning notification to other devices in the area to share warning information with other people.

[0084] FIG. 7 is a flowchart of a method 700 further describing sound signature detection in accordance with an exemplary embodiment. The method 700 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. The method 700 can begin in a state in which the processor 114 is actively monitoring warning sounds in the environment.

[0085] At step 711, ambient sound captured from the ASM 201 may be buffered into short term memory as frames. As an example, the ambient sound may be sampled at 8 kHz with 10-20 ms frame sizes (80 to 160 samples). The frame size may also vary depending on the energy level of the ambient sound. For example, the processor 114, upon detecting low level sounds (e.g., about 70-74 dB SPL), may use a frame size of about 30 ms, and update the frame size to about 10 ms as the power level increases (e.g., greater than about 86 dB SPL). The processor 114 may also increase the sampling rate in accordance with the power level and/or a duration of the ambient sound. (A longer frame size with lower sampling may compromise resolution for computational resources.) The data buffer is desirably of sufficient length to hold a history of frames (e.g., about 10-15 frames) for short-term historical analysis.

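A sketch of this adaptive framing, assuming an 8 kHz stream and a caller-supplied `read_samples` function standing in for the ADC interface (both are assumptions; the level thresholds follow the example above):

```python
import collections
import numpy as np

SAMPLE_RATE = 8_000                      # 8 kHz, per the example above
history = collections.deque(maxlen=12)   # ~10-15 frames of short-term history

def frame_size_ms(level_db_spl):
    """Adaptive frame size: long frames for quiet ambience, short frames
    (finer time resolution) once the level rises."""
    return 30 if level_db_spl < 86.0 else 10

def buffer_next_frame(read_samples, level_db_spl):
    # read_samples(n) is a hypothetical stand-in for the ADC/driver interface.
    n = int(SAMPLE_RATE * frame_size_ms(level_db_spl) / 1000)   # 80-240 samples
    frame = np.asarray(read_samples(n), dtype=float)
    history.append(frame)
    return frame
```
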
[0086] At step 712, the processor 114 may perform feature extraction on the frame as the ambient sound is buffered into the data buffer. As one example, feature extraction may include performing a filter-bank analysis and summing frequencies in auditory bandwidths. Features may also include Fast Fourier Transform (FFT) coefficients, Discrete Cosine Transform (DCT) coefficients, cepstral coefficients, partial autocorrelation (PARCOR) coefficients, wavelet coefficients, statistical values (e.g., energy, mean, skew, variance), parametric features, or any other suitable data compression feature set.

[0087] Additionally, dynamic features, such as derivatives of any order, may be added to the static feature set. As one example, mel-frequency cepstral analysis may be performed on the frame to generate between about 10 and 16 mel-frequency cepstral coefficients. This small number of coefficients represents features that may be compactly stored to memory for that particular frame. Such front-end feature extraction techniques may reduce the amount of data used to represent the data frame.

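A minimal sketch of this front end, assuming the librosa library is available (any MFCC implementation would do); it returns static coefficients stacked with their first-order deltas:

```python
import numpy as np
import librosa   # assumed available; any MFCC implementation would do

def mfcc_features(signal, sample_rate=8_000, n_mfcc=13):
    """Static MFCCs over the buffered ambient signal plus first-order
    deltas as the dynamic features described above."""
    mfcc = librosa.feature.mfcc(y=np.asarray(signal, dtype=float),
                                sr=sample_rate, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)   # needs several frames of context
    return np.vstack([mfcc, delta])       # shape: (2*n_mfcc, n_frames)
```
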
[0088] At step 713, the features may be incorporated as a sound signature and compared to learned models, for example, those retrieved from the warning sounds database 718 (e.g., user defined database 504 or the on-line database 506 of FIG. 5). A sound signature may be defined as a sound in the user's ambient environment which has significant perceptual saliency. As an example, a sound signature may correspond to an alarm, an ambulance, a siren, a horn, a police car, a bus, a bell, a gunshot, a window breaking, or any other warning sound, including voice. The sound signature may include features characteristic to the sound. As an example, the sound signature may be classified by statistical features of the sound (e.g., envelope, harmonics, spectral peaks, modulation, etc.).

[0089] Notably, each learned model used to identify a sound signature has a set of features specific to a warning sound. For example, a feature vector of a learned model for an "alarm" is sufficiently different from a feature vector of a learned model for a "bell" sound. Moreover, the learned model may describe interconnectivity (e.g., state transitions, emission probabilities, initial probabilities, synaptic connections, hidden layers) among the feature vectors (e.g., frames). For example, the features of a "bell" sound may change in a specific manner compared to the features of an "alarm" sound. The learned model may be a statistical model such as a Gaussian mixture model (GMM), a Hidden Markov Model (HMM), a Bayes classifier, or a Neural Network (NN) that requires training.

[0090] In the following, a Gaussian Mixture Model (GMM) is presented, although it should be noted that any of the above models may be used for sound signature detection. In this case, each warning sound may have an associated GMM used for detecting the warning sound. As an example, the warning sound for an "alarm" may have its own GMM, and a warning sound for a "bell" may have its own GMM. Separate GMMs may also be used as a basis for the absence of the sounds ("anti-models"), such as "not alarm" or "not bell." Each GMM may provide a model for the distribution of the feature statistics for each warning sound in a multi-dimensional space.

[0091] Upon presentation of a new feature vector, the likelihood of the presence of each warning sound may then be calculated. In order to detect a warning sound, each warning sound's GMM may be evaluated relative to its anti-model, and a score related to the likelihood of that warning sound may be computed. A threshold may be applied directly to this score to decide whether the warning sound is present or absent. Similarly, the sequence of scores may be relayed to yet another module which uses a more complex rule to decide presence or absence. Examples of such rules include linear smoothing or median filtering.

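A sketch of the GMM/anti-model scoring described above, using scikit-learn and median filtering of the per-frame scores. The synthetic training matrices are placeholders only; a real system would fit the models to labeled warning-sound features.

```python
import numpy as np
from scipy.signal import medfilt
from sklearn.mixture import GaussianMixture

# Placeholder training data; a real system uses labeled MFCC matrices
# of shape (n_samples, n_features) prepared offline.
rng = np.random.default_rng(0)
siren_features = rng.normal(0.0, 1.0, (500, 26))
background_features = rng.normal(1.0, 1.0, (500, 26))

siren_gmm = GaussianMixture(n_components=8, random_state=0).fit(siren_features)
anti_gmm = GaussianMixture(n_components=8, random_state=0).fit(background_features)

def siren_score(frames):
    """Per-frame log-likelihood ratio of 'siren' versus its anti-model."""
    return siren_gmm.score_samples(frames) - anti_gmm.score_samples(frames)

def siren_present(frames, threshold=0.0, width=5):
    smoothed = medfilt(siren_score(frames), kernel_size=width)  # median smoothing
    return bool(np.any(smoothed > threshold))
```
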
[0092] As previously noted, an HMM or NN model with its associated connection logic may be used in place of each GMM for each learning model. For example, each warning sound in the database 718 may have a corresponding HMM. A sound signature for a warning sound captured at the ASM 201 in ambient sound may be processed through a lattice network (e.g., Viterbi network) for comparison to each HMM, to determine which HMM, if any, corresponds to the warning sound. Alternatively, in a trained NN, the sound signature may be input to the NN, where the output states of the NN correspond to warning sound indices. The NN may include various topologies such as Feed-Forward, Radial Basis Function, Hopfield, Time-Delay Recurrent, or other optimized topologies for real-time sound signature detection.

[0093] At step 714, a distortion metric is computed for each learned model to determine which learned models are closest to the captured feature vector (e.g., sound signature). The learned model with the smallest distortion (e.g., mathematical distance) is generally considered the correct match, or recognition result. It should also be noted that the distortion may be calculated as part of the model comparison in step 713. This is because the distortion metric may depend on the type of model used (e.g., HMM, NN, GMM, etc.) and in fact may be internal to the model (e.g., Viterbi decoding, back-propagation error update, etc.). The distortion module is presented in FIG. 7 as a separate component merely to suggest use with other types of pattern recognition methods or learning models.

[0094] Upon evaluating the feature vector (e.g., sound signature) against the candidate warning sound learned models, the ambient sound at step 715 may be classified as a warning sound. Each of the learned models may be associated with a score. For example, upon the presentation of a sound signature, each GMM may produce a score. The scores may be evaluated against a threshold, and the GMM with the highest score may be identified as the detected warning sound. For example, if the learned model for the "alarm" sound produces the highest score (e.g., smallest distortion result) compared to other learned models, the ambient sound may be classified as an "alarm" warning sound.

[0095] The classification step 715 also takes into account likelihoods (e.g., recognition probabilities). For example, as part of the step of comparing the sound signature of the unknown ambient sound against all the GMMs for the learned models, each GMM may produce a likelihood result, or output. As an example, these likelihood results may be evaluated against each other, or in a logical context, to determine the GMM considered "most likely" to match the sound signature of the warning sound. The processor 114 may then select the GMM with the highest likelihood or score via soft decisions.
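A minimal sketch of the hard and soft decision logic of steps 714-715, with hypothetical per-model scores and an illustrative threshold:

```python
import numpy as np

def classify(scores, threshold):
    """Hard decision: pick the learned model with the highest score,
    subject to a detection threshold (None means 'no warning sound')."""
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= threshold else None

def posteriors(scores):
    """Soft decision: convert raw scores to probabilities via softmax."""
    vals = np.array(list(scores.values()))
    p = np.exp(vals - vals.max())
    return dict(zip(scores.keys(), p / p.sum()))

scores = {"alarm": 12.4, "bell": 3.1, "siren": 7.8}   # hypothetical outputs
print(classify(scores, threshold=5.0))                # -> "alarm"
print(posteriors(scores))                             # soft-decision weights
```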

[0096] The processor 114 may continually monitor the environment for warning sounds, or monitor the environment on a scheduled basis. In one arrangement, the processor 114 may increase monitoring in the presence of high ambient noise, which may signify environmental danger or activity. Upon classifying an ambient sound as a warning sound, at step 716, the ASRS 100 may generate an alarm. As previously noted, the processor 114 may mix the warning sound with audio content, amplify the warning sound, reproduce the warning sound, and/or deliver an audible message. As one example, spectral bands of the audio content that mask the warning sound may be suppressed to increase the audibility of the warning sound. This serves to notify the user 124 of a warning sound detected in the environment, of which the user 124 may not be aware depending on their environmental context.
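One possible (purely illustrative) realization of this spectral-band suppression uses a short-time Fourier transform; the "dominant band" heuristic, frame size, and attenuation depth are assumptions, not values from the specification:

```python
import numpy as np
from scipy.signal import stft, istft

def unmask(content, warning, fs=16000, atten_db=12.0):
    """Attenuate spectral bands of the audio content that overlap the
    dominant bands of a detected warning sound."""
    f, _, C = stft(content, fs=fs, nperseg=512)
    _, _, W = stft(warning, fs=fs, nperseg=512)

    # Bands where the warning sound carries most of its energy.
    warning_energy = np.mean(np.abs(W) ** 2, axis=1)
    hot = warning_energy > 0.5 * warning_energy.max()

    gain = np.ones_like(f)
    gain[hot] = 10 ** (-atten_db / 20.0)          # duck the masking bands
    _, out = istft(C * gain[:, np.newaxis], fs=fs, nperseg=512)
    return out

fs = 16000
content = np.random.randn(fs)                             # stand-in music
warning = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)   # stand-in siren tone
ducked = unmask(content, warning, fs=fs)
```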

[0097] As an example, the processor 114 may present an amplified audible notification to the user via the VCRs 112, 120. The audible notification may be a synthetic voice identifying the warning sound (e.g., "car alarm"), a location or direction of the sound source generating the warning sound (e.g., "to your left"), a duration of the warning sound (e.g., "3 minutes") from initial capture, and any other information (e.g., proximity, severity level, etc.) related to the warning sound. Moreover, the processor 114 may selectively mix the warning sound with the audio content based on a predetermined threshold level. For example, the user 124 may prioritize warning sound types for receiving various levels of notification, and/or identify the sound types as desirable or undesirable.

[0098] FIG. 8 presents a pictorial diagram 800 for mixing ambient sounds and warning sounds with audio content. In the illustration shown, the processor 114 is directing music 136 to the vehicle cabin 126 via VCR 120 while simultaneously monitoring for warning sounds in the environment. At time T, the processor 114, upon detecting a warning sound (signature 135), can lower the volume of the music from the media player 150 (graph 141) and increase the volume of the ambient sound received at the ASM 201 (graph 142). Other mixing arrangements are herein contemplated.

[0099] In such regard, there is a smooth audio transition between the music 136 and the warning sound 135. Notably, the ramp-up and ramp-down times can also be adjusted based on the priority of the warning sound. For example, in an extreme case, the processor 114 may immediately shut off the music and present the audible warning. Other implementations for mixing audio and managing audio content delivery are herein contemplated.
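As a sketch of the ramped mixing depicted in FIG. 8 (the linear ramp shape and ramp length are illustrative assumptions; a high-priority warning would use a shorter ramp, down to a hard cut):

```python
import numpy as np

def crossfade_gains(n_samples, ramp_samples):
    """Linear fade-down for the music and fade-up for the ambient pass-through."""
    if ramp_samples == 0:                   # extreme case: shut music off at once
        down = np.zeros(n_samples)
    else:
        down = np.clip(1.0 - np.arange(n_samples) / ramp_samples, 0.0, 1.0)
    return down, 1.0 - down

music = np.random.randn(16000)              # stand-in audio content
ambient = np.random.randn(16000)            # stand-in captured warning sound
down, up = crossfade_gains(len(music), ramp_samples=4000)
mixed = down * music + up * ambient         # cf. graphs 141 and 142 in FIG. 8
```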

[00100] Moreover, the audio content may be managed with other media devices (e.g., a cell phone). For example, upon detecting a warning sound during a phone call, the processor 114 may inform both the user 124 and the called party of the warning sound. In such regard, the user 124 does not need to inform the called party, since the called party also receives the notification, which may save time in explaining an emergency situation.

[00101] As one example, the processor 114 may spectrally enhance the audio content in view of the ambient sound. Moreover, a timbral balance of the audio content may be maintained by taking into account level-dependent equal loudness curves and other psychoacoustic criteria (e.g., masking) associated with a personalized hearing level (PHL) 430. For example, auditory cues in received audio content may be enhanced based on the PHL 430 and a spectrum of the ambient sound captured at the ASM 201. Frequency peaks within the audio content may be elevated relative to ambient noise frequency levels and in accordance with the PHL 430 to permit sufficient audibility of the ambient sound. The PHL 430 reveals frequency dynamic ranges that may be used to limit the compression range of the peak elevation in view of the ambient noise spectrum.
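A toy per-band gain rule consistent with this description might look as follows, treating the PHL 430 simply as a per-band boost limit; all dB values and the audibility margin are invented for illustration:

```python
import numpy as np

def audibility_gains(content_db, ambient_db, phl_headroom_db, margin_db=6.0):
    """Boost content bands that fall below the ambient noise plus a margin,
    limiting each boost by the headroom implied by the PHL."""
    needed = ambient_db + margin_db - content_db
    return np.clip(needed, 0.0, phl_headroom_db)

content_db = np.array([60.0, 55.0, 70.0])    # per-band audio content levels
ambient_db = np.array([58.0, 62.0, 50.0])    # per-band ambient noise levels
headroom = np.array([10.0, 10.0, 10.0])      # PHL-derived boost limits
print(audibility_gains(content_db, ambient_db, headroom))   # -> [4. 10. 0.]
```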

[00102] In one arrangement, the processor 114 may compensate for a masking of the ambient sound by the audio content. Notably, the audio content, if sufficiently loud, may mask auditory cues in the ambient sound, which can: i) potentially cause hearing damage, and ii) prevent the user 124 from hearing warning sounds in the environment (e.g., an approaching ambulance, an alarm, etc.). Accordingly, the processor 114 may accentuate and attenuate frequencies of the audio content and ambient sound to permit maximal sound reproduction while simultaneously permitting audibility of ambient sounds.

[00103] In one exemplary embodiment, the processor 114 may narrow noise frequency bands within the ambient sound to permit sensitivity to audio content between the frequency bands. The processor 114 may also determine if the ambient sound contains salient information (e.g., warning sounds) that should be un-masked with respect to the audio content. If the ambient sound is not relevant, the processor 114 may mask the ambient sound (e.g., increase levels) with the audio content until warning sounds are detected.

[00104] FIG. 9 is a flowchart of a method 900 for updating the sound signature detection library depending on the vehicle location. At step 902, a current vehicle location may be acquired. The location of vehicle 102 may be determined by a number of methods, including: a Global Positioning System (GPS) for determining a GPS location, and cell-phone signal cell codes for triangulating the location (for example, based on a signal strength of the received signal(s) from cell-phone networks).

[00105] The acquired vehicle location may be used, at step 904, to determine which "sound library cell" the vehicle 102 is located in. The sound library cell may refer to a geographic region, such as a country, state, province or city. The sound library cell may contain the target sound signatures which the vehicle's sound recognition system detects (for example, for a particular sound signature of a particular fire engine siren, there may be associated GMM statistics, as described above). The library cell may be determined from the acquired location (for example, using a look-up table), or may be acquired directly from coded signals embedded in the received location signal (for example, a cell-phone signal or a radio signal).

[00106] If, at step 906, the vehicle's library cell is determined to have changed, then the current sound signature library of the vehicle is updated (at step 910). For example, the current vehicle sound signature library may be updated from data provided by an on-line sound signature library 908, where the new sound signature data may be received wirelessly or from a storage device located on the vehicle 102.
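A minimal sketch of steps 904-910 follows; the cell identifiers, the cell map, and the location-to-cell rule are hypothetical, since the specification leaves the lookup mechanism open:

```python
# Hypothetical mapping from a geographic library cell to its signature set.
SOUND_LIBRARY_CELLS = {
    "US-NY": {"fire_engine_siren_us", "police_siren_us"},
    "FR": {"fire_engine_siren_fr", "police_siren_fr"},
}

def cell_for_location(lat, lon):
    """Map a GPS fix to a library cell (a real system might use a look-up
    table, reverse geocoding, or coded signals in a received broadcast)."""
    return "US-NY" if lon < 0 else "FR"      # toy boundary for illustration

def update_library(current_cell, lat, lon, library):
    """Steps 906/910: if the cell changed, fetch that cell's signature set."""
    new_cell = cell_for_location(lat, lon)
    if new_cell != current_cell:
        library = SOUND_LIBRARY_CELLS[new_cell]
    return new_cell, library

cell, lib = update_library("US-NY", 48.85, 2.35, set())
print(cell, lib)   # -> FR {'fire_engine_siren_fr', ...}
```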

[00107] Although the invention has been described in terms of automotive sound recognition systems and methods for enhancing situation awareness in a vehicle, it is contemplated that one or more steps and/or components may be implemented in software for use with microprocessors/general purpose computers (not shown). In this embodiment, one or more of the functions of the various components and/or steps described above may be implemented in software that controls a computer. The software may be embodied in non-transitory tangible computer readable media (such as, by way of non-limiting example, a magnetic disk, optical disk, flash memory, hard drive, etc.) for execution by the computer.

[00108] Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the invention.