Title:
A METHOD FOR PROCESSING DIGITAL AUDIO SIGNALS
Document Type and Number:
WIPO Patent Application WO/2019/141455
Kind Code:
A1
Abstract:
A method for processing digital audio signals comprising a plurality of samples, each sample having an amplitude. It comprises estimating power spectral density of each one of a plurality of audio files of a first type to provide an averaged power spectrum for each audio file, and calculating frequency dependent mean and standard deviation values of each frequency bin of the averaged power spectrum for each audio file. Further steps are calculating a headroom spectrum as an amount of which a frequency dependent maximum value that can be represented exceeds a maximum sample amplitude of the audio signal before reaching a controlled degree of saturation, amplifying the digital audio signals within the headroom spectrum in sound processing steps, and outputting the processed digital audio signals.

Inventors:
PHILIPSSON JOHN (SE)
MARTINSON ROGER (SE)
WALDECK CARL-JOHAN (SE)
Application Number:
PCT/EP2018/085155
Publication Date:
July 25, 2019
Filing Date:
December 17, 2018
Assignee:
HEAREZANZ AB (SE)
International Classes:
H04R3/02; H03G9/00; H03G9/02
Domestic Patent References:
WO2000047014A12000-08-10
WO2003015468A12003-02-20
Foreign References:
US20150117685A12015-04-30
US20060215844A12006-09-28
Attorney, Agent or Firm:
HANSSON THYRESSON AB (SE)
Claims:
CLAIMS

1. A method for processing digital audio signals comprising a plurality of samples, each sample having an amplitude, comprising

estimating signal frequency content of each one of a plurality of audio files of a first type to provide an average signal frequency content,

calculating a measure of frequency dependent central tendency and a spread between frequency content values of each frequency bin of the signal frequency content of each of said audio files,

calculating a headroom spectrum as an amount of which a frequency dependent maximum value that can be represented exceeds a maximum sample amplitude of the audio signal before reaching a controlled degree of saturation,

amplifying the digital audio signals within the headroom spectrum in sound processing steps, and

outputting the processed digital audio signals.

2. The method of claim 1, comprising estimating signal frequency content by estimating energy spectral density of each one of a plurality of audio files of a first type to provide an averaged energy spectrum for each audio file.

3. The method of claim 1, comprising calculating a measure of frequency dependent central tendency and a spread between frequency content values by calculating frequency dependent mean and standard deviation values of each frequency bin of the averaged energy spectrum for each audio file.

4. The method of any of claims 1-3, comprising calculating the headroom to allow a tolerance in percent of audio samples that are allowed to saturate.

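5. The method of claim 4, comprising calculating the headroom using the equations:

$$\mathrm{headroom}_{\gamma f} = \mathrm{headroom}_{f} - \gamma\,\sigma_{W_f} = -1 \cdot \mu_{W_f} - \gamma\,\sigma_{W_f}$$

with $\mu_{W_f}$ being the mean value and $\sigma_{W_f}$ the standard deviation, and $\gamma$ being the number of standard deviations deducted or added to the mean value.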

6. The method of any of claims 1-3, comprising limiting the digital audio signals to a signal level at a threshold value (11) below saturation after said sound processing steps.

7. The method of claim 6, comprising gradually reducing gain to the digital audio signal when a signal level of the digital audio signal exceeds a predefined knee threshold value (13), said predefined knee threshold value (13) being lower than the level of saturation.

Description:
A METHOD FOR PROCESSING DIGITAL AUDIO SIGNALS

TECHNICAL FIELD

[0001] The present invention relates generally to a method for processing digital audio signals.

PRIOR ART

[0002] Normal and hearing-impaired listeners often wish to optimize an audio sound field and to customize audio systems and products. A system and method for optimizing audio is disclosed in US20060215844. The background of the invention disclosed therein is prior art systems with built-in audio equalization systems designed to alter the spectrum of the audio signal for an improved listening experience. Gains of a filter bank are adjusted by a user during an initial process, and the filter specifications for the user are later used to modify the output of an audio signal.

[0003] Normally, it is desired to increase gain or signal level at specific frequencies. In analogue sound systems it is comparatively simple to achieve this without effects in other frequencies or frequency bands. When performing digital audio processing, there are many scenarios when one might wish to increase the amplitude of the audio content. Since digital audio is quantized with a fixed maximum value referred to as a saturation level, there is a big problem with headroom when performing such operations. In this context, headroom in a digital signal is the amount by which the maximum value that can be represented exceeds the maximum level of the signal content. The problem is especially pronounced when dealing with music, which is often mastered to utilize the entire bit resolution, leaving little to no headroom to work with. One workaround is to simply lower the level of the signal, resulting in more headroom. However, this impacts quality as the resolution of the signal deteriorates.

SUMMARY OF THE INVENTION

[0004] An object of the invention is to provide a better method, taking into account the dynamic frequency content of the signal. In accordance with a first aspect of the invention, a special soft-knee limiter, tuned to a statistical headroom ruled by a saturation tolerance, is produced. This headroom can also be used as a conditional maximum value for the processing algorithms that run before the limiter. One or many sound files can be analyzed. These should be representative of the sound that is to be processed. If only a few sound files are available, each file may be broken up into intervals. This data can be collected from a representative file base beforehand, or at runtime if the file to be processed is available.

[0005] The sound files are analyzed to get an averaged energy or power spectrum for each file. A central tendency is calculated for all of the analyzed files by averaging each frequency bin. The standard deviation spectrum is calculated in the same manner. Then the headroom spectrum is determined and the digital audio signals are amplified within the available headroom spectrum. In various embodiments, some samples of the digital audio signal will be above saturation level. These samples can be handled by a limiter, preferably a soft-knee limiter. Since the theoretical maximum level of the signal and the allowed gain compensation are known from the headroom calculations, it is possible to tune the limiter to fit the signal optimally. Any samples out of range for the container in which the samples are transported will be compressed to fit the container.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] In order that the manner in which the above recited and other advantages and objects of the invention are obtained will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

[0007] Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

Fig. 1 is a schematic diagram showing a headroom in a typical music sound file,

Fig. 2a is a schematic diagram showing a power spectral density of the sound file of Fig. 1,

Fig. 2b is a schematic diagram showing mean headroom at different frequencies of the sound file of Fig. 1,

Fig. 3 is a schematic diagram showing signal compression, signal limiter and a soft-knee limiter,

Fig. 4 is a schematic block diagram showing functional blocks of one embodiment of a system in accordance with the invention, and

Fig. 5 is a schematic diagram showing normal distribution and tolerance.

DETAILED DESCRIPTION

[0008] When performing audio processing, there are many scenarios when an increase of the amplitude of the audio content is desired. Since digital audio is quantized with a fixed maximum value, there is a big problem with headroom when performing such operations. The problem is especially pronounced when dealing with music, which is often mastered to utilize the entire bit resolution, leaving little to no headroom to work with. One workaround is to simply lower the level of the signal, resulting in more headroom. However, this impacts quality as the resolution of the signal deteriorates.

DEFINITIONS

[0009] Headroom in a digital signal is the amount by which the maximum value that can be represented exceeds the maximum level of the signal content.

A bin or a frequency bin in this context is a segment of the frequency axis that "collects" the amplitude, magnitude or energy from a small range of frequencies, often resulting from a Fourier analysis.

Welch's method, named after P.D. Welch, is used for estimating the power of a signal at different frequencies: that is, it is an approach to spectral density estimation. The method is based on the concept of using periodogram spectrum estimates, which are the result of converting a signal from the time domain to the frequency domain. Welch's method is an improvement on the standard periodogram spectrum estimating method and on Bartlett's method, in that it reduces noise in the estimated power spectra in exchange for reducing the frequency resolution.

Compression is a tool to affect the dynamics of a signal. A simple audio compressor has two main settings: a threshold, above which the compressor acts, and a ratio setting that governs how much the signal is compressed. An extreme case of a compressor is called a limiter, which sets a limit such that the signal is not allowed to exceed the threshold.
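The two settings described above can be illustrated with a minimal sketch of a static compressor curve working on levels in decibels. The function name `compress_db` and the example values are hypothetical and are not taken from the application; the formula is the conventional threshold-and-ratio curve.

```python
import math


def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Map an input level (dB) to an output level (dB) for a static compressor.

    Below the threshold the level is unchanged. Above it, every `ratio` dB of
    input overshoot yields 1 dB of output overshoot. Passing ratio=math.inf
    turns the compressor into a limiter (the overshoot becomes 0 dB).
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical settings: threshold at -12 dB, ratio 4:1.
print(compress_db(-20.0, -12.0, 4.0))       # below threshold, unchanged
print(compress_db(-4.0, -12.0, 4.0))        # 8 dB over threshold becomes 2 dB over
print(compress_db(-4.0, -12.0, math.inf))   # limiter: held at the threshold
```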

[0010] In the example shown in Fig. 1, headroom is illustrated with a music file with a varying sound content amplitude. If the sound content amplitude exceeds the saturation limit, it will cause clipping, introducing distortion to the signal. In many cases, this headroom is smaller than the amount of amplification that is necessary to reach the wanted outcome of, for example, a filter. As shown in Fig. 1, sound is often very dynamic. Here the amplitude is shown over time with large variations. With enough statistical data, this dynamicity can be used to fully utilize or exceed the headroom in a controlled manner, while still handling the cases where samples would saturate.

[0011] However, looking at the frequency content of the same file, as shown in Fig. 2a, there are some frequency bands where the headroom is much greater than in other frequency bands. By assuming that the audio content is utilizing the entire bit resolution, it is possible to use the data shown in Fig. 2a to determine how much it is possible to amplify the signal in different frequency bins. By negating the mean spectrum, which is in decibels and normalized so that the highest value represents 0 as shown in Fig. 2a, the result effectively is the mean headroom at different frequency bins. The mean headroom value for a certain frequency now represents the mean gain that can be applied to the corresponding frequency component of a signal without saturation. This in turn implies that theoretically one half of the samples will in fact saturate. In accordance with the invention, this frequency based amplification is used when applying filtering and other signal processing to a digital audio signal. By using more audio files to base these statistics on, it is possible to obtain a more general curve that can be used to decide how much headroom signal processing is allowed to use, as long as this audio is representative of the digital audio signal that is processed.
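As a small worked example of the negation step, the sketch below normalizes a mean spectrum in decibels so that its highest bin is 0 dB and then negates it, which gives the mean headroom per bin. The function name `mean_headroom_db` and the four-bin spectrum are hypothetical illustration values, not data from the application.

```python
def mean_headroom_db(mean_spectrum_db):
    """Return the mean headroom per frequency bin, in dB.

    The spectrum is normalized so that its highest bin sits at 0 dB and is
    then negated, so `peak - value` is computed directly for each bin.
    """
    peak = max(mean_spectrum_db)
    return [peak - value for value in mean_spectrum_db]


# Hypothetical mean spectrum for four frequency bins, in dB.
spectrum_db = [-30.0, -18.0, -24.0, -42.0]
print(mean_headroom_db(spectrum_db))  # [12.0, 0.0, 6.0, 24.0]
```

The bin that carries the most energy gets 0 dB of mean headroom, while quieter bins can be amplified by correspondingly more.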

[0012] Different behaviours of a compressor are shown in Fig. 3. The dashed line in Fig. 3 illustrates a typical compressor. Above a predefined threshold point 11, output is not allowed to follow input but will be limited. The threshold point is called "the knee" of the compressor. A limiter has a very sharp knee, as depicted by the continuous line. A gentler approach to this is to use what is called a "soft knee" or "over easy", as depicted by the dotted line. By gradually changing gradient in a circular behavior above a predefined knee threshold point 13, an abrupt limiting is avoided. Instead, a gradually increasing compression is obtained.
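The difference between a hard knee and a soft knee can be sketched on sample amplitudes as follows. This is only an illustration under stated assumptions: the soft knee here uses a tanh curve rather than the circular transition described for Fig. 3, and the names `hard_limit`, `soft_knee_limit`, `knee` and `limit` are hypothetical.

```python
import math


def hard_limit(x: float, limit: float) -> float:
    """Sharp knee: clip the sample to the range [-limit, limit]."""
    return max(-limit, min(limit, x))


def soft_knee_limit(x: float, knee: float, limit: float) -> float:
    """Soft knee (assumed tanh shape, not the circular curve of Fig. 3).

    Samples with magnitude up to `knee` pass unchanged. Above `knee` the
    output bends smoothly towards `limit` and never exceeds it.
    """
    if not 0.0 <= knee < limit:
        raise ValueError("expected 0 <= knee < limit")
    magnitude = abs(x)
    if magnitude <= knee:
        return x
    span = limit - knee
    bent = knee + span * math.tanh((magnitude - knee) / span)
    bent = min(bent, limit)  # guard against floating-point rounding
    return math.copysign(bent, x)


# Hypothetical values: saturation at 1.0, knee threshold at 0.7.
for sample in (0.5, 0.8, 1.2, -3.0):
    print(sample, hard_limit(sample, 1.0), soft_knee_limit(sample, 0.7, 1.0))
```

The tanh curve has a slope of 1 at the knee, so the transition from the untouched region into the compressed region is continuous in both value and gradient.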

[0013] In one aspect of the invention as illustrated in Fig. 4, the two methods discussed above are combined to produce a special soft-knee limiter, tuned to a statistical headroom ruled by a saturation tolerance. This headroom is also used as a conditional maximum value for the processing steps that are run before the limiter. One embodiment of a processing system 10 in accordance with the invention comprises an analysis stage 12 and a processing stage 14. In various embodiments, analysis stage 12 and processing stage 14 are arranged as separate systems and may comprise hardware and software components and also combinations thereof.

[0014] Stored audio content 16 comprises at least one digital audio file. When a digital audio file that will be used as external audio content 18 is available, it is possible to use data from that file for further processing and for a setting specific to that audio file. When there is no access to the external audio content in advance, another approach can be used. In any case, a spectral estimation 20 is performed in an analysing step. During this step an averaged energy or power spectrum for each digital audio file in the stored audio content 16 is obtained. In various embodiments, Welch's method is used for spectral estimation.
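The spectral estimation step can be sketched as follows. This is a minimal, dependency-free illustration of Welch's method (windowed, overlapping segments whose periodograms are averaged), assuming a Hann window, 50 % overlap and a naive DFT for brevity. The function names and parameter values are hypothetical; a practical implementation would more likely use an FFT-based library routine such as `scipy.signal.welch`.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def periodogram(segment, window):
    """Windowed periodogram of one segment, bins 0 .. n/2 (naive DFT)."""
    n = len(segment)
    tapered = [s * w for s, w in zip(segment, window)]
    scale = sum(w * w for w in window)  # compensates for the window energy
    power = []
    for k in range(n // 2 + 1):
        acc = sum(tapered[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power.append(abs(acc) ** 2 / scale)
    return power


def welch_spectrum(samples, segment_len=64, overlap=0.5):
    """Average the periodograms of overlapping segments of `samples`."""
    step = max(1, int(segment_len * (1.0 - overlap)))
    window = hann(segment_len)
    starts = range(0, len(samples) - segment_len + 1, step)
    spectra = [periodogram(samples[s:s + segment_len], window) for s in starts]
    if not spectra:
        raise ValueError("signal is shorter than one segment")
    bins = len(spectra[0])
    return [sum(sp[b] for sp in spectra) / len(spectra) for b in range(bins)]


# Hypothetical test tone: 8 cycles per 64-sample segment, 512 samples long.
tone = [math.sin(2.0 * math.pi * 8 * t / 64) for t in range(512)]
spectrum = welch_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, len(spectrum))
```

Averaging over segments is what trades frequency resolution for a less noisy estimate: shorter segments give more periodograms to average but fewer frequency bins.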

[0015] After spectral estimation, a statistical analysis 22 is performed. A central tendency or an average, such as the arithmetic mean, the median or the mode spectrum, is calculated for all of the analysed files by averaging each frequency bin. In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. Then the spread between values, such as the variance or the standard deviation spectrum $\sigma_{W_f}$, is calculated correspondingly. Frequency dependent mean and standard deviation calculation for the spectra can be made with equation 1 and equation 2 as set out below:

where $W(i)$ is the Welch spectrum for a specific audio file and $W_f(i)$ is a specific frequency bin for this spectrum.
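Equations 1 and 2 did not survive in this copy of the text. A plausible reading, assuming the arithmetic mean and the population standard deviation taken over $N$ analysed files (the symbol $N$ and the choice of the $1/N$ normalization are assumptions, not taken from the application), would be:

```latex
\mu_{W_f} = \frac{1}{N} \sum_{i=1}^{N} W_f(i)

\sigma_{W_f} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( W_f(i) - \mu_{W_f} \right)^{2}}
```

Both quantities are evaluated separately for each frequency bin $f$, which yields a mean spectrum and a standard deviation spectrum.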

[0016] A result from the statistical analysis 22 is used in a headroom estimation 24, cf. Fig. 5. The headroom is estimated or calculated in terms of tolerance. By approximating the sound samples with a normal distribution, it is possible to establish a tolerance in percent of samples that will be allowed to saturate. With the relationship shown in Fig. 5, equations 3 and 4 as set out below are used to achieve a tolerance of a certain percentage.

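$$\mathrm{headroom}_{\gamma f} = \mathrm{headroom}_{f} - \gamma\,\sigma_{W_f} \qquad \text{(Equation 3)}$$

$$\mathrm{headroom}_{\gamma f} = -1 \cdot \mu_{W_f} - \gamma\,\sigma_{W_f} \qquad \text{(Equation 4)}$$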

where $\gamma$ is chosen in accordance with a selected tolerance in the normal distribution graph shown in Fig. 5. In this context, $\gamma$ corresponds to the number of standard deviations deducted or added to the mean value.
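The tolerance-based headroom can be sketched as below, following the relation in claim 5 (the negated mean minus a number of standard deviations). The mapping from a tolerance to the number of standard deviations uses a one-sided tail of the standard normal distribution, which is an assumption made here for illustration; the function names and the three example spectra are likewise hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def gamma_for_tolerance(tolerance: float) -> float:
    """Number of standard deviations for a saturation tolerance.

    `tolerance` is the fraction of samples allowed to saturate (0 < t < 1).
    Assumes a one-sided tail of the standard normal distribution, so a
    tolerance of 0.5 gives 0 standard deviations.
    """
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - tolerance)


def headroom_spectrum_db(spectra_db, tolerance):
    """Headroom per bin: negated mean minus gamma standard deviations.

    `spectra_db` holds one normalized spectrum (in dB) per analysed file,
    all with the same number of frequency bins.
    """
    gamma = gamma_for_tolerance(tolerance)
    headroom = []
    for bin_values in zip(*spectra_db):  # iterate bin by bin across files
        mu = fmean(bin_values)
        sigma = pstdev(bin_values)       # population standard deviation
        headroom.append(-mu - gamma * sigma)
    return headroom


# Hypothetical normalized spectra (dB) for three files and three bins.
spectra = [[-12.0, 0.0, -30.0], [-8.0, 0.0, -26.0], [-10.0, 0.0, -28.0]]
print(headroom_spectrum_db(spectra, 0.5))   # mean headroom, about half saturate
print(headroom_spectrum_db(spectra, 0.05))  # stricter tolerance, less headroom
```

A smaller tolerance gives a larger number of standard deviations and therefore a smaller headroom in every bin whose spectrum varies between files.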

[0017] An average headroom can be used as a limit for how much the audio processing is allowed to amplify the signal for each frequency band. Since there is a resulting tolerance degree of saturation present, a compressor is needed to cover a scenario when the processed signal otherwise would saturate.

[0018] The headroom spectrum generated above is used in a parameter calculation 26 for calculating parameters both for any conditional processing 30 and for a limiter 32. The processing is not allowed to amplify the signal beyond the headroom in any given frequency band. Compressor parameters are tuned to match the highest amplification made by the processing algorithm(s), thereby ensuring a smooth transition to this value without saturation.
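One way to picture the parameter calculation is a per-band clamp of the gain that the processing requests, followed by reading off the highest gain actually applied for tuning the limiter. The sketch below is an assumption about how such a step could look, not the application's implementation; the function names and gain values are hypothetical.

```python
def clamp_gains_to_headroom(requested_gain_db, headroom_db):
    """Limit the requested gain in each band to that band's headroom (dB)."""
    if len(requested_gain_db) != len(headroom_db):
        raise ValueError("gain and headroom must cover the same bands")
    return [min(gain, room) for gain, room in zip(requested_gain_db, headroom_db)]


def highest_applied_gain_db(applied_gain_db):
    """Highest amplification across bands, used here to tune the limiter."""
    return max(applied_gain_db)


# Hypothetical per-band values in dB.
requested = [6.0, 3.0, 12.0]
headroom = [10.0, 0.0, 8.0]
applied = clamp_gains_to_headroom(requested, headroom)
print(applied)                           # [6.0, 0.0, 8.0]
print(highest_applied_gain_db(applied))  # 8.0
```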

[0019] The external audio content is input to the processing stage 14 as an audio stream 28. The audio stream 28 is processed in the conditional processing 30. The conditional processing 30 can be any kind of signal processing, and parameters from the parameter calculation 26 are used to maintain a signal level within the headroom calculated during headroom estimation. It should be noted that the processed signal level can exceed the maximum value for a sample container, provided that later processing, for instance in the limiter 32, can handle it. The limiter 32 will handle such cases to ensure that the signal will be representable after the processing stage 14.

[0020] Samples that would cause saturation will be handled gracefully by the limiter 32 being a soft-knee limiter. In a soft-knee compressor, onset of gain reduction occurs gradually after the signal has exceeded a threshold value. Since the theoretical maximum level of the signal and the allowed gain compensation are known from the headroom calculations, it is now possible to tune the limiter to fit the signal optimally. Any samples out of range for the container in which the samples are transported will be compressed to fit the container.

[0021] While certain illustrative embodiments of the invention have been described in particularity, it will be understood that various other modifications will be readily apparent to those skilled in the art without departing from the scope and spirit of the invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the description set forth herein but rather that the claims be construed as encompassing all equivalents of the present invention which are apparent to those skilled in the art to which the invention pertains.