Title:
METHOD AND SIGNAL PROCESSOR FOR MODIFICATION OF AUDIO SIGNALS
Document Type and Number:
WIPO Patent Application WO/2006/106466
Kind Code:
A1
Abstract:
Described is a method and a signal processor for modification of an audio signal, such as a speech signal, according to a modification parameter while still providing a high-quality output signal. The modification parameter may be time-scaling or pitch shifting. The method of modifying an audio signal according to a modification parameter comprises receiving a target quality measure, adjusting the modification parameter in order to meet the target quality measure, and finally modifying the audio signal according to the adjusted modification parameter. Traditional modification systems are controlled by a target value of the modification parameter. According to the invention, the modification parameter is adjusted in order to provide a preset signal quality and thus avoid severe audible artifacts that are known in traditional modification systems for certain input signals. According to the method, the actual modification parameter, e.g. pitch ratio, may differ from the initially desired one, but the quality of the produced signal can be better guaranteed. The method may be applied to known speech modification techniques such as overlap-add techniques and their variants (OLA, SOLA, PSOLA, PICOLA).

Inventors:
HARMA AKI S (NL)
Application Number:
PCT/IB2006/050999
Publication Date:
October 12, 2006
Filing Date:
April 03, 2006
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
HARMA AKI S (NL)
International Classes:
G10L19/22; G10L21/04
Foreign References:
US20040122662A12004-06-24
US5699404A1997-12-16
US5828994A1998-10-27
EP1515310A12005-03-16
Other References:
FANG LIU ET AL: "Quality enhancement of packet audio with time-scale modification", MULTIMEDIA SYSTEMS AND APPLICATIONS V 29-30 JULY 2002 BOSTON, MA, USA, vol. 4861, October 2002 (2002-10-01), Proceedings of the SPIE - The International Society for Optical Engineering SPIE-Int. Soc. Opt. Eng USA, pages 163 - 173, XP002394785, ISSN: 0277-786X
Attorney, Agent or Firm:
Slenders, Petrus J. W. (AA Eindhoven, NL)
Claims:
CLAIMS:
1. Method of modifying an audio signal (AS) according to a modification parameter (MP), the method comprising the steps of 1) receiving a target quality measure (TQM), 2) adjusting the modification parameter (MP) in order to meet the target quality measure (TQM), 3) modifying the audio signal (AS) according to the adjusted modification parameter (AMP).
2. Method according to claim 1, wherein step 2) comprises calculating a quality measure based on quantifying a magnitude of an artifact in the modified audio signal (MAS) compared to the audio signal (AS).
3. Method according to claim 2, wherein step 2) comprises determining the adjusted modification parameter (AMP) by iteratively adjusting the modifying parameter (MP) and calculating a quality measure according thereto, until the calculated quality measure complies with the target quality measure (TQM).
4. Method according to claim 1, wherein the adjusted modification parameter (AMP) is determined based on a statistical evaluation of previously calculated quality measures.
5. Method according to claim 1, wherein the modification parameter (MP) is selected from the group consisting of: timescaling ratio, pitch shifting ratio, parametric changes to spectrum envelope, formant modification based on formant tracking, amplitude modulation, frequency modulation, modifications of pitch variance.
6. Method according to claim 1, wherein the audio signal (AS) comprises a speech signal.
7. Method according to claim 1, wherein the audio signal (AS) is modified using an overlap addition synthesis technique.
8. Signal processor for modifying an audio signal (AS) according to a modification parameter (MP), the signal processor comprising: an input adapted to receive a target quality measure (TQM), a modification parameter adjuster (MPA) adapted to adjust the modification parameter (MP) in order to meet the target quality measure (TQM), an audio signal modifier (ASM) adapted to modify the audio signal (AS) according to the adjusted modification parameter (AMP).
9. Signal processor according to claim 8, wherein the modification parameter adjuster (MPA) is adapted to calculate a quality measure based on quantifying a magnitude of an artifact in the modified audio signal (MAS) compared to the audio signal (AS).
10. Signal processor according to claim 9, wherein the modification parameter adjuster (MPA) is adapted to determine the adjusted modification parameter (AMP) by iteratively adjusting the modifying parameter (MP) and calculating a quality measure according thereto, until the calculated quality measure complies with the target quality measure (TQM).
11. Signal processor according to claim 8, wherein the modification parameter adjuster (MPA) is adapted to determine the adjusted modification parameter (AMP) based on a statistical evaluation of previously calculated quality measures.
12. Signal processor according to claim 8, wherein the modification parameter (MP) is selected from the group consisting of: timescaling ratio, pitch shifting ratio, parametric changes to spectrum envelope, formant modification based on formant tracking, amplitude modulation, frequency modulation, modifications of pitch variance.
13. Device comprising a signal processor according to claim 8.
14. Computer executable program code adapted to perform the method according to claim 1.
15. Computer readable storage medium comprising a computer executable program code according to claim 14.
Description:
Method and signal processor for modification of audio signals

The invention relates to the field of signal processing of audio signals. More specifically, the invention relates to a method and signal processor for modification of an audio signal according to a parameter, so as to change properties or attributes of the audio signal while still maintaining a high signal quality. The method and signal processor may be used for modification of speech signals, such as modification of the pitch and rate of speech, where a high speech intelligibility is required in the modified speech.

A large variety of modification techniques and algorithms exist that can modify different properties or attributes of an audio signal. Such modifications are especially used on speech signals, such as for changing the pitch or the rate of the speech signal. In some cases these techniques can be used to improve intelligibility of speech. There are also many potential voice design applications where the properties of speech signals are changed to create entertaining effects or to affect some emotional or aesthetic attributes of speech sounds (e.g., in movies, commercials, and even in mobile telephony).

One classical approach to modifying speech signals is the splicing method and its many variants. Here, the input signal is split into a sequence of short-time signals, which are processed separately and combined using overlapped addition (OLA) to synthesize a modified speech signal. In speech processing applications the windowing is commonly synchronized to the fundamental frequency in voiced speech segments. In the synchronous overlapped addition (SOLA) method introduced in "Autocorrelation method for high-quality time/pitch-scaling" by J. Laroche, Proc. IEEE Workshop Applications of Signal Processing to Audio and Acoustics (WASPAA'93), 1993, the splicing of the signal is done so that the length of a short-time signal is the same as an estimate of the pitch period, that is, the lag of the maximum of the autocorrelation function. In pitch-synchronous overlapped addition (PSOLA) algorithms, such as in "Non-parametric techniques for pitch-scale and time-scale modifications of speech" by E. Moulines and J. Laroche, Speech Comm., vol. 16, pp. 175-205, 1995, the splicing is done such that the maximum position of a window function (e.g., the Hann window) is placed at the maximum peak inside a pitch period. The center positions of such windows are called pitch marks. Another known related technique is pointer-interval controlled overlap addition (PICOLA).

The time scaling effect is obtained either by inserting new signal segments into, or deleting short-time signals from, the sequence of short-time signals. The locations of the pitch marks are changed accordingly. In pitch shifting, the temporal locations of the pitch marks are changed. For example, by making the regular pitch mark sequence more dense, or more sparse, the pitch is made higher, or lower, respectively. Naturally, one must also add or remove short-time signals in pitch shifting if it is desired to keep the original duration of the signal. Traditionally, a pitch modification algorithm is initialized such that the operator sets the target pitch modification ratio for the algorithm and specifies how large a temporal discrepancy from the original time scale is allowed. Then, pitch marks are modified to realize the target rate. When a limit for the time discrepancy is exceeded, a new short-time signal is inserted or deleted.

The insertion and deletion must be done with care because it is an important source of unwanted artifacts and deterioration of the intelligibility in modified speech signals. It can be argued that the sound quality and intelligibility are the most important measures in a speech modification system. In fact, in many applications these are even more important aspects than getting exactly the desired change in terms of the target pitch ratio or other modification parameter. Nevertheless, the traditional implementation of these effects is based on a fixed target rate for any input signal and it gives no means to control the quality of the modified signal.

A few examples of prior art patent documents are: US patent 6,763 332 B2 describing use of an OLA procedure for time-scaling, EP 1 236 332 A1 describing use of PSOLA to change properties of a speech signal, such as an age associated with the person having produced the speech, and EP 1 099 216 B1 describing use of SOLA for time-scaling.

In known techniques, it is common that a speech modification algorithm delivers quite different results with different input signals. An effect that may work well with one signal may sound distorted with another input signal.

It may be seen as an object of the present invention to provide a method and a signal processor capable of modifying an audio signal according to a modification parameter and still provide a modified audio signal with a high signal quality without severe audible artifacts.

According to a first aspect, the invention provides a method of modifying an audio signal according to a modification parameter, the method comprising the steps of

1) receiving a target quality measure,

2) adjusting the modification parameter in order to meet the target quality measure,

3) modifying the audio signal according to the adjusted modification parameter.

According to the method, the modification of the signal is controlled by a predefined target quality measure, such as an objective measure for signal quality, rather than being based purely on a target modification parameter, which may, for a specific audio signal, lead to artifacts in the produced modified audio signal. If the desired modification parameter does not provide a modified audio signal that complies with the target quality measure, then the modification parameter is adjusted until the produced modified audio signal has a quality that complies with the target quality.

A value of the target quality measure can be set in many different ways. In the simplest case the target quality measure may have a fixed value set by the manufacturer of the device in which the method is used. It may also be set by the user of the device. There are also several possible ways of controlling the value of the target quality measure adaptively to the input signal. For example, it may be found in some cases that the value of the quality measure for a particular input signal is never or rarely above the target quality measure. In that case, the target quality measure could be adjusted to some lower value to facilitate the modification effect at some desired minimum value of the modification parameter. Based on adaptive control of target quality measure, it is also possible to formulate methods based on common optimization or adjustment of the values of the target quality measure and modification parameter, possibly to optimize some other desired quantity such as a length of the processing buffer.

The method has the advantage that it adapts to the actual input audio signal. Thus, independent of the input audio signal and the modification parameter, it is possible to ensure a given target quality of the modified audio signal. The method implies that the final modification parameter may become different from the modification parameter initially set. This is the price for avoiding severe audible artifacts that may occur for certain combinations of audio signal and modification parameter. However, depending on the type of signal processing algorithm involved in the modification of the audio signal, only a small and insignificant adjustment of the initially set modification parameter may be necessary in order to provide an improved signal quality without severe audible artifacts. In case a low target quality is set, and/or in case of a match between the audio signal and the modification parameter set, it may be that the adjusted modification parameter is identical with the modification parameter set.

For example, the modification parameter can be a pitch shift of a speech signal, and the target quality measure can be a signal-to-noise ratio (SNR) that does not go below 15 dB. Typically, the quality can be measured where the insertion or deletion of short-time signals could take place. Unlike in a traditional system where the insertion or deletion is driven by the target rate, the target rate is here modified to accommodate the insertions and deletions that take place only where the final signal complies with the 15 dB SNR. As a consequence of the quality control, the actual rate modification depends on the properties of the input signal.

As a consequence of the quality being controlled according to the method, the actual rate modification depends on the properties of the input signal. With the mentioned pitch shift example, a traditional fixed setting will provide the effect, but it may provide a good sound quality with one speech signal and, due to artifacts, an unacceptable sound quality with another speech signal. Typical examples of the two cases are clean speech and speech with background noise. The proposed quality-controlled method delivers a target quality in both cases, but the average change in the pitch may be more modest in the latter case. The difference in the actual modified rate is typically not a problem in speech applications because humans are not very sensitive to the absolute pitch, or to small pitch changes, in fluent speech signals. Naturally, the proposed scenario for pitch modification is not suitable in music applications such as karaoke systems, where a singer's pitch is modified such that it exactly meets the target rate set by the music. However, it is possible to use the same principle in other effects used in music applications. For example, the adaptive enhancement of the singer's formant in a singing voice, or various types of guitar effects, could be controlled in the same way.

The method according to the invention provides the largest advantages within moderate speech signal modification effects where the most important goal is to deliver the effect but keep the intelligibility as high as, or even better than, that of the original signal. Due to the method being capable of preserving a high signal quality, i.e. a high speech intelligibility in case of speech signals, the method is suitable for use within telephony, such as mobile phones, teleconferencing systems etc. The method may also be used on speech signals for entertainment purposes such as to create funny voices in cartoons or commercials or as a gimmick in mobile phones.

The method is especially suited to preserve signal quality in speech signals, but it may be used for music or other sources. Thus, the audio signal may comprise a speech signal. The audio signal may be predominantly a speech signal, such as for example speech in a mobile phone with additional background noise. The audio signal may be a pure speech signal, such as a speech signal recorded in a studio environment.

Step 2) preferably comprises calculating a quality measure based on quantifying a magnitude of an artifact or error in the modified audio signal compared to the original audio signal. Step 2) may further comprise determining the adjusted modification parameter by iteratively adjusting the modification parameter and calculating a quality measure according thereto, until the calculated quality measure complies with the target quality measure. Other alternative strategies for adjusting the modification parameter may be used. The modification parameter adjustment strategy may comprise a predefined parameter putting a limit on a maximum rate and/or a maximum magnitude by which the adjusted modification parameter is allowed to deviate from the modification parameter initially set. This will help to inhibit rapid and significant variations or fluctuations of the actual modification parameter that may cause disturbing effects, such as a rapid pitch variation. With respect to the rate of adjustments allowed, a period of the audio signal taken into account in step 2) may be set, such as a number of frames taken into account.
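The iterative adjustment with a bounded deviation can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `adjust_parameter`, the callback `quality_of`, the notion of a `neutral` parameter value (no modification, e.g. a pitch ratio of 1.0), and the fixed `step` and `max_steps` limits are all hypothetical and are not specified in the text.

```python
from typing import Callable


def adjust_parameter(
    desired: float,
    neutral: float,
    quality_of: Callable[[float], float],
    target_quality: float,
    step: float = 0.05,
    max_steps: int = 10,
) -> float:
    """Return an adjusted parameter that meets the target quality if possible.

    Candidates start at the desired value and move towards the neutral
    (no-modification) value in fixed steps. At most `max_steps` steps are
    taken, which bounds the deviation from the desired value (assumption).
    """
    direction = 1.0 if neutral >= desired else -1.0
    candidate = desired
    for i in range(max_steps + 1):
        candidate = desired + direction * step * i
        # Never move past the neutral value.
        if (candidate - neutral) * direction > 0:
            candidate = neutral
        # Stop as soon as the calculated quality complies with the target.
        if quality_of(candidate) >= target_quality:
            break
    return candidate


if __name__ == "__main__":
    # Hypothetical quality model: quality in dB drops as the ratio moves away from 1.0.
    def toy_quality(ratio: float) -> float:
        return 30.0 - 40.0 * abs(ratio - 1.0)

    print(adjust_parameter(1.5, 1.0, toy_quality, target_quality=15.0))
```

If no candidate within the allowed deviation meets the target, the sketch returns the last allowed candidate; a real system might instead skip the modification for that segment.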

The adjusted modification parameter may be determined based on a statistical evaluation of previously calculated quality measures. For example, historical quality measure data for a specific audio signal may be stored in a database and later used to improve the adjustment of the modification parameter. The statistics of the quality measure can also be monitored during the use of the modification device so that the modification parameter is adjusted continuously or updated at predefined time intervals. For example, the modification parameter could slowly adapt over the duration of a conversation in a telephony application.
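One way to picture such a statistical evaluation is a running mean and variance of the quality measure that slowly steers the parameter. The sketch below is an assumption-laden illustration: the class `RunningStats`, the function `adapt_parameter`, and the rule "mean minus one standard deviation must stay above the target" are hypothetical choices, not taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and sample variance (Welford's algorithm)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def adapt_parameter(
    current: float,
    desired: float,
    neutral: float,
    stats: RunningStats,
    target_quality: float,
    step: float = 0.01,
    margin: float = 1.0,
) -> float:
    """Nudge the parameter by at most `step` per call.

    Hypothetical rule: move towards the desired value while the observed
    quality is comfortably above the target, otherwise back off towards
    the neutral (no-modification) value.
    """
    lower_bound = stats.mean - margin * math.sqrt(stats.variance)
    goal = desired if lower_bound >= target_quality else neutral
    if abs(goal - current) <= step:
        return goal
    return current + step if goal > current else current - step
```

Calling `adapt_parameter` once per frame block or per fixed time interval gives the slow adaptation over a conversation that the paragraph describes; the statistics object could equally be loaded from stored history.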

The modification parameter may be selected from the group consisting of: time-scaling (or time-scaling ratio), pitch shifting (or pitch-shifting ratio), parametric changes to the spectrum envelope, formant modification based on formant tracking, amplitude modulation, frequency modulation, modifications of the pitch variance. The modification effects can also be adapted separately for voiced and unvoiced parts of speech signals.

The audio signal may be modified using an overlap addition synthesis technique, for example SOLA or PSOLA, as known in the art of speech modification.

In a second aspect, the invention provides a signal processor for modifying an audio signal according to a modification parameter, the signal processor comprising an input adapted to receive a target quality measure, a modification parameter adjuster adapted to adjust the modification parameter in order to meet the target quality measure, an audio signal modifier adapted to modify the audio signal according to the adjusted modification parameter.

In a preferred embodiment, the modification parameter adjuster is adapted to calculate a quality measure based on quantifying a magnitude of an artifact in the modified audio signal compared to the audio signal. The modification parameter adjuster may be adapted to determine the adjusted modification parameter by iteratively adjusting the modifying parameter and calculating a quality measure according thereto, until the calculated quality measure complies with the target quality measure.

The modification parameter adjuster may be adapted to determine the adjusted modification parameter based on a statistical evaluation of previously calculated quality measures.

The modification parameter may be selected from the group consisting of: time-scaling ratio, pitch shifting ratio, parametric changes to spectrum envelope, formant modification based on formant tracking, amplitude modulation, frequency modulation, modifications of pitch variance.

Preferably, the signal processor is capable of processing the audio signal in real-time so as to enable applications within on-line systems such as effect processors and dedicated speech modification systems. The signal processor may be an application specific integrated circuit (ASIC), programmable logic such as a field-programmable gate array (FPGA), a dedicated signal processor, or a general purpose processor such as the processor of a Personal Computer (PC). It is also possible to split the execution of Steps 2) and 3) into separate processors or cores. However, for non-real time purposes, such as off-line processing of a large audio sequence, e.g. a whole DVD or CD, a slow signal processor may be used. The same properties and advantages as set forth for the first aspect apply for the second aspect as well.

In a third aspect, the invention provides a device comprising a signal processor according to the second aspect. The device may be: a dedicated speech processing device, an audio effect processor, a telephone, a mobile phone, a teleconference system, speech intelligibility enhancer, speech aid equipment for speech impaired, or a speech synthesizer.

In a fourth aspect, the invention provides a computer executable program code adapted to perform the method according to the first aspect.

In a fifth aspect, the invention provides a computer readable storage medium comprising a computer executable program code according to the fourth aspect. The storage medium may be a hard disk, a floppy disk, a CD, a DVD, an SD card, a memory stick, a memory chip etc.

In the following, the invention is described in more detail with reference to the accompanying figures, of which

Fig. 1 shows a block diagram illustrating a signal processor according to the invention, and

Fig. 2 shows a block diagram illustrating a preferred speech modification device.

While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Fig. 1 shows the principle of a signal processor SIGP adapted to modify an audio signal. Below is a Table showing abbreviations used in Fig. 1.

The signal processor SIGP, which may be included in an audio signal modification device DVC, receives as input an audio signal AS and outputs a modified audio signal MAS in response. The signal processor SIGP comprises a modification parameter adjuster MPA and an audio signal modifier ASM. The modification parameter adjuster MPA receives the input audio signal AS and adjusts a modification parameter MP initially set, so as to provide an adjusted modification parameter AMP that will ensure that the modified audio signal MAS complies with a target quality measure TQM. The target quality measure TQM may be adjustable and predefined by a user, or it may be a fixed value.

The modification parameter adjuster MPA adjusts the modification parameter MP according to a predefined strategy, such as an iterative method in which a measure of quality of the modified audio signal MAS is calculated and the modification parameter is adjusted in a repeated procedure until the quality measure satisfies the predefined target quality measure TQM. When the target quality measure TQM is reached, the adjusted modification parameter AMP is provided to the audio signal modifier ASM, which modifies the audio signal in accordance with the adjusted modification parameter AMP.

Fig. 2 shows a preferred embodiment of the invention: a speech modification signal processor for change of pitch of the speech signal, such as for use in a speech modification device or speech modification system. Below is a Table showing abbreviations used in Fig. 2.

The modification system is generally based on the principles of SOLA or PSOLA algorithms. The input speech signal is split into temporally overlapping short-time signals and the process is controlled by a local estimate of the fundamental frequency (roughly corresponds to the perceived pitch). Typically each short-time signal is associated with a corresponding pitch mark, a temporal location in the original signal, or it is based on some other segmentation of the speech signal synchronously with the fundamental frequency.

The input speech signal SP is analyzed, either in frames or alternatively as an entire audio file, in the signal analysis and decomposition block SADEC. There it is decomposed into a sequence of windowed short-time signals STS, each associated with a center location, or pitch mark, in PM. The decomposition in the case of a voiced signal is usually based on an estimate of the period corresponding to the fundamental frequency of the signal. In unvoiced signals it is common to use a fixed-length windowing for the short-time signals. In the synthesis phase OLAS, overlapped addition is used to combine the modified short-time signals MSTS with their associated modified pitch marks MPM into a modified speech signal MSP.

Typically, windowing in SADEC is such that the output modified speech signal MSP is a perfect reconstruction of the input signal SP if short-time signals or the positions of the pitch marks are not changed. This is obtained, e.g. using a Hann window or a triangular window with a 50% overlap. There are several other alternative windowing techniques to obtain the perfect reconstruction.
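The perfect-reconstruction property can be checked numerically. The sketch below assumes a periodic Hann window (denominator N rather than N - 1) and a fixed hop of half the window length; the helper names `periodic_hann`, `split_frames` and `overlap_add` are illustrative only, and a pitch-synchronous system would use variable window lengths instead.

```python
import math
from typing import List


def periodic_hann(length: int) -> List[float]:
    """Periodic Hann window; shifted copies at 50% overlap sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def split_frames(signal: List[float], window: List[float], hop: int) -> List[List[float]]:
    """Cut the signal into windowed short-time signals spaced `hop` samples apart."""
    length = len(window)
    return [
        [signal[start + n] * window[n] for n in range(length)]
        for start in range(0, len(signal) - length + 1, hop)
    ]


def overlap_add(frames: List[List[float]], hop: int, total: int) -> List[float]:
    """Sum the short-time signals back at their original positions."""
    out = [0.0] * total
    for index, frame in enumerate(frames):
        start = index * hop
        for n, value in enumerate(frame):
            if start + n < total:
                out[start + n] += value
    return out


if __name__ == "__main__":
    signal = [math.sin(0.1 * n) for n in range(64)]
    window = periodic_hann(16)
    rebuilt = overlap_add(split_frames(signal, window, hop=8), hop=8, total=len(signal))
    # Samples covered by two overlapping windows are reconstructed exactly;
    # the first and last half-window are only covered once.
    print(max(abs(rebuilt[n] - signal[n]) for n in range(8, 56)))
```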

When modifying a signal with the proposed algorithm in block MOD, there are basically three components that can be changed. First, the positions of the original pitch marks PM may be changed. Secondly, various properties of the short-time signals STS may be changed. Third, the number of short-time signals STS, and pitch marks PM, may be changed by inserting new short-time signals into the original sequence, or by removing existing short-time signals and pitch marks. In most cases a new short-time signal inserted into the sequence is a copy of an existing short-time signal, or a new signal formed by combining two or more short-time signals.

Pitch shifting and time scale modification effects are based on the modification of the positions of pitch marks and the insertion and/or deletion of short-time signals from the sequence. Modification of the properties of the short-time signals changes, for example, the timbre, formant regions, and/or the type of the glottal excitation. In the proposed method the quality control is used to control all three components, or only some of them.

However, to illustrate the idea of a quality-controlled speech modification algorithm, a simple example of lowering the pitch of a speech signal is described in the following. Here, the time difference between adjacent pitch marks, $d_k = p_k - p_{k-1}$, is modified by multiplying it with a pitch rate modification ratio β > 1. In addition, short-time signals are upsampled by the ratio β. The sequence of modified pitch marks is formed accumulatively from the sequence of pitch marks such that a new pitch mark position is given by $P_l = P_{l-1} + \beta d_l$. With the change of the pitch marks and the upsampling of the short-time signals only, the synthesis in block OLAS would produce exactly the same signal that would be obtained by upsampling the entire original signal with the same ratio. To preserve the original length of the signal, short-time signals are deleted from the sequence. In a sequence of N short-time signals, changing the time differences between the pitch marks by the ratio β produces a time-discrepancy given by:

$$T = \sum_{l=1}^{N} (\beta d_l - d_l) = (\beta - 1) \sum_{l=1}^{N} d_l$$

The discrepancy can be compensated by deleting a short-time signal each time T has grown (at least approximately) as large as the expectation of $d_k$. There, the short-time signal corresponding to the position of the pitch mark is deleted from the sequence and the accumulative pitch mark is not updated. Considering that the upsampling effect is desired, the only source of artifacts in this example is then the place where the short-time signal was deleted. The effect of the deletion of a short-time signal $s_l(n)$ from the sequence can be measured, for example, using the following formula

$$Q_{del} = 20 \log_{10} \frac{E[s_l(n)\, s_{l+1}(n)]}{E[s_l(n) - s_{l+1}(n)]^2}$$

which simply characterizes the similarity of two adjacent short-time signals. This measure can be replaced by any similar formulation comparing the magnitude of the error in deleting one short-time signal, and replacing it with the next short-time signal $s_{l+1}(n)$ in the overlap-add synthesis OLAS. The above formula is an example of a calculation of a quality measure CQM that is performed in connection with the pitch marks and/or short-time signals modification procedure MOD.
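A small numerical sketch of this similarity measure follows. It rests on two assumptions that the text does not settle: the expectation E[.] is replaced by a sample mean over the overlapping samples, and the denominator is read as the mean squared difference between the two short-time signals. The function name `deletion_quality` is hypothetical.

```python
import math
from typing import Sequence


def deletion_quality(s_cur: Sequence[float], s_next: Sequence[float]) -> float:
    """Similarity in dB of two adjacent short-time signals.

    Assumed reading: 20*log10( mean(s_cur*s_next) / mean((s_cur - s_next)^2) ).
    Higher values mean that deleting s_cur should be less audible.
    """
    n = min(len(s_cur), len(s_next))
    if n == 0:
        raise ValueError("short-time signals must not be empty")
    corr = sum(a * b for a, b in zip(s_cur, s_next)) / n
    err = sum((a - b) ** 2 for a, b in zip(s_cur, s_next)) / n
    if corr <= 0.0:
        return -math.inf  # uncorrelated or anti-correlated: worst case
    if err == 0.0:
        return math.inf  # identical signals: deletion leaves no error
    return 20.0 * math.log10(corr / err)


if __name__ == "__main__":
    print(deletion_quality([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

The value can then be compared against a threshold in dB at each candidate deletion point.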

In the quality-controlled speech modification principle according to the invention, the goal is to keep the values of the quality measure in a given region. For example, it may be desired that $Q_{del} \geq \bar{Q}_{del}$, where $\bar{Q}_{del}$ in this case is the lowest acceptable value for the quality measure at the point of deleting a short-time signal. The threshold value is typically set by the user via the input of target quality measure ITQM. The control logic unit stores the statistics of values of $Q_{del}$. This can be used directly to modify the rate β and the length of the processing buffer such that the signal quality can be better maintained in deleting short-time signals from the sequence. In practice, $Q_{del}$ is a random variable and therefore it cannot be guaranteed that there is always a pair of short-time signals that will give a value at or above the threshold. However, the expectation and variance of $Q_{del}$ may be estimated from the signal history and be used to determine a suitable value for the modification parameter, which in this case is the rate β.
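The interplay of the ratio β, the time discrepancy T and the quality threshold can be illustrated with the following sketch. It assumes that the quality of deleting each short-time signal has already been computed and is passed in as `q_del` (index 0 is unused), and that a deletion is postponed to the next position whenever the threshold is not met. The function name and return values are hypothetical.

```python
from typing import List, Sequence, Tuple


def lower_pitch_marks(
    pitch_marks: Sequence[float],
    beta: float,
    q_del: Sequence[float],
    q_target: float,
) -> Tuple[List[float], List[int]]:
    """Stretch pitch-mark spacing by beta > 1 with quality-gated deletions.

    Returns the modified pitch marks and the indices of deleted
    short-time signals.
    """
    diffs = [pitch_marks[k] - pitch_marks[k - 1] for k in range(1, len(pitch_marks))]
    mean_d = sum(diffs) / len(diffs)  # estimate of the expectation of d
    new_marks = [float(pitch_marks[0])]
    deleted: List[int] = []
    for k in range(1, len(pitch_marks)):
        # Time discrepancy T between modified and original time scales so far.
        discrepancy = (new_marks[-1] - new_marks[0]) - (pitch_marks[k - 1] - pitch_marks[0])
        if discrepancy >= mean_d and q_del[k] >= q_target:
            # Delete this short-time signal; the accumulated mark is not updated.
            deleted.append(k)
            continue
        new_marks.append(new_marks[-1] + beta * diffs[k - 1])
    return new_marks, deleted


if __name__ == "__main__":
    marks = [100.0 * k for k in range(11)]
    quality = [20.0] * 11
    print(lower_pitch_marks(marks, 1.25, quality, q_target=15.0))
```

When the quality at a due deletion point is too low, the discrepancy keeps growing until an acceptable point is found, which is why the effective rate ends up depending on the input signal.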

The control logic CL is used to adjust the rate of the pitch change β. There are many different strategies for how this can be done:

1) The rate control can be continuously adaptive and change, e.g. on a time scale of seconds. This will continuously produce small fluctuations in the pitch. It is likely that small fluctuations in the pitch of a modified speech signal are less annoying than artifacts.

2) The rate control may aim at converging to a constant rate. In this case the rate change parameter is initialized to some safe value and later adjusted towards an optimum value.

3) The statistics of $Q_{del}$ are computed off-line or over several times of usage of the system. The data are stored in a database DB and retrieved when the effect is activated for that source signal. For example, this could be the case in processing incoming speech signals in a mobile phone application where the information about the caller's $Q_{del}$ statistics can be stored and reused.

The previous example illustrated the main principles of quality-controlled speech modification. The modification parameter, which was the pitch change ratio β > 1, was adjusted by the statistics of a quality measure $Q_{del}$. Extension of the example to the case of making the pitch higher with β < 1 is straightforward. The deletion operation is replaced by insertion of a new signal segment into the sequence of short-time signals. The same target quality criteria can be used in this application to choose pairs of short-time signals where a duplicate of an existing short-time signal can be inserted into the sequence or a new short-time signal can be formed by combining existing short-time signals.

In a more general case, any parameter p controlling the generic speech modification scenario of Fig. 1 can be associated with a quality measure Q p. The value can be estimated from the short-time signals directly, or by means of analysis-by-synthesis from the input speech signal and the output of the speech modification system. The quality measure is an objective measure for the influence of one or several parameters on some observable property of the output signal.

The following list exemplifies some additional implementations for quality-controlled speech modification using the system of Fig. 1.

1) When pitch shifting is performed without up/down sampling of the short-time signals, the benefit of preserving the original spectrum envelope in the pitch modification is obtained, which is a very useful property. However, the overlap-add procedure will now produce some artifacts due to the windowing. The degradation of the signal quality can be computed in much the same way as in the definition of Q del above, and the pitch modification parameter β can be controlled similarly.

2) Pitch can also be varied dynamically to change the intonation of the speech signal. In this case an estimate of the statistics of Q del can be used to determine suitable margins for the variation of β.

3) The speech signals may be modified such that the pitch marks are not modified but some properties of the short-time signals are changed. One particularly interesting application is where the short-time signals are up-sampled or down-sampled by some ratio γ. This will modify the spectrum envelope systematically. The effect will potentially produce audible artifacts in overlapped addition of each short-time signal in block OLAS. However, one can formulate a quality measure Q 1 associated with the modification parameter γ, set a target quality Q 1 , and use the signal statistics to adjust the value of γ.

4) The short-time signals can be further decomposed into different components, which can be processed separately. For example, one can use linear predictive coding or one of its many variants, such as ARMA, to estimate a parametric spectrum model and a short-time residual signal. Alternatively, a phase vocoder technique or sinusoidal modeling of the short-time signals can be applied to obtain different types of nonparametric and parametric representations of short-time signals. Signal properties can be modified in those representations in many ways. For example, we may estimate locations of formants in speech signals and, e.g. amplify the second formant by a certain gain A or increase the separation of the first and second formants by some other parameter. As in the other examples, one can usually develop a sound quality measure associated with the effect of a particular modification parameter p on the signal quality obtained in the overlapped addition. Moreover, that measure can be used to control the value of p such that a predefined quality criterion is not violated.

5) The quality-controlled modification principle can be applied independently to different parts of a speech signal. For example, we may process voiced and unvoiced parts of the speech signals separately, such that one target quality Q voiced is defined for the voiced parts of the signal and another target quality Q unvoiced for the unvoiced parts of a speech signal. The quality criteria can also be different for the two different types of speech signals.

6) The modification method can be a combination of many different techniques. In this case the control of a set of modification parameters [α 1, α 2, α 3, ...] by one or more quality measures is typically more complicated.

The invention is suitable for real-time modification of properties of speech signals. One possible application could be the processing of incoming speech signals in mobile phones. Here, the modification of the signal could be used to produce entertaining effects for a user, such as assigning specific funny attributes to the voice of a specific caller.

Traditionally, sound quality has been a major problem for the use of speech modification techniques in real communication applications. The proposed method is designed such that it will significantly help to control the signal quality in speech modification algorithms, thus making them much more viable also in applications where the intelligibility of speech is very important. In fact, similar technology for voice modification can also be used to improve the intelligibility of speech in certain cases.

Reference signs in the claims merely serve to increase readability. These reference signs should not in any way be construed as limiting the scope of the claims.