Title:
METHOD AND APPARATUS FOR AUTOMATIC STRUCTURE ANALYSIS OF MUSIC
Document Type and Number:
WIPO Patent Application WO/2007/036846
Kind Code:
A2
Abstract:
The location of accented beats of a music track is determined. A signal is extracted from the music track. The signal is demultiplexed across a number of channels such that each consecutive channel contains a consecutive portion of the signal, each portion of the signal corresponding to a consecutive beat period, the number of channels corresponding to the meter of the music track. The content of each channel is analysed to determine which channel has different signal properties. The channel having the different signal properties is the channel containing the accented beats.

Inventors:
LEMMA AWEKE N (NL)
ZIJDERVELD FRANCESCO F M (NL)
Application Number:
PCT/IB2006/053398
Publication Date:
April 05, 2007
Filing Date:
September 20, 2006
Assignee:
KONINKL PHILIPS ELECTRONICS NV (NL)
LEMMA AWEKE N (NL)
ZIJDERVELD FRANCESCO F M (NL)
International Classes:
G11B27/038; G11B27/10; G11B27/11; G11B27/30; G10L11/00
Domestic Patent References:
WO2003093950A2 2003-11-13
WO2007072394A2 2007-06-28
Foreign References:
US5734731A 1998-03-31
EP1162621A1 2001-12-12
Other References:
AWEKE N LEMMA: "AutoDJ: the art of electronic music mixing" PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 6015, October 2005 (2005-10), XP002426092 ISSN: 0277-786X
Attorney, Agent or Firm:
GROENENDAAL, Antonius, W., M. et al. (AA Eindhoven, NL)
Claims:
CLAIMS:

1. A method for determining a location of accented beats of a music track, the method comprising the steps of:

(a) extracting a signal from said music track;

(b) demultiplexing said signal across a number of channels, the number of channels corresponding to a candidate meter of said music track such that each consecutive channel contains a consecutive portion of said signal, each portion of said signal corresponding to a consecutive beat period; and

(c) determining the location of the accented beats as a specific one of the channels having signal properties different to those of the other channels.

2. A method according to claim 1, further comprising the step of determining the beat onsets of said music track to ensure that each consecutive channel contains a corresponding portion of said signal aligned with the corresponding consecutive beat onsets.

3. A method according to claim 1 or 2, wherein the temporal duration of each portion of said signal corresponds to a beat period.

4. A method according to any one of claims 1 to 3, wherein the method further comprises the step of estimating the candidate meter of said music track.

5. A method according to any one of the preceding claims, wherein said signal properties include correlation coefficients, difference in the signal profile of said music track, number of zero crossings of the signal profile of said music track.

6. A method according to any one of the preceding claims, wherein step (b) is repeated for a different candidate meter.

7. A method according to claim 6, wherein the method further comprises the step of: determining the meter of said music track as the candidate meter for which a channel having different signal properties to that of the other channels can be distinguished.

8. A method for mixing two music tracks to provide a rhythmically coherent transition by aligning accented beats of each music track, the accented beats of each music track being determined by the method according to any one of claims 1 to 7.

9. A method according to any one of the preceding claims, wherein said signal is the weighted sum of the FFT of the said music track.

10. A method according to any one of claims 1 to 8, wherein said signal is the running energy of the said music track.

11. Apparatus for determining a location of accented beats of a music track, the apparatus comprising: an extractor for extracting a signal from said music track; a demultiplexer for demultiplexing said signal across a number of channels such that each consecutive channel contains a consecutive portion of said signal, each portion of said signal corresponding to a consecutive beat period, the number of channels corresponding to a candidate meter of said music track; a comparator for determining the channel having different signal properties to that of the other channels; a selector for selecting a specific one of the channels having different signal properties to that of the other channels and determining this channel as the location of the accented beats.

12. Apparatus according to claim 11, wherein, among a plurality of candidate meters, a candidate meter for which a channel having different properties to those of the other channels can be selected by the selector is determined as the meter of said music track.

13. Apparatus according to claim 11 or 12, wherein the demultiplexer comprises at least two demultiplexer stages, each demultiplexer stage demultiplexing said music track across a different number of channels.

14. Apparatus according to claim 13, wherein said at least two demultiplexer stages are connected in parallel and said comparator comprises at least two comparator stages connected in series to respective demultiplexer stages.

15. Apparatus according to claim 13 or 14, wherein the apparatus further comprises a filtering stage between the demultiplexing stages and the comparison stages for filtering transients.

16. Apparatus according to any one of claims 11 to 15, further comprising a beat onset estimator for estimating the tempo and beat onset for controlling the duration of each channel such that the duration of each channel corresponds to a beat period.

17. A computer program product comprising a plurality of program code portions for carrying out the method according to any one of claims 1 to 10.

Description:

Method and apparatus for automatic structure analysis of music

FIELD OF THE INVENTION

The present invention relates to a method and apparatus for automatic structure analysis of a piece of music or music track. In particular, it relates to analysis of structure such as the location of accented beats of the music track and the meter (or bar) of the music track.

BACKGROUND OF THE INVENTION

In an automatic DJ (AutoDJ) implementation, it is desirable to mix songs (or music tracks) having the same structure. This is because mixing, for example, a song having a 4/4 meter structure with that having a 3/4 meter structure will result in rhythmic conflict.

Furthermore, to ensure a smooth transition between songs, it is also desirable to match rhythmically coherent beats (or accented beats within the meter). The accent is the musical stress applied to a note and contributes to the sensation of the beat. Therefore, when the accented beats are not matched, the transition could be annoying. There are many known AutoDJ systems which select and sort music tracks based on some similarity criteria and play them in a smooth, rhythmically consistent way.

The basic schematic of a known AutoDJ system is shown in Figure 1.

Music tracks stored in database 101 are analysed to extract representative parameters 103. These include, among other things, the end of the intro section, the beginning of the outro section, phrase or bar boundaries, tempo, beat locations, and harmonic signature. These parameters are usually computed offline and stored in a linked database 105.

A playlist 107 is generated from the definitive store 101 of music tracks. The playlist is compiled on the basis of the parameters in the database 105 and a set of user preferences. Given such a playlist, a transition planner 109 compares the extracted parameters corresponding to the music tracks in the playlist and generates a set of mix parameters to be used by the player 111.

Finally, the player 111 streams the music tracks in the playlist to the output device 113, for example a loudspeaker, using the mix parameters that are an optimal compromise between user preferences and computed features to provide a smooth transition.

There is, therefore, an increasing demand in AutoDJ systems for accurately analysing the structure of a piece of music, and thus improving the mix and transition between consecutive songs or music tracks. There are many known methods for beat detection. For example, in "Real-time Beat Estimation using Feature Extraction", Lecture Notes in Computer Science, Springer-Verlag Heidelberg, Volume 2771/2004, pp 13-22, a beat detection algorithm based on statistics of the spectral evolution is discussed. The method provides estimates of beat onsets and instantaneous tempo. However, it does not recognize or locate accented beats.

US6542869 discloses yet another method of extracting song structure. In this method, the evolution of the cepstral frequency coefficients is used to determine events in songs. However, it does not provide a means of sorting these detected events. That is, the method detects structural events that can represent any of intro, outro, beat onset, phrase boundary, break points, etc., but it does not differentiate between these events, making it practically inapplicable to an AutoDJ implementation.

SUMMARY OF THE INVENTION

Therefore, it is desirable to provide a system for accurately analysing the structure of music, for example, a music track or song or the like. In particular, it is desirable to provide a system for accurately estimating the accented beats and/or the meter structure of music tracks to assist in providing smooth and natural transitions.

According to an aspect of the present invention, this is achieved by a method for determining a location of accented beats of a music track, the method comprising the steps of: extracting a signal from the music track; demultiplexing the signal across a number of channels, the number of channels corresponding to a candidate meter of the music track such that each consecutive channel contains a consecutive portion of the signal, each portion of the signal corresponding to a consecutive beat period; and determining the location of the accented beats as a specific one of the channels having signal properties different to those of the other channels.

According to another aspect of the present invention, this is achieved by apparatus for determining a location of accented beats of a music track, the apparatus comprising: an extractor for extracting a signal from the music track; a demultiplexer for demultiplexing the signal across a number of channels such that each consecutive channel contains a consecutive portion of the signal, each portion of the signal corresponding to a consecutive beat portion, the number of channels corresponding to a candidate meter of the music track; a comparator for determining the channel having different signal properties to that of the other channels; and a selector for selecting a specific one of the channels having different signal properties to that of the other channels and determining this channel as the location of the accented beats.

According to the method and apparatus of the present invention, the location of the accented beats can be determined with some degree of accuracy. The method may further comprise the step of determining the beat onsets of the music track to ensure that each consecutive channel contains a corresponding portion of the signal aligned with corresponding consecutive beat onsets, and the temporal duration of each portion of the signal may correspond to the beat period of the music track. The method may also comprise the step of estimating the candidate meter of the music track to increase the accuracy of locating the accented beats.

The difference in the signal properties may be derived from the corresponding correlation coefficients, the difference in the signal profile of the music track, the number of zero crossings of the signal profile of the music track, etc., which enables a simple and effective comparison to be carried out to locate the channel containing the accented beat. In a preferred embodiment, the demultiplexing step is repeated for different candidate meters. The results can then be compared and the more accurate result used to locate the accented beats.

Preferably, the method further comprises the step of: determining the meter structure of the music track as the candidate meter for which a channel having different signal properties to that of the other channels can be distinguished.

The demultiplexer may comprise at least two demultiplexer stages connected in parallel, and the correlator may comprise at least two correlating stages connected in series to respective demultiplexer stages. In this way the results of a number of hypotheses can be acquired simultaneously, thus reducing the processing time. To improve the accuracy of the results, transients can be filtered.

A beat onset estimator for estimating the tempo and beat onset may be incorporated in the apparatus for controlling the width of each channel such that the width of each channel corresponds to a beat period.

The present invention may also provide a method for determining the meter of a music track, the method comprising the steps of: (a) extracting a signal from the music track; (b) demultiplexing the music track across a plurality of channels such that each consecutive channel contains a consecutive portion of the signal, each portion of the signal corresponding to a consecutive beat period; (c) computing the cross correlation of the music track across the plurality of channels; (d) carrying out steps (b) and (c) for a plurality of different hypotheses in which at least the number of channels is variable; (e) determining the meter of the music track as the number of channels of the hypothesis having the most consistent result. The meter may be determined as the argument corresponding to the minimum of the plurality of hypotheses.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings, in which:

Figure 1 is a schematic diagram of the functions of a known automatic DJ system;

Figure 2 is a schematic of the apparatus according to a first embodiment of the present invention;

Figure 3 is a schematic of apparatus according to a second embodiment of the present invention;

Figures 4a and 4b illustrate graphical representations of an example of an input and output, respectively, of the demultiplexer of the apparatus of the embodiments of the present invention;

Figure 5 is a schematic of apparatus according to a third embodiment of the present invention; and

Figure 6 is a schematic of apparatus according to a fourth embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

A first embodiment of the present invention will now be described with reference to Figure 2. The apparatus comprises an input terminal 201 connected to the input of a framing means 203. The output of the framing means 203 is connected to the input of a Fast Fourier Transform (FFT) processor 205. The output of the FFT processor 205 is connected to a weighted averaging means 207. The output of the weighted averaging means 207 is connected to the input of a buffer 209. The output of the buffer 209 is connected to the input of a demultiplexer 211. The control of the demultiplexer 211 is connected to a genre or meter input terminal 213 of the apparatus 200. The plurality of outputs of the demultiplexer 211 are connected to the respective inputs of a signal comparator 215. The plurality of outputs of the signal comparator 215 are connected to respective inputs of a selector 217. The output of the selector 217 is connected to the output terminal 219 of the apparatus 200.

The apparatus 200 identifies the accented beats of an input music track x[n]. The input music track x[n] is processed to generate a song feature signal E[k]. First, the input music track (or song) is divided into a plurality of frames by the framing means 203. The Fast Fourier Transform of each frame is derived by the processor 205 and weighted by the weighted averaging means 207. The song feature signal output from the weighted averaging means 207 is temporarily stored in the buffer 209 for delivery to the demultiplexer 211. More particularly, if X_k[f] represents the FFT of the signal x[n] corresponding to the k-th frame, then the output E[k] in the buffer 209 is mathematically given by

where C[f] is the weighting factor for the frequency component f and N is the FFT frame size in number of samples.
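As an illustration of the feature extraction just described, the following Python sketch frames a signal, transforms each frame and forms one weighted value per frame. The exact expression for E[k] is not reproduced in this text, so the form used here (a weighted sum of spectral magnitudes divided by N) is an assumption. The function names `frame_signal`, `dft` and `feature_signal`, the use of non-overlapping frames, and the naive DFT standing in for an FFT are likewise illustrative choices rather than details taken from the application.

```python
import cmath


def frame_signal(x, frame_size):
    """Split x into consecutive non-overlapping frames, dropping a partial tail."""
    return [x[i:i + frame_size]
            for i in range(0, len(x) - frame_size + 1, frame_size)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform; an FFT yields the same values faster."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def feature_signal(x, frame_size, weights):
    """Return one value E[k] per frame.

    Assumed form: E[k] = (1/N) * sum_f C[f] * |X_k[f]|, with C[f] = weights[f]
    and N = frame_size. The source does not confirm this exact expression.
    """
    if len(weights) != frame_size:
        raise ValueError("one weight per frequency bin is required")
    feature = []
    for frame in frame_signal(x, frame_size):
        spectrum = dft(frame)
        weighted = sum(c * abs(v) for c, v in zip(weights, spectrum))
        feature.append(weighted / frame_size)
    return feature
```

With this sketch, the weights select which frequency bands contribute to the feature; setting all weights to 1 treats every bin equally.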

The song meter information or a priori acquired genre data (for instance, dance songs usually have a 4/4 meter structure) is input on the terminal 213. Given the meter M, the song feature signal E[k] is then de-multiplexed into M channels in demultiplexer 211 such that each consecutive channel contains a corresponding consecutive beat. Subsequently the M signals are compared against each other in the comparator 215 to determine which channel has differentiating properties. Since accented beats have differentiating properties, the channel identified as containing such a signal is the one containing the accented beats. This channel is then selected and marked as an accented beat by the selector 217 and output on the output terminal 219 of the apparatus 200.

A second embodiment will now be described with reference to Figure 3. Those elements common to Figure 2 have the same reference numerals as those of Figure 2 and will not be described here in detail.

The output of the buffer 209 is connected to the input of a tempo and beat onset estimator 301 and the input of a demultiplexer 303. The output of the tempo and beat onset estimator 301 is also connected to the input of the demultiplexer 303. The plurality of outputs of the demultiplexer 303 are connected to respective inputs of a correlator 307. The control of the demultiplexer 303 is connected to a meter input terminal 305. The plurality of outputs of the correlator 307 are connected to a selector 309. The output of the selector 309 is connected to an output terminal 219 of the apparatus 300.

As described above with reference to Figure 2, the song feature signal E[k] is extracted and temporarily stored in the buffer 209. Given the feature signal E[k], first the tempo and the beat onset positions are estimated by the estimator 301. Subsequently, the demultiplexer 303 performs block-wise de-multiplexing of the feature signal over M channels, where the block size is equal to the beat period and M is the meter of the song. An example of a typical input signal E[k] and output of the demultiplexer 303 are shown in Figures 4a and 4b, respectively.
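The block-wise de-multiplexing can be sketched in Python as follows. This is a minimal illustration, assuming the beat onsets are already available as sorted indices into the feature signal and that the final onset only closes the last block; the name `demultiplex_by_onsets` and the representation of each channel as a list of beat blocks are assumptions made for the example.

```python
def demultiplex_by_onsets(feature, onsets, meter):
    """Deal consecutive beat blocks of `feature` round-robin into `meter` channels.

    `onsets` holds sorted indices where beats start; each block runs from one
    onset to the next, so the last onset only marks the end of the final block.
    """
    channels = [[] for _ in range(meter)]
    for beat, (start, end) in enumerate(zip(onsets, onsets[1:])):
        # Beat number `beat` goes to channel beat mod M, so every M-th beat
        # lands in the same channel.
        channels[beat % meter].append(feature[start:end])
    return channels
```

If the chosen M matches the meter of the song, beats that occupy the same position in every bar end up in the same channel, which is what allows the accented beats to be isolated in a single channel.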

Operation of the apparatus of the embodiments above will now be described with reference to Figures 4a and 4b. First the beat onset positions are determined using known beat detection algorithms such that the feature signal E[k] is segmented into blocks of beat periods as shown in Figure 4a. Subsequently, in the demultiplexer 303, the segmented signal is block-wise de-multiplexed into M channels, where M is the meter of the song being analyzed. For typical dance music, M=4. With this procedure, wave shapes corresponding to accented beats are gathered into one channel, channel 4 as shown in Figure 4b. For each channel m = 1, ..., M, the cross correlation function σ_m defined as

is computed in the correlator 307. The channel with the differentiating correlation value is then chosen as the accented beat by the selector 309. As is clear from Figures 4a and 4b, the fourth channel has a slightly different property than all the other channels. Thus, it corresponds to the accented beat. The accented beat is thus located by the selector 309. Once the accented beats have been located, two songs can be mixed by aligning their accented beats to provide a rhythmically coherent transition.

In another preferred embodiment, the correlation function σ_m is defined as

$$\sigma_m = \sum_{k=0} \left( E_m[k] \cdot E_m[k] \right)$$
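A short Python sketch of this energy-style measure and of a possible selector is given below. The per-channel value follows the sum of E_m[k] multiplied by itself; how the selector decides which value is "differentiating" is not spelled out in the text, so the rule used here (largest absolute deviation from the median of all channel values) is an assumption, as are the names `channel_energy` and `select_accented_channel`.

```python
from statistics import median


def channel_energy(channel):
    """Sum over k of E_m[k] * E_m[k] for one channel."""
    return sum(v * v for v in channel)


def select_accented_channel(channels):
    """Return (index, scores), where index is the channel whose energy deviates most.

    Assumed selection rule: largest absolute deviation from the median score.
    The index is 0-based, so index 3 corresponds to the fourth channel.
    """
    scores = [channel_energy(ch) for ch in channels]
    centre = median(scores)
    deviations = [abs(s - centre) for s in scores]
    index = max(range(len(channels)), key=deviations.__getitem__)
    return index, scores
```

The median is used as the reference because, with one accented channel among M, most channels share a similar value and the median is not pulled towards the outlier.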

In yet another preferred embodiment, the demultiplexer is followed by a filtering stage to remove transients similar to that seen in channel 4 of Figure 4b. In this case the last stage of the processing blocks is as shown in Figure 5. The apparatus comprises a demultiplexer 303, correlator 307 and selector 309 similar to those of Figure 3. The first stage is the same as that of Figure 3 and is not shown here. A filter 501 is incorporated between the output of the demultiplexer 303 and the correlator 307. The filter 501 removes transients, so the correlation results are less affected by transient noise and hence are more reliable.

The wave shapes output by the demultiplexer of the embodiments above in the different channels can be compared in several different ways. For example, correlation coefficients, signal difference, number of zero crossings, etc. could be used as a measure to differentiate the beat types.
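The filtering stage and one of the alternative comparison measures can be sketched in Python as follows. The text does not name the type of filter 501, so the sliding median used here is an assumption chosen because it suppresses short spikes. Removing the channel mean before counting zero crossings is also an assumption, made because a non-negative feature signal would otherwise never cross zero. The names `median_filter` and `zero_crossings` are illustrative.

```python
from statistics import median


def median_filter(channel, width=3):
    """Sliding median with an odd window `width`; windows shrink at the edges."""
    half = width // 2
    return [median(channel[max(0, i - half):i + half + 1])
            for i in range(len(channel))]


def zero_crossings(channel):
    """Count sign changes of the mean-removed channel signal."""
    if not channel:
        return 0
    mean = sum(channel) / len(channel)
    centred = [v - mean for v in channel]
    return sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
```

In such a pipeline, each channel would be passed through `median_filter` before the comparison measure is computed, so that an isolated spike does not dominate the result.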

The meter of the song may be derived from the genre of the song. However, this may provide an inaccurate result since in some cases two songs of the same genre may have different meters. Therefore, according to a preferred embodiment, the meter may be determined by the apparatus shown in Figure 6.

The input music track (or song) x[n] is placed on the input of a beat detector 601 and a feature extractor 603. The output of the beat detector 601 is connected to the feature extractor 603 and the input of a first demultiplexer 605 and a second demultiplexer 607. Only two demultiplexers are shown in Figure 6. However, it can be appreciated that the apparatus may comprise any number of demultiplexers depending on the number of hypotheses. The plurality of outputs of the first and second demultiplexers 605, 607 are connected to respective inputs of a first and second cross correlator 609, 611. The outputs of the first and second cross correlators 609, 611 are connected to first and second selectors 613, 615. The outputs of the first and second selectors 613, 615 are connected to a comparator 617. The output of the comparator 617 is connected to the output terminal of the apparatus.

In operation, first the beat onset positions are determined using known beat detection algorithms in the detector 601 and a feature extracted signal r[k] is output. The feature signal r[k] is segmented into blocks of beat periods as shown in Figure 4a.

Subsequently, in the first and second demultiplexers 605, 607, the segmented signal r[k] is block-wise de-multiplexed into M channels. The first demultiplexer 605 demultiplexes the signal r[k] into 4 channels. The second demultiplexer 607 takes a second hypothesis and demultiplexes the signal r[k] into 3 channels. For each channel m = 1, ..., M, the cross correlation function χ_m(M) defined as

is computed in the first and second correlators 609, 611. The channel with the minimum correlation value is then chosen as the accented beat. As shown in Figure 4b, the fourth channel has a slightly different property than all the other channels. Thus, according to the invention, it corresponds to the accented beat. Following this, the minimum value χ(M_i) is computed for each hypothesis M_i.

Finally, the meter M of the song is determined by the comparator 617 as the argument corresponding to the minimum over the n hypotheses:

$$M = \arg\min\left(\{\chi(M_1), \chi(M_2), \ldots, \chi(M_n)\}\right)$$
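The hypothesis test can be illustrated with the following Python sketch. The definition of the per-channel correlation measure is not reproduced in this text, so a stand-in is used: the cosine similarity between each channel and the mean of the remaining channels. That stand-in, the fixed beat length in samples, and the names `demultiplex`, `cosine`, `channel_similarities` and `estimate_meter` are all assumptions made for the example, not the measure defined in the application.

```python
import math


def demultiplex(feature, beat_len, meter):
    """Deal beat-length blocks round-robin into `meter` channels (whole bars only)."""
    n_beats = (len(feature) // beat_len // meter) * meter
    channels = [[] for _ in range(meter)]
    for b in range(n_beats):
        channels[b % meter].extend(feature[b * beat_len:(b + 1) * beat_len])
    return channels


def cosine(u, v):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zero)."""
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def channel_similarities(channels):
    """Assumed stand-in measure: similarity of each channel to the mean of the others."""
    sims = []
    for m, ch in enumerate(channels):
        others = [c for i, c in enumerate(channels) if i != m]
        mean_other = [sum(vals) / len(vals) for vals in zip(*others)]
        sims.append(cosine(ch, mean_other))
    return sims


def estimate_meter(feature, beat_len, hypotheses=(4, 3)):
    """Return (meter, accented_channel) for the hypothesis with the smallest minimum."""
    best = None
    for meter in hypotheses:
        sims = channel_similarities(demultiplex(feature, beat_len, meter))
        chi = min(sims)  # minimum over the channels of this hypothesis
        if best is None or chi < best[0]:
            best = (chi, meter, sims.index(chi))
    return best[1], best[2]


# Synthetic feature: 12 beats of 4 samples, every fourth beat has a reversed envelope.
normal = [1.0, 0.5, 0.2, 0.1]
accent = [0.1, 0.2, 0.5, 1.0]
track = (normal * 3 + accent) * 3
print(estimate_meter(track, beat_len=4))  # prints (4, 3)
```

Under the correct hypothesis the accented beats fall into one channel, which then resembles the others least and gives a low minimum. Under a wrong hypothesis the accented beats are spread over all channels, so every channel looks alike and the minimum stays high.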

In the apparatus of Figure 6, the hypotheses M=4 and M=3 are considered. The above embodiment can be extended to determine the "phrase" length instead of the song meter. In this case M=16 and M=12 could be used for 4/4 and 3/4 style songs, respectively.

Although preferred embodiments of the present invention have been illustrated in the accompanying drawings and described in the foregoing detailed description, it will be understood that the invention is not limited to the embodiments disclosed, but is capable of numerous modifications without departing from the scope of the invention as set out in the following claims.