

Title:
ANALYSIS AND SYNTHESIS OF RHYTHM
Document Type and Number:
WIPO Patent Application WO/1993/024923
Kind Code:
A1
Abstract:
A rhythm analyser comprises input means (2), a low-pass filter bank (7) including a plurality of Gaussian-type low-pass filters each having differing cut-off frequencies, differentiating means (8) for producing a plurality of derivative signals, detection means (9) for detecting a substantially zero value in at least one of the derivative signals and for producing a zero crossing signal, storage means (10, 11), and processing means to record structural information in the storage means (10, 11) relating to the zero crossing signal comprising at least one of, the position in time of the zero crossing signal, the identity of the low-pass filter which produced the filtered signal, or the energy of the zero crossing signal. A corresponding system for synthesising musical rhythmic expression comprises synthesis means (116) for receiving structural information and for generating synthesis signals from the structural information.

Inventors:
TODD NEIL PHILIP MCANGUS (GB)
Application Number:
PCT/GB1993/001183
Publication Date:
December 09, 1993
Filing Date:
June 03, 1993
Assignee:
TODD NEIL PHILIP MCANGUS (GB)
International Classes:
G10H1/12; G10H1/40; (IPC1-7): G10H1/40; G10H1/12
Foreign References:
US4354412A 1982-10-19
GB2063546A 1981-06-03
US4276802A 1981-07-07
US4939471A 1990-07-03
Claims:
CLAIMS
1. A rhythm analyser comprising: input means for receiving an input signal representing the energy of a time-varying electrical signal and for generating an output signal, a low-pass filter bank including a plurality of Gaussian-type low-pass filters having differing cut-off frequencies each for filtering the output signal to produce a plurality of respective filtered signals, differentiating means for receiving the filtered signals and for producing a plurality of derivative signals representing at least the first derivative of each filtered signal, detection means for detecting a substantially zero value in at least one of the derivative signals and for producing a zero crossing signal corresponding to the derivative signal having the substantially zero value, storage means, and processing means operable, upon receipt of the zero crossing signal, to record structural information in the storage means relating to the zero crossing signal comprising, at least one of: the position in time of the part of the input signal which corresponds to the zero crossing signal; the identity of the low-pass filter which produced the filtered signal which corresponds to the zero crossing signal; and the energy of the part of the input signal which corresponds to the zero crossing signal.
2. An analyser according to claim 1, wherein the low-pass filters have time constants in the range from 100 seconds to 20ms.
3. An analyser according to any preceding claim, wherein each low-pass filter has an infinite impulse response.
4. An analyser according to any preceding claim, wherein the time-varying electrical signal is an audio signal.
5. An analyser according to any preceding claim, wherein the detection means is operable to detect a stress structure by recognising the coincidence of a substantially zero value in the first order derivative and a negative second order derivative in the derivative signals associated with one of the low-pass filters.
6. An analyser according to any preceding claim, wherein the detection means is operable to detect a group structure by recognising the coincidence of a substantially zero value in the first order derivative and a positive second order derivative in the derivative signals associated with one of the low-pass filters.
7. An analyser according to any preceding claim, wherein the detection means is operable to detect an onset structure by recognising the coincidence of a substantially zero value in at least the second order derivative and a negative value in the next highest order derivative in the derivative signals associated with one of the low-pass filters.
8. An analyser according to claim 7, wherein the onset structure is detected by recognising a substantially zero value in the second order derivative.
9. An analyser according to any of claims 5 to 8, wherein the processing means is operable to record the type of structure as part of the structural information.
10. An analyser according to any preceding claim, wherein the input means comprises an anti-aliasing filter and an ADC for sampling the input signal and wherein the low-pass filters are digital filters.
11. An analyser according to any preceding claim, wherein the storage means comprises any one of a computer disk, a computer memory, a VDU driver circuit, a chart recorder, a tape streamer and a printer.
12. A method of analysing rhythm comprising: filtering a signal representing the energy of a time-varying electrical signal through a low-pass filter bank comprising a plurality of Gaussian-type low-pass filters having differing cut-off frequencies to produce a plurality of respective filtered signals, differentiating the filtered signals to produce a plurality of derivative signals representing at least the first derivative of each filtered signal, detecting a substantially zero value in at least one of the derivative signals and producing a zero crossing signal corresponding to the derivative signal having the substantially zero value, and upon receipt of the zero crossing signal, recording structural information relating to the zero crossing signal comprising, at least one of: the position in time of the part of the input signal which corresponds to the zero crossing signal; the identity of the low-pass filter which produced the filtered signal which corresponds to the zero crossing signal; and the energy of the part of the input signal which corresponds to the zero crossing signal.
13. A method according to claim 12, comprising detecting a stress structure by recognising the coincidence of a substantially zero value in the first order derivative and a negative second order derivative in the derivative signals associated with one of the low-pass filters.
14. A method according to claim 12 or to claim 13, comprising detecting a group structure by recognising the coincidence of a substantially zero value in the first order derivative and a positive second order derivative in the derivative signals associated with one of the low-pass filters.
15. A method according to any of claims 12 to 14, comprising detecting an onset structure by recognising the coincidence of a substantially zero value in at least the second order derivative and a negative value in the next highest order derivative in the derivative signals associated with one of the low-pass filters.
16. A method according to claim 15, wherein the onset structure is detected by recognising a substantially zero value in the second order derivative.
17. A method according to any of claims 12 to 16 comprising recording the structural information on any one of a computer disk, a computer memory, a VDU driver circuit, a chart recorder, a tape streamer and a printer.
18. A system for synthesising musical rhythmic expression comprising: synthesis means for receiving structural information and for generating synthesis signals from the structural information, a pulse generator coupled to the synthesis means, for generating a pulse signal from the synthesis signals, and a filter means coupled to the pulse generator, comprising a plurality of Gaussian-type filters, for generating an expressive movement signal by filtering the pulse signal and summing the outputs of the filters.
Description:
ANALYSIS AND SYNTHESIS OF RHYTHM

This invention relates to a rhythm analyser for analysing rhythm and to a rhythmic expression synthesiser for synthesising rhythmic expression, particularly though not exclusively for use with signals representing a musical performance.

In several fields of study it is useful to be able to consistently and accurately analyse the rhythm of a sound. The sound may for instance be a musical or poetic performance in which case the analysis yields information about methods of performance and the way in which the performance is segmented by the performer. In other fields such as the analysis of birdsong and insect chirps, the analysis is an aid to classification and recognition of the sound. In a slightly different field, the analysis of rhythm may be useful in analysing electro-myographic signals associated with muscular movements.

For example, it is known that temporal and dynamic variations in a performance (which variations constitute an expressive signal) contain significant information about the intended structure and interpretation provided by the performer. The structure of the music may then be represented as a hierarchical grouping of events such as notes and phrases, in a tree structure.

A few systems exist which attempt to recover rhythmic structure from a performed sequence of music. One such system employs a computer program based on a grammar, which takes input values from a musical keyboard and groups musical notes according to their rhythmic structure. Another system uses a neural network approach to the same problem. A further system employs a signal processing approach based on a model of the auditory system and is thus able to analyse music directly from sound rather than from electrical input from a keyboard or other instrument. This latter system runs in real-time and is able to track a live performance.

Whilst each of these systems is able to recover low-level rhythmic structure to varying degrees of success, they do not address the problem of higher level structure above the level of the musical bar, i.e. structures such as phrases, sections or even movements. Furthermore, none of these systems is able to recover fully the expressive information contained in a musical performance.

Whilst systems exist for synthesising expressive performances, none employ a realistic representation of structure and thus considerable adjustment by hand is required to produce a realistic performance.

According to a first aspect of the invention, a rhythm analyser comprises input means for receiving an input signal representing the energy of a time-varying electrical signal and for generating an output signal, a low-pass filter bank including a plurality of Gaussian-type low-pass filters having differing cut-off frequencies each for filtering the output signal to produce a plurality of respective filtered signals, differentiating means for receiving the filtered signals and for producing a plurality of derivative signals representing at least the first derivative of each filtered signal, detection means for detecting a substantially zero value in at least one of the derivative signals and for producing a zero crossing signal corresponding to the derivative signal having the substantially zero value, storage means, and processing means, the processing means being operable, upon receipt of the zero crossing signal, to record structural information in the storage means relating to the zero crossing signal comprising, at least one of, the position in time of the part of the input signal which corresponds to the zero crossing signal, the identity of the low-pass filter which produced the filtered signal which corresponds to the zero crossing signal, and the energy of the part of the input signal which corresponds to the zero crossing signal.

According to a second aspect of the invention, a method of analysing rhythm comprises filtering a signal representing the energy of a time varying electrical signal through a low-pass filter bank comprising a plurality of Gaussian-type low-pass filters having differing cut-off frequencies to produce a plurality of respective filtered signals, differentiating the filtered signals to produce a plurality of derivative signals representing at least the first derivative of each filtered signal, detecting a substantially zero value in at least one of the derivative signals and producing a zero crossing signal corresponding to the derivative signal having the substantially zero value, and upon receipt of the zero crossing signal, recording structural information relating to the zero crossing signal comprising, at least one of, the position in time of the part of the input signal which corresponds to the zero crossing signal, the identity of the low-pass filter which produced the filtered signal which corresponds to the zero crossing signal, and the energy of the part of the input signal which corresponds to the zero crossing signal.

The range of time constants for the low-pass filters is preferably between 100 seconds and 20ms. This range is particularly useful for the analysis of musical rhythmic expression.

By preferably using low-pass filters having infinite impulse responses, the tool may be made more useful in psychological studies, since these types of filters are more akin to natural filter processes than those having finite impulse responses.

Preferably, the input signal is sampled and the low-pass filters are digital filters. This permits much greater flexibility in the operating parameters of the analyser particularly in relation to the number, type and spacing of the filters in the filter bank.

Given a form of expressive representation, or having generated structural information, and perhaps stored it, the information may be used to synthesise musical rhythmic expression.

According to a third aspect of the invention, a system for synthesising musical rhythmic expression comprises synthesis means for receiving structural information and for generating synthesis signals from the structural information, a pulse generator coupled to the synthesis means, for generating a pulse signal from the synthesis signals, and a filter means coupled to the pulse generator, comprising a plurality of Gaussian-type filters, for generating an expressive movement signal by filtering the pulse signal and summing the outputs of the filters.

The expressive movement signal is preferably fed back to the synthesis means to be used, in conjunction with built-in information related to typical tempo and dynamic variations used by a performer for particular structures, to provide a modulation output. The modulation output is preferably used to modulate the automatic performance of an instrument which might for example, be a MIDI-type instrument.

Preferably the structural information is a library store of zero-crossing structures generated by the analysis system. It may instead be generated directly from a score. This score is preferably of the type used for input to a MIDI-type instrument. By combining the modulation output feature and the direct structural information generation feature, the synthesis system may be used to read a score and to generate a fully automated rhythmically expressive performance of the score with little or no manual intervention.

According to a fourth aspect of the invention, a method of analysing rhythm comprises taking a signal that has been filtered through a Gaussian-type filter means and differentiated, in order to extract rhythmic structural information.

The invention will now be described by way of example, with reference to the drawings in which:-

Figure 1 is a block diagram of a rhythm analyser in accordance with the invention;

Figures 2(a) to 2(c) show a musical rhythmic cliche;

Figures 3(a) to 3(e) each show a zero-crossing structure for the cliche of Figure 2;

Figure 4 shows an energy density spectrum (or surface) for the cliche of Figure 2;

Figure 5 is a block diagram of a first embodiment of a synthesis system in accordance with the invention; and

Figure 6 is a block diagram of a second embodiment of a synthesis system in accordance with the invention.

A rhythm analyser in accordance with the invention may be implemented in combination with a conventional computer with some additional hardware and software. The additional hardware and software requirements are described below. The computer (not shown) may for example, be an IBM compatible or Apple Macintosh (Trade Mark) type computer. The computer may have some form of graphical output device such as a printer or VDU, and a storage device such as a disk drive or tape streamer. Additionally it may have some form of user input device such as a keyboard and/or a pointing device such as a mouse.

Alternatively, the analyser described below may be manufactured using entirely analogue components.

With reference to Figure 1, an analogue audio signal is fed from an input device 1 into an input of an interface card 2 installed in the computer. The input device 1 may for example, be a tape cassette, compact disk player or microphone.

The card 2 comprises a full-wave rectifier and analogue multiplier 3, an anti-aliasing filter 4 and a 12 bit analogue-to-digital convertor (ADC) 5. The rectification circuit used in the rectifier 3 may be an active circuit using two operational amplifiers and two signal diodes in a conventional manner. The analogue multiplier is used to square the audio input signal so that the output from this stage is a measure of the energy density of the input signal. The multiplication circuit may also provide any amplification and/or buffering required by the output of the input device 1.
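By way of illustration only, the rectify-and-square stage can be mimicked in software on data that has already been sampled. The function name `energy_density` is hypothetical and does not appear in the specification; in the embodiment described above this step is performed by the analogue card before sampling.

```python
def energy_density(samples):
    """Full-wave rectify each sample, then square it.

    The result is a per-sample measure of signal energy, mirroring the
    rectifier and analogue multiplier 3. Software stand-in only.
    """
    return [abs(s) ** 2 for s in samples]
```

Since squaring already discards the sign, the rectification is retained here only to mirror the two hardware stages named in the text.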

The cut-off frequency of the low-pass anti-aliasing filter is variable to allow the analyser to analyse different frequencies. The cut-off frequency should be at most half that of the sampling frequency of the ADC 5 and in practice, the cut-off frequency is kept below 20Hz with a sampling frequency in excess of 100Hz. The filter 4 may be a 5th-order, all-pole, maximally flat low-pass filter implemented using an LTC1062 chip, in which case the cut-off frequency can be adjusted by altering the frequency of the clock signal supplied to the chip.

The sampling frequency of the ADC 5 is also variable and should be adjusted according to the need to avoid aliasing of the input signal and the capability of the rest of the computer to accept the data output from the ADC 5.

The output from the ADC 5 comprises a series of values representing the instantaneous energy density of the input signal, and may be processed in real time and/or may be stored on a storage device such as disk storage 10, for later processing.

The main elements for processing the ADC output data are enclosed in box 6 in Figure 1 and comprise a digital low-pass filter bank 7, differentiation means 8 and detection means 9. As mentioned above, although in this embodiment the elements in box 6 are implemented in software, they could instead be implemented in hardware.

The filter bank 7 comprises several Infinite Impulse Response (IIR) filters having a Gaussian-type response. Since a perfect Gaussian response cannot practically be implemented, approximations must be used and these can be selected from Taylor, Laguerre or Bessel series approximations. The order of each filter is adjustable and a range of between first and ninth order is acceptable in this application. The filter range is also adjustable i.e. the shortest and longest time constant in the filter bank. Finally, the filter spacing is adjustable, in terms of the number of filters per octave.
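The range and spacing parameters can be turned into a list of time constants as in the following minimal sketch. It assumes that "filters per octave" means geometric spacing, with each octave being a doubling of the time constant; the function name `time_constants` is hypothetical.

```python
import math


def time_constants(shortest, longest, per_octave):
    """Geometrically spaced time constants in seconds.

    Assumption: `per_octave` filters per doubling of the time constant,
    starting at `shortest` and never exceeding `longest`.
    """
    if not 0.0 < shortest <= longest or per_octave < 1:
        raise ValueError("invalid filter bank range or spacing")
    n_octaves = math.log2(longest / shortest)
    # Small tolerance so an exact whole number of octaves keeps its last filter.
    count = int(math.floor(n_octaves * per_octave + 1e-9)) + 1
    return [shortest * 2.0 ** (k / per_octave) for k in range(count)]
```

With the onset range quoted later in the text (0.05s to 0.5s at 12 filters per octave) this spacing gives 40 channels.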

The digital implementation of the filters may be derived for example, by applying a bilinear transform to a suitable analogue filter design.
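As a hedged sketch of that derivation, the bilinear transform s = 2·fs·(1 - z⁻¹)/(1 + z⁻¹) applied to the first-order analogue low-pass H(s) = 1/(1 + sτ) gives a one-pole recursive filter. Cascading identical sections is used below purely as a stand-in for a Gaussian-type response; it is an assumption and is not claimed to be one of the Taylor, Laguerre or Bessel approximations named in the text. All function names are hypothetical.

```python
def one_pole_coeffs(tau, fs):
    """Bilinear transform of H(s) = 1 / (1 + s*tau) at sampling rate fs.

    Returns (b0, a1) for y[n] = b0 * (x[n] + x[n-1]) - a1 * y[n-1].
    """
    k = 2.0 * tau * fs
    return 1.0 / (1.0 + k), (1.0 - k) / (1.0 + k)


def lowpass(samples, tau, fs, order=1):
    """Run `order` identical one-pole sections in cascade (IIR)."""
    b0, a1 = one_pole_coeffs(tau, fs)
    out = list(samples)
    for _ in range(order):
        x_prev = 0.0
        y_prev = 0.0
        stage = []
        for x in out:
            y = b0 * (x + x_prev) - a1 * y_prev
            stage.append(y)
            x_prev, y_prev = x, y
        out = stage
    return out
```

Each section has unity gain at zero frequency, so a constant input eventually passes through unchanged whatever the order.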

The output of each filter represents the energy density of each frequency range or channel and therefore taking all channels, represents an energy density frequency spectrum as a function of time. The output from each channel is fed from the filter bank to the differentiating means 8.

The differentiating means 8 differentiates the output from each channel up to the nth derivative, where n is variable. Differentiating the filtered signal has the same effect as passing the signal through a band-pass filter bank with a Gaussian response, and the analyser could be implemented in that way.
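For sampled data, one possible realisation of the differentiating means is repeated finite differencing. The sketch below assumes central differences with one-sided differences at the two ends; the specification does not state which numerical scheme is used, and the function names are hypothetical.

```python
def derivative(samples, fs):
    """First derivative by central differences (one-sided at both ends)."""
    n = len(samples)
    if n < 2:
        return [0.0] * n
    d = [0.0] * n
    d[0] = (samples[1] - samples[0]) * fs
    d[-1] = (samples[-1] - samples[-2]) * fs
    for i in range(1, n - 1):
        d[i] = (samples[i + 1] - samples[i - 1]) * fs / 2.0
    return d


def nth_derivative(samples, fs, n):
    """Apply `derivative` n times to obtain the nth derivative signal."""
    out = list(samples)
    for _ in range(n):
        out = derivative(out, fs)
    return out
```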

The output from the differentiating means 8 is passed to the detection means 9.

A rhythmic interpretation of the input signal, which in musical terms can be described by metre and phrasing, requires three kinds of information. The detection means is accordingly triggered under three conditions, one for each type of zero crossing structure:

(i) stress structure - a zero in the 1st order derivative in conjunction with a negative 2nd order derivative, in a particular channel at a particular time.

(ii) group structure - a zero in the 1st order derivative in conjunction with a positive 2nd order derivative, in a particular channel at a particular time.

(iii) onset structure - a negative-going zero crossing of the nth order derivative (typically n=2) for a particular channel at a particular time.
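The three trigger conditions listed above can be sketched for a single channel as follows. The sketch assumes that a "zero" in sampled data is detected as a sign change between consecutive samples, and takes n=2 for the onset condition; the function name `classify_events` and the string labels are hypothetical.

```python
def classify_events(d1, d2):
    """Return (sample_index, kind) pairs for one channel.

    d1 and d2 are the first and second derivative signals of the
    filtered energy, sampled at the same instants.
    """
    events = []
    for i in range(1, len(d1)):
        # First derivative crosses zero going down, curvature negative: a peak.
        if d1[i - 1] > 0.0 >= d1[i] and d2[i] < 0.0:
            events.append((i, "stress"))
        # First derivative crosses zero going up, curvature positive: a trough.
        elif d1[i - 1] < 0.0 <= d1[i] and d2[i] > 0.0:
            events.append((i, "group"))
        # Negative-going zero crossing of the second derivative.
        if d2[i - 1] > 0.0 >= d2[i]:
            events.append((i, "onset"))
    return events
```

Running the same function over every channel of the filter bank and tagging each event with its channel number yields the time, channel and type of each trigger.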

The analyser is, in signal processing terms, convolving a Gaussian-type infinite impulse response with the signal energy $|x(t)|^2$ over a range of time constants $\{\tau_k \mid k = 0 \ldots N-1\}$ such that

$$S_G(t, f_k) = G(t, \tau_k) * |x(t)|^2 = \int_0^{\infty} G(t', \tau_k)\,|x(t - t')|^2\,dt'$$

where $S_G(t, f_k)$ is a windowed measure of the energy density spectrum, $G(t, \tau_k)$ is a Gaussian low-pass impulse response function, $f_k = 1/\tau_k$, $t$ is time and $\tau_k$ is a time constant.
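For sampled data the convolution integral can be approximated by a causal sum. The following is a minimal sketch under the assumption of a Riemann-sum approximation with step 1/fs; the impulse response is passed in as a list of samples so that no particular Gaussian approximation is presumed. The function name `causal_convolve` is hypothetical.

```python
def causal_convolve(impulse_response, energy, fs):
    """Approximate S[n] = (1/fs) * sum over m >= 0 of g[m] * e[n - m].

    `impulse_response` holds samples g[m] of G(t, tau_k) for one channel;
    `energy` holds samples of the squared input signal.
    """
    out = []
    for n in range(len(energy)):
        acc = 0.0
        # Only past and present input samples contribute (causal filter).
        for m in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[m] * energy[n - m]
        out.append(acc / fs)
    return out
```

A recursive (IIR) filter computes the same kind of output without storing the whole impulse response, which is why the embodiment favours IIR filters for long time constants.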

In the case of onset detection, the conditions

$$\frac{d^2}{dt^2} S_{G_2}(t, \tau_k) = 0$$

and

$$\frac{d}{dt} S_{G_2}(t, \tau_k) > 0$$

must be met, and similar conditions, as described above, must be met for the other two zero-crossing structures (group and stress structures).

A musical rhythmic cliche is shown in Figure 2 and sample outputs from the analyser are shown in Figures 3(a) to 3(e), for that cliche. The loci of zero-crossings of the filter $\frac{d^2}{dt^2} G_2(t, \tau_k)$ convolved with the signal energy from the sequence shown in Figure 2(d) are shown in Figure 3(a), which shows an onset structure. $G_2$ denotes a second-order Gaussian approximation. The time constants have a range 20ms-200ms. Figures 3(b) to 3(d) show stress structures. In Figure 3(b), the loci of the zero crossings of the filter $\frac{d}{dt} G_3(t, \tau_k)$ convolved with the signal energy from the sequence shown in Figure 2(d) are shown. In Figure 3(c) the same sequence is shown with a 3dB accent on e5 and in Figure 3(d) the same sequence is shown again with e5 doubled in length, giving the musical effect of legato. $G_3$ is a third-order Gaussian approximation and the time constants $\tau_k$ have the range 50ms-3000ms. Figure 3(e) shows a group zero crossing structure for the cliche of Figure 2. This has separated the groups (P1 and P2 in Figure 2(a)) but for a short input signal such as this, it does not yield a great deal of information. The group structure will, for example, indicate the boundaries of a verse in poetry.

When the detection means 9 is triggered, it records the time, channel, energy and type of trigger. This structural information relating to the zero crossing structures may be stored on the storage device 10 or may be passed on to the output processor 11.

The storage device 10 can be used to store output data from the ADC 5 in the form of ASCII text files. Since the sampling rate is, in most cases, not much more than 100Hz, it is feasible to store long samples. ASCII text files may also be used for storing information relating to the zero-crossing structures.
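One way the recorded structural information might be held in memory and written to an ASCII text file is sketched below. The tab-separated layout, the class name `ZeroCrossingEvent` and the helper names are all assumptions made for illustration; the specification does not define a file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroCrossingEvent:
    time_s: float   # position in time of the trigger, in seconds
    channel: int    # index of the low-pass filter that produced it
    energy: float   # energy of the corresponding part of the input
    kind: str       # "stress", "group" or "onset"


def to_line(event):
    """Encode one event as a tab-separated ASCII line (assumed layout)."""
    return f"{event.time_s!r}\t{event.channel}\t{event.energy!r}\t{event.kind}"


def from_line(line):
    """Decode a line written by `to_line`."""
    time_s, channel, energy, kind = line.rstrip("\n").split("\t")
    return ZeroCrossingEvent(float(time_s), int(channel), float(energy), kind)
```

Using `repr` for the floating-point fields means a value read back from the file compares equal to the value that was written.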

The output processor 11 is able to process three types of display which may be output on a VDU, printer or other type of graphical display device.

The three types of display are:

(i) A display of the three types of structures 12 that trigger the detection means. (These have been discussed above in connection with Figures 2 and 3.)

(ii) A display of the energy density spectrum 13 of the input signal.

(iii) A display of analyser system parameters 14.

The three structures (or zero-crossing structures) are of three kinds as discussed above. Each of these can be displayed separately or in any combination. The structures can be displayed as a 2D scatterplot of events with time on the X-axis and either time-constant or energy on the Y-axis. A single event can be represented by a single pixel. Alternatively, the 3D aspect (i.e. time, time-constant and energy) of the zero crossing structures can be represented using colour to represent the energy of a particular event. In addition, any of the displays may be annotated by the operator. This feature may be used to label particular features of a zero crossing structure.

The stress structure is used most often and resembles the tree structures used in speech and musical analysis to represent rhythm.

Typically, onset structures in music, for example, have short time constants of up to about 1 second whereas stress and group structures have longer time constants. For example, to recover the group structure relating to the movements of a symphony, time constants of the order of the length of a movement are required i.e. 15 or 20 minutes. Using time constants longer than the duration of the input signal will yield no additional information. Since the range of time constants displayed may be varied separately for stress and group structures on one hand and for onsets on the other, the scales of the plots may vary. In this case the display may be divided into separate "windows" which display simultaneously the global and local aspects of the structures.

The energy density frequency spectrum can be plotted directly from the data output of the filter bank. This display is not restricted to the display only of zero-crossing triggers detected by the detection means 9. With reference to Figure 4, which shows an energy density spectrum (or surface) for the cliche of Figure 2, the display resembles a "mountain range" with energy represented by the height of the range, decreasing time constant represented by the distance into the plot and time moving from left to right. In this display, the valleys correspond to the segmentation or group structure and the peaks correspond to the stress structure.

The energy density spectrum may be scrolled in real-time to give a dynamic representation which aids the intuitive understanding of the zero-crossing structure plots.

The system is controlled and directed using processing and user control means 15.

For each of the foregoing displays, the system parameters used to record and display the plot, may be simultaneously displayed.

The third display is that of the analyser parameters 14. These can be altered using a keyboard 16 and mouse. The parameters that may be varied by an operator of the analyser are either process control parameters 17a to 17e or data control parameters 18a to 18c.

The following process control parameters may be adjusted:

(a) anti-aliasing low-pass cut-off frequency (typically about 20Hz),

(b) sampling rate (typically about 100Hz),

(c) number of samples (sets duration of analysis, can be infinite for real time analysis with no storage of data), and

(d) digital filter bank parameters

(i) approximation type (Bessel, Laguerre, Taylor, typically Taylor)

(ii) approximation order (typically 6th order for stress, 2nd order for onset)

(iii) range (typically 0.25s-30s for stress, 0.05s-0.5s for onset)

(iv) filter spacing (typically 12 per octave).
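The process control parameters and their typical values could be gathered into a single configuration record, as in the following sketch. The class name `ProcessControl`, its field names and the `validate` method are hypothetical; the default values and the permitted order range (first to ninth) are taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessControl:
    antialias_cutoff_hz: float = 20.0
    sampling_rate_hz: float = 100.0
    n_samples: Optional[int] = None   # None: run indefinitely (real time)
    approximation: str = "Taylor"     # "Bessel", "Laguerre" or "Taylor"
    order: int = 6                    # typical for stress; 2 for onset
    shortest_tau_s: float = 0.25
    longest_tau_s: float = 30.0
    filters_per_octave: int = 12

    def validate(self):
        """Raise ValueError if the settings contradict the description."""
        if self.antialias_cutoff_hz > self.sampling_rate_hz / 2.0:
            raise ValueError("anti-aliasing cut-off exceeds half the sampling rate")
        if self.approximation not in ("Bessel", "Laguerre", "Taylor"):
            raise ValueError("unknown approximation type")
        if not 1 <= self.order <= 9:
            raise ValueError("filter order must lie between 1 and 9")
        if not 0.0 < self.shortest_tau_s < self.longest_tau_s:
            raise ValueError("time-constant range is empty or inverted")
        if self.filters_per_octave < 1:
            raise ValueError("need at least one filter per octave")


# Typical settings quoted in the text for the two kinds of analysis.
STRESS_DEFAULTS = ProcessControl()
ONSET_DEFAULTS = ProcessControl(order=2, shortest_tau_s=0.05, longest_tau_s=0.5)
```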

The following data control parameters may be adjusted:

(a) sample status flags,

(i) acquire only - storage only, output filename requested.

(ii) acquire and analyse - storage and analysis, output filename requested.

(iii) retrieve and analyse - retrieve from storage and analyse, input filename requested.

(b) energy density display flags, and

(i) display enabled - x and y scaling factors requested.

(ii) scroll/static - type of display, scrolled or repeatedly overwritten in static form.

(c) zero-crossing structure display flags

(i) define combination of display types - stress structure, group structure and/or onset structure.

A corresponding synthesis system operates to regenerate a rhythmically expressive performance and has two separate modes:-

(i) Reconstructive Mode in which the envelope of an expressive signal can be reconstructed from its structural analysis which forms an input.

(ii) Modulation mode in which the output of the synthesis filter can be used to modulate a performance via a standard MIDI interface.

With reference to Figures 5 and 6, the synthesis system comprises:-

(a) a means of representing and retrieving stored structural information 114;

(b) a synthesis algorithm 116 which in Figure 5 is the reverse of the analysis elements 6;

(c) a multi-channel pulse generator 118; and

(d) a multi-channel synthesis filter-bank 120.

In reconstructive mode (as shown in Figure 5), the structural information which forms the input to the synthesis system has the same format as that generated by the analysis system. The structures can be retrieved from a library. In reconstructive mode, the synthesis algorithm reconstructs the real-time properties of the structure from its compact coding and directs to each channel a series of logic pulses of varying widths. The logic pulses trigger the pulse generator, which in turn provides impulses to the synthesis filter system which simulate the zero-crossings recorded by the analysis system. The synthesis filter bank mirrors the analysis filter bank used to derive the structural information. The parameters of the synthesis filter bank are the same as those used in the analysis filter bank when the structural information was recorded. The responses of the synthesis filters are summed to produce a reconstructed signal envelope.
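The reconstructive signal path (pulses directed to each channel, filtered and summed) can be sketched as follows. This is a simplified illustration under stated assumptions: each recorded event becomes a single-sample impulse scaled by its energy, rather than a logic pulse of varying width, and each synthesis filter is a first-order bilinear low-pass standing in for a Gaussian-type filter. The function names and the event tuple layout (time in seconds, channel, energy) are hypothetical.

```python
def one_pole_lowpass(samples, tau, fs):
    """First-order low-pass obtained by bilinear transform (stand-in filter)."""
    k = 2.0 * tau * fs
    b0 = 1.0 / (1.0 + k)
    a1 = (1.0 - k) / (1.0 + k)
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = b0 * (x + x_prev) - a1 * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


def reconstruct_envelope(events, taus, fs, n_samples):
    """Sum the filter-bank responses to per-channel impulse trains."""
    total = [0.0] * n_samples
    for k, tau in enumerate(taus):
        # Pulse generator: one impulse per event recorded on channel k.
        pulses = [0.0] * n_samples
        for time_s, channel, energy in events:
            idx = int(round(time_s * fs))
            if channel == k and 0 <= idx < n_samples:
                pulses[idx] += energy
        filtered = one_pole_lowpass(pulses, tau, fs)
        total = [a + b for a, b in zip(total, filtered)]
    return total
```

Because each stand-in filter has unity gain at zero frequency, the area under the reconstructed envelope approaches the total energy of the events supplied, provided the record is long enough for the responses to decay.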

With reference to Figure 6, the starting point in beat modulation mode is a score 122 of a piece of music i.e. a representation of the note values necessary to drive a MIDI interface 124. From this the user derives a normative structure 114b i.e. one which does not have real-time values, but relative durations. The terminal nodes of this structure correspond to beats in the score 122.

The system operates from this derived structure 114b in much the same way as in reconstructive mode. However, in this mode, the expressive movement is fed back via a feedback path 126 to the synthesis algorithm 116b. This algorithm differs from that used in reconstructive mode in that it operates to modulate the tempo and dynamics of a modulated score 128 according to built-in "performance" rules which operate on the expressive movement signal. The modulated score 128 is output via the MIDI interface 124 to a MIDI instrument 126. This system thereby generates an automatic expressive performance from a musical score. The output from the MIDI instrument is also controlled by style parameters such as "global tempo".