Title:
METHOD AND APPARATUS FOR GENERATION OF AUDIO SIGNALS
Document Type and Number:
WIPO Patent Application WO/2017/137751
Kind Code:
A1
Abstract:
A method of generating an audio signal, the method comprising: receiving a real-time data stream from a physical sensor; transforming the real-time data stream using a predefined transformation to generate a control signal, the control signal comprising a time sequence of values; storing the control signal as animation data; applying the control signal to one or more parameters of a sound model; and generating an audio signal from the sound model.

Inventors:
HEINRICHS CHRISTIAN (GB)
MCPHERSON ANDREW (GB)
Application Number:
PCT/GB2017/050326
Publication Date:
August 17, 2017
Filing Date:
February 09, 2017
Assignee:
UNIV LONDON QUEEN MARY (GB)
International Classes:
G10H1/00
Foreign References:
US20140298975A12014-10-09
JP2008176679A2008-07-31
US20120062718A12012-03-15
Other References:
FRÉDÉRIC BEVILACQUA ET AL: "Music control from 3D motion capture of dance", 1 January 2001 (2001-01-01), XP055369865, Retrieved from the Internet [retrieved on 20170505]
ONDER O ET AL: "Keyframe Reduction Techniques for Motion Capture Data", 3DTV CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO, 2008, IEEE, PISCATAWAY, NJ, USA, 28 May 2008 (2008-05-28), pages 293 - 296, XP031275269, ISBN: 978-1-4244-1760-5
JUN XIAO ET AL: "An Efficient Keyframe Extraction from Motion Capture Data", 1 January 2006, ADVANCES IN COMPUTER GRAPHICS, LECTURE NOTES IN COMPUTER SCIENCE (LNCS), SPRINGER, BERLIN, DE, pages 494 - 501, ISBN: 978-3-540-35638-7, XP019041364
Attorney, Agent or Firm:
J A KEMP (GB)
Claims:
CLAIMS

1. A method of generating an audio signal, the method comprising:

receiving a real-time data stream from a physical sensor;

transforming the real-time data stream using a predefined transformation to generate a control signal, the control signal comprising a time sequence of values;

storing the control signal as animation data;

applying the control signal to one or more parameters of a sound model; and

generating an audio signal from the sound model.

2. A method according to claim 1 wherein the control signal is applied to the sound model in real-time.

3. A method according to claim 1 wherein the control signal is applied to the sound model by replaying the animation data at a later time.

4. A method according to claim 3 wherein replaying the control signal is initiated by a user action.

5. A method according to claim 3 or 4 wherein the animation data comprises keyframe data.

6. A method according to claim 3, 4 or 5 further comprising receiving one or more parameter values in response to the user action and wherein the application of the control signal to the sound model is controlled by the parameter value(s).

7. A method according to claim 3, 4 or 5 further comprising:

applying a further control signal from a further animation data to the or a further sound model to generate a further audio signal; and

combining the audio signal and the further audio signal.

8. A method according to claim 3, 4 or 5 further comprising:

combining the animation data and further animation data under the control of one or more meta-parameters to generate a combined animation; and

applying the combined animation to the sound model to generate the audio signal.

9. A method according to any one of the preceding claims wherein the real-time data stream represents a human gesture detected by the physical sensor as a time series of a plurality of variables.

10. A computer-implemented method of generating an interactive sound effect, the method comprising:

receiving a user selection of a sound model having a plurality of input parameters;

receiving a real-time data stream from a physical sensor;

transforming the real-time data stream using a predefined transformation to generate a control signal, the control signal comprising a time sequence of values;

storing the control signal as animation data; and

generating a sound effect module having at least one input meta-parameter by combining the animation data and the sound model.

11. A computer-implemented method of generating an interactive sound effect, the method comprising:

receiving a user selection of a sound model having a plurality of input parameters;

generating a control signal in response to user input, the control signal comprising a time sequence of values;

storing the control signal as animation data; and

generating a sound effect module having at least one input meta-parameter by combining the animation data and the sound model.

12. A computer-implemented method according to claim 11 further comprising:

receiving a further real-time data stream from a physical sensor;

transforming the further real-time data stream using a predefined transformation to generate a further control signal, the further control signal comprising a time sequence of values; and

storing the further control signal as further animation data;

wherein the sound effect module is generated by combining the animation data, the further animation data and the sound model.

13. A computer program comprising code means that, when executed by a computer system, instruct the computer system to perform a method according to any one of the preceding claims.

14. A sound effect module comprising code means that, when executed by a computer system, instruct the computer system to:

receive a control input; and

generate an audio signal in response to the control input using animation data to control a sound model.

15. A sound effect module according to claim 14 wherein the control input is a multi-valued variable and generating the audio signal is responsive to the value of the control input.

16. An interactive computer program including a sound effect module according to claim 14 or 15.

17. An interactive computer program according to claim 16 which is a computer game, virtual reality application, musical instrument or interactive interface.

18. An apparatus for generating a sound effect, the apparatus comprising:

a physical sensor;

a processor;

an audio signal generator; and

a memory, wherein the memory is configured to store a sound model and instructions to cause the processor to perform a method according to any one of claims 1 to 12.

Description:
METHOD AND APPARATUS FOR GENERATION OF AUDIO SIGNALS

FIELD OF THE INVENTION

[0001 ] The present invention relates to the generation of audio signals, and in particular to the use of human gesture and the separation of data (animation) from process (model) in the design of generative audio for interactive applications.

BACKGROUND

[0002 ] Sound effects of various types are widely used in fields such as computer games, movies and television. Rather than attempting to generate the desired sound with physical objects, it is common to use digital synthesisers of various types to generate sound models which may have one or more input parameters, e.g. volume or duration, to allow different sounds to be generated from a single model. Such an arrangement is depicted in Figure 1A which shows an application 1 which provides a control signal to an interface 2 that controls a numerical model 3.

[0003 ] The design of more complex, interactive sound models (e.g. for games, mobile devices, VR, etc.) has so far been limited to traditional programming languages and some higher-level tools such as Puredata, Max/MSP and Reaktor. While each of these tools enables the development of complex sound models, they lack the playful interaction involved in conventional sound design practice and require a high degree of specialised expertise. Thus, generative audio is an underdeveloped field in interactive media and has been neglected in the industry, with few compelling cases where the sound quality and production efficiency match those of recorded samples. Current endeavours, both commercial and academic, focus on the algorithms constituting the design of sound models. A more complex arrangement is shown in Figure 1B, where the numerical model 3 is divided into a behaviour abstraction and a signal chain 4.

[0004 ] There have been cases of digital interfaces being used to generate sounds for non-interactive media (i.e. film and animation). For example, Ben Burtt (Skywalker Sound) used a graphics tablet with Symbolic Sound's Kyma system to control parameters of a sound effects processor when designing the sound for the film 'Wall-E'. In music production, MIDI interfaces and other controllers are used extensively to record "automation curves" for controlling audio effects and synthesis parameters. These technologies rely on conventional point-and-click user interfaces (e.g. AudioGaming), microphones (e.g. Dehumaniser) and trackpads or graphics tablets (e.g. GameSynth, Kyma). Kyma relies on bespoke hardware.

SUMMARY

[0005 ] According to the present invention, there is provided a method of generating an audio signal, the method comprising:

receiving a real-time data stream from a physical sensor;

transforming the real-time data stream using a predefined transformation to generate a control signal, the control signal comprising a time sequence of values;

storing the control signal as animation data;

applying the control signal to one or more parameters of a sound model; and

generating an audio signal from the sound model.

[0006 ] According to the present invention, there is also provided a computer-implemented method of generating an interactive sound effect, the method comprising:

receiving a user selection of a sound model having a plurality of input parameters;

receiving a real-time data stream from a physical sensor;

transforming the real-time data stream using a predefined transformation to generate a control signal, the control signal comprising a time sequence of values;

storing the control signal as animation data; and

generating a sound effect module having at least one input meta-parameter by combining the animation data and the sound model.

[0007 ] According to the present invention, there is also provided a computer-implemented method of generating an interactive sound effect, the method comprising:

receiving a user selection of a sound model having a plurality of input parameters;

generating a control signal in response to user input, the control signal comprising a time sequence of values;

storing the control signal as animation data; and

generating a sound effect module having at least one input meta-parameter by combining the animation data and the sound model.

[0008 ] According to the present invention, there is also provided a computer program comprising code means that, when executed by a computer system, instruct the computer system to perform a method as described above.

[0009 ] According to the present invention, there is also provided a sound effect module comprising code means that, when executed by a computer system, instruct the computer system to:

receive a control input; and

generate an audio signal in response to the control input using animation data to control a sound model.

[0010 ] According to the present invention, there is also provided an apparatus for generating a sound effect, the apparatus comprising:

a physical sensor;

a processor;

an audio signal generator; and

a memory, wherein the memory is configured to store a sound model and instructions to cause the processor to perform a method as described above.

[0011] Embodiments of the present invention can therefore provide a novel approach to designing computational audio behaviour that incorporates gestural interaction into the design process. The present invention provides new methods, and apparatus for implementing the methods, that aim to mitigate at least some problems of known systems by separating the computational audio model from its expressible behaviours.

[0012] Methods according to embodiments of the present invention may include one or more of the following steps, which are described in more detail below:

• Use and transformation of physical gestures to generate useful control signals

• Real-time routing of control outputs to parameters of a sound model

• Recording of control outputs to a library of animations

• Creation of meta-parameters to control playback and blending of animations

• Exporting of sound model, animation library and playback interface into separate standalone software or plugin

[0013] The present invention proposes to include human performance not only as a means of exploring a sound model but also as a fundamental tool in the design of its implementation in an interactive scenario. On the one hand, this makes it feasible to reuse existing sound models while producing unique sounds (removing the cost of developing models internally). On the other hand, embodiments of the invention enable fast and playful sound design, which is a crucial factor in designing assets for modern games. Furthermore, a sound can easily be adjusted or replaced by recording a new gesture or editing keyframe data, rather than re-recording the sound in a studio or on location. A direct comparison can be drawn to computer animation, where a three-dimensional model and its behaviour are considered as two separate entities. This modularity has made graphical asset production more efficient, powerful and scalable.

[0014] By use of embedded computing, embodiments of the invention can provide near-zero latency (less than 2 ms), high-resolution sensor readings and a high degree of modularity in physical sensor inputs. For example, a force-sensitive resistor can be combined with an accelerometer to provide four independent degrees of freedom in one hand. Furthermore, because the sensors are sampled at audio rate it is possible to combine them with everyday objects and therefore augment traditional Foley practices. Similar performance could be achieved in commercial/consumer-grade implementations.

[0015] Most prior art products rely on a 'one-to-one' mapping of runtime game parameters to the audio generator or are used to generate static waveform assets. In contrast, the present invention has a modular design and can be used for a variety of purposes. Embodiments of the invention can be used to program interactive audio projects without requiring especially high-powered or special-purpose processors: a prototype enabled extremely fast prototyping by routing sensor signals to synthesis parameters at run-time. Animation data does not need to be encapsulated inside a separate audio plugin, but can instead be exported to accessible formats such as XML and be integrated into existing workflows. Enzien Audio's cloud compiler makes it possible to produce highly efficient and portable code, such that projects can be exported to run on PC, Mac OS X™, Linux, consoles, Android™, iOS™ and popular game engines and audio middleware at the click of a button. The present invention can also accept conventional physical inputs instead (e.g. mouse, keyboard, graphics tablet, iPad™, Bluetooth™ devices). Similarly, while Enzien Audio's Heavy compiler adds very useful functionality to the prototype, it is not required for other implementations of the proposed workflow.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The present invention is described below with reference to exemplary embodiments and the accompanying drawings, in which:

Figures 1A and 1B illustrate audio signal generation systems according to the prior art;

Figure 2 depicts an audio signal generation system according to an embodiment of the invention;

Figure 3 depicts a known sound effect module;

Figure 4 depicts a sound effect module according to another embodiment of the invention;

Figure 5 depicts breakpoints in an animation generated by a method of the invention; and

Figure 6 depicts a sound model according to an embodiment of the invention.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0017] In an embodiment of the invention, real-time sensor inputs are processed using control layers, which can be implemented in software or hardware. Control layers can read several streams of data from physical sensors (e.g. accelerometers, capacitive touch sensors, potentiometers) and process them according to a user-defined transformation (control layers can also generate their own data without the use of a physical input). Desirably the data streams from the sensors are sampled at a rate of 5 kHz, 10 kHz, 20 kHz or greater. In an embodiment a standard audio sampling rate is used, e.g. 44.056 kHz, 44.1 kHz or 48 kHz. Examples of transformations include smoothing, differentiation and more complex ones such as friction simulation. Control layers can be re-used for different sensor configurations and projects, and can easily be stored as templates or examples for later implementation.
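
By way of illustration only (the embodiments described here implement control layers as Puredata patches compiled to C), a control layer that smooths a raw sensor stream and then differentiates it could be sketched in Python as follows; the sample values, sampling rate and filter coefficient are assumptions of the example.

    # Illustrative sketch of a control layer: smoothing followed by
    # differentiation of a raw sensor stream. Sample values and the
    # sampling rate are assumed for the example.

    SAMPLE_RATE = 44100  # Hz; audio-rate sensor sampling is assumed

    def smooth(samples, alpha=0.01):
        """One-pole low-pass (exponential) smoothing of a sensor stream."""
        out, state = [], samples[0] if samples else 0.0
        for x in samples:
            state += alpha * (x - state)
            out.append(state)
        return out

    def differentiate(samples, sample_rate=SAMPLE_RATE):
        """Finite-difference derivative, e.g. turning position into velocity."""
        return [0.0] + [(b - a) * sample_rate for a, b in zip(samples, samples[1:])]

    # Example: a short burst of (assumed) accelerometer readings.
    raw = [0.0, 0.1, 0.4, 0.9, 1.0, 0.7, 0.3, 0.1, 0.0]
    control_signal = differentiate(smooth(raw))
    print(control_signal)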

[0018] Outputs from these control layers can be routed to the parameters of a sound model (also referred to herein as a synthesiser) in real-time without interrupting the audio output, allowing for quick exploration and experimentation. The sound model can be chosen from a pre-existing library, be extended by the user or programmed by the user. The sound model need not adhere to physical principles and can instead have abstract parameter input descriptions such as 'pitch', 'roughness', 'brightness', and so forth. The user can perform behaviours (i.e. parameter trajectories) using the physical sensors. Performances can be recorded at any time. Once recorded, all corresponding sensor data is stored in a library of animations, which can be processed further by the user. Animations can be stored as animation data in any convenient storage medium, whether long-term or temporary, e.g. in a buffer. Each animation can be stored in its own file or multiple animations can be stored in a single file, e.g. in the form of a table.
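
Purely as an illustrative sketch (not the file or table format used by any embodiment), a recorded performance could be kept in an in-memory animation library keyed by name; the parameter names and values below are hypothetical.

    # Minimal sketch of an animation library: recorded control signals are
    # stored as named time sequences (one list of values per parameter).
    # All names and values here are hypothetical.

    animation_library = {}

    def record_animation(name, parameter_streams):
        """Store one recording: a mapping of parameter name -> list of values."""
        animation_library[name] = {param: list(values)
                                   for param, values in parameter_streams.items()}

    # Example: a short recorded performance for a wind-like model (values assumed).
    record_animation("gust_01", {
        "frequency": [200, 220, 300, 450, 400],
        "gain":      [0.1, 0.2, 0.5, 0.9, 0.4],
    })
    print(sorted(animation_library))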

[0019 ] The model of the invention is depicted in Figure 2. Inputs are provided by a performer 10, e.g. using physical sensors, and/or an interactive application 15. These inputs are converted to behaviours 11 expressed in timbre space which are applied through interface 12 to the sound model 13. The sound model 13 comprises a perceptual abstraction layer (e.g. using perceptual parameters such as 'Pitch', 'Brightness', 'Roughness', 'Gain' etc.) which controls a signal chain 14.

[0020 ] In an embodiment, the sensor data is converted from a regularly sampled stream into a series of keyframes (also known as 'breakpoints') in order to maintain a small file size and enable more natural blending techniques between animations (described below). Processing of recorded sensor data involves smoothing the data and setting start and end points. In addition, the user can choose to repeat the keyframe reduction process by specifying the number of desired keyframes per second. Animations can be auditioned at any time during this process.
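
The keyframe reduction step could be approximated as in the following Python sketch, which thins a regularly sampled control signal to a requested density of keyframes per second; uniform decimation is an assumption of the example, and other reduction strategies (e.g. error-driven selection) are equally possible.

    # Sketch of keyframe reduction: a regularly sampled control signal is
    # reduced to (time, value) keyframes at a requested density. Uniform
    # spacing is an assumption; other reduction strategies are possible.

    def reduce_to_keyframes(samples, sample_rate, keyframes_per_second):
        step = max(1, int(sample_rate / keyframes_per_second))
        keyframes = [(i / sample_rate, samples[i]) for i in range(0, len(samples), step)]
        last_t = (len(samples) - 1) / sample_rate
        if keyframes[-1][0] != last_t:            # always keep the final sample
            keyframes.append((last_t, samples[-1]))
        return keyframes

    samples = [0.0, 0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1]   # assumed data
    print(reduce_to_keyframes(samples, sample_rate=8, keyframes_per_second=4))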

[0021 ] After accumulating the desired number of animations, an exporting tool can be used to export the sound model and animation library into a standalone plugin. The exporting tool allows the user to define a set of meta-parameters which are used to automatically play back and blend animations. Multiple animations can be blended in real-time according to animation weightings specified by the user.

[0022 ] Keyframe data is warped on both temporal and vertical axes, allowing for natural transitions from one configuration to the next. The chosen 'meta-parameters' are exposed in the standalone plug-in as parameters which can easily be driven from a game engine or any other interactive environment.
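
A minimal sketch of weighted blending with warping on the time and value axes is given below, assuming linear interpolation, normalised time and normalised weights; none of these choices are prescribed by the embodiments described above.

    # Sketch of blending keyframed animations under user-specified weights.
    # Each animation is a list of (time, value) keyframes; the time spans are
    # warped onto a shared normalised time axis and the values are blended.
    # Linear interpolation and weight normalisation are assumptions.

    def value_at(keyframes, t):
        """Linear interpolation of a keyframe list at normalised time t in [0, 1]."""
        duration = keyframes[-1][0]
        t = t * duration
        for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
            if t0 <= t <= t1:
                frac = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
                return v0 + frac * (v1 - v0)
        return keyframes[-1][1]

    def blend(animations, weights, t):
        """Weighted blend of several animations at normalised time t."""
        total = sum(weights) or 1.0
        return sum(w * value_at(anim, t) for anim, w in zip(animations, weights)) / total

    calm = [(0.0, 0.1), (1.0, 0.2), (2.0, 0.1)]    # assumed keyframe data
    gust = [(0.0, 0.2), (0.5, 0.9), (1.5, 0.3)]
    print(blend([calm, gust], weights=[0.7, 0.3], t=0.5))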

[0023 ] Embodiments of the present invention can advantageously be implemented using an embedded audio platform, such as Bela, which is based on the BeagleBone Black and developed at the Centre for Digital Music, Queen Mary, University of London. User interface (UI) software of an embodiment communicates with the Bela board via scripts and UDP network messages. The user is able to address each analog input and output, as well as stereo audio input and output on the board. This can be done using Puredata patches which are converted into vanilla C code, for example using the Heavy Cloud Compiler. The generated code is used in conjunction with specially developed code which enables the user to easily address inputs and outputs of the board using conventional Puredata objects that would not otherwise be used for this purpose.

[0024 ] An embodiment of the invention can automatically generate Puredata patches, upload patches to Enzien Audio's compiler servers, send the generated code to the BeagleBone Black (plus Bela cape), interface with Bela (e.g. to let generated code be compiled on the board), run the generated executable on the board and monitor its output without the user having to leave the software environment. For each sensor control layer generated by the user, the output is monitored by the Bela program and sent to User Interface (UI) software running on the host machine, which in turn visualises the data, e.g. in the style of an oscilloscope. The user is able to interact with the visualisation with the mouse in order to zoom in and scroll. Audio output RMS values are also sent back to the host machine in order for the user to monitor the output level.
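
The host-side monitoring could, for example, receive control-layer values over UDP as in the following sketch; the port number and packet format (little-endian 32-bit floats) are assumptions, not the protocol actually used between the Bela board and the UI software.

    # Minimal sketch of host-side monitoring: control-layer values arriving
    # over UDP are read and could then be plotted oscilloscope-style.
    # The port number and the packet format (little-endian float32) are assumed.

    import socket
    import struct

    HOST, PORT = "0.0.0.0", 9001          # assumed port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((HOST, PORT))
    sock.settimeout(5.0)

    try:
        data, _addr = sock.recvfrom(4096)
        values = struct.unpack("<%df" % (len(data) // 4), data)
        print("received %d samples, peak %.3f" % (len(values), max(map(abs, values), default=0.0)))
    except socket.timeout:
        print("no sensor data received")
    finally:
        sock.close()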

[0025 ] Enzien Audio's Heavy Cloud Compiler is a service that allows a user to upload Puredata patches and receive optimised C code (amongst other outputs) in return. The sound model and control layers specified by the user are expressed as Puredata patches with the aid of specially developed Puredata abstractions that are designed to present the user with a more readable layout. After the user has configured the sound model and control layers (either from within the software or using Puredata), the software can process the corresponding Puredata patches and generate new ones, removing the user-friendly abstractions and replacing them with all the necessary code to produce the expected interaction with the Bela platform's features. The generated patch is then sent to Enzien Audio's servers via a Python script (developed and supplied by Enzien Audio) and C code compatible with the Bela platform is received in return. This code is then copied over to the board, compiled on the board and launched. The whole process takes approximately 5 to 30 seconds, depending on the complexity of the synthesiser and control layers designed by the user.
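
The copy-compile-launch round trip onto the board could be scripted along the following lines; the host address, paths and build command are placeholders, and the upload to Enzien Audio's servers (performed by their supplied Python script) is not reproduced here.

    # Hypothetical sketch of the deploy step: copy generated C code to the board,
    # then compile and run it there over SSH. The address, paths and build
    # command are placeholders, not the commands used by any embodiment.

    import subprocess

    BOARD = "root@bela.local"            # placeholder address
    SRC   = "generated/render.cpp"       # placeholder path to generated code
    DEST  = "/root/project/render.cpp"   # placeholder path on the board

    subprocess.run(["scp", SRC, f"{BOARD}:{DEST}"], check=True)
    subprocess.run(["ssh", BOARD, "cd /root/project && make && ./run.sh"], check=True)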

[0026 ] The content generated by the user consists of control layers and a synthesiser. A control layer consists of a Puredata patch that takes sensor readings and processes them to produce a single continuous audio output (e.g. at a 44.1 kHz sampling rate and 16-bit resolution). The synthesiser is also a Puredata patch, which produces sound based on the values of public parameters (exposed to the UI using specially developed Puredata abstractions). Once the UI session has been compiled on the board, the user can route the outputs of each control layer to the public parameter inputs of the synthesiser using the provided user interface. A public parameter can also be given a constant value rather than be controlled by the output of a control layer. This is all done in real-time without needing to re-compile the project.
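
The run-time routing described above can be pictured as a routing table consulted on every processing block, as in this sketch; the parameter names and sources are assumed.

    # Sketch of run-time routing: each public synthesiser parameter is fed either
    # by a control-layer output or by a constant, and the routing table can be
    # changed at any time without recompiling. Parameter names are assumed.

    routing = {
        "frequency": ("control_layer", 0),   # take the value of control layer 0
        "gain":      ("constant", 0.5),      # hold a fixed value
    }

    def apply_routing(control_layer_values, synth_params):
        """Update synthesiser parameters from the current routing table."""
        for param, (source, arg) in routing.items():
            if source == "control_layer":
                synth_params[param] = control_layer_values[arg]
            else:                             # constant value
                synth_params[param] = arg

    synth_params = {}
    apply_routing(control_layer_values=[440.0], synth_params=synth_params)
    print(synth_params)   # {'frequency': 440.0, 'gain': 0.5}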

[0027 ] Each of the public parameters can be enabled for recording. Upon toggling a global 'record' button, all inputs to synthesiser parameters (including static values) are recorded on the board (as raw binary data at audio sampling rate and resolution). When finished recording, the data is sent back to the host machine. The UI software then processes this data and represents it as an 'animation'. The user can choose to edit this animation using an animation editor.

[0028 ] The animation editor contains two viewing methods: a 'track view' and a 'master view'. In the master view the user is presented with an overview of all the tracks and their corresponding data and can select the start and end points of the animation. The animation can also be made 'seamless' by cross-fading a specified length of time at the beginning and end of the time selection (making it possible to loop the recorded data without hearing any transients at the looping point). In the track view each individual track of recorded data can be edited. The data can be smoothed (by applying a moving-average filter) and reduced to keyframes.
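
The 'seamless' edit and the moving-average smoothing could be sketched as follows; a linear cross-fade over a user-chosen number of samples is assumed, and the editor could equally use other fade curves.

    # Sketch of the 'seamless' edit: the last `fade` samples are crossfaded into
    # the first `fade` samples so the recording loops without a transient.
    # A linear crossfade is assumed.

    def moving_average(samples, width=5):
        """Simple moving-average smoothing, as described for the track view."""
        half = width // 2
        return [sum(samples[max(0, i - half):i + half + 1]) /
                len(samples[max(0, i - half):i + half + 1])
                for i in range(len(samples))]

    def make_seamless(samples, fade):
        """Crossfade the tail into the head and drop the tail."""
        head, body, tail = samples[:fade], samples[fade:len(samples) - fade], samples[-fade:]
        mixed = [h * (i / fade) + t * (1 - i / fade)
                 for i, (h, t) in enumerate(zip(head, tail))]
        return mixed + body

    data = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 0.7, 0.4, 0.2, 0.05]   # assumed recording
    print(make_seamless(moving_average(data, 3), fade=2))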

[0029 ] Keyframes are two-dimensional vectors that represent a value and a time. They are commonly used in animation software and are often referred to as 'breakpoint envelopes' in audio workstations. The user can set the desired density of keyframes (in units of keyframes per second). The editor also provides information to the user such as the duration of the animation and the number of keyframes it contains. An example of keyframe data for a wind sound effect model is shown in Figure 5. The parameters frequency, F, quality factor, Q, and gain, G, are defined at specific time points. The parameters can change linearly between the defined values or have more complex behaviour.
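
For the wind example of Figure 5, the keyframe tracks for F, Q and G could be represented and evaluated with linear interpolation as in the sketch below; the keyframe values are invented for illustration.

    # Sketch of keyframe tracks for the wind example: frequency F, quality
    # factor Q and gain G are each a list of (time, value) keyframes,
    # interpolated linearly between the defined points. Values are assumed.

    def interpolate(track, t):
        """Piecewise-linear evaluation of a (time, value) keyframe track."""
        if t <= track[0][0]:
            return track[0][1]
        for (t0, v0), (t1, v1) in zip(track, track[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return track[-1][1]

    tracks = {                      # assumed keyframe data, times in seconds
        "F": [(0.0, 300.0), (1.0, 800.0), (2.0, 400.0)],
        "Q": [(0.0, 2.0),   (1.5, 8.0)],
        "G": [(0.0, 0.1),   (0.5, 0.9),   (2.0, 0.2)],
    }
    print({name: round(interpolate(track, 1.0), 2) for name, track in tracks.items()})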

[0030 ] Keyframe data can easily be exported to XML, allowing it to be used in common animation software, audio middleware and game engines. The synthesiser itself can be exported to an audio plugin (VST, AU, Wwise, Unity) using Enzien Audio's cloud compiler. The software is also capable of generating a further Puredata patch that contains the synthesiser as well as the animation data. The Puredata patch is designed to be opened on the host machine (rather than compiled onto the Bela platform, though this is also possible). Within this patch the user is able to play back and blend recorded animations. Animation times and values are stored in separate tables, making it possible to interpolate multiple parameter trajectories in both dimensions of time and value. Multiple animations can be interpolated at any given time.
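
An XML export of keyframe data, with times and values kept separate for each track as described above, might look like the following sketch; the element and attribute names are assumptions rather than a documented schema.

    # Sketch of exporting keyframe data to XML, with times and values kept as
    # separate elements per track. The element and attribute names are assumed.

    import xml.etree.ElementTree as ET

    def export_animation(name, tracks):
        """tracks: dict mapping parameter name -> list of (time, value) keyframes."""
        root = ET.Element("animation", name=name)
        for param, keyframes in tracks.items():
            track = ET.SubElement(root, "track", parameter=param)
            ET.SubElement(track, "times").text = " ".join(str(t) for t, _ in keyframes)
            ET.SubElement(track, "values").text = " ".join(str(v) for _, v in keyframes)
        return ET.tostring(root, encoding="unicode")

    print(export_animation("gust_01", {"F": [(0.0, 300.0), (1.0, 800.0)]}))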

[0031 ] The generated patch allows the user to easily generate new parameters (or 'meta-parameters') to control the weighting of animations and their playback. After creating a meta-parameter the user can store the state of the patch (i.e. the individual weightings of the animations) by clicking a button corresponding to the lower or upper value of a meta-parameter.
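
A meta-parameter of this kind can be thought of as interpolating between two stored weighting snapshots, as in this sketch; the snapshot values are hypothetical.

    # Sketch of a meta-parameter: two stored states (the animation weightings at
    # the meta-parameter's lower and upper values) are interpolated as the
    # meta-parameter moves between 0 and 1. The stored weightings are assumed.

    lower_state = {"calm": 1.0, "gust": 0.0}    # snapshot stored at value 0
    upper_state = {"calm": 0.2, "gust": 1.0}    # snapshot stored at value 1

    def animation_weights(meta_value):
        """Linearly interpolate the stored weighting snapshots."""
        m = min(max(meta_value, 0.0), 1.0)
        return {name: (1 - m) * lower_state[name] + m * upper_state[name]
                for name in lower_state}

    print(animation_weights(0.25))   # e.g. an 'intensity' meta-parameter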

[0032 ] Animations can be played back in three different ways: 'trigger', 'loop' or 'scrub'. In 'trigger' mode, the patch contains a button-style parameter to play back the (blended) animation from beginning to end. In 'loop' mode, the patch contains a toggle-style parameter to toggle playback, and a continuous parameter controlling the speed of the playback. In 'scrub' mode, a continuous parameter controls the time position of the blended animation. Other playback modes are possible. A Puredata patch representing an animation can be exported to a separate plugin.
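
The three playback modes can be summarised as different mappings from a control input to a playback position, as sketched below; the timing conventions are assumptions of the example.

    # Sketch of the three playback modes. Each mode turns its control input into
    # a playback position within the blended animation; timing details assumed.

    def trigger_position(elapsed, duration):
        """'Trigger': play once from beginning to end after a button press."""
        return min(elapsed, duration)

    def loop_position(elapsed, duration, speed=1.0):
        """'Loop': repeat continuously, with a speed parameter."""
        return (elapsed * speed) % duration

    def scrub_position(control, duration):
        """'Scrub': a continuous parameter in [0, 1] sets the position directly."""
        return min(max(control, 0.0), 1.0) * duration

    duration = 2.0   # seconds, assumed animation length
    print(trigger_position(2.5, duration), loop_position(2.5, duration, 1.5),
          scrub_position(0.3, duration))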

[0033 ] An example of a sound effect according to an embodiment of the invention is shown in Figure 6. This is a wind sound effect. The playback behaviour 11, e.g. generated in real time from a user's performance captured by a physical sensor or output by an interactive application, is converted by control layer 22 into the parameters frequency, F, quality factor, Q, and gain, G. These parameters are applied to sound model 14. Sound model 14 comprises a signal chain including, e.g., a white noise generator 14a and a bandpass filter 14b.

[0034 ] A known sound effect model useable with an interactive application is shown in Figure 3. The interactive application 20, e.g. a computer game, generates outgoing parameters 21, e.g. as a result of user action. Such parameters may be multi-valued variables, for instance representing the speed or force and direction of a user's movement. The outgoing parameters 21 output by the interactive application are received by interface 22 and applied to the relevant parameter inputs of sound model 13.
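
Returning to the wind effect of Figure 6, the white-noise-plus-band-pass signal chain controlled by F, Q and G could be sketched as follows; the specific band-pass design (an RBJ-style biquad) is one reasonable choice and is not prescribed by the embodiment.

    # Sketch of the wind model of Figure 6: white noise fed through a band-pass
    # filter whose centre frequency F, quality factor Q and gain G are the
    # controlled parameters. The RBJ biquad band-pass formula is used here as
    # one possible filter; the embodiment does not prescribe a specific design.

    import math
    import random

    def bandpass_noise(n_samples, F, Q, G, sample_rate=44100):
        w0 = 2 * math.pi * F / sample_rate
        alpha = math.sin(w0) / (2 * Q)
        b0, b1, b2 = alpha, 0.0, -alpha
        a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for _ in range(n_samples):
            x = random.uniform(-1.0, 1.0)                 # white noise source
            y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
            x2, x1, y2, y1 = x1, x, y1, y
            out.append(G * y)
        return out

    samples = bandpass_noise(1024, F=500.0, Q=4.0, G=0.8)
    print(max(abs(s) for s in samples))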

[0035 ] A sound effect module according to an embodiment of the invention is shown in Figure 4. In this module an intermediary layer 23 is used to transform the outgoing parameters 21 from the interactive application 20. The intermediary layer 23 comprises one or more animations as described above and may also be referred to as a control layer. The intermediary layer 23 may transform the outgoing parameters 21 from a physical parameter space to a perceptual or other parameter space. The transformed parameters are combined with performance data 24, which may be live or prerecorded, and applied to the interface 32.

[0036 ] Having described an embodiment of the invention, it will be appreciated that variations of the above embodiments are possible. The present invention is therefore not to be limited by the above description but only by the appended claims.