Title:
SYSTEM FOR PROCESSING MUSIC
Document Type and Number:
WIPO Patent Application WO/2019/115987
Kind Code:
A1
Abstract:
With reference to Figure 1 we provide a system and method for generating audio data for performance on an audio device using an audio generation system, the method comprising the steps of: receiving an audio profile request indicating one or more values, ranges of values or identifiers associated with audio file metadata, accessing a database of audio file metadata records each record associated with a corresponding audio file, determining based on at least the audio profile request and the accessed audio file metadata a plurality of matching audio files, and generating audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point.

Inventors:
SHEPPARD DAVID (GB)
Application Number:
PCT/GB2018/052826
Publication Date:
June 20, 2019
Filing Date:
October 04, 2018
Assignee:
POB ENTERPRISES LTD (GB)
International Classes:
G10H1/00
Domestic Patent References:
WO2005057821A22005-06-23
WO2018002650A12018-01-04
Foreign References:
EP1791111A12007-05-30
EP1876583A12008-01-09
US20170092245A12017-03-30
US20040106395A12004-06-03
US20150093729A12015-04-02
US20140254831A12014-09-11
Other References:
None
Attorney, Agent or Firm:
FORRESTERS IP LLP (GB)
Claims:
CLAIMS

1. A method of generating audio data for performance on an audio device using an audio generation system, the method comprising the steps of:

generating an audio profile request indicating one or more values, ranges of values or identifiers,

accessing a database of audio file metadata records each record associated with a corresponding audio file,

determining based on at least the audio profile request and the accessed audio file metadata a plurality of matching audio files, and

generating audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point.

2. The method of claim 1, in which the audio profile request comprises one or more identifiers each indicating a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, and a duration.

3. The method of claim 1 or claim 2, wherein each audio file metadata record includes data representing one or more of a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, a chord progression, and a duration.

4. The method of any preceding claim, wherein generating the audio profile request includes generating the audio profile request at a remote user device associated with a user, and communicating the audio profile request to the audio generation system.

5. The method of any one of claims 1 to 3, further comprising the steps of communicating user data from a remote user device associated with a user to the audio generation system, and generating at the audio generation system the audio profile request in response to said received user data.

6. The method of claim 4 or claim 5, wherein the user data includes one or more of sensor data, stored user profile data, and data input via a user interface.

7. The method of any preceding claim, further comprising outputting the audio output data to an audio device.

8. The method of claim 7, where dependent on any one of claims 4 to 6, in which the remote user device includes the audio device.

9. The method of any preceding claim, wherein the step of determining the plurality of matching audio files includes comparing one or more identifiers to audio file metadata associated with the audio files.

10. The method of any preceding claim, including a step of updating a user status.

11. The method of claim 10, wherein the step of updating the user status comprises receiving at least one of: a WiFi signal, a Bluetooth signal, a GPS signal.

12. The method of claim 10 or claim 11, wherein the step of updating the user status comprises receiving a user input.

13. The method of any one of claims 10 to 12, wherein the step of updating the user status comprises detecting a time within a predetermined time range, or a lapse of a predetermined time period, or detecting a time relative to a predetermined user schedule.

14. The method of any one of claims 10 to 13, wherein the step of updating the user status comprises receiving sensor data from one or more sensors associated with the remote user device and comparing the sensor data to a range of values or a threshold value.

15. The method of any preceding claim, further including storing a plurality of audio profiles each comprising one or more identifiers, values or ranges of values, and generating an audio profile request includes retrieving a stored audio profile.

16. The method of claim 15 where dependent either directly or indirectly on claim 10, including a step of updating the user status and determining an audio profile associated with the user status.

17. The method of claim 16, wherein updating the user status and determining an audio profile associated with the user status includes determining that no further audio output data is required and causing the system to generate no further audio output data.

18. The method of any preceding claim, wherein the remote user device is a portable smart device.

19. The method of any preceding claim, wherein the audio output data comprises an audio stream of a predetermined time duration.

20. The method of claim 19, wherein the time duration is selected by a user.

21. The method of any preceding claim, wherein generating audio output data comprises generating an audio data file of a predetermined time duration wherein a plurality of layers are established, each layer comprising one or more audio files arranged in time sequential order, wherein audio corresponding to a combination of at least two of the layers is represented at any time-point.

22. The method of any one of claims 1 to 10, wherein generating audio output data comprises generating an audio data stream wherein a plurality of layers are established, each layer comprising one or more audio files arranged in time sequential order, wherein audio corresponding to a combination of at least two of the layers is represented at any time-point, and further including repeating the steps of accessing the database of audio file metadata records and determining a plurality of matching audio files so as to identify additional audio files from which to generate audio output data.

23. The method of claim 22, wherein repeating the steps of accessing the database of audio file metadata records and determining a plurality of matching audio files so as to identify additional audio files from which to generate audio output data includes first generating a further audio profile request different to the original audio profile request,

accessing the database of audio file metadata records and determining based on at least the further audio profile request and the accessed audio file metadata a further plurality of matching audio files, and

generating audio output data by layering the further plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of those matching audio files is represented at any time-point.

24. The method of any preceding claim, further including the steps of:

receiving an audio file and storing the audio file on a storage device associated with the audio generation system,

determining audio file metadata to be associated with the audio file based on the content of the audio file, and storing the determined audio file metadata in the audio file metadata record database.

25. A system for generating audio data for performance on an audio device, the system comprising:

an audio generation server configured to:

receive or generate an audio profile request indicating one or more values, ranges of values or identifiers,

access a database of audio file metadata records, each record being associated with a corresponding audio file,

determine a plurality of matching audio files based on at least the audio profile request and the accessed audio file metadata,

generate audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point.

26. A system according to claim 25, wherein the audio generation server comprises a data storage device configured to store the database of audio file metadata records.

27. A system according to claim 25 or claim 26, wherein the data storage device also stores the audio files.

28. A system according to any one of claims 25 to 27, wherein the audio generation server is operable to receive an audio profile request indicating one or more values, ranges of values or identifiers.

29. A system according to any one of claims 25 to 28, wherein the audio generation server is operable to generate an audio profile request indicating one or more values, ranges of values or identifiers.

30. A system according to any one of claims 25 to 29, wherein each audio file metadata record includes data representing one or more of a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, a chord progression, and a duration.

31. A system according to any one of claims 25 to 30, wherein the audio profile request comprises one or more identifiers each indicating a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, and a duration.

32. A system according to any one of claims 25 to 31, wherein the audio generation server is operable to receive user data, and to generate the audio profile request in response to user data.

33. A system according to claim 32, wherein the user data includes one or more of sensor data, stored user profile data, and data input via a user interface.

34. A system according to any one of claims 25 to 33, operable to output the audio output data to an audio device.

35. A system according to any one of claims 25 to 34, wherein the audio generation server is configured to determine the plurality of matching audio files by comparing one or more identifiers to audio file metadata associated with the audio files.

36. A system according to any one of claims 25 to 35, configured to store a plurality of audio profiles each comprising one or more identifiers, and to generate an audio profile request by retrieving a stored audio profile.

37. A system according to any one of claims 25 to 36, configured to store a user status.

38. A system according to claim 37 wherein the user status is updated in response to a signal received from a remote user device indicating at least one of a sensor input, a user input, a receiver input, and/or a trigger generated by a timer or schedule associated with the user or the remote user device.

39. A system according to claim 37 or claim 38, further configured to determine an audio profile associated with the user status, and to retrieve the determined audio profile to generate the audio profile request.

40. A system according to any one of claims 25 to 39 wherein the audio output data comprises an audio stream of a predetermined time duration.

41. The system according to any one of claims 25 to 40 wherein the time duration is selected by a user.

42. The system according to any one of claims 25 to 41, wherein the audio generation server is configured to generate audio output data by generating an audio data file of a predetermined time duration wherein a plurality of layers are established, each layer comprising one or more audio files arranged in time sequential order, wherein audio corresponding to a combination of at least two of the layers is represented at any time-point.

43. The system according to any one of claims 25 to 41, wherein the audio generation server is configured to generate audio output data by generating an audio data stream wherein a plurality of layers are established, each layer comprising one or more audio files arranged in time sequential order, wherein audio corresponding to a combination of at least two of the layers is represented at any time-point, and further including repeating the steps of accessing the database of audio file metadata records and determining a plurality of matching audio files so as to identify additional audio files from which to generate audio output data.

44. The system according to claim 43, wherein the audio generation server is configured to repeat the steps of accessing the database of audio file metadata records and determining a plurality of matching audio files so as to identify additional audio files from which to generate audio output data by first generating a further audio profile request different to the original audio profile request,

to access the database of audio file metadata records and determine based on at least the further audio profile request and the accessed audio file metadata a further plurality of matching audio files, and

to generate audio output data by layering the further plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of those matching audio files is represented at any time-point.

45. The system according to any one of claims 25 to 44, wherein the audio generation server is further configured to receive an audio file, to determine audio file metadata to be associated with the audio file based on the content of the audio file, and to store the determined audio file metadata in the audio file metadata record database.

Description:
Title: System for processing music

Description of Invention

The present invention relates to a system for processing music, and to systems for monitoring a user in order to adapt music to suit the behaviour of the user.

Music is typically provided to consumers via physical media (i.e. on compact discs, long-playing records, cassettes, or the like) or by digital audio files which may be downloaded, transferred, and stored for playback on a range of audio devices such as personal computers, MP3 players, home audio systems, or any other audio playing systems, as is known in the art. Traditionally, music is recorded either in a recording studio or at a live concert venue, for example. Multiple audio tracks may be recorded and layered, with audio processing effects and volume envelopes (i.e. a fluctuating profile varying the volume over the duration of the track) applied, to create a final audio recording in which the various layers are mixed.

For example, a band playing popular music may comprise a singer, an electric guitarist, an electric bassist, and a drummer. Each instrument (including the vocals of one or more of the musicians being referred to herein as ‘instruments’) may be recorded separately. Recordings may be made via microphones, or by processing and capture of analogue or digital signals produced by electric instruments (e.g. electronic keyboards, electric guitars, etc.), for example. The recordings of the individual instruments may then be combined to produce the effect of the band playing at the same time. The volume levels and audio effects of each recording may be set differently for each instrument. One or more recordings of the same instrument may be layered on top of one another. In this way, a final recording may be produced, which is subsequently released to consumers.

When musicians play live in front of a crowd, each instrument (including vocals) may be recorded separately, either via microphones or by capturing the audio signals produced by electronic instruments or other equipment. Audio tracks recorded in this way are subsequently layered in the same manner, to achieve a desired recording balance. Music is also commonly created electronically using electric instruments such as keyboards, synthesisers and electronic drum machines, for example, without using a microphone to capture any audio signals. The same method applies as outlined above, wherein audio tracks are layered upon one another to create a finished recording. The term “recording”, where used in this context, is intended to mean both recorded audio signals and electronically generated data pertaining to an audio track (even if no microphone has been used to capture or record an audio signal, for example).

In each case outlined above, music is released to consumers in discrete (i.e. separate) audio tracks. Each track has a beginning and an end, and a set duration. For popular music, the duration is commonly between 3 to 4 minutes. A format common across most genres consists of an introduction, a number of verses interspersed with a chorus, some form of ‘middle eight’ section (a transition segment of eight bars in duration) or an instrumental solo, culminating in one or more repetitions of the chorus and an ending. Of course, there are many common formats of music composition, of which this is simply one example.

It is well known for musical tracks to be combined into albums of related music (e.g. related either by the artist recording the music, the composer of the music, or the music relating to a common theme). It is also well known for consumers to group multiple music tracks into a playlist of tracks, again commonly grouped by the artist, composer, or a common theme or style.

This fixed music structure allows only limited flexibility in the way that music is provided to consumers. Consumers are likely to select music from a limited selection of artists, composers and styles with which they are already familiar. Furthermore, consumers may like certain aspects of a song or style of music, yet dislike other aspects of the music. For example, a user may listen to a pop song and decide that they like the general style of the music - the melody, rhythm, the bass line, and the drum beat - but perhaps they dislike the voice or style of the singer. In this case, the user is likely to stop listening to music by that group, since the singer’s voice will be present in the large majority of their music. As another example, the tempo and dynamic volume of music may change throughout each track and throughout an album or playlist. Selecting combinations of songs with a certain tempo or dynamic theme is extremely time intensive and, even when provided with corresponding details of the tempo and dynamic attributes of a track, the majority of music tracks do not adhere to a particular dynamic throughout the entirety of a song. Many songs include relatively fast-tempo sections and relatively slow-tempo sections. The same applies to the key of the music. It is common for pieces of music to contain both major and minor key sections, and therefore the mood created by the music can change dramatically during the course of a song or track.

It is known for tracks to be merged into one another for a brief period towards the end of a track and at the start of a next track in a playlist or album. Typically the current track has its volume faded gradually, and at the same time a subsequent track begins at a low but increasing volume, so that the tracks overlap one another. To the user listening to the resulting music, there is a brief period during which the tracks can both be heard, as the sound shifts from predominantly the first track to predominantly the second track. Of course, where the tempo, rhythm, key and instrumental structure of the tracks differ, the sound produced during this period may be discordant and/or confusing to a user, and the resulting effect is often a clumsy mix of the two tracks.

For all of these reasons, current devices and systems for providing music to consumers fail to provide a mechanism to provide adequately balanced or themed music. Furthermore, traditional methods of providing music fail to provide a sufficient variety of music to a consumer while maintaining a theme or balance to the music that fits the requirements of the consumer. In addition current devices and systems fail to provide a continuous flow of music to a consumer.

Consumers listen to music for differing durations of time, at differing volumes, and with different preferences as to musical style. Many consumers listen to different types of music at different times, selecting an appropriate radio station, playlist, album, or the like, to provide an appropriate sound track to the activity at hand. For example, a user may listen to background classical music when concentrating while at work, to rock music while commuting, and to dance music while jogging.

The present invention seeks to ameliorate or overcome one or more problems associated with the prior art.

According to an aspect of the invention we provide a method of generating audio data for performance on an audio device using an audio generation system, the method comprising the steps of:

receiving an audio profile request indicating one or more values, ranges of values or identifiers associated with audio file metadata, accessing a database of audio file metadata records each record associated with a corresponding audio file,

determining based on at least the audio profile request and the accessed audio file metadata a plurality of matching audio files, and

generating audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point.

According to another aspect of the invention we provide a system for generating audio data for performance on an audio device, the system comprising:

an audio generation server configured to:

receive or generate an audio profile request indicating one or more values, ranges of values or identifiers,

access a database of audio file metadata records, each record being associated with a corresponding audio file,

determine a plurality of matching audio files based on at least the audio profile request and the accessed audio file metadata,

generate audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point.

Further aspects of the above embodiments of the invention are set out in the appended claim set.

Using systems and methods according to the invention, we are able to reconstruct different audio components together in a dynamic manner, based on a number of variable inputs. Inputs may include those received from an abundance of detection technologies that can provide sensory data feedback, from smart devices that are situated in the home or wider environment, and from wearable smart devices and mobile computing technology (such as smart phones and tablets, for example). This allows the system of the present invention to react to sensed or received data in real-time, to cause a user’s audio experience to adapt according to the user’s preferences and according to the situation.

We now describe features of embodiments of the invention, by way of example only, with reference to the accompanying drawings of which

Figure 1 is a diagrammatic representation of the flow of data within a system according to embodiments of the invention,

Figure 2 is a diagrammatic representation of a system according to embodiments of the invention,

Figure 3 is an illustration of an example popular music structure,

Figure 4 is an illustration of an example layered music structure of an audio file,

Figure 5 is a diagrammatic representation of information flow associated with a system according to embodiments of the invention, and

Figures 6 and 7 are diagrammatic examples of user activity profiles.

With reference to the drawings, and in particular to Figures 1 and 2, we describe a system for generating audio data. The system comprises an audio generation server 10 configured to receive or generate an audio profile request indicating one or more values, ranges of values or identifiers associated with audio properties. The audio profile request typically comprises one or more identifiers each indicating a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, and a duration. Of course features such as the tempo of the music or duration of the music may not be exact requirements, and so an approximate value or range of values may be provided in such cases (e.g. a tempo in the range of 140 to 170 beats per minute). The audio generation server 10 is configured to access a database of audio file metadata records, each record being associated with a corresponding audio file, and to determine a plurality of matching audio files based on at least the audio profile request and the accessed audio file metadata. Each audio file metadata record includes data representing one or more of a musician, an instrument, a composer, a genre, a time signature, a key signature, a tempo, a chord progression, and a duration. In this way, the data held in the metadata records can be matched against the requirements set out in the audio profile request, and an appropriate set of audio tracks can be identified. Those audio tracks may be subject to further audio processing and manipulation (of pitch and tempo, for example) as described below.
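By way of illustration only, the matching of an audio profile request against metadata records might be sketched as follows. The class names, field names and matching rules below are assumptions introduced for this example and are not taken from the application itself.

# Illustrative sketch only: class names, field names and the matching rules are
# assumptions introduced for this example, not part of the application as filed.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AudioFileMetadata:
    file_id: str
    instrument: str
    genre: str
    key_signature: str
    tempo_bpm: float
    duration_s: float

@dataclass
class AudioProfileRequest:
    genre: Optional[str] = None                         # identifier match
    key_signature: Optional[str] = None                 # identifier match
    tempo_range: Optional[Tuple[float, float]] = None   # range of values, e.g. (140, 170)

def matches(record: AudioFileMetadata, request: AudioProfileRequest) -> bool:
    # A record matches when every field actually present in the request is satisfied.
    if request.genre is not None and record.genre != request.genre:
        return False
    if request.key_signature is not None and record.key_signature != request.key_signature:
        return False
    if request.tempo_range is not None:
        low, high = request.tempo_range
        if not (low <= record.tempo_bpm <= high):
            return False
    return True

def find_matching_files(records: List[AudioFileMetadata],
                        request: AudioProfileRequest) -> List[AudioFileMetadata]:
    # Determine the plurality of matching audio files for the request.
    return [record for record in records if matches(record, request)]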

The audio generation server 10 generates audio output data by layering the plurality of matching audio files so that in the created data, audio corresponding to a combination of at least two of the matching audio files is represented at any time-point. In other words, the audio files are not simply sequenced in order, so that a first track plays and then a second track plays. The consecutive tracks are not simply cross-faded to overlap one another. Rather, the audio files are superimposed in layers on top of one another (as they are during a music production process when recording music in a studio, for example), to create a multi-layered music track.

So, in simple terms, the system determines what type of music to generate for the user, selects appropriate audio files from which to compose the stream of music, and subsequently mixes the audio stream from those audio files by layering the audio files. The audio files may be processed or manipulated in various ways, as described, to ensure that they are compatible for layering with one another. For example, such processing may involve time shifting and stretching to alter the tempo of an audio track, without altering its pitch. In other circumstances, the pitch of an audio file may be altered to fit with another audio file. The system of the present invention therefore creates an audio stream in real time, based on user requirements, by retrieving pre-recorded audio files and using metadata to create a cohesive audio track.
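Off-the-shelf audio libraries provide this kind of tempo and pitch manipulation. The sketch below, which assumes the librosa and soundfile packages and uses hypothetical file names and tempo values, shows one possible way to adjust a recording so that it fits the tempo and key of other layers; the application does not specify any particular implementation.

# Illustrative only: librosa and soundfile are assumed library choices, and the
# file names and tempo values are hypothetical.
import librosa
import soundfile as sf

y, sr = librosa.load("guitar_layer.wav", sr=None)

# Stretch from an estimated 120 BPM to a target of 140 BPM without altering pitch.
stretched = librosa.effects.time_stretch(y, rate=140 / 120)

# Shift the pitch up two semitones so the layer fits the key of the other files.
shifted = librosa.effects.pitch_shift(stretched, sr=sr, n_steps=2)

sf.write("guitar_layer_adjusted.wav", shifted, sr)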

In typical embodiments, the audio generation server 10 provides a processing device 12, a memory device 14 and a communication device 16 for receiving and sending communications to the one or more other devices within the system, or interacting with the system (those devices 12, 14, 16 being any suitable hardware components which may be integrated components rather than standalone separate devices). The system includes a data storage device 18 configured to store the database of audio file metadata records, and preferably this storage device 18 is comprised by the audio generation server. In embodiments, multiple storage devices may be included, and one or more storage devices 18 may be located remote from the system (providing cloud storage, for example).

In broad terms, the system generates music to be output to a user via an audio device 26, 32. Where used herein, the term audio device 32 is intended to mean any device capable of playing an audio data file or stream of audio data. This is intended to cover music-playing devices such as audio Hi-Fi equipment, personal audio players (such as MP3 players, for example), a radio, or car stereo equipment, for example, for which playing music is the primary purpose of the device. The term is also intended to cover other devices capable of playing audio files, such as personal computers, mobile smart phones, tablets, smart televisions, and any other suitable device.

With reference to Figure 1 of the drawings, we describe a system which broadly incorporates four stages of audio generation, as follows:

1. Creation: the architecture required to generate an audio file and associated descriptive metadata.

2. Storage: storing the audio files and metadata, for use in generating the audio output data.

3. Audio generation: the interpretation of inbound data from one or more user devices, determining user/data requirements and then creating a suitable audio output from retrieved audio files.

4. Consumption and detection: user-based detection technology (i.e. in a smart phone for example) monitoring the behaviour of the user, requesting audio generation, and receiving and playing the generated audio output data. This stage may use a single device both to detect the user status and requirements, and to play the generated audio, or may alternatively include multiple distinct devices.

The audio generation server 10 has access to a storage device 18 which holds a plurality of audio files, from which the audio output data is generated. The audio output data may be in the form of a continuous stream (similar to a radio station or online radio audio stream, for example), or may be in the form of a plurality of discrete or overlapping tracks.

Each of the audio files is typically a single entry containing data relating to a vocal melody for a verse, a series of guitar chords, or a drum beat, for example. It is also possible that the audio file may contain different audio elements combined into one audio file (e.g. vocals and piano). The audio file is typically stored in a suitable audio file format as is known in the art (e.g. WAV, AIFF, AU, FLAC, ALAC (i.e. M4A), MPEG-4, WMA, Vorbis, AAC, ATRAC, MP3, etc.).

While the system is primarily intended to generate music, it is also possible that other audio types may be included in the audio output data. As is the case with radio stations, adverts and spoken audio (such as news reports, travel or weather updates, or other forms of spoken content) may appear amongst music tracks, or may be layered on top of background music, for example. Therefore, references to music and to music files may also encompass other types of audio and audio file.

In embodiments, the audio files may include embedded media data. The embedded data may include text of the lyrics associated with the audio file, or an image (such as a logo or album cover or artist portrait) associated with the audio, or an image / text / video advertisement for example. The embedded data may include a link to another data source, such as a hyperlink to a webpage, for example. The embedded data may be associated with a specific trigger point in the audio file such as a specified time marker at which point the embedded data may be presented to the user consuming the audio file (as part of a generated audio data stream, for example).

In embodiments, the system includes an interface 34 for importing audio files. The imported audio files are stored on a storage device 18, for retrieval by the audio generation server 10. The interface 34 (or a further interface) may be operable to enable inputting, importing or generating metadata to associate with the audio files. The two may be combined to create a digital layer file, which includes both the audio data and the associated metadata. In embodiments, digital layer files are stored as individual files each comprising audio data and metadata. In other embodiments, the audio file and metadata are stored as separate files associated with each other by file identifiers or by system storage data held by the storage or operating system. In general terms, digital layer files combine an audio file with supplemental metadata. The metadata holds information that enables other processes to combine (i.e. to layer and/or sequence) compatible audio files. In this way, audio output data comprises multiple digital layer files, played and mixed either concurrently or in sequence, or a combination of the two. As previously noted, the digital layer files may alternatively be stored as separate audio files and associated metadata files. The metadata may be stored in a database and used to search for metadata having specific properties, as described in more detail below.
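As a minimal sketch of the second storage option described above (audio and metadata stored as separate but associated files), the metadata for each digital layer might be held in a JSON sidecar file next to the audio file. The sidecar naming convention and field names below are assumptions introduced for illustration.

# Illustrative only: the application does not define a concrete file format;
# the sidecar naming convention and field names are assumptions.
import json
from pathlib import Path

def write_digital_layer_metadata(audio_path: str, metadata: dict) -> None:
    # Store the metadata next to the audio file, associated with it by file name.
    sidecar = Path(audio_path).with_suffix(".layer.json")
    sidecar.write_text(json.dumps(metadata, indent=2))

def read_digital_layer_metadata(audio_path: str) -> dict:
    sidecar = Path(audio_path).with_suffix(".layer.json")
    return json.loads(sidecar.read_text())

write_digital_layer_metadata("drums_01.wav", {
    "instrument": "drums",
    "genre": "pop",
    "key_signature": "C major",
    "tempo_bpm": 120,
    "bars": 16,
    "chord_progression": ["C", "G", "Am", "F"],
})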

In general terms, the interface 34 for inputting the audio files performs a combination of the following functions:

• Import audio files (select destination folders / locations, validate the filename suffix / convention, validate the file contents, and then upload into the UI / application).

• Detect attributes of the audio file where possible (duration, key signature, beats per minute, volume range, etc.).

• Ability to input additional static attributes about the audio file (artist, genre, key signature, tempo / beats-per-minute (BPM)).

• Ability to visualise the audio file on a timeline, and determine the number of bars and chord progressions throughout the audio recording.

• Ability to store all of these additional attributes as ‘metadata’ (data about data) - merging it with the audio file into a new proprietary file format.

• Ability to connect to a central repository / storage device to be able to import new or retrieve existing digital layer files.

The interface 34 may perform all of the above functions, or only a subset of one or more of those functions. In embodiments, a single application may be used to perform all of those functions. In other embodiments, multiple applications are provided, running on one or more devices. In embodiments, the audio generation server 10 is further configured for receiving an audio file and storing the audio file on the storage device 18, determining corresponding audio file metadata to be associated with the audio file based on the content of the audio file, and storing the determined audio file metadata in the audio file metadata record database. In other words, the system may provide an automated analysis tool, for reading input audio files, and determining one or more metadata records to associate with that file to be stored as a digital layer file. The metadata analysed automatically may not be complete, in which case a user may need to input any missing data manually. Alternatively, metadata may be sought from online shared resources or cloud-based resources, to complete any blanks in the metadata record.
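A minimal sketch of such an automated analysis step is given below. It assumes the librosa package (an illustrative library choice, not one named in the application) and detects only the attributes that can be estimated directly from the audio content.

# Illustrative only: librosa is an assumed library choice; attributes not
# detectable from the audio content (artist, genre, etc.) are left for manual
# entry or retrieval from an online resource, as described above.
import librosa
import numpy as np

def detect_attributes(audio_path: str) -> dict:
    y, sr = librosa.load(audio_path, sr=None)
    tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)   # estimated beats-per-minute
    return {
        "duration_s": librosa.get_duration(y=y, sr=sr),
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
    }

print(detect_attributes("drums_01.wav"))   # hypothetical file name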

There is a need to access the metadata associated with the audio files in the digital layer files (or associated metadata records). In embodiments, the metadata is stored in a database schema that enables rapid querying from system processes (an illustrative sketch of such a schema is given after the list below). Additional usage data (e.g. usage statistics) or preference data (e.g. user preferences) may be stored within the same schema, or may alternatively be stored in one or more separate databases. The database(s) include one or more of the following records:

• End user details - identification details and preferences

• User detection devices - metrics provided and how these may influence the music played (e.g. heart-rate data from fitness trackers may impact the tempo/BPM (beats-per-minute) of music played).

• Digital layer sequences - a record of the combinations and layering of digital layers that are required for a user.

• Excluded layer sequences - a record of any combinations of digital layers that are not permissible (either due to music fit or due to contractual / legal reasons).

• Usage statistics - which digital layers have been played (likely to be required as the basis of a commercial model for contributing artists).

• Usage restriction - restrictions imposed on usage by users according to an associated user licence or subscription, which may prevent access to a subset of the audio files, for example.
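Purely by way of illustration, a cut-down relational schema along these lines might be created as follows. The table and column names are assumptions; the application does not prescribe any particular database design.

# Illustrative only: a minimal sqlite schema for the metadata, usage and
# exclusion records listed above; all table and column names are assumptions.
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS digital_layer (
    layer_id        TEXT PRIMARY KEY,
    audio_path      TEXT NOT NULL,
    instrument      TEXT,
    genre           TEXT,
    key_signature   TEXT,
    tempo_bpm       REAL,
    duration_s      REAL
);

CREATE TABLE IF NOT EXISTS usage_statistics (
    layer_id        TEXT REFERENCES digital_layer(layer_id),
    played_at       TEXT NOT NULL        -- timestamp of each play, for artist reporting
);

CREATE TABLE IF NOT EXISTS excluded_layer_sequence (
    first_layer_id  TEXT REFERENCES digital_layer(layer_id),
    second_layer_id TEXT REFERENCES digital_layer(layer_id)   -- pair not permitted together
);
"""

with sqlite3.connect("audio_metadata.db") as conn:   # hypothetical database file
    conn.executescript(schema)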

One or more remote user devices 20 interact with the system 10, to provide input defining the requirements of the audio to be generated by the system 10, and to receive the audio output data generated by the system. Distinct devices may be used to perform each role, or else a device may both send requests and receive the generated audio output data.

Example remote user devices 20 provide one or more sensors 22, an interface 24, an integrated audio device 26, a processor 28, a memory and/or storage device 30, and a communication device 31 for sending and/or receiving data.

As a user interacts with a remote user device 20, data is collected by a combination of sensors 22, user inputs into an interface 24 (such as input music preferences), and user-related data stored on or accessible via the device (such as stored schedule information, for example). This collected data is used to inform the system about the type of audio the user wishes to hear at any point in time. This information, or a relevant subset of the information, is provided as an audio profile request to the audio generation server 10. In embodiments, user data from the remote user device 20 is provided to the audio generation server 10 directly rather than in the form of an audio profile request, and the server 10 itself generates an appropriate audio profile request based on the received data. The audio profile request comprises a request to the system to generate audio having particular properties or characteristics to suit the user and/or the user’s activities. This may be activity-dependent, time dependent, location dependent, or merely a selection of a music category by the user, for example. The system then, in turn, generates audio data fitting the request, and communicates the audio output data to an audio device 26, 32. The audio device may form part of the remote user device 20 or may be a distinct audio device 32 (such as a speaker).

Audio profile requests may be based on one or more of the following types of user data.

• Location / destination data - a particular location may influence the choice of musical styles (home, office, train etc.).

• Tempo-based - either through heart-rate or pedometer / foot stride-rates, to reflect either tempo or musical intensity.

• User-preferences - which can be set and stored on the remote user device itself, or may be stored by the system and edited via a user interface provided by the system, or remotely via the remote user device.

• Timers - to determine the duration of the music required (e.g. needing 15 minutes of relaxing meditation music).

• Proximity based - how the preferences and proximity of other users and their respective remote user devices may impact the music output by the system. For example, a single stream of audio output data may be generated in relation to a single audio device (i.e. a music system at a venue), and the music may be generated based on an averaging or amalgamation of user preferences of those users of the system present at the venue.

• External data-sources - the ability to receive any other data and understand what metadata attributes it may influence (e.g. weather data may impact genre, traffic jam data may result in more relaxing music being required etc.)

The audio generation server 10 is operable to generate an audio profile request indicating one or more values, ranges of values or identifiers, following receipt of user data transmitted from the remote user device 20. The user data includes one or more of sensor data, stored user profile data, and data input via a user interface. In embodiments, the audio generation server 10 is operable to receive an audio profile request indicating one or more values, ranges of values or identifiers, defining desired audio properties determined by the remote user device 20 (i.e. rather than simply receiving the data and then determining the audio profile request at the server 10). Each sensor device 22 or data input device 24 may be incorporated within a remote user device 20, or as a separate input to the system, distinct from a user device. As an example, local weather data may be accessed by the system from an online source accessible via the internet, where the locality of the user is known from data collected by the remote user device 20.

The data input device 24 may be a user interface, for example, such as a touchscreen application running on the remote user device 20. The interface may include voice activation and control functionality for a user to control the system via voice commands.

Examples of the types of detection devices (and their metrics) are as follows.

• Heart-rate monitors / fitness bands / watches - that can detect heart-rate of the user, and even foot stride rate.

• GPS location - and even speed of movement, and whether the user is approaching or leaving certain key locations.

• EEG / direct brain feedback - it is possible in medical applications to respond to measurable activity in different areas of the brain (pain, arousal, stress etc.).

• Other external data sources - that may be both relevant and detectable from the remote user device 20.

• Other users and their preferences / proximity / influence on the music to be played.

Based on activity detected by the above sensors 22 and input mechanisms the system has the ability to react dynamically in real-time to real-world events and activity. Of course, the output audio data need not be generated according to sensed inputs. In embodiments, audio profile requests may be generated based only on user input indicating preferences. A user may simply select one of a list of music styles to initiate music generation according to that style, for example.

In embodiments, a user may construct specific music tracks using the available digital layer files. In such embodiments, a user interface may be provided to enable intentional selection of specific digital layer files, or to select from a plurality of digital layer files matching chosen metadata such as genres, artists, tempos, duration, etc. (or any other metadata stored). In this way, a pre-determined set of digital layer files is prepared in advance, to be played. In embodiments, a user may select settings to specify that sensor data and other inputs from the remote user device 20 may be ignored (perhaps for a preset period of time), to allow the chosen music style to be played uninterrupted. The music required is simply prepared, scheduled, and then played.

Once the audio data is prepared, it is output for consumption as output audio data, for receipt by an audio device 26, 32 or remote user device 20 (itself comprising an audio device 26). Examples of devices that may be used with or form a part of the system include:

• Smartphones (as a remote user device, for example) - a common-place modern example of an intelligent technical device that can both detect data from multiple input sources and be used to deliver musical audio output.

• Wearables (as remote user devices) - devices (e.g. smartwatches) that can either be used to generate data, or to receive / play audio data.

• Computers / Laptops - can be used to connect to web-based software. Personal computers, laptops, or other suitable processing devices as are known in the art may be used for creating and editing the audio files, generating and/or receiving input of metadata, generating the audio output data, or any other step of the method as outlined herein. Computers and laptops are also exemplary remote user devices, suitable for use with or as part of the system.

• Vehicles - may include sensors suitable to provide GPS / location based data, and also to receive and play the audio output data. In this way, the inbuilt dashboard electronics of a vehicle may provide a suitable remote user device according to embodiments of the invention.

In embodiments, the audio generation server 10 is configured to assess the data it receives from the remote user device 20, comparing it to the current status of the system to determine whether any action is needed. For example (an illustrative sketch of this kind of interpretation follows the list below):

• what to do if there has been an increase in heartbeat or a decrease in foot stride rate detected by sensors at the remote user device 20 - and how this impacts the BPM or tempo of music that should be generated by the system.

• a change of EEG (electroencephalogram) brain pattern has been detected, which may indicate that a different genre or volume of music is required.

• the local traffic in the immediate proximity has now eased, and so car-based music intensity can be increased again.

• the GPS location is close to approaching a known stored position of the user’s home, and so the music needs to conclude very soon.
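As indicated above, the following sketch illustrates one way such sensed data might be translated into an updated request. The thresholds and the heart-rate-to-tempo mapping are invented for this example and are not taken from the application.

# Illustrative only: the thresholds and the heart-rate-to-tempo mapping below
# are invented for this example.
from typing import Optional, Tuple

def tempo_range_for_heart_rate(heart_rate_bpm: float) -> Tuple[float, float]:
    # Map a sensed heart rate to a requested musical tempo range.
    if heart_rate_bpm < 80:
        return (60.0, 90.0)     # resting: relaxed tempo
    if heart_rate_bpm < 120:
        return (90.0, 130.0)    # light activity
    return (130.0, 175.0)       # vigorous activity: high-tempo music

def update_request(current_request: dict, heart_rate_bpm: Optional[float]) -> dict:
    # Only adjust the request when new sensor data has actually been received.
    if heart_rate_bpm is not None:
        current_request["tempo_range"] = tempo_range_for_heart_rate(heart_rate_bpm)
    return current_request

print(update_request({"genre": "electronic"}, heart_rate_bpm=132))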

In broad terms the audio generation server 10 comprises five functional modules (typically embodied as software) operable to:

1. Interpret data received from the user device to generate an audio profile request, or - where the interpretation is performed on the remote user device itself - receive the interpreted audio profile request (interpretation module).

2. Match metadata associated with the stored audio files / digital layer files against the interpreted audio profile request (metadata matching module).

3. Layer the audio files by selecting appropriate audio files and determining an appropriate way to sequence or layer the audio tracks based on the metadata properties of those files (layering module).

4. Mix the audio tracks to produce audio output data (mixing module).

5. Communicate the audio output data to the remote user device and/or to one or more audio devices (communication module).

Those modules may be stored on the audio generation server 10 hardware and run on the processing device 12. As described, the location of the processing is not important, and remote processing and storage of the module functionality is also contemplated. The audio generation server 10 determines the effect of the received data on the future selection of digital layer files, using the interpretation module, by comparing the metadata associated with the audio files of those digital layer files to the requirements determined by the received data. As the requirements are determined, an audio profile request indicating one or more values, ranges of values or identifiers is generated. An audio profile request may, for example, include a music genre and a tempo (or suggested range of tempos).
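A skeletal sketch of how the five modules might be composed on the server is given below. The function bodies are placeholders standing in for the modules described above; the application does not prescribe this (or any) code structure.

# Illustrative only: a thin orchestration of the five modules described above,
# with placeholder implementations standing in for each module.

def interpret(user_data: dict) -> dict:
    # Interpretation module: turn raw device data into an audio profile request.
    return {"genre": user_data.get("preferred_genre", "ambient"), "tempo_range": (60, 90)}

def match_metadata(request: dict) -> list:
    # Metadata matching module: query the metadata store (stubbed here).
    return ["piano_01.wav", "strings_02.wav", "percussion_03.wav"]

def layer(matching_files: list) -> list:
    # Layering module: decide how the matched files are sequenced and stacked.
    return [matching_files]   # a single group of concurrent layers, for the sketch

def mix(layer_groups: list) -> bytes:
    # Mixing module: combine the layers into audio output data (stubbed).
    return b""

def serve(user_data: dict) -> bytes:
    request = interpret(user_data)
    files = match_metadata(request)
    audio = mix(layer(files))
    return audio   # the communication module would then stream this to the device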

The audio profile requests may be generated either by the remote user device 20, or locally by the audio generation server 10, based on the available data, sensor feedback and inputs.

In embodiments, the audio profile requests may correspond to one or more stored audio profiles each comprising one or more identifiers, values or ranges of values. In that case, the audio profile request may simply comprise an identifier associated with the stored audio profile, to retrieve the required information following a request to generate music according to that chosen profile. In embodiments, the system is configured to update a stored user status, to determine an audio profile associated with the user status, and to retrieve the determined audio profile to generate the audio profile request. For example, a user may save an audio profile identifying the jazz genre of music, and specifying a slow tempo, to be played when relaxing at home. When the user is at home and determined to be inactive by the remote user device (e.g. GPS or WiFi signals determine the user’s location is at home, and the motion sensors of the remote user device indicate that the user is not active), the remote user device sends an audio profile request to the system identifying the saved jazz audio profile. The system retrieves the stored audio profile identified in the audio profile request, and subsequently uses that information to determine appropriate audio files / digital layer files from which to generate audio output data to send to the user’s audio device (which may be speakers situated in the user’s living space - such as Bluetooth speakers, for example).
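As a sketch, the stored-profile lookup described in the example above could be as simple as a dictionary keyed by the updated user status. The statuses, profile names and profile contents below are assumptions for illustration only.

# Illustrative only: the statuses, profile names and profile contents are assumptions.
stored_profiles = {
    "relaxing_at_home": {"genre": "jazz", "tempo_range": (60, 80)},
    "commuting":        {"genre": "rock", "tempo_range": (100, 130)},
    "jogging":          {"genre": "dance", "tempo_range": (150, 175)},
}

def audio_profile_request_for(user_status: str) -> dict:
    # Retrieve the stored audio profile associated with the current user status,
    # falling back to a neutral profile if the status is unknown.
    return stored_profiles.get(user_status, {"genre": "ambient", "tempo_range": (70, 100)})

print(audio_profile_request_for("relaxing_at_home"))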

The user status may be updated in response to a signal received from the remote user device indicating at least one of a sensor input, a user input, a receiver input, and/or a trigger generated by a timer or schedule associated with the user or the remote user device.

Once the system has interpreted the sensed data, inputs, and user preferences, to determine an audio profile request and/or associated audio profile, the next step is to match the identifiers, values and ranges of values associated with the various audio properties stored in the metadata, and to retrieve an appropriate selection of audio files. In general terms, the audio generation server 10 is configured to determine the plurality of matching audio files by comparing one or more identifiers (of those received or generated in the audio profile request) to audio file metadata associated with the audio files. This metadata matching process takes these interpreted metadata requests (e.g. increase tempo, switch genre to Jazz), and then searches through the metadata via database queries for information about suitable stored digital layer files held on the associated storage device 18. Once located, a match processing module identifies these specific digital layer files as being appropriate for playing.
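Continuing the illustrative schema sketched earlier, this matching step might be expressed as a parameterised database query. Again, the column names and example values are assumptions rather than anything specified in the application.

# Illustrative only: a parameterised query against the assumed schema sketched earlier.
import sqlite3

def find_matching_layers(db_path: str, genre: str, tempo_low: float, tempo_high: float):
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            """
            SELECT layer_id, audio_path
            FROM digital_layer
            WHERE genre = ?
              AND tempo_bpm BETWEEN ? AND ?
            """,
            (genre, tempo_low, tempo_high),
        ).fetchall()
    return rows

# e.g. all jazz layers between 60 and 80 beats-per-minute
print(find_matching_layers("audio_metadata.db", "jazz", 60, 80))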

The result of combining the newly chosen audio files / digital layer files with the ongoing audio generation based on the previously chosen set of audio files, may be, for example, a combination of a change of underlying music, instrumentation, percussion/drums/rhythm and vocals.

As well as matching based on the future music requirements from the new inbound device data, the match processing module also ensures that newly selected digital layer files are compatible in terms of key signatures, chord sequences / progressions - to make sure that not only does the music change in an appropriate manner, but that it also sounds correct and transitions as seamlessly as possible. The layering process then involves retrieving the specific digital layers, and then constructing them in combination for the required anticipated period of time. For example, the user may be approaching a train station, which will trigger a change of genre to ‘Chill-Out’. The layering process will need to piece together enough of the audio files to construct a suitable amount of music (e.g. five minutes’ worth that can be looped / repeated if needed).
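As an illustration of piecing together enough audio to cover a required period, a short layer can simply be repeated until it fills the target duration. The sketch assumes the pydub package and hypothetical file names; the application does not name a library.

# Illustrative only: pydub is an assumed library choice and the file names are hypothetical.
from pydub import AudioSegment

def fill_duration(path: str, target_ms: int) -> AudioSegment:
    # Repeat a short audio file until it covers the required duration, then trim.
    segment = AudioSegment.from_file(path)
    out = AudioSegment.empty()
    while len(out) < target_ms:
        out += segment
    return out[:target_ms]

five_minutes_ms = 5 * 60 * 1000
rhythm = fill_duration("percussion_03.wav", five_minutes_ms)
chords = fill_duration("piano_01.wav", five_minutes_ms)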

Figures 3 and 4 illustrate a typical song structure, comprising an intro, and verses interspersed with choruses. A “middle 8” (i.e. an eight-bar section) and/or instrumental solo may be included, before a final chorus and outro or end portion of the song. Figure 4 illustrates the idea of layering audio files over one another. In the example shown, two vocal audio files are layered in sequence. Under those tracks, a guitar audio file and then a keyboard audio file are sequenced, along with a sequence of percussion tracks, so that vocal / instrument / percussion files are combined at any point in time. The output mixed audio file therefore provides a sequence of overlapping audio files that are combined so that each layered track is audible to the user at a given point in time.

The layer processing module may also make subsequent calls back to the metadata matching module, if there is insufficient metadata available, or insufficient digital layers to make enough suitable music to be played. For example, the switch to Chill-Out may mean that the current vocal style (e.g. heavy metal) is no longer appropriate, and so alternate vocal styles (that match the key signature and chord arrangements) also need to be requested.

In the event that the layer processing module is unable to obtain all of the digital layer files required, it will also improvise with the available digital layer files to ensure that music continues to play, whilst a more appropriate combination of digital layer files is still being retrieved. Using the previous vocal change example, if there were a delay obtaining appropriate new vocal digital layer files (or none were available), the layer processing module may either default to instrumental mode (i.e. without layering any vocals), and/or subsequently attempt to find instrumental solos instead.

The layer processing module can be thought of as providing a constant stream of audio files with which to construct the finished audio stream. It is then the responsibility of the mixing module to construct the various audio layers together, mixing in the new audio over / replacing the existing audio, and appropriately setting volume levels / fades / stereo pans (left, central, right) / effects against these individual audio components so that it sounds like a coherent piece of music, with the vocals, instrumentation and rhythm sections all sounding correct in terms of balance and volume. In embodiments the mixing module is operable to apply one or more audio processing effects to the audio file content as it is mixed. For example, processing-applied effects may include delays, echoes, flange, reverberation (plate reverb, gate, or vocal reverb, for example), chorus, auto-tune, compression, audio-shifting and time-stretching. This enables the same vocal digital layer to sound different when layered over the same repeated underlying music.
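A sketch of this mixing step, using the same assumed library, is given below. The gain, pan, fade and timing values are invented for the example and are not taken from the application.

# Illustrative only: pydub is an assumed library choice, and the gain, pan, fade
# and timing values are invented for this example.
from pydub import AudioSegment

rhythm = AudioSegment.from_file("percussion_03.wav")
chords = AudioSegment.from_file("piano_01.wav").apply_gain(-3)   # sit slightly under the rhythm
vocals = AudioSegment.from_file("vocal_07.wav").pan(0.1)         # nudge vocals right of centre

mixed = (
    rhythm
    .overlay(chords)                      # superimpose the chord layer
    .overlay(vocals, position=4_000)      # bring the vocals in after four seconds
    .fade_in(2_000)                       # two-second fade in
    .fade_out(3_000)                      # three-second fade out
)

mixed.export("output_segment.mp3", format="mp3")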

Once the audio files have been mixed, the output audio data is communicated via the communication device 16 to one or more audio devices 26, 32. The output audio data may be streamed via the internet (in a similar manner to an online radio station, for example), and/or may be stored on the or another storage device 18. In embodiments, the output audio data is stored and/or transmitted using a proprietary audio file format to lessen the likelihood of the audio data being intercepted and/or copied for unintended or unlicensed use.

The resulting audio output data may be stored and provided online as a downloadable file or stream, or stored as an audio file (and associated digital layer file) for future reuse by the system.

In use, the output audio data is intended to sound like a highly-customised music track, tailored specifically to the user. The audio changes to the specific genre, tempo, style and duration that the user wants to hear based on what they are doing at the time.

The audio output data may comprise an audio stream of a predetermined time duration, which may be selected by a user via the interface 24 associated with the remote user device 20. In embodiments, data is transmitted to the system from a first remote user device, and the output audio data is communicated to the user via a second remote user device. In an example use, a fitness wearable smart device is responsible for transmitting data about a change of GPS location and heartrate of the user to the audio generation server 10, which creates an audio profile request matching characteristics previously stored by the user based on the knowledge of the GPS location and heartrate. The resultant audio output data is streamed over a data connection to the smartphone / headphones that the user is wearing whilst running.

In embodiments, data and/or audio profile requests may be sent to the audio generation server 10 by a plurality of remote user devices 20, which may be associated with one or more users. The system is operable to select one or more audio profile requests to respond to, or may take an average of the data received. For example, if the system receives five requests from five users’ smartphones, and three of those are for upbeat pop music, the system may generate matching upbeat pop music. If, for example, the requests each specify one or more genres of music or properties indicating one or more genres, the system may choose a genre that matches as many requests as possible. Effectively, in this way, the system takes a poll of user preferences and provides an audio track that will suit most users.

Figure 5 provides an illustrative overview of how a musical artist (via an interface 34), and a user (via a remote user device 20, for example), may interact with the system. The artist may record audio files, which are submitted using the interface 34 and stored by the system in a cloud storage device, for example. The audio file may be analysed automatically, and matching metadata generated. In addition, or instead, the artist may supply appropriate metadata identifying the artist, and describing the style of music, the key, the tempo, the instruments used, the composer of the music, and any other relevant metadata, via the interface 34. The audio file and metadata record may subsequently be stored together as a digital layer file. Further, in embodiments, the system may record usage statistics associated with each audio file, so that the artist can determine how many times the file has been incorporated into audio output data, for example. This data may be used to determine payments due to the artist under a licensing model, for example.

The user interacts with the system by setting user preferences and settings via a remote user device 20 providing a user interface, either via a web-based portal accessed on a laptop or PC, for example, or via a remote user smartphone or other user device. The remote user device 20 associated with that user may provide metrics, sensor data, and user input, to the audio generation server.

In embodiments, the user may indicate a preference for sequences of music output by the system, so that those sequences may be stored and repeated at a later date. The user may also be given the option to exclude certain sequences of music, or music having certain associated metadata (i.e. music by a particular artist, for example).

In embodiments, a restriction on the audio files available for inclusion in a user’s audio output data may be made according to a licence or subscription model adopted by the user. For example, under the terms of one subscription a user may have access to all of the audio files accessible to the system 10, whereas a more limited subscription may include access only to a subset of those files.

Figures 6 and 7 set out examples of how a user may experience changes to the music produced by the system in response to a varying activity or location. Figure 6 illustrates a daily routine of commuting from an office to home, via a station, then onboard a train, and then arriving at a destination station, followed by a car journey home. At each stage of the journey, the remote user device 20 may provide feedback to the music generation server 10 to alter the type of music being generated. Figure 7 shows an exercise routine, during which a motion sensor or heart rate monitor (or a combination of the two) may provide data that indicates the type of exercise currently in progress. A stretching activity during a warm up or cool down period may be associated with a calming music track, whereas running may provide a faster soundtrack, potentially matching the beats per minute to the stride rate of the runner. Similarly, using weight machines may result in the tempo or intensity of the music being generated being linked to the sensed heart rate of the user.

While example embodiments of the invention are described herein, it should be understood that features of different embodiments may be combined with one another, in isolation from one another or in any combination, unless stated otherwise.

When used in this specification and claims, the terms "comprises" and "comprising" and variations thereof mean that the specified features, steps or integers are included. The terms are not to be interpreted to exclude the presence of other features, steps or components.

The features disclosed in the foregoing description, or the following claims, or the accompanying drawings, expressed in their specific forms or in terms of a means for performing the disclosed function, or a method or process for attaining the disclosed result, as appropriate, may, separately, or in any combination of such features, be utilised for realising the invention in diverse forms thereof.