Title:
MEDIA CONTENT ITEMS SEQUENCING
Document Type and Number:
WIPO Patent Application WO/2017/165823
Kind Code:
A1
Abstract:
A media content item sequencing system determines a sequence for playback of selected media content items, such as media content items in a playlist. The system calculates similarities between all possible pairs of the media content items and determines a sequence of the media content items using the similarities. The sequence of media content items can be determined by modeling the track features of the media content items with a graph traversal problem and calculating a solution to the problem with various methods.

Inventors:
JEHAN TRISTAN (SE)
BITTNER RACHEL (SE)
MONTECCHIO NICOLA (SE)
MCCURRY HUNTER (SE)
GU MINWEI (SE)
HERNANDEZ GANDALF (SE)
Application Number:
PCT/US2017/024106
Publication Date:
September 28, 2017
Filing Date:
March 24, 2017
Assignee:
JEHAN TRISTAN (SE)
BITTNER RACHEL (SE)
MONTECCHIO NICOLA (SE)
MCCURRY HUNTER (SE)
GU MINWEI (SE)
HERNANDEZ GANDALF (SE)
International Classes:
G11B27/038; G06F17/30; G11B27/28
Foreign References:
US20100070917A12010-03-18
US20030221541A12003-12-04
US20100332437A12010-12-30
Other References:
AARON VAN DEN OORD; SANDER DIELEMAN; BENJAMIN SCHRAUWEN: "Deep Content-Based Music Recommendation", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2013, pages 2643 - 2651
Attorney, Agent or Firm:
SEBALD, Gregory, A. (US)
Claims:
WHAT IS CLAIMED IS:

1. A method for playing media content items, the method comprising:

determining a plurality of track features of each of the media content items;

obtaining weighting data for the plurality of track features;

generating a plurality of weighted track features for each of the media content items by applying the weighting data to the plurality of track features of each of the media content items;

calculating aggregated track features for the media content items, respectively, based on the plurality of weighted track features;

comparing the aggregated track features to determine similarities between the aggregated track features; and

determining a sequence of the media content items based on the similarities.

2. The method of claim 1, further comprising:

receiving a selection of the plurality of media content items.

3. The method of claim 1, further comprising:

obtaining a playlist identifying the plurality of media content items.

4. The method of claim 1, wherein obtaining weighting data comprises:

receiving a user input of weights on the plurality of track features.

5. The method of claim 1, wherein obtaining weighting data comprises:

obtaining sequencing history data;

determining a sequencing history of the media content items based on the sequencing history data; and

predicting weights on the plurality of track features of the media content items.

6. The method of claim 1, wherein the plurality of track features includes acoustic features, key and mode information, and tempo.

7. The method of claim 1, wherein the aggregated track features are represented by numerical values.

8. The method of claim 1, wherein determining a sequence of the media content items comprises:

arranging a first media content item prior to a second media content item, the first media content item and the second media content item being selected from the media content items, and the first media content item being played before the second media content item; and

arranging the second media content item prior to a third media content item, the third media content item being selected from the media content items and played after the second media content item, a difference between a numerical value of an aggregated track feature of the first media content item and a numerical value of an aggregated track feature of the second media content item being smaller than a difference between the numerical value of the aggregated track feature of the first media content item and a numerical value of an aggregated track feature of the third media content item.

9. The method of claim 8, wherein the first media content item is identified as a seed media content item, the seed media content item sequenced to be played first among the media content items.

10. The method of claim 1, further comprising:

identifying a seed media content item selected from the media content items, the seed media content item sequenced to be played first among the media content items.

11. A method for sequencing media content items, the method comprising:

determining a plurality of track features of each of the media content items;

weighting the plurality of track features;

mapping the plurality of weighted track features of each of the media content items to an aggregated feature vector;

determining similarities among the aggregated feature vectors; and

determining a sequence of the media content items based on the similarities.

12. The method of claim 11, wherein determining similarities comprises:

calculating distances between the aggregated feature vectors.

13. The method of claim 12, wherein determining a sequence of the media content items comprises:

arranging a first media content item prior to a second media content item, the first media content item and the second media content item being selected from the media content items, and the first media content item being played before the second media content item; and

arranging the second media content item prior to a third media content item, the third media content item being selected from the media content items and played after the second media content item, a distance between a feature vector of the first media content item and a feature vector of the second media content item being smaller than a distance between the feature vector of the first media content item and a feature vector of the third media content item.

14. The method of claim 11, wherein determining similarities comprises:

generating a complete symmetric graph with vertices and edges, the vertices associated with the media content items, respectively, and connected via the edges, the edges having values representative of distances between the aggregated feature vectors of the media content items; and

determining an optimal path crossing all of the vertices, the optimal path used to determine the sequence of the media content items.

15. The method of claim 14, further comprising:

identifying a seed vertex from the vertices, the seed vertex associated with one of the media content items to be played first among the media content items.

16. The method of claim 14, wherein the optimal path includes a route defined by at least some of the edges and visiting all the vertices only once.

17. The method of claim 14, wherein the optimal path is calculated using the shortest Hamiltonian path.

18. The method of claim 11, further comprising:

obtaining a playlist identifying the media content items.

19. The method of claim 11, further comprising:

receiving a user input of weights on the plurality of track features.

20. A computer readable storage device storing data instructions that when executed by a processing device causes the processing device to:

determine a plurality of track features of each of the media content items;

weight the plurality of track features;

map the plurality of weighted track features of each of the media content items to an aggregated feature vector;

determine similarities among the aggregated feature vectors; and

determine a sequence of the media content items based on the similarities.

21. A system comprising:

at least one processing device; and

at least one computer readable storage device storing data instructions, which when executed by the at least one processing device, cause the at least one processing device to:

determine a plurality of track features of each of the media content items;

weight the plurality of track features;

map the plurality of weighted track features of each of the media content items to an aggregated feature vector;

determine similarities among the aggregated feature vectors; and

determine a sequence of the media content items based on the similarities.

Description:
MEDIA CONTENT ITEMS SEQUENCING

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application is being filed on 24 March 2017, as a PCT International patent application, and claims priority to U.S. Provisional Patent Application No. 62/313,636, filed March 25, 2016, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

[0002] Media content, such as audio content or video content, is widely consumed in various environments, such as daily, recreation, or fitness activities. Examples of audio content include songs, albums, podcasts, audiobooks, etc. Examples of video content include movies, music videos, television episodes, etc. Using a mobile phone or other media playback device, a person can access large catalogs of media content. For example, a user can access an almost limitless catalog of media content through various free and subscription-based streaming services. Additionally, a user can store a large catalog of media content on his or her mobile device.

[0003] This nearly limitless access to media content introduces new challenges for users. For example, it may be difficult to find or select the right media content that complements a particular moment such as running or other repetitive-motion activity. Further, it is desirable to play a series of media content items to create engaging, seamless, and cohesive listening experiences, which can be provided by professional music curators and DJs who carefully sort and mix tracks together. Average listeners typically lack the time and skill required to craft such an experience for their own personal enjoyment.

SUMMARY

[0004] In general terms, this disclosure is directed to systems and methods for managing a sequence between media content items. In one possible configuration and by non-limiting example, the systems and methods use a plurality of track features of media content items and determine a sequence of media content items based on similarities of the track features thereof. Various aspects are described in this disclosure, which include, but are not limited to, the following aspects.

[0005] One aspect is a method for playing media content items. The method includes determining a plurality of track features of each of the media content items; obtaining weighting data for the plurality of track features; generating a plurality of weighted track features for each of the media content items by applying the weighting data to the plurality of track features of each of the media content items; calculating aggregated track features for the media content items, respectively, based on the plurality of weighted track features; comparing the aggregated track features to determine similarities between the aggregated track features; and determining a sequence of the media content items based on the similarities.
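The steps of this aspect can be illustrated with a minimal Python sketch. It is only one possible reading, not the implementation of the disclosure: the feature names, the weight values, the choice of a weighted sum as the aggregation, and the helper names aggregate and sequence_from_seed are all assumptions made for illustration. Tracks are ordered by how close their aggregated value is to that of a seed track.

```python
from typing import Dict, List

def aggregate(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a track's features (an assumed form of aggregation)."""
    return sum(weights[name] * value for name, value in features.items())

def sequence_from_seed(tracks: Dict[str, Dict[str, float]],
                       weights: Dict[str, float],
                       seed: str) -> List[str]:
    """Play the seed first, then the remaining tracks ordered by how far
    their aggregated value is from the seed's aggregated value."""
    scores = {tid: aggregate(feats, weights) for tid, feats in tracks.items()}
    rest = sorted((tid for tid in tracks if tid != seed),
                  key=lambda tid: abs(scores[tid] - scores[seed]))
    return [seed] + rest

# Hypothetical, pre-normalized feature values for three tracks.
tracks = {
    "a": {"tempo": 0.5, "energy": 0.2},
    "b": {"tempo": 0.9, "energy": 0.8},
    "c": {"tempo": 0.6, "energy": 0.3},
}
weights = {"tempo": 0.7, "energy": 0.3}  # hypothetical weighting data

print(sequence_from_seed(tracks, weights, seed="a"))  # ['a', 'c', 'b']
```

Here track "c" follows the seed because its aggregated value (0.51) is closer to the seed's (0.41) than that of track "b" (0.87).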

[0006] Another aspect is a method for sequencing media content items. The method comprises determining a plurality of track features of each of the media content items; weighting the plurality of track features; mapping the plurality of weighted track features of each of the media content items to an aggregated feature vector; determining similarities among the aggregated feature vectors; and determining a sequence of the media content items based on the similarities.

[0007] Yet another aspect is a computer readable storage device storing data instructions that when executed by a processing device causes the processing device to: determine a plurality of track features of each of the media content items; weight the plurality of track features; map the plurality of weighted track features of each of the media content items to an aggregated feature vector; determine similarities among the aggregated feature vectors; and determine a sequence of the media content items based on the similarities.

[0008] Another aspect is a system comprising: at least one processing device; and at least one computer readable storage device storing data instructions, which when executed by the at least one processing device, cause the at least one processing device to: determine a plurality of track features of each of the media content items; weight the plurality of track features; map the plurality of weighted track features of each of the media content items to an aggregated feature vector; determine similarities among the aggregated feature vectors; and determine a sequence of the media content items based on the similarities.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 illustrates an example system for automatically sequencing and playing media content items.

[0010] FIG. 2 is a schematic illustration of an example system for automatically sequencing and playing media content items.

[0011] FIG. 3 illustrates an example method for automatically sequencing media content items.

[0012] FIG. 4 illustrates example track features.

[0013] FIG. 5 illustrates an example method for obtaining weighting data.

[0014] FIG. 6 illustrates an example user interface for receiving a user input of weighting.

[0015] FIG. 7 illustrates another method for obtaining weighting data.

[0016] FIG. 8 illustrates an example table for showing track features and aggregated track feature for each media content item.

[0017] FIG. 9 is an example table showing sequencing of the media content items 116 based on aggregated track features.

[0018] FIG. 10 illustrates another example method for automatically sequencing media content items.

[0019] FIG. 11 is a diagram illustrating operations in the method of FIG. 10.

[0020] FIG. 12 illustrates example mapping of key and mode information in a three dimensional space.

[0021] FIG. 13 illustrates example mapping of tempo in a binary logarithmic scale.

[0022] FIG. 14 illustrates an example method for determining similarities between media content items.

[0023] FIG. 15 illustrates an example method for determining a sequence of media content items.

[0024] FIG. 16 is an example graph for determining a sequence of media content items.

[0025] FIG. 17 illustrates an example system for managing a sequence between media content items to continuously support a repetitive motion activity.

[0026] FIG. 18 illustrates an example of the media delivery system of FIG. 17 for managing a sequence between media content items to continuously support a repetitive motion activity.

DETAILED DESCRIPTION

[0027] Various embodiments will be described in detail with reference to the drawings, wherein like reference numerals represent like parts and assemblies throughout the several views. Reference to various embodiments does not limit the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the appended claims.

[0028] In general, the system of the present disclosure determines a sequence for playback of selected media content items, such as media content items in a playlist. For a given set of media content items (for example, in the form of a playlist), the system calculates similarities between all possible pairs of the media content items, and determines a sequence of the media content items using the similarities. Each of the similarities can be calculated by comparing track features of two media content items. In some embodiments, such track features can be represented as numerical values. In other embodiments, the track features can be represented as a vector. The sequence of media content items can be determined by modeling the track features of the media content items with a graph traversal problem and calculating a solution to the problem with various methods.
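As a rough illustration of the vector-based approach, the following Python sketch treats each track's feature vector as a vertex of a complete graph, uses pairwise distances as edge values, and searches for the shortest path that starts at a seed track and visits every track once. The Euclidean distance, the exhaustive search, and the names distance and shortest_hamiltonian_path are assumptions chosen for clarity; exhaustive search is practical only for very small playlists, and the disclosure leaves the solution method open.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def shortest_hamiltonian_path(vectors: Dict[str, Sequence[float]],
                              seed: str) -> Tuple[List[str], float]:
    """Try every ordering of the non-seed tracks and keep the one whose
    summed consecutive distances is smallest. Cost grows factorially."""
    others = [tid for tid in vectors if tid != seed]
    best_order: Tuple[str, ...] = (seed,)
    best_cost = math.inf
    for perm in itertools.permutations(others):
        order = (seed,) + perm
        cost = sum(distance(vectors[a], vectors[b])
                   for a, b in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost

# Hypothetical two-dimensional feature vectors.
vectors = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (1.0, 0.0), "d": (2.0, 0.0)}
order, cost = shortest_hamiltonian_path(vectors, seed="a")
print(order, cost)  # ['a', 'c', 'd', 'b'] 3.0
```

For longer playlists, a heuristic such as repeatedly moving to the nearest unvisited track would trade optimality for speed.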

[0029] In certain examples, the system of the present disclosure is used to play back a plurality of media content items to continuously support a user's repetitive motion activity without distracting the user's cadence.

[0030] As such, the system provides a simple, efficient solution to sequencing of selected media content items with professional-level quality. In certain examples, the management process for sequencing between media content items is executed in a server computing device, rather than a user's media playback device. Accordingly, the media playback device can save its resources for playing back media content items in a desirable sequence, and the management process can be efficiently maintained and conveniently modified as appropriate without interacting with the media playback device.

[0031] FIG. 1 illustrates an example system 100 for automatically sequencing and playing media content items. In this example, the system 100 includes a media playback device 102 and a media delivery system 104. The system 100 communicates across a network 106. In some embodiments, a media content sequencing engine 110 runs on the media playback device 102, and a media content sequence determination engine 112 runs on the media delivery system 104. Also shown is a user U who uses the media playback device 102 to play back a set of media content items in a playlist 114.

[0032] The media playback device 102 operates to play media content items to produce media output 108. In some embodiments, the media content items are provided by the media delivery system 104 and transmitted to the media playback device 102 using the network 106. A media content item is an item of media content, including audio, video, or other types of media content, which may be stored in any format suitable for storing media content. Non-limiting examples of media content items include songs, albums, music videos, movies, television episodes, podcasts, other types of audio or video content, and portions or combinations thereof. In this document, the media content items can also be referred to as tracks.

[0033] The media delivery system 104 operates to provide media content items to the media playback device 102. In some embodiments, the media delivery system 104 is connectable to a plurality of media playback devices 102 and provides media content items to the media playback devices 102 independently or simultaneously.

[0034] The media content sequencing engine 110 operates to play media content items in a desirable sequence. In some embodiments, a sequence of the media content items is determined by the media delivery system 104 and the media playback device 102 merely operates to play back the media content items according to the sequence. In other embodiments, the media content sequencing engine 110 operates to determine such a sequence of the media content items, either independently or in cooperation with the media delivery system 104 including the media content sequence determination engine 112.

[0035] In some embodiments, as illustrated in FIGS. 17 and 18, the system 100 operates to play media content items in such a sequence as to continuously support the user's repetitive motion activity without interruption.

[0036] The media content sequence determination engine 112 operates to determine a sequence of media content items which are played. In some embodiments, a sequence of the media content items is determined by the media delivery system 104, either independently or in cooperation with the media playback device 102 including the media content sequencing engine 110. As described herein, in some embodiments, the media content sequence determination engine 112 operates to determine a sequence of media content items where a group of the media content items are given to be played on the media playback device 102. Such a group of media content items can be provided in the form of a playlist 114, which can be manually selected by the user and/or automatically populated for the user. In other embodiments, the sequencing can be determined for other media content items stored in either or both of the media playback device 102 and the media delivery system 104.

[0037] FIG. 2 is a schematic illustration of an example system 100 for automatically sequencing and playing media content items. As also illustrated in FIG. 1, the system 100 can include the media playback device 102, the media delivery system 104, and the network 106.

[0038] As described herein, the media playback device 102 operates to play media content items. In some embodiments, the media playback device 102 operates to play media content items that are provided (e.g., streamed, transmitted, etc.) by a system external to the media playback device such as the media delivery system 104, another system, or a peer device. Alternatively, in some embodiments, the media playback device 102 operates to play media content items stored locally on the media playback device 102. Further, in at least some embodiments, the media playback device 102 operates to play media content items that are stored locally as well as media content items provided by other systems.

[0039] In some embodiments, the media playback device 102 is a computing device, handheld entertainment device, smartphone, tablet, watch, wearable device, or any other type of device capable of playing media content. In yet other embodiments, the media playback device 102 is a laptop computer, desktop computer, television, gaming console, set-top box, network appliance, Blu-ray or DVD player, media player, stereo, or radio.

[0040] In at least some embodiments, the media playback device 102 includes a location-determining device 130, a touch screen 132, a processing device 134, a memory device 136, a content output device 138, and a network access device 140. Other embodiments may include additional, different, or fewer components. For example, some embodiments may include a recording device such as a microphone or camera that operates to record audio or video content. As another example, some embodiments do not include one or more of the location-determining device 130 and the touch screen 132.

[0041] The location-determining device 130 is a device that determines the location of the media playback device 102. In some embodiments, the location-determining device 130 uses one or more of the following technologies: Global Positioning System (GPS) technology which may receive GPS signals from satellites S, cellular triangulation technology, network-based location identification technology, Wi-Fi positioning systems technology, and combinations thereof.

[0042] The touch screen 132 operates to receive an input from a selector (e.g., a finger, stylus, etc.) controlled by the user U. In some embodiments, the touch screen 132 operates as both a display device and a user input device. In some embodiments, the touch screen 132 detects inputs based on one or both of touches and near-touches. In some embodiments, the touch screen 132 displays a user interface 144 for interacting with the media playback device 102. As noted above, some embodiments do not include a touch screen 132. Some embodiments include a display device and one or more separate user interface devices. Further, some embodiments do not include a display device.

[0043] In some embodiments, the processing device 134 comprises one or more central processing units (CPU). In other embodiments, the processing device 134 additionally or alternatively includes one or more digital signal processors, field-programmable gate arrays, or other electronic circuits.

[0044] The memory device 136 operates to store data and instructions. In some embodiments, the memory device 136 stores instructions for a media playback engine 146 that includes a media content selection engine 148 and the media content sequencing engine 110.

[0045] The memory device 136 typically includes at least some form of computer-readable media. Computer readable media include any available media that can be accessed by the media playback device 102. By way of example, computer-readable media include computer readable storage media and computer readable communication media.

[0046] Computer readable storage media includes volatile and nonvolatile, removable and non-removable media implemented in any device configured to store information such as computer readable instructions, data structures, program modules, or other data. Computer readable storage media includes, but is not limited to, random access memory, read only memory, electrically erasable programmable read only memory, flash memory and other memory technology, compact disc read only memory, Blu-ray discs, digital versatile discs or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the media playback device 102. In some embodiments, computer readable storage media is non-transitory computer readable storage media.

[0047] Computer readable communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, computer readable communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.

[0048] The content output device 138 operates to output media content. In some embodiments, the content output device 138 generates media output 108 (FIG. 1) for the user U. Examples of the content output device 138 include a speaker, an audio output jack, a Bluetooth transmitter, a display panel, and a video output jack. Other embodiments are possible as well. For example, the content output device 138 may transmit a signal through the audio output jack or Bluetooth transmitter that can be used to reproduce an audio signal by a connected or paired device such as headphones or a speaker.

[0049] The network access device 140 operates to communicate with other computing devices over one or more networks, such as the network 106. Examples of the network access device include wired network interfaces and wireless network interfaces. Wireless network interfaces include infrared, BLUETOOTH® wireless technology, 802.11a/b/g/n/ac, and cellular or other radio frequency interfaces in at least some possible embodiments.

[0050] The media playback engine 146 operates to play back one or more of the media content items (e.g., music) to the user U. When the user U is running while using the media playback device 102, the media playback engine 146 can operate to play media content items to encourage the running of the user U, as illustrated with respect to FIG. 22. As described herein, the media playback engine 146 is configured to communicate with the media delivery system 104 to receive one or more media content items (e.g., through the stream media 180), as well as sequencing data generated by the media delivery system 104 for sequencing media content items. Alternatively, such sequencing data can be locally generated by, for example, the media playback device 102.

[0051] The media content selection engine 148 operates to retrieve one or more media content items. In some embodiments, the media content selection engine 148 is configured to send a request to the media delivery system 104 for media content items and receive information about such media content items for playback. In some embodiments, media content items can be stored in the media delivery system 104. In other embodiments, media content items can be stored locally in the media playback device 102. In yet other embodiments, some media content items can be stored locally in the media playback device 102 and other media content items can be stored in the media delivery system 104.

[0052] The media content sequencing engine 110 is included in the media playback engine 146 in some embodiments. The media content sequencing engine 110, either independently or in cooperation with the media content sequence determination engine 112, can operate to arrange similar media content items closely so as to provide engaging, seamless and cohesive listening experiences which would otherwise be manually performed by music professionals, such as disc jockeys. Such sequencing can be performed by the media content sequence determination engine 112 of the media delivery system 104 alone. As described herein, such a sequence of media content items can also support a user's repetitive motion activity.

[0053] With still reference to FIG. 2, the media delivery system 104 includes one or more computing devices and operates to provide media content items to the media playback devices 102 and, in some embodiments, other media playback devices as well. In some embodiments, the media delivery system 104 operates to transmit stream media 180 to media playback devices such as the media playback device 102.

[0054] In some embodiments, the media delivery system 104 includes a media server application 150, a processing device 152, a memory device 154, and a network access device 156. The processing device 152, memory device 154, and network access device 156 may be similar to the processing device 134, memory device 136, and network access device 140 respectively, which have each been previously described.

[0055] In some embodiments, the media server application 150 operates to stream music or other audio, video, or other forms of media content. The media server application 150 includes a media stream service 160, a media data store 162, and a media application interface 164.

[0056] The media stream service 160 operates to buffer media content such as media content items 170 (including 170A, 170B, and 170Z) for streaming to one or more streams 172A, 172B, and 172Z.

[0057] The media application interface 164 can receive requests or other communication from media playback devices or other systems, to retrieve media content items from the media delivery system 104. For example, in FIG. 2, the media application interface 164 receives communication 182 from the media playback engine 146.

[0058] In some embodiments, the media data store 162 stores media content items 170, media content metadata 174, and playlists 176. The media data store 162 may comprise one or more databases and file systems. Other embodiments are possible as well. As noted above, the media content items 170 may be audio, video, or any other type of media content, which may be stored in any format for storing media content.

[0059] The media content metadata 174 operates to provide various pieces of information associated with the media content items 170. In some embodiments, the media content metadata 174 includes one or more of title, artist name, album name, length, genre, mood, era, etc.

[0060] In some embodiments, the media content metadata 174 includes acoustic metadata, cultural metadata, and explicit metadata. The acoustic metadata, which may be derived from analysis of the track, refers to a numerical or mathematical representation of the sound of a track. Acoustic metadata may include temporal information such as tempo, rhythm, beats, downbeats, tatums, patterns, sections, or other structures. Acoustic metadata may also include spectral information such as melody, pitch, harmony, timbre, chroma, loudness, vocalness, or other possible features. Acoustic metadata may take the form of one or more vectors, matrices, lists, tables, and other data structures. Acoustic metadata may be derived from analysis of the music signal. One form of acoustic metadata, commonly termed an acoustic fingerprint, may uniquely identify a specific track. Other forms of acoustic metadata may be formed by compressing the content of a track while retaining some or all of its musical characteristics.

[0061] The cultural metadata refers to text-based information describing listeners' reactions to a track or song, such as styles, genres, moods, themes, similar artists and/or songs, rankings, etc. Cultural metadata may be derived from expert opinion such as music reviews or classification of music into genres. Cultural metadata may be derived from listeners through websites, chatrooms, blogs, surveys, and the like. Cultural metadata may include sales data, shared collections, lists of favorite songs, and any text information that may be used to describe, rank, or interpret music. Cultural metadata may also be generated by a community of listeners and automatically retrieved from Internet sites, chat rooms, blogs, and the like. Cultural metadata may take the form of one or more vectors, matrices, lists, tables, and other data structures. A form of cultural metadata particularly useful for comparing music is a description vector. A description vector is a multi-dimensional vector associated with a track, album, or artist. Each term of the description vector indicates the probability that a corresponding word or phrase would be used to describe the associated track, album or artist.

[0062] The explicit metadata refers to factual or explicit information relating to music. Explicit metadata may include album and song titles, artist and composer names, other credits, album cover art, publisher name and product number, and other information. Explicit metadata is generally not derived from the music itself or from the reactions or opinions of listeners.

[0063] At least some of the metadata 174, such as explicit metadata (names, credits, product numbers, etc.) and cultural metadata (styles, genres, moods, themes, similar artists and/or songs, rankings, etc.), for a large library of songs or tracks can be evaluated and provided by one or more third party service providers. Acoustic and cultural metadata may take the form of parameters, lists, matrices, vectors, and other data structures. Acoustic and cultural metadata may be stored as XML files, for example, or any other appropriate file type. Explicit metadata may include numerical, text, pictorial, and other information. Explicit metadata may also be stored in an XML or other file. All or portions of the metadata may be stored in separate files associated with specific tracks. All or portions of the metadata, such as acoustic fingerprints and/or description vectors, may be stored in a searchable data structure, such as a k-D tree or other database format.

[0064] The playlists 176, which include the playlist 114 (FIG. 1), operate to identify one or more of the media content items 170. In some embodiments, the playlists 176 identify a group of the media content items 170 in a particular order. In other embodiments, the playlists 176 merely identify a group of the media content items 170 without specifying a particular order. Some, but not necessarily all, of the media content items 170 included in a particular one of the playlists 176 are associated with a common characteristic such as a common genre, mood, or era.

[0065] In some embodiments, playlists can be manually created, modified, and managed by users. In other embodiments, playlists can be automatically created by the media delivery system 104, the media playback device 102, and any other computing devices and presented or recommended to the users.

[0066] Referring still to FIG. 2, the network 106 is an electronic communication network that facilitates communication between the media playback device 102 and the media delivery system 104. An electronic communication network includes a set of computing devices and links between the computing devices. The computing devices in the network use the links to enable communication among the computing devices in the network. The network 106 can include routers, switches, mobile access points, bridges, hubs, intrusion detection devices, storage devices, standalone server devices, blade server devices, sensors, desktop computers, firewall devices, laptop computers, handheld computers, mobile telephones, and other types of computing devices.

[0067] In various embodiments, the network 106 includes various types of links. For example, the network 106 can include wired and/or wireless links, including Bluetooth, ultra-wideband (UWB), 802.11, ZigBee, cellular, and other types of wireless links. Furthermore, in various embodiments, the network 106 is implemented at various scales. For example, the network 106 can be implemented as one or more local area networks (LANs), metropolitan area networks, subnets, wide area networks (such as the Internet), or can be implemented at another scale. Further, in some embodiments, the network 106 includes multiple networks, which may be of the same type or of multiple different types.

[0068] Although FIG. 2 illustrates only a single media playback device 102 communicable with a single media delivery system 104, in accordance with some embodiments, the media delivery system 104 can support the simultaneous use of multiple media playback devices, and the media playback device can simultaneously access media content from multiple media delivery systems. Additionally, although FIG. 2 illustrates a streaming media based system for managing sequencing of media content items, other embodiments are possible as well. For example, in some embodiments, the media playback device 102 includes a media data store 162 and the media playback device 102 is configured to perform management of sequencing between media content items without accessing the media delivery system 104. Further in some embodiments, the media playback device 102 operates to store previously streamed media content items in a local media data store.

[0069] FIG. 3 illustrates an example method 200 for automatically sequencing media content items. In this example, the method 200 is described as being performed in the media delivery system 104 including the media content sequence determination engine 112. However, in other embodiments, only some of the processes in the method 200 can be performed by the media delivery system 104. In other embodiments, all or some of the processes in the method 200 are performed by the media playback device 102. In yet other embodiments, all or some of the processes in the method 200 are performed by both of the media delivery system 104 and the media playback device 102 in cooperation.

[0070] Within this description, the terms "automatically" and "automated" mean "without user intervention". An automated task may be initiated by a user but an automated task, once initiated, proceeds to a conclusion without further user action.

[0071] Within this description, a "track" is a digital data file containing audio information. A track may be stored on a storage device such as a hard disc drive, and may be a component of a library of audio tracks. A track may be a recording of a song or a section, such as a movement, of a longer musical composition. A track may be stored in any known or future audio file format. A track may be stored in an uncompressed format, such as a WAV file, or a compressed format such as an MP3 file. In this document, however, a track is not limited to audio, and it is understood that a track can indicate a media content item of any suitable type.

[0072] The method 200 can begin at operation 202, in which the media delivery system 104 receives selection of media content items. In some embodiments, the media content items to be sequenced are identified in a playlist 114 (FIG. 1). The media content items to be sequenced can be manually selected by the user or automatically provided to the user.

[0073] At operation 204, the media delivery system 104 determines one or more track features of each media content item. Track features represent various characteristics of a media content item in various forms. In some embodiments, track features can be obtained from various sources, such as the media content metadata 174 including acoustic metadata, cultural metadata, and explicit metadata. In other embodiments, track features can be obtained by retrieving the media content metadata 174 and processing it to different formats. Example track features which can be used for sequencing are further described with reference to FIG. 4.

[0074] At operation 206, the media delivery system 104 obtains weighting data. At operation 208, the media delivery system 104 then weights the track features based on the weighting data, thereby generating weighted track features for each media content item.

[0075] Track features and weighted track features can be represented in various formats. In some embodiments, track features and weighted track features can be represented by a numerical value or score, as illustrated in FIG. 8. In other embodiments, track features and weighted track features can be represented as vectors, such as feature vectors 376, as illustrated in FIG. 11. Other forms are also possible to represent track features and weighted track features in yet other embodiments.

[0076] The weighting data, such as weighting data 380 (FIG. 11), include information usable to weight (also referred to herein as scale) different track features. As described herein, the track features used for sequencing can be scaled such that the selected media content items are ordered to flow smoothly from one item to the next. The notion of "flowing smoothly" can be content dependent. For example, some situations require that the tempo doesn't change abruptly while other situations require that neighboring tracks are acoustically similar. By way of example, desirable sequencing can ensure that consecutive pairs of media content items have similar keys and tempos, allowing for less jarring transitions.

[0077] In some embodiments, the track features used for sequencing can be weighted in a way that is consistent with intended applications. By way of example, a generic playlist of media content items can be sequenced using only timbral descriptors, while tempo and key consistency may be the most important aspects in the case of a dance party playlist where the crossfade between media content items should preserve the rhythmic regularity and harmonic flow. As such, the track features can be weighted differently according to various factors which may determine the characteristics of the set (e.g., playlist) of media content items to be sequenced.

[0078] In some embodiments, weighting information included in the weighting data can be selected or adjusted manually by a user, as further illustrated in FIG. 5. Alternatively or in addition, such weights can be automatically determined as further illustrated in FIG. 7.

[0079] At operation 210, the media delivery system 104 calculates an aggregated track feature for each media content item based on the weighted track features for that media content item. In some embodiments, the aggregated track feature for each media content item, such as an aggregated track feature 302 (FIG. 8), can be determined as a sum of the weighted track features that are obtained at the operation 208. In other embodiments, the aggregated track feature can be obtained by using the weighted track features differently.
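The weighted-sum form of operation 210 can be sketched as follows. This is only an illustrative sketch, assuming that each track feature has already been reduced to a normalized numerical value; the function name, the feature names, and the sample values are hypothetical and do not come from the description.

```python
def aggregate_track_feature(features, weights):
    """Return the weighted sum of one item's track features.

    features: dict mapping feature name -> normalized numerical value
    weights:  dict mapping feature name -> weight (missing names count as 0)
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical normalized feature values for a single media content item.
track = {"timbre": 0.42, "key_mode": 0.10, "tempo": 0.75}

# Hypothetical weighting data that considers the tempo feature only.
tempo_only = {"timbre": 0.0, "key_mode": 0.0, "tempo": 1.0}

aggregated = aggregate_track_feature(track, tempo_only)
```

With tempo-only weights, the aggregated value is simply the tempo value of the item, so sequencing would be driven by tempo alone.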

[0080] The aggregated track feature can be represented in various formats. In some embodiments, the aggregated track feature can be represented by a numerical value or score, as illustrated in FIG. 8. In other embodiments, the aggregated track feature can be represented as a vector, such as an aggregated feature vector 378, as illustrated in FIG. 11. Other forms are also possible to represent the aggregated track feature in yet other embodiments.

[0081] In some embodiments, the operation 210 can be repeated until the aggregated track features are obtained for all of the media content items to be sequenced.

[0082] At operation 212, the media delivery system 104 compares the aggregated track features. At operation 214, the media delivery system 104 determines similarities between the media content items based on the comparison between the media content items' aggregated track features.

[0083] A similarity between media content items can be calculated in various ways. In some embodiments, where aggregated track features are represented as numerical values, a similarity between two media content items can be determined based on a difference between the aggregated track feature values of the two media content items. In other embodiments, where aggregated track features are represented as vectors, a similarity between two media content items can be determined by calculating the Euclidean distance between the vectors representative of the aggregated track features of the two media content items. In yet other embodiments, any other similarity or comparison measurement can be used to compare two media content items.
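The two comparison measures named above can be sketched as follows, assuming aggregated track features that are either plain numbers or equal-length sequences of numbers. The function names are hypothetical, and both functions return a difference (a smaller result means more similar items).

```python
import math


def scalar_difference(a, b):
    """Difference score between two numerical aggregated track features."""
    return abs(a - b)


def euclidean_distance(u, v):
    """Euclidean distance between two aggregated feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```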

[0084] A similarity can be represented in various formats. In some embodiments, a similarity result can be a value indicating the similarity between two media content items on a predetermined scale. For example, a similarity can be a score having a value between 0 and 1, 0 and 100, etc., with 0 indicating no similarity between two media content items and the maximum value indicating that two media content items are highly similar or identical. The similarity result may be expressed as a difference score, where zero may indicate no difference between two media content items and a higher value may indicate an increasing degree of difference. The similarity score may be quantized into levels, for example A/B/C/D/E, for reporting to the requester. The similarity score may be compared to a predetermined threshold and converted into a binary value, for example Yes/No, for reporting to the requester.
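The representations listed above can be sketched as three small conversions. This is a sketch under stated assumptions: the upper bound `max_distance`, the equal-width level boundaries, and the default threshold of 0.5 are hypothetical choices, since the description does not specify them.

```python
def to_similarity(distance, max_distance):
    """Map a difference score onto a 0-to-1 similarity scale (1 means identical)."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    clipped = min(max(distance, 0.0), max_distance)
    return 1.0 - clipped / max_distance


def to_level(similarity):
    """Quantize a 0-to-1 similarity into five levels, A (most similar) to E."""
    levels = "EDCBA"
    index = min(int(similarity * len(levels)), len(levels) - 1)
    return levels[index]


def is_similar(similarity, threshold=0.5):
    """Convert a similarity into a Yes/No answer using a threshold."""
    return "Yes" if similarity >= threshold else "No"
```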

[0085] At operation 216, the media delivery system 104 operates to sequence the media content items based on the similarities. In some embodiments, where the aggregated track features are represented by numerical values, a difference between any two of the aggregated track features can determine an order of the media content items. Such an order of the media content items can begin with a seed media content item, which is selected from the media content items and to be played first among the media content items. The seed media content item can be manually selected by the user, or automatically selected by the media delivery system 104 or the media playback device 102.

[0086] By way of example, when the seed media content item is given, the next media content item can be selected to be the media content item whose aggregated track feature value is more similar to the aggregated track feature value of the seed media content item than are the aggregated track feature values of the other media content items. As a simple example of sequencing three media content items, a first media content item is arranged prior to a second media content item and the second media content item is arranged prior to a third media content item when a difference between an aggregated track feature value of the first media content item and an aggregated track feature value of the second media content item is smaller than a difference between the aggregated track feature value of the first media content item and an aggregated track feature value of the third media content item.
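One possible reading of this ordering rule is a greedy chain that starts at the seed and repeatedly appends the remaining item whose aggregated value is closest to the item just placed. The sketch below assumes numerical aggregated track features; the function name, item identifiers, and values are hypothetical, and this is only one of several orderings consistent with the description.

```python
def sequence_from_seed(values, seed_id):
    """Order items by repeatedly picking the unplayed item closest to the last one.

    values:  dict mapping item id -> aggregated track feature value
    seed_id: id of the item to be played first
    """
    remaining = dict(values)
    current = remaining.pop(seed_id)
    order = [seed_id]
    while remaining:
        # Choose the remaining item with the smallest difference to the current one.
        next_id = min(remaining, key=lambda item_id: abs(remaining[item_id] - current))
        current = remaining.pop(next_id)
        order.append(next_id)
    return order


# Hypothetical aggregated values (for example, normalized tempo).
playlist = {"a": 0.50, "b": 0.90, "c": 0.55, "d": 0.80}
order = sequence_from_seed(playlist, "a")
```

For three items, this greedy chain yields the same order as the simple example in the paragraph above.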

[0087] FIG. 4 illustrates example track features 230, which can be used for sequencing media content items in a playlist. Several acoustic aspects of a media content item can be exposed and combined differently for different applications. In some embodiments, the track features 230 include acoustic features 240, key and mode information 242, and tempo 244.

[0088] In some embodiments, the track features 230 are computed for each track in the media delivery system 104. In other embodiments, the track features 230 can be calculated using one or more software programs running on the media delivery system or one or more other computing devices.

[0089] The acoustic features 240 represent the sound of a media content item, such as timbre, melody, pitch, harmony, and other possible features. In some embodiments, the acoustic features 240 can be obtained from the acoustic metadata of the media content item.

[0090] In some embodiments, a timbre feature 250 is used as an example of the acoustic features. The timbre feature 250 is the character or quality of a sound or voice as distinct from its pitch and intensity. A timbre feature is a perceived sound quality of a musical note, sound, or tone that distinguishes different types of sound production, such as choir voices, and musical instruments, such as string instruments, wind instruments, and percussion instruments.

[0091] The key and mode information 242 indicates the key and the mode of a media content item. The mode generally refers to a type of scale, coupled with a set of characteristic melodic behaviors. The key of a piece is a group of pitches or scale upon which a music composition is created. The group features a tonic note and its corresponding chords, also called a tonic or tonic chord, providing a subjective sense of arrival and rest and also has a unique relationship to the other pitches of the same group, their corresponding chords, and pitches and chords outside the group. Notes and chords other than the tonic in a piece create varying degrees of tension, resolved when the tonic note or chord returns. The key may be in the major or minor mode.

[0092] The tempo 244 indicates the speed or pace of a given piece or subsection thereof, that is, how fast or slow it is. Tempo is related to meter and is usually measured by beats per minute, with the beats being a division of the measures, though tempo is often indicated by terms which have acquired standard ranges of beats per minute or assumed by convention without indication.

[0093] In other embodiments, any other features or aspects of a media content item can be additionally or alternatively used as the track features 230. The methods of using the track features 230 for sequencing described herein are also applicable to such other features and aspects used as the track features 230.

[0094] FIG. 5 illustrates an example method 270 for obtaining weighting data, which can be used at the operation 206 in the method 200 as described in FIG. 3. The method 270 is described herein with further reference to FIG. 6, which illustrates an example user interface for receiving a user input of weighting.

[0095] In this example, the method 270 is described as being performed in the media delivery system 104. However, in other embodiments, only some of the processes in the method 270 can be performed by the media delivery system 104. In other embodiments, all or some of the processes in the method 270 are performed by the media playback device 102. In yet other embodiments, all or some of the processes in the method 270 are performed by both of the media delivery system 104 and the media playback device 102 in cooperation. In yet other embodiments, the method 270 can be performed by other computing devices and provided to the media delivery system 104.

[0096] The method 270 is used to receive a manual selection of weights from a user. The weights can be determined based on the overall characteristics (such as styles, genres, moods, themes, similar artists and/or songs, and rankings) of the media content items in the playlist. The weights are used to scale a plurality of track features such that the media content items are sequenced and played to provide a smooth, continuous playback. The given media content items may have similar values in one or more particular track features, and thus the weights can be adjusted to emphasize such particular track features more than other track features which are not shared by all or a majority of the media content items. By way of example, where a playlist includes media content items generally suitable for a dance party, consistency in tempo and key may be important aspects to preserve rhythmic regularity and harmonic flow between the media content items. In this case, the weights can be adjusted or set to give more weight to the tempo and the key features.

[0097] The method 270 can begin at operation 272, in which the media delivery system 104 operates to provide a user interface for receiving a user input of weights on track features. The user interface enables a user to input or adjust weights for one or more track features 230. An example of the user interface is illustrated in FIG. 6. As shown in FIG. 6, the user interface 280 provides one or more control elements, such as sliders 282, allowing a user to make adjustments to values of track features 230. In some embodiments, the user interface can be presented on a computing device which is connected to the media delivery system 104 and operated by the user, and the computing device can transmit the input to the media delivery system 104 upon receiving the input from the user through the user interface.

[0098] At operation 274, the media delivery system 104 operates to receive a user input of weights on one or more track features 230. In some embodiments, where the user inputs the weighting values through a user computing device, the media delivery system 104 receives such inputs from the user computing device. In other embodiments, the media delivery system 104 can directly receive the user input of weights from the user.

[0099] FIG. 7 illustrates another method 290 for obtaining weighting data, which can be used at the operation 206 in the method 200 as described in FIG. 3. In this example, the method 290 is described as being performed in the media delivery system 104. However, in other embodiments, only some of the processes in the method 290 can be performed by the media delivery system 104. In other embodiments, all or some of the processes in the method 290 are performed by the media playback device 102. In yet other embodiments, all or some of the processes in the method 290 are performed by both of the media delivery system 104 and the media playback device 102 in cooperation. In yet other embodiments, the method 290 can be performed by other computing devices and provided to the media delivery system 104.

[0100] The method 290 is used to automatically determine the weights for scaling the track features 230. The method 290 can begin at operation 292, in which the media delivery system 104 obtains sequencing history data. The sequencing history data include information about a history of sequencing media content items in general. In some embodiments, the sequencing history data include a large volume of past sequencing events that have been performed by music professionals, such as professional music curators and disc jockeys. In other embodiments, the sequencing history data include a large volume of past sequencing events that have been performed by at least some users or listeners of the media content items provided by the media delivery system 104.

[0101] At operation 294, the media delivery system 104 operates to determine the sequencing history of the given media content items based on the sequencing history data. In some embodiments, the media delivery system 104 can identify a particular characteristic of the selected media content items to be sequenced. Given a set of media content items, the set of media content items (for example, in the form of a playlist) can be characterized to have a particular attribute in common, such as styles, genres, moods, themes, similar artists and/or songs, rankings, etc. The media delivery system 104 can then determine how the same media content items, or the media content items having a similar characteristic to the characteristic of the selected media content items, have been sequenced from the sequencing history data. The media delivery system 104 further determines a correlation between the sequencing history and the track features of the same media content items or the media content items having the similar characteristic. Such a correlation can be used to determine or predict how the track features 230 of the media content items to be sequenced should be weighted.

[0102] At operation 296, the media delivery system 104 can predict weights on the track features of the media content items to be sequenced, depending on the characteristic of the media content items.

[0103] FIG. 8 illustrates an example table 300 showing the track features 230 and the aggregated track feature 302 for each media content item 116. In this example, the track features 230 and the aggregated track feature 302 are represented with numerical values. In some embodiments, the values of the track features and the aggregated track feature can be normalized.

[0104] In some embodiments, the aggregated track feature 302 is obtained as a weighted sum of the track features 230, such as the timbre feature 250, the key and mode feature 242, and the tempo feature 244. In the example table 300, the track features 230 are weighted such that only the tempo feature 244 is considered, without the other track features (i.e., Timbre : Key/Mode : Tempo = 0 : 0 : 1).

[0105] FIG. 9 illustrates an example table 310 in which the media content items 116 are sequenced based on the aggregated track features 302. In this example, the media content items are arranged from the seed media content item 310 (in this example, Track ID 1117) and ordered based on the smallest difference method between the aggregated track features of adjacent media content items, as used at the operation 216 described with reference to FIG. 3. In other embodiments, other methods can be used to order the given media content items.

[0106] FIG. 10 illustrates another example method 330 for automatically sequencing media content items. In this example, the method 330 is described as being performed in the media delivery system 104 including the media content sequence determination engine 112. However, in other embodiments, only some of the processes in the method 330 can be performed by the media delivery system 104. In other embodiments, all or some of the processes in the method 330 are performed by the media playback device 102. In yet other embodiments, all or some of the processes in the method 330 are performed by both of the media delivery system 104 and the media playback device 102 in cooperation.

[0107] At least some of the operations in the method 330 are performed similarly to the corresponding operations in the method 200 as described with reference to FIGS. 3-9. Therefore, the description of such operations in the method 200 is incorporated by reference for the method 330.

[0108] The operations 332, 334, 336, and 338 are performed similarly to the operations 202, 204, 206, and 208 in the method 200. For brevity, the description of the operations 332, 334, 336, and 338 is omitted.

[0109] At operation 340, the media delivery system 104 calculates feature vectors 376 (FIG. 11) for each media content item 116. At operation 342, the media delivery system 104 calculates an aggregated feature vector 378 (FIG. 11) for each media content item 116. An example of such calculations is illustrated and described in more detail with reference to FIG. 11.

[0110] At operation 344, the media delivery system 104 operates to compare the aggregated feature vectors 378 of each pair of the media content items 116. At operation 346, the media delivery system 104 then determines similarities between the aggregated feature vectors 378. At operation 348, the media delivery system 104 determines a sequence of the media content items 116 based on the determined similarities. An example of the operations 344, 346, and 348 is described in more detail with reference to FIGS. 14 and 15.

[0111] FIG. 11 is a diagram 370 illustrating operations in the method 330. In this example, the media delivery system 104 includes a vector mapping engine 372 and an aggregation engine 374. The diagram 370 is described with further reference to FIG. 12, which illustrates an example mapping of key and mode information to Euclidean space, and FIG. 13, which illustrates an example octave-invariance mapping of tempo to Euclidean space.

[0112] In some embodiments, the vector mapping engine 372 and the aggregation engine 374 are included in the media content sequence determination engine 112. In other embodiments, the vector mapping engine 372 and the aggregation engine 374 can be included in any other part of the media delivery system 104. In yet other embodiments, the vector mapping engine 372 and the aggregation engine 374 can be included in the media playback device 102 or any other computing devices.

[0113] The vector mapping engine 372 can refer to the track features 230 of each media content item 116 and associate them to corresponding feature vectors 376 in Euclidean spaces. In other embodiments, however, at least one of the feature vectors 376 can be generated from other data which are not directly related to corresponding track features.

[0114] Where the acoustic features 240 are concerned, in some embodiments, the vector mapping engine 372 can derive acoustic vectors from a convolutional neural network. A convolutional neural network is a type of feed-forward artificial neural network. One example of the convolutional neural network that can be utilized to obtain the acoustic vectors is described in Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep Content-Based Music Recommendation. In Advances in Neural Information Processing Systems, pages 2643-2651, 2013.

[0115] In some examples, one of the acoustic vectors can capture a timbral characteristic (such as the timbre feature 250) of a media content item. According to a convolutional neural network, a low-dimensional embedding (such as in an eight (8) dimensional space (R^8)) can be trained in a supervised setting to minimize the Euclidean distance between similar media content items based on metadata information. In other examples, other approaches can be used to generate one or more acoustic vectors.

[0116] Where the key and mode feature 242 is concerned, in some embodiments, the vector mapping engine 372 can map the key and mode information 242 into points 390 in a three (3) dimensional space (R^3) so that adjacent keys in the circle of fifths and relative major/minor keys are equidistant, as illustrated in FIG. 12. The points 390 can be represented as a key and mode feature vector. In other embodiments, a feature vector for the key and mode information can be generated by other methods.
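The exact geometry of FIG. 12 is not reproduced in this text, so the following is only one hypothetical construction that satisfies the stated property. It assumes that major keys sit on a unit circle ordered by the circle of fifths, and that each minor key sits directly above its relative major at a height equal to the chord length between circle neighbors. The function name, the pitch-class encoding, and the constants are assumptions.

```python
import math

STEP = 2 * math.pi / 12          # angle between neighbors on the circle of fifths
HEIGHT = 2 * math.sin(STEP / 2)  # chord length between neighbors on a unit circle


def key_mode_vector(pitch_class, mode):
    """Map a key to a point in R^3 (hypothetical layout, see lead-in).

    pitch_class: tonic as an integer 0..11 (0 = C, 1 = C#, ..., 11 = B)
    mode:        "major" or "minor"
    """
    if mode == "minor":
        # A minor key shares its angle with its relative major (3 semitones up).
        major_tonic = (pitch_class + 3) % 12
        z = HEIGHT
    elif mode == "major":
        major_tonic = pitch_class
        z = 0.0
    else:
        raise ValueError("mode must be 'major' or 'minor'")
    # Multiplying by 7 modulo 12 converts a pitch class to a circle-of-fifths index.
    position = (7 * major_tonic) % 12
    angle = position * STEP
    return (math.cos(angle), math.sin(angle), z)
```

Under this layout, C major is equally far from G major (its neighbor on the circle of fifths) and from A minor (its relative minor), while distant keys such as F# major lie on the opposite side of the circle.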

[0117] Where the tempo feature 244 is concerned, in some embodiments, the vector mapping engine 372 can map the tempo feature 244 in a binary logarithmic scale. For example, in certain applications, tempo-octave invariance is preserved, and tempo is represented as a unit vector whose polar angle is mapped into a tempo octave, as illustrated in FIG. 13. In other embodiments, a feature vector for the tempo feature can be generated by other methods.
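A tempo-octave invariant mapping of this kind can be sketched as follows. The sketch assumes that one full turn of the polar angle corresponds to one doubling of tempo, so that tempos one or more octaves apart map to the same unit vector. The function name and the reference tempo of 60 beats per minute are hypothetical; the reference only rotates the mapping.

```python
import math


def tempo_vector(bpm, reference_bpm=60.0):
    """Map a tempo to a unit vector whose angle depends only on the tempo-octave position."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    # The fractional part of the binary logarithm is the position within one tempo octave.
    octave_position = math.log2(bpm / reference_bpm) % 1.0
    angle = 2 * math.pi * octave_position
    return (math.cos(angle), math.sin(angle))
```

For instance, 60 and 120 beats per minute map to the same point, whereas 60 and 85 beats per minute (roughly half an octave apart) map to nearly opposite points on the unit circle.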

[0118] In some embodiments, the above description about mapping of the acoustic features 240, the key and mode feature 242, and the tempo feature 244 can be required for a particular purpose or application. In other embodiments, however, any number of dimensions and/or any type of mapping can be used.

[0119] Referring still to FIG. 11, the aggregation engine 374 operates to generate an aggregated feature vector 378 for each media content item based on the feature vectors 376. In some embodiments, the aggregated feature vector 378 can be constructed by concatenating the individual feature vectors 376. In some embodiments, the feature vectors 376 can be scaled based on the weighting data 380. In the illustrated example, the weighting data 380 is used in the aggregation engine 374 to scale the feature vectors 376 to generate the aggregated feature vector 378. Alternatively, the weighting data 380 can be provided to the vector mapping engine 372 so that the track features 230 are scaled based on the weighting data 380 before the feature vectors 376 are constructed.
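The scale-then-concatenate variant can be sketched as follows, assuming that the weighting data supplies one scale factor per feature vector. The function name, feature names, vector lengths, and values are hypothetical and shortened for illustration.

```python
def aggregate_feature_vector(feature_vectors, weights):
    """Scale each feature vector by its weight and concatenate the results.

    feature_vectors: dict mapping feature name -> sequence of floats
    weights:         dict mapping feature name -> scale factor (default 1.0)
    """
    aggregated = []
    for name in sorted(feature_vectors):  # fixed order keeps dimensions aligned across items
        scale = weights.get(name, 1.0)
        aggregated.extend(scale * component for component in feature_vectors[name])
    return aggregated


# Hypothetical, shortened vectors for a single media content item.
vectors = {
    "timbre": [0.5, -0.25],
    "key_mode": [1.0, 0.0, 0.0],
    "tempo": [0.0, 1.0],
}
weights = {"timbre": 2.0, "key_mode": 1.0, "tempo": 4.0}
aggregated = aggregate_feature_vector(vectors, weights)
```

Because the Euclidean distance sums squared differences over all dimensions, a larger scale factor makes differences in that feature contribute more to the distance between two aggregated vectors.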

[0120] FIG. 14 illustrates an example method 410 for determining similarities between media content items, which can be used at the operations 344, 346, and 348 in the method 330 as described in FIG. 10.

[0121] At operation 412, the media delivery system 104 operates to calculate a distance (such as the Euclidean distance) between the aggregated feature vectors 378 of each pair from the media content items 116. Calculation of the distance between two aggregated feature vectors 378 is repeated for all possible pairs from the media content items 116 in the playlist. In other embodiments, any other distance measurement can be used to calculate a distance between two aggregated feature vectors.
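
By way of a non-limiting illustration, the pairwise distance calculation of operation 412 can be sketched as follows, assuming that the aggregated feature vectors are held in a dictionary keyed by a hypothetical track identifier and that the Euclidean distance is used.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Return the Euclidean distance for every pair of tracks.

    vectors: dict mapping a track identifier to its aggregated feature vector.
    The result is keyed by (track_a, track_b) and stored in both directions,
    since the Euclidean distance is symmetric.
    """
    distances = {}
    for (a, va), (b, vb) in combinations(vectors.items(), 2):
        d = math.dist(va, vb)
        distances[(a, b)] = d
        distances[(b, a)] = d
    return distances


table = pairwise_distances({"t1": [0.0, 0.0], "t2": [3.0, 4.0], "t3": [6.0, 8.0]})
```

For n media content items this computes n(n-1)/2 distinct distances, one per unordered pair.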

[0122] At operation 414, the media delivery system 104 determines a seed media content item 310 (such as shown in FIG. 8), which is to be played first among the media content items. As described herein, the seed media content item 310 can be manually selected by the user, or automatically selected by the media delivery system 104 or the media playback device 102.

[0123] At operation 416, the media delivery system 104 determines a sequence between the media content items in the playlist based on the distances calculated at the operation 412. The sequence begins from the seed media content item 310. In a simple example, a first media content item is arranged prior to a second media content item and the second media content item is arranged prior to a third media content item when a distance between an aggregated feature vector of the first media content item and an aggregated feature vector of the second media content item is smaller than a distance between the aggregated feature vector of the first media content item and an aggregated feature vector of the third media content item. Other example sequencing methods are further described with reference to FIG. 15.
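
By way of a non-limiting illustration, the simple example above, in which items closer to the first (seed) item are arranged earlier, can be sketched as follows. The function name and the track identifiers are hypothetical.

```python
import math


def sequence_by_seed_distance(vectors, seed):
    """Order tracks by increasing distance from the seed track.

    vectors: dict mapping a track identifier to its aggregated feature vector.
    seed: identifier of the track that is played first.
    """
    others = [track for track in vectors if track != seed]
    others.sort(key=lambda track: math.dist(vectors[seed], vectors[track]))
    return [seed] + others


order = sequence_by_seed_distance(
    {"a": [0.0, 0.0], "b": [5.0, 0.0], "c": [1.0, 0.0], "d": [3.0, 0.0]},
    seed="a",
)
```

This ordering considers only distances to the seed and not distances between consecutive items, which is one reason the graph-based methods of FIG. 15 may be used instead.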

[0124] FIG. 15 illustrates an example method 430 for determining a sequence of media content items, which can be used at the operation 416 in the method 410 as described in FIG. 14. The method 430 can be described with further reference to FIG. 16, which is an example graph 450 for determining the sequence of media content items.

[0125] In this example, the sequence of media content items is modeled with a graph traversal problem and determined using a graph which represents the media content items to be sequenced and the similarities between the media content items.

[0126] At operation 432, the media delivery system 104 operates to generate a graph 450 (FIG. 16) for representing the track features of the media content items in the playlist. In some embodiments, as illustrated in FIG. 16, the graph 450 (G = (V, E)) is a complete symmetric graph having a plurality of vertices (V) 452 (including 452A-J) and a plurality of edges (E) 454. In the illustrated examples, the graph 450 has eight (8) vertices 452A-J connected through the edges 454. In some embodiments, the graph 450 is a directed graph, in which edges have orientations. In other embodiments, the graph 450 is an undirected graph, in which edges have no orientation.

[0127] In the graph 450, the vertices 452 correspond with the media content items 116 to be sequenced, respectively. Where the graph 450 is symmetrical, the positions of the media content items 116 with respect to the vertices 452 are irrelevant.

[0128] Each of the edges 454 connecting the vertices 452 (i.e., the media content items) can have a property representative of the similarity between two media content items connected via that edge. In some embodiments, each of the edges 454 represents a distance (e.g., Euclidean distance) between the aggregated feature vectors 378 of two media content items connected via that edge. In the illustrated example of FIG. 16, the property of each edge 454 can be depicted as the thickness of the edge. In one example, a thicker edge between two vertices can indicate that a distance between the aggregated feature vectors of two media content items corresponding to the vertices is closer than another distance, and, therefore, that the two media content items are more similar than another pair of media content items. In another example, a thicker edge between two vertices can indicate that a distance between the aggregated feature vectors of two media content items corresponding to the vertices is further than another distance, and, therefore, that the two media content items are less similar than another pair of media content items. In other embodiments, the property of each edge 454 can be represented as a numerical value annotated with that edge. Other forms for representing the properties of edges are also possible.

[0129] At operation 434, the media delivery system 104 identifies a seed vertex 456. The seed vertex 456 is a vertex associated with the seed media content item 310. When determining an optimal path in subsequent operations, the seed vertex 456 is used as a starting point.

[0130] At operation 436, the media delivery system 104 determines an optimal path 460 (dotted lines in FIG. 16) that visits all the vertices 452 (such as 452A-J) only once. The optimal path can be a route consisting of the edges 454 that connect all the vertices 452 exactly once at a lower total cost. A total cost of a path can be determined based on the property of the edges in the path. As described herein, the property of an edge includes a distance between two vertices connected via that edge, which is indicative of a similarity between two media content items corresponding to the two vertices. Therefore, the total cost of a path can be a sum of distances between adjacent vertices in that path.
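
By way of a non-limiting illustration, the total cost of a path and an exhaustive search for the lowest-cost path starting at the seed vertex can be sketched as follows. The exhaustive search is an assumption of this sketch and is practical only for very small playlists, since the number of candidate paths grows factorially; the function names are hypothetical.

```python
import math
from itertools import permutations


def path_cost(path, vectors):
    """Sum of Euclidean distances between adjacent vertices in the path."""
    return sum(
        math.dist(vectors[a], vectors[b]) for a, b in zip(path, path[1:])
    )


def exact_shortest_path(vectors, seed):
    """Try every ordering of the non-seed vertices and keep the cheapest."""
    others = [track for track in vectors if track != seed]
    best = min(
        permutations(others),
        key=lambda ordering: path_cost([seed, *ordering], vectors),
    )
    return [seed, *best]


points = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (1.0, 0.0), "d": (4.0, 0.0)}
best_path = exact_shortest_path(points, seed="a")
best_cost = path_cost(best_path, points)
```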

[0131] In some embodiments, the optimal path is found using the shortest Hamiltonian path approach. In other embodiments, the optimal path can be found using the shortest Hamiltonian cycle approach. As the Hamiltonian path problem and the Hamiltonian cycle problem are both NP-complete, approximation approaches are used to find the shortest paths at the operation 436. In one example, a straightforward greedy approximation can be used, which iteratively selects the closest non-visited vertex, starting from the seed vertex. In another example, an improvement to the straightforward greedy approximation can be made by selecting the closest non-visited vertex from either the tail or the head of the partial sequencing.
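
By way of a non-limiting illustration, the two greedy approximations described above can be sketched as follows. The function names are hypothetical, and ties are broken by insertion order as an assumption of this sketch. In the double-ended variant as sketched here, extending the head of the partial sequence means that the seed does not necessarily remain the first item.

```python
import math
from collections import deque


def greedy_sequence(vectors, seed):
    """Repeatedly append the closest non-visited vertex to the tail."""
    path = [seed]
    remaining = [track for track in vectors if track != seed]
    while remaining:
        last = path[-1]
        nearest = min(remaining, key=lambda t: math.dist(vectors[last], vectors[t]))
        path.append(nearest)
        remaining.remove(nearest)
    return path


def double_ended_greedy(vectors, seed):
    """Extend whichever end (head or tail) has the closer non-visited vertex."""
    path = deque([seed])
    remaining = [track for track in vectors if track != seed]
    while remaining:
        head, tail = path[0], path[-1]
        near_head = min(remaining, key=lambda t: math.dist(vectors[head], vectors[t]))
        near_tail = min(remaining, key=lambda t: math.dist(vectors[tail], vectors[t]))
        head_gap = math.dist(vectors[head], vectors[near_head])
        tail_gap = math.dist(vectors[tail], vectors[near_tail])
        if head_gap < tail_gap:
            path.appendleft(near_head)
            remaining.remove(near_head)
        else:
            path.append(near_tail)
            remaining.remove(near_tail)
    return list(path)


points = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (1.0, 0.0), "d": (4.0, 0.0)}
single = greedy_sequence(points, seed="d")
double = double_ended_greedy(points, seed="d")
```

With these four collinear points and seed "d", the single-ended variant finishes with a long jump from "a" to "b", whereas the double-ended variant attaches "b" at the head and obtains a lower total cost.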

[0132] As the edges are weighted by the Euclidean distance between the corresponding media content item features (e.g., the aggregated feature vectors) in constructing the graph 450, the total cost of sequencing can be a sum of all the weights of the edges in the path.

[0133] At operation 438, the media delivery system 104 determines a sequence of the media content items based on the optimal path 460. When the optimal path 460 is determined at the operation 436, the media content items can be arranged in the same order as the corresponding vertices 452 along the calculated optimal path 460.

[0134] As such, the operations 436 and 438 are configured to determine an optimal path that visits each vertex 452 exactly once. Accordingly, such an optimal path among the vertices 452 can give an optimal order of the media content items 116.

[0135] Referring now to FIGS. 17 and 18, in certain examples, the system of the present disclosure can be used to play back a plurality of media content items to continuously support a user's repetitive motion activity without disrupting the user's cadence.

[0136] Users of media playback devices often consume media content while engaging in various activities, including repetitive motion activities. As noted above, examples of repetitive-motion activities may include swimming, biking, running, rowing, and other activities. Consuming media content may include one or more of listening to audio content, watching video content, or consuming other types of media content. For ease of explanation, the embodiments described in this application are presented using specific examples. For example, audio content (and in particular music) is described as an example of one form of media consumption. As another example, running is described as one example of a repetitive-motion activity. However, it should be understood that the same concepts are equally applicable to other forms of media consumption and to other forms of repetitive-motion activities, and at least some embodiments include other forms of media consumption and/or other forms of repetitive-motion activities.

[0137] The users may desire that the media content fits well with the particular repetitive activity. For example, a user who is running may desire to listen to music with a beat that corresponds to the user's cadence. Beneficially, by matching the beat of the music to the cadence, the user's performance or enjoyment of the repetitive-motion activity may be enhanced. This desire cannot be met with traditional media playback devices and media delivery systems.

[0138] FIG. 17 illustrates an example system 1000 for managing a sequence between media content items to continuously support a repetitive motion activity. In some embodiments, the system 1000 is configured similarly to the system 100 as described herein. Therefore, the description of all the features and elements in the system 100 is incorporated by reference for the system 1000. Where like or similar features or elements are shown, the same reference numbers will be used where possible. The following description for the system 1000 will be limited primarily to the differences from the system 100.

[0139] In the system 1000, the media playback device 102 further includes a cadence-acquiring device 1114, as well as the media content sequencing engine 110. Also shown is a user U who is running. The user U's upcoming steps S are shown as well. A step represents a single strike of the runner's foot upon the ground.

[0140] The media playback device 102 can play media content for the user based on the user's cadence. In the example shown, the media output 108 includes music with a tempo that corresponds to the user's cadence. The tempo (or rhythm) of music refers to the frequency of the beat and is typically measured in beats per minute (BPM). The beat is the basic unit of rhythm in a musical composition (as determined by the time signature of the music). Accordingly, in the example shown, the user U's steps occur at the same frequency as the beat of the music.

[0141] For example, if the user U is running at a cadence of 180 steps per minute, the media playback device 102 may play a media content item having a tempo equal to or approximately equal to 180 BPM. In other embodiments, the media playback device 102 plays a media content item having a tempo equal to or approximately equal to the result of dividing the cadence by an integer, such as a tempo that is equal to or approximately equal to one-half (e.g., 90 BPM when the user is running at a cadence of 180 steps per minute), one-fourth, or one-eighth of the cadence. Alternatively, the media playback device 102 plays a media content item having a tempo that is equal to or approximately equal to an integer multiple (e.g., 2x, 4x, etc.) of the cadence. Further, in some embodiments, the media playback device 102 operates to play multiple media content items including one or more media content items having a tempo equal to or approximately equal to the cadence and one or more media content items having a tempo equal to or approximately equal to the result of multiplying or dividing the cadence by an integer. Various other combinations are possible as well.

[0142] In some embodiments, the media playback device 102 operates to play music having a tempo that is within a predetermined range of a target tempo. In at least some embodiments, the predetermined range is plus or minus 2.5 BPM. For example, if the user U is running at a cadence of 180 steps per minute, the media playback device 102 operates to play music having a tempo of 177.5-182.5 BPM. Alternatively, in other embodiments, the predetermined range is itself in a range from 1 BPM to 10 BPM. Other ranges of a target tempo are also possible.
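
By way of a non-limiting illustration, checking whether a tempo corresponds to a cadence, either directly or through an integer multiple or divisor, within a predetermined range can be sketched as follows. The function name and the default factors (2, 4, 8) are assumptions of this sketch; the default tolerance of 2.5 BPM follows the example above.

```python
def matches_cadence(tempo_bpm, cadence_spm, tolerance_bpm=2.5, factors=(2, 4, 8)):
    """Return True if the tempo is within tolerance of a target tempo.

    Target tempos are the cadence itself, the cadence divided by each
    factor, and the cadence multiplied by each factor.
    """
    targets = [cadence_spm]
    targets += [cadence_spm / k for k in factors]
    targets += [cadence_spm * k for k in factors]
    return any(abs(tempo_bpm - target) <= tolerance_bpm for target in targets)


direct = matches_cadence(178.0, 180.0)     # within 177.5-182.5 BPM
half_time = matches_cadence(90.0, 180.0)   # one-half of the cadence
unrelated = matches_cadence(120.0, 180.0)  # no target within tolerance
```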

[0143] Further, in some embodiments, the media content items that are played back on the media playback device 102 have a tempo equal to or approximately equal to a user U's cadence after it is rounded. For example, the cadence may be rounded to the nearest multiple of 2.5, 5, or 10 and then the media playback device 102 plays music having a tempo equal to or approximately equal to the rounded cadence. In yet other embodiments, the media playback device 102 uses the cadence to select a predetermined tempo range of music for playback. For example, if the user U's cadence is 181 steps per minute, the media playback device 102 may operate to play music from a predetermined tempo range of 180-184.9 BPM; while if the user U's cadence is 178 steps per minute, the media playback device 102 may operate to play music from a predetermined tempo range of 175-179.9 BPM.
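
By way of a non-limiting illustration, rounding a cadence and selecting a predetermined tempo range can be sketched as follows. The function names are hypothetical, and the 5 BPM range width is an assumption chosen to match the 180-184.9 BPM and 175-179.9 BPM examples above; each range is treated here as including its lower bound and excluding its upper bound.

```python
import math


def round_cadence(cadence_spm, step):
    """Round the cadence to the nearest multiple of step (e.g., 2.5, 5, or 10).

    Note: Python's round() resolves exact halves to the nearest even integer.
    """
    return round(cadence_spm / step) * step


def tempo_range_for(cadence_spm, width=5.0):
    """Return the (low, high) tempo range containing the cadence.

    The range includes low and excludes high, so (180.0, 185.0)
    stands for 180-184.9 BPM.
    """
    low = math.floor(cadence_spm / width) * width
    return (low, low + width)


def in_range(tempo_bpm, tempo_range):
    """Check whether a tempo falls inside a (low, high) range."""
    low, high = tempo_range
    return low <= tempo_bpm < high


range_181 = tempo_range_for(181)
range_178 = tempo_range_for(178)
```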

[0144] Referring still to FIG. 17, the cadence-acquiring device 1114 operates to acquire a cadence associated with the user U. In at least some embodiments, the cadence-acquiring device 1114 operates to determine cadence directly and includes one or more accelerometers or other motion-detecting technologies. Alternatively, the cadence-acquiring device 1114 operates to receive data representing a cadence associated with the user U. For example, in some embodiments, the cadence-acquiring device 1114 operates to receive data from a watch, bracelet, foot pod, chest strap, shoe insert, anklet, smart sock, bicycle computer, exercise equipment (e.g., treadmill, rowing machine, stationary cycle), or other device for determining or measuring cadence. Further, in some embodiments, the cadence-acquiring device 1114 operates to receive a cadence value input by the user U or another person.

[0145] FIG. 18 illustrates an example of the media delivery system 104 of FIG. 17 for managing a sequence between media content items to continuously support a repetitive motion activity. In the system 1000, the media delivery system 104 further includes a media server 1200 and a repetitive-motion activity server 1202. The media server 1200 includes the media server application 150, the processing device 152, the memory device 154, and the network access device 156, as described herein.

[0146] In at least some embodiments, the media server 1200 and the repetitive-motion activity server 1202 are provided by separate computing devices. In other embodiments, the media server 1200 and the repetitive-motion activity server 1202 are provided by the same computing devices. Further, in some embodiments, one or both of the media server 1200 and the repetitive-motion activity server 1202 are provided by multiple computing devices. For example, the media server 1200 and the repetitive-motion activity server 1202 may be provided by multiple redundant servers located in multiple geographic locations.

[0147] The repetitive-motion activity server 1202 operates to provide repetitive-motion activity-specific information about media content items to media playback devices. In some embodiments, the repetitive-motion activity server 1202 includes a repetitive-motion activity server application 1220, a processing device 1222, a memory device 1224, and a network access device 1226. The processing device 1222, memory device 1224, and network access device 1226 may be similar to the processing device 152, memory device 154, and network access device 156 respectively, which have each been previously described.

[0148] In some embodiments, the repetitive-motion activity server application 1220 operates to transmit information about the suitability of one or more media content items for playback during a particular repetitive-motion activity. The repetitive-motion activity server application 1220 includes a repetitive-motion activity interface 1228 and a repetitive-motion activity media metadata store 1230.

[0149] In some embodiments, the repetitive-motion activity server application 1220 may provide a list of media content items at a particular tempo to a media playback device in response to a request that includes a particular cadence value. Further, in some embodiments, the media content items included in the returned list will be particularly relevant for the repetitive motion activity in which the user is engaged (for example, if the user is running, the returned list of media content items may include only media content items that have been identified as being highly runnable).
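
By way of a non-limiting illustration, returning a list of media content items for a requested cadence can be sketched as follows. The TrackMetadata fields, the 0-to-1 runnability scale, and the 0.8 threshold are hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class TrackMetadata:
    track_id: str
    tempo_bpm: float
    runnability: float  # hypothetical score between 0 and 1


def tracks_for_cadence(catalog, cadence_spm, tolerance_bpm=2.5, min_runnability=0.8):
    """Return identifiers of tracks near the cadence that score as runnable."""
    return [
        track.track_id
        for track in catalog
        if abs(track.tempo_bpm - cadence_spm) <= tolerance_bpm
        and track.runnability >= min_runnability
    ]


catalog = [
    TrackMetadata("t1", 180.0, 0.9),
    TrackMetadata("t2", 181.5, 0.4),   # right tempo, low runnability
    TrackMetadata("t3", 150.0, 0.95),  # runnable, wrong tempo
    TrackMetadata("t4", 178.0, 0.85),
]
response = tracks_for_cadence(catalog, cadence_spm=180.0)
```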

[0150] The repetitive-motion activity interface 1228 operates to receive requests or other communication from media playback devices or other systems to retrieve information about media content items from the repetitive-motion activity server 1202. For example, in FIG. 2, the repetitive-motion activity interface 1228 receives communication 184 from the media playback engine 146.

[0151] In some embodiments, the repetitive-motion activity media metadata store 1230 stores repetitive-motion activity media metadata 1232. The repetitive-motion activity media metadata store 1230 may comprise one or more databases and file systems. Other embodiments are possible as well.

[0152] The repetitive-motion activity media metadata 1232 operates to provide various information associated with media content items, such as the media content items 170. In some embodiments, the repetitive-motion activity media metadata 1232 provides information that may be useful for selecting media content items for playback during a repetitive-motion activity. For example, in some embodiments, the repetitive-motion activity media metadata 1232 stores runnability scores for media content items that correspond to the suitability of particular media content items for playback during running. As another example, in some embodiments, the repetitive-motion activity media metadata 1232 stores timestamps (e.g., start and end points) that identify portions of a media content item that are particularly well-suited for playback during running (or another repetitive-motion activity).

[0153] Each of the media playback device 102 and the media delivery system 104 can include additional physical computer or hardware resources. In at least some embodiments, the media playback device 102 communicates with the media delivery system 104 via the network 106.

[0154] In at least some embodiments, the media delivery system 104 can be used to stream, progressively download, or otherwise communicate music, other audio, video, or other forms of media content items to the media playback device 102 based on a cadence acquired by the cadence-acquiring device 1114 of the media playback device 102. In accordance with an embodiment, a user U can direct the input to the user interface 144 to issue requests, for example, to play back media content corresponding to the cadence of a repetitive motion activity on the media playback device 102.

[0155] The media mix data generation engine 1240 operates to generate media mix data to be used for sequencing and/or crossfading cadence-based media content items. As described herein, such media mix data can be incorporated in repetitive-motion activity media metadata 1232.

[0156] In this example, the media content sequencing engine 110 operates to arrange selected media content items (such as ones in a playlist) in such an order that the media content items are played on the media playback device 102 to continuously support a user's repetitive motion activity without interruption or jarring effect.

[0157] In this document, for the purpose of determining track features or feature vectors, calculating an aggregated track feature or aggregated feature vector, or determining similarity between two media content items or tracks, a media content item or a track may indicate the entire media content item or the entire track, a portion of the media content item or a portion of the track, or a collection of media content items or a collection of tracks, such as an album or a playlist.

[0158] The various examples and teachings described above are provided by way of illustration only and should not be construed to limit the scope of the present disclosure. Those skilled in the art will readily recognize various modifications and changes that may be made without following the examples and applications illustrated and described herein, and without departing from the true spirit and scope of the present disclosure.