PETERSEIL, Lina (Colaiste Padraid Mac Piarais, Na Forbacha, Gaillimh, IE)
What is claimed:
1. A method for dynamic, distributed creation of a musical composition to accompany a visual composition, comprising: receiving, by an analyzer executed by a first device, a video file from a second device operated by a user; analyzing, by the analyzer, the video file to create one or more data arrays representative of the probability of a neighboring pixel of a video frame having the same color; receiving, by a composer executed by the first device, one or more band parameters selected by the user; generating, by the composer, an audio soundtrack based on the one or more data arrays and the one or more band parameters; and transmitting, by the first device, the audio soundtrack to the second device.
2. The method of claim 1, wherein receiving one or more band parameters comprises receiving, from the second device, an audio file; and analyzing, by the analyzer, the audio file to generate the one or more band parameters.
3. The method of claim 1, wherein generating an audio soundtrack comprises composing a musical score based on the one or more data arrays and the one or more band parameters.
4. The method of claim 1, wherein transmitting the audio soundtrack to the second device comprises transmitting the audio soundtrack for synchronization by the second device with the video file.
5. The method of claim 1, wherein transmitting the audio soundtrack to the second device comprises synchronizing, by the first device, the audio soundtrack and video file to generate a multimedia file; and transmitting the generated multimedia file to the second device.
6. The method of claim 1, wherein generating an audio soundtrack comprises generating a first low-quality audio soundtrack, and transmitting the audio soundtrack comprises transmitting the first low-quality audio soundtrack to the second device.
7. The method of claim 6, further comprising: receiving, by the first device, a purchase approval from the user; generating, by the composer, a high-quality audio soundtrack; and transmitting, by the first device, the high-quality audio soundtrack to the second device.
8. The method of claim 6, further comprising: receiving, by the first device, a modification to the one or more band parameters by the user; generating, by the composer, a second low-quality audio soundtrack based on the one or more data arrays and the modified one or more band parameters; and transmitting, by the first device, the second low-quality audio soundtrack to the second device.
9. The method of claim 6, further comprising: transmitting, by the first device, the one or more data arrays to the second device for display for the user; receiving, by the first device, a modification to the one or more data arrays by the user; generating, by the composer, a second low-quality audio soundtrack based on the modified one or more data arrays and the one or more band parameters; and transmitting, by the first device, the second low-quality audio soundtrack to the second device.
10. The method of claim 6, further comprising: composing, by the composer, a musical score based on the one or more data arrays and the one or more band parameters; transmitting, by the first device, the composed musical score to the second device for display for the user; receiving, by the first device, a modification to the composed musical score by the user; and composing, by the composer, a second musical score based on the one or more data arrays, the one or more band parameters, and the received modification.
11. A system for dynamic, distributed creation of a musical composition to accompany a visual composition, comprising: a first device executing an analyzer, configured to receive a video file from a second device operated by a user, and analyze the video file to create one or more data arrays representative of the probability of a neighboring pixel of a video frame having the same color; and a composer, configured to receive one or more band parameters selected by the user, and generate an audio soundtrack based on the one or more data arrays and the one or more band parameters; and wherein the first device is configured to transmit the audio soundtrack to the second device.
12. The system of claim 11, wherein the analyzer is further configured to receive an audio file from the second device, and analyze the audio file to generate the one or more band parameters.
13. The system of claim 11, wherein the composer is further configured to compose a musical score based on the one or more data arrays and the one or more band parameters.
14. The system of claim 11, wherein the first device is configured to transmit the audio soundtrack for synchronization by the second device with the video file.
15. The system of claim 11, wherein the first device is configured to synchronize the audio soundtrack and video file to generate a multimedia file and transmit the generated multimedia file to the second device.
16. The system of claim 11, wherein the composer is further configured to generate a first low-quality audio soundtrack, and the first device is configured to transmit the first low-quality audio soundtrack to the second device.
17. The system of claim 16, wherein the composer is further configured to generate a high-quality audio soundtrack, responsive to the first device receiving a purchase approval from the user.
18. The system of claim 16, wherein the composer is further configured to generate a second low-quality audio soundtrack, responsive to the first device receiving a modification to the one or more band parameters by the user, the second low-quality audio soundtrack based on the one or more data arrays and the modified one or more band parameters.
19. The system of claim 16, wherein the composer is further configured to generate a second low-quality audio soundtrack, responsive to receiving a modification to the one or more data arrays by the user, the second low-quality audio soundtrack based on the modified one or more data arrays and the one or more band parameters.
20. The system of claim 16, wherein the composer is further configured to compose a first musical score based on the one or more data arrays and the one or more band parameters, the musical score transmitted to the second device for display for the user; and the composer is further configured to compose a second musical score based on the one or more data arrays, the one or more band parameters, and a modification to the first musical score received from the user.
COMPOSITION
Related Applications
The present application claims priority to and the benefit of co-pending U.S. Provisional Patent Application No. 61/362,507, titled "Systems and Methods for Dynamic, Distributed Creation of a Musical Composition to Accompany a Visual Composition," filed July 8, 2010, the entirety of which is incorporated herein by reference.
Field of the Invention
The present application generally relates to dynamic, distributed creation of music compositions to accompany visual compositions.
Background of the Invention
In creating musical compositions to accompany a visual work or composition, such as a video or slideshow, the traditional method of hiring a composer to review the visuals and compose a matching score may be prohibitively expensive and time-consuming, particularly for independent filmmakers, web multimedia producers, and other content creators.
Furthermore, a composed score may be subject to copyright licensing and royalty payments, which may be unpredictable, particularly given rapid internet distribution of multimedia and the possibility of sudden surges in a work's popularity.
To reduce the impact of these issues, composers have created libraries of pre-composed and recorded music of varying moods, styles, and lengths, which may be subject to royalty-free licenses or flat rates. These libraries, sometimes referred to as royalty-free libraries, stock libraries, or production music libraries, are frequently distributed as audio CDs with one or more full-length songs in a particular style, along with versions of the songs edited to short lengths, such as 30 or 60 seconds, for use in radio or television commercials. While useful for productions of these specific durations, they are less than optimal for visual works with non-standard lengths. Furthermore, because they are composed without review of the visual work, these songs may have phrasing that fails to synchronize with visual elements. For example, an accompanying musical score may be more aesthetically pleasing, or may give a visual greater impact, if its key changes or modulations, phrase conclusions, stings, or other musical elements synchronize with corresponding visual elements such as scene changes, entrances or exits of characters, presentation of a product, logo, titles or other text, or other visual elements. Pre-composed soundtracks, however, may have phrases or key changes that start in the middle of a scene, leading to confusion or a disjointed feeling for the viewer.
Brief Summary of the Invention
The present application is directed towards systems and methods for dynamic, distributed creation of music compositions to accompany visual compositions. Visual compositions, including videos and slideshows, may be uploaded to a server cloud and analyzed in a distributed system to create one or more data arrays. The data arrays may be used to create one or more melodic or harmonic phrases or musical lines of a composed score. In some embodiments, additional parameters may be input or an audio file may be uploaded and analyzed to create parameters for adjusting style, tempo, key, mode, or other melodic features. The composed score may be input to an audio generator, sampler or sequencer, or other playback engine to create an audio file. In some embodiments, the audio file may be synchronized with the input visual composition to create a multimedia file with a dynamically composed score with phrasing, melodic and harmonic features that correspond to visual elements over time.
In one aspect, the present application is directed to a method for dynamic, distributed creation of a musical composition to accompany a visual composition. The method includes receiving, by an analyzer executed by a first device, a video file from a second device operated by a user. The method also includes the analyzer analyzing the video file to create one or more data arrays representative of the probability of a neighboring pixel of a video frame having the same color. The method further includes a composer executed by the first device receiving one or more band parameters selected by the user. The method also includes the composer generating an audio soundtrack based on the one or more data arrays and the one or more band parameters. The method also includes the first device transmitting the audio soundtrack to the second device.
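The analyzer's computation of these data arrays is not spelled out at this point. As an illustration only, the following Python sketch assumes that each decoded video frame is available as a rectangular grid of (R, G, B) tuples, that "neighboring" means a pixel's right and lower neighbors, and that "same color" means an exact match; the function names are hypothetical. Under those assumptions, each frame reduces to a single probability, and the per-frame values form a one-dimensional data array.

```python
from typing import List, Sequence, Tuple

# Assumed representation: a pixel is an (R, G, B) tuple and a frame is a
# rectangular grid (list of rows) of pixels.
Pixel = Tuple[int, int, int]
Frame = Sequence[Sequence[Pixel]]


def same_color_neighbor_probability(frame: Frame) -> float:
    """Fraction of horizontally or vertically adjacent pixel pairs whose colors match exactly.

    Each adjacent pair is counted once, by comparing every pixel with its
    right neighbor and its lower neighbor. Frames with no pairs return 0.0.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    matches = 0
    pairs = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # compare with the right neighbor
                pairs += 1
                matches += frame[y][x] == frame[y][x + 1]
            if y + 1 < height:  # compare with the lower neighbor
                pairs += 1
                matches += frame[y][x] == frame[y + 1][x]
    return matches / pairs if pairs else 0.0


def analyze_frames(frames: Sequence[Frame]) -> List[float]:
    """Build a one-dimensional data array holding one probability per frame."""
    return [same_color_neighbor_probability(frame) for frame in frames]
```

A frame dominated by large flat regions yields a value near 1, while a highly textured frame yields a value near 0, so the resulting array traces how visual texture varies over the course of the video.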
In one embodiment, the method includes the first device receiving an audio file from the second device, and analyzing the audio file to generate the one or more band parameters. In another embodiment, the method includes composing a musical score based on the one or more data arrays and the one or more band parameters. In still another embodiment, the method includes transmitting the audio soundtrack for synchronization by the second device with the video file. In yet still another embodiment, the method includes the first device synchronizing the audio soundtrack and video file to generate a multimedia file; and transmitting the generated multimedia file to the second device.
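Where the first device performs the synchronization itself, one way to combine the soundtrack and the video file into a multimedia file is to multiplex them with a general-purpose tool such as FFmpeg. FFmpeg is not named in this application and is swapped in here purely for illustration; the hypothetical function below only builds the command line, mapping the first video stream of the uploaded file and the first audio stream of the soundtrack, copying the video without re-encoding, and encoding the soundtrack as AAC.

```python
import shlex
from typing import List


def build_mux_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg argument list that pairs the video stream with the soundtrack."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", video_path,   # input 0: the uploaded visual composition
        "-i", audio_path,   # input 1: the generated soundtrack
        "-map", "0:v:0",    # use the first video stream of input 0
        "-map", "1:a:0",    # use the first audio stream of input 1
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-c:a", "aac",      # encode the soundtrack as AAC
        "-shortest",        # end the output with the shorter of the two streams
        output_path,
    ]


if __name__ == "__main__":
    cmd = build_mux_command("input.mp4", "soundtrack.wav", "output.mp4")
    print(shlex.join(cmd))
    # Running it requires ffmpeg on the PATH: subprocess.run(cmd, check=True)
```

Copying the video stream keeps the uploaded picture untouched and avoids a slow re-encode; only the soundtrack is transcoded.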
In some embodiments, the method includes generating a first low-quality audio soundtrack and transmitting the first low-quality audio soundtrack to the second device. In a further embodiment, the method includes the first device receiving a purchase approval from the user, the composer generating a high-quality audio soundtrack, and the first device transmitting the high-quality audio soundtrack to the second device. In another further embodiment, the method includes the first device receiving a modification to the one or more band parameters by the user. The method also includes the composer generating a second low-quality audio soundtrack based on the one or more data arrays and the modified one or more band parameters, and the first device transmitting the second low-quality audio soundtrack to the second device.
In yet another further embodiment, the method includes the first device transmitting the one or more data arrays to the second device for display for the user. The first device receives a modification to the one or more data arrays by the user. The composer generates a second low-quality audio soundtrack based on the modified one or more data arrays and the one or more band parameters, and the first device transmits the second low-quality audio soundtrack to the second device. In a still yet further embodiment, the method includes the composer composing a musical score based on the one or more data arrays and the one or more band parameters. The first device transmits the composed musical score to the second device for display for the user, and receives a modification to the composed musical score by the user. The composer composes a second musical score based on the one or more data arrays, the one or more band parameters, and the received modification.
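Taken together, these preview-then-purchase embodiments amount to a small request loop: each user modification yields a new low-quality soundtrack, and only a purchase approval yields a high-quality one. The sketch below models that loop with a hypothetical `CompositionSession` class. Its composer is a placeholder that records the inputs it would render rather than producing audio, and its `sent` list stands in for transmissions to the second device; neither is described in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Soundtrack:
    """A rendered soundtrack; this sketch records only its quality and inputs."""
    quality: str  # "low" for previews, "high" after a purchase approval
    data_arrays: List[float]
    band_parameters: Dict[str, object]


@dataclass
class CompositionSession:
    """Hypothetical first-device session for one user's video."""
    data_arrays: List[float]
    band_parameters: Dict[str, object]
    sent: List[Soundtrack] = field(default_factory=list)  # stands in for transmissions

    def _render(self, quality: str) -> Soundtrack:
        # Placeholder composer: snapshot the current inputs instead of rendering audio.
        track = Soundtrack(quality, list(self.data_arrays), dict(self.band_parameters))
        self.sent.append(track)  # "transmit" the soundtrack to the second device
        return track

    def preview(self) -> Soundtrack:
        return self._render("low")

    def modify_band_parameters(self, **changes: object) -> Soundtrack:
        self.band_parameters.update(changes)
        return self._render("low")

    def modify_data_arrays(self, new_arrays: Sequence[float]) -> Soundtrack:
        self.data_arrays = list(new_arrays)
        return self._render("low")

    def approve_purchase(self) -> Soundtrack:
        return self._render("high")
```

Keeping every preview at low quality means the expensive high-quality render happens once, after the user has settled on the data arrays and band parameters.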
In another aspect, the present application is directed to a system for dynamic, distributed creation of a musical composition to accompany a visual composition. The system includes a first device executing an analyzer and a composer. The analyzer is configured to receive a video file from a second device operated by a user, and analyze the video file to create one or more data arrays representative of the probability of a neighboring pixel of a video frame having the same color. The composer is configured to receive one or more band parameters selected by the user, and generate an audio soundtrack based on the one or more data arrays and the one or more band parameters. The first device is configured to transmit the audio soundtrack to the second device.
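The composer's mapping from data arrays to music is not specified here. The following Python sketch shows one hypothetical mapping. It assumes the band parameters consist of a tonic (as a MIDI note number), a mode, and a tempo; `BandParameters` and `compose_line` are illustrative names, not part of the application. Each data-array value in [0, 1] selects a degree of the chosen scale, and each note lasts one beat. A practical composer would also shape phrases, harmony, and instrumentation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Semitone offsets from the tonic for two common modes.
MODES: Dict[str, List[int]] = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "minor": [0, 2, 3, 5, 7, 8, 10],  # natural minor
}


@dataclass
class BandParameters:
    """Hypothetical band parameters; field names are illustrative only."""
    tonic_midi: int = 60      # MIDI note number of the tonic (60 = middle C)
    mode: str = "major"       # key into MODES
    tempo_bpm: float = 120.0  # beats per minute


def compose_line(data: Sequence[float], params: BandParameters) -> List[Tuple[int, float, float]]:
    """Map each data-array value to one note of the selected scale.

    Returns (midi_note, start_seconds, duration_seconds) tuples, one beat per value.
    """
    scale = MODES[params.mode]
    beat = 60.0 / params.tempo_bpm
    notes: List[Tuple[int, float, float]] = []
    for i, value in enumerate(data):
        clamped = min(max(value, 0.0), 1.0)
        # Spread [0, 1] over scale degrees 0..7, where degree 7 is the tonic an octave up.
        degree = round(clamped * len(scale))
        octave, step = divmod(degree, len(scale))
        pitch = params.tonic_midi + 12 * octave + scale[step]
        notes.append((pitch, i * beat, beat))
    return notes
```

For example, with the default parameters the data array [0.0, 1.0] yields middle C followed by the C an octave above, each lasting half a second.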
In one embodiment, the analyzer is further configured to receive an audio file from the second device, and analyze the audio file to generate the one or more band parameters. In another embodiment, the composer is further configured to compose a musical score based on the one or more data arrays and the one or more band parameters. In still another embodiment, the first device is configured to transmit the audio soundtrack for synchronization by the second device with the video file. In yet still another embodiment, the first device is configured to synchronize the audio soundtrack and video file to generate a multimedia file and transmit the generated multimedia file to the second device.
In one embodiment, the composer is further configured to generate a first low-quality audio soundtrack, and the first device is configured to transmit the first low-quality audio soundtrack to the second device. In a further embodiment, the composer is further configured to generate a high-quality audio soundtrack, responsive to the first device receiving a purchase approval from the user. In another further embodiment, the composer is further configured to generate a second low-quality audio soundtrack, responsive to the first device receiving a modification to the one or more band parameters by the user, the second low-quality audio soundtrack based on the one or more data arrays and the modified one or more band parameters.
In yet another further embodiment, the composer is further configured to generate a second low-quality audio soundtrack, responsive to receiving a modification to the one or more data arrays by the user, the second low-quality audio soundtrack based on the modified one or more data arrays and the one or more band parameters. In yet still another further embodiment, the composer is further configured to compose a first musical score based on the one or more data arrays and the one or more band parameters, the musical score transmitted to the second device for display for the user; and the composer is further configured to compose a second musical score based on the one or more data arrays, the one or more band parameters, and a modification to the first musical score received from the user. The details of various embodiments of the invention are set forth in the accompanying drawings and the description below.
Brief Description of the Figures
The foregoing and other objects, aspects, features, and advantages of the invention will become more apparent and better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1A is a block diagram of an embodiment of a network environment for a client to access a server for dynamic, distributed creation of music compositions;
FIGs. 1B-1C are block diagrams of embodiments of a computing device;
FIG. 2A is a block diagram of an embodiment of a system for dynamically creating musical compositions;
FIG. 2B is a block diagram of an embodiment of a system for using a distributed cloud service to dynamically create musical compositions;
FIG. 3 is a block diagram of a screenshot of an embodiment of a video analysis application;
FIG. 4 is a block diagram of a screenshot of an embodiment of a web page hosted by a portal for dynamic, distributed creation of musical compositions; and
FIG. 5 is a flow chart of an embodiment of a distributed method of dynamically creating a musical composition for a video work.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements.
Detailed Description of the Invention
For purposes of reading the description of the various embodiments below, the following descriptions of the sections of the specification and their respective contents may be helpful:
Section A describes a network environment and computing environment which may be useful for practicing embodiments described herein; and
Section B describes embodiments of systems and methods for dynamic, distributed creation of musical compositions to accompany video works.
A. Network and Computing Environment
Prior to discussing the specifics of embodiments of the systems and methods of the solution of the present disclosure, it may be helpful to discuss the network and computing environments in which such embodiments may be deployed. Referring now to FIG. 1A, an embodiment of a network environment 101 is depicted. In brief overview, the network environment 101 comprises one or more client systems 102A-102N (referred to generally as clients 102) in communication with one or more server systems 106A-106N (referred to generally as servers 106) via one or more networks 104. In some embodiments, a client 102 communicates with a server 106 via an intermediary appliance (not shown), such as a firewall, a switch, a hub, a NAT, a proxy, a performance enhancing proxy, a network accelerator, a modem, or other network device of any form or type.
As shown in FIG. 1A, the network 104 can be a local-area network (LAN), such as a company Intranet, a metropolitan area network (MAN), or a wide area network (WAN), such as the Internet or the World Wide Web. Although not illustrated, network 104 may comprise one or more networks, coupled either directly or via one or more intermediaries. In many embodiments, multiple network segments 104 may exist between a client 102 and a server 106, and accordingly, a client 102 may communicate with a server 106 via one or more networks 104, and one or more intermediaries or other servers. In one embodiment, network 104 may be a private network. In another embodiment, network 104 may be a public network. In some embodiments, network 104 may be a combination of one or more private networks and one or more public networks. In some embodiments, clients 102 may be located at a branch office of a corporate enterprise communicating via a WAN connection over the network 104 to the systems 106 located at a corporate data center. In other embodiments, clients 102 may be located at users' homes or offices and systems 106 may be located at a data center. In some embodiments discussed in more detail below, systems 106 may be distributed across a plurality of locations and serve as a cloud environment.
The network 104 may be any type and/or form of network and may include any of the following: a point to point network, a broadcast network, a wide area network, a local area network, a telecommunications network, a data communication network, a computer network, an ATM (Asynchronous Transfer Mode) network, a SONET (Synchronous Optical Network) network, an SDH (Synchronous Digital Hierarchy) network, a wireless network and a wireline network. In some embodiments, the network 104 may comprise a wireless link, such as an infrared channel or satellite band. The topology of the network 104 may be a bus, star, or ring network topology. The network 104 and network topology may be of any such network or network topology as known to those ordinarily skilled in the art capable of supporting the operations described herein.
Clients 102 and servers 106 may be deployed as and/or executed on any type and form of computing device, such as a computer, network device or appliance capable of communicating on any type and form of network and performing the operations described herein. FIGs. 1B and 1C depict block diagrams of a computing device 100 useful for practicing an embodiment of client 102 or server 106. As shown in FIGs. 1B and 1C, each computing device 100 includes a central processing unit 121, and a main memory unit 122. As shown in FIG. 1B, a computing device 100 may include a display device 124, a keyboard 126 and/or a pointing device 127, such as a mouse. As shown in FIG. 1C, each computing device 100 may also include additional optional elements, such as one or more input/output devices 130a-130b (generally referred to using reference numeral 130), and a cache memory 140 in communication with the central processing unit 121.
The central processing unit 121 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 122. In many embodiments, the central processing unit is provided by a microprocessor unit, such as: those manufactured by Intel Corporation of Santa Clara, California; those manufactured by Motorola Corporation of Schaumburg, Illinois; those manufactured by Transmeta Corporation of Santa Clara, California; the RS/6000 processor, those manufactured by International Business Machines of Armonk, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California. The computing device 100 may be based on any of these processors, or any other processor capable of operating as described herein.
Main memory unit 122 may be one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 121, such as Static Random Access Memory (SRAM), Burst SRAM or SynchBurst SRAM (BSRAM), Dynamic Random Access Memory (DRAM), Fast Page Mode DRAM (FPM DRAM), Enhanced DRAM (EDRAM), Extended Data Output RAM (EDO RAM), Extended Data Output DRAM (EDO DRAM), Burst Extended Data Output DRAM (BEDO DRAM), synchronous DRAM (SDRAM), JEDEC SRAM, PC100 SDRAM, Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Direct Rambus DRAM (DRDRAM), or Ferroelectric RAM (FRAM). The main memory 122 may be based on any of the above-described memory chips, or any other available memory chips capable of operating as described herein. In the embodiment shown in FIG. 1B, the processor 121 communicates with main memory 122 via a system bus 150 (described in more detail below). FIG. 1C depicts an embodiment of a computing device 100 in which the processor communicates directly with main memory 122 via a memory port 103. For example, in FIG. 1C the main memory 122 may be DRDRAM.
FIG. 1C depicts an embodiment in which the main processor 121 communicates directly with cache memory 140 via a secondary bus, sometimes referred to as a backside bus. In other embodiments, the main processor 121 communicates with cache memory 140 using the system bus 150. Cache memory 140 typically has a faster response time than main memory 122 and is typically provided by SRAM, BSRAM, or EDRAM. In the embodiment shown in FIG. 1C, the processor 121 communicates with various I/O devices 130 via a local system bus 150. Various busses may be used to connect the central processing unit 121 to any of the I/O devices 130, including a VESA VL bus, an ISA bus, an EISA bus, a MicroChannel Architecture (MCA) bus, a PCI bus, a PCI-X bus, a PCI-Express bus, or a NuBus. For embodiments in which the I/O device is a video display 124, the processor 121 may use an Advanced Graphics Port (AGP) to communicate with the display 124. FIG. 1C depicts an embodiment of a computer 100 in which the main processor 121 communicates directly with I/O device 130 via HyperTransport, Rapid I/O, or InfiniBand. FIG. 1C also depicts an embodiment in which local busses and direct communication are mixed: the processor 121 communicates with I/O device 130a using a local interconnect bus while communicating with I/O device 130b directly.
The computing device 100 may support any suitable installation device, such as a floppy disk drive for receiving floppy disks such as 3.5-inch, 5.25-inch disks or ZIP disks, a CD-ROM drive, a CD-R/RW drive, a DVD-ROM drive, tape drives of various formats, USB device, hard-drive or any other device suitable for installing software and programs, or portion thereof. The computing device 100 may further comprise a storage device 128, such as one or more hard disk drives or redundant arrays of independent disks, for storing an operating system and other related software, and for storing application software programs. Optionally, any of the installation devices could also be used as the storage device 128.
Additionally, the operating system and the software can be run from a bootable medium, for example, a bootable CD, such as KNOPPIX®, a bootable CD for GNU/Linux that is available as a GNU/Linux distribution from knoppix.net.
Furthermore, the computing device 100 may include a network interface 118 to interface to a Local Area Network (LAN), Wide Area Network (WAN) or the Internet through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, 56kb, X.25), broadband connections (e.g., ISDN, Frame Relay, ATM), wireless connections, or some combination of any or all of the above. The network interface 118 may comprise a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 100 to any type of network capable of communication and performing the operations described herein.
A wide variety of I/O devices 130a-130n may be present in the computing device 100. Input devices include keyboards, mice, trackpads, trackballs, microphones, and drawing tablets. Output devices include video displays, speakers, inkjet printers, laser printers, and dye-sublimation printers. In some embodiments, specialized controllers may be used for I/O devices 130. In an embodiment in which computing device 100 is a non-linear editing workstation, an I/O device 130 may be a control surface comprising one or more faders for controlling audio levels, and/or one or more video transport controls. For example, I/O device 130 may comprise a Mackie Human User Interface (HUI) manufactured by LOUD Technologies, Inc. of Woodinville, Washington; a 01X hardware controller, manufactured by the Yamaha Corporation of Hamamatsu, Japan; or any other type and form of audio or video editing control surface. The I/O devices 130 may be controlled by an I/O controller 123 as shown in FIG. 1B. The I/O controller may control one or more I/O devices such as a keyboard 126 and a pointing device 127, e.g., a mouse or optical pen. Furthermore, an I/O device may also provide storage 128 and/or an installation medium for the computing device 100. In still other embodiments, the computing device 100 may provide USB connections to receive handheld USB storage devices such as the USB Flash Drive line of devices manufactured by Twintech Industry, Inc. of Los Alamitos, California.
In some embodiments, the computing device 100 may comprise or be connected to multiple display devices 124a-124n, which each may be of the same or different type and/or form. As such, any of the I/O devices 130a-130n and/or the I/O controller 123 may comprise any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of multiple display devices 124a-124n by the computing device 100. For example, the computing device 100 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display devices 124a-124n. In one embodiment, a video adapter may comprise multiple connectors to interface to multiple display devices 124a-124n. In other embodiments, the computing device 100 may include multiple video adapters, with each video adapter connected to one or more of the display devices 124a-124n. In some embodiments, any portion of the operating system of the computing device 100 may be configured for using multiple displays 124a-124n. In other embodiments, one or more of the display devices 124a-124n may be provided by one or more other computing devices, such as computing devices 100a and 100b connected to the computing device 100, for example, via a network. These embodiments may include any type of software designed and constructed to use another computer's display device as a second display device 124a for the computing device 100. One ordinarily skilled in the art will recognize and appreciate the various ways and embodiments that a computing device 100 may be configured to have multiple display devices 124a-124n.
In some embodiments, computing device 100 may include one or more audio interfaces or audio outputs, including speakers or audio monitors, headphones, amplifiers, pre-amplifiers, line drivers, digital to analog converters, analog or digital signal processors, or other components for outputting music, audio, an audio track of a video, or any other sound output.
In further embodiments, an I/O device 130 may be a bridge 170 between the system bus 150 and an external communication bus, such as a USB bus, an Apple Desktop Bus, an RS-232 serial connection, a SCSI bus, a FireWire bus, a FireWire 800 bus, an Ethernet bus, an AppleTalk bus, a Gigabit Ethernet bus, an Asynchronous Transfer Mode bus, a HIPPI bus, a Super HIPPI bus, a SerialPlus bus, a SCI/LAMP bus, a FibreChannel bus, or a Serial Attached small computer system interface bus.
A computing device 100 of the sort depicted in FIGs. 1B and 1C typically operates under the control of an operating system, which controls scheduling of tasks and access to system resources. The computing device 100 can be running any operating system such as any of the versions of the Microsoft® Windows operating systems, the different releases of the Unix and Linux operating systems, any version of the Mac OS® for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein. Typical operating systems include: WINDOWS 3.x, WINDOWS 95, WINDOWS 98, WINDOWS 2000, WINDOWS NT 3.51, WINDOWS NT 4.0, WINDOWS CE, and WINDOWS XP, all of which are manufactured by Microsoft Corporation of Redmond, Washington; MacOS, manufactured by Apple Computer of Cupertino, California; OS/2, manufactured by International Business Machines of Armonk, New York; and Linux, a freely-available operating system distributed by Caldera Corp. of Salt Lake City, Utah, or any type and/or form of a Unix operating system, among others.
In other embodiments, the computing device 100 may have different processors, operating systems, and input devices consistent with the device. For example, in one embodiment the computer 100 is a Treo 180, 270, 1060, 600 or 650 smart phone manufactured by Palm, Inc. In this embodiment, the Treo smart phone is operated under the control of the PalmOS operating system and includes a stylus input device as well as a five-way navigator device. In another embodiment, the computer 100 may be an iPhone, iPod touch, or iPad manufactured by Apple, Inc. In these embodiments, the computer 100 may be operated under control of the iOS operating system manufactured by Apple, Inc., and may include a multi-touch touchscreen input. In yet another embodiment, the computer 100 may be a smart phone operating under control of the Android operating system manufactured by Google, Inc. Moreover, the computing device 100 can be any workstation, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone, any other computer, or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
In some embodiments, a first computing device 100a executes an application on behalf of a user of a client computing device 100b. In other embodiments, a computing device 100a executes a virtual machine, which provides an execution session within which applications execute on behalf of a user of a client computing device 100b. In one of these embodiments, the execution session is a hosted desktop session. In another of these embodiments, the computing device 100 executes a terminal services session. The terminal services session may provide a hosted desktop environment. In still another of these embodiments, the execution session provides access to a computing environment, which may comprise one or more of: an application, a plurality of applications, a desktop application, and a desktop session in which one or more applications may execute.
B. Systems and Methods for Dynamic, Distributed Creation of Musical Compositions
Creating aesthetically-pleasing and emotionally-compelling soundtracks for audiovisual works requires more than simply determining an overall mood, style and length of the work, although these questions are important. Composing a musical composition that matches the associated visual work requires paying attention to phrase length, timing of key changes, internal tempo changes, and other elements. Timing these elements to scene or camera changes, entrances and exits of characters, titles, and other visual elements, allows for synchronization that increases the impact of the visual work. For example, if a musical score changes to a minor key on the entrance of a new character to a scene, viewers may feel a sense of anxiety or danger desired by the producer or director. Similarly, if the visual changes from a dark interior to a brightly lit exterior shot, a corresponding change in the musical score can signify that the shot is part of a new scene or act, that time has passed, or any number of subtle indications that can be used to aid the director or producer in telling a story.
Many of the important changes in a visual work, sometimes referred to as key elements or keyframes (such as camera cuts, scene changes, and character or title entrances and exits), change the color content of the picture from one frame to the next. For example, the last frame of one scene may be an outdoor shot of a sunset, with a sky dominated by reds and oranges and a dark ground, while the following first frame of the next scene may be a brightly lit morning, with a blue sky and green, grassy ground. Accordingly, by analyzing the color content of successive frames in a video and using the color content to dynamically create a musical composition, the resulting score will automatically synchronize with many of the important visual elements. Additionally, because the score can be created dynamically, the producer or director may adjust visual elements, swap scenes, insert new endings, slow titles for easier reading, or make other changes without incurring the expense and time of a human composer making multiple scores for a work.
Furthermore, these techniques may be provided as a distributed service via a network, allowing directors and producers to upload a video to the network, have a remote server analyze the video and create a musical composition, and download the created composition, without requiring the director or producer to install, learn, or utilize specialized software. This allows the directors and producers to focus on the visual content creation. In many embodiments, the generated audio files or score of the music composition may be output in a format usable by a non-linear editor (NLE), such as Final Cut Pro, iDVD, or iMovie manufactured by Apple, Inc. of Cupertino, California; Adobe Premiere, manufactured by Adobe Systems of San Jose, California; Media Composer, manufactured by Avid Technology of Burlington, Massachusetts; or any other type and form of audio, video, or multimedia nonlinear editing system.
Referring to FIG. 2A, illustrated is a block diagram of an embodiment of a system for dynamic, distributed creation of musical compositions for visual works. In brief overview, a client 102 may include a browser 200 and storage 202. In some embodiments, storage 202 may include one or more video files and one or more audio files. Client 102 may connect via a network 104 to a server 106, which may include or execute a portal 204. In some embodiments, server 106 may include a security module 206 and/or a billing module 208. Server 106 may further include audio/video storage 210, a video analyzer 212, and an audio analyzer 214. In some embodiments, server 106 may include band storage 216, which may include data files for one or more bands, described in more detail below. In some embodiments, server 106 may also include a music composer 218, an audio generator 220, and/or an audio/video resynchronizer 222.
Still referring to FIG. 2A and in more detail, in some embodiments, a client 102 may execute a browser 200 to connect to a portal 204 executed by a server 106. A browser 200 may comprise an application, process, service, logic, program, or any other type and form of executable code for interacting with a remote server. In some embodiments, browser 200 may be a web browser, such as the Internet Explorer browser manufactured by Microsoft Corporation of Redmond, Washington; the Safari browser manufactured by Apple, Inc. of Cupertino, California; the Firefox browser manufactured by the Mozilla Foundation of Mountain View, California; the Chrome browser manufactured by Google, Inc. of Mountain View, California; or any other web browser. In other embodiments, browser 200 may be a dedicated application for dynamic distributed music creation. In many embodiments, browser 200 may be used by a director, producer or other user of client 102 to connect to portal 204, upload video files from storage 202 to server 106 for analysis, configure music composition parameters, and download composed music files. In some embodiments, browser 200 may also be used to upload audio files for analysis, discussed in more detail below.
Client 102 may include storage 202, which may comprise a hardware storage device such as a flash memory, hard drive, CD or DVD-ROM, or any other type and form of file storage as discussed above. In some embodiments, storage 202 may be a network storage device or a storage device external to client 102, but which browser 200 may connect to for retrieval of files. In many embodiments, storage 202 may store audio and video files in one or more formats, including QuickTime video files, MPEG audio or video files, Windows Media audio or video files, Audio Interchange File Format (AIFF) audio files, or any other type and form of audio or video files.
In many embodiments, a server 106 may execute a portal 204. Portal 204 may comprise an application, service, process, program, server process, daemon, or any other type and form of executable code for interacting with a remote client via a browser 200. In some embodiments, portal 204 may comprise a web server executing a dynamic or interactive web page. In other embodiments, portal 204 may comprise a database or application server, providing data responsive to dynamic requests from a browser 200. In many embodiments, portal 204 may be able to simultaneously provide isolated services to a plurality of clients 102 and/or users.
In some embodiments, server 106 may include a security module 206. In some embodiments, security module 206 may comprise an application, service, process, daemon or other executable code, database, data table, array, or other storage of data, or combination of executable code and database for providing storage, retrieval, and processing of user credentials. In one embodiment, a user may enter a login and password or other security credential information into portal 204. Portal 204 may then provide this credential information to security module 206 which may compare the information to credential information stored in a security database. In some embodiments, security module 206 may provide functionality for preventing malicious attacks, such as a CAPTCHA human-response test, a timer for detecting multiple failed login attempts, logging or other features.
In many embodiments, security module 206 may also connect to or comprise a billing module 208. Billing module 208 may include a record of charges incurred by a user. For example, in one embodiment, a distributed music composition system may include a pay-as-you-go option. In such an embodiment, a user may be billed or charged a first amount for analyzing a video, a second amount for generating a score, a third amount for generating audio using high-quality musical instrument samples, and a fourth amount for downloading the audio. In other embodiments, one or more of these options may be free, such as allowing a user to analyze a video and listen to a low-quality generated audio score for free, prior to paying for a high quality download of the generated audio. Accordingly, security module 206, billing module 208, and portal 204 may be interconnected or interoperate to provide security and billing features throughout a user session or across multiple separate or resumed sessions. In some embodiments, a user or browser 200 may provide credentials to security module 206 and/or billing module 208 including a user name, login, email address, password, IP address, MAC address, billing address, credit card or other billing account information. In one embodiment, a user may provide preferences, including a preferred download file format. In another embodiment, a user may provide session-specific information, including a production name, company, director name, production title, client, or other information associated with a visual work. This information may be used with security module 206 to retrieve data specific to the session, or may be used with billing module 208 to provide detailed invoicing and billing receipts.
In some embodiments, billing module 208 may provide functionality for storing and processing a stored balance for a customer, client or user. For example, in one such embodiment, a client, customer or user may pre-pay an amount to be stored as a balance on their account. Subsequent uses of the analysis and composition system may then be charged against this stored balance. In a similar embodiment, the client, customer or user may purchase credits, which may be thought of as an abstracted form of currency or uses of the analysis and composition services. For example, in one such embodiment, a user may purchase 5 credits, and each use of the composition service may deplete their credit balance. Embodiments incorporating the use of credits may be simpler for international users, avoiding the requirement of either the user or the billing module translating between various foreign currencies. In a further embodiment, the billing module 208 may store a predetermined threshold for each account balance. If the account balance in any account is depleted below the predetermined threshold, in one embodiment, the billing module 208 may charge the customer or client a predetermined amount to restore the account balance to a level above the predetermined threshold. For example, a user may purchase n credits or pre-pay x dollars to their account. As the user uses various services, their account balance may be depleted. In these embodiments, when the account balance reaches m credits or y dollars, with m or y less than n or x respectively, the user's credit card or bank account may be automatically charged to restore the account balance to n credits or x dollars, or another predetermined amount.
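The threshold-based top-up described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and field names (`CreditAccount`, `refill_level`, `threshold`, `total_charged`) are hypothetical, balances are whole credits, and the top-up fires when the balance reaches the threshold m (reading "reaches m credits" as "at or below m").

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical prepaid account with automatic top-up (illustrative names)."""
    balance: int            # current balance in credits
    refill_level: int       # n: level the balance is restored to
    threshold: int          # m: top-up fires when the balance reaches this level
    total_charged: int = 0  # credits bought automatically so far

    def spend(self, credits: int) -> None:
        """Deduct one service charge, then top up if the threshold is reached."""
        if credits > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= credits
        if self.balance <= self.threshold:
            # Stand-in for charging the stored credit card or bank account.
            top_up = self.refill_level - self.balance
            self.total_charged += top_up
            self.balance = self.refill_level


acct = CreditAccount(balance=5, refill_level=5, threshold=1)
acct.spend(2)  # balance 3, no top-up yet
acct.spend(2)  # balance 1 reaches m = 1, so 4 credits restore it to n = 5
```

Charging only the difference back to n keeps the stored balance predictable for the user; a flat top-up amount would be an equally valid reading of "another predetermined amount".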
In many embodiments, server 106 may include or connect to audio/video storage 210. Audio/video storage 210 may comprise any type and form of file storage, including hard drives, flash memory, network storage, or any other storage medium for storing and retrieving audio and video files. In many embodiments, a user may utilize browser 200 and portal 204 to upload a video file from storage 202 of client 102 to storage 210 of server 106 for processing, analysis, and music composition. In many embodiments, portal 204 may allow a user to upload a predetermined number of video files, or video files of a predetermined length or size, to storage 210. In some embodiments, portal 204 may include functionality for deleting files from audio/video storage 210 responsive to requests from a user, or, in one embodiment, responsive to a file being stored beyond a predetermined expiration time. For example, portal 204 may be configured to allow a user to upload and store a video file on server 106 for a period of one week, and upon expiration of the week, the file may be automatically deleted or overwritten. In some embodiments, users may be charged by billing module 208 responsive to the amount of server storage utilized by the user. For example, billing module 208 may be configured to bill a predetermined amount per megabyte of stored data per day or week.
Server 106 may include a video analyzer 212. Video analyzer 212 may comprise any combination of hardware and software elements, including an application, service, process, daemon, utility, functional logic, or other executable code for analyzing a video to determine color and position of colors over time. In one embodiment, video analyzer 212 may analyze every frame of a video, while in other embodiments, video analyzer 212 may analyze every other frame, every third frame, or every frame at a determined interval. In a further embodiment, the determined interval may be dynamically adjusted by video analyzer 212.
For example, in one such embodiment, responsive to detecting a large number of pixels in a video frame changing color from a preceding video frame, video analyzer 212 may reduce the determined interval to provide more frequent analysis. Accordingly, quick camera cuts in a video may cause video analyzer 212 to analyze frames with greater frequency than long, steady shots, providing for analysis with detail responsive to video content. In some embodiments, audio/video storage 210, portal 204, browser 200, and/or video analyzer 212 may include functionality for compressing or scaling a video before or after uploading the video to server 106. This may be done to reduce storage requirements, or reduce complexity of video analysis. In one embodiment, videos may be scaled to a predetermined size prior to analysis, such as 640 pixels in width.
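A minimal sketch of the dynamically adjusted analysis interval, assuming frames are flattened lists of RGB tuples of equal size. The function names, the halving rule, the "grow by one frame" relaxation on steady shots, and the 50% change threshold are all assumptions for illustration; the text only states that the interval is reduced after a large color change.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]  # a frame flattened to one RGB tuple per pixel


def changed_fraction(a: Frame, b: Frame) -> float:
    """Fraction of pixel positions whose color differs between two frames."""
    diffs = sum(1 for p, q in zip(a, b) if p != q)
    return diffs / len(a)


def frames_to_analyze(frames: Sequence[Frame], start_interval: int = 4,
                      min_interval: int = 1, max_interval: int = 8,
                      change_threshold: float = 0.5) -> List[int]:
    """Pick frame indices, sampling more densely after large color changes."""
    selected = [0]
    interval = start_interval
    i = 0
    while i + interval < len(frames):
        nxt = i + interval
        if changed_fraction(frames[i], frames[nxt]) > change_threshold:
            # Large change (e.g. a camera cut): analyze more frequently.
            interval = max(min_interval, interval // 2)
        else:
            # Steady shot: relax the interval again (an assumed policy).
            interval = min(max_interval, interval + 1)
        selected.append(nxt)
        i = nxt
    return selected


black = [(0, 0, 0)] * 4
white = [(255, 255, 255)] * 4
print(frames_to_analyze([black] * 20))                 # [0, 4, 9, 15]
print(frames_to_analyze([black] * 10 + [white] * 10))  # [0, 4, 9, 15, 18]
```

In the second call the cut between frames 9 and 10 halves the interval, so the frames after the cut are sampled more densely than the steady black opening.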
In some embodiments, video analyzer 212 may create a first set of one or more data arrays for each frame corresponding to each of one or more predetermined colors. For example, video analyzer 212 may create a data array for "blue" and include in the array an index or identifier for one or more pixels in a frame that match the color "blue". Although described as a single color, in many embodiments, video analyzer 212 may map a range of pixel colors to a predetermined color. For example, in one embodiment, video analyzer 212 may map any pixel with an RGB value of {0, 0, a} with a > 0 to "blue". In another embodiment, video analyzer 212 may map any pixel with an RGB value of {a, b, c} with c > 1.5*(a+b) to "blue". In other embodiments, video analyzer 212 may use any other formula or formulas to map a pixel to a predetermined color. In one embodiment, the mapped color array may include an indication of a presence of a mapped pixel, while in another embodiment, the array may include an indication of the brightness of the pixel, and/or the position of the pixel. In many embodiments, video analyzer 212 may determine the number of pixels mapped to a specific color in a specific frame. This number may be used in displaying a color value of a frame over time, discussed in more detail below in reference to FIG. 3. In one embodiment, the one or more predetermined colors may include pink, white, red, blue, lime green, green, yellow, and orange, or a subset of those colors. In another embodiment, the one or more predetermined colors may include other colors or color ranges. In still another embodiment, one or more colors or color ranges of the predetermined colors may be adjustable, either by a user or client or by an administrator. In one embodiment, the first data array for a particular color may contain an average of the x,y,Y values, based on the CIE 1931 color space, in each frame for that particular color.
In another embodiment, the first set of data arrays may include other color spaces or formats, including RGB, CMY, XYZ, or other color space formats. Because the predetermined colors may include white, the analysis and composition methods discussed herein may be applied to black and white videos or visual works without alteration.
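The color mapping and the CIE 1931 averaging above can be sketched for a single predetermined color. This assumes the "blue" rule reads c > 1.5*(a+b) for a pixel {a, b, c} (which also covers the pure {0, 0, a} case) and that pixels are 8-bit sRGB, converted with the standard sRGB (D65) transfer function and matrix. The function names are hypothetical; other predetermined colors would need analogous rules.

```python
from typing import Iterable, Optional, Tuple

RGB = Tuple[int, int, int]


def is_blue(rgb: RGB) -> bool:
    """Map a pixel to "blue" when b > 1.5 * (r + g); pure {0, 0, a>0} also passes."""
    r, g, b = rgb
    return b > 1.5 * (r + g)


def srgb_to_xyY(rgb: RGB) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to CIE 1931 xyY (D65 white point)."""
    def linear(c: int) -> float:
        v = c / 255.0
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    X = 0.4124 * r + 0.3576 * g + 0.1805 * b
    Y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    Z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    total = X + Y + Z
    if total == 0:  # pure black has no defined chromaticity
        return 0.0, 0.0, 0.0
    return X / total, Y / total, Y


def blue_entry(frame: Iterable[RGB]) -> Tuple[int, Optional[Tuple[float, ...]]]:
    """First-set entry for "blue": pixel count and average x, y, Y of matches."""
    matches = [srgb_to_xyY(p) for p in frame if is_blue(p)]
    if not matches:
        return 0, None
    n = len(matches)
    avg = tuple(sum(m[k] for m in matches) / n for k in range(3))
    return n, avg
```

For a frame such as `[(0, 0, 255), (10, 10, 200), (200, 0, 0), (100, 100, 100)]`, only the first two pixels map to "blue", so the pixel count is 2. The count feeds the per-frame color value, while the averaged x, y, Y triple is the array entry described above.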
In many embodiments, video analyzer 212 may create a second set of one or more data arrays for each frame corresponding to each of the one or more predetermined colors, and including position information of each pixel mapped to the predetermined color. For example, the data array for a particular frame and particular color may indicate that a pixel mapped as "blue" is located at coordinates {120, 80} in the frame. In some embodiments, the first and second set of arrays may be combined, such that the first set of arrays also includes positional information. In some embodiments, video analyzer 212 may use the second set of one or more data arrays for each frame corresponding to each of the one or more predetermined colors to create a third set of one or more data arrays with data values describing the probability that a pixel of a certain mapped color will be next to a second pixel of the same color. In one embodiment, a pixel is considered "next to" a second pixel of the same color if it is horizontally adjacent to the second pixel in a single frame. In other embodiments, a pixel is considered "next to" a second pixel if it is vertically adjacent or diagonally adjacent to the pixel in a single frame. In one embodiment, the probability that a pixel of a certain mapped color will be next to a second pixel of the same color may be averaged over the entire frame. This average probability value may be stored, on a per frame basis, in the third set of one or more data arrays. In some embodiments, multiple probability values may be stored on a per frame basis. For example, in one such embodiment in which a pixel is considered "next to" a second pixel of the same color if it is horizontally adjacent to the pixel, multiple probability values may be stored for the frame representing the average probability value for each row of the frame. Similarly, in other embodiments based on vertically or diagonally adjacent probabilities, multiple probability values may be stored representing the average probability value for each column or diagonal line within the frame.
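The horizontal-adjacency variant of the third set of arrays can be sketched as below, assuming each frame has already been reduced to a grid of mapped color labels. Here the probability is read as "given a pixel of this color that has a right-hand neighbor, how often is that neighbor the same color", pooled over the frame and also reported per row. Function names are hypothetical, and `None` marks rows or frames where the color never appears with a right neighbor.

```python
from typing import List, Optional, Sequence, Tuple

Grid = Sequence[Sequence[str]]  # frame of mapped color labels, row by row


def _row_counts(row: Sequence[str], color: str) -> Tuple[int, int]:
    """Return (same-color right neighbours, pixels of `color` with a right neighbour)."""
    same = pairs = 0
    for left, right in zip(row, row[1:]):
        if left == color:
            pairs += 1
            if right == color:
                same += 1
    return same, pairs


def adjacency_probabilities(grid: Grid, color: str
                            ) -> Tuple[Optional[float], List[Optional[float]]]:
    """Frame-wide and per-row probability that a `color` pixel's right
    neighbour is also `color`."""
    per_row: List[Optional[float]] = []
    total_same = total_pairs = 0
    for row in grid:
        same, pairs = _row_counts(row, color)
        per_row.append(same / pairs if pairs else None)
        total_same += same
        total_pairs += pairs
    frame_value = total_same / total_pairs if total_pairs else None
    return frame_value, per_row


grid = [["blue", "blue", "blue", "red"],
        ["blue", "red", "blue", "red"]]
print(adjacency_probabilities(grid, "blue"))  # (0.4, [0.6666666666666666, 0.0])
```

Vertical or diagonal variants would iterate over columns or diagonals instead of rows. Pooling the counts weights each row by how many pixels of the color it contains, rather than averaging the per-row values equally.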
In one embodiment, the third set of one or more data arrays may include probabilities mapped to a closest predetermined value. For example, in one embodiment, the third set of one or more data arrays may comprise a number of discrete values, and probabilities between those values may be mapped to the discrete values.
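Mapping a probability to the closest predetermined value is a nearest-level quantization. The sketch below assumes five evenly spaced levels (0, 0.25, 0.5, 0.75, 1); the level set and the tie-breaking rule (ties go to the lower level) are assumptions.

```python
from typing import Optional, Sequence

LEVELS = (0.0, 0.25, 0.5, 0.75, 1.0)  # assumed set of discrete values


def quantize(p: Optional[float], levels: Sequence[float] = LEVELS) -> Optional[float]:
    """Snap a probability to the closest discrete level (ties go to the lower level)."""
    if p is None:  # no pixels of this color in the frame
        return None
    # min() returns the first minimal element, so ascending levels favour the lower one.
    return min(levels, key=lambda v: abs(v - p))


print([quantize(p) for p in (0.3, 0.9, 0.62, None)])  # [0.25, 1.0, 0.5, None]
```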
In some embodiments, server 106 may include an audio analyzer 214. Audio analyzer 214 may comprise an application, service, logic, functionality, executable code, or any other type and form of instructions for analyzing an audio file. In some embodiments, audio analyzer 214 may comprise hardware, software, or any combination of hardware and software, and may include digital or analog signal processing capability. In many embodiments, a user may upload an audio file to server 106 via portal 204 for analysis. In one such embodiment, audio analyzer 214 may process the audio file to determine tempo and meter. For example, audio analyzer 214 may include a low pass filter and remove frequencies in the audio over a predetermined value, and then detect audio transients or quick changes in amplitude in the resulting file to locate drum hits, bass notes, or other indicators of downbeats. By detecting a first indicator and a second indicator, a tempo can be established based on the time between the detected beats. The tempo may be further verified by detecting third, fourth, and other indicators that fall on expected beats of a measure. In another such embodiment, audio analyzer 214 may process the audio file to determine a tonic key or tonal center and mode of the music. For example, in one such embodiment, audio analyzer 214 may detect repeated frequencies on downbeats within a bass line to locate a tonic, and then determine mode based on the presence or absence of notes at major or minor intervals from the tonic. In yet another embodiment, audio analyzer 214 may include functionality for detecting style of a musical work, for example whether it is a pop song, dominated by drums and guitars, or a classical orchestral work with strings and woodwinds, or any other type of work.
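The tempo and mode analysis can be sketched in plain Python. The following assumptions apply. A moving average stands in for the low pass filter. A transient is a rising crossing of half the peak envelope, with hysteresis so that one hit is not counted twice. Tempo comes from the median gap between transients. Pitches are given as MIDI note numbers, and the mode is chosen by comparing how often the major third (4 semitones) and minor third (3 semitones) above the tonic occur. All names and thresholds are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import List, Sequence, Tuple


def low_pass(samples: Sequence[float], window: int = 32) -> List[float]:
    """Moving-average low-pass filter (a simple stand-in for a real filter)."""
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= window:
            acc -= samples[i - window]
        out.append(acc / min(i + 1, window))
    return out


def onsets(samples: Sequence[float], rel_threshold: float = 0.5) -> List[int]:
    """Sample indices where the filtered envelope rises through the threshold."""
    env = [abs(v) for v in low_pass(samples)]
    thr = rel_threshold * max(env)
    found, above = [], False
    for i, v in enumerate(env):
        if not above and v > thr:
            found.append(i)
            above = True
        elif above and v < thr / 2:  # hysteresis so one hit is not counted twice
            above = False
    return found


def estimate_tempo(samples: Sequence[float], sample_rate: int) -> float:
    """Tempo in BPM from the median spacing between detected transients."""
    hits = onsets(samples)
    if len(hits) < 2:
        raise ValueError("need at least two transients")
    gap = median(b - a for a, b in zip(hits, hits[1:]))
    return 60.0 * sample_rate / gap


def tonic_and_mode(downbeat_bass: Sequence[int],
                   all_notes: Sequence[int]) -> Tuple[int, str]:
    """Tonic = most repeated bass pitch class on downbeats; mode from its third."""
    tonic = Counter(p % 12 for p in downbeat_bass).most_common(1)[0][0]
    counts = Counter(p % 12 for p in all_notes)
    major, minor = counts[(tonic + 4) % 12], counts[(tonic + 3) % 12]
    return tonic, "major" if major >= minor else "minor"
```

For instance, a 1 kHz test signal with a 50-sample pulse every 500 samples yields transients 500 samples apart, which gives 120 BPM. A bass line that keeps landing on C with plenty of E above it comes out as pitch class 0, "major".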
In other embodiments, a user may upload an audio file as a MIDI file, which may reduce the need for the above-discussed harmonic analysis by audio analyzer 214. In some embodiments, audio analyzer 214 may comprise functionality for converting an uploaded audio file into a MIDI file. In many embodiments, audio analyzer 214 may analyze the uploaded or converted MIDI file for style, instrumentality, tempo, meter, key, mode, phrasing, or other musical features. In one embodiment, audio analyzer 214 may analyze the file for the amount and frequency of repetition of a theme or jingle. In another embodiment, audio analyzer 214 may analyze the MIDI file to determine pattern or phrase length. In some embodiments, audio analyzer 214 may identify a time signature of the MIDI file. In other embodiments, audio analyzer 214 may analyze a drum track of the MIDI file to determine a style, such as a bossa nova, foxtrot, march, swing or other syncopated or non-syncopated style that may not be readily apparent from the time signature alone.
In one embodiment, having analyzed a song, audio analyzer 214 may store a data file in band storage 216 that includes parameters for recreating the style, tempo, key, or other features of a work. Band storage 216 may comprise a library, dynamic link library (dll) file, database, flat file, index, registry, or other type or form of accumulation of data for storing collections of parameters, referred to variously as "bands" or "ensembles", for input to music composer 218 for dynamic creation of musical compositions in different styles. In many embodiments, a band within band storage 216 may include an index number or identifier, such that music composer 218 may retrieve band parameters from storage via the index number or identifier. Parameters associated with a band for use in composition may include instrumentation, rhythm, meter, phrase length, and average tempo. In some embodiments, parameters associated with a band may include one or more frequencies of key or chord changes and/or patterns of chord changes. For example, a band may include a parameter indicating that chord changes occur no more often than once per two bars, or may indicate that chord changes occur no less often than once every four bars. In another embodiment, the band may include a parameter indicating that on a chord change, the chord will change to (or stay on, depending on the prior chord) the tonic a first predetermined percentage of the time, the dominant a second predetermined percentage of the time, the subdominant a third predetermined percentage of the time, etc. For example, one band may include parameters indicating that on a chord change, the next chord will be the tonic 50% of the time, the dominant 20% of the time, the subdominant 15% of the time, the submediant 10% of the time, and the supertonic 5% of the time. Thus, in many embodiments, the band may include parameters of percentages for chord changes associated with one or more scale degrees. 
In a further embodiment, the band may also include parameters of percentages for chord changes associated with augmented or diminished chords. In a still further embodiment, a band may include chord progression parameters that are dependent on the current chord. For example, the predetermined percentage that the next chord may be the tonic may be 25% if the current chord is the tonic, but may be 50% if the current chord is the dominant, or 80% if the current chord includes a subtonic or leading tone.
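A band record with chord-change percentages, including rows that depend on the current chord, might look like the sketch below. The `Band` class and its field names are hypothetical. The default row uses the 50/20/15/10/5 split quoted above, and the current-chord rows only fix the quoted tonic shares (25% after I, 50% after V, 80% after the leading-tone chord "vii"); the rest of each row is an assumption. Chord durations honor the "no more often than once per two bars, no less often than once every four bars" example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Table = Dict[str, int]  # next chord -> percentage


@dataclass
class Band:
    """Hypothetical band record; field names are illustrative, not from the source."""
    name: str
    min_bars_per_chord: int
    max_bars_per_chord: int
    default_next: Table
    next_given_current: Dict[str, Table] = field(default_factory=dict)

    def next_chord(self, current: str, rng: random.Random) -> str:
        """Draw the next chord, using a current-chord row when the band defines one."""
        table = self.next_given_current.get(current, self.default_next)
        return rng.choices(list(table), weights=list(table.values()), k=1)[0]

    def progression(self, start: str, changes: int,
                    seed: int = 0) -> List[Tuple[str, int]]:
        """Return (chord, bars) pairs; each chord lasts between the min and max bars."""
        rng = random.Random(seed)
        chord, out = start, []
        for _ in range(changes):
            bars = rng.randint(self.min_bars_per_chord, self.max_bars_per_chord)
            out.append((chord, bars))
            chord = self.next_chord(chord, rng)
        return out


pop_band = Band(
    name="pop",
    min_bars_per_chord=2,
    max_bars_per_chord=4,
    default_next={"I": 50, "V": 20, "IV": 15, "vi": 10, "ii": 5},
    next_given_current={
        "I": {"I": 25, "V": 30, "IV": 25, "vi": 15, "ii": 5},
        "V": {"I": 50, "V": 15, "IV": 15, "vi": 15, "ii": 5},
        "vii": {"I": 80, "V": 10, "vi": 10},  # leading-tone chord
    },
)
print(pop_band.progression("I", 8, seed=1))
```

Chords without their own row (here IV, vi and ii) fall back to the band's default percentages, which keeps the table small while still allowing the current-chord dependence described above.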
Music composer 218 may comprise one or more applications, logic, services, functions, routines, or executable instructions of any type or form for dynamically creating music. In many embodiments, music composer 218 may use a data array generated by video analyzer 212 and/or parameters of a band stored in band storage 216 to dynamically create a musical composition. In one embodiment, a user may specify one or more colors in the video to "track", such that music composer 218 uses the probability data arrays corresponding to those colors for inputs, and does not use the arrays for colors that are not tracked. In some embodiments, music composer 218 may comprise functionality for calculating a Markov chain with the third set of data arrays of probabilities discussed above as inputs to the conditional probability distribution. In one embodiment, each next state of the Markov chain may represent a next note for a melody or phrase, or bass or harmony line. In a further embodiment, probabilities in the data array may be distributed to avoid harmonic dissonances, such as tritones or other aesthetically unpleasing intervals. In many embodiments, music composer 218 may create a Musical Instrument Digital Interface (MIDI) file based on the input data array and band parameters. In one embodiment, a user may be able to download or retrieve this MIDI file via portal 204 for use in a hardware or software sequencer or other MIDI-compatible hardware or software.
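The text does not specify how the adjacency probabilities parameterize the chain, so the sketch below makes one assumption explicit. Each frame's probability p weights stepwise motion (intervals of up to 2 semitones), and 1 - p weights leaps. The tritone (6 semitones) is removed from the candidate intervals, so its probability mass is redistributed over the remaining moves. The next note depends only on the current note and the current frame's p, which makes this a first-order Markov chain over MIDI note numbers. All names are hypothetical.

```python
import random
from typing import List, Sequence

# Candidate intervals in semitones; +/-6 (the tritone) is left out so that its
# probability mass is redistributed over consonant moves.
INTERVALS = [i for i in range(-7, 8) if abs(i) != 6]


def interval_weights(p: float) -> List[float]:
    """Transition weights for one frame: p favours steps, 1 - p favours leaps."""
    return [p if abs(i) <= 2 else 1.0 - p for i in INTERVALS]


def markov_melody(probs: Sequence[float], start: int = 60, low: int = 48,
                  high: int = 72, seed: int = 0) -> List[int]:
    """Generate one MIDI note per frame probability (p in [0, 1]).

    The range high - low must be at least 14 semitones so a reflected move
    always stays inside it.
    """
    rng = random.Random(seed)
    note, melody = start, [start]
    for p in probs:
        step = rng.choices(INTERVALS, weights=interval_weights(p), k=1)[0]
        # Reflect the move instead of clamping, so no tritone can sneak in.
        note = note + step if low <= note + step <= high else note - step
        melody.append(note)
    return melody


print(markov_melody([0.9, 0.8, 0.2, 0.95], seed=1))
```

Reflecting an out-of-range move keeps its size, whereas clamping could shorten a fifth into a tritone. This is why the range check flips the direction instead of truncating the move.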
In one embodiment, music composer 218 may create a rhythm for a melodic line based on a probability table in the selected band file. In such an embodiment, the probability table may comprise a Markov table that indicates the probability that a note of a first duration will be followed by a note of a second duration. For example, a band associated with a blues or swing style may include a probability table that indicates that: (i) a long note, such as a quarter note tied to an eighth note, has a high probability of being followed by a short note, such as an eighth note; and (ii) that a short note, such as the eighth note, has a high probability of being followed by a long note, such as another quarter note tied to an eighth note. In other bands, other rhythmic progressions associated with various styles can be created probabilistically in the same manner. In other embodiments, durations of notes may be determined based on the average brightness or intensity of colors within a frame. In many embodiments, durations and start times of notes may be shifted by small increments, such as 1/10th of a second, 1/100th of a second, or one or more beats of a MIDI beat clock. This may be done to give the resulting composed music tracks a less mechanical, more human feel. In some embodiments, volumes for notes may be determined based on the average brightness or intensity of colors within the frame.
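The swing-style duration table and the small timing shifts can be sketched as a Markov chain over note durations. Durations are in beats: 1.5 stands for a quarter tied to an eighth (long) and 0.5 for an eighth (short). The 90/10 split, the 120 BPM tempo, and the ±0.01 s start-time jitter are assumptions. The sketch humanizes start times only.

```python
import random
from typing import Dict, List, Tuple

# Durations in beats: 1.5 = quarter tied to eighth (long), 0.5 = eighth (short).
SWING_TABLE: Dict[float, Dict[float, float]] = {
    1.5: {0.5: 0.9, 1.5: 0.1},  # long note -> usually followed by a short one
    0.5: {1.5: 0.9, 0.5: 0.1},  # short note -> usually followed by a long one
}


def swing_rhythm(n_notes: int, table: Dict[float, Dict[float, float]] = SWING_TABLE,
                 first: float = 1.5, tempo_bpm: float = 120.0,
                 jitter_s: float = 0.01, seed: int = 0) -> List[Tuple[float, float]]:
    """Return (start_seconds, duration_seconds) pairs from a duration Markov table."""
    rng = random.Random(seed)
    sec_per_beat = 60.0 / tempo_bpm
    dur, t, notes = first, 0.0, []
    for _ in range(n_notes):
        # Humanize: nudge the start by up to +/- jitter_s seconds (never below 0).
        start = max(0.0, t + rng.uniform(-jitter_s, jitter_s))
        notes.append((start, dur * sec_per_beat))
        t += dur * sec_per_beat
        nxt = table[dur]
        dur = rng.choices(list(nxt), weights=list(nxt.values()), k=1)[0]
    return notes


for start, length in swing_rhythm(6, seed=2):
    print(f"start {start:.3f}s  length {length:.2f}s")
```

Only the nominal grid (`t`) advances by exact durations, so the jitter never accumulates into drift. Each note stays within `jitter_s` of where the band's rhythm table places it.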
In many embodiments, music composer 218 may create pitches for notes of the melodic line by mapping the intensity value of a color in a frame to a chromatic scale of predetermined length, such as 18 notes (an octave and a half of range), or longer or shorter scales. The color used for determining pitch may be one of the colors selected for tracking by the user. In some embodiments in which the user selects a plurality of colors to be tracked, the color used for determining pitch may be a color of the tracked plurality of colors in the frame with the highest intensity. For example, if a user has selected to track the colors blue, green and yellow, and a particular frame includes a first number of pixels mapped to blue with an average intensity of 50%, and a second number of pixels mapped to green with an average intensity of 20%, and a third number of pixels mapped to yellow with an average intensity of 10%, the 50% value associated with the average intensity of the blue pixels may be selected for determining pitch. In this manner, the pitch of a melodic note associated with a frame is dependent on the most intense color in the frame. As discussed above, the intensity of the color is mapped to a chromatic scale. For example, in an embodiment in which the scale is 18 semitones, an intensity of 52% may be mapped to the 10th semitone. In many embodiments, an intensity of 0% may be mapped to no pitch, or silence. Thus, for a black frame (in which no tracked color has an intensity higher than 0%) or a frame that consists only of untracked colors, music composer 218 would not generate any notes in the melodic line. In some embodiments, the mappings of intensities to semitones may be equally spaced, while in other embodiments, the mappings may be varied to reduce or eliminate the likelihood of one or more accidental notes or non-harmonic notes. 
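The pitch rule above (choose the most intense tracked color, then map its intensity onto an 18-semitone chromatic range, with 0% meaning silence) can be sketched as follows. Intensities are fractions from 0.0 to 1.0. The mapping `ceil(intensity * 18)` is an assumption chosen because it reproduces the worked example (52% maps to the 10th semitone). Placing semitone 1 on middle C (MIDI 60) is also an assumption.

```python
import math
from typing import Dict, Iterable, Optional


def pitch_intensity(avg_intensity: Dict[str, float], tracked: Iterable[str]) -> float:
    """Highest average intensity (0.0-1.0) among the user's tracked colors."""
    return max((avg_intensity.get(c, 0.0) for c in tracked), default=0.0)


def intensity_to_semitone(intensity: float, scale_len: int = 18) -> Optional[int]:
    """Map intensity to semitone 1..scale_len; 0% maps to silence (None)."""
    if intensity <= 0.0:
        return None
    return min(scale_len, math.ceil(intensity * scale_len))


def to_midi(semitone: Optional[int], lowest: int = 60) -> Optional[int]:
    """Place semitone 1 on MIDI note `lowest` (middle C by assumption)."""
    return None if semitone is None else lowest + semitone - 1


frame = {"blue": 0.5, "green": 0.2, "yellow": 0.1, "red": 0.9}
level = pitch_intensity(frame, ["blue", "green", "yellow"])  # 0.5 (red is untracked)
print(intensity_to_semitone(0.52))  # 10
```

A frame with only untracked colors, or a black frame, yields an intensity of 0.0 and therefore no note, matching the silence case described above.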
In many embodiments, the note generated by the mapping may be shaped, transposed, shifted or otherwise altered so that it fits into a predetermined scale or key. For example, in one such embodiment, a melody may be in the key of C major or based off a tonic chord in C major. The intensity of a color for a particular frame may be such that the mapping generates an F-sharp. The F-sharp may then be shifted to either F or G to fit the key or chord. In some embodiments, the shaping may be performed responsive to the previous note. For example, if a previous note was an F, the above-mentioned F# may be shifted to a G to prevent repeating the F. In a further embodiment, these accidental or non-harmonic notes may be allowed if they are passing notes. For example, in one such embodiment, music composer 218 may allow the F# to remain in the melody line, provided the following note is a G.
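The key-fitting rule with the repeat-avoidance and passing-note exceptions can be sketched for C major using MIDI note numbers. Assumptions: an out-of-key note moves down a half step by default, but moves up instead when moving down would repeat the previous note. A chromatic note is kept as a passing note only when it is approached from one side and left by a half step to an in-key note on the other side (as in F, F#, G). The sketch also assumes every out-of-key note has an in-key neighbor a half step away, which holds for major keys. Names are hypothetical.

```python
from typing import List, Sequence, Set

C_MAJOR: Set[int] = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale


def in_key(note: int, scale: Set[int] = C_MAJOR) -> bool:
    return note % 12 in scale


def fit_to_key(notes: Sequence[int], scale: Set[int] = C_MAJOR) -> List[int]:
    """Shift out-of-key MIDI notes by a half step, keeping chromatic passing notes."""
    out: List[int] = []
    for i, n in enumerate(notes):
        if in_key(n, scale):
            out.append(n)
            continue
        prev = out[-1] if out else None
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        # Passing note: approached from one side, left by a half step to the other.
        if (prev is not None and nxt is not None and in_key(nxt, scale)
                and abs(nxt - n) == 1 and (prev - n) * (nxt - n) < 0):
            out.append(n)
            continue
        down, up = n - 1, n + 1
        choice = down if in_key(down, scale) else up
        other = up if choice == down else down
        # Avoid repeating the previous note when the other neighbour is in key.
        if choice == prev and in_key(other, scale):
            choice = other
        out.append(choice)
    return out


print(fit_to_key([65, 66, 67]))  # F, F#, G: F# kept as a passing note
print(fit_to_key([65, 66, 60]))  # F# becomes G to avoid repeating F
print(fit_to_key([60, 66, 60]))  # F# becomes F
```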
Further shaping may be performed on the composed melody line. In one embodiment, music composer 218 may transpose or shift notes to avoid large intervals, or may add passing notes between the intervals. For example, if a generated line includes an interval from a C to the A above the C (an interval of a major sixth), music composer 218 may either shift the A to a G to reduce the interval to a perfect fifth, or may add passing notes of the major third, perfect fourth, or perfect fifth to split the interval into two or more smaller intervals. In another embodiment, music composer 218 may detect and prevent pitches from being repeated more than a predetermined number of times.
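The shifting option (not the passing-note option) and the repeat limit can be sketched as below. Each note is pulled toward its predecessor one scale step at a time until the leap is at most `max_leap` semitones, and a run of identical pitches longer than `max_repeats` is broken by stepping up one scale degree. The C major scale, the perfect-fifth limit (7 semitones) and the limit of two repeats are assumptions; `max_leap` must be at least 2 so the pull never skips past the previous note.

```python
from typing import List, Sequence, Set

C_MAJOR: Set[int] = {0, 2, 4, 5, 7, 9, 11}


def scale_step(note: int, direction: int, scale: Set[int] = C_MAJOR) -> int:
    """Move to the next note of `scale` in `direction` (+1 up, -1 down)."""
    n = note + direction
    while n % 12 not in scale:
        n += direction
    return n


def shape_line(notes: Sequence[int], max_leap: int = 7, max_repeats: int = 2,
               scale: Set[int] = C_MAJOR) -> List[int]:
    """Limit leaps to max_leap semitones and runs of one pitch to max_repeats."""
    out: List[int] = []
    run = 0  # length of the current run of identical pitches
    for n in notes:
        if out:
            prev = out[-1]
            while abs(n - prev) > max_leap:
                n = scale_step(n, -1 if n > prev else 1, scale)
            if n == prev:
                run += 1
                if run > max_repeats:
                    n = scale_step(n, 1, scale)
                    run = 1
            else:
                run = 1
        else:
            run = 1
        out.append(n)
    return out


print(shape_line([60, 69]))          # [60, 67]: C to A becomes C to G
print(shape_line([60, 60, 60, 60]))  # [60, 60, 62, 60]
```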
In some embodiments, a second melody line may be similarly composed, using the same or different parameters as defined by the band or ensemble. For example, in one embodiment, the second melody line may use a similar or different rhythm probability table. In another embodiment, the second melody line may use a probability table or intensity mapping that restricts the resulting composed pitches to the root, third, fifth or octave of the chord. In some embodiments, shaping may be performed on the second melody line to add passing notes (such as between the third and fifth of the chord); eliminate parallel motion with the first melody line; eliminate intervals with the first melody line greater than a predetermined size, such as an octave or a tenth; or perform any other shaping rules.
In some embodiments, music composer 218 may create a bass line based on the color appearing in a frame. In one embodiment, the video analyzer may identify the total number of frames in the video or visual work in which a predetermined color is the dominant color in the frame, and may order the predetermined colors responsive to the identified number. The predetermined color that dominates the highest number of frames may be mapped to a chord that should occur most frequently according to the band or ensemble style. The predetermined color that dominates the next highest number of frames may be mapped to the chord that should occur second most frequently according to the band or ensemble style.
Each predetermined color may thus be mapped to a corresponding chord in order of frequency of the color in the video or visual work and order of frequency of the corresponding chord. For example, in many styles, the tonic chord will occur most often, followed by the dominant chord, then the subdominant chord, then other chords. If the video analyzer determines that blue is most often the dominant color in a frame of the work, followed by green, then red, these colors may be mapped respectively to the tonic, dominant and subdominant chords.
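The ranking of colors against chords can be sketched as below. The sketch assumes the video analyzer has already reduced each frame to a single dominant color label and that the band supplies its chords in order of expected frequency. `map_colors_to_chords` is a hypothetical name; ties in frame count are broken by which color appears first.

```python
from collections import Counter


def map_colors_to_chords(dominant_color_per_frame, chords_by_frequency):
    """Pair the most frequently dominant colors with the most frequent chords.

    dominant_color_per_frame: one color label per frame.
    chords_by_frequency: chords ordered from most to least common in the style.
    Colors beyond the number of available chords are left unmapped.
    """
    counts = Counter(dominant_color_per_frame)
    # most_common() orders by count; equal counts keep first-seen order.
    ranked_colors = [color for color, _ in counts.most_common()]
    return dict(zip(ranked_colors, chords_by_frequency))
```

With blue dominating five frames, green three and red two, and chords ordered as tonic, dominant and subdominant, the result maps blue to I, green to V and red to IV.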
To generate the bass line, in one embodiment, the music composer 218 may determine when a chord change should occur, based on the band or ensemble parameters as discussed above. The music composer may identify a dominant color in a frame of the video or visual work corresponding to the determined time. The music composer 218 may determine the resulting next chord of the chord change responsive to the chord mapped to the identified dominant color, as discussed above. Music composer 218 may then compose a bass line corresponding to the chord, based on parameters of the band or ensemble. For example, in many embodiments, the composed bass line may include quarter notes of the tonic note followed by the dominant, in a I-V-I-V pattern. In other embodiments, the composed bass line may include the mediant (III) or other passing notes, or may consist of just the tonic of the chord. In some embodiments, if a single color dominates the visual work for an extended period of time, such that music composer 218 would normally stay on one chord for the period of time, music composer 218 may execute one or more rules to select a different chord during a chord change. For example, in a video dominated by one color, music composer 218 may determine multiple chord change occurrences, but select the same chord each time due to the video being dominated by that color. To prevent this, in some embodiments, music composer 218 may switch to a different chord, such as the tonic or the dominant for at least one of the chord changes.
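The following sketch combines the chord-change rule and a simple bass pattern. It assumes the chord-change times have already been determined and that the dominant color at each change is known. It reads the "I-V-I-V pattern" as alternating quarter notes on the root and fifth of the current chord, and it uses an assumed rule that switches to the tonic or dominant once the same chord has been chosen `max_same` times in a row. The names `choose_chords` and `bass_line`, the `CHORD_ROOTS` table and the default bass register (MIDI 36, the C two octaves below middle C) are hypothetical.

```python
# Hypothetical table: chord root in semitones above the tonic of the key.
CHORD_ROOTS = {"I": 0, "ii": 2, "iii": 4, "IV": 5, "V": 7, "vi": 9}


def choose_chords(dominant_colors, color_to_chord, max_same=2, fallback=("I", "V")):
    """Select one chord per chord change, breaking up long runs of one chord."""
    chords = []
    for color in dominant_colors:
        chord = color_to_chord.get(color, "I")  # assumption: unmapped colors use the tonic
        if len(chords) >= max_same and all(c == chord for c in chords[-max_same:]):
            # Assumed rule: substitute the tonic or dominant, whichever differs.
            chord = next(c for c in fallback if c != chord)
        chords.append(chord)
    return chords


def bass_line(chords, tonic_midi=36):
    """One bar of root-fifth-root-fifth quarter notes per chord (MIDI numbers)."""
    notes = []
    for chord in chords:
        root = tonic_midi + CHORD_ROOTS[chord]
        notes.extend([root, root + 7, root, root + 7])
    return notes
```

For a video dominated throughout by a single color mapped to the tonic, `choose_chords(["blue"] * 4, {"blue": "I"})` returns `["I", "I", "V", "I"]`.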
In many embodiments, music composer 218 may compose a drum track and/or an FX track. In one embodiment, the FX track may be a drone on the tonic of the determined chord at that time, which may aid in establishing a tonal center and allow for more variation in the bass line. In some embodiments, drum tracks may be based on rhythmic probability patterns, similar to those discussed above in reference to the duration of notes in the melodic lines. In other embodiments, a band or ensemble may include multiple pre-composed bars of drum patterns. Music composer 218 may compose the drum track by selecting a first pre-composed pattern and appending one or more other pre-composed patterns. Each successive pattern may be selected based on a probability table. For example, a rock band may include a first drum pattern of a basic kick-snare-hat pattern, a second drum pattern that incorporates a cymbal crash, a third drum pattern that includes a double-kick or snare fill, and a fourth drum pattern that includes a tom roll. The band may also include a probability table that indicates that music composer 218 should select the basic pattern 75% of the time, the double-kick or snare fill pattern 15% of the time, and each of the crash and tom roll patterns 5% of the time. In some embodiments, the probability table may contain static probabilities, while in other embodiments, the probability table may comprise a Markov table in which probabilities for the next pattern are dependent on the current pattern. This may be done, for example, to avoid repeatedly selecting a fill pattern. In other embodiments, a static table may be used, but the resulting drum track may be shaped or processed by the music composer to eliminate repeated selections of fill patterns.
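A sketch of drum-pattern selection using the rock-band example follows. It assumes each table row gives relative weights for the next bar. The static table reproduces the 75/15/5/5 split described above; the Markov table values are invented for illustration and only show how zero self-transition weights prevent consecutive fills. All names are hypothetical.

```python
import random

# Static table from the rock-band example: the same weights for every bar.
STATIC_WEIGHTS = {"basic": 0.75, "fill": 0.15, "crash": 0.05, "tom_roll": 0.05}
STATIC_TABLE = {pattern: STATIC_WEIGHTS for pattern in STATIC_WEIGHTS}

# Hypothetical Markov table: row = current pattern, column = next pattern.
# Zero self-transition weights for the fill, crash and tom-roll patterns mean
# they are never chosen twice in a row.
MARKOV_TABLE = {
    "basic":    {"basic": 0.70, "fill": 0.20, "crash": 0.05, "tom_roll": 0.05},
    "fill":     {"basic": 0.90, "fill": 0.00, "crash": 0.10, "tom_roll": 0.00},
    "crash":    {"basic": 0.95, "fill": 0.05, "crash": 0.00, "tom_roll": 0.00},
    "tom_roll": {"basic": 0.90, "fill": 0.00, "crash": 0.10, "tom_roll": 0.00},
}


def compose_drum_track(bars, table=MARKOV_TABLE, start="basic", seed=None):
    """Return a list of pre-composed pattern names, one per bar."""
    if bars <= 0:
        return []
    rng = random.Random(seed)
    track = [start]
    while len(track) < bars:
        row = table[track[-1]]  # probabilities depend on the current pattern
        patterns, weights = zip(*row.items())
        track.append(rng.choices(patterns, weights=weights, k=1)[0])
    return track
```

Passing `STATIC_TABLE` instead of `MARKOV_TABLE` gives the static behavior, in which each bar is drawn independently; a fixed `seed` makes the result reproducible.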
In some embodiments, server 106 may further include an audio generator 220. In one embodiment, audio generator 220 may comprise a MIDI-compatible hardware or software sampler and playback engine for creating an audio output responsive to an input MIDI file. Samplers may include one or more high-quality recordings of one or more musical instruments. These recordings may include the instruments playing one or more notes, at one or more volumes for playback according to MIDI note and velocity instructions. By including recordings of the instrument at multiple volumes, different timbral details of the instrument may emerge, allowing for more realistic and emotional playback of musical scores. In some embodiments, audio generator 220 may utilize default instruments, dependent on the band or ensemble selected by the user during composition. In another embodiment, the user may vary or select a different one or more instruments, and audio generator 220 may use recordings of these different one or more instruments in generating an audio file. In some embodiments, the audio output of the sampler and/or playback engine may be captured as an audio file in one or more formats. For example, the output may be stored as a WAV file, an MPEG audio file, including MP3 or AAC, an AIFF file, or a losslessly compressed file, including an Apple Lossless Encoding (ALE) encoded file, or may be stored in any other type and form of audio file.
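The sample-selection step of such a sampler might be sketched as below. It assumes each recording is described by the MIDI note at which it was recorded and the highest velocity its layer covers. The `Sample` type and `pick_sample` function are hypothetical. The sketch picks the quietest layer that covers the requested velocity, then the recording nearest in pitch, and returns the playback-rate ratio 2^(semitones/12) needed to transpose it to the requested note.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    root_note: int      # MIDI note at which the instrument was recorded
    max_velocity: int   # highest MIDI velocity (1-127) this layer covers
    path: str           # hypothetical file name of the recording


def pick_sample(samples, note, velocity):
    """Return (sample, playback_rate) for a MIDI note-on event."""
    if not samples:
        raise ValueError("no samples loaded")
    if not 1 <= velocity <= 127:
        raise ValueError("MIDI velocity must be between 1 and 127")
    layers = sorted({s.max_velocity for s in samples})
    # Quietest layer whose range covers the velocity, else the loudest layer.
    layer = layers[min(bisect_left(layers, velocity), len(layers) - 1)]
    in_layer = [s for s in samples if s.max_velocity == layer]
    nearest = min(in_layer, key=lambda s: abs(s.root_note - note))
    # Resampling by 2^(n/12) transposes the recording by n semitones.
    return nearest, 2 ** ((note - nearest.root_note) / 12)
```

For instance, with soft and loud C4 recordings and a loud C5 recording, a request for D4 (62) at velocity 100 selects the loud C4 recording and a playback rate of about 1.122.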
In some embodiments, server 106 may further include an audio/video resynchronizer 222. Audio/video resynchronizer 222 may comprise one or more applications, logic, functions, services, routines, or executable logic for combining an uploaded video file with a generated audio file, and storing the result as a multimedia file with both video and audio.
The above-discussed server 106 can be easily scaled via a distributed server cloud. Shown in FIG. 2B is a block diagram of an embodiment of a system for using a distributed cloud service to dynamically create musical compositions. Multiple customers may simultaneously or concurrently use the analysis, composition, and storage features provided by a system or portion thereof executing via the cloud service. One or more portals 204 executing on one or more servers serving as gateways or intermediaries control communications redirection, routing and load balancing features for the cloud. In many embodiments, a server cloud may comprise one or more servers configured to perform these functions, and may be accessed by clients via a wide area network, as discussed above. In some embodiments, a portal 204 may receive audio or video files from one customer 102a and from another customer 102b simultaneously. The portal 204 may then direct these files to one or more video analyzers 212, one or more music composers 218, one or more audio generators 220, or other tools or clients not illustrated, which may be provided by one or more servers in the cloud. In some embodiments, portal 204 may direct files for analysis or composition based on load balancing requirements, including server or process CPU utilization, idle times, or memory or other resource requirements. In other embodiments, portal 204 may direct files or client requests based on different functions performed by video analyzers or music composers. For example, analysis of high resolution videos may be sent to a first video analyzer with capability for downscaling or high-resolution processing, while analysis of low resolution videos may be sent to a second video analyzer without such capability. In some embodiments, a single video or audio file may be sent to multiple analyzers or tools for concurrent or parallel analysis or processing. In some embodiments, a video file may be returned to portal 204 from a first server and redirected to a second server for further analysis or processing. In many embodiments, video and audio files from various customers 102a and 102b may be tagged or identified with session identifiers or user identifiers such that upon receiving them from various servers in the cloud, portal 204 may properly direct them to the corresponding customers' systems.
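The routing behavior of portal 204 can be illustrated with the sketch below. It assumes each worker reports its CPU utilization and whether it can process high-resolution video. The `Worker`, `Job`, `route` and `Portal` names and the 1080-line threshold are hypothetical; a fuller implementation could also weigh idle time, memory and other resources as described above.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cpu_utilization: float   # 0.0 (idle) to 1.0 (saturated)
    handles_high_res: bool   # can downscale or process high-resolution video


@dataclass
class Job:
    session_id: str
    customer: str
    frame_height: int        # vertical resolution in pixels


def route(job, workers, high_res_threshold=1080):
    """Pick the least-loaded worker that is capable of handling the job."""
    needs_high_res = job.frame_height > high_res_threshold  # hypothetical cutoff
    eligible = [w for w in workers if w.handles_high_res or not needs_high_res]
    if not eligible:
        raise RuntimeError("no worker can handle this job")
    return min(eligible, key=lambda w: w.cpu_utilization)


class Portal:
    """Tracks session identifiers so results reach the right customer."""

    def __init__(self, workers):
        self.workers = workers
        self.sessions = {}

    def submit(self, job):
        worker = route(job, self.workers)
        self.sessions[job.session_id] = job.customer
        return worker.name

    def deliver(self, session_id):
        # Returns the customer that should receive the finished result.
        return self.sessions.pop(session_id)
```

Tagging each job with a session identifier is what lets the portal return a result to the correct customer regardless of which server in the cloud produced it.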
Although denoted in FIG. 2B by service type, as discussed above, video analyzers 212, music composers 218, audio generators 220, audio/video resynchronizer 222, and other clients and tools not illustrated may be provided by one or more servers in the cloud.
Accordingly, in some embodiments, one or more servers in the cloud may execute a tool, such as a video analyzer or music composer, responsive to a request by portal 204 or any other remote client. Thus, instances of these tools may be dynamically established as needed.
In many embodiments, portal 204 may comprise a Software as a Service (SaaS) deployment model. In such embodiments, execution of analysis, composition and generation tools may be transparent to customers and clients. Also, as shown, in some embodiments, portal 204 may further comprise functionality for security 206, billing 208, and storage 210 and 216.
Referring to FIG. 3, illustrated is a block diagram of a screenshot of an embodiment of a video analysis application 300. In brief overview, the video analysis application 300 may include one or more color timelines 302 associated with one or more colors 304. The video analysis application 300 may also include a playback window 306 with a video display 308. Although shown in shaded textures, it should be understood by one of skill in the art that a color video may include pixels of one or more colors. Furthermore, although objects in the block diagram of video display 308 are shown with homogeneous shading, it should be understood that color videos may show variation of colors across objects, such as gradients in color and brightness across a sky, or changes in color of a cloud depending on density.
Still referring to FIG. 3, in many embodiments, the first set of data arrays corresponding to pixels mapped to one or more colors created by a video analyzer 212 discussed above may be shown along one or more timelines 302. These timelines may show time in the video across a horizontal axis, and number of pixels mapped to a color in a frame at a particular time along a vertical axis. As shown in FIG. 3, in many embodiments, multiple timelines will show sudden increases or decreases in the number of pixels with a specific color at scene or camera changes. For example, if a first scene shows a blue sky, the last frame of the scene may include a large number of pixels mapped to blue. If the next scene shows a forest, the first frame of the new scene may include a large number of pixels mapped to green. By detecting these changes, as discussed above, music compositions may be dynamically created with phrasing that corresponds to changes in the associated video.
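Scene or camera changes of the kind visible in the color timelines could be detected with a heuristic such as the one sketched below, which is an assumption rather than the analyzer's specified method. It compares consecutive frames' pixel counts for each tracked color and flags frames where the estimated fraction of pixels that changed color meets a hypothetical threshold of 30%.

```python
def detect_scene_changes(timelines, pixels_per_frame, threshold=0.3):
    """Return frame indices where the color distribution jumps sharply.

    timelines: dict mapping a tracked color to its per-frame pixel counts.
    A pixel that switches from one tracked color to another lowers one count
    and raises another, so the summed absolute change is divided by two.
    Assumption: threshold=0.3 is an illustrative value, not a specified one.
    """
    if not timelines:
        return []
    n_frames = min(len(counts) for counts in timelines.values())
    changes = []
    for t in range(1, n_frames):
        moved = sum(abs(counts[t] - counts[t - 1]) for counts in timelines.values())
        if moved / (2 * pixels_per_frame) >= threshold:
            changes.append(t)
    return changes
```

For a 100-pixel frame in which a blue sky of about 90 pixels gives way to a forest of about 90 green pixels at frame 3, the function reports `[3]`, a point at which a new musical phrase could begin.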
In many embodiments, including analysis of a video via a cloud service, the application windows shown in FIG. 3 may not be displayed for a user. Instead, in these embodiments, the user may utilize the service transparently in a simplified, non-interactive process: the user may upload a video; the user may upload audio or select a band for determining a style; and the user may download a MIDI file or generated audio file of the dynamically created work, with analysis and composition occurring transparently to the user. In other embodiments, the service may provide the application windows shown in FIG. 3 or similar windows to the user, and the user may interact with the analysis, composition and generation tools. For example, in one such embodiment, the user may select different colors in the video to track for musical composition and recompose the score. In another such embodiment, the user may be able to vary the tempo of the composed piece, either overall or at one or more points in the timeline. In another such embodiment, the user may be able to vary the instrumentation of the composed music, either overall or at one or more points in the timeline. In yet another such embodiment, the user may be able to make dynamic changes on individual instruments, tracks, or melodic or harmonic lines. For example, the user may be able to increase or decrease the volume of a track along the timeline, mute or solo the track, fade in or fade out, or make other stylistic changes.
Referring briefly to FIG. 4, illustrated is a block diagram of a screenshot of an embodiment of a web page 400 hosted by a portal 204. In some embodiments, a web page 400 may include links to one or more audio or multimedia samples or tutorials 402, and a link to a login or start page 404. In many embodiments, web page 400 may include an overview of the analysis and composition process, as discussed above.
Referring now to FIG. 5, illustrated is a flow chart of an embodiment of a distributed method of dynamically creating a musical composition for a video work. In brief overview, at step 500, a user may input a video to be analyzed. At step 505, the video may be analyzed for color content and/or color position over time. At step 510, the user may input an audio file for analysis or select a pre-analyzed band. At step 515, a score may be composed based on the analyzed video and the selected band or analyzed audio file. At step 520, an audio file may be created from the composed score. At step 525, the audio and video may be synchronized into a multimedia file.
Still referring to FIG. 5 and in more detail, at step 500, a user may input a video file to be analyzed. In some embodiments, the user may upload a video file to a portal 204 or server 106 as discussed above, while in other embodiments, the user may specify a previously uploaded video file.
At step 505, the video may be analyzed to create one or more data arrays, as discussed above. In some embodiments, such as the transparent processing method described above, data arrays representative of a number of color pixels in a frame over time and/or position of color pixels in a frame may be discarded and only a data array representative of the probability of a neighboring pixel of the same color existing may be retained. In other embodiments, one or more data arrays representative of the number of pixels of predetermined colors in each frame over time may be displayed to the user as color timelines, as discussed above in connection with FIG. 3.
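The neighbor-probability array can be sketched as follows. The sketch assumes each frame has already been quantized to a grid of tracked color labels and that the neighborhood is the four horizontally and vertically adjacent pixels. For each color, the value is the fraction of that color's pixel-neighbor pairs in which the neighbor has the same color; colors absent from a frame are assigned 0.0. The function names are hypothetical.

```python
from collections import Counter

NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumption: 4-neighborhood


def neighbor_same_color_probability(frame):
    """Per color, the fraction of pixel-neighbor pairs whose neighbor matches.

    frame: 2D list of color labels, already quantized to the tracked colors.
    """
    same, total = Counter(), Counter()
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    for y in range(rows):
        for x in range(cols):
            color = frame[y][x]
            for dy, dx in NEIGHBOR_OFFSETS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    total[color] += 1
                    if frame[ny][nx] == color:
                        same[color] += 1
    return {color: same[color] / total[color] for color in total}


def probability_arrays(frames, colors):
    """One array per tracked color, holding one probability per frame."""
    arrays = {color: [] for color in colors}
    for frame in frames:
        probabilities = neighbor_same_color_probability(frame)
        for color in colors:
            arrays[color].append(probabilities.get(color, 0.0))  # absent color -> 0.0
    return arrays
```

In a 2x2 frame with three blue pixels and one green pixel in a corner, four of the six blue pixel-neighbor pairs match, so blue scores 2/3 and green scores 0.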
At step 510, in some embodiments, the user may input an audio file for analysis or may select a previously input audio file. As discussed above, the audio file may be analyzed for style, key, mode, timbre, and other characteristics or parameters for generating a score. In one embodiment, the audio file may be converted to a MIDI file prior to analysis, and analysis may be performed on the resulting MIDI file. In another embodiment, the user may input a MIDI file for analysis. In yet other embodiments, the user may select a pre-analyzed or generated band.
At step 515, a score may be dynamically composed based on the selected or input band parameters and the color neighbor probability arrays discussed above. In some embodiments, a first one or more color probability arrays may be used to generate a musical line, such as a bass line, and a second one or more color probability arrays may be used to generate another musical line, such as a melody. In one embodiment, a probability range may be divided into eighteen equal regions, each of which may correspond to a tone in an eighteen-semitone (one and a half octave) musical scale. In another embodiment, the probability range may be divided into a fewer or greater number of regions to correspond to a fewer or greater number of semitones. In yet another embodiment, the probability range may be divided into unequal regions. For example, in one embodiment, ranges for some semitones may be reduced to zero, such that these semitones will not be included in the resulting musical composition. This may be done to restrict the composition to certain tonal modes. Thus, in many embodiments, the probability that a pixel of a certain color is next to a second pixel of the same color in a frame may be mapped to a tone within the eighteen-semitone musical scale, thereby creating a musical line that changes as the color content of the video changes over time.
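Mapping a probability onto a scale with unequal or zero-width regions can be sketched with cumulative boundaries and a binary search, as below. The weights list is an assumption about how such a table might be stored: each entry gives the relative width of one semitone's region, and a weight of 0 removes that semitone. The names `build_regions` and `probability_to_semitone` are hypothetical; semitones are numbered from 0.

```python
from bisect import bisect_left
from itertools import accumulate


def build_regions(weights):
    """Return (upper_bounds, semitones) for semitones with a positive weight.

    Assumption: weights[i] is the relative width of semitone i's region;
    a weight of 0 excludes that semitone from the composition.
    """
    active = [(semitone, w) for semitone, w in enumerate(weights) if w > 0]
    if not active:
        raise ValueError("at least one semitone needs a positive weight")
    total = sum(w for _, w in active)
    upper_bounds = [c / total for c in accumulate(w for _, w in active)]
    return upper_bounds, [semitone for semitone, _ in active]


def probability_to_semitone(p, regions):
    """Map a probability in [0, 1] to a semitone index (0-based)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    upper_bounds, semitones = regions
    # Each active semitone owns the interval (previous bound, its upper bound];
    # clamping absorbs rounding when the last bound is slightly below 1.0.
    index = bisect_left(upper_bounds, p)
    return semitones[min(index, len(semitones) - 1)]
```

With eighteen equal weights, a probability of 0.52 falls in region 9 (the 10th semitone); setting the weights of semitones outside a major scale to 0 restricts every output to that mode.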
In some embodiments, at step 515, a user may edit the composed score. In some embodiments, the user may edit composed passages, reselect instruments, add tempo changes for rubato, phrasing and other stylistic elements. In one embodiment, the user may edit the input parameters for the composer, and recompose the score based on the edited input parameters.
At step 520, in some embodiments, an audio file may be generated from the dynamically created score. As discussed above, in many embodiments, the musical score may be input into a sampler or sequencer for playback with one or more sampled or generated instruments. The output of the sampler or sequencer may be captured as an audio file. In one embodiment, step 520 may be performed responsive to receiving an indication that a user has paid or accepted a billing charge for generating the audio file. In other embodiments, step 520 may be skipped. For example, in one such embodiment, a user may download the composed score without generating an audio file.
At step 525, in some embodiments, the generated audio file and uploaded or selected video file may be synchronized. In many embodiments, synchronizing the files may comprise interleaving the audio and video files and generating a file header in a multimedia container format. In other embodiments, synchronizing the files may comprise appending the audio file and video file and generating a file header in a multimedia container format. In these embodiments, the user may be provided with a multimedia file including both the video and audio, or may be provided with a link to download the multimedia file. In one embodiment, step 525 may be performed responsive to receiving an indication that a user has paid or accepted a billing charge for synchronizing the audio and video file. In other embodiments, step 525 may be skipped and a user may download the generated audio file.
While various embodiments of the methods and systems have been described, these embodiments are exemplary and in no way limit the scope of the described methods or systems. Those having skill in the relevant art can effect changes to form and details of the described methods and systems without departing from the broadest scope of the described methods and systems. Thus, the scope of the methods and systems described herein should not be limited by any of the exemplary embodiments and should be defined in accordance with the accompanying claims and their equivalents.
