MULTISOURCE MEDIA REMIXING

Title:

MULTISOURCE MEDIA REMIXING

Document Type and Number:

WIPO Patent Application WO/2014/037604

Abstract:

The present embodiments relate to a method and to a technical equipment for implementing the method. The method comprises receiving media content from multiple recording devices; creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; receiving a request to provide remixed media from a requesting device; determining a location of said requesting device; sending remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; providing remixed media relating to the received motion information.

Inventors:

SATHISH SAILESH (FI)
MATE SUJEET SHYAMSUNDAR (FI)

Application Number:

PCT/FI2012/050867

Publication Date:

March 13, 2014

Filing Date:

September 07, 2012

Assignee:

NOKIA CORP (FI)
SATHISH SAILESH (FI)
MATE SUJEET SHYAMSUNDAR (FI)

International Classes:

H04N5/265; G11B27/031; H04N5/262; H04N21/8549

Domestic Patent References:

WO2010119181A1	2010-10-21

Foreign References:

US20090196570A1	2009-08-06
US20060251382A1	2006-11-09
EP2151770A1	2010-02-10

Other References:

CRICRI, F. ET AL.: "Sensor-based analysis of user generated video for multi-camera video remixing", INT. CONF. ON ADVANCES IN MULTIMEDIA MODELING, 4 January 2012 (2012-01-04), KLAGENFURT, AUSTRIA, pages 255 - 265, XP019171197
CRICRI, F. ET AL.: "Multimodal event detection in user generated videos", IEEE INT. SYMP. ON MULTIMEDIA, 5 December 2011 (2011-12-05), DANA POINT, CALIFORNIA, USA, pages 263 - 270, XP032090737, DOI: doi:10.1109/ISM.2011.49

Attorney, Agent or Firm:

Nokia Corporation et al. (Jussi Jaatinen, Keilalahdentie 4, Espoo, FI)

Claims:

1. A method, comprising

- receiving media content from multiple recording devices;

- creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices;

- receiving a request to provide remixed media from a requesting device;

- determining a location of said requesting device;

- sending remixed media comprising media remixes for multiple directions from said determined location;

- and as a response to received motion information from said requesting device;

- providing remixed media relating to the received motion information.

2. The method according to claim 1, further comprising synchronizing the created media remixes based on a common factor.

3. The method according to claim 1 or 2, wherein a media remix comprises direction information, a vantage point for the content and timing information.

4. The method according to any of the previous claims 1 to 3, further comprising creating a position indexed list with corresponding media remix.

5. The method according to claim 4, further comprising sending the position indexed list to the requesting device.

6. The method according to any of the previous claims 1 to 5, further comprising sending a complete set of media remixes corresponding to one or more viewpoint requests to the requesting device.

7. The method according to any of the previous claims 1 to 6, further comprising predicting motion information of the requesting device.

8. An apparatus, comprising

a processor configured to

- receive media content from multiple recording devices;

- create remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices;

- receive a request to provide remixed media from a requesting device;

- determine a location of said requesting device;

- send remixed media comprising media remixes for multiple directions from said determined location;

- and as a response to received motion information from said requesting device;

- to provide remixed media relating to the received motion information.

9. An apparatus, comprising:

at least one processor; and

at least one memory including computer program code

the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:

- receiving media content from multiple recording devices;

- creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices;

- receiving a request to provide remixed media from a requesting device;

- determining a location of said requesting device;

- sending remixed media comprising media remixes for multiple directions from said determined location;

- and as a response to received motion information from said requesting device;

- providing remixed media relating to the received motion information.

10. The apparatus according to claim 9, further comprising computer program code configured to, with the processor, cause the apparatus to perform at least the following:

- synchronizing the created media remixes based on a common factor.

11. The apparatus according to claim 9 or 10, wherein a media remix comprises direction information, a vantage point for the content and timing information.

12. The apparatus according to any of the previous claims 9 to 11, further comprising computer program code configured to, with the processor, cause the apparatus to perform at least the following:

- creating a position indexed list with corresponding media remix.

13. The apparatus according to claim 12, further comprising computer program code configured to, with the processor, cause the apparatus to perform at least the following:

- sending the position indexed list to the requesting device.

14. The apparatus according to any of the previous claims 9 to 13, further comprising computer program code configured to, with the processor, cause the apparatus to perform at least the following:

- sending a complete set of media remixes corresponding to one or more viewpoint requests to the requesting device.

15. The apparatus according to any of the previous claims 9 to 14, further comprising computer program code configured to, with the processor, cause the apparatus to perform at least the following:

- predicting motion information for the requesting device.

16. A computer program, comprising:

- code for receiving media content from multiple recording devices;

- code for creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices;

- code for receiving a request to provide remixed media from a requesting device;

- code for determining a location of said requesting device;

- code for sending remixed media comprising media remixes for multiple directions from said determined location;

- and as a response to received motion information from said requesting device;

- code for providing remixed media relating to the received motion information,

when the computer program is run on a processor.

17. The computer program according to claim 16, wherein the computer program is a computer program product comprising a computer-readable medium bearing computer program code embodied therein for use with a computer.

18. A computer-readable medium encoded with instructions that, when executed by a computer, perform:

- receiving media content from multiple recording devices;

- creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices;

- receiving a request to provide remixed media from a requesting device;

- determining a location of said requesting device;

- sending remixed media comprising media remixes for multiple directions from said determined location;

- and as a response to received motion information from said requesting device;

- providing remixed media relating to the received motion information.

Description:

MULTISOURCE MEDIA REMIXING

Technical field

The present application relates generally to media remixing.

Background

Multimedia capturing capabilities have become common features in portable devices. Thus, many people tend to record or capture an event, such as a music concert or a sport event, they are attending.

Media remixing is an application where multiple media recordings are combined in order to obtain a media mix that contains some segments selected from the plurality of media recordings. Video remixing, as such, is one of the basic manual video editing applications, for which various software products and services are already available. Some video remixing systems depend only on the recorded content, while others are capable of utilizing environmental context data that is recorded together with the video content. The context data may be, for example, sensor data received from a compass, an accelerometer, or a gyroscope, or global positioning system (GPS) location data.

Summary

Now there has been invented an improved method and technical equipment implementing the method, by means of which users can interactively create a view to a remix of media from a particular point of interest, which media has been recorded by multiple recorders.

Various aspects of the invention include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments of the invention are disclosed in the dependent claims.

According to a first aspect, a method comprises receiving media content from multiple recording devices; creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; receiving a request to provide remixed media from a requesting device; determining a location of said requesting device; sending remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; providing remixed media relating to the received motion information.

According to a second aspect, an apparatus comprises a processor configured to receive media content from multiple recording devices; create remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; receive a request to provide remixed media from a requesting device; determine a location of said requesting device; send remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; to provide remixed media relating to the received motion information.

According to a third aspect, an apparatus comprises at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receiving media content from multiple recording devices; creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; receiving a request to provide remixed media from a requesting device; determining a location of said requesting device; sending remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; providing remixed media relating to the received motion information.

According to a fourth aspect, a computer program comprises code for receiving media content from multiple recording devices; code for creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; code for receiving a request to provide remixed media from a requesting device; code for determining a location of said requesting device; code for sending remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; code for providing remixed media relating to the received motion information, when the computer program is run on a processor.

According to a fifth aspect, a computer-readable medium is encoded with instructions that, when executed by a computer, perform: receiving media content from multiple recording devices; creating remixed media of the received media content, which remixed media comprises media remixes to multiple directions from a location of two or more recording devices; receiving a request to provide remixed media from a requesting device; determining a location of said requesting device; sending remixed media comprising media remixes for multiple directions from said determined location; and as a response to received motion information from said requesting device; providing remixed media relating to the received motion information.

According to an embodiment, the created media remixes are synchronized based on a common factor.

According to an embodiment, a media remix comprises direction information, a vantage point for the content and timing information.

According to an embodiment, a position indexed list with corresponding media remix is created.

According to an embodiment, the position indexed list is sent to a requesting device.

According to an embodiment, a complete set of media remixes corresponding to one or more viewpoint requests is sent to the requesting device.

According to an embodiment, motion information for the requesting device is predicted.

Description of the Drawings

In the following, various embodiments of the invention will be described in more detail with reference to the appended drawings, in which

Fig. 1 shows an embodiment of an arrangement for multisource media service;

Fig. 2 shows an embodiment of a recording apparatus;

Fig. 3 shows an embodiment of a server comprising a remixing service;

Fig. 4 shows an embodiment of the method as a flowchart; and

Fig. 5 shows an embodiment of venue information with user locations (i.e. vantage points).

Detailed Description of the Embodiments

The present embodiments relate to multisource generated media. This means that a plurality of users are recording an event or a point of interest (POI), whereby as many views are generated as there are recording users. Instead of including all the views from all the perspectives or vantage points in a particular remix, the present embodiments provide a solution by means of which a user may interactively select a perspective or a vantage point around which the remix will be created. The term "remix" relates to a video sequence that has been generated from multiple video views. Instead of the term "remix", the terms "cut" or "remixed video" can also be used.

As is generally known, many portable devices, such as mobile phones, cameras, and tablets, are provided with high quality cameras, which make it possible to capture high quality video files and still images. Usually, at events attended by a lot of people, such as live concerts, sport games, political gatherings, and other social events, there are many who record still images and videos using their portable devices. The recorded media content can be transmitted to a specific server configured to perform remixing of such content.

The media content to be used in media remixing services may comprise at least video content including 3D video content, still images (i.e. pictures), and audio content including multi-channel audio content. The embodiments disclosed herein are mainly described from the viewpoint of creating a video remix from video and audio content of source videos, but the embodiments are not limited to video and audio content of source videos; they can be applied generally to any type of media content.

Figure 1 shows a system and devices according to an embodiment. In Fig. 1, the different devices may be connected via a fixed network 210 such as the Internet or a local area network; or a mobile communication network 220 such as the Global System for Mobile communications (GSM) network, 3rd Generation (3G) network, 3.5th Generation (3.5G) network, 4th Generation (4G) network, Wireless Local Area Network (WLAN), Bluetooth®, or other contemporary and future networks. Different networks are connected to each other by means of a communication interface 280. The networks comprise network elements such as routers and switches to handle data (not shown), and communication interfaces such as the base stations 230 and 231 to provide the different devices with access to the network. The base stations 230, 231 are themselves connected to the mobile network 220 via a fixed connection 276 or a wireless connection 277.

There may be a number of servers connected to the network, and in the example of Fig. 1 are shown servers 240, 241 and 242, each connected to the mobile network 220, which servers may be arranged to operate as computing nodes (i.e. to form a cluster of computing nodes or a so-called server farm) for the video remixing service. Some of the above devices, for example the computers 240, 241, 242 may be such that they are arranged to make up a connection to the Internet with the communication elements residing in the fixed network 210. There are also a number of end-user devices such as mobile phones and smart phones 251, Internet access devices (Internet tablets) 250, personal computers 260 of various sizes and formats, televisions and other viewing devices 261, video decoders and players 262, as well as video cameras 263 and other encoders. These devices 250, 251, 260, 261, 262 and 263 can also be made of multiple parts. The various devices may be connected to the networks 210 and 220 via communication connections such as a fixed connection 270, 271, 272 and 280 to the internet, a wireless connection 273 to the internet 210, a fixed connection 275 to the mobile network 220, and a wireless connection 278, 279 and 282 to the mobile network 220. The connections 271-282 are implemented by means of communication interfaces at the respective ends of the communication connection.

Figures 2 and 3 show devices for video remixing according to an example embodiment. As shown in Fig. 3, the server 240 contains memory 245, one or more processors 246, 247, and computer program code 248 residing in the memory 245 for implementing, for example, video remixing. The different servers 241, 242 of Fig. 1 may contain at least these elements for employing functionality relevant to each server.

Similarly, the end-user device 151 shown in Figure 2 contains memory 152, at least one processor 153 and 156, and computer program code 154 residing in the memory 152 for implementing client-side functions for remixing video. The end-user device may also have one or more cameras 155 and 159 for capturing image data, for example stereo video. The end-user device may also contain one, two or more microphones 157 and 158 for capturing sound. The apparatus may also contain sensors for generating sensor data relating to the apparatus' relationship to the surroundings. The apparatus may also comprise a display 160 for viewing single-view, stereoscopic (2-view) or multiview (more-than-2-view) images. The display 160 may be extended at least partly on the back cover of the apparatus. The apparatus 151 may also comprise an interface means (e.g. a user interface) which allows a user to interact with the apparatus. The user interface means may be implemented using the display 160, a keypad 161, voice control, or other structures. The apparatus may also be connected to another device e.g. by means of a communication block (not shown in Fig. 1) able to receive and/or transmit information. It is to be understood that different embodiments of the apparatus allow different parts to be carried out in different elements. The different end-user devices 150, 160 of Figure 1 may contain at least these same elements for employing functionality relevant to each device.

It is to be understood that different embodiments allow different parts to be carried out in different elements. The elements of the video remixing process may be implemented as a software component residing on one device or distributed across several devices, for example so that the devices form a so-called cloud. A video remix can be created according to the preferences of a user. The source content refers to all types of media that are captured by users, wherein the source content may involve any associated context data. For example, videos, images, and audio captured by users may be provided with context data, such as information from various sensors, for example a compass, an accelerometer, or a gyroscope, or information indicating location, altitude, temperature, illumination, pressure, etc. A particular sub-type of source content is a source video, which refers to videos captured by the user, possibly provided with the above-mentioned context information. Any user can request from the video remix service a video remix version created from the material available to the service about an event, such as a concert.

An embodiment of the server-side method is shown as a flowchart in Figure 4. For multisource media remixing, the remixing application (also called the "service"), being located on a server, receives (410) media content from a plurality of recording client devices (451), which media content relates to the same event. When the remixing service receives content from multiple users, which content relates to a certain common event, the remixing service first generates (420) remixed media comprising multiple remix videos from all vantage points (each relating to one user) and multiple (i.e. all possible) directions of vantage points. The remixing service is also configured to time synchronize (430) all these remixed videos based on some common factor (such as an audio track, time information, space information (temporal and spatial synchronization), objects present in media, network based time synchronization, etc.). For constructing the remixed video, the server uses venue information as shown in Figure 5, which illustrates an embodiment of a map of the venue divided into a 3×3 grid. The circles shown at vertex intersections denote user vantage points, i.e. user positions.
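As an illustration of synchronizing on a common factor, the following is a minimal sketch that assumes the common factor is time information: each recording carries a start time on a shared clock. The names Recording and sync_offsets are hypothetical and do not come from the embodiments; audio-based or object-based synchronization would need a different approach.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    device_id: str
    start_time: float  # seconds on a shared clock (assumed, e.g. network time)
    duration: float    # seconds


def sync_offsets(recordings):
    """Return, per device, how many seconds to skip at the start of its
    recording so that all recordings begin at the same instant."""
    # The latest start time is the first instant covered by every recording.
    common_start = max(r.start_time for r in recordings)
    return {r.device_id: common_start - r.start_time for r in recordings}


recordings = [
    Recording("a", 100.0, 60.0),
    Recording("b", 103.5, 60.0),
    Recording("c", 101.0, 60.0),
]
offsets = sync_offsets(recordings)
```

In this sketch the device that started last gets an offset of zero and every other recording is trimmed to match it.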

Any user who has contributed content to the service, said content relating to the certain event, may request (Fig 4: 440) the service to provide remixed media for the event, which remixed media also comprises media recorded by other users. In some embodiments, any user, even one who has not contributed content to the service, can request the service to provide remixed media for a certain event. This means that any device capable of rendering the media is able to request remixed media. In another embodiment, a user who has provided content to the service with a recording device can request content with a viewing device that is different from the recording device. Once the user has authenticated to the service in order to request the content, the service first analyses the content the user has submitted (if any) and determines the user's position at a certain resolution. The determined position is the user's vantage point (VP), such as the vantage point 510 of Figure 5, which is now taken into consideration. In this embodiment, the remixing service creates (Fig 4: 450) four remixed videos for this particular vantage point 510, where each remixed video corresponds to one of the arrow directions shown in Figure 5. Therefore, each remix being created will have a directional aspect, a vantage point coordinate and timing information for each segment (i.e. frame). Thus the remixing service creates a number of remixed videos with user positions at a certain resolution, which is dependent on the map resolution, user position resolution and service settings. It is appreciated that user positions need not be computed at the intersection points as shown in the figure, but can be determined at any other location of the grid. In addition, it is appreciated that this embodiment shows four directions for which video remixes are created. This is not necessarily the case in all embodiments. The number of directions may vary depending on the situation.

A first level remix (i.e. cut) created (Fig 4: 450) by the remixing application with respect to the vantage point looks as if the user were viewing different views of the event from his/her own vantage point.
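The structure of a remix described above (a direction, a vantage point coordinate and timing information per segment) can be sketched as a small data model. This is only an illustration: the class names, the field names and the use of compass headings for the four directions are assumptions, since the embodiments refer only to the arrow directions of Figure 5.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    # Assumed labelling of the four arrow directions, as headings in degrees.
    NORTH = 0
    EAST = 90
    SOUTH = 180
    WEST = 270


@dataclass
class Segment:
    start: float         # seconds from the synchronized start of the event
    end: float
    source_device: str   # which recording the segment was cut from


@dataclass
class MediaRemix:
    vantage_point: tuple  # (x, y) coordinate on the venue map
    direction: Direction
    segments: list = field(default_factory=list)


def remixes_for_vantage_point(vantage_point):
    """Create one (initially empty) remix per direction for a vantage point."""
    return [MediaRemix(vantage_point, d) for d in Direction]


remixes = remixes_for_vantage_point((1, 2))
remixes[0].segments.append(Segment(0.0, 4.0, "device-a"))
```

A different number of directions would only require a different enumeration; the rest of the sketch is unchanged.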

The remixing application service sends (460) the generated content (i.e. first level remix) to the requesting client device along with the signaling information containing information about the vantage point, the event venue map, stage types etc. The signaling information can be included in a separate XML file showing segment identifications corresponding to media. However, it is appreciated that the information may be provided to the content in some other way.
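The embodiments do not define a schema for the signaling XML file. Purely as an illustration, the sketch below invents a small layout (the element and attribute names remixSignaling, vantagePoint, venueMap and segment are all hypothetical) and shows how a client might read it with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical signaling document; the real format is not specified.
SIGNALING_XML = """
<remixSignaling event="concert-1">
  <vantagePoint x="1" y="2"/>
  <venueMap columns="3" rows="3" stageType="front"/>
  <segment id="s1" remix="vp1-2-dir0" start="0.0" end="4.0"/>
  <segment id="s2" remix="vp1-2-dir0" start="4.0" end="9.5"/>
</remixSignaling>
"""


def parse_signaling(xml_text):
    """Extract vantage point, venue grid and segment timing from the XML."""
    root = ET.fromstring(xml_text.strip())
    vp = root.find("vantagePoint")
    venue = root.find("venueMap")
    return {
        "vantage_point": (int(vp.get("x")), int(vp.get("y"))),
        "grid": (int(venue.get("columns")), int(venue.get("rows"))),
        "stage_type": venue.get("stageType"),
        # Map each segment identification to its remix and time range.
        "segments": {
            seg.get("id"): (seg.get("remix"),
                            float(seg.get("start")),
                            float(seg.get("end")))
            for seg in root.findall("segment")
        },
    }


signaling = parse_signaling(SIGNALING_XML)
```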

The client device is configured to render the video content received from the service. The video content may be rendered while the view capture area (VA) is provided by the server. The view capture area comprises the event area map and, on the map, the user position during video recording (i.e. the vantage point determined by the server by analyzing the metadata of the video provided to arrive at a user position estimation on the event area map). The view capture area may be a touch-enabled area which can be controlled by the user by touch. The touch motion allows the user to dynamically change the current view capture area using certain motion patterns generated through finger touch, such as a line motion, a zig-zag motion, a curvy motion etc. Changing the view capture area also changes the user's vantage point, whereby it will look as if the user were viewing different views of the event from someone else's vantage point.

The motion performed by the user on the screen over the view capture area is streamed via signaling messages to the server (470). According to an embodiment, a particular motion type may be identified by the client and the motion pattern along with the direction information is sent to the server. The server will then send (480) - depending on the motion pattern - a position indexed list and corresponding video remixes, which video remixes the client can display depending on an index match between the finger position on the map and the indexed segments. This allows the user to zoom or view different view types of edited video clips from different vantage points corresponding to a motion produced by the user. The indexed list comprises vantage points of other users and corresponding video remixes. The metadata of each video remix defines the corresponding position (i.e. vantage point) of another user. When a viewing user performs a movement on the screen, a rendering element of the device of the viewing user will match the user's finger position defining the movement with the correct video remix provided in the index.
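The index match between the finger position and the indexed remixes could be done in several ways. A minimal sketch, assuming the client scales the touch position from display pixels to map coordinates and then picks the nearest indexed vantage point, is given below. The function name, the file names and the nearest-neighbour rule are assumptions rather than part of the embodiments.

```python
import math


def nearest_vantage_point(position_index, touch_xy, display_size, map_size):
    """Return the indexed vantage point closest to a touch position.

    position_index maps (x, y) map coordinates to a remix identifier.
    touch_xy and display_size are in pixels; map_size is in map units.
    """
    # Scale the touch position from display pixels to map coordinates.
    mx = touch_xy[0] / display_size[0] * map_size[0]
    my = touch_xy[1] / display_size[1] * map_size[1]
    return min(position_index,
               key=lambda vp: math.hypot(vp[0] - mx, vp[1] - my))


# Hypothetical position indexed list for a 3x3 grid (vertices 0..3).
position_index = {
    (0, 0): "remix_vp00",
    (2, 1): "remix_vp21",
    (1, 2): "remix_vp12",
    (3, 2): "remix_vp32",
}

vp = nearest_vantage_point(position_index, (620, 300), (640, 480), (3.0, 3.0))
selected_remix = position_index[vp]
```

Here a touch near the right edge of a 640 by 480 display maps to roughly (2.9, 1.9) on the venue map, so the remix indexed at vantage point (3, 2) is selected.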

The motion action by the user may be construed independently from the media generation framework. In addition, any action or interaction that can generate a vantage point can be used for this purpose. Examples of such actions are waving the device to another side, forward or backward motion of the device, using an external mouse, etc. Depending on the speed of the action, the client may indicate more than one vantage point in one request to be sent to the remixing server. The server may use these one or more vantage points to predict the user's action so as to generate the position indexed list of media segments, keeping in mind where the user may go next. As said earlier, the server may send all four views (four remixes) from the vantage point of the user, or the entire set or a certain set within a fixed boundary from the vantage point of the user. The vantage point of the user is the default vantage point. Then, based on user interaction, the server will send further packages of indexed remixed videos back to the client. The client will map those indexes to the event map information that is displayed in a normalized manner (so that it fits the display size) on the user's screen. The client may continually stream the user's indication on the screen to the server or, in certain embodiments, provide the server with a predicted pattern and direction with respect to a certain vantage point. The client device can always modify the set pattern if the user is found to deviate from it. According to an embodiment, the server may provide the entire position indexed content (as opposed to a position indexed list only for certain vantage points of the movement), comprising all the recordings from multiple users, straight to the device upon a request for remixed media. These indexed remixed video segments may be pre-created and stored on the server side. In such a case, the server may also send a default vantage point pertaining to the actual position of the user to provide the user with a starting point for viewing the content.
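The embodiments do not state how the user's next vantage point is predicted. One simple possibility, shown here only as an assumed sketch, is to extrapolate linearly from the last two vantage points in a request and to preload the indexed remixes closest to the predicted position. The function names and the parameter k are hypothetical.

```python
def predict_next(points):
    """Linearly extrapolate the next vantage point from the last two."""
    if not points:
        raise ValueError("at least one vantage point is required")
    if len(points) == 1:
        return points[0]  # no motion observed yet
    (x0, y0), (x1, y1) = points[-2], points[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def preload_candidates(points, indexed_vantage_points, k=2):
    """Return the k indexed vantage points nearest the predicted position."""
    px, py = predict_next(points)
    return sorted(
        indexed_vantage_points,
        key=lambda vp: (vp[0] - px) ** 2 + (vp[1] - py) ** 2,
    )[:k]


# Vantage points indicated by the client in one request (assumed data).
trail = [(0, 1), (1, 1)]
predicted = predict_next(trail)
to_preload = preload_candidates(trail, [(0, 0), (2, 1), (3, 1), (1, 3)])
```

A real predictor module could equally use motion patterns such as the line or zig-zag motions mentioned earlier; the linear rule is only the simplest choice.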

The server may also employ a predictor module that predicts user direction and motion pattern. Accordingly, the server can send an indexed list of remixed videos to the client. The client device may then match the index with the user indication on the map. One significant feature of this type of remix generation is that each created remix with a vantage point and direction is taken from video content from users who were present in the vicinity of that vantage point or, through popular depth estimation algorithms, uses selected media segments from zoomed-in or zoomed-out versions of video taken by users at other vantage points. Here, the remixed videos will also contain segments related to that particular direction only. This differs from a conventional video remix, where the direction of the remix segments changes without any importance being given to vantage points.

The various embodiments may provide advantages. For example, users may control the selection of video remix units. The present embodiments also support multiview with respect to user vantage points and make it possible to dynamically personalize views. The present embodiments are also efficient and fast due to pre-indexing on the server. Further, the predictive models allow segments to be preloaded. In addition, client logic together with the index allows quick changes to views without pre-fetching. Multiple views are also enabled for a single viewpoint.

The various embodiments of the invention can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.

It is obvious that the present invention is not limited solely to the above-presented embodiments, but it can be modified within the scope of the appended claims.
