Title:
MOOD BASED MULTIMEDIA CONTENT SUMMARIZATION
Document Type and Number:
WIPO Patent Application WO/2021/058116
Kind Code:
A1
Abstract:
Presented herein are methods and systems for generating a mood based summary video for a full-length movie, comprising receiving a full-length movie, receiving a Knowledge Graph (KG) annotated model of the full-length movie generated by annotating features extracted for the full-length movie, segmenting the full-length movie to a plurality of mood based time intervals each expressing a certain dominant mood based on an analysis of the KG, computing a score for each of the plurality of mood based time intervals according to one or more of a plurality of metrics expressing a relevance level of the respective mood based time interval to a narrative of the full-length movie, generating a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold; and outputting the mood based summary video for presentation to one or more users.

Inventors:
CHOWDHURY TARIK (DE)
TANG JIAN (DE)
O’SULLIVAN DECLAN (IE)
CONLAN OWEN (IE)
DEBATTISTA JEREMY (IE)
ORLANDI FABRIZIO (IE)
LATIFI MAJID (IE)
NICHOLSON MATTHEW (IE)
HASSAN ISLAM (IE)
MCCABE KILLIAN (IE)
MCKIBBEN DECLAN (IE)
TURNER DANIEL (IE)
Application Number:
EP2019/076266
Publication Date:
April 01, 2021
Filing Date:
September 27, 2019
Assignee:
HUAWEI TECH CO LTD (CN)
CHOWDHURY TARIK (DE)
International Classes:
G11B27/034; G11B27/10; G11B27/28
Foreign References:
US20080187231A12008-08-07
US20150139606A12015-05-21
US20190034428A12019-01-31
US20160211001A12016-07-21
Other References:
HANJALIC A ET AL: "Affective Video Content Representation and Modeling", IEEE TRANSACTIONS ON MULTIMEDIA, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 7, no. 1, 1 February 2005 (2005-02-01), pages 143 - 154, XP011125470, ISSN: 1520-9210, DOI: 10.1109/TMM.2004.840618
Attorney, Agent or Firm:
KREUZ, Georg (Riesstr. 25, Munich, DE)
Claims:
CLAIMS

1. A method of generating a mood based summary video for a full-length movie, comprising: receiving a full-length movie; receiving a knowledge graph, KG, created for the full-length movie, the KG comprises an annotated model of the full-length movie generated by annotating features extracted for the full-length movie; segmenting the full-length movie to a plurality of mood based time intervals each expressing a certain dominant mood based on an analysis of the KG; computing a score for each of the plurality of mood based time intervals according to at least one of a plurality of metrics expressing a relevance level of the respective mood based time interval to a narrative of the full-length movie; generating a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold; and outputting the mood based summary video for presentation to at least one user.

2. The method of claim 1, wherein each of the annotated features in the KG annotated model is associated with a timestamp, indicating a temporal location of the respective feature in a timeline of the full-length movie.

3. The method of claim 1, wherein segmenting the full-length movie to the plurality of mood based time intervals is done according to the analysis of the KG according to at least one feature expressing at least one of: a background music, a semantic content of speech, a mood indicative facial expression of a character and a mood indicative gesture of a character.

4. The method of claim 1, wherein the plurality of metrics comprising: a number of main characters appearing during a certain mood based time interval, a duration of appearance of each main character during the certain mood based time interval and a number of actions relating to each main character during the certain mood based time interval.

5. The method of any one of the preceding claims, further comprising selecting at least some of the mood based time intervals of the subset according to a score of a diversity metrics computed for each of the plurality of mood based time intervals, the diversity metrics expressing a difference of each mood based time interval compared to its adjacent mood based time intervals with respect to at least one interval attribute which is a member of a group consisting of: characters appearing in the mood based time intervals, a dominant mood of the mood based time intervals and actions seen in the mood based time intervals.

6. The method of any one of the preceding claims, further comprising selecting the subset of mood based time intervals according to a time length defined for the mood based summary video.

7. The method of claim 1 , wherein the KG annotated model is created by automatically uplifting features extracted from at least one of: a video content of the full-length movie, an audio content of the full-length movie, a speech content of the full-length movie, at least one subtitles record associated with the full-length movie and a metadata record associated with the full-length movie.

8. The method of claim 7, wherein the KG annotated model further uses at least one manually annotated feature extracted for the full-length movie.

9. A system for generating a mood based summary video for a full-length movie, comprising: using at least one processor configured to execute a code, the code comprising: code instructions to receive a full-length movie; code instructions to receive a knowledge graph, KG, created for the full-length movie, the KG comprises an annotated model of the full-length movie generated by annotating features extracted for the full-length movie; code instructions to segment the full-length movie to a plurality of mood based time intervals each expressing a certain dominant mood based on an analysis of the KG; code instructions to compute a score for each of the plurality of mood based time intervals according to at least one of a plurality of metrics expressing a relevance level of the respective mood based time interval to a narrative of the full-length movie; code instructions to generate a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold; and code instructions to output the mood based summary video for presentation to at least one user.

10. A computer readable storage medium comprising computer program code instructions, being executable by a computer, for performing a method according to any of claims 1 to 8 when the computer program code instructions run on a computer.

Description:
MOOD BASED MULTIMEDIA CONTENT SUMMARIZATION

TECHNICAL FIELD

The present invention, in some embodiments thereof, relates to generating summary videos for multimedia content, and, more specifically, but not exclusively, to generating mood based summary videos for multimedia content, specifically for full-length movies.

BACKGROUND

Multimedia content, for example, video content, audio content and/or the like is constantly increasing in giant leaps offering endless options for consuming this content.

Much of the multimedia content, for example, movies, television series, television shows and/or the like may be significantly long in their time duration.

Creating summary videos to visually and/or audibly summarize such multimedia content, in particular the full-length movies, multi episode series and/or the like, in a significantly shorter time duration may therefore be highly desirable and beneficial for a plurality of applications, services, purposes and goals. Such applications may include, for example, supporting users in selecting multimedia to consume, categorization of the multimedia into categories and/or classes (based on genre, narrative, etc.) and/or the like.

However, one of the major challenges in creating these summary videos is to create an efficient summary video which delivers the narrative of the multimedia content, for example, plot, progress, main facts, key moments and/or the like in a concise and coherent manner such that users (spectators) watching the summary video may be able to accurately understand and comprehend the narrative of the multimedia content.

SUMMARY

An objective of the embodiments of the disclosure is to provide a solution which mitigates or solves the drawbacks and problems of conventional solutions. The above and further objectives are solved by the subject matter of the independent claims. Further advantageous embodiments can be found in the dependent claims.

The disclosure aims at providing a solution for creating a summary video summarizing multimedia content, in particular full-length movies, in a coherent, concise and accurate manner.

According to a first aspect of the present invention there is provided a method of generating a mood based summary video for a full-length movie, comprising:

Receiving a full-length movie.

Receiving a knowledge graph (KG) created for the full-length movie. The KG comprises an annotated model of the full-length movie generated by annotating features extracted for the full-length movie.

Segmenting the full-length movie to a plurality of mood based time intervals each expressing a certain dominant mood based on an analysis of the KG.

Computing a score for each of the plurality of mood based time intervals according to one or more of a plurality of metrics expressing a relevance level of the respective mood based time interval to a narrative of the full-length movie.

Generating a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold.

Outputting the mood based summary video for presentation to one or more users.

According to a second aspect of the present invention there is provided a system for generating a mood based summary video for a full-length movie, comprising using one or more processors configured to execute a code, the code comprising:

Code instructions to receive a full-length movie.

Code instructions to receive a KG created for the full-length movie. The KG comprises an annotated model of the full-length movie generated by annotating features extracted for the full-length movie.

Code instructions to segment the full-length movie to a plurality of mood based time intervals each expressing a certain dominant mood based on an analysis of the KG.

Code instructions to compute a score for each of the plurality of mood based time intervals according to one or more of a plurality of metrics expressing a relevance level of the respective mood based time interval to a narrative of the full-length movie.

Code instructions to generate a mood based summary video by concatenating a subset of the plurality of mood based time intervals having a score exceeding a predefined threshold.

Code instructions to output the mood based summary video for presentation to one or more users.

According to a third aspect of the present invention there is provided a computer readable storage medium comprising computer program code instructions, being executable by a computer, for performing the above identified method.

In a further implementation form of the first and/or second aspects, each of the annotated features in the KG annotated model is associated with a timestamp, indicating a temporal location of the respective feature in a timeline of the full-length movie. The timestamps may map each of the features expressed by the KG annotation model to their time of occurrence along the time line of the full-length movie. Mapping the features along the timeline may be essential for accurately associating the features with their time in order to identify the plurality of mood based time intervals and apply the metrics used to compute the scores for each of the mood based time intervals.

In a further implementation form of the first and/or second aspects, segmenting the full-length movie to the plurality of mood based time intervals is done according to the analysis of the KG according to one or more features expressing one or more of: a background music, a semantic content of speech, a mood indicative facial expression of a character and a mood indicative gesture of a character. These features may be highly indicative of the moods expressed in the respective mood based time interval, in particular the dominant mood.

In a further implementation form of the first and/or second aspects, the plurality of metrics comprising: a number of main characters appearing during a certain mood based time interval, a duration of appearance of each main character during the certain mood based time interval and a number of actions relating to each main character during the certain mood based time interval. These metrics may present an accurate and easy to measure method for assessing (estimating) the relevance of each mood based time interval to the narrative of the full-length movie.

In an optional implementation form of the first and/or second aspects, at least some of the mood based time intervals of the subset are selected according to a score of a diversity metrics computed for each of the plurality of mood based time intervals, the diversity metrics expressing a difference of each mood based time interval compared to its adjacent mood based time intervals with respect to one or more interval attributes which is a member of a group consisting of: characters appearing in the mood based time intervals, a dominant mood of the mood based time intervals and actions seen in the mood based time intervals. Selecting the mood based time intervals according to the diversity score may lead to selection of a diverse collection of the mood based time intervals which may convey an elaborate, wide and/or comprehensive scope of the narrative. Using the diversity score may further serve to avoid selecting redundant mood based time intervals which may present little and/or insignificant difference compared to other selected mood based time intervals.

In an optional implementation form of the first and/or second aspects, the subset of mood based time intervals is selected according to a time length defined for the mood based summary video. Adjusting the selection of the mood based time intervals according to the predefined duration (length) of the summary video may enable high flexibility in selecting the mood based time intervals which best deliver, present and/or convey the narrative within the time constraints applicable for the summary video.

In a further implementation form of the first and/or second aspects, the KG annotated model is created by automatically uplifting features extracted from one or more of: a video content of the full-length movie, an audio content of the full-length movie, a speech content of the full-length movie, one or more subtitles records associated with the full-length movie and a metadata record associated with the full-length movie. The KG annotation model may be a powerful tool providing highly rich, extensive and precise information describing the full-length movie which may be used for extracting accurate and extensive features for segmenting the full-length movie to the plurality of mood based time intervals, for computing the score for the mood based time intervals and/or the like.

In an optional implementation form of the first and/or second aspects, the KG annotated model is using one or more manually annotated features which are extracted for the full-length movie. Manual annotation may serve to enhance the KG annotated model where automated tools may be somewhat limited.

Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.

For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.

In the drawings:

FIG. 1 is a flowchart of an exemplary process of generating a mood based summary video for a full-length movie, according to some embodiments of the present invention;

FIG. 2 is a schematic illustration of an exemplary system for generating a mood based summary video for a full-length movie, according to some embodiments of the present invention;

FIG. 3 is a schematic illustration of an exemplary mood based segmentation of a full-length video, according to some embodiments of the present invention;

FIG. 4 is a graph chart of the distribution of experiment scores provided by users presented with a mood based summary video to rank their understanding of the narrative of the full-length movie, according to some embodiments of the present invention; and

FIG. 5 presents graph charts of experiment results conducted to evaluate mood based summary videos created for three full-length movies, according to some embodiments of the present invention.

DETAILED DESCRIPTION

The present invention, in some embodiments thereof, relates to generating summary videos for multimedia content, and, more specifically, but not exclusively, to generating mood based summary videos for multimedia content, specifically for full-length movies.

Creating summary videos to visually (as opposed to textually) summarize multimedia content, in particular full-length movies, may be highly desirable and beneficial for a plurality of applications, services, purposes and goals, for example, supporting users in selecting multimedia to consume, categorization of the multimedia into categories and/or classes (based on genre, narrative, etc.) and/or the like.

However, in order for the summary video of a full-length movie to be effective, the video summary which may be significantly shorter in (time) duration should be concise and coherent while conveying (delivering) the narrative of the full-length movie, for example, plot, progress, main facts, key moments and/or the like.

The summary video may therefore be created by selecting and concatenating together a plurality of segments (time intervals) extracted from the full-length movie which, combined together, may have a total duration (length) significantly shorter than the original full-length movie, for example, approximately 15% to 25% of it.

There may be various approaches and concepts for selecting the time intervals from the full-length movie to be included in the summary video. The time intervals should be short enough to provide sufficient granularity and localization, thus allowing selection of multiple time intervals which may reliably and accurately convey the narrative of the full-length movie. However, the time intervals should be sufficiently long to create a coherent summary video in which the selected time intervals are logically and effectively connected to each other.

According to some embodiments of the present invention, there are provided methods, systems and computer program products for generating summary videos to visually summarize multimedia content, in particular full-length movies by concatenating together time intervals selected from the original full-length movie which are created based on moods (feelings) expressed in the time intervals, for example, happiness, sadness, depression, anxiety, excitement, joy, anger, rage, kindness, generosity, sorrow, patience and/or the like.

It was demonstrated in research and experiments (as described herein after) that the mood based time intervals which are typically approximately several minutes long are sufficiently short (in duration) to serve as an efficient segmentation (split) unit for segmenting full-length movies to a plurality of high resolution time intervals. However, as demonstrated, the mood based time intervals are sufficiently long to convey a substantial aspect of the full-length movie’s narrative in a reliable, coherent and concise manner.

The mood based summary video is therefore created for the full-length movie by concatenating together a subset of mood based time intervals selected from a plurality of mood based time intervals created by segmenting the full-length movie based on the mood expressed in each of the mood based time intervals. Moreover, since one or more of the mood based time intervals may express multiple moods, the full-length movie may be segmented to the plurality of mood based time intervals according to a dominant mood expressed in each of the mood based time intervals.

Segmenting the full-length video is done based on an analysis of a Knowledge Graph (KG) annotation model created by manually and/or automatically uplifting features extracted from the full-length video. The KG annotated model, which is outside the scope of the present invention, may be created for the full-length movie by uplifting and annotating features extracted from one or more data sources relating to the full-length movie, for example, the video (visual) content, the audio content, the speech content, a subtitles record associated with the full-length movie, a metadata record associated with the full-length movie, a textual description and/or summary of the full-length movie, an actors list, a characters list and/or the like. The KG annotated model may therefore provide a highly rich, extensive and precise source of information describing the full-length movie, scenes of the full-length movie and/or features extracted from the full-length movie.

The full-length video may be segmented to the mood based time intervals according to one or more mood indicative features extracted from the KG annotation model, for example, a background music, a semantic content of the speech, a mood indicative facial expression of a character, a mood indicative gesture of a character and/or the like.
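For illustration only, the segmentation step described above can be sketched as follows. The sketch assumes the upstream KG analysis has already reduced the mood indicative features to a per-second dominant-mood label; the function name, label set and one-second granularity are assumptions, not part of the disclosure.

```python
from itertools import groupby

def segment_by_mood(mood_per_second):
    """Group a per-second sequence of dominant-mood labels into
    contiguous mood based time intervals (start, end, mood).

    `mood_per_second` is assumed to come from an upstream analysis of
    the KG annotation model (background music, semantic content of
    speech, facial expressions, gestures and/or the like).
    """
    intervals, t = [], 0
    for mood, run in groupby(mood_per_second):
        length = len(list(run))
        intervals.append({"start": t, "end": t + length, "mood": mood})
        t += length
    return intervals

labels = ["joy"] * 120 + ["sadness"] * 300 + ["anger"] * 90
print(segment_by_mood(labels))
# three intervals: joy over [0, 120), sadness over [120, 420), anger over [420, 510)
```

Each resulting interval spans the longest contiguous run of one dominant mood, which is what makes the intervals natural segmentation units.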

After the full-length movie is segmented to the plurality of mood based time intervals, the summary video may be created by concatenating together a subset of the mood based time intervals which are selected to best convey, present and/or deliver the narrative of the full-length movie.

In order to select the mood based time intervals to be included in the video summary, a set of metrics was defined to enable evaluating a level of relevance of each mood based time interval to the narrative, specifically, to evaluate a contribution of each mood based time interval to an understanding and comprehension of the narrative by users (spectators) who watch the summary video.

The newly defined metrics may include relevance metrics, for example, a number of main characters appearing during a respective mood based time interval, a duration of appearance of each main character during the respective mood based time interval, a number of actions relating to each main character during the respective mood based time interval and/or the like.

The main characters appearing in the full-length movie may have a major correlation, contribution and/or impact to the narrative, plot and/or progress of the full-length movie, in particular compared to other characters, for example, supporting characters, side characters, extra characters and/or the like. The number of main characters which appear in a mood based time interval of the full-length movie and the (time) duration of their appearance in the mood based time interval may therefore be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the mood based time interval to the narrative of the full-length movie. Moreover, actions relating to the main characters (e.g., conducted by, inflicted on, involving, etc.) which are detected in the mood based time interval may also be highly indicative of the level of relevance of the respective mood based time interval to the narrative of the full-length movie.

Using one or more of the relevance metrics, a relevance score may be computed for each of the mood based time intervals. Moreover, the relevance score computed for one or more of the mood based time intervals may be based on aggregation of the relevance score computed according to multiple relevance metrics.
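A minimal sketch of such an aggregated relevance score, combining the three relevance metrics named above (number of main characters, duration of their appearance, number of their actions), is given below. The weights, the dictionary layout of an interval and the linear aggregation are illustrative assumptions; the disclosure does not fix a particular aggregation formula.

```python
def relevance_score(interval, main_characters,
                    w_count=1.0, w_duration=0.5, w_actions=0.25):
    """Aggregate the three relevance metrics for one mood based time
    interval: count of main characters appearing, their total
    appearance duration (seconds) and their total action count.

    `interval` is assumed to carry per-character appearance durations
    and action counts extracted from the KG annotation model.
    """
    appearing = [c for c in main_characters if c in interval["durations"]]
    n_main = len(appearing)
    total_duration = sum(interval["durations"][c] for c in appearing)
    n_actions = sum(interval["actions"].get(c, 0) for c in appearing)
    return w_count * n_main + w_duration * total_duration + w_actions * n_actions

iv = {"durations": {"alice": 40, "bob": 10, "extra7": 5},
      "actions": {"alice": 3, "bob": 1}}
print(relevance_score(iv, main_characters={"alice", "bob"}))
# 1.0*2 + 0.5*50 + 0.25*4 = 28.0
```

Note how a non-main character ("extra7") contributes nothing to the score, reflecting that only main characters are counted toward relevance.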

In addition, the defined metrics may include diversity metrics defined to express a difference between each mood based time interval and one or more of its adjacent mood based time intervals, i.e., a preceding mood based time interval and a subsequent mood based time interval. The diversity metrics may be defined by one or more interval attributes relating to each mood based time interval with respect to its adjacent mood based time interval(s). The diversity metrics may include, for example, a difference in the identity of characters appearing in a mood based time interval compared to the adjacent interval(s), a difference between the mood, specifically the dominant mood, expressed in a mood based time interval and the mood expressed in the adjacent interval(s), a difference in actions depicted in a mood based time interval and actions depicted in the adjacent interval(s) and/or the like.

A diversity score may be computed for each of the mood based time intervals according to the diversity metrics, typically by aggregating the diversity score computed for each of the diversity metrics relating to one or more interval attributes. The diversity score therefore expresses how different each mood based time interval is from its adjacent mood based time interval(s). Identifying the diversity and difference between each of the mood based time intervals and its adjacent intervals may enable selecting a diverse set of mood based time intervals encompassing a wide scope of the narrative of the full-length movie while avoiding selecting similar mood based time intervals which may be redundant.
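One possible aggregation of the three diversity metrics (characters, dominant mood, actions) against the adjacent interval(s) is sketched below. The use of Jaccard distance for the set-valued attributes and the equal-weight average are assumptions made for illustration; the disclosure only requires that the per-attribute differences be aggregated.

```python
def diversity_score(intervals, i):
    """Score how different interval i is from its adjacent interval(s),
    averaging three per-attribute differences: characters appearing,
    dominant mood and actions seen. Returns a value in [0, 1].

    Each interval is assumed to carry `characters` and `actions` sets
    and a `mood` label derived from the KG annotation model.
    """
    def jaccard_distance(a, b):
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    neighbours = [n for n in (i - 1, i + 1) if 0 <= n < len(intervals)]
    cur, scores = intervals[i], []
    for n in neighbours:
        adj = intervals[n]
        scores.append((
            jaccard_distance(cur["characters"], adj["characters"])
            + (0.0 if cur["mood"] == adj["mood"] else 1.0)
            + jaccard_distance(cur["actions"], adj["actions"])
        ) / 3.0)
    return sum(scores) / len(scores)
```

An interval identical to both neighbours scores 0.0 (redundant, a poor candidate), while one sharing no characters, mood or actions with them scores 1.0.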

The subset of mood based time intervals selected for the summary video may therefore include mood based time intervals selected according to their score, for example, an aggregation of their relevance score and diversity score. For example, each mood based time interval having a score exceeding a certain predefined threshold may be selected to the subset used to create the summary video. In another example, a certain number of mood based time intervals having a highest score may be selected for the subset used to create the summary video. Optionally, the mood based time intervals and/or their number are selected to the subset constituting the summary video according to one or more timing parameters, for example, an overall duration time defined for the summary video, a duration of one or more of the mood based time intervals and/or the like.
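The threshold-based selection with an optional overall-duration constraint could be sketched as follows. The greedy highest-score-first trimming against the time budget is an assumption; the disclosure states the timing constraint but not a specific packing strategy.

```python
def select_subset(intervals, scores, threshold, max_length=None):
    """Select the subset of mood based time intervals to concatenate:
    those whose score exceeds the predefined threshold, optionally
    trimmed (highest score first, greedily) to fit a time length
    defined for the summary video.

    `intervals` are (start, end) pairs in seconds, in timeline order.
    """
    candidates = [(s, iv) for s, iv in zip(scores, intervals) if s > threshold]
    candidates.sort(key=lambda p: p[0], reverse=True)  # highest score first
    chosen, used = [], 0
    for score, (start, end) in candidates:
        if max_length is not None and used + (end - start) > max_length:
            continue  # would exceed the summary-video time budget
        chosen.append((start, end))
        used += end - start
    return sorted(chosen)  # restore timeline order for concatenation

ivs = [(0, 60), (60, 300), (300, 360), (360, 600)]
print(select_subset(ivs, scores=[0.9, 0.4, 0.8, 0.7],
                    threshold=0.5, max_length=300))
# → [(0, 60), (300, 360)]
```

Sorting the chosen intervals back into timeline order preserves the narrative flow when they are concatenated into the summary video.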

The selected subset of mood based time intervals may be concatenated to produce the mood based video summary of the full-length movie which may be output for presentation to one or more users (spectators).

The mood based summary videos may present major advantages and benefits compared to existing methods and systems for creating video summaries. First, automatically creating the summary videos may significantly reduce the effort and/or time required for manually creating the summary videos as may be done, at least partially, by some of the existing methods.

Moreover, some of the existing methods may use one or more video inference and/or interpretation tools, algorithms and/or techniques for automatically (at least partially) creating the summary videos. These methods, however, may typically process the entire full-length movie which may require major and even extreme computing resources (e.g. processing resources, storage resources, etc.) and/or computing time. The mood based video summary, on the other hand, is based on processing limited length time intervals of the full-length movie, thus significantly reducing the computing resources and/or computing time required for creating the summary videos.

Furthermore, a major challenge in creating the summary videos is to create the summary video such that it is highly representative of the full-length movie's narrative while maintaining coherence in a significantly (predefined) shorter duration compared to the original full-length movie. Some of the existing methods for automatically creating the summary videos may include in the summary video short time sections extracted from the full-length movie, typically action related sections. These short time sections may fail to accurately, logically and/or coherently deliver the narrative of the full-length movie. In contrast, the mood based time intervals may define efficient segmentation units which are long enough to contain substantial sections of the full-length movie to convey its narrative in a coherent manner while sufficiently short to allow selection of a large number of time intervals presenting a diverse collection of sections (different in content and/or context) of the full-length movie, thus conveying an accurate and extensive summary of the narrative.

In addition, introducing the new metrics, specifically the relevance metrics may allow for efficient automated selection of the significant and important mood based time intervals thus further reducing the computing resources and/or computing time required to create the summary video.

Also, applying the newly introduced diversity metrics for automatically selecting the mood based time intervals may serve to select a wide and diverse collection of the mood based time intervals which may be highly representative of the narrative of the full-length movie, in particular for complex narrative movies in which major aspects of the narrative may be distributed across many sections of the movie. Moreover, using the diversity score may further serve to avoid selecting redundant mood based time intervals which may present little and/or insignificant difference compared to other selected mood based time intervals.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD- ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages.

The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions. The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Referring now to the drawings, FIG. 1 is a flowchart of an exemplary process of generating a mood based summary video for a full-length movie, according to some embodiments of the present invention. An exemplary process 100 may be executed to generate a mood based summary video summarizing multimedia content, specifically a full-length movie. The mood based summary video, which may be significantly shorter than the full-length movie, may consist of one or more segments of the full-length movie which are highly relevant to the overall narrative (ontology) of the full-length movie thus reliably conveying the summary of the full-length movie.

The generated mood based summary video may be presented to one or more users (spectators) for one or more goals, purposes and/or applications, for example, movie selection, movie categorization and/or the like.

Reference is also made to FIG. 2, which is a schematic illustration of an exemplary system for generating a mood based summary video for a full-length movie, according to some embodiments of the present invention. An exemplary video summarization system 200, for example, a computer, a server, a computing node, a cluster of computing nodes and/or the like may execute a process such as the process 100 to generate (create) a mood based summary video for one or more full-length movies. The video summarization system 200 may include an I/O interface 210, a processor(s) 212 for executing the process 100 and storage 214 for storing code (program store) and/or data.

The I/O interface 210 may include one or more interfaces, ports and/or interconnections for exchanging data with one or more external resources.

For example, the I/O interface 210 may include one or more network interfaces for connecting to one or more wired and/or wireless networks, for example, a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), a cellular network, the internet and/or the like. Using the network interface(s) provided by the I/O interface 210, the video summarization system 200 may communicate with one or more remote networked resources, for example, a device, a computer, a server, a computing node, a cluster of computing nodes, a system, a service, a storage resource, a cloud system, a cloud service, a cloud platform and/or the like.

In another example, the I/O interface 210 may include one or more interfaces and/or ports, for example, a Universal Serial Bus (USB) port, a serial port and/or the like configured to connect to one or more attachable media devices, for example, a storage medium (e.g. flash drive, memory stick, etc.), a mobile device (e.g. laptop, smartphone, tablet, etc.) and/or the like.

The video summarization system 200 may receive, via the I/O interface 210, one or more full-length movies 250, for example, fiction movies, documentary movies, educational movies, a series comprising multiple episodes and/or the like.

The video summarization system 200 may further receive, via the I/O interface 210, a KG annotation model 255 created for each of the full-length movies 250. The KG annotated model 255 which is outside the scope of the present invention may be created for each full-length movie 250 by uplifting and annotating features extracted from a video content of the full-length movie 250, an audio content of the full-length movie 250, a speech content of the full-length movie 250, at least one subtitles record associated with the full-length movie 250, a metadata record associated with the full-length movie 250 and/or the like. The KG annotated model 255 which may be created automatically, manually and/or in a combination thereof may therefore provide a highly rich, extensive and precise source of information describing the full-length movie 250, scenes of the full-length movie 250 and/or features extracted from the full-length movie 250.

The video summarization system 200 may output, via the I/O interface 210, a video summary 260 summarizing the full-length movie 250 in a significantly shorter duration compared to the full-length movie 250, for example, 15%, 20%, 25%, etc. of the length of the full-length movie 250.

The processor(s) 212 may include one or more processor(s), homogenous or heterogeneous, each comprising one or more processing nodes arranged for parallel processing, as clusters and/or as one or more multi-core processor(s). The processor(s) 212 may execute one or more software (code) modules, for example, a process, an application, an agent, a utility, a tool, a script and/or the like each comprising a plurality of program instructions stored in a non-transitory medium such as the storage 214 and executed by one or more processors such as the processor(s) 212. For example, the processor(s) 212 may execute a video summarizer 220 implementing the process 100.

The video summarization system 200 may further include one or more hardware components to support execution of the video summarizer 220, for example, a circuit, an Integrated Circuit (IC), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signals Processor (DSP), a Graphic Processor Unit (GPU) and/or the like.

The video summarizer 220 may be therefore executed, utilized and/or implemented by one or more software modules, one or more of the hardware components and/or a combination thereof.

The storage 214 used for storing data and/or code (program store) may include one or more non-transitory memory devices, for example, a persistent non-volatile device such as, for example, a ROM, a Flash array, a hard drive, a solid state drive (SSD), a magnetic disk and/or the like. The storage 214 may typically also include one or more volatile devices, for example, a Random Access Memory (RAM) device, a cache memory and/or the like. Optionally, the storage 214 further comprises one or more network storage resources, for example, a storage server, a Network Attached Storage (NAS), a network drive, and/or the like accessible to the video summarizer 220 via the I/O interface 210. Optionally, the video summarization system 200 and/or the video summarizer 220 are provided, executed and/or utilized at least partially by one or more cloud computing services, for example, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and/or the like provided by one or more cloud infrastructures and/or services such as, for example, Amazon Web Service (AWS), Google Cloud, Microsoft Azure and/or the like.

Moreover, the video summarization system 200, specifically, the processor(s) 212 may further execute one or more applications, services and/or hosts to enable one or more users, for example, a content expert, a knowledge engineer and/or the like to interact with the video summarizer 220 in order to access, evaluate, define, adjust and/or control the process 100 and/or part thereof. The user(s) may access, for example, the video summarizer 220, the full-length movie 250, the KG annotation model 255, the summary video 260 and/or one or more temporary products of the process 100.

Access to the video summarization system 200 may be implemented in one or more architectures, deployments and/or methods. For example, the video summarization system 200 may execute one or more host applications, web applications and/or the like providing access to the video summarizer 220 for one or more remote users. The remote user(s) may use a client device(s), for example, a computer, a server, a mobile device and/or the like which executes an access application, for example, a browser, a local agent and/or the like for accessing the video summarization system 200 via one or more networks to which the video summarization system 200 is connected via the I/O interface 210. In another example, one or more of the users may be local users who may access the video summarization system 200 via one or more Human Machine Interfaces (HMI) provided by the I/O interface 210, for example, a keyboard, a pointing device (e.g. mouse, trackball, etc.), a display, a touch screen and/or the like.

Optionally, access to one or more of the video summarizer 220, the full-length movie 250, the KG annotation model 255, the summary video 260 and/or to a temporary product of the process 100 may be done via one or more databases, applications and/or interfaces. For example, one or more databases, for example, a SPARQL database may be deployed in association with the video summarization system 200, specifically with the video summarizer 220. The user(s) may therefore issue one or more database queries and/or issue one or more update instructions to interact with the video summarizer 220 in order to evaluate, define, adjust and/or control the process 100 and/or part thereof.

As shown at 102, the process 100 starts with the video summarizer 220 receiving a full-length movie 250. The full-length movie 250, for example, a fiction movie, a documentary movie, an educational movie and/or the like may typically comprise a significantly long video stream, for example, 90 minutes, 120 minutes, 180 minutes and/or the like. The full-length movie 250 may further include a series, a mini-series and/or the like comprising a plurality of episodes.

As shown at 104, the video summarizer 220 receives a KG annotation model 255 created for the full-length movie 250. The KG annotated model 255 which is outside the scope of the present invention may be created for the full-length movie 250 in order to create an enhanced feature set for the full-length movie 250 providing rich, extensive and precise information describing the full-length movie 250, one or more scenes of the full-length movie 250, a narrative of the full-length movie 250, an ontology of the full-length movie 250 and/or the like.

The KG annotated model 255 may be created by uplifting and annotating features extracted from the full-length movie 250 and/or from one or more data sources and/or data records associated and/or corresponding to the full-length movie 250. For example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the video (visual) content of the full-length movie 250. In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the audio content of the full-length movie 250. In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from the speech content of the full-length movie 250. In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from one or more subtitles records associated with the full-length movie 250. In another example, the KG annotated model 255 may include enhanced features created by uplifting features extracted from one or more metadata records associated with the full-length movie 250.

The KG annotation model 255 may be created manually, automatically and/or by a combination thereof. For example, one or more Natural Language Processing (NLP) methods, algorithms and/or tools may be applied to the full-length movie 250, specifically to features extracted from the full-length movie 250 in order to annotate these features thus uplifting, enhancing and/or enriching the extracted features. Moreover, one or more trained Machine Learning (ML) models may be applied to the features extracted from the full-length movie 250 for annotating and uplifting the extracted features. The ML model(s), in particular NLP ML model (s), for example, a neural network, a deep neural network, a Support Vector Machine and/or the like may be trained using one or more training datasets comprising sample features extracted, simulated and/or manipulated for a plurality of full-length movies optionally of the same genre as the received full-length movie 250.

Each of the annotated features described by the KG annotation model 255 may be associated (assigned) with a time stamp which temporally maps the respective annotated feature to a temporal location in a timeline of the full-length movie 250. Using the timestamp associated with each of the annotated features, the video summarizer 220 may therefore identify the temporal location of the respective annotated feature in the timeline of the full-length movie 250.
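The temporal mapping described above may be sketched as follows. This is an illustrative Python sketch only; the patent does not define a concrete data structure for the annotated features, so the class and field names here are assumptions made for the example:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical minimal representation of an annotated feature from the
# KG annotation model; field names are illustrative, not from the patent.
@dataclass
class AnnotatedFeature:
    label: str        # e.g. "dramatic_music" or "character:Alice"
    timestamp: float  # seconds from the start of the full-length movie

def features_in_window(features: List[AnnotatedFeature],
                       start: float, end: float) -> List[AnnotatedFeature]:
    """Return the annotated features whose timestamps map them into the
    [start, end) window of the movie timeline."""
    return [f for f in features if start <= f.timestamp < end]
```

Given such a mapping, any step of the process 100 that needs the features of a particular time window can retrieve them by timestamp rather than re-processing the movie itself.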

As shown at 106, the video summarizer 220 analyzes the KG annotation model 255 and, based on the analysis, segments the full-length movie 250 into a plurality of time intervals, in particular, mood based time intervals S_i (i = 0, ..., N) each expressing a certain dominant mood, for example, happiness, sadness, depression, anxiety, excitement, joy, anger, rage, kindness, generosity, sorrow, patience and/or the like.

While one or more of the mood based time intervals may express multiple moods, the video summarizer 220 may identify a single dominant mood which is estimated to be expressed in higher intensity, force, magnitude and/or the like in each of the mood based time intervals compared to other moods which may be expressed in the respective mood based time interval.

The video summarizer 220 may identify the plurality of mood based time intervals by analyzing one or more mood indicative annotated features described in the KG annotation model 255. As each of the annotated features is associated with a respective timestamp, the video summarizer 220 may accurately map the annotated features to their temporal locations in the timeline of the full-length movie 250 in order to set the start and end times for each of the plurality of mood based time intervals. The mood indicative annotated features may include, for example, features reflecting a background music (sound track) played during one or more of the mood based time intervals. For example, a dramatic music may be highly indicative of a dramatic scene which may express moods such as depression, sorrow and/or the like. In another example, a romantic music may be highly indicative of a romantic scene which may express moods such as happiness, joy, lightheaded and/or the like. In another example, a rhythmic music may be highly indicative of an action scene which may express moods such as anxiety, excitement, fear and/or the like.

In another example, the mood indicative annotated features may further include features expressing semantic content of speech detected during one or more of the mood based time intervals, for example, key words, contextual words and/or the like. For example, love and/or affection expressing words may be highly indicative of a romantic scene, which may express moods such as happiness, joy, lightheartedness and/or the like. In another example, weapons, cars, violence expressing words may be highly indicative of an action and/or battle scene, which may express moods such as anxiety, excitement, fear and/or the like.

The mood indicative annotated features may further include features expressing tone, intonation and/or volume, which may be coupled with corresponding semantic content expressing features such that the semantic content may be associated with the tone, intonation and/or volume. For example, key words spoken in low volume and/or whispered in a soft intonation may be highly indicative of a romantic and/or a dramatic scene, which may express moods such as sadness, depression, sorrow, happiness, joy, lightheartedness and/or the like. In another example, key words spoken in high volume in a sharp intonation may be highly indicative of an action and/or battle scene, which may express moods such as anxiety, excitement, fear and/or the like.

In another example, the mood indicative annotated features may include features expressing one or more mood indicative (expressive) facial expressions of one or more characters identified in one or more of the mood based time intervals. The mood indicative facial expressions may reflect, for example, anger, happiness, sorrow, anxiety, excitement, fear, lightheartedness, love and/or the like.

In another example, the mood indicative annotated features may include features expressing one or more mood indicative (expressive) gestures made by one or more characters identified in one or more of the mood based time intervals. The mood indicative gestures may include for example, hugging, kissing, fighting, running, driving and/or the like which may be highly indicative of the moods of the characters, for example, love, anxiety, excitement, fear and/or the like. The video summarizer 220 may further aggregate a plurality of mood indicative annotated features in order to estimate the dominant mood expressed in each of one or more of the mood based time intervals and segment the full-length movie 250 accordingly.
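The aggregation and segmentation described above may be sketched as follows. This is a minimal illustrative sketch, not the patent's algorithm: it assumes the mood indicative features have already been reduced to a per-second sequence of mood labels, and it simply picks the most frequent label and groups maximal runs of equal labels into intervals:

```python
from collections import Counter
from typing import List, Tuple

def dominant_mood(mood_votes: List[str]) -> str:
    """Pick the single mood suggested most often by the mood indicative
    annotated features (music, speech semantics, tone, facial
    expressions, gestures) of a candidate interval."""
    return Counter(mood_votes).most_common(1)[0][0]

def segment_by_mood(timeline: List[str]) -> List[Tuple[int, int, str]]:
    """Segment a per-second dominant-mood timeline into maximal runs of
    a single mood, returning (start, end, mood) triples, end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(timeline) + 1):
        if t == len(timeline) or timeline[t] != timeline[start]:
            segments.append((start, t, timeline[start]))
            start = t
    return segments
```

In practice the per-feature moods would be weighted by intensity rather than counted, but the run-grouping structure of the segmentation step is the same.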

As shown at 108, the video summarizer 220 may compute a score, specifically a relevance score for each of the plurality of mood based time intervals which expresses the relevance of the respective mood based time interval to a narrative and/or ontology of the full-length movie 250.

The video summarizer 220 may compute the relevance score according to one or more relevance metrics specifically defined and applied for estimating the relevance of time intervals extracted from the full-length movie 250 to the narrative and/or ontology of the full-length movie 250. In other words, the metrics are defined and applied to reflect a level (degree) of relevance, i.e., the level of correlation, expressiveness, agreement and/or alignment of the extracted time interval with respect to the narrative of the full-length movie 250.

The relevance metrics may relate to characters in the full-length movie 250 and actions relating to the characters. To facilitate the means for applying the relevance metrics the video summarizer 220 may therefore analyze the KG annotation model 255, specifically the annotated features of the KG annotation model 255 to identify the characters presented throughout the full-length movie 250.

Moreover, based on the analysis of the KG annotation model 255, the video summarizer 220 may identify which of these characters are main character(s), i.e. leading characters playing a major part in the full-length movie 250 and which of these characters are supporting characters, side characters and/or extra characters.

Based on the analysis of the KG annotation model 255, the video summarizer 220 may further identify actions relating to each of the characters throughout the full-length movie 250, for example, actions conducted by a character, an action inflicted on a character, an action involving a character and/or the like.

In particular, the video summarizer 220 associates the characters, main characters and actions with the plurality of mood based time intervals according to the timestamps of the features expressing the time of appearance of the characters and/or the actions along the timeline of the full-length movie 250. As such, each of the mood based time intervals is associated with one or more characters seen during the respective mood based time interval and one or more actions relating to the character(s) during the respective mood based time interval.
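The association step above may be sketched as follows; this is an illustrative Python sketch under the assumption that each character appearance or action has already been reduced to a single (timestamp, payload) pair:

```python
from typing import Any, Dict, List, Tuple

def assign_to_intervals(events: List[Tuple[float, Any]],
                        intervals: List[Tuple[float, float]]
                        ) -> Dict[int, List[Any]]:
    """Map each timestamped event (a character appearance or an action)
    to the mood based time interval whose [start, end) range contains
    its timestamp. Returns {interval_index: [payload, ...]}."""
    assignment: Dict[int, List[Any]] = {i: [] for i in range(len(intervals))}
    for ts, payload in events:
        for i, (start, end) in enumerate(intervals):
            if start <= ts < end:
                assignment[i].append(payload)
                break
    return assignment
```

Once every interval carries its own characters and actions, the relevance metrics can be evaluated per interval without revisiting the KG annotation model.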

Reference is now made to FIG. 3, which is a schematic illustration of an exemplary mood based segmentation of a full-length video, according to some embodiments of the present invention. A video summarizer such as the video summarizer 220 may analyze a KG annotation model such as the KG annotation model 255 created for a full-length movie such as the full-length movie 250 comprising a plurality of scenes to segment the full-length movie 250 to a plurality of mood based time intervals.

For example, assuming a total (overall) duration of the full-length movie 250 is T_n. Based on the analysis of the KG annotation model 255, the video summarizer 220 may identify a certain scene starting at time T_n-50 and ending at time T_n-10. Based on the analysis of the KG annotation model 255, the video summarizer 220 may further identify a mood based interval expressing a dominant mood MOOD_q which starts at time T_n-47 and ends at time T_n-32. The video summarizer 220 may also identify, based on the analysis of the KG annotation model 255, one or more actions seen in the scene, for example, an action A_w starting at T_n-38 and ending at T_n-33, an action A_x starting at T_n-33 and ending at T_n-28, an action A_y starting at T_n-28 and ending at T_n-25 and an action A_z starting at T_n-25 and ending at T_n-19.

The video summarizer 220 may associate each of the actions with one or more of the characters identified during the time of the actions and may further associate each of the actions with a respective mood based time interval according to the time of detection of the respective action along the timeline of the full-length movie 250 identified by the timestamp(s) assigned to the feature(s) expressing the respective action. Based on the analysis of the KG annotation model 255, the video summarizer thus associates each of the mood based time intervals with one or more characters, main character(s) or other character(s) and actions relating to the identified characters. Once the association is accomplished the relevance metrics may be applied.

The relevance metrics may include, for example, a number of main characters appearing during a certain mood based time interval.

The main characters may have a major correlation, contribution and/or impact to the narrative, plot and/or progress of the full-length movie 250. In particular, the relevance, e.g. the correlation, the contribution and/or the impact of the main characters to the full-length movie 250 may be significantly higher compared to other characters, for example, supporting characters, side characters, extra characters and/or the like depicted in the full-length movie 250. The number of main characters which are depicted in a certain time interval of the full-length movie 250 may be therefore highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval to the narrative of the full-length movie 250.

For example, time intervals showing a large number of main characters of the full-length movie 250 may be highly correlated to the narrative of the full-length movie 250 while time intervals depicting a small number of main characters, for example, one main character may have low correlation to the narrative of the full-length movie 250. In another example, time intervals of the full-length movie in which none of the main characters appears may have low and potentially insignificant correlation, contribution and/or impact to the narrative, ontology, plot and/or progress of the full-length movie 250.

The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the main character metrics based on the number of main characters identified in the respective mood based time interval. The video summarizer 220 may compute a main characters score ImpChar(S_i) of the respective mood based time interval S_i according to an exemplary formulation presented in equation 1 below which indicates how many of the overall characters depicted in the respective mood based time interval are main characters.

Equation 1:

ImpChar(S_i) = |MainChar ∈ S_i| / |Char ∈ S_i|

Where |MainChar ∈ S_i| reflects the number of main characters identified in the respective mood based time interval S_i and |Char ∈ S_i| reflects the number of all characters identified in the respective mood based time interval S_i.
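Equation 1 reduces to a simple ratio; a minimal sketch, assuming the characters of an interval are available as sets of names:

```python
from typing import Set

def imp_char(main_chars: Set[str], all_chars: Set[str]) -> float:
    """Main characters score of equation 1: the fraction of all
    characters identified in the interval S_i that are main characters.
    Returns 0.0 for an interval with no identified characters."""
    if not all_chars:
        return 0.0
    return len(main_chars) / len(all_chars)
```

For instance, an interval depicting one main character out of four identified characters scores 0.25, while an interval in which all depicted characters are main characters scores 1.0.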

The relevance metrics may further include a time (duration) of appearance of each of the main characters depicted in a time interval. Since the main characters may have a major relevance to the narrative, plot and/or progress of the full-length movie 250, the time of appearance of these main characters during a certain time interval may also be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval with respect to the narrative of the full-length movie 250. This metric may be computed to express, for example, the time duration of appearance of each of the main characters seen in the respective mood based time interval in relation to (e.g. as a fraction of) the total (overall) time duration of the respective mood based time interval.

The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the main character time duration metrics for one or more of the main characters appearing in the respective mood based time interval. Moreover, in case a plurality of main characters appear in the respective mood based time interval, the video summarizer 220 may further compute and/or adjust the relevance score by aggregating the main character time duration metrics for the plurality of main characters shown in the respective mood based time interval.
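The duration metric and its aggregation over several main characters may be sketched as follows. The patent does not fix an aggregation formula, so the plain sum of per-character fractions used here is an assumption made for illustration:

```python
from typing import Dict

def main_char_duration_score(appearances: Dict[str, float],
                             interval_duration: float) -> float:
    """Main character time duration metric: for each main character seen
    in the interval, the fraction of the interval duration during which
    that character appears, aggregated (here: summed) over all main
    characters. `appearances` maps character name -> seconds on screen."""
    if interval_duration <= 0:
        return 0.0
    return sum(t / interval_duration for t in appearances.values())
```

For example, in a 60 second interval, a main character on screen for 30 seconds and another on screen for 15 seconds together yield a score of 0.75.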

Another relevance metric which may be applied to compute the relevance score of one or more of the mood based time intervals is a number of actions relating to each main character (designated important actions herein after) detected during the respective mood based time interval, for example, actions conducted by a main character, actions inflicted on a main character, actions involving a main character and/or the like.

Since the main characters have a major relevance to the narrative of the full-length movie 250, the important actions relating to these main characters may also have major relevance to the narrative of the full-length movie 250. In particular, the relevance, e.g. the correlation, the contribution and/or the impact of the important actions relating to the main characters may be significantly higher compared to actions relating to the other characters (i.e. conducted by, inflicted on and/or involving the other characters). The number of important actions identified in a certain time interval of the full-length movie 250 as relating to one or more of the main characters seen in the certain time interval may therefore be highly indicative and reflective of the level of relevance (correlation, expressiveness, agreement and/or alignment) of the certain time interval with respect to the narrative of the full-length movie 250. The video summarizer 220 may compute the relevance score of one or more of the mood based time intervals according to the important actions metrics based on the number of actions identified for each main character identified in the respective mood based time interval. The video summarizer 220 may compute an important actions score ImpAct(S i ) of the respective mood based time interval S i according to an exemplary formulation presented in equation 2 below, which indicates how many of the overall actions detected in the respective mood based time interval relate to main characters.

Equation 2: ImpAct(S i ) = |Actions(Main(C n )) ∈ S i | / |Actions ∈ S i |

Where C n designates a character identified in the respective mood based time interval S i and Main(C n ) designates a main character identified in the respective mood based time interval S i . |Actions ∈ S i | designates the actions relating to all the characters C n identified in the respective mood based time interval S i and |Actions(Main(C n )) ∈ S i | designates the important actions relating to the main characters identified in the respective mood based time interval S i .
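The important actions score of equation 2 may be sketched as follows; representing detected actions as (character, action) pairs is an assumption made for this illustration:

```python
def important_actions_score(actions, main_chars):
    """ImpAct(S_i): the share of all actions detected in an interval
    that relate to a main character (equation 2).

    `actions` is a list of (character, action) pairs detected in the
    interval; the pair representation is illustrative.
    """
    if not actions:
        return 0.0
    important = sum(1 for char, _ in actions if char in main_chars)
    return important / len(actions)

# Hypothetical interval: 2 of the 3 detected actions relate to Bond.
score = important_actions_score(
    [("Bond", "fight"), ("Dealer", "deal cards"), ("Bond", "chase")],
    {"Bond", "Vesper"},
)
```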

The video summarizer 220 may compute and/or adjust the relevance score of one or more of the mood based time intervals of the full-length movie 250 by aggregating the relevance scores computed for the respective mood based time interval according to multiple metrics selected from the main character metrics, the main character time duration metrics and the important actions metrics.

As shown at 110, the video summarizer 220 may compute a diversity score for one or more of the mood based time intervals expressing a difference between the respective mood based time interval compared to one or more of its adjacent mood based time intervals, i.e., a preceding mood based time interval and a subsequent mood based time interval.

The diversity score therefore expresses how different the respective mood based time interval is from its adjacent mood based time interval(s). Identifying the diversity and difference between each of the mood based time intervals and its adjacent intervals may enable selecting a diverse set of mood based time intervals encompassing a wide scope of the narrative of the full-length movie 250 while avoiding selecting similar mood based time intervals, which may be redundant.

In particular, the video summarizer 220 computes the diversity score according to diversity metrics defined by one or more interval attributes of the respective mood based time interval and its adjacent mood based time interval(s). These interval attributes may include, for example, a character attribute reflecting the (identity of) characters appearing in the mood based time interval, a mood attribute reflecting the moods expressed in the mood based time interval, an action attribute reflecting the actions depicted in the mood based time interval and/or the like.

The diversity metrics may thus express the difference between the respective mood based time interval and its adjacent mood based time interval(s) with respect to the respective interval attribute. For example, a first diversity metric may express the difference between the characters appearing in the respective mood based time interval and those appearing in the adjacent mood based time interval(s). In another example, a second diversity metric may express the difference between the mood, specifically the dominant mood expressed in the respective mood based time interval and the mood(s) expressed in the adjacent mood based time interval(s). In another example, a third diversity metric may express the difference between actions depicted in the respective mood based time interval and actions depicted in the adjacent mood based time interval(s). The video summarizer 220 may compute a partial diversity score for each of the interval attributes between two consecutive mood based time intervals S i and S i+1 . For example, the partial diversity score may be computed as the intersection of the respective interval attribute as identified in the two consecutive mood based time intervals over the union of the respective interval attribute as identified in the two consecutive mood based time intervals, as presented in equation 3 below.

Equation 3: d C (S (i,i+1) ) = |C ∈ S i ∩ C ∈ S i+1 | / |C ∈ S i ∪ C ∈ S i+1 |, and similarly d M (S (i,i+1) ) and d A (S (i,i+1) ) for the mood and action attributes

Where C ∈ S i is the set of characters appearing in the mood based time interval S i and C ∈ S i+1 is the set of characters appearing in the mood based time interval S i+1 . Similarly, M ∈ S i is the set of moods expressed in the mood based time interval S i and M ∈ S i+1 is the set of moods expressed in the mood based time interval S i+1 , and A ∈ S i is the set of actions seen in the mood based time interval S i and A ∈ S i+1 is the set of actions seen in the mood based time interval S i+1 .
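The intersection-over-union computation of equation 3 applies uniformly to each of the interval attributes. A minimal sketch, assuming each attribute of an interval is available as a simple set, may look as follows:

```python
def partial_diversity(attr_i, attr_next):
    """Intersection-over-union of one interval attribute (characters,
    moods or actions) across two consecutive intervals (equation 3)."""
    a, b = set(attr_i), set(attr_next)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical character sets of two consecutive intervals:
# one shared character out of three distinct characters overall.
d_chars = partial_diversity({"Bond", "Vesper"}, {"Bond", "Le Chiffre"})
```

The same function would be invoked with the mood sets and the action sets of the two intervals to obtain the remaining partial scores.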

The video summarizer 220 may compute a relative diversity score for each two consecutive mood based time intervals by aggregating two or more of the partial diversity scores computed separately for each of the interval attributes. For example, the video summarizer 220 may compute the relative diversity score according to the exemplary formulation presented in equation 4 below.

Equation 4: d(S (i,i+1) ) = (d C (S (i,i+1) ) + d M (S (i,i+1) ) + d A (S (i,i+1) )) / 3

The video summarizer 220 may then compute the diversity score Div(S i ) for each of the mood based time intervals S i based on the relative diversity score computed for every two consecutive mood based time intervals, for example according to the exemplary formulation presented in equation 5 below.

Equation 5: Div(S i ) = (d(S (i-1,i) ) + d(S (i,i+1) )) / 2

Since the first mood based time interval and the last mood based time interval have only one adjacent mood based time interval, the video summarizer 220 may set the diversity score Div(S i ) for these two mood based time intervals to equal the relative diversity score computed for these two mood based time intervals, i.e., d(S (0,1) ) for the first mood based time interval and d(S (n-1,n) ) for the last mood based time interval, assuming the full-length movie 250 is segmented into n mood based time intervals.
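Equations 4 and 5, including the boundary handling for the first and last intervals, may be sketched as follows; averaging the partial scores in equation 4 is an assumption of this sketch:

```python
def relative_diversity(partials):
    """d(S_(i,i+1)): aggregate the partial diversity scores of two
    consecutive intervals (equation 4); a plain mean is assumed here."""
    return sum(partials) / len(partials)

def diversity_scores(relative):
    """Div(S_i) per interval from the relative scores d(S_(i,i+1))
    of every two consecutive intervals (equation 5).

    Boundary intervals take their single adjacent relative score;
    inner intervals average the scores of both neighboring pairs.
    """
    n = len(relative) + 1  # number of intervals
    div = []
    for i in range(n):
        if i == 0:
            div.append(relative[0])
        elif i == n - 1:
            div.append(relative[-1])
        else:
            div.append((relative[i - 1] + relative[i]) / 2)
    return div

# 4 hypothetical intervals, hence 3 consecutive-pair relative scores.
divs = diversity_scores([0.2, 0.6, 0.4])
```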

As shown at 112, the video summarizer 220 may generate a mood based summary video for the full-length movie 250 which aims to summarize the full-length movie 250 and present the narrative, plot and/or progress of the full-length movie 250 in a significantly shorter time duration compared to the time duration of the full-length movie 250, for example, 15%, 20%, 25% and/or the like.

The video summarizer 220 may generate the mood based summary video by concatenating together a subset of mood based time intervals selected from the plurality of mood based time intervals according to a score computed for each of the mood based time intervals. The score computed by the video summarizer 220 for each of the mood based time intervals includes the relevance score computed for the respective mood based time interval and optionally the diversity score computed for the respective mood based time interval.

The score computed by the video summarizer 220 for each of the mood based time intervals may therefore express an aggregated score, for example, a weighted average of the scores computed according to the metrics described herein before. In particular, the weighted average score is computed based on the relevance score(s) computed according to the relevance metrics, optionally combined with the diversity score computed based on the diversity metrics derived from the interval attributes.
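Such a weighted aggregation may be sketched as follows; the particular weight values and the flat list representation are assumptions for this illustration only:

```python
def interval_score(relevance_scores, diversity_score, weights):
    """Weighted average of the per-metric relevance scores and the
    diversity score of one interval.

    `weights` assigns one weight per score, relevance scores first and
    the diversity score last; names and values are illustrative.
    """
    scores = list(relevance_scores) + [diversity_score]
    total_w = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total_w

# Hypothetical interval: two relevance scores plus a diversity score.
s = interval_score([0.8, 0.5], 0.4, weights=[0.4, 0.4, 0.2])
```

Adjusting the weights, as discussed further below, shifts the contribution of each metric to the final score.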

The video summarizer 220 may apply one or more methods, techniques and/or implementation modes for selecting the subset of mood based time intervals used to generate the mood based summary video.

For example, the video summarizer 220 may select all the mood based time intervals having a score exceeding a certain predefined threshold value. In another example, the video summarizer 220 may select a predefined number of mood based time intervals having the highest score.

Optionally, the video summarizer 220 may select the subset of mood based time intervals according to a duration time predefined for the mood based summary video to be created for the full-length movie 250. For example, assuming a certain time duration is predefined for the mood based summary video, the video summarizer 220 may select a certain number of the highest scoring mood based time intervals which have a combined (time) duration which is less than or equal to the predefined certain time duration.
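A greedy selection under such a duration budget may be sketched as follows; the tuple representation of intervals is an assumption made for this sketch:

```python
def select_intervals(intervals, max_duration):
    """Pick the highest-scoring intervals whose combined duration fits
    within the budget, then restore chronological order so they can be
    concatenated into the summary video.

    `intervals` is a list of (start_time, duration, score) tuples
    (illustrative representation).
    """
    chosen, used = [], 0.0
    for iv in sorted(intervals, key=lambda iv: iv[2], reverse=True):
        if used + iv[1] <= max_duration:
            chosen.append(iv)
            used += iv[1]
    return sorted(chosen, key=lambda iv: iv[0])

# Hypothetical intervals; a 180-second budget excludes the 0.7-scored one.
subset = select_intervals([(0, 60, 0.9), (120, 90, 0.7), (300, 100, 0.8)], 180)
```

A threshold-based selection, as in the preceding paragraph, would instead simply filter the list by score before sorting chronologically.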

The resulting mood based summary video is therefore a sequence of the mood based time intervals selected according to the scores assigned by the metrics described herein before. In particular, the mood based time intervals are shorter than scenes and longer than actions, thus providing a convenient and efficient unit for creating the mood based summary video which may accurately, concisely and coherently represent the narrative, plot and progress of the full-length movie 250 in a significantly shorter time period, for example, 20 to 30 minutes.

As shown at 114, the video summarizer 220 outputs the mood based summary video which may be used by one or more users for one or more purposes, objectives, goals and/or applications. For example, one or more users may watch the mood based summary video in order to determine whether they wish to watch the full-length movie 250. In another example, one or more users may watch the mood based summary video in order to categorize the full-length movie 250 in one or more categories, libraries and/or the like.

Optionally, the video summarizer 220 may adjust one or more of the weights assigned to the relevance score, to the diversity score and/or to any of their components to adjust (reduce or increase) the contribution and/or the relevance of one or more of the metrics and hence of the score computed according to these metrics. The video summarizer 220 may further apply one or more ML models trained over a large dataset comprising a plurality of full-length movies such as the full-length movie 250, each associated with a respective KG annotation model such as the KG annotation model 255, to adjust the weights assigned to each of the metrics. In particular, the training datasets may be labeled with feedback scores assigned by users who viewed mood based summary videos created for the plurality of full-length movies. The feedback scores may reflect the understanding and/or conception of the narrative, ontology and/or plot of a certain full-length movie based on watching the respective mood based summary video.

Experiments were conducted to demonstrate the high level of understanding and/or conception of narratives of full-length movies as captured by users presented with respective mood based summary videos.

The computing hardware used for the experiments is selected to support the intensive computer vision processing required by the video summarizer 220 executing the process 100. In particular, the computing hardware is based on a workstation comprising two Intel® Xeon® E5-2600 CPUs having 6 cores each and 15MB SmartCache operating at 2.00GHz, supported by 24GB DRAM and an nVIDIA® GeForce GTX 1080 Xtreme D5X GPU with 8GB DRAM. The video summarizer 220 was implemented in the Python and JavaScript programming languages and makes use of source code provided on GitHub.

However, the computing hardware and programming code languages used for the experiments should not be construed as limiting since a plurality of other implementations may be applied for realizing the video summarization system 200 and the video summarizer 220 executing the process 100.

The evaluation in the experiments was focused on measuring two main aspects:

1) User understanding, meaning how well users presented with a mood based summary video understood (comprehended) the narrative (i.e., the plot, story, etc.) of the full-length movie from which the mood based summary video was created and whether these users were able to grasp the main key aspects, moments and/or concepts portrayed by the mood based summary video.

The purpose of this evaluation aspect is to estimate the quality of the mood based summary videos, in particular to measure the user understanding after watching a mood based summary video. This aspect should refer to the amount of information that a user obtains from watching the mood based summary video and, if possible, it should not reflect the subjective view of the user on the quality of the mood based summary video.

2) Alignment with existing summary videos, meaning how similar the mood based summary videos are to existing online movie summaries manually generated by humans, for example, movie plots available on Wikipedia, IMDb and/or the like, and whether the mood based summary videos share common key moments and important parts of the story of the full-length movie with the manually generated movie summaries.

The purpose of this evaluation aspect is to determine the extent to which the automatically generated mood based summary videos include key events of their respective full-length movies through a comparison with human authored text summaries. These text summaries may be obtained, for example, through movie plots described on Wikipedia, IMDb, TV review websites and/or the like.

The evaluation was conducted for three full-length movies: “Mission Impossible - Rogue Nation”, “Casino Royale” and “The Girl with the Dragon Tattoo”. The movies are selected to represent a range of complexity of movie narratives (plots, characters, structure, turning points, etc.) where “Mission Impossible” may be relatively simple, “The Girl with the Dragon Tattoo” may be more complicated and “Casino Royale” may be the most complex.

To evaluate the first aspect of user understanding, a plurality of users who are not familiar with the three evaluated full-length movies and have not seen them before were presented with three mood based summary videos created for the evaluated full-length movies. The users then filled in a questionnaire comprising questions aiming to determine the level of the users’ understanding of the narratives of the full-length movies. The questionnaire was constructed according to a Likert scale as known in the art with a scale of 1 to 7, where 1 indicates a very low understanding level and 7 indicates a very high understanding level.

The results of the evaluation are presented in FIG. 4 which is a chart graph of distribution of experiment scores provided by users presented with a mood based summary video to rank their understanding of the narrative of the full-length movie, according to some embodiments of the present invention.

A chart graph 400 presents an averaged score accumulated for a plurality of users ranking their understanding of the three full-length movies based on watching the mood based summary videos created for these three full-length movies. As expected, the level of understanding expressed by the users for the three full-length movies matches the complexity of the narrative of these full-length movies.

To evaluate the second aspect of alignment with existing summaries, a plurality of people (evaluators) who are familiar with the three evaluated full-length movies identified and marked important events, key moments and/or the like within one or more manually generated movie summaries of the three evaluated full-length movies and negotiated to reach agreement on these important events, key moments and/or the like.

For example, from the Wikipedia page of the full-length movie “Casino Royale”, the following text snippet was analyzed and processed for the extraction of key movie facts:

“MI6 agent James Bond gains his license to kill and status as a 00 agent by assassinating the traitorous MI6 section chief Dryden at the British Embassy in Prague, as well as his terrorist contact, Fisher. In Uganda, the mysterious liaison Mr. White introduces warlord Steven Obanno of the Lord’s Resistance Army to Le Chiffre, a terrorist financier. [...]”

The following facts were extracted from the snippet:

- James Bond becomes a 00 agent
- James Bond killed traitorous MI6 section chief Dryden
- James Bond killed Dryden at the British Embassy in Prague
- James Bond killed terrorist Fisher
- Mr. White introduces warlord Steven Obanno of the Lord’s Resistance Army to Le Chiffre
[...]

The evaluators had the task of checking how many of these facts extracted from textual summaries were presented in the respective mood based summary video. The evaluators were presented with a list of facts extracted from the textual summaries and, after watching the mood based summary videos, they had to mark the facts that are included in the respective mood based summary videos, either fully or partially. While manually generated textual summaries authored by humans are available on well-known websites such as Wikipedia or IMDb, on specific community-based online wikis and/or the like, the textual summaries obtained from Wikipedia plots and IMDb were used since both enforce strict guidelines on their authors and offer summaries that are quite similar in structure and granularity.

This evaluation strategy has the advantage that it does not require questionnaires or user-based evaluations, which may be subjective, and it provides an accurate and concrete estimate of the validity of the generated mood based summary videos. Simply counting the key facts that are included in the mood based summary videos provides a reliable estimation of the selection of the mood based time intervals and of the summarization method as described in the process 100.
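The fact counting may be sketched as follows; the three mark labels mirror the %present, %partial and %missing rows of the results discussed below, and the list-of-labels representation is an assumption for this sketch:

```python
def fact_coverage(marks):
    """Percentage of facts an evaluator marked as present, partially
    present or missing in a summary video.

    `marks` is one label per extracted fact, each label being
    "present", "partial" or "missing" (illustrative representation).
    """
    n = len(marks)
    return {
        label: 100.0 * marks.count(label) / n
        for label in ("present", "partial", "missing")
    }

# Hypothetical evaluation of four extracted facts.
cov = fact_coverage(["present", "partial", "missing", "missing"])
```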

The evaluation was conducted according to the following guidelines:

- Six of a total of seven evaluators evaluated each movie summary to reach agreement on the facts, key moments and/or the like.
- Selected movies and genres included the three evaluated full-length movies: “Mission Impossible - Rogue Nation”, “Casino Royale” and “The Girl with the Dragon Tattoo”.
- Three video summaries were evaluated, one for each movie, following a specific summarization strategy chosen by the team of evaluators. Three different summarization strategies were used for generating the mood based summary videos, each employing different metrics for computing the score for the mood based time intervals, specifically, presence of main actors, moods, types of activities and/or the like.
- A similar total duration (length) of approximately 30 minutes was set for all mood based summary videos generated according to the different summarization strategies such that a similar compression rate was applied to the mood based summary videos of the different strategies. The duration (length) of each of the evaluated full-length movies is over 2 hours such that the duration of the mood based summary videos is ~25% of the duration of the respective full-length movies.
- Two text summaries (one from the Wikipedia plot and the other from the IMDb synopsis) were used for each of the three evaluated full-length movies.
- The lists of extracted facts created by the evaluators for the text summaries comprised the following numbers of facts:
  - “Mission Impossible”: 51 facts
  - “Casino Royale”: 65 facts
  - “The Girl with the Dragon Tattoo”: 47 facts

Table 1 below presents the results of the experiment conducted by the seven evaluators for the three evaluated full-length movies. It should be noted that while the experiments are conducted on a small scale and may thus lack statistical significance, these experiments may provide insights into the developed summarization algorithms.

Table 1: The first evaluation aspect of user understanding is presented in the “understand” row for each of the full-length movies and expresses the understanding of the respective evaluator of the respective full-length movie after watching the respective summary video on the scale of 1-7. As evident from table 1, the evaluators had consistent results showing substantial agreement and similar values. The standard deviations for the percentages reported in table 1 range from a minimum of ±4.1% to ±8.9%. However, as stated herein before, with this limited number of evaluators deriving statistically significant values may be highly limited.

The second evaluation aspect of alignment with the textual summaries is reflected in the “%present”, “%partial” and “%missing” rows in table 1. The “%present” expresses the percentage of facts found by the respective evaluator to be aligned (match) between the respective summary video and its corresponding text summary. The “%partial” expresses the percentage of facts found by the respective evaluator to be partially aligned (match) between the respective summary video and its corresponding text summary. And, the “%missing” expresses the percentage of facts found by the respective evaluator to be missing in the respective summary video compared to its corresponding text summary.

As evident from table 1, the average percentages (rightmost column) of missing key facts range between 53% and 59% for the mood based summary videos created for the three evaluated full-length movies. Consequently, 41%-47% of the key facts were included in the mood based summary videos created for the three evaluated full-length movies (either partially or completely). This may be considered a good result, especially considering that, based on a rough estimation, a mood based summary video which includes all facts of a respective full-length movie, for example, “Mission Impossible”, may be approximately an hour long. Therefore, achieving a relatively high compliance of the mood based summary video which is half that time (~30 minutes) with the respective text summaries may be a major improvement.

The results of the alignment of the mood based summary videos with the text summaries are presented in FIG. 5 which presents graph charts of experiment results conducted to evaluate mood based summary videos created for three full-length movies, according to some embodiments of the present invention. A pie graph chart 502 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “Mission Impossible” full-length movie. A pie graph chart 504 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “Casino Royale” full-length movie. A pie graph chart 506 presents the distribution of present, partial and missing facts as evaluated by the evaluators for the mood based summary video created for the “The Girl with the Dragon Tattoo” full-length movie.

It is expected that during the life of a patent maturing from this application many relevant systems, methods and computer programs will be developed and the scope of the terms ML models and neural network are intended to include all such new technologies a priori.

As used herein the term “about” refers to ± 10 %.

The terms "comprises", "comprising", "includes", "including", “having” and their conjugates mean "including but not limited to".

The term “consisting of” means “including and limited to”.

As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.

Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.