Title:
METHOD AND SYSTEM OF DETERMINATION OF A VIDEO SCENE FOR VIDEO INSERTION
Document Type and Number:
WIPO Patent Application WO/2016/071401
Kind Code:
A1
Abstract:
A method and a system (100) of determination of a video scene suitable for video insertion are described. A receiving unit (101) acquires (21) a video and semantic metadata thereof. The semantic metadata comprises keywords relevant to the content of the video. An operation unit (103) develops (20) a suspense ontology comprising a knowledge base that semantically describes suspense levels of videos and scenes thereof. By searching in the suspense ontology for keywords relevant to the keywords of the acquired semantic metadata of the video, one scene of the acquired video is determined suitable for video insertion.

Inventors:
PIEPER MICHAEL (DE)
KUBSCH STEFAN (DE)
GLAESER FRANK (DE)
WEBER MICHAEL (DE)
LI HUI (DE)
Application Number:
PCT/EP2015/075706
Publication Date:
May 12, 2016
Filing Date:
November 04, 2015
Assignee:
THOMSON LICENSING (FR)
International Classes:
G06F17/30; G11B27/034; G11B27/11; G11B27/28; G06Q30/02
Domestic Patent References:
WO2010029472A1, 2010-03-18
Foreign References:
EP1164791A1, 2001-12-19
US20080066107A1, 2008-03-13
US20100217671A1, 2010-08-26
US20080007567A1, 2008-01-10
Other References:
None
Attorney, Agent or Firm:
SCHMIDT-UHLIG, Thomas (Patent Operations, Karl-Wiechert-Allee 74, Hannover, DE)
Claims:
CLAIMS

1. A method for determining a scene of a video suitable for video insertion of a product data in the determined scene, comprising:

- developing (20) a suspense ontology comprising a knowledge base that semantically describes suspense levels of videos and scenes thereof;

- acquiring (21) the video and semantic metadata of the video, the semantic metadata comprising keywords relevant to the content of the video; and

- determining (22) one scene of the acquired video suitable for video insertion by searching in the suspense ontology for keywords relevant to the keywords of the acquired semantic metadata of the video.

2. The method of claim 1, further comprising determining (23) one area shown in the determined scene of the video for video insertion of the product data in the determined area.

3. The method of one of the preceding claims, wherein the determined scene of the acquired video has a highest suspense level in the video.

4. The method of one of the preceding claims, further comprising acquiring a textual script of the acquired video, wherein the semantic metadata is extracted from the textual script.

5. The method of one of the preceding claims, wherein the semantic metadata of the video data includes semantic scene description and temporal video segmentation data of the video data.

6. The method of one of the preceding claims, wherein the semantic metadata of the video data includes a scene description ontology.

7. A system (100) configured to determine a scene of a video suitable for video insertion of a product data in the determined scene, comprising:

- a receiving unit (101) configured to acquire a video and semantic metadata of the video, the semantic metadata comprising keywords relevant to the content of the video;

- a storage unit (102) configured to store the acquired video and semantic metadata; and

- an operation unit (103) configured to

develop a suspense ontology comprising a knowledge base that semantically describes suspense levels of videos and scenes thereof; and

determine one scene of the acquired video suitable for video insertion.

8. The system of claim 7, wherein the operation unit (103) is configured to determine an area shown in the determined scene of the video, the determined area being for video insertion of the product data in the determined area.

9. The system of claim 7 or 8, wherein the receiving unit (101) acquires a textual script of the acquired video.

10. The system of one of claims 7 to 9, further comprising a user interface (104) configured to enable a user to interact with the system (100).

11. A computer readable storage medium having stored therein instructions for determining a video scene for video insertion of a product data in the determined scene, which, when executed by a computer, cause the computer to:

- develop a suspense ontology comprising a knowledge base that semantically describes suspense levels of videos and scenes thereof;

- acquire a video and semantic metadata of the video, the semantic metadata comprising keywords relevant to the content of the video; and

- determine one scene of the acquired video suitable for video insertion by searching in the suspense ontology for keywords relevant to the keywords of the acquired semantic metadata of the video.

12. A method for determining a scene of a video suitable for video insertion of a product data in the determined scene, comprising:

- acquiring a suspense ontology comprising a knowledge base that semantically describes suspense levels of videos and scenes thereof;

- acquiring the video and semantic metadata of the video, the semantic metadata comprising keywords relevant to the content of the video; and

- determining one scene of the acquired video suitable for video insertion by searching in the suspense ontology for keywords relevant to the keywords of the acquired semantic metadata of the video.

Description:
METHOD AND SYSTEM OF DETERMINATION OF A VIDEO SCENE FOR VIDEO INSERTION

FIELD

The present principles relate to a method and a system for determining a scene of a video for video insertion, particularly using a suspense ontology. A computer readable medium suitable for such a method and system is also introduced.

BACKGROUND

Virtual content insertion is an emerging application of video analysis and has been widely applied in video augmentation to improve the audience's viewing experience. One practical application of virtual content insertion is the integration of advertisements into videos, which provides huge business opportunities for advertisers.

One major challenge for advertisement integration in a video is to balance two conflicting tasks: the inserted content must be conspicuous enough to be noticed by a viewer, and preferably be perceived as part of the original content, yet it must not interfere with the viewer's viewing experience of the original content.

Advertisement integration is generally conducted by an experienced human operator who can recognize the content of the target video and decide when (e.g., in which frames or scenes of the video) and where (e.g., in which exact location or area of the chosen frames and scenes) to place an advertisement product. However, the manual marking of proper advertisement zones is time-consuming and rather expensive. Approaches and systems for automatic or semi-automatic advertisement insertion have therefore been studied, aiming at an effective insertion while at the same time minimizing the interference for a viewer.

SUMMARY

Therefore, it is an objective to propose an improved solution for determining a scene of a video for video insertion.

According to the invention, the method comprises: developing a suspense ontology for describing suspense levels of videos and the scenes thereof; acquiring a video and semantic metadata of the video; and determining one scene of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video. Specifically, said determining comprises searching the suspense ontology for keywords relevant to the acquired semantic metadata of the video.

In one embodiment, the method further comprises determining one area of the determined scene of the video for video insertion.

In one embodiment, the method further comprises acquiring a textual script of the acquired video, and the semantic metadata is extracted from the textual script.

Accordingly, a system configured to determine a video scene for video insertion is introduced, which comprises a receiving unit, a storage unit and an operation unit. The receiving unit is configured to acquire a video and semantic metadata of the video. The storage unit is configured to store the acquired video and semantic metadata. The operation unit is configured to develop a suspense ontology for describing suspense levels of videos and the scenes thereof, and to determine one scene of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video.

In one embodiment, the operation unit is further configured to determine one area of the determined scene of the video for video insertion.

In one embodiment, the system further comprises a user interface configured to enable a user to interact with the system.

Also, a computer readable storage medium has stored therein instructions for determining a video scene for video insertion, which, when executed by a computer, cause the computer to: develop a suspense ontology for describing suspense levels of videos and the scenes thereof; acquire a video and semantic metadata of the video; and determine one scene of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video.

The method and system of this invention provide an improved solution for the determination of a suitable video scene for video insertion. By utilizing the suspense ontology and the semantic metadata of a given video, the video can be properly and automatically analysed for the suspense levels of its scenes, and the best-suited scenes for video insertion can thus be matched and determined. The analysis of the suspense level can provide an advertiser with a better understanding and use of the video, resulting in a more efficient and precise video insertion. The determined scenes can be sold for advertisements of certain categories at higher prices, because the resulting advertising effect is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the present principles, the proposed solutions shall now be explained in more detail in the following description with reference to the figures. It is understood that the solutions are not limited to the disclosed exemplary embodiments and that specified features can also expediently be combined and/or modified without departing from the scope of the proposed solutions as defined in the appended claims and exhibited in the figures.

Fig. 1 is a schematic diagram illustrating an exemplary system configured to perform a method according to this invention.

Fig. 2 is a flow chart illustrating one preferred embodiment of a method according to this invention.

Fig. 3 shows an exemplary eight sequences model used in the suspense ontology in one embodiment of a method according to this invention.

Fig. 4 shows an exemplary script page in one embodiment of a method according to this invention.

Fig. 5 shows an exemplary scene description of a video in one embodiment of a method according to this invention.

Figs. 6-18 illustrate one embodiment of a suspense ontology developed in one embodiment of a method according to this invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Fig. 1 schematically illustrates an exemplary system 100 incorporating the method of the present invention for determining a scene of a video for video insertion. The system 100 comprises a receiving unit 101, a storage unit 102, and an operation unit 103.

The receiving unit 101 is configured to acquire a video and semantic metadata of the video, which can be stored in the storage unit 102 of the system 100 or in any other suitable memory unit supplementary to the system 100 (not shown). The semantic metadata of the video can be acquired from an external source outside the system 100 or extracted from the acquired video by the operation unit 103. The semantic metadata preferably includes a semantic scene description and temporal segmentation data of the video, as well as information about potential areas for video insertion in the video, e.g., in which frames or scenes of the video data they occur, the possible time interval for those frames or scenes, and the size of the areas on the frames or scenes for video insertion. These areas can be, for example, billboards, displays, TV sets and any other similar devices that are shown and detected in the video. Of course, any other high-level and/or low-level metadata, such as face detection, people recognition, action detection, picture construction, etc., can also be included in the semantic metadata, depending on different demands and situations.
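To make the structure of such semantic metadata more tangible, the following is a minimal sketch of one possible representation in Python; the field names and types are illustrative assumptions of this description, not a data format defined by the present principles.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InsertionArea:
    """A potential area for video insertion, e.g. a billboard, display or TV set shown in the video."""
    kind: str                      # e.g. "billboard", "display", "TV set"
    frame_range: Tuple[int, int]   # first and last frame in which the area is visible
    size: Tuple[int, int]          # width and height of the area in pixels

@dataclass
class SceneMetadata:
    """Semantic metadata of one scene of the acquired video."""
    scene_id: int
    description: str                              # semantic scene description
    keywords: List[str] = field(default_factory=list)
    segment: Tuple[float, float] = (0.0, 0.0)     # temporal segmentation (start, end) in seconds
    areas: List[InsertionArea] = field(default_factory=list)
```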

The operation unit 103 is configured to develop a suspense ontology for describing suspense levels of videos and scenes thereof. An ontology can generally be defined as a formal and explicit specification of a shared conceptualization, which represents a domain of concepts in a given application and the relationships among those concepts. An ontology describes knowledge in a particular domain of interest and is used to reason about the properties of the domain, as well as to define the domain. More details about the ontology used in the embodiments of this invention will be described below. Preferably, the suspense ontology is stored in the storage unit 102 of the system 100. In addition, the operation unit 103 is further configured to determine one scene of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video.

In one embodiment, the operation unit 103 is further configured to determine one area of the determined scene of the video for video insertion. This area could be automatically detected by edge detection, object detection and tracking algorithms, etc., and the detection results can be, for example, white walls, beverage cans, advertising signs, tables, etc., that are shown in the video data.
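As a rough illustration of such automatic detection, the sketch below uses OpenCV edge detection and contour approximation to propose large, roughly rectangular regions (such as white walls or signs) as candidate insertion areas; the thresholds and the four-corner heuristic are arbitrary assumptions, not parameters of the described system.

```python
import cv2

def find_flat_areas(frame, min_area=5000):
    """Return large, roughly rectangular contours as candidate insertion areas (x, y, w, h)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for contour in contours:
        # Approximate the contour; keep quadrilaterals above a minimum area.
        approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) >= min_area:
            candidates.append(cv2.boundingRect(approx))
    return candidates
```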

Optionally, the system 100 further comprises a user interface 104 configured to provide an interface for a user to, for example, enter user input for acquiring video data, search scenes of the acquired video and review the finished video after a video insertion performed by the operation unit 103. The user interface 104 should be understood as a general user interface that enables a user to interact with and control the system 100, and the display and/or the design of the user interface 104 is flexible for various demands and purposes. Of course, the system 100 can optionally comprise other additional or alternative devices for the implementation of the method of this invention.

Fig. 2 schematically illustrates an exemplary embodiment of a method according to this invention. The method comprises: developing 20 a suspense ontology for describing suspense levels of videos and the scenes thereof; acquiring 21 a video and semantic metadata of the video; and determining 22 one scene of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video.
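A minimal, self-contained sketch of how these three steps could be chained is given below; the keyword-to-weight table stands in for the suspense ontology and the simple overlap score stands in for the ontology search, so both are illustrative assumptions rather than the knowledge base described in the following sections.

```python
def develop_suspense_ontology():
    # Step 20 (toy version): a small knowledge base mapping suspense-related
    # keywords to weights; the real ontology is the OWL structure described below.
    return {"duel": 5, "chase": 5, "gun": 4, "pause": 3, "quietly": 2, "drums": 3}

def determine_insertion_scene(scene_keywords, ontology=None):
    # Step 21 is assumed to have been performed: scene_keywords maps each
    # scene id to the keywords of its acquired semantic metadata.
    ontology = ontology or develop_suspense_ontology()
    # Step 22: search the ontology for keywords relevant to each scene's
    # metadata and pick the scene with the highest accumulated weight.
    scores = {scene: sum(ontology.get(keyword.lower(), 0) for keyword in keywords)
              for scene, keywords in scene_keywords.items()}
    return max(scores, key=scores.get)

# Example: the second scene matches more suspense concepts and is selected.
print(determine_insertion_scene({1: ["love", "kitchen"], 2: ["gun", "Pause", "chase"]}))
```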

Referring to Fig. 1, the exemplary embodiment of the method of the present invention will be further explained in detail below together with the exemplary embodiment of the system 100. Preferably, the exemplary embodiment of the method is automatically performed by the system 100. An advertiser can utilize the method and the system 100 to insert preferred product data in a best-suited scene of a given video. The product data can be, for example, an advertisement or commercial product, or any other industry product that is suitable for video insertion. The system 100 can be, for example, a system or a server used by an advertiser or a video content provider.

The operation unit 103 of the system 100 develops 20 the suspense ontology. The receiving unit 101 acquires 21 a video and the semantic metadata thereof. The video is preferably a video sequence including several frames and scenes. Optionally, the order of the developing 20 and the acquiring 21 steps is flexible and does not influence the result of the method. In other words, the suspense ontology can be developed before or after the acquisition of the video.

The suspense ontology preferably comprises a knowledge base of tension levels of videos and films, which semantically describes typical suspense scenes, such as a duel in a western, a suspected affair in a love film, a car chase in an action movie, a dialogue sequence in a crime film, etc. For example, Fig. 3 shows an eight sequences model illustrating a dramatic structure used in Hitchcock's films. The model shows how the level of tension and suspense is built up throughout the narrative time of the film, which effects are used, and when the summit is reached; the moment of climax in the film normally has the highest level of suspense. For the suspense ontology, different concepts and their relations are needed. Suspense is what makes the viewer want to know how the story of a video will go on and how the actors will probably act in the next scenes.

Preferably, the suspense ontology is continuously developed and refined while being used, such that the contents of the ontology become more precisely defined. The refinement of the ontology can be performed, for example, by the operation unit 103.

In one embodiment, the suspense ontology can be developed by the operation unit 103 of the system 100 under instruction of an operator, for example an experienced operator who has seen many films and thus has knowledge about how to define and identify suspenseful scenes. Optionally, the suspense ontology can be developed by the operation unit 103 of the system 100 using natural language processing (NLP) tools and also the acquired semantic metadata of the video. In other words, the suspense ontology is refined by, and while using, the acquired semantic metadata of the video.

In one embodiment, the language used for the implementation of the suspense ontology is the Web Ontology Language (OWL) from the W3C consortium.

In the ontology, classes, properties and individuals are described. Every class has individuals which belong to it. In the case of the class "SuspenseScene", all suspenseful scenes are inserted with their content from all movies of a film or video database. A class below that level, a subclass of SuspenseScene, for instance "FilmMusic", consists of individuals which stand for film music compositions. These can be the beat of a drum, a violin playing or any other instrument which is used to generate an atmosphere of suspense. This subclass below the upper SuspenseScene class consists only of these individually named representatives. Another class would be the location of the scene. This can also cover a big range of things, from a specific room up to a wide field outside. In OWL, the named members of the class Location will not be part of the class FilmMusic; instead, both are parts of the SuspenseScene class. The concept of the class hierarchy, also called a taxonomy, with classes and subclasses is a necessary implication in OWL. All members of a subclass are also members of the upper class.

Properties in the ontology describe the relationships between individuals of two classes. For example, one can say: the bedroomScene hasFilmMusic beating drums. Splitting the sentence into classes, individuals and properties gives: bedroom is an individual of the class Location; beating drums is an individual of the class FilmMusic. These individuals are related by the property hasFilmMusic, which is normally displayed as an arrow between the two classes. For readability, only class and subclass are connected by an arrow in the figures explained below; it can simply be assumed that all necessary properties between the individuals or representatives of two classes are available. The above is a summary description of the components used inside the ontology.
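The example above can be written down in OWL directly; the sketch below does so with the Python rdflib library, under a hypothetical namespace URI. The class, property and individual names mirror the text, but the snippet is only an illustration of the class/property/individual structure, not the actual ontology.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

SUSP = Namespace("http://example.org/suspense#")  # hypothetical namespace
g = Graph()
g.bind("susp", SUSP)

# Classes and the taxonomy: Location and FilmMusic are subclasses of SuspenseScene.
for cls in (SUSP.SuspenseScene, SUSP.Location, SUSP.FilmMusic):
    g.add((cls, RDF.type, OWL.Class))
g.add((SUSP.Location, RDFS.subClassOf, SUSP.SuspenseScene))
g.add((SUSP.FilmMusic, RDFS.subClassOf, SUSP.SuspenseScene))

# An object property relating individuals of Location to individuals of FilmMusic.
g.add((SUSP.hasFilmMusic, RDF.type, OWL.ObjectProperty))
g.add((SUSP.hasFilmMusic, RDFS.domain, SUSP.Location))
g.add((SUSP.hasFilmMusic, RDFS.range, SUSP.FilmMusic))

# Individuals: "The bedroomScene hasFilmMusic beating drums."
g.add((SUSP.bedroomScene, RDF.type, SUSP.Location))
g.add((SUSP.beatingDrums, RDF.type, SUSP.FilmMusic))
g.add((SUSP.bedroomScene, SUSP.hasFilmMusic, SUSP.beatingDrums))

print(g.serialize(format="turtle"))
```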

In the development 20 of the suspense ontology, the ontology can be filled with representatives and can be described by an experienced user for different film genres. The class hierarchy inside the ontology is used for the generation of suspense levels, which can be used for advertisement integration with business opportunities for the advertiser. The ontology is used as a scheme, like a pattern, in which suspenseful scenes with their appropriate levels are ordered.

The suspense ontology described above is graphically and schematically illustrated in Fig. 6. Besides the main class "SuspenseScene", there are five subclasses: Genre, Location, FilmMusic, Object and DramaticTools. These classes were chosen as the representatives used to detect suspenseful scenes inside films or videos. Each class has subclasses of its own, in which the information level becomes more detailed from subclass to subclass, as shown by the symbol on each subclass box. Each arrow in Fig. 6 has the meaning of "hasSubclass". For example, SuspenseScene has the subclasses Genre, DramaticTools, Location, FilmMusic and Object; equivalently, Genre "isSubClassOf" SuspenseScene. The details of the five subclasses will be illustrated below with the corresponding figures. By virtually clicking on the + symbol in the boxes, the underlying subclasses will be expanded.
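One convenient way to hold the hierarchy of Fig. 6 in code is a parent-to-children mapping from which the level of any class can be computed; these levels are reused further below as vertical suspense levels. The sketch is an assumption about a suitable data structure and covers only branches named in the text, not the full ontology.

```python
# "hasSubclass" relation of Fig. 6, restricted to branches named in the text.
HIERARCHY = {
    "SuspenseScene": ["Genre", "Location", "FilmMusic", "Object", "DramaticTools"],
    "Location": ["Interior", "Exterior"],
    "Object": ["Person", "Car", "Animal"],
    "Person": ["Events", "Emotion"],
    "DramaticTools": ["Conflicts", "Context", "CameraSetting", "NarrationPace"],
}

def level(cls, root="SuspenseScene", depth=1):
    """Return the vertical level of a class below the root, or None if it is not found."""
    if cls == root:
        return depth
    for child in HIERARCHY.get(root, []):
        found = level(cls, child, depth + 1)
        if found is not None:
            return found
    return None

print(level("Genre"))    # 2: a direct subclass of SuspenseScene
print(level("Emotion"))  # 4: SuspenseScene -> Object -> Person -> Emotion
```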

Fig. 7 shows the "Genre" subclass with its subclasses. As an example, the "Western" subclass could contain the individuals or representatives Italian, American and SpecialActor Western, which stand for genres of these types.

Fig. 8 shows the "Location" subclass with its subclasses "Interior" and "Exterior" and their further subclasses. As an example of "Nature", a representative could be mountain, sea, field and the like.

Fig. 9 shows the "Object" subclass. As further subclasses, "Person", "Car" and "Animal" are given. For content description it is more useful to have a more global view by using the class Object. The Person subclass is detailed further, because it is an important point how the action takes place and what kind of events are given. Under Events there are three classes given: ActionProcess, Activity and Process. The generation of this information can follow the questions: What did the person do? (Activity) What did a person do to another person? (ActionProcess) What happened to a person? (Process) Fig. 10 shows the remaining subclass of Person, where "Emotion" is expanded into 17 subclasses.

Fig. 11 shows the "FilmMusic" subclass, which is divided into three parts as an example. Suspenseful parts can often be detected by background music which has a special character, like the slow beat of a drum.

Fig. 12 shows the "DramaticTools" subclass, which can be regarded as the most important one for the SuspenseScene representation. There are several subclasses, of which InterestingLocation, InterestingTheme and ConstellationOfFigures are only listed as classes without subclasses. A representative for ConstellationOfFigures could be two gunmen, as an example of a duel challenge on a windy main street (InterestingLocation) in a western.

Fig. 13 shows the "Conflicts" subclass under "DramaticTools". The viewer is often interested in scenes in which opposing concepts appear together and lead to conflicts when persons are involved. This could be, for example, the rich businessman who meets his poor brother after years and they decide to live together.

Fig. 14 shows another subclass under "DramaticTools", "Context". This class stands for an overall abstract estimation of a scene. One example could be a meeting of good friends, in which their relation would be expressed by the Friendship representative.

Fig. 15 shows another subclass under "DramaticTools", "CameraSetting", which is an often used aspect of the picture composition by a video director of photography to create a certain atmosphere and thereby support the actors' play and the plot of the story. The camera setting is divided into three different subclasses: Range, Angle and Movement. In Fig. 16 the Range class and its subclasses under "CameraSetting" are shown. That is the setting in which scenes are recorded by the camera from a known distance. A close shot is a very near picture, taken for example only of a person's face to work out the emotions. Fig. 17 shows the other subclasses "CameraAngle" and "CameraMovement" under "CameraSetting" with their respective subclasses. One example for these settings would be the tracking shot of a camera when a special object is zoomed in at a normal angle, placing all viewers' interest on this point at that time.

Fig. 18 shows a further subclass of the subclass "DramaticTools". The NarrationPace subclass and its own subclasses are used to detect the storytelling velocity. This can be observed, for example, in films when an important decision or a message is shortly before being spoken: the camera will then stay a longer time in the same sequence on the actor. The pace of the narration is slowed to get all viewers' attention on this point.

The above Figs. 6-18 illustrate one embodiment of the developed 20 suspense ontology, in which various subclasses are used and which includes representatives filling these subclasses. The class hierarchy and the detailed subclasses are used in the invention to generate a value for a suspense level, which orders the suspense rate into different values.

The acquired semantic metadata of the video preferably includes semantic scene description, temporal segmentation data of the video, keywords relevant to the content of the video data, etc. In one embodiment, the semantic metadata includes a semantic scene description ontology. Optionally, the semantic metadata can also include other information such as the camera settings, the brightness used in the video, the emotions that have to be portrayed by actors, etc.

In one embodiment, the receiving unit 101 further acquires the video and a textual script of the video which defines and identifies suspenseful scenes of the video, and the semantic metadata can thus be extracted and acquired from the textual script. For example, Fig. 4 shows an exemplary script page from the movie "The Godfather" which describes a shooting scene. The semantic metadata acquired and extracted from this script page can include, for example, the header line (i.e. INT. DON'S OFFICE - DAY), the time and location of the scene, the textual story plot of the scene, the dialogue section as shown in the script, and characteristic messages of the script status. In addition, the term "quietly" in the brackets can be used for emotion detection of the scene. The term "Pause" can also be detected, which indicates a certain dramaturgical concept and can be used for suspense level identification. Preferably, a natural language processing (NLP) tool is used to detect recognizable entity classes, such as events, genre, activity, emotion, etc., from the whole text of the script, mainly the story plot. These entity classes can subsequently be matched with the keywords included in the suspense ontology, and thus be utilized for the decision of video scenes for video insertion.

Fig. 5 shows two exemplary scene descriptions included in the acquired semantic metadata of a video in one exemplary embodiment.
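The entity extraction described above can be sketched as follows; spaCy is used here purely as a stand-in for the Stanford NLP tooling mentioned in the text, the label-to-subclass mapping is an illustrative assumption, and the sample sentence is a hypothetical excerpt in the style of the script page of Fig. 4.

```python
import spacy  # requires the en_core_web_sm model to be installed

# Map generic named-entity labels onto main subclasses of the suspense ontology;
# this mapping is assumed for illustration, not part of the description.
LABEL_TO_CLASS = {
    "PERSON": "Object/Person",
    "GPE": "Location",
    "LOC": "Location",
    "FAC": "Location",
    "EVENT": "Object/Person/Events",
}

def extract_ontology_individuals(script_text):
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(script_text)
    return [(LABEL_TO_CLASS[ent.label_], ent.text)
            for ent in doc.ents if ent.label_ in LABEL_TO_CLASS]

print(extract_ontology_individuals("INT. DON'S OFFICE - DAY. Michael waits quietly in New York."))
```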

With the acquisition of the video and the semantic metadata of the video, the operation unit 103 accordingly determines 22 one or more scenes of the acquired video for video insertion using the suspense ontology and the acquired semantic metadata of the video. Specifically, the operation unit 103 searches the suspense ontology for keywords relevant to the acquired semantic metadata of the video, and thus determines where the advertisement can or cannot be inserted in the scenes of the video. In other words, the semantic metadata is matched with the knowledge base of the ontology to decide on appropriate scenes for video insertion. For example, scene 2 in Fig. 5 shows a higher suspense level and can thus be decided as a more suitable scene for video insertion.

In one embodiment, the determination 22 of one or more scenes of the acquired video for video insertion is performed in a receiver or a consumer device rather than in a provider server. In this case, the suspense ontology can be developed in advance in the provider server or alternatively developed in the receiver itself. In case the ontology is developed in advance, the receiver or consumer device receives or acquires the available suspense ontology, a video and semantic metadata of the video, and then determines 22 one scene of the acquired video suitable for video insertion. As described above, the suspense ontology comprises a knowledge base that semantically describes suspense levels of videos and scenes thereof. The semantic metadata comprises keywords relevant to the content of the video. The determination of one scene of the acquired video suitable for video insertion is performed by searching in the suspense ontology for keywords relevant to the keywords of the acquired semantic metadata of the video.

In other words, the steps of the method proposed in the current invention can be performed in a single apparatus, device or system, for example a producer server or a receiver. Alternatively, these steps can be performed respectively in different apparatuses or devices and lead to the same desired result of the determination of one video scene for video insertion.

In one embodiment, the two scenes shown in Fig. 5 can be analyzed as described below. The arrows shown in Fig. 5 are the paths to the subclasses in their different areas. For example, the statement Scene hasGenre MotionPicture starts from the SuspenseScene class, where all scenes are represented by names. It then goes to the subclass Genre, but not into any further subclass, because Motion Picture is not explicitly mentioned and therefore remains a representative in the Genre class. The next statement, Scene hasContext Adultery, has a more detailed declaration, because it goes one level deeper. To derive an overall suspense level as a scale number from the statements given for the scenes, the following steps have to be considered:

1. Detect every vertical level by the representatives and give values for it. The five main subclasses of the SuspenseScene class shown in Fig. 6 would have vertical suspense level (VSL) 2; the subclasses of Genre can reach VSL 3, because no further detailed hierarchy is given. Subclasses of Location can reach VSL 4, for Objects VSL 6 is given, and for FilmMusic VSL 3 is reached. For DramaticTools, VSL 5 can be reached.

3. Check each statement within the class Hierarchy in the Ontology by its End position and build a suspense Level SL with the combination of HSL and VSL saying that SL = HSL * VSL, with its maximum of 25.

4. Accumulate all statements in their levels to get the final suspense level FSL = SL1 + SL2 + SL3 dependimg on the number of statements given in the list above. 5. Another approach could be the distribution of the statements over the five main subclasses. In the examples in Fig. 5, both scenes have one Genre, Location and

DramaticTool representativebut three objects ones and no FilmMusic one.

In the end, the values found for scene 1: Love Affair Anger in Fig. 5 are FSL = 2 + 20 + 8 + 20 + 20 + 16 = 86, and for scene 2: Criminal Bad Message it is FSL = 3 + 20 + 6 + 20 + 16 + 16 = 81. The difference is the level of detail in the highly rated subclasses Object and DramaticTools.
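The calculation just described can be condensed into a short sketch: each statement is scored as SL = HSL * VSL and the per-statement scores are accumulated into the final suspense level. Since the text does not spell out which statement of Fig. 5 produced which value, the example below only echoes the quoted per-statement values when reproducing the two totals.

```python
# Horizontal suspense levels (HSL) of the five main subclasses, as listed in step 2.
HSL = {"Genre": 1, "Location": 2, "FilmMusic": 3, "Object": 4, "DramaticTools": 5}

def statement_level(main_class, vertical_level):
    """Suspense level of a single statement: SL = HSL * VSL (maximum 25)."""
    return HSL[main_class] * vertical_level

def final_suspense_level(statement_levels):
    """FSL: accumulate the SL values of all statements of a scene."""
    return sum(statement_levels)

# A statement ending at level 4 of the DramaticTools branch scores 5 * 4 = 20.
print(statement_level("DramaticTools", 4))

# Reproducing the quoted totals from the per-statement values of Fig. 5.
print(final_suspense_level([2, 20, 8, 20, 20, 16]))  # scene 1: 86
print(final_suspense_level([3, 20, 6, 20, 16, 16]))  # scene 2: 81
```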

In addition, it is more important and valuable to define a person's emotion directly than merely to state that there is one. For example, when the shooting script in Fig. 4 is analyzed with NLP tools, an additional hint like "Pause" can be integrated into the suspense ontology in the NarrationPace class by the representative "Paused". In this case, the suspense level of scene 2 in Fig. 5 would rise by 20 points and eventually reach 101 points.

The interaction of the suspense ontology with the semantic description of a video given by a user, or derived from a shooting script by NLP tools, is further explained as follows. As mentioned before, the ontology can optionally be filled by an expert with suspenseful scenes using statements like the above. It is also possible, through extraction by NLP tools from the shooting script above, that additional remarks like "quietly" and "Pause" are integrated into the DramaticTools class of the ontology. The results of the Stanford CoreNLP toolkit for part-of-speech information and grammatical dependencies can be used for the statement assignment into the ontology. The named entity recognition can extract individuals for the main subclasses Location and Object, with the subclasses Person and Event, in the ontology. That means the ontology can then be filled with the representatives of the classes derived by the NLP extraction. The relationships of the individuals used in the ontology, expressed by the hasProperty construct, can also be filled.

The semantic video description by an experienced user, or the extraction by NLP tools from the shooting script, should be divided into statements in the same way as was done for the two scenes shown in Fig. 5. The representatives from the video description are then filled into the ontology depending on their classes. When a number of statements are given, the suspense level can be derived, and a classification of the scene can thus be given by the ontology.

In one embodiment, the operation unit 103 can detect and select only the most suspenseful scenes of the video, which normally correspond to the climax of the video or film, for the subsequent video insertion. In a mood of high suspense, the viewers' attention is fixed on the interesting and important content of the climax scenes, and thus the video insertion of advertisements or virtual objects in these scenes can be made subliminal, without distraction for the viewers. This provides additional criteria and advantages for an advertiser to improve the video insertion and increase the advertising result.

In one embodiment, the method further comprises determining 23 one area of the determined scene of the video. This determining step can also be performed, for example, by the operation unit 103 of the system 100. The determined area can be, for example, a flat area, a display or a billboard that is shown in the determined scene of the video data and thus is suitable for video insertion. In addition, the size of the area can be relevant to and depend on the suspense level of the determined scene. For example, in a scene of a low suspense level, the area size can be bigger and wider to attract more attention from the viewer without causing too much interference.

The foregoing illustrates the principles of the disclosure, and it will thus be appreciated that those skilled in the art will be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles and are within their scope. It is therefore to be understood that numerous modifications can be made to the illustrative embodiments and that other arrangements can be devised without departing from the scope of the present principles as defined by the appended claims.