Title:
MACHINE LEARNING-BASED SYSTEM AND METHOD FOR PROCESSING MOVING IMAGE
Document Type and Number:
WIPO Patent Application WO/2021/107764
Kind Code:
A1
Abstract:
The present invention relates to a machine learning-based system and method for processing a moving image. The system (10) comprises an input unit (11) for receiving two or more image frames of the moving image, wherein the moving image includes one or more subjects being monitored. A processing unit (12) processes the received image frames to predict an incident, wherein the incident is a fight or quarrel between two or more people in the moving image. An output unit (13) outputs a prediction result, wherein an alert message is outputted as the prediction result if the incident is predicted.

Inventors:
YUEN SHANG LI (MY)
LIANG KIM MENG (MY)
CHIEN SU FONG (MY)
HON HOCK WOON (MY)
Application Number:
PCT/MY2020/050123
Publication Date:
June 03, 2021
Filing Date:
October 28, 2020
Assignee:
MIMOS BERHAD (MY)
International Classes:
G08B13/196; G08B25/00; H04N7/18
Domestic Patent References:
WO2015128939A1, 2015-09-03
Foreign References:
JP2006287884A, 2006-10-19
JP2007124526A, 2007-05-17
JP2013131153A, 2013-07-04
JP2019152943A, 2019-09-12
Attorney, Agent or Firm:
KANDIAH, Geetha (MY)
CLAIMS:

1. A system (10) for processing a moving image, comprising: i. at least one input unit (11) for receiving at least two adjacent image frames of said moving image, wherein said moving image includes at least one subject being monitored; ii. at least one processing unit (12) for processing said image frames to predict an incident; and iii. at least one output unit (13) for outputting said prediction result, characterized in that said processing unit (12) includes at least one image classification model (12a) for:

- analysing said image frames to compute a motion flow of said subject between said image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image; and

- predicting said incident based on said computed results.

2. The system (10) as claimed in claim 1, wherein said image classification model (12a) computes said magnitude change map and orientation change map by:

- computing said motion flow from said image frames using a standard optical flow algorithm;

- computing said magnitude change map and orientation change map based on said motion flow; and

- storing said computed magnitude change map and orientation change map into a magnitude change database and an orientation change database, respectively.

3. The system (10) as claimed in claim 1, wherein said image classification model (12a) computes said magnitude change history map and said orientation change history map by:

- retrieving a recent magnitude change map and orientation change map from said magnitude change database and orientation change database, respectively; and

- creating said magnitude change history map and said orientation change history map based on said retrieved magnitude change map and said retrieved orientation change map, respectively.

4. The system (10) as claimed in claim 2, wherein said image classification model (12a) computes a binary map by filtering said magnitude map using a magnitude threshold and computes said conditional motion map by filtering a motion region using width and height threshold for each connected region, wherein said filtered result is stored as said conditional motion map in a conditional motion map database.

5. The system (10) as claimed in claim 4, wherein said image classification model (12a) retrieves a recent conditional motion map from said conditional motion map database and creates a conditional motion history map based on said retrieved conditional motion map.

6. The system (10) as claimed in claim 5, wherein said image classification model (12a) creates said incident history image by:

- normalizing said magnitude change history map, orientation change history map and conditional motion history map;

- filtering said normalization results using a box kernel filter; and

- merging said filtration results and said normalization results.

7. The system (10) as claimed in claim 1, wherein said image classification model (12a) further includes a classification module for analysing said computed motion flow, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image to predict said incident.

8. The system (10) as claimed in claim 7, wherein said output unit (13) displays an alert message as said prediction result if said incident is predicted.

9. The system (10) as claimed in claim 1, wherein said subject is a human subject.

10. A method (20) for processing a moving image, comprising: i. receiving at least two adjacent image frames of said moving image (21), wherein said moving image includes at least one subject being monitored; ii. processing said image frames to predict an incident (22); and iii. outputting said prediction result (23), characterized in that said step of processing said image frames (22) includes:

- analysing said received image frames to compute a motion flow of said subject between said received image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image; and predicting said incident based on said computed results.

Description:
MACHINE LEARNING-BASED SYSTEM AND METHOD FOR PROCESSING MOVING IMAGE

FIELD OF THE DISCLOSURE

The disclosures made herein relate generally to the field of moving image processing and, more particularly, to a machine learning-based system and method for processing a moving image for predicting aggressive behavior.

BACKGROUND

Recent developments in imaging technologies have resulted in the ability to quickly and easily process still and moving images, in support of a wide variety of applications. One of the moving image processing applications is the surveillance of public places, e.g. shopping centers, transport hubs, etc. When imaging systems are used for surveillance, it is highly desirable for the systems to quickly identify the objects and/or people captured in the surveillance images, especially in situations where image-processing delays can hinder a timely response, such as during a riot or stampede.

Conventionally, video systems have been employed to monitor a crowded area, wherein one or more users are employed to watch a display screen of such video systems and detect aggressive or unwanted behavior of one or more human subjects captured in the surveillance video. Further developments in the field led to automated surveillance systems, wherein video processing equipment is employed to process the surveillance video and detect whether any aggressive behavior is captured in the video.

Japanese Patent Publication No.: JP 2006276969 A discloses a system for detecting violence in a closed space e.g. elevator car. The system comprises a camera for capturing the closed space and a motion calculating unit for calculating magnitude and direction of motion of each pixel between adjacent image frames. Based on the calculated magnitude and direction, the system detects occurrence of any violence in the closed space.

Similarly, United States Patent No.: US 10,121,064 B2 discloses a system and method for behavior detection using three-dimensional (3D) tracking and machine learning. Even though this system is effective in predicting violent behavior of monitored people, it uses a 3D imaging system to capture the video to be processed for detecting the behavior. Capturing and processing 3D images requires expensive and sophisticated components. Further, the system is not effective in processing and classifying high-speed activities of a human subject, which are common during both violence and harmless enjoyment.

Hence, there is a need for a machine learning-based system and method for processing a moving image to predict incidents irrespective of the number of subjects and their movement speed in a surveyed area, without the need for expensive and sophisticated components.

SUMMARY

The present invention relates to a machine learning-based system for processing a moving image. The system comprises an input unit for receiving two or more adjacent image frames of the moving image, wherein the moving image includes one or more subjects being monitored. A processing unit processes the received image frames to predict an incident. An output unit outputs a prediction result, wherein an alert message is outputted as the prediction result if the incident is predicted.

In a preferred embodiment, the processing unit includes an image classification model for analysing the received image frames to compute a motion flow of a subject between the received image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image; and for predicting the incident based on the computed results.

The present invention also includes a machine learning-based method for processing a moving image. The method comprises the steps of receiving two or more adjacent image frames of the moving image, processing the received image frames to predict an incident and outputting the prediction result. The received image frames are processed by analysing the received image frames to compute a motion flow between the received image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image.

The incident is predicted based on the computed motion flow, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image. By this way, the present invention is capable of predicting incidents irrespective of a number of subjects and movement speed thereof in a surveyed area.

Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

FIGURE 1 illustrates a block representation of the machine learning-based system for processing a moving image, in accordance with an exemplary embodiment of the present invention.

FIGURE 2 illustrates a flow diagram of the machine learning-based method for processing a moving image, in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

In accordance with the present invention, there is provided a machine learning-based system and method for processing a moving image, which will now be described with reference to the embodiment shown in the accompanying drawings. The embodiment does not limit the scope and ambit of the disclosure. The description relates purely to the exemplary embodiment and its suggested applications.

The embodiment herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiment in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiment herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the description should not be construed as limiting the scope of the embodiment herein.

The description hereinafter of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiment for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be, and are intended to be, comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit,” “module,” or “system.”

Various terms as used herein are defined below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.

Definitions:

Moving image: A continuous sequence of image frames captured with or without audio using a video camera, closed-circuit television and the like. It includes, but is not limited to, movies and surveillance video.

Incident: An event or occurrence of an undesired action of a subject. For example, an act of aggression, violence, hostility, staggering and the like.

Motion flow: A movement pattern between images, denoting a velocity of the brightness pattern of an image, wherein movement vector information (x axis and y axis) for each pixel position between adjacent image frames can be obtained by computing the motion flow.

Change map: A 2-dimensional (2D) matrix including floating point values.

Change history map: A weighted average 2D matrix computed from multiple change maps. The weighted average is based on freshness, wherein the highest weight is given to the most recent map.
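
For illustration only, assuming N stored change maps C_1, …, C_N with C_N the most recent, one weighting consistent with the reverse time index described in the detailed description would be:

```latex
H \;=\; \sum_{k=1}^{N} \frac{C_k}{N - k + 1}
```

so that the most recent map (k = N) is divided by 1 and the oldest (k = 1) by N. The exact index convention is an assumption; the patent does not give a closed-form expression.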

Magnitude change map: A change of motion flow magnitude, i.e. an acceleration of the brightness pattern between adjacent image frames. The magnitude change map provides information about accelerated motion, which is one of the critical parameters in predicting the incident.

Magnitude change history map: A weighted average of multiple magnitude change maps. This weighted average value indicates a probability of aggressive behavior of the subject at the corresponding pixel position.

Orientation change map: A change of motion flow orientation between adjacent image frames. Normally, an aggressive behavior (e.g. a punch or kick) of a subject occurs in a single direction; therefore, if the orientation change map shows a single direction, it is an indication of potential aggressive behavior which may result in an incident.

Orientation change history map: A weighted average of multiple orientation change maps. When a low pass filter is applied, the orientation change history map emphasizes pixels with a higher probability of movement in the same direction.

Conditional motion map: Information about motion regions satisfying a condition (magnitude, height and width) defined by a user. The conditional motion map screens out small and noisy motions caused by undesired camera or environment movements.

Conditional motion history map: A weighted average over a collection of multiple conditional motion maps. It indicates one or more pixel positions to be processed.

The present invention provides a machine learning-based system and method for processing a moving image. The system comprises a processing unit including an image classification model for computing a motion flow of a subject between two adjacent image frames of the moving image, a magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image. In this way, the present invention is capable of differentiating one or more subjects from noise present in the image frames and identifying desired attributes of the subjects, such as acceleration, orientation and trends thereof; therefore, incidents can be predicted irrespective of the number of subjects and their movement speed in a surveyed area.

Referring to the accompanying drawings, FIGURE 1 illustrates a block representation of the machine learning-based system for processing a moving image, in accordance with an exemplary embodiment of the present invention. The system (10) comprises an input unit (11) for receiving two or more image frames of the moving image, wherein the moving image includes one or more subjects being monitored. In a preferred embodiment, the input unit (11) is a closed circuit television (CCTV) system for monitoring human subjects in a prison.

Alternatively, the input unit (11) may be any conventionally available video processing device capable of segmenting a digital or analog video image into multiple digital image frames. Similarly, the moving image is captured using a camera for monitoring people in a crowded place such as shopping mall, stadium, transportation terminal and the like.

A processing unit (12) processes the image frames to predict an incident, wherein the incident is an aggressive behavior such as a fight, stampede, riot and the like among the subjects in the moving image. Preferably, the processing unit (12) is any central processing unit (CPU) or graphics processing unit (GPU) commercially available for processing a moving image. The processing unit (12) includes an image classification model (12a) for analysing the received image frames to compute a motion flow of the subjects between the received image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image. Based on the computed results, the image classification model (12a) predicts the incident.

In a preferred embodiment, the image classification model (12a) is a machine learning-based model. The image classification model (12a) is trained using a set of pre-classified images based on a standard machine learning approach such as a Support Vector Machine (SVM), or a deep learning approach such as a convolutional neural network.
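
As a hedged illustration of the SVM approach named above, the following Python sketch trains a classifier on flattened feature maps; the stand-in random data, the feature layout and the parameters are assumptions for demonstration, not the patented training pipeline:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for the set of pre-classified images: each sample is assumed to be
# the computed maps flattened into one feature vector (layout is an assumption).
rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))       # 200 samples of flattened 64x64 maps
y = rng.integers(0, 2, size=200)     # 1 = incident, 0 = no incident

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
model = SVC(kernel="rbf")            # standard Support Vector Machine approach
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```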

The image classification model (12a) computes the magnitude change map and the orientation change map by computing the motion flow from the image frames using a standard optical flow algorithm. Further, the image classification model (12a) computes the magnitude change map based on the motion flow, wherein a magnitude map is computed using a motion flow vector at each pixel position of the image frames and an absolute difference between the magnitude maps of the two image frames is calculated. The computed difference is stored as the magnitude change map in a magnitude change map database.

Similarly, the image classification model (12a) computes the orientation change map based on the motion flow, wherein an orientation map is computed using a motion flow vector at each pixel position of the image frames and an absolute difference between the orientation maps of the two image frames is calculated. The computed difference is stored as the orientation change map in an orientation change map database.
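
The following is a minimal Python sketch of the two computations just described, using OpenCV's Farneback method as a stand-in for the unspecified "standard optical flow algorithm"; three consecutive grayscale frames yield two flow fields whose per-pixel magnitude and orientation differences form the change maps:

```python
import cv2
import numpy as np

def change_maps(prev_gray, curr_gray, next_gray):
    # Two motion-flow fields from three consecutive frames (Farneback is an
    # assumption; the text only names "a standard optical flow algorithm").
    flow1 = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                         0.5, 3, 15, 3, 5, 1.2, 0)
    flow2 = cv2.calcOpticalFlowFarneback(curr_gray, next_gray, None,
                                         0.5, 3, 15, 3, 5, 1.2, 0)
    # Magnitude and orientation maps from the flow vector at each pixel.
    mag1, ang1 = cv2.cartToPolar(flow1[..., 0], flow1[..., 1])
    mag2, ang2 = cv2.cartToPolar(flow2[..., 0], flow2[..., 1])
    # Absolute differences give the change maps described above.
    magnitude_change = np.abs(mag2 - mag1)    # acceleration of brightness pattern
    orientation_change = np.abs(ang2 - ang1)  # change of motion direction
    return magnitude_change, orientation_change
```

In practice the returned maps would be appended to the magnitude change map and orientation change map databases for the history computations that follow.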

The image classification model (12a) computes the magnitude change history map and the orientation change history map by retrieving the recent magnitude change maps and the recent orientation change maps from the corresponding databases. Further, each recent magnitude change map is divided by a magnitude reverse time index, and the results are accumulated to compute the magnitude change history map. The magnitude reverse time index is computed by subtracting the current map index from the total number of retrieved magnitude change maps.

Similarly, each recent orientation change map is divided by an orientation reverse time index, and the results are accumulated to create a history map. The orientation reverse time index is computed by subtracting the current map index from the total number of retrieved orientation change maps. An average of the history map is computed and used as a threshold for low pass filtering of the orientation change map to compute the orientation change history map: if a value of the orientation change map is higher than the computed average, zero is outputted; otherwise, the actual value of the orientation change map is outputted. An orientation change below the threshold indicates a higher probability of the incident.
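
A minimal sketch of the accumulation just described, assuming the maps are stored oldest-first and that the reverse time index is N − k + 1 (the exact indexing is not spelled out in the text):

```python
import numpy as np

def change_history_map(recent_maps):
    # recent_maps: list of 2D change maps, assumed ordered oldest first.
    n = len(recent_maps)
    history = np.zeros_like(recent_maps[0], dtype=np.float32)
    for k, change_map in enumerate(recent_maps, start=1):
        reverse_time_index = n - k + 1  # n for the oldest map, 1 for the newest
        history += change_map.astype(np.float32) / reverse_time_index
    return history

def low_pass_orientation(orientation_change_map, history):
    # Values above the history average are zeroed; values below it are kept,
    # since a small orientation change signals movement in a single direction.
    threshold = history.mean()
    return np.where(orientation_change_map > threshold, 0.0, orientation_change_map)
```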

The image classification model (12a) computes the conditional motion map by receiving the motion flow and computing a magnitude map using a motion flow vector at each pixel position. Further, the image classification model (12a) computes a binary map by filtering the magnitude map using a magnitude threshold and then filters the motion regions in the binary map using width and height thresholds for each connected region, wherein the final filtered output is stored as the conditional motion map in a conditional motion map database.
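
A hedged Python sketch of this filtering step; the threshold values are illustrative user-defined parameters, and OpenCV's connected-components analysis stands in for the unspecified region labelling:

```python
import cv2
import numpy as np

def conditional_motion_map(magnitude_map, mag_thresh=1.0, min_w=8, min_h=8):
    # Binary map: pixels whose motion magnitude exceeds the magnitude threshold.
    binary = (magnitude_map > mag_thresh).astype(np.uint8)
    # Label connected regions and keep only those wide and tall enough.
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary,
                                                                  connectivity=8)
    keep = np.zeros_like(binary)
    for label in range(1, n_labels):  # label 0 is the background
        if (stats[label, cv2.CC_STAT_WIDTH] >= min_w and
                stats[label, cv2.CC_STAT_HEIGHT] >= min_h):
            keep[labels == label] = 1  # '1' marks a region of interest
    return keep
```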

The image classification model (12a) retrieves recent conditional motion maps from the conditional motion map database and creates the conditional motion history map based on the retrieved conditional motion maps. The image classification model (12a) divides each retrieved conditional motion map by a conditional motion reverse time index, which is computed by subtracting the current map index from the total number of retrieved conditional motion maps. The divided results are accumulated and stored as the conditional motion history map.

The image classification model (12a) creates the incident history image by normalizing the magnitude change history map, orientation change history map and conditional motion history map, filtering the normalization results and merging the filtered results with the normalization results. The normalization results are filtered by a standard morphological opening process using a box kernel filter, which removes small blobs (noise) and smooths the normalization results to reduce spark values.
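
One way this step might look in Python, assuming min-max normalization and equal-weight merging (neither is specified in the text):

```python
import cv2
import numpy as np

def incident_history_image(mag_hist, ori_hist, cond_hist, kernel_size=5):
    kernel = np.ones((kernel_size, kernel_size), np.uint8)  # box kernel
    parts = []
    for hist in (mag_hist, ori_hist, cond_hist):
        # Normalize each history map to a common 0-255 range.
        norm = cv2.normalize(hist.astype(np.float32), None, 0, 255,
                             cv2.NORM_MINMAX).astype(np.uint8)
        # Morphological opening removes small blobs (noise) and smooths spikes.
        opened = cv2.morphologyEx(norm, cv2.MORPH_OPEN, kernel)
        parts.extend([norm, opened])
    # Merge the filtration results with the normalization results.
    return np.mean(parts, axis=0).astype(np.uint8)
```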

The image classification model (12a) includes a classification module (not shown) for analysing the computed motion flow, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image to predict the incident. In this way, the present invention is capable of differentiating each subject from any noise present in the image frames and identifying desired attributes of the subjects, such as acceleration, orientation and trends thereof, based on the analysis; therefore, incidents can be predicted irrespective of the number of subjects and their movement speed in a surveyed area.

An output unit (13) outputs the prediction result, wherein an alert message is outputted as the prediction result if the incident is predicted. The output unit (13) receives the prediction result from the classification module and checks whether the incident is predicted. If yes, an incident counter in the output unit (13) is incremented and a non-incident counter is reset. If not, the non-incident counter is incremented. The incident counter is reset if the non-incident counter reaches a first threshold. When the incident counter reaches a second threshold, the alert message is displayed as the prediction result using a display device, e.g. an LCD screen. The first and second thresholds are defined manually or automatically based on a frame rate of the moving image.
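
The counter logic lends itself to a small state machine; the sketch below assumes both thresholds are derived from the frame rate, and the concrete values are illustrative:

```python
class AlertLogic:
    def __init__(self, fps: int):
        self.first_threshold = fps        # non-incident frames before reset (assumed)
        self.second_threshold = 2 * fps   # incident frames before an alert (assumed)
        self.incident_count = 0
        self.non_incident_count = 0

    def update(self, incident_predicted: bool) -> bool:
        """Feed one prediction per frame; returns True when the alert should fire."""
        if incident_predicted:
            self.incident_count += 1
            self.non_incident_count = 0
        else:
            self.non_incident_count += 1
            if self.non_incident_count >= self.first_threshold:
                self.incident_count = 0
        return self.incident_count >= self.second_threshold
```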

FIGURE 2 shows a flow diagram of the machine learning-based method for processing a moving image, in accordance with the exemplary embodiment of the present invention. The method (20) comprises the steps of receiving two or more image frames of the moving image (21), processing the received image frames to predict an incident (22) and outputting the prediction result (23). The moving image includes one or more subjects being monitored.

In a preferred embodiment, the moving image is captured using a closed circuit television (CCTV) system for monitoring human subjects in a prison. Alternatively, any conventionally available video processing device capable of segmenting a digital or analog video image into multiple digital image frames is used for capturing the moving image. Similarly, the moving image can also be captured using a camera for monitoring people in a crowded place such as a shopping mall, stadium, transportation terminal and the like. Further, the incident includes, but is not limited to, a fight, stampede, riot, etc., among the subjects in the moving image.

The step of processing the received image frames includes analysing the received image frames using an image classification model to compute a motion flow of the subject between the received image frames, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image. In a preferred embodiment, the image classification model is a machine learning-based model. Further, the incident is predicted based on the computed results.

The magnitude change map and the orientation change map are computed by computing the motion flow from the image frames using a standard optical flow algorithm. A magnitude map is computed using a motion flow vector at each pixel position of the image frames, and an absolute difference between the magnitude maps of the two image frames is calculated. The computed difference is stored as the magnitude change map in a magnitude change map database. Similarly, an orientation map is computed using a motion flow vector at each pixel position of the image frames, and an absolute difference between the orientation maps of the two image frames is calculated. The computed difference is stored as the orientation change map in an orientation change map database.

The magnitude change history map and the orientation change history map are computed by retrieving the recent magnitude change maps and the recent orientation change maps from the corresponding databases. Further, each recent magnitude change map is divided by a magnitude reverse time index, and the results are accumulated to compute the magnitude change history map. The magnitude reverse time index is computed by subtracting the current map index from the total number of retrieved magnitude change maps.

Similarly, each recent orientation change map is divided by an orientation reverse time index, and the results are accumulated to create a history map. The orientation reverse time index is computed by subtracting the current map index from the total number of retrieved orientation change maps. An average of the history map is computed and used as a threshold for low pass filtering the orientation change map to compute the orientation change history map, wherein an orientation change below the threshold indicates a potential incident.

The conditional motion map is computed by receiving the motion flow and computing a magnitude map using a motion flow vector at each pixel position. Further, a binary map is computed by filtering the magnitude map using a magnitude threshold, and the motion regions are then filtered using width and height thresholds for each connected region, wherein the final filtered output is stored as the conditional motion map in a conditional motion map database. The conditional motion map is a 2-dimensional matrix comprising binary values, wherein '1' denotes a region the user wants to focus on and '0' denotes a region the user does not want to focus on.

A recent conditional motion map is retrieved from the conditional motion map database and the conditional motion history map is created based on the retrieved conditional motion map. The retrieved conditional motion map is divided by a conditional motion reverse time index computed by subtracting the current map index from the total number of retrieved conditional motion maps. The divided results are accumulated and stored as the conditional motion history map.

The incident history image is created by normalizing the magnitude change history map, orientation change history map and conditional motion history map, filtering the normalization results and merging the filtered results with the normalization results. The normalization results are filtered by a standard morphological opening process using a box kernel filter, which removes small blobs (noise) and smooths the normalization results to reduce spark values.

The computed motion flow, magnitude change map, magnitude change history map, orientation change map, orientation change history map, conditional motion map, conditional motion history map and incident history image are analysed to predict the incident. In this way, the present invention is capable of predicting incidents irrespective of the number of subjects and their movement speed in a surveyed area.

The prediction result is received and checked to determine whether the incident is predicted. If yes, an incident counter is incremented and a non-incident counter is reset. If not, the non-incident counter is incremented. The incident counter is reset if the non-incident counter reaches a first threshold. When the incident counter reaches a second threshold, an alert message is displayed as the prediction result using a display device, e.g. an LCD screen. The first and second thresholds are determined manually or automatically based on a frame rate of the moving image.

Even though the above embodiments show the present invention as applied to predicting aggressive behaviour among human subjects under surveillance, it is to be understood that the present invention may also be applied to other applications. Such alternate applications include, but are not limited to, predicting the behaviour of foetuses, animals, insects, microorganisms, elementary particles and the like.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises," "comprising," “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.

The use of the expression “at least” or “at least one” suggests the use of one or more elements, as the use may be in one of the embodiments to achieve one or more of the desired objects or results. Various methods described herein may be practiced by combining one or more machine-readable storage media containing code that perform the steps according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the disclosure could be accomplished by modules, routines, subroutines, or subparts of a computer program product.

While the foregoing describes various embodiments of the disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof. The scope of the disclosure is determined by the claims that follow. The disclosure is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the disclosure when combined with information and knowledge available to the person having ordinary skill in the art.