

Title:
METHOD OF IMAGE PROCESSING FOR OBJECT IDENTIFICATION AND SYSTEM THEREOF
Document Type and Number:
WIPO Patent Application WO/2022/015236
Kind Code:
A1
Abstract:
There is provided a method of image processing for object identification. The method includes: obtaining an image of an object; identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified. There is also provided a corresponding system for image processing for object identification.

Inventors:
DOU SHUYANG (SG)
Application Number:
PCT/SG2020/050416
Publication Date:
January 20, 2022
Filing Date:
July 17, 2020
Assignee:
HITACHI LTD (JP)
International Classes:
G06K9/62; G06N3/02; G06T7/00
Foreign References:
CN105488479A (2016-04-13)
CN108596277A (2018-09-28)
CN109145759A (2019-01-04)
US20180293552A1 (2018-10-11)
US20130039569A1 (2013-02-14)
Attorney, Agent or Firm:
VIERING, JENTSCHURA & PARTNER LLP (SG)
Claims:
CLAIMS

What is claimed is:

1. A method of image processing for object identification using at least one processor, the method comprising: obtaining an image of an object; identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding reference object portion of a reference object, respectively; and determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

2. The method according to claim 1, further comprising performing transformation of the at least one image portion identified to obtain at least one transformed image portion corresponding to the at least one object portion based on an object portions definition data set related to the object, wherein the object portions definition data set comprises a plurality of predefined data subsets defining the plurality of predefined object portions, respectively, related to the object, and wherein for each of the at least one image portion identified, the transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the object portion corresponding to the image portion; and wherein said determining a similarity of the at least one image portion identified is based on the at least one transformed image portion corresponding to the at least one object portion.

3. The method according to claim 2, wherein said identifying at least one image portion of the image comprises, for each image portion corresponding to the corresponding object portion of the object, detecting a set of key points in the image corresponding to a predefined object portion of the plurality of predefined object portions related to the object.

4. The method according to claim 3, wherein the set of key points in the image is detected based on a machine learning model trained based on the plurality of predefined object portions related to the object.

5. The method according to claim 3 or 4, wherein each of the plurality of predefined data subsets comprises a set of rectified coordinates for defining a shape associated with the corresponding predefined object portion.

6. The method according to claim 5, wherein said performing transformation of the at least one image portion identified comprises, for each image portion identified: determining a correspondence relationship between the set of key points detected for the image portion and the set of rectified coordinates associated with the corresponding predefined object portion; and transforming the image portion based on the correspondence relationship determined to obtain the transformed image portion.

7. The method according to claim 6, wherein: the set of rectified coordinates defines a planar polygon associated with the corresponding predefined object portion; and the correspondence relationship comprises an affine transformation matrix, and the image portion is affine transformed based on the affine transformation matrix determined to obtain the transformed image portion in the form of a rectified image portion.

8. The method according to any one of claims 2 to 7, further comprising: obtaining a reference image of the reference object; identifying at least one reference image portion of the reference image corresponding to at least one reference object portion of the reference object based on the plurality of predefined object portions; and performing transformation of the at least one reference image portion identified to obtain at least one transformed reference image portion corresponding to the at least one reference object portion based on the object portions definition data set, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions, and wherein for each of the at least one reference image portion identified, the transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the reference object portion corresponding to the reference image portion.

9. The method according to any one of claims 2 to 7, further comprising obtaining at least one transformed reference image portion corresponding to the at least one reference object portion from a database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

10. The method according to any one of claims 2 to 7, further comprising obtaining image features extracted from at least one transformed reference image portion corresponding to the at least one reference object portion from a database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

11. The method according to any one of claims 8 to 10, wherein said determining a similarity of the at least one transformed image portion comprises determining, for each transformed image portion, a similarity of the transformed image portion with respect to the corresponding transformed reference image portion to output a corresponding similarity value.

12. The method according to any one of claims 1 to 11, wherein the object comprises a vehicle, a building, or a suitcase, and each of the at least one object portion of the object corresponds to a planar part according to a structure of the vehicle, the building, or the suitcase.

13. The method according to any one of claims 1 to 11, wherein the object comprises a vehicle and the reference object comprises a reference vehicle, and further comprising: identifying a number plate image portion of the image corresponding to a number plate portion of the vehicle; performing number plate recognition of the number plate image portion identified to obtain number plate information; determining a similarity of the number plate image portion identified with respect to a corresponding reference number plate image portion of a corresponding reference number plate portion of the reference vehicle based on the number plate information; and determining whether the object corresponds to the reference object based on the determined similarity of the number plate image portion identified.

14. The method according to any one of claims 1 to 13, wherein said determining a similarity of the at least one image portion identified is based on pixel intensity correlation, structure similarity or image feature distance.

15. A system for image processing for object identification, the system comprising: a memory; and at least one processor communicatively coupled to the memory and configured to: obtain an image of an object; identify at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determine a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determine whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

16. The system according to claim 15, wherein the at least one processor is further configured to: perform transformation of the at least one image portion identified to obtain at least one transformed image portion corresponding to the at least one object portion based on an object portions definition data set related to the object, wherein the object portions definition data set comprises a plurality of predefined data subsets defining the plurality of predefined object portions, respectively, related to the object, and wherein for each of the at least one image portion identified, the transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the object portion corresponding to the image portion; and wherein said determine a similarity of the at least one image portion identified is based on the at least one transformed image portion corresponding to the at least one object portion.

17. The system according to claim 16, wherein said identify at least one image portion of the image comprises, for each image portion corresponding to the corresponding object portion of the object, detecting a set of key points in the image corresponding to a predefined object portion of the plurality of predefined object portions related to the object.

18. The system according to claim 17, wherein the set of key points in the image is detected based on a machine learning model trained based on the plurality of predefined object portions related to the object.

19. The system according to claim 17 or 18, wherein each of the plurality of predefined data subsets comprises a set of rectified coordinates for defining a shape associated with the corresponding predefined object portion.

20. The system according to claim 19, wherein said perform transformation of the at least one image portion identified comprises, for each image portion identified: determining a correspondence relationship between the set of key points detected for the image portion and the set of rectified coordinates of the corresponding predefined object portion; and transforming the image portion based on the correspondence relationship determined to obtain the transformed image portion.

21. The system according to claim 20, wherein: the set of rectified coordinates defines a planar polygon associated with the corresponding predefined object portion; and the correspondence relationship comprises an affine transformation matrix, and the image portion is affine transformed based on the affine transformation matrix determined to obtain the transformed image portion in the form of a rectified image portion.

22. The system according to any one of claims 16 to 21, wherein the at least one processor is further configured to: obtain a reference image of the reference object; identify at least one reference image portion of the reference image corresponding to at least one reference object portion of the reference object based on the plurality of predefined object portions; and perform transformation of the at least one reference image portion identified to obtain at least one transformed reference image portion corresponding to the at least one reference object portion based on the object portions definition data set, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions, and wherein for each of the at least one reference image portion identified, the transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the reference object portion corresponding to the reference image portion.

23. The system according to any one of claims 16 to 21, wherein the at least one processor is further configured to obtain at least one transformed reference image portion corresponding to the at least one reference object portion from a database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

24. The system according to any one of claims 16 to 21, wherein the at least one processor is further configured to obtain image features extracted from at least one transformed reference image portion corresponding to the at least one reference object portion from a database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

25. The system according to any one of claims 22 to 24, wherein said determine a similarity of the at least one transformed image portion comprises determining, for each transformed image portion, a similarity of the transformed image portion with respect to the corresponding transformed reference image portion to output a corresponding similarity value.

26. The system according to any one of claims 15 to 25, wherein the object comprises a vehicle, a building, or a suitcase, and each of the at least one object portion of the object corresponds to a planar part according to a structure of the vehicle, the building, or the suitcase.

27. The system according to any one of claims 15 to 25, wherein the object comprises a vehicle and the reference object comprises a reference vehicle, and wherein the at least one processor is further configured to: identify a number plate image portion of the image corresponding to a number plate portion of the vehicle; perform number plate recognition of the number plate image portion identified to obtain number plate information; determine a similarity of the number plate image portion identified with respect to a corresponding reference number plate image portion of a corresponding reference number plate portion of the reference vehicle based on the number plate information; and determine whether the object corresponds to the reference object based on the determined similarity of the number plate image portion identified.

28. The system according to any one of claims 15 to 27, wherein said determine a similarity of the at least one image portion identified is based on pixel intensity correlation, structure similarity or image feature distance.

29. A computer program product, embodied in one or more non-transitory computer-readable storage mediums, comprising instructions executable by at least one processor to perform a method of image processing for object identification, the method comprising: obtaining an image of an object; identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

Description:
METHOD OF IMAGE PROCESSING FOR OBJECT IDENTIFICATION AND SYSTEM THEREOF

TECHNICAL FIELD

[0001] The present invention generally relates to a method of image processing for object identification and a system thereof.

BACKGROUND

[0002] Object identification or recognition is an important technology with a broad range of applications. For example, with respect to vehicle identification, one application is tracking a vehicle's location in a restricted area in real time. The number plate of a vehicle may provide useful information for identifying which vehicle it is. Unfortunately, in real-world scenarios, especially in cases employing outdoor surveillance cameras, it may be difficult to capture an image of a vehicle with a clear number plate. In such cases, image feature based identification approaches may be widely used. However, visual appearance differences due to viewpoint change may degrade identification accuracy. FIG. 1A depicts a diagram 100 illustrating an example scenario for vehicle recognition across multiple cameras (e.g., at different locations and times), which illustrates a challenging problem due to visual appearance differences caused by viewpoint change. FIG. 1B illustrates example images 120a, 120b and 120c of the same vehicle from the VeRi dataset, as described in Liu, Xinchen, et al., "Large-scale vehicle re-identification in urban surveillance videos," 2016 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2016. For example, the images 120a, 120b and 120c may be taken by three different cameras with different viewpoints. As illustrated, there is a significant visual appearance difference, for example, due to viewpoint change, and it may therefore be difficult to identify that the vehicle in images 120a, 120b and 120c is the same vehicle. FIG. 1C, on the other hand, illustrates example images 150a, 150b and 150c of different vehicles obtained from the same dataset. As illustrated, the vehicles in the images 150a, 150b and 150c, although different, have a similar appearance.

[0003] A conventional technique to address the above problem is to utilize affine transformation to eliminate the visual appearance difference. For example, U.S. Patent No. 8,295,604, entitled "Image search method and device using affine-invariant regions" and issued on 23 Oct. 2012, describes a "neighboring regions" approach to applying the affine transformation: affine-invariant regions are extracted from a query image together with their neighboring regions, and each region is compared with every region in a pre-processed learning table to obtain votes (i.e., using neighboring regions and affine transformation). In this approach, an optimized learning table containing neighboring region information and affine transformations is generated during a learning stage, while a voting table is generated during a matching stage by comparing each region (and its neighboring regions) of a query image with every region in the learning table. The learning table may contain the neighborhood relationship for each region and the corresponding affine transformation. Finally, correspondences are decided by referring to the votes in the voting table, e.g., the highest votes in each row. However, such a technique requires comparing each extracted region of a query image with every region of a reference image, because the extracted regions carry no structural meaning or correlation. In this regard, the processing associated with comparing all parts of the images for identification may be inefficient and time consuming.

[0004] A need therefore exists to provide a method of image processing for object identification and a system thereof that seek to overcome, or at least ameliorate, one or more of the deficiencies in conventional image processing methods/systems for object identification. It is against this background that the present invention has been developed.

SUMMARY

[0005] According to a first aspect of the present invention, there is provided a method of image processing for object identification using at least one processor, the method comprising: obtaining an image of an object; identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

[0006] According to a second aspect of the present invention, there is provided a system for image processing for object identification, the system comprising: a memory; and at least one processor communicatively coupled to the memory and configured to: obtain an image of an object; identify at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determine a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determine whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

[0007] According to a third aspect of the present invention, there is provided a computer program product, embodied in one or more non-transitory computer-readable storage mediums, comprising instructions executable by at least one processor to perform a method of image processing for object identification, the method comprising: obtaining an image of an object; identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Embodiments of the present invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:

FIG. 1A depicts a diagram illustrating an example scenario for vehicle recognition across multiple cameras;

FIG. 1B illustrates images of the same vehicle which were taken by three different cameras with different viewpoints;

FIG. 1C illustrates images of different vehicles which have a similar appearance;

FIG. 2 depicts a schematic flow diagram of a method (computer-implemented method) of image processing for object identification using at least one processor according to various embodiments of the present invention;

FIG. 3 depicts a schematic block diagram of an image processing system for object identification according to various embodiments of the present invention, such as corresponding to the method shown in FIG. 2;

FIG. 4 depicts an example computer system which the system according to various embodiments of the present invention may be embodied in;

FIG. 5A illustrates a schematic of a vehicle divided into four planar parts according to various example embodiments of the present invention;

FIG. 5B illustrates an exemplary schematic of an object portions definition data set which may be defined in relation to the plurality of predefined object parts of the object (e.g., vehicle) in FIG. 5A according to various example embodiments of the present invention;

FIG. 6A illustrates another exemplary schematic of a vehicle divided into planar parts to define a plurality of predefined object portions of the vehicle according to various example embodiments of the present invention;

FIG. 6B illustrates another exemplary schematic of an object portions definition data set in relation to the plurality of predefined object parts related to the object (e.g., vehicle) in FIG. 6A according to various example embodiments of the present invention;

FIGS. 7A and 7B illustrate exemplary frameworks of image processing for object identification according to various example embodiments of the present invention;

FIG. 8 illustrates another exemplary schematic of image processing for object identification according to various example embodiments of the present invention;

FIG. 9 shows an exemplary process flow of image processing for object identification according to various example embodiments of the present invention;

FIG. 10 shows an exemplary schematic diagram of a process of image processing for object identification according to various example embodiments of the present invention;

FIG. 11 shows two exemplary images of an object each comprising an image portion of the image corresponding to an object portion and transformed image portions according to various example embodiments of the present invention;

FIG. 12 illustrates a schematic in relation to key point detection and affine rectification for two different vehicle parts according to various example embodiments of the present invention;

FIG. 13 shows another exemplary process flow of image processing for object identification according to various example embodiments of the present invention;

FIG. 14 shows an exemplary process flow for determining a similarity of at least one transformed image portion corresponding to at least one object portion according to various example embodiments of the present invention; and

FIG. 15 illustrates an exemplary framework of image processing for object identification combined with vehicle number plate recognition according to various example embodiments of the present invention.

DETAILED DESCRIPTION

[0009] Various embodiments of the present invention provide a method (computer-implemented method) of image processing for object identification and a system (including a memory and at least one processor communicatively coupled to the memory) thereof.

[0010] FIG. 2 depicts a schematic flow diagram of a method 200 (computer-implemented method) of image processing for object identification using at least one processor according to various embodiments of the present invention. The method 200 comprises obtaining (at 202) an image (e.g., a query image) of an object (target object); identifying (at 204) at least one image portion of the image corresponding to (e.g., thereof or in relation to) at least one object portion of the object (i.e., respectively) based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determining (at 206) a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determining (at 208) whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.
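By way of a non-limiting illustration only, the following sketch (in Python) outlines the four steps of the method 200. The helper callables locate_portion and portion_similarity are hypothetical placeholders supplied by the caller, and the threshold value is an assumed example; the present disclosure does not prescribe any particular implementation of these steps.

```python
# A minimal, non-limiting sketch of method 200 (FIG. 2).
def identify_object(query_image, reference_portions, predefined_portions,
                    locate_portion, portion_similarity, threshold=0.8):
    # 202: an image of the object has been obtained (query_image)
    # 204: identify image portions based on the predefined object portions
    portions = {name: locate_portion(query_image, name)
                for name in predefined_portions}
    # 206: compare each portion only with its corresponding reference
    # portion ("one-to-one" comparisons, never "one-to-all")
    scores = [portion_similarity(portions[name], reference_portions[name])
              for name in portions if name in reference_portions]
    # 208: decide whether the object corresponds to the reference object
    return bool(scores) and sum(scores) / len(scores) > threshold
```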

[0011] In relation to 202, for example, the object may be an object having (approximately) planar portions or parts which are discriminative and which it is desired to identify (or recognize). For example, the object may be an artifact having at least approximately or substantially planar portions which are discriminative relative to one another, which is useful for identification/recognition purposes. In various embodiments, the image of the object may be obtained from one or more image capturing devices. The one or more image capturing devices may be of any type(s) known in the art, for example, a camera at a location or a plurality of cameras disposed at different locations of an area or region desired to be covered. In various embodiments, the one or more image capturing devices may produce an image of the object and at least one processor may then obtain (or receive) the image for processing to identify (or recognize) the object. In other embodiments, the image of the object may be obtained from a database and at least one processor may then obtain (or receive) the image for processing to identify (or recognize) the object.

[0012] In relation to 204, the above-mentioned identifying at least one image portion of the image corresponding to at least one object portion of the object may include identifying multiple image portions of the image corresponding to multiple object portions of the object based on the plurality of predefined object portions related to the object. The multiple object portions may each correspond to a respective predefined object portion of the plurality of predefined object portions. For example, an object desired to be identified may be defined (e.g., divided or segmented) into a plurality of predefined object portions, each corresponding to a portion (e.g., a unique part) of the object, which may be helpful when images thereof are processed according to various embodiments for identifying (e.g., recognizing) the object (or the type of object). Each of the plurality of predefined object portions (e.g., predefined object parts) related to the object may be an (approximately) planar portion according to a structure of the object and is unique to the object. In other words, each predefined object portion may be a unique portion of the object to facilitate identification. Accordingly, each of the plurality of predefined object portions may have meaning in relation to object structure (e.g., vehicle structure). The predefined object portions may be defined by one or more users based on domain knowledge, with the identification purpose in mind, prior to processing an image for object identification, such that each predefined object portion may provide unique information for object identification or recognition (e.g., each predefined object portion is selected and predefined by one or more users using domain knowledge with the identification purpose in mind). Accordingly, the at least one image portion identified may be correlated to one predefined object portion. For example, each of the plurality of predefined object portions may include unique information (or unique features) related to the object to facilitate subsequent comparison (or matching) of an image portion identified from the query image with a corresponding image portion identified from a reference image (i.e., comparison of corresponding image portion pairs from different images).
For example, in the case of the object being a vehicle, each of the plurality of predefined object portions may correspond to a corresponding vehicle part or structure, such as the front face, tail face, left side, right side, and so on. Accordingly, the at least one image portion identified may include unique information (or unique features) of the object for facilitating subsequent comparison (or matching). In various embodiments, the predefined object portions may overlap each other. In other words, the predefined object portions related to the object may be defined such that two or more object portions comprise an overlapping area. The overlap may be partial or complete. For example, one or more users may define a "front face" and a "number plate area at the front face" as two different object portions. In such a case, the latter predefined object portion is completely overlapped by the former predefined object portion.
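By way of a non-limiting illustration only, an object portions definition data set for a vehicle might be represented as sketched below (in Python); the part names and rectified coordinates are hypothetical example values, not values prescribed by the present disclosure.

```python
# Hypothetical object portions definition data set for a vehicle.
OBJECT_PORTIONS_DEFINITION = {
    # each predefined data subset: rectified corner coordinates (pixels)
    # of the planar polygon defining the part in a canonical view
    "front": [(0, 0), (200, 0), (200, 120), (0, 120)],
    "rear":  [(0, 0), (200, 0), (200, 120), (0, 120)],
    "left":  [(0, 0), (400, 0), (400, 120), (0, 120)],
    "right": [(0, 0), (400, 0), (400, 120), (0, 120)],
    # portions may overlap: the plate area sits inside the front face
    "front_number_plate": [(0, 0), (100, 0), (100, 25), (0, 25)],
}
```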

[0013] In relation to 206, the at least one corresponding reference image portion of at least one corresponding object portion of a reference object may be identified based on the plurality of predefined object portions related to the object. In other words, image portions of corresponding object portion pairs from different images (e.g., a query image and a reference image) may be compared to determine a similarity of the at least one transformed image portion corresponding to the at least one object portion. For example, each identified image portion corresponding to each object portion (e.g., of a query image and a reference image) is already correlated to, or corresponds to, at least one predefined object portion of the plurality of predefined object portions; therefore, only image portions of corresponding object portion pairs from different images are compared (e.g., the image portion corresponding to the front face of a vehicle from the query image is compared to the reference image portion corresponding to the front face of a vehicle from the reference image). For example, in the case of the object being a vehicle, where each of the at least one object portion of the object corresponds to a part of the vehicle, there is no need to compare an image portion of a "Front" part of a query image with an image portion of a "Rear" part of a reference image, since it is already known which vehicle parts the image portions correspond to. In various embodiments, the at least one image portion identified may be compared with the at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively, for example, to determine similarity values of corresponding object portion pairs from the image and the reference image.

[0014] In relation to 208, the above-mentioned determining whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified may provide information whether the object and the reference object are the same objects, same type of objects, same class of objects, etc.

[0015] Accordingly, various embodiments provide a method of image processing for object identification based on the plurality of predefined object portions related to the object. The plurality of predefined object portions may be predefined discriminative (approximately) planar portions. Unlike conventional identification techniques, which extract object parts automatically based on an algorithm (e.g., by detecting intensity changes of image pixels) such that the extracted parts have no meaning in relation to object structure (e.g., vehicle structure), and which compare each extracted part of the query image with all extracted parts from a reference image (e.g., "one-to-all" comparisons), various embodiments of the present invention compare image portions of corresponding object portion pairs from the query image and the reference image based on the plurality of predefined object portions (e.g., "one-to-one" comparisons), which improves the identification accuracy by avoiding mismatching. Further, various embodiments of the present invention increase processing speed in relation to determining the similarity of each image portion identified with corresponding reference image portions, as only corresponding object portion pairs are compared (e.g., it is not necessary to compare the image portion corresponding to the front face part of a vehicle from a query image with the image portion corresponding to the left body part of a vehicle from a reference image).
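A rough, hypothetical illustration of the comparison-count difference (assuming, for example, eight regions or portions per image) is sketched below.

```python
# Hypothetical counts: "one-to-all" compares every query region with
# every reference region; the portion-based scheme compares each
# corresponding portion pair once.
query_regions = reference_regions = 8
one_to_all = query_regions * reference_regions  # 64 comparisons
one_to_one = query_regions                      # 8 comparisons
print(one_to_all, one_to_one)                   # 64 8
```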

[0016] In various embodiments, the method 200 may further comprise performing transformation of the at least one image portion identified to obtain at least one transformed image portion corresponding to the at least one object portion (i.e., respectively) based on an object portions definition data set related to the object. In this regard, the object portions definition data set comprises a plurality of predefined data subsets defining the plurality of predefined object portions, respectively, related to the object. Furthermore, for each of the at least one image portion identified, the above-mentioned transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the object portion corresponding to the image portion. Accordingly, the above-mentioned determining a similarity of the at least one image portion identified may be based on the at least one transformed image portion corresponding to the at least one object portion.

[0017] The transformation rectifies the image before identification, so that the difference or distortion in visual appearance may be eliminated or minimized. The transformation may be applied to each image portion identified to eliminate the visual appearance difference.

[0018] In various embodiments, each predefined object portion may be defined by a set of key points. A set of key points may include at least four key points. The key points in a set of key points of a predefined object portion may be interconnected to form a planar polygon defining the object portion.

[0019] In various embodiments, the above-mentioned identifying at least one image portion of the image (based on a plurality of predefined object portions related to the object) comprises, for each image portion corresponding to the corresponding object portion of the object, detecting a set of key points in the image corresponding to a predefined object portion of the plurality of predefined object portions related to the object.

[0020] In various embodiments, the set of key points in the image may be detected based on pattern recognition or a machine learning model trained based on the plurality of predefined object portions related to the object. In various embodiments, the machine learning model may comprise a neural network such as a plurality of convolutional neural networks (CNNs) each trained based on a respective predefined object portion of the plurality of predefined object portions related to the object. The plurality of CNNs may be trained separately for respective predefined object portions to detect a set of key points in the image corresponding to a predefined object portion. For example, each CNN may be trained based on images comprising an object portion corresponding to a respective predefined object portion for the CNN. In other words, each CNN of the plurality of CNNs may be trained for a particular predefined object portion and used to detect key points for that predefined object portion based on an image comprising an object portion corresponding to that predefined object portion. In other embodiments, a single CNN model may be used to detect all key points in relation to all predefined object portions. In this case, the CNN model may be trained using all key points in relation to all the predefined object portions. In yet other embodiments, a CNN may be used to detect key points related to two or more predefined object portions. For example, in the case of a vehicle having five predefined vehicle portions, two CNNs may be used for key point detection. In such a case, one CNN may be used to detect key points related to two predefined vehicle portions, and another CNN may be used to detect key points related to the remaining three predefined vehicle portions. The image of the object may be input into the machine learning model, and the machine learning model may output the location of each key point in the image. For example, the output of a CNN may be a set of key points (e.g., a set of coordinates of the corresponding key points in the image).
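By way of a non-limiting illustration only, a per-portion key point regressor might be sketched as follows, assuming PyTorch; the layer sizes, the part names, and the plain coordinate-regression head are illustrative assumptions rather than a prescribed model architecture.

```python
import torch
import torch.nn as nn

class PortionKeypointCNN(nn.Module):
    """Regresses (x, y) coordinates of the key points of one predefined portion."""

    def __init__(self, n_keypoints: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2 * n_keypoints)  # two coordinates per key point

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = self.features(image).flatten(1)  # (batch, 64)
        return self.head(x).view(-1, self.head.out_features // 2, 2)

# one model per predefined portion, trained separately as described above
detectors = {part: PortionKeypointCNN() for part in ("front", "rear", "left", "right")}
```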

[0021] In various embodiments, each of the plurality of predefined data subsets comprises a set of rectified coordinates for defining a shape associated with the corresponding predefined object portion. In various embodiments, the set of rectified coordinates defines a planar polygon associated with the corresponding predefined object portion.

[0022] In various embodiments, the above-mentioned performing transformation of the at least one image portion identified comprises, for each image portion identified: determining a correspondence relationship between the set of key points detected for the image portion and the set of rectified coordinates associated with the corresponding predefined object portion; and transforming the image portion based on the correspondence relationship determined to obtain the transformed image portion.

[0023] In various embodiments, the correspondence relationship comprises an affine transformation matrix.

[0024] In various embodiments, the image portion is affine transformed based on the affine transformation matrix determined to obtain the transformed image portion in the form of a rectified image portion. For example, the transformation of the at least one image portion identified to obtain at least one transformed image portion corresponding to the at least one object portion may be an affine transformation or rectification.
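By way of a non-limiting illustration only, the rectification step might be sketched as follows, assuming OpenCV and NumPy; with four or more detected key points a least-squares affine matrix can be estimated, and the output size below is an assumed example value.

```python
import cv2
import numpy as np

def rectify_portion(image, detected_keypoints, rectified_coords, out_size=(200, 120)):
    src = np.asarray(detected_keypoints, dtype=np.float32)  # key points from the detector
    dst = np.asarray(rectified_coords, dtype=np.float32)    # from the definition data set
    matrix, _inliers = cv2.estimateAffine2D(src, dst)       # 2x3 affine transformation matrix
    # warp the identified image portion into the canonical (rectified) view
    return cv2.warpAffine(image, matrix, out_size)
```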

[0025] In various embodiments, the method 200 may further comprise: obtaining a reference image of the reference object; identifying at least one reference image portion of the reference image corresponding to at least one reference object portion of the reference object based on the plurality of predefined object portions; and performing transformation of the at least one reference image portion identified to obtain at least one transformed reference image portion corresponding to the at least one reference object portion based on the object portions definition data set, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions, and wherein for each of the at least one reference image portion identified, the transformation is performed based on the predefined data subset defining the predefined object portion of the plurality of predefined object portions corresponding to the reference object portion corresponding to the reference image portion.

[0026] In various embodiments, the transformation of the at least one reference image portion identified may be the same or similar to that of the transformation of the at least one image portion described above.

[0027] In various embodiments, the reference image of the reference object may be a different frame captured by the same image capture device from which the image (or query image) of the object is obtained. In other words, the image and the reference image may be obtained from different frames captured by the same image capture device. For example, the image and the reference image may be sequential frames within a predetermined time period. In other embodiments, the reference image of the reference object may be obtained from a frame of a different image capture device than the image capture device from which the image (or query image) of the object is obtained. The method may then determine whether the object in the image (or query image) corresponds to (i.e., is the same as) the reference object in the reference image based on the determined similarity of the at least one transformed image portion corresponding to the at least one object portion. For example, the object and the reference object may be the same object, the same type of object, the same classification, etc. In other cases, the object and the reference object may be different.

[0028] In various embodiments, the reference image of the reference object may be obtained in real time from one or more image capture devices. In other embodiments, the reference image of the reference object may be obtained from a database.

[0029] In various embodiments, the method 200 may further comprise obtaining at least one transformed reference image portion corresponding to the at least one reference object portion from the database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

[0030] In various embodiments, the method 200 may further comprise obtaining image features extracted from at least one transformed reference image portion corresponding to the at least one reference object portion from the database, wherein the at least one reference object portion corresponds to at least one predefined object portion of the plurality of predefined object portions.

[0031] In various embodiments, the above-mentioned determining a similarity of the at least one transformed image portion comprises determining, for each transformed image portion, a similarity of the transformed image portion with respect to the corresponding transformed reference image portion to output a corresponding similarity value.

[0032] In various embodiments, the above-mentioned determining a similarity of the at least one image portion identified may be based on pixel intensity correlation, structure similarity or image feature distance.

[0033] By comparing, for each transformed image portion, the transformed image portion with the corresponding transformed reference image portion, the similarity for corresponding object portion pairs from different images (e.g., a query image and a reference image) may be determined. The similarity may be determined based on intensity correlation, structure similarity, Normalized Cross Correlation (NCC) between the transformed image portion from a query image and the transformed reference image portion from a reference image, or a combination thereof. For example, the NCC for each transformed image portion may be determined and the average of all NCCs may be computed to obtain a final similarity value. For example, when the final similarity value is higher than a predefined threshold, the object in the query image may be determined to correspond to the reference image (e.g., the two objects in the query image and the reference image may be recognized as the same object). Other techniques may also be used to determine the similarity, such as extracting features from the two images (e.g., query image and reference image) and computing the feature distance (e.g., the Euclidean distance) as a measure of similarity (or dissimilarity).
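By way of a non-limiting illustration only, the NCC-based similarity determination might be sketched as follows (assuming NumPy); the decision threshold is an assumed example value.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Cross Correlation between two equally sized image portions."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def objects_match(query_portions, reference_portions, threshold=0.8):
    # compare corresponding portion pairs only, then average the NCC values
    scores = [ncc(query_portions[part], reference_portions[part])
              for part in query_portions if part in reference_portions]
    return bool(scores) and sum(scores) / len(scores) > threshold
```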

[0034] In various embodiments, the object comprises a vehicle, a building, or a suitcase, and each of the at least one object portion of the object corresponds to a planar part according to a structure of the vehicle, the building, or the suitcase. The vehicle may be a car, a train, or an airplane, in various non-limiting examples. It is understood that the present invention is not limited to the above-mentioned objects. For example, other types of objects having discriminative (approximately) planar parts may also be used in the image processing method as described according to various embodiments of the present invention for object identification (or recognition). In the case where affine transformation is used to transform or rectify the at least one image portion identified, the object may be an artifact which may be defined or divided into multiple (approximately) planar portions. In other words, in the case where affine transformation is applied to each image portion of the at least one image portion identified, the predefined object portion is an approximately planar portion.

[0035] As described, various embodiments according to the present invention provide the object portions definition data set related to the object as an input to the image processing method, identify at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, and perform transformation of the at least one image portion identified to obtain at least one transformed image portion. The similarity of each of the at least one transformed image portion may then be determined with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object to determine whether the object corresponds to the reference object.

[0036] According to various embodiments, the present invention may be applied in use cases that require vehicle identification or vehicle type classification (e.g., deciding the type of the vehicle). For example, one highly demanded use case is to track a vehicle in a restricted area in real time. Tracking may often be broken, for example, due to occlusion, limitations of the cameras' field of view (FoV), etc. In this case, the tracking may be resumed by identifying the vehicle against previously tracked vehicles. Another common usage of vehicle identification is to identify whether the query vehicle is one of the vehicles registered in a current database. Accordingly, more stable vehicle tracking in an outdoor environment, or more reliable vehicle identification (which is free of viewpoint change effects), may be implemented. With respect to vehicle type classification, for example, some representative vehicle images of each vehicle type may be chosen, and when the query image matches one of the representative vehicle images, the vehicle type of the query image may be defined as that of the matched representative image.

[0037] In various embodiments, the object comprises a vehicle and the reference object comprises a reference vehicle, and the method 200 may further comprise: identifying a number plate image portion of the image corresponding to a number plate portion of the vehicle; performing number plate recognition of the number plate image portion identified to obtain number plate information; determining a similarity of the number plate image portion identified with respect to a corresponding reference number plate image portion of a corresponding reference number plate portion of the reference vehicle based on the number plate information; and determining whether the object corresponds to the reference object based on the determined similarity of the number plate image portion identified.
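By way of a non-limiting illustration only, combining number plate recognition with the portion-based similarity might be sketched as follows; the OCR helper and the equal-weight fusion rule are hypothetical assumptions, as the present disclosure does not fix a particular recognizer or weighting.

```python
def plate_similarity(plate_portion, reference_plate_text, recognize_plate):
    """recognize_plate is a hypothetical OCR callable supplied by the caller."""
    text = recognize_plate(plate_portion)
    # exact-match similarity; an edit-distance measure could be used instead
    return 1.0 if text == reference_plate_text else 0.0

def combined_match(portion_score, plate_score, threshold=0.8):
    # equal-weight fusion of the two similarity scores (assumed weighting)
    return 0.5 * portion_score + 0.5 * plate_score > threshold
```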

[0038] FIG. 3 depicts a schematic block diagram of an image processing system 300 for object identification according to various embodiments of the present invention, such as corresponding to the method 200 of image processing for object identification as described hereinbefore according to various embodiments of the present invention.

[0039] The system 300 comprises a memory 304, and at least one processor 306 communicatively coupled to the memory 304 and configured to: obtain an image of an object; identify at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object, the at least one object portion corresponds to at least one predefined object portion of the plurality of predefined object portions; determine a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and determine whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

[0040] It will be appreciated by a person skilled in the art that the at least one processor 306 may be configured to perform the required functions or operations through set(s) of instructions (e.g., software modules) executable by the at least one processor 306 to perform the required functions or operations. Accordingly, as shown in FIG. 3, the system 300 may further comprise an image obtaining module (or circuit) 308 configured to obtain an image of an object; an image processing module (or circuit) 310 configured to identify at least one image portion of the image corresponding to at least one object portion of the object; a similarity determining module (or circuit) 312 configured to determine a similarity of the at least one image portion identified with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object, respectively; and an object identification module (or circuit) 314 configured to determine whether the object corresponds to the reference object based on the determined similarity of the at least one image portion identified.

[0041] It will be appreciated by a person skilled in the art that the above-mentioned modules (or circuits) are not necessarily separate modules, and two or more modules may be realized by or implemented as one functional module (e.g., a circuit or a software program) as desired or as appropriate without deviating from the scope of the present invention. For example, the image obtaining module 308, the image processing module 310, the similarity determining module 312, and/or the object identification module 314 may be realized (e.g., compiled together) as one executable software program (e.g., software application or simply referred to as an "app"), which for example may be stored in the memory 304 and executable by the at least one processor 306 to perform the functions/operations as described herein according to various embodiments.
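By way of a non-limiting illustration only, the module decomposition of the system 300 might be sketched as follows (in Python), with the four modules realized as plain callables composed by one driver; this is an assumed arrangement mirroring FIG. 3, not a prescribed implementation.

```python
class ImageProcessingSystem:
    """Driver composing the four modules of system 300 (FIG. 3)."""

    def __init__(self, image_obtaining, image_processing,
                 similarity_determining, object_identification):
        self.obtain = image_obtaining               # module 308
        self.identify_portions = image_processing   # module 310
        self.similarity = similarity_determining    # module 312
        self.decide = object_identification         # module 314

    def run(self, source, reference):
        image = self.obtain(source)
        portions = self.identify_portions(image)
        scores = self.similarity(portions, reference)
        return self.decide(scores)
```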

[0042] In various embodiments, the system 300 corresponds to the method 200 as described hereinbefore with reference to FIG. 2; therefore, various functions/operations configured to be performed by the at least one processor 306 may correspond to various steps or operations of the method 200 described hereinbefore according to various embodiments, and thus need not be repeated with respect to the system 300 for clarity and conciseness. In other words, various embodiments described herein in the context of the methods are analogously valid for the respective systems (e.g., which may also be embodied as devices).

[0043] For example, in various embodiments, the memory 304 may have stored therein the image obtaining module 308, the image processing module 310, the similarity determining module 312, and/or the object identification module 314, which respectively correspond to various steps or operations of the method 200 as described hereinbefore, and which are executable by the at least one processor 306 to perform the corresponding functions/operations as described herein.

[0044] A computing system, a controller, a microcontroller or any other system providing a processing capability may be provided according to various embodiments in the present disclosure. Such a system may be taken to include one or more processors and one or more computer-readable storage mediums. For example, the system 300 described hereinbefore may include a processor (or controller) 306 and a computer-readable storage medium (or memory) 304 which are for example used in various processing carried out therein as described herein. A memory or computer-readable storage medium used in various embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).

[0045] In various embodiments, a “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing software stored in a memory, firmware, or any combination thereof. Thus, in an embodiment, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g., a microprocessor (e.g., a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A “circuit” may also be a processor executing software, e.g., any kind of computer program, e.g., a computer program using a virtual machine code, e.g., Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a “circuit” in accordance with various alternative embodiments. Similarly, a “module” may be a portion of a system according to various embodiments in the present invention and may encompass a “circuit” as above, or may be understood to be any kind of a logic-implementing entity therefrom.

[0046] Some portions of the present disclosure are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.

[0047] Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as “determining”, “obtaining”, “identifying”, “performing”, or the like, refer to the actions and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.

[0048] The present specification also discloses a system (which may also be embodied as a device or an apparatus) for performing the operations/functions of the methods described herein. Such a system may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose machines may be used with computer programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate.

[0049] In addition, the present specification also at least implicitly discloses a computer program or software/functional module, in that it would be apparent to the person skilled in the art that the individual steps or operations of the methods described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the scope of the invention. It will be appreciated by a person skilled in the art that various modules described herein (e.g., the image obtaining module 308, the image processing module 310, the similarity determining module 312, and/or the object identification module 314) may be software module(s) realized by computer program(s) or set(s) of instructions executable by a computer processor to perform the required functions, or may be hardware module(s) being functional hardware unit(s) designed to perform the required functions. It will also be appreciated that a combination of hardware and software modules may be implemented.

[0050] Furthermore, one or more of the steps or operations of a computer program/module or method described herein may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general-purpose computer. The computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps or operations of the methods described herein.

[0051] In various embodiments, there is provided a computer program product, embodied in one or more computer-readable storage mediums (non-transitory computer-readable storage medium), comprising instructions (e.g., the image obtaining module 308, the image processing module 310, the similarity determining module 312, and/or the object identification module 314) executable by one or more computer processors to perform a method 200 of image processing for object identification as described hereinbefore with reference to FIG. 2. Accordingly, various computer programs or modules described herein may be stored in a computer program product receivable by a system (e.g., a computer system or an electronic device) therein, such as the system 300 as shown in FIG. 3, for execution by at least one processor 306 of the system 300 to perform the required or desired functions.

[0052] The software or functional modules described herein may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the software or functional module(s) described herein can also be implemented as a combination of hardware and software modules.

[0053] In various embodiments, the above-mentioned computer system may be realized by any computer system (e.g., portable or desktop computer system), such as a computer system 400 as schematically shown in FIG. 4 as an example only and without limitation. Various methods/operations or functional modules (e.g., the image obtaining module 308, the image processing module 310, the similarity determining module 312, and/or the object identification module 314) may be implemented as software, such as a computer program being executed within the computer system 400, and instructing the computer system 400 (in particular, one or more processors therein) to conduct the methods/functions of various embodiments described herein. The computer system 400 may comprise a computer module 402, input modules, such as a keyboard 404 and a mouse 406, and a plurality of output devices such as a display 408 and a printer 410. The computer module 402 may be connected to a computer network 412 via a suitable transceiver device 414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN). The computer module 402 in the example may include a processor 418 for executing various instructions, a Random Access Memory (RAM) 420 and a Read Only Memory (ROM) 422. The computer module 402 may also include a number of Input/Output (I/O) interfaces, for example I/O interface 424 to the display 408, and I/O interface 426 to the keyboard 404. The components of the computer module 402 typically communicate via an interconnected bus 428 and in a manner known to the person skilled in the relevant art.

[0054] It will be appreciated by a person skilled in the art that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", or the like such as “includes” and/or “including”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0055] In order that the present invention may be readily understood and put into practical effect, various example embodiments of the present invention will be described hereinafter by way of examples only and not limitations. It will be appreciated by a person skilled in the art that the present invention may, however, be embodied in various different forms or configurations and should not be construed as limited to the example embodiments set forth hereinafter. Rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the present invention to those skilled in the art. [0056] In particular, for better understanding of the present invention and without limitation or loss of generality, various example embodiments of the present invention will now be described with respect to image processing for identifying or recognizing a vehicle. For example, an image (e.g., query image) of a vehicle to be identified may be processed based on a plurality of predefined object portions related to the vehicle, performing transformation of the image based on an object portions definition data set related to the vehicle and determining a similarity of the processed image with respect to a reference image of a reference vehicle so as to determine whether the object corresponds to the reference vehicle in the reference image. However, it will be appreciated by a person skilled in the art that the present invention is not limited to identifying vehicles, and the method of image processing for object identification as disclosed herein according to various embodiments may be applied to identify other types of objects, such as but not limited to, suitcases, buildings, humans, etc.

[0057] In various example embodiments, a vehicle (e.g., an object) may be divided into multiple parts (object portions) to define a plurality of predefined object portions related to the vehicle. Each selected part of the vehicle may be approximately planar in its corresponding predefined object portion. For example, the planar parts of an image may be identified (and extracted) and the corresponding affine transformation matrix may be estimated, as an affine transformation can only be applied to planar objects. FIG. 5A illustrates a schematic 500a of a vehicle divided into four planar parts, including a front face 502, a tail face 504, a left side 506, and a right side 508, to define four predefined object portions (a plurality of predefined object portions) of the vehicle. Each planar part of the vehicle may be selected to define the corresponding predefined object portion such that the predefined object portion is unique within the vehicle and distinct from the other predefined object portions. For example, unique parts may be selected to define the plurality of predefined object portions related to the object so as to avoid subsequent mismatching problems during processing of the image.

[0058] FIG. 5B illustrates an exemplary schematic 500b of an object portions definition data set which may be defined in relation to the plurality of predefined object parts of the object (e.g., vehicle) in FIG. 5A. The object portions definition data set, for example, may be a predefined table comprising a plurality of predefined data subsets (e.g., 512, 514, 516 and 518) defining the plurality of predefined object portions (e.g., 502, 504, 506, and 508), respectively, related to the object (e.g., vehicle) in FIG. 5A. It is understood that the number and names of the vehicle parts in FIG. 5A and FIG. 5B are merely for purposes of illustration. The object portions definition data set should clearly provide data with respect to the coordinates or data points defining each vehicle part. In various example embodiments, each of the plurality of predefined data subsets comprises a set of rectified coordinates (or data points) 520 for defining a shape of a transformed image portion corresponding to the corresponding predefined object portion. For example, each set of rectified coordinates may comprise four or more rectified coordinates (or data points) for defining a shape of a transformed image portion corresponding to the corresponding predefined object portion. Each rectified coordinate may correspond to a key point of a predefined object portion. For each set of rectified coordinates, the rectified coordinates of the set may be connected to uniquely define the shape of the transformed image portion corresponding to the corresponding predefined object portion. In various example embodiments, the set of rectified coordinates defines a planar polygon of the corresponding predefined object portion.
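For illustration only, a minimal sketch in Python of such an object portions definition data set is shown below. The part names, corner descriptions, and pixel values are hypothetical and merely indicative of the structure, not the actual contents of FIG. 5B. The rectified coordinates of each part are listed in a fixed order so that the i-th detected key point can later be paired with the i-th rectified coordinate.

    # Hypothetical object portions definition data set: each predefined
    # object portion maps to its set of rectified coordinates (x, y) in
    # pixels, defining the shape of the transformed image portion.
    OBJECT_PORTIONS_DEFINITION = {
        "FrontFace": [
            (0, 0),      # upper left corner of left fog lamp
            (200, 0),    # upper right corner of right fog lamp
            (200, 80),   # lower right corner of right headlight
            (0, 80),     # lower left corner of left headlight
        ],
        "Roof": [
            (0, 0), (50, 0), (50, 50), (0, 50),    # 50 x 50 pixel target shape
        ],
        "LeftBody": [
            (0, 0), (350, 0), (350, 50), (0, 50),  # 350 x 50 pixel target shape
        ],
    }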

[0059] The meaning of each rectified coordinate or data point may be indicated or defined in the object portions definition data set related to the object. In a non-limiting example, the rectified coordinates or data points in the object portions definition data set related to the object may correspond to the upper left corner of the left fog lamp of the vehicle, the upper right corner of the right fog lamp of the vehicle, the lower right corner of the right headlight of the vehicle, the lower left corner of the left headlight of the vehicle, the upper left corner of the left rear fog lamp of the vehicle, etc. Each rectified coordinate or data point may be a unique point in the object according to its definition. Thus, it may be desirable to use general terms, rather than special terms that are used only by some specific makers/users, to define, for example, the vehicle coordinates or data points in the case the object is a vehicle. For example, the terms or definitions of headlight, side window, front glass, and roof, which are usually easily understood and will not cause ambiguity, may be used. It is noted that some vehicles may not have all the predefined vehicle parts (e.g., some cars do not have a roof) used in the object portions definition data set. In such a case, it may be necessary to make the definitions more general so as to accommodate such “uncommon” vehicles. In other words, the object portions definition data set may be predefined to define the rectified coordinates or data points uniquely in objects (e.g., target vehicles) without any ambiguity.

[0060] FIGS. 5A and 5B illustrate the rectified coordinates in each set of coordinates, corresponding to key points for defining a shape of the corresponding predefined object portion, as corner points. For example, in the case the front part 502 of the vehicle is defined as a predefined object portion, the predefined data subset 512 for the front part 502 of the vehicle comprises a set of rectified coordinates 520 which includes coordinates corresponding to key points (e.g., the upper left corner of the left fog lamp, the upper right corner of the right fog lamp, the lower right corner of the right headlight, and the lower left corner of the left headlight) to define a shape of the transformed image portion corresponding to the front part 502 of the vehicle. However, it is understood that the rectified coordinates of each set of rectified coordinates for defining the shape of a transformed image portion corresponding to the corresponding predefined object portion are not limited to corner points. For example, some coordinates or data points inside the vehicle parts may also be used if the coordinates or data points in the vehicle can be uniquely defined.

[0061] In various example embodiments, the set of rectified coordinates 520 are the expected coordinates of vehicle points in a transformed image portion (rectified image portion, rectified view, or undistorted image portion). A transformed image portion may be a normalized image corresponding to the object portion (vehicle part) which is invariant to viewpoint change. For example, a top-down view may be selected to be the transformed image portion. In this case, all vehicle parts may be transformed or rectified to their top-down views before comparison. Thus, the set of rectified coordinates 520 are parameters for calculating the transformed image portion. The values may be chosen roughly so that the transformed image portion has good visual features. It is understood that the top-down view is described as an example illustration; however, other orientations known to those skilled in the art may also be employed. Further, different vehicle parts may be converted to views with different orientations. For example, “bonnet” may be converted to a top-down view, while “roof” may be converted to a view rotated 45 degrees counterclockwise.

[0062] FIG. 6A illustrates another exemplary schematic 600a of a vehicle divided into planar parts to define a plurality of predefined object portions of the vehicle, while FIG. 6B illustrates another exemplary schematic 600b of an object portions definition data set in relation to the plurality of predefined object parts related to the object (e.g., vehicle) in FIG. 6A. The object portions definition data set, for example, may be a predefined table comprising a plurality of predefined data subsets (e.g., 612, 614, 616, 618, 620, 622 and 624) defining the plurality of predefined object portions, respectively, related to the object (e.g., vehicle) in FIG. 6A. Each of the plurality of predefined data subsets comprises a set of rectified coordinates or data points 520 for defining a shape of the corresponding predefined object portion. For example, referring to FIG. 6B, “50 pixels” (rectified coordinates) may be used for both the length and width of the transformed image portion corresponding to the predefined object portion 624 (e.g., Roof). However, with respect to the predefined object portion corresponding to “LeftBody”, “350 pixels” may be used for its length, while “50 pixels” may be used for its height. Thus, the values of the set of rectified coordinates 520 depend on the shape of the transformed image portion of the vehicle part, which may be experience-based. In other words, these values reflect the expected shape of the vehicle part (in a transformed image portion).

[0063] FIGS. 5A-5B and FIGS. 6A-6B illustrate eleven vehicle parts being defined, which encompass most parts of the vehicle visible in a camera view. It should be noted that each vehicle part may be further divided into multiple smaller parts. For example, the “LeftBody” part may be divided into “front door”, “back door”, etc. In various example embodiments, the predefined object portions may overlap each other. In other words, the predefined object portions related to the object may be defined such that the object portions are overlapping. For example, in such cases, a rectified coordinate may belong to two or more sets of rectified coordinates corresponding to two or more predefined object portions.

[0064] In various example embodiments, an image of an object (e.g., vehicle) to be identified or recognized may be obtained, and based on the plurality of predefined object portions related to the object, image portions of the image corresponding to object portions of the object may be identified or extracted. A transformation may be performed for each image portion identified, based on the object portions definition data set, to rectify distortions in visual appearance so as to reduce or minimize visual appearance differences in the image portion identified, for example those caused by a change in camera viewpoint. The transformation may reduce or minimize visual appearance differences of corresponding object portions of a same object in different images (e.g., the obtained image and the reference image). In various example embodiments, the transformation may be an affine transformation used to rectify each image portion identified so as to obtain transformed image portions corresponding to the object portions. Each transformed image portion may be compared to a corresponding reference image portion of a corresponding object portion of a reference object to determine the similarity.

[0065] FIGS. 7A and 7B illustrate exemplary frameworks 700a and 700b, respectively, of image processing for object identification according to various example embodiments of the present invention. In various example embodiments, an image (query image) 702 of an object (e.g., vehicle) may be obtained. The image 702 of the object, for example, may be an arbitrary image of a vehicle to be identified. For example, the image 702 of the object may be a real-time frame captured by an outdoor surveillance camera. The query image 702 of the object may be divided into multiple image portions corresponding to multiple object portions (e.g., vehicle parts) 704, respectively, according to the plurality of predefined object portions 706 related to the object (e.g., identifying at least one image portion of the image corresponding to at least one object portion of the object based on a plurality of predefined object portions related to the object). In various example embodiments, at least one image portion of the image 702 may be identified by detecting, for each image portion corresponding to the corresponding object portion of the object, a set of key points in the image 702 corresponding to a predefined object portion of the plurality of predefined object portions related to the object. The set of key points in the image 702 may be automatically detected based on a machine learning model trained based on the plurality of predefined object portions related to the object. In a non-limiting example, the machine learning model may be a Convolutional Neural Network (CNN) model. The key points may be detected sequentially or simultaneously.

[0066] A transformation (or rectification) 708 of each image portion identified may be performed to obtain transformed image portions 710 corresponding to the object portions identified in the query image 702. The transformation 708 of each image portion identified may be performed according to an estimated affine transformation matrix. For example, the detected key points and their corresponding rectified coordinates (predefined in the vehicle part definition) may be used to estimate the affine transformation matrix. The affine transformation matrix is a 3x3 homogeneous matrix which defines the relationship between two planes. It should be noted that this matrix has eight degrees of freedom, which means at least four corresponding coordinate pairs should be used for estimation (e.g., at least four coordinates in each set of rectified coordinates are necessary for each predefined object portion).
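By way of illustration only, a minimal Python sketch of this estimation and rectification step is shown below, using the OpenCV library. It assumes the detected key points are already paired index-for-index with the rectified coordinates of the corresponding predefined object portion; note that a 3x3 matrix with eight degrees of freedom estimated from at least four point pairs is what OpenCV exposes as a perspective transform (findHomography/warpPerspective). The function name and arguments are illustrative, not part of the claimed method.

    import cv2
    import numpy as np

    def rectify_part(image, detected_keypoints, rectified_coords, out_size):
        # Estimate the 3x3 homogeneous matrix from >= 4 correspondences
        # between key points detected in the image and the predefined
        # rectified coordinates, then warp to the rectified view.
        src = np.asarray(detected_keypoints, dtype=np.float32)
        dst = np.asarray(rectified_coords, dtype=np.float32)
        matrix, _ = cv2.findHomography(src, dst, cv2.RANSAC)
        return cv2.warpPerspective(image, matrix, out_size)  # out_size = (width, height)

For example, the front face portion might be rectified by rectify_part(query_image, keypoints, OBJECT_PORTIONS_DEFINITION["FrontFace"], (200, 80)), where keypoints are the four detected vehicle points for that part.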

[0067] The estimated affine transformations may be applied to each image portion identified to rectify the visual appearance difference, for example, caused by one or more cameras’ viewpoint change. Each transformed image portion 710 corresponding to an object portion identified in the query image 702 may be compared to a corresponding reference image portion 720 of a corresponding object portion of a reference object to determine a similarity of the transformed image portion 710 corresponding to the object portion (i.e., comparing image portions of corresponding object portion pairs).

[0068] Referring to FIG. 7A, in various example embodiments, the corresponding reference image portion 720 of the corresponding object portion of the reference object may be obtained from a database 725. For example, a reference image of the reference object may be obtained, pre-processed, and stored in the database 725. For example, pre-processing of the reference image may similarly include dividing the reference image into multiple reference image portions corresponding to multiple reference object portions, respectively, according to the plurality of predefined object portions 706 related to the object (identifying at least one reference image portion of the reference image corresponding to at least one reference object portion of the reference object based on the plurality of predefined object portions), and performing a transformation of each reference image portion identified to obtain transformed reference image portions corresponding to the reference object portions identified in the reference image. The transformed reference image portions may be stored in the database 725 or cached to speed up the image processing, for example, if an image gallery (comprising the reference image) is seldom changed.

[0069] In another example embodiment, processing of the reference image may be performed in real-time. Referring to FIG. 7B, a reference image 732 of the reference object may be obtained, for example, from an image gallery. The reference image 732 may be processed in the same or similar manner to the processing of the query image 702 of the object (e.g., dividing the reference image into multiple reference image portions corresponding to multiple reference object portions, respectively, according to the plurality of predefined object portions 706 related to the object, and performing a transformation of each reference image portion identified to obtain transformed reference image portions corresponding to the reference object portions identified in the reference image). In various example embodiments, the reference object in the reference image 732 may be known. For example, the reference object in the reference image may have been registered in a database. Accordingly, an example implementation (or use case) may be to identify the object in the query image 702 by comparing the query image captured in real-time with the reference images of reference objects that have already been registered (e.g., reference images of reference objects in the image gallery).

[0070] By comparing the image portions of each corresponding object portion (e.g., vehicle part) pair of the query image 702 and the reference image, the similarity of the object portion pair may be determined. In various example embodiments, a determination of whether the object corresponds to the reference object may be made, for example, based on the similarities of the image portions of the corresponding object portion pairs (e.g., the image portion corresponding to the front face of the vehicle from the query image compared to the reference image portion corresponding to the front face of the vehicle in the reference image).

[0071] FIG. 8 illustrates another exemplary schematic 800 of image processing for object identification according to various example embodiments of the present invention.

[0072] FIG. 9 shows an exemplary process flow 900 of image processing for object identification according to various example embodiments of the present invention. The process flow 900, for example, illustrates processing from vehicle detection to transformation (e.g., affine rectification) for one image portion of the image corresponding to one object portion of the object.

[0073] At 910, an image region corresponding to the vehicle may be detected (vehicle detection). For example, the output of vehicle detection may be a rectangular bounding box including the vehicle (object). At 920, the image region corresponding to the vehicle may be cropped using this bounding box. For example, vehicle detection processing may be applied to detect the location of the vehicle in the query image (i.e., the pixel coordinates corresponding to the vehicle inside the original query image). Various algorithms may be used for the vehicle detection processing, such as Yolov3, which uses a deep learning model to detect the location of an object in an image (bounding box), as described in Redmon, Joseph, and Ali Farhadi, "Yolov3: An incremental improvement," arXiv preprint arXiv:1804.02767 (2018), and methods as mentioned in Sun, Zehang, George Bebis, and Ronald Miller, "On-road vehicle detection: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence 28.5 (2006): 694-711, which utilize color, shadow, corner and/or texture information to detect the vehicle’s location in an image, the contents of which are hereby incorporated by reference in their entirety for all purposes.
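A minimal Python sketch of steps 910-920 is shown below. Here, detect_vehicle is a hypothetical stand-in for any such detector (e.g., a Yolov3 wrapper) returning a single bounding box; it is not part of the disclosed method.

    # Steps 910-920: detect the vehicle and crop its bounding box.
    # detect_vehicle() is hypothetical and returns (x, y, width, height)
    # in pixel coordinates of the query image.
    def crop_vehicle(image):
        x, y, w, h = detect_vehicle(image)
        return image[y:y + h, x:x + w]  # NumPy slicing crops the box region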

[0074] At 930, key point detection may be performed on the cropped image. It should be noted that key points may also be detected directly from the original image of the vehicle as captured; for example, steps 910 and 920 may be omitted. However, it is noted that key point detection on the cropped image may be easier because the search area is reduced to the bounding box area rather than the whole image. Key points corresponding to each predefined object portion (vehicle part) may be detected respectively. These key points may be detected simultaneously or one by one. It is important to know the relation between the detected key points and the predefined vehicle points in the object portions definition data set related to the object (i.e., between the set of key points and the corresponding set of rectified coordinates defined for the corresponding predefined object portion in the object portions definition data set). For example, four key points, i.e., A, B, C, D, are detected, where A corresponds to the first vehicle point of vehicle part P, B corresponds to the second vehicle point of vehicle part P, etc. Such correspondence information may be provided by the key point detection algorithm (the machine learning model for the key point detection).

[0075] At 940, a correspondence relationship between the set of key points detected for the image portion and the set of rectified coordinates of the corresponding predefined object portion may be determined. In various example embodiments, the correspondence relationship may be an affine transformation matrix. Each object portion (e.g., vehicle part) has its own affine matrix, and the corresponding affine matrix may be applied to its corresponding vehicle part.

[0076] The estimated affine matrices are used for transforming (or rectifying) each image portion identified. At 950, a transformation may be performed on the image portion identified. For example, an image portion corresponding to an object portion may be cropped from the image of the vehicle, and a transformation (e.g., affine rectification) may be applied to obtain a transformed image portion. Alternatively, the transformation may be applied to the whole image, and then the transformed image portion corresponding to the object portion (vehicle part) may be cropped using the corresponding set of rectified coordinates.

[0077] FIG. 10 shows an exemplary schematic diagram of a process 1000 of image processing for object identification according to various example embodiments of the present invention. An exemplary transformed image portion 1010 is illustrated. FIG. 11 shows two exemplary images 1110a and 1120a of an object, each comprising an image portion corresponding to an object portion of the object identified based on a plurality of predefined object portions related to the object. Transformed image portions 1110b, 1120b corresponding to the object portions may be obtained by applying a transformation (e.g., an affine transformation) based on an object portions definition data set related to the object. It can be seen that the transformed image portions 1110b, 1120b have a much more similar visual appearance after the transformation.

[0078] In FIGS. 10 and 11, each image portion corresponding to each object portion (e.g., vehicle part) is illustrated as being transformed (rectified) to a top-down view, which is easy for humans to understand. However, it is not necessary to transform each image portion corresponding to each object portion to a top-down view. FIG. 12 illustrates a schematic 1200 in relation to key point detection and affine rectification for two different vehicle parts. As shown in FIG. 12, an image portion corresponding to an object portion may be transformed to a “skewed” view 1210 in the transformed image portion. The orientation of the transformed image portion may be implicitly defined by the predefined set of rectified coordinates in the object portions definition data set related to the object.

[0079] FIG. 13 shows another exemplary process flow 1300 of image processing for object identification according to various example embodiments of the present invention. The process flow 1300, for example, illustrates processing from key point detection to transformation (e.g., affine rectification) for each image portion of the image corresponding to each object portion of the object based on a plurality of predefined object portions related to the object. The process flow 1300 is similar to the process flow 900 described with respect to FIG. 9 and will not be discussed in detail in the interest of brevity. For example, for each predefined object portion i of the total number of predefined object portions n, key point detection corresponding to the predefined object portion (vehicle part) may be performed (e.g., to identify the image portion corresponding to the object portion of the object based on the plurality of predefined object portions related to the object), a correspondence relationship between the set of key points detected for the image portion and the set of rectified coordinates of the corresponding predefined object portion may be determined (e.g., affine transformation matrix estimation), and a transformation may be performed on the image portion identified, as shown in the sketch below.
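By way of illustration only, a minimal Python sketch of process flow 1300 is shown below, reusing the hypothetical OBJECT_PORTIONS_DEFINITION and rectify_part sketched earlier; detect_keypoints is likewise a hypothetical key point model returning the detected points for one part in the same order as its rectified coordinates, or None if the part is not visible.

    # Process flow 1300: for each of the n predefined object portions,
    # detect key points, estimate the matrix, and rectify the portion.
    def rectify_all_parts(image):
        rectified = {}
        for part_name, coords in OBJECT_PORTIONS_DEFINITION.items():
            keypoints = detect_keypoints(image, part_name)  # hypothetical
            if keypoints is None:  # part not visible in this view
                continue
            width = int(max(x for x, _ in coords))
            height = int(max(y for _, y in coords))
            rectified[part_name] = rectify_part(image, keypoints, coords,
                                                (width, height))
        return rectified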

[0080] In various example embodiments, by comparing the transformed (rectified) image portions of corresponding vehicle parts, the similarity for corresponding object portion pairs from different images (e.g., the query image and the reference image) may be computed. Various algorithms may be used to compute the similarity between two images. For example, the similarity between the image portion from the query image and the image portion from the reference image may be computed based on pixel intensity correlation, structure similarity, or image feature distance. In various example embodiments, the similarity of the at least one image portion identified may be determined based on pixel intensity correlation, such as using the Normalized Cross Correlation (NCC). Equation (1) as follows shows a formula for calculating the NCC between the rectified vehicle part pair from a query image and a reference image:

\[
\mathrm{NCC} = \frac{\sum_{i,j} Q(i,j)\, G(i,j)}{\sqrt{\sum_{i,j} Q(i,j)^2}\, \sqrt{\sum_{i,j} G(i,j)^2}} \tag{1}
\]

where Q(i,j) is the pixel intensity at location (i,j) of the query image portion, and G(i,j) is the pixel intensity at location (i,j) of the reference image portion.
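A direct NumPy transcription of Equation (1) might read as follows; it assumes the two rectified image portions have already been converted to grayscale and share the same pixel dimensions.

    import numpy as np

    def ncc(q, g):
        # Normalized cross correlation per Equation (1): q and g are
        # equally sized grayscale image portions (2D arrays).
        q = q.astype(np.float64).ravel()
        g = g.astype(np.float64).ravel()
        return float(np.dot(q, g) / (np.linalg.norm(q) * np.linalg.norm(g)))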

[0081] Other techniques, such as template matching methods, may be implemented for calculating the similarity of each corresponding object portion pair. Another common approach for comparing two images is to extract features from both images and compute the feature distance (e.g., the Euclidean distance) as a measure of similarity (or dissimilarity), i.e., the image feature distance. In addition to handcrafted image features (i.e., image features which need to be designed manually by a researcher or engineer), deep learning/machine learning models such as CNN models may also be used for image feature extraction.
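A minimal sketch of the feature distance approach is shown below; extract_features is a hypothetical stand-in for any feature extractor (a handcrafted descriptor or a CNN embedding model) returning a fixed-length NumPy vector.

    import numpy as np

    def feature_distance(query_part, reference_part):
        fq = extract_features(query_part)      # hypothetical extractor
        fg = extract_features(reference_part)
        # Euclidean distance between feature vectors: a lower value
        # means the two image portions are more similar.
        return float(np.linalg.norm(fq - fg))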

[0082] In various example embodiments, an overall similarity may be determined from the similarities of the at least one image portion identified. For example, in relation to NCC, the NCC for each image portion identified may be computed, and the average of all NCCs may be computed to obtain the overall similarity value. For example, when the overall similarity value is higher than a predefined threshold, the two vehicles in the query image and the reference image may be recognized as the same vehicle, the same class of vehicle, the same type of vehicle, etc.
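For illustration, a minimal sketch of this aggregation and decision step might read as follows, assuming part_similarities maps part names to per-part similarity values (e.g., NCC values) and THRESHOLD is a predefined threshold; an optional per-part weight dictionary gives the weighted variant described below with reference to FIG. 14.

    def overall_similarity(part_similarities, weights=None):
        # Average the per-part similarities; if per-part weights are
        # given (covering all parts present), compute a weighted
        # average instead.
        if weights is None:
            return sum(part_similarities.values()) / len(part_similarities)
        total = sum(weights[p] for p in part_similarities)
        return sum(s * weights[p] for p, s in part_similarities.items()) / total

    # Decision: recognize the same vehicle if the overall similarity
    # exceeds the predefined threshold.
    # is_same = overall_similarity(sims) > THRESHOLD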

[0083] FIG. 14 shows an exemplary process flow 1400 for determining a similarity of at least one transformed image portion corresponding to at least one object portion with respect to at least one corresponding reference image portion of at least one corresponding object portion of a reference object (e.g., comparing image portions corresponding to vehicle parts) according to various example embodiments of the present invention.

[0084] As shown in FIG. 14, after computing the similarity for each vehicle part, a score may be estimated from the similarities. For example, an average over all similarities may be determined, or the maximum/minimum similarity value may be taken as the score. Alternatively, a coefficient for each vehicle part may be defined, and the weighted sum of the similarities may be computed as the score. The score may be used for making a final identification decision. For example, two vehicles captured in two different images may be identified to be the same object if the score is higher than a predetermined threshold.

[0085] In various example embodiments, the image processing method according to various example embodiments of the present invention may be combined with vehicle number plate recognition, as illustrated in FIG. 15. In this case, the recognized “vehicle number information” may or may not contain the complete vehicle number. The confidence of the vehicle number information may be used as a coefficient when computing the similarity between two vehicle numbers. This is similar to the “weighted similarity” described above. The vehicle number information, for example, may be recognized letters and numbers. In this case, the matching rate may be computed as the similarity (e.g., the number of characters that are correctly matched). In other embodiments, the vehicle number information may be an image feature extracted from the number plate area. In this case, the extracted image feature may be viewed as encoded “letters and numbers”, and the feature distance may be used for determining the similarity. The similarity for the detected number plate area may be computed and combined with the determined similarities from the other vehicle parts.

[0086] While embodiments of the invention have been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.