KANG QIYU (SG)
SHE RUI (SG)
TAY WEE PENG (SG)
NAVARRO NAVARRO DIEGO (SG)
KHURANA RITESH (SG)
WANG SIJIE (SG)
UNIV NANYANG TECH (SG)
KAYA ET AL: "Deep Metric Learning: A Survey", SYMMETRY, vol. 11, no. 9, 21 August 2019 (2019-08-21), pages 1066, XP055838320, DOI: 10.3390/sym11091066
XIANGLI YANG ET AL: "A Survey on Deep Semi-supervised Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 23 August 2021 (2021-08-23), XP091024668
YIXIN LIU ET AL: "Graph Self-Supervised Learning: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 5 August 2021 (2021-08-05), XP091025019
Patent claims

1. Computer implemented method (600) for image segmentation matching of a first segmentation yk of a first image with a second segmentation xk of a second image, where the first segmentation yk and the second segmentation xk are a pair of segmentations, comprising the steps: learning (602) joint features and metric of the first segmentation yk and the second segmentation xk using joint feature and metric learning; and regulating (604) the joint feature and metric learning using a graph Gyk containing a global spatial relationship between the first segmentation yk corresponding to the second segmentation xk and neighboring segmentations of the first segmentation yk; and using the second segmentation xk and a graph Gxk containing a global spatial relationship between the second segmentation xk and neighboring segmentations of the second segmentation xk.

2. Computer implemented method (600) according to claim 1, wherein the method is executed within a framework (100), the framework (100) comprising a basic model (102) for the joint feature and metric learning, a graph-based regularization part (104) for regulating the joint feature and metric learning, and a loss layer (106), wherein the basic model (102) comprises a first image feature extraction module (111), a second image feature extraction module (112) and a fully connected neural network (116), wherein the method further comprises the steps: extracting (702) first image features f(yk) of the first segmentation yk by the first image feature extraction module (111) and outputting the extracted first image features f(yk) to the fully connected neural network (116); extracting (704) second image features f(xk) of the second segmentation xk by the second image feature extraction module (112) and outputting the extracted second image features f(xk) to the fully connected neural network (116); comparing (706) the extracted first image features f(yk) and the extracted second image features f(xk) and outputting a predicted result to the loss layer (106), by the fully connected neural network (116); calculating (708) a total loss ltotal as a sum of a loss lce based on the predicted result and a loss lreg based on the result of the graph-based regularization part, by the loss layer (106).

3. Computer implemented method (600) according to claim 2, wherein the graph-based regularization part (104) comprises a third image feature extraction module (113), a fourth image feature extraction module (114), a graph attention network (120), and a discriminator (108); wherein the method further comprises the steps: extracting (802) third image features f(xk) of the second segmentation xk by the third image feature extraction module (113) and outputting the extracted third image features f(xk) to the discriminator (108); extracting fourth image features f(yl) of neighboring segmentations contained in graph Gyk of the first image segmentation yk by the fourth image feature extraction module (114) and outputting the extracted fourth image features f(yl) to the graph attention network (120); executing (804) attention functions over the fourth image features f(yl) in a multi-head module (128) to obtain high dimensional features g(Gyk) of the neighboring segmentations, and outputting the high dimensional features g(Gyk) of the neighboring segmentations to the discriminator (108), by the graph attention network (120); and comparing (806) the extracted third image features f(xk) with the high dimensional features g(Gyk) of the neighboring segmentations contained in graph Gyk in layers of the discriminator (108) and outputting the result Lempirical-ID(x, Gy) to the loss layer (106), by the discriminator (108).

4. Computer implemented method (600) according to claim 2 or 3, wherein additionally the steps of claim 3 are performed with respect to image features f(yk) of the first segmentation yk and image features f(xl) of neighboring segmentations contained in graph Gxk of the second image segmentation xk, to compare third image features f(yk) of the first segmentation yk with the high dimensional features g(Gxk) of the neighboring segmentations in layers of the discriminator (108) and to output the result Lempirical-ID(y, Gx) to the loss layer (106), by the discriminator (108).

5. Computer implemented method (600) according to claim 3 or claim 4, wherein the discriminator (108) comprises a bilinear layer with a trainable matrix M.

6. Computer implemented method (600) according to claim 5, wherein the bilinear layer form is given as d(a, b) = σ(aᵀ M b), where σ(·) denotes the sigmoid function, aᵀ denotes the transpose of the vector a, and a and b are placeholders for f(xk) or f(yk), wherein the discriminator computes a loss Lpair-empirical-ID according to a discrimination function, which is defined as Lpair-empirical-ID = Lempirical-ID(x, Gy) + Lempirical-ID(y, Gx), wherein 1{·} is an indicator and denotes a match or mismatch with respect to a kth segmentation pair (xk, yk).

7. Computer implemented method (600) according to claim 6, wherein the loss lreg is calculated as lreg = -Lpair-empirical-ID, by the loss layer (106).

8. Computer implemented method (600) according to any one of claims 2 - 7, wherein a loss function for calculating the loss lce is a cross entropy over pairs of segmentation samples using a ground-truth label, the predicted result from the basic model (102), and the result obtained from the graph-based regularization part (104); wherein the total loss is the weighted sum of the losses lce and lreg.

9. Computer implemented method (600) according to any one of the previous claims, wherein the loss function lce is defined over the k-th pair of segmentation samples, using a ground-truth label αk and a predicted result from the basic model (102) based on a sigmoid function.

10. Computer implemented method (600) according to any of the previous claims, wherein the first image contains a street scene from a first perspective under first environmental conditions and the second image contains a street scene from a second perspective under second environmental conditions.

11. Computer implemented method (600) according to any of the previous claims, wherein the first and the second image segmentations comprise spatial information.

12. Computer implemented method (600) according to any of the previous claims, wherein the first (111), the second (112), the third (113), and the fourth (114) image feature extraction modules are one shared network (110), which serves as feature descriptor function f and extracts the first, second, third and fourth image features as high dimensional features from the respective image segmentations.

13. Computer implemented method (600) according to the previous claim, wherein the shared network (110) is a residual neural network, ResNet.

14. Computer readable medium on which a computer program for image segmentation matching of a first segmentation of a first image with a second segmentation of a second image is stored, the computer program comprising a basic model module (102) configured to learn joint features and metric of a first segmentation of a first image and a second segmentation of a second image, and a graph-based regularization part (104) configured to regulate the joint feature and metric learning using a graph Gyk containing a global spatial relationship between the first segmentation and neighboring segmentations of the first segmentation and a graph Gxk containing a global spatial relationship between the second segmentation and neighboring segmentations of the second segmentation.

15. Processing circuitry configured to run a computer program for image segmentation matching of a first segmentation of a first image with a second segmentation of a second image, the computer program comprising a basic model module configured to learn joint features and metric of a first segmentation of a first image and a second segmentation of a second image, and a graph-based regularization part configured to regulate the joint feature and metric learning using a graph Gyk containing a global spatial relationship between the first segmentation and neighboring segmentations of the first segmentation and a graph Gxk containing a global spatial relationship between the second segmentation and neighboring segmentations of the second segmentation.
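The bilinear discriminator of claims 5 and 6 admits a compact numeric sketch. The following is a minimal illustration of d(a, b) = σ(aᵀ M b) in plain Python, assuming small dense feature vectors; in practice the matrix M is trained jointly with the rest of the network rather than fixed as below:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminate(a, b, M):
    """Bilinear discriminator d(a, b) = sigmoid(a^T M b) with matrix M
    (claims 5 and 6); returns a match probability in (0, 1)."""
    Mb = [sum(M[i][j] * b[j] for j in range(len(b))) for i in range(len(M))]
    return sigmoid(sum(a[i] * Mb[i] for i in range(len(a))))

# With M = identity this reduces to the sigmoid of the dot product a . b
I = [[1.0, 0.0], [0.0, 1.0]]
p_match = discriminate([1.0, 0.0], [1.0, 0.0], I)     # aligned features
p_mismatch = discriminate([1.0, 0.0], [0.0, 1.0], I)  # orthogonal features
```

Aligned features score above 0.5 and orthogonal features score exactly 0.5 under the identity matrix, which illustrates why a trainable M is needed to shape the decision boundary.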
Fig. 6 shows a flow diagram of a computer implemented method 600 for image segmentation matching of a first segmentation yk of a first image with a second segmentation xk of a second image, where the first segmentation yk and the second segmentation xk are a pair of segmentations, comprising the steps: learning 602 joint features and metric of the first segmentation yk and the second segmentation xk using joint feature and metric learning; and regulating 604 the joint feature and metric learning using a graph Gyk containing a global spatial relationship between the first segmentation yk corresponding to the second segmentation xk and neighboring segmentations of the first segmentation yk; and using the second segmentation xk.
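Step 604 relies on neighborhood graphs such as Gyk that capture global spatial relationships between a segmentation and its neighboring segmentations. The disclosure does not prescribe how these graphs are built from an image, so the following is a hypothetical sketch that connects each segmentation, represented here by its centroid, to its k nearest neighbors:

```python
import math

def build_neighborhood_graph(centroids, k=2):
    """Hypothetical graph construction: connect each segmentation (given by
    its centroid in image coordinates) to its k nearest neighbors.
    Returns an adjacency list {index: [neighbor indices]}."""
    graph = {}
    for i, ci in enumerate(centroids):
        dists = sorted(
            (math.dist(ci, cj), j)
            for j, cj in enumerate(centroids) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Example: four segmentation centroids; the fourth is spatially isolated
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
G = build_neighborhood_graph(centroids, k=2)
```

Any construction that preserves the spatial arrangement of neighboring segmentations would serve the same role; the k-nearest-neighbor rule above is only one plausible choice.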
It is noted that the steps described herein may be performed in parallel or in another order where meaningful.
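Claims 2, 7 and 8 combine a prediction loss lce with a regularization loss lreg = -Lpair-empirical-ID into a weighted total loss ltotal. A minimal numeric sketch, assuming a standard binary cross-entropy form for lce (the claimed equation itself is not reproduced in this text) and a hypothetical weight hyperparameter:

```python
import math

def cross_entropy(pred, label):
    """Binary cross-entropy for one segmentation pair (label 1 = match).
    Assumed form; the claims only state that l_ce is a cross entropy."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(pred + eps) + (1 - label) * math.log(1.0 - pred + eps))

def total_loss(preds, labels, l_pair_empirical_id, weight=0.1):
    """l_total as a weighted sum of l_ce and l_reg, with
    l_reg = -L_pair-empirical-ID (claims 7 and 8).
    `weight` is an assumed hyperparameter, not taken from the claims."""
    l_ce = sum(cross_entropy(p, a) for p, a in zip(preds, labels)) / len(preds)
    l_reg = -l_pair_empirical_id
    return l_ce + weight * l_reg

# Two segmentation pairs: one confident match, one confident mismatch
l = total_loss([0.9, 0.1], [1, 0], l_pair_empirical_id=0.5)
```

A larger discriminator score Lpair-empirical-ID makes lreg more negative and so lowers the total loss, which is how the graph-based regularization rewards agreement between segment features and their neighborhood graphs.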
Fig. 7 shows a flow diagram based on the method 600 performed within the framework 100 and in particular within the basic model 102 presented in this disclosure. The flow diagram in Fig. 7 shows the part of the joint feature and metric learning 602 in more detail and uses the output of step 604 as input to the loss layer. The steps of the joint feature and metric learning 602 are the following: Extracting 702 first image features f(yk) of the first segmentation yk by the first image feature extraction module 111 and outputting the extracted first image features f(yk) to the fully connected neural network 116. Extracting 704 second image features f(xk) of the second segmentation xk by the second image feature extraction module 112 and outputting the extracted second image features f(xk) to the fully connected neural network 116. Comparing 706 the extracted first image features f(yk) and the extracted second image features f(xk) and outputting a predicted result to the loss layer 106, by the fully connected neural network 116. A further step 708 is performed using the output of step 604: Calculating 708 a total loss ltotal as a sum of the loss lce based on the predicted result and a loss lreg based on the result of the graph-based regularization part, by the loss layer 106. Fig. 8 shows a flow diagram based on the method 600 that shows step 604 in more detail. This part is also performed within the framework 100, in particular within the graph-based regularization part 104. This method part comprises the following steps: Extracting 802 third image features f(xk) of the second segmentation xk by the third image feature extraction module 113 and outputting the extracted third image features f(xk) to the discriminator 108.
Extracting 804 fourth image features f(yl) of neighboring segmentations contained in graph Gyk of the first image segmentation yk by the fourth image feature extraction module 114 and outputting the extracted fourth image features f(yl) to the graph attention network 120. Executing 806 attention functions over the fourth image features f(yl) in a multi-head module 128 to obtain high dimensional features g(Gyk) of the neighboring segmentations, and outputting the high dimensional features g(Gyk) of the neighboring segmentations to the discriminator 108, by the graph attention network 120. Comparing 808 the extracted third image features f(xk) with the high dimensional features g(Gyk) of the neighboring segmentations contained in graph Gyk in layers of the discriminator 108 and outputting the result Lempirical-ID(x, Gy) to the loss layer 106, by the discriminator 108.

In the following, a theoretical analysis is provided for the graph-based regularization.

Assumption 1. Let a random sample pair consist of an image segmentation and a neighborhood graph, drawn from the two frames respectively, and let these random sample pairs be independent and identically distributed (i.i.d.). The corresponding pair of high-dimensional features is output by the Resnet and the GAT, respectively; with respect to these mappings, the resulting feature pairs are also i.i.d. Let a probability space be given for these feature pairs, comprising a sample space, the collection of all subsets of that sample space, and a probability measure. The matched conditional distribution and the unmatched conditional distribution are defined over this space, with densities conditioned on whether the two image segmentations x and y are matched.

When Assumption 1 is satisfied, we shall discuss some theoretical characteristics of the objective function optimization with respect to the graph-based regularization. We first state the expected form of the objective in Proposition 1 (Relationship with KL divergence).
In terms of the optimization for graph-based regularization, it is essential to update the parameters of the neural networks by maximizing LID given by (6). Suppose that the optimal discriminator d* is obtained, with a corresponding objective function. In this case, maximizing this objective is equivalent to maximizing an upper bound of the Kullback-Leibler (KL) divergence between the matched and unmatched conditional distributions.

Proposition 2 (Optimal solution for the regularization). The optimization problem with respect to the regularization can be described with the objective given by Eq. (4), where f and g denote the Resnet and the GAT respectively, and d is the discriminator. In this case, the optimal solution is reached when the unmatched conditional distribution lies on the boundary of its range.

Besides, we also investigate how the optimization objective function of the regularization is influenced by a disturbance of the discriminator.

Proposition 3 (Effect of discriminator disturbance). Consider the case where there exists a disturbance on the discriminator d, parameterized by a small enough parameter ε. For the optimization objective function LID given by Eq. (4), a correspondingly disturbed objective function is obtained. Furthermore, when the optimal discriminator d* is obtained as in Eq. (8), the objective function based on the optimal discriminator with a disturbance is obtained analogously.

Moreover, we shall discuss how the mapping from vertices to neighborhood graphs affects the matching effectiveness in some special cases.

Case 1. There are two sampled frames in which the same vertices exist, implying that the image segmentations of objects and the corresponding neighborhood graphs are common to both frames. Let x and y denote the random vertices from the frames, with their respective neighborhood graphs. Besides, the high-dimensional features for x and y are produced by bijective mappings represented by two neural networks.
Proposition 4 (Relationship with the mapping in Case 1). When Case 1 is satisfied, if the mapping from vertices to their neighborhood graphs is injective, correct image segmentation matching will be achieved for each pair of frames.

Case 2. For a pair of frames, there exist some uncommon vertices, so that their corresponding neighborhood graphs are also not the same, which is a different condition from that in Case 1; the remaining conditions are similar to those in Case 1.

Proposition 5 (Relationship with the mapping in Case 2). When Case 2 is satisfied, if there exists an injective mapping from vertices to their neighborhood graphs, we obtain correct results for the image segmentation matching.

Performance Evaluation

Performance on the KITTI dataset. This dataset is available online and provides a large amount of multi-sensor data for autonomous driving. It contains street scene images and the corresponding LiDAR points. Generally speaking, the proposed framework 100, which is also referred to as REGR Net in the following, is superior to the other methods such as MatchNet, Siamese Network, TFeat Network, L2-Net, HardNet, SOSNet, or Res-Matching Network, although there is not much difference between it and Res-Matching Network under the Recall measurement. Moreover, it is also not difficult to observe that REGR Net performs more stably than the other methods as training approaches convergence. Actually, these methods have almost converged after around 15 epochs. As a result of the experiments, the proposed method performs better than the other methods under all of the criteria. Consequently, the regularization proposed in this disclosure improves the matching efficiency by introducing graph-based information.

Performance on a real dataset. The real dataset is a dataset for autonomous driving applications, which is collected by a probe vehicle with many sensors including cameras, LiDAR, radars, etc.
Similar to the KITTI dataset, it also contains images and LiDAR points, which are captured on the streets of Singapore. As for the objects contained in an image, there are more numerous and more varied landmarks, such as traffic signs, traffic lights and poles, in this dataset than in the KITTI dataset. When running experiments on the real dataset, the proposed method almost always holds the optimal performance compared with the other methods, similar to its behavior on the KITTI dataset. Moreover, since the real dataset and the KITTI dataset have different image quality and are collected in different street scenes, the performance on the real data differs from that on the KITTI data. That is, all the discussed methods perform better on the real data. In general, the proposed method retains its superiority in matching prediction in the case of high-quality image datasets with a larger number of valuable objects.

Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from the study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items or steps recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims should not be construed as limiting the scope of the claims.
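The multi-head attention aggregation performed by the GAT network 120 (steps 804 and 806 above) can be sketched as follows. This is a deliberately simplified illustration, assuming scalar per-head weights and plain dot-product scoring; an actual graph attention network applies learnable linear transforms and LeakyReLU attention scoring before the softmax:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(query, neighbors, w):
    """One attention head: score each neighbor feature f(yl) against the
    query with a dot product scaled by the head weight w, then return the
    attention-weighted sum of the neighbor features."""
    scores = [w * sum(q * n for q, n in zip(query, nb)) for nb in neighbors]
    alphas = softmax(scores)
    dim = len(neighbors[0])
    return [sum(a * nb[d] for a, nb in zip(alphas, neighbors)) for d in range(dim)]

def multi_head_attention(query, neighbors, head_weights):
    """Concatenate the outputs of several heads (the multi-head module 128),
    yielding a higher-dimensional feature g(Gyk) of the neighborhood."""
    out = []
    for w in head_weights:
        out.extend(attention_head(query, neighbors, w))
    return out

# Aggregate two neighbor features into g(Gyk) with two heads
g_Gyk = multi_head_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                             head_weights=[1.0, 2.0])
```

Concatenating the heads, rather than averaging them, is what produces the high dimensional neighborhood feature that the discriminator 108 then compares against the segment features.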
List of reference signs
100 Framework
102 Basic model
104 Graph-based regularization part
106 Loss layer
108 Discriminator
110 Shared image feature extraction module, Resnet
111 First image feature extraction module, Resnet
112 Second image feature extraction module, Resnet
113 Third image feature extraction module, Resnet
114 Fourth image feature extraction module, Resnet
116 Fully connected layer
120 GAT (graph attention) network module with GAT blocks
121 GAT block
122 Module with information of edges
124 Neighborhood graphs handling network
128 Multi-head module
129 Concatenation module
131 Image segmentation of a first image
132 Image segmentation of a second image
133 Image segmentations of the first (or second) image
134 Neighborhood graphs
301 Convolution layer
302 Max pooling
304 Average pooling
310 Resnet convolution layer block
401 Fully connected layer
402 ReLU or LeakyReLU layer
403 Sigmoid layer
501 Bilinear layer
502 Sigmoid layer
600 Method (flow diagram)
602 First step of method part 1
604 Second step of method part 1
702 First sub-step of first step 602
704 Second sub-step of first step 602
706 Third sub-step of first step 602
708 Further step of method part 1
802 First sub-step of second step 604
804 Second sub-step of second step 604
806 Third sub-step of second step 604