

Title:
SIMILARITY RETRIEVAL
Document Type and Number:
WIPO Patent Application WO/2023/011943
Kind Code:
A1
Abstract:
The following disclosure relates to the field of data analysis, in particular medical data analysis, and more particularly to systems, apparatuses, and methods for processing data, in particular medical data, stored in different modalities, so-called multi-modal data. In some embodiments, the disclosure relates to similarity retrieval for input data, in particular medical input data.

Inventors:
VOGLER STEFFEN (DE)
HOEHNE JOHANNES (DE)
LENGA MATTHIAS (DE)
Application Number:
PCT/EP2022/070592
Publication Date:
February 09, 2023
Filing Date:
July 22, 2022
Assignee:
BAYER AG (DE)
International Classes:
G06N3/04; G06N3/08
Domestic Patent References:
WO2018140225A12018-08-02
WO2020132674A12020-06-25
WO2019217152A12019-11-14
Other References:
FANGXIANG FENG ET AL: "Correspondence Autoencoders for Cross-Modal Retrieval", ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, ASSOCIATION FOR COMPUTING MACHINERY, US, vol. 12, no. 1s, 21 October 2015 (2015-10-21), pages 1 - 22, XP058077001, ISSN: 1551-6857, DOI: 10.1145/2808205
JONAS DIPPEL ET AL: "Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 April 2021 (2021-04-09), XP081936849
AGISILAOS CHARTSIAS ET AL: "Multimodal MR Synthesis via Modality-Invariant Latent Representation", IEEE TRANSACTIONS ON MEDICAL IMAGING, vol. 37, no. 3, 1 March 2018 (2018-03-01), USA, pages 803 - 814, XP055656238, ISSN: 0278-0062, DOI: 10.1109/TMI.2017.2764326
XI CHENG ET AL: "Deep similarity learning for multimodal medical images", COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION, 6 April 2016 (2016-04-06), GB, pages 1 - 5, XP055414487, ISSN: 2168-1163, DOI: 10.1080/21681163.2015.1135299
NGIAM JIQUAN ET AL: "Multimodal Deep Learning", 1 May 2011 (2011-05-01), pages 1 - 8, XP055836369, Retrieved from the Internet [retrieved on 20210831]
N. FAREED: "Intelligent High Resolution Satellite/Aerial Imagery", ADVANCES IN REMOTE SENSING, vol. 03, 2014, pages 1 - 9
C. YANG ET AL.: "Using High-Resolution Airborne and Satellite Imagery to Assess Crop Growth and Yield Variability for Precision Agriculture", PROCEEDINGS OF THE IEEE, vol. 101, no. 3, March 2013 (2013-03-01), pages 582 - 592, XP011494138, DOI: 10.1109/JPROC.2012.2196249
P. BASNYAT ET AL.: "Agriculture field characterization using aerial photograph and satellite imagery", IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, vol. 1, no. 1, January 2004 (2004-01-01), pages 7 - 10, XP011107086, DOI: 10.1109/LGRS.2003.822313
G.A. TSIHRINTZIS, L.C. JAIN: "Learning and Analytics in Intelligent Systems", vol. 18, 2020, SPRINGER NATURE, article "Machine Learning Paradigms: Advances in Deep Learning-based Technological Applications"
K. GRZEGORCZYK: "Doctoral Dissertation", 2018, article "Vector representations of text data in deep learning"
D. ITZKOVICH ET AL.: "Using Augmentation to Improve the Robustness to Rotation of Deep Learning Segmentation in Robotic-Assisted Surgical Data", 2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), MONTREAL, QC, CANADA, 2019, pages 5068 - 5075, XP033593915, DOI: 10.1109/ICRA.2019.8793963
E. CASTRO ET AL.: "Elastic deformations for data augmentation in breast cancer mass detection", 2018 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL HEALTH INFORMATICS (BHI), 2018, pages 230 - 234, XP033345166, DOI: 10.1109/BHI.2018.8333411
Y.-J. CHA ET AL.: "Autonomous Structural Visual Inspection Using Region-Based Deep Learning for Detecting Multiple Damage Types", COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, vol. 00, pages 1 - 17
S. WANG ET AL.: "Multiple Sclerosis Identification by 14-Layer Convolutional Neural Network With Batch Normalization, Dropout, and Stochastic Pooling", FRONTIERS IN NEUROSCIENCE, vol. 12, pages 818, XP055818203, DOI: 10.3389/fnins.2018.00818
Z. WANG ET AL.: "CNN Training with Twenty Samples for Crack Detection via Data Augmentation", SENSORS, vol. 20, 2020, pages 4849
B. HU ET AL.: "A Preliminary Study on Data Augmentation of Deep Learning for Image Classification", COMPUTER VISION AND PATTERN RECOGNITION; MACHINE LEARNING (CS.LG); IMAGE AND VIDEO PROCESSING
R. TAKAHASHI ET AL.: "Data Augmentation using Random Image Cropping and Patching for Deep CNNs", JOURNAL OF LATEX CLASS FILES, vol. 14, no. 8, 2015
T. DEVRIES, G. W. TAYLOR: "Improved Regularization of Convolutional Neural Networks with Cutout", ARXIV: 1708.04552, 2017
Z. ZHONG ET AL.: "Random Erasing Data Augmentation", ARXIV: 1708.04896, 2017
S.M. MEYSTRE ET AL.: "Extracting information from textual documents in the electronic health record: a review of recent research", YEARB MED INFORM., 2008, pages 128 - 44
O. SUN: "MICE-DA: A MICE method with Data Augmentation for missing data imputation", IEEE ICHI 2019 DACMI CHALLENGE, 2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019, pages 1 - 3, XP033663475, DOI: 10.1109/ICHI.2019.8904724
V. MARIVATE, T. SEFARA ET AL.: "Machine Learning and Knowledge Extraction. CD-MAKE 2020. Lecture Notes in Computer Science", vol. 12279, 2020, SPRINGER, article "Improving Short Text Classification Through Global Augmentation Methods"
J. WEI, K. ZOU: "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks", ARXIV: 1901.11196
M. ABULAISH, A. K. SAH: "A Text Data Augmentation Approach for Improving the Performance of CNN", 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS), BENGALURU, INDIA, 2019, pages 625 - 630, XP033548850, DOI: 10.1109/COMSNETS.2019.8711054
T. CHEN ET AL.: "A simple framework for contrastive learning of visual representations", ARXIV:2002.05709, 2020
P. KHOSLA ET AL.: "Supervised Contrastive Learning", COMPUTER VISION AND PATTERN RECOGNITION
J. DIPPEL, S. VOGLER, J. HÖHNE: "Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling", ARXIV:2104.04323V1
A. RADFORD ET AL.: "Learning transferable visual models from natural language supervision", ARXIV:2103.00020, 2021, Retrieved from the Internet
O. RONNEBERGER ET AL.: "International Conference on Medical image computing and computer-assisted intervention", 2015, SPRINGER, article "U-net: Convolutional networks for biomedical image segmentation", pages: 234 - 241
G. HUANG ET AL.: "Densely connected convolutional networks", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2017, pages 2261 - 2269, XP033249569, DOI: 10.1109/CVPR.2017.243
Attorney, Agent or Firm:
BIP PATENTS (DE)
Claims:
CLAIMS

1. A computer-implemented method, the method comprising: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, providing training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects, receiving input data related to a first object, inputting the input data related to the first object into the trained machine learning model, receiving from the trained machine learning model a first representation of the first object via the third output layer, receiving at least one second representation of at least one second object, computing a similarity value, the similarity value indicating the similarity between the first representation and the at least one second representation, outputting the similarity value and/or information related to the at least one second object.

2. The method according to claim 1, wherein the first object, the at least one second object and each object of the multitude of objects is a human being, preferably a patient.

3. The method according to claim 2, wherein the training data and the input data related to the first object comprise personal data, the personal data being selected from one or more of the following group: age, height, weight, gender, eye color, hair color, skin color, blood group, blood pressure, resting heart rate, heart rate variability, vagus nerve tone, hematocrit, sugar concentration in urine, existing illnesses, existing conditions, pre-existing illnesses, pre-existing conditions, eyesight, consumption of alcohol, smoking, exercise, diet, information from an electronic medical record, self-assessment data, medical image(s), sound(s) from: heartbeat, breathing noise, cough, swallow, sneeze, clear throat, scratch, voice, noises when knocking against part(s) of the body and/or joint noise.

4. The method according to claim 1, wherein the first object, the at least one second object and each object of the multitude of objects is a plant or a plurality of plants or one or more parts of a plant.

5. The method according to claim 1, wherein the first object, the at least one second object and each object of the multitude of objects is a part of the Earth's surface.

6. The method according to any one of claims 1 to 5, wherein the first input data of the first modality and the second input data of the second modality comprise or are derived from one or more images, text files and/or audio files, wherein the first modality is different from the second modality.

7. The method according to any one of claims 1 to 6, wherein the first object is different from the at least one second object.

8. The method according to any one of claims 1 to 6, wherein the first object is identical to the at least one second object, wherein the first representation is a representation of the first object at a first point in time and the at least one second representation represents the first object at at least one second point in time.

9. The method according to any one of claims 1 to 8, wherein the machine learning model comprises a number k of input layers, and a number k+1 of output layers, wherein k is a natural number greater than two, and wherein each input layer is configured to receive input data of a different modality.

10. The method according to any one of claims 1 to 9, wherein the machine learning model is or comprises a deep neural network, wherein the deep neural network comprises, at least for the training, a first encoder, a first decoder, a second encoder, a second decoder, a fusion component, an attention weighted pooling, and a projection head, wherein the first encoder is configured to receive first masked input data, and to generate a first representation from the first masked input data, wherein the second encoder is configured to receive second masked input data, and to generate a second representation from the second masked input data, wherein the fusion component is configured to generate a joint representation from the first representation and the second representation, wherein the first decoder is configured to reconstruct the first augmented input data from the joint representation, wherein the second decoder is configured to reconstruct the second augmented input data from the joint representation, wherein the attention weighted pooling is configured to reduce the dimensions of the joint representation, wherein the projection head is configured to map the dimensionally reduced joint representation to a space where contrastive loss is applied.

11. The method according to any one of claims 1 to 10, wherein the training further comprises: computing a reconstruction loss for each reconstruction task, computing a contrastive loss for each discrimination task, computing a total loss on the basis of the reconstruction losses and the contrastive losses, modifying parameters of the machine learning model so that the total loss is minimized.

12. The method according to any one of claims 1 to 11, wherein, for each second object of a plurality of second objects, a similarity value is computed, the similarity value quantifying the similarity between a second representation of the second object and the first representation of the first object, wherein a number m of second objects is identified, the similarity values of the number m of second objects being greater than the similarity values of second objects not belonging to the number m of second objects, wherein m is a natural number greater than 0.

13. The method according to claim 12, wherein, for each second object of the number m of second objects, input data related to the second object is analyzed in order to identify data characterizing the second object that is not available for the first object.

14. A computer system comprising: a processor; and a memory storing an application program configured to perform, when executed by the processor, an operation, the operation comprising: receiving input data related to a first object, inputting the input data into a trained machine learning model, receiving from the trained machine learning model a first representation of the first object, receiving at least one second representation of at least one second object, computing a similarity value, the similarity value indicating the similarity between the first representation and the at least one second representation, outputting the similarity value and/or information related to the at least one second object, wherein the trained machine learning model was trained in a training process, the training process comprising the following steps: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, receiving training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects.

15. A non-transitory computer readable medium having stored thereon software instructions that, when executed by a processor of a computer system, cause the computer system to execute the following steps: receiving input data related to a first object, inputting the input data into a trained machine learning model, receiving from the trained machine learning model a first representation of the object, computing a similarity value, the similarity value indicating the similarity between the first representation and at least one second representation of at least one second object, outputting the similarity value and/or information related to the at least one second object, wherein the trained machine learning model was trained in a training process, the training process comprising the following steps: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, receiving training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects.

Description:
Similarity Retrieval

FIELD

The following disclosure relates to the field of data analysis, in particular medical data analysis, and more particularly to systems, apparatuses, and methods for processing data, in particular medical data, stored in different modalities, so-called multi-modal data. In some embodiments, the disclosure relates to similarity retrieval for input data, in particular medical input data.

BACKGROUND

Modern medicine is characterized by a wealth of data being aggregated about any given patient. Often, these data cover various quantities of interest such as the chemical composition of blood, structural properties of organs and bones captured in images obtained using various imaging techniques, specific test results of varying levels of confidence, etc. The data can assume one or more different modalities, each best suited to the type of quantity being sensed. Such modalities can be text, tables, diagrams, time series, images, 3D representations, audio and/or others.

Despite the total volume of data available from many patients, data on a single patient are often sparse and spread across a variety of modalities. Because systematic and comprehensive testing is rarely performed, gaps in the available data are almost inevitable for a given patient. It is the doctor's task to use these heterogeneous and incomplete data collections and reports - in the various forms in which they were documented - to assess the patient's current state of health. Integrating all of these data is a challenge; as a result, on top of the data that is missing in the first place, much of the data that is available is often neglected as not obviously relevant.

There is a desire to automatically generate an overall picture of a patient that is as complete as possible based on the available data across multiple modalities. This could support medical practitioners and lead to better reproducibility in diagnosis. To date, the success of the analysis of the diverse sources of patient data depends on the experience, knowledge, and attention of an individual medical practitioner. This can cause data to be disregarded, overlooked, or misinterpreted. In particular, data of different modalities (image, text, time series, diagrams, or others) may be difficult to integrate into a coherent diagnosis, even more so if very different pieces of data are available for every patient and some pieces are missing.

SUMMARY

The present disclosure addresses the problems mentioned above. The problems are solved by the subject matter of the independent claims of the present disclosure. Preferred embodiments are found in the dependent claims, in this description and in the drawings.

In a first aspect, the present disclosure provides a computer-implemented method, the method comprising: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, providing training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects, receiving input data related to a first object, inputting the input data related to the first object into the trained machine learning model, receiving from the trained machine learning model a first representation of the first object via the third output layer, receiving at least one second representation of at least one second object, computing a similarity value, the similarity value indicating the similarity between the first representation and the at least one second representation, outputting the similarity value and/or information related to the at least one second object.

In a second aspect, the present disclosure provides a computer system, the computer system comprising: a processor; and a memory storing an application program configured to perform, when executed by the processor, an operation, the operation comprising: receiving input data related to a first object, inputting the input data into a trained machine learning model, receiving from the trained machine learning model a first representation of the first object, receiving at least one second representation of at least one second object, computing a similarity value, the similarity value indicating the similarity between the first representation and the at least one second representation, outputting the similarity value and/or information related to the at least one second object, wherein the trained machine learning model was trained in a training process, the training process comprising the following steps: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, receiving training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects.

In a third aspect, the present invention provides a non-transitory computer readable medium having stored thereon software instructions that, when executed by a processor of a computer system, cause the computer system to execute the following steps: receiving input data related to a first object, inputting the input data into a trained machine learning model, receiving from the trained machine learning model a first representation of the first object, receiving at least one second representation of at least one second object, computing a similarity value, the similarity value indicating the similarity between the first representation and the at least one second representation, outputting the similarity value and/or information related to the at least one second object, wherein the trained machine learning model was trained in a training process, the training process comprising the following steps: providing a machine learning model, the machine learning model comprising:

• a first input layer,

• a second input layer,

• a first output layer,

• a second output layer, and

• a third output layer, receiving training data for training the machine learning model, wherein providing training data comprises:

• receiving, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality,

• generating first augmented input data from the first input data and second augmented input data from the second input data,

• generating first masked input data from the first augmented input data and second masked input data from the second augmented input data, training the machine learning model to perform a combined reconstruction and discrimination task, the training comprising:

• inputting the first masked input data into the first input layer,

• inputting the second masked input data into the second input layer,

• reconstructing the first augmented input data from the first masked input data via the first output layer,

• reconstructing the second augmented input data from the second masked input data via the second output layer,

• generating a joint representation of the first masked input data and the second masked input data via the third output layer, and

• discriminating joint representations which were generated from input data of the same object from joint contrastive representations which were generated from input data of different objects.

DETAILED DESCRIPTION OF THE INVENTION

The subject matter of the present disclosure is described in more detail below, without distinguishing between the different categories of claims (method, computer system, computer readable medium). On the contrary, the following elucidations are intended to apply analogously to all the aspects of the disclosure, irrespective of in which context (method, computer system, computer readable medium) they occur.

If steps are stated in an order in the present description or in the claims, this does not necessarily mean that the disclosure is restricted to the stated order. On the contrary, it is conceivable that the steps can also be executed in a different order or in parallel to one another, unless one step builds upon another step, which requires that the dependent step be executed subsequently (this being, however, clear in the individual case). The stated orders are thus preferred embodiments of the disclosure.

In one aspect, the present disclosure provides means for identifying, for a first object, one or more second object(s), the one or more second object(s) having a pre-defined similarity to the first object.

An “object” according to the present invention can be anything that can be described by one or more features. An object can e.g. be a visible and/or tangible thing, or a living organism or a part thereof. An object can also be a virtual or artificial object (like a construction drawing or a model of a real object) or a virtual or artificial creature (like an avatar).

In a preferred embodiment of the present disclosure, the object is a human being (a person), preferably a patient.

In another preferred embodiment of the present disclosure, the object is an animal.

In another preferred embodiment of the present disclosure, the object is a plant or a plurality of plants (e.g. plants in an agricultural field).

In another preferred embodiment of the present disclosure, the object is a part of the Earth's surface.

In another preferred embodiment of the present disclosure, the object is a machine, such as a car, or a train, or an airplane, or part thereof such as a motor or a power unit or an electronic circuit or a semiconductor topography or a city model or a building or the like.

The object can be characterized by certain features. In case of a human being, such features include age, height, weight, gender, eye color, hair color, skin color, blood group, existing illnesses and/or conditions, pre-existing illnesses and/or conditions, and/or the like. An image showing the body of the human being or a part thereof is an example of a collection of features characterizing the human being. Features of an object can be described and/or recorded in various ways, or technically expressed, in different modalities.

For example, the outcome of a pregnancy test can be represented by a picture of the test unit with a certain color indicating the test result, or it can alternatively be represented by a line of text saying "This person is pregnant". It may also be a circle with a check mark in it at a certain position of a structured form; it may alternatively be a 1 as opposed to a 0 in the memory of a computer, electronic device, server, or the like. All these data comprise the same information but in the form of different representations/modalities.

The features of an object can be used to identify one or more other objects having similar features. This can be done by means of the machine learning model according to the present disclosure.

Such a machine learning model, as described herein, may be understood as a computer implemented data processing architecture. The machine learning model can receive input data and provide output data based on that input data and the machine learning model, in particular the parameters of the machine learning model. The machine learning model can learn a relation between input data and output data through training. In training, parameters of the machine learning model may be adjusted in order to provide a desired output for a given input.

The process of training a machine learning model involves providing a machine learning algorithm (that is the learning algorithm) with training data to learn from. The term machine learning model refers to the model artifact that is created by the training process. The training data must contain the correct answer, which is referred to as the target. The learning algorithm finds patterns in the training data that map input data to the target, and it outputs a trained machine learning model that captures these patterns.

In the training process, training data are inputted into the machine learning model and the machine learning model generates an output. The output is compared with the (known) target. Parameters of the machine learning model are modified in order to reduce the deviations between the output and the (known) target to a (defined) minimum.
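By way of illustration only, the following minimal sketch shows such a training loop in which parameters are adjusted so that the deviation between output and target shrinks; the model, the loss function and the optimiser settings are placeholder assumptions and do not represent the architecture described further below:

import torch
from torch import nn

# Hypothetical stand-in model; the actual machine learning model is described below.
model = nn.Linear(in_features=16, out_features=4)
loss_fn = nn.MSELoss()                                    # metric comparing output and target
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 16)                               # a batch of training inputs
targets = torch.randn(8, 4)                               # the known targets

for step in range(100):
    outputs = model(inputs)                               # generate an output for the input
    loss = loss_fn(outputs, targets)                      # compare the output with the target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                      # modify parameters to reduce the loss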

In general, a loss function can be used for training to evaluate the machine learning model. For example, a loss function can include a metric of comparison of the output and the target. The loss function may be chosen in such a way that it rewards a wanted relation between output and target and/or penalizes an unwanted relation between an output and a target. Such a relation can be e.g. a similarity, or a dissimilarity, or another relation.

A loss function can be used to calculate a loss value for a given pair of output and target. The aim of the training process can be to modify (adjust) parameters of the machine learning model in order to reduce the loss value to a (defined) minimum.

A loss function may for example quantify the deviation between the output of the machine learning model for a given input and the target. If, for example, the output and the target are numbers, the loss function could be the difference between these numbers, or alternatively the absolute value of the difference. In this case, a high absolute value of the loss function can mean that a parameter of the model needs to undergo a strong change.

In the case of a scalar output, a loss function may be a difference metric such as the absolute value of a difference or a squared difference.

In the case of vector-valued outputs, for example, difference metrics between vectors such as the root mean square error, a cosine distance, a norm of the difference vector such as a Euclidean distance, a Chebyshev distance, an Lp-norm of a difference vector, a weighted norm or any other type of difference metric of two vectors can be chosen. These two vectors may for example be the desired output (target) and the actual output.

In the case of higher-dimensional outputs, such as two-dimensional, three-dimensional or higher-dimensional outputs, an element-wise difference metric may for example be used. Alternatively or additionally, the output data may be transformed, for example to a one-dimensional vector, before computing a loss value.
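Purely as an illustration, the following NumPy snippet computes several of the difference metrics mentioned above for a pair of example vectors (the values of output and target are invented; the choice of metric is an implementation detail):

import numpy as np

output = np.array([0.2, 0.9, -0.4])    # actual output of the model
target = np.array([0.0, 1.0, -0.5])    # desired output (target)

absolute_difference = np.abs(output - target)               # element-wise absolute difference
mean_squared_error  = np.mean((output - target) ** 2)       # squared differences, averaged
euclidean_distance  = np.linalg.norm(output - target)       # norm of the difference vector
chebyshev_distance  = np.max(np.abs(output - target))       # Chebyshev distance
cosine_distance     = 1.0 - np.dot(output, target) / (
    np.linalg.norm(output) * np.linalg.norm(target))        # cosine distance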

The machine learning model is configured to receive input data of different modalities. Usually, there are different input layers for inputting input data of different modalities, e.g. an image input layer for inputting one or more images, a text input layer for inputting one or more texts (including numbers), and/or an audio input layer for inputting one or more audio files.

The machine learning model is configured to receive input data and generate a representation of the object, at least partially on the basis of the input data and model parameters.

If input data of different modalities are inputted into the machine learning model, the machine learning model is configured to generate a joint representation of the input data of the different modalities. If, for example, the set of input data comprises first input data of a first modality (e.g. an image) and second input data of a second modality (e.g. a text), then the machine learning model generates a joint representation of the first input data and the second input data.

A representation generated by the machine learning model is characterized by the fact that input data from different modalities are merged into one another.

The model is taught to generate representations of objects on the basis of input data (and to merge input data of different modalities into one another) in a training procedure described herein.

The representation of an object generated by the machine learning model can be a vector, or a matrix or a tensor or the like. Usually, the representation of an object generated by the machine learning model is of lesser dimension than the dimension of the input data from which the representation is generated. In other words: when generating a representation of an object on the basis of input data related to the object, the machine learning model extracts information from the input data which is suited to represent the object for the purposes described herein. The extraction of information is usually accompanied by a dimensional reduction.

A representation of an object generated by the machine learning model can be compared with a representation of one or more other objects.
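As a simple sketch of such a comparison (the vector values are invented and merely illustrative), the cosine similarity between two representations can serve as the similarity value referred to in this disclosure:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity value between two representations; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

first_representation  = np.array([0.12, -0.53, 0.88, 0.05])   # representation of a first object
second_representation = np.array([0.10, -0.49, 0.91, 0.00])   # representation of a second object

similarity_value = cosine_similarity(first_representation, second_representation)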

The machine learning model of the present disclosure comprises (at least for training purposes) at least two input layers, a first input layer and a second input layer, and at least three output layers, a first output layer, a second output layer and a third output layer.

The input layers are configured to receive input data coming from different modalities, such as from text, images, video, audio and/or others and/or combinations thereof.

In other words: the machine learning model according to the present invention is configured to receive digital data representing the features of an object in the form of different modalities. Such digital data is herein referred to as input data.

Input data can be or comprise text, number(s), image(s), audio, and/or any other representation and/or combinations thereof.

The term “image” as used herein means a data structure that represents a spatial distribution of a physical signal. The spatial distribution may be of any dimension, for example 2D, 3D, 4D or any higher dimension. The spatial distribution may be of any shape, for example forming a grid and thereby defining pixels, the grid being possibly irregular or regular. The physical signal may be any signal, for example proton density, tissue echogenicity, tissue radiolucency, measurements related to the blood flow, information of rotating hydrogen nuclei in a magnetic field, color, level of gray, depth, surface or volume occupancy, such that the image may be a 2D or 3D RGB/grayscale/depth image, or a 3D surface/volume occupancy model. The image may be a synthetic image, such as a designed 3D modeled object, or alternatively a natural image, such as a photograph or a frame from a video.

In a preferred embodiment of the present disclosure, an image is a 2D or 3D medical image. A medical image is a visual representation of the human body or a part thereof or of the body of an animal or a part thereof. Medical images can be used e.g. for diagnostic and/or treatment purposes.

Techniques for generating medical images include X-ray radiography, computerized tomography, fluoroscopy, magnetic resonance imaging, ultrasonography, endoscopy, elastography, tactile imaging, thermography, microscopy, positron emission tomography and others.

Examples of medical images include CT (computed tomography) scans, X-ray images, MRI (magnetic resonance imaging) scans, fluorescein angiography images, OCT (optical coherence tomography) scans, histopathological images, ultrasound images and others.

A widely used format for digital medical images is the DICOM format (DICOM: Digital Imaging and Communications in Medicine).

In another preferred embodiment of the present disclosure, an image is a photograph of one or more plants or parts thereof. A photograph is an image taken by a camera (including RGB cameras, hyperspectral cameras, infrared cameras, and the like), such camera comprising a sensor for imaging an object with the help of electromagnetic radiation. The image can e.g. show one or more plants or parts thereof (e.g. one or more leaves) infected by a certain disease (such as for example a fungal disease) or infested by a pest (such as for example a caterpillar, a nematode, a beetle, a snail or any other organism that can lead to plant damage).

In another preferred embodiment of the present disclosure, an image according to the present invention is an image of a part of the Earth's surface, such as an agricultural field or a forest or a pasture, taken from a satellite or an airplane (manned or unmanned aerial vehicle) or combinations thereof (remote sensing data/imagery).

“Remote sensing” means the acquisition of information about an object or phenomenon without making physical contact with the object and thus is in contrast to on-site observation. The term is applied especially to acquiring information about the Earth. Remote sensing is used in numerous fields, including geography, land surveying and most Earth science disciplines (for example, hydrology, ecology, meteorology, oceanography, glaciology, geology).

In particular, the term "remote sensing" refers to the use of satellite or aircraft-based sensor technologies to detect and classify objects on Earth. It includes the surface, the atmosphere and the oceans, based on propagated signals (e.g. electromagnetic radiation). It may be split into "active" remote sensing (when a signal is emitted by a satellite or aircraft to the object and its reflection detected by the sensor) and "passive" remote sensing (when the reflection of sunlight is detected by the sensor).

Details about remote sensing data/imagery can be found in various publications (see e.g. N. Fareed: Intelligent High Resolution Satellite/Aerial Imagery, Advances in Remote Sensing, 2014, 03, 1-9, doi: 10.4236/ars.2014.31001; C. Yang et al.: Using High-Resolution Airborne and Satellite Imagery to Assess Crop Growth and Yield Variability for Precision Agriculture, in Proceedings of the IEEE, vol. 101, no. 3, pp. 582-592, March 2013, doi: 10.1109/JPROC.2012.2196249; P. Basnyat et al.: Agriculture field characterization using aerial photograph and satellite imagery, in IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 1, pp. 7-10, Jan. 2004, doi: 10.1109/LGRS.2003.822313; WO2018/140225; WO2020/132674; WO2019/217152).

An image used as input data is usually available in a digital format. An image which is not present as a digital image file (e.g. a classic photography on color film) can be converted into a digital image file by well-known conversion tools such as by means of an image scanner.

A “text” is a (written) fixed, thematically related sequence of statements. A text can comprise words and/or numbers, the words usually being made up of letters of an alphabet (e.g. the Latin alphabet). The term “text” also includes tables and spreadsheets.

Text input data may for instance comprise information about some aspect of a human’s health. To name a few non-limiting examples, this information can pertain to an internal body parameter such as blood type, blood pressure, resting heart rate, heart rate variability, vagus nerve tone, hematocrit, sugar concentration in urine, or a combination thereof. It can describe an external body parameter such as height, weight, age, body mass index, eyesight, or another parameter of the patient’s physique. Further exemplary pieces of health information comprised (e.g., contained) in text input data may be medical intervention parameters such as regular medication, occasional medication, or other previous or current medical interventions and/or other information about the patient’s previous and current treatments and reported health conditions. Text input data may for example comprise lifestyle information about the life of the patient, such as consumption of alcohol, smoking, and/or exercise and/or the patient’s diet. The (text) input data is of course not limited to physically measurable pieces of information and may for example further comprise psychological tests and diagnoses and similar information about mental health. In another example, text input data may comprise at least parts of at least one previous opinion by a treating medical practitioner on certain aspects of the patient’s health. Text input data may in addition or in the alternative comprise (e.g., contain) references to and/or descriptions of other sources of medical data such as other text data and/or data of other modalities such as images acquired by a medical imaging technique, graphs created during a test and/or combinations thereof.

In one example, text input data may at least partly represent an EMR (electronic medical record) of a patient, or a part of it, also referred to as EHR (electronic health record). An EMR can, for example, comprise information about the patient’s health such as one of the different pieces of information listed in the last paragraph. It is not necessary that every piece of information in the EMR relates to the patient’s body. For instance, information may pertain to the previous medical practitioner(s) who had contact with the patient and/or some data about the patient, assessed their health state, and decided on and/or carried out certain tests, operations and/or diagnoses. The EMR can comprise information about the hospital or doctor’s practice where the patient obtained certain treatments and/or underwent certain tests, and various other meta-information about the treatments, medications, tests and the body-related and/or mental-health-related information of the patient. An EMR can for example comprise (e.g. include) personal information about the patient. An EMR may also be anonymized so that the medical description of a defined, but personally un-identifiable patient is provided. In some examples, the EMR contains at least a part of the patient’s medical history.

In one example, text input data may at least partially represent information about a person's condition obtained from the person himself/herself (self-assessment data). Besides objectively acquired anatomical, physiological and/or physical data, the well-being of the patient also plays an important role in the monitoring of health. Subjective feeling can also make a considerable contribution to the understanding of objectively acquired data and of the correlation between various data. If, for example, it is captured by sensors that a person has experienced a physical strain, for example because the respiratory rate and the heart rate have risen, this may be because just low levels of physical exertion in everyday life place a strain on the person; however, another possibility is that the person consciously and gladly brought about the situation of physical strain, for example as part of a sporting activity. A self-assessment can provide clarity here about the causes of physiological features. For example, the issue of self-assessment plays an important role in clinical studies as well. In the English-language literature, the term "Patient Reported Outcomes" (abbreviation: PRO) is used as an umbrella term for many different concepts for measuring subjectively felt health statuses. The common basis of said concepts is that patient status is personally assessed and reported by the patient. Subjective feeling is collected by use of a self-assessment unit, with the aid of which the patient can record information about subjective health status. Preference is given to a list of questions which are to be answered by a patient. Preferably, the questions are answered with the aid of a computer (e.g. a tablet computer or a smartphone). One possibility is that the patient has questions displayed on a screen and/or read out via a speaker. One possibility is that the patient inputs the answers into a computer by inputting text via an input device (e.g., keyboard, mouse, touchscreen and/or a microphone (by means of speech input)). It is conceivable that a chatbot is used in order to facilitate the input of all items of information for the patient. It is conceivable that the questions are recurring questions which are to be answered once or more than once a day by a patient. It is conceivable that some of the questions are asked in response to a defined event. It is, for example, conceivable that it is captured by means of a sensor that a physiological parameter is outside a defined range (e.g. an increased respiratory rate is established). As a response to this event, the patient can, for example, receive a message via his/her smartphone or a smartwatch or the like that a defined event has occurred and that said patient should please answer one or more questions, for example in order to find out the causes and/or the accompanying circumstances in relation to the event. The questions can be of a psychometric nature and/or preference-based. At the heart of the psychometric approach is the description of the external, internal and anticipated experiences of the individual, by the individual. Said experiences can be based on the presence, the frequency and the intensity of symptoms, behaviors, capabilities or feelings of the individual questioned. The preference-based approach measures the value which patients assign to a health status.

Text input data is preferably present as a digital text file (such as an ASCII-file or XML-file). A text which is not present as a digital text file can be converted into a digital text file by well-known conversion tools. For example, a letter or a telefax can be scanned using an image scanner (flatbed scanner) or photographed using a digital camera, and the resulting digital image file can then be analyzed by optical character recognition (OCR) technology in order to identify characters in the scanned copy and convert the scanned copy into a digital text file.
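As a sketch of this conversion chain (assuming the third-party packages Pillow and pytesseract are installed together with a Tesseract binary; the file names are placeholders):

from PIL import Image
import pytesseract

# Load the scanned letter (placeholder file name) and run optical character recognition.
scanned_page = Image.open("scanned_letter.png")
recognized_text = pytesseract.image_to_string(scanned_page)

# Store the result as a digital text file that can later be turned into input data.
with open("scanned_letter.txt", "w", encoding="utf-8") as text_file:
    text_file.write(recognized_text)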

The term “audio” preferably means a (digitally) recorded sound. Preferably, the sound was produced (willingly or unintentionally, consciously or unconsciously) by the object or by interaction of the object with its environment and/or with another object.

In case of the object being a human being, the sound can be or comprise heartbeat, breathing noise, cough, swallow, sneeze, clear throat, scratch, voice, noises when knocking against part(s) of the body, joint noise, and/or other sounds and/or combinations thereof. Sound can be recorded via a microphone and converted into a digital audio file (such as a WAV file) via an analog-digital converter. If audio comprises spoken language, speech-to-text technology can be used to convert the digital audio file into a digital text file.

Some input data may be directly fed into an input layer of the machine learning model, some input data may be transformed into another representation before it is fed into an input layer of the machine learning model.

For example, if an electronic medical record or an extract therefrom is to be used, it may have to be converted into another format before it can be inputted into the machine learning model. Usually, a feature vector is created from information about an object in order to convert the information into a format that can be used by the machine learning model.

In machine learning, a feature vector is an n-dimensional vector of numerical features that represent an object, wherein n is an integer greater than 0. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels/voxels of an image, while when representing texts, the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. The term “feature vector” shall also include single values, matrices, tensors, and the like.

Examples of feature vector generation methods can be found in various textbooks and scientific publications (see e.g. G.A. Tsihrintzis, L.C. Jain: Machine Learning Paradigms: Advances in Deep Learning-based Technological Applications, in: Learning and Analytics in Intelligent Systems Vol. 18, Springer Nature, 2020, ISBN: 9783030497248; K. Grzegorczyk: Vector representations of text data in deep learning, Doctoral Dissertation, 2018, arXiv: 1901.01695v1 [cs.CL]).
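As an illustrative sketch only (the vocabulary and the example sentence are invented), a very simple feature vector for text can be built from term frequencies over a fixed vocabulary:

from collections import Counter

# Hypothetical fixed vocabulary; in practice it would be derived from the training corpus.
vocabulary = ["blood", "pressure", "heart", "rate", "smoking", "diabetes"]

def text_to_feature_vector(text):
    """Map a text to an n-dimensional vector of term frequencies (n = len(vocabulary))."""
    counts = Counter(text.lower().split())
    return [float(counts[token]) for token in vocabulary]

vector = text_to_feature_vector("elevated blood pressure and high resting heart rate")
# vector == [1.0, 1.0, 1.0, 1.0, 0.0, 0.0], the order following the vocabulary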

The first input layer of the machine learning model of the present disclosure is configured to receive first input data derived from information about an object in a first modality. First input data can e.g. be an image or more than one image, or one or more feature vector(s) derived from one or more images. The first input data is also herein referred to as first input data of a first modality. The second input layer of the machine learning model of the present disclosure is configured to receive second input data derived from information about the object in a second modality. Second input data can e.g. be a text, or one or more feature vector(s) derived from the text. The second input data is also herein referred to as second input data of a second modality.

It is possible that the machine learning model of the present disclosure comprises more than two input layers. The machine learning model may comprise a third input layer, wherein the third input layer is configured to receive third input data from information about the object in a third modality. Third input data can e.g. be an audio file, or one or more feature vector(s) derived from the audio file. The third input data is herein also referred to as third input data of a third modality.

Usually (but not necessarily), the first modality is different from the second modality and the third modality, and the second modality is different from the third modality.

For a number k of modalities, the machine learning model usually has k input layers and k+1 output layers, with k being an integer greater than 1. In other words: in case input data of or derived from 2 different modalities are inputted into the machine learning model, there are usually 2 input layers, and 2+1=3 output layers; in case input data of or derived from 3 different modalities are inputted into the machine learning model for training purposes, there are usually 3 input layers, and 3+1=4 output layers; in case input data of or derived from 4 different modalities are inputted into the machine learning model for training purposes, there are usually 4 input layers, and 4+1=5 output layers. The output layers will be described in more detail below.
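The following PyTorch sketch illustrates this layout for k = 2 (two input layers, 2+1 = 3 output layers); all layer types and sizes are placeholders chosen for brevity and are not meant to reproduce the architecture described further below:

import torch
from torch import nn

class TwoModalityModel(nn.Module):
    """Sketch with k = 2 input layers and k + 1 = 3 output layers."""

    def __init__(self, dim_a=256, dim_b=64, latent=32):
        super().__init__()
        self.encoder_a = nn.Linear(dim_a, latent)        # first input layer (e.g. image features)
        self.encoder_b = nn.Linear(dim_b, latent)        # second input layer (e.g. text features)
        self.fusion = nn.Linear(2 * latent, latent)      # merges the two modalities
        self.decoder_a = nn.Linear(latent, dim_a)        # first output layer: reconstructs modality A
        self.decoder_b = nn.Linear(latent, dim_b)        # second output layer: reconstructs modality B
        self.projection = nn.Linear(latent, latent)      # third output layer: joint representation

    def forward(self, masked_a, masked_b):
        joint = self.fusion(torch.cat([self.encoder_a(masked_a),
                                       self.encoder_b(masked_b)], dim=-1))
        return self.decoder_a(joint), self.decoder_b(joint), self.projection(joint)

model = TwoModalityModel()
recon_a, recon_b, joint_representation = model(torch.randn(4, 256), torch.randn(4, 64))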

For training of the machine learning model, training data are provided. The training data comprise, for each object of a multitude of objects, input data of at least two different modalities, first input data of a first modality and second input data of a second modality.

The term “multitude”, as it is used herein, means a natural number greater than 10, usually greater than 100, preferably greater than 1000.

The training data may comprise third input data of a third modality; the training data may comprise fourth input data of a fourth modality, and so forth.

From the first and second (and if present third, fourth, ... ) input data, augmented input data and masked input data are generated:

From the first input data, first augmented input data are generated. From the second input data, second augmented input data are generated. If third input data are present, third augmented input data are generated therefrom; if fourth input data are present, fourth augmented input data are generated therefrom, and so forth.

From the first augmented input data, first masked input data are generated. From the second augmented input data, second masked input data are generated. If third augmented input data are present, third masked input data are generated therefrom; if fourth augmented input data are present, fourth masked input data are generated, and so forth.

From each input data set at least two augmented data sets are generated: from the first input data, at least two sets of first augmented input data are generated; from the second input data, at least two sets of second augmented input data are generated, and so forth. The number of augmented input data sets per set of input data is usually between 2 and 5, however, the number can also be greater than 5.

Augmented input data are generated by applying one or more augmentation techniques to the input data.

Augmentation techniques used for image augmentation include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. Augmentation techniques used for text augmentation include replacement of words and/or phrases by synonyms, semantic similarity augmentation, round-trip translations, mixup augmentation, random insertions, random swap, and random deletions.

Augmentation techniques used for audio augmentation include noise injection, time shifting, pitch change, speed change, mixup, cutouts and/or random erasing in the audio spectrum, and vocal track length perturbation.

In case of images as input data, augmented images are preferably generated by applying one or more spatial augmentation techniques to the images. Examples of spatial augmentation techniques include rigid transformations, non-rigid transformations, affine transformations and non-affine transformations.

A rigid transformation does not change the size or shape of the image. Examples of rigid transformations include reflection, rotation, and translation.

A non-rigid transformation can change the size or shape, or both size and shape, of the image. Examples of non-rigid transformations include dilation and shear.

An affine transformation is a geometric transformation that preserves lines and parallelism, but not necessarily distances and angles. Examples of affine transformations include translation, scaling, homothety, similarity, reflection, rotation, shear mapping, and compositions of them in any combination and sequence.

Preferably, the one or more spatial augmentation techniques applied to images include rotation, elastic deformation, flipping, scaling, stretching, shearing, cropping, resizing and/or combinations thereof.

In a preferred embodiment, one or more of the following spatial augmentation techniques is applied to images: rotation, elastic deformation, flipping, scaling, stretching, shearing, wherein the one or more spatial augmentation techniques are preferably followed by cropping and resizing.
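By way of illustration only, the following Python listing sketches one possible spatial augmentation pipeline of the kind described above, using the torchvision library (assumed to be available in a version whose transforms operate on image tensors); the chosen transformations and parameter values (rotation angle, scale range, shear, output size) are merely illustrative assumptions and do not limit the present disclosure.

import torch
from torchvision import transforms

# Illustrative spatial augmentation pipeline: rotation, flipping, scaling,
# shearing, followed by cropping and resizing. Parameter values are assumptions.
spatial_augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                            # rotation
    transforms.RandomHorizontalFlip(p=0.5),                           # flipping
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2), shear=10),   # scaling / shearing
    transforms.RandomResizedCrop(size=224, scale=(0.7, 1.0)),         # cropping and resizing
])

def make_augmented_sets(image, n_sets=2):
    """Generate n_sets independently augmented versions of one input image."""
    return [spatial_augment(image) for _ in range(n_sets)]

# Example usage with a random tensor standing in for a 3-channel image:
image = torch.rand(3, 256, 256)
augmented = make_augmented_sets(image, n_sets=2)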

Image augmentation techniques are described in more detail in various publications. The following list is just a small excerpt:

Rotation: D. Itzkovich et al.: Using Augmentation to Improve the Robustness to Rotation of Deep Learning Segmentation in Robotic-Assisted Surgical Data, 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 2019, pp. 5068-5075, doi: 10.1109/ICRA.2019.8793963.

Elastic deformation: E. Castro et al.: Elastic deformations for data augmentation in breast cancer mass detection, 2018 IEEE EMBS International Conference on Biomedical Health Informatics (BHI), pp. 230-234, 2018.

Flipping: Y.-J. Cha et al.: Autonomous Structural Visual Inspection Using Region-Based Deep Learning for Detecting Multiple Damage Types, Computer-Aided Civil and Infrastructure Engineering, 00, 1-17, doi: 10.1111/mice.12334.

Scaling: S. Wang et al.: Multiple Sclerosis Identification by 14-Layer Convolutional Neural Network With Batch Normalization, Dropout, and Stochastic Pooling, Frontiers in Neuroscience, 12:818, doi: 10.3389/fnins.2018.00818.

Stretching: Z. Wang et al.: CNN Training with Twenty Samples for Crack Detection via Data Augmentation, Sensors 2020, 20, 4849.

Shearing: B. Hu et al.: A Preliminary Study on Data Augmentation of Deep Learning for Image Classification, Computer Vision and Pattern Recognition; Machine Learning (cs.LG); Image and Video Processing (eess.IV), arXiv:1906.11887.

Cropping and Resizing: R. Takahashi et al.: Data Augmentation using Random Image Cropping and Patching for Deep CNNs, Journal of Latex Class Files, Vol. 14, No. 8, 2015, arXiv:1811.09030.

Cutout: T. DeVries and G. W. Taylor: Improved Regularization of Convolutional Neural Networks with Cutout, arXiv:1708.04552, 2017.

Erasing: Z. Zhong et al.: Random Erasing Data Augmentation, arXiv: 1708.04896, 2017.

In case of text as input data, augmented text data are commonly generated by adding and/or replacing and/or removing content (e.g. letters, numbers, words, tokens, and/or phrases). In the context of electronic health records, a common practice to add missing information is to iteratively impute incomplete variables by regressing on the remaining observations, also referred to as Multiple Imputation by Chained Equations (MICE). For details see e.g. S.M. Meystre et al.: Extracting information from textual documents in the electronic health record: a review of recent research, Yearb Med Inform. 2008: 128-44, PMID: 18660887; O. Sun: MICE-DA: A MICE method with Data Augmentation for missing data imputation, IEEE ICHI 2019 DACMI Challenge, 2019 IEEE International Conference on Healthcare Informatics (ICHI) (2019): 1-3.
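By way of illustration only, the following Python listing sketches two of the simpler text augmentation operations mentioned above (random swap and random deletion of words); the example sentence, swap count and deletion probability are illustrative assumptions.

import random

def random_swap(words, n_swaps=1):
    """Randomly swap the positions of two words, n_swaps times."""
    words = words.copy()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p (keep at least one word)."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

text = "patient reports persistent cough and mild fever"
tokens = text.split()
augmented_1 = " ".join(random_swap(tokens, n_swaps=1))
augmented_2 = " ".join(random_deletion(tokens, p=0.2))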

Further text augmentation techniques are described in more detail in various publications. The following list is just a small excerpt:

V. Marivate, T. Sefara: Improving Short Text Classification Through Global Augmentation Methods, in: A. Holzinger et al. (eds): Machine Learning and Knowledge Extraction, CD-MAKE 2020, Lecture Notes in Computer Science, 2020, Vol. 12279, Springer, https://doi.org/10.1007/978-3-030-57321-8_21.

J. Wei, K. Zou: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks, arXiv:1901.11196 [cs.CL].

M. Abulaish, A. K. Sah: A Text Data Augmentation Approach for Improving the Performance of CNN, 2019 11th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 2019, pp. 625-630, doi: 10.1109/COMSNETS.2019.8711054.

A. Ollagnier, H. Williams: Text Augmentation Techniques for Clinical Case Classification, 2020, https://www.researchgate.net/publication/343949092.

In a further step, masked input data are generated from the augmented input data; from the first augmented input data, first masked input data are generated; from the second augmented input data, second masked input data are generated, and so forth. Usually, from each set of augmented data, one set of masked input data is generated.

The term “masking” refers to techniques which hide parts of the data or features (values of features) represented by data. In case of images, one or more pixels or regions of pixels can be set to a specific value (such as 0) or to (a) random value(s). In case of texts, one or more letters, words or phrases can be deleted or replaced by a specific letter, word or phrase. It is also possible that the values of the pixels in a certain region of an image are shuffled or that the letters of a word or a phrase or that words/phrases in a text are shuffled. In case of audio, one or more frequency channels, and/or consecutive time steps can be deleted.
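By way of illustration only, the following Python listing sketches masking for images (setting a randomly placed pixel region to a specific value) and for text (replacing tokens with a mask token); the region size, masking probability and mask token are illustrative assumptions.

import random
import numpy as np

def mask_image_region(image, region_size=32, fill_value=0):
    """Set one randomly placed square region of the image to a fixed value."""
    masked = image.copy()
    h, w = masked.shape[-2], masked.shape[-1]
    top = random.randint(0, max(h - region_size, 0))
    left = random.randint(0, max(w - region_size, 0))
    masked[..., top:top + region_size, left:left + region_size] = fill_value
    return masked

def mask_tokens(tokens, p=0.15, mask_token="[MASK]"):
    """Replace each token with a mask token with probability p."""
    return [mask_token if random.random() < p else t for t in tokens]

image = np.random.rand(3, 224, 224).astype(np.float32)
masked_image = mask_image_region(image, region_size=48)
masked_text = mask_tokens("chest x ray shows small opacity".split(), p=0.2)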

Augmentation and/or masking operations may be performed on the respective input data and the resulting augmented input data and/or masked input data may then be stored on a non-transitory computer-readable storage medium for later training purposes. However, it is also possible to generate augmented input data and/or masked input data “in-memory” such that the augmented input data and/or masked input data may be generated temporarily and directly used for training purposes without storing the augmented input data and/or masked input data in a non-volatile storage medium.

Fig. 1 shows schematically, by way of an example, how augmented data and masked data are created for data characterizing three different objects, a cube, a cylinder and a tetrahedron. For each object, input data representing the respective object are received (100). From each set of input data, two (or more) sets of augmented input data are generated (110). From each set of augmented data, a set of masked input data is generated (120). In case of the example shown in Fig. 1: the input data representing an object is an image of the object; augmentation is done by rotating the imaged objects; masking is done by cutting out regions of the images.

Fig. 2 shows schematically, by way of another example, how augmented data and masked data are created for data characterizing three different objects, a cube a, a cylinder b, and a tetrahedron c. For each object i (i = a, b or c), input data of at least two different modalities (1 and 2) are received: first input data X_i^1 of modality 1 and second input data X_i^2 of modality 2. From each set of input data, two sets of augmented input data are generated: from the first input data X_i^1 of modality 1, two sets of first augmented input data X'_i^1 and X''_i^1 are generated; from the second input data X_i^2 of modality 2, two sets of second augmented input data X'_i^2 and X''_i^2 are generated. From each set of augmented input data, masked input data are generated: from augmented input data X'_i^1, masked input data X̃'_i^1 are generated; from augmented input data X''_i^1, masked input data X̃''_i^1 are generated; from augmented input data X'_i^2, masked input data X̃'_i^2 are generated; and from augmented input data X''_i^2, masked input data X̃''_i^2 are generated.

The input data, augmented input data and masked input data can then be used for training the machine learning model.

During training, first masked input data is inputted into the first input layer and the machine learning model is trained to reconstruct the first augmented input data from the first masked input data via the first output layer (first reconstruction task).

In addition, second masked input data is inputted into the second input layer and the machine learning model is trained to reconstruct the second augmented input data from the second masked input data via the second output layer (second reconstruction task).

In addition, the machine learning model is trained to generate a joint representation from the first masked input data inputted into the first input layer and the second masked input data inputted into the second input layer, and to discriminate joint representations which originate from the same input data from joint representations which do not originate from the same input data but from different input data (discrimination task).

If a further set of input data is available (such as third input data), the machine learning model is trained to perform an additional reconstruction task (e.g. a third reconstruction task). In addition, the machine learning model is trained to generate a joint representation from all masked input data inputted into the input layers and to discriminate joint representations which originate from the same input data from joint representations which do not originate from the same input data but from different input data.

During training, parameters of the machine learning model are modified in a way that improves the reconstruction quality and the discrimination quality. This can be e.g. done by computing one or more loss values, the loss value(s) indicating the quality of the task(s) performed, and modifying parameters of the machine learning model so that the loss value(s) is/are minimized.

For each reconstruction task, a reconstruction loss can be computed, e.g. a first reconstruction loss L_r^1 for the first reconstruction task and a second reconstruction loss L_r^2 for the second reconstruction task. The mean square error (MSE) between input and output can be used as objective function for the proxy task of the reconstructions. Furthermore, Huber loss, cross-entropy and other functions can be used as objective function for the proxy task of reconstructions.

For the discrimination task, a contrastive loss L_c can be computed. Such a contrastive loss can e.g. be the normalized temperature-scaled cross entropy (NT-Xent) (see e.g. T. Chen et al.: "A simple framework for contrastive learning of visual representations", arXiv preprint arXiv:2002.05709, 2020, in particular equation (1)). Further details about contrastive learning can also be found in: P. Khosla et al.: Supervised Contrastive Learning, Computer Vision and Pattern Recognition, arXiv:2004.11362 [cs.LG]; J. Dippel, S. Vogler, J. Hohne: Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling, arXiv:2104.04323v1 [cs.CV]. The training loss L_T (total loss) can e.g. be the sum of the reconstruction losses and the contrastive loss. In case of two reconstruction tasks and one discrimination task, the training loss L_T can be calculated by the following equation: L_T = α·L_r^1 + β·L_r^2 + γ·L_c, in which α, β and γ are weighting factors which can be used to weight the losses, e.g. to give a certain loss more weight than another loss. α, β and γ can be any value greater than zero; usually α, β and γ represent a value greater than zero and smaller than or equal to one. In case of α = β = γ = 1, each loss is given the same weight. Note that α, β and γ can vary during the training process. It is for example possible to start the training process giving greater weight to the contrastive loss than to the reconstruction losses, and, once the deep neural network has reached a pre-defined accuracy in performing the discrimination task, complete the training giving greater weight to one or both (or more, in case of data of more than two modalities) reconstruction task(s).
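By way of illustration only, the following Python (PyTorch) listing sketches how such a weighted training loss could be computed, using the mean square error for the two reconstruction losses and the NT-Xent formulation of Chen et al. for the contrastive loss; the temperature and the weighting factors alpha, beta and gamma are illustrative assumptions.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two batches of projected joint
    representations; z1[k] and z2[k] form a positive pair."""
    z = torch.cat([z1, z2], dim=0)                 # (2N, D)
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                  # (2N, 2N) cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))          # exclude self-similarity
    # positives: the i-th sample of z1 pairs with the i-th sample of z2 and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def total_loss(recon1, target1, recon2, target2, z1, z2,
               alpha=1.0, beta=1.0, gamma=1.0):
    """L_T = alpha * L_r^1 + beta * L_r^2 + gamma * L_c (weights are illustrative)."""
    l_r1 = F.mse_loss(recon1, target1)             # first reconstruction loss
    l_r2 = F.mse_loss(recon2, target2)             # second reconstruction loss
    l_c = nt_xent_loss(z1, z2)                     # contrastive loss
    return alpha * l_r1 + beta * l_r2 + gamma * l_c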

In a preferred embodiment of the present disclosure, the machine learning model is or comprises a deep neural network. A deep neural network is a biologically inspired computational model. Such a deep neural network usually comprises at least three layers of processing elements: a first layer with input neurons, an Nth layer with at least one output neuron, and N-2 inner layers, where N is a natural number greater than 2. In such a network, the input neurons serve to receive the input data. If the input data constitutes or comprises an image, there is usually one input neuron for each pixel/voxel of the input image; there can be additional input neurons for additional input data such as data about the object represented by the input image, the type of image, the way the image was acquired and/or the like. The output neurons serve to output one or more values, e.g. a reconstructed image, a score, a regression result and/or others.

The processing elements of the layers are interconnected in a predetermined pattern with predetermined connection weights therebetween. Each network node represents a (simple) calculation of the weighted sum of inputs from prior nodes and a non-linear output function. The combined calculation of the network nodes relates the inputs to the outputs.

The training can be performed with a set of training data comprising input data of a multitude of objects.

When trained, the connection weights between the processing elements contain information regarding the relationship between the input data and the output data.


The network weights can be initialized with small random values or with the weights of a prior partially trained network. The training data inputs are applied to the network and the output values are calculated for each training sample. The network output values can be compared to the target output values. A backpropagation algorithm can be applied to correct the weight values in directions that reduce the error between calculated outputs and targets. The process is iterated until no further reduction in error can be made or until a predefined prediction accuracy has been reached.

A cross-validation method can be employed to split the data into training and validation data sets. The training data set is used in the error backpropagation adjustment of the network weights. The validation data set is used to verify that the trained network generalizes to make good predictions. The best network weight set can be taken as the one that best predicts the outputs of the validation data set. Similarly, the number of hidden nodes can be optimized by varying it and selecting the network that performs best on the validation data.
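By way of illustration only, the following Python (PyTorch) listing sketches such a training procedure with backpropagation on the training data set and selection of the network weights that perform best on the validation data set; the helper loss_fn, which is assumed to compute the (total) loss for one batch, as well as the learning rate and number of epochs are illustrative assumptions.

import copy
import torch

def train_with_validation(model, train_loader, val_loader, loss_fn,
                          epochs=50, lr=1e-4):
    """Backpropagation on the training set, selection of the weights that
    perform best on the validation set (illustrative sketch)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state = float('inf'), None
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model, batch)     # compute the (total) training loss
            loss.backward()                  # backpropagate the error
            optimizer.step()                 # adjust the network weights
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model, b).item() for b in val_loader) / len(val_loader)
        if val_loss < best_val:              # keep the best-generalizing weights
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model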

In a preferred embodiment, the deep neural network is or comprises a convolutional neural network (CNN). A CNN is a class of deep neural networks, most commonly applied to e.g. analyzing visual imagery. A CNN comprises an input layer with input neurons, an output layer with at least one output neuron, as well as multiple hidden layers between the input layer and the output layer.

The hidden layers of a CNN typically comprise convolutional layers, ReLU (Rectified Linear Unit) layers (i.e. activation functions), pooling layers, fully connected layers and normalization layers.

The nodes in the CNN input layer can be organized into a set of "filters" (feature detectors), and the output of each set of filters is propagated to nodes in successive layers of the network. The computations for a CNN include applying the mathematical convolution operation with each filter to produce the output of that filter. Convolution is a specialized kind of mathematical operation performed with two functions to produce a third function. In convolutional network terminology, the first function of the convolution can be referred to as the input, while the second function can be referred to as the convolution kernel. The output may be referred to as the feature map. For example, the input of a convolution layer can be a multidimensional array of data that defines the various color components of an input image. The convolution kernel can be a multidimensional array of parameters, where the parameters are adapted by the training process for the neural network.

The objective of the convolution operation is to extract features (such as e.g. edges) from an input image. Conventionally, the first convolutional layer is responsible for capturing the low-level features such as edges, color, gradient orientation, etc. With added layers, the architecture adapts to the high-level features as well, giving a network with a more complete understanding of the images in the dataset. Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the feature maps. It is useful for extracting dominant features with some degree of rotational and positional invariance, thus helping to train the model effectively. Adding a fully-connected layer is a way of learning non-linear combinations of the high-level features as represented by the output of the convolutional part.
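By way of illustration only, the following Python (PyTorch) listing sketches a small convolutional neural network with convolutional, ReLU, pooling and fully connected layers as described above; all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

# Illustrative CNN: two conv/ReLU/pool stages followed by a fully connected layer.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),    # low-level features (edges, color)
    nn.ReLU(),
    nn.MaxPool2d(2),                               # reduce spatial size of feature maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 10),                   # fully connected layer (for 224x224 input)
)

scores = cnn(torch.rand(1, 3, 224, 224))           # -> tensor of shape (1, 10)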

Fig. 3 shows schematically a preferred embodiment of the architecture of the machine learning model according to the present disclosure. The machine learning model, as depicted in Fig. 3, can be divided into seven components. The machine learning model comprises a first encoder e^1(·), a first decoder d^1(·), a second encoder e^2(·), a second decoder d^2(·), a fusion component f(·), an attention weighted pooling a(·) and a projection head p(·).

Note, Fig. 3 shows an example of the architecture of a machine learning model that can be used to learn multimodal representations of data of two different modalities. If representations of data of three or more different modalities are to be generated by a machine learning model, such a machine learning model can comprise a third (and fourth, fifth, ...) encoder and a third (and fourth, fifth, ...) decoder. The outputs of all encoders are merged into one embedding: the joint representation of the different input data of different modalities.

The aim of the encoders is to generate a joint representation (embedding) of the multimodal input data. The aim of the decoders is to reconstruct unmasked data from the joint representation.

The projection head serves to map the joint representation to a space where contrastive loss is applied. In a preferred embodiment, the projection head performs a learnable nonlinear transformation. Such a nonlinear transformation improves the quality of the learned representations. The projection head can e.g. be a multi-layer perceptron with one hidden ReLU layer (ReLU: Rectified Linear Unit).
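By way of illustration only, the following Python (PyTorch) listing sketches such a projection head as a multi-layer perceptron with one hidden ReLU layer; the dimensions (512 -> 512 -> 128) are illustrative assumptions.

import torch.nn as nn

# Illustrative projection head p(.): maps the joint representation to the
# space in which the contrastive loss is applied.
projection_head = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 128),
)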

The attention weighted pooling aggregates the spatial content of the final feature map of the encoders in a parametric manner.

In the training process, the model receives first masked input data X̃_i^1 via input layer I^1 and second masked input data X̃_i^2 via input layer I^2, and the model outputs reconstructed first augmented input data X̂_i^1 = d^1(f(e^1(X̃_i^1), e^2(X̃_i^2))) via output layer O^1, reconstructed second augmented input data X̂_i^2 = d^2(f(e^1(X̃_i^1), e^2(X̃_i^2))) via output layer O^2, and the contrastive representation h_i = p(a(f(e^1(X̃_i^1), e^2(X̃_i^2)))) via output layer O^3.

Function f(·) is the fusion component that combines the representations of the two (or more) modalities into one joint representation. The fusion can be done by first concatenating the outputs of the encoders e^1(·) and e^2(·), and then performing convolution operations on the concatenated representation.

Function a(·) is the attention weighted pooling component. The attention weighted pooling mechanism computes a weight for each coordinate in the activation map and then weighs them respectively before applying the global average pooling. For further details, see e.g. A. Radford et al.: Learning transferable visual models from natural language supervision, https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf, 2021, arXiv:2103.00020 [cs.CV]. An example is also given e.g. in arXiv:2104.04323v1 [cs.CV].
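By way of illustration only, the following Python (PyTorch) listing sketches one possible form of attention weighted pooling in which a learned 1x1 convolution scores every coordinate of the activation map, the scores are normalized with a softmax, and the feature map is averaged with these weights; this is a sketch of the general mechanism, not the exact pooling of the cited publications.

import torch
import torch.nn as nn

class AttentionWeightedPooling(nn.Module):
    """Illustrative attention weighted pooling a(.): one weight per spatial
    coordinate, followed by a weighted average over the feature map."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feature_map):                      # (B, C, H, W)
        b, c, h, w = feature_map.shape
        weights = self.score(feature_map).view(b, 1, h * w)
        weights = torch.softmax(weights, dim=-1)         # one weight per coordinate
        features = feature_map.view(b, c, h * w)
        return (features * weights).sum(dim=-1)          # (B, C) pooled representation

pooled = AttentionWeightedPooling(channels=256)(torch.rand(2, 256, 14, 14))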

For the encoder and decoder of the machine learning model, various backbones can be used, such as the U-Net (see e.g. O. Ronneberger et al.: U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, Springer, 2015, https://doi.org/10.1007/978-3-319-24574-4_28) or the DenseNet (e.g. G. Huang et al.: "Densely connected convolutional networks", IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2261-2269, doi: 10.1109/CVPR.2017.243).

Once trained, the projection head p(·) and the decoders d^1(·), d^2(·) can be discarded, and the remaining machine learning model comprising the encoders e^1(·), e^2(·), the fusion component f(·) and the attention pooling a(·) can be used to generate joint representations of multimodal input data with h_i = a(f(e^1(X_i^1), e^2(X_i^2))).
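By way of illustration only, the following Python (PyTorch) listing sketches how a joint representation h_i = a(f(e^1(X_i^1), e^2(X_i^2))) could be generated at inference time from an image and a text feature vector; the encoders, the fusion by concatenation and 1x1 convolution, the text feature dimension and all layer sizes are illustrative assumptions and not the specific architecture of the present disclosure.

import torch
import torch.nn as nn

class JointRepresentationModel(nn.Module):
    """Illustrative sketch of h_i = a(f(e1(x1), e2(x2))) at inference time:
    e1 is a small image encoder, e2 a small text-feature encoder, f fuses by
    concatenation followed by a 1x1 convolution, and a is attention weighted
    pooling over the fused feature map."""
    def __init__(self, text_dim=64, channels=32):
        super().__init__()
        self.encoder1 = nn.Sequential(                    # e1(.): image encoder
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(4),
        )
        self.encoder2 = nn.Sequential(                    # e2(.): text-feature encoder
            nn.Linear(text_dim, channels), nn.ReLU(),
        )
        self.fusion = nn.Conv2d(2 * channels, channels, kernel_size=1)   # f(.)
        self.score = nn.Conv2d(channels, 1, kernel_size=1)               # for a(.)

    def forward(self, image, text_features):
        m1 = self.encoder1(image)                         # (B, C, H, W)
        b, c, h, w = m1.shape
        m2 = self.encoder2(text_features).view(b, c, 1, 1).expand(b, c, h, w)
        fused = self.fusion(torch.cat([m1, m2], dim=1))   # joint feature map
        weights = torch.softmax(self.score(fused).view(b, 1, h * w), dim=-1)
        return (fused.view(b, c, h * w) * weights).sum(dim=-1)   # h_i, shape (B, C)

model = JointRepresentationModel()
h = model(torch.rand(2, 3, 64, 64), torch.rand(2, 64))    # joint representations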

Once trained, the machine learning model (the trained machine learning model) can be used, e.g., to identify, for a first object, one or more second objects, the second object(s) having a predefined similarity to the first object.

The first object is characterized by certain features. Information about the features of the first object can be present in different modalities. From the information about the features of the first object, input data can be generated. The input data can be input data of a first modality (e.g. one or more image(s) or one or more text file(s) or one or more audio file(s)), or input data of a first modality and a second modality (e.g. one or more image(s) and one or more text file(s), or one or more image(s) and one or more audio file(s), or one or more text file(s) and one or more audio file(s)), or input data of a first modality and a second modality and a third modality (e.g. one or more image(s) and one or more text file(s) and one or more audio file(s)), or input data of more than three modalities.

The input data (or feature vectors generated therefrom as described above) can be inputted into the trained machine learning model. The input data of a certain modality are inputted into the input layer that is configured to receive data of the respective modality. Input data of a first modality are e.g. inputted into the first input layer that is configured to receive data of the first modality; input data of a second modality are e.g. inputted into the second input layer that is configured to receive data of the second modality, and so forth.

If, for example, the trained machine learning model comprises two input layers, a first input layer for receiving input data of a first modality and a second input layer for receiving input data of a second modality, but the input data related to the first object is only data of the first modality, such input data is inputted into the first input layer. In this example, it is possible to not input any data into the second input layer, or to input a zero vector (a feature vector containing only the value 0) into the second input layer.

The trained machine learning model generates, on the basis of the input data related to the first object, a representation. The representation is a representation of the input data, and, therefore, a representation of the features of the first object, and, therefore, a representation of the first object itself. If input data of a first modality and input data of a second modality is used for generating a representation of the first object, the generated representation is a joint representation of the input data of the first modality and the input data of the second modality.

The representation of the first object generated by the trained machine learning model can be e.g. compared with the representation(s) of one or more other object(s).

In other words: a first representation can be generated for a first object, and the first representation can be compared with one or more second representation(s) of one or more second object(s). The representation(s) of the second object(s) can be generated analogously to the first representation of the first object: by inputting object data (input data) representing the second object(s) into the trained deep neural network.

In a preferred embodiment, a multitude of representations for a multitude of objects have already been generated in the past, and the representations are stored in a data storage. Such stored representations can be used for comparison.

In one embodiment of the present disclosure, input data related to a first object is inputted into the machine learning model, thereby receiving from the machine learning model a first representation of the first object. The first representation of the first object is then compared with a second representation of a second object (one to one comparison). The second representation of the second object can be generated from input data related to the second object via the machine learning model: the input data related to the second object is inputted into the machine learning model, thereby receiving a second representation, the second representation representing the second object. It is also possible that the second representation is obtained from a data storage. It is possible that the second representation was generated in the past and stored in the data storage.

In another embodiment of the present disclosure, the first representation of the first object is compared with a plurality of representations of a plurality of objects (one to many comparison). Again, each representation of an object of the plurality of objects can be generated from input data relating to the object by inputting the respective input data into the machine learning model and receiving the representation of the object. Again, one or more representations of objects may have been created in the past and stored in a data storage. These representations can be obtained from the data storage and used for comparison.

In another embodiment of the present disclosure, a plurality of first representations of first objects can be compared to one second representation of a second object (many to one comparison). Each first representation of the first object as well as the second representation of the second object can be generated from input data related to the respective object by inputting the input data into the machine learning model and receiving the respective representation of the respective object. One or more first representations of first objects may have been created in the past and stored in a data storage. These representations can be obtained from the data storage and used for comparison.

In another embodiment of the present disclosure, a first representation of a first object is compared with a second representation of the first object. The first representation can be generated from first input data related to the object at a first point in time, and the second representation of the first object can be generated from second input data related to the object at a second point in time. The first point in time can be earlier or later than the second point in time. In other words: different sets of input data representing an object at different points in time can be used to generate different representations of the same object which can then be compared with each other. Comparing representations of objects can mean determining the level of conformity of the representations and/or determining the differences between the representations.

It should be noted that the representations of two objects can be similar although the input modalities are different. This is a great strength of the present invention. In the real world, each patient has a unique diagnostic journey, hence a unique EMR (and unique input data). The model learns to "see the pattern" and still finds similarity among the patients (e.g. two patients with type II diabetes or chronic cough).

Preferably, comparing two or more representations (objects) comprises computing a similarity value for a pair of compared representations, the similarity value quantifying the degree of similarity between the representations.

The similarity value can e.g. be a rational number, e.g. from 0 to 1, where a similarity value of 0 can mean that there are no similarities between the representations and a similarity value of 1 can mean that the representations are identical. It is also possible that the similarity value is a percentage, where a percentage of 0% can mean that there are no similarities between the representations and a percentage of 100% can mean that the representations are identical. Different quantifications, gradations, and ranges are possible. It is for example possible that the similarity value of two representations which have no similarities is -∞, whereas the similarity value of two identical representations is +∞. It is also possible that the similarity value ranges from -1 to 1.

Usually, but not necessarily, the greater the match of the representations and/or the more similar they are, the higher the similarity value. For the sake of simplicity, the present description assumes that the similarity value is always positive, and the greater the similarity value is, the more similar the two representations are. However, this is not to be understood as a limitation of the present invention.

The similarity of two (or more) representations can also be quantified by computing a distance between the two (or more) representations. Usually, the closer the distance, the greater the similarity.

For the sake of simplicity, it is assumed for the following explanations that the representations are in the form of N-dimensional vectors. For example, the representation of the first object may be an N-dimensional vector a = [a_1, a_2, ..., a_N], and the representation of the second object may be an N-dimensional vector b = [b_1, b_2, ..., b_N].

The similarity value s(a, b) quantifying the similarity between the two representations can e.g. be the Cosine Similarity: s(a, b) = cos(θ) = (a · b) / (‖a‖ · ‖b‖), in which θ is the angle between the two vectors a and b,

‖a‖ is the Euclidean norm of vector a (its length), defined as ‖a‖ = √(a_1² + a_2² + ... + a_N²), and in which ‖b‖ is the Euclidean norm of vector b (its length), defined as ‖b‖ = √(b_1² + b_2² + ... + b_N²).

Mathematically, the Cosine Similarity measures the cosine of the angle θ between two vectors a and b projected in a multi-dimensional space. The Cosine Similarity captures the orientation (the angle) of each vector and not the length.

In order to include the length of each vector, the Euclidean Distance can be computed: d(a, b) = ‖a − b‖ = √((a_1 − b_1)² + (a_2 − b_2)² + ... + (a_N − b_N)²).

Instead of (or in addition to) the Euclidean Distance, other distances can be computed, such as the Manhattan Distance, Chebyshev Distance, Minkowski Distance, Weighted Minkowski Distance, Mahalanobis Distance, Hamming Distance, Canberra Distance, Bray Curtis Distance, or a combination thereof.

A distance d(a, b) can be converted into a similarity value s(a, b) e.g. by the following equation: s(a, b) = 1 / (1 + d(a, b)).
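By way of illustration only, the following Python (NumPy) listing sketches the computation of the Cosine Similarity, the Euclidean Distance and the above conversion of a distance into a similarity value; the example vectors are purely illustrative.

import numpy as np

def cosine_similarity(a, b):
    """s(a, b) = (a . b) / (||a|| * ||b||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """d(a, b) = sqrt(sum_i (a_i - b_i)^2)"""
    return float(np.linalg.norm(a - b))

def distance_to_similarity(d):
    """Example conversion of a distance into a similarity value: s = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)

a = np.array([0.2, 0.8, 0.1])
b = np.array([0.3, 0.7, 0.0])
s_cos = cosine_similarity(a, b)
s_dist = distance_to_similarity(euclidean_distance(a, b))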

The similarity value and/or information about the one or more second object(s) can be outputted. Outputting can mean printing out information (via a printer), displaying it on a monitor and/or storing it in a data storage.

In a preferred embodiment, a number m of second objects from a multitude of objects is identified, each object of the number m of objects being more similar to the first object than every other object of the multitude of objects that does not belong to the number m of second objects. In other words: a number m of second objects that are most similar to the first object are identified. The number m is a natural number equal to or greater than 1.

In another preferred embodiment, a number p of second objects that are most different from the first object is identified. The number p is a natural number equal to or greater than 1.

It is also possible to sort the number m of second objects and/or the number p of second objects by the magnitude of their similarity value.

In another preferred embodiment, a number q of second objects is identified, each second object of the number q of second objects having a predefined similarity to the first object. The number q is a natural number equal to or greater than 0. The identification of objects having a predefined similarity to the first object can e.g. be done by comparing each similarity value with one or more predefined thresholds.

In case of one threshold, each similarity value can be compared with the predefined threshold, and objects for which the similarity value is equal to or greater than the predefined threshold can be selected.

In case of two thresholds, e.g. a lower threshold and an upper threshold, each similarity value can be compared with the lower and/or upper predefined threshold, and objects for which the similarity value is greater than the lower threshold and smaller than the upper threshold can be selected.
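By way of illustration only, the following Python (NumPy) listing sketches the identification of the m second objects most similar to a first object and the selection of objects whose similarity value lies between a lower and an upper threshold; the stored representations, the value of m and the thresholds are illustrative assumptions.

import numpy as np

def top_m_similar(query, stored, m=5):
    """Return indices and similarity values of the m stored representations
    most similar to the query (using cosine similarity)."""
    stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    similarities = stored_norm @ query_norm
    order = np.argsort(similarities)[::-1][:m]           # sort by similarity, descending
    return order, similarities[order]

def filter_by_thresholds(similarities, lower=None, upper=None):
    """Return indices whose similarity is >= lower and/or < upper."""
    keep = np.ones(len(similarities), dtype=bool)
    if lower is not None:
        keep &= similarities >= lower
    if upper is not None:
        keep &= similarities < upper
    return np.nonzero(keep)[0]

stored = np.random.rand(1000, 128)     # previously generated representations
query = np.random.rand(128)            # representation of the first object
indices, values = top_m_similar(query, stored, m=5)
selected = filter_by_thresholds(values, lower=0.8)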

Further filter and sorting options are conceivable.

Some application examples from the medical field are explained hereinafter, without limiting the invention to the examples and/or this field.

The machine learning model according to the present disclosure can e.g. be trained on a training data set, the training data set comprising, for each patient of a multitude of patients, personal data relating to the patient. The personal data can e.g. comprise information about the patient's age, height, weight, gender, eye color, hair color, skin color, blood group, existing illnesses and/or conditions, pre-existing illnesses and/or conditions and/or the like, as described above. The personal data comprise data of at least two modalities. The personal data is used for training the machine learning model to generate multimodal representations of patients based on personal data.

The trained machine learning model can be e.g. used by a healthcare practitioner (e.g. a doctor). The healthcare practitioner may examine a new patient. The new patient may show symptoms of a disease. The healthcare practitioner may be interested in finding out if there have been any patients with similar symptoms in the past. The healthcare practitioner may be interested in what tests were done on these patients, what disease(s) they were diagnosed with, and how these patients were treated. All of this information tells the healthcare practitioner what tests should be done on the new patient (or do not need to be done, thus reducing time until diagnosis and/or avoiding costs and/or non-burdening the patient unnecessarily), what disease(s) the new patient might have, and how the new patient might be treated.

The healthcare practitioner can input personal data about the new patient into the trained machine learning model. The trained machine learning model generates a representation of the new patient on the basis of the personal data. The representation of the new patient can then be compared with other representations of other patients. It is possible to identify a number m of patients who are most similar to the new patient.

It is conceivable that, when identifying similar patients, the focus of the search is directed to one or more defined features, while other features may be neglected. If, for example, the personal data about the new patient comprises information about a long-standing disease that is unrelated to the current symptoms, it may be neglected. Neglecting information can mean that the data carrying this information are not inputted into the trained machine learning model. It is for example possible to set the respective values of the feature vector to zero. It is conceivable that the healthcare practitioner only uses those personal data as input data for the trained machine learning model that the healthcare practitioner considers relevant for a diagnosis.

Once a number m of similar patients has been identified, e.g. by computing a similarity value quantifying the similarity between the representation of the new patient and the representations of other patients, and by ranking the patients by their similarity value, the healthcare practitioner may identify personal data in datasets related to the (most) similar patients which do not yet exist for the new patient but which may be helpful to make a diagnosis. The healthcare practitioner may identify successful therapies in the datasets.

The trained machine learning model can be used not only for diagnostic purposes and for therapy planning, but also to predict the course of a disease. For example, patients can be identified who have had symptoms similar to the new patient in the past. Once the patients have been identified, it is possible to study how a disease developed under certain conditions. It can be examined how the health status of the new patient will develop if defined measures are taken. This prediction can be used for therapy planning.

The described workflow of a healthcare practitioner utilizing the trained machine learning model to probe historical, high-dimensional data comprises a typical human-in-the-loop setup. This is desired from an ethical, legal and medical perspective. But in principle it is possible to automatically diagnose a disease and/or generate treatment recommendations or even automatically perform a treatment.

The operations in accordance with the teachings herein may be performed by at least one computer system specially constructed for the desired purposes or at least one general-purpose computer system specially configured for the desired purpose by at least one computer program stored in a typically non-transitory computer readable storage medium.

The term “non-transitory” is used herein to exclude transitory, propagating signals or waves, but to otherwise include any volatile or non-volatile computer memory technology suitable to the application.

A “computer system” is a system for electronic data processing that processes data by means of programmable calculation rules. Such a system usually comprises a “computer”, that unit which comprises a processor for carrying out logical operations, and also peripherals.

In computer technology, “peripherals” refer to all devices which are connected to the computer and serve for the control of the computer and/or as input and output devices. Examples thereof are monitor (screen), printer, scanner, mouse, keyboard, drives, camera, microphone, loudspeaker, etc. Internal ports and expansion cards are also considered to be peripherals in computer technology. Computer systems of today are frequently divided into desktop PCs, portable PCs, laptops, notebooks, netbooks and tablet PCs and so-called handhelds (e.g. smartphones); all these systems can be utilized for carrying out the invention.

The term “process” as used above is intended to include any type of computation or manipulation or transformation of data represented as physical, e.g. electronic, phenomena which may occur or reside e.g. within registers and/or memories of at least one computer or processor. The term processor includes a single processing unit or a plurality of distributed or remote such units.

Any suitable input device, such as but not limited to a camera sensor, may be used to generate or otherwise provide information received by the system and methods shown and described herein. Any suitable output device or display may be used to display or output information generated by the system and methods shown and described herein. Any suitable processor/s may be employed to compute or generate information as described herein and/or to perform functionalities described herein and/or to implement any engine, interface or other system described herein. Any suitable computerized data storage e.g. computer memory may be used to store information received by or generated by the systems shown and described herein. Functionalities shown and described herein may be divided between a server computer and a plurality of client computers. These or any other computerized components shown and described herein may communicate between themselves via a suitable computer network.

Fig. 4 illustrates a computer system (1) according to some example implementations of the present disclosure in more detail.

Generally, a computer system of exemplary implementations of the present disclosure may be referred to as a computer and may comprise, include, or be embodied in one or more fixed or portable electronic devices. The computer may include one or more of each of a number of components such as, for example, a processing unit (20) connected to a memory (50) (e.g., storage device).

The processing unit (20) may be composed of one or more processors alone or in combination with one or more memories. The processing unit is generally any piece of computer hardware that is capable of processing information such as, for example, data, computer programs and/or other suitable electronic information. The processing unit is composed of a collection of electronic circuits, some of which may be packaged as an integrated circuit or multiple interconnected integrated circuits (an integrated circuit at times more commonly referred to as a “chip”). The processing unit may be configured to execute computer programs, which may be stored onboard the processing unit or otherwise stored in the memory (50) of the same or another computer.

The processing unit (20) may be a number of processors, a multi-core processor or some other type of processor, depending on the particular implementation. Further, the processing unit may be implemented using a number of heterogeneous processor systems in which a main processor is present with one or more secondary processors on a single chip. As another illustrative example, the processing unit may be a symmetric multi-processor system containing multiple processors of the same type. In yet another example, the processing unit may be embodied as or otherwise include one or more ASICs, FPGAs or the like. Thus, although the processing unit may be capable of executing a computer program to perform one or more functions, the processing unit of various examples may be capable of performing one or more functions without the aid of a computer program. In either instance, the processing unit may be appropriately programmed to perform functions or operations according to example implementations of the present disclosure.

The memory (50) is generally any piece of computer hardware that is capable of storing information such as, for example, data, computer programs (e.g., computer-readable program code (60)) and/or other suitable information either on a temporary basis and/or a permanent basis. The memory may include volatile and/or non-volatile memory, and may be fixed or removable. Examples of suitable memory include random access memory (RAM), read-only memory (ROM), a hard drive, a flash memory, a thumb drive, a removable computer diskette, an optical disk, a magnetic tape or some combination of the above. Optical disks may include compact disk - read only memory (CD-ROM), compact disk - read/write (CD-R/W), DVD, Blu-ray disk or the like. In various instances, the memory may be referred to as a computer-readable storage medium. The computer-readable storage medium is a non-transitory device capable of storing information, and is distinguishable from computer-readable transmission media such as electronic transitory signals capable of carrying information from one location to another. Computer-readable medium as described herein may generally refer to a computer-readable storage medium or computer-readable transmission medium.

In addition to the memory (50), the processing unit (20) may also be connected to one or more interfaces for displaying, transmitting and/or receiving information. The interfaces may include one or more communications interfaces and/or one or more user interfaces. The communications interface(s) may be configured to transmit and/or receive information, such as to and/or from other computer(s), network(s), database(s) or the like. The communications interface may be configured to transmit and/or receive information by physical (wired) and/or wireless communications links. The communications interface(s) may include interface(s) (41) to connect to a network, such as using technologies such as cellular telephone, Wi-Fi, satellite, cable, digital subscriber line (DSL), fiber optics and the like. In some examples, the communications interface(s) may include one or more short-range communications interfaces (42) configured to connect devices using short-range communications technologies such as NFC, RFID, Bluetooth, Bluetooth LE, ZigBee, infrared (e.g., IrDA) or the like.

The user interfaces may include a display (30). The display may be configured to present or otherwise display information to a user, suitable examples of which include a liquid crystal display (LCD), light-emitting diode display (LED), plasma display panel (PDP) or the like. The user input interface(s) (11) may be wired or wireless, and may be configured to receive information from a user into the computer system (1), such as for processing, storage and/or display. Suitable examples of user input interfaces include a microphone, image or video capture device, keyboard or keypad, joystick, touch-sensitive surface (separate from or integrated into a touchscreen) or the like. In some examples, the user interfaces may include automatic identification and data capture (AIDC) technology (12) for machine-readable information. This may include barcode, radio frequency identification (RFID), magnetic stripes, optical character recognition (OCR), integrated circuit card (ICC), and the like. The user interfaces may further include one or more interfaces for communicating with peripherals such as printers and the like.

As indicated above, program code instructions may be stored in memory, and executed by a processing unit that is thereby programmed, to implement functions of the systems, subsystems, tools and their respective elements described herein. As will be appreciated, any suitable program code instructions may be loaded onto a computer or other programmable apparatus from a computer-readable storage medium to produce a particular machine, such that the particular machine becomes a means for implementing the functions specified herein. These program code instructions may also be stored in a computer-readable storage medium that can direct a computer, processing unit or other programmable apparatus to function in a particular manner to thereby generate a particular machine or particular article of manufacture. The instructions stored in the computer-readable storage medium may produce an article of manufacture, where the article of manufacture becomes a means for implementing functions described herein. The program code instructions may be retrieved from a computer-readable storage medium and loaded into a computer, processing unit or other programmable apparatus to configure the computer, processing unit or other programmable apparatus to execute operations to be performed on or by the computer, processing unit or other programmable apparatus.

Retrieval, loading and execution of the program code instructions may be performed sequentially such that one instruction is retrieved, loaded and executed at a time. In some example implementations, retrieval, loading and/or execution may be performed in parallel such that multiple instructions are retrieved, loaded, and/or executed together. Execution of the program code instructions may produce a computer-implemented process such that the instructions executed by the computer, processing circuitry or other programmable apparatus provide operations for implementing functions described herein. Execution of instructions by the processing unit, or storage of instructions in a computer-readable storage medium, supports combinations of operations for performing the specified functions. In this manner, a computer system (1) may include a processing unit (20) and a computer-readable storage medium or memory (50) coupled to the processing circuitry, where the processing circuitry is configured to execute computer-readable program code (60) stored in the memory. It will also be understood that one or more functions, and combinations of functions, may be implemented by special purpose hardware-based computer systems and/or processing circuitry which perform the specified functions, or combinations of special purpose hardware and program code instructions.