Title:
CLINICAL PROTOTYPES
Document Type and Number:
WIPO Patent Application WO/2022/069884
Kind Code:
A1
Abstract:
Disclosed is a method of training a machine-learning network to obtain clinical prototypes for a plurality of measurement categories. The method comprises receiving measurement data units of patients comprising one or more measurements of physiological parameters of the patients and assigned to measurement categories. Iterations are carried out comprising choosing values of one or more network parameters of the machine-learning network, generating embedded representations for each measurement data unit using the machine-learning network, according to the network parameters, for each measurement category, calculating a clinical prototype according to the network parameters, and evaluating a loss function comprising a contrastive loss dependent on differences between the embedded representations of the measurement data units and the clinical prototypes of the measurement categories. The iterations are repeated until the value of the loss function is reduced to meet a predetermined condition. Also disclosed are methods applying machine learning networks trained using the method.

Inventors:
KIYASSEH DANI (GB)
ZHU TINGTING (GB)
CLIFTON DAVID (GB)
Application Number:
PCT/GB2021/052520
Publication Date:
April 07, 2022
Filing Date:
September 29, 2021
Assignee:
UNIV OXFORD INNOVATION LTD (GB)
International Classes:
G16H50/70; G16H50/20
Other References:
DANI KIYASSEH ET AL: "CLOCS: Contrastive Learning of Cardiac Signals", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 May 2020 (2020-05-27), XP081684275
KHOSLA PRANNAY ET AL: "Supervised Contrastive Learning", 23 April 2020 (2020-04-23), pages 1-18, XP055882457, Retrieved from the Internet [retrieved on 20220121]
ADAM PASZKE; SAM GROSS; FRANCISCO MASSA; ADAM LERER; JAMES BRADBURY; GREGORY CHANAN; TREVOR KILLEEN; ZEMING LIN; NATALIA GIMELSHEIN; LUCA ANTIGA ET AL: "Pytorch: An imperative style, high-performance deep learning library", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2019, pages 8024-8035
ADAM M. RHINE: "Information Retrieval for Clinical Decision Support", PhD thesis, 2017
ALAN H GEE; DIEGO GARCIA-OLANO; JOYDEEP GHOSH; DAVID PAYDARFAR: "Explaining deep classification of time series data with learned prototypes", ARXIV PREPRINT ARXIV:1904.08935, 2019
ALAN F SMEATON: "Natural Language Information Retrieval", 1999, SPRINGER, article "Using NLP or NLP resources for information retrieval tasks", pages 99-111
ALI POURMAND; MARY TANSKI; STEVEN DAVIS; HAMID SHOKOOHI; RAYMOND LUCAS; FAREEN ZAVER: "Educational technology improves ECG interpretation of acute myocardial infarction among medical students and emergency medicine residents", WESTERN JOURNAL OF EMERGENCY MEDICINE, vol. 16, no. 1, 2015, page 133
ALISTAIR E W JOHNSON; TOM J POLLARD; LU SHEN; LI-WEI H LEHMAN; MENGLING FENG; MOHAMMAD GHASSEMI; BENJAMIN MOODY; PETER SZOLOVITS; LEO ANTHONY CELI ET AL: "MIMIC-III, a freely accessible critical care database", SCIENTIFIC DATA, vol. 3, no. 1, 2016, pages 1-9, XP055756323, DOI: 10.1038/sdata.2016.35
ANIS SHARAFODDINI; JOEL A DUBIN; JOON LEE: "Patient similarity in prediction models based on health data: a scoping review", JMIR MEDICAL INFORMATICS, vol. 5, no. 1, 2017, page e7
ARNAUD VAN LOOVEREN; JANIS KLAISE: "Interpretable counterfactual explanations guided by prototypes", ARXIV PREPRINT ARXIV:1907.02584, 2019
BYRON C WALLACE; JOEL KUIPER; AAKASH SHARMA; MINGXI ZHU; IAIN J MARSHALL: "Extracting PICO sentences from clinical trial reports using supervised distant supervision", THE JOURNAL OF MACHINE LEARNING RESEARCH, vol. 17, no. 1, 2016, pages 4572-4596
CHAITANYA SHIVADE; PREETHI RAGHAVAN; ERIC FOSLER-LUSSIER; PETER J EMBI; NOEMIE ELHADAD; STEPHEN B JOHNSON; ALBERT M LAI: "A review of approaches to identifying patient phenotype cohorts using electronic health records", JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, vol. 21, no. 2, 2014, pages 221-230, XP055453340, DOI: 10.1136/amiajnl-2013-001935
CONNER D GALLOWAY; ALEXANDER V VALYS; JACQUELINE B SHREIBATI; DANIEL L TREIMAN; FRANK L PETERSON; VIVEK P GUNDOTRA; DAVID E ALBERT; ZACHI I ATTIA ET AL: "Development and validation of a deep-learning model to screen for hyperkalemia from the electrocardiogram", JAMA CARDIOLOGY, vol. 4, no. 5, 2019, pages 428-436
DANI KIYASSEH; TINGTING ZHU; DAVID A CLIFTON: ARXIV PREPRINT ARXIV:2005.13249, 2020
DAVID HA; ANDREW DAI; QUOC V LE: "Hypernetworks", ARXIV PREPRINT ARXIV:1609.09106, 2016
DEEPAK ROY CHITTAJALLU; BO DONG; PAUL TUNISON; RODDY COLLINS; KATERINA WELLS; JAMES FLESHMAN; GANESH SANKARANARAYANAN; STEVEN SCHWAITZBERG; LORA C.; ANDINET ENQUOBAHRIE: "International Symposium on Biomedical Imaging", 2019, IEEE, article "XAI-CBIR: Explainable AI system for content based retrieval of video frames from minimally invasive surgery videos", pages 66-69
DIANBO LIU; DMITRIY DLIGACH; TIMOTHY MILLER: "Two-stage federated phenotyping and patient representation learning", ARXIV PREPRINT ARXIV:1908.05596, 2019
ERICK A PEREZ ALDAY; ANNIE GU; AMIT SHAH; CHAD ROBICHAUX; AN-KWOK IAN WONG; CHENGYU LIU; FEIFEI LIU; ALI BAHRAMI RAD; ANDONI ELOLA; SALMAN SEYEDI ET AL: "Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020", MEDRXIV, 2020
FLOOD SUNG; YONGXIN YANG; LI ZHANG; TAO XIANG; PHILIP H S TORR; TIMOTHY M HOSPEDALES: "Learning to compare: Relation network for few-shot learning", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2018, pages 1199-1208, XP033476082, DOI: 10.1109/CVPR.2018.00131
GEOFFREY HINTON: "How to represent part-whole hierarchies in a neural network", ARXIV PREPRINT ARXIV:2102.12627, 2021
HAOLIN WANG; QINGPENG ZHANG; JIAHU YUAN: "Semantically enhanced medical information retrieval system: a tensor factorization based approach", IEEE ACCESS, vol. 5, 2017, pages 7584-7593, XP011651471, DOI: 10.1109/ACCESS.2017.2698142
HARSHA GURULINGAPPA; LUCA TOLDO; CLAUDIA SCHEPERS; ALEXANDER BAUER; GERARD MEGARO: "Semi-supervised information retrieval system for clinical decision support", TEXT RETRIEVAL CONFERENCE (TREC), 2016
ISOTTA LANDI; BENJAMIN S GLICKSBERG; HAO-CHIH LEE; SARAH CHERNG; GIULIA LANDI; MATTEO DANIELETTO; JOEL T DUDLEY; CESARE FURLANELLO; RICCARDO MIOTTO: "Deep representation learning of electronic health records to unlock patient stratification at scale", ARXIV PREPRINT ARXIV:2003.06516, 2020
JAKE SNELL; KEVIN SWERSKY; RICHARD ZEMEL: "Prototypical networks for few-shot learning", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2017, pages 4077-4087
JIANWEI ZHENG; JIANMING ZHANG; SIDY DANIOKO; HAI YAO; HANGYUAN GUO; CYRIL RAKOVSKI: "A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients", SCIENTIFIC DATA, vol. 7, no. 1, 2020, pages 1-8
JOSEPH Y CHENG; HANLIN GOH; KAAN DOGRUSOZ; ONCEL TUZEL; ERDRIN AZEMI: "Subject-aware contrastive learning for biosignals", ARXIV PREPRINT ARXIV:2007.04871, 2020
JOSIF GRABOCKA; NICOLAS SCHILLING; MARTIN WISTUBA; LARS SCHMIDT-THIEME: "Learning time-series shapelets", PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2014, pages 392-401, XP058053687, DOI: 10.1145/2623330.2623613
JUNNAN LI; PAN ZHOU; CAIMING XIONG; RICHARD SOCHER; STEVEN C H HOI: "Prototypical contrastive learning of unsupervised representations", ARXIV PREPRINT ARXIV:2005.04966, 2020
KAI HAN; ANDREA VEDALDI; ANDREW ZISSERMAN: "Learning to discover novel visual categories via deep transfer clustering", PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2019, pages 8401-8409
LELAND MCINNES; JOHN HEALY; JAMES MELVILLE: "UMAP: Uniform manifold approximation and projection for dimension reduction", ARXIV PREPRINT ARXIV:1802.03426, 2018
LEONARD W D'AVOLIO; THIEN M NGUYEN; WILDON R FARWELL; YONGMING CHEN; FELICIA FITZMEYER; OWEN M HARRIS; LOUIS D FIORE: "Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC)", JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, vol. 17, no. 4, 2010, pages 375-382
LI HUANG; ANDREW L SHEA; HUINING QIAN; ADITYA MASURKAR; HAO DENG; DIANBO LIU: "Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records", JOURNAL OF BIOMEDICAL INFORMATICS, vol. 99, 2019, page 103291
MATHILDE CARON; ISHAN MISRA; JULIEN MAIRAL; PRIYA GOYAL; PIOTR BOJANOWSKI; ARMAND JOULIN: "Unsupervised learning of visual features by contrasting cluster assignments", ARXIV PREPRINT ARXIV:2006.09882, 2020
MATHILDE CARON; PIOTR BOJANOWSKI; ARMAND JOULIN; MATTHIJS DOUZE: "Deep clustering for unsupervised learning of visual features", PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION, 2018, pages 132-149
MORTEN MORUP; LARS KAI HANSEN: "Archetypal analysis for machine learning and data mining", NEUROCOMPUTING, vol. 80, 2012, pages 54-63, XP028356709, DOI: 10.1016/j.neucom.2011.06.033
NAVEEN SAI MADIRAJU; SEID M SADAT; DIMITRY FISHER; HOMA KARIMABADI: "Deep temporal clustering: Fully unsupervised learning of time-domain features", ARXIV PREPRINT ARXIV:1802.01059, 2018
NILS STRODTHOFF; PATRICK WAGNER; TOBIAS SCHAEFFTER; WOJCIECH SAMEK: "Deep learning for ECG analysis: Benchmarks and insights from PTB-XL", ARXIV PREPRINT ARXIV:2004.13701, 2020
PATRICK WAGNER; NILS STRODTHOFF; RALF-DIETER BOUSSELJOT; WOJCIECH SAMEK; TOBIAS SCHAEFFTER: "PTB-XL, a large publicly available electrocardiography dataset", 2020, Retrieved from the Internet
QIANLI MA; JIAWEI ZHENG; SEN LI; GARY W COTTRELL: "Learning representations for time series clustering", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, vol. 32, 2019, pages 3781-3791
QIN ZHANG; JIA WU; PENG ZHANG; GUODONG LONG; CHENGQI ZHANG: "Salient subsequence learning for time series clustering", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 41, no. 9, 2018, pages 2193-2207, XP011737831, DOI: 10.1109/TPAMI.2018.2847699
QIULING SUO; FENGLONG MA; YE YUAN; MENGDI HUAI; WEIDA ZHONG; AIDONG ZHANG; JING GAO: "2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)", 2017, IEEE, article "Personalized disease prediction using a CNN-based similarity learning method", pages 811-816
R RANI SARITHA; VARGHESE PAUL; P GANESH KUMAR: "Content based image retrieval using deep learning process", CLUSTER COMPUTING, vol. 22, no. 2, 2019, pages 4187-4200
RICCARDO MIOTTO; LI LI; BRIAN A KIDD; JOEL T DUDLEY: "Deep patient: an unsupervised representation to predict the future of patients from the electronic health records", SCIENTIFIC REPORTS, vol. 6, no. 1, 10 January 2016 (2016-01-10)
SAJAD DARABI; MOHAMMAD KACHUEE; SHAYAN FAZELI; MAJID SARRAFZADEH: "TAPER: Time-aware patient EHR representation", IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020
EUROPEAN COMMISSION: EXCHANGE-ELECTRONIC-HEALTH-RECORDS-ACROSS-EU, 2019, Retrieved from the Internet
SHRADDHA PAI; GARY D BADER: "Patient similarity networks for precision medicine", JOURNAL OF MOLECULAR BIOLOGY, vol. 430, no. 18, 2018, pages 2924-2938, XP085446855, DOI: 10.1016/j.jmb.2018.05.037
SHRADDHA PAI; SHIRLEY HUI; RUTH ISSERLIN; MUHAMMAD A SHAH; HUSSAM KAKA; GARY D BADER: "Interpretable patient classification using integrated patient similarity networks", MOLECULAR SYSTEMS BIOLOGY, vol. 15, no. 3, 2019
SIDDHARTH BISWAL; CAO XIAO; LUCAS M GLASS; ELIZABETH MILKOVITS; JIMENG SUN: "Doctor2vec: Dynamic doctor representation learning for clinical trial recruitment", PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 34, 2020, pages 557-564
SIYUAN QIAO; CHENXI LIU; WEI SHEN; ALAN L YUILLE: "Few-shot image recognition by predicting parameters from activations", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2018, pages 7229-7238, XP033473642, DOI: 10.1109/CVPR.2018.00755
SPYROS GIDARIS; NIKOS KOMODAKIS: "Dynamic few-shot visual learning without forgetting", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2018, pages 4367-4375, XP033476410, DOI: 10.1109/CVPR.2018.00459
STEVE R CHAMBERLIN; STEVEN D BEDRICK; AARON M COHEN; YANSHAN WANG; ANDREW WEN; SIJIA LIU; HONGFANG LIU; WILLIAM HERSH: "Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task", MEDRXIV, 2019, page 19005280
TING CHEN; SIMON KORNBLITH; MOHAMMAD NOROUZI; GEOFFREY HINTON: "A simple framework for contrastive learning of visual representations", ARXIV PREPRINT ARXIV:2002.05709, 2020
TONGZHOU WANG; JUN-YAN ZHU; ANTONIO TORRALBA; ALEXEI A EFROS: "Dataset distillation", ARXIV PREPRINT ARXIV:1811.10959, 2018
TRAVIS R GOODWIN; SANDA M HARABAGIU: "Learning relevance models for patient cohort retrieval", JAMIA OPEN, vol. 1, no. 2, 2018, pages 265-275
TRAVIS R GOODWIN; SANDA M HARABAGIU: "AMIA Annual Symposium Proceedings", vol. 2016, 2016, AMERICAN MEDICAL INFORMATICS ASSOCIATION, article "Multi-modal patient cohort identification from EEG report and signal data", page 1794
VIVEK H MURTHY; HARLAN M KRUMHOLZ; CARY P GROSS: "Participation in cancer clinical trials: race-, sex-, and age-based disparities", JAMA, vol. 291, no. 22, 2004, pages 2720-2726
VIVIEN SAINTE FARE GARNOT; LOIC LANDRIEU: "Metric-guided prototype learning", 2020
WEI-YIN KO; KONSTANTINOS C SIONTIS; ZACHI I ATTIA; RICKEY E CARTER; SURAJ KAPA; STEVE R OMMEN; STEVEN J DEMUTH; MICHAEL J ACKERMAN; BERNARD J GERSH ET AL: "Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram", JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY, vol. 75, no. 7, 2020, pages 722-733
WILLIAM R HERSH; ROBERT A GREENES: "Information retrieval in medicine: state of the art", MD COMPUTING: COMPUTERS IN MEDICAL PRACTICE, vol. 7, no. 5, 1990, pages 302-311
XU JI; JOAO F HENRIQUES; ANDREA VEDALDI: "Invariant information clustering for unsupervised image classification and segmentation", PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2019, pages 9865-9874
YANSHAN WANG; ANDREW WEN; SIJIA LIU; WILLIAM HERSH; STEVEN BEDRICK; HONGFANG LIU: "Test collections for electronic health record-based clinical information retrieval", JAMIA OPEN, vol. 2, no. 3, 2019, pages 360-368
YONGLONG TIAN; DILIP KRISHNAN; PHILLIP ISOLA: "Contrastive multiview coding", ARXIV PREPRINT ARXIV:1906.05849, 2019
YUE LI; PRATHEEKSHA NAIR; XING HAN LU; ZHI WEN; YUENING WANG; AMIR ARDALAN KALANTARI DEHAGHI; YAN MIAO; WEIQI LIU; TAMAS ORDOG ET AL: "Inferring multimodal latent topics from electronic health records", NATURE COMMUNICATIONS, vol. 11, no. 1, 2020, pages 1-17
YUKI MARKUS ASANO; CHRISTIAN RUPPRECHT; ANDREA VEDALDI: "Self-labelling via simultaneous clustering and representation learning", ARXIV PREPRINT ARXIV:1911.05371, 2019; INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, 2020
YUQI SI; KIRK ROBERTS: "Deep patient representation of clinical notes via multi-task learning for mortality prediction", AMIA SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS, vol. 2019, 2019, page 779
ZACHI I ATTIA; PETER A NOSEWORTHY; FRANCISCO LOPEZ-JIMENEZ; SAMUEL J ASIRVATHAM; ABHISHEK J DESHMUKH; BERNARD J GERSH; RICKEY E CARTER; XIAOXI YAO ET AL: "An artificial intelligence enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction", THE LANCET, vol. 394, no. 10201, 2019, pages 861-867, XP085804088, DOI: 10.1016/S0140-6736(19)31721-0
ZACHI I ATTIA; SURAJ KAPA; FRANCISCO LOPEZ-JIMENEZ; PAUL M MCKIE; DOROTHY J LADEWIG; GAURAV SATAM; PATRICIA A PELLIKKA; MAURICE ENRIQUEZ-SARANO ET AL: "Screening for cardiac contractile dysfunction using an artificial intelligence enabled electrocardiogram", NATURE MEDICINE, vol. 25, no. 1, 2019, pages 70-74
ZIHAO ZHU; CHANGCHANG YIN; BUYUE QIAN; YU CHENG; JISHANG WEI; FEI WANG: "2016 IEEE 16th International Conference on Data Mining (ICDM)", 2016, IEEE, article "Measuring patient similarities via a deep architecture with medical concept embedding", pages 749-758
Attorney, Agent or Firm:
J A KEMP LLP (GB)
Claims:
CLAIMS

1. A method of training a machine-learning network to obtain clinical prototypes for a plurality of measurement categories, the method comprising: receiving measurement data units of a plurality of patients, each measurement data unit comprising one or more measurements of a physiological parameter of one of the patients and being assigned to one of the plurality of measurement categories; and carrying out one or more iterations comprising: choosing values of one or more network parameters of the machine-learning network; generating an embedded representation for each measurement data unit using the machine-learning network, according to the values of one or more of the network parameters; for each measurement category, calculating a clinical prototype according to the values of one or more of the network parameters; and calculating a value of a loss function comprising a contrastive loss dependent on differences between the embedded representations of the measurement data units and the clinical prototypes of the measurement categories, wherein the iterations are repeated until the value of the loss function is reduced to meet a predetermined condition.

2. The method of claim 1, wherein the differences between the embedded representations and the clinical prototypes comprise attractive differences between the embedded representations of the measurement data units and the clinical prototypes of the measurement categories to which the respective measurement data units are assigned, optionally wherein the contrastive loss increases if the attractive differences increase.

3. The method of any preceding claim, wherein the differences between the embedded representations and the clinical prototypes comprise repulsive differences between the embedded representations of the measurement data units and the clinical prototypes of measurement categories to which the respective measurement data units are not assigned, optionally wherein the contrastive loss decreases if the repulsive differences increase.

4. The method of any preceding claim, wherein each measurement data unit is further assigned to one of a plurality of clinical classes, and wherein each of the one or more iterations further comprises: generating classification parameters for each embedded representation according to the network parameters; assigning each embedded representation to one of the clinical classes using the machine-learning network, the assignment based on the classification parameters, wherein the loss function further comprises a supervised loss dependent on a difference between the clinical class to which each measurement data unit is assigned and the clinical class to which the embedded representation of the measurement data unit is assigned, optionally wherein the generating of classification parameters is performed using a hypernetwork using the embedded representations, and the assigning of each embedded representation is performed using a linear classifier using the classification parameters.

5. The method of any preceding claim, wherein: the measurement data units further comprise values of one or more attributes of the respective patients; the measurement categories are associated with values of the one or more attributes; and the measurement data units are assigned to the measurement category associated with values of the attributes that are the same as the values of the attributes of the respective measurement data units, optionally wherein the attributes comprise one or more of identity, sex, age, disease class, and ethnicity.

6. The method of claim 5, wherein: one or more of the attributes are defined as class attributes; and the differences between the embedded representations and the clinical prototypes comprise a plurality of attractive differences between the embedded representations of the measurement data units and the clinical prototypes of each of the measurement categories associated with the same values of the class attributes as the values of the class attributes of the respective measurement data units, optionally wherein the contrastive loss increases if the attractive differences increase.

7. The method of claim 6, wherein the contrastive loss depends on the plurality of attractive differences with weights that differ between the plurality of attractive differences depending on the number of values of attributes other than the class attributes that are the same between the measurement data units and the respective measurement categories, optionally wherein the weights of the attractive differences are higher for measurement categories having a higher number of attributes other than the class attributes having values that are the same between the measurement data units and the respective measurement categories.

8. The method of any of claims 5 to 7, wherein: one or more of the attributes are defined as class attributes; and the differences between the embedded representations and the clinical prototypes comprise a plurality of repulsive differences between the embedded representations of the measurement data units and the clinical prototypes of each of the measurement categories associated with values of the class attributes different from the values of the class attributes of the respective measurement data units, optionally wherein the contrastive loss decreases if the repulsive differences increase.

9. The method of claim 8, wherein the contrastive loss depends on the plurality of repulsive differences with weights that differ between the plurality of repulsive differences depending on the number of values of attributes that are different between the measurement data units and the respective measurement categories, optionally wherein the weights of the repulsive differences are higher for measurement categories having a higher number of attributes having values that are different between the measurement data units and the respective measurement categories.

10. The method of any of claims 6 to 9, wherein the loss function further comprises a regression loss dependent on intraclass spacings between the clinical prototypes of each of the measurement categories associated with the same values of the class attributes, the regression loss increasing when the difference between the intraclass spacings and a set of desired spacings increases, optionally wherein the set of desired spacings is determined based on differences between the values of attributes other than the class attributes with which the measurement categories associated with the same values of the class attributes are associated.

11. The method of claim 10, wherein the intraclass spacings are determined for the clinical prototypes of each pair of measurement categories associated with the same values of the class attributes.

12. The method of any of claims 10 to 11, wherein the intraclass spacings are determined using pairwise Euclidean distances between the clinical prototypes of each of the measurement categories associated with the same values of the class attributes.

13. A method of assigning a patient to one of a plurality of clinical classes comprising: receiving one or more measurement data units each comprising one or more measurements of a physiological parameter of the patient; generating an embedded representation for the measurement data unit using a machine-learning network trained using the method of claim 4 or of any of claims 5 to 12 when dependent on claim 4; identifying one or more clinical prototypes most similar to the embedded representation from the clinical prototypes obtained from the machine-learning network; generating classification parameters for the one or more clinical prototypes most similar to the embedded representation using the machine-learning network; and classifying the patient based on the classification parameters, optionally wherein: identifying one or more clinical prototypes comprises identifying a plurality of the clinical prototypes most similar to the embedded representation and calculating an average of the plurality of clinical prototypes; and generating classification parameters for the one or more clinical prototypes comprises generating classification parameters for the average of the plurality of clinical prototypes.

14. A method of determining a similarity among a plurality of patients comprising: receiving measurement data units of each patient, each measurement data unit comprising one or more measurements of a physiological parameter of the patient; for each patient, generating an embedded representation of the measurement data units using a machine-learning network trained using the method of any of claims 1 to 12, and identifying one or more clinical prototypes most similar to the embedded representation from the clinical prototypes obtained from the machine-learning network; and determining the similarity among the patients based on a similarity among the clinical prototypes identified for each patient.

15. A method of training a machine-learning network to assign patients to one of a plurality of clinical classes, the method comprising training the machine-learning network using the clinical prototypes obtained using the method of any of claims 1 to 12.

16. A method of selecting patients having desired values of one or more attributes from a plurality of patients, the method comprising: receiving measurement data units of the plurality of patients, each measurement data unit comprising one or more measurements of a physiological parameter of one of the patients; generating an embedded representation of the measurement data units of each patient using a machine-learning network trained using the method of claim 5 or any of claims 6 to 12 when dependent on claim 5; identifying a clinical prototype of a measurement category associated with the desired values of the one or more attributes from the clinical prototypes obtained from the machine-learning network; identifying one or more embedded representations most similar to the identified clinical prototype; and selecting the one or more patients corresponding to the identified embedded representations.

17. A method of assigning a patient to one of a plurality of measurement categories, the method comprising: receiving a measurement data unit of the patient comprising one or more measurements of a physiological parameter of the patient; generating an embedded representation of the measurement data unit using a machine-learning network trained using the method of any of claims 1 to 12; calculating similarities between the embedded representation and the clinical prototypes obtained from the machine-learning network; and assigning the patient to one of the measurement categories based on the similarities, optionally wherein assigning the patient to one of the measurement categories comprises assigning the patient to the measurement category having a clinical prototype most similar to the embedded representation.

18. The method of any preceding claim, wherein the measurement data units comprise a plurality of measurements of the physiological parameter as a function of time.

19. The method of any preceding claim, wherein the physiological parameter is an electrocardiogram measurement.

20. The method of any preceding claim, wherein differences between embedded representations and clinical prototypes are calculated using a cosine similarity function.

21. The method of any preceding claim, wherein the embedded representations are generated using a feature extractor.

22. The method of any preceding claim, wherein choosing values of the one or more network parameters comprises choosing values of the network parameters based on values of the network parameters from one or more previous iterations, optionally wherein choosing values of the one or more network parameters further comprises choosing values of the network parameters based on a change in the value of the loss function between one or more previous iterations.

23. The method of any preceding claim, wherein the values of the one or more network parameters are chosen randomly in a first iteration of the one or more iterations.

24. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of the preceding claims.

25. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of any of claims 1 to 23.

Description:
CLINICAL PROTOTYPES

The invention relates to methods of training of machine learning networks to obtain clinical prototypes, and the use of machine learning networks trained using those methods.

The ongoing digitization of health records within the healthcare industry is producing large-scale datasets from which it is nontrivial to extract clinically useful insight manually. Extracting information at scale while leveraging patient-specific attributes such as sex and age can assist with clinical trial enrollment, medical school educational endeavours, and the evaluation of the fairness of neural networks, among other applications. At the same time, existing deep learning methodologies applied to clinical data within the medical domain are typically population-based and may be difficult to interpret. This limits their clinical utility, as population-based medical diagnoses may not generalize to the individual patient.

Healthcare data are being collected at a burgeoning rate. Such growth is driven by the digitization of old patient records, the presence of novel monitoring systems, and the recent recommendation to improve the exchange of health records. Healthcare data may include the results of tests and measurements carried out on individual patients. Physiological signals such as the electrocardiogram (ECG) convey a significant amount of information about the functionality and potential abnormalities of an individual’s body, for example their heart. Therefore, analysis of these increasingly available large datasets has the potential to provide various improvements in clinical practice.

The automatic detection of heart abnormalities, a subset of which are known as arrhythmias, through the use of deep learning has been quite successful in recent years (Galloway et al., 2019; Attia et al., 2019b;a; Ko et al., 2020). However, diagnoses generated by these algorithms are applied to large datasets and remain population-based and difficult to interpret. Consequently, medical practitioners are reluctant to integrate network-generated predictions into their clinical workflow.

Modern medical research is arguably anchored around the ‘gold standard’ of evidence provided by randomized control trials. However, the conclusions derived from such trials are typically population-based and fail to capture nuances at the individual patient level. The complex structures that define a patient in terms of their demographics, physiological state, and treatment outcomes imply that population-based findings may not trivially extend to the level of an individual patient. Consequently, personalized medicine, the ability to deliver the right treatment to the right patient at the right time, is increasingly viewed as being an important aim of medical diagnosis.

Models that are specifically structured to exploit patient-specific information already present within existing physiological datasets stand to improve personalized medical diagnosis and clinical interpretability. Appreciating the potential this holds, and in an attempt to design personalized models, Pai & Bader (2018) and Pai et al. (2019) propose patient similarity networks. These graph networks combine multi-modal clinical data to model connections between patients. Similarly, Zhu et al. (2016) and Suo et al. (2017) exploit methods from natural language processing to learn patient representations and similarities based on electronic health record data. In performing patient representation learning, however, existing approaches are relatively naive, do not concurrently optimize for a predictive task, and do not trivially extend to physiological signals.

Another longstanding goal in the presence of large-scale patient datasets is retrieving instances based on some user-defined criteria using machine learning. This information retrieval (IR) process typically consists of a query that is used to search through a large database and retrieve matched instances. Clinical databases can comprise instances that are either unlabelled or labelled with patient attribute information, such as disease class, sex, and age. The process of manually searching for relevant instances in, and extracting information from, clinical databases underpins a multitude of clinical tasks (Shivade et al., 2014). For example, clinicians extract a disease diagnosis from patient data, researchers involved in clinical trials search for and recruit patients satisfying specific inclusion criteria (Murthy et al., 2004), and educators retrieve relevant information as part of the continuing medical education scheme (Pourmand et al., 2015). This manual search-and-extract process, however, has been hampered by the growth of large-scale clinical databases and the increased prevalence of unlabelled instances. IR systems can be used to automate this process.

Within healthcare, the importance of an IR system is threefold. Firstly, it provides researchers with greater control and flexibility to choose patients for clinical trial recruitment. Secondly, if the query were to consist of sensitive attributes such as sex, age, and race, then such a system would allow researchers to more reliably evaluate the individual and counterfactual fairness of a particular model. To illustrate this point, assume the presence of a query instance that corresponds to a patient with atrial fibrillation who is male and under the age of 25. To reliably determine the sensitivity of a model with respect to sex, one would observe its response when exposed to a counterfactual instance, namely the exact same instance but with a different sex label. The use of an IR system can allow one to arrive at more reliable counterfactual instances. Lastly, IR systems can serve as an educational and diagnostic tool, allowing physicians to identify seemingly similar patients who exhibit different clinical parameters and vice versa.

Several IR systems have previously been implemented to retrieve instances from electronic health records (Wang et al., 2019; Chamberlin et al., 2019). However, most of these approaches do not allow for the granularity of an attribute-specific search and do not trivially extend to the domain of physiological signals. Moreover, of the methods that implement representation learning, none do so in a self-supervised manner.

In view of these limitations of existing systems, it would be advantageous to provide improved machine learning techniques that can better represent specific attributes within large datasets, such as sex, age, race, or even individual patient identity.

According to an aspect of the invention, there is provided a method of training a machine-learning network to obtain clinical prototypes for a plurality of measurement categories, the method comprising: receiving measurement data units of a plurality of patients, each measurement data unit comprising one or more measurements of a physiological parameter of one of the patients and being assigned to one of the plurality of measurement categories; and carrying out one or more iterations comprising: choosing values of one or more network parameters of the machine-learning network; generating an embedded representation for each measurement data unit using the machine-learning network, according to the network parameters; for each measurement category, calculating a clinical prototype using the values of one or more of the network parameters; and calculating a value of a loss function comprising a contrastive loss dependent on differences between the embedded representations of the measurement data units and the clinical prototypes of the measurement categories, wherein the iterations are repeated until the value of the loss function is reduced to meet a predetermined condition.

The method provides a supervised contrastive learning framework in which representations of cardiac signals associated with a set of patient-specific attributes (e.g., disease class, sex, age) are attracted to learnable embeddings entitled clinical prototypes. The method uses a contrastive loss to encourage discrimination between clinical prototypes associated with different measurement categories. The clinical prototypes are thereby encouraged to represent the measurement categories more faithfully than the representations produced by previous approaches. The clinical prototypes can be used for both clustering and retrieval of unlabelled patient measurement data (e.g. cardiac signals) based on multiple patient attributes. In addition, clinical prototypes adopt a semantically meaningful arrangement based on patient attributes, and thus confer a high degree of interpretability.

In some embodiments, the differences between the embedded representations and the clinical prototypes comprise attractive differences between the embedded representations of the measurement data units and the clinical prototypes of the measurement categories to which the respective measurement data units are assigned. Optionally, the contrastive loss increases if the attractive differences increase. This encourages embedded representations from the same categories to be clustered together to inform the clinical prototypes of those categories.

In some embodiments, the differences between the embedded representations and the clinical prototypes comprise repulsive differences between the embedded representations of the measurement data units and the clinical prototypes of measurement categories to which the respective measurement data units are not assigned. Optionally, the contrastive loss decreases if the repulsive differences increase. This encourages discrimination between embedded representations and clinical prototypes of different categories.
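The following is a minimal sketch, in PyTorch, of a contrastive loss combining such attractive and repulsive differences. It assumes one learnable prototype per measurement category, cosine similarity as the difference measure, and a softmax-style formulation in which the prototype of the assigned category provides the attractive term and all other prototypes provide the repulsive terms; the tensor names and the temperature value are illustrative, not prescribed by the method.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(embeddings, prototypes, category_idx, temperature=0.1):
    """Attract each embedded representation to the clinical prototype of its
    measurement category and repel it from the prototypes of all other categories.

    embeddings:   (N, D) embedded representations of the measurement data units
    prototypes:   (M, D) learnable clinical prototypes, one per measurement category
    category_idx: (N,)   index of the category each measurement data unit is assigned to
    """
    # Cosine similarity between every representation and every prototype
    sim = F.normalize(embeddings, dim=1) @ F.normalize(prototypes, dim=1).T  # (N, M)

    # Softmax over prototypes: the loss decreases as similarity to the assigned
    # prototype grows (attraction) and as similarity to the others shrinks (repulsion)
    return F.cross_entropy(sim / temperature, category_idx)

# Example usage with random stand-in data
embeddings = torch.randn(32, 128)                      # batch of embedded representations
prototypes = torch.nn.Parameter(torch.randn(10, 128))  # 10 learnable clinical prototypes
category_idx = torch.randint(0, 10, (32,))             # assigned measurement categories
loss = prototype_contrastive_loss(embeddings, prototypes, category_idx)
```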

In some embodiments, each measurement data unit is further assigned to one of a plurality of clinical classes, and each of the one or more iterations further comprises: generating classification parameters for each embedded representation according to the network parameters; and assigning each embedded representation to one of the clinical classes using the machine-learning network, the assignment based on the classification parameters, wherein the loss function further comprises a supervised loss dependent on a difference between the clinical class to which each measurement data unit is assigned and the clinical class to which the embedded representation of the measurement data unit is assigned. This allows the learning to take account of additional annotations on the measurement data and thereby improve the ability of the clinical prototypes to account for properties of the input data.

In some embodiments, the generating of classification parameters is performed using a hypernetwork using the embedded representations, and the assigning of each embedded representation is performed using a linear classifier using the classification parameters. This is a particular implementation well-suited to this application.
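A minimal sketch of this optional arrangement is given below: a small hypernetwork maps each embedded representation to the weights and bias of a linear classifier, which then produces class scores. The layer sizes, module names, and dimensions are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Generates classification parameters (weights and bias of a linear
    classifier) from an embedded representation."""
    def __init__(self, embed_dim, n_classes, hidden_dim=256):
        super().__init__()
        self.n_classes = n_classes
        self.embed_dim = embed_dim
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes * embed_dim + n_classes),
        )

    def forward(self, h):                                 # h: (N, embed_dim)
        params = self.net(h)                              # (N, n_classes*embed_dim + n_classes)
        w = params[:, : self.n_classes * self.embed_dim]
        w = w.view(-1, self.n_classes, self.embed_dim)    # per-instance weight matrix
        b = params[:, self.n_classes * self.embed_dim :]  # per-instance bias
        return w, b

def classify(h, hypernet):
    """Linear classifier whose parameters are produced by the hypernetwork."""
    w, b = hypernet(h)
    logits = torch.einsum('nd,ncd->nc', h, w) + b         # (N, n_classes)
    return logits

hypernet = HyperNetwork(embed_dim=128, n_classes=5)
logits = classify(torch.randn(32, 128), hypernet)         # class scores per instance
```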

In some embodiments, the measurement data units further comprise values of one or more attributes of the respective patients; the measurement categories are associated with values of the one or more attributes; and the measurement data units are assigned to the measurement category associated with values of the attributes that are the same as the values of the attributes of the respective measurement data units. Defining the measurement categories according to patient attributes allows the clinical prototypes to be representative of patients having those attributes. In some embodiments, the attributes comprise one or more of identity, sex, age, disease class, and ethnicity.

In some embodiments, one or more of the attributes are defined as class attributes; and the differences between the embedded representations and the clinical prototypes comprise a plurality of attractive differences between the embedded representations of the measurement data units and the clinical prototypes of each of the measurement categories associated with the same values of the class attributes as the values of the class attributes of the respective measurement data units. Optionally, the contrastive loss increases if the attractive differences increase. This further encourages clinical prototypes to be discriminative of the class attributes in particular.

In some embodiments, the contrastive loss depends on the plurality of attractive differences with weights that differ between the plurality of attractive differences depending on the number of values of attributes other than the class attributes that are the same between the measurement data units and the respective measurement categories. In some embodiments, the weights of the attractive differences are higher for measurement categories having a higher number of attributes other than the class attributes having values that are the same between the measurement data units and the respective measurement categories. The weights allow the clinical prototypes to be clustered within the class according to how many attributes they share.
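As an illustration of such attribute-based weighting, the sketch below counts, for each candidate measurement category, how many non-class attributes it shares with a measurement data unit and normalises the counts into weights. The attribute names, the normalisation scheme, and the example values are assumptions for illustration only.

```python
import torch

def attraction_weights(instance_attrs, category_attrs, class_keys):
    """Weight each attractive difference by how many non-class attributes the
    measurement data unit shares with the measurement category.

    instance_attrs: dict of attribute -> value for one measurement data unit
    category_attrs: list of dicts, one per measurement category sharing the
                    same class-attribute values as the instance
    class_keys:     attribute names treated as class attributes
    """
    counts = []
    for cat in category_attrs:
        shared = sum(
            1
            for key, value in instance_attrs.items()
            if key not in class_keys and cat.get(key) == value
        )
        counts.append(shared)
    counts = torch.tensor(counts, dtype=torch.float)
    # Higher weight for categories sharing more non-class attributes
    return counts / counts.sum().clamp(min=1.0)

weights = attraction_weights(
    {"disease": "afib", "sex": "M", "age_bin": "60-70"},
    [{"disease": "afib", "sex": "M", "age_bin": "60-70"},
     {"disease": "afib", "sex": "F", "age_bin": "60-70"},
     {"disease": "afib", "sex": "F", "age_bin": "20-30"}],
    class_keys={"disease"},
)
```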

In some embodiments, one or more of the attributes are defined as class attributes; and the differences between the embedded representations and the clinical prototypes comprise a plurality of repulsive differences between the embedded representations of the measurement data units and the clinical prototypes of each of the measurement categories associated with values of the class attributes different from the values of the class attributes of the respective measurement data units. Optionally, the contrastive loss decreases if the repulsive differences increase. This further encourages clinical prototypes to be discriminative of the class attributes in particular, and discourages clinical prototypes from different classes from overlapping.

In some embodiments, the contrastive loss depends on the plurality of repulsive differences with weights that differ between the plurality of repulsive differences depending on the number of values of attributes that are different between the measurement data units and the respective measurement categories. In some embodiments, the weights of the repulsive differences are higher for measurement categories having a higher number of attributes having values that are different between the measurement data units and the respective measurement categories. The weights allow the clinical prototypes to take account of the degree of difference from prototypes in other classes.

In some embodiments, the loss function further comprises a regression loss dependent on intraclass spacings between the clinical prototypes of each of the measurement categories associated with the same values of the class attributes, the regression loss increasing when the difference between the intraclass spacings and a set of desired spacings increases. This encourages clinical prototypes to adopt a particular arrangement, and discourages clinical prototypes in the same class but with different attributes from collapsing into a single point. In some embodiments, the set of desired spacings is determined based on differences between the values of attributes other than the class attributes with which the measurement categories associated with the same values of the class attributes are associated.

In some embodiments, the intraclass spacings are determined for the clinical prototypes of each pair of measurement categories associated with the same values of the class attributes. Pairwise spacings allow the loss function to encourage a particular arrangement based on all of the relationships between clinical prototypes. In some embodiments, the intraclass spacings are determined using pairwise Euclidean distances between the clinical prototypes of each of the measurement categories associated with the same values of the class attributes. Euclidean distances are a particularly suitable means of determining spacings.
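A minimal sketch of such a regression loss is shown below, using pairwise Euclidean distances between prototypes that share a class label and a mean-squared error against a matrix of desired spacings; the grouping by a single integer class label and the example values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def intraclass_spacing_loss(prototypes, class_idx, desired_spacing):
    """Regression loss encouraging the pairwise Euclidean distances between
    clinical prototypes sharing the same class-attribute values to match a
    set of desired spacings.

    prototypes:      (M, D) clinical prototypes
    class_idx:       (M,)   class-attribute label of each prototype
    desired_spacing: dict mapping class label -> (K, K) matrix of desired
                     pairwise distances for the K prototypes in that class
    """
    loss = 0.0
    for label, target in desired_spacing.items():
        members = prototypes[class_idx == label]       # prototypes within one class
        spacings = torch.cdist(members, members, p=2)  # pairwise Euclidean distances
        loss = loss + F.mse_loss(spacings, target)
    return loss

# Example usage with random stand-in prototypes
protos = torch.nn.Parameter(torch.randn(6, 128))
class_idx = torch.tensor([0, 0, 0, 1, 1, 1])
desired = {0: torch.full((3, 3), 2.0).fill_diagonal_(0),
           1: torch.full((3, 3), 2.0).fill_diagonal_(0)}
loss = intraclass_spacing_loss(protos, class_idx, desired)
```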

According to another aspect, there is provided a method of assigning a patient to one of a plurality of clinical classes comprising: receiving one or more measurement data units each comprising one or more measurements of a physiological parameter of the patient; generating an embedded representation for the measurement data unit using a machine-learning network trained using the method of training a machine-learning network; identifying one or more clinical prototypes most similar to the embedded representation from the clinical prototypes obtained from the machine-learning network; generating classification parameters for the one or more clinical prototypes most similar to the embedded representation using the machine-learning network; and classifying the patient based on the classification parameters. Using the clinical prototypes of the method of training a machine-learning network allows for more efficient classification of patients due to the representative nature of the prototypes for characteristics of the patient.

In some embodiments, identifying one or more clinical prototypes comprises identifying a plurality of the clinical prototypes most similar to the embedded representation and calculating an average of the plurality of clinical prototypes; and generating classification parameters for the one or more clinical prototypes comprises generating classification parameters for the average of the plurality of clinical prototypes. Basing the classification on an average of multiple clinical prototypes may allow the method to better classify patients who have some characteristics of multiple categories.
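A sketch of this inference procedure is given below. It assumes cosine similarity for identifying the most similar prototypes and reuses the illustrative HyperNetwork from the earlier sketch to generate classification parameters for the averaged prototype; the value of k and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def classify_via_prototypes(h, prototypes, hypernet, k=3):
    """Classify a patient by averaging the k clinical prototypes most similar to
    the embedded representation and generating classification parameters for
    that average (hypernet is assumed to be, e.g., the HyperNetwork sketched above).

    h:          (D,)   embedded representation of the patient's measurement data
    prototypes: (M, D) clinical prototypes obtained from training
    """
    sim = F.normalize(prototypes, dim=1) @ F.normalize(h, dim=0)   # (M,) similarity to each prototype
    nearest = sim.topk(k).indices                                  # indices of the k most similar
    proto_avg = prototypes[nearest].mean(dim=0, keepdim=True)      # (1, D) averaged prototype
    w, b = hypernet(proto_avg)                                     # classification parameters
    logits = torch.einsum('nd,ncd->nc', proto_avg, w) + b
    return logits.argmax(dim=1)                                    # predicted clinical class
```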

According to another aspect, there is provided a method of determining a similarity among a plurality of patients comprising: receiving measurement data units of each patient, each measurement data unit comprising one or more measurements of a physiological parameter of the patient; for each patient, generating an embedded representation of the measurement data units using a machine-learning network trained using the method of training a machine learning network, and identifying one or more clinical prototypes most similar to the embedded representation from the clinical prototypes obtained from the machine-learning network; and determining the similarity among the patients based on a similarity among the clinical prototypes identified for each patient. Networks trained using the present method are better able to determine similarities, because the clinical prototypes are better reflective of the essential characteristics of patients in the measurement categories.

According to another aspect, there is provided a method of training a machine-learning network to assign patients to one of a plurality of clinical classes, the method comprising training the machine-learning network using the clinical prototypes obtained using the method of training a machine-learning network. Because the clinical prototypes effectively describe the patients, they provide an effective way to summarise the essential characteristics of the dataset and can be used to train other networks.

According to another aspect, there is provided a method of selecting patients having desired values of one or more attributes from a plurality of patients, the method comprising: receiving measurement data units of the plurality of patients, each measurement data unit comprising one or more measurements of a physiological parameter of one of the patients; generating an embedded representation of the measurement data units of each patient using a machine-learning network trained using the method of training a machine-learning network; identifying a clinical prototype of a measurement category associated with the desired values of the one or more attributes from the clinical prototypes obtained from the machine-learning network; identifying one or more embedded representations most similar to the identified clinical prototype; and selecting the one or more patients corresponding to the identified embedded representations. Because the clinical prototypes effectively summarise the essential characteristics of the categories, they can be used to find representative examples of those categories.
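The retrieval described above can be sketched as follows, assuming cosine similarity and a simple mapping from attribute combinations to prototype indices; the function and parameter names are illustrative rather than taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def retrieve_patients(embeddings, prototypes, category_of, desired_attrs, top_k=10):
    """Retrieve the patients whose embedded representations are most similar to
    the clinical prototype of the measurement category with the desired attributes.

    embeddings:    (N, D) one embedded representation per patient
    prototypes:    (M, D) clinical prototypes
    category_of:   dict mapping an attribute tuple (e.g. (disease, sex, age_bin))
                   to the index of the corresponding prototype
    desired_attrs: attribute tuple describing the sub-cohort of interest
    """
    query = prototypes[category_of[desired_attrs]]               # prototype of the target category
    sim = F.normalize(embeddings, dim=1) @ F.normalize(query, dim=0)  # (N,) similarity per patient
    return sim.topk(top_k).indices                               # indices of the retrieved patients
```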

According to another aspect, there is provided a method of assigning a patient to one of a plurality of measurement categories, the method comprising: receiving a measurement data unit of the patient comprising one or more measurements of a physiological parameter of the patient; generating an embedded representation of the measurement data unit using a machine-learning network trained using the method of training a machine-learning network; calculating similarities between the embedded representation and the clinical prototypes obtained from the machine-learning network; assigning the patient to one of the measurement categories based on the similarities. Using the clinical prototypes of the method of training a machine-learning network allows for more efficient categorisation of patients due to the representative nature of the prototypes for characteristics of the patient. In some embodiments, assigning the patient to one of the measurement categories comprises assigning the patient to the measurement category having a clinical prototype most similar to the embedded representation.

In some embodiments of all of the above aspects, the measurement data units comprise a plurality of measurements of the physiological parameter as a function of time.

In some embodiments of all of the above aspects, the physiological parameter is an electrocardiogram measurement. These can effectively summarise a patient’s cardiac health.

In some embodiments of all of the above aspects, differences between embedded representations and clinical prototypes are calculated using a cosine similarity function. This is particularly suited to the types of calculations performed using the networks.

In some embodiments of all of the above aspects, the embedded representations are generated using a feature extractor.

In some embodiments of all of the above aspects, choosing values of the one or more network parameters comprises choosing values of the network parameters based on values of the network parameters from one or more previous iterations. In some embodiments of all of the above aspects, choosing values of the one or more network parameters further comprises choosing values of the network parameters based on a change in the value of the loss function between one or more previous iterations. This allows the method to more rapidly and effectively reduce the value of the loss function.
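The iterative choice of parameter values described above can be realised with a standard gradient-based optimiser, as in the minimal, self-contained sketch below. The feature extractor, the Adam optimiser, the fixed number of iterations (standing in for the predetermined stopping condition), and all sizes are illustrative choices rather than requirements of the method.

```python
import torch
import torch.nn.functional as F

# Illustrative feature extractor and learnable clinical prototypes
feature_extractor = torch.nn.Sequential(torch.nn.Linear(1000, 128), torch.nn.ReLU(),
                                        torch.nn.Linear(128, 128))
prototypes = torch.nn.Parameter(torch.randn(10, 128))   # one per measurement category
optimizer = torch.optim.Adam(list(feature_extractor.parameters()) + [prototypes], lr=1e-4)

for _ in range(100):
    batch = torch.randn(32, 1000)                        # stand-in for measurement data units
    category_idx = torch.randint(0, 10, (32,))           # assigned measurement categories
    embeddings = feature_extractor(batch)
    sim = F.normalize(embeddings, dim=1) @ F.normalize(prototypes, dim=1).T
    loss = F.cross_entropy(sim / 0.1, category_idx)      # contrastive loss (see earlier sketch)

    optimizer.zero_grad()
    loss.backward()                                      # the loss gradient determines how the
    optimizer.step()                                     # parameter values change between iterations
```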

In some embodiments of all of the above aspects, the values of the one or more network parameters are chosen randomly in a first iteration of the one or more iterations.

The methods of all of the above aspects may be provided using a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method. The methods of all of the above aspects may be provided using a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method.

An apparatus may be provided configured to carry out the method of any of the above aspects.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which corresponding reference symbols represent corresponding parts, and in which:

Fig. 1 is a flowchart of a method of training a machine-learning network;

Fig. 2 is a schematic of a machine-learning network during training;

Fig. 3 shows attractive and repulsive differences between a representation and clinical prototypes;

Fig. 4 shows attractive and repulsive differences between a representation and clinical prototypes with further values of a class attribute;

Fig. 5 shows projections of clinical prototypes learned with different contrastive losses;

Fig. 6 illustrates the calculation of a regression loss;

Fig. 7 is a schematic of a trained machine-learning network during use for inference;

Fig. 8 illustrates schematically the use of clinical prototypes for two different applications;

Fig. 9 shows a projection of embedded representations using t-SNE;

Fig. 10 shows a projection of clinical prototypes using t-SNE;

Fig. 11 is a graph showing the effectiveness of different choices of clinical prototypes for inference;

Fig. 12 shows results illustrating that clinical prototypes can be trained to be patient-specific by comparing similarities within a first dataset;

Fig. 13 shows further results illustrating that clinical prototypes can be trained to be patient-specific by comparing similarities within a second dataset;

Fig. 14 shows results illustrating that clinical prototypes can be trained to be patient-specific by comparing similarities within a third dataset;

Fig. 15 shows results of experiments on determining the similarity of patients using clinical prototypes within a first dataset;

Fig. 16 shows results of experiments on determining the similarity of patients using clinical prototypes within a second dataset;

Fig. 17 shows results of experiments on determining the similarity of patients using clinical prototypes between a first dataset and a second dataset;

Fig. 18 shows results of experiments on identifying dissimilar patients using clinical prototypes within a first dataset;

Fig. 19 shows results of experiments on identifying dissimilar patients using clinical prototypes within a second dataset;

Fig. 20 shows results of experiments on identifying dissimilar patients using clinical prototypes between a first dataset and a second dataset;

Fig. 21 shows results of training a machine-learning network using clinical prototypes;

Figs. 22(a) to 22(c) illustrate the effect of the embedding dimension on the accuracy of training a machine-learning network using clinical prototypes;

Fig. 23 shows results of training a machine-learning network using clinical prototypes determined for a different dataset from Figs. 21 and 22(a)-22(c);

Fig. 24 shows results of retrieving patient instances from a dataset using clinical prototypes;

Figs. 25(a) and 25(b) illustrate the effect on the spacing of clinical prototypes of incorporating a regression loss when training the machine-learning network to obtain clinical prototypes;

Fig. 26 shows UMAP projections of embedded representations using clinical prototypes, and traditional prototypes;

Fig. 27 shows qualitative results of retrieving patient instances; and

Fig. 28 illustrates a matrix and hierarchical agglomerative clustering of clinical prototypes to identify clusters of clinical prototypes.

Given a large, unlabelled clinical database, two tasks are particularly important: (1) how to reliably search for and retrieve relevant instances; and (2) how to extract attribute information from such unlabelled instances. To address the former, the task of information retrieval holds promise. To address the second question, the task of clustering holds value. Previous methodologies for information retrieval and patient representation have drawbacks. These drawbacks arise from the failure to incorporate patient-specific or attribute-specific structure during model training or inference.

Clinical representation learning attempts to represent clinical data in a way that is meaningful for solving tasks. Learning meaningful representations of clinical data is an ongoing research endeavour. Recent research has focused on learning representations from electronic health records (EHRs) (Gee et al. 2019, Liu et al. 2019, Li et al. 2020b, Biswal et al. 2020, Darabi et al. 2020) and via auto-encoders, which are then clustered using existing methods, such as k-means (Huang et al. 2019, Landi et al. 2020). As for time-series data, auto-encoders are learned with (Ma et al. 2019) or without (Madiraju et al. 2018) an auxiliary clustering objective, salient features (shapelets) are identified in an unsupervised manner (Grabocka et al. 2014, Zhang et al. 2018), and patient-specific representations are learned via contrastive learning (Kiyasseh et al. 2020). Li et al. (2020a) learn prototypes, or representative embeddings, via the ProtoNCE loss and cluster instances using k-means. Gee et al. (2019) disclose the learning of prototypes for the clustering of time-series signals. Their prototypes, however, cannot cluster instances based on multiple patient attributes and do not extend to the retrieval setting.

In addressing the second question, a centroid groups together instances that share some similarities. Most research attempts to learn representations of electronic health records (EHRs) (Miotto et al., 2016; Huang et al., 2019; Liu et al., 2019; Li et al., 2020b; Biswal et al., 2020; Darabi et al., 2020; Landi et al., 2020) and medical text (Si & Roberts, 2019) in a generative manner. Recent research has focused on exploiting existing clustering algorithms, such as k-means, to group similar patients from electronic health record (EHR) data. For example, both Landi et al. (2020) and Huang et al. (2019) implement an autoencoder to learn patient representations. These representations are then clustered either hierarchically or via k-means.

Other methods involve learning prototypes, which do not represent specific instances of input data, but are generalised to represent the essential characteristics of particular categories of data. For example, Li et al. (2020a) propose to do so via the ProtoNCE loss. Moreover, Van Looveren & Klaise (2019) learn to perturb prototypes to derive interpretable counterfactual instances. Another example is Garnot & Landrieu (2020), where the distance between class prototypes, which are learned in an end-to-end manner, is regularized based on a pre-defined tree hierarchy. Such methods, however, are exclusively unsupervised; they do not exploit patient attribute information. In contrast, the present methods focus on learning attribute-specific prototypes via contrastive learning and optionally maintaining their semantics via regression-based regularization, as will be discussed further below.

To overcome the drawbacks of previous approaches, the present disclosure concerns methods of learning clinical prototypes (CPs) that capture attribute-specific semantic relationships and significantly improve clustering performance. This approach exploits available patient-attribute data and does not necessitate the K-means algorithm.

In addition, previous work performs either clustering or retrieval, and not both. The present method addresses both questions while exploiting large-scale electrocardiogram (ECG) databases comprising patient attribute information.

Fig. 1 shows a method of training a machine-learning network 1, such as that shown schematically in Fig. 2. The method is used to train the machine-learning network 1 to obtain clinical prototypes 12 for a plurality of measurement categories. The measurement categories may be associated with values of one or more attributes of patients whose data is used to train the machine-learning network 1. The attributes may comprise one or more of identity, sex, age, disease class, and ethnicity. Other attributes may be used depending on what is available in the datasets that are used. Attributes may be discrete variables, such as sex, or continuous variables, such as age. Continuous variables may also be binned to convert them into discrete variables, for example by binning age into two or more groups.

The method of training can produce a machine-learning network 1 capable of learning patient representations that can be exploited for various purposes. As will be discussed further below, specific examples of applications include performing personalized arrhythmia diagnosis based on a standard 12-lead ECG, large-scale retrieval and clustering of physiological signals, or provision of a deep-learning based IR system that retrieves patients from a particular sub-cohort of interest.

The method allows the network 1 to learn, in an end-to-end manner, a fixed set of attribute-specific clinical prototypes 12. The clinical prototypes 12 are embeddings and can be thought of as summarizing a particular combination of attributes, for example m = (a_class, a_sex, a_age) ∈ M, where M is the set of all possible combinations of values of disease class, sex, and age. These clinical prototypes 12 are learned via contrastive learning using labelled patient data, whereby instances are encouraged to be similar to their corresponding clinical prototype 12 and dissimilar to the others. Each of these clinical prototypes can be thought of as a descriptor of a particular (for example, a unique) combination of patient attributes. Such attribute-specific clinical prototypes create “islands” of similar representations (Hinton 2021), allowing for both the clustering and retrieval of cardiac signals based on multiple patient attributes.

The method comprises receiving measurement data units 10 of a plurality of patients. Each measurement data unit 10 comprises one or more measurements of a physiological parameter of one of the patients and is assigned to one of the plurality of measurement categories. The measurement data units 10 may comprise a plurality of measurements of the physiological parameter as a function of time. In some embodiments, for example in the specific examples for which results are given below, the physiological parameter is an electrocardiogram measurement. In such embodiments, each measurement data unit 10 comprises ECG measurements of a patient, which comprise a time series of voltage measurements from 12 leads applied to the patient’s body. In other embodiments, other physiological parameters may be used, for example coronary angiograms, cardiac MRI, cardiac CT, EEG measurements, levels of various chemicals in a patient’s body and so on. Measurement data units 10 may also be referred to as instances.

The measurement data units 10 further comprise values of one or more attributes of the respective patients. This reflects the labelled nature of the data mentioned above. For example, the measurement data units 10 may comprise the sex, age, identity (e.g. an identification number or code), and/or ethnicity of the patient from which the measurements of the physiological parameter in that data unit are taken. Each measurement data unit 10 comprises measurements of the physiological parameter from a single patient.

The measurement data units 10 are assigned to the measurement category associated with values of the attributes that are the same as the values of the attributes of the respective measurement data units 10. As mentioned above, the measurement categories may be associated with values of one or more attributes, i.e. with a specific combination m of the values of attributes. For example, a measurement category may be associated with males under 25, and measurement data units 10 comprising measurements taken from patients who are male and under 25 would be associated with that measurement category.
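As a concrete illustration of the assignment just described, the short Python sketch below forms a measurement-category key from a combination of disease class, sex, and binned age. The specific attribute names, age bins, and example values are illustrative assumptions rather than values taken from the disclosure.

# Illustrative sketch: forming a measurement category from patient attributes,
# with a continuous attribute (age) binned into discrete groups.

def age_group(age: float) -> str:
    """Bin a continuous age value into a discrete group (bin edges are assumptions)."""
    if age < 25:
        return "<25"
    elif age < 50:
        return "25-49"
    else:
        return ">=50"

def measurement_category(disease_class: str, sex: str, age: float) -> tuple:
    """Return the attribute combination m = (a_class, a_sex, a_age) used as a category key."""
    return (disease_class, sex, age_group(age))

# Example: a measurement data unit from a 23-year-old male with atrial fibrillation
m = measurement_category("Atrial Fibrillation", "Male", 23)
print(m)  # ('Atrial Fibrillation', 'Male', '<25')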

Each measurement data unit 10 may be further assigned to one of a plurality of clinical classes. These may be, for example, classes of disease for which patients have an existing diagnosis. The clinical classes may correspond to particular values of one of the attributes, or combinations of values of two or more attributes.

The method of training comprises carrying out one or more iterations. The iterations are repeated until the value of a loss function is reduced to meet a predetermined condition. The loss function will be discussed further below. For example, in Fig. 1, the iterations are repeated until the value of the loss function is below a predetermined threshold. Alternatively, the predetermined condition may be on a change in the value of the loss function between iterations. The iterations may be repeated until the change in the value of the loss function from a previous iteration is below a predetermined threshold, i.e. the value of the loss function is minimised.

Each iteration comprises a step S10 of choosing values of one or more network parameters of the machine-learning network 1. The values of the one or more network parameters may be chosen randomly in a first iteration of the one or more iterations. Alternatively, predetermined initial values may be used in the first iteration. In subsequent iterations, the values of the network parameters will generally be chosen to reduce the value of the loss function relative to the value of the loss function in preceding iterations.

Any suitable method for choosing the values of the network parameters may be used, as is known in machine learning. In some examples, choosing values of the one or more network parameters may comprise choosing values of the network parameters based on values of the network parameters from one or more previous iterations. Alternatively or additionally, choosing values of the one or more network parameters may comprise choosing values of the network parameters based on a change in the value of the loss function between one or more previous iterations.
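The following sketch illustrates the iterative structure described above: network parameters are updated in each iteration (here with the Adam optimizer, as used in the experiments described later) and the iterations stop once the loss meets a predetermined condition, either an absolute threshold or a sufficiently small change between iterations. The model, data loader, loss function, and hyperparameter values are placeholders, not a verbatim transcription of the method.

# Minimal training-loop sketch: iterate until the loss meets a predetermined condition.
import torch

def train(model, data_loader, compute_loss, lr=1e-4,
          loss_threshold=None, min_delta=1e-4, max_iters=10_000):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam, as used in the experiments
    prev_loss = float("inf")
    for it in range(max_iters):
        total = 0.0
        for batch in data_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)   # e.g. contrastive (+ supervised/regression) loss
            loss.backward()
            optimizer.step()                    # choose new parameter values for the next iteration
            total += loss.item()
        total /= len(data_loader)
        # predetermined stopping conditions
        if loss_threshold is not None and total < loss_threshold:
            break
        if abs(prev_loss - total) < min_delta:
            break
        prev_loss = total
    return model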

Each iteration further comprises a step S12 of generating an embedded representation 14 for each measurement data unit using the machine-learning network 1, according to the values of one or more of the network parameters. The embedded representations 14 may be generated using a feature extractor 3, as shown in Fig. 2. The feature extractor 3 applies a function f_θ : x ∈ ℝ^D → h ∈ ℝ^E, which is parameterized by θ and maps D-dimensional inputs, x, to E-dimensional embedded representations, h.

In Fig. 1, each iteration of the method further comprises a step S14 of generating classification parameters 16 for each embedded representation 14 according to the network parameters. However, this is not essential, and some embodiments of the method may not include a step S14 of generating classification parameters. In Fig. 2, the generating of classification parameters is performed using a hypernetwork parameter generator 7 using the embedded representations 14, although in general this may be done with any suitable method.

Embedded representations are fed into a hypernetwork, g_φ, to generate parameters for a linear classification layer, p_ω, that outputs a prediction of clinical class, ŷ. The parameter generator 7 applies a function g_φ : h ∈ ℝ^E → ω ∈ ℝ^(E×C), which is parameterized by φ and maps E-dimensional embedded representations, h, to a matrix of classification parameters, ω, where C is the number of clinical classes. During training, the parameter generator 7 is provided with embedded representations, h_i, as inputs, thus generating representation-specific parameters ω_i.
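A minimal sketch of such a parameter generator is given below, assuming a two-layer hypernetwork and illustrative dimensions; the hidden-layer size and the exact architecture of g_φ are assumptions, as they are not specified in the passage above. The hypernetwork maps an E-dimensional embedded representation to an E×C matrix of classification parameters, which then parameterizes a linear classification layer.

# Sketch of a hypernetwork g_phi producing representation-specific classifier parameters.
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    def __init__(self, embed_dim: int, num_classes: int, hidden_dim: int = 128):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim * num_classes),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (B, E) -> omega: (B, E, C)
        return self.net(h).view(-1, self.embed_dim, self.num_classes)

def classify(h: torch.Tensor, omega: torch.Tensor) -> torch.Tensor:
    # Linear classification layer parameterized by omega: (B, E) x (B, E, C) -> (B, C)
    return torch.einsum("be,bec->bc", h, omega)

# Example: during training, the hypernetwork input is the instance representation itself
h = torch.randn(4, 64)                  # batch of 4 embedded representations, E = 64
g = HyperNetwork(embed_dim=64, num_classes=9)
logits = classify(h, g(h))              # representation-specific class predictions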

The use of the hypernetwork parameter generator 7 is an example of meta-learning, which is focused on designing learning paradigms that allow for the fast adaptation of networks. Examples of meta-learning approaches include Prototypical Networks (Snell et al., 2017), which average representations to obtain class-specific prototypes. During inference, the similarity of representations to these prototypes determines the classification. Another example is Relational Networks (Sung et al., 2018), which build on this idea by learning the similarity of representations to prototypes through a parametric function. Gidaris & Komodakis (2018) and Qiao et al. (2018) exploit hypernetworks (Ha et al., 2016) and propose to generate the parameters of the final linear layer of a network for few-shot learning on visual tasks.

Each iteration further comprises a step S18 of assigning each embedded representation 14 to one of the clinical classes using the machine-learning network 1, the assignment based on the classification parameters 16. The assigning of each embedded representation 14 may be performed using a linear classifier 5 using the classification parameters. An example is shown in Fig. 2, where the classification parameters parameterize the linear classifier p_ω : h ∈ ℝ^E → ŷ ∈ ℝ^C, which maps the embedded representations, h, to posterior clinical classes, ŷ. Other types of classifier may also be used to perform step S18.

Each iteration further comprises a step S20 of, for each measurement category, calculating a clinical prototype 12 according to the values of one or more of the network parameters. The clinical prototypes 12 are analogous to the embedded representations 14 in their dimensionality and structure, but are defined by values of the network parameters, and do not correspond directly to any single embedded representation 14 of any of the measurement data units 10.

The steps S12, S14, S18, and S20 may be carried out in any order, or one or more of the steps S12, S14, S18, and S20 may be performed simultaneously, as each relies on the current values of the network parameters in each iteration, which are not updated until the start of the next iteration. However, step S18 must follow step S14.

The method achieves learning via a contrastive loss, which may be combined with other types of loss in some embodiments, such as a supervised loss in embodiments such as shown in Fig. 2.

Each iteration further comprises a step S22 of calculating a value of a loss function. The loss function comprises a contrastive loss dependent on differences between the embedded representations 14 of the measurement data units 10 and the clinical prototypes 12 of the measurement categories. The differences may be calculated using any suitable function, such as a similarity function s(h_i, v_k). In an embodiment, the differences between embedded representations 14 and clinical prototypes 12 are calculated using a cosine similarity function with a temperature parameter, τ, but in general any suitable similarity function may be used. This encourages embedded representations to be similar to their corresponding clinical prototype, v_k.

Contrastive learning is a self-supervised method that encourages representations of measurement data units with commonalities to be similar to one another. This is performed for each measurement data unit and its perturbed counterpart (Chen et al., 2020) and for different visual modalities (views) of the same measurement data unit (Tian et al., 2019). To avoid the large number of comparisons between representations that would otherwise need to be performed, Caron et al. (2020) propose instead to learn cluster prototypes. Cheng et al. (2020) and CLOCS (Kiyasseh et al., 2020) both show the benefit of encouraging patient specific representations to be similar to one another. In contrast to these previous approaches, the present method encourages embedded representations 14 to be similar to clinical prototypes 12 that are learned in an end-to-end manner. Thereby, each clinical prototype 12 captures essential characteristics that describe the state of the patient, while being invariant to nuisance differences present in individual embedded representations 14.

The differences between the embedded representations 14 and the clinical prototypes 12 comprise attractive differences between the embedded representations 14 of the measurement data units 10 and the clinical prototypes 12 of the measurement categories to which the respective measurement data units 10 are assigned. Each attractive difference is calculated between an embedded representation 14 of a measurement data unit 10 and the clinical prototype 12 of the measurement category to which that measurement data unit 10 is assigned. An attractive difference may be calculated for the embedded representation 14 of every measurement data unit 10 assigned to the measurement category.

The differences between the embedded representations 14 and the clinical prototypes 12 further comprise repulsive differences between the embedded representations 14 of the measurement data units 10 and the clinical prototypes 12 of measurement categories to which the respective measurement data units 10 are not assigned. Each repulsive difference is calculated between an embedded representation 14 of a measurement data unit 10 and a clinical prototype 12 of one of the measurement categories to which that measurement data unit 10 is not assigned. A repulsive difference may be calculated between an embedded representation 14 of a measurement data unit 10 and each one of the clinical prototypes of measurement categories to which the measurement data unit 10 is not assigned. In general, the contrastive loss increases if the attractive differences increase, and the contrastive loss decreases if the repulsive differences increase.

The contrastive loss encourages each embedded representation, h_i = f_θ(x_i), in a mini-batch of size B, to be similar to its corresponding clinical prototype 12, v_k, and dissimilar to the remaining clinical prototypes 12. Measurement data units 10 within a dataset can differ in their degree of similarity to one another. Ideally, when attempting to capture the semantic relationship between measurement data units 10, those that are more similar to one another should be closer to one another. Contrastive learning encourages exactly this behaviour between pairs of embedded representations 14 that share some context, in particular in being assigned to the same measurement category.

A particular form of the contrastive loss is as follows:

L_contrastive = −(1/B) Σ_{i=1}^{B} log [ exp(s(h_i, v_k)/τ) / Σ_{j=1}^{P} exp(s(h_i, v_j)/τ) ]     (1)

where the terms have the meanings discussed above, and P is the number of clinical prototypes (CPs). The form of contrastive loss in Eq. (1) may be referred to as hard assignment. This is because representations 14 are attracted to CPs 12 of measurement categories assigned to the same attribute combination as the measurement data unit. More specifically, each representation, v, of an instance (measurement data unit), x, associated with a particular attribute combination, m, is encouraged to be similar to the single clinical prototype, p_m, that perfectly matches the attribute combination and dissimilar to the remaining clinical prototypes, p_i, where i ≠ m. Mapping each representation 14 to a single CP 12 is referred to as a hard assignment.
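A minimal sketch of this hard-assignment contrastive loss is given below, using cosine similarity with a temperature τ as described above. The batch construction and the cross-entropy-over-prototypes formulation are one standard way of implementing the attraction/repulsion behaviour described, not a verbatim transcription of Eq. (1); the tensor shapes in the example are assumptions.

# Sketch of the hard-assignment contrastive loss: each representation is attracted to
# the prototype of its own measurement category and repelled from the remaining ones.
import torch
import torch.nn.functional as F

def hard_contrastive_loss(h, prototypes, category_idx, tau=0.1):
    """
    h:            (B, E) embedded representations
    prototypes:   (P, E) clinical prototypes, one per measurement category
    category_idx: (B,)   index of each instance's measurement category
    """
    h = F.normalize(h, dim=-1)
    v = F.normalize(prototypes, dim=-1)
    sim = h @ v.t() / tau                      # (B, P) temperature-scaled cosine similarities
    # cross-entropy over prototypes == -log softmax of the matching prototype
    return F.cross_entropy(sim, category_idx)

# Example
h = torch.randn(8, 64)
prototypes = torch.randn(36, 64, requires_grad=True)   # e.g. 36 attribute combinations
cats = torch.randint(0, 36, (8,))
loss = hard_contrastive_loss(h, prototypes, cats)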

The hard assignment approach assumes a bijective relationship between v and p_m. This suggests that clinical prototypes without a perfect attribute match to a particular representation (a near miss) are unable to leverage potentially useful information from that representation.

An alternative approach is soft assignment, whereby each representation, v_i, is mapped to a set of class-specific clinical prototypes, L, where |L| < |M|. A class-specific mapping is chosen to avoid erroneously mapping representations 14 to clinical prototypes 12 from a different class (classes may also be referred to as clusters). Therefore, one or more of the attributes are defined as class attributes. The classes are defined by different values (or combinations of values) of the one or more class attributes. The class of an embedded representation 14 is defined by the values of the class attributes of the measurement data unit 10 from which it is derived. The classes may be defined by the values of any combination of the patient attributes, but in the examples below are defined by the values of the single attribute of disease class. Embedded representations 14 are attracted to CPs 12 in the same class, and may also be repelled from CPs 12 in other classes.

When using this soft assignment, the differences between the embedded representations 14 and the clinical prototypes 12 comprise a plurality of attractive differences between the embedded representations 14 of the measurement data units 10 and the clinical prototypes 12 of each of the measurement categories associated with the same values of the class attributes as the values of the class attributes of the respective measurement data units 10. The contrastive loss increases if the attractive differences increase.

The differences between the embedded representations and the clinical prototypes additionally comprise a plurality of repulsive differences between the embedded representations of the measurement data units and the clinical prototypes of each of the measurement categories associated with values of the class attributes different from the values of the class attributes of the respective measurement data units. The contrastive loss decreases if the repulsive differences increase.

However, the clinical prototypes 12 in the set L may not be equally similar to each representation 14, given the discrepancies in their attribute combinations. To account for this, the difference between the representation, v_i, and each clinical prototype, p_k, can be weighted by a weight w_ik. Such a weight is determined by the discrepancy between the respective attribute combinations. Differences between embedded representations 14 and clinical prototypes 12 with more similar attribute combinations are given a relatively larger weight. This is visualised in Fig. 3, where the attractive and repulsive differences are referred to as attractive and repulsive forces. The clinical prototypes are coloured according to sex and shaded in ascending order of age, demonstrating possible other attributes. The thickness of the solid lines indicates the weights applied to the attractive forces. A further illustration is shown in Fig. 4, additionally including the feature extractor step and including a larger number of different values of the class attribute. As shown in Fig. 4, the representation, v_i, of an instance, x_i, associated with a set of attributes, A_i, is strongly attracted to the clinical prototype which represents the same attributes, weakly attracted to others within the same disease class (colour), and repelled from those representing different classes. These attractions result in the shown similarity probability mass function.

Thereby, the contrastive loss depends on the plurality of attractive differences with weights that differ between the plurality of attractive differences depending on the number of values of attributes other than the class attributes that are the same between the measurement data units 10 and the respective measurement categories. The weights of the attractive differences are higher for measurement categories having a higher number of attributes other than the class attributes having values that are the same between the measurement data units 10 and the respective measurement categories.

In Fig. 3, all repulsive forces are treated equally whereas attractive forces are weighted according to the degree of attribute matching between the instance and the respective clinical prototype. However, in general, repulsive differences may also be weighted. In such a case, the contrastive loss depends on the plurality of repulsive differences with weights that differ between the plurality of repulsive differences depending on the number of values of attributes that are different between the measurement data units 10 and the respective measurement categories. The weights of the repulsive differences are higher for measurement categories having a higher number of attributes having values that are different between the measurement data units 10 and the respective measurement categories.

Mathematically, Eq. 1 is extended by weighting each attractive term by w_ik, where τ_w is a temperature parameter that determines how soft the assignment is. For example, as τ_w → ∞, the loss term approaches the hard-assignment formulation. Also, δ represents the Kronecker delta function, which evaluates to one if its argument is true and zero otherwise.
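Since the exact weighting formula of the extended loss is not reproduced above, the sketch below gives one plausible instantiation consistent with the description: attraction is restricted to prototypes of the same class, weights decay with a Hamming-style count of mismatched non-class attributes, and a larger τ_w concentrates the weight on the exact attribute match so that the hard assignment is recovered in the limit. The particular weighting function, and the integer coding of attributes, are assumptions.

# Sketch of a soft-assignment contrastive loss with attribute-dependent weights.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(h, prototypes, proto_attrs, proto_class,
                          inst_attrs, inst_class, tau=0.1, tau_w=1.0):
    """
    h:           (B, E) representations;                      prototypes:  (P, E)
    proto_attrs: (P, A) integer-coded non-class attributes;   proto_class: (P,)
    inst_attrs:  (B, A) integer-coded non-class attributes;   inst_class:  (B,)
    """
    h = F.normalize(h, dim=-1)
    v = F.normalize(prototypes, dim=-1)
    sim = h @ v.t() / tau                                  # (B, P)
    log_prob = F.log_softmax(sim, dim=-1)                  # repulsion from other prototypes enters via the denominator

    same_class = (inst_class[:, None] == proto_class[None, :]).float()           # (B, P)
    # attribute discrepancy: count of mismatched non-class attributes (Hamming-style)
    mismatch = (inst_attrs[:, None, :] != proto_attrs[None, :, :]).sum(-1).float()  # (B, P)
    # larger tau_w concentrates the weight on the exact attribute match (hard assignment in the limit)
    w = torch.exp(-mismatch * tau_w) * same_class
    w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-12)   # normalise weights within the class

    return -(w * log_prob).sum(dim=-1).mean()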

Fig. 5 visualizes the UMAP projection (McInnes et al. 2018) of clinical prototypes learned with a hard assignment (Fig. 5, left) and those learned with a soft assignment (Fig. 5, centre). With hard assignment, representations (instances) are attracted to class-specific prototypes uniformly. Those learned with a soft assignment become more linearly separable across sex.

Clinical prototypes 12 learned in an end-to-end manner as discussed, using hard or soft assignment, will exhibit a high and desirable degree of inter-class separability. However, CPs 12 within a class (or cluster) are still at risk of collapsing to a single point. This would decrease their utility for attribute-based querying. To encourage intra-cluster separability of CPs 12, a regression-based regularization term can be used that encourages CPs 12 to adopt the formation shown in Fig. 3. This allows clinical prototypes to confer a high degree of interpretability, by also capturing the semantic relationships between attributes. Concretely, clinical prototypes representing similar attribute sets (e.g., adjacent age groups) should be similar to one another. This is analogous to the high similarity of word embeddings representing semantically similar words (Smeaton 1999) in natural language processing (NLP). In NLP, a learnable word embedding represents a unique word. In the present method, each clinical prototype represents a unique combination of discrete patient attributes. To capture these semantic relationships, clinical prototypes 12 assigned to the same class are encouraged to maintain some desired distance between one another.

In such embodiments, the loss function further comprises a regression loss dependent on intraclass spacings between the clinical prototypes 12 of each of the measurement categories associated with the same values of the class attributes, the regression loss increasing when the difference between the intraclass spacings and a set of desired spacings increases. Fig. 5 (right) visualizes the UMAP projection (McInnes et al. 2018) of clinical prototypes learned with a soft assignment including a regression loss. This leads to clinical prototypes that adopt a semantically meaningful arrangement. The set of desired spacings may be determined based on differences between the values of attributes other than the class attributes with which the measurement categories associated with the same values of the class attributes are associated. The intraclass spacings are determined for the clinical prototypes 12 of each pair of measurement categories associated with the same values of the class attributes. The intraclass spacings may be determined using pairwise Euclidean distances between the clinical prototypes 12 of each of the measurement categories associated with the same values of the class attributes.

In a particular embodiment of this approach, the CPs 12 in the set M are sorted according to their attributes and normalized using the L2 norm. Their pairwise Euclidean distances are then calculated, generating the matrix D^(M×M). Fig. 6 shows such a matrix, illustrating the Euclidean distance between each pair of clinical prototypes 12. These pairwise distances represent the intraclass spacings between the clinical prototypes 12.

As the regression loss is only calculated using pairwise distances for prototypes 12 within the same cluster, C class-specific matrices D̂_c are produced (where C is the number of class-specific clusters, which will correspond to the number of possible combinations of values of the class attributes), populated with ground-truth distances. The ground-truth distances reflect a semantic relationship between the attributes of the measurement categories of the CPs, and are generated as a reference for a desired set of spacings that the clinical prototypes within a class (or cluster) should adopt. The ground-truth distances are formulated as d̂ = α × S, where S is an integer distance between attributes and α is a parameter that can be varied depending on the values of the distances. S is incremented by 1 for every increment in attribute value away from the current prototype’s corresponding attribute. For example, where disease class is used as the class attribute, and sex and age are the remaining non-class attributes, the two CPs m_1 = {Atrial Fibrillation, Male, < 25} and m_2 = {Atrial Fibrillation, Female, < 25} result in S = 1 and d̂ = α × 1. In some embodiments, S is the Hamming distance between a pair of attribute sets. The regression loss in this example is calculated as the mean-squared error between the class-specific submatrix of intraclass spacings, D_c, and the corresponding ground-truth matrix of desired spacings, D̂_c.

The mean-squared error between the observed intra-cluster distances, D_c, and the ground-truth intra-cluster distances, D̂_c, is then minimized as part of the loss function over the course of the multiple iterations of the method.
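A minimal sketch of this regression-based regularization is given below: within each class, the prototypes are L2-normalized, their pairwise Euclidean distances are computed, and the mean-squared error against ground-truth distances of the form α × S is accumulated. Measuring S as the count of mismatched non-class attributes is one of the options mentioned above; the tensor layout and the averaging over clusters are assumptions.

# Sketch of the regression loss that keeps within-class prototypes at desired spacings.
import torch
import torch.nn.functional as F

def regression_loss(prototypes, proto_attrs, proto_class, alpha=0.2):
    """
    prototypes:  (P, E);  proto_attrs: (P, A) integer-coded non-class attributes
    proto_class: (P,)     class label of each prototype
    """
    loss = 0.0
    classes = proto_class.unique()
    for c in classes:
        idx = (proto_class == c).nonzero(as_tuple=True)[0]
        v = F.normalize(prototypes[idx], dim=-1)
        d_obs = torch.cdist(v, v)                                             # observed intra-class spacings D_c
        s = (proto_attrs[idx][:, None, :] != proto_attrs[idx][None, :, :]).sum(-1).float()
        d_gt = alpha * s                                                      # ground-truth spacings alpha * S
        loss = loss + F.mse_loss(d_obs, d_gt)
    return loss / len(classes)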

When, as in Figs. 1 and 2, the method comprises a step S18 of assigning the representations to clinical classes, this can be used as a further source of information for the loss function used to train the machine-learning network. In such embodiments, the loss function further comprises a supervised loss. The supervised loss is dependent on a difference between the clinical class to which each measurement data unit 10 is assigned and the clinical class to which the embedded representation 14 of the measurement data unit 10 is assigned. By minimising this supervised loss, the classifications generated by the machine-learning network are encouraged to be the same as those with which the original training data is labelled. This should allow the machine-learning network to learn characteristics of the input data in the measurement data units that reflect the clinical classes to which they should be assigned. In a particular example, the supervised loss is computed from the output of the classifier 5, with the classification parameters generated from the instance representation, h_i, during training, and from the most similar clinical prototype, v_k = argmax_k s(h_i, v_k), during inference.

As will be discussed further below, the clinical prototype nearest the representation of the input data is used during inference, when the trained machine learning network is applied to new data not part of the training set.

Experimental Results

Two embodiments are discussed in detail below to validate the efficacy of the method of training described above. These embodiments are illustrated schematically in Fig. 8. Clinical prototypes are exploited for attribute-specific clustering and retrieval of cardiac signals. For clustering, clinical prototypes act as centroids of clusters to which unlabelled instances are assigned. Such an assignment is associated with a set of attributes, such as disease class, sex, and age. For retrieval, each clinical prototype is used as a query, associated with a set of attributes, to search through an unlabelled database and retrieve instances to which it is most similar. These results show that the present method outperforms the state-of-the-art method, DTC, in the clustering setting and retrieves relevant cardiac signals from a large database. At the same time, clinical prototypes adopt a semantically meaningful arrangement and thus confer a high degree of interpretability.

Patient Cardiac Prototypes (PCPs)

In the first example, the measurement categories are defined by patient identity, so that one measurement category exists for each individual in the training dataset. Multiple measurement data units 10 may be present for each patient in the dataset. In this example, the physiological parameter is 12-lead ECG data. Because CPs are generated for each patient based on cardiac information, the CPs in this section are referred to as patient cardiac prototypes (PCPs). PCPs are efficient descriptors of the cardiac state of a patient. In the embodiment generating PCPs, the machine-learning network 1 comprises a parameter generator 7 and classifier 5, and the method of training used is as shown in Fig. 1. PCPs 1) are patient-specific, 2) allow for the discovery of similar patients both within and across datasets, and 3) can be exploited for dataset distillation as a compact substitute for the original raw dataset, as will be discussed further below. In particular, knowledge of patient similarity has a twofold effect. First, the ability to compare patients allows clinicians to validate network-generated diagnoses. Second, it allows them to explore previously unidentified patient relationships, thus paving the way for novel clinical hypotheses.

ECG datasets are used that contain a significant number of patients. PhysioNet 2020 ECG consists of 12-Lead ECG recordings from 6,877 patients alongside labels corresponding to 9 different classes of cardiac arrhythmia. Each recording can be associated with multiple labels. Chapman ECG (Zheng et al., 2020) consists of 12-Lead ECG recordings from 10,646 patients alongside labels corresponding to 11 different classes of cardiac arrhythmia. As is suggested by Zheng et al. (2020), these labels are grouped into 4 major classes. PTB-XL ECG (Wagner et al., 2020) consists of 12-Lead ECG recordings from 18,885 patients alongside 71 different types of annotations provided by two cardiologists. The training and evaluation protocol presented by Strodthoff et al. (2020) is followed, leveraging the 5 diagnostic class labels. The original setup is altered by only considering ECG segments with one label assigned to them and converting the task into a binary classification problem. Each dataset contains patient sex and age information. Unless otherwise mentioned, datasets were split into training, validation, and test sets according to patient ID using a 60/20/20 configuration. In other words, patients appeared in only one of the sets. Further details about the dataset splits can be found in Table 1, which shows the number of patients used during training. These represent sample sizes for all 12 leads.

Table 1

For all of the datasets, frames consisted of 2500 samples and consecutive frames had no overlap with one another. Data splits were always performed at the patient level. For PhysioNet 2020 (Alday et al., 2020), each ECG recording varied in duration from 6 seconds to 60 seconds with a sampling rate of 500 Hz. Each ECG frame consisted of 2500 samples (5 seconds), as this is common for in-hospital recordings. Multiple labels are assigned to each ECG recording as provided by the original authors. These labels are: AF, I-AVB, LBBB, Normal, PAC, PVC, RBBB, STD, and STE. The ECG frames were normalized in amplitude between the values of 0 and 1. For Chapman (Zheng et al., 2020), each ECG recording was originally 10 seconds with a sampling rate of 500 Hz. The recording was downsampled to 250 Hz and therefore each ECG frame consisted of 2500 samples. The labelling setup suggested by Zheng et al. (2020) was followed, which resulted in four classes: Atrial Fibrillation, GSVT, Sudden Bradycardia, and Sinus Rhythm. The ECG frames were normalized in amplitude between the values of 0 and 1. For PTB-XL (Wagner et al., 2020), each ECG recording was originally 10 seconds with a sampling rate of 500 Hz. 5-second non-overlapping segments of each recording were extracted, generating frames of length 2500 samples. The diagnostic class labelling setup suggested by Wagner et al. (2020) was followed, which resulted in five classes: Conduction Disturbance (CD), Hypertrophy (HYP), Myocardial Infarction (MI), Normal (NORM), and Ischemic ST-T Changes (STTC). The original setup was altered in two main ways. Firstly, only ECG segments with one label assigned to them are considered. Secondly, the task is converted into a binary classification problem of NORM vs (CD, HYP, MI, STTC) from above. The ECG frames were normalized in amplitude between the values of 0 and 1.

When calculating the contrastive loss, a value of τ = 0.1 was chosen, as in Kiyasseh et al. (2020). The neural network, f_θ, comprises 1D convolutional operators. Table 2 shows the neural network architecture used. K, C_in, and C_out represent the kernel size, number of input channels, and number of output channels, respectively. A stride of 3 was used for all convolutional layers. E represents the dimension of the final representation. The same neural network architecture was used for all experiments.

Table 2
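Because the contents of Table 2 are not reproduced here, the sketch below shows an illustrative 1D-convolutional feature extractor of the kind described above (stride 3 in each convolutional layer, a final embedding of dimension E); the number of layers, kernel sizes, and channel counts are assumptions rather than the values from Table 2.

# Illustrative 1D-convolutional feature extractor f_theta for 12-lead ECG frames.
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=12, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=3),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=3),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=7, stride=3),
            nn.BatchNorm1d(128), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        # x: (B, 12, 2500) -- a 12-lead ECG frame of 2500 samples
        z = self.pool(self.conv(x)).squeeze(-1)    # (B, 128)
        return self.fc(z)                          # (B, E) embedded representation h

# Example: a batch of four 12-lead frames of 2500 samples
h = FeatureExtractor()(torch.randn(4, 12, 2500))   # -> (4, 128)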

Table 3 shows the batch size B and learning rates used for training with different datasets. The Adam optimizer was used for all experiments.

Table 3

PCPs are fed to the hypernetwork during inference to generate patient-specific parameters for the final (linear) layer of a classification network. During inference, the most similar PCP, v_k, (according to some similarity metric, s, such as the cosine similarity) to each instance representation, h_i, serves as the input. This approach exploits the similarity between patients in the training and inference sets and, in turn, generates patient-specific parameters. Classification parameters of machine-learning networks are typically updated during training of the machine-learning network, and fixed during inference when the trained network is applied to other data. This allows the training of the classification parameters to exploit population-based information in order to learn high-level features useful for solving the task at hand. Such an approach, however, means that all instances of input data in subsequent use are exposed to the same set of parameters regardless of instance-specific information. In the present method, instance-specific information is incorporated by making the parameters dependent on the clinical prototypes, which achieves more personalized prediction.
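A minimal sketch of this inference procedure is given below, reusing the FeatureExtractor and HyperNetwork sketches shown earlier: the PCP most similar to each instance representation (by cosine similarity) is fed to the hypernetwork, and the generated parameters define the final linear layer. The function name and tensor shapes are assumptions.

# Sketch of inference with patient-specific parameters generated from the nearest PCP.
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict(x, feature_extractor, hypernetwork, pcps):
    """
    x:    (B, 12, 2500) ECG frames;  pcps: (N_patients, E) learned patient cardiac prototypes
    """
    h = feature_extractor(x)                                       # (B, E) instance representations
    sim = F.normalize(h, dim=-1) @ F.normalize(pcps, dim=-1).t()   # (B, N) cosine similarities
    nearest = pcps[sim.argmax(dim=-1)]                             # (B, E) nearest PCP per instance
    omega = hypernetwork(nearest)                                  # (B, E, C) patient-specific parameters
    return torch.einsum("be,bec->bc", h, omega)                    # (B, C) class scores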

Thereby, a method of assigning a patient to one of a plurality of clinical classes is provided. The method comprises receiving one or more measurement data units 10, each comprising one or more measurements of a physiological parameter of the patient. These measurement data units may be referred to as instances. They have the same structure as the measurement data units 10 used during training of the machine-learning network, but will generally be drawn from different patients to those used during training.

The method further comprises generating an embedded representation 14 for the measurement data unit 10 using a machine-learning network 1 trained using the method described above in relation to Figs. 1 and 2. An example of such a machine-learning network as in use during the method of assigning a patient is shown in Fig. 7. As mentioned above, the method comprises identifying one or more clinical prototypes 12 most similar to the embedded representation 14 from the clinical prototypes 12 obtained from the machine-learning network 1. Then the method comprises generating classification parameters 16 for the one or more clinical prototypes 12 most similar to the embedded representation 14 using the machine-learning network 1. Finally, the method comprises classifying the patient based on the classification parameters 16.

During training, a loss function is optimized that consists of both a supervised and contrastive loss term (see Eqs. 1 and 7). In the PCP embodiment, the hard contrastive loss of Eq. 1 is used, as this is more appropriate where the measurement categories each correspond to a single, specific patient during training. Based on the supervised loss, instance representations are expected to exhibit discriminative behaviour for the classification task at hand. The contrastive loss term encourages these instance representations to be similar to PCPs, and thus PCPs are expected to be discriminative. In Figs. 9 and 10, the representations of instances in the training set and the PCPs are illustrated after being projected to a 2-dimensional subspace using t-SNE and colour-coded according to their class label (in this case the arrhythmia label assigned to each patient). Fig. 9 shows representations, h ∈ ℝ^128, of instances in the training set of the Chapman dataset. Fig. 10 shows PCPs, v ∈ ℝ^128, learned on the training set.

Both training representations, h, and PCPs, v, are class-discriminative. This can be seen from the high separability of the projected representations along class boundaries. Based on this finding, one could argue that PCPs are simply picking up on the class-label differences between patients. However, the PCPs are more clustered and exhibit more overlap with one another than the training representations. Such a finding suggests that PCPs are encoding information beyond class labels, which could be patient-specific.

To generate patient-specific classification parameters during inference the PCP nearest to the instance representation is used as input to the hypernetwork. This approach places a substantial dependency on that single chosen PCP. Therefore, three additional input strategies are considered that incorporate PCPs differently.

Nearest 10 searches for, and takes the average of, the 10 PCPs that are nearest to the instance representation. Mean simply takes the average of all PCPs. Similarity-Weighted Mean takes a linear combination of all PCPs, weighted according to their cosine similarity to the instance representation.
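The four input strategies can be sketched as follows; the use of a softmax to normalize the similarity weights in the similarity-weighted mean is an assumption, as the exact weighting is not specified above.

# Sketch of the four hypernetwork-input strategies compared in Fig. 11.
import torch
import torch.nn.functional as F

def hypernetwork_input(h, pcps, strategy="nearest", k=10):
    """h: (B, E) instance representations; pcps: (N, E) patient cardiac prototypes."""
    sim = F.normalize(h, dim=-1) @ F.normalize(pcps, dim=-1).t()   # (B, N) cosine similarities
    if strategy == "nearest":                 # single most similar PCP
        return pcps[sim.argmax(dim=-1)]
    if strategy == "nearest_10":              # average of the k most similar PCPs
        idx = sim.topk(k, dim=-1).indices
        return pcps[idx].mean(dim=1)
    if strategy == "mean":                    # average of all PCPs (ignores similarity)
        return pcps.mean(dim=0, keepdim=True).expand(h.shape[0], -1)
    if strategy == "sim_weighted_mean":       # similarity-weighted linear combination
        w = torch.softmax(sim, dim=-1)        # softmax normalization is an assumption
        return w @ pcps
    raise ValueError(f"unknown strategy: {strategy}")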

Fig. 11 shows the effect of these strategies on the test set area under curve (AUC) on a test set of the Chapman dataset as a function of the hypernetwork input strategy and as the embedding dimension, E, is changed. Bars are averaged across five seeds and the error bars illustrate one standard deviation.

Exploiting the similarity of representations during inference benefits the generalization performance of the network. This is shown by the inferiority of the mean strategy relative to the remaining strategies. For instance, at E=256, the mean strategy achieves an AUC of ~0.50, equivalent to a random guess. However, simply weighting the PCPs according to their similarity to instance representations leads to an AUC of ~0.65. Such a finding suggests that the representations learned are capturing patient-specific information and that similar patients are being found.

It is more advantageous to exploit similarity to identify the nearest PCPs than to weight many PCPs. In Fig. 11, the “Nearest” and “Nearest 10” input strategies perform best, with the latter achieving an AUC of ~0.90. This performance is also unaffected by changes to the embedding dimension. This behaviour can be explained by the idea that the incorporation of fewer PCPs is less likely to overwhelm the hypernetwork, thus allowing it to generate reasonable classification parameters. The strong performance of these strategies, despite their high dependence on so few PCPs, reaffirms the utility of the representations learned.

Therefore, in some embodiments, in the method of assigning a patient to one of a plurality of clinical classes, the step of identifying one or more clinical prototypes 12 may comprise identifying a plurality of the clinical prototypes 12 most similar to the embedded representation 14 and calculating an average of the plurality of clinical prototypes 12. The step of generating classification parameters for the one or more clinical prototypes 12 then comprises generating classification parameters for the average of the plurality of clinical prototypes 12.

PCPs can provide patient-specific representations. To demonstrate this, the Euclidean distance between each PCP and two sets of representations 14 is calculated. The first set includes representations corresponding to the same patient summarized by the PCP (PCP to Same Training Patient). The second includes representations that correspond to all remaining patients (PCP to Different Training Patients). The intuition is that if PCPs were truly patient-specific, the distances in the former scenario should be smaller than those in the latter scenario. This expectation is confirmed by the results in Fig. 12.

Fig. 12 shows the distribution of pairwise Euclidean distances from the learned PCPs on the Chapman dataset to three sets of representations: those in the training set that belong to the same patient (blue), those in the training set that belong to different patients (orange), and those in the validation set (purple). PCPs are patient-specific since they are closer to representations belonging to the same patient than they are to representations belonging to different patients.

The distributions of the distances in the PCP to Same Training Patient and PCP to Different Training Patients scenarios exhibit no overlap and a mean of ~ 4.5 and 9.5, respectively. Such a finding implies that PCPs are on average, a factor of 2 more similar to representations from the same patient than they are to those from other patients.

During inference, PCPs are compared to representations 14 of instances in the validation set (see Fig. 7). Such a comparison would be appropriate if the pairwise distances between these representations were of the same order as the distances between PCPs and representations of instances in the training set. Fig. 12 further shows the distribution of distances between the PCPs and representations of instances from the validation set (PCP to Validation Patients, purple). Its proximity to the other two distributions supports the appropriateness of the method.

Moreover, this distribution implies that patients in the validation set are not present in the training set, as was enforced by design. This can be seen by the overlapping distributions (purple and orange). Such a finding suggests that PCPs may be useful in detecting out-of-distribution patients or distribution shift.

To illustrate the generalizability of these claims to other datasets, graphs analogous to Fig. 12 are reproduced for two additional datasets, PTB-XL and PhysioNet 2020. In Figs. 13 and 14, the distribution of Euclidean distances between the aforementioned representations is illustrated for the two datasets, respectively.

Patient cardiac prototypes are indeed patient-specific. Fig. 13 shows the distribution of pairwise Euclidean distance from the learned PCPs on the PTB-XL dataset to three sets of representations: those in the training set that belong to the same patient (blue), those in the training set that belong to different patients (orange), and those in the validation set (purple). PCPs are patient-specific since they are closer to representations belonging to the same patient than they are to representations belonging to different patients. In Fig. 13, this is supported by the smaller average Euclidean distances between PCPs and representations of instances of the same patient than between PCPs and representations of instances from different patients (average Euclidean distance ~ 7 vs. 10, respectively). Furthermore, PCPs have the potential to be used for the detection of out-of-distribution data. The high overlap between the PCP to Validation Patients and PCP to Different Training Patients has a twofold implication. First, it suggests that instances in the validation set belong to patients not found in the training set (by design). Second, that patients in the validation set, on average, belong to the same overall distribution of patients.

Fig. 14 shows the distribution of pairwise Euclidean distances from the learned PCPs on the PhysioNet 2020 dataset to three sets of representations: those in the training set that belong to the same patient (blue), those in the training set that belong to different patients (orange), and those in the validation set (purple). As shown for the Chapman and PTB-XL datasets, PCPs are patient-specific when trained on the PhysioNet 2020 dataset. This is emphasized by the high degree of separability between the PCP to Same Training Patient and PCP to Different Training Patients distributions. The instances in the PhysioNet 2020 dataset exhibit a higher degree of diversity relative to those in the other datasets. This is seen by comparing the PCP to Same Training Patient distribution, which has a larger mean (~8) compared to ~4 (Fig. 12) and ~7 (Fig. 13) for the Chapman and PTB-XL datasets, respectively. Such a finding implies that the PCPs had a more difficult time summarizing the representations that belong to the same patient.

Patient Similarity aims at discovering relationships between patient data (Sharafoddini et al., 2017). To quantify these relationships, Pai & Bader (2018) and Pai et al. (2019) propose Patient Similarity Networks for cancer survival classification. Exploiting electronic health record data, Zhu et al. (2016) use Word2Vec to learn patient representations, and Suo et al. (2017) propose to exploit patient similarity to guide the retraining of models, an approach which is computationally expensive. Instead, the present method naturally learns PCPs as efficient descriptors of the cardiac state of a patient. Patient Similarity Quantification is achieved by measuring the Euclidean distance between PCPs and representations. Thus, similar patients are identified. This is validated by visualizing their ECG recordings.

Up until this point, it has been shown that PCPs are both patient-specific and discriminative for the task at hand. These two properties suggest that PCPs can also be exploited for the quantification of patient similarity. However, the ground truth for the similarity of a pair of patients is non-trivial as it can be based on demographic, physiological, or treatment outcomes. Therefore, the following two steps are carried out: 1) use PCPs to discover similar and dissimilar patients within the same dataset, and 2) validate these findings by comparing the patients’ 12-Lead ECG recordings.

To achieve the first goal, the pairwise Euclidean distance is computed between each PCP and each representation in the validation set. The distribution of these distances can be found for the Chapman dataset in Fig. 15 (top). Once these distances are averaged across representations that belong to the same patient, a matrix of pairwise distances between patients is generated. Fig. 15 (centre) illustrates pairwise distances between PCPs and representations of patients in the validation set for a subset of that matrix. Locating the cell with the lowest Euclidean distance identifies the pair of patients that are most similar to one another. Their corresponding 12-Lead ECG recordings are visualised in Fig. 15 (bottom). Both recordings are similar and correspond to the same arrhythmia, supraventricular tachycardia.

PCPs are able to sufficiently distinguish between unseen patients and thus act as reasonable patient-similarity tools. In Fig. 15 (centre), there exists a large range of distance values for any chosen PCP (row). In other words, it is closer to some representations than to others, implying that a chosen PCP is not trivially equidistant to all other representations. However, distinguishing between patients is not sufficient for a patient similarity tool. Such a tool must also correctly capture the relative similarity to these patients.

PCPs can comfortably achieve this aim. In Fig. 15 (bottom), the two patients identified as being closest to one another have ECG recordings with a similar morphology and arrhythmia label, supra-ventricular tachycardia. This behaviour arises due to the ability of PCPs to efficiently summarize the cardiac state of a patient. Such a finding reaffirms the potential of PCPs as patient similarity tools.

This implementation thereby provides a method of determining a similarity among a plurality of patients. The method comprises receiving measurement data units of each patient. As above, each measurement data unit comprises one or more measurements of a physiological parameter of the patient. The method further comprises, for each patient, generating an embedded representation of the measurement data units using the machine-learning network trained using the method described above. The method then comprises identifying one or more clinical prototypes most similar to the embedded representation from the clinical prototypes obtained from the machine-learning network, and determining the similarity among the patients based on a similarity among the clinical prototypes identified for each patient.

The above procedure is repeated to discover similar and dissimilar patients across different datasets. Cross-dataset patient-similarity quantification can be consequential in exploiting clinical knowledge located in different institutions. Patient similarity is quantified by calculating the pairwise Euclidean distance between their representations. More specifically when computing the similarity between patients in the training set, the Euclidean distance between the PCPs is calculated. In contrast, when comparing patients in the validation set to those in the training set, the Euclidean distance is calculated between the latter’s PCP and the former’s representations. Such pairwise distances are averaged across the multiple instances that may exist for the same patient in the validation set.
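A minimal sketch of this similarity computation is given below: pairwise Euclidean distances between PCPs and validation-set representations are averaged over the instances belonging to each validation patient, and the smallest entry of the resulting matrix identifies the most similar pair of patients. The tensor layout and patient-identifier handling are assumptions.

# Sketch of patient-similarity quantification via pairwise Euclidean distances.
import torch

def patient_distance_matrix(pcps, val_reps, val_patient_ids):
    """
    pcps:            (N_train, E) patient cardiac prototypes
    val_reps:        (M, E) representations of validation-set instances
    val_patient_ids: list of length M mapping each instance to its patient
    """
    dist = torch.cdist(pcps, val_reps)                        # (N_train, M)
    patients = sorted(set(val_patient_ids))
    cols = []
    for p in patients:
        mask = torch.tensor([pid == p for pid in val_patient_ids])
        cols.append(dist[:, mask].mean(dim=1, keepdim=True))  # average over the patient's instances
    D = torch.cat(cols, dim=1)                                # (N_train, N_validation_patients)
    i, j = divmod(D.argmin().item(), D.shape[1])              # most similar (training, validation) pair
    return D, i, patients[j]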

Fig. 16 shows an identification of the two most similar patients in the training and validation sets of the PhysioNet 2020 dataset, analogous to that in Fig. 15. Fig. 16 (top) shows the distribution of all pairwise Euclidean distances between PCPs and representations in the validation set. Fig. 16 (centre) shows a matrix illustrating pairwise distances between a subset of PCPs and representations of patients in the validation set. Fig. 16 (bottom) shows a visualization of the 12-Lead ECG recordings of the two most similar patients. Both recordings are similar and correspond to the same morphology, sinus rhythm, which is considered normal.

It is also possible to discover similar patients across datasets. Fig. 17 illustrates the identification of the two most similar patients in the training sets of the Chapman and PTB-XL datasets. To do so, the pairwise Euclidean distance is computed between the PCPs of each dataset. The distribution of all these distances is shown in Fig. 17 (top). The pairwise distances for a subset of the PCPs in the Chapman and PTB-XL datasets are also illustrated in Fig. 17 (centre). From a clinical perspective, such a matrix provides physicians with the ability to identify patients that are most similar to the one they are currently diagnosing or treating. This information can help guide future clinical intervention. By locating the pair of PCPs with the lowest Euclidean distance, the pair of patients from each dataset that are most similar to one another is identified. Their corresponding 12-Lead ECG recordings are visualised in Fig. 17 (bottom). Both recordings are similar.

To further validate the PCPs and their ability to discern between patients, the complete version of Fig. 15 is used to identify two patients deemed dissimilar according to their Euclidean distance. In Fig. 18, the 12-Lead ECG segments corresponding to these two dissimilar patients in the training and validation set of the Chapman dataset are illustrated. Similarity is defined as low Euclidean distance between patient cardiac prototypes (PCPs) and representations of instances in the validation set. The different morphology of the ECG segments between patients and the different arrhythmia labels (Sinus Rhythm vs. Atrial Fibrillation) show that these patients are indeed dissimilar. Such a finding reaffirms the notion that PCPs both capture the cardiac state of the patient and allow for reliable patient similarity quantification.

This is repeated for the other datasets. Fig. 19 shows 12-Lead ECG segments corresponding to two dissimilar patients in the training and validation set of the Chapman dataset. ECG segments between patients exhibit different morphology and correspond to the different arrhythmia labels (Sinus Rhythm vs. Atrial Fibrillation).

The distance matrix visualized in Fig. 17 is then used to identify the two most dissimilar patients across the Chapman and PTB-XL datasets. In Fig. 20, the 12-Lead ECG recordings of this pair of patients are visualized. PCPs are able to reliably identify two patients that are dissimilar. This can be seen from the drastically different ECG morphology present in Fig. 20. The ECG segments exhibit different morphology between patients and correspond to different arrhythmia labels (Sudden Bradycardia vs. Ischemic ST-T Changes). The patient in the Chapman dataset is suffering from sudden bradycardia, a decrease in the rate at which the heart beats. In contrast, the patient from the PTB-XL dataset is suffering from changes to the ST segment of the ECG recording. Such a finding reaffirms that PCPs are reliable descriptors of the cardiac state of a patient.

This type of training can also be used for dataset distillation. A network can be trained on the compact set of PCPs in lieu of the larger, original dataset while maintaining generalization performance. This can provide a method of training a machine-learning network to assign patients to one of a plurality of clinical classes. The method comprises training the machine-learning network using the clinical prototypes obtained using the method of training a machine-learning network described above.

The evidence in support of PCPs as efficient descriptors of the cardiac state of a patient suggests that PCPs are sufficient for training a classification network in lieu of the original raw dataset. This idea is similar to that of dataset distillation, which focuses on obtaining a core-set of instances that does not compromise the generalization capabilities of a model (Wang et al., 2018).

To illustrate the use of PCPs as dataset distillers, a Support Vector Machine (SVM) is trained on them. The model is evaluated on representations of held-out instances. In Fig. 21, the generalization performance of these models is illustrated after having trained on different fractions of the available PCPs. Fig. 21 shows the validation AUC after training an SVM on different fractions of the available PCPs (E=32). Results are shown across 5 seeds. For comparison’s sake, the AUC is also shown when the neural network is trained on all instances in the training set (Full Training Set), which in total is several-fold larger than the number of PCPs. PCPs do indeed act as effective dataset distillers. In Fig. 21, training on 100% of the PCPs (n=6,387) achieves an AUC of ~0.89, which is similar to that achieved when training on the full training set (n=76,614). In other words, similar generalization performance is achieved despite a 12-fold decrease in the number of training instances.

Reducing the number of PCPs, by selecting a random subset for training, does not significantly hurt performance. For instance, training with only 5% of PCPs available (n = 319) achieves an AUC ~ 0.82. Concisely, a 240-fold decrease in the number of training instances only leads to a 7% reduction in performance.

Such a finding supports the potential of PCPs as dataset distillers. This behaviour arises due to the patient-centric contrastive learning approach. By encouraging each PCP to be similar to several instances that belong to the same patient, each PCP is able to capture the most pertinent information about that patient.

Similar results are achieved when changing the embedding dimension, E ∈ {64, 128, 256}. Fig. 22 shows the validation AUC after training an SVM on a different fraction of the PCPs available for the Chapman dataset. The generalization performance when training on the full training set is also shown. Figs. 22(a)-(c) illustrate the effect of changing the embedding dimension, E, on the generalization performance. Results are shown across 5 seeds.

The embedding dimension, E, has a significant effect on the dataset distillation capabilities of PCPs. In Fig. 22(b), at an embedding dimension E=128, the performance drop due to training with 100% of PCPs relative to training with the full training set is minimal, ΔAUC = 0.90 - 0.89 = 0.01. In contrast, at E=256, this performance drop is more substantial, ΔAUC = 0.905 - 0.86 = 0.045. Such a finding suggests that more attention should be given to the embedding dimension when designing dataset distillation methods.

Fig. 23 shows the validation AUC after training an SVM on a different fraction of PCPs (E=128) available for the PTB-XL dataset. The generalization performance when training on the full training set is also shown. Results are shown across 5 seeds.

Experiments above used all 12 leads of an ECG. However, the availability of all 12 leads for training is not always guaranteed. This can be the case, for instance, in low-resource clinical settings where medical infrastructure is lacking or in the context of home monitoring where wearable sensors are used. To investigate the robustness of the method to such scenarios, a subset of the experiments is repeated in the presence of only 4 leads (II, V2, aVR, aVL). The results can be found in Table 4, which shows the AUC on a test set of the Chapman dataset in the presence of a different number of leads. The inference strategy involves using the nearest PCP (E=128) as input to the hypernetwork. Results are shown for five seeds.

Table 4

Deep Retrieval of Physiological Signals

In this second example, the measurement categories are defined by combinations of patient attributes, namely disease class, sex, and age: a_class, a_sex, and a_age. In this example, the physiological parameter is 12-lead ECG data. In this embodiment the machine-learning network 1 does not comprise a parameter generator 7 or classifier 5, and so the method of training used omits steps S14 and S18 shown in Fig. 1. The soft classification loss is used, and disease class is used as the single class attribute. A regression loss is also employed.

The experiments are conducted using PyTorch (Paszke et al., 2019) on datasets that consist of physiological time-series such as the electrocardiogram (ECG) alongside cardiac arrhythmia labels.

The regression term (Eq. 6) requires the specification of α, which was chosen as α = 0.2 due to the choice of distance metric (squared Euclidean distance) and the number of attribute groups. If α were too small in magnitude, the intra-cluster separability of clinical prototypes would be too small. If α were too large, then the clinical prototypes from different classes would begin to overlap with one another and class separability diminishes. For the contrastive loss, the temperature parameter for the hard assignment was chosen as T = 0.1 (Kiyasseh et al., 2020), and the temperature parameter for the soft assignment was chosen
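Purely for illustration, the sketch below shows one generic way a temperature-scaled, prototype-based contrastive loss with hard and soft assignments can be written in PyTorch. It is a schematic under assumed tensor shapes, not a reproduction of the loss defined earlier in this document or in the claims; the function names and arguments are hypothetical.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(h, prototypes, target_idx, temperature=0.1):
    """Hard-assignment variant: each instance is pulled towards the single
    prototype of its own category and pushed away from all others.

    h          : (B, E) embedded representations
    prototypes : (M, E) one prototype per measurement category
    target_idx : (B,)  index of the category each instance belongs to
    """
    d = torch.cdist(h, prototypes, p=2) ** 2       # squared Euclidean distances (B, M)
    logits = -d / temperature                      # smaller distance -> larger logit
    return F.cross_entropy(logits, target_idx)

def soft_prototype_loss(h, prototypes, target_weights, temperature=1.0):
    """Soft-assignment variant: the target is a distribution over prototypes (B, M)."""
    d = torch.cdist(h, prototypes, p=2) ** 2
    log_probs = F.log_softmax(-d / temperature, dim=1)
    return -(target_weights * log_probs).sum(dim=1).mean()
```

The temperature divides the (negative) distances before the softmax, so smaller values sharpen the assignment, which is consistent with using a lower temperature for the hard assignment.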

This second embodiment illustrates applications of the method to training networks for deep retrieval and clustering, among other applications. This framework is referred to as Deep Retrieval of Physiological Signals (DROPS). CPs are used to i) retrieve instances corresponding to a specific patient attribute combination, and ii) assign instances to various clusters. Instance retrieval may be important in clinical trial guidance, as clinical prototypes can retrieve patients that belong to certain groups, which may facilitate the choice of cohorts in clinical trials. Clustering may be a powerful tool for patient subtype discovery. The learning framework naturally lends itself to clustering, which might unearth patient/disease subtypes.

Given a query set of representations, H = {h_1, h_2}, information retrieval systems can efficiently search through a large database to identify matching or similar representations. To that end, E-dimensional representations, h ∈ R^E, are learned. These can be loosely thought of as centroids in the K-means formulation.

Clinical prototypes can play a dual role for retrieval and clustering purposes. Clinical information retrieval (IR), in which instances similar to a query are retrieved, was first introduced in 1990 (Hersh & Greenes, 1990). In this setting, a query associated with a set of desired attributes is exploited to retrieve a relevant instance. Retrieving clinical data from a large database has been a longstanding goal of researchers within healthcare (Hersh & Greenes, 1990). Such research has involved the retrieval of clinical documents (Gurulingappa et al., 2016; Wang et al., 2017; Rhine, 2017; Wallace et al., 2016) where, for example, D'Avolio et al. (2010) map text queries to an ontology known as SNOMED, before retrieving relevant clinical documents.

More recently, IR has been performed with biomedical images, and is referred to as content-based image retrieval (Saritha et al., 2019; Chittajallu et al., 2019). Others have extended this concept to EHR data (Goodwin & Harabagiu, 2018; Wang et al., 2019; Chamberlin et al., 2019) and clinical text (Wang et al., 2017) to discover patient cohorts in a clinical database. For example, Chamberlin et al. (2019) implement rudimentary IR methods such as divergence from randomness on the UPMC and MIMIC-III (Johnson et al., 2016) datasets with the aim of discovering patient cohorts. However, recent research has placed minimal emphasis on medical time-series data. For example, Goodwin & Harabagiu (2016) implement an unsupervised patient cohort retrieval system by exploiting clinical text and time-series data. These existing methods do not extend to cardiac time-series data, cannot account for searching based on multiple patient attributes, and are unable to also cluster instances. In contrast to such methods, the present embodiment implements a deep-learning-based clinical information retrieval system for ECG data. This learning framework allows for both the clustering and retrieval of cardiac signals based on multiple patient attributes.

To perform retrieval, CPs are treated as the query set, and the closest instances (based on Euclidean distance) are retrieved from a held-out dataset. In evaluating this task of retrieval, a commonly used metric known as Recall at K (R@K) is used. This metric quantifies whether at least one of the K retrieved instances matches the query.

In the present context, however, a match can occur according to any of the three attributes, a_class, a_sex, and a_age. Therefore, the attribute is extracted for each clinical prototype, a_m, and for each of the K retrieved instances, and the following attribute-specific R@K is defined, where δ is the Kronecker delta function that evaluates to one if the argument is true and zero otherwise.
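A minimal sketch of such an attribute-specific R@K computation is shown below, assuming the attribute value of each prototype (the query) and of its K retrieved instances are available as arrays; the names, shapes, and toy values are illustrative only.

```python
import numpy as np

def recall_at_k(proto_attrs, retrieved_attrs):
    """Attribute-specific Recall@K.

    proto_attrs     : (M,)   attribute value of each clinical prototype (the query)
    retrieved_attrs : (M, K) attribute values of the K instances retrieved per query

    A query counts as a hit if at least one of its K retrieved instances shares
    the attribute value (the Kronecker-delta match described in the text).
    """
    hits = (retrieved_attrs == proto_attrs[:, None]).any(axis=1)
    return hits.mean()

# Toy example with hypothetical class attributes and K = 3 retrieved instances.
proto_class = np.array(["SR", "AFIB"])
retrieved_class = np.array([["SR", "SB", "SR"],
                            ["GSVT", "SB", "SR"]])
print(recall_at_k(proto_class, retrieved_class))   # 0.5 in this toy example
```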

The CPs can provide a method of selecting patients having desired values of one or more attributes from a plurality of patients. The method comprises receiving measurement data units of the plurality of patients, each measurement data unit comprising one or more measurements of a physiological parameter of one of the patients. The method further comprises generating an embedded representation of the measurement data units of each patient using a machine-learning network trained using the method described above. The method further comprises identifying a clinical prototype of a measurement category associated with the desired values of the one or more attributes from the clinical prototypes obtained from the machine-learning network, identifying one or more embedded representations most similar to the identified clinical prototype, and selecting the one or more patients corresponding to the identified embedded representations.
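The selection step itself can be sketched as a simple nearest-neighbour query against the clinical prototype of the desired attribute combination; query_prototype, patient_embeddings, and patient_ids below are illustrative names for data assumed to be available from the trained network.

```python
import numpy as np

def select_patients(query_prototype, patient_embeddings, patient_ids, n_select=10):
    """Return the patients whose embedded representations lie closest
    (in Euclidean distance) to the clinical prototype of the desired
    attribute combination."""
    dists = np.linalg.norm(patient_embeddings - query_prototype[None, :], axis=1)
    closest = np.argsort(dists)[:n_select]
    return [patient_ids[i] for i in closest]

# e.g. using the prototype of a hypothetical category such as {AFIB, F, 70-79}:
# selected = select_patients(query_prototype, patient_embeddings, patient_ids)
```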

Table 5 shows DROPS' retrieval performance (R@K) for three different values of K ∈ {1, 5, 10}. Results are shown across 5 seeds. The # of attributes column outlines the minimum number of attributes that match between the clinical prototype and the retrieved instances. For example, # of attributes = 3 implies that a perfect match has occurred for all attributes: class, sex, and age.

Table 5 shows that clinical prototypes perform well in retrieving instances with at least one attribute match. This can be seen for # of attributes ≥ 1, where R@1 = 0.763, implying that 76.3% of the clinical prototypes retrieved an instance that had at least one match. Also, as the number of retrieved instances, K, increases, so does the likelihood of obtaining one that has at least one attribute match. For example, as K increases from 1 to 10, R@K increases from 0.763 to 1. Such a finding illustrates the potential utility of clinical prototypes as tools for retrieval.

Table 5

When deployed to retrieve a single instance (K=1), clinical prototypes struggle to retrieve one whose attributes match more than once. This is shown by R@1 = 0.269 when # of attributes ≥ 2. However, as more instances are retrieved, the likelihood of obtaining one that matches across at least two attributes improves drastically. For example, as K increases from 1 to 10, R@K increases from 0.269 to 0.938. In other words, when retrieving ten instances, 93.8% of the clinical prototypes manage to find one whose attributes match at least twice. As expected, as one searches for a larger # of attributes that match, the retrieval performance of the clinical prototypes drops (R@1 drops from 0.763 to 0.025). The cause of this behaviour is twofold. First, the present training paradigm favours class discrimination and does not place an equal emphasis on all attributes. Therefore, perfect matches are more likely to occur along the class attribute. Moreover, the task of discriminating instances (ECG recordings) according to certain attributes, such as sex, is inherently difficult.

In addition, the retrieval capabilities of the clinical prototypes are qualitatively evaluated. Two random clinical prototypes, with m_1 = (SR, M, 49) and m_2 = (AFIB, F, 72), are chosen to form the query set, their Euclidean distances to the representations of instances in a held-out dataset are calculated, and the 5 closest instances are retrieved. In Fig. 24, the ECG signals corresponding to these top 5 closest representations in the validation set are visualised. Their borders are coloured green if their class label matches that of the query, and red otherwise. In both cases, 60% of the retrieved instances match the class label.

A second experiment to test clinical prototypes in the retrieval setting was carried out on the same datasets to compare the retrieval by the present method with that of prior art methods. In this second experiment, the supervised contrastive learning framework used to learn the representations of instances and the clinical prototypes is referred to as CROCS (clustering and retrieval of cardiac signals). Specifically, a query retrieves the closest K ∈ {1, 5, 10} previously unseen cardiac signals, and assigns them to its associated set of patient attributes. The prior art methods used for comparison are a subset of those later used for the experiments evaluating clustering performance discussed below. The details of these prior art methods are also discussed below.

In this experiment, access to the ground-truth attributes is assumed, which are used to calculate a variant of the precision at K metric (Eq. 9a). P@K checks whether at least one of the retrieved instances is relevant, where relevance is based on a partial or exact match of query and instance attributes (# attribute matches). This value is then averaged across all M prototypes.

In Tables 5a & 5b, the assignments of the previously unseen signals to the sets of attributes are evaluated based on both partial and exact matches of the attributes (# attribute matches) represented by the query and retrieved cardiac signals. Tables 5a and 5b show the precision of the K retrieved representations, v, in the validation sets of Chapman and PTB-XL, respectively, that are closest to the query. Results are shown for partial and exact matches of the attributes (# attribute matches) represented by the query and retrieved cardiac signals, and are averaged across five random seeds. Brackets indicate standard deviation and bold reflects the top performing method. The strong performance of CP CROCS provides evidence in support of the CROCS framework.

Table 5a

Table 5b

In Tables 5a and 5b, CROCS outperforms the baseline retrieval method DTC. For example, on Chapman, at K=1, and when # attribute matches ≥ 1, CP CROCS and DTC achieve a precision of 95.6 and 71.9%, respectively. This indicates that, on average, 95.6% of the cardiac signals retrieved by the clinical prototypes are relevant. Relevance, in this case, implies that the retrieved cardiac signals share at least one attribute with the query. Such a finding points to the utility of clinical prototypes as queries in the retrieval setting.

In addition, CROCS leads to rich representation learning that facilitates retrieval. This is evident from the strong performance of TP CROCS, which depends directly on representations learned via the CROCS framework. For example, on PTB-XL, at K=1, and when # attribute matches ≥ 1, DTC, TP CROCS, and CP CROCS achieve a precision of 70.0, 99.0, and 92.5%, respectively. In this particular case, the lower performance of CP CROCS relative to TP CROCS is hypothesized to stem from clinical prototypes acting instead as archetypes (extreme representative data points) (Morup & Hansen, 2012), which may occasionally hinder retrieval along multiple attributes. Evidence of such extreme embeddings can be found in Fig. 26.

As for the first retrieval experiment, the retrieval performance can also be qualitatively evaluated by examining retrieved instances. To qualitatively evaluate the retrieval performance, a query representing a set of attributes is randomly chosen and its Euclidean distance to the representations in a validation set is calculated. Fig. 27 shows the qualitative retrieval performance for two distinct query prototypes.

Fig. 27, top row, shows distributions of the Euclidean distances from (a) DTC query or (b) CP CROCS query to representations, v, in the validation set of Chapman, coloured based on the ground-truth class of the representations.

Fig. 27, bottom row, shows the six closest cardiac signals (K=6) to the query which is associated with a set of patient attributes {disease, sex, age}. Retrieved cardiac signals with a green border indicate those whose class attribute matches that of the query.

The CP CROCS query is closer to representations of the same class (SR) than to those of a different class and thus retrieves relevant cardiac signals. For example, in Fig. 27(b), top row, the average distance between the CP CROCS query representing {SR, male, under 49} and representations with and without the class attribute SR is ~ 0.6 and > 1.5, respectively. Such separability, which is not exhibited by the DTC query, points to the improved reliability of the CP CROCS query in distinguishing between the relevance of cardiac signals. Further evidence in support of this reliability is shown in Fig. 27, bottom row, where we find that a DTC and a CP CROCS query retrieve relevant cardiac signals 0 and 50% of the time, respectively. This finding also extends to the PTB-XL dataset.

Clustering Using Clinical Prototypes

In addition to performing retrieval, clinical prototypes can be exploited for clustering. An unlabelled dataset, D_U(x), that consists of instances, x, without ground-truth labels, y, can harbour significant semantic information. Extracting such information to discover relationships between instances can be done via unsupervised clustering methods, such as K-means. In this setting, instances are assigned to the one of the K learned centroids to which they are closest (e.g., by L2 distance). From this perspective, clinical prototypes can be thought of as multiple centroids, each corresponding to a particular cluster.

Consequently, there is provided a method of assigning a patient to one of a plurality of measurement categories. The method comprises receiving a measurement data unit of the patient comprising one or more measurements of a physiological parameter of the patient, generating an embedded representation of the measurement data unit using a machine-learning network trained using the method described above, calculating similarities between the embedded representation and the clinical prototypes obtained from the machine-learning network, and assigning the patient to one of the measurement categories based on the similarities. In the present example, assigning the patient to one of the measurement categories comprises assigning the patient to the measurement category having a clinical prototype most similar to the embedded representation.
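A minimal sketch of this nearest-prototype assignment, under the assumption that the trained encoder and the learned prototypes are available, might look as follows; encoder, prototypes, and category_labels are illustrative names.

```python
import numpy as np

def assign_measurement_category(measurement_unit, encoder, prototypes, category_labels):
    """Assign a patient's measurement data unit to the measurement category
    whose clinical prototype is most similar (smallest Euclidean distance)
    to the unit's embedded representation."""
    h = encoder(measurement_unit)                              # (E,) embedded representation
    dists = np.linalg.norm(prototypes - h[None, :], axis=1)    # distance to each prototype
    return category_labels[int(np.argmin(dists))]
```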

To evaluate clustering, the following steps are performed. First, the pairwise Euclidean distance is calculated between each instance and each CP. Each instance is then assigned to the CP to which it is closest. Since each CP corresponds to a combination of three attributes, the specific attribute, a_pred, to which each instance is assigned can be determined. Given the ground-truth attribute for each instance, a_true, the accuracy of the assignments, Acc(a), can be calculated. The agreement between the ground-truth attribute assignments, a_true = {a_true^(1), a_true^(2), ..., a_true^(N)}, and those obtained via clustering, a_pred = {a_pred^(1), a_pred^(2), ..., a_pred^(N)}, can also be quantified by calculating the adjusted mutual information, AMI(a) ∈ [0, 1], where I(a_true, a_pred) represents the mutual information between the ground-truth and the predicted sets of attributes, and H(a) represents the entropy of a set of attributes.
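Both evaluation quantities can be computed with standard library routines; the sketch below uses scikit-learn's adjusted mutual information and assumes a_true and a_pred are arrays of per-instance attribute assignments (illustrative names).

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

def clustering_scores(a_true, a_pred):
    """Accuracy and adjusted mutual information of attribute assignments."""
    a_true = np.asarray(a_true)
    a_pred = np.asarray(a_pred)
    acc = (a_true == a_pred).mean()
    ami = adjusted_mutual_info_score(a_true, a_pred)
    return acc, ami
```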

In the clustering scenario, the present method is compared to the following approaches:

1) K-Means, which implements Expectation Maximization to arrive at class-specific centroids. Instances are then assigned to the class of their nearest centroid. To compare, K-Means is performed a) on the input instances (K-Means Raw), b) on representations of instances learned using the soft contrastive loss (K-Means Contrastive), and c) on representations of instances learned using the soft contrastive loss and the regression loss (K-Means Combined). A minimal sketch of this K-Means baseline is given after this list.

2) DeepCluster (Caron et al., 2018), which is an iterative method that performs K- Means on representations, pseudo-labels instances according to their assigned cluster, and then exploits such labels for supervised training. The final set of instance labels is taken from the epoch with the lowest validation loss.

3) Information Invariant Clustering (IIC) (Ji et al., 2019), which maximizes the mutual information between the posterior class probabilities assigned to an instance and its perturbed counterpart. For comparison, IIC is adapted to the time-series domain, where instances are perturbed with additive Gaussian noise, ε ~ N(0, σ). The present implementation incorporates the recommended auxiliary over-clustering method as it was shown to significantly improve performance. The final set of instance labels is chosen by taking the argmax of the output probabilities.

4) SeLA (Asano et al., 2019), which pseudo-labels instances by implementing the Sinkhorn-Knopp algorithm before exploiting them for supervised training. For comparison, instances are pseudo-labelled after each epoch of training, and prior class information is used to determine the number of clusters, which should boost performance.
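The K-Means baselines mentioned in item 1) above reduce to fitting scikit-learn's KMeans on the chosen inputs (raw instances or learned representations) and reading off cluster assignments. The sketch below is illustrative only, with hypothetical array names.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cluster(features, n_clusters, seed=0):
    """Fit K-Means on raw instances or learned representations and
    return the cluster index assigned to each instance."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return km.fit_predict(features)

# e.g. K-Means Raw vs. K-Means Contrastive, with hypothetical inputs:
# labels_raw         = kmeans_cluster(raw_instances.reshape(len(raw_instances), -1), 4)
# labels_contrastive = kmeans_cluster(learned_representations, 4)
```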

In Table 6, the clustering performance of DROPS is illustrated relative to that of the four above-mentioned state-of-the-art clustering methods, both in the supervised and unsupervised setting. Table 6 shows accuracy and adjusted mutual information between the ground-truth cluster labels and the labels assigned to representations of instances in the validation set of the Chapman dataset. Accuracy and AMI are defined in Eqs. 10 and 11, respectively. Results are shown across 5 seeds.

Table 6

There are four main observations from Table 6. First, the supervised contrastive learning paradigm results in rich representations. This can be seen by the superiority of K-Means Contrastive over K-Means Raw, where Acc(class) = 0.795 and 0.292, respectively. This clustering approach does not exploit clinical prototypes and thus allows reliable evaluation of the utility of the representations.

Second, incorporating the regression regularization term (Eq. 6) boosts performance, albeit by a small amount, when implementing either a K-Means algorithm or DROPS. For example, Acc(class) = 0.795 and 0.817 for the K-Means Contrastive and K-Means Combined approaches, respectively. Such a finding suggests that encouraging intra-cluster separability can offer marginal benefits to the learning process. Despite this improvement, the main motivation for the regularization term lies in its ability to produce semantically meaningful clinical prototypes.

Third, clinical prototypes in the DROPS implementation are able to build upon rich representations to facilitate strong clustering performance. This can be seen by the DROPS Combined performance of Acc(class) = 0.903, which outperforms not only K-Means but also recent state-of-the-art unsupervised methods such as DeepCluster and IIC. Such a finding emphasizes the necessity of clinical prototypes for clustering. The present method can also be incorporated with others such as DeepCluster, for example, to assign pseudo-labels. Lastly, DROPS is flexible enough to simultaneously cluster according to multiple attributes, a quality that cannot be trivially incorporated into other methods. Such flexibility can provide researchers with improved control over sensitive attributes.

It has been demonstrated that clinical prototypes can be successfully deployed for retrieval and clustering. Such prototypes are encouraged to exhibit inter and intra-cluster separability, with the aim of capturing semantic relationships between them. Here it is confirmed that such semantic relationships were indeed captured.

Fig. 25 illustrates the t-SNE projection of the clinical prototypes after having trained a model with and without the regression regularization term in Eq. 6. Symbols, colours, and shades represent class, sex, and age attributes, respectively. In (a), class and sex separability is achieved. In (b), the separability of such attributes is achieved in addition to that of the age attribute.

Inter-class separability is high regardless of whether the regression term is incorporated or not. This can be seen from the lack of overlap between the projections of CPs illustrated with different symbols. In addition, the regression regularization term results in CPs that are ordered according to the age attribute.

Moreover, the inclusion of the regression term leads to clinical prototypes that can better satisfy the semantic relationships between the class, sex, and age attributes. This can be seen by the improved separability and ordering of the sex and age attributes in Fig. 25(b) compared to that in Fig. 25(a). For example, CPs that correspond to a particular class and sex, e.g., SB and F, are ordered according to their associated age attribute. This observation extends to all of the remaining CPs.

The high degree of similarity between the actual clinical prototype projections shown in Fig. 25(b) and the desired projections shown in Fig. 3 is notable. Such a finding demonstrates the ability of the regression regularization term to capture predefined and desired semantic relationships between representations.

A further experiment was carried out on the Chapman and PTB-XL datasets. As for the second retrieval experiment, in this second clustering experiment the supervised contrastive learning framework used to learn the representations of instances and the clinical prototypes is referred to as CROCS (clustering and retrieval of cardiac signals). In the second clustering experiment, additional prior art methods were used for comparison. The methods used were:

1) K-Means Raw (as above)

2) KM CROCS (similar to K-Means Contrastive above, where K-Means is applied to representations learned using the CROCS learning framework)

3) KM EP: K-Means using representations learned using the explainable prototypes framework of Gee et al. (2019).

4) DeepCluster (Caron et al., 2018), as above.

5) Information Invariant Clustering (IIC) (Ji et al., 2019), as above.

6) SeLA (Asano et al., 2019), as above.

7) Deep Transfer Cluster (DTC) (Han et al. 2019), as above.

8) TP CROCS: involves traditional prototypes, where each prototype, v_{A_j}, is simply an average of the representations, v_i, associated with the same set of attributes, A_j. Such representations are learned via the same CROCS machine learning framework as the present method.

9) Deep Temporal Clustering Representation (DTCR) (Ma et al. 2019) optimizes an objective function with a reconstruction, k-means, and classifier loss that determines whether instances are real.

For Chapman and PTB-XL, sex ∈ {M, F}, age is converted to quartiles, and |class| = 4 and 5, respectively. Therefore, M = |class| × |sex| × |age| = 32 and 40 for the two datasets, respectively. The embedding dimension was chosen as E=128 and 256 for Chapman and PTB-XL, respectively. As shown above, E has a minimal effect on performance.

Fig. 26 shows projections similar to those in Figs. 9 and 10. Specifically, Fig. 26 shows two dimensional UMAP projections of the class-specific clinical prototypes (large, coloured shapes), traditional prototypes (large, black shapes), and representations of instances in the validation set of Chapman (left) and PTB-XL (right) to qualitatively validate the fact that clinical prototypes are attribute-specific. As shown, clinical prototypes can be delineated along the dimensions of disease class, sex, and age.
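One way such a two-dimensional projection can be produced is with the umap-learn package (McInnes et al., 2018). The sketch below is illustrative only and assumes the clinical prototypes and validation-set representations are available as arrays with the illustrative names shown.

```python
import numpy as np
import umap                                   # umap-learn package
import matplotlib.pyplot as plt

# Assumed arrays: clinical prototypes (M, E) and validation representations (N, E).
points = np.vstack([clinical_prototypes, val_representations])

# Project prototypes and representations into two dimensions jointly.
embedding_2d = umap.UMAP(n_components=2, random_state=0).fit_transform(points)

proto_2d = embedding_2d[: len(clinical_prototypes)]
repr_2d = embedding_2d[len(clinical_prototypes):]

plt.scatter(repr_2d[:, 0], repr_2d[:, 1], s=4, alpha=0.3, label="representations")
plt.scatter(proto_2d[:, 0], proto_2d[:, 1], s=80, marker="*", label="clinical prototypes")
plt.legend()
plt.show()
```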

Clinical prototypes are disease class-specific, as evidenced by the high degree of class separability of the clinical prototypes. Clinical prototypes are also distinct from traditional prototypes. The agreement between the class labels of the clinical prototypes and those of the representations is consistent with the performance of clinical prototypes in the clustering and retrieval settings. These findings complement the delineation of the prototypes along the dimensions of sex and age and their adoption of a semantically meaningful arrangement. This demonstrates their use as centroids for clustering and as queries for retrieval.

To demonstrate clustering performance, cardiac signals in a held-out dataset are assigned to a set of patient attributes associated with the cluster of the closest clinical prototype. These assignments are evaluated based on the three patient-specific attributes (disease class, sex, and age). The results of this second experiment are shown in Table 7 and Table 8, using the accuracy and AMI scores used above. In Table 7 and Table 8, the scores are given as percentages rather than decimals, so should be divided by 100 for direct comparison with the values in Table 6. The evaluation is based on class (Table 7) and sex and age (Table 8) attributes. Results are averaged across five random seeds. Brackets indicate standard deviation and bold reflects the top performing method. CPCROCS (an implementation of the present method, equivalent to DROPS above) outperforms the remaining methods regardless of patient attribute.

Table 7

Table 8

In Tables 7 and 8, CROCS outperforms both generic and domain-specific state-of-the-art clustering methods. For example, on Chapman, CPCROCS, KMEP, and DTC achieve Acc(class) = 90.3, 65.6, and 53.4%, respectively. Along the dimension of sex, and on PTB-XL, CPCROCS and DTC achieve Acc(sex) = 73.5 and 58.6%, respectively.

CROCS also leads to rich representation learning that facilitates clustering. This is evident when comparing the performance of k-means applied to representations that are learned via different methods. For example, on Chapman, KMraw, KMEP, and KMCROCS achieve Acc(class)=28.4, 65.6, and 73.4%, respectively.

Clinical prototypes, when exploited as centroids, are also preferable to traditional prototypes, and centroids learned via k-means. For example, on PTB-XL, KMCROCS, TPCROCS, and CPCROCS achieve Acc(class) = 47.6, 53.6, and 76.0%, respectively. These findings hold across datasets and evaluation metrics, pointing to the overall utility of the CROCS framework and clinical prototypes for attribute-specific clustering.

Ablation Studies

CROCS reliably allows for both clustering and retrieval. Ablation studies were also carried out to better understand the root cause of this reliability. The results are shown in Tables 9 and 10. Tables 9 and 10 display the marginal impact of design choices of CROCS on clustering performance. Evaluation is based on (a) cardiac arrhythmia class and (b) sex and age attributes. Results are averaged across five random seeds. Brackets indicate standard deviation and bold reflects the top performing method.

On average, the soft assignment of representations to prototypes is preferable to the hard assignment. For example, on PTB-XL, L_NCE-soft and L_NCE-hard achieve Acc(class) ~ 76.0 and 66.5%, respectively.

In addition, the full framework (L_NCE-soft + L_reg) performs better than, or on par with, the other variants regardless of attribute. For example, on Chapman, L_NCE-hard, L_NCE-soft, and a further variant achieve AMI(class) = 67.5, 68.2, and 72.1%, respectively, whereas L_NCE-soft + L_reg achieves AMI(class) = 72.8%. This is a positive outcome given that the regularization term's main purpose was to improve the interpretability of prototypes by allowing them to capture the semantic relationships between attributes. These findings extend to the retrieval setting.

Table 9

Table 10

Disentangled Representations

As demonstrated, clinical prototypes can be deployed successfully for retrieval and clustering purposes while managing to capture relationships between attributes. It is also possible to quantify the relationship between clinical prototypes and explore their features further. Disentangled representations are a concept whereby representations can be factorized into multiple sub-groups, each of which corresponds to a particular abstraction. By clustering, it is possible to discover attribute-specific feature subsets. These features can indeed be clustered into three main groups, potentially coinciding with the pre-defined attributes. Such a process can improve the interpretability of clinical prototypes and lead to insights about how they can be further manipulated for retrieval purposes, for instance, by altering a subset of features.

Fig. 28 illustrates a matrix of the clinical prototypes (M=32) along the rows and their corresponding features (E=128) along the columns.

By implementing the hierarchical agglomerative clustering (HAC) algorithm, these clinical prototypes are clustered to arrive at the dendrogram presented along the rows of Fig. 28. Along the rows, clustering is performed using the 128-dimensional features resulting in 4 major clusters corresponding to the 4 classes. Clinical prototypes with similar attributes are also clustered together. The rows are labelled according to the attribute combination, m. In the columns, clustering is performed whereby each of the 128 features is treated as an instance, resulting in 3 major clusters which broadly correspond to the 3 attributes: class, sex, and age. This suggests that disentangled, attribute-specific features may have been learned.
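A sketch of this hierarchical clustering step using SciPy is shown below; cp_matrix is an assumed (M, E) array holding one clinical prototype per row, and the figure layout is illustrative rather than a reproduction of Fig. 28.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# cp_matrix: assumed (M=32, E=128) array of clinical prototypes.
# Rows: cluster the prototypes by their E-dimensional features.
row_linkage = linkage(cp_matrix, method="ward")

# Columns: treat each of the E features as an instance described by the M prototypes.
col_linkage = linkage(cp_matrix.T, method="ward")

fig, (ax_rows, ax_cols) = plt.subplots(1, 2, figsize=(10, 4))
dendrogram(row_linkage, ax=ax_rows, no_labels=True)
ax_rows.set_title("Prototypes (rows)")
dendrogram(col_linkage, ax=ax_cols, no_labels=True)
ax_cols.set_title("Features (columns)")
plt.tight_layout()
plt.show()
```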

In addition to being correctly clustered according to class labels, the clinical prototypes are also more similar to one another based on their attributes. This can be seen from the attribute combination descriptions in the right column. This finding supports the idea that clinical prototypes do indeed capture relationships between attributes.

References

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024-8035, 2019.

Adam M. Rhine. Information Retrieval for Clinical Decision Support. PhD thesis, 2017.

Alan H Gee, Diego Garcia-Olano, Joydeep Ghosh, and David Paydarfar. Explaining deep classification of time series data with learned prototypes. arXiv preprint arXiv:1904.08935, 2019.

Alan F Smeaton. Using nlp or nlp resources for information retrieval tasks. In Natural Language Information Retrieval, pages 99-111. Springer, 1999.

Ali Pourmand, Mary Tanski, Steven Davis, Hamid Shokoohi, Raymond Lucas, and Fareen Zaver. Educational technology improves ecg interpretation of acute myocardial infarction among medical students and emergency medicine residents. Western Journal of Emergency Medicine, 16(1): 133, 2015.

Alistair E W Johnson, Tom J Pollard, Lu Shen, H Lehman Li-Wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database. Scientific Data, 3(1):1-9, 2016.

Anis Sharafoddini, Joel A Dubin, and Joon Lee. Patient similarity in prediction models based on health data: a scoping review. JMIR Medical Informatics, 5(1):e7, 2017.

Arnaud Van Looveren and Janis Klaise. Interpretable counterfactual explanations guided by prototypes. arXiv preprint arXiv:1907.02584, 2019.

Byron C Wallace, Joel Kuiper, Aakash Sharma, Mingxi Zhu, and Iain J Marshall. Extracting pico sentences from clinical trial reports using supervised distant supervision. The Journal of Machine Learning Research, 17(1):4572-4596, 2016.

Chaitanya Shivade, Preethi Raghavan, Eric Fosler-Lussier, Peter J Embi, Noemie Elhadad, Stephen B Johnson, and Albert M Lai. A review of approaches to identifying patient phenotype cohorts using electronic health records. Journal of the American Medical Informatics Association, 21(2):221-230, 2014.

Conner D Galloway, Alexander V Valys, Jacqueline B Shreibati, Daniel L Treiman, Frank L Peterson, Vivek P Gundotra, David E Albert, Zachi I Attia, Rickey E Carter, Samuel J Asirvatham, et al. Development and validation of a deep-learning model to screen for hyperkalemia from the electrocardiogram. JAMA Cardiology, 4(5):428-436, 2019.

Dani Kiyasseh, Tingting Zhu, and David A Clifton. CLOCS: Contrastive learning of cardiac signals. arXiv preprint arXiv:2005.13249, 2020.

David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.

Deepak Roy Chittajallu, Bo Dong, Paul Tunison, Roddy Collins, Katerina Wells, James Fleshman, Ganesh Sankaranarayanan, Steven Schwaitzberg, Lora Cavuoto, and Andinet Enquobahrie. Xai-cbir: Explainable ai system for content based retrieval of video frames from minimally invasive surgery videos. In International Symposium on Biomedical Imaging, pages 66-69. IEEE, 2019.

Dianbo Liu, Dmitriy Dligach, and Timothy Miller. Two-stage federated phenotyping and patient representation learning. arXiv preprint arXiv:1908.05596, 2019.

Erick A Perez Alday, Annie Gu, Amit Shah, Chad Robichaux, An-Kwok Ian Wong, Chengyu Liu, Feifei Liu, Ali Bahrami Rad, Andoni Elola, Salman Seyedi, et al. Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020. medRxiv, 2020.

Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199-1208, 2018.

Geoffrey Hinton. How to represent part-whole hierarchies in a neural network. arXiv preprint arXiv:2102.12627, 2021.

Haolin Wang, Qingpeng Zhang, and Jiahu Yuan. Semantically enhanced medical information retrieval system: a tensor factorization based approach. IEEE Access, 5:7584-7593, 2017.

Harsha Gurulingappa, Luca Toldo, Claudia Schepers, Alexander Bauer, and Gerard Megaro. Semi-supervised information retrieval system for clinical decision support. In Text Retrieval Conference (TREC), 2016.

Isotta Landi, Benjamin S Glicksberg, Hao-Chih Lee, Sarah Cherng, Giulia Landi, Matteo Danieletto, Joel T Dudley, Cesare Furlanello, and Riccardo Miotto. Deep representation learning of electronic health records to unlock patient stratification at scale. arXiv preprint arXiv:2003.06516, 2020.

Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077-4087, 2017.

Jianwei Zheng, Jianming Zhang, Sidy Danioko, Hai Yao, Hangyuan Guo, and Cyril Rakovski. A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. Scientific Data, 7(1):1-8, 2020.

Joseph Y Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, and Erdrin Azemi. Subject-aware contrastive learning for biosignals. arXiv preprint arXiv:2007.04871, 2020.

Josif Grabocka, Nicolas Schilling, Martin Wistuba, and Lars Schmidt-Thieme. Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 392-401, 2014.

Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, and Steven C H Hoi. Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966, 2020a.

Kai Han, Andrea Vedaldi, and Andrew Zisserman. Learning to discover novel visual categories via deep transfer clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 8401-8409, 2019.

Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

Leonard W D'Avolio, Thien M Nguyen, Wildon R Farwell, Yongming Chen, Felicia Fitzmeyer, Owen M Harris, and Louis D Fiore. Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (arc). Journal of the American Medical Informatics Association, 17(4):375-382, 2010.

Li Huang, Andrew L Shea, Huining Qian, Aditya Masurkar, Hao Deng, and Dianbo Liu. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. Journal of Biomedical Informatics, 99:103291, 2019.

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882, 2020.

Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 132-149, 2018.

Morten Morup and Lars Kai Hansen. Archetypal analysis for machine learning and data mining. Neurocomputing, 80:54-63, 2012.

Naveen Sai Madiraju, Seid M Sadat, Dimitry Fisher, and Homa Karimabadi. Deep temporal clustering: Fully unsupervised learning of time-domain features. arXiv preprint arXiv:1802.01059, 2018.

Nils Strodthoff, Patrick Wagner, Tobias Schaeffter, and Wojciech Samek. Deep learning for ecg analysis: Benchmarks and insights from ptb-xl. arXiv preprint arXiv:2004.13701, 2020.

Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Wojciech Samek, and Tobias Schaeffter. PTB-XL, a large publicly available electrocardiography dataset, 2020. URL https://physionet.org/content/ptb-xl/1.0.1/.

Qianli Ma, Jiawei Zheng, Sen Li, and Gary W Cottrell. Learning representations for time series clustering. Advances in Neural Information Processing Systems, 32:3781-3791, 2019.

Qin Zhang, Jia Wu, Peng Zhang, Guodong Long, and Chengqi Zhang. Salient subsequence learning for time series clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9):2193-2207, 2018.

Qiuling Suo, Fenglong Ma, Ye Yuan, Mengdi Huai, Weida Zhong, Aidong Zhang, and Jing Gao. Personalized disease prediction using a cnn-based similarity learning method. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 811-816. IEEE, 2017.

R Rani Saritha, Varghese Paul, and P Ganesh Kumar. Content based image retrieval using deep learning process. Cluster Computing, 22(2):4187-4200, 2019.

Riccardo Miotto, Li Li, Brian A Kidd, and Joel T Dudley. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports, 6(1):1-10, 2016.

Sajad Darabi, Mohammad Kachuee, Shayan Fazeli, and Majid Sarrafzadeh. Taper: Time-aware patient ehr representation. IEEE Journal of Biomedical and Health Informatics, 2020.

European Commission. Exchange of electronic health records across the EU, 2019. URL https://ec.europa.eu/digital-single-market/en/exchange-electronic-health-records-across-eu.

Shraddha Pai and Gary D Bader. Patient similarity networks for precision medicine. Journal of Molecular Biology, 430(18):2924-2938, 2018.

Shraddha Pai, Shirley Hui, Ruth Isserlin, Muhammad A Shah, Hussam Kaka, and Gary D Bader. netDx: Interpretable patient classification using integrated patient similarity networks. Molecular Systems Biology, 15(3), 2019.

Siddharth Biswal, Cao Xiao, Lucas M Glass, Elizabeth Milkovits, and Jimeng Sun. Doctor2vec: Dynamic doctor representation learning for clinical trial recruitment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 557-564, 2020.

Siyuan Qiao, Chenxi Liu, Wei Shen, and Alan L Yuille. Few-shot image recognition by predicting parameters from activations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7229-7238, 2018.

Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4367-4375, 2018.

Steve R Chamberlin, Steven D Bedrick, Aaron M Cohen, Yanshan Wang, Andrew Wen, Sijia Liu, Hongfang Liu, and William Hersh. Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task. medRxiv, pp. 19005280, 2019.

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020.

Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018.

Travis R Goodwin and Sanda M Harabagiu. Learning relevance models for patient cohort retrieval. JAMIA Open, 1(2):265-275, 2018.

Travis R Goodwin and Sanda M Harabagiu. Multi-modal patient cohort identification from eeg report and signal data. In AMIA Annual Symposium Proceedings, volume 2016, page 1794. American Medical Informatics Association, 2016.

Vivek H Murthy, Harlan M Krumholz, and Cary P Gross. Participation in cancer clinical trials: race-, sex-, and age-based disparities. JAMA, 291(22):2720-2726, 2004.

Vivien Sainte Fare Garnot and Loic Landrieu. Metric-guided prototype learning, 2020.

Wei-Yin Ko, Konstantinos C Siontis, Zachi I Attia, Rickey E Carter, Suraj Kapa, Steve R Ommen, Steven J Demuth, Michael J Ackerman, Bernard J Gersh, Adelaide M Arruda-Olson, et al. Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. Journal of the American College of Cardiology, 75(7):722-733, 2020.

William R Hersh and Robert A Greenes. Information retrieval in medicine: state of the art. MD Computing: Computers in Medical Practice, 7(5):302-311, 1990.

Xu Ji, Joao F Henriques, and Andrea Vedaldi. Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9865-9874, 2019.

Yanshan Wang, Andrew Wen, Sijia Liu, William Hersh, Steven Bedrick, and Hongfang Liu. Test collections for electronic health record-based clinical information retrieval. JAMIA Open, 2(3):360-368, 2019.

Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.

Yue Li, Pratheeksha Nair, Xing Han Lu, Zhi Wen, Yuening Wang, Amir Ardalan Kalantari Dehaghi, Yan Miao, Weiqi Liu, Tamas Ordog, Joanna M Biernacka, et al. Inferring multimodal latent topics from electronic health records. Nature Communications, 11(1):1-17, 2020b.

Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371, 2019 (also in International Conference on Learning Representations, 2020).

Yuqi Si and Kirk Roberts. Deep patient representation of clinical notes via multitask learning for mortality prediction. AMIA Summits on Translational Science Proceedings, 2019:779, 2019.

Zachi I Attia, Peter A Noseworthy, Francisco Lopez-Jimenez, Samuel J Asirvatham, Abhishek J Deshmukh, Bernard J Gersh, Rickey E Carter, Xiaoxi Yao, Alejandro A Rabinstein, Brad J Erickson, et al. An artificial intelligence enabled ecg algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. The Lancet, 394(10201):861-867, 2019b.

Zachi I Attia, Suraj Kapa, Francisco Lopez-Jimenez, Paul M McKie, Dorothy J Ladewig, Gaurav Satam, Patricia A Pellikka, Maurice Enriquez-Sarano, Peter A Noseworthy, Thomas M Munger, et al. Screening for cardiac contractile dysfunction using an artificial intelligence enabled electrocardiogram. Nature Medicine, 25(1):70-74, 2019a.

Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, and Fei Wang. Measuring patient similarities via a deep architecture with medical concept embedding. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 749-758. IEEE, 2016.