

Title:
RADON TRANSFORM AND PERSISTENT HOMOLOGY PATIENT BARCODES
Document Type and Number:
WIPO Patent Application WO/2022/182603
Kind Code:
A1
Abstract:
Various examples are provided related to patient codes which may be utilized to store and retrieve compressed data. In one example, a method includes receiving imaging data associated with a patient; generating a barcode or persistence image from the imaging data; extracting feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generating a patient code from the extracted feature information, wherein the feature information is retrievable from the code. The patient code can be a visual code such as, e.g., a QR-Code. In another example, a system includes processing circuitry and a code generation application that causes the processing circuitry to generate a barcode or persistence image from imaging data associated with a patient; extract feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generate a patient code from the extracted feature information.

Inventors:
SENGUPTA PARTHO (US)
YANAMALA NAVEENA (US)
Application Number:
PCT/US2022/017132
Publication Date:
September 01, 2022
Filing Date:
February 21, 2022
Assignee:
WEST VIRGINIA UNIV BOARD OF GOVERNORS ON BEHALF OF WEST VIRGINIA UNIV (US)
International Classes:
G06K9/62
Domestic Patent References:
WO2018069736A12018-04-19
Foreign References:
CN110826633A2020-02-21
US20190139221A12019-05-09
US20170262982A12017-09-14
Other References:
Henry Adams et al.: "Persistence Images: A Stable Vector Representation of Persistent Homology", Journal of Machine Learning Research, 22 July 2015 (2015-07-22), pages 1-35, XP055660586, Retrieved from the Internet
Attorney, Agent or Firm:
RANDY R. SCHOEN et al. (US)
Claims:
CLAIMS

Therefore, at least the following is claimed:

1. A method, comprising: receiving imaging data associated with a patient; generating a barcode or persistence image from the imaging data; extracting feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generating a QR-Code from the extracted feature information, wherein the feature information is retrievable from the QR-Code.

2. The method of claim 1, wherein the QR-Code includes clinical information associated with the patient.

3. The method of claim 1, wherein the imaging data comprises point cloud data used to generate the persistence image.

4. The method of claim 3, wherein the persistence image is generated using persistent homology (PH).

5. The method of claim 4, wherein the PH analyzes the point cloud data and a sublevel set.

6. The method of claim 3, wherein the point cloud data is generated from a scan of the patient.

7. The method of claim 6, wherein the scan is an ultrasound image.

8. The method of claim 1, wherein the barcode is generated from a scan of the patient using a Radon transform.

9. The method of claim 8, wherein the barcode is further based upon information produced using discrete cosine transform (DCT) analysis of content generated by the Radon transform.

10. The method of claim 1, wherein the machine learning comprises supervised machine learning.

11. The method of claim 1, comprising storing the QR-Code in a database.

12. The method of claim 1, comprising transmitting the QR-Code to a remotely located device.

13. A system, comprising: processing circuitry comprising a processor and memory; and a code generation application executable by the processing circuitry, where execution of the code generation application causes the processing circuitry to: generate a barcode or persistence image from imaging data associated with a patient; extract feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generate a patient code from the extracted feature information, wherein the feature information is retrievable from the patient code.

14. The system of claim 13, wherein the patient code is a QR-code.

15. The system of claim 13, wherein the patient code includes clinical information associated with the patient.

16. The system of claim 13, wherein the persistence image is generated using persistent homology (PH).

17. The system of claim 16, wherein the imaging data comprises point cloud data used to generate the persistence image.

18. The system of claim 13, wherein the barcode is generated from a scan of the patient using a Radon transform.

19. The system of claim 13, wherein the code generation application causes the processing circuitry to store the patient code in a database.

20. The system of claim 19, wherein the patient code is transmitted to a remotely located device for storage.

Description:
RADON TRANSFORM AND PERSISTENT HOMOLOGY PATIENT

BARCODES

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to, and the benefit of, co-pending U.S. provisional application entitled "Radon Transform and Persistent Homology Patient Barcodes" having serial no. 63/152,686, filed February 23, 2021, which is hereby incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] This invention was made with government support under agreement # 1920920 awarded by the National Science Foundation. The Government has certain rights in the invention.

BACKGROUND

[0003] The nature of medical data is innately complex temporally and spatially. Enormous amounts of medical information are currently available, and even more is generated daily. Yet it is not feasible for a physician to assimilate all of this information to aid in their medical decision making/diagnosis of a rare disease. To circumvent this cognitive burden, much research has been done in data mining and machine learning algorithms to process such information. However, even machine learning algorithms have their limitations when accepting too many input parameters.

SUMMARY

[0004] Aspects of the present disclosure are related to patient codes which may be utilized to store and retrieve data. The techniques can utilize persistent homology and/or Radon transform to obtain condensed information. In one aspect, among others, a method comprises receiving imaging data associated with a patient; generating a barcode or persistence image from the imaging data; extracting feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generating a patient code from the extracted feature information, wherein the feature information is retrievable from the patient code. The patient code can be a QR-Code or other visual code. In one or more aspects, the patient code or QR-Code can include clinical information associated with the patient. In various aspects, the imaging data can comprise point cloud data used to generate the persistence image. The persistence image can be generated using persistent homology (PH). The PH can analyze the point cloud data and a sublevel set. The point cloud data can be generated from a scan of the patient. The scan can be an ultrasound image. In some aspects, the barcode can be generated from a scan of the patient using a Radon transform. The barcode can be based upon information produced using discrete cosine transform (DCT) analysis of content generated by the Radon transform. The machine learning can comprise supervised machine learning. The method can comprise storing the QR-Code in a database. The method can comprise transmitting the QR-Code to a remotely located device.

[0005] In another aspect, a system comprises processing circuitry comprising a processor and memory; and a code generation application executable by the processing circuitry, where execution of the code generation application causes the processing circuitry to: generate a barcode or persistence image from imaging data associated with a patient; extract feature information regarding the patient from the barcode or persistence image utilizing machine learning; and generate a patient code from the extracted feature information, wherein the feature information is retrievable from the patient code. In one or more aspects, the patient code can be a QR-code or other visual code. The patient code can include clinical information associated with the patient. The persistence image can be generated using persistent homology (PH). The imaging data can comprise point cloud data used to generate the persistence image. In various aspects, the barcode can be generated from a scan of the patient using a Radon transform. The code generation application can cause the processing circuitry to store the patient code in a database. The patient code can be transmitted to a remotely located device for storage.

[0006] Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims. In addition, all optional and preferred features and modifications of the described embodiments are usable in all aspects of the disclosure taught herein. Furthermore, the individual features of the dependent claims, as well as all optional and preferred features and modifications of the described embodiments are combinable and interchangeable with one another.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.

[0008] FIG. 1 illustrates an example of a persistent homology (PH) workflow, in accordance with various embodiments of the present disclosure.

[0009] FIG. 2 illustrates an example of a Radon transform workflow, in accordance with various embodiments of the present disclosure.

[0010] FIG. 3 illustrates an example of a code generation workflow utilizing PH and Radon transform, in accordance with various embodiments of the present disclosure.

[0011] FIGS. 4A-4C illustrate aspects of a proof of concept of the Radon transform and PH workflow of FIG. 3, in accordance with various embodiments of the present disclosure.

[0012] FIGS. 4D-4K illustrate aspects of a use-case scenario of the PH workflow, in accordance with various embodiments of the present disclosure.

[0013] FIG. 5 is a schematic block diagram illustrating an example of a system employed for Radon transform and PH workflow for code generation, in accordance with various embodiments of the present disclosure.

DETAILED DESCRIPTION

[0014] Disclosed herein are various examples related to patient codes which may be utilized to store and retrieve data. The disclosed techniques offer the ability to compress data while allowing recovery of the information. The techniques can utilize persistent homology and/or Radon transform to obtain condensed information about the shape of data. Reference will now be made in detail to the description of the embodiments as illustrated in the drawings, wherein like reference numbers indicate like parts throughout the several views.

[0015] Informative features may be discarded when extracting only global longitudinal strain from speckle tracking echocardiography (STE) strain tracings. Here, a workflow pipeline is proposed to compute persistent homology (PH) of information such as, e.g., STE myocardial deformation curves as a dimension reduction technique and then store this information within a QR-Code. PH is a method of topological data analysis (TDA) that captures information regarding the global and local "shape" of the data being analyzed. The workflow provides a way to simplify the spatially and temporally complex strain curves and store only meaningful features in the QR-Code. This allows for recognition of deformation patterns associated with various pathological conditions. Further, storing this information within a QR-Code can reduce the data intensive footprint of STE, while allowing for its seamless implementation within electronic medical record systems.

[0016] The use of persistent homology can address the issue of poor reproducibility of segmental strain values, by instead emphasizing the shape of the curve in the data analysis. This workflow can also be used to encode other disease patterns within patient individualized QR-Codes. Examples of applications include, but are not limited to, storage of large amounts of data in a compressed format (which can be fed into machine learning pipelines), communication and/or transmission of information (intra- or extra- patient or organization), patient phenotyping, grouping, risk stratification and therapy planning, and biometric evaluation and assessment. The feasibility is demonstrated using a STE strain dataset of chronic constrictive pericarditis, restrictive cardiomyopathy, and control patients. Although the technology is demonstrated with STE data, this pipeline can also be applied to other aspects of clinical datasets. This technique can be utilized to discriminate the rare disease of constrictive pericarditis from restrictive cardiomyopathy and normal patients using echocardiography image analysis.

Myocardial Strain Imaging

[0017] Speckle tracking echocardiography (STE) quantifies regional cardiac function by monitoring myocardial deformation. Over the past two decades, strain imaging has gained in prevalence and shown great clinical utility. STE measures deformation in the longitudinal, circumferential, and radial axes for left ventricular (LV) segments. This can be visualized as multiple regional strain and strain rate traces throughout the cardiac cycle. Global longitudinal strain (GLS), which can be calculated as the average value of segmental longitudinal curves from the same frame, has emerged as a robust measure of LV mechanics that provides prognostic and diagnostic information, and appears to perform better than LV ejection fraction (EF) at predicting major adverse cardiac events and at recognizing certain cases of left ventricular dysfunction before their identification by LVEF.

[0018] The increased integration of GLS has been a significant step forward. However, rich diagnostic information present within strain traces may be missed using only GLS from strain curves. Using the individual segmental strain values can result in limitations such as greater measurement variability, poorer reproducibility, and higher inter-vendor bias compared to those of GLS. The recognition of patterns in the strain curves or relationships between different ventricular segments offers a solution to this issue. For example, a pipeline of principal component analysis to represent the complex spatio-temporal nature of the curves can be applied with machine learning to classify myocardial infarction, or a relative regional strain ratio (a metric of relative longitudinal strain sparing in the apex) can be applied to provide prognostic information in cardiac amyloidosis patients. In this disclosure, persistent homology is utilized as the basis to discover the disease patterns in the myocardial strain tracings. This can facilitate the use of both global topology and local geometry characteristics of the strain tracings.

Persistent Homology

[0019] Persistent homology (PH) is a topological data analysis tool that describes the shape of data by extracting its topological invariants. The mathematical basis of PH is shown in "Computing persistent homology" by Zomorodian and Carlsson (Discrete & Computational Geometry, 33(2), 249-274, 2005) and "Topological persistence and simplification" by Edelsbrunner et al. (IEEE Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 454-463, 2000). The pipelines for computing persistent homology and its application with machine learning are outlined in "Persistent-Homology-based Machine Learning and its Applications - A Survey" by Pun et al. (2018) and "A roadmap for the computation of persistent homology" by Otter et al. (EPJ Data Science, 6(1), 17, 2017).

[0020] Referring to FIG. 1, shown is an example of the PH workflow. Beginning with a point cloud data set (e.g., an echo strain point cloud) at 102, preprocessing can be performed at 104. The initial step of PH is a filtration at 106 to create a series of simplicial complexes for a scale ε. Consider the filtration process 106 for a point cloud data set (within a finite metric space) where a sphere of radius ε is drawn around every point. At each intersection of two spheres, an edge can be drawn between the two points. The filtering of the data builds a simplicial complex space, from which PH quantifies the presence of n-dimensional holes (i.e., 0-dimensional holes are connected components, 1-dimensional holes are circles/loops/tunnels, and 2-dimensional holes are voids). As it is not possible to determine the optimal value for the scale ε, the main principle of PH is to progress through all possible values of ε (0 < ε < ∞) to determine how the homology of these components changes. For each structure (n-dimensional hole), the times of the birth (at what ε it appears) and death (at what ε it disappears) are computed.

[0021] The mode of filtration differs in the three different TDA based techniques applied in this pipeline: PH of point cloud data, sublevel set PH, and phase space reconstructed point clouds of LV regions. In the first PH pipeline, alpha filtration of point cloud data is used because it is faster for the data being analyzed, which is low dimensional (3 dimensions) and in a Euclidean space. The general intuition explained previously holds, as the alpha complex is homotopy equivalent to the Čech complex ["Topological Data Analysis" by Zomorodian (Advances in Applied and Computational Topology, 70, 1-39, 2012)]. The alpha complex is a subcomplex of the Delaunay complex, which is formed as the nerve of the Voronoi diagram. The mathematical background for the alpha filtration was introduced in "Three-dimensional alpha shapes" by Edelsbrunner and Mücke (ACM Transactions on Graphics (TOG), 13(1), 43-72, 1994).
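As an illustrative sketch (not part of the claimed subject matter), the 0-dimensional persistence of a point cloud can be computed with a simple union-find over sorted pairwise distances, since for H0 a Rips-style filtration merges components exactly as single-linkage clustering does (the alpha and Čech filtrations agree on H0 up to a scale convention). The function name and the toy point cloud are hypothetical; practical pipelines would use a library such as GUDHI or Ripser.

```python
import numpy as np

def h0_persistence(points):
    """0-dimensional persistent homology of a point cloud.

    Every point is born at scale 0; two components merge (one dies) at
    the pairwise-distance threshold that first connects them, which is
    single-linkage clustering tracked with a union-find structure.
    Returns (birth, death) pairs; the last component never dies.
    """
    n = len(points)
    # Pairwise Euclidean distances, listed as candidate merge events.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # two components merge at scale r
            parent[ri] = rj
            bars.append((0.0, r))
    bars.append((0.0, float("inf")))       # one component persists forever
    return bars

# Two well-separated pairs of points: two bars die at the within-pair
# distance (1.0), one bar dies at the between-pair gap (9.0).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
bars = h0_persistence(pts)
```

The resulting (birth, death) pairs are exactly the intervals that would be drawn as a 0-dimensional persistence barcode.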

[0022] The second technique of PH is the sublevel set, which can take as input a real valued function or image and extract topology representing the critical points within the data ["Persistent homology - a survey" by Edelsbrunner and Harer (Contemporary Mathematics, 453, 257-282, 2008)]. The lower star filtration used in this process sweeps across different pixel intensity thresholds and at each scale computes the 0-dimensional homology. This process records a birth when a component appears at a local minimum and a death when there is a saddle point (merging of two local minima). Additionally, by inputting a negated version of the data, lower star filtration can be performed to identify local maxima as births.
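A minimal sketch of this sublevel-set idea for a 1-D signal follows; it assumes the standard "elder rule" (at a saddle, the component born at the higher minimum is the one that dies), and the function name and toy signal are hypothetical.

```python
def sublevel_h0(signal):
    """0-dimensional sublevel-set persistence of a 1-D signal.

    Sweeping the threshold upward, a component is born at each local
    minimum and dies when it merges with an older (lower-born)
    component at a saddle.  Returns (birth, death) pairs; the global
    minimum's component never dies.
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    parent = {}
    birth = {}                              # representative -> birth value

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for i in order:                         # activate samples low-to-high
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):            # already-active neighbours
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the younger component dies here
                    young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                    if birth[young] < signal[i]:   # skip zero-length bars
                        bars.append((birth[young], signal[i]))
                    parent[young] = old
    bars.append((min(signal), float("inf")))
    return bars

# Two basins (minima at 0.0 and 1.0) separated by a saddle at height 3.0:
# the shallower basin dies at the saddle, the deeper one persists.
sig = [2.0, 0.0, 3.0, 1.0, 2.5]
bars = sublevel_h0(sig)
```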

[0023] The third technique of PH is the time-delay embedded set, which uses a non-linear dynamic signal processing approach called time-delay embedding for reconstructing a point cloud in the metric/vector space. This process ensures that informative dynamic invariants are preserved such that a topological analysis of the resulting point cloud data yields features that are useful for classification. Such a point cloud requires PH to be computed in a real-valued Euclidean space (e.g., using Vietoris-Rips filtration). Compared to the Čech or the alpha complex, the Vietoris-Rips complex computations are much easier in higher dimensions, as Rips filtration depends only on pairwise distances.

[0024] The output of any form of PH is a series of birth (b) and death (d) values for each component that can be plotted in a persistence diagram. This visual representation is a plot with coordinates (b, d). A diagonal line is drawn through the origin because every component must be born before it dies; generally, points close to the diagonal are not as persistent while those further away are more persistent topological features. The persistence, or "lifetime", of a component is defined as d - b. In other words, the components with a larger lifetime are said to persist in the data for a longer time. While the initial intuition was that the components with the higher persistence are the most significant, it has been shown that the most significant features in classification are not always the ones with the longest lifetimes. Rather, the significance associated with a component's persistence varies based on the dataset.

Persistence Image

[0025] In order to compare the homology of persistence diagrams of different patients, metrics such as the Wasserstein or Bottleneck distance can be calculated. Persistence barcodes and persistence landscapes have also been considered. Persistence images have been introduced as a method to vectorize persistence diagrams for machine learning tasks ["Persistence images: A stable vector representation of persistent homology" by Adams et al. (The Journal of Machine Learning Research, 18(1), 218-252, 2017)]. The persistence image methodology has been selected because of its ability to work with a wider range of machine learning algorithms as well as the potential to select only a few discriminatory pixels to store in the limited space of a QR-Code. The persistence image pipeline converts the persistence diagram to a persistence surface and then to a pixelated persistence image (PI) at 108. First, all points are converted from birth-death to birth-persistence coordinates, i.e., (b, d) to (b, d - b). A weighting function can then be applied, giving more persistent points a higher intensity. A Gaussian probability distribution with a selected variance level is applied at each point. A grid (n by n) is overlaid over the surface to form the PI with a chosen resolution. The pixel intensities of the PI are taken as a feature vector for machine learning and/or feature selection at 110. Vectors from different component dimensions can be concatenated into a larger vector at this stage. The machine learning output can be used to generate the QR-Code at 112.
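The diagram-to-image conversion described above can be sketched as follows. The grid bounds, the linear persistence weighting, and the Gaussian variance are illustrative assumptions, and the function name and toy diagram are hypothetical.

```python
import numpy as np

def persistence_image(diagram, resolution=10, sigma=0.1,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Rasterize a persistence diagram into a resolution x resolution image.

    Each (birth, death) point is mapped to (birth, persistence),
    weighted linearly by its persistence, and spread with a Gaussian;
    the grid of accumulated intensities is the persistence image.
    """
    img = np.zeros((resolution, resolution))
    bs = np.linspace(*birth_range, resolution)   # grid centres, birth axis
    ps = np.linspace(*pers_range, resolution)    # grid centres, persistence axis
    bb, pp = np.meshgrid(bs, ps)
    for b, d in diagram:
        pers = d - b                             # lifetime of the feature
        if not np.isfinite(pers) or pers <= 0:
            continue                             # drop the essential class
        g = np.exp(-((bb - b) ** 2 + (pp - pers) ** 2) / (2 * sigma ** 2))
        img += pers * g                          # persistence-weighted kernel
    return img

diagram = [(0.1, 0.9), (0.2, 0.3)]               # one long bar, one short bar
pi = persistence_image(diagram)
vec = pi.flatten()                               # feature vector for ML
```

Flattening the image yields the fixed-length pixel vector that downstream machine learning consumes, matching the vectorization step at 110.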

Radon Transform

[0026] The Radon transform takes a function defined on the plane to a function defined on the (two-dimensional) space of lines in the plane, whose value at a particular line is equal to the line integral of the function over that line. The Radon transform represents the projection data obtained as the output of a scan and can be used to reconstruct the original density from the projection data. The Radon transform data is often called a sinogram because the Radon transform of an off-center point source is a sinusoid. Consequently, the Radon transform appears graphically as a number of blurred sine waves with different amplitudes and phases.

[0027] Referring to FIG. 2, shown is an example of the Radon workflow. Beginning with a scan (e.g., an ultrasound image, tomography image, or other appropriate imaging scan) at 202, preprocessing can be performed at 204. The Radon transform can be applied at 206 and the Radon transform content from the image can be used to generate the barcode at 208. The Radon transform content can also be further processed using a two-dimensional (2D) discrete cosine transform (DCT) and/or a one-dimensional (1D) DCT at 210, followed by locality sensitive discriminant analysis (LSDA) (local binary pattern analysis) at 212 ["An integrated index for identification of fatty liver disease using radon transform and discrete cosine transform features in ultrasound images" by Acharya et al. (Information Fusion, 31, 43-53, 2016)]. Ranking at 214 and classification at 216 of the processed content can provide data for the generation of the barcode at 208.
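The sinogram and DCT feature steps might be sketched as below, assuming SciPy is available. The naive rotate-and-sum projection stands in for a production Radon implementation, and the toy image and the choice of an 8 x 8 block of low-frequency coefficients are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.fft import dctn

def radon_sinogram(image, angles):
    """Naive Radon transform: for each angle, rotate the image and sum
    along columns, giving one projection per angle (the sinogram)."""
    return np.stack([rotate(image, a, reshape=False, order=1).sum(axis=0)
                     for a in angles], axis=1)

# Toy "scan": a bright off-centre square on a dark background.
image = np.zeros((64, 64))
image[20:28, 40:48] = 1.0

angles = np.arange(0.0, 180.0, 10.0)
sinogram = radon_sinogram(image, angles)

# Low-frequency 2-D DCT coefficients of the sinogram as compact features.
coeffs = dctn(sinogram, norm="ortho")
features = coeffs[:8, :8].flatten()
```

Because the square is off-centre, its projection peak traces a sinusoid across the columns of `sinogram`, which is the sinogram behaviour described in paragraph [0026].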

Workflow Pipeline

[0028] FIG. 3 shows an example of the Radon transform and PH code generation workflow that can be used to produce patient codes. As illustrated, scan image data can be analyzed using the PH and/or Radon transform to extract features that can be applied to machine learning (ML) to generate identifying information used to produce a QR-Code. For proof of concept, the workflow's ability to differentiate between restrictive cardiomyopathy and chronic constrictive pericarditis was evaluated, as this is a complex differential diagnosis with a similar clinical presentation for both conditions. FIGS. 4A-4C illustrate aspects of the proof of concept. As shown in FIG. 4A, the ultrasound scan can be converted into a sinogram using the Radon transform, which can then be converted into a barcode as illustrated in FIG. 2.

[0029] From previously published datasets ["Diagnostic Concordance of Echocardiography and Cardiac Magnetic Resonance-Based Tissue Tracking for Differentiating Constrictive Pericarditis From Restrictive Cardiomyopathy" by Amaki et al. (Cardiovascular Imaging, 7(5), 819-827, 2014) and "Disparate patterns of left ventricular mechanics differentiate constrictive pericarditis from restrictive cardiomyopathy" by Sengupta et al. (JACC Cardiovasc Imaging, 2008 Jan; 1(1):29-38)] of 152 patients evaluated by 2-dimensional speckle tracking echocardiography, longitudinal, radial and circumferential strain traces and strain rates were obtained from the apical 4-chamber and short-axis views. For each patient, 49 left ventricular segments were monitored for longitudinal, radial and circumferential deformation. Using these values as raw data, a cubic spline was applied to each segment as a temporal normalization technique over one cardiac cycle to produce its segmental strain/strain-rate curve, with number of time intervals n = 49. The splined data was further transformed in three separate manners for each version of PH (as needed). As shown in FIG. 4C, the point cloud data can be converted into a QR-Code as illustrated in FIG. 3.
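The temporal normalization step might be sketched as follows, assuming SciPy's CubicSpline; the function name and the irregularly sampled strain trace are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def normalize_trace(times, values, n=49):
    """Resample one segmental strain trace onto n evenly spaced points
    spanning a single cardiac cycle (temporal normalization)."""
    t = np.asarray(times, dtype=float)
    t = (t - t[0]) / (t[-1] - t[0])          # map the cycle onto [0, 1]
    spline = CubicSpline(t, values)
    return spline(np.linspace(0.0, 1.0, n))

# Hypothetical longitudinal strain trace sampled at irregular frame times
# over one cardiac cycle (values in percent shortening).
times = [0.0, 0.1, 0.25, 0.4, 0.6, 0.8, 1.0]
strain = [0.0, -5.0, -12.0, -18.0, -15.0, -6.0, 0.0]
curve = normalize_trace(times, strain)
```

Applying this per segment yields the uniformly sampled curves that the three PH variants below take as input.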

[0030] For point cloud PH, the splined data can be converted into (X, Y, Z) coordinates, with X = LV segment identity, Y = time in cardiac cycle, and Z = strain value. The alpha filtration was used to compute the PH of these strain curves for dimensions H0, H1, and H2.

[0031] For sublevel set PH, the splined data can be converted to a patient contour plot with time in the cardiac cycle as the x-axis and left ventricular segment as the y-axis. In this manner the image can be thought of as a real-valued function where each pixel location corresponds to the strain value at a particular location on the ventricle at a given time. Lower star filtration was used to compute sublevel set PH for dimension H0 of the strain contour plots.

[0032] For time-delay embedded reconstruction set PH, the splined data was converted by averaging over 16/17 LV segments to represent the lateral wall, apex and interventricular septum of the left ventricle. The Vietoris-Rips (Rips complex) filtration was used to compute the PH for dimension H0 of the strain and strain rate phase-space reconstructed LV regional signal curves.
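The time-delay embedding that produces the phase-space reconstructed point cloud can be sketched as below; the embedding dimension, lag, and test signal are illustrative choices, not values from the disclosure.

```python
import numpy as np

def delay_embed(signal, dim=3, tau=2):
    """Time-delay (Takens) embedding: map a 1-D signal into a point
    cloud in R^dim using lagged copies of itself."""
    x = np.asarray(signal, dtype=float)
    n = len(x) - (dim - 1) * tau             # number of embedded points
    return np.stack([x[i: i + n] for i in range(0, dim * tau, tau)], axis=1)

# A periodic signal embeds to a loop-like point cloud, whose H1
# persistence then reflects the signal's cyclic dynamics.
t = np.linspace(0, 4 * np.pi, 100)
cloud = delay_embed(np.sin(t), dim=3, tau=5)
```

The resulting point cloud lives in a real-valued Euclidean space, so its persistence can be computed with the Vietoris-Rips filtration as described above.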

[0033] The output of all input data types to PH is a set of birth and death coordinates. For point cloud H0, H1, and H2, sublevel set H0, and phase-space-reconstructed LV region set H0, these coordinates are used to make a persistence image for each dimension. The intensity of each pixel in the persistence image is concatenated to form a vector of length 50 corresponding to the 50 pixels in the image. The persistence pixel vectors for any dimension or for the same dimension across different strain and strain rates can be concatenated to form a larger vector of combined features. Different combinations of feature vectors are finally evaluated through PCA (Orange) and supervised machine learning (BigML/Orange). FIG. 4C illustrates supervised machine learning to extract features for identifying signatures. The selected features which differentiate RCM vs. CP, RCM vs. NL, and CP vs. NL myocardial pathological patterns can then be encoded within a patient's individualized QR-Code.
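The PCA evaluation of concatenated persistence pixel vectors can be sketched with a plain SVD-based projection; the random feature matrix stands in for real patient data, and the dimensions are illustrative.

```python
import numpy as np

def pca_project(features, k=2):
    """Project feature vectors onto their first k principal components,
    computed via SVD of the centred feature matrix."""
    X = features - features.mean(axis=0)     # centre each feature column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                      # scores in the top-k subspace

# Hypothetical stacked feature vectors: 6 patients x 150 features
# (e.g., three concatenated 50-pixel persistence image vectors each).
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 150))
projected = pca_project(features, k=2)
```

The low-dimensional scores can then be inspected for class separation (e.g., RCM vs. CP) or passed on to a supervised classifier.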

Use-Case Scenario

[0034] Next, a use-case scenario of the PH workflow that harnesses both deformation patterns and global topological information of the left ventricle to characterize cardiovascular diseases is discussed with respect to FIGS. 4D-4K. Specifically, the ability of the pipeline to condense large, high-dimensional cardiac imaging data while preserving informative features capable of distinguishing between rare patient populations, such as those with constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM), is showcased. These two diseases were investigated in this proof of concept because they share a similar clinical presentation, posing a difficult challenge for physicians to diagnose correctly. The TDA technique can also be scaled to larger and more diverse disease presentations, with the ability to store the collected information from topological data into a readable motif for clinical utility. By incorporating persistent homology into the assessment of strain waveforms, both the global topological structure and the local geometry of the left ventricle deformation patterns can be captured. This provides a more comprehensive means for identifying cardiovascular anomalies.

[0035] A persistent homology workflow based on topological data analysis techniques is proposed to identify disease patterns from functional physiologic signals. FIG. 4D illustrates an example of the proposed workflow. A physiological signal can be pre-processed into a suitable input for persistent homology feature extraction. The resulting topological features can be represented visually as an individualized patient motif produced by concatenating multiple persistence images. This motif can be directly interpreted by a physician or used to develop machine learning models. As shown in FIG. 4D, the pipeline can include:

• Data Preprocessing - data is temporally and spatially normalized to an n-dimensional point cloud.

• TDA filtration - simplicial complexes are built upon the point cloud, from which topological invariants are extracted.

• Persistent Image - birth, and death features are transformed into a persistent image to develop feature vectors.

• Patient-Specific Motif - features are stored in a visual representation that can be directly interpreted by physicians/scientists or serve as input for machine learning.

• Predictive Model Generation - feature selection and classification of patients is performed using machine learning.

[0036] TDA Filtration. Consider a point cloud data set within a finite metric space: a metric on a set M is a function d: M × M → [0, ∞), where [0, ∞) denotes the set of non-negative real numbers. For any points x, y, z in M, d(x, y) = d(y, x), d(x, x) = 0, and d(x, z) ≤ d(x, y) + d(y, z).

[0037] The initial step of PH is a filtration to create a series of simplicial complexes for a scale r, where a sphere of radius r is drawn around every point. At each intersection of two spheres, an edge is drawn between the two points. The filtering of data builds a simplicial complex space, from which PH quantifies the presence of n-dimensional holes, i.e., 0-dimensional holes are connected components, 1-dimensional holes are circles/loops/tunnels, and 2-dimensional holes are voids. For a data set X ⊂ R^n and scale r ≥ 0, the Čech simplicial complex Čech(X; r) has:

• vertex set X, and

• a finite simplex (x_0, x_1, ..., x_k) whenever the intersection of the balls B(x_j, r), for j = 0, ..., k, is non-empty.

[0038] As it is not possible to determine the optimal value for the scale r, the main principle of PH is to progress through all possible values of r (0 < r < ∞) to determine how the homology of these components changes. For a given filtration of a simplicial complex, the output (birth, death) barcode intervals and representatives for each topological feature are determined; i.e., for each structure (n-dimensional hole), the times of the birth (at what r it appears) and death (at what r it disappears) are computed.

[0039] FIG. 4E illustrates an example of persistent homology. For a given proximity parameter ε, a circle with radius r = ε can be drawn around each data point. The intersections of these circles guide the construction of a set of simplicial complexes. All possible values of ε can be tested to detect variations in topology at different scales. The appearance and disappearance of connected components and open loops can be measured by H0 and H1, respectively, and subsequently visualized as a persistence barcode.

[0040] The horizontal axis shows the filtration steps. Each D-dimensional topological feature in the filtration is represented by a bar that starts at the filtration step at which the feature is born and ends at the filtration step at which it dies. Thus, in the 0-dimensional barcode, each bar corresponds to a connected component, and the length of a bar indicates how long a particular component remains disconnected from other components.

[0041] Persistence Image. To compare the homology of persistence diagrams of different patients, metrics, such as the Wasserstein or bottleneck distance, can be calculated. Persistence barcodes and persistence landscapes have been developed to represent the persistent topology within datasets. Persistence images have been introduced to vectorize persistence diagrams for machine learning tasks. The persistence image methodology was selected because of its ability to work with a broader range of machine learning algorithms and the potential to convert its feature vectors into a visual signature. The persistence image (PI) pipeline converts the birth-death points to birth-persistence coordinates, i.e., (b, d) to (b, d − b). A weighting function can be applied, giving more persistent points a higher intensity. A Gaussian probability distribution with a selected variance can then be applied at each point, and an n × n grid can be overlaid over the surface to form the PI at the chosen resolution. The pixel intensities of the PI can be taken as a feature vector for machine learning and feature selection. Vectors from different component dimensions can be concatenated into a disease pattern motif for both visualization and storage.
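A from-scratch sketch of the persistence image pipeline just described, under illustrative resolution and variance choices (this is not the persim implementation the study used):

```python
# Persistence image sketch: (birth, death) -> (birth, persistence), a linear
# weighting by persistence, and a Gaussian placed at each point, sampled on
# an n x n grid. Infinite-death points are dropped before vectorization.
import numpy as np

def persistence_image(diagram, n=20, variance=0.01, bounds=None):
    pts = np.array([(b, d - b) for b, d in diagram if np.isfinite(d)])
    if bounds is None:
        lo, hi = pts.min(0), pts.max(0)   # auto-select image bounds
    else:
        lo, hi = np.array(bounds[0]), np.array(bounds[1])
    xs = np.linspace(lo[0], hi[0], n)
    ys = np.linspace(lo[1], hi[1], n)
    gx, gy = np.meshgrid(xs, ys)
    img = np.zeros((n, n))
    max_pers = pts[:, 1].max()
    for b, p in pts:
        w = p / max_pers  # linear weight: more persistent points are brighter
        img += w * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * variance))
    return img  # flatten with img.ravel() to obtain the feature vector

diagram = [(0.0, 1.0), (0.0, 0.2), (0.1, 0.3), (0.0, np.inf)]
pi = persistence_image(diagram, n=10, variance=0.05)
print(pi.shape)  # (10, 10)
```

Flattening the image gives the pixel-intensity feature vector used downstream; concatenating the vectors from several measurements yields the motif described in the text.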

[0042] A proof of concept of the workflow will now be discussed. A merged cohort from two previously published datasets was utilized, with a total of 51 constrictive pericarditis (CP), 47 restrictive cardiomyopathy (RCM), and 53 control patients without structural heart failure (normal). Both studies received the required institutional review board approval.

[0043] Speckle Tracking Echocardiography. Grayscale images from the apical 4-chamber (4ch) and midventricular parasternal short-axis (mid) views were evaluated with 2-dimensional speckle-tracking echocardiography (STE). In the STE image analysis, the left ventricle was divided into 49 and 48 myocardium segments for the 4ch and short-axis views, respectively. Over one cardiac cycle, various features, including strain, strain rate, and velocity, were measured at each spatial location corresponding to cardiac tissue motion. The measurements were stored in two text files, one for the 4ch view and one for the mid view, which serve as the raw data for this proof-of-concept study.

[0044] FIG. 4F illustrates an overview of the use-case scenario. Three regions were defined in each of the mid short-axis (1A) and apical four-chamber (1B) views. The average regional curves (1C) for each strain parameter were calculated and transformed using phase space reconstruction (1D). This served as the input for persistent homology filtration (2A). The resulting birth and death coordinates for dimension 0 were converted to a vector form for machine learning using the persistence image methodology (2B).

[0045] Pre-processing. From the 4ch STE analysis, longitudinal strain (LS), longitudinal strain rate (LSR), radial strain (RS), and radial strain rate (RSR) were obtained. Due to the different image frame times and heart rates, each patient had a varying number of time points over one cardiac cycle from which measurements were recorded. Thus, to compare patient measurements at varying times within one cardiac cycle, cubic spline interpolation was used to compute values of the tracings at 50 time points over one cycle; this resulted in a 49 by 50 array. Subsequently, the 49 left ventricular segments were grouped into three regions, the lateral, apical, and septal wall regions, by taking 16-17 adjacent segments. The mean of the strain values was taken within each region to produce three average strain curves. A similar procedure produced anterior wall, septal wall, and posterior wall average curves of circumferential strain (CS) and circumferential strain rate (CSR) from the mid-view STE analysis. To minimize the effects of cardiac ejection and relaxation occurring at different time points within cardiac cycles of different patients, the resulting 18 strain tracings were converted to a state space point cloud using phase space reconstruction with embedding dimension d = 3 and time delay t = 2.
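The pre-processing steps can be sketched as follows, with a synthetic strain tracing standing in for real STE data; the function names are illustrative:

```python
# Sketch of the pre-processing step: resample each strain tracing to 50 time
# points with a cubic spline, then lift it to a 3-D point cloud by delay
# embedding (embedding dimension d = 3, time delay t = 2, as in the text).
import numpy as np
from scipy.interpolate import CubicSpline

def resample(times, values, n=50):
    """Cubic-spline interpolation onto n evenly spaced points of one cycle."""
    cs = CubicSpline(times, values)
    t_new = np.linspace(times[0], times[-1], n)
    return cs(t_new)

def delay_embed(series, dim=3, tau=2):
    """Phase space reconstruction: rows are (x[i], x[i+tau], x[i+2*tau])."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(dim)])

t = np.linspace(0, 1, 37)                 # frame count varies per patient
strain = -20 * np.sin(np.pi * t)          # synthetic strain-like curve
curve = resample(t, strain, n=50)         # 50 samples over one cycle
cloud = delay_embed(curve, dim=3, tau=2)  # point cloud for PH filtration
print(cloud.shape)  # (46, 3)
```

With 50 samples, d = 3, and t = 2, the reconstruction yields a 46-point cloud in three dimensions, which then serves as input to the filtration.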

[0046] Persistent Homology (PH). This experiment was performed in Python using the TDA libraries ripser (v0.6.0) and persim (v0.2.0), which are both freely available. To analyze the point clouds, a simplicial complex filtration was built on the data points using ripser. The PH of the filtration was extracted as birth and death values for dimension 0. To utilize this information in downstream machine learning tasks, a linear persistence image was created with a pixel resolution of [1, 50] and a variance of 0.005; the image bounds were automatically selected by the persim algorithm. The 50 pixel intensities resulting from each strain and wall region combination were concatenated to generate a patient-specific motif.
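Assuming each of the 18 strain/wall-region combinations yields a 1 by 50 persistence image vector, the concatenation into a patient-specific motif can be sketched as follows (the vectors are random placeholders and the measurement labels are hypothetical):

```python
# Assembling a patient-specific motif: each of the 18 strain / wall-region
# combinations contributes a 1 x 50 persistence image vector; stacking them
# gives an 18 x 50 motif that can be displayed as a heatmap or flattened to
# a 900-element feature vector for machine learning.
import numpy as np

rng = np.random.default_rng(0)
measurements = [f"m{i}" for i in range(18)]              # hypothetical labels
pi_vectors = {m: rng.random(50) for m in measurements}   # one 1x50 PI each

motif = np.vstack([pi_vectors[m] for m in measurements])  # 18 x 50 motif
features = motif.ravel()                                  # 900-feature vector
print(motif.shape, features.shape)  # (18, 50) (900,)
```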

[0047] Patient-Specific Motif. The workflow produced a visual representation indicative of the initial input that can be interpreted directly by physicians/scientists while also being capable of feeding into downstream machine learning tasks. The patient-specific motifs that are generated showcase the general trends of the disease conditions while still maintaining individual patient characteristics, allowing the patients to be followed up over time and monitored for cardiac functions changes via their unique visual signature.

[0048] TDA Evaluation Using Machine Learning Algorithms. All machine-learning modeling and statistical analyses were performed using MedCalc for Windows, version 19.4.1 (MedCalc Software, Ostend, Belgium), RStudio version 3.1.3 (Vienna, Austria), and Orange, an open-source, cross-platform data mining and machine learning platform. A p-value < 0.05 was considered statistically significant.

[0049] As an initial step, persistent homology data from the various strains and strain rates were preprocessed to remove columns with zero variance. The dataset was then randomly split into training (80%) and test (20%) sets using replicable deterministic sampling to permit training of the machine learning algorithms without incurring an overfitting problem. Feature selection was also performed with a random forest-based approach using the Boruta algorithm in the R statistical environment to capture features that are important in predicting a target variable. Using the persistent homology data features retained after applying the Boruta algorithm, three different binary classifiers and a multiclass classifier model were developed using the training dataset.
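A minimal sketch of this step, assuming a 151-patient by 900-feature matrix (51 + 47 + 53 patients): zero-variance columns are dropped and a seeded generator provides a replicable deterministic 80/20 split. The data here are random placeholders.

```python
# Preprocessing sketch: remove zero-variance columns, then split 80/20 with
# a fixed seed so the split is deterministic and replicable.
import numpy as np

rng = np.random.default_rng(42)              # fixed seed -> replicable split
X = rng.random((151, 900))                   # 151 patients x 900 PH features
X[:, 5] = 1.0                                # simulate a zero-variance column
y = rng.integers(0, 3, 151)                  # CP / RCM / normal labels

keep = X.var(axis=0) > 0                     # drop zero-variance features
X = X[:, keep]

idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)  # (120, 899) (31, 899)
```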

[0050] In assessing both binary (one group vs. another group) and multiclass (one group vs. the other two groups) classifiers, performance was evaluated using the best performing algorithm under cross-validation on the training dataset, e.g., ensemble-based (i.e., random forest), neural networks, logistic regression, naive Bayes, decision trees, support vector machines, etc. The developed models were subsequently assessed using the unseen internal validation dataset. Sensitivity and specificity were calculated as true positive / (true positive + false negative) and true negative / (true negative + false positive), respectively. Receiver operating characteristic (ROC) curves were drawn by plotting sensitivity against 1 − specificity at multiple thresholds. The area under the curve (AUC) was calculated as the area under the ROC curve and tested for statistical significance. As a standard for comparison, GLS was evaluated using logistic regression. Default parameters of the various supervised learning methods were used unless otherwise noted.

Results

[0051] Average Strain Pattern Motifs. Referring now to FIG. 4G, shown are the average strain pattern motifs for each cardiac condition. The output visual signature of the persistent homology workflow was a heatmap-like motif, with the x-axis corresponding to strain or strain rate in the longitudinal, radial, and circumferential directions; the y-axis corresponding to the persistence pixel position in the resulting 1 by 50 vector for each combination of wall region and strain measurement; and the color corresponding to the pixel intensity calculated through persistence image vectorization. The average strain motifs for constrictive pericarditis (A), restrictive cardiomyopathy (B), and normal/control patients (C) are shown to demonstrate group-defining patterns. CP = constrictive pericarditis, RCM = restrictive cardiomyopathy, LS = longitudinal strain, LSR = longitudinal strain rate, RS = radial strain, RSR = radial strain rate, CS = circumferential strain, CSR = circumferential strain rate.

[0052] From visual inspection of the cardiac condition average strain motifs, general trends can be identified, e.g., RCM patients generally have higher intensity values limited to lower persistence pixels while having lower intensity values at higher persistence pixels. This indicates, in general, a restrictive pattern within these patients because it suggests that a fully connected component in the H0 dimension is formed at a lower scale parameter. In contrast, for both the CP and normal groups, the average motif showed a much wider spread in their strain persistence pixel intensities, indicating more spread in these patients' data points than the general constraint seen in RCM.

[0053] Feature Selection. While the number of features was considerably reduced from 14700 to 900 (18 by 50), the data still exhibited a high feature-to-sample ratio. To avoid an overfitting problem, Boruta feature selection was performed using the R (v4.0.3) statistical suite, which applies a random forest algorithm to determine meaningful features to retain.
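The R Boruta package itself is not reproduced here, but its shadow-feature idea can be sketched: shuffled copies of the features set a noise baseline, and a real feature is retained only if its importance beats the best shadow feature. In this simplified illustration, absolute correlation with the label stands in for random-forest importance.

```python
# Much-simplified, Boruta-inspired feature selection (NOT the R Boruta
# algorithm): compare each real feature's importance against the maximum
# importance attained by permuted "shadow" copies of the features.
import numpy as np

def shadow_select(X, y, rng):
    shadows = rng.permuted(X, axis=0)          # column-wise shuffled copies
    def importance(M):
        yc = y - y.mean()
        Mc = M - M.mean(0)
        denom = np.sqrt((Mc ** 2).sum(0)) * np.sqrt((yc ** 2).sum()) + 1e-12
        return np.abs(Mc.T @ yc) / denom       # |corr(feature, label)|
    threshold = importance(shadows).max()       # noise baseline
    return np.where(importance(X) > threshold)[0]

rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, n).astype(float)
X = rng.normal(size=(n, 10))
X[:, 0] += 2 * y                               # make feature 0 informative
kept = shadow_select(X, y, rng)
print(kept)  # feature 0 should be among the retained indices
```

Real Boruta iterates this comparison many times with random-forest importances and a statistical test; the single pass above only conveys the shadow-baseline idea.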

These selected features were used to develop predictive models distinguishing between CP, RCM, and normal patients.

[0054] Machine Learning Classifiers. To determine if the features extracted through the pipeline helped distinguish the cardiac conditions, three binary class classifiers were developed for CP vs. RCM, CP vs. normal, and RCM vs. normal. Next, the data were split into an 80% training and 20% testing set using replicable deterministic sampling. Finally, the performance of these models was compared with a baseline performance achieved by logistic regression models using average peak longitudinal strain from the 4Ch view. This peak value approximates the global longitudinal strain that clinicians typically extract from cardiac strain imaging data.

[0055] FIG. 4H illustrates an example of binary classifier receiver operating characteristic curves. Across all binary class distinctions, the persistent homology workflow outperformed or matched the performance metrics of the GLS model for RCM vs. CP (A), RCM vs. NL (B), and CP vs. NL (C). CP = constrictive pericarditis, RCM = restrictive cardiomyopathy, NL = normal, GLS = global longitudinal strain, TDA = topological data analysis.
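The sensitivity, specificity, and AUC definitions given above can be computed from scratch; the labels and scores below are a small illustrative example, and the rank-based AUC formula ignores ties for brevity.

```python
# Evaluation metrics as defined in the text: sensitivity = TP / (TP + FN),
# specificity = TN / (TN + FP), and AUC via the rank-based (Mann-Whitney)
# formulation of the area under the ROC curve.
import numpy as np

def sens_spec(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    # Equivalent to sweeping all thresholds of the ROC curve (no ties).
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = np.array([1, 1, 1, 0, 0, 0])
s = np.array([0.9, 0.8, 0.3, 0.4, 0.2, 0.1])
sn, sp = sens_spec(y, (s >= 0.5).astype(int))
print(sn, sp, roc_auc(y, s))
```

At threshold 0.5 this example gives a sensitivity of 2/3, a specificity of 1, and an AUC of 8/9 (eight of the nine positive-negative score pairs are correctly ordered).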

[0056] The CP vs normal tree classifier demonstrated the greatest improvement compared to the GLS model (PH AUC = 0.94; GLS AUC = 0.64; p = 0.02). The RCM vs CP logistic regression classifier showed an improvement that was not statistically significant (PH AUC = 0.99; GLS AUC = 0.78; p = 0.058). The RCM vs normal random forest classifier maintained the efficacy of the GLS model (PH AUC = 0.95; GLS AUC = 0.94; p = 0.88). A multi-class classifier was created to discriminate between all conditions; the AUC, sensitivity (Sn), and specificity (Sp), averaged across all classes, were improved in comparison to the baseline model. The PH model achieved AUC = 0.87 (Sn = 71% and Sp = 83%) whereas the GLS model achieved AUC = 0.72 (Sn = 53% and Sp = 69%).

[0057] Interpretable Artificial Intelligence. The interpretable artificial intelligence results are shown as Shapley additive explanation plots indicating the top ten features integral in distinguishing each class from the others. FIG. 4I illustrates the Shapley additive explanations for the multi-class model. The Shapley plot presents the top ten important features responsible for the multi-class machine learning model to output its predictions that discriminated each cardiac condition from the other two, i.e., CP from RCM and Normal (A), RCM from CP and Normal (B), and Normal from CP and RCM (C). The features identified from this interpretable artificial intelligence tool can correspond to specific regions in the patient specific motif that contribute to disease stratification visually. CP = constrictive pericarditis, RCM = restrictive cardiomyopathy, NL = normal, LS = longitudinal strain, LSR = longitudinal strain rate, RS = radial strain, RSR = radial strain rate, CS = circumferential strain, CSR = circumferential strain rate.

[0058] Thus, a combination of feature trends is responsible for the model outputting a particular prediction. Moreover, these results allow better comprehension of the average strain motifs produced, as illustrated in FIG. 4G. As an example, in the Shapley plot for RCM vs. others discrimination, Septal Longitudinal Strain pixel 19 is observed as a top informative feature; when inspecting the average strain pattern motifs, RCM patients have low intensities in this pixel region compared to CP and normal patients who, on average, have higher intensities at this pixel. Additionally, it can be seen that very few RCM patients showcased any intensity above this pixel.

[0059] To understand this pattern, the original phase space reconstruction point clouds for septal longitudinal strain can be consulted. For convenience, a few example patients from each disease group are depicted in FIG. 4J, which shows the original phase space reconstruction point clouds for the septal longitudinal strain analysis. Representative point clouds are provided for the CP, RCM, and Normal groups. CP = constrictive pericarditis, RCM = restrictive cardiomyopathy. It can be observed that multiple RCM patients have a much tighter trajectory of longitudinal strain, while CP and normal patients tend to have much broader loop patterns. As a result, the persistent homology workflow detects that in RCM patients a fully formed connected component in dimension 0 forms at lower radius values; on the other hand, CP and normal patients have separate components that persist to the larger radius values required to create a connected simplicial complex around the data points. In other words, RCM patients have a restrained septal longitudinal strain compared to CP and normal patients. A similar observation was identified for CP patients, who do not exhibit much activity in Wall 1 circumferential strain above pixel 13, indicating a reduced circumferential strain in this region compared to RCM and normal patients.

[0060] A slightly different pattern is discovered regarding the radial apical strain rate in CP patients. While the CP patients tend to exhibit low values for pixel 3 of this measurement, they have intensities present at the higher pixel values. This indicates that most CP patients possessed a broader trajectory in their phase space reconstruction point clouds. This can be seen in FIG. 4K, which shows the original phase space reconstruction point clouds for the apical radial strain analysis. Representative point clouds are provided for the CP, RCM, and Normal groups. CP = constrictive pericarditis, RCM = restrictive cardiomyopathy.
This may suggest one of the compensatory mechanisms present in CP patients is an increase in apical radial strain rate. The other informative features can also be confirmed and investigated by referring to the original phase space reconstruction point clouds.

[0061] In review, topological data extraction of segmental strain analysis has been harnessed to develop a better stratification tool for cardiovascular diseases. Through vectorization of traditionally linearized data, a more holistic assessment of left ventricular function can be obtained through echocardiography alone. A workflow was developed for the use-case model that can accurately stratify uncommon cardiovascular diseases in a small patient cohort. Additionally, the structural and functional data can be represented as a persistent image that can be displayed as a motif, allowing for clinical assessment of the processed data. The use of topological data analysis (TDA) presents significant advantages for augmenting current diagnostic tests.

[0062] The current study leverages the more sophisticated waveforms derived from segmental strain analysis. Global and local structural deformations of the cardiac myocardium were captured with persistent homology, enabling the prediction of the presence of constrictive pericarditis (CP) (AUC: 0.94) and restrictive cardiomyopathy (RCM) (AUC: 0.95) relative to normal patients. In this small cohort, the ability of the TDA model to make accurate predictions of rare disease presentations highlights its use for cardiovascular applications. In diseases with low prevalence, this presents an obvious advantage over traditional approaches that may require a specific threshold of cases before allowing appropriate stratification. Another advantage is that the application of TDA to echocardiographic assessments, such as strain analysis, reduces the need for further imaging and invasive procedures while providing no risk of radiation exposure to the patient.

[0063] Strain analysis can be segmentally divided into anatomically unique locations that comprise the entirety of the left ventricular myocardium. The 48 (short-axis view) and 49 (apical four-chamber view) segmental strain parameters were combined into three functional groupings that included the anterior septal, inferior septal, and lateral wall in short axis and the lateral wall, apex, and interventricular septum in apical four chamber view. Analysis of segmental strain waveforms as aggregates instead of individual segments can remove the stochastic nature that analysis of each separate segment would precipitate. Instead, grouping by functional domains allows for averaging curves through a more physiologically relevant manner, specifically regarding the contractile nature and ultrastructural properties of cardiomyocytes within the myocardium.

[0064] The current application of the use-case scenario highlights the ability of TDA, and more specifically persistent homology, to correctly stratify unique cardiovascular anomalies from segmental strain analysis. Importantly, this approach can be applied both to small sample sizes as well as generalized to larger, diverse populations. The clinical usefulness of this method is further indicated through its ability to create a patient-specific motif that visually defines the structural/functional information derived from TDA, which bridges the gap between computer-driven data discovery and everyday interpretability to achieve clinical efficacy.

[0065] With reference to FIG. 5, shown is a schematic block diagram of a computing device 300 that can be utilized to generate codes (e.g., QR-Codes for patients) using the described techniques. In some embodiments, among others, the computing device 300 may represent a mobile device (e.g., a smartphone, tablet, computer, etc.) or other processing device. Each computing device 300 includes processing circuitry comprising at least one processor circuit, for example, having a processor 303 and a memory 306, both of which are coupled to a local interface 309. To this end, each computing device 300 may comprise, for example, at least one server computer or like device. The local interface 309 may comprise, for example, a data bus with an accompanying address/control bus or other bus structure as can be appreciated.

[0066] In some embodiments, the computing device 300 can include one or more network interfaces 310. The network interface 310 may comprise, for example, a wireless transmitter, a wireless transceiver, and a wireless receiver. As discussed above, the network interface 310 can communicate to a remote computing device using a Bluetooth protocol. As one skilled in the art can appreciate, other wireless protocols may be used in the various embodiments of the present disclosure.

[0067] Stored in the memory 306 are both data and several components that are executable by the processor 303. In particular, stored in the memory 306 and executable by the processor 303 are code generation program 315, application program 318, and potentially other applications. Also stored in the memory 306 may be a data store 312 and other data. In addition, an operating system may be stored in the memory 306 and executable by the processor 303.

[0068] It is understood that there may be other applications that are stored in the memory 306 and are executable by the processor 303 as can be appreciated. Where any component discussed herein is implemented in the form of software, any one of a number of programming languages may be employed such as, for example, C, C++, C#, Objective C, Java®, JavaScript®, Perl, PHP, Visual Basic®, Python®, Ruby, Flash®, or other programming languages.

[0069] A number of software components are stored in the memory 306 and are executable by the processor 303. In this respect, the term "executable" means a program file that is in a form that can ultimately be run by the processor 303. Examples of executable programs may be, for example, a compiled program that can be translated into machine code in a format that can be loaded into a random access portion of the memory 306 and run by the processor 303, source code that may be expressed in proper format such as object code that is capable of being loaded into a random access portion of the memory 306 and executed by the processor 303, or source code that may be interpreted by another executable program to generate instructions in a random access portion of the memory 306 to be executed by the processor 303, etc. An executable program may be stored in any portion or component of the memory 306 including, for example, random access memory (RAM), read-only memory (ROM), hard drive, solid-state drive, USB flash drive, memory card, optical disc such as compact disc (CD) or digital versatile disc (DVD), floppy disk, magnetic tape, or other memory components.

[0070] The memory 306 is defined herein as including both volatile and nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power. Thus, the memory 306 may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, or a combination of any two or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other like memory device.

[0071] Also, the processor 303 may represent multiple processors 303 and/or multiple processor cores and the memory 306 may represent multiple memories 306 that operate in parallel processing circuits, respectively. In such a case, the local interface 309 may be an appropriate network that facilitates communication between any two of the multiple processors 303, between any processor 303 and any of the memories 306, or between any two of the memories 306, etc. The local interface 309 may comprise additional systems designed to coordinate this communication, including, for example, performing load balancing. The processor 303 may be of electrical or of some other available construction.

[0072] Although the code generation program 315 and the application program 318, and other various systems described herein may be embodied in software or code executed by general purpose hardware as discussed above, as an alternative the same may also be embodied in dedicated hardware or a combination of software/general purpose hardware and dedicated hardware. If embodied in dedicated hardware, each can be implemented as a circuit or state machine that employs any one of or a combination of a number of technologies. These technologies may include, but are not limited to, discrete logic circuits having logic gates for implementing various logic functions upon an application of one or more data signals, application specific integrated circuits (ASICs) having appropriate logic gates, field-programmable gate arrays (FPGAs), or other components, etc. Such technologies are generally well known by those skilled in the art and, consequently, are not described in detail herein.

[0073] Also, any logic or application described herein, including the code generation program 315 and the application program 318, that comprises software or code can be embodied in any non-transitory computer-readable medium for use by or in connection with an instruction execution system such as, for example, a processor 303 in a computer system or other system. In this sense, the logic may comprise, for example, statements including instructions and declarations that can be fetched from the computer-readable medium and executed by the instruction execution system. In the context of the present disclosure, a "computer-readable medium" can be any medium that can contain, store, or maintain the logic or application described herein for use by or in connection with the instruction execution system.

[0074] The computer-readable medium can comprise any one of many physical media such as, for example, magnetic, optical, or semiconductor media. More specific examples of a suitable computer-readable medium would include, but are not limited to, magnetic tapes, magnetic floppy diskettes, magnetic hard drives, memory cards, solid-state drives, USB flash drives, or optical discs. Also, the computer-readable medium may be a random access memory (RAM) including, for example, static random access memory (SRAM) and dynamic random access memory (DRAM), or magnetic random access memory (MRAM). In addition, the computer-readable medium may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other type of memory device.

[0075] Further, any logic or application described herein, including the code generation program 315 and the application program 318, may be implemented and structured in a variety of ways. For example, one or more applications described may be implemented as modules or components of a single application. For example, separate applications can be executed for the PH and Radon transform workflows as illustrated in FIGS. 1-3. Further, one or more applications described herein may be executed in shared or separate computing devices or a combination thereof. For example, a plurality of the applications described herein may execute in the same computing device 300, or in multiple computing devices in the same computing environment. Additionally, it is understood that terms such as “application,” “service,” “system,” “engine,” “module,” and so on may be interchangeable and are not intended to be limiting.

[0076] GLS has the appeal of simplicity since it is viewed as a single value; a series of strain curves, on the other hand, is cognitively demanding to process. Simplifying a series of segmental strain curves to its meaningful information while preserving each patient’s unique identity is not trivial. The proof-of-concept workflow provides a way to simplify the spatially and temporally complex strain curves and store only the meaningful features in a QR-Code. Moreover, by using persistent homology, the issue of poor reproducibility of segmental strain values can be addressed by instead emphasizing the shape of the curve in the data analysis. This workflow can be used to encode other disease patterns within patient-individualized QR-Codes.

[0077] It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

[0078] The term "substantially" is meant to permit deviations from the descriptive term that do not negatively impact the intended purpose. Descriptive terms are implicitly understood to be modified by the word substantially, even if the term is not explicitly modified by the word substantially.

[0079] It should be noted that ratios, concentrations, amounts, and other numerical data may be expressed herein in a range format. It is to be understood that such a range format is used for convenience and brevity, and thus, should be interpreted in a flexible manner to include not only the numerical values explicitly recited as the limits of the range, but also to include all the individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly recited. To illustrate, a concentration range of “about 0.1% to about 5%” should be interpreted to include not only the explicitly recited concentration of about 0.1 wt% to about 5 wt%, but also include individual concentrations (e.g., 1%, 2%, 3%, and 4%) and the sub-ranges (e.g., 0.5%, 1.1%, 2.2%, 3.3%, and 4.4%) within the indicated range. The term “about” can include traditional rounding according to significant figures of numerical values. In addition, the phrase “about ‘x’ to ‘y’” includes “about ‘x’ to about ‘y’”.