Title:
METHOD AND SYSTEM FOR PREDICTING A TYPE OF A CARDIAC DISEASE
Document Type and Number:
WIPO Patent Application WO/2024/080928
Kind Code:
A1
Abstract:
A method for predicting a type of a cardiac disease (y) in a subject is provided. The method includes: receiving, by a processing device, a vector representation of a time series data (x) associated with an activity of a heart of the subject; extracting, using the processing device, a vector representation of an amplitude (α) of the vector representation of the time series data (x) using a first neural network (A); generating, using the processing device, a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (α) using a second neural network (MM); applying, using the processing device, the vector representation of the mask (m) on the vector representation of the amplitude (α) to obtain a vector representation of an amplitude in a region of interest (ãm); determining, using the processing device, a set of shape parameters (θo) of the amplitude in the region of interest (ãm) based on a set of heuristics (Θo), wherein the set of shape parameters (θo) defines a shape function (hm) of the amplitude in the region of interest (ãm), wherein the shape function (hm) is based on a predetermined shape function representing the cardiac disease (y) in the subject; performing, using the processing device, shape fitting of the shape function (hm) with the amplitude in the region of interest (ãm); and calculating, using the processing device, a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

Inventors:
LIM BRIAN (SG)
Application Number:
PCT/SG2023/050684
Publication Date:
April 18, 2024
Filing Date:
October 11, 2023
Assignee:
NAT UNIV SINGAPORE (SG)
International Classes:
G16H50/20; G06N20/00
Attorney, Agent or Firm:
SPRUSON & FERGUSON (ASIA) PTE LTD (SG)
Claims:
What is claimed is:

1. A method for predicting a type of a cardiac disease (y) in a subject, the method comprising: receiving, by a processing device, a vector representation of a time series data (x) associated with an activity of a heart of the subject; extracting, using the processing device, a vector representation of an amplitude (α) of the vector representation of the time series data (x) using a first neural network (A); generating, using the processing device, a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (α) using a second neural network (MM); applying, using the processing device, the vector representation of the mask (m) on the vector representation of the amplitude (α) to obtain a vector representation of an amplitude in a region of interest (ãm); determining, using the processing device, a set of shape parameters (θo) of the amplitude in the region of interest (ãm) based on a set of heuristics (Θo), wherein the set of shape parameters (θo) defines a shape function (hm) of the amplitude in the region of interest (ãm), wherein the shape function (hm) is based on a predetermined shape function representing the cardiac disease (y) in the subject; performing, using the processing device, shape fitting of the shape function (hm) with the amplitude in the region of interest (ãm); and calculating, using the processing device, a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

2. The method of claim 1, wherein the set of shape parameters (θo) comprises at least one time parameter and at least one slope parameter, and wherein the set of heuristics determines the time parameter and the slope parameter based on a set of predetermined shape parameters defining the predetermined shape function.

3. The method of claim 2, wherein before the calculating, using the processing device, the MSE lack of fit (d) of the shape fitting, further comprises: iteratively: performing, using the processing device, optimization of the set of shape parameters (θo) using Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters (θ), wherein the optimized set of shape parameters (θ) defines an optimized shape function of the amplitude in the region of interest (ãm), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease (y) in the subject, wherein the method further comprises: performing, using the processing device, shape fitting of the optimized shape function of the amplitude in the region of interest (hm) with the amplitude in the region of interest (ãm) such that the MSE lack of fit (d) is minimized.

4. The method of any one of claims 1 to 3, wherein generating, using the processing device, the prediction of the vector representation of the mask (m) comprises: performing, using the processing device, segmentation of the vector representation of the amplitude (α) using the second neural network (Mm).

5. The method of any one of claims 1 to 4, wherein the time series data (x) corresponds to an audio recording of a heart cycle, and wherein the region of interest (m) corresponds to a segment when a murmur sound occurs.

6. The method of claim 5, wherein the type of the cardiac disease (y) comprises any one of: Normal, Aortic stenosis, Mitral regurgitation, Mitral valve prolapse and Mitral stenosis.

7. The method of claim 6, wherein generating, using the processing device, the prediction of the vector representation of the mask (m) further comprises generating an embedding vector representation of the mask (zm), the method further comprising: generating, using the processing device, an initial prediction of the cardiac disease (y) based on the embedding vector representation of the mask using a third neural network (Fyo).

8. The method of claim 7, further comprising: determining, using the processing device, a phase of the heart cycle based on the initial prediction.

9. The method of claim 8, further comprising: generating, by the processing device, an intermediate diagrammatic explainable prediction of the cardiac disease (y) in the subject based on the phase of the heart cycle and the calculated MSE lack of fit using a fourth neural network ( yh).

10. The method of claim 9, further comprising: generating, by the processing device, a final diagrammatic explainable prediction of the cardiac disease (y) in the subject based on the initial prediction and the intermediate diagrammatic explainable prediction of the cardiac disease (y) in the subject using a fifth neural network ( y).

11. A system for predicting a type of a cardiac disease (y) in a subject, the system comprising a processing device configured to: receive a vector representation of a time series data (x) associated with an activity of a heart of the subject; extract a vector representation of an amplitude (α) of the vector representation of the time series data (x) using a first neural network (A); generate a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (α) using a second neural network (MM); apply the vector representation of the mask (m) on the vector representation of the amplitude (α) to obtain a vector representation of an amplitude in a region of interest (ãm); determine a set of shape parameters (θo) of the amplitude in the region of interest (ãm) based on a set of heuristics (Θo), wherein the set of shape parameters (θo) defines a shape function (hm) of the amplitude in the region of interest (ãm), wherein the shape function (hm) is based on a predetermined shape function representing the cardiac disease (y) in the subject; perform shape fitting of the shape function (hm) with the amplitude in the region of interest (ãm); and calculate a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

12. The system of claim 11, wherein the set of shape parameters (θo) comprises at least one time parameter and at least one slope parameter, and wherein the set of heuristics determines the time parameter and the slope parameter based on a set of predetermined shape parameters defining the predetermined shape function.

13. The system of claim 12, wherein the processing device is further configured to: iteratively: perform optimization of the set of shape parameters (θo) using Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters (θ), wherein the optimized set of shape parameters (θ) defines an optimized shape function of the amplitude in the region of interest (ãm), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease (y) in the subject; and perform shape fitting of the optimized shape function of the amplitude in the region of interest (hm) with the amplitude in the region of interest (ãm) such that the MSE lack of fit (d) is minimized.

14. The system of any one of claims 11 to 13, wherein generating the prediction of the vector representation of the mask (m) comprises: performing segmentation of the vector representation of the amplitude (α) using the second neural network (Mm).

15. The system of any one of claims 11 to 14, wherein the time series data (x) corresponds to an audio recording of a heart cycle, and wherein the region of interest (m) corresponds to a segment when a murmur sound occurs.

16. The system of claim 15, wherein the type of the cardiac disease (y) comprises any one of: Normal, Aortic stenosis, Mitral regurgitation, Mitral valve prolapse and Mitral stenosis.

17. The system of claim 16, wherein generating the prediction of the vector representation of the mask (m) further comprises generating an embedding vector representation of the mask (zm), and the processing device is further configured to: generate an initial prediction (yo) of the cardiac disease (y) based on the embedding vector representation of the mask (zm) using a third neural network (Fyo).

18. The system of claim 17, wherein the processing device is further configured to: determine a phase of the heart cycle (< >o) based on the initial prediction (yo).

19. The system of claim 18, wherein the processing device is further configured to: generate an intermediate diagrammatic explainable prediction (y ) of the cardiac disease (y) in the subject based on the phase of the heart cycle (< >o) and the calculated MSE lack of fit (d) using a fourth neural network ( yh).

20. The system of claim 19, wherein the processing device is further configured to: generate a final diagrammatic explainable prediction (y) of the cardiac disease (y) in the subject based on the initial (yo) prediction and the intermediate (yh) diagrammatic explainable prediction of the cardiac disease (y) in the subject using a fifth neural network ( y).

Description:
METHOD AND SYSTEM FOR PREDICTING A TYPE OF A CARDIAC DISEASE

FIELD OF INVENTION

[0001] The present invention relates broadly, but not exclusively, to a method and system for predicting a type of a cardiac disease.

BACKGROUND

[0002] The need for AI accountability has spurred the development of many explainable AI (XAI) techniques. However, current approaches tend to use rudimentary, off-the-shelf visualizations, such as bar or line charts and heat maps, that assume users are analytically driven to study the visualizations. Consequently, these are difficult to make sense of, too simplistic to provide effective feedback, or require significant subsequent effort to interpret.

[0003] Additionally, since machine learning models make predictions based on learned rules, many XAI techniques produce explanations that also reason deductively. However, humans can also reason with other processes. Abductive reasoning is a particularly powerful approach to first generate hypotheses, then test them to determine why an observation or event occurred.

[0004] A need therefore exists to provide a method and a system that can allow XAI to support abductive reasoning.

[0005] Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.

SUMMARY

[0006] According to a first aspect of the present invention, there is provided a method for predicting a type of a cardiac disease (y) in a subject. The method includes receiving, by a processing device, a vector representation of a time series data (x) associated with an activity of a heart of the subject; extracting, using the processing device, a vector representation of an amplitude (α) of the vector representation of the time series data (x) using a first neural network (A); generating, using the processing device, a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (α) using a second neural network (MM); applying, using the processing device, the vector representation of the mask (m) on the vector representation of the amplitude (α) to obtain a vector representation of an amplitude in a region of interest (ãm); determining, using the processing device, a set of shape parameters (θo) of the amplitude in the region of interest (ãm) based on a set of heuristics (Θo), wherein the set of shape parameters (θo) defines a shape function (hm) of the amplitude in the region of interest (ãm), wherein the shape function (hm) is based on a predetermined shape function representing the cardiac disease (y) in the subject; performing, using the processing device, shape fitting of the shape function (hm) with the amplitude in the region of interest (ãm); and calculating, using the processing device, a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

[0007] The set of shape parameters (θo) may include at least one time parameter and at least one slope parameter, and the set of heuristics determines the time parameter and the slope parameter based on a set of predetermined shape parameters defining the predetermined shape function.
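The relationship between the heuristically initialised shape parameters, the shape function and the MSE lack of fit can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the four-parameter piecewise-linear `shape_function`, the end-point rules in `heuristic_init`, and the synthetic decrescendo amplitude are hypothetical and are not taken from the specification, whose actual heuristics are listed in Fig. 9.

```python
import numpy as np

def shape_function(t, t_start, t_end, slope, level):
    """Hypothetical piecewise-linear shape: zero outside [t_start, t_end],
    a straight line starting at `level` with the given slope inside it."""
    inside = (t >= t_start) & (t <= t_end)
    return np.where(inside, level + slope * (t - t_start), 0.0)

def heuristic_init(t, amp_roi):
    """Illustrative heuristics (assumed): time parameters from the non-zero
    span of the masked amplitude, slope from its end-point amplitudes."""
    idx = np.flatnonzero(amp_roi > 0)
    t_start, t_end = t[idx[0]], t[idx[-1]]
    level = amp_roi[idx[0]]
    slope = (amp_roi[idx[-1]] - level) / (t_end - t_start)
    return t_start, t_end, slope, level

def mse_lack_of_fit(amp_roi, fitted):
    """Mean squared error between the masked amplitude and a fitted shape."""
    return float(np.mean((amp_roi - fitted) ** 2))

# Synthetic masked amplitude: a decrescendo between two grid points.
t = np.linspace(0.0, 1.0, 101)
amp_roi = shape_function(t, t[20], t[60], -1.0, 0.8)

theta0 = heuristic_init(t, amp_roi)
d_match = mse_lack_of_fit(amp_roi, shape_function(t, *theta0))

# A competing hypothesis with the same timing but a flat (zero-slope) shape.
d_flat = mse_lack_of_fit(
    amp_roi, shape_function(t, theta0[0], theta0[1], 0.0, theta0[3])
)
```

In this toy case the shape hypothesis that matches the data yields a much smaller lack of fit than the flat alternative, which is the kind of contrast the MSE lack of fit (d) is meant to expose.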

[0008] Before the step of calculating, using the processing device, the MSE lack of fit (d) of the shape fitting, the method may include iteratively: performing, using the processing device, optimization of the set of shape parameters (θo) using Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters (θ), wherein the optimized set of shape parameters (θ) defines an optimized shape function of the amplitude in the region of interest (ãm), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease (y) in the subject; the method may further include: performing, using the processing device, shape fitting of the optimized shape function of the amplitude in the region of interest (hm) with the amplitude in the region of interest (ãm) such that the MSE lack of fit (d) is minimized.
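A minimal sketch of such an iterative refinement is given below, under stated assumptions. It uses SciPy's `minimize` with `method="L-BFGS-B"`, which is the bound-constrained variant of L-BFGS and is used here without bounds; the specification does not name a library. The crescendo-decrescendo `shape_function`, its parameterisation by two time parameters and two slope parameters, and all numeric values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def shape_function(t, theta):
    """Hypothetical crescendo-decrescendo shape: rises with slope s_up from
    t_start, falls with slope s_down towards t_end, and is zero elsewhere."""
    t_start, t_end, s_up, s_down = theta
    return np.maximum(0.0, np.minimum(s_up * (t - t_start), s_down * (t_end - t)))

def mse_lack_of_fit(theta, t, amp_roi):
    """Objective: mean squared error between masked amplitude and shape."""
    return float(np.mean((amp_roi - shape_function(t, theta)) ** 2))

# Synthetic masked amplitude generated from assumed "true" parameters.
t = np.linspace(0.0, 1.0, 201)
amp_roi = shape_function(t, np.array([0.20, 0.70, 4.0, 2.0]))

# Rough starting point, standing in for the heuristic estimate.
theta0 = np.array([0.25, 0.65, 3.0, 3.0])
d0 = mse_lack_of_fit(theta0, t, amp_roi)

# Gradients are approximated by finite differences (no analytic jacobian).
result = minimize(mse_lack_of_fit, theta0, args=(t, amp_roi), method="L-BFGS-B")
theta_opt, d = result.x, float(result.fun)
```

Here `d0` is the lack of fit at the starting parameters and `d` the value after optimization; the sketch only demonstrates that the optimizer lowers the MSE from a reasonable start, not that it recovers the generating parameters exactly.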

[0009] The step of generating, using the processing device, the prediction of the vector representation of the mask (m) may include performing, using the processing device, segmentation of the vector representation of the amplitude (α) using the second neural network (Mm).

[0010] The step of generating, using the processing device, the prediction of the vector representation of the mask (m) may include generating an embedding vector representation of the mask (zm). The method may further include generating, using the processing device, an initial prediction (yo) of the cardiac disease (y) based on the embedding vector representation of the mask (zm) using a third neural network (Fyo).

[0011] In an exemplary embodiment, the time series data (x) corresponds to an audio recording of a heart cycle, and the region of interest (m) corresponds to a segment when a murmur sound occurs.

[0012] In the exemplary embodiment, the type of the cardiac disease (y) may include any one of: Normal, Aortic stenosis, Mitral regurgitation, Mitral valve prolapse and Mitral stenosis.

[0013] In the exemplary embodiment, the method may further include a step of determining, using the processing device, a phase of the heart cycle (< >o) based on the initial prediction (yo).

[0014] In the exemplary embodiment, the method may further include a step of generating, by the processing device, an intermediate diagrammatic explainable prediction (yh) of the cardiac disease (y) in the subject based on the phase of the heart cycle (< >o) and the calculated MSE lack of fit (d) using a fourth neural network ( y h).

[0015] In the exemplary embodiment, the method may further include a step of generating, by the processing device, a final diagrammatic explainable prediction (y) of the cardiac disease (y) in the subject based on the initial (yo) prediction and the intermediate (yh) diagrammatic explainable prediction of the cardiac disease (y) in the subject using a fifth neural network ( y ).

[0016] In a second aspect of the present invention, there is provided a system for predicting a type of a cardiac disease (y) in a subject. The system includes a processing device configured to: receive a vector representation of a time series data (x) associated with an activity of a heart of the subject; extract a vector representation of an amplitude (α) of the vector representation of the time series data (x) using a first neural network (A); generate a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (α) using a second neural network (MM); apply the vector representation of the mask (m) on the vector representation of the amplitude (α) to obtain a vector representation of an amplitude in a region of interest (ãm); determine a set of shape parameters (θo) of the amplitude in the region of interest (ãm) based on a set of heuristics (Θo), wherein the set of shape parameters (θo) defines a shape function (hm) of the amplitude in the region of interest (ãm), wherein the shape function (hm) is based on a predetermined shape function representing the cardiac disease (y) in the subject; perform shape fitting of the shape function (hm) with the amplitude in the region of interest (ãm); and calculate a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

[0017] The set of shape parameters (θo) may include at least one time parameter and at least one slope parameter, and the set of heuristics determines the time parameter and the slope parameter based on a set of predetermined shape parameters defining the predetermined shape function.

[0018] The processing device may be configured to: iteratively: perform optimization of the set of shape parameters (θo) using Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters (θ), wherein the optimized set of shape parameters (θ) defines an optimized shape function of the amplitude in the region of interest (ãm), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease (y) in the subject; and perform shape fitting of the optimized shape function of the amplitude in the region of interest with the amplitude in the region of interest (ãm) such that the MSE lack of fit (d) is minimized.

[0019] The processing device may be configured to perform segmentation of the vector representation of the amplitude (α) using the second neural network (Mm).

[0020] Generating the prediction of the vector representation of the mask (m) may further include generating an embedding vector representation of the mask (zm), and the processing device may be further configured to: generate an initial prediction (yo) of the cardiac disease (y) based on the embedding vector representation of the mask (zm) using a third neural network (Fyo).

[0021] In an exemplary embodiment, the time series data (x) corresponds to an audio recording of a heart cycle, and the region of interest (m) corresponds to a segment when a murmur sound occurs.

[0022] In the exemplary embodiment, the type of the cardiac disease (y) may include any one of: Normal, Aortic stenosis, Mitral regurgitation, Mitral valve prolapse and Mitral stenosis.

[0023] In the exemplary embodiment, the processing device may be configured to determine a phase of the heart cycle (< >o) based on the initial prediction (yo).

[0024] In the exemplary embodiment, the processing device may be configured to generate an intermediate diagrammatic explainable prediction (yh) of the cardiac disease (y) in the subject based on the phase of the heart cycle (< >o) and the calculated MSE lack of fit (d) using a fourth neural network ( y h).

[0025] In the exemplary embodiment, the processing device may be configured to generate a final diagrammatic explainable prediction (y) of the cardiac disease (y) in the subject based on the initial (yo) prediction and the intermediate (yh) diagrammatic explainable prediction of the cardiac disease (y) in the subject using a fifth neural network ( y ).

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] Embodiments of the invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:

[0027] Fig. 1 shows a reasoning processing framework between a user and AI: a) with current XAI explanations and b) with Diagrammatization.

[0028] Fig. 2 shows a processing framework of three types of reasoning demonstrated on a simplified, pedagogical scenario of inferring cats and dogs.

[0029] Fig. 3A shows a Venn diagram indicating that diagrammatic explanation encompasses different types of visualization and verbalization explanations.

[0030] Fig. 3B shows a table listing Diagrammatization design space with dimensions to compare verbal, visual, and diagram representations of XAI.

[0031] Fig. 4 shows on the Upper-Left: a schematic diagram of anatomy and physiology of the heart, Lower-Left: a schematic diagram showing aortic valve with normal or aortic stenosis pathology and Right: Murmur diagrams showing typical murmurs for various cardiac diseases.

[0032] Fig. 5 shows a cardiac diagnosis framework using the Peircean abductive process.

[0033] Fig. 6 shows a schematic diagram of an exemplary base Convolutional Neural Network (CNN) for predicting cardiac diagnosis, according to an embodiment.

[0034] Fig. 7 shows a table listing various cardiac diagnoses as piecewise linear functions of murmur amplitude changes over time.

[0035] Fig. 8A shows a schematic diagram of an exemplary modular architecture of a DiagramNet deep neural network, in accordance with an embodiment. Each module corresponds to stages 1-7 and the steps of the Peircean abduction process (I to IV). Single line arrows indicate feedforward activations; double line arrows indicate an iterative nonlinear optimization to estimate the final murmur shape parameters. Bold variables are vectors or tensors, variables with a hat (^) indicate predicted values, and ∘ is the Hadamard operator for element-wise multiplication of vectors, used for masking the amplitude on the murmur region. Narrow rectangles indicate an input or predicted variable. Other shapes indicate processes, such as trainable neural network blocks (capital letters), non-trainable heuristic processes (script letters), and vector operators (circles).
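As a small illustration of the element-wise (Hadamard) masking described for Fig. 8A, the following sketch multiplies an amplitude vector by a mask vector of the same length. The array values and the variable names `amplitude`, `mask` and `amp_roi` are assumptions chosen for the example; in the described architecture both vectors would be produced by neural network modules.

```python
import numpy as np

# Hypothetical amplitude vector (one value per time step).
amplitude = np.array([0.1, 0.4, 0.9, 0.7, 0.2])

# Hypothetical predicted mask: 1 inside the murmur region, 0 outside.
mask = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

# Hadamard (element-wise) product keeps the amplitude only where mask is 1.
amp_roi = mask * amplitude
# amp_roi -> [0.0, 0.4, 0.9, 0.7, 0.0]
```

With NumPy arrays of equal shape, `*` is already the element-wise product, so no dedicated operator is needed.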

[0036] Fig. 8B shows a schematic diagram of an exemplary system for predicting a type of a cardiac disease in a subject, according to an embodiment.

[0037] Fig. 8C shows a flowchart illustrating an exemplary workflow of a DiagramNet deep neural network, in accordance with an embodiment.

[0038] Fig. 8D shows a flowchart illustrating another exemplary workflow of a DiagramNet deep neural network, in accordance with an embodiment.

[0039] Fig. 8E shows a flowchart illustrating another exemplary workflow of a DiagramNet deep neural network, in accordance with an embodiment.

[0040] Fig. 8F shows a flowchart illustrating another exemplary workflow of a DiagramNet deep neural network, in accordance with an embodiment.

[0041] Fig. 9 shows a table listing heuristics to estimate initial values of murmur shape parameters for each plausible diagnosis.

[0042] Fig. 10 shows a schematic diagram of an alternate exemplary spectrogram-based CNN model for predicting cardiac diagnoses, according to an embodiment.

[0043] Fig. 11 shows examples of diagrammatic XAI based on abductive-deductive reasoning using phonocardiograms (PCGs).

[0044] Fig. 12A shows examples of diagrammatic XAI based on abductive-deductive reasoning using phonocardiograms (PCGs) with contrastive explanation.

[0045] Fig. 12B shows a table listing the predicted shape parameters and goodness-of-fit MSE for each murmur shape shown in Fig. 12A.

[0046] Fig. 13 shows examples of diagrammatic XAI based on abductive-deductive reasoning using phonocardiograms (PCGs) with counterfactual explanation.

[0047] Fig. 14 shows examples of diagrammatic XAI based on abductive-deductive reasoning using phonocardiograms (PCGs) with case (example-based) explanations.

[0048] Figs. 15A to 17 show results of model performance comparison between DiagramNet, the exemplary base CNN model and the alternate exemplary spectrogram-based CNN model.

[0049] Fig. 18 shows an exemplary user interface used to show diagrammatic XAI based on abductive-deductive reasoning using phonocardiograms (PCGs).

[0050] Fig. 19 shows an exemplary user interface to show cardiac disease prediction based on the alternate exemplary spectrogram-based CNN model.

[0051] Fig. 20 shows an exemplary user interface to show cardiac disease prediction based on Time-saliency XAI.

[0052] Fig. 21 shows a table listing different applications that the DiagramNet can be used in.

[0053] Fig. 22 shows a schematic diagram of a comparison between diagrammatic explanation and verbal explanation.

[0054] Fig. 23 shows a schematic diagram of an example of a computing device used to realise a system for predicting a cardiac disease in a subject.

DETAILED DESCRIPTION

[0055] Embodiments of the present invention will be described, by way of example only, with reference to the drawings. Like reference numerals and characters in the drawings refer to like elements or equivalents.

[0056] Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.

[0057] Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as “scanning”, “calculating”, “determining”, “replacing”, “generating”, “initializing”, “outputting”, or the like, refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.

[0058] The present specification also discloses apparatus for performing the operations of the methods. Such apparatus may be specially constructed for the required purposes, or may comprise a computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various machines may be used with programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate. The structure of a conventional computer will appear from the description below.

[0059] In addition, the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.

[0060] Furthermore, one or more of the steps of the computer program may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a computer. The computer readable medium may also include a hardwired medium such as exemplified in the Internet system, or wireless medium such as exemplified in the GSM, GPRS, 3G or 4G mobile telephone systems, as well as other wireless systems such as Bluetooth, ZigBee, Wi-Fi. The computer program when loaded and executed on such a computer effectively results in an apparatus that implements the steps of the preferred method.

[0061] In embodiments of the present invention, use of the term ‘server’ may mean a single computing device or at least a computer network of interconnected computing devices which operate together to perform a particular function. In other words, the server may be contained within a single hardware unit or be distributed among several or many different hardware units.

[0062] The term “configured to” is used in the specification in connection with systems, apparatus, and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions. For special-purpose logic circuitry to be configured to perform particular operations or actions means that the circuitry has electronic logic that performs the operations or actions.

Overview

[0063] The need for Al accountability has spurred the development of many explainable Al (XAI) techniques. However, current approaches tend to use rudimentary, off-the-shelf visualizations, such as bar or line charts and heatmaps, that assume users are analytically- driven to study the visualizations. Consequently, these are difficult to make sense of, too simplistic to provide effective feedback, or require significant subsequent effort to interpret.

[0064] Additionally, since machine learning models make predictions based on learned rules, many XAI techniques produce explanations that also reason deductively. However, humans can also reason with other processes. Abductive reasoning is a particularly powerful approach to first generate hypotheses, then test them to determine why an observation or event occurred. Abduction can allow the user and AI to reason with hypotheses from a shared domain, and save user effort to contextualize low-level explanations. Towards the goal of human-like XAI, embodiments of the present invention seek to provide an approach to leverage abduction to provide hypothesis-driven explanations for complex real world problems.

[0065] Yet, hypotheses of challenging tasks tend to be complex, thus expressive representations are needed for AI explanations. Diagrams are used in many domains to explain sophisticated observations and events. In physics, force diagrams can explain how objects move when interacting with other objects or fields. In medicine, diagrams can describe physiological mechanisms of diseases. Diagrams are distinct from visualizations since they can encode inherent constraints based on hypotheses, and provide a systematic approach to read them, thus simplifying interpretation. Indeed, diagrams are a generalization of visual and verbal representations, thus expanding the diversity of explanations. Extending abductive reasoning, people engage in diagrammatic reasoning to: I) construct diagrams as consistent systems of representation, II) perform experiments based on the rules of the diagrams, and III) note the experiment results.

[0066] Therefore, to reduce interpretation burden, the present disclosure provides XAI Diagrammatization to use abductive inference and generate explanations in expressive and constrained diagrams as follows:

1) “Diagrammatization” as a design framework for diagrammatic reasoning in XAI that i) supports abductive reasoning with hypotheses, ii) follows domain conventions, and iii) can be represented visually or verbally.

2) “DiagramNet”, a deep neural network to provide diagram-based, hypothesis-driven, abductive-deductive explanations by inferring to the best explanation while inferring the prediction label.

3) A clinical application and clinically-relevant explanations to diagnose cardiac disease using murmur diagrams. The murmur diagrams are formalized mathematically to predict them as explanations in DiagramNet.

4) Evaluation of DiagramNet using a real-world heart auscultation dataset with multiple studies. a) Demonstration study to illustrate that Diagrammatization can demonstrate abductive-deductive reasoning that follows domain conventions with abductive, contrastive, counterfactual, and case (example-based) explanations. b) Modeling study to show that abductive-deductive reasoning in DiagramNet improves both prediction performance and explanation faithfulness compared to baseline and alternative models.

5) Implications for XAI and generalization of Diagrammatization to other application domains.

Abductive And Diagrammatic Reasoning

[0067] Current XAI explanations show common visualizations (e.g., charts, saliency maps), but these require users to form their own hypotheses to evaluate. This leaves an interpretability gap. Ante-hoc diagrammatic reasoning can close this gap. Instead of drawing a diagram post-hoc, the AI can perform abductive-deductive reasoning ante-hoc to generate and evaluate its own hypotheses to justify its prediction. The explanation adheres to conventions in the target application domain and represents domain hypotheses whether visually or verbally.

[0068] Fig. 1 illustrates how explaining with Diagrammatization can reduce the interpretability burden for users through three capabilities:

Diagrammatization = (i) Peircean abduction process + (ii) Domain conventions + (iii) Peircean diagrams

Inferential Reasoning

[0069] This section introduces the human reasoning processes of abductive and diagrammatic reasoning, to distinguish their nuances from reasoning processes and representations typical in XAI. On observing an object or event, people engage in various reasoning processes. Philosopher Charles S. Peirce defined three types of inferential reasoning: induction, deduction, and abduction.

[0070] Fig. 2 shows the processing framework of the three types of reasoning, demonstrated on a simplified, pedagogical scenario of inferring cats and dogs. For pedagogical clarity, a stylized scenario of recognizing cats and dogs based on ear shape is used, with concept-based rather than causal explanations. Induction: People infer general rules of objects and events by using inductive reasoning. In Fig. 2 (lower-left), after observing several instances of the same label (cat or dog), one infers the rules that cats have pointy ears (Cat => Pointy Ears) and dogs have floppy ears (Dog => Floppy Ears). Machine learning trains models using induction. Deduction: This process uses predefined rules for inference. Fig. 2 (middle) shows that deductive reasoning starts with predefined rules (Cat => Pointy Ears, Dog => Floppy Ears) and evaluates them against the observation. For example, Cheshire has ears that are pointy (Pointy Ears ~ Cheshire’s Ears), i.e., not floppy (Floppy Ears ≁ Cheshire’s Ears). When dealing with continuous variables (vs. discrete), each rule is evaluated by a continuous score to determine its likelihood. Abduction: Abduction can be defined as "inference to the best explanation"; instead of inferring a label, abduction infers the underlying reason, which could be causal or non-causal. Peirce and later Popper describe abduction as "guessing" hypotheses that need to be evaluated for plausibility. Combining abduction with deduction supports the hypothetico-deductive reasoning method of forming and testing hypotheses. This is equivalent to the Peircean abduction process that Hoffman et al. and Miller highlight as relevant to XAI. The Peircean abduction process is elaborated as follows:

I. Observe event, noting relevant cues for further reasoning.

II. Generate plausible explanations as potential causes of the observation, e.g., identities, states, diseases. The present disclosure focuses on a set of explanations that are known a priori, rather than generated creatively.

III. Evaluate and judge plausibility of explanations by applying a system of rules via deduction to compare evaluation results. This allows ranking of how well each explanation fits the observation (from Step I).

IV. Resolve explanation by using the best inferred explanation with the best fit to the observation.

[0071] Fig. 2 (top) illustrates the abduction process as follows: I) an observation is made on the cat Cheshire; II) hypotheses are derived that Cheshire could be a cat or a dog; III) next, the hypotheses are evaluated by performing deduction on each of the rules for differentiating cats and dogs (i.e., Cat => Pointy Ears, Dog => Floppy Ears) against the observation, determining that the ears are more pointy (Pointy Ears ~ Cheshire’s Ears) than floppy (Floppy Ears ≁ Cheshire’s Ears); IV) hence, it is resolved with the ear evidence that Cheshire is most probably a cat (Cheshire := Cat). This illustrates how people use abduction to retrieve and test hypotheses to understand their observations.
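The four steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the rule table, the function names, and the similarity scores for Cheshire's ears are hypothetical and are not taken from Fig. 2.

```python
# Hypothetical illustration of the four-step Peircean abduction process.
# The rule table and the similarity scores are assumptions made for this sketch.

# Step II: hypotheses known a priori, each with the ear shape its rule expects.
RULES = {"Cat": "pointy", "Dog": "floppy"}


def evaluate(observation, expected_shape):
    """Step III: deductively score how well a rule's expectation fits the observation."""
    return observation.get(expected_shape, 0.0)


def abduce(observation):
    """Steps II to IV: score every hypothesis, then resolve to the best-fitting one."""
    scores = {label: evaluate(observation, shape) for label, shape in RULES.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Step I: observe Cheshire's ears as continuous similarity scores (made-up values).
cheshire = {"pointy": 0.9, "floppy": 0.2}
label, scores = abduce(cheshire)
print(label, scores)
```

Keeping the per-hypothesis scores, rather than only the winning label, is what would allow the ranking of explanations described in Step III.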

Diagrammatization as a General XAI Representation for Domain-specific Conventions

[0072] With reference to Figs. 3A and 3B, the present disclosure describes the current XAI methods based on visualization and verbalization, articulates how Diagrammatization is a broader paradigm encompassing both, and how diagrams can be more expressive and constrained to efficiently convey hypotheses and concepts to domain experts.

[0073] Leveraging visualization to augment human cognition, many XAI techniques are rendered in visual form. These techniques can be organized into four broad categories ((a) to (d) below) based on their semantic structures rather than visual format: a) Model-free explanations use generic, off-the-shelf, low-level visualizations. These explanations assume linear or univariate relationships between variables, and are meant to be accessible to a broad audience, though lay users may struggle to comprehend them. Techniques include bar charts to show feature attributions, and weights of evidence; extensions use point clouds to show data distributions, or violin plots to show uncertainty; line graphs to show nonlinear relationships, which can be estimated with partial dependence plots, modeled with generalized additive models (GAM), etc.; scatter plots to show multivariate relationships and clusters; and saliency maps to show important regions as heatmaps on images or highlights on text. b) Model-based explanations visualize the data structure of the prediction model or a simplified proxy. Such explanations use graph network or rule-based data structures, which are complex but known to data scientists. Techniques include: Neural network activations, canonical filters in CNNs, or distilled networks; and decision trees to show nodes and decision branches to explain system decisions, medical diagnoses, and step count behavior. c) Example-based explanations retrieve examples that are similar, contrastive, or even adversarial for users to compare with the current observation. Such explanations are typically visualized in native format, e.g., images instead of charts. d) Concept-based explanations increase interpretability by explaining with semantically meaningful concept vectors, conceptual attributes, or relatable cues. Interactive editing also helps with understanding.

[0074] Instead of visual representations, explanations can also be written (or spoken) verbally (i.e., verbalization). This is done with logical syntax (symbolic) or more "naturally" with text. Verbalization explanations can be organized into different categories as follows: a) Symbolic explanations use mathematical notation to describe logical relationships. Since math is written sequentially, it is sentential and verbal. Rules are popular to explain the AI’s decision logic and can be simplified with various regularizations. They are useful to provide counterfactual explanations. Formal logic can also provide abductive explanations with prime implicants and constraining deep models towards abductive rules, though these explanations remain highly mathematical and inaccessible to non-technical users. b) Template-based text explanations are a straightforward way to convert symbolic expressions into text with a mapping function. They produce text explanations with fixed terms and sentence structures. c) Natural Language Generative (NLG) explanations are "natural" by emulating how humans communicate and explain. These are trained by showing a machine state (e.g., game state, text and hypotheses) to human annotators who rationalize an explanation. Training is labor intensive, and yet may be spurious, since human annotators reason independently of the machine. Moreover, annotation correctness cannot be easily validated.

[0075] Ehsan et al.’s definition of “rationalization” is particularly instructive: an NLG explanation justifies a model’s decision "based on how a human would think", but does "not necessarily reveal the true decision making process". In contrast, Diagrammatization extends this to include visual diagrams and also reveals the true decision making process of the AI.

[0076] Peirce considered diagrams as a general framework that encompasses graphical (visual), symbolic (equations), and sentential (verbal) representations with several elements: an ontology that defines the entities and their relations, conventions that prescribe how to interpret diagrams, and rules to evaluate experiments. Hoffman determined three steps for diagrammatic reasoning which align with the Peircean abduction process:

I. Construct a diagram by means of a consistent system of representation.

II. Perform experiments upon this diagram according to the rules of the chosen system of representation.

III. Note the results of those experiments.

[0077] Analysis of the above described representations by their consistency and rules is provided. As shown in Fig. 3B, generative text verbalization is open-ended with low consistency and rules implicit to language and tacit knowledge. Symbolic and template-based verbalizations have high consistency and are bounded formally or implicitly to rules due to their predefined structure. Model-free and example-based visualizations have low consistency to render any data that fit their formats, though examples are bounded by natural variations. Concept-based visualizations are more consistent to restrict to fixed concepts. Model-based visualizations and diagrams have high consistency and formal rules that obey conventions of their formats.

[0078] The header terms used in Fig. 3B are explained below.

[0079] “Level of states” describes whether the representations can be "digital" (categorical) or "analog" (continuous). Verbalizations impose categorical representation, while visualizations and diagrams also support continuous quantities.

[0080] “Homomorphism” refers to how analogous the diagram is to the represented domain. Verbal representations have low homomorphism, since people have to translate text to symbols and structures. Model-free visualizations may have formats irrelevant to the domain (e.g., spectrogram of heart sounds). Model-based visualizations may be homomorphic with the domain if chosen appropriately. Example-based visualizations of instances in their native format are highly homomorphic. Concept-based explanations must be interpreted verbally, so have low homomorphism. Diagrams can represent physical notions that are familiar to domain experts, thus can have high homomorphism.

[0081] “Content expressivity” refers to whether the representation limits information expressiveness. Generative text verbalization is unbounded, since any text could be predicted. Model-free and example-based visualizations limit the visual format, but any relevant value can be rendered. Other representations are bounded by the graphical, symbolic, or template formats. High expressivity is useful to show nuances for experts, but is overwhelming to non-experts.

[0082] “Inherent constraints”: All representations share extrinsic constraints of the represented domain, but can impose differing inherent constraints. Generative text verbalizations can include any words, so have no inherent constraints, while template-based text is bounded to the taxonomy in the template. Concept-based visualizations are also constrained by the taxonomy of concepts. Model-based visualizations are constrained by topological structure (e.g., decision tree). Diagrams can be constrained by topological or geometric constraints (e.g., physics, time sequence).

[0083] Since various users may require various explanation types under various conditions, Diagrammatization supports multi-faceted explanations, namely: a) Abductive explanations to select the best-fitting hypothesis and show that as the explanation for the prediction. b) Contrastive explanations to describe the evidence for alternative model outcomes. By generating a hypothesis for each outcome (cause), this can show how well (or poorly) each hypothesis fits the current instance observation. c) Counterfactual explanations to propose changes to input values to predict another outcome. Though used mostly for symbolic reasoning with tabular data, this can also be used for unstructured data (e.g., images). d) Case (example-based) explanations to show examples of similar or counterfactual predictions for comparison. Note that case explanations do not represent abductive reasoning for the current instance, but for other cases. They allow one to examine the similarity in the structure of hypotheses, to be used for reference, but this is not as definitive as abductive reasoning on the current observation.

Domain Application: Clinical Background

[0084] Cardiovascular diseases caused an estimated 17.9 million deaths worldwide in 2019, accounting for 32% of all deaths that year. The present disclosure aims to develop an early diagnosis AI system for heart disease to augment clinicians with deficient auscultation skills. When predictions impact people’s lives, it is critical to provide explanations for review by relevant experts. The following sections describe the background to clarify how AI explanations based on the Diagrammatization framework are clinically relevant for practicing clinicians.

Heart Auscultation

[0085] Fig. 4 (Upper-Left) shows a partial heart cycle with blood flowing into the left atrium, pumped into the left ventricle through the mitral valve, and pumped out through the aortic valve. Valves prevent blood from flowing backward. Their closing produces a "lub-dub" sound: the 1st heart sound (commonly termed S1) "lub" is from the mitral valve, and the 2nd heart sound (commonly termed S2) "dub" is from the aortic valve. S1 and S2 demarcate the systolic (between S1 and S2) and diastolic phases of the heart cycle. In heart auscultation (a standard first-line diagnostic approach), the clinician uses a stethoscope to listen for normal or abnormal sounds, and makes initial cardiac diagnoses by listening only.

Murmur Diagrams to Diagnose Cardiac Valvular Diseases

[0086] Abnormal heart sounds — "murmurs" — may indicate heart disease. Clinicians make diagnoses by listening to changes in loudness. These are commonly represented in murmur diagrams (see Fig. 4, Right). There are four prevalent cardiac diseases: aortic stenosis (AS), mitral regurgitation (MR), mitral valve prolapse (MVP), and mitral stenosis (MS).

[0087] 1) Aortic Stenosis (AS): the aortic valve leaflets stiffen due to calcification (Fig. 4, Lower-Left), narrowing the valve opening (i.e., stenosis), resulting in a high-pitched noise that increases in loudness as the valve opens and decreases as it closes. This produces a crescendo-decrescendo murmur during the systolic heart phase, visualized as a diamond shape (Fig. 4(a)). In severe AS, the shape apex shifts later and is lower, due to delayed valve closure and weaker heart performance (Fig. 4(e)).

[0088] 2) Mitral Regurgitation (MR): the mitral valve fails to fully close, allowing blood to flow backward (i.e., regurgitation). This reverse flow is heard as a constant, high-pitched murmur during the systolic heart phase, and is visualized as a uniform low amplitude sound (Fig. 4(b)). Sometimes, the mitral valve remains closed until mid-systole (Fig. 4(f)).

[0089] 3) Mitral Valve Prolapse (MVP): the tendons keeping the mitral valve closed fail, causing the valve to pop open (prolapse), allowing blood to regurgitate. This opening is heard as a mid-systolic “click”, visualized as a vertical line (Fig. 4(c)). Often the regurgitation is audible as a uniform, high-pitched murmur, which is MVP with MR (Fig. 4(g)).

[0090] 4) Mitral Stenosis (MS): the mitral valve leaflets fuse (i.e., stenosis) due to rheumatic heart disease, reducing blood flow during the diastolic heart phase (Fig. 4(d)). After the S2 “dub”, the valve snaps open with a “click” sound, enabling large blood flow, followed by a decrescendo as flow reduces, then a constant low-pitch “rumble”, and a crescendo before the next S1. Severe MS has an earlier “click” during diastole and longer murmur decrescendo (Fig. 4(h)).

[0091] Note that heart auscultation can only provide an initial diagnosis of cardiac diseases, since it is based on resultant audio evidence. Through auscultation, the clinician can narrow down to distinct types of cardiac diseases and decide if specific elective clinical tests are needed to evaluate biomolecular changes. As these follow-up tests (e.g., echocardiogram, invasive angiogram) are costly and require a long wait to schedule, the low cost and fast turnaround of heart auscultation make it an important first-line diagnostic approach.

Abductive-Deductive Inference of Best Murmur Shape Explanation for Cardiac Diagnosis

[0092] With the aforementioned medical knowledge, the domain expert can diagnose using abductive-deductive reasoning. Fig. 5 shows the Peircean abduction process being applied to cardiac diagnosis. On hearing a heart sound, the clinician I) observes an abnormal murmur 502, II) abductively hypothesizes possible diagnoses (N, AS, MS, MVP, MR) with corresponding heart cycle phase and murmur shape, III) deductively evaluates all hypotheses by whether the murmur heart phase was systolic or diastolic and whether the murmur fits certain shapes, and IV) resolves the diagnosis as AS based on the evidence that the murmur is systolic and best fits the crescendo-decrescendo shape (A).

Current XAI for Medicine and Heart Auscultation

[0093] Embodiments of the present invention use Diagrammatization to imbue medical expertise into XAI. In particular, embodiments of the present invention focus on AI for diagnosing cardiac disease. In this regard, much work has been done on electrocardiogram (ECG) data and less on phonocardiograms (PCG) of heart auscultations. Yet, the few works on PCGs focus on classifying normal or abnormal sounds or segmenting time. These lack clinical usefulness, since they do not provide a differential diagnosis to rank multiple plausible diagnoses. Work on XAI for PCGs is even more sparse, focusing on saliency maps on spectrograms. Later sections of the present disclosure describe how clinicians are unconvinced by this format.

Diagrammatization for Murmur Diagrams

[0094] The complexity of biological processes demands diagrammatic reasoning in medicine. Furthermore, clinical diagnosis is indeed a form of abductive reasoning, where the clinician infers the best disease cause (explanation) based on symptoms (observation). Therefore, heart auscultations and murmur diagrams provide an ideal use case to study and demonstrate Diagrammatization. Details on the characterization of murmur diagrams in terms of the Diagrammatization design dimensions are provided as follows:

• Consistent system of representation (ontology): Key concepts are audio volume (amplitude) over time, normal "lub" (S1) and "dub" (S2) sounds, and abnormal murmur sounds. Murmurs can be systolic or diastolic, have shape categories with specific slopes (crescendo, decrescendo, uniform) and may include "clicks".

• Rules to interpret representation: Base: represent heart sounds with phonocardiograms (PCG) and draw amplitude over time. Annotations: S1 and S2 positions are demarcated as tall rectangles, and murmur shapes are drawn with multi-part straight lines. These conventions help with drawing, reading, and evaluating the diagrams.

• Categorical and continuous level of states: For each diagnosis, the murmur shape must fit a categorical profile, but there is some flexibility (e.g., slope steepness, time span length) to support continuous variation in observations.

• Bounded content expressivity: Murmur diagrams emphasize murmur shapes, and are bounded to show the amplitude. They do not represent other information, such as pitch, stethoscope position, and sound radiation.

• High physical and conceptual homomorphism: All clinicians are trained to interpret murmur diagrams. These diagrams can be overlaid on PCGs and intuitively represent how sound volume changes over time.

• Geometrical inherent constraints: Murmur shapes are geometrically constrained to be between S1-S2 or S2-S1 and have positive, negative, or flat slopes. The shapes should also fit the amplitude data optimally.

[0095] These describe how diagrams are expressive, constrained, and conventional to convey murmur shape hypotheses from heart sounds to explain cardiac diagnosis. The technical approach for the AI to perform abductive and diagrammatic reasoning, and generate diagrammatic explanations is described in the following sections. By demonstrating the AI’s independent ante-hoc reasoning, which is clinician-like, embodiments of the present invention aim to increase its trustworthiness for clinicians.

Technical Approach

[0096] An explainable model to predict cardiac diagnosis from phonocardiograms (PCG) is disclosed. Following clinical practice, the model generates diagrammatic explanations with murmur diagrams based on sound. In the following sections, the data source, data preparation, baseline modeling, problem formulation of murmur shapes, DiagramNet model, and alternative model for cardiac diagnosis prediction are described.

Heart Auscultation Dataset, Data Preparation, and Annotation

[0097] Models to predict cardiac diagnoses are trained using the dataset by Yaseen, Gui-Young Son, and Soonil Kwon, 2018, Classification of heart sound signal using multiple features, Applied Sciences 8, 12 (2018), 2344. The dataset comprises 1000 audio recordings of heart cycles, each 1.15-3.99s long and sampled at 8 kHz. In the dataset, there are 200 recordings of each disease/diagnosis: normal (N), AS, MR, MVP, and MS.

[0098] The 1000 recordings are pre-processed into 14.7k instances, which is sufficiently large for deep learning (achieving 86.0% for a base CNN, and 96.8% for the proposed model). Each .wav audio file is processed into multiple time series 1D tensors. To classify auscultations starting at any time point, instances are created based on sliding windows, with window length 1.0s (8000 samples) and stride 0.1s. The window length can be chosen such that each instance will likely contain only one heart cycle with 0 or 1 murmur, thus simplifying predictions. In total, there are 14,672 instances, which are split into training and test sets with a 50% ratio. Further, it was ensured that all time windows from the same original audio file occur only in the training set or only in the test set, not both. The amplitude of the audio time series a = d(x) is extracted to estimate murmur shapes.
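The sliding-window preparation described above can be sketched as follows. This is a minimal sketch using only the stated sampling rate, window length, and stride; the function names, the recording identifiers, and the silent placeholder signals are hypothetical, and the handling of a trailing partial window (dropped here) is an assumption.

```python
# Sketch of sliding-window instance creation; all names are hypothetical.
SAMPLE_RATE = 8000            # Hz, as stated for the dataset
WINDOW = 1 * SAMPLE_RATE      # 1.0 s window = 8000 samples
STRIDE = SAMPLE_RATE // 10    # 0.1 s stride = 800 samples


def sliding_windows(signal, window=WINDOW, stride=STRIDE):
    """Return every full-length window of `signal`, stepping by `stride` samples.

    Assumption: a trailing partial window is dropped rather than padded.
    """
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, stride)]


def split_by_recording(instances, test_recordings):
    """Split (recording_id, window) pairs so that no recording spans both sets."""
    train = [w for rid, w in instances if rid not in test_recordings]
    test = [w for rid, w in instances if rid in test_recordings]
    return train, test


# Two placeholder recordings at the shortest and longest stated durations.
recordings = {"rec_a": [0.0] * 9200, "rec_b": [0.0] * 31920}  # 1.15 s and 3.99 s
instances = [(rid, w) for rid, sig in recordings.items() for w in sliding_windows(sig)]
train, test = split_by_recording(instances, test_recordings={"rec_b"})
print(len(train), len(test))
```

Splitting on the recording identifier before any shuffling is what keeps overlapping windows of the same audio file from leaking between the training and test sets.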

[0099] The dataset only contains diagnosis labels, but lacks annotations about murmur locations. Thus, the segments of when the murmur occurs are manually annotated to derive τ 1 and τ L as the murmur start and end (last) times, respectively. These are used for the supervised training of the murmur segment predictions. A nonlinear function describing the correct murmur shape is then fitted to the data in this segment. This provides ground truth estimates of shape parameters θ = (τ, π), where τ and π are time and slope parameters, respectively. Details are described later.

Base Prediction Model for Cardiac Diagnosis Prediction

[0100] In the present disclosure, each audio time series is treated as a one-dimensional image, since all instances are fixed-length and single-channel. The displacement x and amplitude a are then concatenated into a 2-channel "image". To assess the performance of the proposed model in accordance with embodiments of the present invention, a base convolutional neural network (CNN) as model M o on (x, a) to predict cardiac disease y 0 is trained (see Fig. 6). Although M o can indicate the probability-like disease risk of diagnoses, this could be spurious, since it does not consider the murmur shapes of each diagnosis. Instead, a more reliable model should explicitly encode constraints that are domain relevant.

Formalization of Murmur Shapes as Piecewise Linear Functions

[0101] To enable the model in accordance with embodiments of the present invention to predict various murmur shapes, these shapes are formulated as parametric nonlinear functions over time, y(t). Since murmur shapes are defined with crescendo, decrescendo, and uniform slopes, each slope is approximated as a line. Thus, the total murmur shape can be modelled as a piecewise linear function, instead of other less relevant families of functions, e.g., sum of polynomials (Taylor series) or sine/cosine (Fourier series). A Taylor series would include spurious artifacts due to the mathematical fit being clinically irrelevant, and be unintuitive to interpret for non-mathematical applications. A Fourier series, which spectrograms actually represent, would capture important frequency information in murmurs, but does not emphasize the recognizable murmur shapes.

[0102] All candidate murmur shapes share the murmur segment start τ 1 and end τ L time parameters, but can have varying numbers of time τ and slope π parameters depending on the complexity of the shape. Crescendos are modeled as lines with positive slope, decrescendos as lines with negative slope, and uniform with 0 slope. Fig. 7 illustrates the murmur shapes mathematically with relevant parameters, and their shape function f y (t) equations, where [] represents the Iverson bracket, which is 1 if its internal expression is true and 0 otherwise:

1) Normal (N), i.e., absence of AS, MR, MVP and MS, has no murmurs, so murmur segment start τ 1 and end τ L are undefined (∅), and f N (t) = 0 by definition.

2) Aortic stenosis (AS) has a crescendo-decrescendo murmur defined with positive slope π 1 from τ 1 to τ 2 and negative slope -π 2 from τ 2 to τ L . The vertical position of the shape is anchored by the intercept term π 0 .

3) Mitral regurgitation (MR) has a uniform murmur between τ 1 and τ L at amplitude level π 0 .

4) Mitral valve prolapse (MVP) murmurs start with a "click" which we model as a short crescendo-decrescendo with slopes π 1 and -π 1 from τ 1 through τ 2 to τ 3 . The uniform murmur spans from τ 3 to τ L with 0 slope. If there is no subsequent MR murmur, then the region with uniform slope would just have 0 amplitude.

5) Mitral stenosis (MS) has a very similar shape to that of MVP, but it ends with a crescendo with positive slope π 2 from τ 4 to τ L . Also note that this murmur happens in the diastolic heart phase, not systolic.

DiagramNet: Diagrammatic Network with Abductive Explanations of Murmur Shapes
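Each hypothesis is evaluated through its shape function, so a small sketch of two of the piecewise linear shapes formalized above may help. The exact equations are those of Fig. 7, which is not reproduced in this text; the forms below are an assumed reading of the AS and MR descriptions, and the function names, the integer time grid, and the synthetic amplitudes are hypothetical.

```python
# Assumed forms of the AS and MR shape functions; the authoritative equations
# are in Fig. 7. All names and numeric values here are hypothetical.

def f_as(t, tau1, tau2, tau_l, pi0, pi1, pi2):
    """Crescendo-decrescendo: slope pi1 on [tau1, tau2), slope -pi2 on [tau2, tau_l]."""
    rising = tau1 <= t < tau2         # Iverson bracket [tau1 <= t < tau2]
    falling = tau2 <= t <= tau_l      # Iverson bracket [tau2 <= t <= tau_l]
    apex = pi0 + pi1 * (tau2 - tau1)  # amplitude reached at tau2
    return rising * (pi0 + pi1 * (t - tau1)) + falling * (apex - pi2 * (t - tau2))


def f_mr(t, tau1, tau_l, pi0):
    """Uniform murmur at amplitude level pi0 on [tau1, tau_l], zero elsewhere."""
    return (tau1 <= t <= tau_l) * pi0


def mse(shape_values, amplitudes):
    """Mean squared error between a candidate shape and the observed amplitudes."""
    return sum((s - a) ** 2 for s, a in zip(shape_values, amplitudes)) / len(amplitudes)


times = list(range(11))  # arbitrary integer time steps for the sketch
observed = [f_as(t, 2, 5, 8, 1.0, 1.0, 1.0) for t in times]  # synthetic AS-like murmur

d_as = mse([f_as(t, 2, 5, 8, 1.0, 1.0, 1.0) for t in times], observed)
d_mr = mse([f_mr(t, 2, 8, 2.5) for t in times], observed)
print(d_as, d_mr)  # the hypothesis with the lower lack of fit is preferred
```

In this toy case the AS shape reproduces the synthetic amplitudes exactly, so its lack of fit is lower than that of the flat MR shape.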

[0103] DiagramNet is a deep neural network meta-architecture to infer a prediction and perform abductive-deductive reasoning to infer the best explanation that is consistent with the observation and prediction. While standard neural networks tend to learn spurious and unintelligible neural activations, the modular approach in accordance with embodiments of the present invention satisfies domain constraints. This shares a similar goal as physics-informed neural networks (PINNs), but human mental models are encoded instead of physical laws. In the present disclosure, the DiagramNet is implemented for the application of diagnosing cardiac diseases by way of example. It should be noted that the DiagramNet is generalizable to other domains with formalized hypotheses.

[0104] Fig. 8A shows a schematic diagram of an exemplary modular architecture of the DiagramNet described above. As shown in Fig. 8A, the DiagramNet model has seven stages that correspond in sequence to the 4-step Peircean abductive reasoning process described previously. Details of each of the seven stages are described below.

[0105] Stage 1) Audio displacement and amplitude inputs: The DiagramNet receives the audio data as displacement x, extracts amplitude information a, and concatenates them as a 2-channel 1D tensor (x, a). Although the convolutional layers of the CNN previously described in the “Base Prediction Model for Cardiac Diagnosis Prediction” could learn frequency information from x, explicitly computing a makes it easier for the model to learn patterns from amplitude.

[0106] Stage 2) Murmur segmentation: Next, the 2-channel 1D tensor (x, a) is input into a U-Net model M m to predict the time region of the murmur m, defined as a mask vector. U-Net, a popular model for image segmentation, is applied to the time series data by treating the data as a 1D "image" and leveraging its convolutional filters to find temporal motifs as spatial patterns. However, it may suffer from over-segmentation by inferring multiple regions of murmurs in a single instance, although there should only be one. To avoid over-segmentation, a smoothing loss regularization can be applied using the truncated mean squared error, where e t = (log m(t) − log m(t − 1))² is the square of the log differences and ε is the truncation hyperparameter. However, this may still result in more than one segment in a single instance, hence the segment with the longest time span among the remaining segments is chosen.
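The two safeguards against over-segmentation can be sketched as below. The full loss expression is not legible in this text, so the sketch assumes the common form of a truncated mean squared error in which each squared log difference e t is capped at ε before averaging; that choice, the function names, and the example mask values are assumptions rather than the disclosed implementation.

```python
import math


def truncated_mse(mask_probs, eps):
    """Mean of squared log differences e_t between consecutive mask values.

    Assumption: truncation caps each e_t at eps. Mask values must be > 0
    so that the logarithm is defined.
    """
    errors = []
    for prev, curr in zip(mask_probs, mask_probs[1:]):
        e_t = (math.log(curr) - math.log(prev)) ** 2
        errors.append(min(e_t, eps))
    return sum(errors) / len(errors)


def longest_segment(binary_mask):
    """Return (start, end) of the longest run of 1s, end exclusive, or None if there is none."""
    best = None
    start = None
    for i, value in enumerate(list(binary_mask) + [0]):  # sentinel closes a trailing run
        if value and start is None:
            start = i
        elif not value and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


smooth = truncated_mse([0.5, 0.6, 0.5], eps=4.0)  # small penalty for a smooth mask
jumpy = truncated_mse([0.1, 0.9, 0.1], eps=4.0)   # each jump is capped at eps
segment = longest_segment([0, 1, 1, 0, 1, 1, 1, 0])
print(smooth, jumpy, segment)
```

Capping each term limits how much a single sharp transition, such as the genuine murmur onset, can dominate the penalty, while many small fluctuations are still discouraged.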

[0107] Stage 3) Initial prediction: In this stage, the embedding representation learned from murmur segmentation is fed into fully-connected layers F_y to predict an initial diagnosis ŷ_0. This would be more accurate than a base CNN, since it benefits from the added multi-task learning to predict m̂ too. This is equivalent to System I thinking of the dual process theory, which is intuitive and quick, while the full Peircean abduction process is equivalent to System II thinking, which is rational and slower. Additionally, the murmur heart phase can be inferred as diastolic or not (systolic).

[0108] Stage 4) Hypotheses initialization: In this stage, possible hypotheses y (N, AS, MR, MVP, MS) are enumerated and their corresponding murmur shape functions f_y(t) and parameters are retrieved. First, the murmur amplitude is extracted by applying the murmur segment m̂ as a mask on the full amplitude, i.e., ã_m = a ∘ m̂, where ∘ is the Hadamard element-wise multiplication. Murmur shapes focused on this region can then be estimated. The shape parameter values are initialized using heuristics based on typical characteristics defined in Fig. 9 (it should be noted that, in Fig. 9, a represents all amplitudes in the observation, ā is the average amplitude during the murmur, a(t) represents the amplitude at time t, Δτ_12 = τ_2 − τ_1, Δτ_4L = τ_L − τ_4, and the subscript 0.5 applied to (Δτ) denotes the median of all training set instances for Δτ). These do not have to be very accurate, since they can be optimized at a later stage. A brief description of the heuristics is provided as follows:

1) Normal (N): No parameters are estimated, since no murmur is expected.

2) Aortic stenosis (AS): The apex of the crescendo-decrescendo is estimated to occur at the time τ_2 = argmax_t(a) of highest amplitude max(a). π_0 is just the amplitude at a reference time (see Fig. 9), π_1 is the slope from the murmur start to the apex, and π_2 ~ π_1.

3) Mitral regurgitation (MR): The shape is a flat line at the average amplitude of the murmur segment.

4) Mitral valve prolapse (MVP): The apex at τ_2 is estimated in the same way as for AS, and τ_3 is estimated to occur at twice the distance from τ_1 to τ_2. π_0 and π_1 are calculated the same way as for AS.

5) Mitral stenosis (MS): Estimating time parameters with heuristics performs poorly, hence a data-driven approach is adopted by using the median of time differences from the training dataset. These are calculated for τ_2 and τ_4 relative to the murmur start τ_1 and end τ_L times, respectively. τ_3, π_0 and π_1 are calculated the same way as for MVP.
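
By way of a non-limiting illustration, the masking step and two of the simpler heuristics above (AS and MR) may be sketched as follows. The function names, the dictionary keys, and the use of sample indices as time units are assumptions for illustration only; the full set of heuristics is the one defined in Fig. 9.

```python
def masked_amplitude(a, m):
    """Hadamard (element-wise) product of amplitude a and binary mask m."""
    return [ai * mi for ai, mi in zip(a, m)]

def init_aortic_stenosis(a_m, m):
    """AS heuristic: apex at the highest amplitude, slope from murmur start to apex."""
    idx = [t for t, mi in enumerate(m) if mi]
    tau_1 = idx[0]                                  # murmur start
    tau_2 = max(idx, key=lambda t: a_m[t])          # time of highest amplitude
    pi_1 = (a_m[tau_2] - a_m[tau_1]) / max(tau_2 - tau_1, 1)
    return {"tau_2": tau_2, "pi_1": pi_1}

def init_mitral_regurgitation(a_m, m):
    """MR heuristic: flat line at the mean amplitude inside the murmur segment."""
    idx = [t for t, mi in enumerate(m) if mi]
    return {"level": sum(a_m[t] for t in idx) / len(idx)}

a = [0.1, 0.1, 0.2, 0.4, 0.6, 0.4, 0.2, 0.1, 0.1]
m = [0, 0, 1, 1, 1, 1, 1, 0, 0]
a_m = masked_amplitude(a, m)
theta_as = init_aortic_stenosis(a_m, m)
theta_mr = init_mitral_regurgitation(a_m, m)
```

These initial values only need to be roughly right, since they serve as the starting point for the later optimization stage.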

[0109] Stage 5) Hypotheses optimization: Using the initial shape parameter values θ_0, the murmur shapes are computed for all diagnoses. These may not fit well, so the fit is optimized using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to minimize the shape fit MSE, i.e., to find the parameters θ for which the shape function h_m deviates least, in the mean squared sense, from the murmur amplitude ã_m. This optimization is similar to approaches used in activation maximization and CLIP, and obtains murmur shapes for all diagnoses that best fit the murmur amplitude.
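
A minimal sketch of this optimization, under stated assumptions, is given below. It assumes NumPy and SciPy are available and uses SciPy's bounded variant `L-BFGS-B` (without bounds) as a stand-in for L-BFGS. The piecewise-linear `crescendo_decrescendo` shape, its parameter ordering, and the synthetic amplitude are hypothetical and serve only to illustrate fitting a shape function to a murmur amplitude.

```python
import numpy as np
from scipy.optimize import minimize

def crescendo_decrescendo(t, theta):
    """Piecewise-linear shape: slope pi_1 up to the apex at tau_2, slope pi_2 after."""
    pi_0, pi_1, pi_2, tau_2 = theta
    apex = pi_0 + pi_1 * tau_2
    return np.where(t <= tau_2, pi_0 + pi_1 * t, apex + pi_2 * (t - tau_2))

def shape_fit_mse(theta, t, a_m):
    """Mean squared error between the shape function and the murmur amplitude."""
    return float(np.mean((crescendo_decrescendo(t, theta) - a_m) ** 2))

# Synthetic murmur amplitude with an apex at t = 0.4 (illustrative data only).
t = np.linspace(0.0, 1.0, 101)
a_m = np.where(t <= 0.4, 0.2 + 1.5 * t, 0.8 - 1.0 * (t - 0.4))

theta_0 = np.array([0.1, 1.0, -1.0, 0.5])   # rough heuristic initialization
mse_init = shape_fit_mse(theta_0, t, a_m)

# Gradients are approximated numerically because no jac argument is supplied.
result = minimize(shape_fit_mse, theta_0, args=(t, a_m), method="L-BFGS-B")
theta_hat, mse_fit = result.x, result.fun
```

In a full model the same fit would be repeated once per diagnosis hypothesis, each with its own shape function and initial parameters.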

[0110] Stage 6) Hypotheses evaluation: The MSE lack-of-fit d is calculated for each murmur shape function for each diagnosis y against the amplitude of the inferred murmur segment ã_m. This stage performs deductive reasoning to evaluate how well each murmur shape fits the observed instance. It is noted that more expressive shape functions may subsume simpler ones, leading to hypothesis overfitting. For example, Fig. 14(d) shows MS overfitting for MVP; but this omits the rule that MS murmurs only occur in diastole, not systole, which can distinguish between MVP and MS. Hence, the model can leverage the inferred heart phase to resolve hypothesis overfitting. To do so, d and the inferred heart phase φ_0 are input to fully-connected layers F_yh to predict the hypothesis-driven prediction ŷ_h. This inference only uses shape fit and heart phase information, and does not need detailed amplitude information.
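
The interplay between lack-of-fit and heart phase can be illustrated with the following sketch. It replaces the fully-connected layers with a hard rule (an assumption made for clarity, not the disclosed implementation): hypotheses inconsistent with the heart phase are discarded, and the remaining hypothesis with the lowest MSE is selected. The names `evaluate_hypotheses`, `SYSTOLIC`, and `DIASTOLIC` and the toy shapes are hypothetical.

```python
SYSTOLIC = {"AS", "MR", "MVP"}   # systolic murmurs named in the text
DIASTOLIC = {"MS"}               # MS murmurs occur only in diastole

def mse(h, a_m):
    """MSE lack-of-fit between a shape h and the murmur amplitude a_m."""
    return sum((hi - ai) ** 2 for hi, ai in zip(h, a_m)) / len(a_m)

def evaluate_hypotheses(shapes, a_m, diastolic):
    """Return lack-of-fit d per diagnosis and the best phase-consistent diagnosis."""
    d = {y: mse(h, a_m) for y, h in shapes.items()}
    allowed = DIASTOLIC if diastolic else SYSTOLIC
    consistent = {y: v for y, v in d.items() if y in allowed}
    best = min(consistent, key=consistent.get)
    return d, best

a_m = [0.6, 0.3, 0.3, 0.3, 0.3]
shapes = {
    "AS": [0.2, 0.4, 0.6, 0.4, 0.2],
    "MR": [0.36] * 5,
    "MVP": [0.6, 0.3, 0.3, 0.3, 0.3],
    "MS": [0.6, 0.3, 0.3, 0.3, 0.3],   # more expressive shape that fits equally well
}
d, best = evaluate_hypotheses(shapes, a_m, diastolic=False)
```

Here MVP and MS tie on lack-of-fit, and only the heart phase separates them, which mirrors the overfitting case discussed above.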

[0111] Stage 7) Final combined prediction: Finally, ensemble learning is performed with fully-connected layers F_y using both the hypothesis-driven prediction ŷ_h and the initial prediction ŷ_0 to predict the final diagnosis ŷ. This completes the model decision-making process as: a) initial prediction, b) explanatory hypothesis and evaluation, c) resolved prediction and explanation.

[0112] In other words, embodiments of the present invention provide a method for predicting a type of a cardiac disease (y) in a subject. The method can be implemented with system 800 shown in Fig. 8B, which shows a schematic diagram of the system 800 for predicting a type of a cardiac disease (y) in a subject. The system 800 can include a processing device 802. With reference to Fig. 8C, the method 810 includes the following steps.

[0113] Step 812: receiving, by a processing device, a vector representation of a time series data (x) associated with an activity of a heart of the subject.

[0114] Step 814: extracting, using the processing device, a vector representation of an amplitude (a) of the vector representation of the time series data (x) using a first neural network (A).

[0115] Step 816: generating, using the processing device, a prediction of a vector representation of a mask (m) based on the vector representation of the amplitude (a) using a second neural network (M_m).

[0116] Step 818: applying, using the processing device, the vector representation of the mask (m) on the vector representation of the amplitude (a) to obtain a vector representation of an amplitude in a region of interest (ã_m).

[0117] Step 820: determining, using the processing device, a set of shape parameters (θ_0) of the amplitude in the region of interest (ã_m) based on a set of heuristics (Θ_0), wherein the set of shape parameters (θ_0) defines a shape function (h_m) of the amplitude in the region of interest (ã_m), wherein the shape function (h_m) is based on a predetermined shape function representing the cardiac disease (y) in the subject.

[0118] Step 822: performing, using the processing device, shape fitting of the shape function (h_m) with the amplitude in the region of interest (ã_m).

[0119] Step 824: calculating, using the processing device, a Mean Squared Error (MSE) lack of fit (d) of the shape fitting, wherein the calculated MSE lack of fit (d) is used to predict a type of the cardiac disease (y) in the subject.

[0120] With reference to Fig. 8D, the method 830 may further include a step 832 of iteratively performing, using the processing device, optimization of the set of shape parameters (θ_0) using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters (θ), wherein the optimized set of shape parameters (θ) defines an optimized shape function of the amplitude in the region of interest (ã_m), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease (y) in the subject, wherein the method further comprises: performing, using the processing device, shape fitting of the optimized shape function (h_m) with the amplitude in the region of interest (ã_m) such that the MSE lack of fit (d) is minimized.

[0121] As shown in Fig. 8E, the method 840 may further include a step 842, wherein generating, using the processing device, the prediction of the vector representation of the mask (m) further comprises generating an embedding vector representation of the mask (z_m). The method may further include generating, using the processing device, an initial prediction (ŷ_0) of the cardiac disease (y) based on the embedding vector representation of the mask (z_m) using a third neural network (F_y).

[0122] With reference to Fig. 8F, the method 850 may further include a step 852 of determining, using the processing device, a phase of the heart cycle (φ_0) based on the initial prediction (ŷ_0).

[0123] The method 850 may further include a step 854 of generating, by the processing device, an intermediate diagrammatic explainable prediction (ŷ_h) of the cardiac disease (y) in the subject based on the phase of the heart cycle (φ_0) and the calculated MSE lack of fit (d) using a fourth neural network (F_yh).

[0124] The method 850 may also further include a step 856 of generating, by the processing device, a final diagrammatic explainable prediction (ŷ) of the cardiac disease (y) in the subject based on the initial prediction (ŷ_0) and the intermediate diagrammatic explainable prediction (ŷ_h) of the cardiac disease (y) in the subject using a fifth neural network (F_y).

[0125] In addition, it should be understood that the same architecture may be used for the aforementioned neural networks. For example, the same base CNN model previously described may be used for the aforementioned neural networks. Further, as described in the later sections, a skilled person would understand that the aforementioned methods are not limited to be only applied to the prediction of the type of the cardiac disease but may also be applied in other suitable applications.

[0126] In summary, the technical approach (Stages 1-7) follows the Peircean abduction process:

I. Observe event by observing displacement to interpret its amplitude (Stage 1), perceive the murmur location (Stage 2), and infer the heart phase in which the murmur occurred (Stage 3).

II. Generate plausible explanations by enumerating diagnoses, retrieving respective murmur shape functions, and initializing their corresponding shape hypotheses (Stage 4).

III. Evaluate plausibility by fitting each hypothesis to the observation (Stage 5) and evaluating the rules in terms of shape goodness-of-fit in conjunction with matching the murmur heart phase (Stage 6).

IV. Resolve explanation with the hypothesis-fitted inference (System II thinking) and the initial inference (System I thinking) to make a final inferred diagnosis (Stage 7).

Alternative Prediction model with Spectrogram Input and Saliency Map Explanation

[0127] Fig. 10 shows a spectrogram-based CNN classifier. Spectrograms are popular to extract features from high-frequency time series data, and have been used to extract features from heart auscultation. They show how the frequency (pitch) of the signal (y-axis) changes over time (x-axis), by indicating the magnitude of specific frequency components at each pixel. In the present disclosure, each spectrogram is represented as a 3D tensor (frequency, time, magnitude) with raw magnitude numeric values, rather than as a colored 2D image that would be biased by the color map used. To capture the temporal motifs of frequency patterns in spectrograms, a CNN is trained to learn convolutional filters that activate if specific spatial patterns are detected. Specifically, the mel spectrogram s is used, since it is more sensitive to variations in lower frequencies.

[0128] Saliency maps are popular to explain which pixels were important for image-based predictions with CNNs. For spectrograms, this indicates the frequencies at specific times that the model focused on. In the present disclosure, Gradient-weighted Class Activation Mapping (Grad-CAM) is implemented to generate the saliency explanation e_s. Despite their popularity, it is argued that using saliency maps neglects the interpretability needs of domain experts. Specifically, clinicians are not trained on interpreting spectrograms, thus it is hypothesized that this saliency map, spectrogram-saliency, is less appropriate than the murmur-shape diagrammatic explanation. A simplified saliency map explanation is also provided in the present disclosure to show importance by time, time-saliency e_t, by aggregating all saliency across frequencies f. This is simpler and does not require the user to understand spectrograms or note frequencies.
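
As a minimal sketch, the aggregation from spectrogram-saliency to time-saliency may be written as a sum over the frequency axis. The function name `time_saliency`, the frequency-by-time row layout, and the toy values are assumptions for illustration; producing the Grad-CAM map itself is outside the scope of this sketch.

```python
def time_saliency(spectrogram_saliency):
    """Sum a (frequency x time) saliency map over frequency: one value per time step."""
    n_time = len(spectrogram_saliency[0])
    return [sum(row[t] for row in spectrogram_saliency) for t in range(n_time)]

# Toy saliency map e_s with 3 frequency bins (rows) and 4 time steps (columns).
e_s = [
    [0.0, 0.1, 0.7, 0.2],
    [0.0, 0.2, 0.9, 0.1],
    [0.1, 0.0, 0.4, 0.0],
]
e_t = time_saliency(e_s)
most_salient_time = max(range(len(e_t)), key=lambda t: e_t[t])
```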

[0129] Despite their use for AI heart auscultation, explaining with saliency maps on spectrograms is not advocated. Indeed, they are overly technical, non-relatable, less interpretable, and less trustworthy. Instead, clinical AI explanations need to be abductive and diagrammatic to be more trustworthy.

Evaluations

[0130] Evaluations on Diagrammatization and DiagramNet are made in multiple stages as follows: 1) a demonstration study showing the interpretability of abductive explanations; and 2) a quantitative modeling study comparing DiagramNet against the baseline CNN models and other reasonable approaches.

Demonstration Study: Predictions and Explanations

[0131] The following section demonstrates how Diagrammatization can produce abductive (best), alternative contrastive, counterfactual, and case explanations.

a) Abductive explanations: DiagramNet selects the murmur shape most consistent with its prediction, thus inferring to the best explanation. Fig. 11 shows the best explanation for each diagnosis type. Users can see the predicted murmur segment shape as highlighted, and how the shape fits the amplitude time series optimally.

b) Contrastive explanations: Fig. 12A shows contrastive explanations for a case with MVP. The murmur shapes for MVP and MS fit best among all diagnoses. However, note that the MS murmur shape function has a higher degree than that for MVP, so it overfits to this murmur data. Furthermore, the murmur segment occurs during the systolic heart phase, not diastolic, so this cannot be an MS murmur. Thus, the diagnosis is MVP. See Fig. 12B for predicted shape parameters and goodness-of-fit MSE for each murmur shape (it should be noted that, in Fig. 12B, 0 indicates no parameter for that specific diagnosis. Bold numbers indicate evidence to predict towards the row’s diagnosis. τ_4 for the MS diagnosis is redundant, since τ_4 = τ_L; this indicates that the MS shape overfits to the data, both MVP and MS have the same MSE, and the MVP shape is sufficient).

c) Counterfactual explanations: These can be derived from the contrastive explanations to show how the murmur amplitude could be slightly different to be predicted as due to another diagnosis. See Fig. 13 for examples.

d) Case (example-based) explanations: Instances that have good fits for specific murmur shapes can be retrieved. Fig. 14 demonstrates several cases of the crescendo-decrescendo murmurs representative of AS. These can be used to compare with the current case, to review how diverse the shapes of the same hypothesis could be.

[0132] Each diagram in Figs. 11 to 14 is not in-and-of-itself an explanation or abductive reasoning, but has to be interpreted in the context of knowing that the AI can do diagrammatic, abductive reasoning. A user can be told how DiagramNet performs abductive-deductive reasoning in general: I) on observing the heart sound, II) hypothesize multiple diagnoses (AS, MR, MVP, MS) and conceive corresponding murmur shapes, III) evaluate how well the shapes fit the observation, and IV) resolve the most-likely diagnosis due to its best fit (MVP for Fig. 12A).

[0133] For the specific case in Fig. 12A, the user can interpret the diagrams as follows: 1a) Given the observed PCG, 1b) DiagramNet predicts a diagnosis ŷ = MVP. 2a) The user asks for an explanation, and 2b) is presented with a murmur diagram as illustrated in Fig. 12A(c), which visualizes what the AI predicted as the most likely explanation, i.e., the murmur shape that best fits the murmur amplitude. 3a) The user may ask for contrastive explanations of why alternative diagnoses were not made, and 3b) be shown multiple murmur diagrams with alternative hypotheses and their poor fits (Figs. 12A(a), 12A(b), 12A(d)). To supplement the murmur diagram explanation, the system could provide a verbal explanation with physical causes: e.g., “Mitral valve prolapse (MVP) is suspected from this phonocardiogram, because the tendon chords of the mitral valve are likely loose, causing the valve to snap open during systole (heard as a mid-systolic "click" sound), and allowing blood regurgitation (heard as a uniform low volume, high-pitch murmur). Further examination by echocardiography is recommended to confirm diagnosis.” Fully interpreting the abductive process requires seeing the evaluation of multiple hypotheses (Fig. 12A), while each diagram in Fig. 11 only gives an abbreviated view showing the best fit. Hence, from the diagrammatic explanations, the clinician can verify that DiagramNet performed hypothesis testing of multiple diagnoses and selected the most likely diagnosis based on evidence that its corresponding murmur heart phase and shape best fit the audio observation.

Modeling Study

[0134] Since the DiagramNet M is implemented as an ante-hoc explainable model, imbued with hypotheses (i.e., murmur shapes for each diagnosis), it is expected to perform better than other less-knowledgeable models. In this section, a quantitative study is conducted by comparing the prediction performance and explanation faithfulness of DiagramNet against other models. First, the models compared, evaluation metrics, and results are described below.

[0135] For models that are a subset of DiagramNet (M_0 and M_m), this also serves as an ablation study to examine how adding new architectural features improves performance. The models evaluated are:

1) M_0(x, a) = ŷ, base CNN model trained on displacement x and amplitude a to predict diagnosis.

2) M_s(s) = ŷ, base CNN model trained on spectrogram s to predict diagnosis.

3) M_τ(x, a) = (ŷ, τ̂), multi-task model to predict diagnosis, and murmur segment start and end times. This is trained with supervised learning from y labels and τ = (τ_1, τ_L) annotations. This does not consider spatial information.

4) M_θ(x, a) = (ŷ, θ̂), multi-task model to predict diagnosis, and murmur shape parameters. This is trained with y labels and θ = (τ, π) annotations. This does not consider spatial or geometrical information.

5) M_m(x, a) = (ŷ, m̂), encoder-decoder model to predict diagnosis, and murmur segment. Like M_τ, this identifies the murmur start and end times, but by using U-Net to predict pixel locations of the murmur. This models spatial information through the transpose-CNN layers, so it is expected to be more accurate than M_τ.

6) M_am(x, a) = (ŷ, â_m), encoder-decoder model to predict diagnosis, and murmur amplitude. This explanation indicates that the model can “see” where the murmur is and attempt to reconstruct it.

7) M(x, a) = (ŷ_0, ŷ, ĥ_m), DiagramNet with initial and final predicted diagnoses, and hypotheses explanations ĥ_m.

[0136] The above-described models are compared with DiagramNet using various measures of prediction performance (accuracy, sensitivity, specificity) and explanation faithfulness (murmur overlap, murmur parameter estimation errors). These were evaluated on a dataset of 7,262 1-sec instances. For each instance, evaluations are made based on the metrics described below:

• Prediction correctness (↑ better) of whether the predicted diagnosis matches the actual diagnosis, i.e., [ŷ == y]. Metrics for each diagnosis are aggregated and common metrics used in medicine are included.

° Accuracy is calculated by averaging correctness over the test set.

° Sensitivity (TP/(TP + FN)) measures how likely the model is to detect the actual disease.

° Specificity (TN/(TN + FP)) measures how likely the model is to avoid raising a false alarm.

• Murmur segment Dice coefficient (↑ better) measures the overlap between predicted m̂ and actual m murmur segments, i.e., 2(m̂ · m)/(|m̂|^2 + |m|^2). For M_τ and M_θ that only predict parameters, m̂ = [τ̂_1 < t < τ̂_L] is computed.

• Murmur segment parameter MSE (↓ better) indicates how well the model predicted the start τ_1 and end τ_L time parameters of the murmur, i.e., ε_τ = ||τ − τ̂||^2.

• Murmur shape parameters MSE (↓ better) indicates how well the model predicted the murmur shape function parameters, i.e., ε_θ = ||θ − θ̂_y||^2, where θ and θ̂_y are the actual and predicted parameters for the correct diagnosis y.

• Murmur shape fit MSE (↓ better) indicates how well the predicted murmur shape ĥ_m (or reconstructed murmur amplitude â_m) fits the ground truth murmur amplitude ã_m, measured as the squared norm of their difference.
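
For illustration, the per-diagnosis sensitivity and specificity and the segment Dice coefficient can be computed as in the sketch below. It uses the standard one-vs-rest definitions of true and false positives and negatives; the function names and toy labels are hypothetical and do not reproduce the reported evaluation.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for one diagnosis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def dice(m_pred, m_true):
    """Dice overlap 2(m_pred . m_true) / (|m_pred|^2 + |m_true|^2) of two mask vectors."""
    inter = sum(p * t for p, t in zip(m_pred, m_true))
    denom = sum(p * p for p in m_pred) + sum(t * t for t in m_true)
    return 2 * inter / denom if denom else 1.0

y_true = ["AS", "AS", "MR", "N", "MVP", "AS"]
y_pred = ["AS", "MR", "MR", "N", "MVP", "AS"]
sens_as, spec_as = sensitivity_specificity(y_true, y_pred, positive="AS")

overlap = dice([0, 1, 1, 1, 0], [0, 0, 1, 1, 1])
```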

[0137] Figs. 15A and 15B show the performance of all seven models for four evaluation metrics. For base CNN models, predicting on the spectrogram (M_s) improved performance only very slightly over predicting on amplitude (M_0), suggesting that CNNs can already model frequency information with their convolution filters. Multi-task models (M_τ, M_θ) sacrificed diagnosis prediction accuracy to predict segment and shape parameters, yet still had high estimation errors and very inaccurate segment prediction. This suggests that merely treating parameters as stochastic variables to predict is less reliable than explicitly modeling spatial and geometric information. Encoder-decoder models (M_m, M_am) performed better by more accurately predicting diagnoses than base CNN models, could reasonably locate segment regions, and had moderately low shape parameter and fit estimation errors. DiagramNet was the best performing, with the highest diagnosis prediction accuracy and very low shape parameter and fit estimation errors. Due to training with backpropagation from m and y, even its initial diagnosis ŷ_0 was better than that of other models, though its hypothesis-driven prediction ŷ_h was weaker. Interestingly, despite murmur shape prediction ĥ_m being less expressive than murmur amplitude prediction â_m (since it predicts straight lines), its fit is still better (lower MSE). Hence, imbuing diagrammatic constraints in DiagramNet improved both its prediction performance and interpretability. Next, the diagnostic performance for each cardiac disease is examined. Fig. 16 shows the confusion matrices for the base CNN model and the three diagnostic stages of DiagramNet. The base CNN often confuses different diseases, unlike DiagramNet. In particular, the base CNN confuses MVP and MS due to their similar murmur shapes. When predicting on the murmur shape fits (ŷ_h), DiagramNet does confuse between systolic murmurs (AS, MR, MVP), but can accurately distinguish between MVP and MS because it considers the murmur heart phase φ_0. The combined diagnosis prediction ŷ ameliorates weaknesses in the initial and fit-based predictions to produce a very clean confusion matrix. Finally, Fig. 17 shows that DiagramNet has higher final sensitivity and specificity for all diagnoses than the base CNN.

Discussion

Diagrammatization to Support Human Cognition and User Domain Knowledge

[0138] Despite myriad XAI techniques, many have neglected the domain knowledge of users, thus leaving an interpretability gap. This goes beyond supporting human-centric XAI at the cognitive level by tailoring explanations to support specific reasoning processes, cognitive load limitations, uncertainty aversion, preferences, or relatability. This goes beyond social factors, or fitting contextual situations. Diagrammatization provides a basis to support user-centric XAI that satisfies human cognition and user domain expertise. This will allow users to interpret the AI explanations at a more useful, higher level, further fostering human-AI collaboration.

[0139] The present disclosure proposes for the AI explanation to be hypothesis-driven (with murmur shapes), rather than deferring to users to abductively infer their own hypotheses. The present disclosure is focused on evaluating diagrammatic explanations against popular saliency map explanations to clearly show the latter’s poor fit. In addition to reducing the user interpretability burden for abductive reasoning, this evaluates the need to follow diagrammatic conventions of the expert domain.

[0140] Although segmentation is a common prediction task in AI, it can be appreciated that some application developers may not immediately consider segmentations to be explanations. However, saliency maps are a specific form of image localization, which also includes segmentation. Thus, segmentation is a valid approach for explaining image predictions. The technical approach described in the present disclosure uses segmentation integrally for the AI prediction. A similar argument can be made for shape-fitting hypotheses being merely fitted lines. Yet, clinicians explain their diagnoses on PCGs by drawing simplified line diagrams describing murmur shapes, thus DiagramNet automates this explanatory process. The technical approach also creates an intelligent AI to apply its knowledge of known murmur shapes to real data, thus performing abduction to the best murmur shape on its own. The shape fitting is not done post-hoc after the AI has made its prediction, but rather explicitly encoded as part of its reasoning ante-hoc. Collectively, the murmur shape diagram explanations from DiagramNet mimic the clinician reasoning process to identify where the murmur is (segmentation), abductive-deductively infer the most likely murmur shape (hypothesis evaluation), and resolve the explanations to make a coherent diagnosis. Thus, with Diagrammatization, the AI can autonomously generate its hypotheses and evaluate them to derive its prediction.

Generalizing Diagrammatization to Other Domains

[0141] The approach for Diagrammatization described in the present disclosure requires tailoring to specific applications. This helps to solve practical problems more concretely. The general approach for applying Diagrammatization to other applications is described hereafter:

1) Study the concepts and decision processes in the application domain.

• Identify the system of representation, its conventions for interpretation and rules for evaluation.

• Identify the structured hypotheses for abductive-deductive reasoning.

2) Formalize the representation and rules mathematically.

3) Implement the formal specifications in a predictive AI model, taking note to identify specific stages.

4) Evaluate with domain experts to check consistency with the user mental model of the domain problem.

[0142] Abduction is inference to the best explanation, which is implemented as hypothesis fitting. This goes beyond curve fitting of line graphs, and can include rule fitting, as with the conjunction with heart phases in the present disclosure and in prior works. Abduction can also be implemented for other domains that reason with other representations. Another application to generalize Diagrammatization can include, for example, skin cancer diagnosis using computer vision and explaining with the ABCDE criteria. Instead of merely rendering a saliency map or stating concept influences for XAI, explanatory diagrams can be drawn. Asymmetry can be explained by bisecting the image and showing whether the two halves differ from each other by more than a threshold. Border smoothness can be shown by tracing the lesion outline, calculating the curvature of the curve, and comparing against a threshold. Color can be assessed by highlighting parts of the lesion with different pigments and comparing to a threshold of contrast ratio. Diameter can be shown by drawing a bounding circle and diameter with length reading, and comparing against the 6 mm threshold. Thus, the model would be more trustworthy, since it can demonstrate the same geometrical measurements as a medical expert.
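
As a rough illustration of the asymmetry check, the sketch below mirrors a binary lesion mask about the vertical center line of the image and counts the pixels where the mask and its mirror disagree. The choice of bisecting axis, the function name `asymmetry_area`, and the `threshold` value are assumptions for illustration only.

```python
def asymmetry_area(mask):
    """Count pixels where a binary lesion mask differs from its left-right mirror."""
    return sum(
        1
        for row in mask
        for v, mirrored in zip(row, reversed(row))
        if v != mirrored
    )

lesion = [
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
area = asymmetry_area(lesion)

threshold = 4                      # hypothetical threshold, in pixels
is_asymmetric = area > threshold
```

The differing pixels could themselves be highlighted on the image, so that the measurement doubles as the explanatory diagram.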

Generalizing DiagramNet to Other Applications

[0143] The modularity of DiagramNet helps in its generality, since each module has a distinct purpose (see Fig. 8A). Steps I, III, and IV of the Peircean abduction process can be performed automatically in DiagramNet, while Step II requires hand-coding by the developer, since this requires creative abduction to conceive potential hypotheses, which is an open research challenge. Although implementing Diagrammatization requires substantial formulation for each application, DiagramNet is generalizable to applications that use line diagrams. This requires extracting a line representation from the input instance, formulating the parametric function for each hypothesis, segmenting the diagram to identify the region to fit, and fitting the best hypothesis to the instance data. Fig. 21 shows three such examples.

a) Electrocardiograms (ECG) are another clinical diagram for cardiac diagnosis. Clinicians diagnose atrial flutter by inferring a "sawtooth" pattern (Fig. 21(a)). Like PCG, ECG is a time series with a high sampling rate, but instead of extracting amplitude a from the signal wave, the ECG trace signal x can directly be used. On segmenting the region of interest, the sawtooth pattern can be modelled as a piecewise linear function. Most of DiagramNet can be reused (Stages 2, 5-7 in Fig. 7) and only input x (Stage 1) and hypothesis functions f_h(θ) (Stage 5) need to be redefined.

b) Candlestick charts are a time series diagram used to analyse stock price. Unlike PCG, they represent time at a lower sampling rate (e.g., days, years). Each candlestick represents low, opening, closing, and high prices for each time period. Analysts look for chart patterns like “broadening top”, “descending triangle”, and “rising wedge” to anticipate how a stock would change (see column b of Fig. 21). To explain an imminent breakdown, a “descending triangle” explanation could be fit to a segment (τ_1, τ_L) with two lines (x_1, x_2). The bottom line can be estimated with the 10th percentile of the low price, x_1(t) = P(x_low, 10), and the hypotenuse line from the linear fit of high prices, x_2(t) = −wt + x_0, where w and x_0 are fit from data. Changes to DiagramNet are similar to those for ECG, where only Stages 1 and 5 in Fig. 8A need to be changed, and hypotheses are formulated with two linear functions instead.

c) Photographs of skin lesions with ABCDE annotations are an image-diagram method to diagnose skin cancer. This representation is very different from the audio time series described above. Consider the analysis of lesion Asymmetry (see column c, row i of Fig. 21): 1) extract the lesion outline as a geometric shape via edge detection (e.g., Sobel filter), 2) reflect the outline across a bisecting axis, and 3) compute the non-overlapping area a between the original and reflected outlines. The lesion is asymmetrical if the area a is larger than a threshold, thus explaining the malignant prediction. Changes to DiagramNet are also modest, as it already models time as a 1D image. Here, the image is a 2D tensor, and the outline extraction and reflection can be implemented heuristically to extract the area a (Step 1 in Fig. 7). The hypothesis that a exceeds the threshold can be evaluated with a statistical one-tailed t-test, i.e., against the null hypothesis that it does not (combined Stages 4 and 5).
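
By way of a non-limiting illustration, the two lines of a "descending triangle" hypothesis may be estimated as sketched below. The function name `descending_triangle`, the use of day indices as the time axis, the `inclusive` quantile method, the ordinary least-squares fit, and the toy prices are all assumptions made for illustration.

```python
import statistics

def descending_triangle(lows, highs):
    """Estimate the flat bottom line and the falling hypotenuse over one segment."""
    # Bottom line x_1: 10th percentile of the low prices.
    floor = statistics.quantiles(lows, n=10, method="inclusive")[0]
    # Hypotenuse x_2(t) = -w t + x_0: least-squares line through the high prices.
    ts = list(range(len(highs)))
    t_mean, h_mean = statistics.fmean(ts), statistics.fmean(highs)
    slope = (
        sum((t - t_mean) * (h - h_mean) for t, h in zip(ts, highs))
        / sum((t - t_mean) ** 2 for t in ts)
    )
    x_0 = h_mean - slope * t_mean
    w = -slope
    return floor, w, x_0

lows = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1]
highs = [15.0, 14.0, 13.2, 12.1, 11.0, 10.4]
floor, w, x_0 = descending_triangle(lows, highs)
```

A positive w together with a roughly flat floor is what the descending-triangle hypothesis expects; the lack of fit of these two lines would then play the role of d in Stage 6.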

[0144] The aforementioned domains rely on custom diagrams for diagrammatic explanations. However, off-the-shelf visualizations may be used, subject to inherent domain constraints. Finally, in its current implementation and from this discussion, DiagramNet can generalize to line diagrams, such as line graphs and geometric shapes. Indeed, Diagrammatization can be applied to other hypotheses that can be defined with functions. Future work is needed to investigate its general applicability and performance.

Comparing Diagrammatization to Visualization

[0145] While diagrams can refer to drawings or visualizations, Diagrammatization is defined as comprising three aspects: i) Peircean abduction, ii) Domain conventions, and iii) Peircean diagrams. Each aspect has specific benefits for XAI technical development and user experience. First, with Diagrammatization, the AI performs Peircean abductive reasoning to generate and evaluate specific hypotheses. This improves user trust, since the AI’s reasoning is human-like. This improves user experience by reducing the burden on users to have to do abduction fully on their own. Current use of visualizations in XAI is typically of convenient or basic charts and heatmaps that do not exploit domain hypotheses, thus requiring users to perform additional reasoning. There are also modeling benefits. Used in the ante-hoc approach, the hypotheses regularize and constrain the AI prediction, thus the AI’s reasoning and explanation would be less spurious than current XAI, and its prediction performance improved. Post-hoc visualization explanations do not change the original AI prediction, and hence do not improve the AI performance.

[0146] Second, diagrammatic explanations should be constrained by diagram conventions in the target domain. Thus, domain experts would be familiar with them and can interpret them efficiently and effectively. Current XAI visualizations typically use off-the-shelf charts and heatmaps that, while simple to read, are not necessarily relevant to domain experts who have been trained to use specific or sophisticated diagrams with implicit ontologies, conventions, and rules. In the present disclosure, although murmur shapes take the rudimentary form of line charts, this work is the first to explain with shape-based murmur diagrams that are clinically relevant, since XAI developers over-rely on traditional charts and heatmaps that are better suited to data scientists. With Diagrammatization, XAI developers are encouraged to consider how domain experts explain with their own conventions in order to develop more sophisticated, domain-relevant visualizations.

[0147] Third, the definition of diagrams by Peirce and other philosophers is implemented in embodiments of the present invention to encompass graphical (visualization), symbolic (math, equations), and sentential (verbalization) representations (see Fig. 3A). In the present disclosure, Diagrammatization is examined with the visualization of murmur diagrams.

Comparing Diagrammatization to Rationalization

[0148] It is shown that diagrammatic explanations are useful, but other forms are also useful. For example, in heart auscultation, a clinician may explain that a patient has aortic stenosis (AS) because “the aortic valve leaflets are abnormally stiff due to calcification, causing the valve to have difficulty opening and closing, producing a crescendo-decrescendo murmur sound”. This verbal explanation is comprehensive, and actually consists of multiple representation types. How each representation of verbal explanation compares with Diagrammatization is discussed as follows.

1) The verbal explanation is a rationalization which makes assumptions beyond the direct evidence (PCG) of the observation, since the clinician did not directly observe the calcification of the aortic valve; that would require an echocardiogram to confirm, which is a different causal pathway than what is available from the first line auscultation (see Fig. 22). Besides, automatic rationalization may be spurious or irrelevant since it depends on unbounded text.

2) It only explains at a low level, from the physical concepts to the murmur shape description; there remains a gap from the shape description to the audio observation. DiagramNet overlays its explanation (the murmur shape) on the audio representation (PCG) to explicitly show its hypothesis in the context of the observation.

3) It is a concept-based explanation that requires knowledge of valves, their properties and location, and the causal pathway from calcification to stiffness to valve opening/closing to murmur shape. This requires modeling knowledge bases and causal networks, which are very costly to construct and beyond the scope of the present disclosure.

[0149] Since the present disclosure is focused on developing XAI for clinicians who already know how to extrapolate from disease to murmur shape, the low-level, concept-based rationalization explanation is omitted. Nevertheless, a more useful deployed XAI system should combine diagrams and conceptual rationalization to explain deeply.

Comparing Diagrammatization to Bayesian Generative Modeling

[0150] The causal pathway described in Fig. 23(c) illustrates that diagnostic inference can be represented as a Markov chain for Bayesian generative modeling. The causal sequence is somewhat reversed from the Peircean abduction used in Diagrammatization, illustrating the backward reasoning process of abduction. The variables of diagnosis $y$, murmur shape hypothesis $h$, murmur shape parameters $\theta$, and observation $x$ can be described by the joint probability $p(y, h, \theta, x) = p(y)\,p(h \mid y)\,p(\theta \mid h)\,p(x \mid \theta)$. Inferring the diagnosis would then involve computing the posterior distribution $p(y \mid x) \propto \iint p(y, h, \theta, x)\,d\theta\,dh$, where the normalizing constant is the evidence $p(x)$. Similarly, the best murmur shape hypothesis can be inferred by computing the posterior $p(h \mid x) \propto \iint p(y, h, \theta, x)\,d\theta\,dy$.
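The marginalization in the paragraph above can be illustrated with a small discrete example in which the integrals become sums. This is only a sketch under assumptions: the two diagnoses, the two shape hypotheses, the three-point grid for the shape parameter, the Gaussian likelihood, and every probability value are hypothetical and chosen purely for illustration.

```python
import math
from itertools import product

# Hypothetical toy model: all labels and numbers are illustrative assumptions.
DIAGNOSES = ["AS", "MR"]                        # y ("MR" is an illustrative second class)
SHAPES = ["crescendo-decrescendo", "plateau"]   # h
THETAS = [1.0, 2.0, 3.0]                        # theta: discretized peak-to-edge ratio

p_y = {"AS": 0.5, "MR": 0.5}
p_h_given_y = {
    "AS": {"crescendo-decrescendo": 0.9, "plateau": 0.1},
    "MR": {"crescendo-decrescendo": 0.2, "plateau": 0.8},
}
p_theta_given_h = {
    "crescendo-decrescendo": {1.0: 0.1, 2.0: 0.3, 3.0: 0.6},
    "plateau": {1.0: 0.7, 2.0: 0.2, 3.0: 0.1},
}


def likelihood(x: float, theta: float, sigma: float = 0.5) -> float:
    """p(x | theta): Gaussian observation noise around the shape parameter."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint(y: str, h: str, theta: float, x: float) -> float:
    """p(y, h, theta, x) = p(y) p(h | y) p(theta | h) p(x | theta)."""
    return p_y[y] * p_h_given_y[y][h] * p_theta_given_h[h][theta] * likelihood(x, theta)


def posterior_y(x: float) -> dict:
    """p(y | x): sum the joint over h and theta, then divide by the evidence p(x)."""
    unnorm = {y: sum(joint(y, h, t, x) for h, t in product(SHAPES, THETAS))
              for y in DIAGNOSES}
    evidence = sum(unnorm.values())
    return {y: v / evidence for y, v in unnorm.items()}


def posterior_h(x: float) -> dict:
    """p(h | x): sum the joint over y and theta, then divide by the evidence p(x)."""
    unnorm = {h: sum(joint(y, h, t, x) for y, t in product(DIAGNOSES, THETAS))
              for h in SHAPES}
    evidence = sum(unnorm.values())
    return {h: v / evidence for h, v in unnorm.items()}
```

Calling `posterior_y(3.0)` and `posterior_h(3.0)` returns normalized distributions over the diagnoses and the shape hypotheses. Note that in this factorization the observation depends on the diagnosis only through the shape hypothesis and its parameters, which is what makes the chain Markov.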

[0151] However, this is different from the approach of abductive inference. DiagramNet has a higher inductive bias than the Bayesian model, since it imposes geometric constraints on the murmur shapes by specifying piecewise linear functions. It also does not need to learn how to predict the shape parameters at training time, but can estimate them using test-time optimization. This simplifies its training and enables it to be trained on less data and be more accurate.

Diagrammatization for Confirmatory Analysis

[0152] Diagrammatization uses abductive-deductive reasoning to generate specific hypotheses and test them. It requires hypotheses to be mathematically formulated with defined goodness-of-fit evaluation metrics. Diagrammatization is unsuitable for 1) exploratory analysis without hypotheses, such as when data scientists debug models by looking for spurious effects rather than explicitly hypothesizing bugs. Open-ended representations such as feature attributions or saliency maps would be more suitable here. It is also unsuitable for 2) unbounded representations such as natural language explanations that acquire open-ended text from people without expectations on a finite set of explanations, so the number of hypotheses may also be unbounded. However, categorizing the text responses into a discrete taxonomy would simplify identifying key hypotheses for abductive reasoning, and this could then be suitable for Diagrammatization.

Conclusion

[0153] In the present disclosure, Diagrammatization is presented as a general representation paradigm for XAI to support diagrammatic reasoning with i) Peircean abduction, ii) domain conventions, and iii) Peircean diagrams to narrow the interpretability gap. Based on Diagrammatization, DiagramNet, a modular explainable deep neural network with multiple stages aligned with the 4-step Peircean abduction process, is developed and trained to predict cardiac disease from heart sounds and generate murmur diagram explanations. Demonstrations showed that it can provide diverse abductive, contrastive, counterfactual, and case (example) explanations. Modeling evaluations found that DiagramNet not only had more faithful explanations but also better prediction performance than several baseline models. The present disclosure gives insights into diagram-based, abductive explainable AI, and contributes a new basis towards user-centered XAI.

[0154] Fig. 23 depicts an exemplary computing device 2300, hereinafter interchangeably referred to as a computer system 2300, where one or more such computing devices 2300 may be used to execute the methods 810, 830, 840 and 850 of Figs. 8C to 8F. One or more components of the exemplary computing device 2300 can also be used to implement the system 800. The following description of the computing device 2300 is provided by way of example only and is not intended to be limiting.

[0155] As shown in Fig. 23, the exemplary computing device 2300 includes a processor 2302 for executing software routines. Although a single processor is shown for the sake of clarity, the computing device 2300 may also include a multi-processor system. The processor 2302 is connected to a communication infrastructure 2304 for communication with other components of the computing device 2300. The communication infrastructure 2304 may include, for example, a communications bus, cross-bar, or network.

[0156] The computing device 2300 further includes a main memory 2306, such as a random access memory (RAM), and a secondary memory 2308. The secondary memory 2308 may include, for example, a hard disk drive 2310 and/or a removable storage drive 2312, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. The removable storage drive 2312 reads from and/or writes to a removable storage unit 2314 in a well-known manner. The removable storage unit 2314 may include a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 2312. As will be appreciated by persons skilled in the relevant art(s), the removable storage unit 2314 includes a computer readable storage medium having stored therein computer executable program code instructions and/or data.

[0157] In an alternative implementation, the secondary memory 2308 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into the computing device 2300. Such means can include, for example, a removable storage unit 2316 and an interface 2318. Examples of a removable storage unit 2316 and interface 2318 include a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 2316 and interfaces 2318 which allow software and data to be transferred from the removable storage unit 2316 to the computer system 2300.

[0158] The computing device 2300 also includes at least one communication interface 2320. The communication interface 2320 allows software and data to be transferred between computing device 2300 and external devices via a communication path 2322. In various embodiments of the invention, the communication interface 2320 permits data to be transferred between the computing device 2300 and a data communication network, such as a public data or private data communication network. The communication interface 2320 may be used to exchange data between different computing devices 2300 where such computing devices 2300 form part of an interconnected computer network. Examples of a communication interface 2320 can include a modem, a network interface (such as an Ethernet card), a communication port, an antenna with associated circuitry and the like. The communication interface 2320 may be wired or may be wireless. Software and data transferred via the communication interface 2320 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communication interface 2320. These signals are provided to the communication interface via the communication path 2322.

[0159] As shown in Fig. 23, the computing device 2300 further includes a display interface 2324 which performs operations for rendering images to an associated display 2326 and an audio interface 2328 for performing operations for playing audio content via associated speaker(s) 2330.

[0160] As used herein, the term "computer program product" may refer, in part, to removable storage unit 2314, removable storage unit 2316, a hard disk installed in hard disk drive 2310, or a carrier wave carrying software over communication path 2322 (wireless link or cable) to communication interface 2320. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computing device 2300 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computing device 2300. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computing device 2300 include radio or infrared transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

[0161] The computer programs (also called computer program code) are stored in main memory 2306 and/or secondary memory 2308. Computer programs can also be received via the communication interface 2320. Such computer programs, when executed, enable the computing device 2300 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 2302 to perform features of the above-described embodiments. Accordingly, such computer programs represent controllers of the computer system 2300.

[0162] Software may be stored in a computer program product and loaded into the computing device 2300 using the removable storage unit 2314, 2316, the hard disk drive 2310, or the interface 2318. Alternatively, the computer program product may be downloaded to the computer system 2300 over the communication path 2322. The software, when executed by the processor 2302, causes the computing device 2300 to perform functions of embodiments described herein.

[0163] It is to be understood that the embodiment of Fig. 23 is presented merely by way of example. Therefore, in some embodiments one or more features of the computing device 2300 may be omitted. Also, in some embodiments, one or more features of the computing device 2300 may be combined together. Additionally, in some embodiments, one or more features of the computing device 2300 may be split into one or more component parts.

[0164] It will be appreciated that the elements illustrated in Fig. 23 function to provide means for performing the various functions and operations of the servers as described in the above embodiments.

[0165] When the computing device 2300 is configured to realise the system 800 to predict a type of a cardiac disease in a subject, the system 800 can have a non-transitory computer readable medium having stored thereon an application which when executed causes the system 800 to perform steps comprising: receiving a vector representation of a time series data ($x$) associated with an activity of a heart of the subject, extracting a vector representation of an amplitude ($\alpha$) of the vector representation of the time series data ($x$) using a first neural network ($A$), generating a prediction of a vector representation of a mask ($m$) based on the vector representation of the amplitude ($\alpha$) using a second neural network ($M_m$), applying the vector representation of the mask ($m$) on the vector representation of the amplitude ($\alpha$) to obtain a vector representation of an amplitude in a region of interest ($\tilde{a}_m$), determining a set of shape parameters ($\theta_0$) of the amplitude in the region of interest ($\tilde{a}_m$) based on a set of heuristics ($\Theta_0$), wherein the set of shape parameters ($\theta_0$) defines a shape function ($h_m$) of the amplitude in the region of interest ($\tilde{a}_m$), wherein the shape function ($h_m$) is based on a predetermined shape function representing the cardiac disease ($y$) in the subject, performing shape fitting of the shape function ($h_m$) with the amplitude in the region of interest ($\tilde{a}_m$), and calculating a Mean Squared Error (MSE) lack of fit ($d$) of the shape fitting, wherein the calculated MSE lack of fit ($d$) is used to predict a type of the cardiac disease ($y$) in the subject.
The method also includes iteratively: performing optimization of the set of shape parameters ($\theta_0$) using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain an optimized set of shape parameters ($\theta$), wherein the optimized set of shape parameters ($\theta$) defines an optimized shape function of the amplitude in the region of interest ($\tilde{a}_m$), wherein the optimized shape function is based on the predetermined shape function representing the cardiac disease ($y$) in the subject, wherein the method further includes: performing shape fitting of the optimized shape function of the amplitude in the region of interest ($h_m$) with the amplitude in the region of interest ($\tilde{a}_m$) such that the MSE lack of fit ($d$) is minimized. The method further includes performing segmentation of the vector representation of the amplitude ($\alpha$) using the second neural network ($M_m$). In addition, the step of generating the prediction of the vector representation of the mask ($m$) can further include generating an embedding vector representation of the mask ($z_m$) and the method includes: generating an initial prediction ($y_0$) of the cardiac disease ($y$) based on the embedding vector representation of the mask ($z_m$) using a third neural network ($F_y$), determining a phase of the heart cycle ($\phi_0$) based on the initial prediction ($y_0$), generating an intermediate diagrammatic explainable prediction ($y_h$) of the cardiac disease ($y$) in the subject based on the phase of the heart cycle ($\phi_0$) and the calculated MSE lack of fit ($d$) using a fourth neural network ( y h), and generating a final diagrammatic explainable prediction ($y$) of the cardiac disease in the subject based on the initial ($y_0$) and intermediate ($y_h$) diagrammatic explainable prediction of the cardiac disease in the subject using a fifth neural network ( y ).
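The masking, heuristic initialization, shape fitting and lack-of-fit steps recited above can be sketched in a few lines. This is a simplified illustration under several assumptions, not the disclosed implementation: the amplitude and mask are supplied directly rather than produced by neural networks; only two hypothetical shape families are used, with assumed parameterizations; the heuristics are simple readings of the peak and the mean; a derivative-free coordinate search stands in for the L-BFGS optimizer; and the hypothesis with the smallest lack of fit is selected directly, whereas the disclosure passes the lack of fit to further neural networks.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Shape = Callable[[float, Sequence[float]], float]


def crescendo_decrescendo(t: float, theta: Sequence[float]) -> float:
    """Piecewise-linear rise to `height` at `t_peak`, then fall; theta = (t_peak, height)."""
    t_peak = min(max(theta[0], 1e-3), 1.0 - 1e-3)  # keep the breakpoint inside (0, 1)
    height = theta[1]
    if t <= t_peak:
        return height * t / t_peak
    return height * (1.0 - t) / (1.0 - t_peak)


def plateau(t: float, theta: Sequence[float]) -> float:
    """Constant amplitude; theta = (height,)."""
    return theta[0]


# Assumed hypothesis set; the names and functional forms are illustrative.
HYPOTHESES: Dict[str, Shape] = {
    "crescendo-decrescendo": crescendo_decrescendo,
    "plateau": plateau,
}


def apply_mask(alpha: Sequence[float], mask: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Keep amplitude samples in the region of interest; rescale time to [0, 1]."""
    a_m = [a for a, m in zip(alpha, mask) if m]
    if not a_m:
        raise ValueError("mask selects no samples")
    t = [k / max(len(a_m) - 1, 1) for k in range(len(a_m))]
    return t, a_m


def initial_theta(name: str, t: Sequence[float], a_m: Sequence[float]) -> List[float]:
    """Heuristic starting parameters read directly off the masked amplitude."""
    if name == "crescendo-decrescendo":
        k = max(range(len(a_m)), key=a_m.__getitem__)  # index of the peak sample
        return [t[k], a_m[k]]
    return [sum(a_m) / len(a_m)]


def lack_of_fit(shape: Shape, theta: Sequence[float],
                t: Sequence[float], a_m: Sequence[float]) -> float:
    """Mean squared error between the shape function and the masked amplitude."""
    return sum((shape(ti, theta) - ai) ** 2 for ti, ai in zip(t, a_m)) / len(a_m)


def refine(shape: Shape, theta0: Sequence[float], t: Sequence[float],
           a_m: Sequence[float], step: float = 0.1, iters: int = 40) -> Tuple[List[float], float]:
    """Coordinate search (a stand-in for L-BFGS): accept moves that lower the MSE."""
    theta, best = list(theta0), lack_of_fit(shape, theta0, t, a_m)
    for _ in range(iters):
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                cand = theta[:]
                cand[i] += delta
                d = lack_of_fit(shape, cand, t, a_m)
                if d < best:
                    theta, best, improved = cand, d, True
        if not improved:
            step /= 2.0  # shrink the step once no neighbour improves the fit
    return theta, best


def fit_all(alpha: Sequence[float], mask: Sequence[int]) -> Dict[str, Tuple[List[float], float]]:
    """Fit every hypothesis; return its optimized parameters and lack of fit."""
    t, a_m = apply_mask(alpha, mask)
    return {name: refine(shape, initial_theta(name, t, a_m), t, a_m)
            for name, shape in HYPOTHESES.items()}


alpha = [0.0, 0.1, 0.2, 0.5, 1.0, 0.5, 0.2, 0.1, 0.0, 0.0]  # toy amplitude envelope
mask = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]                        # toy region of interest
fits = fit_all(alpha, mask)
best = min(fits, key=lambda name: fits[name][1])
```

For this toy envelope the peaked shape fits the masked amplitude more closely than the flat one, so `best` selects it. Because the coordinate search only ever accepts improvements, the refined lack of fit can never exceed the value obtained from the heuristic starting parameters.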

[0166] It will be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.