

Title:
EVIDENCE-BASED OUT-OF-DISTRIBUTION DETECTION ON MULTI-LABEL GRAPHS
Document Type and Number:
WIPO Patent Application WO/2024/076724
Kind Code:
A1
Abstract:
Systems and methods for out-of-distribution detection of nodes in a graph includes collecting (802) evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network. Multi-label opinions are generated (804) including belief and disbelief for the diverse labels. The opinions are combined (806) into a joint belief by employing a comultiplication operation of binomial opinions. The joint belief is classified (808) to detect out-of-distribution nodes of the graph. A corrective action is performed (810) responsive to a detection of an out-of-distribution node. The systems and methods can employ evidential deep learning.

Inventors:
ZHAO XUJIANG (US)
CHEN HAIFENG (US)
Application Number:
PCT/US2023/034624
Publication Date:
April 11, 2024
Filing Date:
October 06, 2023
Assignee:
NEC LAB AMERICA INC (US)
International Classes:
G06N3/042; G06N3/09
Foreign References:
US 9538146 B2 (2017-01-03)
Other References:
ATIA JAVAID: "Machine Learning Algorithms and Fault Detection for Improved Belief Function Based Decision Fusion in Wireless Sensor Networks", Sensors, MDPI, vol. 19, no. 6, 17 March 2019, page 1334, XP093155702, ISSN: 1424-8220, DOI: 10.3390/s19061334
SYLVIE COSTE-MARQUIS: "On Belief Change for Multi-Label Classifier Encodings", Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), 1-27 August 2021, pages 1829-1836, XP093155703, ISBN: 978-0-9992411-9-6, DOI: 10.24963/ijcai.2021/252
ZHEN GUO: "A survey on uncertainty reasoning and quantification in belief theory and its application to deep learning", Information Fusion, Elsevier, vol. 101, 1 January 2024, page 101987, XP093155704, ISSN: 1566-2535, DOI: 10.1016/j.inffus.2023.101987
ALMEIDA ALEX M.G.; CERRI RICARDO; PARAISO EMERSON CABRERA; MANTOVANI RAFAEL GOMES; JUNIOR SYLVIO BARBON: "Applying multi-label techniques in emotion identification of short texts", Neurocomputing, vol. 320, 1 September 2018, pages 35-46, XP085502290, DOI: 10.1016/j.neucom.2018.08.053
XUJIANG ZHAO: "Multidimensional Uncertainty Quantification for Deep Neural Networks", arXiv:2304.10527v1, 20 April 2023, XP093155713, retrieved from the Internet
Attorney, Agent or Firm:
BITETTO, James J. (US)
Claims:
WHAT IS CLAIMED IS:

1. A computer-implemented method for out-of-distribution detection of nodes in a graph, comprising: collecting (802) evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network; generating (804) multi-label opinions including belief and disbelief for the diverse labels; combining (806) the opinions into a joint belief by employing a comultiplication operation of binomial opinions; classifying (808) the joint belief to detect out-of-distribution nodes of the graph; and performing (810) a corrective action responsive to a detection of an out-of-distribution node.

2. The method as recited in claim 1, wherein collecting evidence to quantify predictive uncertainty includes predicting positive and negative evidence vectors from the multi-label evidential graph neural network.

3. The method as recited in claim 2, wherein predicting the positive and negative evidence vectors includes generating a beta distribution using the positive and negative evidence vectors, wherein the beta distribution is used to train the multi-label evidential graph neural network by minimizing beta loss in accordance with evidential deep learning.

4. The method as recited in claim 1, wherein generating multi-label opinions includes computing, for sample $i$ and class $k$: $b_k^i = \frac{\alpha_k^i - 1}{\alpha_k^i + \beta_k^i}$ and $d_k^i = \frac{\beta_k^i - 1}{\alpha_k^i + \beta_k^i}$, where $b_k^i$ indicates positive belief mass distribution, $d_k^i$ indicates negative belief mass distribution, and $\alpha_k^i$, $\beta_k^i$ are features of positive and negative evidence vectors, respectively.

5. The method as recited in claim 1, wherein combining the opinions into a joint belief includes combining belief opinions $b$ of a sample by $b_{1 \vee 2 \vee \dots \vee K}$ calculated recursively by $b_{X \vee Y} = b_X + b_Y - b_X b_Y$.

6. The method as recited in claim 1, wherein classifying the joint belief to detect out-of-distribution nodes of the graph includes determining whether the joint belief exceeds a threshold value for a given node to determine if the node is out-of-distribution.

7. The method as recited in claim 1, wherein the nodes include patient information, and the corrective action includes: alerting medical personnel of the out-of-distribution node; and making a medical decision based on the out-of-distribution node.

8. The method as recited in claim 1, further comprising optimizing, through training, the multi-label evidential graph neural network by minimizing a total loss which includes a beta loss component and a positive evidence loss component.

9. The method as recited in claim 1, wherein the corrective action includes: applying a label to the out-of-distribution node.

10. The method as recited in claim 1, wherein the multi-label evidential graph neural network applies evidential deep learning.

11. A system for out-of-distribution detection of nodes in a graph, comprising: a hardware processor (601, 602); and a memory (603) that stores a computer program which, when executed by the hardware processor, causes the hardware processor to: collect (802) evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network; generate (804) multi-label opinions including belief and disbelief for the diverse labels; combine (806) the opinions into a joint belief by employing a comultiplication operation of binomial opinions; classify (808) the joint belief to detect out-of-distribution nodes of the graph; and perform (810) a corrective action responsive to a detection of an out-of-distribution node.

12. The system as recited in claim 11, wherein the computer program further causes the hardware processor to collect evidence to quantify predictive uncertainty by predicting positive and negative evidence vectors from the multi-label evidential graph neural network.

13. The system as recited in claim 12, wherein the computer program further causes the hardware processor to generate a beta distribution using the positive and negative evidence vectors, wherein the beta distribution is used to train the multi-label evidential graph neural network by minimizing beta loss in accordance with evidential deep learning.

14. The system as recited in claim 11, wherein the computer program further causes the hardware processor to generate multi-label opinions by computing, for sample $i$ and class $k$: $b_k^i = \frac{\alpha_k^i - 1}{\alpha_k^i + \beta_k^i}$ and $d_k^i = \frac{\beta_k^i - 1}{\alpha_k^i + \beta_k^i}$, where $b_k^i$ indicates positive belief mass distribution, $d_k^i$ indicates negative belief mass distribution, and $\alpha_k^i$, $\beta_k^i$ are features of positive and negative evidence vectors, respectively.

15. The system as recited in claim 11, wherein the computer program further causes the hardware processor to combine the opinions into a joint belief by combining belief opinions $b$ of a sample by $b_{1 \vee 2 \vee \dots \vee K}$ calculated recursively by $b_{X \vee Y} = b_X + b_Y - b_X b_Y$, and wherein the computer program further causes the hardware processor to classify the joint belief to detect out-of-distribution nodes of the graph by determining whether the joint belief exceeds a threshold value for a given node to determine if the node is out-of-distribution.

16. The system as recited in claim 11, wherein the nodes include patient information, and the computer program further causes the hardware processor to: alert medical personnel of the out-of-distribution node to enable a medical decision based on the out-of-distribution node.

17. The system as recited in claim 11, wherein the computer program further causes the hardware processor to optimize the multi-label evidential graph neural network through training by minimizing a total loss which includes a beta loss component and a positive evidence loss component.

18. The system as recited in claim 11, wherein the corrective action includes applying a label to the out-of-distribution node.

19. A computer program product for out-of-distribution detection of nodes in a graph, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: collecting (802) evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network; generating (804) multi-label opinions including belief and disbelief for the diverse labels; combining (806) the opinions into a joint belief by employing a comultiplication operation of binomial opinions; classifying (808) the joint belief to detect out-of-distribution nodes of the graph; and performing (810) a corrective action responsive to a detection of an out-of-distribution node.

20. The computer program product as recited in claim 19, wherein the nodes include patient information, and the corrective action includes: alerting medical personnel of the out-of-distribution node to enable a medical decision based on the out-of-distribution node.

Description:
EVIDENCE-BASED OUT-OF-DISTRIBUTION DETECTION ON MULTI-LABEL GRAPHS

RELATED APPLICATION INFORMATION

[0001] This application claims priority to U.S. Provisional Application No. 63/413,695, filed on October 6, 2022, and U.S. Application No. 18/481,383, filed on October 5, 2023, both incorporated herein by reference in their entirety.

BACKGROUND

Technical Field

[0002] The present invention relates to graph-structured data systems and methods, and more particularly to graph-structured networks that address out-of-distribution nodes.

Description of the Related Art

[0003] Many real-world application scenarios can be represented by graph-structured data, ranging from natural networks to social networks. In graph scenarios, usually only a subset of nodes is labeled, and the multi-label properties of nodes cannot be avoided. For example, in social networks, one user may have more than one interest. In a Protein-Protein Interaction (PPI) network, one protein can perform multiple functions. Since unknown labels are unavoidable, some of the unlabeled nodes may be out-of-distribution (OOD) and need to be discovered.

SUMMARY

[0004] According to an aspect of the present invention, a method for out-of-distribution detection of nodes in a graph includes collecting evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network. Multi-label opinions are generated including belief and disbelief for the diverse labels. The opinions are combined into a joint belief by employing a comultiplication operation of binomial opinions. The joint belief is classified to detect out-of-distribution nodes of the graph. A corrective action is performed responsive to a detection of an out-of-distribution node.

[0005] According to another aspect of the present invention, a system for out-of-distribution detection of nodes in a graph includes a hardware processor and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor to: collect evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network; generate multi-label opinions including belief and disbelief for the diverse labels; combine the opinions into a joint belief by employing a comultiplication operation of binomial opinions; and classify the joint belief to detect out-of-distribution nodes of the graph.
[0006] According to another aspect of the present invention, a computer program product for out-of-distribution detection of nodes in a graph is provided. The computer program product comprises a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method including collecting evidence to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network; generating multi-label opinions including belief and disbelief for the diverse labels; combining the opinions into a joint belief by employing a comultiplication operation of binomial opinions; classifying the joint belief to detect out-of-distribution nodes of the graph; and performing a corrective action responsive to a detection of an out-of-distribution node.

[0007] These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

[0008] The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:

[0009] FIG. 1 is a block/flow diagram illustrating a high-level system/method for evidence-based out-of-distribution detection on multi-label graphs, in accordance with an embodiment of the present invention;

[0010] FIG. 2 is a block/flow diagram illustrating a system/method for an out-of-distribution detection system on graph-structured data, in accordance with an embodiment of the present invention;

[0011] FIG. 3 is a flow diagram illustrating a method for detecting out-of-distribution nodes in graphs, in accordance with an embodiment of the present invention;

[0012] FIG. 4 is an illustrative example of a Protein-Protein Interaction (PPI) network employing a multi-label evidential graph neural network to improve the performance of node-level multi-label out-of-distribution detection, in accordance with an embodiment of the present invention;

[0013] FIG. 5 is a block diagram showing a medical system that employs a multi-label evidential graph neural network to improve the performance of node-level multi-label out-of-distribution detection, in accordance with an embodiment of the present invention;

[0014] FIG. 6 is a block diagram showing an exemplary processing system employed in accordance with an embodiment of the present invention;

[0015] FIG. 7 is a generalized illustrative diagram of a neural network, in accordance with an embodiment of the present invention; and

[0016] FIG. 8 is a flow diagram illustrating a method for detecting out-of-distribution nodes in graphs, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0017] Embodiments in accordance with the present invention address Out-of-Distribution (OOD) detection on graph-structured data. OOD is an issue in various areas of research and applications, including social network recommendations, protein function detection, medication classification, medical monitoring and other graph-structured data applications. The inevitable, inherent multi-label properties of nodes provide more challenges for multi-label OOD detection than multi-class settings.
Existing OOD detection methods on graphs are not applicable to multi-label settings, and other semi-supervised node classification methods lack the ability to differentiate OOD nodes from in-distribution (ID) nodes. Multi-class classification assigns each data sample one and only one label from more than two classes. Multi-label classification, by contrast, can assign zero or more labels to each data sample.

[0018] Out-of-distribution detection on multi-label graphs, in accordance with the present embodiments, can incorporate Evidential Deep Learning (EDL) to provide a novel evidence-based OOD detection method for node-level classification on multi-label graphs. The evidence for multiple labels is predicted by Multi-Label Evidential Graph Neural Networks (ML-EGNNs) with beta loss. A Joint Belief is designed for multi-label opinion fusion by a comultiplication operator. Additionally, a Kernel-based Node Positive Evidence Estimation (KNPE) method can be introduced to reduce errors in quantifying positive evidence. Experimental results prove both the effectiveness and efficiency of the model on multi-label OOD detection. The present methods can also maintain an ideal close-set classification performance when compared with baselines on real-world multi-label networks.

[0019] Learning methods for multi-label node classification on graphs, which predict user interests in social networks, classify medical conditions, identify functions of proteins in PPI networks, etc., are capable of differentiating OOD nodes from in-distribution (ID) nodes. By effectively distinguishing OOD nodes, users with potential interests, for example, can be identified for better recommendations, or unknown functions of proteins can be discovered for pharmaceutical research. In a particularly useful embodiment, medical information can be employed in a graphical setting where each node can include a patient or user, or characteristics of a patient or user. Multiple labels for each patient may need to be evaluated to ensure all of the patient's medical conditions are properly classified.

[0020] Multi-label out-of-distribution detection can be employed for data mining and network analysis. The OOD samples can be connected with low belief and a lack of classification evidence from Subjective Logic (SL). Multi-label out-of-distribution detection on graphs can be trained on: (1) how to learn evidence or belief for each possibility based on structural information and node features; (2) how to combine information from different labels and comprehensively decide whether a node is out-of-distribution; and (3) how to maintain ideal close-set multi-label classification results while effectively performing OOD detection.

[0021] In one embodiment, an evidential OOD detection method for node-level classification tasks on multi-label graphs is provided. Evidential Deep Learning (EDL) is leveraged, in which the learned evidence is informative to quantify the predictive uncertainty of diverse labels, so that unknown labels would incur high uncertainty. Beta distributions can be introduced to make Multi-Label Evidential Graph Neural Networks (ML-EGNNs) feasible. Joint Belief is formulated for multi-label samples by a comultiplication operator of binomial opinions, which combines argument opinions from multiple labels. The separate beliefs of classes obtained by evidential neural networks are employed as a basis for close-set classification, which is both effective and efficient.
[0022] A Kernel-based Node Positive Evidence Estimation (KNPE) method uses structural information and prior positive evidence collected from the given labels of training nodes, to optimize a neural network model and to help detect multi-label OOD nodes. A method for node-level OOD detection uses a multi-label evidential neural network, in which OOD conditions can be directly inferred from evidence prediction, instead of relying on time-consuming dropout or ensemble techniques.

[0023] OOD detection on multi-label graphs using evidential methods for multi-label node-level detection is provided. Evidential neural networks are utilized with beta loss to predict the belief for multiple labels. Joint Belief is defined for multi-label opinion fusion. Further, a Kernel-based Node Positive Evidence Estimation (KNPE) method is provided to reduce errors in quantifying positive evidence.

[0024] Experimental results prove both the effectiveness and efficiency of models, in accordance with the present embodiments, on multi-label OOD detection, which are able to maintain an ideal close-set classification level when compared with baselines on real-world multi-label networks.

[0025] Embodiments described herein may be entirely hardware, entirely software, or include both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.

[0026] Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer-readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, an optical disk, etc.

[0027] Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

[0028] A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.

[0029] Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

[0030] Referring now in detail to the figures, in which like numerals represent the same or similar elements, and initially to FIG. 1, a high-level system/method for evidence-based out-of-distribution detection on multi-label graphs is shown in accordance with embodiments of the present invention. Multi-label out-of-distribution detection is performed using a multi-label evidential neural network method. Given multi-label graph data, a goal is to detect the out-of-distribution nodes. This is done by maximizing the area under a precision-recall curve (AUPR) for out-of-distribution detection to make the prediction more accurate.

[0031] To address node-level out-of-distribution detection on multi-label graph data, one embodiment provides a new Multi-Label Evidential Graph Neural Networks (ML-EGNN) framework 100 that utilizes evidential neural networks with beta loss to predict a belief for multiple labels. In block 110, the framework leverages evidential deep learning, in which learned evidence is informative to quantify a predictive uncertainty of diverse labels so that unknown labels would incur high uncertainty and thus provide a basis for differentiating the diverse labels. Beta distributions are also introduced to make the model feasible. In block 120, the framework provides joint belief for multi-label samples by a comultiplication operator of binomial opinions, which combines argument opinions from multiple labels.

[0032] In block 130, kernel-based node positive evidence estimation is provided and uses structural information, and prior positive evidence that was collected from the given labels of training nodes, to help detect multi-label out-of-distribution nodes. Experimental results show the effectiveness and efficiency of the model on multi-label OOD detection. The framework can maintain an ideal close-set classification level when compared with baselines on real-world multi-label networks.

[0033] Block 110 provides multi-label node evidence estimation. In this step, a Multi-Label Evidential Graph Neural Network (ML-EGNN) is designed and built by stacking graph convolutional layers with two fully connected (FC) layers and rectified linear unit (ReLU) layers.

[0034] Neurons in an ML-EGNN can include a respective activation function. These activation functions represent an operation that is performed on an input of a neuron, and they help to generate the output of the neuron. Here, the activation function can include ReLU, but other appropriate activation functions may be adapted for use. ReLU produces an output that is zero when the input is negative, and reproduces the input when the input is positive. The ReLU function notably is not differentiable at zero; to account for this during training, the undefined derivative at zero may be replaced with a value of zero or one.
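By way of illustration only, the following is a minimal sketch of the kind of stacked architecture described above, assuming PyTorch. The class names (GCNLayer, MLEGNN), the dense normalized-adjacency formulation, and all layer sizes are hypothetical choices for exposition, not the patented implementation.

```python
# Illustrative sketch only: a minimal ML-EGNN-style encoder in PyTorch.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W), where A_hat is a
    symmetrically normalized adjacency matrix with self-loops added."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):
        return torch.relu(a_hat @ self.lin(h))

class MLEGNN(nn.Module):
    """Stacks graph convolutional layers, then two FC heads whose non-negative
    outputs are taken as positive/negative evidence for K Beta distributions."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hid_dim)
        self.gcn2 = GCNLayer(hid_dim, hid_dim)
        self.fc_pos = nn.Linear(hid_dim, num_classes)  # positive-evidence head
        self.fc_neg = nn.Linear(hid_dim, num_classes)  # negative-evidence head

    def forward(self, x, a_hat):
        h = self.gcn2(self.gcn1(x, a_hat), a_hat)
        e_pos = torch.relu(self.fc_pos(h))   # evidence must be non-negative
        e_neg = torch.relu(self.fc_neg(h))
        alpha, beta = e_pos + 1.0, e_neg + 1.0  # Beta parameters per node/label
        return alpha, beta

def normalize_adjacency(a):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} (dense form, for small graphs)."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
```

The two ReLU-clamped FC heads play the role of the positive and negative evidence vectors; adding one to each yields the Beta parameters used throughout the description that follows.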
[0035] The node evidence estimation output from the graph convolutional layers, FC layers and ReLU layers is taken as the positive and negative evidence vectors of a Beta distribution, respectively. Given sample $i$, let $f_{\text{pos}}(X, A \mid \theta)$ and $f_{\text{neg}}(X, A \mid \theta)$ represent the positive and negative evidence vectors predicted by Evidential Graph Neural Networks (EGNNs), where $X$ is the input node features, $A$ is the adjacency matrix, and $\theta$ represents the network parameters. Then, the parameters of the Beta distribution for node $i$ and label $k$ are:

$$\alpha_k^i = e_{\text{pos},k}^i + 1, \qquad \beta_k^i = e_{\text{neg},k}^i + 1.$$

[0036] With $N$ training samples and $K$ different classes, a multi-label evidential neural network is trained by minimizing the Beta Loss:

$$\mathcal{L}_{\text{Beta}} = \sum_{i=1}^{N} \sum_{k=1}^{K} \int_0^1 \mathrm{BCE}\big(y_k^i, p_k^i\big)\, \mathrm{Beta}\big(p_k^i; \alpha_k^i, \beta_k^i\big)\, dp_k^i = \sum_{i=1}^{N} \sum_{k=1}^{K} \Big[ y_k^i \big(\psi(\alpha_k^i + \beta_k^i) - \psi(\alpha_k^i)\big) + \big(1 - y_k^i\big) \big(\psi(\alpha_k^i + \beta_k^i) - \psi(\beta_k^i)\big) \Big],$$

[0037] where BCE denotes the Binary Cross Entropy Loss, $p_k^i$ represents the predicted probability of sample $i$ belonging to class $k$ by the model, $y_k^i$ represents the ground truth for sample $i$ with label $k$ (i.e., $y_k^i = 1$ means the training node $i$ belongs to class $k$, otherwise $y_k^i = 0$), and $\psi(\cdot)$ denotes the Digamma function. The belief $b_k^i$ and disbelief $d_k^i$ of label $k$ for sample $i$ are then:

$$b_k^i = \frac{\alpha_k^i - 1}{\alpha_k^i + \beta_k^i}, \qquad d_k^i = \frac{\beta_k^i - 1}{\alpha_k^i + \beta_k^i}.$$

[0038] For the following process, these beliefs are regarded as multi-label opinions, to formulate a Joint Belief and quantify OOD samples.

[0039] In block 120, multi-label opinion fusion is performed. After obtaining separate beliefs of multiple labels, these opinions are combined and an integrated opinion is quantified, e.g., Opinions Fusion. Let $X = \{x, \bar{x}\}$ and $Y = \{y, \bar{y}\}$ be two different domains, and let $\omega_X = (b_X, d_X, u_X, a_X)$ and $\omega_Y = (b_Y, d_Y, u_Y, a_Y)$ be binomial opinions on $X$ and $Y$ respectively. Then, the joint opinion $\omega_{X \vee Y}$ can be formulated, following the comultiplication of binomial opinions in subjective logic, as:

$$b_{X \vee Y} = b_X + b_Y - b_X b_Y,$$
$$d_{X \vee Y} = d_X d_Y + \frac{a_X (1 - a_Y)\, d_X u_Y + (1 - a_X) a_Y\, u_X d_Y}{a_X + a_Y - a_X a_Y},$$
$$u_{X \vee Y} = u_X u_Y + \frac{a_Y\, d_X u_Y + a_X\, u_X d_Y}{a_X + a_Y - a_X a_Y},$$
$$a_{X \vee Y} = a_X + a_Y - a_X a_Y.$$

[0040] The Joint Belief of a certain sample $i$ is $b_{1 \vee 2 \vee \dots \vee K}$, which can be calculated by the above equation recursively.

[0041] In block 130, kernel-based evidence estimation is performed. Kernel-based Evidence Estimation estimates prior Beta distribution parameters for each node based on the labels of training nodes and node-level distance. The focus is on the estimation of positive evidence $\hat{e}^+$. For each pair of nodes $i$ and $j$, calculate a node-level distance $d_{ij}$, i.e., the shortest path between nodes $i$ and $j$. Then, a Gaussian kernel function is used to estimate the positive distribution effect between nodes $i$ and $j$:

$$h(d_{ij}) = \exp\left(-\frac{d_{ij}^2}{2\sigma^2}\right),$$

[0042] where $\sigma$ is the bandwidth parameter. The contribution of positive evidence estimation for node $j$ from training node $i$ is $\mathbf{h}_i(\mathbf{y}_i, d_{ij}) = \big[\, y_1^i\, h(d_{ij}), \dots, y_k^i\, h(d_{ij}), \dots, y_K^i\, h(d_{ij}) \,\big]$, where $\mathbf{y}_i = [\, y_1^i, \dots, y_k^i, \dots, y_K^i \,] \in \{0, 1\}^K$ represents the in-distribution label vector of training node $i$.

[0043] The prior positive evidence $\hat{e}_j^+$ is estimated as $\sum_{i \in \mathbb{N}} \mathbf{h}_i(\mathbf{y}_i, d_{ij})$ over the set $\mathbb{N}$ of training samples. During the training process, the Kullback-Leibler (KL) divergence (KL-divergence) is minimized between model predictions of positive evidence and the prior positive evidence. KL-divergence (also called relative entropy or I-divergence) is a statistical distance measuring how one probability distribution P differs from a reference probability distribution Q. A relative entropy of 0 indicates that the two distributions in question have identical quantities of information. Relative entropy is a non-negative function of two distributions or measures.

[0044] A total loss function (e.g., the sum of the beta loss and a weighted positive evidence loss) that can be used to optimize the model is:

$$\min\ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{Beta}} + \lambda\, \mathcal{L}_{\text{PE}},$$

where $\lambda$ denotes a trade-off parameter.
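To make the kernel-based prior of block 130 concrete, below is a minimal sketch, assuming NumPy and NetworkX; the function name knpe_prior_evidence, the dict-based label input, and the dense loop are hypothetical simplifications, not the patented implementation.

```python
# Illustrative sketch only: kernel-based node positive evidence estimation (KNPE).
import numpy as np
import networkx as nx

def knpe_prior_evidence(graph, train_labels, sigma=1.0):
    """Estimate prior positive evidence e_hat[j, k] for every node j.

    graph        : networkx.Graph whose nodes are 0..n-1
    train_labels : dict {training node id: 0/1 label vector of length K}
    sigma        : Gaussian kernel bandwidth
    """
    n = graph.number_of_nodes()
    k = len(next(iter(train_labels.values())))
    e_hat = np.zeros((n, k))
    for i, y_i in train_labels.items():
        # Shortest-path (hop) distance from training node i to all reachable nodes.
        dist = nx.single_source_shortest_path_length(graph, i)
        for j, d_ij in dist.items():
            h = np.exp(-d_ij**2 / (2.0 * sigma**2))  # Gaussian kernel weight
            e_hat[j] += h * np.asarray(y_i)          # per-label contribution
    return e_hat  # prior Beta parameter: alpha_hat = e_hat + 1
```

The returned estimate gives the prior positive parameter $\hat{\alpha}^+ = \hat{e}^+ + 1$ that enters the KL term $\mathcal{L}_{\text{PE}}$ of the total loss above.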
[0045] Referring to FIG. 2, a block/flow diagram shows an OOD detection system 200 on graph-structured data. OOD samples can be connected with low belief and a lack of classification evidence from Subjective Logic (SL). Subjective logic is a type of probabilistic logic that explicitly takes epistemic uncertainty and source trust into account. Specifically, epistemic uncertainty measures whether input data exists within the distribution of data already seen. A multinomial opinion of a random variable $y$ is represented by $\omega = (\mathbf{b}, u, \mathbf{a})$, where the domain is $\mathbb{Y} = \{1, \dots, K\}$, $\mathbf{b}$ indicates the belief mass distribution, $u$ indicates the uncertainty with a lack of evidence, and $\mathbf{a}$ indicates the base rate distribution. For a $K$-class setting, a probability mass $\mathbf{p} = [\, p_1, p_2, \dots, p_K \,]$ is assumed to follow a Dirichlet ($\mathrm{Dir}(\boldsymbol{\alpha})$) distribution parameterized by a $K$-dimensional Dirichlet strength vector $\boldsymbol{\alpha}$:

[0046] $$\mathrm{Dir}(\mathbf{p} \mid \boldsymbol{\alpha}) = \begin{cases} \dfrac{1}{B(\boldsymbol{\alpha})} \displaystyle\prod_{k=1}^{K} p_k^{\alpha_k - 1}, & \mathbf{p} \in \mathbb{S}_K, \\ 0, & \text{otherwise}, \end{cases}$$

where $B(\boldsymbol{\alpha})$ is a $K$-dimensional Beta function and $\mathbb{S}_K$ is a $K$-dimensional unit simplex. The total strength of the Dirichlet is defined as $S = \sum_{k=1}^{K} \alpha_k$. The Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector of positive reals.

[0047] The term evidence indicates how much data supports a particular classification of a sample based on the observations it contains. Let $\mathbf{e} = [\, e_1, \dots, e_K \,]$ be the evidence for $K$ classes. Each entry $e_k \geq 0$ and the Dirichlet strength $\boldsymbol{\alpha}$ are linked according to evidence theory by $\boldsymbol{\alpha} = \mathbf{e} + \mathbf{a} W$, where $W$ is the weight of uncertain evidence. Without loss of generality, the weight $W$ is set to $K$ and, considering the assumption of the subjective opinion that $a_k = 1/K$, we have the Dirichlet strength $\alpha_k = e_k + 1$. The Dirichlet evidence can be mapped to the subjective opinion by setting the following equalities: $b_k = e_k / S$ and $u = K / S$.

[0048] Graph neural networks (GNNs) 208 provide a feasible way to extend deep learning methods into the non-Euclidean domain, including graphs and manifolds. The most representative models, according to the types of aggregators, are, e.g., the Graph Convolutional Network (GCN), Graph Attention Networks (GAT), and GraphSAGE.

[0049] It is possible to apply GNNs 208 to various types of training frameworks, including (semi-)supervised or unsupervised learning, depending on the learning tasks and label information available. Of these, relevant to the present problem is semi-supervised learning for node-level classification. Assuming a network with partial nodes labeled and others unlabeled, GNNs 208 can learn a model that effectively identifies the labels for the unlabeled nodes. In this case, an end-to-end framework can be built by stacking graph convolutional layers 210 followed by fully connected (FC) layers 212.

[0050] Based on Subjective Logic and Belief Theory, the $K$-dimensional Dirichlet probability distribution function (PDF) is applied for estimating multinomial probability density over a domain of $\{1, \dots, K\}$. However, it is not feasible for multi-label classification. For the Dirichlet distribution, an in-distribution node with multiple labels could be differentiated from other in-distribution samples according to its conflicting evidence, though it shows no sign of lacking evidence. To this end, a Beta distribution is introduced, which is able to provide binary evidence for each class:

[0051] $$\mathrm{Beta}(p; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1},$$

[0052] where the probability mass $p \in [0, 1]$ is assumed to follow a Beta distribution parameterized by a 2-dimensional strength vector $[\alpha, \beta]$.
Here $B(\alpha, \beta)$ is a 2-dimensional Beta function, with $\alpha$ 214 and $\beta$ 216 taken as the positive and negative evidence vectors, respectively, from which belief 218 is computed.

[0053] Further, a multi-label classification problem $\Omega$ can be formalized as a combination of $K$ binomial classifications $\{\omega_1, \dots, \omega_k, \dots, \omega_K\}$. Each binomial classification $\omega_k$ holds a binomial opinion $\omega_k = (b_k, d_k, u_k, a_k)$, where the domain is $\{0, 1\}$, $b_k$ indicates the positive belief mass distribution, $d_k$ indicates the negative belief mass distribution, $u_k$ indicates the uncertainty with a lack of evidence, and $a_k$ indicates the base rate distribution. The total strength of the Beta is defined as $S_k = \alpha_k + \beta_k$. Then, the Beta evidence can be mapped to a binomial subjective opinion by setting the following equalities:

[0054] $$b_k = \frac{\alpha_k - 1}{\alpha_k + \beta_k}, \qquad d_k = \frac{\beta_k - 1}{\alpha_k + \beta_k}, \qquad u_k = \frac{2}{S_k} = \frac{2}{\alpha_k + \beta_k}.$$

[0055] Compared with classical neural networks, Evidential Neural Networks (ENNs) do not have a softmax layer, but use an activation layer (e.g., ReLU) to make sure that the output is non-negative. Multi-Label Evidential Graph Neural Networks (ML-EGNNs) are built by stacking graph convolutional layers in GNN 208 and two fully connected layers (FCs) 212 with ReLU layers, whose outputs are taken as the positive and negative evidence vectors ($\alpha$ 214 and $\beta$ 216, respectively) for the Beta distribution. Predictions of the neural network are treated as subjective opinions, and the function that collects evidence from data is learned by a deterministic neural network.

[0056] Domains 202 and 204 are marked as $X$ and $Y$, respectively, in FIG. 2. Given sample $i$, let $f_{\text{pos}}(X, A \mid \theta)$ and $f_{\text{neg}}(X, A \mid \theta)$ represent the positive and negative evidence vectors predicted by EGNNs, where $X$ is the input node features, $A$ is an adjacency matrix 206, and $\theta$ represents the network parameters. Then, the parameters of the Beta distribution for node $i$ and label $k$ are:

[0057] $$\alpha_k^i = e_{\text{pos},k}^i + 1, \qquad \beta_k^i = e_{\text{neg},k}^i + 1.$$

[0058] With $N$ training samples and $K$ different classes, a multi-label evidential neural network is trained by minimizing the Beta Loss 226:

$$\mathcal{L}_{\text{Beta}} = \sum_{i=1}^{N} \sum_{k=1}^{K} \int_0^1 \mathrm{BCE}\big(y_k^i, p_k^i\big)\, \mathrm{Beta}\big(p_k^i; \alpha_k^i, \beta_k^i\big)\, dp_k^i$$

[0059] $$= \sum_{i=1}^{N} \sum_{k=1}^{K} \int_0^1 \big[ -y_k^i \log(p_k^i) - (1 - y_k^i) \log(1 - p_k^i) \big]\, \mathrm{Beta}\big(p_k^i; \alpha_k^i, \beta_k^i\big)\, dp_k^i = \sum_{i=1}^{N} \sum_{k=1}^{K} \Big[ -y_k^i\, \mathbb{E}_{p_k^i \sim \mathrm{Beta}}\big[\log(p_k^i)\big] - (1 - y_k^i)\, \mathbb{E}_{p_k^i \sim \mathrm{Beta}}\big[\log(1 - p_k^i)\big] \Big],$$

where $B(\alpha_k^i, \beta_k^i)$ is a 2-dimensional Beta function, BCE denotes the Binary Cross Entropy Loss, $p_k^i$ represents the predicted probability of sample $i$ belonging to class $k$ by the model, and $y_k^i$ represents the ground truth for sample $i$ with label $k$, e.g., $y_k^i = 1$ means the training node $i$ belongs to class $k$, otherwise $y_k^i = 0$.

[0060] The term $\mathbb{E}_{p_k^i \sim \mathrm{Beta}}[\log(p_k^i)]$ can be formulated and derived as follows:

$$\mathbb{E}_{p_k^i \sim \mathrm{Beta}}\big[\log(p_k^i)\big] = \int_0^1 \log(p_k^i)\, \mathrm{Beta}\big(p_k^i; \alpha_k^i, \beta_k^i\big)\, dp_k^i = \psi(\alpha_k^i) - \psi(\alpha_k^i + \beta_k^i),$$

where $\Gamma(\cdot)$ represents the Gamma function, from which the Beta and Digamma functions derive. By the same derivation, we can obtain the term $\mathbb{E}_{p_k^i \sim \mathrm{Beta}}[\log(1 - p_k^i)] = \psi(\beta_k^i) - \psi(\alpha_k^i + \beta_k^i)$. Thus, the Beta Loss 226 term $\mathcal{L}_{\text{Beta}}$ is:

[0061] $$\mathcal{L}_{\text{Beta}} = \sum_{i=1}^{N} \sum_{k=1}^{K} \Big[ y_k^i \big( \psi(\alpha_k^i + \beta_k^i) - \psi(\alpha_k^i) \big) + \big(1 - y_k^i\big) \big( \psi(\alpha_k^i + \beta_k^i) - \psi(\beta_k^i) \big) \Big],$$

where $\psi(\cdot)$ denotes the Digamma function. As the belief and disbelief of label $k$ for sample $i$, we have:

[0062] $$b_k^i = \frac{\alpha_k^i - 1}{\alpha_k^i + \beta_k^i}, \qquad d_k^i = \frac{\beta_k^i - 1}{\alpha_k^i + \beta_k^i}.$$

[0063] For the following inference process, these beliefs 218 are regarded as multi-label opinions, to formulate a Joint Belief 220 and quantify OOD samples. So far, for in-distribution multi-label classification, we set the positive belief as the probability of class $k$ for sample $i$, i.e., $p_k^i = \frac{\alpha_k^i - 1}{\alpha_k^i + \beta_k^i}$, for time reduction.
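As a concrete reading of the closed-form loss above, here is a minimal sketch assuming PyTorch; beta_loss and opinions are hypothetical helper names, not identifiers from the patent.

```python
# Illustrative sketch only: the expected binary cross-entropy ("Beta loss")
# computed in closed form with digamma functions.
import torch

def beta_loss(alpha, beta, y):
    """L_Beta = sum_{i,k} y*(psi(a+b)-psi(a)) + (1-y)*(psi(a+b)-psi(b)).

    alpha, beta : (N, K) Beta parameters (evidence + 1) from the ML-EGNN
    y           : (N, K) binary ground-truth multi-labels
    """
    psi_sum = torch.digamma(alpha + beta)
    loss = y * (psi_sum - torch.digamma(alpha)) \
         + (1.0 - y) * (psi_sum - torch.digamma(beta))
    return loss.sum()

def opinions(alpha, beta):
    """Map Beta evidence to binomial opinions: belief, disbelief, uncertainty."""
    s = alpha + beta           # total Beta strength S_k
    b = (alpha - 1.0) / s      # positive belief mass
    d = (beta - 1.0) / s       # negative belief mass
    u = 2.0 / s                # uncertainty (W = 2 for binomial opinions)
    return b, d, u
```

Note that b + d + u = 1 for each label, consistent with the opinion mapping above.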
[0064] After obtaining separate beliefs 218 of multiple labels, these beliefs 218 or opinions need to be combined and quantified in an integrated opinion, e.g., Opinions Fusion into Joint Belief 220. Note that, if a sample belongs to any label we already know, then it is an ID sample. In other words, only samples that do not belong to any known category should be classified as OOD samples. Hence, naive operations like summing up all the beliefs are inapplicable for multi-label settings.

[0065] Inspired by the comultiplication in subjective logic, let $X = \{x, \bar{x}\}$ and $Y = \{y, \bar{y}\}$ be two different domains (202 and 204, respectively), and let $\omega_X = (b_X, d_X, u_X, a_X)$ and $\omega_Y = (b_Y, d_Y, u_Y, a_Y)$ be binomial opinions on $X$ and $Y$, respectively. Then, the joint opinion $\omega_{X \vee Y}$ is formulated as:

[0066] $$b_{X \vee Y} = b_X + b_Y - b_X b_Y,$$

[0067] with the remaining components $d_{X \vee Y}$, $u_{X \vee Y}$, and $a_{X \vee Y}$ following the comultiplication rules given above in connection with FIG. 1.

[0068] Based on that, the Joint Belief 220 of a certain sample $i$ is $b_{1 \vee 2 \vee \dots \vee K}$, which can be calculated recursively by $b_{X \vee Y} = b_X + b_Y - b_X b_Y$. Only samples which do not belong to any known labels will have a relatively low Joint Belief, which can effectively differentiate them from in-distribution samples. Thus, we use the Joint Belief to distinguish whether a sample 222 is in-distribution or a sample 223 is out-of-distribution. With a higher Joint Belief, we are more confident to consider a sample as an in-distribution sample. In useful embodiments, a Joint Belief threshold can be set and employed to distinguish between in-distribution and out-of-distribution samples, nodes or graphs.

[0069] Kernel-based Node Positive Evidence Estimation (KNPE) 224 estimates prior Beta distribution parameters for each node based on the labels of training nodes and node-level distance. To be specific, the estimation of positive evidence $\hat{e}^+$ is focused on.

[0070] For each pair of nodes $i$ and $j$, calculate the node-level distance $d_{ij}$, i.e., the shortest path between nodes $i$ and $j$. Then, the Gaussian kernel function is used to estimate the positive distribution effect between nodes $i$ and $j$:

$$h(d_{ij}) = \exp\left(-\frac{d_{ij}^2}{2\sigma^2}\right),$$

[0071] where $\sigma$ is the bandwidth parameter.

[0072] The contribution of positive evidence estimation for node $j$ from training node $i$ is $\mathbf{h}_i(\mathbf{y}_i, d_{ij})$, where $\mathbf{y}_i = [\, y_1^i, \dots, y_K^i \,] \in \{0, 1\}^K$ represents the in-distribution label vector of training node $i$. $\mathbf{h}_i(\mathbf{y}_i, d_{ij})$ is obtained by:

[0073] $$\mathbf{h}_i(\mathbf{y}_i, d_{ij}) = \big[\, y_1^i\, h(d_{ij}), \dots, y_k^i\, h(d_{ij}), \dots, y_K^i\, h(d_{ij}) \,\big].$$

[0074] The prior positive evidence $\hat{e}_j^+$ is estimated as $\sum_{i \in \mathbb{N}} \mathbf{h}_i(\mathbf{y}_i, d_{ij})$, where $\mathbb{N}$ is the set of training samples, and the prior positive parameter is $\hat{\alpha}^+ = \hat{e}^+ + 1$. During the training process, the KL-divergence is minimized between model predictions of positive evidence and the prior positive evidence, giving the positive evidence loss ($\mathcal{L}_{\text{PE}}$) 230: $\mathcal{L}_{\text{PE}} = \mathrm{KL}(\hat{\alpha}^+ \parallel \alpha^+)$. A total loss function to optimize the model (minimization function (min)) includes:

[0075] $$\min\ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{Beta}} + \lambda\, \mathcal{L}_{\text{PE}},$$

where $\lambda$ denotes a trade-off parameter with $\mathcal{L}_{\text{PE}}$.

[0076] Referring to FIG. 3, a method for training a model for out-of-distribution node determination in graphs is illustratively shown and described. In block 302, a corpus of graph data with ground-truth labels of multi-labeled classes is provided as input. The goal is to detect the out-of-distribution nodes in the graphs using a Multi-Label Evidential Graph Neural Network (ML-EGNN) framework to address the node-level multi-label out-of-distribution problem. In block 304, labeled graph data is collected. Labeled graph data can include any type of information, e.g., social media networks, citation networks, drug interaction data, medical monitoring data. The labeled graph data can include a set of strongly labeled data with multi-class labels.
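Before turning to the remaining blocks of FIG. 3, the Joint Belief recursion of paragraph [0068] and the threshold test can be sketched as follows, assuming NumPy; the 0.5 threshold is a hypothetical example value, since the description leaves the threshold application-dependent.

```python
# Illustrative sketch only: fusing per-label beliefs into a Joint Belief by the
# recursion b_{X or Y} = b_X + b_Y - b_X * b_Y, then thresholding for OOD.
import numpy as np

def joint_belief(b):
    """b: (N, K) per-label positive beliefs; returns (N,) Joint Belief."""
    joint = b[:, 0]
    for k in range(1, b.shape[1]):
        joint = joint + b[:, k] - joint * b[:, k]  # binomial comultiplication
    return joint

def detect_ood(b, threshold=0.5):
    """Nodes whose Joint Belief does not exceed the threshold are flagged OOD."""
    return joint_belief(b) <= threshold

# Example: a node with uniformly low beliefs across all labels is flagged.
beliefs = np.array([[0.70, 0.10, 0.20],    # in-distribution node
                    [0.05, 0.02, 0.04]])   # candidate OOD node
print(detect_ood(beliefs))  # [False  True]
```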
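Likewise, a single optimization step against the total loss of paragraph [0075] can be sketched as below, reusing the hypothetical MLEGNN, beta_loss, and knpe_prior_evidence helpers from the earlier sketches. The description states $\mathcal{L}_{\text{PE}} = \mathrm{KL}(\hat{\alpha}^+ \parallel \alpha^+)$ over positive evidence only; comparing $\mathrm{Beta}(\alpha, 1)$ marginals is one assumed concrete reading, not necessarily the patented formulation.

```python
# Illustrative sketch only: one training step for
# L_total = L_Beta + lambda * L_PE.
import torch
from torch.distributions import Beta, kl_divergence

def total_loss(model, x, a_hat, y, alpha_prior, lam=0.1):
    alpha, beta = model(x, a_hat)          # predicted Beta parameters
    l_beta = beta_loss(alpha, beta, y)     # expected BCE under Beta (above)
    # Assumed reading of KL(alpha_hat || alpha): compare Beta(a, 1) marginals,
    # since the prior constrains only the positive evidence.
    ones = torch.ones_like(alpha)
    l_pe = kl_divergence(Beta(alpha_prior, ones), Beta(alpha, ones)).sum()
    return l_beta + lam * l_pe             # lam is the trade-off parameter

# Usage (shapes only): x is (N, F) features, a_hat the normalized adjacency,
# y the (N, K) multi-labels, alpha_prior = e_hat + 1 from the KNPE sketch.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = total_loss(model, x, a_hat, y, alpha_prior)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```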
[0077] In block 306, a data processing device is employed to parse the original graph data into its corresponding features. In one example, social media user information is collected as the node features. In another example, medical information is collected for individuals. In yet another example, data is collected for a Protein-Protein Interaction (PPI) network.

[0078] In block 308, prior knowledge processing is performed by a computer processing device. A kernel density estimation method is employed to estimate pseudo labels for evidence labels. This process is employed to optimize the model based upon minimization of loss (e.g., beta loss and positive evidence loss).

[0079] In block 310, Multi-Label Evidential Graph Neural Network training is performed. The ground-truth multi-labels are applied to train the ML-EGNNs for node-level multi-label out-of-distribution detection.

[0080] In block 312, a multi-label out-of-distribution detection test is performed. A final predicted result is generated for both node classification and multi-label out-of-distribution detection based on the belief, disbelief and uncertainty outputs. A threshold can be set for classification criteria. This threshold will be dependent on confidence and the desired accuracy of the OOD classification.

[0081] Referring to FIG. 4, an illustrative example of a Protein-Protein Interaction (PPI) network 400 is described, employing a multi-label evidential graph neural network to improve the performance of node-level multi-label out-of-distribution detection. The PPI network 400 includes nodes 402 which are connected by edges 404. Each node includes labels 406 in a function block that in this example includes four functions or features. Each node is labeled with a letter A, B, C, D, E, F or H. The functions are identified using a key 408. The key 408 shows Function 1 and Function 2 as being In-Distribution (ID) functions and Function 3 and Function 4 as being Out-of-Distribution (OOD) functions. There are also the function categories Does Not Belong and Unforeseen Function.

[0082] A key 412 shows details about the types of nodes. These include: ID Labeled Protein, ID Unlabeled Protein and OOD Unlabeled Protein. Function 3 and Function 4 are unseen for Labeled Nodes A, B and C. A traditional classification method will confidently put OOD Unlabeled Nodes H and F into one or more In-Distribution Functions (like Function 1 and Function 2). This defect will leave the model unable to detect the unknown functions. Hence, it is necessary to study the OOD detection problem on a multi-label graph. In this way, the nodes having unknown functions or unforeseen or undiscovered label types can be discovered. Detecting multi-class OOD nodes on a graph is not the same as detecting OOD nodes in multi-label settings. For example, multi-class classification assigns each data sample one and only one label from more than two classes. Multi-label classification can be used to assign a number of labels to each data sample.

[0083] An uncertainty-based method may detect OOD proteins by higher uncertainty on Function 1 or Function 2. However, in this way, in-distribution node D may also have a high uncertainty score on Function 2 since it only has Function 1. Given that, those methods may misclassify some ID nodes into OOD samples when they have more sparse labels.
Note that we only consider OOD Unlabeled Nodes in which all the labels are unseen; e.g., nodes like F, with both ID Labels and OOD Labels, are out of consideration.

[0084] A novel multi-label opinion fusion enriched multi-label uncertainty representation with evidence information permits out-of-distribution prediction. Out-of-distribution detection with uncertainty estimation for graph settings, with consideration of the inherent multi-label properties of nodes and the ability to fuse information from different labels to distinguish OOD nodes, enables the present embodiments to detect OOD nodes.

[0085] For the PPI network 400, nodes 402 represent proteins, edges 404 connect pairs of interacting proteins, and labels 406 indicate different functions of proteins. There are three kinds of nodes: In-Distribution Labeled Proteins A, B and C for training; In-Distribution Unlabeled Proteins D and E; and Out-of-Distribution Unlabeled Proteins F and H. During the training process, Functions 3 and 4 are unseen/unknown to the model. Node H is output as a detected OOD node as unknown functions 410 are detected. Upon detection, corrective action can be taken, such as providing updates to label definitions, identifying the new or unknown functions, redefining or reclassifying the node, etc.

[0086] Referring to FIG. 5, an illustrative example of a medical system 500 that employs a multi-label evidential graph neural network to improve the performance of node-level multi-label out-of-distribution detection is shown. The medical system 500 can include medical records 506 for multiple patients stored in memory on a server, in a cloud network, etc. The medical records 506 can be organized into a graphical representation 508. The graphical representation 508 can include nodes 502 connected by edges 504.

[0087] Each node 502 can represent a patient or user of the medical system 500, and the node feature can be considered as patient information, such as age, race, weight, etc. The edges 504 can represent relationships between users or relationships to other criteria; for example, the edges 504 can connect patients that share a doctor, a hospital or other commonality. For some nodes, the system includes associated labels, which have multiple classes (multi-class labels), such as specific medical diseases, e.g., diabetes, high blood pressure, heart stents, etc.

[0088] All this information constructs representative graphs as input for the ML-EGNN 510. The output of the ML-EGNN 510 will be disease predictions for other patients who do not have labels. The prediction includes disease classifications and out-of-distribution detections (e.g., detection of new diseases). All of this information can be provided to medical professionals 512 over a network or medical computer system 511. The network can include an internal or external network (e.g., cloud). The medical professionals 512 can make medical decisions 514 based on this information. The medical professionals 512 can also use this information to update patient data and make the system models more accurate and efficient.

[0089] Each node 502 includes labels 503 associated with one or more features of each patient. In one example, labels 503 can include the features stored in the medical records 506, e.g., diagnoses for each patient, data collected for a particular medical condition, a medical history of each patient, etc.
In one example, the labels 503 can include test data for tests accumulated over time, medical conditions, patient features or biological characteristics, etc. The ML-EGNN 510, which has been trained to predict out-of-distribution nodes, is employed to predict test results, medical conditions, doctor reports or other information that is likely Out-of-Distribution (OOD).

[0090] Multi-label opinion fusion enriched multi-label uncertainty representation with evidence information permits out-of-distribution prediction by the Multi-Label Evidential Graph Neural Network 510. Out-of-distribution detection with uncertainty estimation for graph settings provides the ability to distinguish and detect OOD nodes. In this way, OOD nodes or features including unforeseen or rare medical information can be identified for further analysis and consideration by healthcare workers and/or medical professionals 512. By identifying OOD features including unforeseen or rare medical information, misclassification of patient records, patient medical history, etc. can be prevented. The discovered OOD features can be properly labeled for future consideration, and the features which could have otherwise been misclassified can be considered and employed in improving medical decisions 514 by medical professionals 512.

[0091] The network 511 can interact with any piece of the system and convey information and resources as needed to identify OOD nodes, update OOD nodes, display updates of patient information, record medical professional inputs/decisions, etc. Information can be conveyed over the network 511 so that the information is available to all users. The functionality provided for determining OOD nodes can be provided as a service for medical staff and programmers to update patients' profiles in a distributed network setting, in a hospital setting, in a medical office setting, etc.

[0092] Referring to FIG. 6, a block diagram shows an exemplary processing system 600 employed in accordance with an embodiment of the present invention. The processing system 600 can include one or more computer processing units (e.g., CPUs) 601, one or more graphical processing units (GPUs) 602, one or more memory devices 603, communication devices 604, and peripherals 605. The CPUs 601 can be single or multi-core CPUs. The GPUs 602 can be single or multi-core GPUs. The CPUs and/or GPUs can be, in whole or part, hardware processing subsystems. The one or more memory devices 603 can include caches, RAMs, ROMs, and other memories (flash, optical, magnetic, etc.). The communication devices 604 can include wireless and/or wired communication devices (e.g., network (e.g., WIFI, etc.) adapters, etc.). The peripherals 605 can include a display device, a user input device, a printer, an imaging device, and so forth. Elements of processing system 600 are connected by one or more buses or networks (collectively denoted by reference numeral 610).

[0093] In an embodiment, memory devices 603 can store specially programmed software modules to transform the computer processing system into a special purpose computer configured to implement various aspects of the present invention. In an embodiment, special purpose hardware (e.g., Application Specific Integrated Circuits, Field Programmable Gate Arrays (FPGAs), and so forth) can be used to implement various aspects of the present invention.
[0094] In an embodiment, memory devices 603 store program code for implementing node-level out-of-distribution detection on multi-label graph data. An ML-EGNN 620 can be stored in memory 603 along with program code for OOD detection 622 to enable efficient multi-label node classification and out-of-distribution detection of nodes in a graphical network.

[0095] The processing system 600 may also include other elements (not shown); for example, various other input devices and/or output devices can be included in processing system 600, depending upon the particular implementation. Wireless and/or wired input and/or output devices can be employed. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized. These and other variations of the processing system 600 can also be provided.

[0096] Moreover, it is to be appreciated that the various elements and steps described with respect to the figures may be implemented, in whole or in part, by one or more of the elements of system 600.

[0097] An ML-EGNN is an information processing system that is inspired by biological nervous systems, such as the brain. ML-EGNNs include an information processing structure, which includes a large number of highly interconnected processing elements (called "neurons" or "nodes") working in parallel to solve specific problems. ML-EGNNs are furthermore trained using a set of training data, with learning that involves adjustments to weights that exist between the neurons. Here, the ML-EGNN is configured for a specific application, such as classification of nodes by fusing opinions to arrive at a Joint Belief, through such a learning process.

[0098] Referring now to FIG. 7, an illustrative diagram of a neural network 700 is shown. Although a specific structure is shown, having three layers and a set number of fully connected neurons, it should be understood that this is intended solely for the purpose of illustration. In practice, the present embodiments may take any appropriate form, including any number of layers and any pattern or patterns of connections therebetween.

[0099] ML-EGNNs demonstrate an ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be detected by humans or other computer-based systems. The structure of a neural network is known generally to have input neurons 702 that provide information to one or more "hidden" neurons 704. Connections 708 between the input neurons 702 and hidden neurons 704 are weighted, and these weighted inputs are then processed by the hidden neurons 704 according to some function in the hidden neurons 704. There can be any number of layers of hidden neurons 704, as well as neurons that perform different functions. There exist different neural network structures as well, such as a convolutional neural network, a maxout network, etc., which may vary according to the structure and function of the hidden layers, as well as the pattern of weights between the layers. The individual layers may perform particular functions, and may include convolutional layers, pooling layers, fully connected layers, softmax layers, or any other appropriate type of neural network layer. With respect to ML-EGNNs in accordance with the present embodiments, the layers of the ML-EGNN include graph convolutional layers, fully connected layers, and a ReLU layer.
A set of output neurons 706 accepts and processes weighted input from the last set of hidden neurons 704.

[0100] This represents a "feed-forward" computation, where information propagates from input neurons 702 to the output neurons 706. Upon completion of the feed-forward computation, the output is compared to a desired output available from training data. The error relative to the training data is then processed in a "backpropagation" computation, where the hidden neurons 704 and input neurons 702 receive information regarding the error propagating backward from the output neurons 706. Once the backward error propagation has been completed, weight updates are performed, with the weighted connections 708 being updated to account for the received error. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another. This represents just one variety of computation; any appropriate form of computation may be used instead.

[0101] To train ML-EGNNs, training data can be divided into a training set and a testing set. The training data includes pairs of an input and a known output. During training, the inputs of the training set are fed into the ML-EGNNs using feed-forward propagation. After each input, the output of the ML-EGNNs is compared to the respective known output. Discrepancies between the output and the known output that is associated with that particular input are used to generate an error value, which may be backpropagated through the ML-EGNNs, after which the weight values of the ML-EGNNs may be updated. This process continues until the pairs in the training set are exhausted.

[0102] After the training has been completed, the ML-EGNNs may be tested against the testing set, to ensure that the training has not resulted in overfitting. If the ML-EGNNs can generalize to new inputs, beyond those on which they were already trained, then they are ready for use. If the ML-EGNNs do not accurately reproduce the known outputs of the testing set, then additional training data may be needed, or hyperparameters of the ML-EGNNs may need to be adjusted.

[0103] ML-EGNNs may be implemented in software, hardware, or a combination of the two. For example, each weight 708 may be characterized as a weight value that is stored in a computer memory, and the activation function of each neuron may be implemented by a computer processor. The weight value may store any appropriate data value, such as a real number, a binary value, or a value selected from a fixed number of possibilities, that is multiplied against the relevant neuron outputs.

[0104] FIG. 8 is a flow diagram illustrating a method for detecting out-of-distribution nodes in graphs, in accordance with an embodiment of the present invention. The method preferably employs evidential deep learning to provide better predictions/discovery for OOD nodes. Once discovered, OOD nodes can be pruned from a graph, updated with labels, reclassified or subjected to other corrective action(s). Removing, reclassifying, or labeling such OOD nodes not only improves the data set but also improves computer processing time when using the graph for practical applications such as medical decisions, drug interactions, etc.

[0105] In block 802, evidence is collected to quantify predictive uncertainty of diverse labels of nodes in a graph of nodes and edges using positive evidence from labels of training nodes of a multi-label evidential graph neural network.
[0106] The positive and negative evidence vectors can be employed during training to generate a beta distribution, wherein the beta distribution is used to train the multi-label evidential graph neural network by minimizing beta loss.

[0107] In block 804, multi-label opinions including belief and disbelief are generated for the diverse labels. The multi-label opinions can include computing, for sample i and class k: b_i^k = (α_i^k - 1)/(α_i^k + β_i^k) and d_i^k = (β_i^k - 1)/(α_i^k + β_i^k), where b_i^k indicates the positive belief mass (belief), d_i^k indicates the negative belief mass (disbelief), and α_i^k and β_i^k are parameters of the beta distribution derived from the positive and negative evidence vectors, respectively.

[0108] In block 806, the opinions are combined into a joint belief by employing a comultiplication operation of binomial opinions. The combination of opinions into a joint belief can include combining the belief opinions b of a sample into a joint belief b_{1∨2∨…∨K}, calculated recursively by b_{x∨y} = b_x + b_y - b_x·b_y.

[0109] In block 808, the joint belief is classified to detect out-of-distribution nodes of the graph, wherein classifying the joint belief to detect out-of-distribution nodes of the graph can include determining whether the joint belief exceeds a threshold value for a given node to determine if the node is out-of-distribution.

[0110] In block 810, a corrective action responsive to a detection of an out-of-distribution node is performed. The corrective action can include automatically assigning or applying a new label to the OOD node. In another embodiment, the node can be classified in a new class. In other embodiments, e.g., where the nodes include patient information, the corrective action can include alerting medical personnel of the out-of-distribution node. A medical decision may be needed based on the out-of-distribution node. For example, if given test results are unknown or unlabeled for a particular patient, a system in accordance with the present embodiment could identify the OOD node and send an alert to a healthcare worker. A decision on whether to take action, e.g., recommend a test, prescribe a drug, or isolate the patient, can accordingly be made.

[0111] In block 820, a neural network can be initially or continuously trained by optimizing the multi-label evidential graph neural network by minimizing a total loss which includes a beta loss component and a positive evidence loss component. This can be achieved through a kernel-based evidence estimation process.
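As a non-limiting illustration of blocks 804, 806, and 808, the following Python sketch computes per-class belief and disbelief from the beta parameters, folds them into a joint belief with the comultiplication rule b_{x∨y} = b_x + b_y - b_x·b_y, and applies a threshold test. The function names are hypothetical, and the threshold direction (treating low joint belief as out-of-distribution) is an assumption, since this passage specifies only that a threshold is compared.

```python
# Illustrative sketch of blocks 804-808: binomial opinions, comultiplication,
# and threshold-based OOD detection.
import torch

def binomial_opinions(alpha: torch.Tensor, beta: torch.Tensor):
    """Per-node, per-class belief and disbelief from Beta(alpha, beta)."""
    s = alpha + beta
    belief = (alpha - 1.0) / s     # b = (alpha - 1) / (alpha + beta)
    disbelief = (beta - 1.0) / s   # d = (beta - 1) / (alpha + beta)
    return belief, disbelief

def joint_belief(belief: torch.Tensor) -> torch.Tensor:
    """Fold b_{x or y} = b_x + b_y - b_x * b_y over the class dimension."""
    joint = belief[:, 0]
    for k in range(1, belief.shape[1]):
        joint = joint + belief[:, k] - joint * belief[:, k]
    return joint

def detect_ood(alpha, beta, threshold: float = 0.5) -> torch.Tensor:
    """Flag nodes as OOD; here low joint belief is assumed to indicate OOD."""
    b, _ = binomial_opinions(alpha, beta)
    return joint_belief(b) < threshold
```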
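The present passage does not spell out the beta loss or the positive evidence loss of block 820, so the following sketch substitutes one common choice of beta loss, the expected binary cross-entropy under Beta(α, β) computed with the digamma function, together with an illustrative positive-evidence penalty; the kernel-based evidence estimation process is omitted here. Both loss terms and the weighting factor are assumptions for illustration only.

```python
# Illustrative sketch of a total loss with beta loss and positive evidence
# loss components (block 820); the exact patented losses are not given here.
import torch

def beta_loss(alpha, beta, y):
    """Expected BCE under Beta(alpha, beta), using E[-log p] = psi(a+b) - psi(a)."""
    s = alpha + beta
    return (y * (torch.digamma(s) - torch.digamma(alpha))
            + (1 - y) * (torch.digamma(s) - torch.digamma(beta))).mean()

def total_loss(alpha, beta, y, lam: float = 0.1):
    # Illustrative positive evidence term: penalize positive evidence
    # accumulated on classes whose label is negative.
    reg = ((1 - y) * (alpha - 1.0)).mean()
    return beta_loss(alpha, beta, y) + lam * reg
```

A training step would combine this with a model such as the earlier sketch: compute alpha and beta on the training nodes, evaluate total_loss against the multi-hot labels, and backpropagate.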
[0112] As employed herein, the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks. In useful embodiments, the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.). The one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.). The hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.). In some embodiments, the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).

[0113] In some embodiments, the hardware processor subsystem can include and execute one or more software elements. The one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.

[0114] In other embodiments, the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result. Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs). These and other variations of a hardware processor subsystem are also contemplated in accordance with embodiments of the present invention.

[0115] Reference in the specification to “one embodiment” or “an embodiment” of the present invention, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment. However, it is to be appreciated that features of one or more embodiments can be combined given the teachings of the present invention provided herein.

[0116] It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended for as many items as are listed.

[0117] The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired to be protected by Letters Patent is set forth in the appended claims.