METABOLOMICS APPROACH COMBINED WITH MACHINE LEARNING TO RECOGNIZE A MEDICAL CONDITION

Title:

METABOLOMICS APPROACH COMBINED WITH MACHINE LEARNING TO RECOGNIZE A MEDICAL CONDITION

Document Type and Number:

WIPO Patent Application WO/2021/202620

Kind Code:

Abstract:

Provided are methods, compositions, systems and devices comprising applying a metabolite biomarker signature determined by machine learning to a biological sample from a patient to recognize a medical condition.

Inventors:

RAJPURKAR PRANAV (US)
PINSKY BENJAMIN ALAN (US)
HOGAN CATHERINE (US)
LE ANTHONY T (US)
COWAN TINA M (US)

Application Number:

PCT/US2021/025015

Publication Date:

October 07, 2021

Filing Date:

March 30, 2021

Assignee:

UNIV LELAND STANFORD JUNIOR (US)

International Classes:

G16H50/70; G01N30/02; G01N30/72; G01N33/487; G16H10/60; G16H50/20; G16H50/30; G16H50/80

Domestic Patent References:

WO2019200410A1	2019-10-17

Foreign References:

US20190101544A1	2019-04-04
US20120197539A1	2012-08-02
US20180107783A1	2018-04-19
US20170241043A1	2017-08-24

Attorney, Agent or Firm:

KONSKI, Antoinette F. et al. (US)

Claims:

WHAT IS CLAIMED IS:

1. A method comprising:

(a) generating a training dataset based on one or more metabolites isolated from a plurality of biological samples isolated from a plurality of subjects, wherein the plurality of subjects comprises subjects having a medical condition, and wherein the training dataset comprises a set of features identified through one or more tests run on the biological samples;

(b) producing, using a machine learning system, a metabolite biomarker signature by:

(i) applying one or more machine learning models to the training dataset comprising the set of identified features;

(ii) selecting a subset of the set of features based on contributions to model predictions; and

(iii) generating the metabolite biomarker signature based on the subset of features; and

(c) storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.

2. The method of claim 1, wherein the subject has been treated with a therapy neutralizing a pathogen causing the medical condition, or wherein the subject is immune-compromised, or both.

3. The method of claim 1 or 2, wherein the feature of the biological sample comprises an extracellular concentration of the metabolite.

4. The method of any one of claims 1 to 3, further comprising extracting and analyzing one or more biological samples of a patient using the metabolite biomarker signature.

5. The method of any one of claims 1 to 4, wherein the machine learning models comprise boosted or bagged decision trees.

6. The method of claim 5, wherein the boosted or bagged decision trees are selected from the group of Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost).

7. The method of any one of claims 1 to 6, wherein selecting the subset of features comprises performing feature importance analysis.

8. The method of any one of claims 1 to 7, wherein selecting the subset of features comprises applying a Shapley Additive Explanation (SHAP) procedure.

9. The method of any one of claims 1 to 8, wherein the medical condition is selected from the group consisting of: an infection caused by a pathogen selected from the group consisting of a bacterium, a virus, a fungus or a parasite, a cancer, or a chronic disease.

10. The method of any one of claims 1 to 9, wherein the medical condition is selected from the group consisting of tuberculosis, a human papillomavirus (HPV) infection, malaria, an infection by a respiratory virus, or an infection by a coronavirus.

11. The method of any one of claims 1 to 10, wherein the medical condition is an infection by a respiratory virus selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus.

12. The method of claim 11, wherein the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D), wherein the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9, and wherein the lineage of influenza B is selected from Victoria or Yamagata.

13. The method of any one of claims 1 to 11, wherein the medical condition is an infection by a coronavirus, optionally selected from the group of: common cold optionally caused by any one of human coronavirus HCoV-OC43, HCoV-HKU1, HCoV-229E, or HCoV-NL63; severe acute respiratory syndrome (SARS) caused by severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome (MERS) caused by Middle East respiratory syndrome coronavirus (MERS-CoV); or Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

14. The method of any one of claims 1 to 13, wherein the one or more tests comprises at least one of liquid chromatography tandem mass spectrometry (LC-MS/MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS).

15. The method of any one of claims 1 to 14, wherein the features are ion features.

16. A method comprising applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition, the metabolite biomarker signature produced by the method of any one of claims 1 to 15.

17. The method of any one of claims 4 to 16, further comprising performing at least one of liquid chromatography tandem mass spectrometry (LC-MS/MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS) on the biological sample from the patient.

18. The method of any one of claims 1 to 17, further comprising identifying, using the metabolite biomarker signature, the medical condition that is a microbial infection in the patient based on an analysis of the biological sample.

19. The method of any one of claims 1 to 18, further comprising recognizing, based on the metabolite biomarker signature, in the patient the medical condition that is an infection by a respiratory virus optionally selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus, wherein the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D), wherein the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9, wherein the lineage of influenza B is selected from Victoria or Yamagata, and wherein the coronavirus is selected from the group of HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome coronavirus (MERS-CoV); or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

20. The method of any one of claims 1 to 19, wherein the plurality of subjects further comprises subjects without the medical condition.

21. The method of any one of claims 1 to 20, further comprising isolating the one or more metabolites from the plurality of biological samples.

22. A method comprising:

(a) running one or more types of tests on biological samples from subjects having a medical condition, wherein the biological samples comprise a metabolite profile that changed in the subjects as a result of the medical condition;

(b) generating a training dataset comprising a set of features identified through the one or more tests run on the biological samples;

(c) producing a metabolite biomarker signature by:

(i) applying one or more machine learning models to the training dataset comprising the set of identified features;

(ii) selecting a subset of the set of features having contributions to model predictions exceeding a threshold; and

(iii) generating the metabolite biomarker signature based on the subset of features; and

(d) applying the metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.

23. The method of claim 22, wherein the subject has been treated with a therapy neutralizing a pathogen causing the medical condition, or wherein the subject is immune-compromised, or both.

24. The method of claim 22 or 23, wherein the feature of the biological sample comprises an extracellular concentration of the metabolite.

25. The method of any one of claims 22 to 24, wherein the machine learning models comprise boosted or bagged decision trees.

26. The method of claim 25, wherein the boosted or bagged decision trees are selected from the group of Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost).

27. The method of any one of claims 22 to 26, wherein selecting the subset of features comprises performing feature importance analysis.

28. The method of any one of claims 22 to 27, wherein selecting the subset of features comprises applying a Shapley Additive Explanation (SHAP) procedure.

29. The method of any one of claims 22 to 28, wherein the medical condition is selected from the group consisting of: an infection caused by a pathogen selected from the group consisting of a bacterium, a virus, a fungus or a parasite, a cancer, or a chronic disease.

30. The method of any one of claims 22 to 29, wherein the medical condition is selected from the group consisting of tuberculosis, a human papillomavirus (HPV) infection, malaria, an infection by a respiratory virus, or an infection by a coronavirus.

31. The method of claim 30, wherein the infection by a respiratory virus is optionally selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus, wherein the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D), wherein the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9, and wherein the lineage of influenza B is selected from Victoria or Yamagata.

32. The method of any one of claims 22 to 30, wherein the medical condition is an infection by a coronavirus optionally selected from the group of HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome coronavirus (MERS-CoV); or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

33. The method of any one of claims 1 to 32, further comprising detecting a pathogen causing the medical condition in the biological sample by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay, wherein detection of the pathogen further indicates the subject has the medical condition.

34. The method of any one of claims 1 to 33, further comprising detecting an immunoglobulin or an immune cell specifically recognizing and binding a pathogen causing the medical condition in the biological sample by an immunofluorescence assay, wherein detection of the immunoglobulin or the immune cell or both further indicates the subject has the medical condition.

35. The method of any one of claims 1 to 34, further comprising administering to the patient having the medical condition a therapy specifically for treating the condition.

36. A method for selecting a subject for an anti-influenza treatment, comprising determining in a biological sample isolated from a subject suspected of being infected with an influenza virus a feature of a metabolite selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having an m/z of 227.0793 and an RT of 10.23 or an equivalent thereof, a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, a compound having an m/z of 232.1182 and an RT of 2.11 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 63.0440 and an RT of 1.78 or an equivalent thereof, a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, or a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof; and wherein an altered level of the metabolite in the sample as compared to a control level of the metabolite indicates that the subject is suitable for an anti-influenza treatment.

37. The method of claim 36, wherein the in-source fragment ion of pyroglutamic acid is pyroglutamic acid-D5.

38. The method of claim 36 or 37, wherein the subject is immune-compromised or has been treated with a therapy neutralizing the influenza viral infection, or both.

39. The method of any one of claims 36 to 37, further comprising detecting the influenza virus in the biological sample by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay, wherein detection of the influenza virus further indicates the subject has the infection.

40. The method of any one of claims 36 to 39, further comprising detecting an immunoglobulin or an immune cell specifically recognizing and binding the influenza virus in the biological sample by an immunofluorescence assay, wherein detection of the immunoglobulin or the immune cell or both further indicates the subject has the infection.

41. The method of any one of claims 36 to 39, further comprising administering to the subject having the infection an anti-influenza therapy.

42. The method of any one of claims 1 to 41, wherein the feature of the biological sample comprises presence or absence of one or more of the metabolites.

43. The method of any one of claims 1 to 42, wherein the feature of the biological sample comprises a concentration of one or more of the metabolites.

44. The method of claim 43, wherein the concentration is an extracellular concentration.

45. The method of claim 43 or 44, wherein the concentration is normalized to an internal standard, or to the mean compound abundance.

46. The method of any one of claims 1 to 45, wherein the biological sample is a nasopharyngeal sample, a blood sample, a serum sample, a plasma sample, or a urine sample.

47. The method of any one of claims 1 to 46, wherein the biological sample is a nasopharyngeal swab or viral transport medium (VTM) immersing a nasopharyngeal swab.

48. A system comprising a processor and a memory comprising instructions that are executable by the processor to cause the machine learning system to:

(a) generate a training dataset based on one or more tests run on biological samples from a plurality of subjects having a medical condition, wherein the biological samples comprise metabolites from the medical condition, and wherein the training dataset comprises a set of features identified through the one or more tests run on the biological samples;

(b) produce a metabolite biomarker signature by:

(i) applying one or more machine learning models to the training dataset comprising the set of identified features;

(ii) selecting a subset of the set of features based on contributions to model predictions; and

(iii) generating the metabolite biomarker signature based on the subset of features; and

(c) store, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.

49. A method comprising: producing, using a machine learning system, a metabolite biomarker signature by:

(i) applying one or more machine learning models to a training dataset based on one or more metabolites in a plurality of biological samples from a plurality of subjects, wherein the plurality of subjects comprises subjects having a medical condition, and wherein the training dataset comprises a set of features identified through the one or more tests run on the biological samples;

(ii) selecting a subset of the set of features based on contributions to model predictions; and

(iii) generating the metabolite biomarker signature based on the subset of features; and storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.

50. The method of claim 49, wherein the subject has been treated with a therapy neutralizing a pathogen causing the medical condition, or wherein the subject is immune-compromised, or both.

51. The method of claim 49 or 50, wherein the feature of the biological sample comprises an extracellular concentration of the metabolite.

52. The method of any one of claims 49 to 51, further comprising generating the training dataset based on the one or more metabolites.

53. The method of any one of claims 49 to 52, further comprising isolating the one or more metabolites from the plurality of biological samples.

54. The method of any one of claims 49 to 53, wherein the plurality of subjects comprises subjects without the medical condition.

55. The method of any one of claims 49 to 54, further comprising extracting and analyzing one or more biological samples of a patient using the metabolite biomarker signature.

56. The method of any one of claims 49 to 55, wherein the machine learning models comprise boosted or bagged decision trees.

57. The method of claim 56, wherein the boosted or bagged decision trees are selected from the group of Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost).

58. The method of any one of claims 49 to 57, wherein selecting the subset of features comprises performing feature importance analysis.

59. The method of any one of claims 49 to 58, wherein selecting the subset of features comprises applying a Shapley Additive Explanation (SHAP) procedure.

60. The method of any one of claims 49 to 59, wherein the medical condition is selected from the group consisting of: an infection caused by a pathogen selected from the group consisting of a bacterium, a virus, a fungus or a parasite, a cancer, or a chronic disease.

61. The method of any one of claims 49 to 60, wherein the medical condition is selected from the group consisting of tuberculosis, a human papillomavirus (HPV) infection, malaria, an infection by a respiratory virus, or an infection by a coronavirus.

62. The method of any one of claims 49 to 61, wherein the medical condition is an infection by a respiratory virus selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus.

63. The method of claim 62, wherein the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D), wherein the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9, and wherein the lineage of influenza B is selected from Victoria or Yamagata.

64. The method of any one of claims 49 to 62, wherein the medical condition is an infection by a coronavirus optionally selected from the group of: common cold optionally caused by any one of human coronavirus HCoV-OC43, HCoV-HKU1, HCoV-229E, or HCoV-NL63; severe acute respiratory syndrome (SARS) caused by severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome (MERS) caused by Middle East respiratory syndrome coronavirus (MERS-CoV); or Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

65. The method of any one of claims 49 to 64, wherein the one or more tests comprises at least one of liquid chromatography tandem mass spectrometry (LC-MS/MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS).

66. The method of any one of claims 49 to 65, wherein the features are ion features.

67. A method comprising applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition, the metabolite biomarker signature produced by the method of any one of claims 49 to 66.

68. The method of claim 67, further comprising performing at least one of liquid chromatography tandem mass spectrometry (LC-MS/MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS) on a biological sample from a patient.

69. The method of any one of claims 49 to 68, further comprising identifying the medical condition, using the metabolite biomarker signature.

70. The method of any one of claims 49 to 69, further comprising identifying, using the metabolite biomarker signature, the medical condition that is a microbial infection in the patient based on an analysis of the biological sample.

71. The method of any one of claims 49 to 69, further comprising recognizing, based on the metabolite biomarker signature, in the patient the medical condition that is an infection by a respiratory virus optionally selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus, wherein the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D), wherein the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9, wherein the lineage of influenza B is selected from Victoria or Yamagata, and wherein the coronavirus is selected from the group of HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome coronavirus (MERS-CoV); or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

72. The method of any one of claims 49 to 71, further comprising administering to the subject having the medical condition a therapy specifically for treating the condition.

73. The method of any one of claims 35, 41 and 72, wherein the therapy comprises a pharmaceutical agent neutralizing a pathogen causing the medical condition.

74. The method of any one of claims 35, 41, 72 and 73, wherein the therapy comprises or further comprises a pharmaceutical agent not neutralizing the pathogen or the influenza virus.

75. The method of any one of claims 35, 41 and 72 to 74, wherein the therapy comprises or further comprises a pharmaceutical agent not neutralizing the extracellular pathogen or the extracellular influenza virus.
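Claim 45 recites normalizing a metabolite concentration either to an internal standard or to the mean compound abundance. The Python sketch below shows one plausible reading of both options. The function names, the dictionary layout (a hypothetical ion-feature identifier mapped to a raw peak area for one sample) and the choice to average over all features in the same sample are assumptions for illustration, not details taken from the application.

```python
from statistics import mean


def normalize_to_internal_standard(areas, standard_id):
    """Divide each feature's raw abundance by the internal standard's abundance.

    `areas` maps a (hypothetical) ion-feature identifier to its raw peak area
    in one sample; `standard_id` names the spiked-in internal standard, which
    is dropped from the returned dictionary.
    """
    standard = areas[standard_id]
    if standard <= 0:
        raise ValueError("internal standard abundance must be positive")
    return {fid: value / standard for fid, value in areas.items() if fid != standard_id}


def normalize_to_mean_abundance(areas):
    """Divide each feature's raw abundance by the sample's mean abundance.

    Assumption: "mean compound abundance" is the mean over all features
    measured in the same sample.
    """
    sample_mean = mean(areas.values())
    if sample_mean <= 0:
        raise ValueError("mean abundance must be positive")
    return {fid: value / sample_mean for fid, value in areas.items()}


# Made-up peak areas; "IS" is a hypothetical internal-standard identifier.
sample = {"IS": 1000.0, "mz130.0507_rt0.81": 500.0, "mz84.0447_rt0.81": 1500.0}
print(normalize_to_internal_standard(sample, "IS"))
print(normalize_to_mean_abundance(sample))
```

Normalizing to an internal standard corrects for run-to-run instrument drift only if the standard is spiked at a constant amount; mean-abundance normalization instead assumes that total metabolite load is comparable across samples.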

Description:

METABOLOMICS APPROACH COMBINED WITH MACHINE LEARNING TO RECOGNIZE A MEDICAL CONDITION

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 63/002,913, filed March 31, 2020, and U.S. Provisional Application No. 63/006,641, filed April 7, 2020, each of which is incorporated herein by reference in its entirety.

BACKGROUND

[0002] Over the last decade, the diagnosis and monitoring of infectious diseases have been revolutionized by molecular testing, including the widespread use of the polymerase chain reaction (PCR) in clinical microbiology and virology laboratories. These methods are rapid and highly accurate; however, important limitations remain, including high cost, high complexity, inability to differentiate active infection from latency or colonization, and lack of sensitivity in direct patient specimens (Somerville et al. Pathology 47, 243-249 (2015); Schreckenberger et al. J Clin Microbiol 53, 3110-3115 (2015); Vergara et al. Eur Respir J 51(2) (2018); and Tan et al. Open Forum Infect Dis 3, ofv212 (2016)). Moreover, molecular testing is often restricted to high-complexity laboratories, far from the point of care where prompt and actionable diagnosis is most needed (Buchan et al. Clin Microbiol Rev 27, 783-822 (2014)).

[0003] Accurate testing is particularly important for respiratory viruses such as influenza, which is estimated to have caused over 35 million symptomatic illnesses in the United States during the 2018-2019 season alone (Estimated Influenza Illnesses, Medical Visits, Hospitalizations, and Deaths in the United States: 2018-2019 Influenza Season (2019)). Similarly, rapid diagnostic methods for COVID-19 are currently limited to targeted molecular tests (RT-PCR) that detect the viral RNA genome, or to serologic tests that detect anti-SARS-CoV-2 antibodies. However, up to 30% of COVID-19 cases may be missed by molecular methods, and specific antibodies are not reliably identified until 2 weeks after the onset of symptoms.

[0004] Techniques that are more cost-effective and can be completed near the point of care therefore represent an important unmet need in virology.

SUMMARY OF THE DISCLOSURE

[0005] In one aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) generating a training dataset based on one or more metabolites isolated from a plurality of biological samples isolated from a plurality of subjects, wherein the plurality of subjects comprises subjects having a medical condition, and wherein the training dataset comprises, consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (b) producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
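Steps (iii) and (c) of this aspect turn the selected feature subset into a signature and store it in association with the medical condition. A minimal sketch, assuming the signature is represented as the selected feature identifiers ordered by importance, and that a JSON file keyed by condition name stands in for the computer-readable storage medium. The function names, the JSON layout, the file name and the example scores are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_signature(selected_features, importance):
    """Step (iii): order the selected features by importance into a signature record."""
    ranked = sorted(selected_features, key=lambda f: importance[f], reverse=True)
    return {"features": ranked, "importance": {f: importance[f] for f in ranked}}


def store_signature(path, condition, signature):
    """Step (c): save the signature under the name of its medical condition.

    Signatures already stored for other conditions in the same file are kept.
    """
    path = Path(path)
    registry = json.loads(path.read_text()) if path.exists() else {}
    registry[condition] = signature
    path.write_text(json.dumps(registry, indent=2, sort_keys=True))


def load_signature(path, condition):
    """Return the signature previously stored for `condition`."""
    return json.loads(Path(path).read_text())[condition]


# Example with made-up feature names and importance scores.
with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "signatures.json"
    sig = build_signature(["feature_b", "feature_a"], {"feature_a": 0.4, "feature_b": 0.1})
    store_signature(store_path, "influenza", sig)
    print(load_signature(store_path, "influenza")["features"])  # ['feature_a', 'feature_b']
```

Keying the registry by condition lets one store hold separate signatures, for example one for influenza and one for COVID-19, without overwriting each other.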

[0006] In another aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) running one or more types of tests on biological samples from subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of a metabolite profile that changed in the subjects as a result of the medical condition; (b) generating a training dataset comprising, or alternatively consisting essentially of, or yet further consisting of a set of features identified through the one or more tests run on the biological samples; (c) producing a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features having contributions to model predictions exceeding a threshold; and (iii) generating the metabolite biomarker signature based on the subset of features; and (d) applying the metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.
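Sub-step (ii) of this aspect keeps only the features whose contributions to model predictions exceed a threshold. One common way to summarize per-sample contributions, such as SHAP values, is the mean absolute contribution of each feature across samples. The sketch below assumes that the contributions have already been computed and are supplied as one row per sample; the function name, the feature names, the contribution values and the threshold are illustrative only.

```python
def select_by_contribution(contributions, feature_names, threshold):
    """Return features whose mean |contribution| exceeds `threshold`.

    contributions: list of per-sample rows, each a list of per-feature
    contribution values (e.g. SHAP values) aligned with `feature_names`.
    Returns (name, score) pairs sorted from largest to smallest score.
    """
    n_samples = len(contributions)
    if n_samples == 0:
        raise ValueError("need at least one sample")
    selected = []
    for j, name in enumerate(feature_names):
        score = sum(abs(row[j]) for row in contributions) / n_samples
        if score > threshold:
            selected.append((name, score))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Made-up contributions for three samples and three features.
rows = [
    [0.30, -0.02, -0.10],
    [-0.50, 0.01, 0.20],
    [0.40, 0.00, -0.15],
]
print(select_by_contribution(rows, ["feature_a", "feature_b", "feature_c"], 0.05))
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise appear unimportant because its contributions cancel.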

[0007] In yet another aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to a training dataset based on one or more metabolites in a plurality of biological samples from a plurality of subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
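Each aspect selects features "based on contributions to model predictions", and claims 8, 28 and 59 name a Shapley Additive Explanation (SHAP) procedure for this. SHAP libraries rely on fast algorithms (for example TreeSHAP for tree ensembles); as a self-contained illustration of the underlying quantity, the sketch below computes exact Shapley values by enumerating every feature coalition, with features outside a coalition replaced by values from a single baseline sample. That baseline-replacement value function, the toy scoring model and all names are assumptions for illustration, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature for the prediction predict(x).

    Features outside a coalition S take their value from `baseline`, so
    v(S) = predict(x on S, baseline elsewhere). The returned values sum to
    predict(x) - predict(baseline) (the efficiency property).
    """
    n = len(x)

    def value(coalition):
        hybrid = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(hybrid)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical scoring model with an interaction between features 0 and 1."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * b - 1.0 * c


print(shapley_values(toy_model, [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]))
```

In this toy model the interaction term 0.5·a·b contributes 1.0 to the prediction and is split equally between features 0 and 1, so the values come out close to 2.5, 2.5 and -0.5, summing to the full prediction of 4.5.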

[0008] In a further aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition. In some embodiments, the metabolite biomarker signature is produced by a method as disclosed herein.
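Paragraph [0008] applies a stored signature to a patient sample. At minimum this means extracting the signature's features from the sample in a fixed order and passing them to a trained model. A minimal sketch, assuming the signature is a dictionary listing feature identifiers, that the trained model is any callable returning a probability, and that a 0.5 decision threshold is used. Missing features are treated as an error rather than imputed, which is a design choice of this sketch, not something the application specifies. The logistic model and its weights are hypothetical.

```python
import math


def apply_signature(signature, patient_features, model, threshold=0.5):
    """Score a patient sample with a stored metabolite biomarker signature.

    signature: dict with a "features" list of ion-feature identifiers.
    patient_features: dict mapping identifiers to normalized abundances.
    model: callable taking the ordered feature vector and returning a
    probability that the medical condition is present.
    Returns (probability, condition_recognized).
    """
    missing = [f for f in signature["features"] if f not in patient_features]
    if missing:
        raise KeyError(f"sample lacks signature features: {missing}")
    vector = [patient_features[f] for f in signature["features"]]
    probability = model(vector)
    return probability, probability >= threshold


def toy_probability(vector):
    """Hypothetical logistic model; weights and bias are made up for the example."""
    weights, bias = [1.5, -0.8], -0.2
    z = bias + sum(w * v for w, v in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


signature = {"features": ["feature_a", "feature_b"]}
sample = {"feature_a": 1.2, "feature_b": 0.4, "feature_c": 3.0}
print(apply_signature(signature, sample, toy_probability))
```

Features in the sample that are not part of the signature (here `feature_c`) are simply ignored, so the same measured feature table can be scored against several stored signatures.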

[0009] In a further aspect, provided is a method for selecting a subject for an anti-influenza treatment. The method comprises, or alternatively consists essentially of, or yet further consists of determining, in a biological sample isolated from a subject suspected of having a medical condition (such as being infected with an influenza virus), a feature of a metabolite. In some embodiments, the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having an m/z of 227.0793 and an RT of 10.23 or an equivalent thereof, a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, a compound having an m/z of 232.1182 and an RT of 2.11 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 63.0440 and an RT of 1.78 or an equivalent thereof, a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, or a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof. In some embodiments, an altered level of the metabolite in the sample as compared to a control level of the metabolite indicates that the subject is suitable for an anti-influenza treatment. In some embodiments, the in-source fragment ion of pyroglutamic acid is pyroglutamic acid-D5. In some embodiments, the method further comprises administering to the subject having the infection an anti-influenza therapy.
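The phrase "or an equivalent thereof" is not given a numeric definition here. As a minimal sketch only, the Python below treats an observed feature as equivalent to a listed m/z @ RT entry when it lies within an assumed mass tolerance (10 ppm) and an assumed retention-time window (0.2), and flags an "altered level" with an assumed 2-fold cutoff against the control level. The tolerances, the cutoff, and all names (`Feature`, `is_equivalent`, `is_altered`, and so on) are hypothetical illustrations, not values or code from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time, same units as the reference list


# A few (m/z, RT) entries from the list above, for illustration only.
SIGNATURE = [Feature(84.0447, 0.81), Feature(106.0865, 10.34), Feature(130.0507, 0.81)]


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million relative to the reference m/z."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def is_equivalent(obs: Feature, ref: Feature, ppm_tol: float = 10.0, rt_tol: float = 0.2) -> bool:
    """Hypothetical rule: equivalent if within ppm_tol in m/z AND within rt_tol in RT."""
    return abs(ppm_error(obs.mz, ref.mz)) <= ppm_tol and abs(obs.rt - ref.rt) <= rt_tol


def matched_signature_features(observed, signature=SIGNATURE):
    """Return the signature entries that have at least one equivalent observed feature."""
    return [ref for ref in signature if any(is_equivalent(o, ref) for o in observed)]


def is_altered(sample_level: float, control_level: float, fold: float = 2.0) -> bool:
    """Hypothetical cutoff: altered if >= `fold` times control or <= 1/`fold` of control."""
    if sample_level <= 0 or control_level <= 0:
        raise ValueError("levels must be positive")
    ratio = sample_level / control_level
    return ratio >= fold or ratio <= 1.0 / fold
```

For example, an observed feature at 84.0450 @ 0.85 (about 3.6 ppm and 0.04 away) would match the 84.0447 @ 0.81 entry under these assumed tolerances, while a level one quarter of the control would count as altered.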

[00010] In one aspect, provided is a system comprising, or alternatively consisting essentially of, or yet further consisting of a processor and a memory. The memory comprises, or alternatively consists essentially of, or yet further consists of instructions that are executable by the processor to cause the machine learning system to: (a) generate a training dataset based on one or more tests run on biological samples from a plurality of subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of metabolites from the medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through the one or more tests run on the biological samples; (b) produce a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) store, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
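Steps (b)(ii)-(iii) and (c) can be sketched as follows, under stated assumptions: per-sample feature contributions (for example, SHAP values from a trained tree model) are assumed to be already computed; features are ranked by mean absolute contribution; the top k form the signature; and the signature is written to a JSON file keyed by the medical condition. The function names, the choice of k, and the JSON layout are hypothetical and are not the disclosed implementation.

```python
import json
from pathlib import Path


def rank_features(contributions, feature_names):
    """Rank features by mean absolute contribution across samples.

    contributions: list of rows (one per sample), each holding one value per feature.
    """
    n_samples = len(contributions)
    scores = {
        name: sum(abs(row[j]) for row in contributions) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_signature(contributions, feature_names, k):
    """Steps (b)(ii)-(iii): keep the k features with the largest mean |contribution|."""
    return [name for name, _ in rank_features(contributions, feature_names)[:k]]


def store_signature(path, condition, signature):
    """Step (c): persist the signature in association with the condition (hypothetical layout)."""
    path = Path(path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store[condition] = signature
    path.write_text(json.dumps(store, indent=2))


if __name__ == "__main__":
    names = ["84.0447@0.81", "106.0865@10.34", "201.0740@3.21"]
    # Toy contribution values, invented for illustration.
    contribs = [[0.5, -0.1, 0.0], [-0.7, 0.2, 0.05], [0.6, -0.3, 0.0]]
    print(build_signature(contribs, names, k=2))  # ['84.0447@0.81', '106.0865@10.34']
```

Mean absolute contribution is one common way to aggregate per-sample attributions into a global ranking; the sign of a contribution indicates direction of effect but is deliberately ignored when ranking.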

[00011] In some embodiments, the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having the medical condition. In further embodiments, the plurality of subjects further comprises subjects without the medical condition. In some embodiments, the subject has been treated with a therapy neutralizing a pathogen causing the medical condition. Additionally or alternatively, the subject is immune-compromised. In some embodiments, the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child. In some embodiments, the medical condition is an infection by an influenza virus. In some embodiments, the medical condition is an infection by a coronavirus, such as HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV), or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

[00012] In some embodiments, the feature of the biological sample comprises a concentration (absolute or normalized), for example an extracellular concentration, of the metabolite. In some embodiments, a metabolite is an intracellular metabolite.

[00013] In some embodiments, the machine learning models comprise, or alternatively consist essentially of, or yet further consist of a boosted or bagged decision tree, such as Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost). In some embodiments, the selecting the subset of features comprises performing feature importance analysis. In further embodiments, the method further comprises applying a Shapley Additive Explanation (SHAP) procedure and selecting the subset of features.

[00014] In some embodiments, the method further comprises administering to the patient having the medical condition a therapy specifically for treating the condition. In further embodiments, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent neutralizing a pathogen causing the medical condition. Additionally or alternatively, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing a pathogen causing the medical condition. In further embodiments, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing an extracellular pathogen causing the medical condition.

[00015] In some embodiments, provided herein is a breakthrough method for the diagnosis of infectious diseases via the characterization of host metabolite signatures using Quadrupole Time-of-Flight (Q-TOF) Liquid Chromatography/Mass Spectrometry (LC/MS) directly on plasma, urine and other body fluid specimens. This method is adapted to define the specific metabolic signatures occurring in individuals infected with SARS-CoV-2, and to identify specific metabolites and metabolic pathways altered in each viral infection. This method is simple, rapid, inexpensive and presents almost unlimited multiplexing capacity.

[00016] The foregoing general description and following detailed description are exemplary and explanatory and are intended to provide further explanation of the disclosure as claimed. Other objects, advantages, and novel features will be readily apparent to those skilled in the art from the following brief description of the drawings and detailed description of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[00017] FIGs. 1A - 1D show principal component analysis (PCA) of unpaired t-test comparison based on 4 key metabolites of nasopharyngeal swabs positive for a respiratory virus (as marked; FIG. 1A: adenovirus; FIG. 1B: coronavirus; and FIG. 1C: RSV) and nasopharyngeal swabs negative for respiratory viruses by RT-PCR (as marked). Influenza A 2009 H1N1 was also differentiated from influenza A H3N2 using the same methodology (FIG. 1D).

[00018] FIGs. 2A - 2B provide a conceptual diagram of the classification analysis by gradient boosted decision tree implemented in GBM (FIG. 2A) and a conceptual diagram of the study from data collection to interpretation (FIG. 2B). The phases of data collection, model development, and interpretation are illustrated. LC/Q-TOF: liquid chromatography quadrupole time-of-flight; LC-MS/MS: liquid chromatography-tandem mass spectrometry; RF: random forests; ROC: receiver operating characteristic curve; SHAP: Shapley Additive explanation.

[00019] FIGs. 3A - 3D show the area under the receiver operating characteristic curve (AUC) test performance on the biomarker discovery set. In FIG. 3A, ROC curves compare the performance of the machine learning models (RF, LightGBM) with that of the traditional linear models (Lasso, Ridge) on the test set; bracketed values are 95% AUC confidence intervals calculated from a normal fit of the curves. FIGs. 3B - 3C are ROC curves comparing LightGBM’s performance on the test set stratified by subgroup pairs: pediatrics (FIG. 3B) and immunocompromised (FIG. 3C). 95% confidence intervals are shown in brackets. FIG. 3D provides ROC curves of LightGBM’s performance on the prospective test set; bracketed values are 95% AUC confidence intervals calculated from a normal fit of the curves. AUC: area under the receiver operating characteristic curve; RF: random forests; ROC: receiver operating characteristic curve.
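As a hedged illustration of the AUC metric reported in these figures, the AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. This is the Mann-Whitney U statistic divided by the product of the two group sizes. The sketch below is a plain pairwise implementation with a hypothetical function name; it does not reproduce the normal-fit confidence intervals mentioned above.

```python
def roc_auc(labels, scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), i.e. U / (n_pos * n_neg).

    labels: 1 for condition-positive, 0 for condition-negative.
    scores: model outputs, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(pos) * len(neg))


# Toy example: one positive (0.3) is outscored by one negative (0.8).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

The pairwise loop costs n_pos × n_neg comparisons, which is negligible for a 96-sample set; a rank-based formula gives the same value in O(n log n) for larger sets.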

[00020] FIGs. 4A - 4D provide feature importance analysis by SHapley Additive explanation (SHAP) values. FIGs. 4A-4B list the top 20 ion features by percentage importance using the SHAP method. Ion features are identified by accurate mass @ retention time (m/z@RT), and the grey scale indicates the association between feature value and positive influenza classification. For example, low values of 84.0447@0.81 are indicative of positive classification, while the relative value of 106.0865@10.34 does not have a clear interpretation, despite being an important feature. FIG. 4C provides the AUC and 95% confidence interval of parsimonious decision tree models as a function of the number of features used for training. For each set, the left bar indicates data from the discovery set, while the right bar indicates data from the validation set. FIG. 4D provides an example decision tree model, trained using only the top feature and a maximum depth of 1, that has an AUC greater than 0.9 on the test set. AUC: area under the receiver operating characteristic curve; m/z: mass-to-charge ratio; RT: retention time; SHAP: SHapley Additive explanation analysis.

[00021] FIGs. 5A - 5B show area under the receiver operating characteristic curve test performance of the validation set. In FIG. 5A, ROC curves demonstrate LightGBM’s performance on the 96-sample validation test set in Laboratory 1. In FIG. 5B, ROC curves demonstrate LightGBM’s performance on the 96-sample validation test set in Laboratory 2.

[00022] FIG. 6 is a heatmap of nasopharyngeal metabolites. This heatmap was generated from metabolomics analysis of nasopharyngeal samples from children and adults with and without influenza infection, clustered by correlation distance and average linkage. The accurate mass and retention time (accurate mass @ retention time) are listed for each compound on the right, the hierarchical cluster tree appears on the left, and the influenza virus type or subtype is listed at the bottom.

[00023] FIG. 7 provides the LC/Q-TOF experimental workflow from sample collection to data analysis.

[00024] FIG. 8 provides area under the receiver operating characteristic curve (AUC) data with viral transport medium subtraction. This model subtracted the mean viral transport medium (VTM) data to assess the impact of the background matrix in the analysis. The estimates presented are similar to those without VTM subtraction.
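The VTM background correction described for FIG. 8 amounts to subtracting, feature by feature, the mean signal measured in VTM-only blanks. A minimal sketch, assuming samples and blanks are stored as lists of per-feature intensities in the same column order; the function name is hypothetical, and because the source does not say whether negative differences are kept or clipped, this sketch keeps them.

```python
def subtract_vtm_background(samples, vtm_blanks):
    """Subtract the per-feature mean of VTM blank runs from every sample.

    samples, vtm_blanks: lists of rows; each row holds one intensity per feature,
    with features in the same column order in both inputs.
    """
    if not vtm_blanks:
        raise ValueError("at least one VTM blank is required")
    n_features = len(vtm_blanks[0])
    mean_blank = [sum(b[j] for b in vtm_blanks) / len(vtm_blanks) for j in range(n_features)]
    # Negative differences are kept as-is (no clipping), an assumption of this sketch.
    return [[row[j] - mean_blank[j] for j in range(n_features)] for row in samples]


print(subtract_vtm_background([[10.0, 5.0], [8.0, 7.0]], [[2.0, 1.0], [4.0, 3.0]]))
# [[7.0, 3.0], [5.0, 5.0]]
```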

[00025] FIGs. 9A - 9D provide pyroglutamic acid concentration by LC-MS/MS area under the curve analysis in influenza-positive vs influenza-negative specimens in Laboratory 1. Results are shown for the overall classification of influenza-positive vs influenza-negative (FIG. 9A, pyroglutamic acid; and FIG. 9C, in-source fragment ion of pyroglutamic acid) and for classification by influenza type and subtype (FIG. 9B, pyroglutamic acid; and FIG. 9D, in-source fragment ion of pyroglutamic acid). P-values were calculated by Mann-Whitney U test.

[00026] FIGs. 10A - 10D provide pyroglutamic acid concentration by LC-MS/MS standard curve analysis in influenza-positive vs influenza-negative specimens in Laboratory 2. Results are shown for the overall classification of influenza-positive vs influenza-negative (FIG. 10A, pyroglutamic acid; and FIG. 10C, in-source fragment ion of pyroglutamic acid) and for classification by influenza type and subtype (FIG. 10B, pyroglutamic acid; and FIG. 10D, in-source fragment ion of pyroglutamic acid). P-values were calculated by Mann-Whitney U test.

[00027] FIG. 11A is a block diagram depicting an embodiment of a network environment comprising a client device in communication with a server device.

[00028] FIG. 11B is a block diagram depicting a cloud computing environment comprising a client device in communication with cloud service providers.

[00029] FIGs. 11C - 11D are block diagrams depicting embodiments of computing devices useful in connection with the methods and systems described herein.

[00030] FIG. 12 illustrates a system including a computing device and a sample processing system according to various potential embodiments.

[00031] FIG. 13 shows a flowchart for an example process employing a machine learning approach according to various potential embodiments.

DETAILED DESCRIPTION

Definitions

[00032] As would be understood, the section or subsection headings as used herein are for organizational purposes only and are not to be construed as limiting and/or separating the subject matter described.

[00033] Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, the preferred methods, devices, and materials are now described. All technical and patent publications cited herein are incorporated herein by reference in their entirety. Nothing herein is to be construed as an admission that the disclosure is not entitled to antedate such disclosure by virtue of prior disclosure.

[00034] The practice of the present disclosure will employ, unless otherwise indicated, conventional techniques of tissue culture, immunology, molecular biology, microbiology, cell biology and recombinant DNA, which are within the skill of the art. See, e.g., Sambrook and Russell eds. (2001) Molecular Cloning: A Laboratory Manual, 3rd edition; the series Ausubel et al. eds. (2007) Current Protocols in Molecular Biology; the series Methods in Enzymology (Academic Press, Inc., N.Y.); MacPherson et al. (1991) PCR 1: A Practical Approach (IRL Press at Oxford University Press); MacPherson et al. (1995) PCR 2: A Practical Approach; Harlow and Lane eds. (1999) Antibodies, A Laboratory Manual; Freshney (2005) Culture of Animal Cells: A Manual of Basic Technique, 5th edition; Gait ed. (1984) Oligonucleotide Synthesis; U.S. Patent No. 4,683,195; Hames and Higgins eds. (1984) Nucleic Acid Hybridization; Anderson (1999) Nucleic Acid Hybridization; Hames and Higgins eds. (1984) Transcription and Translation; Immobilized Cells and Enzymes (IRL Press (1986)); Perbal (1984) A Practical Guide to Molecular Cloning; Miller and Calos eds. (1987) Gene Transfer Vectors for Mammalian Cells (Cold Spring Harbor Laboratory); Makrides ed. (2003) Gene Transfer and Expression in Mammalian Cells; Mayer and Walker eds. (1987) Immunochemical Methods in Cell and Molecular Biology (Academic Press, London); Herzenberg et al. eds (1996) Weir’s Handbook of Experimental Immunology; Manipulating the Mouse Embryo: A Laboratory Manual, 3rd edition (Cold Spring Harbor Laboratory Press (2002)); Sohail (ed.) (2004) Gene Silencing by RNA Interference: Technology and Application (CRC Press).

[00035] As used in the specification and claims, the singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a cell” includes a plurality of cells, including mixtures thereof.

[00036] As used herein, the term “comprising” is intended to mean that the compounds, compositions and methods include the recited elements, but do not exclude others. “Consisting essentially of” when used to define compounds, compositions and methods, shall mean excluding other elements of any essential significance to the combination. Thus, a composition consisting essentially of the elements as defined herein would not exclude trace contaminants, e.g., from the isolation and purification method, and pharmaceutically acceptable carriers, preservatives, and the like. “Consisting of” shall mean excluding more than trace elements of other ingredients. Embodiments defined by each of these transition terms are within the scope of this technology.

[00037] All numerical designations, e.g., pH, temperature, time, concentration, and molecular weight, including ranges, are approximations which are varied (+) or (-) by increments of 1, 5, or 10%. It is to be understood, although not always explicitly stated, that all numerical designations are preceded by the term “about.” It also is to be understood, although not always explicitly stated, that the reagents described herein are merely exemplary and that equivalents of such are known in the art.

[00038] As will be understood by one skilled in the art, for any and all purposes, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Furthermore, as will be understood by one skilled in the art, a range includes each individual member.

[00039] It is noted that terms such as “approximately,” “substantially,” “about,” or the like may be construed, in various embodiments, to allow for insubstantial or otherwise acceptable deviations from specific values. In various embodiments, deviations of 20 percent may be considered insubstantial deviations, while in certain embodiments, deviations of 15 percent may be considered insubstantial deviations, and in other embodiments, deviations of 10 percent may be considered insubstantial deviations, and in some embodiments, deviations of 5 percent may be considered insubstantial deviations. In various embodiments, deviations may be acceptable when they achieve the intended results or advantages, or are otherwise consistent with the spirit or nature of the embodiments.

[00040] “Optional” or “optionally” means that the subsequently described circumstance may or may not occur, so that the description includes instances where the circumstance occurs and instances where it does not.

[00041] As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

[00042] “Substantially” or “essentially” means nearly totally or completely, for instance, 95% or greater of some given quantity. In some embodiments, “substantially” or “essentially” means 95%, 96%, 97%, 98%, 99%, 99.5%, or 99.9%.

[00043] The terms “acceptable,” “effective,” or “sufficient” when used to describe the selection of any components, ranges, dose forms, etc. disclosed herein intend that said component, range, dose form, etc. is suitable for the disclosed purpose.

[00044] As used herein, comparative terms such as high, low, increase, decrease, reduce, or any grammatical variation thereof, can refer to a certain variation from the reference. In some embodiments, such variation can refer to about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 80%, or about 90%, or about 1-fold, or about 2-fold, or about 3-fold, or about 4-fold, or about 5-fold, or about 6-fold, or about 7-fold, or about 8-fold, or about 9-fold, or about 10-fold, or about 20-fold, or about 30-fold, or about 40-fold, or about 50-fold, or about 60-fold, or about 70-fold, or about 80-fold, or about 90-fold, or about 100-fold or more higher than the reference. In some embodiments, such variation can refer to about 1%, or about 2%, or about 3%, or about 4%, or about 5%, or about 6%, or about 7%, or about 8%, or about 9%, or about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 75%, or about 80%, or about 85%, or about 90%, or about 95%, or about 96%, or about 97%, or about 98%, or about 99% of the reference.

[00045] As used herein, the term “animal” refers to living multi-cellular vertebrate organisms, a category that includes, for example, mammals and birds. The term “mammal” includes both human and non-human mammals.

[00046] The terms “subject,” “host,” “individual,” and “patient” are used interchangeably herein to refer to animals, typically mammalian animals. Non-limiting examples of mammals include humans, non-human primates (e.g., apes, gibbons, chimpanzees, orangutans, monkeys, macaques, and the like), domestic animals (e.g., dogs and cats), farm animals (e.g., horses, cows, goats, sheep, pigs) and experimental animals (e.g., mouse, rat, rabbit, guinea pig). In some embodiments, a mammal is a human. A mammal can be any age or at any stage of development (e.g., an adult, teen, child, infant, or a mammal in utero). A mammal can be male or female. In some embodiments, a subject is a human. In some embodiments, a subject is suspected of having a medical condition. In further embodiments, the subject may be asymptomatic. In other embodiments, the subject may be symptomatic, i.e., showing a symptom of the medical condition. In some embodiments, the subject is immune-compromised. Additionally or alternatively, the subject has been treated with a therapy neutralizing the pathogen causing the medical condition.

[00047] A “composition” as used herein, refers to an active agent, such as a compound as disclosed herein and a carrier, inert or active. The carrier can be, without limitation, solid such as a bead or resin, or liquid, such as phosphate buffered saline.

[00048] An “effective amount” is an amount sufficient to effect beneficial or desired results. An effective amount can be administered in one or more administrations, applications or dosages. Such delivery is dependent on a number of variables including the time period for which the individual dosage unit is to be used, the bioavailability of the therapeutic agent, the route of administration, etc. It is understood, however, that specific dose levels of the therapeutic agents disclosed herein for any particular subject depend upon a variety of factors including the activity of the specific compound employed, bioavailability of the compound, the route of administration, the age of the animal and its body weight, general health, sex, the diet of the animal, the time of administration, the rate of excretion, the drug combination, and the severity of the particular disorder being treated and form of administration. In general, one will desire to administer an amount of the compound that is effective to achieve a serum level commensurate with the concentrations found to be effective in vivo. These considerations, as well as effective formulations and administration procedures are well known in the art and are described in standard textbooks.

[00049] “Therapeutically effective amount” of a drug or an agent refers to an amount of the drug or the agent that is an amount sufficient to obtain a pharmacological response; or alternatively, is an amount of the drug or agent that, when administered to a patient with a specified disorder or disease, is sufficient to have the intended effect, e.g., treatment, alleviation, amelioration, palliation or elimination of one or more manifestations of the specified disorder or disease in the patient. A therapeutic effect does not necessarily occur by administration of one dose, and may occur only after administration of a series of doses. Thus, a therapeutically effective amount may be administered in one or more administrations.

[00050] As used herein, “treating” or “treatment” of a disease in a subject refers to (1) preventing the symptoms or disease from occurring in a subject that is predisposed or does not yet display symptoms of the disease; (2) inhibiting the disease or arresting its development; or (3) ameliorating or causing regression of the disease or the symptoms of the disease. As understood in the art, “treatment” is an approach for obtaining beneficial or desired results, including clinical results. For the purposes of the present technology, beneficial or desired results can include one or more, but are not limited to, alleviation or amelioration of one or more symptoms, diminishment of the extent of a condition (including a disease), stabilized (i.e., not worsening) state of a condition (including disease), delay or slowing of condition (including disease) progression, amelioration or palliation of the condition (including disease) state, and remission (whether partial or total), whether detectable or undetectable. When the disease is cancer, the following clinical end points are non-limiting examples of treatment: reduction in tumor burden, slowing of tumor growth, longer overall survival, longer time to tumor progression, inhibition of metastasis or a reduction in metastasis of the tumor. In one aspect, treatment excludes prophylaxis.

[00051] As used herein, a biological sample, or a sample, is obtained from a subject. Exemplary samples include, but are not limited to, cell samples, tissue samples, tumor biopsies, and liquid samples such as blood and other liquid samples of biological origin, including, but not limited to, ocular fluids (aqueous and vitreous humor), peripheral blood, sera, plasma, ascites, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper’s fluid or pre-ejaculatory fluid, female ejaculate, sweat, tears, cyst fluid, pleural and peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vomit, vaginal secretions/flushing, mucosal secretion, stool water, pancreatic juice, lavage fluids from sinus cavities, bronchopulmonary aspirates, blastocyst cavity fluid, or umbilical cord blood.

[00052] The term “contacting” means direct or indirect binding or interaction between two or more entities. A particular example of direct interaction is binding. A particular example of an indirect interaction is where one entity acts upon an intermediary molecule, which in turn acts upon the second referenced entity. Contacting as used herein includes in solution, in solid phase, in vitro, ex vivo, in a cell and in vivo. Contacting in vivo can be referred to as administering, or administration.

[00053] The term “isolated” or “extracting” is also used herein to refer to polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof (such as a biological sample) that are isolated from other polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof with which they are normally associated in nature. For example, extracting a biological sample from a subject may refer to obtaining polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof from the subject and having the obtained biological materials ex vivo or in vitro.

[00054] As used herein, a “cancer” is a disease state characterized by the presence in a subject of cells demonstrating abnormal uncontrolled replication and in some aspects, the term may be used interchangeably with the term “tumor.”

[00055] As used herein, a metabolite (used interchangeably with a metabolite compound) refers to an intermediate of metabolism, or end product of metabolism, or any substance involved in metabolism. In some embodiments, a metabolite refers to a small molecule, which is an organic compound having a low molecular weight, such as lower than 900 daltons, or a size on the order of 1 nm. In some embodiments, a metabolite can be measured by a test as disclosed herein.

[00056] As used herein, the term "mass spectrometry" or "MS" refers to an analytical technique to identify compounds by their mass. MS refers to methods of filtering, detecting, and measuring ions based on their mass-to-charge ratio, or "m/z". MS technology generally includes (1) ionizing the compounds to form charged compounds; and (2) detecting the molecular weight of the charged compounds and calculating a mass-to-charge ratio. The compounds may be ionized and detected by any suitable means. A "mass spectrometer" generally includes an ionizer and an ion detector. In general, one or more molecules of interest are ionized, and the ions are subsequently introduced into a mass spectrographic instrument where, due to a combination of magnetic and electric fields, the ions follow a path in space that is dependent upon mass ("m") and charge ("z"). See, e.g., U.S. Patent Nos. 6,204,500, entitled "Mass Spectrometry From Surfaces;" 6,107,623, entitled "Methods and Apparatus for Tandem Mass Spectrometry;" 6,268,144, entitled "DNA Diagnostics Based On Mass Spectrometry;" 6,124,137, entitled "Surface-Enhanced Photolabile Attachment And Release For Desorption And Detection Of Analytes;" Wright et al., Prostate Cancer and Prostatic Diseases 1999, 2: 264-76; and Merchant and Weinberger, Electrophoresis 2000, 21: 1164-67.
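To make the mass-to-charge ratio concrete: for a positive-mode protonated ion [M + zH]^z+, m/z = (M + z × 1.007276) / z, where M is the neutral monoisotopic mass and 1.007276 Da is the proton mass. The sketch below computes this for pyroglutamic acid (C5H7NO3), one of the metabolites named in this disclosure, and reports mass error in parts per million, the usual way accurate-mass measurements are compared. It handles only protonated ions, and the helper names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def neutral_mass(formula):
    """Monoisotopic neutral mass from an element-count dict, e.g. {"C": 5, "H": 7}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def protonated_mz(mass, charge=1):
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic mass M."""
    return (mass + charge * PROTON_MASS) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Mass accuracy in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


pyroglutamic_acid = {"C": 5, "H": 7, "N": 1, "O": 3}
m = neutral_mass(pyroglutamic_acid)
print(round(m, 4), round(protonated_mz(m), 4))  # 129.0426 130.0499
```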

[00057] Retention time (RT) is a measure of the time taken for a solute to pass through a chromatography column. It is calculated as the time from injection to detection. The RT for a compound is not fixed, as many factors can influence it even if the same gas chromatography (GC) system and column are used. These include, but are not limited to: gas flow rate, temperature differences in the oven and column, column degradation, or column length. These factors can make it difficult to compare retention times. In some embodiments, an RT as used herein is determined by a qualitative analysis. Qualitative analysis relies on comparing the retention times of the peaks in an unknown sample with those of known standards. If the retention time of a peak in the unknown sample is the same as that of the standard, then a positive identification can be made. A method that reduces the effect of small changes in GC parameters is therefore beneficial for qualitative analysis. In some embodiments, an RT as used herein is a relative RT. The use of the relative retention time (RRT) reduces the effects of some of the variables that can affect the retention time. RRT is an expression of a sample’s retention time relative to the standard’s retention time. To measure RRT, a sample matrix is made up by mixing the sample with an internal standard (IS), and the following calculation can then be performed:

RRT = Sample RT / Standard RT

[00058] Chromatographic retention time coupled with mass spectral identification is used herein for identifying a compound.
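Following the definition of RRT as the sample's retention time expressed relative to the internal standard's retention time, the sketch below computes RRT and checks an unknown peak against a reference RRT. It also illustrates why RRT is more robust: a change that, in an idealized case, stretches all retention times by the same factor (for example, a slower flow rate) leaves RRT unchanged. The 0.02 matching tolerance, the example retention times, and the function names are assumptions for illustration only.

```python
def relative_retention_time(sample_rt, standard_rt):
    """RRT = sample (analyte) RT / internal-standard RT, both in the same units."""
    if standard_rt <= 0:
        raise ValueError("internal-standard RT must be positive")
    return sample_rt / standard_rt


def matches_reference(sample_rt, standard_rt, reference_rrt, tol=0.02):
    """Qualitative identification: RRT agrees with a reference RRT within tol (assumed)."""
    return abs(relative_retention_time(sample_rt, standard_rt) - reference_rrt) <= tol


# Hypothetical run: analyte at 8.36 min, internal standard at 5.00 min.
rrt_run1 = relative_retention_time(8.36, 5.00)
# Idealized drift: every retention time 3% longer in a second run.
rrt_run2 = relative_retention_time(8.36 * 1.03, 5.00 * 1.03)
print(round(rrt_run1, 3), round(rrt_run2, 3))  # 1.672 1.672
```

Real drift is rarely a perfectly uniform stretch, which is why the text says RRT reduces, rather than eliminates, the effect of such variables.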

[00059] A decision tree is a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility, displaying an algorithm that only contains conditional control statements. Ensemble methods combine several decision trees to produce better predictive performance than utilizing a single decision tree. Ensembled decision trees may be bagged or boosted.
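A decision tree with a maximum depth of 1 (a "decision stump"), such as the single-feature model described for FIG. 4D, reduces to one conditional statement: if the feature value is at or below a threshold, predict one class, otherwise predict the other. The sketch below fits such a stump by trying midpoints between sorted feature values and keeping the split with the fewest misclassifications. It is a toy illustration on invented data with hypothetical function names, not the model used in the disclosure.

```python
def majority(labels):
    """Majority 0/1 label; ties and empty lists default to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(x, y):
    """Fit a depth-1 tree on one feature. Returns (threshold, label_if_le, label_if_gt)."""
    values = sorted(set(x))
    # Candidate thresholds: one below every value, then midpoints between neighbours.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        left_label, right_label = majority(left), majority(right)
        errors = sum(yi != left_label for yi in left) + sum(yi != right_label for yi in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, value):
    threshold, label_le, label_gt = stump
    return label_le if value <= threshold else label_gt


# Invented data: low feature values labelled positive (1), high values negative (0).
x = [0.2, 0.3, 0.4, 1.5, 1.8, 2.0]
y = [1, 1, 1, 0, 0, 0]
stump = fit_stump(x, y)
print(predict_stump(stump, 0.1), predict_stump(stump, 3.0))  # 1 0
```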

[00060] Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree, for example by creating several subsets of the training data, each drawn randomly with replacement, training a decision tree on each subset, and thereby obtaining an ensemble of different models. The predictions of the individual trees are averaged, which is more robust than relying on a single decision tree. One non-limiting example of bagged decision trees is the random forest, which takes one extra step by randomly selecting a subset of features, rather than using all features, when growing the trees.
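The bagging procedure above can be sketched in plain Python under simplifying assumptions: the base learners are depth-1 trees (stumps) rather than full trees, each is trained on a bootstrap sample drawn with replacement, and predictions are averaged as the fraction of positive votes. Setting `max_features` below the number of features adds random-forest-style random feature selection; with depth-1 trees, the per-tree and per-split feature subsets coincide. This is a teaching sketch with hypothetical names, not the random forest implementation evaluated in the disclosure.

```python
import random


def majority(labels):
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(x, y):
    """Best depth-1 split on one feature; returns (threshold, left, right, errors)."""
    values = sorted(set(x))
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lab_l, lab_r = majority(left), majority(right)
        err = sum(v != lab_l for v in left) + sum(v != lab_r for v in right)
        if best is None or err < best[3]:
            best = (t, lab_l, lab_r, err)
    return best


def fit_bagged_stumps(X, y, n_estimators=25, max_features=1, seed=0):
    """Each stump sees a bootstrap sample; max_features < d adds feature subsampling."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        features = rng.sample(range(d), min(max_features, d))  # random feature subset
        best = None
        for j in features:
            t, lab_l, lab_r, err = fit_stump([X[i][j] for i in rows], [y[i] for i in rows])
            if best is None or err < best[4]:
                best = (j, t, lab_l, lab_r, err)
        ensemble.append(best[:4])
    return ensemble


def predict_proba(ensemble, row):
    """Average the stumps' 0/1 votes into a score between 0 and 1."""
    votes = [lab_l if row[j] <= t else lab_r for j, t, lab_l, lab_r in ensemble]
    return sum(votes) / len(votes)


# Invented data: feature 0 separates the classes; feature 1 is constant noise.
X = [[0.1 + 0.05 * i, 1.0] for i in range(10)] + [[2.0 + 0.05 * i, 1.0] for i in range(10)]
y = [1] * 10 + [0] * 10
ensemble = fit_bagged_stumps(X, y, max_features=2)
print(predict_proba(ensemble, [0.3, 1.0]), predict_proba(ensemble, [2.2, 1.0]))
```

Averaging votes from models trained on resampled data is what reduces variance: individual stumps place their thresholds differently, but the averaged score is stable.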

[00061] Boosting is another ensemble technique for creating a collection of predictors. In this technique, learners are trained sequentially: early learners fit simple models to the data, and the data are then analyzed for errors. Consecutive trees (each fit on a random sample) are fitted, and at every step the goal is to correct the net error of the prior tree. When an input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. Combining the whole set at the end converts weak learners into a better-performing model. Gradient boosting is an extension of the boosting method that uses a gradient descent algorithm, which can optimize any differentiable loss function. An ensemble of trees is built one by one, and the outputs of the individual trees are summed sequentially. Each subsequent tree tries to recover the remaining loss (the difference between the actual and predicted values). See also wikipedia.org/wiki/Gradient_boosting for more details about gradient boosting, which is incorporated herein by reference in its entirety. Non-limiting examples of gradient boosting include Light Gradient Boosting Machine (LightGBM), XGBoost, or Adaptive Boosting (AdaBoost).
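The residual-fitting loop of gradient boosting can be illustrated with squared-error loss, for which the negative gradient is simply the residual y - F(x). In the sketch below, each round fits a depth-1 regression tree to the current residuals, and its output, scaled by a learning rate, is added to the running prediction. Libraries such as LightGBM and XGBoost apply the same additive scheme with other loss functions (for example, log loss for binary classification), deeper trees, and regularization; this one-feature toy version with hypothetical names only sketches the idea.

```python
def fit_regression_stump(x, r):
    """Depth-1 regression tree: the split minimizing squared error; leaves predict means."""
    values = sorted(set(x))
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - left_mean) ** 2 for v in left) + sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # constant feature: a single leaf predicting the mean residual
        mean = sum(r) / len(r)
        return float("inf"), mean, mean
    return best[1:]


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting with stumps on a single feature."""
    base = sum(y) / len(y)  # initial constant model
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of (y - F)^2 / 2
        t, left_val, right_val = fit_regression_stump(x, residuals)
        trees.append((t, left_val, right_val))
        pred = [pi + learning_rate * (left_val if xi <= t else right_val) for xi, pi in zip(x, pred)]
    return base, learning_rate, trees


def predict(model, value):
    base, learning_rate, trees = model
    return base + learning_rate * sum(lv if value <= t else rv for t, lv, rv in trees)


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 3, 3, 3])
print(round(predict(model, 1), 2), round(predict(model, 6), 2))  # 1.01 2.99
```

Here each round removes 10% of the remaining residual, so after 50 rounds the residual is 0.9^50, roughly 0.005; a smaller learning rate trades more rounds for a smoother fit.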

[00062] SHAP (SHapley Additive exPlanations) is a method to explain individual predictions. SHAP is based on the game-theoretically optimal Shapley values. The goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction. The SHAP explanation method computes Shapley values from coalitional game theory. The feature values of a data instance act as players in a coalition. Shapley values show how to fairly distribute the "payout" (= the prediction) among the features. A player can be an individual feature value, e.g., for tabular data. A player can also be a group of feature values. For example, to explain an image, pixels can be grouped into superpixels and the prediction distributed among them. One innovation of SHAP is that the Shapley value explanation is represented as an additive feature attribution method, a linear model. That view connects LIME and Shapley values. SHAP specifies the explanation as

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j,$$

where $g$ is the explanation model, $z' \in \{0, 1\}^M$ is the coalition vector, $M$ is the maximum coalition size, and $\phi_j \in \mathbb{R}$ is the feature attribution for feature $j$, i.e., the Shapley values. In the coalition vector, an entry of 1 means that the corresponding feature value is "present" and 0 that it is "absent". To compute Shapley values, it is simulated that only some feature values are playing ("present") and some are not ("absent"). More details are available at shap.readthedocs.io/en/latest/.
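The sketch below computes Shapley values exactly by enumerating every coalition. This is feasible only for a handful of features; the SHAP library relies on approximations such as KernelSHAP and TreeSHAP for real models. The sketch assumes that an "absent" feature is replaced by a fixed baseline value. Under that assumption, the base value $\phi_0$ is the model output at the baseline, and the attributions satisfy $f(x) = \phi_0 + \sum_j \phi_j$. The model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x by enumerating coalitions.

    A feature that is "absent" from a coalition is replaced by its baseline
    value; this is one simple way to simulate absence (an assumption here)."""
    M = len(x)

    def value(coalition):
        # Only features in the coalition take their values from x.
        z = [x[j] if j in coalition else baseline[j] for j in range(M)]
        return f(z)

    phi = []
    for j in range(M):
        others = [k for k in range(M) if k != j]
        total = 0.0
        for size in range(M):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                S = set(subset)
                total += weight * (value(S | {j}) - value(S))
        phi.append(total)
    return phi


# Hypothetical model with an interaction term between features 1 and 2.
def model(z):
    return 2 * z[0] + 3 * z[1] * z[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi0 = model(baseline)                 # base value of the additive explanation
phi = shapley_values(model, x, baseline)
print(phi0, phi)
```

Feature 0 receives its full linear contribution (2). The interaction payout of 6 is split evenly between features 1 and 2, which illustrates the "fair distribution" property.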

Modes for Carrying out the Disclosure

[00063] Viruses infect respiratory epithelial cells, where they may induce metabolite alterations in the host (Ferrarini et al. Electrophoresis 38, 2341-2348 (2017); and Stewart et al. J Infect Dis 217, 1160-1169 (2018)). The ‘-omics’ field has emerged as a promising discipline to address some of these gaps, with greater emphasis so far placed on genomics and proteomics for infectious disease diagnostics, including clinical virology (Antonelli. Clin Microbiol Infect 19, 8-9 (2013); Mancone et al. Clin Microbiol Infect 19, 23-28 (2013); and Burke, et al. EBioMedicine 17, 172-181 (2017)). Metabolomics, or the large-scale study of small molecules, represents a paradigm shift from routine clinical virology diagnostics because it detects the host metabolic response rather than directly detecting the pathogen (Nalbantoglu. Metabolomics: Basic Principles and Strategies, in Molecular Medicine, Nalbantoglu and Amri (eds.), IntechOpen (August 7, 2019)). Metabolomics theoretically holds promise for infectious disease applications because it can be performed directly on patient specimens from a minimal sample volume, is inexpensive to run, provides a real-time assessment of the host response, and may accurately differentiate active infection from colonization (Pacchiarotta et al. Bioanalysis 4, 919-925 (2012); and Zurfluh et al. Expert Rev Anti Infect Ther 16, 133-142 (2018)).

[00064] Nasopharyngeal swab sampling followed by swab immersion in viral transport medium (VTM) is the most common collection technique for the diagnosis of respiratory viruses and enables the non-invasive collection of respiratory cells. Applicant hypothesized that analysis of VTM after nasopharyngeal sampling using a recently reported, sensitive in-line two-column metabolomics method would reveal distinct signatures for the diagnosis of infectious diseases (Corman et al. Euro Surveill 25, 1-8 (2020); and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2020; 1143: 122072). This method is well suited for characterizing host metabolite signatures directly from patient specimens by liquid chromatography quadrupole time-of-flight mass spectrometry (LC/Q-TOF) with a simplified experimental workflow (FIG. 7) (Corman et al. Euro Surveill 25, 1-8 (2020); and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2020; 1143: 122072).

[00065] Applicant used this LC/Q-TOF method to generate data to develop and validate machine learning (ML) algorithms for classifying influenza infection status, as well as an interpretation method for biomarker discovery (FIG. 2B). The resulting top-20 biomarker signature was then adapted for testing on simpler, targeted triple quadrupole mass spectrometry instruments (LC/MS-MS; referred to as tandem mass spectrometry) in two distinct laboratories for validation on upper respiratory tract specimens. Accordingly, provided is a metabolomics approach combined with machine learning for diagnosing influenza from nasopharyngeal swabs with high test performance. As is discussed herein, this approach can be applied to generate diagnostic tools and therapeutic interventions for any disease, disorder, or infectious disease that affects the metabolome of the subject.
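This paragraph does not state how the top-20 signature was selected from the interpretation method. One common approach is to rank features by their mean absolute attribution (for example, per-sample SHAP values) across samples and keep the k highest. The sketch below shows only that assumed approach, not Applicant's procedure. The helper `top_k_biomarkers`, the metabolite names, and the attribution values are hypothetical.

```python
def top_k_biomarkers(attributions, feature_names, k=20):
    """Rank features by mean absolute attribution across samples; keep the top k.

    attributions: one list per sample, holding one attribution value per feature
    (for example, per-sample SHAP values). Hypothetical helper for illustration."""
    n = len(attributions)
    importance = {
        name: sum(abs(row[j]) for row in attributions) / n
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(feature_names, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


# Hypothetical attributions for three metabolite features over two samples.
attributions = [[0.1, -0.9, 0.3],
                [0.2, -0.7, -0.4]]
names = ["metabolite_1", "metabolite_2", "metabolite_3"]
print(top_k_biomarkers(attributions, names, k=2))
```

Taking the absolute value before averaging keeps features that push predictions strongly in either direction, since the sign of an attribution only indicates whether a feature raised or lowered the prediction.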

[00066] The metabolomic method of this disclosure presents multiple novel aspects that have the potential to fill the diagnostic gap and significantly improve the way infectious diseases such as influenza or SARS-CoV-2 infection/COVID-19 are diagnosed and monitored. First, metabolic signature discovery is based on a novel in-line, two-column metabolomics method that enables testing to be performed in a single run. This approach reduces turnaround time and increases precision compared to the current standard of care in metabolomics, where testing must be performed separately for polar and non-polar compounds. Second, the method shows promise to improve the way SARS-CoV-2 diagnostics are performed by directly characterizing the host metabolic response to infection in a non-invasive manner that optimizes sensitivity. This is particularly important because current molecular methods detect only the target organism and may therefore fail to recognize mutated or variant strains. Other advantages over conventional testing include reduced turnaround time, simplicity, lower reagent cost, and virtually unlimited multiplexing capacity. Finally, this method can be adapted for use with a variety of specimen types, including plasma, urine, and other normally sterile body fluids, as well as for diseases or conditions other than infectious disease.

Systems, Devices, and Methods for Machine Learning Modeling

[00067] Aspects of the operating environment as well as associated system components (e.g., hardware elements) in connection with various embodiments of the methods and systems described herein will now be discussed. Referring to FIG. 11A, an embodiment of a network environment is depicted. In brief overview, the network environment includes one or more clients 102a-102n (also generally referred to as local machine(s) 102, client(s) 102, client node(s) 102, client machine(s) 102, client computer(s) 102, client device(s) 102, endpoint(s) 102, or endpoint node(s) 102) in communication with one or more servers 106a-106n (also generally referred to as server(s) 106, node 106, or remote machine(s) 106) via one or more networks 104. In some embodiments, a client 102 has the capacity to function as both a client node seeking access to resources provided by a server and as a server providing access to hosted resources for other clients 102a-102n.

[00068] Although FIG. 11A shows a network 104 between the clients 102 and the servers 106, the clients 102 and the servers 106 may be on the same network 104. In some embodiments, there are multiple networks 104 between the clients 102 and the servers 106. In one of these embodiments, a network 104’ (not shown) may be a private network and a network 104 may be a public network. In another of these embodiments, a network 104 may be a private network and a network 104’ a public network. In still another of these embodiments, networks 104 and 104’ may both be private networks.

[00069] The network 104 may be connected via wired or wireless links. Wired links may include Digital Subscriber Line (DSL), coaxial cable lines, or optical fiber lines. The wireless links may include BLUETOOTH, Wi-Fi, Worldwide Interoperability for Microwave Access (WiMAX), an infrared channel, or a satellite band. The wireless links may also include any cellular network standards used to communicate among mobile devices, including standards that qualify as 1G, 2G, 3G, 4G, or 5G. The network standards may qualify as one or more generations of mobile telecommunication standards by fulfilling a specification or standards such as the specifications maintained by the International Telecommunication Union. The 3G standards, for example, may correspond to the International Mobile Telecommunications-2000 (IMT-2000) specification, and the 4G standards may correspond to the International Mobile Telecommunications Advanced (IMT-Advanced) specification. Examples of cellular network standards include AMPS, GSM, GPRS, UMTS, LTE, LTE Advanced, Mobile WiMAX, and WiMAX-Advanced. Cellular network standards may use various channel access methods, e.g., FDMA, TDMA, CDMA, or SDMA. In some embodiments, different types of data may be transmitted via different links and standards. In other embodiments, the same types of data may be transmitted via different links and standards.

[00070] The network 104 may be any type and/or form of network. The geographical scope of the network 104 may vary widely, and the network 104 can be a body area network (BAN), a personal area network (PAN), a local-area network (LAN), e.g., an intranet, a metropolitan area network (MAN), a wide area network (WAN), or the Internet. The topology of the network 104 may be of any form and may include, e.g., any of the following: point-to-point, bus, star, ring, mesh, or tree. The network 104 may be an overlay network which is virtual and sits on top of one or more layers of other networks 104’. The network 104 may be of any such network topology as known to those ordinarily skilled in the art capable of supporting the operations described herein. The network 104 may utilize different techniques and layers or stacks of protocols, including, e.g., the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, or the SDH (Synchronous Digital Hierarchy) protocol. The TCP/IP internet protocol suite may include the application layer, transport layer, internet layer (including, e.g., IPv6), or link layer. The network 104 may be a type of broadcast network, a telecommunications network, a data communication network, or a computer network.

[00071] In some embodiments, the system may include multiple, logically-grouped servers 106. In one of these embodiments, the logical group of servers may be referred to as a server farm 38 or a machine farm 38. In another of these embodiments, the servers 106 may be geographically dispersed. In other embodiments, a machine farm 38 may be administered as a single entity. In still other embodiments, the machine farm 38 includes a plurality of machine farms 38. The servers 106 within each machine farm 38 can be heterogeneous: one or more of the servers 106 or machines 106 can operate according to one type of operating system platform (e.g., WINDOWS NT, manufactured by Microsoft Corp. of Redmond, Washington), while one or more of the other servers 106 can operate according to another type of operating system platform (e.g., Unix, Linux, or Mac OS X).

[00072] In one embodiment, servers 106 in the machine farm 38 may be stored in high-density rack systems, along with associated storage systems, and located in an enterprise data center. In this embodiment, consolidating the servers 106 in this way may improve system manageability, data security, the physical security of the system, and system performance by locating servers 106 and high-performance storage systems on localized high-performance networks. Centralizing the servers 106 and storage systems and coupling them with advanced system management tools allows more efficient use of server resources.

[00073] The servers 106 of each machine farm 38 do not need to be physically proximate to another server 106 in the same machine farm 38. Thus, the group of servers 106 logically grouped as a machine farm 38 may be interconnected using a wide-area network (WAN) connection or a metropolitan-area network (MAN) connection. For example, a machine farm 38 may include servers 106 physically located on different continents or in different regions of a continent, country, state, city, campus, or room. Data transmission speeds between servers 106 in the machine farm 38 can be increased if the servers 106 are connected using a local-area network (LAN) connection or some form of direct connection. Additionally, a heterogeneous machine farm 38 may include one or more servers 106 operating according to a type of operating system, while one or more other servers 106 execute one or more types of hypervisors rather than operating systems. In these embodiments, hypervisors may be used to emulate virtual hardware, partition physical hardware, virtualize physical hardware, and execute virtual machines that provide access to computing environments, allowing multiple operating systems to run concurrently on a host computer. Native hypervisors may run directly on the host computer. Hypervisors may include VMware ESX/ESXi, manufactured by VMware, Inc., of Palo Alto, California; the Xen hypervisor, an open source product whose development is overseen by Citrix Systems, Inc.; the HYPER-V hypervisors provided by Microsoft; or others. Hosted hypervisors may run within an operating system on a second software level. Examples of hosted hypervisors may include VMware Workstation and VIRTUALBOX.

[00074] Management of the machine farm 38 may be de-centralized. For example, one or more servers 106 may comprise components, subsystems and modules to support one or more management services for the machine farm 38. In one of these embodiments, one or more servers 106 provide functionality for management of dynamic data, including techniques for handling failover, data replication, and increasing the robustness of the machine farm 38. Each server 106 may communicate with a persistent store and, in some embodiments, with a dynamic store.

[00075] Server 106 may be a file server, application server, web server, proxy server, appliance, network appliance, gateway, gateway server, virtualization server, deployment server, SSL VPN server, or firewall. In one embodiment, the server 106 may be referred to as a remote machine or a node. In another embodiment, a plurality of nodes 290 may be in the path between any two communicating servers.

[00076] Referring to FIG. 11B, a cloud computing environment is depicted. A cloud computing environment may provide client 102 with one or more resources provided by a network environment. The cloud computing environment may include one or more clients 102a-102n, in communication with the cloud 108 over one or more networks 104. Clients 102 may include, e.g., thick clients, thin clients, and zero clients. A thick client may provide at least some functionality even when disconnected from the cloud 108 or servers 106. A thin client or a zero client may depend on the connection to the cloud 108 or server 106 to provide functionality. A zero client may depend on the cloud 108 or other networks 104 or servers 106 to retrieve operating system data for the client device. The cloud 108 may include back end platforms, e.g., servers 106, storage, server farms or data centers.

[00077] The cloud 108 may be public, private, or hybrid. Public clouds may include public servers 106 that are maintained by third parties to the clients 102 or the owners of the clients. The servers 106 may be located off-site in remote geographical locations as disclosed above or otherwise. Public clouds may be connected to the servers 106 over a public network. Private clouds may include private servers 106 that are physically maintained by clients 102 or owners of clients. Private clouds may be connected to the servers 106 over a private network 104. Hybrid clouds 108 may include both the private and public networks 104 and servers 106.

[00078] The cloud 108 may also include cloud-based delivery, e.g., Software as a Service (SaaS) 110, Platform as a Service (PaaS) 112, and Infrastructure as a Service (IaaS) 114. IaaS may refer to a user renting the use of infrastructure resources that are needed during a specified time period. IaaS providers may offer storage, networking, servers or virtualization resources from large pools, allowing the users to quickly scale up by accessing more resources as needed. Examples of IaaS can include infrastructure and services (e.g., EG-32) provided by OVH HOSTING of Montreal, Quebec, Canada, AMAZON WEB SERVICES provided by Amazon.com, Inc., of Seattle, Washington, RACKSPACE CLOUD provided by Rackspace US, Inc., of San Antonio, Texas, Google Compute Engine provided by Google Inc. of Mountain View, California, or RIGHTSCALE provided by RightScale, Inc., of Santa Barbara, California. PaaS providers may offer functionality provided by IaaS, including, e.g., storage, networking, servers or virtualization, as well as additional resources such as, e.g., the operating system, middleware, or runtime resources. Examples of PaaS include WINDOWS AZURE provided by Microsoft Corporation of Redmond, Washington, Google App Engine provided by Google Inc., and HEROKU provided by Heroku, Inc. of San Francisco, California. SaaS providers may offer the resources that PaaS provides, including storage, networking, servers, virtualization, operating system, middleware, or runtime resources. In some embodiments, SaaS providers may offer additional resources including, e.g., data and application resources. Examples of SaaS include GOOGLE APPS provided by Google Inc., SALESFORCE provided by Salesforce.com Inc. of San Francisco, California, or OFFICE 365 provided by Microsoft Corporation. Examples of SaaS may also include data storage providers, e.g., DROPBOX provided by Dropbox, Inc. of San Francisco, California, Microsoft SKYDRIVE provided by Microsoft Corporation, Google Drive provided by Google Inc., or Apple ICLOUD provided by Apple Inc. of Cupertino, California.

[00079] Clients 102 may access IaaS resources with one or more IaaS standards, including, e.g., Amazon Elastic Compute Cloud (EC2), Open Cloud Computing Interface (OCCI), Cloud Infrastructure Management Interface (CIMI), or OpenStack standards. Some IaaS standards may allow clients access to resources over HTTP, and may use Representational State Transfer (REST) protocol or Simple Object Access Protocol (SOAP). Clients 102 may access PaaS resources with different PaaS interfaces. Some PaaS interfaces use HTTP packages, standard Java APIs, JavaMail API, Java Data Objects (JDO), Java Persistence API (JPA), Python APIs, web integration APIs for different programming languages including, e.g., Rack for Ruby, WSGI for Python, or PSGI for Perl, or other APIs that may be built on REST, HTTP, XML, or other protocols. Clients 102 may access SaaS resources through the use of web-based user interfaces, provided by a web browser (e.g., GOOGLE CHROME, Microsoft INTERNET EXPLORER, or Mozilla Firefox provided by Mozilla Foundation of Mountain View, California). Clients 102 may also access SaaS resources through smartphone or tablet applications, including, e.g., Salesforce Sales Cloud, or Google Drive app. Clients 102 may also access SaaS resources through the client operating system, including, e.g., Windows file system for DROPBOX.

[00080] In some embodiments, access to IaaS, PaaS, or SaaS resources may be authenticated. For example, a server or authentication server may authenticate a user via security certificates, HTTPS, or API keys. API keys may include various encryption standards such as, e.g., Advanced Encryption Standard (AES). Data resources may be sent over Transport Layer Security (TLS) or Secure Sockets Layer (SSL).

[00081] The client 102 and server 106 may be deployed as and/or executed on any type and form of computing device, e.g., a computer, network device or appliance capable of communicating on any type and form of network and performing the operations described herein. FIGs. 11C and 11D depict block diagrams of a computing device 100 useful for practicing an embodiment of the client 102 or a server 106. As shown in FIGs. 11C and 11D, each computing device 100 includes a central processing unit 121 and a main memory unit 122. As shown in FIG. 11C, a computing device 100 may include a storage device 128, an installation device 116, a network interface 118, an I/O controller 123, display devices 124a-124n, a keyboard 126 and a pointing device 127, e.g., a mouse. The storage device 128 may include, without limitation, an operating system, software, and software of a genomic data processing system 120. As shown in FIG. 11D, each computing device 100 may also include additional optional elements, e.g., a memory port 103, a bridge 170, one or more input/output devices 130a-130n (generally referred to using reference numeral 130), and a cache memory 140 in communication with the central processing unit 121.

[00082] The central processing unit 121 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 122. In many embodiments, the central processing unit 121 is provided by a microprocessor unit, e.g.: those manufactured by Intel Corporation of Santa Clara, California; those manufactured by Motorola Corporation of Schaumburg, Illinois; the ARM processor and TEGRA system on a chip (SoC) manufactured by Nvidia of Santa Clara, California; the POWER7 processor and those manufactured by International Business Machines of Armonk, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California. The computing device 100 may be based on any of these processors, or any other processor capable of operating as described herein. The central processing unit 121 may utilize instruction level parallelism, thread level parallelism, different levels of cache, and multi-core processors. A multi-core processor may include two or more processing units on a single computing component. Examples of multi-core processors include the AMD PHENOM II X2, INTEL CORE i5 and INTEL CORE i7.

[00083] Main memory unit or memory device 122 may include one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 121. Main memory unit or device 122 may be volatile and faster than storage 128 memory. Main memory units or devices 122 may be Dynamic random access memory (DRAM) or any variants, including static random access memory (SRAM), Burst SRAM or SynchBurst SRAM (BSRAM), Fast Page Mode DRAM (FPM DRAM), Enhanced DRAM (EDRAM), Extended Data Output RAM (EDO RAM), Extended Data Output DRAM (EDO DRAM), Burst Extended Data Output DRAM (BEDO DRAM), Single Data Rate Synchronous DRAM (SDR SDRAM), Double Data Rate SDRAM (DDR SDRAM), Direct Rambus DRAM (DRDRAM), or Extreme Data Rate DRAM (XDR DRAM). In some embodiments, the main memory 122 or the storage 128 may be non-volatile; e.g., non-volatile random-access memory (NVRAM), flash memory, non-volatile static RAM (nvSRAM), Ferroelectric RAM (FeRAM), Magnetoresistive RAM (MRAM), Phase-change memory (PRAM), conductive-bridging RAM (CBRAM), Silicon-Oxide-Nitride-Oxide-Silicon (SONOS), Resistive RAM (RRAM), Racetrack, Nano-RAM (NRAM), or Millipede memory. The main memory 122 may be based on any of the above described memory chips, or any other available memory chips capable of operating as described herein. In the embodiment shown in FIG. 11C, the processor 121 communicates with main memory 122 via a system bus 150 (described in more detail below). FIG. 11D depicts an embodiment of a computing device 100 in which the processor communicates directly with main memory 122 via a memory port 103. For example, in FIG. 11D the main memory 122 may be DRDRAM.

[00084] FIG. 11D depicts an embodiment in which the main processor 121 communicates directly with cache memory 140 via a secondary bus, sometimes referred to as a backside bus. In other embodiments, the main processor 121 communicates with cache memory 140 using the system bus 150. Cache memory 140 typically has a faster response time than main memory 122 and is typically provided by SRAM, BSRAM, or EDRAM. In the embodiment shown in FIG. 11D, the processor 121 communicates with various I/O devices 130 via a local system bus 150. Various buses may be used to connect the central processing unit 121 to any of the I/O devices 130, including a PCI bus, a PCI-X bus, a PCI-Express bus, or a NuBus. For embodiments in which the I/O device is a video display 124, the processor 121 may use an Advanced Graphics Port (AGP) to communicate with the display 124 or the I/O controller 123 for the display 124. FIG. 11D depicts an embodiment of a computer 100 in which the main processor 121 communicates directly with I/O device 130b or other processors 121’ via HYPERTRANSPORT, RAPIDIO, or INFINIBAND communications technology. FIG. 11D also depicts an embodiment in which local busses and direct communication are mixed: the processor 121 communicates with I/O device 130a using a local interconnect bus while communicating with I/O device 130b directly.

[00085] A wide variety of I/O devices 130a-130n may be present in the computing device 100. Input devices may include keyboards, mice, trackpads, trackballs, touchpads, touch mice, multi-touch touchpads and touch mice, microphones, multi-array microphones, drawing tablets, cameras, single-lens reflex cameras (SLR), digital SLR (DSLR), CMOS sensors, accelerometers, infrared optical sensors, pressure sensors, magnetometer sensors, angular rate sensors, depth sensors, proximity sensors, ambient light sensors, gyroscopic sensors, or other sensors. Output devices may include video displays, graphical displays, speakers, headphones, inkjet printers, laser printers, and 3D printers.

[00086] Devices 130a-130n may include a combination of multiple input or output devices, including, e.g., Microsoft KINECT, Nintendo Wiimote for the WII, Nintendo WII U GAMEPAD, or Apple IPHONE. Some devices 130a-130n allow gesture recognition inputs through combining some of the inputs and outputs. Some devices 130a-130n provide for facial recognition, which may be utilized as an input for different purposes including authentication and other commands. Some devices 130a-130n provide for voice recognition and inputs, including, e.g., Microsoft KINECT, SIRI for IPHONE by Apple, Google Now or Google Voice Search.

[00087] Additional devices 130a-130n have both input and output capabilities, including, e.g., haptic feedback devices, touchscreen displays, or multi-touch displays. Touchscreen, multi-touch displays, touchpads, touch mice, or other touch sensing devices may use different technologies to sense touch, including, e.g., capacitive, surface capacitive, projected capacitive touch (PCT), in-cell capacitive, resistive, infrared, waveguide, dispersive signal touch (DST), in-cell optical, surface acoustic wave (SAW), bending wave touch (BWT), or force-based sensing technologies. Some multi-touch devices may allow two or more contact points with the surface, allowing advanced functionality including, e.g., pinch, spread, rotate, scroll, or other gestures. Some touchscreen devices, including, e.g., Microsoft PIXELSENSE or Multi-Touch Collaboration Wall, may have larger surfaces, such as on a table-top or on a wall, and may also interact with other electronic devices.

Some I/O devices 130a-130n, display devices 124a-124n or groups of devices may be augmented reality devices. The I/O devices may be controlled by an I/O controller 123 as shown in FIG. 11C. The I/O controller may control one or more I/O devices, such as, e.g., a keyboard 126 and a pointing device 127, e.g., a mouse or optical pen. Furthermore, an I/O device may also provide storage and/or an installation medium 116 for the computing device 100. In still other embodiments, the computing device 100 may provide USB connections (not shown) to receive handheld USB storage devices. In further embodiments, an I/O device 130 may be a bridge between the system bus 150 and an external communication bus, e.g., a USB bus, a SCSI bus, a FireWire bus, an Ethernet bus, a Gigabit Ethernet bus, a Fibre Channel bus, or a Thunderbolt bus.

[00088] In some embodiments, display devices 124a-124n may be connected to I/O controller 123. Display devices may include, e.g., liquid crystal displays (LCD), thin film transistor LCD (TFT-LCD), blue phase LCD, electronic paper (e-ink) displays, flexible displays, light emitting diode displays (LED), digital light processing (DLP) displays, liquid crystal on silicon (LCOS) displays, organic light-emitting diode (OLED) displays, active-matrix organic light-emitting diode (AMOLED) displays, liquid crystal laser displays, time-multiplexed optical shutter (TMOS) displays, or 3D displays. Examples of 3D displays may use, e.g., stereoscopy, polarization filters, active shutters, or autostereoscopy. Display devices 124a-124n may also be a head-mounted display (HMD). In some embodiments, display devices 124a-124n or the corresponding I/O controllers 123 may be controlled through or have hardware support for OPENGL or DIRECTX API or other graphics libraries.

[00089] In some embodiments, the computing device 100 may include or connect to multiple display devices 124a-124n, which each may be of the same or different type and/or form. As such, any of the I/O devices 130a-130n and/or the I/O controller 123 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of multiple display devices 124a-124n by the computing device 100. For example, the computing device 100 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display devices 124a-124n. In one embodiment, a video adapter may include multiple connectors to interface to multiple display devices 124a-124n. In other embodiments, the computing device 100 may include multiple video adapters, with each video adapter connected to one or more of the display devices 124a-124n. In some embodiments, any portion of the operating system of the computing device 100 may be configured for using multiple displays 124a-124n. In other embodiments, one or more of the display devices 124a-124n may be provided by one or more other computing devices 100a or 100b connected to the computing device 100, via the network 104. In some embodiments software may be designed and constructed to use another computer’s display device as a second display device 124a for the computing device 100. For example, in one embodiment, an Apple iPad may connect to a computing device 100 and use the display of the device 100 as an additional display screen that may be used as an extended desktop. One ordinarily skilled in the art will recognize and appreciate the various ways and embodiments that a computing device 100 may be configured to have multiple display devices 124a-124n.

[00090] Referring again to FIG. 11C, the computing device 100 may comprise a storage device 128 (e.g., one or more hard disk drives or redundant arrays of independent disks) for storing an operating system or other related software, and for storing application software programs such as any program related to the software for the genomic data processing system 120. Examples of storage device 128 include, e.g., hard disk drive (HDD); optical drive including CD drive, DVD drive, or BLU-RAY drive; solid-state drive (SSD); USB flash drive; or any other device suitable for storing data. Some storage devices may include multiple volatile and non-volatile memories, including, e.g., solid state hybrid drives that combine hard disks with solid state cache. Some storage devices 128 may be non-volatile, mutable, or read-only. Some storage devices 128 may be internal and connect to the computing device 100 via a bus 150. Some storage devices 128 may be external and connect to the computing device 100 via an I/O device 130 that provides an external bus. Some storage devices 128 may connect to the computing device 100 via the network interface 118 over a network 104, including, e.g., the Remote Disk for MACBOOK AIR by Apple. Some client devices 100 may not require a non-volatile storage device 128 and may be thin clients or zero clients 102. Some storage devices 128 may also be used as an installation device 116, and may be suitable for installing software and programs. Additionally, the operating system and the software can be run from a bootable medium, for example, a bootable CD, e.g., KNOPPIX, a bootable CD for GNU/Linux that is available as a GNU/Linux distribution from knoppix.net.

[00091] Client device 100 may also install software or an application from an application distribution platform. Examples of application distribution platforms include the App Store for iOS provided by Apple, Inc., the Mac App Store provided by Apple, Inc., GOOGLE PLAY for Android OS provided by Google Inc., Chrome Webstore for CHROME OS provided by Google Inc., and Amazon Appstore for Android OS and KINDLE FIRE provided by Amazon.com, Inc. An application distribution platform may facilitate installation of software on a client device 102. An application distribution platform may include a repository of applications on a server 106 or a cloud 108, which the clients 102a-102n may access over a network 104. An application distribution platform may include applications developed and provided by various developers. A user of a client device 102 may select, purchase and/or download an application via the application distribution platform.

[00092] Furthermore, the computing device 100 may include a network interface 118 to interface to the network 104 through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, Gigabit Ethernet, InfiniBand), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET, ADSL, VDSL, BPON, GPON, fiber optical including FiOS), wireless connections, or some combination of any or all of the above. Connections can be established using a variety of communication protocols (e.g., TCP/IP, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), IEEE 802.11a/b/g/n/ac, CDMA, GSM, WiMax and direct asynchronous connections). In one embodiment, the computing device 100 communicates with other computing devices 100’ via any type and/or form of gateway or tunneling protocol, e.g. Secure Socket Layer (SSL) or Transport Layer Security (TLS), or the Citrix Gateway Protocol manufactured by Citrix Systems, Inc. of Ft. Lauderdale, Florida. The network interface 118 may comprise a built-in network adapter, network interface card, PCMCIA network card, EXPRESSCARD network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 100 to any type of network capable of communication and performing the operations described herein.

[00093] A computing device 100 of the sort depicted in FIGs. 11B and 11C may operate under the control of an operating system, which controls scheduling of tasks and access to system resources. The computing device 100 can be running any operating system, such as any of the versions of the MICROSOFT WINDOWS operating systems, the different releases of the Unix and Linux operating systems, any version of the MAC OS for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating system for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein. Typical operating systems include, but are not limited to: WINDOWS 2000, WINDOWS Server 2012, WINDOWS CE, WINDOWS Phone, WINDOWS XP, WINDOWS VISTA, WINDOWS 7, WINDOWS RT, and WINDOWS 8, all of which are manufactured by Microsoft Corporation of Redmond, Washington; MAC OS and iOS, manufactured by Apple, Inc. of Cupertino, California; Linux, a freely-available operating system, e.g. the Linux Mint distribution (“distro”) or Ubuntu, distributed by Canonical Ltd. of London, United Kingdom; Unix or other Unix-like derivative operating systems; and Android, designed by Google, of Mountain View, California, among others. Some operating systems, including, e.g., the CHROME OS by Google, may be used on zero clients or thin clients, including, e.g., CHROMEBOOKS.

[00094] The computer system 100 can be any workstation, telephone, desktop computer, laptop or notebook computer, netbook, ULTRABOOK, tablet, server, handheld computer, mobile telephone, smartphone or other portable telecommunications device, media playing device, gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication. The computer system 100 has sufficient processor power and memory capacity to perform the operations described herein. The computer system 100 can be of any suitable size, such as a standard desktop computer or a Raspberry Pi 4 manufactured by Raspberry Pi Foundation, of Cambridge, United Kingdom. In some embodiments, the computing device 100 may have different processors, operating systems, and input devices consistent with the device. For example, Samsung GALAXY smartphones operate under the control of the Android operating system developed by Google, Inc., and receive input via a touch interface.

[00095] In some embodiments, the computing device 100 is a gaming system. For example, the computer system 100 may comprise a PLAYSTATION 3, a PERSONAL PLAYSTATION PORTABLE (PSP), or a PLAYSTATION VITA device manufactured by the Sony Corporation of Tokyo, Japan; a NINTENDO DS, NINTENDO 3DS, NINTENDO WII, or NINTENDO WII U device manufactured by Nintendo Co., Ltd., of Kyoto, Japan; or an XBOX 360 device manufactured by the Microsoft Corporation of Redmond, Washington.

[00096] In some embodiments, the computing device 100 is a digital audio player such as the Apple IPOD, IPOD Touch, and IPOD NANO lines of devices, manufactured by Apple Computer of Cupertino, California. Some digital audio players may have other functionality, including, e.g., a gaming system or any functionality made available by an application from a digital application distribution platform. For example, the IPOD Touch may access the Apple App Store. In some embodiments, the computing device 100 is a portable media player or digital audio player supporting file formats including, but not limited to, the MP3, WAV, M4A/AAC, WMA Protected AAC, AIFF, Audible audiobook, and Apple Lossless audio file formats and the .mov, .m4v, and .mp4 MPEG-4 (H.264/MPEG-4 AVC) video file formats.

[00097] In some embodiments, the computing device 100 is a tablet, e.g. the IPAD line of devices by Apple; GALAXY TAB family of devices by Samsung; or KINDLE FIRE, by Amazon.com, Inc. of Seattle, Washington. In other embodiments, the computing device 100 is an eBook reader, e.g. the KINDLE family of devices by Amazon.com, or NOOK family of devices by Barnes & Noble, Inc. of New York City, New York.

[00098] In some embodiments, the communications device 102 includes a combination of devices, e.g. a smartphone combined with a digital audio player or portable media player. For example, one of these embodiments is a smartphone, e.g. the IPHONE family of smartphones manufactured by Apple, Inc.; a Samsung GALAXY family of smartphones manufactured by Samsung, Inc.; or a Motorola DROID family of smartphones. In yet another embodiment, the communications device 102 is a laptop or desktop computer equipped with a web browser and a microphone and speaker system, e.g. a telephony headset. In these embodiments, the communications devices 102 are web-enabled and can receive and initiate phone calls. In some embodiments, a laptop or desktop computer is also equipped with a webcam or other video capture device that enables video chat and video calls.

[00099] In some embodiments, the status of one or more machines 102, 106 in the network 104 is monitored, generally as part of network management. In one of these embodiments, the status of a machine may include an identification of load information (e.g., the number of processes on the machine, CPU and memory utilization), of port information (e.g., the number of available communication ports and the port addresses), or of session status (e.g., the duration and type of processes, and whether a process is active or idle). In another of these embodiments, this information may be identified by a plurality of metrics, and the plurality of metrics can be applied at least in part towards decisions in load distribution, network traffic management, and network failure recovery, as well as any aspects of operations of the present solution described herein. Aspects of the operating environments and components described above will become apparent in the context of the systems and methods disclosed herein.
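As a non-limiting, hypothetical illustration of how monitored load metrics might inform a load-distribution decision, the Python sketch below scores each machine by a weighted combination of CPU utilization, memory utilization, and process count, and routes the next task to the lowest-scoring machine. The `MachineStatus` structure, the weights, and the process-count cap are assumptions made for illustration only; no particular scoring policy is prescribed herein.

```python
from dataclasses import dataclass


@dataclass
class MachineStatus:
    """Hypothetical snapshot of the load information monitored for one machine."""
    machine_id: str
    cpu_utilization: float     # fraction of CPU in use, 0.0-1.0
    memory_utilization: float  # fraction of memory in use, 0.0-1.0
    process_count: int         # number of processes on the machine


def load_score(status: MachineStatus, max_processes: int = 500,
               weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Combine monitored metrics into one load score (lower means less loaded).

    The weights and the process-count cap are illustrative assumptions.
    """
    w_cpu, w_mem, w_proc = weights
    proc_fraction = min(status.process_count / max_processes, 1.0)
    return (w_cpu * status.cpu_utilization
            + w_mem * status.memory_utilization
            + w_proc * proc_fraction)


def pick_machine(statuses: list) -> str:
    """Return the id of the least-loaded machine for the next task."""
    if not statuses:
        raise ValueError("no machines available")
    return min(statuses, key=load_score).machine_id


# Hypothetical status reports from three servers.
fleet = [
    MachineStatus("server-106a", cpu_utilization=0.85, memory_utilization=0.60, process_count=320),
    MachineStatus("server-106b", cpu_utilization=0.20, memory_utilization=0.40, process_count=110),
    MachineStatus("server-106c", cpu_utilization=0.50, memory_utilization=0.90, process_count=200),
]
print(pick_machine(fleet))  # server-106b
```

A weighted sum is only one simple choice; any of the monitored metrics (including port information or session status) could be incorporated in other ways.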

[000100] Referring to FIG. 12, in various embodiments, a system 1200 may include a computing device 1210 (or multiple computing devices, co-located or remote to each other) and a sample processing system 1280. In various embodiments, computing device 1210 (or components thereof) may be integrated with the sample processing system 1280 (or components thereof). Components of computing device 1210 may be implemented by various combinations of computing hardware and software. In various embodiments, the sample processing system 1280 may include, may be, or may employ, for example, chromatography, mass spectrometry, in situ hybridization, PCR, next-generation sequencing, northern blotting, microarray, dot or slot blots, FISH, and/or electrophoresis, etc., on biological samples such as phlegm, saliva, blood (or components thereof), tissue, and/or cells. In certain embodiments, the sample processing system 1280 may be, or may include, devices or systems for performing liquid chromatography and mass spectrometry, and may be used for extracting metabolites in biological samples.

[000101] The computing device 1210 (or multiple computing devices) may be used to control, and receive signals acquired via, components of sample processing system 1280. The computing device 1210 may include one or more processors and one or more volatile and non-volatile memories for storing computing code and data that are captured, acquired, recorded, and/or generated. The computing device 1210 may include a control unit 1215 that is configured to exchange control signals with sample processing system 1280, allowing the computing device 1210 to be used to control, for example, processing of samples and/or delivery of data generated and/or acquired through processing of samples. A raw data analyzer 1220 may be used, for example, to perform analyses of data captured via sample processing system 1280, and may employ, for example, alignment, peak picking, and normalization procedures as discussed herein. For example, in some implementations, data may be generated as a multi-dimensional array or vector with values representing concentrations or levels of each of a plurality of metabolites or other signature components, and in many instances, such levels may have widely different scales (e.g. parts per thousand, parts per million, etc.). To prevent the machine learning system from overemphasizing metabolites present at high concentrations, values may be normalized to a predetermined range (e.g. 0-1, 0-100, or any other such range). The normalization may comprise linear rescaling, or may be a more complex function (e.g. based on an average concentration of the particular metabolite in a sample data set). In some implementations, dimension reduction may be performed to reduce large and sparse arrays or vectors. In some implementations, feature recognition (e.g., principal component analysis) may be performed to select a subset of features for further analysis. A machine learning modeler 1225 may be used to implement various machine learning functionality discussed herein. For example, a model training and testing engine 1230 may be used to apply various machine learning techniques (which may comprise, e.g., Light Gradient Boosting Machine (LightGBM) and/or random forest techniques) to one or more training datasets (e.g., datasets comprising processed data from raw data analyzer 1220 and/or various features such as ion features) to train and test machine learning models for various predictions or other classifications. A classification engine 1235 may employ a machine learning model (e.g., a classifier trained via model training and testing engine 1230) to analyze data on metabolites (based on, e.g., tests on samples from subjects) to make various predictions or other classifications (e.g., regarding the presence of various medical conditions).
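By way of a non-limiting illustration, the linear rescaling described above may be sketched in Python as a per-metabolite (column-wise) min-max normalization into a predetermined range. The sketch assumes the processed data are held as a samples-by-metabolites list of lists; the function name and signature are hypothetical, and the more complex normalization functions mentioned above (e.g., based on an average concentration) are not shown.

```python
def min_max_normalize(matrix, lo=0.0, hi=1.0):
    """Linearly rescale each metabolite (column) of a samples-by-metabolites
    matrix into the range [lo, hi].

    A metabolite whose level is constant across all samples carries no
    discriminating information, so it is mapped to `lo`.
    """
    if not matrix:
        return []
    n_features = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_features)]
    col_max = [max(row[j] for row in matrix) for j in range(n_features)]
    scaled = []
    for row in matrix:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            if span == 0:
                new_row.append(lo)
            else:
                new_row.append(lo + (value - col_min[j]) * (hi - lo) / span)
        scaled.append(new_row)
    return scaled


# Three hypothetical samples, two metabolites measured on very different scales.
levels = [
    [2.0, 150.0],
    [4.0, 300.0],
    [6.0, 450.0],
]
print(min_max_normalize(levels))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

In practice, the per-metabolite minimum and maximum would typically be computed on the training dataset only and then reused to rescale test or patient samples, so that no information from those samples leaks into the fitted scaling.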

[000102] A feature analyzer 1240 may be used to evaluate features by, for example, quantifying the impact of each feature on the developed model. Feature analyzer 1240 may, for example, uncover clinically important ion features that were globally predictive of the outcome, and may determine, for example, Shapley values for all features, or for the top features (e.g., the top 2, top 5, top 10, top 15, top 20, top 25, top 30, etc.), on individual predictions, and provide the percent contribution of each feature. Shapley analysis may improve analysis accuracy and/or efficiency in many implementations. Classification systems not utilizing the systems and methods discussed herein may identify only a top set of contributing features and the magnitude of change for each ion feature, which may be limiting in some instances. By utilizing Shapley analysis, implementations of the present system provide both a feature importance classification and the direction of change in the magnitude of each feature (e.g. higher or lower in infected samples). This may significantly improve accuracy in many instances. A biomarker signature module 1245 may generate a biomarker signature based on selected features as disclosed herein. Features may be selected based on a threshold, such as a percent contribution to predicting a medical condition of 0.5%, 1%, 2%, 5%, 10%, etc.
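The percent-contribution threshold and the direction of change described above may be sketched as follows. This is a minimal, hypothetical Python illustration: it assumes a matrix of per-sample Shapley values has already been computed for a trained model (for example, with a tree-based explainer), ranks features by mean absolute Shapley value, converts these to percent contributions, and infers direction from the sign of the covariance between each feature's values and its Shapley values. The covariance-based direction rule and the feature names (`ion_A`, `ion_B`, `ion_C`) are assumptions for illustration, not the specific procedure of the disclosed embodiments.

```python
def shap_signature(shap_values, feature_values, feature_names, threshold_pct=1.0):
    """Select features whose share of total mean |SHAP| meets a percent threshold.

    shap_values[i][j]    : Shapley value of feature j for sample i (computed elsewhere)
    feature_values[i][j] : processed value of feature j for sample i
    Returns (name, percent_contribution, direction) tuples, largest contribution first.
    direction is "higher" when larger feature values push predictions toward the
    medical condition and "lower" when they push predictions away from it.
    """
    n_samples = len(shap_values)
    mean_abs = [sum(abs(shap_values[i][j]) for i in range(n_samples)) / n_samples
                for j in range(len(feature_names))]
    total = sum(mean_abs)
    if total == 0:
        return []
    selected = []
    for j, name in enumerate(feature_names):
        pct = 100.0 * mean_abs[j] / total
        if pct < threshold_pct:
            continue
        xs = [row[j] for row in feature_values]
        ys = [row[j] for row in shap_values]
        x_mean, y_mean = sum(xs) / n_samples, sum(ys) / n_samples
        # Sign of the covariance between a feature's value and its Shapley value.
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        direction = "higher" if cov > 0 else ("lower" if cov < 0 else "none")
        selected.append((name, pct, direction))
    selected.sort(key=lambda item: item[1], reverse=True)
    return selected


# Hypothetical Shapley values and normalized feature values for four samples.
shap_vals = [[0.40, 0.05, 0.01], [-0.30, -0.05, -0.01], [0.50, 0.02, 0.00], [-0.40, -0.04, 0.00]]
feat_vals = [[0.90, 0.10, 0.50], [0.10, 0.80, 0.40], [1.00, 0.20, 0.60], [0.00, 0.90, 0.50]]
signature = shap_signature(shap_vals, feat_vals, ["ion_A", "ion_B", "ion_C"], threshold_pct=5.0)
print([(name, direction) for name, _, direction in signature])
# [('ion_A', 'higher'), ('ion_B', 'lower')]
```

Here `ion_C` contributes only about 1% of the total and falls below the 5% threshold, so it is excluded from the signature.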

[000103] A transceiver 1250 allows the computing device 1210 to exchange readings, control commands, and/or other data with sample processing system 1280 (or components thereof). The transceiver 1250 may additionally or alternatively include a network interface permitting the computing device 1210 to communicate with other remote devices and systems via, for example, a telecommunications network such as the internet. One or more user interfaces 1255 allow the computing device 1210 to receive user inputs (e.g., via a keyboard, touchscreen, microphone, camera, etc.) and provide outputs (e.g., via display screen, audio speakers, etc.). The computing device 1210 may additionally include one or more databases 1260 (stored in, e.g., one or more computer-readable non-volatile memory devices) for storing, for example, data and analyses obtained from or via raw data analyzer 1220, machine learning modeler 1225 (e.g., model training and testing engine 1230 and/or classification engine 1235), feature analyzer 1240, biomarker signature module 1245, and/or sample processing system 1280. In some implementations, database 1260 (or portions thereof) may alternatively or additionally be part of another computing device that is co-located or remote and in communication with computing device 1210 and/or sample processing system 1280 (or components thereof).

[000104] A flowchart for an example process 1300 according to various potential embodiments is shown in FIG. 13. At 1305, biological samples from subjects in a cohort may be analyzed or otherwise processed. The cohort may include subjects with a medical condition as well as control subjects without the medical condition. Processing the biological samples may include suitable tests for extracting, for example, various metabolites in the samples. In certain embodiments, the control unit 1215 may, for example, instruct sample processing system 1280 to process samples and provide test results to computing device 1210. At 1310, raw test results may be received and processed (e.g., by or via raw data analyzer 1220). The raw test results may be analyzed by, for example, alignment, peak picking, and normalization procedures as discussed herein.

[000105] At 1315, a training dataset may be generated from the processed test results and a machine learning model may be developed as disclosed herein (e.g., by or via machine learning modeler 1225). At 1320, feature analysis may be performed to identify features having sufficient predictive value (e.g., contributing at least a certain threshold percent), and a metabolite biomarker signature may be generated based on identified features. Feature analysis may be performed by or via, for example, feature analyzer 1240, and the metabolite biomarker signature may be generated by or via, for example, biomarker signature module 1245. The metabolite biomarker signature may be stored in database 1260 in association with the medical condition, for subsequent application to patient samples for recognizing the medical condition based on a metabolite profile of the samples. Alternatively or additionally, the metabolite biomarker signature may be incorporated into a report, presented graphically or otherwise via user interfaces 1255, and/or transmitted to another device through a network via transceiver 1250.
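As a hypothetical sketch of storing a metabolite biomarker signature in a database in association with its medical condition, the example below uses Python's built-in `sqlite3` module. The table name, column layout, and example signature values are illustrative assumptions; any suitable database or computer-readable storage medium could be used.

```python
import sqlite3


def store_signature(conn, condition, signature):
    """Store (feature, percent_contribution, direction) rows keyed by medical condition."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS biomarker_signature ("
        " medical_condition TEXT NOT NULL,"
        " feature TEXT NOT NULL,"
        " percent_contribution REAL NOT NULL,"
        " direction TEXT NOT NULL,"
        " PRIMARY KEY (medical_condition, feature))"
    )
    # INSERT OR REPLACE lets a re-derived signature overwrite earlier rows.
    conn.executemany(
        "INSERT OR REPLACE INTO biomarker_signature VALUES (?, ?, ?, ?)",
        [(condition, name, pct, direction) for name, pct, direction in signature],
    )
    conn.commit()


def load_signature(conn, condition):
    """Return the stored signature for a condition, largest contribution first."""
    cursor = conn.execute(
        "SELECT feature, percent_contribution, direction FROM biomarker_signature"
        " WHERE medical_condition = ? ORDER BY percent_contribution DESC",
        (condition,),
    )
    return cursor.fetchall()


# Hypothetical signature values, for illustration only.
conn = sqlite3.connect(":memory:")
store_signature(conn, "COVID-19", [("ion_B", 37.5, "lower"), ("ion_A", 62.5, "higher")])
print(load_signature(conn, "COVID-19"))
# [('ion_A', 62.5, 'higher'), ('ion_B', 37.5, 'lower')]
```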

[000106] At 1325, one or more biological samples from a patient (who may or may not have the medical condition) may be processed by running one or more tests. For example, the control unit 1215 may instruct sample processing system 1280 to process the patient sample(s) and provide test results to computing device 1210. The metabolite biomarker signature from step 1320 may be applied (by, e.g., biomarker signature module 1245) to the data obtained from tests on the patient’s sample(s) to determine whether the patient has the medical condition that is associated with the metabolite biomarker signature.
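Applying a stored signature to a patient sample may be sketched as restricting the patient's processed feature values to the signature features and passing them to a trained classifier. In the hypothetical Python sketch below, `scorer` stands in for such a classifier, and `toy_logistic_scorer` uses made-up weights purely so the example runs; neither the weights nor the 0.5 decision threshold come from the disclosure.

```python
import math
from typing import Callable, Mapping, Sequence


def apply_signature(patient_features: Mapping[str, float],
                    signature: Sequence[str],
                    scorer: Callable[[list], float],
                    threshold: float = 0.5) -> tuple:
    """Score one patient sample using only the signature features.

    patient_features: processed (e.g. normalized) feature values for the sample
    signature:        ordered names of the features in the biomarker signature
    scorer:           hypothetical stand-in for a trained classifier that maps the
                      ordered feature vector to a probability of the condition
    Returns (has_condition, probability).
    """
    missing = [name for name in signature if name not in patient_features]
    if missing:
        raise KeyError(f"patient sample lacks signature features: {missing}")
    vector = [patient_features[name] for name in signature]
    probability = scorer(vector)
    return probability >= threshold, probability


def toy_logistic_scorer(vector: list) -> float:
    """Illustrative logistic model with made-up weights (not from the specification)."""
    weights, bias = [4.0, -3.0], -0.5
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


patient = {"ion_A": 0.9, "ion_B": 0.2, "ion_C": 0.7}  # hypothetical processed values
has_condition, probability = apply_signature(patient, ["ion_A", "ion_B"], toy_logistic_scorer)
print(has_condition)  # True
```

Features outside the signature (here `ion_C`) are simply ignored, while a missing signature feature raises an error rather than being silently imputed.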

Metabolite Biomarker Signature

[000107] As would be understood by one of skill in the art, any embodiment or aspect as disclosed herein may be used by itself or combined with any other embodiment or aspect unless otherwise specified.

[000108] In one aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) generating a training dataset based on one or more metabolites isolated from a plurality of biological samples isolated from subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (b) producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.

[000109] In another aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) running one or more types of tests on biological samples from subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of a metabolite profile that changed in the subjects as a result of the medical condition; (b) generating a training dataset comprising, or alternatively consisting essentially of, or yet further consisting of a set of features identified through the one or more tests run on the biological samples; (c) producing a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features having contributions to model predictions exceeding a threshold; and (iii) generating the metabolite biomarker signature based on the subset of features; and (d) applying the metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.

[000110] In yet another aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to a training dataset based on one or more metabolites in a plurality of biological samples from a plurality of subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.

[000111] In some embodiments, the method further comprises generating the training dataset based on the one or more metabolites. In further embodiments, the training dataset comprises, or alternatively consists essentially of, or yet further consists of a feature of the one or more metabolites in a biological sample isolated from a subject. In yet further embodiments, the training dataset further comprises an indication of whether or not the subject has the medical condition.

[000112] In some embodiments, the method further comprises isolating the one or more metabolites from the plurality of biological samples.

[000113] In some embodiments, a subset of feature(s) lacks at least one feature compared to the reference set of features. In some embodiments, a subset of feature(s) comprises, or alternatively consists essentially of, or yet further consists of the top 200, or the top 150, or the top 100, or the top 90, or the top 80, or the top 70, or the top 60, or the top 59, or the top 58, or the top 57, or the top 56, or the top 55, or the top 54, or the top 53, or the top 52, or the top 51, or the top 50, or the top 49, or the top 48, or the top 47, or the top 46, or the top 45, or the top 44, or the top 43, or the top 42, or the top 41, or the top 40, or the top 39, or the top 38, or the top 37, or the top 36, or the top 35, or the top 34, or the top 33, or the top 32, or the top 31, or the top 30, or the top 29, or the top 28, or the top 27, or the top 26, or the top 25, or the top 24, or the top 23, or the top 22, or the top 21, or the top 20, or the top 19, or the top 18, or the top 17, or the top 16, or the top 15, or the top 14, or the top 13, or the top 12, or the top 11, or the top 10, or the top 9, or the top 8, or the top 7, or the top 6, or the top 5, or the top 4, or the top 3, or the top 2, or the top 1 feature(s) ranked based on the feature importance. In one embodiment, a subset of feature(s) comprises, or alternatively consists essentially of, or yet further consists of the top 20 features ranked based on the feature importance.
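Selecting the top-ranked features, such as the top 20, may be sketched in Python as follows. The sketch assumes feature importance scores (e.g., mean absolute Shapley values) are already available as a mapping from feature name to score; the function name and the alphabetical tie-breaking rule are hypothetical choices.

```python
def top_k_features(importances, k=20):
    """Return the names of the k features with the highest importance scores.

    importances: mapping of feature name to a non-negative importance score.
    Ties are broken alphabetically so the selection is deterministic.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores for four ion features.
importance_scores = {"ion_A": 0.40, "ion_B": 0.04, "ion_C": 0.005, "ion_D": 0.04}
print(top_k_features(importance_scores, k=2))  # ['ion_A', 'ion_B']
```

Whenever k is smaller than the number of scored features, the returned subset lacks at least one feature of the reference set, consistent with the embodiments above.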

[000114] In some embodiments, selecting the subset of features comprises, or alternatively consists essentially of, or yet further consists of performing feature importance analysis. In further embodiments, selecting the subset of features comprises applying a Shapley Additive Explanation (SHAP) procedure. In some embodiments, the primary measure of model performance or feature performance is the area under the receiver operating characteristic curve (AUC), which quantifies the diagnostic discriminative performance of the models. Performance measures for the models or the feature subsets may also include sensitivity, specificity, or accuracy at a high-sensitivity operating point used to binarize the model predictions. In some embodiments, the SHAP procedure provides a ranked list of top features, the magnitude of the change for each ion feature, the direction of the change (higher or lower in infected samples), or any combination thereof. Typically, a traditional classification approach provides a feature importance ranking but not, at the same time, the direction of change.
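The AUC and the operating-point measures described above may be computed as in the following minimal Python sketch. AUC is computed with the Mann-Whitney formulation, i.e., the probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counted as one half. The "high-sensitivity operating point" is assumed here to mean the largest score threshold whose sensitivity meets a chosen target (0.9 by default); the target value, the example scores, and the function names are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney statistic (labels: 1 = condition, 0 = control)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def high_sensitivity_operating_point(labels, scores, target_sensitivity=0.9):
    """Return (threshold, sensitivity, specificity, accuracy) for the largest
    threshold whose sensitivity meets the target, predicting 1 when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Lowering the threshold can only raise sensitivity, so scan from the top down.
    for threshold in sorted(set(scores), reverse=True):
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
        if tp / n_pos >= target_sensitivity:
            return threshold, tp / n_pos, tn / n_neg, (tp + tn) / len(labels)
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical model outputs for four infected (1) and four control (0) samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]
print(roc_auc(labels, scores))                           # 0.9375
print(high_sensitivity_operating_point(labels, scores))  # (0.4, 1.0, 0.75, 0.875)
```

Choosing the largest qualifying threshold keeps specificity as high as possible while still meeting the sensitivity target.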

[000115] In some embodiments, the subject has been treated with a therapy neutralizing a pathogen causing the medical condition. Additionally or alternatively, the subject is immune-compromised. In some embodiments, the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child.

[000116] In some embodiments, immune-compromisation or any grammatical variation thereof refers to a state in which the immune system’s ability to fight infectious disease or cancer is compromised or entirely absent. A subject or patient who is immune-compromised may mount no immune response, or a weaker immune response than a subject having a healthy immune system, against a pathogen upon infection by that pathogen. Similarly, a subject who is immune-compromised may mount no immune response, or a weaker immune response than a subject having a healthy immune system, against cancer cells upon developing such a cancer. Thus, a commonly used diagnostic method that relies on detecting the immune response may fail to identify a subject having the medical condition (such as an infection or cancer). In contrast, a method as disclosed herein relies on a metabolite feature rather than on the immune response, and can therefore identify the subject or patient having the medical condition whether or not the subject or patient is immune-compromised.

[000117] In some embodiments, a subject or patient who has received a therapy neutralizing the pathogen causing the medical condition may no longer have the pathogen present, or may not have a higher level (such as an absolute amount, a raw concentration, or a normalized concentration) of the pathogen than a subject free of the pathogen, in a biological sample isolated from the subject or patient. Thus, a commonly used diagnostic method that relies on detecting the pathogen may fail to identify a subject having the medical condition caused by the pathogen. In contrast, a method as disclosed herein relies on a metabolite feature rather than on detection of the pathogen, and can therefore identify the subject or patient having the medical condition whether or not the subject or patient has been treated with a therapy neutralizing the pathogen.

[000118] In some embodiments, the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of an extracellular concentration of the metabolite. In some embodiments, a metabolite is an intracellular metabolite.

[000119] In some embodiments, the method further comprises extracting and analyzing one or more biological samples of a patient using the metabolite biomarker signature. In some embodiments, analyzing a biological sample refers to obtaining the metabolite feature(s) of the biological sample. In further embodiments, the metabolite feature(s) are determined by a test as disclosed herein, such as LC-MS/MS or LC-Q-TOF-MS.

[000120] In some embodiments, the machine learning models comprise, or alternatively consist essentially of, or yet further consist of boosted or bagged decision trees. In further embodiments, the boosted or bagged decision trees are selected from the group consisting of Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost). In some embodiments, a linear model, optionally selected from least absolute shrinkage and selection operator (LASSO) or ridge regression, may be used in place of a machine learning model.
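In practice, the named ensembles would typically be trained through their own libraries (for example, `lightgbm.LGBMClassifier` or scikit-learn's `RandomForestClassifier`). To illustrate only the bagging idea without external dependencies, the hypothetical Python sketch below trains depth-one decision trees (stumps) on bootstrap resamples and reports the fraction of stumps voting for the medical condition. It is not an implementation of LightGBM, XGBoost, random forest, or AdaBoost: it omits deeper trees, boosting, and per-split feature subsampling, and the training data are made up.

```python
import random


def _stump_predict(stump, row):
    """Predict 1 (condition present) or 0 with a depth-one tree."""
    feature, threshold, polarity = stump
    above = row[feature] >= threshold
    return int(above) if polarity == 1 else int(not above)


def fit_stump(X, y):
    """Fit a depth-one decision tree by exhaustive search over
    (feature, threshold, polarity), minimizing misclassifications."""
    best_errors, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(_stump_predict(stump, row) != label
                             for row, label in zip(X, y))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def vote_fraction(ensemble, row):
    """Fraction of stumps voting that the sample has the medical condition."""
    return sum(_stump_predict(s, row) for s in ensemble) / len(ensemble)


# Hypothetical normalized features (two ions) for six training samples.
X_train = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.1], [0.2, 0.6], [0.1, 0.4], [0.3, 0.9]]
y_train = [1, 1, 1, 0, 0, 0]
model = fit_bagged_stumps(X_train, y_train)
print(vote_fraction(model, [0.95, 0.5]), vote_fraction(model, [0.05, 0.5]))
```

Averaging votes over bootstrap resamples is what distinguishes bagging from fitting a single tree; the vote fraction can then be binarized at an operating point as described for the model predictions above.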

[000121] In some embodiments, the medical condition is selected from the group consisting of: an infection caused by a pathogen selected from the group consisting of a bacterium, a virus, a fungus, or a parasite; a cancer; or a chronic disease. In some embodiments, the medical condition is selected from the group consisting of tuberculosis, a human papillomavirus (HPV) infection, or malaria. In some embodiments, the medical condition is an infection by a virus selected from adenovirus, coronavirus, influenza A H1N1, influenza A H3N2, influenza B, human metapneumovirus, parainfluenza 1, parainfluenza 2, parainfluenza 3, parainfluenza 4, respiratory syncytial virus (RSV), or rhinovirus.

[000122] In some embodiments, the medical condition comprises, or alternatively consists essentially of, or yet further consists of an infection by a respiratory virus. Accordingly, the respiratory virus is the pathogen causing the medical condition. In further embodiments, the respiratory virus is selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus. In further embodiments, the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D). In some embodiments, the subtype of influenza A is selected from H1 optionally H1N1; H3 optionally H3N2; H5 optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7 optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9 optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9. In some embodiments, the lineage of influenza B is selected from Victoria or Yamagata.

[000123] In some embodiments, the medical condition comprises, or alternatively consists essentially of, or yet further consists of an infection by a coronavirus. Accordingly, the coronavirus is referred to herein as the pathogen causing the medical condition. In some embodiments, the coronavirus is a common cold coronavirus, optionally selected from human coronavirus (HCoV) HCoV-OC43, HCoV-HKU1, HCoV-229E, or HCoV-NL63. In other embodiments, the coronavirus is selected from the group of severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV) or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In some embodiments, the medical condition comprises, or alternatively consists essentially of, or yet further consists of severe acute respiratory syndrome (SARS) caused by severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome (MERS) caused by Middle East respiratory syndrome coronavirus (MERS-CoV); or Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

[000124] In some embodiments, the medical condition comprises, or alternatively consists essentially of, or yet further consists of a cancer. In further embodiments, the cancer is selected from a cancer of: circulatory system, for example, heart (sarcoma (angiosarcoma, fibrosarcoma, rhabdomyosarcoma, liposarcoma), myxoma, rhabdomyoma, fibroma, lipoma or malignant teratoma), mediastinum, pleura, or other intrathoracic organs, vascular tumors, tumor-associated vascular tissue; respiratory tract, for example, nasal cavity, middle ear, accessory sinuses, larynx, trachea, bronchus or lung such as small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC), bronchogenic carcinoma (squamous cell, undifferentiated small cell, undifferentiated large cell, adenocarcinoma), alveolar (bronchiolar) carcinoma, bronchial adenoma, sarcoma, lymphoma, chondromatous hamartoma, mesothelioma; gastrointestinal system, for example, esophagus (squamous cell carcinoma, adenocarcinoma, leiomyosarcoma, lymphoma), stomach (carcinoma, lymphoma, leiomyosarcoma), gastric, pancreas (ductal adenocarcinoma, insulinoma, glucagonoma, gastrinoma, carcinoid tumors, vipoma), small bowel (adenocarcinoma, lymphoma, carcinoid tumors, Kaposi's sarcoma, leiomyoma, hemangioma, lipoma, neurofibroma, fibroma), large bowel (adenocarcinoma, tubular adenoma, villous adenoma, hamartoma, leiomyoma); genitourinary tract, for example, kidney (adenocarcinoma, Wilms' tumor (nephroblastoma), lymphoma, leukemia), bladder or urethra (squamous cell carcinoma, transitional cell carcinoma, adenocarcinoma), prostate (adenocarcinoma, sarcoma), testis (seminoma, malignant teratoma, embryonal carcinoma, teratocarcinoma, choriocarcinoma, sarcoma, interstitial cell carcinoma, fibroma, fibroadenoma, adenomatoid tumors, lipoma); liver, for example, hepatoma (hepatocellular carcinoma), cholangiocarcinoma, hepatoblastoma, angiosarcoma, hepatocellular adenoma, hemangioma, pancreatic endocrine tumors (such as pheochromocytoma, insulinoma, vasoactive intestinal peptide tumor, islet cell tumor or glucagonoma); bone, for example, osteogenic sarcoma (osteosarcoma), fibrosarcoma, malignant fibrous histiocytoma, chondrosarcoma, Ewing's sarcoma, malignant lymphoma (reticulum cell sarcoma), multiple myeloma, malignant giant cell tumor, chordoma, osteochondroma (osteocartilaginous exostoses), benign chondroma, chondroblastoma, chondromyxofibroma, osteoid osteoma or giant cell tumors; nervous system, for example, neoplasms of the central nervous system (CNS), primary CNS lymphoma, skull cancer (osteoma, hemangioma, granuloma, xanthoma, osteitis deformans), meninges (meningioma, meningiosarcoma, gliomatosis), brain cancer (astrocytoma, medulloblastoma, glioma, ependymoma, germinoma (pinealoma), glioblastoma multiforme, oligodendroglioma, schwannoma, retinoblastoma, congenital tumors), spinal cord (neurofibroma, meningioma, glioma, sarcoma); reproductive system, for example, gynecological, uterus (endometrial carcinoma), cervix (cervical carcinoma, pre-tumor cervical dysplasia), ovaries (ovarian carcinoma (serous cystadenocarcinoma, mucinous cystadenocarcinoma, unclassified carcinoma), granulosa-thecal cell tumors, Sertoli-Leydig cell tumors, dysgerminoma, teratoma), vulva (squamous cell carcinoma, intraepithelial carcinoma, adenocarcinoma, fibrosarcoma, melanoma), vagina (clear cell carcinoma, squamous cell carcinoma, botryoid sarcoma (embryonal rhabdomyosarcoma)), fallopian tubes (carcinoma), or other sites associated with female genital organs; placenta, penis, prostate, testis, or other sites associated with male genital organs; hematologic system, for example, blood (myeloid leukemia (acute or chronic), acute lymphoblastic leukemia, chronic lymphocytic leukemia, myeloproliferative diseases, multiple myeloma, myelodysplastic syndrome), Hodgkin's disease, non-Hodgkin's lymphoma (malignant lymphoma); oral cavity, for example, lip, tongue, gum, floor of mouth, palate, or other parts of mouth, parotid gland, or other parts of the salivary glands, tonsil, oropharynx, nasopharynx, pyriform sinus, hypopharynx, or other sites in the lip, oral cavity or pharynx; skin, for example, malignant melanoma, cutaneous melanoma, basal cell carcinoma, squamous cell carcinoma, Kaposi's sarcoma, moles, dysplastic nevi, lipoma, angioma, dermatofibroma, or keloids; adrenal glands, for example, neuroblastoma; or other tissues comprising connective or soft or both connective and soft tissue, retroperitoneum or peritoneum, eye, intraocular melanoma, or adnexa, breast, head or neck, anal region, thyroid, parathyroid, adrenal gland or other endocrine glands or related structures, secondary or unspecified malignant neoplasm of lymph nodes, secondary malignant neoplasm of respiratory or digestive systems or secondary malignant neoplasm of other sites, or any combination thereof. In some embodiments, the cancer is a solid tumor. In other embodiments, the cancer is a liquid cancer. Additionally or alternatively, the cancer is a primary cancer or a metastasis. In some embodiments, the cancer comprises a carcinoma, a sarcoma, a myeloma, a leukemia, or a lymphoma. Accordingly, the cancer or the cancer cell may be referred to herein as the pathogen causing the medical condition.

[000125] In some embodiments, a chronic disease refers to a medical condition that lasts 1 year or more and requires ongoing medical attention, limits activities of daily living, or both. In further embodiments, a chronic disease is heart disease, stroke, or other cardiovascular disease. In some embodiments, a chronic disease is high blood pressure. In some embodiments, a chronic disease is high cholesterol. In some embodiments, a chronic disease is a cancer. In some embodiments, a chronic disease is diabetes.

[000126] In some embodiments, the one or more tests comprise, or alternatively consist essentially of, or yet further consist of liquid chromatography (LC), mass spectrometry (MS), or both. In some embodiments, the one or more tests comprise, or alternatively consist essentially of, or yet further consist of liquid chromatography tandem mass spectrometry (LC/MS-MS). Additionally or alternatively, the one or more tests comprise, or alternatively consist essentially of, or yet further consist of liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS).

[000127] In some embodiments, the metabolite features are ion features. In some embodiments, a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of the presence or absence of the metabolite. In further embodiments, a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of an absolute amount of the metabolite in the biological sample. In yet further embodiments, a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of a concentration of the metabolite in the biological sample. In some embodiments, a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of a compound abundance of the metabolite in the biological sample. In some embodiments, a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of an absolute amount, a concentration, or a compound abundance of the metabolite in the biological sample normalized, for example, to an internal standard or to the mean compound abundance. Additionally, in some embodiments, the concentration, amount, or compound abundance of the metabolite may be extracellular or intracellular. Additionally or alternatively, the level (such as the absolute amount, the compound abundance, or the concentration) is normalized by subtracting a level of a negative control. In further embodiments, the negative control is the level in a subject free of the medical condition. In other embodiments, the negative control is a solution immersing or diluting the biological sample prior to performing the test. In some embodiments, the internal standard is the labeled standard D5-pyroglutamic acid.
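The two normalization options named in paragraph [000127] (dividing by an internal standard, or by a mean compound abundance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' processing pipeline: each sample is assumed to be a dictionary keyed by hypothetical (m/z, RT) tuples, the abundances are invented, and "normalized to the mean compound abundance" is read here as dividing by the mean abundance of all features in the same sample, which is only one possible interpretation.

```python
from statistics import mean


def normalize_to_internal_standard(abundances, internal_standard_abundance):
    """Divide every feature by the internal-standard peak (for example the
    D5-pyroglutamic acid spike) measured in the same sample."""
    if internal_standard_abundance <= 0:
        raise ValueError("internal standard abundance must be positive")
    return {feature: value / internal_standard_abundance
            for feature, value in abundances.items()}


def normalize_to_mean(abundances):
    """Divide every feature by the mean abundance of all features in the
    sample (an assumed reading of 'mean compound abundance')."""
    average = mean(abundances.values())
    if average == 0:
        raise ValueError("mean abundance is zero")
    return {feature: value / average for feature, value in abundances.items()}


# Hypothetical sample keyed by (m/z, RT); the abundances are made up.
sample = {(84.0447, 0.81): 200.0, (130.0507, 0.81): 600.0}
print(normalize_to_internal_standard(sample, 100.0))
# → {(84.0447, 0.81): 2.0, (130.0507, 0.81): 6.0}
print(normalize_to_mean(sample))
# → {(84.0447, 0.81): 0.5, (130.0507, 0.81): 1.5}
```

Dividing by an internal standard corrects for injection-to-injection variation in instrument response; dividing by a sample-wide mean instead corrects for differences in total material collected on a swab.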

[000128] In some embodiments, the biological sample is a nasopharyngeal sample, a blood sample, a serum sample, a plasma sample, or a urine sample. In some embodiments, the biological sample is a nasopharyngeal swab or viral transport medium (VTM) immersing a nasopharyngeal swab. In some embodiments, a metabolite feature is normalized by subtracting the level measured in the viral transport medium.
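Subtracting the viral transport medium level, as described in paragraph [000128], amounts to a per-feature blank correction. The sketch below assumes a blank measured from VTM without a swab, features keyed by hypothetical (m/z, RT) tuples, and invented abundances; clipping negative differences to zero is a choice of this sketch and is not stated in the text.

```python
def subtract_blank(sample, blank, clip_at_zero=True):
    """Subtract the level of each feature measured in a blank (for example
    viral transport medium with no swab) from the same feature in the sample.

    Features not present in the blank are left unchanged. Clipping negative
    results to zero is an assumption of this sketch."""
    corrected = {}
    for feature, value in sample.items():
        difference = value - blank.get(feature, 0.0)
        corrected[feature] = max(difference, 0.0) if clip_at_zero else difference
    return corrected


# Hypothetical abundances keyed by (m/z, RT).
swab_in_vtm = {(84.0447, 0.81): 500.0, (422.1307, 4.73): 50.0, (249.1085, 10.87): 75.0}
vtm_only = {(84.0447, 0.81): 120.0, (422.1307, 4.73): 80.0}
print(subtract_blank(swab_in_vtm, vtm_only))
# → {(84.0447, 0.81): 380.0, (422.1307, 4.73): 0.0, (249.1085, 10.87): 75.0}
```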

[000129] In further embodiments, a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of a feature of more than one metabolite. Additionally or alternatively, a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of more than one feature of a metabolite. In some embodiments, a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of a set or a subset of features as disclosed herein.
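A metabolite profile as defined in paragraph [000129] can be represented as an ordered vector that draws one or more features from one or more metabolites, which is also the form most machine learning models expect as input. The sketch below is illustrative only: the dictionary layout, the feature names "present" and "abundance", and the numeric values are assumptions, and treating an unmeasured feature as 0.0 (absent) is a choice of this sketch.

```python
def build_profile(measurements, metabolites, features=("present", "abundance")):
    """Flatten per-metabolite measurements into one ordered feature vector.

    measurements maps a metabolite identifier to {feature name: value}.
    A metabolite or feature that was not measured contributes 0.0, i.e. it
    is treated as absent (an assumption of this sketch)."""
    vector = []
    for metabolite in metabolites:
        values = measurements.get(metabolite, {})
        for feature in features:
            vector.append(float(values.get(feature, 0.0)))
    return vector


measurements = {
    "pyroglutamic acid": {"present": 1, "abundance": 3.5},
    "formylmethyl glutathione": {"present": 1, "abundance": 0.8},
}
# Profile over two metabolites, two features each.
print(build_profile(measurements, ["pyroglutamic acid", "formylmethyl glutathione"]))
# → [1.0, 3.5, 1.0, 0.8]
# A subset profile: a single feature of a single metabolite.
print(build_profile(measurements, ["pyroglutamic acid"], features=("abundance",)))
# → [3.5]
```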

[000130] In some embodiments, the presence or absence of the metabolite(s) is determined. In some embodiments, an absolute amount of the metabolite(s) in the biological sample is determined. In further embodiments, a level (such as an absolute amount, a compound abundance, or a concentration) of the extracellular metabolite(s) in the biological sample is determined. In yet further embodiments, the level is an extracellular level. In some embodiments, a metabolite is an intracellular metabolite. Additionally or alternatively, the level is normalized, for example, to an internal standard or to the mean compound abundance. In some embodiments, the internal standard is the labeled standard D5-pyroglutamic acid.

[000131] In a further aspect, provided is a method comprising, or alternatively consisting essentially of, or yet further consisting of applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition. In some embodiments, the metabolite biomarker signature is produced by a method as disclosed herein. In some embodiments, the method further comprises performing the one or more tests, for example, liquid chromatography tandem mass spectrometry (LC/MS-MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS), on the biological sample from the patient. In some embodiments, the patient is not one of the subjects used in determining or selecting the metabolite biomarker signature. In some embodiments, the patient is suspected of having a medical condition. In further embodiments, the patient is immune-compromised. In other embodiments, the patient is not immune-compromised. Additionally or alternatively, the patient has been treated with a therapy neutralizing a pathogen causing the medical condition. In other embodiments, the patient has not been treated with a therapy neutralizing a pathogen causing the medical condition. In some embodiments, the patient is asymptomatic. In some embodiments, the patient is a human. In some embodiments, the patient is an adult. In other embodiments, the patient is a child.

[000132] As used herein, a metabolite biomarker signature can be used to indicate the medical condition or to select, from subjects or patients suspected of having the medical condition, those who have it. A metabolite biomarker signature may be discovered by a method as disclosed herein. In some embodiments, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of one or more specified metabolite features. In some embodiments, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of an altered (increased or decreased) level (such as absolute amount, concentration, or compound abundance, normalized or not) of a metabolite compared to a negative control. In further embodiments, the negative control is the level in a subject free of the medical condition, or an average thereof. In some embodiments, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a level (such as absolute amount, concentration, or compound abundance, normalized or not) of a metabolite similar to (such as at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 90%, or at least 95%, or at least 97%, or at least 99%, or at least 100%; additionally or alternatively no more than 110%, or no more than 120%, or no more than 150%, or no more than 200% of) that of a positive control. In further embodiments, the positive control is the level in a subject having the medical condition. In some embodiments, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) higher than a certain threshold. Additionally or alternatively, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) lower than a certain threshold. In some embodiments, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) within a certain range. In further embodiments, the threshold and range can be determined by a machine learning method as disclosed herein. In some embodiments, the presence of a metabolite biomarker signature as disclosed herein in a biological sample isolated from a subject or patient is indicative of the subject or patient having a medical condition, or having a high possibility or risk (such as more than 50%, or more than 60%, or more than 70%, or more than 75%, or more than 80%, or more than 85%, or more than 90%, or more than 95%, or more than 96%, or more than 97%, or more than 98%, or more than 99%, or about 100%) of having a medical condition. In some embodiments, a feature profile in a biological sample isolated from a subject or patient that does not fall within a metabolite biomarker signature as disclosed herein is indicative of the subject or patient not having a medical condition, or having a low possibility or risk (such as less than 50%, or less than 45%, or less than 40%, or less than 35%, or less than 30%, or less than 25%, or less than 20%, or less than 15%, or less than 10%, or less than 5%, or less than 1%) of having a medical condition. In some embodiments, the metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a decision tree or an ensemble of decision trees as determined by a machine learning method as disclosed herein.

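The threshold, range, and positive-control forms of a signature described in paragraph [000132] can be expressed as simple per-feature rules. The sketch below is an assumption-laden illustration: the rule format, the (m/z, RT) keys, and every threshold value are placeholders rather than values from the text; only the 50% to 200% window default is taken from the broadest bounds listed above, and requiring all rules to hold is a choice of this sketch.

```python
def within_positive_control_window(level, positive_control, low=0.5, high=2.0):
    """True if level lies between low * positive_control and
    high * positive_control. The defaults are the broadest bounds listed in
    the text (at least 50%, no more than 200%)."""
    return low * positive_control <= level <= high * positive_control


def meets_rule(level, above=None, below=None):
    """'above' alone means higher than a threshold, 'below' alone means lower
    than a threshold, and both together mean within a range."""
    if above is not None and not level > above:
        return False
    if below is not None and not level < below:
        return False
    return True


def signature_present(sample, rules):
    """True only if every feature named in rules satisfies its bounds."""
    return all(meets_rule(sample[feature], **bounds) for feature, bounds in rules.items())


# Hypothetical rules; the thresholds are placeholders, not values from the text.
rules = {
    (422.1307, 4.73): {"above": 1.2},                 # higher than a threshold
    (84.0447, 0.81): {"below": 0.8},                  # lower than a threshold
    (230.0961, 1.30): {"above": 0.5, "below": 3.0},   # within a range
}
sample = {(422.1307, 4.73): 2.0, (84.0447, 0.81): 0.4, (230.0961, 1.30): 1.1}
print(signature_present(sample, rules))  # → True
```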
[000133] Accordingly, applying or using a metabolite biomarker signature as used herein refers to comparing feature(s) of a biological sample from a patient with the metabolite biomarker signature, or inputting feature(s) of a biological sample from a patient into the decision tree or ensemble of decision trees of the metabolite biomarker signature, and optionally determining or outputting whether the patient has the medical condition, or the possibility or risk of the patient having the medical condition.
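Paragraph [000133] describes applying a signature by inputting a patient's features into a decision tree or an ensemble of decision trees. The sketch below shows one way such an ensemble could be evaluated in plain Python. It assumes a hypothetical nested-dictionary tree format, placeholder thresholds and leaf probabilities, and simple averaging of leaf probabilities as in bagged ensembles such as random forests; boosted ensembles instead sum per-tree scores, and the text does not specify which ensemble type or thresholds a given signature uses.

```python
from statistics import mean

# A node is either a leaf {"p": probability of the condition} or a split
# {"feature": key, "threshold": t, "left": node, "right": node}; samples whose
# feature value is <= threshold follow "left". This layout is hypothetical.


def predict_tree(node, features):
    """Walk one decision tree from the root to a leaf and return its probability."""
    while "p" not in node:
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["p"]


def predict_ensemble(trees, features):
    """Average the leaf probabilities of all trees (bagging / random-forest style)."""
    return mean(predict_tree(tree, features) for tree in trees)


# Two one-split trees with placeholder thresholds and leaf probabilities.
trees = [
    {"feature": (84.0447, 0.81), "threshold": 1.0,
     "left": {"p": 0.9}, "right": {"p": 0.2}},    # lower level -> higher risk
    {"feature": (422.1307, 4.73), "threshold": 0.5,
     "left": {"p": 0.1}, "right": {"p": 0.8}},    # higher level -> higher risk
]
patient = {(84.0447, 0.81): 0.6, (422.1307, 4.73): 0.9}
risk = predict_ensemble(trees, patient)
print(f"estimated probability of the condition: {risk:.2f}")
```

The split directions in the example trees follow paragraphs [000145] and [000147] (a lower level of the m/z 84.0447 compound and a higher level of the m/z 422.1307 compound are associated with influenza); the numbers themselves are invented.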

[000134] In some embodiments, the method further comprises identifying, using the metabolite biomarker signature, the medical condition in the patient based on an analysis of the biological sample. In some embodiments, the medical condition is a microbial infection. In some embodiments, the method further comprises recognizing in the patient, based on the metabolite biomarker signature, a medical condition that is an infection by a respiratory virus, optionally selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus.

In some embodiments, the influenza virus is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D). In further embodiments, the subtype of influenza A is selected from H1, optionally H1N1; H3, optionally H3N2; H5, optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7, optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9, optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9. In yet further embodiments, the lineage of influenza B is selected from Victoria or Yamagata. In some embodiments, the coronavirus is selected from the group of HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome coronavirus (MERS-CoV); or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

[000135] In some embodiments, the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having the medical condition. Additionally or alternatively, the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects without the medical condition. In some embodiments, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects who are immunocompromised. Additionally or alternatively, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects who are not immunocompromised. In some embodiments, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects having been treated with a therapy neutralizing the pathogen causing the medical condition. Additionally or alternatively, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects not having been treated with a therapy neutralizing the pathogen causing the medical condition.

[000136] In some embodiments, the method further comprises determining or isolating or both determining and isolating the one or more metabolites from the plurality of biological samples.

[000137] As used herein, determining or analyzing a metabolite refers to obtaining a feature of the metabolite, such as presence or absence of the metabolite or level (such as absolute amount, or concentration, or compound abundance, normalized or not) of the metabolite. Such feature can be obtained via a test as disclosed herein.

[000138] In some embodiments, the method further comprises detecting a pathogen causing the medical condition in the biological sample, for example by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay. In further embodiments, detection of the pathogen further indicates the subject has the medical condition. In other embodiments, no detection of the pathogen further indicates the subject does not have the medical condition. In yet further embodiments, the method further comprises culturing the biological sample under a condition suitable for growth of the pathogen causing the medical condition prior to the detecting step.

[000139] In some embodiments, the method further comprises detecting, in the biological sample, an immunoglobulin or an immune cell specifically recognizing and binding a pathogen causing the medical condition, for example, by an immunofluorescence assay. In further embodiments, detection of the immunoglobulin, the immune cell, or both further indicates the subject has the medical condition. In other embodiments, no detection of the immunoglobulin, the immune cell, or both further indicates the subject does not have the medical condition.

[000140] In some embodiments, the method further comprises administering to the patient having the medical condition a therapy specifically for treating the condition. In some embodiments, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent neutralizing a pathogen causing the medical condition. Additionally or alternatively, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing the pathogen, such as the influenza virus. In further embodiments, the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing the extracellular pathogen or the extracellular influenza virus. In further embodiments, the therapy or the pharmaceutical agent, is administered in an effective amount, for example in treating the medical condition, or in changing a metabolite feature to one indicative of not having the medical condition using the metabolite biomarker signature, or both.

[000141] In some embodiments, the medical condition is an infection caused by a pathogen. In further embodiments, the therapy is specifically for treating infection by the pathogen. In some embodiments, corresponding therapies are available to one of skill in the art, for example at www.drugs.com.

[000142] In some embodiments, the medical condition is an infection caused by a bacterium. In further embodiments, the therapy is specifically for treating a bacterial infection. In yet further embodiments, the therapy is an antibiotic, or an antibody or an antigen binding fragment thereof specifically recognizing and binding the bacterium, or both.

[000143] In some embodiments, the medical condition is a viral infection. In further embodiments, the therapy is an anti-viral therapy. In yet further embodiments, the anti-viral therapy is selected from an antibody or an antigen binding fragment thereof specifically recognizing and binding the virus; an inhibitor that inhibits transcription of the viral genome, such as a DNA polymerase inhibitor or a reverse transcriptase inhibitor; a protease inhibitor that inhibits post-translational processing of viral proteins; an agent that inhibits the virus from attaching to or penetrating the host cell; an immunomodulator that induces production of host cell enzymes that stop viral reproduction; an integrase strand transfer inhibitor that prevents integration of the viral DNA into the host DNA by inhibiting the viral enzyme integrase; a neuraminidase inhibitor that blocks viral enzymes and inhibits reproduction of the virus; or any combination thereof. In further embodiments, the anti-viral therapy is selected from an adamantane antiviral, an antiviral booster, an antiviral interferon, a chemokine receptor antagonist, an integrase strand transfer inhibitor, a miscellaneous antiviral, a neuraminidase inhibitor, a non-nucleoside reverse transcriptase inhibitor (NNRTI), a non-structural protein 5A (NS5A) inhibitor, a nucleoside reverse transcriptase inhibitor (NRTI), a protease inhibitor, a purine nucleoside, or any combination thereof. See, for example, drugs.com/drug-class/adamantane-antivirals.html, drugs.com/drug-class/antiviral-boosters.html, drugs.com/drug-class/antiviral-combinations.html, drugs.com/drug-class/antiviral-interferons.html, drugs.com/drug-class/chemokine-receptor-antagonist.html, drugs.com/drug-class/integrase-strand-transfer-inhibitor.html, drugs.com/drug-class/miscellaneous-antivirals.html, drugs.com/drug-class/neuraminidase-inhibitors.html, drugs.com/drug-class/nnrtis.html, drugs.com/drug-class/ns5a-inhibitors.html, drugs.com/drug-class/nrtis.html, drugs.com/drug-class/protease-inhibitors.html, and drugs.com/drug-class/purine-nucleosides.html, each of which is incorporated herein by reference in its entirety.

[000144] In one aspect, provided is a method for selecting a subject for an anti-influenza treatment. The method comprises, or alternatively consists essentially of, or yet further consists of determining a feature of a metabolite in a biological sample isolated from a subject suspected of having a medical condition, namely infection with an influenza virus. Accordingly, the influenza virus may be referred to herein as a pathogen causing the medical condition. In some embodiments, the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having an m/z of 227.0793 and an RT of 10.23 or an equivalent thereof, a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, a compound having an m/z of 232.1182 and an RT of 2.11 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 63.0440 and an RT of 1.78 or an equivalent thereof, a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, or a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof. In some embodiments, the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having an m/z of 106.0865 and an RT of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having an m/z of 227.0793 and an RT of 10.23 or an equivalent thereof, a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, a compound having an m/z of 232.1182 and an RT of 2.11 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 63.0440 and an RT of 1.78 or an equivalent thereof, a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, or a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof. In some embodiments, the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having an m/z of 106.0865 and an RT of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having an m/z of 227.0793 and an RT of 10.23 or an equivalent thereof, a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, a compound having an m/z of 232.1182 and an RT of 2.11 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 63.0440 and an RT of 1.78 or an equivalent thereof, a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, or a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof. In some embodiments, the metabolite is selected from one or more of: a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 106.0865 and an RT of 10.34 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, or a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof. In some embodiments, an altered (such as increased or decreased) level of the metabolite in the sample as compared to a control level of the metabolite indicates that the subject is suitable for an anti-influenza treatment. In some embodiments, the in-source fragment ion of pyroglutamic acid is pyroglutamic acid-D5. In further embodiments, a lower level of pyroglutamic acid indicates the subject is suitable for an anti-influenza treatment.

[000145] In some embodiments, a lower level of a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000146] In some embodiments, a lower level of a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000147] In some embodiments, a higher level of a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000148] In some embodiments, a lower level of a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza). In some embodiments, a lower level of a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000149] In some embodiments, a higher level of a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000150] In some embodiments, a lower level of a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza). In some embodiments, a lower level of a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000151] In some embodiments, a lower level of a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza). In some embodiments, a lower level of a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000152] In some embodiments, a higher level of a compound having an m/z of 86.0965 and an RT of 7.88 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000153] In some embodiments, a higher level of a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000154] In some embodiments, a higher level of a compound having an m/z of 230.0961 and an RT of 1.30 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000155] In some embodiments, a lower level of a compound having an m/z of 102.1268 and an RT of 11.61 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).

[000156] In some embodiments, a higher level of a compound having an m/z of 634.7114 and an RT of 7.00 or an equivalent thereof, for example, compared to a control, indicates the subject has the medical condition (such as influenza).
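The direction-of-change rules in paragraphs [000148] to [000156] can be encoded as a small lookup table. The following is a minimal Python sketch, not the disclosed implementation. It assumes retention times are in minutes, takes the control as the mean level across subjects without influenza (one of the control options described herein), and uses a simple count of concordant features as a hypothetical way of combining the individual rules. The names `SignatureFeature`, `control_from_negatives` and `count_concordant` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SignatureFeature:
    mz: float        # accurate mass (m/z) as listed in the text
    rt: float        # retention time; minutes assumed
    direction: str   # "higher" or "lower" relative to the control


# Features and directions from paragraphs [000148]-[000156]
# (for paired entries, the m/z + 1 variant is used).
SIGNATURE = [
    SignatureFeature(350.0774, 9.34, "lower"),
    SignatureFeature(249.1085, 10.87, "higher"),
    SignatureFeature(957.3750, 9.28, "lower"),
    SignatureFeature(353.2131, 10.89, "lower"),
    SignatureFeature(86.0965, 7.88, "higher"),
    SignatureFeature(214.1306, 10.85, "higher"),
    SignatureFeature(230.0961, 1.30, "higher"),
    SignatureFeature(102.1268, 11.61, "lower"),
    SignatureFeature(634.7114, 7.00, "higher"),
]


def control_from_negatives(negatives):
    """Average level per feature across samples from subjects without the condition."""
    keys = set().union(*negatives)
    return {k: mean(s[k] for s in negatives if k in s) for k in keys}


def deviates_as_expected(sample_level, control_level, direction):
    """True if the sample level moves away from the control in the stated direction."""
    if direction == "higher":
        return sample_level > control_level
    if direction == "lower":
        return sample_level < control_level
    raise ValueError(f"unknown direction: {direction!r}")


def count_concordant(sample, control):
    """Count signature features whose level moves in the stated direction.

    `sample` and `control` map (m/z, RT) tuples to feature levels;
    features missing from either mapping are skipped.
    """
    hits = 0
    for f in SIGNATURE:
        key = (f.mz, f.rt)
        if key in sample and key in control:
            if deviates_as_expected(sample[key], control[key], f.direction):
                hits += 1
    return hits
```

A sample in which all nine listed features move in the stated direction returns 9. How individual rules are weighted into a final call is determined by the machine-learned signature, not by this count.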

[000157] In some embodiments, the subject is immune-compromised. Additionally or alternatively, the subject has been treated with a therapy neutralizing the influenza viral infection. In some embodiments, the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child.

[000158] A metabolite compound is identified by its m/z and RT. In some embodiments, a metabolite compound as disclosed herein is identified by its m/z and RT under the experimental setting as detailed in the examples. Accordingly, an equivalent of a reference metabolite compound may refer to a metabolite compound identified by its m/z and RT under an experimental setting different from the one detailed in the examples. The equivalent's m/z, RT, or both differ from those of the reference. However, when the equivalent is measured under the reference's experimental setting, it has the same m/z and RT as the reference. In other embodiments, an equivalent of a metabolite compound is the actual compound, such as pyroglutamic acid, having the same m/z and RT.
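Matching an observed feature to a reference compound by m/z and RT is commonly done within a mass-accuracy window (in parts per million) and a retention-time window. The sketch below assumes a 10 ppm m/z tolerance, a 0.2 minute RT tolerance, and an optional RT offset to account for a different chromatographic setting; these values and the function names are hypothetical and are not specified in this disclosure.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def is_equivalent(obs_mz, obs_rt, ref_mz, ref_rt,
                  ppm_tol=10.0, rt_tol=0.2, rt_offset=0.0):
    """True if an observed feature matches a reference compound.

    rt_offset shifts the reference RT to account for a systematic
    retention-time difference between two experimental settings.
    """
    mz_ok = abs(ppm_error(obs_mz, ref_mz)) <= ppm_tol
    rt_ok = abs(obs_rt - (ref_rt + rt_offset)) <= rt_tol
    return mz_ok and rt_ok
```

For example, `is_equivalent(130.0508, 0.83, 130.0507, 0.81)` returns `True`, because the mass error is under 1 ppm and the RT differs by only 0.02.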

[000159] In some embodiments, the control is the level of the metabolite as measured in a sample isolated from a subject not suffering from an influenza infection, or an average level measured from a plurality of subjects not suffering from an influenza infection.

[000160] In some embodiments, the method further comprises detecting the influenza virus in the biological sample by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay. In further embodiments, detection of the influenza virus further indicates the subject has the infection. In other embodiments, no detection of the influenza virus further indicates the subject does not have the infection. In yet further embodiments, the method further comprises culturing the biological sample under a condition suitable for growth of the influenza virus prior to the detecting step by RT-PCR or an immunofluorescence assay.

[000161] In some embodiments, the method further comprises detecting an immunoglobulin or an immune cell specifically recognizing and binding the influenza virus in the biological sample by an immunofluorescence assay. In further embodiments, detection of the immunoglobulin or the immune cell or both further indicates the subject has the infection. In other embodiments, no detection of either the immunoglobulin or the immune cell or both further indicates the subject does not have the infection.

[000162] In some embodiments, the method further comprises administering to the subject having the infection an anti-influenza therapy. In further embodiments, the anti-influenza therapy comprises, or alternatively consists essentially of, or yet further consists of a neuraminidase inhibitor, an M2 channel blocker, an antibody neutralizing an influenza virus, or any combination thereof. In yet further embodiments, the anti-influenza therapy, which optionally does not neutralize an influenza virus, comprises, or alternatively consists essentially of, or yet further consists of oseltamivir, oseltamivir phosphate, zanamivir, rimantadine, amantadine, peramivir, baloxavir marboxil, acetaminophen, dextromethorphan, pseudoephedrine, guaifenesin, phenylephrine, chlorpheniramine, diphenhydramine, or any combination thereof. In further embodiments, the medical condition is influenza A infection. In yet further embodiments, the anti-influenza therapy comprises, or alternatively consists essentially of, or yet further consists of amantadine, rimantadine, or any combination thereof. In further embodiments, the therapy or the pharmaceutical agent is administered in an effective amount, for example, for treating the influenza infection, for changing a metabolite feature to one indicative of not having the influenza infection as determined using the metabolite biomarker signature, or both.

[000163] In some embodiments, the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of presence or absence of one or more of the metabolites. In some embodiments, the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of a level (such as absolute amount, compound abundance, or concentration) of one or more of the metabolites. In further embodiments, the level is an extracellular level. In some embodiments, a metabolite is an intracellular metabolite. In yet further embodiments, the level is normalized to an internal standard or to the mean compound abundance. Additionally or alternatively, the level is normalized by subtracting a level of a negative control. In further embodiments, the negative control is a subject free of the medical condition. In other embodiments, the negative control is a solution immersing or diluting the biological sample prior to performing the test. In some embodiments, the internal standard is the labeled standard D5-pyroglutamic acid.
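The normalization options described above (to an internal standard such as D5-pyroglutamic acid, to the mean compound abundance, and by subtracting a negative control such as blank transport medium) reduce to simple per-feature arithmetic. A minimal sketch follows. Clipping negative values to zero after blank subtraction is an assumption, and the feature keys are illustrative.

```python
def normalize_to_internal_standard(abundances, standard_key):
    """Divide every feature by the internal-standard abundance (standard removed)."""
    standard = abundances[standard_key]
    if standard <= 0:
        raise ValueError("internal standard must have a positive abundance")
    return {k: v / standard for k, v in abundances.items() if k != standard_key}


def normalize_to_mean_abundance(abundances):
    """Divide every feature by the mean abundance across all features."""
    avg = sum(abundances.values()) / len(abundances)
    return {k: v / avg for k, v in abundances.items()}


def subtract_negative_control(abundances, control):
    """Subtract the level measured in a negative control, clipping at zero."""
    return {k: max(v - control.get(k, 0.0), 0.0) for k, v in abundances.items()}
```

For example, a sample with `{"D5-pyroglutamic acid": 2.0, "a": 4.0, "b": 1.0}` normalized to the internal standard becomes `{"a": 2.0, "b": 0.5}`.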

[000164] In some embodiments, the biological sample is a nasopharyngeal sample, a blood sample, a serum sample, a plasma sample, or a urine sample. In some embodiments, the biological sample is a nasopharyngeal swab or viral transport medium (VTM) immersing a nasopharyngeal swab. In some embodiments, a metabolite feature is normalized by subtracting a level of the viral transport medium.

[000165] Also provided is a system comprising, or alternatively consisting essentially of, or yet further consisting of a processor and a memory. In some embodiments, the memory comprises, or alternatively consists essentially of, or yet further consists of instructions that are executable by the processor to cause the machine learning system to: (a) generate a training dataset based on one or more tests run on biological samples from a plurality of subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of metabolites from the medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through the one or more tests run on the biological samples; (b) produce a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) store, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
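Steps (b)(ii), (b)(iii) and (c) can be sketched as follows. This is a hypothetical illustration. It assumes each feature's contribution has already been summarized as a non-negative score (for example, a mean absolute SHAP value, consistent with the SHAP analysis described in Example 2), ranks features by their share of the total contribution, keeps up to a fixed number above a minimum share, and writes the signature to a JSON file. The function names, the JSON layout and the default cut-offs are assumptions.

```python
import json
from pathlib import Path


def select_features(contributions, max_features=20, min_share=0.0):
    """Rank features by contribution and keep the top ones.

    contributions: dict mapping feature name -> non-negative score.
    Returns a list of (feature, share_of_total) tuples, highest first.
    """
    total = sum(contributions.values())
    if total <= 0:
        return []
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score / total)
            for name, score in ranked[:max_features]
            if score / total >= min_share]


def signature_record(condition, selected):
    """Build a storable record associating the signature with a condition."""
    return {
        "condition": condition,
        "features": [{"feature": f, "share": s} for f, s in selected],
    }


def store_signature(path, record):
    """Persist the signature record as JSON."""
    Path(path).write_text(json.dumps(record, indent=2))
```

Calling `select_features(scores, max_features=20, min_share=0.01)` would keep only features contributing at least 1% to model predictions, mirroring the 1% criterion mentioned in Example 2.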

[000166] In a further aspect, provided is a kit for use in a method as disclosed herein. In some embodiments, the kit comprises, or alternatively consists essentially of, or yet further consists of one or more of an instruction for use, methanol, formic acid, ammonium formate salt, ammonium hydroxide, water, acetonitrile, isopropanol, MS calibration and reference mass solution, or MS metabolite library of standards. Additionally or alternatively, the kit comprises, or alternatively consists essentially of, or yet further consists of a system as disclosed herein.

[000167] In yet a further aspect, provided is a non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform a method or a step thereof as described herein.

[000168] In one aspect, developed herein is an approach using liquid chromatography combined with mass spectrometry (LC/MS) and machine learning for the study of metabolite alterations to diagnose and differentiate a medical condition, such as a respiratory viral infection.

[000169] Accordingly, provided is a rapid and simple diagnosis of a medical condition, such as respiratory viral infections, in children and adults. This technique has many potential advantages: for example, it can be developed into a low-complexity, rapid point-of-care test; it is cheaper than existing diagnostics; it is less invasive (for example, it could be adapted for urine-based diagnosis instead of a nasopharyngeal swab); and it allows identification of certain metabolic pathways involved in a medical condition, such as those used by respiratory viruses, that could pave the way for new therapeutic targets in the future.

[000170] In some embodiments, a more streamlined LC/MS system is developed to accelerate turnaround time. In some embodiments, the method is validated using a blinded trial set. In some embodiments, the method and system as disclosed herein are suitable for a point-of-care test in a simple format. In some embodiments, the technology is expanded to transplant viruses, tropical medicine and diagnosis of congenital infections.

Methods Available for Analyzing a Metabolite

[000171] The human metabolome encompasses lipids, carbohydrates, and metabolic intermediates (e.g., organic acids, amino acids, and acylcarnitines). Detection of these diverse compound classes using liquid chromatography-mass spectrometry (LC-MS) currently requires multiple chromatographic techniques. Commonly, lipidomic methods use reversed-phase (RP) chromatography, hydrophilic interaction chromatography (HILIC), or direct infusion, and glycomic methods use HILIC. Because methods based on either RP or HILIC alone can miss key metabolites, results from the independent use of these two approaches are often combined to capture and detect the full range of compounds by full-scan MS. However, this approach requires separate sample preparations for each chromatographic technique and leads to overlapping datasets (i.e., the same metabolite detected by both techniques) that must be meticulously curated to achieve a single, unique result set.

[000172] Various chromatographic strategies have been investigated to address limitations of the independent use of RP and HILIC. These include RP methods using mobile phase modifiers, such as ion-pairing reagents or ammonium fluoride, and columns with increased polar retention, such as C18-pentafluorophenyl (PFP) and porous graphitic carbon (Hypercarb), as well as combined RP-HILIC or HILIC-RP arrangements. While these strategies expand metabolome coverage, they are unable to resolve key pathognomonic metabolites (e.g., alloisoleucine, seen in maple syrup urine disease) without sacrificing negative mode ionization, or they require at least two LC systems to overcome mobile phase incompatibility. Ion-exchange (IEX) chromatography and mixed-mode IEX have also been investigated to widen metabolite coverage, especially to retain highly charged metabolites, but, under the conditions studied, were associated with prolonged retention of hydrophobic or highly charged compounds, or a lack of hydrophobic retention.

[000173] Alternatively, an in-line dual-column IEX-RP configuration using a single LC system has been used to increase peak capacity in proteomic applications. Applied to metabolomics, the IEX column would retain polar, charged metabolites, and the RP column would separate the remaining, less polar metabolites. By pairing RP with IEX, it was predicted that both polar and non-polar metabolites should bind to and elute from their appropriate columns, resulting in expanded metabolite coverage with one LC system.

[000174] Other methods are also available for analyzing a biological sample. One such method comprises, or alternatively consists essentially of, or yet further consists of: separating components of the biological sample via reversed-phase (RP) chromatography to obtain an eluate; subjecting the eluate to separation via ion-exchange (IEX) chromatography or mixed-mode IEX chromatography; and detecting the separated compounds to determine the components of the biological sample.

[000175] In some embodiments, the biological sample comprises, or alternatively consists essentially of, or yet further consists of lipids, carbohydrates, and metabolic intermediates.

In some embodiments, the biological sample comprises, or alternatively consists essentially of, or yet further consists of polar and non-polar metabolites. In some embodiments, the detecting step is performed using mass spectrometry. In some embodiments, the detecting step includes qualitative analysis. In some embodiments, the biological sample is separated in the RP chromatography and IEX chromatography with one solvent gradient. In some embodiments, there is no switching valve between the RP chromatography and IEX chromatography. In some embodiments, isomers of metabolites in the biological sample are separated. Representative methods are described in US Patent Application No. 17/207,295 filed March 19, 2021, which is incorporated herein by reference in its entirety.

[000176] The foregoing description of embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from this disclosure. The embodiments were chosen and described in order to explain the principles of the disclosure and its practical application, to enable one skilled in the art to utilize the various embodiments with various modifications as are suited to the particular use contemplated. Other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of the embodiments without departing from the scope of the present disclosure as expressed in the appended claims.

[000177] The following examples are included to demonstrate some embodiments of the disclosure. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.

Example 1 - Novel metabolomics approach for the diagnosis of respiratory viruses directly from nasopharyngeal specimens

[000178] Respiratory virus infections, including influenza A and B, are important causes of morbidity and mortality among pediatric and adult patients. These viruses infect respiratory epithelial cells, where they may induce metabolite alterations. The use of liquid chromatography (LC) combined with mass spectrometry (MS) was investigated for the study of host cell metabolite alterations to diagnose and differentiate respiratory viruses. Rapid identification of respiratory viruses may have important implications for patient management, optimization of infection control measures and antimicrobial stewardship.

[000179] Methods: Nasopharyngeal swab samples that were positive for respiratory viruses by a commercial multiplex reverse transcriptase-polymerase chain reaction assay (GenMark Diagnostics, Inc., Carlsbad, CA) were studied. Banked, frozen samples (-80°C) stored in viral transport media were retrieved, thawed to room temperature, and distributed in 5 µL aliquots. The aliquots were centrifuged at 13.3 x g for 15 minutes, and the filtrate was placed in autosampler vials for analysis by Agilent 6545 Quadrupole Time-of-Flight (Q-TOF) LC/MS (Agilent Technologies, Santa Clara, CA). Compounds were eluted using a quaternary solvent manager pumping ammonium formate and methanol, followed by accurate mass analysis by Q-TOF MS. Agilent Mass Profiler 3D principal component analysis was performed, and compound identification was completed using the METLIN metabolite database.
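Principal component analysis was performed here with vendor software. As a generic illustration of the technique only (not the Agilent implementation), the first principal component of a samples-by-features abundance matrix can be extracted by power iteration on the covariance of the mean-centered data. The function name and iteration count are assumptions.

```python
import math
import random


def first_principal_component(rows, iterations=500, seed=0):
    """Return (loadings, scores) for the first principal component.

    rows: list of samples, each a list of feature abundances.
    Uses power iteration on X^T X of the mean-centered data X.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]

    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

    for _ in range(iterations):
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]             # X v
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]   # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [c / norm for c in w]

    scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
    return v, scores
```

Plotting the scores of the first two or three components per sample is what allows groups (for example, virus-positive versus negative samples) to be compared visually.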

[000180] Table 1: Liquid chromatography (LC) Gradient-time table.

[000181] Results: A total of 130 samples were tested by Q-TOF LC/MS, including 120 positive samples (10 samples of each viral target) and 10 clinical specimens collected from patients with acute respiratory symptoms and negative respiratory virus RT-PCR. Viruses tested included adenovirus, coronavirus, influenza A H1N1 and H3N2, influenza B, human metapneumovirus, parainfluenza 1, 2, 3 and 4, respiratory syncytial virus (RSV), and rhinovirus. Q-TOF LC/MS allowed identification of key metabolites that distinguished all virus-positive samples from the negative group (FIGs. 1A-1C) and differentiated these respiratory viruses from one another. Clear differentiation was also seen between influenza A H1N1 and H3N2 subtypes (FIG. 1D), and between parainfluenza types.

[000182] Discussion: Preliminary data from the Q-TOF LC/MS analysis show that respiratory viruses exhibit different host cell metabolomic profiles that allow viral differentiation to the species level and, for influenza A virus, to the subtype level. This metabolomic approach has substantial potential for diagnostic applications in infectious diseases, as it has the advantage of direct testing of patient samples. Further investigation is being performed to complete metabolite identification, study the implicated metabolic pathways and design low-cost, low-complexity testing based on key metabolites that could be performed near or at the point of care.

Example 2 - Novel metabolomics approach combined with machine learning for the diagnosis of influenza from nasopharyngeal specimens

[000183] Respiratory virus infections are important causes of morbidity and mortality and may induce host metabolite alterations by infecting respiratory epithelial cells. The use of liquid chromatography (LC) combined with quadrupole time-of-flight mass spectrometry (Q-TOF) and machine learning was investigated to identify distinct metabolic signatures from nasopharyngeal samples for the diagnosis of respiratory tract infection.

Nasopharyngeal swab samples positive and negative for influenza A and B were analyzed by LC/Q-TOF to identify distinct metabolic signatures for diagnosis of acute illness. Machine learning models were applied for classification, followed by Shapley additive explanation (SHAP) analysis to assess feature importance and for biomarker discovery.

[000184] A total of 236 samples were tested in the discovery phase by LC/Q-TOF, including 118 positive samples (40 influenza A 2009 H1N1, 39 influenza A H3 and 39 influenza B) as well as 118 age- and sex-matched negative controls with acute respiratory illness. LC/Q-TOF combined with machine learning analysis allowed identification of key metabolites that distinguished positive influenza from negative samples with an area under the receiver operating characteristic curve (AUC) of 0.94 (95% confidence interval (CI) 0.88, 1.00). In another analysis, the AUC was 1.00 (95% CI 0.99, 1.00). Using 5-fold cross-validation, overall sensitivity was 1.00 (95% CI 0.86, 1.00) and specificity was 0.96 (95% CI 0.81, 0.99). The metabolite most strongly associated with differential classification was pyroglutamic acid. Independent validation of a biomarker signature based on the top 20 differentiating metabolites was performed in a prospective cohort of 96 symptomatic individuals, including 48 positive samples (24 influenza A 2009 H1N1, 5 influenza A H3 and 19 influenza B) and 48 negative samples. This signature, optimized for sensitivity, revealed an AUC of 0.99 (95% CI 0.97, 1.00), sensitivity of 1.00 (95% CI 0.93, 1.00) and specificity of 0.69 (95% CI 0.55, 0.80). Testing performed using a simpler targeted approach, liquid chromatography triple quadrupole mass spectrometry, showed an AUC of 1.00 (95% CI 0.998, 1.00), sensitivity of 0.94 (95% CI 0.83, 0.98), and specificity of 1.00 (95% CI 0.93, 1.00).
Thus, this metabolomic approach may be used for diagnostic applications in infectious diseases testing directly from patient samples and may be eventually adapted for point-of-care testing.
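The 5-fold cross-validation mentioned above partitions the samples into five folds, each used once for evaluation while the remaining folds are used for training. The sketch below assumes stratification by class so that each fold keeps a similar positive/negative balance; the source does not state whether stratification was used, and `stratified_k_fold` is a hypothetical helper.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds, stratified by label."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal each class's samples round-robin so every fold gets a share.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

With 118 positive and 118 negative samples, as in the discovery cohort, each test fold holds 23 or 24 samples of each class.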

[000185] Provided herein is a metabolomics method for the diagnosis of infectious diseases based on an in-line, two-column chromatographic arrangement that allows the capture of both non-polar and polar compounds in a single 20-minute run. This method is used to characterize host metabolite signatures directly from patient specimens using liquid chromatography quadrupole time-of-flight mass spectrometry (LC/Q-TOF), followed by a machine learning (ML) algorithm developed for metabolomics classification analysis and biomarker discovery. In a non-limiting example, the LC/Q-TOF method was used to compare metabolite profiles between influenza-positive samples, including influenza A H1N1, influenza A H3, and influenza B viruses, and negative samples. Given the high healthcare impact of influenza, and the biological plausibility suggesting that metabolomics is well suited to its detection, experiments were performed to evaluate the feasibility and accuracy of the LC/Q-TOF method for the diagnosis of influenza from nasopharyngeal specimens.

[000186] Biomarker discovery phase

[000187] A total of 248 samples were included for testing and analyzed by LC/Q-TOF for metabolite discovery. Of these, 6 were excluded prior to analysis due to technical errors and their 6 corresponding controls were excluded. The final analysis included a total of 236 samples, with 118 positive influenza samples (40 influenza A 2009 H1N1, 39 influenza A H3 and 39 influenza B) and 118 negative age and sex-matched controls (Table 1). Compared to individuals without influenza, those with a positive influenza result were less likely to be immunocompromised (22.9% vs 45.8%; p=0.001), more likely to have been tested at an outpatient clinic (63.0% vs 26.9%; p<0.001), less likely to have been hospitalized (24.6% vs 69.5%; p<0.001) and less likely to have been admitted to the intensive care unit (ICU) (5.1% vs 22.0%; p<0.001). Patient characteristics were otherwise similar. All-cause 30-day mortality was identical in each group at 3/118 (2.5%). The discovery cohort training set consisted of 186 samples (94 positive, 92 negative), and the test set consisted of 50 samples (24 positive, 26 negative).

[000188] LC/Q-TOF metabolomics combined with machine learning demonstrates high classification performance.

[000189] Untargeted metabolomics by LC/Q-TOF identified a total of 3,366 ion features. Of these, 48 ion features were removed because they showed zero values for all samples tested, leaving 3,318 ion features for analysis. Application of machine learning models to these features, specifically the LightGBM (LGBM) and random forest (RF) models, achieved an area under the receiver operating characteristic curve (AUC) of 1.00 (95% CI 0.99, 1.00) and 0.93 (95% CI 0.86, 1.00), respectively, on the test set (FIG. 3A). The statistical or linear models also performed well, with Lasso obtaining an AUC of 0.94 (95% CI 0.88, 1.00) and Ridge obtaining an AUC of 0.92 (95% CI 0.85, 1.00). Subtraction of the background spectral data from the blank VTM replicates did not impact test performance of the model (FIG. 8). An operating point that maximized sensitivity across all thresholds was selected and showed a sensitivity of at least 0.9 for all models. At this operating point, optimized for sensitivity and Youden's J statistic, LGBM achieved a sensitivity of 1.00 (95% CI 0.86, 1.00) and a specificity of 0.96 (95% CI 0.81, 0.99), superior to the other models (Table 2). Subgroup analysis of the performance of the LGBM model showed an AUC of 0.99 (95% CI 0.97, 1.00) for adults and an AUC of 1.00 (95% CI 0, 1.00) for children (Table 5). The same model demonstrated an AUC of 1.00 (95% CI 0, 1.00) in immunocompromised hosts and an AUC of 0.99 (95% CI 0.97, 1.00) in non-immunocompromised hosts (Table 5). Although only 33 individuals in this cohort were admitted to the intensive care unit (ICU), the AUC was 1.00 (95% CI 0, 1.00) in ICU patients compared to 0.94 (95% CI 0.85, 1.00) in non-ICU patients. Data from the other models, and for subgroups defined by ICU admission, bacterial coinfection, antibiotic treatment and time since symptom onset, are provided in Table 5.
Restricting the LGBM analysis to influenza A (2009 H1N1 and H3N2) versus influenza B positive samples gave an AUC of 0.62. Furthermore, a separate multivariable model including the variables age, sex, days since symptom onset and Charlson comorbidity index showed that only the model outcome was significantly associated with influenza status classification (Table 6).
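The AUC values and the operating-point selection described above can be illustrated with a short sketch. AUC is computed here as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), which equals the area under the empirical ROC curve. The threshold search maximizes Youden's J (sensitivity + specificity - 1), optionally restricted to thresholds that reach a minimum sensitivity. Combining the two criteria in this way is an assumption about how an operating point "optimized for sensitivity and Youden's J statistic" might be chosen, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_operating_point(scores, labels, min_sensitivity=0.0):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its score >= threshold. Only
    thresholds reaching min_sensitivity are considered. Returns
    (j, threshold, sensitivity, specificity), or None if none qualify.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if sens < min_sensitivity:
            continue
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best
```

Raising `min_sensitivity` trades specificity for sensitivity, which is the behavior wanted when the test is intended to rule out influenza.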

[000190] Top 20 signature validation by LC/MS-MS maintains high classification performance.

[000191] Untargeted metabolomics discovery analysis identified a total of 3,318 features. After the LC/Q-TOF features were ranked by importance, the top 20 ion features associated with classification were identified, of which only 13 contributed more than 1% to model predictions (FIGs. 4A-4B). A model trained using only the top feature (84.0447@0.81, accurate mass @ retention time) had an AUC of 0.92 (95% CI 0.84, 1.00). Models trained using the top 3, 5, and 7 features obtained AUCs of 0.98 (95% CI 0.96, 1.00), 1.00 (95% CI 0.99, 1.00), and 0.99 (95% CI 0.99, 1.00), respectively (FIG. 4C). Thus, a decision tree trained on only the top 5 features achieved performance comparable to the LGBM model on the full feature set. Furthermore, a classifier built from a single decision on the top feature achieved an AUC of greater than 0.9 on the withheld test set (FIG. 4D). The top 20 metabolite biomarker signature identified by LC/Q-TOF was validated in a cohort of samples from 96 symptomatic individuals with nasopharyngeal swabs, including 48 positives (24 influenza A H1N1, 5 influenza A H3 and 19 influenza B) and 48 negatives. This signature, with thresholds optimized for sensitivity in order to rule out influenza, revealed an AUC of 0.99 (95% CI 0.97, 1.00), sensitivity of 1.00 (95% CI 0.93, 1.00) and specificity of 0.69 (95% CI 0.55, 0.80) (FIG. 3D).
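A classifier built from a single decision on one feature (a decision stump) can be sketched as below. Because pyroglutamic acid was lower in influenza-positive samples, the stump supports a "below threshold means positive" rule, but it also tries the opposite direction. Choosing the threshold by training-set accuracy over midpoints between observed values is an assumption; the source does not specify the split criterion, and `fit_stump` and `predict_stump` are hypothetical names.

```python
def fit_stump(values, labels):
    """Fit a one-feature, one-threshold classifier.

    Returns (threshold, direction, accuracy). direction "below" means a
    value under the threshold is called positive; "above" means the reverse.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between neighboring values, plus
    # one point outside each end of the observed range.
    thresholds = ([distinct[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
                  + [distinct[-1] + 1.0])
    best = None
    for t in thresholds:
        for direction in ("below", "above"):
            preds = [int(v < t) if direction == "below" else int(v > t)
                     for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[2]:
                best = (t, direction, acc)
    return best


def predict_stump(stump, value):
    """Apply a fitted stump to one feature value (1 = positive)."""
    threshold, direction, _ = stump
    return int(value < threshold) if direction == "below" else int(value > threshold)
```

Ranking the stump's input values rather than its hard calls would give a score from which an AUC, such as the one reported for the top feature, can be computed.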

[000192] Testing was performed by LC/MS-MS using the same sample set. The top 20 biomarker signature revealed an overall AUC of 1.00 (95% CI 0.998, 1.00), sensitivity of 0.94 (95% CI 0.83, 0.98) and specificity of 1.00 (95% CI 0.93, 1.00) (FIG. 5). Area under the curve and quantitation results for the top 2 biomarkers, pyroglutamic acid and its in-source fragment ion, showed significantly lower pyroglutamic acid levels in influenza-infected individuals (FIG. 9). Heatmap analysis showed that the top 20 biomarker signature varied slightly by influenza subtype compared to the negative subgroup (FIG. 6).
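The reported 95% confidence intervals for sensitivity and specificity appear consistent with Wilson score intervals for a binomial proportion. This is an inference, not a method stated in the source: for instance, 24 of 24 test-set positives gives 0.86 to 1.00, and 45 of 48 (a count back-calculated from the reported sensitivity of 0.94 on 48 positives) gives 0.83 to 0.98. A minimal sketch:

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

For example, `wilson_interval(33, 48)` rounds to (0.55, 0.80), matching the validation-cohort specificity interval if 33 of the 48 negatives were correctly classified.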

[000193] Pyroglutamic acid identified as top differentiating metabolite.

[000194] Metabolite identification through in-house library matching revealed a tier 1 match for compound 130.0507@0.81 as pyroglutamic acid, and compound 84.0447@0.81 as an in-source fragment ion of pyroglutamic acid. Furthermore, compound 350.0774@9.34 was identified to be consistent with formylmethyl glutathione. Further confirmation of this identification is under investigation. Further metabolite annotation work will be required for the other metabolites listed as these did not definitively match the in-house library or large database screening (Table 3).
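The assignment of 84.0447@0.81 as an in-source fragment of pyroglutamic acid (130.0507@0.81) can be sanity-checked with simple mass arithmetic. The sketch assumes positive-ion mode with the parent observed as [M+H]+ (C5H7NO3 plus a proton) and uses standard monoisotopic masses. It shows the observed parent within roughly 6 ppm of the theoretical value, and the roughly 46.006 Da difference between the two ions lying within about 0.5 mDa of the mass of formic acid (HCOOH). Loss of HCOOH is one plausible interpretation of the neutral loss, not a confirmed fragmentation pathway.

```python
# Monoisotopic masses (Da) of the most abundant isotopes
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688


def monoisotopic_mass(formula):
    """Sum of monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def ppm(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


PYROGLUTAMIC_ACID = {"C": 5, "H": 7, "N": 1, "O": 3}
FORMIC_ACID = {"C": 1, "H": 2, "O": 2}

parent_mz = 130.0507      # observed feature, m/z
fragment_mz = 84.0447     # observed in-source fragment, m/z

theoretical_mh = monoisotopic_mass(PYROGLUTAMIC_ACID) + PROTON  # [M+H]+
parent_error_ppm = ppm(parent_mz, theoretical_mh)
neutral_loss = parent_mz - fragment_mz
loss_vs_formic = neutral_loss - monoisotopic_mass(FORMIC_ACID)

print(f"[M+H]+ theoretical: {theoretical_mh:.4f}, error {parent_error_ppm:.1f} ppm")
print(f"neutral loss {neutral_loss:.4f} Da, minus HCOOH: {loss_vs_formic * 1000:.1f} mDa")
```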

[000195] This study of 236 nasopharyngeal swab samples from symptomatic individuals showed that the described LC/Q-TOF method combined with machine learning could differentiate between influenza-positive (including influenza A 2009 H1N1, H3 and influenza B) and influenza-negative samples with high test performance, including AUC, sensitivity and specificity over 0.90. Because this untargeted approach entails significant upfront instrument expense and complexity in data reproducibility and processing, it was followed by a simpler targeted approach using tandem mass spectrometry (LC/MS-MS).

The top 20 biomarker signature identified by LC/Q-TOF was adapted to LC/MS-MS testing on a 96-sample set and demonstrated sustained high performance. Given that LC/MS-MS is already employed in multiple laboratories for routine clinical testing, this proof of concept provides a model for the feasibility of adaptation and roll-out to other centralized laboratory facilities (Seger et al. Clin Biochem. 2020;82:2-11; and Garg et al. Methods Mol Biol. 2016;1383:1-10).

[000196] Molecular testing has revolutionized viral diagnostics in clinical laboratories, with multiplexed reverse-transcriptase polymerase chain reaction (RT-PCR) representing the current standard of care for the diagnosis of respiratory viral infections. However, in clinical laboratories, limitations to this technique remain, including high cost, potential for false negatives as assay target sequences mutate, and the inability to differentiate active infection from persistent nucleic acid detection (Somerville et al. Pathology. 2015;47(3):243-9; and Whiley et al. J Clin Virol. 2009;45(3):203-4). Thus, improved diagnostic tools for respiratory virus infections are needed (Somerville, et al. Pathology 47, 243-249 (2015); and Whiley et al. J Clin Virol 2009; 45(3): 203-4). Furthermore, the high complexity of many molecular assays limits their use at the point of care, where the patient's need for a rapid and actionable diagnosis is greatest. Metabolomics, or the large-scale study of small molecules, represents the ‘-omics’ technology closest to phenotype and thus holds promise to address current gaps in molecular testing of infectious diseases (Johnson et al. Nat Rev Mol Cell Biol 2016; 17(7): 451-9; Patti et al. Nat Rev Mol Cell Biol 2012; 13(4): 263-9; and Fiehn et al. Plant Mol Biol 2002; 48(1-2): 155-71). This is particularly important given the significant burden of respiratory viruses in the U.S. and internationally (Centers for Disease Control and Prevention. Disease Burden of Influenza. Available at: cdc.gov/flu/about/burden/index.html. Accessed March 6th 2020).

[000197] The metabolomics approach allows for real-time monitoring of host response, uses very little sample volume, is inexpensive to run on a per-sample basis and allows for hypothesis-free, untargeted exploration of novel biomarkers. Furthermore, our finding of a 20-biomarker signature demonstrating reproducible high test performance suggests that these biomarkers may be developed into an assay that could be performed at the point of care.

[000198] This study demonstrated that LC/Q-TOF combined with machine learning can differentiate between influenza-positive (including influenza A 2009 H1N1, H3 and influenza B) and influenza-negative samples with high test performance, including sensitivity, specificity and AUC over 90%, by decision tree modelling. Further adjustment of the sensitivity and specificity cut-offs may be considered to optimize test performance for different projected testing indications. Given the novelty of this approach, comparative data points for this application are lacking. However, LC/Q-TOF plus machine learning compared favorably to a previous study using an unbiased proteomic approach on nasopharyngeal lavage samples (collected with normal saline) from 15 previously healthy hosts experimentally infected with influenza A H3N2 or human rhinovirus. That study's 10-peptide signature was validated in a cohort of 80 subjects, achieving an overall AUC of 0.86, sensitivity of 75% and specificity of 97.5%, including paired samples. Metabolomics sample processing is simpler and faster than the proteomic workflow (approximately 30 minutes for ultrafiltration compared to >20 hours for proteomics and gene expression studies), thus conferring a relative advantage even at performance similar to these methods. Previous studies using untargeted metabolomics approaches for the detection and characterization of respiratory virus infections are limited by substantial heterogeneity in analytical methods (including MS (LC-Q-TOF, GC-MS) and nuclear magnetic resonance (NMR)), specimen type (nasopharyngeal aspirate, serum, urine, cell culture), hosts (animal and human), viruses (influenza, RSV, human rhinovirus) and metabolic signatures (ranging from 10 to 285 metabolites) (Tian et al. Viruses 2019; 11(11); and Turi et al. Metabolomics 2018; 14(10): 135).
These studies profiled metabolites and metabolic pathways but did not include quantitative analytical results of classification model performance thus limiting assessment of potential clinical utility as a diagnostic assay.

[000199] A total of 20 metabolites were retained for the biomarker signature. The top 2 differentiating metabolites were definitively identified (tier 1 match) as pyroglutamic acid and the in-source fragment ion of pyroglutamic acid. Pyroglutamic acid is an uncommonly found cyclized derivative of L-glutamic acid, and high blood levels may indicate disorders of glutathione metabolism (Human Metabolome Database. Human Metabolome Database: Showing metabocard for Pyroglutamic acid (HMDB0000267). Available at: hmdb.ca/metabolites/HMDB0000267. Accessed March 13 2020). Interestingly, in this study, lower pyroglutamic acid levels were detected in infected individuals than in uninfected individuals. The mechanism for this finding is unclear at present. A previous cell culture study with influenza A 2009 H1N1 had shown metabolites in purine, lipid and glutathione metabolism to be altered in infection (Tian et al. Viruses 2019; 11(11)). In addition, several biomarkers were shown to be altered in the serum of emergency room patients infected with influenza A 2009 H1N1, including lysophospholipids and sphingolipids related to inflammation, bile acids and tryptophan metabolites (Ferrarini et al. Electrophoresis 38, 2341-2348 (2017)). Two animal studies also noted changes in energy center metabolites such as glucose and glycine in nasal washes from ferrets infected with influenza A after oseltamivir treatment (Human Metabolome Database. Human Metabolome Database: Showing metabocard for Pyroglutamic acid (HMDB0000267). Available at: hmdb.ca/metabolites/HMDB0000267. Accessed March 13 2020) and in galactose, glycine, serine and threonine metabolism in plasma from mice infected with influenza A (Qian et al. J Chromatogr B Analyt Technol Biomed Life Sci 2018; 1092: 122-30). Further metabolomics work is under way to characterize the metabolic pathways involved in influenza virus infections (Smallwood et al. Cell Rep 2017; 19(8): 1640-53).

[000200] The top 20 ion features retained in the biomarker signature likely represent a heterogeneous group of compounds from a variety of biological pathways. The top two ion features were successfully identified through in-house library matching as pyroglutamic acid (130.0507@0.81) and an in-source fragment ion of pyroglutamic acid (84.0447@0.81), both of which are decreased in specimens from influenza-infected individuals. Pyroglutamic acid (synonyms: pidolic acid, 5-oxoproline) is a cyclized derivative of L-glutamic acid that can form in the living cell in one of three ways: from the degradation of glutathione, from incomplete reactions following glutamate activation, or from the degradation of proteins containing pyroglutamic acid at the N-terminus (Kumar. Current Science. 2012;102(2):288-97). Several recent studies have highlighted the complex interaction between glutathione metabolism, which is important in reactive oxygen species (ROS) regulation, and infection with influenza, which is known to increase the formation of ROS (Keshavarz et al. Cell Mol Biol Lett. 2020;25:15; Amatore et al. FASEB Bioadv. 2019;1(5):296-305; Nencioni et al. FASEB J. 2003;17(6):758-60; and Cai et al. Free Radic Biol Med. 2003;34(7):928-36). In a study using ultra-high-pressure LC/Q-TOF to detect early metabolic disturbances following infection with influenza H1N1 in A549 human lung epithelial cells, significant differences were found in 50 metabolites, which mapped mainly to purine, glutathione and lipid metabolism pathways (Tian et al. Viruses. 2019; 11(11)). In that reference study, the infected A549 cells were washed and lysed prior to metabolite analysis and showed upregulation of glutathione metabolism, with an increase in the intracellular concentration of pyroglutamic acid. The results herein show a decrease in pyroglutamic acid in NP swabs from influenza-infected individuals. Given that the specimens used herein were not washed or lysed, the observed decrease in pyroglutamic acid in NP swabs from infected individuals may be due to decreased extracellular concentrations resulting from increased use of glutathione in the intracellular space. Alternatively, a more complex mechanism involving oxidative stress and upstream metabolic effects may be at play. Although the mechanism giving rise to differential concentrations of pyroglutamic acid in the specimens is not yet known, the results are consistent with the current literature, which highlights glutathione metabolism as a key pathway altered during influenza infection. In addition, the detected pyroglutamic acid was not identified as an in-source fragment of glutamate, further supporting its independent role.

[000201] In this study, both statistical models (such as linear models) and machine learning models (such as decision trees) were explored comprehensively to identify the best test performance for these untargeted metabolomics data. The results were reproducible across datasets and across models, adding confidence to the findings. Furthermore, the machine learning models consistently outperformed the statistical models, in line with previous studies supporting the use of a random forest approach (Li et al. Brief Bioinform 2018; 19(2): 325-40; and Trainor et al. Metabolites 2017; 7(2)).

[000202] This study presents several strengths. First, it demonstrated high test performance in the discovery cohort, which was independently validated in a distinct cohort, supporting the reproducibility and robustness of this approach. Furthermore, high performance on the simpler tandem mass spectrometry testing may facilitate uptake by a large number of laboratories, alleviating the need for complex testing by LC/Q-TOF and enabling testing at an estimated cost of less than $5 per sample. Second, it demonstrated a large effect size from a limited number of compounds in the SHAP feature importance analysis. This increases the feasibility of adapting this diagnostic approach to a point-of-care device such as a portable mass spectrometer, though further work is being performed to determine the optimal number of biomarkers required for this purpose. Third, this study was based on a real-world, diverse population of individuals who were naturally infected with influenza, which may better approximate metabolic changes than experimental infection of previously healthy volunteers. Furthermore, this patient population was diverse, including children and adults in the inpatient and outpatient settings, and additionally included a high proportion of immunocompromised individuals. Also, this study was based on a newly adapted in-line, two-column LC arrangement that provides highly accurate results in a single injection. The standard of care in untargeted metabolomics is to perform a minimum of 4 runs (positive mode, negative mode, polar and non-polar), which increases imprecision and turnaround time, as well as the complexity of downstream analysis (Gertsman et al. J Inherit Metab Dis 2018; 41(3): 355-66). The in-line, two-column approach thus streamlined pre-analytical and analytical steps, providing simpler and more precise data for analysis.
Furthermore, cases and controls in the discovery cohort were tightly age- and sex-matched, thus reducing potential confounders in metabolomic analysis due to up- or downregulation of certain metabolic pathways based on these host factors (Srivastava. Metabolites 2019; 9(12)). In addition, this cohort included a large number of samples, conferring over 90% power to detect a difference between influenza-infected and uninfected individuals. Finally, a systematic and comprehensive bioinformatics pipeline analysis strategy was used to identify the best model for untargeted and targeted metabolomics data.

[000203] This study also has limitations, including the lack of a sample adequacy assessment. First, this proof-of-concept study was performed at a single institution, and it is unclear whether the results are generalizable to other patient populations. However, the consistent results observed across very diverse patient groups lend support to the potential generalizability of this diagnostic approach. Second, only influenza-positive and influenza-negative samples were compared in the untargeted approach, such that other respiratory viruses and bacterial or viral co-infections were not evaluated. However, limited coinfection data in the validation cohort supported maintained performance. Further study is being performed to better understand changes that occur across the spectrum of the nasopharyngeal microbiome, including bacterial colonization or coinfection, and to incorporate comparisons with other important respiratory viruses such as RSV and parainfluenza, to better compete with current molecular diagnostic methods. Third, this study did not assess the nasopharyngeal or respiratory metabolic profiles of healthy individuals as negative controls. Such a comparison may help further isolate the metabolites that change in response to acute viral infection, as opposed to any respiratory illness. Furthermore, sample adequacy was not assessed due to the proprietary nature of the internal control included in the commercial respiratory pathogen panel used for clinical testing. Fourth, repeat longitudinal samples were not collected from the same individuals, and paired plasma or urine samples were not included; such samples would have strengthened the findings for the identified metabolites, if reproducible. Finally, viral transport medium contains small molecules that may have confounded the analysis.
However, subtraction of background spectral data from the blank VTM sample replicates did not impact test performance of the model, suggesting these data did not significantly contribute to model classification.

[000204] In summary, this study demonstrated the feasibility and high accuracy of an untargeted metabolomics approach from nasopharyngeal samples for the identification of distinct metabolic signatures for the diagnosis of influenza infection. This approach requires simple sample processing and low sample volume, and is very inexpensive on a per-test basis. This approach maintained high performance after adaptation to simpler LC/MS-MS instruments. Further work is under way to leverage the full potential of this method, including expansion to other patient settings and larger cohorts, additional pathogens and sample types, and to prospectively assess its potential as a prognostic tool. In addition, this method could be studied as an adjunctive diagnostic tool and used to explore metabolic pathways that could eventually be harnessed for therapeutic potential.

[000205] For the discovery cohort, stored specimens collected from April 23, 2015 to October 13, 2019 were selected to achieve a 1:1 ratio of positive samples to age- and sex-matched negative controls. Age-matching was performed to the identical age, or to within 2 years if an identical age was not available.
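The matching rule above can be illustrated with a short Python sketch. This is a hypothetical illustration rather than the procedure actually used: it assumes a greedy pass over cases, each negative control used at most once, and a preference for the smallest age gap. The record fields (`id`, `age`, `sex`) and the function name `match_controls` are invented for this example.

```python
from typing import Dict, List, Optional, Tuple


def match_controls(cases: List[Dict], controls: List[Dict],
                   max_age_gap: int = 2) -> List[Tuple[str, Optional[str]]]:
    """Greedy 1:1 matching of influenza-positive cases to negative controls.

    Hypothetical helper: sex must match exactly; an identical age is preferred,
    otherwise the closest age within `max_age_gap` years is used. Each control
    is used at most once; a case with no eligible control is paired with None.
    """
    available = list(controls)  # copy so the caller's list is not modified
    pairs: List[Tuple[str, Optional[str]]] = []
    for case in cases:
        best: Optional[Dict] = None
        best_gap: Optional[int] = None
        for ctrl in available:
            if ctrl["sex"] != case["sex"]:
                continue
            gap = abs(ctrl["age"] - case["age"])
            if gap > max_age_gap:
                continue
            if best_gap is None or gap < best_gap:
                best, best_gap = ctrl, gap
                if gap == 0:
                    break  # identical age found: cannot do better
        if best is not None:
            available.remove(best)
        pairs.append((case["id"], best["id"] if best is not None else None))
    return pairs


# Toy records, not study data
cases = [{"id": "P1", "age": 34, "sex": "F"}, {"id": "P2", "age": 7, "sex": "M"}]
controls = [{"id": "N1", "age": 36, "sex": "F"},
            {"id": "N2", "age": 34, "sex": "F"},
            {"id": "N3", "age": 12, "sex": "M"}]
print(match_controls(cases, controls))  # [('P1', 'N2'), ('P2', None)]
```

A greedy pass is the simplest choice; an optimal assignment (for example, minimizing the total age gap across all pairs) could retain more cases when controls are scarce.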

[000206] Included were specimens from 96 children (2-17 years-old) and 140 adults (>18 years-old). These corresponded to 123 males and 113 females. Mixed infections and samples from other sites (e.g., oropharyngeal swab, bronchoalveolar lavage and lung tissue) were excluded. Individual retrospective chart review was performed for all subjects in the untargeted phase of the study to identify age, sex, immunocompromised status, comorbidities, disease severity, antiviral treatment and clinical outcomes. LC/Q-TOF testing was performed to generate raw data on mass-to-charge ratio and retention time for each sample tested.

[000207] For the validation cohort, negative and positive nasopharyngeal and nasal swab specimens collected from December 21, 2019 to February 18, 2020 were selected in a 1:1 ratio without exclusion. Samples were subsequently stored at -80°C until testing. Testing was performed at the Stanford Biochemical Genetics Laboratory on a validation set of 96 samples using LC/MS-MS. Of the individuals with available demographic data, there were 14 children and 80 adults, corresponding to 39 females and 55 males. Three individuals in the validation cohort had documented viral coinfection (seasonal coronavirus, RSV or CMV).

[000208] The research objective was to assess the diagnostic test performance of LC/Q-TOF (biomarker discovery cohort) and targeted analysis (validation cohort) for distinguishing influenza-infected from uninfected individuals, and to identify key metabolites for classification of these two groups. In both the discovery and validation cohorts, the target sample size was determined before the experiments to achieve over 90% power, based on an AUC of 0.925, to detect a difference in the primary outcome of influenza infection vs no infection. A secondary endpoint of influenza A vs influenza B was established in the study design phase and used as an exploratory endpoint. The target sample size was not changed during the study.
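A power calculation of this kind can be approximated with the Hanley-McNeil standard error of an AUC. The sketch below is an assumption about how such a calculation might look, not the calculation actually performed: it assumes equal numbers of positive and negative samples, a null AUC of 0.5, a two-sided alpha of 0.05 and a normal approximation. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUC estimate (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(var)


def auc_power(auc1: float, n_pos: int, n_neg: int,
              auc0: float = 0.5, alpha: float = 0.05) -> float:
    """Approximate power to show that the true AUC (auc1) differs from auc0."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-sided critical value
    se0 = hanley_mcneil_se(auc0, n_pos, n_neg)  # SE under the null hypothesis
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)  # SE under the alternative
    return z.cdf((abs(auc1 - auc0) - z_crit * se0) / se1)


def min_per_group(auc1: float, target_power: float = 0.9, **kwargs) -> int:
    """Smallest equal group size whose approximate power reaches target_power."""
    n = 2
    while auc_power(auc1, n, n, **kwargs) < target_power:
        n += 1
    return n


print(min_per_group(0.925))
```

Against a null AUC of 0.5 the detectable difference is large, so the minimum group size returned is small. A higher null AUC or a non-inferiority margin would require considerably larger cohorts.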

[000209] Study population and sample collection: Nasopharyngeal samples collected from adult patients at Stanford Health Care (SHC) and from children at the Lucile Packard Children's Hospital (LPCH) were processed per routine clinical procedures. Briefly, a flocked swab was inserted in the nasal passage, rotated for 10-15 seconds to collect cells and placed in viral transport medium (MicroTest M4RT, Remel Inc., San Diego, CA). Respiratory viral testing was performed on the ePlex Respiratory Pathogen (RP) panel (GenMark Diagnostics, Carlsbad, CA) at the Stanford Clinical Virology Laboratory. This automated qualitative nucleic acid amplification test (NAAT) identifies 15 viral targets, including influenza A, influenza H1N1 2009, influenza A H3 and influenza B. Specimens were aliquoted and stored at -80°C for subsequent LC-Q-TOF testing. For the primary validation and testing cohort, stored specimens collected from 2015-2019 were selected to achieve a 1:1 ratio of positive samples to age- and sex-matched negative controls. Age-matching was performed to the identical age, or to within 2 years if an identical age was not available. Specimens from children (2-17 years-old) and adults (>18 years-old) were included. Mixed infections were excluded. For the prospective cohort, consecutive negative and positive nasopharyngeal swab specimens were selected in a 1:1 ratio without exclusion. Individual retrospective chart review was performed for all subjects in the study to identify age, sex, immunocompromised status, comorbidities, disease severity, antiviral treatment and clinical outcomes.

[000210] For the discovery cohort, LC/Q-TOF testing was performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single replicate testing was performed, and outlier data points were included for analysis. For the validation cohort, LC/MS-MS testing was performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single replicate testing was performed, and outlier data points were included for analysis. This method served to confirm the results from the LC/Q-TOF analysis in a separate participant cohort.

[000211] Materials: The following LC-MS grade reagents were used for the experiments: methanol and formic acid (Fisher Scientific, Chino, CA), ammonium formate salt and high-purity ammonium hydroxide (25% v/v) (Sigma Aldrich, St. Louis, MO) and water (VWR, Visalia, CA). In addition, high-pressure liquid chromatography (HPLC) grade acetonitrile and isopropanol (VWR), and MS calibration and reference mass solutions (Agilent Technologies, Santa Clara, CA) were used. The Mass Spectrometry Metabolite Library of Standards (IROA Technologies, Boston, MA) was purchased to build the in-house reference database and was complemented by additional standards (Sigma-Aldrich).

[000212] LC-Q-TOF method: Liquid chromatography (LC) separation was performed on an Agilent 1290 Quaternary LC system (Agilent Technologies). In this unique chromatographic arrangement, two columns were used in-line: a reverse-phase (RP) column of 2.1 x 50 mm, 1.8 µm HSS T3 (Waters Corporation, Milford, MA) was placed first, followed by an ion exchange (IEX) column of 2.0 x 30 mm, 3 µm Intrada (Imtakt USA, Portland, OR). Both columns were joined with EXP2 fittings (Optimize Technologies, OR). The optimized mobile phases were A) 150 mg of ammonium formate per liter of water with 0.4% formic acid (v/v), B) 1.2 g of ammonium formate per liter of methanol with 0.2% formic acid, and C) water with 1% each of formic acid and ammonium hydroxide, as previously described (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2020; 1143: 122072). The flow rate was 0.5 mL/minute, the column temperature was 45°C and the injection volume was 5 µL, for a total run time of 20 minutes (inject-to-inject). MS was performed on an Agilent 6545 Q-TOF with dual Agilent JetStream electrospray ionization, as previously described (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2020; 1143: 122072). The instrument was operated in sensitivity mode with extended dynamic range and positive polarity, scanning from 50-1100 m/z.

[000213] LC/Q-TOF metabolite extraction and analysis: A volume of 100 µL of nasopharyngeal sample eluted in VTM was processed by ultrafiltration using Pall Omega 3 kDa centrifugal devices (VWR, Radnor, PA) at 4°C for 15 minutes at 17,000 x g. The filtrate was transferred to glass vials and analyzed, and each sample was run once. Two quality control (QC) samples, a pooled QC sample and an independent normalization QC, were used to assess for batch effect. The pooled QC was created by pooling equal-volume aliquots from all the samples included in the run. Unsupervised principal component analysis was performed to visually assess appropriate performance of the pooled QC. In addition, blank VTM was run in triplicate to generate a mean background spectral distribution. Progenesis QI software (Waters Corporation) was used for run alignment, peak picking (automatic, level 4), adduct deconvolution, and feature identification. Positive polarity analysis was performed using the adducts [M+H], [M+NH4] and [M+Na]. Metabolite identification was first performed using a previously developed authentic standard library (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2020; 1143: 122072). If there was no identification match, preliminary annotation was performed in Progenesis QI software using the HMDB (Wishart et al. Nucleic Acids Res 2018; 46(D1): D608-D17) and KEGG (Kanehisa et al. Nucleic Acids Res 2000; 28(1): 27-30) plug-ins, and by manual review in the NIST 20 MS/MS library and METLIN. A mass error setting of 30 ppm was used. Data were exported directly from Progenesis for machine learning analysis using peak area filters of 0; 5,000; 10,000 and 20,000 relative abundance values. Outlier values were not excluded.
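The blank-VTM background and the peak area filters described above can be sketched on an exported abundance matrix. This is an illustrative sketch under stated assumptions, not the Progenesis QI implementation: it assumes that background subtraction means subtracting the per-feature mean of the blank replicates and clipping negative values to zero, and that a peak area filter keeps an ion feature only if it reaches the threshold in at least one sample. The function names are hypothetical.

```python
import numpy as np


def subtract_blank(samples: np.ndarray, blanks: np.ndarray) -> np.ndarray:
    """Subtract the mean blank-VTM abundance of each ion feature; clip negatives to 0."""
    background = blanks.mean(axis=0)  # mean of the blank replicates, per feature
    return np.clip(samples - background, 0.0, None)


def peak_area_filter(samples: np.ndarray, threshold: float):
    """Keep ion features whose abundance reaches `threshold` in at least one sample."""
    keep = (samples >= threshold).any(axis=0)
    return samples[:, keep], keep


# Toy abundance matrix: rows = samples, columns = ion features (not study data)
samples = np.array([[12000.0, 300.0, 0.0, 8000.0],
                    [15000.0, 200.0, 0.0, 4000.0],
                    [9000.0, 100.0, 0.0, 26000.0]])
blanks = np.array([[1000.0, 150.0, 0.0, 500.0],   # blank VTM, run in triplicate
                   [1200.0, 150.0, 0.0, 700.0],
                   [800.0, 150.0, 0.0, 600.0]])

corrected = subtract_blank(samples, blanks)
for threshold in (0, 5_000, 10_000, 20_000):
    filtered, keep = peak_area_filter(corrected, threshold)
    print(threshold, keep.tolist(), filtered.shape)
```

Each threshold produces a separate feature matrix, so model performance can be compared across the four filter settings.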

[000214] LC-MS/MS Targeted methods: In some embodiments, the targeted analysis was performed using a clinically validated method that detects pyroglutamic acid (Mak et al. Methods Mol Biol. 2019;2030:85-109; and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;944:166-74). Mass spectrometry was performed on an Agilent 6460 Triple Quadrupole mass spectrometer equipped with Agilent JetStream electrospray ionization, as previously described (Mak et al. Methods Mol Biol. 2019;2030:85-109; and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;944:166-74). Selected reaction monitoring (SRM) pairs based on the important ion features were added to the method (Table 7). Liquid chromatography separation was performed on a two-dimensional Agilent 12002x Binary LC system (Agilent Technologies), as previously described (Mak et al. Methods Mol Biol. 2019;2030:85-109; and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;944:166-74). The two columns were connected using a 10-port switching valve (Rheodyne). First-dimension separation used a Thermo Hypercarb column, 3 x 50 mm, 3 µm (Thermo, UK). Second-dimension separation used a Waters BEH C18 column, 2.1 x 100 mm, 2.5 µm (Waters Corporation). Mobile phase A, 0.03% perfluoroheptanoic acid in water, and mobile phase B, acetonitrile, were identical for both pumps 1 and 2. The data were acquired using MassHunter Workstation Acquisition version B.08.02 (Agilent) and exported for ML analysis.

[000215] LC-MS/MS Targeted methods: In some embodiments, the targeted analysis was performed using the same method as described for LC/Q-TOF above, but adapted for LC/MS-MS with SRM. Mass spectrometry was performed on a Waters Xevo TQ-XS mass spectrometer equipped with electrospray ionization. Ion features that were not detected by this method in the pooled sample tested on the Xevo TQ-XS were removed from the SRM pairs. Liquid chromatography separation was performed on a Waters Acquity H-class quaternary LC system (Waters Corp.). The two-column arrangement described for LC/Q-TOF was replicated. This included a reverse-phase (RP) column of 2.1 x 50 mm, 1.8 µm HSS T3 (Waters Corp), followed by an ion exchange (IEX) column of 2.0 x 30 mm, 3 µm Intrada (Imtakt). Both columns were connected using PEEK tubing. The mobile phases were A) 150 mg of ammonium formate per liter of water with 0.4% formic acid (v/v), B) 1.2 g of ammonium formate per liter of methanol with 0.2% formic acid, and D) water with 1% each of formic acid and ammonium hydroxide.

[000216] LC-MS/MS Metabolite Extraction and Analysis: A volume of 100 µL of respiratory specimen eluted in VTM or phosphate buffered saline (PBS), together with 10 µL of pyroglutamic acid-D5 0.025nm/L as internal standard (Cambridge Isotope Laboratories, Inc, Tewksbury, MA), was processed by ultrafiltration using Pall Omega 3 kDa centrifugal devices (VWR, Radnor, PA) at 4°C for 15 minutes at 17,000 x g. The filtrate was transferred to glass vials and analyzed. The data were acquired using MassLynx version 4.2 (Waters Corp).

[000217] Statistical Analysis: Descriptive analysis was performed using the Chi-squared test (categorical variables with 5 or more observations per cell), Fisher's exact test (categorical variables with fewer than 5 observations per cell) and the Mann-Whitney U test (continuous variables), using Stata v15.1 (Stata Corp, College Station, TX). Missing data are identified as unknown. A two-sided p value of <0.05 was considered significant.
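The test-selection rule above can be mirrored with SciPy. This is a sketch, not the Stata analysis actually run: it applies the per-cell rule to observed counts as worded in the text (many analysts apply it to expected counts instead), uses the Pearson chi-squared statistic without Yates' continuity correction, and limits Fisher's exact test to 2x2 tables. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_categorical(table):
    """Pearson chi-squared if every cell count is >= 5, else Fisher's exact test.

    Returns (test name, two-sided p value). Fisher's exact test here assumes
    a 2x2 table.
    """
    table = np.asarray(table)
    if (table >= 5).all():
        # correction=False: plain Pearson chi-squared, no Yates continuity correction
        _, p, _, _ = stats.chi2_contingency(table, correction=False)
        return "chi2", float(p)
    _, p = stats.fisher_exact(table)
    return "fisher", float(p)


def compare_continuous(x, y):
    """Two-sided Mann-Whitney U test for a continuous variable in two groups."""
    return float(stats.mannwhitneyu(x, y, alternative="two-sided").pvalue)


# Toy counts and ages, not study data; p < 0.05 would be called significant
print(compare_categorical([[30, 25], [28, 27]]))  # all cells >= 5 -> chi-squared
print(compare_categorical([[3, 10], [9, 12]]))    # a sparse cell -> Fisher
print(compare_continuous([34, 41, 56, 62, 29], [8, 12, 45, 38, 15]))
```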

[000218] Machine Learning Analysis

[000219] Machine learning methods were developed for the task of determining whether a sample was positive or negative for influenza based on its metabolic profile. Machine learning is a class of techniques that uses data to learn a model that maps an input (the metabolic profile of a sample; includes mass-to-charge ratio (m/z) and retention time for each sample) to its associated output (the influenza infection outcome of the sample), and uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (the influenza outcomes of new samples). Two machine learning methods were implemented: gradient boosted decision trees and random forests.

[000220] Gradient boosted decision trees and random forests are both ensemble learning methods that improve upon the performance of decision tree models. Decision tree learners construct a model by iteratively identifying which feature most effectively divides the data into groups with low within-group variation in the outcome and high between-group variation in outcome, and then repeat the process within each group. Gradient boosted decision trees (GBDT) construct several decision trees such that each tree learns from the errors of the prior tree (Ke et al. Neural Information Processing Systems Foundations 2017). Random forests (RFs) construct several decision trees such that each tree is constructed using different subsets of the data. The machine learning approaches of GBDT and RF were chosen over alternative machine learning methods because they can handle mixes of categorical and continuous covariates, capture nonlinear relationships, and scale well to large amounts of data.
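The split search that a decision tree learner repeats at each node can be sketched in a few lines. This toy uses Gini impurity as the measure of within-group variation in a binary outcome. Gini impurity is the default criterion for scikit-learn classification trees, while LightGBM scores candidate splits with a gradient-based gain, so this is an illustration of the idea rather than either library's code. The function names and data are hypothetical.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (0 = influenza-negative, 1 = positive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(X: List[List[float]], y: List[int]) -> Tuple[Optional[int], Optional[float], float]:
    """Return (feature index, threshold, weighted child impurity) of the best single split."""
    n = len(y)
    best: Tuple[Optional[int], Optional[float], float] = (None, None, gini(y))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Toy data: feature 0 is lower in positive samples (not study data)
X = [[5.0, 1.0], [4.0, 3.0], [1.0, 2.0], [0.5, 4.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 1.0, 0.0): splitting feature 0 at 1.0 separates the classes
```

A full tree applies `best_split` recursively to each child group. A random forest grows many such trees on bootstrap samples with random feature subsets, and gradient boosting fits each new tree to the errors of the current ensemble.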

[000221] Dataset Splitting: Ion features showing zero values across all samples tested were removed from the dataset. The remaining dataset, without normalization, was partitioned into a training set used to develop machine learning models and a holdout test set used to evaluate the predictive performance of the machine learning models. The partitioning was random, such that 80% of the samples were included in the training set and the other 20% in the test set. There was no overlap of samples or patients between the two sets.

[000222] All models were developed on the training set, and their final performance was reported on the holdout test set and/or the prospective cohort. Within the training set, cross-validation was used to develop the models and to avoid overfitting to the training set. In the cross-validation procedure, the training dataset was randomly partitioned into k=4 equal-sized subsamples, each consisting of an approximately equal percentage of each class. Of the k subsamples, a single subsample was retained as the validation data for the model, and the remaining k - 1 subsamples were used to train a model. The cross-validation process was then repeated k times, with each of the k subsamples used exactly once as the validation data. Grid search was used to find the best set of hyperparameters for model training; the same hyperparameter settings were used across all k folds. The resulting k models (one from each fold) were used to make k sets of predictions on the test set, which were then averaged using a simple mean to make the final prediction for each sample in the test set.
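The splitting and cross-validation strategy described above can be sketched with scikit-learn. This sketch makes several assumptions. It uses one sample per patient, so a plain random split keeps patients disjoint; otherwise a grouped splitter such as `GroupShuffleSplit` would be needed. It stratifies the 80/20 split, selects hyperparameters by mean validation AUC, and uses a random forest as the example model. The helper names `drop_all_zero_features`, `fit_fold_ensemble` and `predict_mean` are hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid, StratifiedKFold, train_test_split


def drop_all_zero_features(X: np.ndarray) -> np.ndarray:
    """Remove ion features that are zero in every sample."""
    return X[:, ~(X == 0).all(axis=0)]


def fit_fold_ensemble(X_train, y_train, param_grid, k=4, seed=0):
    """Grid search over stratified k-fold CV with the same hyperparameters in
    every fold; returns the k fold models of the best setting and its mean AUC."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    folds = list(skf.split(X_train, y_train))
    best_auc, best_models = -np.inf, None
    for params in ParameterGrid(param_grid):
        models, aucs = [], []
        for tr, va in folds:
            m = RandomForestClassifier(random_state=seed, **params)
            m.fit(X_train[tr], y_train[tr])
            aucs.append(roc_auc_score(y_train[va], m.predict_proba(X_train[va])[:, 1]))
            models.append(m)
        if np.mean(aucs) > best_auc:
            best_auc, best_models = float(np.mean(aucs)), models
    return best_models, best_auc


def predict_mean(models, X):
    """Final score: simple mean of the k fold models' predicted probabilities."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic data standing in for an abundance matrix (not study data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
X[:, 5] = 0.0  # an ion feature that is zero in every sample
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
X = drop_all_zero_features(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
models, cv_auc = fit_fold_ensemble(X_tr, y_tr, {"n_estimators": [100], "max_depth": [3, None]})
test_auc = roc_auc_score(y_te, predict_mean(models, X_te))
print(round(cv_auc, 3), round(test_auc, 3))
```

Averaging the k fold models avoids a separate refit on the full training set, and the holdout test set is touched only once, after hyperparameters are fixed.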

[000223] Machine Learning Methods vs Traditional Linear Models: To determine the usefulness of capturing non-linear relationships with machine learning models, two machine learning methods, gradient boosted decision trees and random forests, were compared with two traditional linear models, least absolute shrinkage and selection operator (LASSO) and Ridge. These models are variants of logistic regression, a statistical model that uses the logistic function to model the outcome assuming a linear relationship between the features and the outcome. LASSO makes the same linear assumption but alters the model fitting process to select only a subset of the features for use in the final model rather than using all of them. Unlike LASSO, Ridge does not produce a sparse model, but instead addresses multicollinearity in the features by shrinking the weights assigned to correlated variables. The training and test sets and the cross-validation strategy were identical across the machine learning models and the traditional linear models.
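The difference between the two linear baselines can be seen directly in the fitted coefficients. The sketch below fits L1-penalized (LASSO) and L2-penalized (Ridge) logistic regression with scikit-learn's `LogisticRegression` on synthetic data. The regularization strength `C=0.05` and the `liblinear` solver are arbitrary choices for illustration, not the settings used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: only features 0 and 1 carry signal (not study data)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)
X = StandardScaler().fit_transform(X)  # penalties assume comparable feature scales

# LASSO-style logistic regression: the L1 penalty drives many weights to exactly 0
lasso = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X, y)
# Ridge-style logistic regression: the L2 penalty shrinks weights but keeps all of them
ridge = LogisticRegression(penalty="l2", C=0.05, solver="liblinear").fit(X, y)

n_lasso = int(np.count_nonzero(lasso.coef_))
n_ridge = int(np.count_nonzero(ridge.coef_))
print(f"LASSO keeps {n_lasso} of 20 features; Ridge keeps {n_ridge}")
```

In practice `C` would be tuned by the same grid search and cross-validation used for the tree models.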

[000224] Feature Importance: The SHAP (SHapley Additive exPlanations) method was used to quantify the impact of each feature on the models. The method explains predictions by allocating credit among the input features; feature credit is calculated using Shapley values (Lundberg et al. Nat Biomed Eng 2018; 2(10): 749-60; and Lundberg. Neural Information Processing Systems Foundations 2017; 30) as the change in the expected value of the model's prediction when a feature is observed versus unknown. To uncover clinically important metabolite ion features that were globally predictive of the outcome, the Shapley values of the top 20 ion features across individual predictions were aggregated and reported along with their averaged absolute Shapley contributions as a percent of the contributions of all features.
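The aggregation step can be written as a small function over a per-prediction SHAP matrix. This is a sketch under stated assumptions. It assumes an `(n_samples, n_features)` array of Shapley contributions for the positive class, which could be obtained from `shap.TreeExplainer(model).shap_values(X)`. Note that some SHAP versions return one array per class for binary classifiers, in which case the positive-class array would be used. The function name `shap_importance` is hypothetical, and the matrix below is toy data labelled with ion features named in the text.

```python
import numpy as np


def shap_importance(shap_values, feature_names, top_n=20):
    """Rank features by mean |SHAP| and express each as a percent of the total.

    `shap_values`: (n_samples, n_features) per-prediction Shapley contributions.
    """
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    percent = 100.0 * mean_abs / mean_abs.sum()
    order = np.argsort(percent)[::-1][:top_n]  # largest contribution first
    return [(feature_names[i], float(percent[i])) for i in order]


# Toy SHAP matrix (3 predictions x 3 ion features), not study data
S = np.array([[-0.4, 0.1, 0.0],
              [0.6, -0.1, 0.05],
              [-0.5, 0.1, -0.05]])
names = ["130.0507@0.81", "84.0447@0.81", "other"]
for name, pct in shap_importance(S, names, top_n=3):
    print(f"{name}: {pct:.1f}%")
```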

[000225] Parsimonious Model: A set of parsimonious models was developed to use a small subset of the features identified as important by the feature importance method. The top k features with the highest overall importance to the machine learning models were used, with k values of 1, 3, 5 and 7. For each choice of k, a single decision tree model was trained using the previously described cross-validation strategy to build the parsimonious model. Maximum depth was restricted to k, and additional hyperparameters were optimized using grid search during cross-validation. The performance of the parsimonious models was compared to that of the full models.
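A parsimonious model of this kind can be sketched with scikit-learn's `DecisionTreeClassifier` and `GridSearchCV`. This sketch simplifies the procedure described above. `GridSearchCV` refits one tree on the full training set rather than averaging the k fold models. The grid over `min_samples_leaf` and `criterion` is an assumption, since the text does not list which additional hyperparameters were tuned. The importance vector and data are synthetic, and `parsimonious_tree` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def parsimonious_tree(X, y, importance, k, seed=0):
    """Single decision tree on the k most important features, depth capped at k."""
    top = np.argsort(importance)[::-1][:k]  # indices of the k most important features
    search = GridSearchCV(
        DecisionTreeClassifier(max_depth=k, random_state=seed),
        {"min_samples_leaf": [1, 5, 10], "criterion": ["gini", "entropy"]},
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=seed),
    )
    search.fit(X[:, top], y)
    return top, search.best_estimator_, search.best_score_


# Synthetic data where feature 3 drives the label (not study data)
rng = np.random.default_rng(2)
X = rng.normal(size=(160, 10))
y = (X[:, 3] < 0).astype(int)
importance = np.zeros(10)
importance[3], importance[7] = 0.9, 0.05  # e.g. mean |SHAP| from the full model

results = {k: parsimonious_tree(X, y, importance, k) for k in (1, 3, 5, 7)}
for k, (top, tree, cv_auc) in results.items():
    print(k, top.tolist(), round(cv_auc, 3), tree.get_depth())
```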

[000226] A subgroup analysis was used to evaluate variation in model performance across patient subpopulations. An LGBM model was trained on the discovery training set using the previously described cross-validation strategy, and this model was used to generate predictions on the discovery test set. The test samples were then split into disjoint subpopulations, and the AUC and its confidence interval, computed using DeLong's method, were reported for each subgroup. The following subgroups were investigated: adult vs pediatric individuals, immunocompromised vs not, ICU-admitted vs not, antibiotic-treated vs not, bacterial coinfection vs not, and time since symptom onset at the time of respiratory viral testing (<7 days vs >7 days).
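DeLong's variance estimate for a single AUC uses the per-sample "structural components" of the Mann-Whitney statistic. The sketch below computes the AUC with a normal-approximation 95% confidence interval, clipped to [0, 1], and applies it within each subgroup. It uses a simple O(m x n) pairwise comparison, which is adequate for test sets of this size. The function name `delong_auc_ci` and the toy scores are hypothetical, and each subgroup must contain at least two positive and two negative samples.

```python
from math import sqrt

import numpy as np


def delong_auc_ci(y_true, scores, z=1.959964):
    """AUC with a DeLong (1988) standard error and a 95% normal-approximation CI."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    m, n = len(pos), len(neg)
    # psi(x, y) = 1 if the positive scores higher, 0.5 if tied, 0 otherwise
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    auc = psi.mean()
    v10 = psi.mean(axis=1)  # structural components, one per positive sample
    v01 = psi.mean(axis=0)  # structural components, one per negative sample
    se = sqrt(np.var(v10, ddof=1) / m + np.var(v01, ddof=1) / n)
    return float(auc), max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy test-set predictions split into disjoint subgroups (not study data)
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.4, 0.5, 0.2, 0.1, 0.7, 0.6, 0.3, 0.65, 0.2, 0.9])
groups = np.array(["adult"] * 6 + ["peds"] * 6)
res = {g: delong_auc_ci(y[groups == g], scores[groups == g]) for g in np.unique(groups)}
for g, (auc, lo, hi) in res.items():
    print(g, round(auc, 3), (round(lo, 3), round(hi, 3)))
```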

[000227] A multivariable analysis was used to investigate the significance of potential confounders. A model was trained, and predictions on the discovery test set were generated, using the previously described methods. An additional logistic regression was then performed on the true label, with predictors comprising predicted score, age, sex, number of days since symptom onset, Charlson comorbidity index score, and hospitalization status. The significance of each predictor was determined using its p-value from this regression.
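The confounder regression can be sketched with NumPy alone. The sketch fits an unpenalized logistic regression by Newton-Raphson and reports two-sided Wald p-values, one common way to obtain a p-value for each predictor. The source does not state which software or test produced its p-values, so this is an assumption. The synthetic predictors only mimic the covariates named above, and `logistic_wald` is a hypothetical name.

```python
from math import erfc, sqrt

import numpy as np


def logistic_wald(X, y, names, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression (Newton-Raphson) with Wald-test p-values.

    Returns {name: (coefficient, standard error, two-sided p value)}.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))))
    pvals = [erfc(abs(b / s) / sqrt(2.0)) for b, s in zip(beta, se)]  # 2 * (1 - Phi(|z|))
    return dict(zip(["intercept"] + list(names), zip(beta, se, pvals)))


# Synthetic cohort: only the model score drives the label (not study data)
rng = np.random.default_rng(3)
n = 300
score = rng.uniform(size=n)
age = rng.uniform(2, 90, size=n)
sex = rng.integers(0, 2, size=n)
days = rng.integers(0, 14, size=n)
charlson = rng.integers(0, 8, size=n)
hosp = rng.integers(0, 2, size=n)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-6 * (score - 0.5)))).astype(int)
res = logistic_wald(np.column_stack([score, age, sex, days, charlson, hosp]), y,
                    ["score", "age", "sex", "days_since_onset", "charlson", "hospitalized"])
for name, (b, se, p) in res.items():
    print(f"{name}: coef={b:.3f} se={se:.3f} p={p:.3g}")
```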

[000228] The performance of the models was evaluated on the holdout test set. The primary measure of model performance was the area under the receiver operating characteristic curve (AUC), which illustrates the diagnostic discriminative performance of the models. Performance measures for the models also included sensitivity, specificity, and accuracy at a high-sensitivity operating point used to binarize the model predictions. In some embodiments, the high-sensitivity operating point was selected on the training set by aggregating the predictions on the k validation folds and then picking the threshold that produced a model sensitivity closest to 0.9. In some embodiments, the high-sensitivity operating point was obtained by selecting an operating point on each of the k validation folds and averaging them: on each validation fold, the operating point that maximized Youden's J statistic while producing a sensitivity of at least 0.9 was selected. To assess the variability in estimates, 95% Wilson score confidence intervals were provided for sensitivity, specificity, and accuracy, and 95% DeLong confidence intervals were provided for AUC (DeLong et al. Biometrics. 1988;44(3):837-45).
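Both operating-point rules and the Wilson interval can be sketched as follows. This is a minimal sketch: candidate thresholds are assumed to be the distinct predicted scores, a sample is called positive when its score is at or above the threshold, and ties in the "closest sensitivity" rule go to the lowest threshold. The function names and toy scores are hypothetical.

```python
from math import sqrt

import numpy as np


def sens_spec(y, s, t):
    """Sensitivity and specificity when scores >= t are called positive."""
    pred = s >= t
    sens = (pred & (y == 1)).sum() / (y == 1).sum()
    spec = (~pred & (y == 0)).sum() / (y == 0).sum()
    return sens, spec


def threshold_closest_sensitivity(y, s, target=0.9):
    """Rule 1: on pooled validation-fold predictions, sensitivity closest to target."""
    return min(np.unique(s), key=lambda t: abs(sens_spec(y, s, t)[0] - target))


def threshold_youden(y, s, min_sens=0.9):
    """Rule 2 (one fold): maximize Youden's J = sens + spec - 1 with sens >= min_sens."""
    best_t, best_j = None, -np.inf
    for t in np.unique(s):
        se, sp = sens_spec(y, s, t)
        if se >= min_sens and se + sp - 1 > best_j:
            best_t, best_j = t, se + sp - 1
    return best_t


def wilson_ci(successes, n, z=1.959964):
    """95% Wilson score interval for a proportion such as sensitivity."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Toy validation-fold predictions (not study data)
y = np.array([1] * 10 + [0] * 10)
s = np.array([0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.3,
              0.5, 0.45, 0.4, 0.35, 0.25, 0.2, 0.15, 0.1, 0.05, 0.62])
t1 = threshold_closest_sensitivity(y, s)
t2 = threshold_youden(y, s)
# Rule 2 would average the per-fold thresholds, e.g. (hypothetical `folds` list)
# t2 = float(np.mean([threshold_youden(y_f, s_f) for y_f, s_f in folds]))
tp = int(((s >= t2) & (y == 1)).sum())
print(t1, t2, sens_spec(y, s, t2), wilson_ci(tp, int((y == 1).sum())))
```

Rule 2 favors the highest threshold that still meets the sensitivity floor, which here yields better specificity than Rule 1 at the same sensitivity.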

[000229] Analyses were performed in Python version 3.6.8, using the LightGBM v2.2.3 implementation for gradient boosted decision trees, scikit-learn v0.20.2 for random forests, stratified k-fold cross-validation and grid search (DeLong et al. Biometrics 44, 837-845 (1988)), SHAP (SHapley Additive exPlanations) v0.29.1 for computing feature importance, and R version 3.5.0 for statistical analysis.

[000230] Table 2. Baseline demographic characteristics of all patients in the untargeted phase (i.e., the biomarker discovery (LC/Q-TOF) phase) of the study.

[000231] * The p values were calculated by Fisher's exact test for categorical variables with fewer than 5 datapoints per cell, and by the Mann-Whitney U test for continuous variables.

[000232] ED: emergency department; ICU: intensive care unit; IQR: inter-quartile range; SD: standard deviation; yo: years-old.

[000233] Table 3. Metabolite annotation of the top compounds associated with differentiation of influenza-positive from negative samples.

Da: daltons; m/z: mass-to-charge ratio.

[000234] Table 4. Overall sensitivity and specificity values for the four machine learning and statistical models. LGBM: LightGBM; RF: random forests.

[000235] Table 5. Subgroup analyses of AUC for adult vs pediatric, immunocompromised vs non-immunocompromised, and ICU-admitted vs non-ICU-admitted individuals, presence vs absence of bacterial coinfection or colonization, antibiotic treatment vs no antibiotic treatment, and time since symptom onset. Bacterial coinfection or colonization was defined as a positive respiratory culture or a positive molecular test for a bacterial pathogen within 7 days of the index respiratory viral testing. The number (n) corresponds to the size of the test set.

[000236] ATBx: antibiotic; AUC: area under the receiver operating characteristic curve; CI: confidence interval; coinfx: coinfection; d: days; IC: immunocompromised; ICU: intensive care unit; LGBM: LightGBM; Peds: pediatrics; RF: random forests.
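The subgroup AUCs in Table 5 rest on the fact that the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, i.e. the Mann-Whitney U statistic divided by the number of positive-negative pairs. The sketch below computes it within each subgroup; the function names and subgroup labels are hypothetical, and DeLong confidence intervals are not shown.

```python
def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as
    1/2; equivalent to the Mann-Whitney U statistic / (n_pos * n_neg)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def subgroup_aucs(y_true, scores, groups):
    """Compute the AUC separately within each subgroup label (e.g. 'adult' vs
    'peds') and return {group: (n, auc)}, where n is the subgroup size."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        ys = [y_true[i] for i in idx]
        ss = [scores[i] for i in idx]
        result[g] = (len(idx), auc_mann_whitney(ys, ss))
    return result
```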

[000237] Table 6. Multivariable linear regression model for influenza status prediction, adjusted for age, sex, Charlson comorbidity score, number of days since symptom onset, and machine learning model output. Only the model output was observed to be significantly associated with influenza status. CI: confidence interval.

[000238] Table 7. Selected multiple reaction monitoring (SRM) pairs added to the LC-MS/MS analysis. Compounds are listed by name or by accurate mass @ retention time.

[000239] Table 8. Key resources table.

Example 3 - Novel metabolomics method for the diagnosis of SARS-CoV-2 infection and/or COVID-19 disease

[000240] The research objective is to assess the diagnostic test performance of LC/Q-TOF analysis (biomarker discovery cohort) and targeted analysis (validation cohort) for distinguishing SARS-CoV-2-infected from uninfected individuals, and to identify key metabolites for classifying these two groups. In both the discovery and validation cohorts, the target sample size is determined before the experiments to achieve more than 90% power, based on an AUC of 0.925, to detect a difference in the primary outcome of SARS-CoV-2 infection vs no infection.
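The text does not state how the power calculation was carried out. One common approach, sketched below under explicit assumptions, uses the Hanley and McNeil (1982) variance approximation for an AUC estimate and a two-sided z-test at alpha = 0.05 of the alternative AUC (0.925) against a null AUC of 0.5, with equal numbers of positives and negatives to mirror the 1:1 design. The null AUC, alpha, and function names are assumptions, not values from this disclosure; Python 3.8+ is needed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_variance(auc, n_pos, n_neg):
    """Hanley & McNeil approximation to the variance of an AUC estimate."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    return (auc * (1 - auc)
            + (n_pos - 1) * (q1 - auc * auc)
            + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)


def n_per_group_for_auc(auc_alt, auc_null=0.5, alpha=0.05, power=0.9, max_n=10_000):
    """Smallest n per group (1:1 positives:negatives) at which a two-sided
    z-test of AUC = auc_null vs AUC = auc_alt reaches the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    for n in range(2, max_n + 1):
        sd_null = sqrt(hanley_mcneil_variance(auc_null, n, n))
        sd_alt = sqrt(hanley_mcneil_variance(auc_alt, n, n))
        if z_alpha * sd_null + z_beta * sd_alt <= abs(auc_alt - auc_null):
            return n
    raise ValueError("required sample size exceeds max_n")
```

Under these assumptions `n_per_group_for_auc(0.925)` returns 7 per group; a different null AUC, a one-sided test, or an allowance for subgroup analyses would change the result.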

[000241] Nasopharyngeal samples collected from adult patients and children are processed per routine clinical procedures. Briefly, a flocked swab is inserted into the nasal passage, rotated for 10-15 seconds to collect cells, and placed in viral transport medium (MicroTest M4RT, Remel Inc., San Diego, CA). Specimens are aliquoted and stored at -80°C for subsequent LC/Q-TOF testing.

[000242] For the discovery cohort, stored specimens are selected to achieve a 1:1 ratio of positive specimens to age- and sex-matched negative controls. Controls are matched to the identical age or, if none is available, to within 2 years. Individual retrospective chart review is performed for all subjects in the untargeted phase of the study to identify age, sex, immunocompromised status, comorbidities, disease severity, antiviral treatment, and clinical outcomes. LC/Q-TOF testing is performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single replicate testing is performed, and outlier data points are included for analysis. For the validation cohort, negative and positive nasopharyngeal and nasal swab specimens are selected in a 1:1 ratio without exclusion. Samples are subsequently stored at -80°C until testing. LC-MS/MS testing is performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single replicate testing is performed, and outlier data points are included for analysis.
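The age- and sex-matching rule for the discovery cohort (identical age, otherwise within 2 years) could be implemented as a greedy 1:1 selection. The sketch below is one hypothetical way to do it; the `Specimen` record, the case-by-case greedy order, and the preference for the closest-age control are assumptions, since the text specifies only the matching criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    """Hypothetical record for a stored specimen."""
    specimen_id: str
    age: int   # years
    sex: str   # e.g. "F" or "M"


def match_controls(cases, controls, max_age_gap=2):
    """Greedy 1:1 matching: each case gets an unused control of the same sex,
    preferring the identical age and otherwise the closest age within
    `max_age_gap` years. Cases with no eligible control stay unmatched."""
    available = list(controls)
    pairs, unmatched = [], []
    for case in cases:
        eligible = [c for c in available
                    if c.sex == case.sex and abs(c.age - case.age) <= max_age_gap]
        if not eligible:
            unmatched.append(case)
            continue
        # An age gap of 0 (identical age) always wins when available.
        best = min(eligible, key=lambda c: abs(c.age - case.age))
        available.remove(best)
        pairs.append((case, best))
    return pairs, unmatched
```

Because the matching is greedy, processing cases in a different order can change which controls are paired; an optimal assignment method would remove that order dependence.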

[000243] The following methods and analysis are performed as described in Example 2: LC/Q-TOF methods, LC/Q-TOF metabolite extraction and analysis, LC-MS/MS targeted methods, LC-MS/MS metabolite extraction and analysis, statistical analysis, machine learning analysis, dataset splitting, comparing machine learning methods vs traditional linear models, determining feature importance, subgroup analysis, and multivariable analysis.

Equivalents

[000244] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this technology belongs.

[000245] The present technology illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising,” “including,” “containing,” etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the present technology claimed.

[000246] Thus, it should be understood that the materials, methods, and examples provided here are representative of preferred aspects, are exemplary, and are not intended as limitations on the scope of the present technology.

[000247] It should be understood that although the present invention has been specifically disclosed by certain aspects, embodiments, and optional features, modification, improvement and variation of such aspects, embodiments, and optional features can be resorted to by those skilled in the art, and that such modifications, improvements and variations are considered to be within the scope of this disclosure.

[000248] The present technology has been described broadly and generically herein. Each of the narrower species and sub-generic groupings falling within the generic disclosure also forms part of the present technology. This includes the generic description of the present technology with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.

[000249] It should be noted that although the diagrams herein may show a specific order and composition of method steps, it is understood that the order of these steps may differ from what is depicted. For example, two or more steps may be performed concurrently or with partial concurrence. Also, some method steps that are performed as discrete steps may be combined, steps being performed as a combined step may be separated into discrete steps, the sequence of certain processes may be reversed or otherwise varied, and the nature or number of discrete processes may be altered or varied. The order or sequence of any element or apparatus may be varied or substituted according to alternative embodiments. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined in the claims. Such variations will depend on the machine-readable media and hardware systems chosen and on designer choice. It is understood that all such variations are within the scope of the disclosure. Likewise, software and web implementations of the present disclosure may be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps.

[000250] The embodiments described herein have been described with reference to drawings. The drawings illustrate certain details of specific embodiments that provide the systems, methods and programs described herein. However, describing the embodiments with drawings should not be construed as imposing on the disclosure any limitations that may be present in the drawings.

[000252] In addition, where features or aspects of the present technology are described in terms of Markush groups, those skilled in the art will recognize that the present technology is also thereby described in terms of any individual member or subgroup of members of the Markush group.

[000253] All publications, patent applications, patents, and other references mentioned herein are expressly incorporated by reference in their entirety, to the same extent as if each were incorporated by reference individually. In case of conflict, the present specification, including definitions, will control.

[000254] Other aspects are set forth within the following claims.
