Title:
VIROLOGICAL AND MOLECULAR SURROGATES OF RESPONSE TO SARS-COV-2 NEUTRALIZING ANTIBODY SOTROVIMAB
Document Type and Number:
WIPO Patent Application WO/2023/086635
Kind Code:
A1
Abstract:
Methods involve deployment of a biomarker panel for classifying patients as responders/non-responders to a SARS-CoV-2 therapeutic (e.g., Sotrovimab) and/or classifying patients as at risk or not at risk for severe infectious disease. Using whole transcriptome data, example biomarker panels are deployed to analyze expression values of certain genes. Such biomarker panels outperform quantifiable clinical laboratory markers (e.g., lymphocytes or neutrophil-lymphocyte ratio), thereby indicating that the systemic immune response, as evidenced by expression values of biomarkers, provides powerful predictive insights.

Inventors:
MAHER MICHAEL CYRUS RILEY (US)
SORIAGA LEAH B (US)
TELENTI AMALIO (US)
Application Number:
PCT/US2022/049833
Publication Date:
May 19, 2023
Filing Date:
November 14, 2022
Assignee:
VIR BIOTECHNOLOGY INC (US)
International Classes:
C12Q1/6886
Foreign References:
JP2016165286A, 2016-09-15
US11168128B2, 2021-11-09
US6210891B1, 2001-04-03
US6258568B1, 2001-07-10
US6833246B2, 2004-12-21
US7115400B1, 2006-10-03
US6969488B2, 2005-11-29
Other References:
LÉVY YVES ET AL: "CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death", ISCIENCE, vol. 24, no. 7, 10 June 2021 (2021-06-10), XP055902649
LIU JING ET AL: "Longitudinal characteristics of lymphocyte responses and cytokine profiles in the peripheral blood of SARS-CoV-2 infected patients", EBIOMEDICINE, vol. 55, 18 April 2020 (2020-04-18), NL, pages 102763, XP055833989, ISSN: 2352-3964, DOI: 10.1016/j.ebiom.2020.102763
LUCAS CAROLINA ET AL: "Longitudinal analyses reveal immunological misfiring in severe COVID-19", NATURE, NATURE PUBLISHING GROUP UK, LONDON, vol. 584, no. 7821, 27 July 2020 (2020-07-27), pages 463 - 469, XP037223596, ISSN: 0028-0836, [retrieved on 20200727], DOI: 10.1038/S41586-020-2588-Y
SCHULTE-SCHREPPING JONAS ET AL: "Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment", CELL, vol. 182, no. 6, 1 September 2020 (2020-09-01), Amsterdam NL, pages 1419 - 1440.e23, XP055802826, ISSN: 0092-8674, Retrieved from the Internet DOI: 10.1016/j.cell.2020.08.001
PATEL HAMEL ET AL: "Proteomic blood profiling in mild, severe and critical COVID-19 patients", MEDRXIV, 23 June 2020 (2020-06-23), XP055859782, Retrieved from the Internet [retrieved on 20211110], DOI: 10.1101/2020.06.22.20137216
ALLEGRA ALESSANDRO ET AL: "Immunopathology of SARS-CoV-2 Infection: Immune Cells and Mediators, Prognostic Factors, and Immune-Therapeutic Implications", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 21, no. 13, 6 July 2020 (2020-07-06), pages 4782, XP055859367, DOI: 10.3390/ijms21134782
ALLEGRA ALESSANDRO ET AL: "Immunopathology of SARS-CoV-2 Infection: Immune Cells and Mediators, Prognostic Factors, and Immune-Therapeutic Implications", THE FEBS JOURNAL, 24 October 2020 (2020-10-24), GB, XP055803488, ISSN: 1742-464X, Retrieved from the Internet DOI: 10.1111/febs.15609
KREUZBERGER NINA ET AL: "SARS-CoV-2-neutralising monoclonal antibodies for treatment of COVID-19", COCHRANE DATABASE OF SYSTEMATIC REVIEWS, vol. 2021, no. 9, 2 September 2021 (2021-09-02), XP055843698, DOI: 10.1002/14651858.CD013825.pub2
VOELKERDING ET AL., CLINICAL CHEM., vol. 55, 2009, pages 641 - 658
MACLEAN ET AL., NATURE REV. MICROBIOL., vol. 7, pages 287 - 296
DI IULIO J ET AL.: "Transfer transcriptomic signatures for infectious diseases", PROC NATL ACAD SCI U S A, vol. 118, 2021, XP055860439, DOI: 10.1073/pnas.2022486118
LIBERZON A ET AL.: "The Molecular Signatures Database (MSigDB) hallmark gene set collection", CELL SYST, vol. 1, 2015, pages 417 - 25
ULLOQUE-BADARACCO JR ET AL.: "Prognostic value of neutrophil-to-lymphocyte ratio in COVID-19 patients: A systematic review and meta-analysis", INT J CLIN PRACT, 2021, pages e14596
SIMADIBRATA DM: "Neutrophil-to-lymphocyte ratio on admission to predict the severity and mortality of COVID-19 patients: A meta-analysis", AM J EMERG MED, vol. 42, 2021, pages 60 - 9
Attorney, Agent or Firm:
CLARK ZHANG (US)
Claims:
CLAIMS

What is claimed is:

1. A method for predicting response to sotrovimab in a COVID patient, the method comprising: obtaining expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score.

2. A method for predicting risk for disease progression in a COVID patient, the method comprising: obtaining expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generating a score by combining the expression values for the plurality of biomarkers; and determining risk for disease progression for the COVID patient based on the score.

3. The method of claim 1 or 2, wherein the plurality of biomarkers comprise five or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

4. The method of claim 1 or 2, wherein the plurality of biomarkers comprise ten of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

5. The method of claim 1 or 2, wherein the plurality of biomarkers consist of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

6. The method of any one of claims 1-5, wherein obtaining expression values for a plurality of biomarkers comprises: obtaining a sample from the COVID patient; and processing the sample to obtain the expression values for the plurality of biomarkers.

7. The method of claim 6, wherein processing the sample to obtain the expression values for the plurality of biomarkers comprises performing RT-PCR on the obtained sample.

8. The method of any one of claims 1-5, wherein obtaining expression values for a plurality of biomarkers comprises obtaining the expression values from a third party.

9. The method of any one of claims 2-8, wherein the method for predicting risk for disease progression achieves at least a sensitivity of 85%.

10. The method of any one of claims 1-8, wherein the method for predicting response to sotrovimab or the method for predicting risk for disease progression achieves improved sensitivity in comparison to clinical predictors.

11. The method of claim 10, wherein the clinical predictors are any one of percent lymphocytes, neutrophil-lymphocyte ratio (NLR), IL-6, percent neutrophils, or viral load.

12. The method of any one of claims 1-11, wherein a lower expression level of any one of EFHC2, RPL10, NUDT3, EIF2D, or EIF4B increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.

13. The method of any one of claims 1-11, wherein a higher expression level of any one of TADA3, CD38, DAB2, OAS2, or MYO18A increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.

14. A method for predicting response to sotrovimab in a COVID patient, the method comprising: obtaining expression values for N biomarkers, wherein the N biomarkers are selected for predicting patient response to sotrovimab by: obtaining whole transcriptome data from a plurality of subjects who either received or did not receive sotrovimab; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score.

15. A method for predicting risk for disease progression in a COVID patient, the method comprising: obtaining expression values for N biomarkers, wherein the N biomarkers are selected for predicting risk of disease progression by: obtaining whole transcriptome data from a plurality of subjects; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score.

16. The method of claim 14 or 15, wherein N is at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50.

17. The method of claim 14 or 15, wherein N is 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50.

18. The method of claim 14 or 15, wherein N is 10.

19. The method of any one of claims 14-18, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing uniform manifold approximation and projection (UMAP) dimensionality reduction.

20. The method of any one of claims 14-18, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing principal component analysis.

21. The method of any one of claims 14-18, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing t-distributed Stochastic Neighbor Embedding (t-SNE).

22. The method of any one of claims 14-21, wherein defining N clusters using the groupings of markers comprising performing K-means clustering.

23. The method of any one of claims 14-21, wherein the biomarker from each of the N clusters most associated to risk is selected according to ANOVA F-scores.

24. The method of any one of claims 14-23, wherein one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6 JAK STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation.

25. The method of any one of claims 14-23, wherein one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, or interferon gamma response.

26. The method of any one of claims 1-25, wherein the expression values are obtained based on mRNA measurements for the plurality of biomarkers.

27. The method of claim 26, wherein the mRNA measurements are obtained via hybridization to an array comprising probes for the plurality of biomarkers.

28. The method of claim 26, wherein the mRNA measurements comprise RNA-seq data.

29. The method of claim 26, wherein the mRNA measurements comprise qPCR data.

30. A non-transitory computer readable medium for predicting response to sotrovimab in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score.

31. A non-transitory computer readable medium for predicting risk for disease progression in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generate a score by combining the expression values for the plurality of biomarkers; and determine risk for disease progression for the COVID patient based on the score.

32. The non-transitory computer readable medium of claim 30 or 31, wherein the plurality of biomarkers comprise five or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

33. The non-transitory computer readable medium of claim 30 or 31, wherein the plurality of biomarkers comprise ten of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

34. The non-transitory computer readable medium of claim 30 or 31, wherein the plurality of biomarkers consist of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

35. The non-transitory computer readable medium of any one of claims 30-34, wherein the instructions that cause the processor to obtain expression values for a plurality of biomarkers further comprise instructions that, when executed by a processor, cause the processor to obtain the expression values from a third party.

36. The non-transitory computer readable medium of any one of claims 31-35, wherein the determined risk for disease progression achieves at least a sensitivity of 85%.

37. The non-transitory computer readable medium of any one of claims 30-36, wherein the classification of the COVID patient’s response to sotrovimab or the determined risk for disease progression achieves improved sensitivity in comparison to clinical predictors.

38. The non-transitory computer readable medium of claim 37, wherein the clinical predictors are any one of percent lymphocytes, neutrophil-lymphocyte ratio (NLR), IL-6, percent neutrophils, or viral load.

39. The non-transitory computer readable medium of any one of claims 30-38, wherein a lower expression level of any one of EFHC2, RPL10, NUDT3, EIF2D, or EIF4B increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.

40. The non-transitory computer readable medium of any one of claims 30-38, wherein a higher expression level of any one of TADA3, CD38, DAB2, OAS2, or MYO18A increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.

41. A non-transitory computer readable medium for predicting response to sotrovimab in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for N biomarkers, wherein the N biomarkers are selected for predicting patient response to sotrovimab by: obtaining whole transcriptome data from a plurality of subjects who either received or did not receive sotrovimab; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score.

42. A non-transitory computer readable medium for predicting risk for disease progression in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for N biomarkers, wherein the N biomarkers are selected for predicting risk of disease progression by: obtaining whole transcriptome data from a plurality of subjects; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score.

43. The non-transitory computer readable medium of claim 41 or 42, wherein N is at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50.

44. The non-transitory computer readable medium of claim 41 or 42, wherein N is 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50.

45. The non-transitory computer readable medium of claim 41 or 42, wherein N is 10.

46. The non-transitory computer readable medium of any one of claims 41-45, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing uniform manifold approximation and projection (UMAP) dimensionality reduction.

47. The non-transitory computer readable medium of any one of claims 41-45, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing principal component analysis.

48. The non-transitory computer readable medium of any one of claims 41-45, wherein performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing t-distributed Stochastic Neighbor Embedding (t-SNE).

49. The non-transitory computer readable medium of any one of claims 41-48, wherein defining N clusters using the groupings of markers comprising performing K-means clustering.

50. The non-transitory computer readable medium of any one of claims 41-49, wherein the biomarker from each of the N clusters most associated to risk is selected according to ANOVA F-scores.

51. The non-transitory computer readable medium of any one of claims 41-50, wherein one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6 JAK STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation.

52. The non-transitory computer readable medium of any one of claims 41-51, wherein one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, or interferon gamma response.

53. The non-transitory computer readable medium of any one of claims 30-52, wherein the expression values are obtained based on mRNA measurements for the plurality of biomarkers.

54. The non-transitory computer readable medium of claim 53, wherein the mRNA measurements are previously obtained via hybridization to an array comprising probes for the plurality of biomarkers.

55. The non-transitory computer readable medium of claim 53, wherein the mRNA measurements comprise RNA-seq data.

56. The non-transitory computer readable medium of claim 53, wherein the mRNA measurements comprise qPCR data.

Description:
Virological and Molecular Surrogates of Response to SARS-CoV-2 Neutralizing Antibody Sotrovimab

Cross-Reference to Related Applications

[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/279,246 filed November 15, 2021, and U.S. Provisional Patent Application No. 63/286,727 filed December 7, 2021, the entire disclosure of each of which is hereby incorporated by reference in its entirety for all purposes.

Background of the invention

[0002] Infection with SARS-CoV-2 leads to starkly divergent clinical outcomes. Understanding the risk of progression to severe SARS-CoV-2 (COVID-19) is key to effective and timely treatment. Sotrovimab is an engineered human monoclonal antibody that broadly neutralizes SARS-CoV-2, SARS-CoV and other related animal sarbecoviruses and was derived from S309, an antibody isolated from a SARS-CoV-1 infected subject. Sotrovimab targets a highly conserved epitope in Spike located in a site of the Receptor Binding Domain distal to the receptor-binding motif (RBM), retaining activity against SARS-CoV-2 variants of concern (VOC) (Alpha, Beta, Gamma, Delta) and variants of interest (VOI). Other monoclonal antibodies developed for COVID-19 bind to the RBM that engages the angiotensin-converting enzyme 2 (ACE2) receptor. The RBM is one of the most mutable and immunogenic regions of the virus. As a result, RBM antibodies often do not retain activity against VOC/VOI.

[0003] Sotrovimab was recently tested in COMET-ICE (ClinicalTrials.gov NCT04545060), a multicenter, double-blind, phase 3 clinical trial that recruited non-hospitalized patients with symptomatic COVID-19 and at least one known risk factor for disease progression. Participants were randomized to a single intravenous infusion of sotrovimab 500 mg or placebo. In the interim analysis of the trial, sotrovimab significantly reduced the risk of hospitalization or death from COVID-19 by 79%: a total of six (1%) subjects in the sotrovimab group died or required hospitalization versus 30 (6%) subjects in the placebo group. There remains a need to differentiate COVID-19 patients that will respond favorably to sotrovimab and COVID-19 patients that are likely to progress to severe COVID-19 disease.

Summary

[0004] The COMET-ICE clinical trial identified significant heterogeneity in the risk for hospitalization, ICU admission, and death, providing the impetus to assess the performance of sotrovimab across various strata of risk. The clinical data presented the opportunity to identify surrogate markers for the efficacy of sotrovimab that could be used as endpoints in the design of future trials and that may provide insight into the pathogenesis of this disease. Good surrogate endpoints are those that are (i) modified by treatment, (ii) strongly associated with the (clinical) endpoint of interest and (iii) significantly more efficient or earlier to measure than the clinical endpoint itself. The current study aimed to identify correlates of hospitalization and severe disease based on laboratory parameters and on transcriptome analysis and then to assess the impact of antibody treatment on these parameters. This approach enables assessment of the differential impact of antibody treatment on populations with different intrinsic risk of progression to more severe disease and the identification of surrogates of treatment response. Here, the nature of the systemic immune response provided powerful insights into these questions.

[0005] Disclosed herein are methods for predicting response to sotrovimab in a COVID patient, the method comprising: obtaining expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score. Additionally disclosed herein are methods for predicting risk for disease progression in a COVID patient, the method comprising: obtaining expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generating a score by combining the expression values for the plurality of biomarkers; and determining risk for disease progression for the COVID patient based on the score.

[0006] In various embodiments, the plurality of biomarkers comprise five or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the plurality of biomarkers comprise ten of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the plurality of biomarkers consist of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, obtaining expression values for a plurality of biomarkers comprises: obtaining a sample from the COVID patient; and processing the sample to obtain the expression values for the plurality of biomarkers. In various embodiments, processing the sample to obtain the expression values for the plurality of biomarkers comprises performing RT-PCR on the obtained sample. In various embodiments, obtaining expression values for a plurality of biomarkers comprises obtaining the expression values from a third party. In various embodiments, the method for predicting risk for disease progression achieves at least a sensitivity of 85%. In various embodiments, the method for predicting response to sotrovimab or the method for predicting risk for disease progression achieves improved sensitivity in comparison to clinical predictors. In various embodiments, the clinical predictors are any one of percent lymphocytes, neutrophil-lymphocyte ratio (NLR), IL-6, percent neutrophils, or viral load.
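As a rough illustration of the sensitivity comparison described above, the following sketch computes sensitivity (true positives divided by all patients who actually progressed) for a hypothetical biomarker-score classifier and for a neutrophil-lymphocyte ratio cutoff. The patient values, the NLR cutoff, and the function names are illustrative assumptions, not data or thresholds from this disclosure.

```python
def sensitivity(predicted, actual):
    """Fraction of truly positive cases (actual is True) that were predicted positive."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases; sensitivity is undefined")
    return true_pos / (true_pos + false_neg)


def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR: neutrophil count divided by lymphocyte count (same units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Hypothetical toy cohort (made-up values): did each patient progress?
progressed = [True, True, True, True, False, False, False, False]

# Hypothetical calls from a biomarker-score classifier.
panel_calls = [True, True, True, True, False, True, False, False]

# Hypothetical (neutrophil, lymphocyte) counts and an assumed NLR cutoff.
counts = [(6.0, 1.0), (3.0, 1.5), (7.5, 1.5), (2.0, 2.0),
          (3.0, 2.0), (8.0, 2.0), (2.5, 2.5), (4.0, 2.0)]
NLR_CUTOFF = 3.0  # assumed for illustration only
nlr_calls = [neutrophil_lymphocyte_ratio(n, l) > NLR_CUTOFF for n, l in counts]

panel_sens = sensitivity(panel_calls, progressed)
nlr_sens = sensitivity(nlr_calls, progressed)
print(f"panel sensitivity: {panel_sens:.2f}, NLR sensitivity: {nlr_sens:.2f}")
print("panel meets 85% target:", panel_sens >= 0.85)
```

In this made-up cohort the panel reaches the 85% sensitivity level while the NLR cutoff does not; real performance would have to be estimated on held-out clinical data.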

[0007] In various embodiments, a lower expression level of any one of EFHC2, RPL10, NUDT3, EIF2D, or EIF4B increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab. In various embodiments, a higher expression level of any one of TADA3, CD38, DAB2, OAS2, or MYO18A increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.
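One way to picture "generating a score by combining the expression values" is a signed average that follows the directions of effect stated above. This is a minimal sketch only: the unit weights, the use of z-scored inputs, the zero threshold, and the function names `risk_score` and `classify` are assumptions made for illustration, since this section does not specify how the values are combined.

```python
# Direction of effect as stated in the text above: lower expression of the
# first group, and higher expression of the second group, is associated with
# increased risk of progression or decreased likelihood of response.
LOWER_IS_RISK = ("EFHC2", "RPL10", "NUDT3", "EIF2D", "EIF4B")
HIGHER_IS_RISK = ("TADA3", "CD38", "DAB2", "OAS2", "MYO18A")


def risk_score(z_expression):
    """Combine standardized expression values into one score.

    z_expression maps gene symbol -> z-scored expression value. Unit weights
    are an assumption; a fitted model would supply its own coefficients.
    Genes missing from the mapping are skipped, so a partial panel still works.
    """
    score = 0.0
    used = 0
    for gene in HIGHER_IS_RISK:
        if gene in z_expression:
            score += z_expression[gene]  # higher expression raises the score
            used += 1
    for gene in LOWER_IS_RISK:
        if gene in z_expression:
            score -= z_expression[gene]  # lower expression raises the score
            used += 1
    if used < 2:
        raise ValueError("at least two panel biomarkers are required")
    return score / used


def classify(score, threshold=0.0):
    """Assumed decision rule: scores above the threshold are called higher risk."""
    return "higher risk" if score > threshold else "lower risk"


# Hypothetical z-scored values for one patient, using four panel genes.
patient = {"CD38": 1.2, "OAS2": 0.8, "RPL10": -1.0, "EIF4B": -0.6}
s = risk_score(patient)
print(round(s, 2), classify(s))
```

A linear combination keeps each biomarker's contribution easy to inspect; any other combining function (for example a logistic model) would fit the same "score then classify" outline.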

[0008] Additionally disclosed herein is a method for predicting response to sotrovimab in a COVID patient, the method comprising: obtaining expression values for N biomarkers, wherein the N biomarkers are selected for predicting patient response to sotrovimab by: obtaining whole transcriptome data from a plurality of subjects who either received or did not receive sotrovimab; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score. Additionally disclosed herein is a method for predicting risk for disease progression in a COVID patient, the method comprising: obtaining expression values for N biomarkers, wherein the N biomarkers are selected for predicting risk of disease progression by: obtaining whole transcriptome data from a plurality of subjects; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generating a score by combining the expression values for the plurality of biomarkers; and classifying the COVID patient’s response to sotrovimab based on the score.

[0009] In various embodiments, N is at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50. In various embodiments, N is 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50. In various embodiments, N is 10.

[0010] In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing uniform manifold approximation and projection (UMAP) dimensionality reduction. In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing principal component analysis. In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprising performing t-distributed Stochastic Neighbor Embedding (t-SNE).
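To make the principal component analysis option concrete, the sketch below finds the leading principal component of a small gene-by-subject matrix using power iteration and projects each gene onto it, so that co-expressed genes land near each other. It is a standard-library-only illustration under stated assumptions: the gene labels, expression values, and the function name `first_principal_component` are hypothetical, and a practical analysis would typically use an established numerical library and keep more than one component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (component, projections) for the leading principal component.

    rows: equal-length numeric lists, one row per gene, one column per subject.
    Uses power iteration on the sample covariance matrix of the columns.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector; power iteration fails only if the start is
    # exactly orthogonal to the leading eigenvector.
    vec = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # data has no variance
        vec = [x / norm for x in nxt]
    projections = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, projections


# Hypothetical expression of four genes across three subjects.
# Genes A and B rise together; genes C and D fall together.
profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.1, 2.1, 2.9],
    "C": [3.0, 2.0, 1.0],
    "D": [2.9, 2.1, 1.1],
}
component, proj = first_principal_component(list(profiles.values()))
for gene, p in zip(profiles, proj):
    print(gene, round(p, 3))
```

Genes with similar co-expression patterns receive projections of the same sign and similar magnitude, which is the property a later clustering step relies on. The overall sign of a principal component is arbitrary.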

[0011] In various embodiments, defining N clusters using the groupings of markers comprising performing K-means clustering. In various embodiments, the biomarker from each of the N clusters most associated to risk is selected according to ANOVA F-scores. In various embodiments, one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6 JAK STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation. In various embodiments, one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, or interferon gamma response.
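The clustering and selection steps can be sketched as follows: genes are grouped by K-means on a low-dimensional embedding (as a dimensional reduction step would produce), and within each cluster the gene with the highest one-way ANOVA F-score against the outcome is kept. Everything concrete here is an assumption for illustration: the gene labels `G1` to `G4`, the embedding coordinates, the expression values, the binary outcome, the deterministic seeding of K-means with the first points, and the function names.

```python
def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm, seeded with the first k points (an assumed,
    deterministic choice; random restarts are more usual in practice)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def anova_f(values, groups):
    """One-way ANOVA F statistic of `values` split by the labels in `groups`."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = sum(values) / n
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                  for vs in by_group.values())
    within = sum((v - sum(vs) / len(vs)) ** 2
                 for vs in by_group.values() for v in vs)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_panel(embedding, expression, outcome, n_clusters):
    """From each cluster of genes, keep the gene with the highest F-score."""
    genes = list(embedding)
    labels = kmeans([embedding[g] for g in genes], n_clusters)
    best = {}
    for gene, lab in zip(genes, labels):
        f = anova_f(expression[gene], outcome)
        if lab not in best or f > best[lab][1]:
            best[lab] = (gene, f)
    return sorted(gene for gene, _ in best.values())


# Hypothetical 2-D gene embedding: G1/G2 sit together, G3/G4 sit together.
embedding = {"G1": (0.0, 0.1), "G2": (0.2, 0.0), "G3": (5.0, 5.1), "G4": (5.2, 4.9)}
# Hypothetical expression across six subjects; outcome 1 = progressed.
outcome = [0, 0, 0, 1, 1, 1]
expression = {
    "G1": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    "G2": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    "G3": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
    "G4": [5.0, 5.5, 4.5, 2.0, 2.5, 1.5],
}
panel = select_panel(embedding, expression, outcome, n_clusters=2)
print(panel)
```

Taking one gene per cluster keeps the panel small while limiting redundancy, since genes in the same cluster carry largely overlapping co-expression signal.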

[0012] In various embodiments, the expression values are obtained based on mRNA measurements for the plurality of biomarkers. In various embodiments, the mRNA measurements are obtained via hybridization to an array comprising probes for the plurality of biomarkers. In various embodiments, the mRNA measurements comprise RNA-seq data. In various embodiments, the mRNA measurements comprise qPCR data.
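For the qPCR option, one common way to turn raw cycle-threshold (Ct) readings into expression values is the 2^(-ΔCt) calculation relative to a reference gene. The sketch below is an assumption-laden illustration rather than the procedure of this disclosure: it assumes roughly 100% amplification efficiency, and the Ct values, the reference Ct, and the function name `relative_expression` are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression as 2 ** -(delta Ct), assuming ~100% PCR efficiency.

    A lower Ct means the target amplified earlier, i.e. more starting template.
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct readings for three panel genes in one sample.
ct_values = {"CD38": 24.0, "OAS2": 22.0, "RPL10": 18.0}
CT_REFERENCE = 20.0  # hypothetical Ct of an assumed reference (housekeeping) gene

expression = {gene: relative_expression(ct, CT_REFERENCE)
              for gene, ct in ct_values.items()}
print(expression)
```

Values produced this way would usually be log-transformed and standardized across a cohort before being combined into a score.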

[0013] Additionally disclosed herein is a non-transitory computer readable medium for predicting response to sotrovimab in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score. Additionally disclosed herein is a non-transitory computer readable medium for predicting risk for disease progression in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for a plurality of biomarkers, wherein the plurality of biomarkers comprise two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3; generate a score by combining the expression values for the plurality of biomarkers; and determine risk for disease progression for the COVID patient based on the score.

[0014] In various embodiments, the plurality of biomarkers comprise five or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the plurality of biomarkers comprise ten of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the plurality of biomarkers consist of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

[0015] In various embodiments, the instructions that cause the processor to obtain expression values for a plurality of biomarkers further comprise instructions that, when executed by a processor, cause the processor to obtain the expression values from a third party. In various embodiments, the determined risk for disease progression achieves at least a sensitivity of 85%. In various embodiments, the classification of the COVID patient’s response to sotrovimab or the determined risk for disease progression achieves improved sensitivity in comparison to clinical predictors. In various embodiments, the clinical predictors are any one of percent lymphocytes, neutrophil-lymphocyte ratio (NLR), IL-6, percent neutrophils, or viral load.

[0016] In various embodiments, a lower expression level of any one of EFHC2, RPL10, NUDT3, EIF2D, or EIF4B increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab. In various embodiments, a higher expression level of any one of TADA3, CD38, DAB2, OAS2, or MYO18A increases risk for disease progression or decreases likelihood that the COVID patient responds to sotrovimab.
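
The directional relationships in paragraph [0016] can be illustrated with a minimal sketch. It assumes the expression values have already been standardized (z-scored) against a reference cohort and applies equal unit weights, which are illustrative assumptions rather than the weights of any model disclosed herein. The function name `directional_risk_score` and the input format are hypothetical.

```python
# Genes for which LOWER expression is associated with higher risk (paragraph [0016]).
LOWER_IS_RISK = ("EFHC2", "RPL10", "NUDT3", "EIF2D", "EIF4B")
# Genes for which HIGHER expression is associated with higher risk (paragraph [0016]).
HIGHER_IS_RISK = ("TADA3", "CD38", "DAB2", "OAS2", "MYO18A")


def directional_risk_score(z_scores):
    """Sum z-scored expression values, signed by each gene's risk direction.

    `z_scores` maps gene symbol -> standardized expression (assumed format).
    Genes absent from the mapping are skipped, so any subset of the panel works.
    Equal unit weights are an illustrative assumption, not a fitted model.
    """
    score = 0.0
    for gene in HIGHER_IS_RISK:
        if gene in z_scores:
            score += z_scores[gene]  # higher expression raises the score
    for gene in LOWER_IS_RISK:
        if gene in z_scores:
            score -= z_scores[gene]  # lower expression raises the score
    return score
```

Under these assumptions, a larger score corresponds to a higher risk for disease progression or a lower likelihood of response.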

[0017] Additionally disclosed herein is a non-transitory computer readable medium for predicting response to sotrovimab in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for N biomarkers, wherein the N biomarkers are selected for predicting patient response to sotrovimab by: obtaining whole transcriptome data from a plurality of subjects who either received or did not receive sotrovimab; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score. 
Additionally disclosed herein is a non-transitory computer readable medium for predicting risk for disease progression in a COVID patient, the non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to: obtain expression values for N biomarkers, wherein the N biomarkers are selected for predicting risk of disease progression by: obtaining whole transcriptome data from a plurality of subjects; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated to one or more of risk of disease progression, risk of disease severity, and risk of hospitalization; generate a score by combining the expression values for the plurality of biomarkers; and classify the COVID patient’s response to sotrovimab based on the score.

[0018] In various embodiments, N is at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50. In various embodiments, N is 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50. In various embodiments, N is 10.

[0019] In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprises performing uniform manifold approximation and projection (UMAP) dimensionality reduction. In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprises performing principal component analysis. In various embodiments, performing dimensional reduction of the whole transcriptome data of the plurality of subjects comprises performing t-distributed Stochastic Neighbor Embedding (t-SNE).

[0020] In various embodiments, defining N clusters using the groupings of markers comprises performing K-means clustering. In various embodiments, the biomarker from each of the N clusters most associated to risk is selected according to ANOVA F-scores. In various embodiments, one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6-JAK-STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation. In various embodiments, one or more of the N biomarkers are involved in any of the complement pathway, inflammatory response, interferon alpha response, or interferon gamma response.

[0021] In various embodiments, the expression values are obtained based on mRNA measurements for the plurality of biomarkers. In various embodiments, the mRNA measurements are previously obtained via hybridization to an array comprising probes for the plurality of biomarkers. In various embodiments, the mRNA measurements comprise RNA-seq data. In various embodiments, the mRNA measurements comprise qPCR data.

Brief Description of the Drawings

[0022] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description and accompanying drawings.

[0023] FIG. 1 depicts a system overview for predicting likely severe disease or responsiveness to sotrovimab for a patient diagnosed with an infectious disease, in accordance with an embodiment.

[0024] FIGs. 2A and 2B respectively show results of the exploration of transcriptome signatures, including explained variance of each principal component and the cumulative explained variance, summing from the first to Nth component, where N is denoted on the x-axis.

[0025] FIG. 2C shows the distribution of density differences between Day 1 and Day 8 for each patient. The line denotes the chosen cutoff for defining a high risk group.

[0026] FIG. 2D is a visualization of which patients are defined as high risk according to the cutoff shown in FIG. 2C.

[0027] FIG. 2E depicts UMAP dimensionality reduction of genes based on transcriptomic patterns across patients. Color denotes the AUC of that gene for predicting risk cluster at Day 8.

[0028] FIG. 2F shows K-means clustering of genes into 10 groups.

[0029] FIG. 3 shows AUC (AUROC) for predicting the high risk cluster using a varying number of genes. The blue line denotes selecting the top genes based on F-score. The orange line shows performance when the top F-scores are taken uniformly from each cluster.

[0030] FIGs. 4A and 4B show comparative performance of sources and number of genes of a predictive gene panel.

[0031] FIG. 5A shows Fisher's exact p-value for association to hospitalization as a function of cutoff value and day.

[0032] FIG. 5B is a visualization of the distribution of day 1 neutrophil lymphocyte ratio for hospitalized versus non-hospitalized patients.

[0033] FIGs. 6A and 6B show response to Sotrovimab in high-risk group defined by laboratory parameters.

[0034] FIGs. 7A and 7B show the deconvolution of cell types including Lymphocyte (FIG. 7A) and neutrophil (FIG. 7B) proportions estimated from deconvolution of transcriptomics data using the ABIS matrix optimized for deconvolution of PBMC RNA-seq.

[0035] FIGs. 8A and 8B show the response to Sotrovimab in high-risk group defined by transcriptome profile.

[0036] FIG. 8C shows a 2D kernel density, presented as a contour plot, highlighting the distribution of transcriptomics profiles in UMAP by visit day.

[0037] FIG. 8D shows that a threshold on the density difference between Day 1 and Day 8 distributions defines a high-risk cluster (red fill) which encompasses Day 1 and 8 transcriptomics profiles for 6 of 8 hospitalized patients.

[0038] FIG. 8E shows Day 1 and Day 8 distributions of baseline seropositive patients (n=69).

[0039] FIG. 9 shows overlap between risk clusters defined by neutrophil lymphocyte ratio versus transcriptome signatures.

[0040] FIGs. 10A and 10B show differences in viral load between high and low risk groups at Day 1 and Day 8, respectively. Risk groups are defined using either clinical variables (specifically neutrophil lymphocyte ratio) or transcriptomics. A boxplot of the distribution is presented within each violin plot. The white dot denotes the mean.

[0041] FIGs. 11A and 11B show transcriptome characteristics of high-risk group.

[0042] FIGs. 12A-12C show UMAP projection of transcriptomic profiles faceted by day and colored by treatment status for the full and transcriptome high risk cohorts.

[0043] FIGs. 13A-13E show surrogates to predict both risk of COVID-19 disease and response to sotrovimab using a 10-gene panel.

[0044] FIGs. 14A-14C show surrogates of risk and recovery using a 10-gene panel.

[0045] FIG. 15A shows the expression levels for each of the 10 panel genes for high risk versus low risk patients, as defined by the full transcriptome.

[0046] FIG. 15B shows the normalized expression levels of the 10 genes in an independent validation cohort.

Detailed description of the invention

Definitions

[0001] Terms used in the claims and specification are defined as set forth below unless otherwise specified.

[0002] The terms “subject” and “patient” are used interchangeably and encompass a cell, tissue, organism, human or non-human, mammal or non-mammal, male or female, whether in vivo, ex vivo, or in vitro.

[0003] The terms “marker,” “markers,” “biomarker,” and “biomarkers” are used interchangeably and encompass, without limitation, lipids, lipoproteins, proteins, cytokines, chemokines, growth factors, peptides, nucleic acids, genes, and oligonucleotides, together with their related complexes, metabolites, mutations, variants, polymorphisms, modifications, fragments, subunits, degradation products, elements, and other analytes or sample-derived measures. A marker can also include mutated proteins, mutated nucleic acids, structural variants including copy number variations, inversions, and/or transcript variants, in circumstances in which such mutations or structural variants are useful for developing a model (e.g., a machine learning model), or are useful in predictive models developed using related markers (e.g., non-mutated versions of the proteins or nucleic acids, alternative transcripts, etc.).

[0004] The term “sample” or “test sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art.

System Overview

[0047] Generally, methods disclosed herein involve implementing biomarker panels for predicting likely severe disease or responsiveness to sotrovimab for a patient diagnosed with an infectious disease. In various embodiments, biomarker panels (such as a 10 gene panel described in the Examples below) achieve high sensitivity. This means that the biomarker panels can accurately identify a high proportion of patients (e.g., high sensitivity). Such biomarker panels can be complementary to clinical predictors, an example of which includes neutrophil-lymphocyte ratio. Clinical predictors provide high specificity, but low sensitivity. Thus, clinical predictors may accurately classify a small proportion of patients, but inaccurately classify the remaining, larger proportion of patients.
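
The distinction between sensitivity and specificity drawn above can be made concrete with a short sketch. The function name and the boolean label encoding (True meaning high risk) are assumptions made for illustration; the example labels are invented and are not data from the studies described herein.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for boolean labels, True = high risk."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1  # high-risk patient correctly flagged
        elif truth and not pred:
            fn += 1  # high-risk patient missed
        elif not truth and pred:
            fp += 1  # low-risk patient incorrectly flagged
        else:
            tn += 1  # low-risk patient correctly not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented example: a predictor that flags only one of four high-risk patients
# and no low-risk patients has high specificity but low sensitivity.
truth = [True, True, True, True, False, False]
flags = [True, False, False, False, False, False]
sens, spec = sensitivity_specificity(truth, flags)
```

In this invented example the predictor never mislabels a low-risk patient, yet it misses most high-risk patients, which is the pattern attributed above to clinical predictors.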

[0048] FIG. 1 depicts an overview of a system environment 100 for generating an infectious disease prediction in a patient 110, in accordance with an embodiment. Specifically, the overview shown in FIG. 1 is useful for predicting likely severe disease or responsiveness to sotrovimab for a patient diagnosed with an infectious disease, in accordance with an embodiment. The system environment 100 provides context in order to introduce a biomarker quantification assay 120 and an infectious disease prediction system 130.

[0049] The subject 110 refers to an individual who is associated with an infectious disease. In some embodiments, the subject 110 is previously diagnosed with an infectious disease. In some embodiments, the subject 110 is suspected of having an infectious disease. For example, the subject 110 may be exhibiting symptoms of an infectious disease. In particular embodiments, the infectious disease is caused by an infection by a virus, such as for example, severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In particular embodiments, the infectious disease is coronavirus disease 2019 (COVID-19) caused by a SARS-CoV-2 virus.

[0050] In various embodiments, a test sample is obtained from the subject 110. The sample can be obtained by the individual or by a third party, e.g., a medical professional. Examples of medical professionals include physicians, emergency medical technicians, nurses, first responders, psychologists, phlebotomists, medical physics personnel, nurse practitioners, surgeons, dentists, and any other obvious medical professional as would be known to one skilled in the art. In particular embodiments, the sample is a blood sample.

[0051] The test sample is tested to determine expression values of one or more markers by performing the biomarker quantification assay 120. The biomarker quantification assay 120 determines quantitative expression values of one or more biomarkers from the test sample. In various embodiments, the biomarker quantification assay 120 may be an immunoassay, and more specifically, a multiplex immunoassay, for measuring protein biomarker expression levels. In various embodiments, the biomarker quantification assay 120 involves performing sequencing of RNA transcripts or sequences derived from RNA transcripts (e.g., cDNA sequences that have been reverse transcribed from RNA transcripts). Further example assays are described herein. The expression levels of various biomarkers can be obtained in a single run using a single test sample obtained from the subject 110. The quantified expression values of the biomarkers are provided to the infectious disease prediction system 130.

[0052] Generally, the infectious disease prediction system 130 may be embodied as one or more computers. Therefore, in various embodiments, the steps described in reference to the infectious disease prediction system 130 are performed in silico. The infectious disease prediction system 130 analyzes the received biomarker expression values from the biomarker quantification assay 120 to generate an infectious disease prediction 140 in the subject 110.

[0053] In various embodiments, the biomarker quantification assay 120 and the infectious disease prediction system 130 can be employed by different parties. For example, a first party performs the biomarker quantification assay 120 and then provides the results to a second party, which implements the infectious disease prediction system 130. For example, the first party may be a clinical laboratory that obtains test samples from subjects 110 and performs the assay 120 on the test samples. The second party receives the expression values of biomarkers resulting from the performed assay 120 and analyzes the expression values using the infectious disease prediction system 130.

[0054] Generally, the infectious disease prediction system 130 analyzes expression values of biomarkers of a biomarker panel to generate the infectious disease prediction 140. In various embodiments, the infectious disease prediction 140 represents a prediction that the subject 110 is a responder to a therapeutic. In particular embodiments, the infectious disease prediction 140 represents a prediction that the subject 110 is a responder to a SARS-CoV-2 therapeutic. In various embodiments, the infectious disease prediction 140 represents a prediction that the subject 110 is a non-responder. In particular embodiments, the infectious disease prediction 140 represents a prediction that the subject 110 is a non-responder to a SARS-CoV-2 therapeutic. An example of a SARS-CoV-2 therapeutic is an antibody or antibody fragment. In particular embodiments, the SARS-CoV-2 therapeutic is Sotrovimab (XEVUDY).

[0055] In various embodiments, the infectious disease prediction 140 represents a prediction that the subject 110 is at risk of infectious disease progression, such as risk of progression to severe SARS-CoV-2 (COVID-19). In various embodiments, severe SARS-CoV-2 can be characterized by dyspnea, hypoxemia, a respiratory rate of 30 or more breaths per minute, a blood oxygen saturation of 93% or less, a ratio of the partial pressure of arterial oxygen to the fraction of inspired oxygen (PaO2:FiO2) of less than 300 mm Hg, infiltrates in more than 50% of the lung field, or patient hospitalization.

Infectious Disease Prediction System

[0056] As described herein, infectious disease prediction system 130 analyzes expression values of biomarkers of a biomarker panel to generate the infectious disease prediction 140. In various embodiments, the infectious disease prediction system 130 analyzes expression values of biomarkers by implementing a machine learning model. For example, the machine learning model can generate a score by combining the expression values for the plurality of biomarkers. Thus, the infectious disease prediction system 130 can generate the infectious disease prediction 140 using the generated score.

[0057] In various embodiments, the infectious disease prediction system 130 may be implemented during one or two phases: 1) a training phase and 2) a deployment phase. More specifically, the training phase refers to the building and training of one or more machine learning models based on training data that includes quantitative expression values of biomarkers obtained from training individuals with a known outcome. For example, such training individuals may be known to be a responder or a non-responder to a SARS-CoV-2 therapeutic. Thus, the infectious disease prediction system 130 can train machine learning models to better predict likely responders or non-responders to a SARS-CoV-2 therapeutic according to the biomarker expression values.

[0005] In various embodiments, a machine learning model, such as a machine learning model trained and/or implemented by the infectious disease prediction system 130, is any one of a regression model (e.g., linear regression, logistic regression, or polynomial regression), decision tree, random forest, support vector machine, Naive Bayes model, k-means cluster, or neural network (e.g., feed-forward networks, convolutional neural networks (CNN), or deep neural networks (DNN)). The machine learning model can be trained using a machine learning implemented method, such as any one of a linear regression algorithm, logistic regression algorithm, decision tree algorithm, support vector machine classification, Naive Bayes classification, K-Nearest Neighbor classification, random forest algorithm, deep learning algorithm, gradient boosting algorithm, and dimensionality reduction techniques such as manifold learning, principal component analysis, factor analysis, autoencoder regularization, and independent component analysis, or combinations thereof. In various embodiments, the machine learning model is trained using supervised learning algorithms, unsupervised learning algorithms, or a combination of supervised and unsupervised techniques.

[0006] In various embodiments, the machine learning model has one or more parameters, such as hyperparameters or model parameters. Hyperparameters are generally established prior to training. Examples of hyperparameters include the learning rate, depth or leaves of a decision tree, number of hidden layers in a deep neural network, number of clusters in a k-means cluster, penalty in a regression model, and a regularization parameter associated with a cost function. Model parameters are generally adjusted during training. Examples of model parameters include weights associated with nodes in layers of neural network, support vectors in a support vector machine, and coefficients in a regression model. The model parameters of the machine learning model are trained (e.g., adjusted) using the training data to improve the predictive power of the machine learning model.
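
As an illustration of the distinction between hyperparameters and model parameters, the following minimal sketch trains a logistic regression model by batch gradient descent. Logistic regression is only one of the model types listed above. The function name, default values, and toy data are assumptions made for illustration and do not describe the model or training data of the Examples.

```python
import math


def train_logistic_regression(X, y, learning_rate=0.1, l2_penalty=0.0, n_iter=500):
    """Fit logistic regression by batch gradient descent.

    Hyperparameters (fixed before training): learning_rate, l2_penalty, n_iter.
    Model parameters (adjusted during training): weights and bias.
    X is a list of feature rows (e.g., expression values); y holds 0/1 outcomes.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - label
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        for j in range(n_features):
            # The L2 penalty shrinks coefficients toward zero.
            weights[j] -= learning_rate * (grad_w[j] / n_samples + l2_penalty * weights[j])
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Toy, invented data: one feature that is higher in the positive class.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]
w_plain, b_plain = train_logistic_regression(X_toy, y_toy)
w_penalized, b_penalized = train_logistic_regression(X_toy, y_toy, l2_penalty=1.0)
```

Changing a hyperparameter such as `l2_penalty` changes the coefficients that training arrives at, but the hyperparameter itself is never updated by the training loop.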

[0007] During deployment, the infectious disease prediction system 130 implements the machine learning model that combines the expression values of biomarkers in the biomarker panel to generate a score. In various embodiments, to generate the infectious disease prediction and classify the patient based on the score, the infectious disease prediction system 130 compares the score to a threshold value. For example, if the score is below the threshold value, the infectious disease prediction system 130 can classify the patient in a first category (e.g., as a non-responder), whereas if the score is above the threshold value, the infectious disease prediction system 130 can classify the patient in a second category (e.g., as a responder). As another example, if the score is below the threshold value, the infectious disease prediction system 130 can classify the patient in a first category (e.g., as a patient likely to progress to severe COVID-19 disease), whereas if the score is above the threshold value, the infectious disease prediction system 130 can classify the patient in a second category (e.g., as a patient unlikely to progress to severe COVID-19 disease).
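
The score-and-threshold logic of the deployment phase can be sketched as follows. A weighted sum is used here as one possible way of combining expression values. The weights, intercept, threshold, and function names are hypothetical placeholders and are not values disclosed herein.

```python
def combine_expression(expression, weights, intercept=0.0):
    """Combine expression values into a single score as a weighted sum.

    `expression` and `weights` map gene symbol -> value; both the weights and
    the intercept are hypothetical and would come from a trained model.
    """
    return intercept + sum(weights[gene] * expression[gene] for gene in weights)


def classify_by_threshold(score, threshold, below_label, above_label):
    """Assign a category by comparing the score to a threshold.

    A score exactly equal to the threshold is given `above_label`; tie
    handling is not specified in the text, so this choice is an assumption.
    """
    return below_label if score < threshold else above_label


# Invented two-gene example with placeholder weights.
example_score = combine_expression(
    {"CD38": 2.0, "RPL10": 4.0},
    {"CD38": -0.5, "RPL10": 0.5},
)
example_call = classify_by_threshold(example_score, 0.0, "non-responder", "responder")
```

The same comparison serves both use cases described above; only the pair of category labels changes.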

[0008] In various embodiments, to generate the infectious disease prediction based on the score, the infectious disease prediction system 130 compares the score to a reference score. As used herein, a reference score refers to a previously determined score that corresponds to training individuals with known outcomes.

[0009] For example, a reference score can be previously determined from training individuals known to be responders to a SARS-CoV2 therapeutic. Thus, the infectious disease prediction system 130 can classify the patient as a likely non-responder to a SARS-CoV2 therapeutic if the score of the patient is significantly different (e.g., p-value < 0.05) in comparison to the reference score previously determined from training individuals known to be responders to a SARS-CoV2 therapeutic. As another example, infectious disease prediction system 130 can classify the patient as a likely responder to a SARS-CoV2 therapeutic if the score of the patient is not significantly different (e.g., p-value > 0.05) in comparison to the reference score previously determined from training individuals known to be responders to a SARS-CoV2 therapeutic.

[0010] As another example, a reference score can be previously determined from training individuals known to be non-responders to a SARS-CoV2 therapeutic. Thus, the infectious disease prediction system 130 can classify the patient as a likely responder to a SARS-CoV2 therapeutic if the score of the patient is significantly different (e.g., p-value < 0.05) in comparison to the reference score previously determined from training individuals known to be non-responders to a SARS-CoV2 therapeutic. As another example, infectious disease prediction system 130 can classify the patient as a likely non-responder to a SARS-CoV2 therapeutic if the score of the patient is not significantly different (e.g., p-value > 0.05) in comparison to the reference score previously determined from training individuals known to be non-responders to a SARS-CoV2 therapeutic.

[0011] For example, a reference score can be previously determined from training individuals known to experience severe COVID-19 disease. Thus, the infectious disease prediction system 130 can classify the patient as unlikely to experience severe COVID-19 disease if the score of the patient is significantly different (e.g., p-value < 0.05) in comparison to the reference score previously determined from training individuals known to experience severe COVID-19 disease. As another example, infectious disease prediction system 130 can classify the patient as likely to experience severe COVID-19 disease if the score of the patient is not significantly different (e.g., p-value > 0.05) in comparison to the reference score previously determined from training individuals known to experience severe COVID-19 disease.

[0012] As another example, a reference score can be previously determined from training individuals known to not experience severe COVID-19 disease. Thus, the infectious disease prediction system 130 can classify the patient as likely to experience severe COVID-19 disease if the score of the patient is significantly different (e.g., p-value < 0.05) in comparison to the reference score previously determined from training individuals known to not experience severe COVID-19 disease. As another example, infectious disease prediction system 130 can classify the patient as unlikely to experience severe COVID-19 disease if the score of the patient is not significantly different (e.g., p-value > 0.05) in comparison to the reference score previously determined from training individuals known to not experience severe COVID-19 disease.
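
The reference-score comparisons described above do not name a particular statistical test. As one possible reading, the sketch below compares a patient's score to the distribution of scores from a reference group using a two-sided z-test under a normal approximation. The choice of test, the function name, and the example scores are assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def differs_from_reference(patient_score, reference_scores, alpha=0.05):
    """Return (is_significantly_different, p_value) for a patient's score.

    Uses a two-sided z-test against the mean and sample standard deviation of
    `reference_scores` (normal approximation; an assumed choice of test).
    """
    sigma = stdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores must not all be identical")
    z = (patient_score - mean(reference_scores)) / sigma
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha, p_value


# Invented scores from training individuals known to be responders.
responder_reference = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
far_patient, _ = differs_from_reference(2.0, responder_reference)
near_patient, _ = differs_from_reference(1.0, responder_reference)
```

With a responder reference group, a patient whose score differs significantly would be classified as a likely non-responder, and a patient whose score does not differ would be classified as a likely responder, mirroring the logic of the paragraphs above.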

[0058] Generally, depending on the classification of the subject (e.g., the infectious disease prediction 140 described in FIG. 1), the subject can undergo or not undergo treatment. In other words, the prediction can guide the treatment of the subject. For example, if the patient is predicted to be a responder to a SARS-CoV-2 therapeutic, methods disclosed herein involve administering the SARS-CoV-2 therapeutic to the subject to treat the infectious disease. In particular embodiments, if the patient is predicted to be a responder to Sotrovimab (XEVUDY), methods disclosed herein involve administering Sotrovimab (XEVUDY) to the subject to treat the infectious disease. As another example, if the patient is predicted to likely experience severe COVID-19 disease, methods disclosed herein involve administering a SARS-CoV-2 therapeutic to the patient to treat and prevent progression to severe COVID-19 disease. In particular embodiments, if the patient is predicted to likely experience severe COVID-19 disease, methods disclosed herein involve administering Sotrovimab (XEVUDY) to the patient to treat and prevent progression to severe COVID-19 disease. As another example, if the patient is predicted to not experience severe COVID-19 disease, then the patient need not be administered a SARS-CoV-2 therapeutic. In particular embodiments, if the patient is predicted to not experience severe COVID-19 disease, then the patient need not be administered Sotrovimab (XEVUDY).

Biomarker Panel

[0059] As described herein, the infectious disease prediction system 130 analyzes expression values of biomarkers of a biomarker panel. In various embodiments, a single biomarker is included in the biomarker panel, hereafter referred to as a univariate biomarker panel. In other embodiments, two or more biomarkers are included in the biomarker panel, hereafter referred to as a multivariate biomarker panel. In such embodiments, the multivariate biomarker panel includes more than one biomarker. In various embodiments, the multivariate biomarker panel includes two biomarkers. In various embodiments, the multivariate biomarker panel includes 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 biomarkers. In particular embodiments, the multivariate biomarker panel includes 5 biomarkers. In particular embodiments, the multivariate biomarker panel includes 10 biomarkers.

[0060] In various embodiments, the multivariate biomarker panel includes two or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes two or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes three or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes three or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes four or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes four or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes five or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes five or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes six or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes six or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes seven or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes seven or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes eight or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes eight or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes nine or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes nine or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In various embodiments, the multivariate biomarker panel includes ten or more of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes ten or more of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

[0061] In particular embodiments, the multivariate biomarker panel includes each of the biomarkers identified in Table 8. For example, the multivariate biomarker panel includes each of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3. In particular embodiments, the multivariate biomarker panel consists of each of the biomarkers identified in Table 8. For example, the multivariate biomarker panel consists of each of CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3.

[0062] In various embodiments, the multivariate biomarker panel includes 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, or 20 or more of the biomarkers identified in Table 6 (biomarkers are identified under the “gene symbol” column with corresponding identifier shown under the “ensembl_id” column).

Identification and Selection of Biomarkers for Inclusion in a Panel

[0063] Methods disclosed herein further include identifying biomarkers for inclusion in a biomarker panel. The identified biomarkers may be differentially expressed in patients that are likely to experience different outcomes. For example, the identified biomarkers may be differentially expressed in patients that are likely responders or non-responders to a SARS-CoV-2 therapeutic. As another example, the identified biomarkers may be differentially expressed in patients that are likely to experience severe disease (e.g., severe COVID-19 disease) and patients that are unlikely to experience severe disease (e.g., severe COVID-19 disease). Thus, the expression values of identified biomarkers included in a biomarker panel are informative for differentiating and classifying patients.

[0064] In various embodiments, methods for identifying biomarkers for inclusion in a biomarker panel include: obtaining whole transcriptome data from a plurality of subjects; performing dimensional reduction of the whole transcriptome data of the plurality of subjects to create groupings of markers based on their co-expression patterns across the plurality of subjects; defining N clusters using the groupings of markers; and selecting a biomarker from each of the N clusters most associated with disease risk (e.g., risk of disease progression, risk of disease severity, and risk of hospitalization). In various embodiments, each subject in the plurality of subjects may or may not have received a SARS-CoV-2 therapeutic (e.g., Sotrovimab (XEVUDY)).
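
The final selection step, choosing from each cluster the biomarker most associated with risk, can be sketched using a one-way ANOVA F-score as the measure of association. The sketch assumes that cluster assignments for each gene and a risk label for each subject are already available. The function names, input formats, and toy values are hypothetical.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one gene across the groups in `labels`.

    Requires at least two groups and more samples than groups.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_gene_per_cluster(expression, labels, gene_clusters):
    """Pick, for each cluster, the gene with the highest F-score.

    `expression` maps gene -> per-subject values, `labels` gives each subject's
    risk group, and `gene_clusters` maps gene -> cluster index (assumed formats).
    """
    best = {}
    for gene, values in expression.items():
        f_score = anova_f_score(values, labels)
        cluster = gene_clusters[gene]
        if cluster not in best or f_score > best[cluster][1]:
            best[cluster] = (gene, f_score)
    return {cluster: gene for cluster, (gene, _) in best.items()}


# Toy, invented data: four subjects (two low risk, two high risk), four genes.
toy_labels = [0, 0, 1, 1]
toy_expression = {
    "GENE_A": [1.0, 1.2, 3.0, 3.2],
    "GENE_B": [1.0, 2.0, 1.5, 2.5],
    "GENE_C": [5.0, 5.1, 5.0, 5.1],
    "GENE_D": [0.0, 0.5, 4.0, 4.5],
}
toy_clusters = {"GENE_A": 0, "GENE_B": 0, "GENE_C": 1, "GENE_D": 1}
toy_panel = select_top_gene_per_cluster(toy_expression, toy_labels, toy_clusters)
```

Taking one gene per cluster, rather than the overall top-scoring genes, keeps the panel from being dominated by a single group of co-expressed genes.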

[0065] Generally, the step of performing a dimensional reduction of the whole transcriptome data involves transforming the whole transcriptome data from a high-dimensional space into a lower dimensional space. Here, the lower dimensional space may retain certain information or properties of the whole transcriptome data of the high-dimensional space, while excluding less meaningful (e.g., redundant) data. Examples of dimensionality reduction analysis include principal component analysis (PCA), kernel PCA, graph-based kernel PCA, linear discriminant analysis, generalized discriminant analysis, non-negative matrix factorization, t-distributed stochastic neighbor embedding (t-SNE), or uniform manifold approximation and projection (UMAP). In particular embodiments, the step of performing a dimensional reduction of the whole transcriptome data involves performing a UMAP dimensional reduction analysis on the whole transcriptome data.

[0066] In various embodiments, the step of creating groupings of markers based on their co-expression patterns across the plurality of subjects involves performing a clustering step. In particular embodiments, the clustering step involves an unsupervised clustering. Examples of unsupervised clustering include hierarchical clustering, k-means clustering, density-based spatial clustering of applications with noise (DBSCAN), or combinations thereof. In particular embodiments, the unsupervised clustering involves performing k-means clustering. In various embodiments, the step of creating groupings of markers involves creating N different clusters. In various embodiments, N represents 2 different clusters. However, in some embodiments, N represents 3 different clusters, 4 different clusters, 5 different clusters, 6 different clusters, 7 different clusters, 8 different clusters, 9 different clusters, 10 different clusters, 11 different clusters, 12 different clusters, 13 different clusters, 14 different clusters, 15 different clusters, 16 different clusters, 17 different clusters, 18 different clusters, 19 different clusters, or 20 different clusters. In particular embodiments, N represents 10 different clusters. In some scenarios, N may be selected according to a number of biomarkers that are to be included in the biomarker panel. For example, N can be equal to the number of biomarkers that are to be included in the biomarker panel. Therefore, if 10 biomarkers are to be included in the biomarker panel, the step of creating groupings of markers based on their co-expression patterns can include performing unsupervised clustering (e.g., k-means clustering) of the biomarkers based on their expression values to generate 10 different clusters.

[0067] In various embodiments, clusters may be generated according to common biological pathways, including any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6-JAK-STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation.

[0068] In various embodiments, one or more biomarkers from each cluster are selected for inclusion in the biomarker panel, a step hereafter referred to as diversity-based selection of biomarkers. For example, one or more biomarkers are selected from each cluster according to a score that is indicative of association with risk (e.g., one or more of risk of disease progression, risk of disease severity, and risk of hospitalization). In various embodiments, the score is a statistical score, such as an analysis of variance (ANOVA) F-score. The F-score in an ANOVA analysis can be calculated as the ratio between the variation between sample means and the variation within the samples. The diversity-based selection of biomarkers is in contrast to a greedy approach in which biomarkers are selected, irrespective of clustering, based solely on the score that is indicative of association with risk. For example, in a diversity-based selection, the biomarker in each cluster with the highest F-score is selected. As another example, in a greedy selection, the top biomarkers with the highest F-scores are selected.
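As a non-limiting illustration of the ANOVA F-score described above, the following minimal Python sketch computes the ratio of between-group to within-group mean squares for one biomarker. The function name `anova_f_score` and the example expression values are hypothetical and are not taken from the study; the sketch assumes at least two groups and non-zero within-group variance.

```python
from statistics import mean


def anova_f_score(groups):
    """One-way ANOVA F statistic for one biomarker.

    `groups` is a list of lists; each inner list holds the expression
    values of the biomarker in one outcome group (hypothetical layout).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Variation between the group means (between-group mean square).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Variation within the groups (within-group mean square).
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_within = ss_within / (n_total - k)

    return ms_between / ms_within


# Illustrative values only: low-risk group vs. high-risk group.
print(anova_f_score([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))  # 1.5
```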

[0069] In various embodiments, N biomarkers are selected for inclusion in the biomarker panel from N different clusters. In such embodiments, a single biomarker is selected from each of the N different clusters. In various embodiments, the N biomarkers selected for inclusion in the biomarker panel are involved in certain biological pathways, including any of the complement pathway, inflammatory response, interferon alpha response, interferon gamma response, TNF-α signaling via NF-κB, IL6-JAK-STAT3 signaling pathway, xenobiotic metabolism, coagulation, apoptosis, G2M checkpoint, heme metabolism, MYC targets, or oxidative phosphorylation pathways. Thus, through this diversity-based selection of biomarkers, the N different biomarkers can provide diverse, non-overlapping information that is predictive of a particular outcome (e.g., responder to a SARS-CoV-2 therapeutic, non-responder to a SARS-CoV-2 therapeutic, likely severe disease, or unlikely severe disease).
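The contrast between diversity-based selection and greedy selection can be sketched as follows. This is a minimal illustration only: the function names, gene labels, scores, and cluster assignments are hypothetical placeholders, and the sketch assumes that a risk-association score (e.g., an F-score) and a cluster label are already available for every gene.

```python
def greedy_selection(scores, n):
    """Pick the n genes with the highest scores, ignoring clusters."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def diversity_selection(scores, cluster_of):
    """Pick the single highest-scoring gene from each cluster."""
    best = {}
    for gene, score in scores.items():
        cluster = cluster_of[gene]
        if cluster not in best or score > scores[best[cluster]]:
            best[cluster] = gene
    return [best[cluster] for cluster in sorted(best)]


# Hypothetical scores and co-expression clusters for four genes.
scores = {"geneA": 9.0, "geneB": 8.5, "geneC": 4.0, "geneD": 3.0}
cluster_of = {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 1}

print(greedy_selection(scores, 2))              # ['geneA', 'geneB']
print(diversity_selection(scores, cluster_of))  # ['geneA', 'geneC']
```

In this toy example the greedy panel draws both genes from the same co-expression cluster, whereas the diversity-based panel covers both clusters.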

SARS-CoV-2 Therapeutic

[0070] As disclosed herein, methods involve predicting responders and/or non-responders to a SARS-CoV-2 therapeutic, an example of which is an antibody against SARS-CoV-2. Examples of antibodies against SARS-CoV-2 are disclosed in US Patent No. 11,168,128, which is hereby incorporated by reference in its entirety. In particular embodiments, the SARS-CoV-2 therapeutic is Sotrovimab (XEVUDY). As used herein, Sotrovimab is an antibody that comprises a heavy chain variable domain (VH) comprising a CDRH1, a CDRH2, and a CDRH3, and a light chain variable domain (VL) comprising a CDRL1, a CDRL2, and a CDRL3. Exemplary CDRH1, CDRH2, CDRH3, CDRL1, CDRL2, and CDRL3 sequences of Sotrovimab are shown below in Table 1A. Exemplary VH and VL sequences of Sotrovimab are shown below in Table 1B. Exemplary heavy chain and light chain sequences of Sotrovimab are shown below in Table 1C.

Assays

[0071] As shown in FIG. 1, the system environment 100 involves implementing a biomarker quantification assay 120 for determining quantitative data for one or more biomarkers. Examples of an assay (e.g., biomarker quantification assay 120) for one or more markers include DNA assays, microarrays, polymerase chain reaction (PCR), RT-PCR, Southern blots, Northern blots, antibody-binding assays, enzyme-linked immunosorbent assays (ELISAs), flow cytometry, protein assays, Western blots, nephelometry, turbidimetry, chromatography, mass spectrometry, immunoprecipitation, and immunoassays, including, by way of example and not limitation, RIA, immunofluorescence, immunochemiluminescence, immunoelectrochemiluminescence, or competitive immunoassays. The information from the assay can be quantitative and sent to a computer system. The information can also be qualitative, such as observing patterns or fluorescence, which can be translated into a quantitative measure by a user or automatically by a reader or computer system.

[0072] In various embodiments, the assay can be any one of RT-qPCR (quantitative reverse transcription polymerase chain reaction), qPCR (quantitative polymerase chain reaction), PCR (polymerase chain reaction), RT-PCR (reverse transcription polymerase chain reaction), SDA (strand displacement amplification), RPA (recombinase polymerase amplification), MDA (multiple displacement amplification), HDA (helicase dependent amplification), LAMP (loop-mediated isothermal amplification), RCA (rolling circle amplification), NASBA (nucleic acid sequence-based amplification), and any other isothermal or thermocycled amplification reaction. In particular embodiments, the assay is an RT-qPCR assay or a LAMP assay. For example, in a critical care setting where a classification and therapy recommendation is to be rapidly developed for a patient (e.g., within 30 minutes or within 2 hours), the assay can be an RT-qPCR assay or a LAMP assay that enables rapid quantification of the biomarkers in a sample obtained from the patient.

[0073] In various embodiments, the biomarker quantification assay 120 involves performing sequencing to obtain sequence reads (e.g., sequence reads for generating a sequencing library). The sequence reads can be quantified to determine quantitative expression values of biomarkers. Sequence reads can be obtained with commercially available next generation sequencing (NGS) platforms, including platforms that perform any of sequencing by synthesis, sequencing by ligation, pyrosequencing, using reversible terminator chemistry, using phospholinked fluorescent nucleotides, or real-time sequencing. As an example, amplified nucleic acids may be sequenced on an Illumina MiSeq platform. In particular embodiments, the biomarker quantification assay 120 involves performing whole transcriptome sequencing to determine expression values of biomarkers across the whole transcriptome.

[0074] In pyrosequencing, NGS library fragments are clonally amplified in situ by capturing single template molecules on beads coated with oligonucleotides complementary to the adapters. Each bead carrying a single type of template is compartmentalized in a water-in-oil droplet, and the template is clonally amplified by emulsion PCR. After amplification, the emulsion is broken and the beads are deposited into individual wells of a picotiter plate, which acts as a flow cell during the sequencing reactions. Each of the four dNTP reagents is introduced into the flow cell in an ordered, repeated sequence in the presence of sequencing enzymes and a luminescent reporter, such as luciferase. When the appropriate dNTP is added to the 3' end of the sequencing primer, the resulting ATP produces a flash of luminescence within the well, which is recorded using a CCD camera. Read lengths of 400 bases or more can be achieved, and 10^6 sequence reads can be obtained, yielding up to 500 million base pairs (megabases) of sequence. Additional details for pyrosequencing are described in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 6,210,891; US patent No. 6,258,568; each of which is hereby incorporated by reference in its entirety.

[0075] On the Solexa/Illumina platform, sequencing data is produced in the form of short reads. In this method, fragments of an NGS library are captured on the surface of a flow cell that is coated with oligonucleotide anchor molecules. An anchor molecule is used as a PCR primer, but due to the length of the template and its proximity to other nearby anchor oligonucleotides, extension by PCR causes the molecule to arch over and hybridize with a neighboring anchor oligonucleotide, forming a bridge structure on the surface of the flow cell. These DNA loops are denatured and cleaved. The forward strands are then sequenced using reversible dye terminators. The sequence of incorporated nucleotides is determined by detecting fluorescence after incorporation, where each fluorophore and blocking group is removed prior to the next dNTP addition cycle. Additional details for sequencing using the Illumina platform are found in Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; US patent No. 6,833,246; US patent No. 7,115,400; US patent No. 6,969,488; each of which is hereby incorporated by reference in its entirety.

[0076] In various embodiments, immunoassays designed to quantitate markers, including multiplex assays, can be used in screening. Measuring the concentration of a target marker in a sample or fraction thereof can be accomplished by a variety of specific assays. For example, a conventional sandwich type assay can be used in an array, ELISA, RIA, etc. format. Other immunoassays include Ouchterlony plates that provide a simple determination of antibody binding. Additionally, Western blots can be performed on protein gels or protein spots on filters, using a detection system specific for the markers as desired, conveniently using a labeling method.

[0077] Protein-based analysis, using an antibody that specifically binds to a polypeptide (e.g., a marker), can be used to quantify the marker level in a test sample obtained from a subject. In various embodiments, an antibody that binds to a marker can be a monoclonal antibody. In various embodiments, an antibody that binds to a marker can be a polyclonal antibody. For multiplex analysis of markers, arrays containing one or more marker affinity reagents (e.g., antibodies) can be generated. Such an array can be constructed comprising antibodies against markers. Detection can utilize one or a panel of marker affinity reagents, e.g., a panel or cocktail of affinity reagents specific for one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty or more markers.

Examples

Example 1: Gene Panel of 10 biomarkers is Predictive of Severe Infectious Disease and of Response to Sotrovimab

[0078] Identified in these Examples are laboratory and molecular correlates of severe disease and of effective response to sotrovimab, a monoclonal antibody that is available under emergency use and other authorizations for treatment of symptomatic infection in high-risk adult outpatients. Here, “high-risk” patients are those participants that are at a particular risk of hospitalization within the study population in COMET-ICE. This study analyzed 1057 adults in the COMET-ICE clinical trial, which randomized participants 1:1 to receive a single 500 mg dose of sotrovimab or placebo. Several laboratory parameters, including the neutrophil-lymphocyte ratio and whole blood transcriptome profiles (in n=304), identified subjects at risk for disease progression. Response to sotrovimab treatment in the high-risk group was associated with a rapid decrease of SARS-CoV-2 viral load and normalization of the transcriptome profile. Quantifiable clinical laboratory markers of risk for disease progression were identified. Furthermore, normalization of these parameters occurs with antibody treatment of established infection. These data indicate that systemic immune responses discriminate both risk for disease progression and response to one therapeutic intervention, indicating that reliance on measures in the upper respiratory tract alone does not fully capture events related to antibody treatment of coronavirus disease.

Methods

[0079] Characteristics of clinical trial population. COMET-ICE included 1057 adults with a positive polymerase-chain-reaction or antigen SARS-CoV-2 test result and onset of symptoms within the prior 5 days (Table 2A). Screening was performed within 24 hours before drug administration. Patients were required to be at high risk for COVID-19 progression using previously identified clinical parameters. High risk was defined as older adults (age ≥55 years) or adults with at least one of the following risk factors: diabetes requiring medication, obesity (body-mass index >30 kg/m²), chronic kidney disease (estimated glomerular filtration rate <60 mL/min/1.73 m²), congestive heart failure (New York Heart Association class II or higher), chronic obstructive pulmonary disease, or moderate to severe asthma. Patients with already severe COVID-19, defined by shortness of breath at rest, oxygen saturation less than 94%, or requiring supplemental oxygen, were excluded. Participants were randomized 1:1 to receive either a single 500-mg infusion of sotrovimab or equal volume saline placebo on day 1 after diagnosis. A subset of participants (n=304) consented to peripheral whole blood transcriptome analysis. Participants who opted in to the transcriptome sub-study had similar demographic, clinical and laboratory characteristics to those in the study. They were evenly divided between placebo and sotrovimab arms (Table 2A). In-person study visits occurred on days 1, 8, 15, 22 (W3), and 29 (W4) to assess adverse events and worsening of COVID-19. During study visits, blood samples were collected for routine laboratory assessments. Samples for transcriptome analysis were collected twice: at the time of treatment (referred to as Day 1 herein) and a week later at the day 8 visit.

[0080] Clinical data analysis. The associations between laboratory values, and treatment response and hospitalization were measured using the area under the receiver operating characteristics (AUROC) curve. This metric was computed by directly ranking patients with no model fitting step. Rankings were produced in both ascending and descending order. The order that gave the highest AUROC was chosen as the preferred ranking. For binary variables such as baseline risk factors, significance was assessed by Fisher’s exact test. Assessing complementarity of features was complicated by varying missingness patterns, leading to sample size loss. This was minimized by looking at only pairs of variables, and by imputation of missing values. Neither approach significantly improved on the single most predictive variable.
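The direct-ranking AUROC described above, evaluated in both ascending and descending order, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, ties are counted as one half (a common convention that the text does not specify), and the example values are invented.

```python
def auroc(values, labels):
    """AUROC from direct ranking, with no model fitting.

    Equals the probability that a randomly chosen positive has a higher
    value than a randomly chosen negative; ties count as 0.5 (assumed).
    """
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_direction_auroc(values, labels):
    """Return the AUROC for whichever ranking order scores higher."""
    ascending = auroc(values, labels)
    descending = 1.0 - ascending  # reversing the ranking flips the AUROC
    if ascending >= descending:
        return ascending, "higher"
    return descending, "lower"


# Invented example: lower values go with the outcome (label 1).
print(best_direction_auroc([4.0, 3.0, 2.0, 1.0], [0, 0, 1, 1]))  # (1.0, 'lower')
```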

[0081] RNA isolation and sequencing. Peripheral whole blood was collected into Paxgene Blood RNA tubes (PreAnalytiX) and stored according to manufacturer recommendations. RNA purification, library preparation and sequencing were performed by Q2 Solutions - EA Genomics (Morrisville, NC). Total RNA was isolated and was depleted of globin mRNA using the GLOBINclear kit (Invitrogen). RNA quantity and quality were assessed using an Agilent Bioanalyzer. The globin-depleted RNA was used to generate a sequencing library using the TruSeq stranded mRNA method (Illumina). Briefly, poly(T) oligonucleotides are used to select poly-adenylated RNAs from the total RNA after globin reduction; these RNAs are then fragmented and converted to cDNA using random primers in two steps, so that strand-specific information is maintained once sequencing adapters are ligated. Sequencing depth was at least 25 million paired-end read clusters per sample with a minimum 50 base pair read length.

[0082] RNA-seq analysis. Library and sequencing quality metrics were assessed with FASTQC (v. 0.11.8) and summarized with MultiQC (v. 1.7) following read trimming and alignment steps listed below. Low quality bases and adapters were clipped from sequenced reads using Trimmomatic (v. 0.39) and trimmed reads shorter than 30 bp were removed. Trimmed sequenced reads per library were then aligned to a custom reference combining the human reference transcriptome (GRCh38, release X from Gencode) with the SARS-CoV-2 reference transcriptome (ASM985889v3 version, Ensembl) and quantified using Salmon (v. 1.0.0). Transcript-level counts from Salmon were converted to gene-level counts using tximport (v. 1.20.0). Using DESeq2 (v. 1.32.0), variance-stabilizing transformation was applied to gene-level counts prior to dimension reduction with principal components analysis (PCA) and uniform manifold approximation and projection (UMAP). The QC involved assessing 3' bias/gene body coverage. Furthermore, duplication rate versus reads/kbp was assessed and low read counts were verified to not be associated with high duplication at library level. Only genes with at least 10 read counts in at least 4% (n=24) of the samples were considered for further analysis (n = 23,540 genes).
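The gene-level expression filter described above (at least 10 read counts in at least 24 samples) can be sketched as follows. This is a minimal illustration only: the function name `filter_genes`, the dictionary layout, and the toy counts are assumptions, and the thresholds are exposed as parameters so that the small example can use smaller values.

```python
def filter_genes(counts, min_count=10, min_samples=24):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps a gene identifier to its list of per-sample read counts
    (hypothetical layout).
    """
    kept = []
    for gene, per_sample in counts.items():
        n_expressed = sum(c >= min_count for c in per_sample)
        if n_expressed >= min_samples:
            kept.append(gene)
    return kept


# Toy counts for three genes across five samples; the sample threshold is
# lowered to 2 for this small example.
toy_counts = {
    "geneA": [0, 12, 30, 2, 11],
    "geneB": [0, 0, 15, 1, 3],
    "geneC": [9, 9, 9, 9, 9],
}
print(filter_genes(toy_counts, min_count=10, min_samples=2))  # ['geneA']
```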

[0083] Data analysis transcriptome. For exploration of transcriptome signatures, UMAP was run on variance-stabilizing transformed RNA-seq count data. Prior to UMAP projection, data were pre-conditioned and de-noised using PCA. The first 20 PCs were selected based on the point at which explained variance tended towards zero. Specifically, FIGs. 2A and 2B show (A) the explained variance of each principal component and (B) the cumulative explained variance, summing from the first to Nth component, where N is denoted on the x-axis.
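The two quantities plotted in FIGs. 2A and 2B can be computed from per-component variances as in the following minimal sketch. The function name and the example variances are hypothetical; the sketch assumes the supplied list covers all principal components, so that the ratios sum to one.

```python
from itertools import accumulate


def explained_variance(component_variances):
    """Per-component and cumulative explained-variance ratios.

    `component_variances` lists the variance captured by each principal
    component, ordered from the first component onward.
    """
    total = sum(component_variances)
    ratios = [v / total for v in component_variances]
    cumulative = list(accumulate(ratios))
    return ratios, cumulative


# Invented variances for four components.
ratios, cumulative = explained_variance([4.0, 3.0, 2.0, 1.0])
print(ratios)      # per-component share of total variance
print(cumulative)  # running sum from the first to the Nth component
```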

[0084] For this baseline analysis, UMAP (from the umap-learn python package) was run with default parameters. To test the robustness of the embedding, this analysis was repeated for the full transcriptome (without PCA), for pathogen-associated transfer genes identified by di Iulio J, et al. Transfer transcriptomic signatures for infectious diseases. Proc Natl Acad Sci U S A 2021; 118, and for immune-related pathway gene sets (Hallmark Gene Sets annotation) (Liberzon A, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015; 1:417-25.). In each case, a variety of nearest neighbor values were tried, and the embedding was run multiple times to ensure repeatability. In all cases, the embeddings were similar. For example, the observed gradient between Day 1 and Day 8 samples, as well as the relative placement of the hospitalized patients, were always consistent.

[0085] High risk vs low risk categorizations were derived as follows. Two-dimensional kernel density estimation with a bandwidth of 1 was applied to Day 1 and Day 8 UMAP values separately. High risk patients were defined as those within an area where the Day 1 density exceeded the Day 8 density by 0.005. This cutoff was derived by choosing a round positive number near the beginning of the tail of the distribution (FIG. 2C and FIG. 2D). Specifically, FIG. 2C shows the distribution of density differences between Day 1 and Day 8 for each patient. The red line denotes the chosen cutoff for defining a high risk group. FIG. 2D is a visualization of which patients are defined as high risk according to this cutoff. Additionally, a line search was performed on this cutoff to optimize for the separation between Day 1 and Day 8, as measured by Fisher’s exact p-value. This yielded an optimal cutoff of 0.006. To be conservative, this optimized value was not used since the selection of cutoffs for the line search could be influenced by information beyond Day 1 vs Day 8 status.
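The density-difference rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes an isotropic Gaussian kernel with an absolute bandwidth of 1 (the kernel and the exact meaning of the bandwidth are not specified in the text), and the function names and UMAP coordinates are hypothetical.

```python
import math


def kde2d(point, samples, bandwidth=1.0):
    """Two-dimensional Gaussian kernel density estimate at `point`."""
    x, y = point
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))
    return norm * sum(
        math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * bandwidth ** 2))
        for sx, sy in samples
    )


def is_high_risk(point, day1_points, day8_points, cutoff=0.005):
    """True where the Day 1 density exceeds the Day 8 density by `cutoff`."""
    return kde2d(point, day1_points) - kde2d(point, day8_points) > cutoff


# Invented UMAP coordinates: Day 1 samples sit low, Day 8 samples sit high.
day1 = [(0.0, 0.0), (0.5, 0.0)]
day8 = [(5.0, 5.0), (5.5, 5.0)]
print(is_high_risk((0.0, 0.0), day1, day8))  # True
print(is_high_risk((5.0, 5.0), day1, day8))  # False
```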

[0086] Differentially expressed genes associated with the high-risk cluster were scored using a model accounting for subject gender and visit day (DESeq2, v. 1.32.0). Differentially expressed genes were characterized via gene set enrichment analysis (fgsea, v. 1.18.0) using the Hallmark Gene Sets annotation (msigdbr, v. 7.4.1). For selection of a gene signature, a diversity-based selection was conducted according to top ANOVA F-scores within 10 empirically identified gene clusters. Gene clusters were derived by performing UMAP dimensionality reduction on the transpose of the transcriptome matrix (with genes as rows instead of patients). This creates a grouping of genes based on their co-expression patterns across individuals (FIG. 2E), and 10 gene clusters were defined using K-means clustering (FIG. 2F). Specifically, FIG. 2E depicts UMAP dimensionality reduction of genes based on transcriptomic patterns across patients. Color denotes the AUC of that gene for predicting risk cluster at Day 8. FIG. 2F shows K-means clustering of genes into 10 groups. From each cluster, the gene most associated with risk according to ANOVA F-score was selected. Diversity-based selection using gene clustering significantly improved on greedy selection based on F-score alone (FIG. 3) and yielded comparable performance to transfer-learned and knowledge-based gene sets (FIGs. 4A and 4B). Specifically, FIG. 3 shows AUC (AUROC) for predicting the high-risk cluster using a varying number of genes. The blue line denotes selecting the top genes based on F-score. The orange line shows performance when the top F-scores are taken uniformly from each cluster. FIGs. 4A and 4B show comparative performance of sources and number of genes of a predictive gene panel. FIG. 4A shows AUC (AUROC) as a function of the number of genes for predicting the high-risk group at day 8. Blue points and whiskers show the mean and confidence interval of the mean, respectively, for random gene sets. The grey envelope shows the 90% confidence interval for random gene performance across 100 replicates. Green points show the performance of the transfer signature gene sets published by di Iulio J, et al. Transfer transcriptomic signatures for infectious diseases. Proc Natl Acad Sci U S A 2021; 118. Orange points demonstrate the performance of gene sets from GSEA pathways. Finally, the red point denotes the performance of the selected 10-gene panel. FIG. 4B shows the top three gene sets for transfer signatures and GSEA pathways, evaluated by the margin above the upper confidence interval of random performance (equivalent to p < .05 for a one-sided significance test).

[0087] To assess performance of this set of 10 genes representing co-expression clusters, this entire process, including gene clustering, was repeated within a five-fold cross-validation loop. In this procedure the dataset is partitioned into five chunks or folds. For each of the five folds, a model was trained on the other four folds and used to predict the values of the held-out fold in an unbiased manner. To avoid overfitting due to patient-specific attributes, samples from the same patient but different days were always kept in the same fold.
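The patient-grouped fold assignment described above can be sketched as follows. This is a minimal illustration only: the function names and patient identifiers are hypothetical, and the deterministic round-robin assignment is an assumption, since the text does not state how patients were allocated to folds.

```python
def patient_folds(sample_patients, n_folds=5):
    """Assign a fold to every sample so that all samples from one patient
    (e.g., Day 1 and Day 8) land in the same fold.

    `sample_patients` lists the patient identifier of each sample.
    """
    patients = sorted(set(sample_patients))
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    return [fold_of_patient[p] for p in sample_patients]


def train_test_indices(folds, held_out):
    """Split sample indices into training and held-out sets for one fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    test = [i for i, f in enumerate(folds) if f == held_out]
    return train, test


# Six hypothetical patients, each with a Day 1 and a Day 8 sample.
samples = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4", "p5", "p5", "p6", "p6"]
folds = patient_folds(samples)
print(folds)                            # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 0, 0]
print(train_test_indices(folds, 0)[1])  # [0, 1, 10, 11]
```

Any feature selection or clustering step would then be rerun on the training indices of each fold before predicting the held-out samples.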

Results

[0088] Identification of high-risk group using clinical laboratory values. A total of 63 available laboratory variables were analyzed for their association with hospitalization (Table 2B). On day 1, white blood cell proportions were most predictive of hospitalization (AUROC 0.82-0.83, Table 3). White blood cell proportions were quantified by the percent of neutrophils, the percent of lymphocytes, and the neutrophil-to-lymphocyte ratio (NLR). On day 8, these metrics predicted hospitalization with high sensitivity and specificity (AUROC 0.97-0.98). In comparison, the AUROC of viral RNA as measured in the nasopharynx was 0.67 on day 1, and 0.73 on day 8. This indicated that simple systemic immune and inflammatory laboratory parameters were more predictive than measurement of viral RNA in swabs from the upper respiratory tract. Multiple approaches were tested to combine clinical variables, including the parameters used for subject selection in the COMET-ICE trial (supporting data folder), in a single predictive model, but none improved on using the NLR alone.

[0089] Identification of treatment response using clinical laboratory values. Given the comparable accuracy of white blood cell metrics and the use of NLR in the recent literature, the utility of NLR was analyzed for stratifying patients into high and low risk groups. The cutoff for high-risk status was chosen to optimize the association with hospitalization; an NLR greater than 6 provided the highest enrichment for disease progression (FIGs. 5A and 5B). Specifically, FIGs. 5A and 5B show the threshold selection for neutrophil lymphocyte ratio. FIG. 5A shows Fisher's exact p-value for association with hospitalization as a function of cutoff value and day. FIG. 5B is a visualization of the distribution of day 1 neutrophil lymphocyte ratio for hospitalized versus non-hospitalized patients. The NLR has been reported in multiple publications as a predictor of COVID-19 progression. For example, the NLR is further described in Ulloque-Badaracco JR, et al. Prognostic value of neutrophil-to-lymphocyte ratio in COVID-19 patients: A systematic review and meta-analysis. Int J Clin Pract 2021:e14596, and Simadibrata DM, et al. Neutrophil-to-lymphocyte ratio on admission to predict the severity and mortality of COVID-19 patients: A meta-analysis. Am J Emerg Med 2021; 42:60-9.

[0090] In the present study, the sensitivity of the NLR=6 cutoff was 38% while its specificity was 95% for progression to hospitalization; thus, it is suitable as a low sensitivity, high specificity marker for decision making. This measure was used to classify high and low-risk patients for further analysis. Viral load at baseline was higher among those with NLR>6 than those with NLR<6 (log10 viral load of 7.12 vs. 6.02, p-value 4.5e-5). Further, the effect of sotrovimab was most pronounced in the NLR>6 group (FIGs. 6A and 6B), both in terms of decreasing viral load and normalizing NLR. Specifically, FIGs. 6A and 6B show response to Sotrovimab in the high-risk group defined by laboratory parameters. The figures show the time trend of selected clinical variables for sotrovimab versus placebo treated patients (hue) in the full cohort and in the low-risk and high-risk groups as defined by neutrophil lymphocyte ratio > 6. Difference from baseline is shown for neutrophil lymphocyte ratio in FIG. 6A (max difference: 3.1 at day 8) and for log10 viral load in FIG. 6B (max difference: 0.83 log units at day 11). Sotrovimab leads to a statistically significant reduction of neutrophil lymphocyte ratio and viral RNA. At Day 5, sotrovimab-treated subjects in the NLR>6 group had decreased their mean viral load 100-fold (SD=32, N=71) while subjects in the placebo group experienced a decrease of 32-fold (SD=20, N=74; a 3-fold greater decrease in the sotrovimab group). In summary, peripheral blood counts (i.e., NLR), which are standardly collected in many patient encounters, can be used to identify a population of individuals at elevated risk of progression and high viral load. These parameters normalized rapidly in those individuals receiving sotrovimab.
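The sensitivity and specificity of a fixed NLR cutoff can be computed as in the following minimal sketch. The function name and the six example patients are hypothetical and do not reproduce the study data; the sketch assumes that patients with NLR strictly greater than the cutoff are flagged as high risk and that both outcome classes are present.

```python
def sensitivity_specificity(nlr_values, hospitalized, cutoff=6.0):
    """Sensitivity and specificity of flagging patients with NLR > cutoff."""
    tp = fn = tn = fp = 0
    for nlr, event in zip(nlr_values, hospitalized):
        flagged = nlr > cutoff
        if event and flagged:
            tp += 1  # hospitalized and flagged as high risk
        elif event:
            fn += 1  # hospitalized but not flagged
        elif flagged:
            fp += 1  # flagged but not hospitalized
        else:
            tn += 1  # neither flagged nor hospitalized
    return tp / (tp + fn), tn / (tn + fp)


# Invented values for six patients (1 = hospitalized, 0 = not hospitalized).
sens, spec = sensitivity_specificity(
    [8.0, 7.0, 3.0, 2.0, 9.0, 1.0], [1, 1, 1, 0, 0, 0]
)
print(round(sens, 2), round(spec, 2))  # 0.67 0.67
```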

[0091] The role of SARS-CoV-2 serostatus in defining risk of progression of COVID-19. Having SARS-CoV-2 anti-nucleocapsid antibodies provides protection against SARS-CoV-2 re-infection. There is more limited information, however, on how serostatus may associate with severity of disease during acute infection. Specific to the current study, seropositivity at baseline may indicate prior infection by SARS-CoV-2, or that a patient is already seroconverting during an acute infection episode. Seropositivity rates varied significantly by race and ethnicity (chi-squared p=2e-8), with the lowest rate observed in Whites (8.7%) and the highest rate in Latinos (27.5%); Table 4. Seropositivity was also associated with lower viral RNA at baseline (mean 4.2 vs 6.4 log10 viral RNA in seropositive versus seronegative patients; Mann-Whitney U p=1e-36) and with other laboratory parameters associated with lesser clinical severity.

[0092] Of 202 seropositive patients at baseline, 6 (3%) were hospitalized or died, of which 4/96 (4.2%) corresponded to the placebo arm and 2/106 (1.9%) to the sotrovimab treatment arm. Of 740 seronegative patients at baseline, 30 (4.1%) were hospitalized or died, of which 26 (87%) corresponded to the placebo arm and 4 (13%) to the sotrovimab treatment arm. No seropositive or sotrovimab-treated patients died or were admitted to the ICU. This is compared to 4 deaths (1%) and 9 ICU admissions (2.4%) among 374 seronegative patients in the placebo arm. Serology status at baseline was not significantly associated with all-cause hospitalization or death (Fisher’s exact p=0.67).

[0093] Identification of high-risk group using transcriptomics. Because of the limited sensitivity of NLR, despite its high specificity, for the identification of patients at highest risk of hospitalization, whole blood transcriptomics was used to define additional predictors of disease progression and response to treatment. In theory such signatures could provide complementary insight into the biology of risk and recovery, while providing concordant information for understanding and extending insights from white blood cell proportions (FIGs. 7A and 7B). Specifically, FIGs. 7A and 7B show lymphocyte (FIG. 7A) and neutrophil (FIG. 7B) proportions estimated from deconvolution of transcriptomics data using the ABIS matrix optimized for deconvolution of PBMC RNA-seq.

[0094] This dataset consisted of 304 subject samples collected prior to treatment on day 1 and at day 8. The transcriptomes of each subject were visualized using UMAP, which identified parameters (UMAP1 and UMAP2) in the total transcriptomic data that explain a proportion of the data variance. From day 1 to day 8, the distribution of all transcriptome profiles tended to shift towards higher values of UMAP component 2. Additionally, all six hospitalized patients on placebo failed to make this transition. Specifically, FIGs. 8A and 8B show the response to sotrovimab in the high-risk group defined by transcriptome profile. FIG. 8A shows a UMAP projection of transcriptomic profiles across all Day 1 (pink) and Day 8 (green) samples. The high-risk group is denoted by the dashed line box. FIG. 8B shows that the high-risk group defined in FIG. 8A has a higher viral load at both day 1 and day 8. Here, the area outlined in the box in FIG. 8A was hypothesized to correspond to a UMAP-defined high-risk group where protective responses had failed to progress between day 1 and day 8. As a cross-check, the putatively high-risk transcriptomic group was significantly higher in viral load at both day 1 and day 8 (FIG. 8B). This risk stratification identified hospitalizations with a sensitivity of 85% and a specificity of 53%, thus making it a viable option for high-sensitivity, low-specificity prediction of hospitalization at day 1.
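As an illustration of the Fisher's exact test applied in paragraph [0092] to baseline serostatus and all-cause hospitalization or death, the following minimal sketch computes a two-sided p-value from a 2x2 table using only the Python standard library. The function name and the choice of two-sided definition (summing all tables no more probable than the observed one) are assumptions for illustration; the source does not state which software or convention was used, so the printed value need not reproduce the reported p-value exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    total = sum(p for p in map(prob, range(k_min, k_max + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Counts from paragraph [0092]: rows are serostatus (positive, negative),
# columns are outcome (hospitalized or died, neither).
p_value = fisher_exact_two_sided(6, 202 - 6, 30, 740 - 30)
print(f"two-sided p = {p_value:.2f}")
```

The same function could be applied to any of the other 2x2 comparisons in this section once the four cell counts are available.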

[0095] The transcriptomes of each patient were visualized using UMAP. From Day 1 to Day 8, the distribution of all transcriptome profiles tended to shift towards higher values of UMAP component 2 (FIG. 8C). A putative risk cluster was defined based on the differences in the distributions of Day 1 and Day 8 samples in the UMAP. The described risk cluster includes Day 1 and Day 8 transcriptomics profiles for 6 of 8 hospitalized patients (FIG. 8D). Patients in the high-risk cluster were significantly older and more likely to be white, with a higher NLR and higher viral RNA levels in respiratory samples (Table 5). The cluster analysis also highlighted that baseline seropositive patients were less likely to be associated with the high-risk transcriptome cluster on Day 1, and no seropositive patient remained in the high-risk cluster by Day 8 (FIG. 8E). Specifically, FIG. 8C shows a 2D kernel density, presented as a contour plot, highlighting the distribution of transcriptomics profiles in UMAP by visit day. FIG. 8D shows that a threshold on the density difference between Day 1 and Day 8 distributions defines a high-risk cluster (red fill) which encompasses Day 1 and Day 8 transcriptomics profiles for 6 of 8 hospitalized patients. FIG. 8E shows Day 1 and Day 8 distributions of baseline seropositive patients (n=69).
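The density-difference thresholding described for FIGs. 8C and 8D can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the isotropic Gaussian kernel, the bandwidth, the threshold value, the sign convention (Day 1 density minus Day 8 density), and the toy coordinates are all hypothetical, and the function names are introduced here for illustration only.

```python
import math

def gaussian_kde_2d(points, x, y, bandwidth):
    """Isotropic Gaussian kernel density estimate at (x, y)."""
    norm = 2.0 * math.pi * bandwidth ** 2 * len(points)
    total = 0.0
    for px, py in points:
        sq_dist = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / norm

def in_risk_cluster(point, day1_points, day8_points,
                    bandwidth=0.5, threshold=0.0):
    """True where Day 1 density exceeds Day 8 density by more than threshold.

    Assumption: the risk region is the part of the embedding enriched for
    Day 1 profiles relative to Day 8 profiles.
    """
    x, y = point
    diff = (gaussian_kde_2d(day1_points, x, y, bandwidth)
            - gaussian_kde_2d(day8_points, x, y, bandwidth))
    return diff > threshold

# Toy 2D embedding coordinates (made up, not study data)
day1 = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.3), (0.1, 2.0)]
day8 = [(0.0, 2.1), (0.3, 1.9), (-0.2, 2.2), (0.1, 0.1)]

print(in_risk_cluster((0.0, 0.0), day1, day8))  # region dominated by Day 1
print(in_risk_cluster((0.0, 2.0), day1, day8))  # region dominated by Day 8
```

In this sketch a profile is assigned to the risk cluster according to where it falls in the embedding, regardless of its visit day, which is consistent with the description that the cluster contains both Day 1 and Day 8 profiles.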

[0096] The two hospitalized patients mis-identified by the baseline transcriptome analysis were in the sotrovimab arm. One of the two patients had undetectable viral RNA in the respiratory tract at enrollment, and through 8 days post-enrollment when blood was drawn for the transcriptome analysis. This patient on sotrovimab was then hospitalized by Day 21 with elevated viral load. The second misidentified patient on sotrovimab was hospitalized due to a small intestinal obstruction deemed unrelated to COVID-19. This supports the hypothesis that the outlined area in FIG. 8 corresponded to a UMAP-defined high-risk cluster where protective responses had failed to engage appropriately between Day 1 and Day 8. Although statistical power was limited due to only 8 hospitalizations in the transcriptomic sub-study, the transcriptome high-risk group demonstrated a suggestive association with all-cause hospitalization and death (Fisher’s exact p=0.058), with a sensitivity of 75% [41%-94%] and a specificity of 63% [57%-68%].
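The sensitivity figure above follows from 6 of 8 hospitalized patients falling in the high-risk cluster. The sketch below shows how such a proportion and an approximate 95% interval could be computed. The Wilson score interval is an assumption chosen for illustration; the source does not state which interval method produced the bracketed ranges, so the output need not match them exactly. Specificity would be computed the same way from the counts of non-hospitalized patients, which are not listed in this passage.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# Sensitivity: hospitalized patients captured by the high-risk cluster
true_positives = 6   # from the text: 6 of 8 hospitalized patients
hospitalized = 8
sensitivity = true_positives / hospitalized
low, high = wilson_interval(true_positives, hospitalized)
print(f"sensitivity = {sensitivity:.0%}, interval = [{low:.0%}, {high:.0%}]")
```

With only 8 events the interval is necessarily wide, which is the limitation in statistical power that the paragraph notes.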

[0097] The transcriptome-derived risk cluster was significantly associated with that defined by NLR (p=0.02, FIG. 9). Specifically, FIG. 9 shows overlap between risk clusters defined by neutrophil-lymphocyte ratio versus transcriptome signatures. The Fisher's exact p-value for overlap between the two definitions was 0.018. The transcriptome-based risk stratification was associated with higher mean viral load differences between the high and low risk groups, relative to stratifying by NLR at baseline (FIGs. 10A and 10B). Specifically, FIGs. 10A and 10B show differences in viral load between high and low risk groups at Day 1 and Day 8, respectively. Risk groups are defined using either clinical variables (specifically neutrophil-lymphocyte ratio) or transcriptomics. A boxplot of the distribution is presented within each violin plot. The white dot denotes the mean.

[0098] The transcriptome-based risk cluster encompassed 6 out of 7 hospitalizations reported among participants of the transcriptome substudy. All 6 of these correctly labeled subjects received placebo. The hospitalized subject mis-identified by the transcriptome analysis received sotrovimab. The level of viral RNA in the respiratory tract for this patient was undetectable at enrollment, and through 8 days post-enrollment when blood was drawn for the transcriptome analysis.

[0099] In summary, transcriptome analysis identified a group of participants in the trial that were characterized by greater viral load, greater clinical laboratory abnormalities, and high risk of hospitalization.

[00100] Examining the biology of the transcriptome-defined high-risk cluster. Genes were scored for differential expression (high versus low-risk cluster across visits), which revealed a widespread transcriptional shift, with thousands of genes identified as differentially expressed after adjusting for multiple comparisons (FDR-adjusted p < 0.05, FIG. 11A; Table 6 shows a subset of the genes). Differentially expressed genes were characterized via gene-set enrichment analysis using the MSigDB Hallmark Gene Set annotation. The most enriched Hallmark Gene Sets were associated with the innate immune responses, in particular complement, inflammatory response, as well as the interferon alpha and gamma response gene expression modules (FIG. 11B). Specifically, FIGs. 11A and 11B show transcriptome characteristics of the high-risk group. FIG. 11A shows a summary of differential expression analysis results comparing the high-risk group to the recovery group, accounting for visit day and subject gender, shown per gene with labels for the top 10 among down-regulated (blue) and up-regulated (red) genes by statistical significance, respectively (padj < 0.05, absolute LFC > log2(1.5)). FIG. 11B shows gene set enrichment analysis results using Hallmark Gene Sets (top 10 gene sets with padj < 0 for NES > 0; padj < 0.05 for NES < 0). LFC: log fold change. NES: normalized enrichment score. padj: FDR-adjusted p-value.
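The significance filter stated for FIG. 11A (padj < 0.05 and absolute LFC > log2(1.5)) can be sketched as follows. The Benjamini-Hochberg procedure is used here as an assumed FDR adjustment, since the source says only "FDR-adjusted"; the gene names, p-values, and log fold changes are hypothetical placeholders rather than study results.

```python
import math

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def is_differentially_expressed(padj, lfc, alpha=0.05, min_fold=1.5):
    """Apply the thresholds padj < alpha and |LFC| > log2(min_fold)."""
    return padj < alpha and abs(lfc) > math.log2(min_fold)

# Hypothetical per-gene results (not taken from Table 6)
raw_p = {"GENE_A": 0.005, "GENE_B": 0.01, "GENE_C": 0.03, "GENE_D": 0.04}
lfc = {"GENE_A": 1.2, "GENE_B": -0.9, "GENE_C": 0.3, "GENE_D": -0.7}

names = list(raw_p)
padj = dict(zip(names, benjamini_hochberg([raw_p[g] for g in names])))
hits = [g for g in names if is_differentially_expressed(padj[g], lfc[g])]
print(hits)
```

The fold-change threshold removes genes such as the hypothetical GENE_C, whose adjusted p-value passes but whose effect size is small.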

[00101] In summary, whole transcriptome analysis classifies nearly half of patients on day 1 as being at high risk of disease progression, a finding that is further supported by the lack of normalization of the high-risk transcriptome profile in individuals who were subsequently hospitalized.

[00102] Response to treatment identified by transcriptomics. The proportion of high-risk patients at day 1, as defined by transcriptome analysis, was similar for the placebo arm (N=71, 46%, of which 6 had progressive disease and were hospitalized) and for the sotrovimab arm (N=74, 49%, 1 hospitalization). Given the effect of sotrovimab on progression demonstrated in COMET-ICE, the next step involved determining whether treatment altered the probability of remaining in the transcriptome-defined high-risk group at day 8. To perform this analysis, the rate of exiting the high-risk cluster was compared between sotrovimab and placebo (FIGs. 12A-12C).

[00103] Specifically, FIGs. 12A-12C show UMAP projections of transcriptomic profiles faceted by day and colored by treatment status for the (A) full and (B) transcriptome high-risk cohorts. Risk is defined by transcriptome profile at baseline (d1) and denoted by the red box. Hospitalized patients are circled in red. (C) Risk groups are predictive of viral RNA concentration in respiratory secretions. Depicted are differences in viral load between placebo and sotrovimab for the full cohort, as well as the low-risk transcriptome (Tx) and high-risk Tx subcohorts.

[00104] Across all strata, sotrovimab leads to a statistically significant reduction of viral RNA (d5). By day 8, 37% (placebo) versus 7% (sotrovimab) of subjects remained high risk as defined by the transcriptome analysis. This corresponds to an 81% decrease in the expression of risk-correlated transcriptional signatures for sotrovimab relative to placebo (p-value=1e-5). This effect size is comparable to that observed for protection against hospitalization. This suggests that the reversion of phenotypes in the peripheral blood from a high-risk profile to a low-risk profile might be mechanistically related to the effects of the antibody on viral infection.
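The 81% figure is consistent with a relative reduction computed from the two day 8 proportions (37% on placebo versus 7% on sotrovimab). A short worked check, with a function name introduced here for illustration:

```python
def relative_reduction(control_rate, treated_rate):
    """Relative reduction of a rate in the treated arm versus control."""
    return 1.0 - treated_rate / control_rate

# Proportion of subjects remaining in the high-risk group at day 8
placebo_rate = 0.37
sotrovimab_rate = 0.07
print(f"{relative_reduction(placebo_rate, sotrovimab_rate):.0%}")  # prints 81%
```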

[00105] Identifying a set of genes whose expression captures the risk-defining elements of the overall transcriptome. Having established a transcriptomic profile relevant to risk, recovery, and treatment response, the number of required genes was reduced to a number that could be measured practically by RT-PCR. Such an approach is preferable for clinical applications due to lower cost and greatly reduced turnaround time relative to whole transcriptome sequencing. To select a gene panel in the present work, genes were clustered according to their expression patterns across patients (FIG. 2F) into 10 groups using UMAP and K-means clustering. The top gene from each group was selected. The 10-gene panel (CD38, DAB2, EFHC2, EIF2D, EIF4B, MYO18A, NUDT3, OAS2, RPL10, TADA3) was able to accurately recapitulate the whole transcriptome risk clusters at both day 1 (AUROC=0.94) and day 8 (AUROC=0.98; FIG. 13A). The expression of each gene in the panel is shown in FIGs. 13B-13E. Specifically, FIGs. 13A-13E show surrogates to predict both risk of COVID-19 disease and response to sotrovimab using a 10-gene panel. FIG. 13A shows results of a cross-validation in which the 10-gene panel accurately predicts risk groups assigned by the full transcriptome, at both day 1 and day 8. FIG. 13B shows changes in expression for each of the genes in the 10-gene panel from day 1 to day 8. FIG. 13C shows performance of each of the genes in the same 10-gene panel to track the response (change in expression from day 1 to day 8) to sotrovimab versus placebo. FIG. 13D shows changes in expression for each of the genes in the 10-gene panel from day 1 to day 8. FIG. 13E shows performance of each of the genes in the same 10-gene panel to track the response (change in expression from day 1 to day 8) to sotrovimab versus placebo.
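The AUROC values quoted above measure how well a panel-derived score ranks samples that the full transcriptome assigns to the high-risk cluster above those it assigns to the low-risk cluster. A minimal sketch of that computation follows, using the pairwise (Mann-Whitney) formulation. The scores and labels are toy values, not study data, and the way the panel produces a per-sample score (for example, a cross-validated classifier) is not specified here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (high risk), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: hypothetical panel scores against full-transcriptome labels
panel_scores = [0.1, 0.4, 0.35, 0.8]
full_tx_labels = [0, 0, 1, 1]
print(auroc(panel_scores, full_tx_labels))  # prints 0.75
```

An AUROC of 1.0 would mean the panel score separates the two transcriptome-defined groups perfectly, and 0.5 would mean it ranks them no better than chance.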

[00106] Expression of the 10-gene panel was highly associated with viral load and hospitalization, and strongly affected by sotrovimab (FIGs. 14A-14C). Specifically, FIGs. 14A-14C show surrogates of risk and recovery using a 10-gene panel. The 10-gene panel-derived risk groups separate patients by viral load at both day 1 and day 8. FIG. 14A shows that the cross-validated, panel-derived risk assignments were associated with significantly higher viral load at both day 1 and day 8 (p<1e-5). FIG. 14B shows that 11/14 samples from hospitalized patients were classified as high risk (p=1.5e-4). FIG. 14C shows that at day 8, 16% of patients in the placebo group remained in the panel-derived risk group, compared to 5% in the sotrovimab group, corresponding to a risk reduction of 69% (p=2e-3). On the transcriptomic subset, the 10-gene risk stratification had a sensitivity of 88% [55%-98%] and a specificity of 64.5% [59.0%-69.8%] for the prediction of all-cause hospitalization and death (Fisher’s exact p-value=0.005). A comparison of this performance across risk predictors is presented in Table 7.

[00107] Additionally, the 10-gene transcriptomic profile was further validated in an independent validation cohort. Specifically, FIG. 15A shows the expression levels for each of the 10 panel genes for high-risk versus low-risk patients, as defined by the full transcriptome. Additionally, FIG. 15B shows the same 10 genes in an independent validation cohort. With the exception of MYO18A, the trend for each gene between high and low risk in the study matches the trends with symptom severity in an independent validation cohort reported in Hu, et al., “Early immune responses have long-term associations with clinical, virologic, and immunologic outcomes in patients with COVID-19.” Res. Sq, (2022), which is hereby incorporated by reference in its entirety. In summary, a ten-gene panel enables one to envision practical application of transcriptome-based risk stratification and monitoring in a clinical setting. These observations indicate that such tools not only define patients at highest risk of COVID-19 progression but could constitute actionable surrogate markers of response to treatment.

Discussion

[00108] The present study identifies subgroups of COMET-ICE study participants at high risk of progression to severe COVID-19. Here, clinical laboratory and molecular biomarkers that identify the high-risk groups are also markers of response to treatment and are associated with high viral RNA levels in the nasopharynx at baseline and lower viral RNA levels upon treatment with sotrovimab.

[00109] Measurement of peripheral blood mononuclear cell counts is frequently used and proved informative in defining risk. Among various ways to use these counts, the NLR was most informative. While specific, NLR proved to have insufficient sensitivity. In contrast, whole blood transcriptome analysis performed well in terms of sensitivity and specificity, and offered flexibility for establishing the boundaries defining the high-risk group.

[00110] The high-risk group, defined by NLR or by transcriptome, was characterized by higher levels of viral RNA in the nasopharynx at baseline. This suggests that the underlying immune and inflammatory responses of these individuals may not be sufficient to limit viral replication. If events in the lung or other organs mimic this deficiency observed in the upper respiratory tract, then this could provide a virologic explanation for progression to severe disease. In addition, in response to sotrovimab, individuals in the high-risk group experienced a more substantial reduction in viral load compared to placebo. Therefore, this work defines a set of parameters that are good surrogate markers of sotrovimab response because they are modified by treatment and strongly associated with the study clinical endpoints of interest (severity, hospitalization) at the time points analyzed. Laboratory parameters such as the NLR would be easiest for clinical use but lack sensitivity. Transcriptomic biomarkers would be optimal but are more demanding for broad use. For example, a fingerstick blood test has recently been developed which generates a "tuberculosis score" based on mRNA expression of 3 genes. This test could serve as a point-of-care triage test for tuberculosis.

[00111] Currently, the risk of developing severe COVID-19 is generally defined by demographics and by the presence or absence of specific underlying medical conditions or characteristics such as body mass index. The COMET-ICE trial recruited individuals at risk as defined by epidemiological characteristics. The data presented here contribute clinical laboratory and molecular biomarkers that identify those that are at the highest risk for disease progression as well as defining the population that benefits the most from the administration of monoclonal antibodies. This approach could expand the definition of high risk to include individuals that might otherwise not be considered for treatment on the sole basis of epidemiological and demographic risk criteria that are currently in use. The data do not bear on this question, since the trial only enrolled those considered at risk according to current criteria, but severe COVID-19 does affect those across various risk strata that were not recruited in the COMET-ICE trial.