
Title:
COMPOSITION FOR USE IN THE TREATMENT OF COVID-19
Document Type and Number:
WIPO Patent Application WO/2022/195558
Kind Code:
A1
Abstract:
Compositions and methods for treating a SARS-CoV-2 infection are provided. The disclosed compositions and methods are based on interaction pathways and inhibition of serine 206 phosphorylation of the Nucleocapsid (N) protein of SARS-CoV-2. The compositions include one or more compounds capable of inhibiting phosphorylation of the Nucleocapsid serine 206. The amount of the one or more compounds can be effective to, for example, reduce viral replication, reduce one or more symptoms of a disease, disorder, or illness associated with the virus, or a combination thereof. Methods for identifying a subject as having an elevated risk of developing one or more symptoms associated with severe COVID-19 are also provided.

Inventors:
SHUAIB MUHAMMAD (SA)
MOURIER TOBIAS (SA)
PAIN ARNAB (SA)
Application Number:
PCT/IB2022/052501
Publication Date:
September 22, 2022
Filing Date:
March 18, 2022
Assignee:
UNIV KING ABDULLAH SCI & TECH (SA)
International Classes:
A61K31/00; A61K31/192; A61K31/5377; A61P31/14; G01N33/50
Foreign References:
US20170252449A12017-09-07
Other References:
RUDD CHRISTOPHER E. ET AL: "GSK-3 Inhibition as a Therapeutic Approach Against SARs CoV2: Dual Benefit of Inhibiting Viral Replication While Potentiating the Immune Response", FRONTIERS IN IMMUNOLOGY, vol. 11, 26 June 2020 (2020-06-26), pages 26, XP055928867, DOI: 10.3389/fimmu.2020.01638
RANA ANIL KUMAR ET AL: "Glycogen synthase kinase-3: A putative target to combat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic", CYTOKINE & GROWTH FACTOR REVIEWS, vol. 58, 25 August 2020 (2020-08-25), GB, pages 92 - 101, XP055928930, ISSN: 1359-6101, DOI: 10.1016/j.cytogfr.2020.08.002
TUNG H Y LIM ET AL: "Mutations in the phosphorylation sites of SARS-CoV-2 encoded nucleocapsid protein and structure model of sequestration by protein 14-3-3", BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, ELSEVIER, AMSTERDAM NL, vol. 532, no. 1, 15 August 2020 (2020-08-15), pages 134 - 138, XP086270586, ISSN: 0006-291X, [retrieved on 20200815], DOI: 10.1016/J.BBRC.2020.08.024
SWAYNE LEIGH ANNE ET AL: "Consideration of Pannexin 1 channels in COVID-19 pathology and treatment", AMERICAN JOURNAL OF PHYSIOLOGY - LUNG CELLULAR AND MOLECULAR PHYSIOLOGY, vol. 319, no. 1, 1 July 2020 (2020-07-01), US, pages 121 - 125, XP055928823, ISSN: 1040-0605, DOI: 10.1152/ajplung.00146.2020
MOURIER TOBIAS ET AL: "SARS-CoV-2 genomes from Saudi Arabia implicate nucleocapsid mutations in host response and increased viral load", NATURE COMMUNICATIONS, vol. 13, no. 1, 1 February 2022 (2022-02-01), XP055928782, Retrieved from the Internet DOI: 10.1038/s41467-022-28287-8
HILLAR ET AL., ANTICANCER DRUGS, vol. 22, 2011, pages 978 - 985
GAISINA ET AL., J MED CHEM., vol. 52, 2009, pages 1853 - 1863
ANSEL, HOWARD C. ET AL.: "Pharmaceutical Dosage Forms and Drug Delivery Systems", 1995, WILLIAMS AND WILKINS
TALIK ET AL., SEPAR. PURIF. REV., vol. 41, 2012, pages 1 - 61
LAMNEWHOUSE, CHEST, vol. 98, 1990, pages 44 - 52
HAYASHI ET AL., ANTICANCER RES, vol. 25, 2005, pages 2399 - 2405
KIMURA ET AL., J. ORTHOP. SCI., vol. 14, 2009, pages 556 - 565
MONJI F ET AL., EUR J PHARMACOL., vol. 887, 15 November 2020 (2020-11-15), pages 173561
SUGIN LAL JABARIS S ET AL., PULM PHARMACOL THER, vol. 66, February 2021 (2021-02-01), pages 101978
ZHANG ET AL., SCI DATA, vol. 6, 2019, pages 278
FAJNZYLBER ET AL., NAT. COMMUN., vol. 11, 2020, pages 5493
ORGANIZATION, W. H., CORONAVIRUS DISEASE (COVID-19) WEEKLY EPIDEMIOLOGICAL UPDATE AND WEEKLY OPERATIONAL UPDATE, 2020
DONG, E.DU, H.GARDNER, L: "An interactive web-based dashboard to track COVID-19 in real time", LANCET INFECT DIS, vol. 20, 2020, pages 533 - 534, XP086152221, DOI: 10.1016/S1473-3099(20)30120-1
CENTER, J. H. U. M. C. R, COVID-19 DASHBOARD, 2020, Retrieved from the Internet
EBRAHIM, S. H. & MEMISH, Z. A: "COVID-19: preparing for superspreader potential among Umrah pilgrims to Saudi Arabia", LANCET, vol. 395, 2020, pages e48, XP086086209, DOI: 10.1016/S0140-6736(20)30466-9
MEMISH, Z. A.ALJERIAN, N.EBRAHIM, S. H.: "Tale of three seeding patterns of SARS-CoV-2 in Saudi Arabia", LANCET INFECT DIS, 2020
TUITE, A. R.: "Estimation of Coronavirus Disease 2019 (COVID-19) Burden and Potential for International Dissemination of Infection From Iran", ANN INTERN MED, vol. 172, 2020, pages 699 - 701
NEWS, A, SAUDI ARABIA ANNOUNCES FIRST CASE OF CORONAVIRUS, 2020, Retrieved from the Internet
GUSSOW, A. B. ET AL.: "Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 117, 2020, pages 15193 - 15199
LU, J. ET AL.: "Genomic Epidemiology of SARS-CoV-2 in Guangdong Province, China", CELL, vol. 181, 2020, pages 997 - 1003
HADFIELD, J. ET AL.: "Nextstrain: real-time tracking of pathogen evolution", BIOINFORMATICS, vol. 34, 2018, pages 4121 - 4123
RAMBAUT, A. ET AL.: "A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology", NAT MICROBIOL, vol. 5, 2020, pages 1403 - 1407, XP037277086, DOI: 10.1038/s41564-020-0770-5
BONI, M. F. ET AL.: "Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic", NAT MICROBIOL, vol. 5, 2020, pages 1408 - 1417, XP037277090, DOI: 10.1038/s41564-020-0771-4
LEARY, S. ET AL.: "Three adjacent nucleotide changes spanning two residues in SARS-CoV-2 nucleoprotein: possible homologous recombination from the transcription-regulating sequence", BIORXIV, 2020
TOYOSHIMA, Y.NEMOTO, K.MATSUMOTO, S.NAKAMURA, Y.KIYOTANI, K: "SARS-CoV-2 genomic variations associated with mortality rate of COVID-19", J HUM GENET, vol. 65, 2020, pages 1075 - 1082, XP037283149, DOI: 10.1038/s10038-020-0808-9
SINGH, J.SINGH, H.HASNAIN, S. E.RAHMAN, S. A: "Mutational signatures in countries affected by SARS-CoV-2: Implications in host-pathogen interactome", BIORXIV, 2020
WU, S. ET AL.: "Effects of SARS-CoV-2 mutations on protein structures and intraviral protein-protein interactions", J MED VIROL, 2020
RAO, S. N.MANISSERO, D.STEELE, V. R.PAREJA, J: "A Systematic Review of the Clinical Utility of Cycle Threshold Values in the Context of COVID-19", INFECT DIS THER, vol. 9, 2020, pages 573 - 586
KORBER, B. ET AL.: "Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus", CELL, vol. 182, 2020, pages 812 - 827, XP055907214, DOI: 10.1016/j.cell.2020.06.043
LORENZO-REDONDO, R. ET AL.: "A Unique Clade of SARS-CoV-2 Viruses is Associated with Lower Viral Loads in Patient Upper Airways", MEDRXIV, 2020
VOLZ, E. ET AL.: "Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity", CELL, vol. 184, 2021, pages 64 - 75
NG, P. C.HENIKOFF, S: "SIFT: predicting amino acid changes that affect protein function", NUCLEIC ACIDS RES, vol. 31, 2003, pages 3812 - 3814, XP002995775, DOI: 10.1093/nar/gkg509
RAHMAN, M. S. ET AL.: "Evolutionary dynamics of SARS-CoV-2 nucleocapsid protein and its consequences", J MED VIROL, 2020
GUAN, Q. ET AL.: "A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic", INT J INFECT DIS, vol. 100, 2020, pages 216 - 223, XP086345175, DOI: 10.1016/j.ijid.2020.08.052
HE, R. T. ET AL.: "Analysis of multimerization of the SARS coronavirus nucleocapsid protein", BIOCHEM BIOPH RES CO, vol. 316, 2004, pages 476 - 483, XP004495908, DOI: 10.1016/j.bbrc.2004.02.074
CHANG, C. K.CHEN, C. M. M.CHIANG, M. H.HSU, Y. L.HUANG, T. H.: "Transient Oligomerization of the SARS-CoV N Protein - Implication for Virus Ribonucleoprotein Packaging", PLOS ONE, vol. 8, 2013
CHAO WU, A. J. Q., ASMAA HACHIM, NILOUFAR KAVIAN, AIDAN R. COLE, AUSTIN B. MOYLE, NICOLE D. WAGNER, JOYCE SWEENEY-GIBBONS, HENRY W: "Characterization of SARS-CoV-2 N protein reveals multiple functional consequences of the C-terminal domain", BIORXIV, 2020
GORDON, D. E. ET AL.: "A SARS-CoV-2 protein interaction map reveals targets for drug repurposing", NATURE, vol. 583, 2020, pages 459, XP055889773, DOI: 10.1038/s41586-020-2286-9
LOWREY, A. J.CRAMBLET, W.BENTZ, G. L.: "Viral manipulation of the cellular sumoylation machinery", CELL COMMUN SIGNAL, vol. 15, 2017
WU, C. H.CHEN, P. J.YEH, S. H.: "Nucleocapsid Phosphorylation and RNA Helicase DDX1 Recruitment Enables Coronavirus Transition from Discontinuous to Continuous Transcription", CELL HOST MICROBE, vol. 16, 2014, pages 462 - 472, XP029073149, DOI: 10.1016/j.chom.2014.09.009
WU, C. H. ET AL.: "Glycogen Synthase Kinase-3 Regulates the Phosphorylation of Severe Acute Respiratory Syndrome Coronavirus Nucleocapsid Protein and Viral Replication", J BIOL CHEM, vol. 284, 2009, pages 5229 - 5239
CARLSON, C. R. ET AL.: "Phosphoregulation of Phase Separation by the SARS-CoV-2 N Protein Suggests a Biophysical Basis for its Dual Functions", MOL CELL, vol. 80, 2020, pages 1092, XP086414333, DOI: 10.1016/j.molcel.2020.11.025
SAVASTANO, ADE OPAKUA, A. I.RANKOVIC, M.ZWECKSTETTER, M.: "Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates", NAT COMMUN, vol. 11, 2020
LU, S. ET AL.: "The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein", NATURE COMMUNICATIONS, vol. 12, 2021, pages 502
GEOGHEGAN, J. L.HOLMES, E. C.: "The phylogenomics of evolving virus virulence", NAT REV GENET, vol. 19, 2018, pages 756 - 769, XP036637433, DOI: 10.1038/s41576-018-0055-5
ALIZON, S.HURFORD, A.MIDEO, N.VAN BAALEN, M: "Virulence evolution and the trade-off hypothesis: history, current state of affairs and the future", J EVOL BIOL, vol. 22, 2009, pages 245 - 259
ANDERSON, R. M.MAY, R. M: "Coevolution of hosts and parasites", PARASITOLOGY, vol. 85, 1982, pages 411 - 426
ANDERSON, R. M.MAY, R. M: "Population biology of infectious diseases: Part I", NATURE, vol. 280, 1979, pages 361 - 367, XP037128568, DOI: 10.1038/280361a0
ARIF, T. B: "The 501.V2 and B. 1.1.7 variants of coronavirus disease 2019 (COVID-19): A new time-bomb in the making?", INFECT CONTROL, vol. 1-2, 2021
GRUBAUGH, N. D.HANAGE, W. P.RASMUSSEN, A. L.: "Making Sense of Mutation: What D614G Means for the COVID-19 Pandemic Remains Unclear", CELL, vol. 182, 2020, pages 794 - 795
MCBRIDE, R.VAN ZYL, M.FIELDING, B. C: "The coronavirus nucleocapsid is a multifunctional protein", VIRUSES, vol. 6, 2014, pages 2991 - 3018, XP055200197, DOI: 10.3390/v6082991
CHANG, C. K.HOU, M. H.CHANG, C. F.HSIAO, C. D.HUANG, T. H.: "The SARS coronavirus nucleocapsid protein--forms and functions", ANTIVIRAL RES, vol. 103, 2014, pages 39 - 50, XP028615959, DOI: 10.1016/j.antiviral.2013.12.009
LAL, M. S. A. S. K: "Molecular Biology of the SARS-Coronavirus", vol. 129-151, 2009
WEGENER, M.MULLER-MCNICOLL, M: "View from an mRNP: The Roles of SR Proteins in Assembly, Maturation and Turnover", ADV EXP MED BIOL, vol. 1203, 2019, pages 83 - 112
BOUHADDOU, M. ET AL.: "The Global Phosphorylation Landscape of SARS-CoV-2 Infection", CELL, vol. 182, 2020, pages 685 - 712
NATHAN, K. G.LAL, S. K: "The Multifarious Role of 14-3-3 Family of Proteins in Viral Replication", VIRUSES, vol. 12, 2020
VERHEIJE, M. H. ET AL.: "The Coronavirus Nucleocapsid Protein Is Dynamically Associated with the Replication-Transcription Complexes", J VIROL, vol. 84, 2010, pages 11575 - 11579
CHEN, H. Y. ET AL.: "Mass spectroscopic characterization of the coronavirus infectious bronchitis virus nucleoprotein and elucidation of the role of phosphorylation in RNA binding by using surface plasmon resonance", J VIROL, vol. 79, 2005, pages 1164 - 1179
PENG, T. YLEE, K. R.TARN, W. Y.: "Phosphorylation of the arginine/serine dipeptide-rich motif of the severe acute respiratory syndrome coronavirus nucleocapsid protein modulates its multimerization, translation inhibitory activity and cellular localization", FEBS J, vol. 275, 2008, pages 4152 - 4163
V'KOVSKI, P. ET AL.: "Determination of host proteins composing the microenvironment of coronavirus replicase complexes by proximity-labeling", ELIFE, vol. 8, 2019
Claims:
We claim:

1. A method of treating a subject for a coronavirus infection comprising determining that the subject is infected with a SARS-CoV-2 variant having increased levels of phosphorylation at Serine 206 (S206) in the nucleocapsid protein of the SARS-CoV-2 variant compared to a control and administering to the subject an effective amount of one or more compounds capable of inhibiting phosphorylation of the SARS-CoV-2 nucleocapsid serine 206.

2. The method of claim 1, wherein the probenecid or pharmaceutically acceptable salt thereof is in a pharmaceutical composition further comprising a pharmaceutically acceptable carrier and/or excipient.

3. The method of claim 1, wherein the probenecid or pharmaceutically acceptable salt thereof is administered systemically.

4. The method of claim 1, wherein the probenecid or pharmaceutically acceptable salt thereof is administered orally, parenterally, topically, or mucosally.

5. The method of claim 4, wherein the probenecid or pharmaceutically acceptable salt thereof is administered mucosally to the lungs, nasal mucosa, or combination thereof.

6. The method of claim 1, wherein the probenecid or pharmaceutically acceptable salt thereof is administered in an effective amount to reduce viral replication.

7. The method of claim 1, wherein the probenecid or pharmaceutically acceptable salt thereof is administered in an effective amount to reduce one or more symptoms of a disease, disorder, or illness associated with the coronavirus.

8. The method of claim 7, wherein the symptoms include fever, congestion in the nasal sinuses and/or lungs, runny or stuffy nose, cough, sneezing, sore throat, body aches, fatigue, shortness of breath, chest tightness, wheezing when exhaling, chills, muscle aches, headache, diarrhea, tiredness, nausea, vomiting, and combinations thereof.

9. The method of any one of claims 1-8, wherein the one or more compounds is a GSK3 kinase inhibitor.

10. The method of any one of claims 1-9, wherein the compound is SAR502250 or AZD1080.

11. A method for identifying a subject as having an elevated risk of developing one or more symptoms associated with severe COVID-19, comprising i) obtaining a sample from the subject, ii) determining that the subject is infected with a SARS-CoV-2 variant having increased levels of phosphorylation at Serine 206 (S206) in the nucleocapsid protein of the SARS-CoV-2 variant compared to a control, iii) diagnosing the subject as having an elevated risk of developing one or more symptoms associated with severe COVID-19.

12. The method of claim 11, wherein the control is a SARS-CoV-2 variant having arginine at amino acid position 203 and glycine at amino acid position 204 in the nucleocapsid protein.

13. The method of claim 11 or claim 12, wherein the level of phosphorylation at S206 in the N protein is increased by 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more than 10-fold compared to the control.

14. The method of any one of claims 11-13, wherein the SARS-CoV-2 variant in the subject diagnosed as having an elevated risk of developing one or more symptoms associated with severe COVID-19 carries one or more genetic mutations selected from the group consisting of G28881A, G28882A, and G28883C.

15. The method of any one of claims 11-14, wherein the SARS-CoV-2 variant in the subject diagnosed as having an elevated risk of developing one or more symptoms associated with severe COVID-19 carries one or more amino acid substitutions selected from the group consisting of R203K and G204R in the nucleocapsid protein.

16. The method of any one of claims 11-15, wherein the sample is a bodily fluid of the subject, the bodily fluid selected from the group consisting of mucus, sputum (processed or unprocessed), bronchial alveolar lavage (BAL), bronchial wash (BW), bodily fluids, cerebrospinal fluid (CSF), urine, tissue (e.g., biopsy material), rectal swab, nasopharyngeal aspirate, nasopharyngeal swab, throat swab, feces, plasma, serum, and whole blood.

17. The method of claim 16, wherein the sample is obtained from a nasopharyngeal swab, a nasopharyngeal aspirate, sputa/deep throat saliva, or a throat swab of the subject.

18. The method of any one of claims 11-17, wherein the subject is selected from the group consisting of a subject who has one or more symptoms of COVID-19, an asymptomatic subject who is at increased risk of being infected with SARS-CoV-2 virus, and a subject who has received a vaccine against infection with SARS-CoV-2 virus.

19. The method of any one of claims 11-18, wherein the subject is identified as being at elevated risk of developing one or more symptoms associated with severe COVID-19 with a confidence level of at least 50%, 60%, 70%, 80%, 90%, 95%, 97%, or 99%.

20. The method of any one of claims 11-19, further comprising treating the subject determined as having an elevated risk of developing one or more symptoms associated with severe COVID-19 with one or more therapeutic agents effective against severe COVID-19.

21. The method of claim 20, wherein the therapeutic agent is selected from the group consisting of antiviral drugs and monoclonal antibodies specific for SARS-CoV-2 virus.

22. The method of claim 21, wherein the antiviral drugs are selected from the group consisting of remdesivir and nirmatrelvir.

23. The method of claim 21, wherein the monoclonal antibodies are selected from the group consisting of casirivimab, imdevimab, bamlanivimab, etesevimab, baricitinib, and sotrovimab.

24. The method of any one of claims 20-23, further comprising discontinuing treatment of the subject if the level of phosphorylation at S206 in the N protein is reduced by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more than 90% compared to the level prior to the treatment.

25. The method of any one of claims 20-23 further comprising discontinuing treatment of the subject if the level of phosphorylation at S206 in the N protein is similar to the control sample.

Description:
COMPOSITION FOR USE IN THE TREATMENT OF COVID-19

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority to U.S. Application No. 63/162,834, filed March 18, 2021, and U.S. Application No. 63/302,197, filed January 24, 2022, the disclosures of which are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

This invention is generally in the field of compositions and methods for treating a SARS-CoV-2 infection.

BACKGROUND OF THE INVENTION

The emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes the respiratory coronavirus infectious disease 2019 (COVID-19), resulted in a pandemic that has triggered an unparalleled public health emergency (refs. 1, 2). The global spread of SARS-CoV-2 depended fundamentally on human mobility patterns.

As the COVID-19 pandemic is still ongoing, there is a need for novel therapeutic strategies to treat severe infections in patients.

It is an object of the present invention to provide compositions and methods for treating a SARS-CoV-2 infection in a subject.

It is a further object of the present invention to provide compositions and methods that identify patients having an elevated risk of developing one or more symptoms associated with severe COVID-19.

It is yet another object of the present invention to provide compositions and methods that help select and guide treatment regimens for patients infected with SARS-CoV-2.

SUMMARY OF THE INVENTION

Compositions and methods for treating a SARS-CoV-2 infection are provided. The disclosed compositions and methods are based on interaction pathways and inhibition of serine 206 phosphorylation of the Nucleocapsid (N) protein of SARS-CoV-2. The compositions include one or more compounds capable of inhibiting phosphorylation of the Nucleocapsid serine 206. In one embodiment, the compound is a GSK3A inhibitor. Exemplary GSK3A inhibitors include SAR502250 and AZD1080. The amount of the one or more compounds can be effective to, for example, reduce viral replication, reduce one or more symptoms of a disease, disorder, or illness associated with the virus, or a combination thereof. Symptoms include, but are not limited to, fever, congestion in the nasal sinuses and/or lungs, runny or stuffy nose, cough, sneezing, sore throat, body aches, fatigue, shortness of breath, chest tightness, wheezing when exhaling, chills, muscle aches, headache, diarrhea, tiredness, nausea, vomiting, and combinations thereof. The subject can be, for example, a mammal or a bird. In preferred embodiments, the subject is a human.

The subject can be symptomatic or asymptomatic. In some embodiments, the subject has been, or will be, exposed to the virus. In some embodiments, treatment begins 1, 2, 3, 4, 5, or more hours, days, or weeks prior to or after exposure to the virus. In some embodiments, the subject has not been exposed to the virus. In some embodiments, the subject anticipates being exposed to the virus. Thus, preventative and prophylactic methods are also provided.

The disclosed compositions can be administered systemically or locally. Exemplary routes of administration include, but are not limited to, oral, parenteral, topical or mucosal. In some embodiments, the composition is administered to lungs (e.g., pulmonary administration) by oral inhalation or intranasal administration. In some embodiments, the composition is administered intranasally to the nasal mucosa.

Methods and compositions for assessing the risk of developing one or more symptoms associated with severe COVID-19 in a subject are provided. The methods generally involve the steps of i) obtaining a sample from the subject, ii) determining that the subject is infected with a SARS-CoV-2 variant having increased levels of phosphorylation at Serine 206 (S206) in the nucleocapsid protein of the SARS-CoV-2 variant compared to a control, and iii) identifying the subject as having an elevated risk of developing one or more symptoms associated with severe COVID-19. In some embodiments, the control is a SARS-CoV-2 variant having arginine at amino acid position 203 and glycine at amino acid position 204 in the nucleocapsid protein. In some embodiments, the level of phosphorylation at S206 in the N protein is 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more than 10-fold compared to the control. In further embodiments, the methods involve steps of detecting the presence of one or more mutations of G28881A, G28882A, and G28883C of the viral genome, and/or one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein.
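The relationship between the three nucleotide changes and the two amino acid substitutions named above can be illustrated with a minimal Python sketch. It assumes Wuhan-Hu-1 reference numbering (GenBank NC_045512.2), in which the N open reading frame begins at genome position 28,274, so that codon 203 covers positions 28,880-28,882 and codon 204 covers positions 28,883-28,885. These coordinates, the six reference bases, and all function names are illustrative assumptions and are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumptions: first base of N codon 203 and the reference bases 28880-28885.
REGION_START = 28880
FIRST_CODON = 203
REFERENCE_REGION = "AGGGGA"  # codon 203 = AGG (R), codon 204 = GGA (G)


def apply_snps(region, snps, start=REGION_START):
    """Apply SNPs written like 'G28881A' to a short genomic region."""
    bases = list(region)
    for snp in snps:
        ref, pos, alt = snp[0], int(snp[1:-1]), snp[-1]
        idx = pos - start
        if bases[idx] != ref:
            raise ValueError(f"{snp}: reference base is {bases[idx]}, not {ref}")
        bases[idx] = alt
    return "".join(bases)


def translate(region):
    """Translate an in-frame nucleotide string into one-letter amino acids."""
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, len(region), 3))


def substitutions(ref_region, mut_region, first_codon=FIRST_CODON):
    """List amino acid changes such as 'R203K' between two in-frame regions."""
    pairs = zip(translate(ref_region), translate(mut_region))
    return [f"{r}{first_codon + i}{m}" for i, (r, m) in enumerate(pairs) if r != m]


mutant_region = apply_snps(REFERENCE_REGION, ["G28881A", "G28882A", "G28883C"])
print(mutant_region, substitutions(REFERENCE_REGION, mutant_region))
```

Under these assumptions the first two changes fall in codon 203 and the third in codon 204, which is why three adjacent nucleotide changes yield exactly two amino acid substitutions.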

Typically, the sample is a bodily fluid of the subject, the bodily fluid selected from the group consisting of mucus, sputum (processed or unprocessed), bronchial alveolar lavage (BAL), bronchial wash (BW), bodily fluids, cerebrospinal fluid (CSF), urine, tissue (e.g., biopsy material), rectal swab, nasopharyngeal aspirate, nasopharyngeal swab, throat swab, feces, plasma, serum, and whole blood. In preferred embodiments, the sample is obtained from a nasopharyngeal swab, a nasopharyngeal aspirate, sputa/deep throat saliva, or a throat swab of the subject. The methods are suitable for use in a subject who has one or more symptoms of COVID-19, an asymptomatic subject who is at increased risk of being infected with SARS-CoV-2 virus, or a subject who has received a vaccine against infection with SARS-CoV-2 virus. Generally, the methods can identify a subject as being at elevated risk of developing one or more symptoms associated with severe COVID-19 with a confidence level of at least 50%, 60%, 70%, 80%, 90%, 95%, 97%, or 99%.

Methods for guiding risk classification and treatment strategies in subjects identified as predisposed to developing severe COVID-19 are also described. In some embodiments, the methods further involve the steps of treating the subject determined as having an elevated risk of developing one or more symptoms associated with severe COVID-19 with one or more therapeutic agents effective against severe COVID-19. Exemplary therapeutic agents include antiviral drugs such as remdesivir and nirmatrelvir, and monoclonal antibodies specific for SARS-CoV-2 virus such as casirivimab, imdevimab, bamlanivimab, etesevimab, baricitinib, and sotrovimab. In some embodiments, the methods further involve the steps of discontinuing treatment of the subject if the level of phosphorylation at S206 in the N protein is reduced by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more than 90% compared to the level prior to the treatment, or if the level of phosphorylation at S206 in the N protein is similar to the control sample.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1A. Locations of the sampling cities within Saudi Arabia. FIG. 1B) Combined numbers of samples retrieved from the 4 cities and the Eastern region during the first six months of the pandemic. Cities are colored as in panel A. Months are shown at the bottom of the figure, and each month is divided into 5-day intervals. New daily cases for the city of Khobar are shown on the Eastern Region plot. Major restrictions imposed by the Ministry of Health and by Royal decrees are indicated above the plots. FIG. 1C) Combined average numbers of new daily cases. FIG. 1D) Estimate of the effective reproduction number [Reff] over time in Saudi Arabia (top) and estimate of the effective population size [Ne], the relative population size required to produce the diversity seen in the sample (bottom). The red horizontal line represents an R of 1, the level required to sustain epidemic growth. Grey confidence areas denote the 95% credible intervals. FIG. 1E. The 836 detected SNPs are shown along their positions in the SARS-CoV-2 genome (x-axis) and their frequency in the Saudi samples (y-axis). High-frequency SNPs are highlighted along with the 3 SNPs underlying the R203K/G204R changes in the N protein (G28881A; G28882A; G28883C). FIG. 1F shows a scatter plot of SNP frequencies in Saudi samples (y-axis) and in global, non-Saudi samples available from GISAID in 2020. SNPs differing by at least 0.1 in absolute values are highlighted in blue. FIG. 1G. Samples were grouped into 10-day periods according to their sampling date. The number of SNPs (compared to the Wuhan-Hu-1 reference) was recorded. Within each time group, the average number of SNPs and the standard deviation were then calculated. Average SNPs in global samples, excluding Saudi Arabia, are shown as a red line, with boxes showing plus/minus one standard deviation. Average SNPs in samples from Saudi Arabia are shown as a blue line with whiskers denoting plus/minus one standard deviation.
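The grouping described for FIG. 1G can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the origin date for the 10-day bins, the use of the sample (rather than population) standard deviation, the function name, and the example values are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def snp_summary_by_period(samples, origin, period_days=10):
    """Group (sampling_date, snp_count) pairs into fixed-length periods.

    Returns {period_index: (mean_snps, sd_snps)}, where period 0 starts at
    `origin`. A period holding a single sample is given a standard
    deviation of 0.0 (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for sampled_on, snp_count in samples:
        groups[(sampled_on - origin).days // period_days].append(snp_count)
    summary = {}
    for index in sorted(groups):
        counts = groups[index]
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[index] = (mean(counts), sd)
    return summary


# Hypothetical samples: (sampling date, number of SNPs vs. the reference).
example = [
    (date(2020, 3, 2), 4),
    (date(2020, 3, 9), 6),
    (date(2020, 3, 12), 8),
    (date(2020, 3, 20), 10),
    (date(2020, 3, 25), 7),
]
summary = snp_summary_by_period(example, origin=date(2020, 3, 1))
for index, (avg, sd) in summary.items():
    print(f"period {index}: mean={avg}, sd={sd:.2f}")
```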

FIG. 2A. Global time-scaled phylogeny of 952 Saudi samples coloured by Nextstrain clades. FIG. 2B shows distributions of importation dates for the 5 Nextstrain (nextstrain.org) clades found in Saudi Arabia coloured by clade... FIG. 2C. Top: The numbers of samples from Saudi Arabia presented in this study are shown as bars by their sampling date (January 2020-March 2021). Bottom: Samples deposited in GISAID. On both plots, lines show the fraction of samples having the R203K/G204R SNPs (red line), having both the R203K/G204R SNPs and the Spike protein N501Y SNP (blue line), and having the Spike protein D614G SNP (green line). FIG. 2D shows density distributions of virus copy numbers derived from Ct measurements. Ct values from the N1 primer pairs were normalized by RNase P primer pair values and converted to copy numbers from a standard curve.
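As a rough illustration of converting Ct values to copy numbers with a standard curve, the Python sketch below fits a line Ct = slope x log10(copies) + intercept to a dilution series and inverts it. The dilution values, the function names, and the linear model are assumptions chosen for illustration. The RNase P normalization mentioned above is not specified in enough detail to reproduce, so it is left out.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_copies(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number for one Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (copies per reaction, measured Ct).
dilution_copies = [1e1, 1e2, 1e3, 1e4]
dilution_cts = [36.7, 33.4, 30.1, 26.8]

slope, intercept = fit_standard_curve(dilution_copies, dilution_cts)
estimated = ct_to_copies(30.1, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimated:.0f}")
```

A lower Ct corresponds to a higher copy number, so the fitted slope is negative.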

FIGS. 3A-G. SNP analysis. FIG. 3A) Manhattan plot showing the association between SARS-CoV-2 SNPs and recorded mortality. Negative log10(uncorrected p-values) from Fisher's exact tests are shown as red circles. Gene boundaries are indicated by background colors (listed on top), and the three R203K/G204R SNPs (positions 28,881-28,883) in the N gene are highlighted. FIG. 3B) Overview of the three SNPs underlying the R203K/G204R changes. Amino acid numbers in the N protein are shown above. FIG. 3C) The number of samples with and without the R203K/G204R SNPs and their recorded mortality rates. Odds ratio is calculated as (deceased with SNPs/deceased without SNPs)/(not deceased with SNPs/not deceased without SNPs). FIG. 3D) Samples with and without the R203K/G204R SNPs shown as bars (left y-axis) for different patient age groups (x-axis). The mortality rates for each age group are shown as lines (right y-axis). FIG. 3E) The numbers of Saudi samples with (dark bars) and without (light bars) the R203K/G204R SNPs are shown through 2020. The R203K/G204R frequencies are shown as circles connected by lines (right y-axis). FIG. 3F) Similar to panel E), but showing numbers for non-Saudi samples deposited in GISAID. FIG. 3G) The number of samples with R203K/G204R and reference alleles in 1,419 samples with patient status metadata available in GISAID divided into mild (825 samples) and severe cases (594 samples). R203K/G204R SNP frequencies are listed above bars, and the p-value from Fisher's exact two-sided test is shown above the plot.
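The odds ratio of FIG. 3C and a two-sided Fisher's exact test of the kind referred to in FIGS. 3A and 3G can be sketched in Python using only the standard library. The counts below are invented for illustration and are not study data. The function names are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that may differ from the software actually used.

```python
from math import comb


def odds_ratio(died_with, died_without, survived_with, survived_without):
    """(deceased with / deceased without) / (survived with / survived without).

    Zero cells are not handled in this sketch.
    """
    return (died_with / died_without) / (survived_with / survived_without)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p
        for p in (table_probability(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Invented counts: rows are deceased / not deceased, columns are with / without SNPs.
ratio = odds_ratio(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"odds ratio={ratio:.1f}, p={p_value:.4f}")
```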

FIGS. 4A-F. Identification of host-interacting partners and profiling of phosphorylation status of mutant and control SARS-CoV-2 N protein by Affinity Mass-Spectrometry. FIG. 4A) A schematic diagram showing the different domains of the SARS-CoV-2 N protein (Upper: control, Lower: mutant) and highlighting the mutation site (R203K and G204R). The bar-plot (lower panel) indicates the SIFT (ref. 21) predicted deleteriousness score of the substitution at position 204 from G to R. FIG. 4B) Volcano plot displaying the differential interactions of pairwise comparisons (mutant vs control) as -Log10 adj. p-values vs. the Log2 protein fold change. Proteins with a statistically significant (Adjusted p-value <= 0.05, and Log fold change >= 1) difference between mutant and control AP-MS conditions are highlighted. FIG. 4C) Heatmap showing the significantly differentially changed human protein interactome (3 replicates) in mutant versus control N protein AP-MS analysis. FIG. 4D) Gene Ontology (GO)-enrichment analysis of significantly changed terms between mutant and control proteins in terms of biological process and pathway enrichment. The scale shows the p-value adjusted Log2 of the odds ratio, mutant versus control. FIG. 4E) Sketch showing part of the SR-rich motif of the SARS-CoV-2 N protein containing the KR mutation site (R203K and G204R) (Lower). The hyper-phosphorylated serine 206 in the mutant N protein near the KR mutation site is indicated in orange color. FIG. 4F) Phosphorylation status of mutant and control N protein was analyzed by mass spectrometry (3 biological replicates per affinity condition). Bar-plot shows the Log2 intensities of selected phosphorylated peptides in the control and mutant conditions. Serine 206 is hyper-phosphorylated in mutant N protein.
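The quantities behind FIGS. 4B and 4F can be sketched in Python as follows: a log2 fold change computed from replicate log2 intensities, and the highlighting rule using the stated cut-offs. The intensity values and function names are hypothetical. Applying the fold-change cut-off to the absolute value, so that changes in both directions are highlighted, is an assumption, since the caption states only "Log fold change >= 1".

```python
from statistics import mean


def log2_fold_change(mutant_log2, control_log2):
    """Difference of mean log2 intensities (mutant minus control)."""
    return mean(mutant_log2) - mean(control_log2)


def is_highlighted(adj_p, log2_fc, p_cutoff=0.05, fc_cutoff=1.0):
    """Apply the volcano-plot cut-offs; the absolute value is an assumption."""
    return adj_p <= p_cutoff and abs(log2_fc) >= fc_cutoff


# Hypothetical log2 intensities of one phosphopeptide, 3 replicates each.
mutant = [24.0, 24.5, 23.5]
control = [21.0, 21.5, 20.5]

lfc = log2_fold_change(mutant, control)
fold = 2 ** lfc  # convert back to a linear fold change
print(f"log2 fold change={lfc:.1f}, fold change={fold:.0f}")
print(is_highlighted(adj_p=0.01, log2_fc=lfc))
```

Because the intensities are on a log2 scale, a difference of 1 corresponds to a 2-fold change, which links these cut-offs to the fold-change levels recited elsewhere in the disclosure.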

FIGS. 5A-5B show co-occurrences of SNPs shown as Jaccard Index (FIG. 5A) and log2 odds-ratio (FIG. 5B). The co-occurrence between the three SNPs in the R203K & G204R mutations (as indicated above plots) and all SNPs present in at least 20 samples (x-axes) are shown as circles. Co-occurrences between the three SNPs are highlighted in orange.
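The two co-occurrence measures named for FIGS. 5A-5B can be illustrated with a short Python sketch that treats each SNP as the set of samples carrying it. The sample identifiers and function names are hypothetical. The optional pseudocount, which avoids division by zero when a cell of the 2x2 table is empty, is an assumption of this sketch and is not described in the source.

```python
import math


def jaccard_index(samples_a, samples_b):
    """Shared samples divided by samples carrying either SNP."""
    a, b = set(samples_a), set(samples_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def log2_cooccurrence_odds_ratio(samples_a, samples_b, all_samples, pseudocount=0.5):
    """log2 odds ratio of two SNPs co-occurring across all samples."""
    a, b, everything = set(samples_a), set(samples_b), set(all_samples)
    both = len(a & b) + pseudocount
    only_a = len(a - b) + pseudocount
    only_b = len(b - a) + pseudocount
    neither = len(everything - a - b) + pseudocount
    return math.log2((both * neither) / (only_a * only_b))


# Hypothetical sample identifiers carrying each SNP.
all_samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
snp_a = ["s1", "s2", "s3"]
snp_b = ["s2", "s3", "s4"]

print(jaccard_index(snp_a, snp_b))
print(log2_cooccurrence_odds_ratio(snp_a, snp_b, all_samples, pseudocount=0.0))
```

SNPs that always occur together, as would be expected for the three changes underlying R203K/G204R, have a Jaccard index of 1.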

FIG. 6A. Mortality of R203K/G204R and reference allele samples shown independently for nationalities with at least 25 samples with relevant patient outcome information. Nationalities are sorted according to available sample numbers. With the exception of Bangladesh and Egypt, all nationalities display increased R203K/G204R mortality. Mortality rates within our samples vary across our sampling window (FIG. 6B), and also when looking at national numbers from the entire Kingdom of Saudi Arabia (FIG. 6C). As the majority of Egypt and Bangladesh nationals were sampled prior to April 2020 (FIG. 6D), their observed deviations in FIG. 6A may simply be a reflection of this. FIG. 6E. Scatterplot of R203K/G204R SNP frequencies (obtained from GISAID on December 31st 2020) and recorded deaths per case (values downloaded from worldometer.org on January 18th, 2020) for 128 countries. Countries with at least 100 GISAID samples are highlighted in grey. Regression lines are shown for all countries (black line; y=0.002172x + 0.017860; Kendall’s tau=0.077; p=0.21) and for countries with at least 100 samples (grey line; y=-0.00137x + 0.02102; Kendall’s tau=0.028; p=0.74).

FIG. 7 shows number of samples with and without R203K/G204R SNPs (bars) and the frequencies of R203K/G204R SNPs (lines) shown for individual continents.

FIG. 8 shows boxplots of the distribution of virus copy number derived from Ct measurements obtained with N1 primers. Only samples tested using the TaqPath kit were included. P-values from Mann-Whitney U test for difference in medians are shown above. WT denotes the genotype in the Wuhan reference sequence.

FIGS. 9A-D. Oligomerization and RNA binding analysis of mutant and control N protein. FIG. 9A) BS3 cross-linking (2 mM) and SDS-PAGE analysis of the oligomerization forms of mutant and control N proteins. FIG. 9B) Densitometry analysis of bands corresponding to oligomeric forms (trimer and tetramer) was performed. Bar-plot represents the relative intensities from two independent experiments (shown as mean ± SD). FIG. 9C) Sketch of the in vitro RNA immunoprecipitation (RIP) procedure used for analysis of viral RNA interaction with purified mutant and control N protein. Isolated RNAs were analyzed by RT-qPCR using specific primers for viral N gene (N1 and N2) and E gene. FIG. 9D) Bar chart shows level of viral RNA retrieval (% input) with mutant and control N protein (± SD from 4 experiments, [t-test, p value 0.00016 (***), 0.00019 (***), and 0.003 (**)]).

FIGS. 10A-E. Affinity mass spectrometry (AP-MS) analysis of mutant and control SARS-CoV-2 N protein and host protein interaction. FIG. 10A) Sketch showing the workflow of the affinity mass spectrometry procedure. HEK-293 cells expressing 2XStrep-tagged control and mutant N protein were used for MagStrep affinity purification. Purified proteins were separated on SDS-PAGE and subjected to silver staining and western blotting for confirmation. After confirmation, interacting proteins were analyzed by mass spectrometry. FIG. 10B) (Upper) Silver staining of control and mutant N protein associated host proteins (1 and 2 show two loading volumes). (Lower) Western blot confirmation of N protein (mutant and control) using anti-Strep antibody. FIG. 10C) Correlation matrix of three replicates for control and mutant N protein AP-MS. FIG. 10D) The amount of total N protein detected by mass spectrometry in the control and mutant affinity. Graph shows log2 of protein intensities. FIG. 10E) Level of stress granule proteins (G3BP1 and G3BP2) identified in control and mutant condition.

FIG. 11A) For severe and mild cases, each country is plotted according to the number of samples (y-axes) and the overall frequency of R203K/G204R SNPs (x-axes). FIG. 11B, C, E) Number of reference and R203K/G204R SNP samples shown for severe and mild cases for samples from Italy (FIG. 11B), India (FIG. 11C), and Maharashtra (FIG. 11E). Frequency of R203K/G204R SNP samples shown as numbers on bars. P values from two-sided Fisher’s exact test shown above bars. FIG. 11D) The number of severe and mild case samples (bars) for individual Indian states. States are sorted according to their R203K/G204R SNPs frequency (black line, right y-axis).

DETAILED DESCRIPTION OF THE INVENTION

The disclosed compositions and methods are based on studies in which 892 SARS-CoV-2 genomes collected from patients in Saudi Arabia were sequenced. The studies (as well as global data analysis) showed a clear association between patient mortality and two consecutive mutations (R203K/G204R) in the SARS-CoV-2 nucleoprotein (N). These mutations affect the oligomerization of N protein and its binding to viral RNA, as well as its interaction with host proteins. Furthermore, the mutations result in the phosphorylation of a nearby serine site (S206) in the N protein.

I. COMPOSITION

The compositions include one or more compounds capable of inhibition of phosphorylation of the Nucleocapsid serine 206. In one embodiment, the compound is a GSK3A inhibitor.

Exemplary GSK3A inhibitors include SAR502250, AZD1080, tideglusib, and 9-ING-41.

AZD1080 is a selective, orally active, brain-permeable GSK3 inhibitor that inhibits human GSK3α and GSK3β with Ki of 6.9 nM and 31 nM, respectively, and shows >14-fold selectivity against CDK2, CDK5, CDK1 and Erk2.

The GSK-3β inhibitor 9-ING-41 entered clinical trials in patients with advanced cancer (clinical trial no. NCT03678883). 9-ING-41 is a maleimide-based ATP-competitive small molecule GSK-3β inhibitor with high selectivity and low toxicity (Hillar, et al., Anticancer Drugs. 22:978-985. 2011; Gaisina, et al., J Med Chem. 52:1853-1863. 2009).

II. METHODS

Methods for detecting one or more molecular markers in a sample containing SARS-CoV-2 viral DNA/protein have been developed. In some embodiments, the methods provide a diagnosis for patients as having been infected by SARS-CoV-2 variants that likely lead to severe COVID, based on the detection of one or more molecular markers in a sample of viral DNA/protein from a subject. In other embodiments, the methods identify a subject having SARS-CoV-2 variants as being a candidate for effective treatment with one or more substances that are active in altering the amount or presence of one or more molecular markers associated with development of severe COVID. In further embodiments, the methods identify one or more substances that are active in altering the amount or presence of one or more molecular markers associated with development of severe COVID.

The identification of one or more molecular markers in viral DNA and proteins can be used to improve both the stratification of patients for treatment with therapies and the rational selection of agents effective in treating one or more symptoms of a coronavirus infection.

Methods for treating one or more symptoms of coronavirus infection in the subject in need thereof are described. Methods can also evaluate treatment efficacy for patients having coronavirus infection.

A. Methods for Identifying Subjects Predisposed to Developing Severe COVID

The methods can be used to identify a subject having an elevated risk of developing one or more symptoms associated with severe COVID-19, such as systemic inflammatory response syndrome, sepsis, or septic shock. Typically, the methods are based on determining that the subject has one or more viral markers that are known or are determined by the methods provided to increase risk for severe COVID.

In some embodiments, the molecular marker of SARS-CoV-2 variants that predispose the host to developing severe COVID includes an increased level of phosphorylation at Serine 206 (S206) in the Nucleocapsid (N) protein of the SARS-CoV-2 variant compared to a control sample. In preferred embodiments, the N protein with an increased level of phosphorylation at S206 also has one or more of amino acid substitutions of R203K and G204R. Exemplary control is a SARS-CoV-2 variant which has arginine at amino acid position 203 and glycine at amino acid position 204 in the N protein. In further embodiments, SARS-CoV-2 variants that predispose the host to developing severe COVID include one or more mutations of G28881A, G28882A, G28883C in the viral genome.

As shown in the Examples below, these molecular markers of SARS-CoV-2 are associated with higher mortality rate, higher frequency of severe symptoms in patients, increased viral replication and/or protein expression, and increased interactions with host cell factors such as cell cycle regulatory pathways regulating human and virus protein expression.

In some embodiments, the methods can be used to identify a subject having been infected by a SARS-CoV-2 variant displaying hyper-phosphorylation at Serine 206 (S206) in the N protein with R203K and/or G204R substitutions. The methods can involve steps of detecting and quantifying the level of phosphorylation at Serine 206 (S206) in the N protein, preferably with R203K and/or G204R substitutions, for example by mass spectrometry. In some embodiments, the level of phosphorylation at Serine 206 (S206) in the N protein with R203K and/or G204R substitutions is increased 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more than 10-fold compared to the level of phosphorylation at S206 in the N protein without R203K and/or G204R substitution. In some embodiments, the methods also include the steps of detecting and quantifying the level of phosphorylation at one or more additional phosphorylation sites of S2, S79, S176, and S180 of the N protein. In preferred embodiments, the level of phosphorylation at one or more of S2, S79, S176, and S180 in the N protein with R203K and/or G204R substitutions is similar to that in the N protein without R203K or G204R substitution.
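The fold-increase comparison described above can be sketched in Python as follows. This is only an illustration: it assumes that phosphopeptide signals are available as log2 intensities from replicate mass spectrometry runs (the form in which FIG. 4F reports them), the replicate values are hypothetical, and the 2-fold default threshold is simply the lowest tier listed in the text.

```python
from statistics import mean

def fold_change_from_log2(mutant_log2, control_log2):
    """Linear fold change between the mean log2 intensities of two conditions."""
    return 2 ** (mean(mutant_log2) - mean(control_log2))

def is_hyperphosphorylated(mutant_log2, control_log2, min_fold=2.0):
    """True when the site signal in the mutant is at least min_fold times the control."""
    return fold_change_from_log2(mutant_log2, control_log2) >= min_fold

# Hypothetical log2 intensities of an S206 phosphopeptide, three replicates each.
s206_mutant = [25.0, 25.0, 25.0]
s206_control = [23.0, 23.0, 23.0]

print(fold_change_from_log2(s206_mutant, s206_control))   # 4.0
print(is_hyperphosphorylated(s206_mutant, s206_control))  # True
```

The same two functions could be applied to the additional sites (S2, S79, S176, and S180), where a fold change near 1 would correspond to the "similar" level of phosphorylation described for the preferred embodiments.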

In some embodiments, the methods can be used to identify a subject having been infected by a SARS-CoV-2 variant carrying one or more mutations of G28881A, G28882A, and G28883C. The methods involve steps of detecting one or more mutations of G28881A, G28882A, and G28883C of the SARS-CoV-2 genome, for example by gene sequencing or RT-PCR. In some embodiments, the methods can be used to identify a subject having been infected by a SARS-CoV-2 variant carrying one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein. The methods can involve steps of detecting one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein, for example by mass spectrometry.
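The relationship between the three nucleotide mutations and the two amino acid substitutions can be made concrete with a short Python sketch. It relies on assumptions not stated in the source: that the N gene begins at genome position 28274 (1-based) in the reference sequence NC_045512.2, and that the reference codons for R203 and G204 are AGG and GGA. The function names are hypothetical, and the codon table is deliberately limited to the codons these three SNPs can produce.

```python
# Assumption: 1-based start of the N gene in the reference genome NC_045512.2.
N_GENE_START = 28274
# Assumption: reference codons for arginine 203 and glycine 204.
REF_CODONS = {203: "AGG", 204: "GGA"}
# Partial codon table covering only the codons reachable with the three SNPs.
CODON_TABLE = {"AGG": "R", "AGA": "R", "AAG": "K", "AAA": "K",
               "GGA": "G", "CGA": "R"}

def parse_snp(snp):
    """Split a label such as 'G28881A' into (position, reference base, alternate base)."""
    return int(snp[1:-1]), snp[0], snp[-1]

def n_protein_substitutions(snps):
    """Return N protein substitutions (e.g. 'R203K') implied by the given SNP labels."""
    codons = {aa_pos: list(codon) for aa_pos, codon in REF_CODONS.items()}
    for snp in snps:
        position, ref_base, alt_base = parse_snp(snp)
        offset = position - N_GENE_START
        aa_pos, index_in_codon = offset // 3 + 1, offset % 3
        if aa_pos not in codons:
            raise ValueError(f"{snp} lies outside codons 203 and 204")
        if codons[aa_pos][index_in_codon] != ref_base:
            raise ValueError(f"{snp} does not match the assumed reference base")
        codons[aa_pos][index_in_codon] = alt_base
    calls = []
    for aa_pos in sorted(codons):
        ref_aa = CODON_TABLE[REF_CODONS[aa_pos]]
        new_aa = CODON_TABLE["".join(codons[aa_pos])]
        if new_aa != ref_aa:
            calls.append(f"{ref_aa}{aa_pos}{new_aa}")
    return calls

print(n_protein_substitutions(["G28881A", "G28882A", "G28883C"]))  # ['R203K', 'G204R']
```

Under these assumptions, G28881A and G28882A both fall in codon 203 (AGG to AAA) and G28883C falls in codon 204 (GGA to CGA), so the three SNPs together yield exactly the R203K and G204R substitutions.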

In some embodiments, the subject is one who has one or more symptoms of COVID-19, or is an asymptomatic subject who is at increased risk of being infected with SARS-CoV-2 virus, or a subject who has received a vaccine against infection with SARS-CoV-2 virus. A positive SARS-CoV-2 viral test (i.e., reverse transcription polymerase chain reaction [RT-PCR] test or antigen test) or serologic (antibody) test can help assess for current or previous infection.

In some embodiments, the methods include screening one or more positive and/or negative controls. Exemplary positive controls include one or more RNA sequences encoding one or more molecular markers of SARS-CoV-2 variants that predispose the host to developing severe COVID, and an N protein carrying the amino acid substitutions and/or phosphorylation. Exemplary positive control sequences can be provided as plasmids, as cells expressing SARS-CoV-2 viruses, or as DNA plasmids containing one or more mutations of G28881A, G28882A, and G28883C. An exemplary positive control protein is an N protein carrying R203K and/or G204R substitutions and/or the phosphorylation at S206. Exemplary negative controls include one or more RNA sequences specific for one or more distinct human respiratory pathogens such as influenza virus.

i. Coronavirus and SARS-CoV-2

The coronaviruses (order Nidovirales, family Coronaviridae, and genus Coronavirus) are a diverse group of large, enveloped, positive-stranded RNA viruses that cause respiratory and enteric diseases in humans and other animals.

Coronaviruses typically have narrow host specificity and can cause severe disease in many animals, and several viruses, including infectious bronchitis virus, feline infectious peritonitis virus, and transmissible gastroenteritis virus, are significant veterinary pathogens. Human coronaviruses (HCoVs) are found in both group 1 (HCoV-229E) and group 2 (HCoV-OC43) and are historically responsible for ~30% of mild upper respiratory tract illnesses.

At ~30,000 nucleotides, their genome is the largest found in any of the RNA viruses. There are three groups of coronaviruses; groups 1 and 2 contain mammalian viruses, while group 3 contains only avian viruses. Within each group, coronaviruses are classified into distinct species by host range, antigenic relationships, and genomic organization. The genomic organization is typical of coronaviruses, with the characteristic gene order (5’-replicase [rep], spike [S], envelope [E], membrane [M], nucleocapsid [N]-3’) and short untranslated regions at both termini. The SARS-CoV rep gene, which includes approximately two-thirds of the genome, encodes two polyproteins (encoded by ORF1a and ORF1b) that undergo co-translational proteolytic processing. There are four open reading frames (ORFs) downstream of rep that are predicted to encode the structural proteins, S, E, M, and N, which are common to all known coronaviruses.

In some embodiments, the methods screen for one or more molecular markers that predispose the host to developing severe COVID associated with SARS-CoV-2 betacoronavirus of the subgenus Sarbecovirus. SARS-CoV-2 viruses share approximately 79% genome sequence identity with the SARS-CoV virus identified in 2003. The genome organization of SARS-CoV-2 viruses is shared with other betacoronaviruses; six functional open reading frames (ORFs) are arranged in order from 5’ to 3’: replicase (ORF1a/ORF1b), spike (S), envelope (E), membrane (M) and nucleocapsid (N). In addition, seven putative ORFs encoding accessory proteins are interspersed between the structural genes.

In some embodiments, the coronavirus is a variant of SARS-CoV-2, such as SARS-CoV-2 B.1.1.7 (Alpha variant), SARS-CoV-2 B.1.351 (Beta variant), SARS-CoV-2 P.1 (Gamma variant), SARS-CoV-2 B.1.617, SARS-CoV-2 B.1.617.1 (Kappa variant), SARS-CoV-2 B.1.621 (Mu variant), SARS-CoV-2 B.1.617.2 (Delta variant), SARS-CoV-2 B.1.617.3, and SARS-CoV-2 B.1.1.529 (Omicron variant).

ii. Symptoms of COVID-19

Patients with SARS-CoV-2 infection can experience a range of clinical manifestations, from no symptoms to critical illness. In general, adults with SARS-CoV-2 infection can be grouped into the following severity of illness categories; however, the criteria for each category may overlap or vary across clinical guidelines and clinical trials, and a patient’s clinical status may change over time.

(i) Asymptomatic or pre-symptomatic infection: individuals who test positive for SARS-CoV-2 using a virologic test (i.e., a nucleic acid amplification test or an antigen test) but who have no symptoms that are consistent with COVID-19.

(ii) Mild illness: individuals who have any of the various signs and symptoms of COVID-19 (e.g., fever, cough, sore throat, malaise, headache, muscle pain, nausea, vomiting, diarrhea, loss of taste and smell) but who do not have shortness of breath, dyspnea, or abnormal chest imaging.

(iii) Moderate illness: individuals who show evidence of lower respiratory disease during clinical assessment or imaging and who have an oxygen saturation (SpO2) ≥94% on room air at sea level.

(iv) Severe illness: individuals who have SpO2 <94% on room air at sea level, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) <300 mm Hg, a respiratory rate >30 breaths/min, or lung infiltrates >50%. These patients may experience rapid clinical deterioration. Oxygen therapy should be administered immediately using a nasal cannula or a high-flow oxygen device. If secondary bacterial pneumonia or sepsis is suspected, administer empiric antibiotics, re-evaluate the patient daily, and de-escalate or stop antibiotics if there is no evidence of bacterial infection.

(v) Critical illness: individuals who have acute respiratory distress syndrome, septic shock that may represent virus-induced distributive shock, cardiac dysfunction, an exaggerated inflammatory response, and/or exacerbation of underlying comorbidities. In addition to pulmonary disease, patients with critical illness may also experience cardiac, hepatic, renal, central nervous system, or thrombotic disease.
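The numeric thresholds in categories (i) to (v) can be expressed as a simple decision rule. The Python sketch below is an illustration of how the listed criteria fit together, not a clinical tool: the function name and parameters are hypothetical, and the criteria for critical illness (which the text defines by syndromes rather than by numbers) are reduced to a single flag supplied by the caller.

```python
def classify_severity(symptomatic, spo2, pao2_fio2=None, resp_rate=None,
                      infiltrates_pct=None, lower_resp_disease=False,
                      critical_signs=False):
    """Assign one of the five severity categories listed above.

    spo2 is the oxygen saturation in percent on room air at sea level;
    optional measurements left as None are ignored.
    """
    if critical_signs:  # e.g. ARDS or septic shock, judged clinically
        return "critical"
    severe = (
        spo2 < 94
        or (pao2_fio2 is not None and pao2_fio2 < 300)
        or (resp_rate is not None and resp_rate > 30)
        or (infiltrates_pct is not None and infiltrates_pct > 50)
    )
    if severe:
        return "severe"
    if lower_resp_disease:  # reaching this point implies SpO2 of 94% or more
        return "moderate"
    if symptomatic:
        return "mild"
    return "asymptomatic or pre-symptomatic"

print(classify_severity(symptomatic=True, spo2=91))                           # severe
print(classify_severity(symptomatic=True, spo2=97, lower_resp_disease=True))  # moderate
print(classify_severity(symptomatic=False, spo2=98))
```

The categories are tested from most to least severe so that a patient meeting any single severe criterion is not classified as moderate or mild on the basis of the other measurements.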

Patients with certain underlying comorbidities are at a higher risk of progressing to severe COVID-19. These comorbidities include being aged >65 years; having cardiovascular disease, chronic lung disease, sickle cell disease, diabetes, cancer, obesity, or chronic kidney disease; being pregnant; being a cigarette smoker; being a transplant recipient; and receiving immunosuppressive therapy.

In some cases, patients with COVID-19 may have additional infections that are noted when they present for care or that develop during the course of treatment. These coinfections may complicate treatment and recovery. Older patients or those with certain comorbidities or immunocompromising conditions may be at higher risk for these infections. In some embodiments, the methods screen one or more molecular markers of SARS-CoV-2 variants that predispose the host to developing severe COVID. In further embodiments, the methods identify and diagnose patients having an elevated risk of developing one or more symptoms associated with severe COVID-19 as a result of SARS-CoV-2 infection. In these cases, the patients carrying these SARS-CoV-2 variants are likely to develop one or more symptoms associated with severe illness, critical illness, and additional complications. The disclosed methods can identify a subject as being at risk of developing severe COVID with a confidence level of at least 50%, 60%, 70%, 80%, 90%, 95%, 97%, or 99%.

B. Methods for Risk Classification and Treatment Strategies

The disclosed methods are suitable for guiding risk classification and treatment strategies in subjects identified as predisposed to developing severe COVID. The methods improve the stratification of patients for treatment using antiviral and/or monoclonal therapies.

The disclosed methods for risk-stratifying and/or selecting treatment options for subjects who are identified as being at a higher risk of progressing to severe COVID-19 involve steps of detecting and/or measuring the level of phosphorylation at Serine 206 (S206) in the N protein with R203K and/or G204R substitutions. The subject is identified as being suitable for receiving antiviral and/or monoclonal therapies if the subject carries SARS-CoV-2 variants having an increased level of phosphorylation at Serine 206 (S206) in the N protein with R203K and/or G204R substitutions, for example a 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more than 10-fold increase compared to the level of phosphorylation at S206 in the N protein without R203K and/or G204R substitution.

The methods can further involve steps of detecting the presence of one or more mutations of G28881A, G28882A, and G28883C of the viral genome, and/or one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein. The subject is identified as being suitable for receiving antiviral and/or monoclonal therapies if the subject carries SARS-CoV-2 variants having one or more mutations of G28881A, G28882A, and G28883C of the viral genome, and/or one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein. In some embodiments, the methods administer to the subject identified as suitable for receiving antiviral and/or monoclonal therapies an effective amount of antiviral and/or monoclonal therapies to treat or prevent one or more symptoms of coronavirus infection in the subject, for example, reducing or preventing one or more symptoms or physiological markers of severe acute respiratory syndrome (SARS) in a subject. Exemplary symptoms of COVID-19 include cough, fatigue, fever, body aches, headache, sore throat, loss or altered sense of taste and/or smell, vomiting, diarrhea, cytokine storm, skin changes, ocular complications, confusion, chronic neurological impairment, chest pain and shortness of breath. Therefore, in some embodiments, the methods prevent or reduce one or more of cough, fatigue, fever, body aches, headache, sore throat, loss or altered sense of taste and/or smell, vomiting, diarrhea, cytokine storm, skin changes, ocular complications, confusion, chronic neurological impairment, chest pain and shortness of breath.

1. Selection of Subjects for Treatment

In some embodiments, the methods select a subject as being likely to benefit from therapeutic treatment with one or more active agents for treating severe COVID. In some embodiments, the subject is identified as being likely to benefit from therapeutic treatment if the subject carries SARS-CoV-2 variants having an increased level of phosphorylation at Serine 206 (S206) in the nucleocapsid (N) protein of the SARS-CoV-2 variant compared to a control. In preferred embodiments, the N protein with an increased level of phosphorylation at S206 also has one or more of amino acid substitutions of R203K and G204R. An exemplary control is a SARS-CoV-2 variant which has arginine at amino acid position 203 and glycine at amino acid position 204 in the N protein. In further embodiments, the subject is identified as being likely to benefit from therapeutic treatment if the subject carries SARS-CoV-2 variants carrying one or more mutations of G28881A, G28882A, and G28883C of the viral genome, and/or one or more amino acid substitutions of R203K and G204R in the Nucleocapsid (N) protein.

2. Therapeutic Agents

Remdesivir (GS-5734), an inhibitor of the viral RNA-dependent RNA polymerase with in vitro inhibitory activity against SARS-CoV-1 and the Middle East respiratory syndrome coronavirus (MERS-CoV), was identified early as a promising therapeutic candidate for COVID-19 because of its ability to inhibit SARS-CoV-2 in vitro. On October 22, 2020, the U.S. Food and Drug Administration (FDA) approved the antiviral drug VEKLURY® (remdesivir) for use in adults and pediatric patients (12 years of age and older and weighing at least 40 kg) for the treatment of COVID-19 requiring hospitalization.

The FDA issued an Emergency Use Authorization (EUA) on December 22, 2021 for Pfizer’s PAXLOVID™ (nirmatrelvir tablets and ritonavir tablets, co-packaged for oral use) for the treatment of mild-to-moderate coronavirus disease 2019 (COVID-19) in adults and pediatric patients (12 years of age and older weighing at least 40 kg) with positive results of direct severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral testing, and who are at high risk for progression to severe COVID-19, including hospitalization or death. PAXLOVID™ consists of nirmatrelvir, which inhibits a SARS-CoV-2 protein to stop the virus from replicating, and ritonavir, which slows down nirmatrelvir’s breakdown to help it remain in the body for a longer period at higher concentrations. PAXLOVID™ is administered as three tablets (two tablets of nirmatrelvir and one tablet of ritonavir) taken together orally twice daily for five days, for a total of 30 tablets. PAXLOVID™ is not authorized for use for longer than five consecutive days.
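The tablet count stated above follows directly from the dosing schedule, as the short Python check below shows. The constant and function names are illustrative only; the numbers are those given in the preceding paragraph.

```python
# Dosing schedule as stated in the text above.
NIRMATRELVIR_TABLETS_PER_DOSE = 2
RITONAVIR_TABLETS_PER_DOSE = 1
DOSES_PER_DAY = 2
COURSE_DAYS = 5

def tablets_per_course(tablets_per_dose, doses_per_day=DOSES_PER_DAY,
                       days=COURSE_DAYS):
    """Total number of tablets taken over one full treatment course."""
    return tablets_per_dose * doses_per_day * days

nirmatrelvir_total = tablets_per_course(NIRMATRELVIR_TABLETS_PER_DOSE)
ritonavir_total = tablets_per_course(RITONAVIR_TABLETS_PER_DOSE)

print(nirmatrelvir_total, ritonavir_total, nirmatrelvir_total + ritonavir_total)  # 20 10 30
```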

Monoclonal antibody therapies remain available under EUA, including REGEN-COV® (casirivimab and imdevimab, administered together), and bamlanivimab and etesevimab, administered together.

The FDA has authorized the emergency use of baricitinib to treat COVID-19 in hospitalized adults and pediatric patients 2 years or older requiring supplemental oxygen, non-invasive or invasive mechanical ventilation, or extracorporeal membrane oxygenation (ECMO). According to a statement issued by the WHO on January 14, baricitinib is recommended for treating patients suffering from severe or critical COVID-19. On the other hand, sotrovimab, a monoclonal antibody drug, is recommended for treating patients who have mild or moderate COVID-19. Accordingly, in some embodiments, the methods involve the step of administering one or more of antiviral drugs such as remdesivir and PAXLOVID™, monoclonal antibodies such as casirivimab, imdevimab, bamlanivimab, etesevimab, and sotrovimab, and the Janus kinase inhibitor baricitinib, to a subject identified as having an elevated risk of developing one or more symptoms associated with severe COVID-19 based on the disclosed methods. The methods administer an effective amount of antiviral drugs and/or monoclonal antibodies to prevent, retard the development of, and/or treat one or more symptoms associated with coronavirus infection in the subject, for example, reducing or preventing one or more symptoms or physiological markers of severe acute respiratory syndrome (SARS) in a subject.

3. Additional Therapies and Procedures

The antiviral drugs and/or monoclonal antibodies against SARS-CoV-2 can be administered alone or in combination with one or more additional therapies. In some embodiments, the combination therapy includes administration of one or more of the antiviral drugs and/or monoclonal antibodies in combination with one or more additional active agents. The combination therapies can include administration of the active agents together in the same admixture, or in separate admixtures. Therefore, in some embodiments, the pharmaceutical composition includes two, three, or more active agents. Such formulations typically include an effective amount of an agent targeting the site of treatment. The additional active agent(s) can have the same or different mechanisms of action. In some embodiments, the combination results in an additive effect on the treatment of the lung condition. In some embodiments, the combinations result in a more than additive effect on the treatment of the disease or disorder.

The additional therapy or procedure can be simultaneous or sequential with the administration of the disclosed composition. In some embodiments, the additional therapy is performed between drug cycles or during a drug holiday that is part of the dosage regime. For example, in some embodiments, the additional therapy or procedure is damage control surgery, fluid resuscitation, blood transfusion, bronchoscopy, and/or drainage.

In some embodiments, the antiviral drugs and/or monoclonal antibodies are used in combination with oxygen therapy. In further embodiments, the additional therapy or procedure is prone positioning, recruitment maneuver, inhalation of NO, extracorporeal membrane oxygenation (ECMO), intubation, and/or inhalation of PGI2. A prone position enhances lung recruitment in a potentially recruitable lung by various mechanisms, releasing the diaphragm, decreasing the effect of heart and lung weight and shape on lung tissue, decreasing the lung compression by the abdomen, and releasing the lower lobes, which improves gas exchange and decreases mortality in severe ARDS patients. ECMO provides extracorporeal gas exchange with no effect on lung recruitment. It affords lung rest and works well for the non-recruitable lung. It has been shown to improve survival for certain groups of patients in high-performance ECMO centers.

In some embodiments, the compositions and methods are used prior to, in conjunction with, subsequent to, or in alternation with treatment with one or more additional therapies or procedures.

One or more additional therapeutic, diagnostic, and/or prophylactic agents may be used to treat inflammation in the lungs, and/or systemic inflammation resulting from COVID-19 induced pneumonia. Additional therapeutic agents can also include one or more of antibiotics, surfactant, corticosteroids, and glucocorticoids.

In some embodiments, the composition may contain one or more additional compounds to relieve symptoms such as inflammation or shortness of breath.

In some embodiments, one or more agents, including bronchodilators, corticosteroids, methylxanthines, phosphodiesterase-4 inhibitors, anti-angiogenesis agents, antibiotics, antioxidants, anti-viral agents, anti-fungal agents, anti-inflammatory agents, immunosuppressant agents, and/or anti-allergic agents, are administered prior to, in conjunction with, subsequent to, or in alternation with treatment with the disclosed antiviral drugs and/or monoclonal antibodies.

The amount of a second therapeutic generally depends on the severity of lung disorders to be treated. Specific dosages can be readily determined by those of skill in the art. See Ansel, Howard C. et al. Pharmaceutical Dosage Forms and Drug Delivery Systems (6th ed.) Williams and Wilkins, Malvern, PA (1995).

The additive drug may be present in its neutral form, or in the form of a pharmaceutically acceptable salt. In some cases, it may be desirable to prepare a formulation containing a salt of an active agent due to one or more of the salt's advantageous physical properties, such as enhanced stability or a desirable solubility or dissolution profile.

In some embodiments, the additional agent is a diagnostic agent for imaging or otherwise assessing the site of application. Exemplary diagnostic agents include paramagnetic molecules, fluorescent compounds, magnetic molecules, radionuclides, x-ray imaging agents, and contrast media. These may also be ligands or antibodies which are labelled with the foregoing or bind to labelled ligands or antibodies which are detectable by methods known to those skilled in the art.

In certain embodiments, the pharmaceutical composition contains one or more local anesthetics. Representative local anesthetics include tetracaine, lidocaine, amethocaine, proparacaine, lignocaine, and bupivacaine. In some embodiments, one or more additional agents, such as a hyaluronidase enzyme, are also added to the formulation to accelerate and improve dispersal of the local anesthetic.

i. Bronchodilators

In some embodiments, antiviral drugs are used in combination with one or more bronchodilators. Bronchodilators are a type of medication that helps open the airways to make breathing easier.

Short-acting bronchodilators are used in an emergency or as needed for quick relief. Some exemplary short-acting bronchodilators include anticholinergics such as ipratropium (e.g., ATROVENT®, in COMBIVENT®, in DUONEB®), beta2-agonists such as albuterol (e.g., VOSPIRE ER®, in COMBIVENT®, in DUONEB®), and levalbuterol (e.g., XOPENEX®).

Long-acting bronchodilators are used to treat lung conditions over an extended period of time. They are usually taken once or twice daily over a long period of time, and they come as formulations for inhalers or nebulizers. Some exemplary long-acting bronchodilators include anticholinergics such as aclidinium (e.g., TUDORZA®), tiotropium (e.g., SPIRIVA®), or umeclidinium (e.g., INCRUSE ELLIPTA®), beta2-agonists such as arformoterol (e.g., BROVANA®), formoterol (e.g., FORADIL®, PERFOROMIST®), indacaterol (e.g., ARCAPTA®), salmeterol (e.g., SEREVENT®), and olodaterol (e.g., STRIVERDI RESPIMAT®).

ii. Corticosteroids

In some embodiments, antiviral drugs are used in combination with one or more corticosteroids. Corticosteroids help reduce inflammation in the body, making air flow easier to the lungs. There are several corticosteroids. Some are prescribed with bronchodilators because these two medications can work together to make breathing more effective. Fluticasone (e.g., FLOVENT®), budesonide (e.g., PULMICORT®), and prednisolone are the ones doctors commonly prescribe for COPD.

iii. Methylxanthines

Methylxanthines are heterocyclic compounds that are methylated derivatives of xanthine, comprising coupled pyrimidinedione and imidazole rings (Talik et al., Separ. Purif. Rev. 2012;41:1-61). Methylxanthines have been widely used for therapeutic purposes for decades, with proven therapeutic benefits in different medical scopes. For example, naturally occurring methylxanthines such as caffeine, theophylline, and theobromine have been used in the treatment of respiratory diseases (Lam and Newhouse, Chest, 1990;98:44-52), cardiovascular diseases, and cancer (Hayashi et al., Anticancer Res. 2005;25:2399-2405; Kimura et al., J. Orthop. Sci. 2009;14:556-565), and the commercially produced xanthine derivative drug pentoxifylline has been widely documented to have immunomodulatory properties.

In some embodiments, antiviral drugs are used in combination with one or more methylxanthines such as pentoxifylline and caffeine. Potential beneficial properties of methylxanthines like pentoxifylline and caffeine as an adjuvant therapy to treat COVID-19 patients have been suggested (Monji F et al., Eur J Pharmacol. 2020 Nov 15; 887: 173561). In these cases, theophylline (e.g., THEO-24®, THEOLAIR®, ELIXOPHYLLINE®, QUIBRON-T®, UNIPHYL®, and ELIXOPHYLLIN®), which works as an anti-inflammatory and/or antioxidant and relaxes the muscles in the airway, can be taken along with a bronchodilator. Theophylline comes as a pill or a liquid to be taken on a daily basis, and/or combined with other medications.

iv. Phosphodiesterase-4 Inhibitors

The benefits of roflumilast, a phosphodiesterase-4 (PDE-4) inhibitor, as comprehensive support in COVID-19 pathogenesis have been described (Sugin Lal Jabaris S et al., Pulm Pharmacol Ther. 2021 Feb; 66: 101978). Roflumilast, a well-known anti-inflammatory and immunomodulatory drug, is protective in respiratory models of chemical and smoke induced lung damage. Significant data demonstrate the protective effect of PDE-4 inhibitors in respiratory viral models, and such inhibitors are likely to be beneficial in combating COVID-19 pathogenesis.

In some embodiments, antiviral drugs and/or monoclonal antibodies thereof are used in combination with one or more phosphodiesterase-4 inhibitors. In some embodiments, the compositions help relieve inflammation and/or improve air flow to the lungs. Several PDE-4 inhibitors have been identified to treat chronic obstructive pulmonary disease (COPD) and asthma, such as cilomilast, piclamilast, oglemilast, tetomilast, tofimilast, ronomilast, revamilast, UK-500,001, AWD 12-281, CDP840, CI-1018, GSK256066, YM976, and GS-5759. CHF 6001 is an inhaled PDE-4 inhibitor currently undergoing phase II clinical trials for COPD. Also, two orally administered PDE-4 inhibitors, roflumilast and apremilast, have been approved as treatments against inflammatory diseases including COPD, psoriasis, and psoriatic arthritis.

v. Antimicrobial Agents

In some embodiments, antiviral drugs and/or monoclonal antibodies are used in combination with one or more antimicrobial agents. An antimicrobial agent is a substance that kills or inhibits the growth of microbes such as bacteria, fungi, viruses, or parasites. Antimicrobial agents include antiviral agents, antibacterial agents, antiparasitic agents, and antifungal agents. Representative antiviral agents include ganciclovir and acyclovir. Representative antibiotic agents include aminoglycosides such as streptomycin, amikacin, gentamicin, and tobramycin, ansamycins such as geldanamycin and herbimycin, carbacephems, carbapenems, cephalosporins, glycopeptides such as vancomycin, teicoplanin, and telavancin, lincosamides, lipopeptides such as daptomycin, macrolides such as azithromycin, clarithromycin, dirithromycin, and erythromycin, monobactams, nitrofurans, penicillins, polypeptides such as bacitracin, colistin and polymyxin B, quinolones, sulfonamides, and tetracyclines.

Other exemplary antimicrobial agents include iodine, silver compounds, moxifloxacin, ciprofloxacin, levofloxacin, cefazolin, tigecycline, gentamycin, ceftazidime, ofloxacin, gatifloxacin, amphotericin, voriconazole, and natamycin.

vi. Local Anesthetics

In some embodiments, antiviral drugs and/or monoclonal antibodies are used in combination with one or more local anesthetics. A local anesthetic is a substance that causes reversible local anesthesia and has the effect of loss of the sensation of pain. Non-limiting examples of local anesthetics include ambucaine, amolanone, amylocaine, benoxinate, benzocaine, betoxycaine, biphenamine, bupivacaine, butacaine, butamben, butanilicaine, butethamine, butoxycaine, carticaine, chloroprocaine, cocaethylene, cocaine, cyclomethycaine, dibucaine, dimethisoquin, dimethocaine, diperodon, dyclonine, ecgonidine, ecgonine, ethyl chloride, etidocaine, beta-eucaine, euprocin, fenalcomine, formocaine, hexylcaine, hydroxytetracaine, isobutyl p-aminobenzoate, leucinocaine mesylate, levoxadrol, lidocaine, mepivacaine, meprylcaine, metabutoxycaine, methyl chloride, myrtecaine, naepaine, octacaine, orthocaine, oxethazaine, parethoxycaine, phenacaine, phenol, piperocaine, piridocaine, polidocanol, pramoxine, prilocaine, procaine, propanocaine, proparacaine, propipocaine, propoxycaine, pseudococaine, pyrrocaine, ropivacaine, salicyl alcohol, tetracaine, tolycaine, trimecaine, zolamine, and any combination thereof. In other aspects of this embodiment, the antiviral drugs and/or monoclonal antibodies include an anesthetic agent in an amount of, e.g., about 10 mg, about 50 mg, about 100 mg, about 200 mg, or more than 200 mg. The concentration of local anesthetics in the compositions can be therapeutically effective, meaning the concentration is adequate to provide a therapeutic benefit without inflicting harm to the patient.

vii. Anti-inflammatory Agents

In some embodiments, antiviral drugs are used in combination with one or more anti-inflammatory agents. Anti-inflammatory agents reduce inflammation and include steroidal and non-steroidal drugs. Suitable steroidal active agents include glucocorticoids, progestins, mineralocorticoids, and corticosteroids. Other exemplary anti-inflammatory agents include triamcinolone acetonide, fluocinolone acetonide, prednisolone, dexamethasone, loteprednol, fluorometholone, ibuprofen, aspirin, and naproxen. Exemplary immune-modulating drugs include cyclosporine, tacrolimus, and rapamycin. Exemplary non-steroidal anti-inflammatory drugs (NSAIDs) include mefenamic acid, aspirin, diflunisal, salsalate, ibuprofen, naproxen, fenoprofen, ketoprofen, dexketoprofen, flurbiprofen, oxaprozin, loxoprofen, indomethacin, sulindac, etodolac, ketorolac, diclofenac, nabumetone, piroxicam, meloxicam, tenoxicam, droxicam, lornoxicam, isoxicam, meclofenamic acid, flufenamic acid, tolfenamic acid, celecoxib, rofecoxib, valdecoxib, parecoxib, lumiracoxib, etoricoxib, firocoxib, sulphonanilides, nimesulide, niflumic acid, and licofelone.

In some embodiments, anti-inflammatory agents are anti-inflammatory cytokines. Exemplary cytokines are IL-10, TGF-β and IL-35.

C. Methods for Monitoring and Evaluating Treatment Efficacy

The methods include steps of monitoring one or more molecular markers of SARS-CoV-2 variants that predispose the host to developing severe COVID-19. In some embodiments, the methods include the steps of treating the subject for severe COVID-19, and then performing any of the disclosed methods to monitor progress of treatment, and/or to detect SARS-CoV-2 variants following the treatment.

In some embodiments, the method further includes discontinuing treatment of the subject if the percent phosphorylation at Serine 206 (S206) in the N protein is reduced, for example by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more than 90% compared to the level prior to treatment; if the SARS-CoV-2 variant carrying one or more mutations of G28881A, G28882A, G28883C is reduced in quantity, for example by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more than 90% compared to that before the treatment started; and/or if the SARS-CoV-2 variant carrying one or more mutations of R203K and G204R in the Nucleocapsid (N) protein is reduced in quantity, for example by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more than 90%, compared to the level prior to treatment.
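The percent-reduction criterion described above can be illustrated with a minimal sketch. The function names and the default 50% threshold are hypothetical and chosen only for illustration; the text lists thresholds from 10% to more than 90% without preferring one, and the measured quantity may be percent S206 phosphorylation or variant quantity.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percent decrease of `after` relative to the pre-treatment level `before`."""
    if before <= 0:
        raise ValueError("pre-treatment level must be positive")
    return 100.0 * (before - after) / before


def meets_reduction_threshold(before: float, after: float,
                              threshold_pct: float = 50.0) -> bool:
    """True when the marker has dropped by at least `threshold_pct` percent.

    The default threshold is an assumption for illustration only.
    """
    return percent_reduction(before, after) >= threshold_pct
```

For example, a marker measured at 80 units before treatment and 20 units after treatment corresponds to a 75% reduction, which satisfies any of the listed thresholds up to 70%.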

The methods can be used to test for the presence of the SARS-CoV-2 variants over a period of time, or after the initial negative or positive read-out, for example, over a week, two weeks, three weeks, four weeks, a month, two months, three months, four months, five months, six months, about a year, two years, three years, four years, five years, or more than five years.

D. Methods for Drug Screening

There is a lack of effective treatments available for severe COVID.

The systems and methods are useful to investigate the activity or applicability of one or more test compounds to treat or alleviate or prevent one or more symptoms of severe COVID-19. Therefore, in some embodiments, the methods include one or more steps for assessing phosphorylation status of Ser206 in the presence of one or more active agents, where phosphorylation status of Ser206 in the presence of the active agent is assessed by comparison with phosphorylation status of Ser206 in the absence of the active agent. In an exemplary embodiment, an active agent is selected if it is effective to reduce the phosphorylation of Ser206. A particularly preferred embodiment is exemplified below.

EXAMPLES

In this study, 892 SARS-CoV-2 genomes from nasopharyngeal swab samples of patients from the four main cities, Jeddah, Makkah, Madinah, and Riyadh, as well as a small number of patients from the Eastern region of Saudi Arabia, were sequenced (Figure 1A-E). The genomes were analyzed to investigate the nucleotide changes and multiple mutation events that represent the first 6 months of the locally circulating pandemic lineages of SARS-CoV-2 in Saudi Arabia, and to search for potential associations of polymorphic sites in the genome with available hospital records, including severe disease and case fatality rates among the COVID-19 patients. Phylogenetic analysis was performed to visualize the genetic diversity of SARS-CoV-2 and the nature of transmission lineages during March-August 2020. A snapshot of the genomic variation landscapes of the SARS-CoV-2 lineages in this study population is presented herein, and a specific set of mutation events in the N gene is linked to the disease outcome in a diverse population of COVID-19 patients in Saudi Arabia. Further support for this functional linkage was also found in global data sets. Finally, experiments show the functional impact of these mutations in the N protein on the virus' life cycle and its interaction with the host.

Materials and Methods

Sample collection. As part of the study, nasopharyngeal swab samples were collected in 1 ml of TRIzol (Ambion, USA) from 892 COVID-19 patients with various grades of clinical disease manifestation, ranging from severe and mild to asymptomatic. The anonymized samples were collected from 8 hospitals and one quarantine hotel located in Madinah, Makkah, Jeddah, and Riyadh (data not shown). Patient metadata in the form of age, sex, comorbidities, ICU admission, and mortality were provided by the hospitals (data not shown) and used for statistical analysis. Ethical approvals were obtained from the Institutional Review Board of the Ministry of Health in the Makkah region with the numbers H-02-K-076-0420-285 and H-02-K-076-0320-279, as well as the Institutional Review Board of Dr. Sulaiman Al Habib Hospital number RC20.06.88 for samples from Riyadh and the Eastern regions respectively.

RNA Isolation.

RNA was extracted using the Direct-Zol RNA Miniprep kit (Zymo Research, USA) following the manufacturer’s instructions, along with several optimization steps to improve the quality and quantity of RNA from clinical samples. The optimization included extending the TRIzol incubation period and adding chloroform during the initial lysis step to obtain the aqueous RNA layer. Quality control of the purified RNA was performed using the Broad Range Qubit kit (Thermo Fisher, USA) and the RNA 6000 Nano LabChip kit (Agilent, USA) respectively. RT-PCR was conducted using the one-step SuperScript III with Platinum Taq DNA Polymerase (Thermo Fisher, USA) and the TaqPath COVID-19 kit (Applied Biosystems, USA) on the QuantStudio 3 Real-Time PCR instrument (Applied Biosystems, USA) and the 7900 HT ABI machine. The primers and probes used targeted two regions in the nucleocapsid gene (N1 and N2) in the viral genome following the Centers for Disease Control and Prevention diagnostic panel, along with primers and a probe for the human RNase P gene (CDC; fda.gov/media/134922/download) (Table 1).

Table 1. Primers used in this study

Samples were considered COVID-19 positive when the cycle threshold (Ct) values for both the N1 and N2 regions were less than 40. For amplicon sequencing, samples with Ct less than 35 were chosen to ensure successful genome assembly for upload to GISAID.

Sequencing and data analysis. cDNA and amplicon libraries were prepared using the COVID-19 ARTIC-V3 protocol, producing ~400 bp amplicons tiling the viral genome using V3 nCoV-2019 primers (Wellcome Sanger Institute, UK; dx.doi.org/10.17504/protocols.io.beuzjex6). Amplicons were then processed for deep, paired-end sequencing with the NovaSeq 6000 platform on the SP 2 x 250 bp flow cell type (Illumina, USA).
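The Ct criteria stated above for calling a sample positive (both N1 and N2 below 40) and for selecting it for amplicon sequencing (Ct below 35) can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the study: that the sequencing cutoff applies to both targets, and that an undetermined Ct is represented as None.

```python
from typing import Optional

POSITIVE_CT_CUTOFF = 40.0    # both N1 and N2 must be below this value
SEQUENCING_CT_CUTOFF = 35.0  # assumed to apply to both targets as well


def _both_below(ct_n1: Optional[float], ct_n2: Optional[float], cutoff: float) -> bool:
    """True only if both Ct values were measured and are strictly below the cutoff."""
    if ct_n1 is None or ct_n2 is None:
        return False
    return ct_n1 < cutoff and ct_n2 < cutoff


def is_covid_positive(ct_n1: Optional[float], ct_n2: Optional[float]) -> bool:
    return _both_below(ct_n1, ct_n2, POSITIVE_CT_CUTOFF)


def selected_for_sequencing(ct_n1: Optional[float], ct_n2: Optional[float]) -> bool:
    return _both_below(ct_n1, ct_n2, SEQUENCING_CT_CUTOFF)
```

Under these assumptions, a sample with Ct values of 38.2 and 39.0 would be called positive but would not be selected for sequencing.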

Genome assembly, SNP and indel calling.

Illumina adapters and low-quality sequences were trimmed using Trimmomatic (v0.38)63. Reads were mapped to the SARS-CoV-2 Wuhan-Hu-1 NCBI reference sequence NC_045512.2 using BWA (v0.7.17)64. Mapped reads were processed using the GATK (v4.1.7) pipeline commands MarkDuplicatesSpark, HaplotypeCaller, VariantFiltration, SelectVariants, BaseRecalibrator, ApplyBQSR, and HaplotypeCaller to identify variants65. High quality SNPs were filtered using the filter expression: ReadPosRankSum < -8.0”

High quality Indels were filtered using the filter expression: Consensus sequences were generated by applying the good quality variants from GATK on the reference sequence using bcftools (v1.9) consensus command. Regions which are covered by less than 30 reads are masked in the final assembly with ‘N’s. Consensus assembly sequences were deposited to GISAID (data not shown).
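The masking step described above, in which positions covered by fewer than 30 reads are replaced with ‘N’, can be sketched as follows. This is an illustrative stand-alone function operating on a per-base depth list, not the bcftools-based procedure used in the study; the function name and input layout are assumptions.

```python
from typing import Sequence


def mask_low_coverage(consensus: str, depth: Sequence[int], min_depth: int = 30) -> str:
    """Replace every base whose read depth is below `min_depth` with 'N'.

    `depth[i]` is assumed to hold the number of reads covering `consensus[i]`.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(consensus, depth)
    )
```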

To retrieve high-confidence SNPs, assembled sequences were re-aligned against the Wuhan-Hu-1 reference sequence (NC_045512.2), and only positions in the sample sequences with unambiguous bases in a 7-nucleotide window centered around the SNP position were kept for further analysis.
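The 7-nucleotide window criterion can be sketched as below. The function name, the 0-based position convention, and the decision to reject SNPs whose window runs past the end of the sequence are assumptions made for illustration.

```python
UNAMBIGUOUS_BASES = frozenset("ACGT")


def has_unambiguous_window(seq: str, pos: int, window: int = 7) -> bool:
    """Check that all bases in a window centered on `pos` (0-based) are A, C, G or T.

    Windows that would extend beyond either end of the sequence are rejected
    (an assumption; the source does not describe edge handling).
    """
    half = window // 2
    start, end = pos - half, pos + half + 1
    if start < 0 or end > len(seq):
        return False
    return all(base in UNAMBIGUOUS_BASES for base in seq[start:end].upper())
```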

Phylogenetic analysis.

To generate the phylogeny of Saudi samples within a global context, a total of 308,012 global sequences were downloaded from GISAID on 31 December 2020, then filtered and processed using the Nextstrain pipeline 12. Global sequences were grouped by country and sample collection month, and 20 sequences per group were randomly sampled, which resulted in 10,873 global representative sequences and 952 Saudi sequences. The phylogeny was constructed using IQTREE (v2.0.5)67, clades were assigned using Nextclade, and internal node dates were inferred and sequences pruned using TreeTime (v0.7.5)68. The Nextstrain protocol was followed for the above-mentioned steps. The resulting global phylogenetic tree was reduced to retain the branches that lead to Saudi leaf nodes and visualized using the baltic library (https://github.com/evogytis/baltic).
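The grouping and random subsampling of global sequences (20 per country and collection month) can be sketched as follows. This is not the Nextstrain implementation; the record layout (dictionaries with "country" and ISO-formatted "date" keys), the function name, and the fixed random seed are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List


def subsample_by_country_month(records: List[Dict[str, str]],
                               per_group: int = 20,
                               seed: int = 0) -> List[Dict[str, str]]:
    """Randomly keep at most `per_group` records per (country, YYYY-MM) group."""
    groups = defaultdict(list)
    for rec in records:
        month = rec["date"][:7]  # "YYYY-MM" prefix of an ISO date
        groups[(rec["country"], month)].append(rec)

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sampled = []
    for key in sorted(groups):
        members = groups[key]
        sampled.extend(rng.sample(members, min(per_group, len(members))))
    return sampled
```

Groups with fewer than 20 sequences are kept whole in this sketch, which is one plausible reading of the sampling step.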

Phylodynamic analysis.

Phylodynamic analyses used the same sequence subset used in the full phylogenetic analysis, extracted from the GISAID SARS-CoV-2 database. Wrapper functions for the importation date estimates and skygrowth model are provided in the sarscov2 R package as ‘compute_timports’ and ‘skygrowth1’ respectively (https://github.com/JorgensenD/sarscov2Rutils).

Importation date estimates for Nextstrain clades.

Importation rate estimates were carried out using all available sequence data for Saudi Arabia deposited on GISAID 11 up to 31 December 2020, including the sequences described herein. Sequences were grouped by Nextstrain clade for analysis using the Nextstrain_clade parameter 12 in the GISAID metadata table. Additional international sequences were selected for each of the included Nextstrain clades based on Tamura-Nei 93 distance with the C program tn93 (v1.0.6) (github.com/veg/tn93)70. Five hundred sequences were selected from available closely related sequences in a time-stratified manner, taking every N/500th sequence from the set of N sequences arranged by date, rounded to the nearest integer.
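The time-stratified selection described above can be sketched as follows. The interpretation taken here, that the i-th selected sequence sits at index round(i x N/500) in the date-ordered list, is an assumption; the source could also be read as rounding the step size itself. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def time_stratified_sample(sorted_by_date: Sequence[T], target: int = 500) -> List[T]:
    """Pick up to `target` items evenly spaced along a date-ordered sequence."""
    n = len(sorted_by_date)
    if n <= target:
        return list(sorted_by_date)
    step = n / target
    # Round each fractional index to the nearest integer and drop any repeats.
    indices = sorted({min(n - 1, round(i * step)) for i in range(target)})
    return [sorted_by_date[i] for i in indices]
```

With N = 1000 date-ordered sequences, this sketch keeps every second sequence, giving 500 in total.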

For each Nextstrain clade, a maximum-likelihood phylogeny was produced with IQTree (v1.6.12) with an HKY substitution model67,71. These trees were dated using the R package treedater (v0.5.0) after collapsing short branch lengths and resolving polytomies randomly fifteen times for each clade with the functions di2multi and multi2di from the ape R package (v5.5)72, 73. A strict molecular clock was used when estimating dated phylogenies, constrained between 0.0009 and 0.0015 substitutions per site per year74. The state of each internal node in the phylogeny was reconstructed by maximum parsimony with the R package phangorn (v2.7.0)75. As this method cannot directly estimate the timing of an importation event, importations were estimated to occur at the midpoint of a branch along which a state change occurs between the internal and external samples.

The probability density of importation events over time into Saudi Arabia by cluster is presented in Figure 2B, where a Gaussian kernel density estimator is used with a bandwidth of 5.51 chosen using the Silverman rule of thumb, implemented in the geom_density function of the ggplot2 R package (v3.3.5).
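For orientation, the Silverman rule of thumb and a Gaussian kernel density estimate can be written out in a few lines. The sketch below uses the common form of the rule, 0.9 x min(standard deviation, IQR/1.34) x n^(-1/5), which is the default bandwidth selector in R; it is not the ggplot2 code used in the study and is not expected to reproduce the reported bandwidth of 5.51 without the underlying importation dates. Function names are hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence


def silverman_bandwidth(xs: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    if sd == 0:
        raise ValueError("data have zero variance")
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(xs: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a function that evaluates the Gaussian kernel density estimate."""
    norm = 1.0 / (len(xs) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t: float) -> float:
        return norm * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in xs)

    return density
```

Here the input values would be importation dates expressed as numbers (for example, days since a reference date), and the returned function gives the estimated density at any time point.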

MS analysis using Orbitrap Fusion Lumos.

The MS analysis was performed as described previously82,83 with slight modifications. Briefly, approximately 0.5 μg of peptide mixture in 0.1% formic acid (FA) was injected into a nanotrap (PepMap 100, C18, 75 μm x 20 mm, 3 μm particle size) and desalted for 5 min with 0.1% FA in water at a flow rate of 5 μl/min. The peptides were then eluted and analyzed using an Orbitrap Fusion mass spectrometer (MS) (Lumos, Thermo Fisher Scientific) coupled with an UltiMate™ 3000 UHPLC (Thermo Scientific). The peptides were separated by an EasySpray C18 column (50 cm x 75 μm ID, PepMap C18, 2 μm particles, 100 Å pore size, Thermo Scientific) with a 75-min gradient at a constant 300 nL/min, at 40 °C. The electrospray potential was set at 1.9 kV, and the ion transfer tube temperature was set at 270 °C. A full MS scan with a mass range of 350-1500 m/z was acquired in the Orbitrap at a resolution of 60,000 (at 400 m/z) using the profile mode, a maximum ion accumulation time of 50 ms, and a target value of 2e5. The most intense ions that were above a 2e4 threshold and carried multiple positive charges (2-6) were selected for fragmentation (MS/MS) via higher energy collision dissociation (HCD) with normalized collision energy at 30% within the 2 s cycling time. The dynamic exclusion was 30 s. The MS2 was acquired with datatype as centroid at a resolution of 30,000. Protein identification analysis from the raw mass spectrometry data was performed using the MaxQuant software (v1.5.3.30) as described in Zhang, et al., Sci data, 6:278 (2019). For phosphorylated peptides, MaxQuant label-free quantification (LFQ)38 was used. The analysis and quantification of phosphorylated peptides were performed according to a published protocol84. The sample file naming format and the datasets associated with the MaxQuant analysis are represented in Table 2.

Table 2. Datasets associated with the MaxQuant analysis

Analysis of differential interaction.

First, the identified protein group data were corrected for non-specific background binding by removing all proteins detected in mock control (cells transfected with the plasmid vector without the N gene) affinity mass spectrometry (data not shown). The normalized LFQ data were processed for statistical analysis on LFQ-Analyst, a web-based tool85, to perform pair-wise comparisons between mutant and control N protein AP-MS data.

The proteins that changed significantly between mutant and control conditions were identified. Threshold cutoffs of adjusted p-value = 0.05 and log fold change = 1 were used. Among the replicates, outliers were removed based on correlation and PCA analysis. The GO-enrichment analysis was performed on LFQ-Analyst.
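The two cutoffs named above can be applied to a results table as in the following sketch. It is not LFQ-Analyst code. The field names ("protein", "log_fc", "adj_p") and protein labels are hypothetical, and applying the fold-change cutoff to the absolute value and treating the p-value cutoff as inclusive are assumptions.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def significant_hits(rows: List[Row],
                     alpha: float = 0.05,
                     min_abs_log_fc: float = 1.0) -> List[str]:
    """Names of proteins passing both the adjusted p-value and fold-change cutoffs."""
    return [
        str(row["protein"])
        for row in rows
        if float(row["adj_p"]) <= alpha and abs(float(row["log_fc"])) >= min_abs_log_fc
    ]
```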

In vitro protein-RNA interaction (RIP) assay.

The in vitro interaction assay was performed using purified 2xStrep-tagged N protein (mutant and control) and total RNA isolated from patient swabs as mentioned above in the RNA isolation section. The 2xStrep-tagged (in bead-bound condition) N protein (mutant and control) was incubated with total RNA in reaction buffer (50 mM Tris-HCl pH 7.5, 150 mM KCl, 0.1 mM EDTA, 1 mM DTT, 5% glycerol, 0.02% NP40, 1 mM PMSF, 40 U RNaseOUT™). After shaking incubation for 45 min, N proteins (mutant and control) were pulled down using Mag Strep beads on a magnetic separator with four washes (using the same reaction buffer). After the final wash, the RNA was isolated by adding Trizol using the Zymo Direct-zol method. The isolated RNAs were analyzed by RT-qPCR using specific viral N gene (N1 and N2), E gene, S gene, and ORF1ab region primers (Table 7). GraphPad Prism (v9.1.1) was used for analysis and graph generation.

RNA-sequencing and differential gene expression analysis.

Calu-3 cells were transfected with plasmids expressing the full-length N-control and N-mutant protein along with mock control. After 48 h, cells were harvested in Trizol and total RNA was isolated using the Zymo-RNA Direct-Zol kit (Zymo, USA) according to the manufacturer’s instructions. The concentration of RNA was measured by Qubit (Invitrogen), and RNA integrity was determined by the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). The RNA was then subjected to library preparation using the Ribozero-plus kit (Illumina). The libraries were sequenced on the NovaSeq 6000 platform (Illumina, USA) with 150 bp paired-end reads. The raw reads from Calu-3 RNA-sequencing were processed and trimmed using Trimmomatic and mapped to annotated ENSEMBL transcripts from the human genome (hg19)87,88 using kallisto (v0.43.1). Differential expression analysis was performed after normalization using EdgeR integrated in NetworkAnalyst90. GO biological process and pathway enrichment analyses on differentially expressed genes were performed using NetworkAnalyst.

Results

SNP calling and phylodynamics of SARS-CoV-2 samples

SARS-CoV-2 genomes from 892 patient samples were sequenced and assembled. This group includes two patients that had tested negative for COVID-19 at the hospitals, and 144 patients that were placed in quarantine and had either mild symptoms or were asymptomatic. The remaining patients were all hospitalized. Data on comorbidities were available for 689 patients, with diabetes (39%) and hypertension (35%) being the most abundant. Patient outcome data were available for 850 samples, and 199 patients (23%) died during hospitalization.

From the 892 assembled viral genomes collected over a period of 6 months, a total of 836 single-nucleotide polymorphisms (SNPs) were identified, compared to the Wuhan SARS-CoV-2 reference (GenBank accession: NC_045512) (Figure 1E-1F). The observed numbers of SNPs relative to the reference sequence are in general lower than the numbers observed in global samples, but with the exception of a period from mid-June to late July, the average number of SNPs in Saudi samples is within one standard deviation of samples deposited in GISAID (FIG. 1G). 41 indels were detected, of which 26 reside in coding regions (Table 3).

Table 3. Detected Indels


Most indels were specific to a single sample, and no identical indel was found in more than four samples. Compared with global SNP data, seven SNPs were found in higher frequencies (absolute difference > 0.1) in samples from Saudi Arabia (FIG. 1E-F). These include the Spike protein D614G (A23403G) and three consecutive SNPs (G28881A, G28882A, and G28883C) causing the R203K and G204R changes in the nucleocapsid protein. Together with all sequences from Saudi Arabia available on GISAID on December 31st, 2020, the assembled sequences were used to construct the effective population size and growth rate estimates of SARS-CoV-2 over the course of the first wave of the epidemic. The skygrowth model (Figure 1D) shows a downward trend in the effective reproduction number (R(t)) over time with the timely introduction and maintenance of effective non-pharmaceutical interventions by the Saudi Ministry of Health. Although the bounds of the R(t) estimate include one for much of the study period, an estimated decrease in R(t) over time until the lifting of restrictions in late June can be inferred (Fig. 1D). By investigating the 1.6 million individual traces from the skygrowth analysis, a date of 27th April 2020 can be inferred as the first point where over 50% of collected traces resulted in an R(t) estimate below one, suggesting an epidemic in decline for the first time.
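The criterion used to date the onset of decline, the first time point at which more than 50% of traces give an R(t) estimate below one, can be sketched as follows. The data layout (one list of R(t) values per trace, aligned with a shared list of dates) and the function name are assumptions; the skygrowth output format is not described in the source.

```python
from typing import List, Optional, Sequence


def first_majority_below_one(dates: Sequence[str],
                             traces: Sequence[Sequence[float]],
                             fraction: float = 0.5) -> Optional[str]:
    """Return the first date at which more than `fraction` of traces have R(t) < 1.

    `traces[k][j]` is assumed to be the R(t) value of trace k at `dates[j]`.
    Returns None if the condition is never met.
    """
    if not traces:
        return None
    for j, date in enumerate(dates):
        below = sum(1 for trace in traces if trace[j] < 1.0)
        if below / len(traces) > fraction:
            return date
    return None
```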

The effective population size (Ne) represents the relative diversity of the sequences collected in Saudi Arabia over the course of the outbreak, suggesting that diversity peaked at the start of June, ahead of the peak in cases seen in the country in mid-June (Figure 1D). This aligns with the peak number of reported cases in the regions that contribute the most to the included genetic sequences (Riyadh, June 16th; Jeddah, May 30th; Makkah, June 6th; Madinah, May 6th).

A maximum-likelihood phylogenetic analysis revealed that samples from Saudi Arabia represent 5 major Nextstrain clades 10 , 19A-B & 20A-C (Figure 2A), and through time-scaled phylogenies dates of importation events were estimated for each clade. The majority of importations for all clades were inferred to have occurred early in the outbreak, primarily in March and early April (Figure 2B). Based on available genetic data, clade 19B was likely the first introduced into Saudi Arabia, with the most likely importation date for this clade falling in February. The country of origin behind importation events was further estimated, suggesting multiple imports from countries across Asia, Europe, Oceania, America and Africa. The phylogeny highlighted a group of clade 20A samples that all carried the nucleocapsid (N) protein R203K/G204R mutations 18 with high incidences of ICU hospitalizations. These samples were predominantly from Jeddah.

Origin of R203K/G204R SNPs.

A dated phylogeny of global samples showed that samples with the R203K/G204R SNPs are predominantly found in Nextstrain clades 20A, 20B, and 20C, and do not form a monophyletic group (not known). Furthermore, a few samples are found in the early appearing 19A and 19B clades. However, due to the limited number of mutations separating SARS-CoV-2 genomes, constructing a reliable and robust phylogeny is problematic 19, and while different clades may be well supported, the exact relationship between clades is often less easily resolved. Although phylogenetic trees of SARS-CoV-2 genomes may appear to robustly reflect transmission events, collapsing branches with low support will typically result in extensive polytomies. Additionally, the placement of individual virus genomes may be hampered by systematic errors, homoplasies, potential recombination, or coinfection of multiple virus strains. It is therefore not clear if the phylogenetic distribution of samples with R203K/G204R SNPs reflects multiple independent origins of the SNPs, although it is evident that the R203K/G204R SNPs appeared early in the pandemic spread (data not shown). Within the sampling window, an apparent transient increase in the frequency of R203K/G204R SNPs (data not shown) was observed, in accordance with earlier observations. In the global data, the peak in R203K/G204R frequency is slightly delayed compared to samples from Saudi Arabia. The initial global peak was observed in July 2020, followed by a decline until the fall of 2020, when the R203K/G204R SNPs once again increased along with the Spike protein Y501N mutation in the B.1.1.7 lineage 17 (data not shown).

A mutant form of the Nucleocapsid (N) protein associated with patient mortality

A genome-wide association study between SARS-CoV-2 SNPs and patient mortality highlighted these three consecutive SNPs (G28881A, G28882A, G28883C) underlying the mutations R203K/G204R 13 (Figure 3A). Of the 892 assembled genomes, 882 (98.9%) either have the three reference alleles, GGG, or the three mutant alleles, AAC, at positions 28,881-28,883. This is similarly found in global samples deposited in GISAID in 2020, where 99.7% of samples with SNPs at positions 28,881-28,883 contain all three SNPs (data not shown). The frequency of the R203K/G204R SNPs is markedly higher in samples from Jeddah, where the observed frequency of 0.38 is more than 10-fold higher than the average of the other cities (data not shown). No other SNPs show similar levels of association with mortality (Figure 3A), and consistent with this, no other SNPs were identified co-occurring with the R203K/G204R SNPs (Figure 5A-B).

Using multivariable regression, the effect of the R203K/G204R SNPs on mortality, severity, and viral load was next evaluated in the COVID-19 patient samples for which a limited amount of clinical metadata was available. Severe disease was defined as death or admission to ICU.

For mortality and severity, a logistic linear model was first fitted using the R203K/G204R SNPs as a covariate and adjusting for sex, age, comorbidities, hospital, and other SNPs. 12 additional SNPs (C241T, C1191T, C3037T, G10427A, C14408T, C15352T, C18877T, A23403G, G25563T, C26735T, T27484C, and C28139T) that co-occurred with the R203K/G204R SNPs in at least five samples were included in the model. The A23403G mutation results in the Spike protein D614G change, which is associated with higher viral load. Then, to adjust for confounding effects from time, models that included time were also fitted. In the models, age and time were included using smoothing splines to allow for potential nonlinear relationships.

Using first a logistic regression model that did not include time, a positive and statistically significant association was observed between R203K/G204R SNPs and severity. Specifically, the log-odds of severity increased by 1.18, 95% CI 0.22-2.13. A positive significant association was also observed for the C14408T SNP, and a negative association for the C241T SNP (data not shown).

In the time-adjusted model, the log-odds for the R203K/G204R SNPs increased to 1.38, 95% CI 0.28-2.48 (Table 4).
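Because the coefficients above are reported on the log-odds scale, exponentiating them gives the corresponding odds ratios. The short sketch below performs this conversion for the reported estimates and confidence bounds; the function name is hypothetical, and the conversion is standard arithmetic rather than an analysis taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(log_odds: float, ci_low: float, ci_high: float) -> Tuple[float, float, float]:
    """Convert a log-odds estimate and its confidence bounds to the odds-ratio scale."""
    return math.exp(log_odds), math.exp(ci_low), math.exp(ci_high)


# Reported coefficients for the R203K/G204R SNPs and severity
model_without_time = odds_ratio_with_ci(1.18, 0.22, 2.13)
model_with_time = odds_ratio_with_ci(1.38, 0.28, 2.48)
```

On this scale, a log-odds of 1.18 corresponds to an odds ratio of roughly 3.25, and a log-odds of 1.38 to an odds ratio of roughly 3.97. A confidence interval that excludes zero on the log-odds scale excludes one on the odds-ratio scale.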

Table 4. Regression table for the association between R203K/G204R mutation and severity (ICU or deceased) adjusting for other mutations, sex, comorbidities, hospital and age.

Note: For each of the parameters, the table shows the regression coefficient estimate, standard error, and 95% confidence interval, and the two-sided z test and p -value testing the null hypothesis that the coefficient is equal to 0.

In this model, the C241T SNP again displayed a significant negative association, and a positive association was now observed for the C18877T SNP (Table 5).

Table 5. Regression table for the association between R203K/G204R mutations and severity (ICU or deceased) adjusting for other mutations, sex, comorbidities, hospital, age and time.

Note: For each of the parameters, the table shows the regression coefficient estimate, standard error, and 95% confidence interval, and the two-sided z test and p-value testing the null hypothesis that the coefficient is equal to 0.

The relationship between mortality and R203K/G204R SNPs was positive and statistically significant in the model that did not include time, with log-odds equal to 1.04, 95% CI 0.16-1.92. No significant association was observed for other SNPs (Table 6).

Table 6. Regression table for the association between R203K/G204R mutations and mortality (deceased) adjusting for other mutations, sex, comorbidities, hospital and age.

Note: For each of the parameters, the table shows the regression coefficient estimate, standard error, and 95% confidence interval, and the two-sided z test and p-value testing the null hypothesis that the coefficient is equal to 0.

However, after adjusting for time as a variable, there was no longer any significant association between R203K/G204R SNPs and mortality (log-odds: 0.58, 95% CI -0.41 to 1.56), indicating a temporal component in the observed association (Table 7).

Table 7. Regression table for the association between R203K/G204R mutations and mortality (deceased) adjusting for other mutations, sex, comorbidities, hospital and age.

Note: For each of the parameters, the table shows the regression coefficient estimate, standard error, and 95% confidence interval, and the two-sided z test and p-value testing the null hypothesis that the coefficient is equal to 0.

The mortality rate for samples with the R203K/G204R SNPs is 0.49, compared to 0.19 for samples without the SNPs (Figure 3B). The significantly increased mortality was observed for male and female patients separately (data not shown). Of the 124 samples with the R203K/G204R SNPs (one sample did not have patient outcome information and is excluded from Figure 3B), 101 samples are from Jeddah. Mortality among Jeddah patients (0.40) is substantially higher than among patients from all other cities (0.16) (data not shown). The observed mortality rate among samples with the R203K/G204R SNPs is higher both when considering only samples from Jeddah (0.55 versus 0.30) and when considering only samples from outside Jeddah (0.23 versus 0.16), although this difference is significant only for the former (data not shown). To further assess the association between the R203K/G204R SNPs and mortality, genotype information at positions 28,881-28,883 was included from samples for which no complete genome could be assembled and from samples genotyped by Sanger sequencing. The addition of 304 such samples confirmed the association between these SNPs and patient mortality (Fisher's exact test, p = 4.7 × 10^-10; log2 odds ratio: 1.58) (data not shown). No apparent difference in patient age is observed between samples with the R203K/G204R SNPs and the remaining samples, and the excess mortality is generally observed across the entire range of patient ages (Figure 3C). The association between R203K/G204R SNPs and mortality is still observed after correcting for comorbidities. Consistent with their association with disease severity, the R203K/G204R SNPs are significantly less prevalent in samples from patients in quarantine compared to hospitalized patients, and correspondingly more abundant in samples from patients admitted to the ICU.
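The Fisher's exact test and log2 odds ratio mentioned above operate on a 2x2 table of genotype (SNPs present or absent) against outcome (deceased or not). The underlying counts are not given in the text, so the sketch below uses a small toy table that is purely hypothetical and is not study data; the function names are likewise illustrative. It is a plain-Python rendering of the standard two-sided test, which sums the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb, log2

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are counted.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

def log2_odds_ratio(a, b, c, d):
    """log2 of the sample odds ratio (a*d)/(b*c); requires b*c > 0."""
    return log2((a * d) / (b * c))

# Hypothetical toy counts (NOT from the study):
# rows = SNPs present / absent, columns = deceased / survived.
toy = (3, 1, 1, 3)
print(fisher_exact_two_sided(*toy), log2_odds_ratio(*toy))
```

A positive log2 odds ratio indicates that the outcome in the first column is more frequent in the first row; in the setting described above, that would correspond to higher mortality among samples carrying the SNPs.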

Toyoshima et al. 14 previously found no correlation between the incidence of these SNPs in individual countries and their mortality rate, although the SNPs are prominent in individual countries with high mortality rates such as Italy, England, and Wales 15 . When observing mortality rates for different patient nationalities, no apparent differences between nationalities are identified (Figure 6A-D), and updating the approach of Toyoshima et al. 14 reproduces their previous findings (Figure 6E).

A time-scaled phylogenetic approach suggests that the R203K/G204R SNPs originated in late January 2020, although the earliest sampled genomes with the SNPs are available only from February 23rd (data not shown). The phylogenetic distribution of viral genomes harboring the R203K/G204R SNPs implies that the SNPs may have originated independently at least twice during the pandemic (data not shown).

Within the sampling window a transient increase in the frequency of R203K/G204R SNPs was observed (Figure 3D) in accordance with earlier observations 13,16 . This is further underlined by global data showing a clear increase of R203K/G204R in the first half of 2020 reaching a frequency of 0.75, followed by a similar decrease in the second half of the year (Figure 3E). Although R203K/G204R frequencies vary, the overall trend is generally observed across all continents (Figure 7).

Subsequent studies sought to test the association between mortality and the R203K/G204R SNPs on a global scale and collected 17,261 non-Saudi samples with patient metadata from GISAID (December 31st, 2020). Because the reporting format is highly non-standardized, a manually curated list of terms was used to assign samples to two different disease outcomes: severe cases (deceased patients, critically ill patients, and cases admitted to the ICU) and mild cases. This reduced the available samples to 1,419. Similar to the observations from Saudi samples, the samples from severe cases display a significantly higher frequency of R203K/G204R SNPs compared to mild cases (Figure 3G). The available metadata are far from ideal for this type of analysis and the numbers should be interpreted with great caution, but this observation is nevertheless considered independent support for the findings among samples from Saudi Arabia.

The cycle threshold (Ct) values obtained through quantitative PCRs can be used as a proxy for viral load and even as a predictor of clinical outcome 17 . Earlier, a non-synonymous SNP in the Spike (S) protein, D614G, was found to be associated with higher viral load 18-20 . A significantly higher viral copy number was found in samples with either the D614G or the R203K/G204R SNPs, as well as in samples with both SNPs, than in samples with the Wuhan reference alleles (Figure 8).

Subsequent studies tested whether R203K/G204R SNPs were associated with higher viral copy numbers as indicated by the cycle threshold (Ct) values obtained through quantitative PCRs. As two different kits were used for the qPCR reactions (see Methods), adjusted models were fitted that, besides sex, age, comorbidities, hospital, and time, included qPCR kits and the above-mentioned SNPs as covariates. This adjusted regression showed a positive and statistically significant relationship between R203K/G204R SNPs and log10(viral copy number), with the mean of log10(viral copy number) values increasing by 1.33 units (95% CI 0.72-1.93) (Table 8).

Table 8. Regression table for the association between R203K/G204R mutations and viral load (copynumber N1) adjusting for qPCR kit, other mutations, sex, comorbidities, hospital, age and time.

Note: For each of the parameters, the table shows the regression coefficient estimate, standard error, and 95% confidence interval, and the two-sided t test and p-value testing the null hypothesis that the coefficient is equal to 0.
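Because the viral load model is fitted on log10(viral copy number), the reported coefficient is a difference in log10 units. As a hedged illustration (the helper name is hypothetical, and reading the coefficient as a fold change in the geometric mean is a standard interpretation rather than a statement made in the source), the snippet below converts the quoted estimate and confidence bounds to the fold-change scale.

```python
def fold_change_from_log10(delta_log10):
    """Convert a difference in log10 units into a multiplicative fold change."""
    return 10 ** delta_log10

# Coefficient and 95% CI quoted in the text for R203K/G204R (log10 scale).
estimate, ci_low, ci_high = 1.33, 0.72, 1.93

fold, fold_low, fold_high = (fold_change_from_log10(v)
                             for v in (estimate, ci_low, ci_high))
print(f"fold change {fold:.1f} (95% CI {fold_low:.1f}-{fold_high:.1f})")
```

Under this reading, an increase of 1.33 log10 units corresponds to a roughly 21-fold higher geometric-mean copy number, with all other covariates held fixed.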

Similarly, the model showed a positive significant association between the A23403G (Spike protein D614G) and C26735T SNPs and log10(viral copy number), the former being consistent with earlier reports 13,30 . A significant negative association was found for the C3037T, C14408T, and G25563T SNPs (6). The positive and statistically significant association of R203K/G204R SNPs with higher viral load in critical COVID-19 patients (Fig. 2D) hence suggests their functional implications during viral infection.

N mutant protein has high oligomerization potential and RNA binding affinity

The SARS-CoV-2 N protein binds the viral RNA genome and is central to viral replication. Protein structure predictions have shown that the R203K/G204R mutations result in significant changes in protein structure 16 , theoretically destabilizing the N structure 22 , and potentially enhancing the protein's ability to bind RNA and altering its response to serine phosphorylation events 23 . The R203K/G204R mutations in the SARS-CoV-2 N protein are within the linkage region (LKR) containing the serine/arginine-rich motif (SR-rich motif) (Figure 4A), known to be involved in the oligomerization of N proteins 24,25 . Protein cross-linking shows that the N mutant protein (with the R203K/G204R mutations) has higher oligomerization potential than the control N protein (without the changed amino acids) at low protein concentration (Figure 9A-B). However, both mutant and control N protein formed dimers with similar efficiency (Figure 9A).

Given that the oligomerization of N protein acts as a platform for viral RNA interactions 26 , further studies sought to examine the binding affinity of mutant and control N protein for viral RNA isolated from COVID-19 patient swabs. The RNA-binding activity of mutant and control N proteins was examined by pulling down viral RNA in an in vitro RIP assay (Figure 9C), and the data revealed that the mutant N protein enriched a significantly higher level of viral RNA compared to the control protein (Figure 9D). This indicates a strong binding capability of mutant N proteins for viral RNA, which could potentially affect the essential roles of N protein at various stages of the viral life cycle and its interaction with the host.

The R203K/G204R mutations in the N protein affect its interaction with host proteins

According to the SIFT tool 21 , a substitution at position 204 from G to R in the N protein is predicted to affect functional features (Figure 4A). Therefore, additional studies investigated how the two amino acid substitutions (R203K and G204R) in the N protein affect its functional interaction with the host, which could modulate viral pathogenesis and rewire host cell pathways and processes. HEK-293T cells (3 biological replicates) were used for affinity purification followed by mass spectrometry analysis (AP-MS) to identify host proteins associated with control and mutant N protein (Figure 10A-E). Protein identification was performed using MaxQuant software, and the identified protein groups were compiled for the mock and N protein (N control and mutant) conditions. The majority (62%) of previously reported N protein interacting partners overlapped with the identified unadjusted protein list (Figure 10A-E). Forty-three human proteins displayed significant (adjusted p-value < 0.05 and log fold change > 1) differential interactions with mutant and control N protein (Figure 4B-C). Among these, 42 proteins showed increased interaction and one protein (PRPF19) showed decreased interaction with the N mutant (Figure 4B-C). The total level of mutant and control N protein remained unchanged (Figure 10D). The stress granule proteins G3BP1 and G3BP2, previously reported as N protein interactors 27 , showed comparable interaction in mutant and control conditions (Figure 10E). Among the group with increased interaction, many proteins associated with TOR and other signaling pathways (such as AKT1S1, GSK3A, and PIN1), proteins associated with the viral process, viral transcription, and negative regulation of RNA nuclear export (NUP98 and NUP153), and proteins involved in apoptotic and cell death processes (PAWR, ACIN1, and PDCD5) were identified (Figure 4B-C).

Proteins in the mutant condition that are linked with immune system processes (PTMS), kinase activity (GCN1), and translation (e.g. MRPS36) were also identified (Figure 4B-C). In the group with decreased interaction, SNIP1 (NF-kappaB signaling), TMA16 (translation), and CSNK2B (casein kinase II) were identified (Figure 4B-C). Gene ontology analysis showed that the most enriched biological processes are associated with negative regulation of tRNA and ribosomal subunit export from the nucleus (Figure 4D). This finding suggests that the mutant virus can efficiently inhibit and hijack host translation to facilitate viral replication and pathogenesis. In Reactome enrichment analysis, pathways associated with the sumoylation of host proteins and antiviral mechanisms were identified (Figure 4D). Further, many viruses can manipulate the host sumoylation process to enhance viral survival and pathogenesis 28 .

Serine 206 (S206) displays hyper-phosphorylation in the mutant N protein

In SARS-CoV, it has been shown that phosphorylation of N protein is more prevalent during viral transcription and replication 29 and that inhibition of phosphorylation diminishes viral titer and cytopathogenic effects 30 . Recent studies elaborated the role of N protein phosphorylation in modulating RNA binding and phase separation in SARS-CoV-2 26,31-33 . Thus, phosphorylation of N protein in the LKR region is critical for regulating both viral genome processing (transcription and replication) and nucleocapsid assembly 26,31 . To further understand the functional relevance of the KR mutation in the N protein, phosphoproteomic analyses were performed in control and mutant conditions. The studies consistently found that the serine 206 (S206) site, which is next to the KR mutation site (Figure 4E), is highly phosphorylated, specifically in the mutant N protein (Figure 4F; and data not shown). Notably, the total amount of N protein (Figure 10D) and the phosphorylation at serine 2 (S2) and other sites (S79, S176, and S180) in the LKR region did not change between mutant and control conditions (Figure 4F).

The N mutant (R203K/G204R) induces overexpression of interferon-related genes in transfected host cells

To understand whether the R203K/G204R mutations in the N gene affect the host cell transcriptome, Calu-3 cells (4 biological replicates) were transfected with plasmids expressing the full-length N-control and N-mutant protein, along with a mock-transfection control. The transcriptome profiles of N-mutant and N-control transfected cells display a distinct pattern from the mock control (data not shown). 144 and 153 differentially expressed (DE) genes were identified in the N-control and N-mutant transfected cells, respectively, with adjusted p-value < 0.05 and log2 fold-change ≥1 (data not shown). Among the DE genes, numerous interferon, cytokine, and immune-related genes are up-regulated. A robust overexpression of interferon-related genes was found in the N-mutant compared to N-control transfected cells (data not shown) after adjusting for fold-change (data not shown). Indeed, strong overexpression of interferon and chemokine-related genes (data not shown) was reported in critical COVID-19 patients. Recent reports further indicate a link between increased expression of interferon-related genes and higher viral load in severe COVID-19 patients. Also, overexpression of other genes that are elevated in critical COVID-19 disease, such as ACE2, STAT1 47, and TMPRSS13 51, was found (data not shown).
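The differential-expression criteria stated above (adjusted p-value < 0.05 and log2 fold-change ≥1) amount to a simple two-threshold filter. The following minimal sketch applies those thresholds as written; the record type, the function name, and the gene records are hypothetical placeholders rather than study data, and the source does not state whether the fold-change threshold was applied to absolute values, so only up-regulation is filtered here.

```python
from dataclasses import dataclass

@dataclass
class GeneResult:
    gene: str                # placeholder gene identifier
    log2_fold_change: float  # transfected versus mock control
    adjusted_p: float        # multiple-testing adjusted p-value

def differentially_expressed(results, max_adj_p=0.05, min_log2_fc=1.0):
    """Keep genes with adjusted p < max_adj_p and log2 fold-change >= min_log2_fc."""
    return [r for r in results
            if r.adjusted_p < max_adj_p and r.log2_fold_change >= min_log2_fc]

# Hypothetical example records (NOT from the study).
example = [
    GeneResult("GENE_A", 2.4, 0.001),   # passes both thresholds
    GeneResult("GENE_B", 0.6, 0.0005),  # fold-change too small
    GeneResult("GENE_C", 3.1, 0.20),    # not significant after adjustment
    GeneResult("GENE_D", 1.0, 0.049),   # passes, sits on the fold-change boundary
]

selected = [r.gene for r in differentially_expressed(example)]
print(selected)
```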

Pathway enrichment analysis (top 15 pathways based on p-value and FDR) of the up-regulated genes (data not shown) shows an overrepresentation of biological processes associated with response to the virus (data not shown). Similarly, all DE genes were related to substantially enriched pathways, such as interferon-related response, cytokine production, and viral reproductive processes (data not shown). The enriched GO terms display an interconnected network highlighting the relationships between up-regulated overlapping gene sets in these pathways (data not shown). Taken together, these results suggest that the R203K/G204R mutations in the N protein may enhance its function in provoking a hyper-expression of interferon-related genes that contribute to the cytokine storm in exacerbating COVID-19 pathogenesis.

Discussion

From 892 samples collected across the country over the course of approximately 6 months, the dynamics of transmission and diversity of SARS-CoV-2 in Saudi Arabia were analyzed. The lineage analysis of assembled genomes highlights the repeated influx of SARS-CoV-2 lineages into the Kingdom through international travel.

The detailed patient data allowed the detection of three SNPs - underlying the N protein R203K and G204R mutations - significantly associated with higher viral load. It is worth noting that two studies have found higher viral load in infected patients to be associated with severity and mortality (including Fajnzylber, et al., Nat. Commun. 11, 5493 (2020)). In publicly available global samples with relevant patient information, these SNPs were similarly found to be associated with increased mortality. These findings thus strongly suggest that the R203K and G204R mutations in the N protein play a role in the severity of COVID-19, not only in Saudi Arabia but also, as supported by sparsely available datasets with relevant clinical outcome and mortality data, globally. The trade-off model for virulence evolution - although challenged by certain empirical observations 34,35 - implies that higher virulence comes at a cost for the virus reproduction rate if not counteracted by changes in the recovery or the transmission rates 36,37 . In this respect, the decrease in the frequency of R203K and G204R mutations during the second half of 2020 (Figure 3F) could suggest that the associated mortality of the mutations negatively affects their associated reproduction rate. This is in contrast to the Spike protein mutation D614G, the B.1.1.7 lineage, and the 501.V2 variant that - although more transmissible - are apparently not associated with higher virulence and subsequently increase in the global population 18,20,38,39 .

The N protein of SARS-CoV-2, a highly abundant structural protein within infected cells, serves multiple functions during viral infection, which, besides RNA binding, oligomerization, and genome packaging, include essential roles in viral transcription, replication, and translation 40,41 . Also, the N protein can evade the immune response and perturb other host cellular processes such as translation, cell cycle, TGFβ signaling, and induction of apoptosis 42 to enhance virus survival. The critical functional regulatory hub within the N protein is a conserved serine-arginine (SR) rich linker region (LKR), which is involved in RNA and protein binding 43 , oligomerization 24,25 , and phospho-regulation 26,31 .

The data show that the mutant N protein containing the R203K and G204R changes has higher oligomerization potential and stronger viral RNA binding ability, suggesting a potential link between these mutations and efficient viral genome packaging. The R203K and G204R mutations are in close proximity to the recently reported RNA-mediated phase separation domain (aa 210-246) 33 that is involved in viral RNA packaging through phase separation. This domain was thought to enhance phase separation also through protein-protein interactions 33 .

Moreover, the functional activities of the N protein at different stages of the viral life cycle are regulated by phosphorylation-dependent physicochemical changes in the LKR region 31 . Although not all individual phosphorylation sites may be functionally important 23,44 , the specific enhancement of phosphorylation at serine 206 in the mutant N protein shown in this study suggests a functional significance. Serine 206 can form a phosphorylation-dependent binding site for protein 14-3-3, which is involved in cell cycle regulatory pathways regulating human and virus protein expression 45 . Multiple lines of evidence show that N protein phosphorylation is critical for its dynamic localization and function at replication-transcription complexes (RTC), where it promotes viral RNA transcription and translation by recruiting cellular factors 29-31,46-49 . The enrichment of glycogen synthase kinase 3A (GSK3A) with the mutant N protein suggests that this kinase could specifically phosphorylate serine 206 in the R203K/G204R mutation background. GSK3 was shown to be a key regulator of SARS-CoV replication due to its ability to phosphorylate N protein 30 . Phosphorylation of serine 206 acts as a priming site for initiating a cascade of GSK-3 phosphorylation events 30,31 . Also, GSK3 inhibition dramatically reduces the production of viral particles and the cytopathic effect in SARS-CoV-infected cells 30 .

In conclusion, the results presented herein highlight the influence of the R203K/G204R mutations on the essential properties and phosphorylation status of SARS-CoV-2 N protein that lead to increased efficacy of viral infection, potentially underlying the rise in mortality observed in these genome analyses.

References

1 Organization, W. H. Coronavirus disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update, <www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports> (2020).

2 Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 20, 533-534, doi: 10.1016/S1473-3099(20)30120-1 (2020).

3 Center, J. H. U. M. C. R. COVID-19 Dashboard, <https://coronavirus.jhu.edu/map.html> (2020).

4 Ebrahim, S. H. & Memish, Z. A. COVID-19: preparing for superspreader potential among Umrah pilgrims to Saudi Arabia. Lancet 395, e48, doi: 10.1016/S0140-6736(20)30466-9 (2020).

5 Memish, Z. A., Aljerian, N. & Ebrahim, S. H. Tale of three seeding patterns of SARS-CoV-2 in Saudi Arabia. Lancet Infect Dis, doi: 10.1016/S1473-3099(20)30425-4 (2020).

6 Tuite, A. R. et al. Estimation of Coronavirus Disease 2019 (COVID-19) Burden and Potential for International Dissemination of Infection From Iran. Ann Intern Med 172, 699-701, doi: 10.7326/M20-0696 (2020).

7 News, A. Saudi Arabia announces first case of coronavirus, <https://www.arabnews.com/node/1635781/saudi-arabia> (2020).

8 Gussow, A. B. et al. Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses. Proceedings of the National Academy of Sciences of the United States of America 117, 15193-15199, doi: 10.1073/pnas.2008176117 (2020).

9 Lu, J. et al. Genomic Epidemiology of SARS-CoV-2 in Guangdong Province, China. Cell 181, 997-1003 e1009, doi: 10.1016/j.cell.2020.04.023 (2020).

10 Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121-4123, doi: 10.1093/bioinformatics/bty407 (2018).

11 Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol 5, 1403-1407, doi: 10.1038/s41564-020-0770-5 (2020).

12 Boni, M. F. et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol 5, 1408-1417, doi: 10.1038/s41564-020-0771-4 (2020).

13 Leary, S. et al. Three adjacent nucleotide changes spanning two residues in SARS-CoV-2 nucleoprotein: possible homologous recombination from the transcription-regulating sequence. bioRxiv, 2020.2004.2010.029454, doi: 10.1101/2020.04.10.029454 (2020).

14 Toyoshima, Y., Nemoto, K., Matsumoto, S., Nakamura, Y. & Kiyotani, K. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J Hum Genet 65, 1075-1082, doi: 10.1038/s10038-020-0808-9 (2020).

15 Singh, J., Singh, H., Hasnain, S. E. & Rahman, S. A. Mutational signatures in countries affected by SARS-CoV-2: Implications in host-pathogen interactome. bioRxiv, 2020.2009.2017.301614, doi: 10.1101/2020.09.17.301614 (2020).

16 Wu, S. et al. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein-protein interactions. J Med Virol, doi: 10.1002/jmv.26597 (2020).

17 Rao, S. N., Manissero, D., Steele, V. R. & Pareja, J. A Systematic Review of the Clinical Utility of Cycle Threshold Values in the Context of COVID-19. Infect Dis Ther 9, 573-586, doi: 10.1007/s40121-020-00324-3 (2020).

18 Korber, B. et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 182, 812-827 e819, doi: 10.1016/j.cell.2020.06.043 (2020).

19 Lorenzo-Redondo, R. et al. A Unique Clade of SARS-CoV-2 Viruses is Associated with Lower Viral Loads in Patient Upper Airways. medRxiv, 2020.2005.2019.20107144, doi: 10.1101/2020.05.19.20107144 (2020).

20 Volz, E. et al. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity. Cell 184, 64-75 e11, doi: 10.1016/j.cell.2020.11.020 (2021).

21 Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 31, 3812-3814, doi: 10.1093/nar/gkg509 (2003).

22 Rahman, M. S. et al. Evolutionary dynamics of SARS-CoV-2 nucleocapsid protein and its consequences. J Med Virol, doi: 10.1002/jmv.26626 (2020).

23 Guan, Q. et al. A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic. Int J Infect Dis 100, 216-223, doi: 10.1016/j.ijid.2020.08.052 (2020).

24 He, R. T. et al. Analysis of multimerization of the SARS coronavirus nucleocapsid protein. Biochem Bioph Res Co 316, 476-483, doi: 10.1016/j.bbrc.2004.02.074 (2004).

25 Chang, C. K., Chen, C. M. M., Chiang, M. H., Hsu, Y. L. & Huang, T. H. Transient Oligomerization of the SARS-CoV N Protein - Implication for Virus Ribonucleoprotein Packaging. PLoS One 8, doi:ARTN e65045 10.1371/journal.pone.0065045 (2013).

26 Chao Wu, A. J. Q., Asmaa Hachim, Niloufar Kavian, Aidan R. Cole, Austin B. Moyle, Nicole D. Wagner, Joyce Sweeney-Gibbons, Henry W. Rohrs, Michael L. Gross, J. S. Malik Peiris, Christopher F. Basler, Christopher W. Farnsworth, Sophie A. Valkenburg, Gaya K. Amarasinghe, Daisy W. Leung. Characterization of SARS-CoV-2 N protein reveals multiple functional consequences of the C-terminal domain. BioRxiv, doi:https://doi.org/10.1101/2020.11.30.404905 (2020).

27 Gordon, D. E. et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 583, 459-+, doi: 10.1038/s41586-020-2286-9 (2020).

28 Lowrey, A. J., Cramblet, W. & Bentz, G. L. Viral manipulation of the cellular sumoylation machinery. Cell Commun Signal 15, doi:ARTN 27 10.1186/s12964-017-0183-0 (2017).

29 Wu, C. H., Chen, P. J. & Yeh, S. H. Nucleocapsid Phosphorylation and RNA Helicase DDX1 Recruitment Enables Coronavirus Transition from Discontinuous to Continuous Transcription. Cell Host Microbe 16, 462-472, doi: 10.1016/j.chom.2014.09.009 (2014).

30 Wu, C. H. et al. Glycogen Synthase Kinase-3 Regulates the Phosphorylation of Severe Acute Respiratory Syndrome Coronavirus Nucleocapsid Protein and Viral Replication. J Biol Chem 284, 5229-5239, doi: 10.1074/jbc.M805747200 (2009).

31 Carlson, C. R. et al. Phosphoregulation of Phase Separation by the SARS-CoV-2 N Protein Suggests a Biophysical Basis for its Dual Functions. Mol Cell 80, 1092-+, doi: 10.1016/j.molcel.2020.11.025 (2020).

32 Savastano, A., de Opakua, A. I., Rankovic, M. & Zweckstetter, M. Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates. Nat Commun 11, doi:ARTN 6041 10.1038/s41467-020-19843-1 (2020).

33 Lu, S. et al. The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein. Nature communications 12, 502, doi: 10.1038/s41467-020-20768-y (2021).

34 Geoghegan, J. L. & Holmes, E. C. The phylogenomics of evolving virus virulence. Nat Rev Genet 19, 756-769, doi: 10.1038/s41576-018-0055-5 (2018).

35 Alizon, S., Hurford, A., Mideo, N. & Van Baalen, M. Virulence evolution and the trade-off hypothesis: history, current state of affairs and the future. J Evol Biol 22, 245-259, doi: 10.1111/j.1420-9101.2008.01658.x (2009).

36 Anderson, R. M. & May, R. M. Coevolution of hosts and parasites. Parasitology 85 (Pt 2), 411-426, doi: 10.1017/s0031182000055360 (1982).

37 Anderson, R. M. & May, R. M. Population biology of infectious diseases: Part I. Nature 280, 361-367, doi: 10.1038/280361a0 (1979).

38 Arif, T. B. The 501.V2 and B.1.1.7 variants of coronavirus disease 2019 (COVID-19): A new time-bomb in the making? Infect Control Hosp Epidemiol, 1-2, doi: 10.1017/ice.2020.1434 (2021).

39 Grubaugh, N. D., Hanage, W. P. & Rasmussen, A. L. Making Sense of Mutation: What D614G Means for the COVID-19 Pandemic Remains Unclear. Cell 182, 794-795, doi: 10.1016/j.cell.2020.06.040 (2020).

40 McBride, R., van Zyl, M. & Fielding, B. C. The coronavirus nucleocapsid is a multifunctional protein. Viruses 6, 2991-3018, doi: 10.3390/v6082991 (2014).

41 Chang, C. K., Hou, M. H., Chang, C. F., Hsiao, C. D. & Huang, T. H. The SARS coronavirus nucleocapsid protein - forms and functions. Antiviral Res 103, 39-50, doi: 10.1016/j.antiviral.2013.12.009 (2014).

42 Lal, M. S. a. S. K. in Molecular Biology of the SARS-Coronavirus (ed Sunil K. Lal) 129-151 (2009).

43 Wegener, M. & Muller-McNicoll, M. View from an mRNP: The Roles of SR Proteins in Assembly, Maturation and Turnover. Adv Exp Med Biol 1203, 83-112, doi: 10.1007/978-3-030-31434-7_3 (2019).

44 Bouhaddou, M. et al. The Global Phosphorylation Landscape of SARS-CoV-2 Infection. Cell 182, 685-712 e619, doi: 10.1016/j.cell.2020.06.034 (2020).

45 Nathan, K. G. & Lal, S. K. The Multifarious Role of 14-3-3 Family of Proteins in Viral Replication. Viruses 12, doi: 10.3390/v12040436 (2020).

46 Verheije, M. H. et al. The Coronavirus Nucleocapsid Protein Is Dynamically Associated with the Replication-Transcription Complexes. J Virol 84, 11575-11579, doi: 10.1128/Jvi.00569-10 (2010).

47 Chen, H. Y. et al. Mass spectroscopic characterization of the coronavirus infectious bronchitis virus nucleoprotein and elucidation of the role of phosphorylation in RNA binding by using surface plasmon resonance. J Virol 79, 1164-1179, doi: 10.1128/Jvi.79.2.1164-1179.2005 (2005).

48 Peng, T. Y., Lee, K. R. & Tam, W. Y. Phosphorylation of the arginine/serine dipeptide-rich motif of the severe acute respiratory syndrome coronavirus nucleocapsid protein modulates its multimerization, translation inhibitory activity and cellular localization. Febs J 275, 4152-4163, doi: 10.1111/j.1742-4658.2008.06564.x (2008).

49 V'kovski, P. et al. Determination of host proteins composing the microenvironment of coronavirus replicase complexes by proximity-labeling. Elife 8, doi:ARTN e42037 10.7554/eLife.42037 (2019).