

Title:
A VIRAL EXPOSURE SIGNATURE FOR DETECTION OF EARLY STAGE HEPATOCELLULAR CARCINOMA
Document Type and Number:
WIPO Patent Application WO/2021/072268
Kind Code:
A1
Abstract:
A viral exposure signature (VES) that can identify early stage, pre-symptomatic hepatocellular carcinoma (HCC) among at-risk patients is described. The VES was developed using serological profiling and synthetic virome technology to identify unique viral peptide epitopes corresponding to 61 viral species. Methods of identifying a subject with early stage (pre-symptomatic) HCC using the VES are described.

Inventors:
WANG XIN WEI (US)
LIU JINPING (US)
TANG WEI (US)
Application Number:
PCT/US2020/055077
Publication Date:
April 15, 2021
Filing Date:
October 09, 2020
Assignee:
US HEALTH (US)
International Classes:
C12Q1/6804; C12N15/10; C12Q1/6883
Domestic Patent References:
WO2017132550A1 2017-08-03
Foreign References:
US20160320406A1 2016-11-03
Other References:
G. J. XU ET AL: "Comprehensive serological profiling of human populations using a synthetic human virome", SCIENCE, vol. 348, no. 6239, 4 June 2015 (2015-06-04), US, pages aaa0698 - aaa0698, XP055305755, ISSN: 0036-8075, DOI: 10.1126/science.aaa0698
STERN JONATHAN ET AL: "Virome and bacteriome: two sides of the same coin", CURRENT OPINION IN VIROLOGY, vol. 37, 1 August 2019 (2019-08-01), United Kingdom, pages 37 - 43, XP055776041, ISSN: 1879-6257, Retrieved from the Internet DOI: 10.1016/j.coviro.2019.05.007
LIU JINPING ET AL: "A Viral Exposure Signature Defines Early Onset of Hepatocellular Carcinoma", CELL, ELSEVIER, AMSTERDAM NL, vol. 182, no. 2, 10 June 2020 (2020-06-10), pages 317, XP086224636, ISSN: 0092-8674, [retrieved on 20200610], DOI: 10.1016/J.CELL.2020.05.038
FARAZI ET AL., NAT REV CANCER, vol. 6, 2006, pages 674 - 687
ARZUMANYAN ET AL., NAT REV CANCER, vol. 13, 2013, pages 123 - 135
JANJUA ET AL., J HEPATOL, vol. 66, 2017, pages 504 - 513
CARRAT ET AL., LANCET, vol. 393, 2019, pages 1453 - 1464
CHANG ET AL., GASTROENTEROLOGY, vol. 151, 2016, pages 472 - 480
LIU ET AL., J HEPATOL, vol. 70, 2019, pages 674 - 683
SHERMAN ET AL., HEPATOLOGY, vol. 56, 2012, pages 793 - 796
TZARTZEVA ET AL., GASTROENTEROLOGY, vol. 155, 2018, pages 1128 - 1139
SHERMAN ET AL., HEPATOLOGY, vol. 22, 1995, pages 432 - 438
BENJAMIN LEWIN: "Genes VII", 2000, OXFORD UNIVERSITY PRESS
PEARSON ET AL., METH. MOL. BIO., vol. 24, 1994, pages 307 - 31
"Molecular Biology and Biotechnology: a Comprehensive Desk Reference", 1995, WILEY, JOHN & SONS, INC.
STOVER ET AL., J INFECT DIS, vol. 187, 2003, pages 1388 - 1396
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
HUANG ET AL., COMPUTER APPLS. IN THE BIOSCIENCES, vol. 8, 1992, pages 155 - 65
AUSUBEL ET AL.: "Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology", 1999, COLD SPRING HARBOR LABORATORY PRESS
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 10
SMITH & WATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482
NEEDLEMAN & WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443
PEARSON & LIPMAN, PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 2444
HIGGINS & SHARP, GENE, vol. 73, 1988, pages 237 - 44
HIGGINS & SHARP, CABIOS, vol. 5, 1989, pages 151 - 3
CORPET ET AL., NUC. ACIDS RES., vol. 16, 1988, pages 10881 - 90
FOXMAN ET AL., NAT REV MICROBIOL, vol. 9, 2011, pages 254 - 64
CADWELL, IMMUNITY, vol. 42, 2015, pages 805 - 813
XU ET AL., SCIENCE, vol. 348, 2015, pages aaa0698
SCHUTTE ET AL., GASTROINTEST TUMORS, vol. 2, no. 4, 2016, pages 188 - 194
LARMAN ET AL., NAT. BIOTECHNOL, vol. 29, 2011, pages 535 - 541
MOHAN ET AL., NAT PROTOC, vol. 13, 2018, pages 1958 - 1978
MARRERO ET AL.: "Diagnosis, Staging and Management of Hepatocellular Carcinoma: 2018 Practice Guidance by the American Association for the Study of Liver Diseases", HEPATOLOGY, vol. 68, no. 2, 2018, pages 723 - 750
SCHULZ ET AL., BMJ, vol. 340, 2010, pages c332
STRAUS ET AL., ANN INTERN MED, vol. 118, 1993, pages 45 - 58
HO, REV INFECT DIS, vol. 12, 1990, pages S701 - S710
BRUNO ET AL., HEPATOLOGY, vol. 46, 2007, pages 1350 - 1356
CHANG ET AL., IMMUNOL REV, vol. 254, 2013, pages 114 - 142
ECHAVARRIA, CLIN MICROBIOL REV, vol. 21, 2008, pages 704 - 715
SEGATA ET AL., GENOME BIOL, vol. 12, 2011, pages R60
BANSAL & HEAGERTY, DIAGN PROGN RES, vol. 3, 2019, pages 14
BLANCHE ET AL., STAT MED, vol. 32, 2013, pages 5381 - 5397
MCKAY J ET AL., NAT GENET, vol. 49, 2017, pages 1126 - 1132
PHAROAH ET AL., NAT GENET, vol. 45, 2013, pages 362 - 370
FUMAGALLI ET AL., PLOS GENET, vol. 6, 2010, pages e1000849
COHEN ET AL., SCIENCE, vol. 359, 2018, pages 926 - 930
LIU ET AL., ANN ONCOL, vol. 30, 2019, pages 464 - 470
SHIEH ET AL., NAT REV CLIN ONCOL, vol. 13, 2016, pages 550 - 56
VIRGIN, CELL, vol. 157, 2014, pages 142 - 150
BARTENSCHLAGER ET AL., NAT REV MICROBIOL, vol. 11, 2013, pages 482 - 496
SLYKER ET AL., J INFECT DIS, vol. 207, 2013, pages 1798 - 1806
LICHTNER ET AL., J INFECT DIS, vol. 211, 2015, pages 178 - 186
EDLIN ET AL., HEPATOLOGY, vol. 62, 2015, pages 1353 - 1363
ROBERTS ET AL., HEPATOLOGY, vol. 63, 2016, pages 388 - 397
Attorney, Agent or Firm:
CONNOLLY, Jodi L. et al. (US)
Claims:
CLAIMS

1. A method of identifying a subject with early stage hepatocellular carcinoma (HCC), comprising:

(i) detecting the presence or absence of antibodies to a plurality of viruses in a sample obtained from the subject, wherein the plurality of viruses comprises at least 10, at least 20, at least 30, at least 40, at least 50, or at least 60 of the viruses listed in Table 5A;

(ii) determining the presence of a viral exposure signature (VES) in the sample obtained from the subject if:

(a) antibodies specific for one or more of hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus, strain AD169; HCV genotype 6g, isolate JK046; HCV genotype 1b, isolate BK; HCV genotype 1c, isolate HC-G9; HCV genotype 1b, strain HC-J4; HCV genotype 4a, isolate ED43; hepatitis delta virus; HCV genotype 5a, isolate EUH1480; human cytomegalovirus; Crimean-Congo hemorrhagic fever virus, strain Nigeria/IbAr10200/1970; HCV genotype 1b, isolate HC-J1; influenza A virus, strain A/USSR/90/1977 H1N1; influenza A virus, strain A/Bangkok/1/1979 H3N2; HCV genotype 1c, isolate India; and Chapare virus, isolate Human/Bolivia/810419/2003 are detected in the sample; and/or

(b) antibodies specific for one or more of Epstein-Barr virus, strain B95-8; human rhinovirus 23; HCMV, strain Towne; human herpesvirus 2 (HHV-2), strain HG52; human herpesvirus 3; varicella-zoster virus, strain Dumas; Cercopithecine herpesvirus 16; human adenovirus C serotype 2; human astrovirus-1; human respiratory syncytial virus; human herpesvirus 6B, strain Z29; human herpesvirus 7, strain JI; human rhinovirus 14; Lordsdale virus, strain GII/Human/United Kingdom/Lordsdale/1993; human herpesvirus 1, strain KOS; human metapneumovirus, strain CAN97-83; coxsackievirus A16, strain G-10; Epstein-Barr virus, strain AG876; cowpox virus; human herpesvirus 1, strain 17; human adenovirus E serotype 4; human adenovirus F serotype 40; tanapox virus; human adenovirus C serotype 5; rhinovirus B; human herpesvirus 8; human herpesvirus 6A, strain Uganda-1102; human rhinovirus A serotype 89, strain 41467-Gallo; norovirus MD145, isolate GII/Human/United States/MD145-12/1987; molluscum contagiosum virus subtype 1; vaccinia virus, strain Copenhagen; poliovirus type 1, strain Sabin; orf virus; HHV-2, strain 333; hepatitis B virus; Epstein-Barr virus, strain GD1; human parainfluenza 3 virus, strain Wash/47885/57; HHV-2; human enterovirus 71, strain BrCr; human herpesvirus 6A, strain GS; Cercopithecine herpesvirus 1; influenza B virus, strain B/Yamagata/16/1988; and influenza A virus, strain A/Philippines/2/1982 H3N2 are not detected in the sample; and

(iii) identifying the subject as having early stage HCC when the VES is present.

2. The method of claim 1, wherein the plurality of viruses comprises the 61 viruses listed in Table 5A.

3. The method of claim 1 or claim 2, wherein the plurality of viruses consists of the 61 viruses listed in Table 5A.

4. The method of claim 1, wherein the plurality of viruses comprises the 31 viruses listed in Table 6.

5. The method of claim 1 or claim 4, wherein the plurality of viruses consists of the 31 viruses listed in Table 6.

6. The method of any one of claims 1-5, wherein step (ii) comprises determining the presence of the VES in the sample obtained from the subject if:

(a) antibodies specific for three or more, five or more, or seven or more of hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus, strain AD169; HCV genotype 6g, isolate JK046; HCV genotype 1b, isolate BK; HCV genotype 1c, isolate HC-G9; HCV genotype 1b, strain HC-J4; HCV genotype 4a, isolate ED43; hepatitis delta virus; HCV genotype 5a, isolate EUH1480; human cytomegalovirus; Crimean-Congo hemorrhagic fever virus, strain Nigeria/IbAr10200/1970; HCV genotype 1b, isolate HC-J1; influenza A virus, strain A/USSR/90/1977 H1N1; influenza A virus, strain A/Bangkok/1/1979 H3N2; HCV genotype 1c, isolate India; and Chapare virus, isolate Human/Bolivia/810419/2003 are detected in the sample; and/or

(b) antibodies specific for three or more, five or more, or seven or more of Epstein-Barr virus, strain B95-8; human rhinovirus 23; HCMV, strain Towne; human herpesvirus 2 (HHV-2), strain HG52; human herpesvirus 3; varicella-zoster virus, strain Dumas; Cercopithecine herpesvirus 16; human adenovirus C serotype 2; human astrovirus-1; human respiratory syncytial virus; human herpesvirus 6B, strain Z29; human herpesvirus 7, strain JI; human rhinovirus 14; Lordsdale virus, strain GII/Human/United Kingdom/Lordsdale/1993; human herpesvirus 1, strain KOS; human metapneumovirus, strain CAN97-83; coxsackievirus A16, strain G-10; Epstein-Barr virus, strain AG876; cowpox virus; human herpesvirus 1, strain 17; human adenovirus E serotype 4; human adenovirus F serotype 40; tanapox virus; human adenovirus C serotype 5; rhinovirus B; human herpesvirus 8; human herpesvirus 6A, strain Uganda-1102; human rhinovirus A serotype 89, strain 41467-Gallo; norovirus MD145, isolate GII/Human/United States/MD145-12/1987; molluscum contagiosum virus subtype 1; vaccinia virus, strain Copenhagen; poliovirus type 1, strain Sabin; orf virus; HHV-2, strain 333; hepatitis B virus; Epstein-Barr virus, strain GD1; human parainfluenza 3 virus, strain Wash/47885/57; HHV-2; human enterovirus 71, strain BrCr; human herpesvirus 6A, strain GS; Cercopithecine herpesvirus 1; influenza B virus, strain B/Yamagata/16/1988; and influenza A virus, strain A/Philippines/2/1982 H3N2 are not detected in the sample.

7. A method of identifying a subject as having early stage hepatocellular carcinoma (HCC), comprising:

(i) detecting the presence or absence of antibodies specific for a plurality of viruses in a sample obtained from the subject, wherein the plurality of viruses comprises hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus (HCMV) strain AD169; HCV genotype 6g, isolate JK046; Epstein-Barr virus (EBV), strain B95-8; human rhinovirus 23; HCMV strain Towne; HCV genotype 1b, isolate BK; and human herpesvirus 2 (HHV-2), strain HG52; and

(ii) identifying the subject as having early stage HCC if:

(a) antibodies specific for HCV genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; HCMV strain AD169; HCV genotype 6g, isolate JK046; and/or HCV genotype 1b, isolate BK, are detected in the sample; and/or

(b) antibodies specific for EBV, strain B95-8; human rhinovirus 23; HCMV strain Towne; and/or HHV-2, strain HG52, are not detected in the sample.

8. The method of claim 7, wherein step (ii) comprises identifying the subject as having early stage HCC if:

(a) antibodies specific for at least two, at least three, at least four, at least five or all six of HCV genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; HCMV strain AD169; HCV genotype 6g, isolate JK046; and HCV genotype 1b, isolate BK, are detected in the sample; and/or

(b) antibodies specific for at least one, at least two, at least three or all four of EBV strain B95-8; human rhinovirus 23; HCMV strain Towne; and/or HHV-2 strain HG52, are not detected in the sample.

9. The method of any one of claims 1-8, wherein the sample is a blood or serum sample.

10. The method of any one of claims 1-9, wherein the antibodies are detected by phage immunoprecipitation, immunoblot or enzyme-linked immunosorbent assay.

11. The method of any one of claims 1-10, further comprising administering an appropriate therapy for the prevention or treatment of HCC.

12. The method of claim 11, wherein the appropriate therapy comprises vaccination against HBV, vaccination against HCV, administration of an anti-viral drug, a lifestyle change or a dietary modification.

13. The method of claim 12, wherein the anti-viral drug is a nucleoside analog, interferon, or lamivudine.

14. The method of claim 12, wherein the lifestyle or diet change includes reducing or eliminating intravenous drug use, reducing or eliminating alcohol consumption, reducing exposure to aflatoxin, or reducing iron overload.

15. The method of claim 11, wherein the appropriate therapy comprises a liver transplant or liver resection.

16. The method of claim 15, further comprising radiofrequency ablation.

17. The method of any one of claims 1-16, further comprising diagnostic monitoring every 3 months or every 6 months of the subject with early stage HCC.

18. The method of claim 17, wherein diagnostic monitoring comprises ultrasound, computerized tomography (CT), magnetic resonance imaging (MRI), or a combination thereof.

19. The method of any one of claims 1-18, wherein the subject has not previously had a diagnosis of one or more of liver disease, hepatitis B virus (HBV) infection, hepatitis C virus (HCV) infection, hepatitis delta virus (HDV) infection, nonalcoholic fatty-liver disease (NAFLD), nonalcoholic steatohepatitis (NASH) and hepatocellular carcinoma (HCC).

20. A phage display library expressing unique peptide epitopes from each of the viruses listed in Table 5A or Table 6.

21. The phage display library of claim 20, wherein the peptide epitopes comprise: the peptides of SEQ ID NOs: 1-61; peptides comprising at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to each of SEQ ID NOs: 1-61; the peptides of SEQ ID NOs: 62-102; peptides comprising at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to each of SEQ ID NOs: 62-102; or combinations thereof.

22. An array comprising unique peptide epitopes from each of the viruses listed in Table 5A or Table 6.

23. The array of claim 22, wherein the peptide epitopes comprise: the peptides of SEQ ID NOs: 1-61; peptides comprising at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to each of SEQ ID NOs: 1-61; the peptides of SEQ ID NOs: 62-102; peptides comprising at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to each of SEQ ID NOs: 62-102; or combinations thereof.

Description:
A VIRAL EXPOSURE SIGNATURE FOR DETECTION OF EARLY STAGE HEPATOCELLULAR CARCINOMA

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/914,138, filed October 11, 2019, which is herein incorporated by reference in its entirety.

FIELD

This disclosure concerns a viral exposure signature and its use for identifying a subject with early stage (pre-symptomatic) hepatocellular carcinoma.

ACKNOWLEDGMENT OF GOVERNMENT SUPPORT

This invention was made with government support under project number Z01-BC010313 awarded by the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND

Hepatocellular carcinoma (HCC) is considered a virus-related malignancy in which hepatitis B and C viruses (HBV and HCV) are major etiological factors (Farazi et al, Nat Rev Cancer 2006;6:674-687). Viral hepatitis causes inflammation and chronic liver diseases (CLD), which may lead to fibrosis, cirrhosis and eventually, HCC. While HBV or HCV chronic carriers have an increased risk of developing HCC, the risk varies among individuals and not all patients with liver disease develop liver cancer (Arzumanyan et al, Nat Rev Cancer 2013;13:123-135). An effective strategy to prevent HCC is to eliminate causative factors. However, while direct-acting antiviral (DAA) treatment is remarkably effective in eliminating HCV infection, it reduces but does not completely eliminate HCC risk (Janjua et al, J Hepatol 2017;66:504-513; Carrat et al, Lancet 2019;393:1453-1464). Similarly, HBV vaccination, introduced in the early 1980s, has been successful in significantly reducing HBV carriers but only modestly reduces HCC burden in HBV-prevalent areas (Chang et al, Gastroenterology 2016;151:472-480). It is puzzling that the control of HBV infection in HBV-prevalent areas as well as HCV infection has been remarkably successful for decades, while the global HCC incidence and mortality rate has continued to increase since the 1990s (Liu et al, J Hepatol 2019;70:674-683). Changing trends of etiological factors such as alcohol and non-alcohol/non-viral related liver diseases may contribute to the observed increase. Thus, in addition to cancer prevention, early detection is a key research area to stop HCC-inflicted mortality. Currently, medical guidelines recommend biannual surveillance using ultrasound with or without alpha-fetoprotein (AFP) for individuals with chronic liver disease such as cirrhosis (Sherman et al, Hepatology 2012;56:793-796). However, these practices have yielded mixed results as to whether they are effective in detecting HCC at an early stage and can provide survival benefit (Tzartzeva et al, Gastroenterology 2018;154:1706-1718; Moon et al, Gastroenterology 2018;155:1128-1139; Sherman et al, Hepatology 1995;22:432-438). Notably, a majority of HCC patients are still diagnosed at an advanced stage, which precludes their chance of receiving potentially curative therapies and consequently leads to poor survival. Thus, there is an unmet need to implement an effective biomarker-guided surveillance program for early cancer detection.

SUMMARY

Described herein is a viral exposure signature (VES) that can be used to identify a subject with early stage HCC, particularly pre-symptomatic HCC. The VES is based on the presence or absence of antibodies to specific viral strains in a subject. Detection of the VES in a subject can be used, for example, to guide treatment and disease monitoring decisions.

Provided herein are methods of identifying a subject with early stage HCC. In some embodiments, the method includes detecting the presence or absence of antibodies to a plurality of viruses in a sample obtained from the subject; determining the presence of a viral exposure signature (VES) in the sample obtained from the subject; and identifying the subject as being at risk for developing HCC when the VES is present. In some embodiments, the plurality of viruses comprises at least 10, at least 20, at least 30, at least 40, at least 50 or at least 60 of the viruses listed in Table 5A. In some examples, the plurality of viruses comprises or consists of the 61 viruses listed in Table 5A or the 31 viruses listed in Table 6.

In some embodiments, the presence of the VES is determined by identifying antibodies to one or more of hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus, strain AD169; HCV genotype 6g, isolate JK046; HCV genotype 1b, isolate BK; HCV genotype 1c, isolate HC-G9; HCV genotype 1b, strain HC-J4; HCV genotype 4a, isolate ED43; hepatitis delta virus; HCV genotype 5a, isolate EUH1480; human cytomegalovirus; Crimean-Congo hemorrhagic fever virus, strain Nigeria/IbAr10200/1970; HCV genotype 1b, isolate HC-J1; influenza A virus, strain A/USSR/90/1977 H1N1; influenza A virus, strain A/Bangkok/1/1979 H3N2; HCV genotype 1c, isolate India; and Chapare virus, isolate Human/Bolivia/810419/2003.

In some embodiments, the presence of the VES is determined by not detecting antibodies to one or more of Epstein-Barr virus, strain B95-8; human rhinovirus 23; HCMV, strain Towne; human herpesvirus 2 (HHV-2), strain HG52; human herpesvirus 3; varicella-zoster virus, strain Dumas; Cercopithecine herpesvirus 16; human adenovirus C serotype 2; human astrovirus-1; human respiratory syncytial virus; human herpesvirus 6B, strain Z29; human herpesvirus 7, strain JI; human rhinovirus 14; Lordsdale virus, strain GII/Human/United Kingdom/Lordsdale/1993; human herpesvirus 1, strain KOS; human metapneumovirus, strain CAN97-83; coxsackievirus A16, strain G-10; Epstein-Barr virus, strain AG876; cowpox virus; human herpesvirus 1, strain 17; human adenovirus E serotype 4; human adenovirus F serotype 40; tanapox virus; human adenovirus C serotype 5; rhinovirus B; human herpesvirus 8; human herpesvirus 6A, strain Uganda-1102; human rhinovirus A serotype 89, strain 41467-Gallo; norovirus MD145, isolate GII/Human/United States/MD145-12/1987; molluscum contagiosum virus subtype 1; vaccinia virus, strain Copenhagen; poliovirus type 1, strain Sabin; orf virus; HHV-2, strain 333; hepatitis B virus; Epstein-Barr virus, strain GD1; human parainfluenza 3 virus, strain Wash/47885/57; HHV-2; human enterovirus 71, strain BrCr; human herpesvirus 6A, strain GS; Cercopithecine herpesvirus 1; influenza B virus, strain B/Yamagata/16/1988; and influenza A virus, strain A/Philippines/2/1982 H3N2.

In other embodiments, the method of identifying a subject with early stage HCC includes (i) detecting the presence or absence of antibodies specific for a plurality of viruses in a sample obtained from the subject, wherein the plurality of viruses comprises hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus (HCMV) strain AD169; HCV genotype 6g, isolate JK046; Epstein-Barr virus (EBV), strain B95-8; human rhinovirus 23; HCMV strain Towne; HCV genotype 1b, isolate BK; and human herpesvirus 2 (HHV-2), strain HG52; and (ii) identifying the subject as being at risk for developing HCC if: (a) antibodies specific for HCV genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; HCMV strain AD169; HCV genotype 6g, isolate JK046; and/or HCV genotype 1b, isolate BK, are detected in the sample; and/or (b) antibodies specific for EBV, strain B95-8; human rhinovirus 23; HCMV strain Towne; and/or HHV-2, strain HG52, are not detected in the sample.
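As an illustration only, the presence/absence rule of this ten-virus embodiment can be sketched in Python. The marker keys, the function name `ves_present`, and the `min_detected` and `min_absent` thresholds are hypothetical names introduced here, not terms from the disclosure. The sketch reads the "and/or" in the text as an inclusive OR, and it assumes that a virus missing from the input was not tested and therefore counts toward neither arm.

```python
from typing import Mapping

# Ten-virus panel from the embodiment above; the short keys are
# hypothetical labels chosen for this sketch.
DETECTED_MARKERS = (
    "HCV_3b_Tr-Kj", "HCV_1b_Taiwan", "HCV_1a_isolate1",
    "HCMV_AD169", "HCV_6g_JK046", "HCV_1b_BK",
)
ABSENT_MARKERS = ("EBV_B95-8", "HRV_23", "HCMV_Towne", "HHV-2_HG52")


def ves_present(serostatus: Mapping[str, bool],
                min_detected: int = 1,
                min_absent: int = 1) -> bool:
    """Return True if the simplified signature rule is met.

    serostatus maps a marker label to True (antibodies detected)
    or False (antibodies not detected). Untested markers are skipped.
    """
    # Arm (a): markers whose antibodies are detected in the sample.
    n_detected = sum(1 for m in DETECTED_MARKERS if serostatus.get(m, False))
    # Arm (b): markers that were tested and whose antibodies are absent.
    n_absent = sum(
        1 for m in ABSENT_MARKERS if m in serostatus and not serostatus[m]
    )
    # Assumption: "and/or" is treated as an inclusive OR of the two arms.
    return n_detected >= min_detected or n_absent >= min_absent
```

Raising `min_detected` or `min_absent` mimics the stricter "at least two", "at least three" variants recited in the claims; the defaults of 1 correspond to the loosest reading of the rule.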

In some embodiments, the sample is a blood or serum sample.

In some embodiments, the antibodies are detected by phage immunoprecipitation, immunoblot or enzyme-linked immunosorbent assay.

In some embodiments, the method further includes administering an appropriate therapy or providing an appropriate procedure (such as surgery) for the treatment of HCC. In some examples, the method further includes performing a liver transplant in the subject with early stage HCC. In other examples, the method further includes liver resection of the subject with early stage HCC, with or without radiofrequency ablation (RFA). In some examples, if the subject is also positive for HBV or HCV, the subject is administered an anti-viral drug.

In some embodiments, the method further includes active diagnostic monitoring of the subject with early stage HCC. For example, the subject can be monitored on a regular schedule, such as every 3 months or every 6 months, using ultrasound, contrast enhanced computerized tomography (CT) and/or magnetic resonance imaging (MRI).

Also provided is a phage display library expressing unique peptide epitopes from each of the viruses listed in Table 5A or Table 6. In some embodiments, the phage display library expresses the peptides of SEQ ID NOs: 1-61, or a subset thereof. In some examples, the phage display library expresses the peptides of SEQ ID NOs: 1-102, or a subset thereof. In other examples, the phage display library expresses the peptides of SEQ ID NOs: 62-102, or a subset thereof.

Further provided is an array comprising unique peptide epitopes from each of the viruses listed in Table 5A or Table 6. In some examples the unique peptide epitopes comprise the peptides of SEQ ID NOs: 1-61 (shown in Table 5B), the peptides of SEQ ID NOs: 62-102 (shown in Table 3B), or the peptides of SEQ ID NOs: 1-102.

The foregoing and other objects and features of the disclosure will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1E: Viral richness and frequency of infection spectrum in serum. (FIG. 1A) Schema of screening of the NCI-UMD cohort, including 899 serum samples by VirScan and 849 matching buffy coat or cheek swab samples by genome-wide association study (GWAS), with integrated analysis among population groups: population controls (PC, n=412), high risk chronic liver disease cases (HR, n=337), and hepatocellular carcinoma cases (HCC, n=150); the VES is validated in a prospective NIDDK cohort with NIDDK-HR (n=129) and NIDDK-HCC (n=44). (FIG. 1B) Histogram showing the sequencing reads of VirScan with the mean coverage accuracy of 0.93. (FIG. 1C) Rarefaction plot showing the viral species richness detected in PC, HR and HCC groups. (FIG. 1D) Raincloud plot showing the viral species in each individual across populations. From left to right, each integrated boxplot illustrates: minimum, the first quantile, mean, the third quantile, and maximum, respectively. (FIG. 1E) Left: Bar plot showing the percentage of the prevalent viral infection among all samples. Right: Dot plot showing the number of the corresponding unique epitopes in each sample. Each dot represents the unique epitope number of one individual. The blue bars on the dot plot represent the mean.

FIGS. 2A-2C: Comparison of VirScan with medical charts, antigenicity of HCV1b and HIV coinfection viruses. (FIG. 2A) Contingency matrices comparing HCV, HBV, and HIV detection with VirScan against viral detection laboratory tests reported in the patient medical charts. For the purpose of computing binary classification test statistics, clinical results were considered true values and VirScan results were considered predicted values. (FIG. 2B) Left: Heatmap showing HCV proteomic enrichment among PC, HR and HCC groups. Each row represents the significant peptide tiling. Each column is a sample. The colored bar on the left of the panel indicates proteomic location of the tiling peptides (green). The first colored bar at the top of the panel indicates the groups of the samples among PC, HR, and HCC groups. The second bar at the top is HCV species positive (HCV species+) based on VirScan data. The intensity of each cell corresponds to the scaled -log10 (p-value) measure of significance of enrichment for a peptide in a sample (greater values indicate stronger antibody response). Right: Bar plot showing the B-cell epitope prediction score for each peptide. (FIG. 2C) Bar chart representing the coinfection viral status in HIV positive (HIV +) versus HIV negative (HIV -) cases. Asterisks denote the false discovery rate less than 0.05.
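The comparison described for the contingency matrices, with clinical results as true values and VirScan results as predicted values, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `contingency_counts` and `classification_stats` are hypothetical, and the particular statistics shown (sensitivity, specificity, accuracy) are common choices for a binary test rather than a list taken from the disclosure.

```python
from typing import Dict, Sequence


def contingency_counts(clinical: Sequence[bool],
                       virscan: Sequence[bool]) -> Dict[str, int]:
    """Tally a 2x2 contingency matrix.

    clinical holds the chart results (treated as truth) and virscan the
    serological calls (treated as predictions), one entry per subject.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, predicted in zip(clinical, virscan):
        if truth and predicted:
            counts["tp"] += 1
        elif not truth and predicted:
            counts["fp"] += 1
        elif truth and not predicted:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_stats(c: Dict[str, int]) -> Dict[str, float]:
    """Derive common binary-test statistics from the 2x2 counts."""
    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    return {
        "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
        "accuracy": ratio(c["tp"] + c["tn"], total),
    }
```

One matrix would be computed per virus (HCV, HBV, HIV), each from the paired chart and VirScan calls for the same subjects.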

FIGS. 3A-3E: Composition of VES associated with HCC. (FIG. 3A) The VES is identified using the XGBoost machine learning method. Flow chart showing training set and 10X cross validation sets to compare the viral profiles in HCC versus PC. The scored results show the predictive VES score of each sample among PC, HR and HCC. (FIG. 3B) Gradient boosting plot showing the area under the curve (AUC) value of training sets and 10X cross validation sets. The vertical line marks where gradient boosting stops, at the 108th round of testing, to avoid overfitting. (FIG. 3C) Bar plot showing the 61-VES identified by comparing HCC with PC using XGBoost in the NCI-UMD cohort. (FIG. 3D) Violin plot showing the predictive VES score among PC, HR and HCC groups. (**** P<0.0001, two-tailed p-value in Mann Whitney test). (FIG. 3E) Phylogenetic analysis of the 61 viral strains, which results in eight well-defined branches.
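The cross-validation scheme mentioned for FIG. 3A and FIG. 3B can be pictured with a small sketch of how samples are partitioned into folds. It assumes that "10X cross validation" means ordinary 10-fold cross-validation; the function name `k_fold_indices` and the `seed` parameter are hypothetical, and the gradient boosting model itself is not shown. It is not the authors' pipeline.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, held_out_indices) for each of k folds.

    Every sample is held out exactly once, so a model trained on each
    train split can be scored on data it has not seen.
    """
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the partition is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out
```

In a workflow like the one described, the held-out AUC averaged over the folds would be tracked per boosting round, and training would stop once that value no longer improves.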

FIGS. 4A-4H: Determination of VES predictive accuracy and association with clinical outcomes. (FIG. 4A) Estimate of receiver operating characteristic curves (ROC) of NCI-UMD cohort at HCC diagnosis. Plots display AUC estimation for 61-VES at HCC diagnosis (PC, n= 412; HR, n=337; HCC, n = 150). (FIG. 4B) VES levels of the NCI-UMD cohort are listed as below, low and high. The dashed line marks a VES level of 0.5; values less than 0.5 are classed as below. Low and high VES levels are defined among values of more than 0.5, with the median of those values as the separation. (FIG. 4C) Kaplan Meier (KM) plot survival curve for the NCI-UMD cohort with either 61-VES. (FIGS. 4D, 4E) Estimate of receiver operating characteristic curves (ROC) in predicting NIDDK validation cohort at HCC diagnosis and baseline. Plots display area under the curve estimation for 61-VES and clinical variable AFP at HCC diagnosis (NIDDK-HR2, n= 106; NIDDK-HCC, n = 44) and at baseline (NIDDK-HR1, n= 129; NIDDK-HCC, n= 44). (FIG. 4F) Time-dependent AUC showing the landmark time points performance of VES from 1 to 10 years relative to baseline. (FIG. 4G) The boxplots show the relationships between 61-VES and the clinical diagnosis in the NIDDK validation cohort at different follow-up (F/U) time points. (FIG. 4H) AUC values corresponding to predictions based on clinical indicators from patient charts compared with those based on VES, as well as those based on the combination of clinical indicators and VES for the NIDDK cohort at baseline.
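The three-level stratification described for FIG. 4B can be sketched as below. The function name `stratify_ves` is hypothetical. The text does not say how a value of exactly 0.5, or a value equal to the median, is assigned, so this sketch assumes that 0.5 falls in the below group and that a value equal to the median falls in the low group.

```python
from statistics import median
from typing import List, Sequence


def stratify_ves(scores: Sequence[float], cutoff: float = 0.5) -> List[str]:
    """Label each VES score as 'below', 'low' or 'high'.

    Scores above the cutoff are split at their own median; everything
    else is 'below'.
    """
    above = [s for s in scores if s > cutoff]
    # The low/high separator is the median of the scores above the cutoff.
    split = median(above) if above else None
    levels = []
    for s in scores:
        if s <= cutoff:
            # Assumption: a score exactly at the cutoff counts as 'below'.
            levels.append("below")
        elif s <= split:
            # Assumption: a score equal to the median counts as 'low'.
            levels.append("low")
        else:
            levels.append("high")
    return levels
```

The resulting labels could then serve as the grouping variable for a survival comparison such as the Kaplan-Meier analysis of FIG. 4C.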

FIGS. 5A-5E: VirScan reproducibility and viral composition at DNA, RNA virus level and viral family level. (FIG. 5A) Distribution of reproducibility threshold -log10 (p-values) is shown. Histogram of the frequency of the reproducibility threshold -log10 (p-values). The mode of the distribution is approximately 2.358. (FIG. 5B) Examples of the experimental repeats in VirScan showing the background signals of the blank PBS samples at the bottom and the hits with significant -log10 (p-value) more than 2.358 of serum samples (top panel). (FIG. 5C) Pie charts showing the DNA and RNA viral compositions before and after immunoprecipitation in VirScan, as library input and Phage-IP, respectively. (FIG. 5D) Stacked bar plot showing phylogenetic composition of common viral taxa (0.1% abundance) at the viral family level among PC, HR and HCC. (FIG. 5E) The diagram includes detailed information on the excluded participants from initial enrollment, sample allocation with indicated criteria, QC and final data analysis.

FIGS. 6A-6B: Extended information of composition of viral features in the investigated population. (FIG. 6A) Heatmap showing the hierarchical clustering (hCluster) of the samples among PC, HR and HCC with the differential viral features. The listed 17 viruses exhibit a fold change greater than 2 with FDR<0.05 in PC and HCC ANOVA test. Bottom bar shows the scaled density signal. (FIG. 6B) Histogram showing the most differential viral species (sp) and strains in HCC versus PC.

FIGS. 7A-7C: Quality control of the GWAS study. (FIG. 7A) QQ-plot for all 729,000 variants represented in the GWAS. (FIG. 7B) Principal component analysis (PCA) of all samples after quality control (QC) in different racial groups. (FIG. 7C) SNP rs12979860 was significantly associated with epitopes in Core and NS5B regions of HCV. Left panel: Heatmap showing the significance of SNP associated with 375 epitope abundances of HCV genotype 2 and 3. Core and NS5B regions were highly associated with the genotypes. Right panel: Boxplots represent the difference of the epitope abundance between the genotypes in the Core region and NS5B region.

FIGS. 8A-8E: CONSORT flow diagrams for NIDDK cohort and assessment of the association of clinical outcomes with VES in NIDDK Cohort. (FIG. 8A) The diagram includes detailed information on the excluded participants from initial enrollment, sample allocation with indicated criteria, follow-up, QC and final data analysis. (FIG. 8B) Kaplan-Meier survival curves for NIDDK cohorts grouped by VES level. (FIG. 8C) Time-dependent ROC curve analysis of VES performance for landmark time points 1-10 years relative to baseline. (FIG. 8D) AUC prediction performance based on univariate and multivariate clinical indicators compared to VES (vertical band) for the NIDDK cohort at diagnosis. (FIG. 8E) AUC prediction performance based on univariate and multivariate clinical indicators compared to VES (vertical band) for the NCI-UMD cohort.

FIGS. 9A-9G: Genome-wide scan identifies specific genetic variants linked to VES. (FIG. 9A) Manhattan plot showing the detected genetic variants from GWAS associated with the viral feature phenotype of the NCI-UMD cohort. Gene loci with P-value less than 10⁻⁷ are annotated by name. (FIG. 9B) LocusZoom plot showing the LD structure of one of the lead SNPs, rs16960234, around the region of CDH13 and RP11-543N12.1. (FIG. 9C) Heatmap showing the high linkage disequilibrium (LD) SNPs of rs16960234 from 1000 Genomes database (r2>0.6). The density of the heatmap indicates the r2 value of the correlation. The labeled SNPs are the ones with eQTL available. (FIG. 9D) The eQTL of CDH13 in tibial artery tissue across genotypes of SNP rs1690234 from GTEx database. (FIG. 9E) The genotypic odds ratios (OR) of rs1690234 among HR and HCC relative to PC. (FIGS. 9F, 9G) VES score fold changes (FD) in genotypes AA, AG and GG of rs1690234 based on 61-VES and 31-VES among HCC relative to PC.

FIG. 10: Viral infection prevalence and unique viral epitope count across population control (PC), at risk group (AR), and HCC group. The viral infection prevalence across all PC, AR and HCC samples is shown on the bar plots. The count of unique epitopes per sample is shown on the dot plot and the vertical lines represent the mean values of the count of unique epitopes.

FIGS. 11A-11G: Further validation of robustness of the 61-VES. (FIG. 11A) XGBoost performance evaluated by AUC on HCC versus AR with 10x cross-validation. (FIG. 11B) ROC curves for PC versus HCC prediction, as well as for AR versus HCC prediction, using features from HCC versus AR prediction. (FIG. 11C) Features selected by HCC versus AR prediction overlapped highly with the VES signature. (FIG. 11D) XGBoost performance evaluated by AUC on HCC versus PC with 60/40 train-test split. (FIG. 11E) ROC curves showing the performance on the train and test datasets. (FIG. 11F) 1000 permutations with the 60/40 train-test split. (FIG. 11G) The selected features and feature importance after the 1000-permutation test.

SEQUENCE LISTING

The amino acid sequences listed in the accompanying sequence listing are shown using standard three letter code for amino acids, as defined in 37 C.F.R. 1.822. The Sequence Listing is submitted as an ASCII text file, created on October 8, 2020, 58.3 KB, which is incorporated by reference herein. In the accompanying sequence listing:

SEQ ID NOs: 1-102 are amino acid sequences of unique peptide epitopes from human viruses.

DETAILED DESCRIPTION

I. Terms and Methods

Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes VII, published by Oxford University Press, 2000 (ISBN 019879276X); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Publishers, 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by Wiley, John & Sons, Inc., 1995 (ISBN 0471186341); and George P. Redei, Encyclopedic Dictionary of Genetics, Genomics, and Proteomics, 2nd Edition, 2003 (ISBN: 0-471-26821-6).

The singular forms “a,” “an,” and “the” refer to one or more than one, unless the context clearly dictates otherwise. For example, the term “comprising a probe” includes single or plural probes and is considered equivalent to the phrase “comprising at least one probe.” The term “or” refers to a single element of stated alternative elements or a combination of two or more elements, unless the context clearly indicates otherwise. As used herein, “comprises” means “includes.” Thus, “comprising A or B,” means “including A, B, or A and B,” without excluding additional elements.

Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety, as are the GenBank® Accession numbers (for the sequence present on February 8, 2016). In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Except as otherwise noted, the methods and techniques of the present disclosure are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989; Sambrook et al., Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Press, 2001; Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates, 1992 (and Supplements to 2000); Ausubel et al., Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, 4th ed., Wiley & Sons, 1999; Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1990; and Harlow and Lane, Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1999.

In order to facilitate review of the various embodiments of the disclosure, the following explanations of specific terms are provided:

Administration: The introduction of an agent, such as an anti-viral therapeutic, into a subject by a chosen route. Administration can be local or systemic. For example, if the chosen route is intravascular, the agent is administered by introducing the composition into a blood vessel of the subject. Exemplary routes of administration include, but are not limited to, oral, injection (such as subcutaneous, intramuscular, intradermal, intraperitoneal, and intravenous), sublingual, rectal, transdermal (for example, topical), intranasal, vaginal, and inhalation routes.

Antibody: A polypeptide ligand comprising at least one variable region that recognizes and binds (such as specifically recognizes and specifically binds) an epitope of an antigen, such as a viral antigen. Mammalian immunoglobulin molecules are composed of a heavy (H) chain and a light (L) chain, each of which has a variable region, termed the variable heavy (VH) region and the variable light (VL) region, respectively. Together, the VH region and the VL region are responsible for binding the antigen recognized by the antibody. There are five main heavy chain classes (or isotypes) of mammalian immunoglobulin, which determine the functional activity of an antibody molecule: IgM, IgD, IgG, IgA and IgE. Antibody isotypes not found in mammals include IgX, IgY, IgW and IgNAR. IgY is the primary antibody produced by birds and reptiles, and is in some respects functionally similar to mammalian IgG and IgE. IgW and IgNAR antibodies are produced by cartilaginous fish, while IgX antibodies are found in amphibians.

Array: An arrangement of molecules, such as biological macromolecules (such as peptides or nucleic acid molecules) or biological samples (such as tissue sections), in addressable locations on or in a substrate. In some embodiments herein, the array comprises at least 10, at least 20, at least 30, at least 40, at least 50, or at least 60 (such as 61) addressable locations. In particular examples, the array comprises peptide epitopes from each of the viruses listed in Table 5A or Table 6.

Control: A “control” refers to a sample or standard used for comparison with an experimental sample, such as a serum sample obtained from a subject suspected of having or at risk for HCC. In some embodiments, the control is a sample obtained from a healthy patient (e.g., one not having HCC or cirrhosis). In some embodiments, the control is a historical control or standard reference value or range of values (e.g., a previously tested control sample, such as a group of samples that represent baseline or normal values).

Diagnosis: The process of identifying a disease by its signs, symptoms and results of various tests. The conclusion reached through that process is also called “a diagnosis.” Forms of testing commonly performed include blood tests, medical imaging, and biopsy.

Early stage: In the context of the present disclosure, detecting “early stage” HCC refers to identifying HCC in a subject prior to the onset of symptoms and/or prior to standard clinical diagnosis. “Early stage” in this context is not synonymous with stage 0 or stage I cancer. In some embodiments, early stage HCC is characterized by the presence of a single lesion less than 3 cm in diameter (such as 0.1 to 2.9 cm in diameter, such as 0.5 to 2.5 cm, 0.5 to 1 cm or 1 to 2.9 cm in diameter) without detectable local or distant metastatic lesions (such as detectable by CT or MRI).

Epitope: An antigenic determinant. These are particular chemical groups or peptide sequences on a molecule that are antigenic, i.e., that elicit a specific immune response. An antibody specifically binds a particular antigenic epitope on a polypeptide, such as a viral polypeptide.

Hepatocellular carcinoma (HCC): A primary malignancy of the liver, which in some cases occurs in patients with inflammatory livers resulting from viral hepatitis, liver toxins or hepatic cirrhosis (often caused by alcoholism). Exemplary therapies for HCC include, but are not limited to, one or more of surgery, transarterial chemoembolization (TACE), ablative therapies (including both thermal ablation and cryoablation), radioembolization, and percutaneous alcohol injection.

Isolated: An “isolated” biological component (such as a nucleic acid molecule, protein, or cell) has been substantially separated or purified away from other biological components, such as other chromosomal and extra-chromosomal DNA and RNA, proteins and cells. Nucleic acid molecules and proteins that have been “isolated” include nucleic acid molecules and proteins purified by standard purification methods. The term also embraces nucleic acid molecules and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acid molecules and proteins.

Sample (or biological sample): A biological specimen containing genomic DNA, RNA (including mRNA), protein (such as antibodies), or combinations thereof, obtained from a subject. Examples include, but are not limited to, peripheral blood, plasma, urine, saliva, tissue biopsy, fine needle aspirate, punch biopsy, surgical specimen, and autopsy material. In specific embodiments herein, the sample is a blood or serum sample.

Sequence identity: The identity or similarity between two or more nucleic acid sequences, or two or more amino acid sequences, is expressed in terms of the identity or similarity between the sequences. Sequence identity can be measured in terms of percentage identity; the higher the percentage, the more identical the sequences are. Sequence similarity can be measured in terms of percentage similarity (which takes into account conservative amino acid substitutions); the higher the percentage, the more similar the sequences are.

Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al, Nuc. Acids Res. 16:10881-90, 1988; Huang et al. Computer Appls. in the Biosciences 8, 155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations.

The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI) and on the internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Additional information can be found at the NCBI web site.
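As an illustration of the percentage identity described above, the following minimal sketch counts matching positions between two sequences that have already been aligned (for example, by one of the programs cited above). The function name, the use of "-" as the gap character, and the choice of denominator (all aligned columns except those that are gaps in both sequences) are assumptions made for this example; alignment programs differ in how they define the denominator, so the value need not match what a given tool reports. The two peptides are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption for this sketch: '-' marks a gap, and columns that are gaps
    in both sequences are ignored when counting aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 10-residue peptides differing at one position.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Under this convention, a peptide would meet an "at least 90% identical" criterion when the returned value is 90.0 or greater.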

Subject: Living multi-cellular vertebrate organisms, a category that includes human and non-human mammals. In some examples herein, the subject is suspected of having or at risk for having HCC.

Tumor: All neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. In some examples, the tumor is a HCC tumor.

II. Viral Exposure Signature and Methods of Use

Viruses are known to affect human health by altering host immunity, which makes the interplay between the virome and the host crucial in the pathogenesis of human chronic diseases, including cancer (Foxman et al., Nat Rev Microbiol 2011;9:254-64; Cadwell, Immunity 2015;42:805-813). Diverse pathogenic and non-pathogenic viruses may interact with one another as well as their host to shape host immunity, which may alter its response to new infections. Consequently, viruses that persist or are cleared in the host may leave unique molecular footprints that can alter disease susceptibility to cancer and may serve as an excellent window into early onset disease (Cadwell, Immunity 2015;42:805-813). It was hypothesized that unique post-viral exposure signatures resulting from virus-host interactions could reflect a cascade of events that may alter the risk of developing HCC. Such signatures could serve as early detection biomarkers and offer knowledge about potentially modifiable factors for early onset HCC. In the study disclosed herein, serological samples from 899 individuals enrolled in a case-control study of liver cancer (NCT00913757; clinicaltrials.gov) were profiled using a synthetic virome technology, VirScan, based on a high-throughput sequencing method, to detect exposure history to all known human viruses (Xu et al., Science 2015;348:aaa0698). A unique viral exposure signature (VES) that can discriminate HCC cases from individuals with chronic liver disease (CLD) and healthy volunteers matched by age and sex is disclosed herein. The VES was validated in a prospective National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) at-risk cohort for HCC.

Provided herein are methods of identifying a subject as being at risk for developing HCC.

In some embodiments, the method includes detecting the presence or absence of antibodies to a plurality of viruses in a sample obtained from the subject; determining the presence of a viral exposure signature (VES) in the sample obtained from the subject; and identifying the subject as being at risk for developing HCC when the VES is present.

In some embodiments, the presence of the VES is determined by identifying antibodies to one or more of hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus, strain AD169; HCV genotype 6g, isolate JK046; HCV genotype 1b, isolate BK; HCV genotype 1c, isolate HC-G9; HCV genotype 1b, strain HC-J4; HCV genotype 4a, isolate ED43; hepatitis delta virus; HCV genotype 5a, isolate EUH1480; human cytomegalovirus; Crimean-Congo hemorrhagic fever virus, strain Nigeria/IbAr10200/1970; HCV genotype 1b, isolate HC-J1; influenza A virus, strain A/USSR/90/1977 H1N1; influenza A virus, strain A/Bangkok/1/1979 H3N2; HCV genotype 1c, isolate India; and Chapare virus, isolate Human/Bolivia/810419/2003.

In some embodiments, the presence of the VES is determined by not detecting antibodies to one or more of Epstein-Barr virus, strain B95-8; human rhinovirus 23; HCMV, strain Towne; human herpesvirus 2 (HHV-2), strain HG52; human herpesvirus 3; varicella-zoster virus, strain Dumas; Cercopithecine herpesvirus 16; human adenovirus C serotype 2; human astrovirus-1; human respiratory syncytial virus; human herpesvirus 6B, strain Z29; human herpesvirus 7, strain JI; human rhinovirus 14; Lordsdale virus, strain GII/Human/United Kingdom/Lordsdale/1993; human herpesvirus 1, strain KOS; human metapneumovirus, strain CAN97-83; coxsackievirus A16, strain G-10; Epstein-Barr virus, strain AG876; cowpox virus; human herpesvirus 1, strain 17; human adenovirus E serotype 4; human adenovirus F serotype 40; tanapox virus; human adenovirus C serotype 5; rhinovirus B; human herpesvirus 8; human herpesvirus 6A, strain Uganda-1102; human rhinovirus A serotype 89, strain 41467-Gallo; norovirus MD145, isolate GII/Human/United States/MD145-12/1987; molluscum contagiosum virus subtype 1; vaccinia virus, strain Copenhagen; poliovirus type 1, strain Sabin; orf virus; HHV-2, strain 333; hepatitis B virus; Epstein-Barr virus, strain GD1; human parainfluenza 3 virus, strain Wash/47885/57; HHV-2; human enterovirus 71, strain BrCr; human herpesvirus 6A, strain GS; Cercopithecine herpesvirus 1; influenza B virus, strain B/Yamagata/16/1988; and influenza A virus, strain A/Philippines/2/1982 H3N2.

In some embodiments, the plurality of viruses includes at least 10, at least 20, at least 30, at least 40, at least 50 or at least 60 of the viruses listed in Table 5A. In some examples, the plurality of viruses comprises or consists of the 61 viruses listed in Table 5A. In some examples, the plurality of viruses comprises or consists of the 31 viruses listed in Table 6.

In particular embodiments, step (ii) includes determining the presence of the VES in the sample obtained from the subject if (a) antibodies specific for three or more, four or more, five or more, six or more, or seven or more of hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus, strain AD169; HCV genotype 6g, isolate JK046; HCV genotype 1b, isolate BK; HCV genotype 1c, isolate HC-G9; HCV genotype 1b, strain HC-J4; HCV genotype 4a, isolate ED43; hepatitis delta virus; HCV genotype 5a, isolate EUH1480; human cytomegalovirus; Crimean-Congo hemorrhagic fever virus, strain Nigeria/IbAr10200/1970; HCV genotype 1b, isolate HC-J1; influenza A virus, strain A/USSR/90/1977 H1N1; influenza A virus, strain A/Bangkok/1/1979 H3N2; HCV genotype 1c, isolate India; and Chapare virus, isolate Human/Bolivia/810419/2003 are detected in the sample; and/or (b) antibodies specific for three or more, four or more, five or more, six or more, or seven or more of Epstein-Barr virus, strain B95-8; human rhinovirus 23; HCMV, strain Towne; human herpesvirus 2 (HHV-2), strain HG52; human herpesvirus 3; varicella-zoster virus, strain Dumas; Cercopithecine herpesvirus 16; human adenovirus C serotype 2; human astrovirus-1; human respiratory syncytial virus; human herpesvirus 6B, strain Z29; human herpesvirus 7, strain JI; human rhinovirus 14; Lordsdale virus, strain GII/Human/United Kingdom/Lordsdale/1993; human herpesvirus 1, strain KOS; human metapneumovirus, strain CAN97-83; coxsackievirus A16, strain G-10; Epstein-Barr virus, strain AG876; cowpox virus; human herpesvirus 1, strain 17; human adenovirus E serotype 4; human adenovirus F serotype 40; tanapox virus; human adenovirus C serotype 5; rhinovirus B; human herpesvirus 8; human herpesvirus 6A, strain Uganda-1102; human rhinovirus A serotype 89, strain 41467-Gallo; norovirus MD145, isolate GII/Human/United States/MD145-12/1987; molluscum contagiosum virus subtype 1; vaccinia virus, strain Copenhagen; poliovirus type 1, strain Sabin; orf virus; HHV-2, strain 333; hepatitis B virus; Epstein-Barr virus, strain GD1; human parainfluenza 3 virus, strain Wash/47885/57; HHV-2; human enterovirus 71, strain BrCr; human herpesvirus 6A, strain GS; Cercopithecine herpesvirus 1; influenza B virus, strain B/Yamagata/16/1988; and influenza A virus, strain A/Philippines/2/1982 H3N2 are not detected in the sample.

In some embodiments, the sample is a blood or serum sample. In some examples, the method further includes obtaining the biological sample from the subject. In some examples, the subject is a human subject.

The presence of antibodies can be detected using any immunoassay. In some embodiments, the antibodies are detected by phage immunoprecipitation, immunoblot or enzyme-linked immunosorbent assay.

Also provided is a phage display library expressing unique peptide epitopes from each of the viruses listed in Table 5A or Table 6. The phage display library can be used to determine the presence of the VES. In some embodiments, the phage display library expresses the peptides of SEQ ID NOs: 1-61 (see Table 5B). In other examples, the phage display library expresses the peptides of SEQ ID NOs: 62-102 (see Table 3B). In some examples, the phage display library expresses the peptides of SEQ ID NOs: 1-102. In some examples, the phage display library expresses peptides at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identical to any of SEQ ID NOs: 1-61, SEQ ID NOs: 62-102 and SEQ ID NOs: 1-102.

Further provided is an array including unique peptide epitopes from each of the viruses listed in Table 5A or Table 6. The array can be used to determine the presence of the VES. In some examples, the unique peptide epitopes comprise the peptides of SEQ ID NOs: 1-61 (shown in Table 5B), the peptides of SEQ ID NOs: 62-102 (shown in Table 3B), or the peptides of SEQ ID NOs: 1-102. In some examples, the peptides have amino acid sequences at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identical to any of SEQ ID NOs: 1-61, SEQ ID NOs: 62-102 and SEQ ID NOs: 1-102.

In other embodiments provided herein, the method of identifying a subject as being at risk for developing HCC includes (i) detecting the presence or absence of antibodies specific for a plurality of viruses in a sample obtained from the subject, wherein the plurality of viruses includes hepatitis C virus (HCV) genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; human cytomegalovirus (HCMV) strain AD169; HCV genotype 6g, isolate JK046; Epstein-Barr virus (EBV), strain B95-8; human rhinovirus 23; HCMV strain Towne; HCV genotype 1b, isolate BK; and human herpesvirus 2 (HHV-2), strain HG52; and (ii) identifying the subject as being at risk for developing HCC if: (a) antibodies specific for HCV genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; HCMV strain AD169; HCV genotype 6g, isolate JK046; and/or HCV genotype 1b, isolate BK, are detected in the sample; and/or (b) antibodies specific for EBV, strain B95-8; human rhinovirus 23; HCMV strain Towne; and/or HHV-2, strain HG52, are not detected in the sample.

In some examples, step (ii) includes identifying the subject as being at risk for developing HCC if (a) antibodies specific for at least two, at least three, at least four, at least five or all six of HCV genotype 3b, isolate Tr-Kj; HCV genotype 1b, isolate Taiwan; HCV genotype 1a, isolate 1; HCMV strain AD169; HCV genotype 6g, isolate JK046; and HCV genotype 1b, isolate BK, are detected in the sample; and/or (b) antibodies specific for at least one, at least two, at least three or all four of EBV strain B95-8; human rhinovirus 23; HCMV strain Towne; and/or HHV-2 strain HG52, are not detected in the sample.
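The counting rule in step (ii) for this ten-virus embodiment can be sketched as follows. This is an illustrative sketch only, not the classifier used in the Examples: the marker labels are shorthand for the strains named above, the function and parameter names are hypothetical, and the default thresholds (at least two of the six, at least one of the four) are simply the lowest values recited in the text. Because the text joins conditions (a) and (b) with "and/or", the sketch exposes both readings through a `require_both` flag.

```python
# Shorthand labels for the strains named in the text (labels are assumptions).
POSITIVE_MARKERS = frozenset({
    "HCV 3b Tr-Kj", "HCV 1b Taiwan", "HCV 1a isolate 1",
    "HCMV AD169", "HCV 6g JK046", "HCV 1b BK",
})
NEGATIVE_MARKERS = frozenset({
    "EBV B95-8", "Human rhinovirus 23", "HCMV Towne", "HHV-2 HG52",
})


def at_risk(detected, min_positive=2, min_negative=1, require_both=True):
    """Apply conditions (a) and (b) to the set of viruses with antibodies detected.

    (a) at least `min_positive` of the six positive markers are detected;
    (b) at least `min_negative` of the four negative markers are NOT detected.
    `require_both=True` reads "and/or" as "and"; False reads it as "or".
    """
    detected = set(detected)
    positive_hits = len(POSITIVE_MARKERS & detected)
    negative_absent = len(NEGATIVE_MARKERS - detected)
    cond_a = positive_hits >= min_positive
    cond_b = negative_absent >= min_negative
    return (cond_a and cond_b) if require_both else (cond_a or cond_b)


# Hypothetical sample: antibodies to two HCV isolates, none to the negative markers.
example = at_risk({"HCV 3b Tr-Kj", "HCV 1b BK"})
```

Raising `min_positive` or `min_negative` corresponds to the stricter alternatives recited above (for example, "at least four" of the six and "all four" of the four).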

In some examples, the sample is a blood or serum sample. In specific examples, the method further includes obtaining the biological sample from the subject.

In some examples, the antibodies are detected by phage immunoprecipitation, immunoblot or enzyme-linked immunosorbent assay.

In some embodiments of the disclosed methods, the method further includes treating a subject with an appropriate therapy to aid in the prevention or treatment of HCC. In some examples, the appropriate therapy includes vaccination against hepatitis B virus (HBV) (such as administration of Engerix-B®, Recombivax HB®, or Heplisav-B®), anti-viral treatment against HBV (such as administration of PEG-IFN, entecavir, tenofovir, lamivudine, adefovir, and/or telbivudine) and/or anti-viral treatment against HCV (such as administration of one or more of glecaprevir, sofosbuvir, daclatasvir, grazoprevir, and ombitasvir). Anti-viral drugs include, for example, nucleoside/nucleotide analogs (e.g., entecavir and tenofovir disoproxil fumarate), interferon, and lamivudine. In some examples, the method further includes performing a liver transplant in the subject with early stage HCC. In other examples, the method further includes liver resection of the subject with early stage HCC, with or without radiofrequency ablation (RFA).

In some embodiments, the method further includes active diagnostic monitoring of the subject with early stage HCC. For example, the subject can be monitored on a regular schedule, such as every 2 months, every 3 months, every 4 months, every 5 months or every 6 months, using ultrasound, contrast enhanced computerized tomography (CT) and/or magnetic resonance imaging (MRI).

In some examples, the additional treatment includes lifestyle or diet changes, including programs to reduce intravenous drug use, needle exchange programs, prevention of sexually-transmitted diseases, reducing or eliminating alcohol consumption, reducing obesity-related inflammation (such as by improving diet and increasing exercise), improving insulin resistance, increasing consumption of vegetables, consuming branched-chain amino acids and/or taking vitamin D. For some patients, such as those with hereditary hemochromatosis, iron overload can increase the risk of developing HCC. Thus, in some examples, the appropriate therapy includes treating iron overload. Aflatoxin B1, a known carcinogen produced by fungi of the Aspergillus species, is commonly found as a contaminant of grains, nuts, and vegetables in regions such as Asia and Africa. Thus, reducing aflatoxin exposure can also be used to prevent or treat HCC.

Additional preventative therapies and treatments are described in Schütte et al., Gastrointest Tumors 3(1): 37-43, 2016 and Schütte et al., Gastrointest Tumors 2(4): 188-194, 2016.

III. Phage Immunoprecipitation Sequencing

In some embodiments of the present disclosure, the methods of detecting the presence or absence of specific antibodies in patient samples, and thereby determining the presence of the VES, can be performed using phage immunoprecipitation sequencing (PhIP-Seq). PhIP-Seq is a high-throughput method that allows for a comprehensive analysis of a subject’s antibody repertoire (see U.S. Publication No. 2016/0320406; Larman et al., Nat. Biotechnol 29: 535-541, 2011; and Mohan et al., Nat Protoc 13: 1958-1978, 2018; each of which is incorporated by reference herein).

PhIP-Seq is one method that can be used to rapidly detect the presence or absence of a plurality of virus-specific antibodies in a patient sample. Briefly, this method includes designing a peptide library that is representative of the viruses that are to be detected. In the context of the present disclosure, the library includes, for example, the 61 or 31 unique peptide epitopes of the 61-VES or 31-VES, respectively (see Tables 5A and 6). An oligonucleotide library encoding the peptides is constructed and PCR-amplified with adapters for cloning into a selected phage display vector to produce the phage display library. A patient sample, such as a blood or serum sample, is contacted with the phage display library to allow for phage-antibody complex formation and subsequent immunoprecipitation. The library of peptide-encoding oligonucleotide sequences is amplified by PCR directly from the immunoprecipitate, bar-coded and subjected to deep sequencing. Additional details of this method can be found in U.S. Publication No. 2016/0320406; Larman et al. (Nat. Biotechnol 29: 535-541, 2011), Mohan et al. (Nat Protoc 13:1958-1978, 2018), and the Novagen T7Select System Manual (available online).

The following examples are provided to illustrate certain particular features and/or embodiments. These examples should not be construed to limit the disclosure to the particular features or embodiments described.

EXAMPLES

Example 1: Methods

This example describes the materials and experimental procedures used for the studies described in Example 2.

Participants and VirScan Analysis

The patient cohort consisted of 899 sequentially enrolled participants (clinicaltrials.gov number: NCT00913757), including 150 HCC cases, 337 individuals with CLD as at-risk individuals (HR or AR, used interchangeably) and 412 healthy volunteers as a population control (PC) matched by age and sex (FIG. 1A).

Study Cohorts

UMD cohort. To measure virome-host interplay, 899 participants were recruited. Participants were grouped as (1) population control (PC, n=412) if they were relatively healthy without any diagnosis of liver disease; (2) high-risk (HR, n=337) if they were diagnosed with chronic liver diseases (associated with hepatitis B virus (HBV), hepatitis C virus (HCV), hepatitis delta virus (HDV), aflatoxins from fungal contamination, alcohol, nonalcoholic fatty-liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH)); or (3) hepatocellular carcinoma (HCC, n=150) if they were diagnosed with HCC. All clinical measurements were covered by NCT00913757 (clinicaltrials.gov), with liver disease status as the enrollment criterion. Serum, matching buffy coat and cheek swab samples were collected from each individual.

NIDDK cohort. This cohort consisted of 173 patients with chronic liver disease that included 44 HCC cases with 129 controls matched by liver disease etiology, age and sex. Patients were enrolled in a natural history protocol (clinicaltrials.gov number: NCT0001971) with longitudinal follow-up, at least annually with serologic testing and imaging, for up to 20 years.

Only cases with complete clinical and laboratory data and available longitudinal serologic samples were selected for analysis. The 44 HCC cases were sequentially identified out of 3,067 patients followed in this natural history study on chronic liver disease, and the controls were matched on a 2:1 basis as described above. HCC was diagnosed by radiologic imaging and/or liver biopsy as described by the American Association for the Study of Liver Disease (AASLD) practice guidelines (see Marrero et al, “Diagnosis, Staging and Management of Hepatocellular Carcinoma: 2018 Practice Guidance by the American Association for the Study of Liver Diseases,” Hepatology 68(2): 723-750, 2018). For the purposes of this analysis, stored serum samples (-80°C) were analyzed at study entry (baseline) and at recurrent time points until the time of HCC diagnosis.

Sample collection

Blood samples were collected and stored at -80°C (n=899 from UMD, n=488 from NIDDK). Buffy coat and cheek swab samples also were collected and stored at -80°C (n=849 from UMD).

Virscan PhIP-seq

Phage immunoprecipitation and sequencing were performed using a slightly modified version of previously published PhIP-Seq protocols. First, 96-deep-well plates were blocked with bovine serum albumin in TBST overnight on a rotator at 4°C. Diluted bacteriophage library (1 ml) was added to each blocked well. Serum samples, containing 2 µg IgG, were mixed with the bacteriophage library. Two technical replicates were set up for each sample. Blank PBS samples (instead of serum) were also set up as negative controls on each plate. After an overnight rotation, protein A and protein G Dynabeads were added to each well. After another 4-hour incubation on a rotator at 4°C, the beads were washed three times with 400 µl of PhIP-Seq wash buffer using a 96-well magnetic stand. Next, the beads were resuspended in water and lysed at 95°C for 10 minutes. Two rounds of PCR were performed on the lysed bacteriophage DNA product for amplification and multiplexing. After the second round of PCR, PCR products were pooled using equimolar amounts of all 192 samples for gel extraction. After gel extraction, the size and quality of the libraries were assessed on a Bioanalyzer instrument from Agilent. The DNA samples were aliquoted and stored at -80°C until sequencing. Sequencing was performed using a 50 bp single-read protocol on the Illumina HiSeq 4000 platform (1×50 bp), which yielded ~100 million to 200 million reads per lane (around 1,000,000 reads per sample in the current setting).
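As a rough check on the sequencing depth quoted above, the per-sample read count can be estimated by dividing the reads obtained per lane by the number of pooled samples. The sketch below assumes, for illustration only, that one pool of 192 samples is sequenced per lane and that reads are distributed evenly across samples (consistent with equimolar pooling); neither assumption is stated in the protocol, and the function name is hypothetical.

```python
SAMPLES_PER_POOL = 192  # number of samples pooled for gel extraction, per the text


def reads_per_sample(lane_reads: int, samples: int = SAMPLES_PER_POOL) -> int:
    """Whole reads per sample if a lane's reads are split evenly across the pool."""
    return lane_reads // samples


low = reads_per_sample(100_000_000)   # lower end of the quoted range per lane
high = reads_per_sample(200_000_000)  # upper end of the quoted range per lane
```

Under these assumptions the estimate is roughly 0.5 million to 1 million reads per sample, in line with the figure of around 1,000,000 reads per sample given above.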

Raw data from the Illumina HiSeq 4000 platform were processed by BCL2FASTQ2 for demultiplexing and for converting binary base calls and qualities to fastq format. The fastq files were mapped to the original virome peptide reference sequences using the Bowtie program. Two sequencing samples were excluded from the next-step analysis because they had fewer than 30,000 reads. The initial informatics and statistical analysis were performed using a slightly modified version of the previously published technique and in-house scripts. Briefly, scatter plots of the log10 of the -log10(P values) were made, and a sliding window of width 0.005 from 0 to 2 across the axis of one replicate was used. It was determined that the distribution of the threshold -log10(P value) was centered around a mode of approximately 2.358 (FIG. 5B). The 593 hits that came up in at least 3 of the 22 immunoprecipitations with beads alone (PBS blank samples) were eliminated. Also, any peptides that were not enriched in at least two of the samples were filtered out. A threshold number of hits per virus was set based on the size of the virus. If a hit shared a subsequence of at least 7 amino acids with any hit previously observed in any of the viruses from that sample, that hit was considered to be from a cross-reactive antibody and was ignored for that virus. The peptide hits that did not share any linear epitopes were summed to give the strain and species score data. The final score for each virus was compared to the threshold for that virus to determine whether the sample was positive for exposure to that viral species. The raw count data were calculated based on the -log10(p-value) cutoff of 2.358.
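The hit-calling steps described above can be sketched in Python as follows. This is only an illustrative outline, not the in-house scripts referred to in the text: the function names, the data layout (dictionaries of -log10(p) values keyed by peptide) and the choice to remember the 7-mers of every observed hit, including ignored ones, are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

THRESHOLD = 2.358  # reproducibility cutoff on -log10(p), taken from the text
KMER = 7           # shared subsequence length treated as cross-reactive


def reproducible_hits(rep1: Dict[str, float], rep2: Dict[str, float],
                      threshold: float = THRESHOLD) -> Set[str]:
    """Peptides whose -log10(p) exceeds the threshold in both replicates."""
    return {p for p, v in rep1.items()
            if v > threshold and rep2.get(p, 0.0) > threshold}


def drop_blank_hits(hits: Set[str], blank_counts: Dict[str, int],
                    min_blank: int = 3) -> Set[str]:
    """Remove peptides seen in at least `min_blank` beads-only (PBS) controls."""
    return {p for p in hits if blank_counts.get(p, 0) < min_blank}


def kmers(seq: str, k: int = KMER) -> Set[str]:
    """All length-k subsequences of a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def virus_scores(hits: List[Tuple[str, str]], k: int = KMER) -> Dict[str, int]:
    """Count hits per virus for one sample, ignoring cross-reactive hits.

    `hits` is an ordered list of (virus, peptide_sequence) pairs. A hit that
    shares a k-mer with any earlier hit is not counted (assumed ordering and
    bookkeeping; the source does not spell these out).
    """
    seen: Set[str] = set()
    scores: Dict[str, int] = {}
    for virus, seq in hits:
        ks = kmers(seq, k)
        cross_reactive = bool(ks & seen)
        seen |= ks
        if not cross_reactive:
            scores[virus] = scores.get(virus, 0) + 1
    return scores
```

A per-virus score from `virus_scores` would then be compared with that virus's hit threshold; the threshold values themselves are not given in the text, so they are left out of the sketch.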

DNA sample extraction

DNA extraction from buffy coat or lymphocyte samples was performed following the manufacturer’s instructions (DNeasy Blood & Tissue Kit from Qiagen). The eluted DNA was stored at -20°C for further analysis.

GWAS platform

Illumina OmniExpress was applied for the SNP array. Genotyping was performed on 200 ng of genomic DNA using Illumina Infinium HTS Global Screening Arrays on an Illumina iScan system. The raw genotyping data were processed with Illumina GenomeStudio software 2.0. Quality control was performed using PLINK version 2.0 (available online). Samples with a genotyping call rate < 95% were removed. SNPs with a minor allele frequency (MAF) < 0.05, a Hardy-Weinberg equilibrium (HWE) p value < 10^-4, or a call rate < 95% were excluded.
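The SNP-level filters above can be illustrated with a small Python sketch. It is not a substitute for PLINK: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, the function names are hypothetical, and the HWE check uses a one-degree-of-freedom chi-square approximation rather than the exact test PLINK applies.

```python
import math
from typing import List, Optional

Genotypes = List[Optional[int]]  # 0/1/2 allele counts, None = no call


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes: Genotypes) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def hwe_chi2_p(genotypes: Genotypes) -> float:
    """Chi-square (1 d.f.) p value for deviation from Hardy-Weinberg."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    observed = (called.count(0), called.count(1), called.count(2))
    p = (2 * observed[0] + observed[1]) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(genotypes: Genotypes, maf_min: float = 0.05,
              hwe_min: float = 1e-4, call_min: float = 0.95) -> bool:
    """Apply the three SNP filters stated in the text."""
    return (call_rate(genotypes) >= call_min
            and minor_allele_freq(genotypes) >= maf_min
            and hwe_chi2_p(genotypes) >= hwe_min)
```

For example, a SNP with 25, 50 and 25 samples in the three genotype classes passes all three filters, whereas a SNP with no heterozygotes among 100 samples fails the HWE filter.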

GWAS analysis

Variant quality control was performed. After filtering, 849 individuals and 713,111 SNPs remained for further analysis, with a total genotyping rate of 99.79%. Hardy-Weinberg equilibrium deviation was flagged at p value < 0.0001. Independent loci in regions were identified for SNPs associated with the virus feature phenotype at P < 5 x 10^-7 using PLINK. LocusZoom was used to plot regional signals associated with the phenotype, with LD and recombination rate calculated from the 1000 Genomes data. The LD structure of signals was further investigated with Haploview. A linear regression with an additive model was applied to estimate the genotypic effect that each SNP contributed to the disease or phenotype.

ELISA assay

IgG, IgA and IgG4 levels in serum were measured using human ELISA kits (Bethyl and Thermo Fisher) according to the manufacturers’ instructions. ELISA result reading was performed using a machine (Biorad).

Statistical Methods

To identify differences between populations, XGBoost and LEfSe were used to calculate the significance of the association of virus exposure traits with HCC versus PC.

XGBoost

XGBoost (available online) is software for a machine learning method of regression and classification using ensemble learning with gradient tree boosting. It is designed to increase the scalability and acceleration of optimized computation for practical use. XGBoost includes three types of parameters: general, booster and task. Each of the types has several hyperparameters, such as the maximum depth of the regression trees, the number of weak learners, the learning rate, and regularization, that need to be tuned. These parameters were tuned using a grid search to maximize the mean AUC value computed from 5-fold cross validation on the training data. After finding the optimal values of the hyperparameters, the model was constructed using the following main parameter setting: max_depth=3, eta=0.1, subsample=1, colsample_bytree=0.5, and min_child_weight=1. Then XGBoost was applied to the entire data set with 200 boosting iterations. To avoid over-fitting, model training was set to stop when no improvement in the AUC value was observed for 20 rounds (early_stopping_rounds=20). The best iteration model was used as the final model. XGBoost automatically conducts feature selection and calculates importance for each feature. Multiple subsets of the features were tested to achieve the highest AUC, and a decision was made to take all of the output features for further analysis. For each training and testing sample, a virus feature score was also generated based on the features selected and implemented in the XGBoost classification prediction.
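A minimal Python sketch of the tuning procedure described above is given here for illustration. The final parameter values are those stated in the text; the candidate grid, the `objective` and `eval_metric` entries, and the helper names are assumptions. The grid search itself is plain Python and takes any scoring callback, and `xgboost_cv_auc` shows one way such a callback might wrap `xgboost.cv` (it requires the xgboost and pandas packages and a prepared `DMatrix`).

```python
import itertools
from typing import Callable, Dict, Iterator, Sequence, Tuple

# Final settings reported in the text; objective and eval_metric are assumed.
FINAL_PARAMS = {
    "max_depth": 3,
    "eta": 0.1,
    "subsample": 1,
    "colsample_bytree": 0.5,
    "min_child_weight": 1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
}


def grid(space: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield every combination of hyperparameter values in `space`."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(space: Dict[str, Sequence],
                cv_auc: Callable[[Dict[str, object]], float]
                ) -> Tuple[Dict[str, object], float]:
    """Return the combination with the highest cross-validated AUC."""
    best_params, best_score = None, float("-inf")
    for params in grid(space):
        score = cv_auc(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def xgboost_cv_auc(params: Dict[str, object], dtrain, seed: int = 0) -> float:
    """Mean 5-fold CV AUC with early stopping, via xgboost.cv."""
    import xgboost as xgb  # imported lazily so the grid search runs without it
    history = xgb.cv(
        {"objective": "binary:logistic", **params},
        dtrain,
        num_boost_round=200,
        nfold=5,
        metrics="auc",
        early_stopping_rounds=20,
        seed=seed,
    )
    return float(history["test-auc-mean"].max())
```

In use, a call such as `grid_search(space, lambda p: xgboost_cv_auc(p, dtrain))` would scan a hypothetical `space` like `{"max_depth": [2, 3, 4], "eta": [0.05, 0.1, 0.3]}`; the grid actually searched in the study is not stated in the text.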

LEfSe

The LEfSe method of analysis first compares the abundance of all viral clades (in this case between PC and HCC) by the Kruskal-Wallis test at a pre-defined α of 0.05. Significantly different vectors resulting from the comparison of relative abundances between PC and HCC are used as input for linear discriminant analysis (LDA), which produces an effect size and a p-value. The threshold on the logarithmic LDA score for discriminative features was set at 2.0. LEfSe also calculated the hierarchically organized viral taxa. The relative abundance data for the LEfSe test were prepared based on the strain and species score data.
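The first LEfSe step, screening features by the Kruskal-Wallis test at α = 0.05, can be sketched in Python for the two-group case (PC versus HCC). This is an illustration only and not the LEfSe implementation: the function names and the input layout are assumptions, and the subsequent LDA effect-size step is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a: Sequence[float],
                              b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected H statistic and chi-square (1 d.f.) p value."""
    if not a or not b:
        raise ValueError("both groups need at least one observation")
    values = list(a) + list(b)
    n = len(values)
    ranks = average_ranks(values)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if tie == 0.0:
        return 0.0, 1.0  # all values identical: nothing to distinguish
    h = max(h / tie, 0.0)
    return h, math.erfc(math.sqrt(h / 2.0))


def screen(features: Dict[str, Tuple[Sequence[float], Sequence[float]]],
           alpha: float = 0.05) -> List[str]:
    """Names of features whose abundance differs between groups at alpha."""
    return [name for name, (pc, hcc) in features.items()
            if kruskal_wallis_two_groups(pc, hcc)[1] < alpha]
```

Features that pass `screen` would be the input to the LDA step, where the logarithmic LDA score threshold of 2.0 is applied.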

Additional Statistical Methods

All statistical analyses were conducted in R and GraphPad Prism 7 (La Jolla, CA). Data are presented either as means +/- s.e.m. or as medians of continuous values and were analyzed by a two-sided Student’s t-test or Mann-Whitney test, respectively, for comparison of two groups. Fisher’s exact test or the χ2 test was used to calculate the statistical significance of categorical values between groups. Two-tailed P values of no more than 0.05 were considered significant. Linear regression was used to determine the correlation between two different variables.

Viral feature level, clinical outcome and ROC curve

All HCC patients were classified into high, low or below-detection-limit viral feature score groups based on viral feature levels (FIGS. 4B and 6A). Kaplan-Meier estimates of overall survival were calculated for each group and compared using the log rank test. Hazard ratios and 95% confidence intervals were calculated using univariate and multivariate Cox proportional hazards models to assess associations between the different viral feature levels along with several clinical factors. The ability of clinical and viral features to predict HCC was assessed by computing receiver operating characteristic (ROC) curves using logistic regression in R. Area under the curve (AUC) values were calculated for these variables.
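The AUC used throughout can be illustrated with a short Python sketch. The analysis in the text was done in R with logistic regression; here the predicted score is simply taken as given, and the function name and the 0/1 label coding are assumptions. The AUC is computed as the fraction of case-control pairs in which the case has the higher score, with ties counted as half.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve for binary labels (1 = case, 0 = control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy scores, not study data: three of the four case-control pairs are ordered correctly.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.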

Example 2: Viral Exposure Signature (VES) for Diagnosis of Hepatocellular Carcinoma (HCC)

This example describes the development of two viral exposure signatures, a first VES based on detection of 61 viral strains and a second VES based on detection of 31 viral strains, to identify subjects at risk for developing HCC.

The landscape of viral exposure profiles

VirScan applies a phage display library that covers 93,904 viral epitopes, representing 206 human viral species and over 1000 viral strains, to screen for previous exposure history (Xu et al, Science 2015;348:aaa0698). A phage particle with an epitope that was recognized by a participant’s antibody was immunoprecipitated (Phage-IP), and the encoding DNA barcode was then sequenced (FIG. 1A). A case-control design of the Maryland (NCI-UMD) cohort was used for the discovery of viral exposure profiles. The inclusion and enrollment of the study subjects are outlined in FIG. 5E, following the CONSORT guideline (Schulz et al., BMJ 340:c332, 2010) (Table 8). For the NCI-UMD cohort, VirScan Phage-IP products yielded 0.5-5 million single-end reads per serum sample, with a mean mapped read rate of 0.93 (FIG. 1B). A total of 30,033 viral epitopes were significantly enriched, with a -log10(p-value) greater than the reproducibility threshold of 2.358 based on both replicates (FIGS. 5A-5B). It was noted that the composition of the viral types at the viral taxonomic level showed small yet noticeable differences between the obtained Phage-IP products and the library input (FIGS. 5D-5E), indicating a measurable difference between patient-derived data and the original input. When assessing viral richness among PC, HR and HCC, it was determined that the number of viral infections detected increased with the sample size and reached saturation at a sample size over 200 (FIG. 1C). An average of 7 species of virus per sample was detected, and more than 20 out of 206 viral species were found in four individuals (FIG. 1D). Overall, the distribution of viral species was similar among PC, HR and HCC (FIG. 1D), indicating no bias in the landscape of overall viral exposure profiles between different groups. The abundance of the most prevalent viral species among all volunteers, such as human herpesvirus 4 (EBV) and human herpesvirus 5 (HCMV), was similar to a prior population study (FIG. 1E, Table 2A) (Xu et al, Science 2015;348:aaa0698), and was consistent with previous epidemiology reports (Straus et al, Ann Intern Med 1993;118:45-58; Ho, Rev Infect Dis 1990;12 Suppl 7:S701-S710). However, the HCV infection rate (26.4%) in this study was relatively high, which was mainly contributed by AR (48.4%) and HCC (39.3%) (Table 2A; FIG. 10). A wide range of unique viral epitopes for each viral species that were recognized among different participants was detected, indicating that B-cell antigenicity to the same viral species is diverse among the participants (FIG. 1E, right panel; FIG. 10). 
Moreover, global compositions of the viral types at the viral taxonomic level show small but noticeable differences between Phage-IP products and the library input (FIGS. 5C-5D).

To further assess the quality of VirScan, the results of VirScan were compared to available medical chart entries for HCV, HBV and HIV testing results, and it was found that VirScan had 45%, 47% and 70% specificity in detecting HCV, HBV and HIV, respectively, when compared to these medical record data (FIG. 2A). In contrast, its sensitivity was 84% for HCV, 48% for HBV and 73% for HIV. A majority of viral status data from medical charts was unknown or missing (Table 2B), which makes this comparison suboptimal. Epitope enrichment of HCV1b, a major type associated with HCC (Bruno et al, Hepatology 2007;46:1350-1356), was also examined. Consistently, an increase in peptide enrichment, corresponding mainly to the core, NS4 and NS5A of HCV1b, was observed among AR and HCC compared to PC, and these regions were consistent with the prediction score of B-cell antigenicity (FIG. 2B). The presence of HIV and other viruses known to have co-infection with HIV (Xu et al, Science 2015;348:aaa0698; Chang et al, Immunol Rev 2013;254:114-142; Echavarria, Clin Microbiol Rev 2008;21:704-715; Stover et al, J Infect Dis 2003;187:1388-1396) was also examined. A significant increase of co-infection between HIV and human herpesvirus 5, human adenovirus C, human adenovirus D, human herpesvirus 8 or HBV was found, with a false discovery rate (FDR) <0.05 (FIG. 2C). Taken together, the above results revealed that VirScan is a reliable method to capture a broad spectrum of viral exposures with a serological test.
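As a worked example of how such sensitivity and specificity figures follow from a 2x2 comparison, the short Python sketch below uses the HIV counts listed in Table 2B (chart-negative: 268 VirScan-negative and 116 VirScan-positive; chart-positive: 7 VirScan-negative and 19 VirScan-positive). The function name is hypothetical, and unknown or missing chart entries are assumed to be excluded.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# HIV rows of Table 2B, with the medical chart treated as the reference.
hiv_sens, hiv_spec = sensitivity_specificity(tp=19, fn=7, tn=268, fp=116)
print(f"HIV sensitivity={hiv_sens:.2f} specificity={hiv_spec:.2f}")
# prints "HIV sensitivity=0.73 specificity=0.70"
```

The same calculation on the HCV IgG antibody rows of Table 2B (123 and 24 among chart-positive, 27 and 34 among chart-negative) gives a sensitivity of about 0.84 and a specificity of about 0.44, close to the values quoted above.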

HCC-associated VES

A gradient boosting approach was applied to search for the best-fit virus composition that can discriminate HCC from PC (FIG. 3A). Using 10-fold cross validation and 1,000 random permutations, it was found that a VES can significantly discriminate HCC from PC with an AUC value of 0.9 and 0.7 for training and cross validation, respectively (FIG. 3B). This signature consisted of unique peptides corresponding to 61 viral strains (FIG. 3C). Among them, 18 viruses were positively associated, while the remaining viruses were negatively associated, with HCC. HCV, including 11 unique variants such as 3b or Taiwan 1b among others, was the main contributing virus in the signature. This was not surprising since 39.3% of HCC cases from this cohort were HCV+. It was also found that herpesvirus 5, HDV, influenza virus H1N1 and influenza virus H3N2 were enriched in the HCC group. In contrast, 43 viruses, such as human respiratory syncytial virus and human rhinovirus 23, were preferentially depleted in the HCC group (Table 5A, FIG. 3C). Weighted VES scores of the 61 viruses differed significantly between HCC and PC (p<0.0001), as well as between HCC and HR (p<0.0001) and between HR and PC (p<0.0001) (FIG. 3D). There was a significant increasing trend from PC to HR to HCC (p-trend <0.0001), suggesting that the VES was positively linked to hepatocarcinogenesis.

A phylogenetic analysis of the reactive epitopes of the 61 viral strains was performed to determine similarity among these HCC-related viruses (FIG. 3E). To search for common reactive viral epitopes either enriched or depleted in HCC, the analysis was restricted to the viral epitopes that ranked at the top for their association with HCC. These viruses can be divided into eight main branches where different HCV epitopes are clustered together with other viral epitopes, with the exception of cluster #6, which contains six HCV variants (out of 12 viruses) (FIG. 3E; Table 5B). In general, there was no clear enrichment within each branch for increased or decreased viruses, suggesting that varying viral epitopes involved in immunoreactivity are commonly shared among HCC. Since a majority of HCC patients have evidence of CLDs, to avoid this confounding variable, AR was also compared to HCC using the same gradient-boosting approach. It was found that an AR versus HCC VES can significantly discriminate HCC from AR or PC with AUC values similar to VES for training and cross validation (FIGS. 11A-11B). A majority of these VES-related viral strains overlap (FIG. 11C). To further test the robustness of VES, a 60/40 split was performed where 60% of cases were used for VES discovery while the remaining 40% of cases were used for an independent prediction. In total, 1,000 permutations of the split were performed to establish the confidence interval (CI). Again, a similar VES was found, with a mean AUC of 0.7 for prediction (FIGS. 11D-11G).
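The repeated 60/40 split can be sketched in Python as follows. This is a generic outline under stated assumptions rather than the procedure actually used: `fit` is a hypothetical callback that trains on the 60% portion and returns a scoring function, the splits are unstratified, and the interval is a simple percentile interval over the held-out AUC values.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of correctly ordered case-control pairs."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def repeated_split_auc(x: Sequence, y: Sequence[int],
                       fit: Callable[[list, list], Callable[[object], float]],
                       n_splits: int = 1000, train_frac: float = 0.6,
                       seed: int = 0) -> List[float]:
    """Held-out AUC for each random train/test split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    cut = int(round(train_frac * len(idx)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        test_y = [y[i] for i in test]
        if len(set(test_y)) < 2:
            continue  # AUC is undefined without both classes
        score = fit([x[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([score(x[i]) for i in test], test_y))
    return aucs


def percentile_interval(values: Sequence[float], lo: float = 2.5,
                        hi: float = 97.5) -> Tuple[float, float]:
    """Nearest-rank percentile interval of the collected AUC values."""
    if not values:
        raise ValueError("no values to summarize")
    s = sorted(values)

    def pick(q: float) -> float:
        return s[int(round(q / 100 * (len(s) - 1)))]

    return pick(lo), pick(hi)
```

The mean of the returned AUC values and `percentile_interval` over them would correspond to the mean prediction AUC and its confidence interval.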

Another statistically conservative method, the linear discriminant analysis of effect size (LEfSe, or LDA) (Segata et al., Genome Biol 2011;12:R60), was used to search for HCC-associated viruses. Furthermore, pairwise comparisons were performed for viral taxa at all levels, including DNA/RNA viruses, viral families, viral species and viral strains, between HCC and PC. In addition to VES at the strain level, this analysis also identified the viral taxonomic differences by viral families, such as Flaviviridae of positive single-strand RNA viruses, Pneumoviridae of negative single-strand RNA viruses and Circoviridae of single-strand DNA viruses. These analyses resulted in 341 viruses that can significantly distinguish HCC from PC. Among them, several HCV variants, herpesvirus 5 variants, Norwalk virus variants, cytomegalovirus, an adenovirus variant and astrovirus-1 were uniquely different between PC and HCC (FIG. 6B). A total of 31 viruses were overlapping between XGBoost and LEfSe (Table 6) and were different between PC and HCC. Unsupervised hierarchical clustering of the abundances of the top-ranking viruses revealed that HCC cases were more closely related to HR than to PC, consistent with the VES prediction score (FIG. 6A, FIG. 3D). Collectively, these results indicate that a unique set of VES is robust in defining HCC.

Validation of the VES in HCC

To further validate the two VES identified above for their clinical utility, VirScan profiles in the at-risk NIDDK cohort for HCC were analyzed. This cohort consisted of 173 CLD patients (NIDDK-HR) who were enrolled in a natural history study for liver disease with a follow-up of up to 20 years (Table 1; FIG. 8A). Among them, 44 individuals developed HCC. This cohort contained serum samples collected at enrollment (baseline) and at various follow-up time points until a diagnosis of HCC (diagnosis). Logistic regression analysis was performed using the VES from either all 61 viruses (FIG. 4) or the overlapping 31 viruses (FIG. 7), and receiver-operating characteristic (ROC) curves were generated for the Maryland cohort and the NIDDK-HR cohort, respectively. The areas under the curve (AUC) were 0.89, 95% CI (0.86-0.92) for the 61-VES (FIG. 4A) and 0.85, 95% CI (0.81-0.88) for the 31-VES in the Maryland cohort (FIG. 7A). It was observed that levels of 61-VES scores varied among HCC cases in the Maryland cohort, with some below the detection limit and others having either low or high levels (FIG. 4B). Patients with a high level had significantly worse survival compared to patients with a low level or below the detection limit (log rank p=0.026, and p-trend = 0.033) (FIG. 4C). Similar results were observed with the 31-VES. Among patients from the NIDDK cohort, VirScan data were available for 40 HCC cases at baseline, 129 controls at baseline, 44 HCC cases at diagnosis and 106 controls at diagnosis. The average number of viral species detected per case in the NIDDK cohort was 6.

Table 9A shows the results from univariable and multivariable Cox model survival analysis on several clinicopathologic variables to clarify the independent and additional prognostic value of VES. In the NIDDK cohort, it was found that the AUC values were 0.98, 95% CI (0.97-1.00) at diagnosis (FIG. 4D) and 0.91, 95% CI (0.87-0.96) at baseline (FIG. 4E) with 61-VES. Similar results were obtained with 31-VES. The performance of the VES was superior to alpha-fetoprotein (AFP), a known HCC diagnostic marker used in the clinic. The 31-VES yielded AUC values of 0.92, 95% CI (0.87-0.97) and 0.81, 95% CI (0.74-0.89) at diagnosis and at baseline, respectively, when combined with AFP. The DeLong test showed a significant improvement between VES and AFP (p values 4 x 10^-12 and 8 x 10^-10 at baseline and diagnosis, respectively) (FIGS. 4D and 4E). Similar trends (p-trend = 0.19) were also found between the levels of VES and overall survival among 44 patients in the NIDDK cohort (FIG. 8B). In order to assess the time-dependent performance of VES to predict the onset of HCC, 104 cancer-free controls and 40 HCC cases (from the NIDDK validation cohort) for which at least two time points were available were analyzed. In the context of survival modeling, an event was defined as the occurrence of an HCC diagnosis. Under this interpretation, censoring time was defined as the time difference between baseline and follow-up within the cancer-free control group, whereas event time was defined as the time difference between baseline and HCC diagnosis within the HCC group. Table 9B shows results from a multivariable Cox regression model generated to predict the occurrence of HCC diagnosis based on VES scores at baseline, adjusted for clinical prognostic variables. 
Moreover, a time-dependent ROC curve analysis (Bansal and Heagerty, Diagn Progn Res 3:14, 2019; Blanche et al, Stat Med 32: 5381-5397, 2013) was performed to assess the performance of VES over a range of landmark time points from 1 to 10 years relative to baseline (FIGS. 4F and 8C); the performance appears very robust and stable across this range. It was found that patients who developed HCC had, on average, much higher VES scores at baseline and at different times of follow-up until HCC diagnosis, when compared to cancer-free at-risk patients who were followed up at a similar time interval without developing HCC (FIG. 4G). A statistically significant increase in viral exposures (p < 0.05) was observed only for patients who developed HCC over time during the surveillance period in the NIDDK cohort. It appears that HCC cases with a high viral exposure had a more aggressive disease than those with a low viral exposure, and that VES was a robust indicator of early onset of HCC in this prospective cohort. Furthermore, the prediction performance of AR versus HCC based on VES was superior to other clinical indicators from the patient charts, such as AFP, alanine transaminase (ALT), cirrhosis and platelet counts, as well as the combination of all key clinical variables, as shown by analyses of the NIDDK cohort at baseline (FIG. 4H), which agree qualitatively with those of NIDDK at diagnosis (FIG. 8D) and the NCI-UMD cohort (FIG. 8E). An association of VES and HCC was similarly found in both HCV-positive and HCV-negative patients (Table 9C).

Phenotype-genotype association with VES

To determine if host genetic background may be linked to VES, a genome-wide association study (GWAS) in the Maryland cohort was performed, as this approach may help identify susceptibility variants related to viral infection and cancer (McKay J et al, Nat Genet 2017;49:1126-1132; Pharoah et al, Nat Genet 2013;45:362-370; Fumagalli et al., PLoS Genet 2010;6:e1000849). After assessment using the genetic quality control measures, 849 participants (PC, n=402; HR, n=323; HCC, n=124) were included in the analysis. Following the removal of monoallelic SNPs and those that deviated from Hardy-Weinberg equilibrium, an association test was performed for all the remaining SNPs. To further assess the quality of the GWAS data, it was determined whether there was an association between an SNP, rs12979860 in IL28B, and HCV infection. Its favorable genotype, CC, has been shown to be associated with better HCV treatment response or natural clearance. It was found that rs12979860-CC was significantly associated with HCV genotype 3 with an odds ratio (OR) of 2.74 (95% CI 1.14-7.97) in a dominant model (Table 3A). Furthermore, the association of this SNP with the abundances of 375 epitopes of HCV genotypes 2 and 3 was evaluated. The CC genotype was found to be associated with a decreased abundance of core epitopes but an increased abundance of NS5B epitopes in the HCV genome (FIG. 7C; Table 3B), consistent with a recent study (Ansari et al, Nat Genet 49:666-673, 2017). To assess VES-associated SNPs, HCC and PC groups were combined and then divided into two groups based on dichotomization of VES scores. In the associated quantile-quantile plots (FIG. 8B), a wider spread with small differences in allele frequencies was evident with increased slope of the line. Principal-component analysis based on genotyping revealed differences in ethnicity (FIG. 7B).

Manhattan plot analysis revealed several SNPs with much larger differences between high and low VES scores, having p-values < 10^-5 (FIG. 9A). Three SNPs, rs34725101, rs4483229, and rs16960234, in three different genomic regions corresponding to RHOA, EPB41L4B and CDH13, respectively, had p-values < 10^-7, an acceptable standard for common-variant GWAS, to be linked to VES (Table 3C and FIG. 9A). Among them, rs16960234 was further analyzed because both major and minor alleles of this variant could be detected in this cohort. High linkage disequilibrium (LD) SNPs (r2>0.6) were also found for rs16960234, but not for rs34725101 and rs4483229 (FIGS. 9B-9C; Table 7). Seven of the high LD SNPs of rs16960234 showed the expression profile of CDH13 as expression quantitative trait loci (eQTL) in the genotype-tissue expression (GTEx) database (McKay J et al, Nat Genet 2017;49:1126-1132). The CDH13 expression levels in the tibial artery tissues from carriers of the risk/protective G/G genotype of rs16960234 were significantly higher than in carriers of the protective/risk genotype A/A (FIG. 9D). To obtain the genotypic effects of rs16960234 in HCC or HR, a logistic regression model was constructed, and the genotypic odds ratio of this SNP in HR or HCC was calculated and compared to PC (FIG. 9E). The rs16960234 G/G genotype showed increased risk in HR vs. PC, OR: 1.89 (0.30-11.4), and the risk was even higher in HCC vs. PC, OR: 7.22 (1.30-40.0) (FIG. 9E; Table 4). Consistent with the genotypic effect in HCC, the VES score also showed gradual increases in heterozygous A/G and in G/G compared with A/A (FIGS. 9F-9G). Thus, rs16960234 and its linked gene CDH13 may be associated with VES and contribute to the disease risk.

Diagnostic Applications

Detecting cancer at an early stage, preferably before it is symptomatic, may provide an opportunity to achieve a cure and to reduce cancer-related mortality. Evidence suggests that earlier detection of cancer improves survival for some cancer types, such as cervical and colon cancers. A conventional approach is to develop biomarkers specific for cancer cells to aid in early cancer diagnosis. CancerSEEK is an emerging platform that has achieved good sensitivity and specificity for multiple clinically detected cancer types by profiling circulating cell-free DNA (ctDNA) presumably shed from tumor cells (Cohen et al, Science 2018;359:926-930). A recent study offers a cautionary note for measuring cancer gene panels using ctDNA because of its high false positivity among healthy individuals (Liu et al, Ann Oncol 2019;30:464-470).

The molecular and biological heterogeneity of cancer cells, contributed by a complex etiological landscape, creates a dilemma as to how best to design cancer-specific diagnostic panels effective for early cancer detection. As such, there has been a continuous debate in recent decades, for many malignant diseases including HCC, as to whether available methods are adequate for achieving this goal (Sherman et al, Hepatology 2012;56:793-796; Shieh et al, Nat Rev Clin Oncol 2016;13:550-56).

HCC is a unique malignancy for which most major causative etiologies are known (Wang and Thorgeirsson, Oncology 2014; 1:5). However, defining biomarkers specific for HCC cells has been challenging because of its complex genomic landscape with extensive intratumor and intertumor heterogeneities. Are there common features shared among HCC patients that could be used as a surrogate for early detection? An emerging concept is that an interplay between viral infection and host genetic background is crucial for maintaining virome homeostasis or causing human disease (Virgin, Cell 2014;157:142-150). The study disclosed herein assessed how an individual's history of viral exposures is associated with their risk of developing HCC. Using a synthetic viral scan technology (VirScan) with a simple blood test (Xu et al, Science 2015;348:aaa0698), a VES was identified that could discriminate HCC with high confidence from individuals with chronic liver diseases or from healthy volunteers. Remarkably, this signature was able to identify individuals at a median follow-up of 8.8 years prior to a clinical diagnosis of HCC. Thus, these results offer a sensitive tool applicable to the HCC surveillance program to improve early diagnosis.

The current study took advantage of a simple tool for profiling serological samples to link an individual’s history of viral infection and corresponding response to early onset HCC. The strategy was first to search for a VES using a case-control design that included HCC cases as well as at-risk individuals with chronic liver diseases and healthy volunteers matched by age and sex. A VES that can discriminate HCC from at-risk and healthy individuals was then validated using a prospective cohort of sequentially enrolled at-risk patients who were followed up for the development of HCC. The VES consists of known HCC etiologies such as HCV, HBV and HDV, but also includes other viruses such as herpesviruses 4 and 5, Crimean-Congo hemorrhagic fever virus, cytomegalovirus, and influenza A virus, among others. A few features are noted. First, HCV appears to be a major etiology driving VES, but an extended heterogeneity in various HCV subtypes is noted in both the Maryland and NIDDK cohorts. Second, a set of viruses is enriched while many others, including HBV, are depleted in HCC patients.

The current method of VirScan is based on the phage immunoprecipitation sequencing (PhIP-Seq) technology, which provides a powerful approach for analyzing antibody-repertoire binding specificities to all known human viruses with high throughput and at low cost (Mohan et al, Nat Protoc 2018;13:1958-1978). Comparing VirScan results with HCV and HBV status from medical charts of the UMD cohort, it was found that VirScan shows great specificity for both HCV and HBV, and good sensitivity for HCV but to a lesser extent for HBV. HCV encodes a large polyprotein consisting of ~3,000 amino acids, which is cleaved co- and post-translationally into ten different proteins associated with intracellular membranes (Bartenschlager et al, Nat Rev Microbiol 2013;11:482-496). Consistently, HCV antigen reactivity largely overlapped with the predicted antigenicity score by the B-cell epitope prediction method, coinciding with peptides to be presented at the surface of the cellular membrane. Consistent with early reports on the likelihood of coinfection of HIV and other viruses associated with AIDS and non-AIDS diseases (Xu et al, Science 2015;348:aaa0698; Slyker et al, J Infect Dis 2013;207:1798-1806; Lichtner et al, J Infect Dis 2015;211:178-186), evidence of coinfection between HIV and viruses such as HBV, herpesvirus 8, adenovirus D, influenza B virus, adenovirus C, and herpesvirus 5 was found in patients enrolled in the Maryland cohort. History of HCV infection is prevalent among at-risk individuals (48%), HCC patients (39%) and healthy volunteers (4%) who reside in Maryland. This is in contrast to an estimated prevalence of about 4.6 million persons (~1.5%) infected with HCV in the U.S. (Edlin et al, Hepatology 2015;62:1353-1363). It should be noted that 7.5%-44% of incarcerated individuals and 4%-38% of hospitalized patients tested positive for HCV (Edlin et al, Hepatology 2015;62:1353-1363), suggesting that the current surveys underestimate the prevalence of HCV infection. 
In contrast, while 2.6% of the Maryland healthy individuals showed evidence of HBV infection, more than 800,000 chronic HBV carriers were detected during 2011-2012 in the non-institutionalized U.S. population (Roberts et al, Hepatology 2016;63:388-397). The current survey methods may underestimate the prevalence of HBV and HCV. This is important as both HBV and HCV are major causative factors for HCC. Collectively, VirScan is a reliable method for profiling viral exposure; it is scalable with respect to sample throughput and has a relatively low cost per analysis, making it amenable to surveillance and early detection of HCC.

Table 2A. Viral Frequency in 899 patients and volunteers from NCI-UMD cohort

1 PC: population control

2 HR: high risk group

3 HCC: hepatocellular carcinoma

Table 2B. Comparison of VirScan with HBV, HCV, and HIV from medical charts

Clinical Variable                       VirScan Negative    VirScan Positive
Hepatitis B Virus (HBV)
  HBV surface antibody
    Negative                            88                  75
    Positive                            48                  27
    Unknown/Missing                     132                 117
  HBV core antibody
    Negative                            80                  18
    Positive                            82                  10
    Unknown/Missing                     250                 47
Hepatitis C Virus (HCV)
  HCV IgG antibody
    Not detected                        27                  34
    Detected                            24                  123
    Unknown/Missing                     95                  184
  HCV RNA PCR
    Negative                            2                   2
    Positive                            2                   18
    Unknown/Missing                     142                 321
Human Immunodeficiency Virus (HIV)
    Negative                            268                 116
    Positive                            7                   19
    Unknown/Missing                     58                  19

1 Regulation: the frequency of the virus feature is higher in disease population (increased) or lower (decreased)

2 Importance score: the improvement in accuracy brought by a feature to the decision tree branches it is on. The higher the score, the more important the feature is to the model prediction

Table 5B. Most frequent epitopes from the 61-VES

1 PC: population control, HCC: hepatocellular carcinoma

2 LDA score: Linear discriminant analysis (LDA) effect size, the degree of consistent difference in relative abundance between features in the two groups


In view of the many possible embodiments to which the principles of the disclosed subject matter may be applied, it should be recognized that the illustrated embodiments are only examples of the disclosure and should not be taken as limiting the scope of the disclosure. Rather, the scope of the disclosure is defined by the following claims. We therefore claim all that comes within the scope and spirit of these claims.