Title:
MATERIALS AND METHODS FOR ASSESSING VIROME AND MICROBIOME MATTER
Document Type and Number:
WIPO Patent Application WO/2020/250068
Kind Code:
A1
Abstract:
A method is described of analyzing the microbiome, including the virome, of a patient. Viral marker clusters for diagnosing inflammatory bowel disease, Crohn's disease and ulcerative colitis are identified from such analysis. Methods of diagnosis and treatment of dysbiosis and various disorders, such as inflammatory bowel disease, Crohn's disease and ulcerative colitis, are also included.

Inventors:
PLEVY SCOTT (US)
HILL COLIN (IE)
SHKOPOROV ANDREY (IE)
CLOONEY ADAM (IE)
SUTTON THOMAS (IE)
Application Number:
PCT/IB2020/055047
Publication Date:
December 17, 2020
Filing Date:
May 27, 2020
Assignee:
UNIV COLLEGE CORK – NATIONAL UNIV OF IRELAND CORK (IE)
JANSSEN BIOTECH INC (US)
International Classes:
C12Q1/70; A61K35/741
Foreign References:
US20180125900A12018-05-10
US20100074872A12010-03-25
US7912698B22011-03-22
US20110202322A12011-08-18
US20110307437A12011-12-15
US20130149339A12013-06-13
Other References:
TAO ZUO ET AL: "Gut mucosal virome alterations in ulcerative colitis", GUT, vol. 68, no. 7, 6 March 2019 (2019-03-06), UK, pages 1169 - 1179, XP055720085, ISSN: 0017-5749, DOI: 10.1136/gutjnl-2018-318131
JASON M. NORMAN ET AL: "Disease-Specific Alterations in the Enteric Virome in Inflammatory Bowel Disease", CELL, vol. 160, no. 3, 1 January 2015 (2015-01-01), AMSTERDAM, NL, pages 447 - 460, XP055720670, ISSN: 0092-8674, DOI: 10.1016/j.cell.2015.01.002
ANDREY N. SHKOPOROV ET AL: "The human gut virome is highly diverse, stable and individual-specific", BIORXIV, 3 June 2019 (2019-06-03), XP055720288, Retrieved from the Internet DOI: 10.1101/657528
HO BIN JANG ET AL: "Gene sharing networks to automate genome-based prokaryotic viral taxonomy", BIORXIV, 29 January 2019 (2019-01-29), XP055720424, Retrieved from the Internet DOI: 10.1101/533240
SHKOPOROV ANDREY N ET AL: "Bacteriophages of the Human Gut: The "Known Unknown" of the Microbiome", CELL HOST & MICROBE, vol. 25, no. 2, 13 February 2019 (2019-02-13), pages 195 - 209, XP085602333, ISSN: 1931-3128, DOI: 10.1016/J.CHOM.2019.01.017
WEITZ ET AL., ISME J, vol. 9, 2015, pages 1352 - 64
CANCHAYA ET AL., CURRENT OPINION IN MICROBIOLOGY, vol. 6, 2003, pages 417 - 424
GEVERS ET AL., CELL HOST MICROBE, vol. 15, 2014, pages 382 - 392
HALFVARSON ET AL., NAT MICROBIOL, vol. 2, 2017, pages 17004
LE CHATELIER ET AL., NATURE, vol. 500, 2013, pages 541 - 6
FORSLUND ET AL., NATURE, vol. 528, 2015, pages 262 - 266
FORSTER ET AL., NAT BIOTECHNOL, vol. 37, 2019, pages 186 - 192
KRISHNAMURTHY AND WANG, VIRUS RES, vol. 239, 2017, pages 136 - 142
MINOT S. ET AL.: "The human gut virome: inter-individual variation and dynamic response to diet", GENOME RES, vol. 21, 2011, pages 1616 - 1625
ZUO ET AL.: "Gut mucosal virome alterations in ulcerative colitis", GUT, 2019
ROUX ET AL.: "Viral dark matter and virus-host interactions resolved from publicly available microbial genomes", ELIFE, 2015, pages 4
REYES, A. ET AL.: "Viruses in the faecal microbiota of monozygotic twins and their mothers", NATURE, vol. 466, 2010, pages 334 - 338
NORMAN ET AL., CELL, vol. 160, 2015, pages 447 - 60
ZUO ET AL., GUT, 2019
FERNANDES ET AL., J PEDIATR GASTROENTEROL NUTR, vol. 68, 2019, pages 30 - 36
ECKBURG ET AL., SCIENCE, vol. 308, 2005, pages 1635 - 8
COSTELLO ET AL., SCIENCE, vol. 324, 2009, pages 1190 - 2
BJURSELL ET AL., JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 281, 2006, pages 36269 - 36279
MAHOWALD ET AL., PNAS, vol. 10, 2009, pages 3698 - 3703
RAMIREZ-FARIAS ET AL., BR J NUTR, vol. 4, 2008, pages 1 - 10
POOL-ZOBEL AND SAUER, J NUTR, vol. 137, 2007, pages 2580S - 2584S
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
"Current Protocols in Molecular Biology", 2005, JOHN WILEY AND SONS, INC.
THURBER R.V. ET AL.: "Laboratory procedures to generate viral metagenomes", NAT PROTOC, vol. 4, 2009, pages 470 - 483
BERNARDES ET AL.: "Evaluation and improvements of clustering algorithms for detecting remote homologous protein families", BMC BIOINFORMATICS, vol. 16, 2015, pages 34, XP021212715, DOI: 10.1186/s12859-014-0445-4
YUN ET AL., ADV DRUG DELIV REV., vol. 65, no. 6, 2013, pages 822 - 832
BOLDUC ET AL.: "vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria", PEERJ, vol. 5, 2017, pages e3243
LOPEZ-SILES ET AL., APPL ENVIRON MICROBIOL, vol. 81, 2015, pages 7582 - 92
LOPEZ-SILES ET AL., FRONT CELL INFECT MICROBIOL, vol. 8, 2018, pages 281
PASCAL ET AL., GUT, vol. 66, 2017, pages 813 - 822
MACHIELS ET AL., GUT, vol. 63, 2014, pages 1275 - 83
STRAUSS ET AL., INFLAMM BOWEL DIS, vol. 17, 2011, pages 1971 - 8
GEVERS ET AL., VEILLONELLA, 2014
JOOSSENS ET AL., GUT, vol. 60, 2011, pages 631 - 7
WILLING ET AL., GASTROENTEROLOGY, vol. 139, 2010, pages 1844 - 1854
MANICHANH ET AL., GUT, vol. 55, 2006, pages 205 - 11
DICKSVED ET AL., ISME J, vol. 2, 2008, pages 716 - 27
RIGOTTIER-GOIS, ISME J, vol. 7, 2013, pages 1256 - 61
BERKOWITZ ET AL., FRONT IMMUNOL, vol. 9, 2018, pages 74
MAIER ET AL., NATURE, vol. 555, 2018, pages 623 - 628
PARRAS-MOLTO ET AL., MICROBIOME, vol. 6, 2018, pages 119
ROUX ET AL.: "Towards quantitative viromics for both double-stranded and single-stranded DNA viruses", PEERJ, vol. 4, 2016, pages e2777
WOOD AND SALZBERG, GENOME BIOL, vol. 15, 2014, pages R46
NURK ET AL., GENOME RES, vol. 27, 2017, pages 824 - 834
SUTTON ET AL., MICROBIOME, vol. 7, 2019, pages 12
ROUX ET AL.: "VirSorter: mining viral signal from microbial genomic data", PEERJ, vol. 3, 2015, pages e985
GRAZZIOTIN ET AL., NUCLEIC ACIDS RES, vol. 45, 2017, pages D491 - D498
GUERIN ET AL., CELL HOST MICROBE, vol. 24, 2018, pages 653 - 664 e6
TATUSOV ET AL., NUCLEIC ACIDS RES, vol. 28, 2000, pages 33 - 6
HYATT ET AL., BMC BIOINFORMATICS, vol. 11, 2010, pages 119
CALLAHAN ET AL., NAT METHODS, vol. 13, 2016, pages 581 - 3
EDGAR ET AL., BIOINFORMATICS, vol. 27, 2011, pages 2194 - 200
SCHLOSS ET AL., APPL ENVIRON MICROBIOL, vol. 75, 2009, pages 7537 - 41
ALLARD ET AL., BMC BIOINFORMATICS, vol. 16, 2015, pages 324
Attorney, Agent or Firm:
SHIRTZ, Joseph F. et al. (US)
Claims:

1. A method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising:

obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;

obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;

creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;

creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
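Illustrative note (not part of the claims): the grouping of viral genome sequences into viral clusters recited in claim 1 can be pictured with a minimal Python sketch. It assumes each viral genome sequence has already been reduced to a set of protein-cluster identifiers, and it links genomes whose sets overlap above a Jaccard similarity threshold. The function names, the identifiers `pc1`, `contigA` and so on, the 0.5 threshold, and the single-linkage rule are all assumptions made for illustration; the claims do not specify them.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of protein-cluster IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def viral_clusters(genome_proteins, min_similarity=0.5):
    """Group genomes by shared protein clusters (single linkage).

    genome_proteins: dict mapping genome ID -> set of protein-cluster IDs.
    min_similarity: hypothetical Jaccard cutoff for linking two genomes.
    Returns a sorted list of sorted genome-ID lists, one per viral cluster.
    """
    parent = {g: g for g in genome_proteins}

    def find(g):
        # Follow parent links to the representative, halving the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Link every pair of genomes that share enough protein clusters.
    for g1, g2 in combinations(genome_proteins, 2):
        if jaccard(genome_proteins[g1], genome_proteins[g2]) >= min_similarity:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_proteins:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical input: three contigs described by their protein clusters.
example = {
    "contigA": {"pc1", "pc2", "pc3"},
    "contigB": {"pc2", "pc3", "pc4"},
    "contigC": {"pc9"},
}
clusters = viral_clusters(example)
```

Here `contigA` and `contigB` share two of four distinct protein clusters and fall into one viral cluster, while `contigC` stays alone. Running the same function separately on the healthy-cohort and IBD-cohort datasets would give the two pluralities of viral clusters that the final step compares.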

2. The method of claim 1,

wherein at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and

wherein at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.

3. The method of claim 1 or 2, wherein a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.

4. The method of any one of claims 1-3, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.

5. The method of any one of claims 1-4, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.

6. The method of any one of claims 1-5, wherein each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.

7. The method of any one of claims 1-6, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.

8. The method of claim 7, wherein performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
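Illustrative note (not part of the claims): ordination techniques such as PCoA and NMDS take a sample-by-sample dissimilarity matrix as input. The sketch below computes one such matrix using Bray-Curtis dissimilarity, which is a common choice for abundance data but is an assumption here, since the claims name no particular metric. The function names and the sample and cluster labels are likewise hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles.

    u, v: dicts mapping viral cluster ID -> non-negative abundance.
    Returns 0.0 for identical profiles and 1.0 for profiles with no overlap.
    """
    keys = set(u) | set(v)
    total = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    diff = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    return diff / total


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix for a dict of sample name -> profile."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical abundance profiles for three samples.
samples = {
    "healthy_1": {"vc2": 2, "vc98": 8},
    "healthy_2": {"vc2": 2, "vc98": 8},
    "ibd_1": {"vc2": 6, "vc98": 4},
}
names, matrix = dissimilarity_matrix(samples)
```

The resulting symmetric matrix could then be passed to an ordination routine from a numerical library; that step is omitted to keep the sketch dependency-free.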

9. The method of any one of claims 1-8, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
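Illustrative note (not part of the claims): one simple way to express differential abundance is a log2 fold change of mean relative abundance between the two cohorts. The sketch below shows that calculation only. It is an effect-size estimate, not a statistical test, and the function names, the pseudocount, and the sample data are assumptions for illustration. Both cohorts are assumed to be non-empty.

```python
import math


def mean_relative_abundance(cohort, cluster):
    """Mean fraction of each sample's counts that belong to `cluster`.

    cohort: non-empty list of dicts mapping viral cluster ID -> count.
    """
    fractions = []
    for counts in cohort:
        total = sum(counts.values())
        fractions.append(counts.get(cluster, 0) / total if total else 0.0)
    return sum(fractions) / len(fractions)


def log2_fold_changes(healthy, ibd, pseudocount=1e-6):
    """log2(IBD / healthy) mean relative abundance for every cluster seen.

    The pseudocount (a hypothetical value) avoids division by zero when a
    cluster is absent from one cohort.
    """
    clusters = {c for sample in healthy + ibd for c in sample}
    return {
        c: math.log2(
            (mean_relative_abundance(ibd, c) + pseudocount)
            / (mean_relative_abundance(healthy, c) + pseudocount)
        )
        for c in sorted(clusters)
    }


# Hypothetical counts: vcA is enriched in the IBD cohort, vcB is depleted.
healthy = [{"vcA": 1, "vcB": 3}, {"vcA": 2, "vcB": 6}]
ibd = [{"vcA": 3, "vcB": 1}, {"vcA": 9, "vcB": 3}]
changes = log2_fold_changes(healthy, ibd)
```

A positive value marks a cluster that is more abundant in the IBD cohort and a negative value marks one that is less abundant. A practical analysis would add a significance test with multiple-testing correction.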

10. The method of any one of claims 1-9, wherein the healthy cohort and the cohort diagnosed with IBD are each human cohorts.

11. The method of any one of claims 1-10, further comprising:

associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);

associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);

associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;

associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and

identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.

12. The method of any one of claims 1-11, further comprising:

representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and

representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.

13. The method of any one of claims 1-12,

wherein the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,

wherein the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and

wherein the method further comprises:

creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;

creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and

wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.

14. The method of claim 13,

wherein the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and

wherein the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.

15. A method for determining the presence of inflammatory bowel disease (IBD) in a subject, the method comprising:

obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;

creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;

obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and

comparing the plurality of subject viral clusters to the plurality of marker clusters.

16. The method of claim 15, wherein at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.

17. The method of claim 15 or 16, wherein a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.

18. The method of any one of claims 15-17, wherein at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.

19. The method of any one of claims 15-18, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters.

20. The method of any one of claims 15-19, wherein the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.

21. The method of any one of claims 15-20, wherein the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467.

22. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.

23. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

24. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

25. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

26. The method of claim 22, wherein an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

27. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.

28. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.

29. The method of any one of claims 15-21, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.

30. The method of claim 29, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

31. The method of claim 29, wherein a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

32. The method of claim 29, wherein a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

33. The method of any one of claims 15-32, wherein obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).

34. The method of any one of claims 15-33, further comprising:

obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and

evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.

35. The method of claim 34, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.

36. The method of claim 34 or claim 35, wherein the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

37. The method of claim 36, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.

38. The method of claim 37, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

39. The method of claim 34 or claim 35, wherein an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

40. The method of claim 34 or claim 35, wherein an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.

41. The method of claim 34 or claim 35, wherein an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) in remission is indicative of the presence of UC in remission in the subject.

42. The method of claim 37, wherein an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

43. The method of claim 36, wherein a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.

44. The method of claim 43, wherein a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

45. The method of claim 43, wherein a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

46. The method of any one of claims 34-45, wherein obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.

47. The method of claim 46, wherein the V region is V4 region.

48. The method of any one of claims 15-47, wherein the GI microbiota sample is a fecal sample.

49. The method of any one of claims 15-48, wherein the subject is a human.

50. The method of any one of claims 15-49, further comprising administering an IBD treatment to the subject.

51. The method of any one of claims 15-50, further comprising administering to the subject additional diagnostic tests for IBD, CD and/or UC.

52. The method of any one of claims 15-51, further comprising enrolling the subject in a clinical trial.

53. The method of any one of claims 15-52, wherein comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:

identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;

determining relative abundance of members within each common cluster in the plurality of subject viral clusters;

associating a correlation value with each common cluster in the plurality of marker clusters; and

comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.
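Illustrative note (not part of the claims): the four steps of claim 53 can be sketched in Python as follows. The sketch assumes the "correlation value" of a marker cluster is a signed number (positive where increased abundance is associated with IBD, negative where decreased abundance is) and that the comparison is a weighted sum. Both are assumptions, since the claim does not define the value or the form of the comparison. The function names and all numeric values are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-cluster counts into fractions of the sample total."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def marker_score(subject_counts, marker_values):
    """Compare subject clusters to marker clusters.

    subject_counts: dict mapping viral cluster ID -> count in the subject.
    marker_values: dict mapping marker cluster ID -> signed correlation
        value (hypothetical definition, see the note above).
    Returns the common cluster IDs and a weighted-sum score.
    """
    rel = relative_abundance(subject_counts)
    # Step 1: clusters present in both the subject and the marker set.
    common = sorted(set(rel) & set(marker_values))
    # Steps 2-4: weight each common cluster's relative abundance by its value.
    score = sum(rel[c] * marker_values[c] for c in common)
    return common, score


# Hypothetical subject profile and marker values.
subject = {"vc2": 30, "vc98": 10, "vc999": 60}
markers = {"vc2": 1.0, "vc98": -1.0, "vc103": -1.0}
common, score = marker_score(subject, markers)
```

In this example the common clusters are vc2 and vc98, and the score is positive because the cluster with a positive value dominates. How such a score would be thresholded or combined with other evidence is not specified in the claims.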

54. A kit for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:

a device to:

receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;

receive a second dataset representing a plurality of viral genome IBD marker clusters;

create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences; and

compare the first plurality of viral clusters to the second dataset; and

determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.

55. The kit of claim 54, wherein the device is further configured to:

receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;

evaluate the third dataset for the purpose of IBD diagnosis; and

determine the presence of IBD based at least in part on the evaluation of the third database.

56. The kit of claim 54 or 55, wherein the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.

57. The kit of any of claims 54-56, wherein the IBD is ulcerative colitis (UC).

58. The kit of any of claims 54-56, wherein the IBD is Crohn's disease (CD).

59. The kit of any one of claims 54-58, wherein the subject is human.

60. A system comprising:

one or more processors;

a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:

receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;

receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;

create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;

create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.

61. A method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467.

62. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

63. A method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.

64. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

65. A method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.

66. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

67. The method of claim 1, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

68. The method of claim 63, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

69. The method of claim 65, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.

70. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

71. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

72. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.

73. The method of claim 67 or claim 70, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.

74. The method of claim 68 or claim 71, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.

75. The method of claim 69 or claim 72, wherein said probiotic composition comprises one or more bacterial strains from the genus Akkermansia.

76. The method of any one of claims 67-75, wherein the V region is V4 region.

77. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

78. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

79. The method of any one of claims 61-78, wherein the subject is human.

Description:
MATERIALS AND METHODS FOR ASSESSING VIROME AND MICROBIOME MATTER

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to United States Provisional Application Serial Number 62/861,807, filed on 14 June 2019, United States Provisional Application Serial Number 62/861,818, filed on 14 June 2019, United States Provisional Application Serial Number 62/861,776, filed on 14 June 2019, and United States Provisional Application Serial Number 62/861,746, filed on 14 June 2019. The disclosure of each of the aforementioned applications is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

[0002] The present invention, in some aspects, relates to a method of analyzing the microbiome, e.g., virome, of a patient. The present invention, in some aspects, also relates to methods of diagnosing and treating dysbiosis of the microbiome, and various disorders that include infections and inflammatory disorders.

BACKGROUND

[0003] The virome is likely to be one of the major forces shaping the human gut microbiome, but is perhaps its least understood component. The virome is dominated by phages, such as bacteriophages, which play vital roles in many microbial communities by driving diversity, facilitating nutrient turnover (Weitz et al., 2015. ISME J, 9, 1352-64) and facilitating horizontal gene transfer (Canchaya et al., 2003. Current opinion in microbiology, 6, 417-424). High throughput sequencing has revealed the enormous diversity of the viral fraction of microbial ecosystems. Understanding the role of bacteriophages in microbial community structure can provide for more understanding and/or control of the alterations in human gut microbiome composition and diversity associated with many diseases, including Inflammatory Bowel Disease (IBD) (Gevers et al., 2014. Cell Host Microbe, 15, 382-392; Halfvarson et al., 2017. Nat Microbiol, 2, 17004), obesity (Le Chatelier et al., 2013. Nature, 500, 541-6) and diabetes (Forslund et al., 2015. Nature, 528, 262-266).

[0004] Many gut bacteria (and potential phage hosts) remain difficult to culture (Forster et al., 2019. Nat Biotechnol, 37, 186-192). This places a heavy reliance on metagenomic sequencing and bioinformatic approaches. However, a lack of universal marker genes (similar to 16S rRNA for the bacteriome) and a subsequent lack of taxonomic information due to poorly populated databases (Krishnamurthy and Wang, 2017. Virus Res, 239, 136-142) means that database-independent analysis of the virome must be carried out at the level of metagenomic assembly or individual viral genome. Early sequencing studies using 454 technology first described the novelty and diversity of the human gut virome (Minot et al., 2011. Genome Res, 21, 1616-25), but were only able to identify 2% of reads and with limits in sequencing depth, the true diversity and composition was not revealed. Improvements in sequencing technologies have allowed the virome to be analyzed in unprecedented detail with studies sequencing up to 50 million reads per sample (Zuo et al., 2019. Gut mucosal virome alterations in ulcerative colitis. Gut) and have confirmed that the virome is incredibly diverse, that the majority do not align to known sequences in databases (viral dark matter) (Roux et al., 2015b. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife, 4), and that composition is highly unique to individuals (Reyes et al., 2010, Nature, 466, 334-8).

[0005] Inflammatory Bowel Disease, including Crohn’s disease (CD) and ulcerative colitis (UC), is a chronic disorder of the intestinal tract resulting in periods of flare (active) and remission (inactive) disease. IBD has been associated with alterations in the human gut microbiome which include decreased diversity and reduced abundance of the Firmicutes and Bacteroides. There is tentative evidence that the gut virome plays a role in IBD (Norman et al., 2015. Cell, 160, 447-60; Zuo et al., 2019. Gut; Fernandes et al., 2019. J Pediatr Gastroenterol Nutr, 68, 30-36) where IBD is associated with a decreased overall virome diversity and abundance and an increased abundance of the family Caudovirales. Because only a small fraction of gut virome are classified at the family level, let alone classified genomically, nearly all of this research has been conducted on a fraction of the virome, with a current benchmark study using about 15% of the data (Norman et al., 2015. Cell, 160, 447-60). This hampers the identification of virome disease biomarkers and means that any link between virome, bacteriome and disease status remains elusive.

[0006] Analysis of the whole gut virome using metagenomic assembly is also challenging. At this level of resolution, the virome exhibits enormous diversity and interpersonal variation, obscuring patterns in the virome across individuals and cohorts.

[0007] Accordingly, there is a need for improved systems and methods to identify markers for IBD in the gut virome.

SUMMARY OF THE INVENTION

[0008] In one aspect, a method is provided for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising:

obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;

obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;

creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;

creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
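The clustering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes proteins have already been assigned to protein clusters, represents each viral genome as a set of hypothetical protein-cluster IDs, and links two genomes into the same viral cluster whenever the Jaccard similarity of their protein-cluster sets meets a threshold. The function names, the single-linkage rule, the threshold of 0.5 and the toy contig names are all illustrative choices that do not come from the text.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of protein clusters shared between two genomes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def viral_clusters(genome_proteins, threshold=0.5):
    """Group genomes into viral clusters by single linkage.

    genome_proteins maps a genome ID to the set of protein-cluster IDs
    found in that genome (hypothetical input format). Two genomes are
    linked when their Jaccard similarity is >= threshold (assumed value).
    """
    parent = {g: g for g in genome_proteins}

    def find(g):
        # Walk to the representative genome, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for g1, g2 in combinations(genome_proteins, 2):
        if jaccard(genome_proteins[g1], genome_proteins[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_proteins:
        groups.setdefault(find(g), set()).add(g)
    # Sort clusters by their member names for a deterministic result.
    return sorted((frozenset(m) for m in groups.values()), key=sorted)


# Toy data: contig_A and contig_B share two of four protein clusters.
genomes = {
    "contig_A": {"pc1", "pc2", "pc3"},
    "contig_B": {"pc1", "pc2", "pc4"},
    "contig_C": {"pc7", "pc8"},
}
clusters = viral_clusters(genomes)
```

Running the same grouping separately on the healthy dataset and on the IBD dataset would give the two pluralities of viral clusters that the final step compares.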

[0009] In some embodiments, at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.

[0010] In some embodiments, a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.

[0011] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.

[0012] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.

[0013] In some embodiments, each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.

[0014] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.

[0015] In some embodiments, performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
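Ordination techniques such as PCoA and NMDS take a pairwise sample dissimilarity matrix as input. The sketch below computes such a matrix from per-sample viral cluster abundances. The Bray-Curtis measure is an assumed choice, since the passage does not name a dissimilarity measure, and the sample names and abundance vectors are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def dissimilarity_matrix(samples):
    """Return sample names and the square matrix of pairwise dissimilarities.

    samples maps a sample name to a list of viral cluster abundances,
    with every list using the same cluster order (assumed input format).
    """
    names = sorted(samples)
    matrix = [[bray_curtis(samples[r], samples[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical abundances of three viral clusters in three samples.
samples = {
    "healthy_1": [10, 0, 5],
    "healthy_2": [10, 0, 5],
    "ibd_1": [0, 10, 5],
}
names, matrix = dissimilarity_matrix(samples)
```

The resulting matrix is symmetric with a zero diagonal and could be passed to an ordination routine from a statistics library to produce the low-dimensional coordinates used for visual comparison of the cohorts.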

[0016] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
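One simple way to express differential abundance is the log2 fold change of mean relative abundance between the two cohorts. The sketch below is a hedged illustration only: the cluster names, the input format and the pseudocount are assumptions, and a real analysis would normally add a statistical test with multiple-testing correction, which this passage does not specify.

```python
import math


def relative_abundance(counts):
    """Convert raw per-cluster counts for one sample into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {vc: 0.0 for vc in counts}
    return {vc: n / total for vc, n in counts.items()}


def log2_fold_change(healthy, ibd, pseudocount=1e-6):
    """Log2 ratio of mean relative abundance (IBD over healthy) per cluster.

    healthy and ibd are lists of {cluster_id: count} dicts, one per
    sample (hypothetical format). The pseudocount (assumed value)
    avoids division by zero for clusters absent from one cohort.
    """
    healthy_rel = [relative_abundance(s) for s in healthy]
    ibd_rel = [relative_abundance(s) for s in ibd]
    clusters = set().union(*healthy, *ibd)

    def mean(samples, vc):
        return sum(s.get(vc, 0.0) for s in samples) / len(samples)

    return {
        vc: math.log2(
            (mean(ibd_rel, vc) + pseudocount) / (mean(healthy_rel, vc) + pseudocount)
        )
        for vc in clusters
    }


# Hypothetical counts for two viral clusters in one sample per cohort.
healthy_samples = [{"vc_x": 50, "vc_y": 50}]
ibd_samples = [{"vc_x": 80, "vc_y": 20}]
lfc = log2_fold_change(healthy_samples, ibd_samples)
```

In this toy input, `vc_x` has a positive value (more abundant in the IBD cohort) and `vc_y` a negative one (less abundant).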

[0017] In some embodiments, the healthy cohort and the cohort diagnosed with IBD are each human cohorts.

[0018] In some embodiments, the methods described above further comprise:

associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);

associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);

associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;

associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and

identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.

[0019] In some embodiments, the above methods further comprise:

representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and

representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.

[0020] In some embodiments of the above methods,

the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,

the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and

the method further comprises:

creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;

creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and

wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.

[0021] In some embodiments of the above methods,

the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.

[0022] In one aspect, a method for determining the presence of inflammatory bowel disease (IBD) in a subject is provided, the method comprising:

obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;

creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;

obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and

comparing the plurality of subject viral clusters to the plurality of marker clusters.

[0023] In some embodiments, at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.

[0024] In some embodiments, the above methods further comprise determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters. In some embodiments, the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae. In some embodiments, the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467. In some embodiments, an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

[0025] In some embodiments, an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.

[0026] In some embodiments, a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

[0027] In some embodiments, obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).

[0028] In some embodiments, the method further comprises:

obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and

evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.

[0029] In a specific embodiment, the method further comprises determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.

[0030] In some embodiments, the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject. In a specific embodiment, an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

[0031] In some embodiments, an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) in remission is indicative of the presence of UC in remission in the subject. In some embodiments, an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

[0032] In some embodiments, a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.

In a specific embodiment, the V region is V4 region.

[0033] In some embodiments, the GI microbiota sample is a fecal sample. In some embodiments, the subject is human.

[0034] In various embodiments of the above methods, the method further comprises administering an IBD treatment to the subject. In various embodiments, the method further comprises administering to the subject additional diagnostic tests for IBD, CD and/or UC. In various embodiments, the method further comprises enrolling the subject in a clinical trial.

[0035] In various embodiments of the above methods, comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:

identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;

determining relative abundance of members within each common cluster in the plurality of subject viral clusters;

associating a correlation value with each common cluster in the plurality of marker clusters; and

comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.
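The four comparison steps above can be sketched as below. This is an illustration under loudly stated assumptions: the text does not define how the correlation value is derived or how the final comparison is scored, so the sketch assumes each marker cluster carries a signed value (positive when the cluster is associated with IBD) and combines the values into a single abundance-weighted sum. The function name, cluster names and values are hypothetical.

```python
def compare_to_markers(subject_counts, marker_correlations):
    """Compare subject viral clusters with marker clusters.

    subject_counts: {cluster_id: count} for the subject sample.
    marker_correlations: {cluster_id: signed correlation value},
    positive meaning association with IBD (assumed convention).
    Returns the relative abundance of each common cluster and an
    abundance-weighted score (assumed scoring rule).
    """
    # Step 1: clusters present in both the subject and the marker set.
    common = set(subject_counts) & set(marker_correlations)
    total = sum(subject_counts.values())
    if total == 0:
        return {}, 0.0
    # Step 2: relative abundance of each common cluster in the subject.
    rel = {vc: subject_counts[vc] / total for vc in common}
    # Steps 3 and 4: weight each abundance by its marker correlation value.
    score = sum(rel[vc] * marker_correlations[vc] for vc in common)
    return rel, score


# Hypothetical subject counts and marker values.
subject = {"vc_x": 30, "vc_y": 10, "vc_z": 60}
markers = {"vc_x": 0.8, "vc_y": -0.5, "vc_w": 0.9}
rel, score = compare_to_markers(subject, markers)
```

Here only `vc_x` and `vc_y` are common to both sets, so `vc_z` and `vc_w` do not contribute to the score. Any decision threshold applied to the score would be a further assumption.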

[0036] In one aspect a kit is provided for determining the presence of inflammatory bowel disease (IBD) in a subject, the kit comprising:

a device to:

receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;

receive a second dataset representing a plurality of viral genome IBD marker clusters;

create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences; and

compare the first plurality of viral clusters to the second dataset; and

determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.

[0037] In some embodiments, the device is further configured to:

receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;

evaluate the third dataset for the purpose of IBD diagnosis; and

determine the presence of IBD based at least in part on the evaluation of the third dataset.

[0038] In some embodiments of the above kits, the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample. In some embodiments, the IBD is ulcerative colitis (UC). In some embodiments, the IBD is Crohn's disease (CD). In some embodiments, the subject is human.

[0039] In another aspect is provided a system comprising:

one or more processors;

a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:

receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;

receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;

create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;

create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.

[0040] In one aspect a method is provided for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.

[0041] In another aspect a method is provided for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

[0042] In another aspect is provided a method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.

[0043] In another aspect is provided a method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

[0044] In another aspect is provided a method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.

[0045] In another aspect is provided a method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

[0046] In another aspect is provided a method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.

[0047] In another aspect is provided a method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.

[0048] In another aspect, a method is provided for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.

[0049] In some embodiments of the above aspects, the V region is the V4 region.
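
By way of illustration only, the 90% sequence identity criterion recited in the above aspects can be sketched as follows for two sequences that have already been aligned (for example, two full-length 16S rRNA sequences or two V4 regions). The function names, the treatment of gaps, and the toy sequences are assumptions made for this sketch; the disclosure does not specify an alignment tool or a gap-counting convention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption for this sketch: columns where both sequences carry a gap ('-')
    are ignored, and a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_closely_related(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True when identity meets the threshold (90% mirrors the text above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy sequences: 9 of 10 aligned positions agree.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(is_closely_related("ACGTACGTAC", "ACGTACGTAA"))
```

In practice the alignment itself would come from a dedicated aligner; the sketch only covers the identity calculation applied to its output.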

[0050] In another aspect is provided a method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

[0051] In another aspect a method is provided for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

[0052] In various embodiments of the above aspects, the subject is human.

BRIEF DESCRIPTION OF THE DRAWINGS

[0053] Figures 1A-1D demonstrate a comparison of commonality pre- and post-clustering of viral contigs. PCoA of Spearman distances using pre-clustering (viral contigs) (Figure 1A) and post-clustering viral cluster (VC) count tables (Figure 1B). Figure 1C shows the relative abundance of viral contigs (top) and VCs (bottom) for control subjects at varying thresholds of commonality across subjects. Figure 1D depicts the number of viral contigs/VCs shared between 30%, 50% and 70% of subjects in each cohort.

[0054] Figures 2A-2D show the virome composition comparison of the IBD cohorts to controls. Figure 2A depicts PCoA using Spearman distances. Figure 2B depicts alpha diversity (observed VCs) with p-values from Wilcoxon tests. Figure 2C shows volcano plots of differential abundance results from DESeq2 between controls and CD. Figure 2D shows volcano plots of differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.

[0055] Figures 3A-3D show the bacterial compositional comparison of the IBD cohorts and controls. Figure 3A depicts PCoA using unweighted UniFrac distances. Figure 3B is a plot showing alpha diversity (Chao1 diversity) with p-values from Wilcoxon tests. Figure 3C shows differential abundance results from DESeq2 between controls and CD. Figure 3D shows differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.

[0056] Figures 4A-4B show the drivers of PCoA separation for the virome (Spearman distances; Figure 4A) and 16S unweighted UniFrac (Figure 4B). VC and RSV abundances were correlated, using Spearman correlations, with PC axes 1 and 2. Only significant correlations with a rho greater than 0.35 or less than -0.35 were graphed for the virome, or ± 0.5 for the 16S (or a maximum of the top 6 for each quadrant). Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes.

[0057] Figures 5A-5F demonstrate the investigation of differences in viromes and 16S between subjects in UC flare and UC remission. Beta diversity for viromes (using Spearman distances; Figure 5A) and 16S (unweighted UniFrac; Figure 5B) are shown. VC and RSV abundances were correlated with PC axes 1 and 2. Only significant correlations with a rho beyond ± 0.35 were graphed for the virome, or ± 0.5 (or the top 6 for each quadrant) for the 16S. Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes. Alpha diversity is shown in Figure 5C for VCs (Observed VCs and Shannon), and in Figure 5D for 16S (Chao1 and Shannon diversity). Differential abundance results using DESeq2 between UC flare and remission are shown for VCs (Figure 5E) and 16S (Figure 5F). All points above the dotted line are significant.

[0058] Figures 6A-6D show the classification between healthy controls and patients with IBD using VC and 16S composition. The top 20 importance factors are shown for each model: VCs (Figure 6A), 16S (Figure 6B), and VCs and 16S combined (Figure 6C). The shades of grey of the bars correspond to differential abundance between groups; the text to the right of each bar gives the classifications and/or the bacterial annotation to CRISPR protospacers. Figure 6D shows the ROC curve analysis for each of the 3 models including the % accuracy.

[0059] Figure 7A depicts a VC PCoA using Spearman distances comparing the 3 cohorts CD, UC and controls. Figure 7B shows distances between points in each cohort for the VC Spearman PCoA. Figure 7C shows 16S PCoA using unweighted UniFrac distances comparing the 3 cohorts. Figure 7D is a boxplot showing distances between points in each cohort for the 16S unweighted UniFrac PCoA. P-values for boxplots are from Wilcoxon tests.

[0060] Figures 8A-8F show the alpha diversity of patients with IBD versus healthy controls. Shown are Observed VCs (Figure 8A), Shannon diversity of VCs (Figure 8B), Chao1 diversity of 16S counts (Figure 8C), and Shannon diversity of 16S counts (Figure 8D). P-values for boxplots are from Wilcoxon tests. Figure 8E shows Spearman correlations between observed VC counts and observed bacterial species counts. Figure 8F shows Shannon diversity of VCs and 16S counts.

[0061] Figures 9A-9B show the alpha diversity of observed VLPs for any VCs classified as Caudovirales tested for disease groups and controls (Figure 9A) and disease groups/states and controls (Figure 9B). P-values for boxplots are from Wilcoxon tests.

[0062] Figures 10A-10B show the read alignment for samples in each cohort to VCs classified as lysogenic (Figure 10A) and non-lysogenic (Figure 10B). P-values for boxplots are from Wilcoxon tests.

[0063] Figure 11 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.

[0064] Figure 12 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.

[0065] Figure 13A shows the Spearman correlation between estimated viral load and observed VCs. Figure 13B shows viral load plotted per subject with points colored using various intensities of grey by disease status.

[0066] Figure 14 depicts a network plot of CRISPR protospacers to the 20 most relevant VCs (10 key and additional important VCs from machine learning). Clusters and CRISPR protospacers are colored using various intensities of grey according to differential abundance using DESeq2.

[0067] Figures 15A-15J show images of the 10 key drivers in the separation of IBD and controls. Annotations are using pVOGs.

[0068] Figure 16 is a block diagram illustrating a system or device for identifying virome marker clusters according to aspects of the present invention.

[0069] Figure 17 is a block diagram illustrating a system or device for detecting health or disease in a subject based at least in part on virome marker clusters according to aspects of the present invention.
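
For orientation only, the alpha diversity measures named in the figure descriptions above (observed counts, Shannon diversity, and Chao1) can be sketched as follows. The use of the natural logarithm for Shannon diversity, the bias-corrected form of Chao1, and the example count vector are assumptions of this sketch; the figure descriptions do not state which variants or software were used.

```python
import math
from collections import Counter


def observed(counts):
    """Number of features (e.g., VCs or 16S units) with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero features.

    Assumption: natural logarithm; other bases only rescale the value.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of features seen exactly once and exactly twice.
    """
    freq = Counter(c for c in counts if c > 0)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical count vector for one sample (one entry per feature).
sample_counts = [1, 1, 2, 2, 5, 0]
print(observed(sample_counts), shannon(sample_counts), chao1(sample_counts))
```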

DETAILED DESCRIPTION

[0070] It is an object of the present invention to meet the above-stated needs. Generally, this disclosure provides a framework for analyzing viromes across cohorts and demonstrates the presence of significant IBD signals in the virome, which could have value in the development of biomarkers and therapeutics into the future.

[0071] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

[0072] As used herein, the term “bacteria” encompasses both prokaryotic organisms and archaea present in mammalian microbiota.

[0073] The term “microbiota” is used herein to refer to microorganisms (e.g., bacteria, archaea, fungi, protozoa) and viruses (e.g., phages and eukaryotic viruses) present in a host animal or human (e.g., in the gastrointestinal tract, skin, oral cavity, vagina, etc.). Microbiota exerts a significant influence on health and well-being of the host. Viruses present in microbiota are separately described as “virobiota”. The term “microbiome” refers to the collective genes of all organisms comprising the microbiota.

[0074] The term “virome” is used herein to include viruses, virus-like particles (VLPs), and molecules that closely resemble viruses but may or may not be infectious and may or may not include viral genetic material. The “virome” can include the “virobiota” but is not limited to the “virobiota”.

[0075] By a “microbiota sample” is meant a sample that contains a microbiota from a particular source. A “GI microbiota sample” is from the gastrointestinal tract, and may include a fecal microbiota sample. Microbiota samples may comprise all of the components present in the microbiota.

[0076] The term “gastrointestinal (GI) microbiota” is used to refer to microorganisms (e.g., bacteria, fungi, unicellular parasites) and viruses (e.g., phages and eukaryotic viruses) in the digestive tract.

[0077] As used herein, the term “dysbiosis” refers to a microbial imbalance on or inside the body. Dysbiosis can result from, e.g., antibiotic exposure as well as other causes, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites. Dysbiosis can also result from unknown causes, or causes that are not yet known. The term “consequences of dysbiosis” refers to various disorders associated with dysbiosis. For example, dysbiosis in the GI tract has been reported to be associated with a wide variety of illnesses, such as, e.g., irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, bacterial vaginosis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, infectious gastroenteritis, non-infectious gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, antibiotics-associated diarrhea, etc.

[0078] The term “restoring normal microbiota” is used herein to refer to restoring microbiota of a subject to the level of bioactivity and diversity of corresponding microbiota of a healthy subject. This may also be considered as normalizing the microbiota, populating the microbiota, populating normal microbiota, preventing the onset of dysbiosis, or augmenting the growth of at least one type of virus in a subject.

[0079] Specific changes in microbiota discussed herein can be detected using various methods, including without limitation quantitative PCR (qPCR) or high-throughput sequencing methods which detect over- and under-represented genes in the total bacterial population (e.g., 454-sequencing for community analysis; screening of microbial 16S ribosomal RNAs (16S rRNA), etc.), or transcriptomic or proteomic studies that identify lost or gained microbial transcripts or proteins within total bacterial populations. See, e.g., U.S. Patent Publication No. 2010/0074872; Eckburg et al., Science, 2005, 308:1635-8; Costello et al., Science, 2009, 326:1694-7; Grice et al., Science, 2009, 324:1190-2; Li et al., Nature, 2010, 464:59-65; Bjursell et al., Journal of Biological Chemistry, 2006, 281:36269-36279; Mahowald et al., PNAS, 2009, 14:5859-5864; Wikoff et al., PNAS, 2009, 10:3698-3703.

[0080] Various exemplary ways of amplifying and sequencing nucleic acids from microbiota samples include, but are not limited to: solid-phase PCR involving bridge amplification of DNA fragments of the biological samples on a substrate with oligo adapters, wherein amplification involves primers having a forward index sequence (e.g., Illumina forward index for MiSeq/NextSeq/HiSeq platforms) or a reverse index sequence (e.g., Illumina reverse index for MiSeq/NextSeq/HiSeq platforms), a forward barcode sequence or a reverse barcode sequence, a transposase sequence (e.g., corresponding to a transposase binding site for MiSeq/NextSeq/HiSeq platforms), a linker, an additional random base, and a sequence for targeting a specific target region (e.g., 16S region, 18S region, ITS region). Illumina sequencing (e.g., with a HiSeq platform, with a MiSeq platform, with a NextSeq platform, etc.) may be used as part of a sequencing-by-synthesis technique.

[0081] As used herein, the terms “a microbiota disease” and “disease of a microbiota” refer to a change in the composition of a microbiota, including without limitation very small changes in a relative abundance of one or more organisms within the microbiota as compared to a healthy control. Microbiota diseases can result from, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites, antibiotic exposure as well as other causes. Exemplary microbiota diseases in the GI tract include, but are not limited to, irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, diarrhea, etc.

[0082] As used herein, the term “probiotic” refers to a substantially pure bacteria (i.e., a single isolate, of, e.g., live bacterial cells, conditionally lethal bacterial cells, inactivated bacterial cells, killed bacterial cells, spores, recombinant carrier strains), or a mixture of desired bacteria, bacteria components or bacterial extract, or bacterially-derived products (natural or synthetic bacterially-derived products such as, e.g., bacterial antigens or metabolic products) and may also include any additional components that can be administered to a mammal. Such compositions are also referred to herein as a “bacterial inoculant.”

[0083] As used herein, the term “prebiotic” refers to an agent that increases the number and/or activity of one or more desired bacteria, enhancing their growth. Non-limiting examples of prebiotics useful in the methods of the present disclosure include fructooligosaccharides (e.g., oligofructose, inulin, inulin-type fructans), galactooligosaccharides, human milk oligosaccharides (HMO), Lacto-N-neotetraose, D-Tagatose, xylo-oligosaccharides (XOS), arabinoxylan-oligosaccharides (AXOS), N-acetylglucosamine, N-acetylgalactosamine, glucose, other five- and six-carbon sugars (such as arabinose, maltose, lactose, sucrose, cellobiose, etc.), amino acids, alcohols, resistant starch (RS), and mixtures thereof. See, e.g., Ramirez-Farias et al., Br J Nutr (2008) 4:1-10; Pool-Zobel and Sauer, J Nutr (2007), 137:2580S-2584S. The prebiotic may be effective to fully, or partially, restore normal microbiota.

[0084] As used herein, the term “viral cluster” or “VC” refers to a set of contigs that fit certain criteria described herein, which are in turn grouped together based on protein homology profiles.
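
As a simplified, non-authoritative illustration of grouping contigs by protein homology profiles, the sketch below represents each contig by the set of protein clusters found on it and joins contigs whose profiles overlap sufficiently. The Jaccard similarity measure, the 0.5 threshold, the connected-component grouping, and all identifiers (`contig_1`, `pc1`, etc.) are assumptions chosen for clarity; they are a stand-in and are not asserted to be the clustering procedure of the disclosure.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared protein clusters divided by all protein clusters on either contig."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_contigs(profiles: dict, min_similarity: float = 0.5) -> list:
    """Group contigs into clusters as connected components (union-find).

    profiles maps a contig id to the set of protein-cluster ids it encodes.
    """
    parent = {contig: contig for contig in profiles}

    def find(contig):
        # Follow parent links to the representative, shortening the path.
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for x, y in combinations(profiles, 2):
        if jaccard(profiles[x], profiles[y]) >= min_similarity:
            parent[find(x)] = find(y)  # merge the two components

    clusters = {}
    for contig in profiles:
        clusters.setdefault(find(contig), set()).add(contig)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical profiles: contig_1 and contig_2 share two of four protein clusters.
profiles = {
    "contig_1": {"pc1", "pc2", "pc3"},
    "contig_2": {"pc1", "pc2", "pc4"},
    "contig_3": {"pc9"},
}
print(group_contigs(profiles))
```

Connected components are used here because they are easy to follow; a production pipeline would typically rely on a dedicated network clustering tool.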

[0085] As used herein, the term “stimulate” when used in connection with growth and/or activity of bacteria encompasses the term “enhance”.

[0086] The terms “treat” or “treatment” of a state, disorder or condition include: (1) preventing, delaying, or reducing the incidence and/or likelihood of the appearance of at least one clinical or sub-clinical symptom of the state, disorder or condition developing in a subject that may be afflicted with or predisposed to the state, disorder or condition but does not yet experience or display clinical or subclinical symptoms of the state, disorder or condition; or (2) inhibiting the state, disorder or condition, i.e., arresting, reducing or delaying the development of the disease or a relapse thereof (in case of maintenance treatment) or at least one clinical or sub-clinical symptom thereof; or (3) relieving the disease, i.e., causing regression of the state, disorder or condition or at least one of its clinical or sub-clinical symptoms. The benefit to a subject to be treated is either statistically significant or at least perceptible to the patient or to the physician.

[0087] The terms “patient”, “individual”, “subject”, “mammal”, and “animal” are used interchangeably herein and refer to mammals, including, without limitation, human and veterinary animals (e.g., cats, dogs, cows, horses, sheep, pigs, etc.) and experimental animal models. In a preferred embodiment, the subject is a human.

[0088] As used herein, the term “therapeutically effective amount” refers to the amount of a compound, composition, particle, organism (e.g., a probiotic or a microbiota transplant), etc. that, when administered to a subject for treating (e.g., preventing or ameliorating) a state, disorder or condition, is sufficient to effect such treatment. The “therapeutically effective amount” will vary depending, e.g., on the agent being administered as well as the disease severity, age, weight, and physical conditions and responsiveness of the subject to be treated. The terms “therapeutically effective amount” and “effective amount” are used interchangeably.

[0089] As used herein, the term “acceptable” with reference to excipients, diluents, and carriers refers to molecular entities and compositions that are generally regarded as physiologically tolerable.

[0090] The term “carrier” refers to a diluent, adjuvant, excipient, or vehicle with which the compound is administered. Such pharmaceutical carriers can be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Water, aqueous saline solutions, and aqueous dextrose and glycerol solutions are preferably employed as carriers, particularly for injectable solutions. Alternatively, the carrier can be a solid dosage form carrier, including but not limited to one or more of a binder (for compressed pills), a glidant, an encapsulating agent, a flavorant, and a colorant. Suitable pharmaceutical carriers are described in “Remington’s Pharmaceutical Sciences” by E.W. Martin.

[0091] The term “about” or “approximately” means within a statistically meaningful range of a value. Such a range can be within an order of magnitude, preferably within 50%, more preferably within 20%, still more preferably within 10%, and even more preferably within 5% of a given value or range. The allowable variation encompassed by the term “about” or “approximately” depends on the particular system under study, and can be readily appreciated by one of ordinary skill in the art.

[0092] The terms “a,” “an,” and “the” do not denote a limitation of quantity, but rather denote the presence of “at least one” of the referenced item.

[0093] The practice of the present invention employs, unless otherwise indicated, conventional techniques of statistical analysis, molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such tools and techniques are described in detail in e.g., Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual. 3rd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, New York; Ausubel et al. eds. (2005) Current Protocols in Molecular Biology. John Wiley and Sons, Inc.: Hoboken, NJ; Bonifacino et al. eds. (2005) Current Protocols in Cell Biology. John Wiley and Sons, Inc.: Hoboken, NJ; Coligan et al. eds. (2005) Current Protocols in Immunology, John Wiley and Sons, Inc.: Hoboken, NJ; Coico et al. eds. (2005) Current Protocols in Microbiology, John Wiley and Sons, Inc.: Hoboken, NJ; Coligan et al. eds. (2005) Current Protocols in Protein Science, John Wiley and Sons, Inc.: Hoboken, NJ; and Enna et al. eds. (2005) Current Protocols in Pharmacology, John Wiley and Sons, Inc.: Hoboken, NJ. Additional techniques are explained, e.g., in U.S. Patent No. 7,912,698 and U.S. Patent Appl. Pub. Nos. 2011/0202322 and 2011/0307437.

[0094] The term “computing system” is intended to include stand-alone machines or devices and/or a combination of machines, components, modules, systems, servers, processors, memory, detectors, user interfaces, computing device interfaces, network interfaces, hardware elements, software elements, firmware elements, and other computer-related units. By way of example, but not limitation, a computing system can include one or more of a general-purpose computer, a special-purpose computer, a processor, a portable electronic device, a portable electronic medical instrument, a stationary or semi-stationary electronic medical instrument, or other electronic data processing apparatus.

[0095] The term “database” as referred to herein is intended to include a collection of indexed data stored on a computer readable medium. By way of example and not limitation, data in the database can include numerical values, textual values, computational representation of physical objects (including living, non-living, organic, non-organic objects, and combinations thereof), computational representation of physical phenomena, and categorical classification. Various data can be linked together or otherwise indexed. By way of example and not limitation, data in the database can be represented as an indexed matrix.
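
One possible reading of "an indexed matrix" is a count table whose rows and columns are addressed by labels rather than positions. The sketch below is an assumption-laden illustration of that idea; the class name `IndexedMatrix`, its methods, and the example labels are hypothetical and are not taken from the disclosure.

```python
class IndexedMatrix:
    """Dense matrix addressed by row and column labels (illustrative only)."""

    def __init__(self, row_labels, col_labels):
        # Map each label to its position so lookups do not scan the labels.
        self.row_index = {label: i for i, label in enumerate(row_labels)}
        self.col_index = {label: j for j, label in enumerate(col_labels)}
        self.values = [[0] * len(col_labels) for _ in row_labels]

    def set(self, row, col, value):
        self.values[self.row_index[row]][self.col_index[col]] = value

    def get(self, row, col):
        return self.values[self.row_index[row]][self.col_index[col]]

    def row(self, row):
        """Copy of one row, in column-label order."""
        return list(self.values[self.row_index[row]])


# Hypothetical table: viral clusters as rows, samples as columns.
table = IndexedMatrix(["vc_a", "vc_b"], ["sample_1", "sample_2"])
table.set("vc_a", "sample_2", 17)
print(table.get("vc_a", "sample_2"), table.row("vc_a"))
```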

[0096] The term “dataset” as referred to herein is intended to include information that can be provided to a computing system in a computer readable format.

[0097] The terms “component,” “module,” “system,” “server,” “processor,” “memory,” and the like are intended to include one or more computer-related units, such as but not limited to hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, such as data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal.

[0098] Some implementations of the disclosed technology will be described more fully with reference to the accompanying drawings. This disclosed technology may, however, be embodied in many different forms and should not be construed as limited to the implementations set forth herein. The components described hereinafter as making up various elements of the disclosed technology are intended to be illustrative and not restrictive.

[0099] It is also to be understood that the mention of one or more method steps does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Similarly, it is also to be understood that the mention of one or more components in a device or system does not preclude the presence of additional components or intervening components between those components expressly identified.

[00100] Certain embodiments and implementations of the disclosed technology are described above with reference to block and flow diagrams of systems and methods and/or computer program products according to example embodiments or implementations of the disclosed technology. It will be understood that one or more blocks of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, respectively, can be implemented by computer-executable program instructions. Likewise, some blocks of the block diagrams and flow diagrams may not necessarily need to be performed in the order presented, may be repeated, or may not necessarily need to be performed at all, according to some embodiments or implementations of the disclosed technology.

[00101] These computer-executable program instructions may be loaded onto a computing system such as a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks.

[00102] As an example, embodiments or implementations of the disclosed technology may provide for a computer program product, including a computer-usable medium having a computer-readable program code or program instructions embodied therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks. Likewise, the computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.

[00103] Accordingly, blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, can be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.

[00104] In some embodiments presented herein, IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Individuals in each cohort can be human. The viral genome sequences can be unidentified in that they are not taxonomically classified in a viral genome database. The viral genome sequences can be unidentified in that they are considered “viral dark matter” as described herein and would otherwise be understood by a person of ordinary skill in the art. The viral genome sequences can be unidentified at the order level, at the family level, strain level, or any intervening level. The viral genome sequences can be unidentified in that they are classified taxonomically, at some level, in a viral genome database, however, the viral genome sequences have not been compared to the classification database. Viral genome sequences can include sequenced VLPs, molecules that closely resemble viruses, but are non-infectious because they contain no viral genetic material. Viral genome sequences can be derived from gastrointestinal (GI) microbiota samples provided from individuals in each cohort.

Metagenomic assembly can be performed on the samples using short reads to resolve viral genomes. The reads can subsequently be aligned to determine abundance, or count of members in each viral genome. The resolved viral genomes can include unidentified viral genome sequences. To the extent that resolved viral genomes can be identified, the IBD markers can also be identified using the identified viral genomes.

[00105] In some embodiments of the various methods presented herein, IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters. Each viral cluster can include one or more unidentified viral genome sequences. The viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts. IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database. In some embodiments, viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).

[00106] The viral genome sequences can be represented as datasets that are readable by a computational device or system. For instance, the viral genome sequences can be represented as viral contigs. Each viral genome sequence can be represented in whole or in part. Each viral genome sequence can be represented with resolution at the strain level.

[00107] Each dataset can be associated with a cohort and/or sub-cohort. The datasets collectively can include a significant number of viral genome sequence reads within the GI microbiota samples provided from the individuals. In some embodiments, the dataset is prepared by sequencing VLP DNA isolated from GI microbiota sample(s). The VLP DNA may be isolated from GI microbiota samples and prepared by any of the various methods of preparing DNA known in the art, such as those described in Thurber R.V. et al., 2009, Laboratory procedures to generate viral metagenomes. Nat Protoc 4:470-483; Reyes, A., et al., 2010, Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466:334-338; and Minot S. et al., 2011, The human gut virome: inter-individual variation and dynamic response to diet. Genome Res 21:1616-1625, each of which is incorporated by reference herein in its entirety. The datasets collectively can include a number of viral genome sequence reads within the GI microbiota samples. The reads per sample can include the ranges of 15% to 97%, 25% to 97%, 50% to 97%, 60% to 97%, 70% to 97%, 80% to 97%, and 90% to 97%.

[00108] The viral genome sequences can be represented as protein sequences. Alternatively, the viral genome sequences can be represented as a sequence from which protein sequences or protein content can be derived (e.g. genetic sequence).

[00109] Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters. To the extent that the whole virome includes identified viral genome sequences, the identified viral genome sequences can be included in the protein clustering and protein homology analysis. Proteins can be derived from each dataset based on the viral genome sequences. The proteins can be organized into protein clusters (PCs) using Markov cluster (MCL)-based protein families, transitive clustering (TransClust), spectral clustering of protein sequences (SCPS), High-Fidelity clustering of protein sequences (HiFix) or other appropriate technique. Additional clustering techniques are described in Bernardes et al., BMC Bioinformatics (2015) 16:34 “Evaluation and improvements of clustering algorithms for detecting remote homologous protein families”, incorporated by reference herein.

[00110] To determine protein homology, viral genome sequences, or protein sequences derived therefrom, can be evaluated pairwise such that each pair is given a similarity score based on the shared protein content between the sequences within the pair. Viral clusters can be determined based on the similarity scores.
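
The pairwise scoring and grouping described in this paragraph can be illustrated with a minimal sketch. It is not the scoring or clustering method of the disclosure: it assumes that each genome has already been reduced to the set of protein clusters (PCs) it encodes, substitutes a simple Jaccard index for the similarity score, and substitutes single-linkage grouping above an arbitrary threshold for the cluster-determination step. The contig names, PC names, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared protein content of two genomes, expressed as a Jaccard index."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_scores(genome_pcs):
    """Give every genome pair a similarity score from its shared PCs."""
    return {
        (g1, g2): jaccard(genome_pcs[g1], genome_pcs[g2])
        for g1, g2 in combinations(sorted(genome_pcs), 2)
    }


def viral_clusters(genome_pcs, threshold=0.5):
    """Group genomes whose pairwise score meets the threshold (single linkage)."""
    parent = {g: g for g in genome_pcs}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g1, g2), score in pairwise_scores(genome_pcs).items():
        if score >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_pcs:
        groups.setdefault(find(g), set()).add(g)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Hypothetical contigs mapped to the protein clusters they encode.
genome_pcs = {
    "contig_A": {"PC1", "PC2", "PC3"},
    "contig_B": {"PC2", "PC3", "PC4"},
    "contig_C": {"PC9"},
}
clusters = viral_clusters(genome_pcs, threshold=0.5)
print(clusters)
```

Here contig_A and contig_B share two of four distinct PCs and fall into one cluster, while contig_C remains a singleton. A real analysis would likely use a statistically grounded score and one of the clustering techniques named in the specification.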

[00111] A viral cluster can include one or more unidentified viral genome sequences. A viral cluster can be completely populated by unidentified viral genome sequences. A viral cluster can be unassociated with a known taxon. A viral cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.

[00112] The viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts. Or, said another way, a collection of viral clusters associated with the healthy cohort can be created such that each viral cluster in the collection includes at least one viral genome derived from the healthy cohort, and another collection of viral clusters associated with the cohort diagnosed with IBD can be created such that this collection of viral clusters includes at least one viral genome derived from the cohort diagnosed with IBD.
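
As an illustration of this cohort association, the following sketch builds one collection of viral clusters per cohort, where a cluster joins a cohort's collection if at least one of its genomes was derived from that cohort. The cluster identifiers, genome identifiers, and cohort labels are hypothetical and chosen only for the example.

```python
def clusters_by_cohort(cluster_members, genome_cohort):
    """Map each cohort to the viral clusters holding at least one of its genomes."""
    collections = {}
    for cluster, members in cluster_members.items():
        for genome in members:
            collections.setdefault(genome_cohort[genome], set()).add(cluster)
    return collections


# Hypothetical viral clusters and the cohort each genome was derived from.
cluster_members = {
    "vcX": ["g1", "g2"],
    "vcY": ["g3"],
    "vcZ": ["g4", "g5"],
}
genome_cohort = {
    "g1": "healthy", "g2": "IBD", "g3": "healthy", "g4": "IBD", "g5": "IBD",
}

collections = clusters_by_cohort(cluster_members, genome_cohort)
healthy_only = collections["healthy"] - collections["IBD"]
ibd_only = collections["IBD"] - collections["healthy"]
shared = collections["healthy"] & collections["IBD"]
print(sorted(healthy_only), sorted(ibd_only), sorted(shared))
```

Set differences between the two collections give the clusters seen in only one cohort, and the intersection gives the clusters associated with both.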

[00113] IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database.

[00114] Viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).

[00115] IBD markers can be defined as a viral cluster that is prevalent in at least one cohort and/or sub-cohort and minimal or absent in at least one other cohort or sub-cohort. In other words, the IBD markers can include viral clusters that are found predominantly in the healthy cohort and not in the IBD cohort and viral clusters that are found predominantly in the IBD cohort and not in the healthy cohort.
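
One way to make "prevalent in one cohort and minimal or absent in another" concrete is a prevalence filter, sketched below. The specification does not fix numeric cut-offs, so the thresholds `high` and `low`, the sample names, and the counts are all assumptions made for illustration.

```python
def prevalence(counts_by_sample, samples):
    """Fraction of the given samples in which the cluster has a non-zero count."""
    present = sum(1 for s in samples if counts_by_sample.get(s, 0) > 0)
    return present / len(samples)


def find_markers(count_table, cohorts, high=0.6, low=0.2):
    """Return clusters prevalent (>= high) in one cohort and minimal (<= low) in another."""
    markers = {}
    for cluster, counts in count_table.items():
        prev = {name: prevalence(counts, samples) for name, samples in cohorts.items()}
        enriched = [name for name, p in prev.items() if p >= high]
        depleted = [name for name, p in prev.items() if p <= low]
        if enriched and depleted:
            markers[cluster] = {"prevalent_in": enriched, "minimal_in": depleted}
    return markers


# Hypothetical cohorts and per-sample counts for three viral clusters.
cohorts = {
    "healthy": ["h1", "h2", "h3", "h4", "h5"],
    "IBD": ["i1", "i2", "i3", "i4", "i5"],
}
count_table = {
    "vcX": {"h1": 12, "h2": 3, "h3": 7, "h4": 9},            # healthy only
    "vcY": {"h1": 1, "i1": 4, "i2": 2, "i3": 8, "i4": 5},    # mostly IBD
    "vcZ": {"h1": 2, "h2": 2, "h3": 1, "i1": 3, "i2": 1, "i3": 2},  # both cohorts
}
markers = find_markers(count_table, cohorts)
print(markers)
```

With these made-up counts, vcX is flagged as prevalent in the healthy cohort, vcY as prevalent in the IBD cohort, and vcZ is not a marker because it is common in both. The same function accepts CD and UC sub-cohorts as additional entries in `cohorts`.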

[00116] When the IBD cohort is further sub-divided into CD and UC, the IBD markers can include viral clusters that are found predominantly in the CD cohort and not the UC cohort and vice-versa, regardless of whether the same viral clusters are predominant in both the healthy and IBD cohorts. IBD marker clusters can be identified by comparing the viral clusters associated with the CD sub-cohort to the UC sub-cohort. Within the total collection of IBD marker clusters, the IBD marker clusters can include a first subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with CD compared to UC and a second subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with UC compared to CD.

[00117] The IBD markers can include viral clusters that contain no identified viral sequences. An IBD marker can be unassociated with a known taxon. An IBD marker cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.

[00118] To the extent that the whole virome includes identified viral genome sequences, the identified viral genome sequences can be included in the protein clustering and protein homology analysis. A viral cluster including an identified viral genome sequence can represent an unidentified taxon of higher rank than a strain and of lower rank than a family. A viral cluster including an identified viral genome sequence can be associated with one or more cohorts and/or sub-cohorts. Identified viral genome sequences can be clustered by protein clustering and protein homology to create reference viral clusters. Reference viral clusters can be associated with one or more cohorts and/or sub-cohorts. Identification of IBD marker clusters can include comparing reference viral clusters associated with the healthy cohort to reference viral clusters associated with the cohort diagnosed with IBD. Similarly, identification of IBD marker clusters can include comparing reference viral clusters associated with CD with reference viral clusters associated with UC.

[00119] To the extent that the whole virome includes identified viral genome sequences, the IBD markers can include viral clusters that contain at least one identified viral sequence. An IBD marker cluster containing an identified viral sequence can include an unidentified grouping of viral sequences. An IBD marker cluster can be an unidentified grouping of viral sequences, optionally comprising an identified viral sequence. An IBD marker cluster containing an identified viral sequence can represent an identified taxon.

[00120] Identification of the IBD markers as described above can be performed on a computing system having one or more processors and a memory with instructions thereon that can be performed by the processor(s). The computing system can receive datasets associated with each cohort and/or sub-cohort that each respectively include unidentified viral genome sequences. The viral genome sequences can be represented as a viral contig or other suitable computer-readable format. The computing system can create viral clusters for each dataset associated with each cohort and/or sub-cohort. Clustering can use a protein clustering algorithm to group like proteins and a protein homology algorithm to group viral genome sequences, including unidentified viral genome sequences, into viral clusters. Viral clusters can be compared across cohorts and/or sub-cohorts to identify marker clusters. Marker clusters can represent clusters that are highly represented in at least one cohort and/or sub-cohort and are also marginally represented in at least one other cohort and/or sub-cohort.

[00121] Identification of the marker clusters can be performed using machine learning. The datasets can include an association for each viral cluster to a known variable, the known variable being the health state of the patient (healthy, IBD diagnosis, and optionally CD diagnosis and/or UC diagnosis). For each health state, the system can determine a correlated set of viral clusters from the total set of viral clusters. Viral clusters having a strong correlation to the presence or absence of a given health state can be identified as marker clusters.
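
The specification does not name a particular learning algorithm, so the sketch below stands in for it with the simplest form of the idea: a Pearson (point-biserial) correlation between each cluster's abundance and a 0/1 indicator for one health state, keeping clusters whose absolute correlation exceeds an assumed cut-off. The abundance values, labels, and the 0.7 cut-off are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlated_clusters(abundance, labels, state, min_abs_r=0.7):
    """Clusters whose abundance correlates strongly with presence/absence of `state`."""
    indicator = [1.0 if label == state else 0.0 for label in labels]
    selected = {}
    for cluster, values in abundance.items():
        r = pearson(values, indicator)
        if abs(r) >= min_abs_r:
            selected[cluster] = r
    return selected


# Hypothetical per-subject abundances; one label per subject, in the same order.
labels = ["healthy", "healthy", "healthy", "IBD", "IBD", "IBD"]
abundance = {
    "vcX": [0, 1, 0, 9, 8, 10],   # rises with IBD
    "vcY": [7, 8, 9, 0, 1, 0],    # falls with IBD
    "vcZ": [5, 1, 4, 4, 1, 5],    # unrelated to health state
}
selected = correlated_clusters(abundance, labels, state="IBD")
print(selected)
```

A positive coefficient marks a cluster associated with the presence of the state and a negative one with its absence. A supervised classifier with feature-importance ranking would be a natural replacement for the correlation step.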

[00122] Identification of the marker clusters can be performed using a beta diversity analysis on the viral clusters. A count table can be created by summing the counts of the viral genomic sequences (potentially represented as viral contigs) in each viral cluster. The count table can be subjected to an ordination method to determine beta diversity. The beta diversity analysis can be performed through principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), redundancy analysis (RDA), and/or other suitable technique.

[00123] Identification of the marker clusters can be performed using a calculation of differential abundance of viral clusters across cohorts and/or sub-cohorts. The calculation can be executed using a test or software package such as available through DESeq2, t-test, Wilcoxon rank-sum test, edgeR package, metagenomeSeq package, ANCOM package, and/or other suitable technique, algorithm, or software package.
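
The count table and two of the analyses named above can be sketched with the Python standard library alone. The example sums contig counts into viral-cluster counts, computes a Bray-Curtis dissimilarity between two samples (a common input to PCoA, chosen here as an assumption since the specification does not name a distance), and runs a Wilcoxon rank-sum test using a normal approximation without tie or continuity correction. All sample names, contig names, and counts are hypothetical.

```python
from math import erfc, sqrt


def build_count_table(contig_counts, contig_to_cluster):
    """Sum per-sample contig counts into per-sample viral-cluster counts."""
    table = {}
    for sample, counts in contig_counts.items():
        row = table.setdefault(sample, {})
        for contig, n in counts.items():
            cluster = contig_to_cluster[contig]
            row[cluster] = row.get(cluster, 0) + n
    return table


def bray_curtis(row_a, row_b):
    """Bray-Curtis dissimilarity between two samples' cluster counts."""
    keys = set(row_a) | set(row_b)
    num = sum(abs(row_a.get(k, 0) - row_b.get(k, 0)) for k in keys)
    den = sum(row_a.get(k, 0) + row_b.get(k, 0) for k in keys)
    return num / den if den else 0.0


def rank_sum_test(xs, ys):
    """Wilcoxon rank-sum (Mann-Whitney U) statistic and two-sided p-value.

    Tied values receive average ranks. The p-value uses a plain normal
    approximation, so it is only indicative for small samples.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical aligned-read counts per contig, and contig-to-cluster assignments.
contig_to_cluster = {"c1": "vcX", "c2": "vcX", "c3": "vcY"}
contig_counts = {
    "h1": {"c1": 5, "c2": 3, "c3": 1}, "h2": {"c1": 4, "c2": 5, "c3": 2},
    "h3": {"c1": 7, "c3": 0},          "h4": {"c1": 6, "c2": 4},
    "i1": {"c1": 1, "c3": 4},          "i2": {"c2": 2, "c3": 6},
    "i3": {"c3": 5},                   "i4": {"c1": 2, "c2": 1, "c3": 3},
}
table = build_count_table(contig_counts, contig_to_cluster)
healthy, ibd = ["h1", "h2", "h3", "h4"], ["i1", "i2", "i3", "i4"]
x = [table[s].get("vcX", 0) for s in healthy]
y = [table[s].get("vcX", 0) for s in ibd]
u, p = rank_sum_test(x, y)
dissimilarity = bray_curtis(table["h1"], table["i3"])
print(u, p, dissimilarity)
```

In practice the packages listed in the specification handle normalization, dispersion, and multiple-testing correction, none of which this sketch attempts.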

[00124] Figure 16 is a block diagram illustrating an example system 100 for identifying IBD marker clusters. The system 100 can include a non-transient memory 120 with executable instructions thereon to perform methods for identifying IBD marker clusters as described herein, a processor 130 in communication with the memory 120 capable of receiving and executing the instructions from the memory 120 to identify IBD marker clusters, and an output interface 140 capable of outputting a representation of the IBD marker clusters identified by the processor 130. The system can be in communication with a data store 110 on which cohort datasets are stored. The processor 130 can be configured to receive the datasets from the data store 110, receive instructions from the memory 120, compute IBD marker clusters by performing operations on the datasets according to the executable instructions, and provide a representation of the IBD marker clusters to the output interface 140. The representation of the IBD marker clusters can be a computer-readable representation and/or a human user interface. Preferably, output interface 140 can provide a means for conveying a computer-readable representation of the IBD marker clusters to a digital storage medium such that the IBD marker clusters can be accessed by an IBD diagnosis device such as an example IBD diagnosis device as described herein. The system 100 can be contained within a singular device, potentially even a singular semiconductor chip (e.g. system on a chip), or can be distributed across multiple devices at multiple geographical locations as would be understood by a person of ordinary skill in the art. For instance, the data store 110 can be provided by a data server at a location remote to the processor 130 via a network (e.g. internet), and the processor 130 can be located on a computing device remote from the memory 120 and the executable instructions can be transmitted through a network (e.g. internet) to the processor.

[00125] In various embodiments, a subject can be diagnosed with IBD by analyzing unidentified viral genome sequences derived from the subject. Viral genome sequences can be obtained from the subject through a fecal sample or other means. The viral genome sequences can be derived from a GI microbiota sample obtained from the subject. The viral genome sequences can include unidentified viral genome sequences. The viral genome sequences can be represented as a subject dataset. The subject dataset can be in a computer-readable format. The analysis can include clustering the viral genome sequences from the subject, including the unidentified viral genome sequences obtained from the subject. Clustering of the subject’s viral genome sequences can be carried out in a manner similar to that described above. The collection of viral clusters created based on the subject’s viral genome sequences can be compared to IBD markers. The IBD markers can be identified through analysis of a healthy cohort and a cohort diagnosed with IBD in a manner similar to that described above. The subject can be diagnosed with IBD based on analysis and comparison of viral genome sequences alone. Alternatively, bacteria derived from the subject can be analyzed for the purpose of IBD diagnosis of the subject and analysis of the viral genome sequences can be performed in conjunction such that the combination of bacterial and viral analysis can be used to diagnose the subject with IBD.
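
The comparison of a subject's viral clusters to IBD markers can be sketched as follows. The specification does not prescribe a decision rule, so the function only reports which markers move in the direction associated with IBD relative to a healthy control; it is an illustration, not a diagnostic procedure. The marker identifiers are taken from the lists in this document, while the abundance values are hypothetical.

```python
def concordant_markers(subject, control, increased_in_ibd, decreased_in_ibd):
    """List the markers whose subject-versus-control shift matches the IBD direction."""
    up = sorted(
        c for c in increased_in_ibd
        if subject.get(c, 0.0) > control.get(c, 0.0)
    )
    down = sorted(
        c for c in decreased_in_ibd
        if subject.get(c, 0.0) < control.get(c, 0.0)
    )
    return {"increased": up, "decreased": down}


# Hypothetical relative abundances for four marker clusters.
subject = {"vc2": 0.08, "vc13": 0.01, "vc6": 0.00, "vc7": 0.05}
control = {"vc2": 0.02, "vc13": 0.03, "vc6": 0.04, "vc7": 0.05}

result = concordant_markers(
    subject,
    control,
    increased_in_ibd={"vc2", "vc13"},
    decreased_in_ibd={"vc6", "vc7"},
)
print(result)
```

Any threshold on the number or magnitude of concordant markers, and any combination with bacterial analysis, would have to be established from cohort data.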

[00126] The marker clusters may comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae. The marker clusters may comprise viral clusters from Siphoviridae. The marker clusters may comprise viral clusters from Myoviridae. The marker clusters may comprise viral clusters from Podoviridae. The marker clusters may comprise CrAss-like viral clusters. The marker clusters may comprise viral clusters from Microviridae.

[00127] The marker clusters may comprise one or more of the following exemplary viral clusters: vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467. Alternatively, vc5, vc9, and vc10 may be used as marker clusters.

[00128] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of IBD in the subject: vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
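
Because the ranges above mix percentages and fold values, a short sketch of how the two relate may help. It assumes that abundance is expressed on the same scale for the subject and the control (for example, normalized read counts) and that the control value is positive; the numbers are hypothetical.

```python
def fold_change(subject_abundance, control_abundance):
    """Ratio of subject abundance to control abundance."""
    if control_abundance <= 0:
        raise ValueError("control abundance must be positive")
    return subject_abundance / control_abundance


def percent_increase(subject_abundance, control_abundance):
    """Increase over the control, expressed as a percentage of the control."""
    return (fold_change(subject_abundance, control_abundance) - 1.0) * 100.0


# Hypothetical normalized counts for one marker cluster.
print(fold_change(300, 200), percent_increase(300, 200))    # 1.5-fold, 50% increase
print(fold_change(400, 200), percent_increase(400, 200))    # two-fold, 100% increase
print(fold_change(2000, 200), percent_increase(2000, 200))  # 10-fold, 900% increase
```

Under this convention a 100% increase and a two-fold abundance describe the same change, which is one way to read the overlap between the percentage ranges and the fold values listed above.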

[00129] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of Crohn’s Disease (CD) in the subject: vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420. In some embodiments, an increased abundance of viral cluster vc28 in the subject sample as compared to a healthy control is indicative of the presence of CD in the subject. In these embodiments, the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00130] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of ulcerative colitis (UC) in the subject: vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413. In some embodiments, an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject. In these embodiments, the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00131] In various embodiments, a decreased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of IBD in the subject: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. In some embodiments, a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00132] In various embodiments, a decreased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of Crohn’s Disease (CD) in the subject: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00133] In various embodiments, an increased abundance of vc98 and/or vc103 in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of ulcerative colitis (UC) in the subject. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00134] The dataset may be prepared by sequencing VLP DNA isolated from GI microbiota sample(s). A fourth dataset may be obtained that represents bacterial sequences derived from the GI microbiota sample obtained from the subject, with the fourth dataset being analyzed for the presence of bacterial taxa associated with IBD. The fourth dataset may be obtained by sequencing 16S rDNA or a V region (e.g., V4 region) of 16S rDNA in the GI microbiota sample. The presence of IBD in the subject may be determined based at least in part on the comparison of the fourth dataset to at least one of a healthy control and a control diagnosed with IBD.

[00135] In various embodiments, the GI microbiota sample is a fecal sample. In various embodiments, the GI microbiota sample is a cecal sample. In various embodiments, the GI microbiota sample is an ileal sample. In various embodiments, the GI microbiota sample is a colonic microbiota sample. In various embodiments, microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.

[00136] In various embodiments, the subject is human.

[00137] The methods may further comprise administering an IBD treatment to the subject. IBD treatments include conventional treatments such as mesalamine, steroids, immunomodulators, and dietary modification. IBD treatments may also comprise administration of compositions comprising viruses and bacteria, as described below.

[00138] Also provided is a method for preventing and/or treating IBD in a subject. An effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
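
The "at least 90% sequence identity" criterion for a closely related OTU can be illustrated with a minimal sketch. It assumes that the two 16S rRNA sequences (or V regions) have already been aligned to equal length by an external aligner, counts a gap opposite a base as a mismatch, and ignores columns that are gaps in both sequences; other identity conventions exist. The toy sequences are hypothetical and far shorter than a real 16S gene.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the aligned columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")  # skip columns that are gaps in both
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_closely_related(aligned_a, aligned_b, threshold=90.0):
    """Apply the at-least-90% identity criterion to a pre-aligned pair."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 20-column alignments against a reference fragment.
reference = "ACGTACGTACGTACGTACGT"
one_mismatch = "ACGTACGTACGTACGTACGA"
three_mismatches = "ACGAACGAACGAACGTACGT"

print(percent_identity(reference, one_mismatch))      # 19 of 20 columns match
print(is_closely_related(reference, one_mismatch))
print(is_closely_related(reference, three_mismatches))
```

The same function can be applied to the full-length alignment or to any single V region, matching the two alternatives stated in the paragraph above.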

[00139] Also provided is a method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00140] Also provided is a method for preventing and/or treating CD in a subject. An effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.

[00141] Also provided is a method for preventing and/or treating CD in a subject in need thereof. An effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00142] Also provided is a method for preventing and/or treating UC in a subject in need thereof. An effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00143] Also provided is a method for preventing and/or treating UC in a subject. An effective amount of a virus from a viral cluster vc98 and/or vc103 is administered. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof. The composition stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00144] Also provided is a method for preventing and/or treating IBD in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00145] Also provided is a method for preventing and/or treating CD in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.

[00146] Also provided is a method for preventing and/or treating UC in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
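By way of illustration only, the "at least 90% sequence identity" criterion recited in the preceding paragraphs can be sketched in Python as follows. The sketch assumes that the two 16S rRNA sequences (or the two V regions) have already been aligned to equal length by a separate alignment tool, and it adopts one common convention in which a gap opposite a base counts as a mismatch. The function names `percent_identity` and `is_closely_related` are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap ('-') are
    skipped; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def is_closely_related(aligned_a: str, aligned_b: str,
                       threshold: float = 90.0) -> bool:
    """True when the aligned sequences meet the identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these assumptions, two aligned ten-base sequences that differ at a single position have 90% identity and would satisfy the threshold. Other identity conventions (for example, excluding all gapped columns) would give different values for the same alignment.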

[00147] Any of the above methods may further comprise administering to the subject additional diagnostic tests for IBD, CD and/or UC.

[00148] Any of the above methods may further comprise enrolling the subject in a clinical trial.

[00149] Bacterial taxa associated with IBD may comprise one or more of the following bacterial genera: Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia. Bacterial taxa associated with IBD may comprise a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

[00150] An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00151] An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

An increased abundance of the bacterial genus Flavonifractor in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.

An increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control may be indicative of the presence of UC in the subject. An increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with UC in remission may be indicative of the presence of a flare-up of UC in the subject. An increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of UC in remission may be indicative of the presence of UC in remission in the subject. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00152] A decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00153] A decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00154] A decreased abundance, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold, of the bacterial genus Akkermansia in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.
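By way of illustration only, the percentage and fold ranges recited in the preceding paragraphs can be related to a pair of measured abundances as sketched below in Python. The function name `describe_change` is hypothetical. The conventions used are assumptions and are not stated in the disclosure: percent change is taken relative to the control, and the fold value is the ratio of the larger abundance to the smaller.

```python
def describe_change(subject_abundance: float, control_abundance: float) -> dict:
    """Summarise how a subject's abundance differs from a control.

    Assumed conventions (not taken from the disclosure):
      * percent change is computed relative to the control abundance;
      * fold change is the ratio of the larger value to the smaller one,
        so it is always >= 1 and the direction is reported separately.
    """
    if subject_abundance <= 0 or control_abundance <= 0:
        raise ValueError("abundances must be positive")
    percent = 100.0 * (subject_abundance - control_abundance) / control_abundance
    if subject_abundance >= control_abundance:
        return {"direction": "increased",
                "percent": percent,
                "fold": subject_abundance / control_abundance}
    return {"direction": "decreased",
            "percent": -percent,
            "fold": control_abundance / subject_abundance}
```

For example, under these conventions a subject abundance of 2.0 against a control of 8.0 is a 75% decrease, equivalently a four-fold decrease, while 30.0 against 10.0 is a 200% increase, equivalently a three-fold increase.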

[00155] The analysis of the collection of the viral clusters created based on the subject’s viral genome sequences can include identifying common clusters present both in the collection of viral clusters associated with the subject and in the collection of marker clusters. For each common cluster, a relative abundance of members within that cluster found in the subject’s GI microbiota sample can be determined. A correlation value can be associated with each common cluster in the collection of marker clusters. The comparison of the viral clusters derived from the subject to the marker clusters can include comparing the relative abundance of members within each common cluster associated with the subject to the correlation value of each common cluster in the collection of marker clusters.
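By way of illustration only, the analysis described in the preceding paragraph can be sketched in Python as follows. The disclosure does not specify how relative abundance is compared to the correlation value, so the product-and-sum score used here is purely an assumption chosen for simplicity. The function names and the numeric values in the usage note are hypothetical and are not data from the disclosure.

```python
from typing import Dict, List, Tuple


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-cluster counts for one sample into fractions of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {cluster: n / total for cluster, n in counts.items()}


def compare_to_markers(
    subject_counts: Dict[str, float],
    marker_correlations: Dict[str, float],
) -> Tuple[List[str], Dict[str, float], float]:
    """Identify common clusters and compare abundance to correlation values.

    Returns the sorted common cluster names, a per-cluster contribution
    (relative abundance multiplied by the marker correlation value; an
    assumed comparison rule), and the sum of those contributions.
    """
    abundance = relative_abundance(subject_counts)
    common = sorted(set(abundance) & set(marker_correlations))
    contributions = {c: abundance[c] * marker_correlations[c] for c in common}
    return common, contributions, sum(contributions.values())
```

As a usage example with invented values, subject counts of `{"vc23": 30, "vc39": 10, "vc7": 60}` and marker correlation values of `{"vc23": -0.5, "vc39": -0.25, "vc98": 0.5}` share the clusters vc23 and vc39; cluster vc7 is ignored because it is not a marker cluster, and vc98 because it is absent from the subject.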

[00156] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is a decrease in the abundance of a virus of a viral taxon listed in Table 13 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00157] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is a decrease in the abundance of a virus of a viral taxon listed in Table 14 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00158] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of a virus of a viral taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00159] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of a virus of a viral taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00160] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00161] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00162] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 17 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00163] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 18 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00164] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc23 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00165] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc39 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.

[00166] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc10 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects.

[00167] In various embodiments, a kit for determining the presence of IBD in a subject can include a device to receive viral genome sequences, including unidentified viral genome sequences, from an individual subject and diagnose the subject for IBD based at least in part on the virome marker clusters. The viral genome sequences can be derived from a GI microbiota sample provided by the subject. The GI microbiota sample can be a fecal sample. The GI microbiota sample can be a cecal sample. The GI microbiota sample can be an ileal sample. The GI microbiota sample can be a colonic microbiota sample. In various embodiments, microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.

[00168] Diagnosis can include clustering the received viral genome sequences and comparing the subject’s viral genome clusters to the marker clusters. The viral genome sequences can be clustered by protein clustering and protein homology as described herein. The device can also analyse bacteria from the subject for the purpose of IBD diagnosis. The bacteria can be derived from the same GI microbiota sample provided by the subject used to obtain the viral genome sequences or a separate GI microbiota sample. The subject can be diagnosed for IBD based on the analysis of the bacteria and/or the analysis of the viral genome sequences. The IBD diagnosis can include a diagnosis for ulcerative colitis and/or Crohn’s disease.

[00169] Figure 17 is a block diagram illustration of a system or device 200 (referred to herein for simplicity as “device”) that can be used as part of a kit for detecting IBD or health in a subject. The device 200 can include a dataset input module 210 configured to receive a dataset derived from a GI microbiota sample of a subject, a clustering module 220 configured to determine viral genome sequences within the subject’s dataset and cluster the viral genome sequences into viral clusters, a marker cluster input module 230 configured to receive an input that is based on IBD marker clusters, a cluster comparison module 240 that is configured to compare the subject’s viral clusters to the input representation of the marker clusters, and an output interface 250 configured to provide an indication of health or disease based on the comparison of the subject’s viral clusters to the representation of the marker clusters. The modules 210, 220, 230, 240 can be implemented by a computing system in hardware and/or software according to the principles described herein and as would be appreciated and understood by a person of ordinary skill in the art.

[00170] The dataset input module 210, when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of the subject’s dataset. The clustering module 220, when implemented at least in part by hardware, can include a processor in communication with a memory with instructions thereon to create viral clusters based on and/or associated with the subject’s dataset according to the principles described herein. The marker cluster input module 230, when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of IBD marker clusters. Additionally, or alternatively, the marker cluster input module 230 can include a memory store with a representation of the IBD marker clusters stored thereon. The cluster comparison module 240, when implemented at least in part by hardware, can include memory with instructions thereon to compare the subject’s viral clusters to the IBD marker clusters and provide as an output an indication of health or disease. The output interface 250, when implemented at least in part by hardware, can include a wired or wireless transmitter configured to transmit an electronic signal representative of the indication of health or disease. Additionally, or alternatively, the output interface 250 can include a user interface configured to provide an auditory, visual, or other sensory indication to a user that can be interpreted as an indication of health or disease.
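By way of illustration only, a software implementation of the data flow through modules 210 to 250 can be sketched in Python as follows. Everything named here is hypothetical: the class `Device`, the placeholder functions `toy_cluster` and `toy_compare`, the score threshold, and the sign convention of the marker values are assumptions made for the sketch. In particular, `toy_cluster` groups sequences by a shared prefix only so that the example runs; it is not the protein clustering and protein homology approach described herein.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


def toy_cluster(sequences: List[str]) -> Dict[str, float]:
    """Placeholder for clustering module 220: group by 3-letter prefix and
    return each group's share of the sequences."""
    if not sequences:
        raise ValueError("dataset is empty")
    counts = Counter(seq[:3] for seq in sequences)
    return {name: n / len(sequences) for name, n in counts.items()}


def toy_compare(subject: Dict[str, float], markers: Dict[str, float]) -> float:
    """Placeholder for comparison module 240: sum of abundance times marker
    value over clusters present in both inputs (an assumed rule)."""
    return sum(subject[c] * markers[c] for c in subject.keys() & markers.keys())


@dataclass
class Device:
    cluster_sequences: Callable[[List[str]], Dict[str, float]]          # module 220
    marker_clusters: Dict[str, float]                                   # module 230
    compare: Callable[[Dict[str, float], Dict[str, float]], float]      # module 240
    threshold: float = 0.0                                              # assumed cut-off

    def run(self, dataset: List[str]) -> str:
        """Receive a dataset (module 210) and return an indication (output 250)."""
        subject_clusters = self.cluster_sequences(dataset)
        score = self.compare(subject_clusters, self.marker_clusters)
        return "disease indicated" if score > self.threshold else "health indicated"
```

In this sketch the clustering and comparison steps are passed in as functions, so a real clustering routine could replace the placeholder without changing how the device receives its dataset or reports its indication.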

[00171] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the viruses described herein. In some embodiments, the virus is from any viral taxon listed in Table 13. In some embodiments, the virus is from any viral taxon listed in Table 14.

[00172] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or an agent that specifically targets, any of the viruses described herein. In some embodiments, the virus is from any viral taxon listed in Table 11. In some embodiments, the virus is from any viral taxon listed in Table 12.

[00173] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the bacteria described herein. In some embodiments, the bacteria is from any bacterial taxon listed in Table 17. In some embodiments, the bacteria is from any bacterial taxon listed in Table 18.

[00174] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or agent that specifically targets, any of the bacteria described herein. In some embodiments, the bacteria is from any bacterial taxon listed in Table 15. In some embodiments, the bacteria is from any bacterial taxon listed in Table 16.

[00175] In another aspect, the disclosure provides a method for treating a gastrointestinal (GI) disorder in a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the virus compositions described herein. Non-limiting examples of encompassed GI disorders include, e.g., inflammatory bowel disease (IBD), ulcerative colitis, Crohn's disease, irritable bowel syndrome (IBS), infectious gastroenteritis, non-infectious gastroenteritis, food allergy, and gastrointestinal graft versus host disease.

[00176] The disclosure also provides pharmaceutical compositions comprising the viruses and/or bacteria of the disclosure. The compositions disclosed herein can be formulated into a variety of forms and administered by a number of different means. Non-limiting examples of useful routes of delivery include oral, topical, rectal, mucosal, sublingual, nasal, intravenous, subcutaneous, and via naso/oro-gastric gavage. The active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation. The active agent, vector, virus, bacteriophage, particle, or a bacterial inoculant can be mixed with a carrier and (for easier delivery to the digestive tract) applied to liquid or solid food, or feed or to drinking water. The carrier material should be non-toxic to the virus/bacteriophage/bacteria and the subject/patient. Non-limiting examples of formulations useful in the methods of the present disclosure include oral capsules and saline suspensions for use in feeding tubes, transmission via nasogastric tube, or enema. If live virus, bacteriophage or bacteria are used, the carrier should preferably contain an ingredient that promotes viability of the virus/bacteriophage/bacteria during storage. The formulation can include added ingredients to improve palatability, improve shelf-life, impart nutritional benefits, and the like. If a reproducible and measured dose is desired, the formulation can be administered by a rumen cannula. In certain embodiments, the formulation used in the methods of the disclosure further comprises a buffering agent. Examples of useful buffering agents include saline, sodium bicarbonate, milk, yogurt, infant formula, and other dairy products.

[00177] Bacteria-containing formulations may also comprise one or more prebiotics which promote growth and/or immunomodulatory activity of the bacteria in the formulation. While it is possible to use a compound, vector, virus, bacteriophage, particle, or a bacterial inoculant of the present disclosure for therapy as is, it may be preferable to administer it in a pharmaceutical formulation, e.g., in admixture with a suitable pharmaceutical excipient, diluent or carrier selected with regard to the intended route of administration and standard pharmaceutical practice. The excipient, diluent and/or carrier must be “acceptable” in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof. Acceptable excipients, diluents, and carriers for therapeutic use are well known in the pharmaceutical art, and are described, for example, in Remington: The Science and Practice of Pharmacy. Lippincott Williams & Wilkins (A.R. Gennaro edit. 2005). The choice of pharmaceutical excipient, diluent, and carrier can be selected with regard to the intended route of administration and standard pharmaceutical practice. Although there are no physical limitations to delivery of the formulations of the present disclosure, oral delivery is preferred for delivery to the digestive tract because of its ease and convenience, and because oral formulations readily accommodate additional mixtures, such as milk, yogurt, and infant formula.

[00178] Oral delivery may also include the use of nanoparticles that can be targeted, e.g., to the GI tract of the subject, such as those described in Yun et al., Adv Drug Deliv Rev. 2013, 65(6):822-832 (e.g., mucoadhesive nanoparticles, negatively charged carboxylate- or sulfate-modified particles, etc.). Non-limiting examples of other methods of targeting delivery of compositions to the GI tract are discussed in U.S. Pat. Appl. Pub. No. 2013/0149339 and references cited therein (e.g., pH sensitive compositions [such as, e.g., enteric polymers which release their contents when the pH becomes alkaline after the enteric polymers pass through the stomach], compositions for delaying the release [e.g., compositions which use hydrogel as a shell or a material which coats the active substance with, e.g., in vivo degradable polymers, gradually hydrolyzable polymers, gradually water-soluble polymers, and/or enzyme degradable polymers], bioadhesive compositions which specifically adhere to the colonic mucosal membrane, compositions into which a protease inhibitor is incorporated, a carrier system being specifically decomposed by an enzyme present in the colon).

[00179] For oral administration, the active ingredient(s) can be administered in solid dosage forms, such as capsules, tablets, and powders, or in liquid dosage forms, such as elixirs, syrups, and suspensions. A capsule typically comprises a core material comprising a bacterial composition and a shell wall that encapsulates the core material. In some embodiments, the core material comprises at least one of a solid, a liquid, and an emulsion. In other embodiments, the shell wall material comprises at least one of a soft gelatin, a hard gelatin, and a polymer. Suitable polymers include, but are not limited to: cellulosic polymers such as hydroxypropyl cellulose, hydroxyethyl cellulose, hydroxypropyl methyl cellulose (HPMC), methyl cellulose, ethyl cellulose, cellulose acetate, cellulose acetate phthalate, cellulose acetate trimellitate, hydroxypropylmethyl cellulose phthalate, hydroxypropylmethyl cellulose succinate and carboxymethylcellulose sodium; acrylic acid polymers and copolymers, such as those formed from acrylic acid, methacrylic acid, methyl acrylate, ammonio methylacrylate, ethyl acrylate, methyl methacrylate and/or ethyl methacrylate (e.g., those copolymers sold under the trade name “Eudragit”); vinyl polymers and copolymers such as polyvinyl pyrrolidone, polyvinyl acetate, polyvinylacetate phthalate, vinylacetate crotonic acid copolymer, and ethylene-vinyl acetate copolymers; and shellac (purified lac). In yet other embodiments, at least one polymer functions as a taste-masking agent.

[00180] The active component(s) can be encapsulated in gelatin capsules together with inactive ingredients and powdered carriers, such as glucose, lactose, sucrose, mannitol, starch, cellulose or cellulose derivatives, magnesium stearate, stearic acid, sodium saccharin, talcum, and magnesium carbonate. Examples of additional inactive ingredients that may be added to provide desirable color, taste, stability, buffering capacity, dispersion or other known desirable features are red iron oxide, silica gel, sodium lauryl sulfate, titanium dioxide, and edible white ink. Similar diluents can be used to make compressed tablets. Both tablets and capsules can be manufactured as sustained release products to provide for continuous release of medication over a period of hours. Compressed tablets can be sugar coated or film coated to mask any unpleasant taste and protect the tablet from the atmosphere, or enteric-coated for selective disintegration in the gastrointestinal tract. Liquid dosage forms for oral administration can contain coloring and flavoring to increase patient acceptance.

[00181] Formulations suitable for parenteral administration include aqueous and nonaqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and nonaqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.

[00182] Alternatively, powders or granules embodying the bacterial and viral compositions disclosed herein can be incorporated into a food product. In some embodiments, the food product is a drink for oral administration. Non-limiting examples of a suitable drink include fruit juice, a fruit drink, an artificially flavored drink, an artificially sweetened drink, a carbonated beverage, a sports drink, a liquid dairy product, a shake, an alcoholic beverage, a caffeinated beverage, infant formula and so forth. Other suitable means for oral administration include aqueous and nonaqueous solutions, emulsions, suspensions and solutions and/or suspensions reconstituted from non-effervescent granules, containing at least one of suitable solvents, preservatives, emulsifying agents, suspending agents, diluents, sweeteners, coloring agents, and flavoring agents. The food product can be a solid foodstuff. Suitable examples of a solid foodstuff include without limitation a food bar, a snack bar, a cookie, a brownie, a muffin, a cracker, an ice cream bar, a frozen yogurt bar, and the like.

[00183] In other embodiments, the bacterial and viral compositions disclosed herein are incorporated into a therapeutic food. In some embodiments, the therapeutic food is a ready-to-use food that optionally contains some or all essential macronutrients and micronutrients. In another embodiment, the compositions disclosed herein are incorporated into a supplementary food that is designed to be blended into an existing meal. In one embodiment, the supplemental food contains some or all essential macronutrients and micronutrients. In another embodiment, the bacterial compositions disclosed herein are blended with or added to an existing food to fortify the food's protein nutrition. Examples include food staples (grain, salt, sugar, cooking oil, margarine), beverages (coffee, tea, soda, beer, liquor, sports drinks), snacks, sweets and other foods.

[00184] The useful dosages of the compositions and formulations of the disclosure will vary widely, depending upon the nature of the disease, the patient’s medical history, the frequency of administration, the manner of administration, the clearance of the agent from the host, and the like. The initial dose may be larger, followed by smaller maintenance doses. The dose may be administered as infrequently as weekly or biweekly, or fractionated into smaller doses and administered daily, semi-weekly, etc., to maintain an effective dosage level.

Additional Embodiments:

1. A method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising: obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;

obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;

creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort; creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.

2. The method of embodiment 1,

wherein at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and

wherein at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.

3. The method of embodiments 1 or 2, wherein a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.

4. The method of any one of embodiments 1-3, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.

5. The method of any one of embodiments 1-4, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.

6. The method of any one of embodiments 1-5, wherein each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.

7. The method of any one of embodiments 1-6, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.

8. The method of embodiment 7, wherein performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).

9. The method of any one of embodiments 1-8, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.

10. The method of any one of embodiments 1-9, wherein the healthy cohort and the cohort diagnosed with IBD are each human cohorts.

11. The method of any one of embodiments 1-10, further comprising:

associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);

associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);

associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;

associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and

identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.

12. The method of any one of embodiments 1-11, further comprising:

representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and

representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.

13. The method of any one of embodiments 1-12,

wherein the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,

wherein the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and

wherein the method further comprises:

creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;

creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and

wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.

14. The method of embodiment 13,

wherein the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and

wherein the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.

15. A method for determining the presence of inflammatory bowel disease (IBD) in a subject, the method comprising:

obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;

creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;

obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and

comparing the plurality of subject viral clusters to the plurality of marker clusters.

16. The method of embodiment 15, wherein at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.

17. The method of embodiments 15 or 16, wherein a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.

18. The method of any one of embodiments 15-17, wherein at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.

19. The method of any one of embodiments 15-18, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters.

20. The method of any one of embodiments 15-19, wherein the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.

21. The method of any one of embodiments 15-20, wherein the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467.

22. The method of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.

23. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

24. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

25. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

26. The method of embodiment 22, wherein an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

27. The method of any one of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.

28. The method of any one of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.

29. The method of any one of embodiments 15-21, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.

30. The method of embodiment 29, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

31. The method of embodiment 29, wherein a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

32. The method of embodiment 29, wherein a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

33. The method of any one of embodiments 15-32, wherein obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).

34. The method of any one of embodiments 15-33, further comprising:

obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and

evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.

35. The method of embodiment 34, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.

36. The method of embodiment 34 or embodiment 35, wherein the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

37. The method of embodiment 36, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.

38. The method of embodiment 37, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

39. The method of embodiment 34 or embodiment 35, wherein an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

40. The method of embodiment 34 or embodiment 35, wherein an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.

41. The method of embodiment 34 or embodiment 35, wherein an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.

42. The method of embodiment 37, wherein an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

43. The method of embodiment 36, wherein a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.

44. The method of embodiment 43, wherein a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.

45. The method of embodiment 43, wherein a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.

46. The method of any one of embodiments 34-45, wherein obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.

47. The method of embodiment 46, wherein the V region is V4 region.

48. The method of any one of embodiments 15-47, wherein the GI microbiota sample is a fecal sample, a cecal sample, an ileal sample, or a colonic microbiota sample.

49. The method of any one of embodiments 15-48, wherein the subject is human.

50. The method of any one of embodiments 15-49, further comprising administering an IBD treatment to the subject.

51. The method of any one of embodiments 15-50, further comprising administering to the subject additional diagnostic tests for IBD, CD and/or UC.

52. The method of any one of embodiments 15-51, further comprising enrolling the subject in a clinical trial.

53. The method of any one of embodiments 15-52, wherein comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:

identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;

determining relative abundance of members within each common cluster in the plurality of subject viral clusters;

associating a correlation value with each common cluster in the plurality of marker clusters; and

comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.

54. A kit for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:

a device to:

receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;

receive a second dataset representing a plurality of viral genome IBD marker clusters;

create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences; and

compare the first plurality of viral clusters to the second dataset; and

determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.

55. The kit of embodiment 54, wherein the device is further configured to:

receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;

evaluate the third dataset for the purpose of IBD diagnosis; and

determine the presence of IBD based at least in part on the evaluation of the third dataset.

56. The kit of embodiment 54 or 55, wherein the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.

57. The kit of any of embodiments 54-56, wherein the IBD is ulcerative colitis (UC).

58. The kit of any of embodiments 54-56, wherein the IBD is Crohn's disease (CD).

59. The kit of any one of embodiments 54-58, wherein the subject is human.

60. A system comprising:

one or more processors;

a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:

receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;

receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;

create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;

create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and

identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.

61. A method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467.

62. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

63. A method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.

64. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

65. A method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.

66. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.

67. The method of embodiment 61, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

68. The method of embodiment 63, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

69. The method of embodiment 65, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.

70. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

71. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.

72. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.

73. The method of embodiment 67 or embodiment 70, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.

74. The method of embodiment 68 or embodiment 71, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.

75. The method of embodiment 69 or embodiment 72, wherein said probiotic composition comprises one or more bacterial strains from the genus Akkermansia.

76. The method of any one of embodiments 67-75, wherein the V region is V4 region.

77. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

78. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.

79. The method of any one of embodiments 61-78, wherein the subject is human.
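The comparison recited in embodiment 53 can be illustrated with a minimal sketch. The sketch assumes that each marker cluster carries a signed correlation value (positive where increased abundance is associated with IBD, negative where decreased abundance is) and that the comparison is summarized as a correlation-weighted sum of relative abundances. The function names, the scoring rule, and the numeric values are hypothetical and are given only by way of example; the disclosure does not limit the comparison to this form.

```python
def relative_abundance(cluster_counts):
    """Convert per-cluster read counts into fractions of the sample total."""
    total = sum(cluster_counts.values())
    if total == 0:
        return {cluster: 0.0 for cluster in cluster_counts}
    return {cluster: n / total for cluster, n in cluster_counts.items()}


def marker_score(subject_counts, marker_correlations):
    """Compare subject viral clusters to marker clusters.

    subject_counts: cluster id -> read count in the subject sample.
    marker_correlations: cluster id -> signed correlation value
        (hypothetical convention: > 0 increased in IBD, < 0 decreased).
    Returns the common clusters and a correlation-weighted score.
    """
    rel = relative_abundance(subject_counts)
    # Step 1: clusters present in both the subject and the marker set.
    common = sorted(set(rel) & set(marker_correlations))
    # Steps 2-4: weight each common cluster's relative abundance by its
    # correlation value and accumulate (an assumed summary statistic).
    score = sum(rel[c] * marker_correlations[c] for c in common)
    return common, score


# Placeholder data: "vc999" is a made-up non-marker cluster.
subject = {"vc2": 30, "vc6": 10, "vc999": 60}
markers = {"vc2": 1.0, "vc6": -1.0, "vc13": 1.0}
common, score = marker_score(subject, markers)
```

In this example only vc2 and vc6 are common to the subject and the marker set, so the non-marker cluster contributes to the denominator of the relative abundances but not to the score.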

EXAMPLES

[00185] The present invention is also described and demonstrated by way of the following examples. However, the use of these and other examples anywhere in the specification is illustrative only and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to any particular preferred embodiments described here. Indeed, many modifications and variations of the invention may be apparent to those skilled in the art upon reading this specification, and such variations can be made without departing from the invention in spirit or in scope. The invention is therefore to be limited only by the terms of the appended claims along with the full scope of equivalents to which those claims are entitled.

[00186] As an illustration of the principles described above, an analysis of the whole virome of a published keystone IBD virome cohort is presented herein. Using protein-based clustering of viral sequences, example methods applied according to the present invention are demonstrated to overcome high levels of inter-individual variation in the gut virome and reveal compositional changes within the gut virome in subjects with IBD. Virome changes are shown to reflect alterations in bacterial composition. Viromes of individuals with Crohn’s disease can be characterized by increased numbers of temperate phage sequences. No substantial change is observed in viral alpha diversity across cohorts. Incorporating both the bacteriome and virome composition is demonstrated to offer more accurate classification of subjects as healthy or diseased compared to classification based on the bacteriome alone.

[00187] An analysis of a keystone dataset consisting of subjects with CD, UC and healthy controls is presented herein. This analysis overcomes strain-level resolution using protein homology and MCL to create higher taxonomic ranks and reveals hitherto unseen compositional patterns across the virome in health and disease. The analysis includes up to 97% of reads per sample (78.55 ± 18.79% (mean ± SD)), rather than the 15% used in a prior publication also relying on the same keystone dataset (Norman et al, 2015, which is incorporated herein by reference in its entirety). As demonstrated in the presently illustrated analysis, it is possible to identify patterns across individuals and cohorts. Alterations in the virome were observed and potential disease biomarkers were identified for further characterization. This disclosure shows that virome alterations mimic those of the bacteriome and offers an improved method for classifying IBD patients from healthy subjects.

[00188] Without wishing to be bound by theory, the described approach provides insight into the viral dark matter in human health and disease. The methods also allow cohort comparisons and overcome problems associated with the high level of inter-individual variation. By identifying sequences which are associated with health and disease, this approach provides a framework for identifying novel virome biomarkers and targets for further wet-lab characterization.

[00189] It has been previously reported that the human gut virome exhibits high levels of inter-individual variation (Reyes et al, 2010. Nature, 466, 334-8), which is exacerbated by the need to analyze the virome at an assembly level, which leads to strain-level analysis. Unlike 16S analysis performed on bacteria, which is assessed at higher taxonomic ranks such as family and genus, viral and phage taxonomy does not have a similarly defined structure, which makes comparisons of cohorts very difficult. Strain-level resolution hampers cohort comparisons due to a lack of commonality amongst samples in the dataset, which masks compositional patterns occurring at higher taxonomic ranks. The analysis presented here originally utilized vOTUs (viral contigs made non-redundant at 90% identity over 90% of the length), but this level of resolution masked shared signals across cohorts. This was overcome by clustering viral genomes based on their protein content using vContact2 (Bolduc et al., 2017. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243) to create a higher, protein-based, taxonomic rank. This revealed shared virome features while retaining relevant biological signals across the cohort (as seen by VCs across subjects), increased the variation explained in the beta diversity (increased eigenvalues) and decreased the abundance of unique viral VCs per subject across the dataset. Viral clustering also enabled the detection of a core virome in healthy subjects, consisting of eight VCs across 50% of the cohort. This proved to be a key differentiator between health and disease throughout the analysis. Many of these core VCs were differentially abundant in health and disease and were primary drivers of PCoA separation and machine learning predictions.
Without wishing to be bound by theory, the method can allow for improved comparisons of cohorts for a multitude of disease conditions and body sites by enabling the analysis of the whole virome with a reduction of the level of uniqueness at strain level; differentially abundant VCs across cohorts can themselves be viral marker clusters or can lead to the identification of viral marker clusters that can be used to classify individual subject samples as indicative of health or disease.

[00190] In beta diversity analysis, IBD subjects shifted significantly away from healthy controls, thus providing evidence of compositional differences in the gut viromes. Drivers of these separations were associated with Myoviridae and Siphoviridae in IBD and Microviridae and crAss-like phages in healthy subjects. This was also reflected in differential abundance analysis, with many VCs classified as Myoviridae and Siphoviridae significantly increased in subjects with IBD. No differences were found between the alpha diversity of the subjects with IBD and healthy controls, a finding contradictory to previous publications (Zuo et al, 2019. Gut), including previous analysis of this dataset (Norman et al, 2015. Cell, 160, 447-60). These findings may reflect the limited scope of database-dependent analysis methods and that changes in these subsets of the viral community are not reflected in the virome as a whole. The findings presented here also suggest that although the composition of the virome is altered in subjects with disease, the number of viruses remains consistent. The alpha diversity of VCs with a classification in the order Caudovirales was assessed to replicate the work of Norman et al. (2015. Cell, 160, 447-60). Results were in agreement in that an increase in Caudovirales diversity in the CD cohort was observed; however, no significance for UC versus controls was observed when analyzing the total virome. This could reflect a more global change with abundant lytic phage being replaced with temperate phage in IBD environments. It was speculated that a more stressful environment for the bacteria could result in lysogenic phage being released. This stress would also result in a decreased bacterial alpha diversity in IBD, which was also observed here.

In terms of beta diversity, subjects with IBD had the smallest distances between points, which could also be attributed to the low diversity. The incorporation of φQ33, a lactococcal phage not native to the human gut, at known concentrations allowed for the quantification of the total bacteriophage loads in the fecal samples. Interestingly, viral load was inversely proportional to observed VCs, suggesting that dominance of particular viruses rather than an expansion of many could be responsible for a higher viral load.

[00191] This study provides new evidence of a correlation between alterations in the whole human gut virome and IBD. As previously reported, the genus Faecalibacterium (Lopez-Siles et al., 2015. Appl Environ Microbiol, 81, 7582-92; Lopez-Siles et al., 2018. Front Cell Infect Microbiol, 8, 281; Gevers et al, 2014. Cell Host Microbe, 15, 382-392; Pascal et al, 2017. Gut, 66, 813-822; Machiels et al, 2014. Gut, 63, 1275-83) was depleted in IBD cohorts along with Ruminococcaceae (Gevers et al, 2014). Furthermore, many differentially abundant taxa were found in agreement with the literature, including Fusobacterium (Pascal et al., 2017; Strauss et al., 2011. Inflamm Bowel Dis, 17, 1971-8; Gevers et al., 2014), Veillonella (Gevers et al., 2014) and Ruminococcus gnavus (Joossens et al., 2011. Gut, 60, 631-7; Willing et al., 2010. Gastroenterology, 139, 1844-1854.e1), which correlated towards the shift in beta diversity as previously found in subjects with IBD. The trends observed in both alpha and beta diversity are also in agreement with previous reports (Halfvarson et al, 2017. Nat Microbiol, 2, 17004; Manichanh et al., 2006. Gut, 55, 205-11; Dicksved et al., 2008. ISME J, 2, 716-27; Pascal et al., 2017. Gut, 66, 813-822) and the previous analysis of this dataset (Norman et al, 2015. Cell, 160, 447-60), thus providing validity to the cohort tested and the methods. Although it is not possible from a cross-sectional study to know whether the virome alters the bacteriome or vice-versa, the datasets complement each other. The beta diversity trends were observed in both datasets and this was confirmed through Procrustes analysis. Although 16S alone classified subjects with IBD versus controls more accurately than the virome alone, the addition of the virome improved upon this classification to over 94% area under the curve and to over 85% accuracy. All subjects with IBD were correctly classified, providing further evidence that the gut bacteriome and virome are truly altered in IBD, such that the gut bacteriome and virome are potential avenues for any of diagnosis, therapeutics and correction.

[00192] A common trend observed in the present virome analysis is the increased severity in the virome alteration of CD patients in comparison to UC, a finding replicated in the 16S. In PCoA, CD is located further from the healthy controls while subjects with CD are also the least stable cohort. There were also a greater number of differentially abundant clusters between healthy and CD compared to healthy and UC. Subjects with CD also had more differentially abundant RSVs than UC versus controls in the bacteriome, together with being located furthest from controls on the PCoA. This CD cohort had the least beta stability which may also be linked to having the lowest diversity. Interestingly, CD had a significantly higher diversity of Caudovirales and an increased number of reads aligned to lysogenic VCs when compared to healthy controls.

Furthermore, when differentiating cohorts using machine learning, many of the top 20 importance factors were differentially abundant between CD and controls for both virome and 16S models.

[00193] In light of the findings of this study, two potential scenarios were proposed to describe the host-bacteriome-virome interaction in IBD, and in particular CD. One hypothesis is that oxidative stress in the gut during inflammation (Rigottier-Gois, 2013. ISME J, 7, 1256-61) creates a more stressful living environment for the bacteria, which lyse and release lysogenic phage, also resulting in decreased bacterial diversity. These lysogenic phage are then detected in virome analysis as seen in this study and previously reported (Norman et al., 2015. Cell, 160, 447-60; Zuo et al., 2019. Gut mucosal virome alterations in ulcerative colitis. Gut). Secondly, although less probable, an event causing more lysogenic induction would result in increased bacterial lysis, therefore a decreased bacterial alpha diversity, and more lysogenic phage being detected. Currently it is not possible to pinpoint the precise mechanism of action; however, it is clear that there are specific compositional differences in the viromes of subjects with IBD.

[00194] Examining the interaction of virome composition and disease state in UC in greater detail revealed more subtle changes than those seen between health and disease. This finding, in conjunction with the overall comparison between UC, CD and healthy controls, suggests the virome is not only less perturbed between healthy and UC, but also between flare and remission. This may reflect the disease severity of UC relative to CD or that they may interact with the host in different ways. Differences in disease locations, severity and risk factors such as the potential paradoxical relationship between CD and UC with smoking (Berkowitz et al., 2018. Front Immunol, 9, 74) have previously alluded to differences in disease etiologies. It is possible that the virome composition does not alter significantly between disease states to the same extent as disease versus healthy. However, disease status was not tested in a CD cohort, which may be more fruitful as the differences in CD were more exaggerated.

[00195] Those of ordinary skill in the art can envision additional steps and methods for analysis. Increased sample sizes, particularly for disease state, may increase the ability to detect any potential alterations between flare and remission. Inclusion of one or more of food frequency questionnaires, medical history details, and medication history (Maier et al, 2018. Nature, 555, 623-628) may provide for improved analysis of the microbiota. Addition of metadata, such as that including household controls, can assist in the statistical analysis and can allow the exploration of environmental effects. Other DNA amplification steps besides MDA amplification, which has a known bias towards Microviridae (Parras-Molto et al, 2018. Microbiome, 6, 119), can be undertaken. More modern methods such as the accel-NGS prep kit may remove the need for amplification and may provide for more reliable indication of diversity (Roux et al, 2016. Towards quantitative viromics for both double-stranded and single-stranded DNA viruses. PeerJ, 4, e2777).

[00196] This study provides a detailed analysis of whole virome composition comparing CD/UC and healthy controls to date. It also represents a detailed study of the unidentified majority of the virome in human disease and provides insights, paving the way for better understanding of the human virome as a whole. This analysis shows that analysis of the dark matter can be used to detect accurate profiles of the human gut virome. Although it is not yet possible to conclude if the bacteriome shapes the virome or vice-versa, they do correlate with each other, as shown by Procrustes analysis, and can assist in the classification of subjects with IBD from healthy controls. This analysis provides a method for the comparison of whole viromes across cohorts in diseases other than IBD, which will give further insights into how a fuller understanding of the role of the microbiome in health and disease can be beneficial.

[00197]

Datasets

[00198] A publicly available dataset which was generated on human gut virome composition associated with IBD (Norman et al., 2015) was utilized. The dataset was analyzed with a novel whole-virome analysis protocol that provided novel insights into compositional changes of the virome, and any potential role of such changes in IBD. The dataset (Norman dataset) comprised 165 virome samples from 130 subjects, more specifically 61 healthy controls, 27 subjects with Crohn’s disease (CD), and 42 subjects with ulcerative colitis (UC). Of these, six samples were known to be collected during CD flare, eight in CD remission, 13 in UC flare, and 20 in UC remission. To build upon these findings, a second dataset (Simponi dataset) was generated that consisted of longitudinal samples from 40 subjects with UC. These samples included 82 samples from periods of flare and 31 samples from periods of remission, allowing for the investigation of the impact of disease status on gut virome composition. 16S rRNA gene sequencing data were also obtained for 149 (Norman dataset) and 109 (Simponi dataset) samples.

Protein-based clustering can overcome virome individuality and allow cohort comparisons

[00199] In order to compare the viromes of subjects with IBD to healthy controls, the overall composition was initially investigated at a viral contig level through PCoA of beta diversity (Figure 1A). The viral contigs represent individual viral genomes (whole or part), and therefore the resolution is at the strain level. This level of resolution was reflected in the extremely high levels of individuality amongst subjects at an assembly level. It was also observed in beta diversity, as individuals were the primary drivers of separation with longitudinal samples grouping together. This individual specificity masked compositional differences of the virome. Each of the cohorts (control, CD and UC) showed little divergence with overlapping ellipses, while PC-axes 1 and 2 described very little of the variation in the dataset (4.85% and 3.59%).

[00200] To overcome this high interpersonal variation and enable investigation of disease-specific compositional changes in the virome, a lower taxonomic resolution (i.e. a higher taxonomic rank) was developed. This was achieved by clustering viral contigs based on protein similarities and MCL using vContact2 (see methods). The viral contigs clustered into 472 Viral Clusters (VCs) of >2 members with 2,382 singletons remaining. (Singletons are henceforward referred to as a VC with one member.) The resulting VCs formed a new count table and so a VC-based PCoA was produced (Figure 1B).
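By way of non-limiting illustration, the step in which the VCs "formed a new count table" can be sketched as a simple aggregation of contig-level read counts into VC-level counts. The sketch below is an assumption about how such a collapse could be done, not the protocol actually used; the function name, sample names, contig names and cluster labels are all hypothetical, and contigs without a cluster assignment are treated as singleton VCs, in line with the convention stated above.

```python
from collections import defaultdict

def collapse_to_vcs(contig_counts, contig_to_vc):
    """Sum per-sample contig read counts into viral cluster (VC) counts.

    contig_counts: {sample: {contig: read_count}}
    contig_to_vc:  {contig: vc_label}; contigs missing from this mapping
                   are treated as singleton VCs (one member each).
    """
    vc_counts = {}
    for sample, counts in contig_counts.items():
        per_vc = defaultdict(int)
        for contig, n_reads in counts.items():
            vc = contig_to_vc.get(contig, f"singleton_{contig}")
            per_vc[vc] += n_reads
        vc_counts[sample] = dict(per_vc)
    return vc_counts

# Hypothetical toy data: contig1 and contig3 belong to the same cluster,
# so two subjects with no shared contigs now share one VC.
contig_counts = {
    "subjectA": {"contig1": 10, "contig2": 5},
    "subjectB": {"contig3": 7, "contig4": 2},
}
contig_to_vc = {"contig1": "vc_x", "contig3": "vc_x", "contig2": "vc_y"}
vc_table = collapse_to_vcs(contig_counts, contig_to_vc)
```

In this toy case the two subjects have no contigs in common but both carry vc_x after the collapse, which is the kind of commonality the clustering is intended to expose.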

[00201] Samples largely grouped per condition with noticeable increases in the eigenvalues to 10.36% and 5.58% variation explained for PC-axes 1 and 2 respectively. However, it should be noted that samples with true deviation from the main cohort (such as subjects N208 and N56) remained distinctive, suggesting that the clustering process retains true compositional differences. To further determine if the use of clustering decreased the masking effect of individuality, the viral contig and VC relative abundances were plotted for control subjects and colored by the percent abundance of shared viral contigs/clusters across the remainder of the control cohort (Figure 1C). The relative abundance of viral contigs unique to each subject was 14% (± 8%) on average while an abundance of only 1.7% (± 4%) on average per subject was shared across 30% of the healthy control cohort. There were no individual viral contigs shared across greater than 50% of the cohort. In contrast, clustering reduced the mean abundance of clusters unique to the subject to 1.3% (± 3%), while there was a mean abundance of 15% (± 6%) per subject shared across 30% of the cohort, 7.1% (± 6.6%) across 50% and 0.7% (± 1.4%) across 70%. A total of eight VCs were now shared across 30% of CD and UC cohorts where previously no viral contigs had been found (Figure 1D). The use of clustering therefore resulted in increased commonality across the dataset and allowed for the comparison of viromes between cohorts.
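As a hedged illustration of the quantities discussed in the preceding paragraph, the following sketch computes, for one subject, the relative abundance carried by features (contigs or VCs) that are unique to that subject and the relative abundance carried by features also present in at least a given fraction of the remainder of the cohort. The function names, the presence criterion (relative abundance greater than zero) and the toy data are assumptions made for illustration only.

```python
def unique_abundance(rel_abund, subject):
    """Relative abundance in `subject` of features found in no other subject."""
    others = [s for s in rel_abund if s != subject]
    return sum(
        abundance
        for feature, abundance in rel_abund[subject].items()
        if abundance > 0
        and not any(rel_abund[s].get(feature, 0) > 0 for s in others)
    )

def shared_abundance(rel_abund, subject, min_fraction):
    """Relative abundance in `subject` of features present in at least
    `min_fraction` of the other subjects in the cohort."""
    others = [s for s in rel_abund if s != subject]
    if not others:
        return 0.0
    total = 0.0
    for feature, abundance in rel_abund[subject].items():
        if abundance <= 0:
            continue
        n_present = sum(1 for s in others if rel_abund[s].get(feature, 0) > 0)
        if n_present / len(others) >= min_fraction:
            total += abundance
    return total

# Hypothetical relative abundances (each subject sums to 1.0).
cohort = {
    "s1": {"a": 0.6, "b": 0.3, "c": 0.1},
    "s2": {"a": 0.5, "d": 0.5},
    "s3": {"a": 0.2, "b": 0.8},
}
s1_unique = unique_abundance(cohort, "s1")        # only feature "c"
s1_shared = shared_abundance(cohort, "s1", 0.5)   # features "a" and "b"
```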

Analysis of viral clusters reveals IBD-specific alterations in the gut virome

[00202] PCoA beta diversity analysis using Spearman distances found CD relapse and remission located furthest from controls (p-value: 0.0023 and 0.0032, respectively) followed by UC relapse/remission (p-values: 0.002/0.0023) (Figure 2A). Small variations were observed in the disease state of each condition but were not significant, although this may be due to small sample sizes. PCoA without the division of disease status showed CD and UC beta diversity significantly differed from healthy controls (p-values: 0.0002 and 0.0002; Figure 7A). The healthy cohort was also the least diverse across subjects, followed by UC, having the smallest pairwise distances between points (Figure 7B). This level of commonality was also reflected when comparing the number of clusters shared across each cohort as previously addressed (Figure 1D). There was an observable core virome (defined as presence across >50% of subjects) in the healthy cohort with two VCs (vc2 and vc7) shared across >70% of subjects and six (vc1, vc10, vc23, vc25, vc32, vc39) across >50%. The majority of these VCs were unclassified (i.e. did not cluster with known viral genomes) with the exception of vc1, classified as Siphoviridae, and vc10, which is a crAss-like phage. In contrast, core VCs were not found across UC subjects and just one core VC (vc32, unclassified) was found across CD subjects.
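The core virome definition given above (presence across >50% of subjects in a cohort) can be illustrated with the following minimal sketch. The function name and the toy presence data are hypothetical; only the ">50% of subjects" threshold is taken from the text.

```python
def core_features(presence, threshold=0.5):
    """Return the features present in more than `threshold` of subjects.

    presence: {subject: set of VC labels detected in that subject}
    """
    n_subjects = len(presence)
    counts = {}
    for detected in presence.values():
        for feature in set(detected):
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c / n_subjects > threshold)

# Hypothetical cohort of four subjects.
healthy = {
    "h1": {"vcA", "vcB"},
    "h2": {"vcA"},
    "h3": {"vcA", "vcB", "vcC"},
    "h4": {"vcA", "vcC"},
}
core = core_features(healthy)            # strictly more than 50% of subjects
looser = core_features(healthy, 0.4)     # a lower, purely illustrative cut-off
```

Because the comparison is strict, a VC found in exactly half of the subjects (vcB and vcC here) is not counted as core at the 50% threshold.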

[00203] To assess the impact of whole virome analysis, Caudovirales and overall viral alpha diversity were assessed as these had been found to be different across cohorts using database dependent methods in the original analysis. Alpha diversity was calculated for each sample using the VC count tables. There were no significant differences across the cohorts and disease states for each disease (Figure 2B). Differences between the IBD conditions and healthy controls were compared using an additional diversity measurement (Shannon diversity) but no significance was found (Figures 8A-8B). The alpha diversity of any VCs assigned to Caudovirales families was also compared (Figures 9A-9B) and significance was found for CD (increased) versus healthy only.

[00204] DeSeq2 analysis revealed a number of classifiable VCs, including two crAss-like phages and two Microviridae, at significantly increased abundances in healthy controls compared to CD (Figure 2C) and UC (Figure 2D). vc19 and vc320 (crAss-like phages) were absent from all CD subjects and only vc320 was present in one subject with UC, but other clusters classified as crAss-like phages were present. Conversely, VCs classified as Siphoviridae (nine for CD, eight for UC) and Myoviridae (one for CD, two for UC) were increased in CD and UC versus controls. However, many of the most differentially abundant clusters were taxonomically unassigned and were therefore classed as being a part of the “viral dark matter”. In total, 86 VCs were differentially abundant between controls and CD, of which 49 were increased in controls; 59 VCs were differentially abundant between controls and UC, of which 25 were increased in controls. All DeSeq2 results are found in Table 1 (control vs CD) and Table 2 (control vs UC). Interestingly, 30 of the 37 VCs (81%) increased in CD compared to controls and 28 of the 34 (82%) for UC versus controls were categorized as lysogenic. This was just 32% and 24% for VCs increased in controls relative to CD and UC, respectively. Further investigation into the presence of lysogenic VC abundance in each cohort indicated that CD subjects have significantly more reads aligned to lysogenic VCs than healthy controls (Figures 10A-10B).
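The counts quoted in this paragraph are mutually consistent, as the following short arithmetic check shows. It uses only numbers reported above; the variable names are illustrative.

```python
# Differentially abundant VCs reported in the text.
total_da_cd, increased_in_controls_vs_cd = 86, 49
total_da_uc, increased_in_controls_vs_uc = 59, 25

# VCs increased in disease are the remainder of each total.
increased_in_cd = total_da_cd - increased_in_controls_vs_cd
increased_in_uc = total_da_uc - increased_in_controls_vs_uc

# Lysogenic VCs among those increased in disease, as reported.
lysogenic_cd, lysogenic_uc = 30, 28
pct_lysogenic_cd = round(100 * lysogenic_cd / increased_in_cd)
pct_lysogenic_uc = round(100 * lysogenic_uc / increased_in_uc)
```

The remainders (37 and 34) and the rounded percentages (81 and 82) match the figures given in the paragraph.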

The bacteriome also differs between patients with IBD and controls

[00205] Bacteriome data was compared to previously published studies to verify this dataset is not an outlier. Beta diversity (unweighted UniFrac) showed CD (relapse/remission) samples grouping furthest from controls (p-value: 0.0065, 0.0332) followed by UC (relapse/remission) (p-value: 0.018, 0.001) (Figure 3A), which was reflected in the virome composition.

Interestingly, and in contrast to the virome, the control bacteriome contained the largest variation amongst samples with CD having the smallest distances between points (Figures 7C-7D).

[00206] Decreased alpha diversity was observed (Chao1 diversity) in the IBD cohorts versus healthy controls with the largest differences observed in CD flare (p-value: 0.012) and remission (p-value: 0.018) along with UC flare (p-value: 0.051) (Figure 3B). Due to the small sample sizes this analysis was also repeated without the division of disease status and using various metrics (Figures 8C-8D). For both Chao1 diversity and Shannon diversity, the healthy cohort was significantly higher than both IBD cohorts, while UC was also significantly increased when compared to CD (p-value: 0.001).
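For readers unfamiliar with the two alpha diversity metrics named above, the following sketch gives their standard textbook definitions applied to a vector of per-taxon counts for a single sample. It is an illustration only; the software and settings actually used for these calculations are not stated in this passage, and the bias-corrected fallback used when no doubletons are present is one common convention among several.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Chao1 richness estimate from observed taxa, singletons and doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)   # taxa seen exactly once
    f2 = sum(1 for c in counts if c == 2)   # taxa seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2

# Hypothetical counts for one sample: five observed taxa, two singletons,
# one doubleton.
sample = [10, 5, 2, 1, 1, 0]
richness_estimate = chao1(sample)
evenness_check = shannon([5, 5])   # two equally abundant taxa
```

Chao1 rewards rare taxa (singletons suggest unseen richness), whereas Shannon diversity also reflects evenness, which is one reason the two metrics are often reported together.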

[00207] A large number of taxa were found differentially abundant between control and CD (Figure 3C) and control versus UC (Figure 3D). A total of 113 taxa were decreased in CD versus controls while just 17 were increased. Similarly, 69 were increased in control vs UC and only 21 significantly increased in UC. Many of the taxa increased in controls versus both IBD cohorts were of the phylum Firmicutes and included the genus Faecalibacterium and the families Ruminococcaceae and Clostridiales. The most differentially abundant RSVs increased in CD versus controls included Fusobacterium and Veillonella, while the most increased in UC versus controls were Clostridium sensu stricto and Lachnospiraceae (DeSeq2 results are listed in Table 3 (control vs CD) and Table 4 (control vs UC)).

Correlations between PCoA and abundance counts reveal key drivers of gut microbiome composition

[00208] The drivers of significant shifts in beta diversity were assessed through correlations between PC coordinates and the VCs (for the virome) and RSVs (for the bacteriome). There were 25 VCs significantly correlated to PC-axes 1 and/or 2 (Figure 4A). Depending upon the correlation coefficient, the associations could further be broken down into four quadrants. In quadrant 1 (top left), towards subjects with IBD, there were 18 significantly correlated VCs, comprising eight Siphoviridae, two Myoviridae, two Heterogeneous and six unclassified (Table 5). In quadrant 3 (bottom left), one Myoviridae and one unclassified VC were significantly correlated towards subjects with IBD. VCs classed as Microviridae and crAss-like phages were significantly correlated towards the healthy controls (quadrant 4, bottom right), while there were also two unclassified VCs.
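One way such a correlation-and-quadrant assignment could be carried out is sketched below. This is an assumption-laden illustration rather than the procedure actually used: a rank (Spearman) correlation is assumed because rho values are reported, significance testing is omitted, and the quadrant numbering (1 = top left, 2 = top right, 3 = bottom left, 4 = bottom right) is inferred from the positions named in the text, with quadrant 2 filled in by assumption. All function names and data are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def quadrant(rho_pc1, rho_pc2):
    """Assumed numbering: 1 top left, 2 top right, 3 bottom left, 4 bottom right."""
    if rho_pc2 >= 0:
        return 1 if rho_pc1 < 0 else 2
    return 3 if rho_pc1 < 0 else 4

# Hypothetical sample coordinates and one VC's abundance across five samples.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2 = [2.0, 1.0, 0.0, -1.0, -2.0]
vc_abundance = [50, 40, 10, 5, 1]
rho1 = spearman(vc_abundance, pc1)
rho2 = spearman(vc_abundance, pc2)
vc_quadrant = quadrant(rho1, rho2)
```

Here the VC is most abundant in samples with low PC1 and high PC2 coordinates, so it is assigned to the top-left quadrant.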

[00209] There were 76 RSVs significantly correlated towards controls (quadrant 1) for the beta diversity of 16S composition (Figure 4B). The correlations with the highest correlation coefficient (rho values) included RSVs with taxonomic assignments to Firmicutes, Ruminococcaceae and Alistipes. Quadrant 3 correlations, also towards controls, contained 46 RSVs including Alistipes indistinctus and Clostridiales. For Quadrant 4, towards IBD subjects, four RSVs were significantly correlated including Ruminococcus gnavus and Flavonifractor plautii (Table 6).

[00210] The relationship between the virome and 16S composition was investigated through Procrustes analysis (Figures 10A and 10B). There was a significant positive correlation with an observed correlation coefficient of 0.7143 (p-value of 0.001). However, VC alpha diversity did not significantly correlate with observed bacterial species (Figure 8E), although there was a significant weak correlation with Shannon diversity (p-value: 0.038, rho: 0.194) (Figure 8F).

Alterations in virome composition are less distinct between UC activity states

[00211] Differences in disease states (flare and remission) were investigated using a second cohort of 40 subjects with ulcerative colitis, sampled longitudinally, resulting in 113 virome and 109 16S samples. Beta diversity analysis of virome composition using VCs (Figure 5A) did not show significant separation between flare and remission (p-value: 0.17). However, unclassified viral cluster vc40 was found to be significant (Table 7). In 16S analysis, the shift between flare and remission in beta diversity was significant (p-value: 0.022) and there were 14 RSVs correlated to PC-axes 1 or 2 (Figure 5B, Table 8). RSVs towards the shift in UC remission (quadrant 1) included Faecalibacterium prausnitzii, Dorea longicatena and Coprococcus comes. An RSV classified as Ruminococcus gnavus was the only RSV which correlated towards UC flare. The virome and 16S were correlated using Procrustes analysis and there was a significant positive correlation, in agreement with previous results, with an observed correlation coefficient of 0.906 (p-value of 0.001) (Figure 12).

[00212] Although the median alpha diversity was higher in the virome for UC flare (Figure 5C) and UC remission for the 16S (Figure 5D), these values were not significant when assessed using both Chao1 and Shannon diversity, again in agreement with the previous analysis. Viral load was estimated through spiking with a known concentration of lactococcal φQ33 phage and was found to be negatively correlated with viral alpha diversity (rho: -0.415, p-value: 0.009) (Figure 13A). Viral diversity was also investigated over time and in relation to disease status (Figure 13B) and although there were fluctuations in the time series there was no observable trend with disease status and a comparison resulted in no significant differences (p-value: 0.383).
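The text states only that viral load was estimated by spiking with a known concentration of the lactococcal phage; the exact calculation is not given. A commonly used proportional-scaling approach is sketched below purely as an assumption: the number of spiked particles is scaled by the ratio of non-spike viral reads to spike-in reads. The function name, parameter names and example numbers are hypothetical, and the sketch ignores corrections (for example for genome length or extraction efficiency) that a real protocol might apply.

```python
def estimate_viral_load(spike_particles_added, spike_reads, other_viral_reads):
    """Estimate total non-spike viral particles by proportional scaling.

    Assumes reads are recovered from the spike-in and from native viruses
    in proportion to their particle numbers (a simplifying assumption).
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads are required to estimate viral load")
    return spike_particles_added * other_viral_reads / spike_reads

# Hypothetical example: 1e6 spiked particles yielding 1,000 of 100,000 reads.
load = estimate_viral_load(
    spike_particles_added=1e6,
    spike_reads=1_000,
    other_viral_reads=99_000,
)
```

Under this assumption, a sample in which the spike-in accounts for a smaller share of reads is inferred to carry a proportionally larger native viral load.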

[00213] Two crAss-like phages were increased in subjects in remission when compared to flare along with 2 Siphoviridae, 1 Microviridae and 7 unclassified phage (Figure 5E, Table 9).

Conversely there were 39 VCs increased in flare. These included two Anelloviridae, one Myoviridae, ten Siphoviridae and 24 unclassified. Bacteroides and Dialister were the only RSVs increased in remission while seven RSVs were increased in flare including Enterococcus, Prevotella and Streptococcus (Figure 5F, Table 10).

Virome composition aids the classification between Health and Disease

[00214] The ability of the virome and 16S composition to differentiate between patients with IBD and healthy controls was tested through the use of machine learning. Sample sizes were increased by combining UC and CD samples to form a composite IBD cohort. The virome alone (Figure 6A) yielded an accuracy of 0.769 (p-value of 0.032) with four of the top five contributors (vc39, vc23, vc38 and vc45) being increased in controls versus both IBD states. All five of these clusters were unclassified but two had CRISPR protospacer alignments to Lachnospiraceae and Parabacteroides while the remaining two had hits to Bacteroides. The 16S alone had a greater predictive power than the virome (accuracy: 0.824, p-value: 0.008) with an RSV classified as Ruminococcaceae contributing the largest gain followed by a Clostridiales and Odoribacter splanchnicus (Figure 6B). The virome and the 16S were combined and the predictive power measured (Figure 6C). The accuracy increased to 0.853 (p-value: 0.0026) with the virome contributing five of the top 20 most important features. Of these, four had CRISPR protospacers to bacteria including the order Clostridiales, the family Lachnospiraceae, the genera Pseudoflavonifractor, Clostridium and Johnsonella along with Fusobacterium and Bacteroides (Figure 14). Differences between CD and healthy proved to be the main predictors of disease with 11 VCs/RSVs being decreased in CD and one increased when compared to controls.

[00215] ROC curve analysis was performed as a second measure of accuracy of each model (Figure 6D). The AUC (area under the curve) of the virome alone was 78.31%, lower than that of the 16S alone (89.72%). However, the virome and 16S combined had the largest AUC with 94.79%, predicting all 16 patients with IBD as IBD and only misclassifying five controls as IBD.
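The classification counts reported for the combined model can be related to its reported accuracy of 0.853 with a short calculation. The sketch below assumes, without confirmation from the text, that the accuracy and the per-class counts refer to the same held-out test set; under that assumption it searches for the number of correctly classified controls that reproduces the reported accuracy. Variable names are illustrative.

```python
# Counts reported in the text for the combined virome + 16S model.
true_positives = 16    # IBD samples predicted as IBD
false_negatives = 0    # no IBD sample was misclassified
false_positives = 5    # controls predicted as IBD
reported_accuracy = 0.853

# Inferred, not reported: which number of correctly classified controls
# is consistent with the reported accuracy (to three decimal places)?
consistent_true_negatives = [
    tn for tn in range(0, 200)
    if round(
        (true_positives + tn)
        / (true_positives + tn + false_positives + false_negatives),
        3,
    ) == reported_accuracy
]

sensitivity = true_positives / (true_positives + false_negatives)
```

Under the stated assumption, the only consistent value is 13 correctly classified controls, implying a test set of 34 samples (29 of 34 correct) with a sensitivity of 1.0 for IBD.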

Key VCs revealed by the analysis of IBD viromes

[00216] Through various approaches of virome analysis, ten key VCs consistently emerged (Figures 15A-15J). A key VC was defined as any which was core in one cohort and largely absent from another and/or significantly correlated in the PCoA axes and differentially abundant between the cohorts. vc23, vc39 and vc10 were present in the healthy core and largely absent from the subjects with CD (7, 14 and 26% respectively) and UC (12, 14 and 40% respectively). These three VCs were all in the top seven importance factors in the machine learning while vc39 and vc23 were in the top two. vc23, although unclassified, contained CRISPR protospacers to Parabacteroides, while vc39, also unclassified, had hits to undefined Lachnospiraceae. vc10, a crAss-like phage, did not feature any CRISPR protospacer alignments.

[00217] The remaining seven key VCs (vc17, vc13, vc5, vc15, vc9, vc22 and vc101) were all significantly correlated to the PC-axes and were at significantly increased abundance in UC and/or CD compared to healthy controls, with the exception of vc101, which was increased in control and UC versus CD. vc13, vc15 and vc17, all classified as Siphoviridae, had CRISPR protospacer hits to a number of genera of the Firmicutes, including Blautia, Coprobacillus, Peptoniphilus, Ruminococcus, Enterococcus, Lactobacillus, Streptococcus and Clostridium (Figure 14). vc5, vc9 and vc22, classified as Myoviridae, contained CRISPR protospacers to Firmicutes genera Clostridium, Coprobacillus, Enterococcus, Lactobacillus, Johnsonella, Roseburia, Ruminococcus, Veillonella and Flavonifractor along with the Proteobacteria Parasutterella (Figure 14). Finally, vc101, a Microviridae, did not have any CRISPR protospacer alignments.

[00218] The key VCs were shown to be effective as marker clusters for classifying individual subject GI microbiota datasets within the larger dataset as either diseased or healthy.

[00219] Below are the methods used in the Examples described above.

STAR Methods / Data Download

Original Norman et al. cohort

[00220] Raw sequencing reads (virome and 16S) for the Norman et al., 2015 cohort were downloaded using a link in the original publication (Norman et al., 2015. Cell, 160, 447-60).

Simponi cohort

[00221] To build upon these findings, the Simponi cohort, consisting of longitudinal samples from 40 subjects with UC, including 82 samples from periods of flare and 31 samples from periods of remission, was processed and analyzed. The processing included extraction of fecal VLP DNA, library preparation and sequencing. Processing also included extraction of fecal DNA, library preparation and 16S sequencing. Q33 spiking was also performed.

Bioinformatic viral processing

[00222] The samples described in Norman et al. were used. The quality of the raw sequences (2,199,754 ± 983,529) was assessed using FASTQC, and the sequences were filtered using Trimmomatic with the following parameters: SLIDINGWINDOW:4:20, MINLEN:60, HEADCROP:15, CROP:225. Human reads were removed using Kraken (v.0.10.5) (Wood and Salzberg, 2014. Genome Biol, 15, R46) and version 38 of the human genome, which resulted in a mean of 1,130,518 ± 436,424 sequences per sample. SPAdes meta (Nurk et al., 2017. Genome Res, 27, 824-834) was utilized to assemble the reads into contigs per sample accurately (Sutton et al., 2019. Microbiome, 7, 12); the contigs were subsequently pooled and retained if longer than 1 kb. Redundancy was removed at 90% identity over 90% of the length (of the shorter contig), retaining the longest contig in each case. Bacterial contamination was removed by using an extensive set of inclusion criteria to select viral sequences only. Briefly, contigs were required to be: 1) VirSorter (Roux et al., 2015a. VirSorter: mining viral signal from microbial genomic data. PeerJ, 3, e985) positive, 2) circular, 3) a minimum of 2 pVogs with at least 3 per 1 kb (Grazziotin et al., 2017. Nucleic Acids Res, 45, D491-D498), 4) alignment to an in-house crAssphage database (threshold: 1e-10) (Guerin et al., 2018. Cell Host Microbe, 24, 653-664 e6), 5) greater than 3 kb with no hits to the nt database (vx.y) (threshold: 1e-10), 6) hits to viral RefSeq database (threshold: 1e-10) (v.89), and fewer than 3 ribosomal proteins as predicted using the COG database (Tatusov et al., 2000. Nucleic Acids Res, 28, 33-6).
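The sliding-window quality trimming referred to above (window of 4 bases, mean quality threshold of 20) followed by a minimum-length check can be sketched in Python as follows. This is a simplified illustration only, with hypothetical function names; the actual Trimmomatic implementation differs in detail (for example, in how trailing bases of the last retained window are handled) and was the tool used in the methods described herein.

```python
def sliding_window_keep_length(quals, window=4, min_avg=20):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose mean Phred quality is below min_avg (simplified behaviour).
    """
    n = len(quals)
    if n == 0:
        return 0
    if n < window:
        # Read shorter than one window: treat the whole read as one window.
        return n if sum(quals) / n >= min_avg else 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return n


def passes_min_length(kept_length, min_len=60):
    """Mimic a minimum-length filter: reads shorter than min_len are dropped."""
    return kept_length >= min_len
```

For example, a read whose quality drops from 30 to 10 near the 3' end is cut shortly before the low-quality region, and is discarded entirely if fewer than 60 bases remain.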

[00223] Quality reads were subsequently aligned to the reference set of viral sequences (n = 7,605) using bowtie2. Using SAMTools, a count table was generated and finally a 75% breadth of coverage filter was employed to predict spurious bowtie2 alignments. Any viral sequences which did not feature a recruited read coverage of at least 1 over 75% of the total sequence length were set to 0. The final set of viral contigs was 7,582.
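The breadth-of-coverage filter can be illustrated with the following Python sketch, which sets the count of a viral sequence to 0 in a given sample unless at least 75% of its positions are covered by at least one read. The data structures and function names are hypothetical; in the methods described above the per-base coverage was derived from bowtie2 alignments using SAMTools.

```python
def breadth_of_coverage(depth):
    """Fraction of positions in a contig covered by at least one read."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= 1) / len(depth)


def apply_breadth_filter(counts, depths, min_breadth=0.75):
    """Zero the count of any contig whose breadth is below min_breadth.

    counts: {contig_id: read count} for one sample.
    depths: {contig_id: list of per-base depths} for the same sample.
    """
    filtered = {}
    for contig, n in counts.items():
        breadth = breadth_of_coverage(depths.get(contig, []))
        filtered[contig] = n if breadth >= min_breadth else 0
    return filtered
```

A contig with many reads piled onto a short region (for example, a conserved gene) therefore does not count as present in that sample.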

Clustering and Taxonomy

[00224] Protein sequences were predicted using Prodigal (Hyatt et al., 2010. BMC Bioinformatics, 11, 119) (n=121,021) and subsequently clustered using vContact2 (Bolduc et al., 2017. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243) using a pc-inflation and vc-inflation of 1.5 with all other parameters set to default. This resulted in 472 viral clusters of >2 members and 2,382 singletons, each singleton hereafter referred to as a viral cluster (VC) with one member. A cluster count table was generated by summing all the counts from the previous table in each cluster. Taxonomic classification was assigned to a cluster using vContact2 and a custom database of viral genomes formed from the concatenation of the taxonomically classified portion of the NCBI's Viral RefSeq (v.89) and the JGI's IMG-VR (downloaded 9 January 2019). The resulting clusters were classified to family level based on the presence of reference genomes within. Clusters containing genomes from multiple families were termed "heterogeneous", and may arise from disagreement between protein-based phylogeny and current taxonomic classification, discussed further by Bolduc et al. CRISPR protospacers were predicted from the human microbiome project bacterial reference genomes (VERSION/REF) using PILER-CR (Edgar, 2007). These were aligned to VLS using blastn (-task "blastn-short") and formatted with blast_formatter. The top alignment with an e-value <1e-5 to each VLS was retained in each case. A VC was deemed lysogenic if it contained VLS with alignments to pVOGs featuring annotated integrase genes or site-specific recombinase genes.
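The construction of the cluster count table, in which the counts of all contigs belonging to a cluster are summed per sample, can be sketched in Python as below. The function name and dictionary layout are hypothetical, and the treatment of unmapped contigs as single-member clusters named after the contig is an assumption made for the illustration.

```python
from collections import defaultdict


def cluster_count_table(contig_counts, contig_to_cluster):
    """Sum per-sample contig counts into per-sample viral cluster counts.

    contig_counts: {contig_id: {sample_id: count}}
    contig_to_cluster: {contig_id: cluster_id}; a contig absent from this
        map is treated as a singleton cluster named after itself (assumption).
    """
    table = defaultdict(lambda: defaultdict(int))
    for contig, per_sample in contig_counts.items():
        vc = contig_to_cluster.get(contig, contig)
        for sample, n in per_sample.items():
            table[vc][sample] += n
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {vc: dict(samples) for vc, samples in table.items()}
```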

Simponi

[00225] The same processing as described above was performed for the Simponi cohort, where 2,523,262 ± 1,289,619 raw reads were quality filtered (Trimmomatic: SLIDINGWINDOW:4:20, MINLEN:60, HEADCROP:15, CROP:135 (fwd), 120 (rev)) and assembled, yielding 8,089 viral contigs in the final count table, which led to 484 clusters of >2 members and 4,521 of one member.

Bioinformatic 16S processing

[00226] The samples described in Norman et al. were used. Read quality was assessed on the raw reads (68,146 ± 32,196) using FastQC before and after quality filtering using Trimmomatic (HEADCROP:15 CROP:235 SLIDINGWINDOW:4:20 MINLEN:30). The trimmed reads of the Norman et al. 16S dataset were then processed using DADA2 (Callahan et al., 2016. Nat Methods, 13, 581-3) (v1.10.1). To do this, reads were quality filtered further (truncLen=230, maxEE=1.4, truncQ=11), before dereplication and de novo chimera removal (method = "consensus"). 16S reads published in this study were processed using the same method (truncLen=c(180,100), maxEE=1.4, truncQ=2) and the resulting sequence tables of both datasets merged in DADA2. Chimeras were removed de novo from the combined datasets (method = "consensus"), followed by a round of reference-based chimera removal using UCHIME (Edgar et al., 2011. Bioinformatics, 27, 2194-200) (v4.2) against the ChimeraSlayer Gold database. Resulting non-chimeric RSVs were sorted by length, with all RSVs having a minimum length of 200 bp and a maximum of 260 bp retained. The final count table resulted in a mean of 41,060 ± 17,131 counts per sample. Classification of retained RSVs was achieved using mothur (Schloss et al., 2009. Appl Environ Microbiol, 75, 7537-41) (v1.38.0, bootstrap >=80), while SPINGO (Allard et al., 2015. BMC Bioinformatics, 16, 324) (v1.3, bootstrap >=0.8, similarity >=0.5) was used for species level classification. The RDP v11.4 database was used in both instances.

Simponi

[00227] The same methods as above were employed to process the 16S raw data from the Simponi cohort. There were 382,602 ± 181,911 raw reads. The following Trimmomatic parameters were applied: HEADCROP:20 SLIDINGWINDOW:4:20 CROP:210 MINLEN:50, resulting in a mean of 76,619 ± 40,278 counts in the final count table per sample after being subjected to the bioinformatics pipeline.

Data Analysis and Statistics

[00228] All statistics and figure generation were performed in R (v.3.5.1). Alpha and beta diversity were calculated using phyloseq (v.1.26), while differential abundance was assessed with DeSeq2 (v.1.22.1). Correlations were calculated using the Spearman method. Adonis from the vegan library (v.2.5-3) was utilized to test for significance in the beta diversity, while Procrustes coordinates and significance were generated using procuste and procuste.randtest from the vegan library. Significance was defined as a p-value of less than 0.05, and all adjustments (where required) were made using the Benjamini-Hochberg method. For all statistical tests, one sample was chosen at random per subject. All figures were generated using ggplot2 (v.3.1.0). Machine learning was carried out in R using the XgBoost package (0.71.2). In each case, the model was trained on 70% of the data, and results refer to the remaining 30% of the data, which was used to test the model. Parameters were optimized for each model. ROC curves and accuracy were calculated using the R library ROCR (v.1.0-7). Contig figures were generated using GView (vx.y).
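Two of the sampling steps described above, namely choosing one sample at random per subject and dividing the data into 70% for training and 30% for testing, can be sketched in Python as follows. The function names, the fixed random seed and the use of a plain (unstratified) random split are assumptions made for the illustration; the analysis described herein was performed in R and the exact splitting procedure is not specified.

```python
import random


def one_sample_per_subject(samples, seed=0):
    """Pick one sample at random for each subject.

    samples: iterable of (sample_id, subject_id) pairs.
    Returns {subject_id: chosen sample_id}.
    """
    rng = random.Random(seed)
    by_subject = {}
    for sample_id, subject_id in samples:
        by_subject.setdefault(subject_id, []).append(sample_id)
    # Sort subjects so the result is reproducible for a given seed.
    return {subj: rng.choice(ids) for subj, ids in sorted(by_subject.items())}


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle items and split them into training and test portions."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Selecting a single sample per subject avoids treating repeated samples from the same individual as independent observations in the statistical tests.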

Data Availability

[00229] Accession numbers for downloading the original raw sequences used in Norman et al. can be found in Norman et al, 2015.

[00230] Below are the tables referenced in the Examples described above.

[00231] The abbreviations in the table are described as follows:

[00232] “DeSeq2” refers to software for estimating variance-mean dependence in count data from high-throughput sequencing assays, and for testing differential expression based on a model using the negative binomial distribution. See, Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550.

[00233] “lfcSE” refers to the log fold change standard error calculation performed by DeSeq2.

[00234] “stat” refers to the Wald statistic calculation performed by DeSeq2.

[00235] The “p-value” ranges from zero to one and indicates the probability of obtaining values at least as extreme as those observed if the null hypothesis (H0) is true.

[00236] The “padj” value is the p-value adjusted for multiple testing using the Benjamini-Hochberg method.
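For illustration, the Benjamini-Hochberg adjustment can be written in a few lines of Python. This is a minimal sketch of the standard procedure with a hypothetical function name; it is not the DeSeq2 implementation, and the example p-values used to exercise it are not taken from the tables.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, so a feature is reported as significant only if its adjusted value remains below the chosen threshold (0.05 in the methods above).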

[00237] The remaining values, e.g., for “RSV”, “Control Mean”, “Control Median”, “Control Present”, “CD Mean”, “CD Median”, “CD Present”, “UC Mean”, “UC Median”, “UC Present”, “logfdr”, “Domain”, “Classification”, “pc1” and “pc2”, refer to the control, Crohn’s disease, and ulcerative colitis samples or are understood by those of ordinary skill who work with DeSeq2.

[00238] The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and the accompanying figures. Such modifications are intended to fall within the scope of the appended claims. It is further to be understood that all values are approximate and are provided for description.

[00239] Patents, patent applications, publications, product descriptions, and protocols are cited throughout this application, the disclosures of which are incorporated herein by reference in their entireties for all purposes.

Table 1. DeSeq2 results of VC counts comparing Healthy controls to CD

Table 2. DeSeq2 results of VC counts comparing Healthy controls to UC

Table 3. DeSeq2 results of 16S RSV counts comparing Healthy controls to CD

Table 4. DeSeq2 results of 16S RSV counts comparing Healthy controls to UC

Table 5. Spearman correlations between VC counts and PC-axes 1 and 2 from Spearman distances

Table 6. Spearman correlations between RSV counts and PC-axes 1 and 2 from unweighted UniFrac distances

Table 7. Spearman correlations between VC counts and PC-axes 1 and 2 from Spearman distances

Table 8. Spearman correlations between RSV counts and PC-axes 1 and 2 from unweighted UniFrac distances

Table 9. DeSeq2 results of VC counts comparing UC Flare to UC Remission

Table 10. DeSeq2 results of RSV counts comparing UC Flare to UC Remission

Table 11. Viral Clusters for which an Increase in Abundance is Associated with Crohn’s Disease

Table 12. Viral Clusters for which an Increase in Abundance is Associated with Ulcerative Colitis

Table 13. Viral Clusters for which a Decrease in Abundance is Associated with Crohn’s Disease

Table 14. Viral Clusters for which a Decrease in Abundance is Associated with Ulcerative Colitis

Table 15. Bacterial taxa for which an Increase in Abundance is Associated with Crohn’s Disease

Table 16. Bacterial taxa for which an Increase in Abundance is Associated with Ulcerative Colitis

Table 17. Bacterial taxa for which a Decrease in Abundance is Associated with Crohn’s Disease

Table 18. Bacterial taxa for which a Decrease in Abundance is Associated with Ulcerative Colitis