


Title:
VIRUSES ASSOCIATED WITH IMMUNODEFICIENCY AND ENTEROPATHY AND METHODS USING SAME
Document Type and Number:
WIPO Patent Application WO/2014/047261
Kind Code:
A1
Abstract:
The present invention relates to previously undescribed viruses that are associated with significant expansion of the virome, immunodeficiency, and enteropathy during lentiviral infection. The invention also provides methods to detect acquired immune deficiency syndrome (AIDS) or AIDS progression in a subject, methods to diagnose immunodeficiency or enteropathy in a subject, and methods to identify a therapeutic agent to treat the same.

Inventors:
BAROUCH DAN H (US)
VIRGIN HERBERT (US)
WANG DAVID (US)
ZHAO GUOYAN (US)
THACKRAY LARISSA (US)
HANDLEY SCOTT (US)
PRESTI RACHEL (US)
Application Number:
PCT/US2013/060579
Publication Date:
March 27, 2014
Filing Date:
September 19, 2013
Assignee:
BETH ISRAEL HOSPITAL (US)
UNIV WASHINGTON (US)
International Classes:
C12N7/00; A61K39/12; A61K39/23
Foreign References:
US20060211115A1 (2006-09-21)
US20040136963A1 (2004-07-15)
Other References:
HANDLEY ET AL.: "Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome.", CELL, vol. 151, no. 2, 12 October 2012 (2012-10-12), pages 253 - 266
Attorney, Agent or Firm:
BIEKER-BRADY, Kristina (101 Federal Street, 15th Floor, Boston, MA, US)
Claims:
What is claimed is:

CLAIMS

1. An isolated polynucleotide comprising all or a portion of a nucleotide sequence that is at least 70% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

2. The isolated polynucleotide of claim 1, wherein said nucleotide sequence is at least 75% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

3. The isolated polynucleotide of claim 2, wherein said nucleotide sequence is at least 80% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

4. The isolated polynucleotide of claim 3, wherein said nucleotide sequence is at least 85% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

5. The isolated polynucleotide of claim 4, wherein said nucleotide sequence is at least 90% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

6. The isolated polynucleotide of claim 5, wherein said nucleotide sequence is at least 95% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

7. The isolated polynucleotide of claim 6, wherein said nucleotide sequence is 100% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof.

8. The isolated polynucleotide of any one of claims 1-7, further comprising a label.

9. The isolated polynucleotide of claim 8, wherein said label is selected from the group consisting of a fluorophore, a hapten, an enzyme, and a radioisotope.

10. An isolated polypeptide comprising all or a portion of an amino acid sequence that is at least 70% identical to any one of SEQ ID NOs: 108-331.

11. The isolated polypeptide of claim 10, wherein said amino acid sequence is at least 75% identical to any one of SEQ ID NOs: 108-331.

12. The isolated polypeptide of claim 11, wherein said amino acid sequence is at least 80% identical to any one of SEQ ID NOs: 108-331.

13. The isolated polypeptide of claim 12, wherein said amino acid sequence is at least 85% identical to any one of SEQ ID NOs: 108-331.

14. The isolated polypeptide of claim 13, wherein said amino acid sequence is at least 90% identical to any one of SEQ ID NOs: 108-331.

15. The isolated polypeptide of claim 14, wherein said amino acid sequence is at least 95% identical to any one of SEQ ID NOs: 108-331.

16. The isolated polypeptide of claim 15, wherein said amino acid sequence is 100% identical to any one of SEQ ID NOs: 108-331.

17. The isolated polypeptide of any one of claims 10-16, further comprising a label.

18. The isolated polypeptide of claim 17, wherein said label is selected from the group consisting of a fluorophore, a hapten, an enzyme, and a radioisotope.

19. An isolated antibody, or fragment thereof, that is specific for one or more of the isolated polynucleotides of any one of claims 1-9 or the isolated polypeptides of any one of claims 10-18.

20. A nucleic acid-based vaccine comprising a vector comprising all or a portion of the polynucleotide sequence of any one of claims 1-9.

21. A recombinant expression system for the production of a protein, or fragment thereof, that is encoded by the polynucleotide sequence of any one of claims 1-9.

22. The recombinant expression system of claim 21, wherein said recombinant expression system is an in vitro or an in vivo expression system.

23. The recombinant expression system of claim 22, further comprising a cell.

24. The recombinant expression system of claim 23, wherein said cell is a bacterial, plant, or mammalian cell.

25. The recombinant expression system of claim 24, wherein said mammalian cell is a Chinese hamster ovary (CHO) cell.

26. An isolated recombinant cell that comprises the polynucleotide sequence of any one of claims 1-9.

27. A recombinant virus comprising one or more of the isolated polynucleotides of any one of claims 1-9 and/or one or more of the isolated polypeptides of any one of claims 10-18.

28. The recombinant virus of claim 27, wherein said virus further comprises a genome comprising a heterologous nucleic acid encoding an antigenic gene product of interest or fragment thereof, or wherein said recombinant virus further comprises a capsid comprising a heterologous antigenic gene product of interest or fragment thereof.

29. The recombinant virus of claim 28, wherein said antigenic gene product, or fragment thereof, comprises a bacterial, viral, parasitic, or fungal gene product, or fragment thereof.

30. The recombinant virus of any one of claims 27-29, wherein all or a portion of said recombinant virus is from a viral family selected from the group consisting of Adenoviridae, Parvoviridae, Caliciviridae, Papillomaviridae, Picobirnaviridae, Picornaviridae, and Polyomaviridae.

31. The recombinant virus of claim 30, wherein said viral family is Adenoviridae.

32. The recombinant virus of claim 30, wherein said viral family is Parvoviridae.

33. A method of detecting acquired immune deficiency syndrome (AIDS) and/or AIDS progression in a subject, said method comprising detecting one or more target nucleotide sequences from a sample of said subject that specifically hybridize under stringent conditions to one or more of said polynucleotides of claims 1-9, wherein the detection of an increase in the level of said one or more target nucleotide sequences in said subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates AIDS and/or AIDS progression in said subject.

34. A method of diagnosing, or providing a prognostic indicator of, immunodeficiency and/or enteropathy in a subject, said method comprising detecting one or more target nucleotide sequences from a sample of said subject that specifically hybridize under stringent conditions to one or more of said polynucleotides of claims 1-9, wherein the detection of an increase in the level of said one or more target nucleotide sequences in said subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates the presence of, or the propensity to develop, immunodeficiency and/or enteropathy in said subject.

35. A method of identifying a therapeutic agent for use in treating immunodeficiency and/or enteropathy in a subject, said method comprising detecting one or more target nucleotide sequences that specifically hybridize under stringent conditions to one or more of said polynucleotides of claims 1-9 from a sample of a subject administered a therapeutically effective amount of a candidate agent, wherein the detection of a decrease in the level of said one or more target nucleotide sequences in said subject, relative to the level of one or more target nucleotide sequences from said subject prior to administration or a control subject, identifies said candidate agent as said therapeutic agent.

36. The method of claim 35, wherein said candidate agent is administered to said subject in a therapeutically effective amount.

37. The method of claim 34 or 35, wherein said immunodeficiency or enteropathy is associated with a lentivirus.

38. The method of claim 37, wherein said immunodeficiency and enteropathy are associated with a lentivirus.

39. The method of claim 37 or 38, wherein said lentivirus is human immunodeficiency virus (HIV) or simian immunodeficiency virus (SIV).

40. The method of any one of claims 33-39, wherein detecting one or more target nucleotide sequences comprises synthesizing cDNA from RNA comprised by said sample.

41. The method of claim 40, wherein said one or more target nucleotide sequences are detected by a PCR assay.

42. The method of claim 41, wherein said PCR assay is selected from the group consisting of a real time PCR (RT-PCR) assay and a nested PCR assay.

43. The method of any one of claims 33-42, wherein said sample is a tissue, organ, liquid, or feces sample.

44. The method of claim 43, wherein said subject is a mammal.

45. The method of claim 44, wherein said mammal is a primate.

46. The method of claim 44, wherein said primate is a human.

Description:
VIRUSES ASSOCIATED WITH IMMUNODEFICIENCY AND ENTEROPATHY AND METHODS USING SAME

Background of the Invention

Human immunodeficiency virus (HIV) infection of humans and pathogenic simian immunodeficiency virus (SIV) infection of rhesus monkeys cause progressive immunocompromise and acquired immune deficiency syndrome (AIDS). One hallmark that correlates with the rate of progression to AIDS is systemic immune activation. Systemic immune activation is, in turn, associated with damage to the intestinal epithelium (enteropathy) and translocation of as-yet-undefined immunostimulatory pathogen-associated molecular patterns (PAMPs) or antigens into tissues and the blood.

Despite the importance of intestinal barrier damage to AIDS progression, the mechanisms responsible for AIDS enteropathy are not understood. One possibility is that immunodeficiency leads to epithelial damage by intestinal viruses or other pathogens. The mammalian virome and bacterial microbiome are extremely complex and can contribute to immune status and disease in a range of settings. A prior study that used 16S rDNA sequencing, which cannot detect viruses, found no discernible differences in the diversity of bacteria associated with SIV infection (McKenna et al., PLoS Pathog. 4: e20 (2008)). However, it remained a possibility that the virome, a subset of the metagenome that may be defined as viruses that infect eukaryotic cells, contributes to epithelial damage during lentiviral infection.

It is therefore important to understand the contribution of the virome to lentiviral infection-associated phenotypes, such as enteropathy. There is an unmet need in the field for understanding the contribution of the virome upon lentiviral infection, as well as for the development of alternative methods of diagnosing and treating lentiviral infections (e.g., HIV).

Summary of the Invention

This invention relates to the discovery of previously undescribed viruses that are associated with significant expansion of the virome during lentiviral infection. In a first aspect, the invention features isolated polynucleotides including all or a portion of a nucleotide sequence that is at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof. In some embodiments, the isolated polynucleotides include a label (e.g., a fluorophore, a hapten, an enzyme, or a radioisotope). The isolated polynucleotides of the invention may include at least 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000 or more contiguous or non-contiguous nucleotides of a reference polynucleotide molecule. In some embodiments, the polynucleotides of the invention are between 10-100 nucleotides in length, more particularly between 10-30 nucleotides in length (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length), and can be at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 332-371.

In another aspect, the invention features isolated polypeptides including all or a portion of an amino acid sequence that is at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 108-331. In some embodiments, the isolated polypeptides include a label (e.g., a fluorophore, a hapten, an enzyme, or a radioisotope). The isolated polypeptides of the invention may include at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 50, 75, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, or 350 or more contiguous or non-contiguous amino acids of a reference polypeptide molecule.

In another aspect, the invention features isolated antibodies, or fragments thereof, that are specific for one or more of the isolated polynucleotides or polypeptides of the invention. In some embodiments, the isolated antibodies, or fragments thereof, may be chimeric, human, humanized, or synthetic. In other embodiments, the isolated antibodies, or fragments thereof, may further include a label.

In another aspect, the invention features recombinant expression systems for the production of a protein, or fragment thereof, that is encoded by the polynucleotides of the invention. In some embodiments, the recombinant expression system is an in vitro or an in vivo expression system. In other embodiments, the recombinant expression system further includes a cell (e.g., a bacterial, plant, or mammalian cell). In yet other embodiments, the mammalian cell is a Chinese hamster ovary (CHO) cell.

In another aspect, the invention features recombinant viruses including one or more of the isolated polynucleotides and/or one or more of the isolated polypeptides of the invention. In one preferred embodiment, the viruses further include a genome including a heterologous nucleic acid encoding an antigenic gene product of interest or fragment thereof, or the viruses further include a capsid including a heterologous antigenic gene product of interest or fragment thereof. In another preferred embodiment, the antigenic gene product, or fragment thereof, includes a bacterial, viral, parasitic, or fungal gene product, or fragment thereof. In some embodiments, all or a portion of the recombinant virus is from the viral family Adenoviridae, Parvoviridae, Caliciviridae, Papillomaviridae, Picobirnaviridae, Picornaviridae, or Polyomaviridae. In preferred embodiments, the viral family is Adenoviridae or Parvoviridae.

In another aspect, the invention features methods of detecting acquired immune deficiency syndrome (AIDS) and/or AIDS progression in a subject including detecting one or more target nucleotide sequences from a sample of the subject that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention, where the detection of an increase in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates AIDS and/or AIDS progression in the subject. In another aspect, the invention features methods of diagnosing, or providing a prognostic indicator of, immunodeficiency and/or enteropathy in a subject including detecting one or more target nucleotide sequences from a sample of the subject that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention, where the detection of an increase in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates the presence of, or the propensity to develop, immunodeficiency and/or enteropathy in the subject. In some embodiments, the immunodeficiency and/or enteropathy is associated with a lentivirus (e.g., human immunodeficiency virus (HIV) or simian immunodeficiency virus (SIV)).

In yet another aspect, the invention features methods of identifying a therapeutic agent for use in treating immunodeficiency and/or enteropathy in a subject including detecting one or more target nucleotide sequences that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention from a sample of a subject administered a therapeutically effective amount of a candidate agent, where the detection of a decrease in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from the subject prior to administration or a control subject, identifies the candidate agent as the therapeutic agent. In some embodiments, the candidate agent is administered to the subject in a therapeutically effective amount. In other embodiments, the immunodeficiency and/or enteropathy is associated with a lentivirus (e.g., human immunodeficiency virus (HIV) or simian immunodeficiency virus (SIV)).
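The comparison of target sequence levels described in the detection and screening aspects above can be sketched as follows. This is a minimal Python illustration only: the specification does not define a numerical cutoff, so the fold-change thresholds (2.0 and 0.5) and all function names here are hypothetical.

```python
def fold_change(sample_level: float, reference_level: float) -> float:
    """Ratio of a measured target-sequence level to a reference level."""
    if reference_level <= 0:
        raise ValueError("reference_level must be positive")
    return sample_level / reference_level


def is_increased(subject_level: float, control_level: float,
                 min_fold: float = 2.0) -> bool:
    """True if the subject's level exceeds the control by at least min_fold.

    min_fold is a hypothetical threshold, not a value from the specification.
    """
    return fold_change(subject_level, control_level) >= min_fold


def is_decreased(post_treatment_level: float, reference_level: float,
                 max_fold: float = 0.5) -> bool:
    """True if the post-treatment level is at most max_fold of the reference.

    The reference may be the same subject before administration or a control
    subject; max_fold is a hypothetical threshold.
    """
    return fold_change(post_treatment_level, reference_level) <= max_fold
```

For example, under these assumed thresholds a subject level of 400 against a control level of 100 would count as an increase, and a post-treatment level of 40 against a pre-treatment level of 100 would count as a decrease.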

In other aspects, the invention features nucleic acid-based vaccines including a vector including the polynucleotides of the invention as well as isolated recombinant cells including the polynucleotides of the invention (e.g., all or a portion of a polynucleotide having at least 70%, 80%, 85%, 90%, 95%, 99%, or 100% sequence identity to any one or more of SEQ ID NOs: 1-107 and/or 332-371, or a reverse complement thereof).

In other aspects, the invention features polypeptide-based vaccines including the polypeptides of the invention (e.g., all or a portion of a polypeptide having at least 70%, 80%, 85%, 90%, 95%, 99%, or 100% sequence identity to any one or more of SEQ ID NOs: 108-331).

In any of the methods described herein, the detecting of one or more target nucleotide sequences may include synthesizing cDNA from RNA of the sample.

In any of the embodiments described herein, the one or more target nucleotide sequences are detected by a PCR assay (e.g., a real time PCR (RT-PCR) assay and/or a nested PCR assay).

In any of the embodiments described herein, the sample is a tissue, organ, liquid, or feces sample. In preferred embodiments, the sample is from a mammal, preferably a primate, such as a human.

Definitions

The term "antibody" is used in the broadest sense and includes monoclonal antibodies (e.g., full-length or intact monoclonal antibodies), polyclonal antibodies, multivalent antibodies, multispecific antibodies (e.g., bispecific antibodies so long as they exhibit the desired biological activity) and may also include certain antibody fragments (as described in greater detail herein). An antibody typically comprises both "light chains" and "heavy chains." The light chains of antibodies (immunoglobulins) from any vertebrate species can be assigned to one of two clearly distinct types, called kappa (κ) and lambda (λ), based on the amino acid sequences of their constant domains. Depending on the amino acid sequence of the constant domain of their heavy chains, immunoglobulins can be assigned to different classes. There are five major classes of immunoglobulins: IgA, IgD, IgE, IgG, and IgM, and several of these can be further divided into subclasses (isotypes), e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2. The heavy chain constant domains that correspond to the different classes of immunoglobulins are called α, δ, ε, γ, and μ, respectively. The subunit structures and three-dimensional configurations of different classes of immunoglobulins are well known.

"Antibody fragments" or "fragments" comprise only a portion of an intact antibody, wherein the portion preferably retains at least one, preferably most or all, of the functions normally associated with that portion when present in an intact antibody. Examples of antibody fragments include Fab, Fab', F(ab')2, and Fv fragments (e.g., single-chain variable fragments (scFv)); diabodies; linear antibodies; single-chain antibody molecules; and multispecific antibodies formed from antibody fragments. Papain digestion of antibodies produces two identical antigen-binding fragments, called "Fab" fragments, each with a single antigen-binding site, and a residual "Fc" fragment, whose name reflects its ability to crystallize readily. Pepsin treatment yields an F(ab')2 fragment that has two antigen-combining sites and is still capable of cross-linking antigen. In one embodiment, an antibody fragment comprises an antigen binding site of the intact antibody and thus retains the ability to bind antigen. In another embodiment, an antibody fragment, for example one that comprises the Fc region, retains at least one of the biological functions normally associated with the Fc region when present in an intact antibody, such as FcRn binding, antibody half life modulation, ADCC function, ADCVI function, and complement binding. In one embodiment, an antibody fragment is a monovalent antibody that has an in vivo half life substantially similar to an intact antibody. For example, such an antibody fragment may comprise one antigen binding arm linked to an Fc sequence capable of conferring in vivo stability to the fragment.

By "capsid" is meant a protein shell or coat of a virus which often adopts a helical or icosahedral structure. The capsid of an adenovirus, for example, adopts an icosahedral structure and consists of three major structural proteins: hexon, penton, and fiber proteins. The capsid encloses the genetic material of the virus.

Throughout this specification and claims, the word "comprise," or variations such as "comprises" or "comprising," will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

By "enteropathy" is meant damage to the intestinal epithelium, commonly associated with lentiviral infection (e.g., human immunodeficiency virus (HIV) infection in humans), which can result in intestinal leakage associated with increased serum LPS binding protein (LBP) levels and systemic immune activation. Enteritis, inflammation of the intestinal epithelium, is a type of enteropathy.

By "gene product" is meant to include mRNAs or other nucleic acids (e.g., microRNAs) transcribed from a gene as well as polypeptides translated from those mRNAs.

By "heterologous nucleic acid molecule" is meant any exogenous nucleic acid molecule that can be incorporated into, for example, a virus or expression system of the invention for subsequent expression of a gene product of interest or fragment thereof encoded by the heterologous nucleic acid molecule. In a preferred embodiment, the heterologous nucleic acid molecule encodes an antigenic gene that is of bacterial, viral, parasitic, or fungal origin (e.g., a nucleic acid molecule encoding the HIV Gag, Pol, Env, Nef, Tat, Rev, Vif, Vpr, or Vpu gene product, or fragment thereof). The heterologous nucleic acid molecule is one that is not normally associated with the other nucleic acid molecules found in the virus or expression system.

By "immunodeficiency" is meant a compromised immune system of a subject relative to that of a control, whereby the compromise of the immune system can be measured by a decrease in the levels of CD4 T cells, B cells, plasma cells, antibodies, or neutrophil granulocytes of the subject relative to that of the control.

By "isolated" is meant separated, recovered, or purified from a component of its natural environment.

A "label" refers to a molecular moiety or compound that is detected or leads to a detectable signal. A label may be joined directly or indirectly to a polynucleotide, polypeptide, or a probe thereof. Direct labeling can occur through bonds or interactions that link the label to the probe, including covalent bonds or non-covalent interactions, e.g., hydrogen bonds, hydrophobic and ionic interactions, or formation of chelates or coordination complexes. Indirect labeling can occur through use of a bridging moiety or linker (e.g., antibody or additional oligomer), which is either directly or indirectly labeled, and which may amplify the detectable signal. Labels include any detectable moiety, such as a fluorophore, hapten, enzyme, radioisotope, enzyme substrate, reactive group, chromophore (e.g., a dye, a particle, or a bead that imparts detectable color), or luminescent compound (e.g., bioluminescent, phosphorescent, or chemiluminescent labels). A "radioisotope" can be any radioisotope known to skilled artisans, such as ³H, ¹⁴C, ³²P, ³³P, ³⁵S, or ¹²⁵I. A "fluorophore" can be any fluorophore known to skilled artisans, for example, a fluorescein, a rhodamine, a coumarin, an indocyanine, or a green fluorescent protein (GFP) or variant thereof (e.g., a red fluorescent protein (RFP)). An enzyme can be any enzyme for which a suitable substrate is available, such as, for example, alkaline phosphatase, a horseradish peroxidase, or a chloramphenicol acetyltransferase. A suitable substrate is a substrate that, when contacted by an enzyme, produces a product that is detectable by methods known to skilled artisans. For example, the substrate can be a chromogenic substrate (e.g., p-dinitrophenyl phosphate as a substrate for alkaline phosphatase or diaminobenzidine as a substrate for horseradish peroxidase) or a fluorogenic substrate (e.g., 5-amino-2,3-dihydrophthalazine-1,4-dione (luminol) for horseradish peroxidase or disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.13,7]decan}-4-yl)phenyl phosphate for alkaline phosphatase). A "hapten" can be any hapten for which a probe is available, such as biotin, streptavidin, or digoxigenin.

By "portion" is meant a part of a whole. A portion may comprise at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of the entire length of a polynucleotide or polypeptide sequence region. For polynucleotides, for example, a portion may include at least 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000 or more contiguous nucleotides of a reference polynucleotide molecule. For polypeptides, for example, a portion may include at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 50, 75, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, or 350 or more contiguous amino acids of a reference polypeptide molecule.

By "recombinant," with respect to an expression system or virus, is meant an expression system or virus that has been manipulated in vitro. For example, a recombinant expression system or virus may include a heterologous nucleic acid sequence, such as a sequence encoding an antigenic gene product, introduced using recombinant nucleic acid techniques.

By "sample" is meant any biological substance obtained from a subject, such as a feces (stool), biological fluid, tissue, or organ sample. A biological fluid sample can be, without limitation, a blood sample, a plasma sample, a serum sample, a cerebrospinal fluid sample, a urine sample, or a saliva sample.

By "sequence identity" or "sequence similarity" is meant that the identity or similarity between two or more amino acid sequences, or two or more nucleotide sequences, is expressed in terms of the identity or similarity between the sequences. Sequence identity can be measured in terms of "percentage (%) identity," wherein the higher the percentage, the more identity shared between the sequences.

Sequence similarity can be measured in terms of percentage similarity (which takes into account conservative amino acid substitutions); the higher the percentage, the more similarity shared between the sequences. Homologs or orthologs of nucleic acid or amino acid sequences possess a relatively high degree of sequence identity/similarity when aligned using standard methods. Sequence identity may be measured using sequence analysis software on the default setting (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705). Such software may match similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications.
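By way of illustration only, percentage identity and the reverse complement referred to in the claims can be sketched in Python as below. This sketch assumes the two sequences have already been aligned to equal length with '-' marking gaps, and it counts identity over all aligned columns; it is not the algorithm of the sequence analysis software named above, whose scoring and defaults may differ. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Both strings must have the same length; '-' marks a gap. Columns in
    which both sequences have a gap are ignored, and a gap opposite a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # One mismatch in ten aligned positions.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
    print(reverse_complement("ATGC"))
```

Under these assumptions, a candidate sequence would meet an "at least 70% identical" limit when `percent_identity` returns 70.0 or more against the reference sequence or against its reverse complement.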

By "specifically hybridizes" is meant hybridization, under stringent hybridization conditions, of a first polynucleotide (e.g., a probe or primer) to a second polynucleotide (e.g., a target sequence) to a detectably greater degree than hybridization of the first polynucleotide to non-target polynucleotide sequences and/or to the substantial exclusion of non-target polynucleotide sequences. Selectively hybridizing sequences have at least 70% sequence identity, at least 80% sequence identity, at least 90% sequence identity, or 100% sequence identity (e.g., complementary) with each other.

The term "stringent conditions" refers to conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents, such as formamide.
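The temperature guidance in this definition can be illustrated with a short Python sketch. The specification does not state how Tm is calculated; the sketch assumes the Wallace rule (2°C per A or T, 4°C per G or C), a common rule of thumb that is only a rough estimate and is generally applied to short oligonucleotides. The function names are hypothetical.

```python
def wallace_tm_celsius(oligo: str) -> float:
    """Rough Tm estimate by the Wallace rule: 2*(A+T) + 4*(G+C).

    An assumed approximation for short oligonucleotides; it ignores salt
    concentration, pH, and destabilizing agents such as formamide.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be a non-empty string of A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature_celsius(oligo: str, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm_celsius(oligo) - offset


def minimum_temperature_celsius(probe_length: int) -> float:
    """Lower bound from the definition: about 30 C for probes of 10 to 50
    nucleotides and about 60 C for probes longer than 50 nucleotides."""
    if probe_length < 10:
        raise ValueError("the definition gives no bound below 10 nucleotides")
    return 60.0 if probe_length > 50 else 30.0


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nucleotide probe
    print(wallace_tm_celsius(probe))
    print(stringent_temperature_celsius(probe))
    print(minimum_temperature_celsius(len(probe)))
```

The two temperature functions are kept separate because the definition states the 5°C offset and the 30°C or 60°C lower bounds as independent guidelines and does not say how they should be combined.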

A "subject" is a vertebrate, such as a mammal (e.g., primates and humans). Mammals also include, but are not limited to, farm animals (such as cows), sport animals, pets (such as cats, dogs, and horses), mice, and rats.

By "therapeutically effective amount" is meant an amount of a therapeutic agent that alone, or together with one or more additional (optional) therapeutic agents, produces beneficial or desired results upon administration to a mammal. The therapeutically effective amount depends upon the context in which the therapeutic agent is applied. For example, in the context of administering a composition including a therapeutic agent, the therapeutically effective amount of the composition is an amount sufficient to achieve a reduction in the level of an infectious virus, such as HIV or SIV (e.g., as measured by a stabilization or an increase in CD4 T cell count relative to a control), and/or a reduction in the level of enteropathy (e.g., as measured by a decrease in serum LBP levels relative to a control) as compared to a response obtained without administration of the composition, and/or to prevent the propagation of an infectious virus (e.g., HIV) in a subject (e.g., a human) having an increased risk of viral infection. Ideally, a therapeutically effective amount provides a therapeutic effect without causing a substantial cytotoxic effect in the subject. In general, a therapeutically effective amount of a composition administered to a subject (e.g., a human subject) will vary depending upon a number of factors associated with that subject, for example the overall health of the subject, the condition to be treated, or the severity of the condition. A therapeutically effective amount of a composition can be determined by varying the dosage of the product and measuring the resulting therapeutic response.

As used herein, and as well understood in the art, "treatment" is an approach for obtaining beneficial or desired results, such as clinical results. Beneficial or desired results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions; diminishment of extent of disease, disorder, or condition; stabilization (i.e., not worsening) of a state of disease, disorder, or condition; prevention of spread of disease, disorder, or condition; delay or slowing the progress of the disease, disorder, or condition; amelioration or palliation of the disease, disorder, or condition; and remission (whether partial or total), whether detectable or undetectable. "Palliating" a disease, disorder, or condition means that the extent and/or undesirable clinical manifestations of the disease, disorder, or condition are lessened and/or time course of the progression is slowed or lengthened, as compared to the extent or time course in the absence of treatment.

Other features and advantages of the invention will be apparent from the following Detailed Description, the drawings, and the claims.

Brief Description of the Drawings

Figure 1A is a graph showing the taxonomic distribution of sequences identified in feces of pathogenic SIV-infected (SIV+) and uninfected (SIV-) control rhesus monkeys housed at the NEPRC 24 weeks after intrarectal infection with SIVmac251. The flanking doughnut chart displays the averaged values per kingdom for SIV+ or SIV- monkeys.

Figure 1B is a graph showing the taxonomic distribution of sequences identified in feces of pathogenic SIV-infected (SIV+) and uninfected (SIV-) control rhesus monkeys described in Figure 1A housed at the NEPRC 64 weeks after SIV infection. * indicates euthanized for progressive AIDS 24 to 64 weeks after SIV infection. The flanking doughnut chart displays the averaged values per kingdom for SIV+ or SIV- monkeys.

Figure 1C is a graph showing the taxonomic distribution of sequences identified in feces of pathogenic SIV-infected (SIV+) and uninfected (SIV-) control rhesus monkeys housed at the TNPRC 23-64 weeks after intravaginal infection with SIVmac251. The flanking doughnut chart displays the averaged values per kingdom for SIV+ or SIV- monkeys.

Figure 1D is a graph showing the taxonomic distribution of sequences identified in feces of nonpathogenic SIV-infected (SIV+) and control (SIV-) vervet African green monkeys housed at the NIH at least three years after intravenous infection with SIVagm90, SIVagmVer1, or after natural infection in the wild. The flanking doughnut chart displays the averaged values per kingdom for SIV+ or SIV- monkeys.

Figure 1E is a graph showing the taxonomic distribution of sequences identified in feces of nonpathogenic SIV-infected (SIV+) and control (SIV-) sabaeus African green monkeys housed at the NEPRC and infected intravenously with SIVagmMJ8, SIVagm9315BR, or uninfected controls. The flanking doughnut chart displays the averaged values per kingdom for SIV+ or SIV- monkeys.

Figure 2A is a graph showing the quantitation of sequences from different kingdoms of life identified in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the NEPRC 24 weeks after SIV infection. The nature of SIV infection is as defined in Figure 1A.

Figure 2B is a graph showing the quantitation of sequences from different kingdoms of life identified in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the NEPRC 64 weeks after SIV infection. The nature of SIV infection is as defined in Figure 1B.

Figure 2C is a graph showing the quantitation of sequences from different kingdoms of life identified in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the TNPRC. The nature of SIV infection is as defined in Figure 1C.

Figure 2D is a graph showing the quantitation of sequences from different kingdoms of life identified in the feces of nonpathogenic SIV-infected and control vervet African green monkeys housed at the NIH. The nature of SIV infection is as defined in Figure 1D.

Figure 2E is a graph showing the quantitation of sequences from different kingdoms of life identified in the feces of nonpathogenic SIV-infected and control sabaeus African green monkeys housed at the NEPRC. The nature of SIV infection is as defined in Figure 1E.

Figure 3A is a graph showing SIV RNA levels in animals in the NEPRC cohort.

Figure 3B is a graph showing CD4 T cell (CD4) numbers in animals in the NEPRC cohort.

Figure 3C is a graph showing serum LPS binding protein (LBP) levels in animals in the NEPRC cohort.

Figure 4A is a chart showing the distribution of virus sequences present in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the NEPRC 24 weeks after SIV infection. "Mammalian" indicates that sequences were most closely related to viruses that infect mammals. Viruses infecting non-mammals are referred to as "other." "Unclassified viruses" includes all unclassified viruses, e.g., Chronic bee paralysis virus, Chimpanzee stool associated circular ssDNA virus, Circovirus-like genome RW-C, Circovirus-like genome CB-A, and Rodent stool-associated circular genome virus.

Figure 4B is a chart showing the distribution of virus sequences present in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the NEPRC 64 weeks after SIV infection. * indicates euthanized for progressive AIDS between 24 and 64 weeks after SIV infection. Virus classifications as described in Figure 4A.

Figure 4C is a chart showing the distribution of virus sequences present in the feces of pathogenic SIV-infected and control rhesus monkeys housed at the TNPRC. Virus classifications as described in Figure 4A.

Figure 4D is a chart showing the distribution of virus sequences present in the feces of nonpathogenic SIV-infected and control vervet African green monkeys housed at the NIH. Virus classifications as described in Figure 4A.

Figure 4E is a chart showing the distribution of virus sequences present in the feces of nonpathogenic SIV-infected and control sabaeus African green monkeys housed at the NEPRC. Virus classifications as described in Figure 4A.

Figure 4F is a graph showing the average number of picornavirus sequences, after normalization for analysis using MEGAN, detected in the indicated cohorts of SIV-infected (+) and control (-) rhesus monkeys.

Figure 5A is a set of schematic diagrams showing the assembled viral contigs (in gray) from newly identified WUHARV Caliciviruses 1 (animal 39), 2 (from an animal not included in the cohort), and 3 (animal 39) compared to Tulane calicivirus (black bar). Calicivirus 1 contig 1 derived from 879 sequences, length = 6578 bp; Calicivirus 2 contig 1 derived from 16 sequences, length = 812 bp; Calicivirus 2 contig 2 assembled from 120 sequences, length = 5083 bp; Calicivirus 3 contig 1 assembled from 14 sequences, length = 750 bp; Calicivirus 3 contig 2 assembled from 67 sequences, length = 2111 bp; Calicivirus 3 contig 3 assembled from 41 sequences, length = 832 bp; Calicivirus 3 contig 4 assembled from 38 sequences, length = 1273 bp. Animal numbers refer to the monkeys in Figure 1A. Percentages indicate the nucleotide identity over the designated length of the best aligned homologous region (indicated by the double-headed arrow) compared to the most closely related genome indicated in the black bar.

Figure 5B is a set of schematic diagrams showing the assembled viral contigs (in gray) from newly identified WUHARV Parvovirus 1 (animal 39) and 2 (animal 35) compared with the sequence of canine or mouse parvovirus 4a (black bars), as indicated. Parvovirus 1 contig 1 assembled from 375 sequences, length = 4905 bp; Parvovirus 2 contig 1 representing 1 sequence, length = 470 bp; Parvovirus 2 contig 2 assembled from 6 sequences, length = 690 bp. Animal numbers refer to the monkeys in Figure 1A. Percentages indicate the nucleotide identity over the designated length of the best aligned homologous region (indicated by the double-headed arrow) compared to the most closely related genome indicated in the black bar.

Figure 5C is a set of schematic diagrams showing the assembled viral contigs (in gray) from newly identified WUHARV Enterovirus 1 (animal 41), 2 (animal 39), and 3 (animal 33) compared with the sequence of Simian enterovirus SV19 (black bar). Enterovirus 1 assembled from 1084 sequences, length = 7273 bp; Enterovirus 2 assembled from 758 sequences, length = 7128 bp; Enterovirus 3 assembled from 406 sequences, length = 6962 bp. Animal numbers refer to the monkeys in Figure 1A. Percentages indicate the nucleotide identity over the designated length of the best aligned homologous region (indicated by the double-headed arrow) compared to the most closely related genome indicated in the black bar.

Figure 5D is a set of schematic diagrams showing the assembled viral contigs (in gray) from newly identified WUHARV Sapelovirus 1 (animal 42), 2 (animal 41), and 3 (animal 37) compared with the sequence of Simian Sapelovirus 1 strain 2383 (black bar). Sapelovirus 1 assembled from 3081 sequences, length = 8059 bp; Sapelovirus 2 assembled from 2711 sequences, length = 8025 bp; Sapelovirus 3 assembled from 380 sequences, length = 6872 bp. Animal numbers refer to the monkeys in Figure 1A. Percentages indicate the nucleotide identity over the designated length of the best aligned homologous region (indicated by the double-headed arrow) compared to the most closely related genome indicated in the black bar.

Figure 5E is a chart showing the presence of viral sequences as detected by PCR using virus-specific primers (Table 1). Numbers below the chart refer to the animals in Figure 1A. "a" refers to lack of detection of a virus likely due to the presence of a divergent virus; "b" refers to lack of detection of a virus for unknown reasons; and "c" refers to detection of virus sequences in serum samples taken at the time of euthanasia for AIDS.

Figure 6A is a set of schematic diagrams showing the assembled viral contigs (in gray) from newly identified WUHARV Adenovirus 1 (animal #40) compared to the known virus Simian adenovirus 1 strain ATCC VR-195 (black bar). These contigs were assembled from 1308 sequences. Animal numbers refer to the monkeys in Figure 1A. Percentages indicate the nucleotide identity over the designated length of the best aligned homologous region (indicated by the double-headed arrow) compared to the most closely related genome indicated in the black bar.

Figure 6B is an agarose gel showing PCR confirmation of WUHARV Adenovirus 1 during amplification, plaque purification, and cesium chloride gradient purification. The three PCR products for each sample (lanes 2-19) were derived from primers 4302c3f and 4302c3r, 4302c18f and 4302c18r, and 4302c1f and 4302c1r, respectively (Table 1). Lane 1 is a molecular weight ladder.

Figure 6C is a set of images showing representative histopathology (top panels) and adenovirus immunohistochemistry (IHC) (bottom panels) for animal #23. Adenovirus infection was associated with villous atrophy and fusion (i) and sloughed epithelial cells that contained intranuclear adenoviral inclusions (arrows in (ii)). Adenovirus antigen could be localized to villous tip epithelium by immunohistochemistry (brown color of DAB chromogen, Mayer's counterstain; (iii) and (iv)). Scale bars in (i) and (iii) are 0.5 mm. Scale bars in (ii) and (iv) are 200 μm.

Figure 6D is a set of images showing representative histopathology (top panels) and adenovirus immunohistochemistry (IHC) (bottom panels) for animal #27. Adenovirus infection was associated with villous atrophy and fusion (i) and sloughed epithelial cells that contained intranuclear adenoviral inclusions (arrows in (ii)). Adenovirus antigen could be localized to villous tip epithelium by immunohistochemistry (brown color of DAB chromogen, Mayer's counterstain; (iii) and (iv)). Scale bars in (i) and (iii) are 0.5 mm. Scale bars in (ii) and (iv) are 200 μm.

Figure 7A is a diagram showing the neighbor-joining phylogenetic analysis of the predicted full-length non-structural polyprotein of WUHARV Calicivirus 1.

Figure 7B is a diagram showing the neighbor-joining phylogenetic analysis of the predicted non-structural protein of WUHARV Parvovirus 1.

Figure 7C is a diagram showing the neighbor-joining phylogenetic analysis of the full genome of WUHARV Enteroviruses 1, 2, and 3, and WUHARV Sapeloviruses 1 and 2.

Figure 8A is a rank abundance plot for SIV- animals constructed using both 16S rDNA sequencing from a previous study performed from TNPRC (McKenna et al., PLoS Pathog. 4: e20 (2008)) and our next-generation sequencing (NGS) data from TNPRC (Table 2; Figure 9C). Samples for 16S rDNA sequencing were obtained over a period of months in 1996, while samples for this study were obtained in 2011. Circles indicate 16S rDNA sequencing data from McKenna et al.; squares indicate sequencing data from this study.

Figure 8B is a rank abundance plot for SIV+ animals constructed using both 16S rDNA sequencing from a previous study performed from TNPRC (McKenna et al., PLoS Pathog. 4: e20 (2008)) and our next-generation sequencing (NGS) data from TNPRC (Table 2; Figure 9C). Samples for 16S rDNA sequencing were obtained over a period of months in 1996, while samples for this study were obtained in 2011. Circles indicate 16S rDNA sequencing data from McKenna et al.; squares indicate sequencing data from this study.

Figure 8C is a set of graphs showing species accumulation (left panel), Shannon's diversity (middle panel), and Pielou's evenness (right panel) for SIV-infected and control monkeys housed at NEPRC for 24 weeks. The species accumulation curve was constructed for SIV-infected (red) and uninfected control (blue) rhesus monkeys by quantifying the average number of bacterial families identified as additional animals were added to the analysis. The corresponding Shannon's diversity and Pielou's evenness ranges were calculated for equivalent sample numbers based on the minimum sample number between SIV+ and SIV- animals. When this minimum number was less than the maximum number of animals, 100 random samples with replacement were used to determine the sample mean. Differences between means were assessed using an unpaired Student's t-test. The nature of SIV infection is as defined in Figures 1A-1D.

Figure 8D is a set of graphs showing species accumulation (left panel), Shannon's diversity (middle panel), and Pielou's evenness (right panel) for SIV-infected and control monkeys housed at NEPRC for 64 weeks. The species accumulation curve and corresponding Shannon's diversity and Pielou's evenness ranges were calculated as described for Figure 8C. The nature of SIV infection is as defined in Figures 1A-1D.

Figure 8E is a set of graphs showing species accumulation (left panel), Shannon's diversity (middle panel), and Pielou's evenness (right panel) for SIV-infected and control monkeys housed at TNPRC 23-64 weeks after intravaginal infection with SIVmac251. The species accumulation curve and corresponding Shannon's diversity and Pielou's evenness ranges were calculated as described for Figure 8C. The nature of SIV infection is as defined in Figures 1A-1D.

Figure 8F is a set of graphs showing species accumulation (left panel), Shannon's diversity (middle panel), and Pielou's evenness (right panel) for SIV-infected and control vervet African green monkeys housed at the NIH after intravenous infection with SIVagm90, SIVagmVer1, or after natural infection in the wild. The species accumulation curve and corresponding Shannon's diversity and Pielou's evenness ranges were calculated as described for Figure 8C. The nature of SIV infection is as defined in Figures 1A-1D.

Figure 8G is a graph showing species accumulation for SIV-infected and control sabaeus African green monkeys housed at NEPRC and infected intravenously with SIVagmMJ8 or SIVagm9315BR. The species accumulation curve was calculated as described for Figure 8C. Based on the lack of comparable family richness between SIV-infected and control animals in this cohort, we do not report diversity or evenness.
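For reference, the diversity measures named in the descriptions of Figures 8C-8F can be sketched in Python as follows. The sketch uses the standard textbook definitions, Shannon's diversity H' = -Σ p_i ln p_i and Pielou's evenness J = H' / ln S, where p_i is the fraction of sequences assigned to bacterial family i and S is the number of families observed. The use of the natural logarithm, the function names, and the example counts are assumptions for illustration and are not taken from the study.

```python
import math
from typing import Sequence


def shannon_diversity(counts: Sequence[int]) -> float:
    """Shannon's diversity H' from per-family sequence counts (natural log assumed)."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one sequence is required")
    # Families with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's evenness J = H' / ln(S), with S the number of observed families."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for a single family; 0.0 is a convention chosen here
    return shannon_diversity(counts) / math.log(richness)


if __name__ == "__main__":
    family_counts = [120, 80, 40, 10]  # hypothetical counts for four bacterial families
    print(shannon_diversity(family_counts), pielou_evenness(family_counts))
```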

Figure 9A is a heatmap displaying the number of sequences assigned to specific bacterial families for each individual pathogenic SIV-infected and control rhesus monkey housed at the NEPRC 24 weeks after SIV infection. The nature of SIV infection is as defined for Figure 1A.

Figure 9B is a heatmap displaying the number of sequences assigned to specific bacterial families for each individual pathogenic SIV-infected and control rhesus monkey housed at the NEPRC 64 weeks after SIV infection. The nature of SIV infection is as defined for Figure 1B.

Figure 9C is a heatmap displaying the number of sequences assigned to specific bacterial families for each individual pathogenic SIV-infected and control rhesus monkey housed at the TNPRC. The nature of SIV infection is as defined for Figure 1C.

Figure 9D is a heatmap displaying the number of sequences assigned to specific bacterial families for each individual nonpathogenic SIV-infected and control vervet African green monkey housed at the NIH. The nature of SIV infection is as defined for Figure 1D.

Detailed Description

The present invention relates to the discovery that pathogenic SIV infection is associated with a significant and unexpected expansion of the enteric virome, as detected using next generation sequencing (NGS) of RNA plus DNA. We documented a remarkable number of differences in the fecal virome between pathogenically SIV-infected monkeys, uninfected control monkeys, and monkeys infected with non-pathogenic SIV. These findings included increases in viral sequences, the presence of novel viruses, the association of unsuspected adenovirus infection with intestinal disease and enteric epithelial pathology, and viremia with enteric parvoviruses in advanced AIDS. At least 32 new viruses were detected from genera that cause diseases in mammalian hosts including adenoviruses, caliciviruses, parvoviruses, picornaviruses, and polyomaviruses (see, for example, Table 3 or Figure 5 for a summary of the identified viruses). Our assignment of viral sequences to new viruses was conservative, and thus additional sequencing may detect additional viruses in the enteric virome in SIV-infected animals.

Application of standard diagnostic approaches, such as PCR or culture, would not have identified the breadth of divergent viruses detected here, and therefore would have underestimated both the potential causes of enteritis or systemic viral infection and the diversity of antigens which might contribute to enteropathy and immune activation. Our findings show that the nature of the enteric virome can be used as a prognostic indicator of HIV progression. The nature of the enteric virome may also contribute to AIDS pathogenesis by damaging the intestinal epithelium to allow access of microbes, PAMPs, and viral antigens into tissues and the circulation to activate the immune system and stimulate lentivirus replication.

These data challenge the notion that abnormalities in the intestinal tract in pathogenic SIV-infected primates are due to direct effects of SIV or indirect effects of SIV on immune responses to enteric bacteria (Sandler et al., J. Infect. Dis. 203: 780-790 (2011)). Instead, immunocompromise during lentivirus infection appears to be associated with significant expansion of the enteric virome, which results in damage to the intestine, as shown for adenoviruses in the present study.

Such damage could provide access for bacterial PAMPs, or as shown here enteric viruses, into tissues and the circulation. It is already recognized that bacterial and viral contributions to intestinal pathology are not independent of each other. Clear synergies between the virome, bacteria, and host genes have been documented in murine systems (Bloom et al., Cell Host Microbe 9: 390-403 (2011); Cadwell et al., Cell 141: 1135-1145 (2010); Virgin et al., Cell 147: 44-56 (2011)). Importantly, it is not clear how bacterial PAMPs would explain the T cell activation characteristic of the systemic immune activation associated with AIDS progression. Our data suggest that T and B cell activation might be due to immune responses to unexpected viral antigens, as for example the parvovirus we detected in the circulation of a subset of animals. Unsuspected viral infections might also contribute to the high levels of IFN-α noted in the circulation of untreated AIDS patients. Searching for virus-specific T cell responses requires knowledge of the sequence of the viral proteins present, indicating the importance of sequencing the virome to define potential antigens that might drive immune activation in lentivirus-infected hosts.

Polynucleotides of the Invention

As a first aspect, the invention provides polynucleotide sequences related to previously undiscovered viruses of the enteric virome. The isolated polynucleotides may include all or a portion of a nucleotide sequence that is at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 1-107, or a reverse complement thereof. In some embodiments, the isolated polynucleotides include a label (e.g., a fluorophore, a hapten, an enzyme, or a radioisotope). The isolated polynucleotides of the invention may include at least 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000 or more contiguous or non-contiguous nucleotides of a reference polynucleotide molecule. In some embodiments, the polynucleotides of the invention are between 10-100 nucleotides in length, more particularly between 10-30 nucleotides in length (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length), and can be at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 332-371. SEQ ID NOs: 332-371 (see, for example, Table 1) disclose primers that can be utilized in a PCR assay to screen for the presence of the viruses.
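As a non-limiting illustration, the percent identity and reverse complement relationships recited above can be sketched in Python. The sketch is a simplification under stated assumptions: the two sequences are taken to be already aligned and of equal length, with no gaps, whereas sequence identity is ordinarily determined with an alignment program. The function names and example sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity(query: str, reference: str, threshold: float = 70.0) -> bool:
    """True if the query reaches the threshold against the reference or its reverse complement."""
    forward = percent_identity(query, reference)
    reverse = percent_identity(query, reverse_complement(reference))
    return max(forward, reverse) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTTT"  # hypothetical reference fragment
    query = "ACGTACGTAC"      # hypothetical query; 8 of 10 positions match
    print(percent_identity(query, reference), meets_identity(query, reference))
```

The default threshold of 70% mirrors the lowest identity level recited above; any of the higher levels can be supplied through the threshold parameter.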

Polypeptides of the Invention

In another aspect, the invention features isolated polypeptides including all or a portion of an amino acid sequence that is at least 70% identical (e.g., at least 71%, 72%, 73%, or 74% identical), at least 75% identical (e.g., at least 76%, 77%, 78%, or 79% identical), at least 80% identical (e.g., at least 81%, 82%, 83%, or 84% identical), at least 85% identical (e.g., at least 86%, 87%, 88%, or 89% identical), at least 90% identical (e.g., at least 91%, 92%, 93%, or 94% identical), at least 95% identical (e.g., at least 96%, 97%, 98%, or 99% identical), or 100% identical to any one of SEQ ID NOs: 108-331. In some embodiments, the isolated polypeptides include a label (e.g., a fluorophore, a hapten, an enzyme, or a radioisotope). The isolated polypeptides of the invention may include at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 50, 75, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, or 350 or more contiguous or non-contiguous amino acids of a reference polypeptide molecule.

Antibodies, Recombinant Expression Systems, and Viruses of the Invention

The invention features isolated antibodies, or fragments thereof, that are specific for one or more of the isolated polynucleotides or polypeptides of the invention. The isolated antibodies, or fragments thereof, may be chimeric, human, humanized, or synthetic, and may further include a label.

In another aspect, the invention features recombinant expression systems for the production of a protein, or fragment thereof, that is encoded by the polynucleotides of the invention. The recombinant expression system may be an in vitro or an in vivo expression system and may further include a cell. The cell may be a bacterial cell (e.g., an E. coli cell), a plant cell, or a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell).

In yet another aspect, the invention features recombinant viruses including one or more of the isolated polynucleotides and/or one or more of the isolated polypeptides of the invention. In one preferred embodiment, the viruses further include a genome including a heterologous nucleic acid encoding an antigenic gene product of interest or fragment thereof, or the viruses further include a capsid including a heterologous antigenic gene product of interest or fragment thereof. The antigenic gene product, or fragment thereof, may include a bacterial, viral, parasitic, or fungal gene product, or fragment thereof. Non-limiting examples of bacterial gene products, or fragments thereof, include 10.4, 85A, 85B, 86C, CFP-10, Rv3871, and ESAT-6 gene products, or fragments thereof, of Mycobacterium; O, H, and K antigens, or fragments thereof, of E. coli; and protective antigen (PA), or fragments thereof, of Bacillus anthracis. Non-limiting examples of viral gene products, or fragments thereof, include Gag, Pol, Nef, Tat, Rev, Vif, Vpr, or Vpu, or fragments thereof, of HIV and other retroviruses; gD antigen, or fragments thereof, of HSV; and Env, or fragments thereof, of all envelope protein-containing viruses. Non-limiting examples of parasitic gene products, or fragments thereof, include circumsporozoite (CS) protein, gamete surface proteins Pfs230 and Pfs48/45, and Liver Specific Antigens 1 or 3 (LSA-1 or LSA-3), or fragments thereof, of Plasmodium falciparum. Non-limiting examples of fungal gene products, or fragments thereof, include any cell wall mannoprotein (e.g., Afmp1 of Aspergillus fumigatus) or surface-expressed glycoprotein (e.g., SOWgp of Coccidioides immitis). In some embodiments, all or a portion of the recombinant virus is from the viral family Adenoviridae, Parvoviridae, Caliciviridae, Papillomaviridae, Picobirnaviridae, Picornaviridae, or Polyomaviridae. In preferred embodiments, the viral family is Adenoviridae or Parvoviridae.
For example, in some embodiments, all or a portion of the recombinant virus of the invention may be from WUHARV Adenovirus 1 and have all or a portion of a nucleotide sequence that is at least 70% identical to any one of SEQ ID NOs: 1-13 and/or express all or a portion of a polypeptide sequence that is at least 70% identical to any one of SEQ ID NOs: 108-163. In some embodiments, the recombinant virus of the invention may be from WUHARV Adenovirus 2 or 3 and have all or a portion of a nucleotide sequence that is at least 70% identical to any one of SEQ ID NOs: 14-54 and/or express all or a portion of a polypeptide sequence that is at least 70% identical to any one of SEQ ID NOs: 164-256. In some embodiments, the recombinant virus of the invention may be from WUHARV Adenovirus 4 and have all or a portion of a nucleotide sequence that is at least 70% identical to SEQ ID NO: 55 or SEQ ID NO: 56 and/or express all or a portion of a polypeptide sequence that is at least 70% identical to SEQ ID NO: 257 or SEQ ID NO: 258. In some embodiments, the recombinant virus of the invention may be from WUHARV Adenovirus 5 and have all or a portion of a nucleotide sequence that is at least 70% identical to any one of SEQ ID NOs: 57-69 and/or express all or a portion of a polypeptide sequence that is at least 70% identical to any one of SEQ ID NOs: 259-277.

Detection of Acquired Immune Deficiency Syndrome (AIDS) or AIDS Progression

Discovery of the expansion of the enteric virome in nonhuman primates infected with pathogenic SIV, but not with non-pathogenic SIV, has profound implications for understanding AIDS pathogenesis in these animals and suggests a similar expansion of the enteric virome in human AIDS. Our data are consistent with a model in which immunosuppression results in increased levels of enteric viral infection which, in a feed-forward manner, contributes to AIDS via damage to the intestinal mucosa and induction of systemic immune activation that accelerates AIDS progression. This study shows the pathogenetic potential of the enteric virome, as exemplified by animals with enteritis associated with adenovirus infection or parvovirus viremia. By sequencing both RNA and DNA and by using metagenomic approaches, rather than focusing on bacterial 16S rDNA analysis, we have documented a new set of viruses associated with clinical AIDS progression in rhesus monkeys. Since these viruses include many potential pathogens, studies of HIV and SIV pathogenesis should take them into account as possible contributors to disease progression. This provides substantial opportunity to explain and eventually intervene in the processes that lead to AIDS clinical disease progression. Our data indicate that the expansion of the enteric virome can be used as a marker for rapidly progressive disease.

Accordingly, the present invention also relates to methods of detecting acquired immune deficiency syndrome (AIDS) and/or AIDS progression in a subject by detecting one or more target nucleotide sequences from a sample of the subject that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention, where the detection of an increase in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates AIDS and/or AIDS progression in the subject. Detecting the one or more target nucleotide sequences may include synthesizing cDNA from RNA of the sample, and may utilize a PCR assay for detection, such as a real time PCR (RT-PCR) assay and/or a nested PCR assay. SEQ ID NOs: 332-371 (see, for example, Table 1) disclose primers that can be utilized in a PCR assay to screen for the presence of the viruses. The sample may be a tissue, organ, liquid, or feces sample from a mammal, preferably a primate or a human. This method of detecting AIDS and/or AIDS progression in a subject can be used alone, in conjunction, or in parallel with known method(s) of detecting AIDS and/or AIDS progression, such as by the detection of CD4 T cell levels.
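Purely as an illustration of the comparison recited above, an increase in the level of a target sequence relative to a control can be sketched in Python as a fold change between normalized sequence counts. Everything specific in the sketch is an assumption rather than part of the described method: the normalization to counts per million sequences, the pseudocount, the two-fold cutoff, the function names, and the example values are all hypothetical.

```python
def normalized_level(target_reads: int, total_reads: int, scale: float = 1e6) -> float:
    """Target sequence count per `scale` total sequences (assumed normalization)."""
    if total_reads <= 0 or target_reads < 0 or target_reads > total_reads:
        raise ValueError("read counts are inconsistent")
    return scale * target_reads / total_reads


def fold_change(subject_level: float, control_level: float, pseudocount: float = 1.0) -> float:
    """Ratio of subject to control level; the pseudocount avoids division by zero."""
    return (subject_level + pseudocount) / (control_level + pseudocount)


def indicates_increase(subject_level: float, control_level: float, min_fold: float = 2.0) -> bool:
    """True when the subject level exceeds the control by at least `min_fold` (assumed cutoff)."""
    return fold_change(subject_level, control_level) >= min_fold


if __name__ == "__main__":
    subject = normalized_level(target_reads=500, total_reads=1_000_000)  # hypothetical sample
    control = normalized_level(target_reads=25, total_reads=1_000_000)   # hypothetical control
    print(fold_change(subject, control), indicates_increase(subject, control))
```

In practice, the cutoff would need to be established empirically, for example against known measures such as CD4 T cell levels.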

Diagnosis of Immunodeficiency or Enteropathy

The compositions of the invention may be used for other diagnostic purposes. In some aspects, the invention features methods of diagnosing, or providing a prognostic indicator of, immunodeficiency and/or enteropathy in a subject including detecting one or more target nucleotide sequences from a sample of the subject that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention, where the detection of an increase in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from a control subject, indicates the presence of, or the propensity to develop, immunodeficiency and/or enteropathy in the subject. Detecting one or more target nucleotide sequences may include synthesizing cDNA from RNA of the sample, and may utilize a PCR assay for detection, such as a real time PCR (RT-PCR) assay and/or a nested PCR assay. SEQ ID NOs: 332-371 (see, for example, Table 1) disclose primers that can be utilized in a PCR assay to screen for the presence of the viruses. The sample may be a tissue, organ, liquid, or feces sample from a mammal, preferably a primate or a human. This method of diagnosing, or providing a prognostic indicator of, immunodeficiency and/or enteropathy in a subject can be used alone, in conjunction, or in parallel with known method(s) of diagnosing, or providing a prognostic indicator of, immunodeficiency and/or enteropathy, such as by the detection of CD4 T cell levels and/or serum LPS binding protein (LBP) levels.

Treatment of Immunodeficiency or Enteropathy

In other aspects, the compositions of the invention may be used for therapeutic purposes. For example, the invention features nucleic acid- or polypeptide-based vaccines. The vaccines may include a vector that includes the polynucleotides of the invention or a vaccine that includes a polypeptide of the invention. In addition, the invention features methods of identifying a therapeutic agent for use in treating immunodeficiency and/or enteropathy in a subject including detecting one or more target nucleotide sequences that specifically hybridize under stringent conditions to one or more of the polynucleotides of the invention from a sample of a subject administered a therapeutically effective amount of a candidate agent, where the detection of a decrease in the level of the one or more target nucleotide sequences in the subject, relative to the level of one or more target nucleotide sequences from the subject prior to administration or a control subject, identifies the candidate agent as the therapeutic agent. In some embodiments, the candidate agent is administered to the subject in a therapeutically effective amount. In other embodiments, the immunodeficiency and/or enteropathy is associated with a lentivirus (e.g., human immunodeficiency virus (HIV) or simian immunodeficiency virus (SIV)). Detecting one or more target nucleotide sequences may include synthesizing cDNA from RNA of the sample, and may utilize a PCR assay for detection, such as a real time PCR (RT-PCR) assay and/or a nested PCR assay. SEQ ID NOs: 332-371 (see, for example, Table 1) disclose primers that can be utilized in a PCR assay to screen for the presence of the viruses. The sample may be a tissue, organ, liquid, or feces sample from a mammal, preferably a primate or human.

Administration of a therapeutic agent

The vaccines of the invention or the therapeutic agent, once identified by the methods of the invention, can be administered to a subject (e.g., a human), pre- or post-lentiviral (e.g., HIV) infection, to treat, prevent, ameliorate, inhibit the progression of, or reduce the severity of immunocompromise and/or enteropathy. The subject, at the time of administration, may present as symptomatic or asymptomatic. In addition, the vaccine or identified therapeutic agent may also treat, prevent, ameliorate, inhibit the progression of, or reduce the severity of one or more symptoms, if present, of lentiviral (e.g., HIV) infection. Examples of the symptoms caused by lentiviral infection include one or more of, e.g., fever, muscle aches, coughing, sneezing, runny nose, sore throat, headache, chills, diarrhea, vomiting, rash, weakness, dizziness, bleeding under the skin, in internal organs, or from body orifices like the mouth, eyes, or ears, shock, nervous system malfunction, delirium, seizures, renal (kidney) failure, personality changes, neck stiffness, dehydration, lethargy, paralysis of the limbs, confusion, back pain, loss of sensation, impaired bladder and bowel function, and sleepiness that can progress into coma or death. These symptoms, and their resolution during treatment, may be measured by, e.g., a physician during a physical examination or by other tests and methods known in the art.

The vaccines or therapeutic agents can be formulated for administration alone or as a pharmaceutical composition by a route selected from, e.g., intramuscular, intravenous, intradermal, intraarterial, intraperitoneal, intralesional, intracranial, intraarticular, intraprostatical, intrapleural, intratracheal, intranasal, intravitreal, intravaginal, intrarectal, topical, intratumoral, peritoneal, subcutaneous, subconjunctival, intravesicular, mucosal, intrapericardial, intraumbilical, intraocular, oral, or local administration, or by inhalation, by injection, by infusion, by continuous infusion, by localized perfusion bathing target cells directly, by catheter, by lavage, by gavage, in cremes, or in lipid compositions. The preferred method of administration can vary depending on various factors (e.g., the components of the composition being administered and the severity of the condition being treated).

Formulations suitable for oral or nasal administration may consist of liquid solutions, such as an effective amount of the composition dissolved in a diluent (e.g., water, saline, or PEG-400), capsules, sachets, tablets, or gels, each containing a predetermined amount of the chimeric Ad5 vector composition of the invention. The pharmaceutical composition may also be an aerosol formulation for inhalation, e.g., to the bronchial passageways. Aerosol formulations may be mixed with pressurized, pharmaceutically acceptable propellants (e.g., dichlorodifluoromethane, propane, or nitrogen). In particular, administration by inhalation can be accomplished by using, e.g., an aerosol containing sorbitan trioleate or oleic acid, for example, together with trichlorofluoromethane, dichlorofluoromethane, dichlorotetrafluoroethane, or any other biologically compatible propellant gas.

Immunogenicity of the vaccine of the invention may be significantly improved if it is coadministered with an immunostimulatory agent or adjuvant. Suitable adjuvants well-known to those skilled in the art include, e.g., aluminum phosphate, aluminum hydroxide, QS21, Quil A (and derivatives and components thereof), calcium phosphate, calcium hydroxide, zinc hydroxide, glycolipid analogs, octadecyl esters of an amino acid, muramyl dipeptides, polyphosphazene, lipoproteins, ISCOM matrix, DC-Chol, DDA, cytokines, and other adjuvants and derivatives thereof.

The vaccine or therapeutic agent of the invention, or a pharmaceutical composition including the same, may be formulated to release the vaccine or therapeutic agent immediately upon administration (e.g., targeted delivery) or at any predetermined time period after administration using controlled or extended release formulations. Administration in controlled or extended release formulations is useful where the vaccine or agent, either alone or in combination, has (i) a narrow therapeutic index (e.g., the difference between the plasma concentration leading to harmful side effects or toxic reactions and the plasma concentration leading to a therapeutic effect is small; generally, the therapeutic index, TI, is defined as the ratio of median lethal dose (LD50) to median effective dose (ED50)); (ii) a narrow absorption window at the site of release (e.g., the gastro-intestinal tract); or (iii) a short biological half-life, so that frequent dosing during a day is required in order to sustain a therapeutic level.

Many strategies can be pursued to obtain controlled or extended release in which the rate of release outweighs the rate of metabolism of the vaccine or therapeutic agent, or the pharmaceutical composition including the same. For example, controlled release can be obtained by the appropriate selection of formulation parameters and ingredients, including, e.g., appropriate controlled release compositions and coatings. Suitable formulations are known to those of skill in the art. Examples include single or multiple unit tablet or capsule compositions, oil solutions, suspensions, emulsions, microcapsules, microspheres, nanoparticles, patches, and liposomes.

The vaccine or therapeutic agent may be administered, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 35, 40, 45, 50, 55, or 60 minutes, 2, 4, 6, 10, 15, or 24 hours, 2, 3, 5, or 7 days, 2, 4, 6 or 8 weeks, or even 3, 4, or 6 months pre-exposure, or may be administered to the subject 15-30 minutes or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 20, 24, 48, or 72 hours, 2, 3, 5, or 7 days, 2, 4, 6 or 8 weeks, 3, 4, 6, or 9 months, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 years or longer post-exposure to the infective agent, alone or in a pharmaceutical composition.

When treating disease (e.g., AIDS due to HIV infection), the vaccine or therapeutic agent may be administered to the subject either before a definitive diagnosis, before the occurrence of immunodeficiency and/or enteropathy, or after diagnosis or symptoms become evident. For example, the pharmaceutical composition including the vaccine or therapeutic agent may be administered, e.g., immediately after diagnosis or the clinical recognition of symptoms or 2, 4, 6, 10, 15, or 24 hours, 2, 3, 5, or 7 days, 2, 4, 6 or 8 weeks, or even 3, 4, or 6 months after diagnosis or detection of symptoms.

The vaccines or therapeutic agents may be sterilized by conventional sterilization techniques, or may be sterile filtered. The resulting aqueous solutions may be packaged for use as is, or lyophilized; the lyophilized preparation may be administered in powder form or combined with a sterile aqueous carrier prior to administration. The pH of the preparations typically will be between 3 and 11, more preferably between 5 and 9 or between 6 and 8, and most preferably between 7 and 8, such as 7 to 7.5. The resulting compositions in solid form may be packaged in multiple single dose units, each containing a fixed amount of the vaccine or therapeutic agent, and, if desired, one or more immunomodulatory agents, such as in a sealed package of tablets or capsules, or in a suitable dry powder inhaler (DPI) capable of administering one or more doses.

Dosages

The dose of the vaccine or therapeutic agent or the number of treatments using the same may be increased or decreased based on the severity of, occurrence of, or progression of, the level of immunocompromise and/or enteropathy in the subject (e.g., based on the severity of one or more symptoms of, e.g., viral infection). The dosage administered depends on the subject to be treated (e.g., the age, body weight, capacity of the immune system, and general health of the subject being treated), the form of administration (e.g., as a solid or liquid), the manner of administration (e.g., by injection, inhalation, dry powder propellant), and the cells targeted (e.g., epithelial cells, such as blood vessel epithelial cells, nasal epithelial cells, or pulmonary epithelial cells).

In addition, single or multiple administrations of the vaccines or therapeutic agents of the present invention may be given (pre- or post-infection) to a subject (e.g., one administration or administration two or more times). For example, subjects who are particularly susceptible to, e.g., viral infection may require multiple treatments to establish and/or maintain protection against the virus. Levels of induced immunity provided by the vaccines or therapeutic agents described herein can be monitored by, e.g., measuring CD4 T cell levels and/or serum LPS binding protein (LBP) levels. The dosages may then be adjusted or repeated as necessary to maintain desired therapeutic levels in the subject having immunocompromise and/or enteropathy associated with, e.g., a lentiviral (e.g., HIV) infection.

In some embodiments, a single dose of the vaccine or therapeutic agent may achieve protection, pre-exposure, from infective agents. In addition, a single dose administered post-exposure to a viral or other infective agent can function as a treatment according to the present invention. Multiple doses (e.g., 2, 3, 4, 5, or more doses) can also be administered, if necessary, to these subjects.

Carriers, Excipients, Diluents

The compositions of the invention may include a recombinant replication-defective Ad5 vector with chimeric hexon and fiber proteins, containing a heterologous nucleic acid molecule encoding an antigenic gene product or fragment thereof. An adenoviral vector of the invention also includes one or more of the adenoviruses identified in the present study (e.g., one or more of these adenoviruses may be used as a vector that is modified to include a heterologous nucleic acid molecule, which, upon expression in a host, produces a therapeutic immunogenic response in the host). Therapeutic formulations of the compositions of the invention are prepared using standard methods known in the art by mixing the active ingredient having the desired degree of purity with optional physiologically acceptable carriers, excipients or stabilizers (Remington's Pharmaceutical Sciences (20th edition), ed. A. Gennaro, 2000, Lippincott, Williams & Wilkins, Philadelphia, PA). Acceptable carriers include saline, or buffers such as phosphate, citrate and other organic acids; antioxidants including ascorbic acid; low molecular weight (less than about 10 residues) polypeptides; proteins, such as serum albumin, gelatin or immunoglobulins; hydrophilic polymers such as polyvinylpyrrolidone; amino acids such as glycine, glutamine, asparagine, arginine or lysine; monosaccharides, disaccharides, and other carbohydrates including glucose, mannose, or dextrins; chelating agents such as EDTA; sugar alcohols such as mannitol or sorbitol; salt-forming counterions such as sodium; and/or nonionic surfactants such as TWEEN™, PLURONICS™, or PEG.

Optionally, the formulation contains a pharmaceutically acceptable salt, preferably sodium chloride, e.g., at about physiological concentrations. Optionally, the formulations of the invention can contain a pharmaceutically acceptable preservative. In some embodiments the preservative concentration ranges from 0.1 to 2.0%, typically v/v. Suitable preservatives include those known in the pharmaceutical arts. Benzyl alcohol, phenol, m-cresol, methylparaben, and propylparaben are examples of preservatives. Optionally, the formulations of the invention can include a pharmaceutically acceptable surfactant at a concentration of 0.005 to 0.02%.

These and other aspects of the invention are further described in the Examples, below.

Examples

The following examples are intended to illustrate the invention. They are not meant to limit the invention in any way.

Example 1. Materials and Methods

Nucleic acid preparation and 454 sequencing

100 mg or 200 mg of frozen stool was resuspended in 6 volumes of PBS (Finkbeiner et al., PLoS Pathog. 4: e1000011 (2008)), centrifuged to pellet particulate matter, and the supernatant was then passed through a 0.45-μm filter. Total nucleic acid was isolated from 200 μL or 850 μL of this filtrate using the Ampliprep DNA extraction machine (Roche) according to the manufacturer's instructions. To enable detection of both RNA and DNA viruses, 9 μL of total nucleic acid from each sample was reverse transcribed and 6 μL of the cDNA reaction was amplified as previously described (Wang et al., PLoS Biol. 1: E2 (2003)). Briefly, RNA templates were reverse transcribed using a first primer containing a 16-nucleotide specific sequence followed by 9 random nucleotides for random priming. The 16-nucleotide specific sequence was unique for each sample and served as a barcode for assigning sequences to a sample. Sequenase (United States Biochemical) was used for second strand cDNA synthesis and for random-primed amplification of DNA templates using the first primer. Each sample was subjected to 40 cycles of PCR amplification using a second primer containing the same 16-nucleotide specific sequence as in the corresponding first primer. Amplification products were quantitated and diluted to 15 ng/μL, and then 5 μL of each sample was pooled, adaptor-ligated, and sequenced on the 454 GS-FLX platform (454 Life Sciences).

Detection and analysis of viral sequences using custom bioinformatic pipeline

Sequences were analyzed using VirusHunter as described (Presti et al., J. Virol. 83: 11599-11606 (2009); Loh et al., J. Virol. 83: 13019-13025 (2009); Zhao et al., J. Virol. 85: 10230-10238 (2011); Felix et al., PLoS Biol. 9: e1000586 (2011); Loh et al., J. Virol. 85: 2642-2656 (2011)). Briefly, sequences were assigned to samples based on the unique barcode sequences (i.e., the second primer sequences), primer sequences were trimmed, and sequences were clustered using CD-HIT (Li et al., Bioinformatics 22: 1658-1659 (2006)) to remove redundant sequences (95% identity over 95% sequence length). The longest sequence from each cluster was chosen as the representative unique sequence and entered into the analysis pipeline. Then, unique sequences were masked by RepeatMasker (Smit et al., RepeatMasker Open-3.0). If a sequence did not contain a stretch of at least 50 consecutive non-"N" nucleotides or if greater than 40% of the total length of the sequence was masked, it was removed from further analysis (filtered). Filtered high quality unique non-repetitive sequences were sequentially compared against (i) the human genome using BLASTn; (ii) GenBank nt database using BLASTn; and (iii) GenBank nr database using BLASTX (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)). Minimal e-value cutoffs of 1e-10 and 1e-5 were applied for BLASTn and BLASTX, respectively (Bench et al., Appl. Envir. Microbiol. 73: 7629-7641 (2007); Wommack et al., Appl. Envir. Microbiol. 74: 1453-1463 (2008)). Sequences were phylotyped as human, mouse, fungal, bacterial, phage, viral, or other based on the identity of the top BLAST hit. Sequences without any significant hit to any of the databases were designated as unassigned. If a sequence aligned to both a virus and another kingdom (e.g., bacteria or fungi) with the same e-value it was classified as ambiguous. All eukaryotic viral sequences were further classified into viral families based on the taxonomy ID of the best hit.
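The masking filter described in this paragraph can be illustrated with a short sketch. It is not the VirusHunter code: the function name `passes_mask_filter` and its parameters are hypothetical, and the sketch assumes that masked positions appear as the letter N in the masked sequence.

```python
import re


def passes_mask_filter(masked_seq: str,
                       min_run: int = 50,
                       max_masked_frac: float = 0.40) -> bool:
    """Return True if a masked sequence would be kept for BLAST analysis.

    Hypothetical helper: keeps a sequence only if it has at least
    `min_run` consecutive non-N nucleotides AND no more than
    `max_masked_frac` of its length is masked (N).
    """
    seq = masked_seq.upper()
    if not seq:
        return False
    # Longest stretch of consecutive non-"N" nucleotides.
    longest_run = max((len(run) for run in re.findall(r"[^N]+", seq)), default=0)
    # Fraction of the total length that was masked.
    masked_frac = seq.count("N") / len(seq)
    return longest_run >= min_run and masked_frac <= max_masked_frac
```

A sequence that is exactly 40% masked is kept in this sketch, because the text removes only sequences with greater than 40% masked.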

Assembly of viral contigs and virus comparison analysis

All viral sequences, unassigned sequences, and the longest five similar sequences for those sequences from each sample were assembled into contigs using Newbler (454 Life Sciences) with default parameters. If a sample was sequenced multiple times, all available sequencing data were used to optimize contig assembly. The longest contig from amongst all contigs belonging to a given genus was chosen as the first representative contig. To compare viruses across multiple animals, all sequences (contigs, or sequences if no contigs were obtained from a sample) were compared with this representative virus contig. If a sequence aligned with the representative contig over its full length and shared 98% nucleotide identity or higher over the aligned region, it was considered to be the same as the representative contig. For sequences considered different from the representative contig, the next longest contig was selected as the second representative virus. This process was repeated until all sequences were classified. If two contigs or sequences were located at different regions of the genome, and no conclusive decision could be made about their possible relatedness, we defaulted in a conservative fashion to assuming that only a single virus was present. Representative viral contigs were queried against the NCBI nt database and the most related viral genomes were identified. The most closely related virus with full genome sequence available was selected as the reference genome. For adenoviruses, different sequences shared the highest homology with different viruses, indicating that in these large genomes some regions of the new viruses we detected were most related to different viruses in the database. Two out of the three contig sequences used for designing primers shared highest homology to simian adenovirus 1 strain ATCC VR-195, which was therefore selected as reference genome. If no nucleotide level homology was detected, viral contigs were queried for protein homology against the NCBI nr database and the most related viral genome was identified.
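The iterative selection of representative contigs can be sketched as a greedy grouping loop. This is an illustrative sketch only: `pick_representatives` and `toy_same_virus` are hypothetical names, the per-genus grouping is omitted, and the toy comparison uses ungapped matching in place of the real alignment step, which is not specified here.

```python
from typing import Callable, Dict, List


def pick_representatives(
    seqs: Dict[str, str],
    same_virus: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Greedily group sequences under the longest unclassified contig."""
    # Longest first; ties keep their original order (the sort is stable).
    remaining = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    groups: Dict[str, List[str]] = {}
    while remaining:
        rep = remaining.pop(0)          # next representative contig
        groups[rep] = [rep]
        unmatched = []
        for name in remaining:
            if same_virus(seqs[name], seqs[rep]):
                groups[rep].append(name)
            else:
                unmatched.append(name)  # candidate for a later representative
        remaining = unmatched
    return groups


def toy_same_virus(query: str, rep: str, min_identity: float = 0.98) -> bool:
    """Toy stand-in for alignment: best ungapped identity over the full query."""
    if not query or len(query) > len(rep):
        return False
    best = 0.0
    for offset in range(len(rep) - len(query) + 1):
        window = rep[offset:offset + len(query)]
        matches = sum(q == r for q, r in zip(query, window))
        best = max(best, matches / len(query))
    return best >= min_identity
```

Passing the comparison in as a function keeps the grouping loop independent of whichever aligner supplies the full-length, 98% identity test.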

Metagenomic analysis using MEGAN

Individual sequences obtained by 454 sequencing were analyzed using BLASTX (version 2.2.22+) on a customized server with ~1,700 available processor slots and a memory range of 2-32 GB per node. Sequences were compared by BLASTX to the NCBI nr database version 06/06/2011. Results with an e-value < 1e-10 were stored and used for taxonomic assignment using the Lowest-Common Ancestor (LCA) algorithm in MEGAN v. 4.62.3 (22 Nov 2011). The following LCA parameters were used for taxonomic assignment: Min Support: 5, Min Score: 35, Top percent: 10, Win Score: 0, Min Complexity: 0. This process resulted in the generation of sample specific RMA files used by MEGAN for downstream analysis. These files contain all of the taxonomic assignment information for each sample. Global metagenome comparisons using all sequences assigned to all taxa were completed for each cohort. These comparisons used MEGAN's normalization protocol enabling inter-sample comparison.

Additionally, sequences contained in specific taxonomic subsets (bacteria, viruses, or phage) were isolated and processed through MEGAN using the same parameters. Similarly, sequences from specific phage taxa (caudovirales, microviridae, leviviridae and unclassified phage) were extracted and compared. This procedure permitted independent analysis of these taxa without artifactual effects of global normalization. Summarized sequence counts per taxa were exported for subsequent statistical analysis using GraphPad Prism version 5.0d.

PCR detection of viruses

Primers (Table 1) were designed to amplify regions conserved between WUHARV adenoviruses 1-5, caliciviruses 1-2, calicivirus 3, parvoviruses 1-2, enteroviruses 1-3, sapeloviruses 1-3, and related viral genomes. Primer sensitivity was evaluated using libraries with high or low numbers of adenovirus, calicivirus, parvovirus, enterovirus, or sapelovirus sequences, while primer specificity was evaluated using libraries with high numbers of unrelated virus sequences, as well as virus sequences from related genera. Libraries generated from stool samples were diluted 10-fold and screened (n=2) for the presence of viruses using: 10x PCR buffer 2.5 μL, MgCl2 (25 mM) 2.5 μL, dNTP (2 mM) 2.5 μL, forward primer (10 μM) 0.5 μL, reverse primer (10 μM) 0.5 μL, Taq 0.3 μL, and H2O 6.2 μL. PCR products were amplified at: 95 °C, 5 min; 95 °C, 30 sec, 60 °C, 30 sec, 72 °C, 1 min, for 32 cycles; 72 °C, 10 min and then visualized using EtBr on a 1.5% agarose gel. There was concordance in all duplicate tests.

TABLE 1: Primers

| WUHARV Virus | Primer name | Targeted region | Sequence (5'-3') | SEQ ID NO: | Orientation |
| --- | --- | --- | --- | --- | --- |
| Adenovirus 1 | 4302c3f | Hexon | GGCAATCATGATGGACACCTT | 332 | F |
| Adenovirus 1 | 4302c3r | Hexon | TTAATCACCACCGCAACGC | 333 | R |
| Adenovirus 1 | 4302c1f | Hexon | CAATGGAACATTAATCCCACG | 334 | F |
| Adenovirus 1 | 4302c1r | Hexon | CCTGCCAACACTCCCATATTT | 335 | R |
| Adenovirus 1 | 4302c18f | E1B | AGAGCTATCACACAGCGTTCA | 336 | F |
| Adenovirus 1 | 4302c18r | E1B | ACCGAGTGGTGGAGGAGAA | 337 | R |
| Adenovirus 2 | 4310ac18f | pIIIa | TAACGTTCAGACCAATCTGGA | 338 | F |
| Adenovirus 2 | 4310ac18r | pIIIa | CGGCAATAGTGCTACTGTTGG | 339 | R |
| Adenovirus 2 | 4310ac16f | Hexon | CGGGACAACTTCATTGGACT | 340 | F |
| Adenovirus 2 | 4310ac16r | Hexon | GCGCCAATGTTTACAAAGGT | 341 | R |
| Adenovirus 3 | 4310bc18f | pIIIa | TAACGTTCAGACCAATCTGGA | 342 | F |
| Adenovirus 3 | 4310bc18r | pIIIa | CGGCAATAGTGCTACTGTTGG | 343 | R |
| Adenovirus 3 | 4310bc21f | Hexon | ACGACAGCACCAGTTCAAAAC | 344 | F |
| Adenovirus 3 | 4310bc21f | Hexon | TTTTCTGGCAGCGTGATGTT | 345 | R |
| Adenovirus 3 | 4310bc28r | E3 | CTCTTGGCAACCCCTTATTG | 346 | F |
| Adenovirus 3 | 4310bc28f | E3 | TGGGTGAAACCATTCCTGTT | 347 | R |
| Adenovirus 4 | 4312u11r | E3 | CCGTCCTCTCCTGGTAGAAA | 348 | F |
| Adenovirus 4 | 4312u11f | E3 | CGTCGACTGTTGGAGAAACA | 349 | R |
| Adenovirus 4 | 4312u10r | DBP | GCCGTTACATCCAGATCCTC | 350 | F |
| Adenovirus 4 | 4312u10f | DBP | TACACCGAGGGAATGAAAGC | 351 | R |
| Adenovirus 4 | 4312u7r | NCR¹ between E1a and E1b | CTTGTGCCTGTGCTTTTCAT | 352 | F |
| Adenovirus 4 | 4312u7f | NCR between E1a and E1b | GTGCAAAGAGAACTAGTATGG | 353 | R |
| Adenovirus 5 | 4287u7f | Iva | GGATGTTCAAGTACATGGGCA | 354 | F |
| Adenovirus 5 | 4287u7r | Iva | GATGCATGACAAGTTCCCCAA | 355 | R |
| Adenovirus 5 | 4287c5f | E3 | GAATGGTAGCTGCTTTCTTCA | 356 | F |
| Adenovirus 5 | 4287c5r | E3 | TGTTGGGTGATTGTGATGGA | 357 | R |
| Adenovirus 5 | 4287c11f | Fiber-1 | CTGAAAAAAACGAATTGGTGG | 358 | F |
| Adenovirus 5 | 4287c11r | Fiber-1 | TTGACAACAATGGTGCGTTG | 359 | R |
| Adenovirus (1-5) | AdV-a | pIIIa | ACTAACGTKCAGACCAATCTGG | 360 | F |
| Adenovirus (1-5) | AdV-b | pIIIa | GTACAGRCTCACGGACTGC | 361 | R |
| Calicivirus (1-2) | CV-a | NS² polyprotein | GTACGAYGTCGGAGGGACC | 362 | F |
| Calicivirus (1-2) | CV-b | NS polyprotein | GRTCACAAGCCATGACACTCAG | 363 | R |
| Calicivirus 3 | CV-c | NS polyprotein | TTATGTTATGGACAACCCAAAGG | 364 | F |
| Calicivirus 3 | CV-d | NS polyprotein | GGTCAAGAGACAATAGCTCCAT | 365 | R |
| Parvovirus (1-3) | PV-a | capsid | ACCAGACTAACWCAAGGCGC | 366 | F |
| Parvovirus (1-3) | PV-b | capsid | GGTASGTGTTCCATTGTCTTGG | 367 | R |
| Enterovirus (1-3) | EV-a | 5'UTR³ | GCACAACCCCAGTGTAGTTC | 368 | F |
| Enterovirus (1-3) | EV-b | 5'UTR | CCAATCCAATMGCTATATGATGAC | 369 | R |
| Sapelovirus (1-3) | SV-a | 5'UTR | CCAGKMTAAAAGGCAATTGTGG | 370 | F |
| Sapelovirus (1-3) | SV-b | 5'UTR | CCTGTCAGGTAGCACTAGACT | 371 | R |

¹NCR = non-coding region

²NS = non-structural

³UTR = untranslated region
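Several of the pan-genus primers in Table 1 contain IUPAC degenerate base codes (e.g., the Y in CV-a, GTACGAYGTCGGAGGGACC). As a hedged illustration of what such a code implies, the following sketch enumerates the concrete sequences in a degenerate primer pool. The function name `expand_degenerate` is hypothetical and is not part of the disclosed methods; the code table is the standard IUPAC nucleotide ambiguity table.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    for seq in expand_degenerate("GTACGAYGTCGGAGGGACC"):
        print(seq)
```

Each additional two-fold degenerate position doubles the pool size, so a primer with two such positions represents four distinct oligonucleotides.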

Isolation and detection of WUHARV adenoviruses

Stool samples from rhesus monkeys #30, 40, and 44 were diluted in media, passed through a 0.45-μm filter, and used to inoculate a T-25 flask containing an E1 complementing cell line such as PerC6 or Per55K cells maintained as previously described (Abbink et al., J. Virol. 81: 4654-4663 (2007)). Upon 100% cell lysis, cells and supernatant were harvested and frozen at -20 °C. Viruses were plaque purified twice. Briefly, MW6 plates were seeded with Per55K cells on day -1. On day 0 cells were infected with log dilutions of virus. On day 1 an agar overlay was performed, plates were incubated until plaques were big enough to pick, and plaques were picked and amplified in a well of a 24 well plate. Virus stocks were then generated and virions purified. Briefly, virus was amplified to inoculate 24 T-175 triple layer flasks. Cells were harvested and virus particles purified using CsCl. To detect adenoviruses, primers (Supplementary Table 1) were designed to amplify regions from WUHARV adenoviruses (1-5) from contigs with a range of relatedness to the reference genomes. Crude lysate, plaques and purified virus were screened for the presence of adenovirus using: 2 μL DNA, 25 μL Phusion Master Mix with HF buffer, 1.5 μL 100% DMSO, 2 μL forward primer (10 μM), 2 μL reverse primer (10 μM), 17.5 μL H2O. PCR products were amplified at: 98 °C 30 sec; 98 °C 10 sec, 50 °C 10 sec, 72 °C 30 sec for 30 cycles; 72 °C 10 min and then visualized using EtBr on a 0.8% agarose gel.

Assays and necropsy of SIV -infected rhesus monkeys

Serum levels of LPS binding protein (LBP) were quantitated by ELISA (Antibodies Online).

Twelve animals housed at the NEPRC were subjected to complete necropsy within two hours of death and representative sections of all major organs were collected, fixed in 10% neutral buffered formalin (NBF), embedded in paraffin, sectioned at 5 μm, and stained using haematoxylin and eosin (HE).

Following histopathologic examination, additional immunohistochemistry was used to analyze the degree of adenovirus infection within the small and large intestinal sections. The specific adenovirus immunohistochemistry protocol was as follows: deparaffinization and rehydration followed by a 5-minute block in 3% hydrogen peroxide; pre-treatment with proteinase K for 5 minutes; all steps were followed by a tris-buffered saline (TBS) wash. Prior to application of primary antibodies, all slides were treated with both a biotin block and a Dako protein block for 10 minutes each. Sections were incubated with anti-mouse adenovirus known to cross react with 41 known serovars of adenovirus (Millipore (Billerica, MA, USA), monoclonal, 1:200) overnight at 4 °C. This was followed by a 30 minute incubation at room temperature with Vectastain ABC standard. All slides were developed with DAB chromogen (Dako) and counterstained with Mayer's haematoxylin. In all cases, step sections were incubated with isotype-specific irrelevant antibodies for negative controls and failed to show staining. Positive controls consisted of sections of small intestine positive for adenovirus.

GenBank accession numbers

Sequence data from each animal were uploaded to the MG-RAST server (version 3.12). The sequences of viral contigs presented in Figure 5 have been uploaded to GenBank with the following accession numbers: WUHARV Calicivirus 1 (JX627575), WUHARV Parvovirus 1 (JX627576), WUHARV Enterovirus 1 (JX627570), WUHARV Enterovirus 2 (JX627571), WUHARV Enterovirus 3 (JX627572), WUHARV Sapelovirus 1 (JX627573), and WUHARV Sapelovirus 2 (JX627574).

Statistical analysis

For analysis of sequence numbers after normalization, the data were log10 transformed prior to statistical analysis. P-values were derived using the nonparametric Mann-Whitney test. P-values < 0.05 are considered significant. For analysis of bacterial families in Figure 9, we utilized one-way ANOVA with a Bonferroni correction to correct for multiple comparisons.
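As a rough illustration of the transformation and rank comparison named above, the following sketch log-transforms counts and computes the Mann-Whitney U statistic directly from its pairwise definition. It is not the GraphPad Prism procedure. The function names are hypothetical, the pseudocount of 1 is an assumption (the handling of zero counts is not stated), and no p-value is computed; a statistics library routine such as SciPy's `mannwhitneyu` would be needed for that.

```python
import math
from typing import List, Sequence


def log10_transform(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log10(count + pseudocount); the pseudocount is an assumed way to handle zeros."""
    return [math.log10(c + pseudocount) for c in counts]


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U statistic for group_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


if __name__ == "__main__":
    infected = [9, 99, 999]   # made-up normalized counts, for illustration only
    control = [0, 1, 2]
    print(mann_whitney_u(log10_transform(infected), log10_transform(control)))
```

Because the U statistic depends only on the ordering of the values, it is the same whether it is computed on raw or log-transformed counts.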

Construction of phylogenetic trees

We performed phylogenetic analysis for viruses with sufficient sequence information, defined as a contig length > 90% of the full length of the most closely related viruses shown in Figure 5. Multiple sequence alignments were performed with ClustalW (Thompson et al., Nucleic Acids Res. 22: 4673-4680 (1994)). Phylogenetic analysis was performed using the neighbor-joining method in the PHYLIP package (Felsenstein, Phylogeny Inference Package, Department of Genome Sciences, University of Washington, Seattle (2005)) with 100 bootstrap replicates. Phylogenetic trees were visualized using TreeView (Page, CABIOS. 12: 357-358 (1996)).

Caliciviridae sequences used for phylogenetic trees

The predicted amino acid sequences of the full length polyprotein from WUHARV Calicivirus 1 were used to construct a phylogenetic tree. Polyproteins from the following viruses were used: Bovine calicivirus (BoCAA09480.1), Calicivirus pig/F15-10/CAN (CV pig F15-10, ACQ44561.1), Calicivirus pig/AB104/CAN (CV pig AB104, ACQ44563.1), Calicivirus pig/NC-WGP93C/USA/2009 (CV pig NC-WGP93C, ADG27878.1), Calicivirus pig/AB90/CAN (CV pig AB90, YP_002905325.1), Norovirus Hu/GII-4/Niigata2/2008/JP (BAJ13866.1), Norovirus dog/GVI.1/HKU_Ca026F/2007/HKG (ACV89839.1), Norovirus genogroup 3 (AFQ00092.1), Norovirus Bo/Newbury2/1976/UK (AAD16174.5), Norwalk-like virus (AAM95184.2), Norwalk virus (NP_056820.1), Tulane Virus (ACB38131.1), and WUHARV Calicivirus 1 (JX627575).

Parvoviridae sequences used for phylogenetic trees

The predicted amino acid sequences of the near full length nonstructural 1 protein from WUHARV Parvovirus 1 were used to construct the phylogenetic tree. Polyproteins from the following viruses were used: Bufavirus 1 (AFN44273.1), Bufavirus 2 (AFN44276.1), Canine parvovirus (CPV_AEK69509, AEK69509.1), Canine parvovirus (CPV_AAV54174, AAV54174.1), Feline panleukopenia virus (FPV_BAA19018, BAA19018.1), Feline panleukopenia virus (FPV_AAC37927, AAC37927.1), Kilham rat virus (AAC40695.1), LuIII virus (NP_821154.1), Mink enteritis virus (AEO92090.1), Minute virus of mice (ABB01353.1), Mouse parvovirus 1 (NP_042345.1), Mouse parvovirus 2 (YP_656490.1), Parvovirus H1 (NP_040318.1), Porcine parvovirus (ADN94624.1), Porcine parvovirus (ADN94588.1), and WUHARV Parvovirus 1 (JX627576).

Picornaviridae sequences used for phylogenetic trees

The full length genomes of WUHARV Enterovirus 1, 2, and 3 and WUHARV Sapelovirus 1 and 2 were used to construct the phylogenetic tree. Genome sequences of the following viruses were used: Baboon enterovirus strain A13 (BaboonEV A13, AF326750.2), Duck picornavirus TW90A (AY563023.1), Enterovirus 75 strain USA/OK85-10362 (EV 75, AY556070.1), Human echovirus 11, isolate HUN-1108 (HEchoV 11, AJ577589.1), Human enterovirus 71 strain BJ08-Z025-5 (HEV71, FJ606450.1), Human enterovirus 90 (HEV 90, AB192877.1), Human enterovirus 92 strain RJG7 (HEV 92, EF667344.1), Human coxsackievirus A2 strain CVA2/SD/CHN/09 (HCoxV A2, HQ728259.1), Human coxsackievirus A5 strain CVA5/SD/CHN/09 (HCoxV A5, HQ728261.1), Human coxsackievirus A7 strain Parker (HCox A7, AY421765.1), Porcine enterovirus 8 strain V13 (PSV-1, Porcine sapelovirus 1, AF406813.1), Porcine sapelovirus strain csh (PSV_csh, HQ875059.1), Simian enterovirus 46 strain RNM5 (SimianEV 46, EF667343.1), Simian enterovirus SV19 strain M19s (SV19, AF326754.2), Simian enterovirus SV43 strain OM112t (SV43, AF326761.2), Simian sapelovirus 1 strain 2383 (SimianSV-1, AY064708.1), WUHARV Enterovirus 1 (JX627570), WUHARV Enterovirus 2 (JX627571), WUHARV Enterovirus 3 (JX627572), WUHARV Sapelovirus 1 (JX627573), and WUHARV Sapelovirus 2 (JX627574).

Example 2. Next Generation Sequencing Analysis Reveals Expansion of the Enteric Virome during Pathogenic SIV Infection

Defining the enteric virome

To define the effects of pathogenic and non-pathogenic SIV infection on the enteric virome, we shotgun sequenced libraries of fecal RNA + DNA from four independent cohorts of monkeys, each comprising SIV-infected and uninfected control animals. Two cohorts of pathogenically SIV-infected and uninfected control rhesus monkeys were housed at the New England Primate Research Center (NEPRC) or the Tulane National Primate Research Center (TNPRC) (Table 2). As expected, the set point of SIV in the serum correlated with rapid progression to AIDS and death. The NEPRC cohort was sampled at both 24 and 64 weeks after SIV infection. Two cohorts of non-pathogenically SIV-infected and uninfected control African green monkeys were housed at the National Institutes of Health (NIH, vervet monkeys) or the NEPRC (sabaeus monkeys) (Table 2).

Total RNA + DNA from fecal material were sequenced using 454 technology to leverage the resulting long sequences for robust assessment of taxonomy and assembly of viral genomes (Table 2). There was no statistical correlation between SIV infection status and either the number of total or unique sequences (viral plus other) obtained within any of the four cohorts. For each cohort, sequences were analyzed by two different computational approaches. In the first method, the taxonomic structure of the sequences was analyzed using MEGAN version 4.62.3 (build Nov. 22, 2011 (Huson et al., Genome Res. 17: 377-386 (2007); Huson et al., BMC Bioinformatics 10(Suppl 1): S12 (2009))). Each sequence was compared to the non-redundant (nr) database using BLASTX and results mapped to the NCBI Taxonomy Database. Sequences assigned to bacterial families or classes were extracted and used for subsequent analysis. The second computational approach was a custom pipeline called VirusHunter developed to identify novel viruses via analysis of both nucleic acid and protein similarity (Presti et al., J. Virol. 83: 11599-11606 (2009); Loh et al., J. Virol. 83: 13019-13025 (2009); Zhao et al., J. Virol. 85: 10230-10238 (2011); Felix et al., PLoS Biol. 9: e1000586 (2011); Loh et al., J. Virol. 85: 2642-2656 (2011)).
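As a minimal sketch of the tallying step that follows taxonomic assignment (not the MEGAN or VirusHunter implementation), the Python snippet below counts reads per top-level category within one sample and reports each category's share of that sample. The read identifiers, the category labels, and the function name `tally_categories` are hypothetical and serve only as illustration.

```python
from collections import Counter


def tally_categories(assignments):
    """Count reads per category and return {category: (count, fraction)}.

    `assignments` maps a read identifier to the category it was assigned to.
    All identifiers and labels used here are hypothetical placeholders.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Hypothetical assignments for a single fecal sample (illustration only).
example_assignments = {
    "read1": "Viruses",
    "read2": "Bacteria",
    "read3": "Bacteria",
    "read4": "Viridiplantae",
}
summary = tally_categories(example_assignments)
```

Reporting fractions alongside raw counts is one simple way to compare samples that yielded different numbers of reads.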

TABLE 2: Cohorts and sequences analyzed

1 New England Primate Research Center

2 wpi = weeks post-infection with SIV

3 Tulane National Primate Research Center

4 National Institutes of Health

Enteric virome of rhesus monkeys housed at the NEPRC

We first analyzed the enteric virome of 44 rhesus monkeys housed at the NEPRC, comprising 22 monkeys infected intrarectally with pathogenic SIVmac251 and 22 SIV-uninfected monkeys (herein termed controls) (Figures 1A, 1B, 2A, and 2B). Per standard husbandry procedures, SIV-infected and control rhesus monkeys were fed the same diet but housed separately. Analysis of this cohort confirmed SIV viremia in infected animals and revealed the expected decreases in CD4 T cell counts and increases in serum LBP levels consistent with intestinal leakage and consequent systemic immune activation at both 24 and 64 weeks after infection (Figures 3A-3C). We collected fecal specimens either at 24 (Figures 1A and 2A) or 64 weeks (Figures 1B and 2B) after SIV infection. Between these two collection times, 10 SIV-infected rhesus monkeys were euthanized for progressive AIDS. As expected, the set point level of SIV in the serum of rhesus monkeys correlated with rapid progression to AIDS and death. No control animals died.

SIV infection was associated with a greater than 10-fold increase in the number of sequences from viruses (p<0.0001) and a decrease in sequences from bacteria (p=0.003) at 24 weeks post-infection (Figures 1A and 2A). At this time after SIV infection, there were no statistically significant SIV-associated changes in the total number of sequences from phages, alveolata (representing protists), viridiplantae (representing food sequences from plants), or other kingdoms and phyla (Figures 1A and 2A). Samples collected 40 weeks later (64 weeks after SIV infection) revealed increases in viral sequences in most of the surviving animals that showed low numbers of viral sequences 24 weeks after SIV infection (e.g., compare animals 23, 31, and 33 between Figures 1A and 1B). Differences between SIV-infected and control monkeys, similar to those observed at 24 weeks after SIV infection, were observed for both viral (p<0.0001) and bacterial (p=0.035) sequences at 64 weeks after infection (Figures 1B and 2B). By 64 weeks after SIV infection, the surviving SIV-infected monkeys showed significant decreases in the number of phage (p=0.0320), alveolata (p=0.0183), and viridiplantae (p=0.0013) sequences compared to controls (Figure 2B). These data suggest that pathogenic SIV infection is associated with significant expansion in the enteric virome.

Enteric virome of rhesus monkeys housed at the TNPRC

To confirm our findings in pathogenically SIV-infected rhesus monkeys housed at the NEPRC, we analyzed an independent cohort of 13 rhesus monkeys infected intravaginally with SIVmac251 and 29 control rhesus monkeys housed at the TNPRC (Table 2; Figures 1C and 2C). SIV infection at the TNPRC was associated with a significant increase in viral sequences (p=0.0420) and decrease in bacterial sequences (p=0.0019). In the TNPRC cohort, the SIV-infected monkeys showed significant increases in the number of phage (p=0.0133) sequences (Figures 1C and 2C). Similar to the 24 week time point in the NEPRC cohort there were no significant changes in sequences from phage, alveolata, viridiplantae, or sequences from other kingdoms and phyla (Figures 1C and 2C). These results confirm that an expansion of the enteric virome is associated with pathogenic SIV infection in two independent cohorts of rhesus monkeys.

Enteric virome of African green monkeys

We next assessed whether the pathogenic SIV infection-associated changes in the enteric virome observed in rhesus monkeys (Figures 1A-1C and 2A-2C) were also seen during non-pathogenic SIV infection in African green monkeys (Table 2; Figures 1D-1E and 2D-2E). The vervet African green monkey cohort housed at the NIH (Figures 1D and 2D) was comprised of six monkeys infected intravenously with SIVagm90, two monkeys infected intravenously with SIVagmVeM, 11 monkeys naturally infected with SIV, and 19 uninfected control animals. The cohort of sabaeus African green monkeys housed at the NEPRC (Table 2; Figures 1E and 2E) was comprised of two monkeys infected intravenously with SIVagmMJ8, 8 monkeys infected intravenously with SIVagm9315BR, and 6 uninfected control animals. Analysis of these two sets of sequences revealed an increase in phage sequences in the NIH cohort (p=0.0331) that was not observed in the NEPRC cohort, but no other significant SIV infection-associated changes were observed in either cohort, including for the virome (Figures 1D-1E and 2D-2E). These data indicate that the expansion of the enteric virome observed during pathogenic SIV infection is not observed during non-pathogenic SIV infection. Importantly, these African green monkeys had been infected with non-pathogenic SIV for a prolonged period (a minimum of 3 years for the NIH cohort, and from 27 weeks (2 animals) to 2.6 years (8 animals) for the NEPRC cohort). Therefore, the lack of an increase in viral sequences in these SIV-infected animals is not attributable to their being infected for a shorter time than the pathogenically SIV-infected rhesus monkeys analyzed above.

Example 3. Viruses Present in SIV-infected Rhesus and African Green Monkeys

We next defined the nature of the viral sequences that we detected in SIV-infected and uninfected monkeys using VirusHunter software (Presti et al., J. Virol. 83: 11599-11606 (2009); Loh et al., J. Virol. 83: 13019-13025 (2009); Zhao et al., J. Virol. 85: 10230-10238 (2011); Felix et al., PLoS Biol. 9: e1000586 (2011); Loh et al., J. Virol. 85: 2642-2656 (2011)). When a nucleotide sequence did not have significant similarity to the genome of an already sequenced virus, we analyzed the predicted translation products and selected the most closely related virus in the database for comparison. This analysis allowed us to determine which types of viruses were detected in individual animals in each cohort (Figures 4A-4E). Using conservative criteria we detected at least 32 distinct and previously undescribed viruses in the sequences generated from individual rhesus monkeys housed at the NEPRC alone (Figures 4A and 4B). Certain viruses were found in multiple different animals, indicating shared exposure to enteric viruses. We did not count circoviruses in this estimate due to their ubiquity and diversity.

Importantly, we found no significant differences in known insect (Dicistroviridae, Iflaviridae) or plant viruses, which are presumably derived from the diet, comparing SIV-infected animals and control animals in any cohort (Figures 4A-4E). The lack of differences in viruses from insects and plants between SIV-infected and SIV-uninfected monkeys provides an important internal control indicating that the process of library construction and analysis does not artificially expand the number of mammalian viral sequences in samples from SIV-infected rhesus monkeys.

Newly identified viruses included five adenoviruses, three caliciviruses, one papillomavirus, seven members of the Parvoviridae (two parvovirus/amdoviruses, five dependoviruses, and one bocavirus), seven picobirnaviruses, seven members of the Picornavirales (three enteroviruses, three sapeloviruses, and one picornavirus), and one polyomavirus (Figures 4A and 4B; Table 3). Importantly, many SIV-infected rhesus monkeys at both the NEPRC and the TNPRC were shedding multiple potentially pathogenic viruses (Figures 4A-4C). The presence of multiple novel viruses, and of individual animals infected with multiple distinct viruses, was not regularly observed in control rhesus monkeys housed at the same locations. In striking contrast, cohorts of African green monkeys housed at either the NEPRC or the NIH were relatively free of virus infection whether SIV-infected or not (Figures 4D and 4E).

As previously observed by others using classical virologic methods (Wang et al., J. Med. Primatol. 36: 101-107 (2007); Oberste et al., J. Gen. Virol. 88: 3360-3372 (2007); Oberste et al., J. Virol. 76: 1244-1251 (2002); Sasseville et al., J. Immunotoxicol. 7: 79-92 (2010); Bailey et al., Vet. Pathol. 47: 462-481 (2010)), picornaviruses were detected in both control and SIV-infected rhesus monkeys (Figure 4; Table 3). This allowed us to compare the number of sequences detected from pathogenic SIV-infected rhesus monkeys versus control animals (Figure 4F). In monkeys housed at either the NEPRC or the TNPRC there were significant increases in the number of sequences derived from picornaviruses in SIV-infected animals compared to controls (p=0.0002 and 0.0004 for the NEPRC rhesus animals at 24 or 64 weeks of infection, p=0.0247 for the TNPRC rhesus animals). No relationship was detected between picornavirus sequences and SIV with non-pathogenic SIV infection of African green monkeys. These data are consistent with a failure to control picornavirus infection in association with pathogenic SIV infection.

TABLE 3: Summary of viruses identified in Rhesus macaques at the NEPRC

[The header row and the rows preceding WUHARV Calicivirus 3 are missing from the source text. Columns: virus family | virus | most closely related virus | percent identity (identical/aligned positions) | four animal-number columns.]

Caliciviridae | WUHARV Calicivirus 3 | Rhesus macaque recovirus strain FT437 | [leading digit illegible]0-81% nt (50/64, 2881413 [garbled], 216/268) | - | 23, 28*, 32, 39* | - | -

Papillomaviridae | WUHARV Papillomavirus 1 | Human papillomavirus | 69% nt (300/432) | 15 | 25 | - | 32

Parvoviridae | WUHARV Bocavirus 1 | Human bocavirus isolate KU3 | 73% nt (118/160) | - | - | - | 29

Parvoviridae | WUHARV Dependovirus 1 | Adeno-associated virus 11 | 93% nt (3812/4090) | - | 40* | - | 29

Parvoviridae | WUHARV Dependovirus 2 | Adeno-associated virus 10 | 92% nt (3680/4020) | - | 23, 30, 31, 32, 34*, 38, 39*, 40*, 44* | - | 27, 29

Parvoviridae | WUHARV Dependovirus 3 | Adeno-associated virus isolate rh.31 | 94% nt (1680/1793) | - | 40* | - | -

Parvoviridae | WUHARV Dependovirus 4 | Adeno-associated virus isolate rh.8R | 86% nt (988/1145) | - | 37, 40* | - | -

Parvoviridae | WUHARV Dependovirus 5 | Adeno-associated virus 7 | 86% nt (264/307) | - | 26 | - | -

Parvoviridae | WUHARV Parvovirus 1 | Bufavirus 2 | 77% nt (1626/2111) | 7 | 24*, 31, 33, 38, 39* | - | 25, 37

Parvoviridae | WUHARV Parvovirus 2 | Bufavirus 2 | 75-79% nt (522/698, 139/175) | - | 35* | - | 27

Picobirnaviridae | WUHARV Picobirnavirus 1 | Human picobirnavirus | 26% aa (147/574) | - | 26, 36* | - | -

Picobirnaviridae | WUHARV Picobirnavirus 2 | Otarine picobirnavirus | 29% aa (119/415) | 1, 11 | - | - | -

Picobirnaviridae | WUHARV Picobirnavirus 3 | Human picobirnavirus | 29% aa (102/354) | 8, 15 | - | - | -

Picobirnaviridae | WUHARV Picobirnavirus 4 | Human picobirnavirus | 36% aa (94/260) | one illegible animal entry and three "-" entries; column positions not recoverable

Picobirnaviridae | WUHARV Picobirnavirus 5 | Human picobirnavirus | 34% aa (63/187) | 22 | 37 | - | -

Picobirnaviridae | WUHARV Picobirnavirus 6 | Human picobirnavirus | 37% aa (55/149) | - | 36* | - | -

Picobirnaviridae | WUHARV Picobirnavirus 7 | Human picobirnavirus | 33% aa (101/302) | one illegible animal entry and three "-" entries; column positions not recoverable

Picornaviridae | WUHARV Enterovirus 1 | Human enterovirus 92 strain RJG7 | 86% nt (6228/7268) | - | 41, 44* | - | -

Picornaviridae | WUHARV Enterovirus 2 | Simian enterovirus SV19 strain M19s | 83% nt (5879/7100) | - | 23, 25, 26, 27, 29, 31, 35*, 38, 39* | [not recoverable] | 25, 29, 30, 32, 33, 38, 41

Picornaviridae | WUHARV Enterovirus 3 | Simian enterovirus SV19 strain M19s | 84% nt (5854/6961) | - | - | - | 25, 26, 27, 29, 30, 33, 37, 41

Picornaviridae | WUHARV Picornavirus 1 | Simian picornavirus 6 | 85% nt (284/335) | - | 26 | - | -

Picornaviridae | WUHARV Sapelovirus 1 | Simian sapelovirus 1 | 81% nt (6558/8087) | 19 | 25, 26, 29, 30, 31, 32, 34*, 35*, 38, 39*, 42*, 43* | 1, 11, 17, 19, 22 | 23, 25, 26, 30, 31, 32, 33, 37, 38

Picornaviridae | WUHARV Sapelovirus 2 | Simian sapelovirus 1 | 81% nt (6510/8076) | - | 25, 28*, 35*, 37, [illegible] | - | 29, 41

Picornaviridae | WUHARV Sapelovirus 3 | Simian sapelovirus 1 | 79% nt (5476/6919) | - | 23, 37, 40* | - | 27

Polyomaviridae | WUHARV Polyomavirus 1 | Polyomavirus HPyV6 isolate 601a | 76% nt (242/318) | - | - | - | 29

1 Viruses with 98% nt identity over the full length of aligned regions are the "same" virus.

2 Most closely related viruses were identified as the top hit using an NCBI web-based BLAST search against the NCBI nr database on 8/31/2012.

3 Percent aa identity is reported for viruses for which no known virus had nt identity.

4 As determined by 454 sequencing.

5 Underlined numbers indicate animals from which virus contigs were assembled.

* Euthanized for progressive AIDS 24 to 64 weeks after SIV infection.

Genomic analysis of viruses in rhesus monkeys at the NEPRC

We next analyzed the viruses present in the NEPRC cohort by assembling virus sequences from individual animals into contigs which could then be compared to the most closely related virus present in the database (see, e.g., Figures 5A-5D, 6A, and Table 3). Here, these viruses will be named using the convention "WUHARV-virus name-number." Even within a single animal we found contigs from distinct but related viruses, indicating that some animals were shedding more than one virus of the same genus (Figures 4, 5, and Table 3).

We detected at least four adenoviruses (WUHARV Adenovirus 1-4) in the NEPRC cohort (Figure 6A depicts WUHARV Adenovirus 1). We assembled portions of three calicivirus (WUHARV Caliciviruses 1-3) genomes (Figure 5A). Importantly, WUHARV Caliciviruses 1 and 2 were most closely related to, but distinct from, the known primate calicivirus Tulane virus (Farkas et al., J. Virol. 82: 5408-5416 (2008); Farkas et al., J. Gen. Virol. 91: 734-738 (2010); Wei et al., J. Virol. 82: 11429-11436 (2008); Farkas et al., J. Virol. 84: 8617-8625 (2010)). For example, WUHARV Calicivirus 1 shared only 75% nucleotide identity over the 6,489-bp contig we assembled with Tulane calicivirus and was phylogenetically distinct from Tulane (Figure 7A). WUHARV Calicivirus 3 was quite distantly related to either Tulane virus or to WUHARV Caliciviruses 1 and 2 (Figure 5A). We detected parvoviruses most closely related to Bufavirus 2, a recently described parvovirus (Phan et al., J. Virol. [Epub ahead of print] (2012)) (Figures 5B and 7B). We assembled viral contigs covering most of the 7,000-8,000-bp genomes of several enteroviruses or sapeloviruses, both within the Picornaviridae (Figures 5C, 5D, and 7C). WUHARV Enteroviruses 1-3 share nucleotide similarity most closely with simian enterovirus SV19 with 73-84% nucleotide identity over a large portion of the genome. WUHARV Sapeloviruses 1-3 are most closely related to simian sapelovirus 1 strain 2383, sharing 79 to 81% nucleotide identity over essentially the entire genome. These data confirm that a remarkably wide variety of viruses are included within the expansion of the enteric virome associated with pathogenic SIV infection.
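The percent identity values quoted with Table 3 are reported together with counts of identical and aligned positions. As a rough sketch of that arithmetic, the Python snippet below recomputes the percentage from three count pairs listed in Table 3 and applies the 98% nucleotide identity cutoff from the table footnote. The function names are hypothetical, and treating the cutoff as "greater than or equal to 98%" is an assumption, since the footnote does not state how the boundary is handled.

```python
def percent_identity(identical, aligned):
    """Percent identity from identical and aligned position counts."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * identical / aligned


def is_same_virus(nt_identity_percent, threshold=98.0):
    """Apply the Table 3 footnote cutoff (>= is an assumption of this sketch)."""
    return nt_identity_percent >= threshold


# Count pairs (identical, aligned) as listed in Table 3.
table3_pairs = {
    "WUHARV Dependovirus 1": (3812, 4090),
    "WUHARV Enterovirus 1": (6228, 7268),
    "WUHARV Sapelovirus 1": (6558, 8087),
}
rounded = {name: round(percent_identity(i, a)) for name, (i, a) in table3_pairs.items()}
distinct = {name: not is_same_virus(percent_identity(i, a)) for name, (i, a) in table3_pairs.items()}
```

Under these assumptions all three viruses fall well below the cutoff, consistent with their description as distinct from the most closely related known virus.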

Next generation sequencing-independent confirmation of virome findings

We considered the possibility that relying on next generation sequencing (NGS) to document expansion of the enteric virome associated with pathogenic SIV infection might lead to false conclusions. For example, perhaps all detected viruses were present in multiple monkeys but the sequencing process is somehow biased by pathogenic SIV infection. To address this, we designed PCR assays to detect viruses for which we had large portions of the genome (Figure 5E, Table 1) and used independent assays to detect viruses (Figure 5E). PCR is a standard and sensitive diagnostic approach to detection of viruses in biological samples. In some cases the contigs were so divergent from each other that separate PCR assays had to be designed for different viruses in the same group. For example, one PCR assay was developed to detect WUHARV Caliciviruses 1 and 2 while a different assay was developed to detect WUHARV Calicivirus 3, which is highly divergent from all known caliciviruses. Overall, PCR analysis correlated well with 454-detected viruses. PCR was positive for 454-detected viruses in 62/69 (90%) cases (Figure 5E), with some of the failures potentially related to the presence of viruses that were divergent from the viruses used to design PCR primers. Consistent with this, PCR detected viruses in samples when as few as 1-2 viral sequences were detected in 454-derived datasets.

Compared to NGS, PCR detected 5/7 adenoviruses (failing to detect divergent adenoviruses in animals #34 and #39), 14/16 caliciviruses (failing to detect divergent caliciviruses in animals #23 and #24), 10/11 parvovirus genus members (Parvoviridae, failing to detect a divergent parvovirus in animal #7), 11/12 enterovirus genus members (Picornaviridae, failing to detect a divergent enterovirus in animal #34), and 22/23 sapelovirus genus members (Picornaviridae, failing to detect a non-divergent virus in animal #19, representing a true false negative). Importantly, PCR was negative for virus infection in a total of 151/151 cases for adenoviruses, caliciviruses, parvoviruses, enteroviruses, and sapeloviruses when next generation sequencing followed by bioinformatic analysis did not reveal a viral sequence.
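The per-group counts above can be checked against the overall 62/69 (90%) figure with a few lines of arithmetic. The Python sketch below sums the PCR-positive and NGS-positive counts stated in the text and also derives an overall agreement rate that includes the 151 NGS-negative cases; the variable names and the overall agreement rate are illustrative additions, not values reported in the text.

```python
# (PCR-positive, NGS-positive) counts per virus group, as stated in the text.
pcr_vs_ngs = {
    "adenovirus": (5, 7),
    "calicivirus": (14, 16),
    "parvovirus": (10, 11),
    "enterovirus": (11, 12),
    "sapelovirus": (22, 23),
}

pcr_positive = sum(pcr for pcr, _ in pcr_vs_ngs.values())
ngs_positive = sum(ngs for _, ngs in pcr_vs_ngs.values())
percent_confirmed = 100.0 * pcr_positive / ngs_positive

# NGS-negative cases that were also PCR-negative, as stated in the text.
ngs_negative = 151
pcr_negative_of_ngs_negative = 151

# Illustrative overall agreement across NGS-positive and NGS-negative cases.
overall_agreement = (pcr_positive + pcr_negative_of_ngs_negative) / (ngs_positive + ngs_negative)

print(f"{pcr_positive}/{ngs_positive} confirmed ({percent_confirmed:.0f}%)")
```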

To further confirm NGS results, we cultured viruses from fecal samples. NGS data revealed (Figures 4A, 6A, and Table 3) that multiple animals at NEPRC were potentially infected by novel adenoviruses. We therefore selected fecal samples from four animals for potential isolation of adenoviruses. Within the NEPRC cohort we selected feces from animals #40 (60 adenovirus sequences), #44 (138 adenovirus sequences), and #30 (2 adenovirus sequences), as well as a fourth rhesus monkey not in this cohort (57 adenovirus sequences of 5,758 unique reads) and sought to isolate viruses from them. We cultured five adenoviruses from these four animals (WUHARV Ad#1-5). These viruses were sequentially plaque purified and then isolated on cesium chloride gradients. The identity of these purified viruses as the adenoviruses detected in 454 sequencing was confirmed by PCR and sequencing (WUHARV Ad1, shown in Figure 6A). Together both PCR and culture analyses confirmed the presence of viruses detected by NGS in fecal samples from pathogenic SIV-infected animals.

Example 4. Novel Viruses Detected by Next Generation Sequencing are Associated with AIDS Enteropathy

We next considered the possibility that detection of viral sequences in feces would predict intestinal disease in SIV-infected rhesus monkeys. This is a key question because our other data demonstrate only that viruses are shed in feces. To determine if viruses detected by sequencing can be clinically significant we evaluated the small and large intestine of 12 SIV-infected rhesus monkeys housed at the NEPRC (Figure 1B, results summarized in Table 4). Of the 12 animals necropsied, six had intestinal pathology characteristic of infection with cytomegalovirus or Balantidium (Table 4). Importantly, analysis of the fecal virome of these 12 animals revealed that three animals (#23, 27, and 41) had high levels of adenovirus sequences prior to necropsy (Table 4). These three rhesus macaques, but not others in this necropsy cohort, exhibited adenovirus-associated enteritis by histologic examination (Figures 6C and 6D, (i) and (ii)). Of these three monkeys with enteropathy, all had lesions in the jejunum and ileum (ileitis) while one also had lesions in the cecum (colitis). Immunohistochemistry for adenovirus confirmed the diagnosis of adenovirus ileitis and colitis (Figures 6C and 6D, (iii) and (iv)). Together these data demonstrate that viruses detected in the fecal material from SIV-infected rhesus monkeys using next generation sequencing can cause intestinal disease and epithelial damage in SIV-infected macaques.

To further investigate the clinical relevance of viruses detected by NGS, we used virus-specific PCR assays (Table 1) to determine whether viruses detected in the fecal material of SIV-infected rhesus monkeys (Figures 4A and 5E) were present in serum. We detected parvovirus (Figure 5E) in 4/10 serum samples taken at the time animals were euthanized for advanced AIDS between 24 and 64 weeks post-infection. Sequence analysis of PCR amplicons demonstrated that 3/4 viruses present in fecal material (animals #24, #28, and #39) were also present in serum. This suggests that viruses detected in the fecal material of SIV-infected rhesus monkeys can invade tissues and enter the circulation, further supporting the conclusion that SIV-associated expansion of the enteric virome may contribute to disease.

TABLE 4: Summary of adenovirus detection and pathology in SIV-infected rhesus monkeys

a Number of adenovirus sequences detected at 64 weeks.

b Results from PCR for indicated adenovirus (primers, Supplemental Table 1).

c Results obtained at necropsy.

d Novel adenoviruses highly diverged from Adenovirus 1-5 as well as known adenoviruses.

Example 5. SIV Infection and the Bacterial Microbiome

We next assessed the effects of SIV infection on the taxonomy of the bacterial microbiome. Our metagenomic data were comparable to published 16S rDNA-derived class-level data from SIV-infected and control macaques at TNPRC (McKenna et al., PLoS Pathog. 4: e20 (2008)), indicating that these distinct methods yield overall similar results (Figures 8A and 8B). Rarefaction analysis revealed that all but a few samples with very high numbers of viral sequences were robust for analysis of bacterial diversity at the family level (Figures 8C-8F). Species accumulation curves indicated that all cohorts except the NEPRC African green monkey cohort were robust for this analysis; further analysis excluded this cohort (Figure 8G). We detected no consistent SIV-associated differences in bacterial family richness, evenness, or diversity (Legendre and Legendre. Numerical Ecology, Second English Edition. Amsterdam: Elsevier Science (1998)). There was a statistically significant difference in Shannon Diversity in the NEPRC cohort sampled 64 weeks post-infection between SIV-infected and uninfected control monkeys (p=0.0345), but this was not replicated in either of the other cohorts of monkeys infected with pathogenic SIV (NEPRC cohort at 24 weeks of infection, TNPRC cohort, Figure 8C). There were no significant differences between SIV-infected and uninfected monkeys in any cohort amongst the most-represented 20 bacterial families (Figures 9A-9D). There was no significant difference in bacterial family evenness across cohorts (Figures 8C-8F). Additional analysis using principal component analysis and both supervised and unsupervised random forest analysis (Yatsunenko et al., Nature. 486: 222-227 (2012)) showed no association between SIV infection and the bacterial microbiome. Further, we failed to find an association between SIV infection and either the genus- or species-level taxonomic structure of the bacterial microbiome. Thus, in contrast to our analysis of the virome, we detected no consistent SIV infection-associated differences in the family-level taxonomy of the bacterial microbiome.
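For readers unfamiliar with the richness, evenness, and diversity measures named above, the following Python sketch computes family richness, the Shannon diversity index H = -sum(p_i ln p_i), and Pielou's evenness J = H / ln(S) from a list of per-family read counts. The text does not state which evenness index or logarithm base was used, so Pielou's index and the natural logarithm are assumptions of this sketch, and the example counts are hypothetical.

```python
import math


def richness(counts):
    """Number of families with at least one assigned read."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over families with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S); defined here as 0.0 when S <= 1."""
    s = richness(counts)
    if s <= 1:
        return 0.0
    return shannon_diversity(counts) / math.log(s)


# Hypothetical per-family read counts for two samples (illustration only).
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

even_h = shannon_diversity(even_sample)
even_j = pielou_evenness(even_sample)
skewed_j = pielou_evenness(skewed_sample)
```

A perfectly even sample reaches the maximum H = ln(S) and J = 1, while a sample dominated by one family scores lower on both measures.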

Other Embodiments

While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure that come within known or customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth.

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each independent publication or patent application was specifically and individually indicated as being incorporated by reference in their entirety.