Title:
HIV-1 DETECTION
Document Type and Number:
WIPO Patent Application WO/2014/037712
Kind Code:
A2
Abstract:
The present invention relates to a reagent binding to a highly conserved HIV-1 sequence, wherein the highly conserved HIV-1 sequence is SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 or SEQ ID NO: 16; or the RNA form of any of SEQ ID NOS: 1 to 8.

Inventors:
GALL ASTRID (GB)
KELLAM PAUL (GB)
Application Number:
PCT/GB2013/052311
Publication Date:
March 13, 2014
Filing Date:
September 04, 2013
Assignee:
GENOME RES LTD (GB)
International Classes:
C12Q1/70
Domestic Patent References:
WO2003070193A2 2003-08-28
WO2005075679A2 2005-08-18
WO2003020878A2 2003-03-13
WO2013111800A1 2013-08-01
Foreign References:
US0005322A 1847-10-09
US0000770A 1838-06-07
US0005310A 1847-09-25
US0000652A 1838-03-23
US0005686A 1848-08-01
US0000272A 1837-07-17
Other References:
ARCHER, J.; G. BAILLIE; S. J. WATSON; P. KELLAM; A. RAMBAUT; D. L. ROBERTSON: "Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II", BMC BIOINFORMATICS, 2012, pages 13
BAKER, M.: "Structural variation: the genome's hidden architecture", NAT METHODS, vol. 9, 2012, pages 133 - 137
CHARNEAU, P.; A. M. BORMAN; C. QUILLENT; D. GUETARD; S. CHAMARET; J. COHEN; G. REMY; L. MONTAGNIER; F. CLAVEL: "Isolation and envelope sequence of a highly divergent HIV-1 isolate: definition of a new HIV-1 group", VIROLOGY, vol. 205, 1994, pages 247 - 253
GALL, A.; S. KAYE; S. HUE; D. BONSALL; R. RANCE; G. J. BAILLIE; S. FIDLER; J. N. WEBER; M. O. MCCLURE; P. KELLAM: "Restriction of sequence diversity in the V3 region of the HIV-1 envelope gene during antiretroviral treatment in a cohort of recent seroconverters", RETROVIROLOGY, 2012
GARCIA-LERMA, J. G.; H. MACINNES; D. BENNETT; H. WEINSTOCK; W. HENEINE: "Transmitted human immunodeficiency virus type 1 carrying the D67N or K219Q/E mutation evolves rapidly to zidovudine resistance in vitro and shows a high replicative fitness in the presence of zidovudine", JOURNAL OF VIROLOGY, vol. 78, 2004, pages 7545 - 7552
GURTLER, L. G.; P. H. HAUSER; J. EBERLE; A. VON BRUNN; S. KNAPP; L. ZEKENG; J. M. TSAGUE; L. KAPTUE: "A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon", JOURNAL OF VIROLOGY, vol. 68, 1994, pages 1581 - 1585
HEMELAAR, J.; E. GOUWS; P. D. GHYS; S. OSMANOV: "Global and regional distribution of HIV-1 genetic subtypes and recombinants in 2004", AIDS, 2006, pages W13 - 23
HENN, M. R.; C. L. BOUTWELL; P. CHARLEBOIS; N. J. LENNON; K. A. POWER; A. R. MACALALAD; A. M. BERLIN; C. M. MALBOEUF; E. M. RYAN;: "Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection", PLOS PATHOG, vol. 8, 2012, pages E1002529
HIV/AIDS, J. U. N. P., UNAIDS WORLD AIDS DAY REPORT., 2011
HOLMES, H.; C. DAVIS; A. HEATH: "Development of the 1st International Reference Panel for HIV-1 RNA genotypes for use in nucleic acid-based techniques", J VIROL METHODS, vol. 154, 2008, pages 86 - 91
HUANG, X.; A. MADAN: "CAP3: A DNA sequence assembly program", GENOME RESEARCH, vol. 9, 1999, pages 868 - 877
HUELSENBECK, J. P.; F. RONQUIST.: "MRBAYES: Bayesian inference of phylogenetic trees", BIOINFORMATICS, vol. 17, 2001, pages 754 - 755
JABARA, C. B.; C. D. JONES; J. ROACH; J. A. ANDERSON; R. SWANSTROM: "Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID", P NATL ACAD SCI USA, vol. 108, 2011, pages 20166 - 20171
KATOH, K.; K. MISAWA; K. KUMA; T. MIYATA: "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform", NUCLEIC ACIDS RES, vol. 30, 2002, pages 3059 - 3066
KORBER, B.; M. MULDOON; J. THEILER; F. GAO; R. GUPTA; A. LAPEDES; B. H. HAHN; S. WOLINSKY; T. BHATTACHARYA: "Timing the ancestor of the HIV-1 pandemic strains", SCIENCE (NEW YORK, N.Y.), vol. 288, 2000, pages 1789 - 1796
KRZYWINSKI, M.; J. SCHEIN; BIROL, J. CONNORS; R. GASCOYNE; D. HORSMAN; S. J. JONES; M. A. MARRA: "Circos: an information aesthetic for comparative genomics", GENOME RESEARCH, vol. 19, 2009, pages 1639 - 1645
LAURING, A. S.; R. ANDINO: "Quasispecies theory and the behavior of RNA viruses", PLOS PATHOG, vol. 6, 2010, pages E1001005
LI, H.; R. DURBIN: "Fast and accurate long-read alignment with Burrows-Wheeler transform", BIOINFORMATICS, vol. 26, 2010, pages 589 - 595
MARGULIES, M.; M. EGHOLM; W. E. ALTMAN; S. ATTIYA; J. S. BADER; L. A. BEMBEN; J. BERKA; M. S. BRAVERMAN; Y. J. CHEN; Z. CHEN: "Genome sequencing in microfabricated high-density picolitre reactors", NATURE, vol. 437, 2005, pages 376 - 380
MARTIN, D. P.; D. POSADA; K. A. CRANDALL; C. WILLIAMSON: "A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints", AIDS RES HUM RETROV, vol. 21, 2005, pages 98 - 102
MARTIN, D. P.; C. WILLIAMSON; D. POSADA: "RDP2: recombination detection and analysis from sequence alignments", BIOINFORMATICS, vol. 21, 2005, pages 260 - 262
MEYERHANS, A.; J. P. VARTANIAN; S. WAINHOBSON: "DNA Recombination during Pcr", NUCLEIC ACIDS RES, vol. 18, 1990, pages 1687 - 1691
PLANTIER, J. C.; M. LEOZ; J. E. DICKERSON; F. DE OLIVEIRA; F. CORDONNIER; V. LEMEE; F. DAMOND; D. L. ROBERTSON; F. SIMON: "A new human immunodeficiency virus derived from gorillas", NATURE MEDICINE, vol. 15, 2009, pages 871 - 872
POSADA, D.: "jModelTest: phylogenetic model averaging", MOL BIOL EVOL, vol. 25, 2008, pages 1253 - 1256
ROBERTSON, D. L.; P. M. SHARP; F. E. MCCUTCHAN; B. H. HAHN: "Recombination in HIV-1", NATURE, vol. 374, 1995, pages 124 - 126
SALAZAR-GONZALEZ, J. F.; M. G. SALAZAR; B. F. KEELE; G. H. LEARN; E. E. GIORGI; H. LI; J. M. DECKER; S. Y. WANG; J. BAALWA; M. H.: "Genetic identity, biological phenotype, and evolutionary pathways of transmitted/founder viruses in acute and early HIV-1 infection", J EXP MED, vol. 206, 2009, pages 1273 - 1289
SALMINEN, M. O.; J. K. CARR; D. S. BURKE; F. E. MCCUTCHAN: "Identification of Breakpoints in Intergenotypic Recombinants of Hiv Type-1 by Bootscanning", AIDS RES HUM RETROV, vol. 11, 1995, pages 1423 - 1425
SIMON, F.; P. MAUCLERE; P. ROQUES; LOUSSERT-AJAKA; M. C. MULLER-TRUTWIN; S. SARAGOSTI; M. C. GEORGES-COURBOT; F. BARRE-SINOUSSI; F: "Identification of a new human immunodeficiency virus type 1 distinct from group M and group O", NATURE MEDICINE, vol. 4, 1998, pages 1032 - 1037
SNOECK, J.; J. FELLAY; BARTHA; D. C. DOUEK; A. TELENTI.: "Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints", RETROVIROLOGY, vol. 8, 2011, pages 87
TAYLOR, B. S.; M. E. SOBIESZCZYK; F. E. MCCUTCHAN; S. M. HAMMER: "The challenge of HIV-1 subtype diversity", N ENGL J MED, vol. 358, 2008, pages 1590 - 1602
VANDEN HAESEVELDE, M.; J. L. DECOURT; R. J. DE LEYS; B. VANDERBORGHT; G. VAN DER GROEN; H. VAN HEUVERSWIJN; E. SAMAN: "Genomic cloning and complete sequence analysis of a highly divergent African human immunodeficiency virus isolate", JOURNAL OF VIROLOGY, vol. 68, 1994, pages 1586 - 1596
WESTBY, M.; M. LEWIS; J. WHITCOMB; M. YOULE; A. L. POZNIAK; T. JAMES; T. M. JENKINS; M. PERROS; E. VAN DER RYST: "Emergence of CXCR4-Using human immunodeficiency virus type 1 (HIV-1) variants in a minority of HIV-1-Infected patients following treatment with the CCR5 antagonist maraviroc is from a pretreatment CXCR4-using virus reservoir", JOURNAL OF VIROLOGY, vol. 80, 2006, pages 4909 - 4920
YU, Q.; E. M. RYAN; T. M. ALLEN; B. W. BIRREN; M. R. HENN; N. J. LENNON: "PriSM: a primer selection and matching tool for amplification and sequencing of viral genomes", BIOINFORMATICS, vol. 27, 2011, pages 266 - 267
ZHOU, B.; M. E. DONNELLY; D. T. SCHOLES; K. ST GEORGE; M. HATTA; Y. KAWAOKA; D. E. WENTWORTH: "Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and Swine origin human influenza a viruses", JOURNAL OF VIROLOGY, vol. 83, 2009, pages 10309 - 10313
Attorney, Agent or Firm:
OLSWANG LLP (London, Greater London WC1V 6XX, GB)
Claims:
Claims

1. A reagent binding to a highly conserved HIV-1 sequence, wherein the highly conserved HIV-1 sequence is SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 or SEQ ID NO: 16; or the RNA form of any of SEQ ID NOS: 1 to 8.

2. A reagent as claimed in claim 1, wherein the reagent comprises an oligonucleotide, such as an oligonucleotide having the sequence as shown in SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23 or SEQ ID NO: 24 or the RNA form of any of SEQ ID NOS: 17 to 24, or an oligonucleotide comprising at least a fragment of 5 contiguous nucleotides of any of these sequences.

3. A HIV-1 primer set comprising:

(i) at least one forward primer comprising SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21 and SEQ ID NO: 23 or an oligonucleotide comprising at least a fragment of 5 contiguous nucleotides of any of these sequences or an oligonucleotide consisting essentially of any of these sequences; and

(ii) at least one reverse primer comprising SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22 and SEQ ID NO: 24 or an oligonucleotide comprising at least a fragment of 5 contiguous nucleotides of any of these sequences or an oligonucleotide consisting essentially of any of these sequences.

4. A HIV-1 primer set comprising:

(i) SEQ ID NO: 17 and SEQ ID NO: 18 or a pair of oligonucleotides comprising at least a fragment of 5 nucleotides from SEQ ID NO: 17 and from SEQ ID NO: 18 or consisting essentially of SEQ ID NO: 17 and SEQ ID NO: 18;

(ii) SEQ ID NO: 19 and SEQ ID NO: 20 or a pair of oligonucleotides comprising at least a fragment of 5 nucleotides from SEQ ID NO: 19 and from SEQ ID NO: 20 or consisting essentially of SEQ ID NO: 19 and SEQ ID NO: 20;

(iii) SEQ ID NO: 21 and SEQ ID NO: 22 or a pair of oligonucleotides comprising at least a fragment of 5 nucleotides from SEQ ID NO: 21 and from SEQ ID NO: 22 or consisting essentially of SEQ ID NO: 21 and SEQ ID NO: 22;

(iv) SEQ ID NO: 23 and SEQ ID NO: 24 or a pair of oligonucleotides comprising at least a fragment of 5 nucleotides from SEQ ID NO: 23 and from SEQ ID NO: 24 or consisting essentially of SEQ ID NO: 23 and SEQ ID NO: 24; or

(v) a combination of one or more or all primer sets in (i) to (iv) above.

5. A method for detection of a HIV-1 nucleic acid in a sample by using a reagent binding to a highly conserved HIV-1 sequence, wherein the highly conserved HIV-1 sequence is SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 or SEQ ID NO: 16.

6. A method for manipulation of a HIV-1 genome or of HIV-1 expression by using a reagent binding to a highly conserved HIV-1 sequence, wherein the highly conserved HIV-1 sequence is SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15 or SEQ ID NO: 16.

7. Use of one or more or all of the oligonucleotides as claimed in claim 2 in a nucleic acid amplification reaction or as a probe for the detection of HIV-1 nucleic acid in a sample.

8. A method for detection of HIV-1 nucleic acid in a sample, comprising carrying out a polymerase chain reaction using one or more or all HIV-1 primer sets as claimed in claim 3 or claim 4 and detecting the presence of an amplification product.

9. A method for diagnosis of an individual as HIV-1 positive, the method comprising detecting the presence of HIV-1 nucleic acid in a sample from the individual by using a reagent as claimed in claim 1 or claim 2 or using one or more of the primer sets as claimed in claim 3 or claim 4 in a nucleic acid amplification reaction and detecting the presence of an amplification product or as a probe and detecting probe binding.

10. A method of detecting or diagnosing HIV-1 in a population infected with heterogeneous HIV-1 strains, the method comprising using a reagent as claimed in claim 1 or claim 2 or using one or more of the primer sets as claimed in claim 3 or claim 4 in a nucleic acid amplification reaction and detecting the presence of an amplification product or as a probe and detecting probe binding, wherein the same reagent or primer set can detect the heterogeneous HIV-1 strains present in the population.

11. A method for determining the sequence of HIV-1 nucleic acids (in genomic DNA, proviral DNA, viral RNA) in a sample comprising:

(a) preparing DNA, cDNA (complementary Deoxyribonucleic Acid) of a sample to be detected or sequencing RNA directly;

(b) contacting the sample with amplification reagents and a primer set as claimed in claim 2 or claim 3 to form a reaction mixture;

(b) placing the reaction mixture under amplification conditions to form an amplification product;

(c) sequencing the amplification product; and

(d) assembling amplification products in the sample to determine the partial or complete HIV-1 nucleic acid sequence.

12. A method for identifying the HIV-1 group or subtype in a sample, the method comprising determining the HIV-1 nucleic acid sequence in the sample as set out in claim 11 and comparing the HIV-1 nucleic acid sequence in the sample with reference sequences of known HIV-1 groups and subtypes.

13. A method for identifying the presence of mutations in HIV-1 genotypes from samples, the method comprising determining the HIV-1 nucleic acid sequence in the sample as set out in claim 11 and comparing the HIV-1 nucleic acid sequence in the sample with a reference sequence.

14. A method as claimed in claim 13, wherein the mutation identified is associated with drug resistance.

15. A method as claimed in any one of claims 11 to 14, wherein the preparation of cDNA and amplification as required in steps (a) to (c) is carried out by a one-step reverse transcription polymerase chain reaction protocol.

16. A method as claimed in any one of claims 11 to 15, wherein sequencing the amplification product is carried out using a sequencing technique.

17. A method as claimed in any one of claims 11 to 16, wherein the amplification products are assembled by de novo assembly or mapping of short reads against a reference sequence.

18. A method as claimed in any one of claims 11 to 17, wherein the sample comprises isolated nucleic acids, dried blood, plasma or whole blood.

19. A kit for determining the sequence of HIV-1 nucleic acids in a sample, the kit comprising a reagent as claimed in claim 1 or claim 2 or a primer set as claimed in claim 3 or claim 4 or a combination of any two or more reagents and/or primer sets.

20. A kit for detection of HIV-1 nucleic acids in a sample, the kit comprising a reagent as claimed in claim 1.

21. A kit for manipulation of HIV-1 expression or genetic modification comprising a reagent as claimed in claim 1.

22. A kit for detection of HIV-1 nucleic acids in a sample, the kit comprising one or more or all of oligonucleotides as claimed in claim 2.

23. A kit as claimed in any one of claims 19 to 22, further comprising suitable amplification reagents.

24. A kit as claimed in claim 22, further comprising means for detecting the nucleic acids in the sample such as a detectable label.

Description:
HIV-1 DETECTION

Technical Field

The present invention relates to HIV. In particular the invention relates to binders of HIV-1 consensus sequences, universal primers and methods for detecting and genotyping HIV-1.

Background

The Human Immunodeficiency Virus 1 (HIV-1) is one of the most genetically diverse viruses known. Four main genetic groups have been described: the major group M, which causes ~85% of infections worldwide and is further divided into nine subtypes (A-D, F-H, J and K); the outlier group O; the non-major and non-outlier group N; and another recently designated group P. Amino acid differences of up to 35% are found between subtypes, and strains belonging to the same subtype can vary by up to 20%. In addition, inter-subtype recombination is common. Circulating recombinant forms (CRFs), which are found in three or more epidemiologically unlinked individuals, and unique recombinant forms (URFs), identified in fewer than three individuals, consist of mosaic genomes with sections of two or more subtypes. To date 51 CRFs have been identified.

Viruses classified as HIV contain RNA as their genetic information, and the infectivity of HIV depends upon the virus's ability to insert its genetic information into the DNA of a host. In order to insert its genetic information and therefore successfully infect a host, an HIV virus must convert its genetic material (RNA) into DNA and subsequently transcribe this proviral DNA into viral RNA. However, while the virus may successfully convert RNA into DNA, both the conversion and the transcription are seldom accurate. The DNA copy can diverge from the viral RNA by several base pairs. Hence, while a host initially may be infected with a single virus particle, after several rounds of replication the host may be infected with a genetically diverse population of viruses.

There is no universal HIV detection system available at this time. It is clear that viruses known as HIV have genomes that are highly mutable and are therefore constantly changing. This presents those searching for methods of detecting the virus based upon its genetic information with a constantly moving target. Due to the remarkable degree of sequence heterogeneity, a universal method for generation of large numbers of HIV-1 genome sequences has remained elusive. The ability to generate HIV-1 genome sequences is crucial for understanding the dynamics of the pandemic at the population level, and viral diversification, including acquisition of drug resistance mutations, in individual patients. The development of therapies and vaccines, and the measurement of their efficacy, rely on the availability of epidemiological data at the population level as well as tools to monitor viral status at the individual level. Both population and patient monitoring require tools to detect HIV-1 genomes across their diversity and independently of mutation status. As the phenotype of a virus is the compound effect of polymorphisms present in a genome, and as HIV therapy now targets proteins encoded by genes dispersed throughout the 9.7kb genome, there is a need for high-performance HIV-1 whole genome sequencing. Next-generation sequencing (NGS) provides unprecedented possibilities for large-scale sequencing of virus genomes. A method for NGS of HIV-1 genomes of subtype B and analysis of minor variants has been described recently.

HIV diversity also represents a challenge to research into the biology of this virus as well as to research and development into its prevention, treatment and therapy. Research tools and therapeutic approaches based on targeting the viral genome, such as silencing RNA or targeted sequence editing using site directed nucleases such as Zinc Finger Nucleases (ZFNs) or Transcription Activator-Like Effector Nucleases (TALENs), would be improved by being universal.

Summary

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In one aspect, the invention provides reagents that bind to highly conserved HIV-1 nucleic acid sequences.

In one aspect, the invention provides a method for detection or genotyping of HIV-1 nucleic acid in a sample by using reagents that bind highly conserved HIV-1 sequences. Such conserved HIV-1 nucleic acid sequences may be present on viral RNA, DNA, proviral DNA or integrated viral DNA, and are amenable to binding by external reagents.

Such conserved HIV-1 nucleic acid sequences may be any one of SEQ ID NOS: 1 to 16.

Binders of such conserved HIV-1 nucleic acid sequences may be any one of Sequence ID NOS: 17 to 24 or an oligonucleotide comprising at least a fragment of 5 contiguous nucleotides of any of these sequences.
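As an illustration only, the phrase "a fragment of 5 contiguous nucleotides" can be read as a shared run of five letters between a candidate oligonucleotide and a listed primer. The minimal Python sketch below makes that reading concrete. The primer strings are copied from the sequence listing in this document; the names `PRIMERS`, `fragments` and `shares_fragment` are hypothetical, and the comparison is a literal letter match in which degenerate codes such as Y, R and W are treated as ordinary letters. That is an assumption of the sketch, not a statement about claim scope.

```python
# Primer sequences as printed in the sequence listing (SEQ ID NOS: 17 to 24).
PRIMERS = {
    17: "AGC CYG GGA GCT CTC TG",
    18: "CCT CCA ATT CCY CCT ATC ATT TT",
    19: "GGG AAG TGA YAT AGC WGG AAC",
    20: "CTG CCA TCT GTT TTC CAT ART C",
    21: "TTA AAA GAA AAG GGG GGA TTG GG",
    22: "TGG CYT GTA CCG TCA GCG",
    23: "CCT ATG GCA GGA AGA AGC G",
    24: "CTT WTA TGC AGC WTC TGA GGG",
}


def normalise(seq):
    """Remove the spacing used in the listing and upper-case the letters."""
    return seq.replace(" ", "").upper()


def fragments(seq, k=5):
    """Return the set of all runs of k contiguous letters in seq."""
    seq = normalise(seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_fragment(oligo, primers=PRIMERS, k=5):
    """Return the SEQ ID numbers that share at least one k-letter run with oligo.

    Hypothetical helper: letters are compared literally, so degenerate
    codes are not expanded.
    """
    oligo_fragments = fragments(oligo, k)
    return sorted(
        number
        for number, primer in primers.items()
        if oligo_fragments & fragments(primer, k)
    )


print(shares_fragment("CCGTCAGCG"))  # shares 5-letter runs with SEQ ID NO: 22 only
print(shares_fragment("ACGT"))       # shorter than 5 letters, so no match is possible
```

A stricter variant could expand the degenerate letters before comparing, at the cost of a larger fragment set per primer.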

In one aspect, the invention provides an oligonucleotide selected from the group consisting of: SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21 , SEQ ID NO: 22, SEQ ID NO: 23 and SEQ ID NO: 24 or an oligonucleotide comprising at least a fragment of 5 contiguous nucleotides of any of these sequences.

In a second aspect, the invention provides a HIV-1 primer set comprising at least one forward primer comprising SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21 or SEQ ID NO: 23 or an oligonucleotide comprising at least a fragment of 5 nucleotides of any of these sequences or an oligonucleotide consisting essentially of any of these sequences; and at least one reverse primer comprising SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22 or SEQ ID NO: 24 or an oligonucleotide comprising at least a fragment of 5 nucleotides of any of these sequences or an oligonucleotide consisting essentially of any of these sequences.

In one aspect, the invention provides use of one or more or all of the oligonucleotides described above in a nucleic acid amplification reaction or as a probe for the detection of HIV-1 nucleic acid in a sample.

In one aspect, the invention provides a method for detection of HIV-1 nucleic acid in a sample, comprising carrying out a polymerase chain reaction using one or more or all HIV-1 primer sets described above and detecting the presence of an amplification product.

In one aspect, the invention provides a method for diagnosis of an individual as HIV-1 positive, the method comprising detecting the presence of HIV-1 nucleic acid in a sample from the individual by using one or more of the binders or oligonucleotides or one or more of the primer sets described above in a nucleic acid amplification reaction and detecting the presence of an amplification product or as a probe and detecting probe binding.

In one aspect, the invention provides a method of detecting or diagnosing HIV-1 in a population infected with heterogeneous HIV-1 strains, the method comprising using one or more of the primer sets described above in a nucleic acid amplification reaction and detecting the presence of an amplification product or as a probe and detecting probe binding, wherein the same primer set can detect the heterogeneous HIV-1 strains present in the population.

In another aspect, the invention provides a method for determining the sequence of HIV-1 nucleic acids in a sample comprising:

(a) preparing cDNA (complementary Deoxyribonucleic Acid) of a sample to be detected;

(b) contacting the sample with amplification reagents and a primer set described above to form a reaction mixture;

(b) placing the reaction mixture under amplification conditions to form an amplification product;

(c) sequencing the amplification product; and

(d) assembling amplification products in the sample to determine the complete HIV-1 nucleic acid sequence.

Methods for identifying the HIV-1 group or subtype in a clinical sample and for identifying the presence of mutations, such as mutations associated with drug resistance, in HIV-1 genotypes from clinical samples comprising determining the HIV-1 nucleic acid sequence in the sample as set out in the previous aspects and comparing the HIV-1 nucleic acid sequence in the sample with relevant reference sequences are also provided.

In another aspect, the invention provides a kit for determining the sequence of HIV-1 nucleic acids in a sample which kit comprises a primer set of the second aspect or a combination of any two or more of these primer sets.
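The comparison of a sample sequence with a reference sequence mentioned in this Summary can be pictured with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length by some external tool, and it uses short made-up strings rather than real HIV-1 data. The function name `find_differences` is hypothetical and is not taken from the application.

```python
def find_differences(sample, reference):
    """List mismatches between two pre-aligned sequences of equal length.

    Returns (position, reference_base, sample_base) tuples, with positions
    counted from 1. Alignment itself is assumed to have been done elsewhere.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


# Toy strings for illustration only; they are not HIV-1 sequences.
toy_reference = "ACGTACGT"
toy_sample = "ACGTTCGT"
print(find_differences(toy_sample, toy_reference))
```

In practice each reported position would then be looked up against a curated list of positions of interest, for example positions associated with drug resistance; such a list is outside the scope of this sketch.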

In another aspect, the invention provides a kit for detection of HIV-1 nucleic acids in a sample, comprising reagents that bind highly conserved HIV-1 sequences, such as one or more or all of oligonucleotides described above, and means for detecting the nucleic acids in the sample.

The preferred features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the invention.

Brief Description of the Drawings

Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which:

Figure 1: Bayesian phylogeny of HIV-1 genome sequences derived in this study.

Figure 2: Flowchart of the universal method for generation of HIV-1 genome sequences.

Figure 3: Detection of recombination in HIV-1 genomes sequenced in this study. a Genome view of the CRFs. b & c Bootscan analysis along the genome. d Similarity to different subtypes and CRF01_AE of the HIV-1 subtype consensus alignment.

Figure 4: Genetic diversity of HIV-1 genes, groups and subtypes. Eight representative HIV-1 genomes sequenced in this study are shown in a circular format.

Figure 5: Genetic diversity of HIV-1 genes, groups and subtypes. Representative HIV-1 genomes of group O and N sequenced in this study are shown in a circular format and relative to the relevant reference sequence.

Figure 6: Broad application of the 'pan'-HIV-1 RT-PCR. The primer set amplifies HIV-1 genomes in four overlapping products.

Figure 7: Sensitivity of the 'pan'-HIV-1 RT-PCR.

Figure 8: Broad application of the 'pan'-HIV-1 RT-PCR. The primer set amplifies HIV-1 genomes in four overlapping products of 1.9kb, 3.6kb, 3.0kb and 3.5kb.

Sequences

Consensus sequences: Binding sites on viral RNA (5'-3') and on proviral DNA (5'-3')

SEQ ID NO: 1 - AGC CYG GGA GCT CTC TG

SEQ ID NO: 2 - AAA ATG ATA GGR GGA ATT GGA GG

SEQ ID NO: 3 - TTA AAA GAA AAG GGG GGA TTG GG

SEQ ID NO: 4 - CCT ATG GCA GGA AGA AGC G

SEQ ID NO: 5 - CGC TGA CGG TAC ARG CCA

SEQ ID NO: 6 - CCC TCA GAW GCT GCA TAW AAG

SEQ ID NO: 7 - GGG AAG TGA YAT AGC WGG AAC

SEQ ID NO: 8 - GAY TAT GGA AAA CAG ATG GCA G

Consensus sequences: Binding sites only on proviral DNA (5'-3')

SEQ ID NO: 9 - CAG AGA GCT CCC RGG CT

SEQ ID NO: 10 - CCT CCA ATT CCY CCT ATC ATT TT

SEQ ID NO: 11 - CCC AAT CCC CCC TTT TCT TTT AA

SEQ ID NO: 12 - CGC TTC TTC CTG CCA TAG G

SEQ ID NO: 13 - TGG CYT GTA CCG TCA GCG

SEQ ID NO: 14 - CTT WTA TGC AGC WTC TGA GGG

SEQ ID NO: 15 - GTT CCW GCT ATR TCA CTT CCC

SEQ ID NO: 16 - CTG CCA TCT GTT TTC CAT ART C

Primer sequences (5'-3')

SEQ ID NO: 17 - AGC CYG GGA GCT CTC TG

SEQ ID NO: 18 - CCT CCA ATT CCY CCT ATC ATT TT

SEQ ID NO: 19 - GGG AAG TGA YAT AGC WGG AAC

SEQ ID NO: 20 - CTG CCA TCT GTT TTC CAT ART C

SEQ ID NO: 21 - TTA AAA GAA AAG GGG GGA TTG GG

SEQ ID NO: 22 - TGG CYT GTA CCG TCA GCG

SEQ ID NO: 23 - CCT ATG GCA GGA AGA AGC G

SEQ ID NO: 24 - CTT WTA TGC AGC WTC TGA GGG
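Several of the primer sequences above contain letters other than A, C, G and T (Y, R and W). Under the standard IUPAC nucleotide code these denote mixed positions (Y = C or T, R = A or G, W = A or T), so one listed primer stands for a small pool of concrete oligonucleotides. The following Python sketch enumerates that pool. The names `IUPAC` and `expand` are hypothetical, and reading the letters as IUPAC codes is an assumption based on common practice, since the listing itself does not define them.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    primer = primer.replace(" ", "").upper()
    choices = (IUPAC[base] for base in primer)
    return ["".join(combination) for combination in product(*choices)]


# SEQ ID NO: 17 has one degenerate position (Y), so it expands to two sequences.
for sequence in expand("AGC CYG GGA GCT CTC TG"):
    print(sequence)

# SEQ ID NO: 19 has two degenerate positions (Y and W), so it expands to four.
print(len(expand("GGG AAG TGA YAT AGC WGG AAC")))
```

The pool size is the product of the number of options at each position, which is why primers with few degenerate positions keep the number of distinct oligonucleotides small.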

Other candidate conserved consensus sequences from table S3 (binding sites on viral RNA 5' -3')

SEQ ID NO: 25 - GCC TCA ATA AAG CTT GCC TTG A

SEQ ID NO: 26 - CCC TCA RAT CAC TCT TTG GCA

SEQ ID NO: 27 - GGA AAG GTG AAG GRG CAG TA

SEQ ID NO: 28 - CCT ATG GCA GGA AGA AGC G

SEQ ID NO: 29 - GAA CCC ACT GCT TAA RSC

SEQ ID NO: 30 - AYA CAG GRG CAG ATG ATA CAG T

SEQ ID NO: 31 - AAA ATG ATA GGR GGA ATT GGA GG

SEQ ID NO: 32 - TGG ACT GTC AAT GAY ATA CAR AAG

SEQ ID NO: 33 - ATT GGA GGA AAT GAA CAA RTA GAY A

SEQ ID NO: 34 - GAA TTT GGN ATH CCC TAC AAT CC

SEQ ID NO: 35 - TTA AAA GAA AAG GGG GGA TTG GG

SEQ ID NO: 36 - GGT TTA TTA CAG RGA CAG CAG A

Detailed Description

Embodiments of the present invention are described below by way of example only. These examples represent the best ways of putting the invention into practice that are currently known to the Applicant although they are not the only ways in which this could be achieved.

The invention is related to reagents capable of binding nucleic acid sequences that can be used in the field of virus diagnostics, more specifically the diagnosis of infections with the AIDS causing Human Immunodeficiency Virus (HIV). The present invention relates generally to universal conserved consensus sequences in the HIV-1 genome that can be targeted by binding or hybridisation and which can be useful in methods for large-scale detection of HIV-1 genetic material including genome sequencing from the range of HIV-1 genetic groups and subtypes prevalent globally. Such new methods will be pivotal for large-scale studies of ongoing inter- and intra-host evolution of HIV-1.

Partly through selective pressure that constrains diversity and restricts evolution in parts of the viral genome, some regions of the viral genome exhibit higher levels of sequence conservation.

Reagents aimed at detecting, genotyping or targeting HIV-1 nucleotide sequences should aim to react with highly conserved portions of the genome. However, not all conserved or semi-conserved sequences are open or accessible to binding by exogenous oligonucleotides or proteins. For example, RNA and DNA are known to exhibit 3-dimensional structures which can prevent or influence reagent binding. The inventors have identified highly conserved HIV-1 nucleic acid sequences that are present on viral RNA, DNA, proviral DNA and integrated viral DNA, and that are amenable to binding by external reagents. The inventors have also designed specific oligonucleotides and universal 'pan'-HIV-1 primer sets targeting semi-conserved regions of the HIV-1 genome.

The HIV-1 nucleic acid sequence to be detected or amplified is usually only present in a sample (for example a blood sample obtained from a patient suspected of having a viral infection) in minute amounts. The oligonucleotides and primers should be sufficiently complementary to the target sequence to allow efficient reverse transcription and amplification of the viral nucleic acid present in the samples. Due to the heterogeneity of viral genomes, false negative test results may be obtained if the primers and probes are capable of recognizing sequences present in only part of the variants of the virus. HIV shows high heterogeneity. Genetic variability has been demonstrated amongst isolates from different continents but also between individuals and between different stages of the disease within one individual.

The detection of all presently known subtypes of HIV-1 is of extreme importance, especially with regard to patient management, emergence of drug resistance, security of blood and blood products, and clinical and epidemiological studies. Current assays for the amplification and subsequent detection of HIV-1 derived nucleic acid sequences are usually based on amplification of subgenomic regions of the viral genome. These assays have been developed for subtype B, which is the major subtype in European countries and the United States, but only causes 10% of infections worldwide. However, the presence of other subtypes, which were geographically confined before, is increasing due to frequent travel between these countries and, for example, African countries. Sensitive assays are therefore needed that are capable of detecting as many variants of the HIV-1 virus as possible (preferably all).

Detection of the highly conserved HIV-1 sequences of the invention, such as any one of SEQ ID NOS: 1 to 16, allows for detection of many variants of the HIV-1 virus.

The highly conserved HIV-1 sequences of the invention may be viral RNA (5'-3') or proviral DNA (5'-3'), such as shown in any one or more of SEQ ID NOS: 1 to 8. SEQ ID NOS: 1 to 8 are described using DNA bases by convention but include RNA forms of the sequences (such as with U instead of T).

The highly conserved HIV-1 sequences of the invention may be proviral DNA (5'-3') as shown in any one or more of SEQ ID NOS: 9 to 16.

The reagents that bind the highly conserved HIV-1 sequences described herein include: oligonucleotides, such as the oligonucleotides of any one or more of SEQ ID NOS: 17 to 24; silencing RNAs; interfering RNAs; antisense oligonucleotides with or without modified bases containing a morpholine ring (morpholinos); modified RNA nucleotides such as Locked Nucleic Acids (LNAs); and site-directed DNA nucleases such as Zinc Finger Nucleases (ZFNs) or TAL effector nucleases (TALENs). TALENs are created by combining custom-designed TALEs that recognize a specific target sequence with a non-specific nuclease domain. ZFNs are composed of a zinc finger DNA-binding domain and a DNA-cleavage domain. The reagents, such as oligonucleotide probes, that bind the highly conserved HIV-1 sequences bind the relevant sequences in a specific manner. Specific binding may refer to binding to the highly conserved sequences in preference to other HIV-1 or host or other nucleic acid sequences of a similar length as measured by a competitive binding assay.

"Specific hybridization" of a probe to a region of an HIV-1 polynucleic acid target means that said probe forms a duplex with part of this region or with the entire region under the experimental conditions used, and that under those conditions said probe does not form a duplex with other regions of the polynucleic acids present in the sample to be analysed. Suitably, the specific hybridisation of a probe to a nucleic acid target region occurs under stringent hybridisation conditions, such as 3X SSC, 0.1% SDS, at 50°C. The skilled person knows how to vary the parameters of temperature, probe length and salt concentration such that specific hybridisation can be achieved. Hybridization and wash conditions are well known and exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N. Y., (1989), particularly Chapter 11 therein. When needed, slight modifications of the probes in length or in sequence can be carried out to maintain the specificity and sensitivity required under the given circumstances. Oligonucleotide probes which specifically hybridise to the target suitably are at least 95% complementary to the target sequence over their length, suitably greater than 95% complementary, such as 96%, 97%, 98%, 99% and most preferably 100% complementary over their length to the target HIV-1 sequence. Suitably each nucleotide of the probe can form a hydrogen bond with its counterpart target nucleotide.
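As an illustration of the complementarity thresholds discussed above, the following Python sketch computes the percentage of probe positions that can base-pair with an equal-length target site. It is a simplified sketch under stated assumptions: only Watson-Crick pairs are counted, gaps are not considered, and the function names and the 20-mer target are hypothetical rather than sequences of the invention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written with DNA bases."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def percent_complementarity(probe: str, target: str) -> float:
    """Percentage of probe positions that pair with the target site.

    Both sequences are written 5'->3'. The probe is compared, position by
    position, with the reverse complement of the target site.
    """
    probe = probe.upper().replace("U", "T")
    if len(probe) != len(target):
        raise ValueError("probe and target site must have the same length")
    ideal_probe = reverse_complement(target)
    matches = sum(p == q for p, q in zip(probe, ideal_probe))
    return 100.0 * matches / len(probe)


# Hypothetical 20-nucleotide target site, for illustration only.
target = "AAGGCCTTAGCATGCAATGC"
perfect = reverse_complement(target)
# Introduce a single mismatch at the first probe position.
one_mismatch = ("A" if perfect[0] != "A" else "C") + perfect[1:]

print(percent_complementarity(perfect, target))       # 100.0
print(percent_complementarity(one_mismatch, target))  # 95.0
```

With a 20-nucleotide probe, a single mismatch therefore sits exactly at the 95% lower bound mentioned above.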

Uses of binders according to the invention include manipulation of HIV-1 expression or manipulation of the HIV-1 genome for research or therapeutic purposes. In that case, specific binders could include silencing RNA or gene editing reagents such as targeted zinc finger nucleases or Transcription Activator-Like Effector Nucleases (TALENs).

With the present invention, oligonucleotides which bind the highly conserved HIV-1 sequences are provided that can be used as primers and probes in the amplification and/or detection of HIV-1 nucleic acid. Use of oligonucleotides which bind the highly conserved HIV-1 sequences in detection or diagnosis of HIV-1 also forms an aspect of the invention. It has been found that, by using the oligonucleotides of the present invention in methods for the amplification and detection of nucleic acid, a sensitive and specific detection of HIV-1 can be obtained. The benefit of the sequences of the present invention primarily resides in the fact that, with the aid of primers and probes comprising the sequences according to the invention, the nucleic acid of all presently known subtypes of HIV-1 can be detected. The use of the oligonucleotides according to the invention is not limited to any particular amplification technique or any particular modification thereof. It is evident that the oligonucleotides according to the invention find their use in many different nucleic acid amplification techniques and various methods for detecting the presence of nucleic acid of HIV. The oligonucleotides of the present invention can likewise be used in quantitative amplification methods. In applications such as TaqMan PCR, the oligonucleotides may be useful in amplification and as probes. The oligonucleotides may also be useful as probes in applications like hybridisation and imaging.

Oligonucleotides

One aspect of the invention provides an oligonucleotide selected from the group consisting of: SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23 and SEQ ID NO: 24, or an oligonucleotide comprising at least a fragment of 5, 6, 7, 8, 9 or 10 contiguous nucleotides of any of these sequences. One or more or all of these oligonucleotides may be used in a nucleic acid amplification reaction or as a probe for the detection of HIV-1 nucleic acid in a sample. All oligonucleotide sequences, such as SEQ ID NO: 17 to SEQ ID NO: 24, are described using DNA bases by convention but include RNA forms of these sequences (such as with U instead of T).

Primer Sets

In one embodiment, the HIV-1 primer set according to the invention comprises:

(i) SEQ ID NO: 17 and SEQ ID NO: 18;

(ii) SEQ ID NO: 19 and SEQ ID NO: 20;

(iii) SEQ ID NO: 21 and SEQ ID NO: 22;

(iv) SEQ ID NO: 23 and SEQ ID NO: 24; or

(v) a combination of one or more primer sets in (i) to (iv) above.

In one embodiment, the primer sets in (i) to (iv) above comprise an oligonucleotide comprising at least a fragment of 5, 6, 7, 8, 9 or 10 contiguous nucleotides of any of SEQ ID NOS: 9 to 16 or an oligonucleotide consisting essentially of any of these sequences in place of the respective sequence.

In one embodiment, the universal 'pan'-HIV-1 primer sets according to the invention comprise:

(i) SEQ ID NO: 17 and SEQ ID NO: 18;

(ii) SEQ ID NO: 19 and SEQ ID NO: 20;

(iii) SEQ ID NO: 21 and SEQ ID NO: 22; or

(iv) SEQ ID NO: 23 and SEQ ID NO: 24.

The term "primer" as used herein refers to an oligonucleotide either naturally occurring (e.g. as a restriction fragment) or produced synthetically, which is capable of acting as a point of initiation of synthesis of a primer extension product which is complementary to a nucleic acid strand (template or target sequence) when placed under suitable conditions (e.g. buffer, salt, temperature and pH) in the presence of nucleotides and an agent for nucleic acid polymerization, such as DNA dependent or RNA dependent polymerase. A primer must be sufficiently long to prime the synthesis of extension products in the presence of an agent for polymerization. A typical primer contains at least about 10 nucleotides in length of a sequence substantially complementary or similar to the target sequence, but somewhat longer primers are preferred. Usually primers contain about 15-26 nucleotides but longer primers may also be employed, especially when the primers contain additional sequences such as a promoter sequence for a particular polymerase. Normally a set of primers will consist of at least two primers, one 'upstream' and one 'downstream' primer which together define the amplicon (the sequence that will be amplified using said primers).
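The way an 'upstream' and a 'downstream' primer together define an amplicon can be sketched in Python as follows. This is a minimal illustration that assumes exact primer matches on a single template strand; the function names and the toy sequences are hypothetical and do not correspond to SEQ ID NOS: 17 to 24.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, upstream: str, downstream: str):
    """Return the amplicon defined by a primer pair, or None.

    `template` is the sense strand (5'->3'). The upstream primer matches the
    sense strand directly; the downstream primer anneals to the sense strand,
    so its binding site appears in `template` as its reverse complement.
    The returned amplicon includes both primer sites.
    """
    start = template.find(upstream)
    if start == -1:
        return None
    downstream_site = reverse_complement(downstream)
    site_start = template.find(downstream_site, start + len(upstream))
    if site_start == -1:
        return None
    return template[start:site_start + len(downstream_site)]


# Toy template and primers, for illustration only.
template = "GGGG" + "ATGCATGA" + "CCCCTTTT" + "GATTACAG" + "AAAA"
upstream = "ATGCATGA"
downstream = reverse_complement("GATTACAG")

print(find_amplicon(template, upstream, downstream))
# ATGCATGACCCCTTTTGATTACAG
```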

It is understood that oligonucleotides consisting of the sequences of the present invention may contain minor deletions, additions and/or substitutions of nucleic acid bases (including RNA nucleotides such as inosine and modified RNA nucleotides as in locked nucleic acids "LNA") to the extent that such alterations do not negatively affect the yield, specificity or product obtained to a significant degree. Analogues of oligonucleotides can also be prepared. Such analogues may constitute alternative structures such as "PNA" (molecules with a peptide-like backbone instead of the phosphate sugar backbone of normal nucleic acid) or the like. It is evident that these alternative structures, representing the sequences of the present invention, are likewise part of the present invention. Such commonly used modifications will be known to the skilled person and examples can be found on http://eu.idtdna.com/catalog/Modifications/ModificationHome.aspx.

Methods

A method for detection of HIV-1 nucleic acid in a sample, comprising detection of the highly conserved HIV-1 sequences of the invention, such as one or more of SEQ ID NOS: 1 to 16, by reagent(s) that bind the highly conserved HIV-1 sequence(s).

A method for detection of HIV-1 nucleic acid in a sample, comprising carrying out a polymerase chain reaction using one or more or all HIV-1 primer sets described above.

A method for diagnosis of an individual as HIV-1 positive, the method comprising detecting the presence of HIV-1 nucleic acid in a sample from the individual by using reagent(s) that bind the highly conserved HIV-1 sequence(s), such as one or more or all of the oligonucleotides or primer sets as described above. The detection may be by means of a nucleic acid amplification reaction wherein the presence of an amplification product is detected. The detection may be by using the oligonucleotide(s) as a probe and detecting probe binding.

A method of detecting or diagnosing HIV-1 in a population infected with heterogeneous HIV-1 strains, the method comprising using the reagent(s) that bind the highly conserved HIV-1 sequence(s), preferably one or more or all of the oligonucleotides or primer sets as described above, preferably the pan HIV-1 primer set described above, wherein the same reagent can detect the heterogeneous HIV-1 strains present in the population.

A method for determining the sequence of HIV-1 nucleic acids in a sample comprising:

(a) preparing DNA or cDNA (complementary deoxyribonucleic acid) of a sample to be detected, or sequencing RNA directly;

(b) contacting the sample with amplification reagents and a primer set as described above to form a reaction mixture;

(c) placing the reaction mixture under amplification conditions to form an amplification product;

(d) sequencing the amplification product; and

(e) assembling amplification products in the sample to determine a partial or complete HIV-1 nucleic acid sequence.

A method for identifying the HIV-1 group or subtype in a clinical sample, the method comprising determining the HIV-1 nucleic acid sequence in the sample using a primer set as described above and comparing the HIV-1 nucleic acid sequence in the sample with reference sequences of known HIV-1 groups and subtypes.

A method for identifying the presence of mutations in HIV-1 genotypes from clinical samples, the method comprising determining the HIV-1 nucleic acid sequence in the sample using a primer set as described above and comparing the HIV-1 nucleic acid sequence in the sample with a reference sequence or database of mutations. The mutation may be a drug resistance mutation. Mutations associated with HIV drug resistance are summarised in the Stanford HIV drug resistance database (http://hivdb.stanford.edu/), which can be used as a reference database against which to compare nucleic acid sequences in the sample.

Sample

The term "sample" as used herein, means anything suspected of containing an HIV target sequence. The test sample is or can be derived from any biological source, such as for example, blood, ocular lens fluid, cerebral spinal fluid, milk, ascites fluid, synovial fluid, peritoneal fluid, amniotic fluid, tissue, fermentation broths, cell cultures and the like. In one embodiment the sample comprises isolated nucleic acids, dried blood, plasma or whole blood.

The sample can be used (i) directly as obtained from the source or (ii) following a pre-treatment to modify the character of the sample. Thus, the test sample can be pre-treated prior to use by, for example, preparing plasma from blood, disrupting cells or viral particles, preparing liquids from solid materials, diluting viscous fluids, filtering liquids, distilling liquids, concentrating liquids, inactivating interfering components, adding reagents, purifying nucleic acids, and the like.

Amplification

Nucleic acid amplification procedures are well known in the art. Such reactions include, but are not intended to be limited to, the polymerase chain reaction (PCR), the ligase chain reaction (LCR) and gap LCR (GLCR). Generically, these exemplified amplification reactions generate multiple copies of a DNA target sequence. Amplification can be performed directly based on proviral DNA, and/or, in light of the RNA nature of the HIV genome, the "RT-PCR" amplification procedure may be employed (described in U. S. Pat. Nos. 5,322,770 and 5,310,652, both of which are herein incorporated by reference). A one-step reverse transcription polymerase chain reaction (RT-PCR) protocol may be employed to minimise sample handling and to enable processing and potential automation of large sample numbers.

Briefly, the RT-PCR format provides a method of transcribing a strand of DNA from an RNA target sequence. The copied DNA strand transcribed from the RNA target is commonly referred to as "cDNA" which then can serve as a template for amplification by any of the methods mentioned above. The process of generating cDNA shares many of the hybridization and extension principles surrounding other amplification methods such as PCR, but the enzyme employed should have reverse transcriptase activity. Enzymes having reverse transcriptase activity, as well as the RT-PCR process, are well known to those of skill in the art. Additionally, other methods for synthesizing cDNA are also known and include U. S. Patent No. 5,686,272, which is herein incorporated by reference.

Sequencing

It is important for many questions to sequence the viral RNA genome and not the proviral DNA, as the former represents the current replication pool contributing to HIV-1 diversity and evolution. Any sequencing technique may be employed, for example:

Roche 454 GS FLX+ or GS Junior - for whole genomes and fragmented DNA

Illumina Genome Analyzer, HiSeq or MiSeq - for whole genomes and fragmented DNA

Ion Torrent PGM or Proton - for whole genomes and fragmented DNA

Applied Biosystems Genetic Analyzer or SOLiD System - for whole genomes and fragmented DNA

Pacific Biosciences PacBio RS - for whole genomes and fragmented DNA

Helicos HeliScope - for whole genomes and fragmented DNA

Oxford Nanopore Technologies GridION & MinION - for whole genomes and fragmented DNA

Complete Genomics - for whole genomes and fragmented DNA

Assembly

The amplification products may be assembled by de novo virus genome assembly or by mapping of short reads against a reference sequence. For example, a de novo assembly may be constructed using the GS De Novo Assembler version 2.6 (Roche/454 Life Sciences). Other programs such as SOAPdenovo, Velvet, Edena, ABySS, allpaths and Mira may be used. Overlapping contiguous sequences may be aligned using CAP3 17 and visually inspected in Se-Al version 2.0a11 (http://tree.bio.ed.ac.uk/software/seal/) to derive a consensus sequence. Sequence reads from multiple platforms can also be combined (e.g., Illumina + Roche 454).

Probes

In one aspect of the invention a single probe to each target may be used. In another aspect of the invention a set of different probes for the same target may be used, for example to increase the sensitivity of the method. The probes may be labelled with any suitable label, such as a radioactive or fluorescent label. The term "label" as used herein means a molecule or moiety having a property or characteristic which is capable of detection. A label can be directly detectable, as with, for example, radioisotopes, fluorophores, chemiluminophores, enzymes, colloidal particles, fluorescent microparticles and the like; or a label may be indirectly detectable, as with, for example, specific binding members.

Kits

The present invention further provides test kits for the amplification and detection of HIV nucleic acid. The use of said test kits enables accurate and sensitive screening of samples suspected of containing HIV derived nucleic acid. Furthermore the test kit may contain suitable amplification reagents.

A kit for determining the sequence of HIV-1 nucleic acids in a sample, the kit comprising reagents, oligonucleotides or primer sets described above. The kits may further comprise suitable amplification reagents. The kits may further comprise means for detecting the nucleic acids in the sample, such as a detectable label attached to the oligonucleotide.

The kits may comprise one or more suitable containers containing one or more reagents such as primer/probe sets according to the present invention, an enzyme having polymerase activity, deoxynucleotide triphosphates and, optionally, an enzyme having reverse transcriptase activity.

The reagents are, for example, the suitable enzymes for carrying out the amplification reaction. Said enzymes may be present in the kit in a buffered solution but can likewise be provided as a lyophilized composition. The kit may further be furnished with buffer compositions suitable for carrying out an amplification reaction. Said buffers may be optimized for the particular amplification technique for which the kit is intended as well as for use with the particular oligonucleotides that are provided with the kit. The reagents may include enzyme cofactors such as magnesium; salts; nicotinamide adenine dinucleotide (NAD); and deoxynucleotide triphosphates (dNTPs), such as for example deoxyadenosine triphosphate, deoxyguanosine triphosphate, deoxycytidine triphosphate and deoxythymidine triphosphate.

It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.

It will be understood that particular aspects and embodiments described herein are shown by way of illustration and not as limitations of the invention. The principal features of this invention can be employed in various embodiments without departing from the scope of the invention. Those skilled in the art will recognize, or be able to ascertain using no more than routine study, numerous equivalents to the specific procedures described herein. Such equivalents are considered to be within the scope of this invention and are covered by the claims. All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference. The use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims and/or the specification may mean "one," but it is also consistent with the meaning of "one or more," "at least one," and "one or more than one." The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or." Throughout this application, the term "about" is used to indicate that a value includes the inherent variation of error for the device, the method being employed to determine the value, or the variation that exists among the study subjects.

As used in this specification and claim(s), the words "comprising" (and any form of comprising, such as "comprise" and "comprises"), "having" (and any form of having, such as "have" and "has"), "including" (and any form of including, such as "includes" and "include") or "containing" (and any form of containing, such as "contains" and "contain") are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. In one aspect such open-ended terms also comprise within their scope a restricted or closed definition, for example such as "consisting essentially of" or "consisting of". The term "or combinations thereof" as used herein refers to all permutations and combinations of the listed items preceding the term. For example, "A, B, C, or combinations thereof" is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, CBA, BCA, ACB, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.

All of the compositions and/or methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

All documents referred to herein are incorporated by reference to the fullest extent permissible.

Any element of a disclosure is explicitly contemplated in combination with any other element of a disclosure, unless otherwise apparent from the context of the application.

The present invention is further described by reference to the following examples, not limiting upon the present invention.

Example

The present invention is hereby illustrated with the following non-limiting example:

Universal amplification, next-generation sequencing and assembly of HIV-1 genomes

ABSTRACT

Whole HIV-1 genome sequences are pivotal for large-scale studies of inter- and intra-host evolution, including the acquisition of drug resistance mutations. The ability to rapidly and cost-effectively generate large numbers of HIV-1 genome sequences from different populations and geographical locations and determine the effect of minority genetic variants is, however, a limiting factor. Next-generation sequencing promises to bridge this gap but is hindered by the lack of methods for enrichment of virus genomes across the phylogenetic breadth of HIV-1 and methods for robust assembly of the virus genomes from short read data. Here we report a method for amplification, next-generation sequencing and unbiased de novo assembly of HIV-1 genomes of groups M, N and O, as well as recombinants, that does not require prior knowledge about sequence or subtype. A sensitivity of at least 3,000 copies/ml was determined using plasma virus samples of known copy number. We applied our novel method to compare genome diversity of HIV-1 groups, subtypes and genes. The highest diversity was found in the env, nef, vpr, tat, rev and parts of the gag gene. Furthermore, we used our method to investigate mutations associated with HIV-1 drug resistance in clinical samples at the level of the complete genome. Drug resistance mutations were detected as both major variant and minor species. In conclusion, we demonstrate the feasibility of our method for large-scale HIV-1 genome sequencing. This will enable phylogenetic and phylodynamic resolution of the ongoing pandemic and efficient monitoring of complex HIV-1 drug resistance genotypes.

INTRODUCTION

Next-generation sequencing (NGS) provides unprecedented possibilities for large-scale sequencing of virus genomes. Sequencing of RNA viruses such as the Human Immunodeficiency Virus 1 (HIV-1) depends on reverse transcription and amplification to enrich virus genomes to the amounts of DNA required for NGS. Furthermore, short read sequence assembly of RNA virus and HIV-1 genomes is complicated by high population diversity (17) due to error-prone RNA polymerases and reverse transcriptases respectively and high rates of virus genome replication.

HIV-1 is one of the most genetically diverse viruses known. Four genetic groups have been described; the major group M, which causes ~85% (7) of ~34 million infections worldwide (9) and is further divided into nine subtypes (A-D, F-H, J and K) (15), the outlier group O (3, 6, 31), the non-major and non-outlier group N (28), and another recently designated group P (23). Up to 35% amino acid differences between subtypes are found, and strains belonging to the same subtype can vary by up to 20% (7). In addition, inter-subtype recombination is common (25). Circulating recombinant forms (CRFs), found in three or more epidemiologically unlinked individuals, and unique recombinant forms (URFs), identified in less than three individuals, consist of mosaic genomes with sections of two or more subtypes. To date 51 CRFs have been identified.

The ability to generate HIV-1 genome sequences is crucial for understanding the dynamics of the pandemic at the population level, and viral diversification, including acquisition of drug resistance mutations, in individual patients. As the phenotype of a virus is the compound effect of polymorphisms present in a genome and as HIV therapy now targets proteins encoded by genes dispersed throughout the 9.7kb genome there is a need for high-performance HIV-1 whole genome sequencing. A method for NGS of HIV-1 genomes of subtype B and analysis of minor variants has been described recently (8), but due to the remarkable degree of sequence heterogeneity, a universal method for generation of large numbers of HIV-1 genome sequences has remained elusive.

Here we present a novel method for rapid and reliable amplification, NGS and assembly of HIV-1 genomes that does not necessitate prior knowledge of HIV-1 group or subtype. We apply the method to investigate genetic diversity of HIV-1 genes, groups and subtypes, as well as mutations associated with HIV-1 drug resistance in clinical samples.

MATERIALS AND METHODS

Samples and RNA extraction. Virus strains and isolates were from the National Institute for Biological Standards and Control at the Health Protection Agency or University College London. Residual EDTA plasma samples sent to the Department of Clinical Microbiology and Virology, University College London Hospitals NHS Foundation Trust for routine genotypic analysis, and samples provided by Imperial College London, were anonymised before analysis (Table S1). Viral RNA was purified with a QIAamp® Viral RNA Mini kit (Qiagen).

Primer design and one-step reverse transcription polymerase chain reaction (RT-PCR). A pan-HIV-1 primer set for amplification of HIV-1 genomes of all groups and subtypes was designed based on 1496 sequences of the 2009 'web alignment' from the Los Alamos HIV Sequence Database (Table 1). One-step RT-PCRs generating overlapping amplicons of 1.9kb, 3.6kb, 3kb and 3.5kb were performed using a SuperScript® III One-Step RT-PCR System with Platinum® Taq DNA Polymerase High Fidelity (Invitrogen). Each 25 μl reaction contained 12.5 μl reaction mix (2x), 4.5 μl RNase-free water, 1 μl of each primer (20 pmol/μl), 1 μl SuperScript III RT/Platinum Taq High Fidelity Mix and 5 μl of template RNA. Cycling conditions were 50°C 30min, 94°C 2min, 35 cycles of 94°C 15s, 58°C 30s, 68°C 4min 30s, and finally 68°C 10min. Amplicons were verified by agarose gel electrophoresis, and quantified using Quant-iT™ PicoGreen® dsDNA Reagent (Invitrogen).
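The reaction composition and cycling programme given above can be checked with a short Python calculation. This is a sketch only: it assumes two primers per reaction (one upstream and one downstream), and the programme duration ignores instrument ramp times. The variable names are hypothetical.

```python
# Component volumes per reaction in microlitres, as listed in the text.
# Assumption: two primers (upstream and downstream) at 1 ul each.
components_ul = {
    "reaction mix (2x)": 12.5,
    "RNase-free water": 4.5,
    "upstream primer (20 pmol/ul)": 1.0,
    "downstream primer (20 pmol/ul)": 1.0,
    "SuperScript III RT/Platinum Taq High Fidelity Mix": 1.0,
    "template RNA": 5.0,
}
total_volume_ul = sum(components_ul.values())

# Final concentration of each primer: pmol per ul is numerically equal to uM.
primer_final_um = 20.0 * 1.0 / total_volume_ul

# Cycling programme in seconds (ramp times not included).
reverse_transcription = 30 * 60          # 50 C, 30 min
initial_denaturation = 2 * 60            # 94 C, 2 min
one_cycle = 15 + 30 + (4 * 60 + 30)      # 94 C 15 s, 58 C 30 s, 68 C 4 min 30 s
final_extension = 10 * 60                # 68 C, 10 min
total_seconds = (reverse_transcription + initial_denaturation
                 + 35 * one_cycle + final_extension)

print(total_volume_ul)       # 25.0
print(primer_final_um)       # 0.8
print(total_seconds / 60)    # 225.75
```

Under these assumptions the listed volumes add up to the stated 25 μl, each primer is present at about 0.8 μM, and the programme runs for roughly 3 h 46 min of hold time.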

Roche/454 sequencing, quality control and read assembly. Amplicons were pooled in equimolar amounts, and 500 ng of DNA was sequenced using a Genome Sequencer FLX Titanium XL+ Instrument (Roche/454 Life Sciences) (19). Up to 17 samples were sequenced on ¼ of a PicoTiterPlate using Multiplex Identifier adaptors (MIDs) as previously described (4). SFF files were de-multiplexed and converted to FASTQ files, primer sequences removed, and quality control (removing reads < 200bp, trimming low quality bases from the 3'-end of reads until the median quality of the read was > 30) was performed using QUASR (http://sourceforge.net/projects/quasr/). A de novo assembly was constructed using the GS De Novo Assembler version 2.6 (Roche/454 Life Sciences). Overlapping contiguous sequences were aligned using CAP3 (11) and visually inspected in Se-Al version 2.0a11 (http://tree.bio.ed.ac.uk/software/seal/) to derive a consensus sequence.
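The quality-control criteria stated above (3'-end trimming until the median read quality exceeds 30, and removal of reads shorter than 200 bp) can be sketched in Python as follows. This is not the QUASR implementation; it is a simplified illustration that assumes trimming is applied before the length filter and that each read is available as a sequence string with a list of per-base quality scores. The function names are hypothetical.

```python
from statistics import median


def trim_3prime(seq, quals, min_median=30):
    """Remove bases from the 3' end until the median quality is > min_median."""
    end = len(seq)
    while end > 0 and median(quals[:end]) <= min_median:
        end -= 1
    return seq[:end], quals[:end]


def quality_control(reads, min_length=200, min_median=30):
    """Trim each (sequence, qualities) read, then drop reads < min_length."""
    kept = []
    for seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, min_median)
        if len(seq) >= min_length:
            kept.append((seq, quals))
    return kept


# Toy reads for illustration only.
good_read = ("A" * 250, [40] * 250)
poor_read = ("C" * 250, [40] * 100 + [10] * 150)  # trimmed below 200 bp

passed = quality_control([good_read, poor_read])
print(len(passed))         # 1
print(len(passed[0][0]))   # 250
```

Recomputing the median after each removed base is quadratic in read length, which is acceptable for a sketch but would need optimising for large data sets.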

Phylogenetic analysis. A reference set of 29 full genome sequences was obtained. A total of 25 representative HIV-1 sequences from each group and subtype, two SIVcpz and two SIVgor were selected. The sequences have the following GenBank/EMBL/DDBJ accession numbers: AB253421, AB287379, K03455, AY423387, EF469243, AB254141, K03454, DQ054367, U54771, AB253423, FJ771010, AB485658, AF084936, AY612637, AF190127, FJ711703, GU237072, AF082394, AJ249235, AJ249239, L20571, AJ302647, AJ006022, AJ271370, GU111555, DQ373066, AF103818, FJ424866, FJ424863. Sequences derived in this study were aligned with the reference set using MAFFT version 6.857b (14), and the alignment was manually curated. A Bayesian phylogeny was reconstructed using MrBayes version 3.2 (12) under the General Time Reversible model of nucleotide substitution with proportion of invariable sites and gamma-distributed rate heterogeneity, as determined by jModelTest version 0.1.1 (24). The Markov chain Monte Carlo search was set to 50,000,000 iterations, with trees sampled every 2500th generation and a 20% burn-in being discarded. Multiple chains were run to check chain convergence. The tree was edited in FigTree version 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
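The Markov chain Monte Carlo settings stated above imply the following sample counts per chain, shown here as a short Python calculation. It is an illustrative sketch that ignores the initial state of the chain (which some programs also record), and the variable names are hypothetical.

```python
iterations = 50_000_000   # length of the MCMC search
sample_every = 2_500      # trees sampled every 2500th generation
burn_in_percent = 20      # proportion of samples discarded as burn-in

sampled_trees = iterations // sample_every
discarded_trees = sampled_trees * burn_in_percent // 100
retained_trees = sampled_trees - discarded_trees

print(sampled_trees)    # 20000
print(discarded_trees)  # 4000
print(retained_trees)   # 16000
```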

Detection of HIV-1 subtypes and recombination. The Rega HIV-1 Subtyping Tool version 2.0 (http://dbpartners.stanford.edu/RegaSubtyping/) and the Recombinant Identification Program version 3.0 (http://www.hiv.lanl.gov/content/sequence/RIP/RIP.html) were used to identify the subtype of sequences and the presence of recombinants. Recombination was verified with the manual bootscan method (27) implemented in the Recombination Detection program version 3.42 (20, 21).

Visualisation of HIV-1 genome diversity. The software Circos (16) was used to visualise the diversity of HIV-1 genes, groups and subtypes.

Drug resistance mutations. The consensus sequence for each sample was used as a sample-specific reference sequence for mapping of reads using BWA (18). To analyse minor species, read depth and frequencies of mutations associated with drug resistance as outlined in the Stanford HIV drug resistance database (http://hivdb.stanford.edu/) were calculated using custom Python scripts.

Accession numbers for sequencing data. The Roche/454 Life Sciences sequencing data obtained in this study is available from the EMBL GenBank®/DDBJ Sequence Read Archive under study accession number ERP001257.
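The read depth and mutation frequency calculation described under "Drug resistance mutations" can be sketched in Python as follows. The custom scripts used in the study are not reproduced in this document, so this is only an illustration under stated assumptions: the codon calls covering one site have already been extracted from the mapped reads, and the 50% threshold separating a major variant from a minor species is an assumption of this sketch. The function names and counts are hypothetical.

```python
from collections import Counter


def variant_frequencies(codon_calls):
    """Return (read depth, {codon: frequency}) for the calls at one site."""
    counts = Counter(codon_calls)
    depth = sum(counts.values())
    if depth == 0:
        return 0, {}
    return depth, {codon: n / depth for codon, n in counts.items()}


def classify(frequencies, mutant_codon, major_threshold=0.5):
    """Label a mutant codon as absent, a minor species or the major variant.

    The 0.5 threshold is an assumption of this sketch.
    """
    freq = frequencies.get(mutant_codon, 0.0)
    if freq == 0.0:
        return "absent"
    return "major variant" if freq > major_threshold else "minor species"


# Hypothetical calls at one codon: AAA (lysine) in 950 reads, AAC (asparagine)
# in 50 reads, illustrating a lysine-to-asparagine substitution.
calls = ["AAA"] * 950 + ["AAC"] * 50
depth, freqs = variant_frequencies(calls)

print(depth)                   # 1000
print(freqs["AAC"])            # 0.05
print(classify(freqs, "AAC"))  # minor species
```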

RESULTS AND DISCUSSION

Design of a protocol for 'pan'-HIV-1 RT-PCR, NGS and assembly of HIV-1 genomes. We designed a 'pan'-HIV-1 primer set targeting semi-conserved regions of the genome, based on ~1500 HIV-1 genome sequences (Table 1, Figure 2 and Supplementary Table 3). This novel primer set reverse transcribes and amplifies the amino acid coding region and partial long terminal repeats (LTRs) of the HIV-1 genome in four overlapping products. We chose a one-step RT-PCR protocol and MID-tagged library preparation to minimise sample handling and to enable processing and potential automation of large sample numbers. We did not apply the primer ID approach that addresses the problems of resampling, PCR error and recombination, differential amplification and sequencing error (13), as it cannot be adapted easily to a one-step RT-PCR protocol using long amplicons. It is important for many questions to sequence the viral RNA genome and not the proviral DNA, as the former represents the current replication pool contributing to HIV-1 diversity and evolution. We combined our 'pan'-HIV-1 RT-PCR method with Roche/454 sequencing and de novo virus genome assembly, as reference genome-based assembly introduces biases (2). Therefore, our new method can be adapted readily to other RNA viruses, as primer sets (34) or algorithms to design them (33) become available.

Not all conserved or semi-conserved sequences are open or accessible to binding by exogenous oligonucleotides or proteins. For example, RNA and DNA are known to exhibit 3-dimensional structures which can prevent or influence reagent binding. Supplementary Table 3 highlights a series of candidate sequences identified as semi-conserved HIV-1 nucleic acid sequences present on viral RNA, DNA, proviral DNA and integrated viral DNA (column B). Primers designed against these sequences were tested in the experiments described below. The amplification obtained when using some of these primers demonstrated that a subset of these sequences (column E) could be efficiently targeted by external reagents. Moreover, these sequences were suitable for binding by oligonucleotides under different experimental conditions and temperatures, such as in reverse transcription as well as in PCR. The experiments below, which show that targeting these sequences leads to amplification of HIV-1 products in all samples tested, also demonstrate that these consensus sequences were indeed conserved across all the HIV-1 diversity tested.

Broad application of the protocol. As a first step, we used the WHO HIV-1 genotype reference panel (10) to validate our primer set. The panel comprises 10 RNA samples of HIV-1 groups M, N and O and various subtypes, and we successfully amplified all four products from each sample (Fig. 6). To investigate computationally the function of the primer set across a broad range of HIV-1 diversity, an in silico PCR was performed using the 468 HIV-1 genome sequences in GenBank of sufficient length to cover all primer sites (≥ 9300 bp) (Fig. 8). This virtual PCR showed that the primer set could successfully amplify full genomes from 394 (84%) of these viruses, while for 74 (16%) at least one primer failed to bind, so that full genome coverage would not be expected.
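The in silico PCR described above can be illustrated with a minimal Python sketch. It is not the tool used in this study, which is not named in the text; the function names, the mismatch threshold and the toy template are assumptions made for illustration. The sketch expands the IUPAC ambiguity codes that occur in the primers of Table 1 (for example Y, W and R), searches a template for a forward primer site and for the reverse complement of a reverse primer, and reports each predicted amplicon.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles ambiguity codes (R<->Y, K<->M, ...).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def find_sites(primer, template, max_mismatches=0):
    """Return 0-based start positions at which the primer matches the template."""
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if count_mismatches(primer, template[i:i + n]) <= max_mismatches
    ]


def in_silico_pcr(forward, reverse, template, max_mismatches=0):
    """Return (start, end, size) for every predicted amplicon.

    Both primers are given 5'-3'; the reverse primer is therefore matched
    as its reverse complement. Coordinates are 1-based and inclusive.
    """
    template = template.upper()
    fwd = forward.replace(" ", "").upper()
    rev = reverse_complement(reverse.replace(" ", ""))
    fwd_sites = find_sites(fwd, template, max_mismatches)
    rev_sites = find_sites(rev, template, max_mismatches)
    products = []
    for f in fwd_sites:
        for r in rev_sites:
            if r >= f + len(fwd):          # reverse site must lie downstream
                end = r + len(rev)
                products.append((f + 1, end, end - f))
    return products


# Toy template (hypothetical, not an HIV-1 genome) carrying both Set 1 sites.
toy_template = (
    "TT" + "AGCCTGGGAGCTCTCTG" + "ACGT" * 5
    + reverse_complement("CCTCCAATTCCCCCTATCATTTT") + "GG"
)
toy_products = in_silico_pcr(
    "AGC CYG GGA GCT CTC TG", "CCT CCA ATT CCY CCT ATC ATT TT", toy_template
)
```

Applied to every genome in a sequence collection, a virus could be counted as amplifiable when all four primer sets return at least one product, which is one possible way to arrive at a tally such as the 394 of 468 genomes reported above.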

As a second step, we reverse transcribed, amplified and sequenced 15 RNA samples of various HIV-1 groups and subtypes as well as CRFs, ranging from cell-culture reference-subtype virus strains and primary clinical isolates to uncultured EDTA plasma virus samples. Virus strains were selected to be representative of the HIV-1 diversity spectrum and rare subtypes/groups were also included, while primary isolates and plasma virus samples were randomly selected. Short read data were pre-processed using QUASR (http://sourceforge.net/projects/quasr/) and de novo genome assemblies were performed. For all samples, the complete amino acid coding region of the HIV-1 genome was covered. Sequences of all major subtypes of group M, as well as groups N and O, were generated by NGS. Bayesian phylogenetic analysis of the assembled HIV-1 genomes together with a reference set of complete HIV-1 and SIV genomes showed that NGS and assembly were successful across the range of HIV-1 diversity (Fig. 1). Previously sequenced HIV-1 strains were included in the analysis. GenBank sequences and sequences obtained by NGS showed nucleotide identities of 98.8 - 99.4% (Table S1). As expected, clear phylogenetic clustering of the GenBank sequence and the sequence obtained by NGS was found for both strain #5 (group N) and strain #6 (group O). The minimal differences reflect genetic variation accumulated during culture of these HIV-1 strains since their original isolation. We also sequenced two CRFs, CRF01_AE and CRF14_BG, and used the derived HIV-1 genomes to verify recombination (Fig. 3). The ability to amplify and sequence CRFs using one set of 'pan'-HIV-1 primers is of central importance, as CRFs play a significant role in regional epidemics and new CRFs continue to emerge (7, 30).

Specificity. For specificity tests, we used RNA or DNA from plasma of patients infected with other blood-borne viruses important for differential diagnosis of HIV-1, and samples from healthy individuals. No HIV-1 PCR products were amplified from any of these samples, including nucleic acid obtained from plasma positive for HIV-2, cytomegalovirus and hepatitis C virus with various viral loads (Table S1).

Sensitivity. Sensitivity of the RT-PCR was determined using a total of 90 plasma virus samples of known copy numbers, ranging from 3,000 to 2,800,000 copies/ml (Fig. 7). We were able to amplify all four products from 84/90 samples (93.33%) and determined a sensitivity of at least 3,000 copies/ml. From the remaining 6 samples (6.67%), which did not have particularly low viral loads, we successfully amplified three of the four products. This sensitivity of 3,000 copies/ml is sufficient for many samples, but a higher sensitivity would be preferred for clinical utility. Nevertheless, RNA can be extracted from a larger volume of plasma if the detection limit of the protocol described here is reached.

Reproducibility and accuracy. To investigate the reproducibility and accuracy of our method, we repeated both the emulsion PCR and the sequencing run for the complete set of 15 samples, using the Roche/454 libraries as starting material. The consensus sequences that were derived were 100% identical to the consensus sequences generated by the first sequencing run (Table S1). The frequency of mutations associated with drug resistance in clinical samples differed by a maximum of 1.1% (Table 2). The clinical impact of these minor species is as yet undetermined, but they can represent a reservoir for the emergence of resistant viruses (32). While the exact frequencies of minor species are clearly of research interest, mutations that imply drug resistance should be considered regardless of their frequencies.

Frequency of in vitro recombination. To determine the frequency of in vitro recombination and the accuracy of frequencies of minor species, we prepared a 1:1 mix of RNA from two plasma samples containing HIV-1 subtype B (viral load 360,000 copies/ml) and C (viral load 120,000 copies/ml), respectively. We amplified one amplicon from this mixed population using primer set 1, endpoint diluted the amplicon, and re-amplified it in 96-well plates so that less than 20% of the wells yielded a product (26). We sequenced 64 re-amplified products separately and included amplicons derived from the non-mixed RNA of subtypes B and C as controls. For each sequence, we determined the subtype and checked for recombination. We found 46/64 (71.88%) products of subtype B, 17/64 (26.56%) products of subtype C, and a single product with an intersubtype B-C recombinant sequence, corresponding to an in vitro recombination rate of 1.56% (Table S2). Therefore, neither in vitro recombination nor differential amplification represented a major problem when applying this protocol. However, in vitro recombination can occur during PCR amplification and depends on various factors such as the amount of template DNA and the specific PCR conditions (22). NGS deep sequencing studies need to take in vitro recombination into account and aim to minimise it, as it can be mistaken for viral recombination or lead to an over-estimation of viral diversity.
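The rationale for the endpoint dilution and the figures reported above can be checked with a short calculation. The following Python sketch assumes that templates are distributed across wells according to a Poisson distribution, which is the usual assumption behind this type of dilution but is not stated explicitly in the text. It also assumes, purely for illustration, that the template input from each plasma sample scales with its viral load. The function names are hypothetical.

```python
import math


def single_template_fraction(positive_fraction):
    """Share of positive wells expected to start from exactly one template.

    Assumes a Poisson distribution of templates per well.
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must lie strictly between 0 and 1")
    mean_templates = -math.log(1.0 - positive_fraction)  # Poisson mean per well
    p_exactly_one = mean_templates * math.exp(-mean_templates)
    return p_exactly_one / positive_fraction


def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


# Counts reported in the text for the 64 re-amplified products.
counts = {"B": 46, "C": 17, "B-C recombinant": 1}
total = sum(counts.values())
observed = {name: percent(n, total) for name, n in counts.items()}

# Expected subtype shares if template input scaled with viral load
# (an assumption for illustration only).
loads = {"B": 360_000, "C": 120_000}
expected = {name: percent(v, sum(loads.values())) for name, v in loads.items()}

# Share of single-template wells when 20% of wells yield a product.
single_at_20_percent = single_template_fraction(0.20)
```

Under these assumptions, about 89% of positive wells are expected to start from a single template when 20% of the wells are positive, and the observed subtype shares (71.88% B and 26.56% C) lie close to the 75% and 25% implied by the viral loads.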

Genetic diversity of HIV-1 genes, groups and subtypes. To further validate our method, we compared and visualised the genetic diversity of HIV-1 in different genes, and between groups and subtypes (Fig. 4, Fig. 5). We found variation across the genome sequences where we expected it, and all open reading frames were maintained. Sequence similarity of different HIV-1 groups and subtypes to the subtype B reference strain HXB2 was particularly low in the env, nef, vpr, tat and rev genes and in parts of the gag gene, while the pol gene showed the highest inter-subtype similarity. These results are consistent with a previous study that obtained a comprehensive map of positive selection, as well as functional and structural constraints of the HIV-1 subtype B genome (29). The group O and N sequences differed most from the subtype B reference strain HXB2 and are shown relative to the relevant reference sequence in Fig. 5. Large-scale HIV-1 genome sequencing applying our new method will enable combined analysis of host and viral genetic information for the range of HIV-1 subtypes.

Mutations associated with HIV-1 drug resistance. Highly active antiretroviral therapy is one of the most potent selective pressures on the HIV-1 genome. We used our method to investigate mutations associated with drug resistance in clinical samples (Table 2). Rigorous quality control was performed so that the median quality score of each read was > 30, corresponding to a base call accuracy of 99.9%. We suggest that a minimum average coverage of 500-fold is necessary for the reliable identification of minor species, as a cut-off of 1% (5/500 reads) was used for the lowest frequency of a variant to be considered genuine. This cut-off is conservative, being ten times the next-generation sequencing error rate of 0.1% (1, 4). Between 40 (88.9%, plasma #4) and 44 (97.8%, plasma #1) of the 45 positions of interest were sequenced at > 500-fold coverage. While drug resistance mutations were present in two plasma virus samples as the major variant (i.e. in the consensus sequence), additional minor species with frequencies between 1% and 46.9% were also identified at various positions. In contrast, the HIV-1 strain 92UG037 that was included as a control only exhibited minority D67N or K219Q/E mutations. These are secondary thymidine analogue mutations that occur in untreated populations (5). We expect that cost-effective monitoring of complex drug resistance genotypes will be most efficiently achieved by NGS of complete HIV-1 genomes, as current antivirals target the HIV-1 protease, reverse transcriptase, integrase and envelope (gp41), and CCR5/CXCR4 tropism needs to be considered as well.
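The thresholds described above (median read quality > 30, a minimum coverage of 500-fold and a 1% frequency cut-off) can be expressed as a short Python sketch. This is an illustration only and not the custom scripts or the QUASR implementation used in this study; the function names, the input layout and the read counts in the example are hypothetical.

```python
from statistics import median


def phred_to_accuracy(q):
    """Base-call accuracy implied by a Phred quality score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def read_passes_qc(qualities, min_median=30):
    """Keep a read only if its median per-base quality exceeds min_median."""
    return median(qualities) > min_median


def call_variants(counts, min_coverage=500, min_frequency=0.01):
    """Return {residue: frequency in %} for residues at or above the cut-off.

    counts maps each residue observed at one position to its read count.
    Positions covered by fewer than min_coverage reads return None, meaning
    that minor species cannot be assessed reliably there.
    """
    depth = sum(counts.values())
    if depth < min_coverage:
        return None
    return {
        residue: round(100.0 * n / depth, 1)
        for residue, n in counts.items()
        if n / depth >= min_frequency
    }


# Hypothetical read counts at three positions of interest.
well_covered = call_variants({"K": 531, "R": 469})    # both residues reported
below_cutoff = call_variants({"D": 996, "N": 4})      # 0.4% variant is dropped
too_shallow = call_variants({"M": 495, "V": 4})       # depth 499, not assessed
```

With a depth of 500 reads, the 1% cut-off corresponds to 5 supporting reads, which is the 5/500 figure given above; a Phred score of 30 corresponds to an accuracy of 99.9% per base call.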

In conclusion, we demonstrate the feasibility and potential of large-scale HIV-1 genome sequencing across the range of HIV-1 genetic groups and subtypes prevalent globally. We expect this new method will be pivotal for large-scale studies of ongoing inter- and intra-host evolution of HIV-1. It has the potential to set a new standard for clinical management of HIV infection by combining the detection of minor drug resistance mutations of clinical significance with coverage of the gene targets of all present and future drug classes across the entire HIV-1 genome.

REFERENCES

1. Archer, J., G. Baillie, S. J. Watson, P. Kellam, A. Rambaut, and D. L. Robertson. 2012. Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II. BMC Bioinformatics 13.

2. Baker, M. 2012. Structural variation: the genome's hidden architecture. Nat Methods 9:133-137.

3. Charneau, P., A. M. Borman, C. Quillent, D. Guetard, S. Chamaret, J. Cohen, G. Remy, L. Montagnier, and F. Clavel. 1994. Isolation and envelope sequence of a highly divergent HIV-1 isolate: definition of a new HIV-1 group. Virology 205:247-253.

4. Gall, A., S. Kaye, S. Hue, D. Bonsall, R. Rance, G. J. Baillie, S. Fidler, J. N. Weber, M. O. McClure, and P. Kellam. 2012. Restriction of sequence diversity in the V3 region of the HIV-1 envelope gene during antiretroviral treatment in a cohort of recent seroconverters. Retrovirology, in press.

5. Garcia-Lerma, J. G., H. MacInnes, D. Bennett, H. Weinstock, and W. Heneine. 2004. Transmitted human immunodeficiency virus type 1 carrying the D67N or K219Q/E mutation evolves rapidly to zidovudine resistance in vitro and shows a high replicative fitness in the presence of zidovudine. Journal of Virology 78:7545-7552.

6. Gurtler, L. G., P. H. Hauser, J. Eberle, A. von Brunn, S. Knapp, L. Zekeng, J. M. Tsague, and L. Kaptue. 1994. A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon. Journal of Virology 68:1581-1585.

7. Hemelaar, J., E. Gouws, P. D. Ghys, and S. Osmanov. 2006. Global and regional distribution of HIV-1 genetic subtypes and recombinants in 2004. AIDS 20:W13-23.

8. Henn, M. R., C. L. Boutwell, P. Charlebois, N. J. Lennon, K. A. Power, A. R. Macalalad, A. M. Berlin, C. M. Malboeuf, E. M. Ryan, S. Gnerre, M. C. Zody, R. L. Erlich, L. M. Green, A. Berical, Y. Wang, M. Casali, H. Streeck, A. K. Bloom, T. Dudek, D. Tully, R. Newman, K. L. Axten, A. D. Gladden, L. Battis, M. Kemper, Q. Zeng, T. P. Shea, S. Gujja, C. Zedlack, O. Gasser, C. Brander, C. Hess, H. F. Gunthard, Z. L. Brumme, C. J. Brumme, S. Bazner, J. Rychert, J. P. Tinsley, K. H. Mayer, E. Rosenberg, F. Pereyra, J. Z. Levin, S. K. Young, H. Jessen, M. Altfeld, B. W. Birren, B. D. Walker, and T. M. Allen. 2012. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog 8:e1002529.

9. Joint United Nations Programme on HIV/AIDS. 2011. UNAIDS World AIDS Day Report.

10. Holmes, H., C. Davis, and A. Heath. 2008. Development of the 1st International Reference Panel for HIV-1 RNA genotypes for use in nucleic acid-based techniques. J Virol Methods 154:86-91.

11. Huang, X., and A. Madan. 1999. CAP3: A DNA sequence assembly program. Genome Research 9:868-877.

12. Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.

13. Jabara, C. B., C. D. Jones, J. Roach, J. A. Anderson, and R. Swanstrom. 2011. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. P Natl Acad Sci USA 108:20166-20171.

14. Katoh, K., K. Misawa, K. Kuma, and T. Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059-3066.

15. Korber, B., M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288:1789-1796.

16. Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones, and M. A. Marra. 2009. Circos: an information aesthetic for comparative genomics. Genome Research 19:1639-1645.

17. Lauring, A. S., and R. Andino. 2010. Quasispecies theory and the behavior of RNA viruses. PLoS Pathog 6:e1001005.

18. Li, H., and R. Durbin. 2010. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589-595.

19. Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. Alenquer, T. P. Jarvie, K. B. Jirage, J. B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380.

20. Martin, D. P., D. Posada, K. A. Crandall, and C. Williamson. 2005. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retrov 21:98-102.

21. Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262.

22. Meyerhans, A., J. P. Vartanian, and S. Wain-Hobson. 1990. DNA recombination during PCR. Nucleic Acids Res 18:1687-1691.

23. Plantier, J. C., M. Leoz, J. E. Dickerson, F. De Oliveira, F. Cordonnier, V. Lemee, F. Damond, D. L. Robertson, and F. Simon. 2009. A new human immunodeficiency virus derived from gorillas. Nature Medicine 15:871-872.

24. Posada, D. 2008. jModelTest: phylogenetic model averaging. Mol Biol Evol 25:1253-1256.

25. Robertson, D. L., P. M. Sharp, F. E. McCutchan, and B. H. Hahn. 1995. Recombination in HIV-1. Nature 374:124-126.

26. Salazar-Gonzalez, J. F., M. G. Salazar, B. F. Keele, G. H. Learn, E. E. Giorgi, H. Li, J. M. Decker, S. Y. Wang, J. Baalwa, M. H. Kraus, N. F. Parrish, K. S. Shaw, M. B. Guffey, K. J. Bar, K. L. Davis, C. Ochsenbauer-Jambor, J. C. Kappes, M. S. Saag, M. S. Cohen, J. Mulenga, C. A. Derdeyn, S. Allen, E. Hunter, M. Markowitz, P. Hraber, A. S. Perelson, T. Bhattacharya, B. F. Haynes, B. T. Korber, B. H. Hahn, and G. M. Shaw. 2009. Genetic identity, biological phenotype, and evolutionary pathways of transmitted/founder viruses in acute and early HIV-1 infection. J Exp Med 206:1273-1289.

27. Salminen, M. O., J. K. Carr, D. S. Burke, and F. E. McCutchan. 1995. Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res Hum Retrov 11:1423-1425.

28. Simon, F., P. Mauclere, P. Roques, I. Loussert-Ajaka, M. C. Muller-Trutwin, S. Saragosti, M. C. Georges-Courbot, F. Barre-Sinoussi, and F. Brun-Vezinet. 1998. Identification of a new human immunodeficiency virus type 1 distinct from group M and group O. Nature Medicine 4:1032-1037.

29. Snoeck, J., J. Fellay, I. Bartha, D. C. Douek, and A. Telenti. 2011. Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints. Retrovirology 8:87.

30. Taylor, B. S., M. E. Sobieszczyk, F. E. McCutchan, and S. M. Hammer. 2008. The challenge of HIV-1 subtype diversity. N Engl J Med 358:1590-1602.

31. Vanden Haesevelde, M., J. L. Decourt, R. J. De Leys, B. Vanderborght, G. van der Groen, H. van Heuverswijn, and E. Saman. 1994. Genomic cloning and complete sequence analysis of a highly divergent African human immunodeficiency virus isolate. Journal of Virology 68:1586-1596.

32. Westby, M., M. Lewis, J. Whitcomb, M. Youle, A. L. Pozniak, I. T. James, T. M. Jenkins, M. Perros, and E. van der Ryst. 2006. Emergence of CXCR4-using human immunodeficiency virus type 1 (HIV-1) variants in a minority of HIV-1-infected patients following treatment with the CCR5 antagonist maraviroc is from a pretreatment CXCR4-using virus reservoir. Journal of Virology 80:4909-4920.

33. Yu, Q., E. M. Ryan, T. M. Allen, B. W. Birren, M. R. Henn, and N. J. Lennon. 2011. PriSM: a primer selection and matching tool for amplification and sequencing of viral genomes. Bioinformatics 27:266-267.

34. Zhou, B., M. E. Donnelly, D. T. Scholes, K. St George, M. Hatta, Y. Kawaoka, and D. E. Wentworth. 2009. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and swine origin human influenza A viruses. Journal of Virology 83:10309-10313.

TABLES

TABLE 1 Primers used in this study

Primer          Sequence (5'-3')                    Position a    Product size

Set 1
pan-HIV-1       AGC CYG GGA GCT CTC TG (SEQ ID      26-42         1928 bp
pan-HIV-1_1R    CCT CCA ATT CCY CCT ATC ATT TT      1953-1931

Set 2
pan-HIV-1_2F    GGG AAG TGA YAT AGC WGG AAC         1031-1051     3574 bp
pan-HIV-1_2R    CTG CCA TCT GTT TTC CAT ART C       4604-4583

Set 3
pan-HIV-1_3F    TTA AAA GAA AAG GGG GGA TTG GG      4329-4351     3066 bp
pan-HIV-1_3R    TGG CYT GTA CCG TCA GCG (SEQ ID     7394-7377

Set 4
pan-HIV-1_4F    CCT ATG GCA GGA AGA AGC G (SEQ      5513-5531     3551 bp
pan-HIV-1_4R    CTT WTA TGC AGC WTC TGA GGG         9063-9043

a According to HIV-1 reference strain HXB2 (accession number NC001802).
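The product sizes in Table 1 follow directly from the HXB2 positions of the outer primer ends, and the same positions show how far neighbouring products overlap. The following Python sketch is an illustrative check only; the dictionary layout and function names are assumptions, and the coordinates are those listed in Table 1.

```python
# Outer coordinates of each product in HXB2 numbering (Table 1):
# 5' end of the forward primer and 5' end of the reverse primer.
AMPLICONS = {
    "Set 1": (26, 1953),
    "Set 2": (1031, 4604),
    "Set 3": (4329, 7394),
    "Set 4": (5513, 9063),
}


def product_size(start, end):
    """Length in bp of a product spanning start..end inclusive."""
    return end - start + 1


def consecutive_overlaps(amplicons):
    """Overlap in bp between each product and the next one.

    Assumes the dictionary is ordered by start position, as it is here.
    """
    names = list(amplicons)
    result = {}
    for left, right in zip(names, names[1:]):
        left_end = amplicons[left][1]
        right_start = amplicons[right][0]
        result[(left, right)] = max(0, left_end - right_start + 1)
    return result


sizes = {name: product_size(*span) for name, span in AMPLICONS.items()}
overlaps = consecutive_overlaps(AMPLICONS)

# Region of HXB2 covered by the four products taken together.
covered = product_size(
    min(start for start, _ in AMPLICONS.values()),
    max(end for _, end in AMPLICONS.values()),
)
```

The computed sizes (1928, 3574, 3066 and 3551 bp) agree with the product sizes in Table 1, and every pair of neighbouring products shares at least 276 bp, consistent with the description of four overlapping products.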

TABLE 2 Mutations associated with HIV-1 drug resistance

                                                 Frequency of mutation (%) c
Codon a  Amino acid of  Mutation associated      Plasma #1   Plasma #2   Plasma #3   Plasma #4   Strain 92UG037
         wildtype       with drug
         sequences      resistance b

Protease inhibitor
46       M              IL                       1.4/0.3     -           -           1.0/0.7     -
47       I              VA                       -           -           0.9/1.1     1.0/0.6     -
50       I              V                        -           -           1.1/0.8     1.0/0.7     -
82       V              ATFSL                    1.2/2.1     -           -           -           -

Nucleoside reverse transcriptase inhibitor
65       K              R                        -           46.9/47.0   -           -           -
67       D              N                        8.2/8.0     1.5/1.4     7.2/7.6     9.7/10.2    4.6/4.1
115      Y              F                        -           1.2/0.7     1.3/1.7     -           -
184      M              VI                       0.7/1.2     66.2/65.8   -           -           -
215      T              YF                       -           1.0/0.9     -           -           -
219      K              QE                       2.2/1.7     -           2.8/2.9     5.7/6.1     5.5/4.9

Non-nucleoside reverse transcriptase inhibitor
101      K              EP                       -           1.9/2.4     -           -           -
103      K              NS                       -           38.1/38.5   -           -           -
181      Y              CIV                      -           38.9/38.4   -           91.4/       -
190      G              ASE                      59.9/59.2   91.2/

Integrase inhibitor
no mutations associated with drug resistance found

Fusion inhibitor (enfuvirtide)
42       N              T                        1.2         -

a According to HIV-1 reference strain HXB2 (accession number

b Only mutations that were found at frequencies of > 1% in at least one sample are shown.

c Frequencies determined by two independent sequencing runs are shown.

TABLE S1 Samples, RT-PCR and next-generation sequencing results

Columns: Sample; Strain a; Group/Subtype/CRF a; Viral load (copies/ml) a; Origin; RT-PCR result (Set 1, Set 2, Set 3, Set 4); Accession number (GenBank/EMBL/DDBJ; ENA (this study), Run 1 and Run 2); Nucleotide identity of GenBank sequence and sequence obtained by NGS (%) a

Virus strains
Strain #1   92UG037   nd   UCL   +++  +  +++  +   U51190   ERS100441   ERS163954   99.4
Strain #2   98TZ017   C   nd   NIBSC   +++  +++  +++  +++   AF286235   ERS100442   ERS163955   99.3
Strain #3   94UG114   D   nd   NIBSC   +++  +++  +++  +++   U88824   ERS100443   ERS164035   99.3
Strain #4   93BR020   F   nd   UCL   +++  ++  +++  +++   AF005494   ERS100444   ERS163957   99.0
Strain #5   YBF30   N   nd   NIBSC   +++  +++  +++  +++   AJ006022   ERS100445   ERS163958   99.1
Strain #6   MVP5180   O   nd   NIBSC   +++  ++  +++  +++   L20571   ERS100446   ERS164036   98.8
Strain #7   nd   O   nd   UCL   +++  ++  +++  +++   nd   ERS100447   ERS163960   nd
Strain #8   nd   CRF01_AE   nd   UCL   nd   ERS100448   ERS163961   nd
Strain #9   X1870   CRF14_BG   nd   NIBSC   +++  ++   nd   ERS100449   ERS163962   nd

Primary isolates - HIV-1
Primary isolate #1   C   nd   UCL   +++  +++  +++   nd   ERS100450   ERS163963   nd
Primary isolate #2   D   nd   UCL   +++  +++  +++   nd   ERS100451   ERS163964   nd

Plasma samples
Plasma #1 - HIV-1   B   3,000   UCLH   +++   nd   ERS100452   ERS163965   nd
Plasma #2 - HIV-1   C   30,000   UCLH   +++   nd   ERS100453   ERS163966   nd
Plasma #3 - HIV-1   B   100,000   UCLH   +++   nd   ERS100454   ERS163967   nd
Plasma #4 - HIV-1   A   300,000   UCLH   +++   nd   ERS100455   ERS163968   nd
Plasma #5 - HIV-2   nd   180,000   UCLH   nd   nd   nd   nd
Plasma #6 - HIV-2   nd   1,077   UCLH   nd   nd   nd   nd
Plasma #7 - HIV-2   nd   454   UCLH   nd   nd   nd   nd
Plasma #8 - HIV-2   nd   112   UCLH   nd   nd   nd   nd
Plasma #9 - CMV   nd   340   UCLH   nd   nd   nd   nd
Plasma #10 - CMV   nd   440   UCLH   nd   nd   nd   nd
Plasma #11 - CMV   nd   450   UCLH   nd   nd   nd   nd
Plasma #12 - CMV   nd   390   UCLH   nd   nd   nd   nd
Plasma #13 - HCV   nd   nd   0 b   UCLH   nd   nd   nd   nd
Plasma #14 - HCV   nd   nd   46 b   UCLH   nd   nd   nd   nd
Plasma #15 - HCV   nd   nd   5,142,560   UCLH   nd   nd   nd   nd
Plasma #16 - HCV   nd   nd   10,582,400   UCLH   nd   nd   nd   nd
Plasma #17 - negative   nd   nd   0   UCLH   nd   nd   nd   nd
Plasma #18 - negative   nd   nd   0   UCLH   nd   nd   nd   nd
Plasma #19 - negative   nd   nd   0   UCLH   nd   nd   nd   nd
Plasma #20 - negative   nd   nd   0   UCLH   nd   nd   nd   nd

a nd, not determined.

TABLE S2 Frequency of in vitro recombination

Sample a                     NGS result     Accession number

PCR product #1                              ERS163969
PCR product #2               B              ERS163970
PCR product #3               B              ERS163971
PCR product #4               C              ERS163972
PCR product #5               B              ERS163973
PCR product #6               B              ERS163974
PCR product #7               B              ERS163975
PCR product #8               C              ERS163976
PCR product #9               B              ERS163977
PCR product #10              B              ERS163978
PCR product #11              B              ERS163979
PCR product #12              B              ERS163980
PCR product #13              C              ERS163981
PCR product #14              B              ERS163982
PCR product #15              C              ERS163983
PCR product #16              B              ERS163984
PCR product #17              B              ERS163983
PCR product #18              B              ERS163984
PCR product #19              B              ERS163987
PCR product #20              C              ERS163988
PCR product #21              B              ERS163989
PCR product #22              B              ERS163990
PCR product #23              B              ERS163991
PCR product #24              B              ERS163992
PCR product #25              B              ERS163993
PCR product #26              B              ERS163994
PCR product #27              C              ERS163995
PCR product #28              B              ERS163996
PCR product #29              B              ERS163997
PCR product #30              B              ERS163998
PCR product #31              B              ERS163999
PCR product #32              B              ERS164000
PCR product #33              C              ERS164001
PCR product #34              B              ERS164002
PCR product #35              B              ERS164003
PCR product #36              C              ERS164004
PCR product #37              C              ERS164005
PCR product #38              B              ERS164006
PCR product #39              B              ERS164007
PCR product #40              B              ERS164008
PCR product #41              C              ERS164009
PCR product #42              C              ERS164010
PCR product #43              B              ERS164011
PCR product #44              C              ERS164012
PCR product #45              C              ERS164013
PCR product #46              B              ERS164014
PCR product #47              C              ERS164015
PCR product #48              B              ERS164016
PCR product #49              B              ERS164017
PCR product #50              B              ERS164018
PCR product #51              B              ERS164019
PCR product #52              B              ERS164020
PCR product #53              B              ERS164021
PCR product #54              B              ERS164022
PCR product #55              B              ERS164023
PCR product #56              B              ERS164024
PCR product #57              B              ERS164025
PCR product #58              B              ERS164026
PCR product #59              B              ERS164027
PCR product #60              B:C            ERS164028
PCR product #61              C              ERS164029
PCR product #62              B              ERS164030
PCR product #63              B              ERS164031
PCR product #64              C              ERS164032
Amplicon HIV-1 subtype B     B (control)    ERS164033
Amplicon HIV-1 subtype C     C (control)    ERS164034

a Primer set 1 was used.

Table S3

Semi-conserved regions - based on alignment HIV-1_ALL_2009_GENOME_DNA_primer.bio

The table below describes a series of conserved candidate target HIV-1 sequences for binding by external reagents (colum E the ones that were validated and shown to lead to amplification products using oligonucleotide primers. These highly con defined by their consensus sequences and their position on the standard HIV-1 genome reference in Genbank : NC001802

Figure legends

Figure 1 Bayesian phylogeny of HIV-1 genome sequences derived in this study. As a reference set, 25 HIV-1 full genome sequences representing all groups and subtypes, two SIVcpz and two SIVgor, were included. New NGS sequences produced in this study are highlighted. Bayesian posterior probabilities are indicated on the corresponding nodes. The tree is midpoint-rooted. The scale bar represents the number of nucleotide substitutions per site.

Figure 2 Flowchart of the universal method for generation of HIV-1 genome sequences. QUASR is freely available at http://sourceforge.net/projects/quasr/.

Figure 3 Detection of recombination in HIV-1 genomes sequenced in this study. Two CRFs, CRF01_AE (left) and CRF14_BG (right) were sequenced and analysed with (a-c) the Rega HIV-1 Subtyping Tool version 2.0 and (d) the Recombinant Identification Program version 3.0. a Genome view of the CRFs. Regions with bootstrap confidence of > 70% are coloured according to the subtype, with the same colour scheme as in b and c. b & c Bootscan analysis along the genome. A sliding window of 400 nucleotides was used. For each window, the bootstrap support for the different subtypes and the appropriate CRF is plotted. d Similarity to different subtypes and CRF01_AE of the HIV-1 subtype consensus alignment. A sliding window of 400 nucleotides was used.

Figure 4 Genetic diversity of HIV-1 genes, groups and subtypes. Eight representative HIV-1 genomes sequenced in this study are shown in a circular format. They are displayed in the inner tracks and represent subtypes A, B, C, D, A/E and F, as well as groups N and O (from the inside to the outside of the plot). Sequences are compared to the subtype B reference strain HXB2 (acc. no. NC001802) shown as a continuous bar on the outside. For each sequence, every nucleotide differing from the reference strain is shown as a dark-grey line; insertions and deletions are shown as light-grey lines. The positions of primer sets 1 to 4 for the 'pan'-HIV-1 RT-PCR are shown as lines in the reference strain. The genomes contain the complete coding sequence of HIV-1, a complete U5 and partial R region of the 5'-LTR, and a partial U3 region of the 3'-LTR. The outer track shows the individual open reading frames as bars as well as a scale bar.

Figure 5 Genetic diversity of HIV-1 genes, groups and subtypes. Representative HIV-1 genomes of group O and N sequenced in this study are shown in a circular format and relative to the relevant reference sequence. They are displayed in the inner tracks. For each sequence, every nucleotide differing from the reference strain is shown as a blue line, an insertion is shown in orange and a deletion in green. The positions of primer sets 1 to 4 for the 'pan'-HIV-1 RT-PCR are shown as coloured lines in the reference strain. The genomes contain the complete coding sequence of HIV-1, a complete U5 and partial R region of the 5'-LTR, and a partial U3 region of the 3'-LTR. The outer tracks show the open reading frames as bars and a scale bar. a HIV-1 genome sequence of group O (strain #7) obtained by NGS is compared to the group O reference strain MVP5180 (acc. no. L20571) shown in grey. b HIV-1 genome sequence of group N reference strain YBF30 obtained by NGS (strain #5) is compared to the YBF30 GenBank sequence (acc. no. AJ006022) shown in grey. The sequences are 99.1% identical.

Figure 6 Broad application of the 'pan'-HIV-1 RT-PCR. The primer set amplifies HIV-1 genomes in four overlapping products. RNA of the WHO HIV-1 genotype reference panel was reverse transcribed and amplified, and 5 µl per product were analysed by 1% agarose gel electrophoresis. a Gel electrophoresis of 1.9 kb products amplified by primer set 1. b Gel electrophoresis of 3.6 kb products amplified by primer set 2. c Gel electrophoresis of 3.0 kb products amplified by primer set 3. d Gel electrophoresis of 3.5 kb products amplified by primer set 4.

Figure 7 Sensitivity of the 'pan'-HIV-1 RT-PCR. The primer set amplifies HIV-1 genomes in four overlapping products. A total of 90 uncultured plasma virus samples were tested. a Viral loads of samples with successful amplification of 4/4 amplicons. b Viral loads of samples with successful amplification of 3/4 amplicons.

Figure 8 Broad application of the 'pan'-HIV-1 RT-PCR. The primer set amplifies HIV-1 genomes in four overlapping products of 1.9kb, 3.6kb, 3.0kb and 3.5kb. Results of an in silico PCR testing the primer set against representative HIV-1 genome sequences of subtypes most important for the HIV-1 epidemiology are shown. Grey bars represent the amplicons and circles stand for the primers. The number of mismatches between a primer and the respective sequence is indicated by the colour of the circle. As primers contain ambiguous bases and all possible combinations are shown separately, more than one amplicon and primer set per product are displayed. a In silico PCR against subtype A1 strain (accession number AB253421). b In silico PCR against subtype B strain (accession number K03455). c In silico PCR against subtype C strain (accession number AB485645). d In silico PCR against CRF01_AE strain (accession number U54771). e In silico PCR against CRF02_AG strain (accession number L39106). f In silico PCR against group O strain (accession number L20587).