Title:
METHOD TO AMPLIFY DNA SEQUENCES FROM DEGRADED SOURCES
Document Type and Number:
WIPO Patent Application WO/2017/027975
Kind Code:
A1
Abstract:
A two stage nested multiplex PCR method is described to amplify DNA sequences from degraded specimens. The method of the invention is for recovery of full length DNA from specimens, whereby the specimen contains degraded DNA. Such specimens may be type specimens and the target DNA may be a DNA barcode for recognizing known species and for discovery of species yet to be named.

Inventors:
PROSSER SEAN (CA)
HEBERT PAUL (CA)
DEWAARD JEREMY (CA)
Application Number:
PCT/CA2016/050970
Publication Date:
February 23, 2017
Filing Date:
August 18, 2016
Assignee:
UNIV GUELPH (CA)
International Classes:
C12Q1/68
Other References:
STRUTZENBERGER, P. ET AL.: "DNA Barcode Sequencing from Old Type Specimens as a Tool in Taxonomy: A Case Study in the Diverse Genus Eois (Lepidoptera: Geometridae)", PLOS ONE, vol. 7, no. 11, 21 November 2012 (2012-11-21), pages e49710-1 - e49710-7, XP055365211, ISSN: 1932-6203
SHOKRALLA, S; ET AL.: "Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens", MOLECULAR ECOLOGY RESOURCES, vol. 14, 2014, pages 892 - 901, XP055365213, ISSN: 1755-098X
SHOKRALLA, S; ET AL.: "Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform", SCIENTIFIC REPORTS, vol. 5, no. 9687, 17 April 2015 (2015-04-17), XP055329315, ISSN: 2045-2322
STILLER, M; ET AL.: "Direct multiplex sequencing (DMPS) - A novel method for targeted high-throughput sequencing of ancient and highly degraded DNA", GENOME RESEARCH, vol. 19, no. 10, 1 October 2009 (2009-10-01), pages 1843 - 1848, XP002689876, ISSN: 1088-9051
HEBERT, PN; ET AL.: "A DNA 'Barcode Blitz': Rapid Digitization and Sequencing of a Natural History Collection", PLOS ONE., vol. 8, no. 7, July 2013 (2013-07-01), pages e68535-1 - e68535-14, XP055365219, ISSN: 1932-6203
Attorney, Agent or Firm:
BARTOSZEWICZ, Lola A. et al. (CA)
Claims:

1. A two stage method for obtaining a full length barcode sequence from specimens with degraded DNA, the method comprising two step multiplex nested PCR utilizing primers that hybridize to portions of the barcode sequence that can pair in any combination to generate a plurality of amplicons that span the entire barcode sequence while avoiding overlap amplification, primer incorporation and/or primer dimer sequencing; and NGS for sequencing the plurality of amplicons generated by the two step multiplex nested PCR and providing the barcode sequence.

2. The method of claim 1, wherein said two step multiplex nested PCR co-amplifies fragments covering the barcode region.

3. The method of claim 2, wherein said two step multiplex nested PCR comprises: a first multiplex PCR1 wherein said primers that hybridize to the barcode sequence and simultaneously blocking undesired elongation to form a plurality of amplicons; and a second multiplex PCR2 utilizing the amplicons from PCR1 as a template and a plurality of primers that are adapter-tailed,

wherein in PCR1 forward primers are selected for all downstream reverse primers.

4. The method of claim 3, wherein said primers of PCR1 are tailed with short, non-complementary sequences.

5. The method of any one of claims 1 to 4, wherein said specimen contains at least 0.1 ng of degraded DNA, such as at least 0.5 ng, 1 ng, 10 ng, 100 ng, 500 ng, or from 2μg-5μg of degraded DNA.

6. The method of any one of claims 1 to 5, wherein said barcode sequence is the 658 base-pair region in the mitochondrial cytochrome c oxidase 1 gene (COI).

7. The method of any one of claims 1 to 5, wherein said barcode sequence is matK or rbcL for identifying plants.

8. A method to generate a plurality of redundant amplicons for a target degenerated DNA sequence, the method comprising:

(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein forward primers are selected with all downstream reverse primers to produce amplicon redundancy;

(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers with optional MID tags that hybridize to the amplicon products of (a), and

(c) pooling the products from (b).

9. The method of claim 8, further comprising removing undesired genomic DNA, primer dimers and/or residual primers.

10. The method of claim 8 or 9, further comprising performing next generation sequencing to the pooled products from (c).

11. A method for amplifying and characterizing a barcode region from the cytochrome c oxidase 1 gene (COI) in a small specimen of degraded DNA using multiplex PCR, the method comprising:

- extracting the degraded DNA to provide a linear template;

- performing first multiplex nested PCR1 using a plurality of forward primers and downstream reverse primers that hybridize to regions of said barcode region and simultaneously blocking undesired elongation such that a plurality of amplicons is created;

- performing a second multiplex PCR2 using the multiple amplicons generated from the first PCR1 reaction as a template using adapted tailed primers with optional multiplex identifier tags (MID) that hybridize to portions of said amplicons to generate a more degenerate larger pool of amplicons,

- pooling all amplicon products, said amplicon products spanning and overlapping the entire length of said barcode region; and

- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.

12. The method of claim 11 further comprising comparing said determined barcode sequence to a bank of characterized sequences to determine the species of said specimen.

13. A method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the phylogeny of said specimen, the method comprising;

- extracting linear degraded DNA from said specimen;

- performing two step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;

- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and

- classifying said specimen.

14. A kit for performing two step multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising; primers specific for said barcode sequence, buffers, optional stabilizers, enzymes and instructions for use.

15. A method for amplifying degraded DNA, the method comprising:

amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and

amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.

16. The method of claim 15, wherein the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.

17. The method of claim 16, wherein the block elongation moieties comprise non-complementary tails.

18. The method of any one of claims 15 to 17, comprising from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.

19. The method of claim 18, comprising 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).

20. The method of claim 21, wherein for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6 and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.

21. The method of claim 20, wherein for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6; and F6 is paired with R6.

22. The method of any one of claims 15 to 21, wherein the primers for PCR2 comprise adapter tailed primers for sequencing.

23. The method of any one of claims 15 to 22, wherein the primers are degenerate.

24. A method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.

25. A method of amplifying a barcode region of a degraded DNA sample, the method comprising:

performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons,

wherein the plurality of forward primers comprise primers F1, F2, ... , Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;

wherein the plurality of reverse primers comprise primers R1, R2, ... , Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;

wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rn; wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers; and

wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.

26. The method of claim 25, wherein the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.

27. The method of claim 26, wherein the block elongation moieties comprise non-complementary tails.

28. The method of any one of claims 25 to 27, further comprising performing a plurality of PCR2 reactions, PCR2_1, PCR2_2, ... PCR2_n, to amplify the PCR1a and PCR1b complements of amplicons,

wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and

wherein the PCR1a complement of amplicons are amplified using odd-numbered forward primers and wherein the PCR1b complement of amplicons are amplified using even-numbered forward primers.

29. The method of claim 28, further comprising pooling the resulting amplicons.

30. The method of claim 28 or 29, wherein the primers for PCR2 are adapter-tailed for sequence analysis.

31. The method of any one of claims 28 to 30, wherein the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.

32. The method of any one of claims 25 to 31, wherein n is from 2-10, such as 6.

33. The method of any one of claims 25 to 32, wherein m is from 2-10, such as 6.

34. The method of any one of claims 25 to 33, wherein the forward and reverse primers are as defined in Table 4.

35. The method of any one of claims 1 to 34, wherein the template is not depleted through use of the method.

36. A method of amplifying degraded DNA according to the scheme shown in Figures 2a and 2b herein.

37. The method of any one of claims 1 to 36, for taxonomic classification of unknown specimens.

38. The method of any one of claims 1 to 37, wherein the primers are degenerate.

39. The method of any one of claims 1 to 38, for analyzing a plurality of specimens simultaneously.

40. The method of any one of claims 1 to 39, wherein the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2μg to about 5μg of degraded DNA.

Description:
Method to Amplify DNA Sequences from Degraded Sources

Field of the Invention

The invention relates to a method to amplify DNA sequences from degraded sources using a combination approach involving NGS (next generation sequencing). More specifically, the method of the invention is a two-stage multiplex PCR (polymerase chain reaction) and NGS approach for recovery of full length DNA from specimens, whereby the specimen contains degraded DNA. Such specimens may be type specimens and the target DNA may be a DNA barcode for recognizing known species and for discovery of species yet to be named. The invention further relates to kits and systems for carrying out such method.

Background of the Invention

Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavorable for DNA preservation, success in sequence recovery has been uncertain.

The immense repositories of identified specimens in the world's natural history museums provide the opportunity to construct a DNA barcode reference library that can subsequently be used to identify newly collected specimens [1,2]. However, the scientific value of this library would be greatly enhanced if each species was represented by sequences from its type material, particularly the holotype. Without such information, there are many cases in which the correct application of taxon names is uncertain. For example, the analysis of type(s) is critical when the study of modern specimens suggests synonymy (e.g. [3]) or when it indicates that a long-known species is actually a complex of two or more morphologically similar taxa (e.g. [4]). The recovery of a barcode sequence from type material is also essential when it represents the only known record(s) for a taxon - a situation that is surprisingly common [5].

Early studies have recovered sequence information from museum specimens, including beetles [6,7], flies [8,9,10], true bugs [11], and moths [12,13]. Some of these investigations analyzed specimens that were relatively young (<50 years), while others extracted DNA from whole specimens. However, Hausmann et al [12] and Rougerie et al [14] recovered barcode sequences from a single leg of type specimens more than 100 years old with a protocol that required six PCRs and twelve sequencing reactions (see details in [15]). Strutzenberger et al [16] reduced costs by processing specimens in batches of 95, but the basic protocol was unchanged, requiring substantial template DNA and careful inspection of data to ensure that contamination among wells had not produced chimeric sequences. As well, the failure of any single reaction led to an incomplete sequence for the barcode region.

Prior studies have often encountered difficulty in recovering sequence information from old museum specimens because of DNA degradation [17,18]. While protocols have improved, there are still important constraints [12,13,15,16]. Past studies have generally employed several PCR reactions to generate a set of short amplicons that were Sanger sequenced and assembled into a barcode record. When many amplification reactions are required, as in cases where difficulties in primer binding are encountered, template can be depleted before sequence is recovered. There is no easy solution because DNA extracts are small (<50 μL) and concentrations are low (typically <0.5 pg/μL) so dilution is rarely feasible [4,14]. As a consequence, sequence recovery from many type specimens is not currently possible.

Next-generation sequencers (NGS) are increasingly used for studies on both freshly collected and museum specimens [e.g. 19]. Work on fresh specimens has shown that the barcode region can be recovered from hundreds of individuals at a time by using multiplex identifier (MID) tags to associate the sequence records from each specimen [20,21]. However, there are still issues with preferential amplification of certain fragments and with inefficient amplification that prevents sequencing of the full target sequence. Taken together with the challenges of sequencing very small specimens that contain heavily degraded DNA, it is desirable to provide a method that overcomes at least one disadvantage of known sequencing protocols.

Summary of the Invention

The present invention provides methods, systems and kits that are useful for recovering sequences from degraded DNA present in a sample. When maintained within optimal archival conditions, DNA is highly stable and predicted to be viable for several millennia. Within the ambient environment however, or when exposed to particular stressors such as extreme heat, desiccation, irradiation, or known mutagenic compounds, genomic DNA breaks down rapidly and severely. For various applications and settings, this can prohibit genetic analyses when the quantity and quality of remaining DNA falls below the sensitivity thresholds of current analytical equipment and procedures. Specimens held in biomedical or natural history collections degrade rapidly over time, particularly when stored in compounds such as formalin, paraffin, or low concentration ethanol; forensic cases and environmental samples involving trace quantities (i.e. 'eDNA') can be inhibited by ultraviolet exposure or diluted beyond detection; and processed and manufactured animal and timber products may endure severe temperatures and desiccation, rendering the DNA (and source organism) imperceptible.

The present invention has been made to solve at least one foregoing problem of the prior art and therefore an aspect of the present invention is to provide a method for amplifying and characterizing DNA sequences from small sample amounts containing degraded DNA in an efficient, rapid and economical manner.

In aspects the method comprises a two stage nested multiplex PCR/NGS approach that is effective for amplification of a desired DNA sequence from a small sample of degraded DNA resulting in efficient, unbiased co-amplification of fragments spanning a desired gene region, in aspects a barcode region of a gene.

The present method has the advantage of requiring very little template DNA and, because of its novel initial two-step multiplex PCR, of providing protection against the failure of any particular amplification reaction. The method uses relatively few primers; however, the primers are allowed to pair in any combination rather than being restricted to specific pairs, all while avoiding common pitfalls (e.g. overlap amplification, primer incorporation and primer dimer sequencing). This is accomplished without the use of special enzymes beyond standard polymerases.

In one aspect, the invention relates to target-specific primers and compositions comprising such primers useful for the selective amplification of one or more target sequences associated with a barcode region in degraded DNA.

Primers are selected with respect to the target sequence to be amplified and the condition of the degraded DNA, that is, primers of about 150bp or more can be utilized to target degraded DNA. However it is understood by one of skill in the art that primers can be designed shorter in order to be able to target shorter segments of degraded DNA where there is limited sequence. The method of the invention can be used to detect and amplify highly degraded DNA in specimens where even no DNA could be detected by other methods.

The present method may recover full-length barcodes from type specimens with heavily degraded DNA by employing a two-step multiplex PCR to generate short amplicons covering the barcode region and then using NGS for their characterization, i.e. sequencing. In this manner the entire barcode region of a desired gene from a small specimen containing degraded DNA can be characterized.

The method of the invention is scalable and widely applicable, that is, has a taxonomic breadth. The method encompasses amplification of DNA over a wide variety of diverse animal groups. It has been scaled to work on 96 samples simultaneously with good success rates, and may be scaled further to several hundred samples simultaneously.

The method of the invention can be used with various conditions of DNA degradation (e.g. samples decades to centuries old, formalin-fixed, fluid-preserved, or processed) and still lead to successful DNA amplification and in aspects, barcode recovery.

As the method is quick and cost-effective to sequence degraded DNA from very limited sources, this method has good potential in a variety of areas for example to researchers, food safety officials, forensic investigators, wildlife enforcement officers, biomedical technicians and so forth.

The effectiveness of the present method has been validated by recovering sequences from century-old specimens of Lepidoptera, including those where Sanger analysis completely failed. Importantly, in aspects, this two stage multiplex PCR/NGS method escapes problems that often confront Sanger analysis, such as uncertain primer binding, amplification bias, and/or the need for large amounts of template DNA.

According to an aspect of the invention there is provided a method comprising a two-step multiplex PCR followed by NGS to sequence degraded DNA.

According to an aspect of the invention the method comprises two stages, one stage involving a two-step multiplex PCR and the other stage comprising NGS to recover/characterize the sequence of a barcode region in the sample comprising degraded DNA. In aspects the barcode region is of the cytochrome c oxidase I gene.

According to another aspect of the invention there is provided a method comprising multiplex nested PCR to form a plurality of amplicons from a degraded DNA source. NGS is then utilized to recover the sequence from the plurality of degenerate amplicons generated by the two stage multiplex nested PCR, in aspects to characterize a barcode region of a gene.

According to another aspect of the invention there is provided a method comprising multiplex nested PCR, the method comprising performing a two-stage nested PCR on a sample containing degraded DNA. The present invention is based, in part, on the novel use of two stages of specific hybridization between a homologous region in a probe and the complementary sequence in a nucleic acid template of the degraded DNA, each of which is followed by extension of the probe by DNA synthesis. The second stage utilizes the products of the first stage as a template.

In aspects, the method of the invention substantially reduces the formation of spurious reaction products in multiplex amplification reactions of large numbers of specific degraded nucleic acid sequences.

In aspects the present invention provides novel compositions useful in substantially reducing the formation of spurious reaction products in two part multiplex amplification reactions of large numbers of specific nucleic acid sequences from degraded DNA.

According to another aspect of the invention there is provided a multiplex PCR assay mixture for amplification of a target degraded DNA, the mixture comprising a combination of a plurality of primer sets wherein a number of the primer sets are nested. In aspects, a number of the primer sets are 10bp and adapter-tailed primers. In further aspects, the primers (forward and reverse) include degeneracy at sites important for primer binding, i.e. 3' terminus for forward primer and 5' terminus for reverse primer, such that 12 forward and reverse primers provide a composition comprising 2010 primers.
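By way of illustration only, the way degeneracy multiplies the number of distinct oligonucleotides in a primer mixture can be sketched as follows. The sketch uses the standard IUPAC nucleotide ambiguity codes; the example primer `ACGTRY` is hypothetical and is not one of the primers of the invention.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with two degenerate positions (R = A/G, Y = C/T).
example_primer = "ACGTRY"
print(degeneracy(example_primer), expand(example_primer))
```

Summing `degeneracy` over every primer in a mixture gives the total number of distinct oligonucleotides in that mixture.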

According to an aspect of the invention there is provided a two stage method for obtaining a full length barcode sequence from specimens with degraded DNA, the method comprising two step multiplex nested PCR utilizing primers that span the entire barcode sequence that can pair in any combination to generate a plurality of amplicons while avoiding overlap amplification, primer incorporation and/or primer dimer sequencing; and NGS for sequencing the plurality of amplicons generated by the two step multiplex nested PCR and providing the barcode sequence.

In aspects the two step multiplex nested PCR utilizes primers that target non-adjacent fragments of the target sequence in each of the steps. Furthermore, the primers in the first step are designed such that undesired elongation is blocked (in one aspect are non-tailed) and are selected further to be paired with the next downstream reverse primers. The primers in the second step are adapter-tailed primers and may further incorporate a MID tag.

According to an aspect of the invention there is provided a method to generate redundant amplicons for a target DNA sequence of degenerated DNA, the method comprising:

(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein forward primers are selected with all downstream reverse primers to produce amplicon redundancy;

(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers with optional MID tags that hybridize to the amplicon products of (a),

(c) repeating step (a) and then (b); and

(d) pooling the products from (c).

In aspects the method then further comprises performing next generation sequencing to the pooled products from (d). The pooled products from (d) are optionally cleaned to remove any undesired genomic DNA, primer dimers and/or residual primers.

In aspects, blocking of undesired elongation in the first step of multiplex PCR can be achieved through various mechanisms such as use of non-complementary tails on the PCR1 primers or with the use of any type of agent that blocks elongation from the 5' end of the primers, i.e. chemical conjugation.

According to another aspect of the present invention is a method for amplifying a barcode region from the cytochrome c oxidase 1 gene (COI) from a small specimen of degraded DNA using multiplex PCR, the method comprising:

- extracting the degraded DNA to provide a linear template;

- performing first multiplex nested PCR1 using a plurality of forward primers and downstream reverse primers that include degeneracy and hybridize to regions of said barcode region and simultaneously blocking undesired elongation such that a plurality of amplicons is created;

- performing a second multiplex PCR2 using the multiple amplicons generated from the first PCR1 reaction as a template using adapted tailed primers that hybridize to portions of said amplicons,

- pooling all amplicon products; and

- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.
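By way of illustration only, the final step of determining a barcode sequence from pooled, overlapping amplicon reads can be sketched as a per-position majority vote. This is a simplified assumption and not the analysis pipeline of the invention: it supposes each read has already been placed at a known 0-based start position on the target, and the function name `consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def consensus(target_length, placed_reads):
    """Build a majority-vote consensus from reads placed on a target.

    placed_reads is an iterable of (start, sequence) pairs, where start is
    the 0-based position of the read's first base on the target.
    Positions covered by no read are reported as 'N'.
    """
    columns = [Counter() for _ in range(target_length)]
    for start, sequence in placed_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < target_length:
                columns[position][base] += 1
    return "".join(
        column.most_common(1)[0][0] if column else "N"
        for column in columns
    )


# Toy example: three short overlapping reads on an 8-base target.
reads = [(0, "ACGT"), (2, "GTAA"), (2, "GAAA")]
print(consensus(8, reads))
```

Because every position is voted on by all amplicons that cover it, a single erroneous or missing amplicon need not leave a gap, which is the practical benefit of the amplicon redundancy described above.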

The multiplex PCR described herein is desirably performed under suitable conditions for hybridization.

According to an aspect of the invention is a method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the taxonomic classification of said specimen, the method comprising;

- extracting linear degraded DNA from said specimen;

- performing two step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;

- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and

- classifying said specimen.
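By way of illustration only, the classifying step can be sketched as a comparison of the recovered barcode against a library of reference barcodes. The sketch assumes the query and all references are already aligned and of equal length; the library contents, the species names, and the functions `identity` and `best_match` are hypothetical.

```python
def identity(query, reference):
    """Fraction of matching positions between two pre-aligned sequences.

    Positions where either sequence has 'N' (no base call) are skipped.
    """
    compared = 0
    matches = 0
    for q, r in zip(query, reference):
        if q == "N" or r == "N":
            continue
        compared += 1
        if q == r:
            matches += 1
    return matches / compared if compared else 0.0


def best_match(query, library):
    """Return (name, identity) of the closest reference in the library."""
    scored = ((name, identity(query, seq)) for name, seq in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical reference library with two toy barcodes.
library = {"species_A": "ACGTACGT", "species_B": "ACGTTTTT"}
print(best_match("ACGTACNT", library))
```

In practice a similarity threshold would also be needed to decide whether the best match is close enough to assign the specimen to that taxon; no such threshold is assumed here.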

According to another aspect of the invention is a kit for performing multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising; primers specific for said barcode region of said COI gene, suitable buffers, reaction nucleotides, enzymes, optional stabilizers and instructions for use. In aspects, kits can be designed for any specimen type depending on the target gene of interest for amplification and sequencing.

According to another aspect, there is provided a method for amplifying degraded DNA, the method comprising:

amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and

amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.

In an aspect, the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.

In an aspect, the block elongation moieties comprise non-complementary tails.

In an aspect, the method comprises from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.

In an aspect, the method comprises 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).

In an aspect, for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6 and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.

In an aspect, for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6; and F6 is paired with R6.

In an aspect, the primers for PCR2 comprise adapter tailed primers for sequencing.

In an aspect, the primers are degenerate.
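By way of illustration only, the six-by-six primer arrangement recited above can be written down as plain data and summarized. The dictionaries reproduce the PCR1 and PCR2 pairings exactly as listed in the text (the PCR2 list in the text names no pairing for F5, so none is entered here); the vessel labels `vessel_1` and `vessel_2` and the helper names are hypothetical.

```python
ALL_REVERSE = ["R1", "R2", "R3", "R4", "R5", "R6"]

# PCR1: two reaction vessels with different primer combinations.
PCR1_VESSELS = {
    "vessel_1": {"forward": ["F1", "F3", "F5"], "reverse": ALL_REVERSE},
    "vessel_2": {"forward": ["F2", "F4", "F6"], "reverse": ALL_REVERSE[:5]},
}

# PCR2: one reaction vessel per forward primer, as listed in the text.
PCR2_PAIRINGS = {
    "F1": ["R1", "R2", "R3"],
    "F2": ["R2", "R3", "R4"],
    "F3": ["R3", "R4", "R5"],
    "F4": ["R5", "R6"],
    "F6": ["R6"],
}


def primers_per_vessel(vessels):
    """Count how many primers are mixed in each PCR1 vessel."""
    return {
        name: len(mix["forward"]) + len(mix["reverse"])
        for name, mix in vessels.items()
    }


def nominal_pair_count(vessels):
    """Count forward x reverse combinations present in each PCR1 vessel."""
    return {
        name: len(mix["forward"]) * len(mix["reverse"])
        for name, mix in vessels.items()
    }


print(primers_per_vessel(PCR1_VESSELS))
print(nominal_pair_count(PCR1_VESSELS))
print(len(PCR2_PAIRINGS), "PCR2 vessels listed")
```

The pair counts are nominal upper bounds on primer combinations in each vessel; which combinations actually yield an amplicon depends on primer positions and on where the template is broken.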

According to an aspect, there is provided a method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.
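By way of illustration only, the notion that each region of the target is covered by multiple amplicons can be checked by computing per-base coverage depth. The 658-base target length follows the COI barcode region mentioned elsewhere herein; the amplicon coordinates below are hypothetical and are given as 0-based half-open intervals.

```python
def coverage_depth(target_length, amplicons):
    """Return per-base coverage depth for a set of amplicon intervals.

    Each amplicon is a (start, end) pair, 0-based and half-open, so it
    covers positions start, start + 1, ..., end - 1.
    """
    # Difference array: +1 where an amplicon begins, -1 just past its end.
    diff = [0] * (target_length + 1)
    for start, end in amplicons:
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for position in range(target_length):
        running += diff[position]
        depth.append(running)
    return depth


# Hypothetical amplicons over a 658-base barcode region.
hypothetical_amplicons = [
    (0, 250), (0, 400), (150, 500), (300, 658), (450, 658),
]
depth = coverage_depth(658, hypothetical_amplicons)
print("minimum depth:", min(depth), "maximum depth:", max(depth))
```

A minimum depth of at least two means that no single failed amplicon leaves a gap in the recovered sequence.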

In accordance with an aspect, there is provided a method of amplifying a barcode region of a degraded DNA sample, the method comprising:

performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons,

wherein the plurality of forward primers comprise primers F1, F2, ... , Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;

wherein the plurality of reverse primers comprise primers R1, R2, ... , Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;

wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rn;

wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers; and

wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.

In an aspect, the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.

In an aspect, the block elongation moieties comprise non-complementary tails.

In an aspect, the method further comprises performing a plurality of PCR2 reactions, PCR2_1, PCR2_2, ..., PCR2_n, to amplify the PCR1a and PCR1b complements of amplicons, wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and

wherein the PCR1a complement of amplicons are amplified using odd-numbered forward primers and wherein the PCR1b complement of amplicons are amplified using even-numbered forward primers.
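The odd/even routing of forward primers between the two first-round reactions and the second-round reactions can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function names are hypothetical, and n = 6 is used only because it is one of the values the description contemplates.

```python
def split_forward_primers(n):
    """Return (first-round pool a, first-round pool b) for forward primers F1..Fn."""
    names = [f"F{i}" for i in range(1, n + 1)]
    pool_a = names[0::2]  # odd-numbered: F1, F3, F5, ...
    pool_b = names[1::2]  # even-numbered: F2, F4, F6, ...
    return pool_a, pool_b


def pcr2_template_for(forward_index):
    """Name the first-round product used as template for a given PCR2 forward primer."""
    return "PCR1a" if forward_index % 2 == 1 else "PCR1b"


pcr1a, pcr1b = split_forward_primers(6)
templates = {f"F{i}": pcr2_template_for(i) for i in range(1, 7)}
print(pcr1a, pcr1b)
print(templates)
```

Each second-round reaction therefore draws its template from the first-round reaction that contained the same forward primer.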

In an aspect, the method further comprises pooling the resulting amplicons. In an aspect, the primers for PCR2 are adapter-tailed for sequence analysis.

In an aspect, the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.

In an aspect, n is from 2-10, such as 6.

In an aspect, m is from 2-10, such as 6.

In an aspect, the forward and reverse primers are as defined in Table 4.

In an aspect, the template is not depleted through use of the method.

In accordance with an aspect, there is provided a method of amplifying degraded DNA according to the scheme shown in Figures 2a and 2b herein.

In an aspect, the method is for taxonomic classification of unknown specimens.

In an aspect, the primers are degenerate.
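A degenerate primer is written with IUPAC ambiguity codes and stands for a pool of concrete sequences. The sketch below expands such a primer into that pool. The example primer is a toy sequence chosen for illustration, not one of the primers of Table 4, and `expand_degenerate` is a hypothetical helper name.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence represented by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


# Toy primer: R stands for A or G, Y stands for C or T.
variants = expand_degenerate("ACRTY")
print(variants)
```

The number of variants is the product of the pool sizes at each position, so heavily degenerate primers expand quickly.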

In an aspect, the method is for analyzing a plurality of specimens simultaneously.

In an aspect, the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2 μg to about 5 μg of degraded DNA.

The practice of the present subject matter may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art. Such conventional techniques include, but are not limited to, preparation of synthetic polynucleotides, polymerization techniques, chemical and physical analysis of polymer particles, preparation of nucleic acid libraries, nucleic acid sequencing and analysis, and the like. Specific illustrations of suitable techniques can be used by reference to the examples provided herein. Other equivalent conventional procedures can also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Hermanson, Bioconjugate Techniques, Second Edition (Academic Press, 2008); Merkus, Particle Size Measurements (Springer, 2009); Rubinstein and Colby, Polymer Physics (Oxford University Press, 2003); and the like.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which these inventions belong. All patents, patent applications, published applications, and other publications referred to herein, both supra and infra, are incorporated by reference in their entirety. If a definition and/or description is set forth herein that is contrary to or otherwise inconsistent with any definition set forth in the patents, patent applications, published applications, and other publications that are herein incorporated by reference, the definition and/or description set forth herein prevails over the definition that is incorporated by reference.

As used herein, the terms "comprises," "comprising," "includes," "including," "has," "having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of features is not necessarily limited only to those features but may include other features not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive-or and not to an exclusive-or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

Brief Description of the Drawings

The above and other aspects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

Figure 1 schematically depicts the two stage multiplex nested PCR/NGS methodology of the present invention.

Figure 2 schematically depicts primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR (PCR1) includes two separate reactions (a - above broken line) using 10bp tailed primers and genomic DNA as template (shown in parentheses below reaction names). The second round of PCR (PCR2) includes six separate reactions (a - below broken line) using adapter-tailed primers and the products from the first PCR reactions as template (shown in parentheses below reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with MID tags unique to that specimen. For increased multiplexing, each reverse PCR2 primer can also be tailed with a MID tag allowing a large number of possible combinations (e.g. adding 96 unique MID tags to the forward primers and 4 unique MID tags to the reverse primers allows 384 specimens to be multiplexed and individually tracked).

Figure 3 shows the recovery of sequences from ten type specimens in each of three DNA categories. (a) Number of reads; (b) Per base coverage; (c) Number of base pairs (bp) recovered via NGS. HQ - high quality; MQ - medium quality; LQ - low quality. Mean (horizontal black line), standard deviation (edges of box), min and max (whiskers, ♦) are shown. The horizontal broken line in (c) represents a full-length (658bp) barcode.
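The dual-tag arithmetic in the Figure 2 description (96 forward MID tags combined with 4 reverse MID tags giving 384 trackable specimens) can be checked with a short sketch. The tag and specimen labels are hypothetical placeholders, since no tag sequences are given in this section.

```python
from itertools import product


def assign_mid_pairs(n_forward, n_reverse):
    """Give each specimen a unique (forward tag, reverse tag) combination."""
    forward_tags = [f"MID_F{i:02d}" for i in range(1, n_forward + 1)]
    reverse_tags = [f"MID_R{j}" for j in range(1, n_reverse + 1)]
    pairs = product(forward_tags, reverse_tags)
    return {f"specimen_{k:03d}": pair for k, pair in enumerate(pairs, start=1)}


assignment = assign_mid_pairs(96, 4)
print(len(assignment))  # 96 * 4 combinations
```

Because each specimen is identified by the pair rather than by a single tag, 100 tagged primers (96 + 4) are enough to track 384 specimens.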

Figure 4 shows a neighbor-joining tree showing 100% concordance between sequences generated from type specimens using NGS and Sanger sequencing. For each species, BOLD Process IDs are shown for both the Sanger and NGS-generated (outlined in red) sequences.

Figure 5 shows a neighbor-joining tree of barcodes generated from century-old type specimens and contemporary congeneric taxa (where available). Barcodes from the 26 century-old specimens (outlined in red) were generated via NGS. Four cases involve confirmed or suspected synonymy: Celerna amplimargo and C. lerne, Aeolochroma caesia and A. saturataria, Sarcinodes subvirgata and S. holzi, Pingasa furvifrons and P. nobilis.

Figure 6 schematically shows the alignments of sequence records derived from two type specimens of Geometridae, one with high quality DNA (a) and one with low quality DNA (b). The alignments show only a single representative of each distinct sequence. In many cases, there were hundreds or thousands of a particular sequence. High quality reads have high coverage across the entire 658bp barcode region and originate from a single source - indicated by a single nucleotide (color) at each position in the contig. Low quality reads do not span the entire barcode region (i.e. they have regions lacking coverage) and often originate from multiple sources - indicated by multiple nucleotides (colors) at certain positions in the contig.

Figure 7 shows that there is no negative impact on sequence recovery when NGS throughput is increased by analyzing 95 samples simultaneously. "10-plex" refers to amplifying and sequencing 10 samples in a single process, while "95-plex" refers to amplifying and sequencing 95 samples in a single process. In addition to decreasing processing time, costs are cut almost 10-fold by moving from a 10- to 95-plex system. A similar move to a 384-plex system is currently being developed and would further cut costs significantly.

Figure 8 shows the effects of designing primers to target a specific taxonomic group. In this example, primers designed to target animal DNA in general are compared to the same primers designed to target vertebrate DNA specifically. Both primer sets were used to amplify the same mammalian DNA, and the results clearly show a significant performance improvement by the vertebrate primers compared to the general primers. By making similar primer modifications, the NGS method can theoretically be applied to any genetic sequence in any type of organism.

Figure 9 shows PCR success rates for general- and vertebrate-specific primers. Both primer sets target the same gene region. To directly compare the primer sets, each set was used to amplify the same DNA from 95 fresh and 95 degraded vertebrate samples. In both cases, the vertebrate-specific primers outperformed the general primers.

Detailed Description of the Invention

The following description of various exemplary embodiments is exemplary and explanatory only and is not to be construed as limiting or restrictive in any way. Other embodiments, features, objects, and advantages of the present teachings will be apparent from the description and accompanying drawings, and from the claims.

As used herein, "amplify", "amplifying" or "amplification reaction" and their derivatives, refer generally to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. In some embodiments, amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule. Amplification optionally includes linear or exponential replication of a nucleic acid molecule. In some embodiments, such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. At least some of the target sequences can be situated on the same nucleic acid molecule or on different target nucleic acid molecules included in the single amplification reaction. In some embodiments, "amplification" includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination. The amplification reaction can include single or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of ordinary skill in the art. In some embodiments, the amplification reaction includes polymerase chain reaction (PCR).

As used herein, "amplification conditions" and its derivatives, generally refers to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences includes polymerase chain reaction (PCR) conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence. Generally, the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending and separating are repeated. Typically, the amplification conditions include cations such as Mg2+ or Mn2+ (e.g., MgCl2, etc.) and can also include various modifiers of ionic strength.

As used herein, "target sequence" or "target sequence of interest" and its derivatives, refers generally to any single or double-stranded nucleic acid sequence that can be amplified or synthesized according to the disclosure, including any nucleic acid sequence suspected or expected to be present in a sample. In some embodiments, the target sequence is present in double-stranded form and includes at least a portion of the particular nucleotide sequence to be amplified or synthesized, or its complement, prior to the addition of target-specific primers or appended adapters. Target sequences can include the nucleic acids to which primers useful in the amplification or synthesis reaction can hybridize prior to extension by a polymerase. In some embodiments, the term refers to a nucleic acid sequence whose sequence identity, ordering or location of nucleotides is determined by one or more of the methods of the disclosure. As defined herein, "sample" is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, chimeric, hybrid, or multiplex-forms of nucleic acids. The sample can include any biological, animal, avian, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.

As used herein, "degraded DNA" is used in its broadest sense to include DNA that is "falling apart" or broken down into smaller pieces. Degraded DNA may be reflective of: using very old DNA samples; using DNA extracted from formalin-fixed paraffin embedded samples; freezing and thawing DNA samples repeatedly; leaving DNA samples at room temperature; or exposing DNA samples to heat or physical shearing.

As used herein, the term "primer" and its derivatives refer generally to any polynucleotide that can hybridize to a target sequence of interest. In some embodiments, the primer can also serve to prime nucleic acid synthesis. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. The primer may be comprised of any combination of nucleotides or analogs thereof, which may be optionally linked to form a linear polymer of any suitable length. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide. (For purposes of this disclosure, the terms "polynucleotide" and "oligonucleotide" are used interchangeably herein and do not necessarily indicate any difference in length between the two). In some embodiments, the primer is single-stranded but it can also be double-stranded. The primer optionally occurs naturally, as in a purified restriction digest, or can be produced synthetically. In some embodiments, the primer acts as a point of initiation for amplification or synthesis when exposed to amplification or synthesis conditions; such amplification or synthesis can occur in a template-dependent fashion and optionally results in formation of a primer extension product that is complementary to at least a portion of the target sequence. Exemplary amplification or synthesis conditions can include contacting the primer with a polynucleotide template (e.g., a template including a target sequence), nucleotides and an inducing agent such as a polymerase at a suitable temperature and pH to induce polymerization of nucleotides onto an end of the target-specific primer. 
If double-stranded, the primer can optionally be treated to separate its strands before being used to prepare primer extension products. In some embodiments, the primer is an oligodeoxyribonucleotide or an oligoribonucleotide. In some embodiments, the primer can include one or more nucleotide analogs. The exact length and/or composition, including sequence, of the target-specific primer can influence many properties, including melting temperature (Tm), GC content, formation of secondary structures, repeat nucleotide motifs, length of predicted primer extension products, extent of coverage across a nucleic acid molecule of interest, number of primers present in a single amplification or synthesis reaction, presence of nucleotide analogs or modified nucleotides within the primers, and the like. In some embodiments, a primer can be paired with a compatible primer within an amplification or synthesis reaction to form a primer pair consisting of a forward primer and a reverse primer. In some embodiments, the forward primer of the primer pair includes a sequence that is substantially complementary to at least a portion of a strand of a nucleic acid molecule, and the reverse primer of the primer pair includes a sequence that is substantially identical to at least a portion of the strand. In some embodiments, the forward primer and the reverse primer are capable of hybridizing to opposite strands of a nucleic acid duplex. Optionally, the forward primer primes synthesis of a first nucleic acid strand, and the reverse primer primes synthesis of a second nucleic acid strand, wherein the first and second strands are substantially complementary to each other, or can hybridize to form a double-stranded nucleic acid molecule. In some embodiments, one end of an amplification or synthesis product is defined by the forward primer and the other end of the amplification or synthesis product is defined by the reverse primer. 
In some embodiments, where the amplification or synthesis of lengthy primer extension products is required, such as amplifying an exon, coding region, or gene, several primer pairs can be created that span the desired length to enable sufficient amplification of the region. In some embodiments, a primer can include one or more cleavable groups. In some embodiments, primer lengths are in the range of about 10 to about 60 nucleotides, about 12 to about 50 nucleotides and about 15 to about 40 nucleotides in length. Typically, a primer is capable of hybridizing to a corresponding target sequence and undergoing primer extension when exposed to amplification conditions in the presence of dNTPs and a polymerase. In some instances, the particular nucleotide sequence or a portion of the primer is known at the outset of the amplification reaction or can be determined by one or more of the methods disclosed herein. In some embodiments, the primer includes one or more cleavable groups at one or more locations within the primer. As used herein, "target-specific primer" and its derivatives, refers generally to a single-stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75% complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or identical, to at least a portion of a nucleic acid molecule that includes a target sequence. In such instances, the target-specific primer and target sequence are described as "corresponding" to each other. 
In some embodiments, the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement. In some embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the nucleic acid molecule other than the target sequence. In some embodiments, the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample. In some embodiments, nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as "non-specific" sequences or "non-specific nucleic acids". 
In some embodiments, the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence. In some embodiments, a target-specific primer is at least 95% complementary, or at least 99% complementary, or identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence. In some embodiments, a target-specific primer can be at least 90%, at least 95% complementary, at least 98% complementary or at least 99% complementary, or identical, across its entire length to at least a portion of its corresponding target sequence. In some embodiments, a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that can be used to amplify the target sequence via template-dependent primer extension. Typically, each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50% complementary to at least one other target sequence in the sample. In some embodiments, amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence. In some embodiments, the target-specific primer can be substantially non-complementary at its 3' end or its 5' end to any other target-specific primer present in an amplification reaction. 
In some embodiments, the target-specific primer can include minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers include minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers include minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3' end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5' end of the target-specific primer. In some embodiments, a target-specific primer includes minimal nucleotide sequence overlap at the 3' end or the 5' end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction. In some embodiments 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more, target-specific primers in a single reaction mixture include one or more of the above embodiments. In some embodiments, substantially all of the plurality of target-specific primers in a single reaction mixture include one or more of the above embodiments.
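The percentage-complementarity thresholds used in the definition of a target-specific primer can be made concrete with a simple position-by-position comparison. The sketch below is an illustration under stated assumptions: it handles only ungapped, equal-length comparisons of unambiguous bases, the sequences are toy examples, and `percent_complementarity` is a hypothetical helper rather than a procedure defined in this disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer, template_site):
    """Percentage of primer positions that base-pair with the template site.

    Both sequences are written 5'->3' and must have equal length. A fully
    complementary primer equals the reverse complement of the site.
    """
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have equal length")
    expected = reverse_complement(template_site)
    matches = sum(p == e for p, e in zip(primer.upper(), expected))
    return 100.0 * matches / len(primer)


site = "AAGGCTCA"  # toy template site
perfect = percent_complementarity("TGAGCCTT", site)
one_mismatch = percent_complementarity("TGAGCCTA", site)
print(perfect, one_mismatch)
```

A real primer design workflow would also weigh mismatch position, since a mismatch at the 3' end affects extension more than one at the 5' end.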

As used herein, "polymerase" and its derivatives, generally refers to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases. Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases. The term "polymerase" and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide. In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5' exonuclease activity or terminal transferase activity. In some embodiments, the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. In some embodiments, the polymerase can include a hot-start polymerase or an aptamer based polymerase that optionally can be reactivated.

As used herein, the term "nucleotide" and its variants comprises any compound, including without limitation any naturally occurring nucleotide or analog thereof, which can bind selectively to, or can be polymerized by, a polymerase. Typically, but not necessarily, selective binding of the nucleotide to the polymerase is followed by polymerization of the nucleotide into a nucleic acid strand by the polymerase; occasionally however the nucleotide may dissociate from the polymerase without becoming incorporated into the nucleic acid strand, an event referred to herein as a "non-productive" event. Such nucleotides include not only naturally occurring nucleotides but also any analogs, regardless of their structure, that can bind selectively to, or can be polymerized by, a polymerase.

The term "extension" and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule. Typically but not necessarily such primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base pairing rules, which can include Watson-Crick type base pairing rules or alternatively (and especially in the case of extension reactions involving nucleotide analogs) by some other type of base pairing paradigm. In one non-limiting example, extension occurs via polymerization of nucleotides on the 3' OH end of the nucleic acid molecule by the polymerase.

As used herein, "multiplex identifier tag (MID)" or "DNA tagging sequence" and its derivatives, refers generally to a unique short (6-14 nucleotide) nucleic acid sequence within an adapter that can act as a "key" to distinguish or separate a plurality of amplified target sequences in a sample. For the purposes of this disclosure, a DNA barcode or DNA tagging sequence can be incorporated into the nucleotide sequence of an adapter.
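The role of a MID as a "key" can be sketched as a prefix lookup that sorts reads by specimen. This is an assumption-laden illustration: the 8-nucleotide tags, the reads and the specimen labels are invented for the example, tags are assumed to sit at the very start of each read, and no sequencing-error tolerance is modelled.

```python
def demultiplex(reads, mid_to_specimen):
    """Group reads by the MID found at their 5' end and strip the tag.

    Returns (bins, unassigned): bins maps specimen -> list of trimmed reads,
    unassigned holds reads whose prefix matches no known MID.
    """
    tag_lengths = sorted({len(tag) for tag in mid_to_specimen}, reverse=True)
    bins = {specimen: [] for specimen in mid_to_specimen.values()}
    unassigned = []
    for read in reads:
        for length in tag_lengths:
            specimen = mid_to_specimen.get(read[:length])
            if specimen is not None:
                bins[specimen].append(read[length:])
                break
        else:
            # No tag length produced a match for this read.
            unassigned.append(read)
    return bins, unassigned


# Hypothetical tags and reads, for illustration only.
mids = {"ACGAGTGC": "specimen_A", "TCTCTATG": "specimen_B"}
reads = ["ACGAGTGCTTAACC", "TCTCTATGGGCATT", "GGGGGGGGAAAA"]
bins, unassigned = demultiplex(reads, mids)
print(bins, unassigned)
```

Reads that carry no recognised tag are kept aside rather than discarded, so they can be inspected for tag errors or contamination.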

As used herein, a "barcode" is a short DNA sequence from a uniform locality on the genome used for identifying species.

As defined herein "multiplex amplification" refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel. The "plexy" or "plex" of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.

As used herein, "nested PCR" means that two pairs of PCR primers were used for a single locus. The first pair amplifies the locus as seen in any PCR experiment. The second pair of primers (nested primers) bind within the first PCR product and produce a second PCR product that may be shorter than the first one.
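The nesting relationship in this definition reduces to an interval check: the second-round product must lie within the first-round product. A minimal sketch follows, with amplicons represented as hypothetical (start, end) coordinates on the target; `is_nested` is an illustrative name.

```python
def is_nested(outer, inner):
    """True if the inner amplicon lies within the outer amplicon.

    Amplicons are (start, end) pairs, 0-based and end-exclusive.
    """
    outer_start, outer_end = outer
    inner_start, inner_end = inner
    return outer_start <= inner_start < inner_end <= outer_end


# Hypothetical coordinates on a 658 bp target.
print(is_nested((0, 658), (50, 400)))    # second product inside the first
print(is_nested((100, 400), (50, 300)))  # second product starts outside the first
```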

As used herein "Next Generation Sequencing (NGS)" refers to various types of massive parallel sequencing techniques. NGS extends the process of sequencing by sequencing millions of fragments in parallel fashion. NGS basically incorporates library preparation, cluster generation, sequencing and data analysis. Several different types of NGS platforms are commercially available.

DNA barcoding is a new system of species identification and discovery using a short section of DNA from a standardized region of the genome [1]. This DNA sequence is then used to identify different species in a manner analogous to a supermarket scanner using black stripes of the UPC barcode to identify purchases. It would be very beneficial to be able to barcode any type of sample from any source, no matter how old or how it has been treated. In particular, it is beneficial to be able to barcode specimens whereby the DNA may be degraded to certain extents.

The method of the invention (schematically shown in Figure 1) incorporates a novel two stage multiplex nested PCR approach to first amplify very small amounts of degraded DNA to produce a plurality of amplicons that cover the entire region of the gene or sequence of interest. The amplicons are then subject to sequence characterization by NGS methods. The method of the invention uses NGS to characterize/recover sequences of the pool of amplicons produced by the multiplex PCR from specimens with varying DNA qualities (i.e. different levels of degradation). In combination these two provide for a novel method of amplification and characterization of DNA sequences from degraded sources.

The present method has use in one aspect for sequencing essentially the entire barcode of a specimen that may have varying degrees of DNA degradation, inclusive of specimens with almost no intact DNA. The present method can be used with a specimen sample size of as little as 2 μg, or more, containing various degrees of degraded DNA. This provides utility with respect to identifying and confirming species, and also for a variety of other applications including biomedicine, forensics and environmental DNA (eDNA) monitoring where assembly of longer sequences from trace amounts of fragmented DNA is necessary. The present method can also be used with respect to foods where artificial sequences may be inserted therein. In general, the method provides for recovery of barcode sequences (or any desired sequence) and may promote development of portable devices for DNA barcoding.

A mitochondrial gene barcode is used to enable the identification of most animal species. For plants, mitochondrial genes do not differ sufficiently to distinguish among closely related species. The gene region being used for almost all animal groups is a 658 base-pair region in the mitochondrial cytochrome c oxidase 1 gene ("COI") (the Folmer region). It is highly effective in identifying a range of animal groups, including birds, butterflies, fish and flies. The COI barcode is not effective for identifying plants because it evolves too slowly, but two gene regions in the chloroplast, matK and rbcL, are approved as the barcode regions for land plants. For fungi, the internal transcribed spacer (ITS) region may be used. Other barcode regions are being identified and it will be understood that the methods described herein are applicable to any barcode region, whether currently known or identified in the future. The method of the invention has been demonstrated herein to recover the barcode region for COI from small amounts of template DNA, initially from a small number of Lepidoptera and subsequently from samples spanning several major insect orders, as well as arachnids, marine invertebrates, and land- and aquatic-based vertebrates. However, it would be understood by one of skill in the art that the present method is broadly applicable and in aspects can also be scaled for plants or other organisms, as well as for other gene regions. The method can be provided as a system in separate kits for invertebrates, mammals, fish and birds as non-limiting examples.

The present two stage multiplex PCR/NGS approach allows all fragments to be amplified in a single multiplex PCR, owing to the multiplex nature of the PCR reactions, which allows for high primer redundancy. As a result, each DNA extract processed with the present approach is exposed to amplification by approximately 2010 primers versus the approximately 20 primers used in an analogous Sanger analysis. The multiplex PCR is performed such that initially generated amplicons using a plurality of primers act as a template for a subsequent round of multiplex PCR with different primer characteristics. This is also in contrast to the traditional Sanger approach which utilizes multiple PCR reactions for each fragment.

In the method a first step of multiplex PCR (PCR1) is performed using nested degenerate primers designed to hybridize to the extracted target DNA template. To avoid preferential amplification of certain fragments and amplification bias, a second round of multiplex PCR (PCR2) is performed targeting non-adjacent fragments of the DNA template using the first round PCR (PCR1) products (amplicons) as a template. The same reaction is basically repeated and then further in a nested approach (using nested PCR). In the first stage PCR1 10bp-tailed primers are used while in PCR2 adapter-tailed primers that are also tailed with a multiplex identifier tag (MID) are used.

To produce more template options for the second multiplex PCR2 without increasing bias, each of the two first step PCR1 reactions contains selected forward primers for all downstream reverse primers to allow the same region of DNA to be covered by multiple amplicons, thus producing more redundancy. Thus each primer pair in the multiplex second stage PCR2 is provided with multiple template amplicon options where only one needs to work to get full coverage of the target sequence. To further neutralize amplification bias, reactions are split into six reactions, one for each forward primer that is paired with the next 1-3 downstream reverse primers. Taken together, this cumulatively prevents overlap amplification, reduces amplification bias and results in redundant amplification so that if one particular primer set is not effective or fails then another is likely to cover for it.

To further avoid primer incorporation into the middle of sequence reads, certain reverse primers from PCR1 are omitted so that overlap amplicons cannot form, however, this reduces the amount of amplicons available as templates for PCR2 which leads to a loss of the amplification redundancy. Thus unwanted elongation by the polymerase is blocked in PCR1. This was effected by the use of non-complementary tails on the PCR1 primers, however, any agent that blocks elongation (i.e. chemical conjugation) from the 5' end of the primers can be used.

For NGS, the primers used in the PCR2 are tailed with adapter sequences and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. The superior performance of NGS in sequence recovery is likely due, in part, to the developed multiplex nature of the PCR reactions which allowed high primer redundancy. As a result, each DNA extract processed with the current protocol was exposed to amplification by 2010 primers versus the approximately 20 primers used in an analogous Sanger analysis. The high diversity of primers undoubtedly meant that there was a greater chance of achieving the primer-template homology necessary for successful amplification. The higher success of the NGS protocol (compared to Sanger sequencing) is likely also a consequence of the greater sensitivity of these sequencing platforms. This difference was evidenced by the fact that, in the initial experiment, 16 of 20 specimens which failed to generate a 164bp sequence via Sanger analysis generated sequence reads for the same region with NGS. Subsequent experiments comparing Sanger sequencing to the NGS method showed a 5-20 fold increase in the number of recovered barcode sequences using the NGS method (Table 1). Furthermore, while increased sample age has a strong negative effect on barcode recovery via Sanger analysis, the NGS method recovers long barcodes regardless of age (Figure 6). The results show that it was possible to recover a full-length COI barcode with NGS from specimens that failed with Sanger analysis.

Table 1 - Direct comparison of Sanger and NGS method on various taxonomic groups. This table compares the results of analyzing the same DNA using the best available Sanger sequencing method and the NGS method. The results of each experiment (experiment numbers correspond to those in Table 1) show that the NGS method yields significantly more and longer barcode sequences than the Sanger method, and in two cases is the only method that could produce barcode sequences.

Lepidoptera 95 10 74 84 39 407 658 158 342 164 279 11% 78%

Coleoptera 846 110 568 86 35 658 658 193 334 164 658 13% 67%

Arachnids 94 7 89 164 95 480 658 299 495 307 658 7% 95%

Arachnids 190 0 164 0 52 0 658 0 426 0 658 0% 86%

Reptiles/amphibians 95 1 21 166 55 166 371 166 150 166 None 1% 22%

Mammals 95 0 23 0 56 0 658 0 239 0 None 0% 24%
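The relationship between the counts and the percentages in Table 1 can be checked with a short calculation. The sketch below assumes that the first three numeric fields of a row are the number of specimens analysed, the number of Sanger sequences and the number of NGS sequences; this reading is an assumption, since the column headers are not reproduced here, although it does reproduce the two percentages at the end of each of the five rows used.

```python
# (taxon, specimens analysed, Sanger sequences, NGS sequences)
# Field meanings are assumed; the values are copied from five rows of Table 1.
ROWS = [
    ("Lepidoptera", 95, 10, 74),
    ("Coleoptera", 846, 110, 568),
    ("Arachnids", 94, 7, 89),
    ("Arachnids", 190, 0, 164),
    ("Mammals", 95, 0, 23),
]


def success_rates(rows):
    """Return (taxon, Sanger %, NGS %) with percentages rounded to whole numbers."""
    return [
        (taxon, round(100 * sanger / n), round(100 * ngs / n))
        for taxon, n, sanger, ngs in rows
    ]


for taxon, sanger_pct, ngs_pct in success_rates(ROWS):
    print(f"{taxon}: Sanger {sanger_pct}%, NGS {ngs_pct}%")
```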

Shokralla et al [20,21] used NGS to recover full-length barcodes from freshly collected specimens of Lepidoptera with a single primer pair. However, the present novel method now demonstrates that NGS can regularly recover complete or near-complete barcodes from century-old specimens with heavily degraded DNA. Moreover, because it requires little template DNA, much of each DNA extract remains for future analysis. Although analytical costs were approximately $10 CAD a specimen, a 4-fold increase in the number of specimens processed in each run is feasible with a move to a NGS platform generating more reads, resulting in an estimated cost of less than $3 per sample.

While initially only applied to 10 samples simultaneously, the NGS method can be applied to 96 samples simultaneously without decreasing sequence recovery (Figure 8). Additionally, subsequent experiments have demonstrated the method to be successful for over 400 different families of animals, covering several different phyla (Table 2). Samples fixed in formalin and preserved in ethanol were also successfully analyzed, from all major groups examined to date: spiders, freshwater insects, molluscs, crustaceans, reptiles, and mammals (Table 2). DNA barcodes have also been successfully recovered from forensic specimens and samples of heavily processed materials confiscated by wildlife enforcement officers (data not shown).

Table 2 - This table lists each experiment used to develop, optimize, and enhance the NGS method. The purpose of each experiment is included, as well as information on the samples employed for each experiment. The type of sequencing used for each experiment and overall success rates are listed. The degree of DNA damage was estimated based on the ease with which barcodes could be amplified using Sanger sequencing methods.

Furthermore, new primer sets can be developed, a task facilitated by the well-parameterized barcode reference library for the animal kingdom, and subsequent experiments have demonstrated that a primer set designed for vertebrates provides increased barcode amplification in comparison with the standard primers outlined here (Figure 9). Indeed the method of the invention can be used to amplify and sequence any desired degraded DNA as primers can be tailored for any given sequence. Past research has employed NGS to sequence genomes, but this study has demonstrated its value in probing sequence diversity in single gene regions when combined with two step multiplex PCR as described herein. A large-scale program to sequence type specimens would represent a major advance in stabilizing and validating the application of scientific names. As well, because many type specimens derive from developing nations, it would represent an important step in the repatriation of knowledge that will aid these nations in managing their biodiversity by enabling DNA-powered identification systems, a major advance in settings where the scientific workforce is small and biodiversity is high.

In a specific aspect, the method described herein involves the following protocol:

1) PCR1

a. Two separate reactions, one with primers F1, F3, F5 + R1-R6 (PCR1a), the other with F2, F4, F6 + R2-R6 (PCR1b). Separate reactions are necessary to prevent non-target amplification. Primers are tailed with a short non-complementary sequence to prevent another form of non-target amplification.

2) PCR2

a. Six separate reactions, one for each forward primer plus the next three downstream reverse primers (or next two or next one, if only two or one downstream reverse primer exists). This reaction uses PCR1 product as template (PCR1a product for PCR2 F1, F3, F5 reactions, and PCR1b for PCR2 F2, F4, F6 reactions). PCR2 forward and reverse primers contain MID tags to associate amplicons with individual specimens, so that multiple specimens can be sequenced simultaneously.

3) PCR purification

a. Following PCR2, all six reactions (or all 6 plates of reactions in the case of the 95-plex version) are pooled. This is possible because the MID tags allow the resulting sequence reads to be re-associated with their original sample. An aliquot of the pooled reactions is purified for sequencing.

4) Sequencing

a. Purified products are quantified and diluted to the sequencing manufacturer's recommendation. The diluted product is then sequenced following the manufacturer's instructions.

5) Data Analysis

a. The resulting sequence reads are de-multiplexed via the MID tags and split into separate datasets, one for each specimen (typically 95 datasets)

b. Each dataset is processed through a bioinformatics pipeline that trims off primer, MID, and adapter sequences and filters out low quality reads. The filtered reads, which ideally overlap with one another, are then arranged into a contiguous sequence, which ideally will be a full-length barcode. Formation of the contig can involve alignment of reads to a reference, but can in theory also be de novo (i.e. no reference sequence involved).
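The de-multiplexing step in 5a can be illustrated with a minimal sketch. It assumes that each read begins with its MID tag and that tags are matched exactly; the two 10bp tags and the specimen names are placeholders invented for the example (they are not real IonXpress sequences), and a production pipeline would normally tolerate sequencing errors in the tag.

```python
from collections import defaultdict

# Hypothetical MID tags and specimen labels, used for illustration only.
MID_TO_SPECIMEN = {
    "ACGTACGTAC": "specimen_01",
    "TGCATGCATG": "specimen_02",
}


def demultiplex(reads, mid_to_specimen):
    """Split reads into per-specimen datasets using the MID tag at the read start."""
    datasets = defaultdict(list)
    for read in reads:
        for mid, specimen in mid_to_specimen.items():
            if read.startswith(mid):
                # Trim the MID so that only primer and insert sequence remain.
                datasets[specimen].append(read[len(mid):])
                break
        # Reads that carry no recognised MID are dropped.
    return dict(datasets)


reads = ["ACGTACGTACTTTTGGGG", "TGCATGCATGCCCCAAAA", "NNNNNNNNNNTTTTGGGG"]
datasets = demultiplex(reads, MID_TO_SPECIMEN)
print(datasets)
```

Each resulting dataset would then pass to the trimming, filtering and contig-building steps described in 5b.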

Examples

Materials and Methods

The following pertains specifically to the initial experiment, the purpose of which was to develop and optimize the NGS method. Subsequent experiments contained minor modifications, such as the use of additional MID tags in the primer sequences to increase throughput, or the use of different taxa and associated primers, but the overall design and principle of the protocol remained the same.

Type Specimens

Tissue samples were obtained from 1820 specimens (mostly primary types but some were equally important non-types) of Geometridae (Lepidoptera) from the Natural History Museum (London) as part of a project to develop a strongly validated taxonomic system to support species inventories and studies of host plant use in Papua New Guinea [22,23]. Genitalic dissections of these specimens generated residual tissue that was held frozen until its use in the present study.

DNA Extraction

All tissue samples were processed in an isolated 'clean' laboratory at the Canadian Centre for DNA Barcoding (CCDB; www.ccdb.ca) with dedicated reagents, supplies and protective clothing. Each sample was incubated overnight in lysis buffer, following a modified protocol of Knolke et al [24], before DNA was extracted using a silica membrane-based method in either single columns or 96-well plate format [25]. To maximize the concentration of extracted DNA, elution from each silica membrane was performed with 30 μL of pre-warmed (to 56°C) 10mM Tris HCl.

Sanger Sequencing

Since DNA quality varies greatly, even among specimens of similar age [2,8], each DNA extract was initially assessed by Sanger analysis. This involved an attempt to amplify both 164bp (C microLepF1 t1 + C TypeR1) and 94bp (C TypeF1 + C TypeR1) regions of the COI barcode [2,9]. PCR amplification and cycle sequencing employed standard CCDB protocols [2,25,26] with amplicons bidirectionally sequenced on an ABI 3730XL (Applied Biosystems). All traces were edited using CodonCode v. 4.2.7 (CodonCode Corporation) and the resulting 164bp and 94bp sequences were validated by comparison with sequences from conspecific individuals or, when they were unavailable, by Neighbor-Joining (NJ) analysis to ensure that each sequence branched as expected. These tests for sequence recovery permitted the assignment of DNA from each specimen to one of three categories: 1) High Quality (HQ) - those that generated a 164bp sequence; 2) Medium Quality (MQ) - those that generated a 94bp sequence; and 3) Low Quality (LQ) - those that failed to generate any sequence. The present study examined ten specimens from each category with the goal of developing a NGS protocol effective across varying levels of DNA degradation. The specimens (Table 3) selected for analysis included a single representative from each of 30 genera in the family Geometridae, all more than a century old (mean age = 111 years). Sequences, electropherograms and primer details for the specimens are on BOLD (dx.doi.org/10.5883/DS-NGSTYPES) and GenBank (see Table 3 for accession numbers).
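The three-way quality assignment described above amounts to a simple decision rule, sketched here for illustration. The function name and the boolean inputs are hypothetical; the rule itself (164bp success gives HQ, otherwise 94bp success gives MQ, otherwise LQ) follows the category definitions in the text.

```python
def dna_quality(recovered_164bp: bool, recovered_94bp: bool) -> str:
    """Assign a DNA extract to HQ, MQ or LQ from its Sanger sequencing outcome."""
    if recovered_164bp:
        return "HQ"  # a 164bp sequence was generated
    if recovered_94bp:
        return "MQ"  # only the 94bp sequence was generated
    return "LQ"      # no sequence was generated


print(dna_quality(True, True), dna_quality(False, True), dna_quality(False, False))
```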

Table 3. Type specimens analyzed, including sequencing results and accession numbers

The four Process ID's marked with an asterisk (*) represent specimens where NGS analysis generated sequence reads from multiple species. HQ - high quality; MQ - medium quality; LQ - low quality; N/A - not applicable.

Next-generation Sequencing

DNA degradation often limits PCR amplicons to <200bp in specimens that are more than 50 years old [18], precluding efforts to recover the entire barcode region with one or two primer sets. As a consequence, primer sets were designed to amplify fragments ranging in length from 120bp to 148bp with enough overlap to permit recovery of the 658bp barcode region. These primers needed to be tailed with adapter sequences for analysis on an Ion Torrent PGM (Life Technologies) and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. Ten sets of MID-tagged primers, each consisting of six forward and six reverse primers, were employed to analyze ten type specimens per NGS run (Table 4).

Table 4. Primers used in the first (PCR1) and second (PCR2) reactions to allow the analysis of specimens in an Ion Torrent PGM run.

ARDGGDGGRTAWAC

R3 AncientLepR3-Sanger-ion3 None None

WGTTCAWCC

GTWGWAATRAARTT

R4 AncientLepR4-Sanger-ion4 None None

DATWGCWCC

GTTARWARTATDGT

R5 AncientLepR5-Sanger-ion5 None None

RATDGCWCC

TAAACTTCTGGATGT

R6 LepRl -Sanger-ion6 None None

CCAAAAAATCA

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ionl CTCAG ATTCAACCAA A ssl

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion 1 CTCAG ATTRRWRATG A ssl

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ionl CTCAG TTATAATTGG A ssl

DGGRTTTGGWAATTG

PCR2

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion 1 CTCAG AGWAGWATW A ssl

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ionl CTCAG ATTTTTWSWC A ssl

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion 1 CTCAG TATTTGTWTG A ssl

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 Fl LepFl-ion2 CTCAG ATTCAACCAA A ss2

TCATAAAGATATTGG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion2 CTCAG ATTRRWRAT A ss2

GATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion2 CTCAG TTATAATTGG A ss2

DGGRTTTGGWAATTG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion2 CTCAG AGWAGWAT A ss2

WRTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion2 CTCAG ATTTTTWSWC A ss2

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion2 CTCAG TATTTGTWTG A ss2

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion3 CTCAG ATTCAACCAA A ss3

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion3 CTCAG ATTRRWRATG A ss3

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 F3 AncientLepF3-ion3 CTCAG TTATAATTGG A ss3

DGGRTTTGGWAATTG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion3 CTCAG AGWAGWATW A ss3

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion3 CTCAG ATTTTTWSWC A ss3

TWCATWTDGCWGG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion3 CTCAG TATTTGTWTG A ss3

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion4 CTCAG ATTCAACCAA A ss4

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion4 CTCAG ATTRRWRATG A ss4

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion4 CTCAG TTATAATTGG A ss4

DGGRTTTGGWAATTG

PCR2

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion4 CTCAG AGWAGWATW A ss4

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion4 CTCAG ATTTTTWSWC A ss4

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion4 CTCAG TATTTGTWTG A ss4

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion5 CTCAG ATTCAACCAA A ss5

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 F2 AncientLepF2-ion5 CTCAG ATTRRWRAT A ss5

GATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion5 CTCAG TTATAATTGG A ss5

DGGRTTTGGWAATTG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion5 CTCAG AGWAGWAT A ss5

WRTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion5 CTCAG ATTTTTWSW A ss5

CTWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion5 CTCAG TATTTGTWTG A ss5

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion6 CTCAG ATTCAACCAA A ss6

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion6 CTCAG ATTRRWRATG A ss6

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion6 CTCAG TTATAATTGG A ss6

DGGRTTTGGWAATTG

PCR2

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion6 CTCAG AGWAGWATW A ss6

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion6 CTCAG ATTTTTWSWC A ss6

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion6 CTCAG TATTTGTWTG A ss6

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 Fl LepFl-ion7 CTCAG ATTCAACCAA A ss7

TCATAAAGATATTGG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion7 CTCAG ATTRRWRATG A ss7

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion7 CTCAG TTATAATTGG A ss7

DGGRTTTGGWAATTG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion7 CTCAG AGWAGWATW A ss7

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion7 CTCAG ATTTTTWSWC A ss7

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion7 CTCAG TATTTGTWTG A ss7

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion8 CTCAG ATTCAACCAA A ss8

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion8 CTCAG ATTRRWRATG A ss8

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 F3 AncientLepF3-ion8 CTCAG TTATAATTGG A ss8

DGGRTTTGGWAATTG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion8 CTCAG AGWAGWATW A ss8

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion8 CTCAG ATTTTTWSWC A ss8

TWCATWTDGCWGG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion8 CTCAG TATTTGTWTG A ss8

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ion9 CTCAG ATTCAACCAA A ss9

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F2 AncientLepF2-ion9 CTCAG ATTRRWRAT A ss9

GATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3-ion9 CTCAG TTATAATTGG A ss9

DGGRTTTGGWAATTG

PCR2

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion9 CTCAG AGWAGWAT A ss9

WRTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5-ion9 CTCAG ATTTTTWSW A ss9

CTWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion9 CTCAG TATTTGTWTG A ss9

AKCWRTWKKWATTAC

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

Fl LepFl-ionlO CTCAG ATTCAACCAA A sslO

TCATAAAGATATTGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

PCR2 F2 AncientLepF2-ion 10 CTCAG ATTRRWRATG A sslO

ATCAARTWTATAAT

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F3 AncientLepF3 -ion 10 CTCAG TTATAATTGG A sslO

DGGRTTTGGWAATTG CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F4 AncientLepF4-ion 10 CTCAG AGWAGWATW A sslO

RTWRAWAVWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F5 AncientLepF5 -ion 10 CTCAG ATTTTTWSWC A sslO

TWCATWTDGCWGG

CCATCTCATCCCTGCGTGTCTCCGA

IonXpre

F6 AncientLepF6-ion 10 CTCAG TATTTGTWTG A sslO

AKCWRTWKKWATTAC

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

Rl AncientLepRl -ion 1 -trP 1 WGGTATWACTATRAAR trPl ssl

AAAATTAT

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

R2 AncientLepR2-ion2-trP 1 TC ARAAWC TWATRTTR trPl ss2

TTTADWCG

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

R3 AncientLepR3 -ion3 -trP 1 ARDGGDGGRTAWACWG trPl ss3

TTCAWCC

PCR2

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

R4 AncientLepR4-ion4-trP 1 GTWGWAATRAARTTDA trPl ss4

TWGCWCC

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

R5 AncientLepR5 -ion5 -trP 1 GTTARWARTATDGTRAT trPl ss5

DGCWCC

CCTCTCTATGGGCAGTCGGTGAT

IonXpre

R6 LepRl-ion6-trPl TAAACTTCTGGATGTCC trPl ss6

AAAAAATCA

The "Code" column refers to primer labels in Fig. 2. The COI binding region within each primer sequence is shown in black, while the 10bp tail (PCR1) or MID tag (PCR2) is shown in blue. The "key sequence" (required for Ion Torrent sequencing) is shown in green and the sequencing adapters are shown in red. The 10bp tails on the PCR1 primers are technically IonXpress MID tags, but they serve only to block short amplicons from acting as primers during PCR1. They were chosen over random decamer tails to maximize primer-template matching in PCR2. The same forward and reverse PCR1 primers are used for all ten samples in the first round of PCR. In the second round of PCR, samples are discriminated by using ten different sets of MID-tagged forward PCR2 primers (the same set of PCR2 reverse primers is used for all ten samples).

Optimization of NGS Protocols

Optimization studies tested the impact of varied primer combinations, number of PCR cycles, differential concentrations of primers and nesting of PCRs. Efforts to multiplex all six forward and reverse primers in a single reaction were unsuccessful because the small regions of overlap were preferentially amplified over the six target fragments. Splitting the PCR into two reactions, each targeting non-adjacent fragments (e.g. PCR1 = F1+R1, F3+R3, F5+R5; PCR2 = F2+R2, F4+R4, F6+R6), solved this issue, but revealed another problem: the dominance of certain amplicons. This problem was overcome by mixing the forward primers with the full complement of reverse primers (e.g. PCR1 = F1+F3+F5 + six reverse primers; PCR2 = F2+F4+F6 + 5 reverse primers). This allowed each forward primer to potentially pair with several downstream reverse primers, creating redundancy that improved sequence recovery while reducing the dominance of any particular amplicon. For example, depending upon the quality of the template DNA, the barcode segment amplified by primers F4+R4 could be amplified by any of the twelve combinations of F1, F2, F3 or F4 paired with R4, R5 or R6. This redundancy aided the recovery of full-length barcodes from specimens with varied degrees of DNA degradation or with particular primer mismatches (as evidenced by the lack of a certain product in the final sequence array). When DNA quality is poor, primer binding becomes increasingly important to "kick start" amplification [26]. Perfect primer binding is impossible when diverse taxa are analyzed, but the prospects for recovery of desired amplicons can be improved by raising the number of PCR cycles and by increasing the primer degeneracy. Both tactics were employed in the present NGS protocol.
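The count of twelve combinations in the example above can be reproduced by enumeration. The sketch below assumes that segment k of the barcode is spanned by any forward primer located at or upstream of position k paired with any reverse primer located at or downstream of position k; the function name is hypothetical, and the sketch ignores whether the template is long enough for a given pair to amplify.

```python
from itertools import product


def covering_pairs(segment, n_primers=6):
    """List the (forward, reverse) primer pairs that span a given barcode segment."""
    forwards = [f"F{i}" for i in range(1, segment + 1)]          # at or upstream
    reverses = [f"R{j}" for j in range(segment, n_primers + 1)]  # at or downstream
    return list(product(forwards, reverses))


pairs = covering_pairs(4)
print(len(pairs))  # 4 forward primers x 3 reverse primers
print(pairs)
```

Under the same assumption the two terminal segments have the least redundancy (six possible pairs each), which is consistent with the emphasis on redundancy for degraded templates.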

Two rounds of PCR were employed, with 60 cycles in the first and 40 cycles in the second. All forward and reverse primers included degeneracy at the sites most important for primer binding (3' terminus for F, 5' terminus for R). Considering this degeneracy, the 12 forward and reverse primers were actually a cocktail of 2010 primers. Other factors were found to have important impacts on final outcomes. For example, initial tests revealed that primers with the 33bp-40bp adapter/MID tails required for NGS were less effective in generating product than the same primers without tails, a difference that was particularly strong for LQ extracts. This difference was probably due to interference with primer binding caused by the formation of secondary structures in the primers with tails. Although primers without tails produced the highest amplification success, their use allowed short, non-target amplicons to act as primers generating chimeric amplicons which combined sequence information from primers and the specimen. To overcome this problem, 10bp tails lacking complementarity to any region in the target genomes were added to the 5' terminus of all primers. Their presence inhibited polymerase elongation when short amplicons or primer dimers attempted to act as primers, preventing the formation of chimeric amplicons while avoiding the secondary structure issues inherent with longer tails. Although the first round of PCR was effective in generating amplicons, a second round of PCR was used to introduce the adapter-tailed primers for sequence analysis. It likely had the additional benefit of reducing amplification bias because it involved six separate reactions, one for each forward primer, dampening amplification bias by limiting primer competition.
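How a single degenerate primer expands into a cocktail of distinct oligonucleotides can be sketched by multiplying the number of bases allowed at each position under the standard IUPAC nucleotide codes. The function name is hypothetical, and the example uses the R3 COI-binding region as it appears in Table 4. The sketch does not attempt to reproduce the total of 2010 quoted above, since that figure depends on the exact sequences of all twelve primers.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Return how many distinct sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_COUNTS[base]
    return total


r3_binding = "ARDGGDGGRTAWACWGTTCAWCC"  # R3 COI-binding region from Table 4
print(degeneracy(r3_binding))
```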

Final NGS Protocol

These experimental studies led to the development of a two-stage, nested, multiplex PCR protocol which produced sequence records spanning the barcode region. The first round of PCR included two reactions for each specimen (PCR 1.1 and PCR 1.2 in Fig. 2a), each consuming 2 μL of genomic DNA as template. Each reaction included three forward primers (F1+F3+F5 or F2+F4+F6) with six and five reverse primers respectively, allowing each forward primer to generate from 1-6 amplicons, depending on the quality of DNA and its binding position in relation to the reverse primers. Detailed reaction components (final volume = 12.5 μL) are provided in Table 5. Thermocycling consisted of 94°C for 2 minutes, 60 cycles of {94°C for 40 seconds, 48°C for 40 seconds, 72°C for 30 seconds}, and a final extension of 72°C for 5 minutes.

Table 5. Components of PCR reactions in the NGS protocol.

from Fluka Analytical; Hyclone ultra-pure water from Thermo Fisher Scientific; Buffer, MgCl2, and Taq polymerase from KAPA Biosystems; primers from Integrated DNA Technologies.

In Figure 2 the primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b) are shown. The initial round of PCR includes two separate reactions (a - above broken line) using 10bp tailed primers and genomic DNA as template (shown in parentheses below reaction names). The second round of PCR includes six separate reactions (a - below broken line) using adapter-tailed primers and the products from the first PCR reactions as template (shown in parentheses below reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with MID tags unique to that specimen. To assign each amplicon to a particular reaction (i.e. 2.1, 2.2, 2.3, etc.), each reverse PCR2 primer is tailed with a MID tag unique for each reaction in the second round of PCR.

The second round of PCR used product from the first PCR reactions as template and included six reactions per specimen (PCR 2.1-2.6 in Fig. 2a), each coupling a single forward primer with one to three reverse primers and using 2 μL of the appropriate primary PCR product as template. It boosted amplicon yields while also adding the required sequencing adapters. Each secondary PCR generated 1-3 amplicons which collectively spanned the COI barcode region (Fig. 2b). The first four PCRs (2.1-2.4 in Fig. 2a) contained forward primers F1-F4, each combined with the three immediately downstream reverse primers (e.g. F1+R1+R2+R3). The fifth PCR (2.5 in Fig. 2a) combined F5 with R5 and R6, while the sixth PCR (2.6 in Fig. 2a) combined F6 with R6. All of these reactions employed primers with adapter tails and MID tags to enable NGS to discriminate fragments and/or individuals in post processing. Detailed reaction components (final volume = 12.5 μL) are provided in Table 5. Thermocycling consisted of 94°C for 2 minutes, 40 cycles of {94°C for 40 seconds, 48°C for 40 seconds, 72°C for 30 seconds}, and a final extension of 72°C for 5 minutes.
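The layout of the six secondary reactions can be generated programmatically, which also confirms the figure of up to 15 amplicons. This is a minimal sketch: the function name is hypothetical, and the assignment of PCR 1.1 product to the odd-numbered forward primers and PCR 1.2 product to the even-numbered ones is an assumption based on the order in which the primer groups are listed for the first round.

```python
def pcr2_reactions(n_primers=6, max_reverse=3):
    """Build the PCR2 layout: one forward primer plus up to three downstream reverse primers."""
    reactions = {}
    for i in range(1, n_primers + 1):
        reverses = [f"R{j}" for j in range(i, min(i + max_reverse, n_primers + 1))]
        # Assumed mapping: F1/F3/F5 use PCR 1.1 product, F2/F4/F6 use PCR 1.2 product.
        template = "PCR 1.1" if i % 2 == 1 else "PCR 1.2"
        reactions[f"PCR 2.{i}"] = (f"F{i}", reverses, template)
    return reactions


reactions = pcr2_reactions()
for name, (forward, reverses, template) in reactions.items():
    print(name, forward, "+".join(reverses), "template:", template)

total_amplicons = sum(len(reverses) for _, reverses, _ in reactions.values())
print(total_amplicons)
```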

The secondary PCR products from each specimen (six reactions) were pooled and a double size selection protocol (PCRClean DX kit - Aline Biosciences) was employed to remove genomic DNA, primer dimers and residual primers. The first cleanup step was designed to remove any high molecular weight genomic DNA (>800bp) that might reflect recent contamination (e.g. human DNA from researchers working with the specimens). Briefly, the PCR product and magnetic beads were incubated in a 2:1 ratio (volume PCR product: volume beads) for 8 minutes at room temperature followed by 2 minutes on a magnet. The pellet of beads was discarded, while the supernatant was retained for the second cleanup step which was designed to bind molecular weights ranging from 250bp-700bp (i.e. the PCR products) onto beads, while lower molecular weight DNA (primer dimers, residual primers) remained in solution. This step was carried out by mixing enough beads and sterile water to generate a 5:4 ratio (PCR product: beads) and incubating for 8 minutes followed by two minutes on a magnet. The supernatant was discarded and the pellet of beads was washed three times with 80% ethanol before the PCR products were eluted from the beads with 36 μL of sterile water. Following cleanup, the concentration of each purified PCR product was measured on a Qubit 2.0 fluorometer using the Qubit dsDNA HS Assay Kit (Life Technologies) and all 10 samples were normalized to 1 ng/μL and mixed in equal proportions. From this mixture, the final sequencing template library was created by making a 1/300 dilution. An Ion PGM Template OT2 400 kit (Life Technologies) was used for template preparation and sequencing was carried out on an Ion Torrent PGM following the manufacturer's instructions. Sequencing was performed on a 316 chip using an Ion PGM Sequencing 400 Kit (Life Technologies).
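The normalization and dilution arithmetic can be sketched with the usual C1*V1 = C2*V2 relationship. The function names, the 10 μL working volume and the example stock concentration are hypothetical, and the 1/300 dilution is read here as one part pooled library in 300 parts total, which is an assumption.

```python
def normalisation_volumes(stock_ng_per_ul, target_ng_per_ul=1.0, final_volume_ul=10.0):
    """Return (sample_ul, water_ul) needed to reach the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    sample_ul = final_volume_ul * target_ng_per_ul / stock_ng_per_ul
    return sample_ul, final_volume_ul - sample_ul


def library_concentration(pool_ng_per_ul=1.0, dilution_factor=300):
    """Concentration of the sequencing template library after dilution."""
    return pool_ng_per_ul / dilution_factor


# Example: a purified product measured at 4 ng/uL (hypothetical value).
print(normalisation_volumes(4.0))
print(library_concentration())
```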

Data Analysis

Raw data from each Ion Torrent PGM run were uploaded to the Galaxy platform for analysis (https://usegalaxy.org/) [27]. Several filters were applied to remove low quality, short, and non-target reads before an alignment was constructed to assemble the full barcode contig. Representative examples of the sequence reads recovered from HQ and LQ extracts are shown in Figure 6. The resultant FASTA file was then exported to permit comparisons with Sanger-generated sequences in BOLD. The authenticity of each NGS-generated sequence was subsequently validated by querying the sequence against the BOLD Identification Engine (www.boldsystems.org) to check for contamination or non-target amplification. Further validation was performed via Neighbor-Joining (NJ) analysis that included the NGS-generated sequences as well as sequences from recently collected specimens of the same species or close relatives. The compiled reads from each run were deposited in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under study accession SRP055961 (see Table 3 for individual sample accession numbers), while the barcode contig for each specimen was deposited in the BOLD dataset (dx.doi.org/10.5883/DS-NGSTYPES) and in GenBank (see Table 3 for accession numbers).
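The kind of read filtering described above (removal of low quality, short, and non-target reads) can be illustrated with a small stand-alone Python sketch. It is not the Galaxy workflow that was used; the thresholds (minimum length 100, minimum mean Phred score 20), the Phred+33 quality encoding and the prefix-based target check are all assumptions chosen for illustration.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a quality string (assumption: Phred+33 encoding)."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def filter_reads(reads, min_length=100, min_mean_q=20, target_prefixes=()):
    """Keep reads that pass length, quality and (optional) target checks.

    reads: iterable of (name, sequence, quality_string) tuples.
    target_prefixes: hypothetical primer/tag sequences; a read is treated
    as on-target only if it starts with one of them.
    """
    prefixes = tuple(target_prefixes)
    kept = []
    for name, seq, qual in reads:
        if len(seq) < min_length or not qual:
            continue  # too short, or no quality information
        if mean_phred(qual) < min_mean_q:
            continue  # low average base quality
        if prefixes and not seq.upper().startswith(prefixes):
            continue  # does not begin with an expected target sequence
        kept.append((name, seq, qual))
    return kept


if __name__ == "__main__":
    demo = [
        ("good", "ACGT" * 30, "I" * 120),
        ("short", "ACGT", "IIII"),
        ("low_quality", "ACGT" * 30, "#" * 120),
        ("off_target", "TTTT" * 30, "I" * 120),
    ]
    print([name for name, _, _ in filter_reads(demo, target_prefixes=("ACGT",))])
```

In practice the non-target check would more plausibly rely on alignment to a reference than on a simple prefix match; the prefix test is used here only to keep the sketch self-contained.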

Results

Because the NGS protocol allowed the simultaneous processing of ten specimens, just three runs were required to analyze the 30 specimens. The average number of sequence reads per specimen showed five-fold variation (182K, 59K, 36K), while the average depth of coverage per base showed six-fold variation (36K, 12K, 6K) across the three DNA categories (Figs. 3a and 3b). The number of reads per specimen averaged 90K, resulting in an average coverage depth of 18K per base. Sequences were recovered from every specimen with reads averaging 610bp, 578bp, and 458bp for the HQ, MQ and LQ extracts respectively (Fig. 3c). Barcode compliant sequences (>487bp) were recovered from 8 HQ, 8 MQ, and 4 LQ specimens (Table 3), while sequence records >400bp were recovered from 25 of 30 specimens (83%). In fact, more than 200bp of sequence data was recovered from all 30 specimens (Table 3). The recovery of sequences from ten type specimens in each of the three DNA categories is shown in Figure 3.
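The summary figures quoted above can be rechecked with a few lines of Python. The sketch uses only the rounded category means reported in the text and assumes ten specimens per category, so the overall means it produces are approximate; the variable and function names are illustrative.

```python
# Category means as reported in the text, in thousands (K).
reads_k = {"HQ": 182, "MQ": 59, "LQ": 36}
coverage_k = {"HQ": 36, "MQ": 12, "LQ": 6}
barcode_compliant = {"HQ": 8, "MQ": 8, "LQ": 4}
total_specimens = 30


def fold_variation(values):
    """Ratio of the largest to the smallest category mean."""
    values = list(values)
    return max(values) / min(values)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"Read fold variation: {fold_variation(reads_k.values()):.1f}")
    print(f"Coverage fold variation: {fold_variation(coverage_k.values()):.1f}")
    print(f"Mean reads (K): {mean(reads_k.values()):.1f}")
    print(f"Mean coverage (K): {mean(coverage_k.values()):.1f}")
    print(f"Records >400bp: {100 * 25 / total_specimens:.0f}%")
    print(f"Barcode compliant: {sum(barcode_compliant.values())} of {total_specimens}")
```

Averaging the three rounded category means gives roughly 92K reads per specimen, which is consistent with the reported figure of about 90K given rounding, and exactly 18K for coverage depth.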

The NGS-generated sequences from the HQ and MQ specimens perfectly matched, in their zones of overlap, the shorter sequences generated by Sanger analysis (Fig. 4). Although the protocol involves 100 cycles of PCR amplification, there was no evidence of artifacts when the NGS sequences were compared to their Sanger counterparts (Fig. 4). Further confirmation of their validity was provided by the fact that they grouped with sequences from closely allied taxa (Fig. 5). It was more difficult to verify the sequences obtained via NGS from the LQ specimens because they had no Sanger counterparts for comparison. In six cases, the NGS sequences clearly derived from a single species, but reads from the other four specimens appeared to originate from two or more species. Obvious contaminants (e.g. fungi, bacteria) were easily removed during post processing, but some sequences in these four records appeared to derive from closely allied species or pseudogenes. In principle, the contaminants and authentic sequences could be discriminated if reference sequences were available from modern specimens of these species, but they were not. Because the four specimens showing these admixtures generated the fewest sequence reads and the lowest depth of coverage, it is likely that their DNA was heavily degraded (Table 3). Once contemporary sequences for these species become available, it should be possible to recognize the authentic sequences.
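A comparison of an NGS-derived sequence with a shorter Sanger sequence over their zone of overlap can be sketched as below. This is a simplified illustration, not the procedure used to produce Fig. 4: it assumes an ungapped overlap whose start position within the NGS sequence is already known (for example from a prior alignment), and it skips positions where either sequence has an ambiguous base N. The function name and parameters are hypothetical.

```python
def overlap_mismatches(ngs_seq, sanger_seq, sanger_start):
    """List mismatches between two sequences over their overlap.

    sanger_start: 0-based position in ngs_seq at which sanger_seq begins
    (assumption: known in advance, no insertions or deletions).
    Returns a list of (ngs_position, ngs_base, sanger_base) tuples.
    """
    if sanger_start < 0:
        raise ValueError("sanger_start must be non-negative")
    ngs_seq = ngs_seq.upper()
    sanger_seq = sanger_seq.upper()
    mismatches = []
    for i, sanger_base in enumerate(sanger_seq):
        j = sanger_start + i
        if j >= len(ngs_seq):
            break  # Sanger sequence extends past the NGS sequence
        ngs_base = ngs_seq[j]
        if "N" in (ngs_base, sanger_base):
            continue  # ambiguous call, not counted either way
        if ngs_base != sanger_base:
            mismatches.append((j, ngs_base, sanger_base))
    return mismatches


if __name__ == "__main__":
    print(overlap_mismatches("ACGTACGTAC", "GTAC", 2))  # identical overlap
    print(overlap_mismatches("ACGTACGTAC", "GTTC", 2))  # one mismatch
```

An empty list corresponds to a perfect match in the zone of overlap, as reported for the HQ and MQ specimens.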

To summarize, the current method works on a plurality of samples simultaneously with high success rates for good quality degraded DNA with a slight drop for lower quality degraded DNA. The method still works for samples that may contain almost no intact DNA. Lowest quality degraded DNA was still amplified and characterized using the method of the invention and shown to recover >500bp sequences from samples that failed using traditional Sanger approaches. The method may be used universally on any type of degraded DNA sample for many applications including environmental, forensics and food industry (cooked foods contain degraded DNA), generally in any application where DNA is degraded due to age, environment, processing and so forth. The method can be customized for invertebrates, mammals, fish, birds and so forth. In one aspect, the method effectively amplifies and characterizes entire barcode regions for use in biological classification. This will be helpful for classification of old specimens such as for example those found in museums [2-5,28], as demonstrated in two recent studies [29,30].

The invention can be provided as a system in a kit containing the desired primers, buffers, enzymes, instructions for use and so forth. A kit may be customized for a particular specimen, a specimen that would comprise degraded DNA.

It is to be noted that the term "a" or "an" entity refers to one or more of that entity. For example, "a characteristic" refers to one or more characteristics or at least one characteristic. As such, the terms "a" (or "an"), "one or more" and "at least one" are used interchangeably herein. It is also to be noted that the terms "comprising", "including", and "having" have been used interchangeably.

Ranges: throughout this disclosure, various aspects described herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope described herein. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.

It will be understood that any aspects described as "comprising" certain components may also "consist of" or "consist essentially of," wherein "consisting of" has a closed-ended or restrictive meaning and "consisting essentially of" means including the components specified but excluding other components except for materials present as impurities, unavoidable materials present as a result of processes used to provide the components, and components added for a purpose other than achieving the technical effect described herein. For example, a composition defined using the phrase "consisting essentially of" encompasses any known pharmaceutically acceptable additive, excipient, diluent, carrier, and the like. Typically, a composition consisting essentially of a set of components will comprise less than 5% by weight, typically less than 3% by weight, more typically less than 1% by weight of non-specified components.

It will be understood that any component defined herein as being included may be explicitly excluded from the claimed invention by way of proviso or negative limitation.

Many patent applications, patents, and publications are referred to herein to assist in understanding the aspects described. Each of these references is incorporated herein by reference in its entirety.

The foregoing examples and detailed description are offered by way of illustration and not by way of limitation. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the scope of the appended claims.

References

1. Hebert PDN, Cywinska A, Ball S, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci. 2003; 270: 313-321.

2. Hebert PDN, deWaard JR, Zakharov EV, Prosser SWJ, Sones JE, McKeown JTA, et al. A DNA 'Barcode Blitz': Rapid digitization and sequencing of a natural history collection. PLoS ONE. 2013; 8: e68535. doi: 10.1371/journal.pone.0068535.

3. Mutanen M, Kekkonen M, Prosser SW, Hebert PDN, Kaila L. One species in eight: DNA barcodes from type specimens resolve a taxonomic quagmire. Mol Ecol Resour. 2014; doi: 10.1111/1755-0998.12361.

4. Hausmann A, Hebert PDN, Mitchell A, Rougerie A, Sommerer M, Edwards T. Revision of the Australian Oenochroma vinaria Guenee, 1858 species-complex (Lepidoptera: Geometridae, Oenochrominae): DNA barcoding reveals cryptic diversity and assesses status of type specimen without dissection. Zootaxa. 2009a; 2239: 1-21.

5. Kirchman JJ, Witt CC, McGuire JA, Graves GR. DNA from a 100-year-old holotype confirms the validity of a potentially extinct hummingbird species. Biol Lett. 2010; 6: 112-115.

6. Gilbert MTP, Moore W, Melchior L, Worobey M. DNA extraction from dry museum beetles without conferring external morphological damage. PLoS ONE. 2007; 2: e272. doi: 10.1371/journal.pone.0000272.

7. Thomsen PF, Elias S, Gilbert MTP, Haile J, Munch K, Kuzmina S, et al. Non-destructive sampling of ancient insect DNA. PLoS ONE. 2009; 4: e5048. doi: 10.1371/journal.pone.0005048.

8. Dean MD, Ballard JWO. Factors affecting mitochondrial DNA quality from museum preserved Drosophila simulans. Entomol Exp Appl. 2001; 98: 279-283.

9. Hernandez-Triana LM, Prosser SW, Rodriguez-Perez MA, Chaverri LG, Hebert PDN, Gregory TR. Recovery of DNA barcodes from blackfly museum specimens (Diptera: Simuliidae) using primer sets that target a variety of sequence lengths. Mol Ecol Resour. 2013; 14: 508-518. doi: 10.1111/1755-0998.12208.

10. Van Houdt JKJ, Breman FC, Virgilio M, De Meyer M. Recovering full DNA barcodes from natural history collections of Tephritid fruitflies (Tephritidae, Diptera) using mini barcodes. Mol Ecol Resour. 2010; 10: 459-465.

11. Bluemel JK, King RA, Virant-Doberlet M, Symondson WOC. Primers for identification of type and other archived specimens of Aphrodes leafhoppers (Hemiptera, Cicadellidae). Mol Ecol Resour. 2011; 11: 770-774.

12. Hausmann A, Sommerer M, Rougerie R, Hebert P. Hypobapta tachyhalotaria n. sp. from Tasmania - an example of a new species revealed by DNA barcoding (Lepidoptera, Geometridae). Spixiana. 2009b; 32: 161-166.

13. Lees DC, Lack HW, Rougerie R, Hernandez-Lopez A, Raus T, Avtzis ND, et al. Tracking origins of invasive herbivores using herbaria and archival DNA: the case of the horse-chestnut leafminer. Front Ecol Environ. 2011; 9: 322-328.

14. Rougerie R, Naumann S, Nassig WA. Morphology and molecules reveal unexpected cryptic diversity in the enigmatic genus Sinobirma Bryk, 1944 (Lepidoptera: Saturniidae). PLoS ONE. 2012; 7: e43920. doi: 10.1371/journal.pone.0043920.

15. Lees DC, Rougerie R, Zeller-Lukashort C, Kristensen NP. DNA mini-barcodes in taxonomic assignment: a morphologically unique new homoneurous moth clade from the Indian Himalayas described in Micropterix (Lepidoptera, Micropterigidae). Zool Scr. 2010; 39: 642-661.

16. Strutzenberger P, Brehm G, Fiedler K. DNA barcode sequencing from old type specimens as a tool in taxonomy: A case study in the diverse genus Eois (Lepidoptera: Geometridae). PLoS ONE. 2012; 7: e49710.

17. Zimmermann J, Hajibabaei M, Blackburn DC, Hanken J, Cantin E, Posfai J, et al. DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool. 2008; 5: 18.

18. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, Hale ML et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc Biol Sci. 2012; 279: 4724-4733.

19. Rowe KC, Singhal S, Macmanes MD, Ayroles JF, Morelli TL, Rubidge EM, et al. Museum genomics: low-cost and high-accuracy genetic data from historical specimens. Mol Ecol Resour. 2011; 11: 1082-1092. doi: 10.1111/j.1755-0998.2011.03052.x.

20. Shokralla S, Gibson JF, Nikbakht H, Janzen DH, Hallwachs W, Hajibabaei M. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Mol Ecol Resour. 2014; 14: 892-901.

21. Shokralla S, Porter T, Gibson J, Dobosz R, Janzen DH, et al. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Sci Rep. 2015; 5: 9687.

22. Holloway JD, Miller SE, Pollock DM, Helgen L, Darrow K. GONGED (Geometridae of New Guinea Electronic Database): a progress report on development of an online facility of images. Spixiana. 2009; 32: 122-123.

23. Miller SE. DNA barcode enabled ecological research on Geometridae in Papua New Guinea. Spixiana. 2014; 37: 245-246.

24. Knolke S, Erlacher S, Hausmann A, Miller MA, Segerer AH. A procedure for combined genitalia dissection and DNA extraction in Lepidoptera. Insect Syst Evol. 2005; 35: 401-409.

25. Ivanova NV, deWaard JR, Hebert PDN. An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes. 2006; 6: 998-1002.

26. deWaard JR, Ivanova NV, Hajibabaei M, Hebert PDN. Assembling DNA barcodes. Methods Mol Biol. 2008; 410: 275-293.

27. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M et al. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010; Chapter 19: Unit 19.10: 11-21.

28. Miller SE, Hausmann A, Hallwachs W, Janzen DH. Advancing taxonomy and bioinventories with DNA barcodes. Phil Trans R Soc B. 2016; 371: 20150339. doi: 10.1098/rstb.2015.0339.

29. Spiedel W, Hausmann A, Muller GC, Kravchenko V, Mooser J, Witt TJ, et al. Taxonomy 2.0: Sequencing of old type specimens supports the description of two new species of the Lasiocampa decolorata group from Morocco (Lepidoptera: Lasiocampidae). Zootaxa. 2015; 3999: 401-412.

30. Hausmann A, Miller SE, Holloway JD, deWaard JR, Pollock D, Prosser SWJ, Hebert PDN. Calibrating the taxonomy of a megadiverse insect family: 3000 DNA barcodes from geometrid type. Genome. 2016; 0, 0. doi: 10.1139/gen-2015-0197.