Title:
METHODS FOR THE GENERATION OF MULTIPLE ORDERED NEXT-GENERATION SEQUENCING READS ALONG LARGE SINGLE DNA MOLECULES
Document Type and Number:
WIPO Patent Application WO/2017/023952
Kind Code:
A1
Abstract:
Methods are provided for producing long-range DNA scaffold sequence templates by generating multiple ordered next-generation sequencing short reads on large single DNA molecules, for genome sequence assembly and analysis. The methods described herein pursue cost-effective and massively parallel ways to produce long-range sequence information for large and complex genome sequence assembly and analysis, such as de novo genome sequencing, haplotype-phasing, long-range genome structure variation detection, high-resolution metagenomics, and rapid genotyping for disease diagnosis and important trait-genetic association etc.

Inventors:
ZHOU SHIGUO (US)
Application Number:
PCT/US2016/045209
Publication Date:
February 09, 2017
Filing Date:
August 02, 2016
Assignee:
ZHOU SHIGUO (US)
International Classes:
C12Q1/68; C12P19/34
Foreign References:
US20130244882A12013-09-19
US20140200158A12014-07-17
US20140194324A12014-07-10
Other References:
SCHWARTZ ET AL.: "Capturing native long-range contiguity by in situ library construction and optical sequencing.", PROC NATL ACAD SCI U S A., vol. 109, no. 46, 2012, pages 18749 - 54, XP055139553
METWALLI ET AL.: "Surface characterizations of mono-, di-, and tri-aminosilane treated glass substrates.", J COLLOID INTERFACE SCI., vol. 298, no. 2, 2006, pages 825 - 31, XP024909572
"Illumina", SEQUENCING INTRODUCTION, 2012, pages 1 - 12, Retrieved from the Internet
CARUCCIO ET AL.: "NexteraTM Technology for NGS DNA Library Preparation: Simultaneous Fragmentation and Tagging by In Vitro Transposition.", NEXTERATM TECHNOLOGY., vol. 16, no. 3, 2009, pages 4 - 6, XP055303813
TSAI ET AL.: "Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.", GENOME BIOL., vol. 11, no. 4, 2010, pages 1 - 9, XP021085595
Attorney, Agent or Firm:
MARTIN, Todd (AU)
Claims:
WHAT IS CLAIMED IS:

1. A method for producing a template for determining nucleotide sequence information of a nucleic acid molecule, said method comprising:

(i) producing a stretched and bound target double stranded nucleic acid molecule on a positively charged surface with cross-linking moieties, and this target nucleic acid molecule may have many short single stranded DNA flaps ligated with adaptor oligonucleotides or may have been directly inserted and ligated with adaptor oligonucleotides through in vitro transposition;

(ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the bound and stretched target nucleic acid molecule includes an adaptor nucleotide sequence having a covalently cross-linkable 5'-end; and

(iii) covalently cross-linking the fragment of the bound and stretched target nucleic acid molecule including an adaptor nucleotide sequence to the positively charged surface, to thereby produce said template.

2. A method for producing scaffold sequence templates for a target nucleic acid molecule, said method comprising:

(i) producing a stretched and bound target double stranded nucleic acid molecule on a positively charged surface with cross-linking moieties, and this target nucleic acid molecule may have many short single stranded DNA flaps ligated with adaptor oligonucleotides or may have been directly inserted and ligated with adaptor oligonucleotides through in vitro transposition;

(ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the stretched and bound target nucleic acid molecule includes an adaptor oligonucleotide having a covalently cross-linkable 5'-end;

(iii) covalently cross-linking the 5'-ends of one adaptor oligonucleotides on the fragments of the stretched and bound target nucleic acid molecule with the cross-linking moieties on the positively charged surface, to thereby produce the PCR templates;

(iv) PCR amplifying and sequencing the 5'-end crosslinked fragments of the stretched and bound target nucleic acid molecule after two oligonucleotide primer lawn construction on the surface;

(v) mapping the sequence reads of the fragments back on to the stretched and bound target nucleic acid molecule based on sharing position coordinates on the surface or digital features on their images; and

(vi) constructing DNA scaffold sequence templates by representing the gaps between fragment sequence reads of target nucleic acid molecules with "Ns" based on the DNA size estimation of the gaps.

3. The method according to any one of the preceding claims, wherein the positively charged surface has been exposed to two modifying agents that one confers a positive charge, and the other confers crosslinking moiety or group to the silicon or glass surface.

4. The method according to any one of the preceding claims, wherein the modifying agents are silanising agents.

5. The method according to any one of the preceding claims, wherein the silanising agents are selected from the group consisting of an aminosilane, an epoxysilane, a glycidoxysilane and a mercaptosilane, or any combination thereof.

6. The method according to any one of the preceding claims, preferably, wherein the silanising agents are aminosilanes.

7. The method according to any one of the preceding claims, preferably, wherein one of the modifying agents is a charged aminosilane.

8. The method according to any one of the preceding claims, wherein the positively charged surface is selected from a silicon surface or a glass surface, or a combination thereof.

9. The method according to any one of the preceding claims, wherein a microfluidic device or a mesofluidic device comprises the positively charged surface with crosslinking moiety or group.

10. The method according to any one of the preceding claims, wherein the microfluidic device or the mesofluidic device is a flowcell.

11. The method according to any one of the preceding claims, wherein the flowcell may be a flowcell of a nucleic acid sequencing apparatus.

12. The method according to any one of the preceding claims, wherein the target nucleic acid molecule is selected from a DNA or a cDNA.

13. The method according to any one of the preceding claims, wherein the DNA is genomic DNA.

14. The method according to any one of the preceding claims, wherein the target nucleic acid molecule is double-stranded.

15. The method according to any one of the preceding claims, wherein the target nucleic acid molecule is a single nucleic acid molecule.

16. The method according to any one of the preceding claims, wherein the length of target nucleic acid molecule is at least about 10 kb, at least about 20 kb, at least about 30 kb, at least about 40 kb, at least about 50 kb, at least about 60 kb, at least about 70 kb, at least about 80 kb, at least about 90 kb, at least about 100 kb or preferably at least about 200 kb.

17. The method according to any one of the preceding claims, wherein fragmentation comprises exposing the bound and stretched target nucleic acid molecule to an enzyme.

18. The method according to any one of the preceding claims, wherein the enzyme is selected from a transposase and a restriction enzyme, or any combination thereof.

19. The method according to any one of the preceding claims, wherein the 5'-end of the adaptor nucleotide sequence includes a cross-linker.

20. The method according to any one of the preceding claims, wherein the cross-linker is cross-linkable by a heterobifunctional cross-linking agent or is cross-linkable by EDC.

21. The method according to any one of the preceding claims, wherein the heterobifunctional cross-linking agent is selected from the group consisting of s-MBS, s-SIAB, s-SMCC, s-GMBS and s-MPB, or any combination thereof.

22. The method according to any one of the preceding claims, wherein the adaptor nucleotide sequence is present at a 5'-end of the fragment.

23. The method according to any one of the preceding claims, which further includes amplifying the cross-linked fragment and nucleic acid sequencing of the amplified cross-link fragment.

24. A scaffold sequence template for nucleic acid sequence analysis and genome analysis produced according to any one of Claims 1 to 23.

25. A device comprising the template of Claim 24.

26. The device according to Claim 25, which is a microfluidic device or a mesofluidic device.

27. The device according to Claim 26, wherein the microfluidic device or the mesofluidic device may be a flowcell.

28. The device according to Claim 27, wherein the flowcell may be a flowcell configured for a high-throughput sequencing apparatus.

29. A system comprising a device according to any one of Claims 25 to 28.

30. A method of acquiring long-range sequence information, wherein the method includes producing a template according to any one of Claims 1 to 23.

31. The method according to Claim 30, wherein the scaffold sequence template comprises multiple ordered short sequence reads, and gaps represented by "Ns" based on the gap size.

32. The method of acquiring the scaffold sequence template according to Claim 31, which further includes correlating the nucleotide sequences of cross-linked fragments with their positions on the target nucleic acid molecule.

Description:
TITLE

"METHODS FOR THE GENERATION OF MULTIPLE ORDERED NEXT-GENERATION SEQUENCING READS ALONG LARGE SINGLE DNA MOLECULES"

FIELD

[0001] The present disclosure relates generally to the sequencing of nucleic acids. Particularly, this invention discloses a method and system for the generation of multiple ordered short polynucleotide sequence reads on a large and linear-stretched genomic DNA molecule.

[0002] Bibliographic details of various citations referred to by author in the present specification are listed at the end of the description.

CROSS-REFERENCE TO RELATED APPLICATION

[0003] This application claims the benefit of United States Provisional Application Number 62/200,624, filed 3 August 2015, the subject matter of which is incorporated herein by reference in its entirety.

BACKGROUND

[0004] Knowledge of DNA and RNA sequences has become indispensable for the understanding of basic biological systems, as well as in numerous applied fields such as disease diagnostics, drug development, biotechnology, forensics and biological systematics. DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA. Conventional DNA sequencing relies on the technique of dideoxy-nucleotide chain termination during polymerase chain reaction as described by Sanger et al. (1). Even though Sanger sequencing has been used successfully for sequencing of several different bacterial, archaeal, and eukaryotic genomes, this technique has limitations in both throughput and cost for population and diversity sequencing, as well as other applications. Much effort has been invested into the development of alternative methods for DNA sequencing, and these advances have resulted in rapid, massively parallel and extremely high-throughput next-generation DNA sequencing methods, which have greatly accelerated biological and medical research and discovery.

[0005] Current DNA sequencing methods are mainly polymerase-based methods, and these sequencing approaches have substantially improved the throughput over classical Sanger sequencing by using different detection methods of nucleotide addition by polymerase and massively parallel sequencing reactions at micron or nanometre scales (2, 3). These include 454 pyrosequencing by the detection of released pyrophosphate (PPi) during DNA synthesis (4), Solexa or Illumina sequencing by the detection of fluorescent dye labelled nucleotides with reversible terminators (5-7), Ion Torrent sequencing by the detection of hydrogen ions that are released during nucleotide addition by polymerase using semiconductor pH meters (8), and Pacific Biosciences single molecule real time (SMRT) sequencing by a nanostructure that creates an illuminated observation volume that is small enough to observe only a single nucleotide being incorporated by DNA polymerase (9).

[0006] Other non-polymerase based DNA sequencing methods include SOLiD sequencing (Sequencing by Oligonucleotide Ligation and Detection) (10), sequencing by hybridization (SBH) (11) and Nanopore sequencing (12-14).

[0007] However, most of these new DNA sequencing technologies produce sequence read lengths (mostly less than 400 base pairs (bp)) that are much shorter than conventional Sanger sequencing reads (~800 bp). These short sequencing reads have imposed substantial challenges for de novo sequence assembly and for the detection of genomic structural variations due to the abundance of repetitive sequence elements (15), especially for complex genomes like plant and animal genomes. Even though the Pacific Biosciences RS DNA sequencing system can produce sequence reads averaging ~3,000 bp, its low accuracy (~83%) and low throughput make it less appealing. Sequencing by hybridization is primarily useful in interrogating whether specific oligonucleotides occur in a genome sequence, but not for de novo sequencing. Nanopore sequencing methods (16) have also been shown to produce long sequence reads up to 42 kb, but their low accuracy (78-85%), low throughput and high input (~milligram DNA input) make them less appealing for de novo sequencing of complex genomes like plants and animals (14, 17).

[0008] In order to overcome assembly problems in de novo sequencing of large and complex genomes using cheap and short sequencing reads, clone libraries, clone-by-clone sequencing, physical mapping, and clone-end sequencing were traditionally used with Sanger sequencing. However, this clone-by-clone approach is labour intensive and expensive. With the development of rapid, massively parallel and high-throughput next-generation sequencing (NGS) technologies, fragment paired-end and mate-pair sequencing were enabled to facilitate genome assembly and structural variation detection. Schwartz et al. described a preliminary method to build contiguity based on sharing the uniqueness of different barcodes or in situ library construction (18); however, the effectiveness of this approach still requires more testing. Even with these new approaches, full assembly of genome sequences has proven rate limiting with hundreds of thousands of sequence contigs or scaffolds, especially for large and complex genomes (19). This, in part, explains why currently there are 2211 eukaryotic genome sequencing projects submitted to NCBI, while only 343 of these genomes are completed (~two-thirds of these completed genome sequences are small genomes of fungi, protists, and algae) or have chromosomal sequence scaffolds, and most of the eukaryotic genomes are in draft assemblies or work-in-progress.

[0009] In order to facilitate genome assembly, a system termed 'Optical Mapping' can be very useful for ordering and orienting the sequence contigs or scaffolds, validating the assembled sequences, and characterizing the gaps between the sequence contigs or scaffolds (20). Optical mapping is a system to construct whole genome ordered restriction physical maps from the ensemble of single large genomic DNA molecules (~500 kb) that have been stretched and mounted on positively charged optical mapping glass surfaces, digested by chosen restriction enzymes, stained, imaged by automated epifluorescent microscope, and the restriction fragments sized by relative fluorescent intensity towards known sequence standard DNA, such as lambda virus DNA. A recent advance in mapping single large genomic DNA molecules is termed 'nanocoding' (21), which uses genomic DNA analytes and nicking restriction enzymes, where the nick sites created by the nicking restriction enzyme are then tagged by polymerase-mediated nick translation using fluorochrome labelled nucleotides. Single-molecule barcodes can be generated when the decorated DNA molecules are loaded into microfluidic or nanofluidic channel devices, and imaged by fluorescent microscope. A similar system was also described by Lam et al. (2012) (22). However, the resolution or the average restriction recognition sequence marker density of these systems is around 10 kb or larger, and in order to confidently align the sequence contigs or scaffolds assembled from next generation short sequence reads to the single DNA molecule barcodes generated either by optical mapping or nanocoding, the sequence contigs or scaffolds have to contain at least 10 restriction recognition sequence markers, which means that the minimum length of the sequence contigs or scaffolds has to be 100 kb or above. However, many genome sequence assemblies could have an average size of assembled sequence contigs or scaffolds of much less than 100 kb, and a substantial number of sequence contigs less than 10 kb, and this is especially true for repeat-rich genomes like plant and mammal genomes. Therefore, these approaches are essentially useful only for the much later or final stages of genome sequence assembly.
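The 100 kb threshold stated above follows from simple arithmetic on the marker density. The following is a minimal sketch of that calculation, assuming only the figures given in this paragraph (roughly one marker per 10 kb and at least 10 markers per alignable contig); the function names are illustrative and do not come from the disclosure.

```python
def min_alignable_contig_kb(marker_spacing_kb: float, markers_required: int) -> float:
    """Shortest contig (in kb) expected to carry the required number of markers,
    assuming markers occur on average once every marker_spacing_kb."""
    return marker_spacing_kb * markers_required


def expected_markers(contig_kb: float, marker_spacing_kb: float) -> float:
    """Average number of markers expected on a contig of the given length."""
    return contig_kb / marker_spacing_kb


# Figures quoted in the paragraph above: ~10 kb per marker, 10 markers needed.
threshold_kb = min_alignable_contig_kb(10, 10)
# A 10 kb contig carries about one marker on average, far too few to align.
markers_on_small_contig = expected_markers(10, 10)
```

Under these assumptions, a contig shorter than the threshold carries too few markers to be placed confidently, which is why such maps mainly help at the later stages of assembly.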

[0010] Shortcomings of current genome analysis and assembly methods may be overcome by obtaining multiple short sequence reads on large and linearly stretched DNA molecules. Optical sequencing (23) and flash sequencing (24) technologies have been described to generate very short sequence reads (< 5 bp) or single base composition in the neighbourhood of nicking enzyme recognition sites along long linear-stretched single genomic DNA molecules. Optical sequencing is a single molecule DNA sequencing technique that uses sequencing-by-synthesis and optical mapping technology. Similar to other single molecule sequencing approaches, this technique analyzes single linear-stretched DNA molecules treated with DNase, and generates sequence reads through the incorporation of fluorochrome-labeled nucleotides by DNA polymerases at the nicks on the double-stranded DNA opened by DNase. Flash sequencing is a method to quantify the sequence base composition in the neighbourhood of nicking sites based on the incorporation of one fluorochrome-labeled nucleotide and the other three unlabelled nucleotides by polymerase from nick sites. However, these technologies are heavily reliant upon single fluorochrome detection or the quantification of the fluorescent intensity in the neighbourhood of nick sites, which has proven to be extremely challenging (25).

[0011] Shendure and colleagues (26, 27) describe methods of producing multiple short sequence reads along larger double stranded DNA molecules using in vitro transposase transposition bubbles with adaptors containing barcodes such that contiguity is built based on sharing the uniqueness of different barcodes or in situ library construction. However, the method used by Shendure et al. has limited resolution in that it is difficult to obtain more than 50 kb of long-range sequence information.

SUMMARY

[0012] The present disclosure is directed, in part, to improved methods and systems for single molecule polynucleotide sequencing that can yield enhanced accuracy and ease of use by generating multiple ordered short sequencing reads locally at many locations along a target nucleic acid molecule, and in particular a linearly stretched, high molecular weight (HMW) genomic DNA molecule.

[0013] In one broad aspect, the invention provides a method for producing a template for determining nucleotide sequence information of a nucleic acid molecule, said method comprising: (i) producing a bound and stretched target nucleic acid molecule on a positively charged surface; (ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the bound and stretched target nucleic acid molecule includes an adaptor nucleotide sequence having a covalently cross-linkable 5'-end; and (iii) covalently cross-linking the fragment of the bound and stretched target nucleic acid molecule including an adaptor nucleotide sequence to the positively charged surface, to thereby produce the template.

[0014] In a first aspect, the invention provides a method for producing a template for determining nucleotide sequence information of a nucleic acid molecule, said method comprising:

(i) producing a stretched and bound target double stranded nucleic acid molecule on a positively charged surface with cross-linking moieties, and this target nucleic acid molecule may have many short single stranded DNA flaps ligated with adaptor oligonucleotides or may have been directly inserted and ligated with adaptor oligonucleotides through in vitro transposition;

(ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the bound and stretched target nucleic acid molecule includes an adaptor nucleotide sequence having a covalently cross-linkable 5'-end; and

(iii) covalently cross-linking the fragment of the bound and stretched target nucleic acid molecule including an adaptor nucleotide sequence to the positively charged surface, to thereby produce the template.

[0015] In a second aspect, the invention provides a method for producing a template for determining nucleotide sequence information of a nucleic acid molecule, said method comprising:

(i) producing a stretched and bound target double stranded nucleic acid molecule on a positively charged surface with cross-linking moieties, and this target nucleic acid molecule may have many short single stranded DNA flaps ligated with adaptor oligonucleotides or may have been directly inserted and ligated with adaptor oligonucleotides through in vitro transposition;

(ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the stretched and bound target nucleic acid molecule includes an adaptor oligonucleotide having a covalently cross-linkable 5'-end;

(iii) covalently cross-linking the 5'-ends of one adaptor oligonucleotides on the fragments of the stretched and bound target nucleic acid molecule with the cross-linking moieties on the positively charged surface, to thereby produce the PCR templates;

(iv) PCR amplifying and sequencing the 5'-end crosslinked fragments of the stretched and bound target nucleic acid molecule after two oligonucleotide primer lawn construction on the surface;

(v) mapping the sequence reads of the fragments back on to the stretched and bound target nucleic acid molecule based on sharing position coordinates on the surface or digital features on their images; and

(vi) constructing DNA scaffold sequence templates by representing the gaps between fragment sequence reads of target nucleic acid molecules with "Ns" based on the DNA size estimation of the gaps.
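Step (vi) can be illustrated with a short sketch. The code below is one possible way, not the disclosed implementation, to join ordered reads into a scaffold string with "N" runs sized from the estimated gaps. The `PlacedRead` type, the `build_scaffold` function and the idea of supplying each read's estimated start position in base pairs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PlacedRead:
    start_bp: int   # estimated start position along the molecule, in bp (assumed input)
    sequence: str   # short read sequence obtained from the cross-linked fragment


def build_scaffold(reads: Iterable[PlacedRead],
                   molecule_length_bp: Optional[int] = None) -> str:
    """Concatenate reads in positional order, filling estimated gaps with 'N'."""
    parts = []
    cursor = 0  # position just past the last base written so far
    for read in sorted(reads, key=lambda r: r.start_bp):
        gap = read.start_bp - cursor
        if gap > 0:
            parts.append("N" * gap)
        # If size estimates make two reads overlap (gap < 0), the read is
        # simply appended without a gap; resolving overlaps is out of scope here.
        parts.append(read.sequence)
        cursor = max(cursor, read.start_bp) + len(read.sequence)
    if molecule_length_bp is not None and cursor < molecule_length_bp:
        parts.append("N" * (molecule_length_bp - cursor))
    return "".join(parts)


example = build_scaffold(
    [PlacedRead(10, "ACGT"), PlacedRead(2, "GG")],
    molecule_length_bp=16,
)
```

In practice the gap sizes are estimates, so the number of "N" characters carries the uncertainty of the size measurement rather than an exact base count.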

[0016] Preferably, the positively charged surface has been exposed to two modifying agents: one agent confers the positive charge to the surface, while the other agent confers a crosslinking moiety or group to the surface. In preferred embodiments, the modifying agents are silanising agents. More preferably, the silanising agents are selected from the group consisting of aminosilane, epoxysilane, glycidoxysilane and mercaptosilane, and combinations thereof. Even more preferably, the silanising agents are aminosilanes.

[0017] Suitably, the positively charged surface is a silicon surface and / or a glass surface.

[0018] Preferably, a microfluidic device or a mesofluidic device comprises the positively charged surface, and suitably, an inner surface of the microfluidic device or mesofluidic device comprises the positively charged surface. More preferably, the microfluidic device or the mesofluidic device is a flowcell and even more preferably, the flowcell of a nucleic acid sequencing apparatus. In preferred embodiments, the nucleic acid sequencing apparatus is a high-throughput sequencing apparatus.

[0019] In preferred embodiments, the positively charged surface is a silanised, positively charged surface.

[0020] In preferred embodiments, the target nucleic acid molecule is selected from a DNA and a RNA. Preferably, the DNA is selected from cDNA and genomic DNA. More preferably, the DNA is genomic DNA and more preferably, a high molecular weight genomic DNA molecule.

[0021] The target nucleic acid molecule may be double- stranded or single-stranded. Preferably, the target nucleic acid molecule is double-stranded.

[0022] In particular embodiments, the target nucleic acid molecule is a single nucleic acid molecule.

[0023] Preferably, the length of target nucleic acid molecule is at least about 10 kb, at least about 20 kb, at least about 30 kb, at least about 40 kb, at least about 50 kb, at least about 60 kb, at least about 70 kb, at least about 80 kb, at least about 90 kb, at least about 100 kb, at least about 200 kb or preferably at least about 300 kb or more.

[0024] In preferred embodiments, fragmentation comprises exposing the bound and stretched target nucleic acid molecule to an enzyme. Preferably, the enzyme is selected from a transposase and a restriction enzyme, and a combination thereof. Suitably, the enzyme may be a wild-type enzyme, or a variant or mutant thereof. The enzyme may be a fragment or subunit including the desired functional activity. Preferably, the transposase is Tn5.

[0025] Preferably, the 5'-end of the adaptor nucleotide sequence includes a cross-linker. Suitably, the cross-linker molecule is cross-linkable by a heterobifunctional cross-linking agent or is cross-linkable by EDC. In preferred embodiments, the heterobifunctional cross-linking agent is selected from the group consisting of s-MBS, s-SIAB, s-SMCC, s-GMBS and s-MPB, and combinations thereof.

[0026] According to any one of the aforementioned, the method further includes amplifying the cross-linked fragment and nucleotide sequencing of the amplified cross-link fragment.

[0027] In a third aspect, the invention provides a template produced according to the method of the first or second aspect.

[0028] In a fourth aspect, the invention provides a device comprising the template of the third aspect.

[0029] Suitably, the device of the fourth aspect is a microfluidic device, a mesofluidic device and/or a nanofluidic device. Preferably, the device is a flowcell and more preferably, a flowcell configured for high throughput nucleotide sequence analysis.

[0030] In a fifth aspect, the invention provides a system comprising producing a template according to the method of the first aspect, the method according to the second aspect or a device according to the fourth aspect.

[0031] In a sixth aspect, the invention provides a method of acquiring contiguity information, which includes producing a template according to the first aspect or second aspect.

[0032] Preferably, the method according to the sixth aspect further includes correlating the nucleic acid sequence of cross-linked fragment with a position on the target nucleic acid molecule.
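The correlation of cross-linked fragments with positions on the target molecule can be pictured geometrically. The following is a minimal sketch, assuming the stretched molecule is approximated by a straight segment in image coordinates and that a calibration factor converting image distance to base pairs is available; `project_onto_molecule`, `assign_clusters`, `max_offset` and `bp_per_unit` are hypothetical names and parameters, not part of the disclosure.

```python
import math


def project_onto_molecule(point, mol_start, mol_end):
    """Return (distance along the molecule, perpendicular offset) for a point,
    treating the stretched molecule as the segment mol_start -> mol_end."""
    px, py = point
    ax, ay = mol_start
    bx, by = mol_end
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("molecule endpoints must differ")
    ux, uy = dx / length, dy / length              # unit vector along the molecule
    along = (px - ax) * ux + (py - ay) * uy        # scalar projection
    offset = abs((px - ax) * uy - (py - ay) * ux)  # perpendicular distance
    return along, offset


def assign_clusters(clusters, mol_start, mol_end, max_offset, bp_per_unit):
    """Keep clusters lying on the molecule and order them by estimated bp position.

    clusters: dict mapping a cluster name to its (x, y) image coordinate.
    bp_per_unit: assumed calibration (bp per image distance unit).
    """
    length = math.hypot(mol_end[0] - mol_start[0], mol_end[1] - mol_start[1])
    placed = []
    for name, point in clusters.items():
        along, offset = project_onto_molecule(point, mol_start, mol_end)
        if 0 <= along <= length and offset <= max_offset:
            placed.append((round(along * bp_per_unit), name))
    return sorted(placed)


ordered = assign_clusters(
    {"a": (2.0, 0.1), "b": (8.0, -0.2), "c": (5.0, 3.0), "d": (12.0, 0.0)},
    mol_start=(0.0, 0.0), mol_end=(10.0, 0.0),
    max_offset=0.5, bp_per_unit=1000,
)
```

Here clusters "c" (too far from the molecule) and "d" (beyond its end) are discarded, while "a" and "b" are retained in order along the molecule. A real molecule image would rarely be perfectly straight, so a polyline or traced backbone would be a natural refinement.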

[0033] Genome analysis as mentioned in any one of the methods, devices and systems is suitably genome assembly (and preferably de novo genome assembly), genome assembly and analysis, haplotype phasing, structural variation discovery, high resolution metagenomics, and complex genome analysis such as cancer genomes and polyploid plant genomes. Genome analysis may also relate to resequencing.

[0034] It will be appreciated that reference herein to "preferred" or "preferably" is intended as exemplary only.

[0035] Unless the meaning is clearly to the contrary, all ranges set forth herein are deemed inclusive of the endpoints.

[0036] Ranges are to be interpreted as being fully inclusive of all values between the limits.

[0037] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element. As used herein, the use of the singular includes the plural (and vice versa) unless specifically stated otherwise.

[0038] Throughout this specification, unless the context requires otherwise, the words "comprise," "comprises" and "comprising" will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements. Thus, use of the term "comprising" and the like indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present. By "consisting of" is meant including, and limited to, whatever follows the phrase "consisting of". Thus, the phrase "consisting of" indicates that the listed elements are required or mandatory, and that no other elements may be present. By "consisting essentially of" is meant including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase "consisting essentially of" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending upon whether or not they affect the activity or action of the listed elements. In some embodiments, the phrase "consisting essentially of" in the context of a recited subunit sequence (e.g., nucleic acid sequence) indicates that the sequence may comprise at least one additional upstream subunit (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more upstream subunits, e.g., nucleotides) and/or at least one additional downstream subunit (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more downstream subunits, e.g., nucleotides), wherein the number of upstream subunits and the number of downstream subunits are independently selectable.

[0039] The terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to,") unless otherwise noted.

BRIEF DESCRIPTION OF THE DRAWINGS

[0040] FIG. 1 is a diagrammatic representation of a preferred embodiment of the method and system of this invention showing how the sequencing templates and the subsequent sequencing reads are generated near each nick site along the high molecular weight (HMW) double stranded genomic DNA molecules using Tn5 transposase-based tagmentation.

[0041] FIG. 2 is a diagrammatic representation of an alternative preferred embodiment of the method and system of this invention showing how the sequencing templates and the sequencing reads are generated near each nick site along the HMW double stranded genomic DNA molecules using nicking enzymes and strand-displacement DNA polymerases.

[0042] FIG. 3 is a diagrammatic representation of digital images of six large, stretched and mounted DNA (faint lines) with fluorescent dots (bright dots) showing where the single stranded DNA flaps are along these large and linearly stretched DNA molecules.

[0043] Some figures may contain colour representations or entities. Colour illustrations are available from the Applicant upon request or from an appropriate Patent Office. A fee may be imposed if obtained from the Patent Office.

DETAILED DESCRIPTION

[0044] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. Each embodiment described herein is to be applied mutatis mutandis to each and every embodiment unless specifically stated otherwise.

[0045] Short read genome sequencing used in high-throughput technologies (otherwise known as next-generation sequencing (NGS) short sequence reads) provides an attractive approach for sequencing of genomes due, in part, to lower costs per run. However, the challenges in assembling short sequence reads have made this technique less attractive for applications such as de novo genome sequencing of large genomes. Therefore, use of short sequence reads has traditionally been limited to resequencing of existing assemblies.

[0046] The present disclosure is predicated, at least in part, on the surprising determination that if multiple unique or largely unique short (for example, about 50 bp to about 200 bp, although without limitation thereto) sequence reads (for example, 1 read / 5 kilobases (kb)) are generated along long and linearly-stretched nucleic acid molecules, in particular genomic DNA molecules (for example, 100 kb - 2000 kb) using the methods of the present disclosure, one or more challenges associated with assembly of next-generation short sequence reads may be overcome since these unique ordered short sequence reads generated from a long single nucleic acid molecule can be used as sequence assembly scaffolds to anchor the relatively small sequence contigs or scaffolds or even to anchor the NGS short sequence reads directly. A particular (but non-limiting) advantage conferred by the methods of the present disclosure is increased resolution and the ability to produce ultra-long-range sequence information of up to, and including, megabases.
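
The example figures in the preceding paragraph (about 1 read per 5 kb on molecules of 100 kb to 2000 kb, with reads of about 50 bp to 200 bp) can be turned into rough arithmetic. The following minimal Python sketch is illustrative only; the function names are hypothetical and the evenly spaced read model is a simplifying assumption, not part of the disclosed method.

```python
import math

def expected_read_count(molecule_kb: float, kb_per_read: float = 5.0) -> int:
    """Approximate number of ordered reads on one molecule,
    assuming (for illustration) one read every kb_per_read kilobases."""
    if molecule_kb <= 0 or kb_per_read <= 0:
        raise ValueError("molecule_kb and kb_per_read must be positive")
    return math.floor(molecule_kb / kb_per_read)

def sequenced_fraction(read_bp: int, kb_per_read: float = 5.0) -> float:
    """Fraction of the molecule directly covered by the short reads."""
    if read_bp <= 0 or kb_per_read <= 0:
        raise ValueError("read_bp and kb_per_read must be positive")
    return read_bp / (kb_per_read * 1000.0)

if __name__ == "__main__":
    print(expected_read_count(100))    # reads on a 100 kb molecule
    print(expected_read_count(2000))   # reads on a 2000 kb molecule
    print(sequenced_fraction(100))     # coverage with 100 bp reads
```

Under these assumptions only a small fraction of each molecule is read directly, which is consistent with the reads being used as ordered anchors for contigs rather than as a complete sequence.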

[0047] Methods and systems of the present disclosure generally relate to producing a template for determining nucleotide sequence information of a nucleic acid molecule, said method including (i) producing a bound and stretched target nucleic acid molecule on a positively charged surface that is substantially free of other nucleic acid molecules or nucleotide sequences; (ii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the bound and stretched target nucleic acid molecule includes an adaptor nucleotide sequence having a covalently cross-linkable 5'-end; and (iii) covalently cross-linking the fragment of the bound and stretched target nucleic acid molecule including an adaptor nucleotide sequence to the positively charged surface, to thereby produce the template. Preferably, the methods relate to producing a template for determining nucleotide sequence composition information and in particular, genome analysis. Genome analysis may suitably be assembly of an entire genome, or a part thereof, genome sequence assembly and analysis, and preferably de novo genome assembly, genome sequence finishing, haplotype phasing, structural variation discovery, high resolution metagenomics, and genotyping for disease diagnosis, drug target and molecular marker discovery, complex genome analysis such as cancer and polyploid plant genomes, and important trait and genetic variant association etc. It may also relate to resequencing.

[0048] Also disclosed are methods of acquiring, determining or otherwise capturing contiguity information using methods of producing a template as described herein. By "contiguity information" is meant a spatial relationship between two or more nucleic acid fragments, and preferably DNA fragments, based on shared information. The shared aspect may be with respect to adjacent, compartmental and distance spatial relationships. The methods of the present disclosure may also include simultaneous capture of contiguity information and primary nucleotide sequence information. The methods of the present disclosure may also include use of ordered short sequence reads generated from a long single double-stranded nucleic acid molecule as sequence assembly scaffolds to anchor the relatively small sequence contigs or scaffolds or even to anchor the NGS short sequence reads directly for genome sequence assembly and analysis. In preferred embodiments, the methods may relate to producing multiple short nucleotide sequence reads on stretched (preferably linearly stretched) and surface mounted nucleic acid molecules. The methods of the present disclosure may also include producing multiple sequencing reads (and in particular next-generation sequencing reads) such that these sequencing reads are ordered and spaced with a backbone of stretched and surface-bound large nucleic acid molecule, preferably a DNA molecule. Preferably, the sequencing reads are short sequencing reads. More preferably the short sequencing read is between about 5 bp and about 1000 bp, between about 30 bp and about 900 bp, between about 50 bp and about 800 bp, between about 100 bp and about 600 bp, about 200 bp, about 300 bp, about 400 bp and about 500 bp, (and all integers in-between). 
The methods of the invention may include producing multiple short sequence reads on linearly stretched and surface bound nucleic acid molecules and in particular, genomic DNA (preferably high molecular weight genomic DNA) molecules.

[0049] Any one of the methods of the present disclosure may further include generating clusters of sequencing templates from the fragments with adaptor and cross-linkers; performing cycle sequencing; and mapping sequencing reads back to the target nucleic acid molecule.

[0050] In a preferred embodiment, a method is provided for producing a scaffold sequence template comprising multiple ordered short sequence reads for genome sequence assembly and analysis, said method comprising: (i) producing a bound and stretched target nucleic acid molecule on a positively charged silanised surface that is substantially free of other nucleic acid molecules or nucleotide sequences; (ii) introducing at least one adaptor nucleotide sequence, the 5' end of which is modified with a cross-linker, to a single stranded DNA flap on the large double stranded genomic DNA molecules before fragmentation or to the double stranded fragments before or during fragmentation; (iii) subjecting the bound and stretched target nucleic acid molecule to fragmentation, wherein a fragment of the bound and stretched target nucleic acid molecule includes an adaptor nucleotide sequence having a covalently cross-linkable 5'-end; (iv) covalently cross-linking the cross-linkable 5'-end of the adaptor with the nucleic acid fragment to the positively charged surface; (v) generating clusters of sequencing templates from the fragments with adaptor and cross-linkers, and performing cycle sequencing; and (vi) mapping sequencing reads back to the target DNA molecule.

[0051] The template produced according to the methods of the invention is suitable for nucleic acid amplification and/or sequence analysis.

[0052] The term "nucleic acid" as used herein designates single- or double-stranded DNA and RNA. DNA includes genomic DNA and cDNA. RNA includes mRNA, RNA, RNAi, siRNA, cRNA and autocatalytic RNA. Nucleic acids may also be DNA-RNA hybrids, synthetic forms and both sense and antisense strands. A nucleic acid comprises a nucleotide sequence which typically includes nucleotides that comprise an A, G, C, T or U base. However, nucleotide sequences may include other bases such as inosine, methylcytosine, methylinosine, methyladenosine and/or thiouridine, although without limitation thereto. The term "nucleic acid" may be used interchangeably herein with "genetic material", "genetic forms" and "genome". A nucleic acid may be chemically or biochemically modified as will be readily appreciated by those skilled in the art. As will be understood, the term "nucleotide sequence" refers to a polymer composed of a multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof). A "polynucleotide" is a nucleic acid having eighty (80) or more contiguous nucleotides, while an "oligonucleotide" has fewer than eighty (80) contiguous nucleotides.

[0053] The target nucleic acid molecule may be any nucleic acid amenable to nucleotide sequence analysis. The target nucleic acid may be a DNA or an RNA molecule, either naturally occurring material or synthesised. The nucleic acid may be either single-stranded or double-stranded. Preferably, the target nucleic acid is double-stranded. More preferably, the target nucleic acid molecule is double-stranded DNA. The target nucleic acid molecule may be isolated, purified or partially purified. The target nucleic acid molecule may be derived from a tissue, a cell or a body fluid (such as, but not limited to, blood, plasma or saliva), or a fraction thereof (e.g., a nuclear fraction). The target nucleic acid may be in a liquid solution (e.g., a suitable buffer solution) or a solid matrix (e.g., a gel matrix such as an acrylamide gel or an agarose gel). Methods of the present disclosure may preferably include a step of isolating a target nucleic acid. The methods described herein are amenable for analysis of nucleic acid material of prokaryotes (e.g., any member from Archaea and Bacteria) and eukaryotes (e.g., any member from protists, algae, fungi, plants and animals).

[0054] By "isolated" is meant material that is substantially or essentially free from components that normally accompany it in its native state, or from components present during its production when purified or produced by synthetic means. Thus, the term "isolated" also includes within its scope purified or synthetic material.

[0055] As used herein, the term "purified" refers to material (e.g., a nucleic acid, peptide or polypeptide) that is substantially free of cellular components or other contaminating material from the cell or tissue source from which the material is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. "Substantially free" means that a preparation of a material (e.g., a nucleic acid, peptide or polypeptide) is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% pure. In a preferred embodiment, the preparation of a material has less than about 40%, 30%, 20%, 10% and more suitably 5%, 4%, 3%, 2%, 1% (by dry weight), of non-material components or of chemical precursors or of non-material chemicals (also referred to herein as "contaminating components"). When a material (e.g., a peptide or polypeptide) is recombinantly produced, it is also suitably substantially free of culture medium, i.e., culture medium represents less than about 20%, 15%, 10%, 5%, 4%, 3%, 2%, 1% of the volume of the material preparation.

[0056] The methods include producing a bound and stretched target nucleic acid molecule on a positively charged surface. Producing, generating, preparing or otherwise constructing a bound and stretched target nucleic acid molecule includes binding and stretching the target molecule. Binding and stretching may occur simultaneously or step-wise. Binding may occur prior to stretching or vice versa.

[0057] By "binding", "bind" or "bound" in the context of a bound and stretched target nucleic acid molecule is meant a nucleic acid molecule is linked, attached, deposited or otherwise immobilised to a positively charged surface, preferably by a bond other than a nucleotide to nucleotide hydrogen pair. Suitably, the bond is by way of a charge to charge electrostatic interaction and in particular, a negative charge of the DNA is bound to a positive charge on the surface. The binding may be random or selective. Preferably, an end of a negatively charged target nucleic acid molecule is bound to the positively charged surface although it is contemplated that one or more regions of the target nucleic acid molecule may also be bound.

[0058] By "stretching", "stretch", or "stretched" is meant a nucleic acid molecule is elongated, pulled or otherwise lengthened, typically by treatment with, or under, an external force. Suitably, the target nucleic acid molecule is stretched to a generally linear molecule, and more preferably at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% and up to and including about 100% of its contour length. Stretching may be performed by a capillary force or an electric force such as in an electrofield. This may occur in a microfluidic device, or may occur in a flowcell configured for insertion into a microfluidic device. Preferably, the bound and stretched target nucleic acid molecule is maintained, mounted or held in a stretched or elongated state by bonding other than nucleotide to nucleotide hydrogen pair bonding. Suitably, the bound and stretched target nucleic acid molecule is maintained, mounted or held in a stretched or elongated state by electrostatic charge to charge interactions with the positively charged surface. By way of example, since the continuous double stranded DNA molecules are maintained after stretching and binding to a solid surface, the order of short sequencing reads originating from the single stranded DNA flaps and their extensions flanking the nick sites will be maintained in their native order in the genome. As such, the order information of the short sequence reads permits a wide range of analysis including serving as scaffolds for genome sequence assembly, haplotype phasing and structural variation discovery.
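
To make the relationship between stretch fraction, contour length and genomic distance concrete, the following minimal Python sketch converts a molecule size into an approximate on-surface length. It assumes the commonly cited B-form DNA rise of about 0.34 nm per base pair, a value that is not stated in this disclosure, and the function names are hypothetical.

```python
# Assumed B-form DNA rise per base pair in nanometres (not from this disclosure).
NM_PER_BP = 0.34

def contour_length_um(size_kb: float) -> float:
    """Approximate contour length in micrometres of a molecule of size_kb kilobases."""
    return size_kb * 1000.0 * NM_PER_BP / 1000.0

def stretched_length_um(size_kb: float, stretch_fraction: float) -> float:
    """On-surface length when the molecule is stretched to a fraction of its contour length."""
    if not 0.0 < stretch_fraction <= 1.0:
        raise ValueError("stretch_fraction must be in (0, 1]")
    return contour_length_um(size_kb) * stretch_fraction

def kb_per_um(stretch_fraction: float) -> float:
    """Genomic distance in kilobases represented by one micrometre along the stretched molecule."""
    if not 0.0 < stretch_fraction <= 1.0:
        raise ValueError("stretch_fraction must be in (0, 1]")
    return 1.0 / (NM_PER_BP * stretch_fraction)

if __name__ == "__main__":
    print(contour_length_um(100))          # contour length of a 100 kb molecule
    print(stretched_length_um(100, 0.5))   # same molecule at 50% stretch
    print(kb_per_um(1.0))                  # kb per micrometre at full stretch
```

Under this assumption a lower stretch fraction packs more kilobases into each micrometre, so the physical spacing between neighbouring reads shrinks accordingly.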

[0059] By producing a bound and stretched target nucleic acid molecule on a positively charged surface that is "substantially free of other nucleic acid molecules and nucleotide sequences" is meant that the positively charged surface is either substantially free of, essentially free of, or there is an absence of one or more other, non-target nucleic acid molecules and other non-target nucleotide sequences, and includes a level of said other molecules and sequences that may permit some nucleotide to nucleotide hydrogen bonding. It will be appreciated that said level is a level sufficient to maintain a net positive charge on the surface and will not result in the bound and stretched target nucleic acid being bound and held to the positively charged surface by nucleotide to nucleotide hydrogen bonding. Said level will be known by a skilled addressee and in certain embodiments may be less than 5%, 4%, 3%, 2%, 1% of the total surface area. In certain embodiments, the positively charged surface does not include a lawn, an array or a plurality of other, non-target nucleic acid molecules and nucleotide sequences, e.g., oligonucleotides, primers, adaptor nucleotide sequences and the like, during binding and stretching. Therefore, producing said molecule by binding and stretching a target nucleic acid molecule on a positively charged surface substantially free of other nucleic acid molecules and nucleotide sequences confers one or more of the following advantages: minimising interference that arises due to the presence of other nucleic acid molecules during production of the target bound and stretched nucleic acid molecule (which in turn improves efficiency of the method); better presentation of the nucleic acid molecules on the positively charged surface; attachment of long molecules (e.g., >100 kb) with reduced or minimal interference; improved cross-linking of the adaptor nucleotide sequences to the positively charged surface; and enhanced digital images of the nucleic acid molecules, although without limitation thereto. If the positively charged surface includes or is populated, and in particular densely populated, with sequences other than the target nucleic acid molecule (such as, but not limited to, short oligo primers) which are negatively charged, it may be difficult to attach negatively charged double stranded DNA molecules onto a negatively charged primer lawn due to the negative-negative electrostatic repulsion. It will be appreciated that when bound and stretched target nucleic acid is produced or being produced, the surface may include other non-nucleic acid or nucleotide sequence compounds or agents as necessary.

[0060] The present disclosure is particularly suited to large target double stranded nucleic acid molecules, although not limited thereto. Preferably, the target nucleic acid molecule is at least about 10 kb, at least about 20 kb, at least about 30 kb, at least about 40 kb, at least about 50 kb, at least about 60 kb, at least about 70 kb, at least about 80 kb, at least about 90 kb, at least about 100 kb or preferably at least about 200 kb, at least about 300 kb, at least about 400 kb, at least 500 kb, at least 1000 kb, at least 2000 kb or more.

[0061] As described herein in alternative embodiments, the target double stranded nucleic acid molecule may be nicked prior to binding and stretching. It will be appreciated that nicking may not result in disruption of a double-stranded configuration of a nucleic acid. The nicked DNA may include an adaptor nucleotide sequence prior to binding and stretching and according to some embodiments, the adaptor nucleotide sequence on a nicked DNA may be present on a single stranded DNA flap (as described below). In other embodiments, the target nucleic acid molecule is largely unprocessed prior to binding and stretching (with the possible exception of isolation / purification).

[0062] The positively charged surface may be an entire, or at least a part of, an inner surface of a device, preferably a mesofluidic device or a microfluidic device, and in particular a flowcell, or at least one inner surface such as a bottom surface, a top surface, and a side surface, and any combination thereof. Alternatively, the positively charged surface is configured for insertion in a suitable device and may, for example, be part of a chamber or slide that is insertable into a suitable device. It is contemplated that the substrate of the positively charged surface may be a solid surface and may be glass, a polymer, or silicon, although without limitation thereto.

[0063] The positively charged surface may be generated by exposing, contacting or treating a surface with a modifying agent that confers a positive charge to the surface. Any suitable modifying agent is contemplated. It is preferred that the modifying agent that confers a positive charge creates a charged surface that is stable for storage and/or can tolerate other biochemical treatments compared to other chemical surface treatments. In certain preferred embodiments, it may be advantageous that the modifying agent is covalently bonded to the surface. In preferred embodiments, the modifying agent is a silanising agent that introduces one or more silane groups to the surface. In preferred embodiments, the silanising agent is selected from the group consisting of an aminosilane, a glycidoxysilane, an epoxysilane and a mercaptosilane, and any combination thereof. More preferably, the silanising agent is an aminosilane. The aminosilane may be selected from (3-aminopropyl)-triethoxysilane, (3-aminopropyl)-diethoxy-methylsilane, (3-aminopropyl)-dimethyl-ethoxysilane, (3-aminopropyl)-trimethoxysilane, and any combination thereof. Other aminosilane compounds are not excluded from use in methods of the present disclosure. The glycidoxysilane may be (3-glycidoxypropyl)-dimethyl-ethoxysilane, although without limitation thereto. The mercaptosilane may be (3-mercaptopropyl)-trimethoxysilane or (3-mercaptopropyl)-methyl-dimethoxysilane, or a combination thereof. Other mercaptosilane compounds are not excluded from use in the invention. In particularly preferred embodiments, the positively charged surface is a silanised, positively charged surface.

[0064] In particularly preferred embodiments, the surface may be treated with a plurality of modifying agents. Each of the modifying agents may be the same, or different. In particularly preferred embodiments, the surface may be treated with a modifying agent that confers a positive charge and a modifying agent that introduces a moiety or functional group for cross-linking with a cross-linker on an adaptor nucleotide sequence. The latter modifying agent introduces a cross-linkable bond group that can be cross-linked with a cross-linker group present at the 5'-end of the adaptor nucleotide sequence. According to preferred embodiments, the surface is treated with two modifying agents. More preferably, each of the two modifying agents is a silanising agent and more preferably, both silanising agents are aminosilanes, either the same aminosilane or different aminosilanes. In particularly preferred embodiments, the silanised glass/silicon surfaces typically have two aminosilane treatments: one silane agent confers a positive charge to the surface whereas another silane agent includes a functional group, moiety, or bond that can be cross-linked with an adaptor nucleotide sequence with a 5' end modified with cross-linkers. Accordingly in preferred embodiments, the positively charged surface, and more preferably the silanised, positively charged surface, further includes a moiety, functional group or bond that is cross-linkable with an adaptor nucleotide sequence comprising a 5' end modified with cross-linkers. More preferably, the positively charged surface includes a silane group comprising a moiety, functional group or bond that is cross-linkable with an adaptor nucleotide sequence comprising a 5' end modified with cross-linkers.

[0065] Alternatively, a surface may be used that is positively charged and hence may not require treatment to confer a positive charge. According to these embodiments, a silicon or glass surface may be used.

[0066] The methods of the disclosure include a fragment of the bound and stretched target nucleic acid molecule that is covalently cross-linkable to a corresponding moiety present on a positively charged surface due to the presence of an adaptor nucleotide sequence on said fragment. Suitably, said fragment includes an adaptor nucleotide sequence wherein the adaptor nucleotide sequence includes a 5'-end, and in particular a 5'-end nucleotide, that is covalently cross-linkable to one or more moieties, functional groups or bonds present on a positively charged surface. It will be understood that the 5'-end of the adaptor nucleotide sequence may be modified such that the molecule is covalently cross-linkable to one or more moieties, functional groups or bonds that are also present on a positively charged surface. It will be understood that covalent cross-linking involves cross-linking the cross-linkable group on the 5'-end of the adaptor nucleotide sequence present on said fragment to the surface and, in particular, to a corresponding moiety, functional group or bond present on the surface. Preferably, the corresponding moiety, functional group or bond is included in a silane group. Suitably, the 5'-end cross-linker modified adaptor nucleotide sequence is not bound to the surface through a complementary nucleotide sequence present on the surface. Preferably, the adaptor nucleotide sequence comprising a 5'-end modified with a cross-linkable group is bound to the surface by a bond other than a nucleotide to nucleotide hydrogen bond pair. The modified adaptor nucleotide sequence includes a 3'-end that can be introduced, and preferably ligated, to a 5'-end of a target nucleic acid. In a preferred embodiment, the 3'-end of the adaptor nucleotide sequence is introduced to a 5'-terminus of a nucleic acid molecule or fragment. 
Alternatively, the adaptor sequence may be within the nucleic acid molecule or fragment, and preferably, the adaptor sequence may be extended by a polymerase within the nucleic acid molecule or fragment. In certain embodiments, the 3'-end of the adaptor nucleotide sequence is introduced to a 5'-end of a single stranded DNA flap (preferably resulting from a nick) on the large double stranded genomic DNA molecules before fragmentation or to the fragments (suitably, the double-stranded fragments) before or during fragmentation. The present disclosure contemplates a fragment of the bound and stretched target nucleic acid molecule including one or more adaptor nucleotide sequences wherein the, or each, adaptor sequence has a 5'-end covalently cross-linkable to a positively charged surface. The present disclosure contemplates that the adaptor nucleotide sequence may be introduced either before, during or after fragmentation. The adaptor nucleotide sequence may be introduced to the target nucleic acid sequence by any suitable method. Preferably, the adaptor nucleotide sequence is introduced by ligation or in vitro transposition. Preferably, the adaptor nucleotide sequence is introduced to a target nucleic acid molecule by ligation. Preferably, the adaptor nucleotide sequence is an oligonucleotide sequence having a length of at least 5 residues, at least 10 residues, at least 15 residues, at least 20 residues, at least 30 residues, at least 40 residues, at least 50 residues, at least 60 residues, at least 70 residues and up to 80 residues. The terms 'adaptor 1', 'adaptor 2' etc., as used herein are for illustrative purposes only and are not limiting. Adaptor 1, adaptor 2, adaptor 3 etc. may correspond to a first, a second and a third etc. adaptor nucleotide sequence. It is contemplated that a fragment may include adaptor 1 and adaptor 2 sequences; however, only one adaptor has a 5'-end that is covalently cross-linkable to the positively charged surface. 
The invention contemplates a plurality of adaptor nucleotide sequences having suitable sequences and lengths. It is contemplated that the nucleotide sequence of each adaptor nucleotide sequence may be the same or different.

[0067] The 5'-specific covalent cross-linkage of an adaptor nucleotide sequence according to the present disclosure may be achieved by any suitable chemistry as would be known by a skilled addressee (see for example Adessi et al. Nucleic Acids Research, 2000, Vol. 28, No. 20, e87, which is incorporated by reference in its entirety). A 5'-phosphate modified nucleotide can be cross-linked to a silanized glass surface or a silicon surface by reacting with 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC) and imidazole reagents, which lead to the formation of a phosphoramidate linkage between amino-derivatised glass surfaces and a 5'-phosphate modified nucleotide sequence. It is envisaged that in other embodiments, the 5'-end of an adaptor nucleotide sequence can be a 5'-thiol modified nucleotide that can react with either the maleimide portion of s-MBS-like compounds or the iodoacetamide portion of s-SIAB to form an amide bond between amino groups of the surface and the succinimidyl ester moiety of the cross-linker compounds exemplified by s-MBS or s-SIAB reagents. The covalent cross-linking agent may be selected from the group consisting of EDC, s-MBS (maleimidobenzoyl-N-hydroxysulfo-succinimide ester), s-SIAB (sulfosuccinimidyl (4-iodoacetyl)aminobenzoate), s-SMCC (sulfosuccinimidyl 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate), s-GMBS (N-(γ-maleimidobutyryloxy)sulfosuccinimide ester) and s-SMPB (sulfosuccinimidyl 4-(p-maleimidophenyl)-butyrate), and any combination thereof. The 5'-end modification may be an agent selected from the group consisting of a phosphate, a hydroxyl, a thiol and a dimethoxytrityl, and any combination thereof. 
In preferred embodiments, the covalent cross-linker is a heterobifunctional cross-linking agent. The cross-linkage may be an s-MBS linkage and more particularly, an s-MBS heterobifunctional cross-linker and a 5'-thiol modified nucleotide sequence; or an s-SIAB heterobifunctional cross-linker and a 5'-thiol modified nucleotide sequence. EDC chemistry based on EDC-mediated coupling of 5'-phosphate nucleotide sequences may also be used.

[0068] According to the present disclosure, the bound and stretched single target nucleic acids may be fragmented to generate spatially separated clusters at a physical distance proportional to their genomic distance. The methods include one or more fragmentation events. Fragmentation may be by enzymatic or non-enzymatic methods. Suitable enzymes include a transposase or a restriction enzyme.

[0069] A transposase is particularly advantageous for use in the methods described herein. The introduction of a hyperactive variant of the Tn5 transposase that mediates the fragmentation of double-stranded DNA and ligates synthetic nucleotide sequences, and preferably oligonucleotides, at both ends has greatly advanced fragment library construction for next-generation sequencing. The present invention is based, at least in part, on using a transposase to break a double stranded and linearly stretched large nucleic acid molecule (preferably a genomic DNA molecule) into fragments on a positively charged and preferably a silanized, positively charged surface such as glass or a silicon surface, whilst an adaptor nucleotide sequence is introduced (and preferably ligated) to both ends of each DNA fragment. Since the 5'-end of one adaptor nucleotide sequence has a nucleotide modified with a cross-linker, one end of each of the double stranded fragments can be cross-linked onto the silanized and positively charged surface by way of a suitable moiety, functional group or bond. Thus, each DNA fragment will remain where it was located on the large and linearly stretched DNA molecule. Further cross-linking both free adaptor nucleotide sequences with modified 5'-ends onto said solid surface where it has no DNA fragments attached will generate a lawn of both adaptors (adaptor 1 and the complement of adaptor 2) to thus enable bridge PCR for template amplification and sequencing. Therefore, the positions of the sequencing reads on the surface will align to where the large and linearly stretched DNA molecule was, which is their original order in the genome. 
The collection of multiple short sequence reads along each linearly stretched DNA molecule can be scaffolds for haplotype phasing, long range genomic structural variation detection, high resolution metagenomics, genotyping for disease diagnosis, molecular marker and drug target discoveries and de novo genome sequence assembly etc. with the distances between these ordered short reads as the gap sizes. It will be appreciated that the average length of the nucleic acid fragments generated by transposase transposition may be regulated by controlling the unit amount of transposase used in the transposition reaction. In some embodiments, sequential transposase transposition reactions may be needed. By way of example, since the DNA fragments are attached to a solid surface by positive charge, generally speaking, there are salt or metal ions in the transposition buffer. Thus the positive charge of the metal ions will compete with the positive charges on the solid surface for the negative charges on the DNA fragments and therefore small DNA fragments (less than 3 kb) without an adaptor attached and cross-linked onto the silanized surface can easily detach from the solid surface and diffuse away after cross-linking. It will be further appreciated that the average density of short nucleic acid fragments generated along the target nucleic acid molecule by transposase transposition may be regulated by sequential transpositions. By way of example, the average density of the short DNA fragments along a large and linearly stretched double stranded DNA molecule generated by transposase transposition may be regulated by sequential transpositions. In some embodiments, if it is desirable to generate a sequence read on average every 5 kb in the genome, the transposase transposition reaction is performed by using the amount of transposase enzyme unit that is expected to generate an average DNA fragment size of 5 kb. 
The fragment ends may be cross-linked to the solid and silanized surface, followed by another sequential in vitro transposition reaction with the amount of transposase unit expected to generate an average DNA fragment size less than 1 kb. The DNA fragments generated by the second transposition reaction may be washed away from the surface if one end of these DNA fragments is not cross-linked onto the silanized solid surface. Accordingly, the fragment density may be reduced and it also may reduce the possibility of interference between neighbouring sequencing reads.
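
The idea that read positions on the surface preserve genomic order, with inter-read distances serving as gap sizes, can be sketched in a few lines of Python. This is an illustrative sketch only: the `Cluster` type, the function name, the straight-line model of the stretched molecule and the `kb_per_um` calibration factor are all assumptions introduced here, not elements of the disclosed method.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Cluster:
    read: str     # short sequence read obtained at this cluster
    x_um: float   # cluster position on the surface, micrometres
    y_um: float

def order_reads_along_molecule(
    clusters: List[Cluster],
    start: Tuple[float, float],
    end: Tuple[float, float],
    kb_per_um: float,
) -> List[Tuple[str, float]]:
    """Project clusters onto the molecule axis (start -> end), sort them,
    and return (read, gap_kb_from_previous_read) pairs in that order."""
    ax, ay = end[0] - start[0], end[1] - start[1]
    length = math.hypot(ax, ay)
    if length == 0:
        raise ValueError("start and end must differ")
    ux, uy = ax / length, ay / length  # unit vector along the molecule
    projected = sorted(
        ((c.x_um - start[0]) * ux + (c.y_um - start[1]) * uy, c.read)
        for c in clusters
    )
    ordered = []
    previous = None
    for position, read in projected:
        gap_kb = 0.0 if previous is None else (position - previous) * kb_per_um
        ordered.append((read, gap_kb))
        previous = position
    return ordered

if __name__ == "__main__":
    demo = [Cluster("ACGT", 10.0, 0.0), Cluster("TTGA", 2.0, 0.0), Cluster("GGCC", 6.0, 0.0)]
    for read, gap in order_reads_along_molecule(demo, (0.0, 0.0), (20.0, 0.0), 2.5):
        print(read, gap)
```

In practice the conversion factor from micrometres to kilobases would have to be calibrated for the stretching conditions used; the value 2.5 in the demo is arbitrary.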

[0070] According to the present disclosure, a device may comprise an array of large and stretched DNA molecules immobilized on a solid surface, with a molecule density allowing each molecule, and the spots of sequencing templates generated along this DNA molecule by bridge polymerase chain reaction (PCR), to be easily resolved by epifluorescent microscopy. Under an epifluorescent microscope, the spots of sequencing templates may appear like beads on an invisible line, which is the linearly stretched double stranded DNA molecule, and the method can thus be referred to as "beads on a string" sequencing. The fluorescence events after each nucleotide addition to the sequencing templates can be detected under an optical microscope linked with a CCD camera, so that whether each spot of sequencing templates along the large and linearly stretched DNA molecule has undergone the nucleotide addition or not can be observed. Multiple sequence reads along each linearly stretched DNA molecule may be generated when many cycles of nucleotide addition are done.
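The per-cycle readout described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each spot yields four background-corrected channel intensities per cycle (one per base, in the order A, C, G, T) and that the brightest channel identifies the incorporated nucleotide; the function names and data layout are hypothetical and are not taken from the disclosure or from any instrument software.

```python
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"  # assumed channel order: one fluorescent label per base


def call_read(intensities: Sequence[Sequence[float]]) -> str:
    """Call one base per cycle by picking the brightest of four channels."""
    read = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("expected four channel intensities per cycle")
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


def call_reads_along_molecule(
    spots: Dict[float, Sequence[Sequence[float]]]
) -> List[Tuple[float, str]]:
    """Return (position, read) pairs ordered by position along one molecule.

    `spots` maps the position of each spot along the stretched molecule
    (hypothetical units, e.g. pixels) to its per-cycle channel intensities.
    """
    return [(pos, call_read(spots[pos])) for pos in sorted(spots)]
```

Sorting by position along the molecule is what preserves the "beads on a string" order of the reads; a real pipeline would also need image registration, cross-talk correction and quality scoring, which this sketch omits.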

[0071] According to the present disclosure, stretching of the large and double stranded genomic DNA on the positively charged surface may be a consideration, since the transposition by transposases needs not only to fragment the double stranded DNA but also to ligate an adaptor nucleotide sequence to both ends of the cutting site. Adding substances such as TAPS, PEG, acrylamide or other chemical substances to the HMW genomic DNA solution may help to improve the DNA stretching. As long as the double stranded DNA molecules are maintained in linear form after stretching and binding to a solid surface, and can be fragmented by the transposase, the order of short sequencing reads may be recorded from the spots of the sequencing templates generated by bridge PCR along the linearly stretched and bound double stranded DNA molecules. Therefore, the order information of the short sequence reads permits a massively parallel approach to monitoring fluorescent or other events along many molecules. Moreover, this order information is extremely useful in a wide range of analyses, including serving as scaffolds for genome sequence assembly, genotyping and structural variation discovery.

[0072] According to preferred embodiments, the transposition reagents (Nextera™ DNA Sample Prep Kit) from Epicentre may be suitable to cut HMW genomic DNA molecules into fragments (>= 0.5 kb) and, at the same time, ligate the adaptors to both ends of the DNA fragments. Nextera technology employs in vitro transposition to simultaneously fragment and tag DNA fragments with DNA adaptors having a single stranded DNA overhang. In order to produce the optimal average DNA fragment size for the proposed procedures to produce multiple sequence reads along the large and linearly stretched, positively charged-surface-attached genomic DNA molecules according to the methods described herein, tuning of the amount of transposase, salt concentration, reaction temperature and time, and additives has been envisioned. It is envisaged that any transposition enzyme, reagent or method may be employed that suitably performs the fragmentation and adaptor-ligation in accordance with the present disclosure. Reference is made to U.S. Pat. No. 5,965,443; U.S. Pat. No. 6,437,109; and European Pat. No. 0927258, each of which is incorporated herein by reference in its entirety, that describe exemplary transposase-based transposition methods that may be used in accordance with the present disclosure. It is contemplated that wild-type, or variant, transposase molecules may be used. The variant may be a naturally-occurring, modified, engineered or synthetic mutation. According to preferred embodiments, the transposase is Tn5, or a variant thereof.

[0073] According to the present disclosure, the DNA fragments derived from in vitro transposition on a silanized solid surface are tagged with DNA adaptors comprising a nick that may be repaired by nick translation. In the present invention, the 5'-end of one of the two DNA adaptor nucleotide sequences may be modified such that the 5'-end of the adaptor can be covalently cross-linked after nick repairing onto the silanized solid support via 5'-end cross-linking.

[0074] In preferred embodiments, the methods include high throughput sequencing methods and apparatus, and preferably next generation sequencing technologies, as will be known by a skilled addressee. Suitable high throughput sequencing methods and apparatus that fall within the scope of the invention include, but are not restricted to, 454 pyrosequencing by the detection of released pyrophosphate (PPi) during DNA synthesis (4), Solexa or Illumina sequencing by the detection of fluorescent dye labelled nucleotides with reversible terminator (5-7), Ion Torrent sequencing by the detection of hydrogen ions that are released during nucleotide addition by polymerase using semiconductor pH meters (8) and Pacific Biosciences single molecule real time (SMRT) sequencing (9). Other, non-polymerase based DNA sequencing methods include SOLiD sequencing (Sequencing by Oligonucleotide Ligation and Detection) (10), sequencing by hybridization (SBH) (11) and nanopore sequencing (12-14).

[0075] The 'Sequencing by Synthesis' method as described in PCT Publication No. WO/1998/044151 (which is incorporated by reference herein in its entirety) can be used according to the present invention to identify the addition of particular nucleotides or the sequences of nucleotide additions, although without limitation thereto. A preferred method comprises the repeated steps of: reacting the immobilized sequencing templates with a sequencing primer, a polymerase and the four different nucleoside triphosphates with fluorescently labelled reversible terminator under conditions sufficient for the polymerase reaction to proceed, wherein each nucleoside triphosphate is conjugated at its 3'-position to a different fluorescent label, so that the label, which has undergone the polymerase reaction, can be determined, and then removal of the label from the growing polynucleotide to allow for the next nucleotide addition. Since each incorporated nucleotide can be unambiguously determined by fluorescent measurements under epifluorescent microscopy, this method is particularly suited to detect many thousands of reactions at different locations along the linearly stretched and large DNA molecules on the solid support simultaneously with no phasing problems.

[0076] A primer or oligonucleotide sequence lawn or array may be introduced after cross-linking the 5' end of the adaptor onto the surface and the ligation of adaptor 2 to construct sequencing templates, to enable bridge PCR amplification of the sequencing templates. Such a lawn or array may be produced by cross-linking a plurality of adaptor nucleotide sequences to the spare or empty space on the silanized inner surface of the flowcell. The cross-linking of the 5'-ends of the two adaptors or primers will generate a lawn of the adaptors or primers to facilitate the solid phase PCR or bridge PCR that amplifies the surface cross-linked single stranded DNA flaps and generates clusters of sequencing templates.

[0077] According to the present disclosure, the distances between the spots of sequencing templates along the linearly stretched DNA molecules may be used to estimate the gap sizes between two adjacent short sequence reads. Accordingly, the final product may be a series of alternating units of a short sequence read and a sequence gap size, spanning the full length of each large DNA molecule.
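The alternating read/gap product described above can be sketched as a simple data structure. The sketch assumes that each read already has an estimated start position in base pairs along its molecule; the function name `build_scaffold` and the tuple layout are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Tuple, Union


def build_scaffold(reads: List[Tuple[int, str]]) -> List[Union[str, int]]:
    """Build an alternating [read, gap, read, gap, ..., read] scaffold.

    Each input item is (estimated start position in bp along the molecule,
    read sequence). Gaps are the estimated number of unsequenced base pairs
    between the end of one read and the start of the next; overlapping
    estimates are clamped to a gap of zero.
    """
    ordered = sorted(reads, key=lambda r: r[0])
    scaffold: List[Union[str, int]] = []
    prev_end = None
    for start, seq in ordered:
        if prev_end is not None:
            scaffold.append(max(0, start - prev_end))
        scaffold.append(seq)
        prev_end = start + len(seq)
    return scaffold
```

For example, three reads estimated to start at 0 bp, 5,000 bp and 12,000 bp yield a list of three sequences separated by two integer gap sizes, which an assembler could consume as a gapped scaffold.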

[0078] According to preferred embodiments, the methods of the present disclosure may include 'tagmentation', in which a transposase or a variant thereof (preferably Tn5) mediates fragmentation of double-stranded DNA and ligation of adaptor oligonucleotides at both ends of these fragments, wherein said method comprises the steps of: (a) attaching and stretching one or more HMW genomic DNA molecules (preferably >= 200 kb) onto a silanized inner surface of a flowcell, preferably either by electrophoresis or alternatively by capillary force or by pressure-driven fluidic flow; (b) staining the stretched and bound HMW DNA molecule/s with a fluorescent dye (such as, but not limited to, YOYO-1, DAPI, TOTO, ethidium bromide and propidium bromide) and imaging the entire surface of the flowcell with the DNA molecules therein; (c) performing in vitro transposition DNA fragmentation and tagmentation with one or more modified reagents, such as an adaptor nucleotide sequence with a modified 5'-end, such that the resulting DNA fragments including the modified adaptor nucleotide sequence can be cross-linked onto the silanized inner surface of the flowcell; (d) cross-linking free adaptors 1 and 2 (both with 5'-end modification) to the silanized inner surface of the flowcell; (e) removing the complementary strands of the DNA fragments that were not cross-linked onto the silanized surface of the flowcell, for example by melting or alkaline solution washing; (f) performing a bridge or solid phase PCR to produce more sequencing templates; (g) performing cycle sequencing by synthesis using a sequencing primer, a polymerase and the four different nucleoside triphosphates with fluorescently labeled reversible terminator under conditions sufficient for the polymerase reaction to proceed, and to read one nucleotide at each spot of sequencing templates per cycle, preferably using an Illumina sequencing instrument; and (h) correlating the positions of the sequence reads with the positions of the HMW DNA molecules in the image to generate a scaffold sequence for each HMW DNA molecule, with the distances between sequence reads converted to DNA size, as the gap size between reads, based on the 0.34 nm rise/bp with a correction factor for stretching.
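The distance-to-size conversion in the final step can be written out as a short worked sketch. The 0.34 nm rise per base pair comes from the text; the stretch factor is modelled here as the fraction of the contour length to which the molecule is extended, and the pixel size is a hypothetical calibration value. Both are assumptions made for illustration.

```python
RISE_NM_PER_BP = 0.34  # rise per base pair, as stated in the text


def distance_to_bp(distance_nm: float, stretch: float = 1.0) -> float:
    """Convert a measured distance along a stretched molecule to base pairs.

    `stretch` is an assumed correction factor: the fraction of the full
    contour length to which the molecule is extended on the surface
    (1.0 means fully extended).
    """
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    return distance_nm / (RISE_NM_PER_BP * stretch)


def pixels_to_bp(pixels: float, nm_per_pixel: float, stretch: float = 1.0) -> float:
    """Convert an image distance in pixels to base pairs.

    `nm_per_pixel` is a hypothetical microscope calibration value.
    """
    return distance_to_bp(pixels * nm_per_pixel, stretch)
```

Under these assumptions, two reads measured 2,720 nm apart on a molecule stretched to 80% of its contour length would be separated by about 2,720 / (0.34 × 0.8) = 10,000 bp.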

[0079] According to an alternative preferred embodiment, the present disclosure relates to methods that include using nicks on double stranded DNA and strand displacement DNA polymerase, comprising the steps of: (i) introducing sequence-specific or non-specific nick sites on a double stranded DNA molecule residing in a gel matrix, wherein said nick sites may be introduced using either enzymatic or non-enzymatic methods as would be known by a person of skill in the art; (ii) performing a strand displacement reaction from the nick sites to generate a single stranded DNA flap using strand displacement DNA polymerase; (iii) ligating the 3'-end of an adaptor 1 nucleotide sequence, of which the 5'-end nucleotide is modified with cross-linkers, to the 5'-end of the single stranded DNA flaps on the double stranded DNA molecules inside the gel matrix, wherein any suitable ligase may be employed, such as T4 RNA ligase or T7 DNA ligase, and preferably the ligase is T7 DNA ligase using splint ligation; (iv) releasing the double stranded DNA molecule comprising the single stranded DNA flap including the adaptor 1 from the gel matrix into a buffer solution; (v) introducing or loading the double stranded DNA molecules into a flowcell, of which the inner surface, and more preferably the inner roof and bottom surfaces, have been silanized and positively charged, so that the double stranded genomic DNA molecules comprising the single stranded DNA flaps and adaptor 1 can be stretched and bound onto the silanized surfaces by electrophoresis or pressure-driven pump flow; (vi) cross-linking the 5'-end of adaptor 1 present on the single stranded DNA flaps onto the silanized surface of the flowcell near the nick sites present on the double stranded DNA molecules; (vii) staining the double stranded DNA molecules with a fluorescent dye (such as, but not limited to, YoYo-1), imaging, and removing the dye; (viii) digesting the linearly stretched and surface bound HMW genomic DNA molecules using one or more frequent cutter restriction endonucleases; (ix) removing the single strand DNA restriction fragments without 5'-ends that are cross-linked to the surface of the flowcell by alkaline or heat washing with salt solution; (x) ligating the 5'-end of a second adaptor nucleotide sequence, adaptor 2, to the 3'-end of the single stranded DNA fragments left on the silanized surface of the flowcell; (xi) cross-linking free adaptors 1 and the complement of adaptor 2 with 5' cross-linkers to the spare or empty space on the silanized surface of the flowcell to thereby generate a lawn of adaptors or primers for sequencing template amplification by solid phase PCR or bridge PCR; (xii) amplifying the surface cross-linked single stranded DNA flaps by bridge PCR to thereby generate one or more clusters of single stranded DNA flaps with adaptors or complementary sequences thereof; (xiii) performing cycle sequencing by synthesis using sequencing DNA polymerase, reversible terminator and fluorescent dye labeled dNTPs, and sequencing primers; and (xiv) mapping the short sequence reads back to the HMW double stranded genomic DNA molecules in the fluorescent images.

[0080] In certain aforementioned embodiments, a short, single strand DNA molecule (herein referred to as a "flap") may be generated from a nick site along a large double stranded DNA molecule using a strand-displacing DNA polymerase. The 3'-end of adaptor 1 (a nucleotide sequence wherein the 5'-end nucleotide is modified with a cross-linker) can be ligated to the 5'-end of these "flaps". When the large double stranded DNA molecule is linearly stretched by pressure-driven force or electrophoresis and bound to a silanized surface, the cross-linker at the 5'-end of adaptor 1 on the single stranded DNA "flaps" may be cross-linked to the silanized solid surface. The restriction digestion of the linearly stretched and mounted double stranded DNA molecule with the single stranded DNA flap using one or more frequent cutter sticky restriction enzymes, and subsequent washing, will remove all the DNA fragments without a single stranded DNA flap and adaptor 1. The 5'-end of a second adaptor nucleotide sequence, adaptor 2, can be ligated to the 3'-end of the extended fragment from the single stranded DNA flap by restriction digestion including adaptor 1 (of which the 5'-end nucleotide has been cross-linked onto the silanized surface) after the removal of the single stranded DNA fragment partially complementary to the extension of the single stranded DNA flap, or alternatively the 5'-end of adaptor 2 with its complement can be ligated to the cohesive (sticky) end of the double stranded DNA fragments (generated by restriction digestion) with single stranded DNA flaps and adaptor 1 attached to the silanized solid surface. Subsequent melting and washing leave on the solid surface only the single stranded DNA containing the single stranded DNA flap and its extension plus the two adaptors at both ends, and this single stranded DNA construct can serve as PCR template. 
Cross-linking free adaptor 1 and the complement nucleotide sequence of adaptor 2 to the solid surface provides primer sequences for use in solid phase PCR (bridge PCR) to thereby produce one cluster for each of the single stranded DNA constructs, which in turn can serve as the sequencing templates. The subsequent cycle sequencing by synthesis at each cluster can produce a sequence read near the locations of the nick sites along the linearly stretched large double stranded DNA molecules that were mounted on solid surfaces before restriction digestion and subsequent removal. As such, the positions of the sequencing reads can be correlated or otherwise traced back to the images of the mounted double stranded DNA molecules, and the reads are in their original order in the target genome. The collection of the ordered short sequence reads may be suitable scaffolds for sequence assembly, with the distances between these ordered short reads as the gap sizes.

[0081] It is envisaged that in preferred embodiments the length of a single strand flap DNA can be regulated by controlling the ratio of the dideoxynucleotide(s) to the same normal nucleotide(s) during the strand displacement reaction by a strand displacement DNA polymerase. The 5'-end of the single stranded flap DNA is free, while the 3'-end of the flap DNA remains covalently connected to the large double stranded DNA molecule by a phosphodiester bond.

[0082] According to alternative preferred embodiments, the length of a single stranded DNA molecule for sequencing may also be regulated by the number and the kind of restriction enzymes used for the restriction digestion, since the single strand DNA molecule to be sequenced includes a single strand DNA flap generated by strand displacement, and the length between the nick site and the immediate restriction site to the 5'-side of the nick. Tagmentation by in vitro transposition with Tn5 transposase can also be used to replace restriction digestion and later adaptor 2 ligation.

[0083] It is further envisaged that methods of the present disclosure are amenable to use of a device to produce an array of large and stretched DNA molecules immobilized on a solid surface, with a molecule density allowing each molecule and the spots of sequencing templates generated by bridge PCR associated with this DNA molecule to be readily resolved by epifluorescent microscopy.

[0084] As will be appreciated by a skilled addressee, the nick sites on a double stranded DNA molecule may be generated by nicking enzymes, nucleases, or chemical/physical nicking methods prior to stretching and mounting. The nicking action generates a 3'-end extendable by a DNA polymerase at each nicking site. As long as the continuous double stranded DNA molecules are maintained after stretching and binding to a solid surface, the order of short sequencing reads originated from the single stranded DNA flaps and their extensions flanking the nick sites will be maintained in their native order in the genome. As such, the order information of the short sequence reads permits a wide range of analyses, including serving as scaffolds for genome sequence assembly, haplotype phasing, structural variation discovery, high resolution metagenomics, genotyping for disease diagnosis, molecular marker and drug target discovery, and important trait-genetic variant association.

[0085] Once a single stranded DNA flap and its extension are cross-linked onto the silanized solid surface through the 5' end of one adaptor, the subsequent procedures including the template amplification, cyclic sequencing by synthesis, tracing sequencing reads back to the image of the double stranded DNA molecules and calculating the gap sizes between two adjacent short sequence reads are otherwise similar or identical to other embodiments of the invention.

[0086] Reference is made to FIG. 1, which shows a preferred embodiment of the present disclosure that includes transposase fragmentation of HMW double stranded genomic DNA. FIG. 1 step 1 depicts introducing or loading of high molecular weight (HMW) genomic DNA molecules into the flowcell, of which the inner surfaces were silanized and positively charged, using pressure-driven pump flow or electrophoresis. The DNA molecule stretching is dependent, at least in part, on the viscosity of the HMW DNA solution, the dimension of the flowcell, the positive charge density on the silanized inner surfaces of the flowcell and the speed of pump flow (or the voltage gradient for the electrophoresis). The viscosity of the HMW DNA solution is adjustable by adding organic substances such as acrylamide or glycerol, and the positive charge density of the silicon surface can be optimized using different silane concentrations for the surface silanization. It will be appreciated that stretching of HMW genomic DNA molecules may have a significant effect on the functional efficiency of in vitro transposition by transposase. By way of example, for optical mapping, restriction endonucleases function well when DNA molecules are stretched to about 80% of their contour length. It is anticipated that the Tn5 transposase will also function well on surface-bound HMW genomic DNA molecules that are stretched to about 80% of their contour length, even with the introduction (preferably by ligation) of the adaptors to the ends of the fragments. The ultimate goal of this step is to produce multiple optimally stretched and parallel-deposited HMW genomic DNA molecules on the silanized inner surface of the flowcell in order to facilitate the later tagmentation of genomic DNA molecules using in vitro transposition. 
After stretching and depositing of DNA molecules on a positively charged surface, and after staining with the fluorescent dye YoYo-1, digital images of the stretched and surface-bound HMW genomic DNA molecules are collected under an epifluorescent microscope and analyzed in terms of DNA molecule locations.

[0087] FIG. 1 steps 2 and 3 describe the in vitro transposition, or genomic DNA tagmentation, using Tn5 transposases, which is performed on the stretched and surface-bound HMW genomic DNA molecules, and the two adaptors ligated to the fragments during in vitro transposition. According to this preferred embodiment, the 5'-end of adaptor 2 is modified with a cross-linker so that the strands of DNA fragments with adaptor 2 ligated after the in vitro transposition can be cross-linked onto the silanized surfaces. The average DNA fragment size after the in vitro transposition or tagmentation is determined by the amount of transposase added and by the in vitro transposition reactions. The nick repairing or nick translation reaction can be performed after or during tagmentation. The desirable fragment size of the first round of in vitro transposition is larger than about 3 kb, so that these derived DNA fragments can remain on the positively charged surface and allow the cross-linking of the 5'-end with modified adaptor 2 from the DNA fragments onto the silanized inner surface of the flowcell. Optionally, a second tagmentation can be performed after the 5'-end of adaptor 2 is cross-linked onto the surface in order to optimize the fragment sizes for next-generation sequencing.

[0088] FIG. 1 step 4 depicts the process of cross-linking the 5' cross-linker of adaptor 2 on the DNA fragments onto the silanized inner surface of the flowcell after the tagmentation and nick repairing. The term "cross-linker" may refer to a heterobifunctional cross-linking reagent. In one possible embodiment, the 5'-end of adaptor 2 can be a 5'-phosphate modified nucleotide that can be cross-linked to a silanized glass or silicon surface by reacting with EDC and imidazole reagents, which leads to the formation of a phosphoramidate linkage between amino-derivatised glass surfaces and 5'-phosphate modified nucleotide sequences in a one-step reaction. According to other embodiments, the 5'-end of adaptor 2 can be a 5'-thiol modified nucleotide that can react with either the maleimide portion of s-MBS-like compounds or the iodoacetamide portion of s-SIAB, to form an amide bond between amino groups of the surface and the succinimidyl ester moiety of the cross-linker compounds exemplified by the s-MBS or s-SIAB reagents. After cross-linking, the non-cross-linked DNA strands can be removed by heating and washing.

[0089] FIG. 1 step 5 depicts the process of generating a lawn of both primers/adaptors on the spare/empty silanized inner surface of the flowcell for later production of the sequencing templates by bridge PCR, and of performing "bridge PCR" on the inner surface of the flowcell using DNA polymerases, labelled dNTPs, and other buffer reagents to generate sequencing template clusters for sequencing. The 5'-ends of both free adaptors/primers have been modified with cross-linkers or a modified nucleotide so that they can be cross-linked onto the spare silanized inner surface of the flowcell.

[0090] FIG. 1 step 6 depicts the process of performing cycle sequencing reactions on a solid surface using sequencing primers, DNA polymerases, dNTPs labelled with reversible terminator and fluorescent dyes, and other buffer reagents to generate base-reads for each of the sequencing template clusters. After cycle sequencing, there will be one sequencing read for every sequencing template cluster on the silanized inner surface of the flowcell.

[0091] FIG. 1 step 7 depicts the process of the present disclosure to map the sequence read of each sequencing template cluster back onto the HMW genomic DNA molecules based on its physical position in the fluorescent image.
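One way to picture this mapping step is as a nearest-molecule assignment in image coordinates. The following is a minimal sketch under stated assumptions: each stretched molecule is approximated as a straight line segment taken from the stained image, each cluster is a single point, and a cluster is assigned to the closest molecule within a chosen distance threshold. The names `project` and `assign_clusters` and the threshold parameter are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def project(point: Point, segment: Segment) -> Tuple[float, float]:
    """Return (distance from point to segment, offset along the segment)."""
    (x1, y1), (x2, y2) = segment
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(point[0] - x1, point[1] - y1), 0.0
    # Fractional position of the closest point, clamped to the segment ends.
    t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / length_sq
    t = min(1.0, max(0.0, t))
    px, py = x1 + t * dx, y1 + t * dy
    return math.hypot(point[0] - px, point[1] - py), t * math.sqrt(length_sq)


def assign_clusters(
    clusters: Dict[str, Point],
    molecules: Dict[str, Segment],
    max_dist: float,
) -> Dict[str, List[Tuple[float, str]]]:
    """Assign each cluster to its nearest molecule and order clusters along it."""
    ordered: Dict[str, List[Tuple[float, str]]] = {name: [] for name in molecules}
    for cluster_id, pt in clusters.items():
        best = None
        for name, seg in molecules.items():
            dist, offset = project(pt, seg)
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, name, offset)
        if best is not None:  # clusters far from every molecule stay unassigned
            ordered[best[1]].append((best[2], cluster_id))
    for hits in ordered.values():
        hits.sort()
    return ordered
```

The offsets returned for each molecule give both the order of the reads and the image distances between them, which can then be converted to gap sizes. Real molecules may be curved or may cross, which this straight-segment approximation does not handle.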

[0092] Reference is made to FIG. 2, which is a diagrammatic representation of another embodiment of the present disclosure showing how the sequencing templates and the subsequent sequencing reads are generated near each nick site along the HMW double stranded genomic DNA molecules using nicking enzymes and strand-displacement DNA polymerases. Genomic DNA molecules can be released from a gel matrix in which nuclei or cells have been embedded and lysed using NDSK solution (0.01 M Tris, 0.5 M EDTA, 1% lauroyl sarcosine, pH 9.51, and 2 mg/ml proteinase K) (28). Biochemical reactions such as the nicking reaction by nicking enzymes and the strand-displacement reaction by strand displacement DNA polymerases can be performed on the naked genomic DNA (without protein binding) inside the gel matrix.

[0093] FIG. 2 step 1 depicts the introduction of sequence-specific or nonspecific nicks on double stranded DNA molecules inside the agarose gel matrix by DNase or nicking endonucleases such as Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, and Nt.BsmAI, although without limitation thereto, or by biochemical methods such as using peptides (see for example, Cheng et al., Bioorg Med Chem. 2001 Jun; 9(6): 1493-8. "Synthesis and DNA nicking studies of a novel cyclic peptide: cyclo[Lys-Trp-Lys-Ahx-]", which is incorporated herein by reference). Any method that can produce nicks on double stranded DNA molecules with 3'-ends in the nicks that are extendable by DNA polymerase can be used for this purpose. Buffer wash steps will be needed after the nicking reaction. An advantage of performing biochemical reactions on double stranded DNA inside a gel matrix is that multiple reactions can be performed on the DNA molecules separately without interfering with the following reactions, since the leftover reagents and buffer from previous reactions can be washed away with water or new buffers that facilitate the following reactions. The gel plugs containing DNA molecules should be reduced to about 10-20 μl in volume, and the DNA concentration in the gel plugs may preferably be less than about 6 μg/ml in order to facilitate the completion of the reaction.

[0094] FIG. 2 step 2 depicts the process of generating single stranded DNA flaps from the extensions of the 3'-ends of the nick sites using a strand displacement DNA polymerase (such as phi29, Bst DNA Polymerase, Large Fragment, SD DNA polymerase (a novel Taq DNA polymerase mutant), and Vent® (exo-) DNA Polymerase), dNTPs, a dideoxynucleotide (e.g. ddTTP) at a ratio of 1:20 to the same normal nucleotide (dTTP), and the strand displacement reaction buffer. The strand displacement reaction will stop when a ddTTP nucleotide is incorporated; therefore, the average length of the single stranded DNA flaps from all the nick sites is determined by the ratio of the dideoxynucleotide (ddTTP) to the same normal nucleotide (dTTP). The average length of the single strand DNA flaps is around 80 nucleotides when a ratio of 1:20 is used. The reaction temperature depends on the strand displacement DNA polymerase used. Another possible approach to generate a consistent length of single strand DNA flaps is to omit the ddTTP so that the strand displacement DNA polymerase reaches its maximum ability to extend the 3'-end of the nicks and displace the DNA strand from the 5'-end of the nick. Since the HMW double stranded genomic DNA molecules are not melted, the maximum length of strand displacement is limited, and is mostly under 400 bp for certain strand displacement DNA polymerases such as DNA polymerase Delta. The strand displacement reaction can be stopped by incubation with 200 mM EDTA (pH 8.0, 2 hrs or overnight on ice), followed by three washes with sterile double distilled water or TE (10 mM Tris, pH 8.0, 1 mM EDTA, 30 min, on ice) to remove the leftover reagents and reaction buffer. 
After single stranded DNA flap generation on large single DNA molecules, adaptor 1 (an oligonucleotide that can be used as a primer for template amplification and sequencing) can be ligated to the 5'-end of the single stranded DNA flap on the double stranded DNA molecules inside the gel matrix using T4 RNA/DNA ligases, T7 DNA ligase or other ligases by single strand or splint DNA ligation. The 5'-end of adaptor 1, modified with cross-linkers, can be cross-linked onto a silanized silicon or glass surface using the reaction described above.
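The relationship between the ddTTP:dTTP ratio and the average flap length stated in this paragraph can be checked with a back-of-the-envelope model. The sketch below assumes, as simplifications not stated in the disclosure, that the polymerase incorporates ddTTP and dTTP with equal efficiency and that T is incorporated at about one position in four, so that flap length follows a geometric distribution. The function name is illustrative.

```python
def expected_flap_length(dd_to_d_ratio: float, base_fraction: float = 0.25) -> float:
    """Mean flap length (nt) if extension stops at the first incorporated ddNTP.

    dd_to_d_ratio: molar ratio of dideoxynucleotide to its normal nucleotide
                   (e.g. 1/20 for ddTTP:dTTP = 1:20).
    base_fraction: assumed fraction of positions at which that base is
                   incorporated (0.25 for a uniform base composition).
    """
    if dd_to_d_ratio <= 0 or not 0 < base_fraction <= 1:
        raise ValueError("ratio must be positive and base_fraction in (0, 1]")
    # Chance that a given T position receives ddTTP rather than dTTP.
    p_stop_at_base = dd_to_d_ratio / (1.0 + dd_to_d_ratio)
    # Chance that extension terminates at any given nucleotide position.
    p_stop_per_nt = base_fraction * p_stop_at_base
    # Mean of a geometric distribution: expected positions until termination.
    return 1.0 / p_stop_per_nt
```

With a 1:20 ratio this model gives 1 / (0.25 × 1/21) = 84 nucleotides, which is consistent with the figure of around 80 nucleotides given above; real polymerases discriminate between ddNTPs and dNTPs to differing degrees, so the actual value would need to be determined empirically.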

[0095] FIG. 2 step 3 depicts the release of the double stranded DNA molecules with single stranded DNA flaps plus adaptor 1 from the gel matrix into solution and the mounting of the large DNA molecules onto the inner surfaces of the flowcell. The inner surfaces of the flowcell have been silanized and positively charged, so that the double stranded genomic DNA molecules with single stranded DNA flaps and adaptor 1 can be stretched and bound to the inner surface of the flowcell by electrophoresis or pressure-driven vacuum pump flow. Digital images of DNA molecules in flowcells can be collected under a fluorescent microscope after staining with a DNA dye such as YoYo-1, and these images can be used for mapping later sequencing reads back to the large DNA molecules. The digital image in FIG. 2 is an inset showing a single, stretched and mounted, double stranded DNA molecule on the silanized glass surface, and FIG. 3 shows images of six large, stretched and mounted DNA molecules (faint lines) with fluorescent dots (bright dots) showing where the single stranded DNA flaps are along these DNA molecules.

[0096] FIG. 2 step 4 depicts the cross-linking of the 5'-end of adaptor 1 on the single stranded DNA flaps near the nicking sites on double stranded DNA molecules onto the silanized inner surface of the flowcell, and the digestion of the linearly stretched and surface-bound HMW genomic DNA molecules with one or more frequent cutter restriction endonucleases. The DNA restriction fragments without single stranded DNA flaps can be removed by melting and washing with salt solution. The optimal restriction endonuclease selection will leave a minimal length of double stranded DNA fragments at the 3'-ends of the single stranded DNA flaps plus adaptor 1, which have been cross-linked onto the inner surface of the flowcell at their 5'-ends.

[0097] FIG. 2 step 5 depicts the removal, by melting and salt solution washing, of DNA restriction fragments without single stranded DNA fragments cross-linked to the silanized inner surface, and the ligation of the 5' end of the complement of adaptor 2 to the 3' end of the single stranded DNA fragments whose 5'-ends have been cross-linked onto the silanized inner surface of the flowcell. The 3'-end nucleotides of all the complement adaptor 2 molecules are not phosphorylated so that self-ligation will be prevented.

[0098] FIG. 2 step 6 depicts the cross-linking of both free adaptors 1 and 2 with 5' cross-linkers or a modified nucleotide to the spare or empty space on the silanized inner surface of the flowcell. The cross-linking of the 5'-ends of the two adaptors or primers will generate a lawn of the adaptors or primers to facilitate the solid phase PCR or bridge PCR that amplifies the surface cross-linked single stranded DNA flaps and generates clusters of sequencing templates.

[0099] FIG. 2 step 7 depicts the sequencing by synthesis, which is the same as in the Illumina TruSeq chemistry, using sequencing DNA polymerase, reversible terminator and fluorescent dye labeled dNTPs.

[00100] FIG. 2 step 8 depicts the mapping of the short sequence reads back to the HMW double stranded genomic DNA molecules in the fluorescent images and construction of scaffold sequencing reads.
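One way to picture the mapping in step 8 is the following minimal sketch, which is an illustration under stated assumptions rather than the disclosed procedure. It assumes each imaged molecule can be approximated by a straight line segment in image coordinates and that each sequencing cluster has a known (x, y) position; the names `project`, `order_reads`, `max_dist` and all coordinates are hypothetical.

```python
import math


def project(point, start, end):
    """Project a cluster position onto the segment start-end.
    Returns (fraction along the segment clamped to [0, 1], distance to it).
    Assumes start and end are distinct points."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    t = ((px - sx) * dx + (py - sy) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    dist = math.hypot(px - (sx + t * dx), py - (sy + t * dy))
    return t, dist


def order_reads(molecules, clusters, max_dist=2.0):
    """Assign each read to the nearest molecule within max_dist and return,
    per molecule, the read IDs ordered by position along the molecule."""
    hits = {name: [] for name in molecules}
    for read_id, point in clusters.items():
        best = None
        for name, (start, end) in molecules.items():
            t, dist = project(point, start, end)
            if dist <= max_dist and (best is None or dist < best[2]):
                best = (name, t, dist)
        if best is not None:  # reads far from every molecule stay unassigned
            hits[best[0]].append((best[1], read_id))
    return {name: [rid for _, rid in sorted(found)] for name, found in hits.items()}


# Hypothetical image coordinates for two molecules and five clusters.
molecules = {"mol_A": ((0, 0), (100, 0)), "mol_B": ((0, 50), (100, 50))}
clusters = {"r1": (80, 1), "r2": (10, -1), "r3": (50, 49), "r4": (40, 0.5), "r5": (50, 25)}
print(order_reads(molecules, clusters))
```

The ordered read list for each molecule is the raw material for a scaffold; converting the fractional positions into base-pair distances would additionally require a stretch calibration for the mounted DNA, which is not modeled here.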

[00101] By "about" is meant a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by as much as 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1% relative to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length. By "corresponds to" or "corresponding to" is meant a nucleic acid sequence that displays substantial sequence identity to a reference nucleic acid sequence (e.g., at least about 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or even up to 100% sequence identity to all or a portion of the reference nucleic acid sequence) or an amino acid sequence that displays substantial sequence similarity or identity to a reference amino acid sequence (e.g., at least 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or even up to 100% sequence similarity or identity to all or a portion of the reference amino acid sequence). The terms "wild-type" and "naturally occurring" are used interchangeably to refer to a gene or gene product that has the characteristics (e.g., nucleotide sequence, amino acid sequence, etc.) of that gene or gene product when isolated from a naturally occurring source. A wild-type gene or gene product (e.g., a polypeptide) is that which is most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type" form of the gene.

[00102] The disclosure of every patent, patent application, and publication cited herein is hereby incorporated herein by reference in its entirety. The citation of any reference herein should not be construed as an admission that such reference is available as "Prior Art" to the instant application. Throughout the specification the aim has been to describe the preferred embodiments of the invention without limiting the invention to any one embodiment or specific collection of features. Those of skill in the art will therefore appreciate that, in light of the instant disclosure, various modifications and changes can be made in the particular embodiments exemplified without departing from the scope of the present invention. All such modifications and changes are intended to be included within the scope of the appended claims.

BIBLIOGRAPHY

1. Sanger et al. (1977) "DNA sequencing with chain-terminating inhibitors." Proc Natl Acad Sci U S A. 74:5463-5467.

2. Metzker (2010) "Sequencing technologies - the next generation." Nature Reviews Genetics 11:31-46.

3. Niedringhaus et al. (2011) "Landscape of Next-Generation Sequencing Technologies." Analytical Chemistry 83:4327-4341.

4. Leamon et al. (2003) "A massively parallel PicoTiterPlate based platform for discrete picoliter-scale polymerase chain reactions." Electrophoresis 24: 682-686.

5. U.S. Patent No. 5,302,509.

6. EP-A-0640146.

7. European Patent No. 1105529B1.

8. Rothberg et al. (2011) "An integrated semiconductor device enabling non-optical genome sequencing." Nature 475: 348-352.

9. Levene et al. (2003) "Zero Mode Waveguides for single Molecule Analysis at High Concentrations." Science 299:682-686.

10. Shendure et al. (2005) "Accurate multiplex polony sequencing of an evolved bacterial genome." Science 309: 1728-1732.

11. Drmanac et al. (2002) "Sequencing by hybridization (SBH): advantages, achievements, and opportunities." Adv Biochem Eng Biotechnol. 77:75-101.

12. Deamer and Branton (2002) "Characterization of Nucleic Acids by Nanopore Analysis." Acc. Chem. Res. 35:817-825.

13. Meller et al. (2002) "Single Molecule Measurements of DNA Transported through a Nanopore." Electrophoresis 23:2583-2591.

14. Jain et al. (2015) "Improved data analysis for the MinION nanopore sequencer." Nature Methods 12:351-356.

15. Cahill et al. (2010) "Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies." PLoS One 5(7): e11518.

16. Clarke et al. (2009) "Continuous base identification for single-molecule nanopore DNA sequencing." Nature Nanotechnology 4(4): 265-270.

17. Karlsson et al. (2015) "Scaffolding of a bacterial genome using MinION nanopore sequencing." Sci Rep 5: 11996.

18. Schwartz et al. (2012) "Capturing native long-range contiguity by in situ library construction and optical sequencing." Proc Natl Acad Sci U S A. 109(46): 18749-18754.

19. Henson et al. (2012) "Next-generation sequencing and large genome assemblies." Pharmacogenomics 13(8): 901-915.

20. Zhou et al. (2007) "A single molecule system for whole genome analysis." New High Throughput Technologies for DNA Sequencing and Genomics (Ed., K.R. Mitchelson) Elsevier Scientific Publishers 2:265-300.

21. Jo et al. (2007) "A single-molecule barcoding system using nanoslits for DNA analysis." Proc Natl Acad Sci U S A. 104: 2673-2678.

22. Lam et al. (2012) "Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly." Nature Biotech. 30: 771-776.

23. Zhou et al. (2008) "Optical sequencing: Acquisition from mapped single-molecule templates." Next-Generation Genome Sequencing: Towards Personalized Medicine. Ed. Michal Janitz. 1st ed. Wiley-VCH, 133-151.

24. US 8,221,973.

25. Ramanathan et al. (2004) "An Integrative Approach for the Optical Sequencing of Single DNA Molecules" Analytical Biochemistry 330.2: 227-41.

26. U.S. Publication No. 2013/0203605 A1.

27. Schwartz et al. (2012) "Capturing native long-range contiguity by in situ library construction and optical sequencing." Proc Natl Acad Sci U S A. 109(46): 18749-18754.

28. Schwartz D & Cantor C, "Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis." Cell 37(1):67-75.




 