Title:
METHODS AND COMPOSITIONS FOR DETECTING GENOMIC METHYLATION
Document Type and Number:
WIPO Patent Application WO/2023/028478
Kind Code:
A2
Abstract:
Some embodiments relate to the preparation of nucleic acid libraries for detecting genomic methylation. Some embodiments include the use of hairpin adapters to physically link a conversion-sensitive strand with a conversion-resistant strand. Some embodiments include the use of adapters comprising tags such that a sequence derived from a template strand can be matched with a sequence derived from the complementary strand of the nucleic acid of the sample.

Inventors:
DESANTIS GRACE (US)
KENNEDY ANDREW (US)
STEEMERS FRANK J (US)
Application Number:
PCT/US2022/075327
Publication Date:
March 02, 2023
Filing Date:
August 23, 2022
Assignee:
ILLUMINA INC (US)
International Classes:
C40B30/06; C12Q1/686
Attorney, Agent or Firm:
FULLER, Michael L. (US)
Claims:
WHAT IS CLAIMED IS:

1. A method of preparing a polynucleotide library, comprising:

(a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a hairpin and a double-stranded region comprising a nick or a nickable site, wherein the nickable site comprises a uracil or a ribonucleotide;

(b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation;

(c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising a hairpin;

(d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain extended hairpins comprising a template strand and a complementary strand; and

(e) ligating a Y-adapter to an end of the extended hairpins by double-stranded ligation to obtain library polynucleotides.

2. The method of claim 1, further comprising converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides.

3. The method of claim 2, wherein the converting comprises bisulfite conversion.

4. The method of any one of claims 1-3, wherein the double-stranded region of the first adapter comprises the nickable site.

5. The method of any one of claims 1-4, wherein (c) comprises contacting the nickable site with an enzyme prior to the denaturing, wherein the enzyme is selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.

6. The method of any one of claims 1-5, wherein the Y-adapter comprises a double-stranded portion and a non-complementary portion comprising a first single strand and a second single strand.

7. The method of claim 6, wherein the first single strand and/or the second single strand comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

8. The method of any one of claims 1-7, further comprising (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

9. The method of any one of claims 1-8, wherein the double-stranded region of the first adapter comprises a tag sequence.

10. The method of claim 9, wherein (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another.

11. The method of claim 10, further comprising identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the converted polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a library polynucleotide derived from the same double-stranded template nucleic acid.

12. The method of any one of claims 1-11, wherein the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue.

13. The method of any one of claims 1-12, wherein the conversion-resistant cytosine analog is selected from the group consisting of 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP.

14. The method of any one of claims 1-13, wherein the conversion resistant cytosine analog is 5-methyl dCTP.

15. The method of any one of claims 1-14, further comprising sequencing the converted polynucleotides.

16. The method of claim 15, further comprising aligning sequences of the converted polynucleotides with a reference sequence.

17. The method of claim 15 or 16, further comprising aligning a sequence of a template strand with a sequence of a complementary strand.

18. The method of any one of claims 15-17, further comprising mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

19. The method of any one of claims 1-18, wherein the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

20. A method of preparing a polynucleotide library, comprising:

(a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a first adapter strand and second adapter strand, wherein the first adapter strand comprises a hairpin and a double-stranded region formed between a 5’ end of the first adapter strand and a 3’ end of the second adapter strand, wherein a 5' end of the second adapter strand is single-stranded;

(b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation;

(c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids; and

(d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain library polynucleotides comprising a template strand and a complementary strand.

21. The method of claim 20, further comprising converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides.

22. The method of claim 21, wherein the converting comprises bisulfite conversion.

23. The method of any one of claims 20-22, wherein the single-stranded 5' end of the second adapter strand comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

24. The method of any one of claims 20-23, wherein the tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

25. The method of any one of claims 20-24, wherein the double-stranded region of the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

26. The method of any one of claims 20-25, further comprising (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

27. The method of any one of claims 20-26, wherein the double-stranded region of the first adapter comprises a tag sequence.

28. The method of claim 27, wherein (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another.

29. The method of claim 28, further comprising identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the converted polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide derived from the same double-stranded template nucleic acid.

30. The method of any one of claims 20-29, wherein the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue.

31. The method of any one of claims 20-30, wherein the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP.

32. The method of any one of claims 20-31, wherein the conversion resistant cytosine analog is 5-methyl dCTP.

33. The method of any one of claims 21-32, further comprising sequencing the converted polynucleotides.

34. The method of claim 33, further comprising aligning sequences of the converted polynucleotides with a reference sequence.

35. The method of claim 33 or 34, further comprising aligning a sequence of a template strand with a sequence of a complementary strand.

36. The method of any one of claims 33-35, further comprising mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

37. The method of any one of claims 20-36, wherein the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

38. A method of preparing a polynucleotide library, comprising:

(a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter;

(b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids;

(c) hybridizing a first tailed primer to a region at an end of the single-stranded template nucleic acids, and extending the hybridized primer in the presence of a conversion-resistant cytosine analog to obtain extended polynucleotides comprising a template strand, a complementary strand and a double-stranded end; and

(d) ligating a second adapter comprising a hairpin to the double-stranded end of the extended polynucleotides by double-stranded ligation to obtain library polynucleotides.

39. The method of claim 38, further comprising converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides.

40. The method of claim 39, wherein the converting comprises bisulfite conversion.

41. The method of any one of claims 38-40, wherein the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue.

42. The method of any one of claims 38-41, wherein the conversion-resistant cytosine analog is selected from the group consisting of 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP.

43. The method of any one of claims 38-42, wherein the conversion resistant cytosine analog is 5-methyl dCTP.

44. The method of any one of claims 38-43, wherein the first adapter comprises conversion-resistant cytosine analogs.

45. The method of any one of claims 38-44, wherein the first tailed primer comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

46. The method of any one of claims 38-45, further comprising amplifying the converted polynucleotides with second tailed primers to obtain library polynucleotides.

47. The method of any one of claims 38-46, wherein the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

48. The method of any one of claims 38-47, further comprising sequencing the converted polynucleotides.

49. The method of claim 48, further comprising aligning sequences of the converted polynucleotides with a reference sequence.

50. The method of claim 48 or 49, further comprising aligning a sequence of a template strand with a sequence of a complementary strand.

51. The method of any one of claims 48-50, further comprising mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

52. The method of any one of claims 38-51, wherein the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

53. A method of preparing a polynucleotide library, comprising:

(a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, wherein the first adapter comprises a hairpin and cleavable site, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter;

(b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising the cleavable sites;

(c) cleaving the cleavable sites to remove a portion of the first adapter from the single-stranded template nucleic acids such that an end of the cleaved single-stranded template nucleic acids comprises a hairpin; and

(d) extending the hairpins of the cleaved single-stranded template nucleic acids in the presence of a conversion-resistant cytosine analog to obtain a polynucleotide library comprising extended hairpins comprising a template strand and a complementary strand.

54. The method of claim 53, further comprising converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides.

55. The method of claim 54, wherein the converting comprises bisulfite conversion.

56. The method of any one of claims 53-55, wherein the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue.

57. The method of any one of claims 53-56, wherein the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP.

58. The method of any one of claims 53-57, wherein the conversion resistant cytosine analog is 5-methyl dCTP.

59. The method of any one of claims 53-58, wherein the first adapter comprises conversion-resistant cytosine analogs.

60. The method of any one of claims 53-59, wherein the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

61. The method of any one of claims 53-60, wherein (c) comprises contacting the cleavable site with an enzyme selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.

62. The method of any one of claims 53-61, wherein the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

63. The method of any one of claims 53-62, further comprising sequencing the converted polynucleotides.

64. The method of claim 63, further comprising aligning sequences of the converted polynucleotides with a reference sequence.

65. The method of claim 63 or 64, further comprising aligning a sequence of a template strand with a sequence of a complementary strand.

66. The method of any one of claims 63-65, further comprising mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

67. The method of any one of claims 53-66, wherein the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

Description:
METHODS AND COMPOSITIONS FOR DETECTING GENOMIC METHYLATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Prov. App. No. 63/237297, filed August 26, 2021, which is incorporated by reference in its entirety.

REFERENCE TO SEQUENCE LISTING

[0002] The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled ILLINC566WOSEQ, created August 16, 2022, which is approximately 28 kilobytes in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

[0003] Some embodiments provided herein relate to the preparation of nucleic acid libraries for detecting genomic methylation. Some embodiments include the use of hairpin adapters to physically link a conversion-sensitive strand with a conversion-resistant strand. Some embodiments include the use of adapters comprising tags such that a sequence derived from a template strand can be matched with a sequence derived from the complementary strand of the nucleic acid of the sample.

BACKGROUND OF THE INVENTION

[0004] Biomolecule methylation, such as DNA methylation, is widespread and plays a critical role in the regulation of gene expression in development, differentiation and disease. Methylation in particular regions of genes, for example their promoter regions, can inhibit the expression of these genes. Earlier work has shown that the gene silencing effect of methylated regions is accomplished through the interaction of methylcytosine binding proteins with other structural components of the chromatin, which, in turn, makes the DNA inaccessible to transcription factors through histone deacetylation and chromatin structure changes. Genomic imprinting, in which imprinted genes are preferentially expressed from either the maternal or paternal allele, also involves DNA methylation. Deregulation of imprinting has been implicated in several developmental disorders.

[0005] In vertebrates, the DNA methylation pattern is established early in embryonic development and in general the distribution of 5-methylcytosine (5mC) along the chromosome is maintained during the life span of the organism. Stable transcriptional silencing is critical for normal development and is associated with several epigenetic modifications. If methylation patterns are not properly established or maintained, various disorders like mental retardation, immune deficiency and sporadic or inherited cancers may follow. The study of methylation is particularly pertinent to cancer research as molecular alterations during malignancy may result from a local hypermethylation of tumor suppressor genes, along with a genome wide demethylation.

[0006] The initiation and the maintenance of the inactive X-chromosome in female eutherians were found to depend on methylation. Rett syndrome (RTT) is an X-linked dominant disease caused by mutation of the MeCP2 gene, which is further complicated by the X-chromosome inactivation (XCI) pattern. The current model predicts that MeCP2 represses transcription by binding methylated CpG residues and mediating chromatin remodeling.

[0007] DNA methylation pattern changes at certain genes often alter their expression, which could lead to cancer metastasis, for example. Thus, studies of methylation patterns in selected, staged tumor samples compared to matched normal tissues from the same patient offer a novel approach to identify unique molecular markers for cancer classification. Monitoring global changes in methylation pattern has been applied to molecular classification in breast cancer. In addition, many studies have identified a few specific methylation patterns in tumor suppressor genes (for example, p16, a cyclin-dependent kinase inhibitor) in certain human cancer types.

[0008] Restriction landmark genomic scanning (RLGS) profiling of methylation pattern of 1184 CpG islands in 98 primary human tumors revealed that the total number of methylated sites is variable between and in some cases within different tumor types, suggesting there may be methylation subtypes within tumors having similar histology. Aberrant methylation of a proportion of these genes correlates with loss of gene expression.

[0009] Since genomic DNA is often the target of methylation analyses, it offers advantages in both the availability of the source materials and ease of performing such analyses. Also, methylation analyses of genomic DNA can be complementary to those used for RNA-based gene expression profiling.

[0010] Accordingly, there is a need for improved methods of determining the methylation status of DNA. The compositions, methods and systems described herein satisfy this need and provide other advantages as well.

SUMMARY OF THE INVENTION

[0011] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a hairpin and a double-stranded region comprising a nick or a nickable site, wherein the nickable site comprises a uracil or a ribonucleotide; (b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation; (c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising a hairpin; (d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain extended hairpins comprising a template strand and a complementary strand; and (e) ligating a Y-adapter to an end of the extended hairpins by double-stranded ligation to obtain library polynucleotides.

[0012] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion.
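The effect of the conversion step on a single strand can be illustrated in silico. The following is a minimal sketch, not part of the disclosed method: it assumes an uppercase A/C/G/T string and a hypothetical set of 0-based positions marking cytosines that resist conversion (for example, 5-methylcytosine). The function name `bisulfite_convert` and the input format are illustrative assumptions.

```python
def bisulfite_convert(seq: str, protected: set) -> str:
    """Return the sequence expected to be read after conversion.

    seq       -- one DNA strand, 5'->3', uppercase A/C/G/T
    protected -- 0-based positions of conversion-resistant cytosines
                 (hypothetical input format, assumed for illustration)
    """
    out = []
    for i, base in enumerate(seq):
        # A conversion-sensitive C is deaminated to U, which is read as T.
        if base == "C" and i not in protected:
            out.append("T")
        else:
            # Protected cytosines and all other bases are left unchanged.
            out.append(base)
    return "".join(out)


# Illustrative strand: the C at position 1 is treated as methylated.
template = "ACGTCCGA"
converted = bisulfite_convert(template, protected={1})
```

In this sketch only the cytosines at positions 4 and 5 change, so the converted read differs from the original strand exactly at the conversion-sensitive positions.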

[0013] In some embodiments, the double-stranded region of the first adapter comprises the nickable site.

[0014] In some embodiments, (c) comprises contacting the nickable site with an enzyme prior to the denaturing, wherein the enzyme is selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.

[0015] In some embodiments, the Y-adapter comprises a double-stranded portion and a non-complementary portion comprising a first single strand and a second single strand.

[0016] In some embodiments, the first single strand and/or the second single strand comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0017] Some embodiments also include (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

[0018] In some embodiments, the double-stranded region of the first adapter comprises a tag sequence.

[0019] In some embodiments, (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another.

[0020] Some embodiments also include identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the converted polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a library polynucleotide derived from the same double-stranded template nucleic acid.
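Matching sequences by tag can be sketched as a simple grouping operation. The example below is illustrative only and rests on assumptions not stated in the source: the tag occupies the first `tag_len` bases of each read, and the tag itself is unaffected by conversion (for instance, because it contains no conversion-sensitive cytosines). The names `group_by_tag`, `reads` and `tag_len` are hypothetical.

```python
from collections import defaultdict


def group_by_tag(reads, tag_len):
    """Group read identifiers that share the same leading tag sequence.

    reads   -- iterable of (read_id, sequence) pairs
    tag_len -- number of leading bases treated as the tag (assumed layout)
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        groups[seq[:tag_len]].append(read_id)
    # Keep only tags seen on more than one read; these are candidate
    # sequences derived from the same double-stranded template.
    return {tag: ids for tag, ids in groups.items() if len(ids) > 1}


# Illustrative reads: r1 and r2 carry the same 4-base tag.
reads = [
    ("r1", "AGTAACGT"),
    ("r2", "AGTATTGA"),
    ("r3", "GATACCCC"),
]
matched = group_by_tag(reads, tag_len=4)
```

Exact string matching is the simplest choice here; a practical pipeline would likely tolerate sequencing errors in the tag, which this sketch does not attempt.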

[0021] In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP. In some embodiments, the conversion resistant cytosine analog is 5-methyl dCTP.

[0022] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.
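One way to picture aligning a template strand with its complementary strand is the following minimal sketch. It assumes that the complementary strand was synthesized with a conversion-resistant cytosine analog and is therefore unchanged by conversion, so that its reverse complement recovers the unconverted template sequence. It further assumes, purely for illustration, that the two reads are already trimmed to the same interval with no insertions or deletions. The function names and the call labels are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def call_methylation(converted_template, complementary_read):
    """Compare a converted template read with its complementary-strand read.

    Returns the inferred unconverted template sequence and a dict that
    maps each cytosine position (0-based) to a call.
    """
    # The conversion-resistant complementary strand preserves the
    # original sequence, so its reverse complement is the template.
    original = reverse_complement(complementary_read)
    if len(original) != len(converted_template):
        raise ValueError("reads must cover the same interval")
    calls = {}
    for pos, (ref, obs) in enumerate(zip(original, converted_template)):
        if ref != "C":
            continue
        if obs == "C":
            calls[pos] = "methylated"    # cytosine resisted conversion
        elif obs == "T":
            calls[pos] = "unmethylated"  # cytosine was converted
        else:
            calls[pos] = "ambiguous"     # mismatch, e.g. sequencing error
    return original, calls


# Illustrative pair of reads from one extended hairpin.
original, calls = call_methylation("ACGTTTGA", "TCGGACGT")
```

Because the unconverted sequence is recovered from the same molecule, this comparison does not itself require a reference genome; alignment to a reference remains a separate step.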

[0023] In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0024] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a first adapter strand and second adapter strand, wherein the first adapter strand comprises a hairpin and a double-stranded region formed between a 5' end of the first adapter strand and a 3' end of the second adapter strand, wherein a 5' end of the second adapter strand is single-stranded; (b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation; (c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids; and (d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain library polynucleotides comprising a template strand and a complementary strand.

[0025] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion.

[0026] In some embodiments, the single-stranded 5' end of the second adapter strand comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0027] In some embodiments, the tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0028] In some embodiments, the double-stranded region of the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0029] Some embodiments also include (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

[0030] In some embodiments, the double-stranded region of the first adapter comprises a tag sequence.

[0031] In some embodiments, (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another.

[0032] Some embodiments also include identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the converted polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide derived from the same double-stranded template nucleic acid.

[0033] In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP. In some embodiments, the conversion resistant cytosine analog is 5-methyl dCTP.

[0034] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0035] In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0036] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter; (b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids; (c) hybridizing a first tailed primer to a region at an end of the single-stranded template nucleic acids, and extending the hybridized primer in the presence of a conversion-resistant cytosine analog to obtain extended polynucleotides comprising a template strand, a complementary strand and a double-stranded end; and (d) ligating a second adapter comprising a hairpin to the double-stranded end of the extended polynucleotides by double-stranded ligation to obtain library polynucleotides.

[0037] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP. In some embodiments, the conversion resistant cytosine analog is 5-methyl dCTP.

[0038] In some embodiments, the first adapter comprises conversion-resistant cytosine analogs.

[0039] In some embodiments, the first tailed primer comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0040] Some embodiments also include amplifying the converted polynucleotides with second tailed primers to obtain library polynucleotides. In some embodiments, the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0041] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0042] In some embodiments, the plurality' of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0043] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, wherein the first adapter comprises a hairpin and cleavable site, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter, (b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising the cleavable sites; (c) cleaving the cleavable sites to remove a portion of the first adapter from the single-stranded template nucleic acids such that an end of the cleaved single- stranded template nucleic acids comprises a hairpin; and (d) extending the hairpins of the cleaved single-stranded template nucleic acids in the presence of a conversion-resistant cytosine analog to obtain extended hairpins comprising a template strand and a complementary strand.

[0044] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, and 5-aza dCTP. In some embodiments, the conversion-resistant cytosine analog is 5-methyl dCTP.

[0045] In some embodiments, the first adapter comprises conversion-resistant cytosine analogs. In some embodiments, the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0046] In some embodiments, (c) comprises contacting the cleavable site with an enzyme selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.

[0047] In some embodiments, the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0048] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0049] In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] FIG. 1 depicts a scheme in which a converted polynucleotide is prepared from genomic DNA or cell-free DNA, and a converted ‘epigenetic’ portion of the converted polynucleotide is sequenced prior to a ‘genetic’ portion of the converted polynucleotide.

[0051] FIG. 2 depicts a scheme in which a converted polynucleotide is prepared from genomic DNA or cell-free DNA, and a ‘genetic’ portion of the converted polynucleotide is sequenced prior to an ‘epigenetic’ portion of the converted polynucleotide.

[0052] FIG. 3 depicts a workflow for the generation of polynucleotides in which a copy-linking adapter comprising a hairpin, a nick and a non-random molecular tag is ligated to a double-stranded DNA fragment by double-stranded ligation, the double-stranded DNA is denatured, the hairpin is extended in the presence of conversion-resistant nucleotides, and a Y-adapter is ligated to a double-stranded end of the extended hairpin.

[0053] FIG. 4 depicts a further processing workflow for the product shown in FIG. 3 in which non-methylated cytosines are converted, such as selectively converted to other nucleotides, to obtain converted polynucleotides, and the converted polynucleotides are further processed, such as amplified, indexed and/or enriched, and sequenced.

[0054] FIG. 5 depicts a portion of a workflow in which an alternative Y-shaped adapter is used to modify the order in which a ‘genetic’ portion of the converted polynucleotide is sequenced prior to an ‘epigenetic’ portion of the converted polynucleotide.

[0055] FIG. 6 depicts a workflow with an adapter comprising a cleavable site, for the generation of polynucleotides in which a copy-linking adapter comprising a hairpin, a cleavable site (a uracil) and a non-random molecular tag is ligated to a double-stranded DNA fragment by double-stranded ligation. The site is then cleaved and the double-stranded DNA is denatured, the hairpin is extended in the presence of conversion-resistant nucleotides, and a Y-adapter is ligated to a double-stranded end of the extended hairpin. The products can undergo further processing, such as by methods and steps depicted in FIG. 4.

[0056] FIG. 7 depicts a workflow comprising a single ligation step for the generation of polynucleotides in which a copy-linking adapter is ligated to a double-stranded DNA fragment by double-stranded ligation; the double-stranded DNA is denatured; and the hairpin is extended in the presence of conversion-resistant nucleotides. The copy-linking adapter comprises a first adapter strand and a second adapter strand, wherein the first adapter strand comprises a hairpin and a double-stranded region formed between a 5' end of the first adapter strand and a 3' end of the second adapter strand, wherein a 5' end of the second adapter strand is single-stranded.

[0057] FIG. 8 depicts further processing of the product shown in FIG. 7 in which non-methylated cytosines are selectively converted to other nucleotides to obtain converted polynucleotides; and the converted polynucleotides are amplified using tailed primers containing P5 or P7 sequences and index sequences (i5 or i7), and sequenced.

[0058] FIG. 9 depicts a workflow comprising use of tagmented DNA with a ligation step for the generation of polynucleotides. In the workflow, the tagmented DNA fragments comprise adapters; the double-stranded DNA is denatured; a tailed primer is annealed to the single-stranded polynucleotides and extended in the presence of conversion-resistant nucleotides to obtain extended polynucleotides comprising a template strand, a complementary strand and a double-stranded end; a second adapter comprising a hairpin is ligated to the double-stranded end of the extended polynucleotides by double-stranded ligation to obtain library polynucleotides; and the library polynucleotides are converted and undergo further processing, such as the addition of indexes and/or flowcell adapters by amplification with tailed primers.

[0059] FIG. 10 depicts a workflow comprising use of tagmented DNA with hairpin adapters for the generation of polynucleotides. In the workflow, the tagmented DNA fragments comprise adapters comprising a sequence capable of forming a hairpin and a cleavable site (U nucleotide). The fragments are end-filled, and the cleavable site is cleaved. The hairpin of the adapter is extended in the presence of conversion-resistant nucleotides to obtain extended polynucleotides. The extended hairpins are converted to obtain converted polynucleotides.

[0060] FIG. 11 depicts further processing of the converted polynucleotides depicted in FIG. 10 in which flow cell adapters (e.g., P5, P7 sequences) and indexes (e.g., i5 and i7) are added to the converted polynucleotides by amplification with tailed primers, and the products are sequenced.

[0061] FIG. 12A is a graphical image which depicts a single-stranded converted polynucleotide.

[0062] FIG. 12B is a block chart which graphically depicts a double-stranded product of amplification of the single-stranded converted polynucleotide with tailed primers adding P7, P5, and index (i7, i5) sequences.

[0063] FIG. 12C depicts an example first adapter comprising a hairpin, a neck, and a cleavable site comprising a uracil nucleotide useful for workflows described herein.

DETAILED DESCRIPTION

[0064] Some embodiments of the methods and compositions provided herein relate to preparation of libraries to determine methylation status of a nucleic acid. Some such embodiments include denaturing a double-stranded polynucleotide to obtain a first single-strand containing methylated and non-methylated cytosines (conversion-sensitive); copying the first single-strand in the presence of conversion-resistant nucleotides to obtain a second single-strand; and selectively converting conversion-sensitive cytosines in the first strand to another nucleotide. Some embodiments provided herein relate to efficient preparation of libraries in which the first strand and second strand remain physically linked to one another.

[0065] Some embodiments include the use of a double-stranded ligation with hairpin adapters and Y-shaped adapters. Some embodiments include the use of a single ligation step with modified hairpin adapters. Some embodiments include the use of transposomes to generate nucleic acid fragments comprising adapters at each end of the fragments, and the use of a hairpin adapter. Some embodiments include the use of transposomes to generate nucleic acid fragments comprising adapters at each end of the fragments in which the adapters are capable of forming hairpins. Some embodiments include the use of adapters comprising tags such that a sequence derived from a template strand of a nucleic acid of a sample can be matched with a sequence derived from the complementary strand of the nucleic acid of the sample.

[0066] The methylation status of nucleic acids is important information that is useful in many biological assays and studies. Very often, it is of particular interest to identify patterns of methylation at specific regions in the genome. Also, it is often of particular interest to identify the methylation status of specific CpG dinucleotides.

[0067] The methylation level and pattern of a locus in a nucleic acid sample can be determined using any of a variety of methods described herein and capable of distinguishing presence or absence of a methyl group on a nucleotide base of the nucleic acid. In the case of DNA, methylation, when present, typically occurs as 5-methylcytosine (5-mCyt) in CpG dinucleotides. Methylation of CpG dinucleotide sequences or other methylated motifs in DNA can be measured using any of a variety of techniques used in the art for the analysis of specific CpG dinucleotide methylation status.

[0068] A commonly-used method of determining the methylation level and/or pattern of DNA requires methylation status-dependent conversion of cytosine in order to distinguish between methylated and non-methylated CpG dinucleotide sequences. For example, methylation of CpG dinucleotide sequences can be measured by employing cytosine conversion-based technologies, which rely on methylation status-dependent chemical modification of CpG sequences within isolated genomic DNA, or fragments thereof, followed by DNA sequence analysis. Chemical reagents that are able to distinguish between methylated and non-methylated CpG dinucleotide sequences include hydrazine, which cleaves the nucleic acid, and bisulfite treatment. Bisulfite treatment followed by alkaline hydrolysis specifically converts non-methylated cytosine to uracil, leaving 5-methylcytosine unmodified as described by Olek A., Nucleic Acids Res. 24:5064-6, 1996 or Frommer et al., Proc. Natl. Acad. Sci. USA 89: 1827-1831 (1992), each of which is incorporated herein by reference in its entirety. The bisulfite-treated DNA can subsequently be analyzed by conventional molecular techniques, such as PCR amplification, sequencing, and detection comprising oligonucleotide hybridization.

[0069] Some embodiments of the invention include the use of bisulfite conversion conditions and bisulfite-resistant cytosine analogs. One consequence of bisulfite-mediated deamination of cytosine is that the bisulfite-treated cytosine is converted to uracil, which reduces the complexity of the genome. Specifically, a typical 4-base genome (A,T,C,G) is essentially reduced to a 3-base genome (A,T,G) because uracil is read as thymine during downstream analysis techniques such as PCR and sequencing reactions. Thus, the only cytosines present are those that were methylated prior to bisulfite conversion. Because the complexity of the genome is reduced, standard methods for comparing and/or aligning a bisulfite-converted sequence to the pre-conversion genome can be cumbersome and, in some cases, ineffective. For example, problems may arise when aligning converted fragments to the genome, especially when using short sequences. Accordingly, embodiments of the invention facilitate identification of the genomic context of bisulfite-converted DNA as described below.
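The reduction from a 4-base to a 3-base sequence can be illustrated in silico. The following Python sketch is an illustration only and is not part of the disclosed methods; the function name `bisulfite_convert`, the example sequence, and the 0-based `methylated_positions` argument are hypothetical. It assumes complete conversion of every non-methylated cytosine and that uracil is read as thymine downstream.

```python
def bisulfite_convert(strand: str, methylated_positions: frozenset = frozenset()) -> str:
    """Return the sequence as it would be read after conversion and amplification.

    Assumption: conversion is complete, so every C that is not listed in
    `methylated_positions` (0-based indexes) becomes U and is read as T.
    """
    read = []
    for index, base in enumerate(strand.upper()):
        if base == "C" and index not in methylated_positions:
            read.append("T")  # non-methylated C -> U, read as T
        else:
            read.append(base)  # A, G, T and methylated C are unchanged
    return "".join(read)


if __name__ == "__main__":
    template = "ACGTCCGA"  # hypothetical sequence; the C at index 1 is methylated
    print(bisulfite_convert(template, frozenset({1})))  # ACGTTTGA
    print(bisulfite_convert(template))                  # ATGTTTGA (no C remains)
```

In the second call no cytosine is methylated, so the output contains only A, T and G, which is the loss of complexity that makes alignment of short converted reads difficult.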

[0070] Provided herein are methods and compositions that ameliorate problems that arise from the reduced genomic complexity after bisulfite conversion of nucleic acids. For example, some embodiments described herein relate to methods of sequencing nucleic acids and determining the methylation level and/or pattern of the nucleic acids. Other embodiments relate to nucleic acid pairs, arrays and methods of making arrays useful for determining the methylation level and/or pattern of nucleic acids. Using the methods and/or compositions described herein, complexity of the target nucleic acids is preserved by keeping track of complementary strands after the strands have been subjected to bisulfite conversion of nucleic acids.

[0071] Some embodiments provided herein relate to making methylated copies of the target nucleic acids prior to bisulfite conversion. The methylated copies can then be sequenced and compared or aligned to the converted target nucleic acids. The methods provided herein are particularly useful in multiplex formats wherein several nucleic acids having different sequences and/or different methylation patterns are assayed in a common sample or pool. Thus, embodiments of the methods set forth herein can provide the advantage of avoiding the need for separation of different sequences into separate vessels during one or more steps of a methylation detection assay. For example, as set forth in further detail below in regard to particular embodiments, several pairs of nucleic acids can be treated with bisulfite in a common pool and the differences in methylation status for individual nucleic acids from the pool can then be determined.

[0072] Although many of the methods and compositions disclosed herein are exemplified or described in connection with DNA, it will be appreciated that these methods and compositions can be used with or include other nucleic acids. Furthermore, it will be understood that methods and compositions described in the context of single nucleic acid molecules can also relate to methods and compositions that include or comprise a plurality of the same, similar and/or different nucleic acids. Such embodiments are often referred to as multiplex embodiments. In these multiplex embodiments, the methods are performed using and the compositions comprise a population of nucleic acids. In some embodiments, the population of nucleic acids may be divided into one or more sub-populations.

[0073] Certain embodiments provided herein include aspects disclosed in U.S. Pat. No. 8,895,268 which is incorporated by reference in its entirety.

Definitions

[0074] As used herein, reference to determining the methylation status and like terms refers to at least one or more of the following: 1) determining the level or amount of cytosine methylation in a sample, 2) determining the position of methylated cytosine residues within a sequence, 3) determining the pattern of methylated cytosine in a sequence, and/or 4) determining the whole sequence including the specific position and identity of methylated residues in the context of the sequence.

[0075] As used herein, "nucleic acid polymerase" or "polymerase" refers to an enzyme that catalyzes the polymerization of nucleoside triphosphates, and encompasses DNA polymerases, RNA polymerases, reverse transcriptases and the like. Generally, the enzyme will initiate synthesis at the 3'-end of the primer annealed to a template sequence, and will proceed in the 5'-direction along the template, and if possessing a 5' to 3' nuclease activity, it may hydrolyze intervening, annealed probe to release both labeled and unlabeled probe fragments, until synthesis terminates.

[0076] As used herein, the term "DNA polymerase" refers to an enzyme which catalyzes the synthesis of DNA. It uses one strand of the DNA duplex as a template. For example, templates may include, but are not limited to, single-stranded DNA, partially duplexed DNA and nicked double-stranded DNA. The polymerase can generate a new strand from primers hybridized to the template. As presented herein, an oligonucleotide primer is used which has a free 3'-OH group. The polymerase then copies the template in the 5' to 3' direction provided that sufficient quantities of free nucleotides, such as dATP, dGTP, dCTP, 5-methyl dCTP and dTTP are present. Examples of DNA polymerases include, but are not limited to, E. coli DNA polymerase I, the large proteolytic fragment of E. coli DNA polymerase I, commonly known as "Klenow" polymerase, "Taq" polymerase, T7 polymerase, Bst DNA polymerase, T4 polymerase, T5 polymerase, reverse transcriptase, exo-BCA polymerase, Thermus thermophilus (Tth) DNA polymerase, Bacillus stearothermophilus DNA polymerase, Thermococcus litoralis DNA polymerase, Thermus aquaticus (Taq) DNA polymerase and Pyrococcus furiosus (Pfu) DNA polymerase.

[0077] In embodiments of the present invention, the DNA polymerase copies the template in the 5' to 3' direction in the presence of 5-methyl dCTP, or any other suitable nucleotide which is resistant to cytosine conversion such as bisulfite conversion.

[0078] As used herein, "converted," when used in reference to a nucleic acid or portion thereof, refers to nucleic acid or a portion thereof which has been treated under conditions sufficient to convert cytosine to another base. As used herein, "bisulfite-converted", "bisulfite-treated" and like terms, when used in reference to a nucleic acid or portion thereof, refer to nucleic acid or a portion thereof which has been treated with sodium bisulfite under conditions sufficient to convert cytosine to uracil. Thus, for example, in some embodiments, template nucleic acid will have at least one cytosine residue that is not methylated and which is converted to uracil by bisulfite treatment. However, the template nucleic acid need not comprise a non-methylated cytosine, either because all cytosines are methylated or because no cytosine residues are present in the template nucleic acid.

[0079] As used herein, "non-converted," when used in reference to a nucleic acid or portion thereof, refers to a nucleic acid or portion thereof where one or more of the cytosines, if present, are not converted to another base, such as uracil, after conversion treatment, such as treatment with sodium bisulfite. Thus, for example, a non-converted complementary copy is a nucleic acid that comprises one or more bisulfite-resistant cytosine analogs that prevent the conversion of cytosine to uracil.

[0080] As used herein, "conversion-resistant cytosine analog" and like terms refer to cytosine analogs which, when incorporated into DNA, RNA, or other nucleic acid polymers, are refractory to being changed into another base under conditions where cytosine is converted into the other base. As used herein, "bisulfite-resistant cytosine analog" and like terms refer to cytosine analogs which, when incorporated into DNA, RNA, or other nucleic acid polymers, are refractory to deamination caused in reactions with sodium bisulfite. Bisulfite-resistant cytosine analogs are known in the art and can include any cytosine analog with the above-described property. Thus, for example, 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, 5-aza dCTP, or any other bisulfite-resistant nucleotides comprising a cytosine analog can be used in the present embodiments as bisulfite-resistant cytosine analogs. Typically, the bisulfite-resistant cytosine analog is 5-methyl dCTP. Although 5-methyl dCTP and 5-methylcytosine are referred to in the description, examples and figures, it will be readily understood that any suitable bisulfite-resistant cytosine analog can be used in such embodiments.

[0081] In some embodiments, "cytosine" refers to nucleotides, nucleosides, nucleotide triphosphates and the like which include cytosine (i.e., 4-amino-3H-pyrimidin-2-one) as the base. Thus, for example, where embodiments describe replacing cytosine with a bisulfite-resistant cytosine analog such as 5-methyl cytosine, it will be understood that the term cytosine does not include cytosine residues that are methylated at the 5-position of the cytosine base, unless specifically indicated to the contrary. In some embodiments, the term cytosine can refer to a base structure that is common between cytosine and cytosine analogs, including bisulfite-resistant cytosine analogs, as described in detail herein.

[0082] In some embodiments, 5-methyl dCTP replaces all cytosines that complement guanine positions in a complementary copy. In some embodiments, 5-methyl dCTP replaces at least one cytosine that complements a guanine position in the complementary copy. In other embodiments, 5-methyl dCTP replaces at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5% or at least 1% of the cytosines that complement guanine positions in the complementary copy. Upon treatment of the complementary copy with sodium bisulfite, those methylated cytosines in the complementary copy are refractory to the deamination reaction, and therefore the genomic complexity is maintained.

[0083] As used herein, "template nucleic acid" refers to that strand of a polynucleotide from which a complementary polynucleotide strand can be hybridized or synthesized by a nucleic acid polymerase, for example, in a primer extension reaction. In some embodiments, the template nucleic acid is a template DNA.

[0084] In embodiments disclosed herein, a template nucleic acid is provided and a complementary copy of the template nucleic acid is generated or provided. The template nucleic acid can be either a single DNA strand or one or both of the single strands in a double-stranded molecule. In embodiments where the template nucleic acid is single stranded, the complementary copy is generated by extending an oligonucleotide primer with a nucleic acid polymerase such that a complementary copy of some or part of the template strand is extended in the 3' direction of the oligonucleotide primer. In a preferred embodiment, the template nucleic acid comprises a template DNA.

[0085] In embodiments where the nucleic acid is double-stranded, one or both strands may serve as the template strand for nucleic acid polymerase. For example, where one strand (the "sense" strand) serves as template, a complementary copy is generated which is complementary to the sense strand. Likewise, where the antisense strand serves as template, a complementary copy is generated which is complementary to the antisense strand. Where both strands serve as template, a separate complementary copy is generated for each of the sense and antisense strands. In a preferred embodiment, each strand of a double-stranded DNA molecule is a template nucleic acid.

[0086] As used herein, the term "complementary" refers to nucleic acid sequences that are capable of forming Watson-Crick base-pairs. For example, a complementary sequence of a first sequence is a sequence which is capable of forming Watson-Crick base-pairs with the first sequence. The term "complementary" does not necessarily mean that a sequence is complementary to the full-length of its complementary strand, but the term can mean that the sequence is complementary to a portion thereof. Thus, in some embodiments, complementarity encompasses sequences that are complementary along the entire length of the sequence or a portion thereof. For example, two sequences can be complementary to each other along at least 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or at least 200 consecutive nucleotides. Also, as used herein, a statement that one sequence is complementary to another sequence also encompasses situations in which the two sequences have some mismatches. For example, complementary sequences can include sequences that are complementary along at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the length of the sequence. Here, the term "sequence" encompasses, but is not limited to, nucleic acid sequences, polynucleotides, oligonucleotides, probes, primers, primer-specific regions, and target-specific regions. Despite the mismatches, the two sequences should have the ability to selectively hybridize to one another under appropriate conditions.
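As a rough illustration of complementarity expressed as a percentage of sequence length, the Python sketch below counts Watson-Crick matches between two antiparallel strands. It is a simplified sketch and not a definition used by the methods herein: the function names are hypothetical, both sequences are assumed to be the same length, written 5' to 3', and already in register (no gaps), and only A, C, G and T are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_complementary(first: str, second: str) -> float:
    """Fraction of positions at which `second` base-pairs with `first`.

    Assumption: equal-length, ungapped, antiparallel strands.
    """
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    perfect_partner = reverse_complement(first)
    matches = sum(a == b for a, b in zip(perfect_partner, second.upper()))
    return matches / len(first)


if __name__ == "__main__":
    first = "ACGTACGTAC"
    print(fraction_complementary(first, "GTACGTACGT"))  # 1.0, fully complementary
    print(fraction_complementary(first, "GTACGTACGA"))  # 0.9, one mismatch in ten
```

A value of 0.9 corresponds to sequences that are complementary along 90% of their length in the sense described above; whether such strands hybridize selectively still depends on the conditions used.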

[0087] A first nucleic acid strand that is converted, for example using bisulfite treatment, can have conversion-induced noncomplementarity with a second strand to which it was previously complementary. For example, cytosines in a first nucleic acid strand may be converted to uracils, such that positions in the first strand that formerly contained cytosines capable of forming Watson-Crick base pairs with guanines at comparable positions in the second, complementary strand are no longer capable of doing so. Nevertheless, for ease of identification, the converted nucleic acid strand may be identified with respect to complementarity of the previous, non-converted first strand to the second strand.

[0088] As used herein, the terms "pairing" or "paired" and the like refer to methods used to match a nucleic acid template with its corresponding complementary copy. For example, pairing can be accomplished via a physical tether between the complementary copy and the bisulfite-converted template nucleic acid. Additionally or alternatively, pairing can be accomplished via tag molecules which identify the complementary copy and the bisulfite-converted template nucleic acid as members of a nucleic acid pair. Tag molecules are useful, for example, where the nucleic acid template and the complementary copy are not physically tethered together. Thus, through the use of tag molecules, the two paired members are matched and recognized as members of a pair. The use of covalent tethers and tag molecules is described in further detail below. Also, it will be appreciated that the term "pairing" is not limited to an action or step that must occur at a particular point in the processes described herein. Pairing of nucleic acid sequences can occur at any point in the process that would allow the information present in a complementary strand to be associated with the information present in a corresponding template strand. As such, when used as a noun, the term "pairing" can refer to any two or more associated nucleic acid strands, whether associated physically or via other methods such as tagging or labeling, that are present at any step of a process described herein or that are present in any of the compositions described herein.
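Pairing by tag can be pictured as grouping sequence reads on a shared tag sequence. The Python sketch below is a minimal illustration under stated assumptions: each read is represented as a hypothetical `(tag, strand_label, sequence)` tuple, the labels "template" and "copy" are invented for the example, tags are assumed to be read without error, and each tag is assumed to mark a single template molecule.

```python
from collections import defaultdict


def pair_by_tag(reads):
    """Group reads that carry the same tag and keep only complete pairs.

    `reads` is an iterable of (tag, strand_label, sequence) tuples, where
    strand_label is "template" or "copy" (hypothetical labels).
    Returns {tag: {"template": sequence, "copy": sequence}}.
    """
    groups = defaultdict(dict)
    for tag, strand_label, sequence in reads:
        groups[tag][strand_label] = sequence
    # A pairing requires both members to have been observed for the tag.
    return {
        tag: members
        for tag, members in groups.items()
        if {"template", "copy"} <= set(members)
    }


if __name__ == "__main__":
    reads = [
        ("AACC", "template", "ACGTTTGA"),  # converted template strand
        ("AACC", "copy", "TCGGACGT"),      # non-converted complementary copy
        ("GGTT", "template", "TTGATTGA"),  # no partner observed for this tag
    ]
    print(pair_by_tag(reads))
```

Only the tag "AACC" is returned here because it is the only tag for which both members of the pair were seen; a practical implementation would also need to tolerate sequencing errors within the tag.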

Preparing nucleic acid libraries

[0089] Some embodiments of the methods and compositions provided herein include preparing nucleic acid libraries. Such libraries can be useful to determine the location of methylated residues in a nucleic acid, such as genomic DNA or cell-free DNA.

[0090] Some embodiments include methods of preparing a nucleic acid library, comprising (a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a hairpin and a double-stranded region comprising a nick or a nickable site, wherein the nickable site comprises a uracil or a ribonucleotide; (b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation; (c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising a hairpin; (d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain extended hairpins comprising a template strand and a complementary strand; and (e) ligating a Y-adapter to an end of the extended hairpins by double-stranded ligation to obtain library polynucleotides. In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0091] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, and 5-aza dCTP. In some embodiments, the conversion-resistant cytosine analog is 5-methyl dCTP.

[0092] In some embodiments, the double-stranded region of the first adapter comprises the nick. In some embodiments, the double-stranded region of the first adapter comprises the nickable site. In some embodiments, (c) comprises contacting the nickable site with an enzyme prior to the denaturing, wherein the enzyme is selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.

[0093] In some embodiments, the Y-adapter comprises a double-stranded portion and a non-complementary portion comprising a first single strand and a second single strand. In some embodiments, the first single strand and/or the second single strand comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0094] In some embodiments, the double-stranded region of the first adapter comprises a tag sequence. In some embodiments, (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another. Some embodiments also include identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the library polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide derived from the same double-stranded template nucleic acid.

[0095] Some embodiments also include a method comprising (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

[0096] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include analyzing the sequence data. For example, some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0097] An example workflow is depicted in FIG. 3 and FIG. 4. For example, cell-free DNA of about 170 bp or fragmented genomic DNA is end-repaired and A-tailed to obtain double-stranded template nucleic acids. First adapters, such as copy-linking adapters, are ligated to each end of the double-stranded template nucleic acids by double-stranded ligation using a T4 DNA ligase. The first adapter includes a hairpin of about 8-10 bp in the neck and 4 nucleotides in the loop, and a double-stranded portion of about 5-10 nucleotides. The double-stranded portion of the first adapter can include a tag sequence, such as a non-random molecular tag that can be used to track products, such as nucleic acids and sequences derived from a particular double-stranded template nucleic acid. Also, double-stranded ligation has increased efficiency over methods that include single-stranded ligation.

[0098] In some embodiments, the first adapter comprises a nick in a single strand between the hairpin and the double-stranded portion. The nicks in each first adapter ligated to a double-stranded template nucleic acid allow the double-stranded template to be denatured to obtain single-stranded template nucleic acids, such as a Watson strand and a Crick strand of the double-stranded template nucleic acid. The single-stranded template nucleic acids comprise the hairpin of the first adapter and a single strand of the double-stranded portion of the first adapter. The hairpin in each single-stranded template nucleic acid can be extended to obtain an extended hairpin comprising the single-stranded template (template strand), and a complement of the single-stranded template (complementary strand). The extension is performed in the presence of conversion-resistant nucleotides, including conversion-resistant cytosine analogs. The double-stranded end of the extended hairpin can be A-tailed and a Y-adapter can be ligated to the double-stranded end of the extended hairpin by double-stranded ligation with a T4 DNA ligase to obtain library polynucleotides. The Y-adapter is double-stranded and includes a portion in which the strands are complementary to one another, and a portion in which the strands are not complementary to one another. Each strand of the non-complementary portion of the Y-adapter can include sequencing primer binding sites, such as P5, P5', P7, and P7' sequencing primer binding sites, amplification primer binding sites, and/or capture probe binding sites. The library polynucleotides can undergo selective conversion of non-methylated cytosine residues, such as bisulfite conversion, to obtain converted polynucleotides. After conversion, the strands of the extended hairpins are typically no longer complementary to one another and form a single-stranded molecule. The single-stranded molecule can be further processed via the sequences provided by the Y-adapters, for example, amplifying and/or sequencing. The single-stranded molecule can have additional tags, such as indexes, added via extension and/or amplification using tailed primers. The sequences of the single-stranded molecules can be determined and analyzed. For example, sequences can be aligned to a reference sequence, such as a reference genome. An ‘epigenetic’ read of a single-stranded molecule can be aligned with a ‘genetic’ read of the single-stranded molecule. Sequences derived from the Watson strand and the Crick strand of a double-stranded template nucleic acid can be used to construct higher quality consensus sequences.
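The relationship between the conversion-sensitive template strand and the conversion-resistant complementary strand can be illustrated with a small in silico sketch. This is only an illustration under simplifying assumptions, not the claimed method: sequences are plain strings, conversion is modeled as an unprotected C being read as T, every cytosine in the copied strand is treated as a conversion-resistant analog, and all function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def extend_hairpin(template):
    """Model hairpin extension: the copy is the reverse complement of the
    template, synthesized (by assumption) with a conversion-resistant
    cytosine analog such as 5-methyl dCTP."""
    return reverse_complement(template)

def convert(seq, protected):
    """Model selective conversion: a C at a position not listed in
    `protected` is read as T after conversion; all other bases are kept."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )

def call_methylation(genetic, epigenetic):
    """For each C in the genetic read, report True if the epigenetic read
    still shows C at that position (methylated), else False."""
    return {
        i: e == "C"
        for i, (g, e) in enumerate(zip(genetic, epigenetic))
        if g == "C"
    }

template = "ACGTCCGA"          # hypothetical template strand
methylated = {1}               # hypothetical: only the C at index 1 is methylated

copy = extend_hairpin(template)
# Every position of the copy is protected because its cytosines are analogs.
copy_converted = convert(copy, protected=set(range(len(copy))))
epigenetic_read = convert(template, protected=methylated)
genetic_read = reverse_complement(copy_converted)

calls = call_methylation(genetic_read, epigenetic_read)
print(genetic_read, epigenetic_read, calls)
```

In this toy model the 'genetic' read recovers the original sequence, while the 'epigenetic' read differs from it only at cytosines that were not methylated, which is what allows the two reads of one molecule to be compared position by position.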

[0099] In some embodiments, the order in which an ‘epigenetic’ read of a single-stranded molecule and a ‘genetic’ read of the single-stranded molecule are obtained can be modified by modifying the Y-adapter and/or sequencing primers. An example workflow is depicted in FIG. 5. In some embodiments, the ‘genetic’ read of a single-stranded molecule has a higher complexity than an ‘epigenetic’ read of a single-stranded molecule because certain conversion-sensitive nucleotides would have been converted to other nucleotides in the ‘epigenetic’ read. Therefore, modifying the Y-adapter to position the high-complexity (unmodified) sequence in Read 1 (cluster generation) can facilitate high-quality sequencing data from epigenetic-converted sequence libraries.
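The difference in complexity between the two reads can be made concrete with a short sketch. The specification does not define a complexity metric; Shannon entropy of the base composition is used here purely as an assumed proxy, and the sequences are hypothetical.

```python
import math
from collections import Counter

def base_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of `seq`.
    Used here as an assumed stand-in for sequence 'complexity'."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

genetic_read = "ACGTACGTACGT"                     # hypothetical unmodified read
# Worst case for the epigenetic read: no methylation, so every C reads as T.
epigenetic_read = genetic_read.replace("C", "T")

print(base_entropy(genetic_read), base_entropy(epigenetic_read))
```

With four equally frequent bases the genetic read reaches 2 bits per base, whereas the fully converted read uses only three bases and scores lower, which is consistent with the rationale for placing the unmodified sequence in Read 1.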

[0100] An additional workflow is depicted in FIG. 6. For example, cell-free DNA of about 170 bp, or fragmented genomic DNA, is end-repaired and A-tailed to obtain double-stranded template nucleic acids. First adapters, such as copy-linking adapters, are ligated to each end of the double-stranded template nucleic acids by double-stranded ligation using a T4 DNA ligase. The first adapter includes a hairpin of about 8-10 bp in the neck and 4 nucleotides in the loop, and a double-stranded portion of about 5-10 nucleotides. The double-stranded portion of the first adapter can include a tag sequence, such as a non-random molecular tag that can be used to track products, such as nucleic acids and sequences derived from a particular double-stranded template nucleic acid. Also, double-stranded ligation has increased efficiency over methods that include single-stranded ligation.

[0101] In some embodiments, the first adapter comprises a nickable site at a location between the hairpin and the double-stranded portion. Cleavage of the nickable site creates nicks in each first adapter ligated to a double-stranded template nucleic acid and allows the double-stranded template to be denatured to obtain single-stranded template nucleic acids, such as a Watson strand and a Crick strand of the double-stranded template nucleic acid. The single-stranded template nucleic acids comprise the hairpin of the first adapter and a single strand of the double-stranded portion of the first adapter. The hairpin in each single-stranded template nucleic acid can be extended to obtain an extended hairpin comprising the single-stranded template (template strand), and a complement of the single-stranded template (complementary strand). The extension is performed in the presence of conversion-resistant nucleotides, including conversion-resistant cytosine analogs. The double-stranded end of the extended hairpin can be A-tailed and a Y-adapter can be ligated to the double-stranded end of the extended hairpin by double-stranded ligation with a T4 DNA ligase to obtain library polynucleotides. The Y-adapter is double-stranded and includes a portion in which the strands are complementary to one another, and a portion in which the strands are not complementary to one another. Each strand of the non-complementary portion of the Y-adapter can include sequencing primer binding sites, such as P5, P5', P7, and P7' sequencing primer binding sites, amplification primer binding sites, and/or capture probe binding sites.

[0102] In some embodiments, the library polynucleotides can undergo selective conversion of non-methylated cytosine residues, such as bisulfite conversion, to obtain converted polynucleotides. After conversion, the strands of the extended hairpins are typically no longer complementary to one another and form a single-stranded molecule. The single-stranded molecule can be further processed via the sequences provided by the Y-adapters, for example, amplifying and/or sequencing. The single-stranded molecule can have additional tags, such as indexes, added via extension and/or amplification using tailed primers. The sequences of the single-stranded molecules can be determined and analyzed. For example, sequences can be aligned to a reference sequence, such as a reference genome. An ‘epigenetic’ read of a single-stranded molecule can be aligned with a ‘genetic’ read of the single-stranded molecule. Sequences derived from the Watson strand and the Crick strand of a double-stranded template nucleic acid can be used to construct higher quality consensus sequences.
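One simple way such a consensus could be formed is sketched below. This is an assumed, deliberately minimal rule (keep a base only where both strands agree, otherwise emit N), not a procedure taken from the specification; the reads are hypothetical 'genetic' reads that cover the same interval, and real pipelines would also weigh base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

def duplex_consensus(watson_read, crick_read):
    """Combine a Watson-strand read with a Crick-strand read of the same
    template. The Crick read is first brought into Watson orientation;
    positions where the two strands disagree are masked with N."""
    crick_as_watson = reverse_complement(crick_read)
    if len(crick_as_watson) != len(watson_read):
        raise ValueError("reads must cover the same interval")
    return "".join(
        w if w == c else "N" for w, c in zip(watson_read, crick_as_watson)
    )

watson = "ACGTTGCA"                          # hypothetical Watson-derived read
crick = reverse_complement("ACGTAGCA")       # hypothetical Crick read with one discordant base

print(duplex_consensus(watson, crick))
```

A disagreement between the two strands, as at the fifth base here, is more likely to reflect an error introduced on one strand than a true variant, so masking it is one conservative choice.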

Single ligation methods for preparing libraries

[0103] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library including a single ligation step. Some such embodiments include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids, and a first adapter comprising a first adapter strand and second adapter strand, wherein the first adapter strand comprises a hairpin and a double-stranded region formed between a 5' end of the first adapter strand and a 3' end of the second adapter strand, wherein a 5' end of the second adapter strand is single-stranded; (b) ligating the first adapter to each end of the double-stranded template nucleic acids by double-stranded ligation; (c) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids; and (d) extending the hairpin in the presence of a conversion-resistant cytosine analog to obtain library polynucleotides comprising a template strand and a complementary strand. In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0104] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, and 5-aza dCTP. In some embodiments, the conversion-resistant cytosine analog is 5-methyl dCTP.

[0105] In some embodiments, the single-stranded 5' end of the second adapter strand comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe. In some embodiments, the tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe. In some embodiments, the double-stranded region of the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe. In some embodiments, the double-stranded region of the first adapter comprises a tag sequence.

[0106] In some embodiments, (a) comprises obtaining a plurality of the first adapter, wherein the tag sequences of the plurality of the first adapter are different from one another. Some embodiments also include identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide comprising the same tag sequence by comparing tag sequences of the converted polynucleotides, thereby identifying a first sequence of a converted polynucleotide and a second sequence of a converted polynucleotide derived from the same double-stranded template nucleic acid.
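The tag comparison described above can be sketched as a simple grouping step. The layout is an assumption made for illustration: each read is taken to begin with a fixed-length tag that is unchanged by conversion (the hypothetical tags below contain no cytosine), and the function name and sequences are not from the specification.

```python
from collections import defaultdict

def group_by_tag(reads, tag_len):
    """Group reads by their leading tag so that sequences sharing a tag
    can be attributed to the same double-stranded template."""
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        groups[tag].append(insert)
    return dict(groups)

# Hypothetical converted reads: a 4-base tag followed by insert sequence.
reads = [
    "AGTA" + "ATGTTTGA",
    "GATT" + "TTGGATTA",
    "AGTA" + "TTAAATGT",
]

groups = group_by_tag(reads, tag_len=4)
print(groups)
```

Reads that fall into the same group, such as the first and third here, would be treated as the first and second sequences derived from one template molecule.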

[0107] Some embodiments also include (i) amplifying the library polynucleotides or converted polynucleotides; and/or (ii) adding indexes to the library polynucleotides or converted polynucleotides.

[0108] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0109] Another example workflow is depicted in FIG. 7 and FIG. 8. For example, cell-free DNA of about 170 bp or fragmented genomic DNA is end-repaired and A-tailed to obtain double-stranded template nucleic acids. First adapters, such as copy-linking adapters, are ligated to each end of the double-stranded template nucleic acids by double-stranded ligation using a T4 DNA ligase. The first adapter comprises a first adapter strand and second adapter strand, wherein the first adapter strand comprises a hairpin and a double-stranded region formed between a 5' end of the first adapter strand and a 3' end of the second adapter strand, wherein a 5' end of the second adapter strand is single-stranded. The first adapter includes a non-random molecular tag. The structure of the first adapter allows the double-stranded template to be denatured to obtain single-stranded template nucleic acids, such as a Watson strand and a Crick strand of the double-stranded template nucleic acid. A first end of the single-stranded template nucleic acid comprises the first adapter strand of the first adapter. A second end of the single-stranded template nucleic acid comprises the second adapter strand of the first adapter. The hairpin in each single-stranded template nucleic acid can be extended to obtain an extended hairpin comprising the single-stranded template (template strand), and a complement of the single-stranded template (complementary strand). The extension is performed in the presence of conversion-resistant nucleotides, including conversion-resistant cytosine analogs.

[0110] The extended hairpin can undergo selective conversion of non-methylated cytosine residues, such as bisulfite conversion, to obtain converted polynucleotides. After conversion, the strands of the extended hairpins are typically no longer complementary to one another and form a single-stranded molecule. The converted polynucleotides can be further processed via the adapter sequences, for example, amplifying and/or sequencing. The single-stranded molecule can have additional tags, such as indexes, added via extension and/or amplification using tailed primers. The sequences of the converted polynucleotides can be determined and analyzed. For example, sequences can be aligned to a reference sequence, such as a reference genome. An ‘epigenetic’ read of a single-stranded molecule can be aligned with a ‘genetic’ read of the single-stranded molecule. Sequences derived from the Watson strand and the Crick strand of a double-stranded template nucleic acid can be used to construct higher quality consensus sequences.

[0111] In some embodiments, an enrichment step can be performed on the converted polynucleotides by targeting the unmodified portion of the molecule. Some such embodiments use fewer enrichment probes and a simpler probe design than approaches that target a converted strand with varying methylation status, and can mitigate potential enrichment hybridization biases among molecules with different methylation status.
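The saving in probe count can be illustrated by enumerating the sequences a converted strand might take. The sketch below rests on stated assumptions that are not part of the specification: only cytosines in a CpG context are allowed to be either methylated or not, every other cytosine is treated as always converted, and the target sequence is hypothetical.

```python
from itertools import product

def converted_variants(target):
    """Enumerate the sequences a converted strand could present, assuming
    each CpG cytosine is independently methylated (stays C) or not (reads
    as T) and all non-CpG cytosines read as T."""
    cpg_positions = [i for i in range(len(target) - 1) if target[i:i + 2] == "CG"]
    fully_converted = target.replace("C", "T")
    variants = set()
    for states in product("CT", repeat=len(cpg_positions)):
        seq = list(fully_converted)
        for pos, base in zip(cpg_positions, states):
            seq[pos] = base
        variants.add("".join(seq))
    return variants

target = "ACGTCCGA"   # hypothetical target with two CpG sites
variants = converted_variants(target)
print(len(variants), sorted(variants))
```

Under these assumptions a region with n CpG sites can present up to 2^n converted sequences, whereas the unmodified, conversion-resistant portion presents a single sequence that one probe design can target.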

Tagmentation methods for preparing libraries

[0112] Some embodiments of the methods and compositions provided herein include a method of preparing a nucleic acid library including use of transposomes. Transposomes include a transposase and a transposon nucleic acid. In some embodiments, a transposome can be used to fragment a nucleic acid into a plurality of fragments, wherein each fragment comprises the transposon nucleic acid located at at least one end of each fragment. In some embodiments, each fragment comprises the transposon nucleic acid located at each end of each fragment. Some such embodiments include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double-stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter; (b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids; (c) hybridizing a first tailed primer to a region at an end of the single-stranded template nucleic acids, and extending the hybridized primer in the presence of a conversion-resistant cytosine analog to obtain extended polynucleotides comprising a template strand, a complementary strand and a double-stranded end; and (d) ligating a second adapter comprising a hairpin to the double-stranded end of the extended polynucleotides by double-stranded ligation to obtain extended hairpins. In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0113] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, and 5-aza dCTP. In some embodiments, the conversion-resistant cytosine analog is 5-methyl dCTP.

[0114] In some embodiments, the first adapter comprises conversion-resistant cytosine analogs. In some embodiments, the first tailed primer comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0115] Some embodiments also include amplifying the converted polynucleotides with second tailed primers. In some embodiments, the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0116] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0117] Another example workflow is depicted in FIG. 9. For example, double-stranded genomic DNA is tagmented with transposomes comprising a first adapter to obtain a plurality of nucleic acid fragments, each end of each fragment comprising the first adapter. The first adapter consists of conversion-resistant nucleotides. The ends of the fragments are end-filled. The fragments are denatured to obtain single-stranded fragments. A tailed primer is hybridized to an adapter sequence at an end of the single-stranded fragment. The tailed primer is extended in the presence of conversion-resistant nucleotides to obtain extended polynucleotides comprising a template strand, a complementary strand and a double-stranded end. A second adapter comprising a hairpin is ligated to the double-stranded end of the extended polynucleotides by double-stranded ligation in the presence of T4 DNA ligase to obtain extended hairpins.

[0118] The extended hairpin can undergo selective conversion of non-methylated cytosine residues, such as bisulfite conversion to obtain converted polynucleotides. After conversion, the strands of the extended hairpins are typically no longer complementary to one another and form a single-stranded molecule. The converted polynucleotides can be further processed via the first adapter sequences, for example, amplifying and/or sequencing. The single-stranded molecule can have additional tags, such as indexes, added via extension and/or amplification using tailed primers. The sequences of the converted polynucleotides can be determined and analyzed. For example, sequences can be aligned to a reference sequence, such as a reference genome. An ‘epigenetic’ read of a single-stranded molecule can be aligned with a ‘genetic’ read of the single-stranded molecule. Sequences derived from the Watson strand and the Crick strand of a double-stranded template nucleic acid can be used to construct higher quality consensus sequences.

[0119] Some embodiments include a method of preparing a nucleic acid library, comprising: (a) obtaining a plurality of double-stranded template nucleic acids by (i) contacting double-stranded DNA with a plurality of transposomes comprising a first adapter to obtain DNA fragments, wherein the first adapter comprises a hairpin and a cleavable site, and (ii) end-filling each DNA fragment, wherein each end of the double-stranded template nucleic acids comprises the first adapter; (b) denaturing the double-stranded template nucleic acids to obtain single-stranded template nucleic acids comprising the cleavable sites; (c) cleaving the cleavable sites to remove a portion of the first adapter from the single-stranded template nucleic acids such that an end of the cleaved single-stranded template nucleic acids comprises a hairpin; and (d) extending the hairpins of the cleaved single-stranded template nucleic acids in the presence of a conversion-resistant cytosine analog to obtain library polynucleotides comprising a template strand and a complementary strand. In some embodiments, the plurality of double-stranded template nucleic acids comprises genomic DNA or cell-free DNA.

[0120] Some embodiments also include converting conversion-sensitive cytosine residues of the library polynucleotides to another base residue to obtain converted polynucleotides. In some embodiments, the converting comprises bisulfite conversion. In some embodiments, the conversion-resistant cytosine analog comprises a moiety that inhibits conversion to another base residue. In some embodiments, the conversion-resistant cytosine analog is selected from the group consisting of: 5-ethyl dCTP, 5-methyl dCTP, 5-fluoro dCTP, 5-bromo dCTP, 5-iodo dCTP, 5-chloro dCTP, 5-trifluoromethyl dCTP, and 5-aza dCTP. In some embodiments, the conversion-resistant cytosine analog is 5-methyl dCTP.

[0121] In some embodiments, (c) comprises contacting the cleavable site with an enzyme selected from a uracil DNA glycosylase (UDG), a DNA glycosylase-lyase, an RNase H, or a combination thereof.
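The effect of cleaving a uracil-containing site can be pictured with a toy string model. This is an assumed simplification rather than a description of the enzymology: the strand is written 5' to 3' as a string, each uracil is treated as fully excised with the backbone broken on both sides, and the sequence and function name are hypothetical.

```python
def cleave_at_uracil(strand):
    """Split a single strand at every uracil, discarding the excised U.
    Returns the resulting fragments in 5'-to-3' order."""
    return [fragment for fragment in strand.split("U") if fragment]

# Hypothetical single-stranded fragment carrying one uracil in its adapter.
strand = "GGATCCAUTTGACCGT"

fragments = cleave_at_uracil(strand)
print(fragments)
```

Which of the resulting fragments is retained, and which carries the hairpin-forming sequence, depends on the adapter design in the relevant figures and is not modeled here.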

[0122] In some embodiments, the first adapter comprises conversion-resistant cytosine analogs. In some embodiments, the first adapter comprises a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe. In some embodiments, the second tailed primers comprise a sequencing primer binding site, an amplification primer binding site, and/or a target site for a capture probe.

[0123] Some embodiments also include sequencing the converted polynucleotides. Some embodiments also include aligning sequences of the converted polynucleotides with a reference sequence. Some embodiments also include aligning a sequence of a template strand with a sequence of a complementary strand. Some embodiments also include mapping a methylated cytosine residue on a sequence of a converted polynucleotide or a reference sequence.

[0124] Still another example workflow is depicted in FIG. 10 and FIG. 11. For example, double-stranded genomic DNA is tagmented with transposomes comprising a first adapter to obtain a plurality of nucleic acid fragments, each end of each fragment comprising the first adapter. The first adapter comprises a cleavable site and a sequence capable of forming a hairpin. The first adapter consists of conversion-resistant nucleotides which allow for epigenetic conversion of unique ends. The ends of the fragments are end-filled. The fragments are denatured to obtain single-stranded fragments. The cleavable site of a single strand is cleaved. The sequence capable of forming a hairpin of the single-stranded fragment is extended in the presence of conversion-resistant nucleotides to obtain an extended hairpin. The extended hairpin can undergo selective conversion of non-methylated cytosine residues, such as bisulfite conversion, to obtain converted polynucleotides. Notably, conversion treatment generates unique adapter sequences on the 5' and 3' ends, which are used to amplify and attach sample index and flow cell adapters.

[0125] After conversion, the strands of the extended hairpins are typically no longer complementary to one another and form a single-stranded molecule. The converted polynucleotides can be further processed via the first adapter sequences, for example, amplifying and/or sequencing. The single-stranded molecule can have additional tags, such as indexes, added via extension and/or amplification using tailed primers. The sequences of the converted polynucleotides can be determined and analyzed. For example, sequences can be aligned to a reference sequence, such as a reference genome. An ‘epigenetic’ read of a single-stranded molecule can be aligned with a ‘genetic’ read of the single-stranded molecule. Sequences derived from the Watson strand and the Crick strand of a double-stranded template nucleic acid can be used to construct higher quality consensus sequences.

EXAMPLES

Example I — Indexing a prepared library

[0126] A library of converted polynucleotides is prepared by a workflow substantially similar to the workflow depicted in FIG. 10 and FIG. 11. Capture probe binding sequences and index sequences are added to each end of the single-stranded converted polynucleotides by amplification with tailed primers. FIG. 12A depicts a single-stranded converted polynucleotide. FIG. 12B depicts a double-stranded product of amplification of the single-stranded converted polynucleotide with tailed primers adding P7, P5, and index (i7, i5) sequences. Sequencing primers are used to sequence the genetic portion in a first read, and to sequence the epigenetic portion in a second read. FIG. 12C depicts an example first adapter comprising a hairpin, a neck, and a cleavable site comprising a uracil nucleotide useful for workflows described herein. Example sequences for adapters and primers, such as those depicted in FIG. 10, FIG. 11, FIG. 12A, FIG. 12B, and FIG. 12C are listed below in TABLE 1.

TABLE 1 [table contents not recoverable from the extracted text]

[0127] The term “comprising” as used herein is synonymous with “including,” “containing,” or “characterized by,” and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.

[0128] The above description discloses several methods and materials of the present invention. This invention is susceptible to modifications in the methods and materials, as well as alterations in the fabrication methods and equipment. Such modifications will become apparent to those skilled in the art from a consideration of this disclosure or practice of the invention disclosed herein. Consequently, it is not intended that this invention be limited to the specific embodiments disclosed herein, but that it cover all modifications and alternatives coming within the true scope and spirit of the invention.

[0129] All references cited herein, including but not limited to published and unpublished applications, patents, and literature references, are incorporated herein by reference in their entirety and are hereby made a part of this specification. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.