Title:
COMPOSITIONS AND METHODS FOR OXYGENATION OF NUCLEIC ACIDS CONTAINING 5-METHYLPYRIMIDINE
Document Type and Number:
WIPO Patent Application WO/2014/074453
Kind Code:
A1
Abstract:
5-methylpyrimidine oxygenases and their use in the modification of nucleic acids are described.

Inventors:
ZHENG YU (US)
SALEH LANA (US)
PAIS JUNE (US)
DAI NAN (US)
ROBERTS RICHARD J (US)
CORREA IVAN R (US)
MABUCHI MEGUMU (US)
Application Number:
PCT/US2013/068298
Publication Date:
May 15, 2014
Filing Date:
November 04, 2013
Assignee:
NEW ENGLAND BIOLABS INC (US)
ZHENG YU (US)
SALEH LANA (US)
PAIS JUNE (US)
DAI NAN (US)
ROBERTS RICHARD J (US)
CORREA IVAN R (US)
MABUCHI MEGUMU (US)
International Classes:
C12N9/00; C12Q1/68
Domestic Patent References:
WO2011091146A1 2011-07-28
WO2010037001A2 2010-04-01
WO2013138644A2 2013-09-19
WO2011025819A1 2011-03-03
Foreign References:
US20120064521A1 2012-03-15
US20120301881A1 2012-11-29
US201313804804A 2013-03-14
US201313826395A 2013-03-14
US201313827087A 2013-03-14
Other References:
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718242, retrieved from EBI accession no. UNIPROT:D2W6T1 Database accession no. D2W6T1
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718243, retrieved from EBI accession no. UNIPROT:D2VP32 Database accession no. D2VP32
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718244, retrieved from EBI accession no. UNIPROT:D2VHE0 Database accession no. D2VHE0
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718245, retrieved from EBI accession no. UNIPROT:D2W4Z7 Database accession no. D2W4Z7
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Putative uncharacterized protein;", XP002718246, retrieved from EBI accession no. UNIPROT:D2VG54 Database accession no. D2VG54
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718247, retrieved from EBI accession no. UNIPROT:D2W5I5 Database accession no. D2W5I5
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718248, retrieved from EBI accession no. UNIPROT:D2V161 Database accession no. D2V161
DATABASE UniProt [online] 2 March 2010 (2010-03-02), "SubName: Full=Predicted protein;", XP002718249, retrieved from EBI accession no. UNIPROT:D2W6P1 Database accession no. D2W6P1
MIAO YU ET AL: "Base-Resolution Analysis of 5-Hydroxymethylcytosine in the Mammalian Genome", CELL, CELL PRESS, US, vol. 149, no. 6, 19 April 2012 (2012-04-19), pages 1368 - 1380, XP028521141, ISSN: 0092-8674, [retrieved on 20120503], DOI: 10.1016/J.CELL.2012.04.027
CHUN-XIAO SONG ET AL: "Detection of 5-hydroxymethylcytosine in a combined glycosylation restriction analysis (CGRA) using restriction enzyme Taq[alpha]I", BIOORGANIC & MEDICINAL CHEMISTRY LETTERS, vol. 21, no. 17, 1 September 2011 (2011-09-01), pages 5075 - 5077, XP055014761, ISSN: 0960-894X, DOI: 10.1016/j.bmcl.2011.03.118
THEODORE DAVIS ET AL: "High Sensitivity 5-hydroxymethylcytosine Detection in Balb/C Brain Tissue", JOURNAL OF VISUALIZED EXPERIMENTS, no. 48, 1 February 2011 (2011-02-01), XP055057589, DOI: 10.3791/2661
H. WANG ET AL: "Comparative characterization of the PvuRts1I family of restriction enzymes and their application in mapping genomic 5-hydroxymethylcytosine", NUCLEIC ACIDS RESEARCH, vol. 39, no. 21, 1 November 2011 (2011-11-01), pages 9294 - 9305, XP055057711, ISSN: 0305-1048, DOI: 10.1093/nar/gkr607
ZHIYI SUN ET AL: "High-Resolution Enzymatic Mapping of Genomic 5-Hydroxymethylcytosine in Mouse Embryonic Stem Cells", CELL REPORTS, vol. 3, no. 2, 1 February 2013 (2013-02-01), pages 567 - 576, XP055094531, ISSN: 2211-1247, DOI: 10.1016/j.celrep.2013.01.001
MORÉRA ET AL.: "T4 phage beta-glucosyltransferase: substrate binding and proposed catalytic mechanism", J MOL. BIOL., vol. 292, no. 3, 1999, pages 717 - 730
BORGARO ET AL.: "Characterization of the 5-hydroxymethylcytosine-specific DNA restriction endonucleases", NUCLEIC ACIDS RESEARCH, 2013
Attorney, Agent or Firm:
STRIMPEL, Harriet, M. (Inc.240 County Roa, Ipswich MA, US)
Claims:
What is claimed:

1. A composition comprising:

(i) a buffer and

(ii) a purified 5-methylpyrimidine oxygenase having a size less than 600 amino acids and having a catalytic domain having at least 90% identity with SEQ ID NO: 1.

2. A composition according to claim 1, wherein the buffer contains glycerol.

3. A composition according to claim 1 or 2, wherein the buffer does not contain ATP.

4. A composition according to claim 1 or 2, wherein the buffer contains ATP.

5. A composition according to any one of claims 1-4, wherein the buffer is at a pH from about 6 to about 8.

6. A composition according to claim 5, wherein the buffer is at a pH from about 6 to about 7.5; such as at a pH from about 6 to about 6.5, or at a pH from about 6.5 to about 7.0, or at a pH from about 7.0 to about 7.5.

7. A composition according to any one of claims 1-6, wherein the buffer contains Fe(II) and α-ketoglutarate.

8. A composition according to any one of claims 1-7, further comprising a nucleic acid.

9. A composition according to claim 8, wherein the nucleic acid comprises 5-methylcytosine.

10. A composition according to claim 8, wherein the nucleic acid comprises at least one of 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC) or 5-carboxycytosine (5-caC).

11. A composition according to claim 9 or 10, wherein the nucleic acid comprises double-stranded DNA, single-stranded DNA, or RNA.

12. A fusion protein, comprising:

(i) a binding domain; fused to

(ii) a purified 5-methylpyrimidine oxygenase having a size less than 600 amino acids and having a catalytic domain having at least 90% identity with SEQ ID NO: 1.

13. The fusion protein according to claim 12, wherein the binding domain is selected from the group consisting of: a His-tag, a maltose-binding protein, a chitin binding domain, and a DNA binding domain.

14. The fusion protein according to claim 13, wherein the binding domain is a DNA binding domain comprising a zinc finger or transcription activator-like (TAL) effector domain.

15. A fusion protein according to any one of claims 12-14, wherein the 5-methylpyrimidine oxygenase is a 5-methylcytosine oxygenase or a thymine hydroxylase.

16. A fusion protein according to any one of claims 12-15, formulated in a buffer at a pH from about 6 to about 7.5.

17. A fusion protein according to any one of claims 12-16, for use in oxygenating 5-methylpyrimidine in double-stranded DNA, in single-stranded DNA, or in RNA.

18. A kit comprising a composition according to any one of claims 1-11, wherein the purified 5-methylpyrimidine oxygenase is in a first container and the buffer is a reaction buffer in a second container.

19. A kit comprising a fusion protein according to any of Claims 12-15 and a buffer as defined in any of Claims 2-7, wherein the fusion protein is in a first container and the buffer is in a second container.

20. A kit according to claim 19, further comprising a nucleic acid.

21. A kit according to claim 20, wherein the nucleic acid is as defined in any one of Claims 8-11.

22. A kit according to any one of claims 18-21, further comprising a reducing agent, such as sodium borohydride.

23. A kit according to any one of claims 18-22, wherein the kit further comprises a metal halide.

24. A kit according to any one of claims 18-23, further comprising (i) a β-glycosyltransferase (BGT) and (ii) UDP-glucosamine and/or UDP-glucose.

25. A kit according to any one of claims 18-24, further comprising a DNA glycosylase such as thymine DNA glycosylase.

26. A kit according to any one of claims 18-25, further comprising an endonuclease.

27. A kit according to claim 26, wherein the endonuclease cleaves DNA containing 5-hydroxymethylcytosine (5-hmC) more efficiently than DNA containing β-glucosyl-oxy-5-methylcytosine.

28. A kit according to claim 26, wherein the endonuclease is AbaSI.

29. A method for differentiating a 5-methylcytosine (5-mC) from 5-hydroxymethylcytosine (5-hmC) in a genome or genome fragment, comprising:

(a) reacting the isolated genome or genome fragment containing 5-mC and 5-hmC with:

(i) UDP-glucose or UDP-glucosamine, and a glycosyltransferase for transferring glucose or glucosamine to the 5-hmC; and

(ii) a composition according to any one of claims 1-14, or a fusion protein according to any one of claims 15-24, to convert the 5-mC to 5-hmC;

(b) cleaving the reacted genome or genome fragment of (a) with an endonuclease that recognizes the glucosylated or glucosaminylated 5-hmC generated in (i), or an endonuclease that recognizes the 5-hmC generated in (ii); and

(c) differentiating the 5-mC from the 5-hmC by an altered cleavage pattern.

30. A method according to claim 29, wherein (a)(i) comprises reacting the isolated genome or genome fragment with UDP-glucosamine and a glycosyltransferase for transferring glucosamine to the 5-hmC.

31. A method according to claim 29, wherein the modification-dependent endonuclease is capable of selectively cleaving a 5-hmC and not a β-glucosyl-oxy-5-methylcytosine (5-ghmC).

32. A method according to claim 29, wherein the modification-dependent endonuclease is AbaSI.

33. A method according to any one of claims 29 to 32, wherein (a) further comprises reacting the isolated genome or genome fragment from (a)(i) with a reducing agent.

34. A method of modifying a 5-methylcytosine oxygenase, comprising:

(a) introducing random or targeted mutations into the 5-methylcytosine oxygenase; and

(b) changing the specificity of the 5-methylcytosine oxygenase so as to exclusively oxidize 5-mC to 5-hmC.

Description:
Compositions and Methods for Oxygenation of Nucleic Acids Containing 5-Methylpyrimidine

BACKGROUND

5-methylcytosine (5-mC) has been linked to gene expression and its distribution in the genome plays an important role in epigenetics. In 2009, two groups independently discovered that an oxidized form of 5-mC, 5-hydroxymethylcytosine (5-hmC), exists in human and mouse DNA, and is especially enriched in the neuronal tissues as well as embryonic stem cells. Three enzymes named TET1/2/3 have been shown in human and mouse to be responsible for oxidizing 5-mC to 5-hmC. TET enzymes belong to the broad family of Fe(II)/2-oxo-glutarate-dependent (2OGFE) oxygenases, which use 2-oxo-glutarate (2OG), as co-substrate, and ferrous ion (Fe(II)) as cofactor. After additional biochemical studies, it was discovered that these enzymes could oxidize 5-mC to generate oxidation products identified as 5-hmC, 5-formylcytosine (5-fC) and 5-carboxycytosine (5-caC). Finally, 5-caC is believed to be excised via the action of DNA glycosylases and replaced by the unmodified cytosine. The TET enzymes are very large proteins and hence it has been problematic to make these proteins in recombinant form and in sufficient quantities to use as a research reagent.

In order to identify the impact of the epigenome on phenotype, it is desirable to map the position of modified nucleotides and to understand when and where the various modifications arise. Sodium bisulfite sequencing is the predominant method for mapping modified cytosine in the genome. Unfortunately, this technique does not discriminate between 5-mC and 5-hmC. Different methods are required to distinguish 5-mC from 5-hmC and its oxidation products.

SUMMARY

Although Naegleria gruberi has not been previously reported to contain 5-mC or 5-hmC, the present inventors have surprisingly discovered that a protein from N. gruberi can be used in vitro to convert 5-mC to oxidized cytosines. That protein can be purified from natural sources or produced recombinantly, optionally as a fusion protein with another amino acid sequence to facilitate its purification or use.

Accordingly, in one aspect embodiments provide a fusion protein in which a binding domain is fused to a recombinant 5-methylpyrimidine oxygenase (mYOX1) having a size less than 600 amino acids and having a catalytic domain having 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. In certain embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to amino acids 209-296, 160-297, 154-304 or 1-321 of the amino acid sequence of SEQ ID NO: 2 (mYOX1), and/or with the corresponding amino acids of any one of SEQ ID NOs: 3-9 as aligned with SEQ ID NO: 2 in Figure 2B, optionally while retaining 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. In other embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to the entire length of SEQ ID NO: 2, 3, 4, 5, 6, 7, 8, or 9. The binding domain is capable of recognizing and binding to another molecule. Thus, in some embodiments the binding domain is a histidine tag ("His-tag"), a maltose-binding protein, a chitin-binding domain, or a DNA-binding domain, which may include a zinc finger and/or a transcription activator-like (TAL) effector domain. The fusion protein can be used as a mYOX1 (such as a 5-mC oxygenase or a thymine hydroxylase) in single- or double-stranded DNA or in RNA, typically at a pH of about 6 (generally between 5.5 and 6.5) to about 8, and, in some embodiments, at a pH of about 6 to about pH 7.5.
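The identity thresholds recited above can be illustrated with a short calculation. The following is a minimal sketch only: the application does not specify how percent identity is to be computed, so the convention used here (matches divided by the number of aligned columns in which neither sequence has a gap) is an assumption, as are the function names `percent_identity` and `meets_threshold`. The inputs are expected to be two sequences that have already been aligned to equal length, with `-` marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped columns; other
    conventions (e.g. alignment length or shorter-sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, two aligned ten-residue peptides that differ at one position score 90% under this convention and would satisfy an "at least 90% identity" threshold but not an "at least 91% identity" one.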

In another aspect, embodiments provide buffered compositions containing a purified mYOX1 having a size less than 600 amino acids and having a catalytic domain having 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. In certain embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to amino acids 209-296, 160-297, 154-304 or 1-321 of the amino acid sequence of SEQ ID NO: 2, and/or with the corresponding amino acids of any one of SEQ ID NOs: 3-9 as aligned with SEQ ID NO: 2 in Figure 2B, optionally while retaining 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. In other embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to the entire length of SEQ ID NO: 2, 3, 4, 5, 6, 7, 8, or 9. In various embodiments, the composition contains glycerol; and/or contains Fe(II), as cofactor, and α-ketoglutarate, as co-substrate, for the enzyme. In some of these embodiments, the composition does not contain ATP, which can interfere with subsequent oxidation of hydroxymethylated nucleotides; in other embodiments, the composition does contain ATP (e.g. to inhibit further oxidation). The composition is optionally at a pH from about 6 to about 8. In certain embodiments, the pH is about 6, or is from about 6 to about 7.5. The buffered compositions can be used to generate a variety of oxidation products of 5-mC, including 5-hmC, 5-fC, and 5-caC. The distribution of oxidation products can be varied by varying the pH of the reaction buffer. Accordingly, in various embodiments the pH of the buffered composition is about 6; about 6.0 to about 6.5; about 6.0 to about 7.0; about 6.0 to about 7.5; about 6.0 to about 8.0; about 6.5 to about 7.0; about 6.5 to about 7.5; about 6.5 to about 8.0; about 7.0 to about 8.0; or about 7.5 to about 8.0.

In some embodiments, the buffered compositions also include a nucleic acid, such as single- or double-stranded DNA that may include 5-mC (as a substrate for the enzyme) and/or one or more of 5-hmC, 5-fC, or 5-caC (naturally-occurring, and/or resulting from the activity of the enzyme).

Embodiments also provide kits for modifying nucleic acids. The kits include a purified mYOX1 having a size less than 600 amino acids and having a catalytic domain having 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1, or any one of the buffered compositions or fusion proteins described above, together with a separate reaction buffer. In certain embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to amino acids 209-296, 160-297, 154-304 or 1-321 of the amino acid sequence of SEQ ID NO: 2, optionally while retaining 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. The reaction buffer has a pH typically from about 6 to about 8, and may contain Fe(II) and/or α-ketoglutarate. In various embodiments, the pH of the reaction buffer is about 6; about 6.0 to about 6.5; about 6.0 to about 7.0; about 6.0 to about 7.5; about 6.0 to about 8.0; about 6.5 to about 7.0; about 6.5 to about 7.5; about 6.5 to about 8.0; about 7.0 to about 8.0; or about 7.5 to about 8.0. The kit may also include a nucleic acid such as single- or double-stranded DNA that may include one or more 5-mC residues. Also, or alternatively, the kit may include: a reducing agent, such as sodium borohydride, or an additive, such as cobalt chloride; a β-glycosyltransferase (BGT) and UDP-glucose and/or UDP-glucosamine; a DNA glycosylase such as thymine DNA glycosylase; and/or an endonuclease, such as an endonuclease that cleaves DNA containing 5-hmC more efficiently than it cleaves DNA containing β-glucosyl-oxy-5-methylcytosine (5-ghmC) (e.g. AbaSI).

Embodiments also provide kits for detecting the 5-mC in double-stranded or single-stranded DNA or RNA by sequencing, e.g., single-molecule sequencing such as on the Pacific Biosciences platform. The kits include a purified mYOX1 having a size less than 600 amino acids and having a catalytic domain having 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1, or any one of the buffered compositions or fusion proteins described above, together with a separate reaction buffer. In certain embodiments, the mYOX1 has an amino acid sequence with at least 90% identity (or more, such as at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity) to amino acids 209-296, 160-297, 154-304 or 1-321 of the amino acid sequence of SEQ ID NO: 2, optionally while retaining 90% or 100% identity with the amino acid sequence of SEQ ID NO: 1. The reaction buffer has a pH typically from about 6 to about 8, and may contain Fe(II) and/or α-ketoglutarate. In various embodiments, the pH of the reaction buffer is about 6; about 6.0 to about 6.5; about 6.0 to about 7.0; about 6.0 to about 7.5; about 6.0 to about 8.0; about 6.5 to about 7.0; about 6.5 to about 7.5; about 6.5 to about 8.0; about 7.0 to about 8.0; or about 7.5 to about 8.0. The kit may contain other DNA/RNA repair enzymes for the DNA or RNA to be used in the sequencing platforms.

In another aspect, embodiments provide methods for differentiating a 5-mC from 5-hmC in a genome or genome fragment. In one embodiment, the method includes: reacting the isolated genome or genome fragment containing 5-mC and 5-hmC with UDP-glucose or UDP-glucosamine, a glycosyltransferase for transferring glucose or glucosamine to the 5-hmC, and one of the previously described fusion proteins or buffered compositions; cleaving the glucosylated template with a modification-dependent endonuclease that recognizes at least one of the modified nucleotides; and differentiating the 5-mC from the 5-hmC by an altered cleavage pattern. In another embodiment, the method includes: reacting the isolated genome or genome fragment containing 5-mC and 5-hmC with UDP-glucosamine and a glycosyltransferase for transferring glucosamine to the 5-hmC; subsequently reacting the isolated genome or genome fragment with one of the previously described fusion proteins or buffered compositions and optionally with a reducing agent; cleaving the template with a modification-dependent endonuclease that is capable of selectively cleaving a 5-hmC and not a 5-ghmC; and differentiating the 5-mC from one or more of its oxidation products by an altered cleavage pattern. In each of these embodiments, the modification-dependent endonuclease is optionally AbaSI.
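The logic of the second embodiment above (protect pre-existing 5-hmC with a sugar, convert 5-mC to 5-hmC, then cleave only at unprotected 5-hmC) can be traced with a toy model. This is an illustrative sketch under strong simplifying assumptions that are not stated in the application: each step is treated as going to completion, the oxygenase is treated as stopping at 5-hmC (or a reducing agent is assumed to have restored 5-hmC), and the endonuclease is modeled as reporting the position of the recognized base rather than a cut site at some distance from it. The names `Base`, `glucosylate`, `oxygenate`, `cleavage_positions` and `map_original_5mc` are hypothetical.

```python
from enum import Enum


class Base(Enum):
    C = "C"          # unmodified cytosine
    MC = "5-mC"      # 5-methylcytosine
    HMC = "5-hmC"    # 5-hydroxymethylcytosine
    GHMC = "5-ghmC"  # sugar-modified 5-hmC


def glucosylate(bases):
    """Glycosyltransferase step (idealized): every 5-hmC becomes 5-ghmC."""
    return [Base.GHMC if b is Base.HMC else b for b in bases]


def oxygenate(bases):
    """Oxygenase step (idealized): every 5-mC becomes 5-hmC."""
    return [Base.HMC if b is Base.MC else b for b in bases]


def cleavage_positions(bases):
    """Endonuclease assumed to recognize 5-hmC but not 5-ghmC."""
    return [i for i, b in enumerate(bases) if b is Base.HMC]


def map_original_5mc(bases):
    """Protect 5-hmC, oxidize 5-mC, then report where cleavage is triggered."""
    return cleavage_positions(oxygenate(glucosylate(bases)))


template = [Base.C, Base.MC, Base.HMC, Base.C, Base.MC]
print(map_original_5mc(template))  # positions that were 5-mC in the input
```

In this model, skipping the glycosyltransferase step makes the original 5-hmC positions indistinguishable from the converted 5-mC positions, which is why the protection step comes first.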

Embodiments also provide methods of modifying a 5-mC oxygenase by introducing random or targeted mutations and changing the specificity of the enzyme so as to exclusively oxidize 5-mC to 5-hmC.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a phylogram of mYOX1 in Naegleria gruberi and TET proteins based on the ClustalW multiple sequence alignment. TET1_hs_C, human TET1 truncated C-terminus; TET1_mm_C, mouse TET1 truncated C-terminus; TET2_hs_C, human TET2 truncated C-terminus; TET2_mm_C, mouse TET2 truncated C-terminus; TET3_hs_C, human TET3 truncated C-terminus; TET3_mm_C, mouse TET3 truncated C-terminus.

Figure 2A-B shows eight mYOX proteins in Naegleria gruberi and their alignments. This family of proteins has a consensus sequence (R/K)X₄HXDX₁₂GX₁₈₋₃₀DX₁₀HXVX₇₋₇₂RX₅FA (SEQ ID NO: 1).

Figure 2A shows the conserved domain structure of the 8 mYOX proteins anchored by the 2OGFE catalytic domain. An additional domain, a CHROMO domain, was detected in one of the proteins.

Figure 2B shows multiple sequence alignment of the 2OGFE catalytic domain sequences in mYOX proteins. Alignment was performed by the PROMALS program (http://prodata.swmed.edu/promals/promals.php).

Figure 3 shows a single band of purified recombinant mYOX1 having a molecular weight of 37,321 Dalton on an SDS-PAGE.

Figure 4A-C shows the activity of mYOX1. Figure 4A shows the activity on double-stranded DNA with 24 fully-methylated CpG sites ("24x oligo"). Figure 4B shows the activity on plasmid DNA ("pTXB1-M.SssI"). Figure 4C shows the activity on genomic DNA ("IMR90"). All substrate DNA contained 5-mC. The generation of 5-hmC, 5-fC and 5-caC was monitored by liquid chromatography. The generation of 5-hmC was dependent on mYOX1, since no 5-hmC was detected in the absence of the enzyme. In addition, mYOX1 was able to convert thymine to 5-hmU, 5-fU and 5-caU (data not shown). These results indicate that mYOX1 is an active 5-mC oxygenase and thymine hydroxylase.

Figure 5 shows methods for mapping methylome and hydroxymethylome using the DNA modification-dependent restriction endonucleases.

DETAILED DESCRIPTION OF EMBODIMENTS

In general and in at least one aspect, a novel family of enzymes is described. Generally, these enzymes can be described as mYOXs, or, more specifically, 5-mC oxygenases that can use 2OG, as co-substrate, and ferrous ion (Fe(II)), as cofactor. This novel family, whose members are referred to in this application as mYOXs, is distantly related to the TET proteins, as shown in the phylogram of Figure 1, sharing about 15% sequence identity with them. Compared to TET proteins, mYOXs have several advantages as reagents for oxygenating 5-mC. With sizes in the range of 174-583aa, mYOXs are substantially smaller than enzymes of the TET family (which are ~1600-2000aa), facilitating their recombinant production. Their small size renders these enzymes suitable as components in fusion proteins with, for example, DNA binding domains such as zinc fingers, and/or one or more additional enzymatic domains such as a glycosylase to promote the eventual excision of the modified cytosine.

Moreover, in contrast to TET proteins, mYOXs operate more efficiently at pH 7.5 or less (e.g. at about pH 6), and do not require ATP, which is significant because it reduces the possibility of side reactions, for example, phosphorylation, and permits use of the enzymes in conjunction with PCR amplification, which is inhibited by ATP. An additional advantage of mYOX1 over TET proteins as research reagents is its improved catalytic efficiency. For example, stoichiometrically fewer enzyme molecules are needed to oxidize 5-mCs when using mYOX1 rather than a TET enzyme.

One of the advantages of oxidizing 5-mC in vitro is the ability to add chemical or fluorescent labels onto DNA, which can be further coupled to sequencing technologies to map DNA epigenomes.

mYOXs can be cloned and purified from Naegleria gruberi, a free-living single-cell protist, as described in Example 1. Host cells suitable for expression include E. coli, yeast and insect cell systems producing greater than 10 μg/l, 20 μg/l, 30 μg/l, 50 μg/l, 70 μg/l, 100 μg/l, 200 μg/l, 300 μg/l, 400 μg/l, 500 μg/l and as much as 10 mg/liter of culture. A unit amount of mYOX1 is able to convert 1 pmol of 5-mC on DNA in 30 minutes at 34°C in 1X mYOX1 reaction buffer at pH 6.0 (unit definition).
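The unit definition above lends itself to a simple planning calculation. The sketch below assumes that conversion scales linearly with both enzyme amount and incubation time, which the application does not state and which may not hold in practice (for example at substrate depletion or with enzyme inactivation). It also assumes the defining conditions of 34°C and pH 6.0 in 1X reaction buffer. The function name `units_required` is hypothetical.

```python
REFERENCE_MINUTES = 30.0  # incubation time in the unit definition
PMOL_PER_UNIT = 1.0       # pmol of 5-mC converted by one unit in that time


def units_required(pmol_5mc: float, minutes: float = REFERENCE_MINUTES) -> float:
    """Estimate units needed to convert `pmol_5mc` within `minutes`.

    Assumes linear scaling with enzyme amount and time (an assumption,
    not a statement from the source).
    """
    if pmol_5mc < 0:
        raise ValueError("pmol_5mc must be non-negative")
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (pmol_5mc / PMOL_PER_UNIT) * (REFERENCE_MINUTES / minutes)
```

Under these assumptions, converting 5 pmol of 5-mC would call for about 5 units in a 30-minute reaction, or about 2.5 units if the incubation were extended to 60 minutes.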

Exemplary mYOX protein sequences are provided in the following table:

Name  Accession #  SEQ ID NO:  SEQUENCE

mYOX1 XP_002667965.1 2 MTTFKQQTIKEKETKRKYCIKGTTANLTQT

HPNGPVCVNRGEEVAN 1 1 1 LLDSGGGINK KSLLQNLLSKCKTTFQQSFTNANITLKDEK WLKNVRTAYFVCDHDGSVELAYLPNVLPK

ELVEEFTEKFESIQTGRKKDTGYSGILDNS

MPFNYVTADLSQELGQYLSEIVNPQINYYIS

KLLTCVSSRTINYLVSLNDSYYALNNCLYPS

TAFNSLKPSNDGHRIRKPHKDNLDITPSSL

FYFGNFQNTEGYLELTDKNCKVFVQPGDVL

FFKG N EYKH WAN ITSG WRIG LVYFAH KG

SKTKPYYEDTQKNSLKIHKETK

mYOX6 XP_002674105.1 3 MPMNYITSDLKTQLGEYLIGIVNPMLDETIT

AALEILSPRTINYLTSLPHPYHILNNCIYPST

AFNYLEPQIEKHRIKNAHKDTRDATPSVLF

YLGDYDEKEGYLEFPEQNCKVFVKPGDLLL

FKGN KYKHQVAPITSGTRLG LVYFAH KACK

VMDFYDDYQKESLNKHKQQNQ

mYOX4 XP_002676528.1 4 MSINTTFNQKTTQSGEPPMMMRMTNSSTP

PLTPKNCLPIFVYNDYGKLIREEQQQPTDII

TNNNNSMMRSMPTTNRWETNPQTPLSVS

PFQPLLPIPNFSHAFIVGNLPPSVSVRRKNR

KMSEKPKNNSAPSKIMHQLELSVLNNQRR

IAPKG PLADISNIQLPQQESTN KSN NTTPK

KPRIRQLMLTTPLRESLQSNQSARSKYIDE

EANNYSINDSPb 11 IIKTSNTKDSEHKAAM

ATNLGLSTDDFECKPFb 111 LPSVIDKNYLV

VDKEGCTQLALLPNHIPTSVCKLIEVKCRK

VSN LRHALKIQKASFYVN WWTKSQPMGY

MCKDNESEIGKVVNEIAELLSDHCRNLLR

MCNERVYKKISELKEDKFFAPCICFNILEHD

LESRITKFHHDKMDYGVSVLFYFGDYSRG

N LNVLDAGSSSTIVTRPG DAVI LRG NYYKH

SVQNIEPGNNKARYSIVFFAHSTHFLKKKY

ELSPAAAKKAFLVDNPDFVSIKKRKQASSS

SDVSVKKSKKSTEDNVEFIQTHTYLGNGY

KSGHKNYQYYVKFNNSDQKEWKSYESLPK

QAVASYWVKFKKLKSLSNQ

mYOX7 XP_002668594.1 5 M LEAQH HKLTIYTG M WG H M KPCVFI AADN

CNKSGETIVENLLFKLGKIGSKLMEILSPFT

MNFLSSLDPEIFLNHDLFPISATNFMIPGNK

HRILKPHKDNQDVGLCIIFYFGNYNAPLEF

VNKGSVFNTERGDVLLMRGSHFRHWKPV DNGLLEHVHDPMRISWLFAHKSLKMNPS YFLNAGSALKAHDEDFPEKAKKRKKKRK

mYOX8 XP_002676954.1 6 MFLRNILPENTTTEVTNILDKINQRRSKENY

YIGSWGKSSSFLFKTNDTIFNELSSQFIKII

NLLKNYVLEILKFGNNKMRKFLEKYNSSDF

LSIYPTVCFNFLDKSVDENRILHIHPDKEDT

GTSLIFYFGKFKGGAISFPELNFKLMVQSA

DVLLFDGKNNLHAVESLHGKDDVRYSWF

FAH KAD LG KTSYPMN RG EVM KG I KN KI N N

mYOX5 XP_002668409.1 7 MDIGIDWRGTHFRHKNHLVKEEVCDRTN

WIVLCPNGQVDIAFFPNAIPEELCLEMETV

VANSDVDILSCKKAIIDGSWTRYGNGIYPV

KTITTNQSILLHELNDKCGPFVLDKLKHINK

NMFNKLDNINEDIKNYKIFAKYPTLALNVS

HNENYNISKKPYRKHTDGNDIGLGVLTYFG

SEIIEGGNLIIHIENLKVFNFPIQRRDLVFLN

SKFYAH QVTKVTSGI RFG LVYFAG EAHFRV

RNNDDFLPALPFNANDKELREERSKKGRK

SMNEYKKRFLKKYLREKKKINKKRVKCKNK

LK

mYOX2 XP_002682154.1 8 MGPLHVSQHDKKKPKHRRRKKQFLKAQAL

TRVCWENEKSIDESGKTRVYKMIKEWEFL

KGNNIQSNEPILSVYGVNDTIPKEISSNTII

VTKEGMVEMALLKSVLPPSLLEECTQLCRE

MSEWLATEKDIDKGSFFSGWWTMNMPM

GYKCADSFRFELVDTKVKQIQALLHDTFQH

ILELANPKLFAKLSKLTERGQTPWCFNMIP

TRNESVKEKFQGSYKSTDKVNRPKTNHRD

RNDMGISAMFYMGKFGGGSLQLIRVNEHT

PKTLVHIQAGDWLLRANKYRHAVSPTRPQ

SFPLANSSQTEVDDVKICENSSPTLNNPQA

DDNTPTLINTCPKQEPTDGDNPVQSSKEP

SNDYEQKRFSFIFFAHRSHFKHSKVYCGM

GQRQALNAFKADHPYYQSQRMKKKLGDD

CLDQSLILTEKRKPIKRNYALFNECGDDKQ

EESDEEEYQQYEPKPTTEEYTIKVIVDHEKV

FKGSDQSRKSYLYHIQWLGYPDETWEPYE

HLDDCQVFEDYLKHHNISLFDEEEEDRKV

DDSMLLPAWMHEDESLFEALLPIICCSTDN PRHHLDDVPPFDFNY

mYOX3 XP_002668005.1 9 MTEIVELSNIEPKDQKQAIIGGTWNRYGNS

IEIVAGISDENNTLLDNLTNCCESFVLDKL

WHLNRSMYNKLDTIEEKIKNFKTYAKYPSL

ALNLLCKENYNGKVKPYRKHIDPNNNGMD

VLMFFGKTFEGGNLIVSYHYTNIDFRMFTLP

IQSGDLVFLNSRIYHHKVTKVTSGVRCGLV

FFAGLDHFSVRKANYKKVKKEEYQKNMDD

KLLALPFQQKDKDLRIERTKTGRKEIKQFH

KNLQNNLPNKKRKK

Figure 2A-B depicts the common structure among these 8 mYOX proteins, including a conserved domain structure (see panel A) and conserved sequences in that conserved domain as revealed by a multiple sequence alignment (see panel B). These 8 proteins share a common consensus sequence: (R/K)X₄HXDX₁₂GX₁₈₋₃₀DX₁₀HXVX₇₋₇₂RX₅FA (SEQ ID NO: 1).
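The consensus sequence can be read as a pattern in which X stands for any residue and a subscript gives the number (or range) of such residues. As an illustration, the sketch below transcribes it into a regular expression and searches a sequence for it. The transcription, the names `CONSENSUS` and `find_consensus`, and the choice to strip whitespace before searching are assumptions made for this example. The test fragment is taken from the mYOX6 row of the table above with spaces removed.

```python
import re

# (R/K) X4 H X D X12 G X18-30 D X10 H X V X7-72 R X5 F A
CONSENSUS = re.compile(
    r"[RK].{4}H.D.{12}G.{18,30}D.{10}H.V.{7,72}R.{5}FA"
)


def find_consensus(sequence: str):
    """Return (start_index, matched_text) for the first hit, or None."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    match = CONSENSUS.search(cleaned)
    if match is None:
        return None
    return match.start(), match.group()


# Fragment of the mYOX6 sequence as listed in the table (spaces are ignored).
MYOX6_FRAGMENT = (
    "AFNYLEPQIEKHRIKNAHKDTRDATPSVLF"
    "YLGDYDEKEGYLEFPEQNCKVFVKPGDLLL"
    "FKGN KYKHQVAPITSGTRLG LVYFAH KACK"
)

hit = find_consensus(MYOX6_FRAGMENT)
if hit is not None:
    start, text = hit
    print(start, text)
```

A regular expression is used here only because the consensus is a fixed-order pattern with bounded gaps; profile-based methods would be the more usual choice for detecting distant family members.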

Biochemical assays for characterization of these enzymes include: non-quantitative assays, e.g., dot-blot assay using product-specific antibodies and thin-layer chromatography; and quantitative assays, e.g., LC/MS, radioactive assay, etc.

mYOX enzymes may oxidize 5-mC through intermediate product forms to 5-caC. Mutants of these enzymes can be assayed for significant bias toward one oxidized form over another, for example, a significant bias for conversion of 5-mC to 5-hmC or 5-mC to 5-fC or 5-caC. This allows direct detection of a single oxidation form and also a temporal means of tracking change in the oxidation state of modified nucleotides in the genome and correlation of these states and their changes to phenotypic change. Additional mutants may include those that only oxidize 5-mC, or 5-hmC, or 5-fC, but not other modified forms of cytosine. For example, a mutant may oxidize 5-hmC to 5-fC or 5-caC, but will not work on 5-mC. These mutants may enable a variety of in vitro epigenomic mapping techniques.

Mutants can be engineered using standard techniques such as rational design by site-directed mutagenesis based on enzyme 3D structures and screening/selection methods in large random mutant libraries.

Embodiments of the invention include uses of mYOXs for mapping of both methylome and hydroxymethylome. For example, differentiation processes in eukaryotic organisms can be studied using N. gruberi as a model system. N. gruberi is a single-cell protist that can differentiate from an amoeba form to a flagellate form in a synchronous manner. It thus forms a model system to study dynamic methylome/hydroxymethylome changes that contribute to the gene/pathway regulation during differentiation.

In one embodiment, the 5-mC in the genomic DNA can be converted to 5-hmC using an mYOX such as mYOX1 or other member of the mYOX family. Reducing agents, such as NaBH4, can be used in the reaction to ensure that any oxidation products in the form of 5-fC or 5-caC or naturally occurring instances of the same are converted to 5-hmC.

Any chemical or enzyme capable of promoting the reduction of 5-fC or 5-caC to 5-hmC can be used for that purpose. Many water-soluble metal or metalloid hydrides are able to reduce aldehydes and/or carboxylic acids to alcohols. Examples of such reducing agents are sodium borohydride and related compounds where from 1 to 3 of the hydrogens are replaced by other moieties, such as cyano and alkoxy containing up to about 5 carbon atoms. Examples of substituted borohydrides, all of which are sodium, potassium, or lithium salts, include cyanoborohydride, dicyanoborohydride, methoxyborohydride, dimethoxyborohydride, trimethoxyborohydride, ethoxyborohydride, diethoxyborohydride, triethoxyborohydride, propoxyborohydride, dipropoxyborohydride, tripropoxyborohydride, butoxyborohydride, dibutoxyborohydride, tributoxyborohydride, and so forth. Examples of other water-soluble metal hydrides include lithium borohydride, potassium borohydride, zinc borohydride, aluminum borohydride, zirconium borohydride, beryllium borohydride, and sodium bis(2-methoxyethoxy)aluminium hydride. Sodium borohydride can also be used in combination with a metal halide, such as cobalt(II), nickel(II), copper(II), zinc(II), cadmium(II), calcium(II), magnesium(II), aluminum(III), titanium(IV), hafnium(IV), or rhodium(III), each of which can be provided as a chloride, bromide, iodide, or fluoride salt. Alternatively, sodium borohydride can be used in combination with iodine, bromine, boron trifluoride diethyl etherate, trifluoroacetic acid, catechol-trifluoroacetic acid, sulfuric acid, or diglyme. Particular reducing strategies include the combination of potassium borohydride with lithium chloride, zinc chloride, magnesium chloride, or hafnium chloride; or the combination of lithium borohydride and chlorotrimethylsilane. Other reducing strategies include the use of borane, borane dimethyl sulfide complex, borane tetrahydrofuran complex, borane-ammonia complex, borane morpholine complex, borane dimethylamine complex, borane trimethylamine complex, borane N,N-diisopropylethylamine complex, borane pyridine complex, 2-picoline borane complex, borane 4-methylmorpholine complex, borane tert-butylamine complex, borane triphenylphosphine complex, borane N,N-diethylaniline complex, borane di(tert-butyl)phosphine complex, borane diphenylphosphine complex, borane ethylenediamine complex, or lithium ammonia borane. Alternative reducing strategies include the reduction of carboxylic acids via the formation of hydroxybenzotriazole esters, carboxy methyleniminium chlorides, carbonates, O-acylisoureas, acyl fluorides, cyanurates, mixed anhydrides, arylboronic anhydrides, acyl imidazolide, acyl azides, or N-acyl benzotriazoles, followed by reaction with sodium borohydride to give the corresponding alcohols.

Chemical groups, e.g., sugars such as glucose, can be added onto 5-hmC using a glycosyltransferase such as an α-glucosyltransferase (AGT) or a β-glucosyltransferase (BGT). Useful glycosyltransferases can accept a nucleobase in a nucleic acid as a substrate. Exemplary BGT enzymes are found in bacteriophages, such as T4. The T4 BGT shows little DNA sequence specificity, suggesting a mechanism of non-specific DNA binding combined with specific 5-hmC recognition.

Variants of the T4 BGT can be used. For example, the structure of T4 BGT and the identities of key residues in the enzyme are well understood, facilitating the construction of forms of the protein incorporating one or more amino acid deletions or substitutions. T4 BGT is a monomer comprising 351 amino acid residues and belongs to the α/β protein class. It is composed of two non-identical domains, both similar in topology to Rossmann nucleotide-binding folds, separated by a deep central cleft which forms the UDP-Glc binding site. Amino acids participating in the interaction with UDP include Ile238 (interactions with N3 and O4 of the base); Glu272 (interactions with O2' and O3' of the ribose); Ser189 (interacting with O11 of the α-phosphate); Arg191 (interacting with O12 of the α-phosphate); Arg269 (interacting with O6 of the α-phosphate and O22 of the β-phosphate); and Arg195 (interacting with O21 and O22 of the β-phosphate). Glu22 and Asp100 have been proposed to participate in the catalytic mechanism and other residues have been proposed to be involved in DNA binding or interactions with the UDP-associated sugar (Morera et al. (1999) "T4 phage beta-glucosyltransferase: substrate binding and proposed catalytic mechanism." J. Mol. Biol. 292(3):717-730, the entire disclosure of which is incorporated herein by reference).

Accordingly, a variant T4 BGT can be used to add a sugar to a nucleic acid. Variants optionally include an amino acid sequence at least 70% (e.g. at least 75%, at least 80%, at least 82%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%) identical to amino acids 1-351, 10-272 or 22-272 of T4 BGT. As assays for glycosylated nucleic acids (e.g. changes in susceptibility to cleavage by a glycosylation-sensitive endonuclease) are readily available, screening for variants retaining enzymatic activity is relatively straightforward.
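The identity thresholds recited above can be illustrated with a minimal sketch of a percent-identity calculation over a pre-computed pairwise alignment. The function name, the gap convention (columns where both sequences have a gap are skipped; a gap opposite a residue counts as a mismatch) and the two short peptide strings are assumptions chosen for illustration only; they are not T4 BGT sequences, and other definitions of percent identity exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two already-aligned sequences of equal length.

    Assumption: '-' marks a gap. Columns that are gaps in both sequences are
    ignored; a gap opposite a residue is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue fragments differing at one position.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
print(percent_identity(reference, variant))
print(meets_threshold(reference, variant, 95.0))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how a threshold such as "at least 70%" could be checked once an alignment over the chosen residue range is available.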

Due to the more prominent difference between the 5-gmC and unmodified cytosine, direct observation of its signals in single-molecule sequencing experiments can be achieved using platforms such as PacBio (Pacific Biosciences, Menlo Park, CA) or Oxford Nanopore (Oxford, UK).

Modification-dependent or modification-sensitive endonucleases are described in WO2011/025819, incorporated by reference, and also in REBASE® (www.neb.com, New England Biolabs, Ipswich, MA) and include, for example, MspI, MfeI, Taq, and HpaII endonucleases. Optionally, the endonuclease preferentially binds to a hydroxymethylated cytosine or a glucosyl-oxy-methylated cytosine and cleaves the bound nucleic acid at a defined distance from the recognition site. Exemplary endonucleases include those whose amino acid sequences are identical to, or are at least 95% identical to, an enzyme selected from the group consisting of PvuRts1I, PpeHI, EsaSS310P, EsaRBORFBP, PatTI, YkrI, EsaNI, SpeAI, BbiDI, PfrCORFlI80P, PcoORF314P, BmeDI, AbaSI, AbaCI, AbaAI, AbaUMB3ORFAP and Asp6ORFAP, as described in US Patent Application Publication No. 2012/0301881, and/or at least 95% identical to an enzyme referenced in Borgaro et al. (2013) "Characterization of the 5-hydroxymethylcytosine-specific DNA restriction endonucleases," Nucleic Acids Research, doi: 10.1093/nar/gkt102, the entire disclosures of each of which are incorporated herein by reference.

All references cited herein, as well as U.S. Application No. 61/611,295, filed March 15, 2012; U.S. Application No. 61/722,968, filed November 6, 2012; U.S. Application No. 61/723,427, filed November 7, 2012; U.S. Application No. 61/724,041, filed November 8, 2012; U.S. Application No. 13/804,804, filed March 14, 2013; U.S. Application No. 13/826,395, filed March 14, 2013; U.S. Application No. 13/827,087, filed March 14, 2013, and U.S. Application No. 13/827,885, filed March 14, 2013, are incorporated by reference.

EXAMPLES

Example 1: Expression of mYOX1.

mYOX1 was cloned in E. coli. T7 Express cells (New England Biolabs (NEB), Ipswich, MA) were transformed with pTXB1-(His)6-mYOX1, and expression was induced with 50 μM IPTG at OD = 0.8. The cells were grown at 16°C for 12-16 hours and then lysed using a French press. The lysate supernatant was purified on a Ni-based affinity column followed by a heparin-based affinity column. The typical yield of isolated (His)6-mYOX1 was ~7-8 mg protein/L culture. The pure protein sample was stored in 20 mM TRIS, pH 7.5, 1 mM DTT, 500 mM NaCl, and 50% glycerol at -20°C.

Example 2: Determination of activity of mYOX1.

(A) Conversion of 5-mC in a double-stranded DNA oligomer with 24 fully-methylated CpG sites ("24x Oligo"), as reflected by the HPLC chromatogram shown in Figure 4A. The DNA sequence of the top strand, with the methylation sites underlined, is: 5'-ATTACACGCGCGATATCGTTAACGATAATTCGCGCGATTACGATCGATAACGCGTTAATA-3' (SEQ ID NO: 10). For each methylated cytosine in the top strand, the cytosine complementary to the subsequent guanine residue is also methylated, yielding a total of 24 methylated cytosines per double-stranded DNA. The assay mix contained, in a final volume of 20 μL: 50 mM Bis-TRIS pH 6.0, 50 mM NaCl, 1 mM dithiothreitol (DTT), 2 mM ascorbic acid, 2 mM α-ketoglutarate, 100 μM ferrous sulfate (FeSO4), 2 μM oligonucleotide (24x), and 4 μM mYOX1.
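The count of 24 methylated cytosines per duplex can be checked directly from SEQ ID NO: 10. The following minimal sketch counts CpG dinucleotides on both strands and, from the stated 2 μM oligonucleotide and 4 μM enzyme, derives the ratio of 5-mC sites to enzyme molecules. It assumes that every CpG cytosine on both strands is methylated, as the text states; the function and variable names are illustrative only.

```python
# Top strand of the "24x Oligo" (SEQ ID NO: 10), written without spaces.
TOP_STRAND = "ATTACACGCGCGATATCGTTAACGATAATTCGCGCGATTACGATCGATAACGCGTTAATA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (non-overlapping by construction)."""
    return seq.upper().count("CG")


cpg_top = count_cpg(TOP_STRAND)
cpg_bottom = count_cpg(reverse_complement(TOP_STRAND))
methylated_per_duplex = cpg_top + cpg_bottom

# Concentrations taken from the assay mix described in the text.
OLIGO_UM = 2.0   # duplex oligonucleotide, micromolar
ENZYME_UM = 4.0  # mYOX1, micromolar

site_um = OLIGO_UM * methylated_per_duplex  # total 5-mC sites, micromolar
sites_per_enzyme = site_um / ENZYME_UM

print(cpg_top, cpg_bottom, methylated_per_duplex)
print(site_um, sites_per_enzyme)
```

Because CpG is its own reverse complement, the bottom strand necessarily carries the same number of CpG sites as the top strand, which is why the duplex total is twice the top-strand count.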

The reaction mixture was incubated for 1 hour at 34°C. The protein was digested using proteinase K (NEB) at a final concentration of 1 μg/μL for 1 hour at 50°C. The DNA was recovered by using the QIAquick® Nucleotide Removal Kit (QIAGEN, Valencia, CA). The recovered DNA was digested by a mixture of 0.5 U nuclease P1 (Sigma-Aldrich, St. Louis, MO), 5 U Antarctic Phosphatase (NEB), and 2 U DNase I (NEB) in 20 μL total volume for 1 hour at 37°C. The digested DNA was then subjected to LC-MS analysis. LC-MS was done on an Agilent 1200 series (G1316A UV Detector, 6120 Mass Detector, Agilent, Santa Clara, CA) with a Waters Atlantis T3 (4.6 x 150 mm, 3 μm, Waters, Milford, MA) column with in-line filter and guard. The results are shown in Figure 4A, in which the blue profile depicts a reaction mixture without mYOX1 and the red profile depicts a reaction mixture with mYOX1. A 5-mC peak is detected in the blue profile; 5-hmC, 5-fC and 5-caC peaks are detected in the red profile. The results of these experiments are summarized in the table below.

A variety of buffers and pHs were tested to assess the optimum buffer conditions for 5-mC conversion by mYOX1. The experiment was performed on a double-stranded DNA with one fully-methylated CpG site (5'-CGGCGTTTCCGGGTTCCATAGGCTCCGCCCCGGACTCTGATGACCAGGGCATCACA-3'; underlined residue is 5-mC, as is the residue complementary to the adjacent guanine residue; SEQ ID NO: 11; "oligo 9"). The results are shown in the table below:

Buffer             5-caC    5-hmC    5-mC     5-fC
Citrate pH 5.0     -        -        100%     -
Citrate pH 5.5     -        -        100%     -
MES pH 5.5         10.2%    40.9%    9.2%     39.7%
MES pH 5.75        7.7%     42.4%    7.0%     43.0%
MES pH 6.0         25.1%    20.8%    -        54.1%
Bis-TRIS pH 6.0    38.5%    15.7%    2.1%     43.6%
Bis-TRIS pH 6.5    26.1%    19.0%    0.9%     54.0%
MOPS pH 6.5        38.8%    13.6%    2.1%     45.4%
MOPS pH 6.75       41.7%    10.0%    0.7%     47.5%
MOPS pH 7.0        31.7%    18.8%    0.6%     48.9%
KH2PO4 pH 7.0      -        -        100%     -
TRIS pH 7.5        5.9%     56.8%    7.1%     30.1%
HEPES pH 7.3       20.5%    22.2%    1.0%     56.4%
HEPES pH 7.5       18.5%    37.4%    1.2%     42.8%
HEPES pH 8.0       -        16.8%    81.2%    2.0%

As shown in the table, mYOX1 was active at pH 8.0, oxidizing a portion of the 5-mC to 5-hmC and 5-fC. However, the enzyme was even more active at lower pH. For example, at pH 7.5, approximately 90% of the 5-mC residues were oxidized, with most of the product present as 5-hmC and 5-fC. At pH 7.3, the proportions of 5-mC and 5-hmC decreased, with increasing proportions of 5-fC and 5-caC. The proportions of 5-mC and 5-hmC continued to decrease with decreasing pH through pH 6.0, at which point substantially all of the 5-mC nucleotides were oxidized, more than one third of them to 5-caC. Thus, the enzyme appears to be maximally active at about pH 6. The pH conditions could be used to manipulate the distribution of 5-mC oxidation products. The pH-dependence of mYOX1 activity was surprising, as TET enzymes are routinely used at pH 8.
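The pH trend described above can be made concrete by computing, for selected buffers, the fraction of 5-mC that was oxidized at all. The following is a minimal sketch under two assumptions that are not stated explicitly in the text: the four value columns of the buffer table are read as 5-caC, 5-hmC, 5-mC and 5-fC (an order inferred from the discussion of the pH 8.0 and pH 6.0 rows), and a dash is treated as 0%. The dictionary and function names are illustrative.

```python
# Percentages transcribed from the buffer table for selected rows.
# Assumed column order: 5-caC, 5-hmC, 5-mC, 5-fC; "-" is entered as 0.0.
PRODUCTS = {
    "HEPES pH 8.0":    {"caC": 0.0,  "hmC": 16.8, "mC": 81.2, "fC": 2.0},
    "HEPES pH 7.5":    {"caC": 18.5, "hmC": 37.4, "mC": 1.2,  "fC": 42.8},
    "HEPES pH 7.3":    {"caC": 20.5, "hmC": 22.2, "mC": 1.0,  "fC": 56.4},
    "Bis-TRIS pH 6.0": {"caC": 38.5, "hmC": 15.7, "mC": 2.1,  "fC": 43.6},
}


def percent_oxidized(row: dict) -> float:
    """Percent of 5-mC converted to any oxidation product (100 minus residual 5-mC)."""
    return 100.0 - row["mC"]


def percent_beyond_hmc(row: dict) -> float:
    """Percent present as 5-fC or 5-caC, i.e. oxidized more than once."""
    return row["fC"] + row["caC"]


for buffer_name, row in PRODUCTS.items():
    print(f"{buffer_name}: oxidized {percent_oxidized(row):.1f}%, "
          f"5-fC + 5-caC {percent_beyond_hmc(row):.1f}%")
```

Tabulating both quantities separates two effects that the text discusses together: how much substrate is consumed, and how far the consumed substrate is carried along the oxidation series.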

The activity of mYOX1 was tested on single-stranded DNA (ssDNA) substrates and compared to that of a double-stranded DNA (dsDNA) with the same sequence under the same experimental conditions discussed for the 24x oligo. Surprisingly, it was found that mYOX1 oxidizes 5-mC in ssDNA as efficiently as in dsDNA. Substrates included double-stranded "oligo 9"; "hemi-oligo 9," a double-stranded DNA identical to oligo 9 but lacking methylcytosine on the complementary strand; "ss oligo 9 (top)," a single-stranded DNA including only the residues recited in SEQ ID NO: 11; and "ss oligo 9 (bottom)," a single-stranded DNA including the residues complementary to the residues recited in SEQ ID NO: 11.

Interestingly, mYOX1 was further shown to exhibit activity on a 1.6 kb

RNA substrate ("5-mc RNA") having all its cytosines in 5-mC form: gggtctagaaataattttgtttaactttaagaaggagatatacatatgaaaatcgaagaa ggtaaaggtcaccatcac catcaccacggatccatggaagacgccaaaaacataaagaaaggcccggcgccattctat cctctagaggatggaacc gctggagagcaactgcataaggctatgaagagatacgccctggttcctggaacaattgct tttacagatgcacatatc gaggtgaacatcacgtacgcggaatacttcgaaatgtccgttcggttggcagaagctatg aaacgatatgggctgaat acaaatcacagaatcgtcgtatgcagtgaaaactctcttcaattctttatgccggtgttg ggcgcgttatttatcgga gttgcagttgcgcccgcgaacgacatttataatgaacgtgaattgctcaacagtatgaac atttcgcagcctaccgta gtgtttgtttccaaaaaggggttgcaaaaaattttgaacgtgcaaaaaaaattaccaata atccagaaaattattatc atggattctaaaacggattaccagggatttcagtcgatgtacacgttcgtcacatctcat ctacctcccggttttaat gaatacgattttgtaccagagtcctttgatcgtgacaaaacaattgcactgataatgaat tcctctggatctactggg ttacctaagggtgtggcccttccgcatagaactgcctgcgtcagattctcgcatgccaga gatcctatttttggcaat caaatcattccggatactgcgattttaagtgttgttccattccatcacggttttggaatg tttactacactcggatat ttgatatgtggatttcgagtcgtcttaatgtatagatttgaagaagagctgtttttacga tcccttcaggattacaaa attcaaagtgcgttgctagtaccaaccctattttcattcttcgccaaaagcactctgatt gacaaatacgatttatct aatttacacgaaattgcttctgggggcgcacctctttcgaaagaagtcggggaagcggtt gcaaaacgcttccatctt ccagggatacgacaaggatatgggctcactgagactacatcagctattctgattacaccc gagggggatgataaaccg ggcgcggtcggtaaagttgttccattttttgaagcgaaggttgtggatctggataccggg aaaacgctgggcgttaat cagagaggcgaattatgtgtcagaggacctatgattatgtccggttatgtaaacaatccg gaagcgaccaacgccttg attgacaaggatggatggctacattctggagacatagcttactgggacgaagacgaacac ttcttcatagttgaccgc ttgaagtctttaattaaatacaaaggatatcaggtggcccccgctgaattggaatcgata ttgttacaacaccccaac atcttcgacgcgggcgtggcaggtcttcccgacgatgacgccggtgaacttcccgccgcc gttgttgttttggagcac ggaaagacgatgacggaaaaagagatcgtggattacgtcgccagtcaagtaacaaccgcg aaaaagttgcgcggagga gttgtgtttgtggacgaagtaccgaaaggtcttaccggaaaactcgacgcaagaaaaatc agagagatcctcataaag gccaagaagggcggaaagtccaaactcgagtaaggttaacctgcaggagg (SEQ ID NO: 12). 
The assay conditions were as follows: 50 mM Bis-TRIS pH 6.0, 50 mM NaCl, 1 mM DTT, 2 mM ascorbic acid, 2 mM α-ketoglutarate, 100 μM FeSO4, 1 μg 5-mC RNA, and 4 μM mYOX1. The reaction mixture was incubated for 1 hour at 34°C. The protein was digested using proteinase K (NEB) at a final concentration of 1 μg/μL for 1 hour at 37°C. The RNA was recovered by using the QIAquick® Nucleotide Removal Kit (QIAGEN, Valencia, CA). The recovered RNA was digested into nucleosides and analyzed by LC-MS as described in Example 2A. The results were as follows:

(B) Conversion of 5-mC in plasmid and genomic DNA, as depicted in the HPLC chromatograms shown in Figures 4B and 4C, respectively. The assay components are as follows: 50 mM Bis-TRIS pH 6.0, 50 mM NaCl, 1 mM DTT, 2 mM ascorbic acid, 2 mM α-ketoglutarate, 100 μM FeSO4, 2 μg DNA, and 20 μM mYOX1.

The reaction mixture was incubated for 1 hour at 34°C. The reaction mixture was then digested with proteinase K for 1 hour at 50°C. The DNA was recovered by using the QIAquick® PCR Purification Kit (QIAGEN, Valencia, CA). The recovered DNA was digested and analyzed by LC-MS as described in Example 2A. As shown, mYOX1 efficiently oxygenates 5-mC in plasmid and genomic DNA samples.

(C) ATP interferes with the chemical processivity of mYOX1 (its ability to carry out the second and third oxidation steps), as reflected in the table presented below. This contrasts with what has been described for the TET enzymes, where the presence of ATP has been required for the formation of higher amounts of 5-caC. Experimental conditions are as described above for the 24x oligo and oligo 9.

DNA substrate    mYOX1    1 mM ATP    5-caC    5-hmC    5-mC     5-fC
oligo9           -        -           -        -        100%     -
oligo9           +        -           38.7%    15.7%    2.1%     43.6%
oligo9           +        +           13.6%    40.9%    2.3%     43.2%
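One way to put a number on "chemical processivity" is the average number of oxidation steps completed per cytosine, weighting 5-hmC as one step, 5-fC as two and 5-caC as three. This metric is an illustrative assumption, not a quantity defined in the text, and the sketch below also assumes that the four value columns of the table read 5-caC, 5-hmC, 5-mC and 5-fC.

```python
# Number of successive oxidation steps needed to reach each species from 5-mC.
OXIDATION_STEPS = {"mC": 0, "hmC": 1, "fC": 2, "caC": 3}

# Product distributions (percent) for oligo 9 with mYOX1, from the table.
# Assumed column order: 5-caC, 5-hmC, 5-mC, 5-fC.
WITHOUT_ATP = {"caC": 38.7, "hmC": 15.7, "mC": 2.1, "fC": 43.6}
WITH_1MM_ATP = {"caC": 13.6, "hmC": 40.9, "mC": 2.3, "fC": 43.2}


def mean_oxidation_steps(distribution: dict) -> float:
    """Average oxidation steps per cytosine (hypothetical processivity index).

    The distribution is normalized by its own total so that rounding in the
    reported percentages does not bias the result.
    """
    total = sum(distribution.values())
    weighted = sum(OXIDATION_STEPS[species] * pct
                   for species, pct in distribution.items())
    return weighted / total


no_atp = mean_oxidation_steps(WITHOUT_ATP)
with_atp = mean_oxidation_steps(WITH_1MM_ATP)
print(f"without ATP: {no_atp:.2f} steps; with 1 mM ATP: {with_atp:.2f} steps")
```

Under these assumptions the residual 5-mC is nearly unchanged by ATP while the index drops, which matches the statement that ATP affects the later oxidation steps rather than the first.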

Example 3: mYOX1 can be used in conjunction with BGT.

An mYOX1/T4-BGT coupled assay was performed as described in Example 2A for genomic DNA (IMR90), with the following exceptions: 50 mM HEPES pH 7.0 was used instead of Bis-TRIS pH 6.0, and 40 μM uridine diphosphoglucose (UDP-Glc) and 50 U T4 BGT were added in the oxidation reaction.

Alternatively, for bacterial genomic DNA (MG1655), the reaction was carried out exactly as described in Example 2A. Then the reaction mixture was digested with proteinase K for 1 hour at 50°C. The sample was then treated with 100 mM NaBH4, 40 μM uridine diphosphoglucose (UDP-Glc) and 50 U T4-BGT in 1x NEBuffer 4 (NEB) and incubated for 1 hour at 37°C. The DNA was recovered by using the QIAquick® PCR Purification Kit (QIAGEN, Valencia, CA). The recovered DNA was digested and analyzed by LC-MS as described in Example 2A, and the results are summarized in the table below.

Substrate    T4-BGT                       NaBH4    Nucleoside percentages (source column order; surviving column labels: hmC, β-ghmC, fC)
IMR90        in oxidation reaction        -        7.4%     -     4.1%    85.9%    2.6%
MG1655       after oxidation/reduction    +        29.3%    -     3.0%    67.7%    -

The effects of increasing ATP concentration on the activity of mYOX1 when coupled with the activity of T4-BGT in the presence of NaBH4 and UDP-Glc were tested. ATP concentrations higher than 1 mM inhibited the conversion of 5-mC to 5-hmC by mYOX1. The reaction was carried out exactly as described in Example 2A for oligo 9 except for the duration of the oxidation reaction (20 minutes instead of 1 hour) and the presence of varying amounts of ATP. The reaction mixture was then digested with proteinase K and glucosylated using T4 BGT as described above for MG1655 genomic DNA. The DNA was recovered by using the QIAquick® PCR Purification Kit (QIAGEN, Valencia, CA). The recovered DNA was digested and analyzed by LC-MS as described in Example 2A, and the results are summarized in the table below.

Example 4: Qualitative and quantitative assays for characterization of the mYOX family of enzymes.

Immunodot-blot assay: This is a qualitative but relatively fast assay. Many samples can be tested simultaneously, which makes it useful for screening purposes, e.g., tracking active fractions during the enzyme purification process. By immobilizing the reacted DNA onto a membrane, it was possible to confirm the identity of the oxidation products of 5-mC, i.e. 5-hmC, 5-fC and 5-caC, by probing with specific antibodies (obtainable from Active Motif, Carlsbad, CA).

LC-MS analysis: To quantify mYOX1 oxidation products, LC-MS analysis was performed on a reverse-phase Waters Atlantis T3 C18 column (3 μm, 4.6 x 150 mm) with an Agilent 1200 LC-MS system equipped with an Agilent G1315D DAD detector and an Agilent 6120 Quadrupole MS detector. A binary solvent system with ammonium acetate (10 mM, pH 4.5) and methanol was used. The HPLC method included an isocratic condition with 2% methanol for 10 minutes followed by a slow gradient from 2% to 25% methanol in 30 minutes. The quantification of each nucleoside was based on the peak area, obtained by integrating each peak at 278 nm with the UV detector. For more accurate quantification, each nucleoside peak can be quantified at its absorption maximum and adjusted by its extinction coefficient. The identity of each peak was confirmed by MS.
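The gradient program and the peak-area quantification described above can be sketched as two small functions. The peak areas and extinction coefficients in the usage lines are hypothetical placeholders chosen for easy arithmetic, not measured values, and the function names are illustrative. Dividing each area by its extinction coefficient follows from the Beer-Lambert relation, in which absorbance is proportional to concentration times extinction coefficient.

```python
def methanol_percent(t_min: float) -> float:
    """Methanol percentage at time t (minutes) for the described method:
    2% isocratic for 10 minutes, then a linear ramp from 2% to 25% over 30 minutes.
    Times outside 0-40 minutes are not described in the text and are rejected."""
    if t_min < 0 or t_min > 40:
        raise ValueError("method is only described for 0-40 minutes")
    if t_min <= 10:
        return 2.0
    return 2.0 + (25.0 - 2.0) * (t_min - 10.0) / 30.0


def relative_abundance(peak_areas: dict, extinction: dict = None) -> dict:
    """Percent abundance of each nucleoside from integrated UV peak areas.

    If extinction coefficients are supplied, each area is divided by its
    coefficient before normalization (area is proportional to amount x epsilon).
    """
    if extinction is None:
        corrected = dict(peak_areas)
    else:
        corrected = {name: area / extinction[name]
                     for name, area in peak_areas.items()}
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("total corrected area must be positive")
    return {name: 100.0 * value / total for name, value in corrected.items()}


# Hypothetical peak areas (arbitrary units) and relative extinction coefficients.
areas = {"5-mC": 20.0, "5-hmC": 30.0, "5-fC": 50.0}
epsilon = {"5-mC": 1.0, "5-hmC": 1.0, "5-fC": 2.0}

print(methanol_percent(25.0))
print(relative_abundance(areas))
print(relative_abundance(areas, epsilon))
```

The uncorrected call corresponds to quantification at a single wavelength; the corrected call shows how a nucleoside that absorbs more strongly would otherwise be over-represented.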

Example 5: 5-hmC specific endonuclease assay.

We have developed a family of 5-hmC specific endonucleases which cleave DNA at sites of the form 5-hmC N22-23 G. By cloning the HpaII DNA methylase (CmCGG) into a vector with only two CCGG sites, the vector will contain two sites of 5-mC N22-23 G. When the 5-mC in these sites was oxidized to 5-hmC, digestion using a 5-hmC specific endonuclease such as PvuRts1I or AbaSI produced a DNA fragment detectable in an agarose gel. This method detected 5-hmC only.

Example 6: Methods for sequencing the methylome and hydroxymethylome using the DNA modification-dependent restriction endonucleases.

Genomic DNA was digested with either MspJI or AbaSI. These enzymes cleaved the DNA at fixed distances from the modified cytosine, leaving a sticky end (MspJI: 4-base 5'-overhang; AbaSI: 2-base 3'-overhang). The first biotinylated adaptor (P1b in Figure 5) was then ligated to the cleaved ends. The ligated DNA was then subjected to random fragmentation to about 300 bp. Avidin beads were used to pull out the fragments with the ligated P1b. After polishing the ends, adaptor P2 was then ligated onto the DNA fragments on the beads. Adaptor-specific PCR was performed and the resultant DNA entered the library preparation pipeline for specific sequencing using the HiSeq® platform (Illumina, San Diego, CA). The end-sequencing was done from the P1 end.

Bioinformatic analysis of the sequencing reads utilized the P1 ends to mark the enzyme's cleavage sites. After mapping the read back to the reference genome, the modified cytosine was determined to be located at a fixed distance away from the cleavage sites and on either side.
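The mapping step can be sketched as follows: given a cleavage coordinate inferred from the P1 end of a mapped read, look a fixed distance to either side in the reference and keep positions where a cytosine is present on one of the two strands. This is a minimal sketch, not the pipeline actually used; the function name, the 0-based coordinate convention and the `distance` parameter are assumptions, and the enzyme-specific distance is left to the caller because the text does not state a value.

```python
def candidate_modified_cytosines(reference: str, cleavage_pos: int, distance: int):
    """Return (position, strand) candidates for the modified cytosine.

    reference    : reference sequence of the top strand (0-based indexing)
    cleavage_pos : 0-based coordinate of the cleavage site from the read's P1 end
    distance     : fixed enzyme-specific distance (hypothetical parameter)

    A 'C' in the reference is a cytosine on the top strand ('+'); a 'G' in the
    reference corresponds to a cytosine on the bottom strand ('-').
    """
    candidates = []
    for pos in (cleavage_pos - distance, cleavage_pos + distance):
        if pos < 0 or pos >= len(reference):
            continue  # candidate falls outside the reference
        base = reference[pos].upper()
        if base == "C":
            candidates.append((pos, "+"))
        elif base == "G":
            candidates.append((pos, "-"))
    return candidates


# Toy reference with a C at index 5 and an A at index 15 (hypothetical data).
toy_reference = "TTTTTCTTTTTTTTTATTTT"
print(candidate_modified_cytosines(toy_reference, cleavage_pos=10, distance=5))
```

When both sides yield a cytosine, the read alone does not resolve which one carried the modification, so a real analysis would need further evidence, such as the sequence context expected by the enzyme or support from reads on the opposite side.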