Title:
COMPOSITIONS AND METHODS FOR IDENTIFICATION, ASSESSMENT AND TREATMENT OF CANCER USING DNA METHYLATION FINGERPRINTS
Document Type and Number:
WIPO Patent Application WO/2019/165375
Kind Code:
A1
Abstract:
The present invention relates to methods for identifying the primary source of cancer. In some aspects, the present invention relates to methods and compositions for assessing and treating cancer.

Inventors:
SHIVDASANI RAMESH A (US)
JADHAV UNMESH (US)
Application Number:
PCT/US2019/019429
Publication Date:
August 29, 2019
Filing Date:
February 25, 2019
Assignee:
DANA FARBER CANCER INST INC (US)
SHIVDASANI RAMESH A (US)
JADHAV UNMESH (US)
International Classes:
C12Q1/68
Foreign References:
US20160017419A1 2016-01-21
US20160340749A1 2016-11-24
Attorney, Agent or Firm:
SMITH, DeAnn F. et al. (US)
Claims:
What is claimed:

1. A method of determining the primary source of a cancer in a subject, the method comprising:

a) obtaining a biological sample comprising cancer cells from the subject;

b) detecting hypomethylated regions of a portion of the genome of the cancer cells in the biological sample,

c) comparing the distribution of hypomethylated regions of a portion of the genome of the cancer cells to the distribution of hypomethylated regions of portions of the genomes of different tissues, and

wherein the primary source of the cancer is the tissue that has a substantially similar distribution of hypomethylated regions to the distribution of hypomethylated regions in the cancer cells in the biological sample.

2. A method of determining the primary source of a cancer in a subject, the method comprising:

a) obtaining a biological sample comprising cancer cells from the subject;

b) detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample,

c) comparing the distribution of LMRs in a portion of the genome of the cancer cells to distribution of LMRs in portions of the genomes of different tissues, and

wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample.

3. The method of claim 2, wherein the detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome.

4. The method of claim 3, wherein an LMR is a region of the genome wherein less than 60% of CpGs in that region are methylated.

5. The method of claim 3, wherein an LMR is a region of the genome wherein less than 50% of CpGs in that region are methylated.

6. The method of claim 3, wherein an LMR is a region of the genome wherein less than 40% of CpGs in that region are methylated.

7. The method of claim 3, wherein an LMR is a region of the genome wherein less than 30% of CpGs in that region are methylated.

8. The method of claim 3, wherein an LMR is a region of the genome wherein less than 20% of CpGs in that region are methylated.

9. The method of claim 3, wherein an LMR is a region of the genome wherein less than 10% of CpGs in that region are methylated.

10. The method of any one of claims 4 to 9, wherein the LMR region is modified with H3K4me1.

11. The method of any one of claims 4 to 9, wherein the LMR region is modified with H3K27ac.

12. A method of determining the primary source of a cancer in a subject, the method comprising:

a) obtaining a biological sample comprising cancer cells from the subject;

b) detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells;

c) comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, and

wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample.

13. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 60% of CpGs are methylated.

14. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 50% of CpGs are methylated.

15. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 40% of CpGs are methylated.

16. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 30% of CpGs are methylated.

17. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 20% of CpGs are methylated.

18. The method of claim 12, wherein the enhancer is a region of the genome wherein less than 10% of CpGs are methylated.

19. The method of claim 12, wherein an enhancer is a region of the genome that is modified with H3K4me1.

20. The method of claim 12, wherein an enhancer is a region of the genome that is modified with H3K27ac.

21. The method of any one of claims 1 to 20, wherein the biological sample is a blood sample.

22. The method of any one of claims 1 to 20, wherein the biological sample is a tumor sample.

23. A method of treating cancer in a subject, the method comprising:

a) obtaining a biological sample comprising cancer cells from the subject;

b) detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample,

c) identifying the primary source of cancer by comparing the distribution of LMRs in a portion of the genome of the cancer cells to the distribution of LMRs in a portion of the genome in different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample, and

d) administering a cancer therapy to the subject.

24. The method of claim 23, wherein the cancer treatment is a cancer therapy that is directed to the primary source of the cancer.

25. The method of claim 23 or 24, wherein the detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome.

26. The method of claim 25, wherein an LMR is a region of the genome wherein less than 60% of CpGs in that region are methylated.

27. The method of claim 25, wherein an LMR is a region of the genome wherein less than 50% of CpGs in that region are methylated.

28. The method of claim 25, wherein an LMR is a region of the genome wherein less than 40% of CpGs in that region are methylated.

29. The method of claim 25, wherein an LMR is a region of the genome wherein less than 30% of CpGs in that region are methylated.

30. The method of claim 25, wherein an LMR is a region of the genome wherein less than 20% of CpGs in that region are methylated.

31. The method of claim 25, wherein an LMR is a region of the genome wherein less than 10% of CpGs in that region are methylated.

32. The method of any one of claims 26 to 31, wherein the LMR region is modified with H3K4me1.

33. The method of any one of claims 26 to 31, wherein the LMR region is modified with H3K27ac.

34. A method of treating a cancer in a subject, the method comprising:

a) obtaining a biological sample comprising cancer cells from the subject;

b) detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells;

c) identifying the primary source of cancer by comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample, and

d) administering a cancer therapy to the subject.

35. The method of claim 34, wherein the cancer treatment is a cancer treatment directed to the primary source of the cancer.

36. The method of claim 34 or 35, wherein the enhancer is a region of the genome wherein less than 60% of CpGs are methylated.

37. The method of claim 36, wherein the enhancer is a region of the genome wherein less than 50% of CpGs are methylated.

38. The method of claim 36, wherein the enhancer is a region of the genome wherein less than 40% of CpGs are methylated.

39. The method of claim 36, wherein the enhancer is a region of the genome wherein less than 30% of CpGs are methylated.

40. The method of claim 36, wherein the enhancer is a region of the genome wherein less than 20% of CpGs are methylated.

41. The method of claim 36, wherein the enhancer is a region of the genome wherein less than 10% of CpGs are methylated.

42. The method of claim 36, wherein an enhancer is a region of the genome that is modified with H3K4me1.

43. The method of claim 36, wherein an enhancer is a region of the genome that is modified with H3K27ac.

44. The method of any one of claims 23 to 43, wherein the biological sample is a blood sample.

45. The method of any one of claims 23 to 43, wherein the biological sample is a tumor sample.

Description:
COMPOSITIONS AND METHODS FOR IDENTIFICATION, ASSESSMENT AND TREATMENT OF CANCER USING DNA METHYLATION FINGERPRINTS

Related U.S. Applications

This application claims priority to U.S. Provisional Application 62/635,287, filed February 26, 2018, and U.S. Provisional Application 62/700,457, filed July 19, 2018, each of which is incorporated herein by reference in its entirety.

Government Support

This invention was made with government support under grant numbers U01 DK103152, F32 DK103453, and K01 DK113067 awarded by The National Institutes of Health. The government has certain rights in the invention.

Background of the Invention

Cancer of unknown primary site (CUP) is a well-recognized clinical disorder, accounting for 3–5% of all malignant epithelial tumors. CUP is clinically characterized as an aggressive disease with early dissemination. Diagnostic approaches to identify the primary site include detailed histopathological examination with specific immunohistochemistry and radiological assessment. Metastatic adenocarcinoma is the most common CUP histopathology (80%). CUP patients are divided into subsets of favorable (20%) and unfavorable (80%) prognosis. Favorable subsets are mostly given local or regional treatment or systemic platinum-based chemotherapy. Responses and survival are similar to those of patients with relevant known primary tumors. Patients in unfavorable subsets are treated with empirical chemotherapy based on combination regimens of platinum or taxane.

The prognosis for patients with CUP is poor. As a group, the median survival is approximately three to four months with less than 25% and 10% of patients alive at one and five years, respectively. CUP is represented by a heterogeneous group of diseases all of which have presented with metastasis as the primary manifestation (Altman et al. (1986) Cancer 57 (1): 120-4; Ringenberg (1985) Med Pediatr Oncol 13 (5): 301-6). Often, identification of a cancer’s primary source is an important factor in tailoring cancer treatment. Thus, there is a need for novel diagnostic methods to detect and elucidate the primary origin of CUP.

Summary of the Invention

In some aspects, the methods and compositions disclosed herein are related to the identification of the primary source of cancer in patients with CUP. Additionally, provided herein are methods of treatment of cancer (e.g., CUP).

Adult tissues use different cis-regulatory elements than those deployed in embryos. Properties of decommissioned enhancers (e.g., enhancers activated in development and subsequently silenced) and the basis for their subsequent silence are unknown. The present invention relates, in part, to the finding that in adult cells, hypomethylated CpG dinucleotides preserve a comprehensive archive of tissue-specific developmental enhancers. Sites previously regarded as ‘primed’ enhancers, because they carry the active histone mark H3K4me1, acted late in organogenesis, whereas sites decommissioned after early embryonic activity retain hypomethylated DNA as a singular property. Additionally, as exemplified in the data shown herein, sustained absence of Polycomb Repressive Complex 2 in adult cells indirectly reactivates nearly all, and only, hypomethylated developmental enhancers. In this setting, tissue-restricted embryonic and fetal transcriptional programs re-emerge in reverse chronology to enhancer and gene inactivation during development. Therefore, hypomethylated enhancer DNA in adult cells preserves a catalogue of tissue-specific cis-elements, reminiscent of the ‘fossil record’, marking decommissioned enhancers for aberrant reactivation. The invention disclosed herein also relates, in part, to the identification of tens of thousands of developmental (embryonic) enhancers. This discovery will allow tissue enhancer signatures to be delineated with far greater precision than has been possible to date. These findings also have implications for cellular reprogramming and for gene deregulation in cancer.

In some aspects, provided herein are methods and compositions for determining and detecting the primary source of a cancer in a subject. In some aspects, provided herein are methods and compositions to determine or detect a primary source of cancer from tissue DNA or circulating DNA. The methods may comprise obtaining a biological sample comprising cancer cells from the subject and detecting hypomethylated regions of a portion of the genome of the cancer cells in the biological sample. The methods may further comprise comparing the distribution of hypomethylated regions of a portion of the genome of the cancer cells to the distribution of hypomethylated regions of portions of the genomes of different tissues (e.g., other tissues from the subject or tissues from a third party or adult or embryonic tissues), wherein the primary source of the cancer is the tissue that has a substantially similar distribution of hypomethylated regions to the distribution of hypomethylated regions in the cancer cells in the biological sample. The methods disclosed herein may comprise obtaining a biological sample comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample, and comparing the distribution of LMRs in a portion of the genome of the cancer cells to the distribution of LMRs in portions of the genomes of different tissues (e.g., other tissues from the subject or tissues from a third party), wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample. In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome. In some embodiments, an LMR is a region of the genome wherein less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or less than 10% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.
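As an illustration only, the LMR-based comparison described above can be sketched in Python. The text does not define how "substantially similar" is measured, so the Jaccard overlap used here is an assumption, as are the function names, the region keys, and the per-region count format. The 50% default is one of the thresholds listed above.

```python
from typing import Dict, Set, Tuple

# Hypothetical region key: (chromosome, start, end).
Region = Tuple[str, int, int]


def call_lmrs(counts: Dict[Region, Tuple[int, int]],
              threshold: float = 0.5) -> Set[Region]:
    """Return regions in which the fraction of methylated CpGs is below `threshold`.

    `counts` maps each region to (methylated CpGs, total CpGs covered).
    Regions with no covered CpGs are skipped.
    """
    lmrs = set()
    for region, (methylated, total) in counts.items():
        if total > 0 and methylated / total < threshold:
            lmrs.add(region)
    return lmrs


def jaccard(a: Set[Region], b: Set[Region]) -> float:
    """Overlap of two LMR sets (assumed similarity measure, not from the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_tissue(sample_lmrs: Set[Region],
                        reference_lmrs: Dict[str, Set[Region]]) -> str:
    """Name the reference tissue whose LMR set overlaps the sample's the most."""
    return max(reference_lmrs,
               key=lambda tissue: jaccard(sample_lmrs, reference_lmrs[tissue]))
```

In practice a cutoff on the similarity score would also be needed before calling a primary source; the sketch only ranks candidate tissues.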

In some aspects, provided herein are methods and compositions for determining the primary source of a cancer in a subject, by first obtaining a biological sample comprising cancer cells from the subject and detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells. The method may further comprise comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues (e.g., CpG methylation fingerprints generated from tissues from the subject or tissues from a third party, such as from a library of tissue-specific CpG methylation fingerprints), wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample. In some embodiments, the enhancer is a region of the genome wherein less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or less than 10% of CpGs in the region are methylated. In some embodiments, an enhancer is a region of the genome that is modified with H3K4me1. In some embodiments, the enhancer is a region of the genome that is modified with H3K27ac. The biological sample may be a blood sample or tumor sample.

In some aspects, provided herein are methods of treating cancer. In some embodiments, the method comprises obtaining a biological sample comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample, identifying the primary source of cancer by comparing the distribution of LMRs in a portion of the genome of the cancer cells to the distribution of LMRs in a portion of the genome in different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample, and based on this information, administering a cancer therapy to the subject.
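The enhancer-based CpG methylation fingerprint described above (percentage of methylated CpGs per enhancer, compared against a library of tissue fingerprints) might be sketched as follows. This is a minimal sketch under stated assumptions: the fingerprint is taken to be a vector over a fixed, shared list of enhancers, and Pearson correlation stands in for "substantially similar", which the text does not define. All function names and the library layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def methylation_fingerprint(enhancer_counts: Sequence[Tuple[int, int]]) -> List[float]:
    """Percent methylated CpGs per enhancer, in a fixed enhancer order.

    Each entry is (methylated CpGs, total CpGs covered) for one enhancer.
    """
    fingerprint = []
    for methylated, total in enhancer_counts:
        if total <= 0:
            raise ValueError("each enhancer needs at least one covered CpG")
        fingerprint.append(100.0 * methylated / total)
    return fingerprint


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (assumed similarity measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def rank_tissues(sample_fp: Sequence[float],
                 library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank library tissues by correlation with the sample, most similar first."""
    scores = [(tissue, pearson(sample_fp, fp)) for tissue, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

The top-ranked tissue would be the candidate primary source; how high the score must be to count as a match is left open here, as it is in the text.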

In some embodiments, the cancer treatment is a cancer therapy that is directed to the primary source of the cancer. In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome. In some embodiments, an LMR is a region of the genome wherein less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or less than 10% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.

In some aspects, provided herein are methods of treating cancer in a subject by obtaining a biological sample comprising cancer cells from the subject, detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells. The method may further comprise identifying the primary source of cancer by comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample, and based on this information, administering a cancer therapy to the subject. In some embodiments, the cancer therapy is a cancer therapy directed and/or tailored to the primary source of the cancer. In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome. In some embodiments, an LMR is a region of the genome wherein less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, or less than 10% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.

Brief Description of Figures

Figure 1 has five parts, A-E, and shows enhancer states in adult mouse intestinal villus epithelium. Part A shows known features associated with diverse cis-element states. Part B shows high concordance among biological replicates of ChIP-seq experiments in this study; numbers represent Pearson correlations. Part C shows high concordance between the present (x-axis) and published (y-axis) base-resolution data on methylated DNA (meCpG) in purified villus epithelial cells. Parts D and E show high concordance among biological replicates of RNA-seq (Part D, sample-to-sample Euclidean distances, based on the maximum difference of 366.57 between WT adult and E11.5 intestinal epithelium) and ATAC-seq (Part E, Pearson correlations) used in this study.

Figure 2 has four parts, A-D, and shows identification of low-methylated regions (LMRs) in adult mouse intestinal villus epithelium. Part A shows distinction of active (H3K4me1+ H3K27ac+) from ‘primed’ (H3K4me1+ H3K27ac-) enhancers by ChIP-seq in purified adult mouse villus epithelium. mRNA levels of genes within 50 kb of the two types of enhancers diverge widely, but both groups show comparable levels of hypomethylated DNA at more than half of all regions; the remaining sites failed to meet conventional LMR criteria (FDR <0.05). Integrative Genomics Viewer (IGV) tracks of meCpG (0-100%) show examples of DNA hypomethylation at H3K4me1+ enhancers that have and those that lack H3K27ac. Density plot shows high concordance between the present (x-axis) and published (Sheaffer et al., 2014) (y-axis) base-resolution data on meCpG fractions at sites in purified villus epithelial cells. Part B shows a cumulative plot of H3K4me1 marks at LMRs and UMRs, considered in increasing order of median meCpG content (and proportionally lower confidence of LMR calls) from left to right. When the false discovery rate (FDR) was relaxed from 0.05 (corresponding to meCpG ~50%) to 0.1 (meCpG <59%), about 9% of additional LMRs showed H3K4me1, implying a regulatory function. Amending the FDR from 0.05 (corresponding to meCpG <56%) to 0.1 (meCpG <59%) resulted in identification of ~12,000 additional LMRs. Part C shows the fraction of all UMRs that map to promoters (<-2 kb and <1 kb from TSSs) and distant regions, with representative examples shown as IGV tracks. Part D shows a distribution of all candidate enhancers based on H3K4me1, H3K27ac, and meCpG.

Figure 3 has five parts, A-E, and shows the identity and characteristics of low-methylated regions (LMRs) in adult intestinal villus epithelium. Part A shows a micrograph and schematic drawings of mouse duodenal epithelium, showing predominance of enterocytes in the post-mitotic villus compartment. Part B (left) shows the distribution of all UMRs and LMRs in purified villus epithelium, determined from WGBS data. Contour map (right) and aggregate meCpG density plots show the distribution of promoters, active and ‘primed’ enhancers, and sites recognized only by reduced meCpG. Part C shows signals for methylated DNA (meCpG), histone marks (H3K4me1/3 and H3K27ac), HNF4A occupancy, and open chromatin (ATAC) at 12,710 promoter UMRs, active and ‘primed’ enhancers partitioned into groups with and without hypomethylated DNA, and LMR-only regions. Part D shows representative Integrative Genomics Viewer (IGV) tracks showing cis-element features. Part E shows high evolutionary conservation of enhancers from each group, including LMR-only sites.

Figure 4 has four parts, A-D, and shows the relation of LMRs in mouse intestine and other tissues. Part A shows the mapping of UMRs and LMRs using public WGBS data from mouse blood (shown as a cloud map) and other tissues (tabulated below). Part B is a demonstration of substantial fractions of ‘LMR-only’ candidate enhancers (low meCpG, H3K4me1-) in blood as a representative tissue. Part C shows that most sites hypomethylated in adult intestinal epithelium are fully methylated in other tissues. Similar profiles in villus and purified Lgr5+ intestinal stem cells (ISC) reveal reduced enhancer meCpG as a tissue- and not a differentiation state-specific signature. Part D shows highly divergent tissue specificity (represented by z-scores) of LMRs, which are typically hypomethylated in only 1 or 2 of the 4 tissues examined, and UMRs, most of which are hypomethylated in every tissue. In agreement with this difference, genes linked to LMRs associate with tissue-specific functions, whereas genes linked to UMRs associate with general cellular properties. IGV tracks at the representative Cars locus reveal distinct LMRs (candidate tissue-restricted enhancers) in each tissue.
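The tissue-specificity measure in Part D is reported only as z-scores, without the exact computation. One plausible reading, offered here as an assumption rather than the authors' method, is to standardize each region's meCpG percentage across the tissues examined and to count the tissues in which the region falls below a hypomethylation cutoff. The function names, the 50% cutoff, and the example values are illustrative.

```python
import statistics
from typing import Dict, List


def tissue_z_scores(mecpg_by_tissue: Dict[str, float]) -> Dict[str, float]:
    """Standardize one region's meCpG percentage across tissues.

    A strongly negative z-score marks a tissue where the region is
    unusually hypomethylated relative to the other tissues.
    """
    values = list(mecpg_by_tissue.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD over the tissues examined
    if sd == 0:
        # Same methylation everywhere (e.g. a UMR open in every tissue).
        return {tissue: 0.0 for tissue in mecpg_by_tissue}
    return {tissue: (v - mean) / sd for tissue, v in mecpg_by_tissue.items()}


def hypomethylated_tissues(mecpg_by_tissue: Dict[str, float],
                           cutoff: float = 50.0) -> List[str]:
    """List tissues in which the region's meCpG percentage is below `cutoff`."""
    return [tissue for tissue, v in mecpg_by_tissue.items() if v < cutoff]


# Illustrative values only: an intestine-restricted LMR and a ubiquitous UMR.
lmr_like = {"intestine": 20.0, "blood": 90.0, "skin": 90.0, "brain": 90.0}
umr_like = {"intestine": 5.0, "blood": 5.0, "skin": 5.0, "brain": 5.0}
```

Under this reading, an LMR hypomethylated in one of four tissues gets a strongly negative z-score in that tissue, while a UMR hypomethylated everywhere gets z-scores of zero.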

Figure 5 has eight parts, A-H, and shows characteristics of LMR-only putative enhancers and dynamics of gene expression and chromatin features in mouse intestine development. Part A shows specificity of each candidate enhancer group relative to the other mouse tissues evaluated in Figure 4 (blood, skin, and brain). Although LMR-only sites are predominantly intestine-restricted, they are hypomethylated in more tissues than are active or ‘primed’ sites, which typically show reduced meCpG only in the intestine and at most one other tissue. Part B shows that, relative to the genome background, all 3 groups of sites hypomethylated in the adult intestine are highly enriched for TF sequence motifs. Whereas active and ‘primed’ enhancers are enriched for the motifs of well-known intestine-active TFs, both ‘primed’ and LMR-only sites are enriched for developmentally active TFs such as FOX proteins. Part C shows biological processes affiliated with genes located within 50 kb of each group of putative enhancers. Part D shows an example of flow cytometry separation of EPCAM+ prospective epithelial cells from E12.5 intestinal endoderm. Part E shows expression dynamics of 12,266 genes that were modulated between any 2 developmental stages and adult epithelium. Each column represents data from 1 of at least 2 biological replicates. Part F shows gap statistics to determine the optimal number of clusters for differential (k-means) analysis of ATAC-seq data. The 10 clusters fell into the 4 distinct groups shown in Figure 6, Part A. Part G shows comparison of histone marks at regions identified as showing open chromatin (by ATAC-seq) selectively in E11.5 and E12.5 endoderm (embryonic), E14.5 and E16.5 epithelium (fetal), and adult intestinal villus cells. ChIP for H3K4me1 was performed; the data on H3K27ac are from Kazakevych et al., hereby incorporated by reference in its entirety. While areas of open chromatin in adult cells correspond to H3K4me1+ H3K27ac+ adult enhancers, many ‘primed’ and LMR-only regions carry active histone marks in the developing intestine. Part H shows that most of the ~55,000 LMRs identified objectively in mouse epiblast (see are methylated in adult intestinal cells. Thus, preservation of hypomethylation begins after the epiblast stage.

Figure 6 has six parts, A-F, showing chromatin and meCpG dynamics during mouse intestine development. Part A shows profiles of open chromatin in purified EPCAM+ intestinal cells across four gestational stages and in adult villus cells at regions >-2 or >1 kb from TSSs. Among the 68,510 sites identified by ATAC-seq at any stage, successive waves of accessible chromatin delineate embryonic, fetal, and adult enhancers. Part B shows the correlation of ATAC-identified regions specific to embryonic (E11.5 or E12.5), fetal (E14.5 or E16.5), and 6-week adult intestinal epithelium with mRNAs expressed during each period (from Figure 5, Part E). Numbers represent z-scores from regulatory potential of genes linked to each group of enhancers from BETA analysis. Part C shows profiles of hypomethylated DNA at the 68,510 ATAC-identified regions, extracted from WGBS data on undifferentiated E6.5 epiblast, E12.5 and E16.5 intestinal endoderm, and adult villus epithelium. Part D shows overlap of meCpG state in embryonic, fetal, and adult intestinal epithelium and representative IGV tracks showing chromatin, meCpG, and mRNA dynamics at representative embryonic (Hapln1) and fetal (Myl1) loci. Among the 53,350 developmental LMRs identified by WGBS, only 8,914 sites (16.7%) were methylated in adults. Part E shows a schema illustrating that reduced enhancer meCpG persists indefinitely after developmental genes and enhancers are decommissioned. Part F shows limited overlap of accessible chromatin (ATAC) sites in developing and adult intestinal epithelium. Most of the 38,376 regions open in fetal or embryonic intestinal endoderm were closed in adults. meCpG states in adult, fetal, and embryonic intestinal epithelium. All but 8,914 of the 53,350 LMRs (16.7%) identified objectively in the developing intestine met stringent LMR criteria in adults. Most enhancers that became inaccessible after development retained hypomethylated DNA.

Figure 7 has two parts, A-B, and shows the dynamics of enhancer DNA hypomethylation during mouse intestine development. Part A shows serial cloud maps depicting the meCpG status of all intestinal LMRs identified at any stage (E12.5, E16.5, adult) in the E6.5 epiblast and in E12.5 and E16.5 intestinal epithelium. The positions of adult, ‘primed’ (fetal), and ‘LMR-only’ (embryonic) enhancers are depicted in the contour plots below each cloud map. Hypomethylation first occurs predominantly at LMR-only regions and subsequently at ‘primed’ and adult enhancers, with the adult profile evident by E16.5. meCpG principally decreases during the transitions; a dotted box in the center panel marks the minority of sites (5,891 of 31,287) hypomethylated at E12.5 that are methylated in adults. Part B shows representative sites hypomethylated successively during intestine development, with persistent hypomethylation in adult intestinal epithelium.

Figure 8 has three parts, A-C, and shows Eed deletion in intestinal villus epithelium and ensuing loss of PRC2 activity. Part A shows intestinal epithelium-specific loss of all methylated H3K27 forms in Villin-CreER-T2;EedFl/Fl mice 9 days after the 1st of 5 daily doses of tamoxifen. Residual fluorescence signals are restricted to the sub-epithelial mesenchyme, distinguished from the overlying epithelium by dotted lines. Pie charts depict the fractions of each covalently modified form of H3K27 and the table lists levels of other histone modifications measured by mass spectrometry in purified WT and Eed-/- villus epithelium (N=3 each). All methyl-H3K27 was markedly reduced and levels of 10 other histone marks were unperturbed, but H3K27ac was slightly increased. Part B shows the schema to sustain PRC2 deficiency in vivo and resulting intestinal histology, cell replication, and total H3K27me3. Part C shows quantitative identification of H3K27ac gains and losses (>1.5-fold, q<0.01) in mutant cells by diffReps.

Figure 9 has five parts, A-E, and shows selective reactivation of developmental enhancers after prolonged PRC2 deficiency. Part A shows sequential appearance of H3K27ac and accumulation of H3K4me1 at fetal and embryonic enhancers in Eed-/- intestinal villus epithelium and preferential losses of these histone marks from non-hypomethylated active enhancers. Sites are arranged in the same order as Figure 3, Part C, with adult (active) and fetal (‘primed’) enhancers partitioned into those with and without LMRs. P values were calculated using the Chi-square test. Illustrative IGV tracks below show H3K27ac and H3K4me1 gains (scales refer to relative ChIP signals). Part B shows high overlap of sites of H3K27ac gain with the full complement of hypomethylated fetal and embryonic enhancers. meCpG levels (right) in purified WT villus epithelium at the 9,658 sites of H3K27ac gain that failed stringent LMR criteria (FDR <0.01, meCpG <59%), but are incompletely methylated (meCpG 60-90%). Part C shows that any of the 34,846 hypomethylated developmental enhancers that did not acquire sufficient H3K27ac to detect by diffReps showed measurable H3K27ac gains. In contrast, H3K27ac was not acquired at enhancers that are hypomethylated in the developing intestine (n=8,914) or in macrophages (n=21,083) but are fully methylated in adult intestines. P values were calculated using the Wilcoxon signed rank test. IGV tracks below each split-violin plot (WT, light; Eed-/- dark) illustrate presence or absence of H3K27ac gains at cis-elements with different levels of basal (WT) meCpG. Part D shows that macrophage-restricted enhancers (n=21,083) do not acquire H3K27ac in Eed-/- intestine. P values were calculated using the Wilcoxon signed rank test. IGV tracks below the split-violin plot (WT, light; Eed-/- dark) illustrate presence or absence of H3K27ac gains at cis-elements with different levels of basal (WT) meCpG in intestine. Part E shows that H3K27ac did not accumulate at the 8,914 enhancers that are hypomethylated in development but methylated in adults, or at the 12,710 promoter UMRs. P values were calculated using the Wilcoxon signed rank test. IGV tracks below each split-violin plot (WT, light; Eed-/- dark) illustrate H3K27ac changes at cis-elements with different levels of basal (WT) meCpG.

Figure 10 has five parts, A-E, and shows H3K27ac flux in PRC2-null macrophages and H3K27me1/2/3 status at intestinal enhancers. Part A shows erasure of all methyl-H3K27 forms in Eed -/- macrophages, cultured as depicted in the schema. H3K27me1/2/3 were detected by immunofluorescence. Part B shows relation of 7,836 sites of objective H3K27ac acquisition (diffReps) in Eed -/- macrophages to LMRs in WT cells. H3K27ac gains were restricted to sites with basal hypomethylated DNA. IGV tracks are shown at a representative locus. Part C shows LMRs in macrophages and intestine that lack H3K27ac in the respective native tissues show tissue-specific H3K27ac gains in PRC2-null cells. Part D shows representative IGV tracks of ChIP-seq for H3K27me1/2/3 in purified villus epithelial cells, illustrating that the 3 marks are mutually exclusive and that SUZ12 binding coincides with areas (mainly silent promoters) with the most H3K27me3. Part E shows distributions (left) of H3K27me1, H3K27me2, H3K27me3, and SUZ12 around all candidate enhancers classified in Figure 3, Part C. Aggregate plots (right) and representative IGV tracks (numbers refer to the scales for relative ChIP signals). Most regions lacked H3K27me3 and SUZ12, whereas H3K27me1 and especially H3K27me2 were ubiquitous, with no focal enrichment over the enhancers that acquire active histone marks in PRC2-null cells. These distributions imply that enhancer flux is an indirect consequence of PRC2 loss.

Figure 11 has four parts, A-D, and shows the consequences of short-term PRC2 deficiency. Part A shows transcripts from bivalent genes activated within 9 days of initial PRC2 depletion continue to accumulate, and transcripts from genes with low basal promoter H3K4me3 appear at 11 and 14 days. Genes are arrayed in decreasing order of mRNA gains at 14 days, and the corresponding promoter levels of H3K27me3 and H3K4me3 highlight the relation of gene activation to basal H3K4me3 levels. Part B shows a partial list of induced and representative stable TF mRNAs of the FOX family encoded by bivalent and non-bivalent genes in mouse intestine. Part C shows new FOXA1 binding in Eed -/- cells occurs predominantly at the intestinal enhancers defined in this study, including hypomethylated fetal enhancers that acquired H3K4me1 and H3K27ac. Part D shows FOXA1 and FOXG1 occupancy at enhancers within 11 days of initial Eed deletion. Representative IGV tracks are shown to the right.

Figure 12 has five parts, A-E, and shows enhancers and genes altered after sustained PRC2 deficiency. Parts A and B show mRNA (left) and basal (WT) promoter H3K27me3 and H3K4me3 levels (right) of 4,171 genes reduced (Part A) and 6,127 genes increased (Part B) in Eed -/- villus cells by day 14. Genes are arranged in order from highest to lowest mRNA loss (Part A) or gain (Part B); IGV tracks show examples from each group. Part C shows composite plots of promoter H3K27me3 and H3K4me3 levels for the 3 groups of genes affected in Eed -/- villus cells, including early-activated bivalent genes (Figure 11, Part A). In contrast to the latter group, genes affected at days 11 and 14 have high basal H3K4me3 and virtually no H3K27me3. Part D shows gene set enrichment analysis (GSEA) indicating that genes repressed between 9 and 14 days after initial PRC2 depletion associate specifically with active enhancers that lose H3K27ac. Part E shows the fraction of all non-bivalent genes reactivated in Eed -/- villus cells that were expressed during intestine (grey), heart (blue) or lung (green) development. The Venn diagram shows the overlap of embryonic heart and lung genes reactivated in Eed -/- adult intestine with embryonic and fetal intestinal genes.

Figure 13 has five parts, A-E, and shows activation of developmental genes linked to reactivated enhancers. Part A shows an inferred sequence of events following PRC2 loss. TF and other genes are activated early from bivalent promoters. Subsequently, some TFs selectively reactivate first hypomethylated fetal and then embryonic, but not fully methylated, enhancers. Part B shows gene set enrichment analysis (GSEA) of the distribution of genes near (<50 kb) fetal and embryonic enhancers among the genes reactivated by day 14 in Eed -/- villus cells. All RefSeq genes are arrayed in increasing order of gain in Eed -/- cells. Part C shows the fraction (left) of all non-bivalent genes expressed during intestine development that are reactivated in Eed -/- villus cells. Euclidean distances (right) between RNA-seq profiles of triplicate (WT and 9-day Eed -/-) or duplicate samples (all others) of WT and Eed -/- adult villus cells and WT intestinal endoderm. Part D shows distributions of mRNA levels, during intestine development and after adult PRC2 depletion, of all genes located <50 kb from hypomethylated fetal (blue) and embryonic (salmon) enhancers. Although WT levels of both transcript groups are comparably low, fetal genes were reactivated sooner than embryonic genes; both groups were re-expressed at levels similar to those in the developing endoderm. IGV tracks show representative RNA-seq data for the exons marked with arrows. Part E shows recommissioned H3K27ac+ fetal and embryonic enhancers (left) were significantly associated with reactivation of genes within 50 kb, while those lacking H3K27ac gains were not (right). NES, normalized enrichment score. Representative IGV tracks (right) show recrudescence of decommissioned enhancers in PRC2-null (Eed -/-) intestines. Arrows point to exons at which mRNA levels are shown in Part D.

Figure 14 shows an enhancer landscape analysis in intestinal villus cells.

Figure 15 shows the distribution of UMRs and LMRs in tissue. A contour map shows the sub-distribution of promoters, active and ‘primed’ enhancers, and sites recognized by reduced meCpG.

Figure 16 shows adult enhancer identity.

Figure 17 shows dynamic developmental enhancers recorded in DNA hypomethylation.

Figure 18 shows developmental enhancers in adult tissue may have PRC2 protection.

Figure 19 shows embryonic enhancer activation upon loss of PRC2 action.

Figure 20 shows multilayered gene control by PRC2 at promoters and enhancers.

Figure 21 shows multilayered gene control by PRC2 through transcription factor activation.

Figure 22 shows a model for PRC2 gene control.

Figure 23 shows bivalent genes in intestine are sensitive to PRC2 loss.

Detailed Description of the Invention

In some aspects, provided herein are methods and compositions for determining the primary source of a cancer in a subject. The methods may comprise obtaining a biological sample comprising cancer cells from the subject, and detecting hypomethylated regions of a portion of the genome of the cancer cells in the biological sample. In some embodiments, the method further comprises comparing the distribution of hypomethylated regions of a portion of the genome of the cancer cells to the distribution of hypomethylated regions of portions of the genomes of different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of hypomethylated regions to the distribution of hypomethylated regions in the cancer cells in the biological sample. Also provided herein are methods of treating cancer by first determining the primary source of cancer and subsequently administering a cancer therapy or treatment based on this information, as disclosed herein. Methods disclosed herein also include methods of increasing survival in a subject with cancer of unknown primary (CUP) by determining the primary source of cancer and subsequently administering a cancer therapy or treatment based on this information, as disclosed herein.

Definitions

The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element. The term “DNA methylation fingerprint” may refer to any experimental technique known in the art of detecting or determining the methylation levels or states of DNA, and subsequently displaying the levels of DNA methylation (e.g., CpG methylation) relative to other portions of the DNA.

The terms “cancer” or “tumor” refer to the presence of cells possessing

characteristics typical of cancer-causing cells, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, and certain characteristic morphological features. Cancer cells are often in the form of a tumor, but such cells may exist alone within an animal, or may be a non-tumorigenic cancer cell, such as a leukemia cell. Cancers include, but are not limited to, B cell cancer, e.g., multiple myeloma, Waldenström's macroglobulinemia, the heavy chain diseases, such as, for example, alpha chain disease, gamma chain disease, and mu chain disease, benign monoclonal gammopathy, and immunocytic amyloidosis, melanomas, breast cancer, lung cancer, bronchus cancer, colorectal cancer, prostate cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, cancer of hematological tissues, and the like. Other non-limiting examples of types of cancers applicable to the methods encompassed by the present invention include human sarcomas and carcinomas, e.g., fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor,

leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, colorectal cancer, pancreatic cancer, breast cancer, ovarian cancer, prostate cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, liver cancer, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, cervical cancer, bone cancer, brain tumor, testicular cancer, lung carcinoma, small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma,

craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma, retinoblastoma; leukemias, e.g., acute lymphocytic leukemia and acute myelocytic leukemia (myeloblastic,

promyelocytic, myelomonocytic, monocytic and erythroleukemia); chronic leukemia (chronic myelocytic (granulocytic) leukemia and chronic lymphocytic leukemia); and polycythemia vera, lymphoma (Hodgkin's disease and non-Hodgkin's disease), multiple myeloma, Waldenstrom's macroglobulinemia, and heavy chain disease. In some embodiments, the cancer whose phenotype is determined by the method of the invention is an epithelial cancer such as, but not limited to, bladder cancer, breast cancer, cervical cancer, colon cancer, gynecologic cancers, renal cancer, laryngeal cancer, lung cancer, oral cancer, head and neck cancer, ovarian cancer, pancreatic cancer, prostate cancer, or skin cancer. In other embodiments, the cancer is breast cancer, prostate cancer, lung cancer, or colon cancer. In still other embodiments, the epithelial cancer is non-small-cell lung cancer, nonpapillary renal cell carcinoma, cervical carcinoma, ovarian carcinoma (e.g., serous ovarian carcinoma), or breast carcinoma. The epithelial cancers may be

characterized in various other ways including, but not limited to, serous, endometrioid, mucinous, clear cell, brenner, or undifferentiated. The term“response to cancer therapy” or “outcome of cancer therapy” relates to any response of the cancer to a cancer therapy, preferably to a change in tumor mass and/or volume after initiation of neoadjuvant or adjuvant chemotherapy. Cancer response may be assessed, for example for efficacy or in a neoadjuvant or adjuvant situation, where the size of a tumor after systemic intervention can be compared to the initial size and dimensions as measured by CT, PET, mammogram, ultrasound or palpation. Response may also be assessed by caliper measurement or pathological examination of the tumor after biopsy or surgical resection for solid cancers. Responses may be recorded in a quantitative fashion like percentage change in tumor volume or in a qualitative fashion like“pathological complete response” (pCR),“clinical complete remission” (cCR),“clinical partial remission” (cPR),“clinical stable disease” (cSD),“clinical progressive disease” (cPD) or other qualitative criteria. Assessment of cancer response may be done early after the onset of neoadjuvant or adjuvant therapy, e.g., after a few hours, days, weeks or preferably after a few months. A typical endpoint for response assessment is upon termination of neoadjuvant chemotherapy or upon surgical removal of residual tumor cells and/or the tumor bed. This is typically three months after initiation of neoadjuvant therapy. In some embodiments, clinical efficacy of the therapeutic treatments described herein may be determined by measuring the clinical benefit rate (CBR). The clinical benefit rate is measured by determining the sum of the percentage of patients who are in complete remission (CR), the number of patients who are in partial remission (PR) and the number of patients having stable disease (SD) at a time point at least 6 months out from the end of therapy. 
The shorthand for this formula is CBR=CR+PR+SD over 6 months. In some embodiments, the CBR for a particular cancer therapeutic regimen is at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, or more. Additional criteria for evaluating the response to cancer therapies are related to “survival,” which includes all of the following: survival until mortality, also known as overall survival (wherein said mortality may be either irrespective of cause or tumor related);“recurrence-free survival” (wherein the term recurrence shall include both localized and distant recurrence); metastasis free survival; disease free survival (wherein the term disease shall include cancer and diseases associated therewith). The length of said survival may be calculated by reference to a defined start point (e.g., time of diagnosis or start of treatment) and end point (e.g., death, recurrence or metastasis). In addition, criteria for efficacy of treatment can be expanded to include response to chemotherapy, probability of survival, probability of metastasis within a given time period, and probability of tumor recurrence. The outcome measurement may be pathologic response to therapy given in the neoadjuvant setting. Alternatively, outcome measures, such as overall survival and disease- free survival can be monitored over a period of time for subjects following cancer therapy for whom the measurement values are known. In certain embodiments, the same doses of cancer therapeutic agents are administered to each subject. In related embodiments, the doses administered are standard doses known in the art for cancer therapeutic agents. The period of time for which subjects are monitored can vary. For example, subjects may be monitored for at least 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 months. 
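The clinical benefit rate shorthand given above (CBR = CR + PR + SD) can be illustrated with a short calculation. The following is only a minimal sketch: it assumes that CR, PR and SD are patient counts assessed at least 6 months after the end of therapy and that the rate is expressed as a percentage of all evaluable patients. The function name and the cohort numbers are hypothetical and are not taken from the disclosure.

```python
def clinical_benefit_rate(n_cr, n_pr, n_sd, n_total):
    """Return CBR as a percentage: (CR + PR + SD) / total evaluable patients * 100.

    Assumption: all counts refer to the same cohort, assessed at a time
    point at least 6 months after the end of therapy.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if min(n_cr, n_pr, n_sd) < 0 or n_cr + n_pr + n_sd > n_total:
        raise ValueError("counts must be non-negative and sum to at most n_total")
    return 100.0 * (n_cr + n_pr + n_sd) / n_total


# Hypothetical cohort: 40 evaluable patients, 4 CR, 10 PR, 8 SD.
cbr = clinical_benefit_rate(n_cr=4, n_pr=10, n_sd=8, n_total=40)
print(f"CBR = {cbr:.1f}%")  # 22 of 40 patients -> 55.0%
```

Under these assumptions, the hypothetical regimen would meet any of the listed CBR thresholds up to 55%.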
Thus, the present invention further provides methods for making a treatment decision for a cancer patient, comprising carrying out the methods for determining the primary source of cancer according to the different aspects and embodiments of the present invention, and then weighing the results in light of other known clinical and pathological risk factors, in determining a course of treatment for the cancer patient. For example, a cancer patient that is shown by the methods of the invention to have an increased risk of poor outcome because of the primary source of cancer can be treated with more aggressive therapies, including but not limited to radiation therapy, or novel or experimental therapies under clinical investigation. The term “sample” is typically a tumor sample, or may be whole blood, plasma, serum, saliva, urine, stool (e.g., feces), tears, and any other bodily fluid (e.g., as described above under the definition of “body fluids”), or a tissue sample (e.g., biopsy) such as a small intestine, colon sample, or surgical resection tissue.

The term “subject” refers to any healthy animal, mammal or human, or any animal, mammal or human afflicted with a condition of interest (e.g., cancer). The term “subject” is interchangeable with “patient.”

DNA Methylation Fingerprints

In some aspects, provided herein are methods and compositions for determining the primary source of a cancer in a subject from tissue DNA or circulating DNA. In some embodiments, the methods described herein include evaluating or analyzing methylation of regions of DNA (i.e., CpG regions) in cancer cells in a biological sample from a subject. The methods disclosed herein may include categorizing or identifying the low methylated regions (LMRs), hypomethylated regions, or unmethylated regions (UMRs) in cancer cells in a biological sample from a subject. The methods disclosed herein may utilize any known method for analyzing methylation of CpGs in DNA. The methods disclosed herein may also include generating a heatmap, CpG fingerprint, or DNA methylation fingerprint of the low methylated regions, hypomethylated regions, or reduced meCpG regions of the cancer cells. The methods disclosed herein may also comprise comparing the heatmap, CpG fingerprint, or DNA methylation fingerprint of the cancer cells from the biological sample to a heatmap, CpG fingerprint, or DNA methylation fingerprint of cells from different tissues.
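The comparison step described above, in which the fingerprint of the cancer cells is matched against fingerprints of different tissues, might be sketched as follows. This is an illustration only: the disclosure does not prescribe a similarity measure, so the Jaccard index, the representation of a fingerprint as a set of (chromosome, bin) pairs, and all tissue names and coordinates below are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of hypomethylated genomic bins."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_tissues(sample_bins, reference_bins):
    """Rank reference tissues from most to least similar to the sample."""
    scores = {
        tissue: jaccard(sample_bins, bins)
        for tissue, bins in reference_bins.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical reference fingerprints: sets of (chromosome, bin index) pairs
# marking bins that contain a hypomethylated region in each tissue.
reference_bins = {
    "intestine": {("chr1", 10), ("chr1", 11), ("chr2", 5), ("chr3", 7)},
    "lung": {("chr1", 10), ("chr4", 2), ("chr5", 9)},
    "liver": {("chr6", 1), ("chr7", 3)},
}

# Hypothetical fingerprint of cancer cells from a biological sample.
sample_bins = {("chr1", 10), ("chr1", 11), ("chr2", 5), ("chr8", 4)}

ranking = rank_tissues(sample_bins, reference_bins)
for tissue, score in ranking:
    print(f"{tissue}: {score:.3f}")
```

In this toy data the intestine reference shares three of five bins with the sample and ranks first. A real analysis would need a criterion for "substantially similar" that the sketch does not attempt to define.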

Most current DNA methylation analysis technologies rely on treatment of the DNA with sodium bisulfite after chemical denaturation. This treatment deaminates all the unmethylated cytosine residues, converting them to uracil. Methylated cytosines, however, are resistant to deamination and remain unaffected. The bisulfite-treated DNA is subsequently analyzed by PCR amplification, where the converted uracil residues in the sequence are substituted by thymine in the amplification product and are therefore easily identified. Several techniques have been designed to study the methylation of regions of interest. Among them, the most popular technologies are bisulfite genomic sequencing, methylation-specific PCR (MSP), MethyLight, combined bisulfite and restriction analysis (COBRA), and more recently methylation microarrays. Before the widespread use of bisulfite-based techniques, several fingerprint approaches were developed based on methylation-sensitive restriction enzymes, whose activity is blocked by methylation of their target sequences.
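The bisulfite chemistry described above can be illustrated with a toy simulation. This is a simplified sketch that models a single strand only and assumes complete conversion. The sequence, the methylated positions and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Deaminate unmethylated cytosines to uracil; methylated cytosines are kept."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def pcr_amplify(converted):
    """In the amplification product, uracil is read as thymine."""
    return converted.replace("U", "T")


def call_methylation(original, amplified):
    """For each cytosine in the original strand, report True if it is still
    read as C after conversion and PCR (i.e., it was methylated)."""
    return {
        i: amplified[i] == "C"
        for i, base in enumerate(original)
        if base == "C"
    }


seq = "ACGTCGACCG"        # hypothetical single-strand sequence
methylated = {1, 8}       # hypothetical 0-based positions of methylated cytosines

converted = bisulfite_convert(seq, methylated)
amplified = pcr_amplify(converted)
calls = call_methylation(seq, amplified)

print(converted)  # ACGTUGAUCG
print(amplified)  # ACGTTGATCG
print(calls)      # {1: True, 4: False, 7: False, 8: True}
```

Comparing the amplified read with the untreated sequence recovers the methylation state of every cytosine, which is the principle that the bisulfite-based techniques listed above rely on.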

In some embodiments, the methods disclosed herein include the use of Restriction Landmark Genomic Scanning for Methylation (RLGS-M) to elucidate the methylation states of the DNA from cancer cells in the biological samples disclosed herein. RLGS is a high-resolution two-dimensional gel electrophoresis method based on the use of rare-cutter restriction enzymes to produce a discrete number of fragments. The original method, based on a methylation-insensitive rare-cutter PacI, was capable of detecting polymorphisms and copy number changes but not methylation differences. RLGS-M is a variation of RLGS in which methylation-sensitive rare-cutters are employed, like NotI (GCGGCCGC) and AscI (GGCGCGCC) that only cut when both internal CpG sites (in bold type) are unmethylated. RLGS-M is suitable to assess copy number changes as well as differences in methylation. After digestion, the 5’-protruding ends are labeled by filling the overhanging strand with radioactive nucleotides. The DNA is then subjected to a second digestion to produce smaller DNA fragments, most commonly with the restriction enzyme EcoRV, and separated in a first dimension agarose gel. After this electrophoresis, the sample is subjected to an in situ digestion with a third restriction enzyme, usually HinfI, allowing adequate separation of the resultant smaller fragments in a second-dimension

polyacrylamide gel electrophoresis. After autoradiography, the end result is a reproducible RLGS profile that displays hundreds of spots reflecting the combination of copy number and methylation status of individual loci. RLGS-M profiles from different tissues can be compared to assess for tissue-specific methylation, or from tumor and adjacent normal tissue to investigate cancer-associated methylation aberrations. Deletions or hypermethylation are detected by the loss or reduction of signal intensity of an RLGS-M spot, while amplification or hypomethylation result in increased signal intensity or new spots. Using the surrounding single-copy spots as internal controls, the loss or gain of intensity of a locus can be visually estimated or quantified by densitometry. To facilitate the identification task, it is now possible to conduct automated RLGS fragment prediction and to download corresponding sequences for mouse and human studies.
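The landmark-site logic behind RLGS-M can be sketched in a few lines. The example below is an illustration and not the automated fragment-prediction software mentioned above: it scans a sequence for the NotI and AscI recognition sequences named in the text, reports the CpG positions inside each site, and treats a site as cleavable only when both internal CpGs are unmethylated. The test sequence and function names are hypothetical.

```python
import re

# Recognition sequences given in the text for the two rare cutters.
SITES = {"NotI": "GCGGCCGC", "AscI": "GGCGCGCC"}


def find_landmark_sites(seq):
    """Return (enzyme, start, cpg_positions) for each recognition site found.

    cpg_positions lists the 0-based index of the C in every CpG
    dinucleotide inside the site.
    """
    seq = seq.upper()
    hits = []
    for enzyme, motif in SITES.items():
        # The lookahead allows overlapping occurrences to be reported.
        for match in re.finditer(f"(?={motif})", seq):
            start = match.start()
            cpgs = [
                start + i
                for i in range(len(motif) - 1)
                if motif[i:i + 2] == "CG"
            ]
            hits.append((enzyme, start, cpgs))
    return sorted(hits, key=lambda hit: hit[1])


def is_cleaved(cpg_positions, methylated_positions):
    """A methylation-sensitive rare cutter cuts only if no internal CpG is methylated."""
    return not any(pos in methylated_positions for pos in cpg_positions)


seq = "TTGCGGCCGCAAGGCGCGCCTT"   # hypothetical sequence with one NotI and one AscI site
sites = find_landmark_sites(seq)
print(sites)  # [('NotI', 2, [3, 7]), ('AscI', 12, [14, 16])]

methylated = {7}                  # hypothetical: one CpG in the NotI site is methylated
for enzyme, start, cpgs in sites:
    print(enzyme, start, "cleaved" if is_cleaved(cpgs, methylated) else "protected")
```

In this toy case the NotI site is protected and would give no labeled end, corresponding to a lost or reduced spot, while the unmethylated AscI site would be cut and labeled.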

RLGS-M fingerprinting was one of the first methods capable of identifying many landmark fragments in a single run and the simultaneous identification of both gene copy number and methylation status. RLGS-M may be employed, for example, to analyze aberrant DNA methylation in breast, ovarian, colon, gastric, lung and hepatocellular cancers. A RLGS-M study on colorectal cancer found that methylation of some CpG islands located in non-promoter regions of genes was associated with gene expression upregulation, indicating that alterations in the methylation status within CpG islands in colon tumors may have complex consequences on gene expression and tumorigenesis.

In some embodiments, the methods disclosed herein include the use of Methylation-Sensitive Arbitrarily Primed PCR (MS-AP-PCR) and Methylation-Sensitive Restriction Fingerprinting (MSRF). Both methods rely on treating the genomic DNA with a methylation-insensitive frequent cutter, yielding a methylation-independent DNA fragment library, and, in a separate aliquot, with the same frequent cutter plus a methylation-sensitive restriction enzyme, yielding a methylation-dependent DNA fragment library. In MSRF, the methylation-independent library is generated with the frequent cutter MseI (TTAA) and the methylation-dependent library with a mixture of MseI and the methylation-sensitive BstUI (CGCG), which cuts only if both CpG sites within its recognition sequence are unmethylated. In MS-AP-PCR, two different methylation-independent libraries are generated: one with RsaI (GTAC), and another with a mixture of RsaI and MspI, which digests CCGG sequences regardless of the methylation status of the internal CG dinucleotide. The methylation-dependent library is generated with a mixture of RsaI and the methylation-sensitive HpaII, which recognizes the same sequence as MspI (CCGG) but digests only if the internal CG dinucleotide is unmethylated. After enzymatic restriction, these DNA fragment libraries serve as templates for PCR amplification under AP-PCR or RAPD conditions. To maximize the likelihood of amplifying regions likely to be differentially methylated, primers containing at least one CpG dinucleotide in their 3’ end are recommended. The rationale in both methods is that methylated sequences, which are protected from restriction, will amplify in both libraries, whereas sequences containing unmethylated HpaII (in MS-AP-PCR) or BstUI (in MSRF) sites are cleaved and will not yield any product in the methylation-dependent library. MS-AP-PCR includes an additional methylation-independent library generated with a mix of RsaI and MspI to discern whether the fragments amplified in both the RsaI methylation-independent and the RsaI-HpaII methylation-dependent libraries contain internal HpaII sites. The radioactive MS-AP-PCR and MSRF PCR products are resolved on high-resolution polyacrylamide gels under denaturing (MS-AP-PCR) or non-denaturing (MSRF) conditions. Comparison of methylation-dependent fingerprints from two samples, for instance a tumor and the surrounding normal tissue, reveals methylation differences. In both methods, hypermethylation is detected by increased intensity, whereas hypomethylation results in decreased intensity of the fingerprint bands. In principle, changes in the intensity of fingerprint bands from the methylation-dependent libraries can also arise from copy number alterations. The nature of the alteration can be ascertained by comparing the fingerprints from the methylation-independent libraries. Since these libraries are generated with methylation-insensitive restriction enzymes, changes in band intensity in their fingerprints result exclusively from copy number alterations.
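The interpretation rule described above, in which the methylation-independent library is used to separate copy number changes from methylation changes, could be sketched as a small decision function. This is a hypothetical illustration: the 1.5-fold threshold, the intensity values and the function name are assumptions, not parameters taken from the disclosure.

```python
def classify_band(dep_normal, dep_tumor, indep_normal, indep_tumor, fold=1.5):
    """Classify a fingerprint band from tumor vs. normal band intensities.

    dep_*   : intensity in the methylation-dependent library
    indep_* : intensity in the methylation-independent library
    fold    : hypothetical fold-change threshold for calling a change
    """
    def direction(normal, tumor):
        if tumor > normal and tumor >= normal * fold:
            return 1    # band gained intensity in the tumor
        if tumor < normal and tumor * fold <= normal:
            return -1   # band lost intensity in the tumor
        return 0

    indep = direction(indep_normal, indep_tumor)
    dep = direction(dep_normal, dep_tumor)

    # A change in the methylation-independent library can only reflect copy
    # number, so a methylation call is not attempted for such bands.
    if indep > 0:
        return "copy number gain"
    if indep < 0:
        return "copy number loss"
    if dep > 0:
        return "hypermethylation"   # protected from cleavage, so more product
    if dep < 0:
        return "hypomethylation"    # cleaved, so less product
    return "no change"


# Hypothetical densitometry readings (arbitrary units).
print(classify_band(100, 220, 100, 105))  # hypermethylation
print(classify_band(100, 40, 100, 95))    # hypomethylation
print(classify_band(100, 210, 100, 200))  # copy number gain
```

The ordering of the checks mirrors the reasoning in the text: the methylation-independent fingerprint is consulted first, and only bands that are stable there are interpreted as methylation changes.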

MSRF and MS-AP-PCR offer several advantages. Since the original sample is amplified by PCR, only a small amount of genomic DNA is required (100 ng-1 μg). Also, these methods provide a direct way to determine whether the change in intensity of a particular band reflects methylation changes. However, the number of loci that can be interrogated with these technologies is limited, especially with MSRF, and even when four to eight sets of arbitrary primers are run on each gel, only a small number of candidates (30–40) can be identified in one autoradiograph. MS-AP-PCR, and to a lesser extent MSRF, have been widely used to identify aberrantly methylated genes in human cancers.

In some embodiments, the methods disclosed herein include the use of Amplification of Inter Methylated Sites (AIMS). AIMS is a modified version of the methylated CpG island amplification (MCA) method, based on the differential cleavage of the isoschizomers SmaI and XmaI, which have distinct methylation sensitivity. Both enzymes recognize the CCCGGG hexamer, but whereas SmaI activity is blocked by methylation of the central CpG site, XmaI is methylation-insensitive. These isoschizomers also differ in the type of DNA ends generated after digestion. While SmaI generates blunt ends, XmaI generates 4-nt 5’ protruding ends. In AIMS, the genomic DNA is first treated with SmaI, which cleaves all the unmethylated sites, leaving blunt-ended DNA molecules. Then, the DNA is treated with XmaI, which cuts all the remaining methylated sites, producing cohesive ends. Specific adaptors are ligated to the cohesive ends of the digested genomic DNA. The ligated sequences are amplified by PCR using adaptor-specific primers extended at the 3’ end with two to four arbitrarily chosen nucleotide residues to reduce the complexity of the product. This method amplifies DNA sequences that have two close methylated SmaI sites and show homology to the nucleotides extended at the 3’ end of the primer. Lack of methylation at either site will allow digestion by SmaI, leaving blunt ends that are not compatible with the adaptor and therefore preventing amplification. Different fingerprints can be generated from the same sample by modifying the arbitrarily chosen 3’ end of the amplification primer. After PCR, the amplicons are resolved in denaturing polyacrylamide sequencing gels, generating fingerprints that consist of multiple anonymous bands, ranging in size from about 200 to 1200 bp, representing DNA sequences flanked by two methylated sites. Individual bands can be excised from the gel, re-amplified using the same fingerprinting primer and characterized by sequencing. AIMS is suitable for comparing large numbers of samples and for the simultaneous identification of hypomethylation and hypermethylation events. Hypomethylation is visualized as the loss of the fingerprint band in the tumor sample compared to its normal counterpart, and hypermethylation is detected as the appearance of a new band in the tumor specimen. The method is especially powerful in detecting hypermethylation. Dilution experiments indicate that the technique is able to detect hypermethylated sequences even if they are present in <1% of the cells.
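The AIMS selection rule described above, in which only fragments flanked by two methylated CCCGGG sites acquire adaptors at both ends, can be sketched as follows. The example is a simplification: it ignores the exact cut position within each site and the extra selectivity of the 3’-extended primers, and the coordinates and function name are hypothetical. The 200 to 1200 bp window is taken from the band sizes stated in the text.

```python
def aims_amplifiable_fragments(sites, min_len=200, max_len=1200):
    """Return fragments expected to amplify in AIMS.

    sites: iterable of (position, is_methylated) for CCCGGG sites.
    Unmethylated sites are cut by SmaI (blunt ends, no adaptor);
    methylated sites are cut afterwards by XmaI (cohesive ends, adaptor
    ligated). Only fragments with adaptors on both ends can amplify.
    """
    ordered = sorted(sites)
    fragments = []
    for (pos_a, meth_a), (pos_b, meth_b) in zip(ordered, ordered[1:]):
        length = pos_b - pos_a
        if meth_a and meth_b and min_len <= length <= max_len:
            fragments.append((pos_a, pos_b, length))
    return fragments


# Hypothetical CCCGGG sites along one DNA molecule.
normal_sites = [(1000, True), (1500, True), (1800, False), (2600, True), (5000, True)]
# Same locus in a hypothetical tumor in which the site at 1500 lost methylation.
tumor_sites = [(1000, True), (1500, False), (1800, False), (2600, True), (5000, True)]

print(aims_amplifiable_fragments(normal_sites))  # [(1000, 1500, 500)]
print(aims_amplifiable_fragments(tumor_sites))   # []
```

The band present in the normal sample disappears in the tumor sample, which corresponds to the way hypomethylation is read from an AIMS fingerprint.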

Applications of AIMS to investigate epigenetic changes in cancer include the identification of recurrent hypermethylation associated with gene silencing; screening for both hypomethylation and hypermethylation in cancer cell lines with altered DNA methylation function; and the genome-wide estimation of abnormal DNA methylation in cancers. AIMS has been employed to compare the levels of hypermethylation and hypomethylation alterations in colorectal carcinomas and adenomas. That work found that the premalignant lesions had already attained high levels of hypomethylation, but not hypermethylation, which suggested that hypomethylation might play a key role in conferring malignant potential from early stages.

In some embodiments, the methods disclosed herein include the use of Methylation Sensitive-Amplified Fragment Length Polymorphism (MS-AFLP). The NotI-MseI MS-AFLP method is also a DNA fingerprinting technology that allows for the simultaneous genome-wide detection of DNA hypomethylation and hypermethylation in tumor samples compared with normal tissue samples. MS-AFLP utilizes the methylation-sensitive restriction endonuclease NotI. This enzyme recognizes the GCGGCCGC sequence, which is frequent inside or in the proximity of CpG islands. The DNA methylation statuses of the two CpG sites in the octamer sequence are analyzed at hundreds of NotI sites in the genome by using this gel electrophoresis-based DNA fingerprinting technique. Epigenetic alterations, both hypo- and hypermethylation, in tumor tissue DNA are detected by comparing band intensities between normal and tumor tissue DNA fingerprints. In short, genomic DNA is digested with the methylation-sensitive rare cutter NotI and the methylation-insensitive frequent cutter MseI. All the MseI sites are cleaved, and the MseI ends are ligated to specific MseI adaptors. The NotI sites are cleaved only when both of the cytosines of the two CpG dinucleotides in the octamer NotI recognition sequence are unmethylated. The NotI ends are ligated to specific NotI adaptors. When either one or both of the two cytosines are methylated, the NotI site is protected from cleavage and, therefore, no adaptors are ligated. The fragment library is subsequently amplified by PCR with a 32P-labeled primer complementary to the NotI adaptors, and a non-labeled primer complementary to the MseI adaptors. Different combinations of primers differing in their 3’ ends can be employed to increase the number of NotI sites analyzed. The PCR products are then resolved in standard sequencing gels and autoradiographed. Three different types of fragments are amplified: MseI-MseI, MseI-NotI and NotI-NotI. The MseI-MseI fragments do not provide any information about DNA methylation, but they are not detected in the autoradiograph since only the NotI primer is labeled. The amplified products are DNA fragments in the range of 50–1,000 bp, representing random anonymous genomic sequences.
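The digestion and labeling logic of MS-AFLP described above can be sketched with a toy in-silico digest. This is an illustrative simplification: it places cuts at the site coordinate itself, ignores primer 3’-end selectivity, and uses hypothetical coordinates and function names. The 50 to 1,000 bp window follows the fragment range stated in the text.

```python
def msaflp_labeled_fragments(sites, min_len=50, max_len=1000):
    """Return fragments that would appear as labeled bands.

    sites: iterable of (position, enzyme, cpg_methylation) where enzyme is
    "MseI" or "NotI" and cpg_methylation is a tuple of booleans for the
    two CpGs of a NotI site (ignored for MseI).
    """
    cuts = []
    for pos, enzyme, cpg_methylation in sites:
        if enzyme == "MseI":
            cuts.append((pos, "MseI"))        # methylation-insensitive: always cut
        elif enzyme == "NotI" and not any(cpg_methylation):
            cuts.append((pos, "NotI"))        # cut only if both CpGs are unmethylated
    cuts.sort()

    labeled = []
    for (pos_a, end_a), (pos_b, end_b) in zip(cuts, cuts[1:]):
        length = pos_b - pos_a
        # Only the NotI primer is labeled, so MseI-MseI fragments are not seen.
        if "NotI" in (end_a, end_b) and min_len <= length <= max_len:
            labeled.append((pos_a, pos_b, f"{end_a}-{end_b}"))
    return labeled


# Hypothetical locus in normal tissue: the NotI site at 900 is methylated.
normal = [
    (100, "MseI", ()),
    (400, "NotI", (False, False)),
    (700, "MseI", ()),
    (900, "NotI", (True, False)),
    (1200, "MseI", ()),
]
# Same locus in a hypothetical tumor in which the site at 900 is unmethylated.
tumor = [site if site[0] != 900 else (900, "NotI", (False, False)) for site in normal]

print(msaflp_labeled_fragments(normal))
# [(100, 400, 'MseI-NotI'), (400, 700, 'NotI-MseI')]
print(len(msaflp_labeled_fragments(tumor)))  # 4
```

In this toy locus, loss of methylation at one NotI site adds two labeled fragments in the tumor sample, because bands arise only from cleaved, unmethylated NotI sites.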

MS-AFLP is highly reproducible, and a relatively large number of bands (~100) are amplified with a high ratio of band signal to background noise. Additionally, MS-AFLP requires only a small amount of template DNA. As little as five nanograms is sufficient for one MS-AFLP experiment with one pair of primers. In contrast with MSRF and MS-AP-PCR, bands in MS-AFLP fingerprints originate from the unmethylated (cleaved) NotI sites, and therefore hypermethylation results in decreased intensity while hypomethylation results in increased intensity of the bands. DNA fragments exhibiting alterations can be directly cloned from fingerprint bands by amplification of gel-eluted DNA with the same pair of primers used for the fingerprint. Several fingerprints generated with different primer combinations can be analyzed in parallel to maximize the number of loci per gel. As some MS-AFLP bands are common to several primer combinations, this provides an internal control for the alterations. Identification of consistent changes in a particular type of cancer can be facilitated by the analysis of multiple samples in the same gels.

Fluorescent-MS-AFLP (FL-MS-AFLP) is a non-radioactive adaptation of this technique in which fluorescently labeled NotI primers are employed. In FL-MS-AFLP the differences in methylation levels between normal and tumor samples are visualized by the difference in signal intensities in the electropherogram. For example, MS-AFLP products may be labeled with a TAMRA fluorescent primer, and the changes in DNA methylation in cancer may be analyzed using an ABI Prism 377 automatic DNA sequencer. The FL-MS-AFLP method has been further developed to analyze the methylation levels of blood DNA from gastric cancer patients, as well as methylation differences between gastric tumor samples and adjacent normal tissue samples. For example, the NotI primer is labeled with the fluorescent dye FITC, and fluorescence intensity is measured using the DSQ-2000 automatic DNA sequencer. Methylation status of more than 350 NotI sites in the human genome may be evaluated per run by FL-MS-AFLP. The use of fluorescence for MS-AFLP has not only increased the number of bands that can be analyzed, but also made the analysis of longer sequences (up to 1000 bp) possible. FL-MS-AFLP is safer than MS-AFLP because it does not require the use of radioactivity, which may be an important factor in clinical settings. In addition, multiple dyes with different absorption and emission ranges can be utilized.

Furthermore, as opposed to the short half-life of 32P (14.3 days), fluorescently labeled primers have a longer shelf life if they are kept in the dark. However, using fluorescent tags for NotI-MseI MS-AFLP also has disadvantages. The critical one is that the method does not allow direct cloning of DNA fragments after the identification of alterations.

MS-AFLP has also been applied to a DNA microarray hybridization format (DNA Microarray MS-AFLP). In short, in the microarray hybridization method PCR products are labeled with CY5 and CY3 (using the Klenow fragment and CY5-dCTP or CY3-dCTP), the two probes are mixed, and unincorporated dNTPs are removed. The probes are then used to hybridize DNA microarrays on glass slides in the CGH manner. Fluorescent signal associated with each target spot is measured using a fluorescence scanner. The fluorescence intensity correlates with the number of unmethylated NotI sites cleaved by the enzyme. Furthermore, as in the FL-MS-AFLP, no radioisotopes are used. On the other hand, in the microarray format only two samples can be compared at a time. Another drawback of this technique is its limited use as a gene discovery tool because only the sequences on the microarrays can be analyzed. Nevertheless, gene identification is straightforward as each dot in the microarray is represented by a single sequence. This feature makes DNA Microarray MS-AFLP a valid method for DNA methylation analysis of a larger number of NotI sites at a larger scale of analysis. More details regarding DNA methylation analysis can be found in Mutat Res.(2010); 693(1-2): 61–76, hereby incorporated in its entirety.

The methods described herein comprise obtaining a biological sample comprising cancer cells from the subject, detecting hypomethylated regions of a portion of the genome of the cancer cells in the biological sample, comparing the distribution of hypomethylated regions of a portion of the genome of the cancer cells to the distribution of hypomethylated regions of portions of the genomes of different tissues (e.g., adult or embryonic tissues, or non-cancerous or cancerous tissues), wherein the primary source of the cancer is the tissue that has a substantially similar distribution of hypomethylated regions to the distribution of hypomethylated regions in the cancer cells in the biological sample. The method may comprise obtaining a biological sample comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample, comparing the distribution of LMRs in a portion of the genome of the cancer cells to distribution of LMRs in portions of the genomes of different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample. Comparing the distribution of LMRs or CpG methylation fingerprints and determining the primary source of cancer may include, according to the methods disclosed herein, comparing the distribution of LMRs or CpG methylation in regulatory elements in a portion of the genome of the cancer cells to distribution of LMRs or CpG methylation in regulatory elements in portions of the genomes of different tissues, and identifying the tissue with a substantially similar LMR distribution or CpG methylation fingerprint. 
As used herein, “substantially similar” may be a statistical calculation, in which the LMR distribution or CpG fingerprint is compared to several other LMR distributions and/or CpG fingerprints from different tissues, and the primary source of cancer is the tissue that has the highest rate of similarity in LMR distribution or CpG methylation. The similarity between two LMR distributions or CpG fingerprints may be calculated based on any portion of the genome, including a global similarity analysis (i.e., a methylation analysis that includes a majority or substantially all of the genome) or regional/local genome analysis.
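By way of non-limiting illustration, the comparison described above can be sketched as a ranking of reference tissues by a similarity score. The disclosure does not name a particular statistic; Pearson correlation is used here purely as an assumed example. Fingerprints are assumed to be equal-length lists of per-region methylation fractions over the same ordered set of regions. The names `pearson` and `most_similar_tissue` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("fingerprints must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant fingerprint")
    return cov / math.sqrt(var_x * var_y)


def most_similar_tissue(sample_fingerprint, tissue_library):
    """Return (best_tissue, scores) for a sample against a reference library.

    tissue_library maps a tissue name to its reference fingerprint,
    measured over the same ordered regions as the sample (an assumption).
    """
    scores = {
        tissue: pearson(sample_fingerprint, reference)
        for tissue, reference in tissue_library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Any other similarity or distance measure could be substituted for `pearson` without changing the ranking step.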

Provided herein are methods of generating tissue-specific methylation fingerprints (i.e., CpG methylation fingerprints from non-cancerous or cancerous tissues) to create a library of tissue-specific methylation fingerprints. An individual may generate a tissue-specific CpG methylation fingerprint utilizing any of the methylation detection techniques disclosed herein. The tissue may be any tissue in the human body capable of becoming cancerous. Therefore, as used herein, tissue includes, but is not limited to, epithelial tissue, connective tissue, muscle tissue, or nervous tissue. Epithelial tissue, also referred to as epithelium, refers to the sheets of cells that cover exterior surfaces of the body, line internal cavities and passageways, and form certain glands. Connective tissue binds the cells and organs of the body together and functions in the protection, support, and integration of all parts of the body. Muscle tissue occurs as three major types: skeletal (voluntary) muscle, smooth muscle, and cardiac muscle in the heart. The tissue may be a tissue found in any organ in the human body. In some embodiments, the tissue is an adrenal tissue, an anal tissue, a bile duct tissue, a bladder tissue, a bone tissue, a brain/CNS tissue, a breast tissue, a cervical tissue, a colorectal tissue, an endometrial tissue, an esophageal tissue, an eye tissue, a gallbladder tissue, a gastrointestinal tissue, a kidney tissue, a laryngeal or hypopharyngeal tissue, a liver tissue, a lung tissue, a muscle tissue, a nasopharyngeal tissue, an ovarian tissue, a pancreatic tissue, a penile tissue, a pituitary tissue, a primary tissue, a prostate tissue, a salivary gland tissue, a testicular tissue, a thymus tissue, a thyroid tissue, a uterine tissue, a vaginal tissue, or a vulvar tissue.

In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome (e.g., utilizing any of the methods to detect methylation disclosed herein). In some embodiments, an LMR is a region of the genome wherein less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% or less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.
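As a non-limiting sketch, the percentage-based LMR definition above can be expressed in Python as follows. The per-CpG calls are assumed to be booleans (True meaning methylated), and the 50% default is only one of the recited thresholds, chosen arbitrarily for illustration. The function names are hypothetical.

```python
def percent_methylated(cpg_calls):
    """Percentage of CpGs called methylated (True) in a region."""
    calls = list(cpg_calls)
    if not calls:
        raise ValueError("region contains no CpG calls")
    return 100.0 * sum(calls) / len(calls)


def is_lmr(cpg_calls, threshold_percent=50.0):
    """A region counts as an LMR if fewer than threshold_percent of its
    CpGs are methylated. The threshold is an assumed, adjustable value."""
    return percent_methylated(cpg_calls) < threshold_percent


def lmr_distribution(regions, threshold_percent=50.0):
    """Return the set of region identifiers that qualify as LMRs.

    regions maps a region identifier to its list of per-CpG calls.
    """
    return {
        region_id
        for region_id, calls in regions.items()
        if is_lmr(calls, threshold_percent)
    }
```

Raising `threshold_percent` toward 90 admits more regions, while lowering it toward 5 restricts the set to nearly unmethylated regions.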

In some aspects, provided herein are methods and compositions for determining the primary source of a cancer in a subject, the method comprising obtaining a biological sample comprising cancer cells from the subject, detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells, and comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample. In some embodiments, the enhancer is a region of the genome wherein less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% or less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5% of CpGs in that region are methylated. In some embodiments, an enhancer is a region of the genome that is modified with H3K4me1. In some embodiments, the enhancer is a region of the genome that is modified with H3K27ac. The biological sample may be a blood sample or tumor sample.
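By way of non-limiting illustration, generating a CpG methylation fingerprint from enhancer regions might be sketched as below. The data layout is an assumption made for this example: per-CpG calls keyed by (chromosome, position) and enhancers given as half-open intervals. The function name `enhancer_fingerprint` is hypothetical.

```python
def enhancer_fingerprint(cpg_calls, enhancers):
    """Compute the percentage of methylated CpGs within each enhancer.

    cpg_calls: dict mapping (chrom, position) -> bool (True = methylated).
    enhancers: list of (name, chrom, start, end) with half-open intervals
               [start, end). Both layouts are assumptions of this sketch.
    Returns a dict mapping enhancer name to percent methylation, or None
    for an enhancer with no covered CpG.
    """
    fingerprint = {}
    for name, chrom, start, end in enhancers:
        calls = [
            methylated
            for (c, pos), methylated in cpg_calls.items()
            if c == chrom and start <= pos < end
        ]
        fingerprint[name] = 100.0 * sum(calls) / len(calls) if calls else None
    return fingerprint
```

The linear scan over all CpGs keeps the sketch short; a sorted index or interval tree would be a more practical choice at genome scale.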

Also provided herein are methods of early diagnosis of cancer by determining the source of cancer from circulating tumor DNA. Circulating tumor DNA (ctDNA) is tumor-derived fragmented DNA in the bloodstream that is not associated with cells. ctDNA originates directly from the tumor or from circulating tumor cells (CTCs), which are viable, intact tumor cells that shed from primary tumors and enter the bloodstream or lymphatic system. Often, cancer-causing mutations (e.g., HER2 somatic mutations) may be found in ctDNA, but the primary source of cancer is not immediately apparent or is difficult to ascertain. In some embodiments, the biological sample is a liquid biopsy. In these cases, the methods disclosed herein may include obtaining a biological sample from a liquid biopsy and detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells, and comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample from the liquid biopsy.
The methods disclosed herein may comprise obtaining a liquid biopsy comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the liquid biopsy, comparing the distribution of LMRs in a portion of the genome of the cancer cells to distribution of LMRs in portions of the genomes of different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the liquid biopsy.

Therapeutic Methods of Treating Cancer

In some aspects, provided herein are methods of treating cancer. In some embodiments, the method comprises obtaining a biological sample comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample, identifying the primary source of cancer by comparing the distribution of LMRs in a portion of the genome of the cancer cells to the distribution of LMRs in a portion of the genome in different tissues, wherein the primary source of the cancer is the tissue that has a substantially similar distribution of LMRs to the distribution of LMRs in the cancer cells in the biological sample, and further administering a cancer therapy to the subject.

In some embodiments, the cancer treatment is a cancer therapy that is directed to the primary source of the cancer. In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome. In some embodiments, an LMR is a region of the genome (e.g., a hypomethylated portion of the genome) wherein less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% or less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.

In some embodiments, the cancer comprises a solid tumor, and the biological sample comprises cancer cells (e.g., tumor cells). In some embodiments, the tumor is an adenocarcinoma, an adrenal tumor, an anal tumor, a bile duct tumor, a bladder tumor, a bone tumor, a blood-borne tumor, a brain/CNS tumor, a breast tumor, a cervical tumor, a colorectal tumor, an endometrial tumor, an esophageal tumor, an Ewing tumor, an eye tumor, a gallbladder tumor, a gastrointestinal tumor, a kidney tumor, a laryngeal or hypopharyngeal tumor, a liver tumor, a lung tumor, a mesothelioma tumor, a multiple myeloma tumor, a muscle tumor, a nasopharyngeal tumor, a neuroblastoma, an oral tumor, an osteosarcoma, an ovarian tumor, a pancreatic tumor, a penile tumor, a pituitary tumor, a primary tumor, a prostate tumor, a retinoblastoma, a rhabdomyosarcoma, a salivary gland tumor, a soft tissue sarcoma, a melanoma, a metastatic tumor, a basal cell carcinoma, a Merkel cell tumor, a testicular tumor, a thymus tumor, a thyroid tumor, a uterine tumor, a vaginal tumor, a vulvar tumor, or a Wilms tumor.

In some aspects, provided herein are methods of treating cancer in a subject by obtaining a biological sample comprising cancer cells from the subject, detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of the cancer cells in the biological sample, thereby generating a CpG methylation fingerprint of the cancer cells, identifying the primary source of cancer by comparing the CpG methylation fingerprint of the cancer cells to the CpG methylation fingerprint of different tissues, wherein the CpG methylation fingerprint of each different tissue is generated by detecting the percentage of CpGs that are methylated in enhancers in a portion of the genome of each different tissue, wherein the primary source of the cancer is a tissue that has a CpG methylation fingerprint that is substantially similar to the CpG methylation fingerprint of the cancer cells in the biological sample, and administering a cancer therapy to the subject. In some embodiments, the cancer therapy is a cancer therapy directed to the primary source of the cancer. In some embodiments, detecting the distribution of LMRs comprises detecting the percentage of CpGs that are methylated in the portion of the genome. In some embodiments, an LMR is a region of the genome wherein less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% or less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5% of CpGs in that region are methylated. The LMR region may be modified by H3K4me1 and/or H3K27ac.

Provided herein are methods of treating cancer by first elucidating the primary source of a cancer or tumor, and based on this information, administering to the subject a therapy for cancer.

In some embodiments, the therapy for cancer is chemotherapy. Chemotherapy may comprise administering one or more chemotherapeutic agents. A chemotherapeutic agent may be, for example, cyclophosphamide, hydroxydaunorubicin (also known as doxorubicin or adriamycin), oncovin (vincristine), or prednisone. In one embodiment, the chemotherapy comprises a combination of cyclophosphamide, oncovin, prednisone, and one or more chemotherapeutics selected from the group consisting of anthracycline, hydroxydaunorubicin, epirubicin, and mitoxantrone.

The cancer therapy may comprise an anti-cancer agent. Examples of other anti-cancer agents include, but are not limited to: acivicin; aclarubicin; acodazole hydrochloride; acronine; adozelesin; aldesleukin; altretamine; ambomycin; ametantrone acetate; amsacrine; anastrozole; anthramycin; asparaginase; asperlin; azacitidine; azetepa; azotomycin; batimastat; benzodepa; bicalutamide; bisantrene hydrochloride; bisnafide dimesylate; bizelesin; bleomycin sulfate; brequinar sodium; bropirimine; busulfan; cactinomycin; calusterone; caracemide; carbetimer; carboplatin; carmustine; carubicin hydrochloride; carzelesin; cedefingol; celecoxib (COX-2 inhibitor); chlorambucil; cirolemycin; cisplatin; cladribine; crisnatol mesylate; cyclophosphamide; cytarabine; dacarbazine; dactinomycin; daunorubicin hydrochloride; decitabine; dexormaplatin; dezaguanine; dezaguanine mesylate; diaziquone; docetaxel; doxorubicin; doxorubicin hydrochloride; droloxifene; droloxifene citrate; dromostanolone propionate; duazomycin; edatrexate; eflornithine hydrochloride; elsamitrucin; enloplatin; enpromate; epipropidine; epirubicin hydrochloride; erbulozole; esorubicin hydrochloride; estramustine; estramustine phosphate sodium; etanidazole; etoposide; etoposide phosphate; etoprine; fadrozole hydrochloride; fazarabine; fenretinide; floxuridine; fludarabine phosphate; fluorouracil; fluorocitabine; fosquidone; fostriecin sodium; gemcitabine; gemcitabine hydrochloride; hydroxyurea; idarubicin hydrochloride; ifosfamide; ilmofosine; iproplatin; irinotecan; irinotecan hydrochloride; lanreotide acetate; letrozole; leuprolide acetate; liarozole hydrochloride; lometrexol sodium; lomustine; losoxantrone hydrochloride; masoprocol; maytansine; mechlorethamine hydrochloride; megestrol acetate; melengestrol acetate; melphalan; menogaril; mercaptopurine; methotrexate; methotrexate sodium; metoprine; meturedepa; mitindomide; mitocarcin; mitocromin; mitogillin; mitomalcin; mitomycin; mitosper; mitotane; mitoxantrone hydrochloride; mycophenolic acid; nocodazole; nogalamycin; ormaplatin; oxisuran; paclitaxel; pegaspargase; peliomycin; pentamustine; peplomycin sulfate; perfosfamide; pipobroman; piposulfan; piroxantrone hydrochloride; plicamycin; plomestane; porfimer sodium; porfiromycin; prednimustine; procarbazine hydrochloride; puromycin; puromycin hydrochloride; pyrazofurin; riboprine; safingol; safingol hydrochloride; semustine; simtrazene; sparfosate sodium; sparsomycin; spirogermanium hydrochloride; spiromustine; spiroplatin; streptonigrin; streptozocin; sulofenur; talisomycin; tecogalan sodium; taxotere; tegafur; teloxantrone hydrochloride; temoporfin; teniposide; teroxirone; testolactone; thiamiprine; thioguanine; thiotepa; tiazofurin; tirapazamine; toremifene citrate; trestolone acetate; triciribine phosphate; trimetrexate; trimetrexate glucuronate; triptorelin; tubulozole hydrochloride; uracil mustard; uredepa; vapreotide; verteporfin; vinblastine sulfate; vincristine sulfate; vindesine; vindesine sulfate; vinepidine sulfate; vinglycinate sulfate; vinleurosine sulfate; vinorelbine tartrate; vinrosidine sulfate; vinzolidine sulfate; vorozole; zeniplatin; zinostatin; and zorubicin hydrochloride.

The cancer therapy may comprise an anti-cancer drug. Anti-cancer drugs include, but are not limited to: 20-epi-1,25 dihydroxyvitamin D3; 5-ethynyluracil; abiraterone; aclarubicin; acylfulvene; adecypenol; adozelesin; aldesleukin; ALL-TK antagonists; altretamine; ambamustine; amidox; amifostine; aminolevulinic acid; amrubicin; amsacrine; anagrelide; anastrozole; andrographolide; angiogenesis inhibitors; antagonist D; antagonist G; antarelix; anti-dorsalizing morphogenetic protein-1; antiandrogen, prostatic carcinoma; antiestrogen; antineoplaston; antisense oligonucleotides; aphidicolin glycinate; apoptosis gene modulators; apoptosis regulators; apurinic acid; ara-CDP-DL-PTBA; arginine deaminase; asulacrine; atamestane; atrimustine; axinastatin 1; axinastatin 2; axinastatin 3; azasetron; azatoxin; azatyrosine; baccatin III derivatives; balanol; batimastat; BCR/ABL antagonists; benzochlorins; benzoylstaurosporine; beta lactam derivatives; beta-alethine; betaclamycin B; betulinic acid; bFGF inhibitor; bicalutamide; bisantrene; bisaziridinylspermine; bisnafide; bistratene A; bizelesin; breflate; bropirimine; budotitane; buthionine sulfoximine; calcipotriol; calphostin C; camptothecin derivatives; capecitabine; carboxamide-amino-triazole; carboxyamidotriazole; CaRest M3; CARN 700; cartilage derived inhibitor; carzelesin; casein kinase inhibitors (ICOS); castanospermine; cecropin B; cetrorelix; chlorins; chloroquinoxaline sulfonamide; cicaprost; cis-porphyrin; cladribine; clomifene analogues; clotrimazole; collismycin A; collismycin B; combretastatin A4; combretastatin analogue; conagenin; crambescidin 816; crisnatol; cryptophycin 8; cryptophycin A derivatives; curacin A; cyclopentanthraquinones; cycloplatam; cyclosporin A; cypemycin; cytarabine ocfosfate; cytolytic factor; cytostatin; dacliximab; decitabine; dehydrodidemnin B; deslorelin; dexamethasone; dexifosfamide; dexrazoxane; dexverapamil; diaziquone; didemnin B; didox; diethylnorspermine; dihydro-5-azacytidine; dihydrotaxol, 9-; dioxamycin; diphenyl spiromustine; docetaxel; docosanol; dolasetron; doxifluridine; doxorubicin; droloxifene; dronabinol; duocarmycin SA; ebselen; ecomustine; edelfosine; edrecolomab; eflornithine; elemene; emitefur; epirubicin; epristeride; estramustine analogue; estrogen agonists; estrogen antagonists; etanidazole; etoposide phosphate; exemestane; fadrozole; fazarabine; fenretinide; filgrastim; finasteride; flavopiridol; flezelastine; fluasterone; fludarabine; fluorodaunorunicin hydrochloride; forfenimex; formestane; fostriecin; fotemustine; gadolinium texaphyrin; gallium nitrate; galocitabine; ganirelix; gelatinase inhibitors; gemcitabine; glutathione inhibitors; hepsulfam; heregulin; hexamethylene bisacetamide; hypericin; ibandronic acid; idarubicin; idoxifene; idramantone; ilmofosine; ilomastat; imatinib (e.g., Gleevec®); imiquimod; immunostimulant peptides; insulin-like growth factor-1 receptor inhibitor; interferon agonists; interferons; interleukins; iobenguane; iododoxorubicin; ipomeanol, 4-; iroplact; irsogladine; isobengazole; isohomohalicondrin B; itasetron; jasplakinolide; kahalalide F; lamellarin-N triacetate; lanreotide; leinamycin; lenograstim; lentinan sulfate; leptolstatin; letrozole; leukemia inhibiting factor; leukocyte alpha interferon; leuprolide+estrogen+progesterone; leuprorelin; levamisole; liarozole; linear polyamine analogue; lipophilic disaccharide peptide; lipophilic platinum compounds; lissoclinamide 7; lobaplatin; lombricine; lometrexol; lonidamine; losoxantrone; loxoribine; lurtotecan; lutetium texaphyrin; lysofylline; lytic peptides; maitansine; mannostatin A; marimastat; masoprocol; maspin; matrilysin inhibitors; matrix metalloproteinase inhibitors; menogaril; merbarone; meterelin; methioninase; metoclopramide; MIF inhibitor; mifepristone; miltefosine; mirimostim; mitoguazone; mitolactol; mitomycin analogues; mitonafide; mitotoxin fibroblast growth factor-saporin; mitoxantrone; mofarotene; molgramostim; Erbitux, human chorionic gonadotrophin; monophosphoryl lipid A+myobacterium cell wall sk; mopidamol; mustard anticancer agent; mycaperoxide B; mycobacterial cell wall extract; myriaporone; N-acetyldinaline; N-substituted benzamides; nafarelin; nagrestip; naloxone+pentazocine; napavin; naphterpin; nartograstim; nedaplatin; nemorubicin; neridronic acid; nilutamide; nisamycin; nitric oxide modulators; nitroxide antioxidant; nitrullyn; oblimersen (Genasense®); O6-benzylguanine; octreotide; okicenone; oligonucleotides; onapristone; ondansetron; oracin; oral cytokine inducer; ormaplatin; osaterone; oxaliplatin; oxaunomycin; paclitaxel; paclitaxel analogues; paclitaxel derivatives; palauamine; palmitoylrhizoxin; pamidronic acid; panaxytriol; panomifene; parabactin; pazelliptine; pegaspargase; peldesine; pentosan polysulfate sodium; pentostatin; pentrozole; perflubron; perfosfamide; perillyl alcohol; phenazinomycin; phenylacetate; phosphatase inhibitors; picibanil; pilocarpine hydrochloride; pirarubicin; piritrexim; placetin A; placetin B; plasminogen activator inhibitor; platinum complex; platinum compounds; platinum-triamine complex; porfimer sodium; porfiromycin; prednisone; propyl bis-acridone; prostaglandin J2; proteasome inhibitors; protein A-based immune modulator; protein kinase C inhibitor; protein kinase C inhibitors, microalgal; protein tyrosine phosphatase inhibitors; purine nucleoside phosphorylase inhibitors; purpurins; pyrazoloacridine; pyridoxylated hemoglobin polyoxyethylene conjugate; raf antagonists; raltitrexed; ramosetron; ras farnesyl protein transferase inhibitors; ras inhibitors; ras-GAP inhibitor; retelliptine demethylated; rhenium Re 186 etidronate; rhizoxin; ribozymes; RII retinamide; rohitukine; romurtide; roquinimex; rubiginone B1; ruboxyl; safingol; saintopin; SarCNU; sarcophytol A; sargramostim; Sdi 1 mimetics; semustine; senescence derived inhibitor 1; sense oligonucleotides; signal transduction inhibitors; sizofuran; sobuzoxane; sodium borocaptate; sodium phenylacetate; solverol; somatomedin binding protein; sonermin; sparfosic acid; spicamycin D; spiromustine; splenopentin; spongistatin 1; squalamine; stipiamide; stromelysin inhibitors; sulfinosine; superactive vasoactive intestinal peptide antagonist; suradista; suramin; swainsonine; tallimustine; tamoxifen methiodide; tauromustine; tazarotene; tecogalan sodium; tegafur; tellurapyrylium; telomerase inhibitors; temoporfin; teniposide; tetrachlorodecaoxide; tetrazomine; thaliblastine; thiocoraline; thrombopoietin; thrombopoietin mimetic; thymalfasin; thymopoietin receptor agonist; thymotrinan; thyroid stimulating hormone; tin ethyl etiopurpurin; tirapazamine; titanocene bichloride; topsentin; toremifene; translation inhibitors; tretinoin; triacetyluridine; triciribine; trimetrexate; triptorelin; tropisetron; turosteride; tyrosine kinase inhibitors; tyrphostins; UBC inhibitors; ubenimex; urogenital sinus-derived growth inhibitory factor; urokinase receptor antagonists; vapreotide; variolin B; velaresol; veramine; verdins; verteporfin; vinorelbine; vinxaltine; vitaxin; vorozole; zanoterone; zeniplatin; zilascorb; and zinostatin stimalamer. Specific active agents include, but are not limited to, chlorambucil, fludarabine, dexamethasone (Decadron®), hydrocortisone, methylprednisolone, cilostamide, doxorubicin (Doxil®), forskolin, rituximab, cyclosporin A, cisplatin, vincristine, PDE7 inhibitors such as BRL-50481 and IR-202, dual PDE4/7 inhibitors such as IR-284, cilostazol, meribendan, milrinone, vesnarionone, enoximone and pimobendan, Syk inhibitors such as fostamatinib disodium (R406/R788), R343, R-112 and Excellair® (ZaBeCor Pharmaceuticals, Bala Cynwyd, Pa.).

In some embodiments, the cancer therapy comprises administering an immune checkpoint inhibitor. Immune checkpoint inhibition broadly refers to inhibiting the checkpoints that cancer cells can produce to prevent or downregulate an immune response. Examples of immune checkpoint proteins are CTLA-4, PD-1, VISTA, B7-H2, B7-H3, PD-L1, B7-H4, B7-H6, ICOS, HVEM, PD-L2, CD160, gp49B, PIR-B, KIR family receptors, TIM-1, TIM-3, TIM-4, LAG-3, BTLA, SIRPalpha (CD47), CD48, 2B4 (CD244), B7.1, B7.2, ILT-2, ILT-4, TIGIT, HHLA2, butyrophilins, A2aR, and combinations thereof. Actual dosage levels of the active ingredients in the pharmaceutical compositions or agents to be administered may be varied so as to obtain an amount of the active ingredient (e.g., an agent described herein) which is effective to achieve the desired therapeutic response for a particular patient, composition, and mode of administration, without being toxic to the patient.

Tumor Specific Antigens

In some aspects, provided herein are methods and compositions related to cancer neoantigens (e.g., tumor specific antigens). In one aspect the invention provides methods of identifying a neoantigen by identifying a tumor specific antigen or protein expressed as a result of an activated hypomethylated developmental enhancer, in a subject having cancer. Accordingly, the present invention relates to methods for identifying and/or detecting T-cell epitopes of an antigen. Specifically, the invention provides methods of identifying and/or detecting tumor specific neoantigens that are useful in inducing a tumor specific immune response in a subject. The invention also provides methods of vaccinating or treating a subject by identifying tumor specific mutations.

The immune system can recognize developing cancers and therapeutic manipulation of immunity can induce tumor regression. The capacity to manifest remarkably durable responses in some patients has been ascribed in part to T cells that can (a) kill tumor cells directly, (b) orchestrate diverse antitumor immune responses, (c) manifest long-lasting memory, and (d) display remarkable specificity for tumor-derived proteins. This specificity stems from fundamental differences between cancer cells and their normal counterparts in that the former develop protein-altering mutations and undergo epigenetic (e.g., methylation) alterations, resulting in aberrant protein expression (e.g., protein expression as a result of activated developmental enhancers, such as protein expression as a result of PRC2 deficiency). These events can result in formation of tumor antigens. In some embodiments, a neoantigen is a protein or polypeptide expressed in cancer cells as a result of an activated developmental enhancer (e.g., a hypomethylated enhancer, such as a hypomethylated enhancer activated by a depletion of PRC2), which is not activated in non-cancerous cells. In some aspects, provided herein are methods and compositions to identify tumor specific antigens. In some embodiments, identifying tumor specific antigens comprises obtaining a biological sample (e.g., a tumor or blood sample) comprising cancer cells from the subject, detecting hypomethylated regions of a portion of the genome of the cancer cells in the biological sample, comparing the distribution of hypomethylated regions of a portion of the genome of the cancer cells to the distribution of hypomethylated regions of portions of the genomes of different tissues (e.g., adult or embryonic tissues), and identifying proteins or polypeptides expressed as a result of activated genes in the hypomethylated regions, thereby identifying tumor specific antigens. The method may comprise obtaining a biological sample comprising cancer cells from the subject, and detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample. The method may further comprise comparing the distribution of LMRs in a portion of the genome of the cancer cells to distribution of LMRs in portions of the genomes of non-cancerous tissues, and identifying proteins or polypeptides expressed in the cancerous cells, but not non-cancerous cells, as a result of activation of genes controlled by hypomethylated regulatory elements (e.g., enhancers, such as developmental enhancers).
Also provided herein are assays to identify novel cancer neoantigens by obtaining a biological sample comprising cancer cells from the subject, detecting the distribution of low methylated regions (LMRs) in a portion of the genome of the cancer cells in the biological sample, comparing the distribution of LMRs in a portion of the genome of the cancer cells to distribution of LMRs in portions of the genomes of non-cancerous tissues, and identifying proteins or polypeptides expressed in the cancerous cells, but not non-cancerous cells, as a result of activation of genes controlled by hypomethylated regulatory elements (e.g., enhancers, such as developmental enhancers normally silenced in non-cancerous tissues).
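By way of non-limiting illustration, the assay logic described above can be sketched with set operations in Python. The inputs are assumptions of this sketch: LMRs are represented by region identifiers, the enhancer-to-gene mapping is a hypothetical lookup table, and expression is represented as sets of gene names. None of these names or data structures comes from the disclosure.

```python
def tumor_specific_lmrs(cancer_lmrs, normal_lmrs_by_tissue):
    """Return LMRs present in the cancer but in none of the normal tissues.

    normal_lmrs_by_tissue maps a tissue name to its set of LMR identifiers.
    """
    seen_in_normal = set().union(*normal_lmrs_by_tissue.values())
    return set(cancer_lmrs) - seen_in_normal


def candidate_antigen_genes(tumor_lmrs, enhancer_to_genes,
                            expressed_in_cancer, expressed_in_normal):
    """Return genes linked to tumor-specific LMRs that are expressed in
    the cancer cells but not in non-cancerous cells.

    enhancer_to_genes is a hypothetical mapping from an LMR/enhancer
    identifier to the genes it is assumed to control.
    """
    candidates = set()
    for lmr in tumor_lmrs:
        for gene in enhancer_to_genes.get(lmr, ()):
            if gene in expressed_in_cancer and gene not in expressed_in_normal:
                candidates.add(gene)
    return candidates
```

Genes returned by this sketch would be candidates only; whether their protein products act as neoantigens would still need to be tested experimentally.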

In another aspect, provided herein are methods of inducing a tumor specific immune response in a subject by administering to the subject autologous dendritic cells or antigen presenting cells that have been pulsed with one or more of the peptides or polypeptides identified according to the methods disclosed herein. In a further aspect the invention provides methods of inducing a tumor specific immune response in a subject by administering one or more peptides or polypeptides identified according to the methods disclosed herein (e.g., proteins expressed as a result of activation of developmental enhancers or hypomethylated enhancers disclosed herein) and an adjuvant. The adjuvant is, for example, a TLR-based adjuvant or a mineral oil based adjuvant. In some aspects the peptide or polypeptide and TLR-based adjuvant are emulsified with a mineral oil based adjuvant. Therefore, the invention also provides methods and compositions for vaccinating against or treating cancer in a subject by identifying tumor specific antigens according to the methods disclosed herein. In some embodiments, provided herein are cancer vaccines comprising the proteins or polypeptides disclosed herein.

A person skilled in the art will be able to select preferred peptides, polypeptides, or combinations thereof by testing, for example, the generation of T-cells in vitro as well as their efficiency and overall presence, the proliferation, affinity and expansion of certain T-cells for certain peptides, and the functionality of the T-cells, e.g. by analyzing the IFN-γ production or tumor killing by T-cells. The most efficient peptides may be combined as a vaccine. A vaccine may contain between 1 and 20 peptides, more preferably 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 different peptides, further preferred 6, 7, 8, 9, 10, 11, 12, 13, or 14 different peptides, and most preferably 12, 13 or 14 different peptides. In one embodiment of the present invention the different peptides and/or polypeptides are selected so that one vaccine composition comprises peptides and/or polypeptides capable of associating with different MHC molecules, such as different MHC class I molecules. A vaccine composition may comprise peptides and/or polypeptides capable of associating with the most frequently occurring MHC class I molecules. In some embodiments, the vaccine composition is capable of raising a specific cytotoxic T-cell response and a specific helper T-cell response. In some embodiments, the vaccine composition comprises an adjuvant and/or a carrier. Examples of adjuvants and carriers are given herein below. The peptides and/or polypeptides in the composition can be associated with a carrier such as e.g. a protein or an antigen-presenting cell such as e.g. a dendritic cell (DC) capable of presenting the peptide to a T-cell.
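One way to picture the selection of peptides that associate with different MHC molecules is as a coverage problem. The sketch below is an illustration under stated assumptions, not a method taken from this disclosure: it assumes a table of which candidate peptides associate with which MHC class I molecules is already available (for example from in vitro testing), and it uses a simple greedy heuristic. The function name, peptide labels, and binding table are hypothetical.

```python
from typing import Dict, List, Set


def select_peptides(binding: Dict[str, Set[str]], max_peptides: int = 20) -> List[str]:
    """Greedily pick peptides so that together they cover many MHC molecules.

    `binding` maps a peptide label to the set of MHC molecules it associates with.
    Selection stops when no remaining peptide adds a new MHC molecule or when
    `max_peptides` (20 here, the upper bound stated in the text) is reached.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(binding)
    while remaining and len(chosen) < max_peptides:
        # Sorting first makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no peptide adds coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical binding table, for illustration only.
binding = {
    "pepA": {"HLA-A*02:01", "HLA-A*01:01"},
    "pepB": {"HLA-A*02:01"},
    "pepC": {"HLA-B*07:02"},
}
vaccine_peptides = select_peptides(binding)
```

A greedy choice is only one possible design; in practice the skilled person would also weigh T-cell functionality and the other criteria listed above.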

Adjuvants are any substance whose admixture into the vaccine composition increases or otherwise modifies the immune response to the peptide or polypeptide. Carriers are scaffold structures, for example a polypeptide or a polysaccharide, with which the neoantigenic peptides are capable of being associated. Optionally, adjuvants are conjugated covalently or non-covalently to the peptides or polypeptides of the invention. The ability of an adjuvant to increase the immune response to an antigen is typically manifested by a significant increase in immune-mediated reaction, or reduction in disease symptoms. For example, an increase in humoral immunity is typically manifested by a significant increase in the titer of antibodies raised to the antigen, and an increase in T-cell activity is typically manifested in increased cell proliferation, or cellular cytotoxicity, or cytokine secretion. An adjuvant may also alter an immune response, for example, by changing a primarily humoral or Th2 response into a primarily cellular, or Th1, response. Suitable adjuvants include, but are not limited to, 1018 ISS, aluminium salts, Amplivax, AS15, BCG, CP-870,893, CpG7909, CyaA, dSLIM, GM-CSF, IC30, IC31, Imiquimod, ImuFact IMP321, IS Patch, ISS, ISCOMATRIX, JuvImmune, LipoVac, MF59, monophosphoryl lipid A, Montanide IMS 1312, Montanide ISA 206, Montanide ISA 50V, Montanide ISA-51, OK-432, OM-174, OM-197-MP-EC, ONTAK, PepTel® vector system, PLG microparticles, resiquimod, SRL172, Virosomes and other Virus-like particles, YF-17D, VEGF trap, R848, beta-glucan, Pam3Cys, Aquila's QS21 stimulon (Aquila Biotech, Worcester, Mass., USA) which is derived from saponin, mycobacterial extracts and synthetic bacterial cell wall mimics, and other proprietary adjuvants such as Ribi's Detox, Quil, or Superfos. Adjuvants such as incomplete Freund's or GM-CSF are preferred.
Several immunological adjuvants (e.g., MF59) specific for dendritic cells and their preparation have been described previously (Dupuis et al., Cell Immunol. (1998) 186(1):18-27; Allison et al., Dev Biol Stand. (1998) 92:3-11). Cytokines may also be used. Several cytokines have been directly linked to influencing dendritic cell migration to lymphoid tissues (e.g., TNF-alpha), accelerating the maturation of dendritic cells into efficient antigen-presenting cells for T-lymphocytes (e.g., GM-CSF, IL-1 and IL-4) (U.S. Pat. No. 5,849,589, incorporated herein by reference in its entirety) and acting as immunoadjuvants (e.g., IL-12).

Other examples of useful adjuvants include, but are not limited to, chemically modified CpGs (e.g. CpR, Idera), Poly(I:C) (e.g. polyI:C12U), non-CpG bacterial DNA or RNA as well as immunoactive small molecules and antibodies such as cyclophosphamide, sunitinib, bevacizumab, Celebrex, NCX-4016, sildenafil, tadalafil, vardenafil, sorafenib, XL-999, CP-547632, pazopanib, ZD2171, AZD2171, ipilimumab, tremelimumab, and SC58175, which may act therapeutically and/or as an adjuvant. The amounts and concentrations of adjuvants and additives useful in the context of the present invention can readily be determined by the skilled artisan without undue experimentation. Additional adjuvants include colony-stimulating factors, such as Granulocyte Macrophage Colony Stimulating Factor (GM-CSF, sargramostim).

A vaccine composition according to the present invention may comprise more than one adjuvant. Furthermore, the invention encompasses a therapeutic composition comprising any adjuvant substance including any of the above or combinations thereof. It is also contemplated that the peptide or polypeptide and the adjuvant can be administered separately in any appropriate sequence. A carrier may be present independently of an adjuvant. The function of a carrier can, for example, be to increase the molecular weight of, in particular, mutant peptides in order to increase their activity or immunogenicity, to confer stability, to increase the biological activity, or to increase serum half-life. Furthermore, a carrier may aid in presenting peptides to T-cells. The carrier may be any suitable carrier known to the person skilled in the art, for example a protein or an antigen presenting cell. A carrier protein could be, but is not limited to, keyhole limpet hemocyanin, serum proteins such as transferrin, bovine serum albumin, human serum albumin, thyroglobulin or ovalbumin, immunoglobulins, hormones such as insulin, or palmitic acid. For immunization of humans, the carrier must be physiologically acceptable and safe for humans. Tetanus toxoid and/or diphtheria toxoid are suitable carriers in one embodiment of the invention. Alternatively, the carrier may be a dextran, for example sepharose.

Cytotoxic T-cells (CTLs) recognize an antigen in the form of a peptide bound to an MHC molecule. The MHC molecule itself is located at the cell surface of an antigen presenting cell. Thus, an activation of CTLs is only possible if a trimeric complex of peptide antigen, MHC molecule, and APC is present. Correspondingly, it may enhance the immune response if not only the peptide is used for activation of CTLs, but if additionally APCs with the respective MHC molecule are added. Therefore, in some embodiments the vaccine composition according to the present invention additionally contains at least one antigen presenting cell. The antigen-presenting cell (or stimulator cell) typically has an MHC class I or II molecule on its surface, and in one embodiment is substantially incapable of itself loading the MHC class I or II molecule with the selected antigen. As is described in more detail below, the MHC class I or II molecule may readily be loaded with the selected antigen in vitro. Antigen presenting cells may be dendritic cells. The dendritic cells may be autologous dendritic cells that are pulsed with the neoantigenic peptide. The peptide may be any suitable peptide that gives rise to an appropriate T-cell response. Thus, in one embodiment of the present invention the vaccine composition containing at least one antigen presenting cell is pulsed or loaded with one or more peptides of the present invention. Alternatively, peripheral blood mononuclear cells (PBMCs) isolated from a patient may be loaded with peptides ex vivo and injected back into the patient. As an alternative the antigen presenting cell comprises an expression construct encoding a peptide of the present invention. The polynucleotide may be any suitable polynucleotide and it is preferred that it is capable of transducing the dendritic cell, thus resulting in the presentation of a peptide and induction of immunity.

In some aspects, provided herein are methods of inducing a tumor specific immune response in a subject, vaccinating against a tumor, treating and/or alleviating a symptom of cancer in a subject by administering to the subject a neoantigenic peptide or vaccine composition disclosed herein. The methods and compositions of cancer specific immunotherapy disclosed herein may be administered in combination with any cancer therapy known in the art, including those disclosed herein.

Exemplification

This invention is further illustrated by the following examples, which should not be construed as limiting.

Adult organs arise through sequential activation of selected enhancers and target genes in tissue primordia. Cis-elements used transiently during development are subsequently silenced, yielding to regulatory and transcriptional programs unique to each adult tissue. The histone 3 lysine 4 (H3K4) demethylase LSD1 helps inactivate enhancers during embryonic stem cell (ESC) differentiation, but it is unclear how enhancers become decommissioned during development and if adult tissues retain a recoverable memory of their developmental enhancers. This process is important to understand because induction of pluripotency in adult somatic cells represents development in reverse and because cancers reactivate certain fetal genes. Pancreatic cancer metastases, for example, selectively reactivate developmental enhancers specific to the embryonic foregut.

Enhancers active in embryos and adults show accessible chromatin, transcription factor (TF) occupancy, and the canonical histone marks H3K4me1 and H3K27ac; sites carrying both these activation marks associate best with expressed genes. The histone mark H3K27me3 is associated in ESC and adult tissues with repressed promoters (Figure 1, Part A), especially those of morphogenetic and TF genes, but adult tissue enhancers usually lack H3K27me3. Instead, many candidate enhancers carry H3K4me1 but not H3K27ac, and in ESC this group appears to be ‘primed’ or ‘poised’ for activation upon acquiring H3K27ac. The need for enhancer ‘priming’ in terminally differentiated adult cells is unclear.

Most CpG dinucleotides in mammalian DNA are fully methylated (me), though long stretches of high CG density (islands, CGIs) near transcription start sites (TSSs) avoid methylation in those tissues where the gene is active. In contrast, enhancers have short stretches of CG-poor DNA with 20% to 50% meCpG, reflecting oxidation of meCpG. This intermediate state represents the net effect of methylation mediated by DNA methyltransferases (DNMTs) and demethylation mediated by TET enzymes. In TET2-deficient ESC, where enhancer hypomethylation is widely reversed, transcriptional disturbance is modest, related mainly to a few enhancers that lose H3K27ac. Although hypomethylation at some adult enhancers is ascribed to their activity in embryos, interactions and dependencies among meCpG, histone marks, and gene activity have not been investigated. Furthermore, most enhancers deployed during development remain unidentified and it is unknown if low meCpG is a universal feature that, like TF binding and histone marks, reverses when enhancers are decommissioned.

Mouse intestinal epithelium is suited to address these questions in adult and developmental precursor cells. Stem cells in intestinal crypts continually produce terminally differentiated, short-lived villus cells of one predominant type: enterocytes (Figure 3, Part A). Villus epithelial cells are readily separated from the proliferative crypt compartment, and their endoderm-derived precursors can be purified from embryos. These features were exploited to examine tissue-specific adult and developmental enhancers and to study recruitment and decommissioning of cis-elements during mouse gut development. It was found that adult intestinal cells retain a record of ~90% of tissue-specific developmental enhancers in the form of hypomethylated DNA. Enhancers used early in development retain no other discernible mark, whereas thousands of sites used late in gestation retain some H3K4me1 but not H3K27ac. These H3K4me1+H3K27ac- regions are therefore not necessarily ‘poised’ or ‘primed’ for activation, but are sites that were decommissioned late in development. Remarkably, prolonged absence of Polycomb Repressor Complex 2 (PRC2) results in indirect and delayed reactivation of silenced developmental enhancers, in roughly reverse order to their use in embryos. In diverse PRC2-null cells, only tissue-restricted hypomethylated enhancers (and nearly all such enhancers) are reactivated, with attendant expression of developmental genes.

meCpG is confidently implicated in X chromosome inactivation, gene imprinting, and silencing of endogenous viruses, but its role in other forms of epigenetic memory is uncertain. These findings reveal meCpG as a key determinant of bona fide and recoverable epigenetic memory dating to the period of organogenesis.

Delineation of intestinal low-methylated regions (LMRs)

Short stretches of hypomethylated DNA vastly outnumber marked, active enhancers

In adult mouse duodenal villus epithelium, ChIP-seq (Table 1 and Figure 1, Part B) identified 33,676 distant (>-2 and >+1 kb from TSSs) H3K4me1+ regions with, and 40,234 sites lacking, H3K27ac; genes near H3K27ac+ enhancers were expressed at high levels, whereas those near H3K27ac- sites were not (Figure 2, Part A). Because the proportion of primed enhancers seemed high for short-lived, terminally differentiated cells and the extent of meCpG at these sites is unknown, genome-wide DNA methylation was assessed at base resolution. The whole-genome bisulfite sequencing (WGBS) data showed high concordance with published results (Figure 1, Part C) and comparably low meCpG fractions in both groups of H3K4me1+ sites (Figure 2, Part A). Defined by stringent criteria, 53% to 58% of sites in each group showed reduced meCpG, revealing hypomethylated DNA as a common but not universal enhancer property. Notably, ~32,000 regions far from TSSs showed <50% meCpG, identified at a false discovery rate (FDR) of 0.05, but lacked H3K4me1 or H3K27ac, indicating that sites with hypomethylated DNA outnumber those with active histone marks. Moreover, 9% of additional hypomethylated sites identified at a relaxed FDR of 0.1 coincided with called H3K4me1 peaks, signifying a regulatory role (Figure 2, Part B). This 0.1 FDR was used to map a total of 20,856 unmethylated (UMRs) and 80,519 low-methylated regions (LMRs, corresponding to meCpG <59%) in purified intestinal villus epithelium (Figure 3, Part B). UMRs overlapped extensively with promoter CGIs, but included 8,146 non-promoter sites (Figure 2, Part C), whereas LMRs encompassed active regions, ‘primed’ enhancers, and 47,612 sites that lack both active histone marks (Figure 3, Parts B-C). Going forward, all non-promoter regions with H3K4me1, H3K27ac, or reduced meCpG were considered as potential enhancers (Figure 3, Part C and Figure 2, Part D). The conclusions that follow apply whether 68,517 LMRs (FDR 0.05) or 80,519 LMRs (FDR 0.1, Figure 2, Part B) were considered; the latter, more inclusive set gives a fuller picture of embryonic enhancer usage and recrudescence. Of note, the LMR profile, including LMR-only regions, was similar in villus and Lgr5+ intestinal stem cells (Figure 4, Part C), indicating that it characterizes the tissue and not just enterocytes.
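The grouping of non-promoter regions used in this section (active, ‘primed’, and LMR-only) can be expressed as a small decision rule. The sketch below is illustrative only: it assumes that peak calls and a mean meCpG fraction are already available for each region, it treats a fixed meCpG ceiling as a stand-in for the FDR-based LMR calling actually used, and the function name, parameter names, and the "other" category are hypothetical.

```python
def classify_region(
    me_cpg: float,
    h3k4me1: bool,
    h3k27ac: bool,
    lmr_ceiling: float = 0.50,
) -> str:
    """Assign a non-promoter region to one of the groups discussed in the text.

    me_cpg       mean fraction of methylated CpGs in the region (0.0 to 1.0)
    h3k4me1      True if the region overlaps a called H3K4me1 peak
    h3k27ac      True if the region overlaps a called H3K27ac peak
    lmr_ceiling  assumed meCpG cutoff for calling a region hypomethylated
    """
    if h3k4me1 and h3k27ac:
        return "active"      # both activation marks present
    if h3k4me1:
        return "primed"      # H3K4me1 without H3K27ac
    if not h3k27ac and me_cpg < lmr_ceiling:
        return "LMR-only"    # reduced meCpG as a solitary feature
    return "other"           # assumption: anything not covered above


labels = [
    classify_region(0.30, True, True),
    classify_region(0.30, True, False),
    classify_region(0.30, False, False),
    classify_region(0.90, False, False),
]
```

The default ceiling of 0.50 mirrors the <50% meCpG figure quoted for the FDR 0.05 set; a different cutoff can be passed to mimic the more inclusive set.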

Active enhancers showed accessible chromatin and binding of HNF4A, a TF that binds most enterocyte enhancers. These features were absent or much reduced among ‘primed’ enhancers, and totally lacking in the regions showing only low meCpG (Figure 3, Parts C and D). Of note, although levels of hypomethylated DNA were comparable at active and ‘primed’ enhancers, H3K4me1 was generally weaker at the latter sites and histone marks were more robust at active enhancers with reduced meCpG than at fully methylated sites (Figure 3, Part C). Moreover, the marked nucleosomes at active enhancers flanked a central area of DNA hypomethylation, accessible chromatin, and TF occupancy, whereas nucleosomes at ‘primed’ enhancers coincided with areas of low meCpG (Figure 3, Part C), in agreement with their diminished chromatin access and infrequent HNF4A binding. These data identify distinct patterns of histone marking and DNA hypomethylation at active and ‘primed’ enhancers, and reveal thousands of sites with reduced meCpG as a solitary feature.

To examine the bona fides of the latter group, LMRs at FDR 0.1 were first identified in public WGBS data from mouse blood, skin, and brain (Figure 4, Part A). In blood, for example, 45% of LMRs lacked H3K4me1, revealing comparable abundance of LMR-only sites (Figure 4, Part B). In keeping with the tissue specificity attributed to enhancers, LMRs identified in villus epithelium or any other tissue were usually fully methylated in the others (Figure 4, Parts C and D); moreover, the LMR profile, including LMR-only regions, was similar in villus epithelium and Lgr5+ intestinal stem cells, indicating that it characterizes the tissue and not just differentiated enterocytes. Of note, intestinal LMRs lacking H3K4me1 showed at least as much tissue specificity, enrichment of TF sequence motifs (Figure 5, Parts A and B), and evolutionary conservation (Figure 3, Part E) as H3K4me1+ enhancers. Whereas active and ‘primed’ enhancers were enriched for DNA sequence motifs associated with known intestinal TFs and nearby genes are enriched for enterocyte functions, both ‘primed’ and LMR-only sites were enriched for the motifs of developmental TFs, such as FOX factors, and nearby genes are enriched for developmental functions (Figure 5, Parts B and C). Thus, adult intestinal LMRs encompass both active and seemingly inactive cis-elements of two types, H3K4me1+ and H3K4me1-, and the features of H3K4me1- sites suggest prior activity during development.

Adult cells retain a comprehensive archive of developmental enhancers

A fraction of hypomethylated areas of DNA in adult mouse and zebrafish cells show cis-element activity in embryos, but >50% of sites with reduced meCpG in the fetal brain were fully methylated in adult brain cells. Because the brain contains numerous cell types, it is unclear which fetal enhancers lose or preserve hypomethylated DNA in a given population. In contrast, the intestinal epithelium descends directly from region-specific endoderm, and after embryonic day (E) 11, the surface protein EPCAM selectively marks prospective epithelial cells, allowing their isolation by flow cytometry (Figure 5, Part D). To trace the possible origins of hypomethylated regions in the adult intestine, gene activity and enhancer dynamics were assessed in endodermal cells purified from different stages in intestine development (Table 1 and Figure 1, Part D and E). Between E11.5 and adult intestinal epithelium, EPCAM+ cells showed 12,266 mRNA alterations (>2-fold, q <0.05) between any two stages, in waves that coincided with early, mid, and late gestation (Table 1 and Figure 3, Part D and Figure 13, Part E). Over the same period, ATAC-seq on EPCAM+ cells identified 68,510 unique areas of open chromatin >1 kb from promoters (Table 1, and Figure 3, Part E). Unsupervised k-means clustering placed these sites in 10 clusters (Figure 5, Part F), which formed 4 distinct groups (Figure 6, Part A): accessible at all stages, open only before E14.5, accessible mainly in mid- (E14.5) to late (E16.5) gestation, and open after E16.5. Developmental stage-specific expression of nearby genes correlated with these waves of chromatin access (Figure 6, Part A), indicating that regions identified by ATAC represent active enhancers. Indeed, endoderm cells purified at E16.5 (this study) or E12.5 showed H3K4me1 or H3K27ac, respectively, in many areas marked by ATAC before E14.5 (Figure 5, Part G). About 78% of sites showing open chromatin before E16.5 have closed chromatin and no H3K4me1 in adult cells. These ~26,000 ATAC+ sites function during intestine development and are subsequently decommissioned; as ChIP- and ATAC-seq have limited sensitivity in small cell populations, the full complement of decommissioned enhancers is likely even larger.
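The proportion of developmental sites that are later decommissioned can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: each ATAC-identified site is represented by three hypothetical boolean fields, and the toy records below are invented for illustration and do not reproduce the study data.

```python
from typing import Dict, List

Site = Dict[str, bool]


def decommissioned_fraction(sites: List[Site]) -> float:
    """Fraction of sites open before E16.5 that are closed and lack H3K4me1 in adults."""
    early = [s for s in sites if s["open_before_e16_5"]]
    if not early:
        return 0.0
    decommissioned = [
        s for s in early if not s["open_in_adult"] and not s["h3k4me1_in_adult"]
    ]
    return len(decommissioned) / len(early)


# Toy records: four sites open before E16.5, one site that opens only in adults.
sites = [
    {"open_before_e16_5": True, "open_in_adult": False, "h3k4me1_in_adult": False},
    {"open_before_e16_5": True, "open_in_adult": False, "h3k4me1_in_adult": False},
    {"open_before_e16_5": True, "open_in_adult": False, "h3k4me1_in_adult": False},
    {"open_before_e16_5": True, "open_in_adult": True, "h3k4me1_in_adult": True},
    {"open_before_e16_5": False, "open_in_adult": True, "h3k4me1_in_adult": True},
]
fraction = decommissioned_fraction(sites)
```

With real peak calls, the same ratio would correspond to the roughly 78% figure reported above.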

To determine the dynamics of DNA methylation during intestine development, WGBS was performed on purified E12.5 and E16.5 intestinal epithelium, identifying 53,350 unique non-promoter hypomethylated regions. ATAC-identified enhancers were fully methylated in the epiblast and meCpG was reduced sequentially: first at sites showing open chromatin at E11.5 and E12.5, later in regions that opened in mid-gestation, and last at active adult enhancers (Figure 6, Part C). Moreover, sites hypomethylated at E12.5 and E16.5 correspond to the areas identified in adult villus cells as the ‘LMR-only’ and ‘primed’ groups, respectively (Figure 7). Notably, 83% of sites hypomethylated at E12.5 and 92% of sites hypomethylated at E16.5 remained so in adult villus cells (Figure 6, Part D), and only 8,914 enhancers became fully methylated. Thus, after decommissioned enhancers relinquish chromatin access, active histone marks and regulatory functions, they retain reduced meCpG as a singularly stable feature, preserved over the hundreds of cell divisions that separate embryos from adult cells. Thousands of sites hypomethylated in the epiblast, however, were almost fully methylated in adult intestinal cells (Figure 7, Part A), indicating that tissue-restricted enhancers are archived only after cells are specified. Within this archive, sites active late in gestation retain H3K4me1 and occasional weak TF binding and open chromatin in adult cells. Accordingly, these are not ‘primed’ cis-elements per se, but transient and late-acting developmental enhancers. In contrast, sites active early in organogenesis retain no features other than hypomethylated DNA.

Prolonged PRC2 inactivity selectively reactivates developmental enhancers

Inactivation of developmental genes is a cardinal function of PRC2, which places the histone mark H3K27me3. However, up to 9 days after loss of PRC2 from intestinal epithelium in Villin-CreER-T2;Eed Fl/Fl mice, only the few genes with tissue-specific bivalent (H3K4me3+ H3K27me3+) promoters are derepressed. Among 14 covalent histone marks examined by mass spectrometry 9 days after initial activation of Cre recombinase, only methylated H3K27 was abrogated and total H3K27ac was modestly increased (Figure 8, Part A), hinting that absence of PRC2 might expose H3K27 residues to indiscriminate acetylation. To examine the distribution of H3K27ac in Eed -/- villus epithelium, tamoxifen was administered over 2 weeks to preclude Eed-proficient ‘escaper’ crypts; by the time mice became moribund, at 14 days, the epithelium lacked H3K27me3 or dividing cells (Figure 8, Part B). Within 9 days of PRC2 depletion, ChIP revealed H3K27ac at inactivated fetal enhancers, and by day 14 the marking was stronger and also evident at embryonic enhancers (Figure 9, Part A). These gains were verified by meticulous normalization using Drosophila chromatin ‘spike-in’ controls (Methods; Table 1). The same sites also acquired H3K4me1 (Figure 9, Part A), indicating increased enhancer activity rather than passive acetylation of unmethylated H3K27. Among millions of nucleosomes depleted of H3K27me in Eed -/- cells, diffReps objectively identified only 43,816 sites of H3K27ac gain and 17,562 sites of reduced H3K27ac located >1 kb from promoters (Figure 8, Part C). To determine whether H3K27ac gains also occur after PRC2 loss in other cells, bone marrow was cultured from Eed Fl/Fl mice, Eed was deleted by viral CRE expression, and macrophage differentiation was induced in vitro. Mutant CD11b+ macrophages lacked H3K27me and diffReps identified 7,836 sites of quantitative H3K27ac gain, which occurred selectively at hypomethylated enhancers (Figure 10, Part B). Active enhancers with fully methylated DNA were more susceptible to H3K27ac loss than those with reduced meCpG (Figure 9, Part A). More significantly, 80% of enhancers that acquired H3K27ac coincided with hypomethylated developmental sites, and nearly 50% of hypomethylated developmental enhancers showed objective H3K27ac gains (Figure 9, Part B). Because LMRs and areas of H3K27ac gain are defined by arbitrary cut-offs, even these high fractions underestimate the overlap. For example, diffReps confidently tagged only 19% of embryonic enhancers, even though active histone marks were evident at many sites (Figure 9, Part A). Indeed, developmental enhancers that failed the stringent criterion (q <0.01) for H3K27ac acquisition had measurable gains (Figure 9, Part C). Conversely, sites that acquired H3K27ac but did not qualify as strictly hypomethylated (note that 29% of methylated fetal LMRs gained H3K27ac) were not fully methylated in WT cells; rather, their meCpG fractions were between 90% and 60% (Figure 9, Part B), higher than the 59% ceiling established at FDR 0.01. To assess site specificity, the 8,914 sites that had low meCpG in embryonic or fetal intestine but are fully methylated in the adult (Figure 6, Part D) and 21,083 macrophage-specific LMRs (Figure 10, Part C) were considered. Neither group of categorically methylated enhancers showed significant H3K27ac gains in Eed -/- intestines (Figure 9, Part C). Moreover, macrophage-restricted LMRs did not acquire H3K27ac in Eed -/- intestines and vice versa (Figure 10, Part C). Thus, only silenced developmental enhancers with low residual meCpG acquire active histone marks in the absence of PRC2 and this occurred sooner at fetal than at embryonic enhancers. Of note, reduced meCpG is a feature common to both active enhancers that are stable in the absence of PRC2 (Figure 9, Part A) and decommissioned enhancers that acquire active histone marks (Figure 9, Parts B-C). Thus, histone activation was almost entirely restricted to decommissioned developmental enhancers.

Basis of enhancer reactivation in the absence of PRC2

To examine the specificity of this response to PRC2 loss, LMRs were first identified in public WGBS data from mouse skin and brain (Figure 4, Part A). In keeping with the tissue specificity attributed to enhancers, LMRs identified in villus epithelium or any other tissue were usually fully methylated in the others, whereas UMRs (mostly promoters) usually lacked meCpG in all tissues (Figure 4, Part C and D). To determine if H3K27ac gains occur at LMRs after PRC2 loss in other cell types, bone marrow was cultured from Eed Fl/Fl mice, Eed was deleted by viral CRE expression, macrophage differentiation was induced in vitro, and mutant CD11b+ macrophages were confirmed to lack H3K27me (Figure 5, Part A). In PRC2-null macrophages, diffReps identified 7,836 sites of quantitative H3K27ac gain, and these occurred selectively at hypomethylated enhancers (Figure 5, Part B). In addition, 50,000 inactive (H3K27ac-) enhancers with unambiguous differential methylation between macrophages and intestinal cells were identified. In each case, H3K27ac accumulated only in the corresponding PRC2-null tissue, as is evident in heatmaps (Figure 5, Part C) and by strict quantitation of macrophage sites in intestinal cells (Figure 9, Part D). In summary, only silenced developmental enhancers with low meCpG acquire active histone marks in the absence of PRC2; this occurs with high tissue specificity and, in the intestine, sooner at fetal than at embryonic enhancers.

To investigate the basis for significant enhancer modulation in PRC2-null cells, ChIP was used to map the distributions in adult wild-type (WT) villus cells of all PRC2-modified forms of H3K27 (mono-, di-, and tri-methyl) and the PRC2 component SUZ12 (Table 1). Mirroring the distributions reported in ESC, H3K27me1 predominated at active loci, H3K27me2 in intergenic regions, and SUZ12 and H3K27me3 at silenced promoters and genes (Figure 10, Part D). Enhancers generally lacked SUZ12 or H3K27me3, and when present, H3K27me3 was dispersed across large regions, at levels considerably lower than those found at repressed bivalent promoters (Figure 10, Part E). Likewise, H3K27me1 and H3K27me2 were distributed over hundreds of kb throughout the genome, with no focal enrichment over the sites that acquire H3K27ac focally in Eed -/- cells. Thus, modulation of these enhancers is likely a secondary effect stemming from the direct, short-term consequence of PRC2 loss: activation of numerous bivalent TF genes that mediate early intestine development, including factors whose motifs are enriched at developmental LMRs (Figure 5, Part B).

These mRNAs, including several FOX-family TFs, continued to accumulate after day 9, showing persistent association with basal promoter H3K4me3 levels, and prolonged PRC2 loss also increased levels of certain non-bivalent FOX genes (Figure 11, Parts A and B). To examine the likely proximate cause of enhancer reactivation, ChIP-seq was performed for the two FOX TFs with available ChIP-grade antibodies. FOXA1 binding was robust and >78% of new sites in Eed -/- villus cells mapped to hypomethylated adult and developmental enhancers (Figure 11, Part C). FOXG1 binding was less strictly quantifiable, but unambiguous occupancy was detected at fetal and embryonic enhancers showing H3K27ac gains, some of which also bound FOXA1 (Figure 11, Part D). Thus, following promoter-based early gene reactivation, these and other TFs likely account in aggregate for cis-element reactivation as an indirect consequence of PRC2 deficiency (Figure 13, Part A).

Transcriptional consequences of enhancer recrudescence

Compared to the few bivalent genes activated within 9 days of initial CreER-T2 activation, by 14 days–the limit to which mice tolerated intestinal PRC2 loss– 10,298 additional genes were modulated (>2-fold, q <0.05). Most corresponding promoters had high basal levels of H3K4me3 and no H3K27me3 in WT intestines (Figure 12, Parts A-C). Thus, promoter H3K4me3 is the limiting factor–or a reliable proxy for one– at genes activated soon after PRC2 loss, but does not explain the genes activated after sustained PRC2 deficiency. Rather, genes activated after day 9 showed striking association with developmental enhancers that acquired H3K27ac, but not with enhancers that lacked H3K27ac gains (Figure 13, Part B). Conversely, genes that declined 11 days and 14 days after CRE activation were associated specifically with active enhancers that lost H3K27ac and H3K4me1 in Eed -/- cells (Figure 12, Part D).
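The expression threshold quoted above (>2-fold, q <0.05) can be written as a simple filter. The sketch below is an illustration only and does not reproduce the analysis pipeline used in the study: the function name, the pseudocount added before taking the ratio, and the example expression values are all assumptions.

```python
import math


def is_modulated(
    expr_wt: float,
    expr_mutant: float,
    q_value: float,
    min_fold: float = 2.0,
    max_q: float = 0.05,
    pseudocount: float = 1.0,
) -> bool:
    """Return True if a gene changes more than `min_fold` in either direction
    with a q-value below `max_q`.

    A pseudocount (an assumption of this sketch) avoids division by zero for
    genes with no detectable expression in one condition.
    """
    log2_fold = math.log2((expr_mutant + pseudocount) / (expr_wt + pseudocount))
    return abs(log2_fold) > math.log2(min_fold) and q_value < max_q


# Hypothetical expression values, for illustration only.
examples = [
    is_modulated(10.0, 50.0, 0.01),   # strong increase, significant
    is_modulated(50.0, 10.0, 0.01),   # strong decrease, significant
    is_modulated(10.0, 15.0, 0.01),   # change below 2-fold
    is_modulated(10.0, 50.0, 0.20),   # large change, not significant
]
```

Counting the genes that pass such a filter at each time point is one way to tabulate how many genes are modulated at 9, 11, and 14 days.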

More than 77% of non-bivalent genes decommissioned during intestine development were activated in the prolonged absence of PRC2, and intestinal villus cells accordingly expressed both adult and embryonic transcriptomes (Figure 13, Part C). Among genes activated after sustained PRC2 deficiency, 70.2% had been expressed in the developing intestine, compared for example to 23.3% and 21.4% that were expressed in developing heart or lungs (P <0.0001), and most of the latter genes were also expressed in the developing intestine (Figure 12, Part E). Thus, activation was largely confined to, and encompassed most, intestine-specific developmental genes. Those expressed late in gestation and linked to silenced fetal (H3K4me1+) enhancers were reactivated by day 11 and further increased at day 14, whereas genes expressed early in development and associated with decommissioned embryonic (LMR-only) enhancers were re-expressed later (Figure 13, Part D). Basal expression of both gene groups was equally low in WT adult villi and the timing of re-expression in Eed -/- cells correlated with sequential reactivation of fetal and embryonic enhancers (Figure 9, Part A). Transcript levels of both groups in adult Eed -/- cells approached those present during development (Figure 13, Part E). Thus, H3K27ac/H3K4me1 and mRNA fluxes together reveal enhancer DNA hypomethylation as the crucial property that underlies memory and reactivation of tissue-restricted developmental programs.

Discussion

Methylated DNA inactivates endogenous viruses, sex chromosomes and imprinted genes, but its roles in tissue-specific and developmental gene control draw on correlative findings and remain controversial. Separately, it has been unclear if the decommissioning of developmental enhancers is irrevocable. This study reveals low meCpG as a persistent adult feature of ~90% of those decommissioned embryonic and fetal enhancers that were hypomethylated during development and the signature of sites reactivated after prolonged PRC2 deficiency. Among the millions of cis-elements in the genome only developmental enhancers with low meCpG were activated, despite their lack of focal H3K27me or PRC2 binding in wild-type cells. Together with embryonic TF binding at activated sites, these findings imply that enhancer recrudescence is unrelated to H3K27me per se and that hypomethylated DNA either determines which enhancers become activated or at least signifies the potential for robust, tissue-specific activation.

Hypomethylated DNA, presumably generated during development by TET enzymes, is stably preserved from embryonic tissues to adult organs. The maintenance methyltransferase DNMT1 offers a simple explanation because in each cell cycle it reproduces existing meCpG states on the new DNA strand. Thus, reduced meCpG could be preserved through faithful copying of fetal and embryonic meCpG templates over many cell generations; alternatively, it may reflect continually opposing DNMT and TET activities. Methylated histone H3K4 is thought to repel DNMT3L to help maintain absence of meCpG at promoters and CGIs in ES and germ cells, and a recent study implicates DNMT3A in persistent hemimethylation of DNA at cohesin- and CTCF-binding sites during ES cell replication. Most LMRs in this study, however, lack H3K4me or other overt barriers to DNA methylation, and de novo DNMTs decline appreciably after development. Of note, epiblast LMRs are largely methylated in adult intestinal cells and embryonic LMRs have slightly larger meCpG fractions than fetal enhancers. These observations imply that the copying mechanism may be imperfect over long periods, in agreement with Holliday and Pugh’s ideas about ‘developmental clocks’, wherein DNMTs and demethylases are inherently inefficient over many cell cycles.

Both embryonic and fetal enhancers lack H3K27ac and their target genes are comparably silent in adults, but fetal enhancers retain low levels of H3K4me1, the solitary mark associated with enhancer ‘priming’ or ‘poising’ in ESC10-12. In adult intestines, however, enhancers bearing only H3K4me1 are those that were decommissioned late in organogenesis. As vestiges of fetal life, these sites likely have little physiologic function, though they do activate sooner in PRC2-null cells than H3K4me1- enhancers. Enhancers that act only during development often elude detection by assays for accessible chromatin or modified histones. Hypomethylated DNA, which preserves a faithful record of these enhancers into adulthood, can therefore be used to identify embryonic cis-elements in purified adult cells. However, only ~60% of active enhancers are categorically hypomethylated (Figure 1, Part B), owing in part to limited CpG content and because some enhancers act through methyl-insensitive TFs32. Accordingly, whole-genome meCpG profiles will likely identify a significant but incomplete fraction of all cis-elements.

Hypomethylated developmental enhancers are the principal sites capable of activation in adult PRC2-deficient cells, and possibly more generally. These data imply that reduced enhancer meCpG is necessary and sufficient to recruit meCpG-sensitive TFs and that developmental gene reactivation is constrained mainly by the repertoires of latent enhancers and available TFs. The finite enhancer repertoire in any tissue, delineated during development, hence suggests a basis for oncofetal gene expression, such as activation of embryonic foregut-restricted genes in pancreatic cancer metastases. Moreover, depletion of the meCpG-binding domain protein MBD3, among other maneuvers, considerably enhances the efficiency of induced pluripotency in adult somatic cells. The nature of developmental enhancer repertoires may underlie such observations and suggests other novel approaches to tailor tissue-specific dedifferentiation.

At least in PRC2-deficient cells, reduced enhancer meCpG is necessary and sufficient for embryonic gene reactivation, which seems constrained by the repertoires of latent enhancers and available TFs. To the extent that these constraints apply generally, these findings may explain certain features of cellular reprogramming and of aberrant gene activity in cancer. TF-driven induction of pluripotency in adult somatic cells can leave vestiges of the starting cell type and limit the differentiation potential of resulting iPS cells. This is attributed in part to retained promoter CGI fingerprints, but the restricted potential may reflect residual tissue-specific enhancer hypomethylation. In the examples of TF-mediated modulation of one cell type into another, only selected heterologous cell fates are achieved, e.g., CEBPA-driven conversion of lymphocytes into macrophages, possibly constrained by the repertoire of tissue-specific developmental enhancers. Three TFs (PDX1, NEUROG3 and MAFA) together convert gut endocrine cells into insulin-producing beta cells in vivo. This process is markedly more efficient in the gastric antrum, which shares an origin with the pancreas, than in the intestine, possibly reflecting the pool of available hypomethylated enhancers. Finally, the finite enhancer repertoire in any cell, delineated during development, suggests a basis for oncofetal gene activation, such as tissue-specific fetal genes in colon and brain tumors and embryonic foregut-restricted genes in pancreatic cancer metastases.

Methods

Mice and tamoxifen treatment. Mice, maintained on a mixed C57Bl/6 and 129/Sv background, were housed and handled according to ethical and procedural guidelines from the Animal Care and Use Committee of the Dana-Farber Cancer Institute. Eed Fl mice have been described; Villin CreER-T2 mice were a gift from Sylvie Robine (Institut Pasteur, France); and Lgr5 EGFP-IRES-CreERT2 mice were purchased from The Jackson Laboratory. Animals 8 weeks or older were injected intraperitoneally with 2 mg tamoxifen on 5 consecutive days and, in some experiments, with 1 mg tamoxifen on alternate days thereafter (Figure 8, Part B).

Isolation of intestinal villus and stem cells. The proximal 1/3 small intestine (duodenum) was used for histology and to collect cells for RNA and chromatin studies. Intestines harvested immediately after euthanasia were washed with cold phosphate-buffered saline (PBS), followed by rotation for 40 min in 5 mM EDTA in PBS (pH 8) at 4°C, with manual shaking every 10 min. Villus epithelium was recovered by filtering the resulting suspension over 70-µm filters (B-D Falcon). Villi retained on these filters were washed with ice-cold PBS and used to extract RNA, chromatin, or DNA. To purify Lgr5 + ISC, Lgr5 EGFP-IRES-CreERT2 mouse intestines were washed in PBS and villi were depleted by scraping with glass slides. Crypts were extracted by rotating for 30 min in 5 mM EDTA in PBS (pH 8) at 4°C, with manual shaking every 10 min, followed by discarding the supernatant and adding fresh EDTA solution for 10 additional min. Crypts in the 70-µm filtrate were dissociated into single cells by treatment with 4% TrypLE solution (Invitrogen) at 37°C for 30 min, and GFP hi ISC were isolated from the viable (DAPI -) cell fraction by flow cytometry on a BD FACSAria II SORP instrument.

Isolation of embryonic and fetal mouse endoderm. The morning of the copulation plug was designated embryonic day (E) 0.5 and embryos were harvested from pregnant dams on E11.5, E12.5, E14.5, and E16.5 into ice-cold PBS. The intestine (digestive tract distal to the pylorus and proximal to the cecum) was digested with 0.25% trypsin (Life Technologies) for 30 min at 37°C to release single cells, followed by neutralization with fetal bovine serum (FBS, Life Technologies). Cells were passed over 40-µm filters to remove tissue fragments, centrifuged at 1,200 g for 5 min, washed in cold PBS, suspended in fluorescence-activated cell sorting (FACS) buffer (5 mM EDTA in PBS and 2% FBS), and stained with APC-conjugated EpCAM antibody (Biolegend 118214, Lot B217174, 1:100) for 1 h at 4°C. Viable (DAPI -) EpCAM + cells were isolated by flow cytometry on a BD FACSAria II SORP instrument.

Purification and treatment of macrophages. Bone marrow cells from Eed Fl/Fl;Rosa26R EYFP mice were cultured in media supplemented with SCF, IL-3 and IL-6 (R&D Systems, 10 ng/ml each) for 3 days, followed by infection with MSCV-Cre retrovirus, prepared by cloning Cre cDNA into the BglII and XhoI restriction sites in an MSCV-Hygro vector (Clontech). After 48 h, KIT + cells were enriched using CD117 microbeads (Miltenyi Biotec). Cre-infected (Eed -/-, EYFP +) and uninfected (Eed Fl/Fl, EYFP -) cell fractions were separated by flow cytometry and cultured for 3 additional days in media with 10 ng/ml SCF and IL-3, followed by 4 days in media with 10 ng/ml IL-3 and M-CSF to obtain populations highly enriched for CD11b + macrophages.

Histology and immunohistochemistry. Tissues were fixed overnight in 4% paraformaldehyde at 4°C, washed in PBS, dehydrated in ethanol, and embedded in paraffin; 5-µm tissue sections were deparaffinized and rehydrated. Sections were stained with hematoxylin and eosin or treated with 10 mM sodium citrate (pH 6) for antigen retrieval and incubated overnight at 4°C with antibodies (Ab) against H3K27me1 (Active Motif 61015, Lot 35813006, 1:1000), H3K27me2 (Cell Signaling D18C8, Lot 12, 1:1000), H3K27me3 (Millipore 07-449, Lot 2607758, 1:1000) or KI67 (Vector VP-K452, 1:500) in PBS. After washing in PBS for 10 min, sections were incubated with anti-rabbit or anti-mouse IgG conjugated to Cy3, FITC or biotin (Jackson Laboratories, 1:1000) and signals were detected by fluorescence or by staining with Vectastain Elite ABC Kit (Vector) and 3,3'-diaminobenzidine tetrahydrochloride (Sigma P8375). Cultured macrophages were pelleted onto glass slides using Cytospin3 (Shandon) at 800 rpm for 5 min and fixed with 1% paraformaldehyde for 5 min. Slides were washed with PBS and processed as above for H3K27me1/2/3 immunohistochemistry. Analyses were conducted on tissues from at least 5 mice of each genotype.

Post-translational histone modifications. Mouse intestinal epithelial cells were lysed with an AFA Focused-Ultrasonicator (Covaris). Histones were purified from the lysates (Active Motif, Cat. No. 40026) and desalted by off-line reversed-phase chromatography on an Agilent 1200 tower using a Jupiter 5 µm C4 300 Å column, 150 x 2 mm (Phenomenex). The resulting peak area was used to estimate concentration against histone preparations of known concentration. Desalted histones were lyophilized and spiked 1:1 with histones isolated from 5 x 10^6 HeLa cells grown in RPMI 1640 SILAC heavy arginine (13C6 15N4) medium (Cambridge Isotope Labs), followed by treatment with NHS-propionyl synthesized at neutral pH and digestion with trypsin (Promega). Peptides were lyophilized and treated again to obtain a homogeneous population of derivatized lysines and new N-termini, followed by high-resolution, high-mass-accuracy LC-MS/MS in an Orbitrap Elite instrument (Thermo Scientific) equipped with a nanoACQUITY UPLC tower with a 1 x 100 mm HSS T3 1.8 µm column (Waters). Mass spectrometry data were interpreted using Mascot Distiller (Matrix Science) for identification, followed by manual verification. The peak area was processed using Skyline Software v1.4.0.422 (University of Washington) to quantify histone peptides bearing specific post-translational modifications.

RNA-seq and data analysis. Tissues or purified cells were lysed in Trizol (Life Technologies) and total RNA was extracted. RNA-seq libraries were prepared using TruSeq RNA Sample Preparation Kit V2 (Illumina RS-122-2001) for adult cells or the SMARTer-Seq v4 Low Input mRNA library kit (Clontech) for embryonic (E11.5, E12.5, E14.5, and E16.5) samples. Libraries were sequenced on a NextSeq 500 instrument (Illumina) and 75 bp single-end reads were aligned to the mouse genome (Mm9, NCBI build 37) using STAR aligner v2.5.3a. Data quality measures, including per-base sequence quality, per-read GC content (~50%), comparable read alignments to +/- strands, exonic vs intronic read distributions and 3’ bias, were determined using RSeQC v2.6.2. Gene-specific read counts were determined using HTSeq v0.6.1, followed by normalization in DESeq2. Normalized tag counts were converted into reads per kb of transcript length per 1M mapped reads (RPKM) and heatmaps representing this measure were generated using GENE-E software (Broad Institute). Differential gene expression between cell types was determined in DESeq2 using negative binomial GLM fitting and the Wald test to calculate statistical significance (P value). Benjamini-Hochberg correction was used to calculate adjusted P values (FDR); genes with FDR <0.05 and the indicated fold-changes were regarded as differentially expressed.
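The RPKM conversion named above can be sketched as follows. This is a minimal illustration of the standard RPKM definition, not the pipeline actually used; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kb of transcript length per 1M mapped reads.

    Hypothetical helper: divides the read count by the transcript length
    in kb and by the library size in millions of mapped reads.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Illustrative values only: 500 reads on a 2-kb transcript
    # in a library of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Because the value scales with both transcript length and library size, it is suited to comparing expression levels within and across samples, such as the heatmap display described above.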

ATAC-seq. ATAC-seq was performed on 5,000 to 30,000 FACS-sorted epithelial cells from embryonic or adult intestinal epithelium. Freshly isolated cells were washed with cold PBS, resuspended in 50 µl cold ATAC lysis buffer (10 mM Tris·Cl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% (v/v) Igepal CA-630), and nuclei were isolated by centrifugation at 500 g at 4°C. Nuclear pellets were treated with Nextera Tn5 Transposase (Illumina, FC-121-1030) for 30 min at 37°C in 50 µl reactions. Transposed DNA was column-purified (Qiagen, 28004) and amplified using high-fidelity 2X PCR Master Mix (New England Biolabs) with a common forward primer and different reverse primers carrying sample-specific barcodes. After 5 cycles of amplification, 5 µl of the reaction mix was amplified using qPCR for 20 cycles; the remaining 45 µl was then amplified for the number of cycles necessary to achieve 1/3 of the maximum fluorescence intensity in qPCR. Primer dimers (<100 bp) were removed from the amplified ATAC-seq library using AMPure beads (Beckman Coulter, A63880), library size distribution was determined using high sensitivity DNA Chip detection on Bioanalyzer 2100 (Agilent Genomics), and sequencing was done on a NextSeq 500 instrument (Illumina) to obtain 75 bp single-end reads.

ChIP-seq. Intestinal epithelium or FACS-sorted cells were fixed in 1% formaldehyde for 25 min at room temperature immediately following isolation. Approximately 1 x 10^6 to 5 x 10^6 cells were lysed in buffer containing 30 mM Tris-HCl (pH 8), 1% SDS, 10 mM EDTA, and protease inhibitors (Roche), and chromatin was sheared by sonication in a Covaris E210 sonicator for 50 min with 5 min on/off cycles at 4°C. After centrifugation to remove debris, chromatin was incubated overnight at 4°C with Ab against H3K4me1 (Diagenode C15410194, Lot A1862D), H3K4me3 (Diagenode C15410003, Lot A1052D), H3K27ac (Active Motif 39135, Lot F1311), H3K27me1 (Active Motif 61015, Lot 35813006), H3K27me2 (Cell Signaling D18C8, Lot 12), H3K27me3 (Millipore 07-449, Lot 2607758), Suz12 (Cell Signaling, D39F6, Lot 6), FOXG1 (Active Motif 61211, Lot 34711001), HNF4A (Santa Cruz sc-6546 (C-19)X, Lot F1311), or a 1:1.5 combination of FOXA1 Ab (Abcam 23738, Lot gr292351-2 and Abcam 5089, Lot gr122110-14). For samples indicated in Table 1, 10 ng Spike-in chromatin from Drosophila Line 2 (S2, Active Motif 53083, Lot 11316004) and 2 µg Spike-in Ab against the Drosophila-specific histone variant H2Av (Active Motif 61686, Lot 17316003) were added before Ab incubation. Ab-bound chromatin was captured with magnetic beads (Dynal) and washed sequentially in low-salt (20 mM Tris-HCl pH 8.1, 150 mM NaCl, 2 mM EDTA, 0.1% SDS, 1% TritonX-100), high-salt (20 mM Tris-HCl pH 8.1, 500 mM NaCl, 2 mM EDTA, 0.1% SDS, 1% TritonX-100), and lithium (10 mM Tris pH 8.1, 0.25 M LiCl, 1 mM EDTA, 1% NP-40, 1% deoxycholate) buffers. Cross-links were reversed using 1% SDS and 0.1 M NaHCO3 for 6 h at 65°C, DNA was purified using columns (Qiagen), and ChIP-seq libraries were prepared using ThruPLEX kit (Rubicon, R400427). DNA size distribution in the libraries was determined using high sensitivity DNA Chip detection on Bioanalyzer 2100 (Agilent Genomics) and 75 bp single-end reads were sequenced on a NextSeq 500 instrument (Illumina).

ATAC- and ChIP-seq data processing and computational analysis. Reads were aligned to the mouse genome (Mm9, NCBI build 37) using Bowtie2 v2.1.0; PCR duplicates and reads aligned to multiple locations were removed from the raw alignment (bam) files and peaks (P <10^-5) were detected using MACS v1.4. For further analysis, peaks were divided into promoters (<2 kb upstream and <1 kb downstream from TSSs) or enhancers (non-promoter). Read distributions were visualized on the Integrative Genomics Viewer version 2.3 after conversion into signal files (bigWig) using DeepTools v2.1.0. Signals across samples were quantile normalized with Haystack, using 50-bp windows across the genome. For libraries with Drosophila chromatin spike-in, reads were aligned to the Drosophila genome (dm6). A normalization factor (NF) was derived for each library: NF = Drosophila reads in library with the lowest count / Drosophila reads in that library. Each library was then down-sampled to the read counts proportional to its NF.
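The spike-in normalization described above can be sketched as follows. The function names, library labels and read counts are hypothetical, and the sketch assumes that "proportional to its NF" means multiplying each library's mouse read count by its NF.

```python
def spike_in_factors(drosophila_reads):
    """NF = lowest Drosophila read count among libraries / this library's count."""
    lowest = min(drosophila_reads.values())
    return {lib: lowest / n for lib, n in drosophila_reads.items()}


def downsample_targets(mouse_reads, factors):
    """Target mouse read count per library after down-sampling by its NF.

    Assumption: the target is the library's mouse read count times its NF.
    """
    return {lib: int(round(mouse_reads[lib] * factors[lib])) for lib in mouse_reads}


if __name__ == "__main__":
    # Illustrative counts only.
    dm_reads = {"WT": 200_000, "Eed_KO": 100_000}
    mm_reads = {"WT": 20_000_000, "Eed_KO": 18_000_000}
    nf = spike_in_factors(dm_reads)
    print(nf)
    print(downsample_targets(mm_reads, nf))
```

The library with the fewest Drosophila reads receives NF = 1 and is left at full depth; every other library is reduced, so that equal amounts of spike-in chromatin correspond to equal Drosophila read counts after down-sampling.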

RNA expression in developing tissues (Figure 12, Part E) was determined using data from the ENCODE consortium (www.encodeproject.org). Genes expressed from E12.5 to E16.5 heart and lung and silenced before birth were determined by unsupervised k-means clustering of all differentially expressed genes among the developmental and postnatal stages.

Enhancer ATAC peaks from E11.5, E12.5, E14.5, E16.5, and adult intestinal epithelium were pooled to identify a total of 68,510 unique peaks. Unsupervised k-means clustering was conducted for signal within 1.5 kb from the centres of ATAC peaks, using DeepTools v2.1.0. Gap statistics were used to determine the optimal number of clusters (Figure 5, Part F) and clusters with visually similar signal patterns were merged to form 4 final groups (Figure 6, Part A). Differential ChIP signals between cell types were determined with diffReps v1.55.4 using comparison of read counts over 1-kb windows with step size of 100 bp across the genome; ChIP-seq signals for input DNA samples were used as the background. Negative binomial test was used to calculate P-values, and FDRs were calculated using Benjamini–Hochberg adjustment; differential regions with FDR <0.01 were selected for further processing.
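The Benjamini–Hochberg adjustment used for the FDR cut-offs above can be sketched in plain Python. This is a generic illustration of the procedure, not the diffReps or DESeq2 code; the function name and the P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative P-values only.
    pvals = [0.01, 0.04, 0.03, 0.005]
    fdr = benjamini_hochberg(pvals)
    selected = [i for i, q in enumerate(fdr) if q < 0.01]
    print(fdr, selected)
```

Regions (or genes) are then retained when the adjusted value falls below the chosen threshold, e.g. FDR <0.01 for differential ChIP regions.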

Evolutionary conservation of enhancer groups (Figure 3, Part E) was determined by comparison with all vertebrate genomes using Cistrome. PhastCons conservation scores were plotted as average profiles centered on enhancer summits. Gene enrichment analysis was conducted using GREAT analysis tool v3.0 and default parameters to identify Gene Ontology (GO) terms for biological processes associated with genes within 50 kb of different enhancer groups. Significantly enriched terms were plotted using binomial P-values (Figure 9, Part D and Figure 5, Part C). To identify TF motifs enriched in different groups of enhancers, the de novo motif-finding tool HOMER v4.7.2 was used, based on cumulative binomial distributions.

WGBS library preparation, data alignment, and determination of meCpG. Genomic DNA was purified from cells using MasterPure DNA Purification kit (Epicenter MCD85201) and 50 ng DNA was treated with the EZ DNA Methylation-Gold kit (Zymo Research D5005) for bisulfite conversion. 10 ng of bisulfite-converted DNA was amplified to prepare whole-genome bisulfite sequencing (WGBS) libraries using the EpiGenome Methyl-Seq kit (Epicenter EGMK81312). Libraries were purified using AMPure beads (Beckman Coulter) and a DNA size range of 200-800 bp was confirmed using high sensitivity DNA Chip detection (Bioanalyzer 2100, Agilent Genomics). Libraries were sequenced on a NextSeq 500 instrument (Illumina) with up to 50% PhiX phage DNA (Illumina) to obtain 150-bp paired-end reads. WGBS reads were filtered to remove those of poor quality and the error-prone 6 bases from the 5’ end using Trimmomatic v0.32. The mouse genome (mm9, build NCBI37) was bisulfite-converted using Bismark v0.13.1 and trimmed WGBS reads were aligned to this genome using Bowtie2 within Bismark. Duplicate reads were removed and the number of alignments with C (methylated) or T (unmethylated) was determined for each CpG dinucleotide using Bismark. Percent methylation for individual positions was calculated as the fraction of alignments with C relative to the total number of alignments (C or T). In further analysis of each CpG, counts from the two Cs on complementary strands were combined to determine strand-independent methylation.
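The per-CpG methylation calculation described above can be sketched as follows. The function names and counts are hypothetical; in practice the C and T counts per position would come from the Bismark output.

```python
def combine_strands(plus_counts, minus_counts):
    """Add (C, T) counts from the two Cs of one CpG on complementary strands."""
    return (plus_counts[0] + minus_counts[0], plus_counts[1] + minus_counts[1])


def methylation_fraction(c_count, t_count):
    """Fraction of alignments reading C (methylated) among all C or T alignments.

    Returns None when the position has no coverage.
    """
    total = c_count + t_count
    if total == 0:
        return None
    return c_count / total


if __name__ == "__main__":
    # Illustrative counts only: (C, T) on the plus and minus strands of one CpG.
    c, t = combine_strands((6, 2), (3, 1))
    print(c, t, methylation_fraction(c, t))
```

Pooling the two strands before taking the ratio weights each strand by its coverage, which is not the same as averaging two per-strand fractions.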

Determination of unmethylated (UMR) and low-methylated (LMR) regions and tissue specificity.

UMRs and LMRs were identified using the MethylSeekR package with the mouse bisulfite-converted genome (BSgenome.Mmusculus.UCSC.mm9) from Bioconductor (www.bioconductor.org) as the reference. All CpGs with coverage <5 were disregarded and C nucleotides overlapping known SNPs between the reference strains (C57BL/6J and 129/S5) were removed to eliminate spurious effects from polymorphism; methylation levels were smoothed over 3 consecutive CpGs. Hypomethylated regions were identified as those with smoothed meCpG levels below various cut-offs and containing a minimum number of CpGs, to calculate the corresponding false discovery rate (FDR) (Figure 2, Part A). Regions were divided into UMRs (unmethylated and high number of CpGs) or LMRs (CpG-poor with fractional methylation between 10% and pre-defined upper limits).

Average DNA methylation was determined at UMR and LMRs from the intestinal epithelium, skin, blood, and brain. Tissue specificity was calculated as z-scores by comparing the methylated fraction for each UMR or LMR in one tissue against the average of all other tissues.
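One way to express this tissue-specificity score is sketched below. The text does not give the exact formula, so the sketch assumes z = (value in the tissue of interest - mean of the other tissues) / sample standard deviation of the other tissues; the function name and methylation values are hypothetical.

```python
from statistics import mean, stdev


def tissue_specificity_z(fractions, tissue):
    """z-score of one tissue's methylated fraction against all other tissues.

    Assumption: the spread is the sample standard deviation of the other
    tissues. Returns None when that spread is zero or cannot be estimated.
    """
    others = [v for name, v in fractions.items() if name != tissue]
    if len(others) < 2:
        return None
    spread = stdev(others)
    if spread == 0:
        return None
    return (fractions[tissue] - mean(others)) / spread


if __name__ == "__main__":
    # Illustrative methylated fractions for one LMR in four tissues.
    lmr = {"intestine": 0.1, "skin": 0.8, "blood": 0.9, "brain": 0.7}
    print(tissue_specificity_z(lmr, "intestine"))
```

Under this convention a strongly negative score marks a region that is hypomethylated in the tissue of interest relative to the others.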

Linking cis-regulatory regions and gene expression. To correlate groups of enhancers with gene expression, the Regulatory Potential Score (RPS) for individual genes was calculated from a distance-based metric linking all enhancers within 50 kb of that gene using BETA. For comparative analysis of gene regulation by different groups of enhancers (Figure 6, Part B), z-scores were calculated using the mean RPS of the designated gene set and embryonic, fetal or adult enhancers. To analyze expression changes for genes linked to different groups of enhancers in PRC2-null intestinal cells (Figure 13, Part B and Figure 12, Part D), genes were assigned to adult, embryonic or fetal enhancers based on the highest regulatory potential, as defined by BETA. Enriched expression of genes assigned to each group of enhancers in wild-type or Eed -/- cells was determined using the Gene Set Enrichment Analysis approach. Enrichment scores were determined in relation to 1,000 permutations of random gene sets of similar size.

Additional data analyses and display. Scatter plots with density and contour estimates of DNA methylation levels (Figure 3, Part B, Figure 1, Part C, Figure 4, Part A, and Figure 7, Part A) and violin plots with gene expression, meCpG levels or H3K27ac signal estimates were generated in base R (version 3) or using ggplot2 package v2.2.1. Aggregate density profiles of ChIP-seq and meCpG read distributions (Figure 3, Part B right, Figure 1, Part C, and Figure 12, Part C) were generated using the SitePro package in Cistrome. Venn diagrams were created using BioVenn. Heatmaps representing ATAC-seq and ChIP-seq data for the same mark (from the same antibody) were created using signals normalized across all samples. Average signal intensity was calculated over non-overlapping 50-bp bins over the regions indicated in each panel; scales alongside the heatmaps represent the relative signal range.

Incorporation by Reference

The contents of all references, patent applications, patents, and published patent applications, as well as the Figures and the Sequence Listing, cited throughout this application are hereby incorporated by reference.

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

Table 1. Summary of genome-wide analyses

KEY RESOURCES TABLE