

Title:
METHODS OF CAPTURING CELL-FREE METHYLATED DNA AND USES OF SAME
Document Type and Number:
WIPO Patent Application WO/2017/190215
Kind Code:
A1
Abstract:
There is described herein a method of capturing cell-free methylated DNA from a sample having less than 100 ng of cell-free DNA, comprising the steps of: subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated; denaturing the sample; and capturing cell-free methylated DNA using a binder selective for methylated polynucleotides.

Inventors:
DE CARVALHO DANIEL DINIZ (CA)
SHEN SHU YI (CA)
SINGHANIA RAJAT (CA)
Application Number:
PCT/CA2017/000108
Publication Date:
November 09, 2017
Filing Date:
May 03, 2017
Assignee:
UNIV HEALTH NETWORK (CA)
International Classes:
C12Q1/68; C12N15/10; C40B30/04
Other References:
TAIWO, O. ET AL.: "Methylome analysis using MeDIP-seq with low DNA concentrations", NATURE PROTOCOLS, vol. 7, no. 4, 8 March 2012 (2012-03-08), pages 617 - 636, XP055581354, ISSN: 1750-2799, DOI: 10.1038/nprot.2012.012
ZHAO, M-T ET AL.: "Methylated DNA Immunoprecipitation and High-Throughput Sequencing (MeDIP-seq) Using Low Amounts of Genomic DNA", CELLULAR REPROGRAMMING, vol. 16, no. 3, 2014, pages 175 - 184, XP055581361, ISSN: 2152-4971, DOI: 10.1089/cell.2014.0002
LEHMANN-WERMAN, R. ET AL.: "Identification of tissue-specific cell death using methylation patterns of circulating DNA", PNAS PLUS, vol. 113, 14 March 2016 (2016-03-14), pages E1826 - E1834, XP055436315, ISSN: 1091-6490
HEYN, H. ET AL.: "DNA methylation profiling in the clinic: application and challenges", NATURE REVIEWS GENETICS, vol. 13, no. 10, 4 September 2012 (2012-09-04), pages 679 - 692, XP055436318, ISSN: 1471-0064
See also references of EP 3452615A4
Attorney, Agent or Firm:
CHIU, Jung-Kay (CA)
Claims:
CLAIMS:

1. A method of capturing cell-free methylated DNA from a sample having less than 100 ng of cell-free DNA, comprising the steps of: a. subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; b. adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated; c. denaturing the sample; and d. capturing cell-free methylated DNA using a binder selective for methylated polynucleotides.

2. The method of claim 1 further comprising the step of amplifying and subsequently sequencing the captured cell-free methylated DNA.

3. The method of claim 1, wherein the sample has less than 50 ng of cell-free DNA.

4. The method of claim 1, wherein the first amount of filler DNA comprises about 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% methylated filler DNA with remainder being unmethylated filler DNA, and preferably between 5% and 50%, between 10%-40%, or between 15%-30% methylated filler DNA.

5. The method of claim 1, wherein the first amount of filler DNA is from 20 ng to 100 ng, preferably 30 ng to 100 ng, more preferably 50 ng to 100 ng.

6. The method of claim 1, wherein the cell-free DNA from the sample and the first amount of filler DNA together comprises at least 50 ng of total DNA, preferably at least 100 ng of total DNA.

7. The method of claim 1, wherein the filler DNA is 50 bp to 800 bp long, preferably 100 bp to 600 bp long, and more preferably 200 bp to 600 bp long.

8. The method of claim 1, wherein the filler DNA is double stranded.

9. The method of claim 1, wherein the filler DNA is junk DNA.

10. The method of claim 1, wherein the filler DNA is endogenous or exogenous DNA.

11. The method of claim 10, wherein the filler DNA is non-human DNA, preferably λ DNA.

12. The method of claim 1, wherein the filler DNA has no alignment to human DNA.

13. The method of claim 1, wherein the binder is a protein comprising a Methyl-CpG-binding domain.

14. The method of claim 13, wherein the protein is an MBD2 protein.

15. The method of claim 1, wherein step (d) comprises immunoprecipitating the cell-free methylated DNA using an antibody.

16. The method of claim 15, comprising adding at least 0.05 µg of the antibody to the sample for immunoprecipitation, and preferably at least 0.16 µg.

17. The method of claim 15, wherein the antibody is 5-MeC antibody or 5-hydroxymethyl cytosine antibody.

18. The method of claim 15, further comprising the step of adding a second amount of control DNA to the sample after step (b) for confirming the immunoprecipitation reaction.

19. The method of claim 1, further comprising the step of adding a second amount of control DNA to the sample after step (b) for confirming the capture of cell-free methylated DNA.

20. Use of the method of any one of claims 1-19 for measuring a DNA methylation profile within the sample.

21. Use of the DNA methylation profile as defined in claim 20 to identify the presence of cell free DNA from cancer cells within the sample by correlating the profile with known methylation profiles of tumour tissue.

22. Use of the DNA methylation profile as defined in claim 20 for identifying tissue-of-origin of the cell-free DNA within the sample by correlating the profile with known methylation profiles of specific tissues.

23. The use of claim 21, further comprising the use of claim 22 for identifying tissue of origin of the cancer cells within the cell-free DNA within the sample.

24. The use of any one of claims 20-23 for monitoring immune therapy.

25. The use of any one of claims 20-23 for the diagnosis of autoimmune conditions.

26. The use of claim 22 for determining cell turnover in a subject from which the sample is taken.

Description:
METHODS OF CAPTURING CELL-FREE METHYLATED DNA AND USES OF SAME

REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application No. 62/331,070 filed on May 3, 2016, which is incorporated herein in its entirety.

FIELD OF THE INVENTION

The invention relates to the field of cell free DNA and, more specifically, to methods and uses of capturing cell-free methylated DNA.

BACKGROUND OF THE INVENTION

DNA methylation is a covalent modification of DNA and a stable gene regulatory mechanism that plays an important role in the chromatin architecture. In humans, DNA methylation primarily occurs at cytosine residues in CpG dinucleotides. Unlike other dinucleotides, CpGs are not evenly distributed across the genome but are instead concentrated in short CpG-rich DNA regions called CpG islands. DNA methylation can lead to gene repression by two main mechanisms: 1) recruiting methyl-binding domain proteins, which can in turn recruit histone deacetylases (HDACs) and 2) blocking the access to binding sites of transcription factors (TFs), such as c-MYC 1.

In general, the majority of the CpG sites in the genome are methylated, while most of the CpG islands remain unmethylated during normal development and in differentiated tissues 1. Despite this fact, it is possible to identify tissue-specific patterns of DNA methylation in normal primary tissues 2. Moreover, during malignant transformation, global DNA hypomethylation and focal hypermethylation at CpG islands are frequently observed 1. In fact, DNA methylation patterns have been used to stratify cancer patients into clinically relevant subgroups with prognostic value in glioblastoma 3, ependymomas 4, colorectal 5, breast 6,7, among many other cancer types.

Due to its stability and role in normal differentiation and diseases such as cancer, DNA methylation is a good biomarker that can be used to represent tumor characteristics and phenotypic states and therefore has high potential for personalized medicine. Many sample types are suitable for DNA methylation mapping and for biomarker discovery including fresh and FFPE tumor tissue, blood cells, urine, saliva, stool, among others 8. More recently, the use of circulating cell-free DNA (cfDNA) as a biomarker is gaining momentum, especially in situations where genomic distinctions exist, such as in cancer (somatic mutations) 8, transplants (donor versus recipient DNA) 10 and pregnancy (fetus versus mother DNA) 11,12. Use of DNA methylation mapping of cfDNA as a biomarker could have a significant impact, as it could allow for the identification of the tissue-of-origin and stratify cancer patients in a minimally invasive fashion. Moreover, it could enable the use of cfDNA as a biomarker in situations where genomic distinctions do not exist, such as monitoring immune response, neurodegenerative diseases or myocardial infarction, where the epigenetic aberration can be detected in the cfDNA.

Furthermore, using genome-wide DNA methylation mapping of cfDNA could overcome a critical sensitivity problem in detecting circulating tumor DNA (ctDNA) in patients with early-stage cancer with no radiographic evidence of disease. Existing ctDNA detection methods are based on sequencing mutations and have limited sensitivity in part due to the limited number of recurrent mutations available to distinguish between tumor and normal circulating cfDNA 13,14. On the other hand, genome-wide DNA methylation mapping leverages large numbers of epigenetic alterations that may be used to distinguish circulating tumor DNA (ctDNA) from normal circulating cell-free DNA (cfDNA). For example, some tumor types, such as ependymomas, can have extensive DNA methylation aberrations without any significant recurrent somatic mutations 4.

Moreover, pan-cancer data from The Cancer Genome Atlas (TCGA) shows large numbers of DMRs between tumor and normal tissues across virtually all tumor types 19. Therefore, these findings highlighted that an assay that successfully recovered cancer-specific DNA methylation alterations from ctDNA could serve as a very sensitive tool to detect, classify, and monitor malignant disease with low sequencing-associated costs.

However, genome-wide mapping of DNA methylation in cfDNA is extremely challenging due to the low amount of DNA available and to the fact that cfDNA is fragmented to less than 200 bp in length 16. This makes it impossible to perform traditional MeDIP-seq, which needs at least 50-100 ng of DNA 17, or RRBS (Reduced Representation Bisulfite Sequencing), which needs non-fragmented DNA 18. Another issue in mapping DNA methylation in cfDNA is the low abundance of the DNA of interest within the normal cfDNA 19. This makes it impractical to perform WGBS, as the cost of sequencing with enough depth to capture the low abundance DNA is prohibitive. On the other hand, a method that selectively enriches for CpG-rich features prone to methylation is likely to maximize the amount of useful information available per read, decrease the cost, and decrease the DNA losses.

SUMMARY OF INVENTION

According to one aspect, there is provided a method of capturing cell-free methylated DNA from a sample having less than 100 ng of cell-free DNA, comprising the steps of: subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated; denaturing the sample; and capturing cell-free methylated DNA using a binder selective for methylated polynucleotides.

BRIEF DESCRIPTION OF FIGURES

Embodiments of the invention may best be understood by referring to the following description and accompanying drawings. In the drawings:

Figure 1 shows that methylome analysis of cfDNA is a highly sensitive approach to enrich and detect ctDNA in low amounts of input DNA. A) Computer simulation of the probability to detect at least one epimutation as a function of the concentration of ctDNA (columns), number of DMRs being investigated (rows), and the sequencing depth (x-axis). B) Genome-wide Pearson correlation between DNA methylation signal for 1 to 100 ng of input DNA from HCT116 cell line fragmented to mimic plasma cfDNA. Each concentration has two biological replicates. C) DNA methylation profile obtained from cfMeDIP-seq from different concentrations of input DNA from HCT116 (green tracks) plus RRBS (Reduced Representation Bisulfite Sequencing) HCT116 data obtained from ENCODE (ENCSR000DFS) and WGBS (Whole-Genome Bisulfite Sequencing) HCT116 data obtained from GEO (GSM1465024). For the heatmap (RRBS track), yellow means methylated, blue means unmethylated and gray means no coverage. D-E) Serial dilution of the CRC cell line HCT116 into the Multiple Myeloma (MM) cell line MM1.S. cfMeDIP-seq was performed in pure HCT116 DNA (100% CRC), pure MM1.S DNA (100% MM) and 10%, 1%, 0.1%, 0.01%, and 0.001% CRC DNA diluted into MM DNA. All DNA was fragmented to mimic plasma cfDNA. We observed an almost perfect linear correlation (r²=0.99, p<0.0001) between the observed versus expected (D) numbers of DMRs and (E) the DNA methylation signal (in RPKM) within those DMRs. F) In the same dilution series, known somatic mutations are only detectable at 1/100 allele fraction by ultra-deep (>10,000X) targeted sequencing, above the background sequencer and polymerase error rate. Shown are the fractions of reads containing each base or an insertion/deletion at the site of each mutation in the CRC cell line. G) Frequency of ctDNA (human) as a percentage of total cfDNA (human + mice) in the plasma of mice harboring patient-derived xenograft (PDX) from two colorectal cancer patients.

Figure 2 shows the schematic representation of the cfMeDIP-seq protocol.

Figure 3 shows sequencing saturation analysis and quality controls. A) The figure shows the results of the saturation analysis from the Bioconductor package MEDIPS analyzing cfMeDIP-seq data from each replicate for each input concentration from the HCT116 DNA fragmented to mimic plasma cfDNA. B) The protocol was tested in two replicates of four starting DNA concentrations (100, 10, 5, and 1 ng) of HCT116 cell line. Specificity of the reaction was calculated using methylated and unmethylated spiked-in A. thaliana DNA. Fold enrichment ratio was calculated using genomic regions of the fragmented HCT116 DNA (primers for a methylated region, testis-specific H2B (TSH2B), and an unmethylated human DNA region, the GAPDH promoter). The horizontal dotted line indicates a fold-enrichment ratio threshold of 25. Error bars represent ± 1 s.e.m. C) CpG Enrichment Scores of the sequenced samples show a robust enrichment of CpGs within the genomic regions from the immunoprecipitated samples compared to the input control. The CpG Enrichment Score was obtained by dividing the relative frequency of CpGs of the regions by the relative frequency of CpGs of the human genome. Error bars represent ± 1 s.e.m.

Figure 4 shows quality controls from cfMeDIP-seq from serial dilution. A) Schematic representation of the CRC DNA (HCT116) dilution into MM DNA (MM1.S). B) Specificity of reaction for each dilution was calculated using methylated and unmethylated spiked-in A. thaliana DNA. C) CpG Enrichment Scores of the sequenced samples show a strong enrichment of CpGs within the genomic regions from the immunoprecipitated samples. The CpG Enrichment Score was obtained by dividing the relative frequency of CpGs of the regions by the relative frequency of CpGs in the human genome. D) The figure shows the results of the saturation analysis from each dilution point.
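The CpG Enrichment Score defined in panel C is a ratio of two relative frequencies. The following is a minimal sketch of that ratio on plain sequence strings; the function names and the "CpG count per dinucleotide position" definition of relative frequency are assumptions made for illustration, and this is not the MEDIPS implementation.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_relative_frequency(seqs) -> float:
    """CpG count divided by the number of dinucleotide positions.

    Assumption: 'relative frequency' is taken per dinucleotide position;
    the source does not spell out the denominator.
    """
    positions = sum(max(len(s) - 1, 0) for s in seqs)
    if positions == 0:
        raise ValueError("no dinucleotide positions to count")
    return sum(count_cpg(s) for s in seqs) / positions


def cpg_enrichment_score(region_seqs, reference_seqs) -> float:
    """Relative CpG frequency of the regions over that of the reference."""
    return cpg_relative_frequency(region_seqs) / cpg_relative_frequency(reference_seqs)
```

A score above 1 indicates that the captured regions are richer in CpGs than the reference they are compared against.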

Figure 5 shows that the cfMeDIP-seq method can identify thousands of differentially methylated regions on circulating cfDNA obtained from pancreatic adenocarcinoma patients. A) Experimental design. B) Volcano plot for circulating cfDNA from pancreatic cancer (cases, n=24) versus healthy donors (controls, n=24) using cfMeDIP-seq. Red dots indicate the windows that reached significance after correction for multiple tests. C) Heatmap of the 38,085 DMRs identified in the plasma DNA from healthy donors and pancreatic cancer patients. Hierarchical clustering method: Ward. D) Permutation analysis to estimate the frequency of expected versus the observed overlap between the DMRs identified in the plasma (cases versus controls) and the cancer-specific DMCs identified in the primary tumor tissue (primary tumor versus normal tissue). The box-plots represent the null distribution for the overlap. The diamonds represent the experimentally observed number of overlaps between primary tumor tissue and DNA methylation from circulating cfDNA. Red diamonds mean the observed number of overlaps is significantly more than expected by chance. Green diamonds mean that the observed number of overlaps is significantly less than expected by chance and blue diamonds are non-significant. We calculated four possible overlaps: Hypermethylated in the primary tumor tissue and hypermethylated in the circulating cfDNA (Enriched, P-value: 6.4 x 10 " "); Hypermethylated in the tumor tissue and hypomethylated in the circulating cfDNA (Depleted, P-value: 9.43 x 10^-17); Hypomethylated in the tumor tissue and hypomethylated in the circulating cfDNA (Enriched, P-value: 1.88 x 10 "2 ™); Hypomethylated in the tumor tissue and hypermethylated in the circulating cfDNA (P-value: 0.105).
E) Permutation analysis to estimate the frequency of expected versus the observed overlap between the DMRs identified in the plasma (cases versus controls) and the cancer-specific DMCs identified in the primary tumor tissue (primary tumor versus normal PBMCs).
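The permutation analysis in panels D and E compares an observed overlap with a null distribution of overlaps. A minimal sketch of one way to build such a null is given below. The null model used here (drawing random window sets of the same size as the plasma DMR set from a shared universe of windows) and all names are assumptions for illustration; the source does not specify the authors' exact procedure.

```python
import random


def overlap_permutation_test(universe, plasma_dmrs, tumor_dmrs, n_perm=1000, seed=0):
    """Empirical test of the overlap between two sets of genomic windows.

    Assumed null model: the plasma set is replaced by a random draw of the
    same size from `universe`, and its overlap with the tumor set is counted.
    """
    rng = random.Random(seed)
    universe = list(universe)
    plasma = set(plasma_dmrs)
    tumor = set(tumor_dmrs)
    observed = len(plasma & tumor)

    null = []
    for _ in range(n_perm):
        draw = rng.sample(universe, len(plasma))
        null.append(len(tumor.intersection(draw)))

    # Add-one correction so an empirical p-value is never exactly zero.
    p_enriched = (sum(1 for x in null if x >= observed) + 1) / (n_perm + 1)
    p_depleted = (sum(1 for x in null if x <= observed) + 1) / (n_perm + 1)
    return observed, null, p_enriched, p_depleted
```

In this sketch a small `p_enriched` corresponds to a red diamond (more overlap than expected by chance) and a small `p_depleted` to a green diamond.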

Figure 6 shows quality controls for cfMeDIP-seq from circulating cfDNA from pancreatic adenocarcinoma patients (cases) and healthy donors (controls). A-B) Specificity of reaction for each case (A) and each control (B) sample was calculated using methylated and unmethylated spiked-in A. thaliana DNA. Fold enrichment ratio was not calculated due to the very limited amount of DNA available. C-D) CpG Enrichment Scores of the sequenced samples show a strong enrichment of CpGs within the genomic regions from the immunoprecipitated samples.

Figure 7 shows A) PCA on the 48 plasma cfDNA methylation profiles from healthy donors and early stage pancreatic adenocarcinoma patients using the top million most variable genome-wide windows. For each window, variability was calculated using the MAD (Median Absolute Deviation) metric, which is a robust measurement that returns the median of the absolute deviations from the data's median value; in this case, the data is the RPKM values across all the 48 samples for a given window. PC1 versus PC2 (left) and PC1 versus PC3 (right) are shown. B) Percentage of variance for each principal component. C) Volcano plot for tumor versus normal LCM tissue from pancreatic adenocarcinoma patients using RRBS. Total numbers of DMCs (Differentially Methylated CpGs) identified are listed. Red dots indicate the windows that reached significance after correction for multiple tests and having absolute methylation difference (absolute delta beta) > 0.25. D) Scatter-plot showing the significance of the DNA methylation difference for each overlapping window. X-axis shows the log10 q values for the primary pancreatic adenocarcinoma tumor versus normal tissue from the RRBS data. If the region is hypermethylated in the tumor, the significance is shown on a positive scale. Hypomethylated regions are shown on a negative scale. Y-axis shows the log10 q values for the plasma cfDNA methylation from pancreatic adenocarcinoma patients versus healthy donors from the cfMeDIP-seq data. Blue dots are significant in both. Red line shows the trend line. E) Scatter-plot showing the DNA methylation difference for each overlapping window. X-axis shows the DNA methylation difference for the primary pancreatic adenocarcinoma tumor versus normal tissue from the RRBS data. Y-axis shows the DNA methylation difference for the plasma cfDNA methylation from pancreatic adenocarcinoma patients versus healthy donors from the cfMeDIP-seq data. Blue line shows the trend line. F) Volcano plot for LCM pancreatic adenocarcinoma tissue versus normal PBMCs using RRBS. Total numbers of DMCs (Differentially Methylated CpGs) identified are listed. Red dots indicate the windows that reached significance after correction for multiple tests and having absolute methylation difference (absolute delta beta) > 0.25. G) Scatter-plot showing the significance of the DNA methylation difference for each overlapping window. X-axis shows the log10 q values for the primary pancreatic adenocarcinoma tumor versus normal PBMCs from the RRBS data. If the region is hypermethylated in the tumor, the significance is shown on a positive scale. Hypomethylated regions are shown on a negative scale. Y-axis shows the log10 q values for the plasma cfDNA methylation from pancreatic adenocarcinoma patients versus healthy donors from the cfMeDIP-seq data. Blue dots are significant in both. Red line shows the trend line. H) Scatter-plot showing the DNA methylation difference for each overlapping window. X-axis shows the DNA methylation difference for the primary pancreatic adenocarcinoma tumor versus normal PBMCs from the RRBS data. Y-axis shows the DNA methylation difference for the plasma cfDNA methylation from pancreatic adenocarcinoma patients versus healthy donors from the cfMeDIP-seq data.

Figure 8 shows that the circulating cfDNA methylation profile can be used to identify transcription factor (TF) footprints and infer active transcriptional networks in the tissue-of-origin. A) Expression profile of all TFs (n=33) whose motifs were enriched (using the software HOMER 20) in the regions hypomethylated in the cfDNA from healthy donors (hypomethylated footprints in controls) across multiple human tissues. The expression data was obtained from the Genotype-Tissue Expression (GTEx) project 21. Several TFs preferentially expressed in the hematopoietic system were identified (PU.1, Fli1, STAT5B, KLF1). B) Expression profile of all TFs with hypomethylated motifs in controls (n=33) versus the expression profile of 1,000 random sets of 33 TFs in whole blood (GTEx data). C) Expression profile of all TFs (n=85) whose motifs were enriched in the regions hypomethylated in the cfDNA from pancreatic adenocarcinoma patients (hypomethylated footprints in cases). Several pancreas-specific or pancreatic cancer-associated TFs were identified. Moreover, hallmark TFs that drive molecular subtypes of pancreatic cancer were also identified. D) Expression profile of all TFs with hypomethylated motifs in cases (n=85) versus the expression profile of 1,000 random sets of 85 TFs in normal pancreas (GTEx data). E) Expression profile of all TFs with hypomethylated motifs in cases (n=85) versus the expression profile of 1,000 random sets of 85 TFs in pancreatic adenocarcinoma tissue (TCGA data).
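The window-variability metric described for Figure 7A (the median of the absolute deviations from the median of the RPKM values) can be sketched as follows. The function names and the dictionary layout of the input are hypothetical, chosen only for illustration.

```python
from statistics import median


def mad(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    center = median(values)
    return median([abs(v - center) for v in values])


def top_variable_windows(rpkm_by_window, n):
    """Return the n window identifiers with the largest MAD.

    `rpkm_by_window` is an assumed layout: window id -> list of RPKM
    values, one per sample.
    """
    ranked = sorted(rpkm_by_window, key=lambda w: mad(rpkm_by_window[w]), reverse=True)
    return ranked[:n]
```

Because it relies on medians, a single extreme sample has little influence on a window's score, which is the sense in which the metric is robust.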

Figure 9 shows % recovery of spiked-in unmethylated A. thaliana DNA after cfMeDIP-seq using 10 ng, 5 ng and 1 ng of starting cancer cell-free DNA amounts (n=3), combined with 90 ng, 95 ng and 99 ng of filler DNA respectively or no filler DNA, prior to immunoprecipitation. The filler DNA used varied in the composition of % artificially methylated to % unmethylated lambda DNA present to increase final amount prior to immunoprecipitation to 100 ng. The % recovery of spiked-in unmethylated DNA desired is <1.0%, with lower recovery resulting in higher % specificity of reaction.

Figure 10 shows % recovery of spiked-in methylated A. thaliana DNA after cfMeDIP-seq using 10 ng, 5 ng and 1 ng of starting cancer cell-free DNA amounts (n=3), combined with 90 ng, 95 ng and 99 ng of filler DNA respectively or no filler DNA, prior to immunoprecipitation. The filler DNA used varied in the composition of % artificially methylated to % unmethylated lambda DNA present to increase final amount prior to immunoprecipitation to 100 ng. Minimum % recovery of spiked-in methylated DNA desired is 20%.
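The two recovery targets stated for Figures 9 and 10 (less than 1.0% for the unmethylated spike-in, at least 20% for the methylated spike-in) can be expressed as a simple check. This is only a sketch: the function names are hypothetical, and how the recovered and input amounts are measured is left to the caller because the captions do not specify it.

```python
def percent_recovery(recovered_amount: float, input_amount: float) -> float:
    """Recovered spike-in as a percentage of the spike-in put in."""
    if input_amount <= 0:
        raise ValueError("input_amount must be positive")
    return 100.0 * recovered_amount / input_amount


def passes_spike_in_qc(unmethylated_recovery_pct: float,
                       methylated_recovery_pct: float,
                       max_unmethylated_pct: float = 1.0,
                       min_methylated_pct: float = 20.0) -> bool:
    """Apply the desired thresholds stated in the figure descriptions."""
    return (unmethylated_recovery_pct < max_unmethylated_pct
            and methylated_recovery_pct >= min_methylated_pct)
```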

DETAILED DESCRIPTION

We bioinformatically simulated mixtures with different proportions of ctDNA, from 0.001% to 10% (Figure 1A, column facets). We also simulated scenarios where the ctDNA had 1, 10, 100, 1000, or 10000 DMRs (Differentially Methylated Regions) as compared to normal cfDNA (Figure 1A, row facets). Reads were then sampled at varying sequencing depths at each locus (10X, 100X, 1000X, and 10000X) (Figure 1A, x-axis). We found an increasing probability of detecting at least 1 cancer-specific event (Figure 1A) as the number of DMRs increased, even at low abundance of cancer ctDNA and shallow coverage.

To overcome these challenges, we have developed a new method called cfMeDIP-seq (cell-free Methylated DNA Immunoprecipitation and high-throughput sequencing) to perform genome-wide DNA methylation mapping using cell-free DNA. The cfMeDIP-seq method described here was developed through the modification of an existing low input MeDIP-seq protocol 17 that is robust down to 100 ng of input DNA. However, the majority of plasma samples yield much less than 100 ng of DNA. To overcome this challenge, we added exogenous λ DNA (filler DNA) to the adapter-ligated cfDNA library in order to artificially inflate the amount of starting DNA to 100 ng (Figure 2). This minimizes the amount of non-specific binding by the antibody and also minimizes the amount of DNA lost due to binding to plasticware. The filler DNA consisted of amplicons similar in size to an adapter-ligated cfDNA library and was composed of unmethylated and in vitro methylated DNA at different methylation levels (Figure 9 and Figure 10). The addition of this filler DNA also serves a practical use, as different patients will yield different amounts of cfDNA, allowing for the normalization of input DNA amount to 100 ng. This ensures that the downstream protocol remains exactly the same for all samples regardless of the amount of available cfDNA.

According to one aspect, there is provided a method of capturing cell-free methylated DNA from a sample having less than 100 ng of cell-free DNA, comprising the steps of: a. subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; b. adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated; c. denaturing the sample; and d. capturing cell-free methylated DNA using a binder selective for methylated polynucleotides.
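The trend reported for the Figure 1A simulation (detection probability rising with the number of DMRs, even at low ctDNA fractions and shallow depth) can be illustrated with a closed-form approximation. This sketch is not the authors' simulation: it assumes every read at every DMR is drawn independently and is tumour-derived with probability equal to the ctDNA fraction, and it ignores sequencing error. The function name is hypothetical.

```python
def prob_detect_at_least_one(ctdna_fraction: float, n_dmrs: int, depth: int) -> float:
    """P(at least one tumour-derived read across all DMRs).

    Assumption: n_dmrs * depth independent reads, each tumour-derived with
    probability `ctdna_fraction`.
    """
    if not 0.0 <= ctdna_fraction <= 1.0:
        raise ValueError("ctdna_fraction must be between 0 and 1")
    total_reads = n_dmrs * depth
    return 1.0 - (1.0 - ctdna_fraction) ** total_reads


if __name__ == "__main__":
    # One row per DMR count, one column per depth, at 0.1% ctDNA.
    for n_dmrs in (1, 10, 100, 1000, 10000):
        row = [prob_detect_at_least_one(0.001, n_dmrs, d) for d in (10, 100, 1000, 10000)]
        print(n_dmrs, [round(p, 4) for p in row])
```

Under these assumptions the number of informative loci and the depth enter only through their product, which is why many DMRs at shallow depth can match a single locus sequenced very deeply.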

In some embodiments, this method further comprises the step of amplifying and subsequently sequencing the captured cell-free methylated DNA.

Various sequencing techniques are known to the person skilled in the art, such as polymerase chain reaction (PCR) followed by Sanger sequencing. Also available are next-generation sequencing (NGS) techniques, also known as high-throughput sequencing, which include various sequencing technologies including: Illumina (Solexa) sequencing, Roche 454 sequencing, Ion Torrent Proton / PGM sequencing, SOLiD sequencing. NGS allows for the sequencing of DNA and RNA much more quickly and cheaply than the previously used Sanger sequencing. In some embodiments, said sequencing is optimized for short read sequencing.

Cell-free methylated DNA is DNA that is circulating freely in the blood stream and is methylated at various known regions of the DNA. Samples, for example plasma samples, can be taken to analyze cell-free methylated DNA.

As used herein, "library preparation" includes end-repair, A-tailing, adapter ligation, or any other preparation performed on the cell free DNA to permit subsequent sequencing of DNA.

As used herein, "filler DNA" can be noncoding DNA or it can consist of amplicons. DNA samples may be denatured, for example, using sufficient heat. In some embodiments, samples have less than 50 ng of cell-free DNA.

In some embodiments, the first amount of filler DNA comprises about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% methylated filler DNA. In preferred embodiments, the first amount of filler DNA comprises about 50% methylated filler DNA.

In some embodiments, the first amount of filler DNA is from 20 ng to 100 ng. In preferred embodiments, 30 ng to 100 ng of filler DNA. In more preferred embodiments, 50 ng to 100 ng of filler DNA. When the cell-free DNA from the sample and the first amount of filler DNA are combined, they comprise at least 50 ng of total DNA, and preferably at least 100 ng of total DNA.

In some embodiments, the filler DNA is 50 bp to 800 bp long. In preferred embodiments, 100 bp to 600 bp long; and in more preferred embodiments 200 bp to 600 bp long.

The filler DNA is double stranded. For example, the filler DNA can be junk DNA. The filler DNA may also be endogenous or exogenous DNA. For example, the filler DNA is non-human DNA, and in preferred embodiments, λ DNA. As used herein, "λ DNA" refers to Enterobacteria phage λ DNA. In some embodiments, the filler DNA has no alignment to human DNA.
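The amounts above reduce to simple arithmetic: top the sample up to a target total and split the filler into methylated and unmethylated portions. A minimal sketch follows; the function name is hypothetical, and the 100 ng target and 50% methylated fraction are defaults taken from the preferred embodiments rather than fixed requirements.

```python
def filler_amounts(cfdna_ng: float,
                   target_total_ng: float = 100.0,
                   methylated_fraction: float = 0.5):
    """Return (filler_ng, methylated_ng, unmethylated_ng).

    Defaults are illustrative choices based on the preferred embodiments
    (100 ng total DNA, about 50% methylated filler).
    """
    if cfdna_ng < 0:
        raise ValueError("cfdna_ng must be non-negative")
    if not 0.0 <= methylated_fraction <= 1.0:
        raise ValueError("methylated_fraction must be between 0 and 1")
    filler_ng = max(target_total_ng - cfdna_ng, 0.0)
    methylated_ng = filler_ng * methylated_fraction
    return filler_ng, methylated_ng, filler_ng - methylated_ng
```

For example, a 10 ng cfDNA sample would receive 90 ng of filler, matching the 10 ng plus 90 ng combination tested in Figures 9 and 10.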

In some embodiments, the binder is a protein comprising a Methyl-CpG-binding domain. One such exemplary protein is MBD2 protein. As used herein, "Methyl-CpG-binding domain (MBD)" refers to certain domains of proteins and enzymes that are approximately 70 residues long and bind to DNA that contains one or more symmetrically methylated CpGs. The MBD of MeCP2, MBD1, MBD2, MBD4 and BAZ2 mediates binding to DNA, and in cases of MeCP2, MBD1 and MBD2, preferentially to methylated CpG. Human proteins MECP2, MBD1, MBD2, MBD3, and MBD4 comprise a family of nuclear proteins related by the presence in each of a methyl-CpG-binding domain (MBD). Each of these proteins, with the exception of MBD3, is capable of binding specifically to methylated DNA.

In other embodiments, the binder is an antibody and capturing cell-free methylated DNA comprises immunoprecipitating the cell-free methylated DNA using the antibody. As used herein, "immunoprecipitation" refers to a technique of precipitating an antigen (such as polypeptides and nucleotides) out of solution using an antibody that specifically binds to that particular antigen. This process can be used to isolate and concentrate a particular protein or DNA from a sample and requires that the antibody be coupled to a solid substrate at some point in the procedure. The solid substrate includes, for example, beads, such as magnetic beads. Other types of beads and solid substrates are known in the art.

One exemplary antibody is 5-MeC antibody. For the immunoprecipitation procedure, in some embodiments at least 0.05 µg of the antibody is added to the sample; while in more preferred embodiments at least 0.16 µg of the antibody is added to the sample. To confirm the immunoprecipitation reaction, in some embodiments the method described herein further comprises the step of adding a second amount of control DNA to the sample after step (b).

Another exemplary antibody is 5-hydroxymethyl cytosine antibody.

In other embodiments, the method described herein further comprises the step of adding a second amount of control DNA to the sample after step (b) for confirming the capture of cell-free methylated DNA.

As used herein, the "control" may comprise both positive and negative control, or at least a positive control.

According to a further aspect, there is provided use of the methods described herein for measuring a DNA methylation profile within the sample.

According to a further aspect, there is provided use of the methods described herein to identify the presence of cell free DNA from cancer cells wfthln the sample by correlating the profile with known methylation profiles of tumour tissue.

According to a further aspect, there is provided use of the DNA methylation profile as described herein for identifying tissue-of-orig ' m of the celWree DNA within the sample by correlating the profile with known methylation profiles of specific tissues.

In some embodiments, the use further comprising the use of described herein for Identifying tissue of origin of the cancer cells within the cell-free DNA within the sample.

According to a further aspect, there is provided the use described herein for monitoring immune therapy. According to a further aspect, there is provided the use described herein for the diagnosis of autoimmune conditions.

According to a further aspect, there is provided the use described herein for determining cell turnover in a subject from which the sample is taken.

The following examples are illustrative of various aspects of the invention, and do not limit the broad aspects of the invention as disclosed herein.

EXAMPLES

Methods

Donor Recruitment and Sample Acquisition

Pancreatic adenocarcinoma (PDAC) patient samples were obtained from the University Health Network BioBank; healthy controls were recruited through the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were collected with patient consent and with institutional approval from the Research Ethics Boards of University Health Network and Mount Sinai Hospital in Toronto, Canada.

Specimen Processing - Purified Tumor and Normal Cells

For primary PDAC samples, specimens were processed immediately following resection and representative sections were used to confirm the diagnosis. Laser capture microdissection (LCM) of freshly liquid nitrogen-frozen tissue samples was performed on a Leica LMD 7000 instrument. Briefly, frozen tissue maintained in vapor-phase liquid nitrogen was embedded in OCT cutting medium and sectioned in a cryotome into 8-µm thick sections. Sections were mounted on PEN membrane slides (Leica) and lightly stained with hematoxylin to facilitate microscopic identification of tumor areas. LCM was performed on the same day the sections were cut to minimize nucleic acid degradation.

Microdissected tumor cells were collected by gravity into the caps of sterile, RNAse-free microcentrifuge tubes. Approximately 150,000-200,000 tumor cells were collected per DNA sample and stored at -80 °C until further processing. LCM typically took 1-2 days per case to collect sufficient amounts of purified tumor cells. Qiagen Cell Lysis Buffer was used to extract genomic DNA. Matched normal, histologically reviewed reference tissue was collected for each patient from frozen duodenal or gastric mucosa by scraping unstained frozen sections on glass slides into the appropriate DNA extraction buffer.

Specimen Processing - cfDNA

EDTA and ACD plasma samples were obtained from the BioBank and from the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were stored either at -80 °C or in vapour-phase liquid nitrogen until use. Cell-free DNA was extracted from 0.5-3.5 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). The extracted DNA was quantified by Qubit prior to use.

Specimen Processing - PDX cfDNA

Human colorectal tumor tissue, obtained with patient consent from the University Health Network Biobank as approved by the Research Ethics Board at University Health Network, was digested to single cells using collagenase A. Single cells were subcutaneously injected into 4-6 week old NOD/SCID male mice. Mice were euthanized by CO2 inhalation, after which blood was collected by cardiac puncture and stored in EDTA tubes. From the collected blood samples, the plasma was isolated and stored at -80 °C. Cell-free DNA was extracted from 0.3-0.7 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). All animal work was carried out in compliance with the ethical regulations approved by the Animal Care Committee at University Health Network.

RRBS

Genomic DNA extracted from the LCM-enriched tumor and normal samples, taken from the same patients for whom the cell-free DNA had been obtained, was subjected to RRBS following the protocol from Gu et al., 2011 18 with minor modifications. Briefly, 10 ng of genomic DNA, as determined by Qubit, was digested using the restriction enzyme MspI, then subjected to end-repair, A-tailing and adapter ligation to Illumina TruSeq methylated adapters. The prepared libraries were then subjected to bisulfite conversion using the Zymo EZ DNA Methylation kit following the manufacturer's protocol, followed by gel size selection for fragments of 160 bp-300 bp. The optimal number of cycles to amplify each purified library was determined by qPCR, after which the samples were amplified using the KAPA HiFi Uracil+ Mastermix (Kapa Biosystems) and purified with AMPure beads (Beckman Coulter). The final libraries were submitted for BioAnalyzer analysis prior to sequencing at the UHN Princess Margaret Genomic Centre on an Illumina HiSeq 2000.

Preparation of exogenous Enterobacteria phage λ PCR product

Enterobacteria phage λ DNA (Thermo Fisher Scientific) was amplified using the primers indicated in Table 1, generating 6 different PCR amplicon products. The PCR reaction was carried out using KAPA HiFi Hotstart ReadyMix with the following conditions: activation of the enzyme at 95 °C for 3 min, 30 cycles of 98 °C for 20 sec, 60 °C for 15 sec and 72 °C for 30 sec, and a final extension at 72 °C for 1 min. The PCR amplicons were purified with the QIAquick PCR purification kit (Qiagen) and run on a gel to verify size and amplification. Amplicons for 1CpG, 5CpG, 10CpG, 15CpG and 20CpGL were methylated using CpG Methyltransferase (M.SssI) (Thermo Fisher Scientific) and purified with the QIAquick PCR purification kit. Methylation of the PCR amplicons was tested by digestion with the restriction enzyme HpyCH4IV (New England Biolabs Canada), and the digest was run on a gel to confirm methylation. The DNA concentration of the unmethylated (20CpGS) and methylated (1CpG, 5CpG, 10CpG, 15CpG, 20CpGL) amplicons was measured using PicoGreen prior to pooling at 50% methylated and 50% unmethylated λ PCR product.

cfMeDIP-seq

A schematic representation of the cfMeDIP-seq protocol is shown in Figure 2. Prior to cfMeDIP, the DNA samples were subjected to library preparation using the Kapa Hyper Prep Kit (Kapa Biosystems). The manufacturer's protocol was followed with some modifications. Briefly, the DNA of interest was added to a 0.2 mL PCR tube and subjected to end-repair and A-tailing. Adapter ligation followed using NEBNext adapter (from the NEBNext Multiplex Oligos for Illumina kit, New England Biolabs) at a final concentration of 0.181 µM, incubated at 20 °C for 20 min and purified with AMPure XP beads. The eluted library was digested using the USER enzyme (New England Biolabs Canada) followed by purification with the Qiagen MinElute PCR Purification Kit prior to MeDIP.

The prepared libraries were combined with the pooled methylated/unmethylated λ PCR product to a final DNA amount of 100 ng and subjected to MeDIP using the protocol from Taiwo et al. 2012 17 with some modifications. For MeDIP, the Diagenode MagMeDIP kit (Cat# C02010021) was used following the manufacturer's protocol with some modifications. After the addition of 0.3 ng of the control methylated and 0.3 ng of the control unmethylated A. thaliana DNA, the filler DNA (to bring the total amount of DNA [cfDNA + Filler + Controls] to 100 ng) and the buffers to the PCR tubes containing the adapter-ligated DNA, the samples were heated to 95 °C for 10 min, then immediately placed into an ice water bath for 10 min. Each sample was partitioned into two 0.2 mL PCR tubes: one for the 10% input control and the other for the sample to be subjected to immunoprecipitation. The included 5-mC monoclonal antibody 33D3 (Cat# C15200081) from the MagMeDIP kit was diluted 1:15 prior to generating the diluted antibody mix and added to the sample. Washed magnetic beads (following manufacturer instructions) were also added prior to incubation at 4 °C for 17 hours. The samples were purified using the Diagenode iPure Kit and eluted in 50 µl of Buffer C. The success of the reaction (QC1) was validated through qPCR to detect the presence of the spiked-in A. thaliana DNA, ensuring a % recovery of unmethylated spiked-in DNA <1% and a % specificity of the reaction >99% (as calculated by 1 - [recovery of spiked-in unmethylated control DNA over recovery of spiked-in methylated control DNA]), prior to proceeding to the next step. The optimal number of cycles to amplify each library was determined by qPCR, after which the samples were amplified using the KAPA HiFi Hotstart Mastermix and the NEBNext multiplex oligos added to a final concentration of 0.3 µM.
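
By way of illustration only, the QC1 check described above can be sketched in Python. The specificity formula and the two thresholds (<1% and >99%) are taken from the text. The conversion from qPCR Ct values to % recovery is an assumption (the common percent-of-input convention with ideal amplification efficiency), and the function names are hypothetical.

```python
import math


def percent_recovery(ct_input, ct_ip, input_fraction=0.10):
    """Percent of input recovered in the IP, from qPCR Ct values.

    Assumption: percent-of-input convention with 100% PCR efficiency.
    input_fraction=0.10 reflects the 10% input control in the text.
    """
    # Shift the input Ct to the value a 100% input would have produced.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def specificity(recovery_unmethylated, recovery_methylated):
    """Specificity as defined in the text: 1 - (unmethylated / methylated recovery)."""
    if recovery_methylated <= 0:
        raise ValueError("methylated control recovery must be positive")
    return 1.0 - recovery_unmethylated / recovery_methylated


def passes_qc1(recovery_unmethylated, recovery_methylated):
    """QC1 from the text: unmethylated recovery < 1% and specificity > 99%."""
    spec = specificity(recovery_unmethylated, recovery_methylated)
    return recovery_unmethylated < 1.0 and spec > 0.99
```

For example, with hypothetical recoveries of 0.1% for the unmethylated control and 50% for the methylated control, `specificity(0.1, 50.0)` is about 0.998 and `passes_qc1(0.1, 50.0)` returns `True`.
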
The PCR settings used to amplify the libraries were as follows: activation at 95 °C for 3 min, followed by the predetermined number of cycles of 98 °C for 20 sec, 65 °C for 15 sec and 72 °C for 30 sec, and a final extension at 72 °C for 1 min. The amplified libraries were purified using a MinElute PCR purification column and then gel size selected with a 3% NuSieve GTG agarose gel to remove any adapter dimers. Prior to submission for sequencing, the fold enrichment of a methylated human DNA region (testis-specific H2B, TSH2B) and an unmethylated human DNA region (GAPDH promoter) was determined for the MeDIP-seq and cfMeDIP-seq libraries generated from the HCT116 cell line DNA sheared to mimic cell-free DNA (cell line obtained from ATCC, mycoplasma free). The final libraries were submitted for BioAnalyzer analysis prior to sequencing at the UHN Princess Margaret Genomic Centre on an Illumina HiSeq 2000.

Different % of methylation in the filler DNA

cfMeDIP-seq was performed using different % of methylated to unmethylated lambda DNA in the filler component of the protocol as follows:

As shown in Figures 9 and 10, the filler DNA (lambda DNA) used to increase the final amount of DNA prior to immunoprecipitation to 100 ng should preferably have some artificially methylated DNA in its composition (from 100% to 15%) in order to minimize recovery of unmethylated DNA (Figure 9), while still giving a good yield in terms of recovery of methylated DNA (Figure 10). In the samples with 100% unmethylated filler DNA or no filler DNA, although there is higher recovery of methylated DNA, there is also a high % recovery of unmethylated DNA. This shows that the additional methylated DNA in the filler DNA helps to occupy the excess antibody present in the reaction, minimizing the amount of unspecific binding to unmethylated DNA found in the sample. Optimizing antibody amounts is not economical, or even feasible, where different cell-free DNA samples are used, as it is unknown how much methylated DNA is present in a sample and this could differ drastically from sample to sample. The filler DNA therefore helps normalize the different starting amounts and allows different cell-free DNA samples to be processed the same way (i.e. using the same amount of antibody), while still recovering good methylation data from them.

Ultra-deep targeted sequencing for point mutation detection

We used the QIAgen Circulating Nucleic Acid kit to isolate cell-free DNA from ~20 mL of plasma (4-5 × 10 mL EDTA blood tubes) from patients with matched tumor tissue molecular profiling data generated prior to enrolment in early phase clinical trials at the Princess Margaret Cancer Centre. DNA was extracted from cell lines (dilution of CRC and MM cell lines) using the Puregene Gentra kit, fragmented to ~180 bp using a Covaris sonicator, and larger size fragments were excluded using AMPure beads to mimic the fragment size of cell-free DNA. DNA sequencing libraries were constructed from 83 ng of fragmented DNA using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, MA) utilizing NEXTflex-96 DNA Barcode adapters (Bioo Scientific, Austin, TX). To isolate DNA fragments containing known mutations, we designed biotinylated DNA capture probes (xGen Lockdown Custom Probes Mini Pool, Integrated DNA Technologies, Coralville, IA) targeting mutation hotspots from 48 genes tested by the clinical laboratory using the Illumina TruSeq Amplicon Cancer Panel. The barcoded libraries were pooled and then applied to the custom hybrid capture library following the manufacturer's instructions (IDT xGen Lockdown protocol version 2.1). These fragments were sequenced to >10,000X read coverage using an Illumina HiSeq 2000 instrument. Resulting reads were aligned using bwa-mem and mutations were detected using samtools and muTect version 1.1.4.

Modelling relationships between number of tumor-specific features and probability of detection by sequencing depth

We created 145,000 simulated genomes, with the proportion of cancer-specific methylated DMRs set to 0.001%, 0.01%, 0.1%, 1%, and 10% and consisting of 1, 10, 100, 1000 and 10000 independent DMRs respectively. We sampled 14,500 diploid genomes (representing 100 ng of DNA) from these original mixtures and further sampled 10, 100, 1000, and 10000 reads per locus to represent sequencing coverage at those depths. This process was repeated 100 times for each combination of coverage, abundance, and number of features. We estimated the frequency of successful detection of at least 1 DMR for each combination of parameters and plotted probability curves (Figure 1A) to visually evaluate the influence of the number of features on the probability of successful detection conditional on sequencing depth.

Calculation and Visualization of Differentially Methylated Regions from cfDNA of Pancreatic Cancer patients and Healthy Donors

Differentially Methylated Regions (DMRs) between cfDNA samples from 24 Pancreatic Cancer (PC) patients and 24 Healthy Donors were calculated using the MEDIPS R package 21. For each sample, the BAM alignment files (aligned to human genome hg19) were used to create MEDIPS R objects. Next, DMRs were calculated by comparing the RPKMs from the two sets of samples using t-tests. The raw p-values from the t-tests were adjusted using the Benjamini-Hochberg procedure. DMRs were then defined as all the windows with adjusted p-values less than 0.1; 38,085 total DMRs were found: 6,651 Hyper in Pancreatic Cancer patients and 31,544 Hypo. The scaled RPKM values from these DMRs were presented as a heatmap (Figure 5C). This heatmap was made with the distance function "euclidean", and the clustering function "ward" for column-wise clustering and "average" for row-wise clustering.
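
By way of illustration only, the multiple-testing step above can be sketched in Python. The analysis itself was carried out with the MEDIPS R package; the code below is not that implementation. It shows only the Benjamini-Hochberg adjustment applied to a list of per-window raw p-values, followed by the adjusted p-value cutoff of 0.1 stated in the text. The function names are hypothetical and the t-test step is assumed to have been run already.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmr_windows(window_p_values, threshold=0.1):
    """Indices of windows whose adjusted p-value is below the threshold."""
    adjusted = benjamini_hochberg(window_p_values)
    return [i for i, q in enumerate(adjusted) if q < threshold]
```

In this sketch a window is called a DMR purely on its adjusted p-value; assigning Hyper or Hypo status would additionally require the sign of the difference in mean RPKM between the two groups.
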

Comparison of RRBS samples from 24 Pancreatic Cancer tissues and 5 Normal PBMCs

Five normal PBMC samples profiled by RRBS were downloaded from GEO (all control samples under Accession ID GSE89473) to compare their methylation profiles to those of 24 Pancreatic Cancer tissue RRBS samples. Downloaded bed files were parsed and processed with the R methylKit package 2 *. These five samples were next compared to similarly processed RRBS samples from 24 Pancreatic Cancer patients. Custom functions were used to extract CpGs that were present in at least 18 of the 24 PC samples and 4 of the 5 PBMC samples, and only the CpGs in autosomes were retained, to yield a Background set of 1,806,808 CpGs. From these, DMCs were obtained using the criteria of Benjamini-Hochberg adjusted p-value < 0.01 and Delta Beta > 0.25, and 134,021 DMCs were found to be Hyper in Pancreatic Cancer compared to PBMCs. Analogously, using the same q-value cutoff and Delta Beta < -0.25, we obtained 179,662 Hypo DMCs. The total of 313,683 DMCs are represented by the red points in the corresponding volcano plot (Figure 7F), in which the negative log10 of the q-values are plotted against the Delta Betas (the horizontal line at negative log10 q-value = 2 represents the q-value cutoff for calling DMCs, and the dotted vertical lines represent the Delta Beta cutoffs).

Assessment of overlap of differential methylation signals from Primary Tumors vs Normal PBMCs and from cfDNA of Pancreatic Cancer patients and Healthy Donors

Permutation analysis was carried out to compare the frequency of expected versus observed overlap between the DMRs identified in the plasma (with circulating cfDNA subjected to our cfMeDIP-seq protocol) and the cancer-specific DMCs identified in the primary tumor tissue (with RRBS). We examined four possible cases: Hyper DMCs overlapping with Hyper DMRs, Hyper DMCs with Hypo DMRs, Hypo DMCs with Hypo DMRs, and finally, Hypo DMCs with Hyper DMRs. For each case, the Hyper or Hypo DMCs were overlapped with the Hyper or Hypo DMRs to get the number of "biological intersections"; each set of DMCs was then randomly shuffled across the Background set of 1,808,808 CpGs 1000 times, and overlapped again with each set of the DMRs. These random and biological intersections were put on the same scale using Z-scores and are shown with boxplots and diamonds, respectively (Figure 5E). The dashed horizontal lines in these plots represent the cutoff Z-scores associated with a Bonferroni adjustment-derived q-value of 0.05.
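
By way of illustration only, the permutation analysis above can be sketched in Python. The sketch makes simplifying assumptions that are not in the source: CpGs are represented by opaque identifiers, the DMRs are represented by the set of Background CpGs that fall inside them, and shuffling is implemented as drawing the same number of CpGs from the Background without replacement. All names are hypothetical.

```python
import random
import statistics


def permutation_z_score(dmcs, cpgs_in_dmrs, background,
                        n_permutations=1000, seed=0):
    """Z-score of the observed DMC/DMR overlap against shuffled DMC sets.

    dmcs:         CpG identifiers called as DMCs in the tumor tissue
    cpgs_in_dmrs: set of Background CpG identifiers lying inside plasma DMRs
    background:   all CpG identifiers eligible for shuffling
    """
    rng = random.Random(seed)
    background = list(background)
    dmc_set = set(dmcs)
    # "Biological intersection": DMCs that fall inside DMRs.
    observed = len(dmc_set & cpgs_in_dmrs)
    null_counts = []
    for _ in range(n_permutations):
        # Shuffle: draw as many CpGs as there are DMCs from the Background.
        shuffled = rng.sample(background, len(dmc_set))
        null_counts.append(sum(1 for cpg in shuffled if cpg in cpgs_in_dmrs))
    mean = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return observed, (observed - mean) / sd
```

The returned Z-score would then be compared with the cutoff Z-score for the chosen multiple-testing threshold, once for each of the four Hyper/Hypo combinations.
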

Comparison of RRBS samples from 24 Pancreatic Cancer tissues and 24 Normal tissues & Assessment of overlap of differential methylation signal from these tissues and from cfDNA of Pancreatic Cancer patients and Healthy Donors

The 24 PC samples that were compared to 5 Normal PBMC samples were also compared separately to 24 normal tissues from the same patients. The Background set (783,874 CpGs) and DMCs Hyper & Hypo in PC (34,013 & 11,160 respectively) were calculated using the same methodology, and these were used to construct a volcano plot (Figure 7C) & boxplots (Figure 5D) in the same manner as well.

PCA plots on 24 PC and 24 Healthy cfDNA samples

We performed unsupervised clustering analysis with PCA (Figure 7A-B) on the 24 PC and 24 Healthy cfDNA samples using the top million most variable genome-wide windows. For each window, variability was calculated using the MAD (Median Absolute Deviation) metric. This is a robust measurement that returns the median of the absolute deviations from the data's median value, where the data are the RPKM values across these 48 samples for a given window.

Heatmaps with GTEx Expression Profiles of TFs associated with motifs hypomethylated in 24 PC and 24 Healthy cfDNA samples

RNA-Seq data was obtained from the GTEx database in the form of median RPKMs by tissue for all human genes (obtained from file GTEx_Analysis_v6p_RNA-seq_RNA-SeQCv1.1.8_gene_median_rpkm.gct.gz under https://gtexportal.org/home/datasets). TFs of interest were matched to their gene names, and heatmaps (Figure 8A, 8C) were constructed with the median RPKMs of each TF scaled across all tissues. The distance function "manhattan" and clustering function "average" were used for both row-wise and column-wise clustering.

Violin Plots with GTEx Expression Profiles of TFs associated with motifs hypomethylated in 24 PC and 24 Healthy cfDNA samples

In order to estimate whether the TFs for which we detected significantly enriched motifs in hypomethylated regions in cases versus controls were significantly upregulated in pancreatic cancer samples, we used a randomisation test with the ssGSEA score as the test statistic. For each sample, we computed the scores using the 85 TFs found significantly associated with hypomethylated motifs, and 1,000 random sets of 85 TFs (the list of all human TFs was obtained from file TFCheckpoint_download_180515.txt under http://www.tfcheckpoint.org/data/); expression levels from 178 pancreatic adenocarcinoma patients in TCGA were used.

The distribution of these scores can be seen in the associated violin plots (Figure 8E).

A Wilcoxon's Rank Sum test was then used to compare the random distribution versus the observed distribution, yielding a p-value < 2.2e-16.
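
By way of illustration only, the comparison of the observed and random score distributions can be sketched in Python. The sketch implements a two-sided rank-sum test using the normal approximation, without tie or continuity corrections in the variance, so its p-values may differ slightly from those of a statistical package. The ssGSEA scores themselves are assumed to have been computed elsewhere, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(observed_scores, random_scores):
    """Return (U, z, two-sided p) for observed versus random scores."""
    n1, n2 = len(observed_scores), len(random_scores)
    ranks = average_ranks(list(observed_scores) + list(random_scores))
    # U statistic for the observed group.
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p_two_sided
```

A positive z in this sketch indicates that the observed scores tend to rank above the scores from the random TF sets.
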

The same analysis was done on the GTEx data with normal pancreas (Figure 8D). The analysis was also repeated with TFs (n=33) whose motifs were identified as hypomethylated footprints in the plasma cfDNA from healthy donors, on the GTEx data with whole blood (Figure 8B).

Results/Discussion

A genome-wide method suitable for cfDNA methylation mapping

The cfMeDIP-seq method described here was developed through the modification of an existing low input MeDIP-seq protocol 17 that is robust down to 100 ng of input DNA. However, the majority of plasma samples yield much less than 100 ng of DNA. To overcome this challenge, we added exogenous λ DNA (filler DNA) to the adapter-ligated cfDNA library in order to artificially inflate the amount of starting DNA to 100 ng (Figure 2). This minimizes the amount of non-specific binding by the antibody and also minimizes the amount of DNA lost due to binding to plasticware. The filler DNA consisted of amplicons similar in size to an adapter-ligated cfDNA library and was composed of unmethylated and in vitro methylated DNA at different CpG densities. The addition of this filler DNA also serves a practical use: as different patients will yield different amounts of cfDNA, it allows the input DNA amount to be normalized to 100 ng. This ensures that the downstream protocol remains exactly the same for all samples regardless of the amount of available cfDNA.
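
By way of illustration only, the normalization of input DNA to 100 ng can be sketched in Python. The 100 ng total, the 0.3 ng of each spiked-in control and the 50% methylated / 50% unmethylated filler pool are taken from the Methods; the function name and the returned field names are hypothetical.

```python
def filler_plan(cfdna_ng, control_ng_each=0.3, total_ng=100.0,
                methylated_fraction=0.5):
    """Amount of filler DNA needed so that cfDNA + filler + controls = total_ng."""
    # One methylated and one unmethylated spiked-in control.
    controls_ng = 2 * control_ng_each
    filler_ng = total_ng - cfdna_ng - controls_ng
    if cfdna_ng < 0 or filler_ng < 0:
        raise ValueError("cfDNA amount must be between 0 and total minus controls")
    return {
        "filler_ng": filler_ng,
        "methylated_filler_ng": filler_ng * methylated_fraction,
        "unmethylated_filler_ng": filler_ng * (1.0 - methylated_fraction),
    }
```

Under these assumptions a 10 ng cfDNA library would be topped up with about 89.4 ng of filler, and a 1 ng library with about 98.4 ng, so that every sample enters the immunoprecipitation at the same total DNA amount.
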

We first validated the cfMeDIP-seq protocol using DNA from human colorectal cancer cell line HCT116, sheared to a fragment size similar to that observed in cfDNA. HCT116 was chosen because of the availability of public DNA methylation data. We simultaneously performed the gold standard MeDIP-seq protocol 17 using 100 ng of sheared cell line DNA and the cfMeDIP-seq protocol using 10 ng, 5 ng, and 1 ng of the same sheared cell line DNA. This was performed in two biological replicates. For all the conditions, we obtained more than 98% specificity of the reaction (1 - [recovery of spiked-in unmethylated control DNA over recovery of spiked-in methylated control DNA]), and a very high enrichment of a known methylated region over an unmethylated region (TSH2B and GAPDH, respectively) (Figure 3B).

The libraries were sequenced to saturation (Figure 3A) at around 30 to 70 million reads per library (Table 2). The raw reads were aligned to both the human genome and the λ genome, and virtually no alignment was found to the λ genome (Table 3A and 3B). Therefore, the addition of the exogenous λ DNA as filler DNA did not interfere with the generation of sequencing data. Finally, we calculated the CpG Enrichment Score as a quality control measure for the immunoprecipitation step 36. All the libraries showed similar enrichment for CpGs while the input control, as expected, showed no enrichment (Figure 3C), validating our immunoprecipitations even at extremely low inputs (1 ng).

Genome-wide correlation estimates comparing different input DNA levels show that both MeDIP-seq (100 ng) and cfMeDIP-seq (10, 5, and 1 ng) methods were very robust, with Pearson correlation of at least 0.94 between any two biological replicates (Figure 1B). The analysis also demonstrates that cfMeDIP-seq at 5 and 10 ng of input DNA can robustly recapitulate the methylation profile obtained by traditional MeDIP-seq at 100 ng (pairwise Pearson correlation of at least 0.9) (Figure 1B). The performance of cfMeDIP-seq at 1 ng of input DNA is reduced compared to MeDIP-seq at 100 ng but still shows a strong Pearson correlation at >0.7 (Figure 1B). We also observed that the cfMeDIP-seq protocol recapitulates the DNA methylation profile of HCT116 obtained using gold standard RRBS (Reduced Representation Bisulfite Sequencing) and WGBS (Whole-Genome Bisulfite Sequencing) (Figure 1C). Altogether, our data suggest that cfMeDIP-seq is a robust protocol for genome-wide methylation mapping of fragmented and low input DNA material, such as circulating cfDNA.

cfMeDIP-seq displays high sensitivity for detection of tumor-derived ctDNA

To evaluate the sensitivity of the cfMeDIP-seq protocol, we performed a serial dilution of Colorectal Cancer (CRC) HCT116 cell line DNA into Multiple Myeloma (MM) MM1.S cell line DNA, both sheared to mimic cfDNA sizes. We diluted the CRC DNA from 100%, 10%, 1%, 0.1%, 0.01%, 0.001%, to 0% and performed cfMeDIP-seq on each of these dilutions (Figure 4A-D). We also performed ultra-deep (10,000X median coverage) targeted sequencing for detection of three point mutations in the same samples. The observed number of DMRs identified at each CRC dilution point versus the pure MM DNA using a 5% False Discovery Rate (FDR) threshold was almost perfectly linear (r² = 0.99, p<0.0001) with the expected number of DMRs based on the dilution factor (Figure 1D) down to a 0.001% dilution. Moreover, the DNA methylation signal within these DMRs also shows almost perfect linearity (r² = 0.99, p<0.0001) between the observed versus expected signal (Figure 1E). In comparison, beyond the 1% dilution, ultra-deep targeted sequencing could not reliably distinguish between the CRC-specific variants and the spurious variants due to PCR or sequencing errors (Figure 1F). Thus, cfMeDIP-seq displays excellent sensitivity for the detection of cancer-derived DNA, exceeding the performance of variant detection by ultra-deep targeted sequencing using a standard protocol.

Cancer DNA is frequently hypermethylated at CpG-rich regions 1. Since cfMeDIP-seq specifically targets methylated CpG-rich sequences, we hypothesized that ctDNA would be preferentially enriched during the immunoprecipitation procedure. To test this, we generated patient-derived xenografts (PDXs) from two colorectal cancer patients and collected the mouse plasma. Tumor-derived human cfDNA was present at less than 1% frequency within the total cfDNA pool in the input samples and at 2-fold greater abundance following immunoprecipitation (Figure 1G). These results suggest that through biased sequencing of ctDNA, the cfMeDIP procedure could further increase ctDNA detection sensitivity.

Methylome analysis of plasma cfDNA distinguishes early stage pancreatic adenocarcinoma patients from healthy donors

We sought to investigate whether methylome analysis of plasma cfDNA could be used to detect ctDNA in early stage cancer. We performed the methylome analysis in the pre-surgery plasma of 24 early stage pancreatic cancer patients (cases) and 24 age- and sex-matched healthy donors (controls) (Tables 4A, 4B and 5). For each patient, laser-capture microdissected (LCM) tumor samples with high tumor purity and normal tissue samples were examined. cfMeDIP-seq was performed on the circulating cfDNA and RRBS on the tumor and normal tissues (Figure 5A and Figure 6, Tables 6A and 6B). Using a t-test and Benjamini-Hochberg correction for multiple testing, we obtained 38,085 DMRs (p<0.01, q<0.1) between the cases and controls cfDNA (Figure 5B-C).

In order to evaluate whether the differences in the cfDNA methylation profiles between cases and controls were due to the presence of ctDNA, the DNA methylation patterns of the primary tumors and normal tissue, obtained from the same patients after surgical resection, were mapped using RRBS. We identified 45,173 differentially methylated CpGs (DMCs) between tumors (n=24) versus normal (n=24) tissues (Figure 7A-C).

The utility of ctDNA methylation profiles in recapitulating methylation profiles of their original tumor was tested by examining combinations of DMCs in tumors and DMRs in cfDNA (hypermethylated in both, hypomethylated in both, hypermethylated in one and hypomethylated in the other) for enrichment relative to the background. We observed significant enrichment for tumor-specific hypermethylated and hypomethylated sites in the concordant direction in cfDNA, while tumor-specific hypermethylated sites were under-represented in cfDNA hypomethylated DMRs (Figure 5D). Indeed, there is a correlation between the DNA methylation status for a given region in the tumor and the methylation profile in the plasma cfDNA (Figure 7D-E).

Finally, since the majority of the plasma cfDNA molecules in cancer patients, especially at early stage, are non-tumor-derived and likely released from blood cells 14, we evaluated the DNA methylation differences between the pancreatic adenocarcinoma tumor tissue and normal Peripheral Blood Mononuclear Cells (PBMCs). We identified 313,683 DMCs between tumors (n=24) versus PBMCs (n=5) (Figure 7F). We observed significant enrichment for tumor-specific hypermethylated and hypomethylated sites in the concordant direction in cfDNA, while tumor-specific hypermethylated sites were under-represented in cfDNA hypomethylated DMRs (Figure 5E). Again, there is a correlation between the DNA methylation status for a given region in the tumor and the methylation profile in the plasma cfDNA (Figure 7G-H). Altogether, these results suggest that the difference in the circulating cfDNA methylation profile between cases and controls was largely due to the presence of tumor-derived DNA in the circulating system (Figure 6D-E and Figure 7C-H).

Plasma cfDNA methylomes permit inference of tumor-associated active transcription factor networks

Since the DMRs between cases and controls were highly enriched for tumor-derived DMRs (Figure 5D-E), we hypothesized that cfDNA methylomes would reveal enrichment for motifs related to tumor-specific or tissue-related active transcription factors. These cfDNA methylomes could be used to infer active transcriptional networks in the tissue-of-origin of these DNA molecules. To infer the active transcriptional networks, we investigated whether the DMRs in cfDNA could uncover enrichment for transcription factor (TF) footprints, as the majority of TFs display variable binding based on DNA-methylation states of target sequences 21. Motif analysis was carried out with the HOMER software 30 on the hypomethylated DMRs 30, separately for healthy donors (Figure 8A) and pancreatic cancer patients (Figure 8C), to uncover potential TF footprints.

We identified 33 motifs as hypomethylated footprints in the healthy donors as compared to the pancreatic adenocarcinoma cases and 85 motifs as hypomethylated footprints in the pancreatic adenocarcinoma cases as compared to the healthy donors.

Out of the 33 motifs identified as hypomethylated footprints in the healthy donors, we identified several TFs preferentially expressed in the hematopoietic lineage, including PU.1, Fli1, STAT5B, and KLF1 (Figure 8A-B). Similarly, out of the 85 motifs identified as hypomethylated footprints in the pancreatic adenocarcinoma cases, we identified several TFs preferentially expressed in the pancreas, including RBPJL, PTF1a, Onecut1 (HNF6), and NR5A2 (Figure 8C-D). The TFs whose motifs were identified as hypomethylated footprints in the pancreatic adenocarcinoma cases were also frequently overexpressed in pancreatic adenocarcinoma patients from TCGA (Figure 8E). Furthermore, we were able to identify several hypomethylated footprints in the pancreatic adenocarcinoma cases that correspond to TFs previously identified as drivers of each molecular subtype of pancreatic cancer 24. These included c-MYC and HIF1a (squamous subtype drivers), NR5A2, MAFA, RBPJL, and NEUROD1 (ADEX drivers) and finally FOXA2 and HNF4A (pancreatic progenitor subtype).

Altogether, these results suggest that methylome analysis of circulating cfDNA can be used to infer active transcriptional networks within the tumor based on the differentially methylated TF footprints and potentially identify systemic shifts in immune cell populations between healthy donors and cancer patients.

Here we present a novel genome-wide DNA methylation method suitable for ultra-low input and fragmented DNA, such as circulating cell-free DNA. We were able to show that cfMeDIP-seq is very robust at low levels of input DNA and allows for rapid generation of libraries. Moreover, since our method relies on the enrichment of methylated DNA, sequencing the libraries to saturation required only around 30 to 70 million reads per library, making whole genome sequencing unnecessary and significantly decreasing the associated cost. The rapid turnaround time, in addition to the relatively small cost, may allow for a quick translation of cfMeDIP-seq to a clinical setting. Moreover, since cfMeDIP-seq relies on epigenetic, rather than genomic, information, it could potentially be used to non-invasively monitor tissue damage in a broad set of non-malignant diseases. For instance, it could be used to monitor immune response to an infection or after cancer immunotherapy; it could be used to monitor heart DNA in the circulation after myocardial infarction or brain DNA during early stages of neurodegenerative diseases.

Finally, in the context of oncology, multiple cancer types have been shown to have clinically distinct subgroups. These subgroups can be stratified by different DNA methylation profiles with prognostic value in glioblastoma 3, ependymomas 4, colorectal 5, breast 6,7, and pancreatic cancer 24, among many other cancer types. Recent data suggest that pancreatic cancer patients can be stratified into four subgroups driven by several mechanisms 24: squamous, pancreatic progenitor, immunogenic, and aberrantly differentiated endocrine exocrine (ADEX). In the circulating cfDNA methylome of pancreatic cancer patients, we were able to identify the hypomethylated footprints from TFs that drive these subtypes. For instance, we identified MYC and HIF1 alpha (hypoxia-inducible factor 1-alpha), two pathways enriched in the squamous subtype 24. We were also able to identify HNF4A and FOXA2, two TFs enriched in the progenitor subtype 24. Finally, we were able to identify NR5A2, RBPJL, and MAFA, three TFs enriched in the ADEX subtype 24. This suggests that cfMeDIP-seq could also be used as a biomarker to stratify cancer patients with a minimally invasive approach.

The invention has been described with regard to specific embodiments. It will be apparent to a person skilled in the art that variations and changes may be made while keeping within the spirit and scope of the invention. Specific embodiments disclosed herein are not intended to limit the scope of protection, which should be determined solely by the claims. All publications and references disclosed herein are incorporated in their entirety by reference.

Tables

Table 1: PCR primers used to generate Enterobacteria phage λ PCR product from Taiwo et al., 2012

Table SB: Number of reads and mapping efficiency of sequenced cfMeDIP-seq libraries

Table s: Pathology of adenocarcinoma of pancreas case samples

References

1 Sharma, S., Kelly, T. K. & Jones, P. A. Epigenetics in cancer. Carcinogenesis 31, 27-36, doi:10.1093/carcin/bgp220 (2010).

2 Varley, K. E. et al. Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res 23, 555-567, doi:10.1101/gr.147942.112 (2013).

3 Sturm, D. et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 22, 425-437, doi:10.1016/j.ccr.2012.08.024 (2012).

4 Mack, S. C. et al. Epigenomic alterations define lethal CIMP-positive ependymomas of infancy. Nature 506, 445-450, doi:10.1038/nature13108 (2014).

5 Hinoue, T. et al. Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res 22, 271-282, doi:10.1101/gr.117523.110 (2012).

6 Stirzaker, C. et al. Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value. Nat Commun 6, 5899, doi:10.1038/ncomms6899 (2015).

7 Fang, F. et al. Breast cancer methylomes establish an epigenomic foundation for metastasis. Sci Transl Med 3, 75ra25, doi:10.1126/scitranslmed.3001875 (2011).

8 Mikeska, T. & Craig, J. M. DNA methylation biomarkers: cancer and beyond. Genes (Basel) 5, 821-864, doi:10.3390/genes5030821 (2014).

9 Diaz, L. A., Jr. & Bardelli, A. Liquid biopsies: genotyping circulating tumor DNA. J Clin Oncol 32, 579-586, doi:10.1200/JCO.2012.45.2011 (2014).

10 Snyder, T. M., Khush, K. K., Valantine, H. A. & Quake, S. R. Universal noninvasive detection of solid organ transplant rejection. Proc Natl Acad Sci U S A 108, 6229-6234, doi:10.1073/pnas.1013924108 (2011).

11 Chiu, R. W. et al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci U S A 105, 20458-20463, doi:10.1073/pnas.0810641105 (2008).

12 Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A 105, 16266-16271, doi:10.1073/pnas.0808319105 (2008).

13 Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med 20, 548-554, doi:10.1038/nm.3519 (2014).

14 Aravanis, A. M., Lee, M. & Klausner, R. D. Next-Generation Sequencing of Circulating Tumor DNA for Early Cancer Detection. Cell 168, 571-574, doi:10.1016/j.cell.2017.01.030 (2017).

15 Hoadley, K. A. et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158, 929-944, doi:10.1016/j.cell.2014.06.049 (2014).

16 Fleischhacker, M. & Schmidt, B. Circulating nucleic acids (CNAs) and cancer-a survey. Biochim Biophys Acta 1775, 181-232, doi:10.1016/j.bbcan.2006.10.001 (2007).

17 Taiwo, O. et al. Methylome analysis using MeDIP-seq with low DNA concentrations. Nat Protoc 7, 617-636, doi:10.1038/nprot.2012.012 (2012).

18 Gu, H. et al. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat Protoc 6, 468-481, doi:10.1038/nprot.2010.190 (2011).

19 Hung, E. C., Chiu, R. W. & Lo, Y. M. Detection of circulating fetal nucleic acids: a review of methods and applications. J Clin Pathol 62, 308-313, doi:10.1136/jcp.2007.048470 (2009).

20 Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589, doi:10.1016/j.molcel.2010.05.004 (2010).

21 Consortium, G. T. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648-660, doi:10.1126/science.1262110 (2015).

22 Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101, 6062-6067, doi:10.1073/pnas.0400782101 (2004).

23 Wu, C., Jin, X., Tsueng, G., Afrasiabi, C. & Su, A. I. BioGPS: building your own mash-up of gene annotations and expression profiles. Nucleic Acids Res 44, D313-316, doi:10.1093/nar/gkv1104 (2016).

24 Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature, doi:10.1038/nature16965 (2016).

25 Lienhard, M., Grimm, C., Morkel, M., Herwig, R. & Chavez, L. MEDIPS: genome-wide differential coverage analysis of sequencing data derived from DNA enrichment experiments. Bioinformatics 30, 284-286, doi:10.1093/bioinformatics/btt650 (2014).

26 Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13, R87, doi:10.1186/gb-2012-13-10-r87 (2012).

27 Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571-1572, doi:10.1093/bioinformatics/btr167 (2011).

28 Hu, S. et al. DNA methylation presents distinct binding sites for human transcription factors. Elife 2, e00726, doi:10.7554/eLife.00726 (2013).

29 Lui, Y. Y. et al. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin Chem 48, 421-427 (2002).

30 Snyder, M. W., Kircher, M., Hill, A. J., Daza, R. M. & Shendure, J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell 164, 57-68, doi:10.1016/j.cell.2015.11.050 (2016).