

Title:
METHYLATION MARKERS FOR CANCER
Document Type and Number:
WIPO Patent Application WO/2009/079208
Kind Code:
A2
Abstract:
The present invention is directed toward methods useful in the diagnosis of cancers such as colorectal cancer. Such methods can also be useful in determining the susceptibility to or the stage of cancer. The invention discloses CpG sites in the PER1, ZNF 145, and MN/CA9 genes that are differentially methylated in cancer. Generally, the inventive methods involve determining the extent of methylation at one or more differentially methylated CpG sites in a subject or a sample obtained from a subject and determining presence of, susceptibility to, and/or stage of cancer based on the extent of methylation at such CpG sites.

Inventors:
HARVEY JEANNE (US)
MUNNES MARC (DE)
Application Number:
PCT/US2008/085351
Publication Date:
June 25, 2009
Filing Date:
December 03, 2008
Assignee:
SIEMENS HEALTHCARE DIAGNOSTICS (US)
HARVEY JEANNE (US)
MUNNES MARC (DE)
International Classes:
C12Q1/68
Foreign References:
US20020137086A1
US20070178458A1
US5756668A
Attorney, Agent or Firm:
SIEMENS CORPORATION (170 Wood Avenue South, Iselin, NJ, US)
Claims:

What is claimed is:

1. A method comprising steps of:

(a) detecting, in a subject, the methylation state of at least one CpG site in the ZNF 145 gene; and

(b) determining, based on the detected methylation state, that the subject has or is susceptible to cancer, or that the subject has a particular stage of cancer.

2. The method of claim 1, wherein the cancer is colorectal cancer.

3. The method of claim 1, wherein the cancer is breast cancer.

4. The method of claim 1, wherein the detected methylation state in step (b) is hypermethylation.

5. The method of claim 1, wherein at least one CpG site in step (b) is from the promoter region or first exon of ZNF 145.

6. The method of claim 1, wherein at least one CpG site in step (b) is a CpG site corresponding to a position selected from the group consisting of positions 1963, 2008, 2023, 2063, 2074, 2083, 2091, 2110, 2113, 2120, and 2154 in SEQ ID NO: 2.

7. The method of claim 1, further comprising a step of obtaining a sample containing nucleic acids from the subject.

8. The method of claim 7, wherein step (b) comprises: i. treating the nucleic acids of the sample with an agent that modifies unmethylated cytosines but does not modify methylated cytosines so that a modified ZNF 145 nucleic acid is produced, and ii. detecting modification of unmethylated cytosines by the agent.

9. The method of claim 8, wherein modification by the agent leads to changes in the sequence of DNA and detecting the modification comprises sequencing the nucleic acids in one or more regions of DNA corresponding to the ZNF 145 gene.

10. The method of claim 8, further comprising a step of: amplifying the nucleic acid using oligonucleotide primers that are complementary to the modified ZNF 145 nucleic acids produced in step i. but not to unmodified ZNF 145 nucleic acids before step ii.

11. The method of claim 8, wherein the agent in step i. is sodium bisulfite.

12. A method comprising steps of:

(a) detecting, in a cell or cells, hypermethylation of at least one CpG site in the ZNF 145 gene; and

(b) determining, based on the detected hypermethylation, the tumorigenicity of the cell or cells.

13. The method of claim 12, wherein the detecting step also comprises detecting hypermethylation of at least one CpG site in a gene or genes selected from the group consisting of MN/CA9, PER1, or combinations thereof.

14. A method comprising steps of:

(a) detecting, in a subject, hypermethylation of at least one CpG site in the MN/CA9 gene; and

(b) determining, based on the detected hypermethylation, that the subject has or is susceptible to cancer, or that the subject has a particular stage of cancer.

15. The method of claim 14, wherein the cancer is colorectal cancer.

16. The method of claim 14, wherein at least one CpG site in step (b) is from the promoter region or first exon of MN/CA9.

17. The method of claim 14, wherein at least one CpG site in step (b) is a CpG site corresponding to a position selected from the group consisting of positions 1002, 1115, 1024, and 1060 in SEQ ID NO: 3.

18. The method of claim 14, wherein the step of detecting also comprises detecting hypermethylation of at least one CpG site in the PER1 gene.

19. The method of claim 14, further comprising a step of obtaining a sample containing nucleic acids from the subject.

20. The method of claim 19, wherein step (b) comprises: i. treating the nucleic acids of the sample with an agent that modifies unmethylated cytosines but does not modify methylated cytosines so that a modified MN/CA9 nucleic acid is produced, and ii. detecting modification of unmethylated cytosines by the agent.

21. The method of claim 20, wherein modification by the agent leads to changes in the sequence of DNA and detecting the modification comprises sequencing the nucleic acids in one or more regions of DNA corresponding to the MN/CA9 gene.

22. The method of claim 20, further comprising a step of: amplifying the nucleic acid using oligonucleotide primers that are complementary to the modified MN/CA9 nucleic acids produced in step i. but not to unmodified MN/CA9 nucleic acids before step ii.

23. The method of claim 20, wherein the agent in step i. is sodium bisulfite.

24. A method comprising steps of:

(a) detecting, in a subject, hypermethylation of at least one CpG site in the PER1 gene; and

(b) determining, based on the detected hypermethylation, that the subject has or is susceptible to colorectal cancer, or that the subject has a particular stage of colorectal cancer.

25. The method of claim 24, wherein at least one CpG site in step (b) is from the promoter region or first exon of PER1.

26. The method of claim 24, wherein the CpG sites analyzed include one or more CpG sites corresponding to a position selected from the group 917, 946, and 977 in SEQ ID NO: 1.

27. The method of claim 24, further comprising a step of obtaining a sample containing nucleic acids from the patient.

28. The method of claim 27, wherein step (b) comprises: i. treating the nucleic acids of the sample with an agent that modifies unmethylated cytosines but does not modify methylated cytosines so that a modified PER1 nucleic acid is produced, and ii. detecting modification of unmethylated cytosines by the agent.

29. The method of claim 28, wherein modification by the agent leads to changes in the sequence of DNA and detecting the modification comprises sequencing the nucleic acids in one or more regions of DNA corresponding to the PER1 gene.

30. The method of claim 28, further comprising a step of amplifying the nucleic acid using oligonucleotide primers that are complementary to the modified PER1 nucleic acids produced in step i. but not to unmodified PER1 nucleic acids before step ii.

31. The method of claim 28, wherein the agent in step i. is sodium bisulfite.

32. A kit useful in the diagnosis or assessment of cancer comprising: oligonucleotide primers that are designed to hybridize to nucleic acids from a region of one or more genes selected from the group consisting of PER1, ZNF 145, MN/CA9, and combinations thereof, wherein the oligonucleotide primers are complementary to nucleic acids that have been treated with an agent that modifies unmethylated cytosines to uracils but does not modify methylated cytosines.

Description:

Methylation markers for cancer

Background

[0001] Many methods for diagnosing cancer rely on genetic mutations as diagnostic markers. For example, the major Mendelian disorders involving predispositions to colorectal cancer are hereditary nonpolyposis colorectal cancer (HNPCC) and familial adenomatous polyposis (FAP). HNPCC is caused by mutations in any of the mismatch repair genes MLH1, MSH2, MSH6, PMS1, and PMS2, while FAP is caused by mutations in the adenomatous polyposis coli (APC) gene. These mutations are often used as diagnostic markers for colorectal cancer (Rowley, PT, "Inherited Susceptibility to Colorectal Cancer," Annual Review of Medicine, February 2005, Vol. 56, Pages 539-554). Similarly, mutations in the genes BRCA1 (breast cancer gene 1) and BRCA2 (breast cancer gene 2) are used as markers for breast cancer (Lux, MP et al., "Hereditary breast and ovarian cancer: review and future perspectives," J. Mol. Med. 2006 Jan;84(1):16-28).

[0002] Nevertheless, not all cases of cancer involve gene mutations. Only a fraction of cancers are hereditary. For example, hereditary factors account for only about 10% of colorectal cancers and only about 5-10% of breast cancers (Lynch, HT and de la Chapelle, A., "Hereditary Colorectal Cancer," N. Engl. J. Med. 2003 Mar 6;348(10):919-32 and Lux, MP et al., "Hereditary breast and ovarian cancer: review and future perspectives," J. Mol. Med. 2006 Jan;84(1):16-28).

Summary of the Invention

[0003] The invention encompasses the recognition that epigenetic markers can be particularly useful in diagnosing diseases including cancer. Epigenetic modifications are changes to DNA or to proteins associated with DNA that are stably inherited through cell divisions, are reversible, and do not involve changes to the coding content of the DNA itself. Examples of epigenetic modifications include DNA methylation, histone methylation, and histone deacetylation. One type of DNA methylation is also known as 5-CpG methylation and involves the addition of a methyl group to the position 5 carbon of the cytosine ring in a CpG dinucleotide. This type of DNA methylation is generally associated with gene silencing, especially when it occurs in promoter regions. 5-CpG methylation is involved in a variety of developmental processes, such as the regulation of imprinted genes, X-chromosome inactivation, and the suppression of parasitic DNA elements (Robertson, KD., and Jones, PA., "DNA methylation: past, present, and future directions," Carcinogenesis. 2000 Mar;21(3):461-7). Aberrant 5-CpG methylation, such as increased methylation (hypermethylation) or decreased methylation (hypomethylation), often occurs in cancer (Jones, PA and Laird, PW, "Cancer epigenetics comes of age," Nat. Genet. 1999 Feb;21(2):163-7). For example, the tumor suppressor gene TCF21 is hypermethylated and downregulated in lung cancer (Smith, LT, "Epigenetic regulation of the tumor suppressor gene TCF21 on 6q23-q24 in lung and head and neck cancer." Proc. Natl. Acad. Sci. U. S. A. 2006 Jan 24;103(4):982-7). Additional markers would be advantageous.

[0004] The present invention encompasses the finding that particular CpG sites in the PER1, ZNF 145, and MN/CA9 genes are differentially methylated in cancer. Provided are methods and reagents useful in the diagnosis of cancer and/or in other assessments related to cancer. Generally, such methods involve detecting a methylation state in a subject or in a sample obtained from a subject and making a determination based on the detected methylation state. The determination can be, for example, that the subject has cancer, that the subject is susceptible to cancer, and/or that the subject has a particular stage of cancer. In certain embodiments of the invention, the detected methylation state is hypermethylation. In certain embodiments of the invention, the cancer is colorectal cancer. In certain embodiments of the invention, the cancer is breast cancer. In certain embodiments of the invention, methylation states are detected by a process that involves treating nucleic acids with an agent that selectively modifies unmethylated cytosines but not methylated cytosines. Treatment with such an agent may lead to changes in the sequence of the nucleic acids. In certain embodiments of the invention, detection of modifications to unmethylated cytosines comprises sequencing the nucleic acids that have been treated with the agent. In certain embodiments of the invention, the agent is sodium bisulfite, which modifies unmethylated cytosines to uracil.
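The effect of such an agent on a sequence read can be illustrated in silico. The following is a minimal Python sketch, not part of the specification: the function name `bisulfite_convert`, the 0-based indexing, and the example sequence are all illustrative assumptions. It models an unmethylated cytosine as being read as thymine after conversion and amplification, while a methylated cytosine is read unchanged.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Simulate the read-out of one DNA strand after bisulfite treatment and PCR.

    sequence: DNA string (A, C, G, T), written 5' to 3'.
    methylated_positions: 0-based indices of cytosines assumed to be methylated
        (an illustrative convention, not one taken from the specification).
    Unmethylated C is converted to U and amplified as T; methylated C stays C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 8-base sequence with CpG cytosines at indices 1 and 5.
example = "ACGTCCGA"
fully_unmethylated = bisulfite_convert(example)
cpg_methylated = bisulfite_convert(example, methylated_positions={1, 5})
print(fully_unmethylated)  # ATGTTTGA
print(cpg_methylated)      # ACGTTCGA
```

Comparing the two outputs shows why sequencing after treatment reveals methylation: the two reads differ only at the cytosines that were protected by a methyl group.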

[0005] The present invention also provides kits useful in the diagnosis of cancer. Generally, such kits comprise oligonucleotide primers that are specific for the PER1 gene, ZNF 145 gene, MN/CA9 gene, or combinations thereof. In certain embodiments of the invention, such primers are designed to amplify and/or facilitate sequencing DNA that has been treated with an agent such as sodium bisulfite. Provided kits may also include instructions for detecting methylation at specific CpG sites in the PER1 gene, ZNF 145 gene, MN/CA9 gene, or combinations thereof. In certain embodiments of the invention, control samples are provided.

[0006] These and other objects, advantages, and features of the present invention will become apparent to those of ordinary skill in the art having read the following detailed description of the preferred embodiments.

[0007] This application refers to various patents and publications. The contents of all of these are incorporated by reference. In case of a conflict between the instant specification and one or more of the incorporated references, the specification shall control. The determination of whether a conflict exists can be made by the inventors at any time.

Brief Description of the Drawings

[0008] Figure 1A depicts the chemical structure of a CpG dinucleotide. Methylation occurs at the position 5 carbon of the cytosine ring.

[0009] Figure 1B depicts a simplified mechanism of the methylation reaction. For simplicity, only the cytosine ring is shown. The enzyme attacks the carbon at position 6 and forms a covalent bond with the cytosine ring. The intermediate attacks the methyl group of the methyl donor S-adenosyl L-methionine (AdoMet). The result is methylation at the carbon at position 5 and a demethylated version of the donor molecule, S-adenosyl L-homocysteine (AdoHcy). Finally, the enzyme is released by beta-elimination. Though not shown, the mechanism also involves protonation of the nitrogen at position 3; abstraction of the proton from the cytosine-5 position is carried out by an unidentified enzyme base or water molecule. (This mechanism was adapted from Kumar, S, et al., "The DNA (cytosine-5) methyltransferases," Nucleic Acids Res. 1994 Jan 11;22(1):1-10 and from Goll, MG and Bestor, TH, "Eukaryotic cytosine methyltransferases," Annu. Rev. Biochem. 2005;74:481-51.)

[0010] Figure 2 is a schematic diagram summarizing the methylation status at some CpG sites in the PER1 and ZNF 145 genes, as determined by sodium bisulfite sequencing. The numbers in the top rows indicate individual CpG sites and correspond to positions in SEQ ID NO: 1 and in SEQ ID NO: 2 for PER1 and ZNF 145 respectively. Open circles indicate that no methylation was detected at that CpG site. Half-filled circles indicate partial methylation (that methylation was detected in a subset of amplicons). Filled circles indicate that methylation at that site was detected in all amplicons analyzed.
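The three-way circle coding described for Figure 2 amounts to a simple rule over per-amplicon calls. The sketch below, in Python, is one possible way to express that rule; the function name `classify_site`, the boolean encoding of amplicon calls, and the site labels and values in the example are invented for illustration and are not data from the application.

```python
def classify_site(amplicon_calls):
    """Summarize one CpG site across the amplicons analyzed.

    amplicon_calls: list of booleans, one per amplicon; True means a
        methylated cytosine was detected at the site in that amplicon.
    Returns the category matching the circle coding described for Figure 2.
    """
    if not amplicon_calls:
        raise ValueError("at least one amplicon is required")
    methylated = sum(amplicon_calls)
    if methylated == 0:
        return "unmethylated"   # open circle
    if methylated == len(amplicon_calls):
        return "methylated"     # filled circle
    return "partial"            # half-filled circle


# Hypothetical calls for three made-up sites, four amplicons each.
example_calls = {
    "site_a": [False, False, False, False],
    "site_b": [True, False, True, False],
    "site_c": [True, True, True, True],
}
summary = {site: classify_site(calls) for site, calls in example_calls.items()}
print(summary)
```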

[0011] Figures 3A and 3B depict the locations of sequencing primers and of primers used in methylation-specific PCR, as well as CpG sites in a portion of the MN/CA9 gene. Shown is the expected DNA sequence after treatment by sodium bisulfite. The portion of the MN/CA9 gene that is depicted starts at approximately 1000 bp upstream of the start of exon 1 (marked with the "atg" that is underlined) and ends at approximately 1000 bp downstream of the start of exon 1. Shown are two versions of the sequence: one version assuming that all the cytosines in CpG dinucleotides are methylated (and are not converted by sodium bisulfite) (Figure 3A) and another version assuming that all the cytosines in the CpG dinucleotides are unmethylated (and are converted to uracil by sodium bisulfite and amplified as thymine in a polymerase chain reaction) (Figure 3B). CpG dinucleotides are emphasized in bold capital letters and in a larger font, and are replaced by TpG dinucleotides in the "unmethylated" version of the sequence. Rectangular boxes indicate sequencing primers. Primers used in methylation-specific PCR assays are shaded in gray. CpG sites lying within the region enclosed by the delineated sequencing primers were queried for their methylation status by sodium bisulfite sequencing. This representation is for illustration purposes only and is not meant to imply that the CpG dinucleotides within this sequence are either all methylated or all unmethylated in any given strand. For example, it is possible for some of the CpG dinucleotides within a given strand to be methylated while the rest are not.

[0012] Figure 4 depicts the results of a sodium bisulfite sequencing experiment to determine the extent of methylation at CpG sites in the MN/CA9 gene. Rectangular boxes outline cytosines within CpG sites. Samples analyzed were either from late stage colorectal tumor tissue (Dukes' C or D) or from adjacent normal tissue. Solid bold curves indicate thymine, and double curves indicate cytosine (see Figure legend). Treatment with sodium bisulfite converts unmethylated cytosines to uracil, which is subsequently amplified as thymine during polymerase chain reaction (PCR). Methylated cytosines are unchanged by sodium bisulfite treatment. The area under the curve of the cytosine trace was used as an indication of methylation, and was compared between tumor tissues and corresponding normal tissue. The CpG sites correspond to positions 1002, 1115, 1024, and 1060 in SEQ ID NO: 3. (Positions 1002, 1115, and 1024 are depicted from left to right in the top two rows of sequencing data, and position 1060 is depicted in the bottom two rows of sequencing data.)

[0013] Figure 5 depicts the results of an experiment conducted to analyze the methylation state of CpG dinucleotides in the promoter region of the MN/CA9 gene. Shown is a picture of an agarose gel under ultraviolet illumination, through which polymerase chain reaction (PCR) products stained with ethidium bromide were electrophoresed. Samples of DNA from tumor tissue (lanes 1, 3, 5, 7, 9, and 11) and from adjacent normal tissue (lanes 2, 4, 6, 8, 10, and 12) were amplified using primers specific for methylated DNA (top row) or with primers specific for unmethylated DNA (bottom row).

Definitions

[0014] The term "3'" refers to a region or position in a nucleic acid or oligonucleotide 3' (i.e., downstream) from another region or position in the same polynucleotide or oligonucleotide. The term "5'" refers to a region or position in a nucleic acid or oligonucleotide 5' (i.e., upstream) from another region or position in the same polynucleotide or oligonucleotide. The term "3' end," as used herein in reference to a nucleic acid molecule, refers to the end of the nucleic acid that contains a free hydroxyl group attached to the 3' carbon of the terminal pentose sugar. The term "5' end," as used herein in reference to a nucleic acid molecule, refers to the end of the nucleic acid molecule which contains a free hydroxyl or phosphate group attached to the 5' carbon of the terminal pentose sugar.

[0015] The term "amplification" when used to refer to nucleic acids is used herein to mean a method or process that increases the representation in a population of a specific nucleic acid sequence in a sample by producing multiple (i.e., at least 2) copies of the desired sequence. Methods for nucleic acid amplification are known in the art and include, but are not limited to, polymerase chain reaction (PCR) and ligase chain reaction (LCR). In a typical PCR amplification reaction, a nucleic acid sequence of interest is often amplified at least fifty thousand fold in amount over its amount in the starting sample. A "copy" or "amplicon" does not necessarily mean perfect sequence complementarity or identity to the template sequence. For example, copies can include nucleotide analogs such as deoxyinosine, intentional sequence alterations (such as sequence alterations introduced through a primer comprising a sequence that is hybridizable but not complementary to the template), and/or sequence errors that occur during amplification. Amplification methods (such as polymerase chain reaction or PCR) are known in the art.
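As a rough illustration of what a fifty-thousand-fold amplification implies, the Python sketch below estimates the number of PCR cycles needed under a simple exponential model in which each cycle multiplies the template by (1 + efficiency). The model, the function name `cycles_for_fold`, and the `efficiency` parameter are assumptions made for illustration; the specification does not describe amplification kinetics.

```python
import math


def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest whole number of cycles giving at least `target_fold` amplification.

    Assumes an idealized model: amount after n cycles = start * (1 + efficiency) ** n,
    where efficiency = 1.0 means perfect doubling every cycle.
    """
    if target_fold < 1:
        raise ValueError("target_fold must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return math.ceil(math.log(target_fold) / math.log(1 + efficiency))


ideal_cycles = cycles_for_fold(50_000)
print(ideal_cycles)  # 16, since 2**15 = 32768 and 2**16 = 65536
```

Real reactions fall short of perfect doubling and eventually plateau, so the figure is a lower bound under the stated model rather than a protocol recommendation.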

[0016] The terms "approximately" and "about" in reference to a number are used herein to include numbers that fall within a range of 20%, 10%, 5%, or 1% in either direction (greater than or less than) the number unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).

[0017] The abbreviation "bp" as used herein refers to base pairs, two nucleotides on opposite complementary DNA or RNA strands that are connected via hydrogen bonds.

[0018] The term "cancer" is used interchangeably with "tumor" herein. As used herein, the term "cancer" refers to or describes a physiological condition in mammals that is typically characterized by unregulated cell growth. Examples of cancers include, but are not limited to carcinoma, lymphoma, blastoma, sarcoma, and leukemia. More particularly, examples of such cancers include colorectal cancer, lung cancer, breast cancer, ovarian cancer, prostate cancer, multiple myeloma, bone cancer, liver cancer, pancreatic cancer, skin cancer, cancer of the head or neck, cutaneous or intraocular melanoma, uterine cancer, rectal cancer, cancer of the anal region, stomach cancer, uterine cancer, carcinoma of the sexual and reproductive organs, Hodgkin's Disease, cancer of the esophagus, cancer of the small intestine, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, cancer of the adrenal gland, sarcoma of soft tissue, bladder cancer, kidney cancer, renal cell carcinoma, carcinoma of the renal pelvis, neoplasms of the central nervous system (CNS), neuroectodermal cancer, spinal axis tumors, glioma, meningioma, and pituitary adenoma. In some embodiments of the invention, the cancer is colorectal cancer. In some embodiments of the invention, the cancer is breast cancer.

[0019] The term "cancer cell" is used herein to refer to a cell that undergoes undesired and/or unregulated cell growth or abnormal persistence. In some embodiments of the invention, the cell is part of an organ or tissue. In some embodiments of the invention, the cell is part of an organism. In some embodiments of the invention, the cancer cell is grown in vitro and is from a cell line that is a permanently immortalized established cell culture that will proliferate indefinitely and in an unregulated manner, given appropriate fresh medium and space.

[0020] The term "colorectal cancer," also known as colon cancer or bowel cancer, is used herein to refer to a group of cancers that includes cancerous growths in the colon, rectum and/or appendix.

[0021] The phrase "corresponding to," when used to describe positions or sites within nucleotide sequences, is used herein as it is understood in the art. As is well known in the art, two or more nucleotide sequences can be aligned using standard bioinformatic tools, including programs such as BLAST, ClustalX, Sequencher, etc. Even though the two or more sequences may not match exactly and/or do not have the same length, an alignment of the sequences can still be performed and, if desirable, a "consensus" sequence generated. Indeed, programs and algorithms used for alignments typically tolerate definable levels of differences, including insertions, deletions, inversions, polymorphisms, point mutations, etc. Such alignments can aid in the determination of which positions in one nucleotide sequence correspond to which positions in other nucleotide sequences.
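Once an alignment has been produced by a tool of the kind named above, reading off corresponding positions is a bookkeeping exercise. The following Python sketch assumes the alignment is already available as two equal-length gapped strings with "-" as the gap character; the function name `corresponding_positions`, the 1-based numbering, and the toy sequences are illustrative assumptions rather than anything defined in the specification.

```python
def corresponding_positions(aligned_ref, aligned_query, gap="-"):
    """Map 1-based reference positions to 1-based query positions.

    aligned_ref, aligned_query: the two rows of one pairwise alignment,
        equal in length, with gaps written as `gap`.
    A reference position aligned to a gap in the query maps to None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if query_char != gap:
            query_pos += 1
        if ref_char != gap:
            ref_pos += 1
            mapping[ref_pos] = query_pos if query_char != gap else None
    return mapping


# Toy alignment: the query has one inserted base and one deleted base.
mapping = corresponding_positions("AC-GTCG", "ACTGT-G")
print(mapping)  # {1: 1, 2: 2, 3: 4, 4: 5, 5: None, 6: 6}
```

In this toy case, reference position 3 corresponds to query position 4 because of the insertion, and reference position 5 has no counterpart in the query.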

[0022] The abbreviation "CpG" is used herein to refer to a dinucleotide comprised of a cytosine nucleotide (deoxycytidine) linked via a phosphate group to a guanine nucleotide (deoxyguanosine) through linkages to the 5' position of the deoxycytidine and the 3' position of the deoxyguanosine. The cytosine in this dinucleotide is said to be in the "5' position" of the dinucleotide, and the guanine is said to be in the "3' position" of the dinucleotide. The typical structure of a CpG dinucleotide is depicted in Figure 1A. As is understood by one of ordinary skill in the art, the abbreviation "CpG" also refers to modified dinucleotides similar but not identical to the structure depicted in Figure 1A, so long as the 5' nucleotide is still identifiable as deoxycytidine and the 3' nucleotide is still identifiable as deoxyguanosine. For example, a deoxycytidine-deoxyguanosine dinucleotide in which the cytosine ring is methylated at the 5 position is still considered a CpG dinucleotide, and may be referred to as a methylated CpG or abbreviated as 5mCpG. A dinucleotide similar in structure to the one depicted in Figure 1A, in which the position 4 carbon in the cytosine ring is not aminated but is a carbonyl carbon, is not a CpG dinucleotide, because the 5' nucleotide is identified as deoxythymidine rather than deoxycytidine. As used herein, the abbreviation CpG can also refer to a CpG site, defined below.

[0023] The term "CpG island" is used herein to refer to regions of DNA of at least approximately 100 base pairs in length that have a guanosine and cytosine content above 50% and a ratio of observed CpGs versus expected CpGs close to or above 0.6. In some embodiments of the invention, the CpG island is at least approximately 200 base pairs in length.
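The three criteria in this definition can be checked computationally for a candidate region. The Python sketch below is illustrative only. It assumes the commonly used observed/expected formula, (number of CpG × length) / (number of C × number of G), which the specification does not spell out, and it treats "close to or above 0.6" as a hard threshold of 0.6; the function names and default parameters are likewise assumptions.

```python
def cpg_island_stats(sequence):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count if C and G were placed independently (assumed formula).
    expected_cpg = (c_count * g_count) / length if length else 0.0
    obs_exp = cpg_count / expected_cpg if expected_cpg else 0.0
    return length, gc_fraction, obs_exp


def looks_like_cpg_island(sequence, min_length=100, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC content, and observed/expected criteria to one region."""
    length, gc_fraction, obs_exp = cpg_island_stats(sequence)
    return length >= min_length and gc_fraction > min_gc and obs_exp >= min_obs_exp


# Synthetic test regions, not real genomic sequence.
cg_rich = "CG" * 60   # 120 bp, all C and G
at_only = "AT" * 60   # 120 bp, no C or G
print(looks_like_cpg_island(cg_rich))  # True
print(looks_like_cpg_island(at_only))  # False
```

Passing `min_length=200` applies the stricter length used in some embodiments.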

[0024] The term "CpG site" is used herein to refer to a position within a region of DNA corresponding to a position where a CpG dinucleotide is found in a reference sequence. One of ordinary skill in the art will understand the term CpG site to encompass the location in the region of DNA where a CpG dinucleotide is typically found, whether or not the dinucleotide at that position is a CpG dinucleotide in a particular DNA molecule. For example, the DNA sequence of a gene may typically contain a CpG dinucleotide at a particular position, but may contain other dinucleotides at the corresponding position in mutant versions, polymorphic variants, or other variations of the gene. Some mutations such as single base substitutions may alter the identity of the dinucleotide to be something other than CpG. Other mutations such as insertions or deletions may alter the position of the CpG dinucleotide typically found at a particular site. In cases such as these, the term CpG site is understood by one of ordinary skill in the art to encompass the site corresponding to the position where a CpG dinucleotide is typically found, for example, in a wild type version of the DNA. Similarly, the term CpG site also encompasses the corresponding site in a nucleic acid that has been modified experimentally, for example by labeling, methylation, demethylation, deamination (including conversion of a cytosine to uracil by a chemical such as sodium bisulfite), etc.
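Because CpG sites are defined relative to a reference sequence, listing them is a matter of scanning that reference for the dinucleotide. The Python sketch below is a minimal illustration under two assumptions that the specification does not state: positions are counted from 1, and a site is reported at the position of its cytosine. The function name `cpg_sites` and the example sequence are hypothetical.

```python
def cpg_sites(reference):
    """Return 1-based positions of the cytosine in every CpG of a reference sequence."""
    seq = reference.upper()
    return [index + 1 for index in range(len(seq) - 1) if seq[index:index + 2] == "CG"]


# Hypothetical reference fragment with two CpG dinucleotides.
reference = "ACGTCCGA"
sites = cpg_sites(reference)
print(sites)  # [2, 6]

# A variant with a C-to-T substitution at position 2 no longer contains a CpG
# there, but position 2 is still a "CpG site" as defined against the reference.
variant = "ATGTCCGA"
print(cpg_sites(variant))  # [6]
```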

[0025] The term "diagnosis" as used herein refers to a process aimed at determining if an individual is afflicted with a disease or ailment.

[0026] The term "dinucleotide" is used herein to refer to a sequence of two nucleotides.

[0027] The term "differentially methylated," when used to describe one or more CpG sites, is used herein to refer to the state of being methylated differently depending on the type of cell, tissue, or sample from which the DNA is derived. For example, a CpG site that is hypermethylated in tumors compared to normal tissue is said to be "differentially methylated" because the methylation state of such a site is different in tumors than it is in normal tissues.

[0028] The abbreviation "DNA" is used herein to refer to deoxyribonucleic acid, a polymer of nucleotides that contains genetic information. In some embodiments of the invention, "DNA" refers to the genetic information of an organism.

[0029] The term "DNA methylation" is used herein to refer to an epigenetic modification to DNA involving the addition of a methyl group via a covalent linkage to a base of DNA. Except when otherwise indicated, the form of DNA methylation to which this term refers is that of the methylation of the cytosine at the carbon 5 position of a cytosine ring in a CpG dinucleotide. Such methylation of a CpG dinucleotide commonly occurs in mammalian DNA.

[0030] The term "epigenetic" is used herein in accordance with its common meaning in the art, to describe features and modifications to genetic material or to the proteins and other molecules that associate with genetic material. "Epigenetic" modifications are stable over rounds of cell division yet are reversible, but do not involve changes in the DNA coding content of the organism. "Epigenetic" modifications may include, for example, modifications to chromatin such as histone deacetylation and histone methylation, or DNA modifications such as DNA methylation.

[0031] The term "gene," as used herein, has its meaning as understood in the art. In general, a gene may include gene regulatory sequences (e.g., promoters, enhancers, etc.) and/or intron sequences, in addition to coding sequences (open reading frames). A gene typically encodes a gene product such as an RNA molecule or a protein that is produced by expression of the gene. It will be appreciated by those of ordinary skill in the art that definitions of "gene" include references to nucleic acids that do not encode proteins but rather encode functional RNA molecules such as tRNAs. For the purpose of clarity it is noted that, as used in the present application, the term "gene" often refers to a portion of a nucleic acid that encodes a protein, optionally encompassing regulatory sequence(s). This definition is not intended to exclude application of the term "gene" to non-protein coding expression units but rather to clarify that, in most cases, the term as used in this document refers to a nucleic acid that encodes a protein.

[0032] The term "hybridize" as used herein refers to the interaction between two complementary nucleic acid sequences. The phrase "hybridizes under high stringency conditions" describes an interaction that is sufficiently stable that it is maintained under art-recognized high stringency conditions. Guidance for performing hybridization reactions can be found, for example, in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6, 1989 (and in more recent updated editions), and in Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2001. Aqueous and nonaqueous methods are described in these references, and either can be used. Typically, for nucleic acid sequences over approximately 50-100 nucleotides in length, various levels of stringency are defined, such as: 1) low stringency (e.g., 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by two washes in 0.2X SSC, 0.1% SDS at least at 50°C (the temperature of the washes can be increased to 55°C for medium-low stringency conditions)); 2) medium stringency hybridization conditions utilize 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C; 3) high stringency hybridization conditions utilize 6X SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 65°C; and 4) very high stringency hybridization conditions are 0.5M sodium phosphate, 0.1% SDS at 65°C, followed by one or more washes at 0.2X SSC, 1% SDS at 65°C. Hybridization under high stringency conditions occurs between sequences with a very high degree of complementarity. One of ordinary skill in the art will recognize that the parameters for different degrees of stringency will generally differ based on various factors such as the length of the hybridizing sequences, whether they comprise RNA or DNA, etc. For example, appropriate temperatures for high, medium, or low stringency hybridization will generally be lower for shorter sequences such as oligonucleotides (including oligonucleotide primers used in polymerase chain reactions and/or DNA sequencing reactions) than for longer sequences.
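The dependence of hybridization temperature on oligonucleotide length can be illustrated with the Wallace rule of thumb, which estimates the melting temperature of a short oligonucleotide as 2°C per A or T plus 4°C per G or C. This rule is not taken from the specification; it is a coarse approximation generally applied only to short oligonucleotides, and the function name `wallace_tm` and the example sequences below are illustrative assumptions.

```python
def wallace_tm(oligo):
    """Estimate melting temperature (deg C) of a short oligonucleotide.

    Uses the Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).
    This is a rough approximation and ignores salt and oligo concentration.
    """
    seq = oligo.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("oligo must contain only A, C, G, and T")
    return 2 * at_count + 4 * gc_count


# Hypothetical oligonucleotides of the same composition but different length.
short_oligo = "ACGTACGT"            # 8 nt
longer_oligo = "ACGTACGTACGTACGT"   # 16 nt
print(wallace_tm(short_oligo))   # 24
print(wallace_tm(longer_oligo))  # 48
```

Under this approximation the shorter oligonucleotide has the lower estimated melting temperature, consistent with the statement that suitable hybridization temperatures are generally lower for shorter sequences.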

[0033] The term "hypermethylation" is used herein to refer to a methylation state characterized by an increase in the methylation of DNA as compared with a reference. Except when otherwise indicated, hypermethylation refers specifically to an increase in the methylation of cytosines in DNA.

[0034] The term "hypomethylation," also referred to as undermethylation, is used herein to refer to a methylation state characterized by a decrease in the methylation of DNA as compared with a reference. Except when otherwise indicated, hypomethylation refers specifically to a decrease in the methylation of cytosines in DNA.

[0035] The term "metastasis" is used herein to refer to spread of a tumor from one organ or part of an organ to another organ or to a non-adjacent part of the same organ.

[0036] The term "methylated," when used to refer to cytosines or CpG sites in nucleic acids such as DNA, is used herein to describe cytosines that are modified by covalent addition of a methyl group at the position 5 carbon of the cytosine.

[0037] The term "methylation" is used herein to refer to modification of a substrate by covalent addition of a methyl group. Except when otherwise indicated, methylation refers to the addition of a methyl group to cytosine residues at CpG sites in DNA.

[0038] The term "methylation state" is used herein to refer to the status of DNA with respect to modifications by methylation. Except when otherwise indicated, the methylation to which this term refers is methylation of cytosine residues at CpG sites in DNA. Examples of methylation states include, but are not limited to, hypermethylation, hypomethylation, and normal methylation.

[0039] The term "methyltransferase" is used herein to refer to an enzyme that transfers a methyl group to a DNA or protein such that the methyl group is covalently attached to the DNA or protein molecule. Except when otherwise indicated, the term "methyltransferase" refers specifically to enzymes that have the ability to transfer methyl groups to DNA at the position 5 carbon of a cytosine ring in a CpG dinucleotide. Examples of such DNA methyltransferases in mammals include Dnmt1 (DNA methyltransferase 1), Dnmt2, Dnmt3a, and Dnmt3b. It will be understood by one of ordinary skill in the art that some methyltransferases, for example Dnmt3a, may also transfer methyl groups to other targets, such as CpA and CpT dinucleotides.

[0040] The term "MN/CA9" is used herein to refer to a gene also known as CA9 and carbonic anhydrase IX and that is identified in the Online Mendelian Inheritance in Man (OMIM) database by the identification number 603179. MN/CA9 is located on chromosome 17 in humans and encodes a zinc metalloenzyme of the carbonic anhydrase family, which catalyzes the reversible hydration of carbon dioxide to carbonic acid. Carbonic anhydrases have important roles in facilitating transport of carbon dioxide and protons in the intracellular space, across biological membranes and in the extracellular space. The genomic nucleotide sequence for the human MN/CA9 gene is deposited in GenBank under Accession number Z54349.

[0041] Those of ordinary skill in the art will appreciate, however, that the term "MN/CA9" in reference to a nucleic acid encompasses not only nucleic acids having the complete sequence deposited in GenBank Accession number Z54349, but also nucleic acids of related genes in other species such as homologues, orthologues, and paralogues of MN/CA9. Further, the term "MN/CA9" in reference to a nucleic acid also encompasses allelic variants of, mRNA and cDNA sequences derived from or related to, and nucleic acids that represent fragments of the complete sequence or related sequences from other species. The encompassed fragments include fragments by themselves, as well as fragments that are part of fusions with other nucleic acids and fragments that have been cloned into plasmids, expression vectors, and such. Moreover, those of ordinary skill in the art understand that nucleotide sequences generally tolerate variations such as polymorphisms, insertions, deletions, inversions, and other mutations, while still being recognizable as being "MN/CA9" sequences. Indeed, the products arising from the MN/CA9 gene can tolerate some substitution without altering the identity and/or activity of the MN/CA9 gene product. Thus, any nucleic acid that shares at least about 30-40% overall sequence identity, often greater than about 50%, 60%, 70%, or 80%, and further usually including at least one region of much higher identity, often greater than 90% or even 95%, 96%, 97%, 98%, or 99% in one or more highly conserved regions, usually encompassing at least 10-15 and often up to 60 or more base pairs, with the sequence deposited in GenBank Accession number Z54349, is encompassed within the relevant term "MN/CA9" as used herein.

[0042] The terms "normal" and "healthy" are used herein interchangeably. They refer to a subject or group of subjects who do not have cancer, or to cells or tissue samples that are not cancer cells. In some embodiments of the invention, "normal" cells or tissue samples can be derived from a subject or group of subjects that have cancer. For example, noncancerous cells or tissue that is adjacent to cancerous cells or tissue may be considered "normal." The term "normal" is also used herein to describe cells or tissue samples isolated from a healthy individual.

[0043] The term "nucleotide" is used herein to refer to the monomer from which nucleic acids are built. The chemical structure of a nucleotide consists of 3 portions: a heterocyclic base, a sugar, and one or more phosphate groups. In the most common nucleotides the base is a derivative of purine or pyrimidine, and the sugar is the pentose (five-carbon sugar) deoxyribose or ribose.

[0044] The term "oligonucleotide" is used herein to refer to a string of nucleotides or analogues thereof. Oligonucleotides may be obtained by a number of methods including, for example, chemical synthesis, restriction enzyme digestion or PCR. As will be appreciated by one skilled in the art, the length of an oligonucleotide (i.e., the number of nucleotides) can vary widely, often depending on the intended function or use of the oligonucleotide. Generally, oligonucleotides comprise between about 5 and about 300 nucleotides, for example, between about 15 and about 200 nucleotides, between about 15 and about 100 nucleotides, or between about 15 and about 50 nucleotides. Throughout the specification, whenever an oligonucleotide is represented by a sequence of letters (chosen from the four base letters: A, C, G, and T, which denote adenosine, cytidine, guanosine, and thymidine, respectively), the nucleotides are presented in the 5' to 3' order from the left to the right. In certain embodiments, the sequence of an oligonucleotide of the present invention contains the letters Y and/or R. As used herein, the letter "Y" represents a degenerative base, which can be C or T with substantially equal probability. Thus, for example, in the context of the present invention, if an oligonucleotide contains one degenerative base Y, the oligonucleotide is a substantially equimolar mixture of two subpopulations: a first oligonucleotide where the degenerative base is C and a second oligonucleotide where the degenerative base is T, the first and second oligonucleotides being otherwise substantially identical. Similarly, as used herein, the letter "R" represents a degenerative base, which can be A or G with substantially equal probability.
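By way of illustration only, the degenerative-base convention described above can be sketched in a few lines of Python. The function name `expand_oligo` and the example sequence are hypothetical and are not part of the specification; the sketch simply lists the subpopulations that a sequence containing Y and/or R stands for.

```python
from itertools import product

# Degenerative base codes as defined in the text: Y = C or T, R = A or G.
DEGENERATE = {"Y": "CT", "R": "AG"}

def expand_oligo(seq):
    """Return every concrete 5'->3' sequence represented by an oligo
    that may contain the degenerative bases Y and/or R."""
    # Each position contributes either its single fixed base or its two options.
    choices = [DEGENERATE.get(base, base) for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical example: one Y and one R give four subpopulations.
variants = expand_oligo("GGYGTTAR")
```

An oligonucleotide with n degenerative positions expands to 2^n sequences, each of which would be present in substantially equal proportion under the definition above.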

[0045] The term "PER1" is used herein to refer to a gene that is postulated to be the human orthologue of the Drosophila Per (Period) gene, is also known as RIGUI, and is identified in the Online Mendelian Inheritance in Man (OMIM) database by the identification number 602260. PER1 is located on chromosome 17 in humans and encodes a protein postulated to be involved in the regulation of circadian rhythms. The genomic nucleotide sequence for the human PER1 gene is deposited in GenBank under Accession number AF102137.

[0046] Those of ordinary skill in the art will appreciate, however, that the term "PER1" in reference to a nucleic acid encompasses not only nucleic acids having the complete sequence deposited in GenBank Accession number AF102137, but also nucleic acids of related genes in other species such as homologues, orthologues, and paralogues of PER1. Further, the term "PER1" in reference to a nucleic acid also encompasses allelic variants of, mRNA and cDNA sequences derived from or related to, and nucleic acids that represent fragments of the complete sequence or related sequences from other species. The encompassed fragments include fragments by themselves, as well as fragments that are part of fusions with other nucleic acids and fragments that have been cloned into plasmids, expression vectors, and such. Moreover, those of ordinary skill in the art understand that nucleotide sequences generally tolerate variations such as polymorphisms, insertions, deletions, inversions, and other mutations, while still being recognizable as being "PER1" sequences. Indeed, the products arising from the PER1 gene can tolerate some substitution without altering the identity and/or activity of the PER1 gene product. Thus, any nucleic acid that shares at least about 30-40% overall sequence identity, often greater than about 50%, 60%, 70%, or 80%, and further usually including at least one region of much higher identity, often greater than 90% or even 95%, 96%, 97%, 98%, or 99% in one or more highly conserved regions, usually encompassing at least 10-15 and often up to 60 or more base pairs, with the sequence deposited in GenBank Accession number AF102137, is encompassed within the relevant term "PER1" as used herein.

[0047] The term "primer" is interchangeable with "oligonucleotide primer" and is used herein to refer to an oligonucleotide that acts as a point of initiation of synthesis of a primer extension product, when placed under suitable conditions (e.g., buffer, salt, temperature and pH), in the presence of nucleotides and an agent for nucleic acid polymerization (e.g., a DNA-dependent or RNA-dependent polymerase). The primer is preferably single-stranded for maximum efficiency in amplification, but may alternatively be double-stranded. If double-stranded, the primer may first be treated (e.g., denatured) to allow separation of its strands before being used to prepare extension products. Such a denaturation step is typically performed using heat, but may alternatively be carried out using alkali, followed by neutralization. A typical primer comprises about 10 to about 35 nucleotides in length of a sequence substantially complementary to the target sequence. A "primer" can be used in an amplification reaction such as a PCR, and/or may be used as a "sequencing primer" in DNA sequencing reactions.

[0048] The term "promoter" has its meaning as understood in the art, referring to a region of DNA containing regulatory elements located upstream of the gene. It will be understood by those of ordinary skill in the art that the exact boundaries of the promoter for a given gene may not be fully defined, and that this may include some elements that are not located 5' to the gene, though the term "promoter" is still understood.

[0049] The term "promoter region" is used herein to refer to a region of a gene including the promoter as well as a portion of the gene downstream from the beginning of the first exon. When not specified otherwise, the "promoter region" contains the DNA from about 1 kb upstream of the beginning of the first exon of the gene to about 1 kb downstream of the beginning of the first exon of the gene. The "promoter region" may extend further in either direction, for example, to about 2 kb upstream and/or downstream of the start of the first exon of the gene.
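As a rough sketch of the window just defined, the following Python treats the "promoter region" as a symmetric interval around the start of the first exon. The function names, the 1-based coordinate convention, and the example coordinates are assumptions made for illustration; the text itself says only "about 1 kb" (or, optionally, "about 2 kb") in each direction.

```python
def promoter_region(first_exon_start, flank_bp=1000):
    """Return (start, end), inclusive, of the window spanning flank_bp
    upstream and downstream of the first exon start (1-based coordinates).
    The window is symmetric, so gene orientation does not change it."""
    return (max(1, first_exon_start - flank_bp), first_exon_start + flank_bp)

def in_promoter_region(position, first_exon_start, flank_bp=1000):
    """True if a CpG at `position` falls inside the promoter-region window."""
    start, end = promoter_region(first_exon_start, flank_bp)
    return start <= position <= end

# Hypothetical coordinates: first exon begins at position 2000.
window_1kb = promoter_region(2000)        # default window of about 1 kb
window_2kb = promoter_region(2000, 2000)  # extended window of about 2 kb
```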

[0050] The term "sample," as used herein, refers to any liquid or solid material containing nucleic acids. A sample may be, or may be derived from, any biological tissue or fluid that can contain nucleic acids. Frequently, the sample will be a "clinical sample," i.e., a sample obtained or isolated from a patient to be diagnosed for or otherwise assessed in relation to cancer. Such samples include, but are not limited to, bodily fluids that contain cellular materials and may or may not contain cells, e.g., blood, blood product, plasma, serum, urine, seminal fluid, saliva, lymphatic fluid, amniotic fluid, synovial fluid, cerebrospinal fluid, peritoneal fluid, and the like; endocervical, urethral, rectal, vaginal, vulva-vaginal samples; and archival samples that may or may not be used as references. Samples may include sections of tissue (e.g., colon biopsy samples) or of tumors, such as frozen sections. The term "sample" also encompasses any material derived from processing a biological sample. Derived materials include, but are not limited to, cells (or their progeny) isolated from the sample, cell components, and nucleic acid molecules extracted from the sample. Processing of a biological sample to obtain a sample may involve one or more of: filtration, distillation, centrifugation, extraction, concentration, dilution, purification, inactivation of interfering components, addition of reagents, and the like.

[0051] The term "sodium bisulfite," also known as sodium hydrogen sulfite, bisulfite, sodium bisulphite, sodium hydrogen sulphite, and bisulphite, is used herein to refer to a chemical with the formula NaHSO3. Sodium bisulfite deaminates unmethylated cytosines to uracil but does not modify methylated cytosines.

[0052] The terms "subject" and "individual" are used herein interchangeably. They refer to a human or another animal (e.g., mouse, rat, rabbit, dog, cat, cattle, swine, sheep, horse or primate) that can be afflicted with or is susceptible to a disease or disorder (e.g., cancer) but may or may not have the disease or disorder. In many embodiments, the subject is a human being. Unless otherwise stated, the terms "individual" and "subject" do not denote a particular age, and thus encompass adults, children, and newborns.

[0053] The phrase "suffering from," when used to describe a subject and in reference to a disease, is used herein to describe subjects who have been diagnosed as having the disease and/or are experiencing symptoms related to the disease. Thus, a subject who is diagnosed with cancer but not experiencing symptoms related to the cancer is "suffering from" cancer.

[0054] The term "susceptible" is used herein to mean having an increased risk for and/or a propensity for something, e.g., a disease such as cancer. The term takes into account that an individual "susceptible" to a disease may never be diagnosed with the disease.

[0055] The term "tumorigenicity" is used herein to mean the ability to cause or form tumors and is often used to describe a characteristic of cells or tissues.

[0056] The term "unmethylated" is used herein interchangeably with "demethylated." When used to refer to cytosines or CpG sites in nucleic acids such as DNA, the term "unmethylated" is used herein to describe cytosines that are not modified by covalent addition of a methyl group.

[0057] The term "ZNF145" (zinc finger protein 145), as used herein, refers to a gene also known as ZBTB16 and PLZF (promyelocytic leukemia zinc finger) that is identified in the Online Mendelian Inheritance in Man (OMIM) database by the identification number 176797. ZNF145 is located on chromosome 11 in humans and encodes a transcription factor. The genomic nucleotide sequence for human ZNF145 is deposited in GenBank under Accession number AF060568.

[0058] Those of ordinary skill in the art will appreciate, however, that the terms "ZNF145," "ZBTB16," and "PLZF" in reference to a nucleic acid encompass not only nucleic acids having the complete sequence deposited in GenBank Accession number AF060568, but also nucleic acids of related genes in other species such as homologues, orthologues, and paralogues of ZNF145. Further, the terms "ZNF145," "ZBTB16," and "PLZF" in reference to a nucleic acid also encompass allelic variants of, mRNA and cDNA sequences derived from or related to, and nucleic acids that represent fragments of the complete sequence or related sequences from other species. The encompassed fragments include fragments by themselves, as well as fragments that are part of fusions with other nucleic acids and fragments that have been cloned into plasmids, expression vectors, and such. Moreover, those of ordinary skill in the art understand that nucleotide sequences generally tolerate variations such as polymorphisms, insertions, deletions, inversions, and other mutations, while still being recognizable as being "ZNF145" sequences. Indeed, the products arising from the ZNF145 gene can tolerate some substitution without altering the identity and/or activity of the ZNF145 gene product. Thus, any nucleic acid that shares at least about 30-40% overall sequence identity, often greater than about 50%, 60%, 70%, or 80%, and further usually including at least one region of much higher identity, often greater than 90% or even 95%, 96%, 97%, 98%, or 99% in one or more highly conserved regions, usually encompassing at least 10-15 and often up to 60 or more base pairs, with the sequence deposited in GenBank Accession number AF060568, is encompassed within the relevant terms "ZNF145," "ZBTB16," and "PLZF" as used herein.

Detailed Description of Certain Embodiments of the Invention

[0059] The present invention provides, among other things, methods of diagnosis, determination of susceptibility, and/or staging of a disease in a subject by evaluating the methylation state of certain CpG sites. For example, inventive methods are useful in evaluating a subject's condition or susceptibility to cancer, such as colorectal cancer. Inventive methods often involve evaluating the methylation state of CpG sites in the PER1, ZNF145, or MN/CA9 genes, or in combinations thereof. Also provided are kits useful in the diagnosis or other assessment of cancer.

I. Methods useful in the diagnosis or other assessment of cancer

[0060] Inventive methods generally involve detecting methylation states in a subject, in a sample obtained from a subject, or in other samples. Inventive methods may also involve determining, based on the detected methylation state, that the subject has cancer, is susceptible to cancer, or has a particular stage of cancer.

Subjects and samples

[0061] Inventive methods involving detecting methylation states may be applied to any appropriate subject, which is typically an animal or human being. In certain embodiments of the invention, the subject is a human being. In some embodiments of the invention, the subject is healthy. In some embodiments of the invention, the subject is susceptible to or suffering from cancer. For example, the subject could be suspected of having cancer, identified as having an increased risk for cancer, treated for cancer, in remission from cancer, being monitored for recurrence of cancer, etc. The subject could already be diagnosed as having cancer, and provided methods may be useful in determining the stage of disease progression.

[0062] Inventive methods involving detecting methylation states may be applied to any samples containing DNA, such as those described in the "Definitions." In certain embodiments of the invention, the sample is obtained from a subject such as described above. In certain embodiments of the invention, the sample is processed or treated before being used in accordance with the inventive methods. For example, the sample may be processed such that the sample contains mostly nucleic acids such as DNA.

[0063] In certain embodiments of the invention, the sample is or comprises cells of a cell line. The cell line can be originally derived from a human being or from an animal such as a mouse, rat, rabbit, dog, cat, cattle, swine, sheep, horse or primate. The cell line can be derived from another cell line, for example by genetic or other laboratory manipulation. Inventive methods may be performed on the cells in order to determine a particular characteristic of the cell, for example the tumorigenicity of the cells.

Methylation states

[0064] Inventive methods generally involve detecting methylation states of DNA in a sample or subject. Methylation states of DNA are related to presence or absence of methyl groups at one or more CpG sites in DNA. As mentioned previously, methylation of CpG sites in promoter regions of DNA is associated with gene silencing, and aberrant methylation is often associated with diseases such as cancer. Aberrant methylation may participate in cancer initiation or progression by improperly silencing, downregulating, activating, or upregulating genes. For example, improper silencing or downregulation of tumor suppressor genes by aberrant methylation may lead to cancer. Thus, information about methylation states may be useful in making an evaluation of a subject or a sample in respect to cancer and/or tumorigenicity.

[0065] In certain embodiments of the invention, the detected methylation state is hypermethylation, an increase in the methylation of cytosines in CpG sites. The increase can be about 5%, about 10%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or more than about 95% greater than the extent of methylation of cytosines typically expected or observed for the CpG site or sites being evaluated.

[0066] In certain embodiments of the invention, the detected methylation state is hypomethylation, a decrease in the methylation of cytosines in CpG sites. The decrease can be about 5%, about 10%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, or more than about 95% less than the extent of methylation of cytosines typically expected or observed for the CpG site or sites being evaluated.

[0067] In certain embodiments of the invention, the detected methylation state is an alteration of the pattern of methylation at the CpG sites being evaluated. For example, CpG sites located at positions A and B in a gene may be typically methylated, while CpG sites located at positions C, D, and E are typically unmethylated. The detected methylation state in this example may be an alteration of the methylation state such that the CpG sites at B, D and E are methylated but the CpG sites at A and C are unmethylated.

[0068] In certain embodiments of the invention, the detected methylation state can be both hypermethylation and hypomethylation. For example, hypermethylation can be detected at one or more CpG sites, while hypomethylation can be detected at one or more CpG sites distinct from the CpG sites where hypermethylation is detected.

[0069] In certain embodiments of the invention, the detected methylation state is determined in part or wholly on the basis of a comparison with a control. The control can be a value or set of values related to the extent and/or pattern of methylation in a normal sample. In certain embodiments of the invention, such a value or values may be determined, for example, by calculations, using algorithms, and/or from previously acquired and/or archived data. In certain embodiments of the invention, the value or set of values for the control is derived from experiments performed on samples or using a subject. For example, control data can be derived from experiments on samples derived from comparable tissues or cells, such as normal tissue adjacent to tumors.
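The comparison with a control described above might be sketched as follows. This is a minimal illustration and not a prescribed procedure: the function names, the use of methylated fractions between 0 and 1, the choice of an absolute difference, and the 0.05 threshold are all assumptions introduced here, since the specification does not fix how an increase or decrease is computed.

```python
def classify_site(sample_fraction, control_fraction, threshold=0.05):
    """Label one CpG site by comparing the methylated fraction in a sample
    with the fraction in a control (both between 0 and 1).
    Assumption: an absolute difference of at least `threshold` is reportable."""
    difference = sample_fraction - control_fraction
    if difference >= threshold:
        return "hypermethylated"
    if difference <= -threshold:
        return "hypomethylated"
    return "normal"

def classify_sites(sample, control, threshold=0.05):
    """Classify every site present in both dictionaries (site -> fraction)."""
    return {
        site: classify_site(sample[site], control[site], threshold)
        for site in sample
        if site in control
    }

# Hypothetical values for three sites labeled A, B, and C.
sample = {"A": 0.9, "B": 0.1, "C": 0.5}
control = {"A": 0.2, "B": 0.8, "C": 0.5}
states = classify_sites(sample, control)
```

In this made-up example a single sample shows hypermethylation at one site and hypomethylation at another, the mixed situation contemplated in paragraph [0068].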

[0070] In certain embodiments of the invention, a control comprising DNA that is mostly or entirely demethylated, at one or more of the CpG sites being analyzed, is used. Such a control might be obtained, for example, from mutant tissues or cells lacking methyltransferase activity and/or from tissues or cells that have been chemically demethylated. For example, controls may be obtained from tissues or cells lacking activity of methyltransferases Dnmt1, Dnmt2, Dnmt3a, Dnmt3b, or combinations thereof. Agents such as 5-aza-2'-deoxycytidine may be used to chemically demethylate DNA.

[0071] In certain embodiments of the invention, a control comprising DNA that is mostly or entirely methylated, at one or more of the CpG sites being analyzed, is used. Such a control might be obtained, for example, from cells or tissues that are known or expected to be mostly or entirely methylated at the CpG site or sites of interest. Such a control could also be obtained from cells or tissues in which methylation levels have been altered and/or manipulated, for example, by overexpression of methyltransferases (such as enzymes Dnmt1, Dnmt2, Dnmt3a, Dnmt3b, any of the bacterial 5-CpG methyltransferases, or combinations thereof). In certain embodiments of the invention, samples used to obtain control values are processed and/or manipulated in the same manner as the samples being evaluated.

Genes and CpG sites

[0072] Inventive methods are generally directed toward detection of methylation states of one or more CpG sites in the PER1, ZNF145, and MN/CA9 genes. In some embodiments of the invention, such CpG site or sites are located in the promoter region and/or the first exon of the gene or genes. CpG sites whose methylation states may be detected in the inventive methods include the sites at positions 917, 946, and 977 in SEQ ID NO: 1; positions 1963, 2008, 2023, 2063, 2074, 2083, 2091, 2110, 2113, 2120 and 2154 in SEQ ID NO: 2; and positions 1002, 115, 1024, and 1060 in SEQ ID NO: 3. The methylation states of one or any combination of these sites may be detected.

[0073] In certain embodiments of the invention, methylation states of one or more CpG sites in only one of the genes (PER1, ZNF145, and MN/CA9) are detected. In certain embodiments of the invention, methylation states of CpG sites of a combination of these genes (that is, from two or all three of the genes) are detected. Analysis of CpG sites in one or a combination of these genes can be combined with analysis of CpG sites in other genes, and/or with other methods of diagnosing, determining susceptibility to, and/or staging of cancer. In certain embodiments of the invention, the other genes are known to contain one or more CpG sites that are differentially methylated in cancer or other disease. In certain embodiments of the invention, the other genes are identified in a screen for genes that may be misregulated in cancer or disease. Genes identified in such screens may be identified, for example, on the basis of downregulation or upregulation of gene product in abnormal tissues or cells (such as cancer). Genes may also be identified using global or genome-wide methods that may identify differentially methylated sites, such as Restriction Landmark Genomic Scanning for Methylation-M (RLGS-M).

Detection of methylation states

[0074] Any of a variety of techniques to detect methylation states can be used in the practice of inventive methods described herein. The following descriptions provide some examples of such techniques, and are not intended to limit the types of techniques that can be used with such methods. As will be understood by one of ordinary skill in the art, variations to the described techniques can also be used in accordance with inventive methods described herein. In certain embodiments of the invention, two or more methods of detecting methylation states are used together or in combination. In certain embodiments of the invention, the technique or techniques used to detect methylation states is or are quantitative. For example, such methods may provide estimates of the percentage of DNA molecules in a sample that are methylated at one or more particular CpG sites.

[0075] In certain embodiments of the invention, methylation states are detected by modification of DNA by sodium bisulfite followed by sequencing of modified DNA, a technique known as sodium bisulfite sequencing or bisulfite sequencing. Sodium bisulfite converts unmethylated cytosines to uracil, while leaving methylated cytosines unmodified. Sodium bisulfite-converted DNA is typically amplified by polymerase chain reaction (PCR), during which uracils are amplified as thymines. Effectively, sodium bisulfite can act to introduce changes to the nucleic acid sequence such that unmethylated cytosines are distinguishable from methylated cytosines. The oligonucleotide primers used in PCR are designed such that they can hybridize to sodium bisulfite-treated DNA and such that they flank the regions containing the CpG sites being analyzed.

[0076] Amplification products can then be sequenced, and resulting sequences analyzed to deduce the methylation state of the DNA sample being analyzed. For example, a cytosine in the sequence of DNA converted by sodium bisulfite signifies a methylated cytosine. A thymine in such a sequence at a position where a cytosine would be expected in DNA that has not been treated with sodium bisulfite signifies an unmethylated cytosine. Any method of DNA sequencing can be used with sodium bisulfite-converted DNA, including the Maxam-Gilbert method, chain termination methods (such as the Sanger method or dye terminator methods), pyrosequencing, variations thereof, and the like.
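The logic of bisulfite sequencing described above can be illustrated with a short simulation. This is a sketch under simplifying assumptions that are not taken from the specification: only the top strand is modeled, each read is the same length as the untreated reference and aligned to it without gaps, and positions are 0-based. All function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand:
    unmethylated C is read as T, methylated C stays C (0-based positions)."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )

def call_cpg_methylation(reference, converted_read):
    """For each CpG cytosine in the untreated reference, report True
    (C retained, methylated), False (read as T, unmethylated), or None."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = converted_read[i]
            calls[i] = True if base == "C" else False if base == "T" else None
    return calls

def percent_methylated(reference, reads, position):
    """Percentage of informative reads that are methylated at one CpG."""
    calls = [call_cpg_methylation(reference, read)[position] for read in reads]
    informative = [c for c in calls if c is not None]
    if not informative:
        raise ValueError("no informative reads at this position")
    return 100.0 * sum(informative) / len(informative)

reference = "ACGTTCGA"  # hypothetical sequence with CpG cytosines at 1 and 5
reads = [bisulfite_convert(reference, m) for m in ([1], [1, 5], [], [1])]
```

Counting calls across many sequenced molecules in this way yields the kind of quantitative estimate mentioned in paragraph [0074], namely the percentage of DNA molecules methylated at a given CpG site.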

[0077] In certain embodiments of the invention, methylation states are detected by methylation-specific PCR. In this technique, DNA is first treated with sodium bisulfite. Sodium bisulfite-treated DNA is then amplified by PCR using two different sets of primers in separate reactions. Both sets of primers contain at least one primer that hybridizes to a region of DNA containing at least one CpG dinucleotide whose methylation state is being determined. One set of primers is specific for methylated DNA in that the primers are designed to hybridize to sequences in which the cytosines within CpG dinucleotides remain unmodified. The other set of primers is specific for unmethylated DNA in that the primers are designed to hybridize to sequences in which the cytosines within CpG dinucleotides have been modified to uracil by sodium bisulfite. Amounts of amplification product obtained with primers specific for methylated DNA and amounts of amplification product obtained with primers specific for unmethylated DNA are then compared, and such a comparison may be used in a determination of methylation states.
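To make the distinction between the two primer sets concrete, the sketch below derives, for a hypothetical primer-binding region, the converted sequence expected from fully methylated DNA and from fully unmethylated DNA, and reports where they differ. It assumes that non-CpG cytosines are unmethylated and therefore converted in both cases, and it models a single strand after PCR (uracil read as T); the function names and the example sequence are illustrative only and do not describe actual primers.

```python
def converted_template(seq, cpg_methylated):
    """Sequence of one strand after bisulfite treatment and PCR.
    Non-CpG cytosines always read as T; CpG cytosines stay C only
    when cpg_methylated is True."""
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg_cytosine = base == "C" and seq[i + 1:i + 2] == "G"
        if base == "C" and not (cpg_methylated and is_cpg_cytosine):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

def msp_targets(primer_site):
    """Return the methylated-specific target, the unmethylated-specific
    target, and the 0-based positions at which the two targets differ."""
    methylated = converted_template(primer_site, True)
    unmethylated = converted_template(primer_site, False)
    differing = [i for i, (m, u) in enumerate(zip(methylated, unmethylated)) if m != u]
    return methylated, unmethylated, differing

m_target, u_target, diffs = msp_targets("GCGACCGT")  # hypothetical region
```

The positions returned in `diffs` are the CpG cytosines; a primer set can discriminate methylated from unmethylated templates only at such positions, which is why each set includes a primer overlapping at least one CpG dinucleotide.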

[0078] In certain embodiments of the invention, methylation states are detected by combined bisulfite restriction analysis (COBRA). In this technique, DNA is treated with sodium bisulfite and subsequently amplified by PCR. Primers used in PCR reactions in this technique do not hybridize to any DNA regions containing CpG sites. Thus, amounts of amplification products of methylated and unmethylated DNA are expected to be directly proportional to amounts of methylated and unmethylated DNA templates respectively. Alterations to DNA sequences caused by sodium bisulfite treatment may lead to methylation-dependent creation of new restriction sites, or to methylation-dependent preservation of preexisting restriction sites. Amplification products are then digested with a restriction enzyme whose recognition site is either created or preserved in a methylation-dependent manner after sodium bisulfite treatment. The amount of amplification product that is digested by such a restriction enzyme is compared to the amount of amplification product that is undigested, and used to determine the extent of methylation at the CpG sites that were queried.

[0079] In certain embodiments of the invention, methylation states are detected by single nucleotide primer extension (SNuPE). In this technique, DNA is treated with sodium bisulfite and then amplified in a PCR using primers specific for sodium bisulfite-treated DNA. Amplification products are then annealed to a primer whose complementarity with the sodium bisulfite-treated DNA terminates immediately 5' to the CpG site being analyzed. Two separate primer extension reactions are then performed: one using labeled dCTP and the other with labeled dTTP. The proportion of methylated cytosines versus unmethylated cytosines at the CpG site can be determined from the relative amounts of dCTP and dTTP incorporated respectively. (As mentioned before, unmethylated cytosines will be converted by sodium bisulfite such that they are amplified as thymines during PCR, whereas methylated cytosines are unmodified.)
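Both COBRA and SNuPE reduce to comparing two measured signals, which can be sketched as simple ratios. The function names, the use of arbitrary signal units (for example band intensities or incorporated label), and the flag indicating whether digestion corresponds to methylated or unmethylated template are assumptions for illustration; which product corresponds to methylated DNA in COBRA depends on the restriction enzyme chosen.

```python
def signal_fraction(signal, other_signal):
    """Fraction of the combined signal contributed by `signal`."""
    total = signal + other_signal
    if total <= 0:
        raise ValueError("combined signal must be positive")
    return signal / total

def cobra_methylated_fraction(digested, undigested, cut_means_methylated=True):
    """Estimate the methylated fraction from digested and undigested
    amplification product. Set cut_means_methylated to False when the
    enzyme cuts only product derived from unmethylated template."""
    cut = signal_fraction(digested, undigested)
    return cut if cut_means_methylated else 1.0 - cut

def snupe_methylated_fraction(dctp_signal, dttp_signal):
    """Estimate the methylated fraction at one CpG from the relative
    incorporation of labeled dCTP (methylated) and dTTP (unmethylated)."""
    return signal_fraction(dctp_signal, dttp_signal)

# Hypothetical signal values in arbitrary units.
cobra_estimate = cobra_methylated_fraction(25, 75)
snupe_estimate = snupe_methylated_fraction(75, 25)
```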

[0080] In certain embodiments of the invention, methylation states are determined using methylation-sensitive restriction enzymes. Such enzymes have recognition sequences that contain at least one CpG site and are unable to digest DNA that is methylated at one or more of the CpG sites in the recognition sequence. For example, HpaII and one of its isoschizomers, MspI, both recognize the same recognition sequence: 5'-CCGG-3'. HpaII is sensitive to methylation of the internal cytosine and cannot digest DNA methylated at that site. Conversely, MspI is insensitive to methylation at that cytosine. Thus, HpaII and MspI can be used to query the methylation state of a CpG site lying within their recognition sequence. One set of DNA samples can be digested with HpaII, while a duplicate set of DNA samples can be digested with MspI. A Southern blot can then be performed on the digestion products, using a probe that will hybridize to a DNA fragment containing the CpG site or sites being queried. The sizes and relative intensities of bands for samples digested by each enzyme can be used to determine the extent of methylation at particular CpG sites in the sample.
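The HpaII/MspI comparison above can be illustrated with an in silico digest. The sketch below is a simplification under stated assumptions: it treats the DNA as a single linear top strand, takes methylated cytosines as a set of 0-based positions, assumes both enzymes cut after the first cytosine of CCGG, and ignores hemimethylation and methylation of the outer cytosine. The function name and test sequence are hypothetical.

```python
import re

SITE = "CCGG"    # recognition sequence shared by HpaII and MspI
CUT_OFFSET = 1   # assumed cut position: after the first C (C^CGG)


def fragment_lengths(seq, methylated_positions, enzyme):
    """Return fragment lengths after digesting a linear sequence.

    methylated_positions: set of 0-based indices of methylated cytosines.
    enzyme: "HpaII" (blocked when the internal C is methylated) or "MspI"
    (treated here as cutting regardless of internal-C methylation).
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    seq = seq.upper()
    cuts = []
    for match in re.finditer("(?=" + SITE + ")", seq):
        start = match.start()
        internal_c = start + 1  # the C of the CpG inside CCGG
        if enzyme == "HpaII" and internal_c in methylated_positions:
            continue  # methylated site is protected from HpaII
        cuts.append(start + CUT_OFFSET)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 20-nt sequence with CCGG sites starting at positions 4 and 14;
# the internal C of the first site (position 5) is methylated.
example = "AAAACCGGTTTTTTCCGGAA"
hpaii = fragment_lengths(example, {5}, "HpaII")  # [15, 5]
mspi = fragment_lengths(example, {5}, "MspI")    # [5, 10, 5]
```

A fragment present in the HpaII lane but absent from the MspI lane (here, the 15-nt fragment) indicates a protected, and therefore methylated, CCGG site.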

[0081] Other methylation-sensitive restriction enzymes that can be used in accordance with the inventive methods include, among others, AatII, AciI, AclI, AfeI, AgeI, AscI, AsiSI, AvaI, BceAI, BmgBI, BsaAI, BsaHI, BsiEI, BsiWI, BsmBI, BspDI, BspEI, BsrBI, BsrFI, BssHII, BstBI, BstUI, ClaI, EagI, FauI, FseI, FspI, HaeII, HgaI, HhaI, HinP1I, Hpy99I, HpyCH4IV, KasI, MluI, NaeI, NarI, NgoMIV, NotI, NruI, PaeR7I, PmlI, PvuI, RsrII, SacII, SalI, SfoI, SgrAI, SmaI, SnaBI, TliI, XhoI, and combinations thereof.

[0082] In certain embodiments of the invention, detection of methylation states is facilitated by use of proteins or other molecules that can distinguish methylated CpGs and unmethylated CpGs. For example, antibodies that bind specifically to methylated CpGs but not to unmethylated CpGs can be used with the inventive methods to facilitate determining the methylation state. A sample containing DNA can be contacted with such antibodies, and the extent and/or pattern of methylation at CpG sites in DNA of the sample may be detected through direct or indirect labeling of the antibody.

[0083] Other proteins that bind to methylated CpGs could also be used. For example, methyl-CpG binding proteins such as methyl CpG binding protein 2 (MeCP2), methyl binding domain protein 1 (MBD1), methyl binding domain protein 2 (MBD2), methyl binding domain protein 3 (MBD3), and methyl binding domain protein 4 (MBD4) bind to methylated CpG sites in DNA via their methyl-CpG binding domains. Any or all of these proteins could be contacted to a sample containing DNA. Such proteins may be directly or indirectly labeled and such labeling detected to facilitate determining the extent and/or pattern of methylation of the DNA. Any protein or protein fragment, including recombinant proteins, that contains a methyl-CpG binding domain, may be used in this manner.

[0084] In certain embodiments of the invention, quantitation or estimation of the extent of methylation is based partially or wholly on the extent of binding by such methylation-differentiating molecules (antibodies, methyl-CpG binding proteins, restriction enzymes, etc.).

[0085] In certain embodiments of the invention, methylation states are detected using methods of analyzing methylation globally (that is, throughout the genome) such as Restriction Landmark Genomic Scanning for Methylation-M (RLGS-M). Though such methods are not specifically designed toward analysis of methylation of particular genes, they may yield information about particular genes.

[0086] While sodium bisulfite is often used in many of these techniques to convert unmethylated cytosines to uracil, any agent that selectively modifies unmethylated cytosines while leaving methylated cytosines unmodified, or that selectively modifies methylated cytosines while leaving unmethylated cytosines unmodified, can potentially be used in accordance with inventive methods disclosed herein. Similarly, any agent that selectively protects methylated cytosines but does not protect unmethylated cytosines from subsequent modification or degradation (for example, by another agent), or vice versa, may be used in accordance with provided methods. Any agent that selectively makes methylated cytosines vulnerable but does not make unmethylated cytosines vulnerable to subsequent modification or degradation, or vice versa, may be used in accordance with provided methods.

Diagnosis of and susceptibility to cancer

[0087] In certain embodiments of the invention, a determination is made that the subject has cancer. Such a diagnosis may be made using inventive methods alone, or in combination with other diagnostic methods such as clinical evaluations, histological analysis, genetic testing, etc.

[0088] In certain embodiments of the invention, an assessment is made that the subject is susceptible to cancer. Such a determination may be made using the inventive methods alone, or in combination with other evaluations and factors. For example, a subject's susceptibility to cancer may also be evaluated in terms of genetic predisposition, family medical history, personal medical history, environmental exposure to agents that may increase risk for cancer, and/or lifestyle.

[0089] In certain embodiments of the invention, a diagnosis or determination of susceptibility to cancer applies to a particular type of cancer. Types of cancer include those that are described in the "Definitions." In certain embodiments of the invention, the cancer is colorectal cancer. In certain embodiments of the invention, the cancer is breast cancer. In some embodiments of the invention, a diagnosis or determination of susceptibility to multiple types of cancer is made, and the multiple types of cancer include colorectal cancer, breast cancer, or both. In some embodiments of the invention, a diagnosis or determination of susceptibility to multiple types of cancer is made, and the multiple types of cancer include cancers other than colorectal cancer and breast cancer.

Staging of cancer

[0090] In certain embodiments of the invention, a determination is made that the subject has a particular stage or grade of cancer. In certain embodiments of the invention, the subject has already been diagnosed with cancer. Inventive methods may be useful in determining, for example, that the cancer is in an early or late stage; that the cancer can or cannot be surgically removed; that the cancer is low grade, medium grade, or high grade; that the cancer is relatively well differentiated, moderately differentiated, or poorly differentiated; and/or that the cancer is localized, has spread to lymph nodes, or has metastasized to other organs.

[0091] The stage of cancer that is determined may be a stage that is part of a formal classification scheme, such as the TNM classification. In the TNM classification system, "T" represents the size of the tumor; "N" designates the presence and extent of the cancer spread to lymph nodes in the region; and "M" indicates the presence of metastasis beyond the region. The TNM designation is further categorized into one of five stages, from the smallest noninvasive cancer (Stage 0) to the most advanced (Stage 4).

[0092] Another classification scheme that can be used in accordance with inventive methods disclosed herein is the Dukes staging system, which is often used in the staging of colorectal cancer. In the Dukes staging system, tumors are classified as A, B, C, or D depending on how far the tumor has progressed. Dukes stage A generally refers to tumors that are confined to mucosa, stage B in colorectal cancer generally refers to spreading limited to a nearby muscle layer, stage C generally refers to some spreading to the lymph nodes, and stage D generally refers to distant metastases.

Other epigenetic changes

[0093] DNA methylation of a particular gene or gene region is known to be associated with other epigenetic changes in the same gene or gene region. Thus, one of ordinary skill in the art will understand that other epigenetic changes of the same gene or gene region may also be informative in making an assessment with respect to cancer. In certain embodiments of the invention, other epigenetic changes involving PER1, ZNF145, MN/CA9, and combinations thereof are also detected and used in making a determination with respect to diagnosis of cancer, susceptibility to cancer, staging of cancer, or tumorigenicity. For example, epigenetic modifications to histones, the proteins around which DNA is wound, are often associated with DNA methylation of the same region. Such epigenetic modifications include deacetylation of histones H3 and H4 and methylation of lysine and arginine residues in histones H3 and H4. Histones can be mono-, di-, or tri-methylated, and examples of residues that are known to be methylated include lysines 4, 9, and 27 of histone H3.

II. Kits

[0094] Provided are kits useful in the diagnosis of or other assessment in relation to cancer. Inventive kits may be used in accordance with inventive methods disclosed herein. Inventive kits generally comprise oligonucleotide primers that can facilitate detecting methylation states of CpG sites in the PER1, ZNF145, and/or MN/CA9 genes. In certain embodiments of the invention, a set of oligonucleotide primers is provided, each of which is designed to hybridize to sodium bisulfite-modified nucleic acids of one particular gene, e.g., PER1, ZNF145, or MN/CA9. In certain embodiments of the invention, a single kit contains multiple oligonucleotide primers, such that methylation of CpG sites in more than one gene can be analyzed. For example, a kit may contain oligonucleotide primers designed to hybridize to sodium bisulfite-modified PER1 nucleic acids as well as oligonucleotide primers designed to hybridize to sodium bisulfite-modified ZNF145 nucleic acids. Similarly, a kit may contain oligonucleotide primers designed to hybridize to sodium bisulfite-modified PER1 nucleic acids, oligonucleotide primers designed to hybridize to sodium bisulfite-modified ZNF145 nucleic acids, and oligonucleotide primers designed to hybridize to sodium bisulfite-modified MN/CA9 nucleic acids. Other combinations of oligonucleotide primers can also be provided in inventive kits.

[0095] In certain embodiments of the invention, the kit provides reagents and/or instructions for sodium bisulfite sequencing analysis of PER1, ZNF145, and/or MN/CA9 nucleic acids. Such reagents may include any of, for example, sodium bisulfite, buffers and solutions, spin columns for separation and/or purification of nucleic acids, reaction tubes, etc.

[0096] In certain embodiments of the invention, guidance is provided as to how to interpret the detected methylation states in order to make a diagnosis or other assessment in relation to cancer.

[0097] In certain embodiments of the invention, kits further comprise a control or reference sample that comprises DNA that is mostly or entirely demethylated. The DNA in the reference sample may be demethylated globally (e.g., at all or most CpG sites), in particular genes or regions of genes, or at one or more particular CpG sites. In certain embodiments of the invention, kits further comprise a control or reference sample that comprises DNA that is mostly or entirely methylated. The DNA in the reference sample may be methylated globally (e.g., at all or most CpG sites), in particular genes or regions of genes, or at one or more particular CpG sites.

Examples

[0098] The following examples describe some particular modes of making and/or practicing the present invention. However, it should be understood that these examples are for illustrative purposes only and are not meant to limit the scope of the invention.

Furthermore, unless the description in an Example is presented in the past tense, the text, like the rest of the specification, is not intended to suggest that experiments were actually performed or data were actually obtained.

EXAMPLE 1: Expression profiling of tumor tissues

[0100] The experiments in this example identified candidate genes for which DNA methylation may be altered in cancer by identifying genes whose RNA expression levels are downregulated in tumor tissues. Twelve pairs of breast tumor and adjacent normal tissue were analyzed using Affymetrix GeneChips for their RNA expression profiles. A list of 110 genes, which showed down-regulation of expression in breast cancer as compared to normal breast tissues, was compiled (depicted in Table 1). Aside from the twelve pairs of tumor and normal tissues, expression data from 135 tumor samples and from 5 unmatched normal samples (obtained from a separate experiment) were analyzed. Analysis using this additional data set yielded an additional list of 10 genes (shaded in Table 1) that were below the cutoff (> 2-fold difference in expression) in the dataset using the 12 pairs of breast tumor/normal tissue samples.

Table 1: Gene list showing sequences with expression down-regulated in breast cancer as compared to normal breast tissues.

EXAMPLE 2: Identification of CpG islands

[0101] The analysis described in this Example yielded a smaller set of candidate genes for methylation analysis in cancer.

[0102] From the genes listed in Table 1, a subset of genes was identified for potential further analysis based on frequency of down-regulation in the sample sets, status of intellectual property, and potential scientific relevance. (These genes are summarized in Table 2.) Genes in this subset containing at least one CpG island in the published sequence of the promoter-first exon region (1000 bp upstream from and 500 bp downstream from exon 1) were identified using programs in the public domain (UCSC Genome Browser, CpGPlot). A CpG island was defined as a region of DNA greater than 200 bp, with a guanine/cytosine content above 0.5 and a ratio of observed over expected CpG presence above 0.6. In Table 2, genes that are shaded contained CpG islands and were then analyzed further.
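The CpG island definition above can be expressed as a short check. This is a minimal sketch, not a description of how the UCSC Genome Browser or CpGPlot computes its results: it assumes the commonly used observed/expected formula (number of CpG dinucleotides multiplied by sequence length, divided by the product of the C count and the G count) and evaluates a single candidate region as a whole rather than scanning with a sliding window. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / n
    expected = (c_count * g_count) / n  # expected CpGs if C and G were independent
    obs_exp = cpg_count / expected if expected else 0.0
    return gc_content, obs_exp


def meets_island_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds stated in the text to one candidate region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_content > min_gc and obs_exp > min_obs_exp
```

For instance, a synthetic 300-nt repeat of "CG" passes all three thresholds, whereas a 300-nt repeat of "AT" fails on GC content. Real promoter sequences would normally be scanned window by window, which this sketch does not attempt.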

Table 2: Annotated gene list from breast cancer tumor/normal profiling.

EXAMPLE 3: Analysis of methylation at CpG sites in candidate genes by sodium bisulfite sequencing

[0103] The experiments described in this Example were conducted to determine the methylation status in tumor samples of selected CpG sites of candidate genes identified in Example 2. Regions of CpG islands identified in Example 2 were selected for sodium bisulfite sequencing using the following criteria, which have been described as associated with methylation changes during gene regulation:

[0104] (1) transcription site near CpG island,

[0105] (2) Sp1 (a transcription factor) binding sites, and

[0106] (3) repetitive elements serving as boundaries.

[0107] Sequencing primers for sodium bisulfite-sequencing were selected using Oligo v6 software, and are summarized in Table 3.

Table 3: Sequencing primers

[0108] A total of twelve tumor samples and twelve samples from adjacent normal tissue were analyzed. These included samples from early stage tumor and adjacent normal tissue (AB-T and AB-N respectively) and from late stage tumor and adjacent normal tissues (CD-T and CD-N respectively). Cells from the five colorectal cancer cell lines described in Table 4 were pooled together and used as controls (labeled as "control"). A portion of the same population of pooled cells from colorectal cancer cell lines was chemically demethylated using 5-aza-2'-deoxycytidine (5-aza-dC) and used as a negative control (labeled as "5-aza-dC").

Table 4: Colorectal cancer cell lines used in expression profiling

[0109] DNA was extracted from all samples (tumor, normal tissue, and controls). DNA from similar categories was pooled together. That is, DNA from early stage tumors was pooled together; DNA from normal tissue adjacent to early stage tumors was pooled together, etc. Pooled DNA samples were treated with sodium bisulfite, which converts unmethylated cytosines to uracil but does not alter methylated cytosines.

[0110] After sodium bisulfite treatment, DNA samples were amplified by PCR using the sequencing primers listed in Table 3. During amplification, uracils are amplified as thymines. To amplify products from the PER1 gene, primers PER1-if1 and PER1-ir1 were used together. To amplify products from the ZNF145 gene, primers ZNF145-if1 and ZNF145-ir1 were used together.

[0111] Amplicons were then sequenced directly using the same primers that were used in PCRs. Methylated cytosines are detected as cytosines in the sequencing reaction, whereas unmethylated cytosines appear as thymines. Methylation levels were determined by analyzing the peaks in the sequencing traces, comparing the C peaks (representing a methylated cytosine) to the T peaks (representing an unmethylated cytosine) at the same location in a CpG site.
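The conversion chemistry and the peak comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis procedure used in this Example: the sequence is treated as a single top strand, methylated cytosines are supplied as 0-based positions, and peak values stand for background-corrected C and T signals from a sequencing trace at one CpG cytosine. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the top-strand sequence expected after bisulfite treatment and PCR.

    Unmethylated C is converted to uracil and amplified as T; a methylated C
    (listed by 0-based position) is left unchanged.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(seq)
    )


def site_methylation(c_peak, t_peak):
    """Estimate the methylated fraction at one CpG from its C and T peak signals."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("no signal at this position")
    return c_peak / total


# Hypothetical 8-nt fragment with CpGs at positions 1-2 and 5-6;
# only the cytosine at position 1 is methylated.
converted = bisulfite_convert("ACGTCCGA", {1})  # "ACGTTTGA"
```

With illustrative peak values, site_methylation(300.0, 100.0) gives 0.75. In pooled samples such as those used here, an intermediate value reflects a mixture of methylated and unmethylated templates at that site.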

[0112] As shown in Figure 2, PER1 and ZNF145 showed differential methylation in tumor samples compared to normal samples. Nucleotides in each amplicon were numbered according to their positions in SEQ ID NO: 1 and SEQ ID NO: 2 for the PER1 and ZNF145 genes respectively. In PER1, CpG sites located at positions 917, 946, and 977 in SEQ ID NO: 1 were hypermethylated in a portion of the amplicons from both early and late stage colorectal cancers that were sequenced. In ZNF145, CpG sites located at positions 1963, 2008, 2023, 2063, 2120, and 2154 in SEQ ID NO: 2 were hypermethylated in all of the amplicons from early stage colorectal cancers, and CpG sites located at positions 2074, 2083, 2110, and 2113 in SEQ ID NO: 2 were hypermethylated in a portion of the amplicons from early stage colorectal cancers. (Sequence traces for late stage tumors for ZNF145 were not readable.) In Figure 2, hypermethylation in a portion of amplicons sequenced (partial hypermethylation) is indicated by a half-filled circle, whereas hypermethylation in all amplicons sequenced is indicated by a completely filled circle.

EXAMPLE 4: Methylation-specific PCR analysis of CpG sites in the PER1 and ZNF145 genes

[0113] The experiments in this Example can be used to determine the methylation state of CpG sites in candidate genes. These experiments can also be used to verify information about the methylation state of CpG sites obtained by different methods such as sodium bisulfite sequencing.

[0114] DNA can be extracted from a sample and then treated with sodium bisulfite, resulting in the conversion of unmethylated cytosines to uracil and in no modification of methylated cytosines. The sodium bisulfite-treated DNA can then be amplified using primers designed for methylation-specific PCR (ms-PCR). For example, primers for ms-PCR analysis of PER1 and ZNF145 were designed and are summarized in Table 5. "MSP-M" refers to primers that are designed to amplify sodium bisulfite-treated methylated DNA, whereas "MSP-U" refers to primers that are designed to amplify sodium bisulfite-treated unmethylated DNA. The amount of product from amplification reactions using MSP-M primers can be compared to the amount of product using MSP-U primers. This comparison can be used to estimate the relative amounts of methylated versus unmethylated cytosines in the sample at the CpG site or sites being analyzed.
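The distinction between MSP-M and MSP-U primers follows directly from the conversion chemistry, and can be sketched for a forward primer as shown below. This is an illustrative sketch only and does not reproduce the primers of Table 5: it assumes the primer matches the bisulfite-converted top strand, that every CpG in the window is methylated for the MSP-M case and unmethylated for the MSP-U case, and that a cytosine at the last position of the window is not part of a CpG. The function name and the example window are hypothetical.

```python
import re


def msp_forward_primers(genomic_window):
    """Derive MSP-M and MSP-U forward primer sequences for one binding window.

    MSP-M: cytosines outside CpGs read as T, CpG cytosines remain C.
    MSP-U: every cytosine reads as T.
    Returns (msp_m, msp_u, number of positions at which the two differ).
    """
    window = genomic_window.upper()
    # A C not followed by G is a non-CpG cytosine and is always converted.
    msp_m = re.sub(r"C(?!G)", "T", window)
    msp_u = window.replace("C", "T")
    differences = sum(1 for m, u in zip(msp_m, msp_u) if m != u)
    return msp_m, msp_u, differences


# Hypothetical 8-nt window containing two CpG sites.
msp_m, msp_u, differences = msp_forward_primers("ACGTCCGA")
# msp_m == "ACGTTCGA", msp_u == "ATGTTTGA", differences == 2
```

The more CpG sites a primer window spans, the more positions distinguish the two primers. Reverse primers, which are complementary to the converted strand, are outside the scope of this sketch.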

EXAMPLE 5: Sodium bisulfite sequencing of CpG sites in the MN/CA9 promoter region

[0115] The sequence of the MN/CA9 promoter region does not have a CpG island as defined by Gardiner-Garden and Frommer (i.e., regions of DNA of at least 200 bp in length that have a G + C content above 50% and a ratio of observed versus expected CpGs close to or above 0.6). Nevertheless, there is a small region in the promoter region with a relatively high density of CpG sites. The sequence of this region is listed in the present application in SEQ ID NO: 3, and the expected sequence after conversion by sodium bisulfite treatment is depicted in Figure 3. Methylation in this region was interrogated using sodium bisulfite sequencing methodology as described below.

[0116] DNA was isolated from samples and treated with sodium bisulfite, which converts unmethylated cytosine to uracil and leaves methylated cytosine unchanged. The sodium bisulfite-treated DNA was then amplified with primers. Such primers were designed to bind to and amplify bisulfite-treated sequences irrespective of methylation. Samples were sequenced using a fluorescent dye-terminator method.

[0117] Pooled DNA samples of early stage colorectal tumors, adjacent normal tissue from early stage patients, late stage colorectal tumors, and adjacent normal tissue from late stage patients were sequenced. At four of the seven CpG sites contained within the sequenced amplicon, a significant increase in methylation was seen in the tumor samples as compared to the adjacent normal tissue samples. These four CpG sites are shown in the sequence traces depicted in Figure 4 in the boxed areas and correspond to positions 1002, 1115, 1024, and 1060 in SEQ ID NO: 3. Solid bold curves indicate thymine (which represents thymines and unmethylated cytosines) and double curves indicate cytosine (which represents methylated cytosines) (see Figure legend). The relative increases in the area under the curve of the cytosine trace in the tumor samples as compared to the adjacent normal tissue samples indicate increased methylation in the tumor samples.

EXAMPLE 6: Methylation-specific PCR analysis of methylation in the MN/CA9 promoter region

[0118] Methylation at CpG sites in the promoter region of the MN/CA9 gene was also analyzed using methylation-specific PCR. DNA from tumor and adjacent normal tissue samples was treated with sodium bisulfite and then amplified. The shaded regions in the sequence depicted in Figure 4 indicate the regions where oligonucleotide primers used in the PCR reactions bound. Products from amplification reactions using a set of primers specific for methylated DNA were compared to products from amplification reactions using a set of primers specific for unmethylated DNA.

[0119] Increased methylation was observed in tumor samples for three pairs of tumor and adjacent normal tissue samples, as shown in the agarose gel depicted in Figure 5. (In the sample pairs indicated by white arrows, increased product signal is detected in the tumor samples as compared to the adjacent normal tissue samples when samples were amplified using methylation-specific primers.)