Title:
COMPOSITIONS AND METHODS FOR DETECTING PREDISPOSITION TO A SUBSTANCE USE DISORDER
Document Type and Number:
WIPO Patent Application WO/2010/129354
Kind Code:
A2
Abstract:
The present invention provides screening kits, compositions, and diagnostic methods for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder by determining a nucleic acid methylation profile from a biological sample from the subject, wherein a given profile indicates that the subject has a predisposition to a substance use disorder.

Inventors:
PHILIBERT ROBERT (US)
MADAN ANUP (US)
Application Number:
PCT/US2010/032815
Publication Date:
November 11, 2010
Filing Date:
April 28, 2010
Assignee:
UNIV IOWA RES FOUND (US)
PHILIBERT ROBERT (US)
MADAN ANUP (US)
International Classes:
C12Q1/68
Foreign References:
US20070292880A1 2007-12-20
US20050153347A1 2005-07-14
US20090075827A1 2009-03-19
Other References:
BLEICH ET AL.: 'Epigenetic DNA hypermethylation of the HERP gene promoter induces down-regulation of its mRNA expression in patients with alcohol dependence.' ALCOHOL CLIN EXP RES. vol. 30, no. 4, 2006, pages 587 - 91
PHILIBERT ET AL.: 'MAOA methylation is associated with nicotine and alcohol dependence in women.' AM J MED GENET B NEUROPSYCHIATR GENET. vol. 147B, no. 5, 2008, pages 565 - 70
NIELSEN ET AL.: 'Increased OPRM1 DNA Methylation in Lymphocytes of Methadone-Maintained Former Heroin Addicts.' NEUROPSYCHOPHARMACOLOGY. vol. 34, no. 4, March 2009, pages 867 - 73
DEMPSTER ET AL.: 'The quantification of COMT mRNA in post mortem cerebellum tissue: diagnosis, genotype, methylation and expression.' BMC MED GENET. vol. 7, no. 10, 2006, pages 1 - 7
Attorney, Agent or Firm:
VIKSNINS, Ann, S., et al. (P.O. Box 111098, St. Paul, Minnesota, US)
Claims:
What is claimed is:

1. A screening kit for determining whether a human subject has the likelihood of using, abusing or being dependent upon a substance comprising:

(a) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or

(b) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or

(c) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence.

2. The kit of claim 1, wherein the substance is nicotine and the CpG dinucleotide repeat motif is located in a gene from Table 5 or Table 9.

3. The kit of claim 1, wherein the substance is alcohol and the CpG dinucleotide repeat motif is located in a gene from Table 6 or Table 13.

4. The kit of claim 1, wherein the substance is cannabis and the CpG dinucleotide repeat motif is located in a gene from Table 7 or Table 14.

5. A screening kit for determining whether a subject has the likelihood of using, abusing or being dependent upon a substance comprising at least one probe specific for a methylated monoamine oxidase A (MAOA) locus or a methylated monoamine oxidase B (MAOB) locus in a peripheral blood cell, wherein the methylation of MAOA is associated with the using, abusing or being dependent upon the substance.

6. The kit of claim 5, wherein the substance use disorder is nicotine dependence.

7. The kit of claim 6, wherein the probe detects methylation at CpG residue 18, 42, 48, 52, 64, 65, 66, 67, 68, 69, and/or 77 of MAOA.

8. The kit of claim 5, wherein the substance use disorder is alcohol dependence.

9. The kit of claim 8, wherein the probe detects methylation at CpG residue 27, 38, 41 and/or 48 of MAOA.

10. The kit of claim 5, wherein the substance use disorder is cannabis dependence.

11. The kit of claim 10, wherein the subject is female and the probe detects methylation at CpG residue 69 and/or 88 of MAOA.

12. The kit of claim 10, wherein the subject is male and the probe detects methylation at CpG residue 11-12, 13, 64, 69, 72 and/or 73 of MAOA.

13. The kit of any one of claims 1 to 12, wherein the substrate is a polymer, glass, semiconductor, paper, metal, gel or hydrogel.

14. The kit of any one of claims 1 to 13, further comprising a solid substrate and at least one control probe, wherein the at least one control probe is bound onto the substrate in a distinct spot.

15. The kit of any one of claims 1 to 14, wherein the solid substrate is a microarray or microfluidics card.

16. The kit of any one of claims 1 to 15, wherein the probe is an oligonucleotide probe.

17. The kit of any one of claims 1 to 16, wherein the probe is a nucleic acid derivative probe.

18. A screening kit that uses bisulfite treated DNA for determining whether a subject has the likelihood of using, abusing or being dependent upon a substance comprising:

(a) a single base pair extension probe, with at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or

(b) a single base pair extension probe, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or

(c) a single base pair extension probe, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence.

19. A screening kit that uses bisulfite treated DNA for determining whether a subject has the likelihood of having a substance use disorder or substance use syndrome comprising:

(a) a nucleic acid primer, with at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or

(b) a nucleic acid primer, at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or

(c) a nucleic acid primer, at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence.

20. A diagnostic method using bisulfite treated DNA for determining whether a subject has the likelihood of having a substance use disorder or substance use syndrome comprising:

(a) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or

(b) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or

(c) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence.

21. A diagnostic method for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder, by determining a nucleic acid methylation profile from a single type of peripheral blood cell or blood cell derivative from the subject, the method comprising:

(a) obtaining a profile associated with the sample, wherein the profile comprises quantitative data for methylation of a monoamine oxidase A (MAOA) locus in the blood cell;

(b) inputting the data into an analytical process that uses the data to classify the sample, wherein the classification is a "substance use disorder" classification or a "healthy" classification; and

(c) classifying the sample according to the output of the process.

22. The method of claim 20 or 21, wherein the blood cell is a lymphocyte.

23. The method of claim 20 or 21, wherein the blood cell type is a monocyte.

24. The method of claim 20 or 21, wherein the blood cell type is a basophil.

25. The method of claim 20 or 21, wherein the blood cell type is an eosinophil.

26. The method of claim 20 or 21, wherein the blood cell type is a neutrophil.

27. The method of claim 20 or 21, wherein the blood cell type is a mixture of peripheral white blood cells.

28. The method of claim 20 or 21, wherein the peripheral blood cell has been transformed into a cell line.

29. The method of claim 28, wherein the analytical process comprises comparing the obtained profile with a reference profile.

30. The method of claim 28, wherein the reference profile comprises data obtained from one or more healthy control subjects, or comprises data obtained from one or more subjects diagnosed with a substance use disorder.

31. The method of claim 28, further comprising obtaining a statistical measure of a similarity of the obtained profile to the reference profile.

32. The method of claim 28, wherein the blood cell or blood cell derivative is a peripheral blood cell.

33. The method of any of claims 20 to 32, wherein the profile is obtained by sequencing of methylated DNA.

Description:
COMPOSITIONS AND METHODS FOR DETECTING PREDISPOSITION TO A SUBSTANCE USE DISORDER

RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application No. 61/173,274, filed April 28, 2009, the entirety of which is incorporated herein by reference.

STATEMENT OF GOVERNMENT SUPPORT

Work related to this invention was funded by the U.S. government (NIH Grants DA015789, DA010923, DA02173603, MH080898, and P30DA027827). The government has certain rights in this patent.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on May 7, 2010, is named 17023100.txt and is 9,145 bytes in size.

BACKGROUND

Substance use disorders cause serious problems, both for the affected individuals and for society in general. Despite intensive research, however, a reliable laboratory test for diagnosing a patient as having, or for being at risk for developing, such conditions has not been developed. Such diagnoses are still generally made clinically, on the basis of observed behavior. Given the difficulties of defining normal experience and behavior and the lack of reliable objective indicators, it is not surprising that to date systems of diagnosis in psychiatry have been less than satisfactory. A reliable laboratory test would be of practical value in everyday clinical practice, for example, in assisting doctors in prescribing the appropriate treatment for their patients. Thus, methods of identifying subjects that have, or are at risk for developing, substance use disorders are needed.

SUMMARY OF CERTAIN EMBODIMENTS OF THE INVENTION

The present invention provides a screening kit for determining whether a human subject has the likelihood of using, abusing or being dependent upon a substance comprising: (a) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or (b) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or (c) a solid substrate, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence. As used herein, the term "methylation status" means the determination whether a certain target DNA, such as a CpG dinucleotide, is methylated. As used herein, the term "CpG dinucleotide repeat motif" means a series of two or more CpG dinucleotides positioned in a DNA sequence.

In certain embodiments of the present invention, the substance is nicotine and the CpG dinucleotide repeat motif is located in a gene from Table 5 or Table 9. In certain embodiments of the present invention, the substance is alcohol and the CpG dinucleotide repeat motif is located in a gene from Table 6 or Table 13. In certain embodiments of the present invention, the substance is cannabis and the CpG dinucleotide repeat motif is located in a gene from Table 7 or Table 14.

The present invention also provides a screening kit for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder including at least one probe specific for a methylated monoamine oxidase A (MAOA) or monoamine oxidase B (MAOB) locus in a peripheral blood cell, wherein the methylation of MAOA is associated with a substance use disorder. In certain embodiments, the kit further includes a solid substrate, wherein each probe is bound onto the substrate in a distinct spot. In certain embodiments, the substance use disorder is nicotine dependence. In certain embodiments, the probe detects methylation at CpG residue 18, 42, 48, 52, 64, 65, 66, 67, 68, 69, and/or 77. In certain embodiments, the substance use disorder is alcohol dependence. In certain embodiments, the probe detects methylation at CpG residue 27, 38, 41 and/or 48. In certain embodiments, the substance use disorder is cannabis dependence. In certain embodiments, the subject is female and the probe detects methylation at CpG residue 69 and/or 88 of MAOA. In certain embodiments, the subject is male and the probe detects methylation at CpG residue 11-12, 13, 64, 69, 72 and/or 73 of MAOA. In certain embodiments, the substrate is a polymer, glass, semiconductor, paper, metal, gel or hydrogel. In certain embodiments, the kit further includes at least one control probe, wherein the at least one control probe is bound onto the substrate in a distinct spot. In certain embodiments, the solid substrate is a microarray or microfluidics card. In certain embodiments, the probe is an oligonucleotide probe or a nucleic acid derivative probe.

The present invention provides a screening kit that uses bisulfite treated DNA for determining whether a subject has the likelihood of using, abusing or being dependent upon a substance comprising: (a) a single base pair extension probe, with at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or (b) a single base pair extension probe, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or (c) a single base pair extension probe, at least one probe specific for methylation status of a CpG dinucleotide repeat motif expressed by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence. As used herein, a "single base pair extension probe" is a nucleic acid that selectively recognizes a single nucleotide polymorphism (i.e., either the A or the G of an A/G polymorphism). Generally, these probes take the form of a DNA primer (e.g., as in PCR primers) that is modified so that incorporation of the primer releases a fluorophore. One example of this is a TaqMan® probe that uses the 5' exonuclease activity of the enzyme Taq Polymerase for measuring the amount of target sequences in the samples. TaqMan® probes consist of an 18-22 bp oligonucleotide probe, which is labeled with a reporter fluorophore at the 5' end, and a quencher fluorophore at the 3' end. Incorporation of the probe molecule into a PCR chain (which occurs because the probe set is contained in a mixture of PCR primers) liberates the reporter fluorophore from the effects of the quencher. The primer must be able to recognize the target binding site. Some primer extension probes can be "activated" directly by DNA polymerase without a full PCR extension cycle.
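As a minimal illustration of why a single base pair extension probe can read methylation status from bisulfite treated DNA: bisulfite treatment converts unmethylated cytosine to uracil (read as T after amplification) while leaving methylated cytosine as C, so the methylation status of a CpG shows up as a C/T difference at that position. The sketch below is illustrative only and is not part of the disclosed kits; the function name, the toy sequence, and the chosen methylated position are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is read as T; a methylated C (0-based index listed in
    methylated_positions) stays C. Only the given strand is modeled.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # unmethylated cytosine is converted
        else:
            out.append(base)  # methylated C and all other bases are kept
    return "".join(out)


# Hypothetical toy sequence with CpG cytosines at indices 1 and 4.
toy = "ACGTCGAC"
converted = bisulfite_convert(toy, methylated_positions=[1])
print(converted)
```

A probe that distinguishes C from T at index 1 of the converted sequence would therefore report whether that cytosine was methylated in the original sample.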

The present invention provides a screening kit that uses bisulfite treated DNA for determining whether a subject has the likelihood of having a substance use disorder or substance use syndrome comprising: (a) a nucleic acid primer, with at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or (b) a nucleic acid primer, at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or (c) a nucleic acid primer, at least one primer specific for methylation status of a CpG dinucleotide repeat motif region contained by a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence. In certain embodiments, the kit may contain a number of primers that is any integer between 1 and 10,000, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . 9997, 9998, 9999, 10,000. As used herein, the term "nucleic acid primer" encompasses both DNA and RNA primers.

The present invention provides a diagnostic method using bisulfite treated DNA for determining whether a subject has the likelihood of having a substance use disorder or substance use syndrome comprising: (a) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with nicotine use, abuse or dependence; and/or (b) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with alcohol use, abuse or dependence; and/or (c) determining methylation status of a CpG dinucleotide repeat motif region in a peripheral blood cell or its derivative, wherein the methylation status of the CpG dinucleotide is associated with cannabis use, abuse or dependence. In certain embodiments, the method determines the methylation status of a plurality of CpG dinucleotide repeat motif regions. Such a plurality may be any integer between 1 and 10,000, such as at least 100.

The present invention provides a diagnostic method for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder, by determining a nucleic acid methylation profile from a single type of peripheral blood cell or blood cell derivative from the subject, the method comprising: (a) obtaining a profile associated with the sample, wherein the profile comprises quantitative data for methylation of a monoamine oxidase A (MAOA) locus in the blood cell; (b) inputting the data into an analytical process that uses the data to classify the sample, wherein the classification is a "substance use disorder" classification or a "healthy" classification; and (c) classifying the sample according to the output of the process.

In certain embodiments of the present invention, the blood cell is a lymphocyte, a monocyte, a basophil, an eosinophil, and/or a neutrophil. In certain embodiments, the blood cell type is a mixture of peripheral white blood cells. In certain embodiments, the peripheral blood cell has been transformed into a cell line.

In certain embodiments, the analytical process comprises comparing the obtained profile with a reference profile. In certain embodiments, the reference profile comprises data obtained from one or more healthy control subjects, or comprises data obtained from one or more subjects diagnosed with a substance use disorder. In certain embodiments, the method further comprises obtaining a statistical measure of a similarity of the obtained profile to the reference profile. In certain embodiments, the blood cell or blood cell derivative is a peripheral blood cell. In certain embodiments, the profile is obtained by sequencing of methylated DNA, such as by digital sequencing.

The present invention provides a diagnostic method for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder, by determining a nucleic acid methylation profile from a single type of blood cell or blood cell derivative from the subject, the method involving: (a) obtaining a profile associated with the sample, wherein the profile determines quantitative data for methylation of a monoamine oxidase A (MAOA) locus in the blood cell; (b) inputting the data into an analytical process that uses the data to classify the sample, wherein the classification is a "substance use disorder" classification or a "healthy" classification; and (c) classifying the sample according to the output of the process. In certain embodiments, the analytical process involves comparing the obtained profile with a reference profile. In certain embodiments, the reference profile provides data obtained from one or more healthy control subjects, or provides data obtained from one or more subjects diagnosed with a substance use disorder. In certain embodiments, the method further involves obtaining a statistical measure of a similarity of the obtained profile to the reference profile. In certain embodiments, the blood cell or blood cell derivative is a peripheral blood cell. In certain embodiments, the blood cell is a lymphocyte. In certain embodiments, the lymphocyte type is a B-lymphocyte. In certain embodiments, the B-lymphocytes have been immortalized. In certain embodiments, the blood cell type is a monocyte. In certain embodiments, the blood cell type is a basophil. In certain embodiments, the substance use disorder is nicotine dependence, alcohol dependence, or cannabis dependence.
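Steps (a) to (c) above, together with the comparison against a reference profile, might be sketched as follows. This is an illustrative assumption only: the application does not specify the analytical process or the statistical measure of similarity, so the sketch substitutes a simple Euclidean distance to the mean profile of each reference group. All function names and methylation values are hypothetical.

```python
import math


def mean_profile(profiles):
    """Average methylation ratio at each CpG residue across subjects."""
    n = len(profiles)
    width = len(profiles[0])
    return [sum(p[i] for p in profiles) / n for i in range(width)]


def distance(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(profile, healthy_ref, disorder_ref):
    """Assign the label of the nearer reference profile.

    Returns the label plus both distances, which serve here as a crude
    (assumed) measure of similarity to each reference.
    """
    d_healthy = distance(profile, healthy_ref)
    d_disorder = distance(profile, disorder_ref)
    label = "healthy" if d_healthy <= d_disorder else "substance use disorder"
    return label, d_healthy, d_disorder


# Made-up methylation ratios at three CpG residues for two subjects per group.
healthy_ref = mean_profile([[0.10, 0.20, 0.30], [0.20, 0.20, 0.40]])
disorder_ref = mean_profile([[0.50, 0.60, 0.70], [0.60, 0.60, 0.80]])

label, d_h, d_d = classify([0.55, 0.58, 0.72], healthy_ref, disorder_ref)
print(label)
```

A real analytical process would need validated reference cohorts and a properly evaluated classifier; the nearest-reference rule is used here only because it makes the three steps easy to follow.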

In certain embodiments, a solid substrate may contain a number of probes that is any integer between 1 and 10,000 probes, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . 9997, 9998, 9999, 10,000. In one kit, all of the probes may be physically located on a single solid substrate or on multiple substrates.

In certain embodiments, the current invention can also take the form of a PCR (polymerase chain reaction) assay. In some cases, this will take the form of real-time PCR (RTPCR) assays. In certain embodiments of these PCR assays, a kit may contain two primers that specifically amplify a region of MAOA and a gene-specific probe that selectively recognizes the amplified region. Together, the primers and the gene-specific probe are referred to as a primer-probe set. By measuring the amount of gene-specific probe that has hybridized to an amplified segment at a given point of the PCR reaction or throughout the PCR reaction, one who is skilled in the art can infer the amount of nucleic acid originally present at the start of the reaction. In some cases, the amount of probe hybridized is measured through fluorescence spectrophotometry. The number of primer-probe sets can be any integer between 1 and 10,000, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . 9997, 9998, 9999, 10,000. In one kit, all of the probes may be physically located in a single reaction well or in multiple reaction wells. The probes may be in dry or in liquid form. They may be used in a single reaction or in a series of reactions. In certain embodiments, the probe is an oligonucleotide probe. In certain embodiments, the probe is a nucleic acid derivative probe.

The term "substrate" refers to any solid support to which the probes may be attached. The substrate material may be modified, covalently or otherwise, with coatings or functional groups to facilitate binding of probes. Suitable substrate materials include polymers, glasses, semiconductors, papers, metals, gels and hydrogels among others. Substrates may have any physical shape or size, e.g., plates, strips, or microparticles.
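The inference of starting nucleic acid amount from real-time PCR signal, mentioned above, can be illustrated with the standard idealized amplification model, in which the amplicon amount after n cycles is N0 * (1 + E)^n for a per-cycle efficiency E (E = 1 meaning perfect doubling). Under that model, two samples that cross the same fluorescence threshold at different cycle numbers differ in starting amount by (1 + E) raised to the cycle difference. The application does not prescribe a quantification model; the function name and threshold-cycle values below are hypothetical.

```python
def relative_starting_amount(ct_sample, ct_reference, efficiency=1.0):
    """Ratio of starting template in the sample to that in the reference.

    Assumes both reactions amplify with the same constant efficiency and
    are read at the same fluorescence threshold. A lower threshold cycle
    means more starting template.
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


# Hypothetical threshold cycles: the sample crosses threshold 3 cycles earlier.
ratio = relative_starting_amount(ct_sample=24, ct_reference=27)
print(ratio)
```

With perfect doubling, crossing the threshold three cycles earlier corresponds to eight times as much starting template; real assays typically estimate the efficiency from a dilution series rather than assuming it.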

The term "spot" refers to a distinct location on a substrate to which probes of known sequence or sequences are attached. A spot may be an area on a planar substrate, or it may be, for example, a microparticle distinguishable from other microparticles.

The term "bound" means affixed to the solid substrate. A spot is "bound" to the solid substrate when it is affixed in a particular location on the substrate for purposes of the screening assay. In certain embodiments of the kit of the present invention, the substrate is a polymer, glass, semiconductor, paper, metal, gel or hydrogel. In certain embodiments of the present invention, the kit further includes a solid substrate and at least one control probe, wherein the at least one control probe is bound onto the substrate in a distinct spot. In certain embodiments of the present invention, the solid substrate is a microarray. An "array" or "microarray" is used synonymously herein to refer to a plurality of probes attached to one or more distinguishable spots on a substrate. A microarray may include a single substrate or a plurality of substrates, for example a plurality of beads or microspheres. A "copy" of a microarray contains the same types and arrangements of probes.

The present invention also provides a composition for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder by determining a nucleic acid methylation profile from a single type of blood cell or blood cell derivative from the subject, the method including (a) obtaining a profile associated with the sample, wherein the profile includes quantitative data for MAOA; (b) inputting the data into an analytical process that uses the data to classify the sample, wherein the classification is a "substance use disorder" classification or a "healthy" classification; and (c) classifying the sample according to the output of the process. In certain embodiments, a solid substrate may contain a number of probes that is any integer between 1 and 10,000 probes, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . 9997, 9998, 9999, 10,000. As used herein, the term "healthy" means that a subject does not manifest a particular condition, and is no more likely than at random to be susceptible to a particular condition.

The present invention also provides a composition for determining whether a subject has a predisposition to, or likelihood of having nicotine dependence, alcohol dependence or cannabis dependence including (a) a solid substrate; (b) at least one probe specific for a methylated MAOA gene associated with nicotine dependence, alcohol dependence or cannabis dependence wherein each probe is bound onto the substrate in a distinct spot. In certain embodiments, a solid substrate may contain a number of probes that is any integer between 1 and 10,000 probes, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . 9997, 9998, 9999, 10,000.

In addition to the specific biomarker sequences identified in this application by name, accession number, or sequence, the invention also contemplates use of biomarker variants that are at least 90% or at least 95% or at least 97% identical to the exemplified sequences and that are now known or later discovered and that have utility for the methods of the invention. These variants may represent polymorphisms, splice variants, mutations, and the like. Various techniques and reagents find use in the diagnostic methods of the present invention. In one embodiment of the invention, blood samples, or samples derived from blood, e.g. plasma, circulating, etc. are assayed for the presence of polypeptides. Typically a blood sample is drawn, and a derivative product, such as plasma or serum, is tested. Such polypeptides may be detected through specific binding members. The use of antibodies for this purpose is of particular interest. Various formats find use for such assays, including antibody arrays; ELISA and RIA formats; binding of labeled antibodies in suspension/solution and detection by flow cytometry, mass spectroscopy, and the like. Detection may utilize one or a panel of antibodies, preferably a panel of antibodies in an array format. Expression signatures typically utilize a detection method coupled with analysis of the results to determine if there is a statistically significant match with a disease signature.

The present invention also provides a composition for determining whether a subject has a predisposition to, or likelihood of having nicotine dependence, alcohol dependence or cannabis dependence including a PCR or RTPCR assay kit containing at least one primer-probe set specific for a methylated MAOA nucleic acid.

The present invention also provides a diagnostic method for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder by determining a nucleic acid methylation profile from a single type of blood cell or a blood cell derivative from the subject, the method involves (a) obtaining a profile associated with the sample, wherein the profile comprises quantitative data for at least one methylated MAOA nucleic acid; (b) inputting the data into an analytical process that uses the data to classify the sample, wherein the classification is a "substance use disorder" classification or a "healthy" classification; and (c) classifying the sample according to the output of the process. In certain embodiments, the analytical process comprises comparing the obtained profile with a pre-determined reference profile. In certain embodiments the reference profile comprises data obtained from one or more healthy control subjects, or comprises data obtained from one or more subjects diagnosed with a substance use disorder. In certain embodiments, the method further involves obtaining a statistical measure of a similarity of the obtained profile to the reference profile.

In certain embodiments the blood cell is a lymphocyte. In certain embodiments the lymphocyte type is a B-lymphocyte. In certain embodiments, the B-lymphocytes have been immortalized. In certain embodiments, the blood cell type is a monocyte. In certain embodiments, the blood cell type is a basophil.

The present invention provides a diagnostic method for determining whether a subject has a predisposition to, or likelihood of having, a substance use disorder. As used herein the term "predisposition" is defined as a tendency or susceptibility to manifest a condition. A subject is more likely than a control subject to manifest the condition. The term "substance use disorder" includes both abuse and dependence on a substance. The method involves determining a nucleic acid methylation profile from cells in a biological sample from the subject, wherein a given profile indicates that the subject has a predisposition to, or likelihood of having, a substance use disorder. The substance use disorder to be diagnosed may include nicotine dependence and/or alcohol dependence.

The present invention also provides a method for diagnosing a predisposition to, or likelihood of having, a substance use disorder, where the method involves (a) determining a nucleic acid methylation profile of MAOA from a single type of cell from a biological sample from the subject; and (b) comparing the nucleic acid methylation profile with a nucleic acid methylation profile characteristic of the condition to determine if the subject has a predisposition to, or likelihood of having, a substance use disorder.

The present invention further provides a method for evaluating and treating a patient experiencing a substance use disorder, where the method involves (a) obtaining a baseline laboratory profile comprising collecting blood from the patient to determine the patient's baseline MAOA nucleic acid methylation profile level from a single type of cell; (b) treating the patient for the substance use disorder; (c) obtaining a post-treatment laboratory profile comprising collecting blood from the patient to determine the patient's post-treatment MAOA nucleic acid methylation profile level from the same type of cell tested previously; and (d) comparing the baseline and post-treatment laboratory profiles to evaluate the effectiveness of the treatment.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1. The sequence and structure of the MAOA promoter region (SEQ ID NO:9). The first CpG island begins at bp 43398975 and contains 18 CpG residues. A second CpG island begins at bp 43399493 and contains 70 CpG residues. The position of each of the CpG residues is noted in the figure. The first exon of MAOA is denoted by small letters and is wholly contained within the second island. The positions of the primers used to amplify the MAOA VNTR are denoted by boxed letters. The transcription start site (TSS) is at bp 43400353 between CpG residues 64 and 65.

Figure 2. The average methylation ratios (methyl CpG/total CpG) at each CpG residue for each sex. The bp position on the X chromosome is given on the X axis and corresponds to the position of each of the residues in Figure 1. The average values for female subjects are depicted by blue squares, while the average values for males are depicted by red circles. The position of MAOA exon 1 is denoted by the box with the direction of transcription being indicated by the line with arrows.

Figure 3. The relationship of MAOA VNTR genotype to methylation in females (above) and males (below). There was a trend for female 3,3 homozygotes to have higher average methylation (methyl CpG/total CpG) than female 4,4 homozygotes (43.3% ± 3.8 vs 40.9% ± 5.2; p<0.10). There was no significant difference between males hemizygous for the 3 repeat allele as compared to those with the 4 allele, although the arithmetic difference was in the same direction (9.0 ± 3.7 vs 8.3 ± 2.6; p<0.32).

Figure 4. The average methylation (methyl CpG/total CpG) at each CpG residue for each sex. The first island consists of 18 CpG residues while the second larger island consists of 70 residues, of which only the first 56 were analyzed in this study. Tick marks at the positions corresponding to CpG 24, 26 and 28 are missing because average methylation could not be reliably determined at those residues. The average methylation value for females at each residue is depicted by a pink square while the corresponding value for males is depicted by a blue diamond. The overall average methylation value is depicted by the value at position 75 (34.8% and 7.2% for females and males, respectively). The exact position of the transcription start site is between CpG 65 and CpG 66.

Figure 5. The relationship of MAOA VNTR genotype to methylation in females (above) and males (below). There was no significant difference between males with the 3R allele (n=35; mean Z score -0.03; non-transformed average methylation (NTWAM) 7.1%) and those with the 4R allele (n=61; mean Z score -0.03; NTWAM 7.2%). Female 4R homozygotes (Z = -0.101, NTWAM 33.6%) had significantly lower methylation than 3,4 heterozygotes (Z = 0.137, NTWAM 36.2%; p<0.01). The difference between 4R and 3R homozygotes (Z = 0.007, NTWAM 34.7%) was not statistically significant (p<0.39).

Figures 6A-6F. Plot of average methylation Z score at each residue in LB DNA for each grouping of smoking status. CpG residues are in order from left to right. The hatched bar indicates the residues in the first promoter island. The open bar indicates the TSS region. Group A. Current daily male smokers (n=42). Group B. Males who have quit smoking (n=20). Group C. Males who have never smoked daily (n=59). Group D. Current daily female smokers (n=45). Group E. Females who have quit (n=27). Group F. Females who have never smoked daily (n=83).

Figures 7A-7F. Plot of average methylation Z score at each residue in LB (A, B and C) or WB (D, E and F) DNA from 77 female subjects as a function of smoking status. The CpG residues are in order from left to right. The hatched bar indicates the residues from the first promoter island. The open bar indicates the TSS region. Group A and D. Female daily smokers (n=24). Group B and E. Females who have quit smoking (n=15). Group C and F. Females who have never smoked daily (n=38).

Figure 8. The sequence of the AX2R promoter associated CpG island according to the UCSC Genome Browser, Build 18. The areas corresponding to the probes listed in Table 11 are highlighted and boxed. The CpG residues in the island are numbered 1 through 37 and correspond to the numbers given in Table 12.

DETAILED DESCRIPTION

DNA methylation

DNA does not exist as naked molecules in the cell. For example, DNA is associated with proteins called histones to form a complex substance known as chromatin. Chemical modifications of the DNA or the histones alter the structure of the chromatin without changing the nucleotide sequence of the DNA. Such modifications are described as "epigenetic" modifications of the DNA. Changes to the structure of the chromatin can have a profound influence on gene expression. If the chromatin is condensed, factors involved in gene expression may not have access to the DNA, and the genes will be switched off. Conversely, if the chromatin is "open," the genes can be switched on. Some important forms of epigenetic modification are DNA methylation and histone deacetylation. DNA methylation is a chemical modification of the DNA molecule itself and is carried out by an enzyme called DNA methyltransferase. Methylation can directly switch off gene expression by preventing transcription factors from binding to promoters. A more general effect is the attraction of methyl-binding domain (MBD) proteins. These are associated with further enzymes called histone deacetylases (HDACs), which function to chemically modify histones and change chromatin structure. Chromatin containing acetylated histones is open and accessible to transcription factors, and the genes are potentially active. Histone deacetylation causes the condensation of chromatin, making it inaccessible to transcription factors and causing the silencing of genes.

CpG islands are short stretches of DNA in which the frequency of the CpG sequence is higher than in other regions. The "p" in the term CpG indicates that cytosine ("C") and guanine ("G") are connected by a phosphodiester bond. CpG islands are often located around promoters of housekeeping genes and many regulated genes. At these locations, the CG sequence is not methylated. By contrast, the CG sequences in inactive genes are usually methylated to suppress their expression. About 56% of human genes and 47% of mouse genes are associated with CpG islands. Often, CpG islands overlap the promoter and extend about 1000 base pairs downstream into the transcription unit. Identification of potential CpG islands during sequence analysis helps to define the extreme 5' ends of genes, something that is notoriously difficult with cDNA-based approaches.
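
As a rough sketch of how an elevated CpG frequency can be checked during sequence analysis, the following computes GC content and the observed/expected CpG ratio for a sequence. The thresholds used (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are widely used screening criteria and are assumptions here; they are not stated in this application, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently of each other.
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content and CpG-ratio thresholds."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio

# Synthetic sequences for illustration only.
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```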

The methylation of a CpG island can be determined by the art worker using any method suitable to determine such methylation. For example, the art worker can use a bisulfite reaction-based method for determining such methylation.

The present invention provides methods to determine the nucleic acid methylation of MAOA of a patient in order to predict the clinical course and eventual outcome of patients suspected of being predisposed to, or of having, a substance use disorder. Previously, the only way to determine possible diagnoses was through subjective psychiatric evaluations. The present methods provide an objective component to the diagnosis process. Nicotine dependence is the physical vulnerability of a person's body to the chemical nicotine, which is potently addicting when delivered by various tobacco products. Smoke from cigarettes, cigars and pipes contains thousands of chemicals, including nicotine. Nicotine is also found in chewing tobacco. Alcohol dependence is the physical vulnerability of a person's body to the chemical ethyl alcohol.

In particular, in certain embodiments of the invention, the methods may be practiced as follows. A sample, such as a blood sample, is taken from a patient. In certain embodiments, a single cell type, e.g., lymphocytes, basophils, or monocytes, may be isolated from the blood for further testing. The DNA is harvested from the sample and examined to determine if the MAOA region is methylated. For example, the DNA of interest can be treated with bisulfite to deaminate unmethylated cytosine residues to uracil. Since uracil base-pairs with adenine, thymidines are incorporated into subsequent DNA strands in the place of unmethylated cytosine residues during subsequent PCR amplifications. Next, the target sequence is amplified by PCR and probed with a MAOA-specific probe. Only DNA from the patient that was methylated will bind to the probe. A specific profile associates with a specific condition.
For example, certain methylated CpG islands in MAOA are found with women having nicotine dependence (or who are predisposed to having nicotine dependence), and certain methylated CpG islands in MAOA are found with women having alcohol dependence (or who are predisposed to having alcohol dependence). Namely, methylated CpG islands 18, 42, 48, 52, 64-69 and 77 are associated with nicotine dependence, and methylated CpG islands 27, 38, 41 and 48 are associated with alcohol dependence. Methods of determining the patient nucleic acid profile are well known to the art worker and include any of the well-known detection methods. Various PCR methods are described, for example, in PCR Primer: A Laboratory Manual, Dieffenbach & Dveksler, Eds., Cold Spring Harbor Laboratory Press, 1995. Other analysis methods include, but are not limited to, nucleic acid quantification, restriction enzyme digestion, DNA sequencing, hybridization technologies, such as Southern blotting, etc., amplification methods such as Ligase Chain Reaction (LCR), Nucleic Acid Sequence Based Amplification (NASBA), Self-sustained Sequence Replication (SSR or 3SR), Strand Displacement Amplification (SDA), and Transcription Mediated Amplification (TMA), Quantitative PCR (qPCR), or other DNA analyses, as well as RT-PCR, in vitro translation, Northern blotting, and other RNA analyses. In another embodiment, hybridization on a microarray is used.
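
The bisulfite step described above can be illustrated in silico. The sketch below assumes a single DNA strand, complete conversion, and a known list of methylated cytosine positions; the sequence, the function names and the 0-based position convention are hypothetical and serve only to show why methylated and unmethylated cytosines become distinguishable after amplification.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one DNA strand.

    Unmethylated C is deaminated to U and read as T after amplification;
    methylated C (0-based indices in `methylated_positions`) stays C.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

def methylation_calls(reference, converted):
    """Return {position: True/False} for each CpG cytosine in `reference`."""
    reference = reference.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            # A cytosine that survived conversion was methylated.
            calls[i] = converted[i] == "C"
    return calls

reference = "ACGTCCGATCG"  # hypothetical sequence, for illustration only
converted = bisulfite_convert(reference, methylated_positions={1})
print(converted)
print(methylation_calls(reference, converted))
```

Comparing the converted read with the untreated reference at each CpG position gives a per-residue methylation call, which is the kind of quantitative data a methylation profile is built from.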

As used herein, the term "nucleic acid probe" or a "probe specific for" a nucleic acid means a nucleic acid sequence that has at least about 80%, e.g., at least about 90%, e.g., at least about 95% contiguous sequence identity or homology to the nucleic acid sequence encoding the targeted sequence of interest. A probe (or oligonucleotide or primer) of the invention has at least about 7-50, e.g., at least about 10-40, e.g., at least about 15-35, nucleotides. The oligonucleotide probes or primers of the invention may comprise at least about seven nucleotides at the 3' end of the oligonucleotide that have at least about 80%, e.g., at least about 85%, e.g., at least about 90% contiguous identity to the targeted sequence of interest.

"Northern analysis" or "Northern blotting" is a method used to identify RNA sequences that hybridize to a known probe such as an oligonucleotide, DNA fragment, cDNA or fragment thereof, or RNA fragment. The probe is labeled with a radioisotope such as 32P, by biotinylation or with an enzyme. The RNA to be analyzed is usually electrophoretically separated on an agarose or polyacrylamide gel, transferred to nitrocellulose, nylon, or other suitable membrane, and hybridized with the probe, using standard techniques well known in the art.

"Stringent conditions" are those that (1) employ low ionic strength and high temperature for washing, for example, 0.015 M NaCl/0.0015 M sodium citrate (SSC); 0.1% sodium lauryl sulfate (SDS) at 50°C, or (2) employ a denaturing agent such as formamide during hybridization, e.g., 50% formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM NaCl, 75 mM sodium citrate at 42°C. Another example is use of 50% formamide, 5 x SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5 x Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2 x SSC and 0.1% SDS. Other examples of stringent conditions are well known in the art.

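The salt concentrations quoted for the stringent conditions above follow from the stated 5 x SSC composition (0.75 M NaCl, 0.075 M sodium citrate), which implies 0.15 M NaCl and 0.015 M sodium citrate per 1 x SSC. As a small arithmetic sketch, with hypothetical function names, the conversion between SSC strength and molarity could be written as follows.

```python
# Molarity per 1 x SSC, derived from the 5 x SSC composition given above.
NACL_M_PER_X = 0.75 / 5
CITRATE_M_PER_X = 0.075 / 5

def ssc_molarity(strength):
    """Return (NaCl M, sodium citrate M) for SSC of the given strength."""
    return strength * NACL_M_PER_X, strength * CITRATE_M_PER_X

def ssc_strength(nacl_molar):
    """Return the SSC strength (in 'x') matching a NaCl molarity."""
    return nacl_molar / NACL_M_PER_X

for x in (0.1, 0.2, 5):
    nacl, citrate = ssc_molarity(x)
    print(f"{x} x SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")
```

On this basis the low ionic strength wash (0.015 M NaCl/0.0015 M sodium citrate) corresponds to 0.1 x SSC, and the 750 mM NaCl/75 mM sodium citrate hybridization buffer corresponds to 5 x SSC.
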
The term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, made of monomers (nucleotides) containing a sugar, phosphate and a base that is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. The terms "nucleic acid," "nucleic acid molecule," or "polynucleotide" are used interchangeably and may also be used interchangeably with gene, cDNA, DNA and/or RNA encoded by a gene.

The term "nucleotide sequence" refers to a polymer of DNA or RNA which can be single-stranded or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers. A DNA molecule or polynucleotide is a polymer of deoxyribonucleotides (A, G, C, and T), and an RNA molecule or polynucleotide is a polymer of ribonucleotides (A, G, C and U). A "gene," for the purposes of the present disclosure, includes a DNA region encoding a gene product, as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. The term "gene" is used broadly to refer to any segment of nucleic acid associated with a biological function. Genes include coding sequences and/or the regulatory sequences required for their expression. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions. For example, "gene" refers to a nucleic acid fragment that expresses mRNA, functional RNA, or specific protein, including regulatory sequences. "Functional RNA" refers to sense RNA, antisense RNA, ribozyme RNA, siRNA, or other RNA that may not be translated but yet has an effect on at least one cellular process. "Genes" also include nonexpressed DNA segments that, for example, form recognition sequences for other proteins. "Genes" can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

"Gene expression" refers to the conversion of the information, contained in a gene, into a gene product. It refers to the transcription and/or translation of an endogenous gene, heterologous gene or nucleic acid segment, or a transgene in cells. In addition, expression refers to the transcription and stable accumulation of sense (mRNA) or functional RNA. Expression may also refer to the production of protein. The term "altered level of expression" refers to the level of expression in transgenic cells or organisms that differs from that of normal or untransformed cells or organisms.

A gene product can be the direct transcriptional product of a gene (e.g., mRNA, tRNA, rRNA, antisense RNA, ribozyme, structural RNA or any other type of RNA) or a protein produced by translation of an mRNA. Gene products also include RNAs which are modified, by processes such as capping, polyadenylation, methylation, and editing, and proteins modified by, for example, methylation, acetylation, phosphorylation, ubiquitination, ADP-ribosylation, myristilation, and glycosylation. The term "RNA transcript" refers to the product resulting from RNA polymerase catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or it may be a RNA sequence derived from posttranscriptional processing of the primary transcript and is referred to as the mature RNA. "Messenger RNA" (mRNA) refers to the RNA that is without introns and that can be translated into protein by the cell. "cDNA" refers to a single- or a double-stranded DNA that is complementary to and derived from mRNA.

A "coding sequence," or a sequence that "encodes" a selected polypeptide, is a nucleic acid molecule that is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral (e.g., DNA viruses and retroviruses) or prokaryotic DNA, and especially synthetic DNA sequences. A transcription termination sequence may be located 3' to the coding sequence.

Certain embodiments of the invention encompass isolated or substantially purified nucleic acid compositions. In the context of the present invention, an "isolated" or "purified" DNA molecule or RNA molecule is a DNA molecule or RNA molecule that exists apart from its native environment and is therefore not a product of nature. An isolated DNA molecule or RNA molecule may exist in a purified form or may exist in a non-native environment such as, for example, a transgenic host cell. For example, an "isolated" or "purified" nucleic acid molecule is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. In one embodiment, an "isolated" nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.

By "fragment" is intended a polypeptide consisting of only a part of the intact full-length polypeptide sequence and structure. The fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the native polypeptide. A fragment of a protein will generally include at least about 5-10 contiguous amino acid residues of the full-length molecule, preferably at least about 15-25 contiguous amino acid residues of the full-length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full-length molecule, or any integer between 5 amino acids and the full-length sequence.

"Naturally occurring" is used to describe a composition that can be found in nature as distinct from being artificially produced. For example, a nucleotide sequence present in an organism, which can be isolated from a source in nature and which has not been intentionally modified by a person in the laboratory, is naturally occurring.

"Regulatory sequences" and "suitable regulatory sequences" each refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences.

A "5' non-coding sequence" refers to a nucleotide sequence located 5' (upstream) to the coding sequence. It is present in the fully processed mRNA upstream of the initiation codon and may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. A "3' non-coding sequence" refers to nucleotide sequences located 3' (downstream) to a coding sequence and may include polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.

The term "translation leader sequence" refers to that DNA sequence portion of a gene between the promoter and coding sequence that is transcribed into RNA and is present in the fully processed mRNA upstream (5') of the translation start codon. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

A "promoter" refers to a nucleotide sequence, usually upstream (5') to its coding sequence, which directs and/or controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription. "Promoter" includes a minimal promoter that is a short DNA sequence comprised of a TATA-box and other sequences that serve to specify the site of transcription initiation, to which regulatory elements are added for control of expression. "Promoter" also refers to a nucleotide sequence that includes a minimal promoter plus regulatory elements that is capable of controlling the expression of a coding sequence or functional RNA. This type of promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an "enhancer" is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. It is capable of operating in both orientations (normal or flipped), and is capable of functioning even when moved either upstream or downstream from the promoter. Both enhancers and other upstream promoter elements bind sequence-specific DNA-binding proteins that mediate their effects. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even be comprised of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors that control the effectiveness of transcription initiation in response to physiological or developmental conditions.

"Constitutive expression" refers to expression using a constitutive promoter. "Conditional" and "regulated expression" refer to expression controlled by a regulated promoter.

"Operably-linked" refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one of the sequences is affected by another. For example, a regulatory DNA sequence is said to be "operably linked to" or "associated with" a DNA sequence that codes for an RNA or a polypeptide if the two sequences are situated such that the regulatory DNA sequence affects expression of the coding DNA sequence (i.e., that the coding sequence or functional RNA is under the transcriptional control of the promoter). Coding sequences can be operably-linked to regulatory sequences in sense or antisense orientation.

"Expression" refers to the transcription and/or translation of an endogenous gene, heterologous gene or nucleic acid segment, or a transgene in cells. In addition, expression refers to the transcription and stable accumulation of sense (mRNA) or functional RNA. Expression may also refer to the production of protein. The term "altered level of expression" refers to the level of expression in cells or organisms that differs from that of normal cells or organisms.

The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) "reference sequence," (b) "comparison window," (c) "sequence identity," (d) "percentage of sequence identity," and (e) "substantial identity."

(a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.

(b) As used herein, "comparison window" makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that, to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence, a gap penalty is typically introduced and is subtracted from the number of matches. Methods of alignment of sequences for comparison are well-known in the art. Thus, the determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller (Myers and Miller, CABIOS, 4, 11 (1988)); the local homology algorithm of Smith et al. (Smith et al., Adv. Appl. Math., 2, 482 (1981)); the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, JMB, 48, 443 (1970)); the search-for-similarity method of Pearson and Lipman (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85, 2444 (1988)); the algorithm of Karlin and Altschul (Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 87, 2264 (1990)), modified as in Karlin and Altschul (Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90, 5873 (1993)).
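
To illustrate how a gap penalty is subtracted from the match count during alignment, the following is a compact sketch of global alignment in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration; they are not parameters taken from this application or from any particular software package.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

With these assumed values, each gap costs more than a single mismatch, so the alignment introduces a gap only when doing so recovers enough matches to pay for it.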

Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. (Higgins et al., CABIOS, 5, 151 (1989)); Corpet et al. (Corpet et al., Nucl. Acids Res., 16, 10881 (1988)); Huang et al. (Huang et al., CABIOS, 8, 155 (1992)); and Pearson et al. (Pearson et al., Meth. Mol. Biol., 24, 307 (1994)). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al. (Altschul et al., JMB, 215, 403 (1990)) are based on the algorithm of Karlin and Altschul supra.

Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached. In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, less than about 0.01, or even less than about 0.001.
To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. When utilizing BLAST, Gapped BLAST, PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=- 4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. Alignment may also be performed manually by inspection.
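
By way of illustration only, the ungapped extension step described above can be sketched in Python as follows. This is a minimal sketch and not the BLAST implementation itself: the function name `extend_right` and the value chosen for the drop-off X are assumptions made for the example, while M=5 and N=-4 are the BLASTN reward and penalty stated above. Only the rightward direction is shown; the leftward extension would mirror it.

```python
def extend_right(query, subject, q_start, s_start, match=5, mismatch=-4, x_drop=20):
    """Extend a seed hit to the right without gaps.

    Returns (best_score, best_length) for the highest-scoring extension.
    x_drop is an illustrative value for X, not a documented default.
    """
    score = 0
    best_score = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        # Add the reward M for a match or the penalty N for a mismatch.
        if query[q_start + i] == subject[s_start + i]:
            score += match
        else:
            score += mismatch
        i += 1
        if score > best_score:
            best_score = score
            best_len = i
        # Halt when the score has fallen by X from its maximum,
        # or when the cumulative score is zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    # The loop also ends when either sequence is exhausted.
    return best_score, best_len


if __name__ == "__main__":
    print(extend_right("ACGTACGTTT", "ACGTACGAAA", 0, 0, x_drop=8))
```

In this sketch the extension is trimmed back to the position of the maximum score, which is why the length returned can be shorter than the number of positions examined.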

For purposes of the present invention, comparison of nucleotide sequences for determination of percent sequence identity to the promoter sequences disclosed herein may be made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the program.

(c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences makes reference to a specified percentage of residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window, as measured by sequence comparison algorithms or by visual inspection. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity." Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).

(d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
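
The calculation in definition (d) can be illustrated with a short Python sketch. It assumes that the two sequences have already been optimally aligned by some other program and are supplied as equal-length strings in which gaps are written as "-"; the function name `percent_identity` and the gap character are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an already-aligned comparison window.

    Matched positions are divided by the total number of positions in the
    window (gap positions included) and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position counts as matched only if both residues are identical
    # and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # 7 matched positions in a window of 9 positions.
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```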

(e)(i) The term "substantial identity" of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, or 94%, or even at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill in the art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 70%, 80%, 90%, or even at least 95%.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1°C to about 20°C, depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.

(e)(ii) The term "substantial identity" in the context of a peptide indicates that a peptide comprises a sequence with at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, or 94%, or even 95%, 96%, 97%, 98% or 99%, sequence identity to the reference sequence over a specified comparison window. In certain embodiments, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, JMB, 48, 443 (1970)). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Thus, the invention also provides nucleic acid molecules and peptides that are substantially identical to the nucleic acid molecules and peptides presented herein.

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

As noted above, another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). "Bind(s) substantially" refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.

"Stringent hybridization conditions" and "stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments such as Southern and Northern hybridizations are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. The thermal melting point (Tm) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl (1984): Tm = 81.5°C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. Tm is reduced by about 1°C for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10°C. Generally, stringent conditions are selected to be about 5°C lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4°C lower than the Tm; moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C lower than the Tm; low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the Tm.
Using the equation, hybridization and wash compositions, and desired temperature, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a temperature of less than 45°C (aqueous solution) or 32°C (formamide solution), the SSC concentration is increased so that a higher temperature can be used. Generally, highly stringent hybridization and wash conditions are selected to be about 5°C lower than the Tm for the specific sequence at a defined ionic strength and pH.
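
As a worked illustration, the Meinkoth and Wahl approximation and the mismatch adjustment described above can be expressed in Python. This is a sketch under stated assumptions: the function names are hypothetical, and "log M" is taken to be the base-10 logarithm of the monovalent cation molarity.

```python
import math


def tm_meinkoth_wahl(monovalent_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L
    The logarithm is assumed to be base 10.
    """
    return (
        81.5
        + 16.6 * math.log10(monovalent_molar)
        + 0.41 * gc_percent
        - 0.61 * formamide_percent
        - 500.0 / length_bp
    )


def tm_for_identity(tm_perfect, percent_identity):
    """Lower the Tm by about 1 degree C for each 1% of mismatching."""
    return tm_perfect - (100.0 - percent_identity)


if __name__ == "__main__":
    # Illustrative inputs only: 1 M monovalent cation, 50% GC,
    # 50% formamide, 500 bp hybrid.
    tm = tm_meinkoth_wahl(1.0, 50.0, 50.0, 500)
    print(round(tm, 1))
    # Sequences of at least 90% identity: decrease the Tm by 10 degrees C.
    print(round(tm_for_identity(tm, 90.0), 1))
```

A hybridization or wash temperature for a given stringency can then be obtained by subtracting the offsets listed above (for example, about 5°C for stringent conditions) from the adjusted value.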

An example of highly stringent wash conditions is 0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a 0.2 x SSC wash at 65°C for 15 minutes. Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1 x SSC at 45°C for 15 minutes. For short nucleotide sequences (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1.5 M, less than about 0.01 to 1.0 M, Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30°C; for long probes (e.g., >50 nucleotides), the temperature is typically at least about 60°C. Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. In general, a signal to noise ratio of 2 x (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or Northern blot is 50% formamide, e.g., hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1 x SSC at 60 to 65°C. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37°C, and a wash in 1 x to 2 x SSC (20 x SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37°C, and a wash in 0.5 x to 1 x SSC at 55 to 60°C.
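
Because the wash conditions above are stated as multiples of SSC, it may help to convert them to molar concentrations. The following Python sketch does so from the definition given above (20 x SSC = 3.0 M NaCl/0.3 M trisodium citrate). The function names are hypothetical, and the sodium ion figure assumes that each trisodium citrate contributes three sodium ions in addition to the one from each NaCl.

```python
# Composition of the 20 x SSC stock as defined in the text.
STOCK_FOLD = 20.0
STOCK_NACL_M = 3.0
STOCK_CITRATE_M = 0.3


def ssc_molarity(fold):
    """Return (NaCl molarity, trisodium citrate molarity) for a given SSC strength."""
    nacl = STOCK_NACL_M * fold / STOCK_FOLD
    citrate = STOCK_CITRATE_M * fold / STOCK_FOLD
    return nacl, citrate


def sodium_ion_molarity(fold):
    """Total Na+ molarity: one Na+ per NaCl plus three per trisodium citrate."""
    nacl, citrate = ssc_molarity(fold)
    return nacl + 3.0 * citrate


if __name__ == "__main__":
    for fold in (0.1, 0.2, 0.5, 1.0, 2.0):
        nacl, citrate = ssc_molarity(fold)
        print(fold, round(nacl, 4), round(citrate, 4), round(sodium_ion_molarity(fold), 4))
```

The resulting monovalent cation molarity is the kind of value that would be supplied as M in the Tm approximation of Meinkoth and Wahl.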

In a further embodiment of the invention, there are provided articles of manufacture and kits containing probes, oligonucleotides or antibodies which can be used, for instance, for the diagnostic applications described above. The article of manufacture comprises a container with a label. Suitable containers include, for example, bottles, vials, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. The container holds a composition which includes an agent that is effective for diagnostic applications, such as described above. The label on the container indicates that the composition is used for a specific diagnostic application. The kit of the invention will typically comprise the container described above and one or more other containers comprising materials desirable from a commercial and user standpoint, including buffers, diluents, filters and package inserts with instructions for use. The probes of the present invention can be labeled using techniques known to those of skill in the art. For example, the labels used in the assays of the invention can be primary labels (where the label comprises an element that is detected directly) or secondary labels (where the detected label binds to a primary label, e.g., as is common in immunological labeling). An introduction to labels (also called "tags"), tagging or labeling procedures, and detection of labels is found in Polak and Van Noorden (1997) Introduction to Immunocytochemistry, second edition, Springer Verlag, N.Y. and in Haugland (1996) Handbook of Fluorescent Probes and Research Chemicals, a combined handbook and catalogue published by Molecular Probes, Inc., Eugene, Oreg. Primary and secondary labels can include undetected elements as well as detected elements.
Useful primary and secondary labels in the present invention can include spectral labels such as fluorescent dyes (e.g., fluorescein and derivatives such as fluorescein isothiocyanate (FITC) and Oregon Green™, rhodamine and derivatives (e.g., Texas red, tetramethylrhodamine isothiocyanate (TRITC), etc.), digoxigenin, biotin, phycoerythrin, AMCA, CyDyes™, and the like), radiolabels (e.g., 3H, 125I, 35S, 14C, 32P, 33P), enzymes (e.g., horse-radish peroxidase, alkaline phosphatase), and spectral colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex) beads. The label may be coupled directly or indirectly to a component of the detection assay (e.g., the labeled nucleic acid) according to methods well known in the art. As indicated above, a wide variety of labels may be used, with the choice of label depending on sensitivity required, ease of conjugation with the compound, stability requirements, available instrumentation, and disposal provisions. In general, a detector that monitors a probe-substrate nucleic acid hybridization is adapted to the particular label that is used. Typical detectors include spectrophotometers, phototubes and photodiodes, microscopes, scintillation counters, cameras, film and the like, as well as combinations thereof. Examples of suitable detectors are widely available from a variety of commercial sources known to persons of skill. Commonly, an optical image of a substrate comprising bound labeled nucleic acids is digitized for subsequent computer analysis.

Preferred labels include those that use (1) chemiluminescence (using Horseradish Peroxidase and/or Alkaline Phosphatase with substrates that produce photons as breakdown products) with kits being available, e.g., from Molecular Probes, Amersham, Boehringer-Mannheim, and Life Technologies/Gibco BRL; (2) color production (using Horseradish Peroxidase and/or Alkaline Phosphatase with substrates that produce a colored precipitate) (kits available from Life Technologies/Gibco BRL, and Boehringer-Mannheim); (3) chemifluorescence using, e.g., Alkaline Phosphatase and the substrate AttoPhos (Amersham) or other substrates that produce fluorescent products; (4) fluorescence (e.g., using Cy-5 (Amersham), fluorescein, and other fluorescent labels); (5) radioactivity using kinase enzymes or other end-labeling approaches, nick translation, random priming, or PCR to incorporate radioactive molecules into the labeled nucleic acid. Other methods for labeling and detection will be readily apparent to one skilled in the art. Fluorescent labels are highly preferred labels, having the advantage of requiring fewer precautions in handling, and being amenable to high-throughput visualization techniques (optical analysis including digitization of the image for analysis in an integrated system comprising a computer). Preferred labels are typically characterized by one or more of the following: high sensitivity, high stability, low background, low environmental sensitivity and high specificity in labeling.
Fluorescent moieties, which are incorporated into the labels of the invention, are generally known, including Texas red, digoxigenin, biotin, 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes, flavin and many others. Many fluorescent labels are commercially available from the SIGMA Chemical Company (Saint Louis, Mo.), Molecular Probes, R&D systems (Minneapolis, Minn.), Pharmacia LKB Biotechnology (Piscataway, N.J.), CLONTECH Laboratories, Inc. (Palo Alto, Calif.), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, Wis.), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, Md.), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), and Applied Biosystems™ (Foster City, Calif.), as well as many other commercial sources known to one of skill. Means of detecting and quantifying labels are well known to those of skill in the art. Thus, for example, where the label is a radioactive label, means for detection include a scintillation counter or photographic film as in autoradiography. Where the label is optically detectable, typical detectors include microscopes, cameras, phototubes and photodiodes and many other detection systems that are widely available.

The present invention is further detailed in the following Examples, which are offered by way of illustration and are not intended to limit the invention in any manner. Standard techniques well known in the art or the techniques specifically described below are utilized. All patent and literature references cited in the present specification are hereby incorporated by reference in their entirety.

EXAMPLE 1: MAOA Methylation is Associated with Nicotine and Alcohol Dependence

Over the past several years, it has become increasingly evident that gene-environment interactions (GxE) and residual gene-environment correlations (rGE) have a prominent role in the etiology of most common behavioral illnesses. However, the exact processes underlying these interactions and the extent of their relative contributions are unclear. At the molecular level, epigenetic phenomena such as DNA methylation and histone modification are thought to contribute to these processes. Unfortunately, empirical data to support this hypothesis at behaviorally relevant loci have been scarce.

Two candidate loci at which epigenetic phenomena may participate in GxE, rGE or E effects are the Serotonin Transporter (SLC6A4) and Monoamine Oxidase A (MAOA). The protein products of both loci play prominent roles in regulating serotonergic and monoaminergic transmission, respectively. These moderating roles have come under increasing scrutiny due to recent studies which have demonstrated prominent GxE effects for depression at SLC6A4 (Caspi et al., Science. 301(5631), 386-9 (2003)) and for aggression at MAOA (Kim-Cohen et al., Mol Psychiatry. 11(10), 903-913 (2006); Caspi et al., Science. 297(5582), 851-4 (2002)). Hence, there is a great deal of curiosity as to the mechanisms through which E or GxE effects could influence biological processes at these loci.

One mechanism through which GxE or E effects could become manifest at the molecular level is through altering relevant gene expression through methylation of gene promoters in response to environmental stressors. In the initial study of the relationship between promoter methylation and behavioral phenomena, the inventors conducted quantitative methylation analyses of the SLC6A4 associated promoter CpG island and demonstrated that methylation of this promoter is both sex dependent and associated with increased vulnerability to major depression.

However, whether there is a similar promoter associated CpG island at MAOA, and if it exists, whether its methylation has behavioral consequences was unclear.

Two types of disorders that could potentially be influenced by methylation induced changes in MAOA activity are Antisocial Personality Disorder (ASPD) and substance use disorders (SUD). Already, genetic variation in a variable nucleotide repeat (VNTR) located immediately upstream of the MAOA minimal central promoter has been associated with different vulnerability to ASPD and two forms of SUD: alcohol dependence (AD) and nicotine dependence (ND).

In this report, using a set of techniques similar to those of prior methylation and gene expression analyses of SLC6A4 (Philibert et al., American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, (2007)) and the resources of the Iowa Adoption Studies (IAS), a large longitudinal adoption study focusing on the role of GxE effects in SUD, the inventors examined the relationship of MAOA genotype and methylation to SUD and ASPD.

METHODS

The procedures used in the IAS have been described in detail elsewhere (Yates et al., Drug and Alcohol Dependence. 41(1), 9 (1996)). Briefly, the IAS is a case and control adoption study of G, E and GxE effects in SUD and ASPD. This study, founded by Remi Cadoret, contrasts the outcomes of 475 adoptees from the State of Iowa who are at high biological risk for SUD or ASPD (i.e., one of their biological parents was severely affected) with those of 475 adoptees who were not at biological risk for either SUD or ASPD. After birth, each of these adoptees was randomly placed in an adoptive home. Since their inception in the study, the adoptees and their adoptive environments have been serially assessed. The subjects included in this pilot study were the first 95 males and 96 females to participate in this wave of the study. The overall study design and all procedures described in this communication were approved by the University of Iowa Institutional Review Board.

Briefly, the behavioral and biological material used in these studies was obtained from subjects who participated in the last two waves of the Iowa Adoption Studies (IAS). In both of these waves, each subject was interviewed with a version of the Semi Structured Assessment for the Genetics of Alcoholism, Version 2 (SSAGA-II) (Bucholz et al., J Stud Alcohol. 55(2), 149-58 (1994)). In addition, in the latest round of the study, phlebotomy was performed on each of the participants. Symptom counts and categorical diagnoses for each of the disorders (ASPD, AD, ND) were derived from SSAGA-II data using the individual dependence or personality disorder criteria from DSM-IV (Association, AP, Diagnostic and Statistical Manual of Mental Disorder, Fourth Edition. 1994, Washington, D.C.: American Psychiatric Association), with the highest total symptom count from these two interviews being defined as the lifetime symptom count.

RNA and DNA used in the studies were derived from lymphoblast cell lines using biomaterial contributed by the participants. These lymphoblast cell lines were prepared using standard EBV transfection techniques from the specimens contributed by the study participants (Klaus, GGB, Lymphocytes: A practical Approach. 1987, Oxford: IRL Press. 149-162). Total RNA was prepared from lymphoblast using a Midi RNA purification kit from Invitrogen™ (Carlsbad, CA) according to the manufacturer's instructions. DNA was prepared from lymphoblast cell pellets using cold protein precipitation (Lahiri et al., Nucleic Acids Research. 19(19), 5444 (1991)). PCR amplification of the MAOA variable nucleotide repeat (VNTR) polymorphism was conducted using the method of Sabol and colleagues (Sabol et al., Hum Genet. 103(3), 273-9 (1998)). The resulting PCR products were electrophoresed on a 6% non-denaturing polyacrylamide gel and imaged using silver staining (Merril et al., Analytical Biochemistry. 156(1), 96-110 (1986)). The resulting alleles were compared to internal standards and the genotypes were called by two individuals blind to affected status.

RTPCR was conducted as previously described (Philibert et al., American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, (2007); Bradley et al., Am J Med Genet B Neuropsychiatr Genet. 136(1), 58-61 (2005)). Briefly, RNA was reverse transcribed using an Applied Biosystems™ cDNA archiving kit (Foster City, CA). Then, 12.5 ng aliquots of cDNA were robotically dispensed and RTPCR performed using reagents from Applied Biosystems™ including primer-probe sets for MAOA (Hs 00165140) and the endogenous control loci GAPDH (from the GAPDH Control kit) and LDHA (Hs 00855332).

The existence, location, size and sequence of the MAOA CpG islands were determined using the default browser settings of the University of California Genome Browser (UCSC) website (world-wide-web at genome.ucsc.edu). The sequences for these islands are freely available from the website or from the authors on request.

Quantitative methylation analyses for each of the samples at these CpG residues were conducted by Sequenom® Inc. (San Diego, CA) as previously described (Philibert et al., [erratum appears in Mol Psychiatry 1999 Mar;4(2):197.]. Molecular Psychiatry. 3(4), 303-9 (1998)). First, aliquots of purified DNA were treated using bisulfite modification (Frommer et al., Proceedings of the National Academy of Sciences. 89(5), 1827-1831 (1992)). Treatment of DNA with bisulfite deaminates unmethylated cytosine residues to uracil. Since uracil base pairs with adenosine, thymidines are incorporated into subsequent DNA strands in the place of unmethylated cytosine residues during subsequent PCR amplifications. Next, contigs covering the CpG islands (see Figure 1) were PCR amplified. Because of the size of the region, the CpG enriched regions were PCR amplified in four separate reactions. The primers for each of those PCR amplifications are as follows: Amplicon A (from BP 43398925 to 43399181): F-TTAAAGAATGAAAGTATTAGGTTGAGAGTT (SEQ ID NO:1) and R-ATACCCACTCTTAAAAACCAACCCC (SEQ ID NO:2); Amplicon B (from BP 43399430 to 43399858): F-GGGTGTTGAATTTTGAGGAGAAG (SEQ ID NO:3) and R-AAAACACAACTACCCAAATCCC (SEQ ID NO:4); Amplicon C (from BP 43400453 to 43400805): F-GGGGAGTTGATAGAAGGGTTTTTTTTAT (SEQ ID NO:5) and R-TATATCTACCTCCCCCAATCACACC (SEQ ID NO:6); and Amplicon D (from BP 43400486 to 43400035): F-AAAGGGTGGGAAGGATTTTTTTATTAATT (SEQ ID NO:7) and R-CATCCTCAATATCCAACTTCCCCTA (SEQ ID NO:8), using standard touchdown PCR conditions (Philibert et al., Am J Med Genet B Neuropsychiatr Genet. 144(1), 101-5 (2007)). Methylation ratios for each of the CpG residues (methyl CpG/total CpG) were then determined using a MassARRAY™ mass spectrometer using proprietary peak picking and spectra interpretation tools (Ehrich et al., Proc Natl Acad Sci USA. 102(44), 15785-90 (2005); Ehrich et al., Nucleic Acids Res. 35(5), e29 (2007)).
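
The effect of bisulfite treatment on sequence, and the definition of the methylation ratio, can be illustrated with the following Python sketch. It is an in silico simplification and not the proprietary spectra interpretation referred to above: the function names, the use of 0-based positions to mark methylated cytosines, and the signal values passed to `methylation_ratio` are assumptions made for the example.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated cytosines are deaminated to uracil and read as T after
    amplification; methylated cytosines (0-based positions given in
    methylated_positions) remain C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_ratio(methyl_signal, unmethyl_signal):
    """Methylation ratio = methyl CpG / total CpG at one residue."""
    total = methyl_signal + unmethyl_signal
    if total == 0:
        raise ValueError("no signal at this CpG residue")
    return methyl_signal / total


if __name__ == "__main__":
    # The C at position 1 is methylated and survives; the others become T.
    print(bisulfite_convert("ACGTCCGA", {1}))
    print(methylation_ratio(30.0, 70.0))
```

After conversion, a methylated and an unmethylated copy of the same CpG differ by a single base (C versus T), which is the difference that the downstream mass measurement resolves.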

The data were analyzed using JMP (version 7; SAS Institute, Cary, NC) using Pearson's correlation coefficients, regression [analysis of variance (ANOVA) and ordinal logistic regression (OLR)] or Chi-square testing as indicated in the text (Fleiss, JL, Statistical Methods for Rates and Proportions. 2nd ed. 1981, New York, NY: John Wiley & Sons Inc.). All tests were two-tailed and all analyses were conducted by gender.

RESULTS

The characteristics of the IAS subjects who contributed the biomaterials to this study are given in Table 1. In total, 96 female and 95 male subjects provided biomaterials for the study. The male subjects were significantly older than the female subjects (t-test, p<0.002) and had a significantly higher symptom count for ASPD (Chi-Square, p<0.001).

Table 1. Demographic and Clinical Characteristics of the IAS Subjects

                              Male          Female
N                             95            96
Age (years ± SD)              42.4 ± 8.5    38.8 ± 6.8
Ethnicity
  White                       87            91
  African American            5             2
  White of Hispanic Origin    2             1
  Other                       1             2

DSM IV Symptom Counts

                ASPD          AD            ND
# Symptoms      M     F       M     F       M     F
0               18    41      35    49      47    50
1               26    30      25    25      4     6
2               21    9       16    13      10    7
3               7     7       11    3       15    8
4               10    5       2     2       6     14
5               9     3       3     3       8     8
6               4     3       2     0       3     3
7               0     0       1     0       2     0

The MAOA VNTR genotypes for the subjects are given in Table 2. The testing for Hardy Weinberg equilibrium in the female subjects was unremarkable.

Table 2. MAOA VNTR Genotype.

Genotype    Female Subjects    Male Subjects*
2, 2        0                  1
2, 4        1                  -
3, 3        18                 34
3, 4        41                 -
3, 5        1                  -
3.5, 3.5    0                  1
3.5, 4      1                  -
4, 4        31                 59
4, 5        3                  -

*Male subjects are hemizygous with respect to this X-chromosome locus.
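
For illustration, a goodness-of-fit statistic for Hardy-Weinberg equilibrium can be computed from the female genotype counts in Table 2 as sketched below in Python. The specific test used in the study is not stated, so this Pearson chi-square is an assumption; the function names are hypothetical. Because several alleles are rare and their expected genotype counts are very small, the statistic from this sketch is unreliable for those cells, and in practice rare alleles would typically be pooled or an exact test used.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Female genotype counts from Table 2 (keys are sorted allele pairs).
FEMALE_GENOTYPES = {
    (2, 4): 1, (3, 3): 18, (3, 4): 41, (3, 5): 1,
    (3.5, 4): 1, (4, 4): 31, (4, 5): 3,
}


def allele_counts(genotypes):
    """Count alleles across diploid genotypes."""
    counts = Counter()
    for (a, b), n in genotypes.items():
        counts[a] += n
        counts[b] += n
    return counts


def hwe_chi_square(genotypes):
    """Pearson chi-square of observed vs Hardy-Weinberg expected genotype counts."""
    n = sum(genotypes.values())
    alleles = allele_counts(genotypes)
    total_alleles = 2 * n
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        p = alleles[a] / total_alleles
        q = alleles[b] / total_alleles
        # Expected count: n*p^2 for homozygotes, 2*n*p*q for heterozygotes.
        expected = n * p * p if a == b else 2 * n * p * q
        observed = genotypes.get((a, b), 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    print(dict(allele_counts(FEMALE_GENOTYPES)))
    print(hwe_chi_square(FEMALE_GENOTYPES))
```

Only the female subjects are included because, as noted in the table, male subjects carry a single copy of this X-chromosome locus.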

Sequence analysis of MAOA demonstrated the presence of two CpG islands in the gene (Figure 1). The first island, stretching from bp 43398975 to bp 43399158, contains 18 CpG residues and is approximately 1200 bp upstream of the transcription start site for MAOA. The second CpG island begins at bp 43399493 and contains 70 CpG residues. Exon 1 of MAOA is wholly contained within the second CpG island, with the transcription start site (TSS) for the gene occurring between CpG residues 64 and 65. The MAOA VNTR is found between the two CpG islands. The average methylation ratio at each of these residues is shown in Figure 2. As the figure demonstrates, females have consistently higher methylation ratios at each CpG residue than males (who are hemizygous for this gene). Please note that, owing to methodological limitations in the ability of the mass spectrometer to resolve individual residues, the values for CpG residues 1-2, 5-7, 11-12, 19-20, 30-31, 43-44, 55-57, 67-68, 72-73, and 79-80 are shown as aggregates.

The interrelationships of MAOA methylation between individual residues for each gender were studied. The correlation of methylation was higher between residues in the smaller 5' CpG island than between residues in the larger CpG island that encompasses Exon 1. Of particular potential interest, methylation of the two residues immediately flanking the TSS, CpG 64 and 65, is poorly correlated with methylation throughout the rest of the island. However, methylation at the residues CpG 58-63 and CpG 66-70 is highly inter-correlated. In order to test the hypothesis that MAOA genotype influences the amount of methylation, the relationship of average methylation to genotype at the VNTR (Figure 3) for each gender was analyzed. There was a trend for female 3,3 homozygotes to have a higher average methylation than female 4,4 homozygotes (43.3% ± 3.8 vs 40.9% ± 5.2; p<0.10). There was no significant difference between males hemizygous for the 3 repeat allele as compared to those with the 4 allele although the arithmetic difference was in the same direction (9.0 ± 3.7 vs 8.3 ± 2.6; p<0.32).

The relationship between symptom counts for ASPD, AD and ND and average methylation for each gender was then analyzed using ordinal regression analysis. There was no relationship between ASPD and overall methylation for either men (OLR, p<0.37) or women (OLR, p>0.70). Nor were there any significant relationships between average methylation and AD (OLR, p<0.23) or ND (OLR, p<0.68) in male subjects. However, there were strong relationships between average overall methylation and symptom counts for AD (OLR, p<0.008) and ND (OLR, p<0.002) in female subjects.

In order to identify the residues driving the strong correlations between overall methylation and symptom counts for AD and ND in women, the relationship between methylation at individual CpG residues and symptom counts was analyzed. With respect to the former, methylation at CpG residues 27, 38, 41, and 48 was nominally significantly associated (p<0.05 before correction for multiple comparisons) with AD symptom count in female subjects. With respect to the latter, methylation at CpG residues 18, 42, 48, 52, 64, 65, 67-68, 69, and 77 was nominally associated (p<0.05 before correction for multiple comparisons) with ND symptom counts.

Finally, in an attempt to discern whether gene expression was correlated with MAOA genotype or methylation, the inventors attempted to measure MAOA gene expression using our previously described techniques (Philibert et al., American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, (2007); Philibert, et al., Am J Med Genet B Neuropsychiatr Genet. 144(1), 101-5 (2007); Philibert et al., Am J Med Genet B Neuropsychiatr Genet. 144(5), 683-90 (2007)). Unfortunately, despite several attempts, we could not reliably detect MAOA gene expression.

DISCUSSION

In summary, it was discovered that MAOA methylation is associated with ND and AD in women, but not men. In addition, a significant relationship between ASPD and CpG methylation was not found in men or women. Finally, there was a trend for MAOA genotype to be associated with methylation in women.

The results with respect to ND are perhaps the most compelling. Review of the animal model literature shows that MAOA knockout mice exhibit impaired nicotine preference but have normal responses to other novel stimuli (Agatsuma et al., Hum. Mol. Genet. 15(18), 2721-2731 (2006)). Furthermore, treatment of rats with the monoamine oxidase inhibitor phenelzine enhances the discriminant stimulus effect of nicotine (Wooters et al., Behav Pharmacol. 18(7), 601-8 (2007)) and increases nicotine self-administration (Villegier et al., Neuropharmacology. 52(6), 1415-25 (2007); Guillem et al., Eur J Neurosci. 24(12), 3532-40 (2006)). Review of the literature with respect to humans reveals that the targeting of neurotransmitter systems regulated by MAOA, using agents such as reboxetine, a selective norepinephrine reuptake inhibitor, or bupropion, which targets the dopaminergic system, has been shown to be clinically effective in the treatment of ND (Miller et al., J Pharmacol Exp Ther. 302(2), 687-695 (2002); George, T and Weinberger, A: Monoamine Oxidase Inhibition for Tobacco Pharmacotherapy. Mol Ther, (2007); David et al., Tobacco Research. 9(8), 821-833 (2007)). Finally, platelet MAOA activity is reduced in smokers (Berlin et al., Int J Neuropsychopharmacol. 4(1), 33-42 (2001)).

This evidence is made even more compelling by closer inspection and consideration of the MAOA methylation data with respect to ND in the female subjects. The control of transcription initiation is one of the major mechanisms through which cells regulate gene expression (Levine et al., Nature. 424(6945), 147-151 (2003)). Hence, the TSS is a frequent target of epigenetic modifications including methylation and histone modification (Kawaji et al., Genome Biology. 7(12), R118 (2006); Liang et al., Proc Natl Acad Sci USA. 101(19), 7357-62 (2004)). Therefore, it is expected that any significant changes in ND-associated MAOA methylation would preferentially affect the MAOA TSS. This is indeed what is observed, with a strong clustering around the TSS of CpG residues that are either nominally significantly associated or show a trend for association (p<0.10). It is important to note that the primary outcome measure with respect to methylation in this study was overall methylation, not individual CpG residue methylation. This is because it was not known prior to the study which CpG residues might be most important.

Surprisingly, the inventors did not find any relationship between MAOA methylation and ASPD. Nor did the inventors find a significant relationship between genotype and methylation. Once again, this may simply be a function of low power. At the same time, these findings do not preclude specific GxE effects on ASPD at this locus because they did not examine the relationship of environmental factors hypothesized to elicit such effects, such as maltreatment, in this study. In summary, it was discovered that methylation of the MAOA promoter is associated with ND and AD in females.

EXAMPLE 2: The Effect of Smoking on MAOA Promoter Methylation in DNA Prepared from Lymphoblasts and Whole Blood

Monoamine Oxidase A (MAOA) plays a key role in modulating monoaminergic neurotransmission through its catabolism of dopamine, norepinephrine, epinephrine, serotonin and related neurotransmitter catabolism byproducts. The MAOA gene is located on Xp11 and consists of 15 exons that are transcribed into a 4.1 kb mRNA and translated into a 527 amino acid protein (Chen ZY et al. 1991. Structure of the human gene for monoamine oxidase type A. Nucleic Acids Res 19(16):4537-41). Two regulatory motifs for the gene have been previously described. The first is a 44 bp variable nucleotide repeat (VNTR) that is found approximately 1200 bp upstream of the transcription start site (TSS) (Hotamisligil GS, Breakefield XO. 1991. Human monoamine oxidase A gene determines levels of enzyme activity. Am J Hum Genet 49(2):383-92). The second is a set of two promoter-associated CpG islands that flank either side of the VNTR (see Example 1 above). As discussed above, it was demonstrated that increased lifetime symptom counts for Alcohol Dependence (AD) and Nicotine Dependence (ND) were associated with decreased MAOA methylation, with the effects being most prominent in women in the region of the gene surrounding the TSS. Furthermore, evidence was provided that the three-repeat (3R) allele of the VNTR was associated with increased methylation at this locus. In the present study several further questions were raised. First, were these findings simply a type I error due to multiple tests across CpG island loci? Second, given the direct pharmacological effects of nicotine consumption, is decreased methylation associated with current smoking only, or is there an effect of history of smoking as well? Third, are some regions of the promoter more important in characterizing this process? Finally, do lymphoblasts provide better or worse resolution than alternative media, such as whole blood, for the examination of epigenetic effects in substance use research?

These are important concerns because MAOA is hypothesized to play a key role in ND and other complex behavioral illnesses. MAOA inhibitors are used in the treatment of ND as well as other frequently co-morbid syndromes, such as major depression (MD) (George TP, Weinberger AH. 2008. Monoamine oxidase inhibition for tobacco pharmacotherapy. Clin Pharmacol Ther 83(4):619-21). Furthermore, an MAOA VNTR gene-environment (GxE) interaction specific to the 3-repeat allele (3R) may be important in the etiology of antisocial conduct (Caspi A et al. 2002. Role of genotype in the cycle of violence in maltreated children. Science 297(5582):851-4; Frazzetto G et al., 2007. Early Trauma and Increased Risk for Physical Aggression during Adulthood: The Moderating Role of MAOA Genotype. PLoS ONE 2(5):e486). Finally, the present researchers have recently confirmed earlier findings that a similar GxE effect specific to the 4R allele may moderate vulnerability to MD (Beach SR et al., in submission. Child Maltreatment and MAOA Genotype in Depression and Antisocial Personality Disorder: Genetic Moderation of Family Environment). Therefore, the development of a detailed understanding of the molecular underpinnings of genetic and epigenetic effects at this locus is beneficial to the understanding and treatment of complex behavioral illness.

To help accomplish this goal and more finely hone our understanding of genetic and epigenetic effects at this locus, the inventors recently re-examined the original findings using the insights derived from our prior study and the resources provided by 289 additional participants in the IAS.

METHODS

The study design and clinical measures in the IAS have been described in detail elsewhere (Yates WR et al. 1996. An adoption study of DSM-IIIR alcohol and drug dependence severity. Drug and Alcohol Dependence 41(1):9). The behavioral and demographic data were obtained from subjects participating in the last two waves of the IAS (1997-2003; 2004-2009). In each wave, subjects were interviewed with a version of the Semi-Structured Assessment for the Genetics of Alcoholism, version 2 (SSAGA-II) (Bucholz KK et al. 1994. A new, semi-structured psychiatric interview for use in genetic linkage studies: a report on the reliability of the SSAGA. J Stud Alcohol 55(2): 149-58). In addition, in the last wave subjects were phlebotomized to provide biomaterial for the preparation of DNA and lymphoblast cell lines. All these procedures were approved by the University of Iowa Institutional Review Board. The clinical and laboratory methods used in this study are very similar to those used previously. With respect to the behavioral data, symptom counts and categorical diagnoses for nicotine dependence were derived from SSAGA-II data using criteria from DSM-IV (American Psychiatric Association 1994). The highest total symptom count from these two interviews was defined as the lifetime symptom count. Smoking status was also determined using SSAGA data. Those who denied a history of daily smoking at both interviews were classified as "non-smokers." Those subjects who were daily smokers at the time of the first interview, but had totally quit at time 2 were classified as "quitters." Those who smoked daily at the time of both interviews were classified as "continuous smokers."

DNA from two different cellular sources was used in this study. The lymphoblast (LB) DNA for all 289 subjects was prepared from cell lines using blood contributed by the participants. These cell lines were derived using standard EBV transfection techniques (Klaus GGB. 1987. Lymphocytes: A Practical Approach. Oxford: IRL Press, p. 149-162) and the DNA was harvested using the method of Lahiri and Schnabel (Lahiri DK, Schnabel B. 1993. DNA isolation by a rapid method from human blood samples: effects of MgCl2, EDTA, storage time, and temperature on DNA yield and quality. Biochem Genet 31(7-8):321-8). For a subset of the female subjects (n=78), we also analyzed DNA that was prepared from the whole blood sample (WB DNA) drawn at the same time as the specimen used to prepare the cell lines. This DNA was also extracted using the method of Lahiri and Schnabel (1993, cited above), and the methylation signatures of both types of DNA were determined at the same time.

Genotyping of the MAOA variable nucleotide repeat (VNTR) polymorphism was conducted as previously described (see Example 1 above). Quantitative methylation determination was performed under contract by Sequenom® Inc. (San Diego, CA) using the same methods previously described (Philibert RA et al. 1998. Association of an X-chromosome dodecamer insertional variant allele with mental retardation [erratum appears in Mol Psychiatry 1999 Mar;4(2):197]. Molecular Psychiatry 3(4):303-9). First, aliquots of purified DNA underwent bisulfite modification. The modified DNA samples were then used as a template for the PCR amplification of three contigs covering the MAOA promoter islands using standard touchdown conditions (Philibert R et al. 2007. Serotonin transporter mRNA levels are associated with the methylation of an upstream CpG island. Am J Med Genet B Neuropsychiatr Genet 144(1):101-5). Amplicon A stretches from bp 43398925 to 43399181, covers CpG residues 1 to 18, and uses the following primers: F-TAAAGAATGAAAGTATTAGGTTGAGAGTT (SEQ ID NO:1) and R-ATACCCACTCTTAAAAACCAACCCC (SEQ ID NO:2). Amplicon B stretches from bp 43399430 to 43399858, covers CpG residues 19 to 45, and uses the following primers: F-GGGTGTTGAATTTTGAGGAGAAG (SEQ ID NO:3) and R-AAACACAACTACCCAAATCCC (SEQ ID NO:4). Amplicon C stretches from bp 43400453 to 43400805, covers CpG residues 46 to 74, and uses the following primers: F-GGGGAGTTGATAGAAGGGTTTTTTTTAT (SEQ ID NO:5) and R-TATATCTACCTCCCCCAATCACACC (SEQ ID NO:6). A fourth contig that covered CpG 75-88 was not used in this study because the residues in this amplicon were correlated neither with methylation in other amplicons nor with substance use in Example 1.
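As a rough illustration of why bisulfite modification allows methylation to be read out as a sequence difference, the following sketch converts a DNA strand in silico: unmethylated cytosines are read as T after conversion and PCR, while methylated cytosines remain C. The function names and the toy sequence are hypothetical and are not taken from the MAOA locus; this is a minimal sketch, not the laboratory protocol.

```python
def cpg_positions(seq):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite conversion of one strand.

    Every C whose index is not in `methylated` is written as T
    (unmethylated C is deaminated to U and amplified as T);
    methylated C is left unchanged.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


toy = "ACGTCCGA"                      # hypothetical sequence with two CpG sites
sites = cpg_positions(toy)            # indices of the two CpG cytosines
print(bisulfite_convert(toy))         # no methylation: all C become T
print(bisulfite_convert(toy, set(sites)))  # both CpG cytosines protected
```

Consistent with amplification of converted template, the forward primers listed above contain no C and the reverse primers contain no G.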

After amplification, the methylation ratios for each of the CpG residues (methyl CpG/total CpG) in these contigs were then determined using a MassARRAY™ system mass spectrometer (Sequenom®). These data were then analyzed with proprietary peak picking and spectra interpretation tools to generate the methyl CpG/total CpG ratios (Ehrich M et al. 2005. Quantitative high-throughput analysis of DNA methylation patterns by base-specific cleavage and mass spectrometry. Proc Natl Acad Sci USA 102(44):15785-90; Ehrich M, et al. 2007. A new method for accurate assessment of DNA quality after bisulfite treatment. Nucleic Acids Res 35(5):e29). The peak for some residues could not be de-convoluted by the spectral interpretation tools. In those cases (CpG 5-7, 8-9, 11-12, 19-20, 30-31, 61-62, 67-68, 72-73), the value for each residue is presented as an average of the aggregated values. In addition, no signal could be reliably observed for CpG residues 24, 26, and 28.
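The bookkeeping described above can be sketched as follows. The peak picking itself is proprietary, so this sketch starts from hypothetical methylated and unmethylated signal values per unit and only shows how a methyl CpG/total CpG ratio is formed and how an aggregated unit (for example CpG 5-7) assigns one shared value to each residue it covers. All names and numbers are illustrative assumptions.

```python
def methylation_ratio(methyl_signal, unmethyl_signal):
    """Return methyl CpG / total CpG, or None when there is no usable signal."""
    total = methyl_signal + unmethyl_signal
    if total <= 0:
        return None
    return methyl_signal / total


def expand_units(unit_ratios):
    """Map {(first_cpg, last_cpg): ratio} to {cpg_number: ratio}.

    Residues that could not be resolved individually share the
    aggregate value of their unit; missing units stay None.
    """
    per_residue = {}
    for (first, last), ratio in unit_ratios.items():
        for cpg in range(first, last + 1):
            per_residue[cpg] = ratio
    return per_residue


# Hypothetical signals: CpG 4 resolved alone, CpG 5-7 aggregated, CpG 24 missing.
units = {
    (4, 4): methylation_ratio(10.0, 90.0),
    (5, 7): methylation_ratio(30.0, 70.0),
    (24, 24): methylation_ratio(0.0, 0.0),
}
print(expand_units(units))
```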

Because the methylation data had differing means and standard deviations at each locus, all methylation data were Z-transformed before comparison to genotype or clinical data. All data were analyzed using JMP (version 7; SAS Institute, Cary, NC) using Pearson's correlation coefficients, regression, analysis of variance (ANOVA), T-tests, and ordinal logistic regression (OLR) as indicated in the text (Fleiss 1981). Factor analyses were conducted using SAS Version 9.1 (SAS Institute, Cary, NC). For analyses of VNTR genotype data, genotypes that contained uncommon alleles (i.e. 2, 3.5, and 5 repeats) were excluded and the remaining genotype data were analyzed using an additive model. All tests were two-tailed.
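The per-locus Z-transformation can be sketched as below. Each CpG locus is standardized against its own mean and standard deviation across subjects, so that loci with very different baseline methylation can be averaged or compared on one scale. The use of the sample standard deviation, the handling of missing values as None, and all function names are assumptions of this sketch; the source does not specify them.

```python
from statistics import mean, stdev


def z_transform_by_locus(ratios):
    """Standardize each locus separately.

    ratios: {locus_label: [ratio or None, one entry per subject]}
    Each locus needs at least two observed, non-identical values.
    """
    z_scores = {}
    for locus, values in ratios.items():
        observed = [v for v in values if v is not None]
        m, s = mean(observed), stdev(observed)
        z_scores[locus] = [None if v is None else (v - m) / s for v in values]
    return z_scores


def subject_average(z_scores, loci, subject_index):
    """Average one subject's Z-scores over the chosen loci, skipping gaps."""
    vals = [z_scores[l][subject_index] for l in loci
            if z_scores[l][subject_index] is not None]
    return mean(vals) if vals else None


# Hypothetical ratios for three subjects at two loci with different baselines.
data = {"CpG 64": [0.10, 0.20, 0.30], "CpG 65": [0.50, None, 0.70]}
z = z_transform_by_locus(data)
print(subject_average(z, ["CpG 64", "CpG 65"], 0))
```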

RESULTS

The basic behavioral and demographic characteristics of this cohort of 289 IAS subjects are given in Table 3. As with the prior cohort, most of the subjects are White and well into adulthood. The male subjects do not differ from the female subjects with respect to age or ethnicity. Consistent with the study design of the IAS, the sample is enriched for behavioral illness with 100 subjects reporting 3 or more lifetime criteria for ND.

Table 3. Demographic and Clinical Characteristics of the IAS Subjects

                              Male          Female
N                             125           164
Age (years ± SD)              41.1 ± 7.7    40.9 ± 7.7
Ethnicity
  White                       117           155
  African American            3             2
  White of Hispanic Origin    4             4
  Other                       1             3

DSM IV Symptom Counts for ND

# Symptoms                    Males         Females
0                             55            84
1                             9             12
2                             8             11
3                             12            9
4                             20            18
5                             9             19
6                             11            9
7                             1             2

The genotype distribution of the subjects is given in Table 4. No relationship emerged between the MAOA VNTR genotype and lifetime symptom count for ND for males (p< 0.98, OLR) or females (p< 0.19, OLR).

Table 4. MAOA VNTR Genotype.

Genotype    Female Subjects (n=164)    Male Subjects* (n=125)
2, 2        0                          1
2, 4        0                          -
3, 3        21                         43
3, 3.5      1                          -
3, 4        64                         -
3, 5        1                          -
3.5, 3.5    0                          4
3.5, 4      2                          -
4, 4        71                         76
4, 5        2                          -
5, 5        0                          1
Unknown     2                          -

*Male subjects are hemizygous with respect to this X-chromosome locus.

The untransformed, sex-averaged methyl CpG/total CpG ratio for each residue is given in Figure 4. The first CpG island contains 18 CpG residues and begins approximately 1200 bp upstream of the transcription start site of MAOA. The VNTR lies between the two CpG islands. The second island consists of 70 CpG residues, the first 56 residues of which were measured in this study. The TSS is located between CpG residues 64 and 65. Overall, males have an average methylation ratio (methyl CpG/total CpG) of 7.2% and females have an average methylation ratio of 34.8%.

Not surprisingly, because MAOA is an X chromosome gene, females consistently had a higher average methylation ratio at every CpG residue. Ethnicity was not associated with average methylation. However, a trend emerged for increasing age to be associated with increasing methylation in females (p<0.07; ANOVA) but not males (p<0.30; ANOVA).

Average and Locus Specific Methylation. The relationship between the Z-transformed average methylation ratios across the 74 residues examined and VNTR genotype is shown in Figure 5. The average Z-transformed methylation ratio was greater in DNA from heterozygous females (3R,4R) than in DNA from 4R homozygotes (p<0.04; T-test). Although the directionality of differential methylation was consistent with prior findings, hemizygous males and homozygous females for the 3R allele did not have significantly higher average amounts of methylation than did their 4R counterparts (p<0.24 and p<0.20, respectively). Next, the inventors examined the relationship between lifetime ND symptom count and either global methylation or TSS region specific methylation, the latter defined as the average of Z-transformed values for residues CpG 61-70, for all 289 subjects. Although the pattern of relationships was similar to prior findings, the relationship between global methylation and lifetime ND symptom count was not statistically significant for males (p<0.19) or females (p<0.12). However, before correction for multiple comparisons, eight individual CpG residues (CpG 22, 25, 32, 36, 39, 64, 65 and 69), including three in the TSS region, were nominally associated (p<0.05) with ND symptom count in the male subjects, but no such relationships emerged in the female subjects.

Because the inventors noted that a substantial number of subjects had quit smoking, yet were still counted as affected using the lifetime symptom count criterion, the inventors next examined current smoking status for 274 subjects whose smoking status could be easily classified. First, for these analyses of current smoking status, those who denied a history of smoking one or more days per week were designated non-smokers (male n = 59 and female n = 83). "Daily smokers" were defined as those who smoked 7 days per week at the times of both the first and second interviews (male n = 42 and female n = 45). Finally, "quitters" were defined as those subjects who smoked daily at the time of the first interview, but denied regular smoking (1 or more days per week) at the time of the second interview (male n = 20 and female n = 27). The 15 subjects excluded from these three groups were removed because they either were never truly daily smokers at both interviews (i.e., did not smoke every day; n = 10), did not fully quit smoking (n = 4), or started smoking after the first interview (n = 1). Using these definitions of smoking status, the inventors examined the relationship between global and site-specific methylation and current daily smoking status. The distribution of the differential methylation at each residue for male and female "lifetime daily smokers," "quitters" and "non-smokers" is illustrated in Figure 6. The results are most marked for the male subjects. As compared to non-smokers, smokers had lower amounts of methylation globally (p<0.02; T-test) and at the transcription start site (p<0.009; T-test), with 7 residues meeting the nominal significance level before correction for multiple comparisons. As Figure 6 demonstrates, smoking is associated with a pervasive decrease in methylation across the second, larger CpG island, with particular consistency in two areas. The first is from CpG 19 to CpG 32. The second is from CpG 55 to CpG 69, a region that includes the TSS. In contrast, the methylation pattern in those male subjects who quit in the five years prior to the blood draw is decidedly mixed across both islands, with both elevated and decreased methylation at particular residues. Finally, in those male subjects without a history of daily smoking, the net methylation is pervasively increased across the larger CpG island, but somewhat mixed and perhaps decreased overall in the first CpG island.
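The three-way smoking classification defined above can be written as a small decision rule. The sketch below is only illustrative: the input fields (a flag for any history of smoking one or more days per week, and days smoked per week at each interview) are hypothetical stand-ins for the SSAGA-II items, which are not reproduced here.

```python
def classify_smoking(ever_regular, days_per_week_t1, days_per_week_t2):
    """Return 'non-smoker', 'daily smoker', 'quitter', or None (excluded).

    ever_regular: True if the subject ever smoked one or more days per week.
    days_per_week_t1 / days_per_week_t2: days smoked per week at the
    first and second interviews (0 to 7).
    """
    if not ever_regular:
        return "non-smoker"
    if days_per_week_t1 == 7 and days_per_week_t2 == 7:
        return "daily smoker"
    if days_per_week_t1 == 7 and days_per_week_t2 == 0:
        return "quitter"
    # Not daily at both interviews, did not fully quit,
    # or started smoking after the first interview.
    return None


print(classify_smoking(False, 0, 0))  # never smoked regularly
print(classify_smoking(True, 7, 7))   # daily at both interviews
print(classify_smoking(True, 7, 0))   # daily at first interview, stopped by second
print(classify_smoking(True, 7, 3))   # cut down but did not fully quit
```

Returning None for unclassifiable subjects mirrors the exclusion of the 15 subjects who fit none of the three groups.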

The methylation pattern in LB DNA from female smokers is similar to that of the male smokers but less intense and consistent. A clear contrast is seen between the amount and pattern of methylation observed in those females who quit smoking as compared to those who never smoked, with a trend for reduced overall methylation (p<0.08; T-test) and a significant reduction of methylation at the TSS (p<0.04; T-test) in those who quit.

Factor Analytic Results. To determine whether methylation data could be aggregated in a meaningful way, the inventors used the FACTOR procedure in the SAS computer program (SAS Institute, Cary NC) to factor analyze the set of CpG residues for which > 95% of both male and female participants had scores. This approach provided a stable three-dimensional factor structure accounting for 39% of the reliable variance. The inventors used a varimax rotation to identify regions of covariation in degree of methylation. Use of the three factor scores has the advantage of summarizing the reliable signal in the data, while minimizing the number of separate contrasts required to describe effects, which enhances the signal to noise ratio in the data. The three regions identified by the factor analysis were: Factor 1 (CpG 19-CpG 45), Factor 2 (CpG 58-CpG 74), and Factor 3 (CpG 1-CpG 18). Use of average scores across the identified region provided a similar pattern of results as use of factor scores. Therefore, factor scores were used in all analyses reported below. Replicating and extending the analyses reported above for genotype, the inventors found that methylation was greater for heterozygous (3R,4R) or homozygous (4R) females, but the effect was confined to Factor 3 (i.e., CpG 1-CpG 18), F(1, 137) = 4.50, p<.05. The average factor scores for the three groups across CpG 1-18 were (-.17 vs. .23 vs. -.10) for homozygous 4R, heterozygous 3R,4R and homozygous 3R females respectively. The inventors also found a significant effect of genotype for males, but in this case the effect was confined to Factor 1 (CpG 19-CpG 45), F(1, 122) = 5.25, p < .03. The average factor scores for the two groups across CpG 19-45 were (-.11 vs. .20) for the hemizygous 4R vs. 3R males respectively. For both males and females, the 4R allele was associated with significantly less methylation.
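To make the rotation step concrete, the following is a minimal pure-Python sketch of a raw varimax rotation using Kaiser's pairwise planar rotations. It is not the SAS FACTOR procedure: it starts from an already extracted loading matrix, omits the row (Kaiser) normalization that statistical packages often apply, and uses invented loadings rather than the study data. The function names are hypothetical.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        sq = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total


def varimax(loadings, max_sweeps=50, tol=1e-8):
    """Rotate factor pairs until no rotation angle exceeds tol."""
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this factor pair.
                phi = 0.25 * math.atan2(D - 2 * A * B / p,
                                        C - (A * A - B * B) / p)
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    x, y = row[a], row[b]
                    row[a], row[b] = x * c + y * s, -x * s + y * c
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return L


# Invented loadings: two clean blocks of residues, deliberately rotated by 45 degrees.
h = 0.8 * math.sqrt(0.5)
unrotated = [[h, h], [h, h], [-h, h], [-h, h]]
rotated = varimax(unrotated)
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

After rotation each residue in the toy example loads on a single factor, which is the property that lets contiguous blocks of CpG residues be read off as regions.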

Replicating and extending the analysis of global methylation effects, a significant association was found for males between methylation in the region of CpG 19-45 and days smoking at time 1 (p < .002) and time 2 (p < .02). A significant association also emerged between days smoking and methylation in the region around the TSS (i.e., Factor 2; CpG 56-74) for males, but only at time 1 (p < .02). For females, the only significant association emerged between Factor 3 (CpG 1-18) and smoking at time 1 (p < .04). For ND symptom count, we found trends for males for Factor 1 (CpG 19-45; p < .07) and Factor 2 (CpG 56-74; p < .1), but no significant associations for females.

The inventors next replicated and extended the analyses contrasting continuous smokers, quitters, and non-smokers. The inventors found significant group differences for males in methylation of Factor 1 (CpG 19-45), F(2, 117) = 5.46, p < .01, and Factor 2 (CpG 56-74), F(2, 117) = 3.91, p < .05. The average factor scores for the three groups across CpG 19-45 were (-.19 vs. -.25 vs. -.15) for non-smokers, continuous smokers, and quitters respectively. Males who never smoked had the highest level of methylation whereas continuous smokers had the least, and quitters were intermediate. For Factor 2 (CpG 56-74), those who never smoked also had the highest methylation, but the quitters had the least. The average factor scores for the three groups were (.15 vs. -.15 vs. -.29) for non-smokers, continuous smokers, and quitters, respectively. For females, only Factor 3 (CpG 1-18) reliably differentiated the groups, F(2,150) = 3.04, p = .05. Females who never smoked had the highest methylation and those who had quit had the lowest. The average factor scores for the three groups were (.15 vs. -.15 vs. -.41) for non-smokers, continuous smokers, and quitters, respectively.

Comparison of Lymphoblasts to Whole Blood. Finally, because there is considerable controversy in the field as to which source(s) of DNA can or should be used in methylation studies, the inventors next compared the relationship of smoking status to ND in 78 of the female subjects included in the above analyses using DNA prepared from whole blood (WB) or from the lymphoblast line (LB) derived from the same sample of blood. Each set of samples had a similar amount of overall methylation (LB 33.3% vs WB 34.0%, p<0.45; T-test). The distribution with respect to VNTR allele status was virtually identical (data not shown). With respect to smoking status, there was a trend for decreased overall methylation in DNA of smokers (n=24) as compared to that from non-smoking females (n=38) when the DNA was derived from the lymphoblasts (p<0.09; T-test). However, there was no difference when the same comparison was performed using DNA prepared from whole blood (p<0.89; T-test). To gain a better understanding of this, the inventors plotted the methylation signatures at each residue for those who were daily smokers, recently quit, or who had never smoked. Although the same patterns are present in the DNA from both sources, visual inspection of the methylation plots demonstrates greater consistency and intensity of the differential methylation patterns in the DNA derived from lymphoblasts as compared to that from whole blood.

To compare the methylation results from the two sources of DNA in a more quantitative manner, the inventors next examined average methylation in the three regions identified in the factor analysis using a 3 (smoking status) by 2 (LB vs. WB DNA) ANOVA for each region. As before, only the region identified by Factor 3 (CpG 1-18) reliably differentiated the three smoking status groups, F(2,74) = 4.61, p < .02. There was no interaction with type of assessment (WB or LB DNA) for this region, suggesting that, given enough observations, a method using either source of DNA would have identified the pattern, even though the spread of the distribution of means was slightly more pronounced for LB than for WB samples (.21, -.09, -.38 vs. .15, -.03, -.34 for non-smokers, continuous smokers, and quitters, respectively for LB vs. WB samples). There was, however, a trend toward significance for the interaction of smoking status with assessment method for Factor 1 (CpG 19-45), F(2,74) = 2.45, p < .1, suggesting that the two approaches might lead to somewhat different conclusions for that region of the CpG island. In particular, the pattern of means for the three smoking status groups was (.10, -.11, -.03 vs. -.14, .16, .11) for non-smokers, continuous smokers, and quitters respectively for LB vs. WB samples, indicating a reversal of the relative positions of never smokers and quitters in average level of methylation in this region depending on which assessment method was used.

DISCUSSION

In summary, using another sample of subjects from the IAS, the inventors replicated and extended their previous findings to show that a significant portion of the methylation signature status at MAOA is associated with current smoking status, that quitting has an effect on methylation status, and that gender and region of the CpG island examined are also important for accurate specification of associations between smoking and level of methylation. The inventors also examined an important methodological issue by using methylation data on the same subjects using two different sources of DNA, and by examining relationships using a factor analytic approach to reduce the number of dimensions required to describe the methylation results.

The current data provide compelling evidence that the methylation status of the two CpG islands associated with the MAOA promoter is dependent upon smoking status. The real question is why? The answer may be to increase the amount of MAOA protein that is produced. Previous work by others has shown that acute exposure to smoke decreased human brain MAOA activity (Fowler JS, et al. 1996. Brain monoamine oxidase A inhibition in cigarette smokers. Proc Natl Acad Sci USA 93(24):14065-9), and that this decrease in protein activity may be a direct pharmacological/toxicological effect of substances in tobacco smoke (Berlin I, Anthenelli RM. 2001. Monoamine oxidases and tobacco smoking. Int J Neuropsychopharmacol 4(1):33-42; Fowler JS et al. 2003. Monoamine Oxidase and Cigarette Smoking. NeuroToxicology 24(1):75-82). Since promoter methylation, particularly at the TSS, generally decreases mRNA transcription, it seems plausible that the association of decreased methylation with increasing ND symptom count could result from the attempt of the cell to upregulate MAOA RNA production in the face of increased MAOA protein turnover or inhibition caused by smoking.

Whereas this appears to readily explain the contrast in methylation between current smokers and non-smokers, this rationale does not fully explain the effect of "quitting" on MAOA methylation, which does not appear to lead to uniform changes and a return to methylation levels similar to those of subjects who never smoked. Indeed, on most indices the quitters were as different from non-smokers as the continuous smokers, albeit more variable in their methylation profiles. However, at this time one should be cautious in the interpretation of this portion of these findings. The window of time for "quitting" for the subjects used in this study was rather large and it is highly likely that the subjects differed significantly from one another with respect to total time of smoking abstinence. Therefore, aggregating all "quitters" together in analyses may be insensitive to important heterogeneity in this group. Still, taken at face value, these data suggest that the process of returning to non-smoking methylation status may be a lengthy one and that the process may be dynamic at the molecular level as well as at the clinical level.

The finding that female 4R homozygotes have significantly lower methylation than 3R,4R heterozygotes and arithmetically lower methylation than 3R homozygotes is consistent with the inventors' prior work in which they showed a trend for the 4R homozygotes to have lower average methylation than 3R homozygotes (40.9% vs 43.3%; p<0.10). In unpublished data from that analysis, the average methylation of the 3,4 heterozygotes was only slightly less than that of the 3R homozygotes (42.9%). Hence, when the data are pooled, it is clear that the average methylation of the 4R homozygotes is significantly lower than that of both the 3,4 heterozygotes and the 3R homozygotes. In addition, this pattern was found for males when factor scores were examined, albeit only for CpG residues in the region from 19-45. Unfortunately, at this time, there is not a good explanation for the observation that the "low activity" 3R allele is associated with greater average methylation overall and, in particular, in the region of the first CpG island. The inventors' expectation going into these studies was that the 4R allele would have greater methylation than the 3R allele in order to compensate for the greater amount of gene transcription that has been shown in most, but not all, transfection studies (Beach SR et al., in submission, Child Maltreatment and MAOA Genotype in Depression and Antisocial Personality Disorder: Genetic Moderation of Family Environment; Cirulli ET, Goldstein DB. 2007. In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum Mol Genet 16(16):1931-1939; Guo G et al. 2008. The VNTR 2 repeat in MAOA and delinquent behavior in adolescence and young adulthood: associations and MAOA promoter activity. Eur J Hum Genet 16(5):626-34; Sabol SZ et al. 1998. A functional polymorphism in the monoamine oxidase A gene promoter. Hum Genet 103(3):273-9). But this is not the case, suggesting that more complex regulatory processes may be at work or that transfections of these MAOA alleles do not fully capture the transcriptional complexity present at this locus.

Lymphoblast cultures are homogeneous cell lines that are derived from long-lived peripheral B-lymphocyte populations and are relatively unaffected by acute changes in the health status of the host (Hao Z, Rajewsky K. 2001. Homeostasis of peripheral B cells in the absence of B cell influx from the bone marrow. J Exp Med 194(8):1151-64; Tough DF, Sprent J. 1995. Lifespan of lymphocytes. Immunol Res 14(1):1-12). Others have demonstrated that the epigenetic signature is preserved in lymphoblasts (Monks SA et al. 2004. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 75(6):1094-105; Morello F et al. 2004. Differential Gene Expression of Blood-Derived Cell Lines in Familial Combined Hyperlipidemia. Arterioscler Thromb Vasc Biol 24(11):2149-2154). The present observation of nearly identical amounts of total methylation and allele specific methylation in the WB and LB samples further supports this supposition. In contrast, there are several reasons to believe that the methylation signatures in WB DNA may be more variable. Peripheral white blood cells are a varying mixture of neutrophils, lymphocytes, eosinophils, basophils and monocytes, each of which probably has a slightly different methylation signature. The composition of this cell mix can change suddenly. In particular, the neutrophil portion of this mixture is subject to marked swings in population secondary to margination of these cells to the blood stream in response to processes such as stress, infection or drug ingestion (e.g., lithium). Because these processes are associated with changes in neutrophil protein and gene expression signatures (Bussiere FI et al. 2002. Stress protein expression cDNA array study supports activation of neutrophils during acute magnesium deficiency in rats. Magnes Res 15(1-2):37-42; Macdonald J, Galley HF, Webster NR. 2003. Oxidative stress and gene expression in sepsis. Br J Anaesth 90(2):221-232), it is likely that changes in methylation signatures also occur as part of these processes. Given this variability in the cellular composition of WB and the likelihood that the various cell types in blood differ slightly in their methylation signatures, it is reasonable to assume that WB DNA may have greater variability in its methylation signature than LB DNA. However, this does not mean it should not be used in these types of studies. Careful review of Figure 7 demonstrates that the same patterns are evident in both sources of DNA, and the current data are from just one locus.

The apparent differences in the methylation profiles with respect to smoking status are intriguing. Although the inventors initially analyzed only overall and TSS specific methylation, one advantage of using factor analytic scores is that they provide a potentially useful way of defining and then summarizing methylation for all regions of the CpG island, allowing better specification of possible differences between groups and between genders. For example, average factor analytic scores for males show an orderly transition from decreased to increased methylation as a function of smoking status that is most apparent for Factor 1 comprising the region from CpG 19 to CpG 45. For females, non-smokers also demonstrate the highest methylation, but this is most evident on Factor 3 comprising the region from CpG 1 to CpG 18. Both male and female quitters demonstrated lower levels of methylation than did non-smokers on Factor 2 (i.e., the region containing the TSS) with continuous smokers being intermediate (-.29 vs -.14 vs .14 for males; -.26, -.02, .13 for females), suggesting that effects of smoking status at the TSS may be more similar than different for males and females, and that quitting smoking may be associated with lowered methylation for both.

EXAMPLE 3: Genome-wide Methylation Analysis

Genome-wide methylation analyses were conducted using lymphoblast DNA from 10 well controls, 8 subjects with active Alcohol Dependence, 7 subjects with active Nicotine Dependence and 4 subjects with active Cannabis Dependence from the Iowa Adoption Studies. Briefly, 10 μg of highly purified lymphoblast DNA from each subject was digested to completion with MseI and purified, and a 300 ng aliquot (input) was stored for further analysis. Then, 5 μg aliquots of each sample were denatured at 95°C for 10 min and subsequently rapidly chilled. The denatured DNA was then resuspended in immunoprecipitation buffer and sequentially immunoprecipitated with mouse anti-5-methylcytosine (Abcam, USA) and sheep anti-mouse IgG antibodies. The resulting immunoprecipitated DNA was then cleaved from the precipitated complex by overnight proteinase K digestion and purified. Then, aliquots of both the input and enriched (immunoprecipitated) DNA were amplified with a Whole Genome Amplification-2 (WGA-2) kit (Sigma, USA) according to manufacturer's instructions. The resulting DNA was purified and quantified. Then, 5 μg aliquots of the resulting amplified DNA samples were shipped to Roche-Nimblegen (Indianapolis) for labeling and hybridization under contract. In short, the input and enriched DNA samples were labeled with Cy-3 and Cy-5, respectively, and then matching specimens were hybridized to the 385K NimbleGen promoter array and scanned.

Analysis of genome-wide data: Cy3-Cy5 ratios for each probe were computed, log2 transformed, then scaled by subtracting the bi-weight mean from each value for each feature. The resulting values were then analyzed in relation to all features and directly neighboring features by a fixed-window Kolmogorov-Smirnov test to identify significantly differentially regulated promoter regions in each subject. The resulting peak scores for each differentially regulated region for each subject were exported, and the results from cases and controls were contrasted using standard t-tests to determine differentially regulated individual gene promoter regions in each type of substance use syndrome.
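The scaling step described above can be illustrated with a minimal Python sketch. It is not the contractor's actual pipeline: the function names, the one-step form of the Tukey biweight estimator, the tuning constant c=5, and the Cy3/Cy5 orientation of the ratio are all assumptions made for illustration, and the intensity values are invented.

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of location (assumed estimator)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Points more than c MADs from the median get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def scaled_log2_ratios(cy3, cy5):
    """Log2-transform per-probe ratios, then subtract the biweight mean."""
    log_ratios = [math.log2(a / b) for a, b in zip(cy3, cy5)]
    center = tukey_biweight_mean(log_ratios)
    return [r - center for r in log_ratios]


# Hypothetical intensities for four probes on one array.
example_cy3 = [200.0, 400.0, 800.0, 100.0]
example_cy5 = [100.0, 200.0, 400.0, 50.0]
example_scaled = scaled_log2_ratios(example_cy3, example_cy5)
```

The biweight mean is used instead of an ordinary mean because a small number of extreme probes then has little influence on the centering value.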

Three tables (Tables 5, 6 and 7) identify the gene promoter regions that are differentially regulated in Nicotine, Alcohol and Cannabis Dependence, respectively. These promoter-associated islands are listed according to the HUGO identification of the gene with which they are associated.

Table 5. Genes whose methylation is differentially regulated in DNA from subjects with active Nicotine Dependence as compared to DNA from well Controls.

ACCN3 LPHN1 VMD2L1
ATP6V0A4 LTB4R WFIKKN1
C10orf39 LW-1 ZCCHC13
C10orf53 MAFG ZFP64
C21orf123 MAGEA4 ZNF274
CACNA1G MATK ZNF320
CCDC49 MCART1 ZNF516
CCNC MGC4728
CD8A MYADML
CDH16 N/A
CIDEB NCAM1
CLEC10A NCR3
CYP2B6 NOXO1
DOK2 NUDT1
DYDC1 NUMB
EFNA3 OPN1LW
FABP6 OR2B11
FAM107B OR6V1
FKBPL PAQR5
FLJ32569 PAX3
FLJ40365 PDCD4
FLJ43870 PNCK
GALP PPAN
GJB1 PRAME
GMPPA PSG7
HOXA5 PTPRT
HRASLS2 RIBC1
KCNQ1DN RP11-159H20.4
KHSRP
KIAA1843 S100A1
KIRREL2 S100A13
KLK9 SCN5A
LCA10 SELS
LIMS3 SH3PX3
LOC155006 SLC35E1
LOC285095 SPIN-2
LOC643274 TAC3
LOC645811 TBX4
LOC646836 TCOF1
LOC653176 TNFSF9
LOC653700 TOMM40

Table 6. Genes whose methylation is differentially regulated in DNA from subjects with active Alcohol Dependence as compared to DNA from well Controls.

ZGP1 FBXW5 MLX SHARPIN
BHLHB8 FLJ40448 MPG SLITRK4
C6orf26 FRG1 N/A SNAPC2
C20orf70 GIYD2 OPRS1 SULT1A3
CACNA1S KIAA1875 PANX2 TBX2
CMTM2 KLK8 PIP5KL1 THTPA
COL6A2 LOC339047 PPP1CA TMEM101
COLEC11 LOC440354 PRSS27 TMEM121
CSAG2 LOC642628 PSG3 TRIM17
ELF3 LOC644122 REEP6 TYRO3
FAM3A LOC645598 RFNG

Table 7. Genes whose methylation is differentially regulated in DNA from subjects with active Cannabis Dependence as compared to DNA from well Controls.

IL32 ZNF42 SPANXA1 TOMM40
PEO1 FNDC8 TMEM88 SERTAD3
LOC653210 FAM84A C14orf120 IER2
C7orf21 PTPN20A IGFBP6 ARHGEF1
PTPN20A RBP5 ACSS2 PEO1
KRT17 LOC642358 FXYD1 BMF
PTPN20B PAQR8 CMTM1 KIAA0310
LOC653107 DNAI1 H2AFB3 DNAJC19
GIYD2 LOC653680 KIAA0892 SEPT6
FTL HSPA1A ZNF409 MAGED4
FLJ21767 SIRT2 IRF7 HSD11B1L
LOC653107 UBOX5 LOC653257 GIYD2
CSH2 TUBA2 RTN2 EGLN2
CSAG3A KCNK7 KCNK7 PRCP
MGC12760 ZNF580 SULT1A3 CYBA
MUC4 RNASEH2A N/A TUSC4
RRP22 SCT LOC339123 BCKDK
S100A13 LOC653210 LOC644083 GH2
TRIM74 LOC644733 ATP6V0C BOLA2
BAD RBM10 PAIP2 PITRM1
CSAG3A DARC LOC653483 LOC401019
ANKRD25 LIME1 MEIS3 FAM39A
SNRPN MAGEA2B MAGED4 CKAP1
FKRP FLJ21767 CA5BL GMFG
RNF126 SNRPN ARRB2 CYP2D6
ITGB4BP COX6B2 RIPK3 BAG1
PEO1 C1orf142 CKS1B Rgr
LRDD C7orf21 TSEN34 LOC653107
CHMP5 CACNA1C SFTPC DEDD2
CRAT CSAG3A OBP2B PITX3
FAM39A MGC12760 BAGE
FAM39A FLJ36046 CRYAB
ECM1 PEO1 LOC389833

EXAMPLE 4: Methylation Profiling of Nicotine Dependence

Nicotine dependence (ND) is one of the largest public health challenges in the developed world. Despite extensive treatment and prevention efforts, approximately 20% of U.S. adults still smoke on a daily basis, which results in 440,000 premature deaths and $92 billion of economic costs annually (Center for Disease Control. 2005. Annual Smoking-Attributable Mortality, Years of Potential Life Lost, and Productivity Losses - United States, 1997-2001. Morbidity and Mortality Weekly 54(25):625-628; Center for Disease Control. 2009. State-Specific Prevalence and Trends in Adult Cigarette Smoking - United States, 1998-2007. JAMA 302(3):250-252). Not surprisingly, a large number of studies have been conducted to identify the genetic and environmental factors associated with smoking. While the analyses of both types of factors have been informative and useful in the provision of better treatment and prevention measures, the rate of smoking in the general population may have reached a nadir and in fact may be increasing in young adults (Kumra V, Markoff BA. 2000. WHO'S SMOKING NOW?: The Epidemiology of Tobacco Use in the United States and Abroad. Clinics in Chest Medicine 21(1):1-9). Hence, there is increased urgency to understand the biology underlying ND. Unfortunately, even though recent genome wide analyses have clearly identified significant genetic variation for ND (Bierut LJ, Madden PA, Breslau N, Johnson EO, Hatsukami D, Pomerleau OF, Swan GE, Rutter J, Bertelsen S, Fox L and others. 2007. Novel genes identified in a high-density genome wide association study for nicotine dependence. Hum Mol Genet 16(1):24-35; Vink JM, Smit AB, de Geus EJ, Sullivan P, Willemsen G, Hottenga JJ, Smit JH, Hoogendijk WJ, Zitman FG, Peltonen L and others. 2009. Genome-wide association study of smoking initiation and current smoking. Am J Hum Genet 84(3):367-79), the majority of the biological vulnerability for initiation and maintenance of smoking behaviors remains unexplained.

Recently there has been an increasing appreciation that a portion of the biology responsible for the initiation and maintenance of smoking behaviors may be epigenetic. Over the past two years, a number of studies have demonstrated that smoking itself induces biological changes at loci such as monoamine oxidase A (MAOA) and monoamine oxidase B (MAOB), which are known to be important in human behavior (Fowler JS, Logan J, Wang G-J, Volkow ND. 2003. Monoamine Oxidase and Cigarette Smoking. NeuroToxicology 24(1):75-82; Fowler JS, Volkow ND, Wang GJ, Pappas N, Logan J, Shea C, Alexoff D, MacGregor RR, Schlyer DJ, Zezulkova I and others. 1996b. Brain monoamine oxidase A inhibition in cigarette smokers. Proc Natl Acad Sci U S A 93(24):14065-9). Whereas some of the biological effects are known to be due to the direct effects of cigarette smoke (Yu PH, Boulton AA. 1987. Irreversible inhibition of monoamine oxidase by some components of cigarette smoke. Life Sci 41(6):675-82), it is also becoming evident that smoking may directly affect the methylation status of genes (Breton CV, Byun H-M, Wenten M, Pan F, Yang A, Gilliland FD. 2009. Prenatal Tobacco Smoke Exposure Affects Global and Gene-Specific DNA Methylation. Am J Respir Crit Care Med:200901-0135OC; Philibert R, Beach SR, Gunter T, Brody GH, Madan A. 2009. The Effect of Smoking on MAOA Promoter Methylation in DNA Prepared from Lymphoblasts and Whole Blood. Am J Med Genet B Neuropsychiatr Genet Sep 23; [Epub ahead of print]). These findings are intriguing because altered DNA methylation is an integral part of the biological processes in the carcinogenic pathway (Tessema M, Yu YY, Stidley CA, Machida EO, Schuebel KE, Baylin SB, Belinsky SA. 2009. Concomitant promoter methylation of multiple genes in lung adenocarcinomas from current, former and never smokers. Carcinogenesis 30(7):1132-1138), and they suggest the possibility that methylation may also affect behaviorally relevant genes.

There is strong support for the hypothesis that smoking alters DNA methylation of behaviorally relevant genes at the Xp11.3 locus containing MAOA and MAOB. Monoamine oxidase activity is essential for the normal catabolism of monoaminergic neurotransmitters. Classically, disruption of this oxidase activity is associated with aberrant behavior, especially aggression (Brunner HG, Nelen M, Breakefield XO, Ropers HH, van Oost BA. 1993. Abnormal behavior associated with a point mutation in the structural gene for monoamine oxidase A. Science 262(5133):578-80). Since that seminal discovery, there has been an increasing body of evidence, including a set of elegant neuroimaging analyses by Volkow and associates, that implicates altered MAOA and MAOB protein activity in the CNS and non-CNS pathophysiology associated with smoking (Alia-Klein N, Goldstein RZ, Kriplani A, Logan J, Tomasi D, Williams B, Telang F, Shumay E, Biegon A, Craig IW and others. 2008. Brain Monoamine Oxidase A Activity Predicts Trait Aggression. J Neurosci 28(19):5099-5104; Fowler JS, Logan J, Wang G-J, Volkow ND. 2003. Monoamine Oxidase and Cigarette Smoking. NeuroToxicology 24(1):75-82; Fowler JS, Volkow ND, Wang GJ, Pappas N, Logan J, MacGregor R, Alexoff D, Shea C, Schlyer D, Wolf AP and others. 1996a. Inhibition of monoamine oxidase B in the brains of smokers. Nature 379(6567):733-6; Fowler JS, Volkow ND, Wang GJ, Pappas N, Logan J, Shea C, Alexoff D, MacGregor RR, Schlyer DJ, Zezulkova I and others. 1996b. Brain monoamine oxidase A inhibition in cigarette smokers. Proc Natl Acad Sci U S A 93(24):14065-9). Some of these changes in smoking associated monoamine oxidase activity are secondary to direct effects of smoke (Yu PH, Boulton AA. 1987. Irreversible inhibition of monoamine oxidase by some components of cigarette smoke. Life Sci 41(6):675-82). However, an emerging literature has indicated that altered epigenetic regulation of both of these genes may also be playing a role in altering monoamine oxidase activity (Launay J-M, Del Pino M, Chironi G, Callebert J, Peoc'h K, Megnien J-L, Mallet J, Simon A, Rendu F. 2009. Smoking Induces Long-Lasting Effects through a Monoamine-Oxidase Epigenetic Regulation. PLoS ONE 4(11):e7959; Philibert R, Beach SR, Gunter T, Brody GH, Madan A. 2009. The Effect of Smoking on MAOA Promoter Methylation in DNA Prepared from Lymphoblasts and Whole Blood. Am J Med Genet B Neuropsychiatr Genet Sep 23; [Epub ahead of print]; Philibert RA, Gunter TD, Beach SR, Brody GH, Madan A. 2008. MAOA methylation is associated with nicotine and alcohol dependence in women. Am J Med Genet B Neuropsychiatr Genet 147B(5):565-70). Taken together with a recent genome wide study of methylation of the effects of maternal prenatal smoking (Breton CV, Byun H-M, Wenten M, Pan F, Yang A, Gilliland FD. 2009. Prenatal Tobacco Smoke Exposure Affects Global and Gene-Specific DNA Methylation. Am J Respir Crit Care Med:200901-0135OC) and studies by others indicating altered methylation at loci such as OPRM1, DAT and SNCA (Bonsch D, Lenz B, Kornhuber J, Bleich S. 2005. DNA hypermethylation of the alpha synuclein promoter in patients with alcoholism. Neuroreport 16(2):167-70; Hillemacher T, Frieling H, Hartl T, Wilhelm J, Kornhuber J, Bleich S. 2009. Promoter specific methylation of the dopamine transporter gene is altered in alcohol dependence and associated with craving. Journal of Psychiatric Research 43(4):388-392; Nielsen DA, Yuferov V, Hamon S, Jackson C, Ho A, Ott J, Kreek MJ. 2008. Increased OPRM1 DNA Methylation in Lymphocytes of Methadone-Maintained Former Heroin Addicts. Neuropsychopharmacology) in other addictive behaviors, a nascent literature is emerging that supports the assertion that various addictive substances may alter DNA methylation at a broad number of loci relevant to behavior, and that a better understanding of these changes in methylation may enhance our understanding of the biology of addiction.

Since DNA methylation is a major mechanism through which gene expression and ultimately behavior is regulated, these findings also suggest that smoking induced altered DNA methylation may be in part responsible for some of the processes which maintain smoking as well as some of the other behavioral phenomena associated with smoking, such as increased risk for panic disorder (Isensee B, Wittchen HU, Stein MB, Hofler M, Lieb R. 2003. Smoking increases the risk of panic: findings from a prospective community study. Arch Gen Psychiatry 60(7):692-700). Capturing a broader understanding of that biology may generate critical insights that may be important to the development of better treatment and prevention measures for smoking and associated phenomena. Therefore, to begin building this understanding of altered DNA methylation on a more systematic basis, an analysis was conducted of DNA methylation at 18,028 promoter associated CpG islands using lymphoblast DNA from 23 actively smoking ND subjects and 18 age and ethnicity matched controls from the Iowa Adoption Studies (IAS).

METHODS

The design and diagnostic measures used in the IAS have been extensively described previously and all have been approved by the University of Iowa Institutional Review Board (Yates WR, Cadoret RJ, Troughton E, Stewart MA. 1996. An adoption study of DSM-IIIR alcohol and drug dependence severity. Drug and Alcohol Dependence 41(1):9). The clinical data used in the study were derived from the latest two rounds of structured interviews conducted in our studies (1999-2003 and 2004-2009). The core instrument for these studies was an adaptation of the Semi-Structured Assessment for the Genetics of Alcoholism, version 2 (SSAGA-II) (Bucholz KK, Cadoret R, Cloninger CR, Dinwiddie SH, Hesselbrock VM, Nurnberger JI, Jr., Reich T, Schmidt I, Schuckit MA. 1994. A new, semi-structured psychiatric interview for use in genetic linkage studies: a report on the reliability of the SSAGA. J Stud Alcohol 55(2):149-58). The lifetime symptom counts for ND and Fagerstrom Test for Nicotine Dependence (FTND) scores were compiled from these data using DSM-IV criteria and published scales as previously described (Heatherton TF, Kozlowski LT, Frecker RC, Fagerstrom KO. 1991. The Fagerstrom Test for Nicotine Dependence: a revision of the Fagerstrom Tolerance Questionnaire. Br J Addict 86(9):1119-27; Philibert RA, Ryu GY, Yoon JG, Sandhu H, Hollenbeck N, Gunter T, Barkhurst A, Adams W, Madan A. 2007. Transcriptional profiling of subjects from the Iowa adoption studies. Am J Med Genet B Neuropsychiatr Genet 144(5):683-90). These scores and the rest of the available clinical data were then reviewed by two board-certified psychiatrists to provide two pools of individuals: a set of cases with severe, active ND and a set of age and ethnicity matched controls without a history of behavioral illness or significant alcohol, nicotine or illicit substance use.

The lymphoblast DNA used in the study was prepared from standard EBV transfected cell lines that were grown in standard bovine serum-based growth media supplemented with L-glutamine and penicillin-streptomycin as previously described (Philibert RA, Ryu GY, Yoon JG, Sandhu H, Hollenbeck N, Gunter T, Barkhurst A, Adams W, Madan A. 2007. Transcriptional profiling of subjects from the Iowa adoption studies. Am J Med Genet B Neuropsychiatr Genet 144(5):683-90). The media was changed for each of these cell lines 24 hours prior to the extraction of DNA.

Input and methylation enriched fractions of DNA were prepared per the standard Nimblegen protocol (Roche Nimblegen I. 2007. Sample Preparation Protocol For DNA methylation Microarrays v3.0. Indianapolis). Briefly, 20 μg of DNA was reduced in complexity by digestion with MseI and column purified, and a small aliquot was taken for future analysis (i.e., input DNA). Five μg of the remainder of the digested DNA from each subject was resuspended in immunoprecipitation buffer (50 mM NaPO4, 700 mM NaCl, 0.25% Triton X-100) and hybridized with 1 μg of monoclonal mouse anti-5-methylcytidine antibody (Calbiochem USA) at 4°C overnight. The resulting solution was then hybridized to a magnetic bead coupled secondary antibody (Dynabeads M-280, Invitrogen USA) and the DNA-antibody moiety purified by magnetic separation. The DNA was removed from the antibody complex by overnight digestion with proteinase K and column purified. Then 100 ng aliquots of both the methyl enriched DNA fraction and the input DNA were amplified using a WGA2 genome amplification kit according to manufacturer's instructions (Sigma, St. Louis). After purification, this DNA was frozen at -20°C until use in the microarray analyses.

Hybridization to the 385K RefSeq whole genome promoter array (HG18 RefSeq) was conducted by Roche-Nimblegen (Indianapolis) under contract. These arrays contain 50-75 mer probes to 18,028 annotated RefSeq gene promoters with an average probe spacing of 100 bp. The resulting data, including the scaled log2 weighted ratios of the Cy3 (input) and Cy5 (methyl enriched) hybridization signals used in this report, were returned via courier.

The resulting data were then analyzed using a two-step process. In the first step, t-tests were conducted to identify probes whose hybridization values differed between the cases and controls at a significance level of p<0.01 (uncorrected). A clustering algorithm was then applied to this reduced probe set to identify probes which co-localized.

Bisulfite confirmation of differential methylation was conducted using standard procedures. Briefly, the DNA for each subject was first bisulfite modified and then amplified using an EpiTect® 96 Bisulfite kit and an EpiTect® Whole Bisulfitome kit (both Qiagen, USA) according to manufacturer's instructions. The DNA samples were then amplified using a nested PCR protocol (1st round primers, AGTGTTGGTGTATTTATTTTAAAA (SEQ ID NO: 10) and TCCTAAAAACAAATATCTTTCAATC (SEQ ID NO: 11); 2nd round primers, TAACAATACTAATCATTTCATAAAATA (SEQ ID NO: 12) and AGTTTAGTAATTTGGAATAATAGGTTT (SEQ ID NO: 13)). The resulting PCR products were gel purified, cloned using a StrataClone TA cloning kit (Stratagene USA), then sequenced at the University of Iowa DNA facility. The methylation status of each residue was then determined using CpG Viewer (Carr IM, Valleley EMA, Cordery SF, Markham AF, Bonthron DT. 2007. Sequence analysis and editing for bisulphite genomic sequencing projects. Nucl Acids Res 35(10):e79), and the resulting data were analyzed via chi-square testing.
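As an illustration of the chi-square testing mentioned above, the following Python sketch compares methylated and unmethylated residue counts between two groups. The text does not state which form of the test was used, so the Pearson statistic without continuity correction is an assumption, as are the function names and the counts, which are invented and are not the study data.

```python
import math


def methylation_fraction(methylated, unmethylated):
    """Fraction of sequenced CpG residues that were methylated."""
    return methylated / (methylated + unmethylated)


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (1 df, no continuity correction), 2x2 table.

    Returns (chi-square statistic, two-sided p-value).
    """
    n = meth_a + unmeth_a + meth_b + unmeth_b
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    chi2 = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u
    )
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: group A 80 methylated / 20 unmethylated residues,
# group B 90 methylated / 10 unmethylated residues.
example_chi2, example_p = chi_square_2x2(80, 20, 90, 10)
```

With larger residue counts, as when many clones are sequenced per subject, the same proportional difference yields a much smaller p-value.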

Gene pathway analysis was conducted using the web version of GOMiner™ using the default settings (Zeeberg B, Feng W, Wang G, Wang M, Fojo A, Sunshine M, Narasimhan S, Kane D, Reinhold W, Lababidi S and others. 2003. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biology 4(4):R28) while frequency analyses were conducted using the binomial test (Fleiss JL. 1981. Statistical Methods for Rates and Proportions. New York, NY: John Wiley & Sons Inc.). Comparison of Cy3/Cy5 weighted values was conducted using logistic regression (Fleiss JL. 1981. Statistical Methods for Rates and Proportions. New York, NY: John Wiley & Sons Inc.).

RESULTS

The clinical and demographic information for the 41 subjects used in the case and control analyses are given in Table 8. All subjects were White with the average age of the cases being 43 ± 7 years old and the controls 46 ± 7 years old (p<0.17). The cases averaged over a pack of cigarettes per day at the time of phlebotomy with almost all of them having smoked heavily for over 20 years.

Table 8. Clinical and Demographic Data

                               Cases                    Controls
                               Male        Female       Male      Female
N                              10          13           10        8
Age (years ± SD)               47 ± 8      41 ± 5       47 ± 9    46 ± 5
DSM IV* ND Symptom Count       4.8 ± 2.0   5.2 ± 1.0    -         -
FTND*                          4.7 ± 2.7   4.4 ± 2.4    -         -
Daily Cigarette Consumption    25 ± 9      22 ± 9       -         -
Years Smoking                  24 ± 10     21 ± 7       -         -

*FTND: Fagerstrom Test for Nicotine Dependence; DSM IV: Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition.

The methylation signals were analyzed as a group and by gender. The first analysis contrasted the signal from all cases versus all controls. The second analysis, consistent with prior strategies for analyzing behavioral data, featured gender specific analyses. As the initial step of the all cases (n=23) vs controls (n=18) contrast, t-tests were conducted comparing the scaled and weighted Cy5/Cy3 ratios of the cases to those of the controls. Overall, the hybridization signal for 2534 probes differed at an uncorrected p-value of p<0.01. Because the clinical phenomenology associated with ND differs between males and females in our population, we then conducted gender specific analyses using the same methods. In this contrast, the male ND cases had 1790 probes that were differentially methylated at an uncorrected p-value of p<0.01, while the female ND cases had 2070 probes that were differentially methylated at an uncorrected p-value of p<0.01. Fifteen of the significant probes in the female only contrast were also found to be significant in the male only contrast (p<0.03), with two of those probes localizing to the same gene promoter (SLCO2B1).
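The kind of frequency analysis applied to the overlap between the two probe lists can be sketched with an upper-tail binomial probability. The text does not state how the reported p-value was computed, so this Python sketch is only one plausible formulation: the function name is hypothetical, and the total of 385,000 probes is an assumption taken from the "385K" array name, so the result is not expected to reproduce the reported value.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so that large n does not overflow.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n - i + 1)
            + i * math.log(p)
            + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return total


# Assumed setup: each of the 2070 female-significant probes independently
# has a chance of 1790 / 385000 of also appearing in the male list.
overlap_p = binomial_upper_tail(15, 2070, 1790 / 385_000)
```

A small tail probability would indicate that the two gender-specific lists share more probes than expected if they were drawn independently.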

Since our prior work in this area has demonstrated that methylation analyses are inherently noisy, we performed a cluster analysis of all significant probes from the combined set (p<0.01) in order to increase the likelihood that the gene promoters selected for further analysis would represent real signal. The distribution of these differentially hybridizing probes was significantly nonrandom, with 237 of the probes localizing to just 113 gene promoters (p<0.0001). Seven gene promoters had three significant probes. Table 9 gives the HUGO approved names for the 106 named genes that have two or more significant probes localizing to the gene promoter.

Table 9. List of Genes with Two or More Significant Probes

3 Significant Probes
ANKRD13A ATG2A AX2R CSNK1G2 NOVA1 SETBP1 SLMO2

2 Significant Probes
AFAP1L1 ATP11A C15orf57 C16orf61 C6orf195 C7orf45 C9orf72
CAMTA1 CARHSP1 CCDC144NL CCNH CCT6B CFTR CMIP
CNTD1 COL4A3 CSMD1 CTBP2 D2HGDH DDX41 DLGAP2
DMRTA2 DOPEY2 EBF2 EIF4H ELL EMP3 ENTPD2
ENTPD2 EPOR FER FGR FHDC1 FHOD1 FOXC1
FXN GDF10 GFPT2 GPM6A GRIK2 HIRIP3 HIST1H2BK
HIST2H2AA3 HSD17B4 IFNA17 ISL2 JPH2 KBTBD2 LAX1
LBX1 LOC254559 LRRC66 LRRN2 MAT2B NECAB3 NID2
NUBP1 OTOP1 PARP4 PDE5A PNLDC1 PNMA5 PPIA
PRKAR1A PRR7 PTDSS2 PTPRN2 RAC1 RBM20 RNPS1
RPIA RPL39L RPS17 SAV1 SCG5 SFRS17A SGOL2
SH2D4B SKP1 SLC25A21 SLC5A5 SLCO2B1 SOX17 SSTR1
STK40 TACR3 TBC1D8B TESC THOP1 TMC2 TOPORS
TP53INP1 ZIC5 ZNF148 ZNF830 ZPLD1

The list shows RefSeq genes with CpG islands containing two or more significant probes that were symmetrically associated with active nicotine dependence (p<0.01 nominal) in the genome wide analysis. Briefly, to generate this list, the normalized log2 hybridization ratio scores were analyzed in a two-step process. In the first step, genome wide t-tests were conducted to identify probes whose hybridization values differed between the cases and controls at a significance level of p<0.01 (uncorrected). A clustering algorithm was then applied to this reduced probe set to identify probes which co-localized within a 1000 bp sliding window in the same island. Then, the genomic location of the CpG island was checked against the HG18 build of the human genome to identify RefSeq annotated genes associated with the island.
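The second, clustering step can be sketched in Python as follows. The clustering algorithm actually used is not specified in the text, so this is only one possible reading: the function name and input layout are hypothetical, the coordinates are invented, and treating two probes as co-localized when their positions differ by at most 1000 bp is an assumption.

```python
from collections import defaultdict


def cluster_probes(probes, window=1000, min_probes=2):
    """Find islands with at least min_probes significant probes in a window.

    probes: iterable of (island_id, genomic_position) pairs for probes that
    passed the first-step t-test filter.
    Returns {island_id: largest number of probes inside any single window}.
    """
    by_island = defaultdict(list)
    for island, position in probes:
        by_island[island].append(position)

    clustered = {}
    for island, positions in by_island.items():
        positions.sort()
        best, start = 1, 0
        for end in range(len(positions)):
            # Slide the left edge until the window spans at most `window` bp.
            while positions[end] - positions[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_probes:
            clustered[island] = best
    return clustered


# Hypothetical significant probes on three islands.
example_probes = [
    ("A", 100), ("A", 600), ("A", 5000),
    ("B", 50),
    ("C", 10), ("C", 900), ("C", 1005),
]
example_clusters = cluster_probes(example_probes)
```

Requiring co-localized probes trades sensitivity for specificity: an isolated probe that passes the uncorrected threshold by chance is discarded, while a promoter with several concordant neighboring probes is retained.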

These 106 named genes from Table 9 were then subjected to pathway analysis using GOMiner™ (Zeeberg and others 2003) to identify gene pathways whose methylation patterns are differentially affected by smoking. In brief, the results of the analysis show that epigenetic change in proteins associated with cell proliferation and transmembrane transport are recurrent themes in these analyses (Table 10).

Table 10. Gene Pathway Analysis of the 113 Promoters with 2 or More Significant Probes.

GoMiner Category    Changed Genes/Total Genes    P Value
GO:0007215 Glutamate Signaling Pathway    2/5    <0.001
GO:0008285 Negative Regulation of Cell Proliferation    6/128    <0.002
GO:0016607 Nuclear Speck    4/59    <0.002
GO:0016614 Oxidoreductase Activity Acting on CH-OH    4/60    <0.003
GO:0003007 Heart Morphogenesis    2/9    <0.003
GO:0000786 Nucleosome    2/11    <0.005
GO:0042626 ATPase Activity Coupled to Transmembrane Movement of Substances    2/11    <0.005
GO:0022804 Active Transmembrane Transporter Activity    3/37    <0.005
GO:0016820 Hydrolase Activity Acting on Acid Anhydrides    2/12    <0.006
GO:0043492 ATPase Activity Coupled to Movement of Substances    2/12    <0.006
GO:0022414 Reproductive Process    6/178    <0.006
GO:0016604 Nuclear Body    4/79    <0.006
GO:0007548 Sex Differentiation    3/45    <0.008
GO:0051082 Unfolded Protein Binding    3/45    <0.008
GO:0015276 Ligand-Gated Ion Channel Activity    2/15    <0.008
GO:0022834 Ligand-Gated Channel Activity    2/15    <0.008
GO:0000003 Reproduction    8/324    <0.009
GO:0030551 Cyclic Nucleotide Binding    2/16    <0.009
GO:0003006 Reproductive Developmental Process    3/49    <0.01
GO:0022892 Substrate-Specific Transporter Activity    7/265    <0.01

In order to validate the microarray analyses, we conducted sequencing of plasmid clones of bisulfite PCR products from 15 randomly selected cases (92 clones in total) and 15 randomly selected controls (77 clones in total) with respect to the AX2R promoter across the three significant probe regions identified in our initial analyses which localized to this gene. Figure 8 shows the structure of the tiled region of the AX2R gene promoter. Table 11 gives the Cy3/Cy5 methylation ratios for the cases and controls, as well as their uncorrected p-values. When evaluating the strength of these p-values, it is important to note that these probes all recognize the same DNA contig produced by the MseI digest.

Table 11. Sequence and Significance of AX2R Probes

Cy3/Cy5 ratio (input to methyl enriched fraction)

Probe Sequence    Avg Cases    Avg Controls    Difference    P value
ttcaggtgccaggtctggagtgctggtgcacctatctcaaaacgctgtct (SEQ ID NO: 14)    1.58    1.32    0.27    <0.03
gcaaacagcagtccagtaacctggaacaacaggctctgcgaaaccaagga (SEQ ID NO: 15)    1.84    1.48    0.36    <0.005
agaaatgaatggcgttgtcatcgaaaaaacacagactcgattgtgacagaaataccg (SEQ ID NO: 16)    1.66    1.34    0.32    <0.005
tgcgcctccacggaataactgccagccggcacagtgcgagtgagaaaccg (SEQ ID NO: 17)    1.74    1.40    0.34    <0.009
ggaaaagaatccgacgtcgccaacaagcggtgctaccaggagaaacgcct (SEQ ID NO: 18)    1.52    1.29    0.23    <0.09
aaaacacagctggataaaccgagaaccttcggagtggttgcaccgaaacg (SEQ ID NO: 19)    1.52    1.29    0.23    <0.09
gaagcaaccggcagtgctaacaccgaggagcacctagagcggcaaaacta (SEQ ID NO: 20)    1.18    0.86    0.32    <0.04

Table 12 gives the average methylation ratios for the sequenced CpG residues in the targeted region. As evidenced by the consistently elevated Cy3/Cy5 ratios across the promoter region, there was a relative decrease in the amount of methylation in the ND subjects as compared to the controls. This was particularly evident in the second, third and fourth probes covering the region. Bisulfite sequencing of plasmid clones containing inserts from the PCR products of the bisulfite converted DNA samples from cases and controls confirmed those observations and demonstrated a nearly twofold greater amount of unmethylated residues in the smoking subjects as compared to the controls (average methylation in cases vs controls: 77.6% vs 88.8%, p<0.0001).

Table 12. Average Methylation at Bisulfite Sequenced Residues at AX2R

Residue   CG 4   CG 5   CG 6   CG 7   CG 8   CG 9   CG 10  CG 11  CG 12  CG 13  CG 14  CG 15  CG 16  CG 17
Cases*    85%    77%    78%    71%    79%    78%    78%    76%    79%    80%    78%    71%    78%    77%
Controls  88%    91%    87%    92%    88%    91%    87%    90%    92%    92%    88%    74%    94%    88%

*The average number of residues successfully counted per CpG residue in the cases and controls was 74 and 89, respectively.
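As a rough arithmetic check on the summary figures quoted for the bisulfite sequencing, the per-residue percentages in Table 12 can be averaged directly. The sketch below is illustrative only: it takes an unweighted mean of the rounded table values, whereas the reported 77.6% and 88.8% were presumably computed from the underlying clone counts (an assumption), so small differences are expected.

```python
# Per-residue methylation percentages transcribed from Table 12 (CG 4 to CG 17).
cases = [85, 77, 78, 71, 79, 78, 78, 76, 79, 80, 78, 71, 78, 77]
controls = [88, 91, 87, 92, 88, 91, 87, 90, 92, 92, 88, 74, 94, 88]


def mean(values):
    """Unweighted arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


case_mean = mean(cases)
control_mean = mean(controls)

# Ratio of the unmethylated fractions (100% minus methylated) in cases vs controls.
unmethylated_ratio = (100 - case_mean) / (100 - control_mean)

print(f"cases: {case_mean:.1f}%  controls: {control_mean:.1f}%")
print(f"unmethylated fraction, cases/controls: {unmethylated_ratio:.2f}")
```

The unweighted means fall close to the reported averages, and the ratio of unmethylated fractions comes out just under two, in line with the "nearly twofold" description.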

Finally, in order to compare our results with previously published results using peripheral lymphocyte DNA (Launay J-M, Del Pino M, Chironi G, Callebert J, Peoc'h K, Megnien J-L, Mallet J, Simon A, Rendu F. 2009. Smoking Induces Long-Lasting Effects through a Monoamine-Oxidase Epigenetic Regulation. PLoS ONE 4(11):e7959), we compared the probe values between cases and controls at the X-chromosome MAOB locus. Consistent with prior findings, the amount of methylation at the MAOB promoter was significantly decreased in both males (LR; p<0.006) and females (LR; p<0.007).

DISCUSSION

In summary, we report that smoking is associated with both overall and locus-specific alterations in DNA methylation, with particular enrichment of altered methylation in pathways associated with glutamate signaling, cell proliferation and detoxification. Strengths of this study are the well characterized subjects, the similarity of the whole genome promoter array results for males and females, and the sequencing confirmation.

The vast majority of the loci with differential methylation in this study are not directly involved with neurotransmission. Consistent with the role of smoking in cancer and of altered DNA methylation as part of the oncogenic process, it is logical to find that each of the 7 genes with 3 significant probes has a suggested role in carcinogenesis (Buckanovich RJ, Yang YY, Darnell RB. 1996. The onconeural antigen Nova-1 is a neuron-specific RNA-binding protein, the activity of which is inhibited by paraneoplastic antibodies. J Neurosci 16(3):1114-1122; Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C and others. 2007. Patterns of somatic mutation in human cancer genomes. Nature 446(7132):153-8; Koike Folgueira MA, Brentani H, Carraro DM, De Camargo Barros Filho M, Hirata Katayama ML, Santana de Abreu AP, Mantovani Barbosa E, De Oliveira CT, Patrao DF, Mota LD and others. 2009. Gene expression profile of residual breast cancer after doxorubicin and cyclophosphamide neoadjuvant chemotherapy. Oncol Rep 22(4):805-13; Ma S, Huang JK, Shen S. 2009. Identification of Cancer Associated Gene Clusters and Genes Via Clustering Penalization. Statistics and Its Interface 2:1-11; Masayoshi M, Naoki K, Manuel JG-R, Tatsuo A, Terry DC, Kunihiro U, Yoshifumi A. 2001. Identification and characterization of SEB, a novel protein that binds to the acute undifferentiated leukemia-associated protein SET. European Journal of Biochemistry 268(5):1340-1351; Wright PK, May FE, Darby S, Saif R, Lennard TW, Westley BR. 2009. Estrogen Regulates Vesicle Trafficking Gene Expression in EFF-3, EFM-19 and MCF-7 Breast Cancer Cells. Int J Clin Exp Pathol 2(5):463-75; Yusuke S, Aaron MH, Younghun J, Anne MZ, Elisabeth AP, Jingcheng W, Jianhua W, Ganwei L, Roodman GD, Robert DL and others. 2008. Annexin II/Annexin II receptor axis regulates adhesion, migration, homing, and growth of prostate cancer, p 370-380). This suggests that the processes affected by smoking in other cells may be reflected in the differential methylation of the lymphoblasts, and that the lymphoblast model may provide a reasonable representation of systemic methylation changes.

In light of this apparent enrichment of genes involved in carcinogenesis, it is notable that the most significant Gene Ontology (Gene Ontology C. 2004. The Gene Ontology (GO) database and informatics resource. Nucl Acids Res 32(suppl_1):D258-261) pathway identified in the GOMiner™ analysis in this study is the glutamate signaling pathway (on the basis of GRIK2 and SSTR1). GRIK2 gene expression is decreased in the brains of smoking mice (Wang J, Gutala R, Hwang Y, Kim J, Konu O, Ma J, Li M. 2008. Strain- and region-specific gene expression profiles in mouse brain in response to chronic nicotine treatment, p 78-87) and genetic variation in GRIK2 (Vink JM, Smit AB, de Geus EJ, Sullivan P, Willemsen G, Hottenga JJ, Smit JH, Hoogendijk WJ, Zitman FG, Peltonen L and others. 2009. Genome-wide association study of smoking initiation and current smoking. Am J Hum Genet 84(3):367-79) was linked to smoking in a recently published GWAS of smoking. These recent and other prior findings support a role for glutamate signaling in the mood altering and drug reinforcing effects of nicotine (Lambe EK, George TP. 2008. Perspective: Translational Studies on Glutamate and Dopamine Neurocircuitry in Addictions: Implications for Addiction Treatment. Neuropsychopharmacology 34(2):255-256). These current results add to that body of evidence and further suggest that the role of the glutamate signaling system should receive greater attention in analyses of mechanisms of addiction associated with smoking. It will be important to identify which of these methylation changes are static and which are dynamic. In our previous work at MAOA, we found that reduction in methylation was particularly pronounced as a result of smoking cessation and, given MAOA's prominence in catabolizing dopamine, we speculated that this epigenetic change could be part of the withdrawal syndrome.
Given the current systematic findings, the findings of others with respect to methylation of the MAOB gene promoter (Launay J-M, Del Pino M, Chironi G, Callebert J, Peoc'h K, Megnien J-L, Mallet J, Simon A, Rendu F. 2009. Smoking Induces Long-Lasting Effects through a Monoamine-Oxidase Epigenetic Regulation. PLoS ONE 4(11):e7959) and our results at MAOB, it is unlikely that the MAOA promoter is the only regulatory motif changed after smoking cessation. If so, by studying withdrawal on a genome wide basis, it may be possible to more readily identify the pathways involved in nicotine craving and devise more effective interventions to short circuit this disruptive syndrome that obfuscates effective treatment.

EXAMPLE 5: Methylation Profiling of Alcohol Dependence

The list of RefSeq genes with CpG islands containing two or more significant probes that were symmetrically associated with active alcohol dependence (p<0.001 nominal) in the genome wide analysis is provided in Table 13. Briefly, to generate this list, the normalized log2 hybridization ratio scores were analyzed in a two-step process. In the first step, genome wide t-tests were conducted to identify probes whose hybridization values differed between the alcohol cases and controls at a significance level of p<0.001 (uncorrected). A clustering algorithm was then applied to this reduced probe set to identify probes which co-localized to a 1000 bp sliding window in the same island. Then, the genomic location of the CpG island was checked against the HG18 build of the human genome to identify RefSeq annotated genes associated with the island.

Table 13

ABCA12 EMILIN3 ODZ2 WAC
ABL2 EXOC6B ODZ4 WASF2
AGBL1 FAM125B PARK2 WDR78
AK097539 FBXL4 PDAP1 WSCD2
AK125749 FCGBP PDGFA XPR1
AK128353 FLJ16779 PDGFRA XRCC5
AK129763 FOXN3 PDLIM1 ZBTB7B
AK309744 FXR1 PDXDC2 ZDHHC2
AK311380 GAD2 PDZD2 ZFHX4
AKAP12 GNB3 PEX7 ZFP92
AMPH GRAP2 PIH1D1 ZNF221
ANKRD53 GTF2I PLSCR3 ZNF263
APBA1 HCCA2 PNRC1 ZNF33A
APBB2 HEXIM1 PON2 ZNF423
ARHGAP10 HEXIM2 PPM1A ZNF623
ARHGEF16 HIP1R PSG6 ZSCAN5A
ARL17 HISPPD1 QSER1
ATP6V1E1 HOXA2 RAB26
BC032407 HSF1 RMND5A
BC051727 IMAA ROS1
C11orf64 IMMT RPIA
C12orf53 INTS10 RSPO1
C1orf101 ITGA5 RUNDC2C
C20orf117 KCNH5 SCT
C7orf50 KCNQ1 SDF4
C9orf72 KLHDC1 SLCO2B1
C9orf82 KLHL9 SMCR7L
CCBL1 LHFPL3 SNTB2
CDC123 LNPEP SPATA5
CDH5 LOC100133545 SPDYE3
CHR11:002610294 LOC284805 SPNS2
COL2A1 LRSAM1 SRL
COPS7A MECP2 STAM2
CPNE4 MIB2 STK36
CR936796 MLL SYT13
CSMD1 MPZL1 TANC1
CSRNP3 MYO9B TJP2
DGKH NAT9 TMEM205
DNHD1 NBPF14 TNRC6B
DOCK11 NEAT1 TRAF3IP2
DOCK4 NF2 TXNDC11
DPY19L4 NME2P1 USP45
DYNC1LI1 NOC2L UVRAG
EDEM2 NPAS1 VCPIP1

EXAMPLE 6: Methylation Profiling of Cannabis Dependence

Table 14 provides a list of CpG residues whose methylation was significantly associated with cannabis dependence at a nominal p-value of p<0.05. For female subjects: CpG 69 and CpG 88. For male subjects: CpG 11-12, 13, 64, 69, 72-73. Unpublished data from Philibert et al., 2008 "MAOA methylation is associated with nicotine and alcohol dependence in women."

The list of RefSeq genes with CpG islands containing two or more significant probes that were symmetrically associated with active cannabis dependence (p<0.01 nominal) in the genome wide analysis is provided in Table 14. Briefly, to generate this list, the normalized log2 hybridization ratio scores were analyzed in a two-step process. In the first step, genome wide t-tests were conducted to identify probes whose hybridization values differed between the cannabis cases and controls at a significance level of p<0.01 (uncorrected). A clustering algorithm was then applied to this reduced probe set to identify probes which co-localized to a 1000 bp sliding window in the same island. Then, the genomic location of the CpG island was checked against the HG18 build of the human genome to identify RefSeq annotated genes associated with the island.
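The second step of the procedure described above (clustering of significant probes within a 1000 bp window in the same island) might be sketched as follows. This is a minimal illustration and not the clustering algorithm actually used, which the specification does not detail. The `Probe` record, the island identifiers, and the function name are hypothetical, and the per-probe p-values are assumed to have already been produced by the genome wide t-tests of the first step.

```python
from collections import defaultdict
from typing import NamedTuple


class Probe(NamedTuple):
    island: str     # hypothetical CpG island identifier
    position: int   # genomic coordinate of the probe (bp)
    p_value: float  # uncorrected p-value from the per-probe t-test (step 1)


def islands_with_clustered_probes(probes, alpha=0.01, window=1000, min_probes=2):
    """Return {island: positions} for islands in which at least `min_probes`
    significant probes (p < alpha) fall within a single `window`-bp span."""
    significant = defaultdict(list)
    for probe in probes:
        if probe.p_value < alpha:
            significant[probe.island].append(probe.position)

    hits = {}
    for island, positions in significant.items():
        positions.sort()
        left = 0
        best = []
        # Slide a window over the sorted positions, keeping the densest cluster.
        for right in range(len(positions)):
            while positions[right] - positions[left] >= window:
                left += 1
            if right - left + 1 > len(best):
                best = positions[left:right + 1]
        if len(best) >= min_probes:
            hits[island] = best
    return hits


# Toy input: only island_A has two significant probes within 1000 bp.
example = [
    Probe("island_A", 100, 0.001),
    Probe("island_A", 600, 0.005),
    Probe("island_A", 5000, 0.002),
    Probe("island_B", 100, 0.001),
    Probe("island_B", 3000, 0.003),
    Probe("island_C", 100, 0.2),
    Probe("island_C", 200, 0.001),
]
clustered = islands_with_clustered_probes(example)
print(clustered)
```

The final step, mapping each retained island to its RefSeq annotated genes on the HG18 build, would require an annotation lookup and is not shown here.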

Table 14

AK056486 ANKHD1 LOC283050 FNTB SRRD
HES4 RNF14 GSTO2 WDR25 PATZ1
BC033949 EBF1 BC132944 AKT1 DNAL4
SDF4 RREB1 PWWP2B TMCO5A CYP2D7P1
AURKAIP1 TXNDC5 DRD4 MGA MIOX
AX747988 ABT1 PNPLA2 CDAN1 SHOX
MSTP2 GLP1R HCCA2 TSPAN3 CD99
ECE1 C6orf130 HCCA2 C16orf13 RPL39
C1orf212 TRERF1 AK126380 TMEM159 MCF2
TEKT2 TAAR1 IGF2 ATP2A1
STK40 UST INS-IGF2 IMAA
CYP4Z1 AKAP12 TRPM5 IRX3
PARS2 FNDC1 KCNQ1 RTN4RL1
NBPF16 IGF2R KCNQ1 TEKT1
MSTO1 IGF2R KCNQ1 SLC25A35
KIAA0907 BC087858 SLC22A18AS GAS7
TOMM40L AK299216 NAP1L4 HS3ST3B1
PTGS2 HOXA OSBPL5 C17orf76
AK095633 AK093987 MRGPRE USP22
OR2T1 CDK13 DENND5A STAT5B
MYT1L C7orf40 CALCB KPNB1
TSSC1 GTF2IRD1 SYT13 TMEM100
AK055918 LMTK2 APLNR AXIN2
EPAS1 SLC26A5 MACROD1 ASPSCR1
TMEM177 C7orf60 KCNK4 TBCD
GAL3ST2 FLJ43663 EHD1 SETBP1
EFHB EXOC4 PPP2R5B ST8SIA5
GLT8D1 MFHAS1 CAPN1 LIPG
GLT8D1 NUDT18 PITPNM1 CTDP1
ITIH4 UNC5D PITPNM1 BSG
KIAA1013 DPY19L4 GPR83 GPX4
PDZRN3 LY6K NCAM1 SBNO2
CGGBP1 DNAJB5 OPCML STK11
RG9MTD1 UNC13B IFFO1 KIAA1532
IGSF11 FXN SLC2A3 ALKBH7
AMOTL2 C9orf85 NDUFA4L2 ICAM1
GNB4 PCSK5 PEBP1 KCNA7
RPL39L AK309476 SIRT4 ETFB
MFSD7 MEGF9 OASL ZNF530
AX748388 CIZ1 RSRC2 TCF15
CRMP1 USP20 RIMBP2 PSMF1
PDGFRA NTNG2 IFT88 BTBD3
H2AFZ GTF3C5 PARP4 SLC12A5
LARP7 CAMSAP1 ESD EYA2
CLCN3 LCNL1 TPPP2 ZNF217
ANKRD37 C10orf18 EFS TPD52L2
AHRR STOX1 C14orf147 DNAJC5
IRX1 CHST3 SOCS4 SIK1

All publications, patents and patent applications are incorporated herein by reference. While in the foregoing specification this invention has been described in relation to certain embodiments thereof, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the invention is susceptible to additional embodiments and that certain of the details described herein may be varied considerably without departing from the basic principles of the invention.

The use of the terms "a" and "an" and "the" and similar referents in the context of describing the invention are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to") unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.