Title:
BHC80 - HISTONE COMPLEXES AND USES THEREOF
Document Type and Number:
WIPO Patent Application WO/2009/015283
Kind Code:
A2
Abstract:
Provided herein is the tertiary structure of domains of the BHC80 protein, including the PHD finger domain, such as when the BHC80 protein is complexed with an unmodified histone H3 tail in the presence of zinc. The invention provides structural and functional information for use in the identification and design of compounds that agonize, antagonize or otherwise regulate enzyme (e.g., demethylase) activity, and to the compounds identified by such methods and the research, diagnostic and therapeutic uses of such compounds.

Inventors:
SHI YANG (US)
LAN FEI (US)
CHENG XIAODONG (US)
COLLINS ROBERT E (US)
HORTON JOHN R (US)
Application Number:
PCT/US2008/071042
Publication Date:
January 29, 2009
Filing Date:
July 24, 2008
Assignee:
HARVARD COLLEGE (US)
UNIV EMORY (US)
SHI YANG (US)
LAN FEI (US)
CHENG XIAODONG (US)
COLLINS ROBERT E (US)
HORTON JOHN R (US)
International Classes:
C07K14/47; A61K38/00
Domestic Patent References:
WO2006071608A2
Other References:
LAN, F. ET AL.: 'Recognition of unmethylated histone H3 lysine 4 links BHC80 to LSD1-mediated gene repression.' NATURE. vol. 448, no. 7154, 09 August 2007, pages 718 - 722
DATABASE GENBANK 15 February 2005 Database accession no. AAH15714
DATABASE GENBANK 22 May 2001 Database accession no. AAF64262
DATABASE GENBANK 03 June 2007 Database accession no. NP_057705
LI, H. ET AL.: 'Molecular basis for site-specific read-out of histone H3K4me3 by the BPTF PHD finger of NURF.' NATURE. vol. 442, no. 7098, 06 July 2006, pages 91 - 95.
WYSOCKA J ET AL.: 'A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling.' NATURE. vol. 442, no. 7098, 06 July 2006, pages 86 - 90.
SHI X ET AL.: 'ING2 PHD domain links histone H3 lysine 4 methylation to active gene repression.' NATURE. vol. 442, no. 7098, 06 July 2006, pages 96 - 99
MELLOR, J., UK: 'It takes a PHD to read the histone code.' CELL. vol. 126, no. 1, 14 July 2006, pages 22 - 24.
GARCIA-BASSETS, I. ET AL.: 'Histone Methylation-Dependent Mechanisms Impose Ligand Dependency for Gene Activation by Nuclear Receptors' CELL. vol. 128, no. 3, 09 February 2007, pages 505 - 518
Attorney, Agent or Firm:
SMITH, DeAnn, F. (Foley Hoag LLP, 155 Seaport Boulevard, Boston MA, US)
Claims:
HMV-123.25 HU 2741

We claim:

1. A polypeptide comprising at least about amino acids 486 to 543 of SEQ ID NO: 14 or an amino acid sequence that is at least about 90% identical thereto and binds to unmethylated H3K4, wherein the polypeptide does not comprise about amino acids 1-485 and/or about amino acids 544 to 680 of SEQ ID NO: 14.

2. The polypeptide of claim 1, comprising an amino acid sequence consisting essentially of SEQ ID NO: 15.

3. The polypeptide of claim 1, comprising an amino acid sequence consisting essentially of SEQ ID NO: 16.

4. The polypeptide of claim 1, further linked to a heterologous polypeptide.

5. The polypeptide of claim 3, wherein the heterologous polypeptide is selected from the group consisting of histidine tag, glutathione S-transferase tag, hemagglutinin tag, FLAG tag, protein A, protein G, calmodulin-binding peptide, chitin-binding peptide, thioredoxin, maltose binding protein, myc, poly(arginine), and poly(His-Asp).

6. An isolated protein complex, comprising a BHC80 polypeptide or a histone binding homolog thereof and a histone or a BHC80 binding homolog thereof.

7. The isolated protein complex of claim 6, wherein said complex exists in solution.

8. The isolated protein complex of claim 7, wherein said complex is a crystal having the space group C2 and a unit cell with the following parameters: a= about 79.51 Angstroms, b= about 25.39 Angstroms, c= about 62.28 Angstroms, and beta= about 96.9 degrees; or a= about 79.70 Angstroms, b= about 25.23 Angstroms, c= about 62.53 Angstroms, and beta= about 96.6 degrees; or a= about 80.33 Angstroms, b= about 25.36 Angstroms, c= about 62.73 Angstroms, and beta= about 96.6 degrees.

9. The isolated protein complex of claim 8, wherein said crystal comprises an atomic structure characterized by the coordinates provided herein and deposited at the Protein Data Bank with accession number 2PUY.

10. A method of making a crystal of a BHC80 protein or histone binding homolog thereof and a histone or a BHC80 polypeptide binding homolog thereof, comprising combining a BHC80 protein or homolog thereof and a histone or homolog thereof in a ratio of about 1:1.5 (respectively) and using the sitting drop vapour-diffusion method at about 16°C, with mother liquor containing about 100 mM sodium citrate at about pH 5.6 or MES 6.2-6.5, about 5-20% polyethylene glycol 4000 and about 20% isopropanol.

11. A method for identifying an agent that modulates the interaction between BHC80 and a histone to which BHC80 binds, comprising contacting a BHC80 polypeptide or a histone binding homolog thereof and a histone or a BHC80 binding homolog thereof with an agent under conditions in which the BHC80 polypeptide or homolog thereof and the histone or homolog thereof interact or do not interact in the absence of the agent; and determining the level of interaction between BHC80 polypeptide or a homolog thereof and histone or a homolog thereof, wherein a different level of interaction in the presence of the agent relative to the absence of the agent indicates that the agent modulates the interaction between BHC80 polypeptide and a histone to which BHC80 polypeptide binds.

12. The method of claim 11, wherein the BHC80 polypeptide or histone binding homolog thereof comprises at least about amino acids 486 to 543 of a human BHC80 protein consisting of SEQ ID NO: 2 or an amino acid sequence that is at least about 90% identical to amino acids 486 to 543 of SEQ ID NO: 2.

13. The method of claim 11, wherein the histone or BHC80 binding homolog thereof comprises at least about amino acids 1-10 of human H3 consisting of SEQ ID NO: 4 or an amino acid sequence that is at least about 90% identical to amino acids 1-10 of SEQ ID NO: 4.

14. The method of claim 11, wherein the complex further comprises a cofactor.

15. The method of claim 14, wherein the cofactor is zinc.

16. The method of claim 11, wherein the agent is a small molecule.

17. The method of claim 11, wherein the agent is designed or selected using computer modeling.

18. The method of claim 11, wherein the agent is designed de novo.

19. The method of claim 11, wherein the agent is designed based on a known modulator.

20. A method of structure-based identification of candidate compounds for the regulation of BHC80 activity, comprising: a) providing a three dimensional structure of a BHC80 - histone complex, the three dimensional structure being selected from the group consisting of:

i) a structure defined by atomic coordinates of a three dimensional structure of a crystalline BHC80 - histone complex; ii) a structure defined by atomic coordinates selected from the group consisting of:

(1) atomic coordinates provided herein; and,

(2) atomic coordinates that define a three dimensional structure wherein at least 50% of the structure has an average root-mean-square deviation (RMSD) from backbone atoms in secondary structure elements in at least one domain of a three dimensional structure represented by the atomic coordinates of (1) of equal to or less than about 1.5 Å;

(3) a structure defined by atomic coordinates derived from a BHC80 - histone complex arranged in a crystalline manner as described substantially herein; and

(4) a structure of a BHC80 - histone complex constructed using as a template the three-dimensional structure of (i) or (ii); b) identifying at least one candidate compound for interacting with or mimicking the three dimensional structure of (a) by performing structure based drug design with the structure of (a).

21. The method of claim 20, wherein the candidate compound is a candidate inhibitor of BHC80 activity.

22. The method of claim 20, wherein the candidate compound is a candidate enhancer of BHC80 activity.

23. The method of claim 20, wherein the candidate compound is a candidate agonist of BHC80.

24. The method of claim 20, wherein the candidate compound is a candidate antagonist of BHC80.

25. The method of claim 20, further comprising the steps of: synthesizing the candidate compound identified in (b); and selecting candidate compounds from (c) that regulate the activity of the BHC80/histone complex.

26. The method of claim 25, wherein the step of selecting comprises selecting candidate compounds that inhibit the interaction of the BHC80 protein with a histone.

27. The method of claim 25, wherein the step of selecting comprises selecting candidate compounds that enhance the interaction of the BHC80 protein with a histone.

28. The method of claim 25, wherein the step of selecting comprises selecting candidate compounds that inhibit or enhance tumorigenesis of a cell or gene transcription.

29. The method of claim 20, wherein said step of identifying comprises computational screening of one or more databases of chemical compounds.

30. The method of claim 20, wherein the compound is predicted to interact with the histone binding domain of the BHC80 protein.

31. The method of claim 25, wherein the compound is predicted to interact with the substrate binding domain defined by the tertiary structure near D489 and M502.

32. The method of claim 20, wherein the compound is predicted to interact with the binding pocket of an unmethylated lysine residue.

33. The method of claim 20, wherein the compound is predicted to interact with a PHD domain tertiary structure of BHC80.

34. A method for identifying a modulator of a BHC80 - histone complex from a database, the method comprising:

(a) providing a set of three-dimensional structure coordinates so as to define part or all of a BHC80-histone complex;

(b) identifying a druggable region of the BHC80-histone complex; and

(c) selecting from a database at least one potential modulator comprising three dimensional coordinates which indicate that the modulator may bind or interfere with the druggable region.

35. The method of claim 34, further comprising supplying or synthesizing the potential modulator, then assaying the potential modulator to determine whether it modulates BHC80 activity.

36. A method for identifying a BHC80 modulator, comprising:

(a) supplying a computer modeling application with a set of three-dimensional structure coordinates so as to define part or all of a BHC80-histone complex;

(b) obtaining a potential modulator using the three dimensional structure;

(c) contacting the potential modulator with a BHC80; and

(d) assaying the activity of the BHC80, wherein a change in the activity of the BHC80 indicates that the compound may be useful as a BHC80 modulator.

37. A method for preparing a potential modulator of a BHC80, the method comprising:

(a) generating one or more three-dimensional structures of a molecule comprising a druggable region from a BHC80;

(b) employing one or more of the three dimensional structures of the molecule to design or select a potential modulator of the druggable region; and

(c) synthesizing or obtaining the modulator.

38. A method for making a modulator of BHC80 activity, the method comprising chemically or enzymatically synthesizing a chemical entity to yield a modulator of BHC80 activity, the chemical entity having been identified during a computer-assisted process comprising supplying a computer modeling application with a set of structure coordinates of a BHC80-histone complex, comprising at least a portion of at least one druggable region; supplying the computer modeling application with a set of structure coordinates of a chemical entity; and determining whether the chemical entity is expected to bind or to interfere with the BHC80-histone complex at a druggable region, wherein binding to or interfering with the BHC80-histone complex is indicative of potential modulation of BHC80 activity.
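Claim 20(ii)(2) above relies on a backbone RMSD threshold of about 1.5 Å between two sets of atomic coordinates. The following is only a minimal sketch of how such a comparison might be computed. It assumes the two structures have already been superimposed and that their backbone atoms are paired in the same order; the application does not specify a superposition method, and the function names `backbone_rmsd` and `within_claimed_threshold` are hypothetical.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """Return the RMSD (in the units of the input, e.g. Angstroms) between
    two equal-length lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def within_claimed_threshold(coords_a, coords_b, threshold=1.5):
    """True if the RMSD is at or below the threshold (default 1.5, as in claim 20)."""
    return backbone_rmsd(coords_a, coords_b) <= threshold
```

In practice the coordinate lists would be restricted to backbone atoms in secondary structure elements, as the claim language requires; that selection step is not shown here.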

Description:

BHC80 - HISTONE COMPLEXES AND USES THEREOF

RELATED APPLICATIONS

The present application claims benefit of priority to U.S. Provisional Patent Application No. 60/961,740, filed on July 24, 2007, the contents of which are incorporated by reference herein in their entirety.

GOVERNMENT INTEREST

This invention was made with Government support under grants GM68680 and NCI118487 awarded by the U.S. National Institutes of Health. The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

The histone N-terminal tails are subjected to multiple covalent modifications that affect chromatin structure and consequently transcription. One of the best-characterized modifications is acetylation, which is controlled by both histone acetyltransferases (HATs) and deacetylases (HDACs), suggesting that acetylation regulation is a dynamic process (Kouzarides, 2000). More recently, histone methylation has also emerged as a form of posttranslational modification that significantly impacts chromatin structure (Rice and Allis, 2001; Zhang and Reinberg, 2001). Unlike histone acetylation, which takes place only on lysine (K), methylation occurs on both lysine and arginine (R). While acetylation is generally correlated with active transcription (Roth et al., 2001), histone methylation is linked to both transcriptional activation and repression (Zhang and Reinberg, 2001). For instance, histone H3 K9 (H3-K9) methylation is associated with heterochromatin formation (Nakayama et al., 2001; Peters et al., 2002; Rea et al., 2000) and also euchromatic gene repression (Nielsen et al., 2001; Shi et al., 2003). In the case of heterochromatin assembly, H3-K9 is first methylated by Suv39H, and the methylated K9 is then recognized and bound by the chromodomain protein HP1 (Bannister et al., 2001; Lachner et al., 2001; Nakayama et al., 2001). The Suv39H-HP1 methylation system is proposed to be responsible for heterochromatin propagation. In contrast, methylation of histone H3 K4 (H3-K4) is linked to active transcription (Liang et al., 2004; Litt et al., 2001; Noma et al., 2001; Santos-Rosa et al., 2002; Schneider et al., 2004), as is methylation of arginine residues of histone H3 and H4 (Zhang and Reinberg, 2001). Mechanisms that underlie methylation-dependent transcriptional activation are not completely understood, although H3-K4-specific methylases have recently been shown to associate with RNA polymerase II (Hamamoto et al., 2004; Ng et al., 2003b).

While histone acetylation is dynamically regulated by HATs and HDACs, histone methylation has been considered a "permanent" modification. At least two models are currently being considered to explain the turnover of methyl groups on histones. The first one suggests that a cell may remove histone methylation by clipping the histone tail (Allis et al., 1980) or by replacing the methylated histone with a variant histone in the case of methyl group turnover at H3-K9 (Ahmad and Henikoff, 2002; Briggs et al., 2001; Johnson et al., 2004). However, this mechanism would not allow for dynamic regulation of histone methylation and the plasticity that may be essential for gene transcription regulation in some biological processes. The second model proposes the existence of histone demethylases that function to remove the methyl groups from lysine and arginine, which would make dynamic regulation possible. Recently, a human peptidyl arginine deiminase, PADI4/PAD4, has been shown to antagonize methylation on the arginine residues by converting arginine to citrulline (Cuthbert et al., 2004; Wang et al., 2004). PADI4/PAD4 catalyzes the deimination reaction irrespective of whether the arginine residue is methylated or not. These findings suggest that histone methylation can be dynamically regulated through the opposing actions of histone methylases and enzymes such as PADI4/PAD4. The search for histone demethylases began in the 1960s when Paik and colleagues first reported an enzyme that can demethylate free mono- and di-N-methyllysine (Kim et al., 1964). Subsequently, the same investigators partially purified an activity that can demethylate histones (Paik and Kim, 1973; Paik and Kim, 1974). These early studies suggested the possibility that histone demethylases may exist, but the molecular identity of these putative histone demethylases has remained elusive for the past four decades.

Classical amine oxidases play important roles in metabolism and their substrates range from small molecules (e.g., spermine and spermidine) to proteins. More recently, amine oxidases have also been proposed to function as histone demethylases via an oxidation reaction that removes methyl groups from lysine or arginine residues of histones (Bannister et al., 2002). KIAA0601 encodes a protein that shares significant sequence homology with FAD-dependent amine oxidases (Humphrey et al., 2001; Shi et al., 2003). We identified KIAA0601/NPAO as a component of the CtBP co-repressor complex (Shi et al., 2003), and it has also been found in a number of other co-repressor complexes, including NRD (Tong et al., 1998), Co-REST (You et al., 2001), and subsets of the HDAC complexes (Hakimi et al., 2002; Hakimi et al., 2003; Humphrey et al., 2001). Recent studies of the C. elegans homolog, SPR-5, provided genetic evidence for a role in transcriptional repression (Eimer et al., 2003; Jarriault and Greenwald, 2002). However, its exact role in transcriptional regulation has been unclear. LSD1 is a histone demethylase that represses transcription via demethylation of histone H3K4. The LSD1 complex contains HDAC 1 and 2, LSD1, Co-REST, BRAF35 and BHC80, a PHD finger-containing protein. Previous studies identified the roles for all but BHC80 in mediating events upstream of LSD1-mediated demethylation.

There is a continuing need in the art to identify the components of the transcription regulatory system so that they can be manipulated to treat diseases that involve aberrations of the system.

SUMMARY OF THE INVENTION

Posttranslational modifications of histones regulate chromatin structure and gene expression. Histone demethylases, members of a newly emerging transcription factor family, remove methyl groups from the lysine residues of the histone tails, and thereby regulate the transcriptional activity of target genes. Some histone demethylases rely on association with other proteins to perform certain functions. For example, lysine-specific histone demethylase 1 (LSD1), a histone demethylase that represses transcription via demethylation of histone H3K4, is part of a complex containing HDAC 1 and 2, LSD1, Co-REST, BRAF35 and BHC80, a PHD finger-containing protein. Previous studies identified the roles for all but BHC80 in mediating events upstream of LSD1-mediated demethylation.

Here, the structure of the BHC80 PHD domain bound to an unmodified H3 peptide has been determined by x-ray crystallography. The structure of the PHD domain, consisting of residues 486-543 of BHC80 bound to the unmodified H3 tail (residues 1-10) revealed the unique elements that form a potential substrate binding pocket. Site-directed mutagenesis of BHC80 motifs in conjunction with in vitro pulldown assays using biotinylated histone peptides allowed the inventors to propose a molecular model for substrate selection by the BHC80 PHD domain.

The structure of the BHC80 PHD domain - H3 peptide complex reveals significant information regarding the unique features of the BHC80 protein and its interaction with H3. These features distinguish the BHC80 protein from other PHD finger domain-containing proteins, such as BPTF and ING2. Through functional and structural characterization, the inventors have identified the binding site on the BHC80 PHD domain for the H3 peptide. The importance of BHC80 PHD domain residues D489 and M502 to H3 binding was confirmed by mutagenesis, where mutation of D to A and M to W, respectively, abolished PHD binding to the unmodified H3 peptide. The inventors' data further reveal that molecular recognition of unmodified lysine is primarily through bonds to the unmodified epsilon amino group and steric exclusion of methyl groups. The information provided herein provides the molecular basis to design or identify compounds of high affinity and specificity that modulate the binding of BHC80 homologues to histone H3 homologues.

Further, it was shown that the BHC80 PHD finger domain binds unmethylated H3K4 and that this interaction is specifically abrogated by H3K4 methylation. This is in contrast to other known proteins containing PHD finger domains, such as BPTF and ING2, which bind methylated H3K4 (H3K4me3). The finding that BHC80 recognizes a different methylation state at the same lysine residue provides a mechanism for fine-tuning histone methylation.

Still further, the inventors show that modulation of BHC80 protein concentrations in cells using short-hairpin RNA (shRNA) results in the de-repression of LSD1 target genes, including SCN1A, SCN3A and SYN1. Introduction of shRNA-resistant wild-type BHC80, but not PHD mutant D489A, into the cells receiving shRNA restored repression of LSD1 target genes, indicating that BHC80 binding to H3K4me0 is important for LSD1-mediated gene repression and that BHC80 is required for LSD1 association with H3 after demethylation. A reciprocal interdependence between BHC80 and LSD1 for chromatin association was confirmed by showing that BHC80 binding to SYN1 and SCN1A promoters was reduced in LSD1-depleted cells, indicating that LSD1-mediated demethylation of H3K4me2 is important for BHC80 chromatin association.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. BHC80 binds histone H3 through the PHD zinc finger. (A) Diagram of BHC80 domain architecture. (B) In vitro binding assays of recombinant BHC80 to histone tail peptides. (C) Native BHC80 in purified LSD1 complex also preferentially binds unmodified H3 tail. (D) PHD finger of BHC80 is necessary for H3 tail binding. (E) PHD finger is sufficient for H3 tail binding. (F) H3K4 methylation inhibits PHD finger binding to H3 tail. (G) ITC measurement of BHC80 PHD binding to unmodified, mono-, or di-methylated H3K4 peptides (residues 1-10). KD values are averages (with standard deviations) of at least 3 experiments using varied peptide and protein concentrations.

Figure 2. Structure of BHC80 PHD with H3 1-10. (A) Cross-brace topology of the PHD domain. Residues involved in peptide binding are underlined. (B) H3 peptide binds as an anti-parallel β-strand (gray). The zinc atoms are 14.5 Å apart. (C) Peptide binding is specified by the insertion of M502 between H3R2 and H3K4 and D489 between H3K4 and H3R8. H3K4 also hydrogen bonds with the carbonyl oxygen of E488, with the β-carbon from H487 (4.0 Å away) further defining the unmethylated binding site (also see Figure 8). The N-terminal amine of H3 is caged by carbonyl oxygens, and the side chain of H3A1 is recognized by a shallow hydrophobic pocket. (D) Mutations of D489 and M502 disrupt PHD binding to unmodified H3, as determined by in vitro pulldown assays using biotinylated histone peptides.

Figure 3. Structural comparison of the BHC80, BPTF, and ING2 PHD fingers. (A) Superimposition of BHC80 PHD (green, with H3 gray), BPTF PHD (PDB 2F6J, red with H3 pink), and ING2 PHD domains (PDB 2GQ6, blue with H3 purple). BHC80 PHD does not recognize R2, but contacts R8, in contrast to BPTF and ING2. (B) H3K4 recognition by BHC80 (left panel) and H3K4me3 binding by BPTF (middle) and ING2 (right). (C) Sequence alignment of BHC80, BPTF, and ING2 PHD fingers. Zinc binding residues are bold, and residues involved in K4 binding are blue. Red denotes the R2 binding pocket (absent in BHC80), and green the residues that form the N-amine and A1 side chain binding pockets. Residues that form the anti-parallel β-sheet are underlined. Invariant residues are marked with stars, and highly conserved residues with colons.

Figure 4. BHC80 binding to H3 tail is important for LSD1-mediated repression. (A) Quantitative ChIP analysis of BHC80 occupancy of SYN1 and SCN1A promoters in HeLa cells treated with control or LSD1 RNAi. Error bars represent s.e.m. calculated based on three independent experiments. (B) Endogenous BHC80 was effectively depleted by two independent BHC80 shRNA plasmids. Three LSD1 target gene transcripts (SCN1A, SCN3A and SYN1) were derepressed in the BHC80 RNAi cells (lower panel). (C) RNAi-resistant wildtype BHC80, but not the PHD point mutant D489A, restored repression of the target genes in BHC80 RNAi HeLa cells. (D) Reduction of BHC80 in LSD1 complex resulted in decreased binding of LSD1 to H3 peptide. (E) Inhibition of BHC80 does not affect LSD1 complex formation in solution. Similar amounts of LSD1, CoREST and HDAC1 were co-immunoprecipitated by the LSD1 antibody from wildtype or BHC80 RNAi cells (lanes 3 and 4). (F) BHC80 RNAi results in decreased LSD1 occupancy at its target genes. The BHC80 RNAi cells showed reduced LSD1 occupancies at the target gene loci (SYN1 and SCN1A). Error bars represent s.e.m. calculated based on three independent experiments.

Figure 5. BHC80 binds the first 21 residues of histone H3; the BHC80-H3 interaction is disrupted by methylation of K4 but is insensitive to modifications at K9 or K14.

Figure 6. The co-crystal structure of the BHC80 PHD domain (residues 486 to 543) bound to the unmodified H3 tail (residues 1-10) was solved at the resolution of 1.43 angstroms.

Figure 7. The co-crystal structure of the BHC80 PHD domain bound to the unmodified H3 tail showing binding of the H3 peptide to one of the two molecules in the asymmetric unit (Figure 7A). The cognate PHD finger contacts the first eight residues of the H3 peptide, but H3 residues 9 and 10 interact with a neighboring PHD molecule (Figure 7B).

Figure 8. In BHC80, a modeled mono-methyl lysine is allowed only a 15-degree range of motion before clashing with other atoms.

Figure 9. Structural comparison demonstrating that the N-terminal Cys-rich domain of DNMT3L is highly similar to the PHD finger of BHC80.
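The figure legends above report KD values as averages with standard deviations over at least three experiments (Figure 1G) and error bars as s.e.m. over three independent experiments (Figure 4). As a rough illustration of how such summary statistics relate to one another, the sketch below computes all three from a list of replicate measurements. The function name `summarize_replicates` is hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the application does not state which estimator was used.

```python
import math
import statistics


def summarize_replicates(values):
    """Return (mean, sample standard deviation, standard error of the mean).

    Assumption: the sample (n - 1) standard deviation is used; at least
    two replicate values are required for it to be defined.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation
    sem = sd / math.sqrt(n)            # s.e.m. = SD / sqrt(n)
    return mean, sd, sem


# Placeholder numbers for illustration only, not measurements from the application.
mean, sd, sem = summarize_replicates([1.0, 2.0, 3.0])
```

Because the s.e.m. divides the standard deviation by the square root of the number of replicates, error bars drawn as s.e.m. are narrower than those drawn as standard deviations for the same data.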

DETAILED DESCRIPTION OF THE INVENTION

General

It is shown herein that the PHD finger of BHC80 of the LSD1 demethylase complex binds specifically to histone H3 and that the binding occurs at the unmethylated residue K4 of H3. Furthermore, provided herein is the tertiary structure of the PHD finger domain of the BHC80 protein (residues 486-543) complexed with an unmodified histone H3 tail (residues 1-10) in the presence of Zn. This structural and functional information is proposed, inter alia, for use in the identification and design of compounds that agonize, antagonize or otherwise regulate this demethylase activity and related activities, and to the compounds identified by such methods and the research or therapeutic uses of such compounds.

The embodiments and practices of the present invention, other embodiments, and their features and characteristics, will be apparent from the description, figures and claims that follow, with all of the claims hereby being incorporated by this reference into this Summary.
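The structure described above contains zinc, and the Figure 2 legend gives the separation of the two zinc atoms as 14.5 Å. As a hedged illustration of how such an inter-atomic distance might be read from coordinates in PDB format (the claims cite accession number 2PUY), the sketch below extracts zinc positions from HETATM records and measures the distance between two of them. The function names are hypothetical, the parser assumes the fixed-column PDB layout with the element symbol present in columns 77-78, and the record-building helper exists only to produce synthetic lines for demonstration.

```python
import math


def zinc_positions(pdb_text):
    """Return (x, y, z) tuples for every zinc HETATM record in PDB-format text.

    Assumption: fixed-column PDB format, with x, y, z in columns 31-54
    and the element symbol in columns 77-78.
    """
    positions = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        if line[76:78].strip().upper() != "ZN":
            continue
        positions.append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return positions


def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(p, q)


def make_zinc_record(serial, res_seq, x, y, z):
    """Hypothetical helper: build a synthetic zinc HETATM line for testing."""
    return (
        "HETATM{:>5} {:<4} {:>3} {}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}"
        "{:>6.2f}{:>6.2f}          {:>2}"
    ).format(serial, "ZN", "ZN", "A", res_seq, x, y, z, 1.0, 0.0, "ZN")
```

With a real coordinate file, `zinc_positions` would be given the file contents and `distance` applied to the two returned positions; the synthetic helper is not needed in that case.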

Definitions

For convenience, certain terms employed in the specification, examples, and appended claims are collected here. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.

The term "agent" is used herein to denote a chemical compound, a small molecule, a mixture of chemical compounds, a biological macromolecule (such as a nucleic acid, an antibody, a protein or portion thereof, e.g., a peptide), or an extract made from biological materials such as bacteria, plants, fungi, or animal (particularly mammalian) cells or tissues. Agents may be identified as having a particular activity by screening assays described herein below. The activity of such agents may render it suitable as a "therapeutic agent" which is a biologically, physiologically, or pharmacologically active substance (or substances) that acts locally or systemically in a subject.

The term "amino acid" is intended to embrace all molecules, whether natural or synthetic, which include both an amino functionality and an acid functionality and are capable of being included in a polymer of naturally-occurring amino acids. Exemplary amino acids include naturally-occurring amino acids; analogs, derivatives and congeners thereof; amino acid analogs having variant side chains; and all stereoisomers of any of the foregoing.

A "BHC80 - histone complex" refers to a complex comprising a BHC80 protein or histone binding homolog thereof and a histone or BHC80 binding homolog thereof. For example, a BHC80 - histone complex may be a complex between a peptide consisting of amino acids 1-10 of human histone H3 that is unmethylated at K4 and the PHD finger of human BHC80.

The term "binding" or "interacting" refers to an association, which may be a stable association, between two molecules, e.g., between a polypeptide and a binding partner or agent, e.g., small molecule, due to, for example, electrostatic, hydrophobic, ionic and/or hydrogen-bond interactions under physiological conditions.

The term "chemical entity," as used herein, refers to chemical compounds, complexes of two or more chemical compounds, and fragments of such compounds or complexes. In certain instances, it is desirable to use chemical entities exhibiting a wide range of structural and functional diversity, such as compounds exhibiting different shapes (e.g., flat aromatic ring(s), puckered aliphatic ring(s), straight and branched chain aliphatics with single, double, or triple bonds) and diverse functional groups (e.g., carboxylic acids, esters, ethers, amines, aldehydes, ketones, and various heterocyclic rings).

The term "complex" refers to an association between at least two moieties (e.g. chemical or biochemical) that have an affinity for one another. Examples of complexes include associations between antigen/antibodies, lectin/avidin, target polynucleotide/probe oligonucleotide, antibody/anti-antibody, receptor/ligand, enzyme/ligand, polypeptide/polypeptide, polypeptide/polynucleotide, polypeptide/co-factor, polypeptide/substrate, polypeptide/inhibitor, polypeptide/small molecule, and the like. "Member of a complex" refers to one moiety of the complex, such as a protein. "Protein complex" or "polypeptide complex" refers to a complex comprising at least two polypeptides or proteins.

The terms "comprise" and "comprising" are used in the inclusive, open sense, meaning that additional elements may be included.

When using the term "comprising" or "having" herein, it is understood that this term may also be replaced by the phrases "consisting essentially of" or "consisting of," where appropriate. For example, "a fragment comprising amino acids 1-100 of sequence X" should be read as providing support for "a fragment consisting essentially of amino acids 1-100 of sequence X" as well as for "a fragment consisting of amino acids 1-100 of sequence X."

As used herein the term "docking" refers to a process of placing a chemical entity in close proximity with a druggable region, or a process of finding low energy conformations of a chemical entity/druggable region complex.

The term "domain", when used in connection with a polypeptide, refers to a specific region within such polypeptide that comprises a particular structure or mediates a particular function. In the typical case, a domain of a polypeptide is a fragment of the polypeptide. In certain instances, a domain is a structurally stable domain, as evidenced, for example, by mass spectroscopy, or by the fact that a modulator may bind to a druggable region of the domain.

The term "druggable region", when used in reference to a polypeptide, nucleic acid, complex and the like, refers to a region of the molecule which is a target or is a likely target for binding a modulator. For a polypeptide, a druggable region generally refers to a region wherein several amino acids of a polypeptide would be capable of interacting with a modulator or other molecule. For a polypeptide or complex thereof, exemplary druggable regions include binding pockets and sites, enzymatic active sites, interfaces between domains of a polypeptide or complex, surface grooves or contours or surfaces of a polypeptide or complex which are capable of participating in interactions with another molecule. In certain instances, the interacting molecule is another polypeptide, which may be naturally-occurring. A druggable region may be on the surface of the molecule. For example, the druggable region of BHC80 may be the region of BHC80 that interacts with the unmethylated histone, in particular, amino acids 1-8 of unmethylated H3K4. The druggable region of a histone may be the region that interacts with BHC80, such as about amino acids 486-543 of human BHC80.

Druggable regions may be described and characterized in a number of ways. For example, a druggable region may be characterized by some or all of the amino acids that make up the region, or the backbone atoms thereof, or the side chain atoms thereof (optionally with or without the Cα atoms). Alternatively, in certain instances, the volume of a druggable region corresponds to that of a carbon based molecule of at least about 200 amu and often up to about 800 amu. In other instances, it will be appreciated that the volume of such region may correspond to a molecule of at least about 600 amu and often up to about 1600 amu or more.

Alternatively, a druggable region may be characterized by comparison to other regions on the same or other molecules. For example, the term "affinity region" refers to a druggable region on a molecule (such as a polypeptide of the invention) that is present in several other molecules, inasmuch as the structures of the same affinity regions are sufficiently the same so that they are expected to bind the same or related structural analogs. An example of an affinity region is an ATP-binding site of a protein kinase that is found in several protein kinases (whether or not of the same origin). The term "selectivity region" refers to a druggable region of a molecule that may not be found on other molecules, inasmuch as the structures of different selectivity regions are sufficiently different so that they are not expected to bind the same or related structural analogs. An exemplary selectivity region is a catalytic domain of a protein kinase that exhibits specificity for one substrate. In certain instances, a single modulator may bind to the same affinity region across a number of proteins that have a substantially similar biological function, whereas the same modulator may bind to only one selectivity region of one of those proteins.

When used in reference to a druggable region, the "selectivity" or "specificity" of a molecule such as a modulator to a druggable region may be used to describe the binding between the molecule and a druggable region. For example, the selectivity of a modulator with respect to a druggable region may be expressed by comparison to another modulator, using the respective values of Kd (i.e., the dissociation constants for each modulator-druggable region complex) or, in cases where a biological effect is observed below the Kd, the ratio of the respective EC50's (i.e., the concentrations that produce 50% of the maximum response for the modulator interacting with each druggable region).
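The Kd-based comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `selectivity_ratio`, the choice of nanomolar units, and the example Kd values are assumptions made here and are not taken from the specification.

```python
def selectivity_ratio(kd_a_nm: float, kd_b_nm: float) -> float:
    """Compare two modulators binding the same druggable region.

    Returns Kd(B) / Kd(A), both in nM (hypothetical units chosen for this
    sketch). A value above 1 means modulator A binds more tightly than
    modulator B, since a lower Kd corresponds to tighter binding.
    """
    if kd_a_nm <= 0 or kd_b_nm <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_b_nm / kd_a_nm


# Hypothetical values: modulator A has Kd = 2 nM, modulator B has Kd = 200 nM.
ratio = selectivity_ratio(2.0, 200.0)
```

The same ratio could be formed from EC50 values in place of Kd values where a biological effect is observed below the Kd.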

A "form that is naturally occurring" when referring to a compound means a compound that is in a form, e.g., a composition, in which it can be found naturally. A compound is not in a form that is naturally occurring if, e.g., the compound has been purified and separated from at least some of the other molecules that are found with the compound in nature.

The term "isolated polypeptide" refers to a polypeptide, in certain embodiments prepared from recombinant DNA or RNA, or of synthetic origin, or some combination thereof, which (1) is not associated with proteins that it is normally found with in nature, (2) is isolated from the cell in which it normally occurs, (3) is isolated free of other proteins from the same cellular source, (4) is expressed by a cell from a different species, or (5) does not occur in nature.

The term "isolated nucleic acid" refers to a polynucleotide of genomic, cDNA, or synthetic origin or some combination thereof, which (1) is not associated with the cell in which the "isolated nucleic acid" is found in nature, or (2) is operably linked to a polynucleotide to which it is not linked in nature.

The terms "label" or "labeled" refer to incorporation or attachment, optionally covalently or non-covalently, of a detectable marker into a molecule, such as a polypeptide.

The term "percent identical" refers to sequence identity between two amino acid sequences or between two nucleotide sequences. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology, similarity, or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. Various alignment algorithms and/or programs may be used, including FASTA, BLAST, or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences. Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70: 173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.
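As a rough illustration of the percent identity calculation, the following Python sketch scores two sequences that have already been aligned (for example by one of the programs named above). It is not the GCG program; the function name `percent_identity`, the `-` gap character, and the example strings are assumptions made for this sketch. Each gap position is counted as a single mismatch, in the spirit of the "gap weight of 1" embodiment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns where both sequences carry a gap are ignored; a column with a gap
    in only one sequence counts as a mismatch (one gap = one mismatch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical example: ten aligned columns, one of which is a gap.
identity = percent_identity("MARTKQTARK", "MART-QTARK")
```

Counting similar (rather than identical) residues as matches would give a percent similarity instead; that variant is not shown here.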

The term "mammal" is known in the art, and exemplary mammals include humans, primates, bovines, porcines, canines, felines, and rodents (e.g., mice and rats).

The term "modulation", when used in reference to a functional property or biological activity or process (e.g., enzyme activity or receptor binding), refers to the capacity to either up regulate (e.g., activate or stimulate), down regulate (e.g., inhibit or suppress) or otherwise change a quality of such property, activity or process. In certain instances, such regulation may be contingent on the occurrence of a specific event, such as activation of a signal transduction pathway, and/or may be manifest only in particular cell types.

A "modulator" may be a polypeptide, nucleic acid, macromolecule, complex, molecule, small molecule, compound, species or the like (naturally-occurring or non- naturally-occurring), or an extract made from biological materials such as bacteria, plants, fungi, or animal cells or tissues, that may be capable of causing modulation. Modulators may be evaluated for potential activity as inhibitors or activators (directly or indirectly) of a functional property, biological activity or process, or combination of them, (e.g., agonist, partial antagonist, partial agonist, inverse agonist, antagonist, anti-microbial agents, inhibitors of microbial infection or proliferation, and the like) by inclusion in assays. In such assays, many modulators may be screened at one time. The activity of a modulator may be known, unknown or partially known.

The terms "polynucleotide", and "nucleic acid" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non- limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified, such as by conjugation with a labeling component. The term "recombinant" polynucleotide means a polynucleotide of genomic, cDNA, semisynthetic, or synthetic origin which either does not occur in nature or is linked to another polynucleotide in a nonnatural arrangement.

A "patient", "subject" or "host" refers to either a human or a non-human animal.

The term "pharmaceutically acceptable carrier" is art-recognized and refers to a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, solvent or encapsulating material, involved in carrying or transporting any subject composition or component thereof from one organ, or portion of the body, to another organ, or portion of the body. Each carrier must be "acceptable" in the sense of being compatible with the subject composition and its components and not injurious to the patient. Some examples of materials which may serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol; (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) phosphate buffer solutions; and (21) other non-toxic compatible substances employed in pharmaceutical formulations.

The term "pharmaceutically-acceptable salts" is art-recognized and refers to the relatively non-toxic, inorganic and organic acid addition salts of compounds, including, for example, those contained in compositions described herein.

The terms "polypeptide fragment" or "fragment", when used in reference to a reference polypeptide, refer to a polypeptide in which amino acid residues are deleted as compared to the reference polypeptide itself, but where the remaining amino acid sequence is usually identical to the corresponding positions in the reference polypeptide. Such deletions may occur at the amino-terminus or carboxy-terminus of the reference polypeptide, or alternatively both. Fragments typically are at least 5, 6, 8 or 10 amino acids long, at least 14 amino acids long, at least 20, 30, 40 or 50 amino acids long, at least 75 amino acids long, or at least 100, 150, 200, 300, 500 or more amino acids long. A fragment can retain one or more of the biological activities of the reference polypeptide. In certain embodiments, a fragment may comprise a druggable region, and optionally additional amino acids on one or both sides of the druggable region, which additional amino acids may number from 5, 10, 15, 20, 30, 40, 50, or up to 100 or more residues. Further, fragments can include a sub-fragment of a specific region, which sub-fragment retains a function of the region from which it is derived. In another embodiment, a fragment may have immunogenic properties. Fragments may be devoid of about 1, 2, 5, 10, 20, 50, 100 or more amino acids at the N- or C-terminus of the wildtype protein.

The term "small molecule" is art-recognized and refers to a composition which has a molecular weight of less than about 2000 amu, or less than about 1000 amu, and even less than about 500 amu. Small molecules may be, for example, nucleic acids, peptides, polypeptides, peptide nucleic acids, peptidomimetics, carbohydrates, lipids or other organic (carbon containing) or inorganic molecules. Many pharmaceutical companies have extensive libraries of chemical and/or biological mixtures, often fungal, bacterial, or algal extracts, which can be screened with any of the assays described herein. The term "small organic molecule" refers to a small molecule that is often identified as being an organic or medicinal compound, and does not include molecules that are exclusively nucleic acids, peptides or polypeptides.

The term "substantially homologous" when used in connection with amino acid sequences, refers to sequences which are substantially identical to or similar in sequence with each other, giving rise to a homology of conformation and thus to retention, to a useful degree, of one or more biological (including immunological) activities. The term is not intended to imply a common evolution of the sequences.

"Substantially purified" refers to a protein that has been separated from components which naturally accompany it. Preferably the protein is at least about 80%, more preferably at least about 90%, and most preferably at least about 99% of the total material (by volume, by wet or dry weight, or by mole percent or mole fraction) in a sample. Purity can be measured by any appropriate method, e.g., in the case of polypeptides by column chromatography, gel electrophoresis or HPLC analysis.

Exemplary compositions

The present invention makes available in a variety of embodiments soluble, crystallized, purified and/or isolated forms of BHC80 and histone polypeptides or homologs thereof and complexes between or including these proteins. These proteins and complexes may be used, e.g., in screening assays for identifying modulators of a demethylase, e.g., LSD1. "BHC80" is also referred to as "PHD finger protein 21A" ("PHF21A"), "BM-006" and "KIAA1696," and is a component of the "BRAF35/HDAC2 complex" or "BRAF35/HDAC2 complex (80 kDa)." BHC80 exists as a longer or a shorter isoform. The nucleotide and amino acid sequences of the short isoform of human BHC80 (634 amino acids) are set forth in GenBank Accession Nos. NM_016621.2 and NP_057705.2, which correspond to SEQ ID NOs: 1 and 2, respectively. The nucleotide and amino acid sequences of the long isoform of human BHC80 (680 amino acids) are set forth in GenBank Accession Nos. BC015714 and AAH15714, which correspond to SEQ ID NOs: 13 and 14, respectively. The long human isoform contains a leucine zipper at about amino acids 33-54, an AT-Hook at about amino acids 427-437, a PHD zinc finger domain (also referred to as PHD domain or finger) located at about amino acids 486-532, and a second leucine zipper at about amino acids 586-607 (see Fig. 2). In the short isoform, the PHD domain corresponds to about amino acids 440 to 486 of SEQ ID NO: 2, and the amino acid sequence is identical to that of the PHD domain of the long isoform.

Fragments of BHC80 include proteins comprising one or more conserved domains of BHC80, e.g., the Zinc Finger - RING-type domain, Zinc Finger - PHD-type domain, HMG-I DNA-binding domain, HMG-Y DNA-binding domain, and the Zinc finger - FYVE/PHD-type domain. Fragments or portions of BHC80 include fragments comprising the PHD finger or a portion thereof, e.g., amino acids 486-543 of SEQ ID NO: 14 or about amino acids 440 to 486 of SEQ ID NO: 2, with the proviso that the fragment does not include the full length protein. Exemplary fragments of BHC80 include fragments of the full length protein that do not comprise at least about 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 460, 470 or 480 amino acids at the N-terminus. Fragments may also be fragments of the full length protein that do not comprise at least about 10, 20, 30, 40, 50, 100, 110 or 115 amino acids at the C-terminus. Exemplary fragments of BHC80 are proteins comprising, consisting essentially of or consisting of about amino acids 450, 460, 470, 480, 485, 486, 490, 495 or 500 to about amino acids 520, 525, 530, 535, 540, 543, 545, 550, 555, 560 or 570 of a BHC protein, e.g., having SEQ ID NO: 14. Exemplary fragments of BHC80 may comprise any of the fragments set forth above, with the proviso that the fragment does not comprise about amino acids 1-50, 1-100, 1-150, 1-200, 1-250, 1-300, 1-350, 1-400, 1-450, 1-460, 1-470, 1-480, 1-485, 1-490 and/or about amino acids 530-680, 535-680, 540-680, 545-680, 550-680, 555-680, 560-680, 570-680, 580-680, 590-680, 600-680, 620-680, or 640-680 of a BHC protein, e.g., of SEQ ID NO: 14.

Amino acids 486 to 543 of the long isoform of human BHC80 (set forth in Fig. 2) are set forth as SEQ ID NO: 15. Amino acids 486-532 of the long isoform are set forth as SEQ ID NO: 16. A preferred fragment of BHC80 is a peptide or protein comprising, consisting essentially of or consisting of SEQ ID NO: 15 or 16. Such a peptide may comprise at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or more amino acids at the N-terminus and/or C-terminus of the protein, which amino acids form a sequence that is unrelated to the corresponding sequence in human BHC80.

Fragments of BHC80 may be fragments that are functional, i.e., have biological activity, such as the ability to bind to a histone, such as H3 or a fragment thereof comprising K4, e.g., in an unmethylated form. The phrase "histone binding portion of BHC80" is intended to refer to a fragment of BHC80 that binds to a histone, e.g., at an unmethylated K4 residue of a H3 histone. An exemplary histone binding portion of BHC80 is a peptide consisting of SEQ ID NO: 15.

Histones that may be used in a complex with BHC80 include any histone or variant thereof to which BHC80 can bind. A histone may be histone 3 (H3). Human H3 has the amino acid sequence set forth in GenBank Accession No. NP_003484.1 (SEQ ID NO: 4; see sequence below) and is encoded by the nucleotide sequence set forth in GenBank Accession No. NM_003493.2 (SEQ ID NO: 3).

1 martkqtark stggkaprkq latkvarksa patggvkkph ryrpgtvalr eirryqkste

61 llirklpfqr lmreiaqdfk tdlrfqssav malqeacesy lvglfedtnl cvihakrvti

121 mpkdiqlarr irgera (SEQ ID NO: 4)

Histones may be useful in methylated or unmethylated forms or in forms that are partially methylated. Unmethylated forms of H3 include H3 proteins that are not methylated on K4. An H3 protein may be methylated at K9 and/or K14, but not on K4. A methylation state may be mono-, di or tri-methyl.

Fragments of histones may also be used. Exemplary fragments are proteins comprising, consisting essentially of or consisting of about amino acids 1 or 2 to about amino acids 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 120, 130 and 135 of histone H3, e.g., having SEQ ID NO: 4. Exemplary fragments of H3 comprise amino acid residue 4 with the proviso that the histone is not the full length histone, e.g., it does not comprise amino acid 1, 2, 3 and/or it does not comprise about amino acids 5-136, 6-136, 7-136, 8-136, 9-136, 10-136, 11-136, 12-136, 13-136, 14-136, 15-136, 16-136, 17-136, 18-136, 19-136, 20-136, 21-136, 22-136, 25-136, 30-136, 35-136, 40-136, 50-136, 100-136, 110-136, 120-136 or 130-136.
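A short Python sketch may help when extracting N-terminal H3 fragments from the sequence listing. It rests on an assumption that is not stated in the text: the sequence printed for SEQ ID NO: 4 begins with the initiator methionine, whereas conventional histone numbering (in which the modified lysine is K4) starts at the following alanine. The helper name `h3_fragment` and its `met_included` flag are hypothetical, and only the first 20 residues of the listed sequence are used.

```python
# First 20 residues of the H3 sequence as printed for SEQ ID NO: 4 (upper-cased).
H3_N_TERM = "MARTKQTARKSTGGKAPRKQ"


def h3_fragment(seq: str, start: int, end: int, met_included: bool = True) -> str:
    """Return residues start..end (1-based, inclusive) in conventional numbering.

    If `met_included` is True, the stored sequence is assumed to carry the
    initiator methionine at index 0, so conventional residue 1 is seq[1].
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    offset = 1 if met_included else 0
    if end + offset > len(seq):
        raise ValueError("requested range runs past the end of the sequence")
    return seq[start - 1 + offset:end + offset]


tail_1_to_8 = h3_fragment(H3_N_TERM, 1, 8)   # N-terminal peptide containing K4
residue_4 = h3_fragment(H3_N_TERM, 4, 4)     # the K4 residue itself
```

If the numbering in a given passage is instead meant to follow SEQ ID NO: 4 directly, calling the helper with `met_included=False` gives the corresponding slice.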

Fragments of a histone may be fragments that are functional, i.e., have biological activity, such as the ability to bind to BHC80, such as the PHD finger of BHC80 and/or about amino acid residues 486-543. The phrase "BHC80 binding portion of a histone" is intended to refer to a fragment of a histone that binds to BHC80 or a fragment thereof.

Another protein or polypeptide that may be complexed to or present in a complex including BHC80 and a histone is a demethylase, e.g., a histone demethylase. A histone demethylase may be an enzyme that demethylates the residue K4 on histone H3 (a "H3-K4 demethylase"). An exemplary H3-K4 demethylase is LSD1, which is also referred to as "FAD-binding protein BRAF35-HDAC complex, 110 kDa subunit" ("BHC110"), "KIAA0601", and "amine oxidase (flavin containing) domain 2" ("AOF2"). The protein exists in two isoforms: variant (1) represents the longer transcript and encodes the longer isoform (a); and variant (2) lacks two alternate in-frame exons, compared to variant 1, resulting in a shorter protein (isoform b), compared to isoform a.

The following Table (Table 1) provides references for the nucleotide and amino acid sequences of the human LSD1 proteins:

isoform    nucleic acid    SEQ ID NO    protein                   SEQ ID NO
a          NM_015013.2     5            NP_001009999 (876 aas)    6
b          NM_015013.2     7            NP_055828.2 (852 aas)     8

Table 2: Approximate location of conserved domains in human LSD1 proteins:

isoform    amine oxidase domain      SWIRM domain    FAD binding motif
a          aas 548-849; 311-450      aas 195-284     aas 300-359
b          aas 524-825; 291-426      aas 175-264     aas 280-339

Eukaryotic histone demethylase enzymes, according to the present invention are those eukaryotic proteins which have a SWIRM domain, a FAD binding motif, and an amine oxidase domain. The presence of these domains can be determined using tools available in the art including NCBI GenBank and NCBI Conserved Domain Search Program. The amino acid sequence of the FAD binding motif is

KVIIIGSGVSGLAAARQLQSFGMDVTLLEARDRVGGRVATFRKGNYVADLGAMW TGLGG.

Another demethylase is AOF1 or amine oxidase (flavin containing) domain 1 protein. The nucleotide and amino acid sequences of human AOF1 are set forth in GenBank Accession numbers NM_153042 and NP_694587 and in SEQ ID NOs: 9 and 10, respectively. An NAD/FAD-dependent oxidoreductase domain is located at about amino acids 268-588 and a flavin containing amine oxidoreductase domain is located at about amino acids 319-587 and 267-322 of SEQ ID NO: 10.

Another protein that may be included in a complex with a demethylase, e.g., LSD1, is CoREST. "CoREST" is a corepressor of RE1-silencing transcription factor (REST) and is also referred to as "REST corepressor 1" and "RCOR1". The nucleotide and amino acid sequences of human CoREST are set forth in GenBank Accession Nos. NM_015156.1 and NP_055971.1 (482 amino acids), which correspond to SEQ ID NOs: 11 and 12, respectively. The human protein contains the following conserved domains: SANT1 (about amino acids 190-293), SANT2 (about amino acids 381-450) and ELM (about amino acids 105-182).

A homolog or analog of a protein of interest, such as BHC80, a histone, LSD1 and CoREST, includes proteins comprising or consisting of an amino acid sequence that has at least about 70%, 80%, 90%, 95%, 98% or 99% identity with an amino acid sequence of the protein described herein, such as SEQ ID NOs: 2, 4, 6, 8, 10, 12 or 14 or a fragment thereof. A homolog may also be a protein that is encoded by a nucleic acid that has at least about 70%, 80%, 90%, 95%, 98% or 99% identity with a nucleotide sequence described herein, such as SEQ ID NOs: 1, 3, 5, 7, 9, 11 or 13 or the coding sequence thereof, or a fragment thereof. A homolog may also be a protein that is encoded by a nucleic acid that hybridizes, e.g., under stringent hybridization conditions, to a nucleic acid consisting of a nucleotide sequence described herein, e.g., SEQ ID NOs: 1, 3, 5, 7, 9, 11 or 13, or the coding sequence thereof, or a fragment thereof. For example, homologs may be encoded by nucleic acids that hybridize under high stringency conditions of 0.2 to 1 x SSC at 65 °C followed by a wash at 0.2 x SSC at 65 °C to a nucleic acid consisting of a sequence described herein. Nucleic acids that hybridize under low stringency conditions of 6 x SSC at room temperature followed by a wash at 2 x SSC at room temperature to a nucleic acid consisting of a sequence described herein or a portion thereof can be used. Other hybridization conditions include 3 x SSC at 40 or 50 °C, followed by a wash in 1 or 2 x SSC at 20, 30, 40, 50, 60, or 65 °C. Hybridizations can be conducted in the presence of formaldehyde, e.g., 10%, 20%, 30%, 40% or 50%, which further increases the stringency of hybridization. The theory and practice of nucleic acid hybridization are described, e.g., in S. Agrawal (ed.), Methods in Molecular Biology, volume 20; and in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, e.g., part I, chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," Elsevier, New York, which provide a basic guide to nucleic acid hybridization.
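To make the salt conditions above easier to compare, the following Python sketch converts an SSC strength into an approximate sodium ion concentration. The recipe used (1 x SSC taken as 150 mM NaCl plus 15 mM trisodium citrate) is the commonly used laboratory formulation and is an assumption here, since the text does not define SSC; the function name `sodium_molarity` is likewise hypothetical.

```python
# Assumed composition of 1 x SSC (standard recipe, not stated in the text).
SSC_1X_NACL_M = 0.150      # mol/L sodium chloride
SSC_1X_CITRATE_M = 0.015   # mol/L trisodium citrate (three Na+ per citrate)


def sodium_molarity(ssc_strength: float) -> float:
    """Approximate Na+ concentration (mol/L) of an SSC buffer of given strength."""
    if ssc_strength < 0:
        raise ValueError("SSC strength cannot be negative")
    return ssc_strength * (SSC_1X_NACL_M + 3 * SSC_1X_CITRATE_M)


# Strengths mentioned in the hybridization conditions above.
conditions = {strength: sodium_molarity(strength) for strength in (0.2, 1, 2, 3, 6)}
```

Lower salt and higher temperature both raise stringency, which is why the 0.2 x SSC wash at 65 °C is the most stringent of the listed conditions.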

Homologs of proteins described herein, such as BHC80, histones, LSD1 and CoREST or fragments thereof may also be proteins that differ from the naturally occurring protein (or fragment thereof), e.g. a protein having an amino acid sequence set forth as SEQ ID NO: 2, 4, 6, 8, 10, 12 or 14, by conservative amino acid sequence differences or by modifications which do not affect sequence, or by both. Analogs can differ from naturally occurring proteins by conservative amino acid sequence differences or by modifications which do not affect sequence, or by both. Any number of procedures may be used for the generation of mutant, derivative or variant forms of a protein of interest using recombinant DNA methodology well known in the art such as, for example, that described in Sambrook et al. (1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York) and Ausubel et al. (1997, Current Protocols in Molecular Biology, Green & Wiley, New York).

For example, conservative amino acid changes may be made, which although they alter the primary sequence of the protein or peptide, do not normally alter its function. Conservative amino acid substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine (in positions other than proteolytic enzyme recognition sites); phenylalanine, tyrosine.
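The substitution groups listed above lend themselves to a simple lookup. The Python sketch below is illustrative only: the names `CONSERVATIVE_GROUPS` and `is_conservative` are assumptions, one-letter amino acid codes are used, and the caveat about proteolytic enzyme recognition sites for lysine/arginine is not modeled.

```python
# Conservative substitution groups from the paragraph above, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("GA"),   # glycine, alanine
    set("VIL"),  # valine, isoleucine, leucine
    set("DE"),   # aspartic acid, glutamic acid
    set("NQ"),   # asparagine, glutamine
    set("ST"),   # serine, threonine
    set("KR"),   # lysine, arginine
    set("FY"),   # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution at all
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


lys_to_arg = is_conservative("K", "R")   # within the lysine/arginine group
lys_to_glu = is_conservative("K", "E")   # crosses groups
```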

Homologs of a protein of interest also include portions thereof, such as portions comprising one or more conserved domains, such as those described herein.

Three-dimensional structural homologues of a protein described herein are also encompassed in the term "homolog." A "structure" of a protein refers to the components and the manner of arrangement of the components to constitute the protein. The "three dimensional structure" or "tertiary structure" of the protein refers to the arrangement of the components of the protein in three dimensions. Such term is well known to those of skill in the art. It is also to be noted that the terms "tertiary" and "three dimensional" and "ternary" can be used interchangeably.

A "functional homolog" of a protein of interest refers to a homolog of the protein, e.g., a fragment or a protein having a certain homology, having at least one biological activity of the protein. For example, homologs of BHC80 may be proteins that are functional, i.e., have biological activity, such as the ability to bind to a histone, such as H3 or a fragment thereof comprising K4, e.g., in an unmethylated form. The phrase "histone binding homolog of BHC80" is intended to refer to a homolog of BHC80, e.g., a fragment, that binds to a histone or a functional homolog thereof, e.g., at an unmethylated K4 residue of a H3 histone. An exemplary histone binding homolog of BHC80 is a protein comprising an amino acid sequence that is at least about 90%, 95%, 97%, 98% or 99% identical to SEQ ID NO: 15 or 16 and has the ability to bind to unmethylated H3K4. Homologs of a histone may be proteins that are functional, i.e., have biological activity, such as the ability to bind to BHC80, such as the PHD finger of BHC80 and/or about amino acid residues 486-543. The phrase "BHC80 binding homolog of a histone" is intended to refer to a homolog, e.g., a fragment, of a histone that binds to BHC80 or a functional homolog thereof, e.g., a functional fragment. A functional homolog of LSD1 may be a portion of the wild type LSD1 protein including one or more of the conserved domains. A functional homolog of LSD1 may comprise at least a portion of the amine oxidase domain, the SWIRM domain and/or the FAD binding motif. Exemplary functional homologs of LSD1 isoform a include polypeptides comprising from about amino acid 195, 190, 175, 150 or 100 to about amino acid 849, 850, 860, 870 or 876 of SEQ ID NO: 6. Exemplary functional homologs of LSD1 isoform b include polypeptides comprising from about amino acid 175, 174, 170, 150 or 100 to about amino acid 825, 830, 840, 850, 851 or 852 of SEQ ID NO: 8. Functional LSD1 homologs may also include those comprising an amino acid sequence from about amino acid 311, 310, 300 or 250 to about amino acid 849, 850, 860, 870 or 876 of SEQ ID NO: 6 (LSD1 isoform a) and those comprising an amino acid sequence from about amino acid 291, 290, 280, 270 or 250 to about amino acid 825, 830, 840, 850, 851 or 852 of SEQ ID NO: 8 (homologs comprising the amine oxidase domain). Other LSD1 homologs that may have a biological activity include those comprising the SWIRM domain, e.g., about amino acid 195, 190, 175, 150 or 100 to about amino acid 284, 285, 290 or 300 of SEQ ID NO: 6 (LSD1 isoform a) or about amino acid 175, 174, 170, 150 or 100 to about amino acid 264, 265, 270, 280, 290 or 300 of SEQ ID NO: 8 (LSD1 isoform b).
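Whether a candidate fragment spans a given conserved domain can be checked mechanically from the approximate coordinates in Table 2. The Python sketch below does this for isoform a; the dictionary layout and the function name `domains_covered` are assumptions made for illustration, and full coverage of a domain's listed segments is used as a deliberately strict criterion (the text itself also contemplates homologs comprising only a portion of a domain).

```python
# Approximate domain coordinates for LSD1 isoform a, taken from Table 2.
LSD1_ISOFORM_A_DOMAINS = {
    "amine oxidase": [(548, 849), (311, 450)],
    "SWIRM": [(195, 284)],
    "FAD binding motif": [(300, 359)],
}


def domains_covered(start: int, end: int, domains=None) -> list:
    """List the domains whose segments all lie within residues start..end."""
    if domains is None:
        domains = LSD1_ISOFORM_A_DOMAINS
    return [
        name
        for name, segments in domains.items()
        if all(start <= seg_start and seg_end <= end for seg_start, seg_end in segments)
    ]


full_core = domains_covered(195, 849)    # spans SWIRM through amine oxidase
oxidase_only = domains_covered(311, 876)
swirm_only = domains_covered(195, 284)
```

The same function could be applied to isoform b by passing a dictionary built from the second row of Table 2.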

Functional homologs of AOF1 include an oxidoreductase domain, e.g., the NAD/FAD-dependent oxidoreductase domain or the flavin containing amine oxidoreductase domain. Exemplary functional homologs of AOF1 include those comprising from about amino acid 268, 260, 250 or 200 to about amino acid 588, 590, 595 or 600 of SEQ ID NO: 10.

Functional homologs of CoREST include the ELM, SANT1 and/or SANT2 domains. Exemplary functional homologs of CoREST include those comprising from about amino acid 293, 290, 280, 270, 260 or 250 to about amino acid 480 or 482 of SEQ ID NO: 12. Other CoREST functional homologs may comprise from about amino acid 293, 290, 280, 270, 260 or 250 to about amino acid 381, 385, 390 or 300 of SEQ ID NO: 12.

Whether a homolog is a functional homolog can be determined according to methods known in the art and further described herein. Measurement of binding between a BHC80 protein or homolog and a histone or homolog can be accomplished by any means known in the art. These include, without limitation, Western blotting or calorimetry. An illustrative example for determining whether a demethylase homolog has demethylase activity includes contacting the demethylase homolog with a target peptide that is methylated, and determining whether the demethylase homolog is capable of demethylating the target peptide. The assay may further comprise one or more other components, such as other proteins, e.g., CoREST, or cofactors, e.g., flavin adenine dinucleotide (FAD). A target peptide may be a histone peptide. Any histone peptide can be used. Preferably it is used with a histone demethylase enzyme that recognizes the histone peptide as a substrate. The full histone protein can be used or a peptide comprising only a portion of the histone protein can be used, so long as that portion contains the methylated residue upon which the demethylase enzyme acts and the portion contains sufficient contextual residues to permit its recognition by the enzyme. Typically at least 3, at least 4, at least 5, at least 6, or at least 7 residues on either side of the methylated residue are believed to be sufficient for recognition. The methylated residue can be either a lysine or an arginine. Preferably the histone peptide and the histone demethylase are derived from the same species of organism.

Measurement of the reaction between a histone and a eukaryotic histone demethylase protein can be accomplished by any means known in the art. These include, without limitation, Western blotting, measuring formation of formaldehyde, mass spectrometry, and measuring formation of peroxide.

Compositions comprising an isolated polypeptide or protein described herein, e.g., BHC80 or a homolog thereof or a histone or a homolog thereof may comprise less than about 10%, or alternatively about 5%, or alternatively about 1%, contaminating biological macromolecules or polypeptides.

In certain embodiments, a protein described herein is further linked to a heterologous polypeptide, e.g., a polypeptide comprising a domain which increases its solubility and/or facilitates its purification, identification, detection, and/or structural characterization. Exemplary domains include, for example, glutathione S-transferase (GST), protein A, protein G, calmodulin-binding peptide, thioredoxin, maltose binding protein, HA, myc, poly arginine, poly His, poly His-Asp or FLAG fusion proteins and tags. Additional exemplary domains include domains that alter protein localization in vivo, such as signal peptides, type III secretion system-targeting peptides, transcytosis domains, nuclear localization signals, etc.

A protein described herein may be linked to at least 2, 3, 4, 5, or more heterologous polypeptides. Polypeptides may be linked to multiple copies of the same heterologous polypeptide or may be linked to two or more heterologous polypeptides. The fusions may occur at the N-terminus of the polypeptide, at the C-terminus of the polypeptide, or at both the N- and C-terminus of the polypeptide. It is also within the scope of the invention to include linker sequences between a protein described herein and the fusion domain in order to facilitate construction of the fusion protein or to optimize protein expression or structural constraints of the fusion protein. A polypeptide may also be constructed so as to contain protease cleavage sites between the fusion polypeptide and polypeptide of the invention in order to remove the tag after protein expression or thereafter. Examples of suitable endoproteases, include, for example, Factor Xa and TEV proteases.

In another embodiment, a protein may be modified so that its rate of traversing the cellular membrane is increased. For example, the polypeptide may be fused to a second peptide which promotes "transcytosis," e.g., uptake of the peptide by cells. The peptide may be a portion of the HIV transactivator (TAT) protein, such as the fragment corresponding to residues 37-62 or 48-60 of TAT, portions which have been observed to be rapidly taken up by a cell in vitro (Green and Loewenstein, (1989) Cell 55:1179-1188). Alternatively, the internalizing peptide may be derived from the Drosophila antennapedia protein, or homologs thereof. The 60 amino acid long homeodomain of the homeo-protein antennapedia has been demonstrated to translocate through biological membranes and can facilitate the translocation of heterologous polypeptides to which it is coupled. Thus, the polypeptide may be fused to a peptide consisting of about amino acids 42-58 of Drosophila antennapedia or shorter fragments for transcytosis (Derossi et al. (1996) J Biol Chem 271:18188-18193; Derossi et al. (1994) J Biol Chem 269:10444-10450; and Perez et al. (1992) J Cell Sci 102:717-722). The transcytosis polypeptide may also be a non-naturally-occurring membrane-translocating sequence (MTS), such as the peptide sequences disclosed in U.S. Patent No. 6,248,558.

In another embodiment, a protein described herein is labeled with an isotopic label to facilitate its detection and/or structural characterization using nuclear magnetic resonance or another applicable technique. Exemplary isotopic labels include radioisotopic labels such as, for example, potassium-40 (⁴⁰K), carbon-14 (¹⁴C), tritium (³H), sulphur-35 (³⁵S), phosphorus-32 (³²P), technetium-99m (⁹⁹ᵐTc), thallium-201 (²⁰¹Tl), gallium-67 (⁶⁷Ga), indium-111 (¹¹¹In), iodine-123 (¹²³I), iodine-131 (¹³¹I), yttrium-90 (⁹⁰Y), samarium-153 (¹⁵³Sm), rhenium-186 (¹⁸⁶Re), rhenium-188 (¹⁸⁸Re), dysprosium-165 (¹⁶⁵Dy) and holmium-166 (¹⁶⁶Ho). The isotopic label may also be an atom with nonzero nuclear spin, including, for example, hydrogen-1 (¹H), hydrogen-2 (²H), hydrogen-3 (³H), phosphorus-31 (³¹P), sodium-23 (²³Na), nitrogen-14 (¹⁴N), nitrogen-15 (¹⁵N), carbon-13 (¹³C) and fluorine-19 (¹⁹F). In certain embodiments, the polypeptide is uniformly labeled with an isotopic label, for example, wherein at least 50%, 70%, 80%, 90%, 95%, or 98% of the possible labels in the polypeptide are labeled, e.g., wherein at least 50%, 70%, 80%, 90%, 95%, or 98% of the nitrogen atoms in the polypeptide are ¹⁵N, and/or wherein at least 50%, 70%, 80%, 90%, 95%, or 98% of the carbon atoms in the polypeptide are ¹³C, and/or wherein at least 50%, 70%, 80%, 90%, 95%, or 98% of the hydrogen atoms in the polypeptide are ²H. In other embodiments, the isotopic label is located in one or more specific locations within the polypeptide, for example, the label may be specifically incorporated into one or more of the leucine residues of the polypeptide. The invention also encompasses the embodiment wherein a single polypeptide comprises two, three or more different isotopic labels, for example, the polypeptide comprises both ¹⁵N and ¹³C labeling.

In yet another embodiment, a protein described herein is labeled to facilitate structural characterization using x-ray crystallography or another applicable technique. Exemplary labels include heavy atom labels such as, for example, cobalt, selenium, krypton, bromine, strontium, molybdenum, ruthenium, rhodium, palladium, silver, cadmium, tin, iodine, xenon, barium, lanthanum, cerium, praseodymium, neodymium, samarium, europium, gadolinium, terbium, dysprosium, holmium, erbium, thulium, ytterbium, lutetium, tantalum, tungsten, rhenium, osmium, iridium, platinum, gold, mercury, thallium, lead, thorium and uranium. In an exemplary embodiment, the polypeptide is labeled with seleno-methionine.

A variety of methods are available for preparing a polypeptide with a label, such as a radioisotopic label or heavy atom label. For example, in one such method, an expression vector comprising a nucleic acid encoding a polypeptide is introduced into a host cell, and the host cell is cultured in a cell culture medium in the presence of a source of the label, thereby generating a labeled polypeptide. The extent to which a polypeptide may be labeled may vary.

In still another embodiment, a protein described herein is labeled with a fluorescent label to facilitate its detection, purification, or structural characterization. In an exemplary embodiment, the polypeptide of the invention is fused to a heterologous polypeptide sequence which produces a detectable fluorescent signal, including, for example, green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), Renilla reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced yellow fluorescent protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced blue fluorescent protein (EBFP), citrine and red fluorescent protein from Discosoma (dsRED). In other embodiments, a protein described herein is immobilized onto a solid surface, including microtiter plates, slides, beads, films, etc. A protein described herein may be immobilized onto a "chip" as part of an array. An array, having a plurality of addresses, may comprise one or more polypeptides in one or more of those addresses.

In other embodiments, proteins described herein are contained within vessels useful for the manipulation of the polypeptide sample. For example, the polypeptide of the invention may be contained within a microtiter plate to facilitate detection, screening or purification of the polypeptide. The polypeptide may also be contained within a syringe as a container suitable for administering the polypeptide to a subject in order to generate antibodies or as part of a vaccination regimen. The polypeptides may also be contained within an NMR tube in order to enable characterization by nuclear magnetic resonance techniques.

In still other embodiments, the invention relates to a crystallized polypeptide of the invention and crystallized polypeptides which have been mounted for examination by x-ray crystallography as described further below. In certain instances, a protein described herein in crystal form may be single crystals of various dimensions (e.g., micro-crystals) or may be an aggregate of crystalline material.

In certain embodiments, it may be advantageous to provide naturally-occurring or experimentally-derived homologs of the polypeptide of the invention. Such homologs may function as a modulator to promote or inhibit a subset of the biological activities of the naturally-occurring form of the polypeptide. Thus, specific biological effects may be elicited by treatment with a homolog of limited function, and with fewer side effects relative to treatment with agonists or antagonists which are directed to all of the biological activities of the polypeptide of the invention. For instance, antagonistic homologs may be generated which interfere with the ability of the wild-type polypeptide of the invention to associate with certain proteins, but which do not substantially interfere with the formation of complexes between the native polypeptide and other cellular proteins.

Such homologues include fragments or mutants of the PHD finger domain of BHC80 protein and can be referred to herein as a ligand-binding fragment or protein. In general, the biological activity or biological action of a protein refers to any function(s) exhibited or performed by the protein that is ascribed to the naturally occurring form of the protein as measured or observed in vivo (i.e., in the natural physiological environment of the protein) or in vitro (i.e., under laboratory conditions). Modifications of a protein, such as in a homologue or mimetic (discussed below), may result in proteins having the same biological activity as the naturally occurring protein, or in proteins having decreased or increased biological activity as compared to the naturally occurring protein. Modifications which result in a decrease in protein expression or a decrease in the activity of the protein, can be referred to as inactivation (complete or partial), down-regulation, or decreased action of a protein. Similarly, modifications that result in an increase in protein expression or an increase in the activity of the protein, can be referred to as amplification, overproduction, activation, enhancement, up-regulation or increased action of a protein.

Nucleic acids encoding any of the proteins or homologs described herein are also provided herein. A nucleic acid may further be linked to a promoter and/or other regulatory sequences, as further described herein. Exemplary nucleic acids are those that are at least about 80%, 85%, 90%, 95%, 98%, 99% or 100% identical to a nucleotide sequence provided herein or a fragment thereof, such as a nucleic acid sequence encoding the protein fragments described herein. Nucleic acids may also hybridize specifically, e.g., under stringent hybridization conditions, to a nucleic acid described herein or a fragment thereof.

Also provided herein are molecular complexes, e.g., protein complexes, comprising a BHC80 protein or homolog thereof and a histone or homolog thereof, and optionally other cofactors or molecules. Such compositions and complexes may be used, e.g., in screening assays to identify agents that modulate the interaction between a BHC80 protein and a histone, and the interaction between a demethylase and a histone.
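The percent-identity thresholds recited above for nucleic acids (at least about 80% up to 100%) can be illustrated with a minimal sketch. The text does not specify how identity is to be calculated, so the function below assumes two sequences that have already been aligned to equal length (gaps written as "-") and simply counts matching columns; the function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns of two pre-aligned sequences.

    Assumption: both strings have the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # gap in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent=80.0):
    """True if the pair is at least threshold_percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical 10-nucleotide example with a single mismatch.
print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))
```

Dedicated alignment tools use more elaborate scoring; this sketch only shows how a stated threshold could be checked once an alignment is available.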

Exemplary complexes comprise a BHC80 protein or histone binding homolog thereof and a histone or BHC80 binding homolog thereof. BHC80 and histones and homologs are further described herein. A complex may further comprise a cofactor, such as a salt, metal (e.g., zinc), nucleotide, oligonucleotide or polypeptide, a modulator, or a small molecule. A protein complex may further comprise another protein, e.g., a protein or homolog thereof that binds to BHC80 and/or the histone, e.g., a demethylase, such as LSD1 or a homolog thereof. The histone or homolog thereof is preferably demethylated at residue K4.

Proteins and complexes described herein may exist in solution. A solution may be a composition, e.g., pharmaceutical composition, such as comprising a therapeutically acceptable diluent.

Proteins or complexes described herein may also exist in crystal form. A crystallized complex may include a protein described herein and one or more of the following: a histone or homolog thereof, a co-factor (such as a salt, metal, nucleotide, oligonucleotide or polypeptide), a modulator, or a small molecule. In another aspect, the present invention contemplates a crystallized complex including a polypeptide of the invention and any other molecule or atom (such as a metal ion) that associates with the polypeptide in vivo. As described in the Examples, a preferred crystal structure is that between a portion of BHC80 comprising the PHD domain and a portion of histone H3 in which K4 is unmethylated. A crystalline complex between BHC80 and a histone may be produced using the crystal formation method described herein. A crystal can comprise any crystal structure that comes from crystals formed in any of the allowable space groups for these complexes or proteins. A unit cell having "approximate dimensions" of a given set of dimensions refers to a unit cell that has dimensions that are within plus (+) or minus (-) 2.0% of the specified unit cell dimensions. Such a small variation is within the scope of the invention since one of skill in the art could obtain such variance by performing X-ray crystallography at different times on the same crystal. In one embodiment, a crystalline complex of the present invention has the specified unit cell dimensions described herein, e.g., in the Exemplary section. A preferred crystal of the present invention provides X-ray diffraction data for determination of atomic coordinates of the complex to a resolution of about 4.0 Å, and preferably to about 3.2 Å, and preferably to about 3.0 Å, and more preferably to about 2.3 Å, and more preferably to about 2.0 Å, and even more preferably to about 1.8 Å.
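The plus or minus 2.0% definition of "approximate dimensions" lends itself to a short check. The sketch below is illustrative only: the function name is hypothetical, the tolerance is applied to the three cell edge lengths (whether the angle beta should be treated the same way is an assumption the text does not settle), and the numbers are two of the sets of cell constants reported elsewhere in this document.

```python
def within_tolerance(observed, reference, tolerance=0.02):
    """True if every observed dimension is within +/- tolerance
    (as a fraction, 0.02 = 2.0%) of the matching reference dimension."""
    if len(observed) != len(reference):
        raise ValueError("dimension lists must have equal length")
    return all(abs(o - r) <= tolerance * r for o, r in zip(observed, reference))


# Cell edge lengths a, b, c in Angstroms, taken from the cell constants
# recited in this document; only the edge lengths are compared here.
reference_cell = (79.51, 25.39, 62.28)
candidate_cell = (80.33, 25.36, 62.73)

print(within_tolerance(candidate_cell, reference_cell))
```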

Also provided herein are antibodies that bind specifically to a complex between a BHC80 protein or homolog thereof and a histone or homolog thereof, but essentially do not bind specifically to the BHC80 protein or homolog alone nor to the histone or homolog alone.

In one aspect, the present invention contemplates a purified antibody that binds specifically to a BHC80/histone protein complex and which does not substantially cross-react with a protein which is less than about 80%, or less than about 90%, identical to the amino acid sequences of BHC80 or histone. In another aspect, the present invention contemplates an array comprising a substrate having a plurality of addresses, wherein at least one of the addresses has disposed thereon a purified antibody that binds specifically to a protein complex described herein.

Antibodies may be full length antibodies, fragments of antibodies (e.g., Fab or F(ab')2), monoclonal antibodies, polyclonal antibodies, single chain antibodies, chimeric antibodies, humanized antibodies, human antibodies, mini antibodies or any other form of a molecule or complex of molecules that binds specifically to a molecular complex described herein.

Screening methods

Provided herein are screening methods for identifying agents that modulate the interaction between BHC80 and a histone.

A method for identifying an agent that modulates the interaction between a BHC80 and a histone may comprise contacting a BHC80 reagent and a histone reagent in the presence of a test agent; and determining the level of interaction between the BHC80 reagent and the histone reagent, wherein a different level of interaction between the BHC80 reagent and the histone reagent in the presence of the test agent relative to the absence of the test agent indicates that the test agent is an agent that modulates the interaction between a BHC80 protein and a histone. The binding reaction may further comprise other components of a histone demethylase complex, e.g., a demethylase, e.g., an LSD1 reagent, or a cofactor, e.g., zinc. A method may further comprise determining the effect of the test agent on a biological activity, e.g., a biological activity of BHC80, histone, demethylase or complex thereof.

"BHC80 reagent" refers to a BHC80 protein or homolog thereof, e.g., histone binding homolog thereof, as further described herein. For example, a BHC80 reagent may comprise about amino acids 486 to 543 of a long form of a BHC80 protein, but not about amino acids 1 to 485 or 544 to 680 of a long form of BHC80 protein, a homolog, or functional homolog thereof or portion thereof sufficient for use in the particular assay. "Histone reagent" refers to a histone protein or homolog thereof, e.g., BHC80 binding homolog thereof. A histone reagent may be a peptide comprising about amino acids 1 to 8, 9 or 10 of a histone, a homolog, or functional homolog thereof or portion thereof sufficient for use in the particular assay. It is also recognized that the PHD domain of BHC80 may interact with histones other than H3 and that this interaction may occur in regions other than amino acids 1 to 10. In an assay for determining whether two proteins interact, it is only necessary to include portions of those proteins that interact with each other.

Reagents may be fused directly or indirectly to another moiety or label, e.g., a fluorophore or radioactive label or another peptide that may be useful in identifying, quantitating, isolating or purifying the reagent.

An interaction between a BHC80 protein or homolog thereof and a histone or homolog thereof may be detected by a variety of techniques. Modulation of the formation of complexes can be quantitated using, for example, detectably labeled proteins such as radiolabeled, fluorescently labeled, or enzymatically labeled polypeptides, by immunoassay, by chromatographic detection or by label-free surface plasmon resonance (SPR) based technology for studying biomolecular interactions in real time (BIACore).

Typically, it will be desirable to immobilize either the BHC80 protein or homolog thereof ("BHC80 reagent") or the histone protein or homolog thereof ("histone reagent") to facilitate separation of complexes from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of the BHC80 reagent to the histone reagent, in the presence and absence of a candidate agent, can be accomplished in any vessel suitable for containing the reactants. Examples include microtitre plates, test tubes, and micro-centrifuge tubes.

In one embodiment, a BHC80 or histone reagent is provided in the form of a fusion protein comprising a domain that allows the protein to be bound to a matrix. For example, glutathione-S-transferase/BHC80 (GST/BHC80) fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) or glutathione derivatized microtitre plates, which are then combined with the other protein, which may be labeled, and the test compound, and the mixture incubated under conditions conducive to complex formation, e.g. at physiological conditions for salt and pH, though slightly more stringent conditions may be desired. Following incubation, the beads may be washed to remove any unbound label, the matrix immobilized and the presence of radiolabel determined directly (e.g. beads placed in scintillant), or in the supernatant after the complexes are subsequently dissociated. Alternatively, the complexes can be dissociated from the matrix, separated by SDS-PAGE, and the level of binding protein found in the bead fraction quantitated from the gel using standard electrophoretic techniques.

Other techniques for immobilizing proteins or peptides on matrices are also available for use in the subject assay. For instance, either the BHC80 or histone reagent can be immobilized utilizing conjugation of biotin and streptavidin. For instance, biotinylated BHC80 molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques well known in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical). Alternatively, antibodies reactive with the BHC80 or histone reagent, but which preferably do not interfere with the interaction between the BHC80 reagent and the histone reagent, can be derivatized to the wells of the plate, and BHC80 or histone reagent trapped in the wells by antibody conjugation. As above, preparations of a binding protein and a test agent are incubated in the BHC80 or histone-presenting wells of the plate, and the amount of complex trapped in the well can be quantitated. Exemplary methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the binding protein, or which are reactive with the BHC80 or histone reagent and compete with the binding protein; as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the binding protein, either intrinsic or extrinsic activity. In the instance of the latter, the enzyme can be chemically conjugated or provided as a fusion protein with the binding protein. To illustrate, the binding protein can be chemically cross-linked or genetically fused (if it is a polypeptide) with horseradish peroxidase, and the amount of polypeptide trapped in the complex can be assessed with a chromogenic substrate of the enzyme, e.g. 3,3'-diaminobenzidine tetrahydrochloride or 4-chloro-1-naphthol. Likewise, a fusion protein comprising the polypeptide and glutathione-S-transferase can be provided, and complex formation quantitated by detecting the GST activity using 1-chloro-2,4-dinitrobenzene (Habig et al (1974) J Biol Chem 249:7130).

For processes which rely on immunodetection for quantitating proteins trapped in the complex, antibodies against the protein, such as anti-BHC80 or anti-histone antibodies, can be used. Alternatively, the protein to be detected in the complex can be "epitope tagged" in the form of a fusion protein which includes, in addition to the BHC80 or histone sequence, a second polypeptide for which antibodies are readily available (e.g. from commercial sources). For instance, the GST fusion proteins described above can also be used for quantification of binding using antibodies against the GST moiety. Other useful epitope tags include myc-epitopes (e.g., see Ellison et al. J Biol. Chem. 266:21150-21157 (1991)) which includes a 10-residue sequence from c-myc, as well as the pFLAG system (International Biotechnologies, Inc.) or the pEZZ-protein A system (Pharmacia, NJ.).

The efficacy of a test compound can be assessed by generating dose response curves from data obtained using various concentrations of the test compound. Moreover, a control assay can also be performed to provide a baseline for comparison. In an exemplary control assay, interaction of a BHC80 reagent and binding protein is quantitated in the absence of the test compound.
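The dose-response comparison described above can be sketched as follows. This is a simplified illustration rather than a prescribed analysis: binding signals are normalized to the control assay run without test compound, and the half-maximal concentration is estimated by linear interpolation on a log10 concentration scale between the two measurements that bracket 50% of control. The function names and any data passed to them are hypothetical.

```python
import math


def percent_of_control(signal, control_signal):
    """Express a binding signal as a percentage of the no-compound control."""
    return 100.0 * signal / control_signal


def interpolate_ic50(concentrations, responses):
    """Estimate the concentration giving 50% of control binding.

    concentrations: ascending, positive test-compound concentrations.
    responses: binding at each concentration, as percent of control.
    Returns None if the responses never cross 50%.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        crosses = (r_lo - 50.0) * (r_hi - 50.0) <= 0
        if crosses and r_lo != r_hi:
            # Fraction of the way from the lower to the higher concentration
            # at which the response reaches 50%, on a log10 scale.
            frac = (r_lo - 50.0) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None
```

A full curve fit (for example, a four-parameter logistic model) would normally replace the interpolation when enough concentrations have been tested.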

Test agents (or substances) for screening to identify modulators, e.g., inhibitors or enhancers, of BHC80/histone interaction can be from any source known in the art. They can be natural products, purified or mixtures, synthetic compounds, members of compound libraries, etc. The compounds to be tested may be chosen at random or may be chosen using a filter based on structure and/or binding sites of the proteins. The test substances can be selected from those that have previously been identified to have biological or drug activity or from those that have not. In some embodiments a natural substrate is the starting point for designing a modulator of binding.

In one embodiment, a test agent is a peptide or a peptide-like molecule. For example, a peptide may be a homolog of the amino acid sequence of a histone from residues 1, 2, 3, 4, 5 to 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. Homologs may comprise 1, 2, 3, or more amino acids that differ from a naturally occurring peptide.

A screening assay may also comprise using a cell or cell lysate or portion thereof, comprising a BHC80 reagent and a histone reagent; contacting the cell or cell lysate or portion thereof with a test agent; and determining whether the interaction between the BHC80 reagent and the histone reagent is affected by the presence of the test agent. The BHC80 and histone reagents may be proteins that are encoded by a heterologous or exogenous nucleic acid, i.e., a nucleic acid that is not present in a naturally occurring cell.

An exemplary assay may comprise (i) providing a cell comprising a heterologous nucleic acid encoding a BHC80 reagent and/or a heterologous nucleic acid encoding a histone reagent; (ii) contacting, or administering into, the cell a test agent; and (iii) determining whether the test agent increases or decreases the interaction between the BHC80 reagent and the histone reagent, wherein an increase or a decrease in the interaction indicates that the test agent is an agent that increases or decreases, respectively, the interaction between BHC80 and a histone. BHC80 and histone reagents may be fused (directly or indirectly through a linker) to a heterologous peptide, such as a tag or specific amino acid sequence, that may be detected with a specific reagent, as further described herein. The cell may be, or the cell lysate may be from, a eukaryotic cell, e.g., a mammalian cell, such as a human cell, a yeast cell, a non-human primate cell, a bovine cell, an ovine cell, an equine cell, a porcine cell, a sheep cell, a bird (e.g., chicken or fowl) cell, a canine cell, a feline cell or a rodent (mouse or rat) cell. It can also be a non-mammalian cell, e.g., a fish cell. Yeast cells include S. cerevisiae and C. albicans. The cell may also be a prokaryotic cell, e.g., a bacterial cell. The cell may also be a single-celled microorganism, e.g., a protozoan. The cell may also be a metazoan cell, a plant cell or an insect cell.

Methods of Use of BHC80 - Histone Crystal Complexes

The present invention provides the atomic coordinates that define the three dimensional structure of the BHC80 PHD domain bound to the unmodified H3 tail, and accordingly, the atomic coordinates that define the three dimensional structures of the BHC80 PHD domain. This information may be used for rational drug design, as further described herein. Furthermore, using the guidance provided herein, one of skill in the art will be able to reproduce any of such structures and define atomic coordinates of such a structure.

A BHC80 - histone crystal may be obtained as described in the Examples. The method may comprise combining a BHC80 protein or homolog thereof and a histone or homolog thereof in a 1:1.5 ratio (respectively), using the sitting drop vapour-diffusion method at about 16°C, with mother liquor containing about 100 mM sodium citrate, about pH 5.6, about 20% polyethylene glycol 4000 and about 20% isopropanol. Crystals may also be obtained using the sitting drop vapour-diffusion method with mother liquor containing MES at about pH 6.2-6.5, about 5-10% polyethylene glycol 4000 and about 20% isopropanol.

In certain embodiments, the crystals of the present invention diffract x-rays to a resolution of about 1.43 to about 40.0 Angstroms. In certain other embodiments, the crystals of the present invention diffract x-rays to a resolution of about 1.43 to about 2.06 Angstroms. In certain embodiments, the crystals of the present invention have cell constants with the dimensions of a= about 79.51 Angstroms, b= about 25.39 Angstroms, c= about 62.28 Angstroms, and beta= about 96.9 degrees. In certain other embodiments, the crystals of the present invention have cell constants with the dimensions of a= about 79.70 Angstroms, b= about 25.23 Angstroms, c= about 62.53 Angstroms, and beta= about 96.6 degrees. In still other embodiments, the crystals of the present invention have cell constants with the dimensions of a= about 80.33 Angstroms, b= about 25.36 Angstroms, c= about 62.73 Angstroms, and beta= about 96.6 degrees.

In certain embodiments, the crystals of the present invention belong to the space group C2. In certain embodiments, the crystals of the present invention comprise an atomic structure characterized by the coordinates deposited at the Protein Data Bank with accession number 2PUY.
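As a worked illustration of the cell constants given above, the volume of the unit cell can be computed from a, b, c and beta. The sketch assumes the conventional monoclinic setting for space group C2, in which alpha and gamma are 90 degrees and beta is the unique angle, so that V = a·b·c·sin(beta); the function name is hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_degrees):
    """Volume (cubic Angstroms) of a monoclinic unit cell.

    Assumes alpha = gamma = 90 degrees, so V = a * b * c * sin(beta).
    """
    return a * b * c * math.sin(math.radians(beta_degrees))


# First set of cell constants recited above (Angstroms and degrees).
volume = monoclinic_cell_volume(79.51, 25.39, 62.28, 96.9)
print(f"Unit cell volume: {volume:.0f} cubic Angstroms")
```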

The crystalline protein or complex of the present invention may be analyzed by X-ray diffraction and, based on data collected from this procedure, models are constructed which represent the tertiary structure of the protein or complex. Therefore, one embodiment of the present invention includes a representation, or model, of the three dimensional structure of a protein or complex of the present invention or of a component thereof, such as a computer model. A computer model of the present invention can be produced using any suitable software modeling program, including, but not limited to, the graphical display program O (Jones et al., Acta Crystallography, vol. A47, p. 110, 1991), CNS (Brunger, et al. Crystallography & NMR system: A software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 54, 905-21. (1998)), the graphical display program GRASP, MOLSCRIPT 2.0 (Avatar Software AB, Heleneborgsgatan 21C, SE-11731 Stockholm, Sweden), the program CONTACTS from the CCP4 suite of programs (Bailey, 1994, Acta Cryst. D50:760-763), or the graphical display program INSIGHT. Suitable computer hardware useful for producing an image of the present invention is known to those of skill in the art (e.g., a Silicon Graphics Workstation).

A representation, or model, of the three dimensional structure of the complex or protein component for which a crystal has been produced can also be determined using techniques which include molecular replacement or SIR/MIR (single/multiple isomorphous replacement), or MAD (multiple wavelength anomalous diffraction) methods (Hendrickson et al., 1997, Methods Enzymol, 276:494-522). Methods of molecular replacement are generally known by those of skill in the art (generally described in Brunger, Meth. Enzym., vol. 276, pp. 558-580, 1997; Navaza and Saludjian, Meth. Enzym., vol. 276, pp. 581-594, 1997; Tong and Rossmann, Meth. Enzym., vol. 276, pp. 594-611, 1997; and Bentley, Meth. Enzym., vol. 276, pp. 611-619, 1997, each of which is incorporated by this reference herein in its entirety) and are performed in a software program including, for example, AmoRe (CCP4, Acta Cryst. D50, 760-763 (1994)), SOLVE (Terwilliger et al., 1999, Acta Crystallogr., D55:849-861), RESOLVE (Terwilliger, 2000, Acta Crystallogr., D56:965-972) or XPLOR. Briefly, X-ray diffraction data is collected from the crystal of a crystallized target structure. The X-ray diffraction data is transformed to calculate a Patterson function. The Patterson function of the crystallized target structure is compared with a Patterson function calculated from a known structure (referred to herein as a search structure). The Patterson function of the crystallized target structure is rotated on the search structure Patterson function to determine the correct orientation of the crystallized target structure in the crystal. The translation function is then calculated to determine the location of the target structure with respect to the crystal axes. Once the crystallized target structure has been correctly positioned in the unit cell, initial phases for the experimental data can be calculated.
These phases are necessary for calculation of an electron density map from which structural differences can be observed and for refinement of the structure. Preferably, the structural features (e.g., amino acid sequence, conserved disulphide bonds, and β-strands or β-sheets) of the search molecule are related to the crystallized target structure.
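By way of illustration only, the Patterson function referred to above can be sketched in one dimension. The sketch below is not the rotation or translation search performed by programs such as AmoRe; it merely shows, for a hypothetical two-atom cell with unit-weight point atoms, that a Patterson function computed from intensities alone (without phases) peaks at the interatomic vector. The atom positions, the resolution limit h_max and the function names are assumptions made for the example.

```python
import cmath
import math


def structure_factors(positions, h_max):
    """F(h) for unit-weight point atoms at fractional positions in a 1-D cell."""
    return {
        h: sum(cmath.exp(2j * math.pi * h * x) for x in positions)
        for h in range(-h_max, h_max + 1)
    }


def patterson(intensities, u):
    """P(u) = sum over h of |F(h)|^2 * cos(2*pi*h*u); phases are not needed."""
    return sum(i_h * math.cos(2 * math.pi * h * u) for h, i_h in intensities.items())


# Hypothetical two-atom "structure" at fractional coordinates 0.10 and 0.40.
atoms = [0.10, 0.40]
F = structure_factors(atoms, h_max=10)
intensities = {h: abs(f) ** 2 for h, f in F.items()}

# Scan u away from the origin peak; the strongest remaining peak
# marks the interatomic vector (0.40 - 0.10 = 0.30).
grid = [k / 100 for k in range(5, 51)]
best_u = max(grid, key=lambda u: patterson(intensities, u))
print(best_u)
```

In molecular replacement the same principle is applied in three dimensions, where the set of interatomic vectors of the search structure is rotated and translated to match that of the target.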

As used herein, the term "model" refers to a representation in a tangible medium of the three dimensional structure of a protein, polypeptide or peptide. For example, a model can be a representation of the three dimensional structure in an electronic file, on a computer screen, on a piece of paper (i.e., on a two dimensional medium), and/or as a ball-and-stick figure. Physical three-dimensional models are tangible and include, but are not limited to, stick models and space-filling models. The phrase "imaging the model on a computer screen" refers to the ability to express (or represent) and manipulate the model on a computer screen using appropriate computer hardware and software technology known to those skilled in the art. Such technology is available from a variety of sources including, for example, Evans and Sutherland, Salt Lake City, Utah, and Biosym Technologies, San Diego, CA. The phrase "providing a picture of the model" refers to the ability to generate a "hard copy" of the model. Hard copies include both motion and still pictures. Computer screen images and pictures of the model can be visualized in a number of formats including space-filling representations, α carbon traces, ribbon diagrams and electron density maps. A variety of such representations of the structural models of the present invention are shown, for example, in the figures. Preferably, a three dimensional structure of a BHC80 - histone complex includes:

(a) a structure defined by atomic coordinates of a three dimensional structure of a crystalline BHC80 protein or histone binding homolog thereof and histone or BHC80 binding homolog thereof;

(b) a structure defined by atomic coordinates selected from: (i) atomic coordinates represented in the table attached hereto;

(ii) atomic coordinates that define a three dimensional structure wherein at least 50% of the structure has an average root-mean-square deviation (RMSD) from backbone atoms in secondary structure elements in at least one domain of a three dimensional structure represented by the atomic coordinates of (i) of equal to or less than about 1.5 Å;

(c) a structure defined by atomic coordinates derived from a BHC80 - histone complex arranged in a crystalline manner in the space group defined herein; and/or

(d) a structure of a BHC80 - histone complex constructed using as a template the three-dimensional structure of (a), (b) or (c).

In one aspect of the invention, a three dimensional structure of a complex or a component thereof includes a structure wherein the structure has an average root-mean-square deviation (RMSD) of equal to or less than about 1.7 Å over the backbone atoms in secondary structure elements of at least 50% of the residues in at least one domain of a three dimensional structure represented by the atomic coordinates provided herein. Such a structure can be referred to as a structural homologue of the complexes or components thereof defined by the coordinates provided herein. Preferably, the structure has an average root-mean-square deviation (RMSD) of equal to or less than about 1.6 Å over the backbone atoms in secondary structure elements of at least 50% of the residues in at least one domain of a three dimensional structure represented by the atomic coordinates provided herein, or equal to or less than about 1.5 Å, or equal to or less than about 1.4 Å, or equal to or less than about 1.3 Å, or equal to or less than about 1.2 Å, or equal to or less than about 1.1 Å, or equal to or less than about 1.0 Å, or equal to or less than about 0.9 Å, or equal to or less than about 0.8 Å, or equal to or less than about 0.7 Å, or equal to or less than about 0.6 Å, or equal to or less than about 0.5 Å, or equal to or less than about 0.4 Å, or equal to or less than about 0.3 Å, or equal to or less than about 0.2 Å, over the backbone atoms in secondary structure elements of at least 50% of the residues in at least one domain of a three dimensional structure represented by the atomic coordinates provided herein. In another aspect, a three dimensional structure of a complex or component thereof provided by the present invention includes a structure wherein the structure has the recited RMSD over the backbone atoms in secondary structure elements of at least 75% of the residues in at least one domain of a three dimensional structure represented by the atomic coordinates provided herein, and more preferably at least about 80%, and more preferably at least about 85%, and more preferably at least about 90%, and more preferably at least about 95%, and most preferably, about 100% of the residues in at least one domain of a three dimensional structure represented by the atomic coordinates provided herein.
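For purposes of illustration, the RMSD criterion recited above may be evaluated along the following lines. This is a minimal sketch under stated assumptions: the two structures are assumed to have already been superimposed, the backbone atoms are supplied per residue as (x, y, z) tuples in Å, and coverage is taken as the fraction of domain residues present in both structures. The function names and the data layout are hypothetical and are not part of any particular software package.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in Å between two equal-length lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def meets_rmsd_criterion(ref, cand, domain_size, max_rmsd=1.7, min_coverage=0.5):
    """Check the RMSD threshold over residues shared by both structures.

    ref and cand map residue number -> list of backbone atom coordinates
    (hypothetical layout); domain_size is the number of residues in the domain.
    """
    shared = sorted(set(ref) & set(cand))
    if len(shared) / domain_size < min_coverage:
        return False
    a = [atom for res in shared for atom in ref[res]]
    b = [atom for res in shared for atom in cand[res]]
    return rmsd(a, b) <= max_rmsd


# Toy example: the candidate is the reference shifted by 1 Å along y.
reference = {1: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 2: [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]}
candidate = {r: [(x, y + 1.0, z) for (x, y, z) in atoms] for r, atoms in reference.items()}
print(meets_rmsd_criterion(reference, candidate, domain_size=4))
```

The narrower thresholds and higher coverage values recited above can be tested by passing different values for max_rmsd and min_coverage.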

In one embodiment, the RMSD of a structural homologue of a complex or component thereof can be extended to include atoms of amino acid side chains. As used herein, the phrase "common amino acid side chains" refers to amino acid side chains that are common to both the structural homologue and to the structure that is actually represented by such atomic coordinates (e.g., a structure represented by the coordinates provided herein). Preferably, at least 50% of the structure has an average root-mean-square deviation (RMSD) from common amino acid side chains in a three dimensional structure represented by the atomic coordinates provided herein of equal to or less than about 1.7 Å, or equal to or less than about 1.6 Å, or equal to or less than about 1.5 Å, or equal to or less than about 1.4 Å, or equal to or less than about 1.3 Å, or equal to or less than about 1.2 Å, or equal to or less than about 1.1 Å, or equal to or less than about 1.0 Å, or equal to or less than about 0.9 Å, or equal to or less than about 0.8 Å, or equal to or less than about 0.7 Å, or equal to or less than about 0.6 Å, or equal to or less than about 0.5 Å, or equal to or less than about 0.4 Å, or equal to or less than about 0.3 Å, or equal to or less than about 0.2 Å. In another embodiment, a three dimensional structure of a complex or component thereof provided by the present invention includes a structure wherein at least about 75% of such structure has the recited average root-mean-square deviation (RMSD) value, and more preferably, at least about 85% of such structure has the recited average root-mean-square deviation (RMSD) value, and most preferably, about 95% of such structure has the recited average root-mean-square deviation (RMSD) value.

Accordingly, one embodiment of the present invention relates to a method of structure-based identification of compounds that regulate the interactions of BHC80 with its cognate substrates and the activities resulting from such interactions. The method may include a computer-assisted method of structure based drug design. A method may comprise one or more of the following steps: (a) providing atomic coordinates that define the three dimensional structure of a BHC80 protein, domain or complex including such protein or domain, including a model that uses as a template the actual atomic coordinates provided herein, and including any of the three dimensional structures or atomic coordinates described herein; and (b) identifying at least one candidate compound for interacting with the three dimensional structure of an active site in such protein, domain or complex by performing structure based drug design with the structure of (a). The step of identifying is typically performed in conjunction with computer modeling.

A method may comprise obtaining, providing, supplying, accessing, displaying, retrieving, or otherwise making available the atomic coordinates defining any three dimensional structures as described herein. For example, a method may include accessing the atomic coordinates for the structure from a database or other source; importing the atomic coordinates for the structure into a computer or other database; displaying the atomic coordinates and/or a model of the structure in any manner, such as on a computer, on paper, etc.; and determining the three dimensional structure described by the present invention de novo using the guidance provided herein. A second step of the method of structure based identification of compounds of the present invention may include identifying a candidate compound by performing structure based drug design with the model of the structure. According to the present invention, the step of "identifying" can refer to any screening process, modeling process, design process, or other process by which a compound can be selected as useful for binding or inhibiting the activity of the protein or complex. The selection of compounds that compete with, disrupt or otherwise inhibit the biological activity of the enzymes of the invention, or alternatively that enhance, activate or otherwise stimulate the biological activity of such enzymes, is highly desirable. Such compounds can be designed by structure based drug design using models of the structures disclosed herein.

Structure based identification of compounds (e.g., structure based drug design, structure based compound screening, or structure based structure modeling) refers to the prediction or design of a conformation of a peptide, polypeptide, protein, or to the prediction or design of a conformational interaction between such protein, peptide or polypeptide, and a candidate compound, by using the three dimensional structure of the peptide, polypeptide or protein. Typically, structure based identification of compounds is performed with a computer (e.g., computer-assisted drug design, screening or modeling). For example, generally, for a protein to effectively interact with (e.g., bind to) a compound, it is necessary that the three dimensional structure of the compound assume a compatible conformation that allows the compound to bind to the protein in such a manner that a desired result is obtained upon binding. Knowledge of the three dimensional structure of the components of the complexes described herein in the conformation in which they bind to one another enables a skilled artisan to design a compound having such compatible conformation, or to select such a compound from available libraries of compounds and/or structures thereof.

Suitable structures and models useful for structure based drug design are disclosed herein. Preferred target structures to use in a method of structure based drug design include any representations of structures produced by any modeling method disclosed herein, including molecular replacement and fold recognition related methods.

According to the present invention, the step of identifying, selecting or designing a compound for testing in a method of structure based identification of the present invention can include creating a new chemical compound structure or searching databases of libraries of known compounds (e.g., a compound listed in a computational screening database containing three dimensional structures of known compounds). Designing can also be performed by simulating chemical compounds having substitute moieties at certain structural features. The step of designing can include selecting a chemical compound based on a known function of the compound. A preferred step of designing comprises computational screening of one or more databases of compounds in which the three dimensional structure of the compound is known and is interacted (e.g., docked, aligned, matched, interfaced) with the three dimensional structure of a complex of the invention (or protein or DNA component thereof) by computer (e.g. as described by Humblet and Dunbar, Annual Reports in Medicinal Chemistry, vol. 28, pp. 275-283, 1993, M Venuti, ed., Academic Press). The compound itself, if identified as a suitable candidate by the method of the invention, can be synthesized and tested directly, for example, in a biological assay. Methods to synthesize suitable chemical or protein-based compounds are known to those of skill in the art and depend upon the structure of the chemical being synthesized. Such methods are discussed in detail below. Methods to evaluate the bioactivity of the synthesized compound depend upon the bioactivity of the compound (e.g., inhibitory or stimulatory) and are discussed herein.

Various other methods of structure-based drug design are disclosed in Maulik et al., 1997, Molecular Biotechnology: Therapeutic Applications and Strategies, Wiley-Liss, Inc., which is incorporated herein by reference in its entirety. Maulik et al. disclose, for example, methods of directed design, in which the user directs the process of creating novel molecules from a fragment library of appropriately selected fragments; random design, in which the user uses a genetic or other algorithm to randomly mutate fragments and their combinations while simultaneously applying a selection criterion to evaluate the fitness of candidate ligands; and a grid-based approach in which the user calculates the interaction energy between three dimensional receptor structures and small fragment probes, followed by linking together of favorable probe sites.

In one aspect, the method of drug design generally includes computationally evaluating the potential of a selected chemical entity to associate with any of the molecules or complexes of the present invention (or portions thereof). For example, this method may include the steps of (a) employing computational means to perform a fitting operation between the selected chemical entity and a druggable region of the molecule or complex; and (b) analyzing the results of said fitting operation to quantify the association between the chemical entity and the druggable region.

A chemical entity may be examined either through visual inspection or through the use of computer modeling using a docking program such as GRAM, DOCK, or AUTODOCK (Dunbrack et al., Folding & Design, 2:27-42 (1997)). This procedure can include computer fitting of chemical entities to a target to ascertain how well the shape and the chemical structure of each chemical entity will complement or interfere with the structure of the subject polypeptide (Bugg et al., Scientific American, Dec: 92-98 (1993); West et al., TIPS, 16:67-74 (1995)). Computer programs may also be employed to estimate the attraction, repulsion, and steric hindrance of the chemical entity to a druggable region, for example. Generally, the tighter the fit (e.g., the lower the steric hindrance, and/or the greater the attractive force) the more potent the chemical entity will be because these properties are consistent with a tighter binding constant. Furthermore, the more specificity in the design of a chemical entity the more likely that the chemical entity will not interfere with related proteins, which may minimize potential side-effects due to unwanted interactions.

A variety of computational methods for molecular design, in which the steric and electronic properties of druggable regions are used to guide the design of chemical entities, are known: Cohen et al. (1990) J. Med. Chem. 33: 883-894; Kuntz et al. (1982) J. Mol. Biol. 161: 269-288; DesJarlais (1988) J. Med. Chem. 31: 722-729; Bartlett et al. (1989) Spec. Publ, Roy. Soc. Chem. 78: 182-196; Goodford et al. (1985) J. Med. Chem. 28: 849-857; and DesJarlais et al. J. Med. Chem. 29: 2149-2153. Directed methods generally fall into two categories: (1) design by analogy in which 3-D structures of known chemical entities (such as from a crystallographic database) are docked to the druggable region and scored for goodness-of-fit; and (2) de novo design, in which the chemical entity is constructed piecewise in the druggable region. The chemical entity may be screened as part of a library or a database of molecules. Databases which may be used include ACD (Molecular Designs Limited), NCI (National Cancer Institute), CCDC (Cambridge Crystallographic Data Center), CAST (Chemical Abstract Service), Derwent (Derwent Information Limited), Maybridge (Maybridge Chemical Company Ltd), Aldrich (Aldrich Chemical Company), DOCK (University of California in San Francisco), and the Directory of Natural Products (Chapman & Hall). Computer programs such as CONCORD (Tripos Associates) or DB-Converter (Molecular Simulations Limited) can be used to convert a data set represented in two dimensions to one represented in three dimensions. Chemical entities may be tested for their capacity to fit spatially with a druggable region or other portion of a target protein. As used herein, the term "fits spatially" means that the three-dimensional structure of the chemical entity is accommodated geometrically by a druggable region.
A favorable geometric fit occurs when the surface area of the chemical entity is in close proximity with the surface area of the druggable region without forming unfavorable interactions. A favorable complementary interaction occurs where the chemical entity interacts by hydrophobic, aromatic, ionic, dipolar, or hydrogen donating and accepting forces. Unfavorable interactions may be steric hindrance between atoms in the chemical entity and atoms in the druggable region.
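The notion of a favorable geometric fit without unfavorable steric interactions can be illustrated with a simple contact and clash count. The following is a toy sketch only, not the scoring function of DOCK or any other program named herein; the distance cutoffs (2.5 Å for a clash, 4.5 Å for a contact) and the function name are assumptions chosen for the example.

```python
import math


def fit_score(ligand_atoms, site_atoms, clash_dist=2.5, contact_dist=4.5):
    """Count close contacts and steric clashes between two sets of atoms.

    Atoms are (x, y, z) tuples in Å. A pair closer than clash_dist counts as a
    clash; a pair between clash_dist and contact_dist counts as a contact.
    Both cutoffs are illustrative assumptions.
    """
    contacts = 0
    clashes = 0
    for la in ligand_atoms:
        for sa in site_atoms:
            d = math.dist(la, sa)
            if d < clash_dist:
                clashes += 1
            elif d <= contact_dist:
                contacts += 1
    return contacts, clashes


# Hypothetical one-atom ligand placed near three site atoms.
ligand = [(0.0, 0.0, 0.0)]
site = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
contacts, clashes = fit_score(ligand, site)
print(contacts, clashes)
```

Under this simplified picture, an orientation with many contacts and no clashes would be retained for further evaluation, while orientations with clashes would be discarded or penalized.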

If a model is a computer model, the chemical entities may be positioned in a druggable region through computational docking. If, on the other hand, the model is a structural model, the chemical entities may be positioned in the druggable region by, for example, manual docking. In an illustrative embodiment, the design of a potential modulator begins from the general perspective of shape complementarity for the druggable region of a BHC80 protein, and a search algorithm is employed which is capable of scanning a database of small molecules of known three-dimensional structure for chemical entities which fit geometrically with the target druggable region. Most algorithms of this type provide a method for finding a wide assortment of chemical entities that are complementary to the shape of a druggable region of the subject BHC80 protein. Each of a set of chemical entities from a particular database, such as the Cambridge Crystallographic Data Bank (CCDB) (Allen et al. (1973) J. Chem. Doc. 13: 119), is individually docked to the druggable region of a BHC80 protein in a number of geometrically permissible orientations with use of a docking algorithm. In certain embodiments, a set of computer algorithms called DOCK can be used to characterize the shape of invaginations and grooves that form the active sites and recognition surfaces of the druggable region (Kuntz et al. (1982) J. Mol. Biol. 161: 269-288). The program can also search a database of small molecules for templates whose shapes are complementary to particular binding sites of a BHC80 protein (DesJarlais et al. (1988) J Med Chem 31: 722-729).

The orientations are evaluated for goodness-of-fit and the best are kept for further examination using molecular mechanics programs, such as AMBER or CHARMM. Such algorithms have previously proven successful in finding a variety of chemical entities that are complementary in shape to a druggable region.

Goodford (1985, J Med Chem 28:849-857) and Boobbyer et al. (1989, J Med Chem 32:1083-1094) have produced a computer program (GRID) which seeks to determine regions of high affinity for different chemical groups (termed probes) of the druggable region. GRID hence provides a tool for suggesting modifications to known chemical entities that might enhance binding. It may be anticipated that some of the sites discerned by GRID as regions of high affinity correspond to "pharmacophoric patterns" determined inferentially from a series of known ligands. As used herein, a "pharmacophoric pattern" is a geometric arrangement of features of chemical entities that is believed to be important for binding. Attempts have been made to use pharmacophoric patterns as a search screen for novel ligands (Jakes et al. (1987) J Mol Graph 5:41-48; Brint et al. (1987) J Mol Graph 5:49-56; Jakes et al. (1986) J Mol Graph 4:12-20).

Yet a further embodiment of the present invention utilizes a computer algorithm such as CLIX which searches such databases as CCDB for chemical entities which can be oriented with the druggable region in a way that is both sterically acceptable and has a high likelihood of achieving favorable chemical interactions between the chemical entity and the surrounding amino acid residues. The method is based on characterizing the region in terms of an ensemble of favorable binding positions for different chemical groups and then searching for orientations of the chemical entities that cause maximum spatial coincidence of individual candidate chemical groups with members of the ensemble. The algorithmic details of CLIX are described in Lawrence et al. (1992) Proteins 12:31-41.

In this way, the efficiency with which a chemical entity may bind to or interfere with a druggable region may be tested and optimized by computational evaluation. For example, for a favorable association with a druggable region, a chemical entity must preferably demonstrate a relatively small difference in energy between its bound and free states (i.e., a small deformation energy of binding). Thus, certain, more desirable chemical entities will be designed with a deformation energy of binding of not greater than about 10 kcal/mole, and more preferably, not greater than 7 kcal/mole. Chemical entities may interact with a druggable region in more than one conformation that is similar in overall binding energy. In those cases, the deformation energy of binding is taken to be the difference between the energy of the free entity and the average energy of the conformations observed when the chemical entity binds to the target.
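As a worked illustration of the deformation energy of binding described above, the calculation may be sketched as follows. The conformational energies in the example are hypothetical values in kcal/mole, as would be obtained from a molecular mechanics program, and the function names are assumptions made for the sketch.

```python
def deformation_energy(free_energy, bound_energies):
    """Average energy of the bound conformations minus the energy of the free entity.

    All energies are conformational energies in kcal/mole.
    """
    if not bound_energies:
        raise ValueError("at least one bound conformation is required")
    return sum(bound_energies) / len(bound_energies) - free_energy


def is_desirable(free_energy, bound_energies, limit=10.0):
    """True when the deformation energy does not exceed the limit (kcal/mole)."""
    return deformation_energy(free_energy, bound_energies) <= limit


# Hypothetical entity: free state at -50.0, two bound conformations observed.
free = -50.0
bound = [-44.0, -42.0]
print(deformation_energy(free, bound))        # 7.0
print(is_desirable(free, bound, limit=10.0))  # True
```

The more preferred limit of 7 kcal/mole can be applied by passing limit=7.0.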

In this way, the present invention provides computer-assisted methods for identifying or designing a potential modulator of BHC80 protein activity including: supplying a computer modeling application with a set of structure coordinates of a BHC80 protein or BHC80 protein - histone complex, the BHC80 protein or BHC80 protein - histone complex including at least a portion of a druggable region from a BHC80 protein; supplying the computer modeling application with a set of structure coordinates of a chemical entity; and determining whether the chemical entity is expected to bind to the BHC80 protein or BHC80 protein - histone complex, wherein binding to the BHC80 protein or BHC80 protein - histone complex is indicative of potential modulation of the BHC80 protein.

Also provided is a computer-assisted method for identifying or designing a potential BHC80 protein modulator, including: supplying a computer modeling application with a set of structure coordinates of a BHC80 protein or BHC80 protein - histone complex, the BHC80 protein or BHC80 protein - histone complex including at least a portion of a druggable region of a BHC80 protein; supplying the computer modeling application with a set of structure coordinates for a chemical entity; evaluating the potential binding interactions between the chemical entity and active site of the molecule or molecular complex; structurally modifying the chemical entity to yield a set of structure coordinates for a modified chemical entity; and determining whether the modified chemical entity is expected to bind to the BHC80 protein or BHC80 protein - histone complex, wherein binding to the BHC80 protein or BHC80 protein - histone complex is indicative of a potential BHC80 protein modulator.

According to the present invention, suitable candidate compounds to test using the method of the present invention include proteins, peptides or other organic molecules, and inorganic molecules. Suitable organic molecules include small organic molecules.

Peptides refer to small molecular weight compounds yielding two or more amino acids upon hydrolysis. A polypeptide is comprised of two or more peptides. As used herein, a protein is comprised of one or more polypeptides. Preferred therapeutic compounds to design include peptides composed of "L" and/or "D" amino acids that are configured as normal or retroinverso peptides, peptidomimetic compounds, small organic molecules, or homo- or hetero-polymers thereof, in linear or branched configurations.

Preferably, a compound that is identified by the method of the present invention originates from a compound having chemical and/or stereochemical complementarity with a site on a BHC80 protein described herein. Such complementarity is characteristic of a compound that matches the surface of the protein(s) either in shape or in distribution of chemical groups and binds to protein(s) to regulate (e.g., by inhibition or stimulation/enhancement) binding of the BHC80 protein to one or more of its cognate ligands, for example, or to otherwise inhibit the biological activity of the BHC80 protein or one or more of its cognate ligands. More preferably, a compound that binds to a binding site on either the BHC80 protein or its cognate ligand associates with an affinity of at least about 10⁻⁶ M, and more preferably with an affinity of at least about 10⁻⁷ M, and more preferably with an affinity of at least about 10⁻⁸ M.
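To relate the affinity values recited above to binding free energies, the standard thermodynamic relation ΔG° = RT ln(Kd) may be used. The sketch below assumes that "an affinity of at least about 10⁻⁶ M" means a dissociation constant of 10⁻⁶ M or lower, a temperature of 298.15 K and a 1 M standard state; these are assumptions made for illustration, as are the function names.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def meets_affinity(kd_molar, threshold_molar=1e-6):
    """True when the compound binds at least as tightly as the threshold Kd."""
    return kd_molar <= threshold_molar


for kd in (1e-6, 1e-7, 1e-8):
    print(kd, round(binding_free_energy(kd), 2), meets_affinity(kd))
```

Each tenfold decrease in Kd corresponds to roughly 1.4 kcal/mol of additional binding free energy at this temperature.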

A potential modulator can be obtained by screening a chemical or peptide library (Scott and Smith, Science, 249:386-390 (1990); Cwirla et al., Proc. Natl. Acad. Sci., 87:6378-6382 (1990); Devlin et al., Science, 249:404-406 (1990)). A potential modulator selected in this manner could then be systematically modified by computer modeling programs until one or more promising potential drugs are identified. Such analysis has been shown to be effective in the development of HIV protease inhibitors (Lam et al., Science 263:380-384 (1994); Wlodawer et al., Ann. Rev. Biochem. 62:543-585 (1993); Appelt, Perspectives in Drug Discovery and Design 1:23-48 (1993); Erickson, Perspectives in Drug Discovery and Design 1:109-128 (1993)). Alternatively, the potential modulator may be synthesized de novo, for example, using modifications of the methods described herein.

Preferred druggable regions for targeting for structure based drug design or identification of candidate compounds and lead compounds via biological assays include any of the regions described in the Exemplification (e.g., PHD finger domain, and particularly the histone-binding sites), although other regions may become apparent to those of skill in the art based on the three-dimensional structures provided herein. Although many of the druggable regions described below are illustrated with respect to the specific amino acid sequence of BHC80, because the tertiary structures are predicted to be highly similar in homologous druggable regions on other highly related proteins and complexes (e.g., the homologous protein in different mammalian species, other BHC80 family members, or other enzymes with histone lysine demethylase activity), it is to be understood that the description of the target sites is intended to encompass all other such homologues of the exemplified sequence and structures. One of skill in the art can readily extrapolate the amino acid residues within a sequence described herein to the corresponding amino acid residues in a highly related sequence simply by aligning the related sequences. More specifically, one of skill in the art can readily determine whether a given sequence aligns with another sequence, as well as identify conserved regions of sequence identity or homology within sequences, by using any of a number of software programs that are publicly available. For example, one can use BLOCKS (GIBBS) and MAST (Henikoff et al., 1995, Gene, 163, 17-26; Henikoff et al., 1994, Genomics, 19, 97-107), typically using standard manufacturer defaults.

Combinations of any of the druggable regions identified herein are also suitable druggable regions. These druggable regions are generally referenced with regard to their tertiary structure.

A candidate compound for binding to, agonizing, antagonizing, or otherwise modulating (regulating, modifying, upregulating, downregulating) the activity of a protein or complex of the invention is identified by one or more methods of structure-based or function-based identification using the proteins, structures, and functional information provided herein.

The present invention also includes methods which confirm whether or not a candidate compound has the predicted properties with respect to its effect on the actual protein(s) or complex, preferably by synthesizing the candidate compound and conducting biological, molecular or chemical assays to select those compounds that actually have the desired activity in vitro, ex vivo or in vivo. Alternatively, such methods can be performed without designing a compound based on the tertiary structure of the BHC80 proteins of the invention, but instead based on the description of the novel biological function (e.g., the demethylase enzymatic activity or related activities) of the proteins described herein. Therefore, the invention further includes methods of identifying homologues (agonists or antagonists) or regulators (binding partners, inhibitors, stimulators) of BHC80 biological activities as described herein. Such assays can include, but are not limited to, cell-based or non-cell-based enzyme assays, competition assays, and/or binding assays.

Further refinements to the structure of the modulator will generally be necessary and can be made by the successive iterations of any and/or all of the steps provided by the particular screening assay, in particular further structural analysis by, e.g., ¹⁵N NMR relaxation rate determinations or X-ray crystallography with the modulator bound to the subject polypeptide. These studies may be performed in conjunction with biochemical assays.

Once identified, a potential modulator may be used as a model structure, and analogs to the compound can be obtained. The analogs are then screened for their ability to bind a BHC80 protein. An analog of the potential modulator might be chosen as a modulator when it binds to a BHC80 protein with a higher binding affinity than the predecessor modulator.

In a related approach, iterative drug design is used to identify BHC80 protein modulators, e.g., modulators of its interaction with a histone. Iterative drug design is a method for optimizing associations between a protein and a modulator by determining and evaluating the three dimensional structures of successive sets of protein/modulator complexes. In iterative drug design, crystals of a series of protein/modulator complexes are obtained and then the three-dimensional structure of each complex is solved. Such an approach provides insight into the association between the proteins and modulators of each complex. For example, this approach may be accomplished by selecting compounds with inhibitory activity, obtaining crystals of this new BHC80 protein/compound complex, solving the three dimensional structure of the complex, and comparing the associations between the new complex and previously solved BHC80 protein/inhibitor complexes. By observing how changes in the modulator affected the BHC80 protein/modulator associations, these associations may be optimized.

In addition to designing and/or identifying a chemical entity to associate with a druggable region, as described above, the same techniques and methods may be used to design and/or identify chemical entities that either associate, or do not associate, with affinity regions, selectivity regions or undesired regions of BHC80 proteins. By such methods, selectivity for one or a few targets, or alternatively for multiple targets, from the same species or from multiple species, can be achieved.

For example, a chemical entity may be designed and/or identified for which the binding energy for one druggable region, e.g., an affinity region or selectivity region, is more favorable than that for another region, e.g., an undesired region, by about 20%, about 30%, about 50%, about 60% or more. It may be the case that the difference is observed (a) between more than two regions, (b) between different regions (selectivity, affinity or undesired) of the same target, (c) between regions of different targets, (d) between regions of homologs from different species, or (e) between other combinations. Alternatively, the comparison may be made by reference to the Kd, usually the apparent Kd, of said chemical entity with the two or more regions in question.
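The comparison of binding energies described above can be illustrated with a short calculation. The following is a minimal sketch only. It assumes the standard relation ΔG = RT ln(Kd) (with Kd expressed relative to a 1 M standard state) for converting a dissociation constant into a binding free energy, and it assumes that "more favorable by X%" is measured relative to the magnitude of the binding energy for the less favored region. Neither convention is specified in the text, and the function names and Kd values are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Return the binding free energy (kJ/mol) for a dissociation constant in mol/L.

    Uses dG = RT ln(Kd); tighter binding (smaller Kd) gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temp_k * math.log(kd_molar)


def percent_more_favorable(dg_preferred, dg_other):
    """Percent by which dg_preferred is more negative than dg_other.

    The reference magnitude |dg_other| is an assumed convention.
    """
    return (dg_other - dg_preferred) / abs(dg_other) * 100.0


# Hypothetical apparent Kd values for two regions of the same target.
dg_selectivity_region = binding_free_energy(1e-9)  # 1 nM
dg_undesired_region = binding_free_energy(1e-6)    # 1 uM
advantage = percent_more_favorable(dg_selectivity_region, dg_undesired_region)
print(f"selectivity region favored by about {advantage:.0f}%")
```

With these illustrative values, a thousand-fold difference in Kd corresponds to a binding energy that is more favorable by about 50% under the assumed convention.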

In another aspect, prospective modulators are screened for binding to two nearby druggable regions on a BHC80 protein. For example, an inhibitor that binds a first region of a BHC80 protein does not bind a second nearby region. Binding to the second region can be determined by monitoring changes in a different set of amide chemical shifts in either the original screen or a second screen conducted in the presence of a modulator (or potential inhibitor) for the first region. From an analysis of the chemical shift changes, the approximate location of a potential modulator for the second region is identified.

Optimization of the second modulator for binding to the region is then carried out by screening structurally related compounds (e.g., analogs as described above). When inhibitors for the first region and the second region are identified, their location and orientation in the ternary complex can be determined experimentally. On the basis of this structural information, a linked compound, e.g., a consolidated modulator, is synthesized in which the modulator for the first region and the modulator for the second region are linked. In certain embodiments, the two modulators are covalently linked to form a consolidated modulator. This consolidated modulator may be tested to determine if it has a higher binding affinity for the target BHC80 protein than either of the two individual modulators. A consolidated modulator is selected as a modulator when it has a higher binding affinity for the target than either of the two modulators. Larger consolidated modulators can be constructed in an analogous manner, e.g., linking three modulators which bind to three nearby regions on the target to form a multilinked consolidated modulator that has an even higher affinity for the target than the linked modulator. In this example, it is assumed that it is desirable to have the modulator bind to all the druggable regions. However, it may be the case that binding to certain of the druggable regions is not desirable, so that the same techniques may be used to identify modulators and consolidated modulators that show increased specificity based on binding to at least one but not all druggable regions of a target.

In one embodiment, a method for identifying a potential modulator of a BHC80 protein comprises: (a) providing the three-dimensional coordinates of a BHC80 protein or a portion thereof comprising a druggable region and (b) selecting from a database at least one compound that comprises three-dimensional coordinates which indicate that the compound may bind the druggable region, wherein the selected compound is a potential modulator of a BHC80 protein. The selecting step may be done manually or via an algorithm. For example, such a method may comprise: (a) supplying a computer modeling application with a set of structure coordinates of a BHC80 protein or BHC80 protein complex, the BHC80 protein or BHC80 protein complex including at least a portion of a BHC80 protein; (b) supplying the computer modeling application with a set of structure coordinates of a candidate chemical entity; and (c) determining whether the chemical entity is a modulator expected to bind to or interfere with the BHC80 protein or BHC80 protein complex, wherein binding to or interfering with the BHC80 protein or BHC80 protein complex is indicative of potential inhibition of the activity of the BHC80 protein.

The determining step may comprise performing a fitting operation between the chemical entity and a druggable region of the BHC80 protein or BHC80 protein complex, followed by computationally analyzing the results of the fitting operation to quantify the association between the chemical entity and the druggable region. In another embodiment, a method for identifying a potential modulator of a BHC80 protein comprises: (a) providing the three-dimensional coordinates of a BHC80 protein or BHC80 protein-histone complex, the BHC80 protein or BHC80 protein-histone complex including at least a portion of a BHC80 protein; (b) optimizing the binding of a chemical entity to a druggable region of the BHC80 protein or BHC80 protein-histone complex using an algorithm; and (c) determining whether the chemical entity exhibits improved binding or interference with the formation of a BHC80 protein-histone complex relative to known compounds. For example, such a method may comprise: (a) providing the three-dimensional coordinates of a BHC80 protein or BHC80 protein-histone complex, the BHC80 protein or BHC80 protein-histone complex including at least a portion of a BHC80 protein; (b) providing a set of structure coordinates for a chemical entity; (c) evaluating the potential binding interactions between the chemical entity and the BHC80 protein or BHC80 protein-histone complex; (d) structurally modifying the chemical entity to yield a set of structure coordinates for a modified chemical entity; and (e) determining whether the modified chemical entity exhibits improved binding or interference with the formation of a BHC80 protein-histone complex relative to known compounds.
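The general shape of steps (a) through (e) can be sketched in a few lines of code. The following is a toy illustration only, not a docking method and not the procedure used in the Examples. It assumes a crude, hypothetical scoring scheme (one point per atom pair within a contact distance, a penalty for pairs that clash), it "modifies" the chemical entity merely by translating it, and all coordinates, cutoffs and the reference score for a known compound are invented for the example.

```python
import math
from itertools import product


def contact_score(ligand, region, contact=4.0, clash=2.0):
    """Toy fitting score: +1 per ligand/region atom pair within `contact`
    angstroms, -10 per pair closer than `clash` angstroms (hypothetical scheme)."""
    score = 0.0
    for atom in ligand:
        for site in region:
            d = math.dist(atom, site)
            if d < clash:
                score -= 10.0
            elif d <= contact:
                score += 1.0
    return score


def translate(coords, shift):
    """Return a copy of the coordinates moved by the (dx, dy, dz) shift."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def refine_placement(ligand, region, step=1.0, max_rounds=20):
    """Greedy search: accept any rigid translation that raises the score,
    and stop when a full round of moves brings no improvement."""
    best, best_score = list(ligand), contact_score(ligand, region)
    moves = [m for m in product((-step, 0.0, step), repeat=3) if any(m)]
    for _ in range(max_rounds):
        improved = False
        for move in moves:
            candidate = translate(best, move)
            score = contact_score(candidate, region)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best, best_score


# (a) coordinates of a druggable region and (b) of a chemical entity (made up).
region = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand = [(8.0, 0.0, 0.0)]

start_score = contact_score(ligand, region)               # (c) evaluate
placed, final_score = refine_placement(ligand, region)    # (d) modify
known_compound_score = 0.0                                # hypothetical reference
is_improved = final_score > known_compound_score          # (e) compare
print(start_score, final_score, is_improved)
```

In practice the scoring and modification steps would be carried out by a molecular modeling application operating on the actual structure coordinates; the sketch only shows how the evaluate, modify and compare steps fit together.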

In certain embodiments of the above methods, the chemical entities may be selected from a library or database of chemical entities. In other embodiments, the chemical entity may be computationally built de novo or by using a scaffold structure, e.g., the scaffold of the BHC80 protein modulators described herein. For example, a chemical entity may be designed to bind to a druggable region based on the three-dimensional structure of the druggable region of a BHC80 protein.

In another aspect, a method for designing a BHC80 protein modulator comprises: (a) providing the three-dimensional structure of a BHC80 protein or a fragment thereof; (b) synthesizing a potential modulator based on the three-dimensional structure of the BHC80 protein or fragment; (c) contacting a BHC80 protein or fragment or domain thereof with the potential modulator; and (d) assaying BHC80 protein activity, wherein a change in BHC80 protein activity indicates that the compound may be a BHC80 protein modulator.

Methods of Using BHC80 Polypeptides and Nucleic Acids

Also provided herein are methods for modulating the expression of genes that are regulated by methylation/demethylation, e.g., of a transcriptional regulator protein, such as a histone ("demethylase target gene"). As described in the Examples, it has been demonstrated herein that BHC80 interacts with unmethylated H3K4 and that RNAi knockdown of BHC80 results in derepression of LSD1 target genes. It is further demonstrated that expression of LSD1 target genes is restored by re-introduction of a BHC80 protein that is capable of binding to H3K4. Accordingly, modulating the interaction between BHC80 and histones, e.g., H3K4, can be used to modulate expression of genes that are regulated by methylation/demethylation.

Some genes are upregulated by methylation of a histone ("methylated histone-activated genes"), whereas other genes are downregulated by methylation of a histone ("methylated histone-repressed genes"). Exemplary genes that are among those upregulated by the methylation of histone H3 at lysine K4 include: M4 AchR, SCN1A, SCN2A, SCN3A, and p57. Other target genes include those containing a REST-responsive repressor element 1 (RE1). These genes are repressed by a demethylase, such as LSD1, which has been shown herein to require BHC80 for stable association with its target promoters.

Accordingly, the expression of these methylated histone-activated genes can be repressed by binding of BHC80 to a histone, e.g., H3K4, and activated (or derepressed) by inhibiting the binding of BHC80 to the histone, e.g., H3K4, and optionally removing, deleting or reducing LSD1. Binding of BHC80 to a histone may be increased or stimulated, e.g., by providing a molecule (e.g., contacting a cell with a molecule) that stimulates the association between BHC80 and the histone, such as a molecule identified as further described herein. Binding of BHC80 to a histone may also be increased or stimulated, e.g., in a cell, by contacting the cell with an agent that increases BHC80 protein level or activity, e.g., by administering a functional BHC80 or homolog thereof.

In addition, methylated histone-activated genes can be activated by removing BHC80, such as by using a BHC80 siRNA or antisense or dominant negative mutant, and repressed by the presence of BHC80. It is important to note that LSD1 and BHC80 appear to act in concert, with BHC80 binding to H3K4me0, which is a demethylation product of LSD1. The methylated histone-activated genes may also be modulated by modulating the expression of one or more components of the LSD1 complex, including LSD1, CoREST and BHC80. For example, binding of BHC80 to a histone may also be increased by increasing the demethylation of the histone, e.g., by increasing the activity or protein level of a demethylase, such as LSD1. Also, methylated histone-activated genes can be repressed by the presence of CoREST and activated (or derepressed) by removing CoREST, such as by using a CoREST siRNA or antisense or dominant negative mutant.

Genes that are downregulated by the methylation of histone H3 include those that are regulated by a nuclear receptor, such as the androgen or estrogen receptor (Metzger et al. (2005) Nature 437:436 and Garcia-Bassets et al. (2007) Cell 128:505), such as those containing an androgen receptor element (ARE) in their promoter. Exemplary genes that are regulated by the androgen receptor include: prostate specific antigen isoform 1 (PSA) (NP_001639); Synaptotagmin-like 4 (SYTL4) (CAI42004); nerve growth factor receptor associated protein 1 (NGFRAP1) (CAI41523); 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 1 (PFKFB1) (NP_002616); fatty acid synthase (FAS) (NP_004095); and Proteinase-activated receptor 1 precursor (PAR-1) (P25116). Genes regulated by the androgen receptor may be activated by a demethylase, such as LSD1. Accordingly, the expression of these methylated histone-repressed genes can be activated (or derepressed) by the presence of BHC80 and repressed by removing BHC80, such as by using a BHC80 siRNA or antisense or dominant negative mutant. Expression of methylated histone-repressed genes can also be activated (or derepressed) by the presence of CoREST and repressed by removing CoREST, such as by using a CoREST siRNA or antisense or dominant negative mutant.

In addition, methylated histone-repressed genes can be repressed by removing BHC80, such as by using a BHC80 siRNA or antisense or dominant negative mutant, and activated by the presence of BHC80. The methylated histone-repressed genes may also be modulated by modulating the expression of one or more of LSD1, CoREST and BHC80.

BHC80 binding to histones may also be decreased in a cell by administering into the cell an unmethylated histone or BHC80 binding homolog thereof, e.g., comprising amino acids 1-10 of H3, so as to compete BHC80 away. Similarly, BHC80 or histone binding homologs thereof may be administered to a cell to bind to unmethylated histones to regulate gene expression.

The following Table 3 summarizes how gene expression of methylated histone-repressed and histone-activated genes can be modulated:

Gene                           Modulation   BHC80      BHC80/hist.   CoREST     LSD1
methylated histone-activated   activation   decrease   decrease      decrease   decrease
                               repression   increase   increase      increase   decrease
methylated histone-repressed   activation   increase   increase      increase   increase
                               repression   decrease   decrease      decrease   decrease

In Table 3, "increase" of a protein refers to increasing the level of protein or its activity. Increasing the level of protein or activity of a particular protein in a cell may be achieved by contacting the cell with, or administering into the cell: the protein or a functional homolog thereof; a nucleic acid (e.g., an expression vector) encoding the protein or a functional homolog thereof; an agent that upregulates the level of expression of the gene encoding the protein; or an agent that upregulates the activity of the protein, such as a cofactor. Increasing the level of protein or activity of a protein may be by a factor of at least about 50%, 2 fold, 5 fold, 10 fold, 30 fold, 50 fold or 100 fold.

In Table 3, "decrease" of a protein refers to decreasing its level of protein or activity. Decreasing the level of protein or activity of a particular protein in a cell may be achieved by contacting the cell with, or administering into the cell: an siRNA; an antisense; a ribozyme; a triplex nucleic acid; a dominant negative mutant of the protein; a substrate mimetic; an agent that down-regulates the expression of the gene encoding the protein; or an agent that decreases the activity of the protein. Decreasing the level of protein or activity of a protein may be by a factor of at least about 50%, 2 fold, 5 fold, 10 fold, 30 fold, 50 fold or 100 fold.
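Table 3 can be read as a lookup from a gene class and a desired outcome to the direction in which each factor is changed. The following is a minimal sketch of that lookup, with the entries transcribed as printed in Table 3; the dictionary layout, key names and function name are hypothetical conveniences, not part of the disclosure.

```python
# Entries transcribed from Table 3 as printed; "BHC80/histone" denotes the
# BHC80-histone interaction (column "BHC80/hist." in the table).
TABLE_3 = {
    ("methylated histone-activated", "activation"): {
        "BHC80": "decrease", "BHC80/histone": "decrease",
        "CoREST": "decrease", "LSD1": "decrease",
    },
    ("methylated histone-activated", "repression"): {
        "BHC80": "increase", "BHC80/histone": "increase",
        "CoREST": "increase", "LSD1": "decrease",
    },
    ("methylated histone-repressed", "activation"): {
        "BHC80": "increase", "BHC80/histone": "increase",
        "CoREST": "increase", "LSD1": "increase",
    },
    ("methylated histone-repressed", "repression"): {
        "BHC80": "decrease", "BHC80/histone": "decrease",
        "CoREST": "decrease", "LSD1": "decrease",
    },
}


def interventions(gene_class, desired_change):
    """Return the per-factor direction listed in Table 3 for the given row."""
    try:
        return TABLE_3[(gene_class, desired_change)]
    except KeyError:
        raise ValueError(f"no Table 3 row for {gene_class!r}, {desired_change!r}")


row = interventions("methylated histone-activated", "activation")
print(row["BHC80"])
```

For example, to activate a methylated histone-activated gene, the table lists a decrease in BHC80 level or activity, consistent with the derepression observed on BHC80 knockdown.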

Methods for modulating the expression of a gene whose expression is modulated by the methylation status of one or more histones may comprise modulating the acetylation/deacetylation status of one or more histones. In one embodiment, demethylation is facilitated or improved by deacetylation. Accordingly, in certain embodiments, a method comprising increasing BHC80 protein level or activity in a cell comprises contacting the cell with an agent that increases histone deacetylase (HDAC) protein or activity levels and/or an agent that decreases histone acetylase protein or activity levels. On the other hand, a method comprising decreasing BHC80 protein level or activity in a cell may comprise contacting the cell with an agent that decreases HDAC protein or activity levels and/or an agent that increases histone acetylase protein or activity levels.

Methods for modulating the expression of a gene whose expression is modulated by the methylation status of one or more histones may also comprise (i) modulating the methylation status and (ii) modulating the acetylation status of one or more histones involved in regulating the expression of the gene.

The following Table (Table 4) summarizes how gene expression of methylated histone-repressed and histone-activated genes can be modulated by modulating the level of protein or activity of deacetylases or acetylases:

Gene                           Modulation   Deacetylase   Acetylase
methylated histone-repressed   activation   increase      decrease
                               repression   decrease      increase
methylated histone-activated   activation   decrease      increase
                               repression   increase      decrease

"Increase" and "decrease" are as described above for Table 3.

The term "acetylase" is used interchangeably herein with "acetyl transferase" and refers to an enzyme that catalyzes the addition of an acetyl group (CH3CO-) to an amino acid. Exemplary acetyl transferases are histone acetyl transferases (HAT).

The term "deacetylase" refers to an enzyme that catalyzes the removal of an acetyl group (CH3CO-) from an amino acid. Class I histone deacetylases (HDACs) include the yeast Rpd3-like proteins (HDAC1, HDAC2, HDAC3, HDAC8, and HDAC11). Class II HDACs include the yeast Hda1-like proteins HDAC4, HDAC5, HDAC6, HDAC7, HDAC9, and HDAC10 (Fischle, W., et al., J. Biol. Chem. 274, 11713-11720 (1999)). Class III HDACs include the silent mating type information regulation 2 (Sir2) and homologs thereof, such as SIRT1 in humans.

The nucleotide and amino acid sequences of each of these human HDACs and the location of conserved domains in their amino acid sequences are set forth in the following table (Table 5) ("i" refers to "isoform"):

HDAC         Nucleotide sequence   Amino acid sequence   Conserved domains (in amino acids)
HDAC1        NM_004964             NP_004955             28-321
HDAC2        NM_001527             NP_001518             29-322
HDAC3        NM_003883             NP_003874             3-315
HDAC4        NM_006037             NP_006028             91-142; 653-994
HDAC5 i1     NM_001015053          NP_001015053          683-1026
      i2     NM_005474             NP_005465             682-1025
HDAC6        NM_006044             NP_006035             1132-1180; 883-1068; 480-796; 84-404
HDAC7A i1    NM_015401             NP_056216             519-829
       i2    NM_016596             NP_057680             479-789
HDAC8        NM_018486             NP_060956             16-324
HDAC9 i1     NM_014707             NP_055522
      i2     NM_058176             NP_478056             633-974
      i3     NM_058177             NP_478057             633-860
      i4     NM_178423             NP_848510             633-974
      i5     NM_178425             NP_848512             636-977
HDAC10       NM_032019             NP_114408             1-315
HDAC11       NM_024827             NP_079103             17-321
SIRT1        NM_012238             NP_036370             431-536; 254-489
SIRT2 i1     NM_012237             NP_036369             77-331
      i2     NM_030593             NP_085096             40-294
SIRT3 ia     NM_012239             NP_036371             138-373
      ib     NM_001017524          NP_001017524          1-231
SIRT4        NM_012240             NP_036372             47-308
SIRT5 i1     NM_012241             NP_036373             51-301
      i2     NM_031244             NP_112534             51-287
SIRT6        NM_016539             NP_057623             45-257
SIRT7        NM_016538             NP_057622             100-314

Other sirtuin family members include the yeast Sir2-like genes termed "HST genes" (homologues of Sir two) HST1, HST2, HST3 and HST4 and their human homologues.

Methods for modulating gene expression of methylated histone repressed or activated genes may also include modulating the level of protein or activity of methylases. Thus, in a situation in which one desires to reduce methylation, a method may comprise decreasing the level of protein or activity of one or more methylases, whereas in a situation in which one desires to increase methylation, a method may comprise increasing the level of protein or activity of one or more methylases.

Nucleic acids, e.g., those encoding a protein of interest or functional homolog thereof, or a nucleic acid intended to inhibit the production of a protein of interest (e.g., siRNA or antisense RNA), can be delivered to cells, e.g., eukaryotic cells, in culture, to cells ex vivo, and to cells in vivo. The cells can be of any type including without limitation cancer cells, stem cells, neuronal cells, and non-neuronal cells. The delivery of nucleic acids can be by any technique known in the art, including viral mediated gene transfer, liposome mediated gene transfer, direct injection into a target tissue, organ, or tumor, and injection into vasculature which supplies a target tissue or organ.

Polynucleotides can be administered in any suitable formulations known in the art. These can be as virus particles, as naked DNA, in liposomes, in complexes with polymeric carriers, etc. Polynucleotides can be administered to the arteries which feed a tissue or tumor. They can also be administered to adjacent tissue, whether tumor or normal, which could express the demethylase protein.

Nucleic acids can be delivered in any desired vector. These include viral or non-viral vectors, including adenovirus vectors, adeno-associated virus vectors, retrovirus vectors, lentivirus vectors, and plasmid vectors. Exemplary types of viruses include HSV (herpes simplex virus), AAV (adeno-associated virus), HIV (human immunodeficiency virus), BIV (bovine immunodeficiency virus), and MLV (murine leukemia virus). Nucleic acids can be administered in any desired format that provides sufficiently efficient delivery levels, including in virus particles, in liposomes, in nanoparticles, and complexed to polymers.

The nucleic acids encoding a protein or nucleic acid of interest may be in a plasmid or viral vector, or other vector as is known in the art. Such vectors are well known and any can be selected for a particular application. In one embodiment of the invention, the gene delivery vehicle comprises a promoter and a demethylase coding sequence. Preferred promoters are tissue-specific promoters and promoters which are activated by cellular proliferation, such as the thymidine kinase and thymidylate synthase promoters. Other preferred promoters include promoters which are activatable by infection with a virus, such as the α- and β-interferon promoters, and promoters which are activatable by a hormone, such as estrogen. Other promoters which can be used include the Moloney virus LTR, the CMV promoter, and the mouse albumin promoter. A promoter may be constitutive or inducible.

In another embodiment, naked polynucleotide molecules are used as gene delivery vehicles, as described in WO 90/11092 and U.S. Patent 5,580,859. Such gene delivery vehicles can be either growth factor DNA or RNA and, in certain embodiments, are linked to killed adenovirus (Curiel et al., Hum. Gene Ther. 3:147-154, 1992). Other vehicles which can optionally be used include DNA-ligand (Wu et al., J. Biol. Chem. 264:16985-16987, 1989), lipid-DNA combinations (Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413-7417, 1987), liposomes (Wang et al., Proc. Natl. Acad. Sci. 84:7851-7855, 1987) and microprojectiles (Williams et al., Proc. Natl. Acad. Sci. 88:2726-2730, 1991). A gene delivery vehicle can optionally comprise viral sequences such as a viral origin of replication or packaging signal. These viral sequences can be selected from viruses such as astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, picornavirus, poxvirus, retrovirus, togavirus or adenovirus. In a preferred embodiment, the growth factor gene delivery vehicle is a recombinant retroviral vector. Recombinant retroviruses and various uses thereof have been described in numerous references including, for example, Mann et al., Cell 33:153, 1983; Cone and Mulligan, Proc. Natl. Acad. Sci. USA 81:6349, 1984; Miller et al., Human Gene Therapy 1:5-14, 1990; U.S. Patent Nos. 4,405,712, 4,861,719, and 4,980,289; and PCT Application Nos. WO 89/02,468, WO 89/05,349, and WO 90/02,806. Numerous retroviral gene delivery vehicles can be utilized in the present invention, including for example those described in EP 0,415,731; WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; U.S. Patent No. 5,219,740; WO 93/11230; WO 93/10218; Vile and Hart, Cancer Res. 53:3860-3864, 1993; Vile and Hart, Cancer Res. 53:962-967, 1993; Ram et al., Cancer Res. 53:83-88, 1993; Takamiya et al., J. Neurosci. Res. 33:493-503, 1992; Baba et al., J. Neurosurg. 79:729-735, 1993 (U.S. Patent No. 4,777,127, GB 2,200,651, EP 0,345,242 and WO 91/02805).

A polynucleotide of interest can also be combined with a condensing agent to form a gene delivery vehicle. The condensing agent may be a polycation, such as polylysine, polyarginine, polyornithine, protamine, spermine, spermidine, and putrescine. Many suitable methods for making such linkages are known in the art. In an alternative embodiment, a polynucleotide of interest is associated with a liposome to form a gene delivery vehicle. Liposomes are small lipid vesicles comprised of an aqueous compartment enclosed by a lipid bilayer, typically spherical or slightly elongated structures several hundred Angstroms in diameter. Under appropriate conditions, a liposome can fuse with the plasma membrane of a cell or with the membrane of an endocytic vesicle within a cell which has internalized the liposome, thereby releasing its contents into the cytoplasm. Prior to interaction with the surface of a cell, however, the liposome membrane acts as a relatively impermeable barrier which sequesters and protects its contents, for example, from degradative enzymes. Additionally, because a liposome is a synthetic structure, specially designed liposomes can be produced which incorporate desirable features. See Stryer, Biochemistry, pp. 236-240, 1975 (W.H. Freeman, San Francisco, CA); Szoka et al., Biochim. Biophys. Acta 600:1, 1980; Bayer et al., Biochim. Biophys. Acta 550:464, 1979; Rivnay et al., Meth. Enzymol. 149:119, 1987; Wang et al., Proc. Natl. Acad. Sci. U.S.A. 84:7851, 1987; Plant et al., Anal. Biochem. 176:420, 1989; and U.S. Patent 4,762,915. Liposomes can encapsulate a variety of nucleic acid molecules including DNA, RNA, plasmids, and expression constructs comprising growth factor polynucleotides such as those disclosed in the present invention.

Liposomal preparations for use in the present invention include cationic (positively charged), anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular delivery of plasmid DNA (Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413-7417, 1987), mRNA (Malone et al., Proc. Natl. Acad. Sci. USA 86:6077-6081, 1989), and purified transcription factors (Debs et al., J. Biol. Chem. 265:10189-10192, 1990), in functional form. Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY. See also Felgner et al., Proc. Natl. Acad. Sci. USA 91: 5148-5152.87, 1994. Other commercially available liposomes include Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from readily available materials using techniques well known in the art. See, e.g., Szoka et al., Proc. Natl. Acad. Sci. USA 75:4194-4198, 1978; and WO 90/11092 for descriptions of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids (Birmingham, AL), or can be easily prepared using readily available materials. Such materials include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC), dioleoylphosphatidyl glycerol (DOPG), and dioleoylphosphatidyl ethanolamine (DOPE), among others. These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods for making liposomes using these materials are well known in the art.

One or more proteins (e.g., a BHC80) or nucleic acids (e.g., siRNA) of interest may be encoded by a single nucleic acid delivered. Alternatively, separate nucleic acids may encode different proteins or nucleic acids of interest. Different species of nucleic acids may be in different forms; they may use different promoters or different vectors or different delivery vehicles. Similarly, the same protein or nucleic acid of interest may be used in a combination of different forms.

Antisense molecules, siRNA or shRNA molecules, ribozymes or triplex molecules may be contacted with a cell or administered to an organism. Alternatively, constructs encoding these may be contacted with or introduced into a cell or organism. Antisense constructs, antisense oligonucleotides, RNA interference constructs or siRNA duplex RNA molecules can be used to interfere with expression of a protein of interest, e.g., a histone demethylase. Typically at least 15, 17, 19, or 21 nucleotides of the complement of the mRNA sequence are sufficient for an antisense molecule. Typically at least 19, 21, 22, or 23 nucleotides of a target sequence are sufficient for an RNA interference molecule. Preferably an RNA interference molecule will have a 2 nucleotide 3' overhang. If the RNA interference molecule is expressed in a cell from a construct, for example from a hairpin molecule or from an inverted repeat of the desired histone demethylase sequence, then the endogenous cellular machinery will create the overhangs. siRNA molecules can be prepared by chemical synthesis, in vitro transcription, or digestion of long dsRNA by RNase III or Dicer. These can be introduced into cells by transfection, electroporation, or other methods known in the art. See Hannon GJ, 2002, RNA interference, Nature 418:244-251; Bernstein E et al., 2002, The rest is silence. RNA 7:1509-1521; Hutvagner G et al., RNAi: Nature abhors a double-strand. Curr. Opin. Genetics & Development 12:225-232; Brummelkamp, 2002, A system for stable expression of short interfering RNAs in mammalian cells. Science 296:550-553; Lee NS, Dohjima T, Bauer G, Li H, Li M-J, Ehsani A, Salvaterra P, and Rossi J. (2002). Expression of small interfering RNAs targeted against HIV-1 rev transcripts in human cells. Nature Biotechnol. 20:500-505; Miyagishi M, and Taira K. (2002). U6-promoter-driven siRNAs with four uridine 3' overhangs efficiently suppress targeted gene expression in mammalian cells. Nature Biotechnol. 20:497-500; Paddison PJ, Caudy AA, Bernstein E, Hannon GJ, and Conklin DS. (2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes & Dev. 16:948-958; Paul CP, Good PD, Winer I, and Engelke DR. (2002). Effective expression of small interfering RNA in human cells. Nature Biotechnol. 20:505-508; Sui G, Soohoo C, Affar E-B, Gay F, Shi Y, Forrester WC, and Shi Y. (2002). A DNA vector-based RNAi technology to suppress gene expression in mammalian cells. Proc. Natl. Acad. Sci. USA 99(6):5515-5520; Yu J-Y, DeRuiter SL, and Turner DL. (2002). RNA interference by expression of short-interfering RNAs and hairpin RNAs in mammalian cells. Proc. Natl. Acad. Sci. USA 99(9):6047-6052.

Antisense or RNA interference molecules can be delivered in vitro to cells or in vivo, e.g., to tumors of a mammal. Typical delivery means known in the art can be used. For example, delivery to a tumor can be accomplished by intratumoral injections. Other modes of delivery can be used without limitation, including: intravenous, intramuscular, intraperitoneal, intraarterial, local delivery during surgery, endoscopic, subcutaneous, and per os. In a mouse model, the antisense or RNA interference molecule can be administered to a tumor cell in vitro, and the tumor cell can be subsequently administered to a mouse. Vectors can be selected for desirable properties for any particular application. Vectors can be viral or plasmid. Adenoviral vectors are useful in this regard. Tissue-specific, cell-type specific, or otherwise regulatable promoters can be used to control the transcription of the inhibitory polynucleotide molecules. Non-viral carriers such as liposomes or nanospheres can also be used. An exemplary siRNA or antisense molecule targeting human BHC80 genes comprises the following nucleotide sequence or the complement thereof: 5' ggacctcaaactgtacagctt 3', as well as those set forth in the Examples.

Exemplary siRNA or antisense molecules targeting LSD1 genes comprise the following nucleotide sequences or the complement thereof: 5' atgtcaaagatgagcagatt 3' (which targets both mouse and human LSD1); 5' ggcgaaggtagagtacagaga 3' (which targets human LSD1); and 5' ccatggttgtaacaggtctt 3' (which targets mouse LSD1).

An exemplary siRNA or antisense molecule targeting human and mouse CoREST genes comprises the following nucleotide sequence or the complement thereof: 5'gacaatcttggcatgttggt 3'.
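Because the exemplary molecules are defined as the listed sequences "or the complement thereof", it can be convenient to derive the complementary strands and check the lengths programmatically. The following is a minimal sketch: the sequences are those listed above, while the helper names, the dictionary labels, and the use of 19 nucleotides as the threshold (the smallest RNA interference length mentioned in the description) are choices made for this illustration.

```python
# Map each DNA base to its Watson-Crick partner (both cases handled).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def long_enough_for_rnai(seq, minimum=19):
    """True if the target sequence has at least `minimum` nucleotides."""
    return len(seq) >= minimum


# Exemplary target sequences as listed in the description (5' to 3').
EXEMPLARY_TARGETS = {
    "BHC80 (human)": "ggacctcaaactgtacagctt",
    "LSD1 (mouse and human)": "atgtcaaagatgagcagatt",
    "LSD1 (human)": "ggcgaaggtagagtacagaga",
    "LSD1 (mouse)": "ccatggttgtaacaggtctt",
    "CoREST (human and mouse)": "gacaatcttggcatgttggt",
}

for name, seq in EXEMPLARY_TARGETS.items():
    print(name, len(seq), reverse_complement(seq), long_enough_for_rnai(seq))
```

The sketch works on DNA-alphabet sequences as printed; an RNA strand would carry uracil in place of thymine, and any 3' overhang would be added separately.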

Exemplary methods of treatment and diseases

Provided herein are methods of treatment or prevention of conditions and diseases that can be improved by modulating the methylation status of histones, and thereby, e.g., modulating the level of expression of methylation activated and methylation repressed target genes, such as an acetylcholine receptor, an SCN gene, p57 and genes regulated by the androgen receptor. A method may comprise administering to a subject, e.g., a subject in need thereof, a therapeutically effective amount of an agent described herein. Diseases such as cancers and neurological disease can be treated by administration of modulators of histone methylation, e.g., modulators of histone demethylase enzyme activity. Histone methylation has been reported to be involved in overexpression of certain genes in cancers and in silencing of neuronal genes in non-neuronal cells. Modulators that are identified by the disclosed methods or modulators that are described herein can be used to treat these diseases, i.e., to restore normal methylation to affected cells.

Based at least on the fact that increased histone methylation has been found to be associated with certain cancers, a method for treating cancer in a subject may comprise administering to the subject a therapeutically effective amount of one or more agents that decrease methylation or restore methylation to its level in corresponding normal cells. It is believed that modulators of methylation can be used for modulating cell proliferation generally. Excessive proliferation may be reduced with agents that decrease methylation, whereas insufficient proliferation may be stimulated with agents that increase methylation. Accordingly, diseases that may be treated include hyperproliferative diseases, such as benign and malignant cell growths. Exemplary cancers that may be treated include leukemias, e.g., acute lymphoid leukemia and myeloid leukemia, and carcinomas, such as colorectal carcinoma and hepatocarcinoma. Other cancers include Acute Lymphoblastic Leukemia; Acute Myeloid Leukemia; Adrenocortical Carcinoma; AIDS-Related Cancers; AIDS-Related Lymphoma; Anal Cancer; Astrocytoma, Childhood Cerebellar; Astrocytoma, Childhood Cerebral; Basal Cell Carcinoma, see Skin Cancer (non-Melanoma); Bile Duct Cancer, Extrahepatic; Bladder Cancer; Bone Cancer, Osteosarcoma/Malignant Fibrous Histiocytoma; Brain Stem Glioma; Brain Tumor; Brain Tumor, Brain Stem Glioma; Brain Tumor, Cerebellar Astrocytoma; Brain Tumor, Cerebral Astrocytoma/Malignant Glioma; Brain Tumor, Ependymoma; Brain Tumor, Medulloblastoma; Brain Tumor, Supratentorial Primitive Neuroectodermal Tumors; Brain Tumor, Visual Pathway and Hypothalamic Glioma; Breast Cancer; Breast Cancer and Pregnancy; Breast Cancer, Male; Bronchial Adenomas/Carcinoids; Burkitt's Lymphoma; Carcinoid Tumor; Carcinoid Tumor, Gastrointestinal; Carcinoma of Unknown Primary; Central Nervous System Lymphoma, Primary; Cerebellar Astrocytoma; Cerebral Astrocytoma/Malignant Glioma; Cervical Cancer; Childhood Cancers; Chronic Lymphocytic Leukemia; Chronic Myelogenous Leukemia; Chronic Myeloproliferative Disorders; Colon Cancer; Colorectal Cancer; Cutaneous T-Cell Lymphoma, see Mycosis Fungoides and Sezary Syndrome; Endometrial Cancer; Ependymoma; Esophageal Cancer; Ewing's Family of Tumors; Extracranial Germ Cell Tumor; Extragonadal Germ Cell Tumor; Extrahepatic Bile Duct Cancer; Eye Cancer, Intraocular Melanoma; Eye Cancer, Retinoblastoma; Gallbladder Cancer; Gastric (Stomach) Cancer; Gastrointestinal Carcinoid Tumor; Germ Cell Tumor, Extracranial; Germ Cell Tumor, Extragonadal; Germ Cell Tumor, Ovarian; Gestational Trophoblastic Tumor; Glioma; Glioma, Childhood Brain Stem; Glioma, Childhood Cerebral Astrocytoma; Glioma, Childhood Visual Pathway and Hypothalamic; Hairy Cell Leukemia; Head and Neck Cancer; Hepatocellular (Liver) Cancer, Adult (Primary); Hepatocellular (Liver) Cancer, Childhood (Primary); Hodgkin's Lymphoma; Hodgkin's Lymphoma During Pregnancy; Hypopharyngeal Cancer; Hypothalamic and Visual Pathway Glioma; Intraocular Melanoma; Islet Cell Carcinoma (Endocrine Pancreas); Kaposi's Sarcoma; Kidney (Renal Cell) Cancer; Kidney Cancer; Laryngeal Cancer; Leukemia, Acute Lymphoblastic; Leukemia, Acute Myeloid; Leukemia, Chronic Lymphocytic; Leukemia, Chronic Myelogenous; Leukemia, Hairy Cell; Lip and Oral Cavity Cancer; Liver Cancer, Adult (Primary); Liver Cancer, Childhood (Primary); Lung Cancer, Non-Small Cell; Lung Cancer, Small Cell; Lymphoma, AIDS-Related; Lymphoma, Burkitt's; Lymphoma, Cutaneous T-Cell, see Mycosis Fungoides and Sezary Syndrome; Lymphoma, Hodgkin's; Lymphoma, Hodgkin's During Pregnancy; Lymphoma, Non-Hodgkin's; Lymphoma, Non-Hodgkin's During Pregnancy; Lymphoma, Primary Central Nervous System; Macroglobulinemia, Waldenstrom's; Malignant Fibrous Histiocytoma of Bone/Osteosarcoma; Medulloblastoma; Melanoma; Melanoma, Intraocular (Eye); Merkel Cell Carcinoma; Mesothelioma, Adult Malignant; Mesothelioma; Metastatic Squamous Neck Cancer with Occult Primary; Multiple Endocrine Neoplasia Syndrome; Multiple Myeloma/Plasma Cell Neoplasm; Mycosis Fungoides; Myelodysplastic Syndromes; Myelodysplastic/Myeloproliferative Diseases; Myelogenous Leukemia, Chronic; Myeloid Leukemia, Adult Acute; Myeloid Leukemia, Childhood Acute; Myeloma, Multiple; Myeloproliferative Disorders, Chronic; Nasal Cavity and Paranasal Sinus Cancer; Nasopharyngeal Cancer; Neuroblastoma; Non-Hodgkin's Lymphoma; Non-Hodgkin's Lymphoma During Pregnancy; Non-Small Cell Lung Cancer; Oral Cancer; Oral Cavity Cancer, Lip and; Oropharyngeal Cancer; Osteosarcoma/Malignant Fibrous Histiocytoma of Bone; Ovarian Cancer; Ovarian Epithelial Cancer; Ovarian Germ Cell Tumor; Ovarian Low Malignant Potential Tumor; Pancreatic Cancer; Pancreatic Cancer, Islet Cell; Paranasal Sinus and Nasal Cavity Cancer; Parathyroid Cancer; Penile Cancer; Pheochromocytoma; Pineoblastoma and Supratentorial Primitive Neuroectodermal Tumors; Pituitary Tumor; Plasma Cell Neoplasm/Multiple Myeloma; Pleuropulmonary Blastoma; Pregnancy and Breast Cancer; Pregnancy and Hodgkin's Lymphoma; Pregnancy and Non-Hodgkin's Lymphoma; Primary Central Nervous System Lymphoma; Prostate Cancer; Rectal Cancer; Renal Cell (Kidney) Cancer; Renal Pelvis and Ureter, Transitional Cell Cancer; Retinoblastoma; Rhabdomyosarcoma; Salivary Gland Cancer; Sarcoma, Ewing's Family of Tumors; Sarcoma, Kaposi's; Sarcoma, Soft Tissue; Sarcoma, Uterine; Sezary Syndrome; Skin Cancer (non-Melanoma); Skin Cancer; Skin Cancer (Melanoma); Skin Carcinoma, Merkel Cell; Small Cell Lung Cancer; Small Intestine Cancer; Soft Tissue Sarcoma; Squamous Cell Carcinoma, see Skin Cancer (non-Melanoma); Squamous Neck Cancer with Occult Primary, Metastatic; Stomach (Gastric) Cancer; Supratentorial Primitive Neuroectodermal Tumors; T-Cell Lymphoma, Cutaneous, see Mycosis Fungoides and Sezary Syndrome; Testicular Cancer; Thymoma; Thymoma and Thymic Carcinoma; Thyroid Cancer; Transitional Cell Cancer of the Renal Pelvis and Ureter; Trophoblastic Tumor, Gestational; Unknown Primary Site, Carcinoma of; Unknown Primary Site, Cancer of; Unusual Cancers of Childhood; Ureter and Renal Pelvis, Transitional Cell Cancer; Urethral Cancer; Uterine Cancer, Endometrial; Uterine Sarcoma; Vaginal Cancer; Visual Pathway and Hypothalamic Glioma; Vulvar Cancer; Waldenstrom's Macroglobulinemia; Wilms' Tumor; and Women's Cancers. Preferred cancers include prostate, breast and colon cancers.

Neurologic diseases that may be treated include epilepsy, schizophrenia, bipolar disorder or other psychological and/or psychiatric disorders, neuropathies, skeletal muscle atrophy, and neurodegenerative diseases. Exemplary neurodegenerative diseases include: Alzheimer's, Amyotrophic Lateral Sclerosis (ALS), and Parkinson's disease. Another class of neurodegenerative diseases includes diseases caused at least in part by aggregation of poly-glutamine. Diseases of this class include: Huntington's Disease, Spinobulbar Muscular Atrophy (SBMA or Kennedy's Disease), Dentatorubropallidoluysian Atrophy (DRPLA), Spinocerebellar Ataxia 1 (SCA1), Spinocerebellar Ataxia 2 (SCA2), Machado-Joseph Disease (MJD; SCA3), Spinocerebellar Ataxia 6 (SCA6), Spinocerebellar Ataxia 7 (SCA7), and Spinocerebellar Ataxia 12 (SCA12).

Another disease that can be treated is mental retardation. Any other disease in which epigenetics, in particular methylation, plays a role is likely to be treatable or preventable by applying methods described herein.

Pharmaceutical Compositions

Pharmaceutical compositions of this invention include any modulator identified according to the present invention, or a pharmaceutically acceptable salt thereof, and a pharmaceutically acceptable carrier, adjuvant, or vehicle.

Methods of making and using such pharmaceutical compositions are also included in the invention. The pharmaceutical compositions of the invention can be administered orally, parenterally, by inhalation spray, topically, rectally, nasally, buccally, vaginally, or via an implanted reservoir. The term parenteral as used herein includes subcutaneous, intracutaneous, intravenous, intramuscular, intra-articular, intrasynovial, intrasternal, intrathecal, intralesional, and intracranial injection or infusion techniques.

Dosage levels of between about 0.01 and about 100 mg/kg body weight per day, preferably between about 0.5 and about 75 mg/kg body weight per day of the modulators described herein are useful for the prevention and treatment of disease and conditions. The amount of active ingredient that may be combined with the carrier materials to produce a single dosage form will vary depending upon the host treated and the particular mode of administration. A typical preparation will contain from about 5% to about 95% active compound (w/w). Alternatively, such preparations contain from about 20% to about 80% active compound.
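As a worked illustration of the arithmetic implied by these ranges, the following Python sketch converts the per-kilogram dosage bounds into a daily dose range for a given body weight, and converts the weight-percent bounds into a mass range of active compound for a given preparation mass. The function names, the 70 kg body weight and the 500 mg preparation mass are hypothetical values chosen for illustration only; nothing here is a dosing recommendation.

```python
# Illustrative arithmetic only; function names and example inputs are assumptions.

def daily_dose_range_mg(body_weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=100.0):
    """Return (min, max) daily dose in mg for the stated mg/kg/day bounds."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


def active_mass_range_mg(preparation_mass_mg, low_fraction=0.05, high_fraction=0.95):
    """Return (min, max) mass of active compound for a w/w fraction range."""
    if preparation_mass_mg <= 0:
        raise ValueError("preparation mass must be positive")
    return (low_fraction * preparation_mass_mg, high_fraction * preparation_mass_mg)


if __name__ == "__main__":
    weight = 70.0  # hypothetical body weight in kg
    broad = daily_dose_range_mg(weight)                 # 0.01 to 100 mg/kg/day
    preferred = daily_dose_range_mg(weight, 0.5, 75.0)  # 0.5 to 75 mg/kg/day
    print(f"broad range: {broad[0]:.2f} to {broad[1]:.0f} mg/day")
    print(f"preferred range: {preferred[0]:.0f} to {preferred[1]:.0f} mg/day")
    typical = active_mass_range_mg(500.0)               # 5% to 95% w/w
    print(f"active compound in a 500 mg preparation: {typical[0]:.0f} to {typical[1]:.0f} mg")
```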

Kits

The present invention provides kits, for example for screening, diagnosis, preventing or treating diseases, e.g., those described herein. For example, a kit may comprise one or more polypeptides or one or more modulators, optionally formulated as pharmaceutical compositions as described above and optionally instructions for their use. In still other embodiments, the invention provides kits comprising one or more polypeptides or one or more modulators, optionally formulated as pharmaceutical compositions, and one or more devices for accomplishing administration of such compositions. Kit components may be packaged for either manual or partially or wholly automated practice of the foregoing methods. In other embodiments involving kits, this invention contemplates a kit including compositions of the present invention, and optionally instructions for their use. Such kits may have a variety of uses, including, for example, imaging, diagnosis, therapy, and other applications.

All publications, including patents, applications, and GenBank Accession numbers mentioned herein are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

The invention now being generally described, it will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention, and are not intended to limit the invention.

EXEMPLIFICATION

Histone methylation plays a critical role in regulating chromatin structure, gene transcription and the epigenetic state of the cell. LSD1 is a histone demethylase that represses transcription via demethylation of histone H3K4 (ref. 1). The LSD1 complex contains HDAC1 and 2, LSD1, CoREST, BRAF35 and BHC80, a PHD finger-containing protein. Previous studies identified the roles for all but BHC80 in mediating events upstream of LSD1-mediated demethylation (refs. 2-4). Here we report that, in contrast to the BPTF and ING2 plant homeodomain (PHD) fingers, which bind methylated H3K4 (H3K4me3) (refs. 5, 6), the BHC80 PHD finger binds un-methylated H3K4 (H3K4me0), and that this interaction is specifically abrogated by H3K4 methylation. The crystal structure of the BHC80 PHD finger bound to an unmodified H3 peptide identifies the structural basis for H3K4me0 recognition. RNAi knockdown of BHC80 results in de-repression of LSD1 target genes, which is restored by re-introduction of wildtype BHC80 but not a PHD-finger mutant unable to bind H3. ChIP analyses reveal a reciprocal interdependence between BHC80 and LSD1 for chromatin association. These findings couple BHC80 function to that of LSD1, and suggest that unmodified H3K4 is part of the "histone code" (ref. 7). They further raise the possibility that generation and recognition of the unmodified state on histone tails in general may be just as critical as histone post-translational modifications per se for chromatin and transcriptional regulation.

EXAMPLE 1: BHC80 binds histone H3 through the PHD zinc finger

Recent studies have identified a subset of PHD fingers that bind methyl lysine (refs. 5, 6, 8, 9). To investigate the role of BHC80, a PHD finger-containing protein (Figure 1A) of the LSD1 co-repressor complex, in transcriptional repression, we determined whether BHC80 also binds histone tails via its PHD finger. As shown in Figure 1B, BHC80 binds the first 21 residues of histone H3 (lane 3), but not residues 21-44 or histones H4, H2A or H2B (lanes 4, and 8-11). Unexpectedly, the BHC80-H3 interaction is disrupted by methylation of K4, but is insensitive to modifications at K9 or K14 (Figure 1B, lanes 6-7 and Figure 5). Native BHC80 present in the LSD1 complex also binds H3K4me0, which is similarly compromised by H3K4 dimethylation (Figure 1C). BHC80 in a reconstituted, three-component complex (LSD1, CoREST and BHC80) also retains the ability to bind H3K4me0 (data not shown). While full-length BHC80 binds H3K4me0, deletion of the PHD finger (BHC80ΔPHD) significantly impairs this interaction (Figure 1D). A GST fusion of the finger (BHC80 residues 486-543) binds the H3 tail and retains the histone binding specificity of full-length BHC80 (Figure 1E), indicating that the PHD finger alone is necessary and sufficient for binding H3K4me0 peptide. Furthermore, all three methylation states of H3K4 inhibit binding, with mono-methylation having the least adverse effect (Figure 1F). Isothermal titration calorimetric (ITC) analysis of the PHD finger determined a K_D of ~30 μM for the unmodified H3 peptide, ~460 μM for H3K4me1, and no binding to H3K4me2 (Figure 1G), consistent with the pulldown results.
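The two dissociation constants can be compared quantitatively. The sketch below, in Python, is an illustrative calculation and not part of the reported analysis: it computes the fold difference in K_D and converts each K_D into a standard binding free energy using ΔG° = RT ln(K_D / 1 M). It assumes a temperature of 25 °C and a 1 M standard state, and it treats the approximate values (~30 μM and ~460 μM) as exact for the purpose of the arithmetic. All names are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # assumed temperature in kelvin (25 degrees C)


def binding_free_energy_kj(kd_molar, temperature_k=T):
    """Standard binding free energy in kJ/mol: dG = RT ln(K_D / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


KD_ME0 = 30e-6    # approximate K_D for unmodified H3 peptide, in mol/L
KD_ME1 = 460e-6   # approximate K_D for H3K4me1 peptide, in mol/L

fold_weaker = KD_ME1 / KD_ME0
# Positive value: mono-methylation makes binding less favourable.
ddg_kj = binding_free_energy_kj(KD_ME1) - binding_free_energy_kj(KD_ME0)

if __name__ == "__main__":
    print(f"K_D ratio (me1 / me0): {fold_weaker:.1f}-fold")
    print(f"dG (me0): {binding_free_energy_kj(KD_ME0):.1f} kJ/mol")
    print(f"dG (me1): {binding_free_energy_kj(KD_ME1):.1f} kJ/mol")
    print(f"ddG (me1 - me0): {ddg_kj:.1f} kJ/mol")
```

The ratio of the two constants is a little over 15, so a single methyl group weakens binding by more than an order of magnitude under these assumptions.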

EXAMPLE 3: Structure of BHC80 PHD with H3 1-10

The co-crystal structure of the BHC80 PHD domain (residues 486 to 543) bound to the unmodified H3 tail (residues 1-10) was solved at a resolution of 1.43 Å (Table S1). All ten residues of the H3 peptide were observed (Figure 6) bound to one of the two molecules in the asymmetric unit (Figure 7A). Like all structurally characterized PHD (and RING) fingers, the BHC80 PHD finger adopts a 'cross-braced' topology of Zn2+ coordinating residues (Figure 2A and B). The H3 peptide binds to the surface of the PHD finger as an anti-parallel β-sheet, with H3R2 to H3R8 forming backbone hydrogen bonds with G498 to M502 of BHC80 (Figure 2B). The cognate PHD finger contacts the first eight residues of the H3 peptide, but H3 residues 9 and 10 interact with a neighboring PHD molecule (Figure 7B). The BHC80 PHD finger substrate specificity is determined primarily through the recognition of the H3 N-terminus, H3K4 and H3R8 (Figure 2C). Three main chain carbonyl oxygen atoms (residues 523, 524, and 525) form a hydrogen bond 'cage' that recognizes the amino terminus of H3. The side chain of H3A1 inserts into a shallow hydrophobic pocket formed by L512, W527, and P523 (Figure 2C, inset). The H3 peptide-binding site is further defined by M502, which inserts in between H3R2 and H3K4, and the side chain of D489, which inserts in between the side chains of H3K4 and H3R8, forming an electrostatic bridge between the two (Figure 2C). The importance of D489 and M502 to H3 binding was confirmed by mutagenesis where mutation of D to A and M to W, respectively, abolished PHD binding to the unmodified histone H3 tail (Figure 2D).

In addition to the interaction with D489, the epsilon amino group of H3K4 forms a hydrogen bond with the main chain carbonyl oxygen of E488 (Figure 2C, inset). A β-carbon from H487, 4.0 Å away, further restricts the lysine-binding site. Indeed, a modeled mono-methyl lysine is allowed only a 15-degree range of motion before clashing with other atoms (Figure 8). This may account for the >15-fold reduction in binding of BHC80 to the H3K4me1 versus the H3K4me0 peptide (Figure 1G). A second or third methyl group would clash with the D489 side chain, E488 carbonyl, or H487 β-carbon, consistent with the observation that the BHC80 PHD finger does not bind H3K4me2 or H3K4me3 peptides (Figure 1F and 1G). Therefore, molecular recognition of unmodified lysine is primarily through bonds to the unmodified epsilon amino group, and steric exclusion of methyl groups. This mode of binding is distinct from the caging of di- and tri-methyl lysine by aromatic residues, as identified in the Polycomb and HP1 chromodomains (refs. 10-13).

EXAMPLE 4: Structural comparison of the BHC80, BPTF, and ING2 PHD fingers

The aromatic cage, though absent from BHC80, is also present in structurally characterized methyl lysine-binding PHD fingers, such as BPTF and ING2 (refs. 14, 15). BHC80, BPTF, and ING2 PHD fingers adopt highly similar folds (Figure 3A). All engage the H3 peptide as an anti-parallel β-sheet on the same face, with recognition of the H3 N-amine, and the H3A1 side chain. H3R2 is buried in a pocket in BPTF and ING2, but is not contacted by BHC80. In BHC80, M502 occupies the space left open for R2 binding by a conserved glycine (G) in BPTF and ING2 (ref. 12). Only BHC80 contacts H3R8, whereas in BPTF and ING2, the H3 peptide meanders off the face of the PHD finger prior to R8. The BPTF PHD finger features a full aromatic cage (Figure 3B) reminiscent of HP1 or Polycomb chromodomain binding trimethyl lysine (refs. 12, 13), while ING2 only has half a hydrophobic cage, with a serine and methionine finishing the H3K4 binding pocket, similar to H3K4me3 recognition by the CHD double chromodomain (refs. 16, 17). As a family, PHD fingers show flexibility in peptide binding (Figure 3C), making it difficult to predict whether an individual PHD finger is a histone-binding module, and whether it binds lysine or methyl lysine, based on primary sequence. Indeed, the robustness of the PHD scaffold, and its plasticity as a binding module, has been noted (ref. 18). Our data suggest the array of PHD fingers present in chromatin interacting and modifying proteins, many of which lack the consensus binding sequence for H3K4me3 present in the BPTF and ING PHD fingers, may have histone binding activities of unknown specificity. In this regard, Dnmt3L, a regulatory factor required for de novo DNA methylation of imprinting control regions in female germ cells and of retro-transposons in male germ cells (refs. 19-21), was found to bind H3K4me0 via its Cys-rich PHD-like domain (Ooi et al., see the accompanying paper). Structural comparison reveals that the N-terminal Cys-rich domain of DNMT3L is highly similar to the PHD finger of BHC80 (Figure 9). The structural similarity as well as their common mode of interaction with histone H3K4me0 raises the question of whether de novo DNA methylation (via DNMT3L) is linked to the action of H3K4 demethylases and their associated complex components.

EXAMPLE 5: BHC80 binding to H3 tail is important for LSD1-mediated repression

Having established the specificity of BHC80 for H3K4me0, we next investigated whether LSD1-mediated H3K4me2 demethylation affects BHC80 binding at target promoters in vivo. Previous studies reported that depletion of LSD1 by RNAi resulted in an increase of H3K4me2 at the SYN1 and SCN1A promoters (refs. 1, 4). Importantly, ChIP analysis showed that BHC80 binding to these promoters was reduced in the LSD1-depleted cells, although the global level of BHC80 was unaffected (Figure 4A and data not shown). These data are consistent with the model that LSD1-mediated demethylation of H3K4me2 is important for BHC80 chromatin association. We further explored the function of BHC80 in LSD1-mediated transcriptional repression by depleting BHC80 using two independent shRNA constructs (Figure 4B, top panel). Inhibition of BHC80 expression resulted in a consistent de-repression of a number of LSD1 target genes, including SCN1A, SCN3A and SYN1 (Figure 4B). Introduction of the RNAi-resistant, wildtype BHC80, but not the PHD finger mutant D489A, into the BHC80 knockdown cells restored repression of LSD1 target genes, indicating that BHC80 binding to H3K4me0 is important for LSD1-mediated gene repression. D489A was expressed at a similar level to that of the wildtype protein (Figure 4C) and assembled into the LSD1 complex in the transfected cells (data not shown).

The binding of BHC80 to H3K4me0, which is a demethylation product of LSD1, suggests that BHC80 functions downstream of LSD1. How is a downstream effector of LSD1 required for efficient LSD1-mediated repression? Recent studies suggest that histone methylation regulation is highly dynamic, and may involve the actions of both methylases and demethylases at the same target promoters (ref. 22). BHC80 may be necessary to help maintain LSD1 at the target promoters to prevent re-methylation of H3K4. Consistent with this idea, BHC80 has previously been shown to physically interact with LSD1 (ref. 23). To address this model experimentally, we compared binding of LSD1 to H3 using LSD1 complexes prepared from either wildtype or BHC80 knockdown cells. Reduction of BHC80 in the LSD1 complex resulted in decreased binding of LSD1 to the K4 un-methylated histone H3 peptides in vitro (Figure 4D), suggesting that BHC80 is important for LSD1 association with its reaction product H3K4me0. Consistently, recombinant BHC80 was able to significantly increase binding of LSD1 to the H3K4me0 peptide in the presence of CoREST in vitro (data not shown). These findings suggest that BHC80 is required for LSD1 association with histone H3 after demethylation.

We further investigated the effect of depletion of BHC80 on the LSD1 protein complex and its occupancy at target promoters. Knockdown of BHC80 had no effect on the assembly of the LSD1 complex in solution by co-immunoprecipitation analysis (Figure 4E). In contrast, occupancy of LSD1 at its target gene loci was reduced as a result of BHC80 knockdown (Figure 4E). These findings support the model where BHC80 is required for stable association of the LSD1 complex with its target promoters. An alternative but not mutually exclusive model is that binding of BHC80 to the de-methylated H3 may be important for LSD1 to mediate demethylation of the neighboring nucleosome, a propagation mechanism similar to what has been proposed for Suv39H and HP1, where the Suv39H methylase and the H3K9me-binding HP1 function together to propagate the repressive signal at the target loci (refs. 24-26). It is interesting to note that many histone demethylases contain histone tail binding modules in the form of PHD or Tudor domains. It will be interesting to determine whether some of these histone tail-binding modules function similarly to what has been reported here for LSD1 and BHC80, or facilitate cross-talk among modifications at different residues on histone tails (refs. 27-29). Taken together, our findings suggest that each and every subunit of the LSD1 complex plays a unique role in coordinating and setting up a chromatin environment important for repression.

In sum, we have identified a PHD finger with a new histone tail binding specificity (H3K4me0), and provided structural insights into the specific recognition of unmethylated lysine, as well as its functional importance. We speculate that additional such binding modules exist in the human proteome that are dedicated to recognizing the unmodified state of amino acid residues on histone tails that are otherwise modified by various post-translational modification events. We anticipate that identification and functional insights into these modules will shed significant new light on dynamic chromatin regulation.

EXAMPLE 6: Methods

BHC80-derived recombinant proteins were purified from bacteria with either 6xHis or GST tags. LSD1 complex was purified from a stable HeLa cell line with Flag-HA-LSD1 expression (ref. 4). In vitro binding assays were carried out using purified recombinant BHC80 proteins or LSD1 complex with a panel of biotinylated histone peptides with specific modifications at indicated lysine residues. Tranylcypromine was used in the LSD1 complex binding assay to inhibit its demethylase activity. The bound fractions were visualized by either Coomassie Blue staining or Western blotting after immobilization by streptavidin beads. shRNA vectors were made as previously described (ref. 30) and cotransfected with the puromycin resistance gene into HeLa cells. Chromatin samples and RNA samples were prepared from drug-selected cells. Semi-quantitative ChIP assay and RT-PCR were done by adding 32P-dCTP during PCR amplification and analyzed by phosphor-image quantification. PCR reactions were optimized within the linear range. ChIP control oligos were designed to amplify the intron region of the RNA Pol II gene, which in principle is not affected by either LSD1 or BHC80 knockdown. For crystallography and isothermal titration calorimetry (ITC), the BHC80 PHD finger was expressed as a 6xHis-SUMO fusion, and cleaved, leaving HisMet fused to the recombinant protein. The structure of the BHC80 PHD domain in complex with histone peptide was solved by three-wavelength Zn anomalous diffraction data (Table S1). ITC was performed on a MicroCal VP-ITC by injecting synthetic H3 peptide (residues 1-10; K4me0, K4me1 or K4me2) into BHC80 at various concentrations.

The X-ray structure of BHC80 PHD domain in complex with H3 tail peptide has been deposited to PDB as 2PUY.

In vitro binding assays

Histone peptides (0.5 μg) were incubated with 2-5 μg purified recombinant BHC80 or GST-PHD finger or 50 ng LSD1 complex for 2 h at 4 °C in binding buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Triton X-100). LSD1 complexes were purified by Flag-immunoprecipitation from a stable cell line with Flag-HA-LSD1 followed by 2× elution using 3×Flag peptide (Sigma) (ref. 4). After 1 h incubation, Streptavidin beads (Upstate 16-126) were washed four times and subjected to Coomassie blue staining. Tranylcypromine (Sigma P8511) was used in Fig. 1c at a concentration of 50 μM to inhibit the demethylation reaction.

RNAi and PCR with reverse transcriptase (RT-PCR)

Two BHC80 shRNAs were designed to target the sequences (5'-GGAGGCTCTTAAAGTGGAAAT-3' and 5'-GGGCAGAGGCTGTCCAAAT-3'), and subcloned into a pBS-U6 vector. LSD1 shRNA was done as described (ref. 1). HeLa cells were cotransfected with shRNA and pBabe-puro vectors. After 12 h of transfection, cells were split and selected by 1.5 μg/ml puromycin for 60 h. RNA extraction and RT-PCR were performed as described (refs. 1, 4). The RT-PCR results in Fig. 4b,c were achieved by radioactive PCR.

Chromatin immunoprecipitation

LSD1 ChIP experiments were done using primers that have been described (ref. 1). Chromatin samples from 10^7 cells were sonicated to 200-500 bp in ChIP lysis buffer (50 mM HEPES/KOH pH 7.5, 500 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate and protease inhibitors), and incubated with 3 μg LSD1 antibody (ab-17221) or BHC80 antibody (ref. 10) for each chromatin IP experiment. The recovered DNA was amplified in radioactive PCR by the indicated primers and quantified by phosphor image, and an unrelated genomic region was used as a PCR internal control for normalization.

Crystallography

The BHC80 PHD finger (residues 486-543) was expressed in BL21-Gold (DE3) E. coli cells (Stratagene) as a 6xHis-Smt3 (yeast SUMO) fusion construct (generated in-house: see ref. 31) harbored in a modified pET28b vector (Novagen). Protein expression was induced with 0.4 mM isopropyl-β-D-thiogalactopyranoside for 3 h at 37 °C in Luria-Bertani broth supplemented with 50 μM ZnCl2. The fusion protein was isolated on nickel-charged HiTrap Chelating HP (GE Healthcare). Following imidazole elution, the fusion was cleaved off by Ulp1 protease (ref. 31), leaving two amino acids (HisMet) fused to the BHC80 PHD finger. The protein was further purified by HiTrap-Q and Superdex-75 (GE Healthcare).

For co-crystallization, the PHD finger (final concentration ~40 mg/ml in 20 mM Tris, pH 7.2, 100 mM NaCl, 5 mM DTT, and 25 μM ZnCl2) was mixed in a 1:1.5 ratio with an H3 peptide (residues 1-10, dissolved in the protein buffer and neutralized with NaOH). Clusters of rod-shaped crystals were obtained using the sitting-drop vapour-diffusion method at 16 °C, with mother liquor containing 100 mM sodium citrate, pH 5.6, 20% polyethylene glycol 4000, and 20% isopropanol. In follow-up screens, we obtained single rhombohedral crystals with the hanging-drop vapour-diffusion method, in mother liquor containing MES (pH 6.2-6.5), 5-10% polyethylene glycol 4000, and 20% isopropanol. Both crystal forms diffract in the C2 space group with similar cell dimensions, but the latter were used to determine the structure. Crystals were cryoprotected by soaking in mother liquor supplemented with 40% glycerol. To prevent the evaporation of isopropanol, drops containing crystals were submerged in paraffin oil during handling.

All data were collected at the SER-CAT 22-ID beamline at the Advanced Photon Source at Argonne National Laboratory on a MAR300 CCD detector. Zinc anomalous diffraction data were collected from two crystals and processed using HKL2000. SOLVE (ref. 32) found four zinc sites. We used SOLOMON solvent flipping (ref. 33) for density modification, which gave a clear solvent boundary and traceable density. Model building with O (ref. 34) and refinement with CNS (ref. 35) continued by using a 1.43-Å data set (from a third crystal).

Isothermal titration calorimetry

A SUMO-tagged fusion of the BHC80 PHD was exchanged into 25 mM Tris-HCl buffer, 50 mM NaCl, 2 mM β-mercaptoethanol (pH 7.2) by gel-filtration chromatography. Extensively lyophilized H3 1-10 peptides were dissolved in the same buffer. ITC measurements were carried out from 100-500 μM protein concentration, with 3-7 mM peptide concentration, on a MicroCal VP-ITC instrument at 25 °C. For each peptide, a reference titration of peptide into SUMO alone was subtracted from experimental data to control for heat of dilution and non-specific binding. Binding constants were calculated by fitting the data using the ITC data analysis module of Origin 7.0 (OriginLab Corporation).
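To illustrate what a fitted binding constant implies at the concentrations used, the following Python sketch evaluates a generic single-site (1:1) binding model. It is an assumption-laden illustration, not a reconstruction of the Origin ITC analysis module: it solves the standard quadratic for the concentration of protein-peptide complex given total protein, total peptide and K_D, all in the same units. The function names and the example concentrations inside the cell are hypothetical.

```python
import math


def complex_concentration(protein_total, ligand_total, kd):
    """Concentration of 1:1 complex from the single-site binding quadratic.

    All three arguments must share the same concentration unit (e.g. uM).
    """
    if protein_total < 0 or ligand_total < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and K_D positive")
    s = protein_total + ligand_total + kd
    return (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0


def fraction_protein_bound(protein_total, ligand_total, kd):
    """Fraction of protein that carries a bound peptide."""
    if protein_total == 0:
        return 0.0
    return complex_concentration(protein_total, ligand_total, kd) / protein_total


if __name__ == "__main__":
    protein_um = 100.0  # hypothetical protein concentration in the cell, uM
    for label, kd_um in (("H3K4me0", 30.0), ("H3K4me1", 460.0)):
        for ratio in (0.5, 1.0, 2.0, 5.0):
            bound = fraction_protein_bound(protein_um, ratio * protein_um, kd_um)
            print(f"{label}: peptide/protein = {ratio:.1f} -> {bound:.2f} bound")
```

With a weaker K_D the saturation curve is much shallower at the same molar ratios, which is one reason millimolar peptide stocks are useful when characterizing interactions in the tens to hundreds of micromolar range.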

References

1. Shi, Y. et al. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941-53 (2004).

2. Hakimi, M. A. et al. A core-BRAF35 complex containing histone deacetylase mediates repression of neuronal-specific genes. Proc Natl Acad Sci USA 99, 7420-5 (2002).

3. Lee, M. G., Wynder, C., Cooch, N. & Shiekhattar, R. An essential role for CoREST in nucleosomal histone 3 lysine 4 demethylation. Nature 437, 432-5 (2005).

4. Shi, Y. J. et al. Regulation of LSD1 Histone Demethylase Activity by Its Associated Factors. Mol Cell 19, 857-64 (2005).

5. Shi, X. et al. ING2 PHD domain links histone H3 lysine 4 methylation to active gene repression. Nature 442, 96-9 (2006).

6. Wysocka, J. et al. A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling. Nature 442, 86-90 (2006).

7. Strahl, B. D. & Allis, C. D. The language of covalent histone modifications. Nature 403, 41-45 (2000).

8. Shi, X. et al. Proteome-wide analysis in Saccharomyces cerevisiae identifies several PHD fingers as novel direct and selective binding modules of histone H3 methylated at either lysine 4 or lysine 36. J Biol Chem 282, 2450-5 (2007).

9. Iwase, S. et al. The X-Linked Mental Retardation Gene SMCX/JARID1C Defines a Family of Histone H3 Lysine 4 Demethylases. Cell (2007).

10. Jacobs, S. A. & Khorasanizadeh, S. Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail. Science 295, 2080-3 (2002).

11. Nielsen, P. R. et al. Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9. Nature 416, 103-7 (2002).

12. Fischle, W. et al. Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by Polycomb and HP1 chromodomains. Genes Dev 17, 1870-81 (2003).

13. Min, J., Zhang, Y. & Xu, R. M. Structural basis for specific binding of Polycomb chromodomain to histone H3 methylated at Lys 27. Genes Dev 17, 1823-8 (2003).

14. Pena, P. V. et al. Molecular mechanism of histone H3K4me3 recognition by plant homeodomain of ING2. Nature 442, 100-3 (2006).

15. Li, H. et al. Molecular basis for site-specific read-out of histone H3K4me3 by the BPTF PHD finger of NURF. Nature 442, 91-5 (2006).

16. Flanagan, J. F. et al. Double chromodomains cooperate to recognize the methylated histone H3 tail. Nature 438, 1181-5 (2005).

17. Sims, R. J., 3rd et al. Human but not yeast CHD1 binds directly and selectively to histone H3 methylated at lysine 4 via its tandem chromodomains. J Biol Chem 280, 41789-92 (2005).

18. Kwan, A. H. et al. Engineering a protein scaffold from a PHD finger. Structure 11, 803-13 (2003).

19. Bourc'his, D. & Bestor, T. H. Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature 431, 96-9 (2004).

20. Bourc'his, D., Xu, G. L., Lin, C. S., Bollman, B. & Bestor, T. H. Dnmt3L and the establishment of maternal genomic imprints. Science 294, 2536-9 (2001).

21. Hata, K., Okano, M., Lei, H. & Li, E. Dnmt3L cooperates with the Dnmt3 family of de novo DNA methyltransferases to establish maternal imprints in mice. Development 129, 1983-93 (2002).

22. Garcia-Bassets, I. et al. Histone methylation-dependent mechanisms impose ligand dependency for gene activation by nuclear receptors. Cell 128, 505-18 (2007).

23. Iwase, S. et al. Characterization of BHC80 in BRAF-HDAC complex, involved in neuron-specific gene repression. Biochem Biophys Res Commun 322, 601-8 (2004).

24. Lachner, M., O'Carroll, D., Rea, S., Mechtler, K. & Jenuwein, T. Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins. Nature 410, 116-120 (2001).

25. Bannister, A. J. et al. Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 410, 120-124 (2001).

26. Nakayama, J.-I., Rice, J. C., Strahl, B. D., Allis, C. D. & Grewal, S. I. S. Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science 292, 110-113 (2001).

27. Huang, Y., Fang, J., Bedford, M. T., Zhang, Y. & Xu, R. M. Recognition of histone H3 lysine-4 methylation by the double tudor domain of JMJD2A. Science 312, 748-51 (2006).

28. Shi, Y. & Whetstine, J. R. Dynamic regulation of histone lysine methylation by demethylases. Mol Cell 25, 1-14 (2007).

29. Iwase, S. et al. The X-linked mental retardation gene SMCX/JARID1C defines a family of histone H3 lysine 4 demethylases. Cell 128, 1077-88 (2007).

30. Sui, G. & Shi, Y. Gene Silencing by a DNA Vector-Based RNAi Technology. Methods Mol Biol 309, 205-18 (2005).

31. M. P. Malakhov, M. R. Mattern, O. A. Malakhova et al., J Struct Funct Genomics 5 (1-2), 75 (2004).

32. T. C. Terwilliger and J. Berendzen, Acta Crystallogr D Biol Crystallogr 55 (Pt 4), 849 (1999).

33. J. P. Abrahams and A. G. Leslie, Acta Crystallogr D Biol Crystallogr 52 (Pt 1), 30 (1996).

34. T. A. Jones, J. Y. Zou, S. W. Cowan et al., Acta Crystallogr A 47 (Pt 2), 110 (1991).

35. A. T. Brunger, P. D. Adams, G. M. Clore et al., Acta Crystallogr D Biol Crystallogr 54 (Pt 5), 905 (1998).

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents of the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

Table S1. Statistics of X-ray Data Reduction and Phasing, and Structure Refinement

R.m.s. deviations
    Bond lengths (Å)            0.008
    Bond angles (°)             1.4
    Dihedral (°)                24.3
    Improper (°)                0.90

Estimated coordinate error
    From Luzzati plot (Å)       0.16

Values for the highest resolution shell are shown in parentheses.
