

Title:
METHODS FOR CHROMATIN IMMUNO-PRECIPITATIONS
Document Type and Number:
WIPO Patent Application WO/2012/047726
Kind Code:
A1
Abstract:
Methods and kits for chromatin immuno-precipitation are provided.

Inventors:
AMIT IDO (IL)
BERNSTEIN BRADLEY (US)
GARBER MANUEL (US)
GOREN ALON (US)
RAM OREN (US)
REGEV AVIV (US)
SHORESH NOAM (US)
Application Number:
PCT/US2011/054072
Publication Date:
April 12, 2012
Filing Date:
September 29, 2011
Assignee:
BROAD INST INC (US)
GEN HOSPITAL CORP (US)
MASSACHUSETTS INST TECHNOLOGY (US)
AMIT IDO (IL)
BERNSTEIN BRADLEY (US)
GARBER MANUEL (US)
GOREN ALON (US)
RAM OREN (US)
REGEV AVIV (US)
SHORESH NOAM (US)
International Classes:
C12Q1/68
Foreign References:
US20070141583A12007-06-21
US20060084078A12006-04-20
Attorney, Agent or Firm:
VAN AMSTERDAM, John, R. (Greenfield & Sacks P.C., 600 Atlantic Avenue, Boston MA, US)
Claims:
CLAIMS

1. A chromatin immuno-precipitation method for parallel processing of multiple samples, the method comprising:

a) cross-linking a chromatin-associated factor to chromatin,

b) shearing the cross-linked chromatin of (a) to provide nucleic acid fragments,

c) contacting the chromatin-associated factor cross-linked to the nucleic acid fragments of (b) with a first affinity molecule,

d) releasing the nucleic acid from the chromatin-associated factor and from the first affinity molecule,

e) contacting the released nucleic acid fragments in (d) with a second affinity molecule,

f) releasing the nucleic acid fragments from the second affinity molecule, and

g) optionally isolating the nucleic acid fragments, and

h) optionally analyzing the distribution and enrichment of the isolated nucleic acid fragments,

wherein the first affinity molecule and/or second affinity molecule optionally is coupled to a substrate suitable for parallel processing of multiple samples.

2. The method of claim 1, wherein contacting the nucleic acid fragments in (e) is carried out using an affinity interaction between the nucleic acid fragment and the second affinity molecule.

3. The method of claim 2, wherein the nucleic acid is suitably modified for this interaction.

4. The method of claim 3, wherein the modification is addition of poly-A tails or biotinylation.

5. The method of any one of claims 1 to 4, wherein the second affinity molecule is a poly-T oligonucleotide, avidin or streptavidin.

6. The method of any one of claims 1 to 5, wherein the second affinity molecule is silica.

7. The method of any one of claims 1 to 6, wherein the substrate is a surface of a bead or a well.

8. The method of claim 7, wherein the bead is a magnetic bead.

9. The method of any one of claims 1 to 8, wherein steps (e) and (f) are not carried out using a purification column or using phenol/chloroform extraction and ethanol precipitation.

10. The method of claim 6, wherein steps (e) and (f) are not carried out using a purification column comprising silica.

11. The method of any one of claims 1 to 10, wherein the format is a 6-well plate, a 12-well plate, a 24-well plate, a 96-well plate, a 384-well plate or a 1536-well plate.

12. The method of any one of claims 1 to 11, wherein the first affinity molecule in step (c) is an antibody that specifically binds a chromatin-associated factor cross-linked to the nucleic acid fragment.

13. The method of claim 12, wherein the antibody is coupled to a substrate.

14. The method of claim 13, wherein the substrate is a surface of a bead or a well.

15. The method of claim 13 or claim 14, wherein the substrate comprises protein A or protein G.

16. The method of any one of claims 13 to 15, wherein the chromatin-associated factor binds to the antibody before the antibody is coupled to the substrate.

17. The method of any one of claims 1 to 16, wherein the chromatin-associated factor comprises an affinity tag.

18. The method of claim 17, wherein the affinity tag is FLAG-tag, myc-tag, biotin or DHFR.

19. The method of claim 17 or claim 18, wherein the affinity molecule is an antibody that specifically binds the affinity tag, avidin or streptavidin.

20. The method of claim 19, wherein the antibody is an anti-FLAG antibody, or an anti-myc antibody.

21. The method of any one of claims 1 to 20, wherein shearing in step (b) is carried out by sonication or micrococcal nuclease digestion.

22. The method of any one of claims 1 to 21, the method further comprising a step of analyzing the isolated nucleic acid fragments.

23. The method of claim 22, wherein analyzing the isolated nucleic acid fragments comprises determining the nucleotide sequence.

24. The method of claim 23, wherein the nucleotide sequence is determined using sequencing or hybridization techniques with or without amplification.

25. The method of claim 24, wherein the techniques are ChIP-Seq, real-time PCR, DNA microarray, or NANOSTRING® array.

26. A chromatin immuno-precipitation kit for parallel processing of multiple samples in a multi-well format, the kit comprising:

a) a multi-well plate comprising wells coated on an inside surface of the wells with a first affinity molecule that binds to a chromatin-associated factor, or is coated with a molecule that binds to the first affinity molecule, to form a first affinity surface, and

b) a multi-well plate coated with a second affinity molecule that binds nucleic acids, or is coated with a molecule that binds to the second affinity molecule, to form a second affinity surface,

optionally further comprising a protein inhibitor, a cross-linking solution, a cell lysis buffer, a wash buffer, an elution buffer, and/or user instructions.

27. The chromatin immuno-precipitation kit of claim 26, wherein the kit comprises a single multi-well plate that comprises different wells for first and second affinity surfaces.

28. The chromatin immuno-precipitation kit of claim 26, wherein the kit comprises a single multi-well plate that comprises wells that have both first and second affinity surfaces.

29. A chromatin immuno-precipitation kit for parallel processing of multiple samples, the kit comprising:

a) a first bead coated with a first affinity molecule that binds to a chromatin-associated factor, or coated with a molecule that binds to the first affinity molecule, to form a first affinity surface, and

b) a second bead coated with a second affinity molecule that binds nucleic acids, or coated with a molecule that binds to the second affinity molecule, to form a second affinity surface,

optionally further comprising a multi-well plate, a protein inhibitor, a cross-linking solution, a cell lysis buffer, a wash buffer, an elution buffer, and/or user instructions.

30. The kit of claim 26 or claim 29, wherein the second affinity molecule comprises silica.

31. The kit of claim 26 or claim 29, wherein the second affinity molecule comprises a poly-T oligonucleotide, a poly-A oligonucleotide, avidin, streptavidin or biotin.

32. The kit of claim 26, wherein the multi-well plate is a 6-well plate, a 12-well plate, a 24-well plate, a 96-well plate, a 384-well plate or a 1536-well plate.

33. The kit of claim 26 or claim 29, wherein the molecule that binds to the first affinity molecule comprises protein A or protein G.

34. The kit of claim 26 or claim 29, wherein the first affinity molecule comprises an antibody that specifically binds to a chromatin-associated factor, an antibody that specifically binds to an affinity tag, avidin, streptavidin or biotin.

35. The kit of claim 34, wherein the affinity tag is FLAG-tag, myc-tag, biotin, or DHFR.

36. The kit of claim 34, wherein the antibody is an anti-FLAG antibody, an anti-myc antibody, or an anti-DHFR antibody.

37. The kit of claim 29, wherein the bead is a magnetic bead.

38. A method of preparing an indexed sequence library comprising:

(a) purifying or obtaining purified ChIP DNA processed using any one of the preceding methods;

(b) adding unique sequence identifiers to the purified ChIP DNA; and

(c) selecting the ChIP DNA in (b) based on size.

39. The method of claim 38, further comprising assessing the ChIP DNA in (c) for enriched molecular binding sites.

40. The method of claim 38 or claim 39, further comprising sequencing the ChIP DNA.

41. The method of any one of claims 1-25 or 38-40, wherein the method is performed using a multi-well format or a microfluidic chamber/channel.

42. The method of any one of claims 1-25 or 38-40, wherein the library is constructed on magnetic particles.

Description:
METHODS FOR CHROMATIN IMMUNO-PRECIPITATIONS

GOVERNMENT SUPPORT

This invention was made with government support under OD003958 awarded by the National Institutes of Health, under U54 HG004570 awarded by the National Institutes of Health, under U01 ES01715 awarded by the National Institutes of Health, and under 5DP1OD003958-05 awarded by the National Institutes of Health and the National Cancer Institute. The government has certain rights in the invention.

RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application serial number 61/387,689, filed September 29, 2010, which is incorporated by reference herein.

BACKGROUND OF INVENTION

Chromatin immuno-precipitation (ChIP) is a powerful tool for evaluating interaction of proteins with specific genomic DNA regions in vivo, to provide a better understanding of the mechanisms of gene regulation, DNA replication, and DNA repair. The ChIP technique involves fixative treatment of live cells with formaldehyde to chemically cross-link DNA-bound proteins. The cells are then lysed, and the chromatin is sheared mechanically or enzymatically, in order to reduce fragment size and increase resolution. The resultant sheared complexes are then immuno-precipitated with antibodies specific to the protein of interest, and the DNA fragments are analyzed, e.g., using real time PCR, sequencing, or microarray hybridization. The ChIP protocol, introduced in 1988 (Solomon MJ et al. Cell. (1988) 53(6):937-47), is widely used.

SUMMARY OF INVENTION

Current versions of the ChIP protocol may be inefficient both in terms of yields and time. Existing protocols can be difficult to standardize and significant differences in quality and yield can be observed among different laboratories that utilize this technique, which may be a result of different levels of technical skill. Many current protocols contain steps that require extensive hands-on time, which has prevented adapting these protocols for fully automated processing. There remains a need for a protocol that gives predictable and reproducible quality and yields that can be applied, in some instances, by a person of average, limited, or no skill in the art and/or that can be automated.

Provided herein, in some aspects, is a high-throughput indexed Chromatin Immuno-Precipitation (iChIP) method coupled to massively parallel sequencing to systematically map protein-DNA interactions. In certain embodiments, iChIP can be applied to reconstruct the physical regulatory landscape of a mammalian cell, by building genome-wide binding maps for, e.g., transcription factors (TFs) and chromatin marks at specific time points following stimulation of, e.g., primary dendritic cells (DCs) with pathogen components. In certain embodiments, the methods described herein may provide a foundation for future understanding of the mammalian regulatory code.

Provided herein are iChIP methods that, in some embodiments, give high quality immuno-precipitated material with yields superior to (e.g., greater amounts than and/or higher purity than) existing protocols, including when processing multiple samples in parallel. In certain aspects, provided herein are iChIP methods that are adapted for semi- or full automation. For example, in some embodiments, the entire iChIP process can be performed in a single nanodroplet, using microfluidic technology. The iChIP methods described herein, in some embodiments, may be practiced by a person of average or limited skill in the art of chromatin immuno-precipitation or even an unskilled person. The iChIP methods provided herein, in some embodiments, reduce the time required to carry out the methods and/or reduce the sample to sample variability (e.g., improve reproducibility) in quality and/or yield, when compared to existing ChIP methods.

Provided herein are improved chromatin immuno-precipitation (iChIP) methods for parallel processing of multiple samples (e.g., in a multi-well format), wherein the methods include:

a) cross-linking a chromatin-associated factor to chromatin,

b) shearing the cross-linked chromatin of (a) to provide nucleic acid fragments,

c) contacting the chromatin-associated factor cross-linked to the nucleic acid fragments of (b) with a first affinity molecule,

d) releasing the nucleic acid from the chromatin-associated factor and from the first affinity molecule,

e) contacting the released nucleic acid fragments in (d) with a second affinity molecule,

f) releasing the nucleic acid fragments from the second affinity molecule, and

g) optionally isolating the nucleic acid fragments, and

h) optionally analyzing the distribution and enrichment of the isolated nucleic acid fragments,

wherein the first affinity molecule and/or second affinity molecule optionally is coupled to a substrate suitable for parallel processing of multiple samples.

In certain embodiments, contacting of the nucleic acid fragments in (e) is carried out using an affinity interaction between the nucleic acid fragment and the second affinity molecule, optionally wherein the nucleic acid is suitably modified for this interaction. In certain embodiments, the modification of the nucleic acid is addition of poly-A tails or biotinylation. In certain embodiments, the second affinity molecule is a poly-T oligonucleotide, avidin or streptavidin. In one embodiment, the second affinity molecule is silica. In certain embodiments, the substrate is a surface of a bead or a well, optionally, the bead is a magnetic bead (e.g., a bead coated in streptavidin).

In certain embodiments, steps (e) and (f) of the aforementioned methods are not carried out using a purification column or using phenol/chloroform extraction and ethanol precipitation. In certain embodiments, steps (e) and (f) are not carried out using a purification column comprising silica.

In certain embodiments, the aforementioned methods are carried out in a multi-well format that is a 6-well plate, a 12-well plate, a 24-well plate, a 96-well plate, a 384-well plate or a 1536-well plate.

In certain embodiments, the first affinity molecule in step (c) is an antibody that specifically binds a chromatin-associated factor cross-linked to the nucleic acid fragment. In certain embodiments, the antibody is coupled to a substrate and optionally, the substrate is a surface of a bead or a well. In certain embodiments, the substrate comprises protein A or protein G. In certain embodiments, the chromatin-associated factor binds to the aforementioned antibody before the antibody is coupled to the substrate. In certain embodiments, the chromatin-associated factor comprises an affinity tag. In certain embodiments, the affinity tag is FLAG-tag, myc-tag, biotin or DHFR. In certain embodiments, the affinity molecule is an antibody that specifically binds the affinity tag, avidin or streptavidin. In certain embodiments, the antibody is an anti-FLAG antibody, or an anti-myc antibody.

In certain embodiments, the aforementioned methods include a shearing in step (b) that is carried out by sonication or micrococcal nuclease digestion.

In certain embodiments, the aforementioned methods include a step of analyzing the isolated nucleic acid fragments. In one embodiment, analyzing the isolated nucleic acid fragments includes determining the nucleotide sequence. In certain embodiments, the nucleotide sequence is determined using sequencing or hybridization techniques with or without amplification, optionally such techniques are ChIP-Seq, real-time polymerase chain reaction (PCR), DNA microarray, or NANOSTRING® array.

Further provided herein are improved chromatin immuno-precipitation (iChIP) kits for parallel processing of multiple samples in a multi-well format, wherein the kits include:

a) a multi-well plate having wells coated on an inside surface of the wells with a first affinity molecule that binds to a chromatin-associated factor, or is coated with a molecule that binds to the first affinity molecule, to form a first affinity surface, and

b) a multi-well plate coated with a second affinity molecule that binds nucleic acids, or is coated with a molecule that binds to the second affinity molecule, to form a second affinity surface.

The kits optionally further include a protein inhibitor, a cross-linking solution, a cell lysis buffer, a wash buffer, an elution buffer, and/or user instructions or directions to obtain user instructions (e.g., via an internet website).

In certain embodiments, the aforementioned chromatin immuno-precipitation kits have a single multi-well plate that has different wells for first and second affinity surfaces or the kits have a single multi-well plate that has wells that have both first and second affinity surfaces.

Further provided herein are chromatin immuno-precipitation kits for parallel processing of multiple samples, wherein the kits include:

a) a first bead coated with a first affinity molecule that binds to a chromatin-associated factor, or coated with a molecule that binds to the first affinity molecule, to form a first affinity surface, and

b) a second bead coated with a second affinity molecule that binds nucleic acids, or coated with a molecule that binds to the second affinity molecule, to form a second affinity surface. Optionally, the bead is a magnetic bead. Optionally, the kits further include a multi-well plate, a protein inhibitor, a cross-linking solution, a cell lysis buffer, a wash buffer, an elution buffer, and/or user instructions.

In certain embodiments, the second affinity molecule comprises silica, a poly-T oligonucleotide, a poly-A oligonucleotide, avidin, streptavidin, or biotin. In certain embodiments, the aforementioned kits have multi-well plates that are 6-well plates, 12-well plates, 24-well plates, 96-well plates, 384-well plates, or 1536-well plates. In certain embodiments, the molecule that binds to the first affinity molecule includes protein A or protein G. In certain embodiments, the first affinity molecule includes an antibody that specifically binds to a chromatin-associated factor, an antibody that specifically binds to an affinity tag, avidin, streptavidin, or biotin. In certain embodiments, the affinity tag is FLAG-tag, myc-tag, biotin, or DHFR. In certain embodiments, the antibody is an anti-FLAG antibody, an anti-myc antibody, or an anti-DHFR antibody.

In any one of the embodiments provided herein, immuno-precipitated chromatin may be prepared from about 5 to about 20 million cells, or more. In certain embodiments, a method described herein may further comprise collecting (e.g., harvesting) a sample of about 100 cells, 1000 cells, 10,000 cells, or 100,000 cells. In some embodiments, the sample comprises less than 100 cells, while in other embodiments, the sample comprises more than 100,000 cells. In certain embodiments, the sample comprises about 1 million to about 20 million cells, or more. Any one of the methods described herein may further comprise providing or obtaining chromatin and associated proteins prepared from a single sample of about 50 cells, about 100 cells, about 150 cells, about 200 cells, about 300 cells, about 400 cells, about 500 cells, about 1000 cells, about 2000 cells, about 3000 cells, about 4000 cells, about 5000 cells, about 10,000 cells, about 20,000 cells, about 30,000 cells, about 40,000 cells, about 50,000 cells, about 100,000 cells, about 200,000 cells, about 300,000 cells, about 400,000 cells, about 500,000 cells, or about 1,000,000 cells. In yet other embodiments, chromatin and associated proteins are prepared from a single sample of about 100 cells to about 10,000 cells, or about 10,000 cells to about 100,000 cells, or more.

Other aspects are directed to methods of preparing indexed sequence libraries comprising: (a) purifying or obtaining purified ChIP DNA processed using any one of the methods described herein; (b) adding unique sequence identifiers to the purified ChIP DNA; and (c) selecting the ChIP DNA in (b) based on size. In some embodiments, the methods further comprise assessing the ChIP DNA in (c) for enriched molecular binding sites. In some embodiments, the methods further comprise sequencing the ChIP DNA. In any one of the foregoing embodiments, a method can be performed in a multi-well format or a microfluidic chamber/channel. In any one of the foregoing embodiments, an indexed sequence library may be constructed on magnetic particles.

In certain aspects, the indexed sequence libraries can be used to screen and evaluate functional properties of nucleic acids and/or binding factors. In certain embodiments, the libraries may be used to identify or isolate nucleic acids and/or binding proteins of interest (e.g., promoters, enhancers, transcription factors/regulators).

In some embodiments, an indexed sequence library may comprise a plurality of nucleic acid fragments having unique sequence identifiers (e.g., each fragment of a selected nucleic acid may be associated with a sequence identifier, for example, a unique sequence identifier).

In certain embodiments, provided herein are kits for preparing an indexed sequence library comprising any one or more of the reagents described in any one or more of the foregoing embodiments.

BRIEF DESCRIPTION OF THE FIGURES

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

Figure 1 shows a high throughput indexed chromatin immuno-precipitation (iChIP) pipeline. Figure 1A shows a blueprint of the iChIP pipeline. Top: Protein-DNA fragments are precipitated using antibody coupled magnetic beads in 96-well plates. Middle: Precipitated DNA is purified using magnetic beads, indexed adapters are ligated and DNA is size selected to generate sequencing libraries. Bottom: Samples are validated using ChIP-String; successful samples are pooled and sequenced. Figure 1B shows iChIP-String validation. Nanostring probes target active regulatory regions of a signature gene set. Comparison of (a) ChIP-Seq or (b) ChIP-String for K4me1, K4me3, Pol-II, Relb, Nfkb2 and Nfkb1, and Atf3. Figure 1C shows a strategy for ab initio transcription factor (TF)-DNA binding maps. The strategy consists of four steps: (1) Expression analysis using RNA-Seq, (2) Selection of expressed TFs, (3) Screening for all potential ChIP-Seq antibodies, and (4) ChIP'ing in appropriate time points all validated TF targets.

Figure 2 shows an epigenetic and transcription factor binding landscape. Figure 2A shows Integrative Genomics Viewer (IGV) tracks, including enhancer and promoter calls of the Il1a-b loci showing "compressed" alignments for various TFs. Call-out boxes show time course data for selected factors. Figure 2B shows the total number of high scoring peaks (enrichment score > 20) of Ctcf, Pol-II, the 29 TFs and the three chromatin marks. Each pie chart shows the distribution of peaks across promoter, 3' UTR, exonic, intronic, enhancer and unannotated regions. The total number of peaks is shown in parentheses and the most significant motif found by de novo motif discovery at these peaks is shown below whenever available. Factors in italics indicate that the motif shown is not the canonical binding motif for the factor and factors in light grey indicate they are one of the highest scoring Pol-II runners.

Figure 3 shows Runx1 3' end binding. Figure 3A shows an overview of the Cxcl2 inflammatory gene loci showing time course RNA-Seq and ChIP-Seq data for selected factors, including Runx1. Figure 3B shows a comparison of the enrichment of promoter and 3' end bound Runx1 targets in inflammatory (dark gray) vs. anti-viral genes (light gray). Figure 3C shows cumulative distribution plots of Runx1 peak scores at promoters and 3' end. Figure 3D shows cumulative distribution plots of the expression (in RPKM) of genes bound by Runx1 at the promoter and 3' end. Figure 3E shows the median score of Runx1 peaks at the promoter, 3' end and enhancer across the LPS response time course. Figure 3F shows the expression fold change in Runx1 knockdown dendritic cells (DC) 6 hr post stimulation (compared to nontargeting short hairpin (sh)RNA (Amit, et al. Science, 326: 257-263, 2009)) of significantly down- and up-regulated genes. Blue and red stars indicate genes that are bound by Runx1 at the 3' end and promoter, respectively.

Figure 4 shows co-binding of TFs in regulatory regions. Figure 4A: The degree of a regulatory region is defined as the number of TFs bound to it. The heatmaps show, for every TF, a distribution of the degrees associated with its bound regions. The left heatmap shows the original data while the right plot is obtained from a random process in which the degree of every region is proportional to its length. Figure 4B shows TF co-binding at similar regions. Significant TF pairs (p<10^-3) are shade-coded by their respective fold enrichment. Selected overlaps are highlighted.
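The degree definition given for Figure 4A can be illustrated with a minimal Python sketch. The function names, the data layout (a TF name mapped to a set of region identifiers) and the toy data are hypothetical and are not taken from the described method; the sketch only shows one way a per-TF degree distribution might be tabulated.

```python
from collections import Counter

def region_degrees(bound_regions):
    """Map each region ID to its degree, i.e. the number of distinct TFs bound to it.

    bound_regions: dict of TF name -> set of region IDs (hypothetical layout).
    """
    tfs_per_region = {}
    for tf, regions in bound_regions.items():
        for region in regions:
            tfs_per_region.setdefault(region, set()).add(tf)
    return {region: len(tfs) for region, tfs in tfs_per_region.items()}

def degree_distribution(bound_regions, tf):
    """Count how many regions bound by `tf` have each degree."""
    degrees = region_degrees(bound_regions)
    return Counter(degrees[region] for region in bound_regions[tf])

# Hypothetical toy data: three TFs and three regulatory regions.
toy = {
    "TF_A": {"r1", "r2", "r3"},
    "TF_B": {"r2", "r3"},
    "TF_C": {"r3"},
}

if __name__ == "__main__":
    for tf in sorted(toy):
        print(tf, dict(degree_distribution(toy, tf)))
```

One row of such a heatmap would correspond to the distribution returned for a single TF; the length-proportional random control mentioned above is not modeled here.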

Figure 5 shows dynamics of TF binding. Figure 5A shows the Ifit locus with our "compressed" alignments for various TFs. Call-out boxes show time course data for selected factors. Figure 5B shows a bar plot of the fraction of TF peaks gained (>3 fold increase compared to the unstimulated state; left plot) or lost (>3 fold decrease; right plot) during the response to LPS. Each bar is subdivided and colored to represent the fraction of peaks that are gained (lost) at each time point.

Figure 6 shows connecting TF binding with gene expression. Figure 6A shows a schematic example of region annotation and association of regions to genes. Top: two typical genes (in black and white), gene 2 has a previously unannotated alternative start site discovered through RNA-Seq. Middle: Promoters were defined as H3K4me3 rich regions (H3K4me3+) that either overlap an existing annotation or a reconstructed transcript. Similarly, enhancers were associated with TF-bound H3K4me1 rich regions (H3K4me1+). Bottom: Both gene 1 and 2 are within 150 kb of the annotated enhancer; however, the enhancer is associated with gene 2 since its promoter shares a common TF with the enhancer. Bottom right: A cartoon model of looping between the annotated enhancer and the promoter of gene 2. Figure 6B shows binding of TF (x-axis) at regulatory elements of genes (y-axis); black cells indicate no change in binding over time; red cells are increased binding and blue cells are decreased binding. Genes were clustered into 8 groups based on their binding profile. On the left, indicated are clusters that are enriched (p<10^-10) in anti-viral, inflammatory, early induced (induced within 1 hour), and late-induced genes (induced after 2 hours). Figure 6C shows enrichment of factor binding at inflammatory (dark gray) or antiviral (light gray) genes. Displayed values are -log10 of the hypergeometric p-value. Figure 6D shows a bar plot of the percentage of induced genes (>2 fold change compared to basal state) and average transcription levels (taking the maximum across time) in sets of genes with similar numbers of bound TFs (using the compressed binding data). Figure 6E shows a matrix of the percent of gain (red) or loss (blue) of binding on late (> two hours, top row), intermediate (between the first and second hours, middle row) and early (up to one hour, bottom row) induced genes. Gain or loss events that occurred after the induction of the gene were ignored. Only significant enrichments are shown.
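The -log10 hypergeometric p-value displayed in Figure 6C can be illustrated with a short Python sketch. How the test is parameterized here is an assumption made for illustration (population = all genes considered, successes = genes in a category such as inflammatory genes, draws = genes bound by a factor, observed = the overlap); the function names are hypothetical, and only the standard library is used.

```python
import math

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: total number of genes (assumed background)
    successes:  genes in the category of interest
    draws:      genes bound by the factor
    k:          observed overlap between bound genes and the category
    """
    k = max(k, 0)
    upper = min(successes, draws)
    total = math.comb(population, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

def neg_log10_enrichment(k, population, successes, draws):
    """-log10 of the upper-tail p-value; infinite if the overlap is impossible."""
    p = hypergeom_upper_tail(k, population, successes, draws)
    return math.inf if p == 0 else -math.log10(p)

if __name__ == "__main__":
    # Hypothetical numbers: 10 genes, 4 in the category, 3 bound, 2 overlapping.
    print(hypergeom_upper_tail(2, 10, 4, 3))
    print(neg_log10_enrichment(2, 10, 4, 3))
```

An upper-tail test is used because enrichment asks whether the overlap is at least as large as observed; a different tail or background set would change the values.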

Figure 7 shows feedbacks and robustness in the DC transcriptional network.

Figure 7A shows regulatory interaction between the various TFs. Top: TF binding during the unstimulated state (t=0). Bottom: Changes in interactions after stimulation (t>0). Gain of binding is depicted in red. Loss of binding is depicted in blue. Figure 7B: Top: Binary matrix indicating TF binding at the signature immune genes (Amit et al., 2009); Bottom: Heat map showing expression in DCs infected with TF targeting shRNAs (labels of the TF are shown below) compared to DCs infected with control shRNA 6 hours post stimulation (Amit et al., 2009). Figure 7C: Top: Percent of functional binding: for each TF, presented is the number of bound genes that are affected by its knockdown divided by the total number of bound genes; Bottom: percent of indirect effect: for each TF, presented is the number of non-bound genes that are affected by its knockdown divided by the total number of affected genes. The analysis is limited to the signature set of genes (Amit et al., 2009).
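The two ratios defined for Figure 7C can be written out as a minimal Python sketch. The function names, the use of gene-name sets and the toy gene lists are hypothetical; returning 0.0 for an empty denominator is an assumption made only to keep the sketch runnable.

```python
def functional_binding_percent(bound, affected):
    """Percent of bound genes that are affected by the TF knockdown."""
    if not bound:
        return 0.0  # assumption: define the ratio as 0 when nothing is bound
    return 100.0 * len(bound & affected) / len(bound)

def indirect_effect_percent(bound, affected):
    """Percent of affected genes that are not bound by the TF."""
    if not affected:
        return 0.0  # assumption: define the ratio as 0 when nothing is affected
    return 100.0 * len(affected - bound) / len(affected)

if __name__ == "__main__":
    # Hypothetical signature genes for one TF.
    bound = {"g1", "g2", "g3", "g4"}
    affected = {"g2", "g3", "g5"}
    print(functional_binding_percent(bound, affected))
    print(indirect_effect_percent(bound, affected))
```

Both sets would be restricted to the signature gene set before the ratios are computed, as stated in the figure description.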

Figure 8 shows that diversity of binding properties suggests a layered TF architecture. Figure 8A shows principal components analysis performed with a binding characteristic matrix which includes: the number of bound regions, Turnover score, ratio of enhancer to promoter binding, Pol-II runner score and the fraction of high scoring motif matches that are bound by the TF. The plot depicts the projections of the TFs and the loading of the different covariates for the first three principal components. Figure 8B shows a model depicting the layered TF network architecture: pioneer factors initially bind and initiate remodeling of the epigenome, strong binders prime targets for expression and specific TFs control expression of smaller subsets of genes.

Figure 9 compares traditional ChIP and iChIP. Figure 9A shows that Pol-II precipitated DNA (1 ng) was split through the traditional ChIP-Seq and iChIP (HT-ChIP-Seq) library production process. The matrix shows the correlation of the two library production processes. Figure 9B shows IGV tracks of the Zfp36 locus for the traditional ChIP-Seq (red) and iChIP library production process.

Figure 10 shows a Western blot for the ChIP TFs. Cell lysates from DCs activated for 2 hours with LPS were subjected to Western blotting (WB) using the indicated antibodies.

Figure 11 shows the percentage of transcriptional changes (induction/repression) during the first two hours post stimulation. A gene is defined to be induced (repressed) if it increased (decreased) by at least 2-fold. The cumulative plots depict, for every time point (x-axis), the percentage of induced genes that already showed at least 2-fold change (y-axis). Panel A was computed with the RNA-Seq data, panel B with the 4SU-seq and panel C with Pol-II binding data.
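The cumulative induction curve described for Figure 11 can be sketched in Python as follows. The function name, the input layout (gene mapped to a list of values, index 0 being the unstimulated state) and the toy values are hypothetical; the sketch assumes strictly positive basal values and handles induction only, not repression.

```python
def cumulative_induction(values_by_gene, fold=2.0):
    """For each time point, percent of induced genes already at >= `fold` change.

    values_by_gene: dict of gene -> list of measurements; index 0 is the basal
    (unstimulated) state. Basal values are assumed to be strictly positive.
    A gene counts as induced if any later time point reaches fold * basal.
    """
    if not values_by_gene:
        return []
    n_points = len(next(iter(values_by_gene.values())))
    first_hit = {}
    for gene, series in values_by_gene.items():
        for t in range(1, n_points):
            if series[t] >= fold * series[0]:
                first_hit[gene] = t  # earliest time point crossing the threshold
                break
    if not first_hit:
        return [0.0] * n_points
    return [
        100.0 * sum(1 for hit in first_hit.values() if hit <= t) / len(first_hit)
        for t in range(n_points)
    ]

if __name__ == "__main__":
    # Hypothetical expression values at three time points.
    toy = {"a": [1.0, 3.0, 4.0], "b": [2.0, 3.0, 5.0], "c": [5.0, 5.0, 6.0]}
    print(cumulative_induction(toy))
```

Genes that never reach the threshold (gene "c" in the toy data) are excluded from the denominator, so the curve reaches 100 percent at the last time point at which any gene is first induced.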

Figure 12 shows an example of previously unannotated promoters at the Ncoa6 and Lhx2 loci. Combining RNA-Seq reconstruction with iChIP of chromatin marks reveals start sites and promoter regions. Figure 12A shows the Ncoa6 gene. The tracks in top-down order show: 1) Annotations in the RefSeq database (black), 2) Reconstructed transcripts using total RNA-Seq data (blue), promoters called by our annotation pipeline (gray box), the arrow points to the annotated promoter for Ncoa6, 3,4) RNA-Seq read density plots obtained from DCs before LPS stimulation and four hours post stimulation (blue), 5,6,7) iChIP read density plots for "compressed" data for H3K4me1, H3K4me3 and H3K27Ac (gray). Figure 12B shows the Lhx2 locus. In addition to the tracks displayed in Figure 12A, enhancer annotations and iChIP binding data for Stat1 and PU.1 were included. Black arrows indicate promoters associated with genes through RNA-Seq reconstructions, red arrows indicate promoters associated with unannotated transcripts reconstructed from RNA-Seq.

Figures 13A-13G show scatter plots of peak scores across the time course. Scatter plots show peaks that scored above an enrichment cutoff of 5 for chromatin marks and above a cutoff of 10 for TFs in at least one of the libraries. Red lines indicate the x=y line and the 2 fold threshold for the chromatin plots and the 3 fold threshold for the TF plots. Figure 13F also includes scatter plots for Stat1 biological replicates at 2 hours post stimulation. The scatter plots shown in Figure 13F compare two biological replicates against the library used in the main analysis.

Figure 14 shows the most significant motifs found. A summary of all motifs found by applying MEME to the set of high scoring peaks (enrichment > 30) for each factor. Motifs shown have an E-value > 0.01.

Figure 15 shows TF "running" with Pol2. Figure 15A shows IGV browser tracks of the IL1-1B locus for RNA-Seq, Pol-II, chromatin marks, E2f1, and PU.1 at the indicated time points. Figure 15B shows a bar plot of the running score for each TF and for Pol-II (brown) and Ctcf (black), included respectively as positive and negative controls of association with Pol-II. Error bars were computed using the standard error. Figure 15C shows a Western blot of co-immunoprecipitations of Stat1 and PU.1 with Pol-II and IgG (control), respectively.

Figure 16 shows the distribution of TF binding in regulatory regions. Figure 16A shows the average degree (number of bound TFs) of bound regions in the original data (red bars) and in the randomized data (blue bars). Figure 16B shows the percentage of bound regions that have a degree of 1, observed (red asterisk) and expected from the randomized data (blue bars). Figure 16C depicts, for each degree value (x-axis), the expected ratio of regions with this degree (y-axis). Plots are shown for the original data (red) and the randomized data (blue).

Figure 17 shows turnover score with different fold cutoffs. The percentage of dynamic changes in binding (gain or loss) with different fold cutoffs. For each TF, f, and each cutoff level, x, the figure depicts the percentage of genes bound by f that have more than x-fold change (up/down) in their binding enrichment score, in at least one time point, compared to the basal state.
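By way of non-limiting illustration, the percentage described for Figure 17 may be computed for one TF and one cutoff as sketched below. The function name and the input layout (a mapping from each bound gene to its binding enrichment scores, with index 0 taken as the basal state) are assumptions made for this sketch.

```python
def turnover_percent(scores, cutoff):
    """Percentage of bound genes whose binding changes more than cutoff-fold.

    scores maps gene -> list of binding enrichment scores for one TF; index 0
    is assumed to be the basal state. A gene is dynamic if, in at least one
    later time point, its score is more than cutoff-fold above (gain) or
    below (loss) the basal score.
    """
    if not scores:
        return 0.0
    dynamic = 0
    for values in scores.values():
        basal = values[0]
        # Multiplicative comparisons avoid dividing by a zero score.
        if any(v > cutoff * basal or v * cutoff < basal for v in values[1:]):
            dynamic += 1
    return 100.0 * dynamic / len(scores)
```

Evaluating the function over a range of cutoff values for each TF yields one curve per TF of the kind the figure describes.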

Figure 18 shows TF binding in enhancers and promoters. Figure 18A shows a heatmap of TF binding (columns) in promoters (rows). Figure 18B shows the number of promoters bound by every TF. Figure 18C shows a heatmap of TF binding (columns) in enhancers (rows). Figure 18D shows the number of enhancers bound by every TF.

Figure 19 shows an example of our enhancer and promoter annotation: the cis regulatory landscape of the Tnfaip3 locus. IGV browser tracks for the 150 Kb regulatory region of the Tnfaip3 gene showing iChIP compressed data for Pol-II, histone modifications and 24 transcription factors. The example highlights the shared enrichment for TF ChIP data at enhancers and promoters.

Figure 20 shows preference for binding at induced genes or at highly expressed genes at basal state. Figure 20A shows the percentage of genes bound by each TF (x-axis) in groups of genes with different expression levels (1st to 5th quantile of fold-change using the 4SU-Seq data). The average level of expression in each bin is depicted on the left. Non-significant entries are depicted in gray. Figure 20B shows a similar analysis as in Figure 20A, where the genes are grouped by their post-stimulation induction level (using the 4SU-Seq data). Figure 20C shows a similar analysis as in Figure 20A, where the genes are grouped by their induction level, only considering genes with a low basal transcription level (lower than the value of the 3rd quantile (out of 10)).

Figure 21 shows Irf2 knockdown and binding. Heat map of genes affected by Irf2 knockdown at the basal state. The cells were compared to DCs infected with non-targeting shRNA at the basal state. Colors represent the effect of knockdown on expression levels measured with NanoString nCounter®. The second column indicates in black genes that are directly bound by Irf2 at the basal state.

Figure 22 shows enriched TF binding at different clusters of co-transcribed genes. mRNA transcription (using 4SU-Seq) and Pol-II binding are shown for clusters of genes with a similar temporal profile. Significant binding of TFs to the different clusters is shown across 4 time points of LPS stimulation (0, 0.5, 1, and 2 hours). The colors in the heat maps represent the percentage of bound genes at the respective time point.

Figure 23 shows that the Histone gene clusters are repressed following LPS stimulus. IGV browser tracks for the Histone gene cluster for the indicated sequencing libraries (RNA-Seq, Pol-II, chromatin marks, and various TFs as indicated) and time points.

Figure 24 is a schematic that provides an overview of the iChIP method and antibody validation.

Figure 25 is a schematic providing an overview of the NANOSTRING® platform and quantitative analysis using fluorescently labeled reporter probes and non-labeled capture probes.

Figure 26A shows plots of data from iChIP DNA that was run on ChIP-string and was compared to ChIP-seq data of the same cells. Figure 26B shows ChIP-string probes that were designed for specific chromatin states according to an HMM tool (Ernst, et al., Nature biotechnology (2010) 28:817-825). The left-most column is a representation of 198 informative probes of five different chromatin states, as follows: blue - bivalent: H3K27me3, H3K4me2, H3K4me3; - initiation: H3K4me2, H3K4me3, H3K9Ac, H3K27Ac; gold - enhancers: H3K4me1, H3K4me2, H3K27Ac; orange - heterochromatin: H3K9me3; purple - silencing: H3K27me3. The remainder of the columns represent clustered Z-score discretization of the enrichments of different probes for 122 ChIP antibodies (in the order set forth in Table 1), shown as a gradient of red in which higher Z-scores are marked as darker red. Circles - Clusters A1-3 (Initiation): H3K4me3 / H3K9ac / RBbP5 / CHD1 / SIRT6 / KAT3A / Ash1 / Plu1 / JHDM1D / Sap30 / RNApolII(5s) / MLL4; Purple circle - Cluster B (Silencing): H3K27me3 / EZH2 / SUZ12 / HDAC1; Gold circle - Cluster C (Enhancers): H3K79me2 / KMT4 / Ring1B / KMT4 / NSD2. Arrows indicate antibodies that were followed up by a successful ChIP-seq. Figure 26C depicts genomic viewer examples of these data (top - HG19, chr13:91,986,894-92,018,706; bottom - chr9:133,556,867-134,180,979).

Figure 27 shows antibody correlation within the SCN and with the relevant 20 million cell ChIP-string data.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION

Provided herein, in certain aspects, is a high-throughput indexed (iChIP) method for systematic mapping of in vivo protein-DNA binding that greatly increases the throughput, while significantly reducing the labor and cost required for ChIP-Seq. In some aspects the iChIP method is automated. In certain aspects, iChIP uses, e.g., magnetic beads for chromatin immunoprecipitation and DNA purifications, thus, in some instances, eliminating the need for laborious manual washes, DNA purification and gel extraction steps. In some embodiments, the entire iChIP process can be performed in the same well (or chamber, channel, droplet, or the like), in some instances reducing sample loss due to, for example, transfer of material. In some embodiments, iChIP further leverages the yield of current next-generation sequencing by multiplexing an arbitrary number of different indexed sequencing adapters to combine samples in, for example, a single flow cell. Provided herein are iChIP methods that produce high quality immuno-precipitated material with yields superior to (e.g., greater than) existing protocols. It was found that a second purification step introduced in the iChIP protocol significantly improves the quality and/or yield of the immuno-precipitated material to an unexpected degree when compared to existing protocols, allowing, for example, subsequent analyses of low abundance chromatin-associated factors using methods such as ChIP-Seq and ChIP-on-chip.

In certain embodiments, the second purification step involves the purification of nucleic acid fragments after they have been released from the specific chromatin-associated factor and/or antibody with which or to which the nucleic acid fragments were bound.

Purification of the nucleic acid fragments by means of an affinity molecule after the release of the fragments, in some embodiments, enhances the quality and/or yield of the nucleic acids. The purified nucleic acids can provide a starting material for subsequent analyses that is unexpectedly superior (e.g., higher quality material, greater yield of material) to the starting material obtained when using existing ChIP methods. For example, the material is suitable for the detection of binding sites or regions on the chromatin of low abundance chromatin-associated factors using methods such as ChIP-Seq and ChIP-on-chip. The purified nucleic acids can provide starting material for generating sequencing libraries, described below.

In certain embodiments, the purification steps comprise immobilizing the nucleic acid fragments on a substrate, such as a bead, membrane, or surface (e.g., of a well or tube) that is coated with an affinity molecule suitable for immobilizing the nucleic acid fragments. In certain embodiments, the affinity molecule is silica or carboxyl-coated magnetic beads (SPRI beads). In certain embodiments, the library (e.g., for next generation sequencing applications, such as Illumina sequencing (Illumina Inc., San Diego, CA)) is constructed on magnetic particles, removing the necessity for DNA purification prior to the ligation of adapters and amplification (e.g., for next generation sequencing applications, such as Illumina). The same DNA absorbing magnetic beads can then be used to purify the resulting library. In some embodiments, a further advantage of providing an affinity surface in a well or as a bead, e.g., a magnetic bead, is that the iChIP protocol may be adapted for parallel processing of multiple samples, such as in a 96-well format or microfluidic platform, from starting chromatin material to the end of a sequencing library construction and purification. Existing protocols may suggest phenol-chloroform extraction or column-based nucleic acid purification, which can lead to a substantial loss of sample material, and such methods may not be adaptable to a high-throughput format.

DNA binding proteins and chromatin modifiers can be difficult to detect reliably using ChIP protocols because of their relatively low abundance on the chromatin compared to, for example, many histone tail modifications, such as H3K4me3. ChIP performed on such abundant modifications can be very efficient and robust. A high percentage of the chromatin may be in association with modified histones. Moreover, as the DNA is wrapped tightly around histones (e.g., the nucleosome octamer), the DNA yield enriched in such experiments can be relatively high, and suffices for any downstream processes.

DNA binding proteins and chromatin modifiers (or other proteins that do not bind the DNA itself, and are only a part of a complex that binds the DNA, e.g., chromatin-associated factors) are orders of magnitude less abundant across the genome, and the DNA interactions of the DNA-binding proteins and associated factors are much weaker when compared to histones. The low abundance and the weak interactions with DNA are among the factors that may make a ChIP for DNA-binding proteins more susceptible to small variations, so that a higher sensitivity is required to obtain accurate data. Current methods with their inherent shortcomings in reproducibility and/or sensitivity may not allow for a large scale screen of DNA binding proteins and chromatin modifiers. Further factors that influence the sensitivity of the ChIP assay are, for example: (1) the shearing process, which may be more sensitive to small differences when fragmenting chromatin with DNA binding proteins and may contribute to the difficulty of obtaining sufficient amounts of DNA that were in association with the DNA binding proteins; and (2) the very low amounts of DNA that can be obtained by ChIP of DNA binding proteins and chromatin modifiers, which may lower the overall yield. Very low yields can make it difficult to purify the DNA, a step which is often necessary for subsequent analysis. The low DNA yield generally obtained for ChIP assays involving DNA binding proteins and chromatin modifiers that are carried out using existing ChIP protocols can result in low reproducibility between repeats and can make it difficult to obtain reliable and unbiased data. ChIP assays using antibodies directed to histone modifications usually yield sufficient DNA, and the yield may be, for example, about two orders of magnitude higher than the yield from ChIP assays involving DNA binding proteins and chromatin modifiers. Due to the relatively higher DNA yield, ChIP assays involving histone modifications exhibit relatively lower susceptibility to small experimental variations, which makes such assays less prone to experimental biases. Further, existing protocols can be inefficient, time consuming and difficult if not impossible to scale up to allow parallel processing of larger sample sizes, such as is needed in high throughput screening.

Currently available ChIP protocols and/or commercially available ChIP kits are not optimal for high throughput ChIP screening. They do not provide sufficient sensitivity and/or reproducibility needed to screen large numbers of DNA binding proteins and chromatin modifiers. Provided herein, in some embodiments, are iChIP methods to obtain high quality ChIP-DNA (iChIP-DNA). In certain embodiments, the methods can be carried out easily and data can be obtained reproducibly. In certain embodiments, these methods are used to screen large numbers of DNA binding proteins and/or chromatin modifiers. In certain embodiments, the methods provided are used to screen 10, 50, 100, 200, 500, 750, or 1000, or more DNA binding proteins and/or chromatin regulators (CRs) and modified forms thereof. Modified forms include, but are not limited to, mutants and post-translationally modified DNA binding proteins and/or chromatin modifiers.

In certain embodiments, the methods provided are used to screen one or more of the following DNA binding proteins and/or chromatin modifiers and modified forms thereof: ASH1L, ASH2, ATF2, ASXL1, BAP1, bcl10, Bmi1, BRG1, CARM1, KAT3A/CBP, CDC73, CHD1, CHD2, CTCF, DNMT1, DOTL1, EHMT1, ESET, EZH1, EZH2, FBXL10, FRP(Plu-1), HDAC1, HDAC2, HMGA1, hnRNPA1, HP1 gamma, Hset1b, Jarid1A, Jarid1C, KIAA1718_JHDM1D, KAT5, KMT4, LSD1, NFKB P100, NSD2, MBD2, MBD3, MLL2, MLL4, P300, pRB, RbAP46/48, RBP1, RbBP5, RING1B, RNApolII P S2, RNApolII P S5, ROC1, sap30, setDB1, Sf3b1, SIRT1, Sirt6, SMYD1, SP1, SUV39H1, SUZ12, TCF4, TET1, TRRAP, TRX2, WDR5, WDR77, and/or YY1. Antibodies for these DNA binding proteins and/or chromatin modifiers are commercially available. Exemplary antibodies that are commercially available are set forth in Tables 1 and 3.

In certain aspects, provided herein are iChIP methods that are adapted to be suitable for semi- or full automation. In some embodiments, the ChIP methods described herein may be practiced by a person of average or limited skill in the art of chromatin immuno-precipitation or even an unskilled person. The iChIP methods provided herein, in some embodiments, reduce the time required to carry out the methods and/or reduce the sample-to-sample variability (e.g., improve reproducibility) in quality and/or yield, when compared to existing ChIP methods. In certain aspects, provided herein are iChIP methods that are adapted to be suitable for microfluidic applications, for example, for use with DNA chips (microarrays), lab-on-chip technology, micro-propulsion, micro-thermal technologies, continuous-flow microfluidics, and/or digital (droplet-based) microfluidics.

Improved reproducibility as used herein means that sample-to-sample variation within an individual experiment or between different experiments is reduced compared to existing protocols and allows better sample comparison. For example, a standard or positive control may give a similar signal (or is positive) about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, or more of the time when conducting or repeating an experiment using the methods provided herein, given that the methods are carried out in the same manner (e.g., incubation times) and the concentrations used are about the same. Low reproducibility is indicated when a standard or positive control gives a similar signal (or is positive) less than 75% of the time when conducting or repeating an experiment, e.g., 70%, 60%, 50%, 40%, 30%, or less. The signals that are detected and controls that are used may vary depending on the assays and applications used. In certain embodiments, the iChIP methods described herein provide improvements with regard to the quality and/or quantity of the immuno-precipitated material, so that false negative sample analysis is reduced. A false negative sample analysis, as used herein, means that a sample when analyzed (e.g., subjected to PCR amplification, sequencing, ChIP-Seq, ChIP-on-chip) may not give a signal that can be unequivocally distinguished from the background or a negative control, e.g., the signal may not show a difference that is statistically significant when compared to the background signal or a signal from a negative control. Such a sample may then be considered a negative sample. The negative sample is a false negative sample when the sample prepared under different conditions would give a signal that is distinguishable from the background signal or from the signal of a negative control. Conditions that may cause a sample to be a false negative sample are, e.g., insufficient sensitivity of the detection method, insufficient quantity of starting material for the detection, and/or insufficient quality of the starting material for the detection. In certain embodiments, provided herein are iChIP methods that when performed provide high quality immuno-precipitated materials. In certain embodiments, the purity of the immuno-precipitated materials may be greater than 70%, 75%, 80%, 85%, 90%, 95%, 98%, or more. In some embodiments, the purity is about 80% to about 90%, about 90% to about 95%, about 90% to about 98%, about 90% to about 100%, about 95% to about 98%, or about 95% to about 100%. In particular embodiments, the purity of the immuno-precipitated materials may be 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.

In some embodiments, the iChIP methods provided herein improve yield of immuno-precipitated material. Improved yield, as used herein, means a yield of immuno-precipitated material (e.g., chromatin/DNA) that is greater compared to existing protocols and allows the subsequent detection of a low abundance signal that would not be detectable, or would give a signal that is not significantly different or not statistically different from a background signal, using immuno-precipitated material from existing protocols. In some embodiments, the yield of immuno-precipitated material produced by the methods described herein may be improved over existing methods by about 10% to about 500%, or about 10% to about 100%. In some embodiments, the improvement is by about 10% to about 25% or about 25% to about 50%. In other embodiments, the yield is improved by greater than 10%, greater than 20%, greater than 30%, greater than 40%, greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90%, greater than 100%, greater than 200%, or more. For example, using the iChIP methods described herein, chromatin-bound factors may be immuno-precipitated that are of low abundance in the chromatin fraction, e.g., certain transcription factors. Chromatin (DNA) binding sites of such factors may not be identified using existing protocols. In contrast, immuno-precipitation using antibodies specific for certain abundant histone modifications (e.g., certain histone H3 tail modifications) usually produces very robust signals that are easier to detect. The signals that are detected may vary depending on the assays and applications used.

Low abundance chromatin-associated factors, as used herein, are factors that can be found at one or more sites on the chromatin and/or that may associate with chromatin in a transient manner. Examples of low abundance chromatin-associated factors include, but are not limited to, transcription factors (e.g., tumor suppressors, oncogenes, cell cycle regulators, development and/or differentiation factors, general transcription factors (TFs)), activator (e.g., histone acetyl transferase (HAT)) complexes, repressor (e.g., histone deacetylase (HDAC)) complexes, co-activators, co-repressors, other chromatin-remodelers, e.g., histone (de-)methylases, DNA methylases, replication factors and the like. Such factors may interact with the chromatin (DNA, histones) at particular phases of the cell cycle (e.g., G1, S, G2, M-phase), upon certain environmental cues (e.g., growth and other stimulating signals, DNA damage signals, cell death signals), upon transfection and transient or stable expression (e.g., recombinant factors) or upon infection (e.g., viral factors). Abundant factors are constituents of the chromatin, e.g., histones. Histones may be modified at histone tails through posttranslational modifications which alter their interaction with DNA and nuclear proteins and influence, for example, gene regulation, DNA repair and chromosome condensation. The H3 and H4 histones have long tails protruding from the nucleosome which can be covalently modified, for example by methylation, acetylation, phosphorylation, ubiquitination, sumoylation, citrullination and ADP-ribosylation. The core of the histones H2A and H2B can also be modified. Combinations of modifications are thought to constitute the so-called "histone code" (Strahl and Allis (2000) Nature 403 (6765): 41-5; Jenuwein and Allis (2001) Science 293 (5532): 1074-80). Such modifications can also be analyzed by ChIP.

In certain embodiments, ChIP methods are provided that allow sample processing in a high-throughput manner. For example, 10, 50, 100, 200, 500, 750, 1000, or more chromatin-associated factors and/or chromatin modifications may be immuno-precipitated and/or analyzed in parallel. In one embodiment, up to 96 samples may be processed at once, using e.g., a 96-well plate. In other embodiments, fewer or more samples may be processed, using e.g., 6-well, 12-well, 32-well, 384-well or 1536-well plates. In some embodiments, ChIP methods are provided that can be carried out in tubes, such as, for example, common 1.5 ml, 2.0 ml, 15 ml, 50 ml size tubes. These tubes may be arrayed in tube racks, floats or other holding devices.
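By way of non-limiting illustration, the number of plates needed to process a given number of samples in parallel follows from the well counts named above. The function and constant names in the sketch below are hypothetical, and the sketch assumes one sample per well with no wells reserved for controls.

```python
import math

# Well counts of the plate formats named in the text.
PLATE_FORMATS = (6, 12, 32, 96, 384, 1536)


def plates_needed(n_samples, wells_per_plate=96):
    """Return the number of plates required for n_samples.

    Assumes one sample per well and no wells set aside for controls.
    """
    if wells_per_plate not in PLATE_FORMATS:
        raise ValueError(f"unsupported plate format: {wells_per_plate}")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return math.ceil(n_samples / wells_per_plate)
```

For example, under these assumptions 1000 factors screened one per well would occupy eleven 96-well plates or three 384-well plates.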

For any one of the embodiments described herein, the immuno-precipitated chromatin may be prepared from harvested cells (e.g., subsequently subjected to sonication). In certain embodiments, the immuno-precipitated chromatin may be prepared from a single sample of about 1 million to about 20 million cells, or more. In certain embodiments, immuno-precipitated chromatin may be prepared from a single sample of about 5 cells to about 1 million cells. In particular embodiments, a sample may comprise about 50 cells, about 100 cells, about 150 cells, about 200 cells, about 300 cells, about 400 cells, about 500 cells, about 1000 cells, about 2000 cells, about 3000 cells, about 4000 cells, about 5000 cells, about 10,000 cells, about 20,000 cells, about 30,000 cells, about 40,000 cells, about 50,000 cells, about 100,000 cells, about 200,000 cells, about 300,000 cells, about 400,000 cells, about 500,000 cells, or about 1,000,000 cells. In some embodiments, a sample may comprise about 100 cells to about 10,000 cells, or about 10,000 cells to about 100,000 cells, or more.

In certain aspects, methods for ChIP are provided. In certain embodiments, the methods comprise optimized sonication conditions for shearing the immuno-precipitated chromatin. Commonly, in order to keep the sample cold and avoid overheating by the sonication waves, sample tubes are held on ice. This practice can introduce experimental variation because the ice itself changes during the sonication, and from experiment to experiment, which may limit the reproducibility of the process. Moreover, as the sonication waves produce heat, the sample temperature can change during the process in an uncontrolled manner, which can also increase the variation between repeats. Further, as the temperature of the sample is not kept constant, the state of different molecular areas within the tube can change in an uncontrolled way. Another difficulty is that the samples commonly are not located in a specific position, e.g., due to the variations in the amounts of ice and/or due to inaccuracy in the placement of the tube. This may lead to sample-to-sample variation.

In certain embodiments, the methods comprise optimized sonication conditions for shearing the immuno-precipitated chromatin, wherein the optimization includes, but is not limited to, optimizing: i) sample cooling conditions, ii) sample volume, iii) probe size, iv) sample/probe contact, v) total duration of sonication, vi) duration of sonication pulse, vii) duration of cooling down phase between pulses, viii) pulse intensity, ix) pulse frequency, and/or x) pulse cycles. Specific optimization steps are described herein. In certain embodiments, the exact probe location is maintained in all samples, that is, the three-dimensional spatial orientation of the probe in a well or tube: a) the distance from the bottom of the well or tube to the tip of the probe; b) the distance from any of the walls of the well or tube to the tip of the probe; c) the distance of the tip of the probe to the surface of the sample liquid (i.e., the immersion depth of the probe). The exact spatial orientation can be set and maintained using mechanical or electromechanical devices, for example, by using levers that control the insertion of the sonication probe into the tube or well.

In certain embodiments, the temperature of the sample and/or cooling device is kept constant, e.g., by controlling the flow of coolant, the thermal capacity of the coolant, the size of the contact surface of the cooling device and the well or tube and/or adapting the thickness of the well or tube to optimize thermal exchange between the cooling device and the well or tube. Examples of sonication conditions include:

- use 45% amplitude on a Branson SONIFIER™ S-450D (www.sonifier.com/s450_digital.asp)

- three repeats of a pulse of 0.7 seconds and a pause of 1.3 seconds, wherein the time per repeat is adjusted according to the cell source used. For example, the time per repeat for dendritic cells is 5 minutes (a total of 15 minutes of sonication), for embryonic stem cells is 5.5 minutes (16.5 minutes total), and for K-562 cells (immortalized myelogenous leukemia line) is 4.5 minutes (13.5 minutes total).

In certain embodiments, the number of repeats is one, two, four or five. In certain embodiments, the amplitude is 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, or 65%. In certain embodiments, the pulse is 0.3, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0 or 1.1 seconds. In certain embodiments, the pause is 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 or 2.0 seconds. In certain embodiments, the pause is 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 seconds. In certain embodiments, the time per repeat is 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5 or 10 minutes. It would be appreciated by one of ordinary skill that the sonicator described herein can be substituted with a different make or model and that the amplitude, pulse and pause times can routinely be adjusted according to the specification of the substitute sonicator device.
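By way of non-limiting illustration, the arithmetic relating pulse, pause, repeats and time per repeat may be sketched as follows. The class and attribute names are hypothetical. The sketch further assumes that the time per repeat is elapsed time covering both pulses and pauses, which the description does not state explicitly; under that assumption the time during which the probe is actually pulsing is the total time multiplied by the duty cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SonicationProgram:
    """Pulse/pause schedule; defaults follow the dendritic cell example."""

    pulse_s: float = 0.7             # pulse duration in seconds
    pause_s: float = 1.3             # pause between pulses in seconds
    repeats: int = 3                 # number of repeats
    minutes_per_repeat: float = 5.0  # time per repeat in minutes

    @property
    def total_minutes(self) -> float:
        """Total programmed time over all repeats."""
        return self.repeats * self.minutes_per_repeat

    @property
    def duty_cycle(self) -> float:
        """Fraction of each pulse/pause cycle spent pulsing."""
        return self.pulse_s / (self.pulse_s + self.pause_s)

    @property
    def pulse_on_minutes(self) -> float:
        """Minutes of active pulsing, ASSUMING time per repeat includes pauses."""
        return self.total_minutes * self.duty_cycle
```

With the default values the program totals 15 minutes, matching the dendritic cell example. Setting `minutes_per_repeat` to 5.5 or 4.5 gives the 16.5 and 13.5 minute totals stated for embryonic stem cells and K-562 cells.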

Other cell lines that may be subjected to the sonication conditions described herein include, but are not limited to, 293-T (human kidney), 3T3 (mouse embryonic fibroblast), 721 (human melanoma), 9L (rat glioblastoma), A2780 (human ovary), A172 (human glioblastoma), A20 (murine B lymphoma), A253 (human head and neck carcinoma), A431 (human skin epithelium squamous carcinoma), A-549 (human lung carcinoma epithelium), ALC (murine bone marrow stroma), B16 (murine melanoma), B35 (rat neuroblastoma), BHK-21 (hamster kidney fibroblast), BR 293 (human breast cancer), Cal-27 (human tongue squamous cell carcinoma), CHO (Chinese hamster ovary), COR-L23 (human lung), COS-7 (Cercopithecus aethiops, kidney fibroblast), COV-434 (human ovary metastatic granulosa cell carcinoma), CML T1 (chronic myeloid leukaemia T-lymphocyte 1), CMT (canine mammary gland tumor epithelium), CT26 (murine colorectal carcinoma), D17 (canine osteosarcoma), EL4 (mouse T cell leukemia), FM3 (human lymph node melanoma), H1299 (human lung cancer), H69 (human lung), HEK-293 (human embryonic kidney), HeLa (human cervical cancer epithelium), HL-60 (human leukemia myeloblast), HMEC (human mammary epithelial), HT-29 (human colon epithelium), Jurkat (human T-cell leukemia), K562 (CML), Ku812 (human erythroleukemia), KCL22 (human CML), LNCap (human lymph node cancer), MC-38 (mouse adenocarcinoma), MCF-7 (Michigan Cancer Foundation-7, human mammary gland carcinoma), NALM-1 (CML), NW-145 (melanoma), RenCa (mouse renal carcinoma), RIN-5F (mouse pancreas), Saos-2 cells (human osteosarcoma), SkBr3 (human breast carcinoma), T-47D (human mammary gland carcinoma), T84 (human colorectal carcinoma), THP1 (human AML), U373 (human glioblastoma-astrocytoma), U937 (human lymphoma), and WT-49 (human lymphoblastoid).

In certain embodiments, the methods for iChIP comprise introducing a nucleic acid purification step of immuno-precipitated chromatin. The nucleic acid purification step comprises attaching the nucleic acid to a suitable substrate or object as a means of separating and isolating the nucleic acid from a reaction mix (e.g., comprising buffers and/or proteins). In certain embodiments, the substrate is a silica-coated surface or object. In other embodiments, the substrate or object is coated with a molecule that is suitable for affinity interactions. Such molecules suitable for affinity interactions may comprise nucleic acids, such as small oligonucleotides. Such small oligonucleotides may comprise poly-T sequences that can be utilized in affinity interactions with nucleic acids that comprise poly-A tails. Poly-A tails may be added to the nucleic acids enzymatically. Other molecules suitable for affinity interactions can be avidin or streptavidin that can be utilized in affinity interactions with nucleic acids that comprise biotin (e.g., biotinylated DNA). Silica or other molecules facilitating affinity interactions to nucleic acids may coat any surface or object, such as a bead, a resin, a gel, a well, a tube, a membrane, a filter or a surface thereof.

In certain embodiments, high-throughput iChIP methods are provided. The methods include one or more of the following steps:

a) cross-linking chromatin-associated factors to chromatin,

b) shearing the cross-linked chromatin of (a) to provide nucleic acid fragments,

c) contacting the nucleic acid fragments of (b) with an affinity molecule,

d) isolating the nucleic acid fragments, and

e) analyzing the distribution and enrichment of the isolated nucleic acid fragments,

wherein one step of immobilizing the affinity molecule in (c) and one step of immobilizing the nucleic acid fragments in (d) are used to provide a high-throughput iChIP method.

In one embodiment, the affinity molecule in (c) is an antibody that is coupled to a substrate by affinity interaction. For example, an antibody that is specific for a chromatin-associated factor bound (cross-linked) to the nucleic acid fragment is coupled to a substrate or object (e.g., a bead, a membrane, or a well/tube surface) via a second affinity molecule that facilitates interaction with the antibody (e.g., protein A or protein G) that is itself coupled to the substrate. Nucleic acid bound (cross-linked) to a chromatin-associated factor recognized and bound by the specific antibody may then be isolated by capturing this antibody via the second affinity molecule.

In one embodiment, the affinity molecule in (c) is an affinity tag that is coupled to a substrate by affinity interaction. For example, a chromatin-associated factor bound (cross-linked) to the nucleic acid fragment may contain an affinity tag (e.g., FLAG, myc, biotin) and is coupled to a substrate or object (e.g., a bead, a membrane, or a well/tube surface) via a second affinity molecule (e.g., an antibody: anti-FLAG, anti-myc antibody, or streptavidin, avidin) that is itself coupled to the substrate or object. Nucleic acid bound (cross-linked) to a chromatin-associated factor containing the affinity tag may then be isolated by capturing the factor via the second affinity molecule.

If the affinity molecule is, for example, coupled to a bead, separation of the specifically bound nucleic acid fragment may comprise gravity (e.g., settling or centrifuging) or in case the bead is a magnetic bead, a magnetic field and washing steps to reduce or eliminate the fraction of non-specific nucleic acid fragments. If the secondary affinity molecule is, for example, coupled to a membrane or well/tube surface the membrane or well/tube surface may be subjected to washing steps without the need for settling, centrifuging or the use of a magnetic field.

In one embodiment, isolating the nucleic acid fragments in (d) is carried out using an interaction between the nucleic acid and an affinity molecule (e.g., poly-T oligonucleotides, avidin/streptavidin) if the nucleic acid is suitably modified for this interaction (e.g., is modified to contain poly-A tails or is biotinylated) or affinity surfaces such as silica, which interact with nucleic acids e.g., under specific buffer (pH and salt) conditions.

If the affinity molecule or surface is, for example, coupled to a bead, isolation of the nucleic acid fragment may comprise gravity (e.g., settling or centrifuging) or in case the bead is a magnetic bead, a magnetic field. If the affinity molecule or surface is, for example, coupled to a membrane or well/tube surface the membrane or well/tube surface may be used without the need for settling, centrifuging or the use of a magnetic field to immobilize and isolate the nucleic acid fragment. Nucleic acid fragments in (d) immobilized in such way can further be subjected to washing steps to reduce or eliminate any residual undesired components of the reaction mix, e.g., polypeptides, peptide fragments, phospholipids.

In certain embodiments, iChIP methods may be adapted for a high-throughput format (e.g., the parallel processing of several samples, such as 6, 12, 24, 96 or more samples) by combining the improvements described herein. A high-throughput iChIP method may include the steps of:

a) shearing chromatin in a single batch before distributing in individual reaction containers (e.g., wells/tubes) or first distributing cross-linked chromatin in individual reaction containers before chromatin shearing (this may be done in a convenient format, e.g., a 96- well plate or microfluidic chamber),

b) contacting sheared chromatin fragments bound to chromatin-associated factors with a first affinity molecule specific for the chromatin-associated factor (e.g., antibody),

c) immobilizing the first affinity molecule on a surface, e.g., a bead/magnetic bead, the surface of the reaction container (e.g., a well, tube, or nanodroplet), or a membrane coated with a second affinity molecule specific for the first affinity molecule,

d) performing washing of the immobilized molecules in (c) to reduce or diminish non-specific chromatin fragments,

e) separating sheared chromatin fragments bound to chromatin-associated factors from the first affinity molecule and/or separating the first affinity molecule from the second affinity molecule,

f) separating nucleic acid fragments from chromatin-associated factors,

g) immobilizing nucleic acid fragments in (f) on a surface, e.g., a bead/magnetic bead, the surface of the reaction container (e.g., a well or tube), or a membrane coated with an affinity molecule specific for the nucleic acid fragments (e.g., silica, DNA fragment-specific oligonucleotides (e.g., poly-T), or tag-specific affinity molecule (e.g., biotin/avidin/streptavidin)); this may be done in a convenient format, e.g., a 96-well plate or microfluidic chamber/channel, and

h) performing washing of the immobilized nucleic acid fragments in (g) to reduce or diminish any undesired residual reaction component (e.g., polypeptides, peptide fragments, phospholipids, RNA), wherein when the nucleic acid in (f) is immobilized in (g) then all steps may be carried out in a convenient high-throughput format (e.g., 96-well plates, or microfluidic chambers/channels) and may be carried out in a fully automated or semi-automated process.

In some embodiments, this may be advantageous over existing protocols that suggest nucleic acid purification using, for example, phenol extraction or column-based purification methods (e.g., QIAGEN MINELUTE™ kit). Such nucleic acid purification steps would necessitate sample transfer and handling that is difficult to automate. As described herein, in some embodiments, it is disadvantageous to omit nucleic acid purification. Subsequent nucleic acid analysis (such as sequencing (e.g., ChIP-seq), real-time PCR (qPCR), DNA microarray hybridization/ChIP-on-chip) may be impacted by incompletely purified nucleic acid and may lead to increased sample-to-sample variation and an increase in "false negative" samples.

As described herein, in certain embodiments, many of the steps of the iChIP method may be fully automated using methods and artifacts known in the art. For example, chromatin shearing (DNA fragmentation) may be accomplished using methods such as nebulization, enzymatic digestion (e.g., micrococcal nuclease), hydrodynamic shearing, and sonication. Methods to avoid common problems, such as thermal and sequence-specific biased shearing (e.g., sequence-specificity of enzymatic fragmentation), thermal degradation (e.g., heat denaturation and complex stripping of proteins), sample loss, automation difficulties, and user-based issues, are known in the art and are described herein.

For example, specific sonication conditions are provided herein. Such methods can be modified/adapted to work with any sonication platforms, e.g., Branson Standard SONIFIER® products and/or BRANSONIC® tabletop ultrasonic products (Branson Ultrasonic Corp., Danbury, CT).

Other sonication platforms include, for example, SONICMAN™ (Matrical Biosciences, Spokane, WA), a high throughput multi-probe sonication instrument configurable with 96, 384, and 1536 well formats. The SONICMAN™ platform uses disposable gasketed pin lids (in variable lengths to accommodate different well dimensions of microplates) to transfer sonic energy to each individual well and to prevent well-to-well cross-contamination.

Covaris (Woburn, MA) Adaptive Focused Acoustics™ (AFA) technology enables numerous non-contact, isothermal sample preparation processes for fragmenting DNA ranging from 100 bp to 5 kb in length. Rather than being fixed, AFA wavelengths may be scaled to be focused into a localized area of the sample. For example, in a 96-well plate of about 3 mm inner diameter wells, the wavelength is approximately 1 mm. The processes may be run isothermally.

In some embodiments, immobilization of the factor-bound sheared chromatin fragments and subsequent eluted complex-free nucleic acid fragments using affinity-based immobilization methods described herein (e.g., using beads or coated surfaces of reaction containers) allows robotic dispensing and aspiration of wash solutions and elution buffers, as well as sample transfer into new reaction containers (e.g., multi-well/microplates).

In one embodiment, silica is used to immobilize nucleic acid fragments. Without wanting to be bound by any particular theory, it is thought that the highest DNA adsorption efficiencies occur in the presence of buffer solution with a pH at or below the pKa of the surface silanol groups. It is thought that a decrease of the negative charge on the silica surface due to the high ionic strength of the buffer leads to a decrease in the electrostatic repulsion between the negatively charged DNA and the negatively charged silica. The buffer may also reduce the activity of water molecules, possibly causing the silica surface and DNA to become less coordinated by water molecules, which may aid the DNA to adsorb to the silica surface. For example, guanidinium HCl (in a GuHCl-based DNA loading buffer), a chaotrope, denatures biomolecules by disrupting the coordinating water molecules around them. This may allow positively charged ions to form a salt bridge between the negatively charged silica and the negatively charged DNA backbone in high salt concentration. The DNA can then be washed with high salt and ethanol, and ultimately eluted with low salt (e.g., Tris-ethylenediaminetetraacetic acid (EDTA) (TE) at pH 8.4).

In certain embodiments, the purified nucleic acid fragments (e.g., DNA fragments) may be transferred to a new reaction container (e.g., a 96-well plate) and used for detection of chromatin regions that are enriched for the chromatin-associated factor that is bound by the specific antibody used for iChIP, e.g., by quantitative real-time PCR (qPCR), ChIP-string arrays, or DNA sequencing. In case of low yield, which is common for DNA-binding proteins/chromatin-associated factors, the samples may be amplified, e.g., using a whole genome amplification method, which optionally may be carried out in a high throughput manner. These amplified DNA samples may then be used for the detection of enriched regions. Chromatin profiling methods are known in the art. For example, chromatin immunoprecipitation-massively parallel DNA sequencing (ChIP-Seq) is used to analyze a set of DNA-associated proteins. It can be used to precisely map global DNA binding sites for any protein of interest, e.g., transcription factor, restriction enzyme, or other chromatin-associated proteins, on a genome scale. Chromatin immunoprecipitation may also be combined with microarray "ChIP-on-chip," which requires a hybridization array.

Specific DNA sites that are in direct physical interaction with transcription factors and other proteins, such as histones, may be isolated by iChIP, which produces a library of target DNA sites bound by a protein in vivo. In some embodiments, massively parallel sequence analyses may be used in conjunction with whole-genome sequence databases to analyze the interaction pattern of a protein of interest (e.g., transcription factors, polymerases or transcriptional machinery) with DNA or to analyze the pattern of an epigenetic chromatin modification of interest (e.g., histone modifications or DNA modifications).

ChIP may be used, in some embodiments, to selectively enrich for DNA sequences bound by a particular protein (e.g., transcription factor or histone, see Examples) in living cells by cross-linking DNA-protein complexes and using an antibody that is specific against a protein of interest. After precipitation of chromatin, oligonucleotide adapters may be added to the small stretches of DNA that are bound to the protein of interest to enable massively parallel sequencing. After size selection, the resulting iChIP-DNA fragments can be sequenced simultaneously using, for example, a genome sequencer. A single sequencing run can scan for genome-wide associations with high resolution. For ChIP-on-chip, sets of tiling arrays (of overlapping probes designed to densely represent a genomic region of interest) may be utilized, in certain embodiments.

Massively parallel sequencing is known in the art and many sequencing methods may be used. Some technologies may use cluster amplification of adapter-ligated ChIP DNA (or iChIP DNA) fragments on a solid flow cell substrate. The resulting high density array of template clusters on the flow cell surface may then be submitted to sequencing-by-synthesis in parallel using, for example, fluorescently labeled reversible terminator nucleotides.

Templates can be sequenced base-by-base during each read. In certain embodiments, the resulting data may be analyzed using data collection and analysis software that aligns sample sequences to a known genomic sequence.

Sensitivity of this technology may depend on factors such as the depth of the sequencing run (e.g., the number of mapped sequence tags), the size of the genome, and the distribution of the target factor. By integrating a large number of short reads, highly precise binding site localization may be obtained. In certain embodiments, ChIP-Seq data can be used to locate the binding site within a few tens of base pairs of the actual protein binding site, and tag densities at the binding sites may allow quantification and comparison of binding affinities of a protein to different DNA sites.

A difficulty concerning ChIP protocols is validating their success for DNA-binding proteins. Current validation methods are difficult and time consuming. One common way of validation that may be used in some embodiments described herein is to focus on one specific DNA-binding protein of interest and to assess the genomic regions that are in association with the DNA-binding protein of interest by trial and error and other empiric approaches, utilizing, for example, qPCR. In certain embodiments, the iChIP assay validation by qPCR precedes ChIP-DNA sequencing. Another approach involves the over-expression of a tagged version of the DNA-binding protein of interest (e.g., described in Nishiyama et al. Cell Stem Cell 5(4):420-433, 2009). This can only be performed in vitro (e.g., by transfecting cell lines) and, in certain instances, it may be difficult to discern the relevance of the non-physiological data obtained from an over-expressed protein.

Provided herein are methods using the NANOSTRING® platform as depicted in Figure 25 (Nanostring Technologies, Seattle, WA). In certain embodiments, methods are provided that probe ChIP-DNA derived from iChIP assays performed as described herein in a high-throughput manner using the NANOSTRING® platform as depicted in Figure 24 and exemplified in Figure 26. In certain embodiments, probes are specific for genomic regions that are bound by DNA-binding proteins and/or chromatin-associated factors. In certain embodiments, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 50000, or 100000 probes specific for genomic regions are represented using the NANOSTRING® platform. The NANOSTRING® technology uses color-coded (fluorescently labeled) molecular "barcodes" that can hybridize directly to target molecules. Each color-coded barcode is attached to a single target-specific probe corresponding to a gene of interest. Two probes of about 50 nucleotides each are utilized that can hybridize in solution. The first probe is a reporter probe that carries the signal, and the second probe is a capture probe that allows the complex to be immobilized and analyzed for data collection. The NCOUNTER® Analysis System utilizes a digital technology that is based on direct multiplexed measurement of gene expression with a sensitivity of less than one copy per cell. The NANOSTRING® technology is described, for example, in Geiss et al. Nat. Biotech. 26(3):317-325, 2008, incorporated by reference herein. Using the NANOSTRING® technology as described herein, it is possible to validate ChIP-DNA prepared by the iChIP methods described herein in a high-throughput manner, combining the high-throughput iChIP methods provided herein with the ability to analyze and quantify multiple ChIP-DNA samples in parallel using NANOSTRING® technology. The fluorescent signals obtained using the NANOSTRING® platform are, in some embodiments, easily quantifiable and allow validation of the ChIP assay. In certain embodiments, using the methods described herein, antibodies can be validated for suitability in iChIP and other IP assays.

Any one of the preceding methods may further comprise pre-amplification of chromatin and/or nucleic acids. In certain embodiments, the methods further comprise adding to the chromatin or nucleic acid fragments a sequence (e.g., a Yl recognition site) that will facilitate pre-amplification of the chromatin or nucleic acid. In certain embodiments, pre-amplification renders the methods described herein more sensitive.

EXAMPLES

The present invention will be more specifically illustrated by the following examples.

However, it should be understood that the present invention is not limited by these examples in any manner.

The complex gene expression programs that underlie development, differentiation, and environmental responses are primarily determined by binding of sequence-specific transcription factors (TFs) to DNA (Davis et al., 1987; Graf and Enver, 2009; Laslo et al., 2006; Lenardo and Baltimore, 1989; Weintraub et al., 1989; Zhou et al., 2008). While it is clear that TFs play a critical role in gene regulation, how these factors work together to control gene expression responses in complex organisms is still not fully understood (Davidson, 2010; Peter and Davidson, 2011). To date, systematic efforts to understand the mammalian regulatory code have mostly relied on generalization from studies on simple model organisms (Capaldi et al., 2008; Harbison et al., 2004), in vitro experiments (Berger et al., 2008; Grove et al., 2009; Cirillo and Zaret, 2000) and studies of individual gene loci (Bossard and Zaret, 1998; Cirillo et al., 2002; Cirillo and Zaret, 1999; Thanos and Maniatis, 1992). Genomic approaches, such as correlation analysis of gene expression profiles (Segal et al., 2003), and more recently RNAi perturbation followed by gene expression readouts (Amit et al., 2009), have provided an initial glimpse into the complexity of mammalian gene regulation. However, such approaches cannot distinguish direct from indirect effects and cannot address network redundancy and temporal regulation, providing limited insight into the underlying regulatory mechanisms.

A complementary approach is to measure the temporal in vivo binding of TFs to cis regulatory regions under relevant stimuli. Recent advances in genomic technologies allow for unbiased and accurate genome-wide characterization of TF binding using ChIP followed by DNA sequencing (ChIP-Seq) (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007).

Despite these advances in detection, ChIP has remained a relatively low throughput, labor intensive and costly protocol (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007; Solomon and Varshavsky, 1985). Because of this, ChIP studies in mammalian systems are typically limited to measuring the binding of a handful of selected regulatory proteins at a single time point. As a result, little is known about the genome-wide dynamics of protein-DNA interaction networks.

To address these challenges, iChIP was developed: a reproducible, high throughput and cost-effective method for ChIP coupled to multiplexed massively parallel sequencing. iChIP was used to investigate the principles of gene regulation in the model system of primary innate immune dendritic cells (DCs) (Katsnelson, 2006) stimulated with the pathogen component lipopolysaccharide (LPS). In response to stimulation, DCs activate a robust, specific, and reproducible response that unfolds over several hours, involves changes of thousands of genes (Amit et al., 2009), and plays a critical role in directing the host immune response. The regulatory factors that control this response were previously defined (Amit et al., 2009), and its primary driver was shown to be transcriptional regulation (Rabani et al., 2011). Thus, DCs are an attractive model system for dissecting a dynamic transcriptional response.

iChIP was used to build genome-wide dynamic maps of TF localization to DNA in the response of DCs to LPS. Antibodies for the 184 most expressed transcription factors were screened, and ChIP-Seq grade antibodies were identified for 29 TFs, 4 RNA polymerase components, chromatin modifiers and 3 epigenetic modifications. Using these validated antibodies, iChIP was performed across four time points upon LPS stimulation. The TFs vary substantially in their binding dynamics, number of binding events, preferred genomic locations and interactions with other TFs. Analysis of these different binding characteristics shows that TFs fall into at least three broad categories, which taken together with recent reports (Bossard and Zaret, 1998; Cirillo et al., 2002; Cirillo and Zaret, 1999; De Santa et al., 2010; Heinz et al., 2010; Lupien et al., 2008) suggest that a complex multilayered architecture of TF binding controls cell state and gene induction and expression.

Surprisingly, it was found that much of the binding of TFs is pre-coded prior to stimulation, predominantly on immediate early genes, and that many of these genes are associated with High Occupancy Target (HOT) regions similar to those recently reported in flies and worms (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010). Combining RNAi perturbations and binding maps, the redundancy of different factors in the regulatory network and the extent of indirect transcriptional effects were also assessed. Together, this example demonstrated that a systematic approach can uncover fundamental principles of the regulatory code, such as the hierarchy of the TF organization in mammalian cells. To facilitate visual exploration, an extension to the Integrative Genomics Viewer (IGV (Robinson et al., 2011)) was developed specifically for the interactive exploration of time course data. The entire dataset can be viewed from: weizmann.ac.il/immunology/AmitLab/data-and-method/iChIP (For Reviewer Login, User: user Password: gy2011).

Example 1

iChIP: A high-throughput method for mapping Protein-DNA interactions

iChIP is an automated method for systematic mapping of in vivo protein-DNA binding that greatly increases the throughput, while significantly reducing the labor and cost required for ChIP-Seq. iChIP uses magnetic beads for chromatin immunoprecipitation and DNA purifications, thus eliminating the need for laborious manual washes, DNA purification and gel extraction steps (Figure 1). The entire iChIP process is performed in the same well, reducing sample loss due to transfer of material, a significant source of variability given the small amounts of DNA involved (Figure 9). iChIP further leverages the yield of current next-generation sequencing by multiplexing an arbitrary number of different indexed sequencing adapters, 96 in our case, to combine samples in a single flow cell (Figure 1A).
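As an illustration of the multiplexing step, pooled reads can be assigned back to their samples by the index (adapter) sequence read alongside each fragment. The following is a minimal sketch only: the index sequences, sample names, the function names and the one-mismatch tolerance are hypothetical and are not taken from the iChIP protocol.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, index_to_sample, max_mismatches=1):
    """Group (index_seq, read_seq) pairs by sample.

    A read is assigned only if exactly one known index lies within
    max_mismatches of its index sequence; otherwise it is 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        hits = [sample for idx, sample in index_to_sample.items()
                if len(idx) == len(index_seq)
                and hamming(idx, index_seq) <= max_mismatches]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(read_seq)
    return dict(bins)

# Hypothetical 6-nt indexes for two of the pooled samples.
INDEXES = {"ACGTAC": "PU.1_0h", "TGCATG": "Stat1_2h"}
READS = [("ACGTAC", "GATTACA"), ("TGCATC", "CCGGAA"), ("GGGGGG", "TTTT")]
result = demultiplex(READS, INDEXES)
```

Requiring a unique index match is a conservative choice: an ambiguous read is set aside rather than risk contaminating another sample's binding map.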

iChIP was used to reconstruct the TF binding network following a time course of LPS stimulation of primary mouse dendritic cells (DCs). RNA-Seq was first used across these time points to identify the TFs expressed in DCs. 271 commercially available antibodies targeting the 184 expressed TFs were collected (Figure 1B). Each antibody was tested using a signature readout (Geiss et al., 2008) ('ChIP-String') that measures selected genomic DNA regions with high regulatory activity (De Santa et al., 2010; Kim et al., 2010) (Figure 1A bottom; Figure 1C). 41 antibodies that passed selection criteria as 'ChIP grade' were identified, based on their enrichment on the signature regions and performance in Western blots (Figure 1C, Figure 10). These antibodies were used for subsequent iChIP experiments. iChIP was performed for 29 TFs, 3 chromatin modifications, 5 chromatin regulators, and 4 RNA Polymerase II components (Pol-II, Table S4). In addition, mRNA levels and RNA transcription rates were measured with strand-specific RNA-Seq and 4SU-Seq (Rabani et al., 2011), respectively. Each factor and modification in primary DCs was measured at four time points (0, 0.5, 1, 2 hours) post lipopolysaccharide (LPS) stimulation, during which most of the changes in Pol-II binding and gene expression occur (Figure 11).

Determining enhancers and promoters using RNA-Seq, histone marks and TF binding maps

Recent studies have demonstrated that the ratio between H3K4me3 and H3K4me1 histone marks can be used to identify promoter (5' proximal regulatory element) and enhancer (distal regulatory element) regions (Rada-Iglesias et al., 2011): promoters are associated with a higher proportion of H3K4me3-marked histones (H3K4me3+), while enhancers have a higher proportion of H3K4me1-marked histones (H3K4me1+).

To define promoters, candidates containing H3K4me3+ were identified and those that overlapped a known (Pruitt et al., 2007) or reconstructed (Guttman et al., 2010) transcription start site were retained (Figure 2A, Figure 12). Notably, ~75% of the identified promoters were bound by at least one of the TFs. To define enhancers, candidates containing H3K4me1+ were identified and those that were also bound by at least one TF were retained (see, for example, the Il1ab loci in Figure 2A). Altogether, 48,163 enhancers and 11,252 promoters were identified.
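The promoter and enhancer definitions above can be sketched as a simple decision rule. This is an illustrative simplification, not the procedure actually used: the function name, the arbitrary signal units, and the rule that a region is "+" for whichever mark has the larger signal are assumptions made for the example.

```python
def classify_region(h3k4me3, h3k4me1, overlaps_tss, n_bound_tfs):
    """Label a candidate region as 'promoter', 'enhancer', or None.

    h3k4me3 and h3k4me1 are enrichment signals for the two marks.
    Assumed simplification: a region is H3K4me3+ when H3K4me3 is the
    larger signal and H3K4me1+ when H3K4me1 is the larger signal.
    """
    # Promoter: H3K4me3+ and overlapping a known or reconstructed TSS.
    if h3k4me3 > h3k4me1 and overlaps_tss:
        return "promoter"
    # Enhancer: H3K4me1+ and bound by at least one TF.
    if h3k4me1 > h3k4me3 and n_bound_tfs >= 1:
        return "enhancer"
    return None

# Hypothetical candidate regions.
labels = [
    classify_region(8.0, 2.0, overlaps_tss=True, n_bound_tfs=3),
    classify_region(1.0, 6.0, overlaps_tss=False, n_bound_tfs=2),
    classify_region(1.0, 6.0, overlaps_tss=False, n_bound_tfs=0),
]
```

Candidates that carry a mark but fail the second criterion (no TSS overlap, or no bound TF) are left unlabeled, mirroring the "identified, then retained" filtering described in the text.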

Consistent with previous observations, it was found that different chromatin marks exhibit different dynamics during stimulation (Figures 13A, 13B) (De Santa et al., 2010). H3K4me3 is remarkably stable during the LPS response. The few exceptions are in ~30 loci which are lowly expressed pre-stimulation and strongly induced after stimulation (top 95% of induction).

Conversely, H3K27Ac is more variable and changes in H3K27Ac tend to correspond to changes in Pol-II binding (r=0.66 for H3K27Ac vs r=0.49 for H3K4me3). These chromatin marks are significantly less dynamic than most TFs (Figures 13A-13G).

Global properties of TF binding maps

Collapsing the temporal iChIP sequence reads into a unified collection of reads ("compressed" dataset), significant binding events (peaks) for each TF (Guttman et al., 2010) were identified. The vast majority (82%) of high scoring TF peaks fall within the promoter regions or the H3K4me1+ regions (candidate enhancers) defined above (p<10^-20). The binding landscape is consistent with the known specificities of TFs (Figure 2B). Using de novo motif discovery (Bailey and Elkan, 1994) across the high-scoring bound sites, the known motifs for 19 (~65%) of the TFs (Gupta et al., 2007) were identified, as well as novel motifs for an additional four (~13%) TFs (Figure 14). For example, the highest scoring motif (E < e-100) found for the TF E2f4 is the cell cycle genes homology region (CHR), a previously identified regulatory element found adjacent to a handful of cell cycle genes, which appears in tandem to an E2f canonical motif (Lange-zu Dohna et al., 2000). TFs for which no enriched motifs were found either bound very few sites or did not exhibit distinct peaks of binding, but rather "run" through the body of the gene. For example, E2f1 runs with similar appearance and dynamics to that of the Pol-II complex (Figure 15A). To quantify this effect, a 'running index' was defined for every TF on every gene (Figure 15B). Interestingly, several factors (such as E2f1, Stat1 and Nfkb1) run at ~50% of their target genes, suggesting they may be in close association with Pol-II. The Pol-II complex was immunoprecipitated and the precipitated complex was measured for the presence of Stat1 (high running index) and PU.1 (low running index). Consistent with the hypothesis, Stat1 co-immunoprecipitated with Pol-II, but PU.1 did not (Figure 15C).

The TFs vary substantially in both number and location of binding events. Some TFs (PU.1 and Cebpb) bind >30,000 sites, while others (such as Fos) bind <500 (Figure 2). Markedly, 76.5% of the identified peaks (excluding peaks of PU.1 and Cebpb) fall in close proximity (500 bp) to a peak of either PU.1 or Cebpb (p<10^-10). The factors exhibit substantially different localization preferences, with some favoring enhancers while others bind mostly in promoters (Figure 2). Interestingly, several factors preferentially bind in less canonical regions (Figure 2).
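A proximity statistic of this kind (the fraction of peaks lying within 500 bp of a PU.1 or Cebpb peak) can be computed as sketched below. The sketch assumes peaks are represented by a chromosome name and a single summit coordinate; this representation, the function name and the example coordinates are hypothetical.

```python
from bisect import bisect_left

def fraction_near(query_peaks, anchor_peaks, window=500):
    """Fraction of query peaks within `window` bp of any anchor peak.

    Peaks are (chromosome, summit_position) tuples.
    """
    anchors = {}
    for chrom, pos in anchor_peaks:
        anchors.setdefault(chrom, []).append(pos)
    for positions in anchors.values():
        positions.sort()

    def is_near(chrom, pos):
        positions = anchors.get(chrom, [])
        i = bisect_left(positions, pos)
        # Only the nearest anchors on either side of `pos` need checking.
        return any(abs(positions[j] - pos) <= window
                   for j in (i - 1, i) if 0 <= j < len(positions))

    if not query_peaks:
        return 0.0
    return sum(is_near(c, p) for c, p in query_peaks) / len(query_peaks)

# Hypothetical anchor (e.g., PU.1/Cebpb) and query (other TF) peaks.
ANCHORS = [("chr1", 1000), ("chr1", 5000), ("chr2", 200)]
QUERIES = [("chr1", 1400), ("chr1", 3000), ("chr2", 800), ("chr3", 50)]
frac = fraction_near(QUERIES, ANCHORS)
```

Sorting the anchor positions once per chromosome and using a binary search keeps the lookup fast even when the anchor factors bind tens of thousands of sites.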

A non-canonical binder: Runx1 binds on the 3' UTR of inflammatory genes

The runt domain 1 factor Runx1 (also known as Aml1) is an important transcription factor for normal hematopoiesis whose translocations are involved in several types of leukemia (Pabst and Mueller, 2007). Surprisingly, our ChIP data show that Runx1 binds many genes at their 3' end (Figure 3A). Considering this non-canonical binding preference, it was observed that (i) the enrichment of Runx1 on inflammatory target genes was 2.1 times higher on genes bound at the 3' end than on genes bound at the promoter (Figure 3B), (ii) Runx1 peaks at the 3' end are stronger than peaks at the promoter (Figure 3C), (iii) genes bound by Runx1 at the 3' end have higher expression levels (Figure 3D), and (iv) the binding at the 3' end is more dynamic than binding at promoters and enhancers (Figure 3E).

Looking at the down-regulated genes upon Runx1 knockdown in primary DCs activated with LPS (Methods, Amit et al., 2009), an unexpected proportion consists of inflammatory genes (p=0.003, hypergeometric), with many of these genes bound at their 3' end (Figure 3F). Interestingly, de novo motif finding using only promoter peaks results in the canonical Runx1 motif, while a similar search on 3' end peaks does not result in any significant motif. Taken together, these results suggest that Runx1 may have a different function or associate with a different complex when binding the 3' end of genes as compared to promoter-bound regions, and that this complex regulates the expression of highly expressed inflammatory genes.
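For readers unfamiliar with the hypergeometric test cited above, the following sketch shows how such an enrichment p-value is typically computed: the probability of seeing at least the observed overlap between a gene set (e.g., down-regulated genes) and an annotated category (e.g., inflammatory genes) by chance. The function name and the numbers in the example are hypothetical and do not reproduce the data behind p=0.003.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  genes in the annotated category
    draws:      genes in the tested set
    k:          observed overlap between the tested set and the category
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total

# Hypothetical toy numbers: 20 genes, 5 annotated, 4 drawn, overlap of 2.
p_value = hypergeom_sf(2, population=20, successes=5, draws=4)
```

Exact integer binomial coefficients avoid the rounding issues that can arise when very small tail probabilities are summed in floating point.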

Co-binding of TFs in regulatory regions shows a Cis-regulatory organization

Associating each binding site with its encompassing regulatory region (promoter or enhancer) resulted in 186,959 high confidence TF-region binding interactions. Similar to recent reports (Zinzen et al., 2009), the resulting network suggests that TFs tend to bind in Cis-Regulatory Modules (CRMs; Figure 4A) occupied by multiple other factors (1.65-fold enrichment over a random model, Figure 16) with two exceptions, PU.1 and Cebpb, which show a strong tendency for binding lowly occupied regions (>10% of their bound regions have no other factor binding, a 3-fold enrichment). Moreover, it was found that 3.9% of the regions are associated with clustered binding (HOT regions) (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010) with 10 or more bound TFs (9.3-fold enrichment). TFs interact with one another in a combinatorial fashion to control different gene programs (Hoffmann and Baltimore, 2006). Co-occurring pairs of TFs were searched, while excluding HOT regions, which can confound discovery of such interactions (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010) (Figure 4B). Results recapitulate well-known transcriptional complexes. For example, Rela co-occurs with other Nfkb family members (Relb, Nfkb1, cRel (Hoffmann and Baltimore, 2006)), Stat1 co-occurs with Stat2 and Irf1 co-occurs with Irf2. Other pairs of TFs revealed by our dataset (e.g., E2f4-Egr1 or Maff-Nfe2l2) have not been previously reported to the best of our knowledge and merit further investigation.

TFs range from primarily static to primarily dynamic binders

TFs vary substantially in the extent of dynamic changes in their binding during the response. The Ifit locus (Figure 5A) provides an illustrative example. While PU.1 is bound at the same level in both unstimulated and stimulated cells (Figure 5A, top inset), Stat1 binding appears only during the late stages of LPS response (Figure 5A, bottom inset).

Overall, 58,927 (31%) of TF-region interactions are "dynamic", such that their binding intensity increases or decreases by more than 3-fold post-stimulation (see Figure 17 for robustness to different fold cutoffs). The majority of dynamic interactions (39,883, 67%) are gain of TF binding (Figure 18). Importantly, the dynamic interactions are reproducible across biological replicates (Figure 13F).

To quantify the extent of dynamic binding for each TF, a Turnover score was calculated, defined as the fraction of regions bound by the factor whose binding changed from the unstimulated state by at least 3-fold (Figure 5B and Figure 17). This score varies dramatically between TFs, with PU.1 and Cebpb being mostly static (<20% Turnover), 16 factors having moderate dynamic binding (e.g., Irf4 and Junb, 20-65% Turnover), and 10 factors being mostly dynamic (e.g., Stat1 and Rela, >65% Turnover). The extent of gain versus loss of binding also varies between TFs, with 7 factors showing almost exclusive (>90%) gain of binding (e.g., Stat1 and Rela) and 5 factors showing a majority (>60%) of loss events (e.g., Irf2 and Irf4). Finally, the timing of gain or loss also varies between factors (Figure 5B). As discussed below, these differences often correlate with the expression levels of their target genes.
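The Turnover score described above can be illustrated with a minimal Python sketch. The function name `turnover_score`, the representation of each region as a (pre-stimulation, post-stimulation) intensity pair, and the `floor` value used to avoid division by zero are assumptions made for illustration and are not taken from the described method.

```python
def turnover_score(regions, fold=3.0, floor=1.0):
    """Return (turnover, gain_fraction) for one TF.

    regions: list of (pre, post) binding intensities, one pair per bound region.
    fold:    fold-change cutoff for calling a region dynamic (3-fold in the text).
    floor:   assumed minimum intensity, so that unbound states do not divide by zero.
    """
    gained = lost = 0
    for pre, post in regions:
        ratio = max(post, floor) / max(pre, floor)
        if ratio >= fold:
            gained += 1          # binding increased by at least `fold`
        elif ratio <= 1.0 / fold:
            lost += 1            # binding decreased by at least `fold`
    total = len(regions)
    if total == 0:
        return 0.0, 0.0
    dynamic = gained + lost
    turnover = dynamic / total
    gain_fraction = gained / dynamic if dynamic else 0.0
    return turnover, gain_fraction


# Toy intensities (hypothetical): two static regions, one gain, one loss.
example = [(10, 10), (10, 40), (30, 5), (2, 2)]
score, gains = turnover_score(example)
```

Under this sketch a factor with a turnover below 0.2 would fall in the mostly static group and one above 0.65 in the mostly dynamic group, following the ranges quoted above.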

TF binding correlates with transcription dynamics

To study the functional implication of TF binding, binding events at regulatory regions were associated with target genes (Figure 6A and Figure 19). 46,431 (~96%) of the annotated enhancer regions were associated with 11,251 genes with an average of 10.9 enhancers per gene. Clustering the genes based on their associated binding profiles revealed significant correlations with gene function (Figure 6B). Genes bound by few TFs (~5.7 TFs on average; cluster a) in the data were enriched for basic cellular processes (p<10^-6), those targeted by a moderate number of TFs (~12.9 TFs on average; cluster f) were enriched for anti-viral response and late induced (>2 hr) genes (p<10^-10, Figure 6B and Figure 5A), and those targeted by many TFs (>20 TFs on average; cluster h, Figure 6B and Figure 2A) were enriched (p<10^-10) for inflammatory response pathways and immediate early genes (up to 1 hr). The targets of individual TFs were also enriched for specific functional classes (Figure 6C). For instance, Rel and Nfe2l2 targets were enriched for inflammatory genes, whereas E2f4 targets were strongly associated with cell cycle genes (p<10^-10).

How TF binding correlates with transcriptional changes in response to environmental signals was investigated next. Control of the transcriptional response consists of several layers of regulation (Yosef and Regev, 2011): (1) establishing the basal level pre-stimulation, (2) establishing the induction or repression level after stimulation, and (3) controlling the timing of the transcriptional response. Focusing on TF binding at each gene pre-stimulation, the overall number of TF binding events correlated with basal gene expression levels (Figure 6D, p<10^-20), with genes induced post-stimulation (Figure 6D, p<10^-20), and with early onset of induction (p<10^-10). Despite these correlations, it remains unclear whether having multiple bound TFs is necessary for expression or whether it is a consequence of open chromatin and/or high Pol-II activity. Controlling for this, eight specific factors (JunB, Atf3, Irf4, Irf1, Rela, Runx1, Maff and Irf2) were found whose pre-stimulation binding is significantly associated with induced genes or with genes that had high basal transcription levels (p<10^-3; Figures 20A, 20B). In contrast, PU.1 and Cebpb bound a larger number of genes in the pre-stimulated state but were not enriched at induced or highly expressed genes. Notably, Irf2 was the only TF with significantly enriched pre-stimulation binding in genes that are expressed at low levels pre-stimulation and then strongly induced post-stimulation (Figure 20C, p<10^-3). Consistent with a previous report (Harada et al., 1989), Irf2 may act as a repressor antagonizing Irf1 function. It was hypothesized that the Irf2 circuit functions as a repressor in unstimulated DCs to 'poise' anti-viral genes for the response. Consistent with this hypothesis, the effect of Irf2 knockdown in unstimulated DCs was almost entirely up-regulation (20% of genes bound by Irf2 are affected by the knockdown, and 91% of them are up-regulated; Figure 21).

As some factors bind thousands of gene regulatory regions, it was unclear which of these binding events were necessary for the precise timing of transcription. To explore this, the relationship between temporal changes in binding and transcription, as determined by nascent transcription (Rabani et al., 2011) and Pol-II binding profiles (Figure 22), was studied. Globally, induced genes were strongly associated with the gain of TF binding (p<10^-20). To examine this effect more closely, several sets of genes were considered, clustered by the shape and timing of their response (Figure 6E). Multiple cases were found in which the timing of gain or loss of binding of a certain TF preceded or coincided with the timing of induction. For example, Egr1 became strongly associated with early-induced genes at 1 hour post-stimulation, coinciding with their peak transcription. In contrast, Stat1 became associated with later genes at 2 hours post-stimulation, preceding these genes' peak transcription (p<1e-3; Figure 22). Consistent with the perturbation data, it was also found that loss of Irf2 significantly coincides with late induction (p<10^-3, Figure 6E). While many genes (847) were induced post-stimulation, an even larger number (2,820) were repressed. Among genes repressed post-stimulation, there was a significant enrichment in the data only for loss of TF binding (p~10^-3) or for no binding gain (p<10^-10). However, the data cannot rule out active recruitment of repressing complexes such as Smrt/Ncor by various TFs not profiled in this study (Barish et al., 2010; Ghisletti et al., 2009). Interesting examples are the histone gene clusters, which are bound by Nfkb and E2f family members in the basal state, followed by loss of Nfkb factors immediately post-stimulation (Figure 23).

TF network redundancy

While the binding of TFs correlated with gene transcription and induction (Figure 6D), it was unclear in which cases TF binding is redundant (Hu et al., 2007) or functional at all (Johnson et al., 2007). Indeed, the DC TF network architecture is highly interconnected with substantial feedback and autoregulation (Figure 7A), which suggests substantial redundancy and robustness (Alon, 2007).

To determine the degree of functional and non-redundant binding in the DC network, the binding data was integrated with RNAi perturbation data for the same TFs in the same cells under the same stimuli on a selected set of signature genes (Amit et al., 2009) (Figures 7B, 7C). Overall, 35% of the binding events involving the signature set were associated with up- or down-regulation of the target gene. Further, 59% of the knockdown effects were indirect (i.e., the TF showed no binding but its knockdown affected the gene's expression). Considering specific TFs, various levels of redundancy were observed. While some TFs affected a relatively small portion of the genes they bind (e.g., Fos, Irf1, Nfkb2 and Maff), others were non-redundant (or more sensitive, with up to a 4-fold decrease in their expression following knockdown) and affected a large fraction of the genes to which they bind (e.g., Stat1 and Stat2).
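The two fractions reported above (binding events associated with an expression change, and knockdown effects without binding) amount to simple set operations once binding and perturbation calls are expressed as (TF, gene) pairs. The following Python sketch shows the bookkeeping only; the function name `redundancy_summary` and the toy pairs are hypothetical, and how binding and knockdown effects are called is outside its scope.

```python
def redundancy_summary(bound, affected):
    """Summarize overlap between binding and knockdown effects.

    bound:    set of (tf, gene) pairs where the TF binds the gene's regulatory regions.
    affected: set of (tf, gene) pairs where knocking down the TF changes the gene.
    Returns (fraction of binding events that are functional,
             fraction of knockdown effects that are indirect).
    """
    functional = bound & affected    # bound and affected by knockdown
    indirect = affected - bound      # affected without detected binding
    frac_functional = len(functional) / len(bound) if bound else 0.0
    frac_indirect = len(indirect) / len(affected) if affected else 0.0
    return frac_functional, frac_indirect


# Toy pairs with placeholder names, for illustration only.
bound_pairs = {("TF1", "gene_a"), ("TF1", "gene_b"), ("TF2", "gene_c"), ("TF2", "gene_d")}
affected_pairs = {("TF1", "gene_a"), ("TF1", "gene_b"), ("TF1", "gene_e")}
frac_functional, frac_indirect = redundancy_summary(bound_pairs, affected_pairs)
```

Restricting both sets to a single TF before calling the function gives a per-factor view of redundancy.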

A layered architecture of the TF network

Combining the distinct temporal and genomic binding patterns described above revealed diverse forms of binding. Previous studies distinguished between two major classes of TF: Pioneer and non-Pioneer factors (Bossard and Zaret, 1998; Cirillo et al., 2002; Cirillo and Zaret, 1999; Heinz et al., 2010; Lupien et al., 2008). Pioneer factors bind to compacted chromatin and initiate chromatin remodeling during differentiation processes, while non-Pioneers can mostly bind open, nucleosome-free DNA (Bossard and Zaret, 1998; Cirillo et al., 2002; Cirillo and Zaret, 1999). It was tested whether non-Pioneers bind in close proximity to Pioneers and have a smaller set of potential target regions. Two factors in the set (PU.1 and Cebpb) exhibit 'Pioneer'-like properties, with extremely abundant binding that already exists in the unstimulated state, covers the majority of sites bound by other TFs, and is enriched at "isolated" sites with no other TF binding. Both Cebpb and PU.1 were recently identified as Pioneer factors in DCs (De Santa et al., 2010; Heinz et al., 2010). The binding of Cebpb and PU.1 is relatively static during the response, comparable to the histone marks and Ctcf (Figure 13).

To discover other potential TF 'classes', the different TF binding properties discussed above were explored: (1) number of bound regions, (2) ratio of enhancer to promoter binding, (3) Turnover score, (4) Running Index, (5) fraction of regions bound in isolation, and (6) fraction of all DNA motifs in the genome bound by the factor. Using principal component analysis (Figure 8A), it was found that both PU.1 and Cebpb clearly separate from the other factors, but the remaining factors form at least two sub-groups. Factors in one group (Figure 8A) bound many genes but rarely bound in isolation (<5% alone), and had a higher Running Index (0.3 vs 0.2) and a larger portion of dynamic binding events (35% vs 11%) compared to the Pioneers. The remaining factors (Figure 8A) tended to bind fewer genes, were mostly dynamic, tended to preferentially bind promoters, and many "ran" with Pol-II during transcription on a high percentage of their targets (Figures 2B, 5B and 15).

These factor classes are also distinguished by their effect on gene expression, in a manner consistent with a 'layered' hierarchical organization. Pioneer binding does not correlate with gene expression or induction levels, whereas binding of the first non-Pioneer group (green nodes in Figure 8A) in the unstimulated state correlates with expression and with potential for induction, but has lower enrichment for specific functional categories (Figures 6C, 20). The remaining factors (Figure 8A) tended to bind a smaller number of regions from certain functional categories (e.g., Stat1 with anti-viral genes; Figure 6C) and dynamically coincide with the induction of genes post-stimulation (Figure 6E). This was consistent with a spatial-temporal layered organization where Pioneer factors potentiate binding by opening previously inaccessible sites (Bossard and Zaret, 1998; Cirillo et al., 2002; Cirillo and Zaret, 1999; Heinz et al., 2010; Lupien et al., 2008) (Figure 8B, left). These new elements were occupied in a relatively static manner by a second layer of TFs (e.g., Junb) that primed the response and set the basal expression levels of thousands of genes (Figure 8B, middle). The final layer consisted of TFs that bound subsets of genes, often in a very dynamic fashion, and usually at genes with a shared biological process (Hoffmann and Baltimore, 2006) (Figures 6C, 8B).

iChIP module

20 million dendritic cells were used for each iChIP experiment. Cells were fixed for 10 min with 1% formaldehyde, quenched with glycine and washed with ice-cold PBS, and pellets were flash frozen in liquid nitrogen. Cross-linked DCs were thawed on ice and resuspended in RIPA lysis buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0, 14 mM NaCl, 1% Triton X-100, 0.1% SDS, 0.1% DOC) supplemented with protease inhibitor (Roche, 04693159001). Cells were lysed for 10 min on ice and the chromatin was sheared according to the calibrated conditions for DCs using a Branson Sonifier (model S-450D) with a custom sample cooling system (sample holder: Mecour, #99-401, CB-LSOO-60/24; chiller: Thermo, RTE-7 D1). The sonicated cell lysate (Whole Cell Extract) was cleared by centrifugation, mixed with 75 μl of protein G magnetic Dynabeads (Invitrogen) coupled to the target antibody, and incubated overnight at 4 °C. For the coupling, beads were washed once (200 μl) in a binding/blocking buffer (PBS, 0.5% Tween 20, 0.5% BSA), incubated with 10 μg of antibody in binding/blocking buffer for 1 hour at room temperature, and then washed to remove excess antibody. A 96-well magnet (Invitrogen) was used in all further steps. Cell lysate was removed and samples were washed 5 times with cold RIPA (200 μl per wash), twice with RIPA buffer supplemented with 500 mM NaCl (200 μl per wash), twice with LiCl buffer (10 mM TE, 250 mM LiCl, 0.5% NP-40, 0.5% DOC), and once with TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA), and then eluted in 50 μl of 0.5% SDS, 300 mM NaCl, 5 mM EDTA, 10 mM Tris-HCl pH 8.0. The eluate was reverse crosslinked at 65 °C for 4 hours and then treated sequentially with 2 μl of RNaseA (Roche, 11119915001) for 30 min and 2.5 μl of Proteinase K (NEB, P8102S) for two hours.

Library construction module

Solid-phase reversible immobilization (SPRI) cleanup steps were performed on the Bravo liquid handling platform (Agilent) using a modified version of the protocol in (Fisher et al., 2011). 120 μl of SPRI AMPure XP beads (Agencourt) were added to the reverse-crosslinked samples, pipette-mixed 15 times and incubated for 2 minutes. Supernatants were separated from the beads using a 96-well magnet for 4 minutes. Beads were washed on the magnet with 70% ethanol and then air dried for 4 minutes. The DNA was eluted in 40 μl EB buffer (10 mM Tris-HCl pH 8.0) by pipette mixing 25 times. For the remainder of the library construction process (DNA end-repair, A-base addition, adaptor ligation and enrichment), a general SPRI cleanup involves addition of buffer containing 20% PEG and 2.5 M NaCl to the DNA reaction products (without moving them from their original well position). After thorough mixing and a 2-minute incubation at room temperature, plates are transferred to a magnet plate and incubated for 4 minutes, and the supernatant is removed. Beads are then washed on the magnet with 150 μl of 70% ethanol and air dried for 4 minutes. The DNA is eluted with 40 μl of EB buffer by pipette mixing 25 times. Reagent kits are prepared in advance for all enzymatic steps (New England Biolabs). The DNA end-repair was performed by adding 27 μl of a master mix (17 μl master mix (5 μl T4 buffer, 5 μl BSA 1 mg/ml, 5 μl ATP 10 mM, 2 μl dNTPs 10 mM), 5 μl T4 PNK enzyme, 5 μl T4 polymerase (3 units)) to each well.

Samples were incubated in a thermal cycler at 12 °C for 15 min, 25 °C for 15 min, and finally cooled to 4 °C. The SPRI bead cleanup method was used to purify the product (147 μl of 20% PEG, 2.5 M NaCl was added to each sample, and the product was eluted in 40 μl EB). The A-base addition was performed by adding 20 μl of master mix (17 μl A-base add mix, 3 μl Klenow (3'->5' exonuclease)) to each well and incubating at 37 °C for 30 minutes in a thermal cycler. The SPRI bead cleanup method was used to purify the product (132 μl of 20% PEG, 2.5 M NaCl was added to each sample, and the product was eluted in 19 μl EB). Adaptor ligation was performed by adding 34 μl of a master mix (29 μl 2x DNA ligase buffer, 5 μl DNA ligase) to each well. 5 μl of PE Indexed oligo adaptors (0.75 μM) was added to each well and samples were incubated at 25 °C for 15 min in a thermal cycler. SPRI bead cleanup with size selection was used to purify the ligated products (15.5 μl of 20% PEG, 2.5 M NaCl was added to each sample, and the product was eluted in 40 μl EB). Finally, enrichment PCR was performed by adding 10 μl of a master mix (2 μl Forward/Reverse Index Primer, 0.5 μl dNTP mix, 5 μl 10x Pfu Ultra Buffer, 1 μl Pfu Ultra-II Fusion, 1.5 μl nuclease-free water) to each well. The plate was transferred to a thermal cycler and a Pfu amplification program was run: 95 °C for 2 min; 16 cycles of 95 °C for 30 sec, 55 °C for 30 sec, 72 °C for 60 sec; and finally 72 °C for 10 min. The final SPRI cleanup coupled to size selection was then performed (35 μl of SPRI beads was added to each sample, and the product was eluted in 40 μl). Sample concentrations were measured and 5 μl was used for ChIP-String enrichment validation. For a detailed Automated iChIP setup procedure on the Bravo liquid handling platform.

Enrichment validation: ChIP-String, DNA measurement on Nanostring

Details on the nCounter system are presented in full in (Geiss et al., 2008). A custom CodeSet constructed to detect a total of 786 probes covering ~200 genes (for the detailed design of the Nanostring code-set see below) was used. 5 μl of iChIP library DNA was denatured at 95 °C for 5 minutes and immediately cooled on ice. The denatured DNA product was applied directly to the hybridization reaction (5X SSPE, 0.1% Tween-20) and incubated at 65 °C for 16 hours in a PCR machine with a heated lid. The samples were loaded onto the nCounter prep station, followed by quantification using the nCounter ® Digital Analyzer 2.

Antibody quality control and Nanostring probe design

In order to rapidly and efficiently QC antibodies in this system, an nCounter ® probe set was designed that targets regulatory regions that are active during immune stimulation. A list of 185 genes induced post-stimulation in DCs was first selected, together with a set of 16 control genes that are either not expressed (Cryaa, Pck1, Hbb-b1, Gabrb1, Drd2, Pou5f1, Sox2) or that are expressed but whose expression remains unchanged (Gapdh, Mea1, Ndufa7, Ndufs5, Rbm6, Shfm1, Tbca, Tomm7, Ywhaz) upon stimulation with LPS. Because active regulatory regions are enriched in signature chromatin marks (H3K4me3 and H3K4me1) and PolII peaks, a dataset composed of ChIP of PolII and K4me3 in unstimulated and stimulated macrophages was used. Macrophages have a gene expression program similar to that of DCs after LPS stimulation, so this dataset was combined with the curated list of genes to design probes that target candidate regulatory regions. Scripture (see below) was first used to call peaks of H3K4me3 and PolII enrichment. Annotated transcription start sites of all genes were then targeted: 4 probes (2 probes for control genes) were designed centered at the TSS, and this set was complemented with two probes tiling any significant PolII peak or K4me3 peak that lay within the gene body but did not overlap any of the original probes (Figure 1B). 2 probes were added for any significant K4me3 peak that lay within 30 kb of the TSS of the genes targeted. The final probe set consisted of 786 probes targeting regulatory regions of ~200 genes.

Dendritic cell isolation, culture, and LPS stimulation

To obtain a sufficient number of cells, a modified version of the DC isolation used in Lutz et al. was implemented. Briefly, 6-8 week old female C57BL/6J mice were obtained from the Jackson Laboratories. RPMI medium (Invitrogen) supplemented with 10% heat-inactivated FBS (Invitrogen), β-mercaptoethanol (50 μM, Invitrogen), L-glutamine (2 mM, VWR), penicillin/streptomycin (100 U/ml, VWR), MEM non-essential amino acids (1X, VWR), HEPES (10 mM, VWR), sodium pyruvate (1 mM, VWR), and GM-CSF (20 ng/ml; Peprotech, Rocky Hill, NJ) was used throughout the study. At day 0, bone marrow-derived dendritic cells (BMDCs) were collected from femora and tibiae and plated on twenty (per mouse) 100 mm non-tissue culture treated plastic dishes using 10 ml medium per plate. At day 2, cells were fed with another 10 ml medium per dish. At day 5, cells were harvested from 15 ml of the supernatant by spinning at 1400 rpm for 5 minutes; pellets were resuspended with 5 ml medium and added back to the original dish. Cells were fed with another 5 ml medium at day 7. At day 8, all non-adherent and loosely bound cells were collected and harvested by centrifugation. Cells were then resuspended with medium and plated at a concentration of 15x10^6 cells in 10 ml medium per 100 mm dish. At day 9, cells were stimulated for various time points with LPS (100 ng/ml, rough, ultrapure E. coli K12 strain, Invitrogen ® ).

RNA extraction and RNA-Seq library preparation.

Total RNA was extracted with QIAZOL ® reagent following the miRNEASY ® kit's procedure (Qiagen), and sample quality was tested on a 2100 Bioanalyzer (Agilent). The RNA-Seq libraries were prepared using the 'dUTP second strand' (strand specific) protocol as described in (Levine et al., 2010). Briefly, extracts were treated with DNase (Ambion 2238). Polyadenylated RNAs were selected using Ambion's MicroPoly(A)Purist kit (AM1919M) and RNA integrity was confirmed using a Bioanalyzer (Agilent). RNA was fragmented by incubation in RNA fragmentation buffer (Affymetrix) at 80 °C for 4 minutes. Fragmented RNA was mixed with 3 μg random hexamers (Invitrogen), incubated at 70 °C for 10 min, and placed on ice briefly before starting cDNA synthesis. First-strand cDNA was synthesized with this RNA primer mix by adding 4 μl 5x first-strand buffer, 2 μl 100 mM DTT, 1 μl 10 mM dNTPs, 4 μg of actinomycin D, 200 U Superscript III and 20 U SUPERase-In, and incubating at room temperature for 10 min followed by 1 h at 55 °C. Second-strand cDNA was synthesized by adding 4 μl of 5x first-strand buffer, 2 μl of 100 mM DTT, 4 μl of 10 mM dNTPs with dTTP replaced by dUTP (Sigma), 30 μl of 5x second-strand buffer, 40 U of Escherichia coli DNA polymerase, 10 U of E. coli DNA ligase and 2 U of E. coli RNase H, and incubating at 16 °C for 2 h. cDNA was eluted using the Qiagen MiniElute kit with 30 μl of the manufacturer's EB buffer. DNA ends were repaired using dNTPs and T4 polymerase (NEB), followed by purification using the MiniElute kit. Adenine was added to the 3' end of the DNA fragments using dATP and Klenow exonuclease (NEB; M0212S) to allow adaptor ligation, and fragments were purified using MiniElute. Adaptors were ligated by incubating for 15 min at room temperature (25 °C). Phenol/chloroform/isoamyl alcohol (Invitrogen 15593-031) extraction followed to remove the DNA ligase. The pellet was then resuspended in 10 μl EB buffer. The sample was run on a 3% agarose gel (Nusieve 3:1 agarose, Lonza) and a 160-380 base pair fragment was cut out and extracted. PCR was performed with Phusion High-Fidelity DNA Polymerase with the manufacturer's GC buffer (New England Biolabs) and 2 M betaine (Sigma). PCR conditions were 30 s at 98 °C; 16 cycles of 10 s at 98 °C, 30 s at 65 °C, 30 s at 72 °C; 5 min at 72 °C; hold at 4 °C. Products were run on a polyacrylamide gel for 60 min at 120 V. The PCR products were cleaned up with Agencourt AMPure XP magnetic beads (A63880) to completely remove primers, and the product was submitted for Illumina sequencing. All libraries were sequenced using the Illumina Genome Analyzer (GAII). Two lanes for each sample were sequenced, corresponding to 45 million paired-end reads/sample (90 million single reads, 76 bases long) on average.

shRNA knockdowns

High-titer lentiviruses encoding shRNAs targeting Irf2 were obtained from The RNAi Consortium (TRC; Broad Institute, Cambridge, MA, USA). Bone marrow cells were infected with lentiviruses as previously described (Amit et al., 2009). Five shRNAs were tested for knockdown efficiency using qPCR of the target gene. shRNAs with >75% knockdown efficacy were selected. Measurements of gene expression in unstimulated cells were carried out using a signature gene set in the nCounter Digital Analyzer as previously described (Amit et al., 2009). Lentivirus-infected cells were composed of ~90% CD11C+ cells, which was comparable to sorted BMDCs (Amit et al., 2009).

nCounter data analysis

The following pipeline was used to analyze the Irf2 knockdown data and re-analyze the knockdown data from (Amit et al., 2009). For each sample, the Nanostring count values were divided by the sum of counts assigned to a set of control genes that are the least affected by shRNAs and LPS stimulation (10 genes altogether, including Ndufa7, Tbca, and Tomm7; see (Amit et al., 2009)). For each condition, a fold change ratio was computed by comparing to five control samples infected with non-targeting shRNA. The results of all pairwise comparisons were then pooled together (i.e., AxB pairs for A repeats of the condition and B control samples): a substantial fold change (above a threshold value t) in the same direction (up- or down-regulation) in more than half of the pairwise comparisons was required. The threshold is determined as max(1.5, d), where d is the mean + 1.645 times the standard deviation of the fold change shown by the control genes (corresponding to p=0.05, under the assumption of normality). All pairwise comparisons in which both control and knockdown samples had low counts before normalization (<100) were ignored.
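The normalization, threshold, and pairwise voting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline itself: the function names are hypothetical, the sample standard deviation is assumed for d, fold changes are taken as plain ratios, and "more than half" is counted over the comparisons that were not ignored for low counts.

```python
from statistics import mean, stdev


def normalize(counts, control_genes):
    """Divide each gene's count by the summed counts of the control genes."""
    total = sum(counts[gene] for gene in control_genes)
    return {gene: value / total for gene, value in counts.items()}


def fold_threshold(control_fold_changes, floor=1.5, z=1.645):
    """t = max(1.5, d), with d = mean + 1.645 * SD of control-gene fold changes."""
    d = mean(control_fold_changes) + z * stdev(control_fold_changes)
    return max(floor, d)


def call_regulation(knockdown, controls, t, low_count=100):
    """Pool all A x B comparisons for one gene and call 'up', 'down', or None.

    knockdown, controls: lists of (raw_count, normalized_value), one per sample.
    """
    up = down = used = 0
    for kd_raw, kd_norm in knockdown:
        for ctrl_raw, ctrl_norm in controls:
            if kd_raw < low_count and ctrl_raw < low_count:
                continue  # both samples have low counts: ignore the pair
            if kd_norm <= 0 or ctrl_norm <= 0:
                continue  # assumed guard: a ratio is undefined for zero values
            used += 1
            ratio = kd_norm / ctrl_norm
            if ratio > t:
                up += 1
            elif ratio < 1.0 / t:
                down += 1
    if used == 0:
        return None
    if up > used / 2:
        return "up"
    if down > used / 2:
        return "down"
    return None
```

A gene is called affected only when a majority of the pooled comparisons agree in direction, which keeps a single outlying replicate from driving the call.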

Western Blot and antibody validation

Nuclear extracts from mouse bone marrow dendritic cells (DCs) were prepared using NE-PER nuclear and cytoplasmic extraction reagents (Thermo Scientific, USA), following the instructions of the manufacturer with minor modifications. Briefly, 10 million cells were harvested by centrifugation and washed with PBS. Cells were transferred to an Eppendorf tube and 1 ml of CER I buffer was added. Cells were resuspended by vigorous vortexing for 15 sec followed by incubation on ice for 10 min. 55 μl of ice-cold CER II buffer was added and the tube was vortexed for 5 sec on the highest setting. After a one-minute incubation on ice, the tube was vortexed for another 5 sec and the nuclear fraction was separated from the cytoplasmic extract (supernatant) by spinning for 5 min at maximum speed. The insoluble pellet was resuspended with 125 μl of NER buffer and vortexed for 15 sec. It was then sonicated in a Branson digital sonifier for 30 seconds using 45% amplitude and a 0.7 sec pulse-on/1.3 sec pulse-off setting. The tube was centrifuged for 10 min at 16,000 x g and the supernatant containing the nuclear extract was transferred to a clear tube. All steps were carried out at 4 °C. 20 μg of the nuclear proteins were separated by SDS-PAGE and transferred to a PVDF membrane (BioRad). Prestained protein molecular mass marker (BioRad) was run to monitor electrophoretic transfer and to determine relative size. Membranes were probed with antibodies and visualized by the enhanced chemiluminescence (ECL, Amersham) method according to the instructions of the manufacturer.

Immuno precipitation of Pol-II complexes

Cross-linking: Mouse bone marrow dendritic cells (DCs) were stimulated with LPS (100 ng/ml) for 2 hours and cross-linked with DSP (Thermo Scientific) at a final concentration of 1 mM for 30 min at room temperature with mild shaking. Cells were then incubated with glycine at a final concentration of 10 mM for 15 min to stop the cross-linking reaction. Cross-linked cells were harvested by centrifugation and washed with ice-cold PBS three times. Cells (120 million cells maximum) were incubated with 1 ml of RIPA buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0, 14 mM NaCl, 1% Triton X-100, 0.1% SDS, 0.1% DOC) for 10 min on ice and sonicated in a Branson digital sonifier for 3 min using 45% amplitude and a 0.7 sec pulse-on/1.3 sec pulse-off setting. Cells were centrifuged for 10 min at maximum speed and the supernatant containing the cell lysate was transferred to a clear tube. 50 μl of the lysate was saved as an input sample. Immunoprecipitation: 75 μl of magnetic beads (Dynabeads, Invitrogen) were washed once with the blocking/binding buffer (PBS, 0.5% Tween 20, 0.5% BSA) and incubated with 5 μg of either control mouse IgG antibody or Polymerase II antibody in a 250 μl volume at room temperature for one hour with rotation. After washing with 400 μl of binding buffer, antibody-conjugated beads were incubated with 250-300 μl of cell lysate for 2 hours at 4 °C with constant rotation. Generally, lysate from 25 million cells was used per immunoprecipitation reaction. Beads were washed four times with RIPA buffer and twice with RIPA buffer supplemented with NaCl (final concentration 500 mM). Both beads and input samples were boiled with SDS sample buffer containing 5% β-ME for 10 min and Western blots were performed. Membranes were probed with PolII, Stat1 and PU.1 antibodies, respectively, and visualized by the enhanced chemiluminescence (ECL, Amersham) method according to the instructions of the manufacturer.

Sequencing and read alignments

ChIP sequencing was done on an Illumina HiSeq-2000 at the Broad Institute sequencing center. Pooled libraries were sequenced at ~12 samples per lane, at a sequencing depth of ~8 million aligned reads per sample. Initially, several libraries were sequenced using different sets of read lengths, with and without paired-end reads, to test the impact on analysis. The optimal read length for both cost and sensitivity was 44 bases (8 bases in this scheme are used for the indexes), with which all later libraries were sequenced. Reads for each index and each lane were aligned to the mouse reference genome NCBI37 using BWA (Li and Durbin, 2009) version 0.5.7 with parameters -q 5 -l 32 -k 2 -t 4 -o 1 -f for the aln command and -P -a 600 -f for the sampe command. After combining reads from different lanes corresponding to the same timepoint, an average of 11,249,898 (7,370,024 sd) reads per timepoint were aligned for transcription factors and an average of 22,678,617 (10,176,016 sd) for chromatin and PolII libraries. In addition, a "compressed" alignment for each TF was created by merging the alignments for each of the timepoints for a given library.

RNA sequencing was done for samples obtained from DCs pre-stimulation and 1, 2, 4 and 6 hours post-stimulation, and was performed on an Illumina GA-II using 2 lanes per sample and a read length of 76 bases. All reads were aligned to the mouse reference genome (NCBI 37, MM9) using the TopHat aligner (version 1.1.4; Kim and Salzberg, 2011). Briefly, TopHat uses a two-step mapping process, first using Bowtie (Langmead et al., 2009) to align all reads that map directly to the genome (with no gaps), and then mapping all reads that were not aligned in the first step using gapped alignment. TopHat uses canonical and non-canonical splice sites to determine possible locations for gaps in the alignment. The EST database, downloaded from the UCSC genome browser (Fujita et al., 2011), was used to improve TopHat sensitivity for spliced alignments. Specifically, the following TopHat parameters were used: -g 15 -r 250 --library-type fr-firststrand -G spliced.est.gtf -p 4, where the spliced.est.gtf file was downloaded from UCSC. An average of 73 million uniquely aligned reads were obtained, of which an average of 55 million aligned in proper pairs and 15 million aligned spanning a putative splice junction. In addition, 4SU-labeled libraries collected from unstimulated DCs and every hour for 6 hours after stimulation were used.

Peak calling

A contiguous segmentation algorithm (described in Guttman et al., 2009) was implemented as part of the Scripture package (available from website: broadinstitute.org/software/scripture/) and used to call, score and filter peaks for both chromatin and TF libraries. Scripture calls peaks using the same statistical methods reported previously (Guttman et al., 2010), with efficiency improvements that are possible thanks to the contiguity of enriched regions and that result in faster runtime compared to RNA-Seq transcript reconstruction. The algorithm scans fixed-size windows across the genome, identifies significant windows, then merges and trims them to obtain peaks. Different ChIP libraries have very different peak characteristics (e.g., H3K27Ac is elongated with moderate enrichment, while transcription factor peaks are small and highly enriched), so adjusting the window size made it possible to find peaks of very different nature. Significance (at the user-specified level) is assessed using the scan statistic (Wallenstein and Neff, 1987) with an underlying Poisson distribution whose mean is the average number of reads found for a window of similar size in the genome.

Because fixed windows were scanned, windows at the edges of peaks tended to have lower coverage at the ends. Bases at the ends of enriched regions were trimmed, using a quantile of coverage specified by the user.
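The scan, merge and trim steps can be illustrated with a simplified Python sketch. It is not the Scripture implementation: significance here is an uncorrected per-window Poisson tail probability rather than the scan statistic of Wallenstein and Neff, the input is assumed to be a per-base list of read-start counts for one chromosome, and the function names and the quantile-based trimming rule are illustrative assumptions.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(read_starts, window, alpha):
    """Scan fixed windows, keep significant ones, and merge overlapping hits."""
    n = len(read_starts)
    if n == 0:
        return []
    lam = sum(read_starts) * window / n   # mean reads per window across the input
    peaks = []
    count = sum(read_starts[:window])
    for start in range(n - window + 1):
        if start > 0:
            # slide the window one base: add the new base, drop the old one
            count += read_starts[start + window - 1] - read_starts[start - 1]
        if poisson_tail(count, lam) < alpha:
            end = start + window
            if peaks and start <= peaks[-1][1]:
                peaks[-1][1] = end        # extend the current peak
            else:
                peaks.append([start, end])
    return [tuple(p) for p in peaks]


def trim_peak(coverage, start, end, quantile):
    """Trim end bases whose coverage falls below a quantile of the peak's coverage."""
    values = sorted(coverage[start:end])
    cutoff = values[min(len(values) - 1, int(quantile * len(values)))]
    while start < end - 1 and coverage[start] < cutoff:
        start += 1
    while end - 1 > start and coverage[end - 1] < cutoff:
        end -= 1
    return start, end


# Toy input: a 10-base enriched stretch inside an empty 110-base region.
reads = [0] * 50 + [5] * 10 + [0] * 50
peaks = call_peaks(reads, window=10, alpha=0.01)
trimmed = [trim_peak(reads, s, e, quantile=0.7) for s, e in peaks]
```

In the toy input the merged peak extends past the enriched stretch because edge windows only partly overlap it, which is the situation the trimming step is meant to correct.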

For transcription factors, the following parameters were used to scan 200-base windows and define peaks composed of windows whose coverage is significant at the 0.05 level (under the scan statistic):

java -Xmx3000m -jar scripture.jar -task chip -maskFileDir <Our local file of Mouse masked regions> -trim -windows 200 -fullScores -alpha 0.05 -minMappingQuality -alignment <TF BAM alignment>

To allow for longer, possibly less enriched peaks, -windows 750,1500 was used for chromatin marks and PolII, but only windows with read coverage significant at the 0.01 level were considered. For each TF and chromatin modification, the compressed alignment was used as the input to Scripture. This made it possible to obtain a single set of regions using the combined power of all time points.

Peak filtering and scoring

For a given library C, a significant region R is scored by the enrichment score:

$$E_C(R) = \frac{N_R / l_R}{N / L}$$

where N is the total number of reads, L is the length of the alignable genome, N_R is the total number of reads overlapping the region, and l_R is the length of the region R. Regions of open chromatin tend to generate more reads than regions of less accessible DNA regardless of enrichment for a specific antibody target. In order to control for this and other fluctuations in read coverage not due to binding of DNA to the target protein, whole cell extract (WCE) libraries were used as our null set. For every library C, Scripture significant regions were further filtered by running a fixed window of 150 bases (b) across the region, computing a fold score for each window relative to the WCE library, and keeping only those significant regions having a fold score > 3. The peak score was set to that of the maximum-scoring 150 b window across the region. This score makes regions comparable independently of their size. This highest-scoring 150 b window within a peak is called the peak summit. Each peak was then scored using the time course alignments by computing the enrichment score of the maximum-scoring window for each time point.

Transcriptome annotation and quantification (RNA-Seq)

TopHat alignments were processed by Scripture (Guttman et al., 2010) to obtain significantly expressed transcripts for each time course. Only multi-exonic transcripts were retained. Scripture was run using the following parameters to find transcripts one chromosome at a time:

java -Xmx5000m -jar scripture.jar -minSpliceSupport 2 -trim -maskFileDir <mm9_masked_regions> -windows 0 -alignment <timepoint_rnaseq_alignment.bam> -out chr<chrSymbol>.segments.bed -chr chr<chrSymbol> -chrSequence chr<chrSymbol>.fa

Expression of RefSeq (Pruitt et al., 2007) transcript annotations (as of 02/24/2011), downloaded from the UCSC table browser, was quantified. Scripture was used to first find constituent isoforms (Guttman et al., 2010) for those genes with multiple isoforms. The Reads Per Kilobase of transcript per Million reads (RPKM) was then computed based on both the total and labeled RNA-Seq data for further analysis of expression.

Systematic selection of transcription factor targets for iChIP

In order to systematically select potential TFs functional in the DC system, the RNA-Seq time course data of DCs activated with LPS was used. A list of 1885 transcription factors was filtered for maximal expression (in our RNA-Seq data) at any of the time points. The list was then manually curated to remove any gene that is not a sequence-specific TF (e.g., general transcription machinery and chromatin modifiers). Any TF that was expressed above 15 RPKM at any of the time points (for RPKM calls see above) was designated as "expressed", selected as a TF target, and screened for potential antibodies in commercial antibody vendors' databases.

Motif analysis

Both de novo motif discovery and known motif matching were performed using the MEME software suite. First, the sequence of the summit of each high-scoring peak (see above definitions) was extracted and used as input for the MEME-ChIP pipeline (website: meme.nbcr.net/meme4_6_1/memechip-intro.html), which runs MEME (Bailey and Elkan, 1994) for motif discovery, TOMTOM (Gupta et al., 2007) to search discovered motifs within existing databases, and MAST (Bailey and Gribskov, 1998) to search for known TF motifs in the sequence provided. Figure 14 summarizes the motif analysis results. Briefly, at a significance cutoff of 0.01, known motifs were found for 17 of the 29 transcription factors. For MafF and Ahr, known motifs associated with different factors were found: Nfe2l2 and Nfy, respectively. For three factors (Fos, Hif1a and Ets2), significant motifs were found that did not match any known motifs in the databases. Finally, when the stringency cutoff was relaxed, known motifs for RelA were found (see website: weizmann.ac.il/immunology/AmitLab/data-and-method/iChIP/Tables). Factors for which no significant motifs were found within their summits were either "runners" like E2f1, Nfkb1 and Nfkb2 or had very few high-scoring peaks (like Jun).

Generating the global property map

For generating the global property map in Figure 2B, the peaks were filtered by requiring a minimum score of 20. Peaks overlapping a promoter region that were closer to a transcriptional start site than to a 3'UTR were considered promoter peaks. Peaks were classified as 3'UTR whenever they were within 1 kb of an annotated 3' end and no transcriptional start site was closer. Peaks overlapping enhancer regions were classified as enhancer bound. The remaining peaks were classified as intronic, exonic or "other" whenever they overlapped an annotated intron, exon or neither.
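The classification rules above can be read as an ordered decision list. The Python sketch below is one such reading, not the code used to produce Figure 2B: the order of precedence (promoter, then 3'UTR, then enhancer, then intron before exon), the function name, and the precomputed inputs (distances in bases and overlap flags) are all assumptions made for illustration.

```python
MIN_PEAK_SCORE = 20          # minimum peak score required for the map
UTR_MAX_DISTANCE = 1000      # "within 1 kb of an annotated 3' end"

def classify_peak(score, overlaps_promoter, dist_to_tss, dist_to_3p_end,
                  overlaps_enhancer, overlaps_intron, overlaps_exon):
    """Return a category label, or None if the peak is filtered out."""
    if score < MIN_PEAK_SCORE:
        return None
    # Promoter: overlaps a promoter region and is closer to a TSS than to a 3' end.
    if overlaps_promoter and dist_to_tss < dist_to_3p_end:
        return "promoter"
    # 3'UTR: within 1 kb of a 3' end and no TSS is closer.
    if dist_to_3p_end <= UTR_MAX_DISTANCE and dist_to_3p_end <= dist_to_tss:
        return "3'UTR"
    if overlaps_enhancer:
        return "enhancer"
    # Remaining peaks (intron checked before exon by assumption).
    if overlaps_intron:
        return "intronic"
    if overlaps_exon:
        return "exonic"
    return "other"
```

A peak scoring 25 that lies 400 bases from a 3' end and 5,000 bases from the nearest transcriptional start site would be labeled "3'UTR" under these assumptions.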

Assessing co-binding and overlap of peaks with annotated genomic regions

For each pair of ChIP assays (say x and y), a binomial p-value was used to assess their overlap in the genome as described in McLean et al. (2010). The number of hits was set to the number of x peaks that fall within 500 bp of some peak of y. The background probability was set to the length of regions associated with y (i.e., taking a 500 bp margin around each of its peaks) divided by the overall length of genomic regions that are associated with at least one ChIP assay. A similar computation was performed for assessing the overlap of ChIP assay peaks with annotated genomic regions. To compute the overlap of assay x with region y, the number of hits was set to the number of x peaks that overlap with y. The background probability was set to the length of regions associated with y divided by the overall length of the genome. The regions used include: (i) regulatory feature annotations from Ensembl (Flicek et al., 2010), (ii) regulatory features found by the ORegAnno algorithm (Griffith et al., 2008), (iii) conserved regions annotated by the multiz30way algorithm (here, regions with multiz30way score > 0.7 were considered), and (iv) repeat regions annotated by RepeatMasker (website: repeatmasker.org). Region coordinates (ii-iv) were downloaded from the UCSC genome browser.

Computation of the percent of bound motifs

For every de novo motif found with an E-value of less than 0.01, every peak with an enrichment score of 20 or more was scored by evaluating the match to the motif within the peak, using the standard log-odds ratio score of the probability of a given k-mer being generated by the inferred position weight matrix to the probability of the k-mer being generated by a neutral model of 40% GC, the mouse genome-wide GC content. The 10th percentile of these scores was then used as the cutoff in a genome-wide scan for available motifs. The percent of bound motifs was the ratio of the number of motifs in peaks scoring higher than the cutoff to the number of genome-wide matches above the cutoff.
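The log-odds scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only: the position weight matrix layout (one dictionary of base probabilities per position, with no zero entries), the use of base-2 logarithms, the forward-strand-only scan, and the function names are assumptions. The 40% GC neutral model is taken from the text (G and C at 0.2 each, A and T at 0.3 each).

```python
import math

# Neutral model with 40% GC content.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def log_odds(kmer, pwm):
    """Log-odds of a k-mer under the PWM versus the neutral model."""
    return sum(math.log2(column[base] / BACKGROUND[base])
               for base, column in zip(kmer, pwm))

def best_match(sequence, pwm):
    """Highest log-odds score over all k-mers in the sequence (forward strand)."""
    k = len(pwm)
    return max(log_odds(sequence[i:i + k], pwm)
               for i in range(len(sequence) - k + 1))

def percent_bound(peak_scores, genome_scores, cutoff):
    """Motif matches in peaks above the cutoff as a percentage of genome-wide matches."""
    bound = sum(score > cutoff for score in peak_scores)
    genome_wide = sum(score > cutoff for score in genome_scores)
    return 100.0 * bound / genome_wide
```

In this sketch the cutoff would be chosen from the distribution of `best_match` scores over the high-scoring peaks and then applied to every genome-wide match.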

Defining TF-region and TF-gene associations

A TF-region association matrix was first defined, with columns corresponding to TF binding at the four studied time points (altogether 4 columns per TF) and rows corresponding to regions (promoters and enhancers). The association value is the sum of enrichment scores over all the peaks of the TF that fall within the given region at the given time point. Only peaks that had a sufficiently high enrichment score during at least one time point were considered. In this analysis, a cutoff of 26.9 was used, which corresponded to the mean + 0.25*std of all peak enrichment scores. This cutoff also corresponded to the top ~33% scoring peaks. A binary version of the association matrix was also defined. In the binary TF-region association matrix, each factor was associated with four columns (one for each time point). The values in the binary matrix were "1" if the TF had at least one peak within the region with an enrichment score over the cutoff value (and "0" otherwise). A categorical TF-region association matrix, in which each factor was associated with a single column, was also defined. An entry in the categorical matrix has the value "none" if there is no peak (at any of the time points) within the respective region with an enrichment score over the cutoff value. Otherwise, if the respective scores in the TF-region association matrix substantially increased (3-fold, see next section) over time in comparison to the basal state (at t=0), the value is "gain". If the scores decreased (3-fold), the value is "loss", and if they remained on the same scale (<3-fold change), the value is "static". Altogether, 186,959 TF-region interactions were mapped (i.e., entries other than "none" in the categorical TF-region association matrix).
Turning to the gene level, a binary TF-gene association matrix was defined, whose rows correspond to target genes and whose columns correspond to TF binding in the associated enhancers or promoters during the four studied time points (altogether 8 columns per TF). The values in the binary matrix are "1" if there is at least one peak with an enrichment score over the cutoff value that falls in the respective regions (promoters or enhancers) that are associated with the gene. A categorical TF-gene association matrix was also defined, where each factor is associated with one column as above. The values in the matrix are determined according to the regions associated with the gene using the categorical TF-region association matrix. If there are no bound enhancers or promoters, the value is "none". Otherwise, if at least 50% of the bound enhancers or at least 50% of the bound promoters are associated with gain, the value is "gain". If at least 50% of the bound enhancers or at least 50% of the bound promoters are associated with loss, the value is "loss" (in the rare event (2.5% of the cases) where both conditions hold, the entry is marked as "static"). If neither condition holds, the entry is marked as "static". The binary and categorical matrices were used throughout the analysis as references for defining TF binding events and the dynamics of these events.
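The categorical calls described above can be illustrated with the following Python sketch. The cutoff of 26.9 and the 3-fold and 50% thresholds come from the text; everything else is an assumption made for illustration, in particular the pseudocount used to avoid division by zero at t=0, the choice to compare the basal score against the highest (for gain) or lowest (for loss) later score, and the rule that gain is tested before loss at the region level.

```python
def categorize_region(scores, cutoff=26.9, fold=3.0, pseudocount=1.0):
    """scores: association values for one TF in one region, basal (t=0) first."""
    if max(scores) <= cutoff:
        return "none"
    basal = scores[0] + pseudocount
    later = [score + pseudocount for score in scores[1:]]
    if max(later) / basal >= fold:
        return "gain"
    if basal / min(later) >= fold:
        return "loss"
    return "static"

def categorize_gene(enhancer_calls, promoter_calls):
    """Combine region-level calls for the enhancers and promoters of one gene."""
    def fraction(calls, label):
        bound = [call for call in calls if call != "none"]
        return bound.count(label) / len(bound) if bound else 0.0

    if all(call == "none" for call in list(enhancer_calls) + list(promoter_calls)):
        return "none"
    gain = (fraction(enhancer_calls, "gain") >= 0.5
            or fraction(promoter_calls, "gain") >= 0.5)
    loss = (fraction(enhancer_calls, "loss") >= 0.5
            or fraction(promoter_calls, "loss") >= 0.5)
    if gain and not loss:
        return "gain"
    if loss and not gain:
        return "loss"
    return "static"   # both conditions hold, or neither does
```

For instance, a gene with one gained and one lost enhancer satisfies both conditions and is therefore reported as "static".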

Evaluating the tendency of TF to co-bind at similar regions

For every pair of TFs, their tendency to bind at the same regions during the same time point was evaluated using a hypergeometric p-value:

$$p = \sum_{i=b}^{\min(n,B)} \frac{\binom{B}{i}\binom{N-B}{n-i}}{\binom{N}{n}}$$

where N is the overall number of regions bound at that time point, B is the number of regions bound by the first TF at the given time point, n is the number of regions bound by the second TF at the given time point, and b is the number of regions bound by both at the given time point. The analysis was limited to regions with at least two binding events. Further, to get more specific results, highly occupied target (HOT) regions with 10 or more bound TFs were filtered out. TF pairs that had a p-value lower than $10^{-3}$ during at least one time point are shown (Figure 4B).
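A minimal Python sketch of the co-binding test follows, assuming the standard upper-tail form of the hypergeometric p-value (the probability of observing b or more shared regions by chance). The function name is hypothetical; the argument names follow the symbols N, B, n and b used in the text.

```python
from math import comb

def hypergeom_pvalue(N, B, n, b):
    """P(overlap >= b) when n regions are drawn from N, of which B are bound by the first TF."""
    total = comb(N, n)
    tail = sum(comb(B, i) * comb(N - B, n - i)
               for i in range(b, min(B, n) + 1))
    return tail / total
```

With N=10, B=5 and n=5, a complete overlap (b=5) has probability 1/252 under this model, while b=0 always gives a p-value of 1.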

Turnover score

The turnover score (Figure 5B) reflects the extent of temporal changes in binding relative to the unstimulated state. It is defined as the percentage of bound regions that have a substantial (>3-fold) change (up or down) in their respective binding score compared with the basal binding (at t=0). This score was computed based on the "gain," "loss," and "static" events in the categorical TF-region association matrix defined above. In Figure 17, the same analysis was repeated with other cutoff values (cutoff=[2...12]). The timing associated with gain of binding (Figure 5B) was determined as the first time point at which the fold increase in binding score reached 50% of its maximum level (the timing of binding-loss events was determined in a similar manner).
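The two quantities defined above can be sketched as follows in Python. This is an illustration under stated assumptions rather than the analysis code: the categorical labels are taken as input, a pseudocount is added to avoid dividing by a zero basal score, the basal time point itself is never reported as the gain time, and the time values in the usage note are hypothetical.

```python
def turnover_score(categories):
    """Percentage of bound regions ("gain", "loss" or "static") that are dynamic."""
    bound = [label for label in categories if label != "none"]
    if not bound:
        return 0.0
    dynamic = sum(label in ("gain", "loss") for label in bound)
    return 100.0 * dynamic / len(bound)

def gain_timing(times, scores, pseudocount=1.0):
    """First post-basal time point at which the fold increase reaches 50% of its maximum."""
    basal = scores[0] + pseudocount
    folds = [(score + pseudocount) / basal for score in scores]
    half_max = 0.5 * max(folds)
    for time, fold in zip(times[1:], folds[1:]):
        if fold >= half_max:
            return time
    return None
```

With hypothetical time points 0, 0.5, 1 and 2 hours and binding scores 9, 19, 49 and 99, the fold increases are 1, 2, 5 and 10, so the gain would be timed at 1 hour.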

Cluster analysis

To generate the clusters in Figure 6B (for genes) and Figure 18 (for regions), the k-means algorithm was used with k=8 on the set of elements (genes or regions) that have at least three binding events in the respective categorical association matrix (which in most cases translates to "additional TF on top of PU.1 and Cebpb"). As the metric, the Hamming distance (percentage of elements that are either bound by both TFs or not bound by both TFs) was used. For each gene-based cluster, the following was computed: (i) overlap with genes associated with inflammatory and anti-viral response (Figure 6B); and (ii) functional enrichment using annotations from the MSigDB dataset, using the "canonical pathway" subset of the curated gene set (c2.cp.v3.0), the "cellular process" subset of the Gene Ontology gene set (c5.all.v3.0), and the motif gene set (c3.all.v3.0; website: broadinstitute.org/gsea/downloads.jsp). The significance of these overlaps was evaluated using a hypergeometric p-value (see formula above), where B is the size of the cluster; n is the number of genes that have the investigated property (i.e., a functional group from MSigDB, or genes annotated as inflammatory or anti-viral); b is the number of genes that belong to the cluster and to the annotated set; and N is the size of the background set of genes. For the first test, N is the number of genes with at least three binding events (N=11,351). For the second test, N is the number of genes in the MSigDB database (N=14,017). The genes were also clustered based on their temporal transcription profile (using the 4SU-Seq data). To generate these clusters, the k-means algorithm was used with Pearson correlation as the metric and with k=11 (the smallest k for which the worst within-cluster similarity was Pearson r > 0.8). The clusters are presented in Figure 22. Even though the clusters were computed with the 4SU-Seq data, the Pol-II binding profiles in the clusters were also coherent, and matched, with a certain time gap, the 4SU-Seq profiles.

Functional enrichments of TF targets

Functional enrichments were computed for the target genes of every TF using the same scheme applied for the clusters (see previous section; Figure 6C).

Principal component analysis

The pcomp Matlab function was applied to the 28x6 dimensional matrix consisting of all transcription factors (excluding Atf4, for which there was only one time point) and six binding characteristics scored: log of the number of bound regions, percent of dynamic binding events, promoter to enhancer binding ratio, percent of regions bound in isolation, running score, and percent of genome-wide motifs bound by the TF. All covariates were standardized (mean zero, STD 1) prior to the analysis. The biplot Matlab function was used to present the TF projections and the loading of the different covariates for the first three principal components (Figure 8A). Notably, the first three principal components account for 88% of the variance in the data.

Mammalian genomes can give rise to hundreds of cell types, each with distinct functions and responses, but the mechanisms by which this plasticity is encoded in TF-DNA networks are only partly understood. It was found that the response of primary innate DCs to pathogen stimulus is orchestrated by a multilayered TF network with at least three major layers. A first layer consists of Pioneer TFs, which have been previously shown to play a role in establishing cell identity by shaping the cell's epigenetic state during differentiation. Most studies of pioneer factors have focused on differentiation, where Pioneers show dynamic changes. In contrast, this study focused on environmental stimulation in differentiated cells, where negligible changes in Pioneer occupancy were observed across the response. The low turnover in enhancer and promoter marks in the early response suggests that the response of DCs to pathogen leaves little epigenetic memory and that the cell does not substantially alter its permanent state following this response. However, it cannot be ruled out that epigenetic changes do take place at later times (Foster et al., 2007).

This analysis suggests the existence of another layer of potent and relatively static binders, which bind thousands of genes prior to stimulation and are associated with highly expressed or inducible genes. These factors may function to prime their gene targets for expression, and are thus termed "primer" factors. Primer binding pre-stimulation is strongly correlated with gene induction and expression levels (Figure 20); however, primers are usually not specific for a defined biological process (e.g., anti-viral or inflammatory transcriptional programs; Figure 6C).

Primers that are already loaded into cis-regulatory elements, most prominently at early-induced genes, may, in some embodiments, 'poise' genes for induction under irrelevant conditions. In certain embodiments, once the appropriate signaling events are integrated, they may serve as beacons to direct other TFs or post-translational modifying enzymes to the appropriate site to tune gene expression, a recruitment role previously suggested for the pioneer factors Cebpb, PU.1, E2a and Ebf (Cirillo et al., 2002; Cirillo and Zaret, 1999; Heinz et al., 2010). A third layer of TFs is more closely associated with a specific signaling cascade or regulatory pathway (Figure 6C, Table S7). Factors in this Transducer layer work together with Primer factors to regulate genes in specific gene programs (Figure 8B). In certain embodiments, this model may generalize to other transcriptional responses in different cell types.

A more complete understanding of mammalian regulatory circuits may require comprehensive mapping of TFs across a range of cell states, conditions, and responses. In addition, a map of the differences across individuals in a population and across evolutionary history will provide critical insights into the mammalian regulatory code and its role in human disease. This may extend the layered organization to other cellular states, and may enable efficient engineering of cellular identities by controlling the expression and timing of different regulatory layers.

Example 2

End Repair Mastermix Preparation

Adapted from (Fisher et al., 2011)

1) Thaw the reagents on ice.

2) Once the reagents have thawed, prepare the appropriate amount of mastermix for the samples plus an additional 10 samples' worth of reagent to account for dead volume, as detailed in Table 4.

Table 4: Reagents Used for End Repair Mastermix Preparation

3) Once the reagents have been combined, gently mix the mastermix, then place it back on ice.
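The dead-volume allowance described in the preparation steps above amounts to simple arithmetic, sketched here in Python. The function name is hypothetical, and the per-sample volume is passed in as a parameter because the reagent table itself is not reproduced in this text; 27 μl is the volume of End Repair Mastermix dispensed per sample in the dispense protocol that follows.

```python
def mastermix_volume_ul(n_samples, per_sample_ul, extra_samples=10):
    """Total mastermix volume: the samples plus extra samples' worth for dead volume."""
    return (n_samples + extra_samples) * per_sample_ul
```

For a full 96-well plate at 27 μl per sample, this gives (96 + 10) x 27 = 2862 μl, which fits within a 5 ml reservoir.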

End Repair Automated Mastermix Dispense Protocol

1) Set head mode to 1 column: 12.

2) Pick up clean 70 μl ST V11 tips from quadrant 1 in column one from a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck. (Tips only need to be present in quadrants 1 and 2 of each position in which a sample is located.)

3) Aspirate 27 μl of End Repair Mastermix from the 5 ml Deerac disposable reservoir located in column 3 of the low volume insert holder located at position 7 on the Agilent Bravo deck.

4) Dispense the 27 μl of End Repair Mastermix into samples located in column one of the sample plate located at position 8 of the Agilent Bravo.

5) Knock off tips for disposal at quadrant 1 in column one of an empty 70 μl ST V11 tip box.

6) Repeat steps 1 through 3 for all subsequent columns on the sample plate that contain samples. Clean tips should be used each time mastermix is aliquotted into a new column on the sample plate. For column 1, put the tips on in column 1 of the tip box and off in column 1 of the tip trash. For column 2 of the sample plate, put the tips on in column 3 of the tip box and off in column 3 of the tip trash. For column 3 of the sample plate, put the tips on in column 5 of the tip box and off in column 5 of the tip trash.

7) Pick up clean 70 μl ST V11 tips from quadrant 2 of a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck.

8) Perform a Dual Height Mix on the wells containing sample and mastermix. Aspirate 40 μl at a height of 1 mm from the bottom of the well and dispense 40 μl at a height of 5 mm from the bottom of the well. Mix approximately 15 times.

9) Knock off tips for disposal into quadrant 2 of an empty 70 μl ST V11 tip box.

10) Once the protocol is complete, seal wells containing sample with ABI optical caps and place the sample plate on the thermocycler. (The thermoprofile consists solely of an initial incubation at 12 °C for 15 minutes followed by 12 °C for 15 minutes, then a hold at 4 °C indefinitely.)

Set up of the Agilent Bravo with LT head for End Repair 2.2X Cleanup

Process Steps automated on the Bravo

1) Put on 180 μl tips from tip box # 1.

2) Aspirate 147.4 μl of 20% PEG 2.5M NaCl from the 20% PEG 2.5M NaCl source plate, and dispense into sample plate.

3) Perform a Dual Height Mix to ensure the AMPure XP beads are properly resuspended in 20% PEG 2.5 M NaCl buffer. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 13 mm from the bottom of the well. Mix approximately 130 μl 12 times.

4) Allow the sample plate to sit for 2 minutes, then place the sample plate on the Dynal MPC-96S plate magnet for 4 minutes to allow the AMPure XP beads to separate from the solution.

5) Remove and discard the supernatant into the 20% PEG 2.5M NaCl source plate.

6) Discard the used 180 μl tips into tip box # 1.

7) Put on 180 μl tips from tip box # 2.

8) Leaving the sample plate on the Dynal MPC-96S magnet plate, aspirate 100 μl of 70% EtOH and dispense into the sample plate. DO NOT MIX.

9) Allow the AMPure XP beads and sample to sit in the 70% EtOH for 30 seconds, then remove the EtOH and discard into the 70% EtOH source plate.

10) Discard tips into 180 μl tip box # 2.

11) Move the sample plate off the Dynal MPC-96S magnet plate and allow the sample-AMPure bead complex to air dry for approximately 4 minutes at room temperature.

12) Put on 180 μl tips from tip box # 3.

13) Aspirate 40 μl of Tris-HCl pH 8.0 and dispense into sample plate.

14) Perform a Dual Height Mix. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 6 mm from the bottom of the well. Mix approximately 40 μl 15 times.

15) Discard used 180 μl tips into tip box # 3.

16) Using ABI optical caps, seal Eppendorf plate containing samples.

A Base Addition Mastermix Preparation

1) Thaw the reagents on ice.

2) Once the reagents have thawed, prepare the appropriate amount of mastermix for the samples plus an additional 10 samples to account for dead volume.

Table 5: Reagents Used for A Base Mastermix Preparation

3) Once the reagents have been combined, gently mix the mastermix, then place it back on ice.

Automated A Base Addition Mastermix Dispense Protocol

1) Set head mode to 1 column: 12.

2) Pick up clean 70 μl ST V11 tips from quadrant 1 in column one from a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck. (Tips only need to be present in quadrants 1 and 2 of each position in which a sample is located.)

3) Aspirate 20 μl of A Base Addition Mastermix from the 5 ml Deerac disposable reservoir located in column 3 of the low volume insert holder located at position 7 on the Agilent Bravo deck.

4) Dispense the 20 μl of A Base Addition Mastermix into samples located in column one of the sample plate located at position 8 of the Agilent Bravo.

5) Knock off tips for disposal into quadrant 1 in column one of an empty 70 μl ST V11 tip box.

6) Repeat steps 1 through 3 for all subsequent columns on the sample plate that contain samples. Clean tips should be used each time mastermix is aliquotted into a new column on the sample plate. For column 1, put the tips on in column 1 of the tip box and off in column 1 of the tip trash. For column 2 of the sample plate, put the tips on in column 3 of the tip box and off in column 3 of the tip trash. For column 3 of the sample plate, put the tips on in column 5 of the tip box and off in column 5 of the tip trash. And so on.

7) Pick up clean 70 μl ST V11 tips from quadrant 2 of a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck.

8) Perform a Dual Height Mix on the wells containing sample and mastermix. Aspirate 40 μl at a height of 1 mm from the bottom of the well and dispense 40 μl at a height of 5 mm from the bottom of the well. Mix approximately 15 times.

9) Knock off tips for disposal into quadrant 2 of an empty 70 μl ST V11 tip box.

10) Once the protocol is complete, seal wells containing sample with ABI optical caps and place the sample plate on the thermocycler. (The thermoprofile consists solely of 37 °C for 30 minutes, then a hold at 4 °C indefinitely.)
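The tip-handling pattern in the dispense protocol above (sample plate column 1 uses tip box column 1, column 2 uses column 3, column 3 uses column 5, and so on) can be written as a one-line mapping. The Python sketch below extrapolates that pattern from the three stated examples; the function name is hypothetical, and the assumption that the pattern continues unchanged for all twelve sample columns is not confirmed by the text.

```python
def tip_box_column(sample_column):
    """Tip box (and tip trash) column used for a given sample plate column."""
    if sample_column < 1:
        raise ValueError("sample plate columns are numbered from 1")
    return 2 * sample_column - 1
```

Under this reading, the twelfth sample column would use tip box column 23.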

Set up of the Agilent Bravo with LT head for A Base Addition 2.2X Cleanup

Process Steps automated on the Bravo

1) Put on 180 μl tips from tip box # 1.

2) Aspirate 132 μl of 20% PEG 2.5M NaCl and dispense into sample plate.

3) Perform a Dual Height Mix to ensure the AMPure XP beads are properly resuspended in 20% PEG 2.5 M NaCl buffer. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 13 mm from the bottom of the well. Mix approximately 130 μl 15 times.

4) Allow the sample plate to sit for 2 minutes, then place the sample plate on the Dynal MPC-96S plate magnet for 4 minutes to allow the AMPure XP beads to separate from the solution.

5) Remove and discard the supernatant into the 20% PEG 2.5M NaCl source plate.

6) Discard the used 180 μl tips into tip box # 1.

7) Put on 180 μl tips from tip box # 2.

8) Leaving the sample plate on the Dynal MPC-96S magnet plate, aspirate 100 μl of 70% EtOH and dispense into the sample plate. DO NOT MIX.

9) Allow the AMPure XP beads and sample to sit in the 70% EtOH for 30 seconds, then remove the EtOH and discard into the 70% EtOH source plate.

10) Discard tips into 180 μl tip box # 2.

11) Move the sample plate off the Dynal MPC-96S magnet plate and allow the sample-AMPure bead complex to air dry for approximately 4 minutes at room temperature.

12) Put on 180 μl tips from tip box # 3.

13) Aspirate 40 μl of Tris-HCl pH 8.0 and dispense into sample plate.

14) Perform a Dual Height Mix. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 6 mm from the bottom of the well. Mix approximately 40 μl 15 times.

15) Discard used 180 μl tips into tip box # 3.

16) Using ABI optical caps, seal Eppendorf plate containing samples.

Adapter Ligation Mastermix Preparation

1) Thaw reagents on ice. Thaw enough for the number of samples being run plus an extra 15 samples to account for dead volume.

2) Prepare Adapter Ligation Mastermix as described in Table 6.

Table 6: Reagents Used for Adapter Ligation Mastermix Preparation

3) Thaw the Indexed adapter plate, with each well containing at least 6 μl of a unique adapter, at room temperature.

4) Once thawed, vortex the Indexed adapter plate at a moderate speed, followed by a quick spin down.

Automated Adapter Ligation Mastermix Dispense Protocol

1) Set head mode to 1 column: 12.

2) Pick up clean 70 μl ST V11 tips from position 1 of each quadrant in column one from a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck. (Tips only need to be present in quadrants 1 and 2 of each position in which a sample is located.)

3) Aspirate 34 μl of Adapter Ligation Mastermix from the 5 ml Deerac disposable reservoir located in column 3 of the low volume insert holder located at position 7 on the Agilent Bravo deck.

4) Dispense the 34 μl of Adapter Ligation Mastermix into samples located in column one of the sample plate located at position 8 of the Agilent Bravo.

5) Knock off tips for disposal into quadrant 1 in column one of an empty 70 μl ST V11 tip box.

6) Repeat steps 1 through 3 for all subsequent columns on the sample plate that contain samples. Clean tips should be used each time mastermix is aliquotted into a new column on the sample plate. For column 1, put the tips on in column 1 of the tip box and off in column 1 of the tip trash. For column 2 of the sample plate, put the tips on in column 3 of the tip box and off in column 3 of the tip trash. For column 3 of the sample plate, put the tips on in column 5 of the tip box and off in column 5 of the tip trash. And so on.

7) Pick up clean 70 μl ST V11 tips from quadrant 2 of a clean 70 μl ST V11 tip box located at position 3 on the Agilent Bravo deck.

8) Aspirate 6 μl of adapter from the Indexed adapter plate located at position 9 on the Agilent Bravo.

9) Dispense 6 μl of adapter into the corresponding wells of the sample plate. NOTE: Do not discard tips. They will be used to mix sample.

10) Perform a Dual Height Mix on the wells containing sample and mastermix. Aspirate 40 μl at a height of 1 mm from the bottom of the well and dispense 40 μl at a height of 5 mm from the bottom of the well. Mix approximately 15 times.

11) Knock off tips for disposal into quadrant 2 of an empty 70 μl ST V11 tip box.

12) Once the protocol is complete, seal wells containing sample with ABI optical caps and place the sample plate on the thermocycler. (The thermoprofile consists solely of 25 °C for 15 minutes, then a hold at 4 °C indefinitely.)

Set up of the Agilent Bravo with LT head for Adapter Ligation 0.7X Cleanup

Process Steps automated on the Bravo

1) Put on 180 μl tips from tip box # 1.

2) Aspirate 40.6 μl of 20% PEG 2.5M NaCl and dispense into sample plate.

3) Perform a Dual Height Mix to ensure the AMPure XP beads are properly resuspended in the 20% PEG 2.5 M NaCl buffer. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 7 mm from the bottom of the well. Mix approximately 80 μl 15 times.

4) Allow the sample plate to sit for 2 minutes, then place the sample plate on a Dynal MPC-96S plate magnet for 4 minutes to allow the AMPure XP beads to separate from the solution.

5) Remove and discard the supernatant into the 20% PEG 2.5M NaCl source plate.

6) Discard the used 180 μl tips into tip box # 1.

7) Put on 180 μl tips from tip box # 2.

8) Leaving the sample plate on the Dynal MPC-96S magnet plate, aspirate 100 μl of 70% EtOH and dispense into the sample plate. DO NOT MIX.

9) Allow the AMPure XP beads and sample to sit in the 70% EtOH for 30 seconds, then remove the EtOH and discard into the 70% EtOH source plate.

10) Discard tips into 180 μl tip box # 2.

11) Move the sample plate off the Dynal MPC-96S magnet plate and allow the sample-AMPure bead complex to air dry for approximately 4 minutes at room temperature.

12) Put on 180 μl tips from tip box # 3.

13) Aspirate 40 μl of Tris-HCl pH 8.0 and dispense into sample plate.

14) Perform a Dual Height Mix. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 6 mm from the bottom of the well. Mix approximately 40 μl 15 times.

15) Allow the resuspended sample to sit for approximately 2 minutes.

16) Place the sample plate on a Dynal MPC-96S plate magnet for 3 minutes to allow the AMPure XP beads to separate from the solution.

17) Aspirate the eluate and dispense into the Eppendorf 96-well twin.tec plate located at position 7.

18) Discard tips into tip box # 3.

19) Using ABI optical caps, seal Eppendorf plate containing samples.

20) Proceed to Automated/Manual Pond Enrichment Master Mix addition.

Enrichment Mastermix Preparation

1) Thaw the reagents on ice.

2) When the reagents have thawed, prepare the appropriate amount of mastermix for the samples plus an additional 10 samples to account for dead volume, as detailed in Table 7.

Table 7: Reagents Used for Pond Enrichment Mastermix Preparation

3) Once the reagents have been combined, gently mix the mastermix, then place it back on ice.

Pond Enrichment Automated Mastermix Dispense Protocol

1) Set head mode to 1 column: 12.

2) Pick up clean 70 μl ST VI 1 Tips from quadrant 1 in column one from a clean 70 μl ST VI 1 Tips box located at position 3 on the Agilent Bravo deck. (Tips only need to be present in quadrants 1 and 2 of each position in which a sample is located).

3) Aspirate 20 μl of Pond Enrichment Mastermix from the 5 ml Deerac disposable reservoir located in column 3 of the low volume insert holder located at position 7 on the Agilent Bravo deck.

4) Dispense the 20 μl of Pond Enrichment Mastermix into samples located in column one of the sample plate located at position 8 of the Agilent Bravo.

5) Knock off tips for disposal into quadrant 1 in column one of an empty 70 μl ST VI 1 tip box.

6) Repeat steps 1 through 3 for all subsequent columns on the sample plate which contain samples.

Clean tips should be used each time mastermix is aliquotted into a new column on the sample plate. For column 1, put the tips on in column 1 of the tip box and off in column 1 of the tip trash. For column 2 of the sample plate, put the tips on in column 3 of the tip box and off in column 3 of the tip trash. For column 3 of the sample plate, put the tips on in column 5 of the tip box and off in column 5 of the tip trash. And so on.

7) Pick up clean 70 μl ST VI 1 Tips from quadrant 2 of a clean 70 μl ST VI 1 Tips box located at position 3 on the Agilent Bravo deck.

8) Perform a Dual Height Mix on the wells containing sample and mastermix. Aspirate 40 μl at a height of 1 mm from the bottom of the well and dispense at a height of 5 mm from the bottom of the well. Mix approximately 15 times.

9) Knock off tips for disposal into quadrant 2 of an empty 70 μl ST VI 1 tip box.

10) Once the protocol is complete, seal wells containing sample with ABI optical caps and place the sample plate on the thermocycler. (Thermoprofile is diagrammed in Table 5).
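
The tip bookkeeping and mastermix volume described in the dispense protocol above can be sketched as follows. This is only an illustrative calculation, not part of the automated Bravo protocol: the function names `tip_column` and `mastermix_volume_ul` are hypothetical, and the sketch assumes the pattern "sample column 1, 2, 3 uses tip column 1, 3, 5" continues for later columns, and that each sample (plus the 10 extra samples for dead volume) receives one 20 μl aliquot.

```python
def tip_column(sample_column: int) -> int:
    """Tip-box column used for a sample-plate column (1 -> 1, 2 -> 3, 3 -> 5, ...).

    Assumption: the odd-column pattern given in the text continues unchanged.
    """
    if sample_column < 1:
        raise ValueError("sample columns are numbered from 1")
    return 2 * sample_column - 1


def mastermix_volume_ul(n_samples: int,
                        per_sample_ul: float = 20.0,
                        extra_samples: int = 10) -> float:
    """Mastermix to prepare: one aliquot per sample plus the dead-volume allowance."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return (n_samples + extra_samples) * per_sample_ul


if __name__ == "__main__":
    print([tip_column(c) for c in (1, 2, 3)])  # tip columns for sample columns 1 to 3
    print(mastermix_volume_ul(24))             # volume in microliters for 24 samples
```

For example, a run with 24 samples would call for (24 + 10) x 20 μl of mastermix under these assumptions.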

Table 5: Enrichment Thermoprofile

Columns: Step; Temperature (°C); Time (seconds); Number of Cycles

Set up of the Agilent Bravo with LT head for Pond Enrichment 1.8 X Cleanup

Pond Enrichment 1.8X Automated Cleanup deck preparation

Process Steps automated on the Bravo

1) Put on 180 μl tips from tip box # 1.

2) Aspirate 108 μl of Agencourt AMPure XP beads and dispense into sample plate.

Mix beads prior to aspiration.

3) Perform a Dual Height Mix (mixing approximately 15 times), then allow the sample plate to sit for approximately 2 minutes. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 7 mm from the bottom of the well. Mix approximately 155 μl 15 times.

4) Place the sample plate on a Dynal MPC-96S plate magnet for 4 minutes to allow the AMPure XP beads to separate from the solution.

5) Remove and discard the supernatant into the AMPure XP beads source plate.

6) Discard the used 180 μl tips into tip box # 1.

7) Put on 180 μl tips from tip box # 2.

8) Leaving the sample plate on the Dynal MPC-96S magnet plate, aspirate 100 μl of 70% EtOH and dispense into the sample plate. DO NOT MIX.

9) Allow the AMPure XP beads and sample to sit in the 70% EtOH for 30 seconds, then remove the EtOH and discard into the 70% EtOH source plate.

10) Discard tips into 180 μl tip box # 2.

11) Move the sample plate off of the Dynal MPC-96S magnet plate and allow the sample-AMPure bead complex to air dry for approximately 4 minutes at room temperature.

12) Put on 180 μl tips from tip box # 3.

13) Aspirate 40 μl of Tris-HCl pH 8.0 and dispense into sample plate.

14) Perform a Dual Height Mix. Be sure to set the aspiration height to 1.5 mm from the bottom of the well and the dispense height to 6 mm from the bottom of the well. Mix approximately 40 μl 10 times.

15) Allow the resuspended sample to sit for approximately 2 minutes.

16) Place the sample plate on a Dynal MPC-96S plate magnet for 3 minutes to allow the AMPure XP beads to separate from the solution.

17) Aspirate the eluate and dispense into the Eppendorf 96 well twin.tec plate located at position 7.

18) Discard tips into tip box # 3.

19) Using ABI optical caps, seal Eppendorf plate containing samples.
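
The bead volume used in the 1.8X cleanup can be checked with a short calculation. The sketch below assumes, as an interpretation not stated explicitly in the protocol, that the reaction entering the cleanup is the 40 μl eluate plus the 20 μl of Pond Enrichment Mastermix, and that "1.8X" means 1.8 volumes of AMPure XP beads per volume of sample. The function name `bead_volume_ul` is hypothetical.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float = 1.8) -> float:
    """Bead volume for a cleanup at the given bead-to-sample volume ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("volumes and ratios must be positive")
    return ratio * sample_volume_ul


# Assumed composition of the enrichment reaction (see lead-in).
eluate_ul = 40.0
mastermix_ul = 20.0
reaction_ul = eluate_ul + mastermix_ul

beads_ul = bead_volume_ul(reaction_ul)

if __name__ == "__main__":
    print(f"reaction volume: {reaction_ul:.0f} ul, beads at 1.8X: {beads_ul:.0f} ul")
```

Under these assumptions the result agrees with the 108 μl of beads aspirated in step 2.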

Visualizing iChIP with Integrative Genomics Viewer

This section describes the iChIP extensions to Integrative Genomics Viewer (IGV). General documentation is available at website:broadinstitute.org/igv. The iChIP-enabled IGV can be launched directly from website:broadinstitute.org/igv/ichip. This link will download and install IGV 2.0 with iChIP extensions, and open the iChIP dataset at the ilia locus. The iChIP extensions include a new command bar and popup menu.

Example 3

Exemplary High-throughput Chromatin Immuno-Precipitation protocol:

Day One: Cells were grown in 16 dishes of 6 cm diameter to confluency in 3 ml culture medium per dish. Optionally, cells may be contacted with an agent of interest, e.g., a stimulating agent, that may induce chromatin remodeling, activator/repressor recruitment to chromatin (promoters), changes in histone tail modification. For example, BMDC (bone marrow-derived dendritic cells) cells may be stimulated with LPS for 0, 2 and 6 hours.

For harvesting, the approximate cell numbers may be 3 million cells per dish. Tissue culture dishes and table top centrifuge should be cooled to 4°C.

1. Crosslink by adding formaldehyde to a final concentration of 1%. Plates may be put on the rotary shaker, shaking slowly. Add 200 μl of 16% formaldehyde per 3 ml culture while shaking. Incubate for exactly 10 minutes.

2. Add 156 μl of 2.5M glycine to each plate and continue shaking for another 5 minutes. Glycine acts as a quencher. Prepare fresh glycine (dissolve glycine at 65°C) about every month because glycine solution tends to change pH over time. At correct pH, medium color will change to yellow.

3. On ice, scrape cells and collect into 15 or 50 ml FALCON™ tubes depending on the volume.

4. Spin at 1250 rpm for 5 minutes at 4 °C. Remove supernatant, add 5 ml of cold PBS + protease inhibitor (PBS+PI) and spin again.

5. Remove supernatant, transfer the pellet to 1.5 ml tubes and wash two more times. At this stage crosslinked cells can be flash frozen at -80 °C.

6. Add 1 ml RIPA+PI buffer for every 3x10 cells, allow cells to lyse for 10 min at 4 °C. At this stage couple antibody of interest to protein G magnetic beads (~1 hour at room temperature) in binding/blocking buffer (PBS + 0.5% TWEEN, 0.5% BSA). Wash 50 μl of beads per sample two times using binding/blocking buffer and incubate with 10 μg of antibody in 300 μl of binding/blocking buffer.

7. Sonicate cells using Branson SONIFIER™ S-450D (website:sonifier.com/s450_digital.asp) at 45% amplitude, three repeats of a pulse of 0.7 seconds and a pause of 1.3 seconds, wherein the time per repeat is adjusted according to the cell source used: for dendritic cells use 5 minutes (a total of 15 minutes of sonication), for embryonic stem cells use 5.5 minutes (16.5 minutes total), and for K-562 cells (immortalized myelogenous leukemia line) use 4.5 minutes (13.5 minutes total).

8. Combine individual sonicated tubes into 2 ml tubes. Spin for 10 minutes at maximum speed at 4 °C. Transfer to a new 2 ml tube.

9. Optionally, take 15 μl of each time point for whole cell extract (add 135 μl of RIPA) and 15 μl of each time point for gel analysis (add 135 μl of RIPA). Aliquot the remaining sonicated material to Protein G beads with specific antibody. Tumble overnight.
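
The crosslinking and quenching volumes in steps 1 and 2 of Day One can be checked with a simple dilution calculation. The following is a minimal sketch; the helper name `final_concentration` is hypothetical, and the sketch assumes volumes are additive and that the glycine is added to the 3.2 ml present after the formaldehyde addition.

```python
def final_concentration(stock_conc: float, added_ml: float, initial_ml: float) -> float:
    """Concentration after adding `added_ml` of stock to `initial_ml` of liquid.

    The result is in the same units as `stock_conc`; volumes are assumed additive.
    """
    return stock_conc * added_ml / (initial_ml + added_ml)


culture_ml = 3.0

# Step 1: 200 ul of 16% formaldehyde per 3 ml of culture.
formaldehyde_pct = final_concentration(16.0, 0.200, culture_ml)

# Step 2: 156 ul of 2.5 M glycine added to the 3.2 ml now in the dish.
glycine_molar = final_concentration(2.5, 0.156, culture_ml + 0.200)

if __name__ == "__main__":
    print(f"formaldehyde: {formaldehyde_pct:.2f} %")
    print(f"glycine: {glycine_molar * 1000:.0f} mM")
```

The formaldehyde result matches the stated 1% final concentration; the glycine concentration is not stated in the protocol and is computed here only for reference.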

Day two:

1. Place the tube on a magnet and remove the supernatant. Wash once with 500 μl RIPA + PI, add 200 μl RIPA + PI and transfer to a 96-well plate.

2. Wash beads with cold RIPA using a multichannel pipette, repeat washing 5 times and remove supernatant.

3. Wash twice with 200 μl cold RIPA-500 wash buffer.

4. Wash twice with 200 μl cold LiCl wash buffer.

5. Wash twice with 200 μl cold TE wash buffer.

6. After washing keep sample at RT and add 150 μl of direct elution buffer. Incubate sample at 65 °C for 4 hours to overnight. Alternatively, incubate sample for 10 minutes at 95 °C.

Add whole cell extract (from day 1 step 9) in this step for the remainder of the process.

7. Add 2 μl of RNase, incubate at 37 °C for 30 min.

8. Add 1 μl glycogen and 2.5 μl Proteinase K (e.g., from Invitrogen) and incubate at 37 °C for 2 hours.

9. Elute DNA (using e.g., QIAGEN MINELUTE kit / Phenol extraction / Dynabead-DNA) with 25 μl of water.

10. Optionally, estimate DNA (e.g., QBIT™ assay).

11. Optionally, validate using Q-PCR.

12. Optionally, use whole genome amplification (e.g., SIGMA, as described by manufacturer).

13. Optionally, extract amplified DNA using QIAGEN MINELUTE kit, elute with 25 μl of water (measure concentration, optimally between about 70-300 ng/μl).

14. Optionally, use 10 μl for NANOSTRING® measurement.

15. Optionally analyze for the distribution and enrichment of the isolated nucleic acid fragments.

An example is the following approach applied for the ChIP-string analysis: The nCounter® is designed to compare reads from specific probes across different RNA samples; however, evaluating ChIP samples by the nCounter® requires comparison of reads originating from different probes. In order to perform a valid comparison between the different probes, both the different ChIP-string assays and the probes were normalized by the median. This accounted for the differences in the loading amounts of DNA in each ChIP-string assay. Furthermore, this normalization also adjusted for the fluctuations of the diverse probes. To comprehensively evaluate and compare the different ChIP assays, a Z-score transformation was applied to each ChIP sample, followed by zeroing of negative values in order to reduce background noise. High outliers were capped at a threshold. 122 antibodies were evaluated.

Preparation of Reagents:

1. 10 x TE: 100 mM Tris-HCl pH8.0, 10 mM EDTA pH 8.0,

50 ml: 5 ml 1M Tris-HCl pH 8.0 + 1 ml 0.5M EDTA +44 ml water

2. 10 x STE: 100 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0, 140 mM NaCl

50 ml: 1M Tris-HCl pH 8.0 + 1 ml 0.5M EDTA + 14 ml 5M NaCl + 33 ml water

3. 2.5 M Glycine: 37.52 g in 200 ml water, heat to 65 °C, filter solution.

4. DOC: 5% solution: 10 g Sodium deoxycholate in 200 ml water

5. RIPA buffer: STE + 1% Triton X-100, 0.1% SDS, 0.1% DOC

50 ml: 0.5 ml of 1M Tris-Cl pH 8.0 + 0.1 ml of 0.5M EDTA + 1.4 ml 5M NaCl + 500 μl Triton X-100 + 500 μl 10% SDS + 1 ml 5% DOC

6. RIPA/500mM NaCl buffer: RIPA buffer + 360mM NaCl

50 ml: 0.5 ml of 1M Tris-Cl pH 8.0 + 0.1 ml of 0.5M EDTA + 5 ml 5M NaCl + 500 μl Triton X-100 + 500 μl 10% SDS + 1 ml 5% DOC

7. LiCl wash buffer: TE, 250mM LiCl, 0.5% NP-40, 0.5% DOC

50 ml: 0.5 ml of 1M Tris-Cl pH 8.0 + 0.1 ml of 0.5M EDTA + 12.5 ml 1M LiCl + 250 μl NP-40 + 5 ml 5% DOC

8. Direct elution buffer: 10 mM Tris-Cl pH 8.0, 5 mM EDTA, 300 mM NaCl, 0.5% SDS

50 ml: 0.5 ml of 1M Tris-Cl pH 8.0 + 0.5 ml of 0.5M EDTA + 3 ml 5M NaCl + 2.5 ml 10% SDS

9. RNase stock, Proteinase K stock, Glycogen stock, PBS-tissue culture grade, water-extra pure, protease inhibitor, 1M DTT stock.
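
The 50 ml recipes above can be cross-checked against their stated final concentrations with the dilution relation C1·V1 = C2·V2. The sketch below is illustrative only: the helper `final_conc` and the dictionary layout are hypothetical, and neat Triton X-100 and NP-40 are treated as 100% stocks.

```python
TOTAL_ML = 50.0


def final_conc(stock: float, volume_ml: float, total_ml: float = TOTAL_ML) -> float:
    """Final concentration (same units as `stock`) after dilution to `total_ml`."""
    return stock * volume_ml / total_ml


# Stock concentrations are in mM for salts and buffers, and in % for detergents.
ripa = {
    "Tris-Cl (mM)": final_conc(1000, 0.5),
    "EDTA (mM)": final_conc(500, 0.1),
    "NaCl (mM)": final_conc(5000, 1.4),
    "Triton X-100 (%)": final_conc(100, 0.5),
    "SDS (%)": final_conc(10, 0.5),
    "DOC (%)": final_conc(5, 1.0),
}

ripa_500_nacl_mM = final_conc(5000, 5.0)

licl_wash = {
    "LiCl (mM)": final_conc(1000, 12.5),
    "NP-40 (%)": final_conc(100, 0.25),
    "DOC (%)": final_conc(5, 5.0),
}

direct_elution = {
    "Tris-Cl (mM)": final_conc(1000, 0.5),
    "EDTA (mM)": final_conc(500, 0.5),
    "NaCl (mM)": final_conc(5000, 3.0),
    "SDS (%)": final_conc(10, 2.5),
}

if __name__ == "__main__":
    for name, recipe in (("RIPA", ripa), ("LiCl wash", licl_wash),
                         ("Direct elution", direct_elution)):
        print(name, {k: round(v, 3) for k, v in recipe.items()})
    print("RIPA-500 NaCl (mM):", round(ripa_500_nacl_mM, 3))
```

With these stocks the computed values agree with the concentrations listed for the RIPA, RIPA/500mM NaCl, LiCl wash and direct elution buffers.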

Chromatin Immunoprecipitation Reagents:

1M Tris-HCl pH 8.0: Boston Bioproducts (Ashland, MA), BM-320

0.5M EDTA pH 8.0: Boston Bioproducts, BM-150

Hepes Buffer, 1M solution: Mediatech Inc. (Manassas, VA) Cat# 25-060-CI 100 ml

5M NaCl: Sigma (St. Louis, MO) S5150-1L

1M MgCl2 sterile 125 ml: Boston Bioproducts, BM-670

1M KCl (ACS grade): Boston Bioproducts, MT-250, 100 ml

DTT

16% Formaldehyde: Thermo (Hanover Park, IL) #28908, 10x10 ampoule

Glycine: Fluka (Buchs, Switzerland) #50046

Protease Inhibitor: Roche (Basel, Switzerland) tablet

Triton X-100: Sigma (St. Louis, MO) T8787-100 ml

NP-40

Sodium deoxycholate

SDS (20%): Boston Bioproducts, BM-230

1M LiCl (ACS grade): Boston Bioproducts, MT-180

RNase, DNase-free, 500 μg/1 ml: Roche #11119915001

Proteinase K 1. Invitrogen (Carlsbad, CA) Proteinase K solution #25530-049

Proteinase K 2. QIAGEN (Valencia, CA) Proteinase K (2 ml) # QIAGEN 19131

PBS : Sigma

MINELUTE QIAGEN kit: MINELUTE purification kit (50) #28004

Glycogen: Roche 10901393001 (20mg)

CHCl3:isoamyl alcohol (24:1): SIGMA C0549-1PT

Phenol:CHCl3:isoamyl alcohol (25:24:1): Invitrogen 15593-031 100 ml

EtOH: 200 proof

Water: Ultrapure distilled water GIBCO 10977 (DNase and RNase free)

Protein G: Invitrogen cat#100.04D

Quant iT 500 ds HS DNA, Q32854, Invitrogen, 500assays

Qubit Assay Tubes, Q32856, Invitrogen, 500 assays

Evaluation of High-throughput Chromatin Immuno-Precipitation protocol:

To evaluate the efficiency of the HT-ChIP-seq protocol, 10,000 (10K), 20,000 (20K), or 100,000 (100K) cells were analyzed. A modified ChIP method was performed (to adjust for the small number of cells) using four antibodies directed against different histone modifications (K4me3, K9me3, K27me3, K36me3), and libraries were directly generated.

The results are as follows in Table 2:

ChIP-string was then used to evaluate the efficiency and accuracy of the data. The SCN ChIP-string data was then compared to other ChIP-string data (derived using 20 million cells, and successfully sequenced). All the samples from SCN correlated with the ChIP-string data of 20 million cells.

Figure 27 represents ChIP-string experiments for 10,000, 20,000 and 100,000 cells. The matrix represents the correlation values of each ChIP-string experiment to the rest. Each antibody correlates within the SCN and with the relevant 20 million cell ChIP-string data.

Method for antibody screening using ChIP-string

A method for antibody screening was implemented. This method avoids row normalization and applies objective criteria to detect ChIP-string experiments with skewed probe distributions. This method is applicable to future, small-scale studies.

The method was performed, as follows:

1. ChIP-string probe signals were normalized by experiment (e.g., column).

2. The signal for each probe was divided by the median signal of that probe in control experiments only (IgG, WCE). This replaced the 'row normalization' step, and avoided any dependence on other antibodies being screened.

3. For each individual experiment, all probes were ranked based on their signals.

4. Each probe was assigned to one of eleven sets based on the ChIP-string values in histone modification experiments (but not the other CRs ChIP-string). The sets were characterized by a different combination of histone marks, and were analogous to the chromatin states that were the original basis for probe selection. The set definitions were independent of the chromatin regulator (CR) experiments.

5. For each of the eleven sets, the median rank of its probes in the IgG and WCE ChIP-string experiments was calculated. This provided a background distribution, independent of the CR experiments.

6. To test whether a given CR ChIP-string experiment was an outlier, median ranks for each of the eleven sets were calculated. If any of the sets in a CR experiment had a median rank above the 99th percentile of its background distribution, the antibody was 'passed'. If none of the sets satisfied this criterion, the antibody 'failed'. Thus, each antibody was required to be an 'outlier' with respect to at least one set, when compared with the control experiments.

7. Of the 126 CR antibodies tested by ChIP-string in this study, 39 'passed' this screen and 87 'failed'.

This screen accurately predicted which antibodies will work in ChIP-seq. Thirty-one of the 34 antibodies that passed this screen and had been sequenced yielded high-quality maps. Eleven of the 13 antibodies that failed this screen and had been sequenced yielded flat profiles (no ChIP-seq signal). These statistics suggest that this screen has high sensitivity and specificity.
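
The sensitivity and specificity implied by these counts can be worked out explicitly. The sketch below is an interpretation rather than a calculation given in the source: it assumes that a high-quality ChIP-seq map is the "positive" ground truth, that the 3 passed antibodies without high-quality maps are false positives, and that the 2 failed antibodies without flat profiles are false negatives. The variable names are hypothetical.

```python
# Counts taken from the sequenced antibodies described above.
passed_sequenced, passed_good = 34, 31
failed_sequenced, failed_flat = 13, 11

true_pos = passed_good                      # passed the screen, high-quality map
false_pos = passed_sequenced - passed_good  # passed the screen, no high-quality map
true_neg = failed_flat                      # failed the screen, flat profile
false_neg = failed_sequenced - failed_flat  # failed the screen, not flat (assumed usable)

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
positive_predictive_value = true_pos / (true_pos + false_pos)
negative_predictive_value = true_neg / (true_neg + false_neg)

if __name__ == "__main__":
    print(f"sensitivity: {sensitivity:.3f}")
    print(f"specificity: {specificity:.3f}")
    print(f"PPV: {positive_predictive_value:.3f}")
    print(f"NPV: {negative_predictive_value:.3f}")
```

Under these assumptions the screen has a sensitivity of 31/33 and a specificity of 11/14 on the sequenced subset.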

Example 4

Cell culture

K562 erythrocytic leukaemia cells (ATCC CCL-243) were grown according to standard protocols in RPMI 1640 media (Invitrogen, 22400105) supplemented with 10% fetal bovine serum (FBS, Atlas Biologicals, F-0500-A) and 10% Penicillin/Streptomycin (Invitrogen, 15140122). H1 ES cells were grown in TeSR media on Matrigel (Cellular Dynamics), as described previously (Ernst, et al. Nature biotechnology (2010) 28:817-825).

Chromatin Immunoprecipitation

Cells were crosslinked in formaldehyde (1%, 37 °C for 10 min), and then quenched with glycine (5 min at 37 °C). Fixed cells were lysed in 1% SDS, 10 mM EDTA and 50 mM Tris-HCl pH 8.1 supplemented with protease inhibitor (Roche, 04693159001), fragmented with a Branson Sonifier (model S-450D) at 4 °C to a size range between 200 and 800 bp, and precipitated by centrifugation. Five to 10 μg of antibody were pre-bound by incubating with a mix of Protein-A and Protein-G Dynabeads (Invitrogen, 100-02D and 100-07D, respectively) in blocking buffer (PBS supplemented with 0.5% TWEEN and 0.5% BSA) for 2 hours. Washed beads were added to the chromatin lysate, and then incubated overnight. Samples were washed 6 times with RIPA buffer, twice with RIPA buffer supplemented with 500 mM NaCl, twice with LiCl buffer (10 mM TE, 250 mM LiCl, 0.5% NP-40, 0.5% DOC), twice with TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA), and then eluted in 0.5% SDS, 300 mM NaCl, 5 mM EDTA, 10 mM Tris-HCl pH 8.0 at 65 °C. Eluate was incubated at 65 °C overnight, and then treated sequentially with RNaseA (Roche, 11119915001) for 30 min and Proteinase K (NEB, P8102S) for two hours. DNA was purified using a DNA purification kit (Qiagen, 28004).

Representative genomic loci and nCounter® probe design

A set of genomic loci designed to be representative of diverse chromatin environments was chosen. A hidden Markov model (Ernst, et al. Nature biotechnology (2010) 28:817-825) and ChIP-seq maps for 10 chromatin marks in K562 and ES cells (H3K36me3, H4K20me1, H3K27ac, H3K9ac, H3K4me3, H3K4me2, H3K4me1, H3K27me3, H3K9me3 and CTCF) (Ernst, et al. Nature (2011) 473:43-49) were used to identify 10 major chromatin states and annotate the genome accordingly. These states corresponded to distinct annotations, including active promoters, poised promoters, weakly and strongly transcribed regions, weak and strong enhancers, Polycomb-repressed regions, heterochromatic regions and CTCF sites. For each state in each cell type, 20 genomic loci, 500 bp to 1 kb in size, were randomly selected. The corresponding sequences and a set of control regions were then used for probe design.

Adapting the nCounter® system for ChIP-string

The nCounter® was originally designed for the non-enzymatic capture and counting of ~800 individual RNA molecules in a single multiplexed reaction. The method employed color-coded molecular bar-codes (reporters), solid phase capture, and high-resolution fluorescent imaging to digitally count individual nucleic acids. Each custom codeset contained two sequence-specific probes (each 35-50 bases) for every 100 base region of interest: (i) a biotinylated capture probe that contains complementary sequence to the 5' of the particular target region; (ii) a uniquely color-coded reporter probe, complementary to the 3' of the target region. The capture and reporter probes were hybridized to the target of interest, forming a tripartite structure (capture probe:target:reporter probe) that is then purified via universal affinity tags present in both capture and reporter probe molecules. The purified complexes were bound to the imaging surface via the biotin moiety on the capture probe, aligned via electrophoresis, immobilized and imaged. An image analysis algorithm was used to count the barcodes associated with a single molecule of each target sequence. Target genomic sequences were screened for optimal sets of nCounter® probe pairs as described previously for RNA quantification (Geiss, et al. Nature biotechnology (2008) 26:317-325), with the following modifications: (1) Tm calculations for probe pairs were based on DNA:DNA hybridization (Allawi, et al. Biochemistry (1997) 36:10581-10594), (2) candidate probes were screened for cross-hybridization against the reference human genome (Mar. 2006, NCBI 36/HG18) using the NCBI BLAST algorithm (Altschul, et al. Journal of Molecular Biology (1990) 215:403-410), (3) probe pairs were scored using the NanoString Technologies scoring algorithm to select the optimal probe for each target sequence, and (4) probe pairs that met all previous criteria were screened for cross-hybridization with components of the nCounter® system.

ChIP-string DNA measurements

Purified DNA from ChIP experiments was used as the input sample material for the standard nCounter® protocol. The standard procedure for RNA detection was adapted for DNA detection by incorporating a denaturation step prior to hybridization (95 °C for 5 minutes, followed by flash cooling on ice). The single-stranded denatured product was then applied directly into the hybridization reaction (5X SSPE, 0.1% Tween-20), and incubated at 65 °C overnight in a heat block with a heated lid. Automated purification of hybridized complexes and binding to the sample cartridge were performed using the nCounter® prep station and reagents, according to standard procedures (NanoString Technologies, Master Kits). Imaging of the sample cartridge and raw data collection were performed on the nCounter® analyzer, yielding digital counts for the number of times each target molecule was detected, as described previously (Geiss, et al. Nature biotechnology (2008) 26:317-325) and detailed in the nCounter® Gene Expression Manual (www.nanostring.com).

For CR screening, immunoprecipitated DNA samples were first amplified using a whole genome amplification kit (Sigma-Aldrich, WGA2), according to the manufacturer's protocol.

ChIP-string data analysis and processing

Screening Method 1.

The antibody screen was based on ChIP-string data consisting of digital counts for each probe (rows) over all CR experiments (columns). The original analysis procedure included the following steps. (1) Each measurement was divided by the median of all counts in the experiment in which it was taken (the column median); (2) Each value was further divided by the median of the values for that probe across all experiments (the row median); (3) The values were standardized based on the distribution of values within each experiment, subtracting the mean of each column and dividing by its standard deviation; after this step, the scores were compared to a Normal distribution with mean of 0 and variance of 1. Probes with large positive values corresponded to enriched loci; (4) To reduce the effect of background variations, or outliers, on the correlations (calculated in the next step), any value less than 0 was set to zero, and values higher than 5 were set to 5; (5) Pearson correlations were calculated between each pair of samples (experiments), and formed the basis for clustering the experiments using average-linkage, hierarchical agglomerative clustering; (6) The resulting clusters were visually inspected to pass/fail individual antibodies in the following way: (i) antibodies that clustered with IgG controls were designated as 'failed' in the assay, with a few exceptions made for antibodies that also showed high correlation with one (or more) histone modification experiments; (ii) experiments in which the pre-standardization signal was flat and contained no strongly enriched probes were also considered 'failed'. All remaining ('passed') antibodies and a selection of 'failed' antibodies were carried forward to ChIP-seq (Table 3). Genomic locations corresponding to enriched ChIP-string probes were subsequently validated in the respective genome-wide ChIP-seq datasets.
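
Steps (1) to (5) of Screening Method 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the study: the function names `normalize_method1` and `pearson` are hypothetical, the use of the population standard deviation in step (3) is an assumption (the text does not say which estimator was used), and the average-linkage clustering and visual inspection of step (6) are not shown. The sketch also assumes no column median, row median or column standard deviation is zero.

```python
from statistics import mean, median, pstdev


def normalize_method1(counts):
    """Apply steps (1)-(4) to counts[i][j] (probe i in rows, experiment j in columns)."""
    n_cols = len(counts[0])
    # (1) Divide each measurement by the median of its experiment (column median).
    col_med = [median(row[j] for row in counts) for j in range(n_cols)]
    x = [[row[j] / col_med[j] for j in range(n_cols)] for row in counts]
    # (2) Divide each value by the median of its probe across experiments (row median).
    normalized = []
    for row in x:
        row_med = median(row)
        normalized.append([v / row_med for v in row])
    # (3) Standardize each column, then (4) set values below 0 to 0 and above 5 to 5.
    z_cols = []
    for col in zip(*normalized):
        mu, sd = mean(col), pstdev(col)  # population SD is an assumption
        z_cols.append([min(max((v - mu) / sd, 0.0), 5.0) for v in col])
    return [list(row) for row in zip(*z_cols)]


def pearson(a, b):
    """(5) Pearson correlation between two experiments (columns)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


# Toy counts, invented for illustration: 4 probes x 3 experiments.
toy_counts = [
    [10, 20, 12],
    [100, 40, 15],
    [20, 400, 11],
    [12, 22, 300],
]
scores = normalize_method1(toy_counts)
columns = [list(c) for c in zip(*scores)]

if __name__ == "__main__":
    for row in scores:
        print([round(v, 2) for v in row])
    print(round(pearson(columns[0], columns[1]), 3))
```

The resulting pairwise correlations would then feed the hierarchical clustering described in step (5).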

Screening Method 2.

Step 2 of Screening Method 1 was designed to adjust for the different baseline signals of individual probes by considering the median signal for one probe across all experiments. In a large screen (such as this study), most of the data for a given probe comes from the background distribution: many antibodies work poorly (and are thus background) and, moreover, the corresponding locus is only expected to be enriched for some of the epitopes. In a small screen with few antibodies, in some instances, this procedure can result in bias. A second approach was devised, which estimates the background distribution from the set of control experiments where no enrichment is expected (IgG and WCE). The background distribution was then applied in an automated fashion to determine whether a given CR experiment was significantly different from the control experiments, in which case the antibody was 'passed' .

In this procedure: (1) Counts were log transformed and zero counts discarded; (2) For each measurement, the median of all log-transformed counts in the experiment in which it was taken was subtracted (subtract column median); (3) Each value was further adjusted by subtracting the median of the (experiment-normalized) values for the corresponding probe in a reference set of samples. The reference set consisted of mouse and rabbit IgG samples, as well as 16 additional un-enriched control (whole cell extract) experiments. The values after these 3 steps were considered 'signal' for all subsequent processing. (4) Probes that showed high signals in multiple control experiments were removed from the analysis. To do this, the Partitioning Around Medoids (PAM) algorithm from the R cluster package was used to cluster experiments (3 clusters) and probes (4 clusters) from the reference set (see below). The section of the probe-by-experiment cluster grid (12 sections) with the highest signals was identified, and the corresponding probes were removed. (5) The remaining probes were divided into 11 sets based on combinatorial enrichments in ChIP-string experiments for histone modifications (H3K4me1, H3K4me3, H3K9me3, and H3K27me3). These sets were also derived using the PAM algorithm, and roughly correspond to the chromatin states used for probe design. (6) For each ChIP-string experiment, all probes were ranked according to their signal levels and then the median rank calculated for each of the 11 probe sets. A background distribution of median ranks was calculated for each set based on the 18 control experiments. For each CR experiment, the ranks for each set were compared against the background distribution. An antibody passed the screen if any of the 11 probe sets had a median rank above the 99th percentile of the background distribution.
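
Step (6) of this procedure, the rank-based pass/fail test, can be sketched as follows. This is an illustrative reading of the text and not the authors' implementation: all function names are hypothetical, ranks are assigned so that rank 1 is the lowest signal (so that "above the 99th percentile" means unusually high signal), ties are broken by probe order, and the percentile is computed by linear interpolation, which is only one of several common definitions. The toy probe sets and signals are invented for the example.

```python
from statistics import median


def ranks(signal):
    """Rank probes by signal; rank 1 is the lowest signal (ties broken by order)."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    result = [0] * len(signal)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def median_rank_per_set(signal, probe_sets):
    """Median rank of the probes in each set for one experiment."""
    r = ranks(signal)
    return {name: median(r[i] for i in idx) for name, idx in probe_sets.items()}


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def passes_screen(cr_signal, control_signals, probe_sets, q=99):
    """Pass if any probe set has a median rank above the q-th percentile of controls."""
    background = [median_rank_per_set(c, probe_sets) for c in control_signals]
    cr = median_rank_per_set(cr_signal, probe_sets)
    return any(
        cr[name] > percentile([b[name] for b in background], q)
        for name in probe_sets
    )


# Toy data: 6 probes in 2 sets, 3 control experiments.
probe_sets = {"A": [0, 1, 2], "B": [3, 4, 5]}
controls = [[5, 1, 4, 2, 6, 3], [2, 6, 3, 5, 1, 4], [6, 2, 3, 1, 4, 5]]
enriched_in_b = [1, 2, 3, 10, 11, 12]
flat = [2, 5, 3, 4, 1, 6]

if __name__ == "__main__":
    print(passes_screen(enriched_in_b, controls, probe_sets))
    print(passes_screen(flat, controls, probe_sets))
```

In the study the background was built from 18 control experiments and 11 probe sets; the toy example uses far fewer only to keep the sketch short.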

This screening method accurately predicted which antibodies work in ChIP-seq. Thirty-one of the 34 antibodies that passed this screen and had been sequenced yielded high-quality maps. Eleven of the 13 antibodies that failed this screen and had been sequenced yielded flat profiles (no ChIP-seq signal). The results of the alternative screen are also very similar to the original one, passing just 5 additional antibodies.

Partitioning Around Medoids (PAM) algorithm

The PAM algorithm is a k-medoids algorithm, which is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm. Both the k-means and k-medoids algorithms are partitional (breaking the dataset up into groups) and both attempt to minimize squared error, the distance between points labeled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses datapoints as centers (medoids or exemplars).

k-medoids is a classical partitioning technique of clustering that clusters the data set of n objects into k clusters known a priori. A useful tool for determining k is the silhouette.

It is more robust to noise and outliers as compared to k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances.

A medoid can be defined as the object of a cluster, whose average dissimilarity to all the objects in the cluster is minimal, i.e., it is a most centrally located point in the cluster.
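
The PAM procedure discussed in this section can be illustrated with a compact Python sketch. The study used the PAM implementation from an R cluster package; the code below is not that implementation, only a minimal swap-based k-medoids under stated assumptions: the function names `total_cost` and `pam`, the seeded random initialization, and the choice of Manhattan distance in the example are all illustrative.

```python
import random


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the closest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist, seed=0):
    """Swap-based k-medoids: returns (medoid indices, cluster label per point)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # random initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        best_swap = None
        # Try swapping every medoid with every non-medoid point.
        for i in range(k):
            for o in range(len(points)):
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best:
                    best, best_swap = cost, candidate
        # Keep the lowest-cost configuration; stop when no swap lowers the cost.
        if best_swap is not None:
            medoids, improved = best_swap, True
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points
    ]
    return medoids, labels


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_medoids, toy_labels = pam(toy_points, k=2, dist=manhattan)

if __name__ == "__main__":
    print(sorted(toy_medoids), toy_labels)
```

Because each accepted swap strictly lowers the total cost and the number of configurations is finite, the loop terminates; like any such local search it may stop at a local optimum on less cleanly separated data.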

The most common realization of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm and is as follows (Theodoridis et al., Pattern Recognition 3rd ed., p. 635 (2006)): (1) initialize by randomly selecting k of the n data points as the medoids; (2) associate each data point to the closest medoid ("closest" refers to the use of any valid distance metric, most commonly Euclidean distance, Manhattan distance or Minkowski distance); (3) for each medoid m, for each non-medoid data point o, swap m and o and compute the total cost of the configuration; (4) select the configuration with the lowest cost; and (5) repeat steps 2 to 4 until there is no change in the medoid.

EQUIVALENTS AND SCOPE

Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present invention is not intended to be limited to the above Description, but rather is as set forth in the appended claims.

In the claims, articles such as "a," "an," and "the" may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include "or" between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process. Furthermore, it is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of using the composition for any of the purposes disclosed herein are included, and methods of making the composition according to any of the methods of making disclosed herein or other methods known in the art are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.

Where elements are presented as lists, e.g., in Markush group format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements, features, etc., certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements, features, etc. For purposes of simplicity those embodiments have not been specifically set forth in haec verba herein. It is also noted that the term "comprising" is intended to be open and permits the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. In addition, it is to be understood that any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Since such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the compositions of the invention can be excluded from any one or more claims, for any reason, whether or not related to the existence of prior art.

References

Alon, U. (2007). An introduction to systems biology : design principles of biological circuits (Boca Raton, FL, Chapman & Hall/CRC).

Amit, I., Garber, M., Chevrier, N., Leite, A.P., Donner, Y., Eisenhaure, T., Guttman, M., Grenier, J.K., Li, W., Zuk, O., et al. (2009). Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257-263.

Bailey, T.L., and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36.

Bailey, T.L., and Gribskov, M. (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48-54.

Barish, G.D., Yu, R.T., Karunasiri, M., Ocampo, C.B., Dixon, J., Benner, C., Dent, A.L., Tangirala, R.K., and Evans, R.M. (2010). Bcl-6 and NF-kappaB cistromes mediate opposing regulation of the innate immune response. Genes Dev 24, 2760-2765.

Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837.

Berger, M.F., Badis, G., Gehrke, A.R., Talukder, S., Philippakis, A.A., Pena-Castillo, L., Alleyne, T.M., Mnaimneh, S., Botvinnik, O.B., Chan, E.T., et al. (2008). Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 133, 1266-1276.

Bossard, P., and Zaret, K.S. (1998). GATA transcription factors as potentiators of gut endoderm differentiation. Development 125, 4909-4917.

Capaldi, A.P., Kaplan, T., Liu, Y., Habib, N., Regev, A., Friedman, N., and O'Shea, E.K. (2008). Structure and function of a transcriptional network activated by the MAPK Hog1. Nat Genet 40, 1300-1306.

Cirillo, L.A., Lin, F.R., Cuesta, I., Friedman, D., Jarnik, M., and Zaret, K.S. (2002). Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell 9, 279-289.

Cirillo, L.A., and Zaret, K.S. (1999). An early developmental transcription factor complex that is more stable on nucleosome core particles than on free DNA. Mol Cell 4, 961-969.

Davidson, E.H. (2010). Emerging properties of animal gene regulatory networks. Nature 468, 911-920.

Davis, R.L., Weintraub, H., and Lassar, A.B. (1987). Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 51, 987-1000.

De Santa, F., Barozzi, I., Mietton, F., Ghisletti, S., Polletti, S., Tusi, B.K., Muller, H., Ragoussis, J., Wei, C.L., and Natoli, G. (2010). A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol 8, e1000384.

Flicek, P., Aken, B.L., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Coates, G., Fairley, S., et al. (2010). Ensembl's 10th year. Nucleic Acids Res 38, D557-562.

Foster, S.L., Hargreaves, D.C., and Medzhitov, R. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature 447, 972-978.

Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., et al. (2011). The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39, D876-882.

Geiss, G.K., Bumgarner, R.E., Birditt, B., Dahl, T., Dowidar, N., Dunaway, D.L., Fell, H.P., Ferree, S., George, R.D., Grogan, T., et al. (2008). Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol 26, 317-325.

Gerstein, M.B., Lu, Z.J., Van Nostrand, E.L., Cheng, C., Arshinoff, B.I., Liu, T., Yip, K.Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., et al. (2010). Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775-1787.

Ghisletti, S., Huang, W., Jepsen, K., Benner, C., Hardiman, G., Rosenfeld, M.G., and Glass, C.K. (2009). Cooperative NCoR/SMRT interactions establish a corepressor-based strategy for integration of inflammatory and anti-inflammatory signaling pathways. Genes Dev 23, 681-693.

Graf, T., and Enver, T. (2009). Forcing cells to change lineages. Nature 462, 587-594.

Griffith, O.L., Montgomery, S.B., Bernier, B., Chu, B., Kasaian, K., Aerts, S., Mahony, S., Sleumer, M.C., Bilenky, M., Haeussler, M., et al. (2008). ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res 36, D107-113.

Grove, C.A., De Masi, F., Barrasa, M.I., Newburger, D.E., Alkema, M.J., Bulyk, M.L., and Walhout, A.J. (2009). A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314-327.

Gupta, S., Stamatoyannopoulos, J.A., Bailey, T.L., and Noble, W.S. (2007). Quantifying similarity between motifs. Genome Biol 8, R24.

Guttman, M., Amit, I., Garber, M., French, C., Lin, M.F., Feldser, D., Huarte, M., Zuk, O., Carey, B.W., Cassady, J.P., et al. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227.

Guttman, M., Garber, M., Levin, J.Z., Donaghey, J., Robinson, J., Adiconis, X., Fan, L., Koziol, M.J., Gnirke, A., Nusbaum, C., et al. (2010). Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28, 503-510.

Harada, H., Fujita, T., Miyamoto, M., Kimura, Y., Maruyama, M., Furia, A., Miyata, T., and Taniguchi, T. (1989). Structurally similar but functionally distinct factors, IRF-1 and IRF-2, bind to the same regulatory elements of IFN and IFN-inducible genes. Cell 58, 729-739.

Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., Macisaac, K.D., Danford, T.W., Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J., et al. (2004). Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99-104.

Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589.

Hoffmann, A., and Baltimore, D. (2006). Circuitry of nuclear factor kappaB signaling. Immunol Rev 210, 171-186.

Hu, Z., Killion, P.J., and Iyer, V.R. (2007). Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet 39, 683-687.

Johnson, D.S., Mortazavi, A., Myers, R.M., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497-1502.

Katsnelson, A. (2006). Kicking off adaptive immunity: the discovery of dendritic cells. J Exp Med 203, 1622.

Kim, D., and Salzberg, S.L. (2011). TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol 12, R72.

Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

Lange-zu Dohna, C., Brandeis, M., Berr, F., Mossner, J., and Engeland, K. (2000). A CDE/CHR tandem element regulates cell cycle-dependent repression of cyclin B2 transcription. FEBS Lett 484, 77-81.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.

Laslo, P., Spooner, C.J., Warmflash, A., Lancki, D.W., Lee, H.J., Sciammas, R., Gantner, B.N., Dinner, A.R., and Singh, H. (2006). Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126, 755-766.

Lenardo, M.J., and Baltimore, D. (1989). NF-kappa B: a pleiotropic mediator of inducible and tissue-specific gene control. Cell 58, 227-229.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760.

Lupien, M., Eeckhoute, J., Meyer, C.A., Wang, Q., Zhang, Y., Li, W., Carroll, J.S., Liu, X.S., and Brown, M. (2008). FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell 132, 958-970.

McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495-501.

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560.

Negre, N., Brown, C.D., Ma, L., Bristow, C.A., Miller, S.W., Wagner, U., Kheradpour, P., Eaton, M.L., Loriaux, P., Sealfon, R., et al. (2011). A cis-regulatory map of the Drosophila genome. Nature 471, 527-531.

Pabst, T., and Mueller, B.U. (2007). Transcriptional dysregulation during myeloid transformation in AML. Oncogene 26, 6829-6837.

Peter, I.S., and Davidson, E.H. (2011). Evolution of gene regulatory networks controlling body plan development. Cell 144, 970-985.

Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35, D61-65.

Rabani, M., Levin, J.Z., Fan, L., Adiconis, X., Raychowdhury, R., Garber, M., Gnirke, A., Nusbaum, C., Hacohen, N., Friedman, N., et al. (2011). Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells. Nat Biotechnol 29, 436-442.

Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., and Wysocka, J. (2011). A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279-283.

Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24-26.

Roy, S., Ernst, J., Kharchenko, P.V., Kheradpour, P., Negre, N., Eaton, M.L., Landolin, J.M., Bristow, C.A., Ma, L., Lin, M.F., et al. (2010). Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787-1797.

Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., and Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34, 166-176.

Solomon, M.J., and Varshavsky, A. (1985). Formaldehyde-mediated DNA-protein crosslinking: a probe for in vivo chromatin structures. Proc Natl Acad Sci U S A 82, 6470-6474.

Thanos, D., and Maniatis, T. (1992). The high mobility group protein HMG I(Y) is required for NF-kappa B-dependent virus induction of the human IFN-beta gene. Cell 71, 777-789.

Wallenstein, S., and Neff, N. (1987). An approximation for the distribution of the scan statistic. Stat Med 6, 197-207.

Weintraub, H., Tapscott, S.J., Davis, R.L., Thayer, M.J., Adam, M.A., Lassar, A.B., and Miller, A.D. (1989). Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proc Natl Acad Sci U S A 86, 5434-5438.

Yosef, N., and Regev, A. (2011). Impulse control: temporal dynamics in gene transcription. Cell 144, 886-896.

Zhou, Q., Brown, J., Kanarek, A., Rajagopal, J., and Melton, D.A. (2008). In vivo reprogramming of adult pancreatic exocrine cells to beta-cells. Nature 455, 627-632.

Zinzen, R.P., Girardot, C., Gagneur, J., Braun, M., and Furlong, E.E. (2009). Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65-70.

Table 1:

Chromatin regulator   Vendor
MLL(1)                Millipore
MLL(2)                Millipore
LSD1(1)               Millipore
LSD1(2)               Millipore
bmi1                  Millipore
SUZ12                 Millipore
P300                  Millipore
H3S10P                Millipore
H2Bub                 Millipore
RBP1                  UPSTATE
MBD2                  UPSTATE
Mybbp1A               Invitrogen
TCF4                  Invitrogen
ROC1                  Invitrogen
ATF2                  Invitrogen
Bmi1                  Invitrogen
LSD1                  Invitrogen
EZH2                  Invitrogen
BAP1                  SantaCruz
Sap30                 SantaCruz
HDac1                 SantaCruz
MLL4                  SantaCruz
MLL2                  SantaCruz
FBXL10                SantaCruz
ASH1L                 SantaCruz
HDAC2                 SantaCruz
HDAC1                 SantaCruz
ESET                  SantaCruz
SMYD1                 SantaCruz
YY1                   SantaCruz
hnRNPA1               SantaCruz
ASXL1                 SantaCruz
RING1B                Active motif
BRG1                  Active motif
EZH2                  Active motif
SUZ12                 Active motif
sap30                 Active motif
bcl10                 Active motif
HDAC1                 Active motif
NFKB P100             Active motif
ASH2                  Active motif
CHD1                  Active motif
CARM1                 Active motif
RbAP46/48             Active motif
HMGA1                 Active motif
Jarid1C               Active motif
MBD3                  Active motif
RB                    Active motif
DNMT1                 Active motif
HDAC1                 Active motif
YY1                   Active motif
CHD2                  Active motif
SIRT1                 Active motif
YY1                   Active motif
Sirt6                 Novus
RbBP5                 Novus
TET1                  Sigma
PSF                   Sigma
Sf3b1                 Custom
CDC73                 Custom
KIAA1718_JHDM1D       Custom
FRP Plu-1             Custom
H3K79me2              Abcam
KAT3A/CBP             Abcam
H3K9ac                Abcam
RNApolII P S2         Abcam
RNApolII P S5         Abcam
EZH1                  Abcam
WDR5                  Abcam
KAT5                  Abcam
KMT4                  Abcam
Ring1B                Abcam
WDR5                  Abcam
CTCF                  Abcam
SUZ12                 Abcam
Jarid1A               Abcam
SP1                   Abcam
Sirt6                 Abcam
SUV39H1               Abcam
hp1 gamma             Abcam
setDB1                Abcam
NSD2                  Abcam
ASXL1                 Abcam
EZH1                  Abcam
RING1B                Abcam
KMT4                  Abcam
KAT5                  Abcam
SUV39H1               Bethyl
CHD1                  Bethyl
RBbp5                 Bethyl
TRX2                  Bethyl
DOTL1                 Bethyl
WDR77                 Bethyl
ASH2                  Bethyl
TRRAP                 Bethyl
EHMT1                 Bethyl
Hset1b                Bethyl
ash1                  Bethyl
BAP1                  Bethyl
SET7                  Bethyl
RNF20                 Bethyl
RNF40                 Bethyl
Phosoph EZH2 (S21)    Bethyl
PHF8                  Bethyl
JMJD2B                Bethyl
JMJD1C                Bethyl
JMJD1A                Bethyl
JARID1C               Bethyl
JARID1B               Bethyl
SIRT6                 Bethyl
CHD3                  Bethyl
CHD7                  Bethyl
CHD6                  Bethyl
HDAC2                 Bethyl
HDAC6                 Bethyl
HDAC3                 Bethyl
MLL1                  Bethyl
CHD1                  Bethyl

Abcam (Cambridge, MA)
Active Motif (Carlsbad, CA)
Bethyl Laboratories (Montgomery, TX)
Invitrogen (Carlsbad, CA)
Millipore (Billerica, MA)
Novus International (St. Charles, MO)
Santa Cruz Biotechnology (Santa Cruz, CA)
Sigma-Aldrich (St. Louis, MO)
Upstate Biotechnology (Charlottesville, VA)

Table 3:

CR                Antibody1               Antibody2               Antibody3               Antibody4               Notes
ASH1              Bethyl,A301.749A        Santa cruz,98301X
ASH2              Bethyl,A300.107A        Active motif,39099
ASXL1             Abcam,ab55285           Santa cruz,sc-98302
ATF2              Invitrogen,44295G
BAP1              Santa cruz,sc-48386     Bethyl,A302.242A
BCL10             Active motif,39393
BMI1              Invitrogen,752632A      Millipore,MAB4376
BRG1              Active motif,39003
CARM1             Active motif,39251
CBX2              Bethyl,A302-524A
CBX8              Bethyl,A300-882A
CDC73             Custom
CHD1              Bethyl,A301-218A        Active motif,39729
CHD2              Active motif,39363
CHD3              Bethyl,A301.219A
CHD4/Mi2          Bethyl,A301-081A
CHD6              Bethyl,A301.221A
CHD7              Bethyl,A301.223A
CHD8              Bethyl,A301-224A
CHD9              Bethyl,A301-226A
CoREST            Bethyl,A300-130A
CTCF              Millipore,07-729*                                                                               ChIP-seq maps published in Goren et al, Nat Methods 7, 47-49.
DNMT1             Active motif,39204
DOTL1             Bethyl,A300.953A
EHMT1             Bethyl,A301.642A
ESET              Santa cruz,166621X
EZH1              Abcam,ab64850-100
EZH2              Active motif,39901      Invitrogen,366300       Active motif,39635
FBXL10            Santa cruz,69472X
HDAC1             Santa cruz,6298X        Active motif,39531      Santa cruz,sc-81598     Active motif,40967
HDAC2             Bethyl,A300.705A        Santa cruz,7899X
HDAC3             Bethyl,A300.464A
HDAC6             Bethyl,A301.341A
HDAC7             Bethyl,A301-384A
hnRNPA1           Santa cruz,32301
HP1.gamma         Abcam,ab10480
HSET1b            Bethyl,A302.280A
JARID1A           Abcam,ab26049
JARID1C           Bethyl,A301.034A        Active motif,39229
JARID2            Sigma,SAB2105079
JMJD1A            Bethyl,A111A301.539A
JMJD1C            Bethyl,A300.884A
JMJD2A            Bethyl,A300-861A
JMJD2B            Bethyl,A301.478A
KAT3A.CBP         Abcam,ab2832
KAT5              Abcam,ab62644
KIAA1718_JHDM1D   Custom made
KMT4              Abcam,ab64077
LSD1              Bethyl,A300-215A        Invitrogen,413300       Millipore,13345         Millipore,dav
MBD2              Upstate,DAM1394799
mIgG              Santa cruz,sc-2343      Millipore,AQ160
MLL1              Millipore,jbc           Millipore,dav           Bethyl,A300.086A
MLL2              Santa cruz,68671X
MLL4              Santa cruz,68675X
MYBBP1A           Invitrogen,401200
NCoR              Bethyl,A301-145A
NFKB_P100         Active motif,39687
NSD1              Bethyl,IHC-00027
NSD2              Abcam,ab73539
P300              Bethyl,A300-358A        Santa cruz,48343X       Millipore,5257
p400              Bethyl,A300-541A
P53               Cell Signaling,4908
PCAF              Bethyl,A301-666A
PHF8              Bethyl,A301.772A
PhosophEZH2_S21   Bethyl,A108A300.529A
PLU1/JARID1B      Custom                  Bethyl,A301.813A
PSF               Sigma,P2860
RB                Active motif,160062
RbAP46.48         Active motif,39198
RBBP5             Bethyl,A300.109A        Novus,600.2521
RbBP7             Bethyl,A300-958A
RBP1              Upstate,up05563
REST              Millipore,17-641
rIgG              Santa cruz,2027         Millipore,AP132
RNApolII          Covance,MMS-126R
RNApolII_PS2      Abcam,ab5095
RNApolII_PS5      Abcam,ab5131
RNF2/RING1B       Bethyl,A302-869A        Abcam,ab3832
RNF20             Bethyl,A300.714A
RNF40             Bethyl,A300.718A
ROC1              Invitrogen,342500
SAP30             Active motif,3939731    Santa cruz,sc-130425
SET7              Bethyl,A301.747A
SETDB1            Abcam,ab12317
SF3B1             Custom
SIRT1             Active motif,39355
SIRT6             Abcam,ab62739           Bethyl,A302.452A        Novus,NBP1-30101
SP1               Abcam,ab13370
SUV39H1           Abcam,ab12405           Bethyl,A302-127A
SUZ12             Active motif,39357      Millipore,051317        Abcam,ab12073
TCF4              Invitrogen,343800
TET1              Sigma,HPA019032
TRRAP             Bethyl,A301.131A
TRX2              Bethyl,A300.113A
WDR5              Abcam,ab56919
WDR77             Bethyl,A301.561A
YY1               Active motif,39071      Active motif,1610076    Santa cruz,1763X*                               ChIP-seq maps published in Mendenhall et al, PLOS Genet 6, e1001244
H2A.Z             Abcam,ab4174
H1.4K26me2        Sigma,H8289
H2Bub             Millipore,17-650
H3K27ac           Abcam,ab4729
H3K27me3          Millipore,07-449
H3K4m1            Abcam,ab8895
H3K4m2            Abcam,ab7766
H3K4me3           Millipore,17-614
H3K4me3           Abcam,ab8580
H3K79me2          Abcam,ab3594
H3K9me1           Abcam,ab8896
H3K9me3           Abcam,ab8898
H3S10P            Millipore,16-218
H4K20m1           Abcam,ab9051
H3K9ac            Abcam,ab10812           Abcam,ab4441