Title:
MODULATION OF TARGETS IN CELLULAR PATHWAYS IDENTIFIED BY RESOLUTION OF STOCHASTIC GENE EXPRESSION
Document Type and Number:
WIPO Patent Application WO/2018/176007
Kind Code:
A1
Abstract:
Gene expression is resolved for individual cells as stochastic bursts in the time dimension. Patterns of coincident expression among markers enable the inference and construction of pathways and identification of associated targets. Methods and compositions for modulating targets according to their characteristic expression profiles. Methods and protocols for diagnosing and treating conditions involving associated pathway components.

Inventors:
SELIGMANN BRUCE (US)
BABIC MILOS (US)
Application Number:
PCT/US2018/024206
Publication Date:
September 27, 2018
Filing Date:
March 23, 2018
Assignee:
BIOSPYDER TECH INC (US)
International Classes:
C12Q1/68; C40B40/00; G01N33/567
Foreign References:
US20160203259A1 2016-07-14
US20160010151A1 2016-01-14
Other References:
GRIMM ET AL.: "A chemical-biological similarity-based grouping of complex substances as a prototype approach for evaluating chemical alternatives", GREEN CHEM., vol. 18, no. 16, 2016, pages 4407 - 4419, XP055544260
BABIC ET AL., ABSTRACT 1840: DIFFERENTIAL EXPRESSION AND MECHANISTIC PATHWAYS OF PROSTATE CANCER IDENTIFIED FROM FFPE TISSUE USING SURROGATE OR WHOLE TRANSCRIPTOME TEMPO-SEQ TARGETED GENE EXPRESSION ASSAYS, 2016, Retrieved from the Internet [retrieved on 20180507]
YOUK ET AL.: "scientific treatment approach for acute mast cell leukemia: using a strategy based on next-generation sequencing data", BLOOD RES., vol. 51, no. 1, 2016, pages 17 - 22, XP055544266
YEAKLEY ET AL.: "A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling", PLOS ONE, vol. 12, no. 5, 25 May 2017 (2017-05-25), pages e0178302, XP055544269
Attorney, Agent or Firm:
FAN, Calvin (US)
Claims:
We claim:

1. A method for resolving or inferring a pattern of gene expression in the time domain or the frequency domain for one or more cells.

2. The method of claim 1, wherein the gene expression of the cells is detected using TempO-Seq assays.

3. The method of claim 2, wherein the detection is for a single cell.

4. A method for constructing a pathway or network of associated analytes, comprising the steps of

(a) performing the method of claim 1 to identify coincident analytes, thereby resolving expression of the analytes in a temporal dimension; and

(b) associating the analytes according to temporal criteria into a pathway or network.

5. The method of claim 4, wherein the analytes are genes, expression products, or regulatory products.

6. The method of claim 4, wherein the temporal criteria are selected from the group consisting of coincident bursts, burst frequency duration, burst profile characteristics, upramp (attack), sustain, downramp (decay), and half-life parameters.

7. The method of claim 4, wherein a temporal criterion is the relative timing of bursts between a plurality of analytes.

8. The method of claim 4, wherein step (b) is automated.

9. A method for identifying a regulatory gene that changes expression level prior to expression of a target gene by performing the method of claim 4.

10. A method for identifying an agent A, which modulates the expression of gene A, which agent A modulates the expression of gene B, where gene A and gene B are members of a pathway inferred by the method of claim 4.

11. The method of claim 10, wherein gene B is associated with a disease state.

12. The method of claim 10, wherein gene A serves as a regulatory or control gene for gene B.

13. A method for modulating the expression of gene B by delivering to a cell the agent A identified by the method of claim 10.

14. The method of claim 13, wherein delivery is coordinated in time with expression burst timing.

15. The method of claim 14, wherein the expression burst timing is by synchronizing, suppressing, or prolonging the interval between expression bursts.

Description:
Modulation of Targets in Cellular Pathways Identified by Resolution of Stochastic Gene Expression

Cross-Reference to Related Applications

This application claims the benefit of priority of U.S. provisional application Ser. 62/475,796, filed March 23, 2017, the contents of which are incorporated herein in their entirety.

Statement of Government Support

This invention was made with government support under grant 1R43HG008917-01 and grant 2R44HG007815-02, both awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.

Technical Field

This invention relates to cell biology, and more particularly to modulating targets in cellular pathways.

Summary of the Invention

This invention provides methods for resolving or inferring patterns of gene expression in the time domain or the frequency domain for single cells or populations of cells.

The invention also provides methods for constructing pathways or networks of associated analytes by identifying coincident analytes, such as genes, expression products or regulatory products, and associating the analytes according to temporal criteria. Such criteria include coincident bursts, burst frequency duration, burst profile characteristics, upramp (attack), sustain, downramp (decay), half-life parameters, and the relative timing of bursts between a plurality of analytes. These methods can be used to identify regulatory genes and agents that modulate the expression of genes, such as those associated with disease states. The invention further provides methods for modulating the expression of genes by delivering agents to a cell in coordination in time with expression burst timing, such as by synchronizing, suppressing, or prolonging the interval between expression bursts.

Brief Description of the Drawings

Figure 1 illustrates a representative ligation assay for detecting target nucleic acid sequences. Briefly, downstream detector (DD) and upstream detector (UD) probe oligonucleotides are allowed to (a) hybridize to a target sequence, having DR and UR regions, in a sample. For convenience of identification, upstream regions are often underlined herein. While hybridized to the DR and UR of the target sequence, the DD is (b2) ligated selectively to the UR. Optionally, the DD is (b0) extended prior to (b2) ligation. The ligation product is optionally (c) amplified via amplification regions P1 and P2' by one or more primers, such as P1 and P2.

Figure 2 shows an "anchored" version of the TempO-Seq assay where the UD is configured with a second complementary region (UR2' or "anchor") separated by a noncomplementary region (CP1). The DD and UD can hybridize to a target sequence as in Figure 3, forming a hybridization complex (HC) providing a substrate for ligation at the junction (L) between DR' and UR'. In some TempO-Seq methods, an optional nuclease, such as a 3'- or 5'-single-stranded exonuclease, is provided at various stages to remove undesired or leftover reactants. After ligation, Figure 4 shows the ligation product (LP) can be amplified by primers to yield amplification products (AP) in Figure 5.

Target sequences were used to design detectors for mRNA expression products for 24 human genes of interest. The genes were selected to demonstrate detection over an expected range of 6 orders of magnitude in abundance, with 10, 1, and 0.1 ng sample RNA input. The numbers of amplified ligation products, confirmed by sequencing, are shown for anchored detector designs (Figures 6a, 6b, and 6c). The x-axis is for the first technical replicate; the y-axis is for the second replicate.

Figure 7 shows a modified version of TempO-Seq that can be performed after antibody-staining, before flow cytometry sorting (FACS) and PCR. MAQC Universal Reference RNA vs. MAQC Brain RNA were analyzed by the surrogate S1500 (Figure 8a, 2700 genes) and whole transcriptome assays (Figure 8b, measuring 20,000 genes).

In Figure 9a, absolute sensitivity was measured using Mix 2 of the synthetic reference ERCC ExFold RNA Mixtures in a background of URR RNA and assayed using a detector oligo pool specific for ERCC RNAs. In another measure of sensitivity, MDA MB 231 cells were diluted in 10-fold increments into a constant background of MCF7 cells (Figure 9b, right bars), or vice versa (Figure 9b, left bars), then lysed and assayed for cell-specific transcripts. As shown, TempO-Seq is highly resistant to RNA degradation (Figures 9c, 9d, and 9e).

Figure 10 presents an experiment where human T cells were CD3/CD28 bead-activated, cultured for 5-9 days, surface-stained for CD4 & CD8, low CH2O-fixed, intracellular-stained for either T-bet, EOMES, or FoxP3 transcription factors (TF), sorted as indicated by the gates, profiled, and their expression was plotted.

Figures 11a and 11b show TempO-Seq reproducibility for MAQ RNA and bulk measurement of 1000 T cells that express FoxP3. Figure 11c (Jurkat cells) and Figure 11d (FoxP3-expressing T cells) contrast counts of expression products for bulk populations with counts for single cells.

Figures 12a-d show high (CDC42, 12a), moderate (CRY1, 12b), low (NOTCH1, 12c), and very low (MYCBP, 12d) expression, shown either as a variable tracing over time or as average overall expression in bulk samples (solid flat line).

Figure 13 compares bulk expression to the number of expressing cells. Twelve representative BIOCARTA pathways are shown in Figure 14.

Figure 15 depicts expression as measured by bulk average (top), and as actual expression of individual single cells A, B, and C.

Figure 16 illustrates three possible relationships between genes 342, 541, 209, and 957. Figure 17a depicts average activities of genes 342 (solid trace), 541 (dotted), and 209 (dashed) in a population of cells correlated with a disease state (rectangle). Figure 17b shows the same genes resolved temporally by the present invention.

Figure 18a shows a change within a gene expression/cell signaling pathway (solid trace) causes a physiological response (dotted), which can lead to a disease state.

In Figure 18b, a treatment targeting the gene expression/cell signaling pathway (rectangle) has to be correctly timed, otherwise its effects will be limited. This requires significant time investment into dosing strategies, using the physiological or disease state as the readout. As shown in Figure 18c, correct timing of treatment application, using the temporal data about the status of the relevant pathway in the majority of cells, can instead focus on the causal gene expression pathway as a readout directly, thereby maximizing the effect of treatment on the target physiological state.

Figure 19 shows a table of measured counts of individual genes, arranged by frequency in a bulk average of 1000, and as several single cells.

Detailed Description of the Invention

The invention also provides a method for resolving a pattern of gene expression in the time domain or the frequency domain, which can be applied to single cells, or populations of two or more cells. The detection capability of TempO-Seq assays enables resolution of the expression levels and time course of single cells.
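As a minimal sketch of what resolving expression in the frequency domain could look like, the following Python code applies a naive discrete Fourier transform to a single-cell count time series and reports the period of its strongest periodic component. The function name `dominant_period`, the sampling interval `dt`, and the example counts are illustrative assumptions and are not taken from the disclosure.

```python
import cmath
import math

def dominant_period(counts, dt=1.0):
    """Return the period (in units of dt) of the strongest non-constant
    frequency component of a count time series, or None if the series is flat."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # remove the constant, bulk-like level
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient for frequency index k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None
    return n * dt / best_k

# Hypothetical single-cell counts with one burst every 4 time points
cell_counts = [9, 1, 0, 1] * 6
period = dominant_period(cell_counts, dt=1.0)
print(period)
```

A flat trace, such as an averaged bulk measurement, has no periodic component, and the function returns `None` for it.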

Pathway inference

The invention also provides methods for constructing pathways or networks of associated analytes by identifying coincident analytes, such as genes, expression products or regulatory products, and associating the analytes according to temporal criteria. Such criteria include coincident bursts, burst frequency duration, burst profile characteristics, upramp (attack), sustain, downramp (decay), half-life parameters, and the relative timing of bursts between a plurality of analytes. These methods can be used to identify regulatory genes and agents that modulate the expression of genes, such as those associated with disease states. This process can be performed using predetermined inference criteria for manual or automated inference or assembly of pathways and networks.
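One of the temporal criteria named above, the relative timing of bursts between analytes, can be sketched in Python as follows. The thresholding rule, the `max_lag` window, the function names, and the example traces are all hypothetical choices made for illustration; the disclosure does not prescribe a particular burst-calling algorithm.

```python
def burst_onsets(trace, threshold):
    """Indices where the trace rises from below the threshold to at or above it."""
    onsets = []
    previous_on = False
    for t, value in enumerate(trace):
        on = value >= threshold
        if on and not previous_on:
            onsets.append(t)
        previous_on = on
    return onsets

def relative_timing(onsets_a, onsets_b, max_lag):
    """For each burst of A, find the nearest burst of B that starts
    0..max_lag samples later, and return the list of lags found."""
    lags = []
    for ta in onsets_a:
        following = [tb - ta for tb in onsets_b if 0 <= tb - ta <= max_lag]
        if following:
            lags.append(min(following))
    return lags

# Hypothetical single-cell traces for two analytes
gene_a = [0, 8, 9, 0, 0, 0, 7, 8, 0, 0, 0, 0]
gene_b = [0, 0, 0, 6, 5, 0, 0, 0, 7, 6, 0, 0]

a_on = burst_onsets(gene_a, threshold=5)
b_on = burst_onsets(gene_b, threshold=5)
lags = relative_timing(a_on, b_on, max_lag=3)
print(a_on, b_on, lags)
```

In this toy example every burst of gene A is followed by a burst of gene B at a constant lag, which is the kind of pattern that an automated inference step could use to place A upstream of B in a candidate pathway.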

The invention further provides methods for modulating the expression of genes by delivering agents to a cell in coordination in time with expression burst timing, such as by synchronizing, suppressing, or prolonging the interval between expression bursts.

Alternatively, the method can modulate the degradation of an expression product.

Compositions of matter and methods of coordinated delivery

The invention provides a related method for identifying an agent A (which modulates the expression of gene A) that modulates the expression of gene B, where gene A and gene B are members of an inferred pathway. For example, Gene B may have a known association with a disease state. Gene A may serve as a regulatory or control gene where the relation to gene B is not previously known. Novel compositions comprising a plurality of agents, which were identified as modulating distant parts of an inferred pathway.

The invention provides novel modulatory compositions that are identified by the methods disclosed herein. A method for delivering an agent A (such as a therapeutic agent) to a cell based on the expression burst timing, such as by synchronizing, suppressing, or prolonging the interval between expression bursts. This is particularly useful for cells that are resistant or more susceptible to drugs at different times.

Delivery of agents to cells

A composition that delivers an agent in conjunction with a trigger agent that synchronizes the timing of the expression bursts. For example, the method can be used to synchronize a set of cells, aligning their expression profiles on a time axis, and treating the cells with an activating agent or drug or arrest of the cell cycle, followed by controlled release. The invention includes modulating a pathway in a cell by exposing the cell to an agent or conditions that suppress or increase the throughput of a part of the pathway or of the entire pathway.

Diagnostic methods

The invention provides methods for diagnosing a disease state by detecting aberrant expression bursts of gene B, or of gene A, or aberrant coordination of bursts of separate expression products, such as in an abnormal sequence or combination. The method includes diagnosing a disease state by inferring pathways and then comparing them to reference pathways for missing or extra components or connections.

Treatment methods

A method for treating a disease by restoring a laboratory-detected pathway in a patient to a reference pathway, through interventions directed to the aberrant pathway steps. Rate-limiting agents can be used for interventions. A method for giving a drug that turns on the susceptible gene in all cells, then administering a therapeutic drug.

A method for dual therapy for a population of cells that express gene set 1, wherein a first subpopulation of cells is transiently sensitive or resistant to drug A, comprising administering drug A (then removing drug A or allowing drug A to degrade), then allowing a time period to pass whereby the cells become transiently sensitive, but before resistant cells proliferate; administering drug A another time.

An apparatus for delivering timed doses of one or more therapeutic agents according to their expression burst profiles. Thus, drugs can be delivered according to an individual cell's stochastic burst expression, rather than to a resistant "subpopulation" of cells.

The invention provides methods of using the assay of single cells to determine what the temporal characteristics of the delay should be, in order to set clinical guidelines for a population or subpopulation of patients, or for individualized treatment.

ligation assays, generally

A typical ligation assay is illustrated schematically in Figure 1, which is discussed in more detail in Example 1. A sample that may contain target sequences is contacted with a pool of detector oligonucleotide probes ("probes" or "detectors"). For each target sequence, a pair of detectors is provided: a downstream detector (DD) and an upstream detector (UD). A downstream detector can have a portion (DR') that is complementary to a region of the target sequence designated as a downstream region (DR). An upstream detector can have a portion (UR') that is complementary to a region of the target sequence designated as the upstream region (UR). Here, the terms "downstream" and "upstream" are used relative to the 5'-to-3' direction of transcription when the target sequence is a portion of an mRNA, and for convenience the regions designated as upstream are often shown underlined. As shown in Figure 1, the DR' of the DD and the UR' of the UD for each target sequence are allowed to hybridize to the corresponding DR and UR of the target sequence, if present in the sample. When the DR and UR of a target sequence are adjacent and the DR' and UR' of the pair of detector oligos are specifically hybridized to the target sequence to form a hybridization complex, the adjacent detectors DD and UD can be ligated. Thus, formation of a DD-UD ligation product serves as evidence that the target sequence (DR-UR) was present in the sample. In cases where the DR and UR of a target sequence are separated by at least one nucleotide, the ligation step can be preceded by (b0) extending the DR' using the sample as a template so the extended DR' and UR' become adjacent and can be ligated. The ligation product can then be detected by a variety of means; if desired, the products can be amplified prior to detection. Various TempO-Seq detection methods are disclosed herein.

samples

The samples used in the method can be any substance where it is desired to detect whether a target sequence of a nucleic acid of interest is present. Such substances are typically biological in origin, but can be from artificially created or environmental samples. Biological samples can be from living or dead animals, plants, yeast and other microorganisms, prokaryotes, or cell lines thereof. Particular examples of animals include human, primates, dog, rat, mouse, zebrafish, fruit flies (such as Drosophila melanogaster), various worms (such as Caenorhabditis elegans) and any other animals studied in laboratories or as animal models of disease. The samples can be in the form of whole organisms or systems, tissue samples, cell samples, subcellular organelles or processes, or samples that are cell-free, including but not limited to solids, fluids, exosomes and other particles. Particular examples are cancer cells, induced pluripotent stem cells (iPSCs), primary hepatocytes, and lymphocytes and subpopulations thereof. The samples can be provided in liquid phase, such as cell-free homogenates or liquid media from tissue cultures, or nonadherent cells in suspension, tissue fragments or homogenates, or in solid phase, such as when the sample is mounted on a slide or in the form of formalin-fixed paraffin-embedded (FFPE) tissue or cells, as a fixed sample of any type, or when cells are grown on or in a surface, as long as detectors can be put into contact for potential hybridization with the sample nucleic acids.

target sequences

The target sequences can be selected from any combination of sequences or subsequences in the genome or transcriptome of a species or an environment, or modified nucleic acids or nucleic acid mimics to which the detector oligos can bind or hybridize. The set can be specific for a sample type, such as a cell or tissue type. For some sample types, the number of target sequences can range in any combination of upper and lower limits of 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, 23,000, 30,000, 38,000, 40,000, 50,000, or more. The number of target sequences can also be expressed as a percentage of the total number of a defined set of sequences, such as the RNAs in the human transcriptome or genes in the human genome, ranging in any combination of upper and lower limits of 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, and 100%. Where large sets of detector oligos are used, it can be useful to check the full sequence of each oligo for potential cross-hybridization to other oligos in the set, where, for example, one oligo may inadvertently serve as a template to other detectors. While such non-specific artifacts can be identified by sequence, and are typically discarded from detection results, they may represent noninformative hybridization events that compete for reaction resources.

nucleic acids

The nucleic acids of interest to be detected in samples include the genome, transcriptome, and other functional sets of nucleic acids, and subsets and fractions thereof. The nucleic acids of interest can be DNA, such as nuclear or mitochondrial DNA, or cDNA that is reverse transcribed from RNA. The sequence of interest can also be from RNA, such as mRNA, rRNA, tRNA, siRNAs (e.g., small interfering RNAs, small inhibitory RNAs, and synthetic inhibitory RNAs), antisense RNAs, circular RNAs, long noncoding RNAs, or modified RNA, and can include unnatural or nonnaturally occurring bases. The nucleic acids can include modified bases, such as by methylation, and the assay is designed to detect such modifications. The nucleic acid of interest can be a microRNA (miRNA) at any stage of processing, such as a primary microRNA (pri-miRNA), precursor microRNA (pre-miRNA), a hairpin-forming microRNA variant (miRNA*), or a mature miRNA. Detection of microRNAs is discussed in Example 3a.

Relatively short nucleic acids of interest, such as mature miRNAs, can be lengthened to enhance hybridization to the detectors. For example, many microRNAs are phosphorylated at one end, and can be lengthened by chemical or enzymatic ligation with a supplementary oligo. The supplemental oligo can be single-stranded, double-stranded, or partially double-stranded, depending on the ligation method to be used. If desired, the supplemental oligo can be unique to each target sequence, or can be generic to some or all of the target sequences being ligated. The detectors can then be designed with extended DR' and/or UR' regions that include a portion that hybridizes to the supplemental sequence. A target sequence can also be supplemented by adding nucleotides, such as by polyadenylation, where the extended detectors include at least a portion to hybridize to the supplemental polyA tail. Detection of a family of mature miRNA sequences using extended detectors is discussed in Example 3b and illustrated in Figure 2j.

The amount of nucleic acid in the sample will vary depending on the type of sample and on its complexity and relative purity. Because of the sensitivity of the assay, the sample can be taken from a small number of cells, for example from fewer than 100,000, 10,000, 1000, 100, 50, 20, 10, 5, or even from a single cell or a subcellular portion of a cell. The total amount of nucleic acid in the sample can also be quite small: less than 100, 50, 20, 10, 5, 2, 1 micrograms, 500, 200, 100, 50, 20, 10, 5, 2, 1, 0.5, 0.2, 0.1 nanogram, 50, 20, 10, 5, 2, 1 picogram or less of nucleic acid (see Figure 6d), or less than 10, 1, 0.1, 0.01, 0.001 picograms of nucleic acid, or amount of a lysate containing equivalent amounts of nucleic acid. The copy number of a particular target sequence can be less than 100,000, 10,000, 1000, 100, 50, 20, 10, 5, or even a single copy present in the sample, particularly when coupled with representative amplification of the ligation product for detection. The amount of input nucleic acid will also vary, of course, depending on the complexity of the sample and the number of target sequences to be detected.

detectors

Based on the particular target sequences, the invention provides pools of detector oligos where a target sequence has a pair of upstream and downstream detectors (UD and DD) that correspond to DR and UR, which are typically subsequences of the entire nucleic acid sequence of interest. Detector oligos can be designed to hybridize to the target sequence so a single-stranded sequence portion of the target sequence remains between the detectors, which can then be filled in, such as by reverse transcriptase or polymerase, thereby extending a detector to bring it effectively together with the other detector so they can be ligated. Detectors can be provided to detect targets that contain mutations including individual single-nucleotide polymorphisms (SNPs), gene fusions, and exon-splicing variants, or modifications such as methylation. Detectors can contain blocking groups, modified linkages between bases, unnatural or nonnaturally occurring bases or other unnatural or nonnaturally occurring components. An individual target sequence can have more than one set of DRs and URs, which can be selected by the user to optimize the performance of the assay. Multiple sets of DRs and URs can provide multiple measurements of the same target sequence or of different portions of the target sequence, such as different exons or exon junctions, or provide measurement of a portion of sequence that is not mutated versus a portion of sequence that may harbor a mutation.

multiple detectors for a gene

Multiple detector oligo (DO) sets targeting different sequences within a gene can be designed and synthesized for use to detect that gene. Each DO set hybridizes to its targeted sequence independently of the hybridization of other DO sets to each of their respective targeted sequences. Thus, the statistical reliability, or statistical power, of measurement of the gene itself can be increased by use of multiple DO sets targeting that gene. Measurement CVs can be reduced. Furthermore, if secondary structure, protein binding, or another factor modulates the hybridization of one DO set, and thus affects the resulting measure of gene abundance by that DO set, then the counts from other DO sets unaffected by such factors can be used to provide a more accurate measure of gene abundance. Outlier analysis can be used to identify such deviations of DO set measurements. In the case that a gene is expressed at low abundance, or that the amount of sample is small, such as from a single cell, and thus the number of gene molecules is low, hybridization of a specific DO set to that low amount of gene may not be sufficient to provide an amplifiable ligated product every time across repeat samples, and hence may not produce sequencing counts from some samples. The use of additional DO sets targeting other sequences within the same gene increases the probability that some of those DO sets will produce counts if the gene is actually expressed, and thus multiple DO sets can be used to increase the sensitivity of measurement of genes expressed at low levels, or of low numbers of gene molecules in a sample. The no-sample background counts can be used to validate that DO counts result from the presence of the gene even though not all DO sets produce counts. The concurrence of more than one DO set reporting the presence of the gene can be used as a measure to validate that the DO counts result from the presence of the gene even though not all DO sets produce counts. Because the DO sets have a defined sequence, each DO set measurement represents independent measurements of defined target sequences, permitting statistical methods to be applied to determine that a gene is expressed or present in the sample or not.
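The concurrence rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `gene_detected`, the requirement of at least two concurring DO sets, the simple "count greater than no-sample background" test, and the example counts are hypothetical and stand in for whatever statistical method is actually applied.

```python
def gene_detected(do_counts, background_counts, min_concurring=2):
    """Call a gene present when at least `min_concurring` DO sets report
    counts above their own no-sample background.

    do_counts:         dict mapping DO set name -> sequencing counts in the sample
    background_counts: dict mapping DO set name -> counts in the no-sample control
    Returns (call, names_of_DO_sets_above_background).
    """
    above = [
        name for name, count in do_counts.items()
        if count > background_counts.get(name, 0)
    ]
    return len(above) >= min_concurring, above

# Hypothetical single-cell counts for three DO sets targeting one gene
sample = {"DO1": 12, "DO2": 0, "DO3": 7}
no_sample = {"DO1": 1, "DO2": 0, "DO3": 2}

call, supporting = gene_detected(sample, no_sample)
print(call, supporting)
```

Here DO2 produced no counts, yet the gene is still called present because two other DO sets independently exceed their background.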

The detector oligos themselves can be DNA, RNA, or a mixture or hybrid of both. If desired, they can have a modified nucleotide such as dideoxy nucleotides, deoxyUridine (dU), 5-methylCytosine (5mC), 5-hydroxymethylCytosine (5hmC), 5-formylCytosine (5fC), 5-carboxylCytosine (5caC), and Inosine. Yet other modifications to detector oligos include modified bases such as 2,6-diaminopurine, 2-aminopurine, 2-fluoro bases, 5-bromoUracil, or 5-nitroindole. Other detector oligos can have a modified sugar-phosphate backbone at one or more positions. Such modifications include a 3'-3' or 5'-5' linkage inversion, a locked nucleic acid (LNA), or a peptide nucleic acid (PNA) backbone. LNAs can be useful for their stronger hybridization properties to complementary bases, enhancing the selectivity or the overall binding affinity for the detector oligo as a whole. The modified bases or bonds can also be used at positions 1, 2, or 3 away from the point of ligation.

As shown schematically in Figure 1, a downstream detector (DD) has a complementary downstream region (DR'), which can be at least 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, or 50 nucleotides in length. Similarly, an upstream detector (UD) has a complementary upstream region (UR'), which can be at least 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, or 50 nucleotides in length. In a given pair of DD and UD for a target sequence, the DR' and UR' need not be exactly the same length, but will typically be similar so they can hybridize to the target under similar conditions and stringency.
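The relationship between a target's UR and DR and the complementary regions UR' and DR' can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the example target sequence, the helper names, and in particular the use of the Wallace rule (2 °C per A/T plus 4 °C per G/C), a rough rule of thumb for short oligos that is not mentioned in the source, as a stand-in for checking that the two regions hybridize under similar conditions.

```python
# Map each base to its DNA complement; U is accepted so RNA targets can be used.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def reverse_complement(seq):
    """DNA reverse complement of an RNA or DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def wallace_tm(oligo):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc

def design_pair(target, junction, length):
    """Split the target at `junction` into UR (5' side) and DR (3' side),
    each `length` nt long, and return the complementary regions (DR', UR')."""
    if junction - length < 0 or junction + length > len(target):
        raise ValueError("regions do not fit within the target")
    ur = target[junction - length:junction]
    dr = target[junction:junction + length]
    return reverse_complement(dr), reverse_complement(ur)

# Hypothetical 16-nt target, written 5'->3', with the ligation junction at position 8
target = "ACGTTGCAAAGGCTTC"
dr_prime, ur_prime = design_pair(target, junction=8, length=8)
tm_difference = abs(wallace_tm(dr_prime) - wallace_tm(ur_prime))
print(dr_prime, ur_prime, tm_difference)
```

Read 5'-to-3', the ligated detector strand is DR' followed by UR', which is the reverse complement of the UR-DR stretch of the target; a small `tm_difference` suggests the two regions can hybridize at similar stringency.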

As discussed in more detail below, the detectors can be optimized for ligation, such as by providing a 5'-phosphate on the UD, although this is not necessary, depending on the selection of ligase or other ligation methods. Ribonucleotides can also be substituted at the ligatable ends of the DD and UD to increase the specificity and efficiency of ligation, as when an RNA ligase is used.

detector labels

Where the ligation assay proceeds directly to a detection step, either or both detectors can be designed to be labeled appropriately for detection. For example, the detector can be conjugated to any number of molecular or physical entities, labeled with a crosslinker, activatable crosslinker, activatable cleavage group or enzymatically cleavable group, optical, color or fluorescent dye, latex or other beads, quantum dots, or nanodots, or nanoparticles. Any of these entities can also be further modified or conjugated to other entities. The label can also take the form of an additional nucleotide sequence that serves to enable detection and identification, such as a barcode sequence. For example, a useful barcode sequence can uniquely identify the specific gene or target sequence, or a group of select genes or target sequences within the sample that are being measured. Such sequences can be positioned between the UR' and P2' sequence, and/or between the DR' and P1 sequence, so they are amplified when using flanking primers. This sequence can also be a random sequence, useful for identifying the number of copies of the target gene in the sample, independent of the particular efficiency of any amplification step.

anchored detectors

In one configuration of TempO-Seq, the upstream detector has a second region (UR2') that is complementary to a second region of the target sequence (UR2), as illustrated in Figure 2a. Because the tail of the UD can hybridize to a separate portion of the target, this configuration can be described as an "anchored" detector, as in Figure 2b. The anchor at the 3' end of the UD hybridizes with the target to form a double-strand and is thus configured to resist digestion by nucleases that degrade single strands, such as 3' exonucleases like exo I.

As a separate target-binding region, the anchor UR2' can be used to provide additional discrimination between similar sequences, such as isoforms of a family of genes where sequence differences between isoforms are found beyond the range of the DR and UR target sequence. The UR2' can be at least 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, or 50 nucleotides in length. The UR2' can be separated from the UR' by a noncomplementary region (CP1), which can be at least 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length. In general, the UR2' will be upstream relative to the UR'. If an amplification region (such as P2') is present, it can be upstream of the UR', such as within the CP1 or part of UR2' to allow amplification of the UR' portion as shown in Figure 2c to generate the amplification products (AP) in Figure 2d.

In a mirror-image configuration, it is the downstream detector that has the anchor region (DR2') complementary to a second region of the target sequence. The DR2' anchor hybridizes to a DR2 on the target so that the configuration resists the action of 5' ss-exonucleases. The DR2' of the DD will generally be downstream relative to the DR'. If an amplification region (such as P1) is present, it can be downstream of the DR' to allow amplification of the DR' after ligation. Anchored DDs and UDs can be used separately or in combination to resist a cocktail of nucleases.

Because the separate anchor region of the detector can affect the hybridization characteristics of the detector via monomolecular kinetics, the compositions and relative lengths of the DR2', CP1(s), DR', UR' and UR2' can be tuned to optimize target selectivity between the detector pair and among the pairs of the detector pool.

hybridization

Returning to the steps of the assay, the detectors are provided so that they contact the sample to allow the detectors to hybridize specifically to the target nucleic acids.

Hybridization conditions can be selected by the skilled artisan to allow and optimize for hybridization between the polynucleotides with the desired degree of specificity or mismatches, and such conditions will vary with the lengths and compositions of sequences present in the hybridization reaction, the nature of any modifications, as well as conditions such as the concentrations of the polynucleotides and ionic strength. Particular hybridization temperatures include 30°, 32.5°, 35°, 37.5°, 40°, 42.5°, 45°, 47.5°, 50°, 52.5°, 55°, 57.5°, 60°, 62.5°, 65°, 67.5°, 70°, 72.5°, 75°, 77.5°, 80°, 82.5°, 85°, 87.5°, and/or 90°C. Particular hybridization temperatures can be achieved by ramping the temperature up or down at various rates and profiles, such as timed temperature plateaus, one or more incremental increases or decreases of 5C°, 10C°, or 15C°, and repeated cycling between two or more temperatures. Ions such as Li+, Na+, K+, Ca2+, Mg2+ and/or Mn2+ can also be present at 0, 1, 2, 5, 10, 20, 50, 100, 200, and 500 mM, and such ions can affect the selection of the other hybridization conditions. Hybridization is also affected by steric crowding components such as branched polysaccharides, glycerol, and polyethylene glycol. Further additives can be present in the hybridization (and subsequent) reactions, such as DMSO, non-ionic detergents, betaine, ethylene glycol, 1,2-propanediol, formamide, tetramethyl ammonium chloride (TMAC), and/or proteins such as bovine serum albumin (BSA), according to the desired specificity.

Optionally, the conditions for hybridization can be adjusted or fine-tuned to permit other steps to be performed in the same environment. For example, the same buffers used for hybridization can be used for lysing cells in a sample, promoting hybridization of certain cell types, facilitating removal or permeation of cell walls, cell membranes, or subcellular fractions, as desired. Depending on the ligation method used in the assay, hybridization conditions can be selected to be compatible with conditions for ligation as is, or with the addition of one or more components and preferably without requiring a change of the reaction container when transitioning from hybridization to ligation steps.

Unhybridized detectors can also be removed using an optional wash step, for example when the sample is fixed tissue or fixed cells, or when the target sequence is attached to a solid phase.

ligation

The ligation reaction can occur by chemical ligation or by using a ligase enzyme or a ligation-facilitating co-factor. A variety of nick-repairing ligases are commercially available to catalyze the formation of a phosphodiester bond between adjacent single-stranded polynucleotides when hybridized to another single-stranded template, such as to join DNA to RNA when hybridized to template. An example is bacteriophage T4 DNA ligase, which is generally understood to use ATP as a co-factor. The ATP can be supplied during the ligase reaction. In other reactions, the ligase can be pre-adenylated. In yet other reactions, the UD must be pre-adenylated at the 5' end, as with a 5' App DNA/RNA ligase. The UD in a typical reaction will have a 5'-phosphate to facilitate ligation to the DD, although this is not necessary, depending on the selection of ligase and ligation conditions. (Where a 5'-phosphate on the DD is required for efficient ligation, a comparable oligonucleotide without 5'-phosphorylation can be used to inhibit or reduce undesired ligation.) Preferred ligation conditions include 10, 25, 50, or 100 mM Tris-HCl (pH 7.5, 8.0, or 8.5); at least 10 mM, 5 mM, 2 mM, or 1 mM MgCl2; at least or at most 2 mM, 1 mM, 0.7 mM, 0.5 mM, 0.2 mM, 0.1 mM, 0.05 mM, 0.02 mM, 0.01 mM, 0.005 mM, 0.002 mM, or 0.001 mM ATP; or at least 10 mM, 7 mM, 5 mM, 2 mM, 1 mM, or 0.5 mM DTT or another antioxidant. T3 DNA ligase can also be used, which can ligate a broader range of substrates and has a wider tolerance for salt concentration. As with other steps, the temperature can be selected according to the characteristics of the reaction components and conditions such as ionic strength.

As discussed above, the ligation step can be preceded by an optional extension step, as in Figure 1, step (b0). The ligation step can also be preceded by an optional cleavage step, such as by a nuclease, to remove any overhangs. In other cases, a portion of the DD can overlap with the UR sequence to which the UD hybridizes, so that after hybridization of the UD and the DD, there is an overhang sequence of 1, 2, 3, or more bases. A useful enzyme for removing an overhang is a Flap endonuclease, such as Fen-1, whose cleavage leaves a ligatable 5'-phosphate.

Nonligated detectors can also be removed using an optional wash step, for example when the sample is fixed tissue or fixed cells, or when the hybridization complex is attached to a solid phase.

amplification

If desired, the ligation product can be amplified (for example by PCR or qPCR) to facilitate detection. Amplification methods and instruments are commercially available, including PCR plate and droplet formats, and the amplification enzymes (such as Taq and its commercial variants) and reaction conditions can be selected and tailored to the particular platform. Optionally, the polymerase selected for amplification can have strand-displacing activity. As illustrated in Figure 1, the detectors can have additional sequences ("tails") including primer hybridization sequences (e.g. P1, P2') or complements thereof, that serve as amplification sequences, so that after ligation, the ligation product can be amplified with a pair of amplification primers (P1, P2). Amplification can also be linear, or achieved by any number of methods other than PCR. If desired, the amplification primer can incorporate a barcode sequence, for example a barcode sequence that uniquely identifies the sample in a multi-sample experiment, and optionally has redundant and/or error-correction features. In some experiments, for example, different sample barcodes can be used for 96, 384, 1536, or more generally 2^n or 4^n different samples that are prepared with different barcodes separately for some steps, such as hybridization, ligation, and amplification, and combined for others, such as detection. The barcode sequence can be incorporated into the primer, such as 3' to the amplification sequence, so that the barcode becomes part of the amplified strand. In other instances, the amplification sequence of the primer can be extended by an additional sequence to provide a primer hybridization sequence that can be used in subsequent sequencing steps. The barcode may also be interposed between the amplification sequence (and, if desired, the extended amplification sequence) and another sequence that can be used for capture, such as capture onto a surface as part of a sequencing process, and/or for yet another primer hybridization sequence that is used for sequencing. In each case the barcode will be amplified with the rest of the detector sequences, for instance forming a single amplified, elongated molecule that contains sequencing primer hybridization sequences, sample barcode, and a gene-specific sequence, which may include a gene-specific barcode or a target molecule-specific barcode as well as the sequence, or the complement to the sequence, of the target gene. In the case where the targeted oligo is a cDNA, a gene-specific sequence or a sample-specific sequence can be added as part of the primer used for reverse transcription, and be a part of the sequence targeted by the UD and DD.
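As a minimal sketch of how a sample barcode with error-correction features might be decoded after sequencing, the following Python assigns each observed barcode to the nearest known sample barcode by Hamming distance. The barcode sequences, the well names, and the one-mismatch tolerance are hypothetical illustrations and are not taken from this disclosure.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, sample_barcodes, max_mismatches=1):
    """Return the sample whose barcode is uniquely closest to the read.

    Returns None when the best match exceeds max_mismatches or when two
    samples tie for the best match.
    """
    hits = sorted(
        (hamming(read_barcode, barcode), sample)
        for sample, barcode in sample_barcodes.items()
    )
    best_distance, best_sample = hits[0]
    if best_distance > max_mismatches:
        return None
    if len(hits) > 1 and hits[1][0] == best_distance:
        return None  # ambiguous assignment
    return best_sample


# Hypothetical 6-base barcodes; every pair differs at all 6 positions,
# so a single sequencing error still maps to exactly one sample.
SAMPLE_BARCODES = {
    "well_A1": "ACGTAC",
    "well_A2": "TGCATG",
    "well_A3": "GATCGA",
}

print(assign_sample("ACGTAA", SAMPLE_BARCODES))  # one mismatch from well_A1
```

Barcode sets with a larger minimum pairwise distance tolerate more errors per read, at the cost of fewer usable barcodes for a given length.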

In other instances, methods known in the art can be used to amplify the ligated DD and UD sequences, such as by repetitive cycles of (1) ligation, (2) heating to melt off the ligated product, (3) cooling to permit hybridization of DD and UD to the target, (4) ligation, then repeating the heating (2), cooling (3), and ligation (4) steps. These additional amplification steps can be performed before amplification step (c), during which the sample barcodes and other sequences are added to the ligated UD and DD sequence. The target of the UD and DD hybridization may also be amplified by whole transcriptome amplification of RNA or amplification of cDNA.

detection

The ligation product (or its amplicons) can optionally be detected by methods such as sequencing, qPCR, end point PCR, enzymatic, optical, or labeling for detection on an array or other molecule detection. Other detection methods include flow-through systems for counting labeled molecules. Depending on the detection method, the skilled user will be able to modify the design of the detectors and amplification primers to include functional features that are appropriate, such as for bridge amplification on a sequencing flow cell. The experimental resources used for amplification and detection can be limited and are often among the most expensive, and their consumption can be optimized by reducing the number of non-informative assay components present at various stages of the assay.

steps in solid, liquid phases

In some embodiments, the hybridization, ligation, or extension steps can be performed while the target sequence is in situ. This can be particularly useful, for example, when the sample is on a histological slide, so that the ligation is known to occur at a recordable location and can be compared to similar reactions at other locations on the slide. It is useful for any sample where the target sequence is part of a nucleic acid that is fixed to the tissue. The ligated probes can remain at the location while other steps are performed, such as imaging or detection of other analytes at or near the location. If desired, the ligated probes can remain in situ more securely by a variety of chemical or enzymatic methods for cross-linking to the site, which can be permanent or reversible, such as by a photocleavable link as with using a cyanovinylcarbazole nucleoside analog (CNVK). In a particular embodiment, the ligation products can be eluted from the sample in situ for collection and further processing, preferably eluting from small areas to preserve the location information and morphological context of the ligation reaction products. Elution can simply be by heat in low salt, effected by the PCR process, or by addition of base. In a particular embodiment, samples are fixed, optionally permeabilized, and optionally processed prior to or during the assay. In yet another embodiment, samples are simply preserved by fixation before the assay.

In other embodiments, one or more of the steps can be performed in liquid phase, such as in a microfluidic system, so that one or more of the steps does not involve capture to a solid phase, such as to a bead or a plate surface. For example, any one or combination of the hybridization, extension, ligation, nuclease digestion, amplification, or detection steps can be performed in liquid phase. In a mixed phase assay, a solid phase can be used to immobilize one or more of the sample, the detector oligos, the hybridization complex, the extension product, the ligation product, or the amplification product. In particular, the target nucleic acid can be attached to a solid surface during the hybridization step, the ligation step, or both. The solid surface can be a bead, such as a magnetic, nonmagnetic, polymeric, reversible immobilization, or latex bead, or compound beads thereof, or a relatively flat surface such as a plate or flowcell surface, optionally with coatings of similar materials. The mixed phase format allows the components to be transferred from one reaction environment to another, or the conditions to be changed as the components remain in one container.

kits

The invention provides kits for performing the methods described above, comprising detector oligos, and optionally a nuclease, a ligase, and/or a polymerase. The kits can further provide reaction buffers for the enzymes in the kit or buffer components to be added to reactions suitable for the enzymes. The component can be suitable for addition to a container for an enzyme reaction to prepare a suitable reaction buffer for the enzyme. The component can also be selected to be compatible with the reaction buffer for the preceding step of the method so that the component can be added to the same container to form a reaction buffer for the next enzyme to be used. Thus, the components can be selected to enable an "add-add-add" strategy for multiple steps of the assay to minimize transfers of sample, oligos, enzymes and/or solutions between separate containers.

The kits can also have eluent solutions suitable for removing oligonucleotides, such as ligated oligonucleotides, from a tissue sample for further analysis. The kits can further have amplification primers suitable for use with the detectors of the kit.

Examples

Example 1: Representative Ligation Assay

A representative method is provided to illustrate ligation assays. Here, over 100 RNA expression products were detected in a sample of cells using a multiplex assay format. For each expression product, the assay was designed to detect one or more target sequences within the full sequence of the product. For example, in human cells, a GAPDH gene of interest encodes the enzyme glyceraldehyde 3-phosphate dehydrogenase; three different portions within the RNA transcript of the GAPDH gene were independently detected as target sequences. For one such RNA target sequence, identified here as GAPDH_2, the 5' end was designated "upstream" and the 3' end was designated "downstream" for the direction of transcription and translation. A downstream region (DR) was defined as the downstream 25 bases of GAPDH_2, which has a complementary DNA sequence of DR'. The upstream region (UR) was defined as the upstream 25 bases of GAPDH_2, which has a complementary DNA sequence of UR'.

For GAPDH_2, a pair of detectors was designed: a downstream detector (DD) having the DR' sequence, and an upstream detector (UD) having the UR' sequence.
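The detector design just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the 50-base RNA target below is made up (it is not the GAPDH target sequence), the function names are hypothetical, and tails such as primer sequences are omitted. The sketch splits the target into UR and DR, derives the complementary DNA sequences UR' and DR', and checks that the two detectors, once ligated, are together complementary to the whole target.

```python
# Base-pairing rules for writing a DNA complement of an RNA or DNA target.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def reverse_complement_dna(seq: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_detector_pair(target_rna: str, region_len: int = 25) -> dict:
    """Split a target into UR (5' half) and DR (3' half) and return DR'/UR'."""
    if len(target_rna) != 2 * region_len:
        raise ValueError("target must be exactly two regions long")
    ur = target_rna[:region_len]   # upstream region of the target
    dr = target_rna[region_len:]   # downstream region of the target
    return {
        "DD": reverse_complement_dna(dr),  # DR' portion of the downstream detector
        "UD": reverse_complement_dna(ur),  # UR' portion of the upstream detector
    }


# Hypothetical 50-base target, written 5' to 3' (not a real GAPDH sequence).
target = "ACGUU" * 5 + "GGCAU" * 5
pair = design_detector_pair(target)

# After ligation, DD followed by UD is complementary to the full target.
ligated = pair["DD"] + pair["UD"]
print(ligated == reverse_complement_dna(target))
```

Because both detectors are written 5' to 3', the 3' end of the DD abuts the 5' end of the UD on the target, which is the junction the ligase joins.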

Similar pairs were designed for each of the target sequences to provide a pool of detectors for the assay. In this example, all the upstream detectors were phosphorylated at the 5' end.

In this particular example, an amplification step was to be performed later in the experiment using two primers, P1 and P2, so all DDs in the experiment included a primer sequence (P1) and all UDs included a complementary primer sequence (P2'). Because amplification is not necessary to the practice of the invention, however, the sequence of the specific primers and primer sequences is a matter of selection to suit the particular amplification method, if used.

At least 10 ng of RNA isolated from human kidney or liver cell lines was placed in a well of a microtiter plate for each assay experiment. To each well was added 20 μL of 2X Binding Cocktail, which contained 5 nM of each detector (providing a final input of 0.1 pmoles per oligo), 100 nM biotinylated oligo(dT)25, and 5 μL streptavidin-coated magnetic beads in a Wash Buffer (40 mM Tris-Cl pH 7.6, 1 M NaCl, 2 mM EDTA disodium, 0.2% SDS). The plate was heated for 10 min at 65°C to denature the RNA, then the temperature was ramped down over 40 min to 45°C to allow the detectors to anneal to the target sequences in the RNA sample. The plate was then transferred to a magnetic base to immobilize the beads, allowing the supernatant, containing unbound and excess detectors, to be aspirated from the wells. The beads were washed at least three times with 50 μL Wash Buffer.
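The stated detector input and the annealing ramp can be checked with a short calculation. The helper names below are hypothetical, and the ramp is assumed to be linear, which the text does not state.

```python
def input_pmol(concentration_nM: float, volume_uL: float) -> float:
    """Amount of oligo in pmol: nM x uL gives fmol, and 1000 fmol = 1 pmol."""
    return concentration_nM * volume_uL / 1000.0


def ramp_rate_C_per_min(start_C: float, end_C: float, minutes: float) -> float:
    """Average ramp rate, assuming a linear temperature change."""
    return (end_C - start_C) / minutes


# 20 uL of cocktail containing 5 nM of each detector.
print(input_pmol(5, 20))                 # 0.1 pmol per oligo
# Ramp from 65 C down to 45 C over 40 min.
print(ramp_rate_C_per_min(65, 45, 40))   # -0.5 C per minute
```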

To each well was added 5 Weiss units of T4 DNA ligase in 20 μL of 1X ligation buffer, as provided by the supplier. After the beads were resuspended by pipette, the plates were incubated for 60 min at 37°C to allow target-dependent ligation of DDs to UDs as appropriate. After the ligation reaction, the beads were immobilized and washed twice with 50 μL Wash Buffer. To release the ligated detectors from their RNA targets, the beads were resuspended in 30 μL and incubated for 5 min at 65°C. After incubation, the beads were immobilized, and the supernatant was removed and transferred to a storage plate.

For the optional amplification step, 5 μL of the supernatant, containing the ligation products, was transferred to a well of a PCR plate. Then 10 μL of a PCR cocktail was added, containing 0.45 U Taq polymerase, 0.6 μM P1 primer, 0.6 μM P2 primer, 1.5 mM MgCl2, and 200 μM dNTPs. The thermocycler used the following program: 10 min at 94°C, followed by 20 to 25 cycles of 30 sec at 94°C, 30 sec at 58°C, and 30 sec at 72°C. The amplification products were then sequenced according to manufacturer's instructions. This representative ligation assay can be modified as in the following examples.

Example 2: Anchored Detector Designs

Upstream and downstream detector probe oligonucleotides were prepared as in Figure 2a and 3a for 24 target sequences identified as breast cancer targets: ACTB_1, TFF1_1, GATA3_3, GAPDH_3, CDH1_1, KRT19_2, TIMP1_2, NFKBIA_1, ESR1_1, VEGFA_3, LAMP1_2, MUC1_3, BAD_3, PTEN_1, BRCA2_1, BCAT2_3, ICAM1_2, IGF2_3, BRCA1_2, EGFR_1, BMP4_1, KIT_3, WNT1_1, and EGF_3 (in descending order of expected counts). The targets were selected for a range of expression covering 6 orders of magnitude from ACTB_1 to EGF_3. The target sequences used for the DRs and URs are shown in Figure 6a. The assay was performed in triplicate with 100, 10, 1, 0.1, and 0 (control) nanograms of MCF7 total RNA as sample. The detectors were added to the sample in a volume of 1 or 2 μL and allowed to hybridize by incubating at 65°C for 10 minutes, ramping down over 20 minutes from 65° to 45°C, then holding for 20 minutes at 45°C.

Exonuclease I (E. coli) was added to the hybridization mixture as 0.5 Units in 6 μL and incubated for 1 hour at 37°C. T4 ligase was added to the mixture as 5 Units in 6 μL and incubated for 1 hour at 37°C. A heat step was performed for 30 minutes at 80°C. The mixture was amplified by adding 2X PCR master mix. The amplification products corresponding to the target sequences were detected and quantified by qPCR and sequencing.

Example 3: Stochastic Gene Expression of Single Intracellular Stained, FACS-Sorted Cells Profiled by TempO-Seq

Defining the nature of stochastic gene expression is important for understanding the regulation of transcription/translation and cell population dynamics. We prepared Jurkat cells and human blood lymphocytes (activated ex vivo, fixed, permeabilized, antibody-stained for surface CD4 and CD8, and for intracellular transcription factors FoxP3 and EOMES). A modified version of whole transcriptome TempO-Seq gene expression assay was performed in situ, and the cells were FACS-sorted into bulk subpopulations or into single cells. In this modified version, the TempO-Seq probes were eluted and gene expression was profiled by sequencing. The modified TempO-Seq assay (based on the NIEHS S1500 gene-set) measured 2977 genes ("surrogate whole transcriptome" or "surrogate" assay, compared to the more comprehensive TempO-Seq "whole transcriptome" assay), identifying every known signaling pathway. TempO-Seq bulk cell measurements correlated with the summed single cell measurements (R² = 0.89 for a bulk preparation of 1000 CD4-/FoxP3- cells versus single cells). The no-sample control background was < 0.06 counts, showing that true "off" could be measured. The "abundance" of genes measured in bulk samples correlated to the number of cells in which expression was "on", a measure of the percentage of time that the gene is on. Only 48 genes were expressed all the time in every single cell, while the rest exhibited no expression in one or more cells. We observed that most genes were either on or off with very little "ramp up" or "ramp down" of expression over the time required to fix the cells and stop RNA synthesis/degradation.

For a gene whose bulk measurement was 10 counts, 247 cells had 0 expression and 6 cells had a median expression of 500 counts (average 583, range 149 to 1206). By comparison, the highest expressed gene had average counts of 12,541 (range 7,519 to 18,970), only ~16-fold higher. Thus, the concept of single copy gene expression is more complex than previously understood. Rather, low-expressed genes are "off" most of the time, but when "on" they are at relatively high levels in a cell. This in turn drives up "average" expression levels if measured in larger populations of nonactive cells.

Modified TempO-Seq Assay

Figure 7 shows a modified version of TempO-Seq that can be performed after antibody-staining, before flow cytometry sorting (FACS). A proprietary reagent was used to permeabilize the cells, which provided highly sensitive antibody-staining of intracellular antigens. The TempO-Seq protocol was carried out by adding a cocktail of detector oligos (DOs) so that there was a pair of DOs that hybridized to each targeted RNA, and when properly hybridized, the two detector oligos butt up against one another, permitting ligation. Wash steps were used to remove excess nonhybridized DOs, and subsequently, unligated DOs. The FACS sorting was performed, capturing each cell into 10 μL of PCR buffer, and then universal PCR was carried out to amplify the TempO-Seq products and at the same time to add a sample-specific barcode to the product from each cell.

Performance v. RNA-Seq

TempO-Seq gene expression measurements were highly correlated to comparable RNA-Seq results. MAQC Universal Reference RNA vs. MAQC Brain RNA were analyzed by the surrogate S1500 (Figure 8a, 2700 genes) and whole transcriptome assays (Figure 8b, measuring 20,000 genes).

Sensitivity and resistance to degradation

In Figure 9a, absolute sensitivity was measured using Mix 2 of the synthetic reference ERCC ExFold RNA Mixtures diluted 1×10⁻⁵ in a background of URR RNA and assayed using a detector oligo pool specific for ERCC RNAs. Average reads were 144K/sample. Calculating from the slope, the assay is sensitive to ~30 molecules.

In another measure of sensitivity, MDA MB 231 cells were diluted in 10-fold increments into a constant background of MCF7 cells (Figure 9b, right bars), or vice versa (Figure 9b, left bars), then lysed and assayed for cell-specific transcripts. Of the 13 and 14 genes monitored, respectively, the fraction that was significantly above background is shown for each cell dilution. Read depth ranged from 3.6M/sample for 100%, 299K for 0.1%, and down to 64K for 0.00001% for both titrations. One cell in 1000 background cells could be detected.

No reverse transcription was observed. Binding could occur even if a stretch of RNA contained some abasic or cross-linked sites, as in FFPE. Fragmentation did not interfere as long as ~85bp stretches remained intact. For these reasons, TempO-Seq is highly resistant to RNA degradation. See Figures 9c, 9d, and 9e.

Validation in human T-cells

Human T cells were CD3/CD28 bead-activated, cultured for 5-9 days, surface-stained for CD4 & CD8, low CH2O-fixed, intracellular-stained for either T-bet, EOMES, or FoxP3 transcription factors (TF), sorted as indicated by the gates, profiled, and expression was plotted (average + SD) for the enriched subset (Figure 10, right, gray bar) or depleted subset (Figure 10 right, black bar). Expression of surface molecule CD8 and transcription factors EOMES and FoxP3 (Figure 10 left upper panels) correlated with FACS-gated fluorescence. Other T-cell-specific genes did not differ among FACS sorted subsets (Figure 10 left center and bottom panels).

Single cells switch between high and no expression

TempO-Seq reproducibility and sensitivity were unaffected in flow-sorted samples. Figure 11a shows TempO-Seq reproducibility for MAQC RNA. Figure 11b shows TempO-Seq reproducibility for bulk measurement of 1000 T cells that express FoxP3.

Figure 11c and Figure 11d contrast counts of expression products for bulk populations (y-axis) with counts for single cells (x-axis). An unexpected "fishhook" pattern is shown compared to an expected 45° distribution. The expression by single cells is significantly higher than the bulk expression for the mid to low expressed genes, creating the fishhook pattern. From such data the expression max for a gene, that is, the maximal expression level within a single cell, can be determined.

If a simple average is used to compare the single-cell population to the bulk population, the expression behavior of individual cells over time is masked behind a single average value for the expression of the bulk population as a whole. A better representation (as shown in Figs. 11c and 11d) is to sum the expression for a gene across all single cells, count the number of cells that express the gene (i.e. where expression is not zero), then divide the sum by the number of cells expressing it. This shows a more representative average expression for all expressing cells.
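The two averages can be contrasted in a short Python sketch. The count values below are invented for illustration only and the function names are hypothetical.

```python
def population_mean(counts):
    """Simple average over every cell, including cells with zero counts."""
    return sum(counts) / len(counts)


def expressing_cell_mean(counts):
    """Sum counts across all cells, then divide by the number of cells
    in which expression is not zero."""
    n_expressing = sum(1 for c in counts if c > 0)
    if n_expressing == 0:
        return 0.0  # gene is off in every cell
    return sum(counts) / n_expressing


# Hypothetical counts for one gene in 8 single cells; 2 cells are "on".
counts = [0, 0, 0, 600, 0, 0, 400, 0]

print(population_mean(counts))       # 125.0
print(expressing_cell_mean(counts))  # 500.0
```

In this made-up case the simple average understates the level in expressing cells four-fold, because six of the eight cells contribute only zeros.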

Bursting frequency correlates with expression level

If we treat single cells as representative snapshots of gene activity over time, it is possible to arrange them arbitrarily to produce a representative graph of what the expression patterns of each gene may look like over time (variable tracing), and to correlate this to an average overall expression in bulk samples (dark line).

This is shown for a gene with high expression (Figure 12a, CDC42). Figure 12b shows a gene CRY1 with moderate overall expression when measured in bulk (dark line), but when cells are resolved individually by TempO-Seq methods, a distinct subpopulation of cells are expressing CRY1 at high levels, while the others are not expressing significant levels of CRY1.

Figure 12c shows counts (on a log2 scale) of NOTCH1, where the measurement of the bulk population is shown (dark line), but the underlying NOTCH1 expression of 12 individual cells is shown to be high, while the other cells have negligible NOTCH1 expression. Figure 12d shows another example where 3 cells of a population are high expressers of MYCBP, where an earlier bulk measurement of the population of cells would have indicated a simplistic average number of counts for the population.

Expression levels correlate with proportion of cells expressing a transcript

If temporal control of bursting is the dominant mechanism, the number of cells "caught" expressing a specific transcript should correlate with the expression levels of that transcript in bulk. Indeed, this is what we observe in Figure 13, with very few outliers and with additional mechanisms becoming important at the lowest and the highest levels of expression. Each point on the graph represents a single transcript. The average level of expression by each single cell is much higher than the average measured in a bulk sample.
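A minimal sketch of this comparison is given below. All counts are invented, the gene names are placeholders, and the bulk level is approximated here as the sum of the single-cell counts, which is an assumption made for illustration rather than a measured bulk value. The Pearson coefficient is one possible choice of correlation measure.

```python
import math


def fraction_on(counts):
    """Fraction of cells in which the transcript was detected (count > 0)."""
    return sum(1 for c in counts if c > 0) / len(counts)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical single-cell counts for three transcripts in 8 cells.
single_cell = {
    "gene_low": [0, 0, 0, 0, 0, 0, 0, 500],
    "gene_mid": [0, 450, 0, 0, 600, 0, 0, 550],
    "gene_high": [700, 650, 800, 0, 750, 600, 720, 690],
}

genes = list(single_cell)
fractions = [fraction_on(single_cell[g]) for g in genes]
bulk_proxy = [sum(single_cell[g]) for g in genes]  # assumed stand-in for bulk

r = pearson(fractions, bulk_proxy)
print(fractions, bulk_proxy, round(r, 3))
```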

Simultaneous bursting allows pathway detection

By correlating co-bursts of single genes representative of specific pathways, it is possible to determine the proportion of cells in which a particular pathway is active.

Twelve representative BIOCARTA pathways are shown in Figure 14. The possibility that co-bursting is random detection was checked with Fisher's Exact Test; all results above are p < 0.01. Thus, pathway co-bursting is not random, but is coordinated.
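A co-bursting check of this kind can be sketched with a 2x2 table of cells (both genes on, only the first on, only the second on, neither on). The code below is a plain-Python, one-sided Fisher's exact test built on the hypergeometric distribution; whether the analysis above used a one-sided or two-sided test is not stated, so the sidedness here is an assumption, and the cell counts are invented.

```python
from math import comb


def fisher_exact_greater(both, a_only, b_only, neither):
    """One-sided Fisher's exact p-value for a 2x2 co-detection table.

    Gives the probability, with the row and column totals held fixed, of
    seeing at least `both` cells in which the two genes are on together.
    """
    a_total = both + a_only          # cells with gene A on
    b_total = both + b_only          # cells with gene B on
    n = both + a_only + b_only + neither
    denominator = comb(n, b_total)
    p_value = 0.0
    for k in range(both, min(a_total, b_total) + 1):
        p_value += comb(a_total, k) * comb(n - a_total, b_total - k) / denominator
    return p_value


# Hypothetical table for 100 cells: 12 co-bursting, 3 and 2 bursting alone.
p = fisher_exact_greater(both=12, a_only=3, b_only=2, neither=83)
print(p < 0.01)
```

A small p-value indicates that the two genes are detected together in the same cells more often than independent bursting would predict.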

To summarize, TempO-Seq was extended to single cells, showing that previously observed burst expression of genes has a very rapid on-and-off ramp and that a small proportion of cells is capable of driving the overall expression average from entire tissues. Low expression in bulk reflects low numbers of expressing cells, but the expression level of each expressing cell is high, typically within an order of magnitude or so of the maximum single-cell expression for the genes that are highest expressed in the bulk sample measurement. We further show that signaling pathways can be reliably measured as active or inactive within single cells, allowing for differentiation into temporal subpopulations that express different activation states at any one time, reflected in the active pathways expressed by each transient subpopulation. Such transient subpopulations may respond differently to outside stimuli.

Example 4: Identification of coincident expression products

As shown in the chart at the top of Figure 15, the expression of gene 342 is measured for a population of cells. The average level of expression for the whole population is shown as the bulk average. Expression levels of individual cells are shown as resolved along the x-axis, showing individual cells A, B, and C have high "on" expression as detected by TempO-Seq.

While the invention is not bound by a particular theory or mechanism, it is believed that fixation freezes a snapshot of cell physiology in a given moment, and also that cells can be fixed at different time points after treatment or intervention. With either step or with a combination of steps, this bursting phenotype can be used to infer or measure temporal changes in gene expression and sequential gene activation. Time-based parameters (e.g., frequency, period between bursts, burst duration ti, decay) can be used to characterize burst characteristics. Similar data is obtained when cells are first stained and sorted, and then subjected to the TempO-Seq assay.

Figure 16 illustrates three possible relationships between genes. The first two panels show genes 342 and 541 being activated together (coincident activation). The third panel shows gene 209 activation lagging after 342 and 541 (lagged coincidental activation). The fourth panel shows gene 957, which displays non-coincident activation. These relationships between genes can be discerned by measurement of gene activity in a single cell population by measuring the odds of co-detection within the same cell in a population fixed at a particular time period. This allows reconstruction of the temporal order of events in a causal pathway, rather than just relying on traditional inferences from a generalized state obtained from the averaged results of the entire cell population. As an illustration, Figure 17a depicts average activities of genes 342 (solid trace), 541 (dotted), and 209 (dashed) in a population of cells correlated with a disease state (rectangle).

Figure 17b shows the same genes resolved temporally by the present invention.
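The odds of co-detection within the same cell can be sketched as an odds ratio computed from per-cell on/off calls. The on/off vectors below are invented for illustration and only borrow the gene labels from the figures. The 0.5 pseudocount (a Haldane-Anscombe style correction) is an added assumption to avoid division by zero and is not part of the described method.

```python
def codetection_table(on_a, on_b):
    """Count cells as (both on, A only, B only, neither) from on/off calls."""
    pairs = list(zip(on_a, on_b))
    both = sum(1 for a, b in pairs if a and b)
    a_only = sum(1 for a, b in pairs if a and not b)
    b_only = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    return both, a_only, b_only, neither


def odds_ratio(both, a_only, b_only, neither, pseudocount=0.5):
    """Odds ratio of co-detection; values well above 1 suggest coincidence."""
    return ((both + pseudocount) * (neither + pseudocount)) / (
        (a_only + pseudocount) * (b_only + pseudocount)
    )


# Hypothetical on (1) / off (0) calls for 10 cells fixed at one time point.
gene_342 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gene_541 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # bursts together with 342
gene_957 = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]   # bursts in other cells

print(odds_ratio(*codetection_table(gene_342, gene_541)))  # high: coincident
print(odds_ratio(*codetection_table(gene_342, gene_957)))  # below 1: non-coincident
```

A single fixed time point cannot by itself separate lagged from simultaneous activation; that distinction would draw on cells fixed at several time points, as described above.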

Furthermore, the status of the organism, and of the cell, can determine its response to therapeutic and other interventions. A classic example is expression of xenobiotic metabolizing genes which, among other things, metabolize pharmacological substances and poisons. The effect of gene expression status can be dramatic: a lethal dose of poison given to mice during daylight (the time of rest for this nocturnal animal) can be resisted and survived if administered at night. Looking at population averages can only determine overall average changes in signaling pathways. Measurements in single cells allow determination of signaling profiles that actually modulate what the physiological response will be. Figure 18a shows that a change within a gene expression/cell signaling pathway (solid trace) causes a physiological response (dotted), which can lead to a disease state.

In Figure 18b, a treatment targeting the gene expression/cell signaling pathway (rectangle) has to be correctly timed, otherwise its effects will be limited. This requires significant time investment into dosing strategies, using the physiological or disease state as the readout. As shown in Figure 18c, correct timing of treatment application, using the temporal data about the status of the relevant pathway in the majority of cells, can instead focus on the causal gene expression pathway as a readout directly, thereby maximizing the effect of treatment on the target physiological state.

Access to this information allows development of better interventions, and easier development of better dosing guidelines to maximize responses or minimize side effects.

Example 5: Sensitivity to degradation

Expression products, such as mRNAs, have varying stability in different compartments of a cell. Cells are used with known poly-A loss-of-function mutations. There are a number of inherited conditions caused by RNA export defects.

The degradation of products (or differential degradation) may introduce potential bias in the results. As a check, a set of detector oligos (DOs) is designed for target sequences of interest, while a corresponding set of detectors is designed targeting introns, fusion proteins, or the 3' end of RNAs, targeting a different part of the same RNA. The reads are compared across different sample types (light vs heavy fixing of cells vs lysate), comparing to template regions in exons as well as those in adjacent untranslated DNA. Similarly, DOs are designed against the 3' end of RNA, most sensitive to degradation, and the results are compared to the DOs designed against other regions of the RNA.

Example 6: Persistence of burst information

In the general population, the transcription factor EOMES is expressed by 15% of cells, for example. When cells are stained for EOMES and sorted, nearly all cells express the EOMES gene. Thus, the expression of the gene correlates on a single cell basis with the expression of protein product, and protein represents a "memory" that lasts after the direct expression product is degraded and may not be detectable. We identify EOMES-associated genes from the single cells in which it was expressed by looking at all single cells and sorting out the genes that are not consistently expressed in all EOMES-positive-sorted cells.

Example 7: Pathway analysis by sequential activation

Cells are analyzed for probability patterns of having particular pathways on or off. Then an identical population is treated with a drug targeting the pathway. The response of cells in bulk correlates with the measurement of pathway activity prior to treatment.

For example, if a single individual cancer cell, or several single cancer cells, express a set of genes that provide resistance to a drug at the time such drug is administered, then these cells will survive, particularly if a mechanism induces these genes to be continually expressed so long as the drug stressor is present. This set of cells may proliferate and become the predominant phenotype of the evolving, drug-resistant cancer. Similarly, within a cancer population there may be cells that transiently express genes that make them more stem-cell-like, or more like the cells they evolved from, and thus again more resistant to drug and/or capable of metastasizing. A time-resolved expression analysis of cells shows that it is a transient subpopulation of cells that are expressing in bursts. By expressing a gene, individual cells may "move" between the transient subpopulation of expressing cells and the subpopulation of nonexpressing cells. Where the expression behavior of the population is in a natural equilibrium, active agents can be used to disrupt the entry and exit of cells to increase or decrease the number of expressing cells, as well as the number of cells expressing other genes in the network.

Cells are treated with a modulating agent and then single-cell analysis is performed at set times afterwards. The sequential activation of pathways at the single cell level is observed, as well as the first events controlling the activation process and expression at the single cell level. A progression of single cell expression signatures is identified, from which regulatory genes are identified that can be targeted for drug therapy.

For example, cancer cells are exposed to a drug. A percentage of cells survive during drug exposure, with different single cell profiles compared to profiles prior to drug treatment and during treatment, e.g., such signatures disappearing. After removal of the drug, the signatures change again, as the surviving cells grow and proliferate. Upon re-treatment with the same drug, the same changes in signatures are observed, and another percentage of cells survive. What is observed is that a signature exhibited by a similar percentage of cells is seen upon the start of treatment in each cycle, and is identified as a survival signature, combined with the subsequent signature seen in surviving cells during treatment. Drugs that inhibit this survival signature are identified that result in greater cell death upon drug treatment, making therapy more effective.

Also, subpopulations within populations can be considered, not only individual cells, so that modulation targets a repeated group 1 of expressed genes for some cells, and a different group 2 of genes for other cells.

The invention identifies the proportion of cells in which a particular gene or pathway is active. This enables determination of the probability that a gene or pathway will be active when an intervention takes place. Cells are analyzed for probability patterns of having particular pathways on or off. Pathway X is on in 90% of the cells, but off in 10% of the cells. In the cells in which pathway X was off, pathway Y was usually on. Guided by this, an identical population of cells is treated with drugs targeting both pathways X and Y, while another population is treated only with drugs targeting X. Finally, another sample of identical cells is treated with drugs targeting pathway X and pathway Z. Dual therapy against X and Y outperforms both the therapy targeting only X, and the therapy targeting X and Z.
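The proportions described above can be tabulated from per-cell on/off calls, as in the following Python sketch. The function names, the 20-cell population, and the specific calls for pathways X, Y, and Z are hypothetical values chosen only to mirror the 90%/10% example; they are not measured data.

```python
def fraction_on(calls):
    """Fraction of cells in which a pathway is called 'on'."""
    return sum(1 for c in calls if c) / len(calls)


def fraction_on_given_off(target, condition):
    """Fraction of cells with `target` on among cells where `condition` is off.

    Returns None when no cell has `condition` off.
    """
    selected = [t for t, c in zip(target, condition) if not c]
    if not selected:
        return None
    return sum(1 for t in selected if t) / len(selected)


# Hypothetical calls for 20 cells (True = pathway on)
pathway_x = [True] * 18 + [False] * 2    # X on in 90% of cells
pathway_y = [False] * 17 + [True] * 3    # Y on in both X-off cells
pathway_z = [True] * 5 + [False] * 15    # Z off in both X-off cells

x_on = fraction_on(pathway_x)
y_when_x_off = fraction_on_given_off(pathway_y, pathway_x)
z_when_x_off = fraction_on_given_off(pathway_z, pathway_x)
```

In this illustration, pathway Y covers the cells that a drug against X alone would miss, while pathway Z does not, which is consistent with the comparison of dual therapies described above.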

One can infer that just as single-cell gene expression is stochastic, on or off, so too is the expression of fusion genes and other mutations. In a population of cancer cell line cells containing the DNA for a gene fusion, that gene fusion was expressed in a stochastic manner by single cells, some cells expressing the fusion at the same time that other cells within the population did not. The fusion gene expression could be correlated with the expression of other genes by those cells. The expression of gene fusions can be both a target for therapy in some cases, and a mechanism of resistance in other cases. Thus, control of the temporal expression of the gene fusion can be exploited as a therapeutic approach.

Example 8: Targeting different sequences within a gene

Ten single-cell samples are assayed using an assay in which genes that are expressed at low levels when measured from bulk samples of 1000 cells are each measured using ten pairs of detector oligos (DOs) targeting ten different sequences within each gene.

Sequencing of no-sample controls for each DO set results in 0 to 5 counts for each DO set. Sequencing of each single cell results in about 200 to about 1500 counts for one or more of the DO sets for certain of the genes for some or all cells, from which it is determined that those genes are expressed in each cell for which there are such counts for one or more DO sets for that gene. For other genes, measurements result in 0 to 20 counts for all DO sets in some or all cells, from which it is determined that those genes are not expressed in each cell for which there are no or no significant (compared to the no-sample control) counts for any of the DO sets for that gene.

For the comparison, where 8 of the 10 DO sets for one gene produce counts of about 200 to about 1500, and where for another gene 3 of 10 DO sets produce these counts, it can be inferred that the first gene is more highly expressed than the latter. In the case where one set of samples was treated and another set untreated, and none of the DO sets for one gene in the untreated sample gave counts, while an average of 5 of 10 DO sets for that gene in the treated sample gave counts, it can be determined that the gene is differentially expressed in the treated sample.

In the case where for a second gene 3 of 10 DO sets produced counts in the untreated sample, and 8 of 10 produced counts in the treated sample, it can be determined that the gene is differentially expressed in the treated sample.
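The counting of positive DO sets described in this example can be sketched in Python as follows. The threshold of 100 counts is an assumption of this sketch, chosen only because it lies between the 0 to 20 count range treated as not expressed and the about 200 to about 1500 count range treated as expressed; the function names and the per-gene count lists are likewise hypothetical.

```python
# Assumed cutoff separating background (0 to 20 counts) from signal
# (about 200 to about 1500 counts); not a value stated in the specification.
THRESHOLD = 100


def positive_do_sets(counts, threshold=THRESHOLD):
    """Number of DO sets for one gene in one cell that reach the threshold."""
    return sum(1 for c in counts if c >= threshold)


def is_expressed(counts, threshold=THRESHOLD):
    """A gene is called expressed if at least one DO set reaches the threshold."""
    return positive_do_sets(counts, threshold) >= 1


# Hypothetical counts for ten DO sets per gene in a single cell
gene_a = [350, 900, 1500, 200, 640, 0, 12, 1100, 480, 760]   # 8 of 10 positive
gene_b = [3, 0, 410, 18, 7, 250, 0, 1300, 5, 2]              # 3 of 10 positive
gene_c = [0, 4, 20, 1, 0, 9, 2, 0, 15, 3]                    # none positive

a_positive = positive_do_sets(gene_a)
b_positive = positive_do_sets(gene_b)
c_expressed = is_expressed(gene_c)
```

Comparing the number of positive DO sets between genes, or between treated and untreated samples for the same gene, follows the inference described above.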

The headings provided above are intended only to facilitate navigation within the document and should not be used to characterize the meaning of one portion of text compared to another. Skilled artisans will appreciate that additional embodiments are within the scope of the invention. The invention is defined only by the following claims; limitations from the specification or its examples should not be imported into the claims.