Title:
REPROGRAMMING METHODS FOR GENERATING DIFFERENT INDUCED NEURONS
Document Type and Number:
WIPO Patent Application WO/2019/210231
Kind Code:
A1
Abstract:
This invention provides methods of generating diverse subtypes of induced neurons (iNs) from non-neuronal cells such as fibroblasts. The invention also provides methods of using iNs in various therapeutic or non-therapeutic applications, e.g., methods to identify agents or cellular modulations that enhance iN formation from non-neuronal cells.

Inventors:
BALDWIN KRISTIN (US)
Application Number:
PCT/US2019/029441
Publication Date:
October 31, 2019
Filing Date:
April 26, 2019
Assignee:
SCRIPPS RESEARCH INST (US)
International Classes:
C12N5/074; C12N5/0793; C12N5/0797; C12N15/85; C40B40/02
Domestic Patent References:
WO2011091048A1 (2011-07-28)
WO2015085201A1 (2015-06-11)
Foreign References:
US20120129262A1 (2012-05-24)
US20170107498A1 (2017-04-20)
US20160333311A1 (2016-11-17)
Other References:
BLANCHARD ET AL.: "Selective conversion of fibroblasts into peripheral sensory neurons", NAT NEUROSCI, vol. 18, no. 1, 24 November 2014 (2014-11-24), pages 25 - 35, XP055366111, DOI: 10.1038/nn.3887
TSUNEMOTO ET AL.: "Diverse reprogramming codes for neuronal identity", NATURE, vol. 557, no. 7705, 9 May 2018 (2018-05-09), pages 375 - 380, XP036505216, DOI: 10.1038/s41586-018-0103-5
WAINGER ET AL.: "Modeling pain in vitro using nociceptor neurons reprogrammed from fibroblasts", NAT NEUROSCI, vol. 18, no. 1, 24 November 2014 (2014-11-24), pages 17 - 24, XP055528889, DOI: 10.1038/nn.3886
Attorney, Agent or Firm:
FITTING, Thomas et al. (US)
Claims:
WE CLAIM:

1. A method for generating a group of induced neuron cells, comprising recombinantly expressing in a fibroblast or a stem cell a pair of transcription factors (TFs), thereby generating induced neuronal-like cells; wherein the pair of TFs comprise (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2.

2. The method of claim 1, wherein the pair of TFs are expressed in the non-neuronal cell via one or more expression vectors.

3. The method of claim 1, wherein expression of the pair of TFs is via one or more inducible expression vectors.

4. The method of claim 1, wherein expression of the pair of TFs is temporal.

5. The method of claim 1, wherein the stem cell is an embryonic stem cell (ESC), or an induced pluripotent stem cell (iPSC).

6. The method of claim 5, wherein the iPSC is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line.

7. The method of claim 1, wherein the fibroblast is an embryonic fibroblast or an adult fibroblast.

8. The method of claim 7, wherein the fibroblast is derived from a mammal.

9. The method of claim 8, wherein the mammal is human, mouse or rat.

10. The method of claim 1, wherein the pair of TFs are both transiently expressed in the non-neuronal cell.

11. The method of claim 1, wherein an expression vector encoding both of the pair of TFs is introduced into the non-neuronal cell.

12. The method of claim 11, wherein the expression vector is a lentiviral vector.

13. The method of claim 1, further comprising examining the induced neurons for the presence of a neuronal marker.

14. A method for generating an induced neuron subtype with a set of defined neuron markers from a non-neuronal cell, comprising recombinantly expressing in a fibroblast or a stem cell a pair of transcription factors (TFs) selected from Tables 1-11, thereby generating an induced neuron subtype with the corresponding set of defined neuron markers listed in Tables 1-11.

15. An isolated fibroblast or stem cell, comprising one or more expression vectors that express a pair of transcription factors (TFs), wherein the pair of TFs comprise (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2.

16. The cell of claim 15, which is a fibroblast, an embryonic stem cell (ESC), or an induced pluripotent stem cell (iPSC).

17. The cell of claim 15, which is an embryonic fibroblast or an adult fibroblast.

18. The cell of claim 17, wherein the fibroblast is derived from a mammal.

19. The cell of claim 18, wherein the mammal is human, mouse or rat.

20. The cell of claim 15, wherein the expression vectors are lentiviral vectors.

21. The cell of claim 15, wherein the expression vectors are inducible vectors.

22. A method for identifying an agent or cellular modulation that stimulates conversion of a fibroblast or stem cell into an induced neuron (iN), comprising: (a) recombinantly expressing in a non-neuronal cell a pair of transcription factors, wherein the pair of TFs comprise (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2; and (b) detecting enhanced iN conversion from the non-neuronal cell that has been subject to contact with a specific candidate compound, or subject to a specific cellular manipulation, relative to iN conversion from the non-neuronal cell that has not been subject to contact with the specific candidate compound or subject to the specific cellular manipulation, thereby identifying the specific candidate compound or cellular modulation as one that stimulates conversion of a non-neuronal cell into an induced neuron.

23. The method of claim 22, wherein expression of the pair of TFs is temporal.

24. The method of claim 22, wherein the candidate compounds are transcription factors or miRNAs.

25. The method of claim 22, wherein the cellular manipulations are epigenetic modulations.

26. The method of claim 22, wherein the stem cell is an embryonic stem cell (ESC) or an induced pluripotent stem cell (iPSC).

27. The method of claim 26, wherein the iPSC is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line.

28. The method of claim 22, wherein the fibroblast is an embryonic fibroblast or an adult fibroblast.

29. The method of claim 28, wherein the fibroblast is derived from a mammal.

30. The method of claim 29, wherein the mammal is human, mouse or rat.

31. The method of claim 22, wherein the pair of TFs are transiently expressed in the non-neuronal cell.

32. The method of claim 22, wherein the pair of TFs are introduced into the non-neuronal cell via a lentiviral vector.

Description:
REPROGRAMMING METHODS FOR GENERATING DIFFERENT INDUCED NEURONS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The subject patent application claims the benefit of priority to U.S. Provisional Patent Application Number 62/663,303 (filed April 27, 2018; now pending). The full disclosure of the priority application is incorporated herein by reference in its entirety and for all purposes.

STATEMENT OF GOVERNMENT SUPPORT

[0002] This invention was made with government support under grant numbers DA031566, DC012592 and MH102698 awarded by The National Institutes of Health. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

[0003] Neurons comprise a conspicuously diverse but clearly recognizable cell type. All neurons share defining core features such as electrical excitability and synaptic connectivity. Yet, in even the simplest organisms, neurons also exhibit extensive subtype diversity that affords each species its unique sensory modalities, behaviors and cognitive capabilities. The extent to which this diversity reflects the action of cell intrinsic programs or, rather, depends on environmental and developmental cues is an unsolved yet central question in neuroscience. Remarkably, despite the elaborate sequential mechanisms that specify cell types during embryonic development, recent studies have shown that transient overexpression of small sets of transcription factors can stably reprogram cells from one lineage into another.

[0004] Nevertheless, there is a need in the art to better understand several important issues related to induced neurons (iNs). For example, is the capacity to induce neuronal identity limited to only a few sets of transcription factors, or might there be a larger set of inducing factors? What features of neuronal identity and diversity can be produced outside the context of the brain and independent of developmental trajectories? There is also a need in the art for means to generate induced neurons with diverse functional properties. The present invention is directed to these and other unmet needs in the art.

SUMMARY OF THE INVENTION

[0005] In one aspect, the present invention provides methods for generating a group of induced neuron cells. The methods entail recombinantly expressing in a fibroblast or a stem cell a specific pair of transcription factors (TFs) described herein, thereby generating induced neuronal-like cells. In various embodiments, the employed pair of TFs are (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2.
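For readers who wish to work with the enumerated pairings programmatically, the ten groups recited above can be held in a simple lookup table. The following Python sketch is illustrative only and is not part of the disclosed methods; it uses the standard gene symbols (e.g., Brn3c, Ascl1, Ngn1, Atoh1, Ptf1a, Pit1, Nurr1), and the names `CLAIMED_PAIRS` and `is_recited_pair` are hypothetical.

```python
from typing import Dict, FrozenSet

# Partner TF (POU factor or Nurr1) -> bHLH factors recited with it in groups (1)-(10).
# The dictionary layout and all identifiers here are illustrative assumptions.
CLAIMED_PAIRS: Dict[str, FrozenSet[str]] = {
    "Brn3c": frozenset({"Ascl1", "Ascl2", "Ascl4", "Ascl5", "Ngn1", "Ngn2", "Ngn3",
                        "ND1", "ND2", "ND6", "Atoh1", "Atoh7", "Myf5"}),
    "Brn3a": frozenset({"Ascl1", "Ascl2", "Ascl4", "Ascl5", "Ngn3", "ND1", "ND2",
                        "ND4", "ND6", "Atoh1", "Atoh7", "Atoh8"}),
    "Brn3b": frozenset({"Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2",
                        "Atoh1", "Myf5", "Ptf1a"}),
    "Brn2": frozenset({"Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2", "Atoh1"}),
    "Brn4": frozenset({"Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1"}),
    "Brn1": frozenset({"Ngn1", "Ngn2", "Ngn3", "ND1"}),
    "Oct6": frozenset({"Ascl1", "Ngn1", "Ngn2", "Ngn3", "ND1"}),
    "Pit1": frozenset({"Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1"}),
    "Oct4": frozenset({"Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2", "Atoh1"}),
    "Nurr1": frozenset({"Ascl2"}),
}


def is_recited_pair(partner: str, bhlh: str) -> bool:
    """Return True if (partner, bhlh) appears in one of the ten recited groups."""
    return bhlh in CLAIMED_PAIRS.get(partner, frozenset())


def count_recited_pairs() -> int:
    """Total number of distinct pairs across all ten groups."""
    return sum(len(bhlh_set) for bhlh_set in CLAIMED_PAIRS.values())
```

For example, `is_recited_pair("Pit1", "Ngn3")` returns `True`, whereas `is_recited_pair("Brn1", "Ascl1")` returns `False` because Ascl1 is not recited with Brn1 in group (6).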

[0006] In some embodiments, the employed pair of TFs are expressed in the non-neuronal cell via one or more expression vectors. In some of these embodiments, expression of the pair of TFs is via one or more inducible expression vectors. In some embodiments, expression of the pair of TFs is temporal. In some embodiments, the employed stem cell is an embryonic stem cell (ESC), or an induced pluripotent stem cell (iPSC). In some embodiments, the employed fibroblast is an embryonic fibroblast or an adult fibroblast. In some embodiments, the employed iPSC is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line (LCL). In some embodiments, the employed fibroblast is derived from a mammal. For example, the fibroblast can be obtained from human, mouse or rat. In some embodiments, the employed pair of TFs are both transiently expressed in the non-neuronal cell. In some methods, an expression vector encoding both of the pair of TFs is introduced into the non-neuronal cell. In some of these embodiments, the employed expression vector is a lentiviral vector.

[0007] Some methods of the invention further entail examining the induced neurons for the presence of a neuronal marker. Some methods of the invention are directed to generating an induced neuron subtype with a set of defined markers (e.g., expression of a group of receptors) from a non-neuronal cell. In various embodiments, the methods involve expressing in a fibroblast or a stem cell a pair of transcription factors (TFs) selected from Tables 1-11, thereby generating an induced neuron subtype with the corresponding set of defined neuron markers (neuronal receptors or functional genes) listed in the tables. In some embodiments, the invention provides a group of induced neurons with the specific cellular markers as described herein, e.g., with the combination of receptor expression profiles in each row of Tables 1-11.

[0008] In a related aspect, the invention provides isolated fibroblasts or stem cells that harbor one or more expression vectors that express a pair of transcription factors (TFs). In various embodiments, the pair of TFs are (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2.

[0009] In various embodiments, the employed cell can be, e.g., a fibroblast, an embryonic stem cell (ESC), or an induced pluripotent stem cell (iPSC). In some embodiments, the employed iPSC is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line (LCL). In some embodiments, the employed cell is an embryonic fibroblast or an adult fibroblast. In some of these embodiments, the fibroblast is derived from a mammal, e.g., human, mouse or rat. In some embodiments, the employed expression vectors are lentiviral vectors. In some of these embodiments, the expression vectors are inducible vectors.

[0010] In another aspect, the invention provides methods for identifying an agent or cellular modulation that stimulates conversion of a fibroblast or stem cell into an induced neuron (iN). These methods involve (a) recombinantly expressing in a non-neuronal cell a specific pair of transcription factors as described herein, and (b) detecting enhanced iN conversion from the non-neuronal cell that has been subject to contact with a specific candidate compound, or subject to a specific cellular manipulation, relative to iN conversion from the non-neuronal cell that has not been subject to contact with the specific candidate compound or subject to the specific cellular manipulation, thereby identifying the specific candidate compound or cellular modulation as one that stimulates conversion of a non-neuronal cell into an induced neuron. In various embodiments, the employed specific pair of TFs comprises (1) Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5, (2) Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8, (3) Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a, (4) Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, (5) Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (6) Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1, (7) Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1, (8) Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1, (9) Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1, or (10) Nurr1 and bHLH TF Ascl2.

[0011] In some of the methods, expression of the pair of TFs is temporal. In some methods, the candidate compounds are transcription factors or miRNAs. In some methods, the cellular manipulations are epigenetic modulations. In some methods, the employed stem cell is an embryonic stem cell (ESC) or an induced pluripotent stem cell (iPSC). In some embodiments, the employed iPSC is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line (LCL). In some methods, the employed fibroblast is an embryonic fibroblast or an adult fibroblast. In some of these embodiments, the fibroblast is derived from a mammal, e.g., human, mouse or rat. In some methods, the employed pair of TFs are transiently expressed in the non-neuronal cell. In some of these methods, the pair of TFs are introduced into the non-neuronal cell via a lentiviral vector.
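The comparison in step (b) of the screening methods can be illustrated with a short calculation. The Python sketch below is a hedged example rather than a prescribed analysis: it assumes iN conversion is scored per well as a percentage (for instance, percent of cells positive for a neuronal marker), and the function names and the 1.5-fold cutoff are hypothetical choices not taken from this disclosure.

```python
from statistics import mean
from typing import Sequence


def conversion_enhancement(treated_pct: Sequence[float],
                           untreated_pct: Sequence[float]) -> float:
    """Fold change in mean iN conversion: cells exposed to the candidate
    compound or manipulation versus otherwise identical unexposed cells."""
    baseline = mean(untreated_pct)
    if baseline <= 0:
        raise ValueError("untreated conversion must be positive to form a ratio")
    return mean(treated_pct) / baseline


def stimulates_conversion(treated_pct: Sequence[float],
                          untreated_pct: Sequence[float],
                          min_fold: float = 1.5) -> bool:
    """Flag a candidate as enhancing conversion when the fold change reaches
    min_fold (an arbitrary illustrative threshold, not a disclosed value)."""
    return conversion_enhancement(treated_pct, untreated_pct) >= min_fold
```

In practice a statistical test across replicate wells would likely accompany or replace a fixed fold-change cutoff; the ratio is shown only to make the treated-versus-untreated comparison concrete.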

[0012] A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and claims.

DESCRIPTION OF THE DRAWINGS

[0013] Figure 1 shows that a high percentage of bHLH and POU transcription factor pairs convert fibroblasts into neuronal-like cells. (a) Schematic overview of reprogramming method. MEFs were derived from E13.5 embryos from which neural tissue was removed and cultured in vitro. MEFs were transduced with lentiviruses encoding doxycycline inducible transcription factors and rtTA to mediate induction. Doxycycline was added and maintained for 8 days. After additional days of maturation, cells were screened for expression of neuronal markers (Tuj1, TauEGFP) (day 14) or used for other experiments (day 16-24). (b) Matrix of 76 reprogramming factor pairs that produced Tuj1-positive cells from fibroblasts on day 14 post-induction. All 12 tested POU factors and the NR are included in this table but the 30 bHLH factors that did not produce candidate iNs with any POU or NR were omitted for clarity. Each box contains the average percent Tuj1-positive cells out of the total fibroblasts plated (n = 3 wells). Combinations that included bHLH factors Ascl1, Ascl2, Ngn1 and Ngn3 were normalized by subtracting the percent of Tuj1-positive cells generated from the bHLH factors alone (range 0.01-0.39%). (c) Co-labeling of Tuj1-positive candidate induced neurons with neuronal markers, TauEGFP, Map2, and Synapsin. Representative images are of cells generated with reprogramming pairs Ngn3 and Pit1 12 to 16 days post-induction. Scale bars, 100 µm. (d) Percentages of Tuj1-positive cells that co-express TauEGFP (n = 574 cells), Map2 (n = 574 cells) and Synapsin (n = 293 cells) generated with Ngn3/Pit1 (N3.P1), Ngn3/Oct4 (N3.O4), Ascl2/Brn3c (A2.B3c), NeuroD2/Brn3c (ND2.B3c) and Atoh1/Brn3c (Atoh1.B3c). Data are presented as mean ± SD from at least three independent experiments.
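The scoring described for panel (b), an average percentage over three wells with the bHLH-alone background subtracted for certain factors, can be written out as a brief calculation. The Python sketch below is illustrative: the function names are hypothetical, and the sketch makes no assumption about how negative differences were handled, since that is not stated here.

```python
from typing import Sequence


def percent_tuj1_positive(tuj1_counts: Sequence[int],
                          plated_counts: Sequence[int]) -> float:
    """Average, over wells, of Tuj1-positive cells as a percentage of the
    total fibroblasts plated in that well."""
    if len(tuj1_counts) != len(plated_counts) or not tuj1_counts:
        raise ValueError("need one plated count per well and at least one well")
    per_well = [100.0 * pos / plated for pos, plated in zip(tuj1_counts, plated_counts)]
    return sum(per_well) / len(per_well)


def background_subtracted(pair_pct: float, bhlh_alone_pct: float) -> float:
    """Subtract the percentage obtained with the bHLH factor alone from the
    percentage obtained with the bHLH/partner pair."""
    return pair_pct - bhlh_alone_pct
```

With hypothetical counts of 50, 100 and 150 Tuj1-positive cells in three wells of 10,000 plated fibroblasts each, `percent_tuj1_positive` gives 1.0%; subtracting a hypothetical bHLH-alone background of 0.25% leaves 0.75%.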

[0014] Figure 2 shows that reprogramming pairs generate iNs with distinct electrophysiological properties. (a) Representative image of a whole-cell patch clamped cell expressing Synapsin-TdTomato exhibiting neuronal morphology. TdTomato intensity in the image was artificially adjusted to simultaneously visualize neurites and the cell body. Scale bar, 25 µm. (b) Representative membrane voltage responses from a TauEGFP-, Synapsin-TdTomato positive cell with neuronal morphology generated with Ascl2/Brn3c under whole cell patch clamp conditions at max current injection (top-left) and current steps until the first induction of action potentials (top-right). Current step traces are shown below voltage traces (bottom-left). (c) Induced neurons generated with five different TF pairs exhibit current-induced action potentials. Responses for all tested cells are shown (n = 60, responders = 58). (d) Representative current trace showing mEPSCs from one TauEGFP-, Synapsin-positive cell generated with Ngn3/Oct4. (e) Representative membrane voltage responses to depolarizing current steps of TauEGFP-, Synapsin-positive cells with neuronal morphology generated, from left to right, with Ngn3/Pit1 (N3.P1), Ngn3/Oct4 (N3.O4), Ascl2/Brn3c (A2.B3c), NeuroD2/Brn3c (ND2.B3c), and Atoh1/Brn3c (Atoh1.B3c). (f) Quantification, from left to right, of resting membrane potentials, membrane input resistance, voltage sag behavior, and rheobase, for cells that exhibited current-induced action potentials per condition. Voltage sag behavior is plotted as the slope of the voltage sag vs. current relationship. Colored data points correspond to cells with traces shown: Ngn3/Pit1 (N3.P1, blue), Ngn3/Oct4 (N3.O4), Ascl2/Brn3c (A2.B3c), NeuroD2/Brn3c (ND2.B3c) and Atoh1/Brn3c (Atoh1.B3c). n = 58 cells. Data are presented as mean ± SD. ***, ** and * represent a p-value < 0.001, 0.01, and 0.05, respectively (Bonferroni’s Multiple Comparison Test).

[0015] Figure 3 shows that RNA-Seq of iN populations reveals a shared neuronal transcriptome with endogenous neurons. (a) Top ten biological process (BP level 5) gene ontology (GO) terms over-represented in the TauEGFP positive iN populations and MEFs as determined by DAVID. Input gene lists included significantly enriched genes (p-adjusted < 0.05) identified by DESeq2 when comparing the 35 duplicate iN populations (3,860 genes) to MEFs (3,508 genes). Data are presented as fold enrichment, with corresponding FDR q-values. (b) Bar plot of the number of genes shared between enriched genes of each individual endogenous and induced neuron population and the core genes.

[0016] Figure 4 shows diverse transcriptional and functional properties of iN populations generated from distinct reprogramming factor pairs. (a) Module eigengene expression (ME expression) of select WGCNA modules, depicted as bar plots of average ME expression for representative modules correlated with bHLH subclasses (M09), POU subclasses (M24), and non-linear/synergistic correlations (M25). Data are presented as mean ± SD. Colors highlight iN populations generated with shared transcription factors. Below the module number is the number of genes assigned to the module. (b) Heat map of expression of select neurotransmitter-associated genes. Expression levels of iN, EndoNs/Brain, and MEF populations are defined as DESeq2 vsd-normalized RNA-Seq counts and scaled by row. Dendrogram represents hierarchical clustering based on correlation distance. (c) Representative calcium response to 100-250 mM KCl and 1 mM glutamate (Glu) plotted as ΔF/F0 versus time. Cell was not responsive to 100 mM nicotine (Nic), and buffer alone (Buf). Also shown is representative calcium response for 100-250 mM KCl, 1 mM glutamate (Glu) and 100 mM nicotine (Nic) plotted as ΔF/F0 versus time. Cell was not responsive to buffer alone (Buf). (d) Percentages of glutamate and nicotine responsive cells out of total KCl responsive cells (n = 218 cells). Analyzed iN populations were grouped using hierarchical cluster analysis of glutamate (GluR) and nicotine receptor subunit (nAchR) expression. Group 1 consists of iN populations with the lowest overall expression of nAchRs while Group 2 consists of iN populations with the highest overall expression of nAchRs. Relative levels of GluRs were similar among groups. Populations in bold are replicate samples from independent experiments. *** represents a p-value < 0.001, ns is non-significant (unpaired Student’s t-test). Data are presented as mean ± SD. (e) Schematic of dopamine and noradrenaline biosynthesis pathway and heat map of expression of genes involved in dopamine/noradrenaline biosynthesis and re-uptake across all iN (green), EndoN (purple), and MEF (grey) populations. Expression patterns for populations generated with Ascl1/Nurr1, Ascl2/Nurr1, Ascl5/Brn3c and NeuroD2/Brn3c are outlined with a black frame. Expression levels are defined as DESeq2 vsd-normalized RNA-Seq counts with replicates averaged. Dendrogram represents hierarchical clustering based on correlation distance.

[0017] Figure 5 shows Tuj1 immunostaining of MEF- and TTF-derived iNs and the p75-depletion experiment. (a) Tuj1 immunofluorescence labeling of 35 of the 76 positive combinations that were selected for whole-transcriptome analysis. Fixed and stained on day 14-16 post-induction. (b) Tuj1 immunofluorescence labeling of conditions with individual bHLH factors Ascl1, Ascl2, Ngn1 and Ngn3. (c) Tuj1 immunofluorescence labeling of MEFs treated with only rtTA, without reprogramming factors. (d) Tuj1 immunofluorescence labeling of tail tip fibroblasts (TTFs) derived from 3-day-old mice and transduced with select reprogramming combinations following the same reprogramming methods used with MEFs. Fixed and stained on day 16 post-induction. (e) Tuj1 immunofluorescence of TTFs treated with only rtTA, without reprogramming factors, and fixed and stained on day 16 post-induction. (f) Representative FACS gates of MEFs (~180,000 cells shown). MEFs were depleted of p75-positive neural crest cells by first gating for DAPI-negative cells (not shown) and collecting only those that were p75-negative (~93% of the DAPI-negative population). (g) Quantification of immunostaining for p75 positive cells in source and p75-depleted MEF populations after expansion for 4 days post-FACS, on the day of transduction for reprogramming. Data are presented as the mean ± SD, n = 3. (h) Quantification of the percent of Tuj1 positive cells derived from source and p75-depleted MEF populations 16 days post-induction. A2, Ascl2; N3, Ngn3; ND2, NeuroD2; B3c, Brn3c; P1, Pit1. Data are presented as the mean ± SD, n = 3. Percentages between source and p75-depleted were not significantly different (p-value > 0.05, Bonferroni’s Multiple Comparison Test). Scale bars, 100 µm.

[0018] Figure 6 shows additional electrophysiological recordings of iNs from five different TF combinations. (a-e) Example voltage responses of two representative iNs from five transcription factor combinations: (a) Ngn3/Pit1, (b) Ngn3/Oct4, (c) Ascl2/Brn3c, (d) NeuroD2/Brn3c, and (e) Atoh1/Brn3c. Cells were stimulated using incrementing levels of intracellular current starting at -100 to -50 pA and reaching levels where intense firing of action potentials was observed. (f-j) Plots represent various physiological properties of the cells and steps of analysis to extract these features. (f) I-V relationship obtained by plotting the observed membrane potential as a function of the injected current of both maximal voltage deflections (black) and the membrane potential at the end of the current step (gray). Data from second Ngn3/Pit1 cell in (a). (g) Selected action potential of the second Ngn3/Oct4 cell in (b). The dual spike after hyperpolarization is indicative of Ca-dependent K-currents in this neuron. (h) Input-output curve of the number of spikes as a function of the injected current. This cell starts firing at +100 pA level (rheobase). (i) Plot of the voltage sag (darker dots) and afterdepolarization (lighter dots) as a function of the current. The NeuroD2/Brn3c cells in (d) exhibit characteristic voltage sags under negative current levels. The second NeuroD2/Brn3c cell also produces post-inhibitory rebound spikes. (j) Plot of membrane resistance versus current. Darker symbols are resistance values calculated from maximal voltage deflections and lighter symbols were obtained from voltage levels just before the termination of the current step of the second Atoh1/Brn3c cell in (e). The decrease in membrane resistance as a function of current indicates the action of potent outward rectifying K-currents. (k) Representative current traces showing mEPSCs from TauEGFP-, synapsin-positive cells generated with Ngn3/Oct4 and NeuroD2/Brn3c.
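Two of the quantities named in panels (h) and (j), rheobase and membrane resistance, follow from simple relationships that can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the analysis code used for the figure: rheobase is taken as the smallest depolarizing (positive) current step that evokes at least one spike, resistance is taken as the ratio of voltage deflection to injected current, and all function names are hypothetical.

```python
from typing import Optional, Sequence


def rheobase_pa(currents_pa: Sequence[float],
                spike_counts: Sequence[int]) -> Optional[float]:
    """Smallest positive current step (pA) that evokes one or more spikes.
    Returns None if no depolarizing step evoked a spike. Negative steps are
    ignored so that post-inhibitory rebound spikes are not counted."""
    firing = [i for i, n in zip(currents_pa, spike_counts) if i > 0 and n > 0]
    return min(firing) if firing else None


def membrane_resistance_gohm(delta_v_mv: float, delta_i_pa: float) -> float:
    """Ohm's law, R = dV / dI. With dV in mV and dI in pA the result is in
    gigaohms (1 mV / 1 pA = 1 GOhm)."""
    if delta_i_pa == 0:
        raise ValueError("current step must be non-zero")
    return delta_v_mv / delta_i_pa
```

For an input-output curve with steps of -100, -50, 0, 50, 100 and 150 pA and spike counts of 0, 0, 0, 0, 1 and 4, `rheobase_pa` returns 100, matching the way rheobase is read from panel (h). A hypothetical -50 mV deflection at -100 pA corresponds to 0.5 GΩ.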

[0019] Figure 7 shows that transcription factor pairs generate functional human induced neurons from HEFs. Tuj1 immunofluorescence labeling of human induced neurons reprogrammed from HEFs was done using mouse (Ngn3/Pit1) and human transcription factors (NGN3-P2A-PIT1), followed by quantification of Tuj1+/DAPI cells for mouse and human induced neurons derived from the Ngn3/Pit1 combination and rtTA only. (A) Representative images of human induced neurons reprogrammed from HEFs using mouse transcription factor combinations. Tuj1 immunofluorescence labeling of 15 of the 76 positive pairwise combinations derived from the unbiased mouse screen. Fixed and stained on day 16-18 post-induction. Scale bar, 50 µm. (B-E) Electrophysiological recordings were performed on human induced neurons generated with the mouse Ngn3/Pit1 combination between 26-31 days post-dox induction. (B) Representative voltage responses from a Synapsin-TdTomato positive cell with neuronal morphology. 21/27 fluorescent cells tested (77%) generated action potentials upon current injection. (C) Representative whole-cell currents evoked by hyperpolarizing and depolarizing voltage steps delivered from a holding potential of -65 mV. (D) Passive membrane properties of human induced neurons. Quantification of resting membrane potential (left), capacitance (middle), membrane resistance (right) is shown as mean ± SD (n=15). (E) Steady-state currents versus voltage in individual cells reflect the expression of depolarization-induced voltage-gated outward currents (n=9).

DETAILED DESCRIPTION

I. Overview

[0020] The transcriptional programs that establish neuronal identity evolved to produce a rich diversity of neuronal cell types that arise sequentially during development. Remarkably, transient expression of certain transcription factors (TFs) can also endow non-neural cells with neuronal properties, perhaps through convergent mechanisms. The present invention is predicated in part on the studies undertaken by the present inventors to decipher the relationship between reprogramming factors and neuronal diversity. Specifically, the inventor identified TF pairs that can produce induced neurons (iNs) with different biological properties. By linking distinct TF input “codes” to defined outputs, these studies uncovered new cell autonomous features of neuronal identity and greatly expanded the reprogramming toolbox to enable more precise engineering of neuronal subtypes for basic and translational research.

[0021] As detailed herein, the inventor and colleagues performed a large unbiased screen of ~600 pairs of TFs to determine how many could induce neuronal identity in fibroblasts. Unexpectedly, it was found that more than 12% of the TF pairs tested (76/598) could reprogram fibroblasts into induced neurons (iNs) that express key neuronal markers and exhibit neuronal morphologies. The iNs reprogrammed by each TF pair exhibit unique transcriptional patterns that predict their electrophysiological and pharmacological properties and establish their similarity with endogenous neural subtypes including cortical, hypothalamic and cholinergic neurons. These iNs are electrically active and can form synaptic connections without co-culturing with glia. Using RNA-Seq profiling of 35 iN populations and a diverse set of endogenous neuronal subtypes, a "core" set of neuronally expressed genes has been defined, along with a regulatory logic underlying their expression. These studies demonstrated that the identified gene expression dataset can be used to engineer neurons with desired patterns of gene expression, mirrored by pharmacological and electrophysiological properties. Further, patterns of gene co-regulation that depend on the inducing TF pairs were identified, which show that distinct iN populations most closely resemble different endogenous neuronal subtypes.

[0022] By defining a new set of cell autonomous transcriptional networks regulating key aspects of neuronal identity, the inducing TF dataset disclosed herein also provides a unique resource for direct reprogramming in a wide range of research and clinical applications. Consistently, the inventors also generated human iN subtypes with some of the TF pairs in vitro. It was observed that the generated human iNs possess desired functional properties and patterns of gene expression, which hold value for translational applications. Specifically, the inventors tested 15 different mouse TF pairs for their capacity to reprogram human fibroblasts and remarkably all pairs produced candidate human iNs. Importantly, the human iNs generated with mouse factors were electrically active, though parameters such as resting membrane potential suggested that more time in culture could promote maturation. The results from these studies indicate a potential for future human studies that aim to reprogram neurons from disease cohorts and diverse genetic backgrounds.

[0023] The following sections provide more detailed guidance for making and using the compositions of the invention, and for carrying out the methods of the invention.

II. Definitions

[0024] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which this invention pertains. The following references provide one of skill with a general definition of many of the terms used in this invention: Academic Press Dictionary of Science and Technology, Morris (Ed.), Academic Press (1st ed., 1992); Oxford Dictionary of Biochemistry and Molecular Biology, Smith et al. (Eds.), Oxford University Press (revised ed., 2000); Encyclopaedic Dictionary of Chemistry, Kumar (Ed.), Anmol Publications Pvt. Ltd. (2002); Dictionary of Microbiology and Molecular Biology, Singleton et al. (Eds.), John Wiley & Sons (3rd ed., 2002); Dictionary of Chemistry, Hunt (Ed.), Routledge (1st ed., 1999); Dictionary of Pharmaceutical Medicine, Nahler (Ed.), Springer-Verlag Telos (1994); Dictionary of Organic Chemistry, Kumar and Anandand (Eds.), Anmol Publications Pvt. Ltd. (2002); and A Dictionary of Biology (Oxford Paperback Reference), Martin and Hine (Eds.), Oxford University Press (4th ed., 2000). In addition, the following definitions are provided to assist the reader in the practice of the invention.

[0025] The term "agent" or "test agent" includes any substance, molecule, element, compound, entity, or a combination thereof. It includes, but is not limited to, e.g., protein, polypeptide, small organic molecule, polysaccharide, polynucleotide, and the like. It can be a natural product, a synthetic compound, or a chemical compound, or a combination of two or more substances. Unless otherwise specified, the terms "agent", "substance", and "compound" are used interchangeably herein.

[0026] The term "analog" is used herein to refer to a molecule that structurally resembles a reference molecule but which has been modified in a targeted and controlled manner, by replacing a specific substituent of the reference molecule with an alternate substituent. Compared to the reference molecule, an analog would be expected, by one skilled in the art, to exhibit the same, similar, or improved utility. Synthesis and screening of analogs, to identify variants of known compounds having improved traits (such as higher binding affinity for a target molecule), is an approach that is well known in pharmaceutical chemistry.

[0027] As used herein, "contacting" has its normal meaning and refers to combining two or more agents (e.g., polypeptides or small molecule compounds) or combining agents and cells. Contacting can occur in vitro, e.g., combining two or more agents or combining a test agent and a cell or a cell lysate in a test tube or other container. Contacting can also occur in a cell or in situ, e.g., contacting two polypeptides in a cell by coexpression in the cell of recombinant polynucleotides encoding the two polypeptides, or in a cell lysate.

[0028] Stem cells are cells characterized by the ability of self-renewal through mitotic cell division and the potential to differentiate into a tissue or an organ. Embryonic stem cells (ESCs) are pluripotent stem cells derived from the inner cell mass of a blastocyst, an early-stage embryo. ESCs are pluripotent, that is, they are able to differentiate into all derivatives of the three primary germ layers: ectoderm, endoderm, and mesoderm. These include each of the more than 220 cell types in the adult body. Pluripotency distinguishes embryonic stem cells from adult stem cells found in adults; while embryonic stem cells can generate all cell types in the body, adult stem cells are multipotent and can produce only a limited number of cell types. Additionally, under defined conditions, embryonic stem cells are capable of propagating themselves indefinitely. This allows embryonic stem cells to be employed as useful tools for both research and regenerative medicine, because they can produce limitless numbers of themselves for continued research or clinical use.

[0029] Induced pluripotent stem cells (iPSCs) are a type of pluripotent stem cell artificially derived from a non-pluripotent cell - typically an adult somatic cell - by inducing a "forced" expression of specific genes. Induced pluripotent stem cells are similar to natural pluripotent stem cells, such as embryonic stem (ES) cells, in many aspects, such as the expression of certain stem cell genes and proteins, chromatin methylation patterns, doubling time, embryoid body formation, teratoma formation, viable chimera formation, and potency and differentiability, but the full extent of their relation to natural pluripotent stem cells is still being assessed. Induced pluripotent cells have been made from adult stomach, liver, skin cells, blood cells, prostate cells and urinary tract cells.

[0030] The term "modulate" with respect to a reference molecule or cellular activity (e.g., transcription or DNA methylation) refers to inhibition or activation of a biological activity of the reference molecule or cellular activity. Modulation can be up-regulation (i.e., activation or stimulation) or down-regulation (i.e., inhibition or suppression). The mode of action can be direct, e.g., through binding to the reference molecule. The modulation can also be indirect, e.g., through binding to and/or modifying another molecule which otherwise modulates the reference molecule.

[0031] Enhanced efficiency of conversion refers to an up-regulated ability of a culture of non-neuronal cells to give rise to the induced neurons when contacted with a compound (or subjected to a genetic or epigenetic modulation) relative to a culture of the same type of cells that is not contacted with the compound (or subjected to the modulation). By enhanced, it is meant that the cell cultures have an ability to give rise to induced neurons that is greater than the ability of a population that is not contacted with the candidate agent or induction agent, e.g., 150%, 200%, 300%, 400%, 600%, 800%, 1000%, or 2000% of the ability of the uncontacted (or unmodulated) population. In other words, the cell cultures produce 1.5-fold or more, 2-fold or more, 3-fold or more, 4-fold or more, 6-fold or more, 8-fold or more, 10-fold or more, 20-fold or more, 30-fold or more, 50-fold or more, 100-fold or more, 200-fold or more the number of induced neurons as the uncontacted (or unmodulated) population.
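The relationship between the percentage figures and the fold figures in the preceding definition can be illustrated with a short calculation. The following is a minimal sketch only; the function names, the example counts, and the default 1.5-fold cutoff (taken from the lowest value recited above) are illustrative assumptions, and the sketch presumes that treated and control cultures were started from the same number of non-neuronal cells.

```python
def fold_enhancement(treated_in_count: float, control_in_count: float) -> float:
    """Ratio of induced neurons in the contacted culture to the uncontacted one.

    Hypothetical helper: assumes both cultures started from equal cell numbers.
    """
    if control_in_count <= 0:
        raise ValueError("control count must be positive")
    return treated_in_count / control_in_count


def percent_of_control(treated_in_count: float, control_in_count: float) -> float:
    """Same ratio expressed as a percentage (e.g., 3-fold is 300%)."""
    return 100.0 * fold_enhancement(treated_in_count, control_in_count)


def is_enhanced(treated_in_count: float, control_in_count: float,
                min_fold: float = 1.5) -> bool:
    """True if the contacted culture meets the chosen fold cutoff."""
    return fold_enhancement(treated_in_count, control_in_count) >= min_fold


if __name__ == "__main__":
    # Illustrative numbers only: 40 iNs without the agent, 120 iNs with it.
    print(fold_enhancement(120, 40), percent_of_control(120, 40))
```

A candidate agent would then be scored as enhancing conversion only when the ratio meets whichever cutoff is chosen for the screen.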

[0032] "Polynucleotide" or "nucleic acid sequence" refers to a polymeric form of nucleotides (polyribonucleotide or polydeoxyribonucleotide). In some instances, a polynucleotide refers to a sequence that is not immediately contiguous with either of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences. Polynucleotides can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide.

[0033] A polypeptide or protein refers to a polymer in which the monomers are amino acid residues that are joined together through amide bonds. When the amino acids are alpha-amino acids, either the L-optical isomer or the D-optical isomer can be used, the L-isomers being typical. A polypeptide or protein fragment can have the same or substantially identical amino acid sequence as the naturally occurring protein. A polypeptide or peptide having substantially identical sequence means that an amino acid sequence is largely, but not entirely, the same, but retains a functional activity of the sequence to which it is related.

[0034] Polypeptides may be substantially related due to conservative substitutions. A conservative variation denotes the replacement of an amino acid residue by another, biologically similar residue. Examples of conservative variations include the substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as the substitution of arginine for lysine, glutamic for aspartic acids, or glutamine for asparagine, and the like. Other illustrative examples of conservative substitutions include the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to arginine, glutamine, or glutamate; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; valine to isoleucine or leucine.
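The illustrative substitutions listed above can be encoded as a lookup table for checking whether a given residue change is among them. The following is a minimal sketch under stated assumptions: the one-letter residue codes, the function names, and the reading of the final entry as "valine to isoleucine or leucine" are illustrative choices, and the table covers only the illustrative list, not every substitution a skilled person might regard as conservative.

```python
# Directional table: original residue -> residues listed as conservative replacements.
CONSERVATIVE = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "C": "S",
    "Q": "N", "E": "D", "G": "P", "H": "NQ", "I": "LV",
    "L": "VI", "K": "RQE", "M": "LI", "F": "YLM", "S": "T",
    "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}


def is_listed_conservative(original: str, replacement: str) -> bool:
    """True if original -> replacement appears in the illustrative list."""
    return replacement in CONSERVATIVE.get(original, "")


def classify_differences(reference: str, variant: str):
    """Return (position, original, replacement, listed?) for each mismatch.

    Hypothetical helper: assumes two ungapped sequences of equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, r, v, is_listed_conservative(r, v))
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


if __name__ == "__main__":
    # Made-up peptide fragments, for illustration only.
    for row in classify_differences("MKALG", "MRAVG"):
        print(row)
```

Because the table is directional, a change such as lysine to glutamine is listed while glutamine to lysine is not; a symmetric table would be a different design choice.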

[0035] The bHLH transcription factors refer to a large family of transcription factors found in almost all eukaryotes that have a basic helix-loop-helix protein structural motif. These proteins bind as dimers to specific DNA target sites, and are important regulators of embryonic development, particularly in neurogenesis, myogenesis, heart development and hematopoiesis in animals. Members of the bHLH superfamily have two highly conserved and functionally distinct domains, which together make up a region of approximately 60 amino-acid residues. At the amino-terminal end of this region is the basic domain, which binds the transcription factor to DNA at a consensus hexanucleotide sequence known as the E box. Different families of bHLH proteins recognize different E-box consensus sequences. At the carboxy-terminal end of the region is the HLH domain, which facilitates interactions with other protein subunits to form homo- and hetero-dimeric complexes. Many different combinations of dimeric structures are possible, each with different binding affinities between monomers. The heterogeneity in the E-box sequence that is recognized and the dimers formed by different bHLH proteins determines how they control diverse developmental functions through transcriptional regulation. Based on evolutionary relationships, E-box binding and the presence or absence of additional domains, bHLH proteins are classified into 6 major groups, A-F.

[0036] The POU (Pit-Oct-Unc) family of transcription factors was originally defined on the basis of a common DNA binding domain in the mammalian factors Pit-1, Oct-1, and Oct-2 as well as the nematode protein Unc-86. Subsequently, a number of other POU family factors have been identified in both vertebrates and invertebrates. Many of these original and subsequently isolated members of the family have been shown to play critical roles in the development and functioning of the nervous system, e.g., the Oct-2 factor and the Brn-3 factors. The POU domain is a bipartite domain composed of two subunits separated by a non-conserved region of 15-55 aa. The N-terminal subunit is known as the POU-specific (POUs) domain, while the C-terminal subunit is a homeobox domain. POU transcription factors include at least subfamilies POU1-POU6 (or Classes I-VI). For example, Pit-1 belongs to Class I, Oct-1 and Oct-2 are members of Class II, Oct-6, Brn1, Brn2 and Brn4 are members of Class III, and Unc-86 is a member of Class IV as are Brn3a, Brn3b and Brn3c. The six classes diverged early in animal evolution: POU1, POU3, POU4, and POU6 classes evolved before the last common ancestor of sponges and eumetazoans, POU2 evolved in the Bilateria, and POU5 appears to be unique to vertebrates.

[0037] The term "subject" includes mammals, especially humans, as well as other non-human animals, e.g., horses, dogs and cats.

[0038] A "substantially identical" nucleic acid or amino acid sequence refers to a polynucleotide or amino acid sequence which comprises a sequence that has at least 75%, 80% or 90% sequence identity to a reference sequence as measured by one of the well-known programs described herein (e.g., BLAST) using standard parameters. The sequence identity is preferably at least 95%, more preferably at least 98%, and most preferably at least 99%. In some embodiments, the subject sequence is of about the same length as compared to the reference sequence, i.e., consisting of about the same number of contiguous amino acid residues (for polypeptide sequences) or nucleotide residues (for polynucleotide sequences).

[0039] Sequence identity can be readily determined with various methods known in the art. For example, the BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)). Percentage of sequence identity is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
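The percentage calculation described in the preceding paragraph can be sketched as follows. This is a minimal illustration, not a replacement for BLAST: it assumes the two sequences have already been optimally aligned by an external program and are supplied as equal-length strings in which "-" marks a gap; the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    matched positions / total positions in the window * 100.
    A position containing a gap in either sequence never counts as matched.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Made-up 9-position window: 7 matched positions, one gap, one mismatch.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))
```

The resulting value can then be compared against whichever identity threshold (e.g., 90% or 95%) applies in a given embodiment.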

[0040] As used herein, "treating" or "ameliorating" includes (i) preventing a pathologic condition (e.g., a neuronopathy) from occurring (e.g., prophylaxis); (ii) inhibiting the pathologic condition (e.g., sensory neuronopathy) or arresting its development; and (iii) relieving symptoms associated with the pathologic condition (e.g., a neuronopathy). Thus, "treatment" includes the administration of an isolated (and/or purified) iN population of the invention and/or other therapeutic compositions or agents to prevent or delay the onset of the symptoms, complications, or biochemical indicia of a disease described herein, alleviating or ameliorating the symptoms or arresting or inhibiting further development of the disease, condition, or disorder. "Treatment" further refers to any indicia of success in the treatment or amelioration or prevention of the disease, condition, or disorder described herein, including any objective or subjective parameter such as abatement; remission; diminishing of symptoms or making the disease condition more tolerable to the patient; slowing in the rate of degeneration or decline; or making the final point of degeneration less debilitating. Detailed procedures for the treatment or amelioration of the disorder or symptoms thereof can be based on objective or subjective parameters, including the results of an examination by a physician.

[0041] A "variant" of a reference molecule (e.g., a Neurogenin 1 or Neurogenin 2) is meant to refer to a molecule substantially similar in structure and biological activity to either the entire reference molecule, or to a fragment thereof. Thus, provided that two molecules possess a similar activity, they are considered variants as that term is used herein even if the composition or secondary, tertiary, or quaternary structure of one of the molecules is not identical to that found in the other, or if the sequence of amino acid residues is not identical.

[0042] A "vector" is a replicon, such as plasmid, phage or cosmid, to which another polynucleotide segment may be attached so as to bring about the replication of the attached segment. Vectors capable of directing the expression of genes encoding for one or more polypeptides are referred to as "expression vectors".

[0043] A retrovirus (e.g., a lentivirus) based vector or retroviral vector means that the genome of the vector comprises components from the virus as a backbone. The viral particle generated from the vector as a whole contains essential vector components compatible with the RNA genome, including reverse transcription and integration systems. Usually these will include the gag and pol proteins derived from the virus. If the vector is derived from a lentivirus, the viral particles are capable of infecting and transducing non-dividing cells. Recombinant retroviral particles are able to deliver a selected exogenous gene or polynucleotide sequence, such as therapeutically active genes, to the genome of a target cell.

III. Transcription factor pairs for generating induced neurons

[0044] The invention provides methods for converting a non-neuronal cell into induced neurons with diverse transcriptional profiles and functional properties. The methods involve recombinant expression of one of a number of pairs of transcription factors (TFs) delineated by the inventors. As detailed herein, one of the pair of TFs is typically a basic helix-loop-helix (bHLH) transcription factor, and the other one is a POU transcription factor or nuclear receptor (NR) Nurr1. In some exemplified embodiments, the TF pairs for inducing neurons with diverse properties are listed in Column 1 of each of Tables 1-11. The bHLH transcription factors refer to a large family of transcription factors found in almost all eukaryotes that have a basic helix-loop-helix protein structural motif. These proteins bind as dimers to specific DNA target sites, and are important regulators of embryonic development, particularly in neurogenesis, myogenesis, heart development and hematopoiesis in animals. Members of the bHLH superfamily have two highly conserved and functionally distinct domains, which together make up a region of approximately 60 amino-acid residues. At the amino-terminal end of this region is the basic domain, which binds the transcription factor to DNA at a consensus hexanucleotide sequence known as the E box. Different families of bHLH proteins recognize different E-box consensus sequences. At the carboxy-terminal end of the region is the HLH domain, which facilitates interactions with other protein subunits to form homo- and hetero-dimeric complexes. Many different combinations of dimeric structures are possible, each with different binding affinities between monomers. The heterogeneity in the E-box sequence that is recognized and the dimers formed by different bHLH proteins determines how they control diverse developmental functions through transcriptional regulation. Based on evolutionary relationships, E-box binding and the presence or absence of additional domains, bHLH proteins are classified into 6 major groups, A-F.

[0045] The POU (Pit-Oct-Unc) family of transcription factors was originally defined on the basis of a common DNA binding domain in the mammalian factors Pit-1, Oct-1, and Oct-2 as well as the nematode protein Unc-86. Subsequently, a number of other POU family factors have been identified in both vertebrates and invertebrates. Many of these original and subsequently isolated members of the family have been shown to play critical roles in the development and functioning of the nervous system, e.g., the Oct-2 factor and the Brn-3 factors. The POU domain is a bipartite domain composed of two subunits separated by a non-conserved region of 15-55 aa. The N-terminal subunit is known as the POU-specific (POUs) domain, while the C-terminal subunit is a homeobox domain. POU transcription factors include at least subfamilies POU1-POU6 (or Classes I-VI). For example, Pit-1 belongs to Class I, Oct-1 and Oct-2 are members of Class II, Oct-6, Brn1, Brn2 and Brn4 are members of Class III, and Unc-86 is a member of Class IV as are Brn3a, Brn3b and Brn3c. The six classes diverged early in animal evolution: POU1, POU3, POU4, and POU6 classes evolved before the last common ancestor of sponges and eumetazoans, POU2 evolved in the Bilateria, and POU5 appears to be unique to vertebrates.

[0046] As detailed in the Examples below, the specific bHLH TFs in the various pairs of TFs capable of converting non-neuronal cells into different neuron subtypes include Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7, Atoh8, Myf5 and Ptf1a. The other TF in the pairs of TFs delineated by the inventors includes POU TFs Brn3c, Brn3a, Brn3b, Brn2, Brn4, Brn1, Oct6, Pit1, and Oct4, as well as NR TF Nurr1. In some embodiments, the employed TF pair are Brn3c and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn1, Ngn2, Ngn3, ND1, ND2, ND6, Atoh1, Atoh7 and Myf5. In some embodiments, the employed TF pair are Brn3a and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ascl4, Ascl5, Ngn3, ND1, ND2, ND4, ND6, Atoh1, Atoh7 and Atoh8. In some other embodiments, the employed TF pair are Brn3b and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2, Atoh1, Myf5 and Ptf1a. In some other embodiments, the employed TF pair are Brn2 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1. In still other embodiments, the employed TF pair are Brn4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1. In some embodiments, the employed TF pair are Brn1 and a bHLH TF selected from the group consisting of Ngn1, Ngn2, Ngn3 and ND1. In some other embodiments, the employed TF pair are Oct6 and a bHLH TF selected from the group consisting of Ascl1, Ngn1, Ngn2, Ngn3 and ND1. In some embodiments, the employed TF pair are Pit1 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3 and ND1. In some other embodiments, the employed TF pair are Oct4 and a bHLH TF selected from the group consisting of Ascl1, Ascl2, Ngn1, Ngn2, Ngn3, ND1, ND2 and Atoh1. In some embodiments, the employed TF pair are Nurr1 and bHLH TF Ascl2.
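The pairings enumerated in the preceding paragraph can be captured in a simple lookup structure, which makes it straightforward to list every pair or to check whether a candidate pair is among those recited. The following is a minimal sketch only: the variable and function names are hypothetical, and the factor names are written in their conventional spellings (e.g., Brn3c, Ascl1, Ngn1, Pit1, Nurr1, Ptf1a) on the assumption that forms such as "Bm3c" or "Ascll" in the extracted text are character-recognition artifacts.

```python
# POU or NR partner -> bHLH factors recited as forming a pair with it.
TF_PAIRS = {
    "Brn3c": ["Ascl1", "Ascl2", "Ascl4", "Ascl5", "Ngn1", "Ngn2", "Ngn3",
              "ND1", "ND2", "ND6", "Atoh1", "Atoh7", "Myf5"],
    "Brn3a": ["Ascl1", "Ascl2", "Ascl4", "Ascl5", "Ngn3", "ND1", "ND2",
              "ND4", "ND6", "Atoh1", "Atoh7", "Atoh8"],
    "Brn3b": ["Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2",
              "Atoh1", "Myf5", "Ptf1a"],
    "Brn2": ["Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2", "Atoh1"],
    "Brn4": ["Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1"],
    "Brn1": ["Ngn1", "Ngn2", "Ngn3", "ND1"],
    "Oct6": ["Ascl1", "Ngn1", "Ngn2", "Ngn3", "ND1"],
    "Pit1": ["Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1"],
    "Oct4": ["Ascl1", "Ascl2", "Ngn1", "Ngn2", "Ngn3", "ND1", "ND2", "Atoh1"],
    "Nurr1": ["Ascl2"],  # nuclear receptor partner rather than a POU factor
}


def all_pairs():
    """Flatten the table into (partner, bHLH) tuples."""
    return [(partner, bhlh)
            for partner, bhlhs in TF_PAIRS.items()
            for bhlh in bhlhs]


def is_listed_pair(tf_a: str, tf_b: str) -> bool:
    """Order-insensitive check of whether two factors form a recited pair."""
    return tf_b in TF_PAIRS.get(tf_a, ()) or tf_a in TF_PAIRS.get(tf_b, ())


if __name__ == "__main__":
    print(len(all_pairs()), "pairs enumerated in this paragraph")
    print(is_listed_pair("Ngn3", "Pit1"))
```

Keying the table by the POU/NR partner mirrors the way the paragraph is organized; a table keyed by bHLH factor would serve equally well for lookups.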

[0047] The various TFs described herein are all well known, and have been structurally and functionally characterized. Genomic and cDNA sequences of these genes are all known in the art. For example, sequences of various human TFs described herein have been described in the literature. See, e.g., Xiang et al., J. Neurosci. 15, 4762-4785, 1995 (Brn3c, Accession No. NP_002691.1), Collum et al., Nucleic Acids Res. 20, 4919-4925, 1992 (Brn3a, Accession No. NP_006228.3), Ring & Latchman, Nucleic Acids Res. 21, 2946, 1993 (Brn3b, Accession No. NP_004566.2), He et al., Nature 340 (6228), 35-41, 1989 (Brn2, Accession No. NP_005595.2; Brn1, Accession No. NP_006227.1), Douville et al., Mamm. Genome 5 (3), 180-182, 1994 (Brn4, Accession No. NP_000298.3), Gong et al., Exp. Hematol. 30 (10), 1162-1169, 2002 (Oct6, Accession No. NR_149116.2), O'Hara et al., Cell Growth Differ. 1, 119-127, 1990 (Pit1, Accession No. NP_005406.3), Schoorlemmer & Kruijer, Mech. Dev. 36, 75-86, 1991; and Takeda et al., Nucleic Acids Res. 20, 4613-4620, 1992 (Oct4, Accession No. NP_002692.2 or NP_976034.4), Mages et al., Mol. Endocrinol. 8, 1583-1591, 1994 (Nurr1, Accession No. NP_006177.1), Ball et al., Proc. Natl. Acad. Sci. U.S.A. 90, 5648-5652, 1993 (Ascl1, Accession No. NP_004307.2), Miyamoto et al., Cytogenet. Cell Genet. 73, 312-314, 1996 (Ascl2, Accession No. NP_005161.1), McLellan et al., Gene Expr. Patterns 2, 329-335, 2002 (Ascl4, Accession No. NP_982260.2), McLellan et al., Mech. Dev. 119, S285-S291, 2002 (Ascl5, Accession No. Q7RTU5.2 or NP_001257530), McCormick et al., Mol. Cell. Biol. 16, 5792-5800, 1996 (Ngn1, Accession No. NP_006152.2), Gradwohl et al., Dev. Biol. 180, 227-241, 1996 (Ngn2, Accession No. NP_076924.1), Sommer et al., Mol. Cell. Neurosci. 8, 221-241, 1996 (Ngn3, Accession No. NP_066279.2), Yokoyama et al., DNA Res. 3, 311-320, 1996 (ND1, Accession No. NP_002491), McCormick et al., Mol. Cell. Biol. 16, 5792-5800, 1996 (ND2, Accession No. NP_006151.3), Horikawa et al., Diabetes 49, 1955-1957, 2000 (ND4, Accession No. NP_067014.2), Guo et al., J. Genet. 81, 13-17, 2002 (ND6, Accession No. NP_073565.2), Ben-Arie et al., Hum. Mol. Genet. 5, 1207-1216, 1996 (Atoh1, Accession No. NP_005163.1), Brown et al., Mamm. Genome 13, 95-101, 2002 (Atoh7, Accession No. NP_660161.1), Talmud et al., Am. J. Hum. Genet. 85, 628-642, 2009 (Atoh8, Accession No. NP_116216.2), Braun et al., EMBO J. 8, 701-709, 1989 (Myf5, Accession No. NP_005584.2), and Roux et al., Genes Dev. 3, 1613-1624, 1989 (Ptf1a, Accession No. NP_835455.1). In addition to human TFs, the methods of the invention may also employ orthologs of these TFs from other animal species. As exemplified herein for mouse orthologs, these non-human TFs can also be obtained via routinely practiced cloning techniques based on their sequences that have been characterized in the art.

[0048] Other than the wildtype genes (or cDNA sequences) of the pair of TFs described above, variants or functional derivatives of such genes (or cDNA sequences) may also be used in the invention. Thus, methods of the invention can utilize a variant or modified sequence of a given TF that is substantially identical to its wildtype counterpart, e.g., conservatively modified variants. For example, the substantially identical variants should contain a sequence that is at least 80%, 90%, 95% or 99% identical to the wildtype sequence. In some embodiments, the functional derivatives are variants produced by non-conservative substitutions to the extent that they substantially retain the activities of the native proteins. Modification to a polynucleotide encoding a polypeptide of interest can be performed with standard techniques routinely practiced in the art. In some other embodiments, the functional derivatives can contain a partial sequence of the wildtype gene or cDNA sequence of the TF. Such partial sequence should encode a functional fragment that possesses some or all of the cellular functions of the wildtype protein, e.g., activities in regulating neurogenesis. Cellular functions (e.g., transcriptional regulation) of many of the TFs described herein have been characterized in the art. Based on their structural and functional information known in the art, cloning and expression of functional fragments of these transcription factors can be readily carried out via standard techniques of molecular biology. In addition, the functional derivatives of the TFs described herein can be subject to the screening methods described below to confirm their activities in promoting formation of induced neurons.

IV. Non-neuronal cells for generating induced specific subtypes of neurons

[0049] Various non-neuronal cells can be employed in the present invention for generating induced neurons with distinct phenotypes or cellular functions. These include fibroblasts, stem cells, blood cells, and other non-neuronal somatic cells. The non-neuronal cells can be obtained from both human and non-human animals including vertebrates and mammals. Thus, other than human cells and mouse cells as exemplified herein, the cells can also be from other animal species such as bovine, ovine, porcine, canine, feline, avian, bony and cartilaginous fish, rats, other primates including monkeys, as well as other animals such as ferrets, sheep, rabbits and guinea pigs.

[0050] In general, non-neuronal cells suitable for the methods of the invention can be any somatic cells that would not give rise to a neuron in the absence of experimental manipulation. Examples of such non-neuronal somatic cells include differentiating or differentiated cells from ectodermal (e.g., keratinocytes), mesodermal (e.g., fibroblasts), endodermal (e.g., pancreatic cells), or neural crest lineages (e.g., melanocytes). The somatic cells may be, for example, pancreatic beta cells, glial cells (e.g., oligodendrocytes, astrocytes), hepatocytes, hepatic stem cells, cardiomyocytes, skeletal muscle cells, smooth muscle cells, hematopoietic cells, osteoclasts, osteoblasts, pericytes, vascular endothelial cells, Schwann cells, dermal fibroblasts, and the like. They may be terminally differentiated cells, or they may be capable of giving rise to cells of a specific, non-neuronal lineage, e.g., cardiac stem cells, hepatic stem cells, and the like. The somatic cells are readily identifiable as non-neuronal by the absence of neuronal-specific markers that are well-known in the art, as described herein. Of interest are cells that are vertebrate cells, e.g., mammalian cells, such as human cells, including adult human cells.

[0051] Some preferred embodiments of the invention utilize fibroblasts to generate iNs. These include both embryonic fibroblasts and adult fibroblasts. The fibroblasts can be obtained or derived from various animal (e.g., mammal) species, e.g., human, mouse and rat. In some embodiments, stem cells can be used for conversion into iNs. Stem cells suitable for practicing the invention include, but are not limited to, hematopoietic stem cells (HSC), embryonic stem cells, mesenchymal stem cells, and also induced pluripotent stem cells (iPSCs). For example, the methods can employ an iPSC that is derived from a peripheral blood mononuclear cell or a lymphoblastoid cell line (LCL). Still other embodiments of the invention can utilize somatic cells other than fibroblasts, such as blood cells. In these embodiments, blood cells obtained from various organs including, e.g., liver, spleen, bone marrow and the lymphatic system, may all be employed in the practice of the invention. In addition, methods of the invention may also be used for generating iNs from peripheral blood cells such as erythrocytes, leukocytes and thrombocytes. In some other embodiments, the employed non-neuronal somatic cells can be glial cells (glia). Glia or glial cells refer to non-neuronal cells found in close contact with neurons, and encompass a number of different cells, including but not limited to the microglia, macroglia, neuroglia, astrocytes, astroglia, oligodendrocytes, ependymal cells, radial glia, Schwann cells, satellite cells, and enteric glial cells.

V. Expressing transcription factor pairs in non-neuronal cells

[0052] Recombinant expression of a TF pair described herein, or of functional variants thereof, in a non-neuronal cell can be carried out in accordance with the methods exemplified below and/or other methods well known in the art. Preferably, genes or polynucleotide sequences encoding the TFs are transiently expressed in the non-neuronal cell. This can be accomplished by cloning the genes (genomic or cDNA sequences) into expression vector(s) and then introducing the expression vector(s) into the target non-neuronal cells. The two genes can be cloned into and expressed separately from two expression vectors, as exemplified herein. Alternatively, the two TF-encoding sequences can be co-expressed from the same vector. Preferably, the genes are cloned into retroviruses or retroviral vectors (e.g., lentiviral vectors) for transduction into the non-neuronal cells. As demonstrated in the Examples below, recombinant retroviral vectors expressing the different TF pairs can be readily constructed by inserting the TF-encoding sequences operably into the vector, replicating the vector in an appropriate packaging cell as described herein, obtaining viral particles produced therefrom, and then infecting target non-neuronal cells (e.g., fibroblasts) with the recombinant viruses. In some embodiments, two recombinant viral particles, each expressing one of the two TFs in a given TF pair, are pooled before infecting the target non-neuronal cells.

[0053] Cloning polynucleotide sequences (e.g., cDNA sequences) encoding one or both of the TF pair into expression vectors and expressing the TF pair in non-neuronal cells can be performed using the specific protocols described herein and methods routinely practiced in the art, e.g., as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, N.Y. (3rd ed., 2000); and Brent et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (ringbound ed., 2003). Detailed procedures for cloning genes of interest into lentiviral vectors, producing lentiviruses in packaging cells (e.g., 293T cells), and infecting host cells with the viruses for expression of the TF pair are also described in the art, e.g., Boland et al., Nature 461, 91-94, 2009; and Blanchard et al., Nat. Neurosci. 18, 25-35, 2015. Unless otherwise stated, other procedures or steps required for practicing the present invention can be based on standard procedures as described, e.g., in Murray et al., Gene Transfer and Expression Protocols, The Humana Press Inc. (1991); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (1986); Methods in Enzymology: Guide to Molecular Cloning Techniques, Vol. 152, S. L. Berger and A. R. Kimmel, Eds., Academic Press Inc., San Diego, USA (1987); Current Protocols in Protein Science (CPPS) (John E. Coligan et al., ed., John Wiley and Sons, Inc.); Current Protocols in Cell Biology (CPCB) (Juan S. Bonifacino et al., ed., John Wiley and Sons, Inc.); Culture of Animal Cells: A Manual of Basic Technique by R. Ian Freshney, Wiley-Liss, 5th edition (2005); and Animal Cell Culture Methods (Methods in Cell Biology, Vol. 57, Jennie P. Mather and David Barnes, editors, Academic Press, 1st edition, 1998).

[0054] In some embodiments, expression of the TF encoding sequences in the non-neuronal cells is controlled temporally. Temporal expression of these genes can be achieved via, e.g., the use of an inducible expression system. Any inducible expression method can be employed in the practice of the present invention. For example, the expression vectors can incorporate an inducible promoter that is active under environmental or developmental regulation, e.g., doxycycline (dox)-inducible lentiviral vectors. As exemplified herein, the genes can be expressed under the control of a promoter that is activated when bound by a reverse tetracycline transactivator (rtTA) and contacted by doxycycline, tetracycline, or a tetracycline analog. The tTA protein is created by fusing one protein, TetR (tetracycline repressor), found in Escherichia coli bacteria, with the activation domain of another protein, VP16, found in the Herpes Simplex Virus. The resulting tTA protein is able to bind to DNA at specific TetO operator sequences. In the inducible expression system, several repeats of the TetO sequences are placed upstream of a minimal promoter such as the CMV promoter. The entirety of several TetO sequences with a minimal promoter is called a tetracycline response element (TRE), because it responds to binding of the tetracycline transactivator protein tTA by increased expression of the gene or genes downstream of its promoter. Typically, in addition to the expression vector(s) under the control of TetO for expressing the intended exogenous genes (e.g., the TF encoding sequences), another vector for expressing a reverse tet transactivator is included in the inducible expression system.

[0055] The inducible expression system allows optimization of expression of the TF encoding sequences in the non-neuronal cells to levels and durations that are appropriate for the neurons to mature. The inducible expression system may also allow for higher and/or more prolonged expression of the TFs compared to non-inducible expression systems. In some preferred embodiments of the invention, induction of the TF pair expression can last for at least about 2 days, 4 days, 8 days, or 12 days, e.g., between 2-8, 4-10, 6-12, 8-14, 10-20, 12-30, or 15-40 days. The period of induction refers to the period from initial expression of the TF encoding sequences (or induction of expression with addition of doxycycline as exemplified herein) to the time the iNs are selected (or termination of induction). Following introduction of the expression vectors under the control of TetO, expression of the TF encoding sequences can be induced in the non-neuronal cells with tetracycline, doxycycline, or another tetracycline analog, and the cells can be cultured and selected for iNs. Detailed protocols for inducible expression of exogenous genes in a host cell, e.g., using the rtTA/TetO system exemplified herein, are well known in the art (e.g., WO 2011005580).

[0056] Retroviruses are a group of single-stranded RNA viruses characterized by an ability to convert their RNA to double-stranded DNA in infected cells by a process of reverse-transcription. The resulting DNA then stably integrates into cellular chromosomes as a provirus and directs synthesis of viral proteins. The integration results in the retention of the viral gene sequences in the recipient cell and its descendants. The retroviral genome contains three genes, gag, pol, and env that code for capsid proteins, polymerase enzyme, and envelope components, respectively. A sequence found upstream from the gag gene contains a signal for packaging of the genome into virions. Two long terminal repeat (LTR) sequences are present at the 5' and 3' ends of the viral genome. These elements contain strong promoter and enhancer sequences and are also required for integration in the host cell genome.

[0057] Retroviral vectors or recombinant retroviruses are widely employed in gene transfer in various therapeutic or industrial applications. For example, gene therapy procedures have been used to correct acquired and inherited genetic defects, and to treat cancer or viral infection in a number of contexts. The ability to express artificial genes in humans facilitates the prevention and/or cure of many important human diseases, including many diseases which are not amenable to treatment by other therapies. For a review of gene therapy procedures, see Anderson, Science 256:808-813, 1992; Nabel & Felgner, TIBTECH 11:211-217, 1993; Mitani & Caskey, TIBTECH 11:162-166, 1993; Mulligan, Science 926-932, 1993; Dillon, TIBTECH 11:167-175, 1993; Miller, Nature 357:455-460, 1992; Van Brunt, Biotechnology 6:1149-1154, 1998; Vigne, Restorative Neurology and Neuroscience 8:35-36, 1995; Kremer & Perricaudet, British Medical Bulletin 51:31-44, 1995; Haddada et al., in Current Topics in Microbiology and Immunology (Doerfler & Bohm eds., 1995); and Yu et al., Gene Therapy 1:13-26, 1994.

[0058] To construct lentiviral or retroviral vectors for transient expression of the TF encoding sequences, a polynucleotide encoding one or both of the genes is inserted into the viral genome in the place of certain viral sequences to produce a viral construct that is replication-defective. In order to produce virions, a producer host cell or packaging cell line is employed. The host cell usually expresses the gag, pol, and env genes but without the LTR and packaging components. When the recombinant viral vector containing the gene of interest together with the retroviral LTR and packaging sequences is introduced into this cell line (e.g., by calcium phosphate precipitation), the packaging sequences allow the RNA transcript of the recombinant vector to be packaged into viral particles, which are then secreted into the culture media. The media containing the recombinant retroviruses is then collected, optionally concentrated, and used for transducing host cells (e.g., fibroblasts or stem cells) in gene transfer applications.

[0059] Suitable host or producer cells for producing recombinant lentiviruses or retroviruses (or viral vectors) according to the invention are well known in the art (e.g., 293T cells exemplified herein). Many retroviruses have already been split into replication-defective genomes and packaging components. For other retroviruses, vectors and corresponding packaging cell lines can be generated with methods routinely practiced in the art. The producer cell typically encodes the viral components not encoded by the vector genome, such as the gag, pol and env proteins. The gag, pol and env genes may be introduced into the producer cell and stably integrated into the cell genome to create a packaging cell line. The retroviral vector genome is then introduced into the packaging cell line by transfection or transduction to create a stable cell line that has all of the DNA sequences required to produce a retroviral vector particle. Another approach is to introduce the different DNA sequences that are required to produce a retroviral vector particle, e.g., the env coding sequence, the gag-pol coding sequence and the defective retroviral genome, into the cell simultaneously by transient triple transfection. Alternatively, both the structural components and the vector genome can all be encoded by DNA stably integrated into a host cell genome.

[0060] The methods of the invention can be practiced with various retroviral vectors and packaging cell lines well known in the art. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739, 1992; Johann et al., J. Virol. 66:1635-1640, 1992; Sommerfelt et al., Virol. 176:58-59, 1990; Wilson et al., J. Virol. 63:2374-2378, 1989; Miller et al., J. Virol. 65:2220-2224, 1991; and PCT/US94/05700). Particularly suitable for the present invention are lentiviral vectors. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Lentiviral vectors have been employed in gene therapy for a number of diseases. For example, hematopoietic gene therapies using lentiviral vectors or gamma retroviral vectors have been used for X-linked adrenoleukodystrophy and beta thalassaemia. See, e.g., Kohn et al., Clin. Immunol. 135:247-54, 2010; Cartier et al., Methods Enzymol. 507:187-198, 2012; and Cavazzana-Calvo et al., Nature 467:318-322, 2010. Methods of the invention can be readily applied in gene therapy or gene transfer with such vectors. In some other embodiments, other retroviral vectors can be used in the practice of the methods of the invention. These include, e.g., vectors based on human foamy virus (HFV) or other viruses in the Spumavirus genus.

[0061] In particular, a number of viral vector approaches are currently available for gene transfer in clinical trials, with retroviral vectors by far the most frequently used system. All of these viral vectors utilize approaches that involve complementation of defective vectors by genes inserted into helper cell lines to generate the transducing agent. pLASN and MFG-S are examples of retroviral vectors that have been used in clinical trials (Dunbar et al., Blood 85:3048-305 (1995); Kohn et al., Nat. Med. 1:1017-102 (1995); Malech et al., Proc. Natl. Acad. Sci. USA 94(22):12133-12138 (1997)). PA317/pLASN was the first therapeutic vector used in a gene therapy trial (Blaese et al., Science 270:475-480, 1995). Transduction efficiencies of 50% or greater have been observed for MFG-S packaged vectors (Ellem et al., Immunol. Immunother. 44:10-20, 1997; Dranoff et al., Hum. Gene Ther. 1:111-2, 1997). Many producer cell lines or packaging cell lines for transfecting retroviral vectors and producing viral particles are also known in the art. The producer cell to be used in the invention need not be derived from the same species as that of the target cell (e.g., human target cell). Instead, producer or packaging cell lines suitable for the present invention include cell lines derived from human (e.g., HEK 293 cell or 293T cell), monkey (e.g., COS-1 cell), mouse (e.g., NIH 3T3 cell) or other species (e.g., canine). Some of the cell lines are disclosed in the Examples below. Additional examples of retroviral vectors and compatible packaging cell lines for producing recombinant retroviruses in gene transfers are reported in, e.g., Markowitz et al., Virol. 167:400-6, 1988; Meyers et al., Arch. Virol. 119:257-64, 1991 (for spleen necrosis virus (SNV)-based vectors such as vSN02l); Davis et al., Hum. Gene Ther. 8:1459-67, 1997 (the “293-SPA” cell line); Povey et al., Blood 92:4080-9, 1998 (the “1MI-SCF” cell line); Bauer et al., Biol. Blood Marrow Transplant. 4:119-27, 1998 (canine packaging cell line “DA”); Gerin et al., Hum. Gene Ther. 10:1965-74, 1999; Sehgal et al., Gene Ther. 6:1084-91, 1999; Gerin et al., Biotechnol. Prog. 15:941-8, 1999; McTaggart et al., Biotechnol. Prog. 16:859-65, 2000; Reeves et al., Hum. Gene Ther. 11:2093-103, 2000; Chan et al., Gene Ther. 8:697-703, 2001; Thaler et al., Mol. Ther. 4:273-9, 2001; Martinet et al., Eur. J. Surg. Oncol. 29:351-7, 2003; and Lemoine et al., J. Gene Med. 6:374-86, 2004. Any of these and other retroviral vectors and packaging or producer cell lines can be used in the practice of the present invention.

[0062] Many of the retroviral vectors and packaging cell lines used for gene transfer in the art can be obtained commercially. For example, a number of retroviral vectors and compatible packaging cell lines are available from Clontech (Mountain View, CA). Examples of lentiviral based vectors include, e.g., pLVX-Puro, pLVX-IRES-Neo, pLVX-IRES-Hyg, and pLVX-IRES-Puro. Corresponding packaging cell lines are also available, e.g., the Lenti-X 293T cell line. In addition to lentiviral based vectors and packaging systems, other retroviral based vectors and packaging systems are also commercially available. These include MMLV based vectors pQCXIN, pQCXIQ and pQCXIH, and compatible producer cell lines such as the HEK 293 based packaging cell lines GP2-293, EcoPack 2-293 and AmphoPack 293, as well as the NIH/3T3-based packaging cell line RetroPack PT67. Any of these and other retroviral vectors and producer cell lines may be employed in the practice of the present invention.

VI. Generating diverse subtypes of induced neurons with different TF pairs

[0063] The invention provides methods for obtaining different induced neuron subtypes that have diverse transcription profiles and biological functions. For example, some embodiments of the invention are directed to converting non-neuronal cells into cortical neurons. Cortical neurons are the main functional cell type of the brain's cerebral cortex. They mediate perception and communication in the brain, and also effect musculoskeletal control, the basis of voluntary movement such as walking. Cortical neurons communicate with each other through chemical and electrical signaling, and often use molecules called neurotransmitters to send messages at junctions called synapses. As demonstrated herein, induced cortical neurons can be generated from non-neuronal cells via recombinant co-expression of TF pairs such as Oct4/Ngn3. Some other embodiments of the invention are directed to converting non-neuronal cells into hypothalamic neurons. The hypothalamus of the brain controls body temperature, hunger, thirst, fatigue, and circadian cycles. Hypothalamic neurons orchestrate many essential physiological and behavioral processes via secreted neuropeptides, and are relevant to human diseases such as obesity, narcolepsy and infertility. These neurons could form the basis of cellular models, chemical screens or cellular therapies to study and treat common human diseases. As described herein, induced hypothalamic neurons can be generated from non-neuronal cells via recombinant co-expression of TF pairs such as Brn3c/Ascl5 or Nurr1/Ascl1. Some embodiments of the invention are directed to converting non-neuronal cells into cholinergic neurons. Cholinergic neurons are nerve cells which mainly use the neurotransmitter acetylcholine (ACh) to send their messages. Many neurological systems are cholinergic. Cholinergic neurons provide the primary source of acetylcholine to the cerebral cortex, and promote cortical activation during both wakefulness and rapid eye movement sleep. Cholinergic neurons have been implicated in aging and neural degradation, specifically in connection with Alzheimer's disease. The dysfunction and loss of basal forebrain cholinergic neurons and their cortical projections are among the earliest pathological events in Alzheimer's disease. As described herein, induced cholinergic neurons can be generated from non-neuronal cells via recombinant co-expression of TF pairs such as Nurr1/Ascl2.

[0064] In various other embodiments, the invention provides methods of employing a number of TF pairs to induce neurons with diverse properties. The properties of the induced neurons are characterized by their neuron marker expression profiles as set forth in Tables 1-11. These tables respectively list expression profiles in mouse embryonic fibroblasts (MEFs) of various genes in 11 families of neuron markers (neuronal receptors and functional genes) in response to each of 36 exogenously introduced TF pairs (as listed in Column 1 of each table). Different members of each neuron marker family are shown in Row 1 in each table. A number “1” in the tables represents positive expression, and a number “0” indicates negative expression. Thus, the phenotype of neurons induced by each of the 36 TF pairs is represented by an expression profile consisting of positive expression of genes in each of the 11 neuron marker families. In some embodiments, the invention provides methods of using each of the listed TF pairs to obtain induced neurons with the combination of expression profiles set forth in the corresponding row of each of Tables 1-11. By way of exemplification, one would employ TF pair A1/B2 to obtain neurons expressing (1) those positive expression Glut-R family receptors as shown in Row 2 of Table 1; (2) those positive expression GABA-R family receptors as shown in Row 2 of Table 2; (3) those positive expression Serotonin-R family receptors as shown in Row 2 of Table 3; (4) those positive expression Dopamine-R family receptors as shown in Row 2 of Table 4; (5) those positive expression Acetylcholine-R family receptors as shown in Row 2 of Table 5; (6) those positive expression Histamine-R family receptors as shown in Row 2 of Table 6; (7) those positive expression Orexin-R family receptors as shown in Row 2 of Table 7; (8) those positive expression Somatostatin-R family receptors as shown in Row 2 of Table 8; (9) those positive expression Adenosine-R family receptors as shown in Row 2 of Table 9; (10) those positive expression Adrenaline-R family receptors as shown in Row 2 of Table 10; and (11) those positive expression Npy-R family receptors as shown in Row 2 of Table 11.
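The row-wise reading of Tables 1-11 described above can be sketched in Python as a lookup over binary expression calls. This is only an illustration: the marker names and the 0/1 values below are hypothetical placeholders, not entries from the tables, and the data layout and the function name `positive_markers` are assumptions made for the sketch.

```python
from typing import Dict, List

# Layout assumed for illustration: family -> TF pair -> marker -> call,
# where 1 means positive expression and 0 means negative expression.
ExpressionTables = Dict[str, Dict[str, Dict[str, int]]]

# Placeholder values only; these are NOT taken from Tables 1-11.
TABLES: ExpressionTables = {
    "Glut-R": {"A1/B2": {"glut_marker_1": 1, "glut_marker_2": 0}},
    "GABA-R": {"A1/B2": {"gaba_marker_1": 0, "gaba_marker_2": 1}},
}


def positive_markers(tables: ExpressionTables, tf_pair: str) -> Dict[str, List[str]]:
    """Collect, per marker family, the markers called positive for one TF pair."""
    profile: Dict[str, List[str]] = {}
    for family, rows in tables.items():
        row = rows.get(tf_pair, {})  # empty if the pair is absent from a table
        profile[family] = sorted(m for m, call in row.items() if call == 1)
    return profile


profile = positive_markers(TABLES, "A1/B2")
print(profile)
```

With the full tables loaded in this layout, the same call would return the combined 11-family profile for any listed TF pair.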

Table 1. Expression of members of Glut-R receptor family in response to TF pairs

Table 2. Expression of members of GABA-R receptor family in response to TF pairs

Table 3. Expression of members of Serotonin-R receptor family in response to TF pairs

Table 4. Expression of members of Dopamine-R receptor family in response to TF pairs

Table 5. Expression of members of Acetylcholine-R receptor family in response to TF pairs

Table 6. Expression of members of Histamine-R receptor family in response to TF pairs

Table 7. Expression of members of Orexin (Hypocretin)-R receptor family in response to TF pairs

Table 8. Expression of members of Somatostatin-R receptor family in response to TF pairs

Table 9. Expression of members of Adenosine-R receptor family in response to TF pairs

Table 10. Expression of members of Adrenaline-R receptor family in response to TF pairs

Table 11. Expression of members of Npy-R receptor family in response to TF pairs

[0065] To generate a specific subtype of induced neurons (iNs) with desired properties (e.g., cortical, hypothalamic or cholinergic neurons) from non-neuronal cells, methods of the invention entail introducing into a non-neuronal cell (e.g., fibroblast) sequences (genomic or cDNA sequences) encoding the corresponding TFs as described herein. Generation of vector(s) (e.g., lentiviral vectors) for recombinantly expressing the TF pair in a chosen non-neuronal cell, preparation of viral particles harboring the expression vectors, introduction of the viral particles into the non-neuronal cell, and selection of induced neurons can all be performed in accordance with the methods described herein.

VII. Therapeutic and industrial applications

[0066] Induced neurons (iNs) produced in accordance with the present invention can find various therapeutic and non-therapeutic applications. The various neuron subtypes, including those with the receptor expression profiles set forth in Tables 1-11, can be used in a clinical setting to treat many different diseases or conditions in which the biological function of a given neuron subtype is impaired or compromised. The subjects suitable for treatment with methods of the invention can be neonatal, juvenile or fully mature adults. In some embodiments, the subjects to be treated are neonatal subjects suffering from a disease or disorder noted above. In some preferred embodiments, the subjects are human, and the iNs to be used in the treatment are human cells, preferably autologous cells isolated from the same subject to be treated. In the various therapeutic applications of the invention, an iN population is typically first generated with non-neuronal cells (e.g., fibroblasts or glial cells) from the subject in need of treatment, using methods described herein. The iN population can then be transferred to, or close to, an injured site in the subject. Alternatively, the cells can be introduced to the subject in a manner allowing the cells to migrate, or home, to the injured site. The transferred cells may advantageously replace the damaged or injured cells and allow improvement in the overall condition of the subject. In some instances, the transferred cells may stimulate tissue regeneration or repair. In some embodiments, the induced neurons may be transplanted directly to an injured site to treat a neuronopathy or neurological condition. The iN replacement therapies can be performed with protocols well known in the art for cell transplantation. See, e.g., Morizane et al., Cell Tissue Res., 331(1):323-326, 2008; Coutts and Keirstead, Exp. Neurol., 209(2):368-377, 2008; and Goswami and Rao, Drugs, 10(10):713-719, 2007. Other techniques and specific procedures for carrying out therapeutic methods of the invention can be based on or modified from methods well known in the art. See, e.g., Areman et al., Cellular Therapy: Principles, Methods, and Regulations, American Association of Blood Banks (AABB), 1st ed., 2009; Wingard et al., Hematopoietic Stem Cell Transplantation: A Handbook for Clinicians, American Association of Blood Banks (AABB), 1st ed., 2009; and Remington: The Science and Practice of Pharmacy, Mack Publishing Co., 20th ed., 2000.

[0067] Other than therapeutic applications, iNs generated with methods of the invention can be used as a basic research or drug discovery tool. Some embodiments of the invention are directed to identifying agents or modulations that can promote formation of iNs from non-neuronal cells, as detailed below. Some other embodiments of the invention are directed to identifying compounds that are capable of modulating (e.g., inhibiting or enhancing) the biological function of a specific neuron subtype (e.g., one with the receptor expression profile set forth in Tables 1-11). Still other embodiments of the invention relate to identifying compounds that can relieve or cause degeneration of various neuron subtypes. Similarly, these methods entail first generating a specific neuron subtype in accordance with methods of the invention. The induced neurons are then contacted with candidate agents to detect one or more compounds that are able to modulate (promote or suppress) the survival or function of the neurons.

[0068] Some other embodiments are directed to evaluating the phenotype of a genetic disease, e.g., to better understand the etiology of the disease, to identify target proteins for therapeutic treatment, or to identify candidate agents with disease-modifying activity. These methods allow identification of compounds with desired therapeutic activities, e.g., an activity in modulating the survival or function of sensory neurons in a subject suffering from a neurological disease or disorder, and thus identification of an agent that will be efficacious in treating the subject. For example, a candidate agent may be added to a cell culture comprising iNs derived from the subject's somatic cells, and the effect of the candidate agent assessed by monitoring output parameters such as iN survival.

[0069] In some related embodiments, the invention provides kits or pharmaceutical combinations for generating iNs and for using the iNs in various applications described herein. Some of the kits will contain one or more components of the agents described herein for inducing formation of specific neuron subtypes from non-neuronal cells. Any of the components described above may be provided in the kits, e.g., the specific TF pair encoding polynucleotides or expression vectors harboring them, packaging cell lines for producing recombinant viruses, as well as reagents for transducing recombinant viruses into a non-neuronal cell. The kits may further include non-neuronal cells for conversion into iNs. The kits may also include tubes, buffers, etc., and instructions for use. The various components of the kits may be present in separate containers, or some or all of them may be pre-combined into a reagent mixture in a single container, as desired. In addition to the above components, the subject kits may further include instructions for practicing the methods of the invention.

[0070] Utilizing the system for generating iNs described herein, the invention also provides methods to screen for compounds, cellular factors or modulations that can promote or stimulate conversion of a non-neuronal cell into an iN. The compounds, cellular factors or manipulations can be exogenous compounds or genetic or epigenetic modulations inside the non-neuronal cell. In these methods, co-expressing a specific TF pair in the non-neuronal cell is performed in the presence of the candidate compounds or cellular factors. This allows identification of a specific candidate factor (e.g., a miRNA or an epigenetic modulation) which can enhance the efficiency of conversion of the non-neuronal cell into an iN. Various biochemical and molecular biology techniques or assays well known in the art can be employed to practice the screening methods of the present invention. Such techniques are described in, e.g., Handbook of Drug Screening, Seethala et al. (eds.), Marcel Dekker (1st ed., 2001); High Throughput Screening: Methods and Protocols (Methods in Molecular Biology, 190), Janzen (ed.), Humana Press (1st ed., 2002); Current Protocols in Immunology, Coligan et al. (Ed.), John Wiley & Sons Inc. (2002); Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (3rd ed., 2001); and Brent et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (ringbound ed., 2003).

[0071] The candidate compounds that can be screened for promoting iN formation can be any polypeptides, beta-turn mimetics, polysaccharides, phospholipids, hormones, prostaglandins, steroids, aromatic compounds, heterocyclic compounds, benzodiazepines, oligomeric N-substituted glycines, oligocarbamates, polynucleotides (e.g., miRNAs or siRNAs), saccharides, fatty acids, purines, pyrimidines, derivatives, structural analogs or combinations thereof. Some candidate compounds are synthetic molecules, and others are natural molecules.

[0072] By way of example, the screening methods of the present invention typically involve inducing neuron formation as described herein in the presence of candidate compounds or cellular manipulations (e.g., epigenetic modulations). Thus, co-expression of the TF encoding sequences in the non-neuronal cell (e.g., a fibroblast) is performed in the presence of the candidate compounds (e.g., miRNA) or performed in combination with other modulations (e.g., alterations in DNA methylation). If the presence of a candidate agent or modulation leads to an enhanced conversion efficiency of the non-neuronal cells into iNs, the candidate compound (or the specific modulation) is then identified as an agent or factor that promotes formation of iNs. An enhanced conversion efficiency refers to any substantial increase in the number of the initial non-neuronal cells being converted into iNs. This can be an increase of at least 20%, at least 30%, at least 40%, at least 50%, at least 75%, or at least 90% or more, of the cells being converted into iNs.
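As a minimal sketch of how the "enhanced conversion efficiency" criterion might be evaluated, the Python below compares a treated culture with an untreated control. Treating the stated percentages as a relative increase over a control efficiency is an assumption made here for illustration; the text does not fix the exact formula, and the function names and cell counts are hypothetical.

```python
def conversion_efficiency(n_induced: int, n_initial: int) -> float:
    """Fraction of the initial non-neuronal cells that were converted into iNs."""
    if n_initial <= 0:
        raise ValueError("n_initial must be positive")
    return n_induced / n_initial


def relative_increase(control_eff: float, treated_eff: float) -> float:
    """Relative change of treated over control efficiency (assumed definition)."""
    if control_eff <= 0:
        raise ValueError("control efficiency must be positive")
    return (treated_eff - control_eff) / control_eff


def is_enhancer(control_eff: float, treated_eff: float, threshold: float = 0.20) -> bool:
    """Flag a candidate whose relative increase meets the chosen threshold."""
    return relative_increase(control_eff, treated_eff) >= threshold


# Hypothetical counts: 50 iNs per 1000 cells without, 65 per 1000 with a candidate.
control = conversion_efficiency(50, 1000)
treated = conversion_efficiency(65, 1000)
print(is_enhancer(control, treated))
```

The `threshold` parameter can be set to any of the recited levels (0.20, 0.30, 0.40, 0.50, 0.75, 0.90) depending on how stringent the screen should be.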

[0073] In some methods, various transcription factors or other polypeptides are screened for ability to promote iN formation. Screening for iN-promoting transcription factors or other DNA-binding proteins can be performed by using and/or modifying various assays that have been described in the art. See, e.g., Wiese et al., Front. Neurosci., 6:1-15, 2012; Alvarado et al., J. Neurosci., 31(12):4535-43, 2011; Ouwerkerk et al., Methods Mol. Biol., 678:211-27, 2011; Laurenti et al., Nat. Immunol. 14, 756-763, 2013; and Xu et al., Virol. 446:17-24, 2013. Some other methods of the invention are directed to identifying nucleic acid agents that are capable of stimulating formation of iNs. For example, candidate miRNAs can be co-expressed inside the non-neuronal cell. Expressing miRNAs in a host cell and testing the miRNAs for ability to enhance iN formation can be performed with techniques based on or derived from a number of miRNA screens that have been described in the literature. See, e.g., Voorhoeve et al., Cell, 124:1169-1181, 2006; Becker et al., PLoS ONE 7(11):e48474, 2012; Lam et al., Mol. Cancer Ther. 9:2943-2950, 2010; and Olarerin-George et al., BMC Biol., 11:19, 2013.

[0074] In still some methods, the candidate compounds are small organic molecules (e.g., molecules with a molecular weight of not more than about 500 or 1,000). Preferably, high throughput assays are adapted and used to screen for such small molecules. In some methods, combinatorial libraries of small molecule test agents can be readily employed to screen for small molecule modulators that enhance iN formation. A number of assays known in the art can be readily modified or adapted in the practice of these screening methods of the present invention, e.g., as described in Schultz et al., Bioorg Med Chem Lett 8: 2409-2414, 1998; Weller et al, Mol Divers. 3: 61-70, 1997; Fernandes et al, Curr. Opin. Chem. Biol. 2: 597-603, 1998; and Sittampalam et al., Curr. Opin. Chem. Biol. 1: 384-91, 1997.

EXAMPLES

[0075] The following examples are offered to further illustrate, but not to limit the present invention.

Example 1. An unbiased screen for TF pairs that induce neurons

[0076] Our previous studies showed that transient expression of only two TFs could stably convert fibroblasts into neurons of a particular subtype. To determine whether other sets of two TFs could induce neuronal identity, we cloned cDNAs for 46 bHLH and 12 POU transcription factors into doxycycline (dox)-inducible lentiviral vectors, focusing on those known to be expressed in neural lineages. For comparison we also included one nuclear receptor (NR) transcription factor, Nurr1, and the embryonic stem cell POU factor, Oct4, which is not expressed in neurons. Pairing each bHLH with each of the POU/NR TFs resulted in 598 unique TF combinations. Mouse embryonic fibroblasts (MEFs) were transduced with each TF pair along with the doxycycline-inducible activator rtTA to allow timed and transient expression of each combination. For this screen, we induced TF expression for eight days, then allowed cells to mature for six or more days without induction (Fig. 1a).

[0077] By day 14 of reprogramming, 76 (12.7%) of the 598 pairs produced Tuj1-positive cells that exhibited neuronal morphologies (Fig. 1b, Fig. 5a). The 76 positive TF pairs included 16/46 bHLH TFs, 9/12 POU TFs, and 1 NR TF. Although most single factors had no effect on fibroblasts, in four cases - those involving Ascl1, Ascl2, Ngn1 and Ngn3 - we detected rare Tuj1-positive cells. However, the majority of these cells did not exhibit neuronal morphologies, suggesting that they reflect the direct activation of the Tuj1 promoter or represent partially reprogrammed cells. We controlled for this by requiring that pairs containing these factors showed a significant enrichment over single factor conditions (Fig. 5b-c).
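The screen size and hit rate reported above follow from the library composition. The short Python sketch below reproduces the arithmetic with placeholder factor labels (the individual bHLH and POU factors are not enumerated in this passage, so the names are purely illustrative). It assumes that Oct4 is counted among the 12 POU factors, which is the reading under which 46 x 13 gives 598 pairs.

```python
from itertools import product

# Placeholder labels only; the real factor names are not listed in the text.
bhlh_factors = [f"bHLH_{i}" for i in range(1, 47)]   # 46 bHLH TFs
pou_factors = [f"POU_{i}" for i in range(1, 13)]     # 12 POU TFs (assumed to include Oct4)
partner_factors = pou_factors + ["Nurr1"]            # plus one nuclear receptor TF

# Each bHLH factor is paired with each POU/NR factor.
tf_pairs = list(product(bhlh_factors, partner_factors))
n_pairs = len(tf_pairs)

positive_pairs = 76                                  # pairs yielding Tuj1-positive neuron-like cells
hit_rate = 100 * positive_pairs / n_pairs

print(n_pairs, round(hit_rate, 1))
```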

[0078] To confirm that these factor pairs are indeed reprogramming fibroblasts rather than rare embryonic or neural crest cells present in culture, we transduced tail-tip fibroblasts (TTFs) with 12 of the positive factor combinations. All 12 combinations produced candidate iNs from TTFs (Fig. 5d-e). In addition, we used FACS to deplete MEFs of cells expressing p75, a marker of neural crest cells (Fig. 5f-g). The percentages of Tuj1-positive cells generated from the original MEFs and p75-depleted MEFs were similar (Fig. 5h). These experiments indicate that the primary starting population for direct conversion is of fibroblast origin.

[0079] While Tuj1 expression in cells with neuronal morphology can identify candidate iNs, we previously observed that successful induction of neuronal identity leads to the coordinated expression of other neuronal markers. The majority of candidate iNs were positive for the mature neuron marker, Map2 (85-99% across five populations) and presynaptic protein, Synapsin (86-98%) (Fig. 1c-d). They also expressed Tau, as measured by EGFP expression, when we reprogrammed MEFs derived from TauEGFP knock-in mice (35 TF pairs) (Fig. 1c-d). These experiments suggested that the candidate iNs are likely to exhibit other characteristic properties of endogenous neurons.

[0080] Neurons are most stringently defined by their electrophysiological properties. To characterize the physiological properties of the candidate iNs, we analyzed their voltage output in response to stimulation with rectangular current pulses under whole-cell patch clamp conditions (Fig. 2b). We selected cells that exhibited neuronal morphology and expressed both TauEGFP and a red fluorescent protein driven by a Synapsin promoter (Synapsin-TdTomato) (Fig. 2a). At days 16 to 24 post-induction, the majority of recorded cells exhibited behaviors characteristic of endogenous neurons. Of the 60 TauEGFP and Synapsin positive cells recorded, 58 (97%, Fig. 2c) displayed both resting membrane potentials (-61.7 ± 7.8 mV, Fig. 2f) and action potentials (Fig. 1c-e).

[0081] We also examined electrophysiological properties known to diversify endogenous neuron populations. We found TF pair-specific differences in the membrane input resistance and voltage sag slope parameters. In contrast, no significant differences were evident in the rheobase (Fig. 2f). Collectively, the observed variations in electrophysiological properties within the iN populations may reflect TF-induced functional diversity and/or variable maturation states.

[0082] Another key feature of neurons is their capacity to form synapses, which requires co-culture with glia for many cultured neurons generated by reprogramming methods. Surprisingly, in these experiments, we detected excitatory post-synaptic currents (EPSCs) in five recorded candidate iNs (Fig. 2d, Fig. 6k). The presence of EPSCs indicates that the candidate iNs are sufficiently mature to form synapses and respond to synaptic stimulation. Cells that expressed neither TauEGFP nor TdTomato and exhibited fibroblast-like morphology did not display these electrophysiological properties (data not shown). Because these candidate cells meet the current standards for iNs, we will refer to the Tuj1-positive cells generated from all TF pairs as iNs.

[0083] We have previously shown that transcription factors that can generate mouse iNs also have the capacity to reprogram human fibroblasts. To determine whether the screen results also apply to human cells, we tested 15 mouse TF pairs on human embryonic fibroblast-like cells (HEFs). Among these 15 combinations, when we tested either mouse or human TF orthologs of the same transcription factor pair (Ngn3/Pit1), we did not observe a marked difference in reprogramming efficiency. All 15 combinations produced candidate human iNs that expressed the pan-neuronal marker Tuj1 and exhibited neuronal morphology (Fig. 7A). To determine whether human iNs generated with mouse factors were electrically excitable, we performed electrophysiological recordings on Synapsin-TdTomato positive cells with neuronal morphology. These experiments demonstrated that human iNs produced with a novel TF pair (Ngn3/Pit1) fired evoked action potentials (n=21/27, 77%) and displayed passive membrane properties comparable to those reported previously for directly reprogrammed human iNs (Fig. 7B-D). Human iNs also demonstrated voltage dependent Na+ and K+ currents with positive inward currents at depolarized holding potentials (Fig. 7E). Together, these experiments and our previously published studies predict that the results of the mouse TF screen may be more broadly expanded to identify TF pairs that produce functional human iNs with defined patterns of gene expression.

Example 2. Establishing iN similarity with endogenous neurons using RNA-Seq

[0084] The iNs produced in this study exhibit neuronal morphologies, express multiple markers of mature neurons, and fire action potentials. This suggests that reprogramming via transient induction of TF pairs has stably reset the transcriptional program of fibroblasts to produce a set of convergent features characteristic of endogenous neurons. To establish the extent to which the transcriptomes of iNs resemble endogenous neural populations, we conducted RNA-Seq on 35 iN populations prepared from two independent replicate experiments. We selected the 35 combinations based on their reprogramming efficiencies and evidence for synergistic activity. To obtain a highly enriched population of iNs, we reprogrammed MEFs derived from TauEGFP knock-in mice. TauEGFP is a reliable marker of neuronal identity in vivo and we confirmed this in vitro by showing that 100% of TauEGFP cells are Tuj1-positive, while 98% of Tuj1-positive cells are TauEGFP-positive. Then, at day 16 of reprogramming, we used FACS to purify the TauEGFP-positive cells. RNA-Seq libraries were generated from each biological replicate and from control MEFs transduced with rtTA and cultured in parallel with the iNs.

[0085] For comparison, we applied similar methods to isolate and transcriptionally profile several populations of endogenous neurons (EndoNs), using genetic labeling and/or regional dissection. These included neuronal populations from the olfactory bulb (OB), cortex (CTX), hippocampus (HIP), medial habenula (mHb), cerebellum (CER), and the dorsal root ganglia (DRG). Several populations were sorted using the same TauEGFP reporter mouse strain used to generate the iNs (DRG, CER, CTX). Whole mouse brain (Brain) RNA provided by Clontech was also prepared for sequencing. While not comprehensive, these endogenous neuronal populations encompass the peripheral and central nervous system and include neurons of diverse neurotransmitter identities (i.e. excitatory, inhibitory and cholinergic), providing a first pass survey of neuronal transcriptional similarity and diversity in the mouse nervous system.

[0086] To explore the relationship between the iN, MEF and EndoN/Brain populations, we conducted principal component analysis (PCA) on the entire transcriptome of all samples. Plotting the first three principal components revealed that the iNs and EndoN/Brain populations were intermixed and both segregated from the MEFs. However, in one component, the iNs displayed slightly more similarity to MEFs than the EndoNs. This appears to reflect weak residual MEF gene expression that may derive from a contaminating population carried along by FACS sorting or reflect a need for additional maturation to fully silence MEF gene promoters. These preliminary investigations confirm that the iNs have acquired transcriptional programs that resemble those of endogenous neurons.

[0087] Expressing each of the 76 TF pairs results in shared patterns of protein expression and morphologic similarity. To identify the gene regulatory networks that may control these and other shared properties of iNs, we identified all genes that were significantly upregulated in the group of 35 iNs compared to the MEFs using DESeq2 (p-adjusted < 0.05). As expected, these 3,860 genes were significantly enriched for gene ontology (GO) terms associated with neuronal development, neuronal function and synaptic transmission based on DAVID analyses (Fig. 3a). Conversely, the 3,467 genes that were downregulated in iNs compared to MEFs were enriched for GO terms associated with immune function and cell division (Fig. 3a). These analyses confirm that the iN populations represent reprogrammed MEFs that have exited mitosis and established known neuronal patterns of gene expression.
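The up/down partition described above can be illustrated with a small Python sketch. It is not the DESeq2 analysis itself: it only shows how a results table that has already been exported could be split at an adjusted p-value cutoff of 0.05. The tuple layout, the function name, and the toy values are assumptions made for illustration.

```python
def split_de_genes(results, alpha=0.05):
    """Split differential-expression results into up- and down-regulated genes.

    results: iterable of (gene, log2_fold_change, padj) tuples, where
    log2_fold_change is iN versus MEF and padj may be None when no adjusted
    p-value was assigned (an assumed export format).
    """
    up, down = [], []
    for gene, lfc, padj in results:
        if padj is None or padj >= alpha:
            continue                      # not significant at the chosen cutoff
        if lfc > 0:
            up.append(gene)               # higher in iNs than in MEFs
        elif lfc < 0:
            down.append(gene)             # lower in iNs than in MEFs
    return up, down


# Toy values, invented purely to exercise the function.
toy_results = [
    ("Map2", 5.1, 1e-9),
    ("Snap25", 4.2, 0.003),
    ("Col1a1", -6.0, 1e-12),
    ("Actb", 0.1, 0.8),
    ("GeneX", 2.0, None),
]
up_genes, down_genes = split_de_genes(toy_results)
print(up_genes, down_genes)
```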

[0088] Next, we wished to assess the similarity of iNs with endogenous neurons. To accomplish this, we identified all genes that were enriched in the pool of endogenous neurons compared to MEFs (EndoN/Brain genes, n=2,965). The pooled set of 35 iN populations expressed 75.5% of these genes, which define a candidate “core” neuronal gene signature that may be indispensable in establishing and maintaining neuronal identity. We assessed the relative similarity of each individual iN or EndoN population to the neuronal core gene set (Fig. 3b). The number of genes shared between the core and each individual population does not significantly differ among endogenous and iN populations (78% ± 7%, range 63% - 86%). This suggests that the end result of direct reprogramming is similar to the result of normal neuronal development with respect to this gene network (Fig. 3b).
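The comparison above amounts to asking what fraction of a reference gene set is also expressed in a given population. A minimal Python sketch of that set overlap follows; the function name and the toy gene lists are hypothetical, and the sketch does not model how "expressed" or "enriched" was decided in the actual analysis.

```python
def fraction_shared(reference_genes, expressed_genes):
    """Fraction of the reference gene set that is also found in expressed_genes."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    return len(reference & set(expressed_genes)) / len(reference)


# Toy gene lists for illustration only (not the real 2,965-gene reference set).
endon_enriched = ["Map2", "Snap25", "Syn1", "Tubb3", "Mapt", "Gfap", "Olig2", "Stmn2"]
in_expressed = ["Map2", "Snap25", "Syn1", "Tubb3", "Mapt", "Stmn2", "Myog"]

shared = fraction_shared(endon_enriched, in_expressed)
missing = sorted(set(endon_enriched) - set(in_expressed))
print(shared, missing)
```

In this toy case 6 of 8 reference genes are shared, and the two "missing" genes are the ones absent from the iN list.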

[0089] “Missing” genes in iNs could reflect subtype differences, the influence of exogenous signaling, or incomplete reprogramming. We identified a set of these “missing” genes in iNs comprising 24.5% of the total EndoNs/Brain enriched genes. The strongest signatures of the “missing” genes based on GO analyses were for genes associated with glia and neural stem cells (gliogenesis, serotonin receptor signaling, and neurogenesis), suggesting that non-neuronal subtypes present in the EndoNs (for example, deriving from the whole brain sample) account for much of this distinction. Together these analyses define a new candidate core set of neuronal genes and show that it is possible to reprogram certain aspects of cell type identity using a larger and more diverse set of TF inputs than has previously been appreciated.

[0090] This observation raises the question of whether gene expression patterns shared in iNs and EndoNs are governed by the same mechanisms, which can include loss of repressors of neuronal identity and gain of activators of different neuronal genes. To identify candidate activators and repressors of the “core” genes in each population, we applied two complementary bioinformatics tools.

[0091] First, we conducted Ingenuity Pathway Analysis (IPA) on the “core” gene set to find potential upstream regulators. IPA takes advantage of a priori knowledge of expected interactions between transcriptional regulators and their target genes stored in the Ingenuity Knowledge Base. This approach identified 39 candidate transcriptional regulators of the “core” genes. We further categorized these genes into five classes. Class I and II genes are expressed in MEFs but not in iNs or EndoNs. These represent candidate repressors of the shared “core” genes (Class I) or of subsets of the “core” genes (Class II). Encouragingly, this analysis identified RE1-Silencing Transcription factor (REST), a well-established transcriptional repressor, and six other genes.

[0092] In contrast, candidate activators should exhibit low MEF expression and high expression in some or all neuronal populations. Class III and IV genes represent candidate activators based on low expression in MEFs but high expression in either the EndoNs (Class III) or the iNs (Class IV). These genes included TFs known to regulate neuronal differentiation and diversity such as Isl1 and Eomes. Intriguingly, we identified only one TF that was highly expressed in all iN and EndoNs (Mecp2).

[0093] To independently corroborate these results, we conducted HOMER motif enrichment on the promoters of the “core” genes (Target sequences) and compared them to the promoter regions of all detectable genes by RNA-Seq (Background sequences). This analysis identified 48 highly significant enriched motifs upstream of the shared neuronal genes. The four transcriptional regulators that were identified by both IPA and HOMER also fell into classes I-IV. The most significant gene in both cases was REST.

[0094] These data support a model in which common mechanisms for gene derepression operate in development and direct reprogramming, while mechanisms governing activation of genes may be more diverse. The differences in activator expression we observe between the two main subclasses of iNs (those produced using Ascl1 family members vs. Ngn family members) suggest that this diversity is not simply due to differences between direct reprogramming and normal neuronal development, but may instead reflect requirements for generating aspects of neuronal diversity not captured in the core neuronal transcriptome. While this analysis is far from exhaustive, it serves as a proof of principle demonstration of the utility of the iN dataset for understanding mechanisms governing neuronal identity and diversity.

Example 3. Diversity among iNs

[0095] To further explore potential diversity among our iN populations, we conducted single-cell RNA-Seq on four iN populations (Ngn1/Brn3a, Ngn3/Oct4, Ngn3/Brn4, Ascl2/Nurr1) selected for their novelty of reprogramming factors but also including a combination (Ngn1/Brn3a) we had extensively characterized in a previous study. We used the TauEGFP reporter to enrich for iNs and FACS-sorted on day 16 post-induction to match our population RNA-Seq. Using a droplet-based method, we analyzed 952 cells from these four different iN populations with a median gene discovery rate of 6,699 genes per cell. These analyses show that iNs produced with the same TFs cluster together as visualized in the PCA-reduced t-distributed Stochastic Neighbor Embedding (t-SNE) space. Ngn1/Brn3a iNs and Ascl2/Nurr1 iNs formed their own unique clusters, while Ngn3/Brn4 and Ngn3/Oct4, which shared the same bHLH factor but used different POU factors, were more proximal, with a few cells from each population mixed in the other cluster. We also detected a small cluster in which cells from the four iN combinations were intermixed. However, these cells all exhibited low total unique molecular identifier (UMI) counts, suggesting that they cluster due to lack of detectable expression of subtype-specific genes. Expression of pan-neuronal markers such as Tau and Tubb3 was largely homogeneous among the single cells, with the potential exception of cells derived with a non-POU factor, Nurr1 (Ascl2/Nurr1), which exhibited a slightly more graded expression. Mature neuronal markers such as Map2 and Snap25 displayed remarkable homogeneity among all iN combinations, re-confirming their neuronal identity.

[0096] Despite the presence of mature neuronal markers in our single-cell RNA-Seq populations, it has been previously reported that fibroblasts can enact a myocyte program when supplied with an Ascl family member without sufficient expression of a second reprogramming factor. We also detect myocyte gene expression in iN populations derived using Ascl family transcription factors, but not with the Ngn family factors. Mapping these genes onto the Ascl2/Nurr1 single-cell data shows that there is a small subpopulation of cells (3 out of 90) with coordinated co-expression of these genes, while no myocyte genes are detected in the other iN populations.

[0097] The bulk RNA-Seq data also showed weak but detectable expression of MEF genes in iNs that was not present in endogenous neurons. This signature could derive from either: (1) rare contaminating MEFs in the sorted cells, (2) residual MEF gene expression in the iNs, or (3) bona fide expression of genes categorized as MEF genes in the iNs but not in the EndoNs we profiled. To distinguish between these possibilities, we sorted and profiled MEFs (Tau-GFP negative) and iNs (Tau-GFP positive) from the same reprogramming experiment (using Ngn3/Pit1). Analyses of these cells alongside the other iN populations identify a small population of cells with strong MEF gene expression. This suggests that a small number of contaminating MEFs carried along in the FACS experiment can account for much of this signature, although we also observed several cases in which candidate MEF genes were expressed at low levels throughout the iN populations.

[0098] We further explored diversity among the iN populations by examining differentially expressed genes in one TF combination compared to the three remaining combinations. Among the top differentially expressed genes were receptors, ion channels, and transmembrane proteins, which represent important gene classes underlying neuronal function and therapeutic targets for neurologic disease. Within individual combinations we also found evidence for diversity as defined by the presence of a given receptor or ion channel. However, this could also reflect technical limitations of single-cell RNA-Seq, which has a high drop-out rate. When we independently performed t-SNE clustering on the four individual TF combinations, we did not observe significant clusters forming other than the few contaminating MEF cells, supporting low intra-population heterogeneity. These results show that while bulk RNA-Seq data is sensitive to small subpopulations of contaminating cells, the overall predictions it makes apply well to the patterns of gene expression in individual iNs. This supports the idea that the TF pairs used to produce iNs generally result in adoption of a limited set of potentially related cell fates, as we showed previously for induced sensory neurons.

[0099] Neuronal subtypes can be distinguished by coordinated expression of genes that afford them their specialized functions. To identify gene co-expression patterns in the iNs we applied Weighted Gene Co-expression Network Analysis (WGCNA). These analyses identified 22 gene co-expression patterns, or modules. As expected, many modules were enriched in the iN populations versus MEFs. Module 3 (M03), for example, was enriched in most iN populations and included genes associated with neurogenesis, dendritic spine morphology and synaptic transmission, as determined by PANTHER Overrepresentation Test. Additionally, some of these modules were selectively enriched in distinct groups of iN populations based on the identities of their reprogramming factors. M24 was enriched in iN populations generated with Class IV (Brn3) POU transcription factors while M09 was enriched in iN populations generated with either a member of the Ascl family or NeuroD2 (Fig. 4a). Although M09 contained genes associated with neuronal function, many were also related to muscle development and function, corroborating recently reported analyses of individual iNs produced using the Wernig factor combinations. Therefore, these analyses may guide the selection of appropriate reprogramming factors for desired cell types and also be useful to define cell autonomous co-expression modules relevant to the growing number of neuronal RNA-Seq expression studies involving neurons.

[00100] Neurotransmitter identity is perhaps the most common means to divide neurons into subtypes, and producing iNs with desired neurotransmitter expression profiles is of great interest for translational medicine. Mining the RNA-Seq data we generated allows us to identify TF pairs that produce neurons with characteristics of glutamatergic neurons (expression of vesicular glutamate transporters; vGlut1, vGlut2 and vGlut3) and key synthesizing enzymes indicative of GABAergic, serotonergic and cholinergic neurons. Our study both confirms reports of other groups and highlights additional TF pairs that produce iNs of potentially specific neurotransmitter identities. For example, some TF pairs that induce the dopaminergic marker tyrosine hydroxylase (Th) are similar to those previously reported (Ascl1/Nurr1), while others are novel (Ascl2/Nurr1 and Ascl5/Brn3c). We have detected EPSCs in two iN populations, indicating that vesicular release of glutamate occurs in iNs. Although we were unable to detect serotonin release by ELISA as reported in other studies, this is perhaps due to the low number of iNs per well or a need for further maturation.

[00101] In addition to neurotransmitters, different populations of iNs exhibit diversity in many other classes of genes including cell surface receptors and ion channels. Distinct neuronal subtypes can be classified based on their responses to chemical ligands, and perturbations of specific ligand-induced responses underlie many neurological diseases. For example, genetic variation at the nicotinic receptor cluster including Chrna3/b4/a5 has a strong influence on susceptibility to nicotine addiction and is also linked to lung cancer and alcoholism. Applying our iN RNA-Seq dataset, we found that iN populations generated with the Ngn-family and non-Brn3-family of TFs (Group 2) expressed Chrna3, b4 and a5, while iNs generated using the Ascl-family of TFs paired with the Brn3-family of TFs (Group 1) did not. In contrast, both populations expressed similar levels of certain glutamate receptors. Other subsets of iNs expressed different members of the Chrna/b family. Importantly, nicotine receptor subunits known to be co-expressed in vivo were also typically co-expressed in iNs, suggesting that these reprogramming codes can indeed recapitulate key aspects of known endogenous transcriptional networks.

[00102] To test whether expression levels of nicotine subunits in iN subtypes predict their responses to nicotine, we performed calcium imaging on iN populations within each group (Fig. 4c-d). We transduced the different iN populations with lentiviruses encoding a Synapsin-TdTomato reporter and the fluorescent calcium reporter protein GCAMP5.G under the control of a Map2 promoter. In a randomized order, we serially exposed the iNs to glutamate and nicotine. To select for iNs that maintained functional viability throughout the recording, we only analyzed Synapsin-TdTomato-positive cells that responded to transient exposure of 100-250 mM KCl at the beginning and end of each recording. iNs within the same group and replicates of the same iN populations responded similarly to each other, and both groups responded equally well to glutamate. In contrast, the percentages of cells that responded to nicotine were significantly greater in the iN populations of Group 2 versus Group 1 (Fig. 4d). These results, paired with the electrophysiology results, demonstrate that iNs generated with different pairs of transcription factors exhibit functionally diverse phenotypes that can be predicted by interrogating the RNA-Seq dataset we have generated.

[00103] Although we have demonstrated transcriptional and functional diversity in the iNs, and similarity with core transcriptional programs of endogenous neurons, we wished to determine whether some iN populations closely resembled any known neuronal subtypes. At present, no established methods exist to quantify neuronal subtype similarity using RNA-Seq datasets. However, the Dougherty group recently developed the Cell-type Specific Expression Analysis (CSEA) method to accomplish this. We applied this method to the five iN populations that had over 40 significantly enriched genes compared to all other iN populations and MEFs (DESeq2 p-adjusted value < 0.05) (Fig. 4c-d).
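For the calcium imaging analysis of paragraph [00102], the cell-selection rule and the responder percentage can be sketched in Python as follows. This is a hedged illustration: the per-cell record layout, the field names, and the toy responses are hypothetical, and the sketch takes the "responded / did not respond" calls as given rather than deriving them from fluorescence traces.

```python
def percent_responders(cells, stimulus):
    """Percentage of viable cells that responded to `stimulus`.

    A cell counts as viable only if it responded to KCl at both the start
    and the end of the recording (the selection rule described in the text).
    Returns None when no cell passes the viability filter.
    """
    viable = [c for c in cells if c["kcl_start"] and c["kcl_end"]]
    if not viable:
        return None
    responders = sum(1 for c in viable if c[stimulus])
    return 100 * responders / len(viable)


# Toy records for illustration only; field names are assumptions.
toy_cells = [
    {"kcl_start": True, "kcl_end": True, "glutamate": True, "nicotine": True},
    {"kcl_start": True, "kcl_end": True, "glutamate": True, "nicotine": False},
    {"kcl_start": True, "kcl_end": False, "glutamate": True, "nicotine": True},   # excluded
    {"kcl_start": True, "kcl_end": True, "glutamate": True, "nicotine": True},
    {"kcl_start": True, "kcl_end": True, "glutamate": False, "nicotine": False},
]
nicotine_pct = percent_responders(toy_cells, "nicotine")
glutamate_pct = percent_responders(toy_cells, "glutamate")
print(nicotine_pct, glutamate_pct)
```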

[00104] When we input the “core” neuronal transcriptome, we saw significant enrichment in nearly all neuronal populations, but not in glial populations, consistent with our previous analyses. iNs produced with Ascl2/Nurr1 were similar to cholinergic neuronal subtypes, while iNs produced with Ascl5/Brn3c resembled hypocretinergic neurons of the hypothalamus (Fig. 4c). Intriguingly, iNs produced using the non-neuronal TF Oct4 (paired with Ngn3) best matched the cortical and cerebellar neuronal populations in this dataset, while the iNs with the largest number of unique genes (Ascl1/Nurr1) were similar to habenular neurons as well as some oligodendrocytes, suggesting that these factors might produce multiple cell lineages. The CSEA dataset did not include data from DRG sensory neurons. Therefore, as a control for specificity we input the enriched genes from the induced sensory neurons produced using the N1/N2.B3a combinations. This gene set did not exhibit significant similarity to any profiled populations, highlighting the specificity of this tool. This demonstrates the potential utility of pairing iN transcriptional profiling with datasets derived from endogenous neuronal populations. Further progress in generating well-defined transcriptional profiles of endogenous and induced neurons should allow even more precise comparisons of distinct aspects of neuronal identity.

Example 4. Some exemplified methods for inducing iNs

[00105] Embryonic fibroblast isolation and derivation: Wild-type CD1 mice and heterozygous TauEGFP mice (Jackson Laboratory, STOCK Mapt<tm1(EGFP)Klt>/J, stock number: 004779) were bred at The Scripps Research Institute animal facility. Mouse embryonic fibroblasts (MEFs) were isolated under a dissection microscope from E13.5 embryos by removing the heads, limbs, internal organs, and spinal columns to eliminate neurogenic cells. The remaining tissue was manually dissociated with 0.25% trypsin (Gibco) for 20 minutes at 37 °C. The trypsin was subsequently diluted with MEF media (DMEM + 10% FBS and penicillin/streptomycin) and removed via centrifugation. Pelleted cells were re-suspended in MEF media and seeded on gelatin-coated (0.01%) tissue culture plates. MEFs were grown to confluence and passaged at least twice before use.

[00106] Primary tail-tip fibroblasts were isolated from 2-4 mm-long tail tips of P3 mouse pups. Tail tips were first rinsed in 70% ethanol, washed with HBSS (Invitrogen), chopped into smaller pieces, and dissociated with 0.25% trypsin for 60 minutes at 37 °C. Subsequent steps are the same as in the MEF isolation protocol.

[00107] For human embryonic fibroblast (HEF) derivation, human iPSC colonies were harvested using 0.5 mM EDTA (Invitrogen) and differentiated by embryoid body (EB) formation. The EBs were cultured for 7 days in non-adherent suspension culture dishes (Corning), 2 days in mTeSR medium (Stemcell Technologies) and the following 5 days in 10% FBS DMEM (vol/vol). On day 8, the EBs were plated onto adherent tissue culture dishes and passaged according to primary fibroblast protocols using 0.25% trypsin for two to three passages before the start of experiments.

[00108] Molecular cloning, cell culture, and lentiviral transduction: The cDNAs for the transcription factors used were cloned into lentiviral constructs under the control of tetracycline operator (TetO). The cDNAs for BRN3A and HEN2 are the only human factors. BRN3A has 97% homology to the mouse Brn3a peptide and was cloned as described in Blanchard et al. (Nat. Neurosci. 18, 25-35, 2015). Replication-incompetent VSVg-coated lentiviral particles were packaged in 293T cells (ATCC), harvested 48 hours after transfection, and filtered through a 45 micron PVDF membrane before use.

[00109] The reprogramming method is a modification of a previously described protocol (Blanchard et al., Nat. Neurosci. 18, 25-35, 2015). Passage two MEFs were infected with lentivirus in MEF media. After 12-24 hours of infection, virus-containing media was replaced with fresh MEF media. Transcription factors were induced 48 hours post-infection by switching to MEF media supplemented with 5 mM doxycycline (Sigma). 4 days after initiating induction with doxycycline, MEF media was replaced with N3 media as published in Pfisterer et al. (Nature 463, 1035-1041, 2010) but using N2 supplement (Gibco) in replacement of some individual components. 8 days post-induction, doxycycline was withdrawn. 10 days post-induction, media was switched to neural maintenance media, which consisted of a 1:1 mix of N3 media and Neurobasal (Invitrogen) supplemented with B27 (minus vitamin A, Gibco) and bFGF (10 ng/ml) (N3/NB media). Efficiency of conversion was measured by the number of Tuj1-positive cells divided by the initial number of cells plated.
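The timeline and the efficiency measure described above can be summarized in a short Python sketch. Days are expressed relative to the start of doxycycline induction (day 0), so infection 48 hours earlier is written as day -2; that convention, the names used, and the example cell counts are assumptions made for illustration.

```python
# Protocol milestones, in days relative to the start of doxycycline induction.
PROTOCOL_TIMELINE = [
    (-2, "infect passage-two MEFs with lentivirus in MEF media"),
    (0, "induce TFs: switch to MEF media with doxycycline"),
    (4, "replace MEF media with N3 media"),
    (8, "withdraw doxycycline"),
    (10, "switch to neural maintenance (N3/NB) media"),
]


def conversion_efficiency(tuj1_positive_cells, cells_plated):
    """Tuj1-positive cells divided by the initial number of cells plated."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return tuj1_positive_cells / cells_plated


# Hypothetical counts, not measured values.
efficiency = conversion_efficiency(tuj1_positive_cells=150, cells_plated=10_000)
induction_days = 8 - 0   # doxycycline on from day 0 to day 8
print(f"{efficiency:.1%}", induction_days)
```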

[00110] Immunohistochemistry: Cells for immunofluorescence staining were fixed with 4% paraformaldehyde for 10 min at room temperature. Cells were then washed three times with phosphate-buffered saline (PBS) and subsequently blocked in 5% horse serum and 0.1% Triton X-100 (Sigma) for 1 h at room temperature. Primary staining was performed overnight at 4 °C in block. Cells were again washed three times and then stained with secondary antibodies diluted in block for 1 h at room temperature. The following primary antibodies and dilutions were used: Tuj1 (Sigma-Aldrich T2200, 1:500), Map2 (Sigma-Aldrich M4403, 1:500), and Synapsin 1 (Synaptic Systems 106103, 1:500).

[00111] Electrophysiology: TauEGFP MEFs were reprogrammed and cultured as described on Thermanox® plastic coverslips (33 mm diameter). Coverslips were placed in the recording chamber mounted on an Olympus BX51 microscope. To identify TauEGFP positive cells that expressed Synapsin, we transduced our candidate iNs with lentivirus encoding the fluorescent red protein, TdTomato, under the control of a SYN1 promoter. Spontaneous activity and evoked responses were recorded from identified cells at day 16 to 24 post-induction under whole-cell patch clamp at 33 °C. Similar to the electrophysiology protocol described in Blanchard et al. (Nat. Neurosci. 18, 25-35, 2015), signals were amplified using a MultiClamp 700B (Molecular Devices) and acquired using the data acquisition software DASYLab v.11 (National Instruments) at 20 kHz. Patch pipettes with input resistances of 6-8 MOhm were pulled from standard wall glass of 1.5-mm OD (Warner Instruments) and filled with solution containing 120 mM potassium-gluconate, 10 mM KCl, 10 mM HEPES, 10 mM EGTA, 2 mM MgATP, 0.3 mM Na3GTP at pH 7.3. The bath solution (artificial cerebrospinal fluid) was composed of 125 mM NaCl, 2.5 mM KCl, 2 mM CaCl2, 1 mM MgCl2, 1.25 mM NaH2PO4, 26 mM NaHCO3, and 25 mM glucose. To record voltage responses of the identified iNs, we used incrementing levels of constant, rectangular current steps of 350 ms duration. The initial current step level was -50 to -200 pA depending on the observed input resistance of the cell. Steps were incremented by +2 or +5 pA in successive cycles of stimulation at a rate of 1 Hz. Analysis of the evoked responses was performed in software developed by A. Szücs (IV Analyzer). For each cell, several physiological parameters, including the resting membrane potential, rheobase, input resistance at rest, and spike amplitude, were measured.
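The incrementing current-step protocol described above can be written out as a simple sequence. The Python sketch below generates the step amplitudes only; it does not drive any amplifier or acquisition software. The number of steps is not stated in the text, so `n_steps` is a hypothetical parameter, as are the function and constant names.

```python
STEP_DURATION_MS = 350   # duration of each rectangular current step
STIM_RATE_HZ = 1         # one stimulation cycle per second


def current_steps(start_pa, increment_pa, n_steps):
    """Amplitudes (pA) of successive current steps.

    start_pa: initial step level (the text gives -50 to -200 pA).
    increment_pa: change per cycle (the text gives +2 or +5 pA).
    n_steps: number of cycles; not specified in the text (assumption).
    """
    return [start_pa + i * increment_pa for i in range(n_steps)]


steps = current_steps(start_pa=-50, increment_pa=5, n_steps=30)
total_time_s = len(steps) / STIM_RATE_HZ
print(steps[0], steps[-1], total_time_s)
```

With these example settings the sweep runs from -50 pA to +95 pA over 30 cycles.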

[00112] Spontaneous excitatory or inhibitory postsynaptic potentials were occasionally observed in the recorded iNs. We performed voltage clamp recordings of postsynaptic currents whenever such activity was detected (150-200 s recordings at -50 mV holding potential). At this potential, the inward currents we observed were identified as excitatory postsynaptic currents, considering the typical resting membrane potential of the iNs (near -50 mV). GABAergic inputs do not typically produce such prominent PSCs at this holding potential.

[00113] Human iNs generated from HEFs were also identified via the SYN1-TdTomato reporter virus. Recordings were performed between 26-31 days post-dox induction. Voltage-gated currents were induced by 400-ms long voltage steps from -115 mV to -5 mV in 10 mV increments from the initial potential of -65 mV. The leak currents were subtracted from the voltage-gated currents prior to analysis. Leak currents were calculated using the currents induced by stimulation from -65 mV to -55 mV and scaling them to the corresponding membrane voltage. Whole-cell currents were filtered at 2 kHz and sampled at 20 kHz with a Digidata 1440 interface controlled by pClamp Software (Molecular Devices, Union City, CA).
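
One way to read the leak subtraction described above is as a linear scaling: the current evoked by the small -65 mV to -55 mV step is treated as purely passive and scaled by the ratio of voltage displacements for each test step. The sketch below assumes that reading; the function name and the sample values are hypothetical, and the actual analysis may have differed in detail.

```python
def leak_subtract(currents_pa, step_mv, leak_currents_pa,
                  hold_mv=-65.0, leak_step_mv=-55.0):
    """Subtract a linearly scaled leak trace from a voltage-step current trace.

    currents_pa      : current samples recorded during the step to step_mv
    leak_currents_pa : current samples recorded during the step to leak_step_mv
    The leak trace is scaled by (step_mv - hold_mv) / (leak_step_mv - hold_mv),
    which assumes the leak conductance is ohmic (an assumption of this sketch).
    """
    scale = (step_mv - hold_mv) / (leak_step_mv - hold_mv)
    return [i - scale * leak for i, leak in zip(currents_pa, leak_currents_pa)]


# Test potentials from the text: -115 mV to -5 mV in 10 mV increments.
steps_mv = list(range(-115, 0, 10))

# Example: a step to -5 mV is six times the 10 mV leak step.
corrected = leak_subtract([100.0, -440.0], -5.0, [10.0, 10.0])
```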

[00114] Fluorescence-activated Cell Sorting (FACS) purification: Reprogrammed candidate iNs generated from heterozygous TauEGFP MEFs were prepared for FACS by first detaching cells from the culture plate using Accutase (Innovative Cell Technologies). Accutase was subsequently diluted with neural maintenance media (N3/NB media) and removed via centrifugation. Pelleted cells were resuspended in neural maintenance media, triturated, and strained through a 35 µm nylon mesh filter to obtain single-cell suspensions. Viability markers DAPI (1 mM) and DRAQ5 (BioStatus DR50050, 1 mM) were added to the suspension at least 10 minutes prior to sorting. Appropriate gates for FACS were set based on TauEGFP, DAPI and DRAQ5 intensities to isolate live TauEGFP-positive cells as shown in Figure 3.9 using the MoFlo® Astrios™ (Beckman Coulter). Isolated cells were sorted into TRIzol® LS (Invitrogen).

[00115] Similarly, endogenous neuron populations were isolated from the appropriate transgenic reporter mice at P21. Dissected tissue samples were dissociated as in Brewer and Torricelli (Nature Protocols 2, 1490-1498, 2007) with the following modifications. Manual homogenization was conducted with a scalpel rather than with a tissue slicer. As in Hazen et al. (Neuron 89, 1223-1236, 2016), we also used papain containing L-cysteine (Worthington Biochemical, PAP2, 10 units/ml) because its higher activity allowed for shorter dissociation times (1.5 minutes total). During papain digestion, samples were triturated every 5 minutes using P1000 plastic tips instead of siliconized Pasteur glass pipettes. After centrifugation using the density gradient, we found viable neurons in the fraction containing the cell pellet and in the 2 ml fraction immediately above the pellet. Both fractions were combined and washed once in 10 ml of HAGB (Hibernate-A (Gibco A1247501), 1X B-27 supplement (Gibco 12587010), 500 µM GlutaMAX (Gibco 35050061)). After a subsequent centrifugation, pelleted cells were resuspended in HAGB, filtered and kept on ice until subsequent FACS sorting. As with the candidate iNs, viability markers DAPI and DRAQ5 were added to the suspension and appropriate gates were set to purify cells into TRIzol® LS.

[00116] RNA isolation: Total RNA was isolated from FACS-sorted cells using the Direct-zol RNA MiniPrep Kit (Zymo Research) according to the manufacturer’s protocol, except that linearized acrylamide (1 µg) was added to each sample prior to the first step and Zymo-Spin IC columns were used in place of IIC columns. RNA quality and quantity were determined with an Agilent 2100 Bioanalyzer. RNA integrity numbers (RINs) for all iN samples were between 6 and 10 (median = 8.7). The amount of RNA per sorted event was between 1 and 15 pg (median = 7.9 pg). Therefore, approximately 1,500 to 2,000 cells were required to yield 10 ng RNA for library input.
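
As a rough check on the cell numbers quoted above, the number of sorted events needed for a given input mass follows from dividing the target mass by the per-event yield. The sketch below is illustrative only: the function name is hypothetical, and it assumes that one sorted event corresponds to one cell and that no RNA is lost during purification. Under those assumptions the median yield of 7.9 pg corresponds to about 1,270 events, while the 1,500 to 2,000 range in the text corresponds to roughly 5 to 6.7 pg per event.

```python
from math import ceil


def events_needed(target_ng, pg_per_event):
    """Sorted events required to reach target_ng of total RNA.

    Assumes lossless recovery and one cell per sorted event (both are
    simplifying assumptions of this sketch).
    """
    return ceil(target_ng * 1000.0 / pg_per_event)


at_median = events_needed(10, 7.9)   # median yield reported in the text
at_5_pg = events_needed(10, 5.0)     # a lower-yield sample
```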

[00117] RNA-Seq library preparation and sequencing: Typically 10 ng of purified, high quality RNA served as input for the SMARTer® Ultra™ Low Input RNA Kit for Sequencing - v3 (Clontech Laboratories, Inc.). A few replicate libraries were prepped from 1-7 ng of input total RNA. These were comparable to libraries prepped from 10 ng of RNA, since correlation coefficients were greater than 0.98 between libraries prepped from 1, 5, and 10 ng of the same total RNA (Fig. 6f-h). Amplified cDNA was assessed for quality using the High Sensitivity DNA Kit (Agilent Technologies) and sheared using the Covaris system. Sequencing libraries were subsequently prepped using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina®. 75 base pair single-end reads generated using Illumina’s NextSeq platform were mapped to the mouse genome (mm10) by first removing adapters and low quality bases using Trimmomatic (v0.32, ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3) (Bolger et al, Bioinformatics 30, 2114-2120, 2014). Reads were then aligned using STAR (Dobin et al, Bioinformatics 29, 15-21, 2013) and counts were generated using HTSeq (Anders et al, Bioinformatics 31, 166-169, 2015). Mm10 did not include Ascl5; therefore, we added it to the reference GTF file used with HTSeq. It is also important to note that some libraries were prepped using SMARTer® Ultra™ Low Input RNA for Illumina® Sequencing - HV (Clontech Laboratories, Inc.) and sequenced on Illumina’s HiSeq platform, resulting in 100 base pair single-end reads. Libraries were sequenced to a mean of ~37.5 million uniquely mapped 75 base pair single-end reads per replicate.
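
The trim, align, and count sequence described above can be sketched as a set of command lines. Only the Trimmomatic version and trimming steps come from the text; the file names, the STAR options, and the unstranded htseq-count setting (`-s no`) are assumptions chosen for illustration and do not describe the settings actually used. The function only assembles the commands and does not run them.

```python
def build_commands(sample, fastq, star_index, gtf, adapters="TruSeq3-SE.fa"):
    """Assemble Trimmomatic, STAR and htseq-count commands for one library.

    All paths are hypothetical placeholders. STAR and htseq-count options
    are illustrative assumptions, not settings reported in the text.
    """
    trimmed = f"{sample}.trimmed.fastq.gz"
    prefix = f"{sample}."
    # STAR names coordinate-sorted BAM output <prefix>Aligned.sortedByCoord.out.bam
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"

    trim = ["java", "-jar", "trimmomatic-0.32.jar", "SE", fastq, trimmed,
            f"ILLUMINACLIP:{adapters}:2:30:10", "LEADING:3", "TRAILING:3"]
    align = ["STAR", "--genomeDir", star_index,
             "--readFilesIn", trimmed, "--readFilesCommand", "zcat",
             "--outFileNamePrefix", prefix,
             "--outSAMtype", "BAM", "SortedByCoordinate"]
    # htseq-count writes the count table to standard output.
    count = ["htseq-count", "-f", "bam", "-s", "no", bam, gtf]
    return [trim, align, count]


commands = build_commands("iN_rep1", "iN_rep1.fastq.gz",
                          "star_mm10_index", "mm10_plus_Ascl5.gtf")
command_lines = [" ".join(cmd) for cmd in commands]
```

The GTF name in the example stands in for the reference annotation with the added Ascl5 entry; how that entry was written is not specified in the text.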

[00118] RNA-Seq data analysis (DESeq2 and Principal Component Analysis): RNA-Seq data was analyzed using R, an open source programming language and environment for statistical computing and visualization (R: A Language and Environment for Statistical Computing (Vienna, Austria: the R Foundation for Statistical Computing, 2011)). Multiple R packages available through Bioconductor (Gentleman et al, Genome Biol. 5, R80, 2004) were used during analysis. Differential gene expression analysis was conducted using DESeq2 (Love et al, Genome Biol. 15, 550, 2014). Heat maps were generated using gplots (Various R Programming Tools for Plotting Data, 2015). Both rgl (3D Visualization Using OpenGL, 2016) and pca3d (Three Dimensional PCA Plots, 2015) were used to calculate and generate principal component plots.

[00119] Ingenuity Upstream Regulator Analysis in Ingenuity Pathway Analysis (IPA): Ingenuity Upstream Regulator Analysis in Qiagen’s Ingenuity Pathway Analysis (IPA) was used to identify the cascade of upstream regulators of the “core” gene set. IPA utilizes a priori knowledge of expected interactions between transcriptional regulators and their target genes stored in its scientific literature-based database, the Ingenuity Knowledge Base.

[00120] Hypergeometric Optimization of Motif EnRichment (HOMER) analysis: In order to determine the regulatory elements acting within the iNs, specifically transcription factors, motif enrichment was performed on the promoters of differentially expressed genes in the iNs. The known motif enrichment routine in the findMotifs.pl program available in the software HOMER (Hypergeometric Optimization of Motif EnRichment) was used to perform the analysis (Heinz et al, Mol. Cell 38, 576-589, 2010). Known motif enrichment in HOMER is performed by scanning a defined set of promoter regions for motifs defined by a set of position weight matrices (PWMs) and using ZOOPS (zero or one occurrence per sequence) counting coupled with a hypergeometric enrichment test to determine significance. Built into HOMER is a curated set of binding site motifs taken from the TRANSFAC database (Matys, Nucleic Acids Res. 34, D108-D110, 2006). In order to expand our search, entries in JASPAR core (Mathelier et al, Nucleic Acids Res. 44, D110-D115, 2016), a curated collection of transcription factor binding profiles, were converted into PWMs for use in the analysis. HOMER asks for a threshold to be set for all PWMs. This threshold determines the minimum log odds score that is allowed for a sequence to be considered a match with the motif described in a given PWM. When converting the JASPAR profiles, the threshold was set by allowing for the least likely base in the most likely mismatched nucleotide of each motif, which was chosen because it allowed for some degeneracy when searching for possible transcription factor binding sites while excluding overly mismatched sequences.

[00121] Several promoter sets are available within HOMER. We used the mm9 genome build with a promoter region defined as 2000 bp upstream and 50 bp downstream of the transcription start site for all identified genes in the mm9 build. The background gene set was restricted to those genes that were detectable in the RNA-Seq experiments, excluding those genes whose transcripts had very few reads mapped to them across all datasets. All other parameters available for findMotifs.pl were left as their defaults.
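
The known-motif test described above combines three ideas: a log odds score of a site against a PWM, ZOOPS counting (each promoter counts at most once), and a hypergeometric upper-tail probability. The following is a simplified sketch of those ideas and not HOMER's implementation: it scans one strand only, assumes a uniform 0.25 background with natural logarithms, and assumes PWM probabilities are already pseudocounted so that none is zero. All function names and the toy PWM are hypothetical.

```python
from math import comb, log


def log_odds_score(pwm, site, background=0.25):
    """Sum of log(p_base / background) over the positions of a candidate site."""
    return sum(log(column[base] / background) for column, base in zip(pwm, site))


def has_motif(pwm, sequence, threshold):
    """ZOOPS-style call: True if any window scores at or above the threshold."""
    width = len(pwm)
    return any(log_odds_score(pwm, sequence[i:i + width]) >= threshold
               for i in range(len(sequence) - width + 1))


def hypergeom_enrichment_p(n_total, n_total_with_motif, n_target, n_target_with_motif):
    """P(X >= n_target_with_motif) when n_target promoters are drawn at random
    from n_total promoters, of which n_total_with_motif contain the motif."""
    upper = min(n_target, n_total_with_motif)
    favourable = sum(
        comb(n_total_with_motif, k) * comb(n_total - n_total_with_motif, n_target - k)
        for k in range(n_target_with_motif, upper + 1)
    )
    return favourable / comb(n_total, n_target)


# Toy two-position PWM that prefers the dinucleotide "AC".
toy_pwm = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
]
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

In this sketch the log odds threshold plays the role described in the text: lowering it admits more degenerate sites, and raising it restricts matches to near-consensus sequences.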

[00122] Cell Type-Specific Expression Analysis (CSEA): Cell Type-Specific Expression Analysis (CSEA) was conducted using the publicly available CSEA web-based tool provided by the Dougherty Lab (http://genetics.wustl.edu/jdlab/csea-tool-2/, Version 1.0: Updated 10/11/13) and described in Xu et al. (J. Neurosci. 34, 1420-1431, 2014). Uniquely enriched genes of individual iN populations served as the input candidate gene lists. Uniquely enriched genes were defined as genes significantly enriched (p-adjusted value < 0.05) in each iN population versus all other iN populations and MEFs as determined by DESeq2. Overlaps of these gene lists with a particular cell type or region for which data are currently available were identified by Fisher’s exact test with Benjamini-Hochberg correction.
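
The Benjamini-Hochberg correction mentioned above adjusts a set of p-values so that a cutoff on the adjusted values controls the false discovery rate. The sketch below shows the standard step-up adjustment as a generic illustration; it is not the CSEA tool's code, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), and a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Example: four overlap tests, e.g. one per candidate cell type.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```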

[00123] Weighted Gene Co-expression Network Analysis (WGCNA): Weighted Gene Co-expression Network Analysis (WGCNA) has been previously described in detail (Zhang & Horvath, Stat. Appl. Genet. Mol. Biol. 2005) and also summarized in papers utilizing this technique (Hawrylycz et al, Nature Neurosci. 18, 1832-1844, 2015; and Hawrylycz et al, Nature 489, 391-399, 2012). DESeq2 vsd-normalized counts of all iN and MEF population replicates (n = 72) served as input into a user-friendly WGCNA R library (Langfelder & Horvath, BMC Bioinformatics 9, 559, 2008). To reduce the noise from low expressing genes in our dataset, we only included genes in which the non-normalized counts were greater than 200 in at least one iN or MEF population, in both replicates (n = 12,549). We constructed a signed network, with a power of 12, using the default parameters except deepSplit = 4 and cutHeight = 0.999. Modules were merged if their module eigengenes (ME) were correlated with R > 0.8. Module hub genes were those that had the highest module membership (kME) for that module, which was calculated as the Pearson correlation between the gene and the corresponding ME.
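
Two quantities in the paragraph above have simple closed forms. In a signed network the adjacency between two genes is commonly defined as ((1 + cor) / 2) raised to the soft-thresholding power (12 here), and module membership is the Pearson correlation between a gene's expression profile and the module eigengene. The sketch below illustrates both under stated assumptions: it is not the WGCNA R library, the eigengene is supplied as an input rather than computed, and all names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def signed_adjacency(x, y, power=12):
    """Signed-network adjacency: ((1 + cor) / 2) ** power, in [0, 1]."""
    return ((1.0 + pearson(x, y)) / 2.0) ** power


def hub_gene(expression_by_gene, eigengene):
    """Gene whose profile has the highest correlation with the eigengene."""
    return max(expression_by_gene,
               key=lambda gene: pearson(expression_by_gene[gene], eigengene))


# Toy module with a hypothetical eigengene across four samples.
module = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [1.0, 3.0, 2.0, 4.0]}
eigengene = [0.1, 0.2, 0.3, 0.4]
top_gene = hub_gene(module, eigengene)
```

With a power of 12, uncorrelated genes receive an adjacency of 0.5 ** 12 (about 0.0002), so only strongly positively correlated pairs contribute meaningfully to the network.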

[00124] Calcium Imaging: Calcium imaging was performed day 16-24 post induction on iNs transduced with a Map2::GCAMP5.G lentiviral reporter (Addis et al, PLoS One 6, e28719, 2011) and Synapsin-TdTomato lentiviral reporter. Imaging was performed in Tyrode’s solution (145 mM NaCl, 2.5 mM KCl, 10 mM Hepes, NaH2PO4, 2 mM CaCl2, 1 mM MgCl2, 10 mM Glucose, and 0.4 mM ascorbic acid) at a constant flow rate. In a randomized order, we serially exposed the iNs to 1 mM glutamate and 100 mM nicotine by direct application to the area of interest. We only analyzed Synapsin-TdTomato-positive cells that responded to transient exposure to 100-250 mM KCl at the beginning and end of each recording to ensure iNs exhibited neuronal identity and maintained functional viability throughout the recording. Additionally, we did not include mechanosensitive cells that responded to Tyrode’s solution alone. As similarly described in Blanchard et al. (2015), calcium responses were calculated as the change in fluorescence intensity (ΔF) over the initial fluorescence intensity, (F - F0)/F0, where F is the fluorescence at a given time point and F0 was calculated as the average of the first five unstimulated fluorescence measurements at the start of imaging. A non-response area for each recording was measured for background subtraction. The threshold for a positive calcium response to the addition of a ligand was determined as (F - F0)/F0 greater than 0.01 in a 10 sec window.
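
The response measure described above can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function names, the frame rate, the way the background trace is subtracted, and the choice to start the 10 sec window at the frame of ligand addition are all assumptions.

```python
def delta_f_over_f0(trace, background=None, n_baseline=5):
    """Return (F - F0) / F0 for each frame of a fluorescence trace.

    F0 is the mean of the first n_baseline frames. If a background trace from
    a non-responding area is given, it is subtracted frame by frame first.
    """
    if background is not None:
        trace = [f - b for f, b in zip(trace, background)]
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]


def is_responder(dff, ligand_frame, frames_per_s, window_s=10.0, threshold=0.01):
    """True if (F - F0) / F0 exceeds the threshold within the window after
    ligand addition (window placement is an assumption of this sketch)."""
    end = ligand_frame + int(window_s * frames_per_s)
    return max(dff[ligand_frame:end]) > threshold


# Example at a hypothetical 1 frame per second, ligand added at frame 5.
trace = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 103.0, 101.0]
dff = delta_f_over_f0(trace)
responded = is_responder(dff, ligand_frame=5, frames_per_s=1.0)
```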

[00125] Statistics: Statistical analyses conducted on the data presented were performed using GraphPad Prism 7 and are detailed in the corresponding figure legends. Data from electrophysiology experiments were analyzed by one-way ANOVA followed by Bonferroni’s Multiple Comparison Test post-hoc. Similarity of variance between groups was confirmed by Brown-Forsythe test. Data from calcium imaging experiments were analyzed by unpaired Student's t-test. Similarity of variance between groups was confirmed by F test.

[00126] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.

[00127] All publications, GenBank sequences, ATCC deposits, patents and patent applications cited herein are hereby expressly incorporated by reference in their entirety and for all purposes as if each is individually so denoted.