Title:
A PSEUDO-RANDOM DNA EDITOR FOR EFFICIENT AND CONTINUOUS NUCLEOTIDE DIVERSIFICATION IN HUMAN CELLS
Document Type and Number:
WIPO Patent Application WO/2020/206325
Kind Code:
A2
Abstract:
The present disclosure provides compositions and methods for performance of targeted mutagenesis in higher eukaryotic cells, e.g., mammalian cells, across large stretches of targeted sequence. Compositions and methods that rely upon combination of a bacteriophage polymerase with a nucleic acid-editing deaminase to achieve robust mutagenesis of targeted regions of nucleic acid sequence under control of a phage promoter are specifically provided.

Inventors:
CHEN FEI (US)
CHEN HAIQI (US)
LIU SOPHIA (US)
PADULA SAMUEL (US)
GRISWOLD KETTNER (US)
Application Number:
PCT/US2020/026679
Publication Date:
October 08, 2020
Filing Date:
April 03, 2020
Assignee:
BROAD INST INC (US)
HARVARD COLLEGE (US)
International Classes:
C12N15/86
Attorney, Agent or Firm:
COWLES, Christopher R. et al. (US)
Claims:
We Claim:

1. A fusion protein comprising:

(i) a bacteriophage RNA polymerase and

(ii) a nucleic acid-editing deaminase.

2. The fusion protein of claim 1, wherein the bacteriophage RNA polymerase is selected from the group consisting of a T7 RNA polymerase and a T7-like RNA polymerase, optionally wherein the T7-like RNA polymerase is an N4 RNA polymerase.

3. The fusion protein of claim 1, wherein the nucleic acid-editing deaminase is selected from the group consisting of a cytidine deaminase, an adenine deaminase and a guanine deaminase, optionally wherein the cytidine deaminase is an activation-induced cytidine deaminase, optionally wherein the activation-induced cytidine deaminase is rat APOBEC1 or AID, optionally wherein the AID cytidine deaminase is a hyperactive mutant of AID, optionally wherein the hyperactive mutant of AID is AID*A.

4. The fusion protein of claim 1, further comprising a nuclear localization signal (NLS), optionally wherein the NLS is attached at the C-terminus of the fusion protein.

5. The fusion protein of claim 1, further comprising a uracil glycosylase inhibitor (UGI), optionally wherein the UGI is attached at a location C-terminal to the nucleic acid-editing deaminase and the bacteriophage RNA polymerase.

6. A nucleic acid comprising:

(i) a nucleic acid sequence encoding for a bacteriophage RNA polymerase and

(ii) a nucleic acid sequence encoding for a nucleic acid-editing deaminase.

7. The nucleic acid of claim 6, wherein the bacteriophage RNA polymerase is selected from the group consisting of a T7 RNA polymerase and a T7-like RNA polymerase, optionally wherein the T7-like RNA polymerase is an N4 RNA polymerase.

8. The nucleic acid of claim 6, wherein the nucleic acid-editing deaminase is selected from the group consisting of a cytidine deaminase, an adenine deaminase and a guanine deaminase, optionally wherein the cytidine deaminase is an activation-induced cytidine deaminase, optionally wherein the activation-induced cytidine deaminase is rat APOBEC1 or AID, optionally wherein the AID cytidine deaminase is a hyperactive mutant of AID, optionally wherein the hyperactive mutant of AID is AID*A.

9. The nucleic acid of claim 6, further comprising a nucleic acid sequence encoding for a nuclear localization signal (NLS), optionally wherein the nucleic acid sequence encoding for the NLS is attached at the 3’-terminus of the nucleic acid.

10. The nucleic acid of claim 6, further comprising a nucleic acid sequence encoding for a uracil glycosylase inhibitor (UGI), optionally wherein the nucleic acid sequence encoding for the UGI is attached at a location 3’ of the nucleic acid sequence encoding for the nucleic acid editing deaminase and the nucleic acid sequence encoding for the bacteriophage RNA polymerase.

11. The nucleic acid of claim 6, further comprising a mammalian expression vector promoter, optionally wherein the mammalian expression vector promoter is located 5’ of the nucleic acid sequence encoding for a bacteriophage RNA polymerase and the nucleic acid sequence encoding for the nucleic acid-editing deaminase, optionally wherein the mammalian expression vector promoter is selected from the group consisting of a CMV promoter, a SV-40 promoter, an (EF)-1 promoter and a tetracycline-inducible mammalian promoter.

12. The nucleic acid of claim 6, further comprising an origin of replication, optionally wherein the nucleic acid is a plasmid.

13. A mammalian cell comprising a first nucleic acid of any one of claims 6-12.

14. The mammalian cell of claim 13, wherein the cell further comprises a second nucleic acid comprising a bacteriophage promoter corresponding to the bacteriophage RNA polymerase of the first nucleic acid, optionally wherein the bacteriophage promoter is a T7 promoter or is a T7-like promoter, optionally wherein the T7-like promoter is an N4 promoter.

15. The mammalian cell of claim 14, wherein the bacteriophage promoter of the second nucleic acid is operably linked to a target nucleic acid sequence, optionally wherein the target nucleic acid sequence is a mammalian target nucleic acid sequence, optionally wherein the mammalian target nucleic acid sequence is selected from the group consisting of ABL1, FLT3, MCL1, PRKCQ, WEE1, ABL2, FNTA, MDM2, PRKCSH, XIAP, AKT1, GSK3A, MEK1, PRKCZ, AKT2, GSK3B, MET, PRKDC, AKT3, HDAC1, MTOR, PSENEN, ALK, HDAC2, NFKB1, PSMB5, AR, HDAC3, NTRK1, PTK2, ATM, HDAC6, P4HB, PTPN11, AURKA, HDAC8, p53, PTPN6, AURKB, HER2, PAK1, RAC1, AURKC, HSP90AA1, PARP1, RET, BCL2, HSP90AB1, PDGFRA, ROCK1, BCL-ABL1, HSP90AB4P, PDGFRB, ROCK2, BMX, HSP90B1, PDK1, RPS6KA1, BRAF, HSP90B3P, PIK3CA, RPS6KA2, BTK, IGF1R, PIK3CB, RPS6KA3, CASP3, IKBKE, PIK3CD, RPS6KA4, CCR5, ITK, PIK3CG, RPS6KA5, CDK1, JAK2, PLK1, RPS6KA6, CDK2, KDR, PLK2, RPS6KB2, CDK4, KIT, PLK3, RXRA, CDK6, KRAS, PPM1D, RXRB, CDK7, MAP2K1, PRKAA1, SGK3, CTNNB1, MAP2K2, PRKCA, SMO, DHFR, MAPK11, PRKCB, SRC, EGFR, MAPK12, PRKCD, SYK, ERBB2, MAPK13, PRKCE, TBK1, FGFR1, MAPK14, PRKCG, TEC, FGFR3, MAPK7, PRKCH, TNF, FLT1, MAPK8, PRKCI and TOP1.

16. The mammalian cell of claim 14, wherein the second nucleic acid is harbored on a plasmid within the mammalian cell.

17. The mammalian cell of claim 14, wherein the second nucleic acid is integrated into the genome of the mammalian cell, optionally wherein the second nucleic acid is integrated into the genome of the mammalian cell at the Rosa 26 locus, optionally wherein the first nucleic acid and the second nucleic acid are integrated into the genome of the mammalian cell at the Rosa 26 locus.

18. The mammalian cell of claim 14, wherein the mammalian cell is a mouse cell, optionally a mouse oocyte cell.

19. The mammalian cell of claim 17, further comprising a cell type-specific Cre-recombinase or Cre-ER capable of inducing conditional expression of the first nucleic acid and/or the second nucleic acid where Cre-recombinase is present.

20. The mammalian cell of claim 14, wherein the mammalian cell is a cell of a mammalian cell line, optionally wherein the mammalian cell line is selected from the group consisting of HEK293T, VERO, BHK, HeLa, CV1, MDCK, 3T3, a myeloma cell line, PC12, WI38, and Chinese hamster ovary (CHO).

21. A method for performing mutagenesis upon a target nucleic acid of a mammalian cell, the method comprising:

(a) providing a mammalian cell;

(b) contacting the mammalian cell with:

(i) a first nucleic acid of any one of claims 6-12; and

(ii) a second nucleic acid comprising a bacteriophage promoter operably linked to a target nucleic acid;

wherein said contacting with said first nucleic acid and said second nucleic acid is performed in any order, including concurrently; and

(c) culturing the mammalian cell for a duration of time sufficient for mutation of the target nucleic acid to be detected.

22. The method of claim 21, wherein the first nucleic acid is harbored on a plasmid.

23. The method of claim 22, wherein said contacting step (b) comprises transfecting the first nucleic acid into the mammalian cell.

24. The method of claim 21, wherein said contacting step (b) comprises genomic integration of the first nucleic acid.

25. The method of claim 21, wherein the second nucleic acid is harbored on a plasmid.

26. The method of claim 22, wherein said contacting step (b) comprises transfecting the second nucleic acid into the mammalian cell.

27. The method of claim 21, wherein said contacting step (b) comprises genomic integration of the second nucleic acid.

28. A kit comprising a nucleic acid of any one of claims 6-12 and instructions for its use.

29. The kit of claim 28, further comprising a transfection agent, optionally wherein the transfection agent is a lentivirus.

Description:
A PSEUDO-RANDOM DNA EDITOR FOR EFFICIENT AND CONTINUOUS NUCLEOTIDE DIVERSIFICATION IN HUMAN CELLS

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/830,084 filed April 5, 2019, entitled “A Pseudo-Random DNA Editor for Efficient and Continuous Nucleotide Diversification in Human Cells,” the entire contents of which are incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant No. 1DP5OD024583 awarded by the National Institutes of Health. The government has certain rights in the invention.

FIELD OF THE INVENTION

The invention relates generally to methods of DNA editing capable of providing efficient and continuous nucleotide diversification in human cells.

BACKGROUND OF THE INVENTION

The advancement of methods for studying the genetic dynamics of eukaryotic cells, such as directed evolution, lineage tracing, and molecular recording, depends upon development of additional tools for targeted, continuous mutagenesis. Existing tools tend to rely upon non-physiological environments, tend to saturate mutagenized sites rapidly, and/or have only been adapted in bacterial or yeast systems. While approaches for relatively long editing regions have been identified and demonstrated in bacterial and yeast cells, a need exists for an editor system that is efficient in inducing continuous nucleotide diversification in cells of multicellular eukaryotic organisms, especially in mammalian cells.

BRIEF SUMMARY OF THE INVENTION

The current disclosure relates, at least in part, to the discovery of compositions and methods capable of performing targeted mutagenesis in higher eukaryotic cells, particularly in mammalian cells in culture, across large spans of targeted nucleic acid sequence, at mutation rates that are robust as compared to background rates of polymerase-mediated mutation. In certain aspects, the compositions and methods of the instant disclosure provide for enhanced, targeted mutagenesis of mammalian cells capable of enabling directed evolution of targeted sequences in living cells. Accordingly, application of the instant compositions and methods to drug and/or peptide evolution and screening in mammalian cell lines is expressly contemplated, as are other applications as set forth herein and as known in the art.

In one aspect, the instant disclosure provides a fusion protein that includes: (i) a bacteriophage RNA polymerase and (ii) a nucleic acid-editing deaminase.

In one embodiment, the bacteriophage RNA polymerase is a T7 RNA polymerase or a T7-like RNA polymerase. Optionally, the T7-like RNA polymerase is an N4 RNA polymerase.

In another embodiment, the nucleic acid-editing deaminase is a cytidine deaminase, an adenine deaminase and/or a guanine deaminase. Optionally, the cytidine deaminase is an activation-induced cytidine deaminase. Optionally, the activation-induced cytidine deaminase is rat APOBEC1 or AID. Optionally, the AID cytidine deaminase is a hyperactive mutant of AID. Optionally, the hyperactive mutant of AID is AID*A.

In an additional embodiment, the fusion protein further includes a nuclear localization signal (NLS). Optionally, the NLS is attached at the C-terminus of the fusion protein.

In certain embodiments, the fusion protein further includes a uracil glycosylase inhibitor (UGI). Optionally, the UGI is attached at a location C-terminal to the nucleic acid-editing deaminase and the bacteriophage RNA polymerase.

Another aspect of the instant disclosure provides a nucleic acid that includes: (i) a nucleic acid sequence encoding for a bacteriophage RNA polymerase and (ii) a nucleic acid sequence encoding for a nucleic acid-editing deaminase.

In one embodiment, the nucleic acid further includes a nucleic acid sequence encoding for a nuclear localization signal (NLS). Optionally, the nucleic acid sequence encoding for the NLS is attached at the 3’-terminus of the nucleic acid.

In another embodiment, the nucleic acid further includes a nucleic acid sequence encoding for a uracil glycosylase inhibitor (UGI). Optionally, the nucleic acid sequence encoding for the UGI is attached at a location 3’ of the nucleic acid sequence encoding for the nucleic acid editing deaminase and the nucleic acid sequence encoding for the bacteriophage RNA polymerase.

In an additional embodiment, the nucleic acid further includes a mammalian expression vector promoter. Optionally, the mammalian expression vector promoter is located 5’ of the nucleic acid sequence encoding for a bacteriophage RNA polymerase and the nucleic acid sequence encoding for the nucleic acid-editing deaminase. Optionally, the mammalian expression vector promoter is a CMV promoter, a SV-40 promoter, an (EF)-1 promoter or a tetracycline-inducible mammalian promoter (e.g., Tet-On, Tet-Off, etc.).

In another embodiment, the nucleic acid further includes an origin of replication. Optionally, the nucleic acid is a plasmid.
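The optional positional relationships recited above for the nucleic acid (promoter 5' of both coding sequences, UGI-encoding sequence 3' of both, NLS-encoding sequence at the 3'-terminus) can be sketched as a simple ordering check. The following Python sketch is illustrative only: the element labels and the `check_layout` helper are hypothetical names introduced here, and the relative order of the polymerase and deaminase coding sequences is deliberately left unconstrained because the text above does not specify it.

```python
# Illustrative sketch only: element labels and check_layout are hypothetical
# names, not part of the disclosure. Elements are listed 5' -> 3'.

def check_layout(elements):
    """Return the optional positional constraints that a 5'->3' layout violates."""
    problems = []
    pos = {name: i for i, name in enumerate(elements)}

    # Both coding sequences are required by the nucleic acid aspect.
    for required in ("RNAP", "deaminase"):
        if required not in pos:
            problems.append(f"missing {required} coding sequence")
    if problems:
        return problems

    coding = (pos["RNAP"], pos["deaminase"])
    # Optional: promoter located 5' of both coding sequences.
    if "promoter" in pos and pos["promoter"] > min(coding):
        problems.append("promoter is not 5' of both coding sequences")
    # Optional: UGI located 3' of both coding sequences.
    if "UGI" in pos and pos["UGI"] < max(coding):
        problems.append("UGI is not 3' of both coding sequences")
    # Optional: NLS at the 3'-terminus (last element of the listed cassette).
    if "NLS" in pos and pos["NLS"] != len(elements) - 1:
        problems.append("NLS is not at the 3'-terminus")
    return problems


layout_ok = ["promoter", "deaminase", "RNAP", "UGI", "NLS"]
layout_bad = ["UGI", "promoter", "RNAP", "deaminase", "NLS"]
print(check_layout(layout_ok))
print(check_layout(layout_bad))
```

Because every positional feature is recited as optional, the check reports a violation only for elements that are actually present in the layout.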

An additional aspect of the disclosure provides a mammalian cell that includes a first nucleic acid of the disclosure (e.g., encoding for a fusion protein that includes a bacteriophage RNA polymerase and a nucleic acid-editing deaminase).

In one embodiment, the mammalian cell further harbors a second nucleic acid that includes a bacteriophage promoter corresponding to the bacteriophage RNA polymerase of the first nucleic acid. Optionally, the bacteriophage promoter is a T7 promoter or is a T7-like promoter. Optionally, the T7-like promoter is an N4 promoter.

In certain embodiments, the bacteriophage promoter of the second nucleic acid is operably linked to a target nucleic acid sequence. Optionally, the target nucleic acid sequence is a mammalian target nucleic acid sequence. Optionally, the mammalian target nucleic acid sequence is ABL1, FLT3, MCL1, PRKCQ, WEE1, ABL2, FNTA, MDM2, PRKCSH, XIAP, AKT1, GSK3A, MEK1, PRKCZ, AKT2, GSK3B, MET, PRKDC, AKT3, HDAC1, MTOR, PSENEN, ALK, HDAC2, NFKB1, PSMB5, AR, HDAC3, NTRK1, PTK2, ATM, HDAC6, P4HB, PTPN11, AURKA, HDAC8, p53, PTPN6, AURKB, HER2, PAK1, RAC1, AURKC, HSP90AA1, PARP1, RET, BCL2, HSP90AB1, PDGFRA, ROCK1, BCL-ABL1, HSP90AB4P, PDGFRB, ROCK2, BMX, HSP90B1, PDK1, RPS6KA1, BRAF, HSP90B3P, PIK3CA, RPS6KA2, BTK, IGF1R, PIK3CB, RPS6KA3, CASP3, IKBKE, PIK3CD, RPS6KA4, CCR5, ITK, PIK3CG, RPS6KA5, CDK1, JAK2, PLK1, RPS6KA6, CDK2, KDR, PLK2, RPS6KB2, CDK4, KIT, PLK3, RXRA, CDK6, KRAS, PPM1D, RXRB, CDK7, MAP2K1, PRKAA1, SGK3, CTNNB1, MAP2K2, PRKCA, SMO, DHFR, MAPK11, PRKCB, SRC, EGFR, MAPK12, PRKCD, SYK, ERBB2, MAPK13, PRKCE, TBK1, FGFR1, MAPK14, PRKCG, TEC, FGFR3, MAPK7, PRKCH, TNF, FLT1, MAPK8, PRKCI and/or TOP1.

In some embodiments, the second nucleic acid is harbored on a plasmid within the mammalian cell.

In an embodiment, the second nucleic acid is integrated into the genome of the mammalian cell. Optionally, the second nucleic acid is integrated into the genome of the mammalian cell at the Rosa 26 locus. Optionally, the first nucleic acid and the second nucleic acid are integrated into the genome of the mammalian cell at the Rosa 26 locus. In embodiments, the mammalian cell is a mouse cell. Optionally, the mammalian cell is a mouse oocyte cell.

In certain embodiments, the mammalian cell further harbors a cell type-specific Cre-recombinase or Cre-ER capable of inducing conditional expression of the first nucleic acid and/or the second nucleic acid where Cre-recombinase is present.

In one embodiment, the mammalian cell is a cell of a mammalian cell line. Optionally, the mammalian cell line is HEK293T, VERO, BHK, HeLa, CV1, MDCK, 3T3, a myeloma cell line, PC12, WI38 or Chinese hamster ovary (CHO).

Another aspect of the instant disclosure provides a method for performing mutagenesis upon a target nucleic acid of a mammalian cell, the method involving: (a) providing a mammalian cell; (b) contacting the mammalian cell with: (i) a first nucleic acid of the instant disclosure; and (ii) a second nucleic acid that includes a bacteriophage promoter operably linked to a target nucleic acid; where contacting of the mammalian cell with the first nucleic acid and the second nucleic acid is performed in any order, including concurrently; and (c) culturing the mammalian cell for a duration of time sufficient for mutation of the target nucleic acid to be detected.

In one embodiment, the first nucleic acid is harbored on a plasmid.

In another embodiment, contacting step (b) includes transfecting the first nucleic acid into the mammalian cell. Optionally, the transfecting involves a lentivirus.

In other embodiments, contacting step (b) includes genomic integration of the first nucleic acid.

In certain embodiments, the second nucleic acid is harbored on a plasmid.

In an additional embodiment, contacting step (b) involves transfecting the second nucleic acid into the mammalian cell.

In other embodiments, contacting step (b) involves genomic integration of the second nucleic acid.
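The method set forth above (providing a cell, contacting it with the first and second nucleic acids, and culturing until mutation of the target is detectable) can be illustrated with a toy simulation. The Python sketch below is a simplified model under stated assumptions and is not the disclosed method: it assumes the commonly cited T7 promoter consensus sequence, treats each round of culture as one pass over the target, models only cytidine deamination read out as C-to-T on the strand shown, and uses a hypothetical per-base `rate`. The function name `mutagenize` and the example sequences are likewise hypothetical.

```python
import random

# Commonly cited T7 promoter consensus; its use here is an assumption of this sketch.
T7_PROMOTER = "TAATACGACTCACTATAG"


def mutagenize(second_nucleic_acid, rounds, rate, seed=0):
    """Toy model: in each culture round, every C downstream of the promoter
    becomes T with probability `rate` (cytidine deamination read as C->T)."""
    start = second_nucleic_acid.find(T7_PROMOTER)
    if start < 0:
        raise ValueError("no bacteriophage promoter found; target is not addressed")
    target_start = start + len(T7_PROMOTER)

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    bases = list(second_nucleic_acid)
    for _ in range(rounds):
        for i in range(target_start, len(bases)):
            if bases[i] == "C" and rng.random() < rate:
                bases[i] = "T"
    return "".join(bases)


upstream = "GGCCGGCC"                # hypothetical sequence 5' of the promoter
target = "ATGCCCACCGGCAGCTGCTAA"     # hypothetical target 3' of the promoter
construct = upstream + T7_PROMOTER + target
edited = mutagenize(construct, rounds=10, rate=0.05)

changed = [i for i, (a, b) in enumerate(zip(construct, edited)) if a != b]
print(edited)
print("edited positions:", changed)
```

In this model, sequence upstream of the promoter and the promoter itself are never edited, which mirrors the idea that mutagenesis is directed to the region placed under control of the phage promoter.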

A further aspect of the instant disclosure provides a kit that includes a nucleic acid of the instant disclosure and instructions for its use.

In one embodiment, the kit further includes a transfection agent. Optionally, the transfection agent is a lentivirus.

Definitions

As used herein, the term “bacteriophage RNA polymerase” refers to any bacteriophage-derived RNA polymerase (RNAP) that possesses DNA processivity, which is expressly contemplated to include all variant, mutant and/or derivative forms of bacteriophage RNAP, provided that DNA processivity is maintained. Specific examples of RNAP are set forth below, and include, without limitation, T7 RNAP and T7-like RNA polymerases, such as T3 RNAP, SP6 RNAP and/or N4 RNAP.

The term “nucleic acid-editing deaminase,” as used herein, refers to any deaminase that is capable of performing somatic hypermutation. Deaminases effect the deamination or removal of an amine group of a nucleic acid. Expressly contemplated examples of nucleic acid-editing deaminases include, but are not limited to, adenine deaminase, cytidine deaminase (including activation-induced cytidine deaminase), and guanine deaminase. Specific examples of nucleic acid-editing deaminases are provided in additional detail elsewhere herein.

The term "fusion protein" as used herein refers to an engineered polypeptide that combines sequence elements excerpted from two or more other proteins, optionally from two or more naturally-occurring proteins.

The terms "transfect," "transfects," "transfecting" and "transfection" as used herein refer to the delivery of nucleic acids (usually DNA or RNA) to the cytoplasm or nucleus of cells, e.g., through the use of lentiviral delivery vectors/plasmids, cationic lipid vehicle(s) and/or by means of electroporation, or other art-recognized means of transfection.

The term "plasmid" as used herein refers to a construction comprised of genetic material designed to direct transformation of a targeted cell. The plasmid consists of a plasmid backbone. A "plasmid backbone" as used herein contains multiple genetic elements positionally and sequentially oriented with other necessary genetic elements such that the nucleic acid in a nucleic acid cassette can be transcribed and when necessary translated in the transfected cells. The term plasmid as used herein can refer to nucleic acid, e.g., DNA derived from a plasmid vector, cosmid, phagemid or bacteriophage, into which one or more fragments of nucleic acid may be inserted or cloned which encode for particular genes.

A "viral vector" as used herein is one that is physically incorporated in a viral particle by the inclusion of a portion of a viral genome within the vector, e.g., a packaging signal, and is not merely DNA or a located gene taken from a portion of a viral nucleic acid. Thus, while a portion of a viral genome can be present in a plasmid of the present disclosure, that portion does not cause incorporation of the plasmid into a viral particle and thus is unable to produce an infective viral particle.

As used herein, the term “vector” refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc., which is capable of replication when associated with the proper control elements and which can transfer gene sequences between cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.

As used herein, the term “integrating vector” refers to a vector whose integration or insertion into a nucleic acid (e.g., a chromosome) is accomplished via an integrase. Examples of “integrating vectors” include, but are not limited to, retroviral vectors, transposons, and adeno-associated virus vectors.

As used herein, the term “integrated” refers to a vector that is stably inserted into the genome (i.e., into a chromosome) of a host cell.

As used herein, the term “genome” refers to the genetic material (e.g., chromosomes) of an organism.

The term “target nucleic acid” refers to any nucleotide sequence (e.g., RNA or DNA), the manipulation of which may be deemed desirable for any reason (e.g., for directed evolution, to treat disease, confer improved qualities, expression of a protein of interest in a host cell, expression of a ribozyme, etc.), by one of ordinary skill in the art. Such nucleic acid sequences include, but are not limited to, coding sequences of genes (e.g., enzyme-encoding genes, transcription factor-encoding genes, cytokine-encoding genes, reporter genes, selection marker genes, oncogenes, drug resistance genes, growth factors, etc.), and non-coding regulatory sequences which do not encode an mRNA or protein product (e.g., promoter sequence, polyadenylation sequence, termination sequence, enhancer sequence, etc.).

As used herein, the term “exogenous gene” refers to a gene that is not naturally present in a host organism or cell, or is artificially introduced into a host organism or cell.

The term “gene” refers to a nucleic acid (e.g., DNA or RNA) sequence that comprises coding sequences necessary for the production of a polypeptide or precursor (e.g., proinsulin). The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, etc.) of the full-length or fragment are retained. The term also encompasses the coding region of a structural gene and includes sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. The sequences that are located 5' of the coding region and which are present on the mRNA are referred to as 5' untranslated sequences. The sequences that are located 3' or downstream of the coding region and which are present on the mRNA are referred to as 3' untranslated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene which are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
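The removal of introns from a primary transcript described above can be illustrated with a minimal Python sketch. The sequences, the exon coordinates and the `splice` helper are hypothetical and chosen only for illustration; real splice-site selection is carried out by cellular machinery and is not modeled here.

```python
def splice(primary_transcript, exons):
    """Join exon segments, given as half-open (start, end) intervals in 5'->3' order."""
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical primary transcript: exon 1 + intron + exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"   # illustrative intron beginning with GU and ending with AG
exon_2 = "UUUUAA"
primary = exon_1 + intron + exon_2

exons = [(0, 6), (18, 24)]  # coordinates of the two exons within `primary`
mrna = splice(primary, exons)
print(mrna)  # AUGGCCUUUUAA
```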

As used herein, the term “gene expression” refers to the process of converting genetic information encoded in a gene into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through “transcription” of the gene (i.e., via the enzymatic action of an RNA polymerase), and for protein encoding genes, into protein through “translation” of mRNA. Gene expression can be regulated at many stages in the process. “Up-regulation” or “activation” refers to regulation that increases the production of gene expression products (i.e., RNA or protein), while “down-regulation” or “repression” refers to regulation that decreases production. Molecules (e.g., transcription factors) that are involved in up-regulation or down-regulation are often called “activators” and “repressors,” respectively.

Where “amino acid sequence” is recited herein to refer to an amino acid sequence of a naturally occurring protein molecule, “amino acid sequence” and like terms, such as “polypeptide” or “protein” are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule.

As used herein, the terms “nucleic acid molecule encoding,” “DNA sequence encoding,” “DNA encoding,” “RNA sequence encoding,” and “RNA encoding” refer to the order or sequence of deoxyribonucleotides or ribonucleotides along a strand of deoxyribonucleic acid or ribonucleic acid. The order of these deoxyribonucleotides or ribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA or RNA sequence thus codes for the amino acid sequence.
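How the order of nucleotides determines the order of amino acids, and how a single C-to-T change of the kind produced by cytidine deamination can alter the encoded polypeptide, can be illustrated with the standard genetic code. The Python sketch below is illustrative only; the example sequences and the `translate` helper are hypothetical and are not taken from the disclosure.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA sequence from its first base until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGCAGGCCTAA"   # hypothetical coding sequence: Met-Gln-Ala-stop
edited = "ATGTAGGCCTAA"     # same sequence after one C->T change (CAG -> TAG)
print(translate(original))  # MQA
print(translate(edited))    # M
```

In this example a single C-to-T change converts a glutamine codon into a stop codon and truncates the product; other C-to-T changes may instead be silent or substitute one amino acid for another.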

As used herein, the term “variant,” when used in reference to a protein, refers to proteins encoded by partially homologous nucleic acids so that the amino acid sequence of the proteins varies. As used herein, the term “variant” encompasses proteins encoded by homologous genes having both conservative and nonconservative amino acid substitutions that do not result in a change in protein function, as well as proteins encoded by homologous genes having amino acid substitutions that cause decreased (e.g., null mutations) protein function or increased protein function.

The terms “in operable combination,” “in operable order,” and “operably linked” as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

As used herein, the term “regulatory element” refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element that facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, RNA export elements, internal ribosome entry sites, etc.

Transcriptional control signals in eukaryotes comprise “promoter” and “enhancer” elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription (Maniatis et al., Science 236:1237 [1987]). Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells, and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types (for review see, Voss et al., Trends Biochem. Sci., 11:287 [1986]; and Maniatis et al., supra). For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells (Dijkema et al., EMBO J. 4:761 [1985]). Two other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene (Uetsuki et al., J. Biol. Chem., 264:5791 [1989]; Kim et al., Gene 91:217 [1990]; and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 [1990]) and the long terminal repeats of the Rous sarcoma virus (Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 [1982]) and the human cytomegalovirus (Boshart et al., Cell 41:521 [1985]).

As used herein, the term “promoter/enhancer” denotes a segment of DNA which contains sequences capable of providing both promoter and enhancer functions (i.e., the functions provided by a promoter element and an enhancer element, see above for a discussion of these functions). For example, the long terminal repeats of retroviruses contain both promoter and enhancer functions. The enhancer/promoter may be “endogenous” or “exogenous” or “heterologous.” An “endogenous” enhancer/promoter is one which is naturally linked with a given gene in the genome. An “exogenous” or “heterologous” enhancer/promoter is one which is placed in juxtaposition to a gene by means of genetic manipulation (i.e., molecular biological techniques such as cloning and recombination) such that transcription of that gene is directed by the linked enhancer/promoter.

The term “promoter,” “promoter element,” or “promoter sequence” as used herein, refers to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter is typically, though not necessarily, located 5' (i.e., upstream) of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription.

Promoters may be constitutive or regulatable. The term “constitutive” when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid sequence in the absence of a stimulus (e.g., heat shock, chemicals, etc.). In contrast, a “regulatable” promoter is one which is capable of directing a level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat shock, chemicals, etc.) which is different from the level of transcription of the operably linked nucleic acid sequence in the absence of the stimulus.

Eukaryotic expression vectors may also contain “viral replicons” or “viral origins of replication.” Viral replicons are viral DNA sequences that allow for the extrachromosomal replication of a vector in a host cell expressing the appropriate replication factors. Vectors that contain either the SV40 or polyoma virus origin of replication replicate to high “copy number” (up to 10^4 copies/cell) in cells that express the appropriate viral T antigen. Vectors that contain the replicons from bovine papillomavirus or Epstein-Barr virus replicate extrachromosomally at “low copy number” (~100 copies/cell). However, it is not intended that expression vectors be limited to any particular viral origin of replication.

As used herein, the term “retrovirus” refers to a retroviral particle which is capable of entering a cell (i.e., the particle contains a membrane-associated protein such as an envelope protein or a viral G glycoprotein which can bind to the host cell surface and facilitate entry of the viral particle into the cytoplasm of the host cell) and integrating the retroviral genome (as a double-stranded provirus) into the genome of the host cell. The term “retrovirus” encompasses Oncovirinae (e.g., Moloney murine leukemia virus (MoMLV), Moloney murine sarcoma virus (MoMSV), and Mouse mammary tumor virus (MMTV)), Spumavirinae, and Lentivirinae (e.g., Human immunodeficiency virus, Simian immunodeficiency virus, Equine infectious anemia virus, and Caprine arthritis-encephalitis virus; See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of which are incorporated herein by reference).

As used herein, the term“retroviral vector” refers to a retrovirus that has been modified to express a gene of interest. Retroviral vectors can be used to transfer genes efficiently into host cells by exploiting the viral infectious process. Foreign or heterologous genes cloned (i.e., inserted using molecular biological techniques) into the retroviral genome can be delivered efficiently to host cells which are susceptible to infection by the retrovirus.

The term “Rhabdoviridae” refers to a family of enveloped RNA viruses that infect animals, including humans, and plants. The Rhabdoviridae family encompasses the genus Vesiculovirus, which includes vesicular stomatitis virus (VSV), Cocal virus, Piry virus, Chandipura virus, and Spring viremia of carp virus (sequences encoding the Spring viremia of carp virus are available under GenBank accession number U18101). The G proteins of viruses in the Vesiculovirus genus are virally-encoded integral membrane proteins that form externally projecting homotrimeric spike glycoprotein complexes that are required for receptor binding and membrane fusion. The G proteins of viruses in the Vesiculovirus genus have a covalently bound palmitic acid (C16) moiety. The amino acid sequences of the G proteins from the Vesiculoviruses are fairly well conserved. For example, the Piry virus G protein shares about 38% identity and about 55% similarity with the VSV G proteins (several strains of VSV are known, e.g., Indiana, New Jersey, Orsay, San Juan, etc., and their G proteins are highly homologous). The Chandipura virus G protein and the VSV G proteins share about 37% identity and 52% similarity. Given the high degree of conservation (amino acid sequence) and the related functional characteristics (e.g., binding of the virus to the host cell and fusion of membranes, including syncytia formation) of the G proteins of the Vesiculoviruses, the G proteins from non-VSV Vesiculoviruses may be used in place of the VSV G protein for the pseudotyping of viral particles. The G proteins of the Lyssa viruses (another genus within the Rhabdoviridae family) also share a fair degree of conservation with the VSV G proteins and function in a similar manner (e.g., mediate fusion of membranes) and therefore may be used in place of the VSV G protein for the pseudotyping of viral particles. The Lyssa viruses include the Mokola virus and the Rabies viruses (several strains of Rabies virus are known and their G proteins have been cloned and sequenced). The Mokola virus G protein shares stretches of homology (particularly over the extracellular and transmembrane domains) with the VSV G proteins, showing about 31% identity and 48% similarity with the VSV G proteins. Preferred G proteins share at least 25% identity, preferably at least 30% identity and most preferably at least 35% identity with the VSV G proteins. The VSV G protein from the New Jersey strain (the sequence of this G protein is provided in GenBank accession numbers M27165 and M21557) is employed as the reference VSV G protein.

As used herein, the term“lentivirus vector” refers to retroviral vectors derived from the Lentiviridae family (e.g., human immunodeficiency virus, simian immunodeficiency virus, equine infectious anemia virus, and caprine arthritis-encephalitis virus) that are capable of integrating into non-dividing cells (See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of which are incorporated herein by reference).

As used herein, the term “adeno-associated virus (AAV) vector” refers to a vector derived from an adeno-associated virus serotype, including without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAVX7, etc. AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain functional flanking ITR sequences.

As used herein, the term “in vitro” refers to an artificial environment and to processes or reactions that occur within an artificial environment. In vitro environments can consist of, but are not limited to, test tubes and cell cultures. The term “in vivo” refers to the natural environment (e.g., an animal or a cell) and to processes or reactions that occur within a natural environment.

As used herein, the term “clonally derived” refers to a cell line that is derived from a single cell.

As used herein, the term“non-clonally derived” refers to a cell line that is derived from more than one cell.

As used herein, the term “passage” refers to the process of diluting a culture of cells that has grown to a particular density or confluency (e.g., 70% or 80% confluent), and then allowing the diluted cells to regrow to the particular density or confluency desired (e.g., by replating the cells or establishing a new roller bottle culture with the cells).

As used herein, the term “stable,” when used in reference to a genome, refers to the stable maintenance of the information content of the genome from one generation to the next, or, in the particular case of a cell line, from one passage to the next. Accordingly, a genome is considered to be stable if no gross changes occur in the genome (e.g., a gene is deleted or a chromosomal translocation occurs). The term “stable” does not exclude subtle changes that may occur to the genome such as point mutations.

As used herein, the term “cell culture” refers to any in vitro culture of cells. Included within this term are continuous cell lines (e.g., with an immortal phenotype), primary cell cultures, finite cell lines (e.g., non-transformed cells), and any other cell population maintained in vitro, including oocytes and embryos.

As used herein, the term“host cell” refers to any eukaryotic cell (e.g., mammalian cells, avian cells, amphibian cells, plant cells, fish cells, and insect cells), whether located in vitro or in vivo.

Unless specifically stated or obvious from context, as used herein, the term“about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean.“About” can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value.

In certain embodiments, the term "approximately" or "about" refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).

Unless otherwise clear from context, all numerical values provided herein are modified by the term“about.”

By“control” or“reference” is meant a standard of comparison. Methods to select and test control samples are within the ability of those in the art. Determination of statistical significance is within the ability of those skilled in the art, e.g., the number of standard deviations from the mean that constitute a positive result.

As used herein, the term "each," when used in reference to a collection of items, is intended to identify an individual item in the collection but does not necessarily refer to every item in the collection. Exceptions can occur if explicit disclosure or context clearly dictates otherwise.

As used herein, the term "subject" includes humans and mammals (e.g., mice, rats, pigs, cats, dogs, and horses). In many embodiments, subjects are mammals, particularly primates, especially humans. In some embodiments, subjects are livestock such as cattle, sheep, goats, cows, swine, and the like; poultry such as chickens, ducks, geese, turkeys, and the like; and domesticated animals particularly pets such as dogs and cats. In some embodiments (e.g., particularly in research contexts) subject mammals will be, for example, rodents (e.g., mice, rats, hamsters), rabbits, primates, or swine such as inbred pigs and the like. Unless specifically stated or obvious from context, as used herein, the term "or" is understood to be inclusive. Unless specifically stated or obvious from context, as used herein, the terms "a", "an", and "the" are understood to be singular or plural.

Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another aspect includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it is understood that the particular value forms another aspect. It is further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. It is also understood that throughout the application, data are provided in a number of different formats and that these data represent endpoints and starting points and ranges for any combination of the data points. For example, if a particular data point “10” and a particular data point “15” are disclosed, it is understood that greater than, greater than or equal to, less than, less than or equal to, and equal to 10 and 15 are considered disclosed as well as between 10 and 15. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.

Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 as well as all intervening decimal values between the aforementioned integers such as, for example, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, and 1.9. With respect to sub-ranges, “nested sub-ranges” that extend from either end point of the range are specifically contemplated. For example, a nested sub-range of an exemplary range of 1 to 50 may comprise 1 to 10, 1 to 20, 1 to 30, and 1 to 40 in one direction, or 50 to 40, 50 to 30, 50 to 20, and 50 to 10 in the other direction.

The transitional term “comprising,” which is synonymous with “including,” “containing,” or “characterized by,” is inclusive or open-ended and does not exclude additional, unrecited elements or method steps. By contrast, the transitional phrase “consisting of” excludes any element, step, or ingredient not specified in the claim. The transitional phrase “consisting essentially of” limits the scope of a claim to the specified materials or steps “and those that do not materially affect the basic and novel characteristic(s)” of the claimed invention.

The embodiments set forth below and recited in the claims can be understood in view of the above definitions.

Other features and advantages of the disclosure will be apparent from the following description of the preferred embodiments thereof, and from the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. All published foreign patents and patent applications cited herein are incorporated herein by reference. All other published references, documents, manuscripts and scientific literature cited herein are incorporated herein by reference. In the case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description, given by way of example, but not intended to limit the disclosure solely to the specific embodiments described, may best be understood in conjunction with the accompanying drawings, in which:

FIGs. 1A to 1E show that the approach set forth herein (termed “PRIME” or alternatively “TRACE” for “T7 polymeRAse-driven Continuous Editing”) enabled targeted mutagenesis in mammalian cells within a 2000-bp window with high efficiency. FIG. 1A shows a schematic of the PRIME approach, in which the recombinant protein fusion of cytidine deaminase and T7 RNAP specifically recognizes a T7 promoter upstream of the target gene. The fusion protein subsequently reads through the DNA sequence and introduces site mutations (C·G->T·A). FIG. 1B shows a schematic of constructs designed and used in the instant disclosure. T7 RNAP, T7 RNA polymerase; AID, activation-induced cytidine deaminase; UGI, uracil glycosylase inhibitor; NLS, nuclear localization signal. FIG. 1C shows representative sequencing reads aligned to a subset of the target region in pT7, pAID-T7, and pAID-T7-UGI, respectively. C->T mutations in the aligned reads have been highlighted in green and G->A mutations have been highlighted in red. FIG. 1D shows dot plots of a representative experiment showing C->T (upper panel) and G->A (lower panel) mutation rate per base (%) across the target region (as currently exemplified, a 2000-bp window) in the pT7, pAID-T7 and pAID-T7-UGI groups. Dot plots showing mutation rates in pAPOBEC-T7 and pAPOBEC-T7-UGI are also displayed below, in FIG. 5A. FIG. 1E shows average C->T (left) and G->A (right) mutation rates of the target region in pAPOBEC-T7, pAPOBEC-T7-UGI, pAID-T7, and pAID-T7-UGI groups (N=3 biological replicates). Background error rate was subtracted (see Example 1: Materials and Methods, below).

FIGs. 2A and 2B show that PRIME enabled continuous somatic mutations in targeted gene loci with high efficiency and negligible off-target effect. FIG. 2A shows that PRIME enabled accumulation of mutations in targeted gene loci over time. EGFP under the control of a T7 promoter was lentivirally integrated into the genome of HEK293T cells. A single integrated clone was transfected with pAID-T7-UGI vs. pAID every 3 days (upper panel). C->T and G->A mutations in the EGFP region were observed to accumulate over a course of 7 days. Lower panel shows results from two biological replicates with the same integrated clone. Background error rate was subtracted. FIG. 2B shows that PRIME exhibited negligible off-target mutation rates in the human genome. Two regions in the human genome with a single-base mismatch from the wild type conserved T7 promoter sequence are highlighted (upper panel). 2000-bp windows (designated as Chr6 & Chr7 locations) immediately downstream of the two T7 promoter-like regions were amplified and sequenced. C->T and G->A mutation rates observed for off-targets (Chr6, Chr7) in the pAID-T7-UGI and pT7 groups were compared to the on-target mutation rates in the pAID-T7-UGI group after 1 week of transfection (lower panel).

FIGs. 3A to 3C demonstrate engineering of the T7 RNA polymerase to achieve high efficiency PRIME. FIG. 3A depicts a schematic showing the mutations in T7 RNA polymerase tested in the Examples of the instant disclosure (upper panel). Bar graphs show the C->T and G->A mutation rates among pEditor variants harboring different mutations in T7 RNA polymerase (lower panel) (N=2 biological replicates). FIG. 3B shows that PRIME-mediated mutation shifted the fluorescence excitation and emission spectra of BFP to those of GFP. In particular, a single H66Y amino acid substitution (CAC->TAC or TAT) caused a shift in the fluorescence excitation and emission spectra of BFP to those of GFP (left panel). Representative fluorescence microscopy images of cells transfected with the indicated editor constructs are also shown (right panel). Scale bar, 100 µm. Scale bar in insets, 15 µm. FIG. 3C summarizes the ratio of GFP-positive cells to BFP-positive cells in each group (N=3 biological replicates).

FIGs. 4A and 4B demonstrate that the PRIME approach maintained the transcriptional activity of T7 RNA polymerase. FIG. 4A shows that fusing a cytidine deaminase to T7 RNAP did not significantly hinder the transcriptional activity of the T7 RNAP. Each pEditor variant was introduced into HEK293T cells together with pTarget in which the EGFP gene was solely under the control of a T7 promoter. EGFP signals were observed in cells transfected with pT7, pAPOBEC-T7, pAPOBEC-T7-UGI, pAID-T7, and pAID-T7-UGI, but not in cells transfected with pAPOBEC. Scale bar, 200 µm, which also applies to other micrographs. FIG. 4B shows a schematic of the experimental workflow for calculating the mutation rates of PRIME. Cells transfected with pTarget and pEditor plasmids were incubated for 3 days before being harvested. pTarget plasmids were extracted and PCR reactions were performed to amplify the target region. Sequencing libraries were prepared using the PCR products and next-generation sequencing was performed. Mutation rates in each group, across different pEditor variants, were calculated.
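The mutation-rate calculation outlined for FIG. 4B can be illustrated with a minimal Python sketch. It assumes reads that are already aligned to the reference and of equal length, and it assumes that background subtraction is a simple difference clamped at zero; the actual pipeline is described in Example 1 and may differ. The function names and the toy reads are hypothetical.

```python
def mutation_rate(ref, reads, ref_base, alt_base):
    """Fraction of read bases at ref_base positions that read alt_base.

    Assumes every read is gap-free and aligned base-for-base to ref.
    """
    covered = 0
    mutated = 0
    for read in reads:
        for r, b in zip(ref, read):
            if r == ref_base and b in "ACGT":
                covered += 1
                if b == alt_base:
                    mutated += 1
    return mutated / covered if covered else 0.0


def background_subtracted(sample_rate, background_rate):
    """Subtract a control-group rate (assumed here to be the pT7 group)."""
    return max(sample_rate - background_rate, 0.0)


# Hypothetical toy data: one C->T edit among four covered C positions.
reference = "ACGC"
aligned_reads = ["ATGC", "ACGC"]
c_to_t = mutation_rate(reference, aligned_reads, "C", "T")
g_to_a = mutation_rate(reference, aligned_reads, "G", "A")
print(c_to_t, g_to_a)
```

Reporting C->T and G->A separately mirrors the way the figures present the two mutation classes.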

FIGs. 5A to 5C depict that PRIME demonstrated high efficiency and specificity in human cells. FIG. 5A shows dot plots of a representative experiment showing C->T (upper panel) and G->A (lower panel) mutation rates per base (%) across a ~2-kbp region downstream of a T7 promoter in the pT7, pAPOBEC-T7 and pAPOBEC-T7-UGI groups. FIG. 5B shows that overexpression of cytidine deaminases alone (pAPOBEC or pAID) in the cells resulted in mutation rates that were not statistically different from the background error rates (i.e., the mutation rates in the pT7 group). Each bar is a mean ± SD of N = 3 biological replicates. FIG. 5C shows bar graphs that display the C->A and G->T (left), C->G and G->C (right) mutation rates observed in pAID-T7 and pAID-T7-UGI groups. Background error rate was subtracted. Each bar is a mean ± SD of N = 3 biological replicates.

FIG. 6 shows that the PRIME approach demonstrated robust capability in inducing continuous somatic mutations in genomic loci. Plots show observed C->T and G->A mutations in targeted gene loci over a period of 7 days in pAID-T7-UGI vs. pAID group in two additional single cell clones. Background error rate was subtracted.

FIG. 7 displays a table in which features of the instant PRIME approach have been compared with other art-recognized methods for nucleotide diversification.

FIG. 8 displays a reconstruction of cellular lineages produced using the instant TRACE (T7 polymeRAse-driven Continuous Editing) approach over 10 days. Shown are sequence alignments from next generation sequencing (NGS) reads of a cell population that underwent TRACE-mediated diversification. The population was sampled at 4, 7 and 10 days. Highlighted in red and blue are C->T and G->A edits from the consensus. This clonal population was then extracted via consensus editing, and a lineage tree was reconstructed via maximum parsimony.

DETAILED DESCRIPTION OF THE INVENTION

The current disclosure relates, at least in part, to the identification of a system capable of performing targeted mutagenesis in higher eukaryotic cells, particularly in mammalian cells in culture, across large regions (e.g., 2 kb or more) of targeted nucleic acid sequence, at significantly elevated on-target rates of mutation, as compared to either off-target mutation rates or to background rates of polymerase-mediated mutation. In some aspects, a region of nucleic acid sequence that is to be targeted for mutagenesis is placed under control of (operably linked to) a bacteriophage promoter (e.g., a T7 promoter), and this promoter-target nucleic acid construct is introduced to a mammalian cell (optionally via transfection). Meanwhile, a nucleic acid construct that encodes an RNA polymerase (that recognizes the bacteriophage promoter associated with the target nucleic acid sequence) and an operably linked nucleic acid-editing deaminase is constructed and also introduced to the mammalian cell harboring the phage promoter-target nucleic acid construct. The targeted mammalian cell is then cultured for an amount of time sufficient to allow the RNA polymerase to process across the targeted nucleic acid region of interest, and to thereby introduce deaminase-mediated mutations into the targeted nucleic acid sequence during such phage RNA polymerase processing across the targeted nucleic acid.

In certain aspects, the compositions and methods of the instant disclosure therefore provide for enhanced, targeted mutagenesis of mammalian cells, to an extent that is capable of enabling directed evolution of targeted sequences in living cells. As such, application of the instant compositions and methods to drug and/or peptide evolution and screening in mammalian cell lines is expressly contemplated, as are other applications as set forth herein and as are known in the art.

Bacteriophage RNAPs have been previously identified as capable of reading through DNA sequences under the control of a specific promoter without auxiliary transcription factors (8). In particular, the T7 RNAP/T7 promoter system has been previously described as capable of serving as an orthogonal gene expression system in mammalian cells (9, 10). Somatic hypermutation machinery, especially the family of cytidine deaminases, have also been leveraged to induce DNA base switching by catalyzing the deamination of cytosine (C) and subsequent conversion to uracil (U), which is read as thymine (T) by polymerases (11). The instant disclosure has examined whether combining the DNA processivity of bacteriophage DNA-dependent RNA polymerases (RNAPs) with the somatic hypermutation capability of cytidine deaminases could enable continuous, targeted mutagenesis in eukaryotic cells. As demonstrated herein, such a system for pseudo-random integrated mutation of eukaryotic cells (PRIME) is indeed effective and robust.
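The deamination chemistry described above (C to U, read as T) can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the experimental system: each C on the reported strand is converted to T with a fixed probability, and each G is converted to A with the same probability to represent deamination of the C on the opposite strand. The function names, the per-base rate, and the example sequence are hypothetical.

```python
import random


def deaminate(seq, rate, rng):
    """Return a copy of seq with pseudo-random C->T and G->A substitutions.

    C->T models deamination on the reported strand; G->A models
    deamination of the paired C on the opposite strand.
    """
    edited = []
    for base in seq:
        if base == "C" and rng.random() < rate:
            edited.append("T")
        elif base == "G" and rng.random() < rate:
            edited.append("A")
        else:
            edited.append(base)
    return "".join(edited)


def count_edits(ref, edited):
    """Count C->T and G->A differences between two equal-length sequences."""
    c_to_t = sum(1 for r, e in zip(ref, edited) if r == "C" and e == "T")
    g_to_a = sum(1 for r, e in zip(ref, edited) if r == "G" and e == "A")
    return c_to_t, g_to_a


# Hypothetical target region and editing rate, for illustration only.
rng = random.Random(0)
target = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACC"
edited = deaminate(target, 0.1, rng)
print(edited, count_edits(target, edited))
```

Applying `deaminate` repeatedly to its own output gives a rough picture of how edits could accumulate over successive rounds, since A and T are never converted back in this model.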

Various expressly contemplated components of certain compositions and methods of the instant disclosure are considered in additional detail below.

Bacteriophage Promoters

Certain aspects of the instant disclosure relate to compositions and methods that include bacteriophage promoters, as well as corresponding bacteriophage polymerases, to achieve targeted mutagenesis in mammalian cells across long stretches of sequence. Exemplary bacteriophage promoters of the instant disclosure include, but are not limited to, the following.

T7 Bacteriophage Promoter

The T7 bacteriophage promoter has the sequence 5'-TAATACGACTCACTATAG-3' (SEQ ID NO: 1). The T7 RNA polymerase initiates transcription at the 3'-terminal guanine (G) of the T7 promoter sequence. The T7 polymerase then transcribes using the opposite strand as a template, processing from 5'->3'. The first base in a T7 polymerase transcript is therefore a guanine (G). The T7 promoter family includes both constitutive promoters and negatively regulated promoters, which can be turned off by a repressor protein. The most common bacterial strain to use with a T7 promoter system is BL21 (DE3), which is an E. coli B strain that contains a λ lysogen with an inducible T7 RNAP gene on the chromosome. However, it is possible to engineer many other E. coli strains to conditionally express T7 RNAP.
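As a minimal illustration of the promoter geometry just described, the following Python sketch locates SEQ ID NO: 1 in a sequence and reports the index of the 3'-terminal G at which transcription is stated to begin. It searches only the strand given, uses 0-based indexing, and the construct shown is a hypothetical example rather than a sequence from the disclosure.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # SEQ ID NO: 1


def find_t7_start_sites(seq):
    """Return 0-based indices of the promoter's terminal G for each match."""
    seq = seq.upper()
    sites = []
    pos = seq.find(T7_PROMOTER)
    while pos != -1:
        # Transcription starts at the last base of the promoter sequence.
        sites.append(pos + len(T7_PROMOTER) - 1)
        pos = seq.find(T7_PROMOTER, pos + 1)
    return sites


def predicted_transcript(seq, start, length=10):
    """RNA with the same sequence as the given strand, beginning at start."""
    return seq[start:start + length].upper().replace("T", "U")


# Hypothetical construct: 3 nt of filler, the promoter, then a target region.
construct = "CCC" + T7_PROMOTER + "GGAGACCATG"
for site in find_t7_start_sites(construct):
    print(site, predicted_transcript(construct, site, 5))
```

A search of the reverse complement would be needed to find promoters oriented on the opposite strand; that step is omitted here for brevity.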

T7-Like Bacteriophage Promoters

T7-like bacteriophage promoters most notably include the T3 promoter and the N4 promoter. The T3 promoter has the sequence 5'-AATTAACCCTCACTAAAG-3' (SEQ ID NO: 2). The bacteriophage T3 and T7 RNA polymerases are closely related, yet are highly specific for their own promoter sequences. T7 promoter variants that contain substitutions of T3-specific base-pairs at one or more positions within the T7 promoter consensus sequence have been previously synthesized and cloned. Template competition assays between variant and consensus promoters have demonstrated that the primary determinants of promoter specificity are located in the region from -10 to -12, and that the base-pair at -11 is of particular importance. Changing this base-pair from G:C, which is normally present in T7 promoters, to C:G, which is found at this position in T3 promoters, was identified to prevent utilization by the T7 RNA polymerase and simultaneously enabled transcription from the variant T7 promoter by the T3 enzyme. Substitution of T7 base-pairs with T3 base-pairs at other positions where the two consensus sequences diverge were also observed to affect the overall efficiency with which the variant promoter was utilized by the T7 RNA polymerase, but these changes were not sufficient to permit recognition by the T3 RNA polymerase. Switching the -11 base-pair in the T3 promoter consensus to the T7 base-pair prevented utilization by the T3 RNA polymerase, but did not allow the T3 variant promoter to be utilized by the T7 RNA polymerase. This probably reflects a greater specificity of the T7 RNA polymerase for base-pairs at other positions where the promoter sequences differ, most notably at -15. Without wishing to be bound by theory, the magnitude of the effects of base substitutions in the T7 promoter on promoter strength (-11C much greater than -10C greater than -12A) were found to correlate with the affinity of the T7 polymerase for the promoter variants, which suggested that the discrimination of the phage RNA polymerases for their promoters was mediated primarily at the level of DNA binding, rather than at the level of initiation (Klement et al. J Mol Biol. 215: 21-9).
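The positions discussed above can be checked directly against SEQ ID NO: 1 and SEQ ID NO: 2 with a small Python sketch. It assumes the conventional numbering in which the 3'-terminal G (the transcription start) is +1 and the preceding bases run from -1 to -17 with no position 0; the function names are hypothetical.

```python
T7 = "TAATACGACTCACTATAG"  # SEQ ID NO: 1
T3 = "AATTAACCCTCACTAAAG"  # SEQ ID NO: 2


def position_label(index, length):
    """Map a 0-based index to promoter numbering (last base is +1, no 0)."""
    offset = index - (length - 1)
    return 1 if offset == 0 else offset


def promoter_differences(a, b):
    """List (position, base_in_a, base_in_b) wherever two promoters differ."""
    if len(a) != len(b):
        raise ValueError("promoters must have equal length")
    return [
        (position_label(i, len(a)), x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for pos, t7_base, t3_base in promoter_differences(T7, T3):
    print(f"{pos:+d}: T7 {t7_base} -> T3 {t3_base}")
```

Under this numbering the two sequences differ at -17, -15, -12, -11, -10 and -2, with G (T7) versus C (T3) at -11, which is consistent with the -10 to -12 specificity region and the -15 difference described in the text.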

N4 Bacteriophage Promoters

N4 bacteriophage promoters comprise conserved sequences and a 3-base loop-5-base pair (bp) stem DNA hairpin structure on single-stranded templates. As an example, N4 bacteriophage RNAP has been identified to bind a 20-nucleotide (nt) N4 P2 promoter deoxyoligonucleotide with high affinity (Kd = 2 nM) to form a salt-resistant complex. It has also been shown that N4 bacteriophage RNAP interacts specifically with the central base of the hairpin loop (-11G) and a base at the stem (-8G) and that the guanine 6-keto and 7-imino groups at both positions are essential for binding and complex salt resistance. The major determinant (-11G), which has been described as presented to N4 bacteriophage RNAP in the context of a hairpin loop, appears to interact with N4 bacteriophage RNAP Trp-129. This interaction has been described as reliant upon template single strandedness at positions -2 and -1. Contacts with the promoter have been described as disrupted when the RNA product becomes 11-12 nt long (see Wigneshweraraj et al. Biomolecules. 5: 647-667, the entire contents of which are incorporated by reference herein).
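The hairpin geometry described above (a 5-bp stem closed by a 3-base loop on a single-stranded template) can be illustrated with a simple Python scan. The sketch assumes perfect Watson-Crick pairing in the stem and does not model the conserved sequences or the -11G and -8G contacts; the test sequence is a made-up example, not an N4 promoter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=5, loop=3):
    """Find (start, left_stem, loop, right_stem) with a perfectly paired stem."""
    seq = seq.upper()
    span = 2 * stem + loop
    hits = []
    for i in range(len(seq) - span + 1):
        left = seq[i:i + stem]
        loop_seq = seq[i + stem:i + stem + loop]
        right = seq[i + stem + loop:i + span]
        # The 3' arm must pair with the 5' arm read in the opposite direction.
        if right == reverse_complement(left):
            hits.append((i, left, loop_seq, right))
    return hits


# Hypothetical single-stranded sequence containing one 5-bp stem, 3-nt loop.
print(find_hairpins("TTGCATGAGACATGCTT"))
```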

Bacteriophage RNA Polymerases

In certain aspects, compositions and methods that rely upon bacteriophage RNA polymerases to achieve targeted mutagenesis in mammalian cells across long stretches of sequence are provided. Bacteriophage-encoded RNA polymerase (RNAP) was first discovered in T7 phage-infected Escherichia coli cells. It was known that phage infection of host bacterial cells led to redirection of host gene expression towards generation of progeny phage particles; however, a previously uncharacterized “switching event” that provoked expression of late bacteriophage genes was first attributed to a phage-encoded RNAP. This phage RNAP was identified as recognizing promoters in the phage genome and expressing phage genes using a single-polypeptide polymerase of ~100 kDa molecular weight, which is ~4 times smaller than bacterial RNAPs. This was a substantial simplification from the previously known RNAPs from bacteria (5 subunits) and eukaryotes (more than 12 subunits). In spite of its relative simplicity, the single-unit T7 RNAP has been described as able to recognize promoter DNA and unwind double-stranded (ds) DNA to form an open complex. After abortive initiation, it proceeds to processive RNA elongation. The simplicity of T7 phage RNAP renders it an attractive model system for study of transcription mechanisms and a tool for protein expression in bacterial cells (Basu et al. Nucleic. 30; 237-250). In certain aspects of the instant disclosure, use of the T7 RNAP in concert with nucleic acid-editing deaminases is expressly contemplated for effecting mutagenesis across long stretches of target sequence in eukaryotic cells, particularly mammalian cells. It is also contemplated herein that other polymerases can be used in concert with nucleic acid-editing deaminases, to similar effect. Such other polymerases include, for example and without limitation, T7-like RNA polymerases, such as T3 RNAP, SP6 RNAP and/or N4 RNAP, as described in additional detail below.

T7 RNA Polymerase (T7 RNAP)

T7 RNA Polymerase is an RNA polymerase originally identified in T7 bacteriophage. The T7 RNAP catalyzes formation of RNA from DNA in the 5'->3' direction. T7 polymerase has been described as extremely promoter-specific and transcribes only DNA downstream of a T7 promoter (5'-TAATACGACTCACTATAG-3' (SEQ ID NO: 1), with transcription beginning at the 3' G of the T7 promoter). T7 polymerase has also been described to require a double-stranded DNA template and Mg2+ ion as a cofactor for the synthesis of RNA. It has been described as possessing a very low error rate, and has a molecular weight of 99 kDa (Sousa et al. Progress in Nucleic Acid Research and Molecular Biology. 73: 1-41).

T7-Like RNA Polymerases

T7 RNA Polymerase is a member of a family of single-subunit RNAPs that comprises but is not limited to phage RNAPs including T3 RNA Polymerase, SP6 RNA Polymerase, K11 RNA Polymerase, and N4 RNA Polymerase. These non-T7 RNA polymerases are categorized as T7-like RNA Polymerases.

T3 RNA Polymerase is a member of the DNA-dependent RNA polymerase family and was originally isolated from Bacteriophage T3. It is highly specific to the T3 promoter and transcribes from DNA templates having the T3 promoter. Commercially produced T3 RNA Pol enzyme is expressed from E. coli and is active at 37 °C. It has been used in the art for RNA synthesis applications such as for generating in vitro translation templates, hybridization probes, RNA assay substrates, and others.

SP6 RNA Polymerase is a DNA-dependent RNA polymerase isolated from phage-infected Salmonella typhimurium. The enzyme has an extremely high specificity for SP6 promoter sequences (1, 2) and has been described as synthesizing large quantities of RNA from a DNA fragment inserted downstream from a promoter. Strong promoter sequences have been used to construct various cloning vectors, and inserts into the multiple cloning site of these vectors can be transcribed to generate discrete RNAs.

K11 RNA polymerase is an RNA polymerase encoded by gene 1 of the Klebsiella phage K11. It is part of the T7 RNAP family.

N4 RNA Polymerase: Transcription of bacteriophage N4 middle genes is carried out by a phage-coded, heterodimeric RNA polymerase (N4 RNAPII), which belongs to the family of T7-like RNA polymerases. In contrast to phage T7 RNAP, N4 RNAPII displays no activity on double-stranded templates and low activity on single-stranded templates. In vivo, at least one additional N4-coded protein (p17) is required for N4 middle transcription.

Nucleic Acid-Editing Deaminases

Certain aspects of the instant disclosure relate to compositions and methods that combine the somatic hypermutation capability of a deaminase with the DNA processivity of an orthologous bacteriophage RNA polymerase. Deamination, i.e., the removal of an amine group from a nucleic acid base, is carried out by enzymes called deaminases that include, but are not limited to, adenine deaminase, cytidine deaminase (including activation-induced cytidine deaminase), and guanine deaminase. Adenine deaminases include E. coli TadA, human ADAR2, mouse ADA, and human ADAT2 (see Gaudelli et al. Nature. 551: 464-471). Exemplary sequences of adenine deaminases include the following.

tRNA adenosine(34) deaminase [Escherichia coli str. K-12 substr. MG1655] (SEQ ID NO: 7): MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTA HAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGA AGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD

Escherichia coli str. K-12 substr. MG1655, complete genome (NC_000913.3) (SEQ ID NO: 8)

TTGTCTGAAGTCGAATTTAGCCACGAATACTGGATGCGTCACGCGCTGACGCTGGC

GAAACGTGCCTGGGATGAGCGGGAAGTGCCGGTCGGCGCGGTATTAGTGCATAAC

AATCGGGTAATCGGCGAAGGCTGGAACCGCCCGATTGGTCGCCATGATCCCACCG

CACATGCAGAAATCATGGCCCTGCGGCAGGGTGGTCTGGTGATGCAAAATTATCG

TCTGATCGACGCCACGTTGTATGTCACGCTTGAACCATGTGTAATGTGTGCCGGAG

CGATGATCCACAGTCGCATTGGTCGCGTGGTCTTTGGTGCGCGTGACGCGAAAACT

GGCGCTGCGGGATCTTTAATGGATGTGCTGCATCATCCGGGTATGAATCACCGAGT

GGAAATTACGGAAGGAATACTGGCGGATGAGTGCGCGGCGTTGCTCAGTGACTTC

TTTCGCATGCGCCGCCAGGAAATTAAAGCGCAGAAAAAAGCGCAATCCTCGACGG

ATTAA
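The relationship between the nucleotide sequence of SEQ ID NO: 8 and the protein of SEQ ID NO: 7 can be spot-checked with a short translation routine. The following is a minimal sketch, not part of the disclosure: the function name `translate` and the `start_as_met` parameter are illustrative, the standard genetic code is assumed, and the leading TTG is assumed to be read as methionine because it acts as a bacterial initiation codon. Only the first ten and last six codons of SEQ ID NO: 8 are used.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, start_as_met=True):
    """Translate a DNA coding sequence up to the first stop codon.

    Whitespace is removed first. If start_as_met is True, the first
    residue is reported as M (assumption: an alternative start codon
    such as TTG initiates with methionine).
    """
    cds = "".join(cds.split()).upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    if start_as_met and protein:
        protein[0] = "M"
    return "".join(protein)


first_codons = "TTGTCTGAAGTCGAATTTAGCCACGAATAC"  # start of SEQ ID NO: 8
last_codons = "CAATCCTCGACGGATTAA"               # end of SEQ ID NO: 8

print(translate(first_codons))                      # N-terminal residues
print(translate(last_codons, start_as_met=False))   # C-terminal residues
```

Under these assumptions the two fragments translate to the N-terminal residues MSEVEFSHEY and the C-terminal residues QSSTD, which agree with the corresponding ends of SEQ ID NO: 7.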

Homo sapiens adenosine deaminase RNA specific B1 (ADARB1, also known as ADAR2), transcript variant 1, mRNA (NM_001112.4; SEQ ID NO: 9)

GAGGCGCTGAGGCGGCCGTGGCGGCGGCGGCGGCGGCGGCGGCAGCGGCGGCCA

AGCGGCCAGGTTGGCGGCCGGGGCTCCGGGCCGCGCGAGGCCACGGCCACGCCGC

GCCGCTGCGCACAACCAACGAGGCAGAGCGCCGCCCGGCGCGAGACTGCGGCCGA

AGCGTGGGGCGCGCGTGCGGAGGACCAGGCGCGGCGCGGCTGCGGCTGAGAGTG

GAGCCTTTCAGGCTGGCATGGAGAGCTTAAGGGGCAACTGAAGGAGACACACTGG

CCAAGCGCGGAGTTCTGCTTACTTCAGTCCTGCTGAGATACTCTCTCAGTCCGCTC

GCACCGAAGGAAGCTGCCTTGGGATCAGAGCAGACATAAAGCTAGAAAAATTTCA

AGACAGAAACAGTCTCCGCCAGTCAAGAAACCCTCAAAAGTATTTTGCCATGGAT

ATAGAAGATGAAGAAAACATGAGTTCCAGCAGCACTGATGTGAAGGAAAACCGC

AATCTGGACAACGTGTCCCCCAAGGATGGCAGCACACCTGGGCCTGGCGAGGGCT

CTCAGCTCTCCAATGGGGGTGGTGGTGGCCCCGGCAGAAAGCGGCCCCTGGAGGA GGGCAGCAATGGCCACTCCAAGTACCGCCTGAAGAAAAGGAGGAAAACACCAGG

GCCCGTCCTCCCCAAGAACGCCCTGATGCAGCTGAATGAGATCAAGCCTGGTTTGC

AGTACACACTCCTGTCCCAGACTGGGCCCGTGCACGCGCCTTTGTTTGTCATGTCT

GTGGAGGTGAATGGCCAGGTTTTTGAGGGCTCTGGTCCCACAAAGAAAAAGGCAA

AACTCCATGCTGCTGAGAAGGCCTTGAGGTCTTTCGTTCAGTTTCCTAATGCCTCTG

AGGCCCACCTGGCCATGGGGAGGACCCTGTCTGTCAACACGGACTTCACATCTGAC

CAGGCCGACTTCCCTGACACGCTCTTCAATGGTTTTGAAACTCCTGACAAGGCGGA

GCCTCCCTTTTACGTGGGCTCCAATGGGGATGACTCCTTCAGTTCCAGCGGGGACC

TCAGCTTGTCTGCTTCCCCGGTGCCTGCCAGCCTAGCCCAGCCTCCTCTCCCTGTCT

TACCACCATTCCCACCCCCGAGTGGGAAGAATCCCGTGATGATCTTGAACGAACTG

CGCCCAGGACTCAAGTATGACTTCCTCTCCGAGAGCGGGGAGAGCCATGCCAAGA

GCTTCGTCATGTCTGTGGTCGTGGATGGTCAGTTCTTTGAAGGCTCGGGGAGAAAC

AAGAAGCTTGCCAAGGCCCGGGCTGCGCAGTCTGCCCTGGCCGCCATTTTTAACTT

GCACTTGGATCAGACGCCATCTCGCCAGCCTATTCCCAGTGAGGGTCTTCAGCTGC

ATTTACCGCAGGTTTTAGCTGACGCTGTCTCACGCCTGGTCCTGGGTAAGTTTGGT

GACCTGACCGACAACTTCTCCTCCCCTCACGCTCGCAGAAAAGTGCTGGCTGGAGT

CGTCATGACAACAGGCACAGATGTTAAAGATGCCAAGGTGATAAGTGTTTCTACA

GGAACAAAATGTATTAATGGTGAATACATGAGTGATCGTGGCCTTGCATTAAATGA

CTGCCATGCAGAAATAATATCTCGGAGATCCTTGCTCAGATTTCTTTATACACAAC

TTGAGCTTTACTTAAATAACAAAGATGATCAAAAAAGATCCATCTTTCAGAAATCA

GAGCGAGGGGGGTTTAGGCTGAAGGAGAATGTCCAGTTTCATCTGTACATCAGCA

CCTCTCCCTGTGGAGATGCCAGAATCTTCTCACCACATGAGCCAATCCTGGAAGAA

CCAGCAGATAGACACCCAAATCGTAAAGCAAGAGGACAGCTACGGACCAAAATA

GAGTCTGGTGAGGGGACGATTCCAGTGCGCTCCAATGCGAGCATCCAAACGTGGG

ACGGGGTGCTGCAAGGGGAGCGGCTGCTCACCATGTCCTGCAGTGACAAGATTGC

ACGCTGGAACGTGGTGGGCATCCAGGGATCCCTGCTCAGCATTTTCGTGGAGCCCA

TTTACTTCTCGAGCATCATCCTGGGCAGCCTTTACCACGGGGACCACCTTTCCAGG

GCCATGTACCAGCGGATCTCCAACATAGAGGACCTGCCACCTCTCTACACCCTCAA

CAAGCCTTTGCTCAGTGGCATCAGCAATGCAGAAGCACGGCAGCCAGGGAAGGCC

CCCAACTTCAGTGTCAACTGGACGGTAGGCGACTCCGCTATTGAGGTCATCAACGC

CACGACTGGGAAGGATGAGCTGGGCCGCGCGTCCCGCCTGTGTAAGCACGCGTTG

TACTGTCGCTGGATGCGTGTGCACGGCAAGGTTCCCTCCCACTTACTACGCTCCAA

GATTACCAAGCCCAACGTGTACCATGAGTCCAAGCTGGCGGCAAAGGAGTACCAG GCCGCCAAGGCGCGTCTGTTCACAGCCTTCATCAAGGCGGGGCTGGGGGCCTGGG

TGGAGAAGCCCACCGAGCAGGACCAGTTCTCACTCACGCCCTGACCCGGGCAGAC

ATGATGGGGGGTGCAGGGGGCTGTGGGCATCCAGCGTCATCCTCCAGAACCTCAC

ATCTGAACTGGGGGCAGGTGCATACCTTGGGGAGGGAGTAGGGGGACACGGGGG

ACCACCAGGTGTCCACGGTTGTCCCCAGCATCTCACATCAGACCTGGGGCAGGTGC

GCAGTGTGGGGAGGGGATGGGGTGCGTCAGGGCCCAGCATCGCCGCCTGGCATCT

CTCTGCCGCAGCATTTCCCCTTCTGAACCGTCCAGTGACTGCTTTCAATCTCGGTTT

ACGTTTAGAAATTGAGTTCTACTGAGTAGGGCTTCCTTAAGTTTAGGAAAATAGAA

ATTACTTTGTGTGAAATTCTTGAATAAATAATTTATTCAGAGCTAGGAATGTGGTTT

ATAAAATAGGAAGTAATTGTGTCAGGTCACTTTTATGCCACATTATTTTAATTGCA

AAAAAGCATCTATATATGGAGGAGGGTGGGAAAATAGAGGTAGGAAATAGTAGC

CTAAAGGAAATCGCCACACGTCTGTCTAAACTTAGGTCTCTTTTCTCCGTAGGTAC

CTCCCTGGGTAGTTCCACACACTAGGTTGTAACAGTCTCTCCCTGAGGAGCAGACT

CCCAGCATGGTGTAGCGTGGCCCTGTCATGCACATGGGGTCCCGCAGCAGTGACTG

TGTGTCCTGCAGAGGCGTGACCCAGGCCCCTGTAGCCCTCAGCCTCCTCTAGAAGC

TTCTGTACTCCTTGTAGGATCAGATCATGGAAAACTTTTCTCAGTTTACTTCTAAGT

AATCACAGATAATACATGGCCAGTAATCCCAGGCTGGCCATTCATTCAGGTTTTTT

AAAGGATATTTAACTTTTATGGACTAGAAGGAATCACGAGGGCTACTGCACAATA

CATGGCCTAAGTTCCCTCTGTTCCTTCCTCTGAATCGAATGGATGTGGGTGACCGC

CCGAAGGCCTTCACAGGATGGAAGTAGAATGATTTCAGTAGATACTCATTCTTGGA

AAATGCCATAGTTTTAAATTATTGTTTCCAGCTTTATCAAAGACATGTTTGAAAAAT

AAAAAGCATCCAAGTGAGAGCTGGTGAGACCACGTGCTGCTGGCGTAGTGTAGGC

CAGACATTGACAGTCCTGACGGGAGCTCAGGGCTGCCCAGCGCCCAGCGTGCACG

GGACGGCCCCACGACAGAGGGAGTCAGCCCGGGAGGTCAGGAGCGCGGCGGGCG

AGGGCCCTGTGTGGACCACCTCCACCAAGCTCAGAGATTTGCACCAGGTGCCTTGT

TGCCTCCGCTCAGGATGAAAGAGGAGCTGAGAGAAGTGCTCTGCCTGCCAGTGCA

GTGCCCAGCTCCAAGGCTCTAGAGGGTGTTCAGGTGGGTCTCCTGGGGCCATGGG

GAGAGATTGGTGCAGACCTTACCCCACAGCATACACCTGCCACAGCGAAATCCAG

GGTGTTGGCACCTGTGTGTCCGTGATGAGCCTAGGAAACCAGAGCAGGGGCAGAG

GGGCGTCATCCTCCCACCGGACGCTGGGAGCTCAGACCCCAAAACTGAAACACCG

TGGCTTCGGCGGGGGGTGTGCCTCCTGATGTCAGGAGCCCCATCCACGTGTGTCCA

CACAGATCTCGTCGCAGCACGGCAGGAAGGGGTGCTGCTTAGGGCTCATTGTTGG

GGACATGACCGGGTTCAGCGGCTAGAACATCTGCCCCACAGCAGCCTCCTCCTCCA CCGAAGAGGGTAGTTGTCTCCCTGAAGCAGTCACAGCAGGCGTCTCTGCCGCTCCG

TCACCACAGTGGGGTTTTGTTCAGGCAGATCGCGCTGGGGTTCTGCACCTGCAGAA

GGAGAGGGGTCTGTTGTCGCTGGCTTTCCCCCAAGCAGGCTCTTGCACACTCTAGA

AAAAACACCTTGTAAGTCTGTGCATTTTTATTGTCTTGATAAATTGTATTTTTTTCT

AATGGGGATTGGGAGATGGACTTCGTTTTTAAAAATATGTGGATTTTGGTTACCAA

GTTTAGTGTTAATATATTCCATATACATACAAAACTACCCGGTATGTCTGGCTTTTC

CCTTCTGTCAGGTAATAGCTAAAGTCAGCATGATTGCTCCCTGTACCACCCCAAAT

AAGTGAGTGCCTCACCTTGTGGGGCCTGAGCAGCTACCTTGAGACCATGTGAGGTG

GCACCTTTCCGGGGTGGACTCGTGCGGCCTTGAGGACAGGCACAGGGCACCCTAT

CCCAAGCCGTCCAGGCAGGAGGAAGGCAGCCAAGGCAACTGGGTTCTGGGAGCCC

TGGGTGGGGCAGCTGTGGGGAGGAACTGGGTTCGGGGAGCCCTGGGCGGGGCGGC

TGTTGGGGGGAACTGGGTTCGGGGTGCCCTGGGCAGGGGGCTACTGGGGGGCGGC

TGTGAGGAGGAGTTGGGTTCAGGGAGCCCTGGGCGGGGTGGCTGTCAGGGGGAAC

TGGGTTCCGGGAGCCCTGGGCCGGGGCAGGGGGCGGCTGTAGGAAGGAACTGGTT

TCGGGGAGCCCTGGGCGGGGCGGCTGTGGGGAGGAAGGTGACGTGCAGGGGACC

AGAGGCTCTGCACTGCTCCTAGGACAGCTCATCTGTAATCAGAAAAAAAATAAAC

AAAATACAGAACGCTGACTCCTCCGTGAGACAGATCGGGGACCTTAGCACTTTAA

TCCCTCCCTTCTGAGCGCTCGGTGTGCACTTTTAGACTATAGCTGTTTCATTGACGT

GTCACTCTCCATCCAGTGTCCTTGATGTGGCTTTTAGAGACTTAGCAGAAAATTCG

ACACAAGCAGGAACTTGATTTTTTAAGAAAAAATATTACATTTTGAGGACATTTTG

ACAAGTAGGGGAAGAGAGGGCTTCTGTTGTTTTGTTTTGTTTTGTTTTGTTAACTAA

ACCTGAAGTATTAATTCCACAAAGACACTGTCCCTCAGGACCACTCAGGTACAGCT

CTGCCAGGGACAGAGTCCTGCTAGTGGGAGGTCTCAGGTGGGGCGGTGTGTTCTGT

GCCATGAGGCAGCGACAGGTCCAGATGGATGTCGTCACCACCTTCCTCAGCTCTCA

TCACCTGGTCGTACGCCAGGCCCACCTCTTCCCAGCAAGGGACGCCAAAGAACTG

CAGTTTTTATTCTGAGTCTTAATTTAACTTTTCATCATCTTTTCCTATTTTGGAGAAT

TTTTTGTAATTAAAAGCAATTATTTTAAAATGTGCAAGCCAGTATCTCACAAGGCA

TGGATTTCTGTGGAATTTATTTTTATTCAAATAACCATATTTATCTCCAGGCTGTGG

AATCGCCACTTTCTTTGTGAAGACAGTGTCTCTCCTTGTAATCTCACACAGGTACAC

TGAGGAGGGGACGGCTCCGTCTTCACATTGTGCACAGATCTGAGGATGGGATTAG

CGAAGCTGTGGAGACTGCACATCCGGACCTGCCCATGTCTCAAAACAAACACATG

TACAGTGGCTCTTTTTCCTTCTCAAACACTTTACCCCAGAAGCAGGTGGTCTGCCCC

AGGCATAAAGAAGGAAAATTGGCCATCTTTCCCACCTCTAAATTCTGTAAAATTAT AGACTTGCTCAAAAGATTCCTTTTTATCATCCCCACGCTGTGTAAGTGGAAAGGGC

ATTGTGTTCCGTGTGTGTCCAGTTTACAGCGTCTCTGCCCCCTAGCGTGTTTTGTGA

CAATCTCCCTGGGTGAGGAGTGGGTGCACCCAGCCCCGAGGCCAGTGGTTGCTCG

GGGCCTTCCGTGTGAGTTCTAGTGTTCACTTGATGCCGGGGAATAGAATTAGAGAA

AACTCTGACCTGCCGGGTTCCAGGGACTGGTGGAGGTGGATGGCAGGTCCGACTC

GACCATGACTTAGTTGTAAGGGTGTGTCGGCTTTTTCAGTCTCATGTGAAAATCCT

CCTGTCTCTGGCAGCACTGTCTGCACTTTCTTGTTTACTGTTTGAAGGGACGAGTAC

CAAGCCACAAGAACACTTCTTTTGGCCACAGCATAAGCTGATGGTATGTAAGGAA

CCGATGGGCCATTAAACATGAACTGAACGGTTAAAAGCACAGTCTATGGAACGCT

AATGGAGTCAGCCCCTAAAGCTGTTTGCTTTTTCAGGCTTTGGATTACATGCTTTTA

ATTTGATTTTAGAATCTGGACACTTTCTATGAATGTAATTCGGCTGAGAAACATGTT

GCTGAGATGCAATCCTCAGTGTTCTCTGTATGTAAATCTGTGTATACACCACACGT

TACAACTGCATGAGCTTCCTCTCGCACAAGACCAGCTGGAACTGAGCATGAGACG

CTGTCAAATACAGACAAAGGATTTGAGATGTTCTCAATAAAAAGAAAATGTTTCA

CTA

Homo sapiens adenosine deaminase RNA specific B1 (ADARB1, also known as ADAR2) protein (NP_001103.1; SEQ ID NO: 10)

MDIEDEENMSSSSTDVKENRNLDNVSPKDGSTPGPGEGSQLSNGGGGGPGRKRPLEEG

SNGHSKYRLKKRRKTPGPVLPKNALMQLNEIKPGLQYTLLSQTGPVHAPLFVMSVEV

NGQVFEGSGPTKKKAKLHAAEKALRSFVQFPNASEAHLAMGRTLSVNTDFTSDQADF

PDTLFNGFETPDKAEPPFYVGSNGDDSFSSSGDLSLSASPVPASLAQPPLPVLPPFPPPSG

KNPVMILNELRPGLKYDFLSESGESHAKSFVMSVVVDGQFFEGSGRNKKLAKARAAQ

SALAAIFNLHLDQTPSRQPIPSEGLQLHLPQVLADAVSRLVLGKFGDLTDNFSSPHARR

KVLAGVVMTTGTDVKDAKVISVSTGTKCINGEYMSDRGLALNDCHAEIISRRSLLRFL

YTQLELYLNNKDDQKRSIFQKSERGGFRLKENVQFHLYISTSPCGDARIFSPHEPILEEP

ADRHPNRKARGQLRTKIESGEGTIPVRSNASIQTWDGVLQGERLLTMSCSDKIARWNV

VGIQGSLLSIFVEPIYFSSIILGSLYHGDHLSRAMYQRISNIEDLPPLYTLNKPLLSGISNA

EARQPGKAPNFSVNWTVGDSAIEVINATTGKDELGRASRLCKHALYCRWMRVHGKV

PSHLLRSKITKPNVYHESKLAAKEYQAAKARLFTAFIKAGLGAWVEKPTEQDQFSLTP

Mus musculus adenosine deaminase (Ada), transcript variant 1, mRNA (NM_001272052.1; SEQ ID NO: 11)

AGCGTGGGCGGGGCTGTGCCGGGGCAGCCCGGTAAAAAAGAGCGTGGCGGGCCG

CGGTCTCTGAGAGCCATCGGGAAGCGACCCTGCCAGCGAGCCAACGCAGACCCAG

AGAGCTTCGGCGGAGAGAACCGGGAACACGCTCGGAACCATGGCCCAGACACCCG

CATTCAACAAACCCAAAGTAGAGTTACACGTCCACCTGGATGGAGCCATCAAGCC

AGAAACCATCTTATACTTTGGCAAGAAGAGAGGCATCGCCCTCCCGGCAGATACA

GTGGAGGAGCTGCGCAACATTATCGGCATGGACAAGCCCCTCTCGCTCCCAGGCTT

CCTGGCCAAGTTTGACTACTACATGCCTGTGATTGCGGGCTGCAGAGAGGCCATCA

AGAGGATCGCCTACGAGTTTGTGGAGATGAAGGCAAAGGAGGGCGTGGTCTATGT

GGAAGTGCGCTATAGCCCACACCTGCTGGCCAATTCCAAGGTGGACCCAATGCCCT

GGAACCAGACTGAAGGGGACGTCACCCCTGATGACGTTGTGGATCTTGTGAACCA

GGGCCTGCAGGAGGGAGAGCAAGCATTTGGCATCAAGGTCCGGTCCATTCTGTGC

TGCATGCGCCACCAGCCCAGCTGGTCCCTTGAGGTGTTGGAGCTGTGTAAGAAGTA

CAATCAGAAGACCGTGGTGGCTATGGACTTGGCTGGGGATGAGACCATTGAAGGA

AGTAGCCTCTTCCCAGGCCACGTGGAAGCCTATGAGGGCGCAGTAAAGAATGGCA

TTCATCGGACCGTCCACGCTGGCGAGGTGGGCTCTCCTGAGGTTGTGCGTGAGGCT

GTGGACATCCTCAAGACAGAGAGGGTGGGACATGGTTATCACACCATCGAGGATG

AAGCTCTCTACAACAGACTACTGAAAGAAAACATGCACTTTGAGGTCTGCCCCTGG

TCCAGCTACCTCACAGGCGCCTGGGATCCCAAAACGACGCATGCGGTTGTTCGCTT

CAAGAATGATAAGGCCAACTACTCACTCAACACAGACGACCCCCTCATCTTCAAGT

CCACCCTAGACACTGACTACCAGATGACCAAGAAAGACATGGGCTTCACTGAGGA

GGAGTTCAAGCGACTGAACATCAACGCAGCGAAGTCAAGCTTCCTCCCAGAGGAA

GAGAAGAAGGAACTTCTGGAACGGCTCTACAGAGAATACCAATAGCCACCACAGA

CTGACGCAGGGCGGGTCCCCTGAAGATGGCAAGGCCACTTCTCTGAGCCTCATCCT

GTGGATAAAGTCTTTACAACTCTGACATATTGACCTTCATTCCTTCCAGACCTTGGA

GAGGCCAGGTCTGTCCTCTGATTGGATATCCTGGCTAGGTCCCAGGGGACTTGACA

ATCATGCACATGAATTGAAAACCTTCCTTCTAAAGCTAAAATTATGGTGTTCAATA

AAGCAGCTGGTGACTGGTATCTTGCAGCACATGGTGAATATGGTCTCGGGGCTGCT

GGCTAGGATGCTAAGAAAGGAGGAGCCCTGGGCCCTACGCTGAGTGTCAGGCTGG

GGAGCCAGGGTCTCTTTCCTGCAGAAGCGATTCTTTCCCAGAGGGGCTGTTGGAGC

AGATGCTCCTGAACTCTCCGCCCCTTTAACCAGTCCTTTGGATTTATTTTTATTATTT

TTAAATATTTAATTATGTTTATGTATATGGGTGTTTT

Homo sapiens adenosine deaminase tRNA specific 2 (ADAT2), transcript variant 1, mRNA (NM_182503.3; SEQ ID NO: 12)

CTCTGCCGCGGGCTCTGTAGCTGAGTGGTGGCTGGGTATGGAGGCGAAGGCGGCA

CCCAAGCCAGCTGCAAGCGGCGCGTGCTCGGTGTCGGCAGAGGAGACCGAAAAGT

GGATGGAGGAGGCGATGCACATGGCCAAAGAAGCCCTCGAAAATACTGAAGTTCC

TGTTGGCTGTCTTATGGTCTACAACAATGAAGTTGTAGGGAAGGGGAGAAATGAA

GTTAACCAAACCAAAAATGCTACTCGACATGCAGAAATGGTGGCCATCGATCAGG

TCCTCGATTGGTGTCGTCAAAGTGGCAAGAGTCCCTCTGAAGTATTTGAACACACT

GTGTTGTATGTCACTGTGGAGCCGTGCATTATGTGTGCAGCTGCTCTCCGCCTGAT

GAAAATCCCGCTGGTTGTATATGGCTGTCAGAATGAACGATTTGGTGGTTGTGGCT

CTGTTCTAAATATTGCCTCTGCTGACCTACCAAACACTGGGAGACCATTTCAGTGT

ATCCCTGGATATCGGGCTGAGGAAGCAGTGGAAATGTTAAAGACCTTCTACAAAC

AAGAAAATCCAAATGCACCAAAATCGAAAGTTCGGAAAAAGGAATGTCAGAAAT

CTTGAACATGTTCTGATGAAAGAACCAAGTGACCCAAAGTGACCTGGACAAGATT

CATAGACTGAAAGCTGTTGACATCGTTGAATCATATGTTTATATATTGTTTTTAATC

TGCAGGAAAATGGTGTCTCTCATCATTTGCTCTGTTAAGGGAACAAATTAGCACTT

TTTAGAAGTCTGACAATTGTAAACAGTTATTAGCTTTTCCAGAAGCTGATTCCCATT

TTAAGATGGGGGAAAATTAAGGTTTGAGGTTTTAGAAATTAGCAAGTAGTGCATA

CCCTTCTAGCCACAAGTGCCCAGTCCAGGCAAGTGCTGACTTCTTAGAGAATGTGT

GGCCAGACCCAGGGACCTGGAGTGTGTTTGGACTGCAGTTTGCCACCCTGAGAAC

ACCTTCTCCAGGACTGGCATTTCAGAATCAGATTCTTCATTTTTTGCAGCTACGATG

TTCTTCCAGGGCACTGGGGGCTGTGACTTCTCTCTAAATTGTATATAAGTTGTGTAT

ATAGAGACCATAATTATATGGTCCTTAGAAAAGACTTTGCTTTTATAAAGCATTTA

GAAAAAATGCATACTTTTAAAACAAGTGCTTGAGTTGTCACTTAAAAATTATAGCA

TATTGCTATAATAAAACCTTATTTATGTCTTATTTGAAGATGAATAGTCTTAAAAGA

TAAAGACATAAATGGGACAATTGTTATTGAGCAAAAAACCAAATTATCCCACCCT

CATGGAGCTTATATTCTAGCAAGGGGAGATGGATATGATAGATTACACAGTTTATT

GGAGGACAATAAGAGTTATGGCAAAAAGCAAAAGGAACACAGGGTAAAGGGGAT

AGGTGCCATTTGGTGGTGAGAATGCTGACTGAAAAATAGAATGATCAATTTAATCT

GAAACAAATGGTTATTTCTTTTATAATCCATATAATAAATTTAAAATCTAAAATGT

AAAATTTTGAACACAACACTGGAAAGGGTATCCACAGCAGGAAGTCCCCAGTTCA

CCTCCATGACTACAGGGCAGCTTTGCACAGCCCTCTGGGCGCACTGTGTGCCTCTG

CCCAGAAGGGGGCCTCGCCGTTCCACCAGAAGCTCAGCTCCAGGCCCTGGAGGGG CTGCTGCTCCTCAGTTGCATTTCTTCAGTAGATTCATTTCCTTGATGCAAAGCATCT

GTATTTGTTGGTTCTGTCATTTGAGCGATGTCTCTGACTTGTTTGTTTTGAATTACAT

TACAGGCTGGAATGTAATTGTGGTGAAAGTATTTTTATATTGCTGAGAGTAGCAGC

TAATCACAGTTACATGCTTCAGAGGACTTATAATTGCTTGGTTTTGTGTGTGTGTGT

GT GT GT GT GT GT GT GT GT GT GT GT GTTT A ACT GC ATTT GA A A AGTTTT AT GGAGA AT

ATGCATGATTTTAAATCTGTGATAATGTTACATGCACCTTCAATTTCATCCACTTTA

AAAATTATCTTCTCATTGAATTTTAGTGCTTCTACTAGTTTGTTCCTTTTTGCAGTTG

GTCGTAATTCATTTCTGGCTTCTTATGCTTTCCTGCAAGCAGATTTCATTGCATTTAT

TGTGTTCATATCATTTTCTTGGGGATTATTTGTAGGACAACCAACCTGGAGTTTTGC

CTCTCTAGAGTACCACCCAGTAAGTCTGGCTGAGCATCTTATGTCCAGTAGGTTCT

TGGTAAACATTTGCTAAATGAAATTACTGATTGAAATTTGGGGAAAAGTGAATAA

GAAGACTATCTAGGACAAAAAGCCAAAGCCGAAAATAGTATATGAGCATTCTAGC

CCAGAGACTGTCGCTACTAAAAGAATGAAGGAAATAATAAAGTGATAGACAGGG

AAGGATAGAAAAGACTTAACAATATACATATGTTCCGTCTTTGCTGTTTTGGAGAA

TGATGGATAAGTAGTGTTTCCTGATTCTGAAGCATAGCTGAACAATTTAATTGTGG

TTTACCATCTTTTTGGTTCCCTCTTCAGTAATTAACCTATCGAAAATCTGTCCTAAA

TGTTTGGACTGGGGCACAGTTCCCTCCATCGCTTTGGGAGAAAATCATTAATATGG

CATACTGCAGATTGGAGGGCAGGACCACTGAGGGTGTCATAGACATTAGCTCTAT

GGAATTCTGCTAGCAATTTCCAAGTGACAGTGAGGAATTATGGATATATGTTGAGG

TCATTCAGCTTCCTGAGTACCACATTCCCCAGCTACTTAGACACGGGTTAAAATAT

TAAGATGTCCTAGTTCAACAGCTTGAATTCCATTGATTGATACTGATAGTGCCTGT

CCAAGACACCAGCTGAAAGACTTGTTTTGTGTACAAAATAGTTCTGAAAGTGGTGA

GATACAAAAAGGTTTTAGAATCACTGCCCTGTTGAGAGAAATTAGGGGGAAATGA

TTACATTTAGAAGCTGCTAGAGTTATCCAGTGTTTGCTGGTCTTTGCAACAAACTGT

GGAGAATGGGTGGTATGTAATGCTTTGGTAGGCTTCAATCACTGATAAAAGATCAT

GTTAAAATATCTTTGTGCTTTCTTGTTACTTGGCACAACCATCTCTTCCTGTGTTGTA

TTTGGAGTATCATGGAGAGAAAATAGATGGCCAAGAGCTTCAGTGTAGGCAAGAA

CTCTTAATTTTTCTTTAAACTTTTTACTGGGAAAAGTATATATATATAAAATACACA

CACACACACACACACACACACACACACACACACACACACAAACACAACACACCAT

GGCCCTTTACCCCGAAATGCTTCAGTATAGTTATTGACTTAAGTAAATTTAACATT

GATATACTTGAATCTATCATTTGTATTACAGTTTTGTCAGCTGACCCAATAATGTCC

TGTAAAGAAGTTCTCCCACTACCCTATAATCCCAGGTCCAGTCTAGGGTCCAGCAT

TACATTTACTTGTCTTGAATCCAGCTTTTTCTTTTTTTTTTTTTTTTTTGAGATAGG TC TCACTCTGTCGTCCAGTGGCATGATCACAGCTCACTGCAGCCTCAACCTGGCTCAA

GCAATCCTTCCTCCTCAGCCTCCTGAGTAGCTGGGACCACAGACTCATGTCACCAC

ACCT AATTTTTTTTTTTTTTTTTTTTTGT AGAGAC AAGGT CT C ACT AT GTTGC CC AGG

CTGGTCTTGAACTCCTAGGCTGAAGCAATCCTCCTTCCTTGGCCTCCCAAAGCACT

GGGATTATAGACGTGAGCCACTGCACCGGTCTGCCTTTAGCTTCTTTTAGTCTAGA

ACATTTTCACTGGCTTTCTTTGTCTTTTATGACATTGACATTTTTAAATAATACAGTC

ATTTTGCCTCCTTTCTGTTTTCTTCTTCTTTTTTTAAATAATAGAATGGTCCTTGTTTT

AAATTTATTTGATATTTTCTTGTGATTAGATTCAGGTGCTGGTTGATGTTAAGTTCC

TCACAGGATATCACATCTGGAGGCACACAAAGGCCGTCACACCAAGGTGATGTCA

ATTTTGGTCATCTGGTCAAGGTGTTGTCCTATTCCTTCACTATATAGTTACCTTTTTT

CTCTGTTGCAATGAATAAGCAGTCTGTGGGAAGAGGAGCTGTTACATTTTAAACAG

AAAATGTATTTGACACTGATGGAAAGGAGAGGAGGAAAATTAATGACATAAATTT

CAAAGCAACTATTAAATTATTTGATTGCATTCTTCCTCTTTTACTGTCTGCCAAAAT

TGATAAAAAAAATTTTTCTAATAAGAATGTTTTAAATAGTGATATCTTAATAAGCA

TCAAAATTAAGCCTGAGAAATAAATTCTTTCCTTCCTAATTTCCTCCTCAGCAAAA

GTAATAATTATATAAATTTCATTATGCCTGATAAGATAGGGTTTTGGAAAATAGAC

CTAAGATGTTTCTGATACTGCAGATGACCTATGGTGATCCAATGGGATAAACACTC

TAGGTAGGTTGTCATTTGGTCATAAAATATGAGTTATCTTGGGTTTCCAT AGAGAC

ATCTAGACTTAAAATGTTGTAAGCACTGCTACTTTCAAAATGTCAGTAAAAATAGC

AAAAGCCAAAGCTCTTGAAAAAATTACTTAAATCTTTTTTAAAAGTAGTATAGCGC

CTTGTTAAAAATCTGTGGTGATGCCAAAGCTTGTCTTTCCCAGTGGTCCTACGTGA

ACTGGCCTTATAGCCCCAGGGAAACCAGACACCAGGAATTGGTTTCTCTGCCTTTT

GGC AAAGGAATAAGACT AC ATT GACTT CAT CT AT GAAGAC AACT GCC A ACT ATTT C

CTTTGTAAATTGCTAATTTTGTGTAGTGAGGAAAGGAGCGATGGGCGACGTGATTT

TT AT GGATT AGACT GGT GAGTT CT GCT GAAAGTTT GAC ATCTTTAGGAT CTT AC ATT

TTCTTCAAGTTGAGCTAATGAAAACAGGCTCGTGACTATTTATCACCTGATTTCTA

AGTGGATATTGGGTTGAACACCACATATCCATGACTATTAAGGAGGCTTCATGGTG

TAGTTTGACAAAGGCTCTCTCCTTGACCAAACTTCAGTCAGGCCCTAAGTCCTCTTT

TTAACCAGGCCTCCACCTTGGCCCCCATTCTTGATGGGCCTATACAGCCCAGCTTT

AGCAAGAATCCTGCTAAGCTAGTTTAGAGAGAATCCCACATCCCCAATATCTATGA

AATTTCTCATCCCCTACTTTTGATGTGTAAGTCCTTGGCCTCCCTTCAACGAGAAGC

CTGTTAAGTTCATTTTGCAAGAACTCTACTCTTGATATCTCCTCTTAGTAATTTCCT

AATCACTGACCCCCTCACTCTGCCCATTAGTTATAAACCCCCACATGTTCTGGTTGT ATTCAGAGCTGAGCCTGATCTCTTCCTCTTGTTGGGATAGTTTTAAAACCTGCGATA

GTTTTAAAACCTATCACTGTAGTCCTGAATTAAGTCTTCCTTACCTTAACAAGTGTC

AAAATAAATTTTTCTTTAACATGTTGAAGCATGAACTTGAGAATCTAGAGCAGGAG

TCCACAAAGTATGGCCCATGGGCCATATCCAGCCCGCTGCCGGTTTCGGTACCACT

CATGACTTAAAAATGGGTCTTACAATTCTGAGTGATTGAAAAAAAATCAAAAGAA

GGATAATATTTAGTGACCCATGAACCTTATATGGCAATCAAATTTCAGTGTCCATA

AATAAAGTTACATTGGATGACAGCCATGCCCATTTGTTTCTGTGTTGTCTGTGGCTG

CTCGTGTGCTACAATGGCAGAGTTGAGCAGTGGTGACAAACCATGCGACTCACAA

AGGCCTAAAATATTTAGCGTCTGGCCCTTCGAGAAAATGTTAGCTGCCCCTGGTCT

AGAGTAGGT AAAAGGCT GAGATT GGAAGCT GCTT GTT C A AATT CTGT GATT GGAAC

CGAATGATGTGGCTCATTGTACAGCTCATGGTGAATTGCTTCAGTACCATGGTTTT

GTTTTTTCCTTTTGAAAAGTTGGTCTATAAATGTAAAGGAAAAATCTAAGATACC A

AAATATGTTTTCTGGCTTAGAATGTTTTATTTCCTTGTATACATTTTAAGAGAGTGG

CAAGGAGAAAAGATAATGTATCATTTTATTTGGGTTTAGAATAAATAATACATTTT

ATTTATGATCA

Homo sapiens adenosine deaminase tRNA specific 2 (ADAT2), transcript variant 1, protein (NP_872309.2; SEQ ID NO: 13)

MEAKAAPKPAASGACSVSAEETEKWMEEAMHMAKEALENTEVPVGCLMVYNNEVV GKGRNEVNQTKNATRHAEMVAIDQVLDWCRQSGKSPSEVFEHTVLYVTVEPCIMCA AALRLMKIPLVVYGCQNERFGGCGSVLNIASADLPNTGRPFQCIPGYRAEEAVEMLKT FYKQENPNAPKSKVRKKECQKS

Mus musculus adenosine deaminase (NP_001258981.1; SEQ ID NO: 14)

MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLP GFLAKFDYYMPVIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMP WNQTEGDVTPDDVVDLVNQGLQEGEQAFGIKVRSILCCMRHQPSWSLEVLELCKKYN QKTVVAMDLAGDETIEGSSLFPGHVEAYEGAVKNGIHRTVHAGEVGSPEVVREAVDI LKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAWDPKTTHAVVRFKND KANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKKELLE RLYREYQ

Cytidine deaminase is an enzyme that in humans is encoded by the CDA gene, which has the following mRNA sequence:

Homo sapiens cytidine deaminase (CDA), mRNA (SEQ ID NO: 5; NM_001785.3):

CCCGCTGCTCTGCTGCCTGCCCGGGGTACCAACATGGCCCAGAAGCGTCCTGCCTG

CACCCTGAAGCCTGAGTGTGTCCAGCAGCTGCTGGTTTGCTCCCAGGAGGCCAAGA

AGTCAGCCTACTGCCCCTACAGTCACTTTCCTGTGGGGGCTGCCCTGCTCACCCAG

GAGGGGAGAATCTTCAAAGGGTGCAACATAGAAAATGCCTGCTACCCGCTGGGCA

TCTGTGCTGAACGGACCGCTATCCAGAAGGCCGTCTCAGAAGGGTACAAGGATTT

CAGGGCAATTGCTATCGCCAGTGACATGCAAGATGATTTTATCTCTCCATGTGGGG

CCTGCAGGCAAGTCATGAGAGAGTTTGGCACCAACTGGCCCGTGTACATGACCAA

GCCGGATGGTACGTATATTGTCATGACGGTCCAGGAGCTGCTGCCCTCCTCCTTTG

GGCCTGAGGACCTGCAGAAGACCCAGTGACAGCCAGAGAATGCCCACTGCCTGTA

ACAGCCACCTGGAGAACTTCATAAAGATGTCTCACAGCCCTGGGGACACCTGCCC

AGTGGGCCCCAGCCCTACAGGGACTGGGCAAAGATGATGTTTCCAGATTACACTC

CAGCCTGAGTCAGCACCCCTCCTAGCAACCTGCCTTGGGACTTAGAACACCGCCGC

CCCCTGCCCCACCTTTCCTTTCCTTCCTGTGGGCCCTCTTTCAAAGTCCAGCCTAGT

CTGGACTGCTTCCCCATCAGCCTTCCCAAGGTTCTATCCTGTTCCGAGCAACTTTTC

TAATTATAAACATCACAGAACATCCTGGA

The human CDA-encoded protein is:

Homo sapiens cytidine deaminase (CDA), protein (SEQ ID NO: 6; NP_001776.1)

MAQKRPACTLKPECVQQLLVCSQEAKKSAYCPYSHFPVGAALLTQEGRIFKGCNIENA CYPLGICAERTAIQKAVSEGYKDFRAIAIASDMQDDFISPCGACRQVMREFGTNWPVY MTKPDGTYIVMTVQELLPSSFGPEDLQKTQ

The cytidine deaminase gene encodes an enzyme involved in pyrimidine salvaging. The encoded protein forms a homotetramer that catalyzes the irreversible hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. It is one of several deaminases responsible for maintaining the cellular pyrimidine pool. Mutations in this gene have been described as associated with decreased sensitivity to the cytosine nucleoside analogue cytosine arabinoside, used in the treatment of certain childhood leukemias.

Apobec-1 is an RNA-specific cytidine deaminase that possesses homology to other members of the cytidine/deoxycytidine deaminase family, particularly within the domain HVE-PCXXC proposed to coordinate zinc binding and catalysis. APOBEC1 (rat) is an apolipoprotein B mRNA editing enzyme. The APOBEC1 protein is responsible for the posttranscriptional editing of a CAA codon for Gln to a UAA stop codon in the APOB mRNA. APOBEC1 has also been described as involved in CGA (Arg) to UGA (Stop) editing in the NF1 mRNA. APOBEC1 has been described as expressed exclusively in the small intestine. The rat apobec-1 gene spans 16 kb and includes one untranslated exon (exon A) and five translated exons (exons 1-5).
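The two APOBEC1 editing events described above (CAA to UAA in APOB mRNA and CGA to UGA in NF1 mRNA) can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the helper name `edit_c_to_u` is hypothetical, the standard genetic code is assumed, and only the single edited codon is modeled rather than the full transcript or the sequence context that APOBEC1 requires.

```python
# Standard genetic code for RNA, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_c_to_u(codon, position=0):
    """Return the codon after deaminating the cytidine at `position` to uridine."""
    if codon[position] != "C":
        raise ValueError(f"no cytidine at position {position} of {codon}")
    return codon[:position] + "U" + codon[position + 1:]


# Illustrative codons taken from the text: CAA (Gln) and CGA (Arg).
for codon in ("CAA", "CGA"):
    edited = edit_c_to_u(codon)
    print(codon, CODON_TABLE[codon], "->", edited, CODON_TABLE[edited])
```

In both cases deamination of the first-position cytidine turns a sense codon into a stop codon (shown as `*`), which is why a single C-to-U edit truncates the encoded protein.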

The wild-type mRNA sequence of rat APOBEC1 is the following:

Rattus norvegicus apolipoprotein B mRNA editing enzyme catalytic subunit 1 (Apobec1), mRNA (SEQ ID NO: 3; NM_012907.2)

CCAAGGTCCTGCTTTTGCATCTTAAGCCGCCCCTCCTTTCTCCAACAGACACGAGG

AGCAAAGGGTAACTGAGAGGGAGTAGCAGGTAAAGCCCACAGTGTTCTCACCGGG

TCACCCTGAGGACTTCTTAGTTATAGGAGCTGCTTCATTCTCTCCGATCCGTGCTGG

CTTCTCTCCCACTCTCACTTGAAGGAAGGGGAAAGCTTTCTAAGTTTAGCCGTCAC

TCTGGAATTTAACATCATCGATGTTCTACTGTGCAGCGTTGATGGTTCGATGGGCT

CTCTCCAGGGAGGACGGAAATCCAGATGCCACTTCCTTCTTCATTTACATAGCATT

CATATCACGTCGCGACTGACGCTCAGGAATGAGTCATCCTGTGTCCCTGCAGGTGG

CCGTGGGCACACCTGAGGAAGCAAAGTCCGGCACGCAGCTGGCAGCAGCCATCGC

CGCAACATAAGCTCCCGAGGAAGGAGTCCAGAGACACAGAGAGCAAGATGAGTT

CCGAGACAGGCCCTGTAGCTGTTGATCCCACTCTGAGGAGAAGAATTGAGCCCCA

CGAGTTTGAAGTCTTCTTTGACCCCCGGGAACTTCGGAAAGAGACCTGTCTGCTGT

ATGAGATCAACTGGGGAGGAAGGCACAGCATCTGGCGACACACGAGCCAAAACA

CCAACAAACACGTTGAAGTCAATTTCATAGAAAAATTTACTACAGAAAGATACTTT

TGTCCAAACACCAGATGCTCCATTACCTGGTTCCTGTCCTGGAGTCCCTGTGGGGA

GTGCTCCAGGGCCATTACAGAATTTTTGAGCCGATACCCCCATGTAACTCTGTTTA

TTTATATAGCACGGCTTTATCACCACGCAGATCCTCGAAATCGGCAAGGACTCAGG

GACCTTATTAGCAGCGGTGTTACTATCCAGATCATGACGGAGCAAGAGTCTGGCTA

CTGCTGGAGGAATTTTGTCAACTACTCCCCTTCGAATGAAGCTCATTGGCCAAGGT

ACCCCCATCTGTGGGTGAGGCTGTACGTACTGGAACTCTACTGCATCATTTTAGGA

CTTCCACCCTGTTTAAATATTTTAAGAAGAAAACAACCTCAACTCACGTTTTTCAC

GATTGCTCTTCAAAGCTGCCATTACCAAAGGCTACCACCCCACATCCTGTGGGCCA

CAGGGTTGAAATGACTTCTGGGAGTTGGGGATGGATGAAATGACTCCTTGTATGTC

TTGACAGCAAGCATTGATTACCCACTAAAGAGCGACTGCCACAAGGAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The corresponding wild-type rat APOBEC1 protein sequence is the following:

Rattus norvegicus apolipoprotein B mRNA editing enzyme catalytic subunit 1 (Apobec1), protein (SEQ ID NO: 4; NP_037039.1)

MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTN

KHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLY

HHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRL

YVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK

Activation-induced cytidine deaminase, also known as AICDA and AID, is a 24 kDa enzyme which in humans is encoded by the AICDA gene. It creates mutations in DNA by deaminating the cytosine base, which turns it into uracil (which is recognized as a thymine). In other words, it changes a C:G base pair into a U:G mismatch. The cell's DNA replication machinery recognizes the U as a T, and hence C:G is converted to a T:A base pair. During germinal center development of B lymphocytes, AID also generates other types of mutations, such as C:G to A:T.
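The path from a C:G pair to a T:A pair described above can be traced step by step. The following is a simplified sketch, not a model of the disclosed system: the names `deaminate` and `synthesize_partner` and the four-base example strand are hypothetical, the two strands are written position-aligned rather than antiparallel for readability, and DNA repair (which can remove the uracil before replication) is ignored.

```python
# Base chosen opposite each template base during DNA synthesis.
# Uracil templates adenine, exactly as thymine does.
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}


def deaminate(strand, index):
    """Convert the cytosine at `index` to uracil, as a cytidine deaminase would."""
    if strand[index] != "C":
        raise ValueError(f"no cytosine at index {index}")
    return strand[:index] + "U" + strand[index + 1:]


def synthesize_partner(template):
    """Return the strand synthesized opposite `template`, position by position."""
    return "".join(PARTNER[base] for base in template)


top = "GACT"                               # illustrative strand; C at index 2
bottom = synthesize_partner(top)           # original partner: C pairs with G
edited_top = deaminate(top, 2)             # C:G becomes a U:G mismatch
round_one = synthesize_partner(edited_top) # replication places A opposite U
round_two = synthesize_partner(round_one)  # next round places T opposite A

print(top[2] + ":" + bottom[2])            # pair before editing
print(edited_top[2] + ":" + bottom[2])     # mismatch after deamination
print(round_two[2] + ":" + round_one[2])   # pair after two rounds of synthesis
```

Reading the three printed pairs in order gives C:G, then U:G, then T:A, which is the transition mutation attributed to AID in the preceding paragraph.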

Homo sapiens activation induced cytidine deaminase (AICDA), transcript variant 1, mRNA (NM_020661.4; SEQ ID NO: 15)

GTCAGACTAAGACAGAGAACCATCATTAATTGAAGTGAGATTTTTCTGGCCTGAGA

CTTGCAGGGAGGCAAGAAGACACTCTGGACACCACTATGGACAGCCTCTTGATGA

ACCGGAGGAAGTTTCTTTACCAATTCAAAAATGTCCGCTGGGCTAAGGGTCGGCGT

GAGACCTACCTGTGCTACGTAGTGAAGAGGCGTGACAGTGCTACATCCTTTTCACT

GGACTTTGGTTATCTTCGCAATAAGAACGGCTGCCACGTGGAATTGCTCTTCCTCC

GCTACATCTCGGACTGGGACCTAGACCCTGGCCGCTGCTACCGCGTCACCTGGTTC

ACCTCCTGGAGCCCCTGCTACGACTGTGCCCGACATGTGGCCGACTTTCTGCGAGG

GAACCCCAACCTCAGTCTGAGGATCTTCACCGCGCGCCTCTACTTCTGTGAGGACC

GCAAGGCTGAGCCCGAGGGGCTGCGGCGGCTGCACCGCGCCGGGGTGCAAATAGC

CATCATGACCTTCAAAGATTATTTTTACTGCTGGAATACTTTTGTAGAAAACCACG

AAAGAACTTTCAAAGCCTGGGAAGGGCTGCATGAAAATTCAGTTCGTCTCTCCAG

ACAGCTTCGGCGCATCCTTTTGCCCCTGTATGAGGTTGATGACTTACGAGACGCAT

TTCGTACTTTGGGACTTTGATAGCAACTTCCAGGAATGTCACACACGATGAAATAT

CTCTGCTGAAGACAGTGGATAAAAAACAGTCCTTCAAGTCTTCTCTGTTTTTATTCT

TCAACTCTCACTTTCTTAGAGTTTACAGAAAAAATATTTATATACGACTCTTTAAAA

AGATCTATGTCTTGAAAATAGAGAAGGAACACAGGTCTGGCCAGGGACGTGCTGC

AATTGGTGCAGTTTTGAATGCAACATTGTCCCCTACTGGGAATAACAGAACTGCAG GACCTGGGAGCATCCTAAAGTGTCAACGTTTTTCTATGACTTTTAGGTAGGATGAG

AGC AGA AGGT AGAT C C T A A A AAGC AT GGT GAGAGGAT C A A AT GTTTTT AT AT C A A

CATCCTTTATTATTTGATTCATTTGAGTTAACAGTGGTGTTAGTGATAGATTTTTCT

ATTCTTTTCCCTTGACGTTTACTTTCAAGTAACACAAACTCTTCCATCAGGCCATGA

TCTATAGGACCTCCTAATGAGAGTATCTGGGTGATTGTGACCCCAAACCATCTCTC

CAAAGCATTAATATCCAATCATGCGCTGTATGTTTTAATCAGCAGAAGCATGTTTT

TATGTTTGTACAAAAGAAGATTGTTATGGGTGGGGATGGAGGTATAGACCATGCAT

GGTCACCTTCAAGCTACTTTAATAAAGGATCTTAAAATGGGCAGGAGGACTGTGA

ACAAGACACCCTAATAATGGGTTGATGTCTGAAGTAGCAAATCTTCTGGAAACGC

AAACTCTTTTAAGGAAGTCCCTAATTTAGAAACACCCACAAACTTCACATATCATA

ATT AGC AAAC AATT GGAAGGAAGTT GCTT GAATGTT GGGGAGAGGAAAAT CT ATT

GGCTCTCGTGGGTCTCTTCATCTCAGAAATGCCAATCAGGTCAAGGTTTGCTACAT

TTTGTATGTGTGTGATGCTTCTCCCAAAGGTATATTAACTATATAAGAGAGTTGTG

ACAAAACAGAATGATAAAGCTGCGAACCGTGGCACACGCTCATAGTTCTAGCTGC

TTGGGAGGTTGAGGAGGGAGGATGGCTTGAACACAGGTGTTCAAGGCCAGCCTGG

GCAACATAACAAGATCCTGTCTCTCAAAAAAAAAAAAAAAAAAAAGAAAGAGAG

AGGGCCGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGCC

GGGCGGATCACCTGTGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGCAAAAC

CCCGTCTGTACTCAAAATGCAAAAATTAGCCAGGCGTGGTAGCAGGCACCTGTAA

TCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGTGGAG

GTTGCAGTAAGCTGAGATCGTGCCGTTGCACTCCAGCCTGGGCGACAAGAGCAAG

ACT CTGTCT C AGAAAAAAAAAAAAAAAAGAGAGAGAGAGAGAAAGAGAAC AATA

TTT GGGAG AG AAGGAT GGGGA AGC ATT GC A AGGA AATT GT GCTTT AT C C A AC A A A

ATGTAAGGAGCCAATAAGGGATCCCTATTTGTCTCTTTTGGTGTCTATTTGTCCCTA

ACAACTGTCTTTGACAGTGAGAAAAATATTCAGAATAACCATATCCCTGTGCCGTT

ATT AC CT AGC AAC CCTT GC AAT GAAGAT GAGC AGATCC AC AGGAAAACTT GAAT G

C AC AACT GTCTT ATTTT AAT CTTATT GT AC AT AAGTTT GT AAAAGAGTT AAAAATT G

TTACTTCATGTATTCATTTATATTTTATATTATTTTGCGTCTAATGATTTTTTATTAA

CATGATTTCCTTTTCTGATATATTGAAATGGAGTCTCAAAGCTTCATAAATTTATAA

CTTT AGA A AT GATT CT AAT A AC A AC GT AT GT AATT GT A AC ATT GC AGT AAT GGT GC

TACGAAGCCATTTCTCTTGATTTTTAGTAAACTTTTATGACAGCAAATTTGCTTCTG

GCT C ACTTT C AAT C AGTT AAAT AAAT GAT AAAT AATTTT GGAAGCTGTGAAGAT AA AATACCAAATAAAATAATATAAAAGTGATTTATATGAAGTTAAAATAAAAAATCA

GTATGATGGAATAAA

Homo sapiens activation induced cytidine deaminase (AICDA), transcript variant 1, protein (NP_065712.1; SEQ ID NO: 16)

MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCH VELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYF CEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVR LSRQLRRILLPLYEVDDLRDAFRTLGL

The pGH335_MS2-AID*A-Hygro plasmid has the following sequence (SEQ ID NO: 17):

>pGH335_MS2-AID*A-Hygro sequence 11382 bps

GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGC

TCTGATGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCT

GAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATT

GCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCC

AGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGG

GTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATG

GCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTAT

GTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTT

ACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCC

CTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACC

TTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATG

GTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGG

ATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATC

AACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGG

TAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTGCCTGTACTGGGTCT

CTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACT

GCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGT

TGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATC

TCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGC

TCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCG

GCGACT GGT GAGTACGC C AAAAATTTT GACT AGC GGAGGCT AGAAGGAGAGAGAT GGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAA

ATTCGGTTAAGGCCAGGGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGG

CAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAAACATCAGA

AGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAA

GAACTTAGATCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGGAT

AGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGCAAAACAA

AAGTAAGACCACCGCACAGCAAGCGGCCGCTGATCTTCAGACCTGGAGGAGGAGA

T AT GAGGGAC A ATT GGAGA AGT GA ATT AT AT A AAT AT A A AGT AGT A A A AATT GA A

CCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAA

AGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCA

CTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGT

ATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGT

TGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGA

AAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTC

ATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACA

GATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACA

AGCTT AAT AC ACTC CTTA ATT GAAGAATCGC AAAAC C AGC AAGAAAAGAAT GAAC

A AG AATT ATT GGA ATT AGAT A A AT GGGC A AGTTT GT GGA ATT GGTTT A AC AT A AC A

AATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTT

AAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCAC

CATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGG

AATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAA

CGGATCGGCACTGCGTGCGCCAATTCTGCAGACAAATGGCAGTATTCATCCACAAT

TTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAATAGTAGAC

ATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTC

AAAATTTTCGGGTTTATTACAGGGACAGCAGAGATCCAGTTTGGTTAATTAGCTAG

CTGCAAAGATGGATAAAGTTTTAAACAGAGAGGAATCTTTGCAGCTAATGGACCT

TCTAGGTCTTGAAAGGAGTGGGAATTGGCTCCGGTGCCCGTCAGTGGGCAGAGCG

CACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGT

GCCTAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCC

GCCTTTTTCCCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAA

CGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGGTAAGTGCCGTGTGTGGTT

CCCGCGGGCCTGGCCTCTTTACGGGTTATGGCCCTTGCGTGCCTTGAATTACTTCCA CCTGGCTGCAGTACGTGATTCTTGATCCCGAGCTTCGGGTTGGAAGTGGGTGGGAG AGTTCGAGGCCTTGCGCTTAAGGAGCCCCTTCGCCTCGTGCTTGAGTTGAGGCCTG GCCTGGGCGCTGGGGCCGCCGCGTGCGAATCTGGTGGCACCTTCGCGCCTGTCTCG CTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGACGCTTT TTTTCTGGCAAGATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCG GTTTTTGGGGCCGCGGGCGGCGACGGGGCCCGTGCGTCCCAGCGCACATGTTCGG CGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAATCGGACGGGGGTAGTCTCAAGC TGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTGGG CGGCAAGGCTGGCCCGGTCGGCACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCC CGGCCCTGCTGCAGGGAGCTCAAAATGGAGGACGCGGCGCTCGGGAGAGCGGGC GGGTGAGTCACCCACACAAAGGAAAAGGGCCTTTCCGTCCTCAGCCGTCGCTTCAT GTGACTCCACGGAGTACCGGGCGCCGTCCAGGCACCTCGATTAGTTCTCGAGCTTT TGGAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTTTCCCCA CACTGAGTGGGTGGAGACTGAAGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCT TGGAATTTGCCCTTTTTGAGTTTGGATCTTGGTTCATTCTCAAGCCTCAGACAGTGG TTCAAAGTTTTTTTCTTCCATTTCAGGTGTCGTGACGTACGGCCACCATGGCTTCAA ACTTTACTCAGTTCGTGCTCGTGGACAATGGTGGGACAGGGGATGTGACAGTGGCT CCTTCTAATTTCGCTAATGGGGTGGCAGAGTGGATCAGCTCCAACTCACGGAGCCA GGCCTACAAGGTGACATGCAGCGTCAGGCAGTCTAGTGCCCAGAAGAGAAAGTAT ACCATCAAGGTGGAGGTCCCCAAAGTGGCTACCCAGACAGTGGGCGGAGTCGAAC TGCCTGTCGCCGCTTGGAGGTCCTACCTGAACATGGAGCTCACTATCCCAATTTTC GCTACCAATTCTGACTGTGAACTCATCGTGAAGGCAATGCAGGGGCTCCTCAAAG ACGGTAATCCTATCCCTTCCGCCATCGCCGCTAACTCAGGTATCTACAGCGCTGGA GGAGGTGGAAGCGGAGGAGGAGGAAGCGGAGGAGGAGGTAGCGGACCTAAGAA AAAGAGGAAGGTGGCGGCCGCTGGATCCATGGACAGCCTCTTGATGAACCGGAGG GAGTTTCTTTACCAATTCAAAAATGTCCGCTGGGCTAAGGGTCGGCGTGAGACCTA CCTGTGCTACGTAGTGAAGAGGCGTGACAGTGCTACATCCTTTTCACTGGACTTTG GTTATCTTCGCAATAAGAACGGCTGCCACGTGGAATTGCTCTTCCTCCGCTACATC TCGGACTGGGACCTAGACCCTGGCCGCTGCTACCGCGTCACCTGGTTCATCTCCTG GAGCCCCTGCTACGACTGTGCCCGACATGTGGCCGACTTTCTGCGAGGGAACCCCA ACCTCAGTCTGAGGATCTTCACCGCGCGCCTCTACTTCTGTGAGGACCGCAAGGCT GAGCCCGAGGGGCTGCGGCGGCTGCACCGCGCCGGGGTGCAAATAGCCATCATGA CCTT C AAAGATTATTTTT ACT GCT GGAAT ACTTTT GT AGAAAAC C AC GGAAGAACT TTCAAAGCCTGGGAAGGGCTGCATGAAAATTCAGTTCGTCTCTCCAGACAGCTTCG

GCGCATCCTTTTGCCCCTGTATGAGGTTGATGACTTACGAGACGCATTTCGTACTTG

TACAGGCAGTGGAGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGA

GAATCCTGGCCCAACCATGAAAAAGCCTGAACTCACCGCTACCTCTGTCGAGAAG

TTTCTGATCGAAAAGTTCGACAGCGTCTCCGACCTGATGCAGCTCTCCGAGGGCGA

AGAATCTCGGGCTTTCAGCTTCGATGTGGGAGGGCGTGGATATGTCCTGCGGGTGA

ATAGCTGCGCCGATGGTTTCTACAAAGATCGCTATGTTTATCGGCACTTTGCATCC

GCCGCTCTCCCTATTCCCGAAGTGCTTGACATTGGGGAGTTCAGCGAGAGCCTGAC

CTATTGCATCTCCCGCCGTGCACAGGGTGTCACCTTGCAAGACCTGCCTGAAACCG

AACTGCCCGCTGTTCTCCAGCCCGTCGCCGAGGCCATGGATGCCATCGCTGCCGCC

GATCTTAGCCAGACCAGCGGGTTCGGCCCATTCGGACCTCAAGGAATCGGTCAAT

ACACTACATGGCGCGATTTCATCTGCGCTATTGCTGATCCCCATGTGTATCACTGG

CAAACTGTGATGGACGACACCGTCAGTGCCTCCGTCGCCCAGGCTCTCGATGAGCT

GATGCTTTGGGCCGAGGACTGCCCCGAAGTCCGGCACCTCGTGCACGCCGATTTCG

GCTCCAACAATGTCCTGACCGACAATGGCCGCATAACAGCCGTCATTGACTGGAG

CGAGGCCATGTTCGGGGATTCCCAATACGAGGTCGCCAACATCTTCTTCTGGAGGC

CCTGGTTGGCTTGTATGGAGCAGCAGACCCGCTACTTCGAGCGGAGGCATCCCGA

GCTTGCAGGATCTCCTCGGCTCCGGGCTTATATGCTCCGCATTGGTCTTGACCAACT

CTATCAGAGCTTGGTTGACGGCAATTTCGATGATGCAGCTTGGGCTCAGGGTCGCT

GCGACGCAATCGTCCGGTCCGGAGCCGGGACTGTCGGGCGTACACAAATCGCCCG

CAGAAGCGCTGCCGTCTGGACCGATGGCTGTGTGGAAGTGCTCGCCGATAGTGGA

AACAGACGCCCCAGCACTCGTCCTAGGGCAAAGGATCTGCAGTAATGAGAATTCG

ATATCAAGCTTATCGGTAATCAACCTCTGGATTACAAAATTTGTGAAAGATTGACT

GGTATTCTTAACTATGTTGCTCCTTTTACGCTATGTGGATACGCTGCTTTAATGCCT

TTGTATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTTGTATAAATCCT

GGTTGCTGTCTCTTTATGAGGAGTTGTGGCCCGTTGTCAGGCAACGTGGCGTGGTG

TGCACTGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCATTGCCACCACCTGTCA

GCTCCTTTCCGGGACTTTCGCTTTCCCCCTCCCTATTGCCACGGCGGAACTCATCGC

CGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCG

TGGTGTTGTCGGGGAAATCATCGTCCTTTCCTTGGCTGCTCGCCTGTGTTGCCACCT

GGATTCTGCGCGGGACGTCCTTCTGCTACGTCCCTTCGGCCCTCAATCCAGCGGAC

CTTCCTTCCCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCGCGTCTTCGCCTTCGC

CCTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCATCGATACCGTCGACC TCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACCAAT

GCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCA

CACCTCAGGTACCTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCAC

TTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAACGAAGACAAG

ATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAAC

TACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCT

AGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAGAGAACACCCG

CTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTA

GAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCC

GGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCT

AACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTA

GTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAG

TCAGTGTGGAAAATCTCTAGCAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTG

TGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCC

TGGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCAT

TGTCTGAGTAGGTGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGG

GGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGC

TTCTGAGGCGGAAAGAACCAGCTGGGGCTCTAGGGGGTATCCCCACGCGCCCTGT

AGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACAC

TTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTT

CGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTA

GTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGT

GGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCTTT

AATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTC

TTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGAT

TTAACAAAAATTTAACGCGAATTAATTCTGTGGAATGTGTGTCAGTTAGGGTGTGG

AAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAG

TCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAA

GCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCG

CCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTT

ATTTATGCAGAGGCCGAGGCCGCCTCTGCCTCTGAGCTATTCCAGAAGTAGTGAGG

AGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCCCGGGAGCTTGTATATCCA

TTTTCGGATCTGATCAGCACGTGTTGACAATTAATCATCGGCATAGTATATCGGCA TAGTATAATACGACAAGGTGAGGAACTAAACCATGGCCAAGTTGACCAGTGCCGT

TCCGGTGCTCACCGCGCGCGACGTCGCCGGAGCGGTCGAGTTCTGGACCGACCGG

CTCGGGTTCTCCCGGGACTTCGTGGAGGACGACTTCGCCGGTGTGGTCCGGGACGA

CGTGACCCTGTTCATCAGCGCGGTCCAGGACCAGGTGGTGCCGGACAACACCCTG

GCCTGGGTGTGGGTGCGCGGCCTGGACGAGCTGTACGCCGAGTGGTCGGAGGTCG

TGTCCACGAACTTCCGGGACGCCTCCGGGCCGGCCATGACCGAGATCGGCGAGCA

GCCGTGGGGGCGGGAGTTCGCCCTGCGCGACCCGGCCGGCAACTGCGTGCACTTC

GTGGCCGAGGAGCAGGACTGACACGTGCTACGAGATTTCGATTCCACCGCCGCCTT

CTATGAAAGGTTGGGCTTCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCC

AGCGCGGGGATCTCATGCTGGAGTTCTTCGCCCACCCCAACTTGTTTATTGCAGCT

TATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTT

TTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTG

TATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGT

GTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAG

TGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTC

ACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCC

AACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACT

GACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGC

GGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCA

AAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCC

ATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTG

GCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTC

GTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCT

TCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTA

GGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCT

GCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCG

CCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTG

CTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTT

GGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTG

ATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGA

TTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCT

GACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAA

AAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAA AGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACC

TATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTA

GATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCG

CGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAA

GGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAAT

TGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGT

TGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAG

CTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAG

CGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTA

TCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAG

ATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGC

GGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAG

CAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAA

GGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGA

TCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCA

AAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTC

TTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATAC

ATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCG

AAAAGTGCCACCTGAC

Within the above plasmid, AID*A includes the following peptide sequence (SEQ ID NO: 18):

MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRT

The above plasmid also includes the AID*A DNA sequence (SEQ ID NO: 30):

ATGGACAGCCTCTTGATGAACCGGAGGGAGTTTCTTTACCAATTCAAAAATGTCCG

CTGGGCTAAGGGTCGGCGTGAGACCTACCTGTGCTACGTAGTGAAGAGGCGTGAC

AGTGCTACATCCTTTTCACTGGACTTTGGTTATCTTCGCAATAAGAACGGCTGCCA

CGTGGAATTGCTCTTCCTCCGCTACATCTCGGACTGGGACCTAGACCCTGGCCGCT

GCTACCGCGTCACCTGGTTCATCTCCTGGAGCCCCTGCTACGACTGTGCCCGACAT GTGGCCGACTTTCTGCGAGGGAACCCCAACCTCAGTCTGAGGATCTTCACCGCGCG

CCTCTACTTCTGTGAGGACCGCAAGGCTGAGCCCGAGGGGCTGCGGCGGCTGCAC

CGCGCCGGGGTGCAAATAGCCATCATGACCTTCAAAGATTATTTTTACTGCTGGAA

TACTTTTGTAGAAAACCACGGAAGAACTTTCAAAGCCTGGGAAGGGCTGCATGAA

AATTCAGTTCGTCTCTCCAGACAGCTTCGGCGCATCCTTTTGCCCCTGTATGAGGTT

GATGACTTACGAGACGCATTTCGTACT
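The relationship between the AID*A DNA sequence (SEQ ID NO: 30) and the AID*A peptide sequence (SEQ ID NO: 18) can be checked by translating the DNA with the standard genetic code. The following is a minimal sketch, not part of the original disclosure: the helper names `clean` and `translate` are hypothetical, whitespace is stripped first because the listing wraps sequences across lines, and only the first printed line of SEQ ID NO: 30 is used as input.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard nuclear code applies to this mammalian expression construct).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def clean(seq: str) -> str:
    """Remove line breaks and spaces from a wrapped sequence and upper-case it."""
    return "".join(seq.split()).upper()


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping at the first stop codon.

    Trailing bases that do not fill a codon are ignored; codons containing
    characters other than A, C, G, T are reported as 'X'.
    """
    dna = clean(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First printed line of the AID*A DNA sequence (SEQ ID NO: 30).
first_line = "ATGGACAGCCTCTTGATGAACCGGAGGGAGTTTCTTTACCAATTCAAAAATGTCCG"
print(translate(first_line))  # MDSLLMNRREFLYQFKNV
```

The printed residues match the start of SEQ ID NO: 18. Passing the complete SEQ ID NO: 30 text to `translate` would allow the same comparison over the full length.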

Guanine deaminase (also known as cypin, guanase, guanine aminase, GAH, and guanine aminohydrolase) is an aminohydrolase enzyme that converts guanine to xanthine. Cypin is a major cytosolic protein that interacts with PSD-95.

Homo sapiens guanine deaminase (GDA), transcript variant 2, mRNA (NM_004293.4; SEQ ID NO: 19)

AGAAAAATCCTATTGGCATTGAGGAGGTAGGGAGCCAGCCCCTGGGCGCGGCCTG

CAGGGTACCGGCAACCGCCCGGGTAAGCGGGGGCAGGACAAGGCCGGAGCCTGT

GTCCGCCCGGCAGCCGCCCGCAGCTGCAGAGAGTCCCGCTGCGTCTCCGCCGCGTG

CGCCCTCCTCGACCAGCAGACCCGCGCTGCGCTCCGCCGCTGACATGTGTGCCGCT

CAGATGCCGCCCCTGGCGCACATCTTCCGAGGGACGTTCGTCCACTCCACCTGGAC

CTGCCCCATGGAGGTGCTGCGGGATCACCTCCTCGGCGTGAGCGACAGCGGCAAA

ATAGTGTTTTTAGAAGAAGCATCTCAACAGGAAAAACTGGCCAAAGAATGGTGCT

TCAAGCCGTGTGAAATAAGAGAACTGAGCCACCATGAGTTCTTCATGCCTGGGCTG

GTTGATACACACATCCATGCCTCTCAGTATTCCTTTGCTGGAAGTAGCATAGACCT

GCCACTCTTGGAGTGGCTGACCAAGTACACATTTCCTGCAGAACACAGATTCCAGA

ACATCGACTTTGCAGAAGAAGTATATACCAGAGTTGTCAGGAGAACACTAAAGAA

TGGAACAACCACAGCTTGTTACTTTGCAACAATTCACACTGACTCATCTCTGCTCCT

TGCCGACATTACAGATAAATTTGGACAGCGGGCATTTGTGGGCAAAGTTTGCATGG

ATTTGAATGACACTTTTCCAGAATACAAGGAGACCACTGAGGAATCGATCAAGGA

AACTGAGAGATTTGTGTCAGAAATGCTCCAAAAGAACTATTCTAGAGTGAAGCCC

ATAGTGACACCACGTTTTTCCCTCTCCTGCTCTGAGACTTTGATGGGTGAACTGGG

CAACATTGCTAAAACCCGTGATTTGCACATTCAGAGCCATATAAGTGAAAATCGTG

ATGAAGTTGAAGCTGTGAAAAACTTATACCCCAGTTATAAAAACTACACATCTGTG

TATGATAAAAACAATCTTTTGACAAATAAGACAGTGATGGCACACGGCTGCTACCT

CTCTGCAGAAGAACTGAACGTATTCCATGAACGAGGAGCATCCATCGCACACTGT

CCCAATTCTAATTTATCGCTCAGCAGTGGATTTCTAAATGTGCTAGAAGTCCTGAA ACATGAAGTCAAGATAGGGCTGGGTACAGACGTGGCTGGTGGCTATTCATATTCC

ATGCTTGATGCAATCAGAAGAGCAGTGATGGTTTCCAATATCCTTTTAATTAATAA

GGTAAATGAGAAAAGCCTCACCCTCAAAGAAGTCTTCAGACTAGCTACTCTTGGA

GGAAGCCAAGCCCTGGGGCTGGATGGTGAGATTGGAAACTTTGAAGTGGGCAAGG

AATTTGATGCCATCCTGATCAACCCCAAAGCATCCGACTCTCCCATTGACCTGTTTT

ATGGGGACTTTTTTGGTGATATTTCTGAGGCTGTTATCCAGAAGTTCCTCTATCTAG

GAGATGATCGAAATATTGAAGAGGTTTATGTGGGCGGAAAGCAGGTGGTTCCGTT

TTCCAGCTCAGTGTAAGACCCTCGGGCGTCTACAAAGTTCTCCTGGGATTAGCGTG

GTTCTGCATCTCCCTTGTGCCCAGGTGGAGTTAGAAAGTCAAAAAATAGTACCTTG

TTCTTGGGATGACTATCCCTTTCTGTGTCTAGTTACAGTATTCACTTGACAAATAGT

TCGAAGGAAGTTGCACTAATTCTCAACTCTGGTTGAGAGGGTTCATAAATTTCATG

AAAATATCTCCCTTTGGAGCTGCTCAGACTTACTTTAAGCTCAAACAGAAGGGAAT

GCTATTACTGGTGGTGTTCCTACGGTAAGACTTAAGCAAAGCCTTTTTCATATTTGA

AAATGTGGAAAGAAAAGATGTTCCTAAAAGGTTAGATATTTTGAGCTAATAATTGC

AAAAATTAGAAGACTGAAAATGGACCCATGAGAGTATATTTTTATGAGGGAGCAA

AAGTTAGACTGAGAACAAACGTTAGAAAATCACTTCAGATTGTGTTTGAAAATTAT

ATACTGAGCATACTAATTTAAAAAGAGAACTTGTTGAAATTTAAAACGTGTTTCTA

GGTTGACCTTGTGTTTTAGAAATTTGCACTTAATGGAATTTGCATTTCAGAGATGTG

TTAGTGTTGTGCTTTGCCTTCTTTGGCGATGAATGTCAGAAATTGAATGCCACATGC

TTTCATAATATAGTTTTGTGCTTCAAAGTGTTTGACAGAAGTTGGGTATTAAAGATT

TAAAGTCTCTTAGGAATATTATTCATGTAACTCCATGGCATAAATAGTTGTATTTTT

GTGTACTTTAAAATCAACTTATAACTGTGAGATGTTATTGCTTCCATTTTATTAGAA

GAGAAACAAATTCCATGCTTTATGGAATTTATGTAGACTGGAGTCTTCGTGAACTG

GGGCAAATGCTGGCATCCAGGAGCCGCCAATACTAACAGGACAGGTTCCATTGCC

ATGGCCTATTCCACCCAAACAATATGTTGTAGTTTCTGGAAATTCCATACTCAGAT

ATCAGTCTGCTAGAACTTTAAAATGAAGGACAAATCCTGTTAAAGAAATATTGTTA

AAAATCTTTAAACCCTGTGTATTGAAAGCACTCTATTTTCTAATTTTATCCAGTTTT

CTGTTTAACTCCTTATAATGTTTAGGATATTAAAATTTTAGGATAATGAAGAGTAC

ATAATGTCCTACTTAATATTTATGTTAATAGGACTTAATTCTTACTAGACATCTAGG

AACATTACAAAGCAAAGACTATTTTTATGCTTCCATAACCTAGAATTAAAACCAAA

TTATGACCTTATGATAAATCTTTAAGTATTGGTGTGAATGTTATTTAAATTCTATAT

TTTTCTTATTTAATTACAAATACTATAAATGAGCAAGGAAAAGGAATAGACTTTCT

TAATATATTATAACACTCATTCCTAGAGCTTAGGGGTGACTCTTTAATATTACCTTA TAGTAGAAACTTTATGTAATATAGCTAACTCCGTATTTACAGAACAAAAAAACACA GTTCCCCCTCCTGTAGTATAAATTTTATTTTCACATACTTAGCTAATTTAGCAGTAA TTGGCCCAGTTTTTTCCCTAATAGAAATACTTTTAGATTTGATTATGTATACATGAC ACCTAAAGAGGGAACAAAAGTTAGTTTTATTTTTTTAATAAACAACAGAGTTTGTT TTGTGAGATAAGTATCTTAGTAAACCCAATTTCCAGTCTTAGTCTGTATTTCCAATA TTTCTAATTCCTGAGCCACGTCAAAGATGCCTTGCCAAATTTCTCCCCATTTCTCTA CGGGGCTAGCAAAAATCTTCAGCTTTATCACTCAACCCCTGCCAAAGGAACTTGAT TACATGGTGTCTAACCAAATGAGCAGGCTTAGGAATTTAGATGAGATGTGTAAGAT TCACTTACAGGCAGTAGCTGCTTCTAGCATTTGCAAGATCCTACACTTTTACCTTCT TTAAGGGTGTACATTTTGATGTTGAACATCAGTTTTCATGTAGACTTAGGACTCATG TGCAGTAAATATAAATAAGTGTAGCATCAGAAGCAGTAGGAATGGCCGTATACAA CCATCCTGTTAAACATTTAAATTTAGCTCTGATAGTGTGTTAAGACCTGAATATCTT TCCTAGTAAAAATAGGATGTGTTGAAATATTTATATGTACTTTGATCTCTCCACATC ACTTATAACTTATGTGTTTTATTTCTCCAAGTGCGGTGTTCCTGAATGTTATGTATG CTTTTTTTTCTGTACCACAGGCATTATCTATACCTGGGGCCAGATTTTCTGCACT TT GAAATGTTGCCTTTGCCTAATGTAGGTTGACTTTCTGAATTGTGGAGAGGCACTTTT CCAAGCCAATCTTATTTGTCACTTTTTGTTTTAATATCTTGCTCTCTGACAGGAAAG AAACAATTCACTTACCAGCCTCCTCACCCCATCCTCCACCATTTCCTTAATGTTCCA TGGTATTTTCAACGGAATACACTTTGAAAGGTAAAAACAATTCAAAAGTATCGATT ATCATAAATTCACAAAATATTTTTGCAACCAGAACACAAAAGCAGGCTAGTCAGC T A AGGT A A ATTT C ATTTT C A A AC GAGAGGGA AAC AT GGG AAGT A A A AGATT AGGA TGTGAAAGGTTGTCCTAAACAGACCAAGGAGACTGTTCCCTAATTTATTCTCTTGG CTGGTTCTCTCATTGAATTATCAGACCCCAAGAGGAGATATTGGAACAGGCTCCCT TCATGCCAAGGGTCTTTCTAAGTTAATACTGTGAGCATTGAGCCCCCATTAAAACT CTTTTTTACTTCAGAAAGAATTTTACAGGTTAAAGGGAAAGAAATGGTGGGAAAC T CTCCCCGTAATGCTTAGCCAACTTTAAAGTGTACCCTTCAATATCCCCATTGGCAA CT GC AGCTGAGAT CTT AGAGAGGAAAT ATAAC CGGT GT GAGAT CT AGC AAT GC AT TTTGAATCTTCACTCCCTACCAGGCTCTTCCTATTTTTAATCTCTTCACCTCAGAACT AGAC AT AT GGAGAGCTTTA AAGGC AAGCT GGAAGGC AC ATT GTAT C AATT CT ACC TTGTGCTATACGTAGGAGAGATCCAAAATTTGGATGCTTCTGGAGACTCTTAGACA TCTTTTCATTGTTGTCCATTTTTAAAGTTGATGATTGCTGGAAACATTCACACGCTT AAAAGCAATGGTGTGAGTTATTAATGGGTAAACTAAGAAGTGTTATAGGCAATGA CTT GA AAT GGTTTTT A AATT GT AT GGATT GTT A AGA ATT GTT 
GAAAAAAAATTTTTT TTTTTTGGACAGCTTCAAGGAGATGTTAGCAATTTCAGATATACTAGCCAGTTTAG

GTATGACTTTGGAAGTGCAGAAACAGAAGGATACTGTTAGAAAATCCTAACATTG

GTCTCCGTGCATGTGTTCACACCTGGTCTCACTGCCTTTCCTTCCCACAGACCTGAG

TGTGAAAGACTGAGAGTTGAGGAGTTACTTTGTGGATCTTGTCCAAATTTAGTGAA

ATGTGGAAGTCAACCAGACCAATGATGGAATTAAATGTAAATTCCAAGAGGGCTT

TCACAGTCCACAGGGTTCAAATGACTTGGGTAACAGAAGTTATTCTTAGCTTACCT

GTTATGTGACAGTGATTTACCTGTCCATTTCCAACCCAAAAGCCTGTCAGAAAGCA

TTCTTTAGAGAAAACCACTTTACATTTGTTGTTAAACTCCTGATCGCTACTCTTAAG

AATATACATGTATGTATTCATAGGAACATTTTTTCTCAATATTTGTATGATTCGCTT

ACTGTTATTGTGCTGAGTGAGCTCCTGTGTGCTTCAGACAAAAATAAATGAGACTT

T GT GTTT AC GTT A A A A A A A A A A A A A A AAA A A A A A

Homo sapiens guanine deaminase (GDA), transcript variant 2, protein (NP_004284.1; SEQ ID NO: 20)

MCAAQMPPLAHIFRGTFVHSTWTCPMEVLRDHLLGVSDSGKIVFLEEASQQEKLAKEWCFKPCEIRELSHHEFFMPGLVDTHIHASQYSFAGSSIDLPLLEWLTKYTFPAEHRFQNIDFAEEVYTRVVRRTLKNGTTTACYFATIHTDSSLLLADITDKFGQRAFVGKVCMDLNDTFPEYKETTEESIKETERFVSEMLQKNYSRVKPIVTPRFSLSCSETLMGELGNIAKTRDLHIQSHISENRDEVEAVKNLYPSYKNYTSVYDKNNLLTNKTVMAHGCYLSAEELNVFHERGASIAHCPNSNLSLSSGFLNVLEVLKHEVKIGLGTDVAGGYSYSMLDAIRRAVMVSNILLINKVNEKSLTLKEVFRLATLGGSQALGLDGEIGNFEVGKEFDAILINPKASDSPIDLFYGDFFGDISEAVIQKFLYLGDDRNIEEVYVGGKQVVPFSSSV

Other sequences relevant to the instant disclosure include the following:

Hyperactive AID*A-T7 RNA Polymerase (w/o T7 promoter)-NLS plasmid DNA sequence (SEQ ID NO: 31):

ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCAT

TATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA

GTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATA

GCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTT

TGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCA

TTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTG

GTTTAGTGAACCGTCAGATCCGCTAGAGATCCGCGGCCGCGAGAGCCGCCACCAT

GGACAGCCTCTTGATGAACCGGAGGGAGTTTCTTTACCAATTCAAAAATGTCCGCT GGGCTAAGGGTCGGCGTGAGACCTACCTGTGCTACGTAGTGAAGAGGCGTGACAG

TGCTACATCCTTTTCACTGGACTTTGGTTATCTTCGCAATAAGAACGGCTGCCACGT

GGAATTGCTCTTCCTCCGCTACATCTCGGACTGGGACCTAGACCCTGGCCGCTGCT

ACCGCGTCACCTGGTTCATCTCCTGGAGCCCCTGCTACGACTGTGCCCGACATGTG

GCCGACTTTCTGCGAGGGAACCCCAACCTCAGTCTGAGGATCTTCACCGCGCGCCT

CTACTTCTGTGAGGACCGCAAGGCTGAGCCCGAGGGGCTGCGGCGGCTGCACCGC

GCCGGGGTGCAAATAGCCATCATGACCTTCAAAGATTATTTTTACTGCTGGAATAC

TTTTGTAGAAAACCACGGAAGAACTTTCAAAGCCTGGGAAGGGCTGCATGAAAAT

TCAGTTCGTCTCTCCAGACAGCTTCGGCGCATCCTTTTGCCCCTGTATGAGGTTGAT

GACTTACGAGACGCATTTCGTACTAGCGGCAGCGAGACTCCCGGGACCTCAGAGT

CCGCCACACCCGAAAGTAACACCATCAACATTGCTAAGAACGACTTCTCAGACAT

AGAGCTCGCGGCTATTCCGTTCAACACCCTGGCTGACCACTACGGCGAGAGACTCG

CTAGGGAGCAGCTGGCGTTGGAGCATGAATCCTACGAGATGGGCGAGGCTAGGTT

CCGCAAGATGTTCGAGCGACAATTGAAGGCAGGGGAGGTGGCGGACAACGCTGCC

GCCAAGCCCCTGATCACAACCTTGCTGCCCAAAATGATCGCGCGGATCAACGATTG

GTTTGAGGAGGTTAAGGCAAAACGGGGCAAACGCCCGACCGCATTTCAATTCCTC

CAAGAAATCAAGCCTGAGGCTGTTGCCTACATCACTATCAAGACGACACTGGCGT

GTCTCACAAGCGCCGACAACACCACCGTGCAAGCCGTCGCCAGCGCCATCGGGCG

GGCAATTGAGGATGAGGCACGGTTTGGTAGGATCCGAGACCTGGAAGCGAAGCAC

TTCAAGAAGAACGTGGAAGAGCAGTTGAACAAACGCGTCGGCCACGTGTATAAAA

AGGCTTTCATGCAGGTGGTGGAGGCCGATATGCTCAGTAAGGGGCTGCTTGGGGG

GGAGGCGTGGTCATCCTGGCACAAGGAGGATAGCATTCACGTGGGGGTCCGATGT

ATCGAGATGCTGATAGAGAGCACCGGAATGGTCTCCCTCCATCGCCAGAACGCTG

GGGTCGTAGGGCAGGACTCCGAGACTATTGAGCTGGCCCCCGAGTATGCCGAAGC

AATCGCTACACGCGCAGGTGCACTGGCTGGGATAAGCCCTATGTTTCAGCCCTGCG

TAGTGCCTCCAAAGCCATGGACCGGCATCACAGGGGGTGGCTATTGGGCCAACGG

TAGGCGGCCTCTGGCCCTGGTACGCACGCACAGCAAGAAGGCGCTCATGCGCTAT

GAAGACGTTTACATGCCCGAGGTTTACAAGGCGATCAATATCGCGCAGAACACCG

CCTGGAAAATCAATAAGAAGGTGTTGGCGGTCGCAAACGTGATTACCAAGTGGAA

GCATTGCCCAGTCGAGGACATACCCGCCATAGAACGCGAAGAGCTGCCGATGAAG

CCGGAAGACATTGATATGAACCCCGAGGCCCTCACCGCGTGGAAAAGAGCCGCAG

CCGCCGTATACAGGAAGGATAAAGCGCGCAAGTCCCGACGCATAAGCCTCGAGTT

TATGCTGGAACAGGCCAACAAGTTCGCCAACCACAAAGCTATCTGGTTCCCCTACA ACATGGACTGGAGAGGGAGGGTCTACGCCGTCAGCATGTTCAATCCCCAGGGCAA

CGACATGACGAAGGGCCTTCTGACATTGGCAAAGGGGAAGCCTATCGGAAAGGAG

GGGTACTACTGGCTCAAGATCCACGGCGCCAACTGCGCGGGAGTGGACAAGGTTC

CATTTCCCGAGCGAATTAAGTTCATCGAGGAAAACCACGAAAACATTATGGCGTG

CGCTAAATCCCCCCTCGAGAACACATGGTGGGCCGAGCAAGACTCCCCGTTCTGTT

TTTTGGCATTCTGCTTTGAGTACGCCGGTGTGCAGCACCATGGCCTCTCATACAACT

GTTCCCTGCCCCTGGCCTTCGACGGAAGTTGCAGTGGGATTCAACATTTCAGCGCA

ATGTTGCGGGACGAGGTCGGTGGCAGGGCCGTTAACCTGCTCCCTTCCGAAACGGT

GCAGGACATCTACGGAATCGTGGCAAAAAAGGTAAACGAGATCCTGCAAGCGGAT

GCCATCAACGGGACGGACAATGAGGTCGTTACGGTGACAGACGAAAATACTGGGG

AAATAAGCGAAAAGGTCAAGCTGGGGACCAAAGCACTCGCGGGTCAGTGGCTCGC

CTACGGGGTGACACGCTCCGTCACCAAGAGAAGCGTGATGACCCTCGCGTACGGT

TCAAAAGAATTCGGCTTCCGCCAGCAAGTGCTGGAGGACACCATCCAGCCGGCGA

TTGACTCCGGGAAGGGTCTCATGTTTACCCAGCCGAACCAGGCCGCAGGGTACAT

GGCCAAACTGATCTGGGAAAGCGTTAGCGTCACAGTGGTCGCCGCGGTTGAGGCG

ATGAATTGGCTGAAGAGCGCGGCAAAGCTCCTCGCCGCTGAGGTGAAGGACAAAA

AGACCGGCGAAATCCTGCGCAAGCGCTGCGCCGTCCACTGGGTCACGCCGGATGG

ATTCCCCGTCTGGCAGGAGTACAAGAAGCCCATCCAAACCCGGCTCAACTTGATGT

TCCTTGGCCAGTTTCGCCTGCAGCCCACGATAAACACCAACAAAGACAGCGAGAT

CGACGCCCACAAGCAGGAGAGCGGCATCGCGCCCAACTTCGTGCACAGTCAGGAC

GGGTCCCATCTGCGGAAAACTGTTGTGTGGGCTCACGAGAAGTACGGCATTGAGA

GCTTCGCCCTGATACACGACAGCTTCGGGACCATACCAGCGGACGCAGCGAACCT

GTTCAAAGCCGTGCGGGAAACAATGGTCGACACCTACGAAAGCTGCGACGTACTG

GCAGACTTCTATGACCAATTCGCCGACCAGCTTCACGAGTCACAGCTCGACAAGAT

GCCCGCTCTGCCCGCGAAAGGCAACCTGAATTTGCGCGACATCCTTGAGAGCGATT

TTGCGTTCGCCTCTGGTGGTTCTCCCAAGAAGAAGAGGAAAGTCTAACCGGTCATC

ATCACCATCACCATTGAGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTT

GCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCA

CTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGG

TGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGG

AAGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGA

AAGAACCAGCTGGGGCTCGATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCAT

GGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATA CGAGCCGGAAGCATAAAGTGTAAAGCCTAGGGTGCCTAATGAGTGAGCTAACTCA

CATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAG

CTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCT

CTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCG

GTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACG

CAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAAT

CGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGT

TTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGAT

ACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTA

GGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCC

CCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCC

GGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGA

GCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCT

ACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGA

AAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT

TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCC

TTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGA

TTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAA

TGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCA

ATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGT

TGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCC

CCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCA

ATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCG

CCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTT

AATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTC

GTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGAT

CCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGA

AGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCT

TACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGT

CATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGG

GATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTC

TTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAAC CCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGT

GAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGA

AATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTT

ATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGG

GTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCGACGGATCGGGAGATC

GATCTCCCGATCCCCTAGGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAG

TTAAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAG

CAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAAGAATCTG

CTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTG

ACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATA

GCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGA

CCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAAC

GCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCC

ACTTGGCAGTACATCAAGTGTATC

AID*A-T7 RNA Polymerase-NLS polypeptide sequence (SEQ ID NO: 32):

MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTSGSETPGTSESATPESNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQLNKRVGHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELAPEYAEAIATRAGALAGISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWKINKKVLAVANVITKWKHCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHKAIWFPYNMDWRGRVYAVSMFNPQGNDMTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAKSPLENTWWAEQDSPFCFLAFCFEYAGVQHHGLSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKLGTKALAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWLKSAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQDGSHLRKTVVWAHEKYGIESFALIHDSFGTIPADAANL

FKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDFA

FASGGSPKKKRKV

Hyperactive AID*A-T7 RNA Polymerase Uracil DNA Glycosylase Inhibitor (UGI)-NLS plasmid DNA sequence (SEQ ID NO: 33):

ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCAT

TATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA

GTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATA

GCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTT

TGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCA

TTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTG

GTTTAGTGAACCGTCAGATCCGCTAGAGATCCGCGGCCGCGAGAGCCGCCACCAT

GGACAGCCTCTTGATGAACCGGAGGGAGTTTCTTTACCAATTCAAAAATGTCCGCT

GGGCTAAGGGTCGGCGTGAGACCTACCTGTGCTACGTAGTGAAGAGGCGTGACAG

TGCTACATCCTTTTCACTGGACTTTGGTTATCTTCGCAATAAGAACGGCTGCCACGT

GGAATTGCTCTTCCTCCGCTACATCTCGGACTGGGACCTAGACCCTGGCCGCTGCT

ACCGCGTCACCTGGTTCATCTCCTGGAGCCCCTGCTACGACTGTGCCCGACATGTG

GCCGACTTTCTGCGAGGGAACCCCAACCTCAGTCTGAGGATCTTCACCGCGCGCCT

CTACTTCTGTGAGGACCGCAAGGCTGAGCCCGAGGGGCTGCGGCGGCTGCACCGC

GCCGGGGTGCAAATAGCCATCATGACCTTCAAAGATTATTTTTACTGCTGGAATAC

TTTTGTAGAAAACCACGGAAGAACTTTCAAAGCCTGGGAAGGGCTGCATGAAAAT

TCAGTTCGTCTCTCCAGACAGCTTCGGCGCATCCTTTTGCCCCTGTATGAGGTTGAT

GACTTACGAGACGCATTTCGTACTAGCGGCAGCGAGACTCCCGGGACCTCAGAGT

CCGCCACACCCGAAAGTAACACCATCAACATTGCTAAGAACGACTTCTCAGACAT

AGAGCTCGCGGCTATTCCGTTCAACACCCTGGCTGACCACTACGGCGAGAGACTCG

CTAGGGAGCAGCTGGCGTTGGAGCATGAATCCTACGAGATGGGCGAGGCTAGGTT

CCGCAAGATGTTCGAGCGACAATTGAAGGCAGGGGAGGTGGCGGACAACGCTGCC

GCCAAGCCCCTGATCACAACCTTGCTGCCCAAAATGATCGCGCGGATCAACGATTG

GTTTGAGGAGGTTAAGGCAAAACGGGGCAAACGCCCGACCGCATTTCAATTCCTC

CAAGAAATCAAGCCTGAGGCTGTTGCCTACATCACTATCAAGACGACACTGGCGT

GTCTCACAAGCGCCGACAACACCACCGTGCAAGCCGTCGCCAGCGCCATCGGGCG

GGCAATTGAGGATGAGGCACGGTTTGGTAGGATCCGAGACCTGGAAGCGAAGCAC TTCAAGAAGAACGTGGAAGAGCAGTTGAACAAACGCGTCGGCCACGTGTATAAAA

AGGCTTTCATGCAGGTGGTGGAGGCCGATATGCTCAGTAAGGGGCTGCTTGGGGG

GGAGGCGTGGTCATCCTGGCACAAGGAGGATAGCATTCACGTGGGGGTCCGATGT

ATCGAGATGCTGATAGAGAGCACCGGAATGGTCTCCCTCCATCGCCAGAACGCTG

GGGTCGTAGGGCAGGACTCCGAGACTATTGAGCTGGCCCCCGAGTATGCCGAAGC

AATCGCTACACGCGCAGGTGCACTGGCTGGGATAAGCCCTATGTTTCAGCCCTGCG

TAGTGCCTCCAAAGCCATGGACCGGCATCACAGGGGGTGGCTATTGGGCCAACGG

TAGGCGGCCTCTGGCCCTGGTACGCACGCACAGCAAGAAGGCGCTCATGCGCTAT

GAAGACGTTTACATGCCCGAGGTTTACAAGGCGATCAATATCGCGCAGAACACCG

CCTGGAAAATCAATAAGAAGGTGTTGGCGGTCGCAAACGTGATTACCAAGTGGAA

GCATTGCCCAGTCGAGGACATACCCGCCATAGAACGCGAAGAGCTGCCGATGAAG

CCGGAAGACATTGATATGAACCCCGAGGCCCTCACCGCGTGGAAAAGAGCCGCAG

CCGCCGTATACAGGAAGGATAAAGCGCGCAAGTCCCGACGCATAAGCCTCGAGTT

TATGCTGGAACAGGCCAACAAGTTCGCCAACCACAAAGCTATCTGGTTCCCCTACA

ACATGGACTGGAGAGGGAGGGTCTACGCCGTCAGCATGTTCAATCCCCAGGGCAA

CGACATGACGAAGGGCCTTCTGACATTGGCAAAGGGGAAGCCTATCGGAAAGGAG

GGGTACTACTGGCTCAAGATCCACGGCGCCAACTGCGCGGGAGTGGACAAGGTTC

CATTTCCCGAGCGAATTAAGTTCATCGAGGAAAACCACGAAAACATTATGGCGTG

CGCTAAATCCCCCCTCGAGAACACATGGTGGGCCGAGCAAGACTCCCCGTTCTGTT

TTTTGGCATTCTGCTTTGAGTACGCCGGTGTGCAGCACCATGGCCTCTCATACAACT

GTTCCCTGCCCCTGGCCTTCGACGGAAGTTGCAGTGGGATTCAACATTTCAGCGCA

ATGTTGCGGGACGAGGTCGGTGGCAGGGCCGTTAACCTGCTCCCTTCCGAAACGGT

GCAGGACATCTACGGAATCGTGGCAAAAAAGGTAAACGAGATCCTGCAAGCGGAT

GCCATCAACGGGACGGACAATGAGGTCGTTACGGTGACAGACGAAAATACTGGGG

AAATAAGCGAAAAGGTCAAGCTGGGGACCAAAGCACTCGCGGGTCAGTGGCTCGC

CTACGGGGTGACACGCTCCGTCACCAAGAGAAGCGTGATGACCCTCGCGTACGGT

TCAAAAGAATTCGGCTTCCGCCAGCAAGTGCTGGAGGACACCATCCAGCCGGCGA

TTGACTCCGGGAAGGGTCTCATGTTTACCCAGCCGAACCAGGCCGCAGGGTACAT

GGCCAAACTGATCTGGGAAAGCGTTAGCGTCACAGTGGTCGCCGCGGTTGAGGCG

ATGAATTGGCTGAAGAGCGCGGCAAAGCTCCTCGCCGCTGAGGTGAAGGACAAAA

AGACCGGCGAAATCCTGCGCAAGCGCTGCGCCGTCCACTGGGTCACGCCGGATGG

ATTCCCCGTCTGGCAGGAGTACAAGAAGCCCATCCAAACCCGGCTCAACTTGATGT

TCCTTGGCCAGTTTCGCCTGCAGCCCACGATAAACACCAACAAAGACAGCGAGAT CGACGCCCACAAGCAGGAGAGCGGCATCGCGCCCAACTTCGTGCACAGTCAGGAC

GGGTCCCATCTGCGGAAAACTGTTGTGTGGGCTCACGAGAAGTACGGCATTGAGA

GCTTCGCCCTGATACACGACAGCTTCGGGACCATACCAGCGGACGCAGCGAACCT

GTTCAAAGCCGTGCGGGAAACAATGGTCGACACCTACGAAAGCTGCGACGTACTG

GCAGACTTCTATGACCAATTCGCCGACCAGCTTCACGAGTCACAGCTCGACAAGAT

GCCCGCTCTGCCCGCGAAAGGCAACCTGAATTTGCGCGACATCCTTGAGAGCGATT

TTGCGTTCGCCTCTGGTGGTTCTCCCAAGAAGAAGAGGAAAGTCTAACCGGTCATC

ATCACCATCACCATTGAGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTT

GCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCA

CTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGG

TGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGG

AAGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGA

AAGAACCAGCTGGGGCTCGATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCAT

GGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATA

CGAGCCGGAAGCATAAAGTGTAAAGCCTAGGGTGCCTAATGAGTGAGCTAACTCA

CATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAG

CTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCT

CTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCG

GTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACG

CAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAAT

CGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGT

TTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGAT

ACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTA

GGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCC

CCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCC

GGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGA

GCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCT

ACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGA

AAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT

TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCC

TTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGA

TTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAA TGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCA

ATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGT

TGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCC

CCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCA

ATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCG

CCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTT

AATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTC

GTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGAT

CCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGA

AGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCT

TACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGT

CATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGG

GATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTC

TTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAAC

CCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGT

GAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGA

AATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTT

ATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGG

GTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCGACGGATCGGGAGATC

GATCTCCCGATCCCCTAGGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAG

TTAAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAG

CAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAAGAATCTG

CTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTG

ACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATA

GCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGA

CCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAAC

GCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCC

ACTTGGCAGTACATCAAGTGTATC

AID*A-T7 RNA Polymerase-UGI-NLS polypeptide sequence (SEQ ID NO: 34):

MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTSGSETPGTSESATPESNTINIAKNDFSDIELAAIPFNT

LADHYGERLAREQLALEHESYEMGEARFRKMFERQLKAGEVADNAAAKPLITTLLPK

MIARINDWFEEVKAKRGKRPTAFQFLQEIKPEAVAYITIKTTLACLTSADNTTVQAVAS

AIGRAIEDEARFGRIRDLEAKHFKKNVEEQLNKRVGHVYKKAFMQVVEADMLSKGLL

GGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHRQNAGVVGQDSETIELAPEYAEAI

ATRAGALAGISPMFQPCVVPPKPWTGITGGGYWANGRRPLALVRTHSKKALMRYED

VYMPEVYKAINIAQNTAWKINKKVLAVANVITKWKHCPVEDIPAIEREELPMKPEDID

MNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHKAIWFPYNMDWR

GRVYAVSMFNPQGNDMTKGLLTLAKGKPIGKEGYYWLKIHGANCAGVDKVPFPERI

KFIEENHENIMACAKSPLENTWWAEQDSPFCFLAFCFEYAGVQHHGLSYNCSLPLAFD

GSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAKKVNEILQADAINGTDNEVV

TVTDENTGEISEKVKLGTKALAGQWLAYGVTRSVTKRSVMTLAYGSKEFGFRQQVLE

DTIQPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVTVVAAVEAMNWLKSAAKLLAA

EVKDKKTGEILRKRCAVHWVTPDGFPVWQEYKKPIQTRLNLMFLGQFRLQPTINTNK

DSEIDAHKQESGIAPNFVHSQDGSHLRKTVVWAHEKYGIESFALIHDSFGTIPADAANL

FKAVRETMVDTYESCDVLADFYDQFADQLHESQLDKMPALPAKGNLNLRDILESDFA

FASGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENV

MLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV

ecTadA DNA sequence (SEQ ID NO: 35):

ATGTCCGAAGTCGAGTTTTCCCATGAGTACTGGATGAGACACGCATTGACTCTCGC

AAAGAGGGCTTGGGATGAACGCGAGGTGCCCGTGGGGGCAGTACTCGTGCATAAC

AATCGCGTAATCGGCGAAGGTTGGAATAGGCCGATCGGACGCCACGACCCCACTG

CACATGCGGAAATCATGGCCCTTCGACAGGGAGGGCTTGTGATGCAGAATTATCG

ACTTATCGATGCGACGCTGTACGTCACGCTTGAACCTTGCGTAATGTGCGCGGGAG

CTATGATTCACTCCCGCATTGGACGAGTTGTATTCGGTGCCCGCGACGCCAAGACG

GGTGCCGCAGGTTCACTGATGGACGTGCTGCATCACCCAGGCATGAACCACCGGG

TAGAAATCACAGAAGGCATATTGGCGGACGAATGTGCGGCGCTGTTGTCCGACTTT

TTTCGCATGCGGAGGCAGGAGATCAAGGCCCAGAAAAAAGCACAATCCTCTACTG

AC

ecTadA polypeptide sequence (SEQ ID NO: 36):

MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTA

HAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGA

AGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD

Rattus norvegicus APOBEC1 DNA sequence (SEQ ID NO: 37):

ATGAGCTCAGAGACTGGCCCAGTGGCTGTGGACCCCACATTGAGACGGCGGATCG

AGCCCCATGAGTTTGAGGTATTCTTCGATCCGAGAGAGCTCCGCAAGGAGACCTGC

CTGCTTTACGAAATTAATTGGGGGGGCCGGCACTCCATTTGGCGACATACATCACA

GAACACTAACAAGCACGTCGAAGTCAACTTCATCGAGAAGTTCACGACAGAAAGA

TATTTCTGTCCGAACACAAGGTGCAGCATTACCTGGTTTCTCAGCTGGAGCCCATG

CGGCGAATGTAGTAGGGCCATCACTGAATTCCTGTCAAGGTATCCCCACGTCACTC

TGTTTATTTACATCGCAAGGCTGTACCACCACGCTGACCCCCGCAATCGACAAGGC

CTGCGGGATTTGATCTCTTCAGGTGTGACTATCCAAATTATGACTGAGCAGGAGTC

AGGATACTGCTGGAGAAACTTTGTGAATTATAGCCCGAGTAATGAAGCCCACTGG

CCTAGGTATCCCCATCTGTGGGTACGACTGTACGTTCTTGAACTGTACTGCATCAT

ACTGGGCCTGCCTCCTTGTCTCAACATTCTGAGAAGGAAGCAGCCACAGCTGACAT

TCTTTACCATCGCTCTTCAGTCTTGTCATTACCAGCGACTGCCCCCACACATTCTCT

GGGCCACCGGGTTGAAA

SP6 RNA Polymerase DNA sequence (SEQ ID NO: 38):

CAAGATTTACACGCTATCCAGCTTCAATTAGAAGAAGAGATGTTTAATGGTGGCAT

TCGTCGCTTCGAAGCAGATCAACAACGCCAGATTGCAGCAGGTAGCGAGAGCGAC

ACAGCATGGAACCGCCGCCTGTTGTCAGAACTTATTGCACCTATGGCTGAAGGCAT

TCAGGCTTATAAAGAAGAGTACGAAGGTAAGAAAGGTCGTGCACCTCGCGCATTG

GCTTTCTTACAATGTGTAGAAAATGAAGTTGCAGCATACATCACTATGAAAGTTGT

TATGGATATGCTGAATACGGATGCTACCCTTCAGGCTATTGCAATGAGTGTAGCAG

AACGCATTGAAGACCAAGTGCGCTTTTCTAAGCTAGAAGGTCACGCCGCTAAATA

CTTTGAGAAGGTTAAGAAGTCACTCAAGGCTAGCCGTACTAAGTCATATCGTCACG

CTCATAACGTAGCTGTAGTTGCTGAAAAATCAGTTGCAGAAAAGGACGCGGACTT

TGACCGTTGGGAGGCGTGGCCAAAAGAAACTCAATTGCAGATTGGTACTACCTTG

CTTGAAATCTTAGAAGGTAGCGTTTTCTATAATGGTGAACCTGTATTTATGCGTGCT

ATGCGCACTTATGGCGGAAAGACTATTTACTACTTACAAACTTCTGAAAGTGTAGG

CCAGTGGATTAGCGCATTCAAAGAGCACGTAGCGCAATTAAGCCCAGCTTATGCC

CCTTGCGTAATCCCTCCTCGTCCTTGGAGAACTCCATTTAATGGAGGGTTCCATACT

GAGAAGGTAGCTAGCCGTATCCGTCTTGTAAAAGGTAACCGTGAGCATGTACGCA

AGTTGACTCAAAAGCAAATGCCAAAGGTTTATAAGGCTATCAACGCATTACAAAA

TACACAATGGCAAATCAACAAGGATGTATTAGCAGTTATTGAAGAAGTAATCCGC

TTAGACCTTGGTTATGGTGTACCTTCCTTCAAGCCACTGATTGACAAGGAGAACAA

GCCAGCTAACCCGGTACCTGTTGAATTCCAACACCTGCGCGGTCGTGAACTGAAAG

AGATGCTATCACCTGAGCAGTGGCAACAATTCATTAACTGGAAAGGCGAATGCGC

GCGCCTATATACCGCAGAAACTAAGCGCGGTTCAAAGTCCGCCGCCGTTGTTCGCA

TGGTAGGACAGGCCCGTAAATATAGCGCCTTTGAATCCATTTACTTCGTGTACGCA

ATGGATAGCCGCAGCCGTGTCTATGTGCAATCTAGCACGCTCTCTCCGCAGTCTAA

CGACTTAGGTAAGGCATTACTCCGCTTTACCGAGGGACGCCCTGTGAATGGCGTAG

AAGCGCTTAAATGGTTCTGCATCAATGGTGCTAACCTTTGGGGATGGGACAAGAA

AACTTTTGATGTGCGCGTGTCTAACGTATTAGATGAGGAATTCCAAGATATGTGTC

GAGACATCGCCGCAGACCCTCTCACATTCACCCAATGGGCTAAAGCTGATGCACCT

TATGAATTCCTCGCTTGGTGCTTTGAGTATGCTCAATACCTTGATTTGGTGGATGAA

GGAAGGGCCGACGAATTCCGCACTCACCTACCAGTACATCAGGACGGGTCTTGTTC

AGGCATTCAGCACTATAGTGCTATGCTTCGCGACGAAGTAGGGGCCAAAGCTGTT

AACCTGAAACCCTCCGATGCACCGCAGGATATCTATGGGGCGGTGGCGCAAGTGG

TTATCAAGAAGAATGCGCTATATATGGATGCGGACGATGCAACCACGTTTACTTCT

GGTAGCGTCACGCTGTCCGGTACAGAACTGCGAGCAATGGCTAGCGCATGGGATA

GTATTGGTATTACCCGTAGCTTAACCAAAAAGCCCGTGATGACCTTGCCATATGGT

TCTACTCGCTTAACTTGCCGTGAATCTGTGATTGATTACATCGTAGACTTAGAGGA

AAAAGAGGCGCAGAAGGCAGTAGCAGAAGGGCGGACGGCAAACAAGGTACATCC

TTTTGAAGACGATCGTCAAGATTACTTGACTCCGGGCGCAGCTTACAACTACATGA

CGGCACTAATCTGGCCTTCTATTTCTGAAGTAGTTAAGGCACCGATAGTAGCTATG

AAGATGATACGCCAGCTTGCACGCTTTGCAGCGAAACGTAATGAAGGCCTGATGT

ACACCCTGCCTACTGGCTTCATCTTAGAACAGAAGATCATGGCAACCGAGATGCTA

CGCGTGCGTACCTGTCTGATGGGTGATATCAAGATGTCCCTTCAGGTTGAAACGGA

TATCGTAGATGAAGCCGCTATGATGGGAGCAGCAGCACCTAATTTCGTACACGGTC

ATGACGCAAGTCACCTTATCCTTACCGTATGTGAATTGGTAGACAAGGGCGTAACT

AGTATCGCTGTAATCCACGACTCTTTTGGTACTCATGCAGACAACACCCTCACTCTT

AGAGTGGCACTTAAAGGGCAGATGGTTGCAATGTATATTGATGGTAATGCGCTTCA

GAAACTACTGGAGGAGCATGAAGAGCGCTGGATGGTTGATACAGGTATCGAAGTA

CCTGAGCAAGGGGAGTTCGACCTTAACGAAATCATGGATTCTGAATACGTATTTGC

C

SP6 RNA Polymerase polypeptide sequence (SEQ ID NO: 39):

QDLHAIQLQLEEEMFNGGIRRFEADQQRQIAAGSESDTAWNRRLLSELIAPMAEGIQAYKEEYEGKKGRAPRALAFLQCVENEVAAYITMKVVMDMLNTDATLQAIAMSVAERIEDQVRFSKLEGHAAKYFEKVKKSLKASRTKSYRHAHNVAVVAEKSVAEKDADFDRWEAWPKETQLQIGTTLLEILEGSVFYNGEPVFMRAMRTYGGKTIYYLQTSESVGQWISAFKEHVAQLSPAYAPCVIPPRPWRTPFNGGFHTEKVASRIRLVKGNREHVRKLTQKQMPKVYKAINALQNTQWQINKDVLAVIEEVIRLDLGYGVPSFKPLIDKENKPANPVPVEFQHLRGRELKEMLSPEQWQQFINWKGECARLYTAETKRGSKSAAVVRMVGQARKYSAFESIYFVYAMDSRSRVYVQSSTLSPQSNDLGKALLRFTEGRPVNGVEALKWFCINGANLWGWDKKTFDVRVSNVLDEEFQDMCRDIAADPLTFTQWAKADAPYEFLAWCFEYAQYLDLVDEGRADEFRTHLPVHQDGSCSGIQHYSAMLRDEVGAKAVNLKPSDAPQDIYGAVAQVVIKKNALYMDADDATTFTSGSVTLSGTELRAMASAWDSIGITRSLTKKPVMTLPYGSTRLTCRESVIDYIVDLEEKEAQKAVAEGRTANKVHPFEDDRQDYLTPGAAYNYMTALIWPSISEVVKAPIVAMKMIRQLARFAAKRNEGLMYTLPTGFILEQKIMATEMLRVRTCLMGDIKMSLQVETDIVDEAAMMGAAAPNFVHGHDASHLILTVCELVDKGVTSIAVIHDSFGTHADNTLTLRVALKGQMVAMYIDGNALQKLLEEHEERWMVDTGIEVPEQGEFDLNEIMDSEYVFA

SV40 nuclear localization signal (NLS) DNA sequence (SEQ ID NO: 40):

CCCAAGAAGAAGAGGAAAGTC

SV40 NLS polypeptide sequence (SEQ ID NO: 41):

PKKKRKV
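As a quick consistency check, the SV40 NLS DNA listed as SEQ ID NO: 40 can be translated in silico and compared with the peptide listed as SEQ ID NO: 41. The following is a minimal sketch that assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not appear in the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (an assumption: no alternative codon tables are considered).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    sv40_nls_dna = "CCCAAGAAGAAGAGGAAAGTC"  # SEQ ID NO: 40
    print(translate(sv40_nls_dna))  # compare against SEQ ID NO: 41
```

The same helper could be applied to the longer coding sequences in this listing after stripping any line breaks from the nucleotide text.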

T7 RNA Polymerase DNA sequence (SEQ ID NO: 42):

ATGAACACCATCAACATTGCTAAGAACGACTTCTCAGACATAGAGCTCGCGGCTAT

TCCGTTCAACACCCTGGCTGACCACTACGGCGAGAGACTCGCTAGGGAGCAGCTG

GCGTTGGAGCATGAATCCTACGAGATGGGCGAGGCTAGGTTCCGCAAGATGTTCG

AGCGACAATTGAAGGCAGGGGAGGTGGCGGACAACGCTGCCGCCAAGCCCCTGAT

CACAACCTTGCTGCCCAAAATGATCGCGCGGATCAACGATTGGTTTGAGGAGGTTA

AGGCAAAACGGGGCAAACGCCCGACCGCATTTCAATTCCTCCAAGAAATCAAGCC

TGAGGCTGTTGCCTACATCACTATCAAGACGACACTGGCGTGTCTCACAAGCGCCG

ACAACACCACCGTGCAAGCCGTCGCCAGCGCCATCGGGCGGGCAATTGAGGATGA

GGCACGGTTTGGTAGGATCCGAGACCTGGAAGCGAAGCACTTCAAGAAGAACGTG

GAAGAGCAGTTGAACAAACGCGTCGGCCACGTGTATAAAAAGGCTTTCATGCAGG

TGGTGGAGGCCGATATGCTCAGTAAGGGGCTGCTTGGGGGGGAGGCGTGGTCATC

CTGGCACAAGGAGGATAGCATTCACGTGGGGGTCCGATGTATCGAGATGCTGATA

GAGAGCACCGGAATGGTCTCCCTCCATCGCCAGAACGCTGGGGTCGTAGGGCAGG

ACTCCGAGACTATTGAGCTGGCCCCCGAGTATGCCGAAGCAATCGCTACACGCGC

AGGTGCACTGGCTGGGATAAGCCCTATGTTTCAGCCCTGCGTAGTGCCTCCAAAGC

CATGGACCGGCATCACAGGGGGTGGCTATTGGGCCAACGGTAGGCGGCCTCTGGC

CCTGGTACGCACGCACAGCAAGAAGGCGCTCATGCGCTATGAAGACGTTTACATG

CCCGAGGTTTACAAGGCGATCAATATCGCGCAGAACACCGCCTGGAAAATCAATA

AGAAGGTGTTGGCGGTCGCAAACGTGATTACCAAGTGGAAGCATTGCCCAGTCGA

GGACATACCCGCCATAGAACGCGAAGAGCTGCCGATGAAGCCGGAAGACATTGAT

ATGAACCCCGAGGCCCTCACCGCGTGGAAAAGAGCCGCAGCCGCCGTATACAGGA

AGGATAAAGCGCGCAAGTCCCGACGCATAAGCCTCGAGTTTATGCTGGAACAGGC

CAACAAGTTCGCCAACCACAAAGCTATCTGGTTCCCCTACAACATGGACTGGAGA

GGGAGGGTCTACGCCGTCAGCATGTTCAATCCCCAGGGCAACGACATGACGAAGG

GCCTTCTGACATTGGCAAAGGGGAAGCCTATCGGAAAGGAGGGGTACTACTGGCT

CAAGATCCACGGCGCCAACTGCGCGGGAGTGGACAAGGTTCCATTTCCCGAGCGA

ATTAAGTTCATCGAGGAAAACCACGAAAACATTATGGCGTGCGCTAAATCCCCCCT

CGAGAACACATGGTGGGCCGAGCAAGACTCCCCGTTCTGTTTTTTGGCATTCTGCT

TTGAGTACGCCGGTGTGCAGCACCATGGCCTCTCATACAACTGTTCCCTGCCCCTG

GCCTTCGACGGAAGTTGCAGTGGGATTCAACATTTCAGCGCAATGTTGCGGGACG

AGGTCGGTGGCAGGGCCGTTAACCTGCTCCCTTCCGAAACGGTGCAGGACATCTAC

GGAATCGTGGCAAAAAAGGTAAACGAGATCCTGCAAGCGGATGCCATCAACGGG

ACGGACAATGAGGTCGTTACGGTGACAGACGAAAATACTGGGGAAATAAGCGAA

AAGGTCAAGCTGGGGACCAAAGCACTCGCGGGTCAGTGGCTCGCCTACGGGGTGA

CACGCTCCGTCACCAAGAGAAGCGTGATGACCCTCGCGTACGGTTCAAAAGAATT

CGGCTTCCGCCAGCAAGTGCTGGAGGACACCATCCAGCCGGCGATTGACTCCGGG

AAGGGTCTCATGTTTACCCAGCCGAACCAGGCCGCAGGGTACATGGCCAAACTGA

TCTGGGAAAGCGTTAGCGTCACAGTGGTCGCCGCGGTTGAGGCGATGAATTGGCT

GAAGAGCGCGGCAAAGCTCCTCGCCGCTGAGGTGAAGGACAAAAAGACCGGCGA

AATCCTGCGCAAGCGCTGCGCCGTCCACTGGGTCACGCCGGATGGATTCCCCGTCT

GGCAGGAGTACAAGAAGCCCATCCAAACCCGGCTCAACTTGATGTTCCTTGGCCA

GTTTCGCCTGCAGCCCACGATAAACACCAACAAAGACAGCGAGATCGACGCCCAC

AAGCAGGAGAGCGGCATCGCGCCCAACTTCGTGCACAGTCAGGACGGGTCCCATC

TGCGGAAAACTGTTGTGTGGGCTCACGAGAAGTACGGCATTGAGAGCTTCGCCCT

GATACACGACAGCTTCGGGACCATACCAGCGGACGCAGCGAACCTGTTCAAAGCC

GTGCGGGAAACAATGGTCGACACCTACGAAAGCTGCGACGTACTGGCAGACTTCT

ATGACCAATTCGCCGACCAGCTTCACGAGTCACAGCTCGACAAGATGCCCGCTCTG

CCCGCGAAAGGCAACCTGAATTTGCGCGACATCCTTGAGAGCGATTTTGCGTTCGC

C

T7 RNA Polymerase polypeptide sequence (SEQ ID NO: 43):

MNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRKMFERQ

LKAGEVADNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKPEAVAYI

TIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQLNKRVG

HVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGMVSLHR

QNAGVVGQDSETIELAPEYAEAIATRAGALAGISPMFQPCVVPPKPWTGITGGGYWAN

GRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWKINKKVLAVANVITKWK

HCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRISLEFMLE

QANKFANHKAIWFPYNMDWRGRVYAVSMFNPQGNDMTKGLLTLAKGKPIGKEGYY

WLKIHGANCAGVDKVPFPERIKFIEENHENIMACAKSPLENTWWAEQDSPFCFLAFCF

EYAGVQHHGLSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIV

AKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKLGTKALAGQWLAYGVTRSVTK

RSVMTLAYGSKEFGFRQQVLEDTIQPAIDSGKGLMFTQPNQAAGYMAKLIWESVSVT

VVAAVEAMNWLKSAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVWQEYKKPI

QTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQDGSHLRKTVVWAHEK

YGIESFALIHDSFGTIPADAANLFKAVRETMVDTYESCDVLADFYDQFADQLHESQLD

KMPALPAKGNLNLRDILESDFAFA

Uracil DNA Glycosylase Inhibitor (UGI) DNA sequence (SEQ ID NO: 44):

ACTAATCTGTCAGATATTATTGAAAAGGAGACCGGTAAGCAACTGGTTATCCAGGAATCCATCCTCATGCTCCCAGAGGAGGTGGAAGAAGTCATTGGGAACAAGCCGGAAAGCGATATACTCGTGCACACCGCCTACGACGAGAGCACCGACGAGAATGTCATGCTTCTGACTAGCGACGCCCCTGAATACAAGCCTTGGGCTCTGGTCATACAGGATAGCAACGGTGAGAACAAGATTAAGATGCTC

UGI polypeptide sequence (SEQ ID NO: 45):

TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD

APEYKPWALVIQDSNGENKIKML

Rattus norvegicus APOBEC1-T7 Polymerase-NLS plasmid DNA sequence (SEQ ID NO: 46):

ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCAT

TATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA

GTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATA

GCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTT

TGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCA

TTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTG

GTTTAGTGAACCGTCAGATCCGCTAGAGATCCGCGGCCGCGAGAGCCGCCACCAT

GAGCTCAGAGACTGGCCCAGTGGCTGTGGACCCCACATTGAGACGGCGGATCGAG

CCCCATGAGTTTGAGGTATTCTTCGATCCGAGAGAGCTCCGCAAGGAGACCTGCCT

GCTTTACGAAATTAATTGGGGGGGCCGGCACTCCATTTGGCGACATACATCACAGA

ACACTAACAAGCACGTCGAAGTCAACTTCATCGAGAAGTTCACGACAGAAAGATA

TTTCTGTCCGAACACAAGGTGCAGCATTACCTGGTTTCTCAGCTGGAGCCCATGCG

GCGAATGTAGTAGGGCCATCACTGAATTCCTGTCAAGGTATCCCCACGTCACTCTG

TTTATTTACATCGCAAGGCTGTACCACCACGCTGACCCCCGCAATCGACAAGGCCT

GCGGGATTTGATCTCTTCAGGTGTGACTATCCAAATTATGACTGAGCAGGAGTCAG

GATACTGCTGGAGAAACTTTGTGAATTATAGCCCGAGTAATGAAGCCCACTGGCCT

AGGTATCCCCATCTGTGGGTACGACTGTACGTTCTTGAACTGTACTGCATCATACT

GGGCCTGCCTCCTTGTCTCAACATTCTGAGAAGGAAGCAGCCACAGCTGACATTCT

TTACCATCGCTCTTCAGTCTTGTCATTACCAGCGACTGCCCCCACACATTCTCTGGG

CCACCGGGTTGAAAAGCGGCAGCGAGACTCCCGGGACCTCAGAGTCCGCCACACC

CGAAAGTAACACCATCAACATTGCTAAGAACGACTTCTCAGACATAGAGCTCGCG

GCTATTCCGTTCAACACCCTGGCTGACCACTACGGCGAGAGACTCGCTAGGGAGC

AGCTGGCGTTGGAGCATGAATCCTACGAGATGGGCGAGGCTAGGTTCCGCAAGAT

GTTCGAGCGACAATTGAAGGCAGGGGAGGTGGCGGACAACGCTGCCGCCAAGCCC

CTGATCACAACCTTGCTGCCCAAAATGATCGCGCGGATCAACGATTGGTTTGAGGA

GGTTAAGGCAAAACGGGGCAAACGCCCGACCGCATTTCAATTCCTCCAAGAAATC

AAGCCTGAGGCTGTTGCCTACATCACTATCAAGACGACACTGGCGTGTCTCACAAG

CGCCGACAACACCACCGTGCAAGCCGTCGCCAGCGCCATCGGGCGGGCAATTGAG

GATGAGGCACGGTTTGGTAGGATCCGAGACCTGGAAGCGAAGCACTTCAAGAAGA

ACGTGGAAGAGCAGTTGAACAAACGCGTCGGCCACGTGTATAAAAAGGCTTTCAT

GCAGGTGGTGGAGGCCGATATGCTCAGTAAGGGGCTGCTTGGGGGGGAGGCGTGG

TCATCCTGGCACAAGGAGGATAGCATTCACGTGGGGGTCCGATGTATCGAGATGC

TGATAGAGAGCACCGGAATGGTCTCCCTCCATCGCCAGAACGCTGGGGTCGTAGG

GCAGGACTCCGAGACTATTGAGCTGGCCCCCGAGTATGCCGAAGCAATCGCTACA

CGCGCAGGTGCACTGGCTGGGATAAGCCCTATGTTTCAGCCCTGCGTAGTGCCTCC

AAAGCCATGGACCGGCATCACAGGGGGTGGCTATTGGGCCAACGGTAGGCGGCCT

CTGGCCCTGGTACGCACGCACAGCAAGAAGGCGCTCATGCGCTATGAAGACGTTT

ACATGCCCGAGGTTTACAAGGCGATCAATATCGCGCAGAACACCGCCTGGAAAAT

CAATAAGAAGGTGTTGGCGGTCGCAAACGTGATTACCAAGTGGAAGCATTGCCCA

GTCGAGGACATACCCGCCATAGAACGCGAAGAGCTGCCGATGAAGCCGGAAGAC

ATTGATATGAACCCCGAGGCCCTCACCGCGTGGAAAAGAGCCGCAGCCGCCGTAT

ACAGGAAGGATAAAGCGCGCAAGTCCCGACGCATAAGCCTCGAGTTTATGCTGGA

ACAGGCCAACAAGTTCGCCAACCACAAAGCTATCTGGTTCCCCTACAACATGGACT

GGAGAGGGAGGGTCTACGCCGTCAGCATGTTCAATCCCCAGGGCAACGACATGAC

GAAGGGCCTTCTGACATTGGCAAAGGGGAAGCCTATCGGAAAGGAGGGGTACTAC

TGGCTCAAGATCCACGGCGCCAACTGCGCGGGAGTGGACAAGGTTCCATTTCCCG

AGCGAATTAAGTTCATCGAGGAAAACCACGAAAACATTATGGCGTGCGCTAAATC

CCCCCTCGAGAACACATGGTGGGCCGAGCAAGACTCCCCGTTCTGTTTTTTGGCAT

TCTGCTTTGAGTACGCCGGTGTGCAGCACCATGGCCTCTCATACAACTGTTCCCTG

CCCCTGGCCTTCGACGGAAGTTGCAGTGGGATTCAACATTTCAGCGCAATGTTGCG

GGACGAGGTCGGTGGCAGGGCCGTTAACCTGCTCCCTTCCGAAACGGTGCAGGAC

ATCTACGGAATCGTGGCAAAAAAGGTAAACGAGATCCTGCAAGCGGATGCCATCA

ACGGGACGGACAATGAGGTCGTTACGGTGACAGACGAAAATACTGGGGAAATAA

GCGAAAAGGTCAAGCTGGGGACCAAAGCACTCGCGGGTCAGTGGCTCGCCTACGG

GGTGACACGCTCCGTCACCAAGAGAAGCGTGATGACCCTCGCGTACGGTTCAAAA

GAATTCGGCTTCCGCCAGCAAGTGCTGGAGGACACCATCCAGCCGGCGATTGACT

CCGGGAAGGGTCTCATGTTTACCCAGCCGAACCAGGCCGCAGGGTACATGGCCAA

ACTGATCTGGGAAAGCGTTAGCGTCACAGTGGTCGCCGCGGTTGAGGCGATGAAT

TGGCTGAAGAGCGCGGCAAAGCTCCTCGCCGCTGAGGTGAAGGACAAAAAGACCG

GCGAAATCCTGCGCAAGCGCTGCGCCGTCCACTGGGTCACGCCGGATGGATTCCCC

GTCTGGCAGGAGTACAAGAAGCCCATCCAAACCCGGCTCAACTTGATGTTCCTTGG

CCAGTTTCGCCTGCAGCCCACGATAAACACCAACAAAGACAGCGAGATCGACGCC

CACAAGCAGGAGAGCGGCATCGCGCCCAACTTCGTGCACAGTCAGGACGGGTCCC

ATCTGCGGAAAACTGTTGTGTGGGCTCACGAGAAGTACGGCATTGAGAGCTTCGC

CCTGATACACGACAGCTTCGGGACCATACCAGCGGACGCAGCGAACCTGTTCAAA

GCCGTGCGGGAAACAATGGTCGACACCTACGAAAGCTGCGACGTACTGGCAGACT

TCTATGACCAATTCGCCGACCAGCTTCACGAGTCACAGCTCGACAAGATGCCCGCT

CTGCCCGCGAAAGGCAACCTGAATTTGCGCGACATCCTTGAGAGCGATTTTGCGTT

CGCCTCTGGTGGTTCTCCCAAGAAGAAGAGGAAAGTCTAACCGGTCATCATCACC

ATCACCATTGAGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAG

CCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCC

ACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCA

TTCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGAC

AATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAA

CCAGCTGGGGCTCGATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCA

TAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGC

CGGAAGCATAAAGTGTAAAGCCTAGGGTGCCTAATGAGTGAGCTAACTCACATTA

ATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCA

TTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCG

CTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCA

GCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAA

AGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGT

TGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC

TCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCC

CTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGT

CCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATC

TCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTT

CAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAG

ACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGG

TATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAG

AAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAG

TTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTT

GCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTT

TTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCA

TGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTT

AAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAAT

CAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACT

CCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTG

CAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCA

GCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATC

CAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTT

GCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTA

TGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATG

TTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTT

GGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCAT

GCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAG

AATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACC

GCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCG

AAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTG

CACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAA

ACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGA

ATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTC

ATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGC

GCACATTTCCCCGAAAAGTGCCACCTGACGTCGACGGATCGGGAGATCGATCTCCC

GATCCCCTAGGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCA

GTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTT

AAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGT

TAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGAT

TATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAA

CGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAG

GGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCA

GTACATCAAGTGTATC

Rattus norvegicus APOBEC1-T7 RNA Polymerase-NLS polypeptide sequence (SEQ ID NO: 47):

MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTN

KHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLY

HHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRL

YVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTS

ESATPESNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRK

MFERQLKAGEVADNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKP

EAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQL

NKRVGHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGM

VSLHRQNAGVVGQDSETIELAPEYAEAIATRAGALAGISPMFQPCVVPPKPWTGITGG

GYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWKINKKVLAVAN

VITKWKHCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRIS

LEFMLEQANKFANHKAIWFPYNMDWRGRVYAVSMFNPQGNDMTKGLLTLAKGKPI

GKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAKSPLENTWWAEQDSPF

CFLAFCFEYAGVQHHGLSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETV

QDIYGIVAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKLGTKALAGQWLAYG

VTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPAIDSGKGLMFTQPNQAAGYMAKLI

WESVSVTVVAAVEAMNWLKSAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVW

QEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQDGSHLRKTV

VWAHEKYGIESFALIHDSFGTIPADAANLFKAVRETMVDTYESCDVLADFYDQFADQL

HESQLDKMPALPAKGNLNLRDILESDFAFASGGSPKKKRKV

Rattus norvegicus APOBEC1-T7 RNA Polymerase-UGI-NLS plasmid DNA sequence (SEQ ID NO: 48):

ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCAT

TATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA

GTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATA

GCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTT

TGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCA

TTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTG

GTTTAGTGAACCGTCAGATCCGCTAGAGATCCGCGGCCGCGAGAGCCGCCACCAT

GAGCTCAGAGACTGGCCCAGTGGCTGTGGACCCCACATTGAGACGGCGGATCGAG

CCCCATGAGTTTGAGGTATTCTTCGATCCGAGAGAGCTCCGCAAGGAGACCTGCCT

GCTTTACGAAATTAATTGGGGGGGCCGGCACTCCATTTGGCGACATACATCACAGA

ACACTAACAAGCACGTCGAAGTCAACTTCATCGAGAAGTTCACGACAGAAAGATA

TTTCTGTCCGAACACAAGGTGCAGCATTACCTGGTTTCTCAGCTGGAGCCCATGCG

GCGAATGTAGTAGGGCCATCACTGAATTCCTGTCAAGGTATCCCCACGTCACTCTG

TTTATTTACATCGCAAGGCTGTACCACCACGCTGACCCCCGCAATCGACAAGGCCT

GCGGGATTTGATCTCTTCAGGTGTGACTATCCAAATTATGACTGAGCAGGAGTCAG

GATACTGCTGGAGAAACTTTGTGAATTATAGCCCGAGTAATGAAGCCCACTGGCCT

AGGTATCCCCATCTGTGGGTACGACTGTACGTTCTTGAACTGTACTGCATCATACT

GGGCCTGCCTCCTTGTCTCAACATTCTGAGAAGGAAGCAGCCACAGCTGACATTCT

TTACCATCGCTCTTCAGTCTTGTCATTACCAGCGACTGCCCCCACACATTCTCTGGG

CCACCGGGTTGAAAAGCGGCAGCGAGACTCCCGGGACCTCAGAGTCCGCCACACC

CGAAAGTAACACCATCAACATTGCTAAGAACGACTTCTCAGACATAGAGCTCGCG

GCTATTCCGTTCAACACCCTGGCTGACCACTACGGCGAGAGACTCGCTAGGGAGC

AGCTGGCGTTGGAGCATGAATCCTACGAGATGGGCGAGGCTAGGTTCCGCAAGAT

GTTCGAGCGACAATTGAAGGCAGGGGAGGTGGCGGACAACGCTGCCGCCAAGCCC

CTGATCACAACCTTGCTGCCCAAAATGATCGCGCGGATCAACGATTGGTTTGAGGA

GGTTAAGGCAAAACGGGGCAAACGCCCGACCGCATTTCAATTCCTCCAAGAAATC

AAGCCTGAGGCTGTTGCCTACATCACTATCAAGACGACACTGGCGTGTCTCACAAG

CGCCGACAACACCACCGTGCAAGCCGTCGCCAGCGCCATCGGGCGGGCAATTGAG

GATGAGGCACGGTTTGGTAGGATCCGAGACCTGGAAGCGAAGCACTTCAAGAAGA

ACGTGGAAGAGCAGTTGAACAAACGCGTCGGCCACGTGTATAAAAAGGCTTTCAT

GCAGGTGGTGGAGGCCGATATGCTCAGTAAGGGGCTGCTTGGGGGGGAGGCGTGG

TCATCCTGGCACAAGGAGGATAGCATTCACGTGGGGGTCCGATGTATCGAGATGC

TGATAGAGAGCACCGGAATGGTCTCCCTCCATCGCCAGAACGCTGGGGTCGTAGG

GCAGGACTCCGAGACTATTGAGCTGGCCCCCGAGTATGCCGAAGCAATCGCTACA

CGCGCAGGTGCACTGGCTGGGATAAGCCCTATGTTTCAGCCCTGCGTAGTGCCTCC

AAAGCCATGGACCGGCATCACAGGGGGTGGCTATTGGGCCAACGGTAGGCGGCCT

CTGGCCCTGGTACGCACGCACAGCAAGAAGGCGCTCATGCGCTATGAAGACGTTT

ACATGCCCGAGGTTTACAAGGCGATCAATATCGCGCAGAACACCGCCTGGAAAAT

CAATAAGAAGGTGTTGGCGGTCGCAAACGTGATTACCAAGTGGAAGCATTGCCCA

GTCGAGGACATACCCGCCATAGAACGCGAAGAGCTGCCGATGAAGCCGGAAGAC

ATTGATATGAACCCCGAGGCCCTCACCGCGTGGAAAAGAGCCGCAGCCGCCGTAT

ACAGGAAGGATAAAGCGCGCAAGTCCCGACGCATAAGCCTCGAGTTTATGCTGGA

ACAGGCCAACAAGTTCGCCAACCACAAAGCTATCTGGTTCCCCTACAACATGGACT

GGAGAGGGAGGGTCTACGCCGTCAGCATGTTCAATCCCCAGGGCAACGACATGAC

GAAGGGCCTTCTGACATTGGCAAAGGGGAAGCCTATCGGAAAGGAGGGGTACTAC

TGGCTCAAGATCCACGGCGCCAACTGCGCGGGAGTGGACAAGGTTCCATTTCCCG

AGCGAATTAAGTTCATCGAGGAAAACCACGAAAACATTATGGCGTGCGCTAAATC

CCCCCTCGAGAACACATGGTGGGCCGAGCAAGACTCCCCGTTCTGTTTTTTGGCAT

TCTGCTTTGAGTACGCCGGTGTGCAGCACCATGGCCTCTCATACAACTGTTCCCTG

CCCCTGGCCTTCGACGGAAGTTGCAGTGGGATTCAACATTTCAGCGCAATGTTGCG

GGACGAGGTCGGTGGCAGGGCCGTTAACCTGCTCCCTTCCGAAACGGTGCAGGAC

ATCTACGGAATCGTGGCAAAAAAGGTAAACGAGATCCTGCAAGCGGATGCCATCA

ACGGGACGGACAATGAGGTCGTTACGGTGACAGACGAAAATACTGGGGAAATAA

GCGAAAAGGTCAAGCTGGGGACCAAAGCACTCGCGGGTCAGTGGCTCGCCTACGG

GGTGACACGCTCCGTCACCAAGAGAAGCGTGATGACCCTCGCGTACGGTTCAAAA

GAATTCGGCTTCCGCCAGCAAGTGCTGGAGGACACCATCCAGCCGGCGATTGACT

CCGGGAAGGGTCTCATGTTTACCCAGCCGAACCAGGCCGCAGGGTACATGGCCAA

ACTGATCTGGGAAAGCGTTAGCGTCACAGTGGTCGCCGCGGTTGAGGCGATGAAT

TGGCTGAAGAGCGCGGCAAAGCTCCTCGCCGCTGAGGTGAAGGACAAAAAGACCG

GCGAAATCCTGCGCAAGCGCTGCGCCGTCCACTGGGTCACGCCGGATGGATTCCCC

GTCTGGCAGGAGTACAAGAAGCCCATCCAAACCCGGCTCAACTTGATGTTCCTTGG

CCAGTTTCGCCTGCAGCCCACGATAAACACCAACAAAGACAGCGAGATCGACGCC

CACAAGCAGGAGAGCGGCATCGCGCCCAACTTCGTGCACAGTCAGGACGGGTCCC

ATCTGCGGAAAACTGTTGTGTGGGCTCACGAGAAGTACGGCATTGAGAGCTTCGC

CCTGATACACGACAGCTTCGGGACCATACCAGCGGACGCAGCGAACCTGTTCAAA

GCCGTGCGGGAAACAATGGTCGACACCTACGAAAGCTGCGACGTACTGGCAGACT

TCTATGACCAATTCGCCGACCAGCTTCACGAGTCACAGCTCGACAAGATGCCCGCT

CTGCCCGCGAAAGGCAACCTGAATTTGCGCGACATCCTTGAGAGCGATTTTGCGTT

CGCCTCTGGTGGTTCTACTAATCTGTCAGATATTATTGAAAAGGAGACCGGTAAGC

AACTGGTTATCCAGGAATCCATCCTCATGCTCCCAGAGGAGGTGGAAGAAGTCATT

GGGAACAAGCCGGAAAGCGATATACTCGTGCACACCGCCTACGACGAGAGCACCG

ACGAGAATGTCATGCTTCTGACTAGCGACGCCCCTGAATACAAGCCTTGGGCTCTG

GTCATACAGGATAGCAACGGTGAGAACAAGATTAAGATGCTCTCTGGTGGTTCTCC

CAAGAAGAAGAGGAAAGTCTAACCGGTCATCATCACCATCACCATTGAGTTTAAA

CCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCT

CCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAA

ATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCTATTCTGGGGGGTGGG

GTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGG

GATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAACCAGCTGGGGCTCGATAC

CGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGA

AATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTA

AAGCCTAGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTG

CCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACG

CGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGACT

CGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTA

ATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAA

GGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATA

GGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCG

AAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGC

GCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGG

GAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTC

GTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGC

CTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCAC

TGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTAC

AGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTA

TCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCC

GGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTAC

GCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACG

CTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAG

GATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTA

TATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATC

TCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATA

ACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAG

ACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGC

CGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTT

GCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCC

ATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCC

GGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGG

TTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCA

CTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATG

CTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGC

GACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAG

AACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGA

TCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTT

CAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAAT

GCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCC

TTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATAT

TTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAA

AGTGCCACCTGACGTCGACGGATCGGGAGATCGATCTCCCGATCCCCTAGGGTCG

ACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCT

TGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGC

AAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTG

CTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTA

ATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTA

CATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTG

ACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACG

TCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATC

Rattus norvegicus APOBEC1-T7 RNA Polymerase-UGI-NLS polypeptide sequence (SEQ ID NO: 49):

MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTN

KHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLY

HHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRL

YVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTS

ESATPESNTINIAKNDFSDIELAAIPFNTLADHYGERLAREQLALEHESYEMGEARFRK

MFERQLKAGEVADNAAAKPLITTLLPKMIARINDWFEEVKAKRGKRPTAFQFLQEIKP

EAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAKHFKKNVEEQL

NKRVGHVYKKAFMQVVEADMLSKGLLGGEAWSSWHKEDSIHVGVRCIEMLIESTGM

VSLHRQNAGVVGQDSETIELAPEYAEAIATRAGALAGISPMFQPCVVPPKPWTGITGG

GYWANGRRPLALVRTHSKKALMRYEDVYMPEVYKAINIAQNTAWKINKKVLAVAN

VITKWKHCPVEDIPAIEREELPMKPEDIDMNPEALTAWKRAAAAVYRKDKARKSRRIS

LEFMLEQANKFANHKAIWFPYNMDWRGRVYAVSMFNPQGNDMTKGLLTLAKGKPI

GKEGYYWLKIHGANCAGVDKVPFPERIKFIEENHENIMACAKSPLENTWWAEQDSPF

CFLAFCFEYAGVQHHGLSYNCSLPLAFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETV

QDIYGIVAKKVNEILQADAINGTDNEVVTVTDENTGEISEKVKLGTKALAGQWLAYG

VTRSVTKRSVMTLAYGSKEFGFRQQVLEDTIQPAIDSGKGLMFTQPNQAAGYMAKLI

WESVSVTVVAAVEAMNWLKSAAKLLAAEVKDKKTGEILRKRCAVHWVTPDGFPVW

QEYKKPIQTRLNLMFLGQFRLQPTINTNKDSEIDAHKQESGIAPNFVHSQDGSHLRKTV

VWAHEKYGIESFALIHDSFGTIPADAANLFKAVRETMVDTYESCDVLADFYDQFADQL

HESQLDKMPALPAKGNLNLRDILESDFAFASGGSTNLSDIIEKETGKQLVIQESILMLPE

EVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLS

GGSPKKKRKV

Uracil glycosylase inhibitor

In certain aspects, the compositions of the instant disclosure include a uracil glycosylase inhibitor. Uracil glycosylase inhibitor has been shown to facilitate C:G to T:A mutations. Uracil glycosylase inhibitor, or uracil-DNA glycosylase inhibitor (UGI), is a small protein from Bacillus subtilis bacteriophage PBS1 which inhibits E. coli and other species’ uracil DNA glycosylase (UDG). UGI can dissociate UDG:DNA complexes. This protein binds specifically and reversibly to the host uracil-DNA glycosylase, preventing removal of uracil residues from PBS2 DNA by the host uracil-excision repair system. An exemplary UGI sequence is:

Bacillus subtilis Uracil glycosylase inhibitor (SEQ ID NO: 21)

MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTS

DAPEYKPWALVIQDSNGENKIKML

Nuclear Localization Signals (NLS)

In some aspects, the compositions of the present disclosure include a pEditor containing the T7 RNAP-cytidine deaminase fusion gene with a nuclear localization signal. A nuclear localization signal or sequence (NLS) is an amino acid sequence that 'tags' a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus. (Kalderon et al. Cell. 39: 499-509).

Classical NLSs can be classified as either monopartite or bipartite. The major structural difference between the two is that the two basic amino acid clusters in bipartite NLSs are separated by a relatively short spacer sequence (hence bipartite: two parts), whereas a monopartite NLS consists of a single basic cluster. The first NLS to be discovered was the sequence PKKKRKV (SEQ ID NO: 22) in the SV40 Large T-antigen (a monopartite NLS; Kalderon et al. Cell. 39: 499-509). The NLS of nucleoplasmin, KR[PAATKKAGQA]KKKK (SEQ ID NO: 23), is the prototype of the ubiquitous bipartite signal: two clusters of basic amino acids, separated by a spacer of about 10 amino acids (Dingwall et al. J. Cell Biol. 107: 841-9). Both signals are recognized by importin α. Importin α contains a bipartite NLS itself, which is specifically recognized by importin β. The latter can be considered the actual import mediator.
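The bipartite architecture described above (two basic clusters separated by a spacer of about 10 residues) can be illustrated with a simple pattern search. This is only a rough sketch: the 9 to 12 residue spacer window and the requirement of at least three basic residues in the downstream cluster are assumptions chosen for illustration, and the function name `find_bipartite_nls` is hypothetical. A pattern match does not demonstrate that a sequence functions as an NLS.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: two basic residues, a 9-12 residue spacer ("about 10"),
# then a run of at least three basic residues. These thresholds are
# illustrative choices, not values taken from the disclosure.
BIPARTITE_PATTERN = re.compile(r"[KR]{2}.{9,12}[KR]{3}")


def find_bipartite_nls(protein: str) -> Optional[Tuple[int, str]]:
    """Return (0-based start, matched text) of the first candidate, or None."""
    match = BIPARTITE_PATTERN.search(protein.upper())
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    nucleoplasmin_nls = "AVKRPAATKKAGQAKKKKLD"  # SEQ ID NO: 25
    sv40_nls = "PKKKRKV"                        # SEQ ID NO: 22 (monopartite)
    print(find_bipartite_nls(nucleoplasmin_nls))
    print(find_bipartite_nls(sv40_nls))
```

Under these assumptions the nucleoplasmin signal is reported as a candidate, while the monopartite SV40 signal is too short to match.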

Chelsky et al. proposed the consensus sequence K-K/R-X-K/R (SEQ ID NO: 24) for monopartite NLSs (Dingwall et al.). A Chelsky sequence may, therefore, be part of the downstream basic cluster of a bipartite NLS. Makkerh et al. carried out comparative mutagenesis on the nuclear localization signals of SV40 T-Antigen (monopartite), c-Myc (monopartite), and nucleoplasmin (bipartite), and showed amino acid features common to all three. The role of neutral and acidic amino acids in contributing to the efficiency of the NLS was shown for the first time (Makkerh et al. Curr. Biol. 6: 1025-7).

Rotello et al. compared the nuclear localization efficiencies of eGFP-fused NLSs of SV40 Large T-Antigen, nucleoplasmin (AVKRPAATKKAGQAKKKKLD; SEQ ID NO: 25), EGL-13 (MSRRRKANPTKLSENAKKLAKEVEN; SEQ ID NO: 26), c-Myc (PAAKRVKLD; SEQ ID NO: 27) and TUS-protein (KLKIKRPVK; SEQ ID NO: 28) through rapid intracellular protein delivery. They found significantly higher nuclear localization efficiency of c-Myc NLS compared to that of SV40 NLS (Ray et al. Bioconjug. Chem. 26: 1004-7).
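The K-K/R-X-K/R consensus lends itself to a short scanning routine. The sketch below treats X as any residue and reports overlapping hits; the function name `find_chelsky_motifs` is hypothetical, and a hit indicates only a candidate motif, not a validated localization signal.

```python
import re
from typing import List, Tuple

# K, then K or R, then any residue (X), then K or R.
# The lookahead lets overlapping occurrences be reported separately.
CHELSKY_PATTERN = re.compile(r"(?=(K[KR].[KR]))")


def find_chelsky_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based position, 4-residue motif) for every consensus hit."""
    return [(m.start(), m.group(1)) for m in CHELSKY_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    for name, seq in [("SV40", "PKKKRKV"), ("c-Myc", "PAAKRVKLD")]:
        print(name, find_chelsky_motifs(seq))
```

Scanning the SV40 signal PKKKRKV returns two overlapping hits (KKKR and KKRK), and the c-Myc signal PAAKRVKLD returns one (KRVK).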

Mammalian Expression Vector Promoters

An expression vector, otherwise known as an expression construct, is commonly a plasmid or virus designed for gene expression in cells. The vector is used to introduce a specific gene into a target cell, and can commandeer the cell's mechanism for protein synthesis to produce the protein encoded by the gene. Expression vectors are the basic tools in biotechnology for the production of proteins. The vector is engineered to contain regulatory sequences that act as enhancer and promoter regions and lead to efficient transcription of the gene carried on the expression vector. The promoters for cytomegalovirus (CMV) and SV40 are commonly used in mammalian expression vectors to drive gene expression. Non-viral promoters, such as the elongation factor (EF)-1 promoter, are also known.

The CMV promoter is commonly included in vectors used in genetic engineering work conducted in mammalian cells, as it is a strong promoter that drives constitutive expression of genes under its control. This promoter has been used to express a plethora of eukaryotic gene products and is used for specialty protein production, gene therapy, and DNA-based vaccination, among other applications.

The CMV promoter has the following sequence (SEQ ID NO: 29):

TAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTAC

ATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGA

CGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGT

CAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA

TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATT

ATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA

GTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATA

GCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTT

TGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCA

TTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTG

GTTTAGTGAACCGTCAG

The SV40 promoter (Simian Virus 40 promoter) contains the SV40 enhancer-promoter region and origin of replication (part no. GA-ori-00009.1) for high-level expression and replication in cell lines expressing the large T antigen (e.g. COS-7 and 293T cells). It does not replicate episomally in the absence of the SV40 large T antigen. The SV40 promoter is weak in B cells, but exhibits high activity in the T24 and HCV29 human bladder urothelium carcinoma cell lines.

Human elongation factor-1 alpha (EF-1 alpha) or EF-1 is a constitutive non-viral promoter of human origin that can be used to drive ectopic gene expression in various in vitro and in vivo contexts. EF-1 alpha is often useful in conditions where other promoters (such as CMV) have diminished activity or have been silenced (as in embryonic stem cells).

Directed Evolution

Directed evolution (DE) is a method used in protein engineering that mimics the process of natural selection to steer proteins or nucleic acids toward a user-defined goal. In general, DE involves subjecting a gene to iterative rounds of mutagenesis, selection (expressing those variants and isolating members with the desired function), and amplification (generating a template for the next round). Advantageously, it can be performed both in vivo and in vitro. Directed evolution is used both for protein engineering, as an alternative to rationally designing modified proteins, and for studies of fundamental evolutionary principles in a controlled laboratory environment.

Mammalian cells have been employed in DE to engineer recombinant proteins, particularly those that require posttranslational modifications, such as antibodies, hormones and cytokines. Bacteria and yeast are less suitable to evolve these types of proteins because they have insufficient disulfide-bridge formation mechanisms, lack glycosylation, and frequently form protein aggregates. The ability to evolve mammalian proteins within mammalian cells is a relatively recent development, with the methods of the instant disclosure constituting an advance in mammalian mutagenesis approaches available for performing DE. Enhanced performance of DE in mammalian cells is expected to decrease the development time required for generating robust, high-producing mammalian cell lines for commercial applications involving engineering of novel enzymes, proteins (e.g., pharmaceutical applications), and immune support therapies (e.g., bacteriophage with antibody genes). As compared to bacteria and yeast, mammalian cells exhibit low productivity due to their slow growth rates and tendency to undergo programmed cell death (apoptosis). DE in mammalian cells has previously relied upon non-physiological environments, with such DE methods rapidly saturating mutagenized sites, or such DE approaches have only been adapted optimally in bacterial and yeast systems. Use of DE in mammalian cells prior to the instant disclosure has also been hampered because mammalian cells are time-consuming to work with, exhibit a low efficiency of stable gene integration, have a tendency toward multiple gene insertions, and display highly variable expression levels. Certain aspects of the instant disclosure relate to compositions and methods that involve pseudo-random integrated mutation of eukaryotic cells (PRIME), which enables DE in mammalian cells while overcoming some of the above-stated challenges to DE previously described in the art (Pourmir et al. Comput Struct Biotechnol J. 2: e201209012).

Mammalian Target Genes

The methods and compositions of the instant disclosure can be applied to achieve targeted mutagenesis of mammalian cells across long stretches of sequence, optionally in and around effectively any region of the genome, including targeted genes and/or other genetic elements. In certain embodiments, the methods and compositions of the instant disclosure can be applied to oncogenes and/or cancer-related genes. Exemplary oncogenes and/or cancer-related genes include, but are not limited to, those recited in Table 1.

Table 1. Exemplary Oncogenes and Cancer-Related Genes

Mammalian Cell Culture

In certain aspects, the instant disclosure describes methods and compositions designed to achieve targeted mutagenesis of mammalian cells across long stretches of sequence. Mammalian cell culture is used widely in academic, medical and industrial settings. It has provided a means to study the physiology and biochemistry of the cell, and developments in the fields of cell and molecular biology have required the use of reproducible model systems, which cultured cell lines are especially capable of providing. For medical use, cell culture provides test systems to assess the efficacy and toxicology of potential new drugs. Large-scale mammalian cell culture has allowed production of biologically active proteins, initially production of vaccines and then recombinant proteins and monoclonal antibodies; meanwhile, recent innovative uses of cell culture include tissue engineering, as a means of generating tissue substitutes.

Mammalian cells can be isolated from tissues for ex vivo culture in several ways. Cells can be easily purified from blood. However, only the white cells are capable of growth in culture. Cells can be isolated from solid tissues by digesting the extracellular matrix using enzymes such as collagenase, trypsin, or pronase, before agitating the tissue to release the cells into suspension. Alternatively, pieces of tissue can be placed in growth media, and the cells that grow out are available for culture. This method is known as explant culture. Cells that are cultured directly from a subject are known as primary cells. With the exception of some derived from tumors, most primary cell cultures have a limited lifespan (Voight et al. Journal of Molecular and Cellular Cardiology. 86: 187-98). An established or immortalized cell line has acquired the ability to proliferate indefinitely either through random mutation or deliberate modification, such as artificial expression of the telomerase gene. Numerous cell lines are well established as representative of particular cell types. Examples of commonly used mammalian cell lines include HEK293T cells, VERO, BHK, HeLa, CV1 (including Cos), MDCK, 293, 3T3, myeloma cell lines (e.g., NS0, NS1), PC12, WI38 cells, and Chinese hamster ovary (CHO) cells, among many other examples (Langdon et al. Molecular Biomethods Handbook. 861-873).

Mammalian Cell Transfection Methods

Mammalian cell transfection is a technique commonly used to express exogenous DNA or RNA in a host cell line. There are many different methods available for transfecting mammalian cells, depending upon the cell line characteristics, desired effect, and downstream applications. These methods can be broadly divided into two categories: those used to generate transient transfection, and those used to generate stable transfectants. Transient transfection methods include, but are not limited to, liposome-mediated transfection, non-liposomal transfection agents (lipids and polymers), dendrimer-based transfection, and electroporation. Stable transfection methods include, but are not limited to, microinjection and virus-mediated gene delivery.

Certain aspects of the instant disclosure describe methods and compositions designed to achieve targeted mutagenesis in mammalian cells across long stretches of sequence, via use of virus-mediated gene delivery (bacteriophages). Viral vectors, such as bacteriophages, retrovirus, adenovirus (types 2 and 5), adeno-associated virus, herpes virus, pox virus, human foamy virus (HFV), and lentivirus have been used for gene transfection. All viral vector genomes have been modified by deleting some areas of their genomes so that their replication becomes altered, rendering such viruses safer than native forms. However, viral delivery systems have some problems, including: the marked immunogenicity of viruses, which can induce an inflammatory response, potentially leading to degeneration of transduced tissue; toxin production, including mortality; insertional mutagenesis; and their limited transgene capacity. During the past few years, some viral vectors with specific receptors have been designed that are capable of transferring transgenes to specific cells other than their natural target cells (retargeting) (Nayerossadat et al. Adv Biomed Res. 1: 27).

Kits

The instant disclosure also provides kits containing compositions of the instant disclosure, e.g., for use in methods of the present disclosure. Kits of the instant disclosure may include one or more containers comprising a composition (e.g., a nucleic acid encoding for a nucleic acid-editing deaminase and a bacteriophage RNA polymerase (e.g., T7 RNAP), optionally also encoding for a UGI and/or a NLS) of this disclosure. In some embodiments, the kits further include instructions for use in accordance with the methods of this disclosure. In some embodiments, these instructions comprise a description of administration/transfection of the composition(s) to mammalian cells, optionally further including instructions for performance of directed evolution of a targeted gene in mammalian cell(s).

Instructions supplied in the kits of the instant disclosure are typically written instructions on a label or package insert (e.g., a paper sheet included in the kit), but machine-readable instructions (e.g., instructions carried on a magnetic or optical storage disk) are also acceptable. Instructions may be provided for practicing any of the methods described herein.

The kits of this disclosure are in suitable packaging. Suitable packaging includes, but is not limited to, vials, bottles, jars, flexible packaging (e.g., sealed Mylar or plastic bags), and the like. The container may further comprise a mammalian cell transfection agent.

Kits may optionally provide additional components such as buffers and interpretive information. Normally, the kit comprises a container and a label or package insert(s) on or associated with the container.

The practice of the present disclosure employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA, genetics, immunology, cell biology, cell culture and transgenic biology, which are within the skill of the art. See, e.g., Maniatis et al., 1982, Molecular Cloning (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Sambrook et al., 1989, Molecular Cloning, 2nd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Sambrook and Russell, 2001, Molecular Cloning, 3rd Ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Ausubel et al., 1992, Current Protocols in Molecular Biology (John Wiley & Sons, including periodic updates); Glover, 1985, DNA Cloning (IRL Press, Oxford); Anand, 1992; Guthrie and Fink, 1991; Harlow and Lane, 1988, Antibodies, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.); Jakoby and Pastan, 1979; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the treatise, Methods In Enzymology (Academic Press, Inc., N.Y.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Methods In Enzymology, Vols. 154 and 155 (Wu et al. eds.), Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); Roitt, Essential Immunology, 6th Edition, Blackwell Scientific Publications, Oxford, 1988; Hogan et al., Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986); Westerfield, M., The zebrafish book. A guide for the laboratory use of zebrafish (Danio rerio), (4th Ed., Univ. of Oregon Press, Eugene, 2000).

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Reference will now be made in detail to exemplary embodiments of the disclosure. While the disclosure will be described in conjunction with the exemplary embodiments, it will be understood that it is not intended to limit the disclosure to those embodiments. To the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the disclosure as defined by the appended claims. Standard techniques well known in the art or the techniques specifically described below were utilized.

EXAMPLES

Example 1: Materials and Methods

Design and Construction of pTarget and pEditor Plasmids

The plasmids and primers used in this disclosure are listed in Table 2.

Table 2. Plasmids and Primers of the Disclosure

Plasmids

Cloning Primers

Amplification Primers

pcDNA3.1(+)-IRES-GFP was a gift from Kathleen L. Collins (Addgene plasmid #51406). pCMV-BE3 was a gift from David Liu (Addgene plasmid #73021). pGH335_MS2-AID*A-Hygro was a gift from Michael Bassik (Addgene plasmid #85406). Lenti_CMV_T_IR, Lenti_PAX2 and Lenti_VSVg were gifts from Jamie Marshall. T7 RNAP was ordered as a gBlock from Integrated DNA Technologies (IDT). The Cas9(D10A) in the pCMV-BE3 construct was replaced with T7 RNAP by Gibson assembly to generate pAPOBEC-T7 and pAPOBEC-T7-UGI, in which the original T7 promoter was also deleted to avoid self-editing. Rat APOBEC1 in pAPOBEC-T7 and pAPOBEC-T7-UGI was replaced with AID*A amplified from pGH335_MS2-AID*A-Hygro to generate pAID-T7 and pAID-T7-UGI. For pTarget, a T7 promoter-GFP fragment was amplified from pcDNA3.1(+)-IRES-GFP and was sub-cloned into a pUC19 backbone. This fragment was also sub-cloned into the Lenti_CMV-T-IR to generate the Lenti_CMV_T7_GFP-T-IR. A pTarget plasmid without a T7 promoter was also cloned as a negative control. A BFP fragment was generated from the GFP sequence via site-directed mutagenesis. pAID-T7G645A-UGI, pAID-T7P266L-UGI, pAID-T7P266LG645A-UGI and pAID-T7G645AQ744R-UGI were cloned via site-directed mutagenesis using wild type pAID-T7-UGI as a template. All plasmid sequences were verified using Sanger sequencing. All cloning primers were ordered from IDT. Plasmids were extracted using Qiaprep® Spin Miniprep Kit and Plasmid Plus Midi Kit (Qiagen®).

Cell Culture and Plasmid Transfection

HEK293T cells were obtained from ATCC and were grown in high-glucose (4.5 g/L) DMEM supplemented with GlutaMAX™, 1 mM sodium pyruvate, 10% FBS, 100 units/mL of penicillin and 100 µg/mL of streptomycin in a humidified chamber with 5% CO2 at 37°C. Cells were maintained at ~80% confluence in 24-well plates on the day of transfection. 250 ng of pTarget and 250 ng of pEditor plasmids were mixed together with 1 µl of TransIT-X2 reagent (Mirus) and the mixture was incubated in 50 µl of Opti-MEM® (Thermo Fisher Scientific™) for 30 min. The mixture was then added drop-wise to each well. For time-point experiments using target-integrated single cell clones, cells were cultured in 12-well plates and were transfected with 1000 ng of pTarget plasmids. Cells were subsequently harvested at the time points indicated above.

Lentivirus Production and Generation of Single Cell Clones

3 million HEK293T cells were cultured in 10 mL of culture media in a 10-cm dish. Cells were transfected with 12 µg of Lenti_CMV_T7_GFP-T-IR, 9 µg of Lenti_PAX2 and 3 µg of Lenti_VSVg. 24 hr after transfection, culture media was replaced with 6 mL of high-glucose (4.5 g/L) DMEM supplemented with GlutaMAX™, 1 mM sodium pyruvate, 30% FBS, 100 units/mL of penicillin and 100 µg/mL of streptomycin. Supernatant containing viral particles was collected and filtered through 0.22 µm filters 24 hr later. To generate single cell clones, HEK293T cells in a 6-well plate with 2.5 mL of culture media received 500 µl of virus together with polybrene at a final concentration of 8 µg/mL. Two days after transduction, successfully integrated cells were selected by puromycin at a concentration of 1.5 µg/mL. Seven days after transduction, integrated cells were subjected to FACS-sorting in single cell format into 96-well plates using a MoFlo® Astrios™ EQ Cell Sorter (Beckman Coulter™) and single cells were allowed to expand to form colonies.

Fluorescence Microscopy and Image Analysis

HEK293T cells transfected with pTarget and pEditor plasmids were seeded in a 24-well glass-bottom plate. Cells were imaged using an inverted Nikon® CSU-W1 Yokogawa® spinning disk confocal microscope with 488 nm (GFP) and 405 nm (BFP) lasers, an air objective (Plan Apo λ, numerical aperture (NA) = 0.75, 20x, Nikon), and an Andor® Zyla sCMOS® camera. NIS-Elements AR software (v4.30.01, Nikon®) was used for image capture. Images were processed using ImageJ (National Institutes of Health). CellProfiler (version 3.1.5, Broad Institute) (21) was used for segmentation and counting of BFP- and GFP-positive cells. GFP-positive cells were further thresholded by Otsu's method using integrated intensity with the R package autothresholdr (22).
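As an illustration of the thresholding step described above, the following is a minimal sketch of Otsu's method applied to per-cell integrated intensities. It is written in Python as a stand-in and does not reproduce the autothresholdr R package; the function name, the bin count and the intensity values are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Return the cut-off that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    # Histogram of the intensities over equal-width bins.
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                 # cells at or below bin i
        sum0 += centers[i] * hist[i]
        w1 = total - w0               # cells above bin i
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


# Hypothetical integrated GFP intensities for seven segmented cells.
intensities = [10, 12, 11, 13, 200, 210, 205]
threshold = otsu_threshold(intensities)
gfp_positive = sum(v > threshold for v in intensities)
```

With these toy values the threshold falls between the dim and bright groups, so three cells are counted as GFP-positive.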

Preparation of Sequencing Library

To sequence the targeted region (~2000 bp) on pTarget, plasmids were extracted from ~1 million cells using Qiaprep Spin Miniprep Kit. PCR was performed using those plasmids as templates (primer sequences are shown in Table 2 above). AMPure® XP beads (Beckman Coulter™) were added to samples at a 0.8:1 ratio to size select for the PCR-amplified fragments. The concentration of each sample was measured by Qubit™ (Thermo Fisher Scientific™). 1 ng of DNA at a volume of 2.5 µl from each sample was used as input for the subsequent library preparation. Sequencing library was prepared following the Nextera® XT Kit protocol (Illumina®) except that half the amount of each reagent was used. To sequence the targeted loci, genomic DNA was extracted from ~1 million cells using the Quick-DNA™ Kit (Zymo Research™). 4 µl of extracted genomic DNA were used to set up in vitro transcription reactions at a volume of 10 µl using HiScribe™ T7 High Yield RNA Synthesis Kit (New England BioLabs, Inc.®). The newly synthesized RNA was purified using RNA Clean & Concentrator Kit (Zymo Research™). Reverse transcription was performed using Superscript® IV First-Strand Synthesis System (Thermo Fisher Scientific™). cDNA was purified using AMPure® XP beads at a ratio of 1:1 and was used as the template for subsequent PCR reactions. The concentration of each sample was measured by Qubit® and the same Nextera® XT Kit protocol was followed to prepare sequencing library. Sequencing was performed on a MiSeq® (Illumina®) with paired-end reads.

Analysis of Sequencing Data

On average, 1 million reads were produced for each sample. Illumina® sequencing adapters were trimmed during sample demultiplexing using bcl2fastq2 (version 2.19.1). Bases in each read with an Illumina® quality score lower than 25 were filtered. Alignment to the respective reference sequence was performed using Bowtie 2 (v2.2.4.1) (23). Alignment files were generated in bam format and were visualized in Geneious (v 11.1.5). The mutation enrichment was calculated at each base with custom Matlab™ scripts. The first and last 15 bases of each aligned read and bases with read count less than 100 were excluded from the analysis. Transitions, transversions, and indels observed at each position were calculated, and the C->T and G->A mutation profiles were plotted, respectively, for each sample. The mutation rate per base data was obtained by dividing the number of reads with mutations by the number of total reads at each base. The average mutation rate for each possible combination of base switching for each sample was calculated by averaging the mutation rate per base data across the targeted region. The pT7 sample was used to estimate the background error rates introduced through sample preparation and Illumina® sequencing. The final average mutation rate for each base switching combination was calculated by subtracting the background error rate. Negative values were set to 0. All bar graphs and dot plots were generated in RStudio® using ggplot2.
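The rate calculation described above can be sketched as follows. This is a minimal Python illustration and not the custom Matlab™ scripts used in the disclosure; the function names, the input layout (one base-count dictionary per reference position) and the toy counts are assumptions. It further assumes that the average for a given substitution is taken over positions whose reference base matches the source base.

```python
MIN_DEPTH = 100  # positions with fewer reads are excluded, as stated in the text


def per_base_rate(ref, counts, ref_base, alt_base, min_depth=MIN_DEPTH):
    """Fraction of reads showing ref_base->alt_base at each eligible position."""
    rates = []
    for base, c in zip(ref, counts):
        depth = sum(c.values())
        if base != ref_base or depth < min_depth:
            continue
        rates.append(c.get(alt_base, 0) / depth)
    return rates


def average_rate(rates):
    """Mean per-base rate across the targeted region (0.0 if nothing qualifies)."""
    return sum(rates) / len(rates) if rates else 0.0


def background_corrected(sample_avg, background_avg):
    """Subtract the pT7 background; negative values are set to 0."""
    return max(sample_avg - background_avg, 0.0)


# Toy two-base reference with hypothetical read counts at each position.
ref = "CG"
sample_counts = [{"C": 990, "T": 10}, {"G": 995, "A": 5}]
c_to_t = average_rate(per_base_rate(ref, sample_counts, "C", "T"))
g_to_a = average_rate(per_base_rate(ref, sample_counts, "G", "A"))
corrected = background_corrected(c_to_t, 0.001)
```

Multiplying a corrected average by 1000 expresses it per 1000 base pairs, the unit used in the Examples.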

Statistical Analysis

Pairwise comparisons were analyzed using a two-sided t test.

Example 2: Construction and Demonstration of a Pseudo-Random Integrated Mutation of Eukaryotic Cells (PRIME)

It was initially examined whether combining T7 RNAP with a cytidine deaminase could create a means of continuously diversifying DNA nucleotides downstream of a T7 promoter (FIG. 1A). This was tested by devising a dual-plasmid system (pTarget, pEditor), with pTarget containing an EGFP gene downstream of a T7 promoter and pEditor containing the T7 RNAP-cytidine deaminase fusion gene with a nuclear localization signal (FIG. 1B). Two variants of the cytidine deaminase, rat APOBEC1 and a hyperactive mutant of AID (AID*A), were selected for pEditor based on their reported strong catalytic activity (4, 11). Additionally, variants containing a uracil DNA glycosylase inhibitor (UGI), which has been shown to facilitate C:G->T:A mutations (11), fused to the 3' end were also tested (FIG. 1B).

To test whether fusing a cytidine deaminase to T7 RNAP maintained T7 RNAP activity, pTarget and various pEditor plasmids were transfected into HEK293T cells and EGFP fluorescence under each condition was measured. Consistent with previous reports (9, 10), T7 RNAP alone (pT7) was able to drive EGFP expression, while deaminase alone (pAPOBEC) could not (FIG. 4A). All variants of cytidine deaminase-T7 RNAP fusions induced EGFP expression (FIG. 4A), which indicated that the T7 RNAP-deaminase fusion proteins maintained the transcriptional activity of T7 RNAP.

The ability of the T7 RNAP-deaminase fusion protein to induce mutations was then tested within a targeted region. HEK293T cells transfected with both pTarget and pEditor were collected 3 days after transfection. pTarget plasmids were then extracted, and a downstream 2000-bp window was amplified by PCR for high-throughput sequencing (FIG. 5B and Example 1, above). Representative reads from pT7, pAID-T7, and pAID-T7-UGI aligned to the same region within the 2000-bp window are shown in FIG. 1C. Cells transfected with pAID-T7-UGI contained the greatest number of reads with C->T (green) and G->A (red) mutations, whereas very few reads in the pT7 control group were found to harbor such mutations. Both C->T and G->A mutation events caused by the cytidine deaminase-T7 RNAP fusion proteins were identified across the entire length of the 2000-bp window, with mutation rates at multiple base positions of ~0.5-2% (represented as the percentage of reads harboring the mutation at each base; FIG. 1D and FIG. 5A). In contrast, the control pT7 group exhibited mutation rates of less than 0.1% for the majority of bases (which is similar to the error rate expected with Illumina® sequencing chemistry; FIG. 1D and FIG. 5A). Thus, mutation rates in the pT7 group were treated as measurement background (i.e., sequencing errors). The overall average C->T and G->A mutation rates for each of the pEditor variants were then calculated. The most efficient variant, which was observed to be pAID-T7-UGI, showed an average C->T mutation rate of 1.30 per 1000 base pairs (kbp⁻¹) and an average G->A mutation rate of 2.92 kbp⁻¹ (FIG. 1E), which was approximately 500,000-fold higher than the basal somatic mutation frequency in human cells (12). Although not as efficient as the pAID-T7-UGI variant, the pAID-T7 variant was still identified as capable of inducing an average C->T mutation rate of ~0.97 kbp⁻¹ and an average G->A mutation rate of ~1.55 kbp⁻¹. The fact that both C->T and G->A substitutions were observed in the data indicated that there was no significant mutational strand bias. The two AID constructs (pAID-T7-UGI and pAID-T7) exhibited higher enzymatic activity than the APOBEC constructs, with the pAPOBEC-T7 variant showing an average C->T mutation rate of ~0.3 kbp⁻¹ and an average G->A mutation rate of ~0.15 kbp⁻¹, while the pAPOBEC-T7-UGI variant showed an average C->T mutation rate of ~0.33 kbp⁻¹ and an average G->A mutation rate of ~0.17 kbp⁻¹ (FIG. 1E). Of note, cells transfected with only cytidine deaminase (pAPOBEC or pAID) showed C->T and G->A mutation rates similar to the background measurement error rates (i.e., similar to that of pT7; FIG. 5B; pT7 vs. pAPOBEC, two-sided t test, p = 0.1201 in C->T, p = 0.2244 in G->A; pT7 vs. pAID, two-sided t test, p = 0.3625 in C->T, p = 0.5877 in G->A), which indicated high specificity of the system. Moreover, although high mutation rates were observed for C->T and G->A base substitutions in AID variants, low mutation rates (< 0.1 kbp⁻¹) were observed in other combinations of base substitutions, in line with the primary mutational profile of cytidine deamination (FIG. 5C).

Example 3: Use of PRIME to Mutate Targeted Gene Loci within the Human Genome

PRIME was then utilized to mutate targeted gene loci within the human genome. An EGFP gene under the control of a T7 promoter was integrated into the HEK293T genome via lentiviral transduction. A CMV promoter was also included upstream of the T7 promoter, to allow for subsequent single cell sorting by EGFP fluorescence. A single cell clone of the EGFP construct-integrated cells was then selected and expanded (FIG. 2A). By transfecting pEditor variant pAID-T7-UGI into the integrated single cell clonal cell line, it was observed to be possible to achieve an average C->T and G->A mutation rate of more than 1-2 kbp⁻¹ three days after transfection (FIG. 2A). Furthermore, another round of pEditor transfection increased the average mutation rate by another 1-2 kbp⁻¹ within the second 3-day period (FIG. 2A). In contrast, no significant accumulation of mutations was observed in the control pAID group at either time point (FIG. 2A). PRIME activity was then examined in an additional two single cell clones. Although it was observed that there were variations in mutation rates across single cell clones in the pAID-T7-UGI group(s), the trend in the accumulation of mutations in the targeted genome region over time remained consistent among all cell clones tested (FIG. 6). The heterogeneity observed was likely due to differences in integration copy number and/or genomic accessibility of the integrated T7 promoter to the PRIME system.

To examine potential off-target effects of the PRIME system in the genome, a search for regions in the genome that possess the conserved T7 promoter sequence (TAATACGACTCACTATAG; SEQ ID NO: 1) was performed. Although an exact match for the T7 promoter sequence in the human genome was not identified, three regions possessing a single-base mismatch, located at distinct locations in chromosomes 6, 7 and 8, respectively, were identified. Among them, the regions in chromosomes 6 and 7 (designated "Chr6" and "Chr7", respectively) shared the same sequence (TAATACAACTCACTATAG; SEQ ID NO: 1) (FIG. 2B, upper panel). The genomic mutation rate of the 2000-bp window immediately after Chr6 and Chr7 was measured using targeted genomic sequencing (see Example 1, above). After 7 days of expression of pAID-T7-UGI, the average C->T and G->A mutation rates of the two regions were observed to be similar to those of cells expressing pT7 only (~0.2-0.5 kbp⁻¹), whereas the PRIME-targeted regions (i.e., the regions downstream of the integrated T7 promoter in the genome) showed significant edits (~2.0-4.5 kbp⁻¹; n = 2 biological replicates across 2 single cell clones; FIG. 2B, lower panel). Thus, off-target effects were identified to be minimal/undetectable as compared to background.
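A search of the kind described above, for sites matching the T7 promoter with at most one mismatch, can be sketched as a sliding-window Hamming-distance scan over both strands. This is a minimal Python illustration only; the disclosure does not state which search tool was used, and the function names and the toy sequence are hypothetical.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # conserved T7 promoter sequence from the text


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_near_matches(genome, motif=T7_PROMOTER, max_mismatches=1):
    """Return (position, strand, mismatches) for every window within the limit."""
    hits = []
    k = len(motif)
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        for i in range(len(genome) - k + 1):
            d = hamming(genome[i:i + k], query)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return hits


# Toy sequence embedding the single-mismatch site reported for Chr6 and Chr7.
toy = "GGGG" + "TAATACAACTCACTATAG" + "CCCC"
hits = find_near_matches(toy)
```

A linear scan like this is adequate for short test sequences; a genome-scale search would normally use an indexed aligner instead.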

Example 4: Modification of the T7 RNAP Elongation Rate Rendered the Editing Rate of PRIME to be Tunable

T7 RNAP is widely used in biotechnology and has previously been shown to be highly engineerable. It was examined whether the editing rate of PRIME could be tuned by modifying the elongation rate of T7 RNAP or its processivity over the DNA template, as, without wishing to be bound by theory, such changes would be expected to modulate the probability of cytidine deaminase-DNA template interaction. To this end, three mutations (P266L, G645A, Q744R) relative to the wild type T7 RNAP were constructed and tested, with these particular mutations identified based upon previous studies (FIG. 3A, upper panel). P266L was previously shown to enhance the DNA processivity of T7 RNAP over a subregion of the initially transcribed sequence, although this mutation also decreased T7 RNAP affinity for the promoter (13). The G645A mutation was previously shown to decrease the elongation rate of wild type T7 RNAP (14), and Q744R was previously shown to enhance the specific activity of the polymerase (15). pEditor variants pAID-T7G645A-UGI, pAID-T7P266L-UGI, pAID-T7P266LG645A-UGI and pAID-T7G645AQ744R-UGI were constructed and their editing efficiency was compared with that of pAID-T7-UGI in a single cell clone integrated with a T7 promoter-controlled target. Across two biological replicates, pEditor variant pAID-T7G645AQ744R-UGI induced average C->T and G->A mutation rates that were more than 2-fold higher than those of the wild type pAID-T7-UGI, whereas pAID-T7P266L-UGI reduced the mutation rates by a factor of 2 (FIG. 3A, lower panel).

To demonstrate that PRIME can perform functional mutagenesis in mammalian systems, PRIME was used to shift the fluorescence spectra of blue fluorescent protein (BFP). A single H66Y amino acid substitution (in this case, CAC->TAC or TAT) has been previously identified to cause a shift in the fluorescence excitation and emission spectra of BFP to those of GFP (16) (FIG. 3B). The BFP gene was placed under the control of a T7 promoter and a CMV promoter (pBFP), and the pBFP plasmid was introduced alongside pEditor variants into HEK293T cells. After 3 days, fluorescence microscopy and automatic cell counting by CellProfiler were used to assay the ratio between the number of GFP-positive cells and the number of BFP-positive cells. GFP-positive cells were observed in both pAID-T7 (~0.5%) and pAID-T7-UGI (~1.2%) groups, whereas spectrum shifts in BFP were not observed in the pT7 group. It was also noted that less than 0.2% of cells in the pAID group became GFP-positive (FIG. 3C).
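The codon arithmetic behind the H66Y substitution can be checked with a short enumeration of every C->T edit of the CAC codon. This is an illustrative Python sketch; the function name is hypothetical, and the lookup table holds only the four standard-code codons needed here.

```python
from itertools import combinations

# Subset of the standard genetic code covering the codons reachable from CAC.
CODON_TO_AA = {"CAC": "His", "CAT": "His", "TAC": "Tyr", "TAT": "Tyr"}


def c_to_t_variants(codon):
    """All codons obtained by converting one or more C bases to T."""
    c_positions = [i for i, b in enumerate(codon) if b == "C"]
    variants = set()
    for n in range(1, len(c_positions) + 1):
        for subset in combinations(c_positions, n):
            bases = list(codon)
            for i in subset:
                bases[i] = "T"
            variants.add("".join(bases))
    return sorted(variants)


variants = c_to_t_variants("CAC")
outcomes = {v: CODON_TO_AA[v] for v in variants}
# outcomes == {"CAT": "His", "TAC": "Tyr", "TAT": "Tyr"}
```

Only edits that convert the first C yield tyrosine (TAC or TAT), while a C->T edit at the third position alone (CAT) is silent, which matches the two productive codons named in the text.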

In summary, the above examples have demonstrated that cytidine deaminase fused to T7 RNAP can be used to generate localized nucleotide diversity within the human genome at an average C->T and G->A mutation rate ranging from -0.4-4 kbp 1 within a week. Higher editing efficiency may be achieved via additional engineering of the T7 RNAP. The wide editing window of PRIME (>2000 bps) makes it possible to target a long stretch of a selected genomic region over multiple cellular generations. In comparing PRIME with other reported directed evolution methods (FIG. 7), PRIME has demonstrated herein its superiority in terms of both high editing rate and wide editing window. PRIME can be leveraged to evolve both new protein functions and new cellular systems. By introducing T7 promoters to different genes of interest, it is anticipated that this system can simultaneously diversify multiple genomic loci without disrupting reading frames, by avoiding insertions and deletions observed with other DNA editors (17, 18). The base-editing profile of the system can also be greatly expanded by utilizing other base editing enzymes, such as the newly evolved adenine deaminases (19) in concert with cytidine deaminases. Moreover, multiplexed-PRIME systems utilizing orthogonal bacteriophage polymerase systems (e.g., SP6 RNAP) may allow differential editing on multiple loci. Additionally, the highly efficient pseudo-random DNA editing property of PRIME opens doors to a wider range of applications that are not limited to directed evolution. Due to its ubiquity and durability, genomic DNA serves as an ideal medium for recording artificial biological information (20). 
PRIME is also well suited to serve as a cellular recorder for long-term storage of information using DNA as a medium for the following reasons: 1) PRIME enables continuous targeted mutagenesis in genomic loci over multiple cellular generations, which is a prerequisite for long-term information storage; 2) the toolkit for the PRIME system can be greatly expanded by engineering different editor variants which induce varying targeted mutation rates ranging from ~0.4-4 per kbp within a week, which gives users flexibility in choosing the variant that best suits their experimental needs regarding the time-scale of the cellular recording; 3) the wide editing window of PRIME (at least 2000 bp) ensures that the editable sites in the genome will not be exhausted within a short time frame, which is beneficial to applications such as long-term lineage tracing; and 4) a multiplexed-PRIME system is contemplated as making multi-event analog recording possible. PRIME therefore provides an engineerable and generalized platform for nucleotide diversification in mammalian systems.
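For orientation, a per-kbp mutation rate of the kind quoted above can be expressed as a simple normalization. The Python below is a minimal sketch under stated assumptions: the function name `mutations_per_kbp`, the choice to normalize by the number of reads and by the window length, and the input numbers are all hypothetical and are not taken from the disclosure or its figures.

```python
def mutations_per_kbp(total_mutations, num_reads, window_bp):
    """Average number of mutations per read, scaled to a 1000 bp window.

    Assumption: every read spans the full window of `window_bp` bases and
    `total_mutations` counts C->T plus G->A calls summed over all reads.
    """
    if num_reads <= 0 or window_bp <= 0:
        raise ValueError("num_reads and window_bp must be positive")
    return total_mutations * 1000 / (num_reads * window_bp)

# Made-up inputs chosen only to illustrate the arithmetic.
example_rate = mutations_per_kbp(total_mutations=80, num_reads=100, window_bp=2000)
print(example_rate)
```

With these made-up inputs, 80 mutations across 100 reads of a 2000 bp window work out to 0.4 mutations per kbp, which sits at the lower end of the range stated above.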

Example 5: In vitro and in vivo recording of cell lineages using TRACE

TRACE (T7 polymeRAse-driven Continuous Editing), as described herein and also referred to herein as "PRIME", is a method that enables continuous, targeted mutagenesis in human cells using a cytidine deaminase fused to T7 RNA polymerase. TRACE can be applied to enable cell lineage recordings both in vitro and in vivo. A reconstruction of lineage trees by grouping and ranking DNA mutations from sequencing reads is shown in FIG. 8. In this experiment, a pool of HEK293T cells was sparsely integrated with barcoded lentiviral TRACE templates so that each integrated cell had a unique barcoded TRACE template. Mutation accumulation over time was demonstrated within the same molecular lineage. Reads that shared a unique lentiviral barcode also shared private clonal mutations and hierarchical sub-clonal mutations that accumulated over time, which demonstrated the usefulness of TRACE for lineage tracing.
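The idea of grouping and ranking mutations by barcode can be illustrated with a short sketch. The Python below is not the analysis used to produce FIG. 8; the function name `rank_mutations_by_barcode`, the mutation labels, and the toy reads are hypothetical. It groups reads by lentiviral barcode and ranks each mutation by the fraction of reads within that barcode that carry it, so that mutations shared by all reads (clonal) rank above mutations found in progressively smaller subsets (sub-clonal).

```python
from collections import Counter, defaultdict

def rank_mutations_by_barcode(reads):
    """Group reads by barcode and rank mutations by within-barcode frequency.

    reads: iterable of (barcode, set_of_mutation_labels) pairs.
    Returns {barcode: [(mutation, fraction_of_reads), ...]} sorted from the
    most widely shared mutation to the least widely shared one.
    """
    grouped = defaultdict(list)
    for barcode, mutations in reads:
        grouped[barcode].append(frozenset(mutations))

    ranked = {}
    for barcode, read_sets in grouped.items():
        counts = Counter(m for read in read_sets for m in read)
        total = len(read_sets)
        ranked[barcode] = sorted(
            ((m, c / total) for m, c in counts.items()),
            key=lambda item: (-item[1], item[0]),  # frequency first, then label
        )
    return ranked

# Toy reads (hypothetical): each tuple is (barcode, mutations seen in the read).
toy_reads = [
    ("BC1", {"C12T"}),
    ("BC1", {"C12T", "G40A"}),
    ("BC1", {"C12T", "G40A", "C77T"}),
    ("BC1", {"C12T", "G40A"}),
    ("BC2", {"G5A"}),
    ("BC2", {"G5A", "C30T"}),
]
ranking = rank_mutations_by_barcode(toy_reads)
for barcode, mutations in ranking.items():
    print(barcode, mutations)
```

In this toy input, C12T is carried by every BC1 read and so behaves as a clonal mutation, while G40A and C77T mark nested sub-clones. A real reconstruction would also need to handle sequencing errors and recurrent edits at the same site, which this sketch ignores.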

A TRACE transgenic mouse is generated by decomposing the TRACE system into two components: the TRACE editor, consisting of the T7 RNA polymerase-deaminase fusion protein, and the T7 recording template, consisting of a T7 promoter and a transcribed editing template. Both the TRACE editor and the T7 promoter-recording template are integrated into a mouse at the Rosa26 locus. Oocytes containing a T7 promoter-recording template are then fertilized with sperm harboring a constitutively active TRACE editor to initiate sequence diversification in the whole embryo. In addition, to enable cell type-specific lineage tracing, existing mouse lines expressing cell type-specific Cre-recombinase or Cre-ER (a tamoxifen-inducible version of Cre) are leveraged to drive the conditional expression of a stably integrated TRACE editor in cells where Cre-recombinase is present. Thus, by crossing the TRACE mouse line with a Cre-driver line, cell type-specific lineage recording is achieved, and additional temporal resolution is provided by tamoxifen induction.

References

1. Farzadfard, F. & Lu, T.K. Emerging applications for DNA writers and molecular recorders. Science 361, 870-875 (2018).

2. Esvelt, K.M., Carlson, J.C. & Liu, D.R. A system for the continuous directed evolution of biomolecules. Nature 472, 499-503 (2011).

3. Su, T. et al. A CRISPR-Cas9 Assisted Non-Homologous End-Joining Strategy for Onestep Engineering of Bacterial Genome. Scientific reports 6, 37895 (2016).

4. Hess, G.T. et al. Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nature methods 13, 1036-1042 (2016).

5. Halperin, S.O. et al. CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window. Nature 560, 248-252 (2018).

6. Moore, C.L., Papa, L.J., 3rd & Shoulders, M.D. A Processive Protein Chimera Introduces Mutations across Defined DNA Regions In Vivo. Journal of the American Chemical Society 140, 11560-11564 (2018).

7. Alexander, D.L. et al. Random mutagenesis by error-prone pol plasmid replication in Escherichia coli. Methods in molecular biology (Clifton, N.J.) 1179, 31-44 (2014).

8. Chamberlin, M., Kingston, R., Gilman, M., Wiggs, J. & deVera, A. Isolation of bacterial and bacteriophage RNA polymerases and their use in synthesis of RNA in vitro. Methods in enzymology 101, 540-568 (1983).

9. Lieber, A., Kiessling, U. & Strauss, M. High level gene expression in mammalian cells by a nuclear T7-phage RNA polymerase. Nucleic acids research 17, 8485-8493 (1989).

10. Ghaderi, M. et al. Construction of an eGFP Expression Plasmid under Control of T7 Promoter and IRES Sequence for Assay of T7 RNA Polymerase Activity in Mammalian Cell Lines. Iranian journal of cancer prevention 7, 137-141 (2014).

11. Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A. & Liu, D.R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).

12. Milholland, B. et al. Differences between germline and somatic mutation rates in humans and mice. Nature communications 8, 15183 (2017).

13. Guillerez, J, Lopez, P.J., Proux, F., Launay, H. & Dreyfus, M. A mutation in T7 RNA polymerase that facilitates promoter clearance. Proceedings of the National Academy of Sciences 102, 5958-5963 (2005).

14. Bonner, G., Lafer, E.M. & Sousa, R. Characterization of a set of T7 RNA polymerase active site mutants. The Journal of Biological Chemistry 269, 25120-25128 (1994).

15. Boulin, J.C. et al. Mutants with higher stability and specific activity from a single thermosensitive variant of T7 RNA polymerase. Protein Engineering, Design and Selection 26, 725-734 (2013).

16. Glaser, A., McColl, B. & Vadolas, J. GFP to BFP Conversion: A Versatile Assay for the Quantification of CRISPR/Cas9-mediated Genome Editing. Molecular therapy. Nucleic acids 5, e334 (2016).

17. Jakociunas, T., Pedersen, L.E., Lis, A.V., Jensen, M.K. & Keasling, J.D. CasPER, a method for directed evolution in genomic contexts using mutagenesis and CRISPR/Cas9. Metabolic engineering 48, 288-296 (2018).

18. Spanjaard, B. et al. Simultaneous lineage tracing and cell-type identification using CRISPR-Cas9-induced genetic scars. Nature biotechnology 36, 469-473 (2018).

19. Gaudelli, N.M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).

20. Church, G.M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628 (2012).

21. Carpenter, A.E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7:R100 (2006).

22. Landini, G, Randell, D.A., Fouad, S, and Galton, A. Automatic thresholding from the gradients of region boundaries. Journal of Microscopy 265, 185-195 (2017).

23. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359 (2012).

24. Ravikumar, A., Arzumanyan, G.A., Obadi, M.K.A. & Liu, C.C. Scalable, continuous evolution of genes at mutation rates above genomic error thresholds. Cell 175, 1-12 (2018).

All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.

One skilled in the art would readily appreciate that the present disclosure is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The methods and compositions described herein as presently representative of preferred embodiments are exemplary and are not intended as limitations on the scope of the disclosure. Changes therein and other uses that will occur to those skilled in the art, and that are encompassed within the spirit of the disclosure, are defined by the scope of the claims.

In addition, where features or aspects of the disclosure are described in terms of Markush groups or other grouping of alternatives, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group or other group.

The use of the terms "a" and "an" and "the" and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to,") unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.

Embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the disclosed invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description.

The disclosure illustratively described herein suitably can be practiced in the absence of any element or elements, limitation or limitations that are not specifically disclosed herein. Thus, for example, in each instance herein any of the terms "comprising", "consisting essentially of", and "consisting of" may be replaced with either of the other two terms. The terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present disclosure provides preferred embodiments, optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this disclosure as defined by the description and the appended claims.

It will be readily apparent to one skilled in the art that varying substitutions and modifications can be made to the invention disclosed herein without departing from the scope and spirit of the invention. Thus, such additional embodiments are within the scope of the present disclosure and the following claims. The present disclosure teaches one skilled in the art to test various combinations and/or substitutions of chemical modifications described herein toward generating conjugates possessing improved contrast, diagnostic and/or imaging activity. Therefore, the specific embodiments described herein are not limiting and one skilled in the art can readily appreciate that specific combinations of the modifications described herein can be tested without undue experimentation toward identifying conjugates possessing improved contrast, diagnostic and/or imaging activity.

The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the disclosure to be practiced otherwise than as specifically described herein. Accordingly, this disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the disclosure described herein. Such equivalents are intended to be encompassed by the following claims.