Title:
CRISPR-ASSOCIATED PROTEIN FROM FRANCISELLA AND USES RELATED THERETO
Document Type and Number:
WIPO Patent Application WO/2017/015015
Kind Code:
A1
Abstract:
This disclosure relates to CRISPR-associated proteins from Francisella (CPF1), variants, fusions, and nucleic acid complexes related thereto. In certain embodiments, the disclosure relates to recombinant vectors encoding a CPF1 and variants thereof. In certain embodiments, the disclosure relates to CPF1 and guide RNA complexes for use in targeted binding and/or cutting of nucleic acids, e.g., DNA, mRNA, or viral RNA. In certain embodiments, the disclosure contemplates genome editing using CPF1 and nucleic acid complexes disclosed herein.

Inventors:
WEISS DAVID S (US)
RATNER HANNAH K (US)
SAMPSON TIMOTHY R (US)
Application Number:
PCT/US2016/042030
Publication Date:
January 26, 2017
Filing Date:
July 13, 2016
Assignee:
UNIV EMORY (US)
International Classes:
C12N15/10; C12N9/22; C12N15/11; C12N15/113; C12N15/82; C12N15/85
Domestic Patent References:
WO2015040402A1 2015-03-26
Foreign References:
EP3009511A2 2016-04-20
Other References:
ZETSCHE ET AL.: "Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System", CELL, vol. 163, no. Iss. 3, 25 September 2015 (2015-09-25), pages 759 - 771
SCHUNDER ET AL.: "First indication for a functional CRISPR/Cas system in Francisella tularensis", INTERNATIONAL JOURNAL OF MEDICAL MICROBIOLOGY, vol. 303, no. Iss. 2, 17 January 2013 (2013-01-17), pages 51 - 60, XP055271835
FONFARA ET AL.: "The CRISPR-associated DNA-cleaving enzyme Cpf1 also processes precursor CRISPR RNA.", NATURE, vol. 532, 20 April 2016 (2016-04-20), pages 517 - 521, XP055349049
Attorney, Agent or Firm:
MASON, James C. et al. (US)
Claims:
CLAIMS

1. A polypeptide comprising a non-naturally occurring CPF1 variant or fusion.

2. The polypeptide of Claim 1, wherein the CPF1 variant comprises a nuclease inactivating mutation.

3. The polypeptide of Claim 2, wherein the mutation is one or more nucleotide substitutions, deletions, and/or additions.

4. The polypeptide of Claim 1, wherein the CPF1 fusion or fusion variant comprises a nuclear localization sequence, transcriptional activator, transcriptional repressor, histone-modifying protein, integrase, deaminase, and/or recombinase.

5. A recombinant vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter.

6. The recombinant vector of Claim 5, wherein the vector is a viral vector.

7. The recombinant vector of Claim 5, wherein the viral vector is selected from the group consisting of lentiviral, adenoviral, adeno-associated and herpes simplex viral vectors.

8. The recombinant vector of Claim 5, wherein the variant comprises at least 1,000, 1,100, 1,200, or 1,250 amino acids, comprises one or more amino acid substitutions, additions, and/or deletions and has greater than 90, 95, 96, 97, 98, or 99% identity to SEQ ID NO: 1.

9. The recombinant vector of Claim 5, wherein the variant comprises one or more amino acid substitutions, additions, and/or deletions in a nuclease/cleavage domain.

10. The recombinant vector of Claim 5, further comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter wherein the guide RNA binds the encoded CPF1 protein and a segment with sufficient base pairs to hybridize to a target sequence.

11. A method of reducing the translation of mRNA into a polypeptide comprising

a) providing a cell expressing a target sequence within an mRNA;

b) inserting into the cell a vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter; and

c) inserting into the cell a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter,

wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the cell expresses the CPF1 protein and guide RNA and results in reduction of the translation of the mRNA into a polypeptide.

12. A method of altering the transcription of a gene comprising

a) providing a cell expressing a target sequence about the promoter region of a gene;

b) inserting into the cell a vector comprising a nucleic acid sequence encoding a fusion protein comprising i) a transcriptional activator or transcriptional repressor and ii) a CPF1 protein variant of SEQ ID NO: 1, wherein nuclease cleavage activity in the CPF1 protein variant is inactivated, in operable combination with a promoter; and

c) inserting into the cell a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter,

wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the cell expresses the CPF1 protein and guide RNA and results in increased or decreased translation of the gene.

13. A method of cutting or nicking a chromosome comprising

a) providing a eukaryotic cell comprising a target sequence within a chromosome;

b) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter; and

c) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter,

wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA and results in cutting or nicking at least one strand of the target sequence.

14. The method of Claim 9, wherein the target sequence comprises a protospacer-adjacent motif having TN or TTTN (SEQ ID NO: 51), wherein N is any nucleotide.

15. The method of Claim 9, wherein CPF1 comprises one or more amino acid substitutions, additions, and/or deletions in a cleavage domain resulting in a single stranded nick.

16. A bacteria comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter and a heterologous nucleic acid sequence that encodes a guide RNA.

17. The bacteria of Claim 16, wherein the guide RNA targets a bacterial resistance gene.

18. The bacteria of Claim 16, wherein the bacteria does not naturally encode a CPF1 protein having SEQ ID NO: 1 or variant thereof providing a heterologous nucleic acid that encodes CPF1 and a heterologous nucleic acid that encodes the guide RNA.

Description:
CRISPR-ASSOCIATED PROTEIN FROM FRANCISELLA AND USES RELATED THERETO

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Application No. 62/193,921 filed July 17, 2015. The entirety of this application is hereby incorporated by reference for all purposes.

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED AS A TEXT FILE VIA THE OFFICE ELECTRONIC FILING SYSTEM (EFS-WEB)

The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 15024PCT_ST25.txt. The text file is 24 KB, was created on July 13, 2016, and is being submitted electronically via EFS-Web.

BACKGROUND

In bacteria, CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-CAS (CRISPR-associated) genes provide defense against foreign infectious agents. These systems utilize an array of small CRISPR RNAs (crRNAs) and spacers to recognize their targets in combination with CAS nucleases that are capable of cutting nucleic acids, ultimately causing targeted degradation of foreign nucleic acids.

Mali et al. report RNA-guided human genome engineering via Cas9. Science, 2013, 339:823-26. See also Jinek et al., eLife, 2013, 2:e00471; Cong et al., Science, 2013, 339(6121):819-23; Cho et al. Nat Biotechnol, 2013, 31(3):230-2; and U.S. Patent Application Publications 2014/0068797, 2014/0179006, 2015/0184139, 2015/0176013, 2015/0167000, 2015/0166980, and 2015/0079681.

Price et al. report Cas9-mediated targeting of viral RNA in eukaryotic cells. Proc Natl Acad Sci U S A, 2015, 112(19):6164-9.

Nekrasov et al. report targeted mutagenesis in the model plant Nicotiana benthamiana using Cas9. Nat Biotechnol, 2013, 31(8):691-3. See U.S. Patent Application Publications 2015/0167000, 2015/0082478, and 2015/0067922.

See also WO2013176772 and WO2014124226. GenBank accession number WP_003034647 is reported as a 1300 amino acid protein from Francisella novicida. GenBank accession number AJI56734.1 is reported as a 939 amino acid CRISPR-associated protein from Francisella philomiragia (Cpf1).

References cited herein are not an admission of prior art.

SUMMARY

This disclosure relates to CRISPR-associated protein from Francisella (CPF1), variants, and nucleic acid complexes related thereto. In certain embodiments, the disclosure relates to recombinant vectors encoding a CPF1 or variants thereof. In certain embodiments, the disclosure relates to CPF1 and guide RNA complexes for use in targeted binding and/or cutting of nucleic acids, e.g., DNA, mRNA, or viral RNA. In certain embodiments, the disclosure contemplates genome editing using CPF1 and nucleic acid complexes disclosed herein. In certain embodiments, the disclosure contemplates cells, eukaryotic cells, transgenic plants, and animals which may be genetically altered using CPF1 variants disclosed herein or genetically engineered to express CPF1, variants, and nucleic acids disclosed herein.

In certain embodiments, the disclosure relates to non-naturally occurring synthetic or recombinant polypeptides comprising a CPF1 variant or a CPF1 fusion. In certain embodiments, the CPF1 variant comprises a nuclease inactivating mutation. In certain embodiments, the mutation is one or more nucleic acid substitutions, deletions, and/or additions. In certain embodiments, the CPF1 fusion or variant fusion comprises a nuclear localization sequence, transcriptional activator, transcriptional repressor, histone-modifying protein, integrase, deaminase, and/or recombinase.

In certain embodiments, the disclosure relates to recombinant vectors comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variants in operable combination with a promoter. In certain embodiments, the vector is a viral vector, plasmid, or phage. In certain embodiments, the vector is selected from the group consisting of lentiviral, adenoviral, adeno-associated, and herpes simplex viral vectors. Typically, the recombinant vector further comprises a nucleic acid sequence encoding a guide RNA wherein the guide RNA binds the encoded CPF1 protein and a segment with sufficient base pairs to hybridize to a target sequence. The guide RNA may be a single sequence or a combination of hybridized sequences. Within any of the embodiments disclosed herein the vector comprising a nucleic acid sequence encoding a CPF1 protein or variant and the vector comprising a nucleic acid sequence encoding a guide RNA are in the same vector or are in different vectors. In certain embodiments, the recombinant nucleic acids or vectors reported herein contain non-naturally occurring sequences as a whole, e.g., because of nucleotide spacers, non-natural sequence modifications, or connections of heterologous sequences.

In certain embodiments, the variant comprises at least 900, 1,000, 1,100, 1,200, or 1,250 amino acids, comprises one or two or more amino acid substitutions, additions, and/or deletions and has greater than 50, 60, 70, 80, 85, 90, 95, 96, 97, 98, or 99% identity or similarity to SEQ ID NO: 1. In certain embodiments, the variant comprises one or more amino acid substitutions, additions, and/or deletions in a nuclease cleavage domain such that the cleavage function is inactivated. "Inactivated" refers to the inability or substantially reduced rate at which the nuclease cleavage occurs.
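
The percent identity thresholds above can be illustrated with a short calculation. The following is a minimal Python sketch, assuming the two sequences have already been aligned to equal length with '-' marking gaps, and assuming identity is counted as identical positions divided by alignment length (other conventions exist). The function name `percent_identity` and the toy sequences are hypothetical and are not taken from SEQ ID NO: 1.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Assumption: both inputs are already aligned to the same length and
    use '-' for gaps. Gap positions never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy peptides: 7 of 8 aligned positions are identical.
identity = percent_identity("MKTAYIAK", "MKTAYLAK")
```

A variant would then be compared against a chosen threshold, for example `identity > 90`.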

In certain embodiments, the disclosure relates to methods of reducing the translation of mRNA comprising a) providing a cell expressing a target sequence within an mRNA; b) inserting into the cell CPF1 or variant or a vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter; and c) inserting into the cell a guide RNA or a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the cell expresses the CPF1 protein and guide RNA and results in reduction of the translation of the mRNA.

In certain embodiments, the disclosure relates to methods of altering the transcription of a gene comprising a) providing a cell expressing a target sequence about the promoter region of a gene; b) inserting into the cell a fusion protein or a vector comprising a nucleic acid sequence encoding a fusion protein comprising i) a transcriptional activator or transcriptional repressor, ii) a nuclear localization signal and iii) a CPF1 protein variant of SEQ ID NO: 1 wherein nuclease cleavage activity is inactivated in operable combination with a promoter; and c) inserting into the cell a guide RNA or a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the cell expresses the CPF1 protein and guide RNA and results in increased, decreased, or terminated translation of the gene.

In certain embodiments, the disclosure relates to methods of cutting or nicking a genome, e.g., chromosome or other double stranded nucleic acid (DNA or RNA) comprising a) providing a cell, e.g., eukaryotic cell comprising a target sequence within a genome, e.g., chromosome; b) inserting into the eukaryotic cell a CPF1 or variant or a vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter; and c) inserting into the eukaryotic cell a guide RNA or a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA and results in cutting or nicking at least one strand of the target sequence.

In certain embodiments, the target sequence does not contain a protospacer-adjacent motif. In certain embodiments, the target sequence comprises a protospacer-adjacent motif having TN or TTTN (SEQ ID NO: 51), wherein N is any nucleotide.
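
As a minimal sketch of how candidate target sites carrying a 5' TTTN motif could be located in a DNA sequence, the following Python snippet scans one strand for the motif and reports the protospacer immediately 3' of it. The function name `find_pam_sites`, the default protospacer length of 20 nucleotides, and the example sequence are illustrative assumptions and are not taken from the disclosure.

```python
def find_pam_sites(dna, pam="TTTN", protospacer_len=20):
    """Return (pam_start, pam_seq, protospacer) tuples for one strand.

    'N' in the PAM pattern matches any nucleotide. The protospacer is
    taken immediately 3' of the PAM (a 5' PAM arrangement).
    protospacer_len is an assumed value, not one from the disclosure.
    """
    dna = dna.upper()
    hits = []
    # Last index at which a PAM plus a full-length protospacer still fits.
    last_start = len(dna) - len(pam) - protospacer_len
    for i in range(last_start + 1):
        window = dna[i:i + len(pam)]
        if all(p == "N" or p == b for p, b in zip(pam, window)):
            start = i + len(pam)
            hits.append((i, window, dna[start:start + protospacer_len]))
    return hits


# Hypothetical sequence: flank + PAM + 10-nt protospacer + trailing bases.
example = "GG" + "TTTA" + "CCATGACCTT" + "GAGACCGGATTTCGAA"
sites = find_pam_sites(example, protospacer_len=10)
```

Passing `pam="TN"` applies the same scan to the shorter motif. Scanning the opposite strand would require running the function on the reverse complement as well.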

In certain embodiments, the nucleotide sequence encoding CPF1 protein is codon optimized for expression in the eukaryotic cell.

In certain embodiments, the CPF1 variant comprises one or more amino acid substitutions, additions, and/or deletions in a nuclease/cleavage domain resulting in a CPF1 variant that is capable of making single stranded nicks.

In certain embodiments, the disclosure relates to methods of substituting a replacement sequence into a genome, e.g., chromosome or other double stranded nucleic acid comprising a) providing a eukaryotic cell comprising a target sequence within a chromosome; b) inserting into the eukaryotic cell a CPF1 protein or variant or a vector comprising a nucleic acid sequence encoding a CPF1 protein having a variant SEQ ID NO: 1 wherein the variant comprises one or more amino acid substitutions, additions, and/or deletions in a nuclease/cleavage domain such that the cleavage function is inactivated in operable combination with a promoter; c) inserting into the eukaryotic cell a guide RNA or a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairs to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA providing a nicked target sequence; and d) inserting into the eukaryotic cell a replacement double stranded nucleic acid under conditions such that homologous recombination of the replacement double stranded nucleic acid and the nicked target sequences provides a substitution of the nicked target sequence with the replacement double stranded nucleic acid.

In certain embodiments, the disclosure relates to methods of increasing or decreasing the expression of a gene comprising a) providing a eukaryotic cell comprising a target gene sequence; b) inserting into the eukaryotic cell CPF1 fusion or a vector comprising a nucleic acid sequence encoding a CPF1 fusion having a variant SEQ ID NO: 1, wherein the variant comprises one or more amino acid substitutions, additions, and/or deletions in one or more nuclease/cleavage domain such that the cleavage function is inactivated, wherein CPF1 is a fusion comprising a transcriptional activator, transcriptional repressor, histone-modifying protein, integrase, and/or recombinase in operable combination with a promoter; c) inserting into the eukaryotic cell a guide RNA or a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairs to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA providing altered expression of the gene due to the transcriptional activator, transcriptional repressor, histone-modifying protein, integrase, and/or recombinase.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1A illustrates the organization of the CRISPR-Cpf1 locus in F. novicida U112. There are four cas genes in the locus, cpf1 followed by cas4, cas1, cas2 and the CRISPR array. The CRISPR array spacers referenced in this paper are referred to by their number, as labeled in the array. The "P" and arrow indicate the promoter for the array. Wild-type F. novicida has spacers against a putative prophage in the CRISPR-Cpf1 array, which correspond with spacers 4 and 5. 'Cpf1 Target' vectors were made by inserting the protospacer, prophage-derived sequences that spanned the region identical to the U112 spacer with 7-10 nucleotide flanks on each side of the spacer, into a pBav vector.

Figure 1B shows Cpf1_Target_5 plasmid (SEQ ID NO: 42 and 43) aligned to spacer (SEQ ID NO: 44) in cpf1 CRISPR array. Figure 1C shows Cpf1_Target_4 plasmid (SEQ ID NO: 45 and 46) aligned to spacer (SEQ ID NO: 47) in cpf1 CRISPR array.

Figure 1D shows data where F. novicida was transformed with 1 μg of both the pBav vector 'Control', which does not contain a sequence corresponding to a spacer in the CRISPR array, and the 'Cpf1 Target' protospacer-containing plasmids with predicted target sequences. The number of transformants with each plasmid was evaluated. F. novicida inhibited transformation with a 'Cpf1_Target_5' plasmid containing a putative prophage sequence that encompassed the protospacer and flanking nucleotides corresponding with spacer 5 in the CRISPR-Cpf1 array. Transformation with the target-5 plasmid was inhibited by over 3 log relative to the non-targeted vector control.

Figure 1E shows data indicating F. novicida inhibited transformation by a Cpf1_Target_4 plasmid containing a putative prophage sequence that encompassed the protospacer and flanking nucleotides corresponding with spacer 4 in the CRISPR-Cpf1 array. F. novicida is capable of inhibiting transformation by sequences found in the CRISPR-Cpf1 array.

Figure 1F shows data where clean deletions in wild-type F. novicida U112 were made for all of the cas genes: Δcpf1, Δcas4, Δcas1-2. Each mutant was transformed with a Control and Cpf1_Target_5 plasmid and transformation efficiency was evaluated. The Δcpf1 mutant was transformed with the Cpf1_target_5 and control plasmids at the same efficiency, indicating that this strain was unable to restrict plasmid transformation. In contrast, cas4, cas1, and cas2 were dispensable for interference activity, as both Δcas4 and Δcas1-2 mutants inhibited transformation with the target plasmids at the same efficiency as wild-type F. novicida U112.

Figure 1G shows data where complementation of the Δcpf1 mutant in cis with cpf1 restored plasmid inhibition to this strain for the Cpf1_target_5 plasmid, while remaining permissive to transformation with the control. These in vivo results indicate that Cpf1 is the only Cas protein encoded in its CRISPR-Cas locus that is required for target inhibition.

Figure 2A illustrates representative conserved active site motifs in Francisella novicida U112. Bolded, labeled residues represent the predicted active sites. Based on domain conservation between Cpf1 homologs, the active site residues were predicted to comprise a RuvC-like catalytic DED triad (D917A relative to SEQ ID NO: 48, E1006A relative to SEQ ID NO: 49, D1255A relative to SEQ ID NO: 50). Figure 2B shows data where conserved residues in each active site domain, including the DED motif (D917A, E1006A, D1255A), were mutated in the chromosome, and each new strain was tested for Cpf1 activity by transformation with a control and Cpf1_Target_5 plasmid. The N897A mutation in the RuvCI domain, which did not disrupt transformation efficiency, was used as a positive control. Mutations in all three domains effectively restored transformation efficiency with the target plasmid to that of the control plasmid, indicating that all three residues are essential for DNA targeting. A series of additional mutations was made in conserved residues that were predicted to be involved in target or non-target strand cleavage, and disruption of W971, E1028, or Q1056 was found to result in loss of the ability to restrict transformation with the Cpf1_target_5 plasmid, bringing the transformation efficiency with the Cpf1_target_5 plasmid to that of the control plasmid. These results indicate that D917A, E1006A, D1255A, W971, E1028, and Q1056 are all required for plasmid inhibition by FnCpf1.

Figure 2C shows data for mutations in the predicted Arginine- and Lysine-rich RNA binding domain. Mutations near predicted interaction domains, K839A and H881A, do not have an effect on plasmid inhibition, while mutation R833A, which is most conserved within the positively charged domain of the protein, prevents transformation inhibition. This residue most likely directly interacts with the crRNA backbone, and disruption prevents RNA binding.
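
The substitution labels used for Figures 2A to 2C (for example D917A) follow the common convention of original residue, one-based position, and replacement residue. As a minimal sketch of how such a label could be applied to a protein sequence in silico, the following Python snippet parses the label and checks that the stated original residue is actually present. The function name `apply_substitution` and the five-residue toy peptide are hypothetical and do not represent SEQ ID NO: 1.

```python
import re


def apply_substitution(protein, label):
    """Apply a point substitution such as 'D3A' to a protein string.

    The label is read as <original residue><1-based position><new residue>.
    A ValueError is raised if the label is malformed or if the residue at
    that position does not match the stated original residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    original, replacement = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based indexing
    if not 0 <= index < len(protein) or protein[index] != original:
        raise ValueError(f"{label} does not match the sequence")
    return protein[:index] + replacement + protein[index + 1:]


# Hypothetical toy peptide: aspartate at position 3 replaced by alanine.
mutant = apply_substitution("MKDEL", "D3A")
```

Checking the original residue guards against numbering mismatches between a label and the sequence it is applied to.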

Figure 3 shows the protospacer adjacent motif (PAM) requirements on the target DNA in order for Cpf1 to be able to restrict transformation with target plasmids. FnCpf1 uses a 5' TTTN PAM.

Figure 3 shows data where wild-type F. novicida U112 and a Δcpf1 mutant were transformed with Control and Cpf1_Target_5 plasmid (SEQ ID NO: 52) derivatives containing a panel of PAM mutations (SEQ ID NO: 51, 53-57). The 5' CCCN (SEQ ID NO: 57) PAM mutation was most able to restore transformation efficiency at 48 hours post transformation. Of the 5' PAM bases, the mutation of the -2 position to a 5' TTCN (SEQ ID NO: 53) mutant PAM has the largest individual effect of the bases on disrupting the ability of Cpf1 to inhibit transformation with a target plasmid. Data are presented as relative transformation efficiency of the WT F. novicida to a Δcpf1 mutant with each plasmid.
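
The transformation data described for Figures 1D and 3 are expressed as ratios of transformant counts. As a minimal sketch, assuming raw colony counts are available for a targeted condition and a reference condition, relative efficiency and log reduction could be computed as follows. The function names and the colony counts are hypothetical and are not data from the disclosure.

```python
import math


def relative_efficiency(target_count, reference_count):
    """Ratio of transformants in the targeted vs. reference condition."""
    if reference_count <= 0:
        raise ValueError("reference count must be positive")
    return target_count / reference_count


def log_reduction(target_count, reference_count):
    """Log10 fold reduction of the targeted condition vs. the reference."""
    ratio = relative_efficiency(target_count, reference_count)
    if ratio <= 0:
        raise ValueError("no transformants: log reduction is unbounded")
    return -math.log10(ratio)


# Hypothetical counts: 50 colonies with a target plasmid versus
# 100,000 colonies with a non-targeted control plasmid.
reduction = log_reduction(50, 100_000)
```

With these made-up counts the reduction exceeds 3 log, which is the kind of comparison the figure descriptions refer to.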

Figure 4A illustrates secondary structure of representative CRISPR-Cpf1 repeats. Repeats have a stem-loop on the 3' of the repeat (adjacent to the 5' of the spacer) with a stem 5 nucleotides long and a 4 nucleotide loop. Figure 4B illustrates a CRISPR array and the reprogrammed crRNA for a new spacer. The ΔcrRNA strain is wild-type F. novicida with a deletion of the CRISPR array. The reprogrammed strain locus is the ΔcrRNA strain complemented with a repeat-(non-native spacer)-repeat. The non-native spacer targets the backbone of the plasmid vector, which is present in both the control and Cpf1_target_5 construct.

Figure 4C shows data on transformation efficiency of U112, ΔcrRNA, and ΔcrRNA+Reprogrammed with a Cpf1_Target_5 and Control plasmid. Wild-type F. novicida is able to restrict transformation by the target plasmid only; ΔcrRNA is permissive to transformation by both plasmids. In the reprogrammed strain, which targets the plasmid backbone of both the control and Cpf1_target_5 plasmids, Cpf1 inhibits transformation by both plasmids, indicating effective reprogramming of the strain for new targets.
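
The stem-loop geometry described for Figure 4A (a 5 nucleotide stem and a 4 nucleotide loop) can be checked on a candidate repeat sequence with a simple scan. The following Python snippet is a minimal sketch, assuming only Watson-Crick pairs count toward the stem (G-U wobble pairs are ignored). The function name `find_hairpins` and the toy RNA are hypothetical and are not a repeat sequence from the disclosure.

```python
# Watson-Crick RNA base pairs only; wobble pairs are deliberately excluded.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_hairpins(rna, stem_len=5, loop_len=4):
    """Return (start, left_stem, loop, right_stem) for each perfect hairpin."""
    rna = rna.upper().replace("T", "U")
    span = 2 * stem_len + loop_len
    hits = []
    for start in range(len(rna) - span + 1):
        left = rna[start:start + stem_len]
        loop = rna[start + stem_len:start + stem_len + loop_len]
        right = rna[start + stem_len + loop_len:start + span]
        # The left stem pairs with the right stem read in reverse.
        if all((l, r) in PAIRS for l, r in zip(left, reversed(right))):
            hits.append((start, left, loop, right))
    return hits


# Hypothetical toy RNA: 2-nt flank, 5-nt stem, 4-nt loop, 5-nt stem, 1 nt.
toy_repeat = "AA" + "GGCAC" + "UUCG" + "GUGCC" + "U"
hairpins = find_hairpins(toy_repeat)
```

A perfect-match scan like this is only a screening step; a dedicated RNA folding tool would be needed to assess stability.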

DETAILED DESCRIPTION

Before the present disclosure is described in greater detail, it is to be understood that this disclosure is not limited to particular embodiments described, and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, the preferred methods and materials are now described.

All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present disclosure is not entitled to antedate such publication by virtue of prior disclosure. Further, the dates of publication provided could be different from the actual publication dates that may need to be independently confirmed. As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure. Any recited method can be carried out in the order of events recited or in any other order that is logically possible.

Embodiments of the present disclosure will employ, unless otherwise indicated, techniques of medicine, organic chemistry, biochemistry, molecular biology, pharmacology, and the like, which are within the skill of the art. Such techniques are explained fully in the literature.

It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a support" includes a plurality of supports. In this specification and in the claims that follow, reference will be made to a number of terms that shall be defined to have the following meanings unless a contrary intention is apparent.

Prior to describing the various embodiments, the following definitions are provided and should be used unless otherwise indicated.

As used herein, "subject" refers to any animal, preferably a human patient, livestock, or domestic pet.

As used herein, the term "nucleic acid" refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5' to the 3' end. The "nucleic acid" may also optionally contain non-naturally occurring or altered nucleotide bases that permit correct read through by a polymerase and do not reduce expression of a polypeptide encoded by that nucleic acid. The term "nucleotide sequence" or "nucleic acid sequence" refers to both the sense and antisense strands of a nucleic acid as either individual single strands or in the duplex. The term "ribonucleic acid" (RNA) is inclusive of RNAi (inhibitory RNA), dsRNA (double stranded RNA), siRNA (small interfering RNA), mRNA (messenger RNA), miRNA (micro-RNA), tRNA (transfer RNA, whether charged or discharged with a corresponding acylated amino acid), and cRNA (complementary RNA) and the term "deoxyribonucleic acid" (DNA) is inclusive of cDNA and genomic DNA and DNA-RNA hybrids. The words "nucleic acid segment", "nucleotide sequence segment", or more generally "segment" will be understood by those in the art as a functional term that includes both genomic sequences, ribosomal RNA sequences, transfer RNA sequences, messenger RNA sequences, small regulatory RNAs, operon sequences and smaller engineered nucleotide sequences that express or may be adapted to express, proteins, polypeptides or peptides.

Nucleic acids of the present disclosure may also be synthesized, either completely or in part, especially where it is desirable to provide plant-preferred sequences, by methods known in the art. Thus, all or a portion of the nucleic acids of the present disclosure may be synthesized using codons preferred by a selected host. Species-preferred codons may be determined, for example, from the codons used most frequently in the proteins expressed in a particular host species. Other modifications of the nucleotide sequences may result in mutants having slightly altered activity.
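
As a minimal sketch of determining species-preferred codons from usage frequency, the following Python snippet counts codons in a set of host coding sequences, picks the most frequent codon for each amino acid, and back-translates a peptide with those codons. It assumes the standard genetic code and in-frame coding sequences; the function names and the toy host sequence are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

# Standard genetic code, built in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def preferred_codons(coding_sequences):
    """Map each observed amino acid to its most frequently used codon."""
    usage = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                usage[codon] += 1
    preferred = {}
    for codon in sorted(usage):  # sorted so ties resolve alphabetically
        aa = CODON_TABLE[codon]
        if aa not in preferred or usage[codon] > usage[preferred[aa]]:
            preferred[aa] = codon
    return preferred


def back_translate(protein, preferred):
    """Encode a peptide with preferred codons (KeyError if one is unseen)."""
    return "".join(preferred[aa] for aa in protein)


# Hypothetical host coding sequence: M K K K E stop, with AAA favored for K.
host_cds = ["ATG" + "AAA" + "AAA" + "AAG" + "GAA" + "TAA"]
table = preferred_codons(host_cds)
optimized = back_translate("MKE", table)
```

Real codon optimization typically weighs further factors such as GC content and mRNA structure; this sketch covers only the frequency-based choice described above.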

The term "a nucleic acid sequence encoding" a specified polypeptide refers to a nucleic acid sequence comprising the coding region of a gene or in other words the nucleic acid sequence which encodes a gene product. The coding region may be present in either a cDNA, genomic DNA or RNA form. When present in a DNA form, the oligonucleotide, polynucleotide, or nucleic acid may be single-stranded (i.e., the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present disclosure may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

The term "cDNA" refers to complementary DNA (cDNA), i.e., DNA synthesized from an RNA (e.g. mRNA) template typically catalyzed by the enzymes reverse transcriptase and DNA polymerase.

DNA molecules are said to have "5' ends" and "3' ends" because mononucleotides are reacted to make oligonucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring. An end of an oligonucleotide is referred to as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of another mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5' and 3' ends. In either a linear or circular DNA molecule, discrete elements are referred to as being "upstream" or 5' of the "downstream" or 3' elements. This terminology reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA strand. The promoter and enhancer elements which direct transcription of a linked gene are generally located 5' or upstream of the coding region. However, enhancer elements can exert their effect even when located 3' of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3' or downstream of the coding region.
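
Because sequences are conventionally written 5' to 3', the strand complementary to a given sequence is obtained by complementing each base and reversing the order. The following Python snippet is a minimal sketch of that operation for unambiguous DNA bases; the function name `reverse_complement` and the example sequence are illustrative assumptions.

```python
# Translation table for the four unambiguous DNA bases (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# Hypothetical example: 5'-ATGCCT-3' pairs with 5'-AGGCAT-3'.
antisense = reverse_complement("ATGCCT")
```

Applying the function twice returns the original sequence, which is a quick sanity check on any implementation.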

The term "gene" refers to a nucleic acid (e.g., DNA or RNA) sequence that comprises coding sequences necessary for the production of an RNA, or a polypeptide or its precursor (e.g., proinsulin). A functional polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence as long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, etc.) of the polypeptide are retained. The term "portion" when used in reference to a gene refers to fragments of that gene. The fragments may range in size from a few nucleotides to the entire gene sequence minus one nucleotide. Thus, "a nucleotide comprising at least a portion of a gene" may comprise fragments of the gene or the entire gene. The term "gene" also encompasses the coding regions of a structural gene and includes sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5' of the coding region and which are present on the mRNA are referred to as 5' non-translated sequences. The sequences which are located 3' or downstream of the coding region and which are present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene which are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5' and 3' end of the sequences which are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3' flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.

The term "heterologous gene" refers to a gene encoding a factor that is not in its natural environment (i.e., has been altered by the hand of man). For example, a heterologous gene includes a gene from one species introduced into another species. A heterologous gene also includes a gene native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer sequence, etc.). Heterologous genes may comprise bacterial gene sequences that comprise cDNA forms of a bacterial gene; the cDNA sequences may be expressed in either a sense (to produce mRNA) or anti-sense orientation (to produce an anti-sense RNA transcript that is complementary to the mRNA transcript).

The terms "complementary" and "complementarity" refer to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "A-G-T" is complementary to the sequence "T-C-A." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids.
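As a minimal sketch of the base-pairing rules described above, the following Python functions compute the complement of a DNA sequence and the fraction of positions at which two equal-length strands pair. The function names and the position-by-position comparison are illustrative assumptions and are not part of the disclosure.

```python
# Watson-Crick pairing rules for DNA (A pairs with T, G pairs with C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())

def complementarity(seq_a, seq_b):
    """Fraction of positions where seq_b pairs with seq_a (1.0 = complete)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIRS[a] == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return paired / len(seq_a)
```

Under these assumptions, `complement("AGT")` gives `"TCA"`, matching the example in the text, and a `complementarity` value below 1.0 corresponds to partial complementarity.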

The nucleic acid molecules or guide or targeting RNA disclosed herein are capable of specifically hybridizing to the target nucleic acid under certain circumstances. As used herein, two nucleic acid molecules are said to be capable of specifically hybridizing to one another if the two molecules are capable of forming a hydrogen bonding nucleic acid structure. A nucleic acid molecule may exhibit complete complementarity. Two molecules are said to be "minimally complementary" if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under at least conventional "low-stringency" conditions. Similarly, the molecules are said to be complementary if they can hybridize to one another with sufficient stability to permit them to remain annealed to one another under conventional "high-stringency" conditions. Conventional stringency conditions are described by Sambrook, et al. (1989), and by Haymes et al. (1985).

Departures from complete complementarity are therefore permissible, as long as such departures do not completely preclude the capacity of the RNA molecules to form a hydrogen bonding structure with the target. Thus, in order for an RNA to serve as a guide to the target, the RNA needs only be sufficiently complementary in sequence to be able to form a stable hydrogen bonding structure under the physiological conditions of the cell expressing the RNA.

The term "recombinant" when made in reference to a nucleic acid molecule refers to a nucleic acid molecule which is comprised of segments of nucleic acid joined together by means of molecular biological techniques. The term "recombinant" when made in reference to a protein or a polypeptide refers to a protein molecule which is expressed using a recombinant nucleic acid molecule.

A "cloning vector" or "vector" refers to a nucleic acid molecule used as a vehicle to carry foreign genetic material into another cell, where it can be replicated and/or expressed. A cloning vector containing foreign nucleic acid is termed a recombinant vector. Examples of vectors are plasmids, viral vectors, cosmids, and artificial chromosomes. Recombinant vectors typically contain an origin of replication, a multicloning site, and a selectable marker. The nucleic acid sequence typically consists of an insert (recombinant nucleic acid or transgene) and a larger sequence that serves as the "backbone" of the vector. The purpose of a vector which transfers genetic information to another cell is typically to isolate, multiply, or express the insert in the target cell. Expression vectors (expression constructs) are for the expression of the transgene in the target cell, and generally have a promoter sequence that drives expression of the transgene. Insertion of a vector into the target cell is referred to as transformation for bacterial cells and transfection for eukaryotic cells, although insertion of a viral vector is often called transduction.

The terms "in operable combination", "in operable order" and "operably linked" refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

The term "regulatory element" refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, etc.

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription (Maniatis, et al., Science 236:1237, 1987). Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect, mammalian and plant cells. Promoter and enhancer elements have also been isolated from viruses and analogous control elements, such as promoters, are also found in prokaryotes. The selection of a particular promoter and enhancer depends on the cell type used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types (for review, see Voss, et al., Trends Biochem. Sci., 11:287, 1986; and Maniatis, et al., supra 1987).

The terms "promoter element," "promoter," or "promoter sequence" as used herein, refer to a DNA sequence that is located at the 5' end of (i.e., precedes) the protein coding region of a DNA polymer. The location of most promoters known in nature precedes the transcribed region. The promoter functions as a switch, activating the expression of a gene. If the gene is activated, it is said to be transcribed, or participating in transcription. Transcription involves the synthesis of mRNA from the gene. The promoter, therefore, serves as a transcriptional regulatory element and also provides a site for initiation of transcription of the gene into mRNA. The term "cell type specific" as applied to a promoter refers to a promoter which is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. Promoters may be constitutive or regulatable. The term "constitutive" when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid sequence in the absence of a stimulus (e.g., heat shock, chemicals, light, etc.). Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue. In contrast, a "regulatable" or "inducible" promoter is one which is capable of directing a level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat shock, chemicals, light, etc.) which is different from the level of transcription of the operably linked nucleic acid sequence in the absence of the stimulus.

The enhancer and/or promoter may be "endogenous" or "exogenous" or "heterologous." An "endogenous" enhancer or promoter is one that is naturally linked with a given gene in the genome. An "exogenous" or "heterologous" enhancer or promoter is one that is placed in juxtaposition to a gene by means of genetic manipulation (i.e., molecular biological techniques) such that transcription of the gene is directed by the linked enhancer or promoter, e.g., heterologous because the promoter and the gene are from different organisms. For example, an endogenous promoter in operable combination with a first gene can be isolated, removed, and placed in operable combination with a second gene, thereby making it a "heterologous promoter" in operable combination with the second gene, as the two do not occur together in nature.

Efficient expression of recombinant DNA sequences in eukaryotic cells is believed to include the expression of signals directing the efficient termination and polyadenylation of the resulting transcript. Transcription termination signals are generally found downstream of the polyadenylation signal and are a few hundred nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used herein denotes a DNA sequence which directs both the termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript is desirable, as transcripts lacking a poly(A) tail are unstable and are rapidly degraded. The poly(A) signal utilized in an expression vector may be "heterologous" or "endogenous." An endogenous poly(A) signal is one that is found naturally at the 3' end of the coding region of a given gene in the genome. A heterologous poly(A) signal is one which has been isolated from one gene and positioned 3' to another gene.

A "selectable marker" is a nucleic acid introduced into a recombinant vector that encodes a polypeptide that confers a trait suitable for artificial selection or identification (reporter gene), e.g., beta-lactamase confers antibiotic resistance, which allows an organism expressing beta-lactamase to survive in the presence of an antibiotic in a growth medium. Another example is thymidine kinase, which makes the host sensitive to ganciclovir selection. It may be a screenable marker that allows one to distinguish between wanted and unwanted cells based on the presence or absence of an expected color. For example, the lac-z-gene produces a beta-galactosidase enzyme which confers a blue color in the presence of X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside). If recombinant insertion inactivates the lac-z-gene, then the resulting colonies are colorless. There may be one or more selectable markers, e.g., an enzyme that can complement the inability of an expression organism to synthesize a particular compound required for its growth (auxotrophic) and one able to convert a compound to another that is toxic for growth. URA3, an orotidine-5' phosphate decarboxylase, is necessary for uracil biosynthesis and can complement ura3 mutants that are auxotrophic for uracil. URA3 also converts 5-fluoroorotic acid into the toxic compound 5-fluorouracil. Additional contemplated selectable markers include any genes that impart antibacterial resistance or express a fluorescent protein. Examples include, but are not limited to, the following genes: ampr, camr, tetr, blasticidinr, neor, hygr, abxr, neomycin phosphotransferase type II gene (nptII), β-glucuronidase (gus), green fluorescent protein (gfp), egfp, yfp, mCherry, β-galactosidase (lacZ), lacZα, lacZΔM15, chloramphenicol acetyltransferase (cat), alkaline phosphatase (phoA), bacterial luciferase (luxAB), bialaphos resistance gene (bar), phosphomannose isomerase (pmi), xylose isomerase (xylA), arabitol dehydrogenase (atlD), UDP-glucose:galactose-1-phosphate uridyltransferase I (galT), feedback-insensitive α subunit of anthranilate synthase (OASA1D), 2-deoxy glucose (2-DOGR), benzyladenine-N-3-glucuronide, E. coli threonine deaminase, glutamate 1-semialdehyde aminotransferase (GSA-AT), D-amino acid oxidase (DAAO), salt-tolerance gene (rstB), ferredoxin-like protein (pflp), trehalose-6-P synthase gene (AtTPS1), lysine racemase (lyr), dihydrodipicolinate synthase (dapA), tryptophan synthase beta 1 (AtTSB1), dehalogenase (dhlA), mannose-6-phosphate reductase gene (M6PR), hygromycin phosphotransferase (HPT), and D-serine ammonialyase (dsdA).

A "label" refers to a detectable compound or composition that is conjugated directly or indirectly to another molecule, such as an antibody or a protein, to facilitate detection of that molecule. Specific, non-limiting examples of labels include fluorescent tags, enzymatic linkages, and radioactive isotopes. In one example, a "label receptor" refers to incorporation of a heterologous polypeptide in the receptor. A label includes the incorporation of a radiolabeled amino acid or the covalent attachment of biotinyl moieties to a polypeptide that can be detected by marked avidin (for example, streptavidin containing a fluorescent marker or enzymatic activity that can be detected by optical or colorimetric methods). Various methods of labeling polypeptides and glycoproteins are known in the art and may be used. Examples of labels for polypeptides include, but are not limited to, the following: radioisotopes or radionucleotides (such as 35S or 131I), fluorescent labels (such as fluorescein isothiocyanate (FITC), rhodamine, lanthanide phosphors), enzymatic labels (such as horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase), chemiluminescent markers, biotinyl groups, predetermined polypeptide epitopes recognized by a secondary reporter (such as leucine zipper pair sequences, binding sites for secondary antibodies, metal binding domains, epitope tags), or magnetic agents, such as gadolinium chelates. In some embodiments, labels are attached by spacer arms of various lengths to reduce potential steric hindrance.

In certain embodiments, the disclosure relates to recombinant polypeptides comprising sequences disclosed herein or variants or fusions thereof wherein the amino terminal end or the carboxy terminal end of the amino acid sequence is optionally attached to a heterologous amino acid sequence, label, or reporter molecule.

In certain embodiments, the disclosure relates to the recombinant vectors comprising a nucleic acid encoding a polypeptide disclosed herein or fusion protein thereof.

In certain embodiments, the recombinant vector optionally comprises a mammalian, human, insect, viral, bacterial, bacterial plasmid, yeast associated origin of replication or gene such as a gene or retroviral gene or lentiviral LTR, TAR, RRE, PE, SLIP, CRS, and INS nucleotide segment or gene selected from tat, rev, nef, vif, vpr, vpu, and vpx or structural genes selected from gag, pol, and env.

In certain embodiments, the recombinant vector optionally comprises a gene vector element (nucleic acid) such as a selectable marker region, lac operon, a CMV promoter, a hybrid chicken β-actin/CMV enhancer (CAG) promoter, tac promoter, T7 RNA polymerase promoter, SP6 RNA polymerase promoter, SV40 promoter, internal ribosome entry site (IRES) sequence, cis-acting woodchuck post-transcriptional regulatory element (WPRE), scaffold-attachment region (SAR), inverted terminal repeats (ITR), FLAG tag coding region, c-myc tag coding region, metal affinity tag coding region, streptavidin binding peptide tag coding region, polyHis tag coding region, HA tag coding region, MBP tag coding region, GST tag coding region, polyadenylation coding region, SV40 polyadenylation signal, SV40 origin of replication, ColE1 origin of replication, f1 origin, pBR322 origin, or pUC origin, TEV protease recognition site, loxP site, Cre recombinase coding region, or a multiple cloning site such as having 5, 6, or 7 or more restriction sites within a continuous segment of less than 50 or 60 nucleotides or having 3 or 4 or more restriction sites within a continuous segment of less than 20 or 30 nucleotides.

"Sequence identity" refers to a measure of relatedness between two or more nucleic acids or proteins, and is typically given as a percentage with reference to the total comparison length. The identity calculation takes into account those nucleotide or amino acid residues that are identical and in the same relative positions in their respective larger sequences. Calculations of identity may be performed by algorithms contained within computer programs such as "GAP" (Genetics Computer Group, Madison, Wis.) and "ALIGN" (DNAStar, Madison, Wis.) using default parameters. In certain embodiments, sequence "identity" refers to the number of exactly matching residues (expressed as a percentage) in a sequence alignment between two sequences of the alignment. In certain embodiments, percentage identity of an alignment may be calculated using the number of identical positions divided by the greater of the shortest sequence or the number of equivalent positions excluding overhangs wherein internal gaps are counted as an equivalent position. For example the polypeptides GGGGGG and GGGGT have a sequence identity of 4 out of 5 or 80%. For example, the polypeptides GGGPPP and GGGAPPP have a sequence identity of 6 out of 7 or 85%.

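The percentage identity calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the two sequences are assumed to be already aligned, internal gaps are assumed to be marked with a hyphen, and the function name `percent_identity` is hypothetical. Terminal gaps are treated as overhangs and excluded, while internal gaps count as equivalent positions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences (gaps marked with `gap`)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    def residue_span(seq):
        # Indices of the first and last non-gap characters.
        idx = [i for i, ch in enumerate(seq) if ch != gap]
        if not idx:
            raise ValueError("sequence contains no residues")
        return idx[0], idx[-1]

    a_first, a_last = residue_span(aligned_a)
    b_first, b_last = residue_span(aligned_b)
    # Overhangs (terminal gaps) are excluded; internal gaps stay in the window.
    start, end = max(a_first, b_first), min(a_last, b_last)
    equivalent = max(end - start + 1, 0)
    identical = sum(
        1 for i in range(start, end + 1)
        if aligned_a[i] == aligned_b[i] and aligned_a[i] != gap
    )
    shortest = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    # Denominator: the greater of the shortest sequence or the equivalent positions.
    return 100.0 * identical / max(shortest, equivalent)
```

With the alignments `GGGGGG`/`GGGGT-` and `GGG-PPP`/`GGGAPPP`, this sketch returns 80.0 and approximately 85.7 (6 out of 7), corresponding to the two worked examples in the text.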
In certain embodiments, for any contemplated percentage sequence identity, it is also contemplated that the sequence may have the same percentage of sequence similarity. Percent "similarity" is used to quantify the extent of similarity, e.g., hydrophobicity, hydrogen bonding potential, electrostatic charge, of amino acids between two sequences of the alignment. This method is similar to determining the identity except that certain amino acids do not have to be identical to have a match. In certain embodiments, sequence similarity may be calculated with well-known computer programs using default parameters. Typically, amino acids are classified as matches if they are among a group with similar properties, e.g., according to the following amino acid groups: aromatic - F Y W; hydrophobic - A V I L; charged positive - R K H; charged negative - D E; polar - S T N Q.
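A minimal sketch of the similarity calculation, using the amino acid groups listed above, is given below. It assumes two ungapped sequences of equal length that are already aligned position by position; the names `GROUPS`, `is_similar`, and `percent_similarity` are hypothetical and chosen only for illustration.

```python
# Amino acid groups as listed in the text (one-letter codes).
GROUPS = [set("FYW"), set("AVIL"), set("RKH"), set("DE"), set("STNQ")]

def is_similar(res_a, res_b):
    """True if residues are identical or fall in the same property group."""
    return res_a == res_b or any(res_a in g and res_b in g for g in GROUPS)

def percent_similarity(seq_a, seq_b):
    """Percent of aligned positions that are identical or similar (no gaps)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(is_similar(a, b) for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

For instance, `FAKDS` and `YVRET` share no identical residues, yet every position pairs residues from the same group, so the sketch scores them as 100% similar.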

A partially complementary sequence is one that at least partially inhibits (or competes with) a completely complementary sequence from hybridizing to a target nucleic acid - also referred to as "substantially homologous." The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a sequence which is completely homologous to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

The following terms are used to describe the sequence relationships between two or more polynucleotides: "reference sequence", "sequence identity", "percentage of sequence identity", and "substantial identity". A "reference sequence" is a defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of a full-length cDNA sequence given in a sequence listing or may comprise a complete gene sequence. Generally, a reference sequence is at least 20 nucleotides in length, frequently at least 25 nucleotides in length, and often at least 50 nucleotides in length. Since two polynucleotides may each (1) comprise a sequence (i.e., a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) may further comprise a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window", as used herein, refers to a conceptual segment of at least 20 contiguous nucleotide positions wherein a polynucleotide sequence may be compared to a reference sequence of at least 20 contiguous nucleotides and wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by the local homology algorithm of Smith and Waterman (Smith and Waterman, Adv. Appl. Math. 2:482 (1981)), by the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, J. Mol. Biol. 48:443 (1970)), by the search for similarity method of Pearson and Lipman (Pearson and Lipman, Proc. Natl. Acad. Sci. (U.S.) 85:2444 (1988)), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by inspection, and the best alignment (i.e., resulting in the highest percentage of homology over the comparison window) generated by the various methods is selected. In certain embodiments, the term "sequence identity" means that two polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. In some embodiments, the term "percentage of sequence identity" over a comparison window is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T/U, C, G, or I) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.

The term "variant" when used in reference to a polypeptide refers to an amino acid sequence that differs by one or more amino acids from another, usually related polypeptide. The variant may have "conservative" changes, wherein a substituted amino acid has similar structural or chemical properties. One type of conservative amino acid substitution refers to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine. More rarely, a variant may have "non-conservative" changes (e.g., replacement of a glycine with a tryptophan). Similar minor variations may also include amino acid deletions or insertions (in other words, additions), or both. Guidance in determining which and how many amino acid residues may be substituted, inserted or deleted without abolishing biological activity may be found using computer programs well known in the art, for example, DNAStar software. Variants can be tested in functional assays. Certain variants have less than 10%, and preferably less than 5%, and still more preferably less than 2% changes (whether substitutions, deletions, and so on).
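The side-chain groups above can be used to label each substitution in a variant as conservative or non-conservative and to estimate the percentage of changed positions. The sketch below is illustrative only: it assumes the variant and reference have equal length (substitutions only, no insertions or deletions), treats any two residues in the same listed side-chain group as a conservative pair, and uses hypothetical names throughout.

```python
# Side-chain groups listed in the text (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur": set("CM"),
}

def classify_substitutions(reference, variant):
    """List (position, ref, var, kind) for each substituted residue (1-based)."""
    if not reference or len(reference) != len(variant):
        raise ValueError("this sketch handles equal-length, non-empty sequences only")
    changes = []
    for pos, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref == var:
            continue
        shared = any(ref in g and var in g for g in SIDE_CHAIN_GROUPS.values())
        changes.append((pos, ref, var, "conservative" if shared else "non-conservative"))
    return changes

def percent_changed(reference, variant):
    """Percentage of reference positions that differ in the variant."""
    return 100.0 * len(classify_substitutions(reference, variant)) / len(reference)
```

For the made-up pair `MKVLG` and `MRVLW`, the lysine-to-arginine change at position 2 is labeled conservative and the glycine-to-tryptophan change at position 5 is labeled non-conservative, consistent with the example given in the text.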

CRISPR-associated protein Cpf1 of Francisella novicida (CPF1)

The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive immune system that has been modified to enable general genome engineering in a variety of organisms and cell lines. CRISPR-Cas (CRISPR associated) systems are protein-RNA complexes that use an RNA molecule (gRNA) as a guide to localize the complex to a target nucleic acid sequence via base-pairing. In the natural systems, a Cas protein then acts as a nuclease to cleave the targeted DNA sequence. The target DNA sequence must be complementary to the gRNA and must also contain a "protospacer-adjacent motif" (PAM) dinucleotide adjacent to the complementary region in order for the system to function. Among the known Cas proteins, S. pyogenes Cas9 has been most widely used as a tool for genome engineering. This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish nuclease activity, resulting in a nuclease inactive Cas9 that still retains its ability to bind DNA in a gRNA-programmed manner. By creating Cas9 fusion proteins with protein domains that alter the rate of gene transcription into mRNA, e.g., transcription factors and regulators, the CRISPR-Cas system functions as an RNA guided gene expression controller.

It has been discovered that a protein in Francisella novicida targets nucleic acids and provides sequence-specific DNA targeting directed by the spacer sequences from the CPF1-associated CRISPR repeats in the genome, independent of Cas9. Thus, CPF1 is contemplated to be useful in all the applications in which Cas9 is useful. In addition, it has also been discovered that CPF1 requires a different protospacer-adjacent motif (PAM), located 5' of the target DNA sequence, a further distinction from Cas9. Bioinformatic predictions do not identify accessory RNAs in the Cpf1 loci, and the conserved secondary structure of the repeats indicates that a single crRNA is used by Cpf1 to guide it to target sequences. This simplicity differs from Cas9, which uses a tracrRNA-crRNA hybrid.
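To illustrate what a PAM located 5' of the target sequence means in practice, the following sketch scans one strand of a DNA sequence for a motif and reports the sequence immediately 3' of each occurrence as a candidate protospacer. The motif `TTN`, the protospacer length of 20, and the function name are placeholder assumptions for illustration; this passage does not state the PAM sequence or the target length, and the sketch does not scan the reverse strand.

```python
import re

def find_5prime_pam_sites(dna, pam="TTN", protospacer_len=20):
    """Return (pam_start, protospacer) pairs where `pam` lies immediately 5' of the protospacer."""
    dna = dna.upper()
    # Translate the IUPAC wildcard N into a regex character class.
    pattern = "".join("[ACGT]" if base == "N" else base for base in pam.upper())
    sites = []
    # A zero-width lookahead lets overlapping PAM occurrences be reported.
    for match in re.finditer(f"(?=({pattern}))", dna):
        proto_start = match.start() + len(pam)
        protospacer = dna[proto_start:proto_start + protospacer_len]
        # Keep only sites with a full-length protospacer downstream of the PAM.
        if len(protospacer) == protospacer_len:
            sites.append((match.start(), protospacer))
    return sites
```

Both the motif and the length are parameters so that experimentally determined values can be substituted for the placeholders.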

The amino acid sequence of CPF1 is provided in GenBank accession number WP_003034647, which was reported as a 1300 amino acid protein from Francisella novicida (SEQ ID NO: 1).

msiyqefvnk yslsktlrfe lipqgktlen ikarglildd ekrakdykka kqiidkyhqf fieeilssvc isedllqnys dvyfklkksd ddnlqkdfks akdtikkqis eyikdsekfk nlfnqnlida kkgqesdlil wlkqskdngi elfkansdit didealeiik sfkgwttyfk gfhenrknvy ssndiptsii yrivddnlpk flenkakyes lkdkapeain yeqikkdlae eltfdidykt sevnqrvfsl devfeianfn nylnqsgitk fntiiggkfv ngentkrkgi neyinlysqq indktlkkyk msvlfkqils dtesksfvid kleddsdvvt tmqsfyeqia afktveeksi ketlsllfdd lkaqkldlsk iyfkndkslt dlsqqvfddy svigtavley itqqiapknl dnpskkeqel iakktekaky lsletiklal eefnkhrdid kqcrfeeila nfaaipmifd eiaqnkdnla qisikyqnqg kkdllqasae ddvkaikdll dqtnnllhkl kifhisqsed kanildkdeh fylvfeecyf elanivplyn kirnyitqkp ysdekfklnf enstlangwd knkepdntai lfikddkyyl gvmnkknnki fddkaikenk gegykkivyk llpgankmlp kvffsaksik fynpsedilr irnhsthtkn gspqkgyekf efniedcrkf idfykqsisk hpewkdfgfr fsdtqrynsi defyrevenq gykltfenis esyidsvvnq gklylfqiyn kdfsayskgr pnlhtlywka lfdernlqdv vyklngeael fyrkqsipkk ithpakeaia nknkdnpkke svfeydlikd krftedkfff hcpitinfks sgankfndei nlllkekand vhilsidrge rhlayytlvd gkgniikqdt fniigndrmk tnyhdklaai ekdrdsarkd wkkinnikem kegylsqvvh eiaklvieyn aivvfedlnf gfkrgrfkve kqvyqklekm lieklnylvf kdnefdktgg vlrayqltap fetfkkmgkq tgiiyyvpag ftskicpvtg fvnqlypkye svsksqeffs kfdkicynld kgyfefsfdy knfgdkaakg kwtiasfgsr linfrnsdkn hnwdtrevyp tkelekllkd ysieyghgec ikaaicgesd kkffakltsv lntilqmrns ktgteldyli spvadvngnf fdsrqapknm pqdadangay higlkglmll griknnqegk klnlviknee yfefvqnrnn

In certain embodiments, the disclosure relates to a variant CPF1 protein of SEQ ID NO: 1 comprising at least one, two, three or more amino acid mutations, substitutions, additions, or deletions such that it is not a naturally occurring structure.

In certain embodiments, this disclosure relates to the use of CPF1 and variant proteins, e.g., CPF1 mutants that produce a nuclease inactive CPF1 and CPF1 fusions to transcriptional activators, transcriptional repressors, histone-modifying proteins, integrases, deaminases, and recombinases. Co-expression of these fusions with a variety of gRNAs results in specific expression of the target genes.

In certain embodiments, CPF1 is a fusion with another polypeptide sequence of greater than 5, 10, 20, 30, 40, or 50 amino acids and less than 200 or 100 amino acids. In certain embodiments, CPF1 fusion further comprises a functional domain. In some embodiments, the functional domain is the transcriptional activator domain VP64. In some embodiments, the functional domain is the transcriptional repressor domain KRAB. In some embodiments, the transcriptional repressor domain is SID, or concatemers of SID (i.e. SID4X). In some embodiments, an epigenetic modifying enzyme is provided, e.g. a histone modifying protein or an epigenetic chromatin modifying protein. In some embodiments, an activator domain is provided, which may be the P65 activator domain.

In certain embodiments, this disclosure relates to the use of CPF1 and variant proteins, e.g., CPF1 mutants that produce a nuclease inactive CPF1 capable of making single stranded nicks in double stranded nucleic acids, in order to make nucleotide changes in the genome of a eukaryotic cell through homologous recombination. Homologous recombination is a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA.

In certain embodiments, the disclosure relates to methods of modifying eukaryotic cells by manipulation of a target sequence in a genomic locus of interest comprising delivering a non-naturally occurring or engineered vector or one or more vectors operably encoding systems herein discussed for expression thereof.

In certain embodiments, the eukaryotic cell is a stem cell, a somatic cell, a differentiated somatic cell, or a reprogrammed induced pluripotent somatic stem cell. In certain embodiments, the target sequence is a genomic sequence selected from the group consisting of OCT4, SOX2, KLF4, and cMYC. In one embodiment, the somatic cell is selected from the group consisting of a mesenchymal somatic cell, a fibroblast somatic cell, a cardiomyocyte somatic cell, a hematopoietic cell, and a pancreatic beta somatic cell. In one embodiment, the differentiated somatic cell is selected from the group consisting of a fibroblast cell. In one embodiment, the reprogrammed induced pluripotent somatic stem cell is selected from the group consisting of a neuronal cell, a motoneuron cell, a cortical neuron cell and an astrocyte cell. In one embodiment, the reprogrammed induced pluripotent somatic stem cell is selected from the group consisting of a mesenchymal somatic cell, a fibroblast somatic cell, a cardiomyocyte somatic cell, a hematopoietic cell and a pancreatic beta somatic cell. In one embodiment, the reprogrammed induced pluripotent somatic stem cell is selected from the group consisting of a pancreatic endocrine cell, a cardiomyocyte cell, a thymic epithelial cell and a thyroid cell. In one embodiment, regulating transcription of said specific genomic target results in a phenotypic change of said reprogrammed induced pluripotent somatic stem cell.

In some embodiments, the nucleic acid encoding CPF1 is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human primate. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the "Codon Usage Database", and these tables can be adapted in a number of ways. See Nakamura, Y., et al. "Codon usage tabulated from the international DNA sequence databases: status for the year 2000" Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding CPF1 correspond to the most frequently used codon for a particular amino acid.
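The codon replacement described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the `PREFERRED` map and the `native` coding sequence are hypothetical placeholders chosen for illustration and are not taken from any real codon usage table or from the nucleic acid encoding CPF1.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL preferred codons for an unnamed host; placeholder values only,
# not taken from any real codon usage table.
PREFERRED = {"M": "ATG", "S": "AGC", "I": "ATC", "Y": "TAC", "Q": "CAG"}


def translate(dna):
    """Translate a coding DNA sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna, preferred):
    """Swap each codon for the host-preferred synonym, when one is listed."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        aa = CODON_TO_AA[codon]
        new = preferred.get(aa, codon)  # keep the native codon if none listed
        if CODON_TO_AA[new] != aa:
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


native = "ATGTCAATTTATCAA"  # hypothetical coding sequence for M-S-I-Y-Q
optimized = codon_optimize(native, PREFERRED)
assert translate(optimized) == translate(native)  # amino acid sequence is kept
```

The final assertion reflects the requirement stated above that the native amino acid sequence is maintained while the codons change.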

In some embodiments, a vector encodes CPF1 comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the CRISPR enzyme comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. one or more NLS at the amino-terminus and one or more NLS at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Typically, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, but other types of NLS are known. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 2); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 3)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 4) or RQRRNELKRSP (SEQ ID NO: 5); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 6); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 7) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 8) and PPKKARED (SEQ ID NO: 9) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 10) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 11) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 12) and PKQKKRK (SEQ ID NO: 13) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 14) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 15) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 16) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 17) of the steroid hormone receptors (human) glucocorticoid.
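The "near the N- or C-terminus" criterion above can be sketched as a short Python check. The function names, the example polypeptides, and the default window of 10 amino acids are assumptions made for illustration; only the SV40 NLS sequence (SEQ ID NO: 2) comes from the text.

```python
SV40_NLS = "PKKKRKV"  # SEQ ID NO: 2 as given in the text


def nls_distances(protein, nls=SV40_NLS):
    """For each NLS occurrence, return (residues before it, residues after it)."""
    hits = []
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)
        hits.append((start, len(protein) - end))
        start = protein.find(nls, start + 1)
    return hits


def is_near_terminus(protein, nls=SV40_NLS, window=10):
    """True if any NLS copy lies within `window` residues of either terminus."""
    return any(min(n, c) <= window for n, c in nls_distances(protein, nls))


# Hypothetical polypeptides: one with the NLS next to the N-terminus,
# one with the NLS buried in the middle of the chain.
n_terminal = "M" + SV40_NLS + "A" * 60
internal = "A" * 30 + SV40_NLS + "A" * 30
```

With these placeholder inputs, `is_near_terminus(n_terminal)` is true and `is_near_terminus(internal)` is false for the default window, while a window of 30 accepts both.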

In general, the one or more NLSs are of sufficient strength to drive accumulation of CPF1 in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in CPF1, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to CPF1, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Examples of detectable markers include fluorescent proteins (such as Green fluorescent proteins, or GFP; RFP; CFP), and epitope tags (HA tag, tag, SNAP tag). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of CPF1 and guide RNA complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity), as compared to a control.

The sequence below is an example of guide sequence and crRNA sizes for nucleic acid targeting by F. novicida U112 Cpf1:

3 ' -NNNNNNNNNNNNNNNNNNNNNNNNNN

AGUAGAAAUUAUUUAAAGUUCUUAGAC 5' or

5'- CAGAUUCUUGAAAUUUAUUAAAGAUGACAACUCUANNNNNNNNNNNNNN NNNN NNNNNNNNNNNNNNNNNNNNNNNN- 3' (SEQ ID NO: 18)

The sequence 3'-UCUACAACAGUAGA-5' or 5'-AGAUGACAACAUCU-3' (SEQ ID NO: 19) indicates the CRISPR repeat hairpin. In certain embodiments, the guide sequence has at least 50, 60, 70, 80, 85, 90, or 95% identity to any of SEQ ID NO: 19-41, and poly(N)p, wherein N is any nucleotide and p is from 4 or 5 up to 100, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, or 10, indicates the sequence that targets the specific target nucleic acid sequence.
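As an illustration of why the SEQ ID NO: 19 sequence can be described as a hairpin, the following Python sketch looks for a terminal stem in which the 5' end is the reverse complement of the 3' end. It is a simplification that counts only Watson-Crick pairs (G-U wobble pairs are ignored), and the function names are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick RNA pairs only


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def terminal_stem(rna):
    """Return (stem length, loop) for the longest stem closed by both ends."""
    best = 0
    for k in range(1, len(rna) // 2 + 1):
        if reverse_complement(rna[:k]) == rna[-k:]:
            best = k
    return best, rna[best:len(rna) - best]


hairpin = "AGAUGACAACAUCU"  # SEQ ID NO: 19 written 5' to 3'
stem_length, loop = terminal_stem(hairpin)
# The 3'-to-5' form quoted in the text is the same molecule read backwards.
assert "UCUACAACAGUAGA"[::-1] == hairpin
```

Under these assumptions the sketch reports a 5 base pair stem (AGAUG paired with CAUCU) around a 4 nucleotide loop (ACAA).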

Table 1 shows the lengths of functional repeats and spacers after CRISPR array transcription and processing into mature crRNAs for each spacer sequence:

In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of CPF1 to the target sequence.

In certain embodiments, the guide sequence comprises a stem loop having (SEQ ID NO: 59) 5'-UCUACXJN 1 2 UGUACA- 3' wherein N 1 is any nucleotide, and N 2 is any nucleotide. In certain embodiments, N 1 is G or A. In certain embodiments, N 2 is U or G. In certain embodiments, the guide sequence is 5'-(N)nUCUACUGUUGUACAU-3' (SEQ ID NO: 60), wherein N is any nucleotide configured to bind with a target sequence and n is 5 to 100, or 8 to 50, or 10 to 50, or 10 to 40, or 10 to 30, or 10 to 25, or 15 to 50, or 15 to 40, or 15 to 30, or 5 to 25. In certain embodiments, n is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25. In certain embodiments, n is 17, 18, 19, 20, 21, 22, or 23.
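The layout 5'-(N)n followed by the fixed 3' portion of SEQ ID NO: 60 can be sketched in Python as below. The helper name `build_guide` and the example targeting segment are hypothetical; the fixed portion is copied as printed in the text, and the default bounds on n follow the broadest range recited above (5 to 100).

```python
REPEAT = "UCUACUGUUGUACAU"  # fixed 3' portion of SEQ ID NO: 60, as printed


def build_guide(targeting_segment, min_n=5, max_n=100):
    """Join a targeting segment (N)n to the fixed portion, checking n."""
    seg = targeting_segment.upper()
    if set(seg) - set("ACGU"):
        raise ValueError("targeting segment must contain only A, C, G, U")
    if not min_n <= len(seg) <= max_n:
        raise ValueError(f"n={len(seg)} is outside {min_n} to {max_n}")
    return seg + REPEAT


# Hypothetical 20-nucleotide targeting segment (n = 20).
guide = build_guide("ACGUACGUACGUACGUACGU")
```

Narrower embodiments, such as n of 17 to 23, could be enforced by passing `min_n=17, max_n=23`.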

In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), and ELAND (Illumina, San Diego, Calif.). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a CPF1 complex to a target sequence may be assessed by any suitable assay. For example, the components sufficient to form a complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of CPF1, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CPF1 complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within an mRNA or a genome of a cell.
Exemplary target sequences include those that are unique in the target genome. In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g. A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P A Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62).
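The degree of complementarity between a guide sequence and a target can be illustrated with a deliberately simple Python sketch. It assumes an ungapped, equal-length, antiparallel comparison of an RNA guide against a DNA target strand, which is a stand-in for (not an implementation of) the alignment algorithms named above; the function name and the example sequences are hypothetical.

```python
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide bases that pair with the target, read antiparallel.

    Both sequences are written 5' to 3' and must have the same length.
    """
    if len(guide_rna) != len(target_dna):
        raise ValueError("ungapped comparison needs equal-length sequences")
    n = len(guide_rna)
    matches = sum(
        RNA_TO_DNA_PAIR[guide_rna[i]] == target_dna[n - 1 - i] for i in range(n)
    )
    return 100.0 * matches / n


target = "GATTACAGGC"            # hypothetical DNA target strand
perfect_guide = "GCCUGUAAUC"     # fully complementary RNA
mismatched_guide = "GCCUGUAAUG"  # same guide with one mismatch at its 3' end
```

For these placeholder sequences the perfect guide scores 100% and the single-mismatch guide scores 90%, which can then be compared against a chosen threshold from the percentages listed above.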

In some embodiments, CPF1 is part of a fusion protein comprising one or more heterologous protein domains (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to CPF1). A CPF1 fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that may be fused to a CPF1 include, without limitation, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). CPF1 may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including but not limited to maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. In some embodiments, a tagged CPF1 is used to identify the location of a target sequence. In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell.
In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, CPF1 in combination with (and optionally complexed with) a guide sequence is delivered to a cell. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.

Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™).

The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential target population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof. In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus ("AAV") vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene

Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and PA317 which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.

In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, HDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, RK, RK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr -/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof.
Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)).

In some embodiments, one or more vectors described herein are used to produce a non-human transgenic animal or transgenic plant. In some embodiments, the transgenic animal is a mammal, such as a mouse, rat, or rabbit. In certain embodiments, the organism or subject is a plant. In certain embodiments, the organism or subject or plant is algae. Methods for producing transgenic plants and animals are known in the art, and generally begin with a method of cell transfection, such as described herein.

RNA regulation

In certain embodiments, the disclosure relates to compositions and methods that use CPF1 systems disclosed herein, e.g., CPF1 and guide RNA, to target RNAs of interest in the context of various biological systems. This allows the CPF1 system to function as a form of RNA interference. CPF1 is capable of functioning in the eukaryotic cytosol. By using longer targeting RNAs one can increase specificity. In certain embodiments, the disclosure contemplates a segment of a targeting RNA of greater than 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides. CPF1 systems disclosed herein lead to lower levels of protein from an RNA that is targeted. With regard to the claimed embodiments, it is not intended that reduction of protein result by any particular mechanism. It is believed that in some cases the RNA is likely degraded, but it is also possible that CPF1 simply sits on the target RNA blocking access by the ribosome, thereby blocking translation, or that it acts by some other unappreciated mechanism.

While in some instances CPF1 is directed to its binding site by a "guide RNA" (gRNA or targeting RNA, or RNA-targeting guide RNA or rgRNA) that hybridizes to a target sequence, it is contemplated that the guide may contain a certain number of mismatches or secondary structures. In certain embodiments, the gRNA is a fusion of the tracrRNA and scaRNAs or variant sequences thereof. In order to combat non-target interactions, certain strategies may be used, e.g., creating gRNA secondary structures that inhibit non-target interactions or altering the length of the gRNA.

CPF1 in mammalian cells targeted to recognize viral RNAs prevents productive viral replication. CPF1 can be targeted to any RNA by changing the sequence of the RNA-targeting guide RNA, as an anti-viral strategy capable of combating any virus. In certain embodiments, it is contemplated that multiple gRNAs targeting different regions of viral RNA, e.g., HCV RNA, simultaneously (multiplexing), can be utilized, limiting the chances that viral mutations would facilitate escape from this targeting system.

Suitable methods for transformation of host cells for use with the disclosure are believed to include virtually any method by which nucleic acids, e.g., DNA, can be introduced into a cell, such as by transformation of protoplasts (U.S. Pat. No. 5,508,184), by desiccation/inhibition-mediated DNA uptake, by electroporation, by agitation with silicon carbide fibers (U.S. Pat. Nos. 5,302,523; and 5,464,765), by Agrobacterium-mediated transformation (U.S. Pat. Nos. 5,563,055; 5,591,616; 5,693,512; 5,824,877; 5,981,840; 6,384,301) and by acceleration of DNA coated particles (U.S. Pat. Nos. 5,015,580; 5,550,318; 5,538,880; 6,160,208; 6,399,861; 6,403,865), etc. Through the application of techniques such as these, the cells of virtually any species may be stably transformed. In the case of multicellular species, the transgenic cells may be regenerated into transgenic plants and organisms.

Plants and animals genetically engineered to express CPF1 with an RNA-targeting guide RNA (gRNA) or multiple RNA-targeting RNAs specific for different viruses or pests can be used to create pest-resistant progeny. In certain embodiments, the disclosure relates to generating transgenic insect vectors that are resistant to viral infection.

In certain embodiments, the disclosure contemplates the expression of CPF1 and a gRNA in eukaryotic cells used to target viruses, e.g., Hepatitis C (HCV) RNA, and prevent viral replication. Targeting CPF1 to the eukaryotic cell cytosol was done in order to target HCV RNA (HCV is an RNA virus, and has no DNA stage). CPF1 engineering studies in mammalian cells typically add an NLS (nuclear localization signal) to the protein and target it to the nucleus in order to target DNA. In certain embodiments, a recombinantly produced CPF1 of this disclosure does not contain an NLS sequence. CPF1 has activity in the cytosol of a eukaryotic cell. CPF1 in the cytosol of eukaryotic cells may be used to target RNA or may be used to prevent its translation into protein.

Targeting of mRNA by the CPF1 system can use a much larger region of complementarity (in the range of 50 bp) that can also tolerate imperfect hybridization (mismatches, loops, etc.). This may be used to generate a "tunable" system in which one can control how much of a given RNA is knocked down. In certain embodiments, the disclosure contemplates single stranded targeting nucleic acids in the range of 25 to 50 nucleotides, or 25 to 100 or more nucleotides, or 35 to 65 nucleotides or more nucleotides, or 40 to 60 nucleotides or more nucleotides.
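One way to picture targeting segments in the 25 to 50 nucleotide range is to tile a target RNA into windows and take the reverse complement of each window as a candidate targeting segment. The Python sketch below does this under stated assumptions: the function name, the non-overlapping tiling, and the synthetic target RNA are all hypothetical, and no claim is made that such segments are functional.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def tiled_targeting_segments(target_rna, length=30, step=None):
    """Return (start, segment) pairs covering the target in fixed windows.

    Each segment is the reverse complement of one window of the target.
    Only the 25 to 50 nucleotide range recited in the text is accepted here.
    """
    if not 25 <= length <= 50:
        raise ValueError("this sketch only accepts lengths of 25 to 50")
    step = step or length  # non-overlapping windows by default
    segments = []
    for start in range(0, len(target_rna) - length + 1, step):
        window = target_rna[start:start + length]
        segments.append((start, reverse_complement(window)))
    return segments


synthetic_target = "ACGU" * 25  # placeholder 100-nucleotide RNA, not a real transcript
segments = tiled_targeting_segments(synthetic_target, length=30)
```

Several such segments aimed at different regions of one RNA correspond to the multiplexing idea described in the text; shortening or lengthening `length` is one conceivable handle on the "tunable" behavior mentioned above.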

In certain embodiments, the disclosure contemplates targeting numerous genes or target RNAs at the same time, e.g., host genes at the same time, viral genes at the same time, or viral and host genes at the same time. In certain embodiments, the disclosure contemplates that the CPF1 system can be used to target host RNAs. In certain embodiments, the disclosure contemplates a combination of targeting viral RNA and host RNAs encoding factors that promote viral infection.

In certain embodiments, the disclosure contemplates that one may skew the immune response (e.g. to a Th1, Th2, or Th17 phenotype). One may treat an infection with a pathogen that induces a Th2 response with an rgRNA that will skew the response back to Th1 and lead to clearance of the pathogen. In certain embodiments, the disclosure also contemplates methods that can include inducing expression, which can be inducing expression of the CPF1 and/or inducing expression of the guide sequences. In certain embodiments of the herein methods, the organism or subject is a eukaryote or a non-human eukaryote. In certain embodiments of the herein methods, the organism or subject is a plant. In certain embodiments of the herein methods, the organism or subject is a mammal or a non-human mammal. In certain embodiments of the herein methods, the organism or subject is algae.

One aspect of manipulation of a target sequence also refers to the epigenetic manipulation of a target sequence. This may be of the chromatin state of a target sequence, such as by modification of the methylation state of the target sequence (i.e. addition or removal of methylation or methylation patterns or CpG islands), histone modification, increasing or reducing accessibility to the target sequence, or by promoting or reducing 3D folding.

Gene Editing

Genome modification relies on the DNA-repair machinery of the target cell. With respect to targeted mutagenesis, DNA double-strand breaks (DSBs) are frequently repaired by the error-prone non-homologous end joining (NHEJ) pathway, resulting in mutations at the break site. However, if a donor DNA with strong homology to the cleaved DNA is present, the chances of integration of the donor by homologous recombination increase significantly. See, e.g., Moehle et al., Proc. Natl Acad. Sci. USA, 9:3055-3060 (2007); Chen et al., Nat. Methods, 9, 753-755 (2011). If a donor replacement DNA is co-delivered with a nuclease, the ensuing DSB can stimulate recombination of sequences near the break site with sequences present on the donor DNA.

Insertion of exogenous DNA into the chromosome sometimes requires the concomitant integration of a selectable marker, which enables enrichment for transformed cells that have undergone the desired integration event. However, this may introduce extraneous sequences into the genome which may not be compatible with downstream applications. U.S. Patent Application Publication 20150184199 reports methods of homologous recombination resulting in formation of extrachromosomal nucleic acids having a fluorescent marker. FACS can be used to isolate the cells.

CPF1 or variants can be used to create targeted DSBs or single-strand breaks, and can be used for targeted mutagenesis, gene targeting, gene replacement, targeted deletions, targeted inversions, targeted translocations, targeted insertions, and multiplexed genome modification through multiple DSBs in a single cell directed by co-expression of multiple targeting RNAs.

In certain embodiments, the disclosure relates to methods of cutting or nicking a genome, e.g., chromosome or other double stranded nucleic acid (DNA or RNA) comprising a) providing a cell, e.g., eukaryotic cell comprising a target sequence within a genome, e.g., chromosome; b) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter; and c) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairing to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA and results in cutting or nicking at least one strand of the target sequence or both strands of a double stranded nucleic acid.

In certain embodiments, the target sequence does not contain a protospacer-adjacent motif. In certain embodiments, the target sequence is not 3' terminally connected to a 5'-NGG-3' polynucleotide or the target sequence is not 5' terminally connected to 3'-CCN-5' polynucleotide, wherein N is any nucleotide.
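The condition that a target sequence is not 3' terminally connected to a 5'-NGG-3' polynucleotide can be expressed as a small Python check. This sketch inspects only the strand it is given, written 5' to 3'; the function name and both example sequences are hypothetical.

```python
def followed_by_ngg(dna, target_start, target_end):
    """True if the three bases immediately 3' of the target read N-G-G.

    `dna` is one strand written 5' to 3'; the target is dna[target_start:target_end].
    """
    flank = dna[target_end:target_end + 3].upper()
    return len(flank) == 3 and flank[1:] == "GG"


# Hypothetical 12-nucleotide targets followed by different flanking bases.
with_ngg = "ACGTACGTACGT" + "TGG" + "AAA"
without_ngg = "ACGTACGTACGT" + "TTA" + "AAA"
```

Under these assumptions, `followed_by_ngg(with_ngg, 0, 12)` is true and `followed_by_ngg(without_ngg, 0, 12)` is false, so only the second placeholder target meets the "not connected to 5'-NGG-3'" condition. The complementary 3'-CCN-5' condition on the other strand would need an analogous check on the 5' flank.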

In certain embodiments, the nucleotide sequence encoding CPF1 protein is codon optimized for expression in the eukaryotic cell.
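One simple way to think about codon optimization is back-translation of a protein sequence using one preferred codon per amino acid for the host cell. The Python sketch below assumes a caller-supplied table; the function name `back_translate` and the small `PLACEHOLDER_TABLE` are hypothetical, and the codon choices in that table are placeholders, not measured codon-usage data for any organism.

```python
from typing import Dict

# Placeholder table: valid codons for these residues, but the choice of
# which synonymous codon is "preferred" is an illustrative assumption.
PLACEHOLDER_TABLE: Dict[str, str] = {
    "M": "ATG",  # methionine
    "K": "AAG",  # lysine
    "W": "TGG",  # tryptophan
    "*": "TGA",  # stop
}


def back_translate(protein: str, preferred_codons: Dict[str, str]) -> str:
    """Build a coding DNA sequence using one preferred codon per residue."""
    missing = sorted({residue for residue in protein if residue not in preferred_codons})
    if missing:
        raise KeyError(f"no preferred codon supplied for: {', '.join(missing)}")
    return "".join(preferred_codons[residue] for residue in protein)


# Hypothetical four-residue input, not a fragment of SEQ ID NO: 1.
coding_sequence = back_translate("MKW*", PLACEHOLDER_TABLE)
```

Practical codon optimization usually weighs further factors, which this sketch deliberately leaves out.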

In certain embodiments, the CPF1 variant comprises one or more amino acid substitutions, additions, and/or deletions in a nuclease/cleavage domain resulting in a CPF1 variant that is capable of making single stranded nicks.

In certain embodiments, the disclosure relates to methods of substituting a replacement sequence into a genome, e.g., chromosome or other double stranded nucleic acid comprising a) providing a eukaryotic cell comprising a target sequence within a chromosome;

b) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a CPF1 protein having a variant of SEQ ID NO: 1 wherein the variant comprises one or more amino acid substitutions, additions, and/or deletions in a cleavage domain in operable combination with a promoter;

c) inserting into the eukaryotic cell a vector comprising a nucleic acid sequence encoding a guide RNA in operable combination with a promoter, wherein the guide RNA binds the encoded CPF1 protein and the guide RNA has a segment with sufficient base pairs to hybridize to the target sequence; wherein inserting is done under conditions such that the eukaryotic cell expresses the CPF1 protein and guide RNA, providing a nicked target sequence; and

d) inserting into the eukaryotic cell a replacement double stranded nucleic acid under conditions such that homologous recombination of the replacement double stranded nucleic acid and the nicked target sequences provides a substitution of the nicked target sequence with the replacement double stranded nucleic acid.
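The outcome of steps a) through d) can be modeled, very loosely, as a string substitution in which a donor's flanking homology arms locate the region to be replaced. The following Python sketch is a toy model under stated assumptions: the function name `substitute_by_homology`, the exact-match arm search, and the example sequences are all hypothetical, and no biological kinetics or repair-pathway choice is represented.

```python
def substitute_by_homology(genome: str, left_arm: str, payload: str, right_arm: str) -> str:
    """Replace the genomic region between two homology arms with a payload.

    The arms must occur in the genome in order (left, then right); the
    sequence between them is swapped for the payload, and the arms stay.
    """
    left_index = genome.find(left_arm)
    if left_index == -1:
        raise ValueError("left homology arm not found in genome")
    left_end = left_index + len(left_arm)
    right_index = genome.find(right_arm, left_end)
    if right_index == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    return genome[:left_end] + payload + genome[right_index:]


# Hypothetical 12-nt "genome": the region CCGG between the arms is replaced.
edited = substitute_by_homology("AAACCCGGGTTT", "AAAC", "TATA", "GTTT")
```

Here the nicked target corresponds to the region between the arms, and the payload stands in for the replacement double stranded nucleic acid.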

In certain embodiments, the recombinant nucleic acids or vectors reported herein contain non-naturally occurring sequences as a whole, e.g., because of spacers, non-natural sequence modifications, or connections of heterologous sequences.

Embodiments of this disclosure provide strategies, systems, reagents, methods, and kits that are useful for the targeted editing of nucleic acids, including editing a single site within a genome, e.g., the human genome. In some embodiments, the disclosure provides fusion proteins of CPF1 or variants disclosed herein.

In certain embodiments, this disclosure relates to fusion proteins comprising (i) a CPF1 variant having a nuclease-inactive domain; and (ii) a nucleic acid-editing domain. In some embodiments, the nucleic acid-editing domain is a DNA-editing domain. In some embodiments, the nucleic-acid-editing domain is a deaminase domain. In some embodiments, the deaminase is a cytidine deaminase. In some embodiments, the deaminase is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. In some embodiments, the deaminase is an APOBEC1 family deaminase. In some embodiments, the deaminase is an activation-induced cytidine deaminase (AID). In some embodiments, the deaminase is an ACF1/ASF deaminase. In some embodiments, the deaminase is an adenosine deaminase. In some embodiments, the deaminase is an ADAT family deaminase. In some embodiments, the nucleic-acid-editing domain is fused to the N-terminus of the CPF1 variant. In some embodiments, the nucleic-acid-editing domain is fused to the C-terminus of the CPF1 variant. In some embodiments, the CPF1 variant and the nucleic-acid-editing domain are fused via a linker.

In certain embodiments, the disclosure relates to methods for using one or more elements of a CPF1 expression system. In certain embodiments, this disclosure relates to using a CPF1 complex of the disclosure to provide an effective means for modifying a target polynucleotide. The CPF1 complex of this disclosure has a wide variety of utilities including modifying (e.g., deleting, inserting, translocating, inactivating, activating, repressing, altering methylation, transferring specific moieties) a target polynucleotide in a multiplicity of cell types. As such, the CPF1 complex of the disclosure has a broad spectrum of applications in, e.g., gene or genome editing and gene regulation.

In certain embodiments, the disclosure contemplates designing and preparing guide RNAs having optimal activity, truncating CPF1 to make it smaller in length than the corresponding wild-type CPF1 by truncating the nucleic acid molecules coding therefor, and generating chimeric CPF1. Aspects of the invention also relate to methods of improving the target specificity of a CPF1 or of designing an expression system by preparing guide RNAs having optimal activity. In certain embodiments, the disclosure contemplates packaging a nucleic acid coding therefor into a delivery vector.

In certain embodiments, the disclosure relates to uses of the present sequences, vectors, CPF1, and expression systems, in medicine. Also provided are the same for use in gene or genome editing. Also provided is use of the same in the manufacture of a medicament for gene or genome editing, for instance treatment by gene or genome editing.

In certain embodiments, the disclosure contemplates that CPF1 may comprise one or more mutations and may be used as a generic DNA binding protein with or without fusion to or being operably linked to a functional domain. The mutations may be artificially introduced mutations and may include but are not limited to one or more mutations in a catalytic domain.

Other aspects of the disclosure relate to the mutated CPF1 being fused to or operably linked to domains which include but are not limited to a nuclear localization signal (NLS) domain, transcriptional activator, transcriptional repressor, a recombinase, a transposase, a histone remodeler, a DNA methyltransferase, a cryptochrome, a light inducible/controllable domain or a chemically inducible/controllable domain.

Gene Therapies

In certain embodiments, the disclosure relates to methods of treating or preventing diseases, conditions, or infections comprising administering an effective amount of recombinant vectors that encode CPF1 and nucleic acid complexes disclosed herein to a subject in need thereof. In certain embodiments, the disclosure relates to methods of treating or preventing viral infections or other pathogenic infection comprising administering an effective amount of a vector configured to express a CPF1-nucleic acid complex that targets viral or pathogenic nucleic acids.

In certain embodiments, the disclosure contemplates administration in combination with other therapeutic agents, anti-pathogenic agents, anti-viral agents, anti-bacterial agents or vaccines. In certain embodiments, the antiviral agent(s) are selected from abacavir, acyclovir, adefovir, amantadine, amprenavir, ampligen, arbidol, atazanavir, atripla, boceprevir, cidofovir, combivir, complera, darunavir, delavirdine, didanosine, docosanol, dolutegravir, edoxudine, efavirenz, emtricitabine, enfuvirtide, entecavir, famciclovir, fomivirsen, fosamprenavir, foscarnet, fosfonet, ganciclovir, ibacitabine, imunovir, idoxuridine, imiquimod, indinavir, inosine, interferon type III, interferon type II, interferon type I, lamivudine, lopinavir, loviride, maraviroc, moroxydine, methisazone, nelfinavir, nevirapine, nexavir, oseltamivir, peginterferon alfa-2a, penciclovir, peramivir, pleconaril, podophyllotoxin, raltegravir, ribavirin, rimantadine, ritonavir, pyramidine, saquinavir, stavudine, stribild, tenofovir, tenofovir disoproxil, tenofovir alafenamide fumarate (TAF), tipranavir, trifluridine, trizivir, tromantadine, truvada, valaciclovir, valganciclovir, vicriviroc, vidarabine, viramidine, zalcitabine, zanamivir, or zidovudine, and combinations thereof.

In certain embodiments, the disclosure contemplates treating and/or preventing viral infections by targeting both RNA and DNA viruses, e.g., targeting the genome of and/or transcript of RNA viruses or the viral transcript of DNA viruses. In some embodiments, the virus is or a subject is diagnosed with influenza A virus including subtype H1N1, influenza B virus, influenza C virus, rotavirus A, rotavirus B, rotavirus C, rotavirus D, rotavirus E, SARS coronavirus, human adenovirus types (HAdV-1 to 55), human papillomavirus (HPV) Types 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, parvovirus B19, molluscum contagiosum virus, JC virus (JCV), BK virus, Merkel cell polyomavirus, coxsackie A virus, norovirus, Rubella virus, lymphocytic choriomeningitis virus (LCMV), yellow fever virus, measles virus, mumps virus, respiratory syncytial virus, rinderpest virus, California encephalitis virus, hantavirus, rabies virus, ebola virus, marburg virus, herpes simplex virus-1 (HSV-1), herpes simplex virus-2 (HSV-2), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, roseolovirus, Kaposi's sarcoma-associated herpesvirus, hepatitis A (HAV), hepatitis B (HBV), hepatitis C (HCV), hepatitis D (HDV), hepatitis E (HEV), human immunodeficiency virus (HIV), human T-lymphotropic virus Type I (HTLV-1), Friend spleen focus-forming virus (SFFV) or Xenotropic MuLV-Related Virus (XMRV).

In certain embodiments, the disclosure contemplates targeting multiple sites in the RNA genome of an RNA virus, or RNA transcript of a DNA virus for the purpose of preventing development of resistance by viruses.

In certain embodiments, the disclosure contemplates that CPF1 and a cocktail of gRNAs targeting different viruses could be used as a "one-shot" therapeutic.

In certain embodiments, the disclosure contemplates using the CPF1 system disclosed herein to improve the ability of a subject to process and respond to a vaccine by administering a cloning vector disclosed herein in combination with a vaccine, wherein a CPF1 nucleic acid complex is configured with gRNA to target mRNA expression of IL-10 and/or other anti-inflammatory cytokines, and/or to target mRNA expression of PD-1/PD-L1.

In certain embodiments, the disclosure contemplates using the CPF1 system for treating cancer. For example, gRNA may be configured to target mRNA or microRNA that is overexpressed in cancer cells or control the expression of oncogenes. Some cancers suppress the RNAi machinery, but would likely be unable to do the same with CPF1 systems disclosed herein. Targeting mRNA with CPF1 systems disclosed herein typically results in decreased expression of the gene product, while targeting microRNA typically results in increased expression of gene product.

In certain embodiments, the disclosure relates to treating or preventing cancer comprising administering a vector that expresses CPF1 and guided nucleic acid complexes disclosed herein wherein the cancer is selected from brain, lung, cervical, ovarian, colon, breast, gastric, skin, pancreatic, prostate, neck, and renal cancer.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with an oncogene. In certain embodiments, target mRNA or microRNA are associated with K-ras, baculoviral IAP repeat containing 3, baculoviral IAP repeat containing 7, tumor protein p53, tumor protein p53 regulated apoptosis inducing protein 1, tumor protein p73, vascular endothelial growth factor A, v-akt murine thymoma viral oncogene, phosphatase and tensin, B-cell CLL/lymphoma 2, signal transducer and activator of transcription 3, epidermal growth factor receptor, v-erb-b2 avian erythroblastic leukemia viral oncogene, tumor necrosis factor, tumor necrosis factor superfamily member 14, nuclear factor of kappa light polypeptide gene enhancer in B-cells 1, catenin (cadherin-associated protein) beta 1, transforming growth factor beta 1, cyclin-dependent kinase inhibitor 1A, caspase 3, caspase 8, caspase 9, telomerase reverse transcriptase, hypoxia inducible factor 1 alpha subunit, ATP-binding cassette sub-family B, cyclin-dependent kinase inhibitor 2A, v-myc avian myelocytomatosis viral oncogene, insulin-like growth factor 1, matrix metallopeptidase 7, matrix metallopeptidase 9, interleukin 8, cyclin B1, cyclin D1, chemokine (C-C motif) ligand 2, cadherin 1, E-cadherin, mitogen-activated protein kinase 1, interferon gamma, tumor necrosis factor (ligand) superfamily member 10, microtubule-associated protein tau, X-linked inhibitor of apoptosis, Fas cell surface death receptor, retinoblastoma 1, Bcl-2, BCL2-like 2, BCL2-associated X protein, BCL2-antagonist/killer 1, caveolin 1, caveolae protein, mechanistic target of rapamycin, v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene, mitogen-activated protein kinase 14, adenomatous polyposis coli, aurora kinase B, cyclin-dependent kinase 1, cyclin-dependent kinase 4, cyclin-dependent kinase inhibitor 1B, heme oxygenase (decycling) 1, notch 1, notch 2, secreted phosphoprotein 1, mitogen-activated protein kinase 3, runt-related transcription factor 1, forkhead box O3, forkhead box P3, jun proto-oncogene, poly (ADP-ribose) polymerase 1, Harvey rat sarcoma viral oncogene, glycogen synthase kinase 3 beta, nitric oxide synthase 2, ras-related C3 botulinum toxin substrate 1, E1A binding protein p300, Fas ligand, ATP-binding cassette G2, CREB binding protein, protein kinase C alpha, fms-related tyrosine kinase 3, fibroblast growth factor 2, O-6-methylguanine-DNA methyltransferase, checkpoint kinase 2, diablo IAP-binding mitochondrial protein, parkinson protein 2, polo-like kinase 1, transcription factor 7-like 2, E2F transcription factor 1, high mobility group box 1, promyelocytic leukemia, BCL2-like 1, urokinase plasminogen activator, tumor necrosis factor receptor superfamily member 1A, proliferating cell nuclear antigen, urokinase receptor plasminogen activator, APEX nuclease, lectin galactoside-binding soluble 3, myeloid cell leukemia sequence 1, cannabinoid receptor 1, gap junction protein alpha 1, antigen identified by monoclonal antibody Ki-67, calcium-sensing receptor, thrombospondin 1, POU class 5 homeobox 1, hepatocyte nuclear factor 4 alpha, transforming growth factor beta receptor II, platelet-derived growth factor receptor alpha polypeptide, runt-related transcription factor 2, vascular endothelial growth factor C, early growth response 1, angiopoietin 2, BMI1 polycomb ring finger oncogene, parkinson protein 7, v-myc avian myelocytomatosis viral oncogene neuroblastoma, v-akt murine thymoma viral oncogene homolog 2, H2A histone family member X, tuberous sclerosis 2, exportin 1, peptidylprolyl cis/trans isomerase NIMA-interacting 1, dickkopf WNT signaling pathway inhibitor 1, beclin 1, platelet-derived growth factor beta polypeptide, cortactin, colony stimulating factor 2, fused in sarcoma, ets variant 6, GATA binding protein 1, RAN member RAS oncogene, Kruppel-like factor 4, Kruppel-like factor 5, lymphoid enhancer-binding factor 1, histone deacetylase 6, stathmin 1, folate hydrolase 1, RAS p21 protein activator 1, serine/arginine-rich splicing factor 1, glypican 3, cell adhesion molecule 1, wingless-type MMTV integration site family, member 1, platelet-derived growth factor alpha polypeptide, junction plakoglobin, protein arginine methyltransferase 1, interleukin 11, retinoblastoma-like 2, E2F transcription factor 3, tumor-associated calcium signal transducer 2, XIAP associated factor 1, microtubule-associated protein 4, sirtuin 6, Wilms tumor 1 associated protein, or combinations thereof.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with growth factors, or mitogens, e.g., c-Sis, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with glioblastoma, fibrosarcoma, osteosarcoma, breast carcinoma, or melanoma.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with receptor tyrosine kinases, e.g., epidermal growth factor receptor (EGFR), platelet-derived growth factor receptor (PDGFR), vascular endothelial growth factor receptor (VEGFR), and HER2/neu, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with breast cancer, gastrointestinal stromal tumors, non-small-cell lung cancer, or pancreatic cancer.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with cytoplasmic tyrosine kinases, e.g., Src-family, Syk-ZAP-70 family, and BTK family of tyrosine kinases, Abl, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with colorectal cancers, breast cancers, melanomas, ovarian cancers, gastric cancers, head and neck cancers, pancreatic cancer, lung cancer, brain cancers, or blood cancers.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with cytoplasmic serine/threonine kinases and their regulatory subunits, e.g., Raf kinase and cyclin-dependent kinases, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with malignant melanoma, papillary thyroid cancer, colorectal cancer, or ovarian cancer.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with regulatory GTPases, e.g., Ras protein, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with adenocarcinomas of the pancreas and colon, thyroid tumors, or myeloid leukemia.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex that targets mRNA or microRNA associated with transcription factors, e.g., myc, to a subject in need thereof. In certain embodiments, the cancer is selected from or the subject is diagnosed with malignant T-cell lymphomas and acute myeloid leukemias, breast cancer, pancreatic cancer, retinoblastoma, and small cell lung cancer.

In certain embodiments, the disclosure contemplates targeting multiple sites in a cancer oncogene, or any gene desirable to knockdown in cancer cells for the purpose of preventing the development of resistance in the cancer cells.

In certain embodiments, the disclosure relates to methods of treating cancer comprising administering an effective amount of cloning vector disclosed herein that is configured to express CPF1 and a guided nucleic acid complex in combination with chemotherapies. In certain embodiments, the chemotherapy includes that administration of

In certain embodiments, the disclosure contemplates using the CPF1 system disclosed herein to improve the ability of a subject to process and respond to chemotherapies by administering a cloning vector disclosed herein in combination with chemotherapies, wherein a CPF1 nucleic acid complex is configured with gRNA to target mRNA expression of IL-10 and/or other anti-inflammatory cytokines, and/or to target mRNA expression of PD-1/PD-L1.

Transgenic animals

In certain embodiments, the disclosure relates to transgenic animals containing mutations and genetic alterations made by methods disclosed herein. Embryos may be selected with desired and/or non-naturally occurring nucleic acid sequence modifications, and used for fertilization and growth.

In certain embodiments, the disclosure contemplates transgenic animals that express CPF1 systems disclosed herein to prevent pathogenic infections, e.g., viruses. Non-limiting examples of contemplated transgenic animals include fish, livestock, and pets. In certain embodiments, the disclosure contemplates transforming embryonic stem cells (ES cells) growing in tissue culture with the desired nucleic acids that encode or express a CPF1 system disclosed herein. In certain embodiments, the disclosure contemplates injecting a cloning vector disclosed herein into isolated embryonic stem cells of a human or non-human animal.

One can transform ES cells in culture by mixing embryonic stem cells with a vector that encodes CPF1 systems disclosed herein under conditions such that the ES cells incorporate the nucleic acids into the genome of the ES cell. One can isolate and select successfully transformed cells by injecting transformed cells into the inner cell mass (ICM) of a blastocyst, followed by preparing a pseudopregnant animal, e.g., by mating a female with a vasectomized male. The stimulus of mating elicits the hormonal changes typically needed to make the uterus receptive. Alternatively, direct administration of hormones may be utilized. Implanting the embryos into the uterus provides conditions to develop a transgenic animal with nucleic acids that express CPF1 systems disclosed herein.

As an alternative method to create a transgenic animal, one can transform fertilized eggs by injecting a cloning vector into the sperm pronucleus. After fusion the zygote will divide to form two embryo cells. One can implant the embryos in a pseudopregnant foster as described above.

In certain embodiments, the disclosure contemplates a transgenic animal comprising a nucleic acid that expresses CPF1 systems disclosed herein in combination with another protein, e.g., growth hormone. The cloning vectors disclosed herein may be configured to replace a target gene. In certain embodiments, the disclosure relates to transgenic sheep or goats comprising nucleic acids that express CPF1 systems disclosed herein and nucleic acids that express a recombinant protein in their milk.

In certain embodiments, the disclosure contemplates a transgenic chicken comprising nucleic acids that express CPF1 systems disclosed herein and nucleic acids that express a recombinant protein in their eggs, e.g., whites.

Transgenic plants

In certain embodiments, the disclosure relates to transgenic plants containing mutations and genetic alterations made by methods disclosed herein. In certain embodiments, the disclosure relates to methods for modifying the genomic material in a plant cell, comprising: (a) introducing into the plant cell a nucleic acid molecule, wherein the nucleic acid molecule comprises a guide RNA, wherein the guide RNA is targeted to a sequence that is endogenous to the plant cell; (b) introducing into the plant cell a CPF1 or variant or a nucleic acid molecule comprising a sequence encoding CPF1 or variant; and (c) introducing into the plant cell a replacement nucleic acid or a vector encoding a replacement nucleic acid, wherein the CPF1 induces a double strand break or single strand nick at or near the sequence to which the guide RNA is targeted, and wherein the replacement nucleic acid modifies the genomic material at or near the guide RNA target site. See U.S. Patent Application Publications 20150167000, 20150082478, and 20150067922.

Plant viruses can be effective vectors for delivery of heterologous nucleic acid sequences.

Useful plant viruses include both RNA viruses (e.g., tobacco mosaic virus, tobacco rattle virus, potato virus X, and barley stripe mosaic virus) and DNA viruses (e.g., cabbage leaf curl virus, bean yellow dwarf virus, wheat dwarf virus, tomato leaf curl virus, maize streak virus, tobacco leaf curl virus, tomato golden mosaic virus, and Faba bean necrotic yellow virus). Such plant viruses are modified for the delivery of CPF1 complex components reported herein.

In certain embodiments, the disclosure contemplates plants genetically engineered to express CPF1-nucleic acid complexes disclosed herein, e.g., for the purpose of preventing infections from viruses or other pests. In certain embodiments, the present disclosure relates to genetically modifying a plant to confer pest resistance by transforming a host plant cell with a heterologous nucleic acid configured to express a CPF1-nucleic acid complex disclosed herein. In certain embodiments, the disclosure provides recombinant nucleic acid constructs for use in achieving stable transformation of particular host targets, e.g., plants and plant cells. Transformed host targets may express effective levels of CPF1 systems disclosed herein from the recombinant nucleic acid constructs. Provided according to the disclosure are nucleic acids that express certain CPF1 and RNA(s) that bind the CPF1 conjugated to a nucleic acid sequence that hybridizes to an RNA molecule of a targeted gene in a plant or plant pest or combinations thereof.

In certain embodiments, the disclosure provides nucleic acid sequences capable of being expressed as RNA in a cell to inhibit target gene expression in a cell or tissue of a plant, plant pest or combinations thereof. The sequences comprise a nucleic acid molecule coding for one or more different nucleotide sequences, wherein each of the different nucleotide sequences targets a plant pest RNA molecule. The sequences may be connected by a spacer sequence. The nucleic acid molecule that encodes the CPF1 and targeting RNA may be placed operably under the control of a promoter sequence that functions in the cell or tissue of the host.

In certain embodiments, a targeted sequence is in the genome of the pest or the RNA of a gene in the genome of the pest. In certain embodiments, a targeted sequence is selected that is essentially involved in the growth and development of a pest, for example, mRNA of proteins that play important roles in viability, growth, development, and infectivity of the pest. These mRNA targets may be one of the housekeeping genes, transcription factors and the like.

In certain embodiments, the disclosure provides a nucleic acid sequence for expression in a cell of a plant that, upon expression of the CPF1 and targeting RNA and ingestion by a plant pest, achieves suppression of a target in a cell or tissue. Methods to express a gene suppression molecule in plants are known (e.g., WO06073727 A2; US Publication 2006/0200878 A1), and may be used to express a nucleotide sequence disclosed herein.

A nucleic acid sequence may be cloned between two tissue specific promoters, such as two root specific promoters which are operable in a transgenic plant cell and therein expressed to produce mRNA in the transgenic plant cell. Examples of root specific promoters are known in the art (e.g. the nematode-induced RB7 promoter; U.S. Pat. No. 5,459,252).

Promoters that function in different plant species are also well known in the art. Promoters useful for expression of polypeptides in plants include those that are inducible, viral, synthetic, or constitutive, and/or promoters that are temporally regulated, spatially regulated, and spatio-temporally regulated. Preferred promoters include the enhanced CaMV35S promoters, and the FMV35S promoter. A fragment of the CaMV35S promoter exhibiting root-specificity may also be preferred. For the purpose of the present disclosure, it may be preferable to achieve the highest levels of expression of these genes within the root tissues of plants. A number of root-specific promoters have been identified and are known in the art (e.g., U.S. Pat. Nos. 5,110,732; 5,837,848; 5,459,252).

A recombinant vector or cloning vector of the present disclosure may also include a screenable marker. Screenable markers may be used to monitor expression. Exemplary screenable markers include a beta-glucuronidase or uidA gene (GUS) which encodes an enzyme for which various chromogenic substrates are known; an R-locus gene, which encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues; a beta-lactamase gene, a gene which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a luciferase gene; a xylE gene which encodes a catechol dioxygenase that can convert chromogenic catechols; an alpha-amylase gene; a tyrosinase gene which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone which in turn condenses to melanin; and an alpha-galactosidase, which catalyzes a chromogenic alpha-galactose substrate.

Preferred plant cloning or transformation vectors include those derived from a Ti plasmid of Agrobacterium tumefaciens (e.g., U.S. Pat. Nos. 4,536,475, 4,693,977, 4,886,937, 5,501,967 and EP 0 122 791). Agrobacterium rhizogenes plasmids (or "Ri") are also useful and known in the art. A transgenic plant formed using Agrobacterium transformation methods typically contains a single simple recombinant DNA sequence inserted into one chromosome and is referred to as a transgenic event. Such transgenic plants can be referred to as being heterozygous for the inserted exogenous sequence. A transgenic plant homozygous with respect to a transgene can be obtained by sexually mating (selfing) an independent segregant transgenic plant that contains a single exogenous gene sequence to itself, for example an F0 plant, to produce F1 seed. One fourth of the F1 seed produced will be homozygous with respect to the transgene. Germinating F1 seed results in plants that can be tested for heterozygosity, typically using a SNP assay or a thermal amplification assay that allows for the distinction between heterozygotes and homozygotes (i.e., a zygosity assay). Crossing a heterozygous plant with itself or another heterozygous plant typically results in a mixture of homozygous, heterozygous, and null-segregant progeny. In general it may be preferred to introduce a functional recombinant DNA at a non-specific location in a plant genome. In special cases it may be useful to insert a recombinant nucleic acid construct by site-specific integration. Several site-specific recombination systems known to function in plants exist, including cre-lox as disclosed in U.S. Pat. No. 4,959,317 and FLP-FRT as disclosed in U.S. Pat. No. 5,527,695.
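The statement that one fourth of the seed from a selfed single-insert plant is homozygous follows from simple Mendelian segregation, which can be enumerated directly. The Python sketch below is illustrative only; the function name `selfing_ratios`, the allele symbols "T" (transgene present) and "-" (absent), and the assumption of a single unlinked insertion with equal gamete transmission are all assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_ratios(parent=("T", "-")):
    """Enumerate progeny classes from selfing a plant with the given two alleles.

    Each gamete carries one of the two parental alleles with equal
    probability, so all egg/pollen combinations are weighted equally.
    """
    labels = {2: "homozygous", 1: "heterozygous", 0: "null"}
    counts = Counter()
    for egg, pollen in product(parent, repeat=2):
        transgene_copies = (egg == "T") + (pollen == "T")
        counts[labels[transgene_copies]] += 1
    total = sum(counts.values())
    return {label: Fraction(count, total) for label, count in counts.items()}


# Selfing a plant carrying one copy of the insert ("T", "-").
ratios = selfing_ratios()
```

Under these assumptions the enumeration yields one quarter homozygous, one half heterozygous, and one quarter null progeny, which is why a zygosity assay is needed to tell the classes apart.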

In certain embodiments, a seed having the ability to express a CPF1 system disclosed herein also has a transgenic event that provides herbicide tolerance. One beneficial example of a herbicide tolerance gene provides resistance to glyphosate, N-(phosphonomethyl)glycine, including the isopropylamine salt form of such herbicide.

In addition to direct transformation of a plant with a recombinant DNA construct, transgenic plants can be prepared by crossing a first plant having a recombinant DNA construct with a second plant lacking the construct. For example, recombinant DNA for gene suppression can be introduced into a first plant line that is amenable to transformation to produce a transgenic plant that can be crossed with a second plant line to introgress the recombinant DNA for gene suppression into the second plant line.

In certain embodiments, the present disclosure may be used for transformation of any plant, including, but not limited to, corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), oats, barley, vegetables, ornamentals, and conifers.

In certain embodiments, crop plants are contemplated (for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root, tuber, or seed crops). Important seed crops for the present disclosure are oil-seed rape, sugar beet, maize, sunflower, soybean, and sorghum. In certain embodiments, horticultural plants are contemplated including lettuce, endive, and vegetable brassicas including cabbage, broccoli, and cauliflower, and carnations, geraniums, petunias, and begonias. The present disclosure may be applied to tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, chrysanthemum, poplar, eucalyptus, and pine. In certain embodiments, grain seed plants such as corn, wheat, barley, rice, sorghum, and rye are contemplated. In certain embodiments, plants such as oil-seed plants are contemplated. Oil seed plants include canola, cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc. In certain embodiments, plants such as leguminous plants are contemplated. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mung bean, lima bean, fava bean, lentils, chickpea, etc.

In certain embodiments, the plants are monocots and/or dicots. Non-limiting examples of useful monocots are rice, corn, wheat, palm trees, turf grasses, barley, and oats. Non-limiting examples of useful dicots are soybean, cotton, alfalfa, canola, flax, tomato, sugar beet, sunflower, potato, tobacco, lettuce, celery, cucumber, carrot, cauliflower, and grape. In certain embodiments, plants such as flowering plants, trees, grasses, shade plants, and flowering and non-flowering ornamental plants are contemplated.

Plant pests that may be targeted using the present disclosure (i.e., that can be rendered non-pathogenic or reduced in pathogenicity) include fungi, nematodes, bacteria, and parasitic plants such as striga, dodder and mistletoe. Plant pests usefully treated by the present disclosure include the downy mildews.

The skilled artisan can readily identify pest genes to target. Such a gene could be any pest gene that serves a direct or indirect role in such a pest's deleterious effects on a host plant. By way of example only, such a gene may be one that serves a role in pest growth, development, replication and reproduction, and invasion or infection.

In certain embodiments, the pest is a plant virus. Exemplary of such plant viruses are soybean mosaic virus, bean pod mottle virus, tobacco ring spot virus, barley yellow dwarf virus, wheat spindle streak virus, soil-borne mosaic virus, wheat streak virus in maize, maize dwarf mosaic virus, maize chlorotic dwarf virus, cucumber mosaic virus, tobacco mosaic virus, alfalfa mosaic virus, potato virus X, potato virus Y, potato leafroll virus and tomato golden mosaic virus. Among these, protection against maize dwarf mosaic virus, barley yellow dwarf virus, wheat streak mosaic virus, soil-borne mosaic virus, potato leafroll virus and cucumber mosaic virus is particularly important. In certain embodiments, the pest is Botrytis cinerea, a necrotrophic pathogenic fungus with an exceptionally wide host range. The cultivated tomato (predominantly Lycopersicon esculentum) is also susceptible to infection by Botrytis, and the fungus generally affects the stem, leaves and fruit of the tomato plant.

Vector Delivery

Nucleic acids encoding the CPF1 protein or variants and/or any of the present RNAs, for instance a guide RNA, can be delivered using adeno associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof. CPF1 and one or more guide RNAs can be packaged into one or more viral vectors. In some embodiments, the viral vector is delivered to the tissue of interest by, for example, an intramuscular injection, while in other embodiments the viral delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose, or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector chosen, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.

Such a dosage may further contain, for example, a carrier (water, saline, ethanol, glycerol, lactose, sucrose, calcium phosphate, gelatin, dextran, agar, pectin, peanut oil, sesame oil, etc.), a diluent, a pharmaceutically-acceptable carrier (e.g., phosphate-buffered saline), a pharmaceutically-acceptable excipient, an adjuvant to enhance antigenicity, an immunostimulatory compound or molecule, and/or other compounds known in the art. The adjuvant herein may contain a suspension of minerals (alum, aluminum hydroxide, aluminum phosphate) on which antigen is adsorbed; or water-in-oil emulsion in which antigen solution is emulsified in oil (MF-59, Freund's incomplete adjuvant), sometimes with the inclusion of killed mycobacteria (Freund's complete adjuvant) to further enhance antigenicity (inhibits degradation of antigen and/or causes influx of macrophages). Adjuvants also include immunostimulatory molecules, such as cytokines, costimulatory molecules, and for example, immunostimulatory DNA or RNA molecules, such as CpG oligonucleotides. Such a dosage formulation is readily ascertainable by one skilled in the art. The dosage may further contain one or more pharmaceutically acceptable salts such as, for example, a mineral acid salt such as a hydrochloride, a hydrobromide, a phosphate, a sulfate, etc.; and the salts of organic acids such as acetates, propionates, malonates, benzoates, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, gels or gelling materials, flavorings, colorants, microspheres, polymers, suspension agents, etc. may also be present. In addition, one or more other conventional pharmaceutical ingredients, such as preservatives, humectants, suspending agents, surfactants, antioxidants, anticaking agents, fillers, chelating agents, coating agents, chemical stabilizers, etc. may also be present, especially if the dosage form is a reconstitutable form. 
Suitable exemplary ingredients include microcrystalline cellulose, carboxymethylcellulose sodium, polysorbate 80, phenylethyl alcohol, chlorobutanol, potassium sorbate, sorbic acid, sulfur dioxide, propyl gallate, the parabens, ethyl vanillin, glycerin, phenol, parachlorophenol, gelatin, albumin and a combination thereof. A thorough discussion of pharmaceutically acceptable excipients is available in REMINGTON'S PHARMACEUTICAL SCIENCES (Mack Pub. Co., N.J. 1991) which is incorporated by reference herein.

In an embodiment herein, the delivery is via an adenovirus, which may be at a single booster dose. See, for example, the adenoviral vectors in U.S. Pat. No. 8,454,972 to Nabel et al., incorporated by reference herein, and the dosages thereof. In an embodiment herein, the adenovirus is delivered via multiple doses.

In an embodiment herein, the delivery is via an AAV. A therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of from about 20 to about 50 ml of saline solution containing from about 1 x 10^1 to about 1 x 10^10 functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects. Such concentrations may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution. Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar, et al.

In an embodiment herein, the delivery is via a plasmid. In such plasmid compositions, the dosage should be a sufficient amount of plasmid to elicit a response. For instance, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg, or from about 1 μg to about 10 μg. The doses herein are based on an average 70 kg individual. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or scientist skilled in the art. The vectors disclosed herein can be injected into the tissue of interest. For cell-type specific genome modification, the expression of CPF1 can be driven by a cell-type specific promoter. For example, liver-specific expression might use the Albumin promoter and neuron-specific expression might use the Synapsin I promoter.

CPF1 and/or any of the present RNAs, for instance a guide RNA, can also be delivered in the form of RNA. CPF1 mRNA can be generated using in vitro transcription.

To enhance expression and reduce toxicity, CPF1 and/or guide RNA can be modified using pseudo-U or 5-Methyl-C. CPF1 mRNA and guide RNA may be delivered simultaneously using nanoparticles or lipid envelopes. For example, Su et al. report in vitro and in vivo mRNA delivery using lipid-enveloped pH-responsive polymer nanoparticles (Mol Pharm, 2011, 8(3):774-87) and describe biodegradable core-shell structured nanoparticles with a poly(beta-amino ester) (PBAE) core enveloped by a phospholipid bilayer shell. These were developed for in vivo mRNA delivery. The pH-responsive PBAE component was chosen to promote endosome disruption, while the lipid surface layer was selected to minimize toxicity of the polycation core.

Furthermore, Kormann et al. report expression of therapeutic proteins after delivery of chemically modified mRNA in mice and describe the use of lipid envelopes to deliver RNA (Nature Biotechnology, 2011, 29: 154-157).

In certain embodiments, mRNA delivery methods are especially promising for liver delivery.

In certain embodiments, CPF1 mRNA and guide RNA might also be delivered separately.

CPF1 mRNA can be delivered prior to the guide RNA to give time for CPF1 to be expressed. CPF1 mRNA might be administered 1-12 hours (preferably around 2-6 hours) prior to the administration of guide RNA. Alternatively, CPF1 mRNA and guide RNA can be administered together. Advantageously, a second booster dose of guide RNA can be administered 1-12 hours (preferably around 2-6 hours) after the initial administration of CPF1 mRNA and guide RNA.

Optimal concentrations of CPF1 enzyme mRNA and guide RNA can be determined by testing different concentrations in a cellular or animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. For example, for a guide sequence targeting a gene of the human genome, deep sequencing can be used to assess the level of modification at two off-target loci. The concentration that gives the highest level of on-target modification while minimizing the level of off-target modification should be chosen for in vivo delivery.
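The selection criterion described above leaves open how on-target and off-target modification are to be traded off. The following is a minimal sketch of one possible rule, assuming that a ceiling on off-target modification is set first and the highest on-target rate is then chosen among the concentrations that stay under it. The `DoseResult` record, the `pick_concentration` function, the 1% ceiling, and all numerical values are hypothetical placeholders, not data or methods from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DoseResult:
    concentration: float            # arbitrary units (hypothetical)
    on_target: float                # fraction of reads modified at the intended site
    off_target: Tuple[float, ...]   # fraction of reads modified at each off-target locus


def pick_concentration(
    results: Sequence[DoseResult], max_off_target: float = 0.01
) -> Optional[DoseResult]:
    """Return the result with the highest on-target rate whose worst
    off-target rate does not exceed max_off_target, or None if none qualify.

    The ceiling-then-maximize rule is an assumption made for illustration.
    """
    acceptable = [r for r in results if max(r.off_target) <= max_off_target]
    if not acceptable:
        return None
    return max(acceptable, key=lambda r: r.on_target)


# Placeholder values only; these are not experimental results.
titration = [
    DoseResult(concentration=1.0, on_target=0.20, off_target=(0.001, 0.002)),
    DoseResult(concentration=5.0, on_target=0.45, off_target=(0.004, 0.008)),
    DoseResult(concentration=25.0, on_target=0.60, off_target=(0.030, 0.015)),
]
best = pick_concentration(titration)
print(best.concentration if best else "no concentration met the ceiling")
```

Other rules, such as maximizing the ratio of on-target to off-target modification, would fit the same description; the choice depends on how much off-target activity is tolerable for the application.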

In certain embodiments, CPF1 mRNA can be delivered with a pair of guide RNAs targeting a site of interest. In certain embodiments, the disclosure relates to the expression of a gene product being decreased, or a template polynucleotide being further introduced into the DNA molecule encoding the gene product, or an intervening sequence being excised precisely by allowing the two 5' overhangs of the guide RNAs to reanneal and ligate. In certain embodiments, the activity or function of the gene product is altered or the expression of the gene product is increased. In an embodiment of the invention, the gene product is a protein.

Additional delivery options for the brain include encapsulation of CPF1 and guide RNA in the form of either DNA or RNA into liposomes and conjugating to molecular Trojan horses for trans-blood brain barrier (BBB) delivery. Molecular Trojan horses have been shown to be effective for delivery of B-gal expression vectors into the brain of non-human primates. The same approach can be used to deliver vectors containing CPF1 and guide RNA. For instance, Xia et al. report antibody-mediated targeting of siRNA via the human insulin receptor using avidin-biotin technology (Mol Pharm, 2009, 6(3):747-51) and describe how delivery of short interfering RNA (siRNA) to cells in culture, and in vivo, is possible with combined use of a receptor-specific monoclonal antibody (mAb) and avidin-biotin technology. The authors also report that because the bond between the targeting mAb and the siRNA is stable with avidin-biotin technology, RNAi effects at distant sites such as brain are observed in vivo following an intravenous administration of the targeted siRNA.

Zhang et al. report global non-viral gene transfer to the primate brain following intravenous administration (Mol Ther, 2003, 7(1): 11-8) and describe how expression plasmids encoding reporters such as luciferase were encapsulated in the interior of an "artificial virus" comprised of an 85 nm pegylated immunoliposome, which was targeted to the rhesus monkey brain in vivo with a monoclonal antibody (MAb) to the human insulin receptor (HIR). The HIRMAb enables the liposome carrying the exogenous gene to undergo transcytosis across the blood-brain barrier and endocytosis across the neuronal plasma membrane following intravenous injection. The level of luciferase gene expression in the brain was 50-fold higher in the rhesus monkey as compared to the rat. Widespread neuronal expression of the beta-galactosidase gene in primate brain was demonstrated by both histochemistry and confocal microscopy. The authors indicate that this approach makes feasible reversible adult transgenics in 24 hours. Accordingly, the use of immunoliposome is preferred. These may be used in conjunction with antibodies to target specific tissues or cell surface proteins.

Other means of delivery of RNA are also preferred, such as via nanoparticles (Cho et al., Advanced Functional Materials, 2010, 19: 3112-3118) or exosomes (Schroeder et al., J Intern Med, 2010, 267: 9-21). El-Andaloussi et al. report exosome-mediated delivery of siRNA in vitro and in vivo (Nat Protoc, 2012, 7(12):2112-26) and describe how exosomes are promising tools for drug delivery across different biological barriers and can be harnessed for delivery of siRNA in vitro and in vivo.

Bacteria and Phage vectors

Provided are compositions and methods for selectively reducing the amount of antibiotic resistant and/or virulent bacteria in a mixed bacteria population, or for reducing any other type of unwanted bacteria in a mixed bacteria population. The compositions and methods involve targeting bacteria that are differentiated from other members of the population. Delivery of a vector encoding a trans-activating CPF1 and a targeting guide RNA into the cell results in expression and assembly of the CPF1 guide RNA complex. This complex cleaves the targeted gene (e.g., an antibiotic resistance gene), resulting in bacterial cell death.

Genetic resistance to antibiotics can arise through mutations in chromosomal DNA of bacteria or by acquisition of foreign episomes that contain resistance genes. Antibiotic-sensitive bacteria can acquire resistance genes from other bacteria that become chromosomally integrated and cause the bacteria to become resistant to the antibiotic. Episomes are replicating extrachromosomal DNA elements that either remain as intact entities inside the bacterium or integrate into the bacterial chromosome.

In certain embodiments, the vector is a plasmid, phage or a phagemid. A phagemid is a plasmid that contains an f1 origin of replication from an f1 phage. It can be used as a vector in combination with filamentous phage M13.

In certain embodiments, this disclosure contemplates recombinant phage that express CPF1 or variants disclosed herein optionally in combination with guide sequences. Yosef et al. report temperate and lytic bacteriophages programmed to sensitize and kill antibiotic-resistant bacteria (Proc Natl Acad Sci U S A, 2015, 112(23):7267-72). In certain embodiments, this disclosure relates to recombinant temperate or lytic bacteriophages that encode CPF1 and variants disclosed herein and guide RNA sequences that target antibiotic escape mutants in order to destroy specific DNAs that confer antibiotic resistance and to concurrently confer a selective advantage to antibiotic-sensitive bacteria by virtue of resistance to lytic phages. In certain embodiments, the phage is lambda phage. In certain embodiments, the phage incorporates into the bacterial genome.

In certain embodiments, the guide RNA targets a methicillin resistance gene in methicillin-resistant Staphylococcus aureus (MRSA). In certain embodiments, the guide RNA targets a carbenicillin resistance gene of Escherichia coli. In certain embodiments, the guide RNA targets a resistance gene in kanamycin-resistant S. aureus.

In certain embodiments, the disclosure relates to bacteria comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter and a heterologous nucleic acid sequence that encodes a guide RNA sequence. In certain embodiments, the guide sequence targets a bacterial resistance gene. In certain embodiments, the bacteria naturally encode or do not naturally encode a CPF1 protein having SEQ ID NO: 1 or variant thereof, and are provided with a heterologous nucleic acid that encodes CPF1 and a heterologous nucleic acid that encodes the guide RNA.

In certain embodiments, the nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter is in the bacterial chromosome or on an episome or in a vector inside the bacteria, e.g., using bacteriophage or bacteria carrying plasmids transmissible by conjugation.

In certain embodiments, the heterologous nucleic acid sequence that encodes a guide RNA sequence is in the bacterial chromosome or on an episome or in a vector inside the bacteria, e.g., using bacteriophage or bacteria carrying plasmids transmissible by conjugation.

In certain embodiments, the disclosure relates to non-naturally occurring recombinant bacteria comprising a nucleic acid sequence encoding a CPF1 protein having SEQ ID NO: 1 or variant in operable combination with a promoter and a heterologous nucleic acid sequence that encodes a guide RNA sequence.

In certain embodiments, the disclosure provides pharmaceutical compositions for selectively reducing the amount of bacteria in a mixed bacteria population wherein the composition comprises a pharmaceutically acceptable carrier and a packaged, recombinant phage. The phage comprises nucleotide sequences encoding i) CPF1; and ii) a targeting guide RNA selected from a) at least one bacterial chromosome targeting guide RNA; or b) at least one plasmid targeting guide RNA; or a combination of a) and b). In certain embodiments, the targeting guide RNA is directed to a bacterial virulence gene or an antibiotic resistance gene in the bacteria. Such gene targets can be on the bacterial chromosome, a plasmid in the bacteria, or both. In certain embodiments, the targeting guide RNA is specific for a DNA sequence present in a virulent and/or antibiotic resistant bacteria, but the DNA sequence is not present in non-virulent and non-antibiotic resistant bacteria in the bacterial population.
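The requirement that a targeting guide RNA match a sequence present in the unwanted bacteria but absent from the rest of the population can be illustrated with a simple sequence filter. The following is a minimal sketch under several assumptions that are not stated in this section: that the protospacer lies immediately 3' of a 5'-TTN-3' PAM (as reported for F. novicida Cpf1 by Zetsche et al.), that a 20-nucleotide protospacer is used, that only the forward strand of the target gene is scanned, and that exact-match absence from the other genomes is a sufficient test. The function names are hypothetical, and a practical design tool would also need to consider mismatch-tolerant off-target binding.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_protospacers(target: str, length: int = 20, pam: str = "TT") -> list:
    """List protospacers found 3' of a TTN PAM on the forward strand.

    The PAM is modeled as the fixed prefix `pam` followed by any one base (N).
    Both the PAM model and the default length are assumptions.
    """
    target = target.upper()
    window = len(pam) + 1 + length
    sites = []
    for i in range(len(target) - window + 1):
        if target[i:i + len(pam)] == pam:
            start = i + len(pam) + 1  # skip the N position of the PAM
            sites.append(target[start:start + length])
    return sites


def specific_protospacers(target: str, background: list, length: int = 20) -> list:
    """Keep protospacers that have no exact match, on either strand,
    in any of the background (non-target) sequences."""
    hits = []
    for proto in candidate_protospacers(target, length):
        rc = reverse_complement(proto)
        if all(proto not in g.upper() and rc not in g.upper() for g in background):
            hits.append(proto)
    return hits


# Toy sequences for illustration only; these are not real genes.
resistance_gene = "CCTTAGATTACACC"
sensitive_genome = ["AAAACCCCGGGG"]
print(specific_protospacers(resistance_gene, sensitive_genome, length=5))
```

With real data, `target` would be the resistance or virulence gene and `background` would hold the chromosomes and plasmids of the bacteria that are meant to be spared.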

Kits

In certain embodiments, the disclosure contemplates a kit, comprising: one or more vectors disclosed herein and a set of instructions comprising at least one method for transfecting a cell with said vectors. In one embodiment, the set of instructions further comprises at least one method for differentiating a pluripotent stem cell into a somatic cell with the vectors. In one embodiment, the set of instructions further comprises at least one method for reprogramming a somatic cell into an induced pluripotent stem cell with the vectors. In one embodiment, the somatic cell is selected from the group consisting of a mesenchymal somatic cell, a fibroblast somatic cell, a cardiomyocyte somatic cell, a hematopoietic cell, a neuronal somatic cell, a midbrain dopamine somatic cell, and a pancreatic beta somatic cell.

In one embodiment, the present invention contemplates a kit, comprising: one or more vectors disclosed herein and a set of instructions comprising at least one method for editing a specific target sequence within a cell with said vectors.

In some embodiments, the kits can optionally include enzymes capable of performing PCR (e.g., DNA polymerase, Taq polymerase) and/or restriction enzymes.

In some embodiments, the kits can optionally include a delivery vehicle for said vectors (e.g., a liposome). The reagents may be provided suspended in the excipient and/or delivery vehicle or may be provided as a separate component which can be later combined with the excipient and/or delivery vehicle.

In some embodiments, the kits may optionally contain additional therapeutics to be coadministered with the vectors to effect the desired transcriptional regulation. In some embodiments, the kits may also optionally include appropriate systems (e.g., opaque containers) or stabilizers (e.g., antioxidants) to prevent degradation of the reagents by light or other adverse conditions.

In some embodiments, the kits may optionally include instructional materials containing directions (i.e., protocols) providing for the use of the reagents in affecting transcriptional regulation of cell cultures and delivery of said vectors to said cell cultures. While the instructional materials typically comprise written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses to internet sites that provide such instructional materials.

F. novicida Cpf1 Mutants

Mutation of D917 will inactivate a cleavage site. Mutating this residue can be used to make a nickase Cpf1. It can also be mutated to make a deactivated Cpf1, or dCpf1. Mutating any of the following residues alone or in combination (e.g., with D917) in F. novicida U112 Cpf1 disrupts DNA targeting activity, creating a dCpf1 or nickase by disrupting active sites or DNA/RNA recognition and binding: D792, D917, E1006, E1028, D1255, N887, D870, R872, N802, T805, Y807, D40, K326, R202, N207, P23, F786, R833, N816, R918, Y952, N877, W971, K981, Y984, F1012, Q1056, Q1070, T1082, P1087, K595, Y1024, T805, L957, W971, I977, K981, Y984, K1034, T1082, S1083, P1087, R1218, P1232, N1257, A1259.

In certain embodiments, the mutated residue is contemplated to be replaced with alanine or glycine (A or G) or with a non-conserved substitution.
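Residue labels such as D917 combine the wild-type amino acid with its 1-based position, so a substitution can be checked against the reference sequence before it is applied. The following is a minimal sketch of that bookkeeping; the function name `apply_substitutions` and the four-residue toy sequence are hypothetical, and SEQ ID NO: 1 is not reproduced here.

```python
import re


def apply_substitutions(protein: str, mutations: list, replacement: str = "A") -> str:
    """Replace each listed residue (e.g. 'D917') with `replacement`.

    Each label is a one-letter wild-type residue followed by a 1-based
    position. A ValueError is raised if the label is malformed, the position
    is out of range, or the reference does not carry the stated residue.
    """
    residues = list(protein)
    for label in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)", label)
        if match is None:
            raise ValueError(f"cannot parse mutation label: {label!r}")
        wild_type, position = match.group(1), int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy reference sequence for illustration only.
toy = "MDKE"
print(apply_substitutions(toy, ["D2"]))                   # alanine substitution
print(apply_substitutions(toy, ["E4"], replacement="G"))  # glycine substitution
```

The wild-type check guards against position-numbering mismatches, which matter here because the listed positions refer specifically to the F. novicida U112 sequence.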