Title:
CAS12 PROTEIN, CRISPR-CAS SYSTEM AND USES THEREOF
Document Type and Number:
WIPO Patent Application WO/2024/042479
Kind Code:
A1
Abstract:
Provided is an engineered, non-naturally occurring Cas12 protein comprising an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity. Also provided are sgRNAs, CRISPR-Cas systems, cells and uses thereof. These Cas12 proteins should enable wider application of CRISPR-Cas systems for gene editing and gene targeting.

Inventors:
WANG BANG (CN)
Application Number:
PCT/IB2023/058394
Publication Date:
February 29, 2024
Filing Date:
August 24, 2023
Assignee:
GENEDITBIO LTD (CN)
International Classes:
C12N9/22; C07K14/195; C12N15/11; C12N15/63
Domestic Patent References:
WO2022092317A12022-05-05
Foreign References:
CN111676246A2020-09-18
CN110799525A2020-02-14
US20210009974A12021-01-14
Other References:
LUAN TIAN, GONG JUN, LUAN HUI, LIU WEN-YU, YANG QIN, ZHU YAO, WANG CHUN-LAI, LIU SI-GUO, ZHANG WAN-JIANG; LI GANG: "Establishment of a visual method for detection of Actinobacillus pleuropneumoniae based on CRISPR-Cas12a", ZHONGGUO YUFANG SHOUYI XUEBAO - CHINESE JOURNAL OF PREVENTIVE VETERINARY MEDICINE, ZHONGGUO NONGYE KEXUEYUAN * HA'ERBIN SHOUYI YANJIUSUO, CN, vol. 43, no. 8, 1 August 2021 (2021-08-01), CN , pages 843 - 847, XP093143828, ISSN: 1008-0589, DOI: 10.3969/j.issn.1008-0589.202012051
HUANG HONGXIN, HUANG GUANJIE, TAN ZHIHONG, HU YONGFEI, SHAN LIN, ZHOU JIAJIAN, ZHANG XIN, MA SHUFENG, LV WEIQI, HUANG TAO, LIU YUC: "Engineered Cas12a-Plus nuclease enables gene editing with enhanced activity and specificity", BMC BIOLOGY, vol. 20, no. 1, 1 December 2022 (2022-12-01), XP093004963, DOI: 10.1186/s12915-022-01296-1
Attorney, Agent or Firm:
SHANGHAI BESHINING LAW OFFICE (CN)
Claims:
What is claimed is:

1. An engineered, non-naturally occurring Cas12 protein, wherein the Cas12 protein comprises an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity.

2. The engineered, non-naturally occurring Cas12 protein of claim 1, wherein the Cas12 protein comprises an amino acid sequence having at least 75%, 80%, 85%, 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35; or, the Cas12 protein comprises an amino acid sequence having at least 90%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.

3. The engineered, non-naturally occurring Cas12 protein of claim 1, wherein the amino acid sequence of the Cas12 protein lacks 25-40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71; preferably, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71; more preferably, the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71; even more preferably, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

4. The engineered, non-naturally occurring Cas12 protein of claim 3, wherein the Cas12 protein has an amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14, or a homologue thereof having at least 70% sequence identity; preferably, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14; more preferably, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.

5. The engineered, non-naturally occurring Cas12 protein of claim 1, wherein the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13; or, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13; preferably, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.

6. The engineered, non-naturally occurring Cas12 protein of claim 1, wherein the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26; or, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26; or, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.

7. The engineered, non-naturally occurring Cas12 protein of claim 1, wherein the Cas12 protein further comprises a promoter sequence, an enhancer sequence, and/or a termination region sequence; preferably, the Cas12 protein, based on any one of SEQ ID NOs: 1-35, comprises an M amino acid residue at its N-terminus.

8. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-7, wherein the Cas12 protein further comprises one or more of a nuclear localization signal sequence, a cell penetrating peptide sequence, an affinity tag and/or a fusion base editor protein; preferably, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152.

9. An engineered, non-naturally occurring cell comprising the Cas12 protein of any one of claims 1-8; preferably, the cell is a eukaryotic cell or a prokaryotic cell; more preferably, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.

10. A kit comprising the Cas12 protein of any one of claims 1-8.

11. An engineered, non-naturally occurring Cas12 polynucleotide encoding the Cas12 protein of any one of claims 1-8; preferably, the polynucleotide is a ribonucleotide sequence or a deoxyribonucleotide sequence, or analogs thereof; preferably the polynucleotide is mRNA, and the polynucleotide further comprises a 5’ cap sequence and a poly-A tail sequence; more preferably, the polynucleotide is codon optimized for expression in a cell of interest; even more preferably, the polynucleotide is codon optimized for expression in a eukaryotic cell; preferably the polynucleotide has at least 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 111-120; e.g. the polynucleotide has the sequence selected from SEQ ID NOs: 111-120; or the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115 or SEQ ID NOs: 117-120; or, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.

12. The polynucleotide of claim 11, wherein the polynucleotide has at least 70% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87; or, the polynucleotide has at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 98% or 99% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.

13. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-8, or the Cas12 polynucleotide of claim 11 or 12, for use as a nuclease, preferably for use as a double-strand DNA cleavage nuclease or nickase.

14. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-8, or the Cas12 polynucleotide of claim 11 or 12, for use in gene editing.

15. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-8, or the Cas12 polynucleotide of claim 11 or 12, for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

16. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-8, or the Cas12 polynucleotide of claim 11 or 12, for use as a medicament.

17. The engineered, non-naturally occurring Cas12 protein of any one of claims 1-8, or the Cas12 polynucleotide of claim 11 or 12 for use in a method of therapeutic treatment of a patient.

18. An engineered vector comprising the Cas12 polynucleotide of claim 11 or 12; preferably, the vector is an expression vector; more preferably, the vector is an inducible, conditional, or constitutive expression vector.

19. A vector system comprising one or more vectors of claim 18; preferably, said one or more vectors comprise a polynucleotide according to claim 11 or 12 and one or more polynucleotides which are on the same or a different vector encoding a gRNA.

20. An engineered cell comprising the Cas12 polynucleotide of claim 11 or 12, or comprising the vector of claim 18, or comprising the vector system of claim 19; preferably, the cell is expressing the Cas12 protein; more preferably, the cell transiently expresses or non-transiently expresses the modified CRISPR-Cas12 protein; even more preferably, the cell is a eukaryotic cell or a prokaryotic cell; e.g. the cell is a mammalian cell or a human cell or a plant cell.

21. A reagent kit comprising the Cas12 protein of any one of claims 1-8, or comprising the Cas12 polynucleotide of claim 11 or 12, or comprising the vector of claim 18, or comprising the vector system of claim 19.

22. A pharmaceutical composition comprising the Cas12 protein of any one of claims 1-8 or the polynucleotide of claim 11 or 12 or the vector of claim 18 or the vector system of claim 19 formulated for delivery by AAV (adeno-associated viruses), Adenoviruses, retroviruses, HSV (herpes simplex virus), Gammaretrovirus, LV (lentivirus), eCIS (extracellular Contractile Injection System), eVLP (Engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

23. An engineered, non-naturally occurring CRISPR-Cas system comprising: a) the Cas12 protein of any one of claims 1-8 or the polynucleotide of claim 11 or 12; b) at least one engineered guide sequence or one or more engineered nucleic acids encoding the at least one engineered guide sequence, wherein the guide sequence comprises a direct repeat sequence capable of binding the Cas12 protein and a spacer sequence capable of hybridizing to a target nucleotide sequence; preferably, the system comprises at least one guide sequence which is capable of hybridizing to at least one target sequence or to different regions of one target sequence.

24. The system of claim 23, wherein the guide sequence hybridizes to one or more target sequences in a prokaryotic cell or in a eukaryotic cell; preferably, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell; or, the eukaryotic cell comprises a mammalian cell, e.g. a human cell, or a plant cell.

25. The system of claim 23, wherein the target sequence is a DNA; preferably, the target sequence is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.

26. The system of claim 23, wherein the direct repeat sequence comprises a stem-loop structure comprising: a first stem nucleotide strand which comprises 4-6 nucleotides; a second stem nucleotide strand which comprises 4-6 nucleotides, wherein the first and second stem nucleotide strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem nucleotide strands, wherein the loop nucleotide strand comprises 4 or 5 nucleotides; preferably, the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to any one of SEQ ID NOs: 153-156; preferably, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
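The stem-loop geometry recited in claim 26 (two stems of 4-6 nucleotides that can hybridize, separated by a loop of 4 or 5 nucleotides) can be illustrated with a short scan over a candidate direct repeat. This is only a sketch under stated assumptions: the function name `find_stem_loops` is hypothetical, hybridization is reduced to strict Watson-Crick pairing (G-U wobble pairs are ignored), and the example sequence is a made-up toy, not one of SEQ ID NOs: 153-156.

```python
# Watson-Crick RNA base pairs only (assumption: wobble pairs are not counted).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_stem_loops(rna, stem_range=(4, 6), loop_range=(4, 5)):
    """Return (start, first_stem, loop, second_stem) for every window in which
    the two stems are fully complementary in antiparallel orientation."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for stem in range(stem_range[0], stem_range[1] + 1):
        for loop in range(loop_range[0], loop_range[1] + 1):
            total = 2 * stem + loop
            for i in range(len(rna) - total + 1):
                first = rna[i:i + stem]
                second = rna[i + stem + loop:i + total]
                # Base k of the first stem pairs with base -(k+1) of the second.
                if all((a, b) in PAIRS for a, b in zip(first, reversed(second))):
                    hits.append((i, first, rna[i + stem:i + stem + loop], second))
    return hits


# Toy hairpin: stem GGCA, loop UUUU, stem UGCC, flanked by AA on each side.
toy_repeat = "AAGGCAUUUUUGCCAA"
print(find_stem_loops(toy_repeat))
```

A real analysis would normally rely on a dedicated RNA folding tool; the scan above only checks the length and pairing constraints named in the claim.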

27. The system of claim 23, wherein the spacer sequence is between 10 and 40 nucleotides in length, preferably the spacer sequence is between 15 and 30 nucleotides in length, or between 18 and 25 nucleotides in length.

28. The system of claim 23, wherein an mRNA or a DNA encodes the Cas12 protein.

29. The system of claim 23, wherein the polynucleotide encoding the Cas12 protein is operably linked to a promoter; preferably, the promoter is a constitutive promoter, a tissue-specific promoter or an inducible promoter; or the polynucleotide encoding the Cas12 protein operably linked to a promoter is in a vector; more preferably, the vector is selected from the group consisting of a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.

30. The system of any one of claims 23-29, wherein the system further comprises a donor template nucleic acid, the donor template nucleic acid being a DNA, an RNA or a DNA-RNA hybrid; or the targeting of the target sequence by the Cas12 protein and guide sequence results in a modification of the target sequence; preferably, the modification of the target sequence is a cleavage event or a nicking event; preferably, the target sequence is 3’ of a Protospacer Adjacent Motif (PAM), wherein the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 70% sequence identity.
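The degenerate PAM codes in claim 30 (TTTR, TYYN, TTTD) can be expanded into regular expressions to locate candidate target sites, with the target sequence taken immediately 3’ of the PAM as the claim states. The following is a minimal sketch rather than the applicant's method: the helper names, the 20-nucleotide default protospacer length, and the restriction to the given strand only are assumptions made for illustration.

```python
import re

# IUPAC codes used in the claim, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # R is A or G
    "Y": "[CT]",    # Y is T or C
    "D": "[AGT]",   # D is A, G, or T
    "N": "[ACGT]",  # N is any base
}


def pam_regex(pam):
    """Translate a degenerate PAM such as 'TTTR' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(dna, pam, spacer_len=20):
    """Return (pam_start, pam_seq, protospacer) for every PAM on the given
    strand that is followed by a full-length protospacer on its 3' side.
    A lookahead is used so that overlapping sites are all reported."""
    dna = dna.upper()
    pattern = re.compile(f"(?=({pam_regex(pam)})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Toy sequence (hypothetical), short protospacer length for readability.
print(find_targets("CCTTTAGATCGATCG", "TTTR", spacer_len=5))
print(find_targets("TTTCAAAAA", "TYYN", spacer_len=5))
```

Scanning the reverse complement as well would be needed to cover both strands of a double-stranded target.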

31. A delivery system, wherein the system of any one of claims 23-30 is presented in a delivery vehicle selected from the group consisting of AAV (adeno-associated viruses), Adenoviruses, retroviruses, HSV (herpes simplex virus), Gammaretrovirus, LV (lentivirus), eCIS (extracellular Contractile Injection System), eVLP (Engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

32. An engineered cell comprising the system of any one of claims 23-30; preferably, the cell is a eukaryotic cell or a prokaryotic cell; more preferably, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell; or, the cell is a mammalian cell or a human cell or a plant cell.

33. The engineered, non-naturally occurring CRISPR-Cas system of any one of claims 23-30, or the delivery system of claim 31 for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

34. The engineered, non-naturally occurring CRISPR-Cas system of any one of claims 23-30, delivery system of claim 31 or the engineered cell of claim 32 for use as a medicament.

35. The engineered, non-naturally occurring CRISPR-Cas system of any one of claims 23-30, delivery system of claim 31 or the engineered cell of claim 32 for use in a method of therapeutic treatment of a patient.

36. A method of modifying or targeting a target DNA locus, the method comprising delivering to said locus a CRISPR-Cas system of any one of claims 23-30 or a delivery system of claim 31; preferably: said modifying or targeting a target locus comprises inducing a DNA strand break; or, said modifying or targeting a target locus comprises inducing a DNA double strand break or a DNA single strand break; or, said modifying or targeting a target locus comprises altering gene expression of one or more genes; or, said modifying or targeting a target locus comprises epigenetic modification of said target DNA locus.

37. The method of claim 36, which is a method of modifying a cell, a cell line, or an organism by manipulation of one or more target sequences at genomic loci of interest; preferably, the cell is a eukaryotic cell or a prokaryotic cell; or, the method is in vitro or in vivo; more preferably, the cell is a mammalian cell or a human cell or a plant cell.

38. A method of targeting and cleaving a double-stranded target DNA, the method comprising: contacting the double-stranded target DNA with a system of any one of claims 23-30; preferably, cleaving the target DNA or target sequence results in the formation of an indel or the insertion of a nucleotide sequence; or cleaving the target DNA or target sequence comprises cleaving the target DNA or target sequence at two sites, and results in the deletion or inversion of a sequence between the two sites.
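The two-site outcome described in claim 38 (deletion or inversion of the sequence between two cleavage sites) can be pictured with plain string operations on one strand. This sketch makes simplifying assumptions that are not taken from the disclosure: each cut is modelled as a single blunt position, repair is assumed to be a clean rejoining, and the names `revcomp` and `two_site_outcome` are hypothetical.

```python
# Complement table for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def two_site_outcome(seq, cut1, cut2, outcome):
    """Model the top-strand result of cutting at 0-based positions cut1 < cut2.

    outcome == "deletion":  the segment between the cuts is lost.
    outcome == "inversion": the segment is re-ligated in the opposite
                            orientation, i.e. as its reverse complement.
    """
    if not 0 <= cut1 < cut2 <= len(seq):
        raise ValueError("cut positions must satisfy 0 <= cut1 < cut2 <= len(seq)")
    left, middle, right = seq[:cut1], seq[cut1:cut2], seq[cut2:]
    if outcome == "deletion":
        return left + right
    if outcome == "inversion":
        return left + revcomp(middle) + right
    raise ValueError("outcome must be 'deletion' or 'inversion'")


toy_locus = "AAAACCGTTTT"  # hypothetical locus; segment between the cuts is CCG
print(two_site_outcome(toy_locus, 4, 7, "deletion"))
print(two_site_outcome(toy_locus, 4, 7, "inversion"))
```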

39. An isolated eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to the method or via use of the composition or via use of the system of any one of the preceding claims.

40. A system for detecting the presence of a nucleic acid target sequence in an in vitro sample, comprising: a Cas12 protein of any one of claims 1-8; at least one guide polynucleotide comprising a guide sequence capable of binding the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence; and wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct activated by the target sequence.

41. A method for detecting target nucleic acids in samples comprising: contacting one or more samples with a Cas12 protein of any one of claims 1-8; at least one guide polynucleotide comprising a guide sequence designed to have a degree of complementarity with the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence, wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct activated by the target sequences; and detecting a signal from cleavage of the non-target sequence, thereby detecting the one or more target nucleic acid sequences in the sample.

42. An engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement:

I. a direct repeat sequence;

II. a spacer sequence, which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182; preferably, the tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation; or, the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156; or, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156; or, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181; or, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182; or, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
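The tandem arrangement in claim 42 (direct repeat followed by spacer, read 5’ to 3’) together with the spacer length window of claim 27 (between 10 and 40 nucleotides) can be sketched as a small assembly helper. The function names are hypothetical, the length bounds are treated as inclusive by assumption, and the example sequences are toys rather than any of SEQ ID NOs: 153-182.

```python
def to_rna(seq):
    """Uppercase, convert T to U, and reject characters outside A/C/G/U."""
    rna = seq.upper().replace("T", "U")
    bad = set(rna) - set("ACGU")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return rna


def build_sgrna(direct_repeat, spacer, min_len=10, max_len=40):
    """Concatenate direct repeat and spacer in 5'-to-3' order.

    The default bounds mirror the 10-40 nt spacer range recited in claim 27;
    treating both ends as inclusive is an assumption of this sketch."""
    dr = to_rna(direct_repeat)
    sp = to_rna(spacer)
    if not min_len <= len(sp) <= max_len:
        raise ValueError(f"spacer length {len(sp)} outside {min_len}-{max_len} nt")
    return dr + sp


# Toy inputs (hypothetical): a 12-nt repeat and a 20-nt spacer given as DNA.
print(build_sgrna("GGCAUUUUUGCC", "ACGTACGTACGTACGTACGT"))
```

Passing `min_len=15, max_len=30` or `min_len=18, max_len=25` would apply the narrower preferred ranges from the same claim.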

43. A DNA polynucleotide molecule encoding the sgRNA of claim 42.

44. A DNA expression vector comprising the DNA polynucleotide molecule of claim 43; preferably, wherein the vector further comprises one or more regulatory element(s) operably linked to sequences encoding the sgRNA; more preferably, at least one regulatory element is capable of directing expression of the sgRNA within the cell.

45. A delivery vector carrying one or more sgRNA of claim 42.

46. An engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof; preferably, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156; or, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.

47. An engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182; preferably, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182; or, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

48. A DNA polynucleotide molecule encoding the spacer sequence of claim 47.

49. The engineered, non-naturally occurring spacer sequence of claim 47 for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

50. The engineered, non-naturally occurring spacer sequence of claim 47 for use as a medicament.

51. The engineered, non-naturally occurring spacer sequence of claim 47 for use in a method of therapeutic treatment of a patient.

Description:
Cas12 Protein, CRISPR-Cas System and Uses Thereof

This application claims priority to PCT/CN2022/114894, filed on August 25, 2022; PCT/CN2022/132014, filed on November 15, 2022; PCT/CN2022/115244, filed on August 26, 2022; PCT/CN2022/120125, filed on September 21, 2022; PCT/CN2022/115249, filed on August 26, 2022; PCT/CN2022/125447, filed on October 14, 2022; PCT/CN2023/088765, filed on April 17, 2023; PCT/CN2023/094273, filed on May 15, 2023, each of which is incorporated herein in its entirety and for all purposes.

Technical Field

The present disclosure relates to a Cas12 protein, a CRISPR-Cas system and uses thereof. In particular, the Cas12 protein and CRISPR-Cas system are used for gene targeting or gene editing.

Background

Targeted genome editing or modification is rapidly becoming an important tool for basic and applied research, with the clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins (CRISPR-Cas) system showing the most promise due to the ease of altering target specificity by engineering the associated guide RNAs. Recent advances in genome sequencing techniques and analysis methods have significantly accelerated the ability to catalog and map genetic factors associated with a diverse range of biological functions and diseases. Precise genome targeting technologies are needed to enable systematic reverse engineering of causal genetic variations by allowing selective perturbation of individual genetic elements, as well as to advance synthetic biology, biotechnological, and medical applications.

Various CRISPR-Cas systems have been explored, and different CRISPR-Cas systems present different characteristics. One example is the CRISPR-Cas12a system, which belongs to class II of the CRISPR-Cas systems and is an alternative to the widely used CRISPR-Cas9. Further studies showed that each subtype of the CRISPR-Cas system is itself diverse, and the taxonomy of some subtypes is highly controversial.

While these diverse properties of the CRISPR-Cas12a system provide potential for the development of versatile tools for genome engineering, challenges remain, including the small number of currently identified orthologs, limited genomic targeting coverage, and relatively low editing efficiency. Given the variety and wealth of microbial genomes, it is reasonable to expect that countless Cas12 proteins have yet to be identified, many of which could exhibit alternate target recognition or enhanced editing efficiency over the commercially available Cas12.

Summary

There exists a pressing need for alternative Cas12a systems and techniques for gene editing with a wide array of applications. This invention addresses this need and provides related advantages. Mining new Cas proteins will help to obtain CRISPR-Cas systems with higher gene editing efficiency and specificity. In addition, the discovery of smaller Cas proteins is attractive from the standpoint of intracellular delivery via viral vectors. Collectively, 35 novel Cas12 proteins are presented and should enable wider application of CRISPR-Cas systems for gene editing.

The study found that they exhibit some special characteristics. Although they are phylogenetically more closely related to Cas12a than to other subtypes, the phylogenetic tree shows that each has its own unique branch, suggesting that they are evolutionarily distinct.

In one aspect, the disclosure provides an engineered, non-naturally occurring Cas12 protein, wherein the Cas12 protein comprises an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 75%, 80%, 85%, 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.
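As an illustration of how a threshold such as "at least 70% sequence identity" might be checked, the sketch below computes percent identity over two sequences that have already been aligned, with gaps written as `-`. The text here does not specify an alignment method or how the denominator is defined, so both are assumptions: identity is taken as identical columns divided by all aligned columns, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a precomputed pairwise alignment.

    Assumption: identity = identical residue columns / aligned columns,
    ignoring columns where both sequences have a gap. Other conventions
    (e.g. dividing by the shorter sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment: 6 identical columns out of 8.
print(percent_identity("MKTLLV-A", "MKSLLVQA"))
print(meets_threshold("MKTLLV-A", "MKSLLVQA", threshold=70.0))
```

In practice the alignment itself would come from an established tool, and the identity definition used should be stated alongside any reported percentage.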

In some embodiments, the amino acid sequence of the Cas12 protein lacks 25-40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. Furthermore, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
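One way to picture the statement that a protein "lacks" a number of amino acids in a domain relative to a reference is to count reference residues inside that domain that align to gaps in the query. The sketch below does this for a precomputed pairwise alignment. It is illustrative only: the function name is hypothetical, the domain boundaries of SEQ ID NO: 71 are not given in this text and must be supplied by the user, and the toy sequences are invented.

```python
def residues_missing_in_domain(aligned_ref, aligned_query, domain_start, domain_end):
    """Count reference residues within [domain_start, domain_end] that align
    to a gap in the query.

    Coordinates are 1-based, inclusive, and refer to the ungapped reference.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    missing = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion in the query; does not advance the reference
        ref_pos += 1
        if domain_start <= ref_pos <= domain_end and query_char == "-":
            missing += 1
    return missing


# Toy alignment: reference residues 3-5 are absent from the query.
print(residues_missing_in_domain("MKTAYIAKQR", "MK---IAKQR", 2, 6))
```

The returned count could then be compared against a range such as 25-40 or a minimum such as 28, using domain coordinates taken from an annotation of the reference.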

In some embodiments, the Cas12 protein has an amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14, or a homologue thereof having at least 70% sequence identity.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.

In some embodiments, the Cas12 protein further comprises a promoter sequence, an enhancer sequence, and/or a termination region sequence.

In some embodiments, the Cas12 protein, based on any one of SEQ ID NOs: 1-35, comprises an M amino acid residue at its N-terminus.

In some embodiments, the Cas12 protein further comprises one or more of a nuclear localization signal sequence, a cell penetrating peptide sequence, an affinity tag and/or a fusion base editor protein.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152.

In another aspect, the disclosure provides an engineered, non-naturally occurring cell comprising the Cas12 protein of any of the above.

In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

In another aspect, the disclosure provides a kit comprising the Cas12 protein of any of the above.

In another aspect, the disclosure provides an engineered, non-naturally occurring Cas12 polynucleotide encoding the Cas12 protein of any of the above.

In some embodiments, the polynucleotide is a ribonucleotide sequence or a deoxyribonucleotide sequence, or analogs thereof; preferably the polynucleotide is mRNA, and the polynucleotide further comprises a 5’ cap sequence and a poly-A tail sequence. In some embodiments, the polynucleotide is codon optimized for expression in a cell of interest. In some embodiments, the polynucleotide is codon optimized for expression in a eukaryotic cell. In some embodiments, the polynucleotide has at least 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has the sequence selected from SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115 or SEQ ID NOs: 117-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell, preferably a human cell.

In some embodiments, the polynucleotide has at least 70% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.

In some embodiments, the polynucleotide has at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 98% or 99% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above for use as a nuclease, preferably, for use as a double-strand DNA cleavage nuclease or nickase.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above for use in gene editing.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein of any one of above, or the Cas12 polynucleotide of any one of above for use in a method of therapeutic treatment of a patient.

In another aspect, the disclosure provides an engineered vector comprising the Cas12 polynucleotide of any one of above.

In some embodiments, the vector is an expression vector. In some embodiments, the vector is an inducible, conditional, or constitutive expression vector.

In another aspect, the disclosure provides a vector system comprising one or more vectors of any one of above.

In some embodiments, one or more vectors comprise a polynucleotide according to any one of above and one or more polynucleotides which are on the same or a different vector encoding a gRNA.

In another aspect, the disclosure provides an engineered cell comprising the Cas12 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.

In some embodiments, the cell is expressing the Cas12 protein. In some embodiments, the cell transiently expresses or non-transiently expresses the modified CRISPR-Cas12 protein. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

In another aspect, the disclosure provides a reagent kit comprising the Cas12 protein of any one of above, or comprising the Cas12 polynucleotide of any one of above, or comprising the vector of any one of above, or comprising the vector system of any one of above.

In another aspect, the disclosure provides a pharmaceutical composition comprising the Cas12 protein of any one of above or the polynucleotide of any one of above or the vector of any one of above or the vector system of any one of above formulated for delivery by AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular Contractile Injection System), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

In another aspect, the disclosure provides an engineered, non-naturally occurring CRISPR-Cas system comprising: a) the Cas12 protein of any one of above or the polynucleotide encoding the Cas12 protein; b) at least one engineered guide sequence or one or more engineered nucleic acid encoding the at least one engineered guide sequence, and the guide sequence comprises a direct repeat sequence capable of binding the Cas12 protein and a spacer sequence capable of hybridizing to a target nucleotide sequence.

In some embodiments, the system comprises at least one guide sequence which is capable of hybridizing at least one target sequence or different regions of one target sequence. In some embodiments, the guide sequence hybridizes to one or more target sequences in a prokaryotic cell or in a eukaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a nonhuman primate cell, and a human cell. In some embodiments, the eukaryotic cell comprises a mammalian cell. In some embodiments, the mammalian cell comprises a human cell. In some embodiments, the eukaryotic cell comprises a plant cell.

In some embodiments, the target sequence is a DNA. In some embodiments, the target sequence is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.

In some embodiments, the direct repeat sequence comprises a stem-loop structure comprising a first stem nucleotide strand which comprises 4-6 nucleotides; a second stem nucleotide strand which comprises 4-6 nucleotides, wherein the first and second stem nucleotide strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem nucleotide strands, wherein the loop nucleotide strand comprises 4 or 5 nucleotides.
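The stem-loop geometry described above (two stems of 4-6 nucleotides that hybridize with each other, separated by a loop of 4 or 5 nucleotides) can be illustrated with a minimal Python sketch. The function name, the toy sequence, and the choice to accept G-U wobble pairs in addition to Watson-Crick pairs are assumptions made for illustration only; the sketch requires both stems to have equal length and to be fully paired, and it is not a structure-prediction method used in this disclosure.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (the wobble pair is an
# illustrative assumption; the text only requires that the stems hybridize).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(rna, stem_range=(4, 6), loop_range=(4, 5)):
    """Return (start, stem_len, loop_len) for every perfectly paired hairpin.

    `start` is the 0-based index of the first nucleotide of the first stem.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for stem in range(stem_range[0], stem_range[1] + 1):
        for loop in range(loop_range[0], loop_range[1] + 1):
            total = 2 * stem + loop
            for start in range(len(rna) - total + 1):
                first = rna[start:start + stem]
                second = rna[start + stem + loop:start + total]
                # The second stem pairs antiparallel to the first stem.
                if all((a, b) in PAIRS for a, b in zip(first, reversed(second))):
                    hits.append((start, stem, loop))
    return hits


# Hypothetical toy sequence (not one of SEQ ID NOs: 153-156):
# 5-nt stem GCGCA, 4-nt loop UUCG, 5-nt stem UGCGC.
toy = "AAGCGCAUUCGUGCGCAA"
print(find_stem_loops(toy))
```

Shorter hairpins nested inside a longer one are also reported, so a single direct repeat may yield more than one hit.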

In some embodiments, the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to any one of SEQ ID NOs:153-156.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.

In some embodiments, the spacer sequence is between 10 and 40 nucleotides in length, preferably the spacer sequence is between 15 and 30 nucleotides in length, or between 18 and 25 nucleotides in length.

In some embodiments, an mRNA or a DNA encodes the Cas12 protein.

In some embodiments, the polynucleotide encoding the Cas12 protein is operably linked to a promoter.

In some embodiments, the promoter is a constitutive promoter, a tissue-specific promoter or an inducible promoter.

In some embodiments, the polynucleotide encoding the Cas12 protein operably linked to a promoter is in a vector.

In some embodiments, the vector is selected from the group consisting of a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.

In some embodiments, the system further comprises a donor template nucleic acid, wherein the donor template nucleic acid is a DNA, an RNA, or a DNA-RNA hybrid.

In some embodiments, the targeting of the target sequence by the Cas12 protein and guide sequence results in a modification of the target sequence. In some embodiments, the modification of the target sequence is a cleavage event or a nicking event.

In some embodiments, the target sequence is 3’ of a Protospacer Adjacent Motif (PAM), wherein: the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 70% sequence identity; the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 70% sequence identity.

In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 80% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 85% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 90% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 95% sequence identity. In some embodiments, the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 98% sequence identity.

In some embodiments, the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 80% sequence identity. In some embodiments, the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 85% sequence identity. In some embodiments, the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 90% sequence identity. In some embodiments, the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 95% sequence identity. In some embodiments, the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 98% sequence identity.

In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 80% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 85% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 90% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 95% sequence identity. In some embodiments, the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 98% sequence identity.
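As a non-limiting illustration of the PAM definitions above, the following Python sketch expands the degenerate codes used in this disclosure (R, Y, D, N) and lists candidate target sequences located immediately 3’ of a matching PAM. The function names, the 20-nucleotide default spacer length, and the toy DNA string are hypothetical; the sketch scans only the strand as given and does not model cleavage activity.

```python
# Degenerate nucleotide codes as defined in the text:
# R = A or G; Y = T or C; D = A, G, or T; N = A, T, C, or G.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "D": "AGT", "N": "ACGT",
}


def pam_matches(pam, site):
    """True if `site` (plain A/C/G/T) is an instance of the degenerate `pam`."""
    return len(pam) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pam, site)
    )


def find_targets(dna, pam, spacer_len=20):
    """Return (pam_start, pam_site, target) for each PAM followed by a full target.

    The target is the `spacer_len` nucleotides immediately 3' of the PAM.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(pam) - spacer_len + 1):
        site = dna[i:i + len(pam)]
        if pam_matches(pam, site):
            start = i + len(pam)
            hits.append((i, site, dna[start:start + spacer_len]))
    return hits


# Hypothetical toy locus: a TTTG PAM followed by a 20-nt target.
locus = "CC" + "TTTG" + "ACGTACGTACGTACGTACGT" + "CC"
for hit in find_targets(locus, "TTTR"):
    print(hit)
```

A practical search would also scan the reverse-complement strand; that step is omitted here for brevity.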

In another aspect, the disclosure provides a delivery system, wherein the system of any one of above is presented in a vehicle selected from the group consisting of AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular Contractile Injection System), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, LNPs (lipid nanoparticles), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

In another aspect, the disclosure provides an engineered cell comprising the system of any one of above. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, or the delivery system of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, delivery system of above or cell of any one of above for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of above, delivery system of above or cell of any one of above for use in a method of therapeutic treatment of a patient.

In another aspect, the disclosure provides a method of modifying or targeting a target DNA locus, the method comprising: delivering to said locus a CRISPR-Cas system of any one of above or a delivery system of above.

In some embodiments, said modifying or targeting a target locus comprises inducing a DNA strand break. In some embodiments, said modifying or targeting a target locus comprises inducing a DNA double strand break. In some embodiments, said modifying or targeting a target locus comprises altering gene expression of one or more genes. In some embodiments, said modifying or targeting a target locus comprises epigenetic modification of said target DNA locus. In some embodiments, the method is a method of modifying a cell, a cell line, or an organism by manipulation of one or more target sequences at genomic loci of interest.

In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell. In some embodiments, the method is in vitro or in vivo.

In another aspect, the disclosure provides a method of targeting and cleaving a double-stranded target DNA, the method comprising: contacting the double-stranded target DNA with a system of any one of above.

In some embodiments, cleaving the target DNA or target sequence results in the formation of an indel or the insertion of a nucleotide sequence. In some embodiments, cleaving the target DNA or target sequence comprises cleaving the target DNA or target sequence at two sites, and results in the deletion or inversion of a sequence between the two sites.
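The two-site outcome described above can be sketched in Python as follows. This is a simplified illustration only: each cleavage site is modeled as a single blunt position on one strand (an assumption, since staggered ends and repair processes are not represented), and the function names and toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_between_cuts(dna, cut1, cut2, outcome):
    """Model cleavage at two positions and rejoining of the ends.

    `cut1` and `cut2` are 0-based positions between nucleotides (cut1 < cut2).
    `outcome` is "deletion" (segment lost) or "inversion" (segment reinserted
    in the opposite orientation).
    """
    if not 0 <= cut1 < cut2 <= len(dna):
        raise ValueError("cut positions must satisfy 0 <= cut1 < cut2 <= len(dna)")
    left, middle, right = dna[:cut1], dna[cut1:cut2], dna[cut2:]
    if outcome == "deletion":
        return left + right
    if outcome == "inversion":
        return left + reverse_complement(middle) + right
    raise ValueError("outcome must be 'deletion' or 'inversion'")


toy = "AAAACGGGTTTT"  # hypothetical sequence
print(edit_between_cuts(toy, 4, 8, "deletion"))
print(edit_between_cuts(toy, 4, 8, "inversion"))
```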

In another aspect, the disclosure provides an isolated eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition or via use of a system of any one of the preceding contents.

In another aspect, the disclosure provides a system for detecting the presence of a nucleic acid target sequence in an in vitro sample, comprising: a Cas12 protein of any one of above; at least one guide polynucleotide comprising a guide sequence capable of binding the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence; and wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct activated by the target sequence.

In another aspect, the disclosure provides a method for detecting target nucleic acids in samples comprising: contacting one or more samples with a Cas12 protein of any one of above; at least one guide polynucleotide comprising a guide sequence designed to have a degree of complementarity with the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence, wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and cleaves the non-target sequence of the nucleic acid-based masking construct activated by the target sequences; and detecting a signal from cleavage of the non-target sequence, thereby detecting the one or more target nucleic acid sequences in the sample.

In another aspect, the disclosure provides an engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement:

I. a direct repeat sequence;

II. a spacer sequence, which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182.

In some embodiments, the tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation. In some embodiments, the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.
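The tandem 5’ to 3’ arrangement of a direct repeat followed by a spacer can be sketched in Python as below. The function name and both example sequences are hypothetical placeholders and are not SEQ ID NOs: 153-182; the 10 to 40 nucleotide limits follow the spacer length range stated elsewhere in this disclosure.

```python
def build_sgrna(direct_repeat, spacer, min_len=10, max_len=40):
    """Join a direct repeat and a spacer in 5'-to-3' order as an RNA string.

    DNA-style input (T) is converted to RNA (U). The spacer length is checked
    against the range given in the text (10 to 40 nucleotides).
    """
    dr = direct_repeat.upper().replace("T", "U")
    sp = spacer.upper().replace("T", "U")
    if not min_len <= len(sp) <= max_len:
        raise ValueError(f"spacer length {len(sp)} is outside {min_len}-{max_len} nt")
    invalid = set(dr + sp) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return dr + sp  # direct repeat first (5'), spacer second (3')


# Hypothetical placeholder sequences for illustration only.
example_dr = "AAGCGCAUUCGUGCGCAA"
example_spacer = "ACGTACGTACGTACGTACGT"
print(build_sgrna(example_dr, example_spacer))
```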

In another aspect, the disclosure provides a DNA polynucleotide molecule encoding the sgRNA of above.

In another aspect, the disclosure provides a DNA expression vector comprising the DNA polynucleotide molecule of above. In some embodiments, the vector further comprises one or more regulatory element(s) operably linked to sequences encoding the sgRNA. In some embodiments, at least one regulatory element is capable of directing expression of the sgRNA within the cell.

In another aspect, the disclosure provides a delivery vector carrying one or more sgRNA of any one of above.

In another aspect, the disclosure provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.

In another aspect, the disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

In another aspect, the disclosure provides a DNA polynucleotide molecule encoding the spacer sequence of above.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a method of therapeutic treatment of a patient.

These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.

Brief description of the drawings

An understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure may be utilized, and the accompanying drawings of which:

FIG.1A shows the phylogenetic tree of the Cas12 protein (GEBx0013, GEBx0014, GEBx0017, GEBx0018, GEBx0020, GEBx0021, GEBx0024, GEBx0027, GEBx0029, GEBx0032, GEBx0037, GEBx0047, GEBx0049, GEBx0063-72) constructed by IQTREE; FIG.1B shows the phylogenetic tree of the Cas12 protein (GEBx0030, GEBx0039) constructed by IQTREE; FIG.1C shows the phylogenetic tree of the Cas12 protein (GEBx0030, GEBx0039) constructed by IQTREE; FIG.1D shows the phylogenetic tree of the Cas12 protein (GEBx0074-GEBx0080) constructed by IQTREE;

FIG.2A, FIG.2B, FIG.2C, and FIG.2D show the domain arrangement of the Cas12 proteins;

FIG.3A, FIG.3B and FIG.3C show the results of the amino acid sequence comparison of GEBx0019 (SEQ ID NO: 5), GEBx0022 (SEQ ID NO: 8) and GEBx0033 (SEQ ID NO: 14) with the reference nucleases (AsCpf1, LbCpf1 and FnCpf1), wherein FIG.3A shows the comparison of their WED.3 domains, with the differences indicated by line boxes; FIG.3B shows the comparison of their PI domains, with the differences indicated by line boxes; FIG.3C shows the comparison of their NUC domains, with the differences indicated by line boxes;

FIG.4 shows the structure of AsCpf1 and the models of GEBx0019, GEBx0022 and GEBx0033, with the PI domains shown in dark grey;

FIG.5A shows the percentages of shared amino-acid sequences between the GEBx0030, GEBx0029 and reference Cas12a nucleases (AsCas12a, LbCas12a, and FnCas12a), the multiple sequence alignment was performed by MUSCLE while the identity of each sequence was automatically calculated by GeneDoc; FIG.5B shows the sequence identity distance matrix between the GEBx0074-GEBx0080 in this disclosure and the reference sequences;

FIG.6 shows the secondary structure of the crRNA utilized by the Cas12 protein;

FIG.7 shows the secondary structure of the crRNA utilized by GEBx0019, GEBx0022, GEBx0033, GEBx0030, GEBx0039, GEBx0074-GEBx0080;

FIG.8 shows the schematic of the construction of pEAST-Blunt E2 vector harbored with the Cas12 protein CDS;

FIG.9A shows the SDS-PAGE results of different chromatographic fractions of GEBx0032 protein and the position of the target band is indicated by arrows; FIG.9B shows the SDS-PAGE analysis results of the purified GEBx0033 proteins and the position of the target band is indicated by arrows; FIG.9C shows the SDS-PAGE results of different chromatographic fractions of GEBx0037 protein and the position of the target band is indicated by arrows; FIG.9D shows the SDS-PAGE results of different chromatographic fractions of GEBx0013 protein and the position of the target band is indicated by arrows; FIG.9E shows the SDS-PAGE results of different chromatographic fractions of GEBx0018 protein and the position of the target band is indicated by arrows;

FIG.10 shows the heatmap of the PAM requirement of GEBx0033;

FIG.11A shows the in vitro cleavage result of GEBx0033; FIG.11B shows the in vitro cleavage results of GEBx0032 and GEBx0037; FIG.11C shows the in vitro cleavage results of GEBx0013 and GEBx0018;

FIG.12 shows the bar graph of the cleavage efficiency;

FIG.13 shows the sequence alignment between TnpB, Cas12f and GEBx0013/0047/0063/0064/0070 in this disclosure; the region of the zinc finger domain and the conserved 4-Cys zinc finger in Cas12f and TnpB are marked with an arrow and a star respectively, indicating that GEBx0013/0047/0063/0064/0070 do not have the zinc finger structure at their C-termini;

FIG.14A shows the PAM preference of the GEBx0047 in HEK293 cell line; FIG.14B is a statistical curve depicting the relationship between the number of aligned sites and the cumulative number of aligned reads about GEBx0047; FIG.14C shows the PAM preference of the GEBx0070 in HEK293 cell line; FIG.14D is a statistical curve depicting the relationship between the number of aligned sites and the cumulative number of aligned reads about GEBx0070; FIG.14E shows the PAM preference of the GEBx0013 in HEK293 cell line; FIG.14F shows the PAM preference of the GEBx0018 in HEK293 cell line; FIG.14G shows the PAM preference of the GEBx0032 in HEK293 cell line;

FIG.15 shows the schematic of pCasX and pgRNA plasmid harbored with the Cas nucleases CDS and guide RNA respectively;

FIG.16A shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with MYOD1 targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture; FIG.16B shows the editing efficiency (Indel) of human HEK293T cells following forward co-transfection of pCasX plasmid harbored GEBx0063 or GEBx0064 CDS and the pgRNA plasmid harbored different length of MYOD1-TTTG-T1 spacer respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;

FIG.17 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with VEGFA targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;

FIG.18 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with IL1RN targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture;

FIG.19 shows the editing efficiency (Indel) of human HEK293T cells following forward transfection of different pCasX plasmids with DNMT1-1 targeted crRNA plasmid at 400 ng and 100 ng respectively, wherein NC (Negative Control) represents the cell sample without adding the lipoplex mixture.

FIG.20A shows the indel activity of GEBx0047 across 22 targets with TTTG-PAM in HEK293T cell line; FIG.20B shows the indel event and partly the allele plots achieved by GEBx0047 at RNF2-TTTG-T1 locus;

FIG.21A shows the indel activity of GEBx0063 across 22 targets with TTTG-PAM in HEK293T cell line; FIG.21B shows the indel event and partly the allele plots achieved by GEBx0063 at TTR-TTTG-T3 locus;

FIG.22A shows the indel activity of GEBx0064 across 22 targets with TTTG-PAM in HEK293T cell line; FIG.22B shows the indel event and partly the allele plots achieved by GEBx0064 at TTR-TTTG-T3 locus;

FIG.23A shows the indel activity of GEBx0070 across 22 targets with TTTG-PAM in HEK293T cell line; FIG.23B shows the indel event and partly the allele plots achieved by GEBx0070 at HBB-TTTG-T2 locus.

Detailed description of the preferred embodiment

The following examples further illustrate the present disclosure, but the present disclosure is not limited thereto.

General Definitions

Unless defined otherwise, the technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F. M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.): PCR 2: A Practical Approach (1995) (M. J. MacPherson, B. D. Hames, and G. R. Taylor eds.): Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.): Antibodies A Laboratory Manual, 2nd edition 2013 (E. A. Greenfield ed.); Animal Cell Culture (1987) (R.I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), the Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994), March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).

As used herein, the terms “a”, “an”, “the”, and “said” and similar terms used in the context of the present disclosure (especially in the context of the claims) are to be construed to cover both the singular and plural unless otherwise indicated herein or clearly contradicted by the context. In addition, it should be noted that a plural form does not necessarily denote more than one, and is to be understood according to its context.

The term “identity” in the context of two or more nucleic acids or polypeptide sequences refers to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, as measured using a sequence comparison algorithm such as BLAST, BLAST 2.0 or FASTA with default parameters described below.
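For illustration, the arithmetic behind a percent identity value can be sketched in Python for two sequences that have already been aligned (gaps written as "-"). This is a simplified calculation under stated assumptions: identical positions are divided by the number of alignment columns, and columns in which both sequences have a gap are ignored. It is not the BLAST or FASTA algorithm, which also performs the alignment itself and may report identity over a local alignment; the function names and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned and of equal length, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")  # skip columns that are gaps in both
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair reaches the stated identity threshold (e.g. 'at least 70%')."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(meets_threshold("ACGT-A", "ACGTTA", threshold=90.0))
```

Different tools choose different denominators (for example, the length of the shorter sequence), so a value computed this way may differ slightly from a reported BLAST identity.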

It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as “comprises”, “comprised”, “comprising” and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean “includes”, “included”, “including”, and the like; and those terms such as “consisting essentially of” and “consists essentially of” have the meaning ascribed to them in U.S. Patent law.

As used herein, the terms “recognized”, “recognizing”, or “recognition” in this context refer to the capability of the Cas12 protein to form a functional complex with a gRNA at a DNA target site to which the gRNA hybridizes (i.e. to which the guide sequence of the gRNA hybridizes) and being flanked by the PAM sequence, and wherein the Cas12 protein is capable of performing its natural function, i.e. DNA cleavage. In this context it is to be noted that such DNA cleavage precludes the Cas12 protein from being a catalytically inactive Cas12 protein. In the case of for instance an inactivated Cas12 protein (e.g. a dead Cas12 protein), a complex between the Cas12 protein, gRNA and cognate target may nevertheless be formed if the required PAM sequence is present, but such does not result in DNA cleavage.

The term “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.

The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.

The terms “about” and “~”, as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/-10% or less, +/-5% or less, +/-2% or less, +/-1% or less, and +/-0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the present disclosure. It is to be understood that the value to which the modifier “about” or “~” refers is itself also specifically, and preferably, disclosed.

The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects, embodiments, or designs. As used herein, a “sample” may contain whole cells and/or live cells and/or cell debris. The sample may contain (or be derived from) a “bodily fluid”. The present disclosure encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Samples include cell cultures, bodily fluids, cell cultures from bodily fluids. Bodily fluids may be obtained from a mammal organism, for example by puncture, or other collecting or sampling procedures.

The terms “subject”, “individual” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “in a specific embodiment”, “in some embodiments” or “in certain embodiments” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in a specific embodiment”, “in one embodiment” or “in certain embodiments” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the disclosure. For example, in the appended claims, any one of the claimed embodiments can be used in any combination.

All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

The term “gene” refers to a nucleic acid sequence (used interchangeably with polynucleotide or nucleotide sequence) that encodes a chimeric molecule as described herein. This definition includes various sequence polymorphisms, mutations, and/or sequence variants wherein such alterations do not substantially affect the function of the encoded chimeric molecule. The term “gene” may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions. The term further can include all introns and other DNA sequences spliced from the mRNA transcript, along with variants resulting from alternative splice sites. Gene sequences encoding the molecule can be DNA or RNA that directs the expression of the chimeric molecule. These nucleic acid sequences may be a DNA strand sequence that is transcribed into RNA or an RNA sequence that is translated into protein. The nucleic acid sequences include both the full-length nucleic acid sequences as well as non-full-length sequences derived from the full-length protein. The sequences can also include degenerate codons of the native sequence or sequences that may be introduced to provide codon preference in a specific cell type. Portions of complete gene sequences are referenced throughout the disclosure as is understood by one of ordinary skill in the art.

“Encoding” refers to the property of specific sequences of nucleotides in a gene, such as a cDNA, or an mRNA, to serve as templates for synthesis of other macromolecules such as a defined sequence of amino acids. Thus, a gene codes for a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. A polynucleotide encoding a protein includes all nucleotide sequences that are degenerate versions of each other and that code for the same amino acid sequence or amino acid sequences of substantially similar form and function.
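The notion of degenerate versions that code for the same amino acid sequence can be illustrated with a short Python sketch. It assumes the standard genetic code; the names `CODON_TABLE`, `translate` and `encode_same_protein`, and the example sequences, are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence (DNA or RNA) up to the first stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


def encode_same_protein(seq_a, seq_b):
    """True if two nucleotide sequences are degenerate versions of each other."""
    return translate(seq_a) == translate(seq_b)


if __name__ == "__main__":
    # Made-up sequences: both encode Met-Lys-Leu with different codons.
    print(translate("ATGAAACTGTAA"), translate("ATGAAGTTATGA"))
    print(encode_same_protein("ATGAAACTGTAA", "ATGAAGTTATGA"))
```

Codon-optimized variants for a specific cell type are degenerate versions in this sense: the nucleotide sequence changes while the translated amino acid sequence does not.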

The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid”, “nucleic acid molecule” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.

Polynucleotide sequences encoding more than one portion of an expressed chimeric molecule can be operably linked to each other and relevant regulatory sequences. For example, there can be a functional linkage between a regulatory sequence and an exogenous nucleic acid sequence resulting in expression of the latter. For another example, a first nucleic acid sequence can be operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences are contiguous and, where necessary or helpful, join coding regions into the same reading frame.

The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides, mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and as found in nature. In all aspects and embodiments, whether they include these terms or not, it will be understood that these terms may be optional and thus preferably included or not preferably included. Furthermore, the terms “non-naturally occurring” and “engineered” may be used interchangeably and so can be used alone or in combination, and one or the other may replace mention of both together. In particular, “engineered” is preferred in place of “non-naturally occurring” or “non-naturally occurring and/or engineered” or “engineered, non-naturally occurring”.

A “homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related. A “homologue” of a protein as used herein also includes sequences having one or more additions, deletions, stop positions, or substitutions, as compared to a sequence disclosed herein. The homologue protein as used herein performs the same or a similar function as the Cas12 protein disclosed herein.

The term “affinity tag” as used herein refers to a tag that facilitates the purification of recombinant modified proteins, for example GST, FLAG or hexahistidine sequences.

The term “fusion base editor protein” as used herein refers to proteins that enable the direct conversion or editing of bases.

The terms “sgRNA”, “crRNA”, and “gRNA (guide RNA)” are generally used interchangeably based on the context of the present disclosure.

The term “cleavage event” as used herein, refers to a DNA break in a target nucleic acid created by a nuclease of a CRISPR system described herein. In some embodiments, the cleavage event is a double-stranded DNA break. In some embodiments, the cleavage event is a single-stranded DNA break.

A “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides that are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion). The terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art. As is known in the art, a stem-loop structure does not require exact base-pairing. Thus, the stem may include one or more base mismatches. Alternatively, the base-pairing may be exact, i.e., not include any mismatches.
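The point that a stem may tolerate mismatches can be sketched in Python as follows. This is a simplified illustration, not a structure-prediction method: it assumes the stem is formed by the first and last `stem_len` nucleotides, counts only Watson-Crick pairs (A-U, G-C) as matches, and requires a loop of at least three nucleotides. All names, the loop-length limit and the example sequences are assumptions made for this sketch.

```python
# Watson-Crick RNA pairs only; wobble pairs are deliberately not modelled.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_mismatches(rna, stem_len):
    """Count non-pairing positions between the 5' arm and the 3' arm of the stem."""
    rna = rna.upper()
    if stem_len < 1 or 2 * stem_len > len(rna):
        raise ValueError("stem_len does not fit the sequence")
    five_prime_arm = rna[:stem_len]
    # The 3' arm pairs antiparallel to the 5' arm, so read it in reverse.
    three_prime_arm = rna[-stem_len:][::-1]
    return sum((a, b) not in WATSON_CRICK
               for a, b in zip(five_prime_arm, three_prime_arm))


def is_stem_loop(rna, stem_len, max_mismatches=0, min_loop=3):
    """True if the sequence can form a stem of stem_len with a long enough loop."""
    loop_len = len(rna) - 2 * stem_len
    return loop_len >= min_loop and stem_mismatches(rna, stem_len) <= max_mismatches


if __name__ == "__main__":
    exact = "GCGCUUCGGCGC"     # made-up hairpin with an exactly paired stem
    one_off = "GCGCUUCGGAGC"   # same hairpin with one stem mismatch
    print(stem_mismatches(exact, 4), stem_mismatches(one_off, 4))
    print(is_stem_loop(one_off, 4), is_stem_loop(one_off, 4, max_mismatches=1))
```

Setting `max_mismatches=0` corresponds to exact base-pairing, while a positive value admits stems with one or more mismatches.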

The term “donor template nucleic acid” as used herein refers to a nucleic acid molecule that can be used by one or more cellular proteins to alter the structure of a target nucleic acid after a CRISPR enzyme described herein has altered a target nucleic acid. In some embodiments, the donor template nucleic acid is a double-stranded nucleic acid. In some embodiments, the donor template nucleic acid is a single-stranded nucleic acid. In some embodiments, the donor template nucleic acid is linear. In some embodiments, the donor template nucleic acid is circular (e.g., a plasmid). In some embodiments, the donor template nucleic acid is an exogenous nucleic acid molecule. In some embodiments, the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome).

As used herein, the term “targeting” refers to the ability of a complex including a CRISPR-associated protein and an RNA guide to preferentially or specifically bind to, e.g., hybridize to, a specific target nucleic acid compared to other nucleic acids that do not have the same or similar sequence as the target nucleic acid.

As used herein, the term “target nucleic acid” refers to a specific nucleic acid substrate that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in an RNA guide. In some embodiments, the target nucleic acid comprises a gene or a sequence within a gene. In certain embodiments, the target nucleic acid comprises a noncoding region (e.g., a promoter). In a specific embodiment, the target nucleic acid is single-stranded. In a specific embodiment, the target nucleic acid is double-stranded.
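As a minimal sketch of what complementarity to the spacer means in sequence terms, the Python below derives the DNA sequence that would base-pair with an RNA spacer and searches a DNA strand for it. It covers only full-length, exact complementarity and ignores PAM requirements and mismatch tolerance. The function names and the example sequences are hypothetical.

```python
# RNA base -> DNA base that pairs with it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementary_dna(spacer_rna):
    """Return the DNA sequence (written 5'->3') that base-pairs with the spacer."""
    # Pairing is antiparallel, so the spacer is read in reverse.
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(spacer_rna.upper()))


def find_target(dna_strand, spacer_rna):
    """Return the 0-based position of the complementary site, or -1 if absent."""
    return dna_strand.upper().find(complementary_dna(spacer_rna))


if __name__ == "__main__":
    spacer = "AUGGCUAGCU"            # made-up 10-nt spacer, for illustration only
    strand = "TTTAGCTAGCCATGGG"      # made-up DNA strand containing the site
    print(complementary_dna(spacer))
    print(find_target(strand, spacer))
```

Complementarity to only a part of the spacer could be handled by searching for a sub-sequence of the value returned by `complementary_dna` in the same way.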

As used herein, the term “target sequence” or “target nucleic acid” refers to a specific nucleic acid that contains a nucleic acid sequence complementary to the entirety or a part of the spacer in an RNA guide. In some embodiments, the target sequence comprises a gene or a sequence within a gene. In certain embodiments, the target sequence comprises a noncoding region (e.g., a promoter). In a specific embodiment, the target sequence is single-stranded. In a specific embodiment, the target sequence is double-stranded.

It will be appreciated that the terms Cas12 enzyme, Cas12 protein, Cas12 effector protein and Cas12 are generally used interchangeably and at all points of reference herein refer by analogy to novel CRISPR effector proteins further described in this application, unless otherwise apparent.

In the disclosure, AsCas12a is also named AsCpf1; LbCas12a is also named LbCpf1; FnCas12a is also named FnCpf1. “AA” is the abbreviation of amino acid.

Metagenomic sequencing samples were selected from public databases and then downloaded, and the sequencing reads were assembled with assembly tools. To search for potential Cas protein sequences, Cas sequences were downloaded as references and then the Cas sequences were analyzed. Through this work, we mined 35 novel Cas12 proteins. Information on the 35 novel Cas12 proteins is shown in Table 1.

Table 1. Detailed information of the Cas12 proteins

Note: indicates unknown.

As shown in Table 1, most of the Cas12 proteins are from different microorganisms and sources. We continued tracking the family or genus of the microorganism and found that GEBx0047 (MGYG000001629) is more likely from Phascolarctobacterium, GEBx0070 (CALXXIO 10000001.1_4) is more likely from Spirochaetia bacterium, while GEBx0063 (CALUP0010000002.1_102) and GEBx0064 (CALUQJ010000016.1_32) are more likely from Gastranaerophilales bacterium.

The phylogenetic tree was constructed by IQTREE (FIG. 1A) to visualize the relatedness of the orthologs at the primary amino-acid level using 102 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to the Cas12 proteins disclosed in this invention are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. For example, as shown in FIG. 1A, GEBx0029 and GEBx0072 are more similar to each other and are representative clusters; GEBx0047 and GEBx0048 are more similar to each other and are representative clusters; GEBx0069, GEBx0070, GEBx0071, GEBx0073, and GEBx0066 are more similar to one another and are representative clusters, in particular GEBx0073 and GEBx0066 are representative clusters, while GEBx0073 is a unique one; GEBx0013, GEBx0014, GEBx0018, GEBx0024, and GEBx0049 are representative clusters; GEBx0020, GEBx0027, GEBx0063, GEBx0065, and GEBx0064 are representative clusters, and so on.

They each have their unique branches, suggesting that they are evolutionarily distinct. In addition, the Cas12 proteins share less than 70% identity with existing Cas proteins, and some even share less than 60% or 50% identity with existing Cas proteins. These features suggest that these Cas12 proteins are independent of the existing Cas12a family.

The phylogenetic tree was constructed by IQTREE (FIG. 1B) to visualize the relatedness of the orthologs at the primary amino-acid level using 88 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to GEBx0030 and GEBx0039 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. They each have their unique branches, suggesting that they are evolutionarily distinct.

The multiple sequence alignment was performed by MUSCLE while the identity of each sequence was automatically calculated by GeneDoc. The results are shown in FIG. 5A. As shown in FIG. 5A, GEBx0030 and GEBx0039 share less than 40% identity with the three reference sequences. In addition, GEBx0030 (1222 aa) and GEBx0039 (1171 aa) are much smaller than the usual Cas12a. These features suggest that GEBx0030 and GEBx0039 are independent of the existing Cas12a family.
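For orientation, percent identity between two rows of an alignment can be computed as sketched below in Python. The exact formula applied by GeneDoc is not stated in this disclosure; the sketch assumes one common convention, namely identical residues divided by the number of alignment columns in which at least one of the two sequences has a residue. The function name and the toy aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two equal-length rows taken from an alignment.

    Assumed convention: columns where both rows are gaps are skipped, and a
    residue aligned to a gap counts as a compared but non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column contributed only by other sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy rows, not taken from any sequence in the disclosure.
    row_a = "MKT-LLA"
    row_b = "MRTALL-"
    print(round(percent_identity(row_a, row_b), 1))
```

Other conventions (for example dividing by the length of the shorter sequence) give different values for the same alignment, so identity figures are only comparable when the same convention is used throughout.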

The phylogenetic tree was constructed by IQTREE (FIG. 1C) to visualize the relatedness of the orthologs at the primary amino-acid level using 73 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to GEBx0019, GEBx0022 and GEBx0033 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. They each have their unique branches, suggesting that they are evolutionarily distinct.

The amino acid sequences were aligned and the results are shown in FIG. 3A, FIG. 3B, FIG. 3C and FIG. 4. As shown in FIG. 3A, GEBx0019, GEBx0022 and GEBx0033, together with AsCpf1, have a relatively larger WED.3 domain (165 AA) than LbCpf1 and FnCpf1 (130 AA). In addition, the comparison of the PI domain and NUC domain between GEBx0019, GEBx0022 and GEBx0033 and AsCpf1 in FIG. 3B and FIG. 3C shows that GEBx0019, GEBx0022 and GEBx0033 have a smaller PI domain and NUC domain than AsCpf1. These results reflect differences in their evolutionary orientations, and these smaller Cas12 proteins are expected to facilitate intracellular delivery via viral vectors or other delivery systems.

The phylogenetic tree was constructed by IQTREE (FIG. 1D) to visualize the relatedness of the orthologs at the primary amino-acid level using 120 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to the Cas12 proteins disclosed in this invention are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. For example, as shown in FIG. 1D, GEBx0075 and GEBx0077 are more similar to each other and are representative clusters; GEBx0079 and GEBx0080 are more similar to each other and are representative clusters; GEBx0076 is a representative cluster; GEBx0074 and GEBx0078 are more similar to each other and are representative clusters.

They each have their unique branches, suggesting that they are evolutionarily distinct. In addition, the Cas12 proteins share less than 70% identity with existing Cas proteins, and some even share less than 60% or 50% identity with existing Cas proteins. These features suggest that these Cas12 proteins are independent of the existing Cas12a family.

The multiple sequence alignment was performed by MUSCLE while the identity of each sequence was automatically calculated by GeneDoc. The results are shown in FIG. 5B. As shown in FIG. 5B, GEBx0074-GEBx0080 share less than 50% identity with the three reference sequences (AsCpf1, LbCpf1, and FnCpf1).

In one aspect, the disclosure provides an engineered, non-naturally occurring Cas12 protein, wherein the Cas12 protein comprises an amino acid sequence selected from SEQ ID NOs: 1-35, or a homologue thereof having at least 70% sequence identity.

For example, “at least 70%” can include 70%, 75%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 75%, 80%, 85%, 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95% or 98% sequence identity to any one of SEQ ID NOs: 1-35.

In certain embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 75% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 82% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 87% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 92% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 99% sequence identity to any one of SEQ ID NOs: 1-35. In certain embodiments, the amino acid sequence of the Cas12 protein has 100% sequence identity to any one of SEQ ID NOs: 1-35. The “100% sequence identity” means the amino acid sequence of the CRISPR-Cas12 protein is selected from one of SEQ ID NOs: 1-35.

Table 2. Amino acid sequences of the Cas12 proteins

In some embodiments, the amino acid sequence of the Cas12 protein lacks 25-40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. Furthermore, the amino acid sequence of the Cas12 protein lacks 15-30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 28 amino acids in the PI domain and at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.
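The PI-domain and NUC-domain size comparisons described above can be expressed as a simple check, sketched here in Python. The sketch assumes that the number of lacking amino acids is taken as the difference in domain length relative to the reference (an alignment-based gap count would be an alternative reading), and every domain length in the example is a made-up placeholder rather than a value from SEQ ID NO: 71 or any disclosed sequence. The function names are hypothetical.

```python
def residues_lacking(candidate_len, reference_len):
    """Number of residues the candidate domain lacks relative to the reference."""
    return max(reference_len - candidate_len, 0)


def matches_size_embodiment(candidate, reference,
                            pi_range=(25, 40), nuc_range=(15, 30)):
    """True if both domains are shorter than the reference by the given ranges.

    `candidate` and `reference` map domain names ("PI", "NUC") to lengths in
    amino acids. The default ranges are the 25-40 and 15-30 residue ranges
    recited for the PI and NUC domains.
    """
    pi_lacking = residues_lacking(candidate["PI"], reference["PI"])
    nuc_lacking = residues_lacking(candidate["NUC"], reference["NUC"])
    return (pi_range[0] <= pi_lacking <= pi_range[1]
            and nuc_range[0] <= nuc_lacking <= nuc_range[1])


if __name__ == "__main__":
    # Placeholder lengths for illustration only; not measured values.
    reference = {"PI": 100, "NUC": 70}
    candidate = {"PI": 70, "NUC": 50}
    print(residues_lacking(candidate["PI"], reference["PI"]))
    print(matches_size_embodiment(candidate, reference))
```

The “at least 28” and “at least 18” variants can be checked by passing open-ended ranges such as `pi_range=(28, float("inf"))` and `nuc_range=(18, float("inf"))`.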

For example, in some embodiments, the amino acid sequence of the Cas12 protein lacks 28 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 30 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 35 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 40 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In some embodiments, the amino acid sequence of the Cas12 protein lacks 26 amino acids in the PI domain compared to the amino acid sequence of SEQ ID NO: 71. In some embodiments, the amino acid sequence of the Cas12 protein lacks at least 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 18 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 20 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 17 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 25 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71. In a specific embodiment, the amino acid sequence of the Cas12 protein lacks 30 amino acids in the NUC domain compared to the amino acid sequence of SEQ ID NO: 71.

In some embodiments, the Cas12 protein has an amino acid sequence selected from SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14, or a homologue thereof having at least 70% sequence identity. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.

For example, “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

In certain embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 75% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 82% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 87% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 92% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has at least 99% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. In certain embodiments, the amino acid sequence of the Cas12 protein has 100% sequence identity to any one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14. The “100% sequence identity” means the amino acid sequence of the CRISPR-Cas12 protein is selected from one of SEQ ID NO: 5, SEQ ID NO: 8, or SEQ ID NO: 14.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.

For example, “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

For example, in a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 1, SEQ ID NO: 4, or SEQ ID NO: 13.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 1. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 1.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 4. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 4.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 13. In a certain embodiment, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 13.

In some embodiments, the amino acid sequence of the Cas12 protein has at least 70% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 80%, 85%, 90%, 92%, 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein has at least 95% or 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the amino acid sequence of the Cas12 protein is set forth in SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.

For example, “at least 70%” can include 70%, 72%, 75%, 78%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 85% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 17. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 17.

In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 80% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 85% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 90% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 95% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 98% sequence identity to SEQ ID NO: 19. In a certain embodiment, the amino acid sequence of the Casl2 protein is set forth in SEQ ID NO: 19.

In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 80% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 85% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 90% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 95% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 98% sequence identity to SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Casl2 protein is set forth in SEQ ID NO: 20.

In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 80% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 85% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 90% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 95% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein has at least 98% sequence identity to SEQ ID NO: 26. In a certain embodiment, the amino acid sequence of the Casl2 protein is set forth in SEQ ID NO: 26.

In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 80% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 90% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 95% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20. In a certain embodiment, the amino acid sequence of the Cas12 protein has at least 98% sequence identity to SEQ ID NO: 19 or SEQ ID NO: 20.

The Cas12 proteins provided by this disclosure may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions. For example, in some embodiments, there is an “M” (methionine) residue at the N-terminus of the polypeptide provided by this disclosure. That is, an “M” may be present at the N-terminus of any one of SEQ ID NOs: 1-35, as shown in SEQ ID NOs: 148-152.

In some embodiments, the Cas12 protein further comprises a promoter sequence, an enhancer sequence, and/or a termination region sequence. In some embodiments, the Cas12 protein, based on any one of SEQ ID NOs: 1-35, comprises an M amino acid residue at its N-terminus.

In some embodiments, the Cas12 protein further comprises one or more of a nuclear localization signal sequence, a cell penetrating peptide sequence, an affinity tag and/or a fusion base editor protein.

The Cas12 protein comprises one or more nuclear localization signals (NLSs). The NLS(s) can be located at an end or at another portion of the peptide. The NLSs located at each end or at another portion of the Cas12 amino acid sequence can be the same or different. In some embodiments, the NLS at the N-terminal end and the NLS at the C-terminal end are the same. In some embodiments, the NLS at the N-terminal end and the NLS at the C-terminal end are different. In some embodiments, the N-terminal end of the Cas12 amino acid sequence comprises one NLS and the C-terminal end of the Cas12 amino acid sequence comprises one NLS. The amino acid sequence of each NLS is fused to the N-terminal end or the C-terminal end of the Cas12 amino acid sequence, respectively.

An NLS is fused to a peptide or non-peptide moiety that allows proteins to enter or localize to a tissue, a cell, or a region of a cell. For instance, the NLS may be an SV40 (simian virus 40) NLS, a c-Myc NLS, or another suitable monopartite NLS. The NLS may be fused to the N-terminus and/or the C-terminus of the Cas12 protein.

Generally, an affinity tag is added for purification of the fusion polypeptide by affinity chromatography.

In some embodiments, the NLS located at the N-terminus is set forth in SEQ ID NO: 183 (MAPKKKRKV). In some embodiments, the NLS located at the C-terminus is set forth in SEQ ID NO: 80 (KRPAATKKAGQAKKKK). In some embodiments, the FLAG sequence located at the N-terminus is set forth in SEQ ID NO: 81 (DYKDDDDK). In SEQ ID NOs: 148-152, the combination of SEQ ID NO: 183 and SEQ ID NOs: 80-81 is merely an example. Other available sequences and different combinations can also be chosen for the NLS sequences and the FLAG sequence.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152. In some embodiments, the Cas12 protein comprises an amino acid sequence set forth in any one of SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152. Each of SEQ ID NOs: 74-78 and SEQ ID NOs: 148-152 comprises the Cas12 protein provided in this disclosure, the NLSs located at the N-terminus and the C-terminus, and the FLAG sequence located at the N-terminus.
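The arrangement described above (an N-terminal NLS and FLAG tag, the Cas12 sequence, and a C-terminal NLS) can be sketched as a simple string concatenation. The NLS and FLAG sequences are those recited in the text; the function name, the relative order of the N-terminal NLS and the FLAG tag, the absence of linkers, and the short placeholder standing in for a Cas12 sequence are assumptions for illustration only and do not reproduce any of SEQ ID NOs: 74-78 or 148-152.

```python
NLS_N_TERMINAL = "MAPKKKRKV"          # SEQ ID NO: 183, as recited in the text
NLS_C_TERMINAL = "KRPAATKKAGQAKKKK"   # SEQ ID NO: 80, as recited in the text
FLAG_TAG = "DYKDDDDK"                 # SEQ ID NO: 81, as recited in the text


def build_fusion(cas12_core: str,
                 nls_n: str = NLS_N_TERMINAL,
                 flag: str = FLAG_TAG,
                 nls_c: str = NLS_C_TERMINAL) -> str:
    """Concatenate N-terminal NLS, FLAG tag, Cas12 core and C-terminal NLS.

    Assumption: the N-terminal NLS precedes the FLAG tag and no linker
    residues are inserted; the actual constructs may differ.
    """
    return nls_n + flag + cas12_core + nls_c


# Hypothetical placeholder, not a real Cas12 sequence.
placeholder_core = "GSGSGS"
fusion = build_fusion(placeholder_core)
print(fusion)
```

Because the recited N-terminal NLS begins with M, placing it first in this sketch also yields a fusion that starts with a methionine residue.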

In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 74. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 74. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 148. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 148. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 149. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 149. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 150. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 150.
In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 151. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 151. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 152. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 152.

In another aspect, the disclosure provides an engineered, non-naturally occurring cell comprising the Cas12 protein of any one of the above.

In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

The cell may be a eukaryotic cell or a prokaryotic cell. In one embodiment, the cell is a eukaryotic cell. In another embodiment, the cell is a vertebrate, mammalian, rodent, goat, pig, bird, chicken, turkey, cow, horse, sheep, fish, primate, or human cell. In one embodiment, the cell is a mammalian cell. In one embodiment, the cell is a human cell. In one embodiment, the cell is a somatic cell, a germ cell, or a prenatal cell. In one embodiment, the cell is a zygotic cell, a blastocyst cell, an embryonic cell, a stem cell, a mitotically competent cell, or a meiotically competent cell. In one embodiment, the cell is not part of a human embryo. In one embodiment, the cell is a somatic cell. In one embodiment, the cell is a T cell, a CD8+ T cell, a naive T cell, a central memory T cell, an effector memory T cell, a CD4+ T cell, a stem cell memory T cell, a helper T cell, a regulatory T cell, a cytotoxic T cell, a natural killer T cell, a hematopoietic stem cell, a long term hematopoietic stem cell, a short term hematopoietic stem cell, a multipotent progenitor cell, a lineage restricted progenitor cell, a lymphoid progenitor cell, a myeloid progenitor cell, a common myeloid progenitor cell, an erythroid progenitor cell, a megakaryocyte erythroid progenitor cell, a retinal cell, a photoreceptor cell, a rod cell, a cone cell, a retinal pigmented epithelium cell, a trabecular meshwork cell, a cochlear hair cell, an outer hair cell, an inner hair cell, a pulmonary epithelial cell, a bronchial epithelial cell, an alveolar epithelial cell, a pulmonary epithelial progenitor cell, a striated muscle cell, a cardiac muscle cell, a muscle satellite cell, a neuron, a neuronal stem cell, a mesenchymal stem cell, an induced pluripotent stem (iPS) cell, an embryonic stem cell, a monocyte, a megakaryocyte, a neutrophil, an eosinophil, a basophil, a mast cell, a reticulocyte, a B cell, e.g., a progenitor B cell, a Pre B cell, a Pro B cell, a memory B cell, a plasma B cell, a gastrointestinal epithelial cell, a biliary epithelial cell, a pancreatic ductal epithelial cell, an intestinal stem cell, a hepatocyte, a liver stellate cell, a Kupffer cell, an osteoblast, an osteoclast, an adipocyte, a preadipocyte, a pancreatic islet cell (e.g., a beta cell, an alpha cell, a delta cell), a pancreatic exocrine cell, a Schwann cell, or an oligodendrocyte. In one embodiment, the cell is a T cell, a hematopoietic stem cell, a retinal cell, a cochlear hair cell, a pulmonary epithelial cell, a muscle cell, a neuron, a mesenchymal stem cell, an induced pluripotent stem (iPS) cell, or an embryonic stem cell. In another embodiment, the cell is a plant cell.

In another aspect, the disclosure provides a kit comprising the engineered, non-naturally occurring Cas12 protein of any one of the above. In addition, the reagent kit can comprise other components, for example, a solution or a buffer.

It will be appreciated that the kit may further comprise other suitable excipients such as buffers or reagents for facilitating the application of the kit. Preferably, the kit may be applied in various applications such as medical applications, including therapy and diagnosis, research, and the like. Accordingly, the Cas12 protein and the kit of the present invention may be used in the preparation of a medicament for treatment and/or in the preparation of an agent for research study.

In another aspect, the disclosure provides an engineered, non-naturally occurring Cas12 polynucleotide encoding the Cas12 protein of any one of the above.

The polynucleotides may be in the form of RNA or DNA, which includes cDNA, genomic DNA, and synthetic DNA. A polynucleotide may be double stranded or single stranded, and if single stranded, may be the coding strand or the non-coding (anti-sense) strand. A coding polynucleotide may have a coding sequence identical to a coding sequence known in the art or may have a different coding sequence, which, as the result of the redundancy or degeneracy of the genetic code, or by splicing, can encode the same polypeptide.
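The degeneracy mentioned above can be illustrated with a minimal sketch in which two different coding sequences translate to the same short peptide. The codon assignments used are from the standard genetic code, but only a small subset is included; the function name and the two example sequences are hypothetical and are not taken from any SEQ ID NO of this disclosure.

```python
# Small subset of the standard genetic code, sufficient for this example.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",   # two synonymous alanine codons
    "AAA": "K", "AAG": "K",   # two synonymous lysine codons
    "TAA": "*", "TGA": "*",   # stop codons
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence codon by codon using CODON_TABLE."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # A KeyError here means the codon is outside the subset shown above.
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Two hypothetical coding sequences that differ at three nucleotide positions.
cds_one = "ATGGCTAAATAA"
cds_two = "ATGGCCAAGTGA"
print(translate(cds_one), translate(cds_two))
```

Codon optimization for a given host exploits the same property: synonymous codons are swapped without changing the encoded polypeptide.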

The polypeptide may include not only coding sequences but also regulatory regions such as promoters, enhancers, and termination regions. The term further can include all introns and other DNA sequences spliced from the mRNA transcript, along with variants resulting from alternative splice sites. These nucleic acid sequences may be a DNA strand sequence that is transcribed into RNA or an RNA sequence that is translated into protein. The nucleic acid sequences include both the full-length nucleic acid sequences as well as non-full-length sequences derived from the full-length protein. The sequences can also include degenerate codons of the native sequence or sequences that may be introduced to provide codon preference in a specific cell type. The polypeptide sequences are referenced throughout the disclosure as is understood by one of ordinary skill in the art.

In some embodiments, the polynucleotide is a ribonucleotide sequence or a deoxyribonucleotide sequence or analogs thereof; preferably the polynucleotide is mRNA, and the polynucleotide further comprises a 5’ cap sequence and a poly-A tail sequence. In some embodiments, the polynucleotide is codon optimized for expression in a cell of interest. In some embodiments, the polynucleotide is codon optimized for expression in a eukaryotic cell; preferably the polynucleotide has at least 90%, 92%, 95% or 98% sequence identity to any one of SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has the sequence selected from SEQ ID NOs: 111-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115 or SEQ ID NOs: 117-120. In some embodiments, the polynucleotide has at least 95% or 98% sequence identity to any one of SEQ ID NOs: 112-115. In some embodiments, the polynucleotide has a sequence identity to any one of SEQ ID NOs: 111-120.

In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 112 or SEQ ID NO: 117. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 113 or SEQ ID NO: 118. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 114 or SEQ ID NO: 119. In a certain embodiment, the polynucleotide has at least 85% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has at least 90% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has at least 95% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. In a certain embodiment, the polynucleotide has at least 98% sequence identity to SEQ ID NO: 115 or SEQ ID NO: 120. 
In a certain embodiment, the polynucleotide has the sequence set forth in SEQ ID NO: 115 or SEQ ID NO: 120.

As described herein, “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell, preferably a human cell.

In some embodiments, the polynucleotide has at least 70% sequence identity to any one of SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87. SEQ ID NOs: 36-70 are the polynucleotide sequences encoding SEQ ID NOs: 1-35. SEQ ID NOs: 83-87 are the polynucleotide sequences encoding SEQ ID NOs: 74-78 or SEQ ID NOs: 148-152. SEQ ID NOs: 83-87 all include NLS sequences and a FLAG sequence.

The polynucleotide encoding the Cas12 protein will change accordingly to correspond with the promoter sequence, enhancer sequence, and/or termination region sequence comprised in the Cas12 protein. For example, “ATG” is added at the 5’ end of the polynucleotide to encode an M amino acid residue at the N-terminus of the Cas12 protein.

In some embodiments, the polynucleotide has at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 98% or 99% sequence identity to any one of the SEQ ID NOs: 36-70 or SEQ ID NOs: 83-87.

As described herein, “at least 70%” can include 70%, 72%, 75%, 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 80%” can include 80%, 85%, 86%, 87%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 85%” can include 85%, 86%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 90%” can include 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%; “at least 95%” can include 95%, 96%, 97%, 98%, 99% or 100%; “at least 97%” can include 97%, 98%, 99% or 100%; “at least 98%” can include 98%, 99% or 100%; and so on.

Table 3 The nucleic acid sequences of Cas12 protein

In Table 3, the nucleic acids of SEQ ID NOs: 36-70 are the Non-Human Codon Optimized sequences.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein as described herein above, or the Cas12 polynucleotide as described herein above, for use as a nuclease. In some embodiments, the engineered, non-naturally occurring Cas12 protein as described herein above, or the Cas12 polynucleotide as described herein above, is for use as a double-strand DNA cleavage nuclease or a nickase.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein for use in gene editing.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring Cas12 protein for use in a method of therapeutic treatment of a patient. In another aspect, the disclosure provides an engineered vector comprising the Cas12 polynucleotide of any one of the above.

In certain aspects, the invention involves vectors. As used herein, a “vector” is a tool that allows or facilitates the transfer of an entity from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. In general, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses (AAVs)). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors”. Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that is operatively-linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

In some embodiments, the vector is an expression vector. In some embodiments, the vector is an inducible, conditional, or constitutive expression vector.

In another aspect, the disclosure provides a vector system comprising one or more vectors of any one of above. In some embodiments, one or more vectors comprise a polynucleotide according to any one of above and one or more polynucleotides which are on the same or a different vector encoding a gRNA.

In another aspect, the disclosure provides an engineered cell comprising the Cas12 polynucleotide of any one of the above, or comprising the vector of any one of the above, or comprising the vector system of any one of the above.

In some embodiments, the cell expresses the Cas12 protein. In some embodiments, the cell transiently expresses or non-transiently expresses the modified CRISPR-Cas12 protein. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

In another aspect, the disclosure provides a reagent kit comprising the Cas12 protein of any one of the above, or comprising the Cas12 polynucleotide of any one of the above, or comprising the vector of any one of the above, or comprising the vector system of any one of the above.

In another aspect, the disclosure provides a pharmaceutical composition comprising the Cas12 protein of any one of the above or the polynucleotide of any one of the above or the vector of any one of the above or the vector system of any one of the above, formulated for delivery by AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular contractile injection system), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, lipid nanoparticles (LNPs), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

“Gammaretrovirus” refers to a genus of the Retroviridae family. Exemplary gammaretroviruses include mouse stem cell virus, murine leukemia virus, feline leukemia virus, feline sarcoma virus, and avian reticuloendotheliosis viruses.

The CRISPR-Cas12 system described below or the pharmaceutical composition described above, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems such as vectors, e.g., plasmids, viral delivery vectors, such as adeno-associated viruses (AAV), lentiviruses, adenoviruses, and other viral vectors, or methods, such as nucleofection or electroporation of ribonucleoprotein complexes consisting of Type V-I effectors and their cognate RNA guide or guides. The proteins and one or more RNA guides can be packaged into one or more vectors, e.g., plasmids or viral vectors. For bacterial applications, the nucleic acids encoding any of the components of the CRISPR systems described herein can be delivered to the bacteria using a phage. Exemplary phages include, but are not limited to, T4 phage, Mu, λ phage, T5 phage, T7 phage, T3 phage, Φ29, M13, MS2, Qβ, and ΦX174.

In some embodiments, the vectors, e.g., plasmids or viral vectors, are delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.

In certain embodiments, the delivery is via adeno-associated viruses (AAV), e.g., AAV2, AAV8, or AAV9, which can be administered in a single dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviruses or adeno-associated viruses. In some embodiments, the dose is at least about 1×10⁶ particles, at least about 1×10⁷ particles, at least about 1×10⁸ particles, or at least about 1×10⁹ particles of the adeno-associated viruses. Due to the limited genomic payload of recombinant AAV, the smaller size of the Cas12 proteins described herein enables greater versatility in packaging the effector and RNA guides with the appropriate control sequences (e.g., promoters) required for efficient and cell-type specific expression.

In some embodiments, the delivery is via a recombinant adeno-associated virus (rAAV) vector. For example, in some embodiments, a modified AAV vector may be used for delivery. Modified AAV vectors can be based on one or more of several capsid types, including AAV1, AAV2, AAV5, AAV6, AAV8, AAV8.2, AAV9, AAVrh10, modified AAV vectors (e.g., modified AAV2, modified AAV3, modified AAV6) and pseudotyped AAV (e.g., AAV2/8, AAV2/5 and AAV2/6). Exemplary AAV vectors and techniques that may be used to produce rAAV particles are known in the art (see, e.g., Aponte-Ubillus et al. (2018) Appl. Microbiol. Biotechnol. 102(3): 1045-54; Zhong et al. (2012) J. Genet. Syndr. Gene Ther. S1: 008; West et al. (1987) Virology 160: 38-47; Tratschin et al. (1985) Mol. Cell. Biol. 5: 3251-60; each of which is incorporated by reference).

In some embodiments, the delivery is via plasmids. The dosage can be a sufficient number of plasmids to elicit a response. In some cases, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg. Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR enzyme, operably linked to the promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii). The plasmids can also encode the RNA components of a CRISPR-Cas system, but one or more of these may instead be encoded on different vectors. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or a person skilled in the art.

In another embodiment, lipid nanoparticles (LNPs) are contemplated. LNPs can be formed from different materials and in different forms. For example, the LNP may comprise: a cationic lipid at a molar ratio between 35% and 45%, a polyethylene glycol (PEG) conjugated (PEGylated) lipid at a molar ratio between 0.25% and 2.75%, a cholesterol-based lipid at a molar ratio between 20% and 35%, and a helper lipid at a molar ratio of between 25% and 35%, wherein all the molar ratios are relative to the total lipid content of the LNP. LNPs can be made in different sizes, such as an average diameter of 30-200 nm or 80-150 nm.
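The molar-ratio ranges recited above can be checked mechanically for a candidate formulation. The following is a minimal sketch; the dictionary keys, the function name, and the requirement that the four listed lipid classes sum to 100% (that is, that no other lipid is present) are assumptions for illustration, and the example formulation is hypothetical.

```python
# Molar-percent ranges as recited in the text (relative to total lipid).
LNP_RANGES = {
    "cationic_lipid": (35.0, 45.0),
    "peg_lipid": (0.25, 2.75),
    "cholesterol_lipid": (20.0, 35.0),
    "helper_lipid": (25.0, 35.0),
}


def check_lnp(composition: dict, tol: float = 1e-6) -> list:
    """Return a list of problems; an empty list means all checks passed.

    Assumption: the formulation contains only the four lipid classes
    above, so their molar percentages should sum to 100.
    """
    problems = []
    for name, (low, high) in LNP_RANGES.items():
        value = composition.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value}% outside {low}-{high}%")
    total = sum(composition.get(name, 0.0) for name in LNP_RANGES)
    if abs(total - 100.0) > tol:
        problems.append(f"total is {total}%, expected 100%")
    return problems


# Hypothetical formulation within every recited range.
example = {"cationic_lipid": 40.0, "peg_lipid": 1.5,
           "cholesterol_lipid": 30.0, "helper_lipid": 28.5}
print(check_lnp(example))
```

Note that the recited ranges constrain one another: with the cationic lipid at its 45% maximum and the PEGylated lipid at its 0.25% minimum, the cholesterol-based and helper lipids together must supply the remaining 54.75%.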

In another embodiment, the delivery is via liposomes or lipofection formulations and the like, and can be prepared by methods known to those skilled in the art. Such methods are described, for example, in WO 2016205764 and U.S. Pat. Nos. 5,593,972; 5,589,466; and 5,580,859; each of which is incorporated herein by reference in its entirety.

In some embodiments, the delivery is via nanoparticles or exosomes. For example, exosomes have been shown to be particularly useful in the delivery of RNA.

A further means of introducing one or more components of the new CRISPR systems into cells is by using cell penetrating peptides (CPPs). In some embodiments, a cell penetrating peptide is linked to the CRISPR enzymes. In some embodiments, the CRISPR enzymes and/or RNA guides are coupled to one or more CPPs to transport them inside cells effectively (e.g., plant protoplasts). In some embodiments, the CRISPR enzymes and/or RNA guide(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.

In another aspect, the disclosure provides an engineered, non-naturally occurring CRISPR-Cas system comprising: a) the Cas12 protein of any one of the above or the polynucleotide encoding the Cas12 protein; and b) at least one engineered guide sequence or one or more engineered nucleic acids encoding the at least one engineered guide sequence, wherein the guide sequence comprises a direct repeat sequence capable of binding the Cas12 protein and a spacer sequence capable of hybridizing to a target nucleotide sequence.

The engineered Cas12 protein complexes with the guide sequence to form a CRISPR complex, and in the CRISPR complex the nucleic acid molecule targets one or more polynucleotide loci.

In some embodiments, the direct repeat sequence and the spacer sequence are heterologous. “Heterologous”, as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively.

In some embodiments, the system comprises at least one guide sequence which is capable of hybridizing to at least one target sequence or to different regions of one target sequence. In some embodiments, the guide sequence hybridizes to one or more target sequences in a prokaryotic cell or in a eukaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the eukaryotic cell comprises a mammalian cell. In some embodiments, the mammalian cell comprises a human cell. In some embodiments, the eukaryotic cell comprises a plant cell.

In some embodiments, the target sequence is a DNA. In some embodiments, the target sequence is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.

In some embodiments, the direct repeat sequence comprises a stem-loop structure comprising: a first stem nucleotide strand which comprises 4-6 nucleotides; a second stem nucleotide strand which comprises 4-6 nucleotides, wherein the first and second stem nucleotide strands can hybridize with each other; and a loop nucleotide strand arranged between the first and second stem nucleotide strands, wherein the loop nucleotide strand comprises 4 or 5 nucleotides.

In some embodiments, the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to any one of SEQ ID NOs:153-156.

In some embodiments, the direct repeat sequence is selected from SEQ ID NOs: 153-156 for the Cas12 protein comprising an amino acid sequence selected from SEQ ID NOs: 1-4, SEQ ID NOs: 6-7, SEQ ID NOs: 9-11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NOs: 17-28, or a homologue thereof. For example, in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155; in some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156.

In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 (5’-AAUUUCUACUAUUGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 1, wherein UAUU is the loop nucleotide strand (A of FIG.6, FIG.7). In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154 (5’-AAUCCGUAACUUUGCAUUUGCAAAA-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 9, wherein AUUU is the loop nucleotide strand (B of FIG.6). In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155 (5’-AAUUUCUACUAUCGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 13, wherein UAUC is the loop nucleotide strand (C of FIG.6). In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 (5’-AAUUUCUACUGUUGUAGAU-3’) corresponding to the Cas12 protein comprising the amino acid sequence of SEQ ID NO: 15, wherein UGUU is the loop nucleotide strand (D of FIG.6). In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 corresponding to the Cas12 protein comprising the amino acid sequence of any one of SEQ ID NO: 2, SEQ ID NO: 6, SEQ ID NO: 10, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, or SEQ ID NO: 26. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 154 corresponding to the Cas12 protein comprising the amino acid sequence of any one of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 9, or SEQ ID NO: 20. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 155 corresponding to the Cas12 protein comprising the amino acid sequence of any one of SEQ ID NO: 4, SEQ ID NO: 10, SEQ ID NO: 13, or SEQ ID NO: 28. In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 156 corresponding to the Cas12 protein comprising the amino acid sequence of any one of SEQ ID NO: 1, SEQ ID NO: 7, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 23, SEQ ID NO: 26, or SEQ ID NO: 28. Other such pairings are likewise contemplated.
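The stem-loop description (two stem strands of 4-6 nucleotides flanking a loop of 4 or 5 nucleotides) can be checked against the four direct repeat sequences quoted in this disclosure. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes strict Watson-Crick pairing (A-U and G-C), counting pairs outward from the stated loop. It is not a structure prediction and ignores wobble pairs and mismatches, which the disclosure permits in a stem.

```python
# Strict Watson-Crick RNA base pairs (assumption: no G-U wobble pairs counted).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_length(direct_repeat, loop):
    """Count contiguous base pairs extending outward from the first `loop` match."""
    start = direct_repeat.find(loop)
    if start < 0:
        raise ValueError("loop not found in direct repeat")
    left, right = start - 1, start + len(loop)
    pairs = 0
    while (
        left >= 0
        and right < len(direct_repeat)
        and (direct_repeat[left], direct_repeat[right]) in PAIRS
    ):
        pairs += 1
        left -= 1
        right += 1
    return pairs


# Direct repeats and loop strands as quoted in the text.
DIRECT_REPEATS = {
    "SEQ ID NO: 153": ("AAUUUCUACUAUUGUAGAU", "UAUU"),
    "SEQ ID NO: 154": ("AAUCCGUAACUUUGCAUUUGCAAAA", "AUUU"),
    "SEQ ID NO: 155": ("AAUUUCUACUAUCGUAGAU", "UAUC"),
    "SEQ ID NO: 156": ("AAUUUCUACUGUUGUAGAU", "UGUU"),
}

for name, (dr, loop) in DIRECT_REPEATS.items():
    print(name, "loop:", len(loop), "nt; stem:", stem_length(dr, loop), "bp")
```

Under this strict-pairing assumption, each of the four direct repeats yields a five-pair stem around a four-nucleotide loop, which is consistent with the 4-6 nucleotide stem strands and 4 or 5 nucleotide loop described above.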

In some embodiments, the direct repeat sequence is shown as SEQ ID NO: 153 (5’-AAUUUCUACUAUUGUAGAU-3’), wherein UAUU is the loop nucleotide strand. The direct repeat sequence set forth in SEQ ID NO: 153 is used by the Cas12 protein comprising an amino acid sequence selected from any one of SEQ ID NO: 5, SEQ ID NO: 8, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, or SEQ ID NOs: 29-35.

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 90%, 95%, or 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156.
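To make the identity thresholds concrete, the Python sketch below compares the direct repeats of SEQ ID NOs: 153, 155 and 156 as quoted in this disclosure. It is a simplification: it assumes an ungapped, position-by-position comparison of equal-length sequences, whereas sequence identity is normally computed from an alignment. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Direct repeat sequences as quoted in the text.
DR_153 = "AAUUUCUACUAUUGUAGAU"
DR_155 = "AAUUUCUACUAUCGUAGAU"
DR_156 = "AAUUUCUACUGUUGUAGAU"

for label, other in (("155", DR_155), ("156", DR_156)):
    identity = percent_identity(DR_153, other)
    print(
        f"SEQ ID NO: 153 vs {label}: {identity:.1f}% "
        f"(>=90%: {identity >= 90}, >=95%: {identity >= 95})"
    )
```

Because these direct repeats are 19 nucleotides long, a single substitution gives 18/19 identity, which clears a 90% threshold but not a 95% threshold under this ungapped comparison.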

In some embodiments, the Cas12 protein comprises an amino acid sequence having at least 95% or at least 98% sequence identity to any one of SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or at least 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156, in any combination of these protein and direct repeat thresholds. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence is set forth in SEQ ID NO: 153. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 1, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 20 or SEQ ID NO: 26, and the direct repeat sequence is set forth in SEQ ID NO: 156.

In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 1, in SEQ ID NO: 17, in SEQ ID NO: 19, in SEQ ID NO: 20, or in SEQ ID NO: 26, each taken individually, and the direct repeat sequence comprises a nucleotide sequence having at least 90% or 95% identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 17 and the direct repeat sequence is set forth in SEQ ID NO: 153. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 17 and the direct repeat sequence is set forth in SEQ ID NO: 156. In some embodiments, the Cas12 protein is set forth in SEQ ID NO: 19, in SEQ ID NO: 20, or in SEQ ID NO: 26, each taken individually, and the direct repeat sequence is set forth in SEQ ID NO: 153.

The guide RNA secondary structures of the Cas12 proteins suggest that the Cas12 proteins could process and utilize each other’s crRNAs for DNA targeting. A “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides that are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion). The terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art. As is known in the art, a stem-loop structure does not require exact base-pairing. Thus, the stem may include one or more base mismatches. Alternatively, the base-pairing may be exact, i.e., not include any mismatches. The predicted stem-loop structures of the direct repeats are illustrated in FIG.6 and FIG.7. In FIG.6 and FIG.7, “N” is only an illustrative placeholder and does not represent the actual number of nucleotides. The direct repeat sequence in FIG.7 is the same as the direct repeat sequence in A of FIG.6.

In certain embodiments, the Cas12 protein has nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, or nucleic acid binding activity.

In some embodiments, the Cas12 protein has endonuclease activity, nickase activity, and/or exonuclease activity.

In certain embodiments, the Cas12 protein according to the disclosure as described herein may be a deactivated or inactivated Cas12 protein (e.g., a “dead” Cas12 protein), wherein catalytic activity is partially or (substantially) completely lost, as described herein elsewhere. Loss of catalytic activity in this context means that the Cas12 protein is not capable of cleaving DNA (e.g., not capable of inducing double strand breaks, or only capable of inducing single strand breaks, such as a nickase). The Cas12 protein may be used to reduce off-target effects, as defined herein elsewhere. The Cas12 protein may also be part of a fusion protein, as defined herein elsewhere. The Cas12 protein may also include a destabilization domain, as defined herein elsewhere. The Cas12 protein may also be a split Cas12 protein, as defined herein elsewhere. The Cas12 protein may also be an inducible Cas12 protein, as defined herein elsewhere. The Cas12 protein may also be part of a self-inactivating system (SIN), as defined herein elsewhere. The Cas12 protein may also be part of a synergistic activator system (SAM), as defined herein elsewhere.

Accordingly, in certain embodiments, the Cas12 polypeptide according to the disclosure as described herein is comprised in a fusion protein with a functional domain. In certain embodiments, said functional domain comprises a (transcriptional) activator domain, a (transcriptional) repressor domain, a recombinase, a transposase, a histone remodeler, a DNA methyltransferase, a cryptochrome, a light inducible/controllable domain, or a chemically inducible/controllable domain.

In certain embodiments, the Cas12 polypeptide according to the disclosure as described herein is not capable of inducing a DNA double strand break. In certain embodiments, the Cas12 polypeptide according to the disclosure as described herein is a nickase. In certain embodiments, the Cas12 polypeptide according to the disclosure as described herein is a catalytically inactive Cas12 polypeptide. In certain embodiments, the Cas12 polypeptide according to the disclosure as described herein is not capable of inducing a DNA single strand break. In an exemplary embodiment, the Cas12 protein is a dead Cas12 protein that is catalytically inactive. In an exemplary embodiment, the Cas12 protein is a nickase in which catalytic activity is partially lost.

In some embodiments, a vector encodes a Cas12 protein that lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. In some embodiments, the Cas12 protein is considered to lack all DNA cleavage activity when the DNA cleavage activity of the enzyme is no more than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity of the non-mutated form of the enzyme. Thus, the Cas12 protein may be used as a generic DNA binding protein with or without fusion to a functional domain. In one aspect of the disclosure, the Cas12 enzyme may be fused to a protein, e.g., a tag, and/or an inducible/controllable domain such as a chemically inducible/controllable domain. The Cas12 protein in the disclosure may be a chimeric Cas12 protein, e.g., a Cas12 protein having enhanced function by being a chimera. Chimeric Cas12 proteins may be new Cas proteins containing fragments from more than one naturally occurring Cas protein. In some embodiments, the Cas12 protein has enhanced on-target activity without higher off-target cutting, or is used for making super-cutting nickases, or is combined with a mutation that renders the Cas protein dead to produce a super binder.

The Cas12 enzyme provided in this disclosure can recognize a short motif in the vicinity of a target DNA, called a Protospacer Adjacent Motif (PAM). The Cas12 enzyme can recognize the canonical PAM comprising or consisting of 5'-TTTN-3', as well as non-canonical PAM sequences, wherein N denotes any nucleotide. For example, the canonical PAM may be TTTA, TTTT, TTTG, or TTTC.

In some embodiments, the spacer sequence is between 10 and 40 nucleotides in length, preferably the spacer sequence is between 15 and 30 nucleotides in length, or between 18 and 25 nucleotides in length.

In some embodiments, an mRNA or a DNA encodes the Cas12 protein.

In some embodiments, the polynucleotide encoding the Cas12 protein is operably linked to a promoter. In some embodiments, the promoter is a constitutive promoter, a tissue-specific promoter or an inducible promoter. In some embodiments, the polynucleotide encoding the Cas12 protein operably linked to a promoter is in a vector. In some embodiments, the vector is selected from the group consisting of a retroviral vector, a lentiviral vector, a phage vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.

In some embodiments, the system further comprises a donor template nucleic acid, wherein the donor template nucleic acid is a DNA, an RNA, or a DNA-RNA hybrid.

In some embodiments, the targeting of the target sequence by the Cas12 protein and guide sequence results in a modification of the target sequence. In some embodiments, the modification of the target sequence is a cleavage event or a nicking event.

In some embodiments, the target sequence is 3’ of a Protospacer Adjacent Motif (PAM), wherein: the PAM sequence is TTTR (R is A or G) and the Cas12 protein comprises an amino acid sequence selected from SEQ ID NO: 1 or SEQ ID NO: 4, or a homologue thereof having at least 70% sequence identity; the PAM sequence is TYYN (Y is T or C, N is A, T, C, or G) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 17, or a homologue thereof having at least 70% sequence identity; or the PAM sequence is TTTD (D is A, G, or T) and the Cas12 protein comprises an amino acid sequence set forth in SEQ ID NO: 26, or a homologue thereof having at least 70% sequence identity.
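The degenerate PAM codes used above (R, Y, D, N) can be expanded and matched against a DNA sequence as in the following Python sketch. The function names and the example sequence are hypothetical and serve only as illustration. The sketch scans a single strand and takes the target as the bases immediately 3’ of the PAM, with a 20-nucleotide default length chosen from within the spacer-length ranges given in this disclosure.

```python
# IUPAC degenerate base codes used for the PAMs in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # R is A or G
    "Y": "CT",    # Y is T or C
    "D": "AGT",   # D is A, G, or T
    "N": "ACGT",  # N is any nucleotide
}


def pam_matches(pam_pattern, site):
    """True if `site` is one of the sequences described by `pam_pattern`."""
    return len(pam_pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pam_pattern, site)
    )


def find_targets(dna, pam_pattern, spacer_len=20):
    """List (pam_start, pam, target) where the target lies directly 3' of the PAM.

    Assumption: only the given strand is scanned; the opposite strand is ignored.
    """
    k = len(pam_pattern)
    hits = []
    for i in range(len(dna) - k - spacer_len + 1):
        pam = dna[i:i + k]
        if pam_matches(pam_pattern, pam):
            hits.append((i, pam, dna[i + k:i + k + spacer_len]))
    return hits


# Hypothetical sequence, shortened target length for readability.
print(find_targets("GGTTTACCCCCGGGGG", "TTTR", spacer_len=5))
```

The same matcher can be reused for each PAM pattern named in the text by changing the `pam_pattern` argument to `"TYYN"` or `"TTTD"`.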

In another aspect, the disclosure provides a delivery system, wherein the system of any one of the above is present in a delivery vehicle selected from the group consisting of AAV (adeno-associated viruses), adenoviruses, retroviruses, HSV (herpes simplex virus), gammaretrovirus, LV (lentivirus), eCIS (extracellular contractile injection system), eVLP (engineered virus-like particles), VLP (virus-like particles), liposomes, plasmids, lipid nanoparticles (LNPs), exosomes, microvesicles, nucleic acid nanoassemblies, a gene gun, and/or an implantable device.

In another aspect, the disclosure provides an engineered cell comprising the system of any one of the above. In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, or the delivery system of the above, for use in a method of therapy, treatment, prevention, diagnosis, or detection of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, the delivery system of the above, or the cell of any one of the above for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring CRISPR-Cas system of any one of the above, the delivery system of the above, or the cell of any one of the above for use in a method of therapeutic treatment of a patient.

In another aspect, the disclosure provides a method of modifying or targeting a target DNA locus, the method comprising delivering to said locus a CRISPR-Cas system of any one of the above or a delivery system of the above.

In some embodiments, said modifying or targeting a target locus comprises inducing a DNA strand break. In some embodiments, said modifying or targeting a target locus comprises inducing a DNA double strand break or a DNA single strand break. In some embodiments, said modifying or targeting a target locus comprises altering gene expression of one or more genes. In some embodiments, said modifying or targeting a target locus comprises epigenetic modification of said target DNA locus. In some embodiments, the method is a method of modifying a cell, a cell line, or an organism by manipulation of one or more target sequences at genomic loci of interest.

In some embodiments, the cell is a eukaryotic cell or a prokaryotic cell. In some embodiments, the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In some embodiments, the cell is a mammalian cell or a human cell or a plant cell. In some embodiments, the method is in vitro or in vivo.

In another aspect, the disclosure provides a method of targeting and cleaving a double-stranded target DNA, the method comprising: contacting the double-stranded target DNA with a system of any one of the above.

In some embodiments, cleaving the target DNA or target sequence results in the formation of an indel or the insertion of a nucleotide sequence. In some embodiments, cleaving the target DNA or target sequence comprises cleaving the target DNA or target sequence at two sites, and results in the deletion or inversion of the sequence between the two sites.
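The two-site outcomes described above (deletion or inversion of the intervening sequence) can be illustrated on a plain string model of one DNA strand. The Python sketch below is a simplified illustration with hypothetical function names and a hypothetical example sequence. It assumes clean cuts at the two given positions and exact rejoining, and it ignores any overhangs left by the nuclease and any indels introduced during repair.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def delete_between(seq, cut1, cut2):
    """Remove the segment between two cut positions and rejoin the flanks."""
    a, b = sorted((cut1, cut2))
    return seq[:a] + seq[b:]


def invert_between(seq, cut1, cut2):
    """Reinsert the segment between two cut positions in the opposite orientation."""
    a, b = sorted((cut1, cut2))
    return seq[:a] + reverse_complement(seq[a:b]) + seq[b:]


# Hypothetical locus with cuts after positions 4 and 8 (0-based string indices).
locus = "AAAACCGTTTTT"
print(delete_between(locus, 4, 8))
print(invert_between(locus, 4, 8))
```

On the top strand, an inversion appears as the reverse complement of the excised segment, which is why `invert_between` does not simply reverse the characters.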

In another aspect, the disclosure provides an isolated eukaryotic cell comprising a modified target locus of interest, wherein the target locus of interest has been modified according to a method or via use of a composition or via use of a system of any one of the preceding contents.

The cleavage efficiency of the Cas12 protein on double-stranded DNA (dsDNA) is verified. The cleavage ratio is 2%-100%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is less than 5%, less than 10%, less than 15%, or less than 20%. In some embodiments, in an in vitro cleavage efficiency assay, the cleavage ratio is more than 30%, more than 40%, more than 50%, more than 60%, more than 70%, more than 80%, or more than 90%. In some embodiments, the cleavage ratio is 50%-100% or 60%-100%. In some specific embodiments, the cleavage ratio is 60%-90%, 70%-90%, 80%-90%, 80%-95%, 85%-95%, or 85%-98%. As another example, in a specific embodiment, the cleavage ratio can be 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 18%, 20%, 25%, 30%, 35%, 40%, 50%, 55%, 58%, 60%, 65%, 70%, 72%, 73%, 75%, 78%, 80%, 82%, 85%, 87%, 88%, 90%, 92%, 95%, 97%, 98%, 99%, 100% and so on.

For some Cas12 proteins, testing of genome cleavage activity in mammalian cells shows a gene editing efficiency of 50%-95%. For example, in a specific embodiment, the gene editing efficiency can be 50%, 55%, 58%, 60%, 65%, 67%, 70%, 72%, 73%, 75%, 78%, 80%, 82%, 85%, 87%, 88%, 90%, 92%, 95% and so on.

The Cas12 protein also shows lower off-target activity, and no off-target events are detected for some Cas12 proteins.

The programmability, specificity, and collateral activity of the Cas12 protein also make it an ideal switchable nuclease for non-specific cleavage of nucleic acids. In one embodiment, a Cas12 protein system is engineered to provide and take advantage of collateral non-specific cleavage of nucleic acids, such as ssDNA. In another embodiment, a Cas12 protein system is engineered to provide and take advantage of collateral non-specific cleavage of ssDNA. Accordingly, engineered Cas12 protein systems provide platforms for nucleic acid detection, transcriptome manipulation, and inducing cell death. The Cas12 protein is developed for use as a mammalian transcript knockdown and binding tool. The Cas12 protein is capable of robust collateral cleavage of RNA and ssDNA when activated by sequence-specific targeted DNA binding.

In certain embodiments, the Cas12 protein is provided or expressed in an in vitro system or in a cell, transiently or stably, and targeted or triggered to non-specifically cleave cellular nucleic acids. In one embodiment, the Cas12 protein is engineered to knock down ssDNA, for example viral ssDNA. In another embodiment, the Cas12 protein is engineered to knock down RNA. The system can be devised such that the knockdown is dependent on a target DNA present in the cell or in vitro system, or triggered by the addition of a target nucleic acid to the system or cell.

In an embodiment, the Cas12 protein system is engineered to non-specifically cleave RNA in a subset of cells distinguishable by the presence of an aberrant DNA sequence, for instance where cleavage of the aberrant DNA might be incomplete or ineffectual.

Collateral activity was recently leveraged for a highly sensitive and specific nucleic acid detection platform termed SHERLOCK that is useful for many clinical diagnoses (Gootenberg, J. S. et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438-442 (2017)).

According to the invention, engineered Cas12 protein systems are optimized for DNA or RNA endonuclease activity and can be expressed in mammalian cells and targeted to effectively knock down reporter molecules or transcripts in cells.

The collateral effect of the engineered Cas12 protein, combined with isothermal amplification, provides a CRISPR-based diagnostic offering rapid DNA or RNA detection with high sensitivity and single-base mismatch specificity. The Cas12 protein-based molecular detection platform is used to detect specific strains of virus, distinguish pathogenic bacteria, genotype human DNA, and identify cell-free tumor DNA mutations. Furthermore, reaction reagents can be lyophilized for cold-chain independence and long-term storage, and readily reconstituted on paper for field applications.

The ability to rapidly detect nucleic acids with high sensitivity and single-base specificity on a portable platform may aid in disease diagnosis and monitoring, epidemiology, and general laboratory tasks. Although methods exist for detecting nucleic acids, they have trade-offs among sensitivity, specificity, simplicity, cost, and speed.

In another aspect, the disclosure provides a system for detecting the presence of a nucleic acid target sequence in an in vitro sample, comprising: a Cas12 protein of any one of the above; at least one guide polynucleotide comprising a guide sequence capable of binding the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence; wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and, once activated by the target sequence, cleaves the non-target sequence of the nucleic acid-based masking construct.

In some embodiments, the system further comprises nucleic acid amplification reagents to amplify the trigger sequence. In some embodiments, the amplification reagents are isothermal amplification reagents. In some embodiments, the amplification reagents are reagents for nucleic acid sequence-based amplification (NASBA), recombinase polymerase amplification (RPA), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA), helicase-dependent amplification (HDA), or nicking enzyme amplification reaction (NEAR). In some embodiments, the target sequence is a target RNA sequence and the system further comprises a DNA polymerase and a primer designed to bind the target RNA sequence and further comprises a DNA polymerase promoter.

In another aspect, the disclosure provides a method for detecting target nucleic acids in samples comprising: contacting one or more samples with a Cas12 protein of any one of the above; at least one guide polynucleotide comprising a guide sequence designed to have a degree of complementarity with the target sequence, and designed to form a complex with the Cas12 protein; and a nucleic acid-based masking construct comprising a non-target sequence, wherein the Cas12 protein exhibits collateral cleavage activity of RNA and/or ssDNA and, once activated by the target sequences, cleaves the non-target sequence of the nucleic acid-based masking construct; and detecting a signal from cleavage of the non-target sequence, thereby detecting the one or more target nucleic acid sequences in the sample.

In another aspect, the disclosure provides an engineered, non-naturally occurring sgRNA, wherein the sgRNA comprises, in a tandem arrangement:

I. a direct repeat sequence;

II. a spacer sequence, which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182.

In some embodiments, the tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation. In some embodiments, the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

In some embodiments, the sgRNA interacts with the Cas12 protein, such as GEBx0047, GEBx0063, GEBx0064, or GEBx0070.

In another aspect, the disclosure provides a DNA polynucleotide molecule encoding the sgRNA of above.

In another aspect, the disclosure provides a DNA expression vector comprising the DNA polynucleotide molecule of above. In some embodiments, the vector further comprises one or more regulatory element(s) operably linked to sequences encoding the sgRNA. In some embodiments, at least one regulatory element is capable of directing expression of the sgRNA within the cell.

In another aspect, the disclosure provides a delivery vector carrying one or more sgRNA of any one of above.

In another aspect, the disclosure provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.

In another aspect, the disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

In another aspect, the disclosure provides a DNA polynucleotide molecule encoding the spacer sequence of above.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a therapeutic or treatment or prevention or diagnosis or detection method of disease.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use as a medicament.

In another aspect, the disclosure provides the engineered, non-naturally occurring spacer sequence of above for use in a method of therapeutic treatment of a patient.

In some embodiments, the method further comprises contacting the one or more samples with reagents for amplifying one or more target sequences. In some embodiments, the amplification reagents are isothermal amplification reagents. In some embodiments, the amplification reagents are nucleic acid sequence-based amplification (NASBA), recombinase polymerase amplification (RPA), loop-mediated isothermal amplification (LAMP), strand displacement amplification (SDA), helicase-dependent amplification (HDA), or nicking enzyme amplification reaction (NEAR) reagents. In some embodiments, the target sequence is a target RNA sequence and the system further comprises a DNA polymerase and a primer designed to bind the target RNA sequence, and further comprises a DNA polymerase promoter. In some embodiments, the masking construct suppresses generation of a detectable positive signal until cleaved or deactivated, or masks a detectable positive signal, or generates a detectable negative signal until the masking construct is deactivated or cleaved. In some embodiments, the masking construct comprises:

a. a silencing RNA that suppresses generation of a gene product encoded by a reporting construct, wherein the gene product generates the detectable positive signal when expressed;

b. a ribozyme that generates the negative detectable signal, wherein the positive detectable signal is generated when the ribozyme is deactivated;

c. a ribozyme that converts a substrate to a first color, wherein the substrate converts to a second color when the ribozyme is deactivated;

d. an aptamer and/or a polynucleotide-tethered inhibitor;

e. a polynucleotide to which a detectable ligand and a masking component are attached;

f. a nanoparticle held in aggregate by bridge molecules, wherein at least a portion of the bridge molecules comprises a polynucleotide, and wherein the solution undergoes a color shift when the nanoparticle is dispersed in solution;

g. a quantum dot or fluorophore linked to one or more quencher molecules by a linking molecule, wherein at least a portion of the linking molecule comprises a polynucleotide;

h. a polynucleotide in complex with an intercalating agent, wherein the intercalating agent changes absorbance upon cleavage of the polynucleotide; or

i. two fluorophores tethered by a polynucleotide that undergo a shift in fluorescence when released from the polynucleotide.

The following non-limiting examples are provided to further illustrate embodiments of the disclosure disclosed herein. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent approaches that have been found to function well in the practice of the disclosure, and thus can be considered to constitute examples of modes for its practice. Those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments that are disclosed and still obtain a like or similar result without departing from the spirit and scope of the disclosure.

Example 1: A method of metagenomic analysis for the proteins

Metagenomic sequence data from public databases are searched using Hidden Markov Models generated from known Cas protein sequences, including class II type V Cas effector proteins. CRISPR-Cas proteins identified by the search are aligned to known proteins to identify potential active sites. Starting from hundreds of potential sequences, this metagenomic workflow results in the delineation of the Cas12 proteins described above and shown in FIG.1A-FIG.1D.

The phylogenetic tree was constructed by IQTREE (FIG.1A) to visualize the relatedness of the orthologs at the primary amino-acid level using 103 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to the Cas12 proteins provided by this disclosure are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the Cas12 proteins are representatives of unique Cas12 clusters. They each have their own branches, suggesting that they are evolutionarily distinct.

Structure models of the Cas12 proteins were built using the crystal structures of AsCpf1/LbCpf1/FnCpf1 (PDB ID: 5XH7/5XUZ/6I1K) as templates on the SWISS-MODEL web server with its default parameters. The constructed models were used for domain determination and the results are shown in FIG.2A and FIG.2B. The Cas12 proteins all contain the REC1, RuvC, and NUC-terminal ends.

The phylogenetic tree was constructed by IQTREE (FIG.1C) to visualize the relatedness of the orthologs at the primary amino-acid level using 73 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to GEBx0019, GEBx0022 and GEBx0033 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the engineered Cas12 proteins studied here are representatives of unique Cas12 clusters. They each have their own branches, suggesting that they are evolutionarily distinct.

Based on the preceding classification, the amino acid sequences of GEBx0019, GEBx0022 and GEBx0033 were analyzed. As shown in FIG.3A, GEBx0019, GEBx0022 and GEBx0033, together with AsCpf1, have a relatively larger WED.3 domain (165 AA) than LbCpf1 and FnCpf1 (130 AA), where AA is the abbreviation for amino acid. In addition, the comparison of the PI domain and NUC domain between GEBx0019, GEBx0022 and GEBx0033 and AsCpf1 in FIG.3B, FIG.3C and FIG.4 shows that GEBx0019, GEBx0022 and GEBx0033 have smaller PI and NUC domains than AsCpf1. These results reflect differences in their evolutionary orientations, and these smaller Cas12 proteins are expected to facilitate intracellular delivery via viral vectors or other delivery systems, with higher gene editing activity and lower off-target rates.

Structure modeling of GEBx0019, GEBx0022 and GEBx0033 was performed with SWISS-MODEL, developed by the Computational Structural Biology Group at the SIB Swiss Institute of Bioinformatics and the Biozentrum of the University of Basel; the template used for all three proteins was AsCpf1 (PDB ID: 5XH7). The modeled structures are shown in FIG.4.

The phylogenetic tree was constructed by IQTREE (FIG.1B) to visualize the relatedness of the orthologs at the primary amino-acid level using 88 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to GEBx0030 and GEBx0039 in this study are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that GEBx0030 and GEBx0039 are representatives of unique Cas12 clusters. They each have their own branches, suggesting that they are evolutionarily distinct.

The engineered GEBx0030 and GEBx0039 were compared with the reference nucleases (AsCas12a, LbCas12a, and FnCas12a), as shown in FIG.5A. FIG.5A shows that GEBx0030 and GEBx0039 share low identity with the three reference sequences (below 30% for GEBx0030 and 25% for GEBx0039). In addition, GEBx0030 (1222 aa) and GEBx0039 (1171 aa) are much smaller than typical Cas12a proteins. These features suggest that GEBx0030 and GEBx0039 are independent of the existing Cas12a family.
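The identity comparison described above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (gaps written as "-") and counts identical residues over the columns where neither sequence has a gap. The function name, the choice of denominator, and the toy fragments are illustrative assumptions; the disclosure does not state which alignment tool or identity definition was used for FIG.5A.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy fragments, not taken from any SEQ ID NO in this disclosure.
toy_identity = percent_identity("MKT-LLVA", "MRTALL-A")
print(f"{toy_identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for gapped alignments, so the definition should be stated alongside any reported percentage.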

The engineered GEBx0030 and GEBx0039 were scanned for matches against the InterPro protein signature databases; the result is shown in FIG.2C, which shows that GEBx0030 and GEBx0039 contain the conserved RuvC domain found in other type V Cas proteins.

The phylogenetic tree was constructed by IQTREE (FIG.1D) to visualize the relatedness of the orthologs at the primary amino-acid level using 120 Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12f, Cas12g, Cas12h, Cas12i, Cas12j, and Cas12k sequences from the National Center for Biotechnology Information (NCBI), various publications, and patents. The branches of the tree corresponding to the Cas12 proteins provided by this disclosure are marked with a circle, while the reference nucleases (AsCpf1, LbCpf1, and FnCpf1) are marked with a star. Although phylogenetically more closely related to Cas12a than to other subtypes, the tree shows that the Cas12 proteins are representatives of unique Cas12 clusters. They each have their own branches, suggesting that they are evolutionarily distinct.

The engineered GEBx0074-GEBx0080 were compared with the reference nucleases (AsCpf1, LbCpf1, and FnCpf1), as shown in FIG.5B. FIG.5B shows that GEBx0074-GEBx0080 share low identity with the three reference sequences (below 50%). These features suggest that GEBx0074-GEBx0080 are independent of the existing Cas12a family.

The domain arrangement of GEBx0074-GEBx0080 is shown in FIG.2D.

The amino acid sequences of the Cas12 proteins are shown in SEQ ID NOs: 1-35. Detailed information is shown in Table 2.

The amino acid sequences of AsCpf1, LbCpf1 and FnCpf1 are shown in SEQ ID NOs: 71-73 (Table 4).

Further sequence analysis of GEBx0013/0047/0063/0064/0070 (FIG.13) and the other GEBx Cas12 proteins provided in this disclosure found no zinc finger domain in any of the GEBx Cas12 effectors. That is to say, the Cas12 proteins provided by this disclosure all lack the zinc finger domain. The reference sequences (SEQ ID NOs: 189-194) in FIG.13 are shown in Table 4.

Table 4 The amino acid sequences disclosed herein

EXAMPLE 2: Protocol for predicted crRNA folding

Predicted RNA folding of the active single crRNA sequence located at the CRISPR array of the Cas12 proteins is computed using the RNAfold web server developed by Lorenz et al. (2011). The folded crRNAs are shown in FIG.6 and FIG.7.

In FIG.6 and FIG.7, N represents the target-specific sequence; the number of N shown is illustrative only and does not represent the actual number of nucleotides.

EXAMPLE 3: Protein Expression and Purification

As shown in Table 5, the complete amino acid sequences of the Cas12 proteins (GEBx0013, GEBx0018, GEBx0032, GEBx0033, GEBx0037) are shown in SEQ ID NOs: 74-78. The amino acid sequences of the nuclear localization signals (NLSs) are shown in SEQ ID NOs: 79-80. The FLAG-tag sequences are shown in SEQ ID NOs: 81-82. The complete amino acid sequences of the other Cas12 proteins with NLSs and FLAG sequence are analogous to those of GEBx0013, GEBx0018, GEBx0032, GEBx0033, and GEBx0037.

Table 5 The complete amino acid sequences of Cas12 proteins

The DNA fragments encoding the Cas12 proteins, together with 3’ and 5’ nuclear localization signals (NLS) and FLAG-tag sequences (shown in SEQ ID NOs: 83-87), are synthesized by GenScript and assembled by Gibson assembly into the pEASY-Blunt E2 expression plasmid (shown in FIG.8). The DNA fragments encoding the Cas12 proteins are partly shown in Table 6, and the remaining fragments can be designed by reference to it.

Table 6 The nucleotide sequences encoding the Cas12 proteins

The nucleotide sequences of the Cas12 proteins are synthesized commercially (for example, by Ruibiotech).

Cas12 proteins are expressed as FLAG-tagged fusion proteins from an inducible T7 promoter (pEASY-Blunt E2 expression plasmid) in a protease-deficient E. coli B strain. Cells expressing the FLAG-tagged proteins are lysed by sonication. The supernatant was loaded on a Ni2+-charged HisTrap HP column (GE Healthcare) and eluted with a linear gradient of increasing imidazole concentration (from 0 to 500 mM) in 20 mM Tris-HCl (pH 7.5 at 25°C), 0.5 M NaCl buffer on an AKTA Pure25 FPLC (Inscinstech). The eluate was resolved by SDS-PAGE on BeyoGel Plus PAGE (Beyotime) and stained with Feto SDS-PAGE staining buffer (H&Z lifescience). Purity was determined using densitometry of the protein band with ImageLab software (Bio-Rad). Purified endonucleases were dialyzed into a storage buffer composed of 20 mM CH3COONa, 500 mM NaCl, 0.1 mM EDTA, 0.1 mM TCEP, and 50% glycerol, pH 6.0, and stored at -80°C.

The result of SDS-PAGE gel electrophoresis of GEBx0032 is shown in FIG.9A.

As shown in FIG.9A, the purity of the GEBx0032 protein reaches ~50% after Ni particle affinity chromatography. The optimal elution condition was 200 mM imidazole (40% NiB buffer).

The result of SDS-PAGE gel electrophoresis of GEBx0033 is shown in FIG.9B.

As shown in FIG.9B, the purity of the GEBx0033 protein can reach 70% after Ni particle affinity chromatography. The optimal elution condition was 250 mM imidazole (50% NiB buffer).

The result of SDS-PAGE gel electrophoresis of GEBx0037 is shown in FIG.9C.

As shown in FIG.9C, the purity of the GEBx0037 protein reaches ~50% after Ni particle affinity chromatography. The optimal elution condition was 250 mM imidazole (50% NiB buffer).

The results of SDS-PAGE gel electrophoresis of GEBx0013 and GEBx0018 are shown in FIG.9D and FIG.9E.

As shown in FIG.9D, the purity of the GEBx0013 protein can reach 90% after Ni particle affinity chromatography. The optimal elution condition was 150 mM imidazole (30% NiB buffer).

As shown in FIG.9E, the purity of the GEBx0018 protein can reach 45% after Ni particle affinity chromatography. The optimal elution condition was 100 mM imidazole (20% NiB buffer).

The other Cas12 proteins were purified according to the protein manipulation described above.

EXAMPLE 4: PAM Sequence identification/confirmation for the endonucleases described herein.

A commercially available cell-free TXTL system developed from an all-Escherichia coli (E. coli) lysate (myTXTL, Arbor Biosciences) is used to rapidly express putative endonucleases from a plasmid (pEASY-Blunt E2) together with targeting or non-targeting gRNAs. PAM sequences are determined by sequencing plasmids containing randomly generated potential PAM sequences that could be cleaved by the nucleases. In this system, an E. coli codon-optimized nucleotide sequence encoding the nuclease is transcribed and translated in vitro from a PCR fragment under the control of a T7 promoter. A synthetic crRNA encoding the repeat-spacer sequence is added to the system. Successful expression of the endonuclease in the TXTL system, followed by complex formation with the crRNA, provides active in vitro CRISPR nuclease complexes.

A library of target DNA fragments containing a protospacer sequence preceded by 8N mixed bases (potential PAM sequence) is incubated with the output of the TXTL reaction. After 1 hour of incubation, the reaction is stopped, and the DNA is recovered via a DNA clean-up kit. Adaptor sequences are blunt-end ligated to DNA with active PAM sequences that have been cleaved by the endonuclease, whereas DNA that has not been cleaved is inaccessible for ligation. DNA segments comprising active PAM sequences are then amplified by PCR with primers specific to the library and the adaptor sequence. The PCR amplification products are resolved on a gel to identify amplicons that map to cleavage events. The amplified segments of the cleavage reaction are also used as a template for preparation of an NGS library or as a substrate for Sanger sequencing. Sequencing the resulting library, which is a subset of the starting 8N library, reveals sequences with PAM activity compatible with the CRISPR complex. The PAM sequences are collected into seqLogo (see e.g., Huber et al. Nat Methods. 2015 Feb; 12(2): 115-21) representations. The seqLogo shows the 8 bp upstream of the spacer, labelled as positions 0-7. For PAM testing with a processed RNA construct, the same procedure is repeated except that an in vitro transcribed RNA is added along with the plasmid library and the minimal CRISPR array template is omitted.

PAM depletion assay

A library of target DNA fragments containing a protospacer sequence preceded by 8N mixed bases (potential PAM sequence) was synthesized for the in vitro PAM depletion assay. In one set of experiments, a 10 µL mixture containing 0.417 µM PAM library dsDNA (5’-TACACGACGCTCTTCCGATNNNNNNNNgagaagtcattcaataaggccactAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3’, SEQ ID NO: 88), 4.17 µM purified Cas12 protein, 5 µM corresponding guide RNA harboring the 5’-gagaagUcaUUcaaUaaggccac-3’ (SEQ ID NO: 89) spacer, and 1 µL NEBuffer™ 2.1 was incubated at 37°C for 2 hours. The reaction was then stopped by adding 1 µL Proteinase K (20 mg/mL) and incubating at 50°C for ten minutes. Subsequently, the adaptor sequences for NGS were ligated to the uncleaved dsDNA through one round of PCR using the barcoded primers, whereas DNA that had been cleaved was inaccessible for ligation. The barcoded primers bind to sequences adjacent to the randomized PAM of the libraries and append sample barcodes and Illumina read 1 and read 2 sequencing primer binding sites. The PCR amplification products were purified using SPRI cleanup beads (Beckman, Cat. B23318) to obtain an NGS library, which was then subjected to 2x75 bp paired-end sequencing on an Illumina iSeq 100 Sequencer. The run data were demultiplexed and analyzed using the HT-PAMDA method (Walton, R.T. et al. 2021). PAM sequences were collected into heatmap representations. The heatmap showed the 4 bp upstream of the spacer, labelled as positions 1-4.
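The counting step behind a PAM depletion readout can be sketched as follows. This is a simplified illustration and not the HT-PAMDA pipeline cited above: it assumes reads from a nuclease-treated sample and from an untreated control library (the control is an assumption of this sketch), extracts the 4 nt immediately upstream of the protospacer using the constant flank of SEQ ID NO: 88, and compares normalized PAM frequencies. All function names and the pseudocount are hypothetical.

```python
from collections import Counter

# Constant 5' flank from SEQ ID NO: 88 and the first 12 nt of the protospacer.
LEFT_FLANK = "TACACGACGCTCTTCCGAT"
PROTOSPACER_START = "GAGAAGTCATTC"
RANDOM_LEN = 8  # length of the randomized 8N region


def extract_pam(read, pam_len=4):
    """Return the pam_len bases directly upstream of the protospacer, or None."""
    read = read.upper()
    start = read.find(LEFT_FLANK)
    if start < 0:
        return None
    n_end = start + len(LEFT_FLANK) + RANDOM_LEN
    if not read.startswith(PROTOSPACER_START, n_end):
        return None  # randomized region has the wrong length, or read is truncated
    pam = read[n_end - pam_len:n_end]
    if set(pam) - set("ACGT"):
        return None  # discard reads with ambiguous base calls
    return pam


def count_pams(reads, pam_len=4):
    """Count PAMs over all reads in which the flanks could be located."""
    pams = (extract_pam(r, pam_len) for r in reads)
    return Counter(p for p in pams if p is not None)


def depletion_ratios(treated, control, pseudocount=1.0):
    """Treated/control frequency ratio per PAM; values well below 1 suggest cleavage."""
    pams = set(treated) | set(control)
    t_total = sum(treated.values()) + pseudocount * len(pams)
    c_total = sum(control.values()) + pseudocount * len(pams)
    return {
        p: ((treated[p] + pseudocount) / t_total) / ((control[p] + pseudocount) / c_total)
        for p in pams
    }


def toy_read(n8):
    """Build a synthetic library read around a given 8-nt randomized region."""
    return LEFT_FLANK + n8 + "GAGAAGTCATTCAATAAGGCCACT"


# Synthetic data only: TTTA is depleted after treatment, GGGG is not.
control = count_pams([toy_read("ACGTTTTA")] * 10 + [toy_read("ACGTGGGG")] * 10)
treated = count_pams([toy_read("ACGTTTTA")] * 1 + [toy_read("ACGTGGGG")] * 10)
ratios = depletion_ratios(treated, control)
print(sorted(ratios.items(), key=lambda kv: kv[1]))
```

Because only uncleaved molecules are amplified in this assay, a PAM that supports cleavage appears under-represented in the treated sample. HT-PAMDA itself fits depletion over several time points, which this single-point ratio does not attempt.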

As shown in FIG.10, GEBx0033 shows a preference for the 5’-TTTV-3’ PAM, wherein V is A, C or G.

EXAMPLE 5: In vitro cleavage efficiency

Target DNAs containing a protospacer sequence (5’-agaagtcattcaataaggccac-3’, SEQ ID NO: 195) and a PAM sequence (5’-TTTV-3’, where V is A, C or G) are constructed by DNA synthesis. A single representative PAM is chosen for testing when the PAM has degenerate bases. The target DNAs comprise 515 bp of linear DNA derived from a plasmid via PCR amplification, with a PAM and protospacer located 700 bp from one end. Successful cleavage results in fragments of ~200 and ~300 bp. The target DNA, in vitro transcribed single RNA, and purified recombinant protein are combined in a cleavage buffer (NEBuffer 2.1) with an excess of protein and RNA and are incubated for 5 minutes to 3 hours, usually 1 hour. The reaction is stopped via addition of RNase A and incubation at 60 minutes. The reaction is then resolved on a 2% TAE agarose gel and the fraction of cleaved target DNA is quantified in ImageLab software.

The cleavage efficiency is represented by the cutting ratio, which is calculated from gray value analysis using the formula: cutting ratio (%) = 100 × (1 − sqrt(1 − (b + c)/(a + b + c))), where “a” represents the gray value of the uncut band, “b” and “c” respectively represent the gray values of the two shorter fragments produced by cutting, and “sqrt” denotes the square root. In this application, the cutting ratio is also called the cleavage ratio.
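As a worked illustration of the cutting ratio formula, the sketch below evaluates it for a set of band gray values. The function name and the example gray values are hypothetical and are not measurements from this disclosure.

```python
import math


def cutting_ratio(a: float, b: float, c: float) -> float:
    """Cutting ratio (%) = 100 * (1 - sqrt(1 - (b + c) / (a + b + c))).

    a: gray value of the uncut band
    b, c: gray values of the two cleavage fragments
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("total gray value must be positive")
    cleaved_fraction = (b + c) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - cleaved_fraction))


# Hypothetical gray values: 75% of the signal lies in the two cleaved bands.
print(cutting_ratio(25.0, 40.0, 35.0))  # 1 - sqrt(0.25) = 0.5, i.e. 50.0
```

With no cleaved signal (b = c = 0) the ratio is 0%, and with no uncut signal (a = 0) it is 100%.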

As shown in Table 7, for in vitro dsDNA cleavage, the crRNA for GEBx0013, GEBx0018 and GEBx0033 is shown as SEQ ID NO: 90; the crRNA for GEBx0032 is shown as SEQ ID NO: 91; and the crRNA for GEBx0037 is shown as SEQ ID NO: 92.

For dsDNA cleavage, 0.417 pmol substrate (515 bp DNA) was incubated with 4.17 pmol Cas12 protein and 5 pmol crRNA in 1x reaction buffer (Novoprotein) at 37°C for 120 min. The reaction volume for dsDNA cleavage was 20 µl. The reaction was then quenched with 1 µl Proteinase K (20 mg/ml) at 50°C for 10 min. The cleavage products were mixed with 6x gel loading dye (NEB) and analyzed by 2% agarose gel electrophoresis, and the fraction of cleaved target DNA was quantified in ImageLab software. The PAM sequence was TTTC. The nucleotide sequence of the 515 bp DNA (dsDNA) is shown as SEQ ID NO: 93. The result of electrophoresis is shown in FIG.11A-FIG.11C.

Each group of experiments was carried out three times in parallel and the results were analyzed using GraphPad Prism 9. The result is shown in FIG.12.

Table 7 The nucleotide sequences

As shown in FIG.11A, GEBx0033 exhibited a high cleavage efficiency of over 85%.

As shown in FIG.11B, GEBx0032 and GEBx0037 exhibited cleavage activity on the dsDNA, and the cleavage efficiency of GEBx0032 is over 85%.

As shown in FIG.11C and FIG.12, GEBx0013 and GEBx0018 exhibited cleavage activity on the dsDNA, with cleavage efficiencies of 87% and 67%, respectively.

EXAMPLE 6: GFP reporter assay

GFP reporter plasmids (pmax-EGFP) containing a target DNA sequence with spacer sequences and potential PAM sequences (determined e.g., as in Example 4) are constructed by DNA synthesis and cloning. A single representative PAM is chosen for testing when the PAM has degenerate bases. The target site is located in the EF1a promoter region, which drives GFP expression. GFP fluorescence is measured with an Infinite M200 Plate Reader (Tecan) using excitation and emission wavelengths of 488 nm and 533 nm, respectively. The reactions are incubated for 16 hours at 29°C and the resulting fluorescence data are analyzed using end-point and time-course analyses. The reported production of GFP is calculated using a linear standard calibration curve developed from recombinant GFP. For the plate reader used in our experiments, the raw fluorescence values were divided by the conversion factor 9212.61/pmol.
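The conversion described above amounts to a single division. The sketch below assumes that the factor 9212.61/pmol means raw fluorescence units per pmol of GFP for this particular plate reader; the function name and the optional blank subtraction are additions of this sketch and are not stated in the protocol.

```python
# Assumed meaning: raw fluorescence units per pmol GFP (instrument-specific).
CONVERSION_FACTOR_PER_PMOL = 9212.61


def fluorescence_to_pmol_gfp(raw_fluorescence: float, blank: float = 0.0) -> float:
    """Convert a raw plate-reader value to pmol GFP via the linear calibration.

    blank is a hypothetical background reading; it defaults to 0 so that the
    function reduces to the plain division described in the text.
    """
    return (raw_fluorescence - blank) / CONVERSION_FACTOR_PER_PMOL


# Illustrative reading only, not a measured value.
print(round(fluorescence_to_pmol_gfp(46063.05), 3))
```

The factor applies only to the instrument and settings it was calibrated on, so a different reader or gain setting would need its own standard curve.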

EXAMPLE 7: PAM determination in mammalian cell line

In a set of experiments, HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™). For reverse transfection, a volume of 450 µL of cells at a density of 100,000 cells/well was mixed with a 50 µL mixture containing Lipofectamine™ 3000 (ThermoFisher Scientific, Cat. L3000008), Opti-Mem (to a final volume of 50 µL), 1 µL dsODN (10 µM), 100 ng (~1 µL) pgRNA (SEQ ID NO:94 + SEQ ID NO:96 or SEQ ID NO:95 + SEQ ID NO:96) harboring the Humanspacer3 spacer (GEBx0070, GEBx0013, GEBx0018, GEBx0032, GEBx0063, GEBx0064) or the 5854 spacer (GEBx0047), and 400 ng (~1 µL) pCasX plasmid harboring the Cas12 protein CDS (with NLS and FLAG), per the manufacturer’s protocol; the results are shown in FIG.14A-FIG.14G. The cell mixture was then seeded onto a 24-well plate and cultured at 37°C and 5% CO2. Before transfection, the 10 µM dsODN was prepared by annealing the dsODN-Top and dsODN-BoT oligonucleotides.

72 hours post-transfection, the supernatant was removed and the cell layer was washed with PBS. The genomic DNA was then extracted from each well of the 24-well plate using DNA Extraction solution (Denogen (Beijing) Bio Sci & Tech Co. Ltd, Cat. DNS033-48) per the manufacturer’s protocol. All DNA samples (500 ng, 260/280 value: 1.8-2.0) were subjected to Guide-Seq NGS analyses.

The basic method of Guide-Seq library preparation is described by Nikolay et al. (Nat. Protoc. 2021). The extracted DNA samples were first sheared using the KAPA Frag Kit (Cat# KK8602, Roche). Fragmented DNA was purified and then phosphorylated using T4 Polynucleotide Kinase (Cat#M0201S, NEB). An SS5-adapter (generated by annealing 10 µM SS5TOP oligo with 10 µM SS5BTM oligo) was ligated to the fragmented DNA using the Quick Ligation™ Kit (Cat#M2200S, NEB), followed by two rounds of off-target PCR to add chemistry for sequencing.

Off-target PCR1 was performed using Platinum™ Taq DNA Polymerase (Cat#15966005, Invitrogen) with GSP1 (a mixture of GSP1-Top and GSP1-BoT) and Y_XX oligos. Off-target PCR2 was performed using Platinum™ Taq DNA Polymerase with GSP2 (a mixture of GSP2-TopA/B/C and GSP1-BoTA/B/C), Y_XX (the same as in PCR1) and i753_XX oligos. The DNA product of each step described above was purified using SPRI Select (Cat#B23318, Beckman Coulter). The final library was quantified by qPCR and sequenced on an Illumina NextSeq 1000. The reads were aligned to a reference genome after eliminating those with low quality scores. The Q30 rate was more than 0.9. The read length was between 130 bp and 140 bp. The resulting files containing the reads were mapped to the reference genome (BAM files), where reads that overlapped the target region of interest were selected.

Table 8 The nucleotide sequences referred above

Note: p: phosphorylation modification; *: phosphorothioate (PS) bond or PS linkage.

A PS bond or linkage refers to a bond in which a sulfur is substituted for one non-bridging phosphate oxygen in a phosphodiester linkage, for example in the bonds between nucleotide bases.

In this example, the PAM preferences of the Cas12 proteins GEBx0013, GEBx0018, GEBx0032, GEBx0047, GEBx0063, GEBx0064, and GEBx0070 were tested. The nucleic acid sequences (human codon-optimized sequences), alone and with NLS and FLAG, of GEBx0013, GEBx0047, GEBx0063, GEBx0064, and GEBx0070 are shown in Table 9. The nucleic acid sequences of GEBx0047, GEBx0063, GEBx0064, and GEBx0070 used in PAM determination are set forth in SEQ ID NOs: 112-115. The nucleic acid sequences of GEBx0013, GEBx0018, and GEBx0032 used in PAM determination are set forth in SEQ ID NOs: 83-85.

Table 9 The nucleic acid sequences of Casl2 proteins

The PAM preferences of GEBx0013, GEBx0018, GEBx0032, GEBx0047 and GEBx0070 in the HEK293 cell line are shown in FIG.14A, FIG.14C, FIG.14E, FIG.14F, and FIG.14G. As shown in FIG.14A, GEBx0047 recognizes a PAM having a sequence TYYN (Y is T or C, N is A, T, C, or G). As shown in FIG.14C, GEBx0070 recognizes a PAM having a sequence TTTD (D is A, G, or T). As shown in FIG.14E, GEBx0013 recognizes a PAM having a sequence TTTR (R is A or G). As shown in FIG.14F, GEBx0018 recognizes a PAM having a sequence TTTR (R is A or G).
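The degenerate PAM notation used here (TYYN, TTTD, TTTR, and TTTV in Example 4) follows the standard IUPAC nucleotide ambiguity codes. A minimal sketch for checking whether a concrete 4-nt site matches such a pattern is given below; the function name is hypothetical and only the codes that appear in this disclosure are included.

```python
# Subset of the IUPAC nucleotide ambiguity codes used in this disclosure.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "D": "AGT",   # not C
    "V": "ACG",   # not T
    "N": "ACGT",  # any base
}


def matches_pam(site: str, pattern: str) -> bool:
    """Return True if every base of site is allowed by the degenerate pattern."""
    if len(site) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(site.upper(), pattern.upper()))


# PAM patterns as reported for each protein in the text.
for name, pattern in [("GEBx0047", "TYYN"), ("GEBx0070", "TTTD"), ("GEBx0013", "TTTR")]:
    print(name, pattern, matches_pam("TTTG", pattern))
```

For example, TTTG satisfies all three patterns above, whereas TTTC satisfies TYYN and TTTV but not TTTD or TTTR.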

The statistical curves in FIG.14B and FIG.14D describe the fidelity of the Cas12 proteins, and FIG.14E, FIG.14F, and FIG.14G also contain statistical curves, where ‘perfect match’ (PM) denotes sequences with 0 mismatches and ‘mismatch’ (MM) denotes sequences with one or more mismatches. A PM curve with a steeper slope indicates high fidelity of the Cas12 protein, as more perfectly matched reads are aligned to fewer sites. In contrast, an MM curve with a steeper slope and longer tail indicates lower fidelity, as more reads with one or more mismatches are mapped to a greater variety of sites. These figures show that GEBx0013, GEBx0018, GEBx0032, GEBx0047, and GEBx0070 all have high fidelity.

EXAMPLE 8: Testing of Genome Cleavage Activity of the CRISPR-Cas12 Complexes in Mammalian Cells

1. pgRNA and pCasX plasmid Transfection

In a set of experiments, HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™).

For forward transfection, cells were counted and plated (450 µL) on 24-well plates at a density of 150,000 cells/well 24 hours prior to transfection. Cells were co-transfected with a total volume of 50 µL lipoplex mixture containing pCasX plasmid (~1 µL, 400 ng), pgRNA plasmid (~1 µL, 100 ng), Lipofectamine™ 2000 (1 µL, Thermo Fisher Scientific, Cat. 11668019) and Opti-Mem, and the cells were then cultured at 37°C and 5% CO2.

The pgRNA plasmid and pCasX plasmid are shown in FIG.15. In the pCasX plasmid, the nucleotide sequences (SEQ ID NO: 111-115) encoding GEBx0013/0047/0063/0064/0070 are codon-optimized for expression in mammalian cells. These nucleotide sequences further comprise 3’ and 5’ nuclear localization signals (NLSs) and FLAG-tag sequences, and are shown in Table 8 (SEQ ID NO: 116-120; lowercase letters represent the NLSs and FLAG-tag sequences). In the pgRNA plasmid, the crRNA portion, including the direct repeat sequence and the spacer sequence (MYOD1, VEGFA, IL1RN, and DNMT1-1, SEQ ID NO: 121-124, based on the PAM 5’-TTTG-3’), is shown in Table 10. For GEBx0013/0047/0063/0064/0070, the direct repeat sequence (DR) is the same and is set forth in SEQ ID NO: 96.

Table 10 The direct repeat sequences and the spacer sequences

2. Genomic DNA isolation

72 hours post-transfection, the supernatant was removed and the cell layer was washed with PBS. The genomic DNA was then extracted from each well of the 24-well plate using DNA Extraction solution (Denogen (Beijing) Bio Sci & Tech Co. Ltd, Cat. DNS033-48) following the manufacturer’s protocol. All DNA samples (500 ng, 260/280 value: 1.8-2.0) were subjected to amplicon NGS analyses.

3. Next-generation sequencing (NGS) analysis

To quantitatively determine the efficiency of editing at the target location in the genome, NGS was utilized to identify the presence of insertions and deletions introduced by gene editing. Primers for NGS flanking the target area within the MYOD1/VEGFA/IL1RN/DNMT1 genes were designed. Additional PCR was performed following the manufacturer’s protocols (Illumina) to add chemistry for sequencing. The amplicons were sequenced on an Illumina iSeq 100 instrument. The reads were aligned to a reference genome after eliminating those having low quality scores. The Q30 rate was more than 0.9. The read length was between 130 bp and 140 bp. The resulting files containing the reads were mapped to the reference genome (BAM files), where reads that overlapped the target region of interest were selected and the number of wild-type reads versus the number of reads containing an insertion, substitution, or deletion was calculated. The number of reads mapped to the reference genome was more than 1000.

The editing efficiency (e.g., the “editing percentage” or “percent editing” or “indel frequency”) is defined as the total number of sequence reads with insertions/deletions (“indels”) or substitutions over the total number of sequence reads, including wild type.
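The read-count thresholds and the efficiency definition above can be illustrated with a minimal sketch. The class and function names are hypothetical, the counts are made up for illustration, and the sketch assumes that the "more than 1000 mapped reads" threshold applies to the reads selected at one target region.

```python
from dataclasses import dataclass


@dataclass
class AmpliconCounts:
    """Read counts for one target amplicon (all field names are illustrative)."""
    wild_type: int    # reads matching the reference across the target region
    modified: int     # reads with an insertion, deletion, or substitution
    q30_rate: float   # fraction of bases with quality >= Q30

    @property
    def total(self) -> int:
        return self.wild_type + self.modified


def passes_qc(c: AmpliconCounts, min_reads: int = 1000, min_q30: float = 0.9) -> bool:
    """Apply the thresholds stated in the text: >1000 reads and Q30 rate > 0.9."""
    return c.total > min_reads and c.q30_rate > min_q30


def editing_efficiency(c: AmpliconCounts) -> float:
    """Percent of modified reads out of all reads, wild type included."""
    if c.total == 0:
        raise ValueError("no reads mapped to the target region")
    return 100.0 * c.modified / c.total


# Made-up counts for illustration only; not data from the disclosure.
sample = AmpliconCounts(wild_type=9000, modified=1000, q30_rate=0.93)
if passes_qc(sample):
    print(f"editing efficiency: {editing_efficiency(sample):.2f}%")
```

Keeping the quality check separate from the ratio makes it explicit that a sample failing either threshold is excluded rather than reported with a low-confidence percentage.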

In one in vitro experiment, 5 new Cas12 proteins (GEBx0013/0047/0063/0064/0070) were tested on MYOD1/VEGFA/IL1RN/DNMT1 targets in HEK293T cells using forward transfection. The results are shown in FIG.16A, FIG.17, FIG.18, and FIG.19. These FIGs show that GEBx0047 has the highest indel frequency on the MYOD1 target, while GEBx0064 shows the highest indel frequency on the VEGFA and IL1RN targets. Additionally, GEBx0047/63/64/70 show comparable editing efficiency on the DNMT1-1 target.

EXAMPLE 9: Testing of Genome Cleavage Activity of the CRISPR-Cas12 Complexes in Mammalian Cells

In a set of experiments, HEK293T cells were cultured in DMEM media supplemented with 10% fetal bovine serum (Gibco™). For lipoplex transfection, a volume of 200 µL of cells at a density of 50,000 cells/well was seeded 24 hours pre-transfection. Cells were transfected with a lipoplex containing Lipofectamine™ 3000 (0.4 µL/well), P3000 (2 µL/well), pgRNA/pCasX plasmid (125 ng/well and 375 ng/well, respectively) and Opti-MEM up to 25 µL/well per the manufacturer’s protocol. Plated cells were allowed to settle and adhere for 72 hours in a tissue culture incubator at 37°C in a 5% CO2 atmosphere.

72 hours post-transfection, the supernatant was removed and the cell layer was washed with PBS. The genomic DNA was then extracted from each well of a 24-well plate using DNA Extraction solution (Denogen (Beijing) Bio Sci & Tech Co. Ltd, Cat. DNS033-48) per the manufacturer’s protocol. All DNA samples (500 ng, 260/280 value: 1.8-2.0) were subjected to amplicon NGS analysis to quantitatively determine the efficiency of editing at the target location in the genome.

For NGS, 50 ng of total genomic DNA was used as input for two-step PCR using the KAPA HiFi HotStart ReadyMix Kit (Roche). First-step PCR (PCR 1) yielded a ~200 bp product, followed by indexing PCR (PCR 2) yielding final fragments flanked by the Illumina sequencing barcodes for subsequent sequencing on a NextSeq or iSeq instrument (Illumina, San Diego, CA, USA). PCR 1 reactions were carried out as follows: 98°C for 5 min, then 20 cycles of [98°C for 20 sec; 60°C for 20 sec; 72°C for 20 sec], followed by a final extension at 72°C for 3 min. The indexing PCR 2 reactions were carried out as follows: 98°C for 5 min, then 15 cycles of [98°C for 20 sec; 62°C for 20 sec; 72°C for 20 sec], followed by a final extension at 72°C for 3 min. PCR 2 products were purified with SPRI beads and quantified with the VAHTS Library Quantification Kit for Illumina (Vazyme, Cat. NQ101) on a StepOnePlus Real-Time PCR System (Thermo Fisher Scientific). The amplicons were sequenced on an Illumina iSeq 100 or NextSeq instrument. The reads were aligned to a reference genome after eliminating those having low quality scores. The Q30 rate was more than 0.9. The read length was between 130 bp and 140 bp. The resulting files containing the reads were mapped to the reference genome (BAM files), where reads that overlapped the target region of interest were selected and the number of wild-type reads versus the number of reads containing an insertion, substitution, or deletion was calculated. The number of reads mapped to the reference genome was more than 1000.
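As a hedged illustration, the two thermocycling programs described above can be written down as plain data, which makes the differences between PCR 1 and PCR 2 (cycle number and annealing temperature) easy to compare. The `PcrProgram` class and its field names are hypothetical. The computed duration counts programmed hold times only and ignores instrument ramp time.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[int, int]  # (temperature in degrees C, hold time in seconds)


@dataclass
class PcrProgram:
    """One thermocycler program (illustrative structure, not an instrument format)."""
    name: str
    initial_denaturation: Step
    cycles: int
    cycle_steps: List[Step]
    final_extension: Step

    def hold_seconds(self) -> int:
        """Sum of programmed hold times, excluding ramping between temperatures."""
        per_cycle = sum(seconds for _, seconds in self.cycle_steps)
        return (self.initial_denaturation[1]
                + self.cycles * per_cycle
                + self.final_extension[1])


# Values transcribed from the protocol text above.
PCR1 = PcrProgram("PCR 1", (98, 300), 20, [(98, 20), (60, 20), (72, 20)], (72, 180))
PCR2 = PcrProgram("PCR 2 (indexing)", (98, 300), 15, [(98, 20), (62, 20), (72, 20)], (72, 180))

for program in (PCR1, PCR2):
    print(f"{program.name}: {program.hold_seconds() / 60:.0f} min of programmed holds")
```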

For indel frequency determination, qualified reads were mapped to the reference amplicon sequence using CRISPResso2 with default parameters, and reads not spanning the corresponding spacer regions were then filtered out. The remaining reads were used to estimate the desired and undesired insertions and deletions occurring across the whole spacer region (cleavage offset=-10 and window=10). Total editing frequency was calculated as: [count of reads with any insertions or deletions] divided by [count of total reads]. Out-of-frame frequency was calculated as: [count of reads with insertions or deletions of a length indivisible by 3] divided by [count of edited reads].
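The filtering and the two frequencies can be sketched as follows, taking total editing frequency as edited reads over all spanning reads and out-of-frame frequency as out-of-frame reads over edited reads. This is not CRISPResso2 code: the `Read` structure is hypothetical, and the sketch makes the simplifying assumption that each read is summarized by a single net indel size within the quantification window.

```python
from typing import Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    """Minimal per-read summary (hypothetical structure, not a CRISPResso2 type)."""
    start: int       # 0-based start of the alignment on the amplicon
    end: int         # 0-based exclusive end of the alignment on the amplicon
    indel_size: int  # net inserted (+) or deleted (-) bases; 0 means unedited


def indel_frequencies(reads: Iterable[Read],
                      spacer_start: int,
                      spacer_end: int) -> Tuple[float, float]:
    """Return (total editing frequency, out-of-frame frequency) as fractions."""
    # Keep only reads that span the whole spacer region.
    spanning = [r for r in reads if r.start <= spacer_start and r.end >= spacer_end]
    edited = [r for r in spanning if r.indel_size != 0]
    # An indel shifts the reading frame when its length is not a multiple of 3.
    out_of_frame = [r for r in edited if r.indel_size % 3 != 0]

    total_freq = len(edited) / len(spanning) if spanning else 0.0
    oof_freq = len(out_of_frame) / len(edited) if edited else 0.0
    return total_freq, oof_freq


# Made-up reads for illustration only; not data from the disclosure.
reads = (
    [Read(0, 140, 0)] * 6
    + [Read(0, 140, -1), Read(0, 140, -3), Read(0, 140, 2), Read(0, 140, -7)]
    + [Read(60, 140, -1)]  # does not span the spacer, so it is filtered out
)
total_freq, oof_freq = indel_frequencies(reads, spacer_start=50, spacer_end=75)
print(f"total editing: {total_freq:.0%}, out-of-frame among edited: {oof_freq:.0%}")
```

In this toy input, 10 reads span the spacer, 4 of them are edited, and 3 of those 4 carry an indel whose length is not a multiple of 3.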

Table 11A The direct repeat sequences and the spacer sequences

GEBx0063 and GEBx0064 were tested on the MYOD1-TTTG-T1 target in the HEK293T cell line. pCasX plasmids harboring the GEBx0063 or GEBx0064 CDS were co-transfected with pgRNA plasmids harboring MYOD1-TTTG-T1 spacers of different lengths (17nt - 25nt, SEQ ID NO: 184-188, Table 11A). The result, shown in FIG.16B, demonstrates that a MYOD1-TTTG-T1 spacer length of about 21nt-25nt is suitable, and that a more suitable spacer length for GEBx0064 is 25nt.

Table 11B The direct repeat sequences and the spacer sequences

FIG.20A-FIG.23B indicate the human cell genome editing efficiency of GEBx0047/0063/0064/0070 at 22 additional loci. The DR sequences and the spacers in the pgRNA plasmids referred to are shown in Table 11B. GEBx0047 shows the highest editing efficiency on the RNF2-TTTG-T1 locus (3.98%, FIG.20A and FIG.20B); GEBx0063 shows more than 10% gene editing efficiency at five loci and has the highest efficiency on the TTR-TTTG-T3 locus (14.77%, FIG.21A and FIG.21B); GEBx0064 shows more than 10% gene editing efficiency at three loci and has the highest efficiency on the TTR-TTTG-T3 locus (11.78%, FIG.22A and FIG.22B); GEBx0070 shows more than 5% gene editing efficiency at eight loci and has the highest efficiency on the HBB-TTTG-T2 locus (7.41%, FIG.23A and FIG.23B).

The nucleotide sequences of GEBx0047/0063/0064/0070 used in this example are shown in Table 9.

The amino acid sequences of Cas12 proteins with NLSs (SEQ ID NO: 183 and SEQ ID NO: 80) and FLAG sequence (SEQ ID NO: 81) are shown in Table 12 (GEBx0013, GEBx0047, GEBx0063, GEBx0064, and GEBx0070, SEQ ID NOs: 148-152), corresponding to SEQ ID NOs: 116-120.

Table 12 The complete amino acid sequences of Cas12 proteins

EXAMPLE 10: About sgRNA

The disclosure also provides an engineered, non-naturally occurring direct repeat sequence, wherein the direct repeat sequence comprises a nucleotide sequence having at least 90% sequence identity to any one of SEQ ID NOs: 153-156, or a variant thereof. In some embodiments, the direct repeat sequence has at least 95% or 98% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156.

In another aspect, the disclosure provides an engineered, non-naturally occurring spacer sequence, wherein the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

The disclosure also provides an engineered, non-naturally occurring sgRNA wherein the sgRNA comprises, in a tandem arrangement:

I. a direct repeat sequence;

II. a spacer sequence, which is capable of hybridizing to a sequence of the target nucleic acid to be manipulated; wherein the direct repeat sequence has at least 90% sequence identity to any one of SEQ ID NOs: 153-156, and the spacer sequence has at least 90% sequence identity to any one of SEQ ID NOs: 157-182 (Table 13).

In some embodiments, the tandem arrangement of the direct repeat sequence and spacer sequence is in a 5’ to 3’ orientation. In some embodiments, the direct repeat sequence has at least 90% or 95% sequence identity to SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the direct repeat sequence is set forth in SEQ ID NO: 153 or SEQ ID NO: 156. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NOs: 157-181. In some embodiments, the spacer sequence has at least 95% sequence identity to any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182. In some embodiments, the spacer sequence is set forth in any one of SEQ ID NO: 157, SEQ ID NO: 164, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 171, and SEQ ID NOs: 175-182.

For example, in some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 157. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 164. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 168. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 169. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 171. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 175. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 176. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 177. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 178. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 179. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 180. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 181. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 182. In some embodiments, sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 157. In some embodiments, sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 158. In some embodiments, sgRNA comprises SEQ ID NO: 156 and SEQ ID NO: 159. In some embodiments, sgRNA comprises SEQ ID NO: 153 and SEQ ID NO: 160.
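The tandem 5’ to 3’ arrangement of a direct repeat followed by a spacer, together with a sequence identity threshold, can be sketched as below. All sequences are placeholders and are NOT any SEQ ID NO of the disclosure. The function names are hypothetical. The identity measure is a simplified, ungapped, position-by-position comparison of equal-length sequences, whereas sequence identity is normally computed from an alignment.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity for equal-length sequences (a simplifying assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("this simplified measure needs non-empty, equal-length sequences")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def assemble_sgrna(direct_repeat: str, spacer: str) -> str:
    """Join direct repeat and spacer 5'-to-3' (repeat first) and write it as RNA."""
    return (direct_repeat + spacer).upper().replace("T", "U")


# Placeholder sequences for illustration only.
REFERENCE_DR = "GGAUCCAUGCAUGCAUGCAU"   # 20 nt stand-in for a reference direct repeat
CANDIDATE_DR = "GGAUCCAUGCAUGCAUGCAC"   # differs from the reference at one position
SPACER = "ACGUACGUACGUACGUACGUA"        # 21 nt stand-in for a spacer

identity = percent_identity(REFERENCE_DR, CANDIDATE_DR)
if identity >= 90.0:
    sgrna = assemble_sgrna(CANDIDATE_DR, SPACER)
    print(f"direct repeat identity {identity:.1f}%; sgRNA length {len(sgrna)} nt")
```

Under this simplified measure, a 20 nt direct repeat tolerates at most two substitutions at a 90% threshold and one at 95%.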

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.