CAS9 VARIANTS HAVING NON-CANONICAL PAM SPECIFICITIES AND USES THEREOF

Document Type and Number:

WIPO Patent Application WO/2020/041751

Kind Code:

Abstract:

Some aspects of this disclosure provide strategies, systems, reagents, methods, and kits that are useful for engineering Cas9 and Cas9 variants that have increased activity on target sequences that do not contain the canonical PAM sequence. In some embodiments, fusion proteins comprising such Cas9 variants and nucleic acid editing domains, e.g., deaminase domains, are also provided.
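The abstract turns on whether a target sequence is followed by the canonical PAM. The Python below is an illustrative sketch only. Its helper names and its 20-nt protospacer default are assumptions and do not come from the application. It reads the three bases directly 3ʹ of a protospacer on the non-target strand and tests them against IUPAC patterns. These include NGG (the canonical S. pyogenes Cas9 PAM) and NAA, NAC, and NAT, the PAM groups recited in claims 14, 30, and 46. The sketch also enumerates all 64 trinucleotides. The 60 that do not match NGG are exactly the PAMs listed in claims 49 and 76.

```python
from itertools import product

# IUPAC nucleotide codes used in this sketch; "N" matches any of A, C, G, T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def matches(pam: str, pattern: str) -> bool:
    """True if each base of `pam` is allowed by the IUPAC code at the same position of `pattern`."""
    return len(pam) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper())
    )


def pam_of(site: str, protospacer_len: int = 20, pam_len: int = 3) -> str:
    """Bases directly 3' of the protospacer, read 5'->3' on the non-target strand.

    The 20-nt protospacer default is an assumption made for this sketch.
    """
    return site[protospacer_len:protospacer_len + pam_len]


def is_canonical(site: str, protospacer_len: int = 20) -> bool:
    """True if the protospacer is followed by the canonical PAM, 5'-NGG-3'."""
    return matches(pam_of(site, protospacer_len), "NGG")


# All 64 trinucleotides, and the 60 that do not match NGG.
ALL_TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
NON_CANONICAL = [t for t in ALL_TRINUCLEOTIDES if not matches(t, "NGG")]
```

For a hypothetical site `"GACGTTACGATCGATCGTAC" + "CAA"`, `pam_of` returns `"CAA"`. `is_canonical` returns `False` for that site, while `matches("CAA", "NAA")` is `True`. Listing the non-canonical PAMs as the complement of NGG avoids hand-typing the 60 entries.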

Inventors:

LIU DAVID (US)
WANG TINA (US)
MILLER SHANNON (US)

Application Number:

PCT/US2019/047996

Publication Date:

February 27, 2020

Filing Date:

August 23, 2019

Assignee:

BROAD INST INC (US)
HARVARD COLLEGE (US)

International Classes:

C12N9/00; C12N9/22; C12N9/24; C12N15/62

Foreign References:

US20170121693A1	2017-05-04
US20160340662A1	2016-11-24
US20180073012A1	2018-03-15
CN107177625A	2017-09-19

Other References:

NISHIMASU ET AL.: "Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA", CELL, vol. 156, no. 5, 27 February 2014 (2014-02-27), pages 935 - 949, XP028667665, DOI: 10.1016/j.cell.2014.02.001
See also references of EP 3841203A4

Attorney, Agent or Firm:

MCCOOL, Gabriel, J. et al. (US)

Claims:

CLAIMS

What is claimed is:

1. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by any one of SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2.

2. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

3. The Cas9 protein of claim 1 or 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, V1139A, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

4. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 427, 589, 599, 614, 630, 631, 693, 710, 743, 753, 757, 758, 762, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1151, 1180, 1188, 1211, 1221, 1223, 1274, 1290, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2.

5. The Cas9 protein of claim 4, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X427G, X589S, X599R, X614N, X630K, X631A, X693L, X710E, X743I, X753G, X757K, X758H, X762G, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1151E, X1180G, X1188R, X1211R, X1221H, X1223S, X1274R, X1290G, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

6. The Cas9 protein of claim 4 or 5, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, E427G, A589S, K599R, D614N, E630K, M631A, F693L, K710E, V743I, R753G, E757K, N758H, E762G, Q768H, N803S, R859S, D861N, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, K1151E, D1180G, K1188R, K1211R, Q1221H, G1223S, S1274R, V1290G, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

7. The Cas9 protein of any one of claims 1-6, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1, or a combination of conservative mutations thereto.

8. The Cas9 protein of any one of claims 1-7, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.

9. The Cas9 protein of any one of claims 1-8, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2, or a combination of conservative mutations thereto.

10. The Cas9 protein of any one of claims 1-9, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2.

11. The Cas9 protein of any one of claims 1-10 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

12. The Cas9 protein of any one of claims 1-11, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

13. The Cas9 protein of any one of claims 1-12, wherein the Cas9 protein exhibits an activity on a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

14. The Cas9 protein of claim 12 or 13, wherein the 3ʹ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.

15. The Cas9 protein of any one of claims 12-14, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

16. The Cas9 protein of any one of claims 1-15, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2.

17. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2.

18. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

19. The Cas9 protein of claim 17 or 18, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

20. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by any one of SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 647, 652, 653, 654, 654, 654, 670, 676, 687, 703, 710, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 890, 922, 948, 959, 990, 995, 1014, 1015, 1016, 1016, 1021, 1030, 1036, 1036, 1055, 1057, 1057, 1114, 1127, 1135, 1156, 1156, 1177, 1180, 1184, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1332, 1335, 1338, 1348, 1349, 1367, 1367, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2.

21. The Cas9 protein of claim 20, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X676G, X687R, X703P, X710E, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1338T, X1348V, X1349R, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

22. The Cas9 protein of claim 20 or 21, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, G676G, G687R, T703P, K710E, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, S1338T, I1348V, S1349R, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

23. The Cas9 protein of any one of claims 17-22, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2, or a combination of conservative mutations thereto.

24. The Cas9 protein of any one of claims 17-23, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.

25. The Cas9 protein of any one of claims 17-24, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn), or a combination of conservative mutations thereto.

26. The Cas9 protein of any one of claims 17-25, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn).

27. The Cas9 protein of any one of claims 17-26 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

28. The Cas9 protein of any one of claims 17-27, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

29. The Cas9 protein of any one of claims 17-28, wherein the Cas9 protein exhibits an activity on a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

30. The Cas9 protein of claim 28 or 29, wherein the 3ʹ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.

31. The Cas9 protein of any one of claims 28-30, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

32. The Cas9 protein of any one of claims 17-31, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2.

33. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2.

34. The Cas9 protein of claim 33, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

35. The Cas9 protein of claim 33 or 34, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

36. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1221, 1249, 1253, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2.

37. The Cas9 protein of claim 36, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1221H, X1249S, X1253K, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NO: 2, wherein X represents any amino acid.

38. The Cas9 protein of claim 36 or 37, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, Q1221H, P1249S, E1253K, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

39. The Cas9 protein of any one of claims 33-38, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3, or a combination of conservative mutations thereto.

40. The Cas9 protein of any one of claims 33-39, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.

41. The Cas9 protein of any one of claims 33-40, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1, or a combination of conservative mutations thereto.

42. The Cas9 protein of any one of claims 33-41, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1.

43. The Cas9 protein of any one of claims 33-42 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

44. The Cas9 protein of any one of claims 33-43, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

45. The Cas9 protein of any one of claims 33-44, wherein the Cas9 protein exhibits an activity on a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

46. The Cas9 protein of claim 44 or 45, wherein the 3ʹ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.

47. The Cas9 protein of any one of claims 44-46, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

48. The Cas9 protein of any one of claims 33-47, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2 or a corresponding mutation, or mutations, in another Cas9 amino acid sequence.

49. The Cas9 protein of any one of claims 1-48, wherein the Cas9 exhibits an increased activity on a target sequence comprising a PAM sequence selected from the group consisting of AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, and TTT at its 3ʹ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

50. The Cas9 protein of any one of claims 1-49, wherein the Cas9 protein exhibits lower off-target activity as compared to an off-target activity of the Streptococcus pyogenes Cas9 domain as provided by SEQ ID NO: 2.

51. A fusion protein comprising (i) the Cas9 protein of any one of claims 1-50, and (ii) an effector domain.

52. The fusion protein of claim 51, wherein the effector domain is a domain that comprises nuclease activity, nickase activity, recombinase activity, deaminase activity, methyltransferase activity, methylase activity, acetylase activity, acetyltransferase activity, transcriptional activation activity, or transcriptional repression activity.

53. The fusion protein of claim 51 or 52, wherein the effector domain is a nucleic acid editing domain.

54. The fusion protein of claim 53, wherein the nucleic acid editing domain comprises a deaminase domain.

55. The fusion protein of claim 54, wherein the deaminase domain is a cytidine deaminase domain.

56. The fusion protein of claim 55, wherein the cytidine deaminase domain is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase.

57. The fusion protein of claim 55 or 56, wherein the cytidine deaminase domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the cytidine deaminase domain of any one of SEQ ID NOs: 27-61.

58. The fusion protein of claim 55 or 56, wherein the cytidine deaminase domain comprises the amino acid sequence of any one of SEQ ID NOs: 27-61.

59. The fusion protein of any one of claims 51-58, wherein the fusion protein further comprises a uracil glycosylase inhibitor (UGI) domain.

60. The fusion protein of claim 59, wherein the UGI domain comprises the amino acid sequence of SEQ ID NO: 115.

61. The fusion protein of any one of claims 51-60, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 122.

62. The fusion protein of any one of claims 51-60, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 123.

63. The fusion protein of any one of claims 51-62, wherein the fusion protein further comprises a second UGI domain.

64. The fusion protein of claim 63, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 123.

65. The fusion protein of claim 63, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 124.

66. The fusion protein of claim 54, wherein the deaminase domain is an adenosine deaminase domain.

67. The fusion protein of claim 66 further comprising a second adenosine deaminase domain.

68. The fusion protein of claim 67, wherein the first adenosine deaminase domain and the second adenosine deaminase domain comprise an ecTadA domain, or a variant thereof.

69. The fusion protein of claim 68, wherein the first adenosine deaminase domain and the second adenosine deaminase domain comprise the amino acid sequence of any one of SEQ ID NOs: 62-84.

70. The fusion protein of claim 69, wherein the first adenosine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 62-84.

71. The fusion protein of claim 69, wherein the second adenosine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 62-84.

72. The fusion protein of any one of claims 66-71, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 127.

73. The fusion protein of any one of claims 66-71, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 128.

74. A complex comprising the fusion protein of any one of claims 51-73, and a guide RNA bound to the Cas9 protein.

75. The complex of claim 74, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.

76. The complex of claim 75, wherein the 3ʹ end of the target sequence is directly adjacent to an AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, or TTT sequence.

77. The complex of claim 75 or 76, wherein the 3ʹ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.

78. The complex of claim 75 or 76, wherein the 3ʹ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.

79. The complex of claim 75 or 76, wherein the 3ʹ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.

80. The complex of any one of claims 74-79, wherein the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.

81. The complex of any one of claims 75-80, wherein the guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target sequence.

82. The complex of any one of claims 75-81, wherein the target sequence is a DNA sequence.

83. The complex of claim 82, wherein the target sequence is a sequence in the genome of a mammal.

84. The complex of claim 83, wherein the target sequence is a sequence in the genome of a human.

85. The complex of any one of claims 75-84, wherein the target sequence comprises a sequence associated with a disease or disorder.

86. The complex of claim 85, wherein the target sequence comprises a point mutation associated with a disease or disorder.

87. The complex of claim 86, wherein the complex edits a point mutation in the target sequence.

88. The complex of claim 87, wherein the point mutation is located between about 10 to about 20 nucleotides upstream of the PAM in the target sequence.

89. The complex of claim 87 or 88, wherein the target sequence comprises a T to C point mutation.

90. The complex of claim 89, wherein the complex deaminates the target C point mutation, and wherein the deamination results in a sequence that is not associated with a disease or disorder. 91. The complex of claim 90, wherein the target C point mutation is present in the DNA strand that is not complementary to the guide RNA. 92. The complex of claim 87 or 88, wherein the target sequence comprises a G to A point mutation. 93. The complex of claim 92, wherein the complex deaminates the target A point mutation, and wherein the deamination results in a sequence that is not associated with a disease or disorder. 94. The complex of claim 93, wherein the target A point mutation is present in the DNA strand that is not complementary to the guide RNA. 95. The complex of any one of claims 74-94, wherein the complex exhibits increased deamination efficiency of a point mutation in a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to the deamination efficiency of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 96. The complex of claim 95, wherein the complex exhibits increased deamination efficiency of a point mutation in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the deamination efficiency of a complex comprising the Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2 on the same target sequence. 97. The complex of any one of claims 90-96, wherein deamination activity is measured using a deamination assay, PCR, or sequencing. 98. The complex of any one of claims 74-97, wherein the complex produces fewer indels in a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 99. The complex of claim 98, wherein the complex produces fewer indels in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold lower as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2 on the same target sequence. 100. The complex of any one of claims 98-99, wherein indels are measured using high-throughput sequencing.

101. The complex of any one of claims 74-100, wherein the complex exhibits a decreased off-target activity as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 102. The complex of claim 101, wherein the off-target activity of the complex is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold decreased as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 103. The complex of any one of claims 75-102, wherein the target sequence is in the genome of an organism. 104. The complex of claim 103, wherein the organism is a prokaryote. 105. The complex of claim 104, wherein the prokaryote is a bacterium. 106. The complex of claim 103, wherein the organism is a eukaryote. 107. The complex of claim 103, wherein the organism is a plant or fungus. 108. The complex of claim 103, wherein the organism is a vertebrate. 109. The complex of claim 108, wherein the vertebrate is a mammal.

110. The complex of claim 109, wherein the mammal is a human. 111. The complex of claim 103, wherein the organism is a cell. 112. The complex of claim 111, wherein the cell is a human cell. 113. A method comprising contacting a nucleic acid with the fusion protein of any one of claims 51-73, and with a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 114. A method comprising contacting a cell with the fusion protein of any one of claims 51-73, and with a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 115. A method comprising contacting a nucleic acid with the complex of any one of claims 74-112. 116. A method comprising contacting a cell with the complex of any one of claims 74-112. 117. The method of any one of claims 113-116, wherein the contacting is performed in vitro. 118. The method of any one of claims 114-116, wherein the contacting is performed in vivo.

119. A method comprising administering to a subject the fusion protein of any one of claims 51-73, and a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 120. A method comprising administering to a subject the complex of any one of claims 74-112. 121. The method of any one of claims 113-120, wherein the target sequence of the nucleic acid is a DNA sequence. 122. The method of any one of claims 113-121, wherein the 3ʹ end of the target sequence is not immediately adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ). 123. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAA, GAA, CAA, and TAA. 124. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAC, GAC, CAC, and TAC. 125. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAT, GAT, CAT, and TAT. 126. The method of any one of claims 113-125, wherein the target sequence comprises a sequence associated with a disease or disorder.

127. The method of claim 126, wherein the target DNA sequence comprises a point mutation associated with a disease or disorder. 128. The method of claim 127, wherein the activity of the fusion protein, or the activity of the complex, results in a correction of the point mutation. 129. The method of any one of claims 127-128, wherein the target DNA sequence comprises a T to C point mutation associated with a disease or disorder, and wherein the deamination of the mutant C base results in a sequence that is not associated with a disease or disorder. 130. The method of claim 129, wherein the target DNA sequence encodes a protein and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. 131. The method of claim 130, wherein the deamination of the mutant C results in a change of the amino acid encoded by the mutant codon. 132. The method of claim 131, wherein the deamination of the mutant C results in the codon encoding the wild-type amino acid. 133. The method of any one of claims 127-128, wherein the target DNA sequence comprises a G to A point mutation associated with a disease or disorder, and wherein the deamination of the mutant A base results in a sequence that is not associated with a disease or disorder.

134. The method of claim 133, wherein the target DNA sequence encodes a protein and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. 135. The method of claim 134, wherein the deamination of the mutant A results in a change of the amino acid encoded by the mutant codon. 136. The method of claim 135, wherein the deamination of the mutant A results in the codon encoding the wild-type amino acid. 137. The method of any one of claims 113-136, wherein the contacting is in vivo in a subject. 138. The method of claim 137, wherein the subject has or has been diagnosed with a disease or disorder. 139. The method of claim 137 or 138, wherein the disease or disorder is a proliferative disease, a genetic disease, a neoplastic disease, a metabolic disease, or a lysosomal storage disease. 140. A kit comprising a nucleic acid construct, comprising:

(a) a nucleic acid sequence encoding the fusion protein of any one of claims 51-73; and (b) a heterologous promoter that drives expression of the sequence of (a). 141. A kit comprising a nucleic acid construct, comprising:

(a) a nucleic acid sequence encoding the complex of any one of claims 74-112; and (b) a heterologous promoter that drives expression of the sequence of (a).

142. The kit of claim 140 further comprising an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone. 143. A polynucleotide encoding the fusion protein of any one of claims 51-73 or the complex of any one of claims 74-112. 144. A vector comprising a polynucleotide of claim 143. 145. The vector of claim 144, wherein the vector comprises a heterologous promoter driving expression of the polynucleotide encoding the fusion protein or the polynucleotide encoding the complex. 146. A method comprising contacting a cell with the vector of claim 144 or 145. 147. The method of claim 146, wherein the vector is transfected into the cell. 148. The method of claim 147, wherein the vector is transfected into the cell using electroporation, heat shock, or a composition comprising a cationic lipid. 149. A cell comprising the fusion protein of any one of claims 51-73, or a nucleic acid molecule encoding the fusion protein of any one of claims 51-73.

150. A cell comprising the complex of any one of claims 74-112, or a nucleic acid molecule encoding the complex of any one of claims 74-112. 151. A cell comprising the vector of claim 144 or 145. 152. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 122, wherein the SpCas9 has a non-canonical PAM specificity. 153. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 123, wherein the SpCas9 has a non-canonical PAM specificity. 154. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 124, wherein the SpCas9 has a non-canonical PAM specificity. 155. A fusion protein comprising an SpCas9 of any of claims 152-154 and a cytidine deaminase. 156. The fusion protein of claim 155, wherein the cytidine deaminase comprises any one of SEQ ID NOs: 27-61. 157. A fusion protein comprising an SpCas9 of any of claims 152-154 and an adenosine deaminase. 158. The fusion protein of claim 157, wherein the adenosine deaminase comprises any one of SEQ ID NOs: 62-84.

159. A complex comprising a fusion protein of any one of claims 155-158 and a guide RNA.

Description:

CAS9 VARIANTS HAVING NON-CANONICAL PAM SPECIFICITIES AND USES THEREOF

RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 62/722,057, filed August 23, 2018, and to U.S. Provisional Patent Application No. 62/886,937, filed August 14, 2019, each of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] CRISPR-Cas systems, and especially systems based on the Cas9 enzyme from Streptococcus pyogenes (SpCas9), have successfully been engineered for genome editing and base editing in a wide range of organisms. As one example, base editors have been developed that convert Cas endonucleases into programmable nucleotide deaminases [1-3], thus facilitating the introduction of C-to-T mutations (by C-to-U deamination) or A-to-G mutations (by A-to-I deamination) without induction of a double-strand break [4, 5].
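The following Python sketch is a minimal illustration of the net sequence changes described above. It models a cytidine base editor as converting C to T and an adenine base editor as converting A to G at one chosen position, assuming the deaminated base (U or I) is read as T or G after repair or replication. Editing windows, strand choice by the guide RNA, and repair outcomes are not modeled. The names `apply_base_edit`, `edit_bottom_strand`, `CBE_EDIT`, and `ABE_EDIT` are hypothetical and are not part of the disclosure.

```python
# Hypothetical model: net top-strand outcome of each editor class once the
# deaminated base (U or I) has been read as T or G.
CBE_EDIT = {"C": "T"}  # cytidine deamination: C -> U, read as T
ABE_EDIT = {"A": "G"}  # adenosine deamination: A -> I, read as G

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_base_edit(seq: str, pos: int, editor: dict) -> str:
    """Convert the base at 0-based `pos` using `editor`.

    Raises ValueError if that base is not a substrate of the editor.
    """
    base = seq[pos]
    if base not in editor:
        raise ValueError(f"base {base!r} at position {pos} is not a substrate")
    return seq[:pos] + editor[base] + seq[pos + 1:]


def edit_bottom_strand(seq: str, pos: int, editor: dict) -> str:
    """Edit the bottom-strand base paired with top-strand `pos` and return
    the resulting top strand (a bottom-strand C edit reads as G -> A on top)."""
    rc = reverse_complement(seq)
    rc_edited = apply_base_edit(rc, len(seq) - 1 - pos, editor)
    return reverse_complement(rc_edited)
```

For example, `apply_base_edit("ACGT", 1, CBE_EDIT)` gives `"ATGT"`. `edit_bottom_strand("AAGTT", 2, CBE_EDIT)` gives `"AAATT"`, which shows how a C edit on one strand appears as a G-to-A change on the other.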

[0003] One drawback of current genome and base engineering tools (e.g., ZFNs, TALENs, and CRISPR/Cas9) is that they are limited with respect to the DNA sequences that can be targeted. For example, ZFNs and TALENs are limited because each system requires the design of a specific DNA-binding portion, the amino acid sequence of which is a function of each individual target nucleotide sequence. CRISPR/Cas9 technologies are also limited. Cas9 can be programmably targeted to virtually any target sequence by providing a suitable guide RNA. However, to bind and act upon the target sequence, Cas9 strictly requires a protospacer-adjacent motif (PAM) immediately adjacent to the 3ʹ end of the targeted nucleic acid sequence; for SpCas9 this is typically the canonical nucleotide sequence 5ʹ-NGG-3ʹ. This requirement for a PAM sequence effectively limits the nucleotide sequences that can be efficiently targeted by Cas9.

[0004] Accordingly, there is a need for nucleic acid programmable DNA binding proteins, such as Cas9, that are capable of binding target nucleotide sequences that lack canonical PAMs (e.g., 5ʹ-NGG-3ʹ for SpCas9) in order to expand the scope and flexibility of genome and base editing.
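To put a number on how strongly the PAM requirement restricts targeting, the sketch below counts how many of the 64 possible trinucleotide PAMs match the canonical NGG motif and how many match an NAN motif. It assumes uniform and independent base composition, which real genomes do not have. Under that idealization, each fraction is also the expected fraction of positions on one strand that carry such a PAM. The function name `pam_fraction` is hypothetical.

```python
import re
from itertools import product

# All 64 possible trinucleotide PAMs: "AAA", "AAC", ..., "TTT".
ALL_PAMS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def pam_fraction(pam_regex: str) -> float:
    """Fraction of the 64 trinucleotides that fully match pam_regex."""
    pattern = re.compile(pam_regex)
    hits = [pam for pam in ALL_PAMS if pattern.fullmatch(pam)]
    return len(hits) / len(ALL_PAMS)


if __name__ == "__main__":
    for name, regex in [("NGG", "[ACGT]GG"), ("NAN", "[ACGT]A[ACGT]")]:
        frac = pam_fraction(regex)
        # Assuming uniform base composition, frac is also the expected
        # fraction of positions on one strand that carry such a PAM.
        print(f"{name}: {frac:.4f} of PAMs, ~{1000 * frac:.1f} per 1,000 positions")
```

This gives 0.0625 (4 of 64) for NGG and 0.25 (16 of 64) for NAN. The 60 trinucleotides that do not match NGG are the ones enumerated in claim 76.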

SUMMARY OF THE INVENTION

[0005] The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive immune system that has been modified to enable robust genome and nucleobase engineering in a variety of organisms and cell lines. CRISPR-Cas (CRISPR-associated) systems are protein-RNA complexes that use an RNA molecule (sgRNA) as a guide to localize the complex to a target nucleic acid sequence via base-pairing. In the natural systems, a Cas protein then acts as an endonuclease to cleave the targeted DNA sequence. For the system to function, the target nucleic acid sequence must be complementary to the sgRNA and must also contain a “protospacer-adjacent motif” (PAM) at the 3ʹ end of the complementary region. The requirement for a PAM sequence limits the use of Cas9 technology, especially for applications that require precise Cas9 positioning. These include base editing, which requires a PAM approximately 13-17 nucleotides from the target base, and some forms of homology-directed repair, which are most efficient when DNA cleavage occurs approximately 10-20 base pairs from a desired alteration. To address this limitation, researchers have harnessed natural CRISPR nucleases with different PAM requirements and engineered existing systems to accept variants of naturally recognized PAMs. Other natural CRISPR nucleases shown to function efficiently in mammalian cells include Staphylococcus aureus Cas9 (SaCas9), Acidaminococcus sp. Cpf1 (AsCpf1), Lachnospiraceae bacterium Cpf1, Campylobacter jejuni Cas9, Streptococcus thermophilus Cas9, and Neisseria meningitidis Cas9. None of these mammalian cell-compatible CRISPR nucleases, however, offers a PAM that occurs as frequently as that of SpCas9.
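The positioning constraint for base editing can be illustrated with the following sketch. For a target base, it lists the PAM positions falling approximately 13-17 nucleotides downstream that match a given PAM pattern. Several assumptions apply:

- Distance is defined here as the PAM start index minus the target index, using 0-based indices on one strand. Other distance conventions would shift the window.
- PAM patterns are written as Python regular expressions.
- The function name `pams_in_window` is hypothetical.

```python
import re


def pams_in_window(seq: str, target: int, pam_regex: str,
                   min_dist: int = 13, max_dist: int = 17) -> list:
    """Return 0-based start positions p of 3-nt PAMs downstream of `target`
    with min_dist <= p - target <= max_dist and seq[p:p+3] matching
    pam_regex. Only the strand given in `seq` is scanned."""
    pattern = re.compile(pam_regex)
    hits = []
    for p in range(target + min_dist, target + max_dist + 1):
        candidate = seq[p:p + 3]
        # Skip windows that run off the end of the sequence.
        if len(candidate) == 3 and pattern.fullmatch(candidate):
            hits.append(p)
    return hits
```

Consider `seq = "C" + "T" * 14 + "AGG" + "T" * 5` with the target C at index 0. Here `pams_in_window(seq, 0, "[ACGT]GG")` returns `[15]` and `pams_in_window(seq, 0, "[ACGT]A[ACGT]")` returns `[14]`. A variant able to use NAN PAMs thus adds candidate positions that NGG alone would not provide.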

[0006] Some aspects of the disclosure relate to novel Cas9 mutants that are capable of binding to target sequences that do not include a canonical PAM sequence (5ʹ-NGG-3ʹ, where N is any nucleotide) at the 3ʹ end. The disclosure also provides methods of generating and identifying novel Cas9 variants that are capable of recognizing (e.g., binding to) target sequences encompassing a variety of PAM sequences, e.g., using Phage Assisted Continuous Evolution (PACE) and/or Phage Assisted Non-Continuous Evolution (PANCE). In particular, methods and compositions have been developed for targeting sequences that have an adenine (A) at the second nucleotide position of the PAM (e.g., 5ʹ-NAN-3ʹ). It should be appreciated that target sequences having PAMs that lack one or more guanines (Gs) are particularly difficult to target given the paucity of SpCas9 activity (e.g., binding activity) on such sequences. One goal of the disclosure is to provide a repertoire of SpCas9 variants from which a variant can be selected for genome and/or base editing of a target nucleic acid sequence (e.g., DNA sequence) based on its particular PAM sequence. Such a catalogue/library of SpCas9 variants would be useful for expanding the scope of genome and base editing so that it is not restricted by any particular PAM requirement.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Figures 1A-1C show schematic representations of Phage Assisted Continuous Evolution (PACE) of Cas9 and results of SpCas9 vs. xCas9 evolution. Figure 1A, PACE takes place in a fixed-volume “lagoon” that is continuously diluted with fresh host E. coli cells. Upon infection, each selection phage (SP) that encodes a Cas9 variant capable of binding the target PAM and protospacer on the accessory plasmid (AP) induces expression of gene III, resulting in infectious progeny phage that propagate the active Cas9 variant in subsequent host cells. Figure 1B, accessory plasmids representing each of the 64 PAM sequences are used to select for Cas9 variants capable of binding to the PAM/protospacer sequences, where RNAP fused to the Cas9 variant induces expression of gene III upon binding to the sequence having the specific PAM. Figure 1C, data (luciferase assay) for overnight phage propagation reveal the PAMs on which SpCas9 and xCas9 have binding activity. xCas9 has a less strict PAM requirement than SpCas9.

[0008] Figures 2A-B show a schematic representation of Cas9 64-PAM Phage Assisted Non-Continuous Evolution (PANCE) and results of SpCas9 vs. xCas9 PANCE evolution. Figure 2A, a 96-well PANCE format allowed simultaneous evolution on all 64 PAM sequences. PANCE is less stringent than PACE because it does not use continuous flow, which allows evolution to begin from low activity. Figure 2B, data (luciferase assay) for PANCE evolution at passage 2 (P2), passage 12 (P12), and passage 16 (P16) for SpCas9 (wt) or xCas9 show an increase in the ability to bind additional PAM sequences.

[0009] Figures 3A-B show clones resulting from PANCE evolution experiments using SpCas9 (N3) after passage 12, including the activity for selected clones. Figure 3A is a table listing individual clones and their mutations as compared to nuclease-inactive SpCas9. The name of each clone indicates the PAM on which the clone was evolved. For example, clones CAA-2, CAA-3, and CAA-4 were evolved using a 5ʹ-CAA-3ʹ PAM sequence. Figure 3B shows activity for clones SpCas9, CAA-3, GAT-2, ATG-2, ATG-3, and AGC-3, using a luciferase assay. Clones were obtained from PANCE evolution experiments using SpCas9 (N3) after passage 12.

[0010] Figures 4A-B show clones resulting from PANCE evolution experiments using SpCas9 (N3) after passage 19, including the activity for selected clones. Figure 4A is a table listing individual clones and their mutations as compared to nuclease-inactive SpCas9. The name of each clone indicates the PAM on which the clone was evolved. For example, clones ACG-1, ACG-2, ACG-3, and ACG-4 were evolved using a 5ʹ-ACG-3ʹ PAM sequence. Figure 4B shows activity for clones SpCas9, N3.19.CAA1, N3.19.CAA2, N3.19.GAA1, N3.19.GAA2, N3.19.GAC5, N3.19.GAT1, N3.19.GAT3, N3.19.ACG1, N3.19.ACG3, N3.19.ACG6, N3.19.ATG3, and N3.19.ATG6 using a luciferase assay. Clones were obtained from PANCE evolution experiments using SpCas9 (N3) after passage 19.
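Clone names such as N3.19.CAA1 appear to encode four parts:

- the starting lineage (N3 for SpCas9, N4 for xCas9 3.7),
- the PANCE passage,
- the selection PAM,
- a clone number.

This reading is inferred from the examples rather than stated as a formal convention. Under that assumption, the sketch below is a minimal parser for names of this form. Names following other schemes, such as N6.18.17-2 or SP047a, are rejected. `CloneName` and `parse_clone_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class CloneName(NamedTuple):
    lineage: str  # e.g. "N3" (SpCas9 start) or "N4" (xCas9 3.7 start)
    passage: int  # PANCE passage at which the clone was isolated
    pam: str      # PAM used for selection, e.g. "CAA"
    index: int    # clone number within that PAM and passage


# Assumed pattern: <lineage>.<passage>.<3-nt PAM><clone number>
CLONE_RE = re.compile(r"(N\d+)\.(\d+)\.([ACGT]{3})(\d+)")


def parse_clone_name(name: str) -> CloneName:
    """Split a clone name like 'N3.19.GAC5' into its assumed components."""
    m = CLONE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized clone name: {name!r}")
    lineage, passage, pam, index = m.groups()
    return CloneName(lineage, int(passage), pam, int(index))
```

For example, `parse_clone_name("N3.19.GAC5")` yields `CloneName(lineage='N3', passage=19, pam='GAC', index=5)`.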

[0011] Figures 5A-B show clones resulting from PANCE evolution experiments using xCas9 3.7 (N4) after passage 12, including the activity for selected clones. Figure 5A is a table listing individual clones and their mutations as compared to xCas9 3.7. The table indicates the PAM on which each of the clones was evolved. For example, clones N4.12.10 TAT1, N4.12.10 TAT2, and N4.12.10 TAT3 were evolved using a 5ʹ-TAT-3ʹ PAM sequence. Figure 5B shows activity for clones xCas9 (xCas9 3.7), TAT-1, TAT-3, GTA-1, GTA-3, and CAC-2 using a luciferase assay. Clones were obtained from PANCE evolution experiments using xCas9 3.7 (N4) after passage 12.

[0012] Figures 6A-B show clones resulting from PANCE evolution experiments using xCas9 3.7 (N4) after passage 19, including the activity for selected clones. Figure 6A is a table listing individual clones and their mutations as compared to xCas9 3.7. The table indicates the PAM on which each of the clones was evolved. For example, clones N4.19.AAA1, N4.19.AAA2, N4.19.AAA4, and N4.19.AAA7 were evolved using a 5ʹ-AAA-3ʹ PAM sequence. Figure 6B shows activity for N4.19.AAA1, N4.19.TAA2, N4.19.TAA5, N4.19.TAT5, N4.19.CAC5, N4.19.CAC6, N4.19.GTA2, N4.19.GTA7, N4.19.GCC2, N4.19.GCC5, and N4.19.GCC8 using a luciferase assay. Clones were obtained from PANCE evolution experiments using xCas9 3.7 (N4) after passage 19.

[0013] Figure 7 shows the results of mammalian cell editing using the cytidine base editor BE3 containing various evolved Cas9 clones (top). Indel formation by each of the clones as a nuclease-active Cas9 is also shown (bottom).

[0014] Figure 8 shows activity data (luciferase assay) for PANCE evolution experiments after passage 2 (N6.2), passage 12 (N6.12) and passage 16 (N6.16) using N4.12.TAT1 as the starting clone (N6). Increased shading indicates increased activity as described in Figure 1C.

[0015] Figures 9A-B show the mutations of TAT1 as well as activity data (luciferase assay) on all 64 possible PAM sequences. Figure 9A provides the individual mutations of N4.12.TAT1 (TAT1) as compared to SpCas9. Figure 9B shows activity of TAT1 on all 64 possible PAM sequences. Increased shading indicates increased activity as described in Figure 1C.

[0016] Figure 10 shows clones resulting from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 12. The individual mutations in clones N6.12.6, N6.12.7, N6.12.25, and N6.12.28 are shown as compared to TAT1.

[0017] Figure 11 shows clones resulting from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 18. The individual mutations for each of the listed clones (e.g., N6.18.1-1, N6.18.1-2, etc.) are shown as compared to TAT1.

[0018] Figure 12 shows activity for N6.18.17-2, N6.18.18-2, N6.18.18-3, N6.18.28-2, N6.18.33-3, N6.18.39-1, N6.18.39-3, N6.18.39-4, N6.18.40-2, N6.18.40-3, N6.18.44-1, SP047a, and SpCas9, using a luciferase assay. Clones were obtained from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 18 (see Figure 11).

[0019] Figures 13A-B show a split-intein PACE configuration to allow evolution of two separate activities of interest. Figure 13A shows that the bacteriophage gIII gene that produces the pIII protein is split into N-terminal (g3N) and C-terminal (g3C) fragments in two separate accessory plasmids (AP1 and AP2). AP1 and AP2 have the same PAM, but a different protospacer (it is not required that they have the same PAM, i.e., both the PAM and protospacer could be changed). Figure 13B shows the workflow for using a split-intein PACE configuration of the gIII gene.

[0020] Figures 14A-C show the evolution and activity of SpCas9 resulting from PACE experiments using two separate protospacers and a split-intein fusion (to allow evolution on two protospacers) as in Figures 13A-B. Figure 14A shows clones resulting from PACE evolution experiments using two protospacers with SpCas9 after passage 4 (P4). Figure 14B shows the ability of the P4 SpCas9 variants incorporated into a BE4max base-editor to support conversion of C to T in CAG, CAT, GAT, CAA, GAA, CGT, or GGG PAMs. Figure 14C shows the ability of the L2-72-4 SpCas9 P4 clone to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs.

[0021] Figures 15A-B show a split-intein PACE configuration (whereby Cas9 is divided into two parts to limit Cas9 concentration) to allow evolution of Cas9 proteins of interest. Figure 15A shows that increasing the SpCas9 concentration increases cleavage of alternative (NAG) PAMs (as reported in Karvelis, T., Gasiunas, G., Young, J., Bigelyte, G., Silanskas, A., Cigan, M., and Siksnys, V. (2015). Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements. Genome Biol. 16, 253). Figure 15B shows that the amount of Cas9 protein may be limited in PACE by splitting the inactive Cas9 protein (dCas9) into an N-terminal fragment (dCas9 (1-573)) and a C-terminal fragment (dCas9 (573-end)) and producing the N-terminal fragment from a low-copy number plasmid with a weak promoter (rpoZ).

[0022] Figure 16 shows clones resulting from PACE evolution of a split-intein Cas9 protein bearing the P4.2.72.4 mutations (Experiment P10). The individual mutations for each of the listed clones (e.g., L5.144.2, L5.144.6, etc.) are shown as compared to SpCas9 and SpCas9 with the P4.2.72.4 mutations.

[0023] Figure 17 shows the ability of the P10 SpCas9 variants from Figure 16 incorporated into a BE4max base-editor to support conversion of C to T in CAG, GAT, TAT, CAT, GAA, CAA-1, or CAA-2 PAMs.

[0024] Figure 18 shows the ability of two P10 SpCas9 variants (P10.5.144.2 and P10.6.144.2) to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs compared to a P4 variant (L2-72-4), SpCas9, and xCas9.

[0025] Figures 19A-C show characterization of a P10 SpCas9 variant by PAM depletion in E. coli. Figure 19A shows a workflow for PAM depletion in E. coli, wherein E. coli containing a Cas9 variant (e.g., P10) are transformed with a library of negative selection plasmids (e.g., pUC ampR with the HEK3 protospacer followed by NNNN). See Kleinstiver et al., Engineered CRISPR-Cas9 nucleases with altered PAM specificities, Nature (2015), 523: 481-485. The transformed cells are recovered and Cas9 expression is induced for 1-4 hours. The cells are then plated on carbenicillin media. The plates are then scraped and the surviving colonies are sequenced. Colonies that survive contain PAMs that the P10 Cas9 variant protein could not cut. Figure 19B shows the frequency of PAM sequences present in surviving colonies, wherein more heavily shaded PAM sequences occur more frequently (left), and the activity of the P10 Cas9 variant protein on the PAM sequences in a luciferase assay (right). Figure 19C shows the ability of the P10 SpCas9 variants characterized by PAM depletion, incorporated into a BE4max base-editor, to support conversion of C to T in CAG, CAT, GAT, CAA, GAA, CGT, or GGG PAMs.

[0026] Figure 20 shows a characterization of the P10 SpCas9 variant protein following PAM depletion as in Figures 19A-19C. The P10 SpCas9 variant protein (left) and xCas9 variant proteins (middle) show a preference at the fourth nucleotide of the PAM, wherein C is the most preferred and G is the least preferred. The SpCas9 protein (right) does not show this preference. Higher Cas9 protein activity is denoted by darker shading.

[0027] Figure 21 shows clones resulting from split-intein PACE evolution of Cas9 with the P4.2.72.4 mutations (Experiment P11) on an AAA PAM. The individual mutations for each of the listed clones (e.g., P11.1.139-2, P11.1.139-4, etc.) are shown as compared to SpCas9 with the P4.2.72.4 mutations.

[0028] Figure 22 shows the ability of the P11 SpCas9 variants from Figure 21 incorporated into a BE3 base-editor to support conversion of C to T in CAG, GAT, CAT, GAA, AAA-1, AAA-2, CAA-1, CAA-2, or GGG PAMs.

[0029] Figure 23 shows the ability of two P11 SpCas9 variants (P11-SacB-1 and P11-SacB-2) to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs compared to a P4 variant (L2-72-4), SpCas9, and xCas9.

[0030] Figures 24A-B show clones resulting from split-intein PACE evolution of Cas9 with P12 mutations on AAT (FIG. 24A) or TAT (FIG. 24B) PAMs. The individual mutations for each of the listed clones (e.g., P12.3.b9-2, P12.3.b10-2, etc.) are shown as compared to SpCas9 protein.

[0031] Figures 25A-B show the ability of the P12 SpCas9 variants from Figures 24A-B incorporated into a BE3 base-editor to support conversion of C to T at sites s893, s1073, s1081, s1140, b3, e1, e2, f1, f2, s33, s34, s35, s36, s37, s38, s39, s40, s41, s43, s44, s45, or s46. Darker shading indicates a higher % of C to T editing (FIG. 25A). Figure 25B shows the average C to T editing on NATA, NATT, NATC, or NATG PAMs. pSM060ax is clone P12.3.b9-8 and pSM060ay is clone P12.3.b10-6.

[0032] Figures 26A-B show the ability of two P12 SpCas9 variants (P12.3.b9-8 and P12.3.b10-6) to cleave DNA, as measured by bacterial PAM depletion, on AAA, AAC, AAT, AAG, CAA, CAC, CAT, CAG, TAA, TAC, TAT, TAG, GAA, GAT, GAG, AGA, AGC, AGT, AGG, CGA, CGC, CGT, CGG, TGA, TGC, TGT, TGG, GGA, GGC, GGT, or GGG PAMs. PPDV is the frequency of a PAM after Cas9 cutting divided by its frequency in the input library; lower numbers signify more active Cas9 proteins.
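The PPDV defined above is a ratio of two frequencies. The following sketch computes it from per-PAM read counts for the input library and for the plasmids recovered after Cas9 cutting, following that definition. Representing the counts as dictionaries keyed by PAM string is an assumption made for illustration, and the function name `ppdv` is hypothetical.

```python
def ppdv(input_counts: dict, surviving_counts: dict) -> dict:
    """PAM depletion value for each PAM present in the input library.

    PPDV = (frequency among plasmids surviving Cas9 cutting)
           / (frequency in the input library).
    Values well below 1 indicate depleted (i.e. cut) PAMs.
    """
    total_in = sum(input_counts.values())
    total_out = sum(surviving_counts.values())
    if total_in == 0 or total_out == 0:
        raise ValueError("both libraries need at least one read")
    values = {}
    for pam, n_in in input_counts.items():
        if n_in == 0:
            continue  # PAM absent from the input; depletion is undefined
        freq_in = n_in / total_in
        freq_out = surviving_counts.get(pam, 0) / total_out
        values[pam] = freq_out / freq_in
    return values
```

Normalizing each library by its own total makes PPDV insensitive to differences in sequencing depth between the input and surviving samples.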

[0033] Figures 27A-B show a split-intein PACE configuration to allow evolution of Cas9 proteins of interest with 2 protospacers. Figure 27A shows evolution of a split-intein Cas9 using selection on 2 protospacers. A second gene (gVI) is removed from the phage and is used as a selection marker on AP2. AP1 and AP2 have the same PAM, but different protospacers and a different nucleotide immediately 3ʹ of the PAM. Figure 27B shows clones resulting from split-intein PACE evolution of Cas9 as in Figure 27A. The individual mutations for each of the listed clones (e.g., L2-120-1, L2-120-2, etc.) are shown as compared to SpCas9 protein.

[0034] Figure 28 shows survival-based selection for isolating nuclease-active Cas9 variant proteins. In this selection, cutting identifies nuclease-active PACE variants. SacB is lethal in the presence of sucrose unless it is cut by Cas9, sfGFP loses fluorescence if Cas9 cutting occurs, and kanR confers survival on kanamycin medium if no cutting occurs.
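The three readouts of this selection can be summarized as a function of whether Cas9 cuts. The sketch below assumes that sacB, sfGFP, and kanR sit on the same target plasmid, so that a Cas9 cut removes all three. The figure description does not state this arrangement explicitly, and `Readout` and `selection_readout` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readout:
    survives_sucrose: bool    # sacB is lethal on sucrose unless removed by cutting
    survives_kanamycin: bool  # kanR confers survival only if no cutting occurs
    gfp_fluorescent: bool     # sfGFP fluorescence is lost when cutting occurs


def selection_readout(cut: bool) -> Readout:
    """Expected phenotype of a host cell, assuming one shared target plasmid."""
    return Readout(
        survives_sucrose=cut,
        survives_kanamycin=not cut,
        gfp_fluorescent=not cut,
    )
```

Under this model, plating on sucrose enriches nuclease-active variants. Kanamycin selection and sfGFP fluorescence instead report the absence of cutting.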

[0035] Figures 29A-B show nuclease-active TAT variants that were identified by SacB selection as in Figure 28. The original SpCas9 TAT variant was isolated from PANCE evolution on a TAT PAM (N4.TAT.1), but had no nuclease activity. This N4.TAT.1 (TAT1) Cas9 variant was subcloned from the pool of N4.TAT SP (H840 onward) into a Cas9 plasmid, and variants that could cut a SacB selection plasmid with a TAT PAM after a 4 hour induction were selected. Figure 29A shows clones resulting from SacB selection of the nuclease-inactive TAT variant. The individual mutations for each of the listed clones (e.g., SacB-TAT-1, SacB-TAT-2) are shown as compared to SpCas9 and TAT SpCas9 variant proteins. Figure 29B shows the location of mutations in the TAT SpCas9 variant proteins.

[0036] Figures 30A-B show the activity of the TAT SpCas9 variant proteins identified in Figure 29A. Figure 30A shows the ability of the nuclease-active TAT SpCas9 variants (SacB-TAT1 and SacB-TAT2) incorporated into a BE4max base-editor to support conversion of C to T in CAG, GAT, TAT, CAT, GAA-1, GAA-2, CAA-1, CAA-2, or GGG PAMs. Figure 30B shows PAM depletion by the SacB-TAT1 and SacB-TAT2 variants on CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, or GGG PAMs.

[0037] Figure 31 shows the ability of the SacB-TAT-1 SpCas9 protein variant to form insertions or deletions in AAA, AAC, AAT, AAG, CAA, CAC, CAT, CAG, TAA, TAC, TAT, TAG, GAA, GAT, GAG, AGA, AGC, AGT, AGG, CGA, CGC, CGT, CGG, TGA, TGC, TGT, TGG, GGA, GGC, GGT, or GGG PAMs. PPDV is the frequency of a PAM after Cas9 cutting divided by its frequency in the input library; lower numbers signify more active Cas9 proteins.

[0038] Figure 32 shows the locations of residues frequently mutated in the PAM selections.

Positions commonly mutated in SpCas9 variants obtained when evolving on NAN PAMs include D1135, E1219, and D1332.

[0039] Figures 33A-33D show C to T base editing with evolved variants on NAN PAMs. For C to T base editing, SpCas9 variants were incorporated into the BE4max architecture in HEK293T cells. Figure 33A shows C to T base editing with NAA PAMs. Figure 33B shows C to T base editing with NAC PAMs. Figure 33C shows C to T base editing with NAT PAMs. Figure 33D shows C to T base editing with NAG PAMs. Each bar represents the average of 3 independent experiments, and the error bars represent the standard deviation. The “es” SpCas9 variant protein works best on NARH PAMs, with some activity on NARG and NGN PAMs; the “fn” SpCas9 variant protein works best on NRCH PAMs, with some activity on NRCG and NGN PAMs; and the “ax” SpCas9 variant protein works best on NRTH PAMs, with some activity on NRTG and NGN PAMs.
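The PAM preferences above use standard IUPAC nucleotide codes (N = any base, R = A or G, H = A, C, or T). A minimal Python sketch of matching a concrete PAM against these patterns follows; the names `pam_matches`, `PREFERRED`, and `preferred_variants` are hypothetical, and the mapping only encodes the "works best on" patterns stated above.

```python
# Standard IUPAC nucleotide codes used in PAM notation.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Preferred PAM patterns for the evolved variants, as described in the text.
PREFERRED = {"es": "NARH", "fn": "NRCH", "ax": "NRTH"}


def pam_matches(pam: str, pattern: str) -> bool:
    """True if a concrete PAM (e.g. 'GAAT') satisfies an IUPAC pattern (e.g. 'NARH')."""
    if len(pam) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def preferred_variants(pam: str) -> list[str]:
    """Names of the variants whose preferred pattern covers this PAM."""
    return [name for name, pattern in PREFERRED.items() if pam_matches(pam, pattern)]


for pam in ("GAAT", "GACA", "TATC", "GAGG"):
    print(pam, preferred_variants(pam))
```

Because H excludes G, a PAM such as GAGG falls outside all three preferred patterns, consistent with the weaker activity reported on NARG, NRCG, and NRTG PAMs.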

[0040] Figures 34A-34B show C to T base editing with evolved SpCas9 variants. For C to T base editing, SpCas9 variants were incorporated into the BE4max architecture in HEK293T cells. Figure 34A shows C to T base editing on NAA, NAC, and NAT PAMs. Figure 34B shows C to T base editing on NAAH, NACH, and NATH PAMs, where H is any base except G. Each bar represents the average of 3 independent experiments, and the error bars represent the standard deviation.

[0041] Figures 35A-35C show A to G base editing with evolved SpCas9 variants. For A to G base editing, SpCas9 variants were incorporated into the ABEmax architecture in HEK293T cells. Figure 35A shows A to G base editing on NAA/NGA PAMs with the es SpCas9 variant. Figure 35B shows A to G base editing on NAC/NGC PAMs with the fn SpCas9 variant. Figure 35C shows A to G base editing on NAG/NGG PAMs with the es and fn SpCas9 variant proteins. Each bar represents the average of 2 independent experiments, and the error bars represent the standard deviation.

[0042] Figure 36 shows phage-assisted non-continuous evolution (PANCE) of SpCas9 binding activity on non-G PAMs. (A) Original selection scheme for Cas9 DNA binding. ω-dSpCas9 expressed by the ΔgIII selection phage (SP) binds to a designated protospacer/PAM sequence upstream of gIII on an accessory plasmid (AP) in host E. coli cells. Host cells and infecting SP are continuously mutagenized by a mutagenesis plasmid (MP). (B) Fold propagation of SP expressing ω-dSpCas9 or ω-dxCas9 on APs encoding each of the 64 NNN PAM sequences upstream of gIII. (C) Schematic overview of the PANCE workflow. Host cells containing an AP and MP are grown to log phase in a deep-well plate or tube before being infected with SP. Mutagenesis is induced and SP are allowed to propagate for 6-18 hours before cells are pelleted and the SP-containing supernatant is collected. The SP pool is then used to infect host cells in the next iteration of PANCE. (D) Consensus mutations arising from evolution of ω-dSpCas9 (N1) or ω-dxCas9 (N2) on NAA (red), NAT (blue), or NAC (green) PAM sequences.
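Panel (B) reports fold propagation of selection phage. Assuming, as is usual in PACE/PANCE work, that fold propagation means output phage titer divided by input phage titer, the calculation is a simple ratio; the function name and titers below are hypothetical.

```python
import math


def fold_propagation(input_titer: float, output_titer: float) -> float:
    """Output selection-phage titer divided by input titer (both in pfu/mL)."""
    if input_titer <= 0:
        raise ValueError("input titer must be positive")
    return output_titer / input_titer


# Hypothetical titers: 1e5 pfu/mL in, 1e8 pfu/mL out after overnight propagation.
fold = fold_propagation(1e5, 1e8)
print(fold, math.log10(fold))
```

A fold value below 1 indicates that the SP failed to propagate on that PAM, which is why such plots are usually drawn on a log scale.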

[0043] Figures 37A-37E show multiple new PACE schemes utilizing a split-intein Cas9 and/or two protospacers. Figure 37A shows new PACE schemes that limit the concentration of SpCas9 protein and/or increase the number of Cas9 binding sites. Figure 37B shows the individual mutations in each of the listed NAA-evolved clones (e.g., N3.GAA-3, N3.GAA-4, etc.) compared to SpCas9. Figure 37C shows a time course of the NAA variants from Figure 37B through evolution. Figure 37D shows the individual mutations in each of the listed NAC-evolved clones (e.g., N4.CAC-1, N4.CAC-5, etc.) compared to SpCas9; also shown are D1135N, R1114G, V1139A, E1219V, Q1221H, R1320V, and R1333K mapped to the SpCas9 crystal structure (PDB 4UN3). Figure 37E shows the individual mutations in each of the listed NAT-evolved clones (e.g., SacB.N4.TAT-1, SacB.N4-TAT-3, etc.) compared to SpCas9; also shown are D1135N, R1114G, E1219V, H1349R, S1338T, R1335Q, and D1332N mapped to the SpCas9 crystal structure 4UN3 (left, lower structure). The lower right structure shows D1135N, R1114G, E1219V, G1218S, Q1221H, P1321S, R1335, and D1332G mapped to the same structure.
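The clone tables compare each evolved sequence with SpCas9 and report substitutions such as D1135N. A minimal sketch of deriving that notation from two pre-aligned protein sequences follows; `list_mutations` is a hypothetical helper, and the toy sequences are invented for illustration rather than taken from SpCas9.

```python
def list_mutations(reference: str, variant: str, start: int = 1) -> list[str]:
    """Substitutions of `variant` relative to `reference` in XnY notation (e.g. D1135N).

    Assumes the sequences are already aligned and of equal length (no insertions or
    deletions); `start` is the residue number of the first position in the strings.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=start)
        if ref != alt
    ]


# Invented toy fragments; `start` lets a fragment keep full-length numbering.
print(list_mutations("MKDRS", "MKNRS"))
print(list_mutations("ADRV", "GNRV", start=1133))
```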

[0044] Figures 38A-38D show characterization of evolved variants and SpCas9-NG through bacterial PAM depletion and mammalian cell indel formation. Figure 38A shows bacterial PAM depletion of SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG using a bacterial NNNN PAM library. The inverse of the depletion score was used to generate enrichment scores of activity on each NNNN PAM, which were then used to create sequence logos (WebLogo3.0). Figure 38B shows indel formation in HEK293T cells across 64 endogenous mammalian sites containing NANN PAMs for SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. Mean and SE of three independent biological replicates are shown. Figure 38C provides a summary of indel formation efficiencies in HEK293T cells across 48 endogenous mammalian sites containing NANH (H = non-G) PAMs for SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. Mean and standard deviation (SD) of all individual values of three independent biological replicates are plotted. Figure 38D shows the DNA targeting specificity of SpCas9, xCas9, and evolved variants SpCas9-NRRH, -NRTH, and -NRCH, as determined by the percentage of on-target reads in GUIDE-seq analysis using HEK target site 4 in U2OS cells.
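The conversion from depletion scores to a sequence logo can be sketched as follows: each PAM's enrichment is the inverse of its depletion score, and enrichment is summed per base at every PAM position and normalized into a position frequency matrix. This is a simplified, hypothetical reconstruction (`logo_matrix` is not part of WebLogo3.0, which may apply its own weighting) intended only to show the direction of the transformation.

```python
from collections import defaultdict


def logo_matrix(depletion_scores: dict[str, float]) -> list[dict[str, float]]:
    """Per-position base weights for a sequence logo from per-PAM depletion scores.

    Each PAM's enrichment score is 1 / depletion score, so strongly depleted
    (actively cleaved) PAMs contribute more weight. The enrichment is added to the
    base found at each position, and each position is normalized to sum to 1.
    """
    if not depletion_scores:
        return []
    length = len(next(iter(depletion_scores)))
    columns = [defaultdict(float) for _ in range(length)]
    for pam, depletion in depletion_scores.items():
        enrichment = 1.0 / depletion
        for position, base in enumerate(pam):
            columns[position][base] += enrichment
    matrix = []
    for column in columns:
        total = sum(column.values())
        matrix.append({base: weight / total for base, weight in sorted(column.items())})
    return matrix


# Hypothetical depletion scores for two NNNN PAMs.
for position, weights in enumerate(logo_matrix({"GAAA": 0.5, "GAAT": 1.0}), start=1):
    print(position, weights)
```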

[0045] Figures 39A-39E show mammalian C to T and A to G base editing activity of evolved variants and SpCas9-NG. Figure 39A shows cytosine base editing in HEK293T cells across 64 endogenous mammalian sites containing NANN PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, and BE4-NG. Mean and SE of three independent biological replicates are shown. Figure 39B shows a summary of cytosine base editing in HEK293T cells across 48 endogenous mammalian sites containing NANH (H = non-G) PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, and BE4-NG. Mean and SE of three independent biological replicates are shown. Figure 39C shows adenine base editing in HEK293T cells across 27 endogenous mammalian sites containing NANN PAMs for ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. Mean and SE of three independent biological replicates are shown. Figure 39D shows the fraction of pathogenic SNPs in the ClinVar Database that could in principle be corrected by a C•G to T•A (left) or A•T to G•C (right) base conversion using NR PAMs. Figure 39E shows the number of possible sgRNAs capable of targeting pathogenic SNPs in the ClinVar Database using NR, NG, or NGG PAMs.

[0046] Figures 40A-40G show a characterization of PAM preferences using a genomically integrated human cell base editing target sequence library. Figure 40A is a schematic overview of a mammalian cell base editing library experiment. A library of matched sgRNA/protospacer target sites spanning all NNNN PAMs is stably integrated into the genome of HEK293T cells. Library cells are then transfected with, and selected for genomic integration of, plasmids encoding BE4 variants. After antibiotic selection, cells are lysed and the integrated sgRNA/protospacer site is PCR amplified for HTS analysis. Figure 40B provides a heat map of base editing activity on the NNNN PAM library in HEK293T cells, with positions 2, 3, and 4 of the PAM defined. For each construct, the mean editing across all sites containing the designated PAM over two independent biological replicates, internally normalized against the highest editing value for each construct, is shown.
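The heat-map normalization described for Figure 40B can be sketched in a few lines. This is a hypothetical reconstruction: `normalized_heatmap` and its input layout are assumptions, and "highest editing value" is read here as the construct's highest per-PAM mean.

```python
from statistics import mean


def normalized_heatmap(site_editing: dict[str, list[list[float]]]) -> dict[str, float]:
    """Heat-map values for one construct.

    site_editing maps a PAM (positions 2-4, e.g. 'GAA') to its library sites; each site
    is a list of editing percentages from independent biological replicates. The mean
    over all sites and replicates is computed per PAM and divided by the construct's
    highest per-PAM mean (an assumed reading of "highest editing value").
    """
    means = {
        pam: mean(value for site in sites for value in site)
        for pam, sites in site_editing.items()
    }
    top = max(means.values())
    if top <= 0:
        raise ValueError("no editing detected for this construct")
    return {pam: value / top for pam, value in means.items()}


# Hypothetical data: two replicates per site.
print(normalized_heatmap({"GGA": [[40, 50], [30, 40]], "AAC": [[10, 20]]}))
```

Normalizing within each construct lets the heat map compare PAM preferences across editors that differ in overall activity.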

Figures 40C-40E show the average base editing activity on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH, with PAM position 2 (C), 3 (D), or 4 (E) fixed. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. Figures 40F-40G show the effect of sgRNA length and 5’G mismatches on the base editing efficiency of profiled SpCas9 variants. The percentage decrease in editing efficiency from using a 21-nt sgRNA with either a matched (F) or mismatched (G) 5’G, compared to using a matched 20-nt sgRNA, is shown for BE4, BE4-NRRH, BE4-NRCH, BE4-NRTH, and BE4-NG on all library sequences containing NAN, NRN, NGN, or NGG PAMs. The mean and SE are plotted.
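The percentage decrease plotted in Figures 40F-40G, and the mean and standard error used throughout these figures, can be sketched as below. The function names and example values are hypothetical; SE is taken as the sample standard deviation divided by the square root of the number of values.

```python
import math
import statistics


def percent_decrease(matched_20nt: float, alternative_21nt: float) -> float:
    """Percentage decrease in editing with a 21-nt sgRNA relative to a matched 20-nt sgRNA."""
    if matched_20nt <= 0:
        raise ValueError("reference editing must be positive")
    return 100.0 * (matched_20nt - alternative_21nt) / matched_20nt


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical per-site editing (%) with 20-nt vs 21-nt sgRNAs at three library sites.
pairs = [(40.0, 30.0), (20.0, 18.0), (10.0, 4.0)]
decreases = [percent_decrease(a, b) for a, b in pairs]
print(decreases, mean_and_se(decreases))
```

A negative value would indicate that the 21-nt sgRNA edited more efficiently than the matched 20-nt sgRNA at that site.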

[0047] Figures 41A-41C show that evolved SpCas9 variants allow correction of pathogenic SNPs using non-G PAMs. Figure 41A provides an overview of an adenine base editing strategy for correcting the sickle hemoglobin (HbS) SNP. In HbS, the Glu (GAG codon) at position 6 of normal β-globin (HBB) is mutated to a Val (GTG codon). Targeting this SNP with A•T to G•C base editing on the reverse strand enables a Val to Ala (GTG to GCG) base conversion, leading to the Makassar β-globin variant (HbG), which produces phenotypically normal β-globin. Figure 41B shows A•T to G•C base editing in HEK293T cells engineered with the HbS mutation using a CACC PAM by ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. This PAM places the target A at position 7, and an off-target A, which leads to a silent Pro (CCT) to Pro (CCC) mutation, at position 9. Mean and SE of three independent biological replicates are shown. Figure 41C shows A•T to G•C base editing in HEK293T cells engineered with the HbS mutation using a CATG PAM by ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. This PAM places the target A at position 4, and an off-target A, which leads to a silent Pro (CCT) to Pro (CCC) mutation, at position 6. Mean and SE of three independent biological replicates are shown.
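The codon logic of Figure 41A can be checked directly: an adenine base editor acting on the reverse strand converts the A paired with the sense-strand T of GTG into G, so the sense codon becomes GCG. The sketch below is a simplified model that ignores the editing window and PAM placement; the function names are hypothetical and the codon table lists only the codons mentioned in this paragraph.

```python
# Only the codons discussed in this paragraph are included.
CODONS = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala", "CCT": "Pro", "CCC": "Pro"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def abe_on_reverse_strand(sense_codon: str, sense_index: int) -> str:
    """Return the sense codon after A-to-G editing of the reverse strand.

    sense_index (0-based) marks a T on the sense strand; its partner A on the reverse
    strand is deaminated (A -> inosine, read as G), so the sense T becomes C.
    """
    reverse = list(reverse_complement(sense_codon))
    reverse_index = len(sense_codon) - 1 - sense_index
    if reverse[reverse_index] != "A":
        raise ValueError("no A on the reverse strand at that position")
    reverse[reverse_index] = "G"
    return reverse_complement("".join(reverse))


hbs = "GTG"  # sickle codon 6 (Val)
edited = abe_on_reverse_strand(hbs, sense_index=1)
print(CODONS[hbs], "->", CODONS[edited])  # prints: Val -> Ala
```

Applying the same function to the bystander codon, `abe_on_reverse_strand("CCT", 2)`, gives CCC, the silent Pro to Pro change noted for Figures 41B and 41C.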

[0048] Figure 42 provides a table of NRNN PAM targeting potential by the SpCas9 and SaCas9 variants described herein. The variants SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH are disclosed and discussed herein.

[0049] Figures 43A-43F depict additional details of the Cas9:DNA binding PACE and Cas9 nuclease selections. Figure 43A shows a dual AP selection in which ω-dSpCas9 binds two distinct protospacer/PAM sequences to drive either half of split-intein pIII. Figure 43B shows that a split-intein Cas9 limits total Cas9 concentration in host cells, thus avoiding saturation of protospacer/PAM binding sites. Residues 574-1368 of Cas9 fused to NpuC are expressed by the ΔgIII SP, and ω-dSpCas9(1-573) fused to NpuN is encoded on a low-copy complementary plasmid (CP) in host cells. Figure 43C shows a combination of the selection principles from (A) and (B) through use of gVI as an additional PACE-compatible selection marker for phage propagation and a ΔgIIIΔgVI SP. Figure 43D shows an overnight propagation assay of selection phage (SP) encoding dSpCas9C on host cells containing a complementary plasmid (CP) providing either ω-dSpCas9_N or ω-dSpCas9_N-mut and an AP encoding either an AAA or CAA PAM. Figures 43E and 43F show a scheme of survival-based selection for Cas9 nuclease activity. Cells containing a high-copy selection plasmid encoding a protospacer/PAM sequence, sfGFP, and the conditionally lethal protein SacB are transformed with a library of nuclease-active Cas9s encoded on a low-copy plasmid that also includes the matching sgRNA. Binding and cleavage of the designated PAM/protospacer by Cas9 leads to destruction of the selection plasmid, resulting in loss of both sfGFP and SacB expression and allowing cells to survive on sucrose-containing media.

[0050] Figures 44A-44C show the effects of mutations on PAM recognition by SpCas9 variants. Figure 44A shows that adding the Y1131C mutation, which was enriched in the later phases of the NAT evolution trajectory, inactivates BE3-NRTH in HEK293T cells. Mean and SE of three independent biological replicates are shown. Figure 44B shows the N-terminal mutations of SpCas9-NRRH, -NRCH, and -NRTH mapped to the SpCas9 crystal structure (4UN3). Figure 44C shows CBE activity of BE3-NRRH, BE3-NRTH, and BE3-NRCH with and without the N-terminal mutations shown in (B) in HEK293T cells. Mean and SE of three independent biological replicates are shown.

[0051] Figures 45A-45D characterize SpCas9, xCas9, and evolved variants (SpCas9-NRTH, SpCas9-NRCH, and SpCas9-NRRH) in bacterial PAM depletion and mammalian indel formation experiments. Figure 45A shows bacterial PAM depletion of SpCas9-NRRH, -NRCH, -NRTH, and SpCas9-NG on a bacterial NNNN PAM library with 1 h, 3 h, and overnight Cas9 induction. The inverse of the depletion score was used to generate enrichment scores of activity on each NNNN PAM, which were then used to create sequence logos (WebLogo3.0). Figure 45B shows indel formation in HEK293T cells across endogenous mammalian sites containing NANN PAMs for xCas9, SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. Mean and SE of three independent biological replicates are shown. Figure 45C shows indel formation in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for SpCas9-NRRH, -NRTH, -NRCH, SpCas9-NG, and SpCas9. Mean and SE of three independent biological replicates are shown. Figure 45D shows GUIDE-seq analysis of SpCas9, xCas9, and evolved variants SpCas9-NRRH, -NRTH, and -NRCH targeting HEK site 4 in U2OS cells. GUIDE-seq on-target (indicated by the asterisk) and off-target sites with reads greater than or equal to 1% of total reads are shown.
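The two GUIDE-seq summaries used in these figures, the percentage of on-target reads and the list of sites at or above 1% of total reads, can be sketched as follows. The function name and the site labels and counts are hypothetical; a real GUIDE-seq pipeline also handles site identification and background subtraction, which are not modeled here.

```python
def guide_seq_summary(reads_by_site: dict[str, int], on_target: str,
                      min_fraction: float = 0.01) -> tuple[float, list[str]]:
    """Return (% on-target reads, sites with >= min_fraction of total reads).

    Reported sites are sorted from most to fewest reads.
    """
    total = sum(reads_by_site.values())
    if total == 0:
        raise ValueError("no GUIDE-seq reads")
    pct_on_target = 100.0 * reads_by_site.get(on_target, 0) / total
    shown = sorted(
        (site for site, n in reads_by_site.items() if n / total >= min_fraction),
        key=lambda site: reads_by_site[site],
        reverse=True,
    )
    return pct_on_target, shown


# Hypothetical read counts for one variant.
print(guide_seq_summary({"on": 900, "off1": 95, "off2": 5}, on_target="on"))
```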

[0052] Figures 46A-46C show the characterization of SpCas9 (BE4), SpCas9-NG (BE4-NG), and evolved CBE and ABE variants in mammalian base editing experiments. Figure 46A shows CBE activity in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, BE4-NG, and BE4. Mean and SE of three independent biological replicates are shown. Figure 46B shows ABE activity in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. Mean and SE of three independent biological replicates are shown. For target sites with NGA, NGC, and NGT PAMs, only ABE-NRRH, ABE-NRCH, and ABE-NRTH are shown, respectively, in addition to SpCas9-NG. Figure 46C shows the fraction of pathogenic SNPs in the ClinVar Database, with either a single targetable base within the window or multiple targetable bases, that could in principle be corrected by a C•G to T•A (top left) or A•T to G•C (top right) base conversion using NR PAMs, or by a C•G to T•A (bottom left) or A•T to G•C (bottom right) base conversion using NG PAMs.

[0053] Figures 47A-47F show the characterization of PAM preferences of BE4, BE4-NRRH, BE4-NRCH, and BE4-NG using a genomically integrated human cell base editing target sequence library. Figure 47A shows the distribution of the number of target sites per PAM within the integrated sgRNA library. Figure 47B shows the PAM preferences for BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH as determined by base editing on the target sequence library integrated in HEK293T cells. Sequence logos for each construct were created from the CBE activity on each NNNN PAM contained in the library (WebLogo3.0). Figure 47C shows average base editing activity on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH, with PAM position 1 fixed. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. Figures 47D-47F show the effect of sgRNA length and 5’G mismatch on the base editing efficiency of profiled SpCas9 variants. Average base editing on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH is grouped by sites containing a 20-nt sgRNA with a 5’G matched to the target sequence, a 21-nt sgRNA with a 5’G matched to the target sequence, or a 21-nt sgRNA with a mismatched 5’ nucleotide.

Average editing activity of constructs on NGN (Figure 47D), NAN (Figure 47E), and NGG (Figure 47F) PAMs is shown. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001 (Student’s t test).

[0054] Figures 48A-48C show high-throughput sequencing analysis of sickle cell locus editing by SpCas9 variant-derived ABEs. Figure 48A shows CRISPResso2 output showing the HbS mutation in an engineered HEK293T cell line. HEK293T cells were treated with nickase SpCas9, an sgRNA (binding shown in grey), and an ssODN containing the point mutation. After two rounds of transfection, sorting, and growth, the cell line sequenced above was isolated and identified to have 100% conversion to the sickle cell anemia allele. Figure 48B shows CRISPResso2 output showing ABE activity of ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG in HbS-engineered HEK293T cells using an sgRNA (gray bar) targeting a CATG PAM. Figure 48C shows CRISPResso2 output showing ABE activity of ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG in HbS-engineered HEK293T cells using an sgRNA (gray bar) targeting a CACC PAM.

DEFINITIONS

[0055] As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.

[0056] The term “base editor (BE),” or “nucleobase editor (NBE),” as used herein, refers to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, the base editor is capable of deaminating a base within a nucleic acid. In some embodiments, the base editor is capable of deaminating a base within a DNA molecule. In some embodiments, the base editor is capable of deaminating a cytosine (C) in DNA. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein fused to a nucleic acid editing domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a cytidine deaminase domain. In some embodiments, the base editor comprises a Cas9 domain (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein, fused to a cytidine deaminase. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to a cytidine deaminase domain. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to a cytidine deaminase domain. In some embodiments, the base editor includes an inhibitor of base excision repair, for example, a UGI domain or a dISN domain.

[0057] In some embodiments, the base editor is capable of deaminating an adenosine (A) in DNA. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein fused to a nucleic acid editing domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to one or more adenosine deaminase domains. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to two adenosine deaminase domains. In some embodiments, the base editor comprises a Cas9 (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein, fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to two adenosine deaminase domains. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to two adenosine deaminase domains. In some embodiments, the base editor is fused to an inhibitor of base excision repair, for example, a UGI domain or a dISN domain.
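To make the difference between cytidine and adenosine base editors concrete, the sketch below models an idealized outcome: a cytidine deaminase editor converts C to T (via U) and an adenosine deaminase editor converts A to G (via inosine) at every target base inside an activity window. The function name, the inclusive window of protospacer positions 4-8 (counted from the PAM-distal end of a 20-nt protospacer), and the all-or-nothing conversion are simplifying assumptions; real editing is partial and position dependent.

```python
def base_edit(protospacer: str, editor: str, window: tuple[int, int] = (4, 8)) -> str:
    """Predict the edited protospacer for an idealized base editor.

    editor='CBE': C -> T (cytidine deamination to U, replicated as T)
    editor='ABE': A -> G (adenosine deamination to inosine, read as G)
    Positions are 1-based from the PAM-distal end; every target base inside the
    inclusive window is converted (an assumed, simplified model).
    """
    conversions = {"CBE": ("C", "T"), "ABE": ("A", "G")}
    target, product = conversions[editor]
    lo, hi = window
    return "".join(
        product if lo <= i <= hi and base == target else base
        for i, base in enumerate(protospacer.upper(), start=1)
    )


site = "GACCCAAAGGTTTTCCGGAA"  # invented 20-nt protospacer
print(base_edit(site, "CBE"))
print(base_edit(site, "ABE"))
```

Note that the C at position 3 and the A at position 2 lie outside the assumed window and are left unchanged, which is why PAM placement (and hence PAM flexibility) determines which bases can be edited.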

[0058] The term “nucleic acid programmable DNA binding protein” or “napDNAbp” refers to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid (e.g., gRNA), that guides the napDNAbp to a specific nucleic acid sequence, for example, by hybridizing to the target nucleic acid sequence. For example, a Cas9 domain can associate with a guide RNA that guides the Cas9 domain to a specific DNA sequence that is complementary to the guide RNA. In some embodiments, the napDNAbp is a class 2 microbial CRISPR-Cas effector. In some embodiments, the napDNAbp is a Cas9 domain, for example, a nuclease-active Cas9, a Cas9 nickase (Cas9n), or a nuclease-inactive Cas9 (dCas9). Examples of nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein. It should be appreciated, however, that nucleic acid programmable DNA binding proteins also include nucleic acid programmable proteins that bind RNA. For example, the napDNAbp may be associated with a nucleic acid that guides the napDNAbp to an RNA. Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, though they may not be specifically described in this Application.

[0059] As used herein, the term “circular permutant” refers to a protein or polypeptide (e.g., a Cas9) comprising a circular permutation, which is a change in the protein’s structural configuration involving a change in the order of amino acids appearing in the protein’s amino acid sequence. In other words, circular permutants are proteins that have altered N- and C-termini as compared to a wild-type counterpart, e.g., the wild-type C-terminal half of a protein becomes the new N-terminal half.

Circular permutation (or CP) is essentially the topological rearrangement of a protein’s primary sequence, connecting its N- and C-termini, often with a peptide linker, while concurrently splitting its sequence at a different position to create new, adjacent N- and C-termini. The result is a protein structure with different connectivity that can often have the same overall three-dimensional (3D) shape, and possibly improved or altered characteristics, including reduced proteolytic susceptibility, improved catalytic activity, altered substrate or ligand binding, and/or improved thermostability. Circular permutant proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of posttranslational modifications or may be engineered using recombinant techniques.
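At the sequence level, the rearrangement described above amounts to joining the original termini through a linker and reopening the chain at a new position. A minimal sketch follows; `circularly_permute` is a hypothetical helper, the default "GGS" linker is an arbitrary illustrative choice, and practical constructs may also need an added initiator methionine, which is not modeled.

```python
def circularly_permute(sequence: str, new_start: int, linker: str = "GGS") -> str:
    """Return a circular permutant of a protein sequence.

    The original C- and N-termini are joined through `linker`, and the chain is
    opened before residue `new_start` (1-based), which becomes the new N-terminus.
    """
    if not 1 < new_start <= len(sequence):
        raise ValueError("new_start must lie inside the sequence (not residue 1)")
    i = new_start - 1
    return sequence[i:] + linker + sequence[:i]


# Toy sequence: residue 4 (D) becomes the new N-terminus.
print(circularly_permute("ABCDEF", 4))
print(circularly_permute("ABCDEF", 4, linker=""))
```

The original C-terminal segment (DEF) now precedes the linker and the original N-terminal segment (ABC), matching the description of the wild-type C-terminal half becoming the new N-terminal half.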

[0060] The term “circularly permuted Cas9” refers to any Cas9 protein, or variant thereof, that occurs as a circular permutant, whereby its N- and C-termini have been topologically rearranged. Such circularly permuted Cas9 proteins (“CP-Cas9”), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491–511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).

[0061] In some embodiments, the napDNAbp is an “RNA-programmable nuclease” or “RNA-guided nuclease.” The terms are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). Guide RNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. Guide RNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is also used to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as a single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (i.e., directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 domain. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA and comprises a stem-loop structure. In some embodiments, domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in International Patent Application PCT/US2014/054252, filed September 5, 2014, entitled “Switchable Cas9 Nucleases And Uses Thereof,” and International Patent Application PCT/US2014/054247, filed September 5, 2014, entitled “Delivery System For Functional Nucleases,” the entire contents of each of which are hereby incorporated by reference.
In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will bind two or more Cas9 domains and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease:RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (also known as Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).

[0062] Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to target, in principle, any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see, e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al., RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); DiCarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Research (2013); Jiang, W. et al., RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).
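In practice, the guide RNA can only direct SpCas9 to a protospacer that is immediately followed by a compatible PAM (NGG for wild-type SpCas9), which is the restriction the evolved variants in this document relax. The sketch below scans both strands of a DNA sequence for NGG-adjacent protospacers; `find_spcas9_sites` is a hypothetical helper, and other PAMs (e.g., NRRH) would require a different pattern.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, spacer_len: int = 20):
    """Yield (strand, start, protospacer, pam) for each protospacer followed by an NGG PAM.

    `start` is the 0-based index of the protospacer within that strand's 5'->3' sequence.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # A zero-width lookahead reports overlapping sites as well.
        for m in re.finditer(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len, seq):
            yield strand, m.start(), m.group(1), m.group(2)


# Toy example with a 4-nt "spacer" so the output stays short.
print(list(find_spcas9_sites("AAAACGG", spacer_len=4)))
print(list(find_spcas9_sites("CCGTTTT", spacer_len=4)))
```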

[0063] In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.

[0064] The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain,” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full-length Cas9 protein. A Cas9 nuclease is also sometimes referred to as a Csn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA binding and cleavage typically require protein and both RNAs. However, single guide RNAs (“sgRNA,” or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif adjacent to the target sequence (the PAM, or protospacer adjacent motif), which helps distinguish self from non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.

[0065] A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science 337:816-821 (2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science 337:816-821 (2012); Qi et al., Cell 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2).
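The link between the number of amino acid changes and the percent identity thresholds above can be made concrete for the simplest case. For a variant that differs from the reference only by substitutions, with no insertions or deletions, identity is (L - k)/L for reference length L and k substitutions. The sketch below assumes that full-length SpCas9 of SEQ ID NO: 2 is 1,368 residues long; that sequence is not reproduced in this section. Variants with insertions or deletions need a proper pairwise alignment, and conventions for the denominator vary, so this is only an arithmetic illustration.

```python
# Assumption: full-length SpCas9 (SEQ ID NO: 2) is 1,368 residues long
SPCAS9_LENGTH = 1368


def percent_identity(length: int, n_substitutions: int) -> float:
    """Percent identity of an ungapped, substitution-only variant."""
    if not 0 <= n_substitutions <= length:
        raise ValueError("substitution count must lie between 0 and length")
    return 100.0 * (length - n_substitutions) / length


def max_substitutions(length: int, min_identity: float) -> int:
    """Largest substitution count that still meets a % identity threshold.
    identity >= p  <=>  k <= length * (1 - p / 100); a small tolerance guards
    against floating-point round-off before the integer floor is taken."""
    return int(length * (1 - min_identity / 100.0) + 1e-9)


for threshold in (70, 80, 90, 95, 99, 99.5, 99.9):
    print(threshold, max_substitutions(SPCAS9_LENGTH, threshold))
print(round(percent_identity(SPCAS9_LENGTH, 50), 2))  # 50 substitutions
```

Under these assumptions, a variant carrying 50 substitutions is still roughly 96% identical to the reference, well above the 70% lower bound recited above.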

[0066] In some embodiments, proteins comprising fragments of Cas9 are provided. In some embodiments, the fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, wild type Cas9 corresponds to Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1, SEQ ID NO: 1 (nucleotide); SEQ ID NO: 2 (amino acid)).

[0069] In some embodiments, Cas9 refers to a Cas9 nickase having a D10A substitution (e.g., S.

(single underline: HNH domain; double underline: RuvC domain)

[0070] In other embodiments, Cas9 refers to a Cas9 nickase having an H840A substitution (e.g., S.

DLSQLGGD (SEQ ID NO: 8) (single underline: HNH domain; double underline: RuvC domain; H840A mutation shown in bold)

[0071] In still other embodiments, Cas9 refers to a dead Cas9 having D10A and H840A substitutions (e.g., S. pyogenes Cas9 Q99ZW2 (D10A) (H840A)) (SEQ ID NO: 9):

(D10A and H840A mutations shown in bold; see, e.g., Qi et al., Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013;152(5):1173-83, the entire contents of which are incorporated herein by reference).

[0072] In some embodiments, Cas9 refers to Cas9 protein derived from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); Geobacillus stearothermophilus (NCBI Ref: NZ_CP008934.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1), or to a Cas9 from any other organism.

[0073] In some embodiments, a Cas9 domain comprising one or more mutations provided herein is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 92% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO: 2. In some embodiments, variants of a Cas9 domain comprising one or more mutations provided herein are provided having amino acid sequences that are shorter or longer than SEQ ID NO: 2 by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids, or more.

[0074] In some embodiments, the Cas9 domain comprises a D10A mutation, while the residue at position 840 remains a histidine, relative to the amino acid sequence provided in SEQ ID NO: 2, or at the corresponding positions in another Cas9 amino acid sequence. Without wishing to be bound by any particular theory, the presence of the catalytic residue H840 restores the ability of the Cas9 to cleave the non-edited (e.g., non-deaminated) strand containing a G opposite the targeted C. Restoration of H840 (e.g., from A840) does not result in cleavage of the target strand containing the C. Such Cas9 variants are able to generate a single-strand DNA break (nick) at a specific location based on the gRNA-defined target sequence, leading to repair of the non-edited strand and ultimately resulting in a base change (e.g., a G to A change) on the non-edited strand. Briefly, the C of a C-G base pair can be deaminated to a U by a deaminase, e.g., an APOBEC deaminase. Nicking the non-edited strand, the strand having the G, facilitates removal of the G via mismatch repair mechanisms. Uracil-DNA glycosylase inhibitor protein (UGI) inhibits uracil-DNA glycosylase (UDG), thereby preventing removal of the U.
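The sequence of events in [0074] can be traced with a deliberately simplified string model: C is deaminated to U on the edited strand, the G-containing strand is nicked, and repair replaces the opposing G with A. The sketch below is illustrative only and is not a biochemical or kinetic model. It assumes that every C inside a chosen window is deaminated, that UGI fully blocks uracil excision, and that repair always copies the U-containing strand. The window of protospacer positions 4 to 8 used in the example is an assumption for illustration and is not specified in this paragraph. The protospacer sequence is hypothetical.

```python
from typing import Tuple

# Base pairing used to rebuild the nicked strand; U templates an A
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(edited: str, window: range) -> str:
    """Deaminase step: C -> U at the 1-based positions in `window`.
    Assumption: every C in the window is converted."""
    bases = list(edited)
    for pos in window:
        if bases[pos - 1] == "C":
            bases[pos - 1] = "U"
    return "".join(bases)


def repair(deaminated: str) -> Tuple[str, str]:
    """Nick-directed repair: the G-containing (nicked) strand is resynthesized
    against the U-containing strand, and U is then replicated as T.
    Returns (edited strand 5'->3', partner strand written 3'->5' so each
    position pairs with the base at the same index)."""
    partner = "".join(PAIR[b] for b in deaminated)  # G opposite U becomes A
    return deaminated.replace("U", "T"), partner


proto = "ACGCATCCGTACGTACGTAC"          # hypothetical 20-nt protospacer
before = "".join(PAIR[b] for b in proto)
mid = deaminate(proto, range(4, 9))     # assumed editing window: positions 4-8
after_top, after_bottom = repair(mid)
print(proto, before)
print(after_top, after_bottom)          # C:G -> T:A at positions 4, 7 and 8
```

The C at position 2 lies outside the assumed window and is left untouched, which is why window choice matters when a protospacer contains several Cs.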

[0075] In other embodiments, dCas9 variants having mutations other than D10A and H840A are provided, which, e.g., result in a nuclease-inactivated Cas9 (dCas9). Such mutations include, by way of example, other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain).

[0076] The term “Cas9 nickase,” “Cas9n,” or “nCas9,” as used herein, refers to a Cas9 domain that is capable of cleaving one strand of a duplexed nucleic acid molecule (e.g., a duplexed DNA molecule). In some embodiments, a Cas9 nickase comprises a D10A mutation and has a histidine at position 840 of SEQ ID NO: 2, or at the corresponding positions in another Cas9 amino acid sequence. Such a Cas9 nickase (Cas9n) has an active HNH nuclease domain and is able to cleave the non-targeted strand of DNA, i.e., the strand bound by the gRNA. Further, such a Cas9 nickase has an inactive RuvC nuclease domain and is not able to cleave the targeted strand of the DNA, i.e., the strand where base editing is desired. In other embodiments, a Cas9 nickase comprises the amino acid sequence set forth in SEQ ID NO: 8, which comprises the H840A substitution. In some embodiments, any of the Cas9 domains provided herein comprises a D10A mutation (e.g., SEQ ID NO: 7). In some embodiments, any of the Cas9 domains provided herein comprises an H840A mutation (e.g., SEQ ID NO: 8). Exemplary Cas9 nickases are shown below. However, it should be appreciated that additional Cas9 nickases that generate a single-stranded DNA break in a DNA duplex would be apparent to the skilled artisan and are within the scope of this disclosure.
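The domain logic in [0065] and [0076] can be captured in a small lookup table: the HNH subdomain cuts the gRNA-bound (complementary) strand, the RuvC subdomain cuts the displaced (non-complementary) strand, D10A disables RuvC, and H840A disables HNH. The Python sketch below classifies a set of SpCas9 substitutions as a nuclease, a nickase, or dCas9. It models only D10A and H840A and does not cover the other inactivating substitutions contemplated in [0075]. The dictionary and function names are hypothetical.

```python
# Nuclease subdomain disabled by each catalytic substitution (SpCas9 numbering).
# Assumption: only D10A and H840A are modeled.
INACTIVATES = {"D10A": "RuvC", "H840A": "HNH"}

# DNA strand cleaved by each subdomain, described relative to the gRNA
CLEAVES = {
    "HNH": "gRNA-bound (complementary) strand",
    "RuvC": "displaced (non-complementary) strand",
}


def strands_cut(mutations: set) -> list:
    """Strands still cleaved after the given substitutions."""
    disabled = {INACTIVATES[m] for m in mutations if m in INACTIVATES}
    return [CLEAVES[d] for d in ("HNH", "RuvC") if d not in disabled]


def classify(mutations: set) -> str:
    """Nuclease (two strands cut), nickase (one), or dCas9 (none)."""
    labels = {2: "nuclease", 1: "nickase", 0: "dCas9"}
    return labels[len(strands_cut(mutations))]


for muts in (set(), {"D10A"}, {"H840A"}, {"D10A", "H840A"}):
    print(sorted(muts), classify(muts), strands_cut(muts))
```

With D10A alone, the sketch reports cleavage of the gRNA-bound strand only, consistent with the D10A nickase described above, which nicks the non-edited strand while leaving the strand to be base edited intact.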

[0077] In some embodiments, Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 domain, e.g., one of the sequences provided above. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or an sgRNA, but does not comprise a functional nuclease domain, e.g., it comprises only a truncated version of a nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and Cas9 fragments will be apparent to those of skill in the art. In some embodiments, a Cas9 fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 domain. In some embodiments, a Cas9 fragment is at least 100 amino acids in length. In some embodiments, the Cas9 fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, or at least 1600 amino acids of a corresponding wild type Cas9 domain. In some embodiments, the Cas9 fragment comprises an amino acid sequence that has at least 10, at least 15, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, or at least 1200 identical contiguous amino acid residues of a corresponding wild type Cas9 domain. In some embodiments, the wild-type protein is S. pyogenes Cas9 (SpCas9) of SEQ ID NO: 2.
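The criterion of at least N identical contiguous residues shared with a corresponding wild type Cas9 domain amounts to a longest-common-substring computation. A minimal Python sketch using the standard dynamic-programming recurrence follows. The sequences in the example are hypothetical toy strings standing in for a fragment and its wild type reference. Running time grows with the product of the two lengths, which is still manageable for sequences of Cas9 size.

```python
def longest_shared_run(fragment: str, wild_type: str) -> int:
    """Length of the longest run of consecutive residues found, identically
    and in order, in both sequences (longest common substring).
    curr[j] holds the length of the common run ending at the current fragment
    residue and wild_type[j - 1]; only the previous row is kept in memory."""
    best = 0
    prev = [0] * (len(wild_type) + 1)
    for a in fragment:
        curr = [0] * (len(wild_type) + 1)
        for j, b in enumerate(wild_type, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def meets_contiguity(fragment: str, wild_type: str, min_run: int) -> bool:
    """True if the fragment shares at least `min_run` contiguous residues."""
    return longest_shared_run(fragment, wild_type) >= min_run


wt = "MKTAYIAKQRQISFVKSHFSRQ"     # hypothetical reference sequence
frag = "TAYIAKWRQISFVK"           # hypothetical fragment with one change
print(longest_shared_run(frag, wt))
print(meets_contiguity(frag, wt, 10))
```

Here the single W substitution splits the fragment into shared runs of 6 and 7 residues, so a 10-residue contiguity requirement is not met even though most residues match.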

[0078] In some embodiments, Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 domain, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and fragments will be apparent to those of ordinary skill in the art. In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Geobacillus stearothermophilus (NCBI Ref: NZ_CP008934.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1).

[0079] The term “deaminase” or “deaminase domain,” as used herein, refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism, which variant does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase from an organism.

[0080] In some embodiments, the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil. In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine or cytosine in deoxyribonucleic acid (DNA). In some embodiments, the cytidine deaminase domain comprises the amino acid sequence of any one disclosed herein. In some embodiments, the cytidine deaminase or cytidine deaminase domain is a naturally-occurring cytidine deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the cytidine deaminase or cytidine deaminase domain is a variant of a naturally-occurring cytidine deaminase from an organism, which variant does not occur in nature. For example, in some embodiments, the cytidine deaminase or cytidine deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring cytidine deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse.

[0081] In some embodiments, the deaminase or deaminase domain is an adenosine deaminase, which catalyzes the hydrolytic deamination of adenine or adenosine. In some embodiments, the deaminase or deaminase domain is an adenosine deaminase, catalyzing the hydrolytic deamination of adenosine or deoxyadenosine to inosine or deoxyinosine, respectively. In some embodiments, the adenosine deaminase catalyzes the hydrolytic deamination of adenine or adenosine in deoxyribonucleic acid (DNA). The adenosine deaminases (e.g., engineered adenosine deaminases, evolved adenosine deaminases) provided herein may be from any organism, such as a bacterium. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism. In some embodiments, the deaminase or deaminase domain does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase. In some embodiments, the adenosine deaminase is from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase. For example, the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the ecTadA deaminase does not comprise an N-terminal methionine.

[0082] In some embodiments, the TadA deaminase is an N-terminal truncated TadA. In certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[0083] In some embodiments, the TadA deaminase is a full-length E. coli TadA deaminase. For example, in certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[0084] It should be appreciated, however, that additional adenosine deaminases useful in the present application would be apparent to the skilled artisan and are within the scope of this disclosure. For example, the adenosine deaminase may be a homolog of an ADAT. Exemplary ADAT homologs include, without limitation:

[0085] The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a fusion protein provided herein, e.g., of a fusion protein comprising a Cas9 domain and a nucleic acid editing domain (e.g., a deaminase domain), may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a deaminase, a recombinase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example: the desired biological response (e.g., the specific allele, genome, or target site to be edited); the cell or tissue being targeted; and the agent (e.g., Cas9 domain, fusion protein, vector, cell, etc.) being used.

[0086] The term “immediately adjacent,” as used in the context of two nucleic acid sequences, refers to two sequences that directly abut each other as part of the same nucleic acid molecule and are not separated by one or more nucleotides. Accordingly, sequences are immediately adjacent when the nucleotide at the 3ʹ-end of one of the sequences is directly connected to the nucleotide at the 5ʹ-end of the other sequence via a phosphodiester bond.
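For sequence strings written 5ʹ to 3ʹ, the definition in [0086] means that the last character of the first sequence and the first character of the second occupy consecutive positions in the same molecule. The hypothetical Python helper below reports every such junction. It assumes both sequences lie on the same strand written 5ʹ to 3ʹ, treats the molecule as plain text, and does not model the phosphodiester chemistry itself.

```python
def adjacency_junctions(molecule: str, first: str, second: str) -> list:
    """0-based positions in `molecule` where `first` ends and `second` begins
    at the very next nucleotide, i.e., with no intervening nucleotides.
    Assumption: all sequences are written 5'->3' on the same strand."""
    hits = []
    start = molecule.find(first)
    while start != -1:
        junction = start + len(first)
        if molecule.startswith(second, junction):
            hits.append(junction)
        start = molecule.find(first, start + 1)
    return hits


print(adjacency_junctions("GGACGTTT", "ACG", "TTT"))   # directly abutting
print(adjacency_junctions("GGACGATTT", "ACG", "TTT"))  # one nucleotide apart
```

In the second call, the single A between the two sequences is enough for the helper to report no junction, mirroring the "not separated by one or more nucleotides" condition.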

[0087] The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., two domains of a fusion protein, such as, for example, a nuclease-inactive Cas9 domain and a nucleic acid editing domain (e.g., a deaminase domain). A linker may be, for example, an amino acid sequence, a peptide, or a polymer of any length and composition. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease, including a Cas9 nuclease domain, and the catalytic domain of a nucleic-acid editing protein. In some embodiments, a linker joins a dCas9 and a nucleic-acid editing protein. In some embodiments, a linker joins a Cas9n and a nucleic-acid editing protein. In some embodiments, a linker joins an RNA-programmable nuclease domain and a UGI domain. In some embodiments, a linker joins a dCas9 and a UGI domain. In some embodiments, a linker joins a Cas9n and a UGI domain. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and is connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 1-100 amino acids in length, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. In some embodiments, a linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89), which may also be referred to as the XTEN linker. In some embodiments, a linker comprises the amino acid sequence SGGS (SEQ ID NO: 90). In some embodiments, a linker comprises the amino acid sequence (SGGS)2-SGSETPGTSESATPES-(SGGS)2 (SEQ ID NO: 96), which may also be referred to as (SGGS)2-XTEN-(SGGS)2. In some embodiments, a linker comprises (SGGS)n (SEQ ID NO: 92), (GGGS)n (SEQ ID NO: 94), (GGGGS)n (SEQ ID NO: 96), (G)n (SEQ ID NO: 97), (EAAAK)n (SEQ ID NO: 99), (GGS)n (SEQ ID NO: 101), SGGS(GGS)n (SEQ ID NO: 103), (SGGS)n-SGSETPGTSESATPES-(SGGS)n (SEQ ID NO: 98), or an (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, n is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. In some embodiments, n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence:
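The linker notation in [0087] (a motif in parentheses followed by a repeat count n, with hyphens marking where segments join) can be expanded mechanically. The Python sketch below uses only sequences that appear in that paragraph, namely the XTEN sequence SGSETPGTSESATPES and the SGGS and EAAAK motifs, and enforces the stated range of n from 1 to 30. The function names are illustrative.

```python
XTEN = "SGSETPGTSESATPES"  # XTEN linker sequence given in [0087]


def repeat_motif(motif: str, n: int) -> str:
    """Expand (motif)n, e.g. (SGGS)2 -> SGGSSGGS.
    The 1-30 range for n follows the text of [0087]."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


def sggs_xten_sggs(n: int) -> str:
    """(SGGS)n-XTEN-(SGGS)n; the hyphens in the notation only mark where
    segments join and contribute no residues."""
    return repeat_motif("SGGS", n) + XTEN + repeat_motif("SGGS", n)


for n in (1, 2, 3):
    linker = sggs_xten_sggs(n)
    print(n, len(linker), linker)
```

For n = 2, this gives the (SGGS)2-XTEN-(SGGS)2 linker of 8 + 16 + 8 = 32 residues.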

[0088] The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue, followed by the position of the residue within the sequence and by the identity of the newly substituted residue (e.g., D10A denotes replacement of the aspartate at position 10 with an alanine). Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are described in, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
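The naming convention in [0088] (original residue, position, new residue) is regular enough to parse programmatically. The sketch below parses single-residue substitutions and applies them to a sequence, checking that the stated original residue is actually present at that position. It treats X in the original-residue slot as matching any residue, following the X10T-style notation used elsewhere in this document, and it does not handle insertions or deletions. The 10-residue peptide in the example is only an illustration.

```python
import re

# One-letter residue, 1-based position, one-letter residue (e.g., D10A)
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code: str) -> tuple:
    """Split 'D10A' into ('D', 10, 'A')."""
    m = SUBSTITUTION.fullmatch(code)
    if m is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_substitutions(seq: str, codes: list) -> str:
    """Apply substitutions, verifying the original residue at each position.
    Assumption: 'X' as the original residue (as in X10T) matches any residue;
    insertions and deletions are not supported."""
    residues = list(seq)
    for code in codes:
        original, pos, new = parse_substitution(code)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{code}: position {pos} is outside the sequence")
        found = residues[pos - 1]
        if original != "X" and found != original:
            raise ValueError(f"{code}: residue {pos} is {found}, not {original}")
        residues[pos - 1] = new
    return "".join(residues)


peptide = "MDKKYSIGLD"  # short example peptide
print(parse_substitution("D10A"))
print(apply_substitutions(peptide, ["D10A", "X2N"]))
```

Checking the stated original residue catches numbering mistakes, such as applying a substitution defined against one reference sequence to a differently numbered homolog.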

[0089] The terms “nucleic acid” and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single- and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, or DNA/RNA hybrid, or may include non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs, such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5ʹ to 3ʹ direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2ʹ-fluororibose, ribose, 2ʹ-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5ʹ-N-phosphoramidite linkages). In some embodiments, an RNA is an RNA associated with the Cas9 system. For example, the RNA may be a CRISPR RNA (crRNA), a trans-encoded small RNA (tracrRNA), a single guide RNA (sgRNA), or a guide RNA (gRNA).
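Because nucleic acid sequences in this disclosure are written 5ʹ to 3ʹ by default, writing the opposite strand in the same convention requires reversing it as well as complementing it. A minimal Python sketch for unmodified DNA follows. It assumes only A, C, G, and T; nucleoside analogs, RNA uracil, and ambiguity codes are deliberately out of scope and are rejected.

```python
# Watson-Crick complements for unmodified DNA, upper and lower case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, also written 5'->3'.
    Assumption: only unmodified A, C, G, T are present."""
    if set(seq.upper()) - set("ACGT"):
        raise ValueError("only unmodified A, C, G, T are supported")
    return seq.translate(_COMPLEMENT)[::-1]


top = "ATGCGTTAC"
print(reverse_complement(top))
```

Applying the function twice returns the original sequence, a quick sanity check that both strands are being written in the same 5ʹ to 3ʹ convention.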

[0090] The term “nucleic acid editing domain,” as used herein, refers to a protein or enzyme capable of making one or more modifications (e.g., deamination of a cytidine residue) to a nucleic acid (e.g., DNA or RNA). Exemplary nucleic acid editing domains include, but are not limited to, a deaminase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments, the nucleic acid editing domain is a deaminase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments, the nucleic acid editing domain is a deaminase domain (e.g., a cytidine deaminase, such as an APOBEC or an AID deaminase, or an adenosine deaminase, such as ecTadA). In some embodiments, the nucleic acid editing domain is a cytidine deaminase domain (e.g., an APOBEC or an AID deaminase). In some embodiments, the nucleic acid editing domain is an adenosine deaminase domain (e.g., an ecTadA).

[0091] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 113) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 114).

[0092] The term “proliferative disease,” as used herein, refers to any disease in which cell or tissue homeostasis is disturbed in that a cell or cell population exhibits an abnormally elevated proliferation rate. Proliferative diseases include hyperproliferative diseases, such as pre-neoplastic hyperplastic conditions and neoplastic diseases. Neoplastic diseases are characterized by an abnormal proliferation of cells and include both benign and malignant neoplasias. Malignant neoplasia is also referred to as cancer.

[0093] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, or synthetic, or any combination thereof. The term “fusion protein,” as used herein, refers to a hybrid polypeptide that comprises protein domains from at least two different proteins, or at least two identical protein domains (i.e., a homodimer). One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic acid editing protein. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent. In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
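The terms amino-terminal and carboxy-terminal fusion protein describe where each domain sits in a single polypeptide chain. A minimal Python sketch follows that lays out the primary sequence of a fusion from domains listed N-terminus first, joined by a peptide linker. The bracketed domain names are hypothetical placeholders, not real sequences. The NLS (PKKKRKV) and the SGGS linker are sequences given in [0091] and [0087]. Real constructs involve further choices (for example, start methionine, tags, and linker composition) that this sketch does not address.

```python
NLS = "PKKKRKV"  # nuclear localization sequence given in [0091]


def build_fusion(domains: list, linker: str = "SGGS") -> str:
    """Join domain sequences, listed N-terminus first, into one polypeptide.
    The first domain forms the amino-terminal portion and the last domain the
    carboxy-terminal portion; `linker` is placed between adjacent domains."""
    if len(domains) < 2:
        raise ValueError("a fusion protein needs at least two domains")
    return linker.join(domains)


# Hypothetical placeholders standing in for real domain sequences
deaminase = "[deaminase]"
cas9 = "[Cas9]"

print(build_fusion([deaminase, cas9, NLS]))              # N-terminal deaminase
print(build_fusion([cas9, deaminase, NLS], linker=""))   # no linker, swapped order
```

Reordering the list is all it takes to switch between an amino-terminal and a carboxy-terminal arrangement of the same domains.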

[0094] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a plant or a fungus. In some embodiments, the subject is a research animal (e.g., a rat, a mouse, or a non-human primate). In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex, of any age, and at any stage of development.

[0095] The term “target site” refers to a nucleic acid sequence or a nucleotide within a nucleic acid that is targeted or modified by an effector domain that is fused to a napDNAbp. In some embodiments, a “target site” is a sequence within a nucleic acid molecule that is deaminated by a deaminase or a fusion protein comprising a deaminase (e.g., a dCas9-deaminase fusion protein or a Cas9n-deaminase fusion protein provided herein). In some embodiments, the target site refers to a sequence within a nucleic acid molecule that is cleaved by a napDNAbp (e.g., a nuclease-active Cas9 domain) provided herein. The target site is contained within a target sequence (e.g., a target sequence comprising a reporter gene, or a target sequence comprising a gene located in a safe harbor locus).

[0096] As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.

[0097] The term “pharmaceutical composition,” as used herein, refers to a composition that can be administered to a subject in the context of treatment of a disease or disorder. In some embodiments, a pharmaceutical composition comprises an active ingredient, e.g., a nuclease or a nucleic acid encoding a nuclease, and a pharmaceutically acceptable excipient.

[0098] The term “uracil glycosylase inhibitor” or “UGI,” as used herein, refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme. In some embodiments, a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, a UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NOs: 115-120. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in SEQ ID NOs: 115-120. In some embodiments, a UGI comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NOs: 115-120, or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NOs: 115-120. In some embodiments, proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as “UGI variants.” A UGI variant shares homology to UGI, or a fragment thereof. For example, a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120.
In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120. In some embodiments, the UGI comprises the amino acid sequence of SEQ ID NO: 115, as set forth below. Exemplary uracil-DNA glycosylase inhibitor (UGI; >sp|P14739|UNGI_BPPB2):

MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 115).
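The identity thresholds above can be illustrated with a short calculation. The Python sketch below assumes the two sequences have already been aligned to the same length (gaps written as "-") and simply counts identical, non-gap columns; this is a simplification, since identity is usually determined with an alignment program, and the function name `percent_identity` and the K10A variant are hypothetical. With the 84-residue UGI of SEQ ID NO: 115, a single substitution gives 83/84, or about 98.8% identity, which meets "at least 98%" but not "at least 99%."

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A column counts as identical only if both residues match and neither is a gap ("-").
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# UGI sequence of SEQ ID NO: 115, as given above.
UGI = ("MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDES"
       "TDENVMLLTSDAPEYKPWALVIQDSNGENKIKML")

# Hypothetical variant with one substitution at position 10 (K -> A).
variant = UGI[:9] + "A" + UGI[10:]

print(f"{percent_identity(UGI, variant):.1f}%")  # prints 98.8%
```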

[0099] The term “catalytically inactive inosine-specific nuclease,” or “dead inosine-specific nuclease (dISN),” as used herein, refers to a protein that is capable of inhibiting an inosine-specific nuclease. Without wishing to be bound by any particular theory, catalytically inactive inosine glycosylases (e.g., alkyl adenine glycosylase [AAG]) will bind inosine, but will not create an abasic site or remove the inosine, thereby sterically blocking the newly formed inosine moiety from DNA damage/repair mechanisms. In some embodiments, the catalytically inactive inosine-specific nuclease is capable of binding an inosine in a nucleic acid but does not cleave the nucleic acid.

Exemplary catalytically inactive inosine-specific nucleases include, without limitation, catalytically inactive alkyl adenine glycosylase (AAG nuclease), for example, from a human, and catalytically inactive endonuclease V (EndoV nuclease), for example, from E. coli. In some embodiments, the catalytically inactive AAG nuclease comprises an E125Q mutation as shown in SEQ ID NO: 40, or a corresponding mutation in another AAG nuclease. In some embodiments, the catalytically inactive AAG nuclease comprises the amino acid sequence set forth in SEQ ID NO: 40. In some embodiments, the catalytically inactive EndoV nuclease comprises a D35A mutation as shown in SEQ ID NO: 41, or a corresponding mutation in another EndoV nuclease. In some embodiments, the catalytically inactive EndoV nuclease comprises the amino acid sequence set forth in SEQ ID NO: 41. It should be appreciated that other catalytically inactive inosine-specific nucleases (dISNs) would be apparent to the skilled artisan and are within the scope of this disclosure. Examples include:

Truncated AAG (H. sapiens) nuclease (E125Q); mutated residue shown in bold.

KGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRIVETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQA (SEQ ID NO: 116); and

EndoV nuclease (D35A); mutated residue shown in bold.

DLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGEVTRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGVASHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSALAWVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQP (SEQ ID NO: 117).

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

[00100] Streptococcus pyogenes Cas9 (SpCas9) is a widely used genome-editing tool, but its genome targeting is restricted by the requirement for an NGG PAM sequence, which can be limiting for precision genome-editing applications such as base editing, homology-directed repair, and predictable template-free genome editing. Although SpCas9 variants with alternative PAM requirements have been reported previously, their targeting scope remains restricted primarily to G-containing PAMs.

[00101] The present application provides three SpCas9 variants, evolved using an improved phage-assisted continuous evolution (PACE) Cas9 binding selection, that recognize NRTH, NRRH, and NRCH PAMs, respectively. The PAM preferences of these SpCas9 variants, along with those of the previously reported SpCas9-NG variant, are characterized by cytosine base editing, indel formation, and adenine base editing at a panel of 64 potential target sites in mammalian cells. In further aspects, the present application provides the editing efficiencies of the SpCas9 variants on a mammalian cell library of ~12,000 genomically integrated sgRNA/protospacer targets.
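PAM requirements such as NGG, NRTH, NRRH, and NRCH are written with the standard IUPAC nucleotide ambiguity codes (N = any base; R = A or G; H = A, C, or T). The Python sketch below expands such a pattern into the concrete PAM sequences it nominally covers and tests whether a given PAM fits the pattern. It describes only the nominal pattern and says nothing about how efficiently a particular variant acts on each PAM, which is what the experimental characterization above measures. The function names are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pam: str, pattern: str) -> bool:
    """True if each base of `pam` is allowed by the code at the same position of `pattern`."""
    if len(pam) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def expand_pattern(pattern: str) -> list[str]:
    """List every concrete PAM sequence encoded by a degenerate IUPAC pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in pattern.upper()))]


for pattern in ("NGG", "NRTH", "NRRH", "NRCH"):
    print(pattern, len(expand_pattern(pattern)))
# prints one line each: NGG 4, NRTH 24, NRRH 48, NRCH 24

print(pam_matches("AGTA", "NRTH"))  # True
print(pam_matches("AGTG", "NRTH"))  # False: G is not allowed by H
```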

[00102] Some aspects of this disclosure provide Cas9 proteins (e.g., SpCas9) that efficiently target nucleic acid sequences that do not include the canonical PAM sequence (5′-NGG-3′, where N is any nucleotide, for example A, T, G, or C) at their 3′-ends. It should be appreciated that the phrase “Cas9 proteins” can refer to isolated Cas9 proteins or to Cas9 domains that are part of fusion proteins. In some embodiments, the Cas9 domains provided herein comprise one or more mutations identified in directed evolution experiments using a target sequence library comprising randomized PAM sequences. The non-PAM-restricted Cas9 domains provided herein are useful for targeting DNA sequences that do not comprise the canonical PAM sequence at their 3′-end and thus greatly extend the applicability and usefulness of Cas9 technology for gene editing. The evolution of Cas9 domains that are not restricted to the canonical 5′-NGG-3′ PAM sequence has been described previously, for example, in International Patent Application No. PCT/US2016/058345, filed October 22, 2016, and published as Patent Publication No. WO 2017/070633 on April 27, 2017, entitled “Evolved Cas9 Proteins for Gene Editing,” which is incorporated herein by reference in its entirety. In addition to the Cas9 mutations and proteins listed in Publication No. WO 2017/070633, provided herein are novel additional mutations and Cas9 domains that have activity on target sequences comprising non-canonical PAM sequences. It should be understood that any of the mutations listed in Patent Publication No. WO 2017/070633 may be combined with or used in lieu of any of the mutations or Cas9 domains disclosed herein, unless explicitly stated otherwise.
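To make "a PAM at the 3′-end of the target sequence" concrete, the Python sketch below scans a DNA sequence and its reverse complement for protospacers of a chosen length that are immediately followed (on the 3′ side) by a PAM matching a degenerate pattern. The 20-nt default reflects the usual SpCas9 spacer length; the reduced code table, the function names, and the reporting format are assumptions for illustration, and the scan considers nothing about guide design other than the PAM.

```python
from typing import Iterator, NamedTuple

# Subset of the IUPAC codes; enough for patterns such as NGG, NRTH, NRRH, and NRCH.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Site(NamedTuple):
    strand: str       # "+" = the input sequence, "-" = its reverse complement
    start: int        # 0-based protospacer start on that strand
    protospacer: str
    pam: str


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, pam_pattern: str, spacer_len: int = 20) -> Iterator[Site]:
    """Yield each protospacer immediately 5' of a PAM matching `pam_pattern`, on both strands."""
    seq, pam_pattern = seq.upper(), pam_pattern.upper()
    pam_len = len(pam_pattern)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - spacer_len - pam_len + 1):
            pam = s[i + spacer_len:i + spacer_len + pam_len]
            if all(base in IUPAC[code] for base, code in zip(pam, pam_pattern)):
                yield Site(strand, i, s[i:i + spacer_len], pam)


# Toy example with a 4-nt spacer so the output stays short.
for site in find_sites("ACGTAGTA", "NRTH", spacer_len=4):
    print(site)  # Site(strand='+', start=0, protospacer='ACGT', pam='AGTA')
print(list(find_sites("ACGTAGTA", "NGG", spacer_len=4)))  # []: no NGG PAM present
```

The toy call shows the point of the paragraph above: the same stretch of DNA offers a target for an NRTH-recognizing variant but none for a strictly NGG-dependent Cas9.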

[00103] Some aspects of this disclosure provide fusion proteins that comprise a Cas9 domain and an effector domain, for example, a nucleic acid editing domain, such as a deaminase domain, a nuclease domain, a nickase domain, a recombinase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain, or a transcriptional repressor domain.

[00104] The deamination of a nucleobase by a deaminase can lead to a point mutation at the specific residue, which is referred to herein as nucleic acid editing. Fusion proteins comprising a Cas9 domain or variant thereof and a nucleic acid editing domain can thus be used for the targeted editing of nucleic acid sequences. Such fusion proteins are useful for targeted editing of DNA in vitro, e.g., for the generation of mutant cells or animals; for the introduction of targeted mutations, e.g., for the correction of genetic defects in cells ex vivo, e.g., in cells obtained from a subject that are subsequently re-introduced into the same or another subject; and for the introduction of targeted mutations, e.g., the correction of genetic defects or the introduction of deactivating mutations in disease-associated genes in a subject in vivo. Typically, the Cas9 domain of the fusion proteins described herein is a Cas9 domain comprising one or more mutations provided herein (e.g., an “xCas9” domain) that has impaired nuclease activity (e.g., a nuclease-inactive xCas9 domain). For example, in some embodiments, the Cas9 domain comprises a D10A and/or an H840A mutation in the amino acid sequence provided in SEQ ID NO: 2. Methods for the use of fusion proteins comprising Cas9 as described herein are also provided.
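Substitutions such as D10A and H840A use the usual notation of wild-type residue, 1-based position, and replacement residue. The Python sketch below applies a list of such substitutions to a protein sequence and rejects any substitution whose stated wild-type residue does not match the sequence, a simple guard against numbering errors. It assumes the sequence is numbered like the reference (for SpCas9, SEQ ID NO: 2); locating "corresponding" residues in other Cas9 sequences would first require an alignment, which is not shown. The function name and the toy sequence are hypothetical.

```python
import re

# <wild-type residue><1-based position><new residue>, e.g. "D10A"
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Return `seq` with each substitution applied, checking the wild-type residue first."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


toy = "MDKKYSIGLD"  # hypothetical 10-residue sequence with D at position 10
print(apply_substitutions(toy, ["D10A"]))  # MDKKYSIGLA
```

Raising an error on a wild-type mismatch, rather than overwriting silently, keeps a mis-numbered sequence from producing an unintended protein.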

[00105] Additional suitable nuclease-inactive Cas9 domains will be apparent to those of skill in the art based on this disclosure. Such additional exemplary suitable nuclease-inactive Cas9 domains include, but are not limited to, D10A, D839A, H840A, N863A, D10A/D839A, D10A/H840A, D10A/N863A, D839A/H840A, D839A/N863A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A mutant proteins (see, e.g., Mali et al., “Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nature Biotechnology, 2013; 31(9): 833-838, the entire contents of which are incorporated herein by reference). In some embodiments, the Cas9 domain comprises a D10A mutation of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2.

[00106] The base editors disclosed herein may also comprise a circularly permuted Cas9 variant. The term “circularly permuted Cas9” refers to any Cas9 protein, or variant thereof, that occurs or has been modified to occur as a circular permutant, whereby its N- and C-termini have been topologically rearranged. Such circularly permuted Cas9 proteins (“CP-Cas9”), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491–511, and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates any previously known CP-Cas9 or a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).

[00107] Any of the Cas9 proteins described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.

[00108] In various embodiments, the circular permutants of Cas9 may have the following structure: N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.

[00109] As an example, the present disclosure contemplates the following circular permutants of S. pyogenes Cas9 (based on the 1368-amino-acid sequence of UniProtKB Q99ZW2 (CAS9_STRP1), SEQ ID NO: 6):

[00110] N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus;

[00111] N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus;

[00112] N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus;

[00113] N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus;

[00114] N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus;

[00115] N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus;

[00116] N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus;

[00117] N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus;

[00118] N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus;

[00119] N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus;

[00120] N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus;

[00121] N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus;

[00122] N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or

[00123] N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00124] In particular embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 of SEQ ID NO: 6):

[00125] N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus;

[00126] N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus;

[00127] N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus;

[00128] N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or

[00129] N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00130] In still other embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 of SEQ ID NO: 6):

[00131] N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;

[00132] N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus;

[00133] N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus;

[00134] N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or

[00135] N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00136] In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild-type Cas9. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length. In some embodiments, the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all.

[00137] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment may correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., SEQ ID NO: 6). The N-terminal portion may correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., of SEQ ID NO: 6).

[00138] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 6). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 410 residues or fewer of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6).
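The residue counts above follow directly from the CP site: for a Cas9 of n residues permuted at 1-based site k, the C-terminal fragment moved to the new N-terminus spans residues k through n, that is, n - k + 1 residues. The short Python check below (the helper name is hypothetical) applies this to SEQ ID NO: 6 (n = 1368) and reproduces the pairing of the C-terminal 357, 341, 328, 120, and 69 residues with CP sites 1012, 1028, 1041, 1249, and 1300, as well as the roughly 30% share represented by 410 residues.

```python
SPCAS9_LENGTH = 1368  # residues in S. pyogenes Cas9 (SEQ ID NO: 6)


def c_terminal_fragment_length(cp_site: int, total: int = SPCAS9_LENGTH) -> int:
    """Number of residues (cp_site..total) moved to the new N-terminus at a 1-based CP site."""
    if not 2 <= cp_site <= total:
        raise ValueError("CP site must be an internal residue")
    return total - cp_site + 1


for site in (1012, 1028, 1041, 1249, 1300):
    length = c_terminal_fragment_length(site)
    share = 100 * length / SPCAS9_LENGTH
    print(f"CP site {site}: C-terminal {length} residues ({share:.1f}% of SpCas9)")

print(f"410 residues = {100 * 410 / SPCAS9_LENGTH:.1f}% of SpCas9")
```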

[00139] In other embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 6: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which divides the original protein into two regions: an N-terminal region and a C-terminal region; (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative to the S. pyogenes Cas9 of SEQ ID NO: 6) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP¹⁸¹, Cas9-CP¹⁹⁹, Cas9-CP²³⁰, Cas9-CP²⁷⁰, Cas9-CP³¹⁰, Cas9-CP¹⁰¹⁰, Cas9-CP¹⁰¹⁶, Cas9-CP¹⁰²³, Cas9-CP¹⁰²⁹, Cas9-CP¹⁰⁴¹, Cas9-CP¹²⁴⁷, Cas9-CP¹²⁴⁹, and Cas9-CP¹²⁸², respectively. This description is not limited to making CP variants from SEQ ID NO: 6; it may be implemented to make CP variants of any Cas9 sequence, either at CP sites that correspond to these positions or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
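Steps (a) and (b) amount to a rearrangement of the primary sequence. The Python sketch below builds N-terminus-[CP site to original C-terminus]-[optional linker]-[original N-terminus to the residue before the CP site]-C-terminus from a 1-based CP site. The function name and the toy sequence are hypothetical, and any linker passed in is the caller's choice rather than a linker specified in this disclosure.

```python
def circular_permute(seq: str, cp_site: int, linker: str = "") -> str:
    """Rearrange `seq` so that the residue at 1-based `cp_site` becomes the new N-terminus.

    Result: residues [cp_site .. end] + linker + residues [1 .. cp_site - 1]
    """
    if not 2 <= cp_site <= len(seq):
        raise ValueError("CP site must be an internal residue (2..len(seq))")
    original_c_region = seq[cp_site - 1:]  # begins with the CP-site residue
    original_n_region = seq[:cp_site - 1]
    return original_c_region + linker + original_n_region


toy = "ABCDEFGHIJ"  # stand-in string; letter i represents residue i
print(circular_permute(toy, 4))        # DEFGHIJABC
print(circular_permute(toy, 4, "GG"))  # DEFGHIJGGABC
```

Without a linker the permuted protein has the same length as the input. Applied to the 1368-residue sequence of SEQ ID NO: 6 with `cp_site=1012`, the function would place original residue 1012 at the new N-terminus and move the C-terminal 357 residues ahead of residues 1-1011.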
[00140] Exemplary CP-Cas9 amino acid sequences, based on the Cas9 of SEQ ID NO: 6, are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 6 and any examples provided herein are not meant to be limiting.

[00141] CP1012

[00146] Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 6, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting.

Cas9 domains

[00152] Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′, where N is A, C, G, or T) at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAT-3′ PAM sequence at its 3′-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAG-3′ PAM sequence at its 3′-end.

[00153] It should be appreciated that any of the amino acid mutations described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., a conservative substitution for) the second amino acid residue. For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine.
In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conservative substitutions would be recognized by the skilled artisan, and mutations to such other conservative amino acid residues are also within the scope of this disclosure.
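The residue-specific embodiments in the preceding paragraph can be read as a lookup table from the introduced residue to its permitted conservative alternatives. The Python sketch below encodes only those explicitly listed pairs (T to S; R to K; I to A, V, M, or L; K to R; D to E or N; V to A, I, M, or L; G to A), not the broader side-chain classes, and expands a substitution such as A262T into the set of substitutions that reading would cover. The table and function name are illustrative and are not an exhaustive statement of the scope of the disclosure.

```python
import re

# Introduced residue -> conservative alternatives named in the embodiments above.
CONSERVATIVE_ALTERNATIVES = {
    "T": "S",
    "R": "K",
    "I": "AVML",
    "K": "R",
    "D": "EN",
    "V": "AIML",
    "G": "A",
}


def expand_substitution(sub: str) -> list[str]:
    """Return the substitution plus its conservative variants, e.g. A262T -> [A262T, A262S]."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
    if match is None:
        raise ValueError(f"unrecognized substitution: {sub!r}")
    original, position, new = match.groups()
    alternatives = CONSERVATIVE_ALTERNATIVES.get(new, "")
    # Skip an "alternative" that would simply restore the original residue.
    return [sub] + [f"{original}{position}{alt}" for alt in alternatives if alt != original]


print(expand_substitution("A262T"))  # ['A262T', 'A262S']
print(expand_substitution("A589V"))  # ['A589V', 'A589I', 'A589M', 'A589L']
```

The same function accepts the X notation used in the mutation lists (e.g., X10T expands to X10T and X10S), because X is treated as an ordinary placeholder letter.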

Mutations in Wild-Type SpCas9

[00154] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by any of the sequences set forth in SEQ ID NOs: 2, 4, or 6-11, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of S. pyogenes Cas9 having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11. In some embodiments, the Cas9 protein comprises a RuvC and an HNH domain. In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 domain. In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 protein is a Cas9 nickase.
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11, wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, V1139A, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11.
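Language of the form "at least one mutation in an amino acid residue selected from the group consisting of residues 10, 177, 218, ..." can be checked mechanically once a candidate sequence is numbered like SEQ ID NO: 2. The Python sketch below assumes an ungapped, equal-length comparison, so that position p refers to the same residue in both sequences, and reports which listed positions differ from the reference. Identifying corresponding residues in other Cas9 sequences would first require an alignment, which is not shown; the function name and the toy sequences are hypothetical.

```python
from typing import Iterable


def mutated_positions(reference: str, candidate: str, positions: Iterable[int]) -> list[int]:
    """Return the listed 1-based positions at which `candidate` differs from `reference`."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must share the same numbering (equal length, no gaps)")
    return sorted(
        p for p in set(positions)
        if 1 <= p <= len(reference) and reference[p - 1] != candidate[p - 1]
    )


reference = "ACDEFGHIKLMNPQ"  # toy 14-residue reference
candidate = "TCDEFGHIKVMNPQ"  # differs at positions 1 and 10
hits = mutated_positions(reference, candidate, [1, 10, 12, 200])
print(hits, len(hits) >= 1)   # [1, 10] True
```

Positions outside the sequence (such as 200 in the toy call) are ignored rather than treated as mutations.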

[00155] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of S. pyogenes Cas9 having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11. In some embodiments, the Cas9 protein comprises a RuvC and an HNH domain.

[00156] In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein. In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 protein is a Cas9 nickase. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11, wherein X represents any amino acid. 
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11, wherein X represents any amino acid.

[00157] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of S. pyogenes Cas9 having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11). In some embodiments, the Cas9 protein comprises a RuvC domain and an HNH domain. In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 protein is a Cas9 nickase. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
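
A corresponding residue or mutation in another Cas9 sequence (e.g., SEQ ID NO: 4 or 6-11) is commonly located by aligning that sequence with SEQ ID NO: 2 and reading across the alignment. The Python sketch below assumes such a pairwise alignment has already been produced by some external tool and performs only the position lookup; the function name and toy alignment are hypothetical.

```python
from typing import Optional


def corresponding_position(reference_aligned: str, other_aligned: str,
                           reference_position: int) -> Optional[int]:
    """Map a 1-based residue number in the reference onto another sequence.

    Both arguments are rows of the same pairwise alignment ('-' marks a gap).
    Returns None when the reference residue is aligned to a gap, i.e., the
    other sequence has no corresponding residue in that column.
    """
    if len(reference_aligned) != len(other_aligned):
        raise ValueError("alignment rows must have equal length")
    reference_count = 0  # residues of the reference seen so far
    other_count = 0      # residues of the other sequence seen so far
    for ref_char, other_char in zip(reference_aligned, other_aligned):
        if other_char != "-":
            other_count += 1
        if ref_char != "-":
            reference_count += 1
            if reference_count == reference_position:
                return other_count if other_char != "-" else None
    raise IndexError("reference position is outside the alignment")


# Toy alignment: the other sequence lacks C and carries an extra K.
reference = "ACDE-FG"
other = "A-DEKFG"
print(corresponding_position(reference, other, 5))  # 5 (F maps to F)
```

Returning None rather than a neighbouring residue keeps gap columns explicit, so a caller can decide separately how to treat a reference position that has no counterpart.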

[00158] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations in amino acid residues selected from the group consisting of 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).
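
Whether a variant carries "at least N mutations in amino acid residues selected from the group" can be checked by comparing it residue by residue with the reference at the listed positions. The Python sketch below assumes the variant is numbered like SEQ ID NO: 2 (substitutions only, no insertions or deletions); the function names and toy sequences are hypothetical.

```python
from typing import Iterable


def mutated_positions(reference: str, variant: str, group: Iterable[int]) -> set[int]:
    """Return the 1-based positions in `group` where `variant` differs.

    Assumes `variant` shares the reference numbering (equal length, no indels).
    Repeated positions in `group` are counted once.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must share the reference numbering")
    positions = set(group)
    for position in positions:
        if not 1 <= position <= len(reference):
            raise IndexError(f"position {position} is outside the sequence")
    return {p for p in positions if reference[p - 1] != variant[p - 1]}


def has_at_least(reference: str, variant: str, group: Iterable[int], n: int) -> bool:
    """True if the variant is mutated at no fewer than `n` positions of `group`."""
    return len(mutated_positions(reference, variant, group)) >= n


# Toy sequences: the variant differs from the reference at positions 1 and 10.
reference = "ACDEFGHIKL"
variant = "SCDEFGHIKT"
print(sorted(mutated_positions(reference, variant, [1, 5, 10, 10])))  # [1, 10]
```

Collapsing the listed positions into a set means that a residue number counts once even where more than one substitution is contemplated at it.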

[00159] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00160] In some embodiments, the present disclosure may utilize any of the Cas9 variants disclosed in the SEQUENCES section herein.

[00161] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, V1139A, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00162] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations in amino acid residues selected from the group consisting of 10, 177, 218, 322, 367, 427, 589, 599, 614, 630, 631, 693, 710, 743, 753, 757, 758, 762, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1151, 1180, 1188, 1211, 1221, 1223, 1274, 1290, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00163] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X427G, X589S, X599R, X614N, X630K, X631A, X693L, X710E, X743I, X753G, X757K, X758H, X762G, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1151E, X1180G, X1188R, X1211R, X1221H, X1223S, X1274R, X1290G, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00164] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, E427G, A589S, K599R, D614N, E630K, M631A, F693L, K710E, V743I, R753G, E757K, N758H, E762G, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, K1151E, D1180G, K1188R, K1211R, Q1221H, G1223S, S1274R, V1290G, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00165] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations in amino acid residues selected from the group consisting of 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00166] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00167] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00168] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations in amino acid residues selected from the group consisting of 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 647, 652, 653, 654, 676, 687, 703, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 890, 922, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1348, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00169] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X676G, X687R, X703P, X710E, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1338T, X1348V, X1349R, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00170] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, G676G, G687R, T703P, K710E, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, S1338T, I1348V, H1349R, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00171] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations in amino acid residues selected from the group consisting of 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00172] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00173] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
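The substitution notation used in [00171]-[00176] (the wild-type residue, or X for any residue, followed by the position in SEQ ID NO: 2 and the replacement residue) can be applied mechanically to a reference sequence. The following Python sketch is illustrative only: the helper names `parse_mutation` and `apply_mutations` are hypothetical, the reference is assumed to be a one-letter amino acid string numbered from 1, and because the lists above offer alternatives at a single position (e.g., D1180G and D1180A), the sketch rejects two substitutions at the same position. A replacement written as X (as in R765X) is simply copied into the sequence.

```python
import re

# Wild-type residue (or "X" for any residue), 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a code such as 'D1135N' into ('D', 1135, 'N')."""
    match = MUTATION_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(reference: str, codes: list[str]) -> str:
    """Return a copy of `reference` carrying every substitution in `codes`.

    A wild-type letter of 'X' matches any residue; any other letter must
    agree with the reference, which catches numbering mistakes early.
    """
    residues = list(reference)
    seen: set[int] = set()
    for code in codes:
        wild_type, position, new = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if position in seen:
            raise ValueError(f"{code}: position {position} already mutated")
        current = residues[position - 1]
        if wild_type not in ("X", current):
            raise ValueError(f"{code}: reference has {current} at {position}")
        residues[position - 1] = new
        seen.add(position)
    return "".join(residues)


# Toy 10-residue fragment whose residue 10 is D, mirroring D10 of SpCas9.
print(apply_mutations("MDKKYSIGLD", ["D10A", "X2E"]))  # MEKKYSIGLA
```

Checking the recited wild-type residue against the reference is a design choice that turns an off-by-one numbering error into an immediate failure rather than a silently wrong variant.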

[00174] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1221, 1249, 1253, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00175] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1221H, X1249S, X1253K, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00176] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, Q1221H, P1249S, E1253K, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00177] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X570T mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X570S.

[00178] In some embodiments, the amino acid sequence of the Cas9 domain comprises an I570T mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is I570S.
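Where a mutation is recited as "a corresponding mutation in another Cas9 sequence," one common way to locate the corresponding residue is to align the other Cas9 sequence to SEQ ID NO: 2 and read across the alignment. This section does not specify an alignment method, so the following Python sketch assumes a pairwise alignment has already been produced by some other tool and only reports which residue of the other sequence sits in the same column; the function name and the toy alignment are hypothetical.

```python
from typing import Optional


def corresponding_position(aligned_ref: str, aligned_other: str,
                           ref_position: int) -> Optional[int]:
    """Map a 1-based residue number in the reference onto another sequence.

    Both strings are rows of one pairwise alignment ('-' marks a gap).
    Returns None when the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned rows must have equal length")
    ref_count = other_count = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if other_char != "-":
            other_count += 1
        if ref_char == "-":
            continue  # insertion in the other sequence; no reference residue
        ref_count += 1
        if ref_count == ref_position:
            return other_count if other_char != "-" else None
    raise ValueError(f"reference has fewer than {ref_position} residues")


ref_row = "MDK-KYS"
other_row = "M-KLKYS"
print(corresponding_position(ref_row, other_row, 4))  # 4
```

Returning None for a gap makes explicit that some reference positions have no counterpart in a given homolog, in which case no corresponding mutation exists at that site.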

[00179] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X589S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X589V.

[00180] In some embodiments, the amino acid sequence of the Cas9 domain comprises an A589S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is A589V.

[00181] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X630G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X630K.

[00182] In some embodiments, the amino acid sequence of the Cas9 domain comprises an E630G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is E630K.

[00183] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X631A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X631I. In some embodiments, the mutation is X631L. In some embodiments, the mutation is X631V.

[00184] In some embodiments, the amino acid sequence of the Cas9 domain comprises an M631A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is M631I. In some embodiments, the mutation is M631L. In some embodiments, the mutation is M631V.

[00185] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X647A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X647I.

[00186] In some embodiments, the amino acid sequence of the Cas9 domain comprises a V647A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is V647I.

[00187] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X654H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X654I. In some embodiments, the mutation is X654L.

[00188] In some embodiments, the amino acid sequence of the Cas9 domain comprises an R654H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is R654I. In some embodiments, the mutation is R654L.

[00189] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X890E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X890N.

[00190] In some embodiments, the amino acid sequence of the Cas9 domain comprises a K890E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is K890N.

[00191] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1016C mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1016D. In some embodiments, the mutation is X1016S.

[00192] In some embodiments, the amino acid sequence of the Cas9 domain comprises a Y1016C mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is Y1016D. In some embodiments, the mutation is Y1016S.

[00193] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1021L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1021T.

[00194] In some embodiments, the amino acid sequence of the Cas9 domain comprises an M1021L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is M1021T.

[00195] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1036D mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1036H.

[00196] In some embodiments, the amino acid sequence of the Cas9 domain comprises a Y1036D mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is Y1036H.

[00197] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1057S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1057T. In some embodiments, the mutation is X1057V.

[00198] In some embodiments, the amino acid sequence of the Cas9 domain comprises an I1057S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is I1057T. In some embodiments, the mutation is I1057V.

[00199] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1127A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1127G.

[00200] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1127A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1127G.

[00201] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1156E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1156N.

[00202] In some embodiments, the amino acid sequence of the Cas9 domain comprises a K1156E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is K1156N.

[00203] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1180E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1180G.

[00204] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1180E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1180G.

[00205] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1286H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1286K.

[00206] In some embodiments, the amino acid sequence of the Cas9 domain comprises an N1286H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is N1286K.

[00207] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1132G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1132N.

[00208] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1132G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1132N.

[00209] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1335L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1335Q.

[00210] In some embodiments, the amino acid sequence of the Cas9 domain comprises an R1335L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is R1335Q.

[00211] In some embodiments, the Cas9 protein comprises a combination of mutations that confers activity on a target sequence comprising a 5´-NAA-3´ PAM sequence at its 3’-end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 1. In some embodiments, the combination of mutations consists of conservative mutations of those in the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2, or a combination of conservative mutations thereto.

[00212] Table 1: NAA PAM Clones

[00213] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1.
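Percent identity can be computed under several conventions, and this section does not fix one. As an assumption-laden sketch, the Python below counts identical, non-gap columns of an existing alignment and divides by the alignment length; it also shows, for a hypothetical 1,368-residue reference (the length of wild-type SpCas9), how many substitutions, with no insertions or deletions, remain compatible with a given identity threshold. Function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty rows of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b != "-")
    return 100 * matches / len(aligned_a)


def max_substitutions(length: int, min_identity_percent: str) -> int:
    """Largest number of substitutions (no indels) keeping identity at or
    above the threshold; Fraction avoids float rounding at the boundary."""
    allowed = length * (100 - Fraction(min_identity_percent)) / 100
    return floor(allowed)


print(percent_identity("MDKKYSIGLD", "MDKKYSIGLA"))  # 90.0
print(max_substitutions(1368, "80"))    # 273
print(max_substitutions(1368, "99.5"))  # 6
```

The threshold is passed as a string so that values such as "99.5" are converted exactly; with binary floats, a case like 80% of 100 residues can round down to 19 instead of 20.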

[00214] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5´-NGG-3´) at its 3’ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3’ end that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3’ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.
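The PAM classes in this section are written with IUPAC nucleotide codes, in which N stands for any of A, C, G, or T, so 5´-NAA-3´ is shorthand for the four sequences AAA, CAA, GAA, and TAA listed above. A minimal Python sketch of that expansion, using the standard IUPAC code table (the function name is hypothetical):

```python
from itertools import product

# Standard IUPAC nucleotide codes used to write PAM patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(pattern: str) -> list[str]:
    """List every concrete sequence matched by an IUPAC pattern such as 'NAA'."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]


print(expand_pam("NAA"))  # ['AAA', 'CAA', 'GAA', 'TAA']
```

The same helper expands the other classes in this section, e.g. `expand_pam("NAC")` and `expand_pam("NAT")`.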

[00215] In some embodiments, the Cas9 protein comprises a combination of mutations that confers activity on a target sequence comprising a 5´-NAC-3´ PAM sequence at its 3’-end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations consists of conservative mutations of those in the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn). In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn), or a combination of conservative mutations thereto.

[00216] Table 2: NAC PAM Clones

[00217] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2.

[00218] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5´-NGG-3´) at its 3’ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3’ end that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3’ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.
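The fold-increase thresholds above compare the activity of a variant with that of SEQ ID NO: 2 on the same target sequence. Assuming activity is reported as a non-negative number on a common scale (for example, percent editing; the units are an assumption), the two comparisons used in this section, fold change and percent increase, could be sketched as follows. The `detection_floor` argument is a hypothetical safeguard for targets on which the reference protein shows no measurable activity, and reading "n-fold increased" as a ratio of at least n is likewise an assumption.

```python
def fold_change(variant_activity: float, reference_activity: float,
                detection_floor: float = 0.0) -> float:
    """Variant activity divided by reference (e.g., SEQ ID NO: 2) activity.

    Reference values below `detection_floor` are replaced by the floor so that
    near-zero reference activity does not cause a division by zero.
    """
    denominator = max(reference_activity, detection_floor)
    if denominator <= 0:
        raise ValueError("reference activity must be positive (or set a floor)")
    return variant_activity / denominator


def percent_greater(variant_activity: float, reference_activity: float) -> float:
    """How much greater the variant is, in percent (2-fold equals 100% greater)."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100 * (variant_activity - reference_activity) / reference_activity


print(fold_change(45.0, 4.5))       # 10.0
print(percent_greater(11.0, 10.0))  # 10.0
```

Because many non-NGG targets show little or no activity with the wild-type protein, fold changes computed against a floor should be read as lower bounds that depend on the chosen floor.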

[00219] In some embodiments, the Cas9 protein comprises a combination of mutations that confers activity on a target sequence comprising a 5´-NAT-3´ PAM sequence at its 3’-end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 3. In some embodiments, the combination of mutations consists of conservative mutations of those in the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1, or a combination of conservative mutations thereto.

[00220] Table 3: NAT PAM Clones

[00221] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3.

Cas9 Activity

[00222] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5´-NGG-3´) at its 3’ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3’ end that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3’ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.
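A target sequence whose 3’ end is directly adjacent to an NAT PAM can be located by scanning DNA for a protospacer immediately followed by a matching trinucleotide. The Python sketch below assumes a 20-nt protospacer (the usual SpCas9 guide length), searches only the strand it is given, and uses hypothetical names; covering the other strand would require running it on the reverse complement as well.

```python
import re

# IUPAC codes needed for the PAM classes discussed here, as regex classes.
IUPAC_CLASS = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "[AG]", "Y": "[CT]", "H": "[ACT]", "N": "[ACGT]"}


def find_sites(dna: str, pam: str,
               spacer_length: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every protospacer on this strand
    whose 3' end is directly followed by a PAM matching `pam`."""
    pam_regex = "".join(IUPAC_CLASS[code] for code in pam.upper())
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(f"(?=([ACGT]{{{spacer_length}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(dna.upper())]


print(find_sites("ACGTAAT", "NAT", spacer_length=3))  # [(1, 'CGT', 'AAT')]
```

The lookahead is used because adjacent protospacers can share bases; a plain (consuming) regex would skip any site that overlaps a previous match.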

[00223] In some embodiments, the Cas9 domain exhibits activity on a target sequence having a 3ʹ-end that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´), or on a target sequence that does not comprise the canonical PAM sequence (5´-NGG-3´), that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 domain exhibits activity on a target sequence having a 3ʹ-end that is not directly adjacent to the canonical PAM sequence (5´-NGG-3´), or on a target sequence that does not comprise the canonical PAM sequence (5´-NGG-3´), that is at least 10%, at least 15%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% greater than the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3ʹ-end of the target sequence is directly adjacent to an NGT, NGA, NGC, or NNG sequence, wherein N is A, G, T, or C. In some embodiments, the 3ʹ-end of the target sequence is directly adjacent to an AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, or TTT PAM sequence. In some embodiments, the 3ʹ-end of the target sequence is directly adjacent to a CGG, AGT, TGG, AGT, CGT, GGG, CGT, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, or CAA sequence.
In some embodiments, the Cas9 domain activity is measured by a nuclease assay, a deamination assay, a transcriptional activation assay, a binding assay, or by PCR or sequencing. In some embodiments, the transcriptional activation assay is a reporter activation assay, such as a GFP activation assay. Exemplary methods for measuring binding activity (e.g., of Cas9) using transcriptional activation assays are known in the art and would be apparent to the skilled artisan. For example, methods for measuring Cas9 activity using the tripartite activator VPR have been described in Chavez, A., et al., “Highly efficient Cas9-mediated transcriptional programming.” Nature Methods 12, 326–328 (2015), the entire contents of which are incorporated by reference herein.
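The sixty trinucleotides enumerated in [00223] are exactly the 3-nt sequences that do not match 5´-NGG-3´ (64 possible trinucleotides minus the four NGG sequences). A short Python sketch that regenerates that set, which can be used to check a PAM list for omissions or duplicates (the function name is hypothetical):

```python
from itertools import product


def non_canonical_pams() -> list[str]:
    """All 64 trinucleotides except the four that match 5'-NGG-3'."""
    return ["".join(bases) for bases in product("ACGT", repeat=3)
            if bases[1:] != ("G", "G")]


pams = non_canonical_pams()
print(len(pams))  # 60
```

Comparing `set(pams)` with a hand-written list is a quick way to confirm that the list contains each non-NGG trinucleotide exactly once.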

[00224] In some embodiments, the Cas9 domain is mutated with respect to a corresponding wild-type protein such that the mutated Cas9 domain lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. In particular embodiments, an aspartate-to-alanine substitution (D10A) in the RuvC1 catalytic domain of S. pyogenes Cas9 converts Cas9 from a nuclease that cleaves both strands to a nickase that nicks the targeted strand, or the strand that is complementary to the gRNA. A histidine-to-alanine substitution (H840A) in the HNH catalytic domain of S. pyogenes Cas9 generates a nick on the strand that is displaced by the gRNA during strand invasion, also referred to herein as the non-edited strand. The single catalytically active nuclease site of the nCas9 leaves a nick in the non-edited strand, which will direct mismatch repair machinery to read (rather than remove) the modified base during repair (i.e., a substituted guanine or guanine derivative at the target site). Other examples of mutations that render Cas9 a nickase include, without limitation, N854A and N863A in SpCas9, and corresponding mutations in other wild-type Cas9 proteins or variants thereof. Reference is made to U.S. Patent No. 8,945,839, incorporated herein by reference.
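The D10A and H840A substitutions described in [00224] each inactivate one nuclease domain (RuvC1 and HNH, respectively), so the cleavage outcome follows from which domains remain active. The Python sketch below encodes only those two mutations (N854A, N863A, and other inactivating substitutions are deliberately ignored) and returns a descriptive label; the function name and labels are hypothetical.

```python
def cleavage_state(mutations: set[str]) -> str:
    """Classify SpCas9 by which nuclease domains D10A / H840A inactivate.

    D10A disables RuvC, leaving HNH to nick the target strand (the strand
    complementary to the gRNA); H840A disables HNH, leaving RuvC to nick the
    displaced, non-target strand. Other inactivating mutations are ignored.
    """
    ruvc_active = "D10A" not in mutations
    hnh_active = "H840A" not in mutations
    if ruvc_active and hnh_active:
        return "nuclease"
    if hnh_active:
        return "nickase: target strand"
    if ruvc_active:
        return "nickase: non-target strand"
    return "catalytically dead"


print(cleavage_state({"D10A"}))  # nickase: target strand
```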

[00225] In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of SEQ ID NO: 2. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 2. In some embodiments, the Cas9 domain comprises the RuvC and HNH domains of SEQ ID NO: 2. In some embodiments, the Cas9 domain comprises a D10A and/or a H840A mutation in the amino acid sequence provided in SEQ ID NO: 2, or corresponding mutation(s) in another Cas9 sequence.

[00226] In some embodiments, the disclosure provides SpCas9 mutant proteins that work best on NRRH, NRCH, and NRTH PAMs. The SpCas9 mutant protein that works best on NRRH (“es” variant) has an amino acid sequence as presented in SEQ ID NO: 22 (underlined residues are mutated from SpCas9).

[00227] The SpCas9 mutant protein that works best on NRCH (“fn” variant) has an amino acid sequence as presented in SEQ ID NO: 23 (underlined residues are mutated from SpCas9).

[00228] The SpCas9 mutant protein that works best on NRTH (“ax” variant) has an amino acid
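The PAM classes named in [00226] use IUPAC codes (R = A or G; H = A, C, or T; N = any base). Assuming, per [00226]-[00228], that the “es”, “fn”, and “ax” variants correspond to the NRRH, NRCH, and NRTH classes, respectively, the following Python sketch reports which variant's class includes a given 4-nt PAM. The names are hypothetical, and membership in a class indicates only the PAM on which a variant is described as working best, not exclusive activity.

```python
# IUPAC codes appearing in the three PAM classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "H": "ACT", "N": "ACGT"}

# Assumed pairing of variant nicknames with PAM classes.
VARIANT_PAMS = {"es": "NRRH", "fn": "NRCH", "ax": "NRTH"}


def matches(pam: str, pattern: str) -> bool:
    """True if each base of `pam` is allowed by the IUPAC code at that position."""
    return len(pam) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def preferred_variants(pam: str) -> list[str]:
    """Variants whose best-suited PAM class includes `pam`."""
    return [name for name, pattern in VARIANT_PAMS.items()
            if matches(pam, pattern)]


print(preferred_variants("AACA"))  # ['fn']
```

Because the three classes differ only at the third position (R, C, or T) and all exclude G at the fourth, a PAM such as AGGG falls into none of them.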

[00229] Some aspects of the disclosure provide high fidelity Cas9 domains. In some embodiments, high fidelity Cas9 domains have decreased electrostatic interactions between the Cas9 domain and a sugar-phosphate backbone of a DNA, as compared to a wild-type Cas9 domain. In some embodiments, any of the Cas9 domains provided herein comprise one or more mutations that decrease the association between the Cas9 domain and a sugar-phosphate backbone of a DNA. In some embodiments, any of the Cas9 domains provided herein comprise one or more mutations that decrease the association between the Cas9 domain and a sugar-phosphate backbone of a DNA by at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%. In some embodiments, any of the Cas9 domains provided herein comprise one or more of an N497X, an R661X, a Q695X, and/or a Q926X mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence, wherein X is any amino acid. In some embodiments, any of the Cas9 domains provided herein comprise one or more of an N497A, an R661A, a Q695A, and/or a Q926A mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence. In some embodiments, the Cas9 domain comprises a D10A mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence. In some embodiments, the Cas9 domain comprises the amino acid sequence as set forth in SEQ ID NO: 135. High fidelity Cas9 domains have been described in the art and would be apparent to the skilled artisan. For example, high fidelity Cas9 domains have been described in Kleinstiver, B.P., et al. “High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects.” Nature 529, 490-495 (2016); and Slaymaker, I.M., et al. “Rationally engineered Cas9 nucleases with improved specificity.” Science 351, 84-88 (2015); the entire contents of each are incorporated herein by reference. It should be appreciated that, based on the present disclosure and knowledge in the art, mutations in any Cas9 domain may be generated to make high fidelity Cas9 domains that have decreased electrostatic interactions between the Cas9 domain and a sugar-phosphate backbone of a DNA, as compared to a wild-type Cas9 domain.
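Mutations such as N497A or D10A follow the usual convention of wild-type residue, 1-based position, then replacement residue. The claims also use X to mean "any amino acid." The following minimal sketch assumes plain one-letter protein strings and applies such substitutions while checking the stated wild-type residue. A leading X (as in claim-style X10T) skips that check, and an X replacement is rejected because it is a placeholder rather than a residue. Mapping a "corresponding mutation in another Cas9 sequence" would also require an alignment, which this sketch does not attempt. The function and the toy sequence are hypothetical.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, mutations: list) -> str:
    """Apply point substitutions such as 'D10A' to a one-letter protein sequence."""
    residues = list(sequence.upper())
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if replacement == "X":
            raise ValueError(f"{mutation!r}: X is a placeholder, not a residue")
        index = position - 1  # mutation notation is 1-based
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation!r}: position outside the sequence")
        # A wild-type residue of X means "any amino acid", so skip the check.
        if wild_type != "X" and residues[index] != wild_type:
            raise ValueError(
                f"{mutation!r}: expected {wild_type} at {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# Toy 10-residue fragment; D10A changes the final D to A.
print(apply_substitutions("MDKKYSIGLD", ["D10A"]))
```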

[00230] Cas9 domain, in which mutations relative to the Cas9 of SEQ ID NO: 6 are shown in bold and underlined:

[00231] In some embodiments, the Cas9 domain of any of the fusion proteins provided herein comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence set forth as SEQ ID NO: 10 (S. aureus Cas9), below. In some embodiments, the Cas9 domain of any of the fusion proteins provided herein comprises the amino acid sequence of SEQ ID NO: 10. In some embodiments, the Cas9 domain of any of the fusion proteins provided herein consists of the amino acid sequence of SEQ ID NO: 10.

[00232] An exemplary SaCas9 amino acid sequence is:

[00233] An additional Cas9 domain with altered PAM specificity, such as a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 11, GeoCas9) may be used.

[00234] In some embodiments, a Cas9 domain refers to a Cas9 or Cas9 homolog from archaea (e.g., nanoarchaea), which constitute a domain and kingdom of single-celled prokaryotic microbes. In some embodiments, a Cas9 domain may comprise a CasX (now referred to as Cas12e) or CasY (now referred to as Cas12d) domain, which have been described in, for example, Burstein et al., “New CRISPR–Cas systems from uncultivated microbes.” Cell Res. 2017 Feb 21. doi: 10.1038/cr.2017.21, and Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature. 2019; 566(7743):218-223, each of which is incorporated herein by reference. Using genome-resolved metagenomics, a number of CRISPR–Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR–Cas system. In bacteria, two previously unknown systems were discovered, CRISPR–CasX and CRISPR–CasY, which are among the most compact systems yet discovered. In some embodiments, the napDNAbp domain refers to CasX, or a variant of CasX. In some embodiments, the napDNAbp domain refers to CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a napDNAbp and are within the scope of this disclosure. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein.

Cytidine deaminases

[00235] In some embodiments, the deaminase domain is a cytidine deaminase domain. A cytidine deaminase domain may also be referred to interchangeably as a cytosine deaminase domain. In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine (C) or deoxycytidine (dC) to uridine (U) or deoxyuridine (dU), respectively. In some embodiments, the cytidine deaminase domain catalyzes the hydrolytic deamination of cytosine (C) to uracil (U). In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine or cytosine in deoxyribonucleic acid (DNA). Without wishing to be bound by any particular theory, fusion proteins comprising a cytidine deaminase are useful inter alia for targeted editing, referred to herein as “base editing,” of nucleic acid sequences in vitro and in vivo.

[00236] One exemplary suitable type of cytidine deaminase is a member of the APOBEC family. The apolipoprotein B mRNA-editing complex (APOBEC) family of cytidine deaminase enzymes encompasses eleven proteins that serve to initiate mutagenesis in a controlled and beneficial manner (see, e.g., Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome Biol. 2008; 9(6):229). One family member, activation-induced cytidine deaminase (AID), is responsible for the maturation of antibodies by converting cytosines in ssDNA to uracils in a transcription-dependent, strand-biased fashion (see, e.g., Reynaud CA, et al. What role for AID: mutator, or assembler of the immunoglobulin mutasome, Nat Immunol. 2003; 4(7):631-638). The apolipoprotein B editing complex 3 (APOBEC3) enzyme provides protection to human cells against a certain HIV-1 strain via the deamination of cytosines in reverse-transcribed viral ssDNA (see, e.g., Bhagwat AS. DNA-cytosine deaminases: from antibody maturation to antiviral defense. DNA Repair (Amst). 2004; 3(1):85-89). These proteins all require a Zn²⁺-coordinating motif (His-X-Glu-X₂₃₋₂₆-Pro-Cys-X₂₋₄-Cys; SEQ ID NO: 405) and a bound water molecule for catalytic activity. The Glu residue acts to activate the water molecule to a zinc hydroxide for nucleophilic attack in the deamination reaction. Each family member preferentially deaminates at its own particular “hotspot,” ranging from WRC (W is A or T, R is A or G) for hAID to TTC for hAPOBEC3F (see, e.g., Navaratnam N and Sarwar R. An overview of cytidine deaminases. Int J Hematol. 2006; 83(3):195-200). A recent crystal structure of the catalytic domain of APOBEC3G revealed a secondary structure comprising a five-stranded β-sheet core flanked by six α-helices, which is believed to be conserved across the entire family (see, e.g., Holden LG, et al. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature. 2008; 456(7218):121-4). The active center loops have been shown to be responsible for both ssDNA binding and determining “hotspot” identity (see, e.g., Chelico L, et al. Biochemical basis of immunological and retroviral responses to DNA-targeted cytosine deamination by activation-induced cytidine deaminase and APOBEC3G. J Biol Chem. 2009; 284(41):27761-5). Overexpression of these enzymes has been linked to genomic instability and cancer, thus highlighting the importance of sequence-specific targeting (see, e.g., Pham P, et al. Reward versus risk: DNA cytidine deaminases triggering immunity and disease. Biochemistry. 2005; 44(8):2703-15).
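The zinc-coordinating motif and the deaminase hotspots described above are both simple sequence patterns. The following sketch expresses them as Python regular expressions, assuming one-letter protein strings and single-stranded DNA read 5'→3'. It scans only the strand it is given. A motif or hotspot match is a sequence-level hint, not evidence of catalytic activity or of actual editing. All names in the sketch are hypothetical.

```python
import re

# His-X-Glu-X(23-26)-Pro-Cys-X(2-4)-Cys as written in the text; "." stands for X.
ZINC_MOTIF = re.compile(r"H.E.{23,26}PC.{2,4}C")

# Hotspots named in the text: WRC (W = A/T, R = A/G) for hAID, TTC for hAPOBEC3F.
HOTSPOTS = {"hAID": "[AT][AG]C", "hAPOBEC3F": "TTC"}


def find_zinc_motifs(protein: str) -> list:
    """Return (1-based start, matched segment) for each non-overlapping motif match."""
    return [(m.start() + 1, m.group()) for m in ZINC_MOTIF.finditer(protein.upper())]


def hotspot_cytosines(dna: str, enzyme: str) -> list:
    """Return 1-based positions of the hotspot C (the last base of each hotspot)."""
    # A zero-width lookahead lets overlapping hotspots all be reported.
    pattern = re.compile(f"(?=({HOTSPOTS[enzyme]}))")
    return [m.start() + len(m.group(1)) for m in pattern.finditer(dna.upper())]


print(hotspot_cytosines("AACTTCGGC", "hAID"))       # AAC is a WRC hotspot
print(hotspot_cytosines("AACTTCGGC", "hAPOBEC3F"))  # TTC hotspot
```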

[00237] Some aspects of this disclosure relate to the recognition that the activity of cytidine deaminase enzymes such as APOBEC enzymes can be directed to a specific site in genomic DNA. Without wishing to be bound by any particular theory, advantages of using a nucleic acid programmable binding protein (e.g., a Cas9 domain) as a recognition agent include (1) the sequence specificity of the nucleic acid programmable binding protein (e.g., a Cas9 domain) can be altered simply by changing the sgRNA sequence; and (2) the nucleic acid programmable binding protein (e.g., a Cas9 domain) may bind to its target sequence by denaturing the dsDNA, resulting in a stretch of DNA that is single-stranded and therefore a viable substrate for the deaminase. It should be understood that other catalytic domains of napDNAbps, or catalytic domains from other nucleic acid editing proteins, can also be used to generate fusion proteins with Cas9, and that the disclosure is not limited in this regard.

[00238] In view of the results provided herein regarding the nucleotides that can be targeted by Cas9:deaminase fusion proteins, a person of ordinary skill in the art will be able to design suitable guide RNAs to target the fusion proteins to a target sequence that comprises a nucleotide to be deaminated.
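Designing a guide RNA for a Cas9:deaminase fusion, as described above, amounts to choosing a protospacer whose PAM the Cas9 variant accepts and that places the target C within reach of the deaminase. The following sketch makes several assumptions. It uses a 20-nt protospacer immediately followed by a 4-nt PAM on the same strand, and it takes the NRCH pattern as an example. It also assumes an editing window of protospacer positions 4 to 8, counted from the PAM-distal end. That window is commonly reported for BE3-type editors but is not specified in this passage. The sketch scans only the strand given, whereas a practical design would also scan the reverse complement and weigh bystander Cs. All names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pattern(seq: str, pattern: str) -> bool:
    """True if every base of `seq` is allowed by the IUPAC code at that position."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq, pattern)
    )


def candidate_guides(dna, target_pos, pam_pattern="NRCH", spacer_len=20, window=(4, 8)):
    """List (spacer, pam, k) where the target C sits at spacer position k in `window`.

    `dna` is read 5'->3' on the strand carrying the PAM; `target_pos` is 1-based.
    """
    dna = dna.upper()
    t = target_pos - 1  # 0-based index of the target C
    if not 0 <= t < len(dna) or dna[t] != "C":
        raise ValueError("target_pos must point at a C in dna")
    hits = []
    first, last = window
    for k in range(first, last + 1):
        start = t - (k - 1)  # spacer start that puts the target at position k
        end = start + spacer_len
        if start < 0 or end + len(pam_pattern) > len(dna):
            continue  # spacer or PAM would run off the sequence
        pam = dna[end:end + len(pam_pattern)]
        if matches_pattern(pam, pam_pattern):
            hits.append((dna[start:end], pam, k))
    return hits


dna = "AAAA" + "C" + "A" * 15 + "GACA" + "TTTT"
print(candidate_guides(dna, 5))  # the target C sits at spacer position 5, PAM GACA
```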

[00239] In some embodiments, the cytidine deaminase is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. In some embodiments, the cytidine deaminase is an APOBEC1 deaminase. In some embodiments, the cytidine deaminase is an APOBEC2 deaminase. In some embodiments, the cytidine deaminase is an APOBEC3 deaminase. In some embodiments, the cytidine deaminase is an APOBEC3A deaminase. In some embodiments, the cytidine deaminase is an APOBEC3B deaminase. In some embodiments, the cytidine deaminase is an APOBEC3C deaminase. In some embodiments, the cytidine deaminase is an APOBEC3D deaminase. In some embodiments, the cytidine deaminase is an APOBEC3E deaminase. In some embodiments, the cytidine deaminase is an APOBEC3F deaminase. In some embodiments, the cytidine deaminase is an APOBEC3G deaminase. In some embodiments, the cytidine deaminase is an APOBEC3H deaminase. In some embodiments, the cytidine deaminase is an APOBEC4 deaminase. In some embodiments, the cytidine deaminase is an activation-induced deaminase (AID). In some embodiments, the cytidine deaminase is a vertebrate cytidine deaminase. In some embodiments, the cytidine deaminase is an invertebrate cytidine deaminase. In some embodiments, the cytidine deaminase is a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse deaminase. In some embodiments, the cytidine deaminase is a human cytidine deaminase. In some embodiments, the cytidine deaminase is a rat cytidine deaminase, e.g., rAPOBEC1. In some embodiments, the cytidine deaminase is a Petromyzon marinus cytidine deaminase 1 (pmCDA1) (SEQ ID NO: 58). In some embodiments, the cytidine deaminase is a human APOBEC3G (SEQ ID NO: 60). In some embodiments, the cytidine deaminase is a fragment of the human APOBEC3G. In some embodiments, the deaminase is a human APOBEC3G variant comprising D316R and D317R mutations. In some embodiments, the deaminase is a fragment of the human APOBEC3G comprising mutations corresponding to the D316R and D317R mutations in SEQ ID NO: 61.

[00240] In some embodiments, the nucleic acid editing domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the cytidine deaminase domain of any one of SEQ ID NOs: 27-61. In some embodiments, the nucleic acid editing domain comprises the amino acid sequence of any one of SEQ ID NOs: 27-61.

[00241] Some exemplary suitable nucleic acid editing domains, e.g., cytidine deaminases and cytidine deaminase domains, that can be fused to napDNAbps (e.g., Cas9 domains) according to aspects of this disclosure are provided below. It should be understood that, in some embodiments, the active domain of the respective sequence can be used, e.g., the domain without a localizing signal (for example, without a nuclear localization sequence, a nuclear export signal, or a cytoplasmic localizing signal).

[00242] Human AID: