

Title:
CAS9 VARIANTS HAVING NON-CANONICAL PAM SPECIFICITIES AND USES THEREOF
Document Type and Number:
WIPO Patent Application WO/2020/041751
Kind Code:
A1
Abstract:
Some aspects of this disclosure provide strategies, systems, reagents, methods, and kits that are useful for engineering Cas9 and Cas9 variants that have increased activity on target sequences that do not contain the canonical PAM sequence. In some embodiments, fusion proteins comprising such Cas9 variants and nucleic acid editing domains, e.g., deaminase domains, are also provided.
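The abstract turns on whether a target sequence is directly followed at its 3′ end by the canonical S. pyogenes PAM (5′-NGG-3′, as recited in the claims below) or by a non-canonical PAM such as NAA. The following is a minimal Python sketch of scanning one DNA strand for 20-nt protospacers adjacent to a given PAM pattern. The function names, the 20-nt default, and the restriction to the single ambiguity code N are illustrative assumptions and do not come from the disclosure.

```python
import re

# Only the ambiguity code used in this document ("N" = any base) is mapped;
# other IUPAC codes are outside this sketch (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def pam_regex(pam: str) -> str:
    """Translate a PAM pattern such as 'NGG' into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(dna: str, pam: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each site on the given strand where a
    protospacer of protospacer_len nt is directly followed at its 3' end by the PAM."""
    dna = dna.upper()
    # A zero-width lookahead lets overlapping sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex(pam)))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    site = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt protospacer
    print(find_targets(site + "TGG", "NGG"))  # canonical PAM: one hit
    print(find_targets(site + "CAA", "NGG"))  # NAA PAM: no NGG hit
    print(find_targets(site + "CAA", "NAA"))  # the same site matches NAA
```

Only the strand passed in is scanned. A genome-wide search would also scan the reverse complement.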

Inventors:
LIU DAVID (US)
WANG TINA (US)
MILLER SHANNON (US)
Application Number:
PCT/US2019/047996
Publication Date:
February 27, 2020
Filing Date:
August 23, 2019
Assignee:
BROAD INST INC (US)
HARVARD COLLEGE (US)
International Classes:
C12N9/00; C12N9/22; C12N9/24; C12N15/62
Foreign References:
US20170121693A1 (2017-05-04)
US20160340662A1 (2016-11-24)
US20180073012A1 (2018-03-15)
CN107177625A (2017-09-19)
Other References:
NISHIMASU ET AL.: "Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA", CELL, vol. 156, no. 5, 27 February 2014 (2014-02-27), pages 935 - 949, XP028667665, DOI: 10.1016/j.cell.2014.02.001
See also references of EP 3841203A4
Attorney, Agent or Firm:
MCCOOL, Gabriel, J. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2.

2. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

3. The Cas9 protein of claim 1 or 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, V1139A, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

4. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 427, 589, 599, 614, 630, 631, 693, 710, 743, 753, 757, 758, 762, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1151, 1180, 1188, 1211, 1221, 1223, 1274, 1290, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2.

5. The Cas9 protein of claim 4, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X427G, X589S, X599R, X614N, X630K, X631A, X693L, X710E, X743I, X753G, X757K, X758H, X762G, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1151E, X1180G, X1188R, X1211R, X1221H, X1223S, X1274R, X1290G, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

6. The Cas9 protein of claim 4 or 5, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, E427G, A589S, K599R, D614N, E630K, M631A, F693L, K710E, V743I, R753G, E757K, N758H, E762G, Q768H, N803S, R859S, D861N, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, K1151E, D1180G, K1188R, K1211R, Q1221H, G1223S, S1274R, V1290G, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

7. The Cas9 protein of any one of claims 1-6, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1, or a combination of conservative mutations thereto.

8. The Cas9 protein of any one of claims 1-7, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.

9. The Cas9 protein of any one of claims 1-8, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2, or a combination of conservative mutations thereto.

10. The Cas9 protein of any one of claims 1-9, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2.

11. The Cas9 protein of any one of claims 1-10 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

12. The Cas9 protein of any one of claims 1-11, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

13. The Cas9 protein of any one of claims 1-12, wherein the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

14. The Cas9 protein of claim 12 or 13, wherein the 3′ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.

15. The Cas9 protein of any one of claims 12-14, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

16. The Cas9 protein of any one of claims 1-15, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2.

17. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2.

18. The Cas9 protein of claim 1, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

19. The Cas9 protein of claim 17 or 18, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

20. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 647, 652, 653, 654, 654, 654, 670, 676, 687, 703, 710, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 890, 922, 948, 959, 990, 995, 1014, 1015, 1016, 1016, 1021, 1030, 1036, 1036, 1055, 1057, 1057, 1114, 1127, 1135, 1156, 1156, 1177, 1180, 1184, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1332, 1335, 1338, 1348, 1349, 1367, 1367, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2.

21. The Cas9 protein of claim 20, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X676G, X687R, X703P, X710E, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1338T, X1348V, X1349R, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

22. The Cas9 protein of claim 20 or 21, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, G676G, G687R, T703P, K710E, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, S1338T, I1348V, S1349R, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

23. The Cas9 protein of any one of claims 17-22, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2, or a combination of conservative mutations thereto.

24. The Cas9 protein of any one of claims 17-23, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.

25. The Cas9 protein of any one of claims 17-24, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn), or a combination of conservative mutations thereto.

26. The Cas9 protein of any one of claims 17-25, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn).

27. The Cas9 protein of any one of claims 17-26 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

28. The Cas9 protein of any one of claims 17-27, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

29. The Cas9 protein of any one of claims 17-28, wherein the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

30. The Cas9 protein of claim 28 or 29, wherein the 3′ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.

31. The Cas9 protein of any one of claims 28-30, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

32. The Cas9 protein of any one of claims 17-31, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2.

33. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2.

34. The Cas9 protein of claim 33, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

35. The Cas9 protein of claim 33 or 34, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X is any amino acid.

36. A Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1221, 1249, 1253, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2.

37. The Cas9 protein of claim 36, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1221H, X1249S, X1253K, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NO: 2, wherein X represents any amino acid.

38. The Cas9 protein of claim 36 or 37, wherein the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, or at least nine mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, Q1221H, P1249S, E1253K, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, wherein X represents any amino acid.

39. The Cas9 protein of any one of claims 33-38, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3 or a combination of conservative mutations thereto.

40. The Cas9 protein of any one of claims 33-39, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.

41. The Cas9 protein of any one of claims 33-40, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1, or a combination of conservative mutations thereto.

42. The Cas9 protein of any one of claims 33-41, wherein the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1.

43. The Cas9 protein of any one of claims 33-42 comprising an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 as provided by SEQ ID NO: 2.

44. The Cas9 protein of any one of claims 33-43, wherein the Cas9 exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

45. The Cas9 protein of any one of claims 33-44, wherein the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence.

46. The Cas9 protein of claim 44 or 45, wherein the 3′ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.

47. The Cas9 protein of any one of claims 44-46, wherein the activity is measured by a nuclease assay, a deamination assay, or a transcriptional activation assay.

48. The Cas9 protein of any one of claims 33-47, wherein the Cas9 protein comprises a D10A and/or a H840A mutation of the amino acid sequence provided in SEQ ID NO: 2 or a corresponding mutation, or mutations, in another Cas9 amino acid sequence.

49. The Cas9 protein of any one of claims 1-48, wherein the Cas9 exhibits an increased activity on a target sequence comprising a PAM sequence selected from the group consisting of AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, and TTT at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2.

50. The Cas9 protein of any one of claims 1-49, wherein the Cas9 protein exhibits lower off-target activity as compared to an off-target activity of the Streptococcus pyogenes Cas9 domain as provided by SEQ ID NO: 2.

51. A fusion protein comprising (i) the Cas9 protein of any one of claims 1-50, and (ii) an effector domain.

52. The fusion protein of claim 51, wherein the effector domain is a domain that comprises nuclease activity, nickase activity, recombinase activity, deaminase activity, methyltransferase activity, methylase activity, acetylase activity, acetyltransferase activity, transcriptional activation activity, or transcriptional repression activity.

53. The fusion protein of claim 51 or 52, wherein the effector domain is a nucleic acid editing domain.

54. The fusion protein of claim 53, wherein the nucleic acid editing domain comprises a deaminase domain.

55. The fusion protein of claim 54, wherein the deaminase domain is a cytidine deaminase domain.

56. The fusion protein of claim 55, wherein the cytidine deaminase domain is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase.

57. The fusion protein of claim 55 or 56, wherein the cytidine deaminase domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the cytidine deaminase domain of any one of SEQ ID NOs: 27-61.

58. The fusion protein of claim 55 or 56, wherein the cytidine deaminase domain comprises the amino acid sequence of any one of SEQ ID NOs: 27-61.

59. The fusion protein of any one of claims 51-58, wherein the fusion protein further comprises a uracil glycosylase inhibitor (UGI) domain.

60. The fusion protein of claim 59, wherein the UGI domain comprises the amino acid sequence of SEQ ID NO: 115.

61. The fusion protein of any one of claims 51-60, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 122.

62. The fusion protein of any one of claims 51-60, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 123.

63. The fusion protein of any one of claims 51-62, wherein the fusion protein further comprises a second UGI domain.

64. The fusion protein of claim 63, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 123.

65. The fusion protein of claim 63, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 124.

66. The fusion protein of claim 54, wherein the deaminase domain is an adenosine deaminase domain.

67. The fusion protein of claim 66 further comprising a second adenosine deaminase domain.

68. The fusion protein of claim 67, wherein the first adenosine deaminase domain and the second adenosine deaminase domain comprise an ecTadA domain, or variant thereof.

69. The fusion protein of claim 68, wherein the first adenosine deaminase domain and the second adenosine deaminase domain comprise the amino acid sequence of any one of SEQ ID NOs: 62-84.

70. The fusion protein of claim 69, wherein the first adenosine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 62-84.

71. The fusion protein of claim 69, wherein the second adenosine deaminase comprises the amino acid sequence of any one of SEQ ID NOs: 62-84.

72. The fusion protein of any one of claims 66-71, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 127.

73. The fusion protein of any one of claims 66-71, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 128.

74. A complex comprising the fusion protein of any one of claims 51-73, and a guide RNA bound to the Cas9 protein.

75. The complex of claim 74, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.

76. The complex of claim 75, wherein the 3′ end of the target sequence is directly adjacent to an AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, or TTT sequence.

77. The complex of claim 75 or 76, wherein the 3′ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.

78. The complex of claim 75 or 76, wherein the 3′ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.

79. The complex of claim 75 or 76, wherein the 3′ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.

80. The complex of any one of claims 74-79, wherein the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long.

81. The complex of any one of claims 75-80, wherein the guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target sequence.

82. The complex of any one of claims 75-81, wherein the target sequence is a DNA sequence.

83. The complex of claim 82, wherein the target sequence is a sequence in the genome of a mammal.

84. The complex of claim 83, wherein the target sequence is a sequence in the genome of a human.

85. The complex of any one of claims 75-84, wherein the target sequence comprises a sequence associated with a disease or disorder.

86. The complex of claim 85, wherein the target sequence comprises a point mutation associated with a disease or disorder.

87. The complex of claim 86, wherein the complex edits a point mutation in the target sequence.

88. The complex of claim 87, wherein the point mutation is located between about 10 and about 20 nucleotides upstream of the PAM in the target sequence.

89. The complex of claim 87 or 88, wherein the target sequence comprises a T to C point mutation.

90. The complex of claim 89, wherein the complex deaminates the target C point mutation, and wherein the deamination results in a sequence that is not associated with a disease or disorder. 91. The complex of claim 90, wherein the target C point mutation is present in the DNA strand that is not complementary to the guide RNA. 92. The complex of claim 87 or 88, wherein the target sequence comprises a G to A point mutation. 93. The complex of claim 92, wherein the complex deaminates the target A point mutation, and wherein the deamination results in a sequence that is not associated with a disease or disorder. 94. The complex of claim 93, wherein the target A point mutation is present in the DNA strand that is not complementary to the guide RNA. 95. The complex of any one of claims 74-94, wherein the complex exhibits increased deamination efficiency of a point mutation in a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3ʹ end as compared to the deamination efficiency of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 96. The complex of claim 95, wherein the complex exhibits increased deamination efficiency of a point mutation in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the deamination efficiency of a complex comprising the Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2 on the same target sequence. 97. The complex of any one of claims 90-96, wherein a deamination activity is measured using a deamination assay, PCR, or sequencing. 98. The complex of any one of claims 74-97, wherein the complex produces fewer indels in a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3ʹ end as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 99. The complex of claim 98, wherein the complex produces fewer indels in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold lower as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2 on the same target sequence. 100. The complex of any one of claims 98-99, wherein indels are measured using high-throughput sequencing.

101. The complex of any one of claims 74-100, wherein the complex exhibits a decreased off-target activity as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 102. The complex of claim 101, wherein the off-target activity of the complex is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold decreased as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 protein as provided by SEQ ID NO: 2. 103. The complex of any one of claims 75-102, wherein the target sequence is in the genome of an organism. 104. The complex of claim 103, wherein the organism is a prokaryote. 105. The complex of claim 104, wherein the prokaryote is a bacterium. 106. The complex of claim 103, wherein the organism is a eukaryote. 107. The complex of claim 103, wherein the organism is a plant or fungus. 108. The complex of claim 103, wherein the organism is a vertebrate. 109. The complex of claim 108, wherein the vertebrate is a mammal.

110. The complex of claim 109, wherein the mammal is a human. 111. The complex of claim 103, wherein the organism is a cell. 112. The complex of claim 111, wherein the cell is a human cell. 113. A method comprising contacting a nucleic acid with the fusion protein of any one of claims 51-73, and with a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 114. A method comprising contacting a cell with the fusion protein of any one of claims 51-73, and with a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 115. A method comprising contacting a nucleic acid with the complex of any one of claims 74-112. 116. A method comprising contacting a cell with the complex of any one of claims 74-112. 117. The method of any one of claims 113-116, wherein the contacting is performed in vitro. 118. The method of any one of claims 114-116, wherein the contacting is performed in vivo.

119. A method comprising administering to a subject the fusion protein of any one of claims 51-73, and a guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. 120. A method comprising administering to a subject the complex of any one of claims 74-112. 121. The method of any one of claims 113-120, wherein the target sequence of the nucleic acid is a DNA sequence. 122. The method of any one of claims 113-121, wherein the 3ʹ end of the target sequence is not immediately adjacent to the canonical PAM sequence (5′-NGG-3′). 123. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAA, GAA, CAA, and TAA. 124. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAC, GAC, CAC, and TAC. 125. The method of claim 122, wherein the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of AAT, GAT, CAT, and TAT. 126. The method of any one of claims 113-125, wherein the target sequence comprises a sequence associated with a disease or disorder.

127. The method of claim 126, wherein the target DNA sequence comprises a point mutation associated with a disease or disorder. 128. The method of claim 127, wherein the activity of the fusion protein, or the activity of the complex, results in a correction of the point mutation. 129. The method of any one of claims 127-128, wherein the target DNA sequence comprises a T to C point mutation associated with a disease or disorder, and wherein the deamination of the mutant C base results in a sequence that is not associated with a disease or disorder. 130. The method of claim 129, wherein the target DNA sequence encodes a protein and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. 131. The method of claim 130, wherein the deamination of the mutant C results in a change of the amino acid encoded by the mutant codon. 132. The method of claim 131, wherein the deamination of the mutant C results in the codon encoding the wild-type amino acid. 133. The method of any one of claims 127-128, wherein the target DNA sequence comprises a G to A point mutation associated with a disease or disorder, and wherein the deamination of the mutant A base results in a sequence that is not associated with a disease or disorder.
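The codon-level logic of claims 129-136 can be illustrated with a small Python sketch. It assumes that the net outcome of deamination, after repair and replication, is C to T (via uracil) for a cytidine deaminase and A to G (via inosine) for an adenosine deaminase. It models only the coding strand and a single edited position, so it ignores strand choice and bystander edits. The example codons are hypothetical and are not taken from the disclosure, and `deaminate` is an illustrative helper rather than any library API.

```python
# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed net result of deamination on the edited strand:
# C -> U (read as T) for cytidine deaminases, A -> I (read as G) for adenosine deaminases.
DEAMINATION_OUTCOME = {"C": "T", "A": "G"}


def deaminate(codon: str, index: int) -> str:
    """Return the codon after deaminating the base at `index` (hypothetical helper)."""
    base = codon[index]
    if base not in DEAMINATION_OUTCOME:
        raise ValueError(f"position {index} is {base!r}, not a deaminatable C or A")
    return codon[:index] + DEAMINATION_OUTCOME[base] + codon[index + 1:]


# Hypothetical T-to-C mutation: TTT (Phe) -> TCT (Ser); C deamination restores Phe.
mutant = "TCT"
print(CODON_TABLE[mutant], CODON_TABLE[deaminate(mutant, 1)])  # S F

# Hypothetical G-to-A mutation: TGG (Trp) -> TAG (stop); A deamination restores Trp.
mutant = "TAG"
print(CODON_TABLE[mutant], CODON_TABLE[deaminate(mutant, 1)])  # * W
```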

134. The method of claim 133, wherein the target DNA sequence encodes a protein and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. 135. The method of claim 134, wherein the deamination of the mutant A results in a change of the amino acid encoded by the mutant codon. 136. The method of claim 135, wherein the deamination of the mutant A results in the codon encoding the wild-type amino acid. 137. The method of any one of claims 113-136, wherein the contacting is in vivo in a subject. 138. The method of claim 137, wherein the subject has or has been diagnosed with a disease or disorder. 139. The method of claim 137 or 138, wherein the disease or disorder is a proliferative disease, a genetic disease, a neoplastic disease, a metabolic disease, or a lysosomal storage disease. 140. A kit comprising a nucleic acid construct, comprising:

(a) a nucleic acid sequence encoding the fusion protein of any one of claims 51-73; and (b) a heterologous promoter that drives expression of the sequence of (a). 141. A kit comprising a nucleic acid construct, comprising:

(a) a nucleic acid sequence encoding the complex of any one of claims 74-112; and (b) a heterologous promoter that drives expression of the sequence of (a).

142. The kit of claim 140 further comprising an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone. 143. A polynucleotide encoding the fusion protein of any one of claims 51-73 or the complex of any one of claims 74-112. 144. A vector comprising a polynucleotide of claim 143. 145. The vector of claim 144, wherein the vector comprises a heterologous promoter driving expression of the polynucleotide encoding the fusion protein or the polynucleotide encoding the complex. 146. A method comprising contacting a cell with the vector of claim 144 or 145. 147. The method of claim 146, wherein the vector is transfected into the cell. 148. The method of claim 147, wherein the vector is transfected into the cell using electroporation, heat shock, or a composition comprising a cationic lipid. 149. A cell comprising the fusion protein of any one of claims 51-73, or a nucleic acid molecule encoding the fusion protein of any one of claims 51-73.

150. A cell comprising the complex of any one of claims 74-112, or a nucleic acid molecule encoding the complex of any one of claims 74-112. 151. A cell comprising the vector of claim 144 or 145. 152. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 122, wherein the SpCas9 has a non-canonical PAM specificity. 153. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 123, wherein the SpCas9 has a non-canonical PAM specificity. 154. An SpCas9 comprising the amino acid sequence of SEQ ID NO: 124, wherein the SpCas9 has a non-canonical PAM specificity. 155. A fusion protein comprising an SpCas9 of any of claims 152-154 and a cytidine deaminase. 156. The fusion protein of claim 155, wherein the cytidine deaminase comprises any one of SEQ ID NOs: 27-61. 157. A fusion protein comprising an SpCas9 of any of claims 152-154 and an adenosine deaminase. 158. The fusion protein of claim 157, wherein the adenosine deaminase comprises any one of SEQ ID NOs: 62-84.

159. A complex comprising a fusion protein of any one of claims 155-158 and a guide RNA.

Description:
CAS9 VARIANTS HAVING NON-CANONICAL PAM SPECIFICITIES AND USES THEREOF

RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 62/722,057, filed August 23, 2018, and to U.S. Provisional Patent Application No. 62/886,937, filed August 14, 2019, each of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] CRISPR-Cas systems, and especially systems based on the Cas9 enzyme from Streptococcus pyogenes (SpCas9), have successfully been engineered for genome editing and base editing in a wide range of organisms. As one example, base editors have been developed that convert Cas endonucleases into programmable nucleotide deaminases (1-3), thus facilitating the introduction of C-to-T mutations (by C-to-U deamination) or A-to-G mutations (by A-to-I deamination) without induction of a double-strand break (4, 5).

[0003] One drawback of current genome and base engineering tools (e.g., ZFNs, TALENs, and CRISPR/Cas9) is that they are limited with respect to the DNA sequences that can be targeted. For example, ZFNs and TALENs are limited because each system requires the design of a specific DNA-binding portion whose amino acid sequence depends on each individual target nucleotide sequence. CRISPR/Cas9 technologies are also limited. While Cas9 can be programmably targeted to virtually any target sequence by providing a suitable guide RNA, Cas9 strictly requires the presence of a protospacer-adjacent motif (PAM), which is typically the canonical nucleotide sequence 5′-NGG-3′ (e.g., for SpCas9), immediately adjacent to the 3′ end of the targeted nucleic acid sequence in order for the Cas9 to bind and act upon the target sequence. This requirement for a PAM sequence effectively limits the nucleotide sequences that can be efficiently targeted by Cas9.

[0004] Accordingly, there is a need for nucleic acid programmable DNA binding proteins, such as Cas9, that are capable of binding target nucleotide sequences that lack canonical PAMs (e.g., 5′-NGG-3′ for SpCas9) in order to expand the scope and flexibility of genome and base editing.
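To put a rough number on how restrictive a PAM requirement is, the Python sketch below computes the probability that a random position begins a match to a PAM pattern written in IUPAC codes. It makes a simplifying assumption of independent, equally likely bases, which real genomes do not satisfy, and it considers one strand only. Under that assumption an NGG PAM starts at about 1 in 16 positions, while an NAN PAM, the class targeted in this disclosure, starts at about 1 in 4. The function name is hypothetical.

```python
from fractions import Fraction

# IUPAC nucleotide codes used in this document, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "H": "ACT", "N": "ACGT"}


def expected_pam_frequency(pattern: str) -> Fraction:
    """Probability that a random position starts a match to `pattern`,
    assuming independent, equally likely bases (a simplification)."""
    freq = Fraction(1)
    for code in pattern:
        freq *= Fraction(len(IUPAC[code]), 4)
    return freq


for pam in ("NGG", "NAN"):
    f = expected_pam_frequency(pam)
    print(pam, f, f"~{float(f) * 1000:.1f} expected sites per 1,000 positions")
# NGG 1/16 ~62.5 expected sites per 1,000 positions
# NAN 1/4 ~250.0 expected sites per 1,000 positions
```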

SUMMARY OF THE INVENTION

[0005] The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive immune system that has been modified to enable robust genome and nucleobase engineering in a variety of organisms and cell lines. CRISPR-Cas (CRISPR-associated) systems are protein-RNA complexes that use an RNA molecule (sgRNA) as a guide to localize the complex to a target nucleic acid sequence via base-pairing. In the natural systems, a Cas protein then acts as an endonuclease to cleave the targeted DNA sequence. The target nucleic acid sequence must be both complementary to the sgRNA and also contain a “protospacer-adjacent motif” (PAM) at the 3′ end of the complementary region in order for the system to function. The requirement for a PAM sequence limits the use of Cas9 technology, especially for applications that require precise Cas9 positioning, such as base editing, which requires a PAM approximately 13-17 nucleotides from the target base, and some forms of homology-directed repair, which are most efficient when DNA cleavage occurs ~10-20 base pairs away from a desired alteration. To address this limitation, researchers have harnessed natural CRISPR nucleases with different PAM requirements and engineered existing systems to accept variants of naturally recognized PAMs. Other natural CRISPR nucleases shown to function efficiently in mammalian cells include Staphylococcus aureus Cas9 (SaCas9), Acidaminococcus sp. Cpf1 (AsCpf1), Lachnospiraceae bacterium Cpf1, Campylobacter jejuni Cas9, Streptococcus thermophilus Cas9, and Neisseria meningitidis Cas9. None of these mammalian cell-compatible CRISPR nucleases, however, offers a PAM that occurs as frequently as that of SpCas9.
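The positioning constraint for base editing can be made concrete with the Python sketch below. For a given target base, it lists every PAM that begins 13-17 nucleotides 3′ of that base on the same strand. It assumes that "distance" counts from the target base to the first PAM nucleotide, with a 20-nt protospacer at positions 1-20 and the PAM at positions 21-23. Under that convention, a PAM starting d nucleotides downstream places the target at protospacer position 21 - d, so the 13-17 nt range corresponds to positions 4-8. The reverse strand is ignored, and the sequence and function names are hypothetical.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "H": "ACT", "N": "ACGT"}


def matches(site: str, pattern: str) -> bool:
    """True if `site` matches the IUPAC `pattern` base by base."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern))


def pams_in_window(seq: str, target: int, pam: str = "NGG",
                   min_dist: int = 13, max_dist: int = 17) -> list[tuple[int, int]]:
    """Return (PAM start index, protospacer position of the target base) for each
    PAM starting min_dist..max_dist nt 3' of `target` on the same strand."""
    hits = []
    for d in range(min_dist, max_dist + 1):
        start = target + d
        if matches(seq[start:start + len(pam)], pam):
            hits.append((start, 21 - d))  # PAM at 21-23 => target at 21 - d
    return hits


seq = "A" * 5 + "C" + "A" * 14 + "TGG" + "A" * 7  # hypothetical; target C at index 5
print(pams_in_window(seq, 5, "NGG"))  # [(20, 6)]
print(pams_in_window(seq, 5, "NAN"))  # [(18, 8), (22, 4)]
```

In this toy sequence, relaxing the PAM from NGG to NAN doubles the number of placements that put the target C inside the window.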

[0006] Some aspects of the disclosure relate to novel Cas9 mutants that are capable of binding to target sequences that do not include a canonical PAM sequence (5′-NGG-3′, where N is any nucleotide) at the 3′ end. The disclosure also provides methods of generating and identifying novel Cas9 variants, e.g., using Phage Assisted Continuous Evolution (PACE) and/or Phage Assisted Non-Continuous Evolution (PANCE), that are capable of recognizing (e.g., binding to) target sequences encompassing a variety of PAM sequences. In particular, methods and compositions have been developed for targeting sequences that have an adenine (A) at the second nucleotide position of the PAM (e.g., 5′-NAN-3′). It should be appreciated that target sequences having PAMs that lack one or more guanines (Gs) are particularly difficult to target given the paucity of SpCas9 activity (e.g., binding activity) on such sequences. One goal of the disclosure is to provide a repertoire of SpCas9 variants from which a variant can be selected for genome and/or base editing of a target nucleic acid sequence (e.g., DNA sequence) based on its particular PAM sequence. Such a catalogue/library of SpCas9 variants would be useful for expanding the scope of genome and base editing so that it is not restricted by any particular PAM requirement.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Figures 1A-1C show schematic representations of Phage Assisted Continuous Evolution (PACE) of Cas9 and results of SpCas9 vs xCas9 evolution. Figure 1A, PACE takes place in a fixed-volume “lagoon” that is continuously diluted with fresh host E. coli cells. Upon infection, each selection phage (SP) that encodes a Cas9 variant capable of binding the target PAM and protospacer on the accessory plasmid (AP) induces expression of gene III, resulting in infectious progeny phage that propagate the active Cas9 variant in subsequent host cells. Figure 1B, accessory plasmids representing each of 64 PAM sequences are used to select for Cas9 variants capable of binding to the PAM/protospacer sequences, where RNAP fused to the Cas9 variant induces expression of gene III upon binding to the sequence having the specific PAM. Figure 1C, data (luciferase assay) for overnight phage propagation reveal the PAMs on which SpCas9 and xCas9 have binding activity. xCas9 has a less strict PAM requirement than SpCas9.

[0008] Figures 2A-B show a schematic representation of Cas9 64-PAM Phage Assisted Non-Continuous Evolution (PANCE) and results of SpCas9 vs xCas9 PANCE evolution. Figure 2A, a 96-well PANCE format allowed simultaneous evolution on all 64 PAM sequences. PANCE is less stringent than PACE because it does not use continuous flow, thereby allowing evolution from low starting activity. Figure 2B, data (luciferase assay) for PANCE evolution at passage 2 (P2), passage 12 (P12), and passage 16 (P16) for SpCas9 (wt) or xCas9 show an increase in the ability to bind additional PAM sequences.

[0009] Figures 3A-B show clones resulting from PANCE evolution experiments using SpCas9 (N3) after passage 12, including the activity for selected clones. Figure 3A is a table listing individual clones and their mutations as compared to nuclease-inactive SpCas9. The nomenclature of each clone indicates the PAM on which the clone was evolved. For example, clones CAA-2, CAA-3, and CAA-4 were evolved using a 5′-CAA-3′ PAM sequence. Figure 3B shows activity for clones SpCas9, CAA-3, GAT-2, ATG-2, ATG-3, and AGC-3, using a luciferase assay. Clones were obtained from PANCE evolution experiments using SpCas9 (N3) after passage 12.

[0010] Figures 4A-B show clones resulting from PANCE evolution experiments using SpCas9 (N3) after passage 19, including the activity for selected clones. Figure 4A is a table listing individual clones and their mutations as compared to nuclease-inactive SpCas9. The nomenclature of each clone indicates the PAM on which the clone was evolved. For example, clones ACG-1, ACG-2, ACG-3, and ACG-4 were evolved using a 5′-ACG-3′ PAM sequence. Figure 4B shows activity for clones SpCas9, N3.19.CAA1, N3.19.CAA2, N3.19.GAA1, N3.19.GAA2, N3.19.GAC5, N3.19.GAT1, N3.19.GAT3, N3.19.ACG1, N3.19.ACG3, N3.19.ACG6, N3.19.ATG3, and N3.19.ATG6 using a luciferase assay. Clones were obtained from PANCE evolution experiments using SpCas9 (N3) after passage 19.

[0011] Figures 5A-B show clones resulting from PANCE evolution experiments using xCas9 3.7 (N4) after passage 12, including the activity for selected clones. Figure 5A is a table listing individual clones and their mutations as compared to xCas9 3.7. The table indicates the PAM on which each of the clones was evolved. For example, clones N4.12.10 TAT1, N4.12.10 TAT2, and N4.12.10 TAT3 were evolved using a 5′-TAT-3′ PAM sequence. Figure 5B shows activity for clones xCas9 (xCas9 3.7), TAT-1, TAT-3, GTA-1, GTA-3, and CAC-2 using a luciferase assay. Clones were obtained from PANCE evolution experiments using xCas9 3.9 (N4) after passage 12.

[0012] Figures 6A-B show clones resulting from PANCE evolution experiments using xCas9 3.7 (N4) after passage 19, including the activity for selected clones. Figure 6A is a table listing individual clones and their mutations as compared to xCas9 3.7. The table indicates the PAM on which each of the clones was evolved. For example, clones N4.19.AAA1, N4.19.AAA2, N4.19.AAA4, and N4.19.AAA7 were evolved using a 5′-AAA-3′ PAM sequence. Figure 6B shows activity for N4.19.AAA1, N4.19.TAA2, N4.19.TAA5, N4.19.TAT5, N4.19.CAC5, N4.19.CAC6, N4.19.GTA2, N4.19.GTA7, N4.19.GCC2, N4.19.GCC5, and N4.19.GCC8 using a luciferase assay. Clones were obtained from PANCE evolution experiments using xCas9 3.9 (N4) after passage 19.

[0013] Figure 7 shows the results of mammalian cell editing using the cytidine base editor BE3 containing various evolved Cas9 clones (top). Indel formation for each of the clones as nuclease-active Cas9s is also provided (bottom).

[0014] Figure 8 shows activity data (luciferase assay) for PANCE evolution experiments after passage 2 (N6.2), passage 12 (N6.12) and passage 16 (N6.16) using N4.12.TAT1 as the starting clone (N6). Increased shading indicates increased activity as described in Figure 1C.

[0015] Figures 9A-B show the mutations of TAT1 as well as activity data (luciferase assay) on all 64 possible PAM sequences. Figure 9A provides the individual mutations of N4.12.TAT1 (TAT1) as compared to SpCas9. Figure 9B shows activity of TAT1 on all 64 possible PAM sequences. Increased shading indicates increased activity as described in Figure 1C.

[0016] Figure 10 shows clones resulting from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 12. The individual mutations in clones N6.12.6, N6.12.7, N6.12.25, and N6.12.28 are shown as compared to TAT1.

[0017] Figure 11 shows clones resulting from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 18. The individual mutations for each of the listed clones (e.g., N6.18.1-1, N6.18.1-2, etc.) are shown as compared to TAT1.

[0018] Figure 12 shows activity for N6.18.17-2, N6.18.18-2, N6.18.18-3, N6.18.28-2, N6.18.33-3, N6.18.39-1, N6.18.39-3, N6.18.39-4, N6.18.40-2, N6.18.40-3, N6.18.44-1, SP047a, and SpCas9 using a luciferase assay. Clones were obtained from PANCE evolution experiments using N4.12.TAT1 (N6) after passage 18 (see Figure 11).

[0019] Figures 13A-B show a split-intein PACE configuration to allow evolution of two separate activities of interest. Figure 13A shows that the bacteriophage gIII gene that produces the pIII protein is split into N-terminal (g3N) and C-terminal (g3C) fragments in two separate accessory plasmids (AP1 and AP2). AP1 and AP2 have the same PAM, but a different protospacer (it is not required that they have the same PAM, i.e., both the PAM and protospacer could be changed). Figure 13B shows the workflow for using a split-intein PACE configuration of the gIII gene.

[0020] Figures 14A-C show the evolution and activity of SpCas9 variants resulting from PACE experiments using two separate protospacers and a split-intein fusion (to allow evolution on two protospacers) as in Figures 13A-B. Figure 14A shows clones resulting from PACE evolution experiments using two protospacers with SpCas9 after passage 4 (P4). Figure 14B shows the ability of the P4 SpCas9 variants incorporated into a BE4max base-editor to support conversion of C to T in CAG, CAT, GAT, CAA, GAA, CGT, or GGG PAMs. Figure 14C shows the ability of the L2-72-4 SpCas9 P4 clone to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs.

[0021] Figures 15A-B show a split-intein PACE configuration (whereby Cas9 is divided into two parts to limit Cas9 concentration) to allow evolution of Cas9 proteins of interest. Figure 15A shows that increasing the SpCas9 concentration increases cleavage of alternative (NAG) PAMs (as reported in Karvelis, T., Gasiunas, G., Young, J., Bigelyte, G., Silanskas, A., Cigan, M., and Siksnys, V. (2015). Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements. Genome Biol. 16, 253). Figure 15B shows that the amount of Cas9 protein may be limited in PACE by splitting the inactive Cas9 protein (dCas9) into an N-terminal fragment (dCas9 (1-573)) and a C-terminal fragment (dCas9 (573-end)) and producing the N-terminal fragment from a low-copy number plasmid with a weak promoter (rpoZ).

[0022] Figure 16 shows clones resulting from PACE evolution of a split-intein Cas9 protein with the P4.2.72.4 mutations (Experiment P10). The individual mutations for each of the listed clones (e.g., L5.144.2, L5.144.6, etc.) are shown as compared to SpCas9 and SpCas9 with the P4.2.72.4 mutations.

[0023] Figure 17 shows the ability of the P10 SpCas9 variants from Figure 16 incorporated into a BE4max base-editor to support conversion of C to T in CAG, GAT, TAT, CAT, GAA, CAA-1, or CAA-2 PAMs.

[0024] Figure 18 shows the ability of two P10 SpCas9 variants (P10.5.144.2 and P10.6.144.2) to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs compared to a P4 variant (L2-72-4), SpCas9, and xCas9.

[0025] Figures 19A-C show characterization of a P10 SpCas9 variant with PAM depletion in E. coli. Figure 19A shows a workflow for PAM depletion in E. coli, wherein E. coli containing a Cas9 variant (e.g., P10) are transformed with a library of negative selection plasmids (e.g., pUC ampR with HEK3 protospacer followed by NNNN). See Kleinstiver et al., Engineered CRISPR-Cas9 nucleases with altered PAM specificities, Nature, 523: 481-485. The transformed cells are recovered and Cas9 expression is induced for 1-4 hours. The cells are then plated on carbenicillin media. The plates are then scraped and surviving colonies are sequenced for mutations. Colonies that survive and are sequenced contain PAMs that the P10 Cas9 variant protein could not cut. Figure 19B shows the frequency of PAM sequences present in surviving colonies, wherein more shaded PAM sequences occur more frequently (left), and the activity of P10 Cas9 variant protein on the PAM sequences in a luciferase assay (right). Figure 19C shows the ability of the P10 SpCas9 variants characterized by PAM depletion, incorporated into a BE4max base-editor, to support conversion of C to T in CAG, CAT, GAT, CAA, GAA, CGT, or GGG PAMs.

[0026] Figure 20 shows a characterization of the P10 SpCas9 variant protein following PAM depletion as in Figures 19A-19C. The P10 SpCas9 variant protein (left) and xCas9 variant proteins (middle) show a preference at the fourth nucleotide of the PAM, wherein C is the most preferred and G is the least preferred. The SpCas9 protein (right) does not show this preference. Higher Cas9 protein activity is denoted by darker shading.

[0027] Figure 21 shows clones resulting from split-intein PACE evolution of Cas9 with the P4.2.72.4 mutations (Experiment P11) on an AAA PAM. The individual mutations for each of the listed clones (e.g., P11.1.139-2, P11.1.139-4, etc.) are shown as compared to SpCas9 with the P4.2.72.4 mutations.

[0028] Figure 22 shows the ability of the P11 SpCas9 variants from Figure 21 incorporated into a BE3 base-editor to support conversion of C to T in CAG, GAT, CAT, GAA, AAA-1, AAA-2, CAA-1, CAA-2, or GGG PAMs.

[0029] Figure 23 shows the ability of two P11 SpCas9 variants (P11-SacB-1 and P11-SacB-2) to form insertions or deletions in CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, and GGG PAMs compared to a P4 variant (L2-72-4), SpCas9, and xCas9.

[0030] Figures 24A-B show clones resulting from split-intein PACE evolution of Cas9 with P12 mutations on AAT (FIG. 24A) or TAT (FIG. 24B) PAMs. The individual mutations for each of the listed clones (e.g., P12.3.b9-2, P12.3.b10-2, etc.) are shown as compared to SpCas9 protein.

[0031] Figures 25A-B show the ability of the P12 SpCas9 variants from Figures 24A-B incorporated into a BE3 base-editor to support conversion of C to T at sites s893, s1073, s1081, s1140, b3, e1, e2, f1, f2, s33, s34, s35, s36, s37, s38, s39, s40, s41, s43, s44, s45, or s46. Darker shading indicates a higher % of C to T editing (FIG. 25A). Figure 25B shows the average C to T editing on NATA, NATT, NATC, or NATG PAMs. pSM060ax is clone P12.3.b9-8 and pSM060ay is clone P12.3.b10-6.

[0032] Figures 26A-B show the ability of two P12 SpCas9 variants (P12.3.b9-8 and P12.3.b10-6) to cleave DNA in a bacterial PAM depletion assay on AAA, AAC, AAT, AAG, CAA, CAC, CAT, CAG, TAA, TAC, TAT, TAG, GAA, GAT, GAG, AGA, AGC, AGT, AGG, CGA, CGC, CGT, CGG, TGA, TGC, TGT, TGG, GGA, GGC, GGT, or GGG PAMs. PPDV is the PAM frequency after Cas9 cutting divided by the frequency of that PAM in the input library, wherein lower numbers signify more active Cas9 proteins.
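Following the PPDV definition given above, a minimal Python sketch is shown below. It computes the value from per-PAM read counts of the input library and of the surviving (post-cutting) population. The read counts are hypothetical and for illustration only. PAMs absent from the input library are skipped because their ratio is undefined. The function name is an assumption, not a tool from the disclosure.

```python
def ppdv(input_counts: dict[str, int],
         selected_counts: dict[str, int]) -> dict[str, float]:
    """PAM frequency after Cas9 cutting divided by its frequency in the input
    library. Values well below 1 mean the PAM was depleted (i.e., cut)."""
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    result = {}
    for pam, n_in in input_counts.items():
        if n_in == 0:
            continue  # undefined ratio; skip PAMs absent from the input library
        freq_in = n_in / total_in
        freq_sel = selected_counts.get(pam, 0) / total_sel
        result[pam] = freq_sel / freq_in
    return result


# Hypothetical read counts, for illustration only.
before = {"AAA": 100, "GGG": 100}
after = {"AAA": 150, "GGG": 10}
print(ppdv(before, after))  # {'AAA': 1.875, 'GGG': 0.125}
```

In this made-up example, the GGG PAM is strongly depleted (PPDV 0.125), which would indicate efficient cutting. AAA is enriched only because the surviving population is renormalized.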

[0033] Figures 27A-B show a split-intein PACE configuration to allow evolution of Cas9 proteins of interest with 2 protospacers. Figure 27A shows evolution of a split-intein Cas9 using selection on 2 protospacers. A second gene (gVI) is removed from the phage and is used as a selection marker on AP2. AP1 and AP2 have the same PAM, but different protospacers and a different nucleotide immediately 3′ of the PAM. Figure 27B shows clones resulting from split-intein PACE evolution of Cas9 as in Figure 27A. The individual mutations for each of the listed clones (e.g., L2-120-1, L2-120-2, etc.) are shown as compared to SpCas9 protein.

[0034] Figure 28 shows survival-based selection for isolating nuclease-active Cas9 variant proteins. In this selection, cutting identifies nuclease-active PACE variants. SacB is lethal in the presence of sucrose unless it is cut by Cas9, sfGFP loses fluorescence if Cas9 cutting occurs, and kanR confers survival on kanamycin medium if no cutting occurs.
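The reporter logic described for Figure 28 can be summarized as a small truth table. The Python sketch below assumes that SacB, sfGFP, and kanR are all carried on the same selection plasmid, so that a single Cas9 cut inactivates all three. That is a simplifying assumption, not a statement of the actual construct. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readout:
    survives_sucrose: bool
    survives_kanamycin: bool
    green: bool


def selection_readout(cas9_cuts: bool) -> Readout:
    """Toy model of the SacB / sfGFP / kanR readout, assuming all three markers
    sit on one selection plasmid that is inactivated when Cas9 cuts it."""
    markers_intact = not cas9_cuts
    return Readout(
        survives_sucrose=not markers_intact,  # SacB is lethal on sucrose unless cut
        survives_kanamycin=markers_intact,    # kanR protects only if no cutting occurs
        green=markers_intact,                 # sfGFP fluorescence is lost on cutting
    )


print(selection_readout(True))
print(selection_readout(False))
```

Under this model, nuclease-active variants are the clones recovered on sucrose. Such clones also lose green fluorescence and fail to grow on kanamycin.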

[0035] Figures 29A-B show nuclease-active TAT variants that were identified by SacB selection as in Figure 28. The original SpCas9 TAT variant was isolated from PANCE evolution on a TAT PAM (N4.TAT.1), but had no nuclease activity. This N4.TAT.1 (TAT1) Cas9 variant was subcloned from the pool of N4.TAT SP (H840-onward) into a Cas9 plasmid and selected for variants that could cut a SacB selection plasmid with a TAT PAM after a 4-hour induction. Figure 29A shows clones resulting from SacB selection of the nuclease-inactive TAT variant. The individual mutations for each of the listed clones (e.g., SacB-TAT-1, SacB-TAT-2) are shown as compared to SpCas9 and TAT SpCas9 variant proteins. Figure 29B shows the location of mutations in the TAT SpCas9 variant proteins.

[0036] Figures 30A-B show the activity of the TAT SpCas9 variant proteins identified in Figure 29A. Figure 30A shows the ability of the nuclease-active TAT SpCas9 variants (SacB-TAT1 and SacB-TAT2) incorporated into a BE4max base-editor to support conversion of C to T in CAG, GAT, TAT, CAT, GAA-1, GAA-2, CAA-1, CAA-2, or GGG PAMs. Figure 30B shows the activity of the SacB-TAT1 and SacB-TAT2 variants in PAM depletion on CAA1, CAA2, AAA1, AAA2, TAA1, TAA2, CAG1, CAG2, GAT1, GAT2, TAT1, TAT2, CAT, GAA1, GAA2, CGT, or GGG PAMs.

[0037] Figure 31 shows the ability of the SacB-TAT-1 SpCas9 protein variant to form insertions or deletions in AAA, AAC, AAT, AAG, CAA, CAC, CAT, CAG, TAA, TAC, TAT, TAG, GAA, GAT, GAG, AGA, AGC, AGT, AGG, CGA, CGC, CGT, CGG, TGA, TGC, TGT, TGG, GGA, GGC, GGT, or GGG PAMs. PPDV is the PAM frequency after Cas9 cutting divided by the frequency of that PAM in the input library, wherein lower numbers signify more active Cas9 proteins.

[0038] Figure 32 shows the locations of residues frequently mutated during PAM selection.

Positions commonly mutated in SpCas9 variants obtained when evolving on NAN PAMs include D1135, E1219, and D1332.

[0039] Figures 33A-33D show C to T base editing with evolved variants on various PAMs. The SpCas9 variants were incorporated into the BE4max architecture and tested for C to T base editing in HEK293T cells. Figure 33A shows C to T base editing with NAA PAMs. Figure 33B shows C to T base editing with NAC PAMs. Figure 33C shows C to T base editing with NAT PAMs. Figure 33D shows C to T base editing with NAG PAMs. Each bar represents the average of 3 independent experiments, and the error bars represent the standard deviation. The “es” SpCas9 variant protein works best on NARH PAMs, with some activity on NARG and NGN PAMs; the “fn” SpCas9 variant protein works best on NRCH PAMs, with some activity on NRCG and NGN PAMs; and the “ax” SpCas9 variant protein works best on NRTH PAMs, with some activity on NRTG and NGN PAMs.
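The IUPAC-style PAM names above (N = any base, R = A or G, H = any base except G) can be checked mechanically. The sketch below assumes 4-nt PAMs written 5′ to 3′ and reports which of the es, fn, and ax variants' preferred patterns (NARH, NRCH, and NRTH, as summarized above) a given PAM matches; the helper names are hypothetical.

```python
# Standard IUPAC nucleotide codes needed for the PAM patterns in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "H": "ACT",   # any base except G
    "N": "ACGT",  # any base
}

# Preferred PAM patterns summarized for Figures 33A-33D.
PREFERRED_PAMS = {"es": "NARH", "fn": "NRCH", "ax": "NRTH"}


def matches_pam(pam: str, pattern: str) -> bool:
    """True if a concrete PAM such as 'CAAT' fits an IUPAC pattern such as 'NARH'."""
    if len(pam) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def preferred_variants(pam: str) -> list[str]:
    """Names of variants whose preferred PAM pattern matches the given PAM."""
    return [name for name, pattern in PREFERRED_PAMS.items() if matches_pam(pam, pattern)]


print(preferred_variants("CAAT"))  # ['es']
print(preferred_variants("GACA"))  # ['fn']
print(preferred_variants("TATG"))  # [] (H excludes G at position 4)
```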

[0040] Figures 34A-34B show C to T base editing with evolved SpCas9 variants on various PAMs. The SpCas9 variants were incorporated into the BE4max architecture and tested for C to T base editing in HEK293T cells. Figure 34A shows C to T base editing on NAA, NAC, and NAT PAMs. Figure 34B shows C to T base editing on NAAH, NACH, and NATH PAMs, where H is any base except G. Each bar represents the average of 3 independent experiments, and the error bars represent the standard deviation.

[0041] Figures 35A-35C show A to G base editing with evolved SpCas9 variants on various PAMs. The SpCas9 variants were incorporated into the ABEmax architecture and tested for A to G base editing in HEK293T cells. Figure 35A shows A to G base editing on NAA/NGA PAMs with the es SpCas9 variant. Figure 35B shows A to G base editing on NAC/NGC PAMs with the fn SpCas9 variant. Figure 35C shows A to G base editing on NAG/NGG PAMs with the es and fn SpCas9 variant proteins. Each bar represents the average of 2 independent experiments, and the error bars represent the standard deviation.

[0042] Figure 36 shows phage-assisted non-continuous evolution (PANCE) of SpCas9 binding activity on non-G PAMs. (A) Original selection scheme for Cas9 DNA binding. ω-dSpCas9 expressed by ΔgIII selection phage (SP) binds to a designated protospacer/PAM sequence upstream of gIII on an accessory plasmid (AP) in host E. coli cells. Host cells and infecting SP are continuously mutagenized by a mutagenesis plasmid (MP). (B) Fold propagation of SP expressing ω-dSpCas9 or ω-dxCas9 on APs encoding each of the 64 NNN PAM sequences upstream of gIII. (C) Schematic overview of the PANCE workflow. Host cells containing an AP and MP are grown to log phase in a deep-well plate or tube before being infected with SP. Mutagenesis is induced, and SP are allowed to propagate for 6-18 hours before cells are pelleted and the SP-containing supernatant is collected. The SP pool is then used to infect host cells in the next iteration of PANCE. (D) Consensus mutations arising from evolution of ω-dSpCas9 (N1) or ω-dxCas9 (N2) on NAA (red), NAT (blue), or NAC (green) PAM sequences.
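For reference, the 64 NNN PAM sequences assayed in Figure 36B can be enumerated programmatically, and fold propagation can be expressed as a ratio. This sketch assumes, as is common in PACE/PANCE work, that fold propagation is the output phage titer divided by the input phage titer; the function names and titers are hypothetical.

```python
from itertools import product


def all_nnn_pams() -> list[str]:
    """Enumerate all 64 three-nucleotide PAM sequences, AAA through TTT."""
    return ["".join(bases) for bases in product("ACGT", repeat=3)]


def fold_propagation(output_titer: float, input_titer: float) -> float:
    """Output phage titer / input phage titer (e.g., PFU/mL); values > 1 mean net propagation."""
    if input_titer <= 0:
        raise ValueError("input titer must be positive")
    return output_titer / input_titer


pams = all_nnn_pams()
print(len(pams), pams[:3])  # 64 ['AAA', 'AAC', 'AAG']

# Hypothetical titers for one accessory plasmid (illustration only).
print(fold_propagation(output_titer=2e8, input_titer=1e6))  # 200.0
```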

[0043] Figures 37A-37E show multiple new PACE schemes utilizing a split-intein Cas9 and/or two protospacers. Figure 37A shows new PACE schemes to limit the concentration of SpCas9 protein and/or increase the number of Cas9 binding sites. Figure 37B shows the individual mutations of each of the listed NAA-evolved clones (e.g., N3.GAA-3, N3.GAA-4) as compared to SpCas9 protein. Figure 37C shows a time course of the NAA variants from Figure 37B through evolution. Figure 37D shows the individual mutations of each of the listed NAC-evolved clones (e.g., N4.CAC-1, N4.CAC-5) as compared to SpCas9 protein, together with D1135N, R1114G, V1139A, E1219V, Q1221H, R1320V, and R1333K mapped onto the SpCas9 crystal structure 4UN3. Figure 37E shows the individual mutations of each of the listed NAT-evolved clones (e.g., SacB.N4.TAT-1, SacB.N4-TAT-3) as compared to SpCas9 protein, together with D1135N, R1114G, E1219V, H1349R, S1338T, R1335Q, and D1332N mapped onto the SpCas9 crystal structure 4UN3 (left, lower structure). The lower right structure also shows D1135N, R1114G, E1219V, G1218S, Q1221H, P1321S, R1335, and D1332G mapped onto the SpCas9 crystal structure 4UN3.

[0044] Figures 38A-38D show a characterization of evolved variants and SpCas9-NG through bacterial PAM depletion and mammalian cell indel formation. Figure 38A shows bacterial PAM depletion of SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG using a bacterial NNNN PAM library. The inverse of the depletion score was used to generate enrichment scores of activity on each NNNN PAM, which were then used to create sequence logos (WebLogo3.0). Figure 38B shows indel formation in HEK293T cells across 64 endogenous mammalian sites containing NANN PAMs for SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. Mean and SE of three independent biological replicates are shown. Figure 38C provides a summary of indel formation efficiencies in HEK293T cells across 48 endogenous mammalian sites containing NANH (H = non-G) PAMs for SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. The mean and standard deviation (SD) of all individual values from three independent biological replicates are plotted. Figure 38D shows the DNA targeting specificity of SpCas9, xCas9, and evolved variants SpCas9-NRRH, -NRTH, and -NRCH, as determined by the percentage of on-target reads resulting from GUIDE-seq analysis using HEK target site 4 in U2OS cells.
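The step in which "the inverse of the depletion score was used to generate enrichment scores" can be sketched as follows. The code converts hypothetical depletion scores into enrichment scores and builds an enrichment-weighted, per-position base frequency matrix of the kind a sequence-logo tool visualizes. It is an illustration under those assumptions (function names hypothetical), not the WebLogo 3.0 implementation.

```python
def enrichment_scores(depletion: dict[str, float]) -> dict[str, float]:
    """Enrichment score = 1 / depletion score for each PAM (scores must be positive)."""
    if any(score <= 0 for score in depletion.values()):
        raise ValueError("depletion scores must be positive")
    return {pam: 1.0 / score for pam, score in depletion.items()}


def weighted_position_matrix(scores: dict[str, float]) -> list[dict[str, float]]:
    """Per-position base frequencies, with each PAM weighted by its enrichment score.

    All PAMs must have the same length; each resulting column sums to 1.
    """
    length = len(next(iter(scores)))
    total = sum(scores.values())
    matrix = []
    for i in range(length):
        column = {base: 0.0 for base in "ACGT"}
        for pam, weight in scores.items():
            column[pam[i]] += weight
        matrix.append({base: value / total for base, value in column.items()})
    return matrix


# Hypothetical depletion scores for four NNNN PAMs (lower = more depleted = more active).
depletion = {"GAAA": 0.1, "GAAG": 0.5, "GCCA": 1.0, "GTTT": 1.0}
pwm = weighted_position_matrix(enrichment_scores(depletion))
print(round(pwm[1]["A"], 3))  # 0.857
```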

[0045] Figures 39A-39E show mammalian C to T and A to G base editing activity of evolved variants and SpCas9-NG. Figure 39A shows cytosine base editing in HEK293T cells across 64 endogenous mammalian sites containing NANN PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, and BE4-NG. Mean and SE of three independent biological replicates are shown. Figure 39B shows a summary of cytosine base editing in HEK293T cells across 48 endogenous mammalian sites containing NANH (H = non-G) PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, and BE4-NG. Mean and SE of three independent biological replicates are shown. Figure 39C shows adenine base editing in HEK293T cells across 27 endogenous mammalian sites containing NANN PAMs for ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. Mean and SE of three independent biological replicates are shown. Figure 39D shows the fraction of pathogenic SNPs in the ClinVar database that could in principle be corrected by a C•G to T•A (left) or A•T to G•C (right) base conversion using NR PAMs. Figure 39E shows the number of possible sgRNAs capable of targeting pathogenic SNPs in the ClinVar database using NR, NG, or NGG PAMs.

[0046] Figures 40A-40G show a characterization of PAM preferences using a genomically integrated human cell base editing target sequence library. Figure 40A is a schematic overview of a mammalian cell base editing library experiment. A library of matched sgRNA/protospacer target sites spanning all NNNN PAMs is stably integrated into the genome of HEK293T cells. Library cells are then transfected with, and selected for genomic integration of, plasmids encoding BE4 variants. After antibiotic selection, cells are lysed and the integrated sgRNA/protospacer site is PCR-amplified for HTS analysis. Figure 40B provides a heat map of base editing activity on the NNNN PAM library in HEK293T cells, with positions 2, 3, and 4 of the PAM defined. For each construct, the mean editing across all sites containing the designated PAM over two independent biological replicates, internally normalized against the highest editing value for each construct, is shown.

Figures 40C-40E show the average base editing activity on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH, with PAM position 2 (Figure 40C), position 3 (Figure 40D), or position 4 (Figure 40E) fixed. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. Figures 40F-40G show the effect of sgRNA length and 5’G mismatches on the base editing efficiency of profiled SpCas9 variants. The percentage decrease in editing efficiency from using a 21-nt sgRNA with either a matched (Figure 40F) or mismatched (Figure 40G) 5’G, compared to using a matched 20-nt sgRNA, is shown for BE4, BE4-NRRH, BE4-NRCH, BE4-NRTH, and BE4-NG on all library sequences containing NAN, NRN, NGN, or NGG PAMs. The mean and SE are plotted.
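The normalization described for the Figure 40B heat map (mean editing for each identity of PAM positions 2-4, internally normalized to each construct's highest value) can be sketched as below. The record layout `(construct, pam, replicate, editing_fraction)`, the function name, and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pam_heatmap(records, construct):
    """Normalized mean editing keyed by PAM positions 2-4 for one construct.

    records: iterable of (construct, pam, replicate, editing_fraction), where pam
    is a 4-nt NNNN string. All values sharing the same positions 2-4 are averaged,
    then divided by the construct's highest mean.
    """
    grouped = defaultdict(list)
    for name, pam, _replicate, value in records:
        if name == construct:
            grouped[pam[1:4]].append(value)  # 1-based PAM positions 2, 3, 4
    if not grouped:
        return {}
    means = {key: mean(values) for key, values in grouped.items()}
    top = max(means.values())
    return {key: (value / top if top > 0 else 0.0) for key, value in means.items()}


records = [
    ("BE4-NRRH", "AAAG", 1, 0.40),
    ("BE4-NRRH", "CAAG", 2, 0.20),
    ("BE4-NRRH", "TGAC", 1, 0.10),
    ("BE4-NRRH", "TGAC", 2, 0.30),
    ("BE4", "AGGT", 1, 0.50),
]
heat = pam_heatmap(records, "BE4-NRRH")
```

Pooling sites and replicates in one average gives the same result as averaging per-replicate means only when each PAM has the same number of values in every replicate; with unbalanced data, a two-step average would be the closer match to the description.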

[0047] Figures 41A-41C show that evolved SpCas9 variants allow correction of pathogenic SNPs using non-G PAMs. Figure 41A provides an overview of an adenine base editing strategy for correcting the sickle hemoglobin (HbS) SNP. In HbS, the Glu (GAG codon) at position 6 of normal β-globin (HBB) is mutated to a Val (GTG codon). Targeting this SNP with A•T to G•C base editing on the reverse strand enables a Val to Ala (GTG to GCG) conversion, yielding the Makassar β-globin variant (HbG), which is phenotypically normal. Figure 41B shows A•T to G•C base editing in HEK293T cells engineered with the HbS mutation using a CACC PAM by ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. This PAM places the target A at position 7 and an off-target A, whose conversion causes a silent Pro (CCT) to Pro (CCC) mutation, at position 9. Mean and SE of three independent biological replicates are shown. Figure 41C shows A•T to G•C base editing in HEK293T cells engineered with the HbS mutation using a CATG PAM by ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. This PAM places the target A at position 4 and an off-target A, whose conversion causes a silent Pro (CCT) to Pro (CCC) mutation, at position 6. Mean and SE of three independent biological replicates are shown.
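The codon logic of the HbS correction strategy can be checked in a few lines. The sketch below (helper names hypothetical) reverse-complements the sense-strand codon, converts the targeted A to G on the reverse strand as an adenine base editor would, and maps the result back to the sense strand. The codon table contains only the codons discussed here.

```python
# Only the codons discussed for HBB position 6 and the neighboring Pro codon.
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala", "CCT": "Pro", "CCC": "Pro"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def edit_reverse_strand(sense_codon: str, rev_index: int) -> str:
    """A->G at rev_index (0-based) of the reverse-strand codon; return the new sense codon."""
    rev = reverse_complement(sense_codon)
    if rev[rev_index] != "A":
        raise ValueError("an adenine base editor converts A to G only")
    edited = rev[:rev_index] + "G" + rev[rev_index + 1:]
    return reverse_complement(edited)


sickle = "GTG"                              # HbS codon 6 (Val); reverse strand reads CAC
corrected = edit_reverse_strand(sickle, 1)  # middle A of CAC -> G
print(CODON_TO_AA[sickle], "->", CODON_TO_AA[corrected], corrected)  # Val -> Ala GCG

bystander = edit_reverse_strand("CCT", 0)   # reverse strand AGG -> GGG
print(CODON_TO_AA["CCT"], "->", CODON_TO_AA[bystander], bystander)   # Pro -> Pro CCC
```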

[0048] Figure 42 provides a table of the NRNN PAM targeting potential of the SpCas9 and SaCas9 variants described herein. The variants SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH are disclosed and discussed herein.

[0049] Figures 43A-43F depict additional details of Cas9:DNA binding PACE and Cas9 nuclease selections. Figure 43A shows dual-AP selection, in which ω-dSpCas9 binds two distinct protospacer/PAM sequences to drive either half of split-intein pIII. Figure 43B shows that split-intein Cas9 limits the total Cas9 concentration in host cells, thus avoiding saturation of protospacer/PAM binding sites. Residues 574-1368 of Cas9 fused to NpuC are expressed by ΔgIII SP, and ω-dSpCas9(1-573) fused to NpuN is encoded on a low-copy complementary plasmid (CP) in host cells. Figure 43C shows a combination of the selection principles from (A) and (B) through use of gVI as an additional PACE-compatible selection marker for phage propagation and ΔgIIIΔgVI SP. Figure 43D shows an overnight propagation assay of selection phage (SP) encoding dSpCas9C on host cells containing a complementary plasmid (CP) providing either ω-dSpCas9N or ω-dSpCas9N-mut and an AP encoding either an AAA or a CAA PAM. Figures 43E and 43F show a scheme of survival-based selection for Cas9 nuclease activity. Cells containing a high-copy selection plasmid encoding a protospacer/PAM sequence, sfGFP, and the conditionally lethal protein SacB are transformed with a library of nuclease-active Cas9s encoded on a low-copy plasmid that also includes the matching sgRNA. Binding and cleavage of the designated PAM/protospacer by Cas9 leads to destruction of the selection plasmid, resulting in loss of both sfGFP and SacB expression and allowing cells to survive on sucrose-containing media.

[0050] Figures 44A-44C show the effects of mutations on PAM recognition by SpCas9 variants. Figure 44A shows that adding the Y1131C mutation, which was enriched in the later phases of the NAT evolution trajectory, inactivates BE3-NRTH in HEK293T cells. Mean and SE of three independent biological replicates are shown. Figure 44B shows the N-terminal mutations of SpCas9-NRRH, -NRCH, and -NRTH mapped onto the SpCas9 crystal structure (4UN3). Figure 44C shows the CBE activity of BE3-NRRH, BE3-NRTH, and BE3-NRCH, with and without the N-terminal mutations shown in Figure 44B, in HEK293T cells. Mean and SE of three independent biological replicates are shown.

[0051] Figures 45A-45D show a characterization of SpCas9, xCas9, and evolved variants (SpCas9-NRTH, SpCas9-NRCH, and SpCas9-NRRH) in bacterial PAM depletion and mammalian indel formation experiments. Figure 45A shows bacterial PAM depletion of SpCas9-NRRH, -NRCH, -NRTH, and SpCas9-NG on a bacterial NNNN PAM library with 1 h, 3 h, and overnight Cas9 induction. The inverse of the depletion score was used to generate enrichment scores of activity on each NNNN PAM, which were then used to create sequence logos (WebLogo3.0). Figure 45B shows indel formation in HEK293T cells across endogenous mammalian sites containing NANN PAMs for xCas9, SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG. Mean and SE of three independent biological replicates are shown. Figure 45C shows indel formation in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for SpCas9-NRRH, -NRTH, -NRCH, SpCas9-NG, and SpCas9. Mean and SE of three independent biological replicates are shown. Figure 45D shows GUIDE-seq analysis of SpCas9, xCas9, and evolved variants SpCas9-NRRH, -NRTH, and -NRCH targeting HEK site 4 in U2OS cells. GUIDE-seq on-target (indicated by the asterisk) and off-target reads that make up at least 1% of total reads are shown.

[0052] Figures 46A-46C show the characterization of SpCas9 (BE4), SpCas9-NG (BE4-NG), and evolved CBE and ABE variants in mammalian base editing experiments. Figure 46A shows CBE in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for BE4-NRRH, BE4-NRTH, BE4-NRCH, BE4-NG, and BE4. Mean and SE of three independent biological replicates are shown. Figure 46B shows ABE in HEK293T cells across endogenous mammalian sites containing NGNN PAMs for ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG. Mean and SE of three independent biological replicates are shown. For target sites with NGA, NGC, and NGT PAMs, only ABE-NRRH, ABE-NRCH, and ABE-NRTH, respectively, are shown in addition to SpCas9-NG. Figure 46C shows the fraction of pathogenic SNPs in the ClinVar database, with either a single targetable base within the window or multiple targetable bases, that could in principle be corrected by a C•G to T•A (top left) or A•T to G•C (top right) base conversion using NR PAMs, or by a C•G to T•A (bottom left) or A•T to G•C (bottom right) base conversion using NG PAMs.

[0053] Figures 47A-47F show the characterization of PAM preferences of BE4, BE4-NRRH, BE4-NRCH, and BE4-NG using a genomically integrated human cell base editing target sequence library. Figure 47A shows the distribution of the number of target sites per PAM within the integrated sgRNA library. Figure 47B shows the PAM preferences for BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH as determined by base editing on the target sequence library integrated in HEK293T cells. Sequence logos for each construct were created from the CBE activity on each NNNN PAM contained in the library (WebLogo3.0). Figure 47C shows the average base editing activity on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH, with PAM position 1 fixed. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. Figures 47D-47F show the effect of sgRNA length and 5’G mismatch on the base editing efficiency of profiled SpCas9 variants. Average base editing on the NNNN PAM library in HEK293T cells by BE4, BE4-NRRH, BE4-NRTH, and BE4-NRCH is grouped by sites containing a 20-nt sgRNA with a 5’G matched to the target sequence, a 21-nt sgRNA with a 5’G matched to the target sequence, or a 21-nt sgRNA with a mismatched 5’ nucleotide.

Average editing activity of constructs on NGN (Figure 47D), NAN (Figure 47E), and NGG (Figure 47F) PAMs is shown. Mean and SE for individual editing values (averaged across two independent biological replicates) at all relevant library sequences are shown. ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001 (Student’s t test).

[0054] Figures 48A-48C show high-throughput sequencing analysis of sickle cell locus editing by SpCas9 variant-derived ABEs. Figure 48A shows CRISPResso2 output showing the HbS mutation in an engineered HEK293T cell line. HEK293T cells were treated with nickase SpCas9, an sgRNA (binding shown in gray), and an ssODN containing the point mutation. After two rounds of transfection, sorting, and growth, the sequenced cell line shown was isolated and identified as having 100% conversion to the sickle cell anemia allele. Figure 48B shows CRISPResso2 output showing ABE activity of ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG in HbS-engineered HEK293T cells using an sgRNA (gray bar) targeting a CATG PAM. Figure 48C shows CRISPResso2 output showing ABE activity of ABE-NRRH, ABE-NRTH, ABE-NRCH, and ABE-NG in HbS-engineered HEK293T cells using an sgRNA (gray bar) targeting a CACC PAM.

DEFINITIONS

[0055] As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.

[0056] The term “base editor (BE),” or “nucleobase editor (NBE),” as used herein, refers to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, the base editor is capable of deaminating a base within a nucleic acid. In some embodiments, the base editor is capable of deaminating a base within a DNA molecule. In some embodiments, the base editor is capable of deaminating a cytosine (C) in DNA. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein fused to a nucleic acid editing domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a cytidine deaminase domain. In some embodiments, the base editor comprises a Cas9 domain (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein, fused to a cytidine deaminase. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to a cytidine deaminase domain. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to a cytidine deaminase domain. In some embodiments, the base editor includes an inhibitor of base excision repair, for example, a UGI domain or a dISN domain.

[0057] In some embodiments, the base editor is capable of deaminating an adenosine (A) in DNA. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein fused to a nucleic acid editing domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase domain. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to one or more adenosine deaminase domains. In some embodiments, the base editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to two adenosine deaminase domains. In some embodiments, the base editor comprises a Cas9 (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein, fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a Cas9 nickase (Cas9n) fused to two adenosine deaminase domains. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to an adenosine deaminase domain. In some embodiments, the base editor comprises a nuclease-inactive Cas9 (dCas9) fused to two adenosine deaminase domains. In some embodiments, the base editor is fused to an inhibitor of base excision repair, for example, a UGI domain or a dISN domain.
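To make the deamination outcome concrete, the sketch below models a base editor as converting every target base inside an editing window of the protospacer. The window of protospacer positions 4-8 (counting the PAM-distal end as position 1) is an assumption chosen for illustration; actual windows depend on the deaminase and Cas9 variant, and the function name is hypothetical.

```python
def base_edit(protospacer: str, target: str, product: str,
              window: tuple[int, int] = (4, 8)) -> str:
    """Convert every `target` base to `product` within a 1-based, inclusive window.

    Cytosine base editor (CBE): target='C', product='T'.
    Adenine base editor (ABE):  target='A', product='G'.
    Positions are counted from the PAM-distal end of the protospacer.
    """
    start, end = window
    if not 1 <= start <= end <= len(protospacer):
        raise ValueError("window must lie within the protospacer")
    bases = list(protospacer.upper())
    for i in range(start - 1, end):
        if bases[i] == target:
            bases[i] = product
    return "".join(bases)


# CBE: only the C's at positions 4-8 become T; C's at 2-3 and 9-10 are untouched.
print(base_edit("A" + "C" * 9 + "A" * 10, "C", "T"))
# ABE: only the A's at positions 4-8 become G.
print(base_edit("A" * 20, "A", "G"))
```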

[0058] The term “nucleic acid programmable DNA binding protein” or “napDNAbp” refers to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide nucleic acid (e.g., gRNA), that guides the napDNAbp to a specific nucleic acid sequence, for example, by hybridizing to the target nucleic acid sequence. For example, a Cas9 domain can associate with a guide RNA that guides the Cas9 domain to a specific DNA sequence that is complementary to the guide RNA. In some embodiments, the napDNAbp is a class 2 microbial CRISPR-Cas effector. In some embodiments, the napDNAbp is a Cas9 domain, for example, a nuclease-active Cas9, a Cas9 nickase (Cas9n), or a nuclease-inactive Cas9 (dCas9). Examples of nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., an evolved Cas9 domain), or an evolved version of a CasX, CasY, Cpf1, C2c1, C2c2, C2c3, or Argonaute protein that comprises one or more mutations homologous to the mutations provided herein. It should be appreciated, however, that nucleic acid programmable DNA binding proteins also include nucleic acid programmable proteins that bind RNA. For example, the napDNAbp may be associated with a nucleic acid that guides the napDNAbp to an RNA. Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, though they may not be specifically described in this Application.

[0059] As used herein, the term “circular permutant” refers to a protein or polypeptide (e.g., a Cas9) comprising a circular permutation, which is a change in the protein’s structural configuration involving a change in the order of amino acids appearing in the protein’s amino acid sequence. In other words, circular permutants are proteins that have altered N- and C-termini as compared to a wild-type counterpart, e.g., the wild-type C-terminal half of a protein becomes the new N-terminal half.

Circular permutation (or CP) is essentially the topological rearrangement of a protein’s primary sequence, connecting its N- and C-termini, often with a peptide linker, while concurrently splitting its sequence at a different position to create new, adjacent N- and C-termini. The result is a protein structure with different connectivity that often has a similar overall three-dimensional (3D) shape and may have improved or altered characteristics, including reduced proteolytic susceptibility, improved catalytic activity, altered substrate or ligand binding, and/or improved thermostability. Circular permutant proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of posttranslational modifications or may be engineered using recombinant techniques.
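The topological rearrangement described above can be illustrated on a protein sequence string. In this sketch, the original C-terminus is joined to the original N-terminus through an optional peptide linker, and the chain is reopened so that a chosen residue becomes the new N-terminus. The toy sequence, the `GGS` linker, and the function name are assumptions for illustration.

```python
def circular_permute(seq: str, new_start: int, linker: str = "") -> str:
    """Return a circular permutant of `seq` whose new N-terminus is residue `new_start` (1-based).

    The old C-terminus is fused to the old N-terminus via `linker`; the chain is
    reopened just before `new_start`, creating new adjacent N- and C-termini.
    """
    if not 1 < new_start <= len(seq):
        raise ValueError("new_start must lie inside the sequence (and not be residue 1)")
    i = new_start - 1
    return seq[i:] + linker + seq[:i]


print(circular_permute("MKTAYIAK", 5, linker="GGS"))  # YIAKGGSMKTA
```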

[0060] The term “circularly permuted Cas9” refers to any Cas9 protein, or variant thereof, that occurs as a circular permutant, whereby its N- and C-termini have been topologically rearranged. Such circularly permuted Cas9 proteins (“CP-Cas9”), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491–511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).

[0061] In some embodiments, the napDNAbp is an “RNA-programmable nuclease” or “RNA-guided nuclease.” The terms are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNAs that are not a target for cleavage. In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) are referred to as guide RNA(s) (gRNA). Guide RNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. Guide RNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is also used to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as a single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (i.e., directs binding of a Cas9 complex to the target); and (2) a domain that binds a Cas9 domain. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA and comprises a stem-loop structure. In some embodiments, domain (2) is identical or homologous to a tracrRNA as provided in Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in International Patent Application PCT/US2014/054252, filed September 5, 2014, entitled “Switchable Cas9 Nucleases And Uses Thereof,” and International Patent Application PCT/US2014/054247, filed September 5, 2014, entitled “Delivery System For Functional Nucleases,” the entire contents of each of which are hereby incorporated by reference in their entirety.
In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will bind two or more Cas9 domains and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease:RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example, Cas9 (also known as Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).

[0062] Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to target, in principle, any sequence specified by the guide RNA, provided that a compatible PAM is present adjacent to the target. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see, e.g., Cong, L. et al., Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al., RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al., RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); DiCarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Research (2013); Jiang, W. et al., RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).
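Because Cas9 targeting is specified by the guide RNA but also requires an adjacent PAM, candidate target sites in a DNA sequence can be located by scanning for a 20-nt protospacer immediately followed by a PAM. This minimal sketch assumes the canonical SpCas9 NGG PAM and scans only the strand given; the function name and its `pam_regex` parameter are hypothetical.

```python
import re


def find_target_sites(dna: str, spacer_len: int = 20, pam_regex: str = "[ACGT]GG"):
    """Return (start, protospacer, PAM) tuples for protospacers directly followed by the PAM.

    Only the given strand is scanned. The zero-width lookahead lets
    overlapping candidate sites all be reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


sites = find_target_sites("A" * 20 + "TGG" + "C" * 5)
# One site: start 0, a 20-nt poly-A protospacer, and PAM 'TGG'.
```

To search non-G PAMs such as those targeted by the evolved variants, `pam_regex` could be changed (e.g., `"[ACGT][AG][AG][ACT]"` for NRRH, read 5′ to 3′ on the same strand).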

[0063] In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.

[0064] The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain,” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full-length Cas9 protein. A Cas9 nuclease is also sometimes referred to as a Csn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, which are sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA binding and cleavage typically require protein and both RNAs. However, single guide RNAs (“sgRNA,” or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.

[0065] A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science 337:816-821 (2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science 337:816-821 (2012); Qi et al., Cell 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology with Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2).
In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2).
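Percent identity and fractional length are the two measures used in the thresholds above. This passage does not specify how identity is calculated, so the following is only a sketch, assuming the two sequences have already been aligned (e.g., by a standard alignment tool) and that columns containing a gap are excluded from the denominator; other conventions exist and can give different values. The helper names are hypothetical.

```python
"""Illustrative identity and length-fraction calculations for Cas9 variants."""


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped aligned columns in which the two residues are identical.

    Both inputs must already be aligned to the same length, with '-' for gaps.
    Columns with a gap in either sequence are skipped (one of several conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def length_fraction(fragment_len: int, reference_len: int) -> float:
    """Fragment length as a percentage of the corresponding wild type length."""
    return 100.0 * fragment_len / reference_len


def meets_thresholds(identity: float, fraction: float,
                     min_identity: float = 80.0, min_fraction: float = 50.0) -> bool:
    """Check one illustrative pair of 'at least X%' thresholds."""
    return identity >= min_identity and fraction >= min_fraction


if __name__ == "__main__":
    ident = percent_identity("MDKKYSIGLD", "MDKKYSIGLA")  # one substitution in ten columns
    frac = length_fraction(684, 1368)  # SpCas9 is 1,368 residues long
    print(ident, frac, meets_thresholds(ident, frac))
```

The default thresholds (80% identity, 50% length) are arbitrary picks from the ranges recited above; any other "at least X%" pair can be passed in.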

[0066] In some embodiments, proteins comprising fragments of Cas9 are provided. In some embodiments, the fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology with Cas9. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 2). In some embodiments, wild type Cas9 corresponds to Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1, SEQ ID NO: 1 (nucleotide); SEQ ID NO: 2 (amino acid)).

[0069] In some embodiments, Cas9 refers to a Cas9 nickase having a D10A substitution (e.g., S.

(single underline: HNH domain; double underline: RuvC domain)

[0070] In other embodiments, Cas9 refers to a Cas9 nickase having a H840A substitution (e.g., S.

DLSQLGGD (SEQ ID NO: 8) (single underline: HNH domain; double underline: RuvC domain; H840A mutation shown in bold)

[0071] In still other embodiments, Cas9 refers to a dead Cas9 having D10A and H840A substitutions (e.g., S. pyogenes Cas9 Q99ZW2 (D10A) (H840A)) (SEQ ID NO: 9):

(D10A and H840A mutations shown in bold; see, e.g., Qi et al., Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 2013; 152(5):1173-83, the entire contents of which are incorporated herein by reference).

[0072] In some embodiments, Cas9 refers to Cas9 protein derived from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis I (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); Geobacillus stearothermophilus (NCBI Ref: NZ_CP008934.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1); or to a Cas9 from any other organism.

[0073] In some embodiments, a Cas9 domain comprising one or more mutations provided herein is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 92% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to SEQ ID NO: 2. In some embodiments, variants of a Cas9 domain comprising one or more mutations provided herein are provided having amino acid sequences that are shorter or longer than SEQ ID NO: 2 by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids, or more.

[0074] In some embodiments, the Cas9 domain comprises a D10A mutation, while the residue at position 840 remains a histidine, relative to the amino acid sequence provided in SEQ ID NO: 2 (or at corresponding positions in another Cas9 amino acid sequence). Without wishing to be bound by any particular theory, the presence of the catalytic residue H840 restores the ability of Cas9 to cleave the non-edited (e.g., non-deaminated) strand containing a G opposite the targeted C. Restoration of H840 (e.g., from A840) does not result in cleavage of the target strand containing the C. Such Cas9 variants are able to generate a single-strand DNA break (nick) at a specific location based on the gRNA-defined target sequence, leading to repair of the non-edited strand and ultimately resulting in a base change (e.g., a G to A change) on the non-edited strand. Briefly, the C of a C-G base pair can be deaminated to a U by a deaminase, e.g., an APOBEC deaminase. Nicking the non-edited strand, i.e., the strand having the G, facilitates removal of the G via mismatch repair mechanisms. Uracil-DNA glycosylase inhibitor protein (UGI) inhibits uracil-DNA glycosylase (UDG), thereby preventing removal of the U.
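The sequence of events described in [0074] can be traced on a toy duplex. This is a deliberately simplified, non-mechanistic sketch: it assumes the deaminated C lies on the top strand, that UGI keeps the resulting U from being excised, and that repair of the nicked bottom strand simply copies the top strand at the mismatch. All function names are hypothetical.

```python
"""Toy trace of C-G -> T-A base editing with a nickase, as outlined in [0074]."""

from typing import Dict, Tuple

# Partner placed opposite each template base when a strand is (re)synthesized.
PARTNER: Dict[str, str] = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def complement_strand(top: str) -> str:
    """Bottom strand written position-by-position under the top strand (3'->5')."""
    return "".join(PARTNER[b] for b in top)


def deaminate_c(top: str, pos: int) -> str:
    """Cytidine deaminase step: the target C on the edited (top) strand becomes U."""
    if top[pos] != "C":
        raise ValueError(f"position {pos} is {top[pos]!r}, not C")
    return top[:pos] + "U" + top[pos + 1:]


def repair_nicked_bottom(top: str, bottom: str, pos: int) -> str:
    """Nick-directed repair: the mismatched G opposite U is replaced by U's partner, A."""
    return bottom[:pos] + PARTNER[top[pos]] + bottom[pos + 1:]


def cytosine_base_edit(top: str, pos: int) -> Tuple[str, str]:
    """Return the (top, bottom) duplex after deamination, nick repair, and replication."""
    bottom = complement_strand(top)
    top = deaminate_c(top, pos)                      # C-G becomes a U-G mismatch
    # With UGI present, UDG does not excise the U, so the U-G mismatch persists.
    bottom = repair_nicked_bottom(top, bottom, pos)  # U-G becomes U-A
    top = top.replace("U", "T")                      # replication reads the U as T: T-A
    return top, bottom


if __name__ == "__main__":
    print(cytosine_base_edit("GACTA", 2))
```

Because the input is DNA, the only U present after deamination is the edited one, so replacing U with T affects just that position.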

[0075] In other embodiments, dCas9 variants having mutations other than D10A and H840A are provided, which result, e.g., in a nuclease-inactivated Cas9 (dCas9). Such mutations, by way of example, include other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain).

[0076] The term “Cas9 nickase” or “Cas9n” or “nCas9,” as used herein, refers to a Cas9 domain that is capable of cleaving only one strand of a duplexed nucleic acid molecule (e.g., a duplexed DNA molecule). In some embodiments, a Cas9 nickase comprises a D10A mutation and has a histidine at position 840 of SEQ ID NO: 2, or the corresponding residues in another Cas9 sequence. For example, in some embodiments, a Cas9 nickase comprises the amino acid sequence as set forth in SEQ ID NO: 7 comprising the D10A substitution. Such a Cas9 nickase (Cas9n) has an active HNH nuclease domain and is able to cleave the non-targeted strand of DNA, i.e., the strand bound by the gRNA. Further, such a Cas9 nickase has an inactive RuvC nuclease domain and is not able to cleave the targeted strand of the DNA, i.e., the strand where base editing is desired. In some embodiments, any of the Cas9 domains provided herein comprises a D10A mutation (e.g., SEQ ID NO: 7). In some embodiments, any of the Cas9 domains provided herein comprises an H840A mutation (SEQ ID NO: 8). Exemplary Cas9 nickases are shown below. However, it should be appreciated that additional Cas9 nickases that generate a single-stranded DNA break of a DNA duplex would be apparent to the skilled artisan and are within the scope of this disclosure.
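The relationship between the two catalytic-site substitutions and the resulting cleavage activity of S. pyogenes Cas9 can be summarized as a small lookup. This sketch only considers D10A (RuvC) and H840A (HNH), as described in [0065] and [0076]; the other inactivating substitutions mentioned in [0075] are not modeled, and the function name is hypothetical.

```python
"""Classify SpCas9 cleavage activity from the D10A / H840A substitutions."""

from typing import Iterable


def cas9_activity(mutations: Iterable[str]) -> str:
    muts = set(mutations)
    ruvc_active = "D10A" not in muts  # RuvC cleaves the strand not complementary to the gRNA
    hnh_active = "H840A" not in muts  # HNH cleaves the strand complementary to the gRNA
    if ruvc_active and hnh_active:
        return "nuclease (double-strand break)"
    if hnh_active:
        return "nickase (HNH active; cuts the gRNA-bound strand)"
    if ruvc_active:
        return "nickase (RuvC active; cuts the non-gRNA-bound strand)"
    return "dead (dCas9)"


if __name__ == "__main__":
    for muts in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
        print(muts, "->", cas9_activity(muts))
```

The D10A case corresponds to the nickase described above, which leaves the strand to be edited intact and nicks the gRNA-bound strand.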

[0077] In some embodiments, Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 domain, e.g., one of the sequences provided above. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or an sgRNA, but does not comprise a functional nuclease domain, e.g., it comprises only a truncated version of a nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and Cas9 fragments will be apparent to those of skill in the art. In some embodiments, a Cas9 fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 domain. In some embodiments, a Cas9 fragment is at least 100 amino acids in length. In some embodiments, the Cas9 fragment comprises at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, or at least 1600 amino acids of a corresponding wild type Cas9 domain.
In some embodiments, the Cas9 fragment comprises an amino acid sequence that has at least 10, at least 15, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, or at least 1200 identical contiguous amino acid residues of a corresponding wild type Cas9 domain. In some embodiments, the wild-type protein is S. pyogenes Cas9 (SpCas9) of SEQ ID NO: 2.

[0078] In some embodiments, Cas9 fusion proteins as provided herein comprise the full-length amino acid sequence of a Cas9 domain, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and fragments will be apparent to those of ordinary skill in the art. In some embodiments, Cas9 refers to Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis I (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Geobacillus stearothermophilus (NCBI Ref: NZ_CP008934.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1).

[0079] The term “deaminase” or “deaminase domain,” as used herein, refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain is a non-naturally occurring variant of a naturally-occurring deaminase from an organism. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase from an organism.

[0080] In some embodiments, the deaminase or deaminase domain is a cytidine deaminase, catalyzing the hydrolytic deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytidine deaminase domain, catalyzing the hydrolytic deamination of cytosine to uracil. In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine or cytosine in deoxyribonucleic acid (DNA). In some embodiments, the cytidine deaminase domain comprises the amino acid sequence of any one disclosed herein. In some embodiments, the cytidine deaminase or cytidine deaminase domain is a naturally-occurring cytidine deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the cytidine deaminase or cytidine deaminase domain is a non-naturally occurring variant of a naturally-occurring cytidine deaminase from an organism. For example, in some embodiments, the cytidine deaminase or cytidine deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring cytidine deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse.

[0081] In some embodiments, the deaminase or deaminase domain is an adenosine deaminase, which catalyzes the hydrolytic deamination of adenine or adenosine. In some embodiments, the deaminase or deaminase domain is an adenosine deaminase, catalyzing the hydrolytic deamination of adenosine or deoxyadenosine to inosine or deoxyinosine, respectively. In some embodiments, the adenosine deaminase catalyzes the hydrolytic deamination of adenine or adenosine in deoxyribonucleic acid (DNA). The adenosine deaminases (e.g., engineered adenosine deaminases, evolved adenosine deaminases) provided herein may be from any organism, such as a bacterium. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism. In some embodiments, the deaminase or deaminase domain does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase. In some embodiments, the adenosine deaminase is from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase. For example, the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full-length ecTadA. In some embodiments, the ecTadA deaminase does not comprise an N-terminal methionine.
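The consequence of adenosine deamination for a DNA duplex can be traced in a similarly simplified way. This sketch assumes that inosine is read like guanosine by DNA polymerases (so C is placed opposite it) and that one round of copying the deaminated strand fixes the change; it ignores nicking and repair, and the function name is hypothetical.

```python
"""Toy trace of A-T -> G-C editing via inosine, following the deamination in [0081]."""

from typing import Dict, Tuple

# Partner placed opposite each template base; inosine (I) pairs like G, i.e. with C.
PARTNER: Dict[str, str] = {"A": "T", "T": "A", "G": "C", "C": "G", "I": "C"}


def adenine_base_edit(top: str, pos: int) -> Tuple[str, str]:
    """Deaminate A->I at `pos` on the top strand, then copy that strand once."""
    if top[pos] != "A":
        raise ValueError(f"position {pos} is {top[pos]!r}, not A")
    deaminated = top[:pos] + "I" + top[pos + 1:]
    new_bottom = "".join(PARTNER[b] for b in deaminated)  # C is placed opposite I
    new_top = deaminated.replace("I", "G")  # later copies of this position carry G
    return new_top, new_bottom


if __name__ == "__main__":
    print(adenine_base_edit("GATC", 1))
```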

[0082] In some embodiments, the TadA deaminase is an N-terminal truncated TadA. In certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[0083] In some embodiments the TadA deaminase is a full-length E. coli TadA deaminase. For example, in certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[0084] It should be appreciated, however, that additional adenosine deaminases useful in the present application would be apparent to the skilled artisan and are within the scope of this disclosure. For example, the adenosine deaminase may be a homolog of an ADAT. Exemplary ADAT homologs include, without limitation:

[0085] The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nuclease may refer to the amount of the nuclease that is sufficient to induce cleavage of a target site specifically bound and cleaved by the nuclease. In some embodiments, an effective amount of a fusion protein provided herein, e.g., of a fusion protein comprising a Cas9 domain and a nucleic acid editing domain (e.g., a deaminase domain), may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a deaminase, a recombinase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, the desired biological response (e.g., the specific allele, genome, or target site to be edited), the cell or tissue being targeted, and the agent (e.g., Cas9 domain, fusion protein, vector, cell, etc.) being used.

[0086] The term “immediately adjacent,” as used in the context of two nucleic acid sequences, refers to two sequences that directly abut each other as part of the same nucleic acid molecule and are not separated by one or more nucleotides. Accordingly, two sequences are immediately adjacent when the nucleotide at the 3′-end of one of the sequences is directly connected to the nucleotide at the 5′-end of the other sequence via a phosphodiester bond.
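Under this definition, two sequences written 5′ to 3′ are immediately adjacent within a molecule exactly when their concatenation occurs in that molecule. The helper below is a minimal illustration with a hypothetical name; it treats sequences as plain strings and does not model the phosphodiester linkage itself.

```python
def immediately_adjacent(molecule: str, first: str, second: str) -> bool:
    """True if `second` starts at the nucleotide directly 3' of the end of `first`
    somewhere in `molecule` (all read 5'->3', with no intervening nucleotides)."""
    return (first + second) in molecule


if __name__ == "__main__":
    print(immediately_adjacent("GGAATTCC", "GGAA", "TTCC"))   # abutting
    print(immediately_adjacent("GGAAGTTCC", "GGAA", "TTCC"))  # separated by one G
```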

[0087] The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., two domains of a fusion protein, such as, for example, a nuclease-inactive Cas9 domain and a nucleic acid editing domain (e.g., a deaminase domain). A linker may be, for example, an amino acid sequence, a peptide, or a polymer of any length and composition. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease, including a Cas9 nuclease domain, and the catalytic domain of a nucleic acid editing protein. In some embodiments, a linker joins a dCas9 and a nucleic acid editing protein. In some embodiments, a linker joins a Cas9n and a nucleic acid editing protein. In some embodiments, a linker joins an RNA-programmable nuclease domain and a UGI domain. In some embodiments, a linker joins a dCas9 and a UGI domain. In some embodiments, a linker joins a Cas9n and a UGI domain. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and is connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 1-100 amino acids in length, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. In some embodiments, a linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89), which may also be referred to as the XTEN linker. In some embodiments, a linker comprises the amino acid sequence SGGS (SEQ ID NO: 90). In some embodiments, a linker comprises the amino acid sequence (SGGS)2-SGSETPGTSESATPES-(SGGS)2 (SEQ ID NO: 96), which may also be referred to as (SGGS)2-XTEN-(SGGS)2. In some embodiments, a linker comprises (SGGS)n (SEQ ID NO: 92), (GGGS)n (SEQ ID NO: 94), (GGGGS)n (SEQ ID NO: 96), (G)n (SEQ ID NO: 97), (EAAAK)n (SEQ ID NO: 99), (GGS)n (SEQ ID NO: 101), SGGS(GGS)n (SEQ ID NO: 103), (SGGS)n-SGSETPGTSESATPES-(SGGS)n (SEQ ID NO: 98), or an (XP)n motif, or a combination of any of these, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, n is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. In some embodiments, n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence:
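The motif notation in [0087] can be expanded mechanically into concrete linker sequences. This sketch assumes that the hyphens in names such as (SGGS)2-XTEN-(SGGS)2 are separators rather than residues, uses the same n on both flanks for simplicity (the text allows n to differ), and enforces the stated range of n (1 to 30). The function names are hypothetical.

```python
"""Expand the linker motif notation of [0087] into amino acid strings."""

XTEN = "SGSETPGTSESATPES"  # the XTEN linker, SEQ ID NO: 89 in the text


def repeat_motif(motif: str, n: int) -> str:
    """(motif)n, with n restricted to the 1-30 range stated in the text."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


def sggs_xten_sggs(n: int = 2) -> str:
    """(SGGS)n-XTEN-(SGGS)n; the default n = 2 gives (SGGS)2-XTEN-(SGGS)2."""
    flank = repeat_motif("SGGS", n)
    return flank + XTEN + flank


if __name__ == "__main__":
    linker = sggs_xten_sggs()
    print(linker, len(linker))  # 8 + 16 + 8 residues
```

With n = 2, the (SGGS)2-XTEN-(SGGS)2 linker comes out at 32 residues.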

[0088] The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue, followed by the position of the residue within the sequence and the identity of the newly substituted residue (e.g., D10A denotes replacement of the aspartate at position 10 with alanine). Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art and are described, for example, in Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)).
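The naming convention above (original residue, position, new residue) is straightforward to parse and apply programmatically. This is a minimal sketch, assuming 1-based positions and single-letter amino acid codes; following the X10T-style notation used in the claims, an original residue of X is treated as a wildcard. The helper names are hypothetical.

```python
"""Parse and apply substitutions written as <original><position><new>, e.g. D10A."""

import re
from typing import Tuple

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code: str) -> Tuple[str, int, str]:
    match = _SUBSTITUTION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(protein: str, code: str) -> str:
    """Apply a 1-based substitution after checking the original residue (X = any)."""
    original, position, new = parse_substitution(code)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    found = protein[position - 1]
    if original != "X" and found != original:
        raise ValueError(f"expected {original} at {position}, found {found}")
    return protein[:position - 1] + new + protein[position:]


if __name__ == "__main__":
    n_term = "MDKKYSIGLD"  # first ten residues of SpCas9
    print(apply_substitution(n_term, "D10A"))
```

Checking the original residue before substituting catches numbering errors, such as applying SpCas9 positions to an ortholog without first mapping corresponding positions.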

[0089] The terms “nucleic acid” and “nucleic acid molecule,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleoside, a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides, are linear molecules in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single- and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, a synthetic DNA, RNA, or DNA/RNA hybrid, or a molecule including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having a backbone other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications.
A nucleic acid sequence is presented in the 5′ to 3′ direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages). In some embodiments, an RNA is an RNA associated with the Cas9 system. For example, the RNA may be a CRISPR RNA (crRNA), a trans-encoded small RNA (tracrRNA), a single guide RNA (sgRNA), or a guide RNA (gRNA).

[0090] The term “nucleic acid editing domain,” as used herein, refers to a protein or enzyme capable of making one or more modifications (e.g., deamination of a cytidine residue) to a nucleic acid (e.g., DNA or RNA). Exemplary nucleic acid editing domains include, but are not limited to, a deaminase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments, the nucleic acid editing domain is a deaminase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments, the nucleic acid editing domain is a deaminase domain (e.g., a cytidine deaminase, such as an APOBEC or an AID deaminase, or an adenosine deaminase, such as ecTadA). In some embodiments, the nucleic acid editing domain is a cytidine deaminase domain (e.g., an APOBEC or an AID deaminase). In some embodiments, the nucleic acid editing domain is an adenosine deaminase domain (e.g., an ecTadA).

[0091] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 113) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 114).

[0092] The term “proliferative disease,” as used herein, refers to any disease in which cell or tissue homeostasis is disturbed in that a cell or cell population exhibits an abnormally elevated proliferation rate. Proliferative diseases include hyperproliferative diseases, such as pre-neoplastic hyperplastic conditions and neoplastic diseases. Neoplastic diseases are characterized by an abnormal proliferation of cells and include both benign and malignant neoplasias. Malignant neoplasia is also referred to as cancer.

[0093] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, synthetic, or any combination thereof. The term “fusion protein,” as used herein, refers to a hybrid polypeptide which comprises protein domains from at least two different proteins, or at least two identical protein domains (i.e., a homodimer). One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic acid editing protein. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent. In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
Methods for protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
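Because a fusion protein is read from its N-terminus to its C-terminus, assembling one from domain sequences amounts to ordered concatenation with linkers in between. The sketch below is illustrative only: the domain strings are hypothetical placeholders, the XTEN linker and the PKKKRKV NLS are taken from [0087] and [0091], and the particular domain order shown is an assumption rather than an architecture stated in this passage.

```python
"""Assemble a fusion protein N- to C-terminus from domain sequences and linkers."""

from typing import Sequence

XTEN = "SGSETPGTSESATPES"  # linker, SEQ ID NO: 89 in the text
NLS = "PKKKRKV"            # nuclear localization sequence, SEQ ID NO: 113 in the text


def fuse(domains: Sequence[str], linker: str = XTEN) -> str:
    """Join domains in N- to C-terminal order, placing `linker` between neighbors."""
    if not domains:
        raise ValueError("at least one domain is required")
    return linker.join(domains)


if __name__ == "__main__":
    editor_domain = "MSEV"    # hypothetical placeholder for an editing domain
    binding_domain = "MDKK"   # hypothetical placeholder for a Cas9 domain
    protein = fuse([editor_domain, binding_domain]) + NLS  # NLS appended at the C-terminus
    print(protein, len(protein))
```

The first element of `domains` ends up at the N-terminus, so swapping the list order turns an amino-terminal fusion into a carboxy-terminal one.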

[0094] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a plant or a fungus. In some embodiments, the subject is a research animal (e.g., a rat, a mouse, or a non-human primate). In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex, of any age, and at any stage of development.

[0095] The term “target site” refers to a nucleic acid sequence or a nucleotide within a nucleic acid that is targeted or modified by an effector domain that is fused to a napDNAbp. In some embodiments, a “target site” is a sequence within a nucleic acid molecule that is deaminated by a deaminase or a fusion protein comprising a deaminase (e.g., a dCas9-deaminase fusion protein or a Cas9n-deaminase fusion protein provided herein). In some embodiments, the target site refers to a sequence within a nucleic acid molecule that is cleaved by a napDNAbp (e.g., a nuclease-active Cas9 domain) provided herein. The target site is contained within a target sequence (e.g., a target sequence comprising a reporter gene, or a target sequence comprising a gene located in a safe harbor locus).

[0096] The terms “treatment,” “treat,” and “treating,” as used herein, refer to a clinical intervention aimed at reversing, alleviating, delaying the onset of, or inhibiting the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.

[0097] The term “pharmaceutical composition,” as used herein, refers to a composition that can be administered to a subject in the context of treatment of a disease or disorder. In some embodiments, a pharmaceutical composition comprises an active ingredient, e.g., a nuclease or a nucleic acid encoding a nuclease, and a pharmaceutically acceptable excipient.

[0098] The term “uracil glycosylase inhibitor” or “UGI,” as used herein, refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme. In some embodiments, a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, a UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NOs: 115-120. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in SEQ ID NOs: 115-120. In some embodiments, a UGI comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NOs: 115-120, or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NOs: 115-120. In some embodiments, proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as “UGI variants.” A UGI variant shares homology with UGI, or a fragment thereof. For example, a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120.
In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in SEQ ID NOs: 115-120. In some embodiments, the UGI comprises the amino acid sequence of SEQ ID NO: 115, as set forth below. Exemplary Uracil-DNA glycosylase inhibitor (UGI; >sp|P14739|UNGI_BPPB2)

MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 115).
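The identity thresholds in this paragraph presuppose a way of scoring two sequences against each other. As a minimal sketch, assuming the two sequences are already aligned residue-for-residue with no gaps, percent identity is the number of matching positions divided by the alignment length. Real comparisons usually rely on a gapped alignment (for example BLASTP or a Needleman-Wunsch global alignment), which this sketch does not perform; the function name `percent_identity` and the K10R variant are hypothetical illustrations.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# UGI of SEQ ID NO: 115, as given above (84 residues).
UGI = ("MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVML"
       "LTSDAPEYKPWALVIQDSNGENKIKML")

# Hypothetical variant carrying a single K10R substitution.
variant = UGI[:9] + "R" + UGI[10:]
print(f"{percent_identity(UGI, variant):.1f}%")  # 98.8%
```

A single substitution in the 84-residue UGI leaves 83 of 84 positions identical, comfortably above the 95% and 98% thresholds recited above but below 99%.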

[0099] The term “catalytically inactive inosine-specific nuclease,” or “dead inosine-specific nuclease (dISN),” as used herein, refers to a protein that is capable of inhibiting an inosine-specific nuclease. Without wishing to be bound by any particular theory, catalytically inactive inosine glycosylases (e.g., alkyl adenine glycosylase [AAG]) will bind inosine, but will not create an abasic site or remove the inosine, thereby sterically blocking the newly formed inosine moiety from DNA damage/repair mechanisms. In some embodiments, the catalytically inactive inosine-specific nuclease is capable of binding an inosine in a nucleic acid but does not cleave the nucleic acid.

Exemplary catalytically inactive inosine-specific nucleases include, without limitation, catalytically inactive alkyl adenine glycosylase (AAG nuclease), for example, from a human, and catalytically inactive endonuclease V (EndoV nuclease), for example, from E. coli. In some embodiments, the catalytically inactive AAG nuclease comprises an E125Q mutation as shown in SEQ ID NO: 40, or a corresponding mutation in another AAG nuclease. In some embodiments, the catalytically inactive AAG nuclease comprises the amino acid sequence set forth in SEQ ID NO: 40. In some embodiments, the catalytically inactive EndoV nuclease comprises a D35A mutation as shown in SEQ ID NO: 41, or a corresponding mutation in another EndoV nuclease. In some embodiments, the catalytically inactive EndoV nuclease comprises the amino acid sequence set forth in SEQ ID NO: 41. It should be appreciated that other catalytically inactive inosine-specific nucleases (dISNs) would be apparent to the skilled artisan and are within the scope of this disclosure. Various examples include:

Truncated AAG (H. sapiens) nuclease (E125Q); mutated residue shown in bold.

KGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRIVETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQA (SEQ ID NO: 116); and

EndoV nuclease (D35A); mutated residue shown in bold.

DLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGEVTRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGVASHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSALAWVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQP (SEQ ID NO: 117).

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

[00100] Streptococcus pyogenes Cas9 (SpCas9) is a widely used genome-editing tool, but its genome targeting is restricted by the requirement for an NGG PAM sequence, which can be limiting for precision genome editing applications such as base editing, homology-directed repair, and predictable template-free genome editing. While SpCas9 variants with alternative PAM requirements have been previously reported, their targeting scope remains restricted primarily to G-containing PAMs.

[00101] The present application provides three SpCas9 variants, obtained using an improved phage-assisted continuous evolution (PACE) Cas9 binding selection, that are capable of recognizing NRTH, NRRH, and NRCH PAMs, respectively. The PAM preferences of these SpCas9 variants, along with those of the previously reported SpCas9-NG variant, are characterized by cytosine base editing, indel formation, and adenine base editing at a panel of 64 potential target sites in mammalian cells. In further aspects, the present application provides the editing efficiencies of the SpCas9 variants on a mammalian cell library of ~12,000 genomically integrated sgRNA/protospacer targets.
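The PAMs above use IUPAC degenerate nucleotide codes: N is any base, R is a purine (A or G), and H is any base except G (A, C, or T). The minimal Python sketch below (the helper name `expand_pam` is hypothetical) expands a degenerate pattern into the concrete PAM sequences it covers, which makes the relative breadth of NGG, NRTH, NRRH, and NRCH easy to compare. It only enumerates sequences matched by each pattern; it says nothing about how efficiently a given variant acts on each one.

```python
from itertools import product

# IUPAC nucleotide codes needed for the PAMs discussed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "H": "ACT",   # not G
    "N": "ACGT",  # any base
}

def expand_pam(pattern):
    """Return every concrete PAM sequence matched by an IUPAC pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern))]

for pattern in ("NGG", "NRTH", "NRRH", "NRCH"):
    pams = expand_pam(pattern)
    # Fraction of all possible k-mers of this length that the pattern accepts.
    fraction = len(pams) / 4 ** len(pattern)
    print(f"{pattern}: {len(pams)} PAMs, {fraction:.4f} of random {len(pattern)}-mers")
```

Because NGG is a 3-mer and the others are 4-mers, the fraction is the fairer comparison: 1/16 for NGG, 3/32 for NRTH and NRCH, and 3/16 for NRRH.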

[00102] Some aspects of this disclosure provide Cas9 proteins (e.g., SpCas9) that efficiently target nucleic acid sequences that do not include the canonical PAM sequence (5´-NGG-3´, where N is any nucleotide, for example A, T, G, or C) at their 3´-ends. It should be appreciated that the phrase “Cas9 proteins” can refer to isolated Cas9 proteins or to Cas9 domains that are part of fusion proteins. In some embodiments, the Cas9 domains provided herein comprise one or more mutations identified in directed evolution experiments using a target sequence library comprising randomized PAM sequences. The non-PAM-restricted Cas9 domains provided herein are useful for targeting DNA sequences that do not comprise the canonical PAM sequence at their 3´-end and thus greatly extend the applicability and usefulness of Cas9 technology for gene editing. The evolution of Cas9 domains that are not restricted to the canonical 5´-NGG-3´ PAM sequence has been previously described, for example, in International Patent Application No. PCT/US2016/058345, filed October 22, 2016, and published as Patent Publication No. WO 2017/070633 on April 27, 2017, entitled “Evolved Cas9 Proteins for Gene Editing,” which is herein incorporated by reference in its entirety. In addition to the Cas9 mutations identified and proteins listed in Publication No. WO 2017/070633, provided herein are novel additional mutations and Cas9 domains that have activity on target sequences comprising non-canonical PAM sequences. It should be understood that any of the mutations listed in Patent Publication No. WO 2017/070633 may be combined with or used in lieu of any of the mutations or Cas9 domains disclosed herein, unless explicitly stated otherwise.
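To make concrete why a relaxed PAM extends targeting, the sketch below scans a DNA sequence for candidate sites: a protospacer (assumed here to be 20 nt, the usual SpCas9 spacer length) immediately followed at its 3´ end by a PAM matching an IUPAC pattern. This is an illustrative sketch only: it scans the given strand and ignores the reverse complement, and the names `find_sites` and `IUPAC_CLASS` and the random test sequence are hypothetical.

```python
import random
import re

# IUPAC codes as regex character classes (subset used for the PAMs discussed here).
IUPAC_CLASS = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "[AG]", "Y": "[CT]", "H": "[ACT]", "N": "[ACGT]"}

def find_sites(dna, pam, spacer_len=20):
    """Return (start, protospacer, PAM) for every site on the given strand.

    A site is spacer_len bases immediately followed (3') by a PAM matching the
    IUPAC pattern `pam`. Starts are 0-based; the zero-width lookahead lets
    overlapping sites be reported.
    """
    pam_regex = "".join(IUPAC_CLASS[c] for c in pam.upper())
    pattern = re.compile(f"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]

# Hypothetical random 1-kb sequence, used only to compare site counts per PAM.
random.seed(0)
dna = "".join(random.choice("ACGT") for _ in range(1000))
for pam in ("NGG", "NRRH"):
    print(pam, len(find_sites(dna, pam)))
```

On random sequence, a pattern that accepts a larger fraction of possible PAMs yields proportionally more candidate protospacers, which is the practical sense in which non-canonical PAM recognition broadens the targetable genome.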

[00103] Some aspects of this disclosure provide fusion proteins that comprise a Cas9 domain and an effector domain, for example, a nucleic acid editing domain, such as a deaminase domain, a nuclease domain, a nickase domain, a recombinase domain, a methyltransferase domain, a methylase domain, an acetylase domain, an acetyltransferase domain, a transcriptional activator domain, or a transcriptional repressor domain.

[00104] The deamination of a nucleobase by a deaminase can lead to a point mutation at the specific residue, which is referred to herein as nucleic acid editing. Fusion proteins comprising a Cas9 domain or variant thereof and a nucleic acid editing domain can thus be used for the targeted editing of nucleic acid sequences. Such fusion proteins are useful for targeted editing of DNA in vitro, e.g., for the generation of mutant cells or animals; for the introduction of targeted mutations, e.g., for the correction of genetic defects in cells ex vivo, e.g., in cells obtained from a subject that are subsequently re-introduced into the same or another subject; and for the introduction of targeted mutations, e.g., the correction of genetic defects or the introduction of deactivating mutations in disease-associated genes in a subject in vivo. Typically, the Cas9 domain of the fusion proteins described herein is a Cas9 domain comprising one or more mutations provided herein (e.g., an “xCas9” domain) that has impaired nuclease activity (e.g., a nuclease-inactive xCas9 domain). For example, in some embodiments, the Cas9 domain comprises a D10A and/or a H840A mutation in the amino acid sequence provided in SEQ ID NO: 2. Methods for the use of fusion proteins comprising Cas9 as described herein are also provided.

[00105] Additional suitable nuclease-inactive Cas9 domains will be apparent to those of skill in the art based on this disclosure. Such additional exemplary suitable nuclease-inactive Cas9 domains include, but are not limited to, D10A, D839A, H840A, N863A, D10A/D839A, D10A/H840A, D10A/N863A, D839A/H840A, D839A/N863A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A mutant proteins (see, e.g., Mali et al., “Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nature Biotechnology, 2013; 31(9): 833-838, the entire contents of which are incorporated herein by reference). In some embodiments, the Cas9 domain comprises a D10A mutation of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2.
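Mutations in this section are written as wild-type residue, 1-based position, new residue (e.g., D10A), with combinations joined by slashes (e.g., D10A/H840A). A minimal sketch of applying such a string to a protein sequence follows; checking the wild-type residue before substituting is a design choice that flags numbering mismatches between sequences. The function name `apply_mutations` and the 12-residue toy peptide are hypothetical, since SEQ ID NO: 2 is not reproduced in this section. Following the claims, an X in the wild-type position is treated as matching any residue.

```python
import re

# Wild-type residue, 1-based position, new residue (e.g. "D10A").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")

def apply_mutations(seq, mutations):
    """Apply slash-separated substitutions such as "D10A/H840A" to a protein sequence."""
    residues = list(seq)
    for token in mutations.split("/"):
        m = MUTATION.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{token}: position outside sequence of length {len(residues)}")
        # "X" means any amino acid; otherwise the stated wild-type residue must match.
        if wt != "X" and residues[pos - 1] != wt:
            raise ValueError(f"{token}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)

# Hypothetical 12-residue toy peptide standing in for SEQ ID NO: 2.
toy = "MDKKYSIGLDIG"
print(apply_mutations(toy, "D10A"))  # MDKKYSIGLAIG
```

Applied to a full-length sequence, the same call with "D10A/H840A" would produce the double mutant listed above, provided the sequence carries D and H at those positions.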

[00106] The base editors disclosed herein may also comprise a circularly permuted Cas9 variant. The term “circularly permuted Cas9” refers to any Cas9 protein, or variant thereof, that occurs or has been modified to occur as a circular permutant, whereby its N- and C-termini have been topologically rearranged. Such circularly permuted Cas9 proteins (“CP-Cas9”), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, January 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).

[00107] Any of the Cas9 proteins described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.

[00108] In various embodiments, the circular permutants of Cas9 may have the following structure: N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.
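The layout above can be expressed as a short sequence operation. The sketch below, assuming 1-based residue numbering and a plain-string protein sequence, moves the region beginning at the CP site to the front and appends the original N-terminal region after an optional linker. The function name, the toy sequence, and the GGS linker are hypothetical placeholders; the optional initiator methionine mentioned for the exemplary CP-Cas9 sequences is not added here.

```python
def circular_permutant(seq, cp_site, linker=""):
    """Build N-[cp_site..end]-[linker]-[1..cp_site - 1]-C from a 1-based CP site."""
    if not 2 <= cp_site <= len(seq):
        raise ValueError("cp_site must be an internal residue (2..len(seq))")
    original_c_region = seq[cp_site - 1:]  # begins with the CP-site residue
    original_n_region = seq[:cp_site - 1]  # residues 1..cp_site - 1
    return original_c_region + linker + original_n_region

# Hypothetical 10-residue toy sequence and GGS linker.
print(circular_permutant("ACDEFGHIKL", 4, linker="GGS"))  # EFGHIKLGGSACD
```

With the 1368-residue S. pyogenes Cas9 and a CP site of 1029, for example, this would yield the layout N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus.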

[00109] As an example, the present disclosure contemplates the following circular permutants of S. pyogenes Cas9 (based on the 1368 amino acids of UniProtKB Q99ZW2 (CAS9_STRP1), SEQ ID NO: 6):

[00110] N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus;

[00111] N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus;

[00112] N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus;

[00113] N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus;

[00114] N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus;

[00115] N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus;

[00116] N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus;

[00117] N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus;

[00118] N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus;

[00119] N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus;

[00120] N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus;

[00121] N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus;

[00122] N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or

[00123] N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00124] In particular embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 of SEQ ID NO: 6):

[00125] N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus;

[00126] N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus;

[00127] N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus;

[00128] N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or

[00129] N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00130] In still other embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 of SEQ ID NO: 6):

[00131] N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;

[00132] N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus;

[00133] N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus;

[00134] N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or

[00135] N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

[00136] In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild-type Cas9. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length. In some embodiments, the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all.

[00137] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment may correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., SEQ ID NO: 6). The N-terminal portion may correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., of SEQ ID NO: 6).

[00138] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 6). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 410 residues or less of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6). In some embodiments, the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 6).
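The residue counts listed above convert directly into CP sites: for a parent protein of length L, the C-terminal n residues span amino acids L - n + 1 through L, so the new N-terminus begins at original residue L - n + 1. A minimal sketch for the 1368-residue S. pyogenes Cas9 of SEQ ID NO: 6 follows; the function name is hypothetical.

```python
SPCAS9_LENGTH = 1368  # residues in S. pyogenes Cas9 (SEQ ID NO: 6)

def cp_site_for_c_terminal_length(n_residues, length=SPCAS9_LENGTH):
    """Original residue that becomes the new N-terminus when the C-terminal
    n_residues are moved to the front (1-based numbering)."""
    if not 1 <= n_residues < length:
        raise ValueError("the relocated fragment must be shorter than the protein")
    return length - n_residues + 1

for n in (357, 341, 328, 120, 69):
    site = cp_site_for_c_terminal_length(n)
    share = 100 * n / SPCAS9_LENGTH
    print(f"C-terminal {n} residues = amino acids {site}-{SPCAS9_LENGTH} ({share:.1f}%)")
```

For n = 357, 341, 328, 120, and 69 this gives amino acids 1012, 1028, 1041, 1249, and 1300, matching the CP1012 example and the CP sites listed in paragraphs [00126] to [00129].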

[00139] In other embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 6: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which divides the original protein into two regions: an N-terminal region and a C-terminal region; (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative to the S. pyogenes Cas9 of SEQ ID NO: 6) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9-CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively. This description is not meant to be limited to making CP variants from SEQ ID NO: 6, but may be implemented to make CP variants in any Cas9 sequence, either at CP sites that correspond to these positions, or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
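When a mutation such as D10A is carried into a CP-Cas9, its residue number changes. The sketch below maps an original residue number to its position in a permutant laid out as [optional Met]-[CP site..L]-[linker]-[1..CP site - 1]. It assumes no residues are deleted, and the function name and its parameters are hypothetical.

```python
def cp_position(original_pos, cp_site, length=1368, linker_len=0, leading_met=False):
    """Map a 1-based residue number in the parent Cas9 to its number in a CP variant."""
    if not 1 <= original_pos <= length:
        raise ValueError("original_pos outside the parent sequence")
    if not 2 <= cp_site <= length:
        raise ValueError("cp_site must be an internal residue")
    offset = 1 if leading_met else 0
    if original_pos >= cp_site:
        # Residue lies in the relocated original C-terminal region, which now comes first.
        return offset + original_pos - cp_site + 1
    # Residue lies in the original N-terminal region, which now follows the linker.
    return offset + (length - cp_site + 1) + linker_len + original_pos

print(cp_position(1029, 1029))  # 1
print(cp_position(10, 1029))    # 350
```

For example, in Cas9-CP1029 without a linker, original residue 1029 becomes residue 1, and original residue 10 becomes residue 350 (the 340 relocated residues 1029-1368, plus 10).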
[00140] Exemplary CP-Cas9 amino acid sequences, based on the Cas9 of SEQ ID NO: 6, are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 6 and any examples provided herein are not meant to be limiting.

[00141] CP1012

[00146] Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 6, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting.

Cas9 domains

[00152] Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5´-NGG-3´, where N is A, C, G, or T) at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NGG-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NNG-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NNA-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NNC-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NNT-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NGT-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NGA-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NGC-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NAA-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NAC-3´ PAM sequence at its 3´-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NAT-3´ PAM sequence at its 3´-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5´-NAG-3´ PAM sequence at its 3´-end.

[00153] It should be appreciated that any of the amino acid mutations described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved relative to) the second amino acid residue. For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine.
In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
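The residue-specific alternatives stated in the embodiments above can be captured in a lookup table. The sketch below encodes only those explicitly named alternatives (threonine to serine, arginine to lysine, and so on); it does not attempt the broader side-chain groupings, and the names `CONSERVATIVE_ALTERNATIVES` and `conservative_variants` are hypothetical. Alternatives identical to the wild-type residue are skipped, since they would be silent.

```python
import re

# Alternatives named in the embodiments above: a mutation *to* the key residue
# may instead be a mutation to any residue in the tuple.
CONSERVATIVE_ALTERNATIVES = {
    "T": ("S",),
    "R": ("K",),
    "I": ("A", "V", "M", "L"),
    "K": ("R",),
    "D": ("E", "N"),
    "V": ("A", "I", "M", "L"),
    "G": ("A",),
}

def conservative_variants(mutation):
    """Return the mutation followed by its stated conservative alternatives."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt, pos, new = m.groups()
    # Skip alternatives equal to the wild-type residue (they would be silent).
    alternatives = [aa for aa in CONSERVATIVE_ALTERNATIVES.get(new, ()) if aa != wt]
    return [mutation] + [f"{wt}{pos}{aa}" for aa in alternatives]

print(conservative_variants("A262T"))   # ['A262T', 'A262S']
print(conservative_variants("E1219V"))  # ['E1219V', 'E1219A', 'E1219I', 'E1219M', 'E1219L']
```

A mutation to a residue without a stated alternative, such as R654L, is returned unchanged as a one-element list.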

Mutations in Wild-Type SpCas9

[00154] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by any of the sequences set forth in SEQ ID NO: 2, 4, or 6-11, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of S. pyogenes having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in any of the amino acid sequences provided in SEQ ID NO: 2. In some embodiments, the Cas9 protein comprises a RuvC and an HNH domain. In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 domain. In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 domain is a Cas9 nickase.
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NO: 2, wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, V1139A, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NO: 2, 4, or 6-11.
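Claim-style language such as "at least one mutation in an amino acid residue selected from the group consisting of" can be checked mechanically once a variant has been aligned to SEQ ID NO: 2. The sketch below assumes an ungapped, residue-for-residue alignment (so positions in the variant equal positions in SEQ ID NO: 2) and reports substitutions at the positions listed in this paragraph; the function name and the poly-glycine test sequence are hypothetical, and a real analysis would first align the sequences.

```python
# Residue positions (1-based, SEQ ID NO: 2 numbering) listed in paragraph [00154].
LISTED_POSITIONS = frozenset({
    10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693,
    710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869,
    921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139,
    1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318,
    1317, 1320, 1323, 1333,
})

def listed_mutations(reference, variant, positions=LISTED_POSITIONS):
    """Return substitutions (e.g. "A10T") found at the listed positions.

    Assumes reference and variant are aligned residue-for-residue without gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{reference[p - 1]}{p}{variant[p - 1]}"
            for p in sorted(positions)
            if p <= len(reference) and reference[p - 1] != variant[p - 1]]

# Hypothetical example: a poly-glycine "reference" with changes at residues 10 and 11.
reference = "G" * 1368
variant = reference[:9] + "TT" + reference[11:]
hits = listed_mutations(reference, variant)
print(hits, "->", len(hits), "listed position(s) mutated")
```

Only the change at residue 10 is reported, because residue 11 is not among the listed positions; the length of the returned list is what an "at least one" (or "at least nine") threshold would be compared against.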

[00155] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of S. pyogenes having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in any of the amino acid sequences provided in SEQ ID NO: 2, 4, or 6-11. In some embodiments, the Cas9 protein comprises a RuvC and an HNH domain.

[00156] In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein. In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 protein is a Cas9 nickase. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11, wherein X represents any amino acid. 
In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in any of the amino acid sequences provided in SEQ ID NOs: 2, 4, or 6-11, wherein X represents any amino acid.

[00157] Some aspects of this disclosure provide a Cas9 protein comprising an amino acid sequence that is at least 80% identical to the amino acid sequence of Cas9 as provided by SEQ ID NO: 2, wherein the amino acid sequence of the Cas9 protein comprises at least one mutation in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of S. pyogenes having the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding amino acid residue in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11). In some embodiments, the Cas9 protein comprises a RuvC and an HNH domain. In some embodiments, the amino acid sequence of the Cas9 protein is not identical to the amino acid sequence of a naturally occurring Cas9 protein.
In some embodiments, the Cas9 protein is a nuclease-inactive Cas9 protein. In some embodiments, the Cas9 protein is a Cas9 nickase. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid. In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one mutation selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
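Several paragraphs refer to a mutation at a residue of SEQ ID NO: 2 "or in a corresponding amino acid residue in another Cas9 sequence." One common way to identify a corresponding residue, assumed here rather than specified by the text, is to read it off a pairwise alignment. The hypothetical helper below takes the two rows of an existing alignment ('-' for gaps) and maps a 1-based reference position to the 1-based position in the other sequence, returning None when the reference residue is aligned opposite a gap.

```python
from typing import Optional


def corresponding_position(aligned_reference: str, aligned_other: str,
                           reference_position: int) -> Optional[int]:
    """Map a 1-based reference position onto another aligned sequence.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Returns the 1-based position in the other sequence, or None if the
    reference residue sits opposite a gap.
    """
    if len(aligned_reference) != len(aligned_other):
        raise ValueError("alignment rows must have equal length")
    ref_count = 0    # residues of the reference seen so far
    other_count = 0  # residues of the other sequence seen so far
    for ref_char, other_char in zip(aligned_reference, aligned_other):
        if ref_char != "-":
            ref_count += 1
        if other_char != "-":
            other_count += 1
        if ref_char != "-" and ref_count == reference_position:
            return other_count if other_char != "-" else None
    raise ValueError("position is beyond the end of the reference")
```

For the toy alignment of "MDKK-YSIG" against "M-KKAYSIG", reference position 3 maps to position 2 of the other sequence, and reference position 2 has no counterpart.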

[00158] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations in amino acid residues selected from the group consisting of 10, 177, 218, 322, 367, 409, 427, 589, 599, 614, 630, 631, 654, 673, 693, 710, 715, 727, 743, 753, 757, 758, 762, 763, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1139, 1151, 1180, 1188, 1211, 1219, 1221, 1223, 1256, 1264, 1274, 1290, 1318, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
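Paragraph [00158] is a counting condition: the variant must differ from SEQ ID NO: 2 at a minimum number of positions drawn from the listed group. A minimal sketch of that check is shown below, assuming the variant is numbered identically to the reference (no insertions or deletions); the function names and the small example group are hypothetical and are not one of the groups recited above.

```python
from typing import Set


def mutated_positions(variant: str, reference: str,
                      positions: Set[int]) -> Set[int]:
    """Return the 1-based positions from `positions` where variant differs."""
    if len(variant) != len(reference):
        raise ValueError("sketch assumes equal-length, identically numbered sequences")
    # Positions beyond the sequence length are ignored rather than reported.
    return {
        p for p in positions
        if 1 <= p <= len(reference) and variant[p - 1] != reference[p - 1]
    }


def has_at_least_n_mutations(variant: str, reference: str,
                             positions: Set[int], n: int) -> bool:
    # "At least n mutations in residues selected from the group".
    return len(mutated_positions(variant, reference, positions)) >= n
```

For example, with reference "MDKKYSIGL" and variant "MNRKYSIGV", the group {2, 3, 5} contains two mutated positions (2 and 3), so the at-least-two condition holds and the at-least-three condition does not.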

[00159] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X409I, X427G, X589S, X599R, X614N, X630K, X631A, X654L, X673E, X693L, X710E, X715C, X727I, X743I, X753G, X757K, X758H, X762G, X763I, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1139A, X1151E, X1180G, X1188R, X1211R, X1219V, X1221H, X1223S, X1256R, X1264Y, X1274R, X1290G, X1318S, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
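Paragraph [00159] counts specific substitutions rather than positions. In the sketch below, which assumes identically numbered sequences and uses a hypothetical helper name, a listed substitution is counted only when the variant carries the named replacement residue and that residue differs from the reference; a named wild-type residue (the A in A10T) must also match the reference, whereas X accepts any reference residue. Under this reading, entries that do not change the residue (such as G865G) and non-substitution entries (such as X1367fs?) never count.

```python
import re
from typing import List

# Assumed notation: wild-type letter (or X for any), 1-based position,
# replacement letter, e.g. "X10T" or "A10T".
_SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def matched_substitutions(variant: str, reference: str,
                          mutations: List[str]) -> List[str]:
    """Return the listed substitutions that the variant carries."""
    if len(variant) != len(reference):
        raise ValueError("sketch assumes identically numbered sequences")
    found = []
    for text in mutations:
        match = _SUB.match(text)
        if match is None:
            continue  # skip entries, such as frameshifts, that are not substitutions
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(reference):
            continue  # position lies outside the compared sequences
        ref_res, var_res = reference[pos - 1], variant[pos - 1]
        if wt != "X" and ref_res != wt:
            continue  # named wild-type residue does not match the reference
        if var_res == new and var_res != ref_res:
            found.append(text)
    return found
```

The condition "at least N mutations selected from the group" then corresponds to len(matched_substitutions(...)) >= N.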

[00160] In some embodiments, the present disclosure may utilize any of the Cas9 variants disclosed in the SEQUENCES section herein.

[00161] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, or at least fifty-nine mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, S409I, E427G, A589S, K599R, D614N, E630K, M631A, R654L, K673E, F693L, K710E, G715C, L727I, V743I, R753G, E757K, N758H, E762G, M763I, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, V1139A, K1151E, D1180G, K1188R, K1211R, E1219V, Q1221H, G1223S, Q1256R, H1264Y, S1274R, V1290G, L1318S, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00162] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations in amino acid residues selected from the group consisting of 10, 177, 218, 322, 367, 427, 589, 599, 614, 630, 631, 693, 710, 743, 753, 757, 758, 762, 768, 803, 859, 861, 865, 869, 921, 946, 1016, 1021, 1028, 1054, 1077, 1080, 1114, 1134, 1135, 1137, 1151, 1180, 1188, 1211, 1221, 1223, 1274, 1290, 1317, 1320, 1323, and 1333 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00163] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of X10T, X177N, X218R, X322V, X367T, X427G, X589S, X599R, X614N, X630K, X631A, X693L, X710E, X743I, X753G, X757K, X758H, X762G, X768H, X803S, X859S, X861N, X865G, X869S, X921P, X946D, X1016D, X1021T, X1028D, X1054D, X1077D, X1080S, X1114G, X1134L, X1135N, X1137S, X1151E, X1180G, X1188R, X1211R, X1221H, X1223S, X1274R, X1290G, X1317T, X1320V, X1323D, and X1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00164] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of A10T, D177N, K218R, I322V, A367T, E427G, A589S, K599R, D614N, E630K, M631A, F693L, K710E, V743I, R753G, E757K, N758H, E762G, Q768H, N803S, R859S, D861N, G865G, N869S, L921P, N946D, Y1016D, M1021T, E1028D, N1054D, G1077D, F1080S, R1114G, F1134L, D1135N, P1137S, K1151E, D1180G, K1188R, K1211R, Q1221H, G1223S, S1274R, V1290G, N1317T, A1320V, A1323D, and R1333K of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00165] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations in amino acid residues selected from the group consisting of 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 652, 653, 654, 670, 673, 676, 687, 703, 710, 711, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 922, 928, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1207, 1219, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1337, 1338, 1348, 1349, 1365, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00166] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X673E, X676G, X687R, X703P, X710E, X711T, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X928T, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1207G, X1219V, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1337N, X1338T, X1348V, X1349R, X1365L, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00167] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, at least seventy-two, at least seventy-three, at least seventy-four, at least seventy-five, at least seventy-six, at least seventy-seven, at least seventy-eight, at least seventy-nine, or at least eighty mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, K673E, G676G, G687R, T703P, K710E, A711T, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, E1207G, E1219V, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, T1337N, S1338T, I1348V, H1349R, L1365L, G1367E, G1367T, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00168] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations in amino acid residues selected from the group consisting of 472, 562, 565, 570, 589, 608, 625, 627, 629, 630, 631, 638, 647, 647, 652, 653, 654, 676, 687, 703, 716, 740, 742, 752, 753, 767, 771, 775, 789, 790, 795, 797, 803, 804, 808, 848, 866, 875, 890, 890, 922, 948, 959, 990, 995, 1014, 1015, 1016, 1021, 1030, 1036, 1055, 1057, 1057, 1114, 1127, 1135, 1156, 1177, 1180, 1184, 1234, 1246, 1251, 1252, 1286, 1301, 1332, 1335, 1348, 1367, and 1368 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00169] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations selected from the group consisting of X472I, X562F, X565D, X570T, X570S, X589V, X608R, X625S, X627K, X629G, X630G, X631I, X631V, X638P, X647A, X647I, X652T, X653K, X654L, X654I, X654H, X670T, X676G, X687R, X703P, X710E, X716R, X740A, X742E, X752R, X753G, X767D, X771H, X775R, X789E, X790A, X795L, X797N, X803S, X804A, X808D, X848N, X866R, X875I, X890E, X890N, X922A, X948E, X959N, X990S, X995S, X1014N, X1015A, X1016C, X1016S, X1021L, X1030R, X1036H, X1036D, X1055E, X1057S, X1057T, X1114G, X1127A, X1127G, X1135N, X1156E, X1156N, X1177S, X1180E, X1184T, X1234D, X1246E, X1251G, X1252D, X1286H, X1301S, X1332N, X1332G, X1335Q, X1338T, X1348V, X1349R, X1367E, X1367T, X1367fs?, and X1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00170] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, at least forty-eight, at least forty-nine, at least fifty, at least fifty-one, at least fifty-two, at least fifty-three, at least fifty-four, at least fifty-five, at least fifty-six, at least fifty-seven, at least fifty-eight, at least fifty-nine, at least sixty, at least sixty-one, at least sixty-two, at least sixty-three, at least sixty-four, at least sixty-five, at least sixty-six, at least sixty-seven, at least sixty-eight, at least sixty-nine, at least seventy, at least seventy-one, or at least seventy-two mutations selected from the group consisting of T472I, I562F, V565D, I570T, I570S, A589V, K608R, L625S, E627K, R629G, E630G, M631I, M631V, T638P, V647A, V647I, K652T, R653K, R654L, R654I, R654H, I670T, G676G, G687R, T703P, K710E, Q716R, T740A, K742E, G752R, R753G, N767D, Q771H, K775R, K789E, E790A, I795L, K797N, N803S, T804A, N808D, K848N, K866R, V875I, K890E, K890N, V922A, K948E, K959N, N990S, T995S, K1014N, V1015A, Y1016C, Y1016S, M1021L, G1030R, Y1036H, Y1036D, I1055E, I1057S, I1057T, R1114G, D1127A, D1127G, D1135N, K1156E, K1156N, N1177S, D1180E, A1184T, N1234D, K1246E, D1251G, N1252D, N1286H, P1301S, D1332N, D1332G, R1335Q, S1338T, I1348V, H1349R, G1367E, G1367T, G1367fs?, and D1368D of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00171] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations in amino acid residues selected from the group consisting of 575, 596, 631, 649, 654, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 955, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1219, 1221, 1227, 1249, 1253, 1256, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00172] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X654L, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X955L, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1219V, X1221H, X1227V, X1249S, X1253K, X1256R, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or in a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00173] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, at least forty-three, at least forty-four, at least forty-five, at least forty-six, at least forty-seven, or at least forty-eight mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R654L, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, V955L, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, E1219V, Q1221H, A1227V, P1249S, E1253K, Q1256R, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
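The substitutions in paragraphs [00172] and [00173] use the conventional notation of wild-type residue, position in SEQ ID NO: 2, and substituted residue (so D1135N replaces the aspartate at position 1135 with asparagine), with X standing for any amino acid. The following minimal Python sketch, which is illustrative only and not part of the disclosed methods, parses such strings and checks the stated wild-type residue against a sequence. The names `Mutation`, `parse_mutation`, and `check_wild_type` are hypothetical, and 1-based residue numbering is assumed.

```python
import re
from typing import NamedTuple

# Assumed notation: wild-type residue (one-letter code, or X for "any"),
# 1-based position, substituted residue (or X for "any").
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Mutation(NamedTuple):
    wild_type: str   # "X" means the wild-type residue is not specified
    position: int    # 1-based residue number
    substitute: str  # "X" means any amino acid


def parse_mutation(text: str) -> Mutation:
    """Parse a single-residue substitution such as 'D1135N' or 'X1135N'."""
    match = _MUTATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {text!r}")
    wild_type, position, substitute = match.groups()
    return Mutation(wild_type, int(position), substitute)


def check_wild_type(mutation: Mutation, sequence: str) -> bool:
    """True if the sequence carries the stated wild-type residue (X always passes)."""
    if mutation.wild_type == "X":
        return True
    index = mutation.position - 1
    return 0 <= index < len(sequence) and sequence[index] == mutation.wild_type


print(parse_mutation("D1135N"))  # Mutation(wild_type='D', position=1135, substitute='N')
```

Checking the wild-type residue before applying a substitution is a cheap guard against numbering offsets when the same notation is mapped onto another Cas9 sequence.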

[00174] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations in an amino acid residue selected from the group consisting of amino acid residues 575, 596, 631, 649, 664, 710, 740, 743, 748, 750, 753, 765, 790, 797, 853, 922, 961, 985, 1012, 1049, 1057, 1114, 1131, 1135, 1150, 1156, 1162, 1180, 1191, 1218, 1221, 1249, 1253, 1286, 1293, 1308, 1317, 1320, 1321, 1332, 1335, and 1339 of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding residue, or residues, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11).

[00175] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations selected from the group consisting of X575S, X596Y, X631L, X649R, X664K, X710E, X740A, X743I, X748I, X750A, X753G, X765X, X790A, X797E, X853E, X922A, X961E, X985Y, X1012A, X1049G, X1057V, X1114G, X1131C, X1135N, X1150V, X1156E, X1162A, X1180G, X1180A, X1191N, X1218S, X1221H, X1249S, X1253K, X1286K, X1293T, X1308D, X1317K, X1320V, X1321S, X1332G, X1335L, and X1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.

[00176] In some embodiments, the amino acid sequence of the Cas9 protein comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, at least twenty-four, at least twenty-five, at least twenty-six, at least twenty-seven, at least twenty-eight, at least twenty-nine, at least thirty, at least thirty-one, at least thirty-two, at least thirty-three, at least thirty-four, at least thirty-five, at least thirty-six, at least thirty-seven, at least thirty-eight, at least thirty-nine, at least forty, at least forty-one, at least forty-two, or at least forty-three mutations selected from the group consisting of F575S, D596Y, M631L, K649R, R664K, K710E, T740A, V743I, V748I, V750A, R753G, R765X, E790A, K797E, D853E, V922A, K961E, H985Y, D1012A, E1049G, I1057V, R1114G, Y1131C, D1135N, E1150V, K1156E, E1162A, D1180G, D1180A, K1191N, G1218S, Q1221H, P1249S, E1253K, N1286K, A1293T, N1308D, N1317K, A1320V, P1321S, D1332G, R1335L, and T1339I of the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation, or mutations, in another Cas9 sequence (e.g., any of the sequences of SEQ ID NOs: 2, 4, or 6-11), wherein X represents any amino acid.
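Paragraphs [00173] and [00176] differ only in which substitutions they list. The short Python check below uses lists transcribed by hand from those two paragraphs (the transcription should be verified against the source). It shows that the forty-three-mutation list omits five substitutions from the forty-eight-mutation list, and that the 48 listed substitutions occupy 47 residue positions because D1180G and D1180A share position 1180. The helper name `position` is hypothetical.

```python
# Substitutions transcribed from paragraph [00173] (48 entries).
LIST_00173 = """F575S D596Y M631L K649R R654L R664K K710E T740A V743I V748I V750A
R753G R765X E790A K797E D853E V922A V955L K961E H985Y D1012A E1049G I1057V
R1114G Y1131C D1135N E1150V K1156E E1162A D1180G D1180A K1191N G1218S E1219V
Q1221H A1227V P1249S E1253K Q1256R N1286K A1293T N1308D N1317K A1320V P1321S
D1332G R1335L T1339I""".split()

# Substitutions transcribed from paragraph [00176] (43 entries).
LIST_00176 = """F575S D596Y M631L K649R R664K K710E T740A V743I V748I V750A R753G
R765X E790A K797E D853E V922A K961E H985Y D1012A E1049G I1057V R1114G Y1131C
D1135N E1150V K1156E E1162A D1180G D1180A K1191N G1218S Q1221H P1249S E1253K
N1286K A1293T N1308D N1317K A1320V P1321S D1332G R1335L T1339I""".split()


def position(mutation: str) -> int:
    """Residue number of a substitution written as, e.g., 'R654L'."""
    return int(mutation[1:-1])


# Substitutions present in [00173] but absent from [00176], in sequence order.
omitted = sorted(set(LIST_00173) - set(LIST_00176), key=position)
print(omitted)  # ['R654L', 'V955L', 'E1219V', 'A1227V', 'Q1256R']
print(len(LIST_00173), len({position(m) for m in LIST_00173}))  # 48 47
print(len(LIST_00176), len({position(m) for m in LIST_00176}))  # 43 42
```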

[00177] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X570T mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X570S.

[00178] In some embodiments, the amino acid sequence of the Cas9 domain comprises an I570T mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is I570S.
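Paragraphs [00177] and [00178] pair a generic substitution (X570T, where X is any amino acid) with a specific one (I570T), and the same pattern recurs through [00210]. One way to make that relationship concrete is a small predicate that asks whether a specific substitution falls within a generic pattern. This is an illustrative Python sketch only; the function names `_split` and `falls_within` are hypothetical, and the predicate is not offered as a construction of the claims.

```python
def _split(mutation: str) -> tuple[str, int, str]:
    """Split a substitution such as 'I570T' into ('I', 570, 'T')."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def falls_within(pattern: str, mutation: str) -> bool:
    """True if a specific substitution matches a pattern in which X means any amino acid."""
    p_wt, p_pos, p_sub = _split(pattern)
    m_wt, m_pos, m_sub = _split(mutation)
    return p_pos == m_pos and p_wt in ("X", m_wt) and p_sub in ("X", m_sub)


print(falls_within("X570T", "I570T"))  # True
print(falls_within("X570T", "I570S"))  # False
```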

[00179] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X589S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X589V.

[00180] In some embodiments, the amino acid sequence of the Cas9 domain comprises an A589S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is A589V.

[00181] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X630G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X630K.

[00182] In some embodiments, the amino acid sequence of the Cas9 domain comprises an E630G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is E630K.

[00183] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X631A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X631I. In some embodiments, the mutation is X631L. In some embodiments, the mutation is X631V.

[00184] In some embodiments, the amino acid sequence of the Cas9 domain comprises an M631A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is M631I. In some embodiments, the mutation is M631L. In some embodiments, the mutation is M631V.

[00185] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X647A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X647I.

[00186] In some embodiments, the amino acid sequence of the Cas9 domain comprises a V647A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is V647I.

[00187] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X654H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X654I. In some embodiments, the mutation is X654L.

[00188] In some embodiments, the amino acid sequence of the Cas9 domain comprises an R654H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is R654I. In some embodiments, the mutation is R654L.

[00189] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X890E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X890N.

[00190] In some embodiments, the amino acid sequence of the Cas9 domain comprises a K890E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is K890N.

[00191] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1016C mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1016D. In some embodiments, the mutation is X1016S.

[00192] In some embodiments, the amino acid sequence of the Cas9 domain comprises a Y1016C mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is Y1016D. In some embodiments, the mutation is Y1016S.

[00193] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1021L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1021T.

[00194] In some embodiments, the amino acid sequence of the Cas9 domain comprises an M1021L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is M1021T.

[00195] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1036D mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1036H.

[00196] In some embodiments, the amino acid sequence of the Cas9 domain comprises a Y1036D mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is Y1036H.

[00197] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1057S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1057T. In some embodiments, the mutation is X1057V.

[00198] In some embodiments, the amino acid sequence of the Cas9 domain comprises an I1057S mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is I1057T. In some embodiments, the mutation is I1057V.

[00199] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1127A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1127G.

[00200] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1127A mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1127G.

[00201] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1156E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1156N.

[00202] In some embodiments, the amino acid sequence of the Cas9 domain comprises a K1156E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is K1156N.

[00203] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1180E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1180G.

[00204] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1180E mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1180G.

[00205] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1286H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1286K.

[00206] In some embodiments, the amino acid sequence of the Cas9 domain comprises an N1286H mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is N1286K.

[00207] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1332G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1332N.

[00208] In some embodiments, the amino acid sequence of the Cas9 domain comprises a D1332G mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is D1332N.

[00209] In some embodiments, the amino acid sequence of the Cas9 protein comprises an X1335L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence, wherein X represents any amino acid. In some embodiments, the mutation is X1335Q.

[00210] In some embodiments, the amino acid sequence of the Cas9 domain comprises an R1335L mutation in the amino acid sequence provided in SEQ ID NO: 2, or a corresponding mutation in another Cas9 sequence. In some embodiments, the mutation is R1335Q.

[00211] In some embodiments, the Cas9 protein comprises a combination of mutations that exhibits activity on a target sequence comprising a 5′-NAA-3′ PAM sequence at its 3′ end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 1. In some embodiments, the mutations in the combination are conservative mutations of those in the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N3.19.4c-3; N3.19.4c-4; P4.2-72-4; P4.2-72-5; P10.6.144.2; P10.5.192.7; P10.5.192.10; P10.6.144.5; P10.6.192.1; P10.6.192.9; P10.6.192.12; P13.2-8; P13.3-3; P13.4-3; P16.2-120-1; P16.2-120-2; P16.2-120-3; P16.2-120-4; P16.2-120-5; P16.2-120-6; P16.1-3; P16.3-2; P16.4-5(es); and P16.6-2, or a combination of conservative mutations thereto.

[00212] Table 1: NAA PAM Clones

[00213] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1.
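The identity thresholds in paragraph [00213] (at least 80%, 85%, and so on) can be illustrated with a simple calculation. The sketch below is a simplification: it assumes the two sequences have already been aligned to equal length and counts identical positions over the alignment length, whereas in practice identity is usually computed from a pairwise alignment (for example, BLAST or Needleman-Wunsch) whose denominator convention varies. With that caveat, a 1,368-residue protein (the length of wild-type SpCas9) carrying 48 substitutions remains about 96.5% identical to its parent. The function name `percent_identity` and the placeholder sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Gap characters ('-') never count as matches but do count toward the
    alignment length used as the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


# Toy example: a placeholder 1,368-residue sequence with 48 positions changed.
parent = "A" * 1368
variant = "G" * 48 + "A" * 1320
print(round(percent_identity(parent, variant), 2))  # 96.49
```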

[00214] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.
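PAM classes such as 5′-NAA-3′ use IUPAC nucleotide codes, where N is any of A, C, G, or T. Expanding a pattern recovers the concrete PAMs listed at the end of paragraph [00214]. A minimal Python sketch follows; the `IUPAC` table covers only the codes used in this section (N, R, and H, plus the four bases), and `expand_pam` is a hypothetical helper name.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes: N = any base, R = A or G, H = A, C, or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "H": "ACT"}


def expand_pam(pattern: str) -> list[str]:
    """Return every concrete sequence matched by an IUPAC PAM pattern."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]


print(expand_pam("NAA"))        # ['AAA', 'CAA', 'GAA', 'TAA']
print(len(expand_pam("NRRH")))  # 48
```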

[00215] In some embodiments, the Cas9 protein comprises a combination of mutations that exhibits activity on a target sequence comprising a 5′-NAC-3′ PAM sequence at its 3′ end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 2. In some embodiments, the mutations in the combination are conservative mutations of those in the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn). In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of N4.CAC-1; N4.CAC-5; N4.CAC06; SacB.CAC.4h; N3.CAC-1; N3.CAC-5; N3.CAC-6; N3.CAC-8; P15.1.166-3; P15.1.166-8; P15.2.166-2; P15.3.166-4; P15.3.166-5; P15.3.166-7; P15.4.166-4; P15.4.166-8; P17.1.144-1; P17.1.144-2; P17.1.144-3; P17.1.144-4; P17.1.144-5; P17.1.144-7; P17.1.144-8; P17.2.144-1; P17.2.144-2; P17.2.144-3; P17.2.144-4; P17.2.144-5; P17.2.144-6; P17.2.144-7; P17.2.144-8; P17.1-1; P17.1-5; and P17.1.7-4(fn), or a combination of conservative mutations thereto.

[00216] Table 2: NAC PAM Clones

[00217] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2.

[00218] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to an AAC, GAC, CAC, or TAC sequence.
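Paragraph [00218] refers to target sequences whose 3′ end is directly adjacent to a PAM such as AAC, GAC, CAC, or TAC (the 5′-NAC-3′ class). A minimal Python sketch of locating such sites is shown below. It assumes a 20-nucleotide protospacer, scans only the strand it is given (the reverse complement would have to be scanned separately), and uses the hypothetical helper name `find_targets`; it is illustrative and not a method taken from the disclosure.

```python
import re

# Regex equivalents of the IUPAC codes used in this section.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T",
               "N": "[ACGT]", "R": "[AG]", "H": "[ACT]"}


def find_targets(dna: str, pam: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every site on this strand whose
    3' end is immediately followed by a sequence matching the PAM pattern."""
    pam_regex = "".join(IUPAC_REGEX[code] for code in pam.upper())
    # A zero-width lookahead lets overlapping sites all be reported.
    site = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in site.finditer(dna.upper())]


example = "G" * 20 + "CAC" + "TTTT"
print([(start, p) for start, _, p in find_targets(example, "NAC")])  # [(0, 'CAC')]
```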

[00219] In some embodiments, the Cas9 protein comprises a combination of mutations that exhibits activity on a target sequence comprising a 5′-NAT-3′ PAM sequence at its 3′ end. In some embodiments, the combination of mutations is present in any one of the clones listed in Table 3. In some embodiments, the mutations in the combination are conservative mutations of those in the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1. In some embodiments, the Cas9 protein comprises the combination of mutations present in any one of the Cas9 clones selected from the group consisting of SacB.N4.19.TAT-4h-1; SacB.N4.19.TAT-4h-3; P12.2.b9-8; P12.3.b9-8; P12.3.b9-8 (ax); P12.3.b10-6; SacB.P12a2.AAT.3hr.maj; SacB.P12a2.AAT.3hr.min; P17.4-1; P17.4-2; P17.4-3; P17.4-4; P17.4-5; P17.4-6; P17.4-8; P17-4-1-1; P17-4-3-1; and P17-4-6-1, or a combination of conservative mutations thereto.

[00220] Table 3: NAT PAM Clones

[00221] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3.

Cas9 Activity

[00222] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to an AAT, GAT, CAT, or TAT sequence.
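Paragraphs [00214], [00218], and [00222] express activity as an N-fold increase over Streptococcus pyogenes Cas9 on the same target, and paragraph [00223] also uses percentages. Reading "N-fold increased" as the ratio of variant activity to reference activity (an assumption; assay readouts and baseline handling can differ), the two forms are related as sketched below, so that 10% greater corresponds to 1.1-fold. The function names `fold_change` and `percent_greater` and the activity values are hypothetical.

```python
def fold_change(variant: float, reference: float) -> float:
    """Ratio of variant activity to reference (SpCas9) activity on the same target."""
    if reference <= 0:
        raise ValueError("reference activity must be positive to form a ratio")
    return variant / reference


def percent_greater(variant: float, reference: float) -> float:
    """Same comparison expressed as a percentage increase over the reference."""
    return (fold_change(variant, reference) - 1.0) * 100.0


# Hypothetical editing efficiencies (%) on one non-NGG target.
print(fold_change(30.0, 3.0))              # 10.0
print(round(percent_greater(33.0, 30.0)))  # 10
```

The guard on a non-positive reference matters in practice: when the reference enzyme shows essentially no activity on a non-canonical PAM, the ratio becomes very large or undefined, so the measured background level strongly affects the reported fold change.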

[00223] In some embodiments, the Cas9 domain exhibits activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′), or on a target sequence that does not comprise the canonical PAM sequence (5′-NGG-3′), that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the Cas9 domain exhibits activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′), or on a target sequence that does not comprise the canonical PAM sequence (5′-NGG-3′), that is at least 10%, at least 15%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% greater than the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to an NGT, NGA, NGC, or NNG sequence, wherein N is A, G, T, or C. In some embodiments, the 3′ end of the target sequence is directly adjacent to an AAA, AAC, AAG, AAT, CAA, CAC, CAG, CAT, GAA, GAC, GAG, GAT, TAA, TAC, TAG, TAT, ACA, ACC, ACG, ACT, CCA, CCC, CCG, CCT, GCA, GCC, GCG, GCT, TCA, TCC, TCG, TCT, AGA, AGC, AGT, CGA, CGC, CGT, GGA, GGC, GGT, TGA, TGC, TGT, ATA, ATC, ATG, ATT, CTA, CTC, CTG, CTT, GTA, GTC, GTG, GTT, TTA, TTC, TTG, or TTT PAM sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to a CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, or CAA sequence.
In some embodiments, the Cas9 domain activity is measured by a nuclease assay, a deamination assay, a transcriptional activation assay, a binding assay, or by PCR or sequencing. In some embodiments, the transcriptional activation assay is a reporter activation assay, such as a GFP activation assay. Exemplary methods for measuring binding activity (e.g., of Cas9) using transcriptional activation assays are known in the art and would be apparent to the skilled artisan. For example, methods for measuring Cas9 activity using the tripartite activator VPR have been described in Chavez A., et al., “Highly efficient Cas9-mediated transcriptional programming.” Nature Methods 12, 326–328 (2015), the entire contents of which are incorporated by reference herein.
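The long list of PAM sequences in paragraph [00223] is exactly the set of trinucleotides that do not match the canonical 5′-NGG-3′ PAM: 64 possible trinucleotides minus the four NGG sequences leaves 60. The short Python sketch below, using the hypothetical helper `non_canonical_pams`, generates that set, which can be used to cross-check the list.

```python
from itertools import product


def non_canonical_pams() -> list[str]:
    """All 3-nt sequences except the four canonical NGG PAMs (AGG, CGG, GGG, TGG)."""
    return ["".join(bases) for bases in product("ACGT", repeat=3)
            if bases[1:] != ("G", "G")]


pams = non_canonical_pams()
print(len(pams))                     # 60
print("AAA" in pams, "TGG" in pams)  # True False
```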

[00224] In some embodiments, the Cas9 domain is mutated with respect to a corresponding wild-type protein such that the mutated Cas9 domain lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. In particular embodiments, an aspartate-to-alanine substitution (D10A) in the RuvC1 catalytic domain of S. pyogenes Cas9 converts Cas9 from a nuclease that cleaves both strands to a nickase that nicks the targeted strand, or the strand that is complementary to the gRNA. A histidine-to-alanine substitution (H840A) in the HNH catalytic domain of S. pyogenes Cas9 generates a nick on the strand that is displaced by the gRNA during strand invasion, also referred to herein as the non-edited strand. The single catalytically active nuclease site of the nCas9 leaves a nick in the non-edited strand, which will direct mismatch repair machinery to read (rather than remove) the modified base during repair (i.e., a substituted guanine or guanine derivative at the target site). Other examples of mutations that render Cas9 a nickase include, without limitation, N854A and N863A in SpCas9, and corresponding mutations in other wild-type Cas9 proteins or variants thereof. Reference is made to U.S. Patent No. 8,945,839, incorporated herein by reference.
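Paragraph [00224] ties each catalytic substitution to the strand that remains cut: D10A disables RuvC so that HNH nicks the strand complementary to the gRNA, and H840A disables HNH so that RuvC nicks the displaced strand. That logic can be captured in a small lookup, sketched in Python below. The table covers only D10A and H840A, the names `DOMAIN_DISABLED_BY`, `STRAND_CUT_BY`, and `strands_cut` are hypothetical, and combining both substitutions yields a catalytically dead Cas9 (dCas9) that cuts neither strand.

```python
# Domain inactivated by each substitution discussed in paragraph [00224].
DOMAIN_DISABLED_BY = {"D10A": "RuvC", "H840A": "HNH"}

# Strand cut by each nuclease domain when it remains active.
STRAND_CUT_BY = {
    "HNH": "target strand (complementary to the gRNA)",
    "RuvC": "displaced (non-target) strand",
}


def strands_cut(mutations: set[str]) -> list[str]:
    """Strands still cleaved by SpCas9 carrying the given catalytic substitutions."""
    disabled = {DOMAIN_DISABLED_BY[m] for m in mutations if m in DOMAIN_DISABLED_BY}
    return [STRAND_CUT_BY[domain] for domain in ("HNH", "RuvC") if domain not in disabled]


print(strands_cut({"D10A"}))           # ['target strand (complementary to the gRNA)']
print(strands_cut({"D10A", "H840A"}))  # []
```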

[00225] In some embodiments, the amino acid sequence of the HNH domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the HNH domain of SEQ ID NO: 2. In some embodiments, the amino acid sequence of the RuvC domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of the RuvC domain of SEQ ID NO: 2. In some embodiments, the Cas9 domain comprises the RuvC and HNH domains of SEQ ID NO: 2. In some embodiments, the Cas9 domain comprises a D10A and/or an H840A mutation in the amino acid sequence provided in SEQ ID NO: 2, or corresponding mutation(s) in another Cas9 sequence.

[00226] In some embodiments, the disclosure provides SpCas9 mutant proteins that work best on NRRH, NRCH, and NRTH PAMs. The SpCas9 mutant protein that works best on NRRH (“es” variant) has an amino acid sequence as presented in SEQ ID NO: 22 (underlined residues are mutated from SpCas9).
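Paragraph [00226] names three PAM classes, NRRH, NRCH, and NRTH, where R is A or G and H is A, C, or T. Because the third position is R, C, or T respectively, any given four-nucleotide PAM falls into at most one class. The Python sketch below, using the hypothetical helper `pam_class`, assigns a PAM to its class; it only classifies sequences and makes no prediction of editing activity.

```python
import re
from typing import Optional

# PAM classes named in paragraph [00226], as IUPAC patterns.
PAM_CLASSES = ("NRRH", "NRCH", "NRTH")
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T",
               "N": "[ACGT]", "R": "[AG]", "H": "[ACT]"}


def pam_class(pam: str) -> Optional[str]:
    """Return the class a 4-nt PAM belongs to, or None if it matches none of them."""
    for name in PAM_CLASSES:
        regex = "".join(IUPAC_REGEX[code] for code in name)
        if re.fullmatch(regex, pam.upper()):
            return name
    return None


print([pam_class(p) for p in ("GAAT", "GACT", "GATA", "TAGG")])
# ['NRRH', 'NRCH', 'NRTH', None]
```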

[00227] The SpCas9 mutant protein that works best on NRCH (“fn” variant) has an amino acid sequence as presented in SEQ ID NO: 23 (underlined residues are mutated from SpCas9).

[00228] The SpCas9 mutant protein that works best on NRTH (“ax” variant) has an amino acid

[00229] Some aspects of the disclosure provide high fidelity Cas9 domains. In some embodiments, high fidelity Cas9 domains have decreased electrostatic interactions between the Cas9 domain and a sugar-phosphate backbone of a DNA, as compared to a wild-type Cas9 domain. In some embodiments, any of the Cas9 domains provided herein comprise one or more mutations that decrease the association between the Cas9 domain and a sugar-phosphate backbone of a DNA. In some embodiments, any of the Cas9 domains provided herein comprise one or more mutations that decrease the association between the Cas9 domain and a sugar-phosphate backbone of a DNA by at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%. In some embodiments, any of the Cas9 domains provided herein comprise one or more of an N497X, an R661X, a Q695X, and/or a Q926X mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence, wherein X is any amino acid. In some embodiments, any of the Cas9 domains provided herein comprise one or more of an N497A, an R661A, a Q695A, and/or a Q926A mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence. In some embodiments, the Cas9 domain comprises a D10A mutation of the amino acid sequence provided in SEQ ID NO: 135, or a corresponding mutation in another Cas9 sequence. In some embodiments, the Cas9 domain comprises the amino acid sequence as set forth in SEQ ID NO: 135. High fidelity Cas9 domains have been described in the art and would be apparent to the skilled artisan. For example, high fidelity Cas9 domains have been described in Kleinstiver, B.P., et al. “High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects.” Nature 529, 490-495 (2016); and Slaymaker, I.M., et al. “Rationally engineered Cas9 nucleases with improved specificity.” Science 351, 84-88 (2015); the entire contents of each are incorporated herein by reference. It should be appreciated that, based on the present disclosure and knowledge in the art, mutations in any Cas9 domain may be generated to make high fidelity Cas9 domains that have decreased electrostatic interactions between the Cas9 domain and a sugar-phosphate backbone of a DNA, as compared to a wild-type Cas9 domain.

[00230] Cas9 domain in which mutations relative to the Cas9 of SEQ ID NO: 6 are shown in bold and underlined.

[00231] In some embodiments, the Cas9 domain of any of the fusion proteins provided herein comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence set forth as SEQ ID NO: 10 (S. aureus Cas9), below. In some embodiments, the Cas9 domain of any of the fusion proteins provided herein comprises the amino acid sequence of SEQ ID NO: 10. In some embodiments, the Cas9 domain of any of the fusion proteins provided herein consists of the amino acid sequence of SEQ ID NO: 10.

[00232] An exemplary SaCas9 amino acid sequence is:

[00233] An additional Cas9 domain with altered PAM specificity, such as a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild-type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 11, GeoCas9), may be used.

[00234] In some embodiments, a Cas9 domain refers to a Cas9 or Cas9 homolog from archaea (e.g., nanoarchaea), which constitute a domain and kingdom of single-celled prokaryotic microbes. In some embodiments, a Cas9 domain may comprise a CasX (now referred to as Cas12e) or CasY (now referred to as Cas12d) domain, which have been described in, for example, Burstein et al., “New CRISPR–Cas systems from uncultivated microbes.” Cell Res. 2017 Feb 21. doi: 10.1038/cr.2017.21, and Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature. 2019; 566(7743):218-223, each of which is incorporated herein by reference. Using genome-resolved metagenomics, a number of CRISPR–Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR–Cas system. In bacteria, two previously unknown systems were discovered, CRISPR–CasX and CRISPR–CasY, which are among the most compact systems yet discovered. In some embodiments, the napDNAbp domain refers to CasX, or a variant of CasX. In some embodiments, the napDNAbp domain refers to CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a napDNAbp and are within the scope of this disclosure. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein.

Cytidine deaminases

[00235] In some embodiments, the deaminase domain is a cytidine deaminase domain. A cytidine deaminase domain may also be referred to interchangeably as a cytosine deaminase domain. In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine (C) or deoxycytidine (dC) to uridine (U) or deoxyuridine (dU), respectively. In some embodiments, the cytidine deaminase domain catalyzes the hydrolytic deamination of cytosine (C) to uracil (U). In some embodiments, the cytidine deaminase catalyzes the hydrolytic deamination of cytidine or cytosine in deoxyribonucleic acid (DNA). Without wishing to be bound by any particular theory, fusion proteins comprising a cytidine deaminase are useful inter alia for targeted editing, referred to herein as “base editing,” of nucleic acid sequences in vitro and in vivo.
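The net effect of cytidine deamination on a DNA sequence can be sketched in a few lines. Deamination turns C into U; if the U is retained through DNA replication, it templates an A, so the site ends up as T and the base pair changes from C:G to T:A. The hypothetical helper below assumes the caller supplies the 1-based positions to edit on the strand being deaminated; this passage does not define an editing window, so none is built in.

```python
def simulate_cytidine_base_edit(strand: str, positions: list[int]) -> str:
    """Predict the sequence of a DNA strand after C-to-U deamination.

    positions are 1-based, counted 5' to 3' along the strand being edited.
    Only C residues are changed; any other base at a listed position is left
    alone. The U is written as T, i.e. the outcome after replication.
    """
    seq = list(strand.upper())
    for pos in positions:
        if not 1 <= pos <= len(seq):
            raise IndexError(f"position {pos} is outside the strand")
        if seq[pos - 1] == "C":
            seq[pos - 1] = "T"  # C -> U by deamination, read as T once copied
    return "".join(seq)


print(simulate_cytidine_base_edit("GACCTGCA", [3, 4, 5]))  # GATTTGCA
```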

[00236] One suitable type of cytidine deaminase is, for example, a member of the APOBEC family. The apolipoprotein B mRNA-editing complex (APOBEC) family of cytidine deaminase enzymes encompasses eleven proteins that serve to initiate mutagenesis in a controlled and beneficial manner (see, e.g., Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome Biol. 2008; 9(6):229). One family member, activation-induced cytidine deaminase (AID), is responsible for the maturation of antibodies by converting cytosines in ssDNA to uracils in a transcription-dependent, strand-biased fashion (see, e.g., Reynaud CA, et al. What role for AID: mutator, or assembler of the immunoglobulin mutasome. Nat Immunol. 2003; 4(7):631-638). The apolipoprotein B editing complex 3 (APOBEC3) enzyme provides protection to human cells against a certain HIV-1 strain via the deamination of cytosines in reverse-transcribed viral ssDNA (see, e.g., Bhagwat AS. DNA-cytosine deaminases: from antibody maturation to antiviral defense. DNA Repair (Amst). 2004; 3(1):85-89). These proteins all require a Zn²⁺-coordinating motif (His-X-Glu-X₂₃₋₂₆-Pro-Cys-X₂₋₄-Cys; SEQ ID NO: 405) and a bound water molecule for catalytic activity. The Glu residue acts to activate the water molecule to a zinc hydroxide for nucleophilic attack in the deamination reaction. Each family member preferentially deaminates at its own particular “hotspot”, ranging from WRC (W is A or T, R is A or G) for hAID to TTC for hAPOBEC3F (see, e.g., Navaratnam N and Sarwar R. An overview of cytidine deaminases. Int J Hematol. 2006; 83(3):195-200). A recent crystal structure of the catalytic domain of APOBEC3G revealed a secondary structure composed of a five-stranded β-sheet core flanked by six α-helices, which is believed to be conserved across the entire family (see, e.g., Holden LG, et al. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature. 2008; 456(7218):121-4). The active center loops have been shown to be responsible for both ssDNA binding and determining “hotspot” identity (see, e.g., Chelico L, et al. Biochemical basis of immunological and retroviral responses to DNA-targeted cytosine deamination by activation-induced cytidine deaminase and APOBEC3G. J Biol Chem. 2009; 284(41):27761-5). Overexpression of these enzymes has been linked to genomic instability and cancer, thus highlighting the importance of sequence-specific targeting (see, e.g., Pham P, et al. Reward versus risk: DNA cytidine deaminases triggering immunity and disease. Biochemistry. 2005; 44(8):2703-15).
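The zinc-coordinating motif described above (His-X-Glu, then 23 to 26 residues of any identity, then Pro-Cys, 2 to 4 residues of any identity, and Cys) translates directly into a regular expression. The sketch below uses Python's standard `re` module to locate candidate motifs in a protein sequence; it is a pattern search only, and a match does not by itself show that a protein is a catalytically active deaminase. The test sequence is synthetic.

```python
import re

# His-X-Glu-X(23-26)-Pro-Cys-X(2-4)-Cys, where X is any residue.
ZN_MOTIF = re.compile(r"H.E.{23,26}PC.{2,4}C")


def find_zinc_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping motif hit."""
    return [(m.start() + 1, m.group()) for m in ZN_MOTIF.finditer(protein.upper())]


# Synthetic sequence, not a real deaminase: 24 spacer residues fit the 23-26 window.
toy = "MA" + "HAE" + "G" * 24 + "PC" + "SS" + "C" + "KK"
print(find_zinc_motifs(toy))
```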

[00237] Some aspects of this disclosure relate to the recognition that the activity of cytidine deaminase enzymes such as APOBEC enzymes can be directed to a specific site in genomic DNA. Without wishing to be bound by any particular theory, advantages of using a nucleic acid programmable binding protein (e.g., a Cas9 domain) as a recognition agent include (1) the sequence specificity of the nucleic acid programmable binding protein (e.g., a Cas9 domain) can be altered simply by changing the sgRNA sequence; and (2) the nucleic acid programmable binding protein (e.g., a Cas9 domain) may bind to its target sequence by denaturing the dsDNA, resulting in a stretch of DNA that is single-stranded and therefore a viable substrate for the deaminase. It should be understood that other catalytic domains of napDNAbps, or catalytic domains from other nucleic acid editing proteins, can also be used to generate fusion proteins with Cas9, and that the disclosure is not limited in this regard.

[00238] In view of the results provided herein regarding the nucleotides that can be targeted by Cas9:deaminase fusion proteins, a person of ordinary skill in the art will be able to design suitable guide RNAs to target the fusion proteins to a target sequence that comprises a nucleotide to be deaminated.

[00239] In some embodiments, the cytidine deaminase is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. In some embodiments, the cytidine deaminase is an APOBEC1 deaminase. In some embodiments, the cytidine deaminase is an APOBEC2 deaminase. In some embodiments, the cytidine deaminase is an APOBEC3 deaminase. In some embodiments, the cytidine deaminase is an APOBEC3A deaminase. In some embodiments, the cytidine deaminase is an APOBEC3B deaminase. In some embodiments, the cytidine deaminase is an APOBEC3C deaminase. In some embodiments, the cytidine deaminase is an APOBEC3D deaminase. In some embodiments, the cytidine deaminase is an APOBEC3E deaminase. In some embodiments, the cytidine deaminase is an APOBEC3F deaminase. In some embodiments, the cytidine deaminase is an APOBEC3G deaminase. In some embodiments, the cytidine deaminase is an APOBEC3H deaminase. In some embodiments, the cytidine deaminase is an APOBEC4 deaminase. In some embodiments, the cytidine deaminase is an activation-induced deaminase (AID). In some embodiments, the cytidine deaminase is a vertebrate cytidine deaminase. In some embodiments, the cytidine deaminase is an invertebrate cytidine deaminase. In some embodiments, the cytidine deaminase is a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse deaminase. In some embodiments, the cytidine deaminase is a human cytidine deaminase. In some embodiments, the cytidine deaminase is a rat cytidine deaminase, e.g., rAPOBEC1. In some embodiments, the cytidine deaminase is a Petromyzon marinus cytidine deaminase 1 (pmCDA1) (SEQ ID NO: 58). In some embodiments, the cytidine deaminase is a human APOBEC3G (SEQ ID NO: 60). In some embodiments, the cytidine deaminase is a fragment of the human APOBEC3G. In some embodiments, the deaminase is a human APOBEC3G variant comprising D316R and D317R mutations. In some embodiments, the deaminase is a fragment of the human APOBEC3G that comprises mutations corresponding to the D316R and D317R mutations in SEQ ID NO: 61.

[00240] In some embodiments, the nucleic acid editing domain is at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the cytidine deaminase domain of any one of SEQ ID NOs: 27-61. In some embodiments, the nucleic acid editing domain comprises the amino acid sequence of any one of SEQ ID NOs: 27-61.

[00241] Some exemplary suitable nucleic acid editing domains, e.g., cytidine deaminases and cytidine deaminase domains, that can be fused to napDNAbps (e.g., Cas9 domains) according to aspects of this disclosure are provided below. It should be understood that, in some embodiments, the active domain of the respective sequence can be used, e.g., the domain without a localizing signal (e.g., without a nuclear localization sequence, nuclear export signal, or cytoplasmic localizing signal).

[00242] Human AID:

(underline: nuclear localization sequence; double underline: nuclear export signal)

[00243] Mouse AID:

(underline: nuclear localization sequence; double underline: nuclear export signal)

[00244] Dog AID:

(underline: nuclear localization sequence; double underline: nuclear export signal)

[00245] Bovine AID:

(underline: nuclear localization sequence; double underline: nuclear export signal)

[00246] Rat AID:

(underline: nuclear localization sequence; double underline: nuclear export signal)

[00247] Mouse APOBEC-3:

[00248] Rat APOBEC-3:

(italic: nucleic acid editing domain)

[00249] Rhesus macaque APOBEC-3G:

(italic: nucleic acid editing domain; underline: cytoplasmic localization signal)

[00250] Chimpanzee APOBEC-3G:

[00251] Green monkey APOBEC-3G:

(SEQ ID NO: 36)

(italic: nucleic acid editing domain; underline: cytoplasmic localization signal)

[00252] Human APOBEC-3G:

(italic: nucleic acid editing domain; underline: cytoplasmic localization signal)

[00253] Human APOBEC-3F:

(SEQ ID NO: 38)

(italic: nucleic acid editing domain)

[00254] Human APOBEC-3B:

[00255] Rat APOBEC-3B:

(SEQ ID NO: 40)

[00256] Bovine APOBEC-3B:

Adenosine deaminases

[00277] The disclosure provides fusion proteins that comprise one or more adenosine deaminases. In some aspects, such fusion proteins are capable of deaminating adenosine in a nucleic acid sequence (e.g., DNA or RNA). As one example, any of the fusion proteins provided herein may be base editors (e.g., adenine base editors). Without wishing to be bound by any particular theory, dimerization of adenosine deaminases (e.g., in cis or in trans) may improve the ability (e.g., efficiency) of the fusion protein to modify a nucleic acid base, for example to deaminate adenine. In some embodiments, any of the fusion proteins may comprise 2, 3, 4, or 5 adenosine deaminases. In some embodiments, any of the fusion proteins provided herein comprise two adenosine deaminases. Exemplary, non-limiting embodiments of adenosine deaminases are provided herein. It should be appreciated that the mutations provided herein (e.g., mutations in ecTadA) may be applied to adenosine deaminases in other adenosine base editors, for example those provided in U.S. Patent Publication No. 2018/0073012, published March 15, 2018, which issued as U.S. Patent No. 10,113,163 on October 30, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Patent No. 10,167,457 on January 1, 2019; International Publication No. WO 2017/070633, published April 27, 2017; U.S. Patent Publication No. 2015/0166980, published June 18, 2015; U.S. Patent No. 9,840,699, issued December 12, 2017; and U.S. Patent No. 10,077,453, issued September 18, 2018, all of which are incorporated herein by reference in their entireties.

[00278] In some embodiments, any of the adenosine deaminases provided herein is capable of deaminating adenine. In some embodiments, the adenosine deaminases provided herein are capable of deaminating adenine in a deoxyadenosine residue of DNA. The adenosine deaminase may be derived from any suitable organism (e.g., E. coli). In some embodiments, the adenosine deaminase is a naturally-occurring adenosine deaminase that includes one or more mutations corresponding to any of the mutations provided herein (e.g., mutations in ecTadA). One of skill in the art will be able to identify the corresponding residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues. Accordingly, one of skill in the art would be able to generate mutations in any naturally-occurring adenosine deaminase (e.g., having homology to ecTadA) that correspond to any of the mutations described herein, e.g., any of the mutations identified in ecTadA. In some embodiments, the adenosine deaminase is from a prokaryote. In some embodiments, the adenosine deaminase is from a bacterium. In some embodiments, the adenosine deaminase is from Escherichia coli, Staphylococcus aureus, Salmonella typhi, Shewanella putrefaciens, Haemophilus influenzae, Caulobacter crescentus, or Bacillus subtilis. In some embodiments, the adenosine deaminase is from E. coli.
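Identifying a “corresponding” residue in a homolog amounts to walking a pairwise alignment column by column. A minimal sketch, assuming the alignment has already been produced by a standard aligner (rows of equal length, '-' for gaps); `map_position` is a hypothetical helper that converts a residue number in one sequence into the aligned residue number in the other, returning `None` when that residue faces a gap.

```python
from typing import Optional


def map_position(aln_a: str, aln_b: str, pos_a: int) -> Optional[int]:
    """Map 1-based residue pos_a of sequence A to the aligned residue number in B.

    aln_a and aln_b are the two rows of a pairwise alignment ('-' = gap).
    Returns None if residue pos_a of A is aligned to a gap in B.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    i_a = i_b = 0  # running residue counters for A and B
    for col_a, col_b in zip(aln_a, aln_b):
        if col_a != "-":
            i_a += 1
        if col_b != "-":
            i_b += 1
        if col_a != "-" and i_a == pos_a:
            return i_b if col_b != "-" else None
    raise IndexError(f"sequence A has fewer than {pos_a} residues")


# Toy alignment (not real TadA homologs).
print(map_position("MSE-VEF", "M-EKVEF", 3))  # 2
```

With these toy rows, residue 3 of the first sequence corresponds to residue 2 of the second, so a hypothetical E3A substitution in the first would be written E2A in the second.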

[00279] In some embodiments, the adenosine deaminase comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences set forth in any one of SEQ ID NOs: 62-84, or to any of the adenosine deaminases provided herein. It should be appreciated that adenosine deaminases provided herein may include one or more mutations (e.g., any of the mutations provided herein). The disclosure provides adenosine deaminases with a certain percent identity plus any of the mutations or combinations thereof described herein. In some embodiments, the adenosine deaminase comprises an amino acid sequence that has 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more mutations compared to any one of the amino acid sequences set forth in SEQ ID NOs: 62-84, or any of the adenosine deaminases provided herein. In some embodiments, the adenosine deaminase comprises an amino acid sequence that has at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at least 120, at least 130, at least 140, at least 150, at least 160, or at least 170 identical contiguous amino acid residues as compared to any one of the amino acid sequences set forth in SEQ ID NOs: 62-84, or any of the adenosine deaminases provided herein.
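The “identical contiguous amino acid residues” criterion can be read as the length of the longest gap-free stretch shared by two sequences, which is the classic longest-common-substring problem. A dynamic-programming sketch under that reading (an interpretation, not a definition taken from the disclosure); `longest_identical_stretch` is a hypothetical name, and the running time is proportional to the product of the two lengths, which is trivial for deaminase-sized proteins.

```python
def longest_identical_stretch(a: str, b: str) -> int:
    """Length of the longest run of identical contiguous residues shared by a and b."""
    best = 0
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences sharing the five-residue stretch "EVEFS".
print(longest_identical_stretch("MSEVEFSHEY", "XXEVEFSQQ"))  # 5
```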

[00280] In some embodiments, the adenosine deaminase comprises an E59X mutation in ecTadA SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase, where X indicates any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In particular embodiments, the adenosine deaminase comprises an E59A mutation in SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase.

[00281] In some embodiments, the adenosine deaminase comprises a D108X mutation in ecTadA SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase, where X indicates any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises a D108W, D108Q, D108F, D108K, or D108M mutation in SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase. In particular embodiments, the adenosine deaminase comprises a D108W mutation in SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase. It should be appreciated, however, that additional deaminases may similarly be aligned to identify homologous amino acid residues that may be mutated as provided herein.

[00282] In some embodiments, the adenosine deaminase comprises TadA 7.10, whose sequence is provided as SEQ ID NO: 65, or a variant thereof. TadA 7.10 comprises the following mutations relative to ecTadA: W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, and K157N.
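Substitutions such as W23R use the usual <wild-type residue><1-based position><new residue> notation. A minimal sketch, with a hypothetical helper `apply_substitutions`, that applies such a list to a sequence and refuses to proceed if a stated wild-type residue does not match, which catches numbering errors. The 14-substitution TadA 7.10 list is copied from the paragraph above; because the ecTadA sequence (SEQ ID NO: 64) is not reproduced in this section, the runnable example uses a toy string. Placeholder forms such as E59X are not handled.

```python
import re

# <wild-type residue><1-based position><new residue>, e.g. "W23R".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Substitutions distinguishing TadA 7.10 from ecTadA, as listed in the text.
TADA_7_10_MUTATIONS = [
    "W23R", "H36L", "P48A", "R51L", "L84F", "A106V", "D108N",
    "H123Y", "S146C", "D147Y", "R152P", "E155V", "I156F", "K157N",
]


def apply_substitutions(seq: str, mutations: list[str]) -> str:
    """Apply point substitutions in order, checking each wild-type residue."""
    residues = list(seq.upper())
    for mut in mutations:
        m = MUTATION.match(mut)
        if m is None:
            raise ValueError(f"unrecognized mutation: {mut}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{mut}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


print(apply_substitutions("MSEVEF", ["E3A", "F6L"]))  # MSAVEL
```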

[00283] In particular embodiments, the adenosine deaminase comprises an N108W mutation in SEQ ID NO: 65, an embodiment also referred to as TadA 7.10(N108W). Its sequence is provided as SEQ ID NO: 67.

[00284] In some embodiments, the adenosine deaminase comprises an A106X mutation in ecTadA SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase, where X indicates any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises an A106V mutation in SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises an A106Q, A106F, A106W, or A106M mutation in SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase.

[00285] In particular embodiments, the adenosine deaminase comprises a V106W mutation in SEQ ID NO: 65, an embodiment also referred to as TadA 7.10(V106W). Its sequence is provided as SEQ ID NO: 66.

[00286] In some embodiments, the adenosine deaminase comprises an R47X mutation in SEQ ID NO: 65, or a corresponding mutation in another adenosine deaminase, where X indicates any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises an R47Q, R47F, R47W, or R47M mutation in SEQ ID NO: 65, or a corresponding mutation in another adenosine deaminase.

[00287] In particular embodiments, the adenosine deaminase comprises an R47Q, R47F, R47W, or R47M mutation in SEQ ID NO: 65.

[00288] In particular embodiments, the adenosine deaminase comprises a V106Q mutation and an N108W mutation in SEQ ID NO: 65. In particular embodiments, the adenosine deaminase comprises a V106W mutation, an N108W mutation, and an R47Z mutation in SEQ ID NO: 65, wherein Z is selected from the group consisting of Q, F, W, and M.

[00289] It should be appreciated that any of the mutations provided herein (e.g., based on the ecTadA amino acid sequence of SEQ ID NO: 64) may be introduced into other adenosine deaminases, such as S. aureus TadA (saTadA), or other adenosine deaminases (e.g., bacterial adenosine deaminases), such as those sequences provided below. It would be apparent to the skilled artisan how to identify amino acid residues from other adenosine deaminases that are homologous to the mutated residues in ecTadA. Thus, any of the mutations identified in ecTadA may be made in other adenosine deaminases that have homologous amino acid residues. It should also be appreciated that any of the mutations provided herein may be made individually or in any combination in ecTadA or another adenosine deaminase. For example, an adenosine deaminase may contain a D108N, an A106V, and/or an R47Q mutation in ecTadA SEQ ID NO: 64, or a corresponding mutation in another adenosine deaminase.
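Because the mutations may be made “individually or in any combination”, a list of n mutations implies 2ⁿ − 1 non-empty combinations. A short sketch using the standard-library `itertools.combinations`, with a hypothetical helper name; the R47Z expansion reproduces the V106W/N108W/R47Z family (Z = Q, F, W, or M) described for SEQ ID NO: 65.

```python
from itertools import combinations


def mutation_combinations(mutations: list[str]) -> list[tuple[str, ...]]:
    """Every non-empty combination of the given substitutions, singles first."""
    return [
        combo
        for k in range(1, len(mutations) + 1)
        for combo in combinations(mutations, k)
    ]


combos = mutation_combinations(["D108N", "A106V", "R47Q"])
print(len(combos))  # 7 (three singles, three pairs, one triple)

# V106W + N108W + R47Z, with Z in {Q, F, W, M}, relative to SEQ ID NO: 65.
r47z_variants = [("V106W", "N108W", f"R47{z}") for z in "QFWM"]
```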

[00290] In some embodiments, the adenosine deaminase comprises mutations at one, two, or three positions selected from the group consisting of D108, A106, and R47 in SEQ ID NO: 64, or a corresponding mutation or mutations in another adenosine deaminase.

[00291] In other aspects, the disclosure provides adenine base editors (ABEs) with broadened target sequence compatibility. In general, native ecTadA deaminates the adenine in the sequence UAC (e.g., the target sequence) of the anticodon loop of tRNA(Arg). Without wishing to be bound by any particular theory, in order to expand the utility of ABEs comprising one or more ecTadA deaminases, such as any of the adenosine deaminases provided herein, the adenosine deaminase proteins were optimized to recognize a wide variety of target sequences within the protospacer sequence without compromising the editing efficiency of the adenosine nucleobase editor complex. In some embodiments, the target sequence is an A in the middle of a 5’-NAN-3’ sequence, wherein N is T, C, G, or A. In some embodiments, the target sequence comprises 5’-TAC-3’. In some embodiments, the target sequence comprises 5’-GAA-3’.
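To illustrate the 5’-NAN-3’ description, the hypothetical helper below lists every internal adenine in a protospacer together with its trinucleotide context, so that contexts such as TAC or GAA can be picked out. It assumes an uppercase A/C/G/T string written 5’ to 3’ and skips adenines at either end, since they lack a flanking base within the given sequence; it says nothing about which adenines a particular editor will actually modify.

```python
def adenine_contexts(protospacer: str) -> list[tuple[int, str]]:
    """Return (1-based position, 5'-NAN-3' context) for every internal A."""
    seq = protospacer.upper()
    return [
        (i + 1, seq[i - 1 : i + 2])  # base before, the A, base after
        for i in range(1, len(seq) - 1)
        if seq[i] == "A"
    ]


sites = adenine_contexts("GTACGAAT")
print(sites)  # [(3, 'TAC'), (6, 'GAA'), (7, 'AAT')]
print([s for s in sites if s[1] in ("TAC", "GAA")])  # [(3, 'TAC'), (6, 'GAA')]
```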

[00292] In some embodiments, the adenosine deaminase is an N-terminal truncated E. coli TadA. In certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[00293] In some embodiments, the TadA deaminase is a full-length E. coli TadA deaminase (ecTadA). For example, in certain embodiments, the adenosine deaminase comprises the amino acid sequence:

[00294] It should be appreciated, however, that additional adenosine deaminases useful in the present application would be apparent to the skilled artisan and are within the scope of this disclosure. For example, the adenosine deaminase may be a homolog of an ADAT. Exemplary ADAT homologs include, without limitation:

[00295] Staphylococcus aureus TadA:

[00296] Bacillus subtilis TadA:

[00316] Any two or more of the adenosine deaminases described herein may be connected to one another (e.g., by a linker) within an adenosine deaminase domain of the fusion proteins provided herein. For instance, the fusion proteins provided herein may contain only two adenosine deaminases. In some embodiments, the adenosine deaminases are the same. In some embodiments, the adenosine deaminases are any of the adenosine deaminases provided herein. In some embodiments, the adenosine deaminases are different. In some embodiments, the first adenosine deaminase is any of the adenosine deaminases provided herein, and the second adenosine deaminase is any of the adenosine deaminases provided herein, but is not identical to the first adenosine deaminase. In some embodiments, the fusion protein comprises two adenosine deaminases (e.g., a first adenosine deaminase and a second adenosine deaminase). In some embodiments, the fusion protein comprises a first adenosine deaminase and a second adenosine deaminase. In some embodiments, the first adenosine deaminase is N-terminal to the second adenosine deaminase in the fusion protein. In some embodiments, the first adenosine deaminase is C-terminal to the second adenosine deaminase in the fusion protein. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are fused directly or via a linker.

[00317] In particular embodiments, the base editors disclosed herein comprise a heterodimer of a first adenosine deaminase that is N-terminal to a second adenosine deaminase, wherein the first adenosine deaminase comprises a sequence with at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.5% sequence identity to any one of SEQ ID NOs: 62-84; and the second adenosine deaminase comprises a sequence with at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.5% sequence identity to any one of SEQ ID NOs: 62-84.

[00318] In other embodiments, the second adenosine deaminase of the base editors provided herein comprises a sequence with at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.5% sequence identity to SEQ ID NO: 65 (TadA 7.10), wherein the sequence identity is determined over amino acid positions other than R47, V106, and N108 of SEQ ID NO: 65, and wherein the second adenosine deaminase comprises an amino acid substitution at one or more of R47, V106, and N108 of SEQ ID NO: 65.

[00319] In other embodiments, the second adenosine deaminase of the heterodimer comprises a sequence with at least 80%, 85%, 90%, 95%, 98%, 99%, or 99.5% sequence identity to any one of SEQ ID NOs: 62-84.

Base editor constructs

[00320] Any of the Cas9 domains (e.g., Cas9 domains that recognize a non-canonical PAM sequence) disclosed herein may be fused to a second protein, thus providing fusion proteins that comprise a Cas9 domain as provided herein and a second protein, or a “fusion partner.” In some embodiments, the second protein is an effector domain. As used herein, an “effector domain” refers to a molecule (e.g., a protein) that regulates a biological activity and/or is capable of modifying a biological molecule (e.g., a protein, or a nucleic acid such as DNA or RNA). In some embodiments, the effector domain is a protein. In some embodiments, the effector domain is capable of modifying a protein (e.g., a histone). In some embodiments, the effector domain is capable of modifying DNA (e.g., genomic DNA). In some embodiments, the effector domain is capable of modifying RNA (e.g., mRNA). In some embodiments, the effector molecule is a nucleic acid editing domain. In some embodiments, the effector molecule is capable of regulating an activity of a nucleic acid (e.g., transcription and/or translation). Exemplary effector domains include, without limitation, a deaminase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments, the effector domain is a nucleic acid editing domain. Some aspects of the disclosure provide fusion proteins comprising a Cas9 domain and a nucleic acid editing domain.

[00321] In some embodiments, the fusion proteins provided herein exhibit increased activity on a target sequence that does not comprise the canonical PAM (5’-NGG-3’) at its 3’ end as compared to a fusion protein comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the fusion protein exhibits an activity on a target sequence having a 3’ end that is not directly adjacent to the canonical PAM sequence (5’-NGG-3’) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of a fusion protein comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3’ end of the target sequence is directly adjacent to a sequence selected from the group consisting of NGT, NGA, NGC, and NNG, wherein N is an A, G, T, or C. In some embodiments, the 3’ end of the target sequence is directly adjacent to a sequence selected from the group consisting of CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, and CAA. In some embodiments, the fusion protein activity is measured by a nuclease assay, a deamination assay, a transcriptional activation assay, a binding assay, PCR, or sequencing. In some embodiments, the transcriptional activation assay is a GFP activation assay. In some embodiments, sequencing is used to measure indel formation. In some embodiments, the increased activity is increased binding. In some embodiments, the increased activity is increased deamination of a nucleobase in the target sequence.

[00322] Some aspects of the disclosure provide a fusion protein comprising a Cas9 domain fused to a nucleic acid editing domain, wherein the nucleic acid editing domain is fused to the N-terminus of the Cas9 domain. In some embodiments, the nucleic acid editing domain is fused to the C-terminus of the Cas9 domain. In some embodiments, the Cas9 domain and the nucleic acid editing domain are fused via a linker. In some embodiments, the linker comprises a (GGGS)n (SEQ ID NO: 93), a (GGGGS)n (SEQ ID NO: 95), a (G)n (SEQ ID NO: 97), an (EAAAK)n (SEQ ID NO: 99), a (GGS)n (SEQ ID NO: 101), an (SGGS)n (SEQ ID NO: 91), or an SGSETPGTSESATPES (SEQ ID NO: 89) motif (see, e.g., Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 2014; 32(6): 577-82; the entire contents are incorporated herein by reference), or a combination of any of these, wherein n is independently an integer between 1 and 30. In some embodiments, n is independently 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, or, if more than one linker or more than one linker motif is present, any combination thereof. In some embodiments, the linker comprises a (GGS)n motif (SEQ ID NO: 101), wherein n is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. Additional suitable linker motifs and linker configurations will be apparent to those of ordinary skill in the art (e.g., SEQ ID NOs: 89-112). In some embodiments, suitable linker motifs and configurations include those described in Chen et al., Fusion protein linkers: property, design and functionality. Adv. Drug Deliv. Rev. 2013; 65(10):1357-69, the entire contents of which are incorporated herein by reference. Additional suitable linker sequences will be apparent to those of ordinary skill in the art based on the instant disclosure.
In some embodiments, the general architecture of exemplary Cas9 fusion proteins provided herein comprises the structure:

[NH2]-[nucleic acid editing domain]-[Cas9 domain]-[COOH];

[NH2]-[nucleic acid editing domain]-[linker]-[Cas9 domain]-[COOH];

[NH2]-[Cas9 domain]-[nucleic acid editing domain]-[COOH]; or

[NH2]-[Cas9 domain]-[linker]-[nucleic acid editing domain]-[COOH],

wherein NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. In some embodiments, the “]-[” used in the general architecture above indicates the presence of an optional linker sequence.
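The N- to C-terminal architectures above can be modeled as an ordered concatenation of domain sequences, with an optional linker at each “]-[” junction. The Python sketch below assumes one linker reused at every junction; the placeholder sequences and the name `assemble_fusion` are hypothetical and are not sequences or methods from this disclosure.

```python
from typing import Optional, Sequence


def assemble_fusion(domains: Sequence[str], linker: Optional[str] = None) -> str:
    """Join domain sequences in N- to C-terminal order.

    When `linker` is given, it is placed at every "]-[" junction; when it is
    None (or empty), the domains are fused directly.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    return (linker or "").join(domains)


# Hypothetical placeholder sequences, not sequences from this disclosure.
EDITING_DOMAIN = "MSEVEF"
CAS9_DOMAIN = "MKTAYIAK"
LINKER = "SGGS"

# [NH2]-[nucleic acid editing domain]-[Cas9 domain]-[COOH]
print(assemble_fusion([EDITING_DOMAIN, CAS9_DOMAIN]))
# [NH2]-[Cas9 domain]-[linker]-[nucleic acid editing domain]-[COOH]
print(assemble_fusion([CAS9_DOMAIN, EDITING_DOMAIN], LINKER))
```

Because the same linker is applied at every junction, a construct with different linkers at different junctions would need a list of linkers instead; that generalization is omitted for brevity.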

[00323] The fusion proteins of the present disclosure may comprise one or more additional features. For example, in some embodiments, the fusion protein comprises a nuclear localization sequence (NLS). In some embodiments, the NLS of the fusion protein is localized between the nucleic acid editing domain and the Cas9 domain. In some embodiments, the NLS of the fusion protein is localized C-terminal to the Cas9 domain. In some embodiments, the NLS of the fusion protein is localized N-terminal to the Cas9 domain. In some embodiments, the NLS comprises the amino acid sequence of SEQ ID NO: 113 or 114. In some embodiments, the NLS comprises the amino acid sequence of SEQ ID NO: 113.

[00324] Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins. Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags (also referred to as histidine tags or His-tags), maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of ordinary skill in the art. In some embodiments, the fusion protein comprises one or more His tags.

[00325] In some embodiments, the nucleic acid editing domain is a deaminase. In some embodiments, the deaminase is a cytidine deaminase. For example, in some embodiments, the general architecture of exemplary Cas9 fusion proteins with a cytidine deaminase domain comprises the structure:

[NH2]-[NLS]-[cytidine deaminase]-[Cas9]-[COOH];

[NH2]-[Cas9]-[cytidine deaminase]-[COOH];

[NH2]-[cytidine deaminase]-[Cas9]-[COOH]; or

[NH2]-[cytidine deaminase]-[Cas9]-[NLS]-[COOH],

wherein NLS is a nuclear localization sequence, NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT Application PCT/EP2000/011690, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 113) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 114). In some embodiments, a linker is inserted between the Cas9 domain and the cytidine deaminase. In some embodiments, the NLS is located C-terminal to the Cas9 domain. In some embodiments, the NLS is located N-terminal to the Cas9 domain. In some embodiments, the NLS is located between the cytidine deaminase and the Cas9 domain. In some embodiments, the NLS is located N-terminal to the cytidine deaminase domain. In some embodiments, the NLS is located C-terminal to the cytidine deaminase domain. In some embodiments, the “]-[” used in the general architecture above indicates the presence of an optional linker sequence.

[00326] In some embodiments, the fusion protein comprises any one of the nucleic acid editing domains provided herein. In some embodiments, the nucleic acid editing domain is a cytidine or adenosine deaminase domain provided herein.

[00327] In some embodiments, the cytidine deaminase domain and the Cas9 domain are fused to each other via a linker. Various linker lengths and flexibilities between the deaminase domain (e.g., AID, APOBEC family deaminase) and the Cas9 domain can be employed, for example, ranging from very flexible linkers of the form (GGGS)n (SEQ ID NO: 93), (GGGGS)n (SEQ ID NO: 95), (GGS)n (SEQ ID NO: 101), and (G)n (SEQ ID NO: 97), to more rigid linkers of the form (EAAAK)n (SEQ ID NO: 99), (SGGS)n (SEQ ID NO: 91), SGGS(GGS)n (SEQ ID NO: 103), SGSETPGTSESATPES (SEQ ID NO: 89) (see, e.g., Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 2014; 32(6): 577-82; the entire contents of which are incorporated herein by reference), (SGGS)n-SGSETPGTSESATPES-(SGGS)n (SEQ ID NO: 98), and (XP)n, wherein n is an integer between 1 and 30, inclusive, in order to achieve the optimal length for deaminase activity for the specific application. In some embodiments, the linker comprises a (GGS)n motif, wherein n is 1, 3, or 7. In some embodiments, the linker comprises a SGSETPGTSESATPES (SEQ ID NO: 89) motif. In some embodiments, the linker comprises a (SGGS)2-SGSETPGTSESATPES-(SGGS)2 (SEQ ID NO: 96) motif.
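The flanked motif (SGGS)n-SGSETPGTSESATPES-(SGGS)n grows by 8 residues for each increment of n, starting from the 16-residue SGSETPGTSESATPES core. The Python sketch below builds the flanked linker for a given n; the helper name `flanked_xten` is hypothetical, and allowing n = 0 (returning the bare core) is an assumption made for convenience.

```python
XTEN = "SGSETPGTSESATPES"  # 16-residue core motif named in the text
FLANK_UNIT = "SGGS"


def flanked_xten(n: int) -> str:
    """Return (SGGS)n-SGSETPGTSESATPES-(SGGS)n as a one-letter string."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    flank = FLANK_UNIT * n
    return flank + XTEN + flank


for n in range(4):
    linker = flanked_xten(n)
    print(n, len(linker), linker)
```

With n = 2 this yields the 32-residue (SGGS)2-SGSETPGTSESATPES-(SGGS)2 motif, SGGSSGGSSGSETPGTSESATPESSGGSSGGS.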

[00328] In some embodiments, the fusion protein comprises a Cas9 domain (e.g., a Cas9 domain that comprises one or more mutations and recognizes a non-canonical PAM sequence) fused to a cytidine deaminase domain, wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: 2. In some embodiments, the fusion protein comprises any one of the amino acid sequences of SEQ ID NOs: 122-132.

[00329] Some aspects of the disclosure relate to fusion proteins that comprise a uracil glycosylase inhibitor (UGI) domain. In some embodiments, any of the fusion proteins provided herein that comprise a Cas9 domain (e.g., a Cas9 domain that comprises one or more mutations and recognizes a non-canonical PAM sequence) may be further fused to a UGI domain either directly or via a linker. Some aspects of this disclosure provide deaminase-dCas9 fusion proteins, deaminase-nuclease active Cas9 fusion proteins, and deaminase-Cas9 nickase fusion proteins with increased nucleobase editing efficiency. Without wishing to be bound by any particular theory, the cellular DNA-repair response to the presence of U:G heteroduplex DNA may be responsible for the decrease in nucleobase editing efficiency in cells. For example, uracil DNA glycosylase (UDG) catalyzes removal of U from DNA in cells, which may initiate base excision repair, with reversion of the U:G pair to a C:G pair as the most common outcome. A uracil DNA glycosylase inhibitor (UGI) may inhibit human UDG activity. Thus, this disclosure contemplates a fusion protein comprising a Cas9 domain and a nucleic acid editing domain (e.g., a deaminase) further fused to a UGI domain. In some embodiments, the fusion protein comprises a Cas9 nickase fused to a nucleic acid editing domain and further fused to a UGI domain. In some embodiments, the fusion protein comprises a dCas9 fused to a nucleic acid editing domain and further fused to a UGI domain. It should be understood that the use of a UGI domain may increase the editing efficiency of a nucleic acid editing domain that is capable of catalyzing, for example, a C to U change. For example, fusion proteins comprising a UGI domain may be more efficient in deaminating C residues.

[00330] In some embodiments, the fusion protein comprises the structure:

[nucleic acid editing domain]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[UGI];

[nucleic acid editing domain]-[optional linker sequence]-[UGI]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[nucleic acid editing domain]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[nucleic acid editing domain];

[Cas9]-[optional linker sequence]-[nucleic acid editing domain]-[optional linker sequence]-[UGI]; or

[Cas9]-[optional linker sequence]-[UGI]-[optional linker sequence]-[nucleic acid editing domain].
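The six structures listed above are the 3! = 6 possible N- to C-terminal orderings of three domains, each junction carrying an optional linker. As a sketch, the Python snippet below enumerates those orderings with `itertools.permutations`; the function name `architectures` is hypothetical. Swapping the first label for "deaminase" or "cytidine deaminase" reproduces the analogous lists that follow.

```python
from itertools import permutations

DOMAINS = ("nucleic acid editing domain", "Cas9", "UGI")


def architectures(domains, linker_label="optional linker sequence"):
    """Return every N- to C-terminal ordering of the domains, written in
    the bracket notation used above, with a linker label at each junction."""
    sep = f"]-[{linker_label}]-["
    return ["[" + sep.join(order) + "]" for order in permutations(domains)]


layouts = architectures(DOMAINS)
print(len(layouts))
for layout in layouts:
    print(layout)
```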

[00331] In some embodiments, the fusion protein comprises the structure:

[deaminase]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[UGI];

[deaminase]-[optional linker sequence]-[UGI]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[deaminase]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[deaminase];

[Cas9]-[optional linker sequence]-[deaminase]-[optional linker sequence]-[UGI]; or

[Cas9]-[optional linker sequence]-[UGI]-[optional linker sequence]-[deaminase].

[00332] In some embodiments, the fusion protein comprises the structure:

[cytidine deaminase]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[UGI];

[cytidine deaminase]-[optional linker sequence]-[UGI]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[cytidine deaminase]-[optional linker sequence]-[Cas9];

[UGI]-[optional linker sequence]-[Cas9]-[optional linker sequence]-[cytidine deaminase];

[Cas9]-[optional linker sequence]-[cytidine deaminase]-[optional linker sequence]-[UGI]; or

[Cas9]-[optional linker sequence]-[UGI]-[optional linker sequence]-[cytidine deaminase].

[00333] In some embodiments, the fusion proteins provided herein do not comprise a linker sequence. In some embodiments, one or both of the optional linker sequences are present.

[00334] In some embodiments, the “-” used in the general architecture above indicates the presence of an optional linker sequence. In some embodiments, the fusion proteins comprising a UGI domain further comprise a nuclear targeting sequence, for example, a nuclear localization sequence. In some embodiments, fusion proteins provided herein further comprise a nuclear localization sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the UGI protein. In some embodiments, the NLS is fused to the C-terminus of the UGI protein. In some embodiments, the NLS is fused to the N-terminus of the Cas9 domain. In some embodiments, the NLS is fused to the C-terminus of the Cas9 domain. In some embodiments, the NLS is fused to the N-terminus of the deaminase. In some embodiments, the NLS is fused to the C-terminus of the deaminase. In some embodiments, the NLS is fused to the N-terminus of the second Cas9. In some embodiments, the NLS is fused to the C-terminus of the second Cas9. In some embodiments, the NLS is fused to the fusion protein via one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker. In some embodiments, the NLS comprises an amino acid sequence of any one of the NLS sequences provided or referenced herein. In some embodiments, the NLS comprises an amino acid sequence as set forth in SEQ ID NO: 113 or SEQ ID NO: 114.

[00335] In some embodiments, a UGI domain comprises a wild-type UGI or a UGI as set forth in any of SEQ ID NOs: 115-120. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, a UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 115. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in SEQ ID NO: 115. In some embodiments, a UGI comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 115 or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 115. In some embodiments, proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as “UGI variants.” A UGI variant shares homology to UGI, or a fragment thereof. For example, a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in SEQ ID NO: 115. In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in SEQ ID NO: 115.

[00336] Suitable UGI protein and nucleotide sequences are provided herein, and additional suitable UGI sequences are known to those in the art and include, for example, those published in Wang et al., Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase. J. Biol. Chem. 264:1163-1171 (1989); Lundquist et al., Site-directed mutagenesis and characterization of uracil-DNA glycosylase inhibitor protein. Role of specific carboxylic amino acids in complex formation with Escherichia coli uracil-DNA glycosylase. J. Biol. Chem. 272:21408-21419 (1997); Ravishankar et al., X-ray analysis of a complex of Escherichia coli uracil DNA glycosylase (EcUDG) with a proteinaceous inhibitor. The structure elucidation of a prokaryotic UDG. Nucleic Acids Res. 26:4880-4887 (1998); and Putnam et al., Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase. J. Mol. Biol. 287:331-346 (1999), the entire contents of each of which are incorporated herein by reference.

[00337] It should be appreciated that additional proteins may be uracil glycosylase inhibitors. For example, other proteins that are capable of inhibiting (e.g., sterically blocking) a uracil-DNA glycosylase base-excision repair enzyme are within the scope of this disclosure. Additionally, any proteins that block or inhibit base-excision repair are also within the scope of this disclosure. In some embodiments, a protein that binds DNA is used. In another embodiment, a substitute for UGI is used. In some embodiments, a uracil glycosylase inhibitor is a protein that binds single-stranded DNA. For example, a uracil glycosylase inhibitor may be an Erwinia tasmaniensis single-stranded binding protein. In some embodiments, the single-stranded binding protein comprises the amino acid sequence (SEQ ID NO: 118). In some embodiments, a uracil glycosylase inhibitor is a protein that binds uracil. In some embodiments, a uracil glycosylase inhibitor is a protein that binds uracil in DNA. In some embodiments, a uracil glycosylase inhibitor is a catalytically inactive uracil DNA glycosylase protein. In some embodiments, a uracil glycosylase inhibitor is a catalytically inactive uracil DNA glycosylase protein that does not excise uracil from the DNA. For example, a uracil glycosylase inhibitor may be UdgX. In some embodiments, the UdgX comprises the amino acid sequence (SEQ ID NO: 119). As another example, a uracil glycosylase inhibitor may be a catalytically inactive UDG. In some embodiments, a catalytically inactive UDG comprises the amino acid sequence (SEQ ID NO: 55). It should be appreciated that other uracil glycosylase inhibitors would be apparent to the skilled artisan and are within the scope of this disclosure. In some embodiments, a uracil glycosylase inhibitor is a protein that is homologous to any one of SEQ ID NOs: 115-120.
In some embodiments, a uracil glycosylase inhibitor is a protein that is at least 50% identical, at least 55% identical, at least 60% identical, at least 65% identical, at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 98% identical, at least 99% identical, or at least 99.5% identical to any one of SEQ ID NOs: 115-120.
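The percent-identity thresholds recited above compare a candidate sequence against a reference. How identity is computed is not specified in this section; the Python sketch below assumes the two sequences have already been aligned (for example, by a BLAST or Needleman-Wunsch alignment) to equal length, and counts identical residues over aligned columns. Gap handling here is a simplifying assumption, and the short sequences in the usage example are hypothetical, not UGI sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: a gap ('-') opposite a residue counts as a mismatch;
    columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned pair: 5 of 6 positions identical (about 83.3%).
print(percent_identity("MKVLAT", "MKILAT"))
print(meets_threshold("MKVLAT", "MKILAT", 80))
```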

[00341] In various embodiments, the fusion protein is:

[00342] xCas9(3.7)-BE3 (APOBEC-linker(16aa)-xCas9(3.7)n-linker(4aa)-UGI-linker(4aa)-

[00345] In some embodiments, any of the fusion proteins provided herein comprise a second UGI domain. In some embodiments, the second UGI domain comprises a wild-type UGI or a UGI as set forth in any of SEQ ID NOs: 115-120. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, the second UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 115. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in SEQ ID NO: 115. In some embodiments, the second UGI domain comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 115 or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 115. In some embodiments, proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as “UGI variants.” A UGI variant shares homology to UGI, or a fragment thereof. For example, a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild-type UGI or a UGI as set forth in SEQ ID NO: 39.
In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in SEQ ID NO: 115.

[00346] In some embodiments, the fusion protein comprises the amino acid sequence of any one of SEQ ID NOs: 122-132. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO: 122. In some embodiments, the fusion protein consists of the amino acid sequence of SEQ ID NO: 123. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO: 124. In some embodiments, the fusion protein consists of the amino acid sequence of SEQ ID NO: 125. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO: 126. In some embodiments, the fusion protein consists of the amino acid sequence of SEQ ID NO: 127. In some embodiments, the fusion protein comprises an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence as set forth in SEQ ID NOs: 56-61. In some embodiments, the Cas9 domain is replaced with any of the Cas9 domains comprising one or more mutations provided herein.

[00347] xCas9(3.6)-BE4 (APOBEC1-linker(32aa)-xCas9(3.6)n-linker(9aa)-UGI-linker(9aa)-UGI):
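The shorthand for the xCas9(3.6)-BE4 construct lists components from N- to C-terminus separated by hyphens, with each linker annotated by its length in amino acids. As an illustrative sketch, the Python snippet below splits that shorthand into domains and linker lengths. It assumes that no domain name itself contains a hyphen, and it assumes (not stated in this section) that the trailing "n" in xCas9(3.6)n denotes a nickase; `parse_architecture` is a hypothetical name.

```python
import re

# Shorthand adapted from paragraph [00347]; the trailing "n" on xCas9(3.6)n is
# assumed here to denote a nickase.
ARCH = "APOBEC1-linker(32aa)-xCas9(3.6)n-linker(9aa)-UGI-linker(9aa)-UGI"
LINKER_RE = re.compile(r"linker\((\d+)aa\)")


def parse_architecture(arch):
    """Split hyphen-delimited shorthand into N- to C-terminal components.

    Returns (domains, linker_lengths). Assumes no domain name contains a hyphen.
    """
    domains, linker_lengths = [], []
    for part in arch.split("-"):
        match = LINKER_RE.fullmatch(part)
        if match:
            linker_lengths.append(int(match.group(1)))
        else:
            domains.append(part)
    return domains, linker_lengths


domains, linkers = parse_architecture(ARCH)
print(domains)                 # non-linker components in N- to C-terminal order
print(linkers, sum(linkers))   # individual linker lengths and their total
```

For this construct the sketch reports four non-linker components (including two UGI domains) joined by linkers of 32, 9, and 9 residues.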

[00350] In some embodiments, any of the fusion proteins provided herein may further comprise a Gam protein. The term “Gam protein,” as used herein, refers generally to proteins capable of binding to one or more ends of a double-strand break of a double-stranded nucleic acid (e.g., double-stranded DNA). In some embodiments, the Gam protein prevents or inhibits degradation of one or more strands of a nucleic acid at the site of the double-strand break. In some embodiments, a Gam protein is a naturally occurring Gam protein from bacteriophage Mu, or a non-naturally occurring variant thereof. Fusion proteins comprising Gam proteins are described in Komor et al. (2017) Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3: eaao4774; the entire contents of which are incorporated by reference herein. In some embodiments, the Gam protein comprises an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence provided by SEQ ID NO: 121. In some embodiments, the Gam protein comprises the amino acid sequence of SEQ ID NO: 121. In some embodiments, the fusion protein (e.g., BE4-Gam of SEQ ID NO: 126) comprises a Gam protein, wherein the Cas9 domain of BE4 is replaced with any of the Cas9 domains provided herein.

[00351] Gam from bacteriophage Mu:

AKPAKRIKSAAAAYVPQNRDAVITDIKRIGDLQREASRLETEMNDAIAEITEKFAARIAPIKTDIETLSKGVQGWCEANRDELTNGGKVKTANLVTGDVSWRVRPPSVSIRGMDAVMETLERLGLQRFIRTKQEINKEAILLEPKAVAGVAGITVKSGIEDFSIIPFEQEAGI (SEQ ID NO: 121)

[00352] BE4-Gam:

[00353] Some aspects of the disclosure provide fusion proteins comprising a Cas9 domain and an adenosine deaminase. In some embodiments, any of the fusion proteins provided herein are base editors. The Cas9 domain may be any of the Cas9 domains provided herein. In some embodiments, any of the Cas9 domains provided herein may be fused with any of the adenosine deaminases provided herein. In some embodiments, the fusion protein comprises the structure:

[NH2]-[adenosine deaminase]-[Cas9]-[COOH]; or

[NH2]-[Cas9]-[adenosine deaminase]-[COOH].

[00354] In some embodiments, the fusion proteins comprising an adenosine deaminase and a Cas9 domain do not include a linker sequence. In some embodiments, a linker is present between the adenosine deaminase domain and the Cas9 domain. In some embodiments, the “-” used in the general architecture above indicates the presence of an optional linker. In some embodiments, the adenosine deaminase and the Cas9 domain are fused via any of the linkers provided herein. For example, in some embodiments the adenosine deaminase and the Cas9 domain are fused via any of the linkers provided below. In some embodiments, the linker comprises the amino acid sequence of any one of SEQ ID NOs: 89-112. In some embodiments, the adenosine deaminase and the Cas9 domain are fused via a linker that comprises between 1 and 200 amino acids. In some embodiments, the adenosine deaminase and the Cas9 domain are fused via a linker that is from 1 to 5, 1 to 10, 1 to 20, 1 to 30, 1 to 40, 1 to 50, 1 to 60, 1 to 80, 1 to 100, 1 to 150, 1 to 200, 5 to 10, 5 to 20, 5 to 30, 5 to 40, 5 to 60, 5 to 80, 5 to 100, 5 to 150, 5 to 200, 10 to 20, 10 to 30, 10 to 40, 10 to 50, 10 to 60, 10 to 80, 10 to 100, 10 to 150, 10 to 200, 20 to 30, 20 to 40, 20 to 50, 20 to 60, 20 to 80, 20 to 100, 20 to 150, 20 to 200, 30 to 40, 30 to 50, 30 to 60, 30 to 80, 30 to 100, 30 to 150, 30 to 200, 40 to 50, 40 to 60, 40 to 80, 40 to 100, 40 to 150, 40 to 200, 50 to 60, 50 to 80, 50 to 100, 50 to 150, 50 to 200, 60 to 80, 60 to 100, 60 to 150, 60 to 200, 80 to 100, 80 to 150, 80 to 200, 100 to 150, 100 to 200, or 150 to 200 amino acids in length. In some embodiments, the adenosine deaminase and the Cas9 domain are fused via a linker that is 3, 4, 16, 24, 32, 64, 100, or 104 amino acids in length. In some embodiments, the adenosine deaminase and the Cas9 domain are fused via a linker that comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89),

SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 106), or

GGSGGSPGSPAGSPTSTEEGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEGTSTEPSEGSAPGTSTEPSEGSAPGTSESATPESGPGSEPATSGGSGGS (SEQ ID NO: 110). In some embodiments, the adenosine deaminase and the Cas9 domain are fused via a linker comprising the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89), which may also be referred to as the XTEN linker. In some embodiments, the linker is 24 amino acids in length. In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPES (SEQ ID NO: 111). In some embodiments, the linker is 32 amino acids in length. In some embodiments, the linker comprises the amino acid sequence (SGGS)2-SGSETPGTSESATPES-(SGGS)2 (SEQ ID NO: 96), which may also be referred to as (SGGS)2-XTEN-(SGGS)2. In some embodiments, the linker comprises the amino acid sequence (SGGS)n-SGSETPGTSESATPES-(SGGS)n (SEQ ID NO: 98), wherein n is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, the linker is 40 amino acids in length. In some embodiments, the linker comprises the amino acid sequence

SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGS (SEQ ID NO: 106). In some embodiments, the linker is 64 amino acids in length. In some embodiments, the linker comprises the amino acid sequence

SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 107). In some embodiments, the linker is 92 amino acids in length. In some embodiments, the linker comprises the amino acid sequence

PGSPAGSPTSTEEGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEGTSTEPSEGSAPGTSTEPSEGSAPGTSESATPESGPGSEPATS (SEQ ID NO: 108).
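Because several of the linker sequences above were line-wrapped during extraction, a quick consistency check is to strip whitespace, count residues, and compare each count with the length stated in the text. The Python sketch below does that and also counts copies of the 16-residue SGSETPGTSESATPES motif. The dictionary pairs each sequence with the length stated next to it in this paragraph; `check_linkers` is a hypothetical helper.

```python
XTEN = "SGSETPGTSESATPES"

# Linker sequences from paragraph [00354], keyed to the length stated for each.
STATED_LENGTHS = {
    "SGGSSGGSSGSETPGTSESATPES": 24,
    "SGGSSGGSSGSETPGTSESATPESSGGSSGGS": 32,
    "SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGS": 40,
    "SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGSSGSETPGTSESATPESSGGSSGGS": 64,
    ("PGSPAGSPTSTEEGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEG"
     "TSTEPSEGSAPGTSTEPSEGSAPGTSESATPESGPGSEPATS"): 92,
}


def check_linkers(stated):
    """Return (sequence, stated_length, actual_length, xten_copies) per linker.

    Whitespace is stripped first, since wrapped sequences may contain spaces.
    """
    report = []
    for seq, stated_len in stated.items():
        clean = "".join(seq.split())
        report.append((clean, stated_len, len(clean), clean.count(XTEN)))
    return report


for seq, stated_len, actual_len, copies in check_linkers(STATED_LENGTHS):
    status = "ok" if stated_len == actual_len else "MISMATCH"
    print(f"{stated_len:>3} aa stated, {actual_len:>3} aa counted, "
          f"{copies} XTEN copies: {status}")
```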

[00355] In some embodiments, the fusion proteins comprise one or more adenosine deaminases defined herein, or one or more adenosine deaminases comprising an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences set forth herein.

[00356] In some embodiments, the fusion proteins comprising an adenosine deaminase provided herein further comprise one or more nuclear targeting sequences, for example, a nuclear localization sequence (NLS). In some embodiments, an NLS comprises an amino acid sequence that facilitates the import of a protein comprising the NLS into the cell nucleus (e.g., by nuclear transport). In some embodiments, any of the fusion proteins provided herein further comprise a nuclear localization sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In some embodiments, the NLS is fused to the N-terminus of the IBR (e.g., dISN). In some embodiments, the NLS is fused to the C-terminus of the IBR (e.g., dISN). In some embodiments, the NLS is fused to the N-terminus of the Cas9 domain. In some embodiments, the NLS is fused to the C-terminus of the Cas9 domain. In some embodiments, the NLS is fused to the N-terminus of the adenosine deaminase. In some embodiments, the NLS is fused to the C-terminus of the adenosine deaminase. In some embodiments, the NLS is fused to the fusion protein via one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker. In some embodiments, the NLS comprises an amino acid sequence of any one of the NLS sequences provided or referenced herein. In some embodiments, the NLS comprises an amino acid sequence as set forth in SEQ ID NO: 37 or SEQ ID NO: 38. Additional nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., PCT/EP2000/011690, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 113). In some embodiments, an NLS comprises the amino acid sequence MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 114).

[00357] In some embodiments, the general architecture of exemplary fusion proteins with an adenosine deaminase and a Cas9 domain comprises any one of the following structures, where NLS is a nuclear localization sequence (e.g., any NLS provided herein), NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein. Fusion proteins comprising an adenosine deaminase, a napDNAbp, and an NLS:

NH2-[NLS]-[adenosine deaminase]-[Cas9]-COOH;

NH2-[adenosine deaminase]-[NLS]-[Cas9]-COOH;

NH2-[adenosine deaminase]-[Cas9]-[NLS]-COOH;

NH2-[NLS]-[Cas9]-[adenosine deaminase]-COOH;

NH2-[Cas9]-[NLS]-[adenosine deaminase]-COOH; and

NH2-[Cas9]-[adenosine deaminase]-[NLS]-COOH.
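The six structures listed above follow a simple pattern: for each of the two N- to C-terminal orderings of the adenosine deaminase and the Cas9 domain, a single NLS is placed at the N-terminus, at the internal junction, or at the C-terminus. The Python sketch below generates that set; the function name `nls_placements` is hypothetical, and only one NLS per construct is assumed.

```python
from itertools import permutations


def nls_placements(domains, nls="NLS"):
    """For each N- to C-terminal ordering of the domains, insert one NLS at
    every terminus or junction and render it in the notation used above."""
    layouts = []
    for order in permutations(domains):
        for i in range(len(order) + 1):
            parts = list(order[:i]) + [nls] + list(order[i:])
            layouts.append("NH2-" + "-".join(f"[{p}]" for p in parts) + "-COOH")
    return layouts


for layout in nls_placements(["adenosine deaminase", "Cas9"]):
    print(layout)
```

The same helper applied to three domains yields 3! x 4 = 24 placements, which is one way to see how the lists grow when a second deaminase is added.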

[00358] In some embodiments, the fusion proteins comprising an adenosine deaminase domain provided herein do not comprise a linker. In some embodiments, a linker is present between one or more of the domains or proteins (e.g., adenosine deaminase, Cas9 domain, and/or NLS). In some embodiments, the “-” used in the general architecture above indicates the presence of an optional linker.

[00359] Some aspects of the disclosure provide fusion proteins that comprise a Cas9 domain and at least two adenosine deaminase domains. Without wishing to be bound by any particular theory, dimerization of adenosine deaminases (e.g., in cis or in trans) may improve the ability (e.g., efficiency) of the fusion protein to modify a nucleic acid base, for example, to deaminate adenine. In some embodiments, any of the fusion proteins may comprise 2, 3, 4, or 5 adenosine deaminase domains. In some embodiments, any of the fusion proteins provided herein comprise two adenosine deaminases. In some embodiments, any of the fusion proteins provided herein contain only two adenosine deaminases. In some embodiments, the adenosine deaminases are the same. In some embodiments, the adenosine deaminases are any of the adenosine deaminases provided herein. In some embodiments, the adenosine deaminases are different. In some embodiments, the first adenosine deaminase is any of the adenosine deaminases provided herein, and the second adenosine deaminase is any of the adenosine deaminases provided herein but is not identical to the first adenosine deaminase. Additional fusion protein constructs comprising two adenosine deaminase domains suitable for use herein are illustrated in Gaudelli et al. (2017) Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551: 464-471; the entire contents of which are incorporated herein by reference.

[00360] In some embodiments, the first adenosine deaminase and the second adenosine deaminase are fused directly or via a linker. In some embodiments, the linker is any of the linkers provided herein. In some embodiments, the linker comprises the amino acid sequence of any one of the linker sequences disclosed herein (e.g., linkers of SEQ ID NOs: 21-36, 64, 65, 66, or 67). In some embodiments, the first adenosine deaminase is the same as the second adenosine deaminase. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are any of the adenosine deaminases described herein. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are different. In some embodiments, the first adenosine deaminase is any of the adenosine deaminases provided herein. In some embodiments, the second adenosine deaminase is any of the adenosine deaminases provided herein but is not identical to the first adenosine deaminase. In some embodiments, the first adenosine deaminase is an ecTadA adenosine deaminase. In some embodiments, the first adenosine deaminase comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences set forth herein.

[00361] In some embodiments, the general architecture of exemplary fusion proteins with a first adenosine deaminase, a second adenosine deaminase, and a Cas9 domain comprises any one of the following structures, where NLS is a nuclear localization sequence (e.g., any NLS provided herein), NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein:

NH2-[first adenosine deaminase]-[second adenosine deaminase]-[Cas9]-COOH;

NH2-[first adenosine deaminase]-[Cas9]-[second adenosine deaminase]-COOH;

NH2-[Cas9]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;

NH2-[second adenosine deaminase]-[first adenosine deaminase]-[Cas9]-COOH;

NH2-[second adenosine deaminase]-[Cas9]-[first adenosine deaminase]-COOH;

NH2-[Cas9]-[second adenosine deaminase]-[first adenosine deaminase]-COOH.
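The six architectures listed above are the 3! = 6 N-to-C orderings of the three domains. A minimal sketch that regenerates them is shown below; the helper name is hypothetical, and each "-" in the output stands for the optional linker described in the text.

```python
from itertools import permutations

DOMAINS = ("first adenosine deaminase", "second adenosine deaminase", "Cas9")


def architectures(domains=DOMAINS):
    """Return every N-to-C ordering of `domains` as an NH2-...-COOH string."""
    return ["NH2-" + "-".join(f"[{d}]" for d in order) + "-COOH"
            for order in permutations(domains)]


for arch in architectures():
    print(arch)
```

Enumerating permutations rather than hard-coding the list makes it easy to see that no ordering is missing or duplicated.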

[00362] In some embodiments, the fusion proteins provided herein do not comprise a linker. In some embodiments, a linker is present between one or more of the domains or proteins (e.g., first adenosine deaminase, second adenosine deaminase, and/or napDNAbp). In some embodiments, the “-” used in the general architecture above indicates the presence of an optional linker.

[00363] In some embodiments, a fusion protein comprising a first adenosine deaminase, a second adenosine deaminase, and a Cas9 domain further comprises an NLS. Exemplary fusion proteins comprising a first adenosine deaminase, a second adenosine deaminase, a napDNAbp, and an NLS are shown as follows:

NH2-[NLS]-[first adenosine deaminase]-[second adenosine deaminase]-[Cas9]-COOH;

NH2-[first adenosine deaminase]-[NLS]-[second adenosine deaminase]-[Cas9]-COOH;

NH2-[first adenosine deaminase]-[second adenosine deaminase]-[NLS]-[Cas9]-COOH;

NH2-[first adenosine deaminase]-[second adenosine deaminase]-[Cas9]-[NLS]-COOH;

NH2-[NLS]-[first adenosine deaminase]-[Cas9]-[second adenosine deaminase]-COOH;

NH2-[first adenosine deaminase]-[NLS]-[Cas9]-[second adenosine deaminase]-COOH;

NH2-[first adenosine deaminase]-[Cas9]-[NLS]-[second adenosine deaminase]-COOH;

NH2-[first adenosine deaminase]-[Cas9]-[second adenosine deaminase]-[NLS]-COOH;

NH2-[NLS]-[Cas9]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;

NH2-[Cas9]-[NLS]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;

NH2-[Cas9]-[first adenosine deaminase]-[NLS]-[second adenosine deaminase]-COOH;

NH2-[Cas9]-[first adenosine deaminase]-[second adenosine deaminase]-[NLS]-COOH;

NH2-[NLS]-[second adenosine deaminase]-[first adenosine deaminase]-[Cas9]-COOH;

NH2-[second adenosine deaminase]-[NLS]-[first adenosine deaminase]-[Cas9]-COOH;

NH2-[second adenosine deaminase]-[first adenosine deaminase]-[NLS]-[Cas9]-COOH;

NH2-[second adenosine deaminase]-[first adenosine deaminase]-[Cas9]-[NLS]-COOH;

NH2-[NLS]-[second adenosine deaminase]-[Cas9]-[first adenosine deaminase]-COOH;

NH2-[second adenosine deaminase]-[NLS]-[Cas9]-[first adenosine deaminase]-COOH;

NH2-[second adenosine deaminase]-[Cas9]-[NLS]-[first adenosine deaminase]-COOH;

NH2-[second adenosine deaminase]-[Cas9]-[first adenosine deaminase]-[NLS]-COOH;

NH2-[NLS]-[Cas9]-[second adenosine deaminase]-[first adenosine deaminase]-COOH;

NH2-[Cas9]-[NLS]-[second adenosine deaminase]-[first adenosine deaminase]-COOH;

NH2-[Cas9]-[second adenosine deaminase]-[NLS]-[first adenosine deaminase]-COOH;

NH2-[Cas9]-[second adenosine deaminase]-[first adenosine deaminase]-[NLS]-COOH.
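The 24 NLS-containing architectures above follow from placing one NLS at any of the four possible positions (before, between, or after the three domains) in each of the six domain orderings, giving 6 × 4 = 24 constructs. A minimal sketch that enumerates them is shown below; the helper name is hypothetical.

```python
from itertools import permutations

DOMAINS = ("first adenosine deaminase", "second adenosine deaminase", "Cas9")


def architectures_with_nls(domains=DOMAINS):
    """Return NH2-...-COOH strings with one NLS inserted at every possible position."""
    constructs = []
    for order in permutations(domains):
        # With n domains there are n + 1 slots for the NLS: N-terminal, internal, or C-terminal.
        for slot in range(len(order) + 1):
            parts = list(order)
            parts.insert(slot, "NLS")
            constructs.append("NH2-" + "-".join(f"[{p}]" for p in parts) + "-COOH")
    return constructs


if __name__ == "__main__":
    for construct in architectures_with_nls():
        print(construct)
```

The output order differs from the order of the list above, but the set of constructs is the same.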

[00364] In some embodiments, the fusion proteins provided herein do not comprise a linker. In some embodiments, a linker is present between one or more of the domains or proteins (e.g., first adenosine deaminase, second adenosine deaminase, Cas9 domain, and/or NLS). In some embodiments, the “-” used in the general architecture above indicates the presence of an optional linker.

[00365] In some embodiments, the fusion protein comprises a Cas9 domain fused to one or more adenosine deaminase domains (e.g., a first adenosine deaminase and a second adenosine deaminase), wherein the fusion protein comprises or consists of the amino acid sequence of SEQ ID NO: 127. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO: 128. In some embodiments, the fusion protein consists of the amino acid sequence of SEQ ID NO: 129. In some embodiments, the Cas9 domain of any one of SEQ ID NOs: 127-129 is replaced with any of the Cas9 domains provided herein.

[00366] xCas9(3.7)–ABE: (ecTadA(wt)–linker(32 aa)–ecTadA*(7.10)–linker(32 aa)–nxCas9(3.7)–NLS):

[00368] ABE7.10: ecTadA (wild-type)-(SGGS)2-XTEN-(SGGS)2-ecTadA (W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)-(SGGS)2-XTEN-(SGGS)2-Cas9-SGGS-NLS
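The construct descriptions above can be checked mechanically. The sketch below assembles the (SGGS)2-XTEN-(SGGS)2 linker from the XTEN sequence given elsewhere in this disclosure (SGSETPGTSESATPES, SEQ ID NO: 89), showing that it is 32 residues long, consistent with the "linker(32 aa)" in the xCas9(3.7)-ABE architecture, and parses the ecTadA* mutation string into (wild-type residue, position, substituted residue) tuples. The helper names are hypothetical, and the parser assumes single-letter amino acid codes joined by underscores.

```python
SGGS = "SGGS"
XTEN = "SGSETPGTSESATPES"  # 16-aa XTEN linker (SEQ ID NO: 89)


def abe_linker() -> str:
    """Assemble the (SGGS)2-XTEN-(SGGS)2 linker used between domains."""
    return SGGS * 2 + XTEN + SGGS * 2


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse e.g. 'W23R_H36L' into [('W', 23, 'R'), ('H', 36, 'L')]."""
    mutations = []
    for token in spec.split("_"):
        token = token.strip()
        if len(token) < 3 or not token[1:-1].isdigit():
            raise ValueError(f"unrecognized mutation: {token!r}")
        mutations.append((token[0], int(token[1:-1]), token[-1]))
    return mutations


TADA_7_10 = ("W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_"
             "S146C_D147Y_R152P_E155V_I156F_K157N")

print(len(abe_linker()))                # 32
print(len(parse_mutations(TADA_7_10)))  # 14
```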

[00370] In some embodiments, the fusion proteins provided herein comprising one or more adenosine deaminase domains and a Cas9 domain exhibit increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end, as compared to a fusion protein comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the fusion protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of a fusion protein comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to a sequence selected from the group consisting of NGT, NGA, NGC, and NNG, wherein N is A, G, T, or C. In some embodiments, the 3′ end of the target sequence is directly adjacent to a sequence selected from the group consisting of CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, and CAA. In some embodiments, the fusion protein activity is measured by a nuclease assay, a deamination assay, a transcriptional activation assay, or high-throughput sequencing. In some embodiments, the transcriptional activation assay is a GFP activation assay. In some embodiments, high-throughput sequencing is used to measure indel formation.
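Deciding whether a given three-nucleotide PAM falls within a degenerate group such as NGT, NGA, NGC, or NNG (N = A, G, T, or C) is a per-position membership test. A minimal sketch is shown below; it supports only the A, C, G, T, and N codes used in the paragraph above (other IUPAC codes are deliberately omitted), and the function and constant names are hypothetical.

```python
# Allowed bases for each code used in the text; N is any of A, C, G, T.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def pam_matches(pam: str, pattern: str) -> bool:
    """True if a concrete PAM such as 'AGT' matches a pattern such as 'NGT'."""
    pam, pattern = pam.upper(), pattern.upper()
    if len(pam) != len(pattern):
        return False
    return all(base in CODES[code] for base, code in zip(pam, pattern))


def matches_any(pam: str, patterns) -> bool:
    """True if the PAM matches at least one of the given patterns."""
    return any(pam_matches(pam, p) for p in patterns)


PAM_GROUPS = ("NGT", "NGA", "NGC", "NNG")
print(matches_any("AGC", PAM_GROUPS))  # True
```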

[00371] It should be appreciated that the fusion proteins of the present disclosure may comprise one or more additional features. For example, in some embodiments, the fusion protein may comprise cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins. Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags, also referred to as histidine tags or His-tags, maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of ordinary skill in the art. In some embodiments, the fusion protein comprises one or more His tags.

[00372] Additional suitable strategies for generating fusion proteins comprising a napDNAbp (e.g., a Cas9 domain) and a nucleic acid editing domain (e.g., a deaminase domain) will be apparent to those of ordinary skill in the art based on this disclosure in combination with the general knowledge in the art. Suitable strategies for generating fusion proteins according to aspects of this disclosure, with or without linkers, will also be apparent to those of ordinary skill in the art in view of the instant disclosure and the knowledge in the art. For example, Gilbert et al. (CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013; 154(2):442-51) showed that C-terminal fusions of Cas9 with VP64 using two NLSs as a linker can be employed for transcriptional activation. Mali et al. (CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol. 2013; 31(9):833-8) reported that C-terminal fusions with VP64 without a linker can be employed for transcriptional activation. Maeder et al. (CRISPR RNA-guided activation of endogenous human genes. Nat Methods. 2013; 10:977-979) reported that C-terminal fusions with VP64 using a GGGGS (SEQ ID NO: 94) linker can be used as transcriptional activators. Recently, dCas9-FokI nuclease fusions have successfully been generated and exhibit improved enzymatic specificity compared to the parental Cas9 enzyme (see Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol. 2014; 32(6):577-82; and Tsai SQ, et al. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol. 2014; 32(6):569-76. PMID: 24770325; in these studies, an SGSETPGTSESATPES (SEQ ID NO: 89) linker and a GGGGS (SEQ ID NO: 94) linker, respectively, were used in the FokI-dCas9 fusion proteins).

[00373] In some embodiments, the Cas9 fusion protein comprises: (i) a Cas9 domain; and (ii) a transcriptional activator domain. In some embodiments, the transcriptional activator domain comprises a VPR. VPR is a VP64-SV40-P65-RTA tripartite activator. In some embodiments, VPR comprises a VP64 amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO: 85.

[00374] In some embodiments, VPR comprises a VP64 amino acid sequence as set forth in SEQ ID NO: 86:

EASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLINSR (SEQ ID NO: 86).
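VP64 is commonly described as four tandem copies of a minimal VP16 activation motif. As an illustrative check (the motif DALDDFDLDML is supplied here as an assumed annotation, not taken from this disclosure), counting that motif in SEQ ID NO: 86 recovers four repeats:

```python
VP64 = "EASGSGRADALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLGSDALDDFDLDMLINSR"  # SEQ ID NO: 86
VP16_MOTIF = "DALDDFDLDML"  # minimal VP16 activation motif (assumed annotation)


def count_occurrences(sequence: str, motif: str) -> int:
    """Count possibly overlapping occurrences of `motif` in `sequence`."""
    count, start = 0, 0
    while (index := sequence.find(motif, start)) != -1:
        count += 1
        start = index + 1
    return count


print(len(VP64), count_occurrences(VP64, VP16_MOTIF))  # 62 4
```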

[00375] In some embodiments, VPR comprises a VP64-SV40-P65-RTA amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO: 87.

[00376] In some embodiments, VPR comprises a VP64-SV40-P65-RTA amino acid sequence as set forth in SEQ ID NO: 88:

(SEQ ID NO: 88).

[00377] Some aspects of this disclosure provide fusion proteins comprising a transcriptional activator. In some embodiments, the transcriptional activator is VPR. In some embodiments, the VPR comprises a wild-type VPR or a VPR as set forth in SEQ ID NO: 88. In some embodiments, the VPR proteins provided herein include fragments of VPR and proteins homologous to a VPR or a VPR fragment. For example, in some embodiments, a VPR comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 88. In some embodiments, a VPR comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 88 or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 88. In some embodiments, proteins comprising VPR or fragments of VPR or homologs of VPR or VPR fragments are referred to as “VPR variants.” A VPR variant shares homology to VPR, or a fragment thereof. For example, a VPR variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a wild-type VPR or a VPR as set forth in SEQ ID NO: 88. In some embodiments, the VPR variant comprises a fragment of VPR, such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild-type VPR or a VPR as set forth in SEQ ID NO: 88. In some embodiments, the VPR comprises the amino acid sequence set forth in SEQ ID NO: 88.
In some embodiments, the VPR comprises an amino acid sequence encoded by the nucleotide sequence set forth in SEQ ID NO: 87.

[00378] In some embodiments, a VPR is a VP64-SV40-P65-RTA tripartite activator. In some embodiments, the VP64-SV40-P65-RTA comprises a VP64-SV40-P65-RTA as set forth in SEQ ID NO: 88. In some embodiments, the VP64-SV40-P65-RTA proteins provided herein include fragments of VP64-SV40-P65-RTA and proteins homologous to a VP64-SV40-P65-RTA or a VP64-SV40-P65-RTA fragment. For example, in some embodiments, a VP64-SV40-P65-RTA comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 88. In some embodiments, a VP64-SV40-P65-RTA comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 88 or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 88. In some embodiments, proteins comprising VP64-SV40-P65-RTA or fragments of VP64-SV40-P65-RTA or homologs of VP64-SV40-P65-RTA or VP64-SV40-P65-RTA fragments are referred to as “VP64-SV40-P65-RTA variants.” A VP64-SV40-P65-RTA variant shares homology to VP64-SV40-P65-RTA, or a fragment thereof. For example, a VP64-SV40-P65-RTA variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a VP64-SV40-P65-RTA as set forth in SEQ ID NO: 88. In some embodiments, the VP64-SV40-P65-RTA variant comprises a fragment of VP64-SV40-P65-RTA, such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a fragment of a VP64-SV40-P65-RTA as set forth in SEQ ID NO: 88.
In some embodiments, the VP64-SV40-P65-RTA comprises the amino acid sequence set forth in SEQ ID NO: 88. In some embodiments, the VP64-SV40-P65-RTA comprises an amino acid sequence encoded by the nucleotide sequence set forth in SEQ ID NO: 87.

[00379] In some embodiments, the fusion protein comprises an amino acid sequence encoded by the nucleic acid sequence of SEQ ID NO: 87.

[00380] dCas9–VPR (dCas9(3.7)–NLS–linker(22aa)–VP64–linker(4aa)–NLS p65AD–linker(6aa)-

[00381] Some aspects of this disclosure provide fusion proteins comprising a Cas9 domain as provided herein that is fused to a second protein, or a “fusion partner,” such as a nucleic acid editing domain, thus forming a fusion protein. In some embodiments, the nucleic acid editing domain is fused to the N-terminus of the Cas9 domain. In some embodiments, the nucleic acid editing domain is fused to the C-terminus of the Cas9 domain. In some embodiments, the Cas9 domain and the nucleic acid editing domain are fused to each other via a linker. Suitable strategies for generating fusion proteins according to aspects of this disclosure, with or without linkers, will be apparent to those of skill in the art in view of the instant disclosure and the knowledge in the art. For example, Gilbert et al., CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013; 154(2):442-51, showed that C-terminal fusions of Cas9 with VP64 using two NLSs as a linker (SPKKKRKVEAS) can be employed for transcriptional activation. Mali et al., CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol. 2013; 31(9):833-8, reported that C-terminal fusions with VP64 without a linker can be employed for transcriptional activation. Maeder et al., CRISPR RNA-guided activation of endogenous human genes. Nat Methods. 2013; 10:977-979, reported that C-terminal fusions with VP64 using a GGGGS (SEQ ID NO: 94) linker can be used as transcriptional activators. Recently, dCas9-FokI nuclease fusions have successfully been generated and exhibit improved enzymatic specificity compared to the parental Cas9 enzyme (see Guilinger JP, Thompson DB, Liu DR. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat Biotechnol. 2014; 32(6):577-82; and Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, Goodwin MJ, Aryee MJ, Joung JK. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol. 2014; 32(6):569-76. PMID: 24770325; in these studies, an SGSETPGTSESATPES (SEQ ID NO: 89) linker and a (GGGGS)n (SEQ ID NO: 95) linker, respectively, were used in FokI-dCas9 fusion proteins).

[00382] In some embodiments, the second protein in the fusion protein (i.e., the fusion partner) comprises a nucleic acid editing domain. Such a nucleic acid editing domain may be, without limitation, a nuclease, a nickase, a recombinase, a deaminase, a methyltransferase, a methylase, an acetylase, or an acetyltransferase. Non-limiting exemplary nucleic acid editing domains that may be used in accordance with this disclosure include cytidine deaminases and adenosine deaminases. In some embodiments, the nucleic acid editing domain is a deaminase domain. In some embodiments, the nucleic acid editing domain is a nuclease domain. In some embodiments, the nuclease domain is a FokI DNA cleavage domain. In some embodiments, this disclosure provides dimers of the fusion proteins provided herein, e.g., dimers of fusion proteins may include a dimerizing nuclease domain. In some embodiments, the nucleic acid editing domain is a nickase domain. In some embodiments, the nucleic acid editing domain is a recombinase domain. In some embodiments, the nucleic acid editing domain is a methyltransferase domain. In some embodiments, the nucleic acid editing domain is a methylase domain. In some embodiments, the nucleic acid editing domain is an acetylase domain. In some embodiments, the nucleic acid editing domain is an acetyltransferase domain. Additional nucleic acid editing domains would be apparent to a person of ordinary skill in the art based on this disclosure and knowledge in the field and are within the scope of this disclosure. In other embodiments, the second protein comprises a domain that modulates transcriptional activity. Such transcriptional modulating domains may be, without limitation, a transcriptional activator or transcriptional repressor domain.

Guide RNA

[00383] In various embodiments, the base editors described herein may be complexed, bound, or otherwise associated with (e.g., via any type of covalent or non-covalent bond) one or more guide sequences, i.e., the sequence that becomes associated or bound to the base editor and directs its localization to a specific target sequence having complementarity to the guide sequence or a portion thereof. The particular design of a guide sequence will depend upon the nucleotide sequence of the genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., type of Cas protein) present in the base editor, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology, secondary structures, etc.
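Of the design factors listed above, percent G/C content is the simplest to compute. A minimal sketch is shown below; the acceptable G/C range is a design choice left to the user, and the 40-60% window in the usage line is purely an illustrative assumption, not a value taken from this disclosure. The function names are hypothetical.

```python
def gc_percent(sequence: str) -> float:
    """Percent of G and C bases in a DNA or RNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_within(sequence: str, low: float, high: float) -> bool:
    """True if G/C content lies in the inclusive range [low, high] percent."""
    return low <= gc_percent(sequence) <= high


# Illustrative only: the 40-60% window is an assumption, not a recommendation from the text.
print(gc_within("GACGCATAAAGATGAGACGC", 40.0, 60.0))  # True
```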

[00384] In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
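In the simplest case, an ungapped guide and target strand of equal length, the degree of complementarity reduces to counting antiparallel Watson-Crick pairs. The sketch below makes that assumption explicitly; it is not a substitute for the gapped aligners named above (Smith-Waterman, Needleman-Wunsch, and so on), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'; the guide may use U or T. Pairing is
    antiparallel and ungapped, so guide[i] faces target_strand[-1 - i].
    """
    guide_dna = guide.upper().replace("U", "T")
    if len(guide_dna) != len(target_strand):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    perfect_partner = reverse_complement(target_strand)
    matches = sum(g == p for g, p in zip(guide_dna, perfect_partner))
    return 100.0 * matches / len(guide_dna)
```

For example, a 20-nt guide with a single mismatch against its target strand scores 95.0.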

[00385] In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a base editor to a target sequence may be assessed by any suitable assay. For example, the components of a base editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a base editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a base editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.

[00386] A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 134) where NNNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 135) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG (SEQ ID NO: 134) where NNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 135) has a single occurrence in the genome. For the S. thermophilus CRISPR1 Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW (SEQ ID NO: 138) where NNNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) (SEQ ID NO: 139) has a single occurrence in the genome. A unique target sequence in a genome may include an S. thermophilus CRISPR1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW (SEQ ID NO: 140) where NNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) (SEQ ID NO: 140) has a single occurrence in the genome. For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 142) where NNNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 142) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG (SEQ ID NO: 144) where NNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 145) has a single occurrence in the genome. In each of these sequences, “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
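The uniqueness criterion above can be sketched for the first S. pyogenes form: every NGG-adjacent site is keyed by its 12 PAM-proximal N positions plus "XGG" (the X position and the M positions are ignored, as the text specifies), and a site is kept only if its key occurs exactly once. This is a minimal sketch under stated assumptions: the genome is an in-memory A/C/G/T string, both strands are scanned, and the function names are hypothetical. A genome-scale search would use an index rather than this linear scan.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def site_keys(strand: str, seed_len: int = 12) -> list[tuple[int, str]]:
    """(start, key) for every N{seed_len}-X-G-G site on one strand.

    The key keeps the seed and the GG but masks the X position.
    """
    site_len = seed_len + 3
    keys = []
    for start in range(len(strand) - site_len + 1):
        site = strand[start:start + site_len]
        if site.endswith("GG"):
            keys.append((start, site[:seed_len] + "XGG"))
    return keys


def unique_sites(genome: str, seed_len: int = 12) -> list[tuple[int, str]]:
    """Forward-strand sites whose key occurs exactly once across both strands."""
    genome = genome.upper()
    forward = site_keys(genome, seed_len)
    reverse = site_keys(reverse_complement(genome), seed_len)
    counts = Counter(key for _, key in forward + reverse)
    return [(start, key) for start, key in forward if counts[key] == 1]
```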

[00387] In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker & Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr & GM Church, 2009, Nature Biotechnology 27(12): 1151-62). Additional algorithms may be found in Chuai, G. et al., DeepCRISPR: optimized CRISPR guide RNA design by deep learning, Genome Biol. 19:80 (2018), and U.S. application Ser. No. 61/836,080, the entire contents of each of which are incorporated herein by reference.
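As a rough, self-contained stand-in for the folding programs named above, the sketch below uses the Nussinov dynamic-programming algorithm, which maximizes the number of nested base pairs rather than minimizing free energy as mFold and RNAfold do. It is offered only to illustrate how a candidate guide might be screened for self-complementarity; the minimum hairpin-loop length of 3 and the inclusion of G-U wobble pairs are assumptions, and the function name is hypothetical.

```python
def nussinov_max_pairs(sequence: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov algorithm).

    Allows Watson-Crick and G-U wobble pairs, and requires at least `min_loop`
    unpaired nucleotides inside every hairpin. This counts pairs; it does not
    model free energy.
    """
    seq = sequence.upper().replace("T", "U")
    can_pair = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in can_pair:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent halves
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))  # 3
```

A higher pair count for a candidate guide suggests more potential self-structure; a free-energy tool would still be needed for a realistic assessment.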

[00388] The guide sequence is linked to a tracr mate sequence, which in turn hybridizes to a tracr sequence. A tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, the degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop-forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA) and an additional nucleotide (for example, C or G). Examples of loop-forming sequences include CAAA and AAAG.
In an embodiment of the disclosure, the transcript or transcribed polynucleotide sequence has two or more hairpins. In certain embodiments, the transcript has two, three, four, or five hairpins. In a further embodiment of the disclosure, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lowercase letters represents the tracr mate sequence, the second block of lowercase letters represents the tracr sequence, and the final poly-T sequence represents the transcription terminator:

[00389] In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
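The degree of complementarity between a tracr mate sequence and a tracr sequence is defined in paragraph [00388] along the length of the shorter sequence under optimal alignment. The sketch below is a simplified version that assumes an ungapped alignment: every offset of the shorter sequence against the longer one is tried, pairing is antiparallel Watson-Crick only, and the secondary-structure corrections the text allows for are ignored. The function name is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def best_ungapped_complementarity(tracr_mate: str, tracr: str) -> float:
    """Best percent complementarity over ungapped antiparallel alignments.

    Both inputs are written 5'->3' (T is treated as U). The percentage is
    taken along the length of the shorter sequence.
    """
    a = tracr_mate.upper().replace("T", "U")
    b = tracr.upper().replace("T", "U")
    short, long_ = sorted((a, b), key=len)
    if not short:
        raise ValueError("sequences must be non-empty")
    # Reversing the longer strand lines up antiparallel partners index by index.
    long_reversed = long_[::-1]
    best = 0
    for offset in range(len(long_reversed) - len(short) + 1):
        window = long_reversed[offset:offset + len(short)]
        pairs = sum((x, y) in WATSON_CRICK for x, y in zip(short, window))
        best = max(best, pairs)
    return 100.0 * best / len(short)
```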

It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a thymine alkyltransferase, as disclosed herein, to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein. In some embodiments, the guide RNA comprises a structure 5′-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu-3′ (SEQ ID NO: 152), wherein the guide sequence comprises a sequence that is complementary to the target sequence. See U.S. Publication No. 2015/0166981, published June 18, 2015, the disclosure of which is incorporated by reference herein in its entirety. The guide sequence is typically 20 nucleotides long.
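The guide RNA structure 5′-[guide sequence]-scaffold-3′ (SEQ ID NO: 152) described above can be assembled programmatically. A minimal sketch follows; it assumes the guide is supplied 5′ to 3′ in DNA or RNA letters (T is converted to U), and the constant and function names are hypothetical. The example guide is an arbitrary 20-nt sequence.

```python
# Scaffold (tracrRNA framework) that follows the guide in SEQ ID NO: 152.
SCAFFOLD = ("guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaagu"
            "ggcaccgagucggugcuuuuu")


def build_sgrna(guide: str) -> str:
    """Return 5'-[guide]-[scaffold]-3' as an upper-case RNA string."""
    guide_rna = guide.upper().replace("T", "U")
    if not guide_rna or set(guide_rna) - set("ACGU"):
        raise ValueError("guide must be a non-empty A/C/G/T(U) sequence")
    return guide_rna + SCAFFOLD.upper()


sgrna = build_sgrna("GACGCATAAAGATGAGACGC")  # arbitrary 20-nt guide, the typical length
print(len(sgrna))  # 101
```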

[00390] The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the base editors described herein. Additional exemplary guide sequences are disclosed in, for example, Jinek M., et al., Science 337:816-821 (2012); Mali P, Esvelt KM & Church GM (2013) Cas9 as a versatile tool for engineering biology, Nature Methods, 10, 957-963; Li JF et al., (2013) Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9, Nature Biotechnology, 31, 688-691; Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system, Nature Biotechnology 31, 227-229 (2013); Cong L et al., (2013) Multiplex genome engineering using CRISPR/Cas systems, Science, 339, 819-823; Cho SW et al., (2013) Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease, Nature Biotechnology, 31, 230-232; Jinek, M. et al., RNA-programmed genome editing in human cells, eLife 2, e00471 (2013); Dicarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. (2013); Briner AE et al., (2014) Guide RNA functional modules direct Cas9 activity and orthogonality, Mol Cell, 56, 333-339, the entire contents of each of which are herein incorporated by reference.

Base editor complexes

[00391] Further provided herein are complexes comprising (i) any of the fusion proteins provided herein, and (ii) a guide RNA bound to the Cas9 domain of the fusion protein. Without wishing to be bound by any particular theory, these fusion proteins can be directed, by designing a suitable guide RNA, to specifically and efficiently target single point mutations in a genome without introducing double-stranded DNA breaks or requiring homology-directed repair (HDR). However, the suitability of a target site for base editing (e.g., a point mutation in the genome) depends on the presence of a suitably positioned PAM. The broadened PAM compatibility of the Cas9 domains provided herein has the potential to expand the targeting scope of base editors to target sites that do not lie within approximately 15 nucleotides of a canonical 5′-NGG-3′ PAM sequence. A person of ordinary skill in the art will be able to design a suitable guide RNA (gRNA) sequence to target a desired point mutation based on this disclosure and knowledge in the field. In addition, these fusion proteins comprising a Cas9 domain generate fewer insertions and deletions (indels) and exhibit reduced off-target activity compared to fusion proteins (e.g., base editors) comprising a Cas9 domain that can only recognize the canonical 5′-NGG-3′ PAM sequence.
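To illustrate how broadened PAM compatibility expands the set of targetable sites, the sketch below lists PAMs on one strand whose first nucleotide lies 1 to 15 nt 3′ of a chosen target base, echoing the "approximately 15 nucleotides" in paragraph [00391]. Several simplifications are assumptions: only the strand containing the PAM is scanned, distance is measured to the PAM's first nucleotide, and the caller chooses the PAM patterns (e.g., NGG alone versus NGG plus NGT). The function names and the example sequence are hypothetical.

```python
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def matches(site: str, pattern: str) -> bool:
    """True if `site` has the pattern's length and matches it position by position."""
    return len(site) == len(pattern) and all(
        base in CODES[code] for base, code in zip(site, pattern))


def pams_near_target(seq: str, target_index: int, patterns, max_distance: int = 15):
    """(pam_start, pam) pairs for PAMs starting 1..max_distance nt 3' of the target base."""
    seq = seq.upper()
    hits = []
    for distance in range(1, max_distance + 1):
        start = target_index + distance
        for pattern in patterns:
            site = seq[start:start + len(pattern)]
            if matches(site, pattern):
                hits.append((start, site))
                break  # report each PAM position once, even if several patterns match
    return hits


seq = "A" + "C" * 10 + "AGT" + "C" * 10  # hypothetical sequence; target base at index 0
print(pams_near_target(seq, 0, ["NGG"]))         # []
print(pams_near_target(seq, 0, ["NGG", "NGT"]))  # [(11, 'AGT')]
```

In this toy case, the target base is reachable only once a non-canonical PAM (NGT) is allowed.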

[00392] In some embodiments, the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In some embodiments, the guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the target sequence is a DNA sequence. In some embodiments, the target sequence is in the genome of an organism. In some embodiments, the organism is a prokaryote. In some embodiments, the prokaryote is a bacterium. In some embodiments, the bacterium is E. coli. In some embodiments, the organism is a eukaryote. In some embodiments, the organism is a plant or fungus. In some embodiments, the organism is a vertebrate. In some embodiments, the vertebrate is a mammal. In some embodiments, the mammal is a human. In some embodiments, the organism is a cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a HEK293T or U2OS cell.
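The complementarity requirement above (a guide RNA of about 15-100 nucleotides containing at least 10 contiguous nucleotides complementary to a target sequence) can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, it assumes the target is supplied as the DNA strand that the guide base-pairs with (written 5ʹ to 3ʹ), and it ignores G·U wobble pairs.

```python
# Hypothetical helpers illustrating the "at least 10 contiguous complementary
# nucleotides" criterion; not taken from the disclosure.

# Watson-Crick partner in DNA for each RNA base of the guide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def longest_complementary_run(guide_rna: str, target_dna: str) -> int:
    """Length of the longest run of contiguous guide nucleotides that pairs
    (antiparallel) with a contiguous stretch of target_dna.

    Assumes target_dna is the strand the guide hybridizes to, written 5'->3'.
    """
    guide = guide_rna.upper().replace("T", "U")  # accept DNA-style input
    target = target_dna.upper()
    # Reverse complement of the guide, as DNA: any contiguous guide segment that
    # pairs with the target appears here as a contiguous substring of target.
    paired = "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(guide))
    # Longest common substring by dynamic programming over two rows.
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(paired) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if paired[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def meets_complementarity(guide_rna: str, target_dna: str, min_run: int = 10,
                          min_len: int = 15, max_len: int = 100) -> bool:
    """True if the guide length is within [min_len, max_len] and it contains at
    least min_run contiguous nucleotides complementary to the target."""
    return (min_len <= len(guide_rna) <= max_len
            and longest_complementary_run(guide_rna, target_dna) >= min_run)
```

For example, a 20-nt toy guide `GACGCAUAAAGAUGAGACGC` is fully complementary to `GCGTCTCATCTTTATGCGTC`, and still passes the 10-nt criterion against a target that shares only a 12-nt complementary stretch.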

[00393] In some embodiments, the target sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target sequence comprises a point mutation associated with a disease or disorder. In some embodiments, the target sequence comprises a T→C point mutation. In some embodiments, the complex deaminates the target C point mutation, wherein the deamination results in a sequence that is not associated with a disease or disorder. In some embodiments, the target C point mutation is present in the DNA strand that is not complementary to the guide RNA. In some embodiments, the target sequence comprises a G→A point mutation. In some embodiments, the complex deaminates the target A point mutation, wherein the deamination results in a sequence that is not associated with a disease or disorder. In some embodiments, the target A point mutation is present in the DNA strand that is not complementary to the guide RNA.

[00394] In some embodiments, the complex edits a point mutation in the target sequence. In some embodiments, the point mutation is located between about 10 and about 20 nucleotides upstream of the PAM in the target sequence. In some embodiments, the point mutation is located between about 13 and about 17 nucleotides upstream of the PAM in the target sequence. In some embodiments, the point mutation is about 13 nucleotides upstream of the PAM. In some embodiments, the point mutation is about 14 nucleotides upstream of the PAM. In some embodiments, the point mutation is about 15 nucleotides upstream of the PAM. In some embodiments, the point mutation is about 16 nucleotides upstream of the PAM. In some embodiments, the point mutation is about 17 nucleotides upstream of the PAM.
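The distances above can be related to a specific PAM with simple index arithmetic. The Python sketch below uses hypothetical helper names and is not part of the disclosure. It assumes the target base and the PAM lie on the same strand, uses 0-based indices, counts the nucleotide immediately 5ʹ of the PAM as 1 nucleotide upstream, and treats N in a PAM pattern as any base. Under the common convention that numbers a 20-nt protospacer 1-20 with the PAM at positions 21-23, a base d nucleotides upstream of the PAM sits at protospacer position 21 - d, so 13-17 nucleotides upstream corresponds to positions 4-8.

```python
from typing import List, Sequence, Tuple


def pam_matches(site: str, pattern: str) -> bool:
    """True if site matches pattern; N in the pattern matches any base."""
    return len(site) == len(pattern) and all(
        p == "N" or p == s for s, p in zip(site.upper(), pattern.upper())
    )


def protospacer_position(distance_upstream: int) -> int:
    """Convert 'nt upstream of the PAM' to a 1-20 protospacer position,
    assuming the PAM occupies positions 21-23."""
    return 21 - distance_upstream


def pams_in_window(seq: str, target_index: int, patterns: Sequence[str],
                   window: Tuple[int, int] = (13, 17)) -> List[Tuple[int, str, str]]:
    """Return (distance, PAM site, matching pattern) for every PAM on the same
    strand that places the base at target_index `distance` nt upstream of it,
    for distances within the window (inclusive)."""
    lo, hi = window
    hits = []
    for distance in range(lo, hi + 1):
        pam_start = target_index + distance  # distance 1 = immediately 5' of PAM
        for pattern in patterns:
            site = seq[pam_start:pam_start + len(pattern)]
            if pam_matches(site, pattern):
                hits.append((distance, site.upper(), pattern))
    return hits
```

With a toy sequence `"TTTTT" + "C" + "T" * 14 + "AGT" + "TTTTT"` and the target C at index 5, searching the NGT, NGA, NGC, and NNG patterns finds an AGT (NGT) PAM 15 nt downstream (protospacer position 6) and an overlapping TAG (NNG) PAM 14 nt downstream, whereas an NGG-only search finds nothing.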

[00395] In some embodiments, the complex exhibits increased deamination efficiency of a point mutation in a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to the deamination efficiency of a complex comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the complex exhibits increased deamination efficiency of a point mutation in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the deamination efficiency of a complex comprising the Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of NGT, NGA, NGC, and NNG, wherein N is an A, G, T, or C. In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, and CAA. In some embodiments, deamination activity is measured using high-throughput sequencing.
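To see how the PAM groups above relate to one another, the following Python sketch classifies a trinucleotide adjacent to the 3ʹ end of a target sequence. The helper names are hypothetical, and N is taken to mean any of A, C, G, or T. Applied to the specific trinucleotides listed above, it shows that some of them (CGG, TGG, GGG) also match the canonical NGG pattern, and that GAA, GAT, and CAA fall outside all four of the NGT, NGA, NGC, and NNG classes.

```python
import re
from typing import Dict, List

CANONICAL = "NGG"
NON_CANONICAL_CLASSES = ("NGT", "NGA", "NGC", "NNG")

# Specific trinucleotides named in the paragraph above (duplicates removed).
LISTED = ["CGG", "AGT", "TGG", "CGT", "GGG", "TGT", "GGT", "AGC", "CGC",
          "TGC", "GGC", "AGA", "CGA", "TGA", "GGA", "GAA", "GAT", "CAA"]


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Expand the wildcard N to a character class of the four DNA bases.
    return re.compile("".join("[ACGT]" if b == "N" else b for b in pattern.upper()))


def classify_pam(trinucleotide: str) -> Dict[str, object]:
    """Report whether a 3-nt site matches NGG and which of the
    non-canonical classes (NGT, NGA, NGC, NNG) it belongs to."""
    site = trinucleotide.upper()
    if len(site) != 3 or not set(site) <= set("ACGT"):
        raise ValueError(f"expected three DNA bases, got {trinucleotide!r}")
    classes: List[str] = [p for p in NON_CANONICAL_CLASSES
                          if _pattern_to_regex(p).fullmatch(site)]
    return {
        "canonical": bool(_pattern_to_regex(CANONICAL).fullmatch(site)),
        "classes": classes,
    }
```

For instance, `classify_pam("AGT")` reports a non-canonical NGT site, while `classify_pam("CGG")` is both canonical and a member of the NNG class.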

[00396] In some embodiments, the complex produces fewer indels in a target sequence that does not comprise the canonical PAM (5ʹ-NGG-3ʹ) at its 3ʹ end as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the complex produces fewer indels in a target sequence having a 3ʹ end that is not directly adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ) that is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold lower as compared to the amount of indels produced by a complex comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2 on the same target sequence. In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of NGT, NGA, NGC, and NNG, wherein N is an A, G, T, or C. In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, and CAA. In some embodiments, indels are measured using high-throughput sequencing.

[00397] In some embodiments, the complex exhibits decreased off-target activity as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the off-target activity of the complex is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold decreased as compared to the off-target activity of a complex comprising Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 2. In some embodiments, the off-target activity is determined using a genome-wide off-target analysis. In some embodiments, the off-target activity is determined using GUIDE-seq.
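The fold comparisons in the three preceding paragraphs reduce to ratios of per-site frequencies measured by high-throughput sequencing. A minimal Python sketch of that arithmetic follows. The function names and read counts are hypothetical and only show the calculation: a fold increase is the variant frequency divided by the SpCas9 frequency, and a fold decrease is the SpCas9 frequency divided by the variant frequency, both measured at the same target sequence.

```python
import math


def frequency(event_reads: int, total_reads: int) -> float:
    """Fraction of sequencing reads at a site that carry the event
    (e.g., the intended base edit, or an indel)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= event_reads <= total_reads:
        raise ValueError("event_reads must be between 0 and total_reads")
    return event_reads / total_reads


def fold_increase(variant: float, reference: float) -> float:
    """How many times higher the variant frequency is than the reference.
    Returns inf if only the reference is zero, and 1.0 if both are zero."""
    if reference == 0:
        return math.inf if variant > 0 else 1.0
    return variant / reference


def fold_decrease(variant: float, reference: float) -> float:
    """How many times lower the variant frequency is than the reference."""
    return fold_increase(reference, variant)


# Hypothetical read counts at one non-NGG target site.
editing_fold = fold_increase(frequency(450, 1000), frequency(9, 1000))  # ~50
indel_fold = fold_decrease(frequency(2, 1000), frequency(10, 1000))     # ~5
```

Working from frequencies rather than raw counts keeps the comparison valid when the two samples were sequenced to different depths.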

Methods of using base editors

[00398] Some aspects of this disclosure provide methods of using the Cas9 domains, fusion proteins, or complexes provided herein.

[00399] In one aspect, provided herein are methods comprising contacting a nucleic acid molecule (a) with any of the Cas9 domains or fusion proteins provided herein, and with at least one guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence in the nucleic acid molecule; or (b) with a Cas9 domain, a fusion protein comprising a Cas9 domain, or a complex comprising a Cas9 domain, wherein the Cas9 domain is associated with at least one gRNA as provided herein. In some embodiments, the nucleic acid is present in a cell. In some embodiments, the nucleic acid is present in a subject. In some embodiments, the contacting is in vitro. In some embodiments, the contacting is in vivo in a subject.

[00400] In another aspect, provided herein are methods comprising contacting a cell (a) with any of the Cas9 domains or fusion proteins provided herein, and with at least one guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence in the nucleic acid molecule; or (b) with a Cas9 domain, a fusion protein comprising a Cas9 domain, or a complex comprising a Cas9 domain, wherein the Cas9 domain is associated with at least one gRNA as provided herein. In some embodiments, the contacting is in vitro. In some embodiments, the contacting is in vivo in a subject. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell is a bacterium. In some embodiments, the bacterium is E. coli. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the eukaryotic cell is a mammalian cell. In some embodiments, the mammalian cell is a human cell. In some embodiments, the cell is a plant or fungal cell.

[00401] In another aspect, provided herein are methods for administering to a subject (a) any of the Cas9 domains or fusion proteins provided herein, and at least one guide RNA, wherein the guide RNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence in the nucleic acid molecule; or (b) a Cas9 domain, a fusion protein comprising a Cas9 domain, or a complex comprising a Cas9 domain, wherein the Cas9 domain is associated with at least one gRNA as provided herein. In some embodiments, an effective amount of the Cas9 domain, fusion protein, or complex is administered to the subject. In some embodiments, the effective amount is an amount effective for treating a disease or disorder, wherein the disease or disorder is associated with one or more point mutations in a nucleic acid sequence.

[00402] In some embodiments, the 3ʹ end of the target sequence is not immediately adjacent to the canonical PAM sequence (5ʹ-NGG-3ʹ). In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of NGT, NGA, NGC, and NNG, wherein N is an A, G, T, or C. In some embodiments, the 3ʹ end of the target sequence is directly adjacent to a sequence selected from the group consisting of CGG, AGT, TGG, CGT, GGG, TGT, GGT, AGC, CGC, TGC, GGC, AGA, CGA, TGA, GGA, GAA, GAT, and CAA.

[00403] In some embodiments, the target sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target sequence comprises a point mutation associated with a disease or disorder. In some embodiments, the activity of the Cas9 domain, the Cas9 fusion protein, or the complex results in a correction of the point mutation. In some embodiments, the target sequence comprises a T→C point mutation associated with a disease or disorder, wherein the deamination of the mutant C base results in a sequence that is not associated with a disease or disorder. In some embodiments, the target sequence comprises an A→G point mutation, wherein deamination of the C that is base-paired to the mutant G base results in a sequence that is not associated with a disease or disorder. In some embodiments, the target sequence encodes a protein, and the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. In some embodiments, the deamination of the mutant C results in a change of the amino acid encoded by the mutant codon. In some embodiments, the deamination of the mutant C results in the codon encoding the wild-type amino acid. In some embodiments, the target DNA sequence comprises a G→A point mutation associated with a disease or disorder, wherein the deamination of the mutant A base results in a sequence that is not associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a C→T point mutation associated with a disease or disorder, wherein deamination of the A that is base-paired with the mutant T results in a sequence that is not associated with a disease or disorder. In some embodiments, the target DNA sequence encodes a protein, and the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. In some embodiments, the deamination of the mutant A results in a change of the amino acid encoded by the mutant codon. In some embodiments, the deamination of the mutant A results in the codon encoding the wild-type amino acid. In some embodiments, the contacting is in vivo in a subject. In some embodiments, the subject has or has been diagnosed with a disease or disorder. In some embodiments, the disease or disorder is cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy (DCM), hereditary lymphedema, familial Alzheimer’s disease, HIV, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), desmin-related myopathy (DRM), or a neoplastic disease associated with a mutant PI3KCA protein, a mutant CTNNB1 protein, a mutant HRAS protein, or a mutant p53 protein. In some embodiments, the target sequence comprises a sequence located in a genomic locus. In some embodiments, the genomic locus is a HEK site. In some embodiments, the HEK site is HEK site 3 or HEK site 4. In some embodiments, the HEK site comprises a CGG, GGG, TGT, GGT, AGC, CGC, TGC, AGA, or TGA PAM sequence. In some embodiments, the genomic locus is EMX1. In some embodiments, the EMX1 locus comprises a GGG or CAA PAM sequence. In some embodiments, the genomic locus is VEGFA. In some embodiments, the VEGFA locus comprises an AGT, GGC, GGA, or GAT PAM sequence. In some embodiments, the genomic locus is FANCF. In some embodiments, the FANCF locus comprises a CGT, GAA, GAT, TGG, AGT, TGT, GGT, CGC, TGC, GGC, AGA, or TGA PAM sequence.

[00404] Some embodiments provide methods for using the Cas9 DNA editing fusion proteins provided herein. In some embodiments, the fusion protein is used to introduce a point mutation into a nucleic acid by deaminating a target nucleobase, e.g., a C or A residue. In some embodiments, the deamination of the target nucleobase results in the correction of a genetic defect, e.g., in the correction of a point mutation that leads to a loss of function in a gene product. In some embodiments, the genetic defect is associated with a disease or disorder, e.g., a lysosomal storage disorder or a metabolic disease, such as, for example, type I diabetes. In some embodiments, the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease or disorder. For example, in some embodiments, methods are provided herein that employ a fusion protein comprising a Cas9 domain (e.g., a base editor) to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). A deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.

[00405] In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The Cas9-deaminase fusion proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture. It will be understood by the skilled artisan that the fusion proteins provided herein, e.g., the fusion proteins comprising a Cas9 domain and a cytidine deaminase domain, can be used to correct any single T→C or A→G point mutation. In the first case, deamination of the mutant C to U corrects the mutation, and in the latter case, deamination of the C that is base-paired with the mutant G, followed by a round of replication, corrects the mutation. The fusion proteins comprising a Cas9 domain and one or more adenosine deaminase domains can be used to correct any single G→A or C→T point mutation. In the first case, deamination of the mutant A to I corrects the mutation, and in the latter case, deamination of the A that is base-paired with the mutant T, followed by a round of replication, corrects the mutation.
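The four correction routes described above can be checked with a small base-level model. The Python sketch below is illustrative and its names are hypothetical. It treats deamination of C as yielding U (read as T after replication) and deamination of A as yielding I (read as G), models the opposite-strand cases by deaminating the complementary base and re-pairing after replication, and ignores the editing window, bystander bases, and PAM availability.

```python
from typing import Optional, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Base read after replication once the deaminated base is copied:
# C -> U (read as T); A -> I (read as G).
DEAMINATION_PRODUCT = {"C": "T", "A": "G"}

# (wild-type base, mutant base) on the strand as written ->
# (deaminase type, base that is deaminated, strand carrying that base)
ROUTES = {
    ("T", "C"): ("cytidine deaminase", "C", "same"),
    ("A", "G"): ("cytidine deaminase", "C", "opposite"),
    ("G", "A"): ("adenosine deaminase", "A", "same"),
    ("C", "T"): ("adenosine deaminase", "A", "opposite"),
}


def correction_route(wild_type: str, mutant: str) -> Optional[Tuple[str, str, str]]:
    """Deaminase route for a pathogenic transition, or None if not covered."""
    return ROUTES.get((wild_type.upper(), mutant.upper()))


def corrected_base(wild_type: str, mutant: str) -> Optional[str]:
    """Base at the mutant position, on the strand as written, after editing
    and one round of replication."""
    route = correction_route(wild_type, mutant)
    if route is None:
        return None  # e.g., transversions are outside this model
    _, _, strand = route
    mutant = mutant.upper()
    if strand == "same":
        # The mutant base itself is deaminated.
        return DEAMINATION_PRODUCT[mutant]
    # Deaminate the partner base (e.g., the C paired with a mutant G), then
    # read the base that pairs with the edited partner after replication.
    partner = COMPLEMENT[mutant]
    return COMPLEMENT[DEAMINATION_PRODUCT[partner]]
```

In this model every covered route restores the wild-type base; for example, `corrected_base("A", "G")` deaminates the C opposite the mutant G and returns `"A"`.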

[00406] An exemplary disease-relevant mutation that can be corrected by the provided fusion proteins in vitro or in vivo is the H1047R (A3140G) polymorphism in the PI3KCA protein. The phosphoinositide-3-kinase, catalytic alpha subunit (PI3KCA) protein acts to phosphorylate the 3-OH group of the inositol ring of phosphatidylinositol. The PI3KCA gene has been found to be mutated in many different carcinomas, and thus it is considered to be a potent oncogene.[50] In fact, the A3140G mutation is present in several NCI-60 cancer cell lines, such as, for example, the HCT116, SKOV3, and T47D cell lines, which are readily available from the American Type Culture Collection (ATCC).[51]

[00407] In some embodiments, a cell carrying a mutation to be corrected, e.g., a cell carrying a point mutation, e.g., an A3140G point mutation in exon 20 of the PI3KCA gene, resulting in an H1047R substitution in the PI3KCA protein, is contacted with an expression construct encoding a Cas9 deaminase fusion protein and an appropriately designed sgRNA targeting the fusion protein to the respective mutation site in the encoding PI3KCA gene. Control experiments can be performed where the sgRNAs are designed to target the fusion enzymes to non-C residues that are within the PI3KCA gene. Genomic DNA of the treated cells can be extracted, and the relevant sequence of the PI3KCA gene PCR-amplified and sequenced to assess the activities of the fusion proteins in human cell culture.
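The sequencing readout described above amounts to counting which base each read carries at the edited position. The Python sketch below is a simplified, hypothetical analysis: it assumes reads have already been aligned to the amplicon without insertions or deletions and trimmed to start at the same coordinate, and the short reads in the example are toy sequences rather than PI3KCA sequence. For the A3140G mutation, correction would appear as A replacing the mutant G at that position in reads from the coding strand.

```python
from collections import Counter
from typing import Dict, Iterable


def base_fractions(aligned_reads: Iterable[str], position: int) -> Dict[str, float]:
    """Fraction of reads carrying each base at a 0-based amplicon position.

    Assumes gap-free reads that all start at amplicon position 0; reads too
    short to cover the position are skipped.
    """
    counts = Counter(read[position].upper()
                     for read in aligned_reads if len(read) > position)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


def correction_fraction(aligned_reads: Iterable[str], position: int,
                        wild_type_base: str) -> float:
    """Fraction of covering reads that carry the wild-type (corrected) base."""
    return base_fractions(aligned_reads, position).get(wild_type_base.upper(), 0.0)


# Toy reads: half carry the mutant G, half the corrected A at position 2.
toy_reads = ["ACGTA", "ACATA", "ACGTA", "ACATA", "AC"]
```

A production pipeline would instead align reads to the reference amplicon and would also tally indels at the site; this sketch only covers the substitution count.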

[00408] It will be understood that the example of correcting point mutations in PI3KCA is provided for illustration purposes and is not meant to limit the instant disclosure. The skilled artisan will understand that the instantly disclosed DNA-editing fusion proteins can be used to correct other point mutations and mutations associated with other cancers and with diseases other than cancer including other proliferative diseases.

[00409] The successful correction of point mutations in disease-associated genes and alleles opens up new strategies for gene correction with applications in therapeutics and basic research. Site-specific single-base modification systems like the disclosed fusions of Cas9 domains and deaminase domains also have applications in “reverse” gene therapy, where certain gene functions are purposely suppressed or abolished. In these cases, site-specifically mutating Trp (TGG), Gln (CAA and CAG), or Arg (CGA) codons to premature stop codons (TAA, TAG, TGA) can be used to abolish protein function in vitro, ex vivo, or in vivo.

[00410] The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a fusion protein comprising a Cas9 domain and nucleic acid editing domain (e.g., a deaminase domain) provided herein. For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a PI3KCA point mutation as described above, an effective amount of a Cas9 deaminase fusion protein that corrects the point mutation or introduces a deactivating mutation into the disease-associated gene. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
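Which codons can be converted to a premature stop codon by a single deaminase-mediated change, as described above for “reverse” gene therapy, can be enumerated directly. In this hypothetical Python sketch, a C→T change is read on the coding strand, and a G→A change on the coding strand stands for deamination of the complementary C on the other strand; the function name is illustrative only.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Single-base changes a cytidine deaminase can produce, written on the coding
# strand: C->T directly, or G->A via deamination of the C on the other strand.
CBE_CHANGES = (("C", "T"), ("G", "A"))


def stop_conversions(codon: str) -> List[Tuple[int, str, str]]:
    """List (1-based codon position, change, resulting stop codon) for every
    single C->T or G->A change that turns the codon into a stop codon."""
    codon = codon.upper()
    results = []
    for i, base in enumerate(codon):
        for src, dst in CBE_CHANGES:
            if base == src:
                mutated = codon[:i] + dst + codon[i + 1:]
                if mutated in STOP_CODONS:
                    results.append((i + 1, f"{src}>{dst}", mutated))
    return results
```

Run on the four codons named above, this gives CAA→TAA, CAG→TAG, and CGA→TGA by a C>T change at position 1, while TGG (Trp) can become TAG or TGA through a G>A change at position 2 or 3.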

[00411] The instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated with or caused by a point mutation that can be corrected by deaminase-mediated gene editing. Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure. Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering. One of skill in the art will be able to identify the respective residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues. Exemplary suitable diseases and disorders include, without limitation, cystic fibrosis (see, e.g., Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in intestinal stem cell organoids of cystic fibrosis patients. Cell Stem Cell. 2013; 13: 653-658; and Wu et al., Correction of a genetic disease in mouse via use of CRISPR-Cas9. Cell Stem Cell. 2013; 13: 659-662, neither of which uses a deaminase fusion protein to correct the genetic defect); phenylketonuria – e.g., phenylalanine to serine mutation at position 835 (mouse) or 240 (human) or a homologous residue in the phenylalanine hydroxylase gene (T>C mutation) – see, e.g., McDonald et al., Genomics. 1997; 39: 402-405; Bernard-Soulier syndrome (BSS) – e.g., phenylalanine to serine mutation at position 55 or a homologous residue, or cysteine to arginine at residue 24 or a homologous residue in the platelet membrane glycoprotein IX (T>C mutation) – see, e.g., Noris et al., British Journal of Haematology. 1997; 97: 312-320, and Ali et al., Hematol. 2014; 93: 381-384; epidermolytic hyperkeratosis (EHK) – e.g., leucine to proline mutation at position 160 or 161 (if counting the initiator methionine) or a homologous residue in keratin 1 (T>C mutation) – see, e.g., Chipev et al., Cell. 1992; 70: 821-828, see also accession number P04264 in the UNIPROT database at www[dot]uniprot[dot]org; chronic obstructive pulmonary disease (COPD) – e.g., leucine to proline mutation at position 54 or 55 (if counting the initiator methionine) or a homologous residue in the processed form of α1-antitrypsin or residue 78 in the unprocessed form or a homologous residue (T>C mutation) – see, e.g., Poller et al., Genomics. 1993; 17: 740-743, see also accession number P01011 in the UNIPROT database; Charcot-Marie-Tooth disease type 4J – e.g., isoleucine to threonine mutation at position 41 or a homologous residue in FIG4 (T>C mutation) – see, e.g., Lenk et al., PLoS Genetics. 2011; 7: e1002104; neuroblastoma (NB) – e.g., leucine to proline mutation at position 197 or a homologous residue in Caspase-9 (T>C mutation) – see, e.g., Kundu et al., 3 Biotech. 2013, 3: 225-234; von Willebrand disease (vWD) – e.g., cysteine to arginine mutation at position 509 or a homologous residue in the processed form of von Willebrand factor, or at position 1272 or a homologous residue in the unprocessed form of von Willebrand factor (T>C mutation) – see, e.g., Lavergne et al., Br. J. Haematol. 1992; 82: 66-72, see also accession number P04275 in the UNIPROT database; myotonia congenita – e.g., cysteine to arginine mutation at position 277 or a homologous residue in the muscle chloride channel gene CLCN1 (T>C mutation) – see, e.g., Weinberger et al., The J. of Physiology. 2012; 590: 3449-3464; hereditary renal amyloidosis – e.g., stop codon to arginine mutation at position 78 or a homologous residue in the processed form of apolipoprotein AII or at position 101 or a homologous residue in the unprocessed form (T>C mutation) – see, e.g., Yazaki et al., Kidney Int. 2003; 64: 11-16; dilated cardiomyopathy (DCM) – e.g., tryptophan to arginine mutation at position 148 or a homologous residue in the FOXD4 gene (T>C mutation) – see, e.g., Minoretti et al., Int. J. of Mol. Med. 2007; 19: 369-372; hereditary lymphedema – e.g., histidine to arginine mutation at position 1035 or a homologous residue in VEGFR3 tyrosine kinase (A>G mutation) – see, e.g., Irrthum et al., Am. J. Hum. Genet. 2000; 67: 295-301; familial Alzheimer’s disease – e.g., isoleucine to valine mutation at position 143 or a homologous residue in presenilin 1 (A>G mutation) – see, e.g., Gallo et al., J. Alzheimer’s Disease. 2011; 25: 425-431; prion disease – e.g., methionine to valine mutation at position 129 or a homologous residue in prion protein (A>G mutation) – see, e.g., Lewis et al., J. of General Virology. 2006; 87: 2443-2449; chronic infantile neurologic cutaneous articular syndrome (CINCA) – e.g., tyrosine to cysteine mutation at position 570 or a homologous residue in cryopyrin (A>G mutation) – see, e.g., Fujisawa et al., Blood. 2007; 109: 2903-2911; and desmin-related myopathy (DRM) – e.g., arginine to glycine mutation at position 120 or a homologous residue in αB-crystallin (A>G mutation) – see, e.g., Kumar et al., J. Biol. Chem. 1999; 274: 24137-24141. The entire contents of all references and database entries are incorporated herein by reference.
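Several entries above give the same residue under two numbering schemes (processed versus unprocessed protein, or with versus without the initiator methionine). Converting between such schemes is a fixed offset equal to the number of N-terminal residues removed. The Python sketch below is illustrative: the helper names are hypothetical, and the offsets are computed only from the position pairs quoted above, not looked up in UniProt.

```python
from typing import Dict, Tuple


def numbering_offset(unprocessed_position: int, processed_position: int) -> int:
    """Number of N-terminal residues implied by a pair of equivalent positions."""
    offset = unprocessed_position - processed_position
    if offset < 0:
        raise ValueError("processed position cannot exceed unprocessed position")
    return offset


def to_processed(unprocessed_position: int, offset: int) -> int:
    """Map a precursor (unprocessed) position onto the mature protein."""
    processed = unprocessed_position - offset
    if processed < 1:
        raise ValueError("residue lies within the removed N-terminal segment")
    return processed


def to_unprocessed(processed_position: int, offset: int) -> int:
    """Map a mature-protein position back onto the precursor."""
    return processed_position + offset


# (unprocessed, processed) position pairs quoted in the list above.
QUOTED_PAIRS: Dict[str, Tuple[int, int]] = {
    "von Willebrand factor": (1272, 509),
    "apolipoprotein AII": (101, 78),
    "keratin 1 (initiator methionine counted)": (161, 160),
}
```

From these pairs the implied offsets are 763 residues for von Willebrand factor, 23 for apolipoprotein AII, and 1 for the initiator methionine of keratin 1. Residues inside a removed segment have no mature-protein position, which is why `to_processed` raises an error for them.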

Pharmaceutical compositions

[00412] Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the various components described herein (e.g., including, but not limited to, the napDNAbps, fusion proteins, guide RNAs, and complexes comprising fusion proteins and guide RNAs).

[00413] The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., agents for specific delivery, agents for increasing half-life, or other therapeutic compounds).

[00414] As used herein, the term “pharmaceutically acceptable carrier” means a pharmaceutically acceptable material, composition, or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body to another site (e.g., an organ, tissue, or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials that can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose, and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate, and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol, and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates, and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL, and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. Terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier”, or the like are used interchangeably herein.

[00415] In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.

[00416] In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., a tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a Silastic membrane, or a fiber.

[00417] In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Rev. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.

[00418] In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.

[00419] A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s, or Hanks’ solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.

[00420] The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.

[00421] The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.

[00422] Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

[00423] In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

Delivery methods

[00424] In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein encoding one or more components described herein, one or more transcripts thereof, and/or one or more proteins translated therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a base editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a base editor to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds.) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

[00425] Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424 and WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

[00426] The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo), or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems include retroviral, lentiviral, adenoviral, adeno-associated, and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long-term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

[00427] The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential target population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system therefore depends on the target tissue. Retroviral vectors comprise cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

[00428] Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically possess only the ITR sequences from the AAV genome, which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line that contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment, to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.

Kits, vectors, cells

[00429] Some aspects of this disclosure provide kits comprising a nucleic acid construct, comprising (a) a nucleotide sequence encoding a Cas9 domain or a fusion protein comprising a Cas9 domain as provided herein; and (b) a heterologous promoter that drives expression of the sequence of (a). In some embodiments, the kit further comprises an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.

[00430] Some aspects of this disclosure provide polynucleotides encoding a Cas9 domain or a fusion protein comprising a Cas9 domain as provided herein. Some aspects of this disclosure provide vectors comprising such polynucleotides. In some embodiments, the vector comprises a heterologous promoter driving expression of the polynucleotide.

[00431] In one aspect, provided herein are methods comprising contacting a cell with a kit provided herein. In another aspect, provided herein are methods comprising contacting a cell with a vector provided herein. In some embodiments, the vector is transfected into the cell. In some embodiments, the vector is transfected into the cell using a suitable transfection reaction. Transfection reactions may be carried out, for example, using electroporation, heat shock, or a composition comprising a cationic lipid. Cationic lipids suitable for the transfection of nucleic acid molecules are provided in, for example, Patent Publication WO2015/035136, published March 12, 2015, entitled “Delivery System for Functional Nucleases”, the entire contents of which are incorporated by reference herein.

[00432] Some aspects of this disclosure provide cells comprising a Cas9 domain, a fusion protein, a nucleic acid molecule, and/or a vector as provided herein.

[00433] The description of exemplary embodiments of the reporter systems (e.g., GFP) herein is provided for illustration purposes only and is not meant to be limiting. Additional reporter systems, e.g., variations of the exemplary systems described in detail above, are also embraced by this disclosure.

REFERENCES

[00434] 1. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

[00435] 2. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016).

[00436] 3. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).

[00437] 4. Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790 (2017).

[00438] 5. Kim, J.-S. Precision genome engineering through adenine and cytosine base editing. Nat. Plants 4, 148–151 (2018).

EXAMPLES

EXAMPLE 1

Identification of PAM sequences on which SpCas9 and xCas9 have low activity

[00439] A key limitation to the use of CRISPR-Cas9 domains for genome editing and other applications is the requirement that a protospacer adjacent motif (PAM) be present at the target site. For the most commonly used Cas9, from Streptococcus pyogenes (SpCas9), this PAM requirement is NGG. Prior to xCas9, no natural or engineered Cas9 variants shown to function efficiently in mammalian cells offered a PAM less restrictive than NGG. Phage-assisted continuous evolution (PACE) was used to evolve wild-type SpCas9 into an expanded-PAM SpCas9 variant (xCas9) that can recognize a broad range of PAM sequences. The PAM compatibility of xCas9 is the broadest reported to date among Cas9s active in mammalian cells, and supports applications in human cells including targeted transcriptional activation, nuclease-mediated gene disruption, and both cytidine and adenine base editing.

[00440] Here, PACE is used to identify PAMs on which SpCas9 and xCas9 have low activity. During PACE, host E. coli cells continuously dilute an evolving population of bacteriophages (selection phage, SP). Because dilution occurs faster than cell division but slower than phage replication, only the SP, and not the host cells, can accumulate mutations. Each SP carries the gene to be evolved in place of a phage gene (gene III) that is required for the production of infectious progeny phage. SP containing desired gene variants trigger host-cell gene III expression from the accessory plasmid (AP) and the production of infectious SP that propagate the desired variants. Phage encoding inactive variants do not generate infectious progeny and are rapidly diluted out of the culture vessel (FIG. 1A). Because phage replication can occur in as little as 10 minutes, PACE enables hundreds of generations of directed evolution to occur per week without researcher intervention.
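The washout logic in the preceding paragraph can be illustrated with a deliberately simplified first-order model, assuming each population grows exponentially with a fixed doubling time and is removed at a constant vessel dilution rate. This is a sketch, not the PACE design procedure: the dilution rate and host doubling time are hypothetical placeholders, and treating the 10-minute phage replication time from the text as a doubling time understates actual phage amplification.

```python
import math


def net_growth_rate(doubling_time_h: float, dilution_rate_per_h: float) -> float:
    """Net exponential rate (per hour) of a population in a continuously diluted vessel.

    Growth is modeled as ln(2) / doubling time; dilution removes a constant
    fraction of the vessel volume per hour. A population that cannot replicate
    (doubling time = math.inf) therefore has a net rate of -dilution_rate_per_h.
    """
    return math.log(2) / doubling_time_h - dilution_rate_per_h


def persists(doubling_time_h: float, dilution_rate_per_h: float) -> bool:
    """True if the population outgrows dilution, i.e., is retained in the vessel."""
    return net_growth_rate(doubling_time_h, dilution_rate_per_h) > 0


if __name__ == "__main__":
    DILUTION_RATE = 2.0  # hypothetical: 2 vessel volumes per hour
    populations = {
        "active selection phage": 10 / 60,     # ~10 min replication, per the text
        "host cells": 0.75,                    # hypothetical 45 min doubling time
        "inactive selection phage": math.inf,  # no infectious progeny
    }
    for name, doubling_h in populations.items():
        rate = net_growth_rate(doubling_h, DILUTION_RATE)
        print(f"{name}: net rate {rate:+.2f}/h, "
              f"persists={persists(doubling_h, DILUTION_RATE)}")
```

Under these assumed numbers only the replicating phage outpaces dilution, which mirrors the qualitative claim that host-cell lineages and inactive phage are washed out.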

[00441] To link Cas9 DNA recognition to phage propagation during PACE, a bacterial one-hybrid selection was developed in which the SP encodes a catalytically dead SpCas9 (dCas9) fused to the ω subunit of bacterial RNA polymerase (FIG. 1A). When this fusion binds an AP-encoded sgRNA and a PAM and protospacer upstream of gene III in the AP, RNA polymerase recruitment causes gene III expression and phage propagation (FIG. 1B). A library of all 64 possible NNN PAM sequences was generated at the target protospacer in the AP, so that SP encoding Cas9 variants with broader PAM compatibility would replicate in a larger fraction of host cells and thus experience a fitness advantage. After overnight propagation, xCas9 was, as expected, less stringent in its PAM requirement than SpCas9, but both SpCas9 and xCas9 exhibited low activity on NAA, NAC, and NAT PAMs (FIG. 1C). The following experiments were designed to identify Cas9 variants that are able to bind to NAA, NAC, and NAT PAMs.

EXAMPLE 2

Phage Assisted Non-continuous Evolution (PANCE) of Cas9 variants for expanded PAM compatibility

[00442] The phage-assisted non-continuous evolution (PANCE) system was used to further evolve SpCas9 and xCas9 and to identify Cas9 variants that can recognize non-NGG PAMs. In the PANCE system, the SP is iteratively passaged through serial dilution in host cells in order to evolve SpCas9 and/or xCas9 proteins that bind to all possible PAM sequences. Like PACE, the PANCE system preferentially replicates Cas9 variants that bind a greater variety of PAM sequences, but with lower stringency, since there is no outflow of phage. Although lower in stringency, the PANCE system allows for higher throughput, enabling evolution towards multiple targets (e.g., NAA, NAC, and NAT PAMs) simultaneously.

[00443] In this experiment, SPs were iteratively passaged through serial dilution in host cells to evolve either SpCas9 or xCas9 proteins capable of binding to all 16 NAN PAM target sequences. In PANCE, E. coli host cells transformed with an AP and a mutagenesis plasmid (MP) or dilution plasmid (DP) are plated in individual wells of a multi-well plate and grown to log phase. Selection phages are then introduced, and mutagenesis is induced with arabinose or aTc. The SPs are then grown for at least 6 additional hours before being collected and used to infect the next multi-well plate of E. coli host cells that have grown to log phase (FIG. 2A). Each of these infection-incubation-collection cycles is referred to as a “passage”.
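As a sketch of the PAM libraries used in Examples 1 and 2, the 64-member NNN library and its 16-member NAN subset can be enumerated directly. The helper below also computes the fraction of library PAMs a variant acts on, assuming (hypothetically) that every PAM is equally represented among host cells and that binding is all-or-none; under those assumptions the fraction is a rough proxy for the share of host cells in which a selection phage can propagate. The binding rules are illustrative stand-ins, not measured specificities.

```python
from itertools import product
from typing import Callable, List

BASES = "ACGT"


def pam_library(pattern: str) -> List[str]:
    """Expand a pattern over A/C/G/T, where 'N' means any base.

    'NNN' yields all 64 three-base PAMs; 'NAN' yields the 16 PAMs with A at position 2.
    """
    choices = [BASES if ch == "N" else ch for ch in pattern]
    return ["".join(bases) for bases in product(*choices)]


def fraction_supporting(library: List[str], binds: Callable[[str], bool]) -> float:
    """Fraction of (equally represented) library PAMs that a variant binds."""
    return sum(1 for pam in library if binds(pam)) / len(library)


def binds_ngg(pam: str) -> bool:
    """Hypothetical canonical NGG rule."""
    return pam[1:] == "GG"


def binds_ng(pam: str) -> bool:
    """Hypothetical broader NG rule."""
    return pam[1] == "G"


NNN = pam_library("NNN")
NAN = pam_library("NAN")

if __name__ == "__main__":
    print(len(NNN), len(NAN))
    print(fraction_supporting(NNN, binds_ngg), fraction_supporting(NNN, binds_ng))
    # NAA, NAC, and NAT PAMs, on which SpCas9 and xCas9 showed low activity.
    print([pam for pam in NAN if pam[2] != "G"])
```

With these toy rules, an NGG-only binder supports 4 of 64 library members (0.0625) and an NG binder 16 of 64 (0.25), which is the sense in which broader PAM compatibility translates into a larger fraction of permissive host cells.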

[00444] Increased recognition of non-NGG PAMs was observed for both SpCas9 and xCas9 as they were evolved through more passages in PANCE. FIG. 2B shows the ability of evolving SpCas9 and xCas9 to recognize all 64 PAMs at passages 2, 12, and 16. After 19 rounds of selection in PANCE and sequencing of the surviving phage pools (FIG. 36), mutations were observed that differed largely according to the third base of the NAN PAM targeted for evolution. For example, variants selected on NAA enriched for Gly, Ile, or Lys at position 1333, while those selected on NAT enriched for Gln or Leu at position 1335. Finally, variants evolved to bind NAC enriched simultaneously for Gln at position 1335 and Asn at position 1337.
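To illustrate why variants with broader PAM recognition become more prevalent over successive passages, the sketch below applies a deterministic, discrete-generation selection model: each passage multiplies every variant's frequency by a relative fitness and renormalizes. The variant labels, fitness values, and starting frequencies are hypothetical, and the model deliberately ignores mutagenesis, drift from dilution bottlenecks, and the varying passage conditions of real PANCE.

```python
from typing import Dict


def passage(freqs: Dict[str, float], fitness: Dict[str, float]) -> Dict[str, float]:
    """One selection round: weight each variant by its relative fitness and renormalize."""
    weighted = {variant: f * fitness[variant] for variant, f in freqs.items()}
    total = sum(weighted.values())
    return {variant: w / total for variant, w in weighted.items()}


def run_passages(freqs: Dict[str, float], fitness: Dict[str, float],
                 n_passages: int) -> Dict[str, float]:
    """Apply `passage` n_passages times and return the final frequencies."""
    for _ in range(n_passages):
        freqs = passage(freqs, fitness)
    return freqs


if __name__ == "__main__":
    start = {"narrow_PAM": 0.99, "broad_PAM": 0.01}  # hypothetical starting pool
    fitness = {"narrow_PAM": 1.0, "broad_PAM": 2.0}  # hypothetical relative fitness
    for n in (2, 12, 19):
        final = run_passages(start, fitness, n)
        print(f"passage {n}: broad_PAM frequency {final['broad_PAM']:.3f}")
```

Even a rare variant with a twofold advantage per passage comes to dominate within roughly a dozen passages in this toy model; real pools also gain new mutations each passage, which this sketch does not represent.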

[00445] Clones of mutated SpCas9 and xCas9 variants that were able to recognize non-NGG PAMs were isolated and sequenced to identify mutations in Cas9. FIG. 3A shows mutations in SpCas9 at passage 12 that can recognize CAA, GAT, ATG, or AGC PAMs. FIG. 4A shows mutations in SpCas9 at passage 19 that can recognize ATG, CAA, or GAA PAMs. Further, the clones evolved from wild-type SpCas9, e.g., CAA-3, GAT-2, ATG-2, ATG-3, or AGC-3 in passage 12, were tested using the luciferase assay described above for their ability to recognize all 64 PAMs, as shown in FIG. 3B. Similarly, the clones evolved from wild-type SpCas9, e.g., CAA-1, CAA-2, GAA-1, GAA-2, GAC-5, GAT-1, GAT-3, AGC-1, AGC-3, AGC-6, ATG-3, or ATG-6 in passage 19, were tested using the luciferase assay described above for their ability to recognize all 64 PAMs, as shown in FIG. 4B.

[00446] Similarly, FIG. 5A shows mutations in xCas9 at passage 12 that can recognize TAT, GTA, or CAC PAMs, and FIG. 6A shows mutations in xCas9 at passage 19 that can recognize AAA, GCC, or TAA PAMs. Further, xCas9 mutant clones, e.g., TAT-1, TAT-3, GTA-1, GTA-3, or CAC-2 in passage 12, were tested using the luciferase assay described above for their ability to recognize all 64 PAMs, as shown in FIG. 5B. Similarly, xCas9 mutant clones, e.g., AAA-1, TAA-2, TAA-5, TAT-5, CAC-5, CAC-6, GTA-2, GTA-7, GCC-2, GCC-5, or GCC-8 in passage 18, were tested using the luciferase assay described above for their ability to recognize all 64 PAMs, as shown in FIG. 6B.

[00447] To test if mutations evolved during PANCE in bacteria are compatible with xCas9 function in mammalian cells, SpCas9 and xCas9 variants were characterized for their activity and PAM compatibility in human cells in two contexts: adenine base editing and genomic DNA cutting.

Additionally, to further characterize genomic DNA cleavage in human cells by xCas9 variants, we targeted endogenous genomic sites in HEK293T cells and measured indel formation by high-throughput sequencing (HTS).

[00448] To evaluate C•G-to-T•A base editing activity of xCas9 variants, SpCas9 was substituted with xCas9 3.7 and 3.6 in the third-generation (BE3) base editor architecture. Both xCas9-BE3s were transfected into mammalian cells to compare editing efficiency. The xCas9-BE3 protein demonstrated base editing activity only on CGT and CGG PAMs, whereas the ATG2-BE3 protein demonstrated base editing activity on CAG and ATG PAMs, the CAA3-BE3 protein demonstrated base editing activity on CGG PAMs, and the TAT1-BE3 protein demonstrated base editing activity on CAT PAMs (FIG. 7).

[00449] The xCas9 protein produced indels at CAG, ATG, CAT, CGT, and CGG PAMs, whereas the ATG2 protein produced indels at CAG and CGG PAMs, the CAA3 protein produced indels at CAT and CGG PAMs, and the TAT1 protein produced indels at CAT PAMs (FIG. 7).

[00450] Thus, the PANCE-evolved SpCas9 and xCas9 variants have some activity on non-NGG PAMs in mammalian cells.

[00451] Additionally, the xCas9-passage 12-TAT1 (N6) variant was subjected to further PANCE evolution. A comparison of xCas9-passage 12-TAT1 with SpCas9 at various amino acid residues is shown in FIG. 9A. The clones resulting from further PANCE evolution of the xCas9-passage 12-TAT1 (N6) variant are shown in FIGs. 10-11. FIG. 12 shows the ability of the evolving xCas9-passage 12-TAT1 variant to recognize all 64 PAMs at passages 2, 12, and 16.

EXAMPLE 3

Selection improvement allows the evolution of NAA PAM binding activity

[00452] Despite enriching for multiple consensus mutations in the PAM-interacting domain (PID) (D1135N/E1219V/Q1221H/H1264Y/A1320V/R1333K), the NAA-targeted PANCE-evolved mutants exhibited low activity when subcloned into C-to-T base editors (CBEs) and tested for base conversion on sites containing NAA PAMs in mammalian cells (FIGs. 7, 37C). One possible explanation is that evolving increased binding activity might require increased selection stringency, and three strategies were implemented to accomplish this.

[00453] First, two variants evolved to bind a CAA PAM in the initial PANCE assay were selected and subjected to PACE using a dual-AP system. Here, each AP provides one half of a split-intein pIII under the control of an orthogonal Cas9 one-hybrid circuit, requiring ω-dCas9 to successfully bind two distinct protospacer-PAM motifs to produce full-length pIII (FIGs. 13A, 13B, 37A). These experiments led to the acquisition of a few additional consensus mutations (FIGs. 14A, 37B), which in turn improved CBE activity at the sites tested (FIGs. 14B, 37C) and increased the percentage of indels in mammalian cells (FIG. 14C).
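One way to see why requiring two binding events raises stringency is a simple AND-gate model. Assume, hypothetically, that a variant occupies each of two independent protospacer-PAM sites with probability p, and, as stated above, that full-length pIII is made only when both split-intein halves are expressed; the expected output then scales with the product of the two occupancies. The independence assumption and the occupancy values are illustrative only.

```python
def single_ap_output(p: float) -> float:
    """Relative gene III output when one binding event suffices (1.0 at full occupancy)."""
    return p


def dual_ap_output(p1: float, p2: float) -> float:
    """Relative full-length pIII output when both split-intein halves are required.

    Assumes the two binding events are independent, so the AND-gate output is
    the product of the two site occupancies.
    """
    return p1 * p2


if __name__ == "__main__":
    strong, weak = 0.9, 0.3  # hypothetical site occupancies
    single_ratio = single_ap_output(strong) / single_ap_output(weak)
    dual_ratio = dual_ap_output(strong, strong) / dual_ap_output(weak, weak)
    print(f"strong/weak advantage: single AP {single_ratio:.1f}x, dual AP {dual_ratio:.1f}x")
```

In this model the advantage of the stronger binder is squared (3-fold with one AP becomes 9-fold with two), which is one way to read "increased stringency".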

[00454] Next, the total amount of Cas9 present in the selection was limited by using a split intein to divide ω-dCas9 into two halves and encoding only the C-terminal half (which contains the PID) on the SP. Production of large amounts of ω-dCas9 by the SP might lead to saturation of binding to protospacer-PAM sites on the AP despite the presence of a non-optimal PAM (FIG. 15A). Indeed, using higher concentrations of SpCas9 in in vitro PAM depletion assays can lead to depletion of non-canonical PAM sequences (REF). Here, residues 574-1368 of Cas9 fused to NpuC (dCas9C) reside on the phage, while ω-dCas9 (residues 1-573) fused to NpuN (ω-dCas9N) is provided on a complementary plasmid (CP) (FIGs. 15B, 37A). This strategy allows the total amount of full-length SpCas9 produced in the host cells in PACE to be user-defined on the CP.
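The saturation argument can be illustrated with a single-site equilibrium binding model, fraction bound = [Cas9] / ([Cas9] + Kd), assuming Cas9 is in excess over target sites. The dissociation constants and concentrations below are hypothetical, in arbitrary units; the point is only that at high Cas9 levels a weak binder approaches the occupancy of a strong binder, eroding selection for improved PAM binding, whereas limiting the Cas9 supply preserves the difference.

```python
def fraction_bound(cas9_conc: float, kd: float) -> float:
    """Equilibrium occupancy of one target site by a ligand in excess (hyperbolic binding)."""
    return cas9_conc / (cas9_conc + kd)


def discrimination(cas9_conc: float, kd_strong: float, kd_weak: float) -> float:
    """Occupancy ratio of a strong binder over a weak binder at the same concentration."""
    return fraction_bound(cas9_conc, kd_strong) / fraction_bound(cas9_conc, kd_weak)


if __name__ == "__main__":
    KD_STRONG, KD_WEAK = 1.0, 50.0  # hypothetical Kd values, arbitrary units
    for conc in (0.1, 1.0, 100.0, 10_000.0):
        ratio = discrimination(conc, KD_STRONG, KD_WEAK)
        print(f"[Cas9] = {conc:>8}: strong/weak occupancy ratio {ratio:.2f}")
```

Algebraically the ratio equals ([Cas9] + Kd_weak) / ([Cas9] + Kd_strong), which approaches Kd_weak / Kd_strong at low concentration and 1 at saturating concentration.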

[00455] The consensus mutations obtained from the dual-AP selection were subcloned into a split-intein ω-dCas9 format. However, several mutations (T10A/I322V/S409I/E427G) had accumulated in the 1-573 region over the course of the previous selection. These mutations were incorporated into ω-dCas9N, and their effect on Cas9 DNA binding was investigated in overnight phage propagation assays using an evolved dCas9C phage clone (P4.72.5). High phage propagation was observed on host cells containing a CP encoding ω-dCas9N (T10A/I322V/S409I/E427G), suggesting that the mutations might have a beneficial effect on Cas9 binding. Therefore, these four mutations were incorporated into ω-dCas9N for all subsequent evolutions (hereafter referred to as ω-dCas9N-mut).
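Mutation lists such as T10A/I322V/S409I/E427G follow the usual convention of wild-type residue, position, substituted residue. The sketch below parses such strings and assigns each substitution to the half of the split-intein construct that carries it, using the boundary stated above (residues 1-573 on the CP-encoded N-terminal half, residues 574-1368 on the phage-encoded C-terminal half). The function and class names are hypothetical helpers for illustration, not part of any published tool.

```python
import re
from typing import List, NamedTuple

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")
SPLIT_BOUNDARY = 573  # last residue of the N-terminal half, per the text
CAS9_LENGTH = 1368    # the C-terminal half spans residues 574-1368


class Mutation(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_mutations(text: str) -> List[Mutation]:
    """Parse 'T10A/I322V/...' into Mutation records; raise ValueError on malformed tokens."""
    mutations = []
    for token in text.split("/"):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"not a single-residue substitution: {token!r}")
        wt, pos, mut = match.groups()
        mutations.append(Mutation(wt, int(pos), mut))
    return mutations


def split_half(mutation: Mutation) -> str:
    """Return 'N' or 'C' for the split-intein half that carries the residue."""
    if not 1 <= mutation.position <= CAS9_LENGTH:
        raise ValueError(f"position out of range: {mutation.position}")
    return "N" if mutation.position <= SPLIT_BOUNDARY else "C"


if __name__ == "__main__":
    for m in parse_mutations("T10A/I322V/S409I/E427G/R654L/D1180G"):
        print(f"{m.wild_type}{m.position}{m.mutant}: {split_half(m)}-terminal half")
```

Running it classifies the four mutations discussed above as N-terminal and R654L and D1180G as C-terminal. The strict regular expression is a deliberate choice, so that a malformed separator (for example a comma) raises an error instead of being silently misread.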

[00456] The evolved dCas9C was then subjected to two subsequent evolutions using host cells encoding a medium-copy AP containing an AAA PAM and low-copy CPs providing ω-dCas9N-mut from increasingly weak constitutive promoters. These rounds led to the accumulation of additional mutations in the PID, including D1180G, which was present in several sequenced clones (FIGs. 16A, 37B). The Cas9s evolved through this split-intein method exhibited a large increase in mammalian cell base editing activity, with more than double the activity of the previous variants on most NAA sites tested (FIGs. 17, 37C). These Cas9s also exhibited a large increase in the percentage of indels at most NAA PAM sites tested (FIG. 18).

[00457] Finally, to further increase selection stringency, gVI, whose protein product pVI is essential for phage propagation, was removed from the phage genome for use as an orthogonal selection marker for phage propagation on a second AP (FIG. 27A). Both previously described selection principles were employed, requiring a split-intein ω-dCas9 to bind two distinct protospacers on APs providing both gIII and gVI (FIG. 37A). Three dCas9C clones from previous evolutions (P13.3.3, P10.5.192.7, P10.6.192.1) were subjected to this highest-stringency selection in PACE, resulting in additional mutations in the PID, notably R1114G and L1318S, which both converged to a high degree (FIG. 37B).

[00458] Unfortunately, these variants proved to be inactive in mammalian cell CBE experiments (FIG. 37C). The large numbers of mutations present in these highly evolved variants, especially those outside the PID, might prove deleterious to expression and/or nuclease activity. To address this, the C-terminal portion (residues 574-1368) of the pool of variants from this final evolution was recombined with that of wild-type Cas9 by DNA shuffling, and the resulting library was re-subjected to the most stringent binding selection. This led to the isolation of several clones that exhibited improved CBE activity at both NAA and NGA sites in mammalian cells, most notably clone P16s.4-5 (R1114G/D1135N/V1139A/D1180G/E1219V/Q1221H/A1320V/R1333K) (FIG. 37B), which exhibited the highest levels of activity across all sites tested among the variants (FIG. 37C).

EXAMPLE 4

Evolution of Cas9 variants that recognize NAC or NAT PAM sequences

[00459] The strategy developed in Example 3 was employed in evolving SpCas9 and xCas9 proteins toward NAT and NAC PAMs, to minimize the accumulation of potentially deleterious bystander mutations. To ensure the variants retained nuclease activity, the dCas9 from SP pools evolved in PANCE to bind either a TAT or CAC PAM was converted to a nuclease-active form, and the resulting library was passed through a modified version of a previously reported bacterial DNA cleavage selection (data not shown). Here, Cas9 variants are challenged for their ability to bind to and cleave a protospacer-PAM sequence on a high-copy plasmid that also encodes a conditionally toxic gene (sacB). The surviving cells should therefore encode Cas9 variants with mutations that confer binding to a specific PAM and are compatible with nuclease activity.

[00460] From these experiments, two clones were isolated that exhibited DNA cleavage activity on a selection plasmid containing a TAT PAM, with PID consensus mutations D1135N/E1219V/Q1221H/P1321S/R1335L, and one clone that cleaved a selection plasmid with a CAC PAM, with PID mutations D1135N/E1219V/D1332N/R1335Q/T1337N (FIGs. 37D, 37E). These nuclease-active TAT and CAC variants were then converted into the split-intein ω-dCas9 format and evolved in PACE using host cells encoding APs with either NAT (AAT or TAT) or NAC (AAC, TAC, or CAC) PAMs, respectively. These experiments resulted in the enrichment of several additional PID mutations, including R1114G, which arose independently in all three PAM trajectories (NAA, NAT, and NAC) (FIGs. 37D, 37E), suggesting that this mutation may be generally beneficial for modifying PAM recognition by the PID.

[00461] Next, gVI was removed from the genomes of these evolved SP pools, which were subjected to additional selection in PACE using a dual-AP system containing two distinct protospacers and either an AAT or TAC PAM driving gIII/gVI expression. A Y1131C mutation was enriched in the SP pool evolved on AAT (FIG. 37E); however, variants carrying this mutation were inactive in mammalian cell BE experiments (Supplementary Figure XX). Because no additional functional mutations in the PID were observed, the most active NAT PAM-targeting variant from the split-intein ω-dCas9 evolution (clone P12.3.b9-8) was selected for further development. This variant contained the PID mutations R1114G/D1135N/D1180G/G1218S/E1219V/Q1221H/P1249S/E1253K/P1321S/D1332G/R1335L (FIG. 37E).

[00462] Several additional mutations were also enriched in the SP pool selected for binding to a TAC PAM in the split-intein ω-dCas9/dual-protospacer PACE. The C-terminal portion (residues 574-1368) of this pool was shuffled with that of wild-type Cas9, and the resulting library was re-challenged with the most stringent binding selection. Clone P17s.1.7-4, with the PID mutations R1114G/D1135N/E1219V/D1332N/R1335Q/T1337N/S1338T/H1249R, was isolated from the surviving SP pool (FIG. 37C).
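Comparing the PID mutation lists reported for the three lead clones makes the convergence noted above explicit. The sketch below simply intersects the slash-separated lists as they are written in Examples 3 and 4; it treats each substitution as an exact string and does not, for example, merge different substitutions at the same residue (D1332G versus D1332N, or R1335L versus R1335Q).

```python
from collections import Counter

CLONES = {
    "P16s.4-5 (NAA)": "R1114G/D1135N/V1139A/D1180G/E1219V/Q1221H/A1320V/R1333K",
    "P12.3.b9-8 (NAT)": ("R1114G/D1135N/D1180G/G1218S/E1219V/Q1221H/P1249S/"
                         "E1253K/P1321S/D1332G/R1335L"),
    "P17s.1.7-4 (NAC)": "R1114G/D1135N/E1219V/D1332N/R1335Q/T1337N/S1338T/H1249R",
}

mutation_sets = {name: set(muts.split("/")) for name, muts in CLONES.items()}

# Substitutions present in every clone.
shared = set.intersection(*mutation_sets.values())

# Number of clones carrying each exact substitution.
counts = Counter(m for muts in mutation_sets.values() for m in muts)

if __name__ == "__main__":
    print("shared by all three:", sorted(shared, key=lambda m: int(m[1:-1])))
    print("shared by two:", sorted(m for m, c in counts.items() if c == 2))
```

With the lists as written, R1114G, D1135N, and E1219V are common to all three clones, and D1180G and Q1221H are shared by the NAA and NAT clones only.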

EXAMPLE 5

Mutations outside of the PID

[00463] Structural studies of SpCas9 suggest that residues in the PID mediate PAM specificity (REF). Indeed, most previous efforts to engineer or evolve SpCas9 to accept alternative PAMs have focused on modulating this region of the protein. However, because the PANCE and PACE experiments involved mutagenesis of either the entire SpCas9 sequence or residues 574-1368 (in the case of split-intein ω-dCas9), a number of mutations outside of the PID were enriched. Because many of these mutations fell within the RuvC or HNH nuclease domains, some may negatively impact Cas9 nuclease activity. However, other mutations in the helical domain were consistently enriched across several independently evolving populations, suggesting that they may confer a beneficial effect on Cas9 DNA binding/unwinding.

[00464] Therefore, to minimize the deleterious effects of bystander mutation accumulation in the nuclease domains while preserving beneficial mutations in the helical domain, the evolved PIDs from Examples 3 and 4 were transferred onto a fixed N-terminal sequence that included the mutations T10A/I322V/S409I/E427G, shown to improve phage propagation in the split-intein ω-dCas9 selection, as well as R654L/R753G, which consistently enriched across multiple independently evolving SP pools. The addition of these mutations to CBEs containing the PIDs of NAA variant P16.4-5 and NAC variant P17.1.7-4 improved CBE activity in mammalian cells across several sites when compared to the PID mutations alone (data not shown). A smaller effect was observed for NAT variant P12.3.b9-8, but because there did not appear to be a decrease in overall CBE activity (data not shown), these N-terminal mutations were incorporated into all three final variants, hereafter referred to as NRRH, NRCH, and NRTH, which are derived from clones P16.4-5, P17.1.7-4, and P12.3.b9-8, respectively.
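The final variant names appear to be PAMs written in IUPAC nucleotide codes (N = any base, R = A or G, H = A, C, or T). Assuming that reading, the sketch below expands each name into the concrete four-base PAMs it denotes and tests whether a given PAM matches. It illustrates the notation only and says nothing about measured activity on any individual sequence; the helper names are hypothetical.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pam: str, pattern: str) -> bool:
    """True if a concrete PAM matches an IUPAC pattern of the same length."""
    return len(pam) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper())
    )


def expand(pattern: str) -> List[str]:
    """All concrete sequences denoted by an IUPAC pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern.upper()))]


if __name__ == "__main__":
    for name in ("NRRH", "NRCH", "NRTH"):
        print(name, len(expand(name)), "four-base PAMs")
    print(matches("GAAC", "NRRH"), matches("CATA", "NRTH"), matches("AGGG", "NRRH"))
```

Under this reading NRRH covers 48 of the 256 possible four-base PAMs and NRCH and NRTH cover 24 each; the last line checks GAAC against NRRH (match), CATA against NRTH (match), and AGGG against NRRH (no match, because H excludes G).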

EXAMPLE 6

Characterization of PAM specificity through bacterial depletion

[00465] To better characterize the PAM specificities of the evolved variants, bacterial PAM depletion was performed using a library in which four randomized nucleotides (NNNN) follow the protospacer (FIGs.19A-19C). For comparison, depletion experiments were performed in parallel with SpCas9-NG, a previously reported engineered SpCas9 that recognizes NG PAMs. Cells were plated after 1 h, 3 h, or overnight expression of the SpCas9 variant from an inducible promoter to better resolve kinetic differences in PAM sequence preference. As expected, the depletion score of any given PAM increased with longer induction times, and the shortest induction times revealed the most pronounced sequence preferences (data not shown).
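As an illustration only, a PAM depletion score can be computed as the log2 ratio of a PAM's read frequency in a control population to its frequency after Cas9 expression. This formula, the pseudocount, and the toy read counts below are assumptions for this sketch; the exact scoring used in these experiments is not specified here.

```python
import math
from collections import Counter


def pam_frequencies(pams: list[str]) -> dict[str, float]:
    """Fraction of sequencing reads carrying each PAM."""
    counts = Counter(pams)
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def depletion_scores(control: list[str], treated: list[str],
                     pseudocount: float = 1e-6) -> dict[str, float]:
    """log2(control frequency / treated frequency) for every PAM seen in the control.

    Higher scores mean stronger depletion, i.e. plasmids carrying that PAM were
    lost more often after Cas9 expression. The pseudocount (an assumption)
    avoids division by zero for completely depleted PAMs.
    """
    ctrl = pam_frequencies(control)
    trt = pam_frequencies(treated)
    return {
        pam: math.log2((f + pseudocount) / (trt.get(pam, 0.0) + pseudocount))
        for pam, f in ctrl.items()
    }


# Hypothetical toy reads: CAAC is depleted by the nuclease, CAAG is not.
control_reads = ["CAAC"] * 100 + ["CAAG"] * 100
treated_reads = ["CAAC"] * 10 + ["CAAG"] * 100
scores = depletion_scores(control_reads, treated_reads)
print({pam: round(s, 2) for pam, s in scores.items()})
```

Because frequencies are renormalized within each sample, an undepleted PAM can receive a slightly negative score when other PAMs are strongly depleted.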

[00466] For example, at 1 hour (h) induction, NRRH exhibited a strong preference for C at the 4th PAM position, a mixed preference for G/A at positions 2 and 3 and a moderate preference for G at position 1 (FIGs.20, 38A). However, longer induction times resulted in more relaxed specificity at all positions. Similarly, NRCH showed a strong preference for G at position 2 and a moderate preference for pyrimidines at position 4 (FIG.38A) at 1 h induction, but only a mixed enrichment for G/A at position 2 was observable at longer induction times (FIG.38A). Finally, at 1 h induction, NRTH enriched strongly for G and T at positions 2 and 3, respectively (FIG.38A), but by 3 h we observed a shift in the nucleotide preference at position 2 to a mix of G and A, suggesting that this variant recognizes and cleaves NAT PAMs more slowly when compared to NGT PAMs. Additionally, this suggests that NRTH may preferentially recognize NRT over NGG PAMs.

[00467] Interestingly, SpCas9-NG displayed a moderate preference for G at the 3rd and 4th PAM positions at short induction times. This is consistent with SpCas9-NG's T1337R mutation, which is also found in SpCas9 VRER and VRQR [REF] and is responsible for the increased specificity for G at the 4th PAM position of these variants. As with the evolved Cas9 variants, SpCas9-NG's PAM sequence requirements also became more relaxed with longer induction times (data not shown).

[00468] Further, the P11 clone, which also possesses the P4.2.72.4 SpCas9 mutations, was evolved using split-intein Cas9 mutants on AAA PAM bacterial depletion to generate clones with new mutations (FIG.21). The ability of the newly generated P11-SacB-1 and P11-SacB-2 clones to perform base editing and generate indels was evaluated in vitro in HEK293T cells (FIGs.22-23). Both the P11-SacB-1 and P11-SacB-2 clones had higher base editing activity and generated a greater percentage of indels compared to xCas9 proteins (FIGs.22-23).

[00469] Similarly, the P12 clone was evolved using split-intein Cas9 mutants on AAT or TAT PAM bacterial depletion to generate clones with new mutations (FIGs.24A-24B). The ability of these newly-generated P12.3.b9-8 and P12.3.b10 clones to perform base-editing and generate indels was evaluated in vitro in HEK293T cells (FIGs.25A, 25B, 26A, 26B).

EXAMPLE 7

Survival-based selection for isolating nuclease-active Cas9 variants

[00470] A survival-based selection method for isolating nuclease-active SpCas9 clones was generated (FIG.28). The SacB gene produces a toxic protein, so only clones with an active nuclease that can cut the SacB gene survive this selection. The original TAT clone was generated from PANCE on a TAT PAM but lacked nuclease activity. This TAT clone was subcloned from a pool of N4.TAT selection phage (SP) into a Cas9 plasmid, and selection was performed for variants that cut a SacB selection plasmid with a TAT PAM. Two additional TAT clones, SacB-TAT-1 and SacB-TAT-2, were isolated (FIGs.29A, 29B).

[00471] The SacB-TAT-1 and SacB-TAT-2 clones were evaluated for their ability to perform base editing and generate indels in vitro in HEK293T cells (FIGs.30A, 30B, 31). Both clones possessed higher base editing activity on GAT, CAT, and GAA PAMs compared with xCas9 (FIG.30A), as well as higher indel generation on GAT and TAT PAMs compared with xCas9 and SpCas9 (FIGs.30B, 31).

EXAMPLE 8

Evolved Cas9 to generate indels at endogenous human genomic loci

[00472] The activity of the evolved SpCas9 and xCas9 variant proteins was assessed in HEK293T cells through indel formation at endogenous target sites spanning all 64 NANN PAMs. For comparison, the activity of SpCas9-NG, a previously reported engineered SpCas9 that recognizes NG PAMs, was tested at these sites in parallel. Generally, each variant displayed the highest indel formation activity on target sites containing the PAM it was evolved to recognize, with NRRH and NRTH showing an average of 23.0±7.8% and 22.9±7.2% indel formation on target sites containing NAAN and NATN PAMs, respectively. Sites containing NACN PAMs were edited at slightly lower efficiencies, with NRCH averaging 18.0±5.9% indel formation. Additionally, NRRH displayed 20% average indel formation on sites containing an NAG PAM, even though it had not been evolved to bind this PAM sequence (FIG.38B).

[00473] Interestingly, indel formation was also observed with SpCas9-NG at a number of NANN sites. Although its average indel formation across these sites was lower than that of the evolved variants, SpCas9-NG displayed activity at sites with NANG PAMs (12.2±3.0%, 11.9±5.2%, 21.2±6.2%, and 18.3±4.4% average indel formation for NAAG, NACG, NATG, and NAGG, respectively) (FIG.38B). In contrast, the evolved variants showed the lowest average activity at sites whose PAM has a G at position 4, and the highest at sites with a non-G base (H) at this position (27.3±8.6%, 23.7±6.8%, 26.9±8.1%, and 26.8±7.6% average indel formation for NRRH, NRCH, NRTH, and NRRH on NAAH, NACH, NATH, and NAGH PAMs, respectively) (FIGs.38B, 38C). These results are consistent with the sequence preferences predicted by the bacterial PAM depletion experiments, and suggest that the evolved variants and SpCas9-NG exhibit orthogonal PAM specificities.
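The 64 NANN PAMs and the position-4 grouping (G versus H) used in paragraphs [00472]-[00473] can be enumerated with the Python sketch below. The helper names are hypothetical; the sketch only reproduces the combinatorics, not the editing data.

```python
from itertools import product

BASES = "ACGT"


def nann_pams() -> list[str]:
    """All 4-nt PAMs of the form N-A-N-N (PAM positions 1-4)."""
    return [n1 + "A" + n3 + n4 for n1, n3, n4 in product(BASES, repeat=3)]


def group_label(pam: str) -> str:
    """Label a PAM by its 3rd base and by whether its 4th base is G or H (A, C, or T)."""
    third, fourth = pam[2], pam[3]
    return "NA" + third + ("G" if fourth == "G" else "H")


groups: dict[str, list[str]] = {}
for pam in nann_pams():
    groups.setdefault(group_label(pam), []).append(pam)

for label in sorted(groups):
    print(label, len(groups[label]))
```

Each NANG group contains 4 PAMs and each NANH group contains 12, because position 1 can be any of four bases and H allows three.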

[00474] The indel formation activity of the evolved variants and SpCas9-NG was also tested on a number of endogenous target sites containing NGN PAMs. SpCas9-NG, NRCH, and NRTH performed best on NGA, NGC, and NGT PAMs, respectively, with 41.1±10.7%, 42.4±4.4%, and 67.7±6.8% average indel formation (data not shown). As above, the evolved variants preferred an H at position 4 of the PAM in these experiments.

[00475] At the same time, increasing the DNA targeting capabilities of SpCas9 and xCas9 variants toward NRN PAMs could also greatly increase the proportion of genomic off-target sequences accessible to these Cas9 variants.

EXAMPLE 9

Evolved Cas9s are compatible with base editing technology

[00476] Next, the ability of the evolved Cas9 variant proteins to support base editing was determined. C to T base editors (CBEs) were generated by incorporating the evolved Cas9 variants into BE4max (REF) in place of wt-Cas9. The activity of these CBEs was analyzed at the same 64 endogenous target sites examined above for indel formation. As before, each of the three variants showed the highest average activity on sites containing the PAM it was evolved to recognize. BE4max-NRRH and BE4max-NRTH performed best on NAAN and NATN PAMs, with an average of 11.7±3.7% and 17.3±4.0% C•G to T•A conversion, respectively. CBE activity on NACN PAMs was slightly lower, with BE4max-NRCH enabling the highest editing activity at these sites at an average of 10.8±3.0% base conversion. BE4max-NRRH and BE4max-NG edited NAGN sites similarly, at 11.4±3.6% and 11.6±4.8% average base conversion, respectively (FIG.39A).

[00477] Improved base editing activity was again observed on sites with NANH PAMs, where C•G to T•A conversion at NAAH, NACH, NATH, and NAGH sites increased to 14.4±4.1%, 13.0±2.6%, 21.0±4.2%, and 14.5±4.0% for BE4max-NRRH, -NRCH, -NRTH, and -NRRH, respectively (FIGs.39A, 39B). BE4max-NG performed well at sites containing NANG PAMs, with 13.6±4.4% average editing (FIG.39A). These editors also function on sites with NGN PAMs (data not shown). As expected, CBE activity across all 64 sites is much more variable than indel formation, because efficient base editing has additional requirements, such as favorable sequence context and placement of the target C within the editing window. Finally, the Cas9 variants are also compatible with A to G base editors, exhibiting similar performance on a subset of sites containing NAN and NGN PAMs when substituted in place of wt-Cas9 in ABEmax (FIG.39C).

EXAMPLE 10

Characterization of evolved Cas9s and SpCas9-NG using a mammalian library for base editing activity

[00478] Finally, to thoroughly profile the PAM preferences of these variants, the base editing efficiencies of the three evolved variants, SpCas9-NG, and wt-Cas9 were evaluated on a library of 11,776 unique sequences in mammalian cells. This library was designed using 46 distinct protospacers derived from sequences found in the human genome, each with a different sequence context surrounding a fixed C at position 4. Each protospacer is adjacent to a PAM sequence of four randomized nucleotides (NNNN) and is additionally flanked by designated primer binding sites for amplification and high-throughput sequencing (HTS) analysis (FIG.40B).

[00479] Characterization of the evolved variants in this library format recapitulated the preferences observed with both bacterial PAM depletion and base editing at endogenous mammalian genomic sites. For instance, each evolved variant exhibited the highest editing activity on PAMs containing the third-position base toward which it was evolved (FIG.40E), or when a non-G base was at the 4th position of the PAM, performing best when a pyrimidine was at this position (FIG.40F). Additionally, the evolved variants, in particular NRRH, performed best when a G or C was present at position 1 of the PAM, whereas wt-Cas9 exhibited only a slight preference for G at this position (data not shown).
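The library size follows directly from the design: 46 protospacers, each paired with every one of the 4^4 = 256 NNNN PAMs, gives 46 × 256 = 11,776 members. The Python sketch below checks this arithmetic; the protospacer labels are hypothetical placeholders, since the actual sequences are not given in this section.

```python
from itertools import product

NUM_PROTOSPACERS = 46  # distinct protospacers, per paragraph [00478]

# Every 4-nt PAM (NNNN): 4**4 = 256 combinations.
all_pams = ["".join(p) for p in product("ACGT", repeat=4)]

# Hypothetical protospacer labels; the real sequences are not listed here.
protospacers = [f"protospacer_{i:02d}" for i in range(1, NUM_PROTOSPACERS + 1)]

# One library member per protospacer/PAM combination.
library = [(ps, pam) for ps in protospacers for pam in all_pams]
print(len(all_pams), len(library))  # 256 11776
```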

[00480] The U6 promoter, commonly used to express sgRNAs in mammalian cells, initiates transcription with a 5’ G. If a G is not natively present at the 5’ end of the protospacer, guide sequences are typically either extended to the next native G or transcribed with a mismatched G at position 21 of the guide sequence. However, high-fidelity (HF) Cas9s, which are less tolerant of mismatches between the protospacer and sgRNA, exhibit decreased efficiency when using a 21-nucleotide (nt) guide with a mismatched 5’ G [REF]. Because PACE has previously yielded Cas9s with HF properties, including sgRNA mismatch intolerance [REF], we sought to determine whether our new variants shared these characteristics.
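The three guide designs compared in this Example (20 nt with a matched 5’ G, 21 nt with a matched 5’ G, and 21 nt with a mismatched 5’ G) can be sketched in Python as follows. This is a simplified illustration: the function `u6_guide` is hypothetical, and it considers only a one-nucleotide extension rather than extension to a native G further upstream.

```python
def u6_guide(upstream: str, protospacer: str) -> tuple[str, str]:
    """Choose a U6-compatible guide for a 20-nt protospacer.

    upstream: genomic bases immediately 5' of the protospacer, written 5'->3',
    so its last character is the base adjacent to protospacer position 1.
    Returns the guide sequence and which of the three designs it is.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    if protospacer.startswith("G"):
        # U6 can start transcription on the native G directly.
        return protospacer, "20 nt, matched 5' G"
    if upstream.endswith("G"):
        # Extending by one nt picks up a native G, so the extra base still pairs.
        return upstream[-1] + protospacer, "21 nt, matched 5' G"
    # Otherwise a non-native G is added and mismatches the target.
    return "G" + protospacer, "21 nt, mismatched 5' G"


print(u6_guide("TTA", "C" + "A" * 19))
```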

[00481] The average base editing activity of the evolved variants was evaluated across all sites using either a 20-nt protospacer with a matched 5’ G, a 21-nt protospacer with a matched 5’ G, or a 21-nt protospacer with a mismatched 5’ G. Both the evolved variants and wt-Cas9 showed the highest base editing activity with a 20-nt protospacer and a matched 5’ G. Across all NNNN PAMs, both the variants and wt-Cas9 showed a significant decrease in base editing efficiency when the protospacer was extended to 21 nt, regardless of whether the 5’ G matched the target sequence (FIG.40C). The magnitude of this decrease was greater for the evolved variants than for wt-Cas9. Interestingly, the deleterious effect of a 21-nt protospacer on editing efficiency is ameliorated when targeting sites with an NGNN PAM (data not shown), and almost completely absent when targeting sites with an NGGN PAM (FIG.40D). This is especially true for wt-Cas9, which shows no significant decrease in base editing activity on sites with a 21-nt protospacer when the PAM is NGG.

EXAMPLE 11

Evolved Cas9s correct disease-associated SNPs by accessing non-G PAMs

[00482] To demonstrate the utility of the evolved variants in a disease-relevant context, the Glu-to-Val point mutation at position 6 of the sickle-hemoglobin (HbS) variant of β-globin, which causes red blood cell sickling in sickle-cell anemia, was targeted [REF]. The HbS mutation arises from a GAG to GTG codon change, which cannot be fully reverted with current base editing technologies because restoring the original A·T pair from T·A is a transversion that neither CBE nor ABE can perform. However, this SNP can be partially corrected with ABE to GCG (Ala) through A·T to G·C conversion on the opposite strand. This genotype, known as the Makassar mutation, has been shown to result in phenotypically normal hemoglobin.
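The codon logic of the Makassar correction can be checked with a short Python sketch: the T of the HbS GTG codon is paired with an A on the opposite strand, and converting that A to G changes the coding-strand base to C, giving GCG (Ala). The helper below is a hypothetical illustration that models only the three-base codon, not the surrounding β-globin sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons needed for this example (standard genetic code).
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def abe_edit_opposite_strand(codon: str, index: int) -> str:
    """Apply A->G on the strand opposite the coding strand.

    `index` is the 0-based position in the coding-strand codon whose base pair
    is edited; the paired base on the other strand must be an A.
    Returns the resulting coding-strand codon.
    """
    other = reverse_complement(codon)
    other_index = len(codon) - 1 - index
    if other[other_index] != "A":
        raise ValueError("ABE requires an A on the edited strand")
    edited = other[:other_index] + "G" + other[other_index + 1:]
    return reverse_complement(edited)


sickle = "GTG"  # HbS codon 6 (wild type is GAG)
corrected = abe_edit_opposite_strand(sickle, 1)
print(CODON_TO_AA[sickle], "->", CODON_TO_AA[corrected])  # Val -> Ala
```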

[00483] Unfortunately, the only NGG or NGN PAMs available at this site place the target A at position 2 or 9, respectively, which fall outside the optimal editing window for ABE. However, two alternative protospacer sequences adjacent to a CAT or CAC PAM place the target A at position 4 or 7, respectively, with an off-target (bystander) A at position 6 or 9 whose editing leads to a silent CCT to CCC (Pro to Pro) mutation. Thus, the ability of the evolved variants, along with SpCas9-NG, to convert the sickle-cell SNP to the Makassar mutation using these alternative sites with non-G PAMs was evaluated.
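Whether a target A falls within the editing window can be expressed with simple arithmetic. The Python sketch below assumes protospacer positions numbered 1-20 from the PAM-distal end, with the PAM beginning at position 21, and takes the optimal spacing of roughly 13-17 nt between the target base and the PAM from paragraph [00497]. Under these assumptions the window corresponds to positions 4-8, consistent with positions 4 and 7 being usable and positions 2 and 9 falling outside.

```python
PROTOSPACER_LEN = 20
# Optimal spacing between target base and PAM, per paragraph [00497].
MIN_DIST, MAX_DIST = 13, 17


def distance_to_pam(position: int) -> int:
    """Distance (nt) from a protospacer position (1 = PAM-distal end) to the PAM.

    Assumes the PAM starts immediately after position 20, i.e. at position 21.
    """
    return PROTOSPACER_LEN + 1 - position


def in_optimal_window(position: int) -> bool:
    return MIN_DIST <= distance_to_pam(position) <= MAX_DIST


# Target-A positions for the protospacers discussed in [00483].
for pam, position in [("NGG", 2), ("NGN", 9), ("CAT", 4), ("CAC", 7)]:
    print(pam, position, distance_to_pam(position), in_optimal_window(position))
```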

[00484] In HEK293T cells engineered with a GAG to GTG mutation at codon 6 of β-globin, the evolved variants supported considerable A to G conversion at both sites, whereas SpCas9-NG edited efficiently only with the protospacer containing a CAT PAM. This is perhaps due to the G at the 4th position of this PAM sequence (FIGs.41B, 41C), which appears to improve SpCas9-NG's recognition of NAN PAMs (see above). Unfortunately, editing with the CAT PAM protospacer occurred primarily at the off-target base (position 6), with the target A (position 4) showing less than 10% conversion across all editors (FIG.41C). Base conversion using the CAC PAM protospacer, however, was much more efficient. As expected, ABEmax-NRCH showed the highest editing activity, with 40.6±6.5% base conversion at the target A (position 7) and 13.0±5.6% at the off-target A (position 9).

[00485] ABEmax-NRRH and -NRTH were also able to achieve 28.9±7.4% and 14.1±4.8% conversion, respectively. The high activity of all three evolved variants at this site likely stems from the presence of a C at the 4th position of the CAC PAM sequence. In comparison, ABEmax-NG showed negligible (1.0±0.8%) base conversion activity at this site (FIG.41B). Collectively, these results suggest that both the evolved variants and SpCas9-NG have the potential to edit disease-relevant SNPs using non-G PAMs, and furthermore highlight the utility of targeting a SNP using multiple protospacer/PAM sequences.

[00486] Together with SpCas9-NG, the evolved variants NRRH, NRCH, and NRTH should expand the targeting scope of SpCas9 to sites with NR PAMs, increasing the number of pathogenic SNPs correctable by either CBE or ABE. Based on analysis of the ClinVar database, 95.0% of pathogenic SNPs correctable through a C·G to T·A conversion and 94.7% of pathogenic SNPs correctable through an A·T to G·C conversion can be targeted using an NR PAM. Additionally, expansion to NR PAMs increases the number of possible protospacers available for targeting a given SNP for correction with base editors: on average, there are XX protospacers per disease SNP targetable with CBE and XX protospacers for those targetable with ABE with NR PAMs, compared to XX targetable with CBE and XX targetable with ABE, respectively, when using NG PAMs.
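The gain in candidate protospacers from relaxing the PAM requirement from NG to NR can be illustrated by counting, for a single target base, the protospacers that place it inside an assumed editing window (positions 4-8) and are followed by a matching PAM. The Python sketch below is hypothetical: it scans only one strand, uses a toy sequence, and does not reproduce the ClinVar analysis or supply the missing per-SNP averages.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}
PROTOSPACER_LEN = 20
WINDOW = range(4, 9)  # assumed editing window: protospacer positions 4-8


def pam_matches(pam: str, pattern: str) -> bool:
    """True if each base of `pam` is allowed by the IUPAC code at that position."""
    return len(pam) == len(pattern) and all(b in IUPAC[c] for b, c in zip(pam, pattern))


def count_protospacers(seq: str, target: int, pattern: str) -> int:
    """Count protospacers on the given strand that put `target` (0-based index
    into `seq`) inside WINDOW and are followed by a PAM matching `pattern`."""
    count = 0
    for position in WINDOW:  # 1-based protospacer position of the target base
        start = target - (position - 1)
        pam_start = start + PROTOSPACER_LEN
        pam = seq[pam_start:pam_start + len(pattern)]
        if start >= 0 and pam_matches(pam, pattern):
            count += 1
    return count


toy = "A" * 24 + "G" + "A" * 15  # hypothetical 40-nt sequence
print(count_protospacers(toy, 10, "NG"), count_protospacers(toy, 10, "NR"))
```

Because every NG PAM is also an NR PAM, the NR count can never be smaller than the NG count for the same target.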

EXAMPLE 12

[00487] Characterizing Mutants that Work on NRRH, NRCH, and NRTH PAMs

[00488] SpCas9 mutant proteins were identified that work best on NRRH, NRCH, and NRTH PAMs. The SpCas9 mutant protein that works best on NARH (“es” variant) has an amino acid sequence as presented in SEQ ID NO: 22 (underlined residues are mutated from SpCas9): MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY

[00489] The SpCas9 mutant protein that works best on NRCH (“fn” variant) has an amino acid sequence as presented in SEQ ID NO: 23 (underlined residues are mutated from SpCas9)

[00490] The SpCas9 mutant protein that works best on NRTH (“ax” variant), has an amino acid

[00491] The base-editing activity of the ax, es, fn, and SpCas9 (“NG”) proteins was characterized in vitro in HEK293T cells on NAA, NAC, NAT, and NAG PAMs (FIGs.33A-33D; 34A-34B). Compared with the other SpCas9 proteins, the es protein had increased activity on CAAA, CAAC, AAAT, and GAAC PAMs; the fn protein had increased activity on AACC, AACT, TACT, TACC, CACT, and CACC PAMs; and the ax protein had increased activity on AATA, TATT, TATA, TATC, CATA, CATT, CATC, GATA, GATT, and GATC PAMs (FIGs.33A-33C; 34A-34B).

[00492] The A to G base editing activity of the es and fn SpCas9 proteins was also characterized in vitro in HEK293T cells on NAA, NGA, NAC, and NGC PAMs (FIGs.35A-35C). The es, fn, or wild-type SpCas9 proteins were incorporated into the ABEmax A to G base editing fusion protein. The es protein had increased base-editing activity on AAAT, CAAC, GAAC, AACC, TACT, TACC, CACT, CACC, AGCC, AAGA, and AAGC PAMs compared with the NG SpCas9 protein (FIGs.35A, 35B). The fn protein had increased base-editing activity on GGGT and TGGC PAMs compared with the NG SpCas9 protein (FIG.35C).

EXAMPLE 13

[00493] Continuous evolution of SpCas9 variants compatible with non-G PAMs

[00494] Streptococcus pyogenes Cas9 (SpCas9) is a widely used genome editing tool, but can only access a small fraction of DNA sites due to its requirement for an NGG protospacer-adjacent motif (PAM). This limits SpCas9’s utility for precision genome editing applications such as base editing (Rees and Liu, 2018), homology-directed repair (Paquet et al., 2016), and predictable template-free end-joining repair (Shen et al., 2018). While SpCas9 variants with alternative PAM requirements have been reported, their targeting scope remains primarily restricted to PAMs containing G. Here, we report the laboratory evolution of three new SpCas9 variants collectively capable of recognizing NRNH PAMs (where R = A or G and H = A, C, or T) using an improved phage-assisted continuous evolution (PACE) selection for DNA binding. We show that these variants recognize NAAH, NACH, NATH, and NAGH PAMs to effect indel formation, cytosine base editing, and adenine base editing using a panel of 64 endogenous human genome target sites containing all NANN PAMs. Additionally, we profile the editing efficiencies of our evolved SpCas9s and the previously reported SpCas9-NG as base editors on an 11,776-member genomically integrated protospacer/sgRNA pair library spanning all NNNN PAMs in HEK293T cells to provide an exhaustive characterization of their PAM preferences in a human cell setting. Finally, we demonstrate the ability of our variants to enable A•T-to-G•C base editing of the founder sickle-cell anemia mutation of β-globin using a previously inaccessible CAC PAM. Together with previously reported SpCas9 mutants, these newly evolved variants expand the targeting scope of SpCas9 to include a majority of NR PAM sequences, greatly increasing the fraction of genomes accessible to Cas9-mediated genome editing.

[00495] The CRISPR-Cas9 system, originally evolved as a mechanism for adaptive immunity in bacteria, has in recent years transformed the life sciences by enabling a wide range of techniques for targeted genome manipulation including gene disruption, homology-directed repair, gene regulation, and base editing (Komor et al., 2017). The applicability of these techniques is limited by the requirement of Cas9 for a protospacer-adjacent motif (PAM) in order to bind a DNA sequence. For example, wild-type Streptococcus pyogenes Cas9 (SpCas9), the most widely-used and well-characterized Cas9 homolog (Komor et al., 2017), recognizes an NGG PAM immediately 3’ of the target DNA sequence, and with rare exception will not efficiently engage DNA sequences lacking an NGG PAM (Jinek et al., 2012). To address this limitation and expand the range of targetable genomic loci, researchers have used naturally occurring Cas9 orthologs with different PAM specificities (Cebrian-Serrano and Davies, 2017). The majority of these natural Cas9 variants, however, are less well-characterized, less active in a variety of conditions, and/or more stringent in their PAM requirements than SpCas9.

[00496] Motivated by the limited set of natural Cas9 homologs that have been successfully used for genome editing, researchers have engineered or evolved both Staphylococcus aureus Cas9 (SaCas9) (Kleinstiver et al., 2015a) and SpCas9 (Hu et al., 2018; Kleinstiver et al., 2015b; Nishimasu et al., 2018) to increase their PAM targeting scope. These efforts have expanded SpCas9’s potential PAM compatibility from NGG to most NG sites (Hu et al., 2018; Nishimasu et al., 2018). However, even with this substantial increase in SpCas9’s DNA targeting capability, genomic locations that are not G-rich remain difficult to access despite their abundance. The restriction on Cas9 targeting is especially problematic for precision genome editing techniques that require strict placement of Cas9 relative to the desired genomic edit, such as homology-directed repair (HDR) (Paquet et al., 2016), predictable template-free end-joining (Shen et al., 2018), and base editing (Rees and Liu, 2018).

[00497] Base editing is a widely used genome editing technology in which a target base is directly converted to another base through deamination of cytosine to uracil (cytosine base editor, CBE) (Komor et al., 2016), or adenine to inosine (adenine base editor, ABE) (Gaudelli et al., 2017) by a Cas9-directed deaminase, ultimately resulting in a C•G-to-T•A or A•T-to-G•C conversion, respectively. This technology is particularly sensitive to Cas9 positioning: activity for SpCas9-derived editors, for example, is optimal when the PAM is located approximately 13-17 nt away from the target base (Rees and Liu, 2018). In addition, for any given base edit, it may be desirable to screen multiple target sequence windows to maximize on-target activity while minimizing editing of other bases (Jin et al., 2019; Lee et al., 2018a; Xin et al., 2019; Zuo et al., 2019). Taken together, these requirements highlight the major ongoing need to access additional PAM sequences. Here we report the directed evolution of three new SpCas9 variants capable of recognizing NRRH, NRTH, and NRCH PAMs, respectively, where R = A or G, and H = A, C, or T. These variants were evolved through improved phage-assisted continuous evolution (PACE) selections for SpCas9 binding to specific sequences with non-NGG PAMs. We extensively characterized these three new variants, as well as SpCas9-NG (Nishimasu et al., 2018), a previously-reported engineered SpCas9 that recognizes NG PAMs, on 64 endogenous human genomic target sites, as well as a library of 11,776 integrated target sites. The new variants reported here, together with previously reported NG-compatible Cas9 variants, expand the potentially accessible PAM sequence space of SpCas9 to cover the vast majority of NR sequences.
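The degenerate PAM notation used throughout (N = any base, R = A or G, H = A, C, or T) can be made concrete with a small IUPAC matcher in Python. The helper names are hypothetical; the sketch also shows that the union of the NRRH, NRCH, and NRTH patterns is exactly NRNH, matching the collective PAM scope stated in paragraph [00494].

```python
from itertools import product

# IUPAC nucleotide codes used in this document.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "H": "ACT",   # not G
    "N": "ACGT",  # any base
}


def pam_matches(pam: str, pattern: str) -> bool:
    """True if every base of `pam` is allowed by the IUPAC code at that position."""
    return len(pam) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper())
    )


def covered(pattern: str) -> set[str]:
    """All concrete PAMs (same length as `pattern`) matched by an IUPAC pattern."""
    pams = ("".join(combo) for combo in product("ACGT", repeat=len(pattern)))
    return {pam for pam in pams if pam_matches(pam, pattern)}


for pattern in ("NRRH", "NRCH", "NRTH"):
    print(pattern, len(covered(pattern)))

union = covered("NRRH") | covered("NRCH") | covered("NRTH")
print(len(union), union == covered("NRNH"))
```

NRRH covers 48 of the 256 possible 4-nt PAMs, NRCH and NRTH cover 24 each, and together they cover the 96 NRNH PAMs.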

Results

Initial evolution of SpCas9 toward non-G PAM sequences

[00498] Phage-assisted continuous evolution (PACE), a method for the rapid directed evolution of biomolecules, has been used to evolve a wide range of proteins including RNA polymerases (Carlson et al., 2014; Dickinson et al., 2013; Esvelt et al., 2011; Pu et al., 2017), proteases (Dickinson et al., 2014; Packer et al., 2017), antibody-like proteins (Badran et al., 2016; Wang et al., 2018), insecticidal proteins (Badran et al., 2016), metabolic enzymes (Roth et al., 2019), aminoacyl-tRNA synthetases (Bryson et al., 2017), and DNA-binding proteins (Hu et al., 2018; Hubbard et al., 2015). In PACE, a population of bacteriophage (selection phage, SP) is continuously diluted by E. coli host cells (Esvelt et al., 2011). These SP lack gene III (gIII), which encodes the coat protein pIII that is essential for phage infectivity, and instead express the protein to be evolved.

[00499] SP carrying protein variants with desired activity are able to trigger the production of pIII from an accessory plasmid (AP) in the host cells, thus generating infectious progeny and allowing the SP population to persist despite continuous dilution. Conversely, SP encoding inactive variants cannot trigger pIII production, and produce non-infectious progeny that are rapidly diluted out of the system. The SP genome is continuously mutagenized by a mutagenesis plasmid (MP), thus generating diversity in the evolving protein of interest.
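The persistence condition described above (SP must replicate faster than they are diluted) can be sketched with a simple exponential model in Python. The constant replication rates below are hypothetical, and the model ignores host-cell and infection dynamics; the flow rate of 1.3 lagoon volumes per hour is the average used in the campaigns described in paragraph [00509].

```python
import math


def titer_after(hours: float, initial_titer: float,
                replication_rate: float, dilution_rate: float) -> float:
    """Phage titer in a lagoon after `hours`, using a simple exponential model.

    replication_rate: net phage replication rate (per hour), assumed constant.
    dilution_rate: lagoon volumes exchanged per hour (the PACE flow rate).
    The titer grows only if replication_rate exceeds dilution_rate.
    """
    return initial_titer * math.exp((replication_rate - dilution_rate) * hours)


FLOW = 1.3  # lagoon volumes per hour, the average flow rate cited in [00509]

# Hypothetical replication rates for an active and an inactive variant.
for label, rate in [("active variant", 2.0), ("inactive variant", 0.2)]:
    print(label, f"{titer_after(24, 1e8, rate, FLOW):.2e}")
```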

[00500] PACE was used to evolve SpCas9 variants with broadened PAM compatibility by linking PAM recognition to SP propagation through a bacterial one-hybrid protein:DNA binding selection (Hu et al., 2018). In this selection system, binding of a nuclease-inactive dSpCas9 variant fused to the E. coli RNA polymerase omega subunit (ω–dSpCas9) to a target protospacer-PAM sequence recruits E. coli RNA polymerase to drive gIII transcription from an adjacent σ70 promoter (FIG.36(A)). Only SP carrying ω–dSpCas9 variants capable of binding to the target PAM sequence will produce infectious progeny phage and replicate during PACE (Hu et al., 2018). Evolving SpCas9 against a mixture of all possible NNN or HHH (H = non-G) PAMs using this selection led to xCas9, which can bind some NG PAMs, but very few non-G PAMs (Hu et al., 2018). We hypothesized that during the evolution of xCas9, the use of a complex mixture of many PAMs reduced the selection pressure for binding activity on any specific PAM. Therefore, we reasoned that selecting for binding to specific PAM sequences in parallel PACE experiments might result in SpCas9 variants with better recognition of non-canonical PAMs.

[00501] To determine which non-G PAMs might become accessible upon extensive SpCas9 evolution, we performed phage propagation assays, which serve as a proxy for a protein’s activity on a defined target, using SP encoding either SpCas9 or xCas9 on host cells containing APs spanning all 64 NNN PAM sequences (FIG.36(B)). While SP encoding SpCas9 and xCas9 propagated on many G-containing PAMs, SP encoding xCas9 and, to a more limited extent, SpCas9 also showed modest propagation on host cells containing NAN PAM APs (FIG.36(B)). We therefore focused our evolution efforts on the NAN subset of PAM sequence space.

[00502] We began by using phage-assisted non-continuous evolution (PANCE) (Roth et al., 2019; Suzuki et al., 2017), in which SP are iteratively passaged through serial dilution in plate wells containing host cells, to evolve either SpCas9 or xCas9 for binding to each of the 16 possible NAN PAM target sequences in parallel (FIG.36(C)). While slower than PACE, PANCE is less stringent, enabling weakly active variants to replicate (Roth et al., 2019), and can be performed in higher throughput, allowing us to evolve toward many different targets simultaneously. After 19 rounds of serial dilution in PANCE (total net phage replication of ~10^38-fold) on each of the 16 NAN PAM variants in parallel, we observed mutations that differed largely according to the 3rd base of the NAN PAM targeted for evolution (FIG.36(D)). For example, variants selected on NAA enriched Gly, Ile, or Lys at position 1333, while those selected on NAT enriched Gln or Leu at position 1335.

[00503] Finally, variants evolved to bind NAC simultaneously acquired Gln at position 1335 and Asn at position 1337. Given this early divergence, we decided to divide the evolution of these SpCas9 variants into three separate non-G PAM trajectories: HAA, HAT, and HAC. Because our goal was to evolve SpCas9 to recognize non-G PAMs, we chose to exclude NAG from our targets; additionally, NG-targeting SpCas9 variants have been reported (Hu et al., 2018; Nishimasu et al., 2018), which in theory should allow targeting of sites with NAG PAMs by simply shifting the protospacer sequence by a single nt in the 3’ direction.

New Cas9 PACE selections enable evolution of NAA PAM binding activity

[00504] We initially focused on the NAA PAM trajectory. Despite enriching for multiple consensus mutations in the PAM-interacting domain (PID; residues 1099-1368) (D1135N, E1219V, Q1221H, H1264Y, A1320V, R1333K), our NAA-targeted PANCE-evolved variants exhibited low base editing activity when subcloned into C to T base editors (CBEs) and tested on sites containing NAA PAMs in mammalian cells (clone GAA.N1-4; FIG.37C). We hypothesized that evolving increased binding activity might improve editing efficiencies, and implemented three strategies to increase selection stringency.

[00505] First, we required that the evolving SpCas9 also bind a second, distinct protospacer by using a dual-AP system. In this system, each AP provides one half of split-intein pIII (Wang et al., 2018) under control of the Cas9 one-hybrid circuit. Binding of the SpCas9 variant to both sites produces both pIII-intein halves, which must be coexpressed to splice and generate functional full-length pIII (FIG.37A). We chose two variants evolved in PANCE (GAA.N1-2 and GAA.N1-4; FIGs.37B and 37D) and subjected them to PACE using this dual-AP system. These experiments, which also targeted a CAA PAM, led to the acquisition of five additional consensus mutations (A10T, I322V, S409I, E427G, and G715C; FIG.44B), which together in clone CAA.P1-1 improved CBE activity on sites with NAA PAMs in mammalian cells 4.2-fold on average compared with PANCE-evolved variant GAA.N1-4 (FIG.37C).

[00506] Second, we reasoned that production of large amounts of ω–dSpCas9 by the SP might saturate binding to protospacer-PAM sites even if the affinity of the SpCas9 variant for that PAM was modest. Indeed, previous reports have shown that higher concentrations of SpCas9 can lead to recognition of non-canonical PAM sequences (Karvelis et al., 2015), despite modest binding of these sequences by SpCas9. Unfortunately, because both the promoter and the ribosome-binding site for ω–dSpCas9 are encoded on the SP, the total amount of ω–dSpCas9 produced is subject to selection in PACE and thus falls outside of experimenter control.

[00507] Therefore, we sought to limit the total amount of SpCas9 present in the selection by using a split intein to divide ω–dSpCas9. Here, only the C-terminal segment of dSpCas9 (residues 574-1368) fused to NpuC (dSpCas9C) is encoded on the evolving SP, while the ω-fused N-terminal portion (residues 1-573) fused to NpuN (ω–dSpCas9N) is provided on an immutable complementary plasmid (CP) in the host cells (FIGs.37A and 43B). This strategy (hereafter, “split-SpCas9”) allows the total amount of full-length SpCas9 produced in the host cells during PACE to be limited by the expression level of ω–dSpCas9N from the CP.

[00508] We subcloned the mutations obtained from clone CAA.P1-1 (FIG.37B), evolved using the dual-AP selection, into the split-SpCas9 format. Four mutations (T10A, I322V, S409I, and E427G) had accumulated in residues 1-573 of this clone. To investigate their effect, we compared the activity of ω–dSpCas9N with that of ω–dSpCas9N(T10A I322V S409I E427G) (hereafter referred to as ω–dSpCas9N-mut) in overnight phage propagation assays using phage encoding dSpCas9C derived from CAA.P1-1. We observed greater phage propagation on host cells with a CP encoding ω–dSpCas9N-mut (FIG.43D), suggesting that these four mutations might have a beneficial effect on SpCas9 binding. Therefore, we used ω–dSpCas9N-mut in the CP for all subsequent evolution efforts.

[00509] We subjected our evolved CAA.P1-1 dSpCas9C to two subsequent PACE campaigns (8 and 3 days, respectively, at average flow rates of 1.3 V/h) using host cells harboring an AP containing an AAA or CAA PAM target site and CPs providing successively decreasing amounts of ω–dSpCas9N-mut (see Methods for details). These rounds led to the accumulation of additional mutations in the PID, including D1180G, which was present in several sequenced clones (CAA.P2-2, AAA.P3-1, CAA.P3-1,2; FIG.37B).

[00510] Among 10 surviving clones randomly chosen for sequencing, we observed 7-17 non-silent mutations per clone (FIG. 37B). Of these, the SpCas9 variant CAA.P2-2 exhibited a large increase in mammalian cell base editing activity, with more than double the activity of our previous variants on most NAA sites tested (FIG. 37C). Third, to further increase selection stringency, we removed gene VI (gVI), which is essential for phage propagation (Brödel et al., 2016), from the SP for use as a second selection marker (in addition to gIII) in PACE. This strategy allowed us to combine both selection modifications described above by requiring a split-dSpCas9 to bind each of two distinct protospacers in order to express both gIII and gVI (FIG. 37A).

[00511] Thus, three dSpCas9C clones from our previous evolutions (CAA.P2-1, CAA.P2-2, and CAA.P3-1) were subjected to this highest-stringency selection in PACE, resulting in additional mutations in the PID. Most notably, R1114G and L1318S were both highly enriched among sequenced surviving variants, which on average contained 20 non-silent mutations relative to SpCas9 (TAA.P4; FIG. 37B). When tested in mammalian cell CBE experiments, these variants showed little editing activity (FIG. 37C). We theorized that the large number of mutations present in these highly evolved variants, especially those outside of the PID, might prove deleterious to expression and/or inactivate either nuclease domain. To address this possibility, we performed DNA shuffling of the C-terminal portion (residues 574-1368) of the pool of variants from this final evolution with wild-type SpCas9(574-1368), allowing deleterious mutations to be lost while shuffling mutations among the pool members, and re-subjected the resulting library to our most stringent binding selection. This “backcrossing” process led to the isolation of clone TAA.P4s-4 (R1114G, D1135N, V1139A, D1180G, E1219V, Q1221H, A1320V, R1333K) (FIG. 37B and 37D), which demonstrated a 1.2-fold increase in activity relative to the previous best PACE-evolved variant across all HAA sites tested (FIG. 37C).

Evolution of SpCas9 variants that recognize NAT or NAC PAM sequences

[00512] Based on the outcomes of the NAA PAM evolution campaigns, we approached the evolution of SpCas9 variants capable of recognizing NAT and NAC PAM sites in a fashion that avoids potentially deleterious bystander mutations. To ensure that we started with nuclease-active variants, we developed a modified version of a previously reported (Kleinstiver et al., 2015b) bacterial DNA cleavage selection (FIG.43E and 43F). In this nuclease selection, SpCas9 variants are challenged for their ability to cleave a protospacer-PAM sequence on a high-copy plasmid that also encodes a conditionally toxic gene (sacB). The surviving cells encode nuclease-active SpCas9 variants that cleave the target sequence, destroying the toxic plasmid.

[00513] Thus, we converted the dSpCas9 clones from the NAT or NAC PANCE pools into nuclease-active forms by restoring Asp10 and His840, then passed the resulting libraries through the nuclease selection using a TAT or CAC PAM, respectively. From this, we isolated two clones (SacB.TAT-1 and -2; FIG. 37E) that exhibited DNA cleavage activity on the TAT PAM with PID consensus mutations D1135N, E1219V, Q1221H, P1321S, and R1335L, and a third clone that cleaved a CAC PAM with PID mutations D1135N, E1219V, D1332N, R1335Q, and T1337N (SacB.CAC; FIG. 37F). We evolved these nuclease-active TAT and CAC variants in split-dSpCas9 PACE using host cells encoding APs with either NAT (AAT or TAT) PAMs or NAC (AAC, TAC, or CAC) PAMs, respectively. These experiments resulted in the enrichment of several additional PID mutations, including R1114G, which arose independently in all three trajectories (NAA, NAT, and NAC) (Figures 37B, 37E, and 37F), suggesting that this mutation may be generally beneficial for modifying PAM recognition by the PID in a manner compatible with NAN PAMs.

[00514] Next, we removed gVI from these evolved SP pools and subjected them to additional selection in split-dSpCas9 PACE using the dual-AP system (FIG. 37A and FIG. 43C). Both protospacers contained either an AAT or TAC PAM for evolution following the NAT or NAC trajectory, respectively. Increasing stringency for the NAT-targeting SpCas9 did not improve activity despite enrichment of several mutations (TAT.P6; FIG. 37D and 44A). We therefore carried forward the most active NAT PAM-targeting variant from the split-dSpCas9 evolution (TAT.P5-1; Figure 37D). This variant contained the 11 PID mutations R1114G, D1135N, D1180G, G1218S, E1219V, Q1221H, P1249S, E1253K, P1321S, D1332G, and R1335L (Figures 37E and 37G). PACE of NAC-targeting split-dSpCas9 using dual protospacers and a TAC PAM also enriched for several mutations (TAC.P9; Figure 37G). We shuffled residues 574-1368 of the surviving clones with those of SpCas9 and re-challenged the resulting library with our most stringent binding selection (TAC.P9s; Figure 37G). From the surviving SP pool, we isolated clone TAC.P9s-3 with the PID mutations R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N, S1338T, and H1249R (Figures 37F and 37H).

Mutations outside of the PID

[00515] Structural studies of SpCas9 suggest that residues in the PID mediate PAM specificity (Anders et al., 2014). Indeed, most previous efforts to engineer or evolve SpCas9 to accept alternative PAMs have focused on modulating this region of the protein (Kleinstiver et al., 2015b; Nishimasu et al., 2018). However, because our PANCE and PACE experiments allowed mutation of either the entire SpCas9 sequence or residues 574-1368 (in the case of split-intein ω–dSpCas9), we observed the enrichment of anywhere from 3 to 15 mutations outside of the PID. Because many of these mutations fell within the RuvC or HNH nuclease domains, we anticipated that some would negatively impact SpCas9 nuclease activity (Jiang and Doudna, 2017). However, other mutations in the helical domain consistently enriched across several independent evolving populations, suggesting that they may confer a beneficial effect on SpCas9 DNA binding/unwinding. To minimize the deleterious effects of bystander mutations accumulated in the nuclease domains while preserving beneficial mutations in the helical domain, we decided to transplant our evolved PIDs onto a fixed N-terminal sequence that included the mutations T10A, I322V, S409I, and E427G, which we found to improve phage propagation in the split-dSpCas9 selection (FIG. 43D), as well as R654L and R753G, which consistently enriched across multiple independent PACE experiments (Figure 44B).

[00516] The addition of all six NTD mutations to CBEs containing the PIDs of NAA variant TAA.P4s-4 and NAC variant TAC.P9s-3 improved CBE activity in mammalian cells across several sites compared with SpCas9 variants containing only the evolved PID mutations (Figure 44C). A smaller benefit was observed when the NTD mutations were added to the PID mutations of NAT variant TAT.P5-1 (Figure 44C). We incorporated these six N-terminal mutations into all three final variants, hereafter referred to as SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH, which combine T10A, I322V, S409I, E427G, R654L, and R753G with the evolved PID mutations from TAA.P4s-4, TAT.P5-1, and TAC.P9s-3, respectively.
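Each final variant is thus defined as a reference sequence plus a list of point mutations written in standard notation (wild-type residue, position, substituted residue). The Python sketch below illustrates applying such a list while checking that each stated wild-type residue is actually present; the helper name and the toy sequence are hypothetical, and no SpCas9 sequence is reproduced here.

```python
import re

# Standard point-mutation notation: wild-type residue, 1-based position,
# substituted residue (e.g. "R654L").
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutations(seq: str, mutations: list[str]) -> str:
    """Return `seq` with each listed point mutation applied.

    Hypothetical helper for illustration. Each mutation is checked against
    the residue actually present at that position, so a mutation written
    against the wrong reference raises ValueError instead of silently
    producing a different protein.
    """
    residues = list(seq)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation notation: {mutation}")
        wild_type, position, substitute = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: residue {position} is {residues[position - 1]}, not {wild_type}"
            )
        residues[position - 1] = substitute
    return "".join(residues)


if __name__ == "__main__":
    toy = "MDKRA"  # toy 5-residue sequence, not SpCas9
    print(apply_point_mutations(toy, ["D2N", "R4G"]))  # MNKGA
```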

Characterization of PAM specificity through bacterial depletion

[00517] To better characterize the PAM specificities of our evolved variants as nucleases, we performed bacterial PAM depletion using a NNNN PAM library (Kleinstiver et al., 2015b). For comparison, we also performed depletion experiments with SpCas9-NG in parallel. Cells were plated after 1-hour, 3-hour, or overnight expression of the SpCas9 variant from an inducible promoter to assess kinetic differences in PAM sequence preference. Consistent with the eventual cleavage of even modestly recognized PAMs, depletion scores of any given PAM (defined as the frequency of the PAM in the input library divided by the frequency of the PAM post-selection) increased with longer induction times, with the shortest induction times resulting in the most noticeable sequence preferences (Figure 45A).
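The depletion score defined above is a ratio of two frequencies, so a PAM whose plasmids are efficiently cleaved (and therefore lost) receives a high score. A minimal Python sketch of the calculation follows; the function name, the toy read counts, and the pseudocount used to keep fully depleted PAMs finite are assumptions for illustration, not details of the described experiments.

```python
def depletion_scores(
    input_counts: dict[str, int],
    selected_counts: dict[str, int],
    pseudocount: float = 0.5,
) -> dict[str, float]:
    """Depletion score per PAM: input-library frequency / post-selection frequency.

    A pseudocount (an assumption, not stated in the text) is added to every
    post-selection count so that a completely depleted PAM gives a finite score.
    """
    pams = list(input_counts)
    total_input = sum(input_counts.values())
    total_selected = sum(selected_counts.get(p, 0) for p in pams) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        input_freq = input_counts[pam] / total_input
        selected_freq = (selected_counts.get(pam, 0) + pseudocount) / total_selected
        scores[pam] = input_freq / selected_freq
    return scores


if __name__ == "__main__":
    # Toy read counts, invented for illustration only.
    before = {"GAAC": 1000, "GGGT": 1000, "TCTC": 1000}
    after = {"GAAC": 20, "GGGT": 200, "TCTC": 1000}
    ranked = sorted(depletion_scores(before, after).items(), key=lambda kv: -kv[1])
    for pam, score in ranked:
        print(pam, round(score, 2))  # most depleted (best cleaved) PAM first
```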

[00518] For example, at shorter induction times, SpCas9-NRRH exhibited a strong preference for C at the 4th PAM position, a mixed preference for purines at positions 2 and 3 and a moderate preference for G at position 1 (Figure 38A). However, longer induction times resulted in more relaxed preferences at all PAM positions. Similarly, SpCas9-NRCH showed a strong preference for G at position 2 and a moderate preference for pyrimidines at position 4 (Figure 38A) at shorter inductions, but only a mixed enrichment for purines at position 2 was observable at longer induction times (Figures 38A and 45A). Finally, at short induction times, SpCas9-NRTH enriched strongly for G and T at positions 2 and 3, respectively (Figure 38A), but the nucleotide preference at position 2 shifted to a mix of G and A at longer timepoints, suggesting that this variant recognizes and cleaves NAT PAMs more slowly than NGT PAMs. These results also suggest that SpCas9-NRTH may preferentially recognize NGT over NGG PAMs, as the NGT PAMs were more strongly depleted than NGG PAMs (average depletion score of 1394 for NGT compared to 223 for NGG at 1 h induction).

[00519] Interestingly, SpCas9-NG displayed a moderate preference for G at the 3rd and 4th PAM positions at short induction times. This finding is consistent with the T1337R mutation in SpCas9-NG, which is also found in SpCas9 VRER and VRQR (Kleinstiver et al., 2015b) and is the basis of the increased specificity for G at the 4th PAM position in these two variants (Anders et al., 2016; Hirano et al., 2016b; Kleinstiver et al., 2015b). Similar to the evolved SpCas9s described here, SpCas9-NG’s PAM sequence requirements also became more relaxed with longer induction times (Figure 45A).

Evolved SpCas9 nucleases generate indels at endogenous human genomic loci

[00520] Next, we assessed the activity of our evolved variants in HEK293T cells through indel formation at 64 endogenous target sites spanning all possible NANN PAMs. For comparison, we also tested the activity of SpCas9-NG at these sites in parallel. Generally, each of our variants displayed the highest indel formation activity on target sites containing a PAM it was evolved to recognize, with SpCas9-NRRH and -NRTH showing an average of 23±4.5% and 23±4.1% indel formation on target sites containing NAAN and NATN PAMs, respectively. Sites containing NACN PAMs were edited at slightly lower efficiencies, with SpCas9-NRCH averaging 18±3.4% indel formation. Additionally, SpCas9-NRRH displayed 23±4.3% average indel formation on sites containing a NAG PAM, even though it had not been evolved to bind this PAM sequence (Figure 38B). Indel formation activity of xCas9 was also examined at a subset of NAN sites and found to be minimal (Figure 45B).

[00521] Interestingly, we also observed indel formation with SpCas9-NG at some NANN sites. Although its average indel formation across these sites was lower than that of our evolved variants, SpCas9-NG displayed activity at sites with NANG PAMs (NAAG: 12±1.7%, NACG: 14±3.0%, NATG: 23±3.6%, NAGG: 20±2.5% average indel formation) (Figure 38B). In contrast, our evolved variants showed the lowest average activity at sites with a G at PAM position 4, and the highest at sites with a non-G (H) base at this position (27±5.0%, 27±4.7%, 24±3.9%, and 27±4.4% average indel formation for SpCas9-NRRH, -NRTH, -NRCH, and -NRRH on NAAH, NATH, NACH, and NAGH PAMs, respectively) (Figures 38B and 38C). These results are consistent with the sequence preferences predicted by our bacterial PAM depletion experiments and suggest that our variants and SpCas9-NG exhibit complementary PAM specificities, especially with respect to non-G versus G bases at the 4th position.
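The PAM classes used throughout this section are written with IUPAC degenerate nucleotide codes (for example, R = A or G; H = A, C, or T, i.e., non-G; N = any base). The Python sketch below, with hypothetical function names, checks whether a concrete PAM falls within such a class and enumerates the members of a class; for instance, NAAH covers 12 PAMs and NANN covers 64.

```python
from itertools import product

# IUPAC nucleotide codes used in PAM names such as NRRH, NANG, or NAAH.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(pam: str, pattern: str) -> bool:
    """True if a concrete PAM (e.g. 'GAAT') fits a degenerate pattern (e.g. 'NRRH')."""
    if len(pam) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def expand_pattern(pattern: str) -> list[str]:
    """List every concrete PAM covered by a degenerate pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in pattern.upper()))]


if __name__ == "__main__":
    print(matches_pam("GAAT", "NRRH"))  # True
    print(matches_pam("GAAG", "NRRH"))  # False: H at position 4 excludes G
    print(len(expand_pattern("NAAH")), len(expand_pattern("NANN")))  # 12 64
```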

[00522] We also tested the indel formation activity of our evolved variants and SpCas9-NG on a number of endogenous target sites containing NGN, rather than NAN, PAMs. While treatment with SpCas9-NG gave rise to robust indel formation on most NGN PAMs examined (48±4.4%), SpCas9-NRTH and -NRCH showed slightly higher activity than SpCas9-NG at NGT and NGC PAMs, with 68±3.9% and 42±2.5% average indel formation, respectively (Figure 45C). Consistent with the PAM depletion assay results, a preference for H at position 4 of the PAM was observed in these experiments for SpCas9-NRTH and -NRCH.

DNA specificity of evolved SpCas9 nucleases

[00523] Because broadening the PAM targeting capabilities of Cas9 variants has been shown to increase the proportion of genomic off-target edits (Kleinstiver et al., 2015a; Nishimasu et al., 2018), we performed genome-wide, unbiased identification of double-strand breaks enabled by sequencing (GUIDE-seq) using SpCas9, SpCas9-NRRH, -NRCH, and -NRTH in U2OS cells (Tsai et al., 2015). For comparison, we also analyzed xCas9, which was previously shown to possess reduced off-target activity (Hu et al., 2018). These experiments showed that, when targeting the highly promiscuous HEK site 4 (HEK4) (Tsai et al., 2015), our evolved variants displayed comparable or better on-target activity (8.8%, 22.5%, and 7.8% on-target reads of total reads for SpCas9-NRRH, -NRTH, and -NRCH, respectively) than SpCas9 (5.1% of total reads) (Figure 38D and 45D). This is similar to xCas9, which also exhibited improved on-target activity (12.7% of total reads) relative to SpCas9 (Figure 38D and 45D) (Hu et al., 2018). Interestingly, our variants primarily displayed off-target activity at sites containing PAMs consistent with their evolved preferences. For example, the most prominent off-target site for SpCas9-NRRH bears a CAA PAM (10% of total reads), that for SpCas9-NRTH a GGT PAM (10.2% of total reads), and that for SpCas9-NRCH a TGC PAM (9.9% of total reads) (Figure 45D).

[00524] Off-target editing was also observed at sites with NRN PAMs, such as GAA, GAT, and CAG, for these evolved SpCas9s (Figure 45D). Taken together, these results suggest that our evolved variants may have similar or increased DNA specificity compared to SpCas9 on sites with NGG PAMs, and due to their altered PAM specificities may access a different set of off-target sequences.

Evolved SpCas9s support cytosine and adenine base editing

[00525] Since expanding the targeting scope of base editing was a major motivation behind our efforts, next we determined the ability of our evolved SpCas9s to support both cytosine and adenine base editing. We generated CBEs by incorporating our evolved variants into BE4max (Koblan et al., 2018) (hereafter referred to as “BE4”) in place of SpCas9 and tested their activity at the same 64 endogenous NANN PAM sites examined above for indel formation. As with their nuclease forms, each of the three evolved CBE variants showed the highest average activity on sites containing the PAM it was evolved to recognize. BE4-NRRH and BE4-NRTH performed best on NAAN and NATN PAMs with an average of 12±2.1% and 17±2.3% C•G to T•A conversion, respectively. CBE activity on NACN PAMs was slightly less efficient, with BE4-NRCH enabling the highest editing activity at these sites at an average of 11±1.7% base conversion. Both BE4-NRRH and BE4-NG (generated from SpCas9-NG) edit NAGN sites similarly, at 12±2.8% and 11±2.1% average base conversion (Figure 39A).

[00526] Improved editing activity was again observed on sites with NANH PAMs, where C•G to T•A conversion at NAAH, NATH, NACH, and NAGH sites increased to 14±2.4%, 21±2.5%, 13.0±2.0%, and 14±2.3% for BE4-NRRH, -NRTH, -NRCH, and -NRRH, respectively (Figure 39A and 39B). BE4-NG performed well at sites containing NANG PAMs, with 14±1.3% average editing (Figure 39A). Average CBE editing efficiency across all 64 sites was lower than that of indel formation, likely because efficient base editing imposes additional requirements, such as a favorable sequence context and positioning of the target C within the editing window.

[00527] These editors also function on sites with NGN PAMs, editing at 17±2.3%, 9.1±3.0%, 19±2.9% and 20±4.0% for BE4-NRRH, -NRTH, -NRCH, and -NG, respectively (Figure 46A).

Finally, we also generated ABEmax (Koblan et al., 2018) variants (hereafter referred to as “ABE”) from SpCas9-NRRH, -NRTH, -NRCH, and SpCas9-NG, and tested adenine base editing at 54 endogenous loci. We observed that the newly evolved variants are also compatible with adenine base editing, exhibiting similar performance on a subset of sites containing NAN and NGN PAMs as we observed for the corresponding CBEs and nucleases. For example, ABE-NRRH, -NRTH, -NRCH, and -NRRH edited most efficiently at NAAH, NATH, NACH, and NAGH PAMs, with 16±2.6%, 24±2.9%, 13±2.2%, and 26±3.5% base conversion, respectively (Figure 39C and 46B).

[00528] The scope of base editing is limited by the requirement that the target base be located within the canonical CBE or ABE editing window (approximately protospacer positions 4-8, counting the PAM as positions 21-23). The evolved variants SpCas9-NRRH, -NRCH, and -NRTH, together with SpCas9-NG and xCas9, expand the targeting scope of SpCas9 to cover the vast majority of NR PAMs, greatly increasing the fraction of known human pathogenic SNPs that can in theory be corrected by base editing.

[00529] Among all pathogenic SNPs in the ClinVar database (Landrum et al., 2014) that are correctable by C•G to T•A conversion, 95% are targetable in principle with CBEs derived from SpCas9-NRRH, -NRCH, -NRTH, or SpCas9-NG/xCas9. Likewise, 95% of pathogenic SNPs in ClinVar that are correctable via A•T to G•C conversion can now be targeted with ABEs derived from the same set of Cas9 variants (Figure 39D).

[00530] In addition, these new variants greatly increase the number of possible protospacers available for targeting a given SNP for base editing: on average, 2.7 protospacers are available per pathogenic SNP with NR PAMs, for SNPs targetable with either CBEs or ABEs, compared with 1.7 protospacers when using NG PAMs and 1.3 protospacers when using NGG PAMs only (in each case for both CBE and ABE) (Figure 39E).
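Counts of this kind reflect how many protospacers place a given target base inside the editing window next to a recognized PAM. The Python sketch below enumerates such protospacers on both strands for a single base, assuming a 20-nt protospacer, the approximate window of positions 4-8 stated above (a narrower ABE window such as positions 4-7 can be passed instead), and IUPAC PAM patterns; the function names and the synthetic sequence are hypothetical, and the actual ClinVar analysis may apply additional filters not modeled here.

```python
# IUPAC codes needed for the PAM patterns discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "H": "ACT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def matches(pam: str, pattern: str) -> bool:
    return len(pam) == len(pattern) and all(b in IUPAC[c] for b, c in zip(pam, pattern))


def window_protospacers(seq, index, edited_base, pam_pattern, window=(4, 8), spacer_len=20):
    """Find protospacers that place the base at seq[index] inside the editing window.

    Positions are 1-based within the protospacer, with the PAM starting at
    position spacer_len + 1 (21-23 for a 3-nt PAM). Both strands are searched;
    `edited_base` is the base the editor modifies on the protospacer strand
    (C for a CBE, A for an ABE). Sequence-context effects are ignored.
    """
    hits = []
    strands = (("+", seq, index), ("-", revcomp(seq), len(seq) - 1 - index))
    for strand, strand_seq, i in strands:
        if strand_seq[i] != edited_base:
            continue
        for position in range(window[0], window[1] + 1):
            start = i - (position - 1)  # protospacer start that puts the base at `position`
            pam_start = start + spacer_len
            pam = strand_seq[pam_start:pam_start + len(pam_pattern)]
            if start >= 0 and matches(pam, pam_pattern):
                hits.append((strand, position, strand_seq[start:pam_start], pam))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: target C at protospacer position 6, followed by a TGG PAM.
    seq = "AAAAACAAAAAAAAAAAAAA" + "TGG" + "TTTT"
    for hit in window_protospacers(seq, 5, "C", "NGG"):
        print(hit)  # ('+', 6, 'AAAAACAAAAAAAAAAAAAA', 'TGG')
```

The number of hits for a SNP under a given PAM pattern is the per-SNP protospacer count; relaxing the pattern (for example from NGG to NRN) can only add hits.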

[00531] Since many pathogenic SNPs correctable by current base editors contain multiple targetable bases within the editing window (Figure 46C), expansion to NR PAMs enables multiple targeting strategies for a given SNP to optimize editing of the desired base, as we explicitly demonstrate below.

[00532] Collectively, these findings establish that evolved Cas9 variants SpCas9-NRRH, -NRCH, and -NRTH are compatible with both CBEs and ABEs, and thereby expand the targeting scope of base editing substantially.

Characterization of evolved SpCas9s on a human cell library of 11,776 integrated target sites

[00533] To comprehensively profile the PAM preferences of these variants, we analyzed the CBE efficiencies of our three evolved variants, SpCas9-NG, and SpCas9 on a library of 11,776 unique sequences in human cells. This library was designed using 46 distinct protospacers derived from sequences found in the human genome, each with different sequence contexts surrounding a fixed C at protospacer position 6, counting the PAM as positions 21-23. Each protospacer is adjacent to a PAM sequence of 4Ns and is additionally flanked by designated primer-binding sites for amplification and high-throughput sequencing (HTS) analysis (Figure 40A).
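The stated library size is consistent with each of the 46 protospacers being paired with every possible four-nucleotide PAM, since 46 × 4^4 = 46 × 256 = 11,776. The Python sketch below makes that arithmetic explicit; the full-factorial pairing is an inference from the numbers rather than a stated design detail, and the placeholder labels stand in for protospacer sequences not listed here.

```python
from itertools import product


def nnnn_pams() -> list[str]:
    """All 4**4 = 256 four-nucleotide PAM sequences."""
    return ["".join(bases) for bases in product("ACGT", repeat=4)]


def build_library(protospacers: list[str]) -> list[tuple[str, str]]:
    """Pair every protospacer with every NNNN PAM (hypothetical full-factorial layout)."""
    pams = nnnn_pams()
    return [(spacer, pam) for spacer in protospacers for pam in pams]


if __name__ == "__main__":
    # 46 placeholder protospacer labels; the real sequences are not given here.
    protospacers = [f"protospacer_{i:02d}" for i in range(1, 47)]
    library = build_library(protospacers)
    print(len(nnnn_pams()), len(library))  # 256 11776
```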

[00534] Due to the very large number of target sites (Figure 47A), characterization of our evolved variants in this library format revealed PAM preferences in finer detail than our bacterial depletion and endogenous mammalian genomic site editing experiments (Figure 40B).

Consistent with these previous experiments, our evolved variants exhibited the highest editing activity when either A or G was present at the 2nd PAM position (Figures 40C and 47B), when the 3rd PAM base matched the base each variant was evolved to recognize (Figures 40D and 47B), and when a non-G base was present at the 4th position of the PAM (Figures 40E and 47B). BE4-NG also showed the highest editing activity when either A or G was present at the 2nd PAM position (Figure 40C), but, unlike our evolved variants, was most active when a G was present at the 4th position of the PAM (Figure 40E and 47B) or when G or T was at the 3rd position (Figure 40D). In contrast, we found that BE4 editing efficiency at sites containing its canonical NGG PAM or its alternate NAG/NGA PAMs showed virtually no dependence on the 4th PAM nucleotide (Figure 40B). BE4 also showed some editing at sites containing an NCGG or NTGG PAM, which could be due to PAM slippage (Jiang et al., 2013) resulting in binding to a canonical NGG sequence.

[00535] Interestingly, our evolved variants and SpCas9-NG exhibit some level of editing activity at many more non-canonical PAMs than SpCas9, supporting their broadened PAM scope (Figure 40B). Finally, both SpCas9-NG and our variants (most notably BE4-NRRH) performed best when a G was present at position 1 of the PAM and worst when a T was at this position; in contrast, BE4 exhibited only a slight preference for G at position 1 (Figures 40B, 47B, and 47C). Taken together, these results strongly support the PAM preferences observed in our bacterial depletion and endogenous mammalian genome editing experiments: specifically, recognition of NRRH, NRCH, NRTH, and NRNG PAMs by SpCas9-NRRH, -NRCH, -NRTH, and -NG, respectively.

[00536] Additionally, this library allowed us to investigate the tolerance of our variants to mismatches between the sgRNA and the target DNA sequence. The U6 promoter, commonly used to express sgRNAs in mammalian cells, initiates transcription with a 5’ G. If a G is not natively present at the 5’ end of the protospacer, guide sequences are typically either extended to the next native G or simply transcribed with a mismatched 5’ G at position -1 of the guide sequence. However, high-fidelity (HF) SpCas9s (Chen et al., 2017; Hu et al., 2018; Kleinstiver et al., 2016; Lee et al., 2018b; Slaymaker et al., 2016), which are less tolerant of mismatches between the protospacer and sgRNA, generally exhibit decreased efficiency when using a 21-nucleotide (nt) guide with a mismatched 5’ G (Kim et al., 2017b; Zhang et al., 2017). Because PACE has previously yielded SpCas9s with HF properties (Hu et al., 2018), we sought to determine whether our new variants shared the same characteristics.

[00537] We investigated the average base editing activity of our evolved variants across all 11,776 library sites containing either a 20 nt protospacer with a matched 5’ G (“20-matched”), a 21 nt protospacer with a matched 5’ G (“21-matched”), or a 21 nt protospacer with a mismatched 5’ G (“21-mismatched”). Our three evolved SpCas9 variants and SpCas9 all showed the highest base editing activity with a 20-matched sgRNA (Figures 40F, 40G, and 47D-F); interestingly, however, SpCas9-NG performed best with a 21-matched sgRNA (Figures 40F and 47D-F). When examining all NRNN PAMs, our variants and SpCas9 also showed a significant decrease in base editing efficiency when the sgRNA protospacer was lengthened to 21 nt, regardless of whether the 5’ G matched the target sequence (Figures 40F, 40G, 47D, and 47E); in contrast, for SpCas9-NG this was true only with the 21-mismatched sgRNA (Figures 40G, 47D, and 47E). The magnitude of this decrease was similar or greater for our evolved variants (SpCas9-NRRH: 23±2.7%, SpCas9-NRTH: 12±2.9%, SpCas9-NRCH: 14±2.9%) compared with SpCas9 (13±5.3%). In contrast, SpCas9-NG demonstrated a preference for 21-matched sgRNAs, leading to an average 18.5±5.4% increase in editing efficiency compared with 20-matched sgRNAs (Figures 40F, 47D, and 47E); however, a decrease in editing efficiency was still observed with 21-mismatched sgRNAs (7.3±3.2%; Figures 40G, 47D, and 47E). Interestingly, the deleterious effect of using a 21 nt protospacer on the editing efficiency of our evolved variants and SpCas9 is lessened when targeting sites with NGNN or NGGN PAMs (Figures 40F, 40G, 47D, and 47F). This is especially true for SpCas9, which shows no significant decrease in base editing activity on sites with a 21 nt matched or mismatched protospacer when the PAM is NGG (Figures 40F and 47F).
Together, these results suggest that our evolved variants are somewhat sensitive to the use of 21 nt sgRNA protospacers, and that this sensitivity is exacerbated by the presence of 5’ G mismatches. Additionally, these experiments suggest that the optimal sgRNA protospacer length for SpCas9-NG may be longer than 20 nt.
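The three guide categories follow from the U6 requirement for a 5’ G. The Python sketch below assigns a spacer and category from a 20-nt protospacer and the genomic base at position -1; it is a simplified rule set assumed for illustration (it only considers extension by a single native G, not extension to a more distant G), and the function name and toy sequences are hypothetical.

```python
def u6_spacer(upstream_base: str, protospacer: str) -> tuple[str, str]:
    """Return (spacer, category) for expression from a U6 promoter.

    Simplified rule set (an assumption for illustration): U6 transcripts
    start with G, so
      * a protospacer that already begins with G is used as-is (20-matched);
      * otherwise, if the genomic base at position -1 is G, the spacer is
        extended by that native G (21-matched);
      * otherwise a non-native G is prepended (21-mismatched).
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    if protospacer[0] == "G":
        return protospacer, "20-matched"
    if upstream_base == "G":
        return upstream_base + protospacer, "21-matched"
    return "G" + protospacer, "21-mismatched"


if __name__ == "__main__":
    # Toy protospacers (invented sequences), each preceded by one genomic base.
    examples = [("T", "GACCTGAAGTCCAAGTTCAT"),
                ("G", "ACCTGAAGTCCAAGTTCATG"),
                ("C", "ACCTGAAGTCCAAGTTCATG")]
    for upstream, protospacer in examples:
        print(u6_spacer(upstream, protospacer))
```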

Evolved SpCas9s enable efficient base editing of a pathogenic SNP

[00538] To demonstrate the utility of our evolved SpCas9 variants in a disease-relevant context, we targeted the Glu to Val point mutation at amino acid 6 of β-globin (HBB), which results in the HbS allele that is the most common cause of sickle-cell anemia (Rees et al., 2010). The HbS mutation arises from a GAG (Glu) to GTG (Val) codon change that cannot be reverted through current base editing technologies. However, this SNP can be edited with ABE to a GCG (Ala) through A•T to G•C conversion on the opposite strand (Figure 41A). The resulting HBB E6A genotype, known as the hemoglobin Makassar allele (HbG), has been reported as clinically normal in homozygous and heterozygous individuals (Quentin Blackwell et al., 1970; Sangkitporn et al., 2002; Viprakasit et al., 2002).
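The strand logic of this edit can be made concrete. In the sense-strand GTG codon, the middle T pairs with an A on the antisense strand; an ABE acting on a protospacer located on the antisense strand converts that A to G, so the sense strand then reads GCG. Reverting GTG to GAG would instead require a T•A to A•T transversion, which lies outside the transitions (C•G to T•A and A•T to G•C) performed by the CBEs and ABEs described here. The Python sketch below traces the conversion; the helper names are hypothetical and the codon table is limited to the three codons named in this paragraph.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Only the three codons discussed in this example.
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala"}


def revcomp(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def abe_edit_antisense(codon: str, sense_index: int) -> str:
    """Model an A-to-G edit made on the antisense strand.

    `sense_index` is the 0-based position of the base pair within the codon
    as read on the sense strand; the returned codon is read on the sense strand.
    """
    antisense = revcomp(codon)
    j = len(codon) - 1 - sense_index  # same base pair, indexed on the antisense strand
    if antisense[j] != "A":
        raise ValueError("an ABE can only convert an A on the protospacer strand")
    edited = antisense[:j] + "G" + antisense[j + 1:]
    return revcomp(edited)


if __name__ == "__main__":
    sickle = "GTG"  # HbS codon 6 on the sense strand
    makassar = abe_edit_antisense(sickle, 1)  # edit the middle T•A pair
    print(sickle, CODON_TO_AA[sickle], "->", makassar, CODON_TO_AA[makassar])
    # GTG Val -> GCG Ala
```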

[00539] Unfortunately, the only NGG or NGN PAMs available at this site place the target A at either protospacer position 2 or 9, respectively, which fall outside the optimal editing window for ABE (positions 4-7) (Rees and Liu, 2018). However, two alternative protospacer sequences using a CAT or CAC PAM place the target A at position 4 or 7, respectively, with a bystander A at position 6 or 9 whose editing yields a silent CCT to CCC (Pro to Pro) change. We therefore tested the ability of our evolved variants, along with SpCas9-NG, to convert the sickle-cell SNP to the Makassar mutation using these two protospacers with non-G PAMs. We transfected ABE-NRRH, -NRTH, and -NRCH, or ABE-NG into HEK293T cells with homozygous GAG to GTG mutations at codon 6 of HBB (Figure 48A). While ABEs derived from the SpCas9 variants evolved in this study supported substantial (14-55%) A•T-to-G•C conversion using guide RNAs targeting either the CAT PAM or the CAC PAM site, ABE-NG edited efficiently (40±0.2%) only using the protospacer containing a CAT PAM (Figures 41B and 41C), perhaps because the G that follows this PAM (CATG) improves SpCas9-NG’s recognition of NAN PAMs (see above).

[00540] Unfortunately, editing using the CAT PAM protospacer occurred primarily at the silent bystander base (position 6), with the target A (position 4) showing less than 10% editing across all four ABEs tested (Figures 41B and 48B).

[00541] Target base conversion of GTG to GCG in codon 6 of HBB using the CAC PAM protospacer, however, was much more efficient. As expected, ABE-NRCH showed the highest editing activity, with 41±3.8% base conversion at the target A (position 7) and 13±3.2% at the silent bystander A (position 9). ABE-NRRH and ABE-NRTH achieved 29±4.3% and 14±2.8% conversion, respectively (Figures 41C and 48C). In comparison, ABE-NG showed negligible (1.0±0.5%) target base conversion activity at this site (Figures 41C and 48C). Collectively, these results demonstrate that our evolved SpCas9 variants enable efficient base editing of previously inaccessible disease-relevant SNPs using non-G PAMs, and furthermore highlight the utility of evaluating multiple protospacer/PAM sequences for targeting a desired SNP.

Discussion

[00542] Here we report three new variants of SpCas9, evolved using phage-assisted continuous evolution (PACE), that are capable of recognizing NRRH (SEQ ID NO: 149), NRCH (SEQ ID NO: 150), and NRTH (SEQ ID NO: 151) PAM sequences. As our initial experiments suggested that increased selection stringency may be necessary to produce SpCas9 variants that were highly active on non-G PAMs, we developed several improved selection strategies for evolving Cas9:DNA binding. Specifically, by increasing the number of target DNA protospacer/PAM sites that must be recognized by the evolving SpCas9 through use of an additional PACE-compatible selection marker (gVI), and limiting the total concentration of full-length SpCas9 in the host cell through use of a split-intein strategy, we were able to select for variants that efficiently recognize a desired PAM while reducing the probability of evolving undesired recognition of specific protospacer sequences. These improved selection strategies should be applicable to a majority of Cas9 orthologs, and enable the further evolution of Cas9 variants capable of targeting a wide range of PAM sequences.

[00543] From our initial experiments evolving SpCas9 for binding activity on all 16 individual NAN PAMs, we were able to identify three distinct groups of consensus mutations that conferred binding activity on NAA, NAT, and NAC PAMs, respectively (Figure 1D), leading us to split our subsequent evolutionary efforts into three separate trajectories targeting these specific PAMs. The diverging consensus mutations of our evolved variants also give insight into potential modes of PAM interaction.

[00544] SpCas9-NRRH, evolved to bind HAA PAMs, acquired a mutation at R1333, which in SpCas9 contacts the 2nd guanine of its canonical NGG PAM, but not at R1335, which contacts the 3rd guanine (Figure 37B and 37D). The R1333K mutation likely allows SpCas9-NRRH to accept both A and G at the 2nd PAM nucleotide, while the preservation of R1335 may explain why this variant recognizes both NAA and NAG PAMs. On the other hand, SpCas9-NRTH (evolved to bind HAT PAMs) preserves R1333 but eliminates R1335 through mutation to Leu (Figure 37E and 37G). Interestingly, SpCas9-NRTH shows a strong preference for T at the 3rd PAM position and appears to have lost some recognition of the wild-type NGG PAM (Figure 40B). Finally, SpCas9-NRCH displays altered interactions at both R1335 and T1337 (Figure 37F and 37H); the T1337N mutation in particular may form contacts with the 4th PAM nucleotide to compensate for weakened binding interactions with the HAC target PAM.

[00545] In addition to alterations at residues responsible for direct contacts with PAM nucleobases, we observed a number of additional mutations that we suspect modulate more general interactions with the target and non-target DNA strands, including R1114G, E1219V, Q1221H, and D1135N (Figure 37B, 37D-H). Residue E1219 forms hydrogen bonds with R1335 in SpCas9, and mutations at this residue are thought to destabilize the interaction between R1335 and the 3rd PAM guanine. Mutations at residue D1135 have been previously reported and are thought to modulate interactions with the sugar-phosphate backbone of the non-target DNA strand; R1114G and Q1221H may alter similar interactions. Finally, we observed mutations in the helical domain of Cas9 that arose in several independently evolving populations (Figure 44B).

[00546] These mutations, when added to the N-terminal region of NRRH and NRCH, improve their recognition of non-G PAMs in base editing experiments (Figure 44C), and may contribute to increasing the overall DNA binding/unwinding activity of these variants. Along with bacterial PAM depletion and mammalian cell genome editing on endogenous genomic sites spanning all NANN PAMs, we characterized our variants and SpCas9-NG using an 11,776-member sgRNA/protospacer/NNNN PAM library that was genomically integrated into HEK293T cells. The large number of sites examined greatly increased our ability to confidently profile the editing activity of these proteins using all NNNN PAMs in a human cell context, and illuminated the sequence preferences of these Cas9 variants, including previously unreported activity of SpCas9-NG on NANG PAMs.

[00547] Both our bacterial PAM depletion experiments and mammalian library data demonstrated that our evolved variants display a different 4th-position PAM preference (H) than SpCas9-NG (G), suggesting that they may have complementary utility. While further investigation is required to explain the 4th-base preferences of our mutants, crystal structures of SpCas9-NG and other previously reported evolved SpCas9s (VRER/VRQR) suggest that the T1337R mutation in these variants may create a direct interaction with a G at the 4th PAM position (Anders et al., 2016; Hirano et al., 2016b; Nishimasu et al., 2018).

[00548] Additionally, both SpCas9-NG and our variants display a moderate preference for G at the 1st PAM position, whereas SpCas9 shows virtually no preference at this position (Figures 47B and 47C). Because of these numerous sequence preferences, we suggest screening all variants reported here along with SpCas9-NG when optimizing targeting efficiency at sites with NR PAMs, and we provide a recommended list of SaCas9 and SpCas9 variants to test for targeting any given NRNN PAM (Figure 42). However, we note that other Cas9 orthologs and related CRISPR effector proteins not included here have also been shown to mediate genome editing in mammalian cells (Chatterjee et al., 2018; Cong et al., 2013; Edraki et al., 2019; Esvelt et al., 2013; Harrington et al., 2017; Hirano et al., 2016a; Hou et al., 2013; Kim et al., 2017a; Zetsche et al., 2015). Our evolved variants, along with SpCas9-NG, expand the utility of SpCas9 toward disease-relevant genome editing applications. Access to a broad range of PAMs is especially important for base editing, as illustrated by our experiments targeting the sickle cell mutation in human β-globin. Although ABE-NG was able to bind this locus using a CATG PAM, most of the base editing we observed occurred at an off-target A within the editing window (Figure 41B). In contrast, by using an adjacent CACC PAM, our evolved variants achieved high levels of conversion at the correct base with lower levels of off-target editing (Figure 41C). Notably, the sickle cell SNP falls within the optimal ABE window for both sgRNAs tested, suggesting that it may be beneficial to assay several protospacer sequences for a single target. Expanding the PAMs accessible to Cas9 variants to NR increases not only the number of targetable pathogenic SNPs (Figure 39D) but also the number of possible sgRNAs that can target an individual SNP (Figure 39E).
Additionally, although only results from indel formation and base editing are shown in this work, we anticipate that our evolved variants should be compatible with the majority of Cas9-associated genome editing technologies. Access to NR PAMs should benefit all precision genome editing applications, including other base editing applications, HDR, and predictable template-free genome editing.
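The observation that NR PAMs increase the number of sgRNAs able to reach an individual SNP (Figure 39E) can be made concrete with a small counting sketch. It assumes a 20-nt protospacer, a PAM read immediately 3′ of it on the same strand, and an editing window of protospacer positions 4 to 8 counted from the PAM-distal end; that window is a commonly cited approximation for ABEs and is not specified in this section. The sketch scans only the strand supplied (the reverse complement would be scanned the same way) and does not check the identity of the target base. All names are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the DNA bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
    "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fits(site: str, pattern: str) -> bool:
    """True if `site` matches the IUPAC `pattern` base by base."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def candidate_protospacers(
    seq: str,
    target_index: int,
    pam_pattern: str,
    spacer_len: int = 20,
    window: tuple[int, int] = (4, 8),
) -> list[int]:
    """0-based start indices of protospacers that place `target_index` in the window.

    Protospacer positions run 1..spacer_len from the PAM-distal end, and the
    PAM is read directly 3' of the protospacer on the same strand (assumption).
    """
    seq = seq.upper()
    first, last = window
    starts = []
    for position in range(first, last + 1):
        start = target_index - (position - 1)  # protospacer start giving this position
        pam_start = start + spacer_len
        pam_end = pam_start + len(pam_pattern)
        if start < 0 or pam_end > len(seq):
            continue  # protospacer or PAM would run off the sequence
        if fits(seq[pam_start:pam_end], pam_pattern):
            starts.append(start)
    return sorted(starts)
```

Calling it once with `"NGG"` and once with `"NR"` on the same sequence and target index gives a rough sense of how relaxing the PAM requirement enlarges the set of window-compatible sgRNAs for a single base.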

Example 13 REFERENCES

The following references are incorporated herein by reference.

Anders, C., Niewoehner, O., Duerst, A., and Jinek, M. (2014). Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573.
Anders, C., Bargsten, K., and Jinek, M. (2016). Structural Plasticity of PAM Recognition by Engineered Variants of the RNA-Guided Endonuclease Cas9. Mol. Cell 61, 895–902.
Badran, A.H., Guzov, V.M., Huai, Q., Kemp, M.M., Vishwanath, P., Kain, W., Nance, A.M., Evdokimov, A., Moshiri, F., Turner, K.H., et al. (2016). Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance. Nature 533, 58–63.
Brödel, A.K., Jaramillo, A., and Isalan, M. (2016). Engineering orthogonal dual transcription factors for multi-input synthetic promoters. Nat. Commun. 7, 13858.
Bryson, D.I., Fan, C., Guo, L.-T., Miller, C., Söll, D., and Liu, D.R. (2017). Continuous directed evolution of aminoacyl-tRNA synthetases. Nat. Chem. Biol. 13, 1253–1260.
Carlson, J.C., Badran, A.H., Guggiana-Nilo, D.A., and Liu, D.R. (2014). Negative selection and stringency modulation in phage-assisted continuous evolution. Nat. Chem. Biol. 10, 216–222.
Cebrian-Serrano, A., and Davies, B. (2017). CRISPR-Cas orthologues and variants: optimizing the repertoire, specificity and delivery of genome engineering tools. Mamm. Genome 28, 247–261.
Chatterjee, P., Jakimo, N., and Jacobson, J.M. (2018). Minimal PAM specificity of a highly similar SpCas9 ortholog. Sci. Adv. 4, eaau0766.
Chen, J.S., Dagdas, Y.S., Kleinstiver, B.P., Welch, M.M., Sousa, A.A., Harrington, L.B., Sternberg, S.H., Joung, J.K., Yildiz, A., and Doudna, J.A. (2017). Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature 550, 407–410.
Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W., Marraffini, L.A., et al. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823.
Dickinson, B.C., Leconte, A.M., Allen, B., Esvelt, K.M., and Liu, D.R. (2013). Experimental interrogation of the path dependence and stochasticity of protein evolution using phage-assisted continuous evolution. Proc. Natl. Acad. Sci. 110, 9007–9012.
Dickinson, B.C., Packer, M.S., Badran, A.H., and Liu, D.R. (2014). A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations. Nat. Commun. 5, 5352.
Edraki, A., Mir, A., Ibraheim, R., Gainetdinov, I., Yoon, Y., Song, C.Q., Cao, Y., Gallant, J., Xue, W., Rivera-Pérez, J.A., et al. (2019). A Compact, High-Accuracy Cas9 with a Dinucleotide PAM for In Vivo Genome Editing. Mol. Cell 73, 714–726.e4.
Esvelt, K.M., Carlson, J.C., and Liu, D.R. (2011). A system for the continuous directed evolution of biomolecules. Nature 472, 499–503.
Esvelt, K.M., Mali, P., Braff, J.L., Moosburner, M., Yaung, S.J., and Church, G.M. (2013). Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat. Methods 10, 1116–1121.
Gaudelli, N.M., Komor, A.C., Rees, H.A., Packer, M.S., Badran, A.H., Bryson, D.I., and Liu, D.R. (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471.
Harrington, L.B., Paez-Espino, D., Staahl, B.T., Chen, J.S., Ma, E., Kyrpides, N.C., and Doudna, J.A. (2017). A thermostable Cas9 with increased lifetime in human plasma. Nat. Commun. 8, 1424.
Hirano, H., Gootenberg, J.S., Horii, T., Abudayyeh, O.O., Kimura, M., Hsu, P.D., Nakane, T., Ishitani, R., Hatada, I., Zhang, F., et al. (2016a). Structure and Engineering of Francisella novicida Cas9. Cell 164, 950–961.
Hirano, S., Nishimasu, H., Ishitani, R., and Nureki, O. (2016b). Structural Basis for the Altered PAM Specificities of Engineered CRISPR-Cas9. Mol. Cell 61, 886–894.
Hou, Z., Zhang, Y., Propson, N.E., Howden, S.E., Chu, L.-F., Sontheimer, E.J., and Thomson, J.A. (2013). Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl. Acad. Sci. 110, 15644–15649.
Hu, J.H., Miller, S.M., Geurts, M.H., Tang, W., Chen, L., Sun, N., Zeina, C.M., Gao, X., Rees, H.A., Lin, Z., et al. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63.
Hubbard, B.P., Badran, A.H., Zuris, J.A., Guilinger, J.P., Davis, K.M., Chen, L., Tsai, S.Q., Sander, J.D., Joung, J.K., and Liu, D.R. (2015). Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat. Methods 12, 939–942.
Jiang, F., and Doudna, J.A. (2017). CRISPR–Cas9 Structures and Mechanisms. Annu. Rev. Biophys. 46, 505–529.
Jiang, W., Bikard, D., Cox, D., Zhang, F., and Marraffini, L.A. (2013). RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239.
Jin, S., Zong, Y., Gao, Q., Zhu, Z., Wang, Y., Qin, P., Liang, C., Wang, D., Qiu, J.L., Zhang, F., et al. (2019). Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295.
Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J.A., and Charpentier, E. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821.
Karvelis, T., Gasiunas, G., Young, J., Bigelyte, G., Silanskas, A., Cigan, M., and Siksnys, V. (2015). Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements. Genome Biol. 16, 253.
Kim, E., Koo, T., Park, S.W., Kim, D., Kim, K., Cho, H.Y., Song, D.W., Lee, K.J., Jung, M.H., Kim, S., et al. (2017a). In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500.
Kim, S., Bae, T., Hwang, J., and Kim, J.S. (2017b). Rescue of high-specificity Cas9 variants using sgRNAs with matched 5′ nucleotides. Genome Biol. 18, 218.
Kleinstiver, B.P., Prew, M.S., Tsai, S.Q., Nguyen, N.T., Topkar, V.V., Zheng, Z., and Joung, J.K. (2015a). Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293–1298.
Kleinstiver, B.P., Prew, M.S., Tsai, S.Q., Topkar, V.V., Nguyen, N.T., Zheng, Z., Gonzales, A.P.W., Li, Z., Peterson, R.T., Yeh, J.R.J., et al. (2015b). Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485.
Kleinstiver, B.P., Pattanayak, V., Prew, M.S., Tsai, S.Q., Nguyen, N.T., Zheng, Z., and Joung, J.K. (2016). High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495.
Koblan, L.W., Doman, J.L., Wilson, C., Levy, J.M., Tay, T., Newby, G.A., Maianti, J.P., Raguram, A., and Liu, D.R. (2018). Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846.
Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A., and Liu, D.R. (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.
Komor, A.C., Badran, A.H., and Liu, D.R. (2017). CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes. Cell 168, 20–36.
Landrum, M.J., Lee, J.M., Riley, G.R., Jang, W., Rubinstein, W.S., Church, D.M., and Maglott, D.R. (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985.
Lee, H.K., Willi, M., Miller, S.M., Kim, S., Liu, C., Liu, D.R., and Hennighausen, L. (2018a). Targeting fidelity of adenine and cytosine base editors in mouse embryos. Nat. Commun. 9, 4804.
Lee, J.K., Jeong, E., Lee, J., Jung, M., Shin, E., Kim, Y.-H., Lee, K., Jung, I., Kim, D., Kim, S., et al. (2018b). Directed evolution of CRISPR-Cas9 to increase its specificity. Nat. Commun. 9, 3048.
Nishimasu, H., Shi, X., Ishiguro, S., Gao, L., Hirano, S., Okazaki, S., Noda, T., Abudayyeh, O.O., Gootenberg, J.S., Mori, H., et al. (2018). Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262.
Packer, M.S., Rees, H.A., and Liu, D.R. (2017). Phage-assisted continuous evolution of proteases with altered substrate specificity. Nat. Commun. 8, 956.
Paquet, D., Kwart, D., Chen, A., Sproul, A., Jacob, S., Teo, S., Olsen, K.M., Gregg, A., Noggle, S., and Tessier-Lavigne, M. (2016). Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125–129.
Pu, J., Zinkus-Boltz, J., and Dickinson, B.C. (2017). Evolution of a split RNA polymerase as a versatile biosensor platform. Nat. Chem. Biol. 13, 432–438.
Quentin Blackwell, R., Oemijati, S., Pribadi, W., Weng, M.I., and Liu, C.S. (1970). Hemoglobin G Makassar: β6 Glu→Ala. BBA - Protein Struct. 214, 396–401.
Rees, H.A., and Liu, D.R. (2018). Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–778.
Rees, D.C., Williams, T.N., and Gladwin, M.T. (2010). Sickle-cell disease. Lancet 376, 2018–2031.
Roth, T.B., Woolston, B.M., Stephanopoulos, G., and Liu, D.R. (2019). Phage-Assisted Evolution of Bacillus methanolicus Methanol Dehydrogenase 2. ACS Synth. Biol. 8, 796–806.
Sangkitporn, S., Rerkamnuaychoke, B., Sangkitporn, S., Mitrakul, C., and Sutivigit, Y. (2002). Hb G Makassar (beta 6: Glu→Ala) in a Thai Family. 85, 577–582.
Shen, M.W., Arbab, M., Hsu, J.Y., Worstell, D., Culbertson, S.J., Krabbe, O., Cassa, C.A., Liu, D.R., Gifford, D.K., and Sherwood, R.I. (2018). Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651.
Slaymaker, I.M., Gao, L., Zetsche, B., Scott, D.A., Yan, W.X., and Zhang, F. (2016). Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88.
Suzuki, T., Miller, C., Guo, L.T., Ho, J.M.L., Bryson, D.I., Wang, Y.S., Liu, D.R., and Söll, D. (2017). Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase. Nat. Chem. Biol. 13, 1261–1266.
Tsai, S.Q., Zheng, Z., Nguyen, N.T., Liebers, M., Topkar, V.V., Thapar, V., Wyvekens, N., Khayter, C., Iafrate, A.J., Le, L.P., et al. (2015). GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197.
Viprakasit, V., Wiriyasateinkul, A., Sattayasevana, B., Miles, K.L., and Laosombat, V. (2002). Hb G-Makassar [β6(A3)Glu→Ala; codon 6 (GAG→GCG)]: Molecular characterization, clinical, and hematological effects. Hemoglobin 26, 245–253.
Wang, T., Badran, A.H., Huang, T.P., and Liu, D.R. (2018). Continuous directed evolution of proteins with improved soluble expression. Nat. Chem. Biol. 14, 972–980.
Xin, H., Wan, T., and Ping, Y. (2019). Off-Targeting of Base Editors: BE3 but not ABE induces substantial off-target single nucleotide variants. Signal Transduct. Target. Ther. 4, 9.
Zetsche, B., Gootenberg, J.S., Abudayyeh, O.O., Slaymaker, I.M., Makarova, K.S., Essletzbichler, P., Volz, S.E., Joung, J., Van Der Oost, J., Regev, A., et al. (2015). Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell 163, 759–771.
Zhang, D., Zhang, H., Li, T., Chen, K., Qiu, J.L., and Gao, C. (2017). Perfectly matched 20-nucleotide guide RNA sequences enable robust genome editing using high-fidelity SpCas9 nucleases. Genome Biol. 18, 191.
Zuo, E., Sun, Y., Wei, W., Yuan, T., Ying, W., Sun, H., Yuan, L., Steinmetz, L.M., Li, Y., and Yang, H. (2019). Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292.


G H N Q S I G T A 6 W

0 5 0 1 7 . . 7 5 7 7 8 9 0 1 2 3 4 9 9 6

4 1 4 1 1 4 4 1 1 5 1 5 1 5 5 1 8

1 1 8

B 7 4

EVP S D K Q EDKAK T P VKF G Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L S I ELIVI L G R ILKF G D Q R YAIKL S D F G L E VL S HVKEKA S V I G EKK KVYNKANNTKK VYNKANNTYDA VI L D SVTKITKK VYNKANNT SA PINKY NFFNI I Q I RTTEV RVI I K TTEV KVI G Y EKDDIDLF I K I TTEV PN S I LLALY S L ERIDR S MALFKPV G T KY I I Q R SMALFKPV T RVI GKY KILAEL VWDL I I Q R SMALFKPV T RVI GKY VLEATEREL EKNHHLLYLKEVP Q S HHLLYLKEVP S D N N FL G Y Q KVKHHHLLYLKEVP S NY LAKELN K N GIK LRALDERH L S K HRALDERH VL K Q S HA G G G Y ND D VIKIRALDERH VL K Q K S H DA S EDF KHTDY S K LDLNEYE E V SA YE E A PIAFFNWH G S S LE VIDLNEYE E A PI SDF N LL Q L E TR ADRLE G R PN I PIDLNE SLLTADRLE R S G PN S I LLPI L KPY R G E ENTADRLE R S G PN S I LL KH G VRLN G T RE E YDT QKN LLN IMVLEAT GA G G ETIEETN G N YDL S L MVNY N LLN L IMVLEAT ITKE SMVNY LA V P G T V R QN GHN Q S I G T AIL N LLN GYDL L IMVLEAT SMVNY LA RH K E QK V E GAD ILTL IDKHEK K LA G YDL SEDLIDKHEK A S K EDHRKDT K G E G Y Q A LIDKHEK A S K ED NH Q T L R G Y ELEY V L GVLDVKRV D A SDF KRV S D DF DNKEVLDVKRV S D DF KYV T AV E S DYTKEYI IKRYLPDL N LVLDV PDL H N L GV I EKFE S L K D QL GH NTARKRYLPDL H N L GV DNE G TIKILIEE G N VFLTLLEE K H GA G G VKRYL G EE G K A G G VK E F GI E AI SND Q E EEEIFLTLLEE G K A G G SNT V RKPTRKDDEVFDRH K EFLTLL FDRH K E QKKKPEKFEDDILPKDDEVFDRH K E QK RIK A YVE SRDEL S N EK A EKLVNH T Q KKDDEV QL K A EKLVNH Q T L RD RRKVYLFK KLVNH Q T L TEKAKRLN NT M EY SEL S K K S K TVKYV S T S K S K TVKYV T KE L K QYTDFYKPDD K A SK K E S ETVKYV T LRVLRTAN Q I EY D T E LK E E S G TKEAKMETMTRIRPD T NLK NE E S G T VVIMDE LE Q T S K E P L G NLK E G E TPD T E TV D N SNT YKVIEWV NKPHV L G Q DPTV S D NT KKNRIY S K DIRI A Y G V QKNK T Q DPTV D N SNT YV L G N SLMLL RIK S A RK T Q DP SLMLL RIK S A RPLLPA L D S MRNKK S T LMLL IK A Y SR NY KTDNNEN KYIYDKKNV Q T TEKAKYDKKNV Q T TEKAKA NLF S G D G R D V F Q R TI L R G R ELKEVAEKILLRVLREVAEKILLRVLRM S F E R IKNYDKKNV T R QTEKAK S E EYEVAEKILLRVLR I Q P G D R A Q E KK EPIDLDLDD VVIMDDLDLDD VVIMDKEL Q Q R 
ID SEH Q N D G F DLDLDD VIMD SF NF I S G NT Q G FNLTKEIFE G I KKNRITKEIFE G I KKNRIDERR V LN A DD S M EY I YLMK KAPNLKNY LKNY L N IL G S S S TKEIFE I V GKKNRI DKLLK Y Q G EL EA M KDI QENVRLKTIKD V KTIKAPN Q TVRLKTIKD V KTD Q TDR R V SLRD L Y QY I A S LIKAPNLKNY QLRVVRLKTIKD V KT KRVEF D S E NLN NALDIMDI F P G R DALDIMDI F P G R DDH AA ATEKALDIMDI F Q Q P R T GD LFKFA V K S ANINE G D Y HFFREA I Q S F EA I Q S F F S K KL F ML CFYVIMD REA S I F NF F D R AT S T NLMLM W Y KENPDD M NF HFFR SEY G W Y KENPDD M NFK SEYY EIL W HFF GY ENPDD S M EY S S V S L P YN V Q Y F G S I S N YIE DKLLK I S N YIE DKLLKD F LL ENLH SRL G P IRKKTTN N K IE KLLK QVTY I L AT K I QHKEA S V TKFFE G S KRVEF S V TKFFE G S KRVEFID TVKNETK V I S Y STKFFE S D GKRVEF PLAK A G E GI YLAKKMINPFE NALFKFANPFE NALFKFADD S A LYKA F I A GDNPFE V FPLD G N KEDKAIF YNE Q S DLF R A YNE Q S DLF R AHVL F Q N Q I FLA S NALFKFA QDLF I S K DLN T S P G T KP L F VE KLE G T KP LN D V L A LN D V S PDKI D Y QPY G NLL T YNE GKP D R AK T G L D S Q K F STDK ET S Q S A FA S S Q VTY ET S Q S FA S S Q VTY AAHID L R QTMEIIET S Q A LN S S V L A SP SFA Q VTY DL R K GILLERINELL A I QAH L A A DIPLAK A I GDH A DIPLAK A K GYMKHRE S D TEIKVDH VTKITIELDK Y G KHV FPL Y G L A KHV FPL EAEFKKNV A DIPLAK G A DIDLFYVE E Y K SK K Q S N G L A G G G KDEI S K DLN G L A G G L G KDEI S K DLN G D N V R L Y L A G G KHV L S S A G G KDEI K FPL SDLN YVWDLNEE S E RIWYI I V S L DDKHAK KHHK N DDT SVYL L LE QFV Q YK K G S V S L DDKHHK KT DKVKHRNP ELIVI DL R KT V S DD GIL S I ELIVI DL R KTD GILHF RIKDPRFLF S I ELIVI L G R IL SVIKIKVF S A QL K E SK T K S Q KYDA I S L VTKITYDA I S L VTKITDI S G KKVLHKFK DA I L D SVTKIT LE E VIMNYYE PLKVI Y V GEK DIDLFKVI Y V GEK DIDLFK Y E Y SKVI Y V GEK DIDLF 0 RR G ENHH II D F SIDKKILAEL G D YVWDLKILAEL G D YVWDLY E AIETIEI QN VTK K S K ILAEL G D YVWDL 0 NITKE T TI HKD N FL VKHD L VKHDLI M I SKK L G F A I K QD L VKH O I L E Y A G S V C G H L I S F S D H A M G N G G Y N D G Q D K S V I K I M G N G N Y F G N D G Q D K S V I K I M 
Y P A E R G K I A T G K E M G N G N Y F G N D G Q D K S V I K I 6 W

0 5 0 1 7 . . 7 5 7 5 6 7 8 9 9 9 5 1 5 1 5 1 5 1 5 1 8

1 1 8

B 7 4

KK Q Y TELALPKN S K K S K T PLPTL G Y EKDDIDLFFKDRDKT Q W F G N F KF G D Q R Y II KEKK PD T ETVKYV S TF GNLK E G E T E E DPL SNLR TI G D R NFM QFDEKLVP V L DPTV D N SNT Y A G N R DY Y E AEL YVWDLLVYAPDIN FI KK KV CDR Q I L Q G VKHRINEFEEE G Q Q RT KHAKIRVWEEA K T Q S LMLL T RIK S A R F LVR Q EKKLDLY Y F GND D R SVIRI HADYFREV G E I I I GE G MALF KAERTYVFPDV YDKKNV Q TEKAK N R SYD Q I EMMKR H S G LE H T YE HHLLY SLALYN I S YAV EVAEKILLRVLRFKDKDIV M S G L NW Q KPY E VI G S L S N D G T I VP I R S VL C KD RALDE TALYRE S G KTY DLDLDD I VVIMD LVYAPKK Q FFI T V RR G ENW AFVTILVAHL DLNEY DLLNELKMAIK TKEIFE G KKNRI RI TN M Q S I T Q NITKEM GAIL Y A HLEKNDLLIVY TADRL GYDNYRP PNLKNY T H N EFK SPYDI G G G G L DT E YLAKY VYDE LLN FVNALRK K MLD IKA QIVA VRLKTIKD V K T G S H T E S E IYI FE S L K DK G G Q H QLDNKE IVLMI Q A RAAY G N YDL S L VLLD K ILAF ALDIMDI F Q D WL S D D T E CFEPLKD F L TAR G N LI IWNDRR LIDKH KRDI S L I G Y VN FREA I Q P G R SF F LAFV VVAHD I E A SND E N QEEEIKRD L R GEK DHYK VLDAK MLD AA Y E SM W HF GY N KENPDD M N SEY HLKK Y V ST DILPKPTKIE G I KK KRYLP KDE K HLM QKKV S YIE DKLLK YLERL I VVVL EKFED QLI D N KRRKVYLFKLD I IR F A SHNL FLTLL KAYLP E VEI SVVK V I STKFFE G S KRVEF G K IK LVY S D YTDFYKPDDLA G D Y G F LM KDDEV TPTIL R K GIPAAD NPFE NALFKFANLI I MD CRY RARY KMETMTRVREKKKE L KDH QIDFPA V EK PRDEVEI E Q S DLF E NDN EKYAYAKDVA S K K S K VD EPFE L NKF Q G LAL T YN GKP D R ARR KKI S KIIWDHK D EWV Q PA L DNKPHY S MRDKHKVFT TY PD T E KN G K TEIDNYYE IET Q S A LN P KP S S FA S S V S L QVTY YN ETTVTA DH A A MDDV N G F ANK LF S D G R KNEN N PIN SHVY D R I S EYD E EA S LIAI V L G N DA S F PPLIKD G K GF L A DIPLAK G LA Y G F E G I DHPI Q Q R I SEH Q N D G E F E Q EDA G S A LK T Q DP SLML CAVINA S D QRT S E KYDKKN DLL RHYE L Y G G A G G KHV FPL RK Q G KEIR GKDEI S K DLN HEKYALKDYVA R ILLN A M LLPDILPN T I LTE GYKLVAKA L DDKHHK T N FTDLKDVN V G N S V S S V EITV I IYDAAEK GLKDLDLD V E K SHKLI NHVK I V S S ELIVI DL R K GIL M T I SPEAFAVNAY LRD L Y S LA QY I A QLRV 
KMD E KLK GKIYK VRELDI Q S EHPH YDA LIKI AA MLATEK G W R K Q I FVHI KK TKEIF ALETNDLKLDD KVI Y VI S L VTKIT D DDNP GEKDDIDLF V C E C E VV F S DTDL KL C FYVIMD L S K PDIEDK E C S IKAPN S FVRLKT HHDKIE KN KILAEL YVWDL VLLPD Q S G S Q R KL NLHEIL S V HIYYKD H Q S LDIM GY KTW G I IN L E GL D FL Q G H AAIAVDAPN V A GIY L P E N Y G IRKKTTNNYNKNTY I Y QRAE L A G HFFR VI G T TMKKTLE K A G N G G ND D KVK SVIKI KM VKKLKTVKNETK FIWFKYEDNT G W Y KE STLYR LAK Q AFFNWH S G LE I G W RK K IIL S IYDAKLYKA I A K NKVKK E K SIVRR PI PY RR E V GEN VLTP Q D K R GEHIA T YF N I F G D T I GT Y PDF N L V I S N YI D S S V TKFF THKYINAKVP G E L K T Q V NITKE DHH Q C Y Q PY F Q Q FVAIP G L G LI T R S P S G TKFFVPNKLD V P G G HN Q S I G T AIL S HVRYFV NAK L RNLLDY G LIW G P L Q F L VYRPEK V Q L T NPFE S YNE Q S IDFDDE ERLK HRKDT K G E G Y A G YNNNIE Q IMIWNE I Q KDE I HID G HRE D Q TMEI STEIKV L K GDLNEAID S H TRI G T KP DH S S EED G I LKA E S L K D QLDNKE G T TKYPTYYV T EFKKNV AH VAFIET S Q S A L Y G S K I EKF GH KR N N SL LE N V R S I Y SYKL Q K S L D IL QDENTTDH GLM S E A NP SLLDF G N SV VK E F GI E AI E NTARLP NLFT SND Q EEEI DY G N YIDFDFLV P DT S YL Q L FV L S Q YK S K YTTKK MLAFVT Y L A G A I L TL D KKPEKFEDDILP KA SE N V G EE Q E K S E D W SY RD RRKVYLFK G L H Y R I G Q W G KL EV DPRFLFRKFRD S T LYK G L A G G S T Q L Q RIK S KKVLYKFK ENE N PR SKVTEN V L G K SDD YEV G K KPVLRKL KE L K QYTDFYKPDD I Y G I D S H LRI IETIEIY E ID KDF Q F E T DEMDDATFFPLLKEAKMETMTRIR C YT K NPD SETEILNAF IVTK K S DN ELIV YKKLE G S S V G TL I S KRMII K S VE WV NKPH YTVK DNFNV S M KK F K S I KF ARKDIYI P DYDA 0 K K K M G V A Q S Q LV K KVIE GPLLPA L D S MRNKDYL KL P LALVI ER K L G A GIAT K Q ND GEKKAFI N S IKVI Y V GE QTNKILAE 0 L G F G Y EMLPAMLLA N F S G D G R VDE E G S S E C T S KR N S A RN KVW Q KI N Q S M TEVKD O M F L A D D T I L E H D M S F E Q L R I D N R IKN S E E Y I N K D L S T K I Q Y G E N L N T S KAKIAE Q S D D K E I M I E E A Y K K N L Y H M G N N Y F G G N 6 W 0 5 0 1 7 . . 
7 5 7 0 1 2 3 4 5 9 9 6 6 1 6 1 6 1 6 1 8 1 1 1 8

B 7 4

AIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAI YNKANNT KK ANNT KK KVYNKANNTKK KVYNKANNTKK KVYNKANNTKK KVYN TEV I KVYNK I KPV T RVI GKY I I Q RTTEV SMALFKPV T RVI RTTEV RV GKY I I Q I SMALFKPV G T KY I I Q I RTTEV RVI RTTEV RVI LKEVP S HHLLYLKEVP S HHLLYLKEVP S S MALFKPV G T KY I I Q I LFKPV G T KY I I Q I RTTE QHHLLYLKEVP S S MA QHHLLYLKEVP S S MALFKP QHHLLYLK RH VL K Q S H RALDERH VL K Q S H RALDERH HRALDERH S HRALDERH HRALDERH E E A I PI DLNEYE E A PI DLNEYE E VL S K SA IDLNKYE E VL K SA IDLNEYE E VL S K SA IDLNEYE E R S G PN S LL TADRLE R S G PN S I LF TADRLE G R PN I P SLFTADRLE G R PN I P SLFTADRLE G R PN I P SLLTADRLE G R IMVLEAT LEAT T LLN IMVLEAT LLN IMVLEAT LLN IM MVNY LA N LLN GYDL L IMV SMVNY LA N LLN GYDL L IMVLEA SMVNY LA G N YDL S L MVNY G YDL S L MVNY A G N YDL S L MV EK A S K ED LIDKHEK A S K ED LIDKHEK K D A S EDLIDKHEK K LA N KHEK K L DLIDKHEK RV S D DF VLDAKRV S D DF VLDAKRV S DF VLDAKRV D A S EDLID SDF AKRV D A S E SDF LVLDAKRV DL H N L GVKRYLPDL K H N L GVKRYLPDL H N L GVKRYLPDL N LVLD LPDL N VKRYLPDL EE G K A G G FLTLLEE G A G G FLTLLEE G K A G G FLTLLEE K H VKRY GA G G G EFLTLLEE K H GA G G G EFLTLLEE FDRH K E QKKDDEVFDRH K E QKKDDEVFDRH K E QKKDDEVFDRH K KKDDEVFDRH K KKDDEVFD LVNH Q T L H Q T L KLVNH Q T L TVKYV T K A KLVN SK K E S ETVKYV T K A SK K E S ETVKYV T K A EKLVNH T Q Q L EKLVNH T Q Q L SK S K ETVKYV T A S S K K S K I A EKLV LK NE E S G T PD T NLK NE E S G T PD T NLK NE E S G TPD T NLK T ETVKYV S S K K S K GNLK KPD T ETV GNLK TV S D NT V L G Q DPTV S D NT V L G Q DPTV S D NT V L G Q DPTV D NE G E TPD G E SNT YV L DPTV D NE SNT YV L DPTV L IK A Y SRK S T LMLL IK A Y SRK S T LMLL IK A Y SRK S T LMLL IK S A RK T Q S LMLL RIK S A RK T Q S LMLL V T R QTEKAKYDKKNV T R QTEKAKYDKKNV T R QTEKAKYDKKNV T R QTEKAKYDKKNV Q T TEKANYDKKNV Q T ILLRVLRDVAEKILLRVLRDVAEKILLRVLRDVAEKILLRVLRDAAEKILLRVL DAAEKIL D VIMD DLDLDD VIMD DLDLDD VIMDDLDLDD VIMDDLDLDD VVIM R 
G DLDLDD E I V GKKNRI TKEIFE I V GKKNRI TKEIFE I V GKKNRITKEIFE I V GKKNRITKEIFE G I KKNRITKEIFE G I LKNY IKAPNLKNY IKAPNLKNY PNLKNY TIKAPNLK IKD V KT VRLKTIKD V KT IKAPNLKNY VRLKTIKD V KTIKA KTIKD V K TVRLKTIK DI F Q Q P R T GD ALDIMDI F Q VRLKTIKD V KT QP R T GD ALDIMDI F Q Q P R T GDALDIMDI F Q VRL QP R T GDALDIMDI F Q P R G DALDIMDI EA S I F F NF REA S I F NF EA S I F NF FREA I Q S F NPDD M NF SEY W HFFREA S I GY D S M EY W HFF GY ENPDD S M EY W HFFR GY E DKLLK N KENPD KLLK N K IE KLLK N KENPDD S M EY W HF GY KENPDD M N SE F HFFREA C G W Y KENP E G S KRVEF V I S YIE STKFFE S D GKRVEF V I S Y STKFFE S D GKRVEF V I S YIE KLLK N IE DKLLK I S N YIE STKFFE S D GKRVEF V I S Y STKFFE G S KRVEF S V TKFFE G S NALFKFANPFE FKFANPFE NPFE DLF R A S NAI S NAIFKFA S NAIFKFANPFE S NALFKFANPFE NA LN D V S L P T YNE Q DLF GKP D R Q DLF SV L T SP T YNE GKP D R T YNE Q DLF Q DLF GKP D R Q A LN S V L T SP T YNE GKP D R A YNE Q S DL FA S S Q VTY ET S Q A LN SFA Q S VTY IET S Q A LN S V L T SP SFA Q S VTY IET S S FA Q S VTY IET S Q A LN S V L S S P G T KP SFA Q VTY Q LN DIPLAK A I G DH A LAK G A DH DH A IET S S A FA KHV FPL Y G L A DIP GKHV L A A DIPLAK G A L A A DIPLAK G A DH L A A DIPLAK G DH A DEI S K DLN G L A G G KDEI K FPL G KHV SDLN L Y G G A G KDEI K FPL SDLN L Y G G KHV G KHV A DI GA G K FPL KHHK V S L DDKHHK KT L G DKHHK KT L G KDEI S DLN L Y G G A G KDEI K FPL SDLN L Y G L GA G G KH GKDE I L DL R KT GIL S I ELIVI DL G R IL I V S D SELIVI L G R IL I V S DDKHHK KT L G DKHHK KT L DDKH SELIVI L G R IL I V S D V S S ELIVI L G R IL I S ELIVI I S VTKIT YDA I S L VTKIT YDA VI L D SVTKITYDA VI L D SVTKITYDA VI L D SVTKITYDA K D DIDLF KVI Y V GEK DIDLF KVI G Y EK IDLFKVI G Y EK IDLFKVI G Y EK IDLFKVI Y VI S L GEK 0 L G YVWDL KILAEL G D YVWDL KILAEL D D GYVWDLKILAEL D D GYVWDLKILAEL D D GYVWDLKILAEL G D 0 L Q D KVKH D N N Y FL VKH D FL KVKHD FL KVKHD FL RVKHD FL O D G S V I R I M G G G N D G Q D K S V I R I M G N G N G Y N D G Q S D V I R I M G N G N G Y N D G Q S D V I R I M G N G N 
G Y N D G Q S D V I R I M G N G N G Y N D G Q

6 W 0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9

8 6 1 6 1 6 1 7 1

1 1 8

B 7 4

KL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL S D F G L KF G D Q R YAIKL KANNTKK NTKK YNKANNTKK KVYNKANNT KK KVYNKANNT KK KVYNKA V VI I KVYNKAN QRTTEV VI I KV QRTTEV RVI I Q I RTKEV RVI I Q I RTTEV RVI I Q I RTTEV V T R GKY I I ALFKPV T R GKY I I MALFKPV G T KY I MALFKPV G T KY I MALFKPV G T EVP S S M S S MALFKPV G T KY I K Q HHLLYLKEVP K Q HHLLYLKEVP S S Q HHLLYLKEVP S S K Q HHLLYLKEVP S S Q HHLLYLKEV EVL S HRALDERH S HRALDERH ALDERH L S H RALDERH L S K H RALDERH SA PIDLNEYE E VL PIDLNEYE E VL S K HR LNEYE E V SA E E V SA LNEYE E V SA PN S I LLTADRLE R S A GPN S I LLTADRLE R S A GPN I PID SLLTADRLE G R PN I PI DLNEY SLL TADRLE G R PN I PI D SLL TADRLE G R PN VLEAT LN MVLEAT LN MVLEAT LLN IMVLEAT LLN IMVLEAT LLN IMVL NY LA N L GYDL L I SMVNY LA N L GYDL L I SMVNY LA G N YDL S L MVNY MVNY YDL S L MVNY DA S K EDLIDKHEK A S K EDLIDKHEK A S K EDLIDKHEK K LA G N YDL S L LA G N SED LIDKHEK K S ED LIDKHEK SDF N LVLDAKRV S D DF N LVLDAKRV S D DF LDAKKV D A SDF RV D A SDF LDAKRV D A SD KH G G VKRYLPDL H G VKRYLPDL H N LV GVKRYLPDL N L VLDAK DL N L V RYLPDL GA G TLLEE G K A G G TLLEE K H GA G G VKRYLP G EE K H GA G G VK G LTLLEE K H GA RH K EFL T Q KKDDEVFDRH K EFLTLLEE G K A G G QKKDDEVFDRH K EFL QKKDDEVFDRH K E FLTLL NH Q L L T Q KKDDEVFDRH K E F DDEVFDRH KYV I K A K EKLVNH Q T I K A K EKLVNH Q T L K EKLVNH Q L A EKLVNH T Q KK QL A EKLVNH E S S K S T ETVKYV E S S K S IKYV I K A S K T ETVKYV S T S K K S TVKYV S T S K K S K TVKY DNE G KPD G KPD T ET SNT L G NLK NE QDPTV S D NT L G NLK NE E S S K GKPD L G NLK E G E T PD T E LK E G E T PD T E QDPTV D N SNT L G N TV D N SNT Y V L G NLK RIK A YV SRK S T LMLL A YV T Q DPTV S D NT SRK S LMLL A YV LMLL A Y V SRK T Q DP SLMLL RIK S A RK T Q DPTV D N SN SLMLL RI TEKANYDKKNV T RIK QTEKANYDKKNV T RIK S RK S T QTEKANYDKKNV T RIK QTEKAKYDKKNV Q T TEKAKYDKKNV Q T TE LRVL DAAEKILLRVL DAAEKILLRVL DAAEKILLRVLRDAAEKILLRVLRDAAEKILLR VVIM G R DLDLDD M G R DLDLDD DLDD MD DLDLDD 
VVIMD DLDLDD VV KKNRITKEIFE I VVI GKKNRITKEIFE I VVIM G R DL GKKNRITKEIFE I VVI GKKNRI TKEIFE G I KKNRI TKEIFE G I KK NY KTIKAPNLKNY KTIKAPNLKNY KTIKAPNLKNY KT IKAPNLKNY KAPNLKNY D V F Q R TVRLKTIKD V R TVRLKTIKD V LKTIKD V IKD V KT I Q T VRLKTIKD I Q P G DALDIMDI F Q Q P G DALDIMDI F Q Q P R TVR GDALDIMDI F Q Q P R T VRLKT GD ALDIMDI F P G R D ALDIMDI F SF M NF FFREA S I F F NF FFREA S I F NF FFREA I Q S F HFFREA I Q S F DD S EY W H GY DD M NF FFREA S I SEY W H GY PDD S M EY W H GY DKLLK N KENP SYIE LK N KEN SYIE N KENPDD S M EY W H GY NPDD M NF SEY G W Y KENPDD SYIE LK N KE SYIE KRVEF V I STKFFE S DKL GKRVEF V I STKFFE S DKLLK GKRVEF V I STKFFE S DKL GKRVEF V I STKFFE S DKLLK I S N YIE DK GKRVEF S V TKFFE G S KR LFKFANPFE ALFKFANPFE ALFKFANPFE ALFKFANPFE ALFKFANPFE NAIF F D R A NE S N QDLF L A NE S N QDLF NE S N QDLF LF YNE Q S DLF S S V S L P T Y GKP S P T Y GKP N D R S S V L A SP T Y GKP N D R SV L A NE S N QD SP T Y GKP N D R SV L A SP G T KP LN D QVTY Q N D R SV S A L SFA Q S VTY IET S Q A L SFA Q VTY IET S Q A L SFA Q S VTY IET S Q A L SFA Q S VTY IET S Q S A FA S S Q V PLAK A IET GDH A K G A DH IPLAK G A DH V FPL Y G L A DIPLA PL L A G A D HV PL L A IPLAK G A DH IPLAK G A DH A DIPL G A D GKHV PL L A G A D HV PL Y G L A K KHV I S DLN G L A G G KHV GKDEI K F SDLN L Y GA G G K EI K F SDLN L Y GAL G G KDEI K F SDLN L Y GA G G K L G KDEI K F SDLN G L A G G G KDEI S K HK T V S L DDKHHK R KT L G KD SDDKHHK KT LDDKHHK S DDKHHK KT V S L DDKHHK DL R K GIL S I ELIVI DL G IL I V SELIVI DL R G IL I V SELIVI R KT GIL I V SELIVI VTKITYDA ITYDA VI L S VTKITYDA VI L DL SVTKIT YDA VI L DL G R IL S I ELIVI DL SVTKIT YDA I S L VT DIDLFKVI Y VI S L VTK GEK DIDLFKVI G Y EK I G Y EK LF KVI G Y EK EK DI 0 YVWDLKILAEL G D YVWDLKILAEL D DIDLFKV GYVWDLKILAEL D DID GYVWDL KILAEL D DIDLF KVI Y V G G YVWDL KILAEL G D YV 0 DRVKHD Q D RVK D G N D G S V I R I M G N N FL G G Y N D Q KVKHD G S D V I R I M G N N Y FL KH D O S V I R I M G N G N Y FL H G G N D Q RV G S D V I R I M G N N Y FL G 
G N D Q D KVKH D G S V I R I M G N N Y FL G G N D G Q D K S V 6 W 0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9

8 7 1 7 1 7 1 7 1

1 1 8

B 7 4

D L NR D R NNIKKKYYNR HVMI

SF G L KF G Q YAVMI G R N QYAIK R NHVMI L L NR VMI L NR QYAIKL S D F G KF D NH G Q R YAIKL S D F G L KF D NHVMI G Q R YAIKL S D NNT KK VYIKL D L SF L KF D L D L SF L NR GKF G D GKK KVYNKANNTKK KVYNKANNTKK NKANNTKK RVI I K TKNKANNT I Q I RTTEV RVI RTTEV RVI I KVY EV VI I KVYNKAN QRTTEV KY I I Q R SMALFKEV I S I MALFKPV G T KY I I Q I LFKPV G T KY I I Q RTT PV T R GKY I I ALFKPV T R GK P S HHLLYLPV T RV GKY LYLKEVP S S MALFK QHHLLYLKEVP S S M LLYLKEVP L K Q S H RALDERKEVP S HHLLYLKEVP S S MA K Q HHL Q HH QRALDERH S HRALDERH HRALDERH K I PI DLNEYE VL S K HDLNEYE E VL SA IDLNEYE E VL S K SA IDLNEYE E VL S HRALDERH SLL TADRLE G H S E A PITADRLE G R PN I P SLLTADRLE G R PN I P SLFTADRLE R S A PIDLNEYE E VL GPN S I LLTADRLE R S A GPN S I EAT N LLN L IMPN S I LL N LLN IMVLEAT LLN IMVLEAT LLN MVLEAT LN MVLE KLA G YDL S MVVLEAT G YDL S L MVNY L S L MVNY A G N YDL L I SMVNY LA N L GYDL L I SMVNY SED LIDKHEKNY LALIDKHEK K LA G N YD KHEK K L DLIDKHEK A S K EDLIDKHEK A S K F N L VLDAKKV A S K EDVLDVKRV D A S EDLID SDF AKRV N A S E SDF LVLDVKRV S D DF DAKRV S D DF G G V KRYLPDL S D DF KRYLPDL N LVLD LPDL N VKRYLPDL H N LVL GVKRYLPDL H G K E FLTLLEE H N L GVFLTLLEE K H VKRY GA G G G EFLTLLEE K H GA G G G EFLTLLEE G K A G G TLLEE G K A G G T Q K KDDEVFD G K A G G KDDEVFDRH K KKDDEVFDRH K KKDDEVFDRH K EFL QKKDDEVFDRH QL KLVRH K E QK K EKLVNH T Q Q L V I K V SK K E S ETINH Q T L K A EKLVNH T Q Q L SK S ETVKYV T K A S S K S K T K A EKLVNH Q T L K S K VKYV T K A E E S G K PD T NLKKYV I PD T NLK T ETVKYV S S K EKLVNH Q T GNLK TPD T ET GNLK E S S K S T ETVKYV T V L G Q DPTV NE E S G KV L G Q DPTV D NE G E TPD SNT YV L DPTV D NE G E SNT YV L DPTV D NE G TPD SNT L G NLK NE QDPTV S D NT K A Y SR K S T LMLL D NT K S T LMLL RIK S A RK T Q S LMLL RIK S A RK T Q S LMLL RIK A YV SRK S T LMLL KAN YDKKNV T S Q RIK A Y SRYDKKNV Q T TEKAKYDKKNV Q T TEKAKYDKKNV Q T TEKAKYDKKNV T RIK QTEK VL DATEKILTEKANDVAEKILLRVLRDVAEKILLRVLRDVAEKILLRVLRDVAEKILLRV IM G R DLDLDD RVLRDLDLDD VIMDDLDLDD VVIMDDLDLDD 
VVIMDDLDLDD NRI TKEIFE I L GVVIMETKEIFE I V GKKNRITKEIFE G I KKNRITKEIFE G I KKNRITKEIFE I VVI GKKN VKT IKAPNLKKKNRIIKAPNLKNY PNLKNY TIKAPNLKNY TIKAPNLKNY Q VRLKTIKNY VRLKTIKD V KTIKA KTIKD V K TVRLKTIKD V K P R T GD ALDIMDID V KT ALDIMDI F Q TVRL QP G R DALDIMDI F Q DALDIMDI F Q TVRLKTIKD V M NF REA F Q G DALDIMDI F Q Q P R T GD I F NF FREA I Q P G R SF F HFFREA I Q P R Q P FFREA S S F F FFREA S I F SEY W HFF GY ENP S I F NF W H GY S EY W HF GY KENPDD M N SEY G W Y KENPDD M N SEY W H GY LLI N K IE D S M EY N KENPDD M YIE DKLLK I S N YIE DKLLK N KENPDD S M SYIE VEF V I S Y STKFFE S D GDKLLK V I S YIE KLLK STKFFE S D GKRVEF V I S N STKFFE G S KRVEF S V TKFFE G S KRVEF V I STKFFE S DKL GKRV KFA NPFE NPFE E NAIFKFANPFE NALFKFANPFE AIFK R A S NAKRVEF QDLLFKFA S NALFKFANPF E Q S DLF V S L P T YNE GKP T YNE Q DLF GKP D R SV L A SP T YN GKP D R A YNE Q S DLF V S L D R A YNE S N QDLF TY ET S Q A LNF SFA D R IET S Q A LN LN P G T KP SFA Q S VTY IET S Q S A FA S S Q VTY Q LN S S V S L P G T KP AK A I G DH S S V L A SP DH A IET S S A FA Q VTY Q N D R SV S A L SFA Q S VT FPF Y L A G A D VTY GK I Q Q PLAK A G DH G L A A DIPLAK A L A A DIPLAK G DH A PLAK A IET GDH A GKHV FPL Y G L A DI V FPL Y G L A DIPLA DLN G L A G G KDEV L Y G G KHV GA G K FPL K L L L G KDEI S DLN L Y G G A G KDEI S DLN G L A G G KH GKDEI S K DLN G A G G KHV GKDEI K F SD RKT V S DDKHI K FPL SDLN DKHHK T V S L DDKHHK T V S L DDKHHK GIL S I ELIVI K KT I V S DDKHHK KT L G S ELIVI L G R IL I V S D SELIVI DL R K GVL S I ELIVI DL R K GIL S I ELIVI DL G R KIT YDA I L H SDL G R ILYDA VI L D SVTKITYDA VI S L VTKITYDA VTKITYDA DLF KVI Y V GEK TKITKVI G Y EK IDLFKVI G Y EK DIDLFKVI Y VI S L GEK DIDLFKVI Y VI S L VTK GEK DID 0 WDL KILAEL H V GDIDLFKILAEL D D GYVWDLKILAEL G D YVWDLKILAEL G D YVWDLKILAEL G D YVW 0 VKH D N N Y FL YVWDLD FL KVKHD N N L KVKHD O I R I M G G G N D G Q D K V K H M G N G N G Y N D G Q S D V I K I M G G Y F G N D G Q S D V I R I M G N G N Y FL G N D G Q D KVKHD S V I K I M G N G N Y FL G N D G Q D 
KV S V I 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9

8 7 1 7 1 7 1 8 1

1 1 8

B 7 4

L R HVMI L NR D R NNIKKKLINR

F L N G KF G D R N QYAIKL S D F G L KF G Q YAVMIDF R NHIKKKLINR IKKKLINR QYAVMIDF R NN VMIDF KF D NHIKKK G Q R YAVMID NT KK KVYNKANNTKK KVYIKLFM S KF G D Q YA I Q KK KVYIKLFM Q S KK VI I Q RTTEV I KVYIKLFM S KF G D QKK RTTNKA H I Q I RTKNKA EH I KVYIKLF QRTTNKA Y I I I Q RTKNKA S S MALFKPV T RVI GKY S MALFKEV S EH I I Q I LFKEV S E GNI S I MALFKEV S NI I I SMALFKEV S K Q HHLLYLKEVP S HHLLYLPV T G NI S MA GRALHHLLYLPV G T RALHHLLYLPV T G G RALHHLLYLPV T G G R SH RALDERH E VL K Q S HRALDERKEVA RALDERKEVA TRALDERKEVA PI DLNEYE S A PIDLNEYE VY K T Q DLNEYE K TRALDERKEVA LL TADRLE G R PN S I LLTADRLE G H S E A L S S TADRLE G H E VY Q K SA S DLNEYE STADRLE G H E VY Q DLNEYE SA L S S TADRLE H E VY G S A AT LLN IMVLEAT N IMPN I L S L LLN IMPN S I LN MPN S I LA G N YDL S L MVNY N LLN N S I GYDL L IMP SMVVLE D L GI N LL GYDL S L MVVLE G D I G N YDL S L MVVLE D L GI N L GYDL L I SMVVLE ED LIDKHEK K LA SEDLIDKHEKNYN LIDKHEKDYN NYN NL VLDVKRV D A SDF VLDAKKV AT S R Q S VLDAKKV T S RS LIDKHEK QVLDAKKV R LIDKHEKDYN GV KRYLPDL N L KRYLPDL S D DLNPKRYLPDL D A SDLNPKRYLPDL D AT S Q S VLDAKKV AT SDLNPKRYLPDL S D DL KE FLTLLEE K H GA G G V G FLTLLEE H N FLTLLEE QK KDDEVFDRH K E KDDEVFD G K A G A N G S KDDEVFD K H GA A N GN S FLTLLEE GKDDEVFD K H N FLTLLEE H GA G A N G S KDDEVFD G K A G A L T A EKLVNH T Q K QL K A KLVRH AY KLVRH Y A EKLVRH Y E S S K K S K TVKYV S I S K K E S ETVNH Q T MR K A SK K E S ETVNH T A QMR S K K S K NH T A QMR K A SK K EKLVRH S G T PD T E L G NLK E G E KPD T NLKKYV PD T NLKKYV KPD T ETV GNLKKYV KPD T ETVNH Q T KKYV AY V TV D N SNT YV L G Q DPTV NE V K SRV L G Q DPTV RV L DPTV S RV L G NL QD SR K T Q DP SLMLL RIK S A RK S T LMLL D NTAPK S T LMLL D NE S V SNTAPK T Q S LMLL D NE V PTV SNTAPK S T LMLL D NE AK YDKKNV Q T TEKANYDKKNV S S Q RIKLIYDKKNV Q S RIKLIYDKKNV Q S RIKLIYDKKNV S S NT QRIK LR DVAEKILLRVL AAEKILTEKMTDAAEKILTEKMTDAAEKILTEKMTDAAEKILTEK MD DLDLDD VVIM R D GDLDLDD RVR DLDLDD RVR LRVR DLDD RI TKEIFE G I KKNRITKEIFE I L GIVIR G T TKEIFE I L GVVIR T 
DLDLDD GTKEIFE G I VVIR T DL GTKEIFE I LRV GVVI KT IKAPNLKNY KAPNLKKKN IKAPNLKKKN LIKAPNLKKKN LIKAPNLKKKN RT VRLKTIKD V KTI Q TVRLKTIKNY T L TIKNY T FVRLKTIKNY T FVRLKTIKNY GD ALDIMDI F P G R DALDIMDID V G FVRLK MDID V G KALDIMDID V G NF HFFREA I Q S F HFFREA F Q NKALDI QPEF REA F Q N QPEF HFFREA F Q NKALDIMDID V EY G W Y KENPDD M NF SEY G W Y KENP S I F FD W HFF GY ENP S I F FD G W Y KENP I Q PEF W HFFREA F Q Q P SF D G Y KENP S I F LK I S N YIE DKLLK I S N YIE D S M EA N K IE D S M EA I S N YIE DD M F SEA I S N YIE EF S V TKFFE G S KRVEF S V TKFFE S D GDKLFP V I S Y STKFFE S D GDKLFP S V TKFFE G S DKLFP S V TKFFE S DD S M GDKL FA NPFE S NAIFKFANPFE NAKRV NPFE S NAKRV NPFE NAKRV ANPFE AKRV LA YNE Q DLF R A YNE Q S DLIFK L A S Q DLIFK L A S S P G T KP LN D V S L P G T KP LNF T YNE T YNE Q S DLIFK S L E S N QDLIFK ALNF G KP F G YN Y IET S Q S A FA S S Q VTY ET S Q S A FA D RY S G G KP S S VKAIET S Q S FA D RY S G D RY S G T KP Q LNF K G A DH DIPLAK A I GDH A D Q VTPTDH S S VKAIET Q LN S S A FA A S S VKAIET S S FA D R PL L A G A KHV FPL Y G L A K Q I PLALL L A A D VTPTDH L A I Q VTPTDH A S S V GK I Q Q PLALL A D PLALL Y G L A D I Q VT LN L Y GA G G DEI S K DLN G L A G G G G K Q G KDEV L Y G KDEV L Y GA G G KDEV FKT G L A G G K Q PLA GKDEV KT L G K SDDKHHK V S L DDKHI K FKT G A G SD L G DKHI K FKT SD I S K D L V S L DDKHI K F SD IL I V SELIVI DL R KT GIL S I ELIVI HK I L I V S D VI K I L S DDKH CL I V L SELIVI HK I L S I ELIVI HK IT YDA I S L VTKITYDA I S L DL R C L S ELI GKNYDA VI L H SDL G R KNYDA DL R C G KNYDA VI S L DL R LF KVI Y V GEK DIDLFKVI Y V GEK VTKDIKVI Y G G EK TKDIKVI Y VI S L GEK VTKDIKVI G Y EK VTK 0 DL KILAEL G D YVWDLKILAEL G H DID KILAEL H V GDIN KILAEL G H DID VKILAEL G H DID 0 KH D N Y FL VKHD N L YVW I V Q D W I V Q D FL YVW Q I O K I M G N G G N D G Q D K S V I R I M G G N Y F G N D G Q D K V V S S M G N N Y FL V G G N D Q Y G D K V V S S M G N G N G Y N D G Q D K V V S D S M G N G N Y FL Q YVW G N D G D K V 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 8 1 8 1 8 1 8 1 8

1 1 8

B 7 4

LI NR N R NNIKKKLINR R NHIKKKYYNR

F F G Q YAVMIDF KF G D Q YAVMI L R NHVMI IKKKYYNR QYAIKL D L SF L NR GKF G D R NN QYAVMI L KF D R NNIKKK G Q YAVMI M S K Q KK KVYIKLFM Q S KK KVYNKANNTKK KVYIKL S D F G L KK EH I Q I RTKNKA I KVYIKL S D F L KF G D GKK L S D RTTEV RVI Q RTKNKANNT I KVYIK QRTKNKAN NI S I MALFKEV S EH I I Q RTTNKANNT SMALFKEV I I Q I LFKPV G T KY I I I EV VI I I GMALFKEV AL HHLLYLPV T G NI GRALHHLLYLPV T RVI G MA GKY HHLLYLKEVP S G MALFK QHHLLYLPV T R GKY LLYLPV T R GK KT RALDERKEVA RALDERKEVP S RALDERH HRALDERKEVP S HH Q LNEYE VY K T Q DLNEYE H E VL K Q S HDLNEYE E VL S K SA IDLNEYE K Q RALDERKEVP L S D S TADRLE G H S E A L S S TADRLE G S A PITADRLE G R PN I P SLLTADRLE G H E VL S HDLNEYE SA PITADRLE H VL G S E A DL LLN IMPN S I N IMVLEAT LLN IMPN S I LL LN MPN S I GI G N YDL S L MVVLE D L GI N LLN N S I LL GYDL L IMP S VVLEAT N LL GYDL S L MVNY A G N YDL S L MVVLEAT N L GYDL L I SMVVLE R IDKHEKNYN LIDKH G M KNY LA KHEK K L DLIDKHEKNY S S L Q VLDAKKV T S R Q S VLDVKRV A S K ED L ID GLDAKRV D A S E SDF LVLDAKKV K LALIDKHEKNY NP KRYLPDL D A SDLNPKRYLPDL S D DF KRYLPDL N VKRYLPDL D A S EDVLDAKKV A S K SDF N LTLLEE FLTLLEE H N L GVLLTLLEE K H GA G G G ELLTLLEE N LKRYLPDL S D DF N S F G KDDEVFD K H GA A N GN G S KDDEVFD G K A G G KDDEVFDRH K KKDDEVFD K H G VLLTLLEE H GA G G DEVFD G K A G G AY A K EKLVRH KLVNH T Q Q L RH K EKD K MR S K K S TVNH T AY QMR K A KLVRH K E QK SK K E S ETINH Q T L K V SK K E S ETVKYV T V EKLV S S K K S K NH T Q Q L K V K EKLVRH S VK PD T E LKKYV KPD T NLKKYV T PD T NLK TPD T ETI T GNLKKYV I S K SPD T ETINH Q S R V L G N TV E S V RV L G Q DPTV NE E S G TV L G Q DPTV D NE G E SNT YV L DPTV L G NLKKYV AP K T Q DP SLMLL D N SNTAPK S T LMLL D NT K S T LMLL RIK S A RK T Q S LMLL D NE G E KV Q DPTV SNT YK S T LMLL D NE LI YDKKNV Q S RIKLIYDKKNV T S Q RIK A Y SRYDKKNV Q T TEKAKYDKKNV Q T RIK S A RYDKKNV T S NT QRIK MT DAAEKILTEKMTDAAEKILTEKAKDAAEKILLRVLRDAAEKILTEKANDAAEKILTEK R LDLDD LRVR LDLDD RVLRDLDLDD VIMDDLDLDD LRVLRDLDLDD R T D G TKEIFE G I VVIR T D GTKEIFE I L 
GVVIMDTKEIFE I V GKKNRITKEIFE G I VVIMETKEIFE I LRV GVVI TL IKAPNLKKKN KAPNLKKKNRIIKAPNLKNY TIKAPNLKKKNRIIKAPNLKKKN GF VRLKTIKNY T LI RLKTIKNY TIKD V K TVRLKTIKNY TVRLKTIKNY NK ALDIMDID V G FV QNKALDIMDID V KTVRLK MDI F Q Q P G R DALDIMDID V K EF HFFREA F PEF HFFREA F Q ALDI QP R T GD REA S I F NF HFFREA F Q TALDIMDID V FD G W Y KENP I Q S F Y KENP S I F NF W HFF GY ENPDD S M EY G W Y KENP I Q P G R D HFFREA F Q Q P SF F G W Y KENP S I F EA I S N YIE DD M FD G W SEA I S N YIE D S M EY N K IE KLLK I S N YIE DD M N SEY I S N YIE FP S V TKFFE G S DKLFP S V TKFFE S D GDKLLK V I S Y STKFFE S D GKRVEF S V TKFFE G S DKLLK S V TKFFE S DD S M GDKL LA NPFE NAKRV ANPFE S NAKRVEFNPFE N S T YNE Q S DLIFK S L YNE Q DLIFKFA S NALFKFANPFE NAKRVEFNPFE AKRV QDLF Q DLLFK Y S G G KP LNF RY S G G T KP LNF T YNE D R SV L A DLLFKFA YNE S SP T YNE Q S GKP F KA IET S Q S A FA D VKAIET S Q S A FA D R G KP SV L A SPIET S Q A LN SFA Q S VTY IET Q LN T KP S S A FA D R A G Q LNF PT DH D S S I Q VTPTDH A D Q S VTY DH S S V S L PIET S S A FA D R LL L A G A K Q PLALL Y G L A K Q I PLAK G A L A A DIPLAK G A DH L A Q VTY A S S V GKHV G A D PLAK A DH G Y G L A D I Q VT KT L Y GA G G L G KDEV FKT G L A G G G KDEV L Y G KDEI K FPL SDLN L Y GA G G K Q I GKDEV FPL G L A G G K Q PLA GKDEV IL S DDKHI S K D I V S L DDKHI K FPL G A G SDLN L G DKHHK KT I S K DLN V S L DDKHI K F SD CL I V SELIVI HK I L ELIVI HK I V S D VI L G R IL I V S L DDKH SELIVI HK T S I ELIVI HK KN YDA I L C L S S DL R G KNYDA I S L DL R KT S ELI GILYDA VI L D SVTKITYDA DL R K GILYDA VI S L DL R DI KVI Y V GEK VTKDIKVI Y V GEK VTKITKVI Y G G EK IDLFKVI Y VI S L GEK H VTKITKVI G Y EK VTK 0 V KILAEL G H DID VKILAEL G H DIDLFKILAEL D D GYVWDLKILAEL G DIDLFKILAEL G H DID 0 Q I S D N N FL Q YVW Q I N L YVWDLD FL RVKHD FL YVWDLD O V S M G G G Y N D G D K V V S D S M G G N Y F G N D G Q D K V K H M G N G N G Y N D G Q S D V I R I M G N G N G Y N D G Q D K V K H M G N G N Y FL Q YVW G N D G D K V 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 8 1 8 1 8 1 9 1 8

1 1 8

B 7 4

[Figure pages: amino acid sequence alignment shown in one-letter codes. The aligned sequences, figure labels and page footers could not be recovered from the text extraction.]

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 3 3 3 3 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y VP K L YFLLE S Y K L DVVL L H Y GNTE VK G S D L G H NTE VK G V S P D L H YFL GDTEAVVL KN L S S L H YFLLEP GNTE VK VP K L YFLLE S Y G S D L G H NTE VK G VP K SD NR D VMLDV NR Y DAPYD NR N S D VMLDV NR N S D VMLDV KF G D R N S Q YYIKLFY KF D N S D VMLDV NR G Q R YYIKLFY KF G D R N QYK Q I ID F G D Q R YYIKLFY KF G D Q R YYIKLFY EK KVNNRA D EK AE E K SVVN G K G KK KVNNRA II Q I RTEEV G L II I KVNNRA D KKIKV QRTEEV G L II LPPVEDL II Q I RTEEV G D KK VNNRA K G L II I K QRTEEV G D HIALFL V K G G YH ALFL K G H K RT GLY SHLFKR Q P DIKK H I SHLFKR P V G Y QDIKK H M SHLFK R EVVLL ITLFL G YH FL K G L GYH Q DA K HLFKR P V QDIKK H IAL SHLFKR P V QNIKK KALDEK LDEK P KALDEK G K P H S S H SHL KALDEK VKP KALDEK VKP DLVEYR G H K VKP KA SATE DLVEYR H VK G S K ATE DLEEYIM D N SLKAD DLVEYR G H S K ATE DLVEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLMV PHI TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE EK K Y GANYR LLE MIVLWV GYEI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYDI S P RVRDDLT G N YEI S L EKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHDLNHDLE LVEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKEEAAFDY VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPFDKHL RYLPEE HKNL KRYLPEE HKNL MLTFLFD K HKNLKR GATKKMLTFLFD G K ATKKMLTFLLV Q L K LTFLFD G K ATK MLTFLFD G K ATK KDD RHKANKDD I S H SY G S G M GKT KDD LVRHKA S K KDD VLVRHKA S K KV Y VLV G NHRHLKV Y VLVRHKAN KDDYVT KTDN V Y V G TINHRHL KV Y AK S K K TI EYPYDAK K G INHRHL KA S K T LLN K I K PE S Q LK GNTLKDILIPE S Q LKEYPYD AK K DKL S Q S K S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD VP Q F EPL F G NTLKDILI PE S ET QEPL LRV F G NL QAPV S VI G V S A Q PE S Q G NTLKDILI PE S NTLKDILI QREIVH VP Q F EPL LLRVP F G Q EPL NLLR KNLILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LILILNR YDPKYILTE T EYDPKYILTE T T K T GE YDK YN T L I KNLILV Q S S N SIK T T KNLILV Q S S S IK QL YDPKYILTE DAVNLD LR D G S LYDAANLD LY DAA S K LE I DV Q G KE T DAANLD LR D G E YDPKYILTE T T GE SLY DAANLD R S 
D LY DLKLDE G I VV LDE I LR S D GVV H DLDLDLK S I YE G N ADLKLDE G I VV T RK N H GN L DLK GT KNLVKE T N H LKLDE I L GVV H GN L D G T YLKRK G N N G L V Y E QE L YLK GNIKNYKPTV Y E QE L YLKRK G N N G L TH GNIKNYKPT VK E TNI SNNEIDEKH Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTAAKFKV M V C VRLKTKIDLPLI VRLKTKIDLPLI ALELMEA I ELV LMEA ELV ALKLMNPIEVEP ALELMEA LV S ALELMEA ELV WHFYRNP S F S ALE Q HFYRNP S I F I Q S FYHE GYHKN NE I I SEH G W YHKN EH W H P S I F I S I Q G YNKEP S FKIKL W HFYRNP I E SF Q HFYRN G A RN YHKN I I SEH G W YHKN E S I EH VIKYI K S S S G DKE I IKYI S KS NE GDKE A Q F S L G S Y IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKR N I TYIN GL V I STKFFDLP V N I IKYI S KS N GDKE NPFAADIIF K G S TNPFAADIIF S K S Q AD S V TKFFNAKR V K G L S TKFFNAKR N I GL KT NPFE S I S PWKNPFAADIIF S T NPFAADIIF S K T FD F Q VV L FD L F F Q K A T FD L K T GN E EL QAF S D N F Q K A FD E EL F Q A N A M QAF S D N F LN T F G D Q F S N G T N VHL S Y R G K G T N E E A Q AF S D N G N E E QAF D F S N F Q A S LN IE Q A N S DV Q A V G L GELIE A Q N DV A S Q V G G EL I Q KP Q AV QK KT IE Q N DV A S Q V G LN GEL IE Q A N DV Q A V G G EL DH V D NT S K PL IDH T S S K V PL RI DH S K S K PL LY G I T R Q E Y G N Q I T L S D Q I N L SV H T S G EKHDKK I Y D G Q Y G N D Q V I T RI DH N T S S K L Q E Y G D V P QI T RI GKV K ED Q V GKLNI S N E S A T G L KV K ED GKLNI N Q E G V SE S A T L Y GA LYLK L KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST IVIDDILHKEA DDILHKEA L NKI SEDVD N G Q IDDIL SELIV DLVL T VI S S I ELIV D TV G K HKEA I VIDDILHKEA YEK I K N CDITKLTYEK K NDLVL S T I D SELIIK T D S VTIN I V SELIVKNDLVL S S ELIVKNDLVL S T Y S LITDL YEK RDITKLT YEK RDITKLT PEL G ELTDVKRYPEL Y I C DITKLT YEA GELTDVKRY PAI G Y V L QLNRKIVKPEL Y I GELTDVKRY PEL Y I GELTDVKRY 0 KLFADD YIVKRKLFADD YIVKRKKLAD DNVHL KLFADD YIVKRKLFADD YIVKR 0 K N E Y FH G E Q KI KK Y FH G E VIRKT G I K E H G E I KK H G E KI O M G A G N Y I S V N G R R M N E G A G N Y I Q KI S V N R 
KK G R M G N N F Q D G G Y N Y Y D R T Y L M G N A Y F G N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 3 3 3 4 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y K L YFLLE S Y VP K L LE S Y L YFLLE S Y LLE S Y L H Y GNTE VK G V S P D L G H NTE VK G S D L H YFL GNTE VK G VP K SD L G H NTE G VP K L YF SD L G H NTE VK G VP K SD NR D VMLDV NR S VMLDV NR D S VMLDV NR D VK SVMLDV NR N S D VMLDV KF G D R N S Q YYIKLFY KF D N D G Q R YYIKLFY KF G D R N QYYIKLFY KF G D R N QYYIKLFY KF G D Q R YYIKLFY KK KVNNRA D KK NNRA K KVNNRA II Q I RTEEV G L II I KVNNRA D KK KV QRTEEV G L II Q I RTEEV G D K GL II Q I RTEEV G D KK VNNRA HIALFL V K G G YH ALFL K G H IALFL V G K YH IALFL K G L II I K QRTEEV G D P G YH FL K G L GYH SHLFKR Q DIKK H I SHLFKR P V G Y QNIKK S H HLFKR Q P NIKK S H HLFKR P V QNIKK H IAL SHLFKR P V QNIKK KALDEK LDEK P KALDEK KP KALDEK H K VKP KALDEK VKP DLVEYR G H K VKP KA SATE DLVEYR H VK G S K ATE DLVEYR G H K V SATE DLVEYR G S ATE DLVEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE MIVLWV LLE MIVLWV GYEI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYEI S L EKDYYKD G N YEI S L EKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHRV FH VEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKDL D A SDKI G L G VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPEE KNL KRYLPEE HKNL KRYLPEE HKNL MLTFLFD K HKNLKR GATK MLTFLFD G K ATK MLTFLFD K H GATK LTFLFD G K ATK MLTFLFD G K ATK KDD RHKA S K KDD LVRHKA K M S KDD LVRHKA S K KDD VLVRHKA S K KV Y VLV G NHRHLKV Y VLVRHKA S K KDD K K TI TINHRHL KV Y V G TINHRHL KV Y AK S EYPYDAK K G INHRHL KV Y V S K T LKEYPYD AK S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE S Q LK GNTLKDILIPE S Q LKEYPYD AK K G S K LKDILI PE S Q G NTLKDILI PE S NTLKDILI VP Q F EPL F G NTLKDILI PE S Q Q EPL LRVP F G NT QEPL KNLILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LLRVP Q F EPL LLRVP F G Q EPL NLLR Q S N SIK NLILV Q S S N SIK YDPKYILTE T EYDPKYILTE T T KNLILV GE YDPKYILTE T T K T T KNLILV Q S S S IK DAANLD LR D G S LYDAANLD LY DAANLD D G E YDPKYILTE D G E YDPKYILTE T T GE I R S D DLKLDE G VV LDE I L GVV H DLKLDE I 
LR S LY DAANLD LR S LY DAANLD R S D LY GVV LKLDE G I VV T RK N H GN L DLK GT KRK N H GN L D G T N H LKLDE I L GVV H GN L D G T YLKRK G N N G L V Y E QE L YLK GNIKNYKPTV Y E QE L YLKRK G N N G L T E YL GNIKNYKPT V Q Y E G L NIKNYKPT V Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI ALELMEA I ELV LMEA ELV ALELMEA ELV ALELMEA LV LMEA ELV WHFYRNP S F S ALE Q W HFYRNP S I F I I Q S HFYRNP S I F S Q HFYRNP I E SF I S ALE Q HFYRNP S I F I Q S GYHKN KS NE I I SEH G YHKN NE S EH W G YHKN I I SNE S EH G W YHKN I S EH G W YHKN E S I EH VIKYI S G DKE I IKYI S KS G DKE G DKE IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKR N I KYI S K GL V I STKFFNAKR N I TKFFNAKR N I IKYI S KS N GDKE NPFAADIIF K G S TNPFAADIIF S K K T NPFAADIIF K G L S V S K G L S V TKFFNAKR N I GL KT NPFAADIIF S T NPFAADIIF S K T FD F GN E EL QAF S D N F Q K A FD E EL Q A D QAF D F S N F LN T F GN E EL D F Q A FD L F F Q K A FD L K T T A S N G N S N F S LN G T N E E IE Q A N DV Q V G L GELIE Q A N DV A S Q V G G EL IE A Q AF QN V Q A V G G EL IE A Q AF S D N T N E E QA QN DV A S Q V G LN G F D F S N F Q A A S LN S G EL IE Q A N DV Q V G G EL DH N T S K PL IDH T S S K PL RI DH S D L ED Q V I T N T S K V PL RI DH V L Y G I T R Q E Y G N Q I T N T S S K PL G D Q I T RI DH N T S S K L Q E Y G D V P QI T RI GKV K ED Q V GKLNI S N E S A T G KV G K KLNI N Q S E A E G ED ST L Y GKV G K KLNI N Q S E A E L Y ST G KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST IVIDDILHKEA DDILHKEA IDDILHKEA IDDILHKEA VIDDILHKEA SELIV DLVL T VI S S I ELIVKNDLVL S T I V SELIVKNDLVL S T I V SELIVKNDLVL S T S I ELIVKNDLVL S T YEK I K N CDITKLTYEK DITKLT YEK RDITKLT YEK RDITKLT PEL G Y ELTDVKRYPEL Y IRDITKLT YEK IR GELTDVKRY PEL G Y ELTDVKRY PEL Y I GELTDVKRY PEL Y I GELTDVKRY 0 KLFADD E YIVKRKLFADD YIVKRKLFADD FADD E YIVKRKLFADD YIVKR 0 K E YIVKRKL NE FH G Q KI R KK E Y FH G G KI E FH G I KK H G E KI O M G A G Y N Y I S V N G R M N E G A G N Y I Q KI S V N R KK E FH G R 
M G N A G Y N Y I S Q V N R KK G R M G N A G Y N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 4 4 4 4 1 8

1 8 2 2 2 2 B 7 4

L RLIM S D DKI G G L YFLDVVL N DVVL L L H Y GED MV DTEAVVL K L S S L H YFL GDTEAVVL S K S N L H YFLLE S Y GNTE VK VP K L YFLLE S Y G S D L G H NTE VK G VP K SD NR D I S L EK K HKNLL G H GATKKNR Y DAPYD NR N S D VMLDV NR N S D VMLDV KF G KHRVRHKANKF D NY PYD NR G Q R YK I DA Q KID F G D R N QYK Q I ID F G D Q R YYIKLFY KF G D Q R YYIKLFY KKIDKDLNHRHLKKIKVAE S E VVN G K G KKIKVAE E K SVVN G K G KK VNNRA II LPEEKYPYDII K RTLPPVEDL II RTLPPVEDL II I K QRTEEV G D KK VNNRA LL M G K LY K G L II I K E Q RTKEV G D HM G WLFD G LY EVV SHLVVLV D DILI SNLLR H M SHLFK Q R H S K S H HLFK R EVVLL IALFL G YH FL K G L GYH Q DA K HLFKR P V QDIKK H MAL SHLFKR P V QDIKK KA DKTIRIKTTKALDEK K DA G N S P HL KALDEK G K P H S S H SHL KALDEK VKP KALDEK VKP D L TE EEDLEEYIM S D LKAD DLEEYIM D N D S LKAD DLVEYR G H S K ATE DLVEYR G H S K ATE T L G S E A ELK GE LR S LYTAERLMV YPHI TAERLMV PHI TAERLIMPNDD TAERLIMPNDD NLLEP T L Q VV H LE K G K ANYR LLE EK K Y GANYR LLE LWV GYDVLI Q S RK G N N G L N L GYD P E SRVRDDLT G N YD P RVRDDLT G N YEI L MIV SEKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKYILNYKPTLVE Q I HDLNHDLE LVE I S Q HDLNHDLE LVEKHRV AFH LVEKHRV AFH VLENLD DLPLIVLEDKEEAAFDY VLEDKEEAAFDY VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLDE G I YLPFDKHL FDKHL RYLPEE HKNL KRYLPEE HKNL MLT I ELV SF S KR QMLTFLLV H Q KRYLP S G L MLTFLLV Q L K LTFLFD G K ATK MLTFLFD G K ATK KDD I NIK GL NE I I SEHKDDYVTI S S Y G G KT KDDYVTI S H SY G S G M GKT KDD LVRHKA S K KDD VLVRHKA S K KA KT I K QIDKE IKA LKTDN V Y V G TINHRHL KA Y AK S K VMEAKR N LAK K DKLKTDN S K I KA TLLN K I K K S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE YRNPIF K G S K TPE S ETLLN S AK K DK S Q PE S E L VI Q S G V S A Q PE S Q L G Q V S F AV F G N QTPV S VI QREIVH V F G N QTPV Q S REIVH VP F G NTLKDILI PE S NTLKDILI V F KTE S Q EPL NLLRV F G Q EPL NLLR K T Q S LYIK G N F Q S LILILNR S LILILNR NLILV Q S S S IK K S T LILV Q S S A S NK T S IK YNKFFNA Q V G L GELYDK YN DV Q T L I K T QL YDK YN DV T L I K Q Q L YDPKYILTE T T DAAEKDIPL IDAA S K LE 
I G KE E G I E T DAANLD LR D G E YDKKYILTE T T GE SLY DAANLD R S D LY DLDEEL I T R Q I A EDLDLDLK S YE N T DAA S K L GADLDLDLK I K SYE G N ADLKLDE G I VV TK E D I S N E S TTH IKNLVKE T N H DLDLDE I L GVV H E G N G L TH YLKRK G D N G L VE S T S AF S S VHKEA E TNIKNLVKE TH SNNEIDEKH VK E TN SNNEIDEKH Y E LKRK QE L Y GNIKNYKPT VK S E G L NIKNYKPT VR L AKK DLVL T VK SVRLKTAAKFKV C M VRLKTAAKFKV M V C VRLKTKIDLPLI VRLKTKIDLPLI AL Q LEN Q V ITKLTALKLMNPIEVEP ALKLMNPIEVEP ALELMEA LV S ALKLMEA ELV WHFKKLNDVKRY HFYHE KL GYN S FKIKL HFYRNP I E S W S F W P S I F I Q GDI YIVKR G YNKEP S FKI G RN W HFYHE G N YHKN I I Q HFYRN SEH G YNKN E S I EH VIT S LK G L V YINA Q A F S L G YNKEP SY TYINA A R QF S L G W IKYI KS NE S G DKE STKRIRD D KI K IT S N G R R S TKFFDLP V AD V I STKFFDLP V S Y N I ITYI S KS N GDKE NKFKDLML Q V DNPFE S Q S Q AD S V TKFFNAKR V K G L S TKFFNAKR N I GL AM S I S PWKNPFE S I S PWKNPFAADIIF S T NPFAADIIF S K T FNRDD RR V N QEI FN Q VV FD L F F Q K A FN GE LFH G E NIPLN G T D Q F Q VV L N A M QAVVHL S Y R G K T F G D Q F VHL Y L SR G K G T N E E QAF S D N E EL K T IK Q P KNYIIK KI Q K Q P K G LN G T VF D F S N F Q A S LN DH S K V D KT I Q KP Q AV QK S K V D LKT IE Q A N DV A S Q V G EL I K Q Q Q P N DV Q A V G G EL LKW DK Y K SFDDH S K PL S V S D Q I N L SV Q I S N V H T S K L LY G DP G T Q V LNL F Y G L G EKHDKK I Y DH G Q L S D HDKK G I Y D V I DH T S Q Y G N D Q I T R Q E Y G L D V P QI T RI GK L TLAPNV S L F G L A NKILYLK L Y G G V EK LYLK L KV K E GKLNI S N E S A T G L A K E GKLNI N Q S E A E ST IK S G A I DEKYK D S L EDVD N K G A G Q L NKI SEDVD N SELVK S K SAEDKKF S I ELIIK D TV SVTIN I D SELIIK D TV G K G Q VIDDILHKEA L I D S DDILHKEA SELIV VL S T S YEA ENITTIPAYEA D S VTIN I ELIVKNDLVL S T SLITDL YEK K NDL Y C DITKLT YEA RDITKLT PKI G DFEYE APAI Y L S D LITDL YEA G Q V LNRKIVKPAI G Y V L QLNRKIVKPEL Y I GELTDVKRY PAI Y I GELTDVKRY 0 KLLAAVEMT N L SKPKKLAD D DNVHL KLFADD YIVKRKKLADD YIVKR 0 K D DNVHL KKLAD T N Y KKKDNII Y F Q VIRKT G I K N F Q VIRKT G I 
K I KK H G E KI O M G G G Y N F R F H I A K G M G N G N G N Y Y D R T Y L M G N G G Y N Y Y D R T Y L M N E H G E G A Y F G N Y I Q K S V N G R R M G N G N Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 4 4 4 5 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y K L YFLLEP LE S Y L H Y GDTE VK G V S P D L G H NTE VK G VP K LHYFL LE S Y H YFLLE Y SD L G P K L SD L H YFL GNTE VP K L S G S D L G NTE VK G VP K SD NR D D R N S VMLDV NR D VK V SVMLDV NR D VK SVMLDV NR N S D VMLDV KF G Q YYIKLFY KF D N S D VMLDV N G NTE G G Q R YYIKLFY KF G D R N QYYIKLFY KF G D R N QYYIKLFY KF G D Q R YYIKLFY KKIKVNNRA D KK NNRA K KVNNRA II RTKEV G L II I KVNNRA D KK KV QRTEEV G L II Q I RTKEV G D K GL II Q I RTKEV G D KK VNNRA HM G K LFL V K G G YH ALFL K G H IALFL V G K YH IALFL K G L II I K QRTKEV G D H G YH FL K G L GYH SHLFKR Q P DIKK H I SHLFKR P V G Y QDIKK S HLFKR Q P NIKK S H HLFKR P V QDIKK H IAL SHLFKR P V QNIKK KALDEK LDEK P KALDEK KP KALDEK G K VKP KA SATE DLVEYR H VK G S K ATE DLVEYR G H K V SATE DLVEY H K VKP KALDEK VKP DLEEYR H R G S ATE DLVEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE MIVLWV LLE MIVLWV GYDI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYEI S L EKDYYK G N YEI S L EKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHRV FH VEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKDL D A SDKI G L G VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPEE KNL KRYLPEE KNL KRYLPEE HKNL MLTFLFD K HKNLKR GATK MLTFLFD G K ATK MLIFLFD K H GATK LIFLFD K H GATKKMLIFLFD G K ATK KDDYVLVRHKA S K KDD LVRHKA K M S KDD LVRHKAN KDD VLVRHKA S K KA D NHRHLKT Y VLVRHKA S K KDD K K TI TINHRHL KT Y V G TINHRHL KT Y AK S EYPYDAK K G INHRHL KT Y V S K T LKEYPYD AK S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE S Q LK GNTLKDILIPE N Q LKEYPYD AK K G S K LKDILI PE S Q G NTLKDILI PE S NTLKDILI V F APL F G NTLKDILI PE S Q Q EPL LRVP F G NT QEPL K T Q S LILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LLRVP Q F EPL LLRVP F G Q EPL NLLR Q S N SIK NLILV Q S S N SIK YDK TE T EYDPKYILTE T T KNLILV GE YDPKYILTE T T K T T KNLILV Q S S S IK DAA K YIL SLD LR D G S LYDAANLD LY DAANLD D G E YDPKYILTE D G E YDPKYILTE T T GE R S D DLDLDE G I VV LDE I L GVV H DLKLDE I LR S 
LY DAANLD LR S LY DAANLD R S D LY GVV LKLDE G I VV TH TYLKRK D H GN L DLK GT KRK N H GN L D G T N H LKLDE I L GVV H GN L D G T YLKRK G N N L E G VK S NNIKNYKPTV Y E QE L YLKRK G N N G L T E YL GNIKNYKPT V Q Y E G L NIKNYKPT V Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI ALKLMEA I ELV LMEA ELV ALELMEA ELV LELMEA LV WHFYRNP S F S ALE Q HFYRNP S I F I Q S FYRNP S I F S A S ALELMEA ELV GYNKN NE I I SEH G W YHKN EH W H GYHKN I I Q HFYRNP I E SF I I Q HFYRNP S I F I Q S S NE S EH G W YHKN S EH G W YHKN E S I EH VITYI K E S I S G S DKE I IKYI S KS N GDKE G DKE IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKR N I KYI S K GL V I STKFFNAKR N I TKFFNAKR N I IKYI S KS N GDKE NPFEADIIF K G S TNPFAADIIF S K D N F Q K A FD K T NPFAADIIF K G L S V S K G L S V TKFFNAKR N I GL KT NPFAADIIF S T NPFAADIIF S K T FN F G D EL K N F Q A Q A FD L K T QVF S S N G T N E EL Q A D QAF D F S N F S LN T F GN E EL D F S S LN T FD F GN E EL F VF S D N I Q K Q P K Q A V G L GELIE Q A N DV Q A V G G EL IE A Q VF QN V Q A V G G EL IE A Q Q N DV A S Q V G LN G T N E E QVF D F S N F Q A S LN GEL IE Q A N DV Q A V G G EL DH S DV L V S K PL IDH T S S K PL RI DH S D S V N T S K V PL RI DH K L LY G G ED Q V I T R Q E Y G N I T Q I T N T S S K PL G D Q V I T RI DH N T S Q E Y G D V P QI T RI GA K ED Q LNKLNI S N E S A T G L KV G KLNI N Q S E A E G ED ST L Y GKV G K KLNI N Q S E A E L Y ST G KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST ID S EDILHKEA DDILHKEA IDDILHKEA IDDILHKEA VIDDILHKEA SELIVKNDLVL T VI S S I ELIVKNDLVL S T I V SELIVKNDLVL S T I V SELIVKNDLVL S T S I ELIVKNDLVL S T YEA Y IRDITKLTYEK DITKLT YEK IRDITKLT YEK RDITKLT PAV G ELTDVKRYPEL Y IRDITKLT YEK IR GELTDVKRY PEL G Y ELTDVKRY PEL G Y ELTDVKRY PEL Y I GELTDVKRY 0 KKLADD E YIVKRKLFADD YIVKRKLFADD FADD E YIVKRKLFADD YIVKR 0 K E YIVKRKL NN FH G Q KI R KK E Y FH G G KI E FH G I KK H G E KI O M G R G Y N Y I S V N G R M N E G A G N Y I Q KI S V N R KK E FH G R M G N A G 
Y N Y I S Q V N R KK G R M G N A G Y N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 5 5 5 5 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y K L YFLLE S Y VP K L LE S Y L YFLLE S Y LLE S Y L H Y GNTE VK G V S P D L G H NTE VK G S D L H YFL GNTE VK G VP K SD L G H NTE G VP K L YF SD L G H NTE VK G VP K SD NR D VMLDV NR S VMLDV NR D S VMLDV NR D VK SVMLDV NR N S D VMLDV KF G D R N S Q YYIKLFY KF D N D G Q R YYIKLFY KF G D R N QYYIKLFY KF G D R N QYYIKLFY KF G D Q R YYIKLFY KK KVNNRA D KK NNRA K KVNNRA II Q I RTKEV G L II I KVNNRA D KK KV QRTKEV G L II Q I RTKEV G D K GL II Q I RTEEV G D KK VNNRA HIALFL V K G G YH ALFL K G H IALFL V G K YH IALFL K G L II I K QRTKEV G D P G YH FL K G L GYH SHLFKR Q DIKK H I SHLFKR P V G Y QDIKK S H HLFKR Q P DIKK S H HLFKR P V QDIKK H IAL SHLFKR P V QDIKK KALDEK LDEK P KALDEK KP KALDEK H K VKP KALDEK VKP DLVEYR G H K VKP KA SATE DLVEYR H VK G S K ATE DLVEYR G H K V SATE DLVEYR G S ATE DLVEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE MIVLWV LLE MIVLWV GYEI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYEI S L EKDYYKD G N YEI S L EKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHRV FH VEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKDL D A SDKI G L G VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPEE KNL KRYLPEE HKNL KRYLPEE HKNL MLIFLFD K HKNLKR GATK MLIFLFD G K ATK MLIFLFD K H GATK LIFLFD G K ATK MLIFLFD G K ATK KDD RHKA S K KDD LVRHKA K M S KDD LVRHKA S K KDD VLVRHKA S K KT Y VLV G NHRHLKT Y VLVRHKA S K KDD K K TI TINHRHL KT Y V G TINHRHL KT Y AK S EYPYDAK K G INHRHL KT Y V S K T LKEYPYD AK S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE S Q LK GNTLKDILIPE S Q LKEYPYD AK K G S K LKDILI PE S Q G NTLKDILI PE S NTLKDILI VP Q F EPL F G NTLKDILI PE S Q Q EPL LRVP F G NT QEPL KNLILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LLRVP Q F EPL LLRVP F G Q EPL NLLR Q S N SIK NLILV Q S S N SIK YDPKYILTE T EYDPKYILTE T T KNLILV GE YDPKYILTE T T K T T KNLILV Q N S S IK DAANLD LR Y G S LYDAANLD LY DAANLD D G E YDPKYILTE D G E YDPKYILTE T T GE I R S D DLKLDE G VV LDE I L GVV H DLKLDE I 
LR S LY DAANLD LR S LY DAANLD R S D LY GVV LKLDE G I VV T RK N H GN L DLK GT KRK D H GN L D G T D H LKLDE I L GVV H GD L D G T YLKRK G D N G L V Y E QE L YLK GNIKNYKPTV Y E QE L YLKRK G D N G L T E YL GNIKNYKPT V Q Y E G L NIKNYKPT V Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI ALELMEA I ELV LMEA I ELV ALELMEA ELV LELMEA ELV WHFYRNP S F S ALE Q HFYRNL S F GYHKN I I Q S HFYRNL S I F S A SF S ALELMEA ELV GYHKN KS NE I I SEH W EH W G YHKN E I I Q HFYRNP I SEH G W YH I I Q HFYRNL S I F I S W Q K S N KN S EH G YHKN E S I EH VIKYI S G DKE I IKYI S KS NE S G DKE YI S G DKE I IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKRK N I K GL V I STKFFNAKRK G N L S V TKFFNAKR N I IKYI S KS N GDKE NPFAADIIF K G S TNPFAADIIFF K T NPFAADIIFF K G L S V TKFFNAKRK N I GL KT NPFAADIIF S T NPFAADIIFF TFD F GN E EL QVF S D N F Q K A FD E EL Q A D QAF D F S N F LN T F GN E EL D F Q A FD L F F Q K A FD L K T T A S N G N S N F S LN G T N E E IE Q A N DV Q V G L GELIE Q A N DV A S Q V G G EL IE A Q AF QN V Q A V G G EL IE A Q AF S D N T N E E QA QN DV A S Q V G LN G F D F S N F Q A A S LN S G EL IE Q A N DV Q V G G EL DH N T S K PL IDH T S S K PL RI DH S D L ED Q V I T N T S K V PL RI DH V L Y G I T R Q E Y G N Q I T N T S S K PL G D Q I T RI DH N T S S K L Q E Y G D V P QI T RI GKV K ED Q V GKLNI S N E S A T G KV G K KLNI N Q S E A E G ED ST L Y GKV G K KLNI N Q S E A E L Y ST G KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST IVIDDILHKEA DDILHKEA IDDILHKEA IDDILHKEA VIDDILHKEA SELIVKNDLVL T VI S S I ELIVKNDLVL S T I V SELIVKNDLVL S T I V SELIVKNDLVL S T S I ELIVKNDLVL S T YEK IRDITKLTYEK DITKLT YEK RDITKLT YEK RDITKLT PEL G Y ELTDVKRYPEL Y IRDITKLT YEK IR GELTDVKRY PEL G Y ELTDVKRY PEL Y I GELTDVKRY PEL Y I GELTDVKRY 0 KLFADD E YIVKRKLFADD YIVKRKLFADD FADD E YIVKRKLFADD YIVKR 0 K E YIVKRKL NE FH G Q KI R KK E Y FH G G KI E FH G I KK H G E KI O M G A G Y N Y I S V N G R M N E G A G N Y I Q KI S V N R KK E FH G R M G N A G 
Y N Y I S Q V N R KK G R M G N A G Y N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 5 5 5 6 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y K L YFLLE S Y VP K L LE S Y L H Y GNTE VK G V S P D L G H NTE VK G S D L H YFL GNTE P K L H YFLLE S Y V K L YFLLEP NR D VMLDV NR D VK G V S D L G DTE VK G S P D L G H DTE VK G VP K SD SVMLDV NR S VMLDV NR N S D VMLDV NR N S D VMLDV KF G D R N S Q YYIKLFY KF D N D G Q R YYIKLFY KF G D R N QYYIKLFY KF G D Q R YYIKLFY KF G D Q R YYIKLFY KK KVNNRA D KK NNRA KIKVNNRA II Q I RTEEV G L II I KVNNRA D KK KV QRTEEV G L II Q I RTEEV G D K GL II RTEEV G D KKIKVNNRA HIALFL V K G G YH ALFL K G H IALFL V G K YH I G K LFL K G L II TEEV G D H G YH K R FL K G L GYH SHLFKR Q P DIKK H I SHLFKR P V G Y QDIKK S HLFKR Q P DIKK S H HLFKR P V QDIKK H I G L SHLFKR P V QDIKK KALDEK LDEK P KALDEK KP KALDEK G K VKP KA SATE DLVEYR H VK G S K ATE DLVEYR G H E V SATE DLEEY H K VKP KALDEK VKP DLVEYR H R G S ATE DLEEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE MIVLWV LLE MIVLWV GYEI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYEI S L EKDYYKD G N YDI S L EKDYYKD N LLE GYDI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHRV FH VEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKDL D A SDKI G L G VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPEE KNL KRYLPEE HKNL KRYLPEE HKNL MLTFLFD K HKNLKR GATK MLTFLFD G K ATK MLTFLFD K H GATK LTFLFD G K ATK MLTFLFD G K ATK KDD RHKA S K KDD LVRHKA K M S KDD LVRHKA S K KDD VLVRHKA S K KV Y VLV G NHRHLKV Y VLVRHKA S K KDD K K TI TINHRHL KV Y V G TINHRHL KV Y AK S EYPYDAK K G INHRHL KV Y V S K T LKEYPYD AK S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE S Q LK GNTLKDILIPE S Q LKEYPYD AK K G S K TLKDILI PE S Q G NTLKDILI PE S NTLKDILI VP Q L EPL L G NTLKDILI PE S Q Q EPL LRVP F G N QEPL KNLILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LLRVP Q F EPL LLRVP F G Q EPL NLLR Q S N SIK NLILV Q S S N SIK YDPKYILTE T EYDPKYILTE T T KNLILV GE YDPKYILTE T T K T T KNLILV Q S S S IK DAANLD LR D G S LYDAANLD D YILTE LY DAANLD D G E YDPK D G E YDPKYILTE T T GE S DLKLDE G I VV LDE I LR GVV H DLKLDE 
I LR S LY DAANLD LR S LY DAANLD R S D LY GVV LKLDE G I VV T RK N H GN L DLK GT KRK N H GN L D G T N H LKLDE I L GVV H GN L D G T YLKRK G N N G L V Y E QE L YLK GNIKNYKPTV Y E QE L YLKRK G N N G L T E YL GNIKNYKPT V Q Y E G L NIKNYKPT V Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI ALELMEA I ELV LMEA ELV ALELMEA ELV ALELMEA LV LMEA ELV WHFYRNP S F S ALE Q W HFYRNP S I F I I Q S HFYRNP S I F S Q HFYRNP I E SF I S ALE Q HFYRNP S I F I Q S GYHKN KS NE I I SEH G YHKN NE S EH W G YHKN I I SNE S EH G W YHKN I S EH G W YHKN E S I EH VIKYI S G DKE I IKYI S KS G DKE G DKE IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKR N I KYI S K GL V I STKFFNAKR N I TKFFNAKR N I IKYI S KS N GDKE NPFAADIIF K G S K TNPFAADIIF S K A FD K T NPFAADIIF K G L S V S T NPFAADIIF K G L S V TKFFNAKR N I GL S T NPFAADIIF S K T FD F GN E EL QVF S D N F Q E EL Q A D QVF D F S N F S LN T F Q A FD L F F Q K A T FD A G N E EL D F F K T L K T S S N LN G N E E QAF S D N N E E QA IE Q A N DV A S N G T N QV G L GELIE Q A N DV Q V G G EL IE A Q AF QN V Q A V G G EL IE Q A N DV A S Q V G LN G F D F S N F Q A A S LN S G EL IE Q A N DV Q V G G EL DH N T S K PL IDH T S S K PL RI DH S D L ED Q V I T N T S K V PL RI DH V L Y G I T R Q E Y G N Q I T N T S S K PL G D Q I T RI DH N T S S K L Q E Y G D V P QI T RI GKV K ED Q V GKLNI S N E S A T G KV G K KLNI N Q S E A E G ED ST L Y GKV G K KLNI N Q S E A E L Y ST G KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST IVIDDILHKEA DDILHKEA IDDILHKEA IDDILHKEA T I VIDDILHKEA SELIVKNDLVL T I VI S S ELIVKNDLVL S T I V SELIV K NDLVL T V S S I ELIV VL S S ELIV YEK IRDITKLTYEK K I C DITKLT YEK K NDL CDITKLT YEK K NDLVL S T CDITKLT PEL G Y ELTDVKRYPEL Y IRDITKLT YE GELTDVKRY PEL G Y ELTDVKRY PEL Y I GELTDVKRY PEL Y I GELTDVKRY 0 KLFADD E YIVKRKLFADD YIVKRKLFADD FADD E YIVKRKLFADD YIVKR 0 K E YIVKRKL NE FH G Q KI R KK E Y FH G G KI E FH G I KK H G E KI O M G A G Y N Y I S V N G R M N E G A G N Y I Q KI S V N R KK 
E FH G R M G N A G Y N Y I S Q V N R KK G R M G N A G Y N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 6 6 6 6 1 8

1 8 2 2 2 2 B 7 4

L FLLE S Y K L YFLLE S Y VP K L LE S Y L YFLLE S Y LLE S Y L H Y GNTE VK G V S P D L G H NTE VK G S D L H YFL GNTE VK G VP K SD L G H NTE G VP K L YF SD L G H NTE VK G VP K SD NR D VMLDV NR S VMLDV NR D S VMLDV NR D VK SVMLDV NR N S D VMLDV KF G D R N S Q YYIKLFY KF D N D G Q R YYIKLFY KF G D R N QYYIKLFY KF G D R N QYYIKLFY KF G D Q R YYIKLFY KK KVNNRA D KK NNRA K KVNNRA II Q I RTEEV G L II I KVNNRA D KK KV QRTEEV G L II Q I RTEEV G D K GL II Q I RTEEV G D KK VNNRA HIALFL V K G G YH ALFL K G H IALFL V G K YH IALFL K G L II I K QRTEEV G D P G YH FL K G L GYH SHLFKR Q DIKK H I SHLFKR P V G Y QDIKK S H HLFKR Q P DIKK S H HLFKR P V QDIKK H IAL SHLFKR P V QDIKK KALDEK LDEK P KALDEK KP KALDEK H K VKP KALDEK VKP DLVEYR G H K VKP KA SATE DLVEYR H VK G S K ATE DLVEYR G H K V SATE DLVEYR G S ATE DLVEYR G H S K ATE TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD TAERLIMPNDD NLLE MIVLWV LE IVLWV LE MIVLWV LLE MIVLWV GYEI S L EKDYYKD N L GYEI L M SEKDYYKD N L GYEI S L EKDYYKD G N YEI S L EKDYYKD N LLE GYEI L MIVLWV SEKDYYKD LVEKHRV EKHRV AFH LVEKHRV FH VEKHRV AFH LVEKHRV AFH VLEDKDL D AFH LV SDKI G G VLEDKDL S D DKI G G VLEDKDL D A SDKI G L G VLEDKDL S D DKI G G VLEDKDL S D DKI G G KRYLPEE YLPEE HKNL KRYLPEE KNL KRYLPEE HKNL KRYLPEE HKNL MLTFLFD K HKNLKR GATK MLTFLFD G K ATK MLTFLFD K H GATK LTFLFD G K ATK MLTFLFD G K ATK KDD RHKA S K KDD LVRHKA K M S KDD LVRHKA S K KDD VLVRHKA S K KV Y VLV G NHRHLKV Y VLVRHKA S K KDD K K TI TINHRHL KT Y V G TINHRHL KT Y AK S EYPYDAK K G INHRHL KV Y V S K T LKEYPYD AK S K K LKEYPYD AK K G S K TINHRHL QLKEYPYD PE S Q LK GNTLKDILIPE S Q LKEYPYD AK K G S K LKDILI PE S Q G NTLKDILI PE S NTLKDILI VP Q F EPL F G NTLKDILI PE S Q Q EPL LRVP F G NT QEPL KNLILV Q S S NLLRVP SIK TKNLILV S NL Q S S IK S LLRVP Q F EPL LLRVP F G Q EPL NLLR Q S N SIK NLILV Q S S N SIK YDPKYILTE T EYDPKYILTE T T KNLILV GE YDPKYILTE T T K T T KNLILV Q S S S IK DAANLD LR D G S LYDAANLD LY DAANLD D G E YDPKYILTE D G E YDPKYILTE T T GE I R S D DLKLDE G VV LDE I L GVV H DLKLDE I 
LR S LY DAANLD LR S LY DAANLD R S D LY GVV LKLDE G I VV T RK N H GN L DLK GT KRK N H GN L D G T N H LKLDE I L GVV H GN L D G T YLKRK G N N G L V Y E QE L YLK GNIKNYKPTV Y E QE L YLKRK G N N G L T E YL GNIKNYKPT V Q Y E G L NIKNYKPT V Y E LKRK QE L Y GNIKNYKPT V Y E QE G L NIKNYKPT VRLKTKIDLPLIVRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI VRLKTKIDLPLI ALELMEA I ELV LMEA ELV ALELMEA ELV ALELMEA LV LMEA ELV WHFYRNP S F S ALE Q W HFYRNP S I F I I Q S HFYRNP S I F S Q HFYRNP I E SF I S ALE Q HFYRNP S I F I Q S GYHKN KS NE I I SEH G YHKN NE S EH W G YHKN I I SNE S EH G W YHKN I S EH G W YHKN E S I EH VIKYI S G DKE I IKYI S KS G DKE G DKE IKYI K NE S G S DKE STKFFNAKR N L S V TKFFNAKR N I KYI S K GL V I STKFFNAKR N I TKFFNAKR N I IKYI S KS N GDKE NPFAADIIF K G S TNPFAADIIF S K K T NPFAADIIF K G L S V S K G L S V TKFFNAKR N I GL KT NPFAADIIF S T NPFAADIIF S K T FD F GN E EL QAF S D N F Q K A FD E EL Q A D QAF D F S N F LN T F GN E EL D F Q A FD L F F Q K A FD L K T T A S N G N S N F S LN G T N E E IE Q A N DV Q V G L GELIE Q A N DV A S Q V G G EL IE A Q AF QN V Q A V G G EL IE A Q AF S D N T N E E QA QN DV A S Q V G LN G F D F S N F Q A A S LN S G EL IE Q A N DV Q V G G EL DH N T S K PL IDH T S S K PL RI DH S D L ED Q V I T N T S K V PL RI DH V L Y G I T R Q E Y G N Q I T N T S S K PL G D Q I T RI DH N T S S K L Q E Y G D V P QI T RI GKV K ED Q V GKLNI S N E S A T G KV G K KLNI N Q S E A E G ED ST L Y GKV G K KLNI N Q S E A E L Y ST G KV K E GKLNI S N E S A T G L KV K E GKLNI N Q S E A E ST IVIDDILHKEA DDILHKEA IDDILHKEA IDDILHKEA VIDDILHKEA SELIV DLVL T VI S S I ELIV NDLVL S T I V SELIV VL S T S I ELIV YEK I K N CDITKLTYEK K NDLVL S T I V SELIV K I C K DITKLT YEK K NDL CDITKLT YEK K NDLVL S T CDITKLT PEL G Y ELTDVKRYPEL Y I C DITKLT YE GELTDVKRY PEL G Y ELTDVKRY PEL Y I GELTDVKRY PEL Y I GELTDVKRY 0 KLFADD E YIVKRKLFADD YIVKRKLFADD FADD E YIVKRKLFADD YIVKR 0 K E YIVKRKL NE FH G Q KI R KK E Y FH G G KI E FH G I KK H G E KI O M G A G Y N Y I S V N G R M N E G A G N Y I Q KI S V N 
R KK E FH G R M G N A G Y N Y I S Q V N R KK G R M G N A G Y N Y I Q K S V N G R R M N E G A Y F G N Y I S Q V N R K G R 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 6 6 6 7 1 8

1 8 2 2 2 2 B 7 4

[Figure: multiple sequence alignment of amino acid sequences over several drawing sheets. The aligned sequences and sheet headers could not be recovered from the text extraction.]
KAI KME H H S DIKTI 0 NTA G Y NY G S IRDRINTA Y FH GNY G S IRDRI NTA G Y NY S DIKAI KLE FH GIRDRI NTA G Y NY S DI GIRDRI NTA Y F GNY IPDRI O M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T S G D K W I N 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 1 1 1 1 1 8

1 8 3 3 3 3 B 7 4

I TEDVVKPY I ETETAVKPY I TAVKPY KPY I ETETAVKPY L H E G IAIL G L L G H FMIAIL L I G L H ETETAV G FMIAIL G L L G H NFMIAIL G L NR G D R NKV QYY DVLFTNR D NFMIAIL G L L H ETE G G Q R YFEEVLFT NR G D R N QYFEEVLFT NR G D R N QYFEEVLFT NR G D Q R YFEEVLFT KFIKVN Q I IKVDV VKKI KFIKVDV KKI KFIKVDV VKKI KFIKVDV VKKI KK RTEE I VKKIKF SA M KV L V SA K RTKV S L A M KK TKV S V A M II G Q LFLPPN S D S KK Q RTKV S V A M GLFY N S D S KK RT Y PN D M S S K I G Q LFY PN S D S II Q R GLFY N S D S YILFKREVL Y Q II QHYILFKI I P QVL F Q II G Q LF QH YILFKI Q I VL Y Q I ILFKI Q I VL Y Q EHLDEK DY S N TIEHLDEDEDY G N TI EHLDEDEDY N Q H Y GTI EHLDEDEDY N Q H YILFKI I P QVL F Q N Q H GTI EHLDEDEDY G TI KANEYK G Y NEYLP AHPL KANEYLP HPL KANEYLP HPL KANEYLP AHPL DLENIIM D AHPLKA SDAL ENIRE S D DAL T DLENIRE D A SDAL LENIRE D A SDAL DLENIRE S D DAL TALK L MV T DL KHLA S TALK A S TALK K LA T D S TALK K LA S T TALK NLDI S EK G A D DI L K SR Y HL G G K A LN DI S L R G Y K H GA LDI S L R G Y K H GA L K HLA S T GYEKHRVRH I L SEL N L GYEKHIMRH S I EL N L GYEKHIMRH I LN SEL G N YEKHIMRH I LN S R G Y G K A LN SEL N LDI GYEKHIMRH S I EL LVDEKDLNHE VLIDEKMVNHE MVNHE IDEKMVNHE VL LPEEKY N EVL N VLIDEK EKKY N VL L SFLFDAN K G S KRR Y LPEKKY G E VL SLLRVAN S K Y LP VAN K G E V S KR Y LPEKKY N VLIDEKMVNHE RR Y R S LLRVAN K G E VL PEKKY N V GE S KRR Y L SLLRVAN S K MLDYVLV F Q K DYVDL NF K KRR S LLR Q MLDYVDL F Q K LDYVDL F Q K KDK S N SV T ML SKDKEKEE S S V L S T KDKEKEE S N SV T M DKEKEE S N SV T MLDYVDL NF K K Q KAA E KTI SEIKVE G L G TKAA DVE G L S K G AA FDVE G L S KDKEKEE S S V L S T G AK DE LR E YAK S EFDVE G G E T KAA VLR E T K K S E GELVLR E T KAA EFDVE G G ET PE Q F T L VI T G Q RPE F G ELVLR G Y AK S EF Q F G EL IVI T G Y A Q RPE Q F TIVI T G Y AK S ELVLR DY Q RPE F G Q PTIVI Q T VKL E P Q Q LV Q S RKV S A KVKL E PTIVI Q T QLLKRKI A RPE Q PT SKVKL Q E LLKRKV S A KVKL E P QLLKRKV S A KVKL Q E LLKRKI A R SK KKPKYILNYEARKKPKY LNYEARKKPKY LNYEARKKPKY EARKKPKY LNYEAR YDTTLD I D F TLDYDTTL Q T LD 
YDTTL Q T DAKLDE G LDV S D Q F T GKLVDAKLDV S D TLD YDTTL T LNY Q D TLD YDTTL Q T Q G F KLVDAKLDV Q S F KLVDAKLDV S D Q F TLD GKLV DLDLNLK I G KL SFKR V DAK SDLDLNIL S I FKRI DLDLNIL S I FKRI DLDLNIL I G S FKRI DLDLNIL S I FKRI TEE DEVRKTEE IKL K N ST I K SIDKI DIKL K ND RKTEE ND STE I DEV GDKI I DEVRKTEE D DEVRKTEE ND EVRK GDKI KL K N STE G I DKI D IKL S K TE I D GDKI IRNLMEAKRN G R F RNLMLKKRN R D IKL S K TE GF NLMLKKRN R D I GF RNLMLKKRN G R F RNLMLKKRN R D GF SLFYRNPIF F S I LFYRIKIF NF I R SLFYRIKIF NF S I LFYRIKIF I LFYRI WHHKHD F V N KHLIF V EK HKHLIF V EK HHKHLIF V NF S KIF QEK HHKHLIF V NF GYKYIP G S N Q EK HH A S PLF G W YKYIEA N Q LF W H GYKYIEA N Q S PLF G W YKYIEA N PLF G W YKYIEA N Q EK SPLF VIKFFNA Q V A IKFFNP A S P QV EA KFFNP Q A V EA V IKFFNP A S Q V KFFNP Q A V EA STFAADLPL Y E SFA S V TFAAD PL S Y FA V I STFAAD TFAAD PL Y EA I SFA NPD S PL S Y FA S S V TFAAD L S Y FA EEL E V L L PNPD S E ET G I G I PD T G S I L P NPD T S P GI TF Q F G I S N V S Q NAI N L SV L P NPD N A S E ET AI N L SV L P N S F E E AI S V S L F E E Q NAI N L SV L P S G K Q A N S DVHKKY A T F G G K Q A N S A DLHKKY G A T F GK A Q Q N A N SDLHKKY G A T N GK A Q Q N S A DLHKKY G A G T K Q A N S A DLHKKY G A IE KKVDLRK TKL E DLRK IE RK IE RK E TKL DLRK DH N T G EDEITVP L IE SDH G N K EF G ITVP S L DH N TKL G EF E DL GITVP S L DH N TKL DL G F G E ITVP L I S DH G N F G E ITVP S L L YV G K KLKDVTLT YV G KDVDVTLT V G K KDVDVTLT V K E GKDVDVTLT YV K E GKDVDVTLT GDIEDILYIAIF G L YIED YIAIF L Y GYIED VYIAIF L Y GYIED AIF G L YIED IKLIVKN IV K V GE E KFIT LIV K VYI GE FIT KLIV K VYIAIF GE Q KFMT SEK LK Q KFIT KL SVHLF S I EK Q KFMT LIV G K K S Q VHLF I K SEK LLK Q K SVHLF S I EK LK S VHLF Y L G Y DL S D LKKKLY Y LLK S VHLF I K SEK LL GDILLKKKL Y L G Y DILLKKKL Y L G Y DILLKKKL Y Y L ILLKKKL P Q E FAED NRKDNP E L QFAEKNNRKDN P Q E FAEKNNRKDN P Q E FAEKNNRKDN P E L G D QFAEKNNRKDN 0 KME Y FH S H DIKTIKME DDIKTI KME FKDDIKTI KME K DIKDI 0 NTA G NY IPDRINTA Y FK D DIKDI KME FK GNL S IRDRI NTA 
G Y NLLIRDRI NTA G Y NLLIRDRI NTA Y F GNL S D IRDRI O M A F D W T S G D K W I N M A F D W D H D R W I N M A F D W D H D R W I N M A F D W D H D R W I N M A F D W D H D R W I N 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 1 1 1 2 1 8

1 8 3 3 3 3 B 7 4

I TETAVKPY I ETETAVKPY I TAVKPY KPY I ETETAVKPY L H E G IAIL G L L G H FMIAIL L I G L H ETETAV G FMIAIL G L L G H NFMIAIL G L NR G D R NFM QYFEEVLFTNR D NFMIAIL G L L H ETE G G Q R YFEEVLFT NR G D R N QYFEEVLFT NR G D R N QYFEEVLFT NR G D Q R YFEEVLFT KFIKVDV IKVDV VKK KFIKVDV KKI KFIKVDV VKKI KFIKVDV VKK KK RTKV V VKKIKF SA M KV V V SA K RTKV S V A M KK TKV S V A M S I II G Q LFY PN S D S KK Q RTKV S V A M S I KK RT GLFY N S D Y PN D M S S K I G Q LFY PN S D S II Q R GLFY N S D YILFKI Q I VL F Q II QHYILFKI I P QVL F KII G Q LF QH YILFKI Q I VL F Q I ILFKI Q I VL F Q EHLDEDEDY G N TIEHLDEDEDY G N TI EHLDEDEDY N Q H Y GTI EHLDEDEDY N Q H YILFKI I P QVL F K N Q H GTI EHLDEDEDY G TI KANEYLP NEYLP AHPL KANEYLP HPL KANEYLP HPL KANEYLP AHPL DLENIRE D AHPLKA SDAL ENIRE S D DAL DLENIRE D A SDAL LENIRE D A SDAL DLENIRE S D DAL TALK K T DL KHLA S TALK A S T TALK K LA T D S TALK K LA S T TALK NLDI S L R G Y G A N DI L K SR Y HL G G K A LN DI S L R G Y K H GA LDI S L R G Y K H GA L K HLA S T GYEKHIMRH I L SEL N L GYEKHIMRH S I EL N L GYEKHIMRH I LN SEL G N YEKHIMRH I LN S R G Y G K A LN SEL N LDI GYEKHIMRH S I EL LIDEKMVNHE VLIDEKMVNHE MVNHE IDEKMVNHE VL LPEKKY N EVL N VLIDEK EKKY N VL L SLLRVAN K G S KRR Y LPEKKY G E VL SLLRVAN S K Y LP VAN K G E V S KR Y LPEKKY N VLIDEKMVNHE RR Y R S LLRVAN K G E VL PEKKY N V GE S KRR Y L SLLRVAN S K MLDYVDL DYVDL NF K KRR S LLR Q MLDYVDL F Q K LDYVDL F Q K KDKEKEE S NF Q K SV T ML SKDKEKEE S S V L S T KDKEKEE S N SV T M DKEKEE S N SV T MLDYVDL NF K K Q EKEE S S V L S T KAA VE G L G TKAA DVE G L S K G AA FDVE G L S KDK GET KAA EFDVE G G AK S EFD GELVLR E YAK S EFDVE G G E T KAA VLR E T K K S E GELVLR PE Q F VI T G Q RPE F G ELVLR G Y AK S EF Q F G EL IVI T G Y A Q RPE Q F TIVI T DY AK S ELVLR E T GY Q RPE F G Q PTIVI Q T VKL E PTI QLLKRKI S A KVKL E PTIVI Q T QLLKRKI A RPE Q PT SKVKL Q E LLKRKI S A KVKL E P QLLKRKI S A KVKL Q E LLKRKI A R SK KKPKY NYEARKKPKY LNYEARKKPKY LNYEARKKPKY EARKKPKY LNYEAR YDTTL T L Q S D F TLDYDTTL Q T LD YDTTL Q T 
DAKLDV Q LDV S D Q F T GKLVDAKLDV S D TLD YDTTL T LNY Q D TLD YDTTL Q T Q G F KLVDAKLDV Q S F KLVDAKLDV S D Q F TLD GKLV DLDLNIL I G KLVDAK SFKRIDLDLNIL S I FKRI DLDLNIL S I FKRI DLDLNIL I G S FKRI DLDLNIL S I FKRI TEE DEVRKTEE IKL K ND STE G I DKI DIKL K ND RKTEE ND STE I DEV GDKI I DEVRKTEE D DEVRKTEE ND EVRK GDKI KL K N STE G I DKI D IKL S K TE I D GDKI IRNLMLKKRN G R F RNLMLKKRN R D IKL S K TE GF NLMLKKRN R D I GF RNLMLKKRN G R F RNLMLKKRN R D GF SLFYRIKIF F S I LFYRIKIF NF I R SLFYRIKIF NF S I LFYRIKIF WHHKHLIF V N KHLIF V EK HKHLIF V EK HHKHLIF V NF S I LFYRIKIF QEK HHKHLIF V NF GYKYIEA N Q EK HH A S PLF G W YKYIEA N Q LF W H GYKYIEA N Q S PLF G W YKYIEA N PLF G W YKYIEA N Q EK SPLF VIKFFNP Q V A IKFFNP A S P QV EA KFFNP Q A V EA IKFFNP A S Q V STFAAD PL Y E SFA S V TFAAD PL S Y FA V I STFAAD Y EA IKFFNP Q A V EA STFAAD PL S FA V TFAAD L S Y FA NPD S PL S Y FA V S EET G S I L L PNPD S E ET G I G I PD T G S I L P NPD T S P GI TF Q NAI S N V S Q NAI N L SV L P NPD N A S E ET AI N L SV L P N S F E E AI S V S L F E E Q NAI N L SV L P S G K Q A N S DLHKKY A T F G G K Q A N S A DLHKKY G A T F GK A Q Q N A N SDLHKKY G A T N GK A Q Q N S A DLLKKY G A G T K Q A N S A DLHKKY G A IE KL DLRK TKL E DLRK IE RK IE RK E TKL DLRK DH N T G EF G E ITVP L IE SDH G N K EF G ITVP S L DH N TKL G EF E DL GITVP S L DH N TKL DL G F G E ITVP L I S DH G N F G E ITVP S L L YV G K KDVDVTLT YV G KDVDVTLT V G K KDVDVTLT V K E GKDVDVTLT YV K E GKDVDVTLT GYIED YIAIF G L YIED YIAIF L Y GYIED VYIAIF L Y GYIED AIF G L YIED IKLIV K V GE IV K V GE E KFMT LIV K VYI GE FMT KLIV K VYIAIF GE KFMT SEK LLK Q KFMT KL SVHLF S I EK Q KFMT LIV G K K S Q VHLF I K SEK LLK Q K SVHLF S I EK LK Q S VHLF Y L G Y DILLKKKLY Y LLK S VHLF I K SEK LL GDILLKKKL Y L G Y DILLKKKL Y L G Y DILLKKKL Y Y L ILLKKKL P Q E FAEKNNRKDNP E L QFAEKNNRKDN P Q E FAEKNNRKDN P Q E FAEKNNRKDN P E L G D QFAEKNNRKDN 0 KME FK D DIKDIKME K DIKDI KME FK 0 NTA G Y NL S IRDRINTA Y F GNL S D IRDRI NTA G Y NL D DIKDI KME FK KDI KME K DIKDI 
SIRDRI NTA G Y NL D DI SIRDRI NTA Y F GNL S D IRDRI O M A F D W D H D R W I N M A F D W D H D R W I N M A F D W D H D R W I N M A F D W D H D R W I N M A F D W D H D R W I N 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 2 2 2 2 1 8

1 8 3 3 3 3 B 7 4

I TEDVVKPY I ETEDVVKPY I DVVKPY KPY I ETEDVVKPY L H E G IAIL G L L G H KVIAIL L I G L H ETEDVV G KVIAIL G L L G H NKVIAIL G L NR G D R NKV QYY EVLFTNR D NKVIAIL G L L H ETE G G Q R YY FT NR G D R N QYY EVLFT NR G D R N QYY VLFT NR G D Q R YY VLFT KFIKVN Q I IKVN I EVL Q VKKI KFIKVN Q I KKI KFIKVN I E Q VKKI KFIKVN I E Q VKKI KK RTEE L VKKIKF SA M EE L V SA K RTEE S L A M KK TEE S L A M II G Q LFLPPN S D S KK Q RTEE S L A M GLFLPPN S D S KK RT LPPN D M S S K I G Q LFLPPN S D S II Q R GLFLPPN S D S YILFKREVL F Q II QHYILFKREVL F Q II G Q LF QH YILFKREVL F Q I ILFKREVL F Q EHLDEK DY G N TIEHLDEK TI EHLDEK DY N Q H Y GTI EHLDEK DY N Q H YILFKREVL F Q Q H Y G YK G NEYK Y DY N G AHPL KANEYK Y G TI EHLDEK Y G N TI KANE G HPL KANEYK G Y HPL KANEYK Y D G AHPL DLENIIM D AHPLKA SDAL ENIIM S D DAL DLENIIM D A SDAL LENIIM D A SDAL DLENIIM S D DAL TALK MV T DL KHLA S TALK V K HLA S T TALK MV LA T D S TALK MV LA T L S TALK NIDI S EK G A N DI L M SEK G A LN DI S L EK K H GA LDI S L EK K H GA L MV HLA S T GYEKHRVRH I L SEL N I GYEKHRVRH S I EL N L GYEKHRVRH I LN SEL G N YEKHRVRH I LN S EK G K A LN SEL N IDI GYEKHRVRH S I EL LIDEKDLNHE VLIDEKDLNHE DLNHE FDEKDLNHE VL LPEEKY N EVL N VLFDEK EEKY N VL L SFLFDAN K G S KRR Y LPEEKY G E VL SFLFDAN S K Y LP DAN K G E V S KR Y LPEEKY N VLIDEKDLNHE RR Y R S FLFDAN K G E VL PEEKY N V GE S KRR Y L SFLFDAN S K MLDNVLV F Q K DNVLV NF K KRR S FLF Q MLDYVLV F Q K LDYVLV F Q K KDK S N SV T ML SKDK KTI S S V L S T KDKEKTI S N SV T M DKEKTI S N SV T MLDNVLV NF K K Q K KTI S S V L S T KAA E KTI SELKVE G L G TKAA S E ELKVE G G KVE G L S K G AA LKVE G L S KD G S ELKVE G G AK DE LR E YAK E T KAA PE Q F T L VI T G Q RPE F DE LLR T G Y AK S EL LLR E T K E T KAA E Q Q F G E T T G Y AK S E GE Y AK E G Y P Q T QRKI S A KVKL E Q LV S VI QRKI A RPE Q P Q S Q RPE Q F T LLR Q VI T G Q RPE F D Q P T LLR E T Q I Q T VKL E P Q Q LV S S KVKL E VI QLV Q RKI S A KVKL E P QLV Q S RKI S A KVKL Q E LV S V QRKI A R SK 
KKPKYILNYEARKKPKYILNYEARKKPKYILNYEARKKPKYILNYEARKKPKYILNYEAR YDTTFD D TFD LD YDTTLD DAKLDE G I F TLDYDT LDE I D G F T GKLVDAKLDE I D TLD YDTTLD D TLD YDTTFD G G F KLVDAKLDE G I F KLVDAKLDE I D G F TLD GKLV DLDLNLK I G KLVDAK SFKRIDLDLNLK S I FKRI DLDLNLK S I FKRI DLDLNLK I G S FKRI DLDLNLK S I FKRI TEE DEVRKTEE KDEVRKTEE VRKTEE N KDEVRK IKL K N ST I K SIDKI DIKL K N KDEVRKTEE N ST S I IDKI IDKI KL K N ST I KDE R S IDKI D IKL S K T S I IDKI IRNLMEAKRN G F RNLMEAKRN R D IKL S K T S I GF NLMEAKRN R D I GF RNLMEAKRN G R F RNLMEAKRN R D GF SLFYRNPIF F S I LFYRNPIF NF I R SLFYRNPIF V NF S I LFYRNPIF V NF S I LFYRNPIF WHYKHD F N N KHD N EK HKHD GYKYIP G S N Q EK HY S F EK HHKHD F Q EK HYKHD N NF A S PLF G W YKYIP S F G N Q LF W H GYKYIP G N Q S PLF G W YKYIP G S N PLF G W YKYIP S F G N Q EK LF VIKFFNA Q V A IKFFNA A S P QV EA KFFNA Q A V EA IKFFNA A S Q V KFFNA A S P QV EA S FAADLPL Y E SFA S V AADLPL S Y FA V I STFAADLPL S Y FA S V TFAADLPL Y EA I SFA S V DLPL S Y FA N S T D E EL I L PN T F SD E EL I PD F Q F G E I S N V S L Q F G E I N L N I PD L SV L P S G EL G E I N L SV L P N N A S T F G E E I L P N T FAA SD L I T G I S V S L F E E Q F G E I N L SV L P S G K Q A N S DVHKKY A T F G G K Q A N S A DVHKKY G A T F GK A Q Q N A F SDVHKKY G A G K A Q F QN S A DVHKKY G A G T K Q A N S A DVHKKY G A IE KKVDLRK TKKVDLRK IE VDLRK IE RK E TKKVDLRK DH N T G EDEITVP L IE SDH G N K EDEITVP S L DH N TAK G EDEITVP S L DH N TAKVDL G DEITVP L I S DH G N DEITVP S L L YV G K KLKDVTLT YV G KLKDVTLT V G K KLKDVTLT V K E GKLKDVTLT YV K E GKLKDVTLT GDIEDILYIAIF G L DIEDILYIAIF L Y GDIEDILYIAIF L Y GDIEDILYIAIF G L DIEDILYIAIF IKLIVKN IVKN N SEK LE Q KFIT KL SVHLF S I EK Q KFIT LIVK LK Q KFMT LIVKN FMT KLIVKN FIT LE S VHLF I K SEK Y L G Y DL S D LKKKLY Y G DL S D LKKKL Y L G Y EL D S VHLF I K SEK LK Q K SVHLF S I EK E Q K SVHLF SLKKKL Y L G Y EL S D LKKKL Y Y L L S D LKKKL P Q E FAEDHNRKDNP E L QFAEDHNRKDN P Q E FAEDHNRKDN P Q E FAEDHNRKDN P E L G D QFAEDHNRKDN 0 KLE FH S DIKAIKLE H DIKAI KME 
FH 0 NTA G Y NY G IRDRINTA Y F GNY G S IRDRI NTA G Y NY S DIKDI KME FH KDI KLE H DIKAI GIRDRI NTA G Y NY S DI GIRDRI NTA Y F GNY G S IRDRI O M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N 6 W

0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 2 2 2 3 1 8

1 8 3 3 3 3 B 7 4

I TEDVVKPY I ETEDVVKPY I DVVKPY KPY I ETEDVVKPY L H E G IAIL G L L G H KVIAIL L I G L H ETEDVV G KVIAIL G L L G H NKVIAIL G L NR G D R NKV QYY DVLFTNR D NKVIAIL G L L H ETE G G Q R YY FT NR G D R N QYY DVLFT NR G D R N QYY VLFT NR G D Q R YY VLFT KFIKVN Q V IKVN V DVL Q VKKI KFIKVN Q V KKI KFIKVN V D Q VKKI KFIKVN V D Q VKKI KK RTEE I VKKIKF SA M EE I V SA K RTEE S I A M KK TEE S I A M II G Q LFLPPN S D S KK Q RTEE S I A M GLFLPPN S D S KK RT LPPN D M S S K I G Q LFLPPN S D S II Q R GLFLPPN S D F YILFKREVL Y Q II QHYILFKREVL Y Q II G Q LF QH YILFKREVL Y Q I ILFKREVL Y Q EHLDEK NY G N TIEHLDEK TI EHLDEK NY N Q H Y GTI EHLDEK NY N Q H YILFKREVL Y Q Q H Y G YK G NEYK Y NY N G AHPL KANEYK Y G TI EHLDEK Y G N TI KANE G HPL KANEYK G Y HPL KANEYK Y N G AHPL DLENIIM D AHPLKA SDAL ENIIM S D DAL DLENIIM D A SDAL LENIIM D A SDAL DLENIIM S D DAL TALK MV T DL KHLA S TALK V K HLA S T TALK MV LA T D S TALK MV LA T L S TALK NLDI S EK G A N DI L M SEK G A LN DI S L EK K H GA LDI S L EK K H GA L MV HLA S T GYEKHRVRH I L SEL N L GYEKHRVRH S I EL N L GYEKHRVRH I LN SEL G N YEKHRVRH I LN S EK G K A LN SEL N LDI GYEKHRVRH S I EL LVDEKDLNHE VLVDEKDLNHE DLNHE VDEKDLNHE VL LPEEKY N EVL N VLVDEK EEKY N VL L SFLFDAD K G S KRR Y LPEEKY G E VL SFLFDAD S K Y LP DAD K G E V S KR Y LPEEKY N VLVDEKDLNHE RR Y R S FLFDAD K G E VL PEEKY N V GE S KRR Y L SFLFDAD S K MLDYVLV DYVLV NF K KRR S FLF Q MLDYVLV F Q K LDYVLV F Q K KNKEKTI S NF Q K SV T ML SKDKEKTI S S V L S T KDKEKTI S N SV T M DKEKTI S N SV T MLDYVLV NF K K Q EKTI S S V L S T KAA VE G L G TKAA KVE G L S K G AA LKVE G L S KNK G AK S ELK GE LR E YAK S ELKVE G G E T KAA T T LLR E T K E T KAA ELKVE G G PE Q F T L VI T G Q RPE F G E LLR G Y AK S EL Q F G E T G Y AK S E GY AK S E G Y P T G E Q Q Q RKI S A KVKL E Q LV S VI QRKI A RPE Q P Q S Q RPE Q F T LLR Q VI T Q RPE F G Q P T LLR E T Q I Q T VKL E P Q Q LV S S KVKL E VI QLV Q RKI S A KVKL E P QLV Q S RKI S A KVKL Q E LV S V QRKI A R SK 
KKPKYILNYEARKKPKYILNYEARKKPKYILNYEARKKPKYILNYEARKKPKYILNYEAR YDTTLD D TLD LD YDTTLD DAKLDE G I F TLDYDT LDE I D G F T GKLVDAKLDE I D TLD YDTTLD D TLD YDTTLD G G F KLVDAKLDE G I F KLVDAKLDE I D G F TLD GKLV DLDLNLK I G KLVDAK SFKRIDLDLNLK S I FKRI DLDLNLK S I FKRI DLDLNLK I G S FKRI DLDLNLK S I FKRI TEE DEVRKTEE KDEVRKTEE VRKTEE N KDEVRK VKL K N ST I K SIDKI DVKL K N KDEVRKTEE N ST S I IDKI IDKI KL K N ST I KDE R S IDKI D VKL S K T S I IDKI VRNLMEAKRN G F RNLMEAKRN R D VKL S K T S I GF NLMEAKRN R D V GF RNLMEAKRN G R F RNLMEAKRN R D GF SLFYRNPIF F S V LFYRNPIF NF V R SLFYRNPIF V NF S V LFYRNPIF V NF S V LFYRNPIF YHHKHD F V N KHD V EK HKHD GYKYIP G S N Q EK HH S F EK HHKHD F Q EK HHKHD V NF A S PLF G Y YKYIP S F G N Q LF Y H GYKYIP G N Q S PLF G Y YKYIP G S N PLF G Y YKYIP S F G N Q EK SPLF VIKFFNA Q V A IKFFNA A S P QV EA KFFNA Q A V EA IKFFNA A S Q V STFAADLPL Y E SFA S V TFAADLPL S Y FA V I STFAADLPL S Y FA S V TFAADLPL Y EA V IKFFNA Q A V EA SFA S TFAADLPL S Y FA NPD G EL V L PNPD GI S N V S L G EL E V PD F Q A F E Q F G I N L SV L P N S G EL E V PD L GI N L SV L P N S N T F G E E V L P NPD L V T G I S V S L F G E Q F G E I N L SV L P S G K Q A N S DVHKKY A T F G G K Q A N S A DVHKKY G A T F GK A Q Q N A F SDVHKKY G A G K A Q F QN S A DVHKKY G A G T K Q A N S A DVHKKY G A IE AKVDLRK TAKVDLRK IE VDLRK IE RK E TAKVDLRK DH N T G EDEITVP L IE SDH G N K EDEITVP S L DH N TAK G EDEITVP S L DH N TAKVDL G DEITVP L I S DH G N DEITVP S L L YV G K KLKDVTLT YV G KLKDVTLT V G K KLKDVTLT V K E GKLKDVTLT YV K E GKLKDVTLT GDIEDILYIAIF G L DIEDILYIAIF L Y GDIEDILYIAIF L Y GDIEDILYIAIF G L DIEDILYIAIF IKLIVKN IVKN N SEK LK Q KFMT KL SVHLF S I EK Q KFMT LIVK LK Q KFMT LIVKN FMT KLIVKN FMT LK S VHLF I K SEK Y L G Y EL S D LKKKLY Y G EL S D LKKKL Y L G Y EL D S VHLF I K SEK LK Q K SVHLF S I EK K Q K SVHLF SLKKKL Y L G Y EL S D LKKKL Y Y L L S D LKKKL P Q E FAEDHNRKDNP E L QFAEDHNRKDN P Q E FAEDHNRKDN P Q E FAEDHNRKDN P E L G E QFAEDHNRKDN 0 KME FH S DIKDIKME H DIKDI KME FH 0 NTA G Y 
NY G IRDRINTA Y F GNY G S IRDRI NTA G Y NY S DIKDI KME FH KDI KME H DIKDI GIRDRI NTA G Y NY S DI GIRDRI NTA Y F GNY G S IRDRI O M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N M A F D W T A D R W I N 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9 3 3 3 3 1 8

1 8 3 3 3 3 B 7 4

I TEDVVKPY I ETEDVVKPY L G H DTEFM H L H E G IAIL G L L G H K IILH L G DTEFMKI G N V G L L YFLIEDYK S NR G D R NKV QYY DVLFTNR D R NKVIAIL G L NR G Q YY FT KY G D R N QY I E Q ANEI NR AKPT L G H ETE IAK N S G P SIVKN L KY G D R N QY I EK SIVKPLI NR N S D IKK S P KT KFIKVN Q V IKVN V DVL Q VKKI KK AVIAD G N T KKIKVDVVALV KF G D Q R YYIRLPHY KK RTEE I VKKIKF SA M Y EVF AIT RA V II G Q LFLPPN S D S KK Q RTEE S I A M I KV GLFLPPN S D S II Q RA H S I L Q K N FI G Q LF Y I E Q S I I Q S KKVKV N LNID SEH II T Q K A C A AA YILFKREVL Y Q II QHYILFKREVL Y Q FIVLF QH HHLFTAE L V SA HLFTAE L V SAE E R EHLDEK NY G N TIEHLDEK TI KALDELPPN G L GE F R C KALDELPPN N I S M G LFLPV E R KANEYK G Y NEYK Y NY G N G AHPL DLNEY K G L Q HLFT NV T N G Y GFRR DLENIIM D AHPLKA SDAL ENIIM S D DAL TAERL R AVLNI LNEY S KA E Q R I L KP Q NY S D AERL R AVL Q NYF K T QADL L D GEYR H E G V G NI TALK MV T DL KHLA S TALK V HLA S T LE K G H Q S T G S S V LLE K G Y N LDI S L EK G A N DI L M SEK G K A LN N L GYDI S L IM D P SDVA N YDI S L IM D A SD G LN TAERLIM S I ARNK GEL GYEKHRVRH I L SEL N L GYEKHRVRH S I EL LIEKHMV S G IEKHMI N LLD L MVPNP LVDEKDLNHE VLVDEKDLNHE K K H GA T L Q L QLY VLDDKEK K H GA T KI G YDI S EKVL S T G V Q LVEKHRVNY G K LL VL LPEEKY N KE EVL N VVLDD K G E KRYLPRVRHERRRRYLPRVRHV S A S E VLEDKDL AR RR S Y FLFDAD K G LPEEKY S KRR Y S FLFDAD S LNHERVMLTFLDLNHEAVKRYLPEE S D DD I F SE MLDYVLV DYVLV NF K KMFTFLD Q KDDYVEEDYT DDYVEEDYRLT MLTFLFD H AF KNKEKTI S NF Q K SV T ML SKNKEKTI S S V L S T DKDK G Q N K S A DKFDKDKLT KDD VLV G K A G L L KAA VE G L G TKAA K A K DKF V NKN K K S K IV KRY KA D KTIRHHM S D AK S ELK GE LR E YAK S ELKVE G G E T S K S L S S IVN Q S S PE A E VRRAK K G F T L S ELKNH RA PE Q VI T G Q RPE F G E LLR G Y PE A EI Q F G ET KTEIFI V F G ETI S N SI I KP S E V Y Q T R VKL E P Q S P Q T I Q T QLV Q RKI S A KVKL Q E LV S V QRKI A RV Q EPL SKK S K LLLTILRNEE K K Q EPLKTE S L FLTILRN G R RV S Q F G Q NP Q T E DV S KKPKYILNYEARKKPKYILNYEARYDP YH YDTTLD D TLD LD DAA S K LV S 
VV D S H VV I F TLDYDT Q RK V F Q V Y AA K Y SLV Q S RK V ND KIL LI S S Q NE S G G Q EVYDK Q I YIL S S ITD S L DAKLDE G LDE I D G F T GKLVDLKLDILNYP L G D SL DLKLDILNYPLD DAALLD EKKT DLDLNLK I G KLVDAK SFKRIDLDLNLK S I FKRI THELYN TEE DEVRKTEE I DL YF THELYN DL G L R S L KKIKENNE G I Y E LDLDE I T GLRKLL SF K D G TH NIKVVVEE VKL K N ST I K SIDKI DVKL K N KDEVRKIKENNE ST S I IDKI K S FLPF VRLKTLK L K SFL F IK S E G I L KRKIFF VRNLMEAKRN G R F RNLMEAKRN R D VRLKTL GF ALDIMIKDEVL ALDIMIKDEV S L F VR KT Q I INYN SLFYRNPIF F S V LFYRNPIF NF FYKAIDKKK G T W HFYK KYKAL Q L LMAADF S L G L Y HHKHD F V N KHD F V EK W H GYHKNEAKRKIP G YHKN E IDK QAKRKKF HFYRNP K V S N Q EK HH Q YA GYKYIP G A S PLF G Y YKYIP G S N Q LF KYINPIFII IKYIN IFIPA G W YNKTE F FPKN VIKFFNA Q V A V IKFFNA A S P QV EA V I STKFF T G A S V TKF K S P F T A ITYIK S S G DD P STFAADLPL Y E SFA S TFAADLPL S Y FANPFAD S KS F G N L SAN NPFA S F H N V C L P S V TKFFNPDK S Y L S K NPD G EL V L PNPD G EL V P Q A VFP S F D FL PFEKDLKRLEN TF Q F G E I S N V S L Q F G E I N L SV L P D T S T F GN E ED LPLDNI T F GK E EN S G A S Q V GK Q A N S A DVHKKY A F G G K Q A N S A DVHKKY G A IK A Q Q T A D SM A Q ADLPLDI A N G FN LDIFVIV IE AKVDLRK TAKVDLRK DH E I IL IK Q T MEI KNL G T E D E Q Q I N K SENKDH S EEAID Q P K S AF HY SV Q KI SRK G A DH N T G EDEITVP L IE SDH N T S S FRI N G L N TRF KEDEITVP S G AEVHKKKL G N EVHKKKT DH VKKV Q A VVKL LYV G K KLKDVTLT YV S KLKDVTLT L Y GEV G K KK DLNY L Y V K A GKK DLNRF Y G L NNPLII GDIEDILYIAIF G L DIEDILYIAIF IDDD Q V ITW L G E IDDD Q V ITWVT G L KL G E SKLAI KD S F I KLIVKN IVKN K IA G S C I I K SELIVKTDIANL VFEDIII S N FK SEK Y LK Q KFMT KL SVHLF S I EK Q KFMT I K SELIVK V C D IIE YEK IILYIIHL S I ELILKDHKD T Y L G EL S D LKKKLY Y LK D S VHLF YEK II GEL S LKKKL PEL G Y EID K E S F NAL G Y EID P Q E FAEDHNRKDNP E L QFAEDHNRKDN KLFAEKN S S E L NLFAEKD S K S E Y N YAA RDDL Y C G G I AI Y I GDLVIT T G I Q E 0 KME FH DIKDIKME Y FH DIKDI NT T V G 0 NTA G Y NY G S IRDRINTA G NY G S IRDRI 
NA E Y FL S G NL L S TY S E NT E FLT V TK K ILADD DIK G K E G NL O M A F D W T A D R W I N M A F D W T A D R W I N M F F D W D K Q RDA Y L NA S G NL L S DN F Q S N H D G N D K I Q M F F D W D K Q R G N I K E N M G N G N Y F G N Y I Y I Q K W L S N

6 W 0 5 0 1 7 . . 7 5 7 6 7 8 9 0 9 9 3 3 3 4 1 8

1 8 3 3 3 3 B 7 4

KYDKE S D P E F G D Q R YAIKL S D F G L KYAE L PVADM DLT KDKFEK Q L KKIKV Q Y S I N KKDRFYE I M S D K SIVD KK KVYNNA T KKEE C Y Q PEVFE L F Q L KLLHTEI II KAAE L N L FKF SL K G S T LVT IVTL PT RMKDV D K GVP KI F Q T Q T G Y VT Q F SK I I Q I SIALFVPT G T KY I ILE Y L R D S V REKILAFE GL G VG G LFLPP GH TEVIVTNKFL F FM Q YF Q K ARIA S HHLFT VA N H KH G M VY DALA THLFMLNEVPK S Q ID L E SK Q KHE Q I HIMPNYM MPIKKK G H S D IALEL L I QI DER IVWNY KAEDKM S DRFAKALDE R A Q ND G L GEL S DLLR KD D V RAL SV G L GN DLNEYR G Q NA I ELYIPR I VLKE AL ADLNEYK G H GNYII EP S ME R G ELL QR S G ILLII TAERLIM D H I WLF SA T K Q TATD S L KIKAYD TADRLEMPN I P SLL AATFLKV APY E NTEETNPE LLE MV E MAK NTDIRRINN E IIVLEA L G D DNA F S A S S EKKA T K G N YDI S L EK K HV S A S HL GHEAV Y S F GYLKKME K H TLL L S P MKNY T LDYVD EKHW KFDM G L Y Q N VLIEKHRVRYTLT G K II LVDDPAE G L Q V QYPY G YD QIE K L S G S YKDNE DEAD L FVL S LLYA D VLDDKDLNDKMT HLD VLELLKVNADE ILE K H N AV SK A V QI D A S E S I VL C AA EF GKVVRHA A LK FY G IE QIFAKRYLPEEDNKRY RRK KREFVDIKD RR IPDE D F N I KREDPTINHK E Q G L R A S PD I K GE MLYYKEKT M S ML S Y LLED K Q G A G G G E MLIILLKAY D Y H YFNV E AR GKT T MLTFLFDKIVRRKPE Q KDDYVIV RDTDEFL H G I KVFVRH K N KDENYILTD S R N R E S G KYIIIKAKN S EIRKLDD KAEKEI R G ED KDD SHKP EKLI T Q K T DKTI S RN RLEE RK KLT Q A VHVF K A S K N H Q LT RAAATK S NIA LNKAKLKTTEA S K S K TV G D NKD D S VL D A EKLLAME PE A ELK GETILK V S Q EVYEH IE S LYPK P K S Q T ETK S YVE GELI S KKKLDV Q S S VDEMKILVEK E RV C NF S Q K V F EPH VYPLD H LPL D LLR SYI VDIV V I DPT A DEK QNT Y L KNE LR N R RE S K Q RR KIF QNTL K K Q S LFLI Q S R K L TNKKTK G I KNLK K D Q S LLLW Q S TVK S A RN S D SLKTE G I VV S S K R T D QAHKL FE YDP N G L L E SF K E S E G D YDLIDIKNIKD YDKKYVLTEMAK NDMMLKRK D NLI I KK I K GVDAA K YIL SLN DFL L V E A CE DAKTMIKDE LTI I LRVLR Q Y VFYKMKNY N A GHE S R YII S G R S G RRFLI DLKLDE G I V S L F VLK DLYKNDI S Y DAA LDE G VLILD DLVKEDID S KKYKAHL KHEMTEA I R S RNDLD SV DT IFEKTKNRI TKKYIAA 
L KYL II A F THELYLK L E MKDVMWPNK D N GAEI T E QLPNLKNY I Q PLRRV S T Q N W R G Q T I S Q Q IKENNIKDRKKF KM S NLKRMLLAT G K IF E Q Q VVRLKTEIDFVPA G W RK VRDKRV S DYKHLVRLKTIID V K D LEKFFNP G VRFKDL N Q F VTRRFEEKY ALDIMEAK T A LT ALLEED G TLPYKALEIM F Q G R N AL S DKTMAA A H S T L KK GPI S K W HDIIAPTKYLY HFYR D A QP I G P SF F H D G E IP G Q NATRALE T Q V QAL G I KTAYLK W HFYKNPI S N A C L P S V HV GYHKN FVFL GYFYFYLFFILR G W YNKEN ND Y N SEF G W YKT S A DLIF Y R HRR A PA A AYN VIVKKEE YIA G S DKMLK I T F N H F D V IKYI S K G S STPYEPA S ETET IT QKEVE S V TKFFEAKRVEF S V T G S K LE Q S N L K Q Y I DE Q T Y G E KL S TKFFDP A LDI G IF Q KNL G D TK NPKF DVPRKLYNPFEADLIFKF K G IA S S FP G LDV Q DLNLY NPFADDLP S N EEAIP QV T RKNHTKEKV K T GH TYFD S A DTIFWE RDNF A NKI G F PL E L GVI R P KREFKV T FD EM KKKT DY G N GK E AF E I QILNRF KA GT RLDI LL L T YNE G G DP D R S YIDD E V QI L R S YH S TINT S Q A LA P G T ELIVT SFI S S I L QV S I V K SLAWEE Q T K I Y E IK A Q Q T TWVT G DL IN S N Q E EY I S Q Q MS D S A Q ETA DH S AVH DHTNKAIDT Q P GLIDH AATPM T Y QK A IEN G DHI Y KV GE HKI Y ANL V IVMYPHA GKEEAATF N T S K V DIANL IYA LY NKHIM L T DLE G N I TEFNT KLFD L Y G IIHL NY GEV G KA D Q I QKKDK N YT A KE KEI MPL YFAE G G G I C G A N T G K V TD K V Q S Y G Q G L TL G Q DEI K A S DLD G L AR ITT I DL A V SVMNLV DDIVYV G E G Y I IKL IVV G I G TY S S K Q M F IADDK KKKT IVA Y I GN K K QDDVK Q K T G H T V S FEFIDPVV I KI SELIVID NTK KDE TEIDEDD KE I S I E VF L H SDLVIF TEFDWLAYIKAAEL S D TN YELIDIE S K LV G N LYD L I S VIEITKIT YE KPD Q RKYTR Y EK IKN S N RDN S F NNK NEM FKILLK TPKV G Y EKNDID F PE G M ERH S T N RVEN E S VL G Y ELT N EN NFY S INL F K Y GE R F G KVVYE QVKPVKAL KLFAEL L K S K LANDA 0 K LM G Y NA VK Q K LLAELKYIW S S L KI IIY L Q V NAL LE KMFAELVI 0 Q S FAWL T N QNIVF S K GR FLI KV T N RT S T D T Q V NE E RI G D RF N T E Y FD K Q G NKYFL SA S G NHFIDETI Q KA SDK O M G R Y P D A I N I E R M G N G N G Y N D T S Q V I Q L M A K N G E A E H V Q M G S K M L V L K S T G 
Y K LV S T P Q E M F F D W Y V E K I I E M I N 6 W

0 5 0 1 7 . . 7 5 7 1 2 3 4 5 6 9 9 4 4 3 4 3 4 3 4 1 8 3 3 1 8

B 7 4

VYFILTRD FKFVYFILTR S E S MDT S T Q K NYTV KIKV YETF KII KYETF D LVAYY LVTD LVALY Q S G VELTWIMAPA S K QII RT K NVLLV QDDY S K K L I QI Q I LV K P T V CTLLIKN RIAP T V CTLLIKN EPEETEDRDRTYYM G K LYLP I V Q K DFV K D A S EHIL G D R I QMHAKLP A K S IL G D R I QM DND A VYN QKAVY S H GH D DFV SDND A VYNK DADIM QKAVY YEFLR I N CV Q F SA D KIKHIFT LIKYKINV QD ALDE R N S DE Q N IK HKL ILPN S F KLIKY G HKL IRYWNDPK WLFIRYWNDPK LNNKKKVVHKL I K QDLEEYR G H K H GA K G LK S TKAEL Q K F D K S KAEL Q K KMI M MAKKMI AITTAERVIMRHF Q K AELLFY M EVA QEDI ERI I DHV G ATA HL I DHVM RLEKIELLH G I AKA AIFLLVHPLD LLE MVNH A TLRVD G L ELLFY GH TA TL IKKR S I HAK F ERI SIKKR S HAK V F G N ED I N LIEKDHLY K Y GIILIEKDHIY H F S E N I RN QNN E P G YDI S L DKKY G LNT G L S L DIR G F YT HRVAD E L YE Q E S L EKKNKN N L S L EK G YE Q E S L KFIIDFTILHLDKFIIDFT Y G TV A RDL L Q S LI QE KEFAKDN AKD I G R GLYYLI A Q S KL N L VLEDKDL T S C G N Q EHVLDNFL TV HVLDN IDDPLND E KRRKKEL CYKPEIDEPLN G T DK TP A S L R K N S KRYLPEE S N SIT S A K LDIKI K L QYVY LDIK GHIMLTFLFDTEEA V REKPT Q S DD S V REKP YAY VKLDDYAY K DDYVLVLRVL T S S MLEELLW N A QDW MLEEL KT S LIK G S D L LEEKT S LIKIY K K GVWH A D QVKNN G D IKRITDRDKDK Q G Y G K SKA DKTIVVKLTKDYLVI N S F KDYLV YK S N A Q R S D K S K NKDYK S N A Q S R S E R KPDD TPAK S K LKRKKRYRETFKK G I K H SAN RETFK FDELPNNTYYEHFDNLPNH K V H A EVN T S S HVNVV N KP QIF S G Q PE S E F G E T VNYVRRKKEPEVKAHK Q E KKEPE DDI E VKT QIYD Q H LEVN S G M E S K EDDI E VKE C YVMDNKA QIYDN M LAEKD P FDA S FK S Q V Q DFIRKME GMK H Q DP SLILI Q S N K ETIKVHRY ME VV VHINETD AVV VHINF G KLALD G D T Q S EDL YDK IL I K SF G R T S DILHET Q K ET S S P GLEAVPII L T S S S P K G L PI G EDHPN V C E EPI G E EDHPDT EAFVYPTDT G K DAA K Y SLD DD V G P Q L QEIYD Q F VYFPKNLK E Q G YD Q F VY TEFA V YLL S P VLKTEFA V YRL S Y QI I KNILLRKLLDLDLDE G I DKPFKDVLVTI NIKDADVLVT KNI IEAHL NI EPTH DYWDLK SAK Y Q RV SYET Y Q RVI S P DE G TWTDI SYET TIPKNH M A QA E TNIKKR LK G P G S DE PNTFTRV L A M S K AK QI W K 
GRKPNTFTRL L DDR QAVIL TLIDHY Q IK S TLIKIF Y E SF E D T G T G S VRLKTTIF L LL K V S Q D EP P LT G QELKML I R SVK Q H K V S Q D QELK HWFKD K THWFKDFNRI V G D IDVITLNIALKLMEA Q V S L FVRAETE EA D L GVRAET NPD F N A S F V L SHVNPD DK P A S Q VRYNALEKM D NK ILI G L Q F SR NIAYNILI G LAFAF V D S NY SR FL W H GY F YRN SKTT HLK P Q DY QIV K HMALEKM L S YR HLK YIW H F H FNV F FF Q H GY D DL QY T V D S ID FYIW V D S VT W Q G RKDE V G Q T S NF S PLRKF V V ITYIK G I VPD G W YDV Q R EVI G NLI G W YDV Q R N RP S IL G T ID D I GTK S IL G S ITIKAEA S T D S KF CET DDNE IP N RP R G S TKFFDPI S N KLA IFFILD VLR IFFI QLNPFEKDLHKEKP S V TIKFWV F F S TTT S V TIKF YN T T S L DY N C ET GYN T DDNED K S ANT S E K KLNLVL N N EDLFI PRYR EVENPRYR NE L S S N L AN SREPI KANE S L N S L RE L S T L S R ERTA GYI S R ADI I S D T F G D EL RITDL A N G YKFM E L R Q T Q S PRK Y YKFM N TA AT G L DLN A VE P S S I T M GLLAMNLIL S G LVI Q E P Q Q K S A S F VDI NL G T FY L K K VE Q T K G S D G TLV R IYAK K T G D G K TLTTIKIKWFMNENIADH K FE MKI QN A AHI F L G K FE KE C DTVKI G NY AK E C S DTVKV DYATAK HA L KK T DAI G V YV Q G ED V S KK Y E S Q K D G I Q N A SKAIYVKYNYT C S KAIYVKY C K KNFN ML IE Q S NHPN L Y GKLNKLT S D VN E TD QL Y Q A TE KK H S Q Q S DT G S TDY E S Q K GLV Y Q A TE KKNT LIKLKKNT M KTFN L R GLLEKEN DVAA AEDIVLNWVT G L E EIEK M KTY CTVL K C TVLLIHL A A G D IH I I SELIIKDHRA N TINFVLKI L G TVET S G E TI DI R KDEEIE GNNKDI Q D K T E G ET E E GRV YDA IKDDII N LIK QLTEV K D GVKT Q F Q IK G N QPYTEV K D GV RW K K Q K N GE C F G G INFYRW K K Q G E C F G R NY G G YITD D S S TY E G L NAV G Y ELIIV EIKVAA Y A QKE IYEIKV FD F F G M YDDTNDAFD F M YDNLPTAV L IM Y T QVFIKKLADD G Y NY GVDELI LKHLV G N IDELI 0 V G L Q HKK Q Q HKIR PI S G F Q TLDELDLE D DK E GLDTK KLK E 0 E Q EKTNYE T KAV F G G L S S DKE Q F EKT YE R YA Q Y K MIEVT G N N FH G G Y NYIENDK S F TT Y S KALTKKTKLK E O F R I E Y D Y I I M I N F R I E Y S N Y I N S S M D K D Q I Q A D G R L K V R M F F N W T I D E K E I M P L G FMINIK Q ATT Y S C 
A N F E E I V S D M P L G F C A N 6 W 0 5 0 1 7 . . 7 5 7 7 8 9 0 1 9 9 4 3 4 3 5 3 5 1 8

[Drawing sheets: amino acid sequence figures. The sequence text is not recoverable from this extraction.]

I RTD 2 DYND L S E DK S Q R D I KWKL QAFMK G RYH H TFN G P G E T I ELALKYI N Q R Q L KRENKDR I PY L EKELR S I R N SYKERY M RNFF G L N K L S Y HKAN 3 0 V G T DF Q LKYDRAF I Y K Y LDT Y RA L MDDP S TP NLMA RKR E N V QEK S E M E G S KI REI K YEL G E SEERNA RE I F QIY L V GRIR S F DL WPA H I SR I D SMVKYDRKYKP RKK F RKKA N S K YYIATRNYL DEDVKY RNN F VN S E A K QKL RVL L R H DK CT S LD LA I NKLLE L RKL Q L DLPD KYT Q I LL K Q Q D KKKYY S F PFRA SK E L S K GFVLEI T T S A KHNRYEKK L PN K AYHN GYKAM LYAP MI R LNIAID K G S EK RHFLIHDYI R S D KD DAID KL D L SF G L R G Q DE YYFDIL A LALALK RTKN E RYNN QTDLE D LE KANNTTENL S L KIPHAK L S L Q A ETRFPI K MDKKL A Q D KD V RVIAYEENKYVMAD RFKEELH E G A F K T T Q HDVF KAKP V G KY ALVIKKKVNII RKYINLP Q E E RRKWHLD N Y GE RHY EVP S TF LPLPV QHK Q I VL K Q S HE E DP SHLR DY L PPILKRWF G NFLLNKYR Q LDLEKD V S K IVEKRIYY NFPLE S E A A G N C DR Q Y L AKRERR P A QKA R EDNIIPN I PI SLL F LVR R Y S NDINHKT KNKRHVLEAT N R Q EKKLDL SYD Q I EMMKR L VRD N KPKT R G ELKIKII AI INY FKDKDIV G L S VEY G K G RIIEF KD Q MDV F PK EV ID Q I KI LN P L SLR K LA LVYAPKK R M S Q FFI G E EDK E Q DI Q S K PNT N I L QNL HIM VRY D A S ED SDF RI M TATDE N G E FIY Q K PVN LNI A T SE N L N EFKTN L KVLA F DNK EDKAK T AK K H GA G G V G S H S PYDI G G GH EE E G G S IYM PTAL K Q THE QKKER VKEKA S V I P V GE KV T HY QE RH K E WL D T SD C T FEPLKN NRRNMN NKY NFFNI NLDP G L NH T Q K QL LAFV VVLAND EYIEKT K LP SLF ALY S L ERIDR EEYPTKYV S T HLKK S Y T VVV L A RLL EREL LPFLV G R S S G M GRRVE R E QI KELN K NEKN NE G E T G IK SH S D NT A Y K YLERL Q I LID GIK MDLVYKD DL EFKRR F KHTDY K L SL DK Q I GL Q S RIK S RNLI C I RY RARF FI S P G K DT DFT TEKAKRR KKI Q S NDN N II L Q L E R YD s

PHK Q Y QILRVLRKP S E KIIWDHK D I RLN T T GRE Q E KN Q R S V QL Q N F E Q AL SFEED IEETN C y

IYIRFVVIMDMDDV K G AD LTL YRENDKKNRILA G F G F E G I F ANK NRE V EET SHVY G V ELY Q RYN S Q Q AVR Y I GELEY G V 4 - TDTATNY KTRK Q KEIRDHPI LEKELW R TP QRY DYTKEYI R D KIDD V HEKYALKDYIA DINNERYP IKILIEE N I GV 2 - P Q REN F Q R T N FTDLKDVN EHIRREF S L VEV KPTRX L TRL I Q P G D SF NFM S T S I EAFAVNAY IKHLA A S D F DEL N R SEK M EY - I S D V DD S M EYD DDNPLIEI RLE L G S TER RLN T S EL s

HLN D T S DKLLKV C E C E VV DLPMW S Y E V F TAN I N Q DA L C N G KRVEFVLLPD S S DT Q G S Q R AKI I G L R G K T E LE Q T K EY S Y G E C y

VK Q L HLFKFAAAIAVDAPN G T IYD T VR STIHLAYY Y S K DIRI Q A KN - QRH S TF LKLNFD DNNEN YFYLK D R L A W KMKTILVKK S S V S P G RKN KKNE I S A S T I YIMD SE I R KYI o

RL G ELKr

DAN Q VTY VLTP Q D K KIY GAH S F G L G R PLAK G A C TIK Q DVA D PE SFT QYVNFDPPP H M I S A E R Q G K EPI P L S T IVFV PL S HVRYFVDHH Q G FNL - GYNNNIE NAE TTK E I NT K KLKKAI K F SDLN E I A G RYF S TT G G R Y Q YLMK KD GEL EA Q M EN 2 6

NNL KHK KT T IMIWNE Q I KD GTKYPTYYVNNT G W Y T TK CKW IAI K S NLN SKDL G R ILLP NLFTKRPL E TM SATK G N AD V KD E S ANINE D N - GY NRKYEVTKITDY G N YIDFDFLV S P S I KKLKKV S E T S T NLMLM 3

L A DIIDIDLF S Q RINYVWDL L KA I EV GH Y L TRPAIEI Q L Y LYN IV Y 2

K L Q Q S F X S Q S IRAIPL G I EAT Q K HKEA- E I G Q R W G I F Q N G RVKH Y I Y G KNPDD H T SLMI G I N A SR G ANA PI YLAKKMI u

E C G G F S VIRI C YTVETKILNAFIRKFNED E K SNRD G N KEDKAIF STVIILE VIYTVK NED YTNPLRR G E ENDYL KL DNF S C P LA F ETVKVLPI T FKLE G l

M V I KDVTAKET G L D S V Q E S K TDK - L VHNITKEVDE E G S E C N L G G RPFKLHKI S G W QIN IL INKDL T S S KR Q S KI EDIREDRVHKA E LERINELL Q A X A Y A G IELDK KNIN S DK G E G Q NFYNR F A A S G TLKKFI A LLVKAYVE E E Y K SK KN - Q S s 0 GK E K LDNKE S A i 0 G I S T Q D RK Q E G P YYA Q V VN NY E NTAR S E S A FINEYV Q K S FKL T P CRTK S N K NEE S RIWYI SRNP S A O Q M T I N Q E E E I M D K M W E I M K D T T M H F F L N E T P Q L H K V F Q L K E S K T K Q K ( H

6 W 0 5 0 1 7 . . 7 5 7 1 2 3 4 5 9 9

8 0 4 0 4 0 4 0 1

4 1 8

EQUIVALENTS AND SCOPE

[00549] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents of the embodiments described herein. The scope of the present disclosure is not intended to be limited to the above description, but rather is as set forth in the appended claims.

[00550] Articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between two or more members of a group are considered satisfied if one, more than one, or all of the group members are present, unless indicated to the contrary or otherwise evident from the context. The disclosure of a group that includes “or” between two or more group members provides embodiments in which exactly one member of the group is present, embodiments in which more than one member of the group is present, and embodiments in which all of the group members are present. For purposes of brevity, those embodiments have not been individually spelled out herein, but it will be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.
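As an illustration only, the embodiments described in [00550] can be enumerated mechanically: for a hypothetical group of members A, B, and C joined by “or,” each embodiment corresponds to one non-empty selection of members. The Python sketch assumes the members are given as a list of labels; the helper names `or_group_embodiments` and `classify` are hypothetical and are not part of the disclosure.

```python
from itertools import combinations


def or_group_embodiments(members):
    """Return every non-empty selection of members from an "or" group.

    Hypothetical helper for illustration: a selection of size 1 is an
    embodiment with exactly one member present, larger selections have
    more than one member present, and the full-size selection has all
    members present.
    """
    embodiments = []
    for size in range(1, len(members) + 1):
        # combinations() yields each selection of this size once, in input order
        embodiments.extend(combinations(members, size))
    return embodiments


def classify(selection, members):
    """Label a selection with the most specific category named in [00550]."""
    if len(selection) == len(members):
        return "all"
    if len(selection) == 1:
        return "exactly one"
    return "more than one"


if __name__ == "__main__":
    group = ["A", "B", "C"]
    for selection in or_group_embodiments(group):
        print(selection, classify(selection, group))
```

A group of n members yields 2**n - 1 such selections (7 for three members); the same enumeration also lists the subgroups referred to in [00552].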

[00551] It is to be understood that the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, or descriptive terms from one or more of the claims, or from one or more relevant portions of the description, are introduced into another claim. For example, a claim that is dependent on another claim can be modified to include one or more of the limitations found in any other claim that is dependent on the same base claim. Furthermore, where the claims recite a composition, it is to be understood that methods of making or using the composition according to any of the methods of making or using disclosed herein or according to methods known in the art, if any, are included, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise.

[00552] Where elements are presented as lists, e.g., in Markush group format, it is to be understood that every possible subgroup of the elements is also disclosed, and that any element or subgroup of elements can be removed from the group. It is also noted that the term “comprising” is intended to be open and permits the inclusion of additional elements or steps. It should be understood that, in general, where an embodiment, product, or method is referred to as comprising particular elements, features, or steps, embodiments, products, or methods that consist, or consist essentially of, such elements, features, or steps, are provided as well. For purposes of brevity, those embodiments have not been individually spelled out herein, but it will be understood that each of these embodiments is provided herein and may be specifically claimed or disclaimed.

[00553] Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value within the stated ranges in some embodiments, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise. For purposes of brevity, the values in each range have not been individually spelled out herein, but it will be understood that each of these values is provided herein and may be specifically claimed or disclaimed. It is also to be understood that unless otherwise indicated or otherwise evident from the context and/or the understanding of one of ordinary skill in the art, values expressed as ranges can assume any subrange within the given range, wherein the endpoints of the subrange are expressed to the same degree of accuracy as the tenth of the unit of the lower limit of the range.
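One way to read the “tenth of the unit of the lower limit” convention in [00553] is sketched in Python below, purely as an illustration. It assumes the lower limit is written as a plain decimal string, takes its unit to be the place value of its last written digit (1 for “10”, 0.1 for “0.5”), and steps through the range at one tenth of that unit with both endpoints included. The helper names `range_step` and `values_in_range` are hypothetical.

```python
from decimal import Decimal


def range_step(lower: str) -> Decimal:
    """Return one tenth of the unit of the last written digit of `lower`.

    Assumes `lower` is a plain decimal string such as "10" or "0.5" (not
    scientific notation), so its exponent is 0 for whole numbers and
    negative for decimals.
    """
    exponent = Decimal(lower).as_tuple().exponent  # 0 for "10", -1 for "0.5"
    return Decimal(1).scaleb(exponent - 1)  # 0.1 for "10", 0.01 for "0.5"


def values_in_range(lower: str, upper: str) -> list[Decimal]:
    """List each value from lower to upper at that step, endpoints included."""
    step = range_step(lower)
    current, hi = Decimal(lower), Decimal(upper)
    values = []
    while current <= hi:
        values.append(current)
        current += step  # exact decimal arithmetic, no float rounding drift
    return values
```

Under this reading, a range of 1 to 2 yields 1.0, 1.1, …, 2.0 (eleven values), and any two of those values, taken as new endpoints, give a subrange expressed to the same degree of accuracy.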

[00554] In addition, it is to be understood that any particular embodiment of the present invention may be explicitly excluded from any one or more of the claims. Where ranges are given, any value within the range may explicitly be excluded from any one or more of the claims. Any embodiment, element, feature, application, or aspect of the compositions and/or methods of the invention can be excluded from any one or more claims. For purposes of brevity, all of the embodiments in which one or more elements, features, purposes, or aspects are excluded are not set forth explicitly herein.