Title:
SYMMETRIC PROTEINS
Document Type and Number:
WIPO Patent Application WO/2023/006348
Kind Code:
A1
Abstract:
The invention relates to a protein building block named the Self-Assembling Kelch (SAKe) protein. The protein has a stable, symmetric design with readily accessible loops that can be varied in both sequence and length to later bind larger molecules or scaffold a catalytic site.

Inventors:
CLARKE DAVID (GB)
DE FEYTER STEVEN (BE)
VELPULA GANGAMALLAIAH (BE)
VOET ARNOUT (BE)
WOUTERS STAF (BE)
Application Number:
PCT/EP2022/068475
Publication Date:
February 02, 2023
Filing Date:
July 04, 2022
Assignee:
UNIV LEUVEN KATH (BE)
International Classes:
C07K14/00; C07K19/00; G16B15/20
Domestic Patent References:
WO2011094598A2 2011-08-04
WO2013093115A2 2013-06-27
Other References:
LI XUCHU ET AL: "Crystal Structure of the Kelch Domain of Human Keap1", JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 279, no. 52, 1 December 2004 (2004-12-01), US, pages 54750 - 54758, XP055934058, ISSN: 0021-9258, DOI: 10.1074/jbc.M410073200
VRANCKEN JEROEN P. M. ET AL: "The symmetric designer protein Pizza as a scaffold for metal coordination", PROTEINS: STRUCTURE, FUNCTION, AND BIOINFORMATICS, vol. 89, no. 8, 12 March 2021 (2021-03-12), US, pages 945 - 951, XP055934021, ISSN: 0887-3585, DOI: 10.1002/prot.26072
VOET ARNOUT R. D. ET AL: "Computational design of a self-assembling symmetrical [beta]-propeller protein", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 111, no. 42, 21 October 2014 (2014-10-21), pages 15102 - 15107, XP093004723, ISSN: 0027-8424, DOI: 10.1073/pnas.1412768111
ABE SATOSHI ET AL: "Functionalization of protein crystals with metal ions, complexes and nanoparticles", CURRENT OPINION IN CHEMICAL BIOLOGY, CURRENT BIOLOGY LTD, LONDON, GB, vol. 43, 12 December 2017 (2017-12-12), pages 68 - 76, XP085358771, ISSN: 1367-5931, DOI: 10.1016/J.CBPA.2017.11.015
VRANCKEN JEROEN P.M. ET AL: "Molecular assemblies built with the artificial protein Pizza", JOURNAL OF STRUCTURAL BIOLOGY: X, vol. 4, 1 January 2020 (2020-01-01), pages 100027, XP055934024, ISSN: 2590-1524, DOI: 10.1016/j.yjsbx.2020.100027
ADAMS JOSEPHINE ET AL: "The kelch repeat superfamily of proteins: propellers of cell function", TRENDS IN CELL BIOLOGY, vol. 10, no. 1, 1 January 2000 (2000-01-01), pages 17 - 24, XP085016269, ISSN: 0962-8924, DOI: 10.1016/S0962-8924(99)01673-6
MCCONNELL ET AL., ACS SYNTH. BIOL., vol. 9, 2020, pages 381 - 391
BROUWER ET AL., CELL, vol. 184, 2021, pages 1188 - 1200
SCHUSTER, BIOSENSORS, vol. 8, 2018, pages 40
CHEN ET AL., J. AM. CHEM. SOC., vol. 141, 2019, pages 8891 - 8895
PYLES ET AL., NATURE, vol. 571, 2019, pages 251 - 256
ZHANG ET AL., NAT. COMMUN., vol. 11, 2020, pages 1 - 12
BEN-SASSON ET AL., NATURE, vol. 589, 2021, pages 468 - 473
KUHLMAN & BRADLEY, NAT. REV. MOL. CELL BIO., vol. 20, 2019, pages 681 - 697
YEATES, ANNU. REV. BIOPHYS., vol. 46, 2017, pages 23 - 42
VOET ET AL., PROC. NATL. ACAD. SCI. U.S.A., vol. 111, 2014, pages 15102 - 15107
VRANCKEN ET AL., J. STRUCT. BIOL., vol. 4, 2020, pages 100027
CLARKE ET AL., CHEM. COMMUN., vol. 55, 2019, pages 8880 - 8883
VOET ET AL., ANGEW. CHEM. INT. ED., vol. 127, 2015, pages 9995 - 9998
VANDEBROEK ET AL., CHEM. COMMUN., 2020
BEAMER ET AL., ACTA. CRYSTALLOGR., SECT. D.: BIOL. CRYSTALLOGR., vol. 61, 2005, pages 1335 - 1342
ADAMS ET AL., TRENDS CELL BIOL., vol. 10, 2000, pages 17 - 24
ZUO ET AL., BMC GENOMICS, vol. 18, 2017, pages 797
CANNING ET AL., JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 288, 2013, pages 7803 - 7814
SPEIR ET AL., STRUCTURE, vol. 3, 1995, pages 63 - 78
LAVELLE ET AL., J. PHYS. CHEM. B, vol. 113, 2009, pages 3813 - 3819
VAN ELDIJK ET AL., J. AM. CHEM. SOC., vol. 134, 2012, pages 18506 - 18509
SALGADO ET AL., ACC. CHEM. RES., vol. 43, 2010, pages 661 - 672
HUARD ET AL., NAT. CHEM. BIOL., vol. 9, 2013, pages 169 - 176
SUZUKI ET AL., NATURE, vol. 533, 2016, pages 369 - 373
BRODIN ET AL., NAT. CHEM., vol. 4, 2012, pages 375 - 382
VOET ET AL., PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 111, 2014, pages 15102 - 15107
MADEIRA ET AL., NUCLEIC ACIDS RESEARCH, vol. 47, 2019, pages W636 - W641
ASHKENAZY ET AL., NUCLEIC ACIDS RESEARCH, vol. 40, 2012, pages W580 - W584
CHAUDHURY ET AL., BIOINFORMATICS, vol. 26, 2010, pages 689 - 691
ANDRE ET AL., PROC NATL ACAD SCI, vol. 104, 2007, pages 17656 - 17661
EMSLEY ET AL., ACTA CRYSTALLOGRAPHICA SECTION D: BIOLOGICAL CRYSTALLOGRAPHY, vol. 66, 2010, pages 486 - 501
WINTER ET AL., ACTA CRYSTALLOGRAPHICA SECTION D., vol. 74, 2018, pages 85 - 97
EVANS ET AL., ACTA CRYSTALLOGRAPHICA SECTION D: BIOLOGICAL CRYSTALLOGRAPHY, vol. 69, 2013, pages 1204 - 1214
WINN ET AL., ACTA CRYSTALLOGRAPHICA SECTION D: BIOLOGICAL CRYSTALLOGRAPHY, vol. 67, 2011, pages 235 - 242
MCCOY ET AL., JOURNAL OF APPLIED CRYSTALLOGRAPHY, vol. 40, 2007, pages 658 - 674
DOLINSKY ET AL., NUCLEIC ACIDS RESEARCH, vol. 32, 2004, pages W665 - W667
LI ET AL., PROTEINS: STRUCTURE, FUNCTION, AND BIOINFORMATICS, vol. 61, 2005, pages 704 - 721
DOLINSKY ET AL., NUCLEIC ACIDS RESEARCH, vol. 35, 2007, pages W522 - W525
OLSSON ET AL., JOURNAL OF CHEMICAL THEORY AND COMPUTATION, vol. 7, 2011, pages 525 - 537
SAXENA ET AL., BIOCHEMISTRY, vol. 35, 1996, pages 15215 - 15221
NIKKHAH ET AL., BIOMOLECULAR ENGINEERING, vol. 23, 2006, pages 185 - 194
Claims:
CLAIMS

1. A polypeptide comprising 2 to 9 repeats of a sequence, each of said sequences having at least 70% identity with NGRIX5AVG8G9-Xn-LNSVX14AX16DX18ETDEW23SFVAPM29TTPR33SGVG37VAV40L [SEQ ID NO:34], wherein Xn are between 1 and 15 amino acids, wherein X can be any amino acid, and wherein repeats are separated from each other by between 0 and 15 amino acids, wherein the amino acids Gly8, Gly9, and Trp23 in each of said sequences are conserved, and with the proviso that optionally for one of said sequences an amino-terminal part of said one sequence is located at the C-terminal end of said polypeptide and the remaining carboxy-terminal part of said one sequence is located at the amino-terminal end of said polypeptide.

2. The polypeptide according to claim 1, wherein, in one or more, or in all of said sequences,

X5 is Tyrosine, Phenylalanine, Tryptophan, Histidine or Methionine,

X16 is Tyrosine, Phenylalanine, or Tryptophan, and X18 is Proline or Valine.

3. The polypeptide according to claim 1 or 2, wherein, in one or more, or in all of said sequences X14 is Glutamic acid, Aspartic acid, Cysteine, or Serine.

4. The polypeptide according to any one of claims 1 to 3, wherein, in one or more, or in all of said sequences,

Arg33 is conserved and/or Val40 is conserved.

5. The polypeptide according to any one of claims 1 to 4, wherein, in one or more, or in all of said sequences,

X5 is Tyrosine, X14 is Glutamic acid,

X16 is Tyrosine, X18 is Proline,

Arg33 is conserved, and Val40 is conserved.

6. The polypeptide according to any one of claims 1 to 5, wherein, in one or more, or in all of said sequences, the amino acids Trp23, Met29, and Gly37 are conserved.

7. The polypeptide according to any one of claims 1 to 6, comprising 2, 3 or 6 repeats of said sequence.

8. The polypeptide according to any one of claims 1 to 7, wherein, one or more or all of said sequences has at least 80, 90 or 95% identity with SEQ ID NO:32 or is identical to SEQ ID NO:32.

9. The polypeptide according to any one of claims 1 to 8, wherein Xn are between 5 to 15 amino acids, or between 5 to 10 amino acids.

10. The polypeptide according to any one of claims 1 to 9, wherein each of said sequences in a repeat are separated from each other with 1 up to 5 amino acids.

11. The polypeptide according to any one of claims 1 to 10, wherein one or more of said repeats are immediately adjacent to each other.

12. The polypeptide according to any one of claims 1 to 11, wherein all of said sequences are immediately adjacent to each other.

13. The polypeptide according to any one of claims 1 to 12, wherein the repeats of said sequence in the polypeptide are identical, with exception of the amino acids Xn.

14. The polypeptide according to any one of claims 1 to 9, comprising a first repeat and a second repeat of said sequence wherein said first and second repeat occur alternating in said polypeptide.

15. A multimeric polypeptide of polypeptides as defined in any one of claims 1 to 14.

16. The multimeric polypeptide according to claim 15, wherein the polypeptides, as defined in any one of claims 1 to 14, are non-covalently bound to each other.

17. The multimeric polypeptide according to claim 15, wherein within the multimer of polypeptides, as defined in any one of claims 1 to 10, polypeptides are bound by cystine bonds.

18. The multimeric polypeptide according to any one of claims 15 to 17, which is a hexamer of 3 non-covalently bound polypeptides as defined in any one of claims 1 to , having two repeats, or which is a hexamer of 2 non-covalently bound polypeptides as defined in any one of claims 1 to 9 having 3 repeats.

19. A method of producing a functional protein comprising the steps of: a) providing a polypeptide according to any one of claims 1 to 14, wherein Xn is randomly selected or designed by an in silico method, b) generating a multimeric protein of said polypeptides, c) testing the multimeric protein for the function, and d) selecting the multimeric protein with the function.

20. The method according to claim 19, wherein the function is protein binding, an enzymatic activity, or the binding to an organic molecule.

21. The method according to claim 19 or 20, further comprising the step of determining whether the multimeric protein is stable at pH below 4, or wherein the multimeric protein is resistant against proteolytic degradation.

Description:
SYMMETRIC PROTEINS

FIELD OF THE INVENTION

The invention relates to the design and expression of proteins with symmetrical structures.

The invention relates to synthetic proteins with functional properties such as metal binding and enzymatic activity.

BACKGROUND OF THE INVENTION

The functional and structural diversity of proteins has inspired researchers to engineer them for various applications. Recent examples have demonstrated the engineering of proteins as enhanced catalysts, vaccines, biosensors, and building blocks for 2D/3D frameworks [McConnell et al. ACS Synth. Biol. 2020, 9, 381-391; Brouwer et al. Cell 2021, 184, 1188-1200; Schuster Biosensors 2018, 8, 40; Chen et al. J. Am. Chem. Soc. 2019, 141, 8891-8895; Pyles et al. Nature 2019, 571, 251-256; Zhang et al. Nat. Commun. 2020, 11, 1-12; Ben-Sasson et al. Nature 2021, 589, 468-473]. In many cases, proteins with new functions are obtained via the redesign of existing proteins. Advances in computational protein design have stimulated the development of unique proteins with various conformations and functionality [Kuhlman & Bradley Nat. Rev. Mol. Cell Bio. 2019, 20, 681-697]. Therefore, protein engineers are not limited to re-purposing natural proteins and are able to expand their toolkit with new molecules with improved properties.

Symmetric proteins are highly desirable due to their stability and versatility as building blocks for the development of 2D/3D assemblies [Yeates Annu. Rev. Biophys. 2017, 46, 23-42]. An exceptionally stable, symmetric β-propeller protein called Pizza is described in [Voet et al. Proc. Natl. Acad. Sci. U.S.A. 2014, 111, 15102-15107]. To showcase its functional potential, Pizza was redesigned to obtain protein assemblies, artificial enzymes, and high affinity scaffolds for various metals and metal-oxo clusters [Vrancken et al. J. Struct. Biol.: X 2020, 4, 100027; Clarke et al. Chem. Commun. 2019, 55, 8880-8883; Voet et al. Angew. Chem. Int. Ed. 2015, 127, 9995-9998; Vandebroek et al. Chem. Commun. 2020]. However, Pizza lacks an extensively modifiable interface, which limits its capacity to bind more complex molecules.

SUMMARY OF THE INVENTION

Advances in computational protein design have allowed for the development of new proteins with unique properties. Symmetric designer proteins have remarkable stability and can serve as versatile building blocks for the creation of macromolecular assemblies. The present invention describes the development of SAKe: a new symmetric, stable protein building block with modifiable loops. In the present invention, polypeptides as claimed will be referred to as SAKe proteins. Following the observation of pH-induced 3D self-assembly, metal binding sites were engineered along the protein's internal rotational axis to fabricate 2D surface arrays. Using atomic force microscopy, Cu(II)-dependent on-surface 2D self-assembly is demonstrated. The present invention discloses a stable and highly modifiable SAKe protein scaffold for use as a building block for the creation of multi-functional macromolecular materials.

In the conserved sequence motif, the amino acids Xn are not numbered. As the sequence listing does number the Xn amino acids, the numbering of amino acids C-terminal of Xn will differ. Thus X14, X16 and X18 of the sequence motif become respectively positions 29, 31 and 33 in SEQ ID NO:34 of the sequence listing. Throughout the specification the numbering of the conserved sequence motif is used.

Throughout the specification, when wording is used such as "Ile4 has Leu or Phe as alternative" this means that at position 4 of the sequence Ile, Leu or Phe can be present.
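The relation between the two numbering schemes can be illustrated with a short sketch. It assumes, based only on the mapping quoted above (motif positions 14, 16 and 18 correspond to listing positions 29, 31 and 33), that the sequence listing counts 15 positions for Xn. The function name and constants are hypothetical and are not part of the specification.

```python
# Hypothetical helper: convert conserved-motif numbering to sequence-listing numbering.
# Assumption: the sequence listing numbers Xn as 15 positions, inferred from the
# stated mapping 14 -> 29, 16 -> 31, 18 -> 33.
XN_POSITIONS_IN_LISTING = 15
LAST_MOTIF_POSITION_BEFORE_XN = 9  # Gly9 is the last numbered residue before Xn


def motif_to_listing(position: int) -> int:
    """Return the sequence-listing position for a conserved-motif position."""
    if position < 1:
        raise ValueError("motif positions start at 1")
    if position <= LAST_MOTIF_POSITION_BEFORE_XN:
        return position  # residues N-terminal of Xn keep their number
    return position + XN_POSITIONS_IN_LISTING


for motif_pos in (5, 9, 14, 16, 18):
    print(motif_pos, "->", motif_to_listing(motif_pos))
```

Positions up to Gly9 are unchanged, while every position C-terminal of Xn is shifted by the assumed loop length.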

The invention is further summarised in the following statements:

1. A polypeptide comprising a sequence having at least 60 or 70 % identity with NGRIY5AVG8G9-Xn-LNSVE14AY16DP18ETDEW23SFVAPM29TTPR33SGVG37VAV40L [SEQ ID NO:32] or comprising 2 to 9 repeats of said sequence, wherein Xn are between 1 and 15 amino acids, wherein X can be any amino acid, and wherein the amino acids Tyr5, Gly8, Gly9, Tyr16, Trp23, Arg33, and Val40 in said sequence are conserved.

2. A polypeptide comprising 2 to 9 repeats of a sequence, each of said sequences having at least 70% identity with NGRIX5AVG8G9-Xn-LNSVX14AX16DX18ETDEW23SFVAPM29TTPR33SGVG37VAV40L [SEQ ID NO:34], wherein Xn are between 1 and 15 amino acids, wherein X can be any amino acid, and wherein repeats are separated from each other by between 0 and 15 amino acids, wherein the amino acids Gly8, Gly9, and Trp23 in each of said sequences are conserved, and with the proviso that optionally for one of said sequences an amino-terminal part of said one sequence is located at the carboxy-terminal end of said polypeptide and the remaining carboxy-terminal part of said one sequence is located at the amino-terminal end of said polypeptide.

An example of this proviso is represented as follows: a polypeptide with three repeats, schematically represented as

ABCDEFGH-ABCDEFGH-ABCDEFGH

can thus also occur as

BCDEFGH-ABCDEFGH-ABCDEFGH-A
CDEFGH-ABCDEFGH-ABCDEFGH-AB
DEFGH-ABCDEFGH-ABCDEFGH-ABC
EFGH-ABCDEFGH-ABCDEFGH-ABCD
FGH-ABCDEFGH-ABCDEFGH-ABCDE
GH-ABCDEFGH-ABCDEFGH-ABCDEF
H-ABCDEFGH-ABCDEFGH-ABCDEFG
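The circular permutations of this schematic example can be enumerated with a minimal sketch. The function name is hypothetical, and the letters stand for the schematic repeat only, not for a real SAKe sequence. Separators between repeats are omitted in the output.

```python
def circular_permutations(repeat: str, n_repeats: int) -> list[str]:
    """List the variants in which the first k residues of the first repeat
    are moved to the C-terminal end of the polypeptide (k = 1 .. len(repeat) - 1)."""
    full = repeat * n_repeats
    return [full[k:] + full[:k] for k in range(1, len(repeat))]


for variant in circular_permutations("ABCDEFGH", 3):
    print(variant)
```

For the eight-letter schematic repeat this yields seven variants, starting with the one in which only "A" is moved to the C-terminal end and finishing with the one in which only "H" remains at the N-terminal end.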

3. The polypeptide according to statement 2, wherein, in one or more, or in all of said sequences,

X5 is Tyrosine, Phenylalanine, Tryptophan, Histidine or Methionine, or wherein X5 is Tyrosine, Phenylalanine or Tryptophan, or X5 is Tyrosine or Phenylalanine,

X16 is Tyrosine, Phenylalanine, or Tryptophan, or X16 is Tyrosine or Phenylalanine, and

X18 is Proline or Valine.

Embodiments for all above combinations of X5, X16 and X18 are explicitly envisaged and disclosed herein. In a specific embodiment X5 is Tyrosine, Phenylalanine or Tryptophan, and X16 is Tyrosine, Phenylalanine, or Tryptophan, and X18 is Proline or Valine.

4. The polypeptide according to statement 2 or 3, wherein, in one or more, or in all of said sequences X14 is Glu, Asp, Cys, or Ser, or wherein X14 is Glu or Asp.

5. The polypeptide according to any one of statements 1 to 4, wherein, in one or more, or in all of said sequences,

Arg33 and/or Val40 are conserved.

Val40 is conserved.

5. The polypeptide according to any one of statements 2 to 4, wherein, in one or more, or in all of said sequences,

X5 is Tyrosine,

X14 is Glu,

X16 is Tyrosine, X18 is Proline,

Arg33 and/or Val40 is conserved.

6. The polypeptide according to any one of statements 2 to 5, wherein, in one or more, or in all of said sequences, the amino acids Trp23, Met29, and Gly37 are conserved.

Embodiments of all possible variations of X5, X14, X16, X18 are herewith envisaged and explicitly disclosed.

As recited above, apart from the absolutely conserved Gly8, Gly9, and Trp23, sequence variation is limited at positions X5, X14, X16, X18, and sequence variation is less stringent for the remaining positions as long as the overall sequence identity is above the defined percentage.

Further specific embodiments of sequences falling under the definition of SEQ ID NO:34 are sequences wherein:

Ile4 has Leu or Phe as alternative; X5 is Tyr, Met, Phe, Ala or Val; Leu10 has His as alternative; X16 is Tyr, Phe or Trp; X18 is Pro or Val; Met29 has Leu as alternative; Arg33 has Ser or Met as alternatives; Gly35 has Ala as alternative; Gly37 is conserved; Val40 has Ser, His or Arg as alternative;

Sequences wherein Ile4 has Leu or Phe as alternative; X5 is Tyr, Met, Ala or Val; Asn11 has Asp as alternative; Ser12 has Lys as alternative; X16 is Tyr or Phe; X18 is Pro or Val; Thr20 has Arg as alternative; Met29 has Leu as alternative; Arg33 has Ser as alternative; Gly35 has Ala as alternative; Gly37 is conserved; Val40 has His or Arg as alternative.

Sequences wherein Ile4 has Leu as alternative; Ala6 has Val as alternative; Leu10 has His as alternative; Asn11 has Asp as alternative; X18 is Pro or Val; Met29 has Leu as alternative; Gly35 has Ala as alternative; Gly37 is conserved.

Sequences wherein Ile4 has Leu as alternative; Ala6 has Val as alternative; Leu10 has His as alternative; X18 is Pro or Val; Met29 has Leu as alternative; Gly35 has Ala as alternative; Gly37 is conserved.

For all the above sequences any sequence obtained by combination of the different possibilities for the recited amino acids is herewith explicitly disclosed.

The amino acids X of Xn, wherein n = 1 to 15, or the amino acids in between sequences of a repeat can be any of the 20 natural amino acids encountered in polypeptides as well as modified versions thereof as obtained by post-translational modification. Other side chains can be envisaged when synthetic peptides are produced, as long as the amino acids can be incorporated via their NH2 and COOH groups in a polypeptide.

It is further envisaged that in the non-conserved parts of the sequence other amino acids occur which differ from the regular 20 naturally occurring amino acids, as long as they are incorporated via their NH2 and COOH groups in a polypeptide.

7. The polypeptide according to any one of statements 1 to 6, wherein, in one or more, or in all of said sequences, the amino acids Tyr5, Gly8, Gly9, Glu14, Tyr16, Pro18, Trp23, Met29, Arg33, Gly37 and Val40 in said sequence are conserved.

8. The polypeptide according to any one of statements 1 to 7, comprising between 2 to 9 repeats of said sequence.

9. The polypeptide according to any one of statements 1 to 8, comprising 2, 3 or 6 repeats of said sequence.

10. The polypeptide according to any one of statements 1 to 9, wherein one or more, or all of said sequences have at least 80, 90 or 95% identity with SEQ ID NO:32 or is identical to SEQ ID NO:32.

As an example, for a polypeptide with 3 repeats of the sequence, one sequence can be for example 82 % identical (>80%), a second 92 % identical (>90%), and a third 97 % identical (>95 %).
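A per-repeat identity check of this kind can be sketched as follows. This is a simplified illustration, not the method used in the specification: it assumes the Xn loop is excluded from the comparison and that the remaining 41 scaffold positions are compared position by position without gaps, whereas a real comparison would normally rely on a sequence alignment tool. The function name and the variant sequence are hypothetical.

```python
# Scaffold of SEQ ID NO:32 as written in the motif, with the Xn loop left out.
REF_N_TERM = "NGRIYAVGG"                          # motif positions 1-9
REF_C_TERM = "LNSVEAYDPETDEWSFVAPMTTPRSGVGVAVL"  # motif positions 10-41


def scaffold_identity(n_term: str, c_term: str) -> float:
    """Percent identity over the scaffold positions.

    Assumption: ungapped, position-wise comparison with the loop excluded.
    """
    query = n_term + c_term
    reference = REF_N_TERM + REF_C_TERM
    if len(query) != len(reference):
        raise ValueError("this sketch only handles ungapped scaffolds")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


# Hypothetical variant: Tyr5 -> Phe and Pro18 -> Val, all other positions unchanged.
variant_n = "NGRIFAVGG"
variant_c = "LNSVEAYDVETDEWSFVAPMTTPRSGVGVAVL"
print(round(scaffold_identity(variant_n, variant_c), 1))  # 39 of 41 positions match -> 95.1
```

Applying such a function to each repeat separately gives one identity value per repeat, which is how a polypeptide can combine, for example, one repeat above 80 %, one above 90 % and one above 95 %.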

11. The polypeptide according to any one of statements 1 to 10, wherein Xn are between 5 to 15 amino acids, or between 5 to 10 amino acids.

In these embodiments, for each of Xn the length can differ or can be identical, as long as the length falls within the defined range.

12. The polypeptide according to any one of statements 1 to 11, wherein one or more of said repeats of a sequence are separated from each other with 1 up to 5 amino acids.

In these embodiments, the length between two sequences can differ or can be identical, as long as the length falls within the defined range.

13. The polypeptide according to any one of statements 1 to 12, wherein one or more of said repeats of said sequence are immediately adjacent to each other.

14. The polypeptide according to any one of statements 1 to 13, wherein all of said repeats of said sequence are immediately adjacent to each other.

15. The polypeptide according to any one of statements 1 to 14, wherein two or more, or all repeats of said sequence are immediately adjacent to each other.

16. The polypeptide according to any one of statements 1 to 15, wherein two or more or all the repeats of said sequence are identical, with exception of the amino acids Xn.

17. The polypeptide according to any one of statements 1 to 16, comprising a first repeat and a second repeat of said sequence wherein said first and second repeat occur alternating in said polypeptide.

18. A multimeric polypeptide of polypeptides as defined in any one of statements 1 to 17. In such a multimer the polypeptides may be non-covalently bound to each other, and optionally via the presence of disulfide cystine bridges.

19. The multimeric polypeptide according to statement 18, which is a hexamer of 3 non-covalently bound polypeptides as defined in any one of statements 1 to 9 having two repeats.

20. The multimeric polypeptide according to statement 19, which is a hexamer of 2 non-covalently bound polypeptides as defined in any one of statements 1 to 9 having 3 repeats.

The invention further relates to nucleic acids encoding these polypeptides, as well as expression vectors comprising these nucleic acids, and bacterial, yeast or eukaryotic cells comprising these vectors.

21. A method of producing a functional protein comprising the steps of: a) providing a polypeptide according to any one of statements 1 to 17, wherein Xn is randomly selected or designed by an in silico method, b) generating a multimeric protein of said polypeptides, c) testing the multimeric protein for the function, and d) selecting the multimeric protein with the function.

22. The method according to statement 21, wherein the function is protein binding, an enzymatic activity, or the binding to an organic molecule.

23. The method according to statement 21 or 22, further comprising the step of determining whether the multimeric protein has rotational symmetry.

24. The method according to any one of statements 21 to 23, further comprising the step of determining whether the multimeric protein is stable at pH below 4, or wherein the multimeric protein is resistant against proteolytic degradation.

25. The method according to any one of statements 21 to 24, further comprising determining whether said multimeric protein assembles into quaternary structures, such as fibres, tubes, three dimensional cages or two-dimensional layers.
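Step a) of the method in statement 21, in its "randomly selected" variant, might be sketched as below. This is only an illustration under stated assumptions: the scaffold is taken from the SEQ ID NO:32 motif, loops are drawn uniformly from the 20 standard amino acids, repeats are placed immediately adjacent to each other, and all function names are hypothetical. Steps b) to d) are experimental and are not modelled.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard amino acids
N_TERM = "NGRIYAVGG"                          # motif positions 1-9
C_TERM = "LNSVEAYDPETDEWSFVAPMTTPRSGVGVAVL"  # motif positions 10-41


def random_loop(rng: random.Random, min_len: int = 1, max_len: int = 15) -> str:
    """Draw a random Xn loop of 1 to 15 residues (assumed uniform sampling)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def build_polypeptide(n_repeats: int, rng: random.Random) -> str:
    """Concatenate n_repeats scaffold repeats, each with its own random loop."""
    if not 2 <= n_repeats <= 9:
        raise ValueError("the statements recite 2 to 9 repeats")
    return "".join(N_TERM + random_loop(rng) + C_TERM for _ in range(n_repeats))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidate = build_polypeptide(3, rng)
print(len(candidate), candidate)
```

Each call produces one candidate sequence; a library for screening would be obtained by repeating the call, and a designed (in silico) loop could replace `random_loop` without changing the rest of the sketch.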

DETAILED DESCRIPTION

Figure legends

Figure 1. Overview of the Revolutionary design strategy. The human Keap1 β-propeller was used as a template (PDB: 1ZGK). A. The blades were separated. B. Using Rosetta SymDock, a perfect sixfold symmetric protein backbone was constructed from one blade. C. Simultaneously, a multiple sequence alignment (MSA) was constructed from the six unique blades. D. Putative ancestral sequences, derived from the MSA using the FastML server, are mapped on the perfect symmetric backbone. The resulting models were evaluated by their Rosetta Talaris2013 energy score and root mean square deviation (RMSD) from the ideal symmetric backbone. For each SAKe type, three designs were experimentally tested.

Figure 2. A. APBS calculated surface electrostatics for the S6BE normal crystal structure at pH 8.0 and the self-assembled crystal structure at pH 4.0. B. Close-up of the axial hydrogen bonding networks found in the self-assembled S6BE crystal. C. Close-up of the lateral hydrogen bonding networks found in the self-assembled S6BE crystal. Additionally, there is a hydrophobic interaction between two prolines (5 Å distance). The only structural difference between the normal and self-assembled S6BE crystals is found in the alternative conformations of the lateral glutamates, here shown in slate/wheat (pH 4) and white (pH 7-8).

Figure 3. AFM imaging of S6BE-3HH:Cu(II) 2D arrays. A. Rectangular-dimeric lattices are formed at a ratio of 1:10 (protein:Cu(II)). (i) Topography map of 1 mM S6BE-3HH, 10 mM Cu(II) (100 nm scale bar) and (ii) selected line profiles, (iii) Topography map of 0.5 mM S6BE-3HH, 5 mM Cu(II) (20 nm scale bar), (iv) Dimensions calculated from the packing of a S6BE-3HH crystal (PDB code 70PU). B. Hexagonal lattices are generated at a ratio of 1:20 (1 mM S6BE-3HH:20 mM Cu(II)). (i) Topography map (100 nm scale bar) and (ii) selected line profiles, (iii) Topography map (20 nm scale bar), (iv) filtered zoom image of selected region (scale bar = 10 nm) and (v) phase map (20 nm scale bar), (vi) Dimensions calculated from the packing of a S6BE-3HH crystal (PDB code 70PV). All AFM imaging was performed in 20 mM MES buffer (pH 5.6) and on a mica surface.

Figure 4: Sequence logos generated with Consurf and WebLogo,1,2 using 150 sequences from the NR proteins database and the default Consurf settings. (Top) Logo generated from the second blade of the human Keap1 β-propeller (PDB code: 1ZGK).3 A clear region with low conservation corresponds to the protein's loops. (Bottom) Logo generated from the third blade of the Mycobacterium tuberculosis PknD β-propeller (PDB code: 1RWL).4 This protein was the template for the design of Pizza and lacks large variable regions.5

Figure 5: (Left) Circular Dichroism (CD) spectra of SAKe proteins were similar to that of the parent human Keap1 β-propeller. (Right) The CD signal at 233 nm was followed as a function of temperature. Compared to Keap1, all SAKe designs have an improved melting temperature. Between SAKe designs, loop length and composition seem to have the more pronounced impact on thermal stability. When loops are conserved (as is the case between S6A proteins), the scaffold sequence can still have a significant influence as well.

Figure 6: Top and side views of crystal structures of Pizza6, A-type SAKe (S6A), B-type SAKe (S6B), S6BE-L1, S6BE-L2 and S6BE-L3. The modular loops are coloured in slate and their respective lengths are given. For S6BE-L2, the loops were too flexible and not visible in the electron density map. For visualization purposes, they have been modelled in this figure.

Figure 7: Self assembled crystals of S6BE, grown in (A) 50 mM Na acetate - acetic acid pH 4.0 and (B) 50 mM Na citrate - citric acid pH 4.0. Both batches grew crystals after overnight dialysis of 20 mg/mL protein at 4 °C. The bubbles have an approximate diameter of 5 mm.

Figure 8: Various concentrations of S6BE were dialyzed in parallel to either 50 mM Na citrate - citric acid pH 4.0 or 4.5. At pH 4.5, crystals only grew after 144h to 216h. At pH 4.0, the proteins readily crystallized after 6h to 72h, depending on their concentration. The first pictures that show crystals are marked in green. The bubbles have a radius of approximately 5 mm.

Figure 9: Various concentrations of S6BE-3HH were dialyzed in parallel to either 50 mM Na citrate - citric acid pH 4.0 or 4.5. At pH 4.5, crystals only grew after 144h to 216h. At pH 4.0, the proteins readily crystallized after 24h to 216h, depending on their concentration. The first pictures that show crystals are marked in green. The bubbles have a radius of approximately 5 mm.

Figure 10: 5 mg/mL aliquots of S6BE and S6BE-3HH were dialyzed to 50 mM Na citrate - citric acid at varied pH. S6BE yielded more crystals and they grew faster than S6BE-3HH. At pH 4.5, crystals remained stable. From pH 5.0 onward they dissolved again. The first pictures that show crystals were marked in green, while the first notices of disassembly were marked in red. The bubbles have a radius of approximately 5 mm.

Figure 11: (A) The hexagonal crystal packings of normal (0.2 M K formate, 20% (w/v) PEG3350) and self-assembled (50 mM Na acetate - acetic acid pH 4.0) S6BE crystals are identical. The only obvious difference is found in the alternative rotamers for the lateral Glu. (B) In the normal crystal these Glu point outwards (0.48 occupancy of carboxy group, alternative rotamers were not visible). (C) The protonated Glu in the self-assembled crystal turn inward to form a putative hydrogen bridge with a serine hydroxyl group. Additionally, electrostatic repulsion in the normal crystal might direct the normal crystal's Glu outward.

Figure 12: To enable metal-induced self-assembly, three double-histidine sites were added in S6BE (creating S6BE-3HH). A closely packed structure resembling the self-assembled crystal structure is obtained through a combination of metal-mediated and complementary interactions of adjacent proteins. The vacancies of the hexagonal lattice are likely to be occupied by S6BE-3HH proteins.

Figure 13: DLS spectra of S6BE-3HH with different ratios of Cu(NO3)2 in MilliQ pH 7.0 (A) and 20 mM MES pH 5.6 (B). DLS spectrum of S6BE-3HH with different ratios of Zn(NO3)2 in 20 mM MES pH 5.6 (C).

Figure 14: DLS spectra of S6BE with different ratios of Cu(NO3)2 in MilliQ pH 7 (A) and 20 mM MES pH 5.6 (B).

Figure 15: AFM of 1 mM S6BE-3HH imaged in 20 mM MES pH 5.6 on mica. (A) Topography map, (B) phase map and (C) amplitude map. The scale bar on all images is 100 nm. (D) Height traces corresponding to the line profiles taken from the topography map. Particle analysis of the topography, (E) average radius and (F) maximum diameter.

Figure 16: AFM of 0.5 μM S6BE-3HH, 5 μM Cu(NO3)2 imaged in 20 mM MES pH 5.6 on mica. Topography maps (A, D & G), phase maps (B, E & H) and amplitude maps (C, F & I). The scale bar is 100 nm for images A-C, 50 nm for D-F and 20 nm for G-I.

Figure 17: AFM of 1.0 pM S6BE-3HH, 10 pM Cu(N03)2 imaged in 20 mM MES pH 5.6 on mica. Topography map (A), phase map (B) and amplitude map (C). The scale bar is 100 nm. (D) Height traces corresponding to the line profiles taken from the topography map (A). (E) 2D-FFT map of the highlighted area selections in phase map (B), and schematic of the array's unit cell derived from the calculated distances of the 2D-FFT map and height traces (D).

Figure 18: AFM of 1.0 pM S6BE-3HH, 20 pM Cu(N03)2 imaged in 20 mM MES pH 5.6 on mica. Topography map (A), phase map (B) and amplitude map (C). The scale bar is 100 nm. (D) Height traces corresponding to the line profiles taken from the topography map (A). (E) 2D-FFT map of the highlighted area selection in phase map (B), and schematic of the array's unit cell derived from the calculated distances of the 2D-FFT map and height traces (D).

Figure 19: AFM of 1.0 mM S6BE-3HH, 20 pM Cu(N03)2 imaged in 20 mM MES pH

5.6 on mica. Topography map and 2D-FFT (A), phase map and 2D-FFT (B) and Amplitude map (C). The scale bar is 20 nm for A-C. (D) Projection and filtered image corresponding to the area selection in the topography map (A). The scale bar is 10 nm. (E) 2D-FFT map of the phase image (B), and schematic of the array's unit cell derived from the calculated distances of the 2D-FFT map.

Figure 20: AFM of 1.0 pM S6BE-3HH, 30 pM Cu(N03)2 imaged in 20 mM MES pH

5.6 on mica. Topography maps (A & D), phase maps (B & E) and amplitude maps (C & F). The scale bar is 100 nm for A-C and 50 nm for D-F. (G) Height traces corresponding to the line profiles taken from the topography map (A). (E) 2D-FFT map of the highlighted area selections in phase map (E), and schematic of the array's unit cell derived from the calculated distances of the 2D-FFT map and height traces (G).

Figure 21: AFM of 1.0 µM S6BE-3HH, 20 µM Cu(NO3)2 imaged in 20 mM MES pH 5.6 on HOPG. Topography maps (A & D), phase maps (B & E) and amplitude maps (C & F). The scale bar is 100 nm for A-C and 20 nm for D-F. (G) Height traces corresponding to the line profiles taken from the topography maps (A & D). (E) 2D-FFT map of the highlighted area selections in topography map (D), and schematic of the array's unit cell derived from the calculated distances of the 2D-FFT map.

Figure 22: AFM of 1.0 µM S6BE-3HH, 10 µM Zn(NO3)2 imaged in 20 mM MES pH 5.6 on mica. Topography map (A), phase map (B) and amplitude map (C). The scale bar is 100 nm. (D) Height traces corresponding to the line profiles taken from the topography map (A).

Figure 23: AFM of 1.0 µM S6BE, 20 µM Cu(NO3)2 imaged in 20 mM MES pH 5.6 on mica. Topography maps (A & D), phase map (B) and amplitude map (C). The scale bar is 100 nm for A-C and 50 nm for D. (E) Height traces corresponding to the line profiles taken from the topography maps (A & D).

Figure 24: Manual sequence alignment of all SAKe design sequences. Each entry contains only one blade. Proteins that have two blades per repeat span two entries. The amino acid sequences between the GG and LNS motifs are annotated as the protein's variable loop regions.

Figure 25: SDS-PAGE of samples collected during preparative SEC for proteins S6BE-3CHR, S6BE-3HR-L3 and mEm-v22-S6BE-3HR.

Figure 26: SDS-PAGE of samples collected during nickel affinity chromatography expression testing of dS6AC. Clearly, the protein expresses as a genetic fusion of two S6AC units. F: Flow through. B: Wash B fraction. E: Elution fraction. I: Inclusion bodies.

Figure 27: SDS-PAGE of samples collected during nickel affinity chromatography purification of S6BE-3HR. BD: Before dialysis, with hexahistidine tag. AD: After dialysis, without hexahistidine tag. F: Flow through. A: Wash A fraction. B: Wash B fraction. C: Wash C fraction. Multiple thicker bands indicate the presence of partially or non-denatured species. This has been reported for thermostable SAKe proteins such as S6BE and S2BE.

Figure 28: SDS-PAGE of samples collected during nickel affinity chromatography purification of S2BE-3HR. BD: Before dialysis, with hexahistidine tag. AD: After dialysis, without hexahistidine tag. F: Flow through. A: Wash A fraction. B: Wash B fraction. C: Wash C fraction. Multiple thicker bands indicate the presence of partially or non-denatured species. This has been reported for thermostable SAKe proteins such as S6BE and S2BE.

Figure 29: Size Exclusion Chromatograms for various SAKe mutants. A: S6BE-3HR on a HiLoad Superdex 75pg 16/600 column. B: S6BE-3CHR on a HiLoad Superdex 200pg 16/600 column. C: S2BE-3HR on a HiLoad Superdex 75pg 16/600 column. D: S6BE-3HR-L3 on a HiLoad Superdex 200pg 16/600 column. E: mEm-v22S6BE-3HR on a HiLoad Superdex 200pg 16/600 column. F: dS6AC on a HiLoad Superdex 200pg 16/600 column.

Figure 30: Structures of Zn-induced SAKe cages, determined via X-ray diffraction. A to C: Upon addition of Zn(II), S6BE-3HR and S6BE-3CHR assemble into similar tetrameric cages in solution. D: In S6BE-3HR the Zn-binding site is 2His-2Asp. E: In S6BE-3CHR, the Zn-binding site contains a 2His-2Cys zinc finger motif.

Figure 31: Crystal structure of S2BE-3HR cages

The present invention discloses the design and engineering of an improved protein building block named the Self-Assembling Kelch (SAKe) protein. SAKe has a stable, symmetric design with readily accessible loops that can be varied in both sequence and length to later bind larger molecules or scaffold a catalytic site. To demonstrate SAKe's versatility, its structure was modified to undergo metal-mediated self-assembly into 2D surface arrays. This highlights SAKe as a promising new protein scaffold which can be readily redesigned to target various applications.

Through an investigation into naturally occurring pseudosymmetric proteins, the human Keap1 Kelch protein was identified as a template for the development of SAKe [Beamer et al. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2005, 61, 1335-1342]. Kelch repeat proteins are β-propeller proteins composed of six nearly identical tandem sequence repeats that fold into 4-stranded anti-parallel sheets around a central cavity [Adams et al. Trends Cell Biol. 2000, 10, 17-24]. This structural family has well-conserved blades, with the first and second strands connected by loops that vary in length and sequence. Using a computational procedure that combines ancestral sequence reconstruction with computational protein backbone optimization and subsequent sequence scoring, a new family of proteins named SAKe was derived from the Keap1 β-propeller (Figure 1). Initially, six different SAKe proteins were designed and evaluated. They varied in their core amino acid sequence and in the length of their surface loops. The variable "top-side" loops sit between the inner two β-strands of the propeller's blades; A-type SAKe (S6AE, S6AR and S6AC) loops have 10 amino acids while B-type SAKe (S6BE, S6BR and S6BC) loops have 6. Four of the six proteins were successfully expressed and readily crystallized: three A-type SAKes (S6AE, S6AR and S6AC) and one B-type SAKe (S6BE). Each protein has perfect 6-fold symmetry, with a central cavity surrounded by flexible loops on the "top" face. Circular Dichroism (CD) spectroscopy at elevated temperatures highlights their exceptional thermal stability: S6BE exhibits a melting temperature (Tm) of over 95 °C.

To investigate loop modification and further functionalisation, three loop variants of S6BE were created (Figure 6). S6BE-L1 was created by conserving the intersection of the S6A-type and S6B-type loops, while adding in 4 amino acids. For S6BE-L2, a larger part of the S6A-type loop was conserved with the addition of 4 amino acids. These 4 amino acid sequences were random Tyr-containing combinations of frequently occurring residues in nanobody CDR motifs [Zuo et al. BMC Genomics 2017, 18, 797], and are not found in the natural Kelch proteins. S6BE-L3 is a 3-fold symmetric variant with two long loops of different lengths, grafted from the Kelch domain of human KBTBD5 [Canning et al. Journal of Biological Chemistry 2013, 288, 7803-7814]. All 3 proteins were successfully purified, and their structures were confirmed via X-ray diffraction (XRD). However, the S6BE loop variants were found to be less stable than the original S6BE, with Tm dropping to a minimum of 51.7 °C as the loop lengths increased (Figure 5). Nonetheless, all of these designs are significantly more stable than the template Keap1 β-propeller, which unfolds at 44.1 °C.

SAKe's symmetry and modifiable interfaces make it an ideal building block for the construction of macromolecular assemblies. While dialyzing S6BE to low pH, a pH-induced self-assembly was observed (Figure 7). This pH-responsive behaviour was further investigated through dialysis experiments and XRD. The tipping point for this self-assembly was found to be around pH 4.0 (Figures 8 and 9). At pH 4.0, large crystals were observed for S6BE within a few hours up to two days, depending on the concentration. In contrast, crystals only formed after 6 days at 5 mg/mL when dialyzing to pH 4.5. The self-assembled crystals grown at pH 4.0 remained stable at pH 4.5. However, they disassembled at pH 5.0 (Figure 10), but readily re-assembled and formed crystals again when the pH was dropped to pH 4.0. To understand this reversible self-assembly mechanism, the crystals formed at different pH were investigated with XRD. These self-assembled structures revealed identical packing to the crystals grown via vapor diffusion at pH 7.0-8.5. At pH 8.0, His (pKa 6.08-6.90) and Glu (pKa 4.3-4.4) are deprotonated and the net positive charge of S6BE likely causes mutual repulsion. However, at pH 4 these residues are protonated. Due to its symmetry and dipole-like character, the shape and charge complementarity of S6BE is then expected to allow for self-assembly into a tight hexagonal packing (Figures 2 and 11). A series of hydrogen bonds and hydrophobic interactions can then further stabilize the assembly. Similar mechanisms have been reported for the capsid proteins of cowpea chlorotic mosaic virus (CCMV) [Speir et al. Structure 1995, 3, 63-78; Lavelle et al. J. Phys. Chem. B 2009, 113, 3813-3819; Van Eldijk et al. J. Am. Chem. Soc. 2012, 134, 18506-18509].
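
The protonation argument above can be made semi-quantitative with the Henderson-Hasselbalch relation. The following is a minimal sketch, not part of the disclosed method: it treats each side chain as an isolated titratable group, uses the pKa ranges quoted in the text, and ignores the local environment effects that shift pKa values in a folded protein. The function name is hypothetical.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# pKa ranges quoted in the text; assumed here to apply to isolated side chains.
PKA = {"His": (6.08, 6.90), "Glu": (4.3, 4.4)}

for ph in (4.0, 4.5, 5.0, 8.0):
    for residue, (low, high) in PKA.items():
        f_low = fraction_protonated(ph, low)
        f_high = fraction_protonated(ph, high)
        print(f"pH {ph:.1f} {residue}: {f_low:.2f}-{f_high:.2f} protonated")
```

Under these assumptions His is almost fully protonated at pH 4.0 and Glu is roughly two-thirds protonated, whereas both are largely deprotonated at pH 8.0.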

The hexagonal packing and complementary interactions found in S6BE's self-assembled structures provide a promising starting point for its application as a supramolecular building block. Therefore, this protein was rationally reengineered to coordinate divalent metal ions and induce self-assembly into on-surface 2D arrays (Figure 11). Six residues in S6BE were selected to be mutated to His residues, creating the S6BE-3HH variant. These double His sites can be found on the bottom protrusions of the 1st, 3rd and 5th blades, forming 3-fold rotational symmetry. Previous reports have combined a similar metal coordination strategy with naturally occurring symmetric protein oligomers to fabricate nanoparticles and crystalline protein arrays [Salgado et al. Acc. Chem. Res. 2010, 43, 661-672; Huard et al. Nat. Chem. Biol. 2013, 9, 169-176; Suzuki et al. Nature 2016, 533, 369-373].
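
The 3-fold arrangement of the double His sites can be checked against the mutation list given in Example 1 (E24H-R25H, E118H-R119H and E212H-R213H). The sketch below only parses those labels and verifies that the sites are equally spaced along the sequence. The helper name is hypothetical and no sequence data is assumed.

```python
import re

# Mutations defining S6BE-3HH, as listed in Example 1.
MUTATIONS = ["E24H", "R25H", "E118H", "R119H", "E212H", "R213H"]

def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'E24H' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

parsed = [parse_mutation(m) for m in MUTATIONS]
# First residue of each double His site (the E->H substitutions).
site_starts = sorted(pos for wt, pos, _ in parsed if wt == "E")
# Sequence distance between consecutive sites.
spacings = [b - a for a, b in zip(site_starts, site_starts[1:])]
print(site_starts, spacings)
```

A constant spacing between the three sites is what is expected for positions related by 3-fold symmetry in a protein built from identical tandem repeats.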

The metal-induced assembly of S6BE and S6BE-3HH proteins was first screened using Dynamic Light Scattering (DLS) in solution. Divalent metals (Cu(NO3)2 and Zn(NO3)2) were titrated into the protein solution in different buffers and at different pH, resulting in larger structures being formed as the metal:protein ratio was increased (Figures 13 and 14). Interestingly, in both neutral and acidic conditions and at a ratio of 3:1 Cu(II):S6BE-3HH, an intermediate peak with a diameter of ~10 nm can be recognized before the structures reach larger indistinguishable sizes at higher ratios. To investigate the formation of on-surface 2D protein arrays, in-solution amplitude-modulated atomic force microscopy (AFM) was utilized (Figure 3). In the absence of metals and on a muscovite mica surface, the optimal condition for imaging S6BE and S6BE-3HH was found to be a mildly acidic buffer (20 mM MES pH 5.6). In these conditions, both proteins adhered to the surface as monomeric species with some small aggregates (Figure 15). It is likely that in acidic conditions the proteins carry enough positively charged residues to form complementary interactions with the negatively charged mica surface. Organized 2D surface arrays were first observed for the S6BE-3HH protein at a ratio of 10:1 Cu(II):protein with an overall protein concentration of 0.5-1.0 µM, whilst imaging in 20 mM MES buffer (pH 5.6). The stable 2D surface crystals adopted high aspect ratio rectangular geometries with a preferred directional growth (Figure 3A). Through analysing the dimensions derived from the 2D-FFT of selected surface crystals and their height profiles, the unit cell of these arrays could be obtained with c ~ 4.5 nm, a = 7.23 nm, b = 4.84 nm and γ = 90.20° (Figure 17). These unit cell dimensions correspond to arrays composed of S6BE-3HH dimers, and are similar to the 2D protein arrangements found in a S6BE-3HH crystal structure (Figure 3A(iv)).
Therefore, it is highly likely that these arrays are composed of protein dimers arranged bottom-to-bottom, which are bridged via Cu(II) ions and complementary interactions. At a ratio of > 20:1 (Cu(II):S6BE-3HH), the surface arrays were found to transition from the rectangular-dimer species to a hexagonal-honeycomb lattice. The 2D-FFT of these crystals confirmed their highly crystalline hexagonal packing with unit cell dimensions of a = 4.62 nm, b = 4.64 nm and γ = 61.60° (Figures 3B and S18). This unit cell, coupled with the height traces of individual domains (c ~ 3.2 nm), suggested that these 2D surface arrays were similar in geometry to the planar protein packing in a S6BE-3HH crystal (a = 4.70 nm, b = 4.72 nm, α = 90.00°, β = 90.00° and γ = 120.00°). Similar crystalline structures were also observed on a highly oriented pyrolytic graphite (HOPG) surface (Figure 21). The unit cell of the hexagonal array domains derived from the 2D-FFT (a = 4.55 nm, b = 4.51 nm and γ = 61.21°) had similar dimensions to that obtained for the arrays on mica in the same conditions. However, the hexagonal arrays appear to have more vacancies and imperfections within the lattices. On replacing Cu(NO3)2 with Zn(NO3)2, which has a similar atomic radius but a different, tetrahedral coordination, no ordered arrays for S6BE-3HH could be obtained in the same imaging conditions (mica, 20 mM MES pH 5.6). Instead, aggregated proteins were observed with no obvious crystallinity (Figure 22). This highlights the importance of Cu(II) and its square-planar coordination in the formation of these arrays. The assembly of S6BE, which does not contain His mutations on its bottom loops, was also investigated in the presence of Cu(II). At a ratio of 20:1 Cu(II):S6BE, amorphous aggregated proteins were observed with no obvious crystallinity (Figure 23).
This indicates the importance of the His mutations in S6BE-3HH and their role in facilitating Cu(II) coordination and subsequent assembly of 2D arrays. At pH 5.6, these His residues are close to their pKa and are likely to adopt a mixture of protonated and deprotonated states. The pKa may also be lowered and mediated by nearby local bases such as Glu, further facilitating deprotonation and assembly in the presence of an excess of Cu(II). In addition, pH 5.6 is close to the pI of S6BE-3HH (5.32), where the proteins will be predominantly neutralized and will no longer repel each other. A similar mechanism has been previously reported for Zn-mediated protein assemblies at pH 5.5 [Brodin et al. Nat. Chem. 2012, 4, 375-382]. Taken together, these results demonstrate Cu(II)-dependent structural self-assembly: the square planar coordination of Cu(II), combined with the bi-chelate His mutations and shape complementarity of S6BE-3HH, appears important for connecting adjacent proteins into close-packed 2D arrays (Figure 12).
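
One simple way to compare the unit cells quoted above is through their in-plane area, a·b·sin(γ). The sketch below is illustrative only: it uses the cell edges and angles as quoted in the text, treats each quoted angle as the in-plane angle between a and b, and relies on the fact that a 60° cell and a 120° cell describe the same hexagonal lattice, so their areas can be compared directly. The function and dictionary names are hypothetical.

```python
import math

def cell_area(a_nm: float, b_nm: float, gamma_deg: float) -> float:
    """Area (nm^2) of a 2D unit cell with edges a, b and included angle gamma."""
    return a_nm * b_nm * math.sin(math.radians(gamma_deg))

# (a in nm, b in nm, in-plane angle in degrees), as quoted in the text.
cells = {
    "AFM rectangular (mica)": (7.23, 4.84, 90.20),
    "AFM hexagonal (mica)": (4.62, 4.64, 61.60),
    "AFM hexagonal (HOPG)": (4.55, 4.51, 61.21),
    "crystal packing": (4.70, 4.72, 120.00),
}

areas = {name: cell_area(*params) for name, params in cells.items()}
for name, area in areas.items():
    print(f"{name}: {area:.1f} nm^2")
```

With these inputs the two hexagonal AFM cells come out within several percent of the in-plane area of the crystal packing, in line with the similarity described above.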

Using a computational approach combining consensus design and Rosetta energy scoring, SAKe proteins were developed: a new symmetric, stable protein scaffold with modifiable loops. The loops can be varied in both length and sequence, highlighting their potential to be optimized for the binding of clinically relevant molecules or the programming of catalytic activity. Following SAKe's modification with metal binding sites, Cu(II)-induced self-assembly into on-surface 2D arrays was observed. The present invention discloses SAKe as a highly modifiable protein scaffold which can double as a building block for the fabrication of 2D protein assemblies. In summary, SAKe's versatility holds great promise for the creation of biotherapeutics and innovative on-surface materials.

The following crystal structures were submitted to RCSB PDB: SAKe6AE (7ON6), SAKe6AR (7ON8), SAKe6AC (7ONA), SAKe6BE (7ONC and 7ONE), SAKe6BE3HH (7OP4, 7OPU and 7OPV), SAKe6BE-L1 (7ONG), SAKe6BE-L1 (7ON7) and SAKe6BE-L1 (7ONH).

Overview of DNA and protein sequences:

SEQ ID NO: 1 to SEQ ID NO: 19 (odd numbers): DNA sequences of designed SAKe proteins. The outermost emphasized 5' and 3' sequences contain a start codon, NdeI restriction site, stop codon and XhoI restriction site. SEQ ID NO: 1 to SEQ ID NO: 19 (even numbers): corresponding amino acid sequences.

Number of TLS groups 6 18

Example 1. Protein design

The proteins examined in this research were designed using the Revolutionary protein design method [Voet et al. Proceedings of the National Academy of Sciences 2014, 111, 15102-15107]. The Kelch domain of human Keap1 (PDB code: 1ZGK) was chosen as a template for the SAKe designs. Clustal Omega was used to generate multiple sequence alignments (MSAs) starting from the six repeats of the Keap1 β-propeller [Madeira et al. Nucleic Acids Research 2019, 47, W636-W641]. The MSAs and their accompanying unrooted phylogenetic trees were used to construct lists of putative ancestral sequences using the FastML server [Ashkenazy et al. Nucleic Acids Research 2012, 40, W580-W584], with 250 sequences per node for a total of 10000 sequences for each SAKe construct. Idealized symmetric backbone models were designed using both PyMOL [The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrodinger, LLC.] and PyRosetta [Chaudhury et al. Bioinformatics 2010, 26, 689-691]. With PyMOL, the second and last blades of the Keap1 β-propeller were extracted. The first blade was used for the design of type A SAKe (47 amino acids per repeat) and the last blade for type B SAKe (51 amino acids per repeat). The N-termini were truncated to minimize clashing with their symmetry equivalents during the subsequent Rosetta Symmetric Docking procedure [Andre et al. Proc Natl Acad Sci 2007, 104, 17656-17661]. Sixfold rotational symmetry was enforced, and 20000 Monte Carlo Simulated Annealing (MCSA) optimized models were generated per SAKe type. Results were evaluated on their docking score and RMSD from a manually constructed symmetric backbone. The blades of the best models were reconnected, adding in only the exact number of amino acids that were removed earlier. The putative ancestral sequences were mapped on their corresponding backbone models using a custom PyRosetta script. For each SAKe type, a model was selected with the lowest Talaris2013 energy score (e.g. SAKe6E 'E'), a model with the lowest RMSD from the input backbone (e.g. SAKe6R 'R') and a c-optimized model (e.g. SAKe6C 'C'). For all SAKe constructs except S6AR, cysteine residues were mutated to serine or alanine. Following the interpretation of later experiments, S6BE mutants were rationally designed to have altered self-assembly properties. The S6BE-3HH variant was designed from S6BE via the following mutations: E24H-R25H, E118H-R119H and E212H-R213H. Amino acid sequences were reverse translated into DNA sequences, using a codon optimization tool provided by the supplier (Integrated DNA Technologies, Iowa, United States).

Example 2. Production of proteins

DNA sequences were cloned into pET-28a(+) via NdeI and XhoI restriction sites, adding an N-terminal hexahistidine tag to the constructs. For cloning, the recombinant vectors were transformed into E. coli DH5α via heat shock. The vectors were validated via Sanger sequencing (LGC Ltd, Teddington, United Kingdom), using T7 promoter and terminator primers provided by LGC. Correct plasmids were transformed into E. coli BL21 via heat shock. 1 L cultures were grown in a shaking incubator at 37 °C to an OD600 of 0.6. Thereafter, cultures were incubated on ice for 20 min. After adding 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), incubation was continued at 20 °C for 16-18 h while shaking. Cells were harvested via centrifugation at 3000 g. The supernatant was discarded and pellets were immediately stored at -24 °C. Pellets were thawed on ice and suspended in 40 mL of 50 mM NaH2PO4 (pH 8), 200 mM NaCl, 10 mM imidazole, 1 mM phenylmethylsulfonyl fluoride (PMSF) and 30 mg hen egg-white lysozyme. They were incubated for 30 min at 15 °C while rotating, and were then lysed via sonication. Lysates were centrifuged at 3000 g for 30 min. The supernatant was filtered (0.45 µm) and loaded on a nickel nitrilotriacetic acid (Ni-NTA) column equilibrated with a 50 mM NaH2PO4 (pH 8), 200 mM NaCl and 10 mM imidazole buffer. The column was washed with 10 mM and 20 mM imidazole and the proteins were eluted with 300 mM imidazole. The fractions containing proteins were collected and dialyzed overnight in 50 mM NaH2PO4 (pH 8) and 200 mM NaCl. At the same time, histidine tags were removed via thrombin (100 U per protein). The dialyzed samples were subjected to an additional Ni-NTA chromatography step and then loaded on a Superdex 200 pg 16/600 column equilibrated with 20 mM HEPES (pH 8) and 200 mM NaCl.

Abs280 peaks were collected, dialyzed in 20 mM HEPES (pH 8) and concentrated to stocks of 20 mg/mL or more. These stock proteins were stored at 4 °C. The Superdex column was standardized using the Bio-Rad "gel filtration standard 151-1901" protein marker.

Example 3. Crystallization and X-ray diffraction

Proteins were crystallized via sitting-drop vapor diffusion, using Qiagen Nextal Crystal Screening kits, MRC 96-well plates and an ARI Gryphon robot. For native crystallography, droplets consisted of 0.3 µL mother liquor and 0.3 µL of 10 mg/mL protein in 20 mM HEPES pH 8.0, 200 mM NaCl. Protein crystals were vitrified after single-step soaking. PEG 400 or glycerol was used as cryoprotectant. X-ray diffraction experiments were performed at Diamond Light Source (United Kingdom), Elettra (Italy) and SLS (Switzerland). The diffraction patterns were indexed using XDS or DIALS [Kabsch Acta Crystallographica Section D: Biological Crystallography 2010, 66, 125-132; Winter et al. Acta Crystallographica Section D 2018, 74, 85-97]. Data reduction was done with Aimless in CCP4 [Evans et al. Acta Crystallographica Section D: Biological Crystallography 2013, 69, 1204-1214; Winn et al. Acta Crystallographica Section D: Biological Crystallography 2011, 67, 235-242]. Molecular Replacement phasing was done with PHASER, using computationally designed models as search ensemble [McCoy et al. Journal of Applied Crystallography 2007, 40, 658-674]. Refinement was done manually with phenix.refine and Coot [Adams et al. Acta Crystallographica Section D: Biological Crystallography 2010, 66, 213-221; Emsley et al. Acta Crystallographica Section D: Biological Crystallography 2010, 66, 486-501]. The final structures were validated using MolProbity and the PDB validation tool [Chen et al. Acta Crystallographica Section D: Biological Crystallography 2010, 66, 12-21], before being deposited at RCSB PDB. The data for S6BE-L2 were severely anisotropic. The Diffraction Anisotropy Server was used to improve the corresponding MTZ file (merged with Aimless at a 1.95 Å cutoff).

Example 4. Determination of pI and surface electrostatics

Surface electrostatics were calculated at pH 4.0 and 8.0 via PDB2PQR, using a complete SAKe monomer as input. The calculated surfaces were visualized via PyMOL. pI values were calculated via PROPKA [Dolinsky et al. Nucleic Acids Research 2004, 32, W665-W667; Li et al. Proteins: Structure, Function, and Bioinformatics 2005, 61, 704-721; Dolinsky et al. Nucleic Acids Research 2007, 35, W522-W525; Olsson et al. Journal of Chemical Theory and Computation 2011, 7, 525-537].

Example 5. CD spectroscopy

CD spectroscopy was performed with a JASCO J-1500 spectrometer. To measure the CD spectra, protein samples were diluted to 400 µL of 0.1 mg/mL in 20 mM NaH2PO4 (pH 7.6). Ellipticity was measured at 20 °C from 260 nm to 200 nm, using 1 mm cuvettes. Five accumulations were averaged. For determination of melting temperatures, samples were diluted to 400 µL of 0.25 mg/mL in 20 mM NaH2PO4 (pH 7.6). The signal at 233 nm was followed from 0 to 95 °C with intervals of 0.2 °C, using sealable 2 mm cuvettes. The data were analyzed with a custom Python script, which fits a sigmoidal curve and extracts its midpoint. The positive signal at 233 nm was previously attributed to interactions between aromatic residues. Disappearance of this signal during melting experiments seemed indicative of tertiary structure disruption [Saxena et al. Biochemistry 1996, 35, 15215-15221; Nikkhah et al. Biomolecular Engineering 2006, 23, 185-194].
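
The custom script itself is not disclosed. As an illustration of the kind of analysis described, the sketch below fits a two-state sigmoid with flat baselines to a melting trace and reports the midpoint as Tm. Everything in it is an assumption: the model, the grid-search fitting strategy (chosen to avoid third-party dependencies), the function names, and the synthetic noisy data used in place of a measured 233 nm signal.

```python
import math
import random

def sigmoid(t, tm, width):
    """Two-state unfolding curve going from 1 (folded) to 0 (unfolded)."""
    z = (t - tm) / width
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))

def fit_tm(temps, signal, tm_step=0.2, widths=(0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)):
    """Grid search over (Tm, width); the two baselines are solved by linear regression."""
    n = len(temps)
    mean_y = sum(signal) / n
    t_min, t_max = min(temps), max(temps)
    best = None
    for i in range(int(round((t_max - t_min) / tm_step)) + 1):
        tm = t_min + i * tm_step
        for width in widths:
            s = [sigmoid(t, tm, width) for t in temps]
            mean_s = sum(s) / n
            var_s = sum((x - mean_s) ** 2 for x in s)
            if var_s == 0.0:
                continue  # flat curve, cannot be scaled to the data
            slope = sum((x - mean_s) * (y - mean_y) for x, y in zip(s, signal)) / var_s
            offset = mean_y - slope * mean_s
            sse = sum((offset + slope * x - y) ** 2 for x, y in zip(s, signal))
            if best is None or sse < best[0]:
                best = (sse, tm, width)
    return best[1], best[2]

# Synthetic melting trace (assumed values): Tm = 62.4 C, width = 2.0 C, small noise.
rng = random.Random(0)
temps = [float(t) for t in range(0, 96)]
signal = [2.0 + 5.0 * sigmoid(t, 62.4, 2.0) + rng.gauss(0.0, 0.05) for t in temps]

tm_fit, width_fit = fit_tm(temps, signal)
print(f"fitted Tm = {tm_fit:.1f} C, width = {width_fit:.1f} C")
```

For real data a nonlinear least-squares routine with sloping baselines would normally be preferred; the grid search here only keeps the example self-contained.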

Example 6. Spontaneous crystal assembly

The tipping point of pH-induced self-assembly was found to be approximately 4.5. To show reversibility of assembly, 500 µL of 5 mg/mL S6BE was dialyzed at 20 °C in 50 mM citrate - citric acid buffer at pH 4.0, 4.5 and 5.0. To confirm the pH 4.5 tipping point, a similar experiment was repeated for both S6BE and S6BE-3HH. 500 µL samples of various protein concentrations (5.0 mg/mL, 2.5 mg/mL, 1.0 mg/mL and 0.5 mg/mL) were dialyzed at 20 °C in 50 mM citrate - citric acid buffer (pH 4.5 and 4.0). For each protein, a self-assembled crystal was soaked in cryoprotectant, vitrified and shipped off for X-ray diffraction. Pictures of the self-assembled crystals were taken with a Nikon SMZ800N microscope, outfitted with a TV Lens C 0.45x (Nikon, Japan).

DLS

Concentrated Sake6 and Sake6-3HH protein stock solutions (20 mg/ml, HEPES pH 8) were diluted with either 20 mM MES pH 5.6 or MilliQ to a concentration of 1 mg/ml. Metal suspensions (Cu(NO3)2 and Zn(NO3)2, MilliQ) were titrated into 100 µl of the prepared protein solution to achieve the desired protein:metal ratio. Size measurements were obtained at 25 °C using a Zetasizer Nano ZS instrument (Malvern Instruments) and a quartz cuvette (ZEN2112). Data analysis was performed using the Zetasizer software 7.11 (Malvern Instruments).
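
The titration arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the protocol used: the protein mass is taken as the "roughly 30 kDa" quoted elsewhere in the text, the 10 mM metal stock concentration is hypothetical, and the small dilution caused by adding the metal is ignored. Function names are hypothetical.

```python
def mg_per_ml_to_micromolar(conc_mg_ml: float, mw_kda: float) -> float:
    """Convert mass concentration to micromolar (1 mg/mL of a 1 kDa species is 1000 uM)."""
    return conc_mg_ml / mw_kda * 1000.0

def metal_stock_volume_ul(protein_um: float, protein_vol_ul: float,
                          ratio: float, metal_stock_um: float) -> float:
    """Volume of metal stock (uL) giving `ratio` mol of metal per mol of protein."""
    protein_pmol = protein_um * protein_vol_ul  # uM x uL = pmol
    return protein_pmol * ratio / metal_stock_um

PROTEIN_MW_KDA = 30.0        # assumption: "roughly 30 kDa"
METAL_STOCK_UM = 10_000.0    # assumption: hypothetical 10 mM stock

protein_um = mg_per_ml_to_micromolar(1.0, PROTEIN_MW_KDA)
volume_ul = metal_stock_volume_ul(protein_um, 100.0, 3.0, METAL_STOCK_UM)
print(f"{protein_um:.1f} uM protein; {volume_ul:.2f} uL metal stock for a 3:1 ratio")
```

Under these assumptions, 1 mg/ml protein corresponds to about 33 µM, so a 3:1 metal:protein ratio in 100 µl requires about 1 µl of the hypothetical 10 mM stock.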

AFM

The protein stock solutions (40 mg/ml, HEPES pH 8) were diluted with the imaging buffer (20 mM MES pH 5.6 or MilliQ) to a desired concentration. The metal salts (Cu(NO3)2 and Zn(NO3)2, Sigma-Aldrich) were suspended in MilliQ and mixed with the protein solutions to obtain the correct protein:metal ratio, before being left to incubate for 20 minutes. Next, 30 µl of diluted protein/metal solution was drop cast onto freshly cleaved substrates, muscovite mica (Agar Scientific) or HOPG (ZYB grade, Advanced Ceramics Inc.).

An additional 30 µl droplet of the protein/metal solution was then carefully pipetted onto the AFM's cantilever holder, previously loaded with a Nanoworld ARROW-UHF AuD20 cantilever (resonance frequency 0.7 - 2.0 MHz), and brought into contact with the surface droplet in the AFM. The sample and AFM system were then left to equilibrate for 1 hour at 25 °C before imaging.

The images were captured on a Cypher ES atomic force microscope (Asylum Research) using the amplitude modulation mode whilst in solution. The imaging force and frequency were both carefully adjusted to reduce any disruption in the self-assembled surface arrays. The AFM data processing was performed with a combination of SPIP and Gwyddion software. The imaging buffer was either 20 mM MES (pH 5.6) or MilliQ, with protein concentrations of 0.5 - 5 µM and Cu(NO3)2/Zn(NO3)2 concentrations of 5 µM - 50 µM, depending on the requirements of the experiment.

Example 7. In vivo half-life of SAKe cages

In vivo half-life is an important factor in developing pharmaceutical molecules, as it directly affects the duration of adequate therapeutic effect. Size is an important determinant of the in vivo half-life of proteins. Below 70 kDa, most proteins are quickly removed via renal clearance. SAKe proteins of the present invention are roughly 30 kDa. To avoid clearance, SAKe proteins can either be genetically fused, or the formation of larger complexes can be induced, to surpass the glomerular filtration cutoff. Various SAKe mutants have been engineered and characterized to increase biological size (Table 6). Additionally, these SAKe mutants show potential as bi/multispecifics.
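
The size argument can be illustrated with a short calculation. The sketch below uses only the two figures quoted above (a 70 kDa cutoff and a roughly 30 kDa monomer) and asks how many copies must be fused or assembled before the combined mass exceeds the cutoff. It deliberately ignores shape, charge and other factors that influence filtration, and the names are hypothetical.

```python
import math

RENAL_CUTOFF_KDA = 70.0   # cutoff quoted in the text
SAKE_MONOMER_KDA = 30.0   # "roughly 30 kDa" per the text

def min_copies_above_cutoff(monomer_kda: float,
                            cutoff_kda: float = RENAL_CUTOFF_KDA) -> int:
    """Smallest number of copies whose combined mass strictly exceeds the cutoff."""
    return math.floor(cutoff_kda / monomer_kda) + 1

copies = min_copies_above_cutoff(SAKE_MONOMER_KDA)
print(f"{copies} copies -> {copies * SAKE_MONOMER_KDA:.0f} kDa")
```

On this simple mass-only criterion, an assembly of three or more roughly 30 kDa units would exceed the cutoff, whereas a fusion of two units would sit close to it.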

Table 6: SAKe scaffold mutations allowing an increase of biological size.

SDS PAGE

S6BE-3CHR (30.4 kDa), S6BE-3HR-L3 (34.2 kDa) and mEm-v22S6BE-3HR (57.7 kDa) are soluble and SDS PAGE confirms their expected sizes (Figure 25). Multiple thicker bands indicate the presence of partially or non-denatured species. This has been reported for thermostable SAKes such as S6BE, S2BE and several of their mutants. dS6AC (65.0 kDa), a fusion of two S6AC units, is soluble and SDS PAGE analysis confirms the expected size (Figure 26).

S6BE-3HR (30.5 kDa) is soluble and SDS PAGE analysis confirms the expected size (Figure 27). Multiple thicker bands indicate presence of partially/non-denatured species. This has been reported for thermostable SAKes such as S6BE, S2BE and several of their mutants.

S2BE-3HR (10.4 kDa) is soluble and SDS PAGE confirms the expected size (Figure 28). Multiple thicker bands indicate the presence of partially or non-denatured species. This has been reported for thermostable SAKes such as S6BE, S2BE and several of their mutants. Here, the lower band corresponds to the fully denatured species.

Size Exclusion Chromatography (SEC)

Concentrated protein samples were loaded on HiLoad Superdex 75 pg 16/600 or HiLoad Superdex 200 pg 16/600 SEC columns (Cytiva). SEC can also be used to assess biological size. All samples were run with a HEPES buffer (20 mM HEPES, pH 8.0, 200 mM NaCl). Proteins expected to interact with metals were incubated with at least 5 mM EDTA prior to SEC injection.

Size exclusion chromatography confirms the expected sizes of each SAKe mutant in solution (Figure 29). Without metals, no assemblies can be observed. Peaks at lower elution volumes, or higher molecular weights, are caused by impurities or domain-swapped species.

X-ray diffraction (XRD)

Protein crystals can be diffracted to study the atomic three-dimensional structure of their constituents. In this way, the mechanism of metal binding could be unraveled for proteins such as S6BE-3HR and S6BE-3CHR. Crystals were grown via sitting-drop vapor diffusion in MRC 2-well plates (Hampton Research, UK). Droplets were set up using a Gryphon crystallization robot (Art Robbins Instruments, USA): 0.5 µL crystal screening kit buffer was mixed with 0.25 µL Zn(NO3)2 (in water) and 0.25 µL from a 20 mg/mL protein stock (20 mM HEPES pH 8.0, 200 mM NaCl) (Table 7). The crystals were vitrified after single-step soaking using 25% PEG 400 or glycerol as a cryoprotectant. X-ray diffraction experiments were performed at Diamond Light Source (UK). Diffraction patterns were indexed using XDS or DIALS. Data reduction was done with Aimless in CCP4. Molecular Replacement phasing was done with PHASER, using an in-house model of S6BE. Refinement was done manually with phenix.refine and Coot.

Table 7: Co-crystallization conditions for various Zn-containing SAKe cages.

X-ray diffraction experiments carried out on S6BE-3HR and S6BE-3CHR highlight their mechanisms for zinc coordination and subsequent cage assembly (Figure 30). S6BE-3HR cages coordinate zinc via 2His-2Asp motifs, while S6BE-3CHR cages contain 2His-2Cys zinc finger-like motifs on their vertices.

SEC shows that S2BE-3HR proteins first self-assemble as six-bladed trimers (Figure 105). When zinc is added, these trimers assemble into tetrahedral cages identical to those formed by S6BE-3HR (Figure 31).