Title:
MODULAR BINDING PROTEINS
Document Type and Number:
WIPO Patent Application WO/2019/034332
Kind Code:
A1
Abstract:
This invention relates to modular proteins that are capable of binding to one or more target molecules. The modular binding proteins comprise two or more repeat domains, such as tetratricopeptide repeat domains; inter-repeat loops linking the repeat domains; and one or more binding domains. Each binding domain is located in an inter-repeat loop or at the N or C terminus of the modular binding protein. The binding domains may include heterologous peptidyl binding motifs, such as short linear motifs (SLiMs). Modular binding proteins with various configurations and methods for their production and use are provided.

Inventors:
ITZHAKI, Laura (Department of Pharmacology, Tennis Court Road, Cambridge, Cambridgeshire CB2 1PD, GB)
PEREZ RIBA, Alberto (Department of Pharmacology, Tennis Court Road, Cambridge, Cambridgeshire CB2 1PD, GB)
ROWLING, Pamela (Department of Pharmacology, Tennis Court Road, Cambridge, Cambridgeshire CB2 1PD, GB)
Application Number:
EP2018/068580
Publication Date:
February 21, 2019
Filing Date:
July 09, 2018
Assignee:
CAMBRIDGE ENTERPRISE LIMITED (The Old Schools, Trinity Lane, Cambridge, Cambridgeshire CB2 1TN, GB)
International Classes:
C12N15/00; C12N15/10; C12N15/62
Domestic Patent References:
WO2009100990A1 2009-08-20
WO2010060748A1 2010-06-03
WO2017106728A2 2017-06-22
WO1992001047A1 1992-01-23
Foreign References:
US5969108A 1999-10-19
US5565332A 1996-10-15
US5733743A 1998-03-31
US5858657A 1999-01-12
US5871907A 1999-02-16
US5872215A 1999-02-16
US5885793A 1999-03-23
US5962255A 1999-10-05
US6140471A 2000-10-31
US6172197B1 2001-01-09
US6225447B1 2001-05-01
US6291650B1 2001-09-18
US6492160B1 2002-12-10
US6521404B1 2003-02-18
Other References:
JACKREL M E ET AL: "Screening Libraries To Identify Proteins with Desired Binding Activities Using a Split-GFP Reassembly Assay", ACS CHEMICAL BIOLOGY,, vol. 5, no. 6, 18 June 2010 (2010-06-18), pages 553 - 562, XP002718322, ISSN: 1554-8929, [retrieved on 20091228], DOI: 10.1021/CB900272J
JUNG-HOON LEE ET AL: "Protein grafting of p53TAD onto a leucine zipper scaffold generates a potent HDM dual inhibitor", NATURE COMMUNICATIONS, vol. 5, no. 1, 7 May 2014 (2014-05-07), XP055522736, DOI: 10.1038/ncomms4814
CORTAJARENA AITZIBER L ET AL: "Protein design to understand peptide ligand recognition by tetratricopeptide repeat proteins", PROTEIN ENGINEERING, DESIGN AND SELECTION, OXFORD JOURNAL, LONDON, GB, vol. 17, no. 4, 1 April 2004 (2004-04-01), pages 399 - 409, XP002773683, ISSN: 1741-0126, [retrieved on 20040527], DOI: 10.1093/PROTEIN/GZH047
JI-SEON PARK ET AL: "Regulation of amyloid precursor protein processing by its KFERQ motif", BMB REPORTS, vol. 49, no. 6, 30 June 2016 (2016-06-30), KR, pages 337 - 343, XP055522899, ISSN: 1976-6696, DOI: 10.5483/BMBRep.2016.49.6.212
ABDELLALI KELIL ET AL: "Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences", PLOS ONE, vol. 9, no. 9, 10 September 2014 (2014-09-10), pages e106081, XP055523304, DOI: 10.1371/journal.pone.0106081
SAMPATHKUMAR ET AL., J. MOL. BIOL., vol. 381, 2008, pages 867 - 880
ZHU ET AL., PLOS ONE, vol. 7, no. 3, 2012, pages e33943
LI ET AL., BIOCHEMISTRY, vol. 45, 2006, pages 15168 - 15178
MARGARIT ET AL., CELL, vol. 112, no. 5, 2003, pages 685 - 95
HOLEHOUSE ET AL., BIOPHYS. J., vol. 112, 2017, pages 16 - 21
KOBE; KAJAVA, TRENDS IN BIOCHEMICAL SCIENCES, vol. 25, no. 10, 2000, pages 509 - 15
FINN ET AL., NUCLEIC ACIDS RESEARCH, 2016, pages D279 - D285
HUBER ET AL., CELL, vol. 90, 1997, pages 871 - 882
GROVES ET AL., CELL, vol. 96, no. 1, pages 99 - 110
SMALL, TRENDS BIOCHEM. SCI., vol. 25, no. 2, 2000, pages 46 - 7
ZHANG ET AL., NATURE BIOTECHNOLOGY, vol. 29, no. 2, pages 149 - 53
BLATCH ET AL., BIOESSAYS, vol. 21, no. 11, pages 932 - 9
BRUNETTE ET AL., NATURE, vol. 528, 2015, pages 580 - 584
PARMEGGIANI ET AL., J. MOL. BIOL., vol. 427, pages 563 - 575
NANJAPPA ET AL., NUCL ACIDS RES, vol. 42, January 2014 (2014-01-01), pages D959 - 65
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 25, pages 3389 - 3402
ALTSCHUL ET AL., FEBS J., vol. 272, pages 5101 - 5109
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 405 - 410
PEARSON; LIPMAN, PNAS USA, vol. 85, 1988, pages 2444 - 2448
SMITH; WATERMAN, J. MOL BIOL., vol. 147, 1981, pages 195 - 197
NUCL. ACIDS RES., vol. 25, 1997, pages 3389 - 3402
DAVEY ET AL., TRENDS BIOCHEM. SCI., vol. 36, 2011, pages 159 - 169
DAVEY ET AL., NUCLEIC ACIDS RES., vol. 39, 1 July 2011 (2011-07-01), pages W56 - W60
SAKAMOTO ET AL., BIOCHEM. BIOPHYS. RES. COMM., vol. 484, 2017, pages 605 - 611
ERKIZAN ET AL., CELL CYCLE, vol. 10, 2011, pages 3397 - 408
BAYLISS ET AL., MOL. CELL, vol. 12, 2003, pages 851 - 62
RICHARDS ET AL., PNAS, vol. 113, 2016, pages 13726 - 31
SONG; KINGSTON, J. BIOL. CHEM., vol. 283, 2008, pages 35258 - 64
PATEL ET AL., J. BIOL. CHEM., vol. 283, 2008, pages 32158 - 61
WATSON ET AL., NAT. COMM., vol. 7, 2016, pages 11262
DOWLING ET AL., BIOCHEM., vol. 47, 2008, pages 13554 - 63
MOELLERING ET AL., NATURE, vol. 462, 2009, pages 182 - 8
GONDEAU ET AL., J. BIOL. CHEM., vol. 280, 2005, pages 13793 - 800
MENDOZA ET AL., CANCER RES., vol. 63, 2003, pages 1020 - 4
MACCIONI ET AL., EMBO J., vol. 7, 1988, pages 1957 - 63
RIVAS ET AL., PNAS, vol. 85, 1988, pages 6092 - 6
HETZ ET AL., MOL. CELL, vol. 63, 2016, pages 686 - 95
KIM ET AL., NAT. CHEM. BIOL., vol. 9, 2013, pages 643 - 50
STEWART ET AL., NAT. CHEM. BIOL., vol. 6, 2010, pages 595 - 601
SAKAMOTO ET AL., BBRC, vol. 484, 2017, pages 605 - 11
LLOUZ ET AL., J. BIOL. CHEM., vol. 281, 2006, pages 30621 - 30630
PLOTKIN ET AL., J. PHARMACOL. EXP. THER., vol. 305, 2003, pages 974 - 980
BIRTS ET AL., CHEM. SCI., vol. 4, 2013, pages 3046 - 57
LEE ET AL., CHEMBIOCHEM, vol. 14, 2013, pages 445 - 451
LI ET AL., NAT. STRUCT. MOL. BIOL., vol. 17, 2010, pages 105 - 111
BONDESON ET AL., NAT. CHEM. BIOL., 2015
DICE J.F., TRENDS BIOCHEM. SCI., vol. 15, 1990, pages 305 - 309
FAN, X. ET AL., NATURE NEUROSCIENCE, vol. 17, 2014, pages 471 - 480
BECHARA ET AL., FEBS LETTERS, vol. 587, no. 1, 2013, pages 1693 - 1702
MEIER ET AL., J. MOL. BIOL., vol. 344, no. 4, 2004, pages 1051 - 69
"Protocols in Molecular Biology", 1992, JOHN WILEY & SONS
"Recombinant Gene Expression Protocols", March 1997, HUMANA PRESS INC
KONTERMANN, R; DUBEL, S: "Antibody Engineering", 2001, SPRINGER-VERLAG
RUSSELL ET AL.: "Molecular Cloning: a Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
KUBE ET AL., LANGMUIR, vol. 33, 2017, pages 1051 - 1059
KOLASINAC ET AL., INT. J. MOL. SCI., vol. 19, 2018, pages 346
"Remington's Pharmaceutical Sciences", 1990, MACK PUBLISHING COMPANY
KORINEK ET AL., SCIENCE, vol. 275, no. 5307, 1997, pages 1784 - 7
MARGARIT ET AL., CELL, vol. 112, no. 5, 2003, pages 685 - 695
LESHCHINER, PNAS, vol. 112, no. 6, 2015, pages 1761 - 1766
BONDESON, D.P.; MARES, A.; SMITH, I.E.D.; KO, E.; CAMPOS, S.; MIAH, A.H.; MULHOLLAND, K.E.; ROUTLY, N.; BUCKLEY, D.L.; GUSTAFSON,: "Catalytic in vivo protein knockdown by small-molecule PROTACs", NAT. CHEM. BIOL., vol. 11, 2015, pages 611 - 617, XP055279063, DOI: doi:10.1038/nchembio.1858
BOUDKO, S.P.; LONDER, Y.Y.; LETAROV, A. V; SERNOVA, N. V; ENGEL, J.; MESYANZHINOV, V. V: "Domain organization, folding and stability of bacteriophage T4 fibritin, a segmented coiled-coil protein", EUR. J. BIOCHEM., vol. 269, 2002, pages 833 - 841
BRUNETTE, T.J.; PARMEGGIANI, F.; HUANG, P.-S.; BHABHA, G.; EKIERT, D.C.; TSUTAKAWA, S.E.; HURA, G.L.; TAINER, J.A.; BAKER, D.: "Exploring the repeat protein universe through computational protein design", NATURE, vol. 528, 2015, pages 580 - 584
CHAPMAN; MCNAUGHTON, B.R.: "Scratching the surface: Resurfacing proteins to endow new properties and function", CELL CHEM. BIOL., vol. 23, 2016, pages 543 - 553, XP029552462, DOI: doi:10.1016/j.chembiol.2016.04.010
D'ANDREA, L.D.; REGAN, L.: "TPR proteins: the versatile helix", TRENDS BIOCHEM. SCI., vol. 28, 2003, pages 655 - 662, XP004476604, DOI: doi:10.1016/j.tibs.2003.10.007
DESHAIES, R.J.: "Protein degradation: Prime time for PROTACs", NAT. CHEM. BIOL., vol. 11, 2015, pages 634 - 635, XP055414762, DOI: doi:10.1038/nchembio.1887
DE VRIES, S.J.; BONVIN, A.M.J.J.: "CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK", PLOS ONE, vol. 6, 2011, pages e17695
DE VRIES, S.J.; VAN DIJK, M.; BONVIN, A.M.J.J.: "The HADDOCK web server for data-driven biomolecular docking", NAT. PROTOC., vol. 5, 2010, pages 883 - 897
GUETTLER, S.; LAROSE, J.; PETSALAKI, E.; GISH, G.; SCOTTER, A.; PAWSON, T.; ROTTAPEL, R.; SICHERI, F.: "Structural basis and sequence rules for substrate recognition by Tankyrase explain the basis for cherubism disease", CELL, vol. 147, 2011, pages 1340 - 1354, XP028392398, DOI: doi:10.1016/j.cell.2011.10.046
GUTHE, S.; KAPINOS, L.; MOGLICH, A.; MEIER, S.; GRZESIEK, S.; KIEFHABER, T.: "Very Fast Folding and Association of a Trimerization Domain from Bacteriophage T4 Fibritin", J. MOL. BIOL., vol. 337, 2004, pages 905 - 915, XP004496019, DOI: doi:10.1016/j.jmb.2004.02.020
HAO, B.; ZHENG, N.; SCHULMAN, B.A.; WU, G.; MILLER, J.J.; PAGANO, M.; PAVLETICH, N.P.: "Structural basis of the Cks1-dependent recognition of p27(Kip1) by the SCF(Skp2) ubiquitin ligase", MOL. CELL, vol. 20, 2005, pages 9 - 19
KOBE, B.; KAJAVA, A.V.: "When protein folding is simplified to protein coiling: the continuum of solenoid protein structures", TRENDS IN BIOCHEM. SCI., vol. 25, 2000, pages 509 - 515, XP004224292, DOI: doi:10.1016/S0968-0004(00)01667-4
LEE, J.-H.; KANG, E.; LEE, J.; KIM, J.; LEE, K.H.; HAN, J.; KANG, H.Y.; AHN, S.; OH, Y.; SHIN, D. ET AL.: "Protein grafting of p53TAD onto a leucine zipper scaffold generates a potent HDM dual inhibitor", NAT. COMMUN., vol. 5, 2014, pages 3814
LESHCHINER, E.S.; PARKHITKO, A.; BIRD, G.H.; LUCCARELLI, J.; BELLAIRS, J.A.; ESCUDERO, S.; OPOKU-NSIAH, K.; GODES, M.; PERRIMON, N: "Direct inhibition of oncogenic KRAS by hydrocarbon-stapled SOS1 helices", PROC. NATL. ACAD. SCI. U. S. A., vol. 112, 2015, pages 1761 - 1766
LONGO, L.M.; BLABER, M.: "Symmetric protein architecture in protein design: top-down symmetric deconstruction", METHODS MOL. BIOL., vol. 1216, 2014, pages 161 - 82
LU, J.; QIAN, Y.; ALTIERI, M.; DONG, H.; WANG, J.; RAINA, K.; HINES, J.; WINKLER, J.D.; CREW, A.P.; COLEMAN, K. ET AL.: "Hijacking the E3 Ubiquitin Ligase Cereblon to Efficiently Target BRD4", CHEM. BIOL., vol. 22, 2015, pages 755 - 763
MARGARIT, S.M.; SONDERMANN, H.; HALL, B.E.; NAGAR, B.; HOELZ, A.; PIRRUCCELLO, M.; BAR- SAGI, D.; KURIYAN, J.: "Structural evidence for feedback activation by Ras.GTP of the Ras-specific nucleotide exchange factor SOS", CELL, vol. 112, 2003, pages 685 - 695
MEIER, S.; GUTHE, S.; KIEFHABER, T.; GRZESIEK, S.: "Foldon, the natural trimerization domain of T4 fibritin, dissociates into a monomeric A-state form containing a stable beta-hairpin: atomic details of trimer dissociation and local beta-hairpin stability from residual dipolar couplings", J. MOL. BIOL, vol. 344, 2004, pages 1051 - 1069, XP055270296, DOI: doi:10.1016/j.jmb.2004.09.079
PARMEGGIANI, F.; HUANG, P.-S.; VOROBIEV, S.; XIAO, R.; PARK, K.; CAPRARI, S.; SU, M.; SEETHARAMAN, J.; MAO, L.; JANJUA, H.: "A general computational approach for repeat protein design", J. MOL. BIOL., vol. 427, 2015, pages 563 - 575, XP029189207, DOI: doi:10.1016/j.jmb.2014.11.005
ROWLING, P.J.; SIVERTSSON, E.M.; PEREZ-RIBA, A.; MAIN, E. R.; ITZHAKI, L.S., BIOCHEM. SOC. TRANS., vol. 43, 2015, pages 881 - 888
TAMASKOVIC, R.; SIMON, M.; STEFAN, N.; SCHWILL, M.; PLUCKTHUN, A.: "Designed ankyrin repeat proteins (DARPins): From research to therapy", METHODS ENZYMOL., vol. 503, 2012, pages 101 - 134
THOMPSON, D.B.; CRONICAN, J.J.; LIU, D.R.: "Engineering and identifying supercharged proteins for macromolecule delivery into mammalian cells", METHODS ENZYMOL., vol. 503, 2012, pages 293 - 319, XP055044980, DOI: doi:10.1016/B978-0-12-396962-0.00012-4
Attorney, Agent or Firm:
SUTCLIFFE, Nicholas et al. (Mewburn Ellis LLP, City Tower, 40 Basinghall Street, London, Greater London EC2V 5DE, GB)
Claims:
Claims

1. A modular binding protein comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more heterologous binding domains that bind to a target molecule, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.

2. A modular binding protein according to claim 1 wherein the repeat domains are helix-turn-helix repeat domains.

3. A modular binding protein according to claim 2 wherein the repeat domains are tetratricopeptide (TPR) repeat domains.

4. A modular binding protein according to claim 3 wherein the repeat domains have the amino acid sequence Y-X1X2X3X4;

wherein Y is an amino acid sequence shown in any of Tables 4 to 6 or a variant thereof and X1, X2, X3 and X4 are independently any amino acid.

5. A modular binding protein according to claim 3 or claim 4 wherein the repeat domains have the amino acid sequence;

6. A modular binding protein according to claim 4 or claim 5 wherein X1 is D.

7. A modular binding protein according to any one of claims 4 to 6 wherein X2 is P.

8. A modular binding protein according to any one of the preceding claims comprising 2-5 repeat domains.

9. A modular binding protein according to any one of the preceding claims wherein binding domains are located in one or more inter-repeat loops.

10. A modular binding protein according to claim 9 wherein the binding domains are connected to the inter-repeat loops by one or more additional residues.

11. A modular binding protein according to claim 10 wherein the binding domains are connected to the inter-repeat loops by a linker.

12. A modular binding protein according to any one of claims 9 to 11 wherein binding domains are non-hydrophobic.

13. A modular binding protein according to any one of the preceding claims wherein a binding domain is located at the N terminus.

14. A modular binding protein according to claim 13 wherein the N terminal binding domain comprises an α-helix.

15. A modular binding protein according to claim 13 or 14 wherein the N terminal binding domain comprises the sequence where residues denoted by X are independently any amino acid, X1 and X2 are independently any amino acid and n is 0 or any number.

16. A modular binding protein according to claim 15 wherein X1 is D.

17. A modular binding protein according to claim 15 or claim 16 wherein X2 is P.

18. A modular binding protein according to any one of the preceding claims wherein a binding domain is located at the C terminus.

19. A modular binding protein according to claim 18 wherein the C terminal binding domain comprises an α-helix.

20. A modular binding protein according to claim 19 wherein the C terminal binding domain comprises the sequence where residues denoted by X are independently any amino acid, X1 and X2 are independently any amino acid and n is 0 or any number.

21. A modular binding protein according to claim 20 wherein X1 is D.

22. A modular binding protein according to claim 20 or claim 21 wherein X2 is P.

23. A modular binding protein according to any one of the preceding claims wherein the target molecule is a receptor, enzyme, antigen, polynucleotide, oligosaccharide, integral membrane protein, G protein coupled receptor (GPCR), transcription factor, transcriptional regulator or bromodomain protein.

24. A modular binding protein according to any one of the preceding claims wherein the target molecule is β-catenin, KRAS, tankyrase, c-myc, n-myc, ras, notch, aurora A, α-synuclein, β-amyloid, tau, superoxide dismutase, huntingtin, oncogenic histone deacetylase, or oncogenic histone methyltransferase.

25. A modular binding protein according to any one of the preceding claims wherein one or more binding domains bind to the same target molecule.

26. A modular binding protein according to any one of claims 1 to 25 comprising a binding domain having an amino acid sequence set out in Table 2 or Table 7.

27. A modular binding protein according to any one of claims 1 to 26 comprising an E3 ubiquitin ligase binding domain, optionally an E3 ubiquitin ligase binding domain having an amino acid sequence set out in Table 3.

28. A modular binding protein according to any one of claims 1 to 26 comprising a target-selective autophagy binding domain, optionally a heat shock cognate of 70kDa (Hsc70) binding domain.

29. A modular binding protein according to claim 28 comprising an Hsc70 binding domain having the sequence KFERQ or a variant thereof.

30. A modular binding protein according to any one of claims 1 to 29 comprising a first binding domain that binds a first target molecule and a second binding domain that binds a second target molecule.

31. A modular binding protein according to claim 30 wherein one of the first or second target molecules is an E3 ubiquitin ligase.

32. A modular binding protein according to claim 31 comprising an N terminal binding domain that binds a target protein and a C terminal binding domain that binds an E3 ubiquitin ligase.

33. A modular binding protein according to claim 31 comprising an inter-repeat binding domain that binds a target protein and a C terminal binding domain that binds an E3 ubiquitin ligase.

34. A modular binding protein according to claim 31 comprising an inter-repeat binding domain that binds a target protein and an N terminal binding domain that binds an E3 ubiquitin ligase.

35. A modular binding protein according to claim 31 comprising a C terminal domain that binds a target protein and an N terminal binding domain that binds an E3 ubiquitin ligase.

36. A modular binding protein according to claim 31 comprising an inter-repeat binding domain that binds an E3 ubiquitin ligase and an N terminal binding domain that binds a target protein.

37. A modular binding protein according to claim 31 comprising an inter-repeat binding domain that binds an E3 ubiquitin ligase and a C terminal binding domain that binds a target protein.

38. A modular binding protein according to claim 30 wherein one of the first or second target molecules is a component of a target-selective autophagy pathway, optionally heat shock cognate of 70kDa (Hsc70).

39. A modular binding protein according to claim 38 comprising an N terminal binding domain that binds a target protein and a C terminal binding domain that binds a component of a target-selective autophagy pathway.

40. A modular binding protein according to claim 38 comprising an inter-repeat binding domain that binds a target protein and a C terminal binding domain that binds a component of a target-selective autophagy pathway.

41. A modular binding protein according to claim 38 comprising an inter-repeat binding domain that binds a target protein and an N terminal binding domain that binds a component of a target-selective autophagy pathway.

42. A modular binding protein according to claim 38 comprising a C terminal domain that binds a target protein and an N terminal binding domain that binds a component of a target-selective autophagy pathway.

43. A modular binding protein according to claim 38 comprising an inter-repeat binding domain that binds a component of a target-selective autophagy pathway and an N terminal binding domain that binds a target protein.

44. A modular binding protein according to claim 38 comprising an inter-repeat binding domain that binds a component of a target-selective autophagy pathway and a C terminal binding domain that binds a target protein.

45. A modular binding protein according to any one of the preceding claims further comprising a targeting domain, intracellular transfer domain, stabilising domain, oligomerisation domain, cytotoxic agent, therapeutic agent and/or a detectable label.

46. A modular binding protein according to any one of the preceding claims comprising an amino acid sequence set out in Table 8.

47. A nucleic acid encoding a modular binding protein according to any one of claims 1 to 46.

48. An expression vector comprising a nucleic acid according to claim 47.

49. A host cell comprising an expression vector according to claim 48.

50. A method of producing a modular binding protein comprising expressing a nucleic acid according to claim 47 to produce the modular binding protein.

51. A method of producing a modular binding protein comprising;

inserting a first nucleic acid encoding a binding domain into a second nucleic acid encoding two or more repeat domains linked by inter-repeat loops to produce a chimeric nucleic acid encoding a modular binding protein according to any one of claims 1 to 46; and expressing said chimeric nucleic acid to produce the modular binding protein.

52. A method of producing a modular binding protein that binds to a first target molecule and a second target molecule comprising;

providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops; and

incorporating into said nucleic acid a first nucleotide sequence encoding a first binding domain that binds to a first target molecule and a second nucleotide sequence encoding a second binding domain that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second binding domains, wherein said binding domains are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and

expressing the nucleic acid to produce said protein.

53. A method according to claim 52 wherein one of the first or second target molecules is an E3 ubiquitin ligase.

54. A library comprising modular binding proteins, each modular binding protein in the library comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein at least one amino acid residue in the binding domains in said library is diverse.

55. A library according to claim 54 wherein each modular binding protein in the library comprises one binding domain.

56. A library comprising a first and a second sub-library of modular binding proteins, each modular binding protein in the first and second sub-libraries comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) a binding domain comprising at least one diverse amino acid residue,

wherein the binding domain in the modular binding proteins in the first sub-library binds to a first target molecule and is located in one of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein, and

the binding domain in the modular binding proteins in the second sub-library binds to a second target molecule and is located in another of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein.

57. A library according to any one of claims 54-56 wherein the modular binding proteins are according to any one of claims 1 to 46.

58. A library according to any one of claims 54-57 wherein the library is displayed on the surface of particles.

59. A method of screening a library comprising;

(a) providing a library of modular binding proteins, each modular binding protein in the library comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) a binding domain located in the inter-repeat loop, at the N terminus or at the C terminus of the protein,

wherein at least one amino acid residue in the binding domains in said library is diverse,

(b) screening the library for modular binding proteins which display a binding activity, and

(c) identifying one or more modular binding proteins in the library which display the binding activity.

60. A population of nucleic acids encoding a library according to any one of claims 54-58.

61. A method of producing a library comprising expressing a population of nucleic acids according to claim 60.

62. A method of producing a library of modular binding proteins comprising;

(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein the binding domains in said population are diverse, and

(b) expressing said population of nucleic acids to produce the diverse population, thereby producing a library of modular binding proteins.

63. A pharmaceutical composition comprising a modular binding protein according to any one of claims 1 to 46, a nucleic acid according to claim 47 or a vector according to claim 48 and a pharmaceutically acceptable excipient.

64. A method of producing a pharmaceutical composition comprising formulating a modular binding protein according to any one of claims 1 to 46, a nucleic acid according to claim 47 or a vector according to claim 48 with a pharmaceutically acceptable excipient.

65. A population of liposomes comprising a modular binding protein according to any one of claims 1 to 46, a nucleic acid according to claim 47 or a vector according to claim 48.

66. A method of producing a population of liposomes comprising admixing a modular binding protein according to any one of claims 1 to 46, a nucleic acid according to claim 47 or a vector according to claim 48 with a lipid solution and evaporating said solution to produce liposomes encapsulating said modular binding protein, nucleic acid or vector.

67. A modular binding protein according to any one of claims 1 to 46, a nucleic acid according to claim 47 or a vector according to claim 48 for use in a method of diagnosis or treatment in a human or animal subject.

68. A modular binding protein according to any one of claims 1 to 46 that binds to a target molecule, a nucleic acid according to claim 47 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 48 for use in the treatment of a disorder associated with the target molecule.

69. Use of a modular binding protein according to any one of claims 1 to 46 that binds to a target molecule, a nucleic acid according to claim 47 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 48 in the manufacture of a medicament for use in the treatment of a disorder associated with the target molecule.

70. A method of treatment of a disorder associated with the target molecule comprising; administering a modular binding protein according to any one of claims 1 to 46 that binds to the target molecule, a nucleic acid according to claim 47 that encodes a modular binding protein that binds to a target molecule or a vector according to claim 48 comprising said nucleic acid to an individual in need thereof.

71. A modular binding protein, nucleic acid or vector for use according to claim 68, use according to claim 69 or method according to claim 70 wherein the disorder is an inflammatory disorder, a neurodegenerative disease, an angiogenic disorder, a bone loss disorder or cancer.

72. A modular binding protein, nucleic acid or vector for use, use or method according to claim 71 wherein the cancer is breast, ovarian, colorectal, gastrointestinal, pancreatic, prostate, thyroid, lung, hepatocellular carcinoma, oesophageal, multiple myeloma, leukemia, T-cell lymphoma, neuroblastoma, glioblastoma multiforme, pleural mesothelioma, chronic myelogenous leukemia (CML), acute lymphoblastic leukemia (ALL) or acute myelogenous leukemia (AML).

73. A modular binding protein, nucleic acid or vector for use, use or method according to claim 71 wherein the neurodegenerative disease is Alzheimer's disease, Huntington's disease, Parkinson's disease, multiple sclerosis or amyotrophic lateral sclerosis.

74. A modular binding protein, nucleic acid or vector for use, use or method according to claim 71 wherein the inflammatory disorder is an autoimmune disease.

75. A modular binding protein, nucleic acid or vector for use, use or method according to any one of claims 68 to 74 wherein the modular binding protein, nucleic acid or vector is encapsulated in a liposome.

Description:
Modular Binding Proteins

Field

This invention relates to modular binding proteins and their production and uses.

Background

A priority area in medicine, particularly cancer research, is the expansion of the 'druggable' proteome, which is currently limited to narrow classes of molecular targets. For example, protein-protein interactions (PPIs) are fundamental to all biological processes and represent a large proportion of potential drug targets, but they are not readily amenable to conventional small molecule inhibition. The architecture of tandem repeat proteins has tremendous scope for rational design (Kobe & Kajava, 2000; Longo & Blaber, 2014; Rowling et al., 2015). The key features of tandem repeat proteins are relatively small size, modularity and extremely high stability (and therefore ease of recombinant production) without the need for disulphide bonds. Individual consensus-designed repeats are self-compatible and can be put together in any order; function is therefore also modular, which means that multiple functions can be independently designed and incorporated in a combinatorial fashion within a single molecule (WO2017106728).

Novel repeat protein functions, e.g. DARPins (Tamaskovic et al., 2012), have been developed based on the natural type of PPI interface of these proteins, i.e. spanning many repeat units to create an extended, high-affinity binding interface for the target. Mutations have been introduced into the surface residues in the tetratricopeptide (TPR) repeats of the cytosolic receptor peroxin 5 (Sampathkumar et al. (2008) J. Mol. Biol. 381, 867-880). Binding of peptide ligands to peroxin 5 is shown to be mediated by residues located in several different TPR repeats. The interactions of the TPR-containing protein kinesin-1 with different cargo proteins have also been reported (Zhu et al. (2012) PLoS One 7(3) e33943). The specificity and stability of ankyrin repeat proteins have been modified through the introduction of mutations into ankyrin repeat sequences (Li et al. (2006) Biochemistry 45, 15168-15178).

Summary

The present inventors have found that modular proteins capable of binding to one or more target molecules can be generated by displaying peptidyl binding motifs, such as short linear motifs (SLiMs), on modular scaffolds. These modular binding proteins may be useful, for example, as single- or multi-function protein therapeutics.

An aspect of the invention provides a modular binding protein comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.

In some preferred embodiments, the modular binding protein may comprise a first binding domain that binds a first target molecule and a second binding domain that binds a second target molecule. One of the first or second target molecules may be an E3 ubiquitin ligase.
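The arrangement described in this aspect can be pictured as a simple sequence-assembly exercise. The Python sketch below is illustrative only and is not taken from the application: the repeat and loop strings are hypothetical placeholders rather than sequences from the Tables, the function name `assemble` is an assumption, and KFERQ (the Hsc70 binding motif named in the claims) is used merely as an example of a short peptidyl binding motif. For simplicity the sketch treats the loop as a separate string and grafts the motif at its midpoint.

```python
from typing import Dict, Optional, Sequence

def assemble(repeats: Sequence[str],
             loop: str,
             loop_inserts: Optional[Dict[int, str]] = None,
             n_term: str = "",
             c_term: str = "") -> str:
    """Join repeat domains with inter-repeat loops, optionally grafting
    binding domains into chosen loops and/or onto either terminus.

    loop_inserts maps a loop index (0 = loop between repeats 0 and 1)
    to the binding-domain sequence placed inside that loop.
    """
    if len(repeats) < 2:
        raise ValueError("two or more repeat domains are required")
    loop_inserts = loop_inserts or {}
    if not (loop_inserts or n_term or c_term):
        raise ValueError("at least one binding domain is required")
    n_loops = len(repeats) - 1
    for idx in loop_inserts:
        if not 0 <= idx < n_loops:
            raise ValueError(f"loop index {idx} out of range")

    half = len(loop) // 2  # graft point: middle of the loop (an assumption)
    parts = [n_term, repeats[0]]
    for i in range(n_loops):
        graft = loop_inserts.get(i, "")
        parts.append(loop[:half] + graft + loop[half:])
        parts.append(repeats[i + 1])
    parts.append(c_term)
    return "".join(parts)

# Hypothetical placeholder sequences, NOT taken from the Tables.
REPEAT = "AAAAAAAAAA"
LOOP = "GGGG"

# Three repeats, with the example motif grafted into the first loop.
protein = assemble([REPEAT, REPEAT, REPEAT], LOOP, loop_inserts={0: "KFERQ"})
print(protein)
```

Passing `n_term` or `c_term` instead of (or as well as) `loop_inserts` gives the terminal placements, so a two-target protein is simply a call that supplies binding domains at two different positions.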

Another aspect of the invention provides a method of producing a modular binding protein comprising;

inserting a first nucleic acid encoding a binding domain into a second nucleic acid encoding two or more repeat domains linked by inter-repeat loops, to produce a chimeric nucleic acid encoding a modular binding protein as described herein; and

expressing said chimeric nucleic acid to produce the modular binding protein.

Another aspect of the invention provides a method of producing a modular binding protein that binds to a first target molecule and a second target molecule comprising;

providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, and incorporating into said nucleic acid a first nucleotide sequence encoding a first binding domain that binds to a first target molecule and a second nucleotide sequence encoding a second binding domain that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second binding domains, wherein said binding domains are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and

expressing the nucleic acid to produce said protein.

In some preferred embodiments, one of the first or second target molecules is an E3 ubiquitin ligase.

Another aspect of the invention provides a library comprising modular binding proteins, each modular binding protein in the library comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein at least one amino acid residue in the binding domains in said library is diverse.

Another aspect of the invention provides a library comprising a first and a second sub-library of modular binding proteins, each modular binding protein in the first and second sub-libraries comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) a binding domain comprising at least one diverse amino acid residue, wherein the binding domain in the modular binding proteins in the first sub-library binds to a first target molecule and is located in one of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein, and

the binding domain in the modular binding proteins in the second sub-library binds to a second target molecule and is located in another of (i) an inter-repeat loop; (ii) the N terminus or (iii) the C terminus of the modular binding protein.

Another aspect of the invention provides a method of producing a library of modular binding proteins comprising;

(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein the binding domains in said population are diverse, and

(b) expressing said population of nucleic acids to produce the diverse population, thereby producing a library of modular binding proteins.

Another aspect of the invention provides a method of screening a library comprising;

(a) providing a library of modular binding proteins, each modular binding protein in the library comprising;

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) a binding domain located in the inter-repeat loop, at the N terminus or at the C terminus of the protein,

wherein at least one amino acid residue in the binding domains in said library is diverse,

(b) screening the library for modular binding proteins which display a binding activity, and

(c) identifying one or more modular binding proteins in the library which display the binding activity.

Other aspects and embodiments of the invention are described in more detail below.

Brief Description of Figures

Figure 1 shows the thermostability of consensus-designed tetratricopeptide (CTPR) proteins containing loop- or helix-grafted binding motifs: thermal denaturation, monitored by circular dichroism, of 2-repeat RTPR (a CTPR in which lysine residues have been replaced with arginine residues) proteins: RTPR2 (diamonds), RTPR2 containing a loop binding-module (circles) and RTPR2 containing a helix binding-module (squares). All samples are at 20 μM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.

Figure 2 shows the thermostability of CTPR proteins of increasing length containing an increasing number of binding modules (alternating with blank modules): thermal denaturation curves, monitored by circular dichroism, of TPR proteins containing 1, 2, 3 and 4 loops comprising a tankyrase-binding sequence: 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6, 4TBP-CTPR8. All samples are at 20 μM in 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl.

Figure 3 shows an example of helix grafting. Figure 3A (i) shows the crystal structures of SOS1 (son-of-sevenless homologue 1) bound to KRAS (Kirsten rat sarcoma) (PDB 1NVU, Margarit et al. Cell (2003) 112(5):685-95), and (ii) shows the SOS1 helix grafted onto a helix at the N-terminus of a CTPR2 protein. The modelled structure of SOS-RTPR2 is shown, and the sequence of the helix is given with the key KRAS-binding residues in grey and the residues that form the interface with the CTPR helices in black; (iii) shows the modelled structure of SOS-TPR2 in complex with KRAS. Figure 3B shows binding of SOS-TPR2 to KRAS measured by competitive fluorescence polarization (FP). The complex between mant-GTP and KRAS was pre-formed, and 0.1-300 μM SOS-RTPR2 was then titrated into the complex, displacing the mant-GTP from KRAS and resulting in a decrease in FP. EC50 is 3 μM.

Figure 4 shows another example of helix grafting. Figure 4A shows the modelled structure of the Mdm2 (Mouse double minute 2 homolog) N-terminal domain in complex with p53-TPR2, which comprises the Mdm2-binding helix of p53 grafted onto a helix at the C-terminus of a CTPR2 protein. Figure 4B shows an ITC analysis of the interaction between p53-TPR2 and the Mdm2 N-terminal domain. The N-terminal domain of Mdm2 was titrated into the cell containing 10 μM p53-TPR2.

Figure 5 shows an example of single and multivalent loop-grafted CTPRs. Figure 5A shows an ITC analysis of the interaction between a series of tankyrase-binding loop-grafted CTPR2 proteins (TBP-CTPR2) and the substrate-binding ARC4 (ankyrin-repeat cluster) domain of tankyrase. There is an enhancement of both binding affinity and dissociation constant with increasing number of binding modules. Figure 5B shows native gel analysis (using a native gel in Tris-Glycine buffer pH 8.0, 40 μM protein concentration) of multivalent TBP-CTPR proteins expressed as fusion constructs with the foldon trimerisation domain (Boudko et al 2002; Meier et al. 2004). 1TBP-CTPR2, 2TBP-CTPR4 and 4TBP-CTPR8 (all lacking the foldon domain) were purified and run as monomeric controls. Constructs having the foldon domain run at much higher molecular weights than their monomeric counterparts.

Figure 6 shows an example of loop-grafted CTPRs comprising the 10-residue Skp2-binding sequence derived from p27 grafted into a loop of a CTPR protein (CTPR-p27).

Figure 6A shows that HA-CTPR2-p27 is able to co-IP FLAG-Skp2 from HEK293T cells.

Figure 6B shows that E. coli-expressed and purified TPR5-p27 inhibits p27 ubiquitination in vitro.

Figure 7 shows another example of loop-grafted CTPRs. Figure 7A shows (left) ITC analysis of the interaction between the Keap1 (Kelch-like ECH-associated protein 1) KELCH domain and a CTPR2 protein containing a loop-grafted Keap1-binding sequence derived from the protein Nrf2 (Nuclear factor (erythroid-derived 2)-like 2) (Nrf-CTPR2). No binding is observed for the blank CTPR2 protein (right). Figure 7B shows that three variants of Nrf-CTPR2 (Nrf-CTPR2 (i), Nrf-CTPR2 (ii), Nrf-CTPR2 (iii)) can co-IP Keap1 from HEK293T cells.

Figure 8 shows live-cell imaging of intracellular delivery of an RTPR achieved by resurfacing (by introducing arginine residues at surface sites). PC3 (left) and U2OS (right) cells incubated with 10 μM FITC-labelled resurfaced TBP-RTPR2 for 3 hours at 37°C, 5% CO2. Overlay of DIC (differential interference contrast) and confocal image. Intracellular fluorescence was also observed at lower concentrations of protein.

Figure 9 shows the induced degradation of the target protein beta-catenin by designed hetero-bifunctional RTPRs. Figure 9A shows the beta-catenin levels in cells transfected with either HA-tagged beta-catenin plasmid alone or HA-tagged beta-catenin plasmid together with one of two different hetero-bifunctional RTPR plasmids (LRH1-TPR-p27 and axin-TPR-p27, designed to bind simultaneously to beta-catenin and to E3 ligase SCF^Skp2). Figure 9B shows a quantitative analysis of the beta-catenin levels in the presence of different hetero-bifunctional RTPRs designed to bind simultaneously to beta-catenin and to either E3 ligase or Mdm2. The analysis was performed using densitometry of the bands detected by Western blots corresponding to HA-tagged beta-catenin normalised to actin bands using ImageJ. Negative controls used were single-function TPRs or blank (nonfunctional) TPRs.

Figure 10 shows examples of different modular binding protein formats. A modular binding protein may comprise: two repeat domains with a helical target-binding peptide and a helical E3-binding peptide at the N and C termini (Figure 10A); three repeat domains with a helical E3-binding peptide at the C terminus and a target binding domain in the first inter-repeat loop from the N terminus (Figure 10B); three repeat domains with a helical target-binding peptide at the N terminus and an E3 binding domain in the second inter-repeat loop from the N terminus (Figure 10C); or four repeat domains with a target-binding domain and an E3 binding domain in the first and third inter-repeat loops from the N terminus (Figure 10D).

Figure 11 shows a schematic of a modular binding protein with four binding domains located in alternate inter-repeat loops. The binding sites are arrayed at 90° to each other.

Figure 12 shows a schematic of a modular binding protein engineered so that binding domains in alternate inter-repeat loops bind adjacent epitopes on the target.

Figure 13 shows the modelled structure of a hetero-bifunctional modular binding protein comprising TPR repeat domains, an LRH1-derived binding domain designed to bind target beta-catenin, and a p53-derived N-terminal binding domain designed to bind to the E3 ubiquitin ligase Mdm2.

Figure 14 shows a schematic of the combinatorial assembly of a module comprising a repeat domain and a terminal helical binding domain and a module comprising repeat domains and an inter-repeat loop binding domain to generate a modular binding protein.

Figure 15 shows examples of different modular binding protein formats: (i) shows the blank proteins; (ii) shows binding peptides inserted into one or more inter-repeat loops; (iii) shows helical binding peptides at one or both of the termini; (iv) is a combination of loop and helical binding peptides; (v) and (vi) show examples of how multivalency can be achieved.

Figure 16 shows a schematic of the assembly of a modular binding protein by the progressive screening of modular binding proteins comprising modules with a diverse binding domain in addition to modules already identified in previous rounds of screening.

Figure 17 shows the effect of designed multi-valent tankyrase-binding TPR proteins on Wnt signalling. HEK293T cells were transfected with TPR-encoding plasmids using Lipofectamine2000. The TPR proteins contained 1-4 copies of a tankyrase-binding peptide (TBP) grafted onto the inter-repeat loop(s). For example, 2TBP-CTPR4 is a protein comprising 4 TPR modules with one TBP grafted onto the loop between the first and second TPR and one between the third and fourth TPR. 'Foldon' indicates a trimeric TPR-foldon fusion protein.

Figure 18 shows characterisation of the size and charge of liposome-encapsulated TPR proteins.

Figure 19 shows the delivery of TPR proteins into cells by liposome encapsulation. FITC dye-labelled liposomes stain the cell membrane upon membrane fusion (red panel), and RITC-labelled TPR protein cargo is then delivered into the cytoplasm. The green panel and red-green merge show that the proteins have entered the cells and are spread diffusely in the cytoplasm.

Figure 20 shows that liposome-encapsulated TPR proteins are not toxic to HEK293T cells at the concentrations used.

Figure 21 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on Wnt signalling. The TPR proteins contained a tankyrase-binding peptide and a SCF^Skp2-binding peptide to direct tankyrase for ubiquitination and subsequent degradation. Cells were treated with liposomes for 2 hr.

Figure 22 shows the effect of designed hetero-bifunctional TPR proteins (delivered by liposome encapsulation) on Wnt signalling. The TPR proteins contained a beta-catenin-binding peptide and a SCF^Skp2-binding peptide to direct beta-catenin for ubiquitination and subsequent degradation. Cells were treated with liposomes encapsulating 32 μg protein for variable times (2-8 h) indicated in the figure.

Figure 23 shows the effect of designed hetero-bifunctional TPR proteins on KRAS levels in HEK293T cells. The TPR proteins contained a binding sequence for KRAS (a non-helical peptide sequence, referred to as KBL, grafted onto an inter-repeat loop of the RTPR) and a degron derived from p27 grafted onto another inter-repeat loop. Cells were transiently transfected with 50 ng or 500 ng of TPR-encoding plasmids, as indicated, and with KRAS plasmid or empty vector as control. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. In dark grey are cells transfected with single-function TPR plasmid (containing degron only).

Figure 24 shows the effect of hetero-bifunctional TPR proteins targeting endogenous KRAS to the CMA (chaperone-mediated autophagy) pathway. The TPR proteins contained a binding sequence for KRAS (either a grafted helix derived from son-of-sevenless-homolog 1 (SOS) or a non-helical peptide sequence (referred to as 'KBL') displayed in a loop of the RTPR) and targeted for degradation using two different chaperone-mediated autophagy peptides (referred to as 'CMA_Q' or 'CMA_K') at the N- or C-terminus of the construct.

Constructs or empty vector (light grey) were transiently transfected into either HEK293T or DLD1 (colorectal cancer cell line). 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. Those constructs that resulted in significant reduction in KRAS compared to the empty vector control are shown in white.

Figure 25 shows examples of variations in the linker sequence connecting a binding domain to an inter-repeat loop in order to optimise the binding affinity for the target. The example shown is Nrf-TPR, a TPR protein designed to bind to Keap1 (see Fig. 7 of the original patent application). Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled 'Flexible') when compared with the consensus-like linker sequence. Altering the charge content of the linker sequence (labelled 'Charged') and altering the conformational properties (based on the predictions of the program CIDER (Holehouse et al. Biophys. J. 112, 16-21 (2017))) of the loop by changing the amino acid composition of the linker sequence (labelled 'CIDER-optimised') also affected the Keap1-binding affinity.

Detailed Description

This invention relates to modular binding proteins that comprise multiple repeat domains. These repeat domains are linked to each other in the polypeptide chain by inter-repeat loops. One or more binding domains are located in one or more of the inter-repeat loops and/or in N or C terminal helices of the modular binding protein. The binding domains may bind to the same or different target molecules and the modular binding protein may be multi-functional and/or multi-valent. The geometrical display of the grafted binding sites may be precisely and predictably tuned by adjusting the positions of the binding sites and the number and shape of the repeat domains. Modular binding proteins as described herein may be useful in a range of therapeutic and diagnostic applications.

A repeat domain is a repetitive structural element of 30 to 100 amino acids that forms a defined secondary structure. Multiple repeat domains stack sequentially in a modular fashion to form a stable protein, which may for example have a solenoid or toroid structure. Repeat domains may be synthetic or may be naturally-occurring repeats from tandem repeat proteins, or variants thereof.

A repeat domain may have the structure of a solenoid repeat. The structures of solenoid repeats are well known in the art (see for example Kobe & Kajava Trends in Biochemical Sciences 2000; 25(10):509-15). For example, a repeat domain may have an α/α or α/3₁₀ (helix-turn-helix or hth) structure, for example a tetratricopeptide repeat structure; an α/α/α (helix-turn-helix-turn-helix or hthth) structure, for example an armadillo repeat structure; a β/β/α/α structure; an α/β or 3₁₀/β structure, for example a leucine rich repeat (LRR) structure; a β/β/β structure, for example, an IGF1RL, HPR or PelC repeat structure; or a β/β structure, for example a serralysin or EGF repeat structure.

Suitable repeat domains may include domains of the Ankyrin clan (Pfam: CL0465), such as ankyrin (PF00023), which may comprise a 30-34 amino-acid repeat composed of two beta strands and two alpha helices; domains of the leucine-rich repeat (LRR) clan (Pfam: CL0022), such as LRR1 (PF00560), which may comprise a 20-30 amino acid repeat composed of an α/β horseshoe fold; domains of the Pec Lyase-like (CL0268) clan, such as pec lyase C (PF00544), which may comprise a right handed beta helix; domains of the beta-Roll (CL0592) clan such as Haemolysin-type calcium-binding repeat (PF00353), which may comprise short repeat units (e.g. 9-mers) that form a beta-roll made up of a super-helix of beta-strand-turns of two short strands each, stabilised by Ca2+ ions; domains of the PSI clan (CL0630), such as trefoil (PF00088); and domains of the tetratricopeptide clan (CL0020), such as TPR-1 (PF00515), which may comprise a 24 to 90 amino acid repeat composed of a helix-turn-helix.

Suitable repeat domains may be identified using the PFAM database (see for example Finn et al Nucleic Acids Research (2016) Database Issue 44:D279-D285).

In some preferred embodiments, the repeat domain may have the structure of an α/α-solenoid repeat domain, such as a helix-turn-helix. A helix-turn-helix domain comprises two antiparallel α-helices of 12-45 amino acids. Suitable helix-turn-helix domains include tetratricopeptide-like repeat domains. Tetratricopeptide-like repeats may include domains of the TPR clan (CL0020), for example Arm domains (see for example Armadillo; PF00514; Huber et al Cell 1997;90: 871-882), HEAT domains (Huntingtin, EF3, PP2A, TOR1; PF02985; see for example Groves et al. Cell. 96 (1): 99-110), PPR domains (pentatricopeptide repeat PF01535; see for example Small (2000) Trends Biochem. Sci. 25 (2): 46-7), TALE domains (TAL (transcription activator-like) effector; PF03377; see for example Zhang et al Nature Biotechnology. 29 (2): 149-53) and TPR1 domains (tetratricopeptide repeat-1; PF00515; see for example Blatch et al BioEssays. 21 (11): 932-9).

Other suitable helix-turn-helix domains may be synthetic, for example DHR1 to DHR83 as disclosed in Brunette et al., Nature 2015 528 580-584.

In some preferred embodiments, the helix-turn-helix scaffold may be a tetratricopeptide repeat domain (TPR) (D'Andrea & Regan, 2003) or a variant thereof. TPR repeat domains may include naturally occurring or synthetic TPR domains. Suitable TPR repeat domains are well known in the art (see for example Parmeggiani et al., J. Mol. Biol. 427 563-575) and may have the amino acid sequence:

wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively, or may be a variant of this sequence. Other TPR repeat domain sequences are shown in Tables 4-6 below.

Preferred TPR domains may include CTPR, RTPRa, RTPRb and KTPRb domains, for example a domain having a sequence shown in Table 4 or Table 6 or a variant of a sequence shown in Table 4 or Table 6.

In some embodiments, a TPR repeat domain may be a human TPR repeat domain, preferably a TPR repeat domain from a human protein in blood. TPR repeat domains from human blood may have reduced immunogenicity in vivo. Suitable human blood TPR repeat domains may include repeat domains from IFIT1 , IFIT2 or IFIT3. Other examples of human blood repeat domains identified in the plasma proteome database are shown in Table 5.

Suitable human blood repeat domains may be identified from the plasma proteome database (Nanjappa et al Nucl Acids Res 2014 Jan;42(Database issue):D959-65), for example by searching for sequences with high sequence identity to the TPR repeat domain using standard sequence analysis tools (e.g. Altschul et al Nucleic Acids Res. 25:3389-3402; Altschul et al FEBS J. 272:5101-5109).

A variant of a reference repeat domain or binding site sequence set out herein may comprise an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% sequence identity to the reference sequence.
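As an illustration of how the identity thresholds above might be applied, the following minimal sketch computes percent identity between two sequences that are assumed to be already aligned (equal length, gaps written as '-'). The function names and the choice to divide by the number of alignment columns are assumptions made for this example; alignment programs such as GAP may normalise identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap character.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def is_variant(aligned_ref: str, aligned_candidate: str,
               threshold: float = 90.0) -> bool:
    """True if the candidate meets the chosen identity threshold (e.g. 50 to 98%)."""
    return percent_identity(aligned_ref, aligned_candidate) >= threshold


# Hypothetical 10-residue fragments differing at one position.
print(percent_identity("AEAWYNLGNA", "AEAWYNLGNS"))  # 90.0
```

The threshold argument can be set to any of the levels listed above, depending on how closely a variant is required to match the reference sequence.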

Particular amino acid sequence variants may differ from a repeat domain shown above by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more than 10 amino acids. Preferred variants of a TPR repeat domain may comprise one or more conserved residues, for example, 1, 2, 3, 4, 5, 6 or more preferably all of Leu at position 7, Gly or Ala at position 8, Tyr at position 11, Ala at position 20, Ala at position 27, Leu or Ile at positions 28 and 30 and Pro at position 32.
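The conserved-residue criterion above can be sketched as a simple positional check. The example below is illustrative only: the function name is hypothetical, positions are counted from 1 at the start of the repeat, and the test sequence is the TPR sequence given elsewhere in this description for grafting hydrophobic binding domains, with X1 = D and X2 = P as preferred and with X3 and X4 arbitrarily set to R and S for the purposes of the example.

```python
from typing import List

# Conserved TPR positions (1-based) and the residues allowed at each,
# taken from the list of preferred conserved residues in the text.
CONSERVED = {7: "L", 8: "GA", 11: "Y", 20: "A", 27: "A", 28: "LI", 30: "LI", 32: "P"}


def conserved_positions_present(repeat: str) -> List[int]:
    """Return the conserved positions at which the repeat carries an allowed residue."""
    repeat = repeat.upper()
    return [pos for pos, allowed in CONSERVED.items()
            if len(repeat) >= pos and repeat[pos - 1] in allowed]


# X3 and X4 ("RS") are placeholders chosen for this example only.
example_repeat = "AEAWYNLGNAYYRQGDYQRAIEYYQRALEL" + "DPRS"
print(conserved_positions_present(example_repeat))
# [7, 8, 11, 20, 27, 28, 30, 32]
```

A candidate variant could then be ranked by how many of the eight positions it retains.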

Sequence similarity and identity are commonly defined with reference to the algorithm GAP (Wisconsin Package, Accelrys, San Diego USA). GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol Biol. 147: 195-197), or the TBLASTN program, of Altschul et al. (1990) supra, generally employing default parameters. In particular, the psi-Blast algorithm (Nucl. Acids Res. (1997) 25 3389-3402) may be used. Sequence comparison may be made over the full-length of the relevant sequence described herein.
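To make the role of the gap creation and gap extension penalties concrete, the following is a minimal sketch of a Needleman and Wunsch style global alignment score with affine gap costs, computed with the three-matrix recurrence usually attributed to Gotoh. It is not the GAP program itself. The match and mismatch scores stand in for a substitution matrix and are assumptions, as is the convention that a gap of length k costs the creation penalty plus k times the extension penalty.

```python
def affine_global_score(a: str, b: str, match: int = 10, mismatch: int = -5,
                        gap_open: int = 12, gap_extend: int = 4):
    """Best global alignment score of a and b under affine gap penalties."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: alignment ends in a residue pair; X: ends in a gap in b; Y: ends in a gap in a.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)   # leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)   # leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a new gap costs creation + extension; extending costs extension only.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


# Three matches (30) minus one gap of length 1 (12 + 4) gives 14.
print(affine_global_score("ACGT", "AGT"))  # 14
```

With these penalties a single long gap is cheaper than several short ones of the same total length, which is the behaviour the separate creation and extension parameters are intended to produce.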

For example, a repeat domain may comprise one or more point mutations to facilitate grafting of hydrophobic binding domains. For example, aromatic residues in the repeat domain may be replaced with polar or charged residues. Suitable substitutions may be identified in a rational manner, for example using Hidden Markov plots of repeat domain sequences to identify non-aromatic residues that are found in nature in consensus aromatic positions. A suitable TPR repeat domain for grafting a hydrophobic binding domain may have the amino acid sequence:

AEAWYNLGNAYYRQGDYQRAIEYYQRALEL-X1X2X3X4,

wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.

In some embodiments, lysine residues in the repeat domain may be replaced by arginine residues to prevent ubiquitination and subsequent degradation. This may be particularly useful when the modular binding protein comprises an E3 ubiquitin ligase-binding domain, for example in a proteolysis targeting chimera (PROTAC). For example, a suitable TPR repeat domain may have the amino acid sequence:

wherein X1-4 are independently any amino acid, preferably X1 and X2 being D and P respectively.
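The lysine-to-arginine replacement described above amounts to a simple sequence-level substitution, sketched below. The function names and the input fragment are hypothetical and are used only to show the operation; they are not sequences taken from the tables of this application.

```python
def count_lysines(sequence: str) -> int:
    """Number of lysine (K) residues in a one-letter amino acid sequence."""
    return sequence.upper().count("K")


def lysine_to_arginine(sequence: str) -> str:
    """Replace every lysine (K) with arginine (R), leaving other residues unchanged."""
    return sequence.upper().replace("K", "R")


# Hypothetical lysine-containing fragment used only for illustration.
fragment = "AYYKQGDYDEAIEYYQKAL"
resurfaced = lysine_to_arginine(fragment)
print(count_lysines(fragment), count_lysines(resurfaced))  # 2 0
```

Because arginine preserves the positive charge of lysine, the substitution is conservative with respect to charge while removing the side chains to which ubiquitin is attached.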

In preferred embodiments, the modular binding protein may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more than 10 repeat domains. Preferably, the modular binding protein comprises 2 to 5 repeat domains. Modular binding proteins with fewer repeat domains may display increased cell penetration. For example, a modular binding protein with 2-3 repeat domains may be useful in binding intracellular target molecules. Modular binding proteins with more repeat domains may display increased stability and functionality. For example, a modular binding protein with 4 or more repeat domains may be useful in binding extracellular target molecules. A modular binding protein with 6 or more repeat domains may be useful in producing long linear molecules for targeting or assembling extracellular complexes in bi- or multivalent formats.

In other embodiments, sufficient stability and functionality may be conferred by a single repeat domain with N and C terminal binding domains. For example, a modular binding protein may comprise:

(i) a repeat domain, and

(ii) binding domains at the N and C terminal of the repeat domain.

The repeat domains of a modular binding protein may lack binding activity i.e. the binding activity of the modular binding protein is mediated by the binding domains and not by residues within the repeat domains.

A binding domain is a contiguous amino acid sequence that specifically binds to a target molecule. Suitable binding domains that can be grafted onto a terminal helix or inter-repeat loop are well-known in the art and include peptide sequences selected from a library, antigen epitopes, natural protein-protein interactions (helical, extended or turn-like) and short linear motifs (SLiMs). Viral SLiMs (that hijack the host machinery) may be particularly useful because they may display high binding affinities (Davey et al (2011) Trends Biochem. Sci. 36, 159-169).

A suitable binding domain for a target molecule may be selected from a library, for example using phage or ribosome display, or identified or designed using rational approaches or computational design, for example using the crystal structure of a complex or an interaction. In some embodiments, binding domains may be identified in an amino acid sequence using standard sequence analysis tools (e.g. Davey et al Nucleic Acids Res. 2011 Jul 1; 39(Web Server issue): W56-W60). Binding domains may be 5 to 25 amino acids in length, preferably 8 to 15 amino acids, although in some embodiments, longer binding domains may be employed.

The binding domains and the repeat domains of the modular binding protein are heterologous, i.e. the binding domain is not associated with the repeat domain in naturally occurring proteins and the binding and repeat domains are artificially associated in the modular binding protein by recombinant means.

A modular binding protein described herein may comprise 1 to n+1 binding domains, where n is the number of repeat domains in the modular binding protein. The number of binding domains is determined by the required functionality and valency of the modular binding protein. For example, one binding domain may be suitable for a mono-functional modular binding protein and two or more binding domains may be suitable for a bi-functional or multifunctional modular binding protein. Modular binding proteins may be monovalent. A target molecule may be bound by a single binding domain in a monovalent modular binding protein. Modular binding proteins may be multivalent. A target molecule may be bound by two or more of the same or different binding domains in a multivalent modular binding protein. Modular binding proteins may be monospecific. The binding domains in a monospecific modular binding protein may all bind to the same target molecule, more preferably the same site or epitope of the target molecule.
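The upper limit of n+1 binding domains is consistent with counting one possible grafting site per inter-repeat loop (n-1 loops between n repeat domains) plus the N and C termini. The sketch below enumerates the sites under that reading; the function name and the site labels are assumptions made for this example, and it assumes at most one binding domain per site.

```python
from typing import List


def grafting_sites(n_repeats: int) -> List[str]:
    """List the candidate binding-domain sites for a protein with n_repeats repeat domains."""
    if n_repeats < 1:
        raise ValueError("at least one repeat domain is required")
    sites = ["N-terminus"]
    # One inter-repeat loop between each pair of consecutive repeat domains.
    sites += [f"loop {i} (between repeats {i} and {i + 1})"
              for i in range(1, n_repeats)]
    sites.append("C-terminus")
    return sites


for site in grafting_sites(4):
    print(site)
# A 4-repeat protein has 3 loops + 2 termini = 5 sites, i.e. n + 1.
```

For a single repeat domain the function returns only the two termini, matching the embodiment in which one repeat domain carries N and C terminal binding domains.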

Modular binding proteins may be multi-specific. The binding domains in a multi-specific modular binding protein may bind to different target molecules. For example, a bi-specific modular binding protein may comprise one or more binding domains that bind to a first target molecule and one or more binding domains that bind to a second target molecule, and a tri-specific modular binding protein may comprise one or more binding domains that bind to a first target molecule, one or more binding domains that bind to a second target molecule and one or more binding domains that bind to a third target molecule. A bi-specific modular binding protein may bind to the two different target molecules concurrently. This may be useful in bringing the first and second target molecules into close proximity. When the target molecules are located on different cells, concurrent binding of the target molecules to the modular binding protein may bring the cells into close proximity, for example to promote or enhance the interaction of the cells. For example, a modular binding protein which binds to a tumour specific antigen and a T cell antigen, such as CD3, may be useful in bringing T cells into proximity to tumour cells. When the target molecules are from different biological pathways, this may be useful in achieving synergistic effects and also for minimising resistance. A tri-specific modular binding protein may bind to three different target molecules concurrently. In some embodiments, one of the target molecules may be an E3 ubiquitin ligase. For example, a tri-specific modular binding protein may bind to a first target molecule from a first biological pathway and a second target molecule from a second biological pathway as well as an E3 ubiquitin ligase. This may be useful in achieving synergistic effects and also for minimising resistance.
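The distinctions drawn above between monovalent, multivalent, monospecific and multi-specific formats can be expressed as a small data model. The following sketch is illustrative only: the class and method names are hypothetical, and the two example proteins are simplified from the formats described in the figure legends (2TBP-CTPR4 with tankyrase-binding peptides in two loops, and a hetero-bifunctional protein binding beta-catenin and an E3 ligase).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BindingDomain:
    site: str    # e.g. "N-terminus", "loop 1", "C-terminus"
    target: str  # name of the target molecule bound by this domain


@dataclass
class ModularBindingProtein:
    n_repeats: int
    domains: List[BindingDomain] = field(default_factory=list)

    def target_counts(self) -> Counter:
        """Number of binding domains directed at each target molecule."""
        return Counter(d.target for d in self.domains)

    def is_multispecific(self) -> bool:
        """True if the binding domains address two or more different targets."""
        return len(self.target_counts()) >= 2

    def is_multivalent(self) -> bool:
        """True if any single target is addressed by two or more binding domains."""
        return any(count >= 2 for count in self.target_counts().values())


two_tbp_ctpr4 = ModularBindingProtein(4, [BindingDomain("loop 1", "tankyrase"),
                                          BindingDomain("loop 3", "tankyrase")])
hetero_bifunctional = ModularBindingProtein(3, [BindingDomain("loop 1", "beta-catenin"),
                                                BindingDomain("C-terminus", "E3 ligase")])
print(two_tbp_ctpr4.is_multivalent(), two_tbp_ctpr4.is_multispecific())              # True False
print(hetero_bifunctional.is_multivalent(), hetero_bifunctional.is_multispecific())  # False True
```

A protein can be both multivalent and multi-specific, for example when two loops carry the same target-binding peptide and a terminus carries an E3 ligase-binding helix.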

A binding domain may be located in an inter-repeat loop of the modular binding protein.

An inter-repeat binding domain may comprise 5 to 25 amino acid residues, preferably 8 to 15 amino acids. However, since there is no intrinsic restriction on the size of the inter-loop binding domain, longer sequences of more than 25 amino acid residues may be used in some embodiments.

In some embodiments, an unstructured binding domain may be inserted into an inter-repeat loop.

One or more, two or more, three or more, four or more or five or more of the inter-repeat loops in the modular binding protein may comprise binding domains. The binding domains may be located on consecutive inter-repeat loops or may have a different distribution in the inter-repeat loops of the modular binding protein. For example, inter-repeat loops comprising a binding domain may be separated in the modular protein by one or more, two or more, three or more or four or more inter-repeat loops which lack a binding domain. A binding domain may be connected to an inter-repeat loop directly or via one or more additional residues or linkers. Additional residues or linkers may be useful for example when a binding domain requires conformational flexibility in order to bind to a target molecule, or when the amino acid residues that are adjacent to the minimal binding domain favourably influence the micro-environment of the binding interface.

Additional residues or linkers may be positioned at the N terminus of the binding domain, the C terminus of the binding domain, or both. For example, the sequence of an inter-repeat loop containing a binding domain may be [X1-i]-[X1-n]-[X1-z], where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue that is also denoted by X, [X1-n] is the binding domain, n is 1 to 100, [X1-i] is a linker and i is independently any number between 1 and 10. In some embodiments, D may be preferred at the first position of the linker [X1-i], P may be preferred at the second position of linker [X1-i], D may be preferred at the last position of the linker [X1-z] and/or P may be preferred at the penultimate position of linker [X1-z]. Examples of preferred inter-repeat loop sequences may include

The precise sequence of the residues or linkers used to connect a binding domain to an inter-repeat loop depends on the binding domain and may be readily determined for any binding domain of interest using standard techniques. For example, small, non-hydrophobic amino acids, such as glycine, may be used to provide flexibility and increased spatial sampling, for example when a binding domain needs to adopt a specific conformation, or proline residues may be used to increase rigidity, for example, when the binding domains are short. In some preferred embodiments, an inter-repeat binding domain may be non-hydrophobic. For example, at least 40% of the amino acids in the binding domain may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W). Alternatively, the repeat domains may be modified to accommodate a hydrophobic binding domain, for example by replacing aromatic residues with charged or polar residues.
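The loop layout discussed above, a binding domain flanked by linkers, can be sketched in a few lines of Python. This is only an illustration and not a method taken from the application: the function name `build_loop`, the default linkers `DP` and `PD` (chosen to follow the stated preference for D and P at the outer ends of the linkers), and the assumption that the C-terminal linker obeys the same 1 to 10 residue limit as the N-terminal linker are all hypothetical.

```python
# Hypothetical sketch: assemble an inter-repeat loop as
# N-terminal linker + binding domain + C-terminal linker.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_loop(binding_domain: str, n_linker: str = "DP", c_linker: str = "PD") -> str:
    """Return the loop sequence [n_linker][binding_domain][c_linker]."""
    parts = {
        "n_linker": n_linker,
        "binding_domain": binding_domain,
        "c_linker": c_linker,
    }
    for name, seq in parts.items():
        if not seq or not set(seq) <= AMINO_ACIDS:
            raise ValueError(f"{name} must be a non-empty one-letter amino acid sequence")
    # Length limits quoted in the text: binding domain 1 to 100 residues.
    if len(binding_domain) > 100:
        raise ValueError("binding_domain must be 1 to 100 residues long")
    # Linker limit of 1 to 10 residues; applying it to both linkers is an assumption.
    for name in ("n_linker", "c_linker"):
        if len(parts[name]) > 10:
            raise ValueError(f"{name} must be 1 to 10 residues long")
    return n_linker + binding_domain + c_linker


# DPETGEL is the Nrf2-derived sequence quoted elsewhere in this description.
print(build_loop("DPETGEL"))
```

Glycine-rich linkers (for flexibility) or proline-containing linkers (for rigidity) can be passed through the `n_linker` and `c_linker` arguments, for example `build_loop("REAGDGEE", "DPGG", "GGPD")`.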

A binding domain may be located at one or both termini of the modular binding protein. In some embodiments, a binding domain located at the N or C terminus may comprise an α-helical structure and may comprise all or part of a half-repeat (i.e. all or part of a single α-helix) that stacks against the adjacent repeat domain. The α-helix of the terminal binding domain makes stabilising interactions with the adjacent repeat domain and is stable and folded. Only a few of the positions that structurally define an α-helix are required for the correct interfacial interaction with the adjacent repeat domain. The residues in some of these positions are defined for the N-terminal α-helix and

for the C-terminal helix), but the remaining positions of the α-helix may be modified to form a helical binding domain.

A helical binding domain may be located at the N terminus of the protein. The N terminal binding domain may be helical and may comprise all or part of the sequence

preferably all or part of the sequence where each residue denoted by X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number. In some embodiments, the Y, I, and/or L residues in the N terminal binding domain may be substituted for an amino acid residue with similar properties (i.e. a conservative substitution).

A helical binding domain may be located at the C terminus of the protein. The C terminal binding domain may be helical and may comprise all or part of the sequence Xn-(X)15- preferably all or part of the sequence

where X is independently any amino acid and may be the same amino acid or a different amino acid to any other residue in the sequence that is also denoted by X, X1 is independently any amino acid, preferably D, and X2 is independently any amino acid, preferably P, and n is 0 or any number. In some embodiments, the A, L and/or V residues in the C terminal binding domain may be substituted for an amino acid residue with similar properties (i.e. a conservative substitution).

The minimum length of the terminal binding domain is determined by the number of residues required to form a helix that binds to the target molecule. There is no intrinsic maximum length of the terminal binding domain and n may be any number.

In other embodiments, a binding domain located at the N or C terminus may comprise a non-helical structure. For example, a binding domain that is an obligate N- or C-terminal domain (for example because the terminal amino or carboxylate group mediates the binding interaction) may be located at the beginning or end of the one or more repeat domains.

In some embodiments, one or more positions in a binding domain may be diverse or randomised. A modular binding protein comprising one or more diverse or randomised residues may form a library as described below.

In some embodiments, the N and C terminal binding domains may be non-hydrophobic. For example, at least 20% of the amino acids in the binding domain may be charged (e.g. D, E, R or K) or polar (e.g. Q, N, H, T, Y, C or W). Alternatively, the helix turn helix scaffold of the repeat domains may be modified, for example by replacing aromatic residues with charged or polar residues in order to accommodate a hydrophobic binding domain.
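The composition criteria given for binding domains (at least 40% charged or polar residues for inter-repeat binding domains, at least 20% for terminal binding domains) lend themselves to a simple check. The sketch below is illustrative only: the function names are hypothetical, and the residue sets contain exactly the examples listed in the text, which are given as "e.g." and so may not be exhaustive (serine, for instance, is not counted here).

```python
# Residue classes exactly as exemplified in the text (assumed non-exhaustive).
CHARGED = set("DERK")
POLAR = set("QNHTYCW")

# Thresholds quoted in the text for the two kinds of binding domain.
INTER_REPEAT_THRESHOLD = 0.40
TERMINAL_THRESHOLD = 0.20


def non_hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that are charged or polar under the sets above."""
    if not seq:
        raise ValueError("sequence must not be empty")
    hits = sum(1 for res in seq.upper() if res in CHARGED or res in POLAR)
    return hits / len(seq)


def is_non_hydrophobic(seq: str, threshold: float) -> bool:
    """True if the charged/polar fraction meets the given threshold."""
    return non_hydrophobic_fraction(seq) >= threshold


# p53-derived MDM2-binding sequence quoted elsewhere in this description.
peptide = "FAAYWNLLSAYG"
print(round(non_hydrophobic_fraction(peptide), 3))
print(is_non_hydrophobic(peptide, TERMINAL_THRESHOLD))
print(is_non_hydrophobic(peptide, INTER_REPEAT_THRESHOLD))
```

Under these residue sets the example peptide has four qualifying residues out of twelve, so it would satisfy the 20% criterion but not the 40% criterion.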

A modular binding protein as described herein may comprise binding domains in any arrangement or combination. For example, binding domains may be located at both the N and C terminus and optionally one or more inter-repeat loops of a modular binding protein; at the N terminus and optionally one or more loops of a modular binding protein; at the C terminus and optionally one or more loops of a modular binding protein; or in one or more inter-repeat loops of a modular binding protein. The location of the binding domains within a modular binding protein may be determined by rational design, for example using modelling to identify the optimal arrangement for the presentation of two target molecules to each other (e.g. for substrate presentation to an E3 ubiquitin ligase); and/or by screening for example using populations of modular binding proteins with different arrangements of binding domains to identify the arrangement which confers the optimal interaction of target molecules.
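When arrangements are to be compared by screening, the candidate placements can be enumerated programmatically. The following sketch assumes, for simplicity, a protein with one target-binding domain and one E3-ligase-binding domain, each occupying a different site (N terminus, an inter-repeat loop, or C terminus). The site labels and function names are hypothetical, and the one-domain-per-site restriction is an assumption of the sketch rather than a limitation stated in the text.

```python
from itertools import permutations


def candidate_sites(n_repeats: int) -> list[str]:
    """Sites available in a protein with n_repeats repeat domains."""
    if n_repeats < 2:
        raise ValueError("a modular binding protein has two or more repeat domains")
    # n repeat domains are linked by n - 1 inter-repeat loops.
    loops = [f"loop{i}" for i in range(1, n_repeats)]
    return ["N-term", *loops, "C-term"]


def two_domain_arrangements(n_repeats: int) -> list[dict[str, str]]:
    """All placements of a target-binding and an E3-binding domain at distinct sites."""
    sites = candidate_sites(n_repeats)
    return [{"target": t, "E3": e} for t, e in permutations(sites, 2)]


arrangements = two_domain_arrangements(3)
print(len(arrangements))
for arrangement in arrangements[:3]:
    print(arrangement)
```

For three repeat domains there are four sites and therefore twelve ordered placements, each of which could correspond to one member of a population of modular binding proteins to be screened.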

Suitable target molecules for modular binding proteins described herein include biological macromolecules, such as proteins. The target molecule may be a receptor, enzyme, antigen, oligosaccharide, oligonucleotide, integral membrane protein, transcription factor, transcriptional regulator, G protein coupled receptor (GPCR) or any other target of interest. Proteins that are difficult to target with small molecules, such as PPIs, proteins that accumulate in neurodegenerative diseases and proteins overexpressed in disease conditions, such as cancer, may be particularly suitable target molecules. Target molecules may include α-synuclein; β-amyloid; tau; superoxide dismutase; huntingtin; β-catenin; KRAS; components of superenhancers and other types of transcriptional regulators, such as N-Myc, C-Myc, Notch, aurora A, EWS-FLI1 (Ewing's sarcoma-Friend leukemia integration 1), TEL-AML1, TAL1 (T-cell acute lymphocytic leukemia protein 1) and Sox2 ((sex determining region Y)-box 2); tankyrases; phosphatases such as PP2A; epigenetic writers, readers and erasers, such as histone deacetylases and histone methyltransferases; BRD4 and other bromodomain proteins; and kinases, such as PLK1 (polo-like kinase 1), c-ABL (Abelson murine leukemia viral oncogene homolog 1) and BCR (breakpoint cluster region)-ABL.

In some embodiments, a modular binding protein may neutralise a biological activity of the target molecule, for example by inhibiting or antagonising its activity or binding to another molecule or by tagging it for ubiquitination and proteasomal degradation or for degradation via autophagy. In other embodiments, a modular binding protein may activate a biological activity of the target molecule.

In some embodiments, the target molecule may be β-catenin. Suitable binding domains that specifically bind to β-catenin are well-known in the art and include β-catenin-binding domains derived from axin (e.g. GAYPEYILDIHVYRVQLEL and variants thereof), Bcl-9 (e.g. SQEQLEHRYRSLITLYDIQLML and variants thereof), TCF7L2 (e.g. QELGDNDELMHFSYESTQD and variants thereof), ICAT (e.g. YAYQRAIVEYMLRLMS and variants thereof), LRH-1 (e.g. YEQAIAAYLDALMC and variants thereof) or APC (e.g. SCSEELEALEALELDE and variants thereof).

In some embodiments, the target molecule may be KRAS. Suitable binding domains that specifically bind to KRAS are well-known in the art and include a KRAS-binding domain from SOS-1 (e.g. FEGIALTNYLKALEG and variants thereof) and KRAS-binding domains identified by phage display (see for example Sakamoto et al. Biochem. Biophys. Res. Comm. (2017) 484, 605-611).

In some embodiments, the target molecule may be tankyrase. Suitable binding domains that specifically bind to tankyrase are well-known in the art and include tankyrase binding domains from Axin (e.g. REAGDGEE and HLQREAGDGEEFRS or variants thereof).

In some embodiments, the target molecule may be EWS-FLI1. Suitable binding domains that specifically bind to EWS-FLI1 are well-known in the art and include the ESAP1 peptide TMRGKKKRTRAN and variants thereof. Other suitable sequences may be identified by phage display (see for example Erkizan et al. Cell Cycle (2011) 10, 3397-408).

In some embodiments, the target molecule may be Aurora-A. Suitable binding domains that specifically bind to Aurora-A are well-known in the art and include Aurora-A binding sequences from TPX2, such as SYSYDAPSDFINFSS (Bayliss et al. Mol. Cell (2003) 12, 851-62) and Aurora-A binding sequences from N-myc, such as N-myc residues 19-47 or 61-89 (see for example Richards et al. PNAS (2016) 113, 13726-31).

In some embodiments, the target molecule may be N-Myc or C-Myc. Suitable binding domains that specifically bind to N-Myc or C-Myc are well-known in the art and include helical binding sequences from Aurora-A (see for example Richards et al. PNAS (2016) 113, 13726-31).

In some embodiments, the target molecule may be WDR5 (WD repeat-containing protein 5). Suitable binding domains that specifically bind to WDR5 are well-known in the art and include the WDR5-interacting motif (WIN) of MLL1 (mixed lineage leukemia protein 1) (see for example Song & Kingston J. Biol. Chem. (2008) 283, 35258-64; Patel et al. J. Biol. Chem. (2008) 283, 32158-61), e.g. EPPLNPHGSARAEVHLRKS and variants thereof.

In some embodiments, the target molecule may be BRD4 or a bromodomain protein. Suitable binding domains that specifically bind to BRD4 are well-known in the art and include sequences derived from histone protein ligands.

In some embodiments, the target molecule may be an HDAC (histone deacetylase). Suitable binding domains that specifically bind to HDAC are well-known in the art and include binding sequences derived from SMRT and other proteins that recruit HDACs to specific transcriptional regulatory complexes or binding sequences derived from histone proteins (see for example Watson et al. Nat. Comm. (2016) 7, 11262; Dowling et al. Biochem. (2008) 47, 13554-63).

In some embodiments, the target molecule may be Notch. Suitable binding domains that specifically bind to Notch are well-known in the art and include binding sequences from the N-terminus of MAML1 (mastermind like protein 1), e.g. SAVMERLRRRIELCRRHHST and variants thereof (see for example Moellering et al. Nature (2009) 462, 182-8).

In some embodiments, the target molecule may be a Cdk (cyclin-dependent kinase). Suitable binding domains that specifically bind to Cdks are well-known in the art and include substrate-based peptides, for example, Cdk2 sequences derived from cyclin A, such as TYTKKQVLRMEHLVLKVLTFDL and variants thereof (see for example Gondeau et al. J. Biol. Chem. (2005) 280, 13793-800; Mendoza et al. Cancer Res. (2003) 63, 1020-4).

In some embodiments, the target molecule may be PLK1 (polo-like kinase 1). Suitable binding domains that specifically bind to PLK1 are well-known in the art and include optimised substrate-derived sequences that bind to the substrate-binding PBD (polo-box domain), such as MAGPMQSEPLMGAKK and variants thereof.

In some embodiments, the target molecule may be Tau. Suitable binding domains that specifically bind to Tau are well-known in the art and include tau-binding sequences derived from alpha- and beta-tubulin, such as KDYEEVGVDSVE and YQQYQDATADEQG and variants thereof (see for example Maccioni et al. EMBO J. (1988) 7, 1957-63; Rivas et al. PNAS (1988) 85, 6092-6).

In some embodiments, the target molecule may be BCR-ABL. Suitable binding domains that specifically bind to BCR-ABL are well-known in the art and include optimised substrate-derived sequences, such as EAIYAAPFAKKK and variants thereof.

In some embodiments, the target molecule may be PP2A (protein phosphatase 2A). Suitable binding domains that specifically bind to PP2A are well-known in the art and include sequences that bind the B56 regulatory subunit, such as LQTIQEEE and variants thereof (see for example Hetz et al. Mol. Cell (2016) 63, 686-95).

In some embodiments, the target molecule may be EED (embryonic ectoderm development). Suitable binding domains that specifically bind to EED are well-known in the art and include helical binding sequences from co-factor EZH2 (enhancer of zeste homolog 2), such as FSSNRQKILERTEILNQEWKQRRIQPV and variants thereof (see for example Kim et al. Nat. Chem. Biol. (2013) 9, 643-50).

In some embodiments, the target molecule may be MCL-1 (induced myeloid leukemia cell differentiation protein). Suitable binding domains that specifically bind to MCL-1 are well-known in the art and include sequences from BCL2, e.g. KALETLRRVGDGVQRNHETAF and variants thereof (see for example Stewart et al. Nat. Chem. Biol. (2010) 6, 595-601).

In some embodiments, the target molecule may be RAS. Suitable RAS binding domains are well-known in the art and include RAS-binding peptides identified by phage display, such as RRRRCPLYISYDPVCRRRR and variants thereof (see for example Sakamoto et al. BBRC (2017) 484, 605-11).

In some embodiments, the target molecule may be GSK3 (glycogen synthase kinase 3). Suitable GSK3 binding domains are well-known in the art and include substrate-competitive binding sequences such as KEAPPAPPQDP, LSRRPDYR, RREGGMSRPADVDG, and YRRAAVPPSPSLSRHSSPSQDEDEEE and variants thereof (see for example Ilouz et al. J. Biol. Chem. (2006) 281, 30621-30630; Plotkin et al. J. Pharmacol. Exp. Ther. (2003) 305, 974-980).

In some embodiments, the target molecule may be CtBP (C-terminal binding protein). Suitable CtBP binding domains are well-known in the art and include sequences identified from a cyclic peptide library screen, such as SGWTVVRMY and variants thereof (see for example Birts et al. Chem. Sci. (2013) 4, 3046-57).

Examples of suitable binding domains for target molecules that may be used in a modular binding protein as described herein are shown in Tables 2 and 7.

In some preferred embodiments, a modular binding protein as described herein may comprise a binding domain for an E3 ubiquitin ligase. Examples of suitable E3 ubiquitin ligases include MDM2,

family, and Fbx4.

Suitable binding domains for E3 ubiquitin ligases (degrons) are well known in the art and may be 5 to 20 amino acids. For example, a suitable binding domain for MDM2 may include a binding domain from p53 (e.g. FAAYWNLLSAYG) or a variant thereof. A suitable binding domain for SCF(Skp2) may include a binding domain from p27 (e.g. AGSNEQEPKKRS) and variants thereof. A suitable binding domain for Keap1-Cul3 may include a binding domain from Nrf2 (e.g. DPETGEL) or a variant thereof. A suitable binding domain for SPOP-Cul3 may include a binding domain from Puc (e.g. LACDEVTSTTSSSTA) or a variant thereof. A suitable binding domain for APC/C may include the degrons termed ABBA (e.g. SLSSAFHVFEDGNKEN), KEN (e.g. SEDKENVPP), or DBOX (e.g. PRLPLGDVSNN) or a variant thereof. In some instances, a combination of these degrons may be used (mimicking the bipartite or tripartite degrons found in some natural substrates). A suitable binding domain for SIAH may include a binding domain from PHYL (e.g. LRPVAMVRPTV) or a variant thereof. A suitable binding domain for CHIP (carboxyl terminus of Hsc70-interacting protein) may include peptide sequences such as ASRMEEVD (from Hsp90 C-terminus) and GPTIEEVD (from Hsp70 C-terminus) or a variant thereof. A suitable binding domain for beta-TrCP may include a degron sequence motif (including phosphomimetic amino acids), such as DDGYFD or a variant thereof. A suitable binding domain for Fbx4 may include sequences derived from TRF1, such as MPIFWKAHRMSKMGTG or a variant thereof (see for example Lee et al. Chembiochem (2013) 14, 445-451). A suitable binding domain for FBw7 may include degron sequence motifs (including phosphomimetic amino acids), such as LPSGLLEPPQD. A suitable binding domain for DDB1-Cul4 may include sequences derived from HBx (hepatitis B virus X protein) and similar proteins from other viruses and from DCAFs (DDB1-CUL4-associated factors) including helical motifs such as ILPKVLHKRTLGL, NFVSWHANRQLGM, NTVEYFTSQQVTG, and NITRDLIRRQIKE (see for example Li et al. Nat. Struct. Mol. Biol. (2010) 17, 105-111).

Examples of suitable binding domains for E3 ubiquitin ligases that may be used in a modular binding protein as described herein are shown in Table 3.

A modular binding protein comprising a binding domain for an E3 ubiquitin ligase may also comprise a binding domain for a target molecule. Binding of the modular binding protein to both the target molecule and the E3 ubiquitin ligase may cause the target molecule to be ubiquitinated by the E3 ubiquitin ligase. Ubiquitinated target molecules are then degraded by the proteasome. This allows the specific targeting of molecules for proteolysis by the modular binding protein. The ubiquitination and subsequent degradation of a target protein has been shown for hetero-bifunctional small molecules (PROTACs; proteolysis targeting chimeras) that bind the target protein and a ubiquitin ligase simultaneously (see for example Bondeson et al. Nat. Chem. Biol. 2015; Deshaies 2015; Lu et al. 2015).

In some embodiments, the modular binding protein may lack lysine residues, so that it avoids ubiquitination by the E3 ubiquitin ligase. Examples of modular binding proteins that bind E3 ubiquitin ligase and a target molecule are shown in Tables 1 and 8.
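Whether a candidate sequence is lysine-free can be checked directly from its one-letter sequence. The helper below is a hypothetical illustration (the function name is not from the application); it reports the 1-based positions of any lysine residues so that they can be reviewed or substituted.

```python
def lysine_positions(seq: str) -> list[int]:
    """Return the 1-based positions of lysine (K) residues in a sequence."""
    return [i for i, res in enumerate(seq.upper(), start=1) if res == "K"]


# Two sequences quoted elsewhere in this description.
print(lysine_positions("AGSNEQEPKKRS"))  # p27-derived sequence
print(lysine_positions("DPETGEL"))       # Nrf2-derived sequence
```

An empty list indicates that the sequence contains no lysine; in practice the check would be applied to the full-length modular binding protein, including repeat domains and linkers.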

A suitable modular binding protein may comprise an N terminal binding domain that binds a target protein, such as β-catenin, and a C terminal binding domain that binds an E3 ubiquitin ligase. For example, the N terminal binding domain may be a β-catenin-binding sequence derived from Bcl-9 and the C terminal binding domain may be an Mdm2-binding sequence derived from p53. Alternatively, a modular binding protein may comprise a C terminal binding domain that binds a target protein, such as β-catenin, and an N terminal binding domain that binds an E3 ubiquitin ligase (see figure 10A).

Another suitable modular binding protein may comprise three repeat domains, a binding domain located in an inter-repeat loop that binds a target protein, such as β-catenin, and a C terminal binding domain that binds an E3 ubiquitin ligase. For example, the inter-repeat loop binding domain may be derived from the phosphorylated region of APC (adenomatous polyposis coli) and the C terminal binding domain may be an Mdm2-binding sequence derived from p53. Alternatively, the modular binding protein may comprise a binding domain located in an inter-repeat loop that binds an E3 ubiquitin ligase, and a C terminal binding domain that binds a target protein, such as β-catenin (see figure 10B).

Another suitable modular binding protein may comprise three repeat domains, an N terminal binding domain that binds a target protein, such as β-catenin, and a binding domain located in an inter-module loop that binds an E3 ubiquitin ligase. For example, the N terminal binding domain may be a β-catenin-binding sequence derived from LRH1 (liver receptor homolog 1) and the inter-module loop binding domain may be a sequence derived from the Skp2-targeting region of p27. Alternatively, the modular binding protein may comprise an N terminal binding domain that binds an E3 ubiquitin ligase and a binding domain located in an inter-module loop that binds a target protein, such as β-catenin (see figure 10C).

Another suitable modular binding protein may comprise four repeat domains, a first binding domain located in an inter-repeat loop that binds an E3 ubiquitin ligase and a second binding domain located in an inter-repeat loop that binds a target molecule. The first and second inter-repeat loops may be separated by an inter-repeat loop lacking a binding domain. For example, the first binding domain may be located in the first inter-repeat loop from the N terminus and the second binding domain may be located in the third inter-repeat loop from the N terminus or vice versa.

In some preferred embodiments, a modular binding protein as described herein may comprise an amino acid sequence shown in Table 8 or a variant thereof.

In other preferred embodiments, a modular binding protein as described herein may comprise a binding domain that binds to a component of a target-selective autophagy pathway, such as chaperone-mediated autophagy (CMA). The modular binding protein and target molecules bound thereto are thus recognised by the autophagy pathway and the target molecules are subsequently degraded. Suitable components of the CMA pathway include heat shock cognate protein of 70 kDa (hsc70, HSPA8, Gene ID: 3312). Suitable binding domains are well known in the art (Dice J.F. (1990) Trends Biochem. Sci. 15, 305-309) and include Lys-Phe-Glu-Arg-Gln (KFERQ) and variants thereof, such as CMA_Q and CMA_K, as described herein. These domains have been demonstrated to be capable of targeting heterologous proteins to the autophagy pathway (Fan, X. et al. (2014) Nature Neuroscience 17, 471-480).

In addition to repeat domains and binding domains, a modular binding protein may further comprise one or more additional domains which confer additional functionality, such as targeting domains, intracellular transport domains, stabilising domains or oligomerisation domains. Additional domains may for example be located at the N or C terminus of the modular binding protein or in a loop between repeats.

A targeting domain may be useful in targeting the modular binding protein to a particular destination in vivo, such as a target tissue, cell, membrane or intracellular organelle. Suitable targeting domains include chimeric antigen receptors (CARs).

An intracellular transport domain may facilitate the passage of the modular binding protein through the cell membrane into cells, for example to bind intracellular target molecules. Suitable intracellular transport domains are well known in the art (see for example Bechara et al. FEBS Letters 587 1 (2013) 1693-1702) and include cell-penetrating peptides (CPPs), such as Antennapedia (43-58), Tat (48-60), Cadherin (615-632) and poly-Arg.

A stabilising domain may increase the half-life of the modular binding protein in vivo. Suitable stabilising domains are well known in the art and include Fc domains, serum albumin, unstructured peptides such as XTEN or PAS, and polyethylene glycol (PEG).

An oligomerisation domain may facilitate the formation of multi-protein complexes, for example to increase avidity against multi-valent targets. Suitable oligomerisation domains include the 'foldon' domain, the natural trimerisation domain of T4 fibritin (Meier et al., J. Mol. Biol. (2004) 344(4): 1051 -69).

In addition to repeat domains, binding domains and optionally one or more additional domains, a modular binding protein may further comprise a cytotoxic or therapeutic agent and/or a detectable label. Suitable cytotoxic agents include, for example, chemotherapeutic agents, such as methotrexate, auristatin, adriamycin, doxorubicin, melphalan, mitomycin C, ozogamicin, chlorambucil, maytansine, emtansine, daunorubicin or other intercalating agents, enzymatically active toxins of bacterial, fungal, plant, or animal origin, such as diphtheria A chain, nonbinding active fragments of diphtheria toxin, exotoxin A chain, ricin A chain, abrin A chain, modeccin A chain, α-amanitin, alpha-sarcin, Aleurites fordii proteins, tubulysins, dianthin proteins, Phytolacca americana proteins (PAPI, PAPII, and PAP-S), Momordica charantia inhibitor, curcin, crotin, Saponaria officinalis inhibitor, gelonin, mitogellin, restrictocin, phenomycin, enomycin, pyrrolobenzodiazepines, and the trichothecenes and fragments of any of these. Suitable cytotoxic agents may also include radioisotopes. A variety of radionuclides are available for the production of radioconjugated modular binding proteins including, but not limited to,

Conjugates of a modular binding protein and one or more small anti-cancer molecules, for example toxins, such as a calicheamicin, maytansinoids, a trichothecene, and CC1065, and the derivatives of these toxins that have toxin activity, may also be used. Suitable therapeutic agents may include cytokines (e.g. IL2, IL12 and TNF), chemokines, pro-coagulant factors (e.g. tissue factor), enzymes, liposomes, and immune response factors.

A detectable label may be any molecule that produces or can be induced to produce a signal, including but not limited to fluorescers, radiolabels, enzymes, chemiluminescers or photosensitizers. Thus, binding may be detected and/or measured by detecting fluorescence or luminescence, radioactivity, enzyme activity or light absorbance. Detectable labels may be attached to modular binding proteins using conventional chemistry known in the art. There are numerous methods by which the label can produce a signal detectable by external means, for example, by visual examination, electromagnetic radiation, heat, and chemical reagents. The label can also be bound to another specific binding member that binds the modular binding protein, or to a support.

In some embodiments, a modular binding protein may be configured for display on a particle or molecular complex, such as a cell, ribosome or phage, for example for screening and selection. A suitable modular binding protein may further comprise a display moiety, such as phage coat protein, to facilitate display on a particle or molecular complex. The phage coat protein may be fused or covalently linked to the modular binding protein.

Modular binding proteins as described herein may be produced by recombinant means. For example, a method of producing a modular binding protein as described herein may comprise expressing a nucleic acid encoding the modular binding protein. A nucleic acid may be expressed in a host cell and the expressed modular binding protein may then be isolated and/or purified from the cell culture.

In some embodiments, a method may comprise;

inserting a first nucleic acid encoding a binding domain into a second nucleic acid encoding two or more repeat domains to produce a chimeric nucleic acid encoding a modular binding protein comprising a binding domain located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and

expressing said chimeric nucleic acid to produce the modular binding protein.

Methods described herein may be useful in producing a modular binding protein that binds to a first target molecule and a second target molecule. For example, a method may comprise;

providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, each repeat domain; and

incorporating into said nucleic acid a first nucleotide sequence encoding a first binding domain that binds to a first target molecule and a second nucleotide sequence encoding a second binding domain that binds to a second target molecule to generate a nucleic acid encoding a modular binding protein comprising said first and second binding domains, wherein said binding domains are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and

expressing the nucleic acid to produce said protein.

One of the first and second target molecules may be an E3 ubiquitin ligase. For example, a method may comprise;

providing a nucleic acid encoding two or more repeat domains linked by inter-repeat loops, each repeat domain; and

incorporating into said nucleic acid a first nucleotide sequence encoding a first binding domain that binds to a target molecule and a second nucleotide sequence encoding a second binding domain that binds to an E3 ubiquitin ligase to generate a nucleic acid encoding a modular binding protein comprising said first and second binding domains, wherein said binding domains are located in an inter-repeat loop or at the N or C terminus of the modular binding protein; and

expressing the nucleic acid to produce said protein.

An isolated nucleic acid encoding a modular binding protein as described herein is provided as an aspect of the invention. The nucleic acid may be comprised within an expression vector. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate.

Preferably, the vector contains appropriate regulatory sequences to drive the expression of the nucleic acid in a host cell. Suitable regulatory sequences to drive the expression of heterologous nucleic acid coding sequences in expression systems are well-known in the art and include constitutive promoters, for example viral promoters such as CMV or SV40, and inducible promoters, such as Tet-on controlled promoters. A vector may also comprise sequences, such as origins of replication and selectable markers, which allow for its selection and replication and expression in bacterial hosts such as E. coli and/or in eukaryotic cells.

Many techniques and protocols that are suitable for the expression of recombinant modular binding proteins in cell culture and their subsequent isolation and purification are known in the art (see for example Protocols in Molecular Biology, Second Edition, Ausubel et al. eds. John Wiley & Sons, 1992; Recombinant Gene Expression Protocols Ed RS Tuan (Mar 1997) Humana Press Inc).

A host cell comprising a nucleic acid encoding a modular binding protein as described herein or vector containing such a nucleic acid is also provided as an aspect of the invention.

Suitable host cells include bacteria, mammalian cells, plant cells, filamentous fungi, yeast and baculovirus systems and transgenic plants and animals. The expression of proteins in prokaryotic cells is well established in the art. A common bacterial host is E. coli. A modular binding protein may also be produced by expression in eukaryotic cells in culture.

Mammalian cell lines available in the art for expression of a modular binding protein include Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney cells, NS0 mouse myeloma cells, YB2/0 rat myeloma cells, human embryonic kidney cells (e.g. HEK293 cells), human embryonic retina cells (e.g. PerC6 cells) and many others.

Modular binding proteins as described herein may be used to produce libraries. A suitable library may be screened in order to identify and isolate modular binding proteins with specific binding activity. A library may comprise modular binding proteins, each modular binding protein in the library comprising:

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein at least one amino acid residue in the binding domains in said library is diverse.

The residues at one or more positions in the binding domain of the modular binding proteins in the library may be diverse or randomised i.e. the residue located at the one or more positions may be different in different molecules in a population.

For example, 1 to 12 positions within a helical binding domain at the N or C terminus of the modular binding proteins in the library may be diverse or randomised. In addition, the non-constrained Xn sequence of the binding domain may contain additional diversity. Alternatively or additionally, 1 to n positions within an inter-repeat binding domain of the modular binding proteins in the library may be diverse or randomised, where n is the number of amino acids in the binding domain.
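The number of randomised positions determines the theoretical size of a library. As a rough illustration, the sketch below computes that size for full randomisation over the 20 standard amino acids and enumerates the variants of a short binding domain. The function names are hypothetical, positions are 0-based and assumed to be distinct, and full 20-residue randomisation at every diverse position is an assumption of the sketch rather than a requirement stated in the text.

```python
from itertools import product
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def theoretical_diversity(n_positions: int) -> int:
    """Number of distinct sequences when n_positions are fully randomised."""
    if n_positions < 0:
        raise ValueError("n_positions must be non-negative")
    return len(AMINO_ACIDS) ** n_positions


def enumerate_variants(template: str, positions: list[int]) -> Iterator[str]:
    """Yield every variant of template with the given 0-based positions randomised."""
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        residues = list(template)
        for pos, aa in zip(positions, combo):
            residues[pos] = aa
        yield "".join(residues)


print(theoretical_diversity(12))  # 12 randomised helical positions
variants = list(enumerate_variants("DPETGEL", [3, 4]))
print(len(variants), variants[:3])
```

Exhaustive enumeration is practical only for a handful of positions; for larger numbers of diverse positions the theoretical diversity quickly exceeds what can be physically sampled, which is one reason binding domains may be screened individually as described below.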

In some embodiments, binding domains may be screened individually and a modular binding protein progressively assembled from repeat domains comprising binding domains identified in different rounds of screening. For example, a library may comprise modular binding proteins, each modular binding protein in the library comprising:

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more constant binding domains having the same amino acid sequence in each modular binding protein in the library and one or more diverse binding domains, preferably one diverse binding domain, having a different amino acid sequence in each modular binding protein in the library,

said binding domains being located in an inter-repeat loop or at the N or C terminus of the modular binding protein.

At least one amino acid residue in the diverse binding domains in said library may be diverse.

A library may be produced by a method comprising:

(a) providing a population of nucleic acids encoding a diverse population of modular binding proteins comprising

(i) two or more repeat domains,

(ii) inter-repeat loops linking the two or more repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein one or more residues of a binding domain in each modular binding protein is diverse in said library, and

(b) expressing said population of nucleic acids to produce the diverse population, thereby producing a library of modular binding proteins.

The population of nucleic acids may be provided by a method comprising inserting a first population of nucleic acids encoding a diverse binding domain into a second population of nucleic acids encoding the two or more repeat domains linked by inter-repeat loops, optionally wherein the first and second nucleic acids are linked with a third population of nucleic acids encoding linkers of up to 10 amino acids.

The nucleic acids may be contained in vectors, for example expression vectors. Suitable vectors include phage-based or phagemid-based phage display vectors. The nucleic acids may be recombinantly expressed in a cell or in solution using a cell-free in vitro translation system such as a ribosome, to generate the library. In some preferred embodiments, the library is expressed in a system in which the function of the modular binding protein enables isolation of its encoding nucleic acid. For example, the modular binding protein may be displayed on a particle or molecular complex to enable selection and/or screening. In some embodiments, the library of modular binding proteins may be displayed on beads, cell-free ribosomes, bacteriophage, prokaryotic cells or eukaryotic cells. Alternatively, the encoded modular binding protein may be presented within an emulsion where activity of the modular binding protein causes an identifiable change. Alternatively, the encoded modular binding protein may be expressed within or in proximity of a cell where activity of the modular binding protein causes a phenotypic change or changes in the expression of a reporter gene.

Preferably, the nucleic acids are expressed in a prokaryotic cell, such as E. coli. For example, the nucleic acids may be expressed in a prokaryotic cell to generate a library of recombinant binding proteins that is displayed on the surface of bacteriophage. Suitable prokaryotic phage display systems are well known in the art, and are described for example in Kontermann, R & Dübel, S, Antibody Engineering, Springer-Verlag New York, LLC; 2001, ISBN: 3540413545, WO92/01047, US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793, US5962255, US6140471, US6172197, US6225447, US6291650, US6492160 and US6521404. Phage display systems allow the production of large libraries, for example libraries with 10^8 or more, 10^9 or more, or 10^10 or more members.
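As a rough illustration of how such library sizes relate to the number of randomised positions, the following sketch counts how many positions could be fully randomised over all 20 naturally occurring amino acids before the theoretical diversity exceeds the number of library members. The function name is hypothetical, and the calculation assumes equal representation of every variant and ignores codon bias and sampling losses.

```python
def max_fully_randomised_positions(library_size: int, alphabet: int = 20) -> int:
    """Return the largest n for which alphabet**n <= library_size.

    Integer arithmetic is used so that the result is exact.
    This is an idealised estimate, not a statement about any real library.
    """
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n


if __name__ == "__main__":
    # Library sizes taken from the text: 10^8, 10^9 and 10^10 members.
    for size in (10**8, 10**9, 10**10):
        print(size, max_fully_randomised_positions(size))
```

Under these assumptions, a library in this size range could in principle cover every variant of about six or seven fully randomised positions; restricting each position to a sub-set of amino acids would allow more positions to be covered.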

In other embodiments, the cell may be a eukaryotic cell, such as a yeast, insect, plant or mammalian cell.

A diverse sequence as described herein is a sequence which varies between the members of a population i.e. the sequence is different in different members of the population. A diverse sequence may be random i.e. the identity of the amino acid or nucleotide at each position in the diverse sequence may be randomly selected from the complete set of naturally occurring amino acids or nucleotides or a sub-set thereof. Diversity may be introduced into the binding domain using approaches known to those skilled in the art, such as oligonucleotide-directed mutagenesis (ref. 22; Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein).

Diverse sequences may be contiguous or may be distributed within the binding domain. Suitable methods for introducing diverse sequences into binding domains are well-described in the art and include oligonucleotide-directed mutagenesis (see Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press, and references therein). For example, diversification may be generated using oligonucleotide mixes created using partial or complete randomisation of nucleotides or created using codon mixtures, for example using trinucleotides. Alternatively, a population of diverse oligonucleotides may be synthesised using high throughput gene synthesis methods and combined to create a precisely defined and controlled population of binding domains.
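The difference between complete and partial randomisation of nucleotides can be illustrated with a short sketch that compares a fully randomised codon (NNN) with the widely used NNK scheme (K = G or T). The NNK scheme is given here only as an example of partial randomisation; it is not named in this specification, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def summarise_scheme(first: str, second: str, third: str):
    """Return (number of codons, number of amino acids, number of stop codons)
    for a degenerate codon defined by the allowed bases at each position."""
    codons = ["".join(c) for c in product(first, second, third)]
    encoded = [CODON_TABLE[c] for c in codons]
    amino_acids = set(encoded) - {"*"}
    return len(codons), len(amino_acids), encoded.count("*")


if __name__ == "__main__":
    print("NNN:", summarise_scheme("ACGT", "ACGT", "ACGT"))
    print("NNK:", summarise_scheme("ACGT", "ACGT", "GT"))
```

Both schemes encode all 20 amino acids, but restricting the third position halves the number of codons and reduces the number of stop codons, which is one reason partially randomised mixes or trinucleotide mixtures may be preferred.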

Alternatively, "doping" techniques in which the original nucleotide predominates with alternative nucleotide(s) present at lower frequency may be used.
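A minimal sketch of such a doping scheme is shown below. The template sequence, the 0.9 wild-type fraction and the function name are hypothetical values chosen for illustration; in practice the doping ratio is set during oligonucleotide synthesis rather than in software.

```python
import random


def dope_sequence(template: str, wild_type_fraction: float, rng: random.Random) -> str:
    """Simulate one doped oligonucleotide.

    At each position the original nucleotide is kept with probability
    wild_type_fraction; otherwise one of the three alternative
    nucleotides is chosen with equal probability.
    """
    alphabet = "ACGT"
    doped = []
    for base in template:
        if rng.random() < wild_type_fraction:
            doped.append(base)
        else:
            doped.append(rng.choice([b for b in alphabet if b != base]))
    return "".join(doped)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    template = "GCTGAAGCTTGG"       # hypothetical binding-domain fragment
    for _ in range(5):
        print(dope_sequence(template, 0.9, rng))
```

A higher wild-type fraction keeps most library members close to the starting sequence, which may be useful when an existing binding domain is being optimised rather than discovered.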

Preferably, the library is a display library. The modular binding proteins in the library may be displayed on the surface of particles, or molecular complexes such as beads, for example, plastic or resin beads, ribosomes, cells or viruses, including replicable genetic packages, such as yeast, bacteria or bacteriophage (e.g. Fd, M13 or T7) particles, viruses, cells, including mammalian cells, or covalent, ribosomal or other in vitro display systems.

Techniques for the production of display libraries, such as phage display libraries are well known in the art. Each particle or molecular complex may comprise nucleic acid that encodes the modular binding protein that is displayed by the particle.

In some preferred embodiments, the modular binding proteins in the library are displayed on the surface of a viral particle such as a bacteriophage. Each modular binding protein in the library may further comprise a phage coat protein to facilitate display. Each viral particle may comprise nucleic acid encoding the modular binding protein displayed on the particle.

Suitable viral particles include bacteriophage, for example filamentous bacteriophage such as M13 and Fd.

Suitable methods for the generation and screening of phage display libraries are well known in the art. Phage display is described for example in WO92/01047 and US patents US5969108, US5565332, US5733743, US5858657, US5871907, US5872215, US5885793, US5962255, US6140471, US6172197, US6225447, US6291650, US6492160 and US6521404.

Libraries as described herein may be screened for modular binding proteins which display binding activity, for example binding to a target molecule. Binding may be measured directly or may be measured indirectly through agonistic or antagonistic effects resulting from binding. A method of screening may comprise:

(a) providing a library of modular binding proteins, each modular binding protein in the library comprising:

(i) two or more repeat domains,

(ii) inter-repeat loops linking said repeat domains; and

(iii) one or more binding domains, each said binding domain being located in an inter-repeat loop or at the N or C terminus of the modular binding protein,

wherein one or more residues of the one or more binding domains are diverse in said library,

(b) screening the library for modular binding proteins which display a binding activity, and

(c) identifying one or more modular binding proteins in the library which display the binding activity.

In some embodiments, the modular binding proteins in the library may comprise one binding domain with at least one diverse amino acid residue. Conveniently the modular binding proteins in the library comprise two repeat domains. The library may be screened for binding domains that bind to a target molecule. Binding domains identified in this fashion can be assembled in a modular fashion to generate a modular binding protein as described herein that is multi-specific.

For example, a first library may be screened for a first binding domain that binds to a first target molecule and a second library may be screened for a second binding domain that binds to a second target molecule. The first and second binding domains are in different locations in the modular binding protein i.e. they are not both N terminal binding domains, C terminal binding domains or inter-repeat binding domains. First and second binding domains that bind to the first and second target molecules, respectively, are identified from the first and second libraries. The identified first and second binding domains may then be incorporated into a modular binding protein that binds to the first and second target molecules.

A first library may comprise modular binding proteins with a first diverse binding domain having at least one diverse amino acid residue. A first binding domain that binds to a target molecule may be identified from the first library. Modular binding proteins comprising the first binding domain may be used to generate a second library comprising a second diverse binding domain having at least one diverse amino acid residue. For example, the modular binding protein from the first library may be modified by addition of a second diverse binding domain at the N or C terminus or by the addition of additional repeat domains comprising the second diverse binding domain in an inter-repeat loop. A second binding domain that binds to the same or a different target molecule may be identified from the second library.

Modular binding proteins comprising the first and second binding domains may be used to generate a third library comprising a third diverse binding domain having at least one diverse amino acid residue. For example, the modular binding protein from the second library may be modified by addition of a third diverse binding domain at the N or C terminus or by the addition of additional repeat domains comprising the third diverse binding domain in an inter-repeat loop. A third binding domain that binds to the same target molecule as the first and/or second binding domains or a different target molecule may be identified from the third library. In this way, a modular binding protein containing multiple binding domains may be sequentially assembled (see Figure 16).

The use of separate libraries for each binding domain allows large numbers of different variants of each binding domain to be screened independently and then combined. For example, a phage library of 10^8-10^12 first binding domain variants may be combined with a phage library of 10^8-10^12 second binding domain variants and a phage library of 10^8-10^12 third binding domain variants. In some embodiments, a phage library of 10^8-10^12 N terminal binding domain variants may be combined with a phage library of 10^8-10^12 C terminal binding domain variants to generate a modular binding protein with N and C terminal binding domains.

Screening a library for binding activity may comprise providing a target molecule and identifying or selecting members of the library that bind to the target, or expressing the library in a population of cells and identifying or selecting members of the library that elicit a cell phenotype. The one or more identified or selected modular binding proteins may be recovered and subjected to further selection and/or screening. Binding may be determined by any suitable technique. For example, the library may be contacted with the target molecule under binding conditions for a time period sufficient for the target molecule to interact with the library and form a binding reaction complex with at least one member thereof. Binding conditions are those conditions compatible with the known natural binding function of the target molecule. Those compatible conditions are buffer, pH and temperature conditions that maintain the biological activity of the target molecule, thereby maintaining the ability of the molecule to participate in its preselected binding interaction. Typically, those conditions include an aqueous, physiologic solution of pH and ionic strength normally associated with the target molecule of interest.

The library may be contacted with the target molecule in the form of a heterogeneous or homogeneous admixture. Thus, the members of the library can be in the solid phase with the target molecule present in the liquid phase. Alternatively, the target molecule can be in the solid phase with the members of the library present in the liquid phase. Still further, both the library members and the target molecule can be in the liquid phase.

Suitable methods for determining binding of a modular binding protein to a target molecule are well known in the art and include ELISA, bead-based binding assays (e.g. using streptavidin-coated beads in conjunction with biotinylated target molecules), surface plasmon resonance, flow cytometry, Western blotting, immunocytochemistry, immunoprecipitation, and affinity chromatography. Alternatively, biochemical or cell-based assays, such as fluorescence-based or luminescence-based reporter assays may be employed.

Multiple rounds of panning may be performed in order to identify modular binding proteins which display the binding activity. For example, a population of modular binding proteins enriched for the binding activity may be recovered or isolated from the library and subjected to one or more further rounds of screening for the binding activity to produce one or more further enriched populations. Modular binding proteins which display binding activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.

In some embodiments, binding may be determined by detecting agonism or antagonism resulting from the binding of a modular binding protein to a target molecule, such as a ligand, receptor or enzyme. For example, the library may be screened by expressing the library in reporter cells and identifying one or more reporter cells with altered gene expression or phenotype. Suitable functional screening techniques for screening recombinant populations of modular binding proteins are well-known in the art.

Modular binding proteins which display the binding activity may be further engineered to improve an activity or property or introduce a new activity or property, for example a binding property such as affinity and/or specificity, an in vivo property such as solubility, plasma stability, or cell penetration, or an activity such as increased neutralization of the target molecule and/or modulation of a specific activity of the target molecule or an analytical property. Modular binding proteins may also be engineered to improve stability, solubility or expression level.

Further rounds of screening may be employed to identify modular binding proteins which display the improved property or activity. For example, a population of modular binding proteins enriched for binding to the target molecule may be recovered or isolated from the library and subjected to one or more further rounds of screening for the improved or new property or activity to produce one or more further enriched populations. Optionally, this may be repeated one or more times. Modular binding proteins which display the improved property or activity may be identified from the one or more further enriched populations and recovered, isolated and/or further investigated.

A modular binding protein as described herein may be encapsulated in a liposome, for example for delivery into a cell. Preferred liposomes include fusogenic liposomes. Suitable fusogenic liposomes may comprise a cationic lipid, such as 1,2-dioleoyl-3-trimethylammoniumpropane (DOTAP), and a neutral lipid, such as dioleoylphosphatidylethanolamine (DOPE), for example in a 1:1 (w/w) ratio. Optionally, a liposome may further comprise an aromatic lipid, such as DiO (3,3'-dioctadecyloxacarbocyanine perchlorate), DiR (1,1'-dioctadecyl-3,3,3',3'-tetramethylindotricarbocyanine iodide), N-(4,4-difluoro-5,7-dimethyl-4-bora-3a,4a-diaza-s-indacene-3-propionyl)-1,2-dihexadecanoyl-sn-glycero-3-phosphoethanolamine (triethylammonium salt) (BODIPY FL-DHPE), and 2-(4,4-difluoro-5-methyl-4-bora-3a,4a-diaza-s-indacene-3-dodecanoyl)-1-hexadecanoyl-sn-glycero-3-phosphocholine (BODIPY-C12HPC), for example in a 0.1:1:1 (w/w) ratio relative to the neutral and cationic lipid.

Suitable techniques for the encapsulation of proteins in liposomes and their delivery into cells are established in the art (see for example, Kube et al Langmuir (2017) 33 1051-1059; Kolasinac et al (2018) Int. J. Mol. Sci. 19 346).

A method described herein may comprise admixing a modular binding protein or encoding nucleic acid as described herein with a solution of lipids, for example in an organic solvent, such as chloroform, and evaporating the solvent to produce liposomes encapsulating the modular binding protein. Liposome encapsulations comprising a modular binding protein as described herein are provided as an aspect of the invention. A modular binding protein or encoding nucleic acid as described herein may be admixed with a pharmaceutically acceptable excipient. A pharmaceutical composition comprising a modular binding protein or nucleic acid as described herein and a pharmaceutically acceptable excipient is provided as an aspect of the invention.

The term "pharmaceutically acceptable" as used herein pertains to compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgement, suitable for use in contact with the tissues of a subject (e.g., human) without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. Each carrier, excipient, etc. must also be "acceptable" in the sense of being compatible with the other ingredients of the formulation. Suitable carriers, excipients, etc. can be found in standard pharmaceutical texts, for example, Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Company, Easton, Pa., 1990. The pharmaceutical composition may conveniently be presented in unit dosage form and may be prepared by any methods well-known in the art of pharmacy. Such methods include the step of bringing the modular binding protein into association with a carrier which may constitute one or more accessory ingredients. In general, pharmaceutical compositions are prepared by uniformly and intimately bringing into association the active compound with liquid carriers or finely divided solid carriers or both, and then if necessary shaping the product.

Pharmaceutical compositions may be in the form of liquids, solutions, suspensions, emulsions, elixirs, syrups, tablets, lozenges, granules, powders, capsules, cachets, pills, ampoules, suppositories, pessaries, ointments, gels, pastes, creams, sprays, mists, foams, lotions, oils, boluses, electuaries, or aerosols.

A modular binding protein, encoding nucleic acid or pharmaceutical composition comprising the modular binding protein or encoding nucleic acid may be administered to a subject by any convenient route of administration, whether systemically/peripherally or at the site of desired action, including but not limited to, oral (e.g. by ingestion); topical (including e.g. transdermal, intranasal, ocular, buccal, and sublingual); pulmonary (e.g. by inhalation or insufflation therapy using, e.g. an aerosol, e.g. through mouth or nose); rectal; vaginal; parenteral, for example, by injection, including subcutaneous, intradermal, intramuscular, intravenous, intraarterial, intracardiac, intrathecal, intraspinal, intracapsular, subcapsular, intraorbital, intraperitoneal, intratracheal, subcuticular, intraarticular, subarachnoid, and intrasternal; by implant of a depot, for example, subcutaneously or intramuscularly.

Pharmaceutical compositions suitable for oral administration (e.g., by ingestion) may be presented as discrete units such as capsules, cachets or tablets, each containing a predetermined amount of the active compound; as a powder or granules; as a solution or suspension in an aqueous or non-aqueous liquid; or as an oil-in-water liquid emulsion or a water-in-oil liquid emulsion; as a bolus; as an electuary; or as a paste.

Pharmaceutical compositions suitable for parenteral administration (e.g. by injection, including cutaneous, subcutaneous, intramuscular, intravenous and intradermal), include aqueous and non-aqueous isotonic, pyrogen-free, sterile injection solutions which may contain anti-oxidants, buffers, preservatives, stabilisers, bacteriostats, and solutes which render the formulation isotonic with the blood of the intended recipient; and aqueous and non-aqueous sterile suspensions which may include suspending agents and thickening agents, and liposomes or other microparticulate systems which are designed to target the compound to cells, tissue or organs. Examples of suitable isotonic vehicles for use in such formulations include Sodium Chloride Injection, Ringer's Solution, or Lactated Ringer's Injection. Typically, the concentration of the active compound in the solution is from about 1 ng/ml to about 10 μg/ml, for example, from about 10 ng/ml to about 1 μg/ml. The formulations may be presented in unit-dose or multi-dose sealed containers, for example, ampoules and vials, and may be stored in a freeze-dried (lyophilised) condition requiring only the addition of the sterile liquid carrier, for example water for injections, immediately prior to use.

It will be appreciated that appropriate dosages of the modular binding protein can vary from patient to patient. Determining the optimal dosage will generally involve the balancing of the level of diagnostic benefit against any risk or deleterious side effects of the administration. The selected dosage level will depend on a variety of factors including, but not limited to, the route of administration, the time of administration, the rate of excretion of the imaging agent, the amount of contrast required, other drugs, compounds, and/or materials used in combination, and the age, sex, weight, condition, general health, and prior medical history of the patient. The amount of imaging agent and route of administration will ultimately be at the discretion of the physician, although generally the dosage will be to achieve concentrations of the imaging agent at a site, such as a tumour, a tissue of interest or the whole body, which allow for imaging without causing substantial harmful or deleterious side-effects.

Administration in vivo can be effected in one dose, continuously or intermittently (e.g., in divided doses at appropriate intervals). Methods of determining the most effective means and dosage of administration are well known to those of skill in the art and will vary with the formulation used for therapy, the purpose of the therapy, the target cell being treated, and the subject being treated. Single or multiple administrations can be carried out with the dose level and pattern being selected by the physician.

Modular binding proteins described herein may be used in methods of diagnosis or treatment in human or animal subjects, e.g. human. Modular binding proteins for a target molecule may be used to treat disorders associated with the target molecule. Other aspects and embodiments of the invention provide the aspects and embodiments described above with the term "comprising" replaced by the term "consisting of" and the aspects and embodiments described above with the term "comprising" replaced by the term "consisting essentially of". It is to be understood that the application discloses all combinations of any of the above aspects and embodiments described above with each other, unless the context demands otherwise. Similarly, the application discloses all combinations of the preferred and/or optional features either singly or together with any of the other aspects, unless the context demands otherwise.

Modifications of the above embodiments, further embodiments and modifications thereof will be apparent to the skilled person on reading this disclosure, and as such, these are within the scope of the present invention.

All documents and sequence database entries mentioned in this specification are incorporated herein by reference in their entirety for all purposes.

"and/or" where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example "A and/or B" is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein.

Certain aspects and embodiments of the invention will now be illustrated by way of example and with reference to the figures described above.

Experiments

1. Methods

1.1 Large-scale protein purification (His-tagged) from E. coli

The pRSET B (His-tag) constructs were transformed into chemically competent E. coli C41 cells by heat shock and plated on LB-Amp plates. Colonies were grown in 2TY media containing ampicillin (50 micrograms/mL) at 37 °C, 220 rpm until the optical density (O.D.) at 600 nm reached 0.6. Cultures were then induced with IPTG (0.5 mM) for 16-20 h at 20 °C or 4 h at 37 °C. Cells were pelleted by centrifugation at 3000 g (4 °C, 10 min) and resuspended in lysis buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl, 1 tablet of SIGMAFAST protease inhibitor cocktail (EDTA-free) per 100 mL of solution), then lysed on an Emulsiflex C5 homogenizer at 15000 psi. Cell debris was pelleted by centrifugation at 15,000 g at 4 °C for 45 min. Ni-NTA beads 50% bed volume (GE Healthcare) (5 mL) were washed once with phosphate buffer (10 mM sodium phosphate pH 7.4, 150 mM NaCl) before the supernatant of the cell lysate was bound to them for 1 hr at 4 °C in batch. The loaded beads were washed three times with phosphate buffer (40 mL) containing 30 mM of imidazole to prevent nonspecific interaction of lysate proteins with the beads. Samples were eluted using phosphate buffer with 300 mM imidazole, and purified by size-exclusion chromatography using a HiLoad 16/60 SuperdexG75 column (GE Life-Science) pre-equilibrated in phosphate buffer (10 mM sodium phosphate, pH 7.4, 150 mM NaCl) and proteins separated in isocratic conditions. Purity was checked on NuPage protein gel (Invitrogen), and fractions found to be over 95% pure were pooled. Purified protein was flash-frozen and stored at -80 °C until further use. Concentrations were determined by measuring absorbance at 280 nm and using a calculated extinction coefficient from ExPASy ProtParam (Gasteiger et al. 2005) for each variant. Molecular weight and purity were confirmed using mass spectrometry (MALDI).
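The concentration determination described above follows the Beer-Lambert law, A = ε·c·l. A minimal sketch is given below; the function names, the 1 cm path length default, and the example absorbance, extinction coefficient and molecular weight are all hypothetical values for illustration and are not taken from the experiments.

```python
def concentration_from_a280(absorbance: float,
                            extinction_coefficient: float,
                            path_length_cm: float = 1.0) -> float:
    """Molar concentration (M) from absorbance at 280 nm.

    extinction_coefficient is in M^-1 cm^-1, e.g. a value calculated
    from the protein sequence.
    """
    return absorbance / (extinction_coefficient * path_length_cm)


def molar_to_mg_per_ml(molar: float, molecular_weight_da: float) -> float:
    """Convert a molar concentration to mg/ml (numerically equal to g/L)."""
    return molar * molecular_weight_da


if __name__ == "__main__":
    # Hypothetical example values, not measured data.
    conc = concentration_from_a280(0.5, 10_000.0)
    print(f"{conc * 1e6:.1f} uM")
    print(f"{molar_to_mg_per_ml(conc, 20_000.0):.2f} mg/ml")
```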

1.2 Large-scale protein purification (heat treatment) from E. coli

All modular binding proteins described herein are thermally very stable, with melting temperatures above 80°C. This means that the modular binding proteins could be separated from E. coli proteins by incubating the cell lysates at 65 °C for 20 min. Most E. coli proteins do not survive such temperatures and therefore unfold and aggregate.

Aggregated proteins were removed by centrifugation, leaving 80-90% pure sample of the desired protein. All our constructs folded reversibly, and therefore could be further purified by methods such as acetone or salt precipitation to remove DNA and other contaminants.

This approach allowed the production of large amounts of functional proteins without expensive affinity purification methods such as antibodies or His tags and is scalable to industrial production and bioreactors.

1.3 Small-scale purification of His-tagged proteins for higher-throughput testing

Plasmids were transformed into E. coli C41 cells and plated overnight. 15 ml of 2TY medium (Roche) containing 50 micrograms/ml ampicillin was placed in each of multiple 50 ml tubes. Several colonies were picked and resuspended in each 15 ml culture. For sufficient aeration it is important to only loosely tighten the lids of the 50 ml tubes. Cells were grown at 37 °C until OD600 of 0.6 and then induced with 0.5 mM IPTG overnight. Cells were pelleted at 3000 g (Eppendorf Centrifuge 5804) and then resuspended in 1 ml of BugBuster® cell lysis reagent. Alternatively, sonication in combination with lysozyme and DNase I treatment was used. The lysate was spun at 12000 g for 1 minute to pellet any insoluble protein and cell debris.

The supernatant was added to 100 μl bed volume of pre-washed Ni-NTA agarose beads. The subsequent affinity purification was performed in batch, by washing the beads 4 times with 1 ml of buffer each time (alternatively, Qiagen Ni-NTA Spin Columns can be used). The first wash contained 10% BugBuster® solution and 30 mM imidazole in the chosen buffer. Here we used 50 mM sodium phosphate buffer pH 6.8, 150 mM NaCl. The three successive washes had 30 mM of imidazole in the chosen buffer. Beads were washed thoroughly to remove the detergent present in the BugBuster® solution. Protein was eluted from the beads in a single step using 1 ml of chosen buffer containing 300 mM imidazole. The combination of BugBuster® and imidazole and the repeat washes in small bead volumes yielded >95% pure protein. Imidazole was removed using a NAP-5 disposable gel-filtration column (GE Healthcare).

1.4 Competition Fluorescence Polarization (FP)

To assay the binding of the designed SOS-TPR protein to KRAS, competition FP was performed using purified KRAS Q61H mutant and 2'-(or-3')-O-(N-methylanthraniloyl) guanosine 5'-triphosphate, a fluorescent version of GTP also known as mant-GTP. SOS-TPR was titrated using a 2-fold serial dilution against a 1:1 complex of KRAS Q61H and mant-GTP (1 μM) in a black 96-well plate (CLS3993 SIGMA). Plates were prepared under reduced light conditions and incubated at room temperature. Readings were taken on the CLARIOstar microplate reader, using an excitation filter at 360 nm and emission filter at 440 nm.
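The 2-fold serial dilution used for the titration can be sketched as follows. The starting concentration and the number of points are hypothetical, since neither is stated in the method; only the dilution factor of 2 comes from the text.

```python
def serial_dilution(start: float, n_points: int, factor: float = 2.0) -> list:
    """Concentrations for a serial dilution.

    start    : concentration in the first well (any unit)
    n_points : number of wells in the series
    factor   : fold dilution between successive wells (2 in the method above)
    """
    return [start / factor**i for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical 12-point series starting at 100 uM.
    for well, conc in enumerate(serial_dilution(100.0, 12), start=1):
        print(f"well {well:2d}: {conc:.4g} uM")
```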

1.5 Isothermal Titration Calorimetry (ITC)

ITC was performed at 25°C using a VP-ITC (Microcal). 1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6 and TNKS2 ARC4 were dialysed into 10 mM sodium phosphate buffer pH 7.4, 150 mM NaCl, 0.5 mM TCEP. Dialysed TNKS2 ARC4 (200 μM) was titrated into the sample cell containing 1TBP-CTPR2 at 20 μM. Similar experiments were performed for 2TBP-CTPR4 and 3TBP-CTPR6. Injections of TNKS2 ARC4 into the cell were initiated with a 5 μL injection, followed by 29 injections of 10 μL. The reference power was set at 15 μcal/s with an initial delay of 1000 s and a stirring speed of 485 rpm. Data were fitted using the instrument software and a one-site binding model.
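The injection schedule above can be turned into an approximate titrant-to-cell molar ratio after each injection. This is a simplified sketch: the cell volume of 1400 μL is an assumed placeholder rather than a value from the method, the function names are hypothetical, and the calculation ignores the displacement of cell contents that instrument software normally corrects for.

```python
def injection_volumes_ul(first: float = 5.0,
                         subsequent: float = 10.0,
                         n_subsequent: int = 29) -> list:
    """One 5 uL injection followed by 29 injections of 10 uL, as in the method."""
    return [first] + [subsequent] * n_subsequent


def cumulative_molar_ratios(volumes_ul, syringe_um, cell_um, cell_volume_ul):
    """Approximate [titrant]/[cell protein] molar ratio after each injection.

    Simplification: no correction for volume displaced from the cell.
    """
    ratios = []
    injected = 0.0
    for volume in volumes_ul:
        injected += volume
        ratios.append((syringe_um * injected) / (cell_um * cell_volume_ul))
    return ratios


if __name__ == "__main__":
    volumes = injection_volumes_ul()
    # 200 uM titrant and 20 uM cell protein are from the method;
    # the 1400 uL cell volume is an assumption for illustration.
    ratios = cumulative_molar_ratios(volumes, 200.0, 20.0, 1400.0)
    print(f"total injected: {sum(volumes):.0f} uL")
    print(f"final molar ratio: {ratios[-1]:.2f}")
```

Under these assumptions the titration ends at roughly a two-fold molar excess of titrant, which is in the range typically needed to reach saturation for a one-site model.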

1.6 Cell culture

HEK293T cells were cultured in Dulbecco's Modified Eagle's Medium (Sigma Aldrich) supplemented with 10% fetal bovine serum and penicillin/streptomycin (LifeTech) at 37°C with 5% CO2 air supply.

1.7 Cell transfection

HEK293T were seeded in 6-well tissue culture plates (500,000 cells per well) and transfected the next day using the Lipofectamine2000 transfection reagent (Invitrogen) according to the manufacturer's protocol.

1.8 β-catenin levels western blot assay

HA-β-catenin (1 μg) alone and with various PROTACs (1 μg) was transfected in HEK293T cells in 6-well plates using Lipofectamine2000. After 48 hours of transfection, the cells were lysed in 200 μL of Laemmli buffer. After the samples were boiled at 95°C for 20 min, proteins were resolved by SDS-PAGE and transferred to a PVDF membrane, and immunoblotting was performed using anti-HA (C29F4, Cell Signaling Technologies) and anti-actin (A2066, Sigma-Aldrich) antibodies. Changes in β-catenin levels were evaluated by the densitometry of the bands corresponding to HA-β-catenin normalised to actin levels using ImageJ.

1.9 Liposomal formulation and cytotoxicity assay

To make liposomal formulations of proteins (LFP), lipids (DOTAP (cationic): DOPE (neutral): DiR (aromatic) = 1:1:0.1 w/w) were dissolved in chloroform, and the solvent was evaporated under vacuum overnight. The resulting mixed lipid cake was hydrated with 10 mM HEPES pH 7.4 containing 27 μM protein, so that the total lipid concentration was 4 mg/ml. This mixture was vortexed for 2 minutes and then sonicated for 20 minutes at room temperature.

Liposomes encapsulating proteins were stored at 4°C until further use. To make empty liposomes (EL, empty liposomes without proteins), lipid cake was hydrated with 10 mM HEPES pH 7.4 without proteins.
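The lipid masses implied by a 1:1:0.1 w/w ratio at 4 mg/ml total lipid can be worked out as in the sketch below. It assumes that the 4 mg/ml figure includes all three lipids and uses a hypothetical 1 ml hydration volume; neither point is stated in the text.

```python
def lipid_masses_mg(total_mg_per_ml, volume_ml, ratio):
    """Split a total lipid mass between components by weight ratio."""
    parts = sum(ratio.values())
    total_mg = total_mg_per_ml * volume_ml
    return {name: total_mg * w / parts for name, w in ratio.items()}


# Weight ratio from the text; the 1 ml volume is a hypothetical example.
ratio = {"DOTAP": 1.0, "DOPE": 1.0, "DiR": 0.1}
masses = lipid_masses_mg(total_mg_per_ml=4.0, volume_ml=1.0, ratio=ratio)
for name, mg in masses.items():
    print(f"{name}: {mg:.3f} mg")
```

With these assumptions each millilitre contains roughly 1.9 mg each of DOTAP and DOPE and roughly 0.19 mg of DiR.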

An ATP assay was used to investigate whether there is any cytotoxicity associated with EL and LFP. In a typical procedure, 2 × 10^5 HEK 293T cells/well in 500 μL of Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum were grown for 24 hours in a 24-well cell culture plate. Cells were incubated with liposome (EL/LFP)-media (DMEM without FBS) mix, containing different volumes (0-60 μL) of EL and LFP, for 15 minutes at 37°C. After washing twice with 1x PBS, 500 μL of CellTiter-Glo® Reagent (Promega) was added and luminescence was measured using a microplate reader as per the manufacturer's protocol. Untreated cells were used as control. Data were obtained from triplicate samples, and the standard deviations were calculated from two independent experiments.

1.10 TOPFLASH assay

The Wnt pathway was activated by treating HEK293T cells with Wnt-conditioned media obtained from L-cells expressing Wnt3A for 8 days. To perform the assay, 10^5 HEK293T cells/well were seeded in a 24-well Nunclon Delta Surface plate (NUNC) and incubated overnight at 37°C, 5% CO2. The following day, cells were transfected with 100 ng of TOPflash TCF7L2-firefly luciferase plasmid, 10 ng of CMV-Renilla plasmid (as internal control) and 100 ng of the corresponding TPR construct. Plasmids were mixed with 0.5 μL of Lipofectamine 2000 transfection reagent according to the manufacturer's protocol (Invitrogen). Transfected cells were allowed to recover for 8 h, then they were treated with Wnt-conditioned media (1:2 final concentration) for a further 16 h. The TOPflash assay was performed using the Dual-Luciferase Reporter Assay System (Promega) (Korinek et al., 1997 Science 275(5307): 1784-7) following the manufacturer's instructions. The activities of firefly and Renilla luciferases were measured sequentially from a single sample, using the CLARIOstar plate reader. Relative luciferase values were obtained from triplicate samples by dividing the firefly luminescence activity by the CMV-induced Renilla activity, and the standard deviation was calculated.
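The normalisation of firefly to Renilla luminescence across triplicates can be sketched as below. The readings are made-up illustrative numbers, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Return (mean, sample SD) of per-well firefly/Renilla ratios."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    ratios = [f / r for f, r in zip(firefly, renilla)]
    return mean(ratios), stdev(ratios)


# Hypothetical triplicate readings in arbitrary luminescence units.
firefly = [1000.0, 1200.0, 1400.0]
renilla = [500.0, 500.0, 500.0]
avg, sd = relative_luciferase(firefly, renilla)
print(f"relative luciferase = {avg:.2f} ± {sd:.2f}")
```

Dividing well by well before averaging lets the Renilla signal correct for differences in transfection efficiency and cell number between wells.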

1.11 TOPFLASH assay using liposome encapsulation to deliver designed TPR proteins into the cell

10^5 HEK 293T cells in 500 μL of Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum were grown overnight in each well of a 24-well cell culture plate. For TOPFLASH reporter assays, 100 ng/well of TOPFLASH plasmid and 10 ng/well of CMV-Renilla plasmid (as internal control) were used to transfect cells in 24-well plates. Cells were transfected with the Lipofectamine 2000 transfection reagent according to the manufacturer's protocol (Invitrogen). Transfected cells were allowed to recover for 8 hours, and Wnt signalling was activated by addition of Wnt3A-conditioned media obtained from L-cells. 16 hours post Wnt pathway activation, proteins were delivered into the cells by liposomal treatment. Cells were incubated with liposome (LFP)-media (DMEM without FBS) mix for 15 minutes at 37°C followed by one PBS wash. Wnt3A-conditioned media was replaced and cells were incubated for variable time durations (2-8 hours). Following incubation, TOPFLASH assays were performed using the Dual-Luciferase Reporter Assay System (Promega) (Korinek et al., 1997) following the manufacturer's instructions. Relative luciferase values were obtained from triplicate samples (from two independent experiments) by dividing the firefly luciferase values (from TOPFLASH) by the Renilla luciferase values (from CMV-Renilla), and standard deviations were calculated.

1.12 Competition fluorescence polarisation (FP) assay to measure the binding of designed Nrf-TPR proteins to Keap1

To measure the binding of the designed Nrf-TPR proteins to Keap1, competition FP was performed using 384-well black opaque optiplate microplates and a CLARIOstar microplate reader. Nrf-TPR proteins were titrated into a solution containing a mixture of FITC-labelled Nrf2 peptide and Keap1 protein. The prepared plates were incubated for 30 minutes at room temperature before readings were taken.

2. Results

The tetratricopeptide repeat (TPR) is a 34-residue motif that can be repeated in tandem to generate modular proteins. TPRs are used here as an example of helix-turn-helix tandem-repeat arrays, but any tandem-repeat array may be used.

RTPR proteins comprising TPRs were derived from the consensus TPR sequence (CTPR). Two repeats were found to be sufficient to generate a highly stable mini-protein of 68 amino acids (RTPR2). The biophysical properties of two types of engineering strategy, loop insertions and terminal helix grafting, were assessed. The molar ellipticity at 222 nm (a measure of helical secondary structure content) of three different RTPR modules was monitored as a function of increasing temperature. A decrease in the absolute molar ellipticity with increasing temperature indicates a loss of structure and the unfolding of the protein. Even at the highest temperature recorded (85°C), the RTPR2 protein without insertion was not fully denatured (Figure 1). RTPR2 with a 20-residue unstructured loop between the two repeats showed a small shift to a lower melting temperature (Figure 1), but the protein remained fully folded up to 55°C. This is well above physiologically relevant temperatures. RTPR2 with an additional N-terminal helix showed an increase in absolute molar ellipticity, indicating that the additional helical domain is folded. Moreover, unlike the loop insertion, the helix domain was capable of stabilising the RTPR2 module, shifting the transition midpoint to above 90°C (Figure 1). These results showed that the two engineering strategies generated folded and stable modular mini-proteins capable of withstanding high temperatures.
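One simple way to read a transition midpoint off a thermal melt is to convert ellipticity into fraction unfolded and interpolate where it crosses 0.5. The sketch below uses made-up data points and assumes that both folded and unfolded baseline values are known. For a protein that is not fully denatured at the highest recorded temperature, the unfolded baseline would itself have to be estimated, so this is an illustration and not the analysis used for Figure 1.

```python
def fraction_unfolded(signal, folded, unfolded):
    """Map a signal onto 0 (folded baseline) to 1 (unfolded baseline)."""
    return (signal - folded) / (unfolded - folded)


def midpoint_temperature(temps, signals, folded, unfolded):
    """Linearly interpolate the temperature at which half is unfolded."""
    fu = [fraction_unfolded(s, folded, unfolded) for s in signals]
    pairs = list(zip(temps, fu))
    for (t0, f0), (t1, f1) in zip(pairs, pairs[1:]):
        if f0 <= 0.5 <= f1 and f1 != f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None  # no crossing within the measured range


# Hypothetical melt: temperature (°C) and ellipticity at 222 nm
# (arbitrary units, more negative = more helical).
temps = [25.0, 35.0, 45.0, 55.0, 65.0, 75.0, 85.0]
ellipticity = [-30.0, -30.0, -29.0, -26.0, -18.0, -10.0, -8.0]
tm = midpoint_temperature(temps, ellipticity, folded=-30.0, unfolded=-8.0)
print(f"estimated midpoint: {tm:.2f} °C")
```

The function returns None when the curve never reaches the half-unfolded point, which is the situation described above for a module whose midpoint lies beyond the measured range.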

A key feature of the TPR scaffold was its modular nature. This modularity allowed us to display any number of binding modules in tandem to obtain bi- and multi-valent and multifunctional molecules against one, two or more targets. The stability of these proteins was shown to be modular. The stabilities of proteins comprising TBP-CTPR2 (a two-repeat CTPR with a loop insertion that binds to the protein tankyrase (Guettler et al. 2011)) repeated in tandem were measured. The TBP-CTPR2-containing proteins had two, four, six, and eight repeats, and they displayed one, two, three and four binding loops, respectively. The helical content of the proteins, monitored by molar ellipticity at 222 nm, was found to increase in proportion to the number of repeats, as did the stability, indicating that they were behaving like classic helical repeat proteins (Figure 2). These results demonstrate that bi- or multifunctional modular binding proteins have a high thermostability.

2.1. Demonstration of proteins with a single binding function grafted onto an alpha-helix

2.1.1 SOS1-TPR, a helix-grafted binding module designed to bind to oncoprotein KRAS

First, we mapped the helix of SOS1 that interacts with KRAS (Margarit et al. 2003 Cell 112(5), 685-695) onto the heptad distribution. We matched the heptad positions with the stapled SOS1 helical peptide produced by Leshchiner et al. (PNAS 2015 112(6), 1761-1766) and set the stapled side of the peptide to form the hydrophobic interface with the rest of the TPR protein (Fig. 3A). The length of the helix is important. An N-terminal solvating CTPR helix ends in the sequence DPNN, which forms a short loop that leads into the next repeat. CTPR-mediated "stapling" (constraining) of binding helices therefore occurred through residues Tyr (i) - Ile (i+4) - Tyr (i+7) - Leu, fully stapling a 15-residue helix.
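The reason residues at positions i, i+4 and i+7 can together form one interface follows from helix geometry: an ideal α-helix has 3.6 residues per turn, so each residue is rotated by 100° about the helix axis. The sketch below projects the offsets onto a helical wheel. It assumes an ideal helix and is an illustration of the geometric argument, not the design procedure itself.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal α-helix: 360° / 3.6 residues per turn


def wheel_angle(offset):
    """Angle of residue i+offset relative to residue i, in (-180, 180]."""
    angle = (offset * DEGREES_PER_RESIDUE) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


# Offsets of the stapling positions named in the text, plus i+2 as a
# counter-example lying on the opposite face.
for offset in (0, 4, 7, 2):
    print(f"i+{offset}: {wheel_angle(offset):+.0f}°")
```

Positions i, i+4 and i+7 fall within 40° of one another on the wheel, whereas i+2 sits at -160°, on the far side of the helix.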

We created a hydrophobic interface between the grafted helix and the adjacent repeat and allowed the formation of the DPNN loop at the C-terminal end of the grafted helix. We then grafted the final sequence onto the crystal structure of a CTPR B helix for further validation of the interaction. Our designed KRAS-binding protein, SOS1-TPR, was docked against KRAS using the Haddock software (de Vries & Bonvin 2011; de Vries et al. 2010). Haddock is a data-driven docking algorithm that uses known information about the interaction for its calculations. The crystal structure of SOS1-KRAS (PDB: 1NVU) (Margarit et al. 2003) was originally used to design the stapled peptide. The active (primary interaction residues) and the passive (5 Å proximity to active) residues were extracted and supplied as input to the calculations.

Since the initial work, we have found that docking is not necessary to validate new helical modules. The design strategy has a solid theory based on the geometry of α-helices, and a design will be successful as long as the key binding residues have been grafted. These TPR repeats were thus found to be exceptional scaffolds to display binding helices, as they grow linearly in the opposite direction of the helix, thereby avoiding any steric clashes with the target protein.

Next, we monitored KRAS binding using the change in fluorescence polarisation of mant-GTP (2'-/3'-O-(N'-Methylanthraniloyl) guanosine-5'-O-triphosphate), a fluorescent analog of GTP (Fig. 3B). The fluorescence of mant-GTP is dependent on the hydrophobicity of its environment (excitation at 360 nm, emission at 440 nm). An increase in fluorescence intensity and fluorescence polarisation was observed previously upon binding to KRAS (Leshchiner et al. 2015). SOS-TPR2 was then titrated into the preformed mant-GTP-KRAS complex. There was a clear decrease in polarisation with increasing concentrations of SOS-TPR2, indicating displacement of mant-GTP upon binding of SOS-TPR2 to KRAS (Figure 3B). Fitting the data gave an EC50 of 3.4 μM. In contrast, a blank protein, CTPR3, had no effect on the fluorescence polarisation.
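An EC50 can be read from a competition curve in several ways. The text does not state which model was fitted, so the sketch below assumes a four-parameter logistic dose-response curve, which is a common choice, and estimates the EC50 by interpolating on a log-concentration axis. The plateau values and concentrations are hypothetical; the reported 3.4 μM is used only to generate synthetic data.

```python
import math


def four_param_logistic(conc, top, bottom, ec50, hill=1.0):
    """Signal expected at competitor concentration `conc`."""
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)


def ec50_by_interpolation(concs, signals, top, bottom):
    """Find where the signal crosses the midpoint, on a log10 axis."""
    half = (top + bottom) / 2.0
    points = sorted(zip(concs, signals))
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if (s0 - half) * (s1 - half) <= 0 and s0 != s1:
            frac = (half - s0) / (s1 - s0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None  # midpoint not bracketed by the data


# Synthetic data: hypothetical plateaus (mP) and a 2-fold dilution series.
TOP, BOTTOM = 120.0, 60.0
concs = [100.0 / 2 ** i for i in range(10)]
signals = [four_param_logistic(c, TOP, BOTTOM, ec50=3.4) for c in concs]
estimate = ec50_by_interpolation(concs, signals, TOP, BOTTOM)
print(f"estimated EC50: {estimate:.2f} µM")
```

Interpolation only approximates the true midpoint between two neighbouring dilution points; a non-linear least-squares fit of all four parameters would normally be preferred for real data.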

2.1.2 p53-TPR, a helix-grafted binding module designed to bind to Mdm2

Many degrons (regions within the substrate that are recognized by the E3 ubiquitin ligase) are unstructured. However, p53 binds to the Mdm2 E3 through an alpha helix (Figure 4A). Stapled versions of the p53 helix, as well as circular peptides and grafted coiled coils, have been developed by many groups, and the sequences have been optimised to give nanomolar affinities in some cases (see for example, Ji et al. 2013; Lee et al. 2014; Kritzer et al. 2006). The p53 helix has a favourable geometry to be grafted onto the C-terminal solvating helix of the CTPR scaffold, and moreover the two helices have 30% sequence identity.

Proof of binding of p53-CTPR2 to Mdm2 (N-terminal domain) was obtained using isothermal titration calorimetry (ITC). Mdm2 was titrated into a solution containing 10 μM of p53-TPR2. ITC measures the heat released upon binding. A high-affinity interaction was observed with a dissociation constant of approximately 50 nM (Figure 4B).

2.2. Demonstration of proteins with a single binding function grafted onto an inter-repeat loop

2.2.1 TPB2-TPR, a loop module designed to bind to oncoprotein tankyrase

First, we introduced the SLiM "3BP2", a sequence that binds to the substrate-binding ankyrin-repeat clusters (ARC) of the protein tankyrase, a multi-domain poly ADP-ribose polymerase that is upregulated in many cancers (Guettler et al. 2011), onto the CTPR scaffold. Grafting SLiMs into folded domains led to an increase in proteolysis resistance and showed the potential for expansion of the interaction surface through further rational engineering, in silico methods and/or directed evolution, for controlled geometric arrangement, and for bi- or multivalency of interactions.

We tested the binding of 1TBP-CTPR2, 2TBP-CTPR4 and 3TBP-CTPR6 to the ARC4 domain of tankyrase using ITC (Figure 5A). This technique is particularly useful for these interactions, as it can measure the stoichiometry (n) of the interaction. We showed that n increased with the number of binding loops, meaning that there were as many tankyrase molecules bound to one TBP-CTPR as loops in the protein. Thus, all loops are accessible to the binding partner.

Moreover, the binding affinity increases and the off rate decreases with the number of repeats, indicative of an avidity effect. This type of multivalent molecule would be particularly useful for full-length tankyrase, as it has four ARC domains capable of binding the 3BP2 peptide.

Multivalency in this system was increased further via oligomerisation of the binding modules by fusing them to the foldon domain of T4 fibritin (Fig. 5B). This trimerisation domain comprises a C-terminal helix, such as that of p53-CTPR, ending with the foldon domain, a short β-sheet peptide capable of homo-trimerising. The foldon domain has been shown to be highly stable and independently folded (Boudko et al. 2002; Meier et al. 2004). In this way, multiple binding modules can be arranged with specified geometries to inhibit complex multivalent molecules that cannot be targeted with monovalent interactions due to their natural tendency to interact with other multivalent networks with high avidity.

2.2.2 Effect of introducing multivalency into a single binding function TPR

We tested the function of multi-valent CTPR proteins containing variable numbers of the "3BP2" motif that binds to the protein tankyrase (1TBP-CTPR2, 2TBP-CTPR4, 3TBP-CTPR6 etc.). Multi-valency was increased further via oligomerisation of the TPRs by fusing them to the foldon domain of T4 fibritin (1TBP-CTPR2-Foldon, 2TBP-CTPR4-Foldon etc.).

Tankyrase is upregulated in many cancers and exerts its effect by downregulating beta-catenin. Therefore, the inhibitory effect of the TBP-grafted TPRs was assayed using a beta-catenin reporter gene assay (TOPFLASH assay). Increasing the number of functional units increased the inhibitory effect of the proteins, as measured using a Wnt signalling assay (Figure 17).

2.2.3 Skp2-RTPR, a loop module designed to bind to E3 ubiquitin ligase SCF Skp2

Skp2 is the substrate recognition subunit of the SCF Skp2 ubiquitin ligase. The Skp2-binding sequence that we inserted into the RTPR loop was based on the previously published degron peptide sequence derived from the substrate p27 that binds to Skp2 in complex with Cks1 (an accessory protein) (Hao et al. 2005). We used only 10 residues of this peptide. Although ideally the Skp2-binding sequence would include a phospho-threonine (as this residue makes some key contacts with Skp2 and Cks1), we instead explored whether we could replace the phospho-threonine with a phosphomimetic (glutamate) without affecting binding affinity. We found using co-immunoprecipitation that the resulting p27-TPR protein was able to bind to Skp2 (Fig. 6A) and that it was able to inhibit the ubiquitination of p27 in vitro with a high efficiency, indicating a dissociation constant of the order of 30 nM (Fig. 6B). As the peptide adopts a turn-like conformation in its Skp2/Cks1-bound state, constraining it within the RTPR scaffold leads to a large enhancement in binding affinity that outweighs any loss in affinity arising from replacing the phospho-threonine with a phosphomimetic.

2.2.4 Nrf-TPR, a loop module designed to bind to E3 ubiquitin ligase Keap1-Cul3

Keap1 is the substrate recognition subunit of the Keap1-Cul3 ubiquitin ligase. A Keap1-binding sequence that we inserted into the CTPR loop was based on the previously published degron peptide sequence derived from the Keap1 substrate Nrf2. We found using co-immunoprecipitation that the resulting Nrf-TPR protein was able to bind to Keap1 (Fig. 7A) and that the interaction had a high affinity in the low nanomolar range as measured by ITC analysis (Fig. 7B).

2.3. Engineering the RTPR scaffold for delivery into the cell

Combining our RTPR sequences with an alternative consensus TPR sequence (Parmeggiani et al. 2015), we included additional solvent-exposed arginine residues, as such 'resurfacing' or 'supercharging' has been shown previously to facilitate the entry of proteins into cells (Chapman & McNaughton 2016; Thompson et al. 2012). Figure 8 shows that this approach was successful in delivering a fluorescently labelled resurfaced TBP-RTPR2 protein into two different cell lines.

2.4. Design of hetero-bifunctional TPRs to direct proteins for ubiquitination and subsequent degradation

The Wnt/β-catenin signalling pathway is deregulated in many cancers and in neurodegenerative diseases, and therefore β-catenin is an important drug target. There are a large number of known binding sequences (both helical and non-helical) for β-catenin that appear suitable for grafting onto the TPR scaffold, and therefore we chose it as the first target for our design of hetero-bifunctional TPRs to induce protein degradation. We selected Mdm2 and SCF Skp2 to test as E3 ubiquitin ligases, as we had successfully generated single-function TPRs to bind to them (Figs. 4 and 6). We generated structural models of some of the hetero-bifunctional molecules and used these as a crude assessment of whether the resulting presentation of β-catenin to the E3 looked appropriate. We then generated a small library of plasmids encoding proteins comprising three or four TPRs functionalized with different combinations of the β-catenin-binding module and the two E3 ligase-binding modules.

We transfected HA-tagged β-catenin plasmid alone or HA-tagged β-catenin plasmid together with one of the various hetero-bifunctional TPR plasmids in HEK293T cells using Lipofectamine2000. After 48 hours of transfection, the cells were lysed, the samples were boiled, proteins were resolved by SDS-PAGE, and immunoblotting was performed using anti-HA and anti-actin antibodies. Changes in β-catenin levels were evaluated by densitometry of the bands corresponding to HA-β-catenin normalised to actin levels (Fig. 9). The results show that a number of the hetero-bifunctional molecules are capable of reducing β-catenin levels by up to 70%. In contrast, neither a blank TPR nor single-function TPRs have any effect on β-catenin levels.
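The normalisation of band intensities to the actin loading control, and the resulting percentage reduction relative to the β-catenin-only lane, can be expressed as in the sketch below. The band intensities are made-up numbers chosen for illustration and are not measurements from Figure 9.

```python
def normalised_level(target_band, loading_band):
    """Target band intensity divided by the loading-control intensity."""
    if loading_band <= 0:
        raise ValueError("loading control intensity must be positive")
    return target_band / loading_band


def percent_reduction(control_ratio, treated_ratio):
    """Reduction of the normalised level relative to the control lane."""
    return 100.0 * (1.0 - treated_ratio / control_ratio)


# Hypothetical densitometry values in arbitrary units.
control = normalised_level(target_band=8000.0, loading_band=10000.0)
treated = normalised_level(target_band=3600.0, loading_band=9000.0)
reduction = percent_reduction(control, treated)
print(f"reduction in normalised level: {reduction:.0f}%")
```

Normalising each lane to its own actin band before comparing lanes corrects for unequal loading, so the reported change reflects the target protein and not the amount of lysate.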

A range of different factors contribute to efficient ubiquitination and target degradation by these hetero-bifunctional molecules, hence the power of screening different combinations of single-function modules and potentially also different lengths of intervening blank modules.

2.5 Using a delivery vehicle to introduce the modular TPR proteins into cells

We encapsulated the designed TPR proteins within fusogenic liposomes made from cationic, neutral, and aromatic lipids, and we showed that they were thereby delivered into cells (Figures 18 and 19). Empty liposomes and liposomes encapsulating TPR proteins are not toxic to the cell (Figure 20).

2.6 Further examples of hetero-bifunctional TPRs to direct proteins for ubiquitination and subsequent degradation

Hetero-bifunctional TPR proteins were designed to target either tankyrase (Figure 21), beta-catenin (Figure 22) or KRAS (Figure 23) for ubiquitination and degradation. TPR proteins targeting tankyrase or beta-catenin were delivered into cells using liposome encapsulation, and the effect on Wnt signalling was assayed using a TOPFLASH assay. The results show that the designed hetero-bifunctional TPR proteins are able to inhibit Wnt signalling. For KRAS, we transfected KRAS plasmid alone or KRAS plasmid together with one of the TPR plasmids in HEK293T cells using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The results show that the designed hetero-bifunctional TPR is capable of reducing KRAS levels.

2.7 Hetero-bifunctional TPRs to direct KRAS for degradation via chaperone-mediated autophagy (CMA)

Hetero-bifunctional TPR proteins were designed to target endogenous KRAS for degradation via CMA (Figure 24). TPR constructs or empty vector (light grey) were transiently transfected into either HEK293T or DLD1 (colorectal cancer cell line) cells using Lipofectamine2000. 24 hours post transfection the cells were lysed, and KRAS levels were evaluated by western blot. The designed hetero-bifunctional TPRs that resulted in reduction of KRAS levels compared to the empty vector control are shown in white.

2.8 Variations in the linker sequence connecting a binding domain to an inter-repeat loop

The linker sequence connecting a binding domain to an inter-repeat loop was varied in order to optimise the binding affinity for the target of Nrf-TPR, a TPR protein designed to bind to the protein Keap1 (see Fig. 7). Glycine residues were introduced into the linker to provide flexibility and increased spatial sampling. The introduction of this more flexible linker sequence was found to increase the binding affinity of the Nrf-TPR protein (labelled 'Flexible') when compared with the consensus-like linker sequence. Altering the charge content of the linker sequence (labelled 'Charged') and altering the conformational properties of the loop (based on the predictions of the program CIDER (Holehouse et al. Biophys. J. 112, 16-21 (2017))) by changing the amino acid composition of the linker sequence (labelled 'CIDER-optimised') also affected the Keap1-binding affinity (Figure 25).

Table 2

Table 3

CLUSTAL multiple sequence alignment by MUSCLE (3.8)

Multiple alignment of DNA sequences of all CTPR and RTPR used in hetero-bifunctional CTPRs and RTPRs to date

References

Bondeson, D.P., Mares, A., Smith, I.E.D., Ko, E., Campos, S., Miah, A.H., Mulholland, K.E., Routly, N., Buckley, D.L., Gustafson, J.L., et al. (2015). Catalytic in vivo protein knockdown by small-molecule PROTACs. Nat. Chem. Biol. 11, 611-617.

Boudko, S.P., Londer, Y.Y., Letarov, A.V., Sernova, N.V., Engel, J., and Mesyanzhinov, V.V. (2002). Domain organization, folding and stability of bacteriophage T4 fibritin, a segmented coiled-coil protein. Eur. J. Biochem. 269, 833-841.

Brunette, T.J., Parmeggiani, F., Huang, P.-S., Bhabha, G., Ekiert, D.C., Tsutakawa, S.E., Hura, G.L., Tainer, J.A., Baker, D. (2015) Exploring the repeat protein universe through computational protein design. Nature 528, 580-584.

Chapman & McNaughton, B.R. (2016). Scratching the surface: Resurfacing proteins to endow new properties and function. Cell Chem. Biol. 23, 543-553.

D'Andrea, L.D., and Regan, L. (2003). TPR proteins: the versatile helix. Trends Biochem. Sci. 28, 655-662.

Deshaies, R.J. (2015). Protein degradation: Prime time for PROTACs. Nat. Chem. Biol. 11, 634-635.

de Vries, S.J., and Bonvin, A.M.J.J. (2011). CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS One 6, e17695.

de Vries, S.J., van Dijk, M., and Bonvin, A.M.J.J. (2010). The HADDOCK web server for data-driven biomolecular docking. Nat. Protoc. 5, 883-897.

Guettler, S., LaRose, J., Petsalaki, E., Gish, G., Scotter, A., Pawson, T., Rottapel, R., and Sicheri, F. (2011). Structural basis and sequence rules for substrate recognition by Tankyrase explain the basis for cherubism disease. Cell 147, 1340-1354.

Güthe, S., Kapinos, L., Möglich, A., Meier, S., Grzesiek, S., and Kiefhaber, T. (2004). Very Fast Folding and Association of a Trimerization Domain from Bacteriophage T4 Fibritin. J. Mol. Biol. 337, 905-915.

Hao, B., Zheng, N., Schulman, B.A., Wu, G., Miller, J.J., Pagano, M., Pavletich, N.P. (2005). Structural basis of the Cks1-dependent recognition of p27(Kip1) by the SCF(Skp2) ubiquitin ligase. Mol. Cell 20, 9-19.

Kobe, B. & Kajava, A.V. (2000). When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends in Biochem. Sci. 25, 509-515.

Lee, J.-H., Kang, E., Lee, J., Kim, J., Lee, K.H., Han, J., Kang, H.Y., Ahn, S., Oh, Y., Shin, D., et al. (2014). Protein grafting of p53TAD onto a leucine zipper scaffold generates a potent HDM dual inhibitor. Nat. Commun. 5, 3814.

Leshchiner, E.S., Parkhitko, A., Bird, G.H., Luccarelli, J., Bellairs, J.A., Escudero, S., Opoku-Nsiah, K., Godes, M., Perrimon, N., and Walensky, L.D. (2015). Direct inhibition of oncogenic KRAS by hydrocarbon-stapled SOS1 helices. Proc. Natl. Acad. Sci. U. S. A. 112, 1761-1766.

Longo, L.M. & Blaber, M. (2014). Symmetric protein architecture in protein design: top-down symmetric deconstruction. Methods Mol. Biol. 1216, 161-82.

Lu, J., Qian, Y., Altieri, M., Dong, H., Wang, J., Raina, K., Hines, J., Winkler, J.D., Crew, A.P., Coleman, K., et al. (2015). Hijacking the E3 Ubiquitin Ligase Cereblon to Efficiently Target BRD4. Chem. Biol. 22, 755-763.

Margarit, S.M., Sondermann, H., Hall, B.E., Nagar, B., Hoelz, A., Pirruccello, M., Bar-Sagi, D., and Kuriyan, J. (2003). Structural evidence for feedback activation by Ras.GTP of the Ras-specific nucleotide exchange factor SOS. Cell 112, 685-695.

Meier, S., Güthe, S., Kiefhaber, T. and Grzesiek, S. (2004). Foldon, the natural trimerization domain of T4 fibritin, dissociates into a monomeric A-state form containing a stable beta-hairpin: atomic details of trimer dissociation and local beta-hairpin stability from residual dipolar couplings. J. Mol. Biol. 344, 1051-1069.

Parmeggiani, F., Huang, P.-S., Vorobiev, S., Xiao, R., Park, K., Caprari, S., Su, M., Seetharaman, J., Mao, L., Janjua, H., Montelione, G.T., Hunt, J., Baker, D. (2015) A general computational approach for repeat protein design. J. Mol. Biol. 427, 563-575.

Rowling, P.J., Sivertsson, E.M., Perez-Riba, A., Main, E.R., Itzhaki, L.S. (2015) Biochem. Soc. Trans. 43, 881-888.

Tamaskovic, R., Simon, Stefan, N., Schwill, Plückthun, A. (2012). Designed ankyrin repeat proteins (DARPins): From research to therapy. Methods Enzymol. 503, 101-134.

Thompson, D.B., Cronican, J.J., Liu, D.R. (2012). Engineering and identifying supercharged proteins for macromolecule delivery into mammalian cells. Methods Enzymol. 503, 293-319.